Assistant Director, High Performance Computing, Technical Services
Hong Kong
RecruitFirst
full time
Published on www.allthetopbananas.com 19 Feb 2025
Job Description:
Lead the technical services team, guiding HPC development to enhance AI R&D activities.
Manage customer projects by defining requirements, estimating resources, setting milestones, identifying risks, and collaborating with business and IT teams.
Establish and implement best practices for HPC operations to ensure efficiency and effectiveness.
Facilitate a seamless transition of HPC projects from development to operational phases, ensuring comprehensive documentation and reliable maintenance procedures.
Work closely with the IT operations team to deliver daily services and meet service level agreements (SLAs).
Conduct performance tuning and capacity management for the HPC platform.
Provide technical support and training in scientific computing, particularly in AI and machine learning.
Offer consulting services and pre-sales support to the business development team.
Engage with the community and create showcases to highlight HPC use cases.

Requirements:
Bachelor’s degree in engineering, science, or a related field. Advanced degrees by research are a plus.
At least 10 years of hands-on experience in managing and supporting HPC platforms.
Strong expertise in Artificial Intelligence, Data Analytics, Python, R, Machine Learning, and Deep Learning within HPC environments is highly desirable.
In-depth knowledge of job management, cluster management, and high-performance file systems.
Experience in designing and building HPC systems is advantageous.
Familiarity with monitoring and optimization of HPC systems is preferred.
Proficient in Linux OS, including setup and configuration for large-scale computing clusters.
Knowledge of the Nvidia DGX platform and cluster management software is a plus.
Experience with Kubernetes, including monitoring, logging, and alerting for clusters.
Proven ability in setting up and tuning both open-source and licensed HPC applications.
Understanding of Large Language Models (LLM) and Prompt Engineering.
Significant experience supporting R&D projects in large enterprises or academic institutions.
Strong project management and interpersonal skills.
Experience in customer training and providing technical support services.
Proficient in verbal and written English and Chinese (Cantonese and Putonghua).
Personal attributes: independent, results-oriented, analytical, collaborative, able to multi-task, and adaptable to a fast-paced environment.

Note: Candidates with less experience may be considered for the Senior Manager position.

Seniority level: Director
Employment type: Full-time
Job function: Information Technology, Management, and Consulting
Industries: Technology, Information and Media, Digital Accessibility Services, and Information Services