HPC/AI Cluster Administrator
The Tübingen AI Center is looking for an HPC/AI Cluster Administrator (m/f/d, E13 TV-L, 100%, permanent).
The full job posting follows in English.
The Tübingen AI Center’s mission is to build the next generation of intelligent machines that is widely usable and beneficial to society. We are Europe’s leading research and innovation hub for machine learning with a fast-growing and internationally recognized community of faculty and scientists at the University of Tübingen and the Max Planck Institute for Intelligent Systems. The Center is embedded in an innovative ecosystem of startups and tech companies. At the heart of the Center is our ML Cloud, a state-of-the-art compute infrastructure with powerful CPU and GPU compute capacities, petabyte-scale storage volumes, and a peak performance of up to 15.8 PFLOP/s, used by more than 200 researchers and engineers. Our ground-breaking research fuels advancements in self-driving cars, health applications, VR/AR technologies, and much more worldwide. Join us in building the future of Artificial Intelligence.
About the Role
You will play a pivotal role in creating and administering both infrastructure and tools for scalable AI-focused scientific computing on distributed architectures. As part of a motivated team, you will design, build, and maintain the cluster and realize several large-scale future expansions with the latest AI accelerators, networking, and storage technologies. In addition, you will work and communicate with users, help to scale our largest AI experiments efficiently, and enable an ambitious research agenda through a stable, accessible, and performant cluster.
What You’ll Do
- Design, procure and administer a large-scale AI-focused computing and storage cluster
- Ensure the highest possible level of availability, usability, and reliability of computing resources
- Measure and report compute and storage usage
- Document and streamline compute job submission processes
- Support large-scale AI experiments
- Support and train users
Your Profile
- Master's degree in information technology, applied computer science, or computer engineering (or a comparable degree).
- Experience in setting up and tuning high performance computing hardware, storage and software.
- Experience in HPC cluster networking to create high bandwidth, low latency and secure networks. Knowledge of routing, switching, and load balancing.
- Administration experience with Linux operating systems (SLES, RHEL, CentOS, Ubuntu, etc.).
- Experience with cluster management software such as PBS, TORQUE, or SLURM.
- Experience in at least one scripting language like Shell/Bash/Python.
- Experience in remediating security vulnerabilities, including Linux patching and package management.
- Experience in analyzing server logs and error messages, and in troubleshooting.
- Fluency in English.
Nice To Have
- Knowledge of designing HPC clusters using Mellanox InfiniBand interconnect solutions is a plus.
- Experience in working with Git is a plus.
- Experience with containers (Docker/Singularity/Podman/Kubernetes) is a plus.
- Exposure to scientific programming, parallel programming, and GPU programming is a plus.
- Knowledge of cloud-native architectures, microservices and operational best practices in the cloud is a plus.
- Knowledge of virtualization, multi-cloud and distributed systems is a plus.
What We Offer
- We offer a permanent full-time position with salary level E13 according to TV-L guidelines.
- You receive 30 days of paid vacation each year.
- We are growing strongly and offer a vibrant working environment with more than 200 researchers from all over the world.
- Our regulations are family-friendly.
- We are pushing an ambitious agenda for the future design of our HPC/AI cluster.
- Tübingen ranks first across Europe in terms of research papers in top machine learning and graphics conferences like NeurIPS, ICML, CVPR or ICCV.
- We are located in a medieval university town in an area of outstanding natural beauty in the Southwest of Germany.
A Final Note
In line with its internationalization agenda, the university welcomes applications from outside Germany. Applications from equally qualified candidates with disabilities will be given preference. Women are expressly encouraged to apply. In principle, the position can be shared. The employment will be carried out by the central administration of the University of Tübingen.
Please send your application with the usual supporting documents (cover letter, CV, credentials) as a single PDF file to Dr. Georg Hafner (applications@tue.ai) by August 21, 2022. Please direct questions about the position to Dr. Wieland Brendel (wieland.brendel@tue.ai).