Senior Software Engineer – Infrastructure & Network Automation
NVIDIA · Germany
Job Description
NVIDIA is seeking a talented Senior Software Engineer to join our Network Solutions Validation (NSV) tools group. You will play a key role in developing high-performance software automation systems for NVIDIA's data center environments. This is a permanent, onsite role based in Germany.
About the Role
As a senior member of the team, you will contribute to the development of an automation platform used for provisioning, configuring, and monitoring High-Performance Computing (HPC) data centers. You will design and implement scalable, reliable, and maintainable services to enhance cluster visibility and improve operational efficiency. This role involves close collaboration with architects, network engineers, developers, and other stakeholders to understand requirements and deliver robust, full-cycle solutions. You will also focus on improving the stability and performance of the provisioning pipeline through architectural enhancements and code optimizations, troubleshooting issues in distributed environments, and contributing to system observability and reliability.
Key Responsibilities
- Design and develop an automation platform for HPC data center provisioning, configuration, and monitoring.
- Implement scalable and reliable services to enhance cluster visibility and operational efficiency.
- Collaborate with internal and external stakeholders to gather requirements and deliver solutions.
- Improve the stability and performance of the provisioning pipeline.
- Troubleshoot issues in distributed environments and enhance system observability.
- Work cross-functionally with architects, DevOps engineers, and product managers.
- Participate in code reviews, technical design discussions, and continuous improvement activities.
Requirements
- Bachelor of Science in Computer Science, Engineering, or a related field, or equivalent practical experience.
- 5+ years of strong hands-on experience with Linux-based platforms.
- Proficiency in scripting and automation using languages such as Python and tools like Ansible.
- Background in DevOps and network engineering practices.
- Hands-on experience with large-scale network architectures, switches and routers, Open vSwitch (OVS), SR-IOV, and network operating and management systems.
- Solid understanding of networking concepts including Ethernet, VLANs, TCP/UDP/IP, QoS, L2/L3 protocols, BGP, EVPN/VXLAN, and common network topologies.
- Practical experience with containers and cloud-native technologies such as Docker and Kubernetes, including their networking and its performance.
- Experience with version control systems like Git and CI/CD pipelines.
- Strong problem-solving, debugging, and communication skills, with a proactive and independent approach.
Standout Qualifications
- Experience in a leadership role, such as Team Lead or Scrum Master.
- Experience in project planning, tracking, and delivery.
- Familiarity with DevOps methodologies and tools (e.g., Jenkins).
- Hands-on experience with Docker and containerized environments.
- Experience with agentic AI development.
NVIDIA is at the forefront of innovation in Artificial Intelligence, High-Performance Computing, and Data Centers. Join us to accelerate the next wave of data center quality and be part of a dynamic, meaningful, and fast-paced work environment. We are an equal opportunity employer committed to diversity.
✨ This description was enhanced by AI based on the original listing.