🚀 HPC Systems Engineer (Advanced Research Computing)

Inside Higher Ed

💰 Earn $125,000 – $150,000 / year
  • 📍 Location: Baltimore
  • 📅 Posted: Oct 30, 2025

Job Overview

The Advanced Research Computing at Johns Hopkins University (ARCH) group is seeking an HPC Systems Engineer to join the systems team. This role will support the 45,000+ core HPC system ROCKFISH and other HPC and data-intensive science needs of the university. Responsibilities include strategic planning, design, testing, organization, and implementation of cutting‑edge technology projects, day‑to‑day administration of HPC clusters, storage systems, networking, security and related services.

Responsibilities

70% Systems Engineering, Administration, Security, and Oversight

  • Work with senior staff to design, organize, plan, test and implement cutting‑edge hardware designs for an HPC environment.
  • Document systems processes so that users and IT staff can easily find useful information and perform routine tasks.
  • Provide stable solutions for HPC resources.
  • Maintain job scheduling and storage allocation systems and policies to ensure fair allocation of shared resources.
  • Maintain extensive monitoring systems to facilitate quick, proactive responses to routine failures and provide comprehensive performance data logging.
  • Provide general system administration backup and escalation for other staff.
  • Assist with facilities‑related issues that directly affect MARCC.
  • Ensure resources meet the community's needs and are highly available with limited interruption.
  • Manage inventory of resources in coordination with vendors.
  • Automate user account creation, management, and purging.
  • Contribute to planning sessions on network and security issues for MARCC and work closely with the central networking group.
  • Implement network configuration and security measures to assure effective utilization of resources.
  • Understand HPC technical needs and collaborate with facility leadership to implement policies and procedures.
  • Create and maintain a stable, secure operating system and software environment that meets evolving research needs.
  • Implement and maintain secure measures to protect data subject to restrictions.
  • Manage data access restrictions on a per‑user and group basis.
  • Implement and maintain monitoring measures for data and system access.
  • Other systems tasks as assigned by supervisor.

20% Technological Research

  • Offer technical advice on new projects that directly involve HPC computing at Johns Hopkins.
  • Develop custom tools where necessary and contribute useful creations back to open‑source development efforts.
  • Implement and test new technologies that could benefit HPC.

10% Training/Education

  • Continuously evaluate new tools and technologies for use in existing and future clusters.
  • Attend department and university‑sponsored training to increase knowledge, improve skills, and learn new skills (may substitute for supervisor‑approved commercial courses).

Special Knowledge, Skills, & Abilities

  • Proven experience deploying large‑scale, complex projects.
  • Proven experience across multiple technologies with background in applications, databases, middleware, etc.
  • In‑depth knowledge of HPC cluster design and organization.
  • In‑depth understanding of HPC cluster hardware and management software.
  • Understanding of massive high‑performance parallel storage and methodologies.
  • Expert knowledge of Unix/Linux systems administration, including monitoring, performance analysis and integration in heterogeneous environments.
  • Knowledge of networking, high‑speed interconnects and network security principles in HPC.
  • Experience with configuration management tools (e.g., xCAT, Puppet, IPMI, ROCKS).
  • Ability to interact with peer institutions to support HPC directives effectively.
  • Understand, implement, troubleshoot, and support job scheduling and workload management systems.
  • Understand hierarchical file systems, parallel storage, backup systems, and robotic tape libraries.
  • Develop reports and customize tools that automate critical system monitoring.
  • Evaluate, implement, and manage appropriate high‑level complex software and hardware solutions with best practices.
  • Install and configure infrastructure applications following industry best practices.
  • Maintain effective schedules for systems backups and archive operations.
  • Audit and maintain user access, authorization and authentication.
  • Generate periodic reports on resource utilization.
  • Maintain resource inventory with best‑practice applications.
  • Advanced knowledge of Linux, Apache, SQL, PHP/Python/Perl (LAMP) technology/toolkits.
  • Handle high‑priority escalations and multitask while managing time and priorities.
  • Troubleshoot and resolve difficult system issues.
  • Exceptional organizational and documentation skills.
  • Automate systems administration tasks wherever possible.
  • Excellent oral, written, and interpersonal communication skills.
  • Ability to meet physical and confidentiality requirements.
  • Keep up‑to‑date on emerging technologies.
  • Research, recommend, and implement new technologies based on their value to the research facility.
  • Provide excellent customer service.
  • Demonstrate strong critical thinking and analytical reasoning.

Internal and External Contacts

  • Interact with departmental and central administrative offices, faculty, staff, researchers, students, and external constituents (other college administrators, industry partners, federal agencies, research foundations).
  • Provide instruction on protocol, regulations, and guidelines pertinent to the agency and/or university.
  • Work routinely with JHU and UMCP faculty, administrators, students, and researchers.
  • Collaborate regularly with professional colleagues from the central IT @ JH organization and other academic departments.
  • Collaborate regularly with colleagues in industry and at peer institutions.

Minimum Qualifications

  • Bachelor's Degree.
  • Five years of related experience.
  • Additional education may substitute for required experience per JHU equivalency formula.

Preferred Qualifications

  • Seven years of experience managing Linux servers, with direct HPC cluster experience.
  • High‑level Linux system administration experience.
  • Experience managing mission‑critical services.
  • Familiarity with HPC software stack (MPI, OpenMP, compilers, math libraries).
  • Experience with open‑source software compilation.
  • In‑depth knowledge of TCP/IP, InfiniBand, etc.
  • Experience with scientific application management packages (pymodules, modules).
  • Excellent scripting skills (Python, Perl, Shell).
  • Experience with MySQL/MariaDB programming.
  • Expert knowledge of configuration management and monitoring tools (Puppet, Nagios).
  • Experience configuring resource manager applications (e.g., SLURM).
  • Experience with Apache administration.
  • Knowledge of scientific software in academic supercomputing environments.
  • Familiarity with data subject to restrictions.

Education and Experience Equivalency

30 undergraduate credits or 18 graduate credits may substitute for one year of experience; up to two years of unrelated college coursework may be applied toward the minimum required for this position.

Equal Opportunity Employer

All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or status as a protected veteran. Johns Hopkins University is an equal opportunity employer.

EEO is the Law

Accommodation Information

If you require special assistance or accommodation during the pre‑employment process, please contact the Talent Acquisition Office. For TTY users, call via Maryland Relay or dial 711.

Vaccine Requirements

Johns Hopkins University requires all faculty, staff, and students to receive the seasonal flu vaccine. Exceptions may be provided for religious or medical reasons. Additional immunization requirements may apply for clinical or research positions.
