Data Center Operations Manager

3 months ago
Full time role
In-person · Keflavík, IS

Crusoe is building the World’s Favorite AI-first Cloud infrastructure company. We’re pioneering vertically integrated, purpose-built AI infrastructure solutions trusted by Fortune 500 companies to power their most advanced AI applications. Crusoe is redefining AI cloud infrastructure, with a mission to align the future of computing with the future of the climate. Our AI platform is recognized as the "gold standard" for reliability and performance. Our data centers are optimized for AI workloads and are powered by clean, renewable energy.

Be part of the AI revolution with sustainable technology at Crusoe. Here, you'll drive meaningful innovation, make a tangible impact, and join a team that’s setting the pace for responsible, transformative cloud infrastructure.

About This Role:

As a valued member of the Cloud Infrastructure team, you will lead the onsite effort to install and maintain servers and network infrastructure to achieve industry-leading reliability and uptime.

A Day In The Life:

  • Work with Field Operations to integrate racks and assist with intra- and inter-rack InfiniBand (IB) and Ethernet cabling

  • In collaboration with the Infrastructure Systems team, support burn-in/stress testing of new hardware and quickly resolve any fallout

  • Aid in the creation of Standard Operating Procedure documentation that includes system specs, diagrams of Crusoe-specific configurations, and basic troubleshooting flowcharts/runbooks

  • Maintain a hardware issue tracker that can be shared with hardware vendors as appropriate

  • Open support tickets with hardware vendors as needed

  • Maintain spares inventory and backfill as required

  • Serve as data center liaison for vendor support personnel

  • Assist with the qualification of new hardware (serviceability, manageability, performance optimization, suitability of vendor-specific tools)

  • Occasional travel to other data centers as needed

  • Proactively identify gaps in processes and procedures and drive continuous improvement

You Will Thrive In This Role If:

  • Significant experience with diagnosing and repairing complex GPU-based servers (both air and liquid cooled)

  • A deep understanding of server hardware, BMC-based manageability, BIOS settings, and firmware deployment

  • Willingness to provide occasional after-hours support

  • Ability to lift 50 lbs and work in a physically challenging (sound/vibration/thermal) environment

  • Excellent organizational, time management and communication skills

  • Ability to thrive in a fast-paced, dynamic environment

  • Ability to travel to the data center once a week

  • Six or more years of hands-on data center hardware experience

  • Familiarity with InfiniBand switches and network topology

  • Basic Linux system administration expertise

  • An associate degree or equivalent experience in an IT-related field

  • Embody the Company values

Benefits: 

  • Competitive Paid Time Off

  • Industry-competitive pay

  • Retirement benefits

  • Healthcare benefits including Medical, Dental, and Vision

  • Short and Long-Term Disability Insurance

  • Life Insurance

  • Paid Parental Leave

  • Subscription to Calm App

Compensation Range:

Compensation will be paid as salary or hourly. Restricted Stock Units are included in all offers. Compensation to be determined by the applicant’s education, experience, knowledge, skills, and abilities, as well as internal equity and alignment with market data.

Crusoe is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, disability, genetic information, pregnancy, citizenship, marital status, sex/gender, sexual preference/orientation, gender identity, age, veteran status, national origin, or any other status protected by law or regulation.