AMVETS Jobs

Job Information

Hyundai Autoever America 10471 – Manager, SRE in Fountain Valley, California

10471 – Manager, SRE

Purpose:

The Site Reliability Engineering (SRE) Manager works with the development and operations teams to ensure that connected car systems work as expected and that the underlying infrastructure and network run smoothly. This role is responsible for the day-to-day operations of the DevOps team and combines project management, team management, and engineering duties. The DevOps team members are subject-matter experts in the Telematics domain and provide insight and engineering advice to development and product teams, with the goal of creating a highly reliable and scalable software system that runs with minimal failure.

Essential Functions:

Act as the primary point of contact (PoC) on all connected car infrastructure operations and projects
Work collaboratively with software engineering to define infrastructure and deployment requirements; be a sounding board and provide recommendations to the engineering team on infrastructure design and deployment.
Set the goal of creating a highly reliable and scalable software system that runs with minimal failure
Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding
Be the driving force behind our automation and observability initiatives. Build tools and automation that eliminate repetitive tasks and prevent incident occurrence.
Build and maintain operational tools for deployment, monitoring, and analysis of connected car infrastructure and systems
Perform infrastructure cost analysis and optimization
Provide project management, sprint planning, and road-mapping support to the DevOps team
Design, develop, install, and maintain software solutions.
Work with engineering teams to refine deployment and release processes.
Collaborate with the engineering team on projects as the expert on reliability, performance, and efficiency.
Manage on-call rotations across connected car applications, using a follow-the-sun model.
Participate in 24x7 operational support and on-call rotation shifts.
Ensure that all system designs and procedures are documented and up to date.
Monitor and stress test systems to collect metrics for tuning and capacity planning.
Work to automate detection and resolution of recurring issues.
Ensure safety, predictability, repeatability, and auditability of all build and deploy processes.
Partner with development teams to improve services through rigorous testing and release procedures
Participate in system design consulting, platform management, and capacity planning
Create sustainable systems and services through automation and uplifts
Balance feature development speed and reliability with well-defined service level objectives

Basic requirements:
Bachelor’s or Master’s degree, or equivalent, in computer science, information systems, or a related field.
2+ years of experience as a manager or PM, or in a technical leadership capacity, preferably in the automobile industry within the Telematics domain.
Programming experience with one or more high level languages, such as Python, Java, C/C++, Ruby, and JavaScript
Proven track record of designing, building, optimizing, and maintaining infrastructure on a large scale.
Experience with distributed systems in a production operations environment
Expertise in analyzing complex application, database, network, and OS issues across a distributed, large-scale, customer-facing system
Strong communication skills and ability to work effectively across multiple business and technical teams
Demonstrated ability to deliver results on time with high quality
Extensive experience leading customer-facing systems in a high-uptime 24/7 environment
A depth and breadth of experience with server-side Java development, Oracle and distributed databases
A well-developed understanding of the theory and principles of operation of the internet and packet data protocols.
Exposure to Cloud, SaaS, and virtualization concepts and performance concerns.
Working knowledge of operating system design, processes, and threading model.
Knowledge of defining and monitoring system quality measures, including SLO and SLA.
Experience building tooling to improve system reliability, automate remediation of issues, or improve scalability.
Experience with different flavors of Linux, e.g. RedHat, Ubuntu, CentOS.
Hands-on experience collecting performance data, analyzing, troubleshooting, and tuning.
Experience operating applications with high concurrency, scalability, or availability requirements.
Experience leading high performing engineering teams.
Experience with containers and container orchestration tools (Docker, Kubernetes)
Experience with MySQL, Elasticsearch, Couchbase, Mongo and Redis

Nice to have:
Experience with open-source stream-processing frameworks and systems, e.g. Kafka, Spark.
Experience with distributed storage technologies like NFS, HDFS, S3 as well as dynamic resource management frameworks (Mesos, Kubernetes, Yarn)

Salary Range - $112,830 to $173,756
