Oracle Site Reliability Developer 3 in Seattle, Washington
Job Identification : 108327
Job Category : Product Development
Job Locations :
Seattle, Washington, United States
Reston, Virginia, United States
The Oracle Cloud Infrastructure (OCI) team can provide you with the opportunity to build and operate a suite of massive-scale, integrated cloud services in a broadly distributed, multi-tenant cloud environment. OCI is committed to providing the best in cloud products that meet the needs of our customers who are tackling some of the world’s biggest challenges.
We offer unique opportunities for smart, hands-on engineers with the expertise and passion to solve difficult problems in distributed, highly available services and virtualized infrastructure. At every level, our engineers have a significant technical and business impact designing and building innovative new systems to power our customers’ business-critical applications.
Are you interested in building large-scale distributed networking solutions for the cloud? Do you love the idea of working in an environment with the excitement of a start-up, but the financial backing of a Fortune 500 company? You’ll be joining a fast-growing venture that offers a lot of autonomy and a lot of variety. This role offers huge upside potential, high visibility, and fast career growth without the risk of a typical start-up. This is a unique opportunity to work with smart people who are solving complex problems in distributed systems, networking, multi-tenant Infrastructure-as-a-Service (IaaS), and Software-Defined Networking (SDN) operating at a massive scale.
Our customers always want higher availability, more bandwidth, greater network security, less network latency, and lower overall cost. We are reimagining the traditional planning, provisioning, and life cycle by creating SDN services that allow customers to easily migrate their business to OCI or connect their on-premises networks, data center networks, and/or other networks via enterprise-grade links to Oracle’s cloud. At their core, our SDN services provide customers with rapid configuration, pay-as-you-go pricing, and seamless scalability.
The OCI Network Automation team is looking for a Senior Site Reliability Engineer. As a Site Reliability Engineer, you will solve interesting technical challenges by defining, designing, deploying, and troubleshooting key Network Automation services focusing on scalability, security, and performance. The role involves software engineering, systems engineering, automation, network operations, and DevOps. You should be comfortable building complex distributed systems. You will incorporate the ethos of software engineering and apply it to large-scale operational problems. Your primary goals are to create highly reliable services, platforms, and infrastructure, always thinking about reliability, security, and ultra-scalable software systems to manage operations. When not working on operations, you will be working on software engineering tasks such as the design and development of systems that increase reliability and scalability and reduce operational overhead through automation. You should value simplicity and scale, work comfortably in a collaborative, agile environment, and be excited to learn.
A great software engineer will make all the difference in delivering quality solutions to our customers. Are you passionate about designing, developing, testing, and delivering cloud services? Do you thrive in a fast-paced environment, and want to be an integral part of a truly great team?
Come join us!
We are looking for a Senior Site Reliability Engineer to be part of a team of engineers that will support a wide range of network automation and control plane services that are critical to managing and scaling our network infrastructure.
As a Senior Site Reliability Engineer, you will be responsible for:
Developing automation services to increase network automation deployment velocity
Performing deep-dive analytics into system uptime, service metrics, performance, and deployment automation
Developing meaningful service metrics and dashboards
Managing the reliability and manageability of network automation and control plane services
Developing service debugging tools and deployment automation solutions, and building and managing test environments for services
U.S. Government Top Secret Clearance
U.S. Citizenship or U.S. Lawful Permanent Resident Status Required – Federal Government customer
Bachelor’s or Master’s degree in CS or related engineering field
5+ years of experience in software development/operations
2+ years of experience in developing/operating large-scale distributed services
Experience with scripting and compiled languages (Java, Python, Bash) and RESTful APIs
Experience managing Linux environments, Docker, and distributed systems
Knowledge of Linux internals, TCP/IP, DNS, load-balancing technologies, and socket programming
Knowledge of cloud compute technologies, network monitoring, data processing, and analytics
Ability to be a good team player and willingness to learn and implement new cloud technologies as needed
Excellent organizational, verbal, and written communication skills
Experience participating in an on-call rotation and driving live site incidents to resolution
Experience with SQL or NoSQL technologies
The position is located in Reston, VA and Seattle, WA
Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability, protected veteran status, or any other characteristic protected by law.
Solve complex problems related to infrastructure cloud services and build automation to prevent problem recurrence. Design, write, and deploy software to improve the availability, scalability, and efficiency of Oracle products and services. Design and develop architectures, standards, and methods for large-scale distributed systems. Facilitate service capacity planning and demand forecasting, software performance analysis, and system tuning.
Work with the Site Reliability Engineering (SRE) team on the shared full stack ownership of a collection of services and/or technology areas. Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services. Take responsibility for the design and delivery of the mission-critical stack, with a focus on security, resiliency, scale, and performance. Serve as the authority for end-to-end performance and operability. Partner with development teams in defining and implementing improvements in service architecture. Articulate technical characteristics of services and technology areas and guide development teams to engineer and add premier capabilities to the Oracle Cloud service portfolio. Understand and communicate the scale, capacity, security, performance attributes, and requirements of the service and technology stack. Demonstrate a clear understanding of automation and orchestration principles. Act as the ultimate escalation point for complex or critical issues that have not yet been documented as Standard Operating Procedures (SOPs). Utilize a deep understanding of service topologies and their dependencies to troubleshoot issues and define mitigations. Understand and explain the effect of product architecture decisions on distributed systems. Bring professional curiosity and a desire to develop a deep understanding of services and technologies.
A BS or MS in Computer Science, or equivalent. Knowledge of server hardware and software configuration, networking, standard internet services, scripting languages, cloud computing patterns, technology security, and compliance. Experience running large-scale customer-facing web services. Understanding of load-balancing technologies and experience with development in programming languages, databases and big data stores, and container technologies. Work involves defining and documenting the technical architecture of complex and highly scalable products. A minimum of 5 years of experience running large-scale customer-facing web services.
Innovation starts with inclusion at Oracle. We are committed to creating a workplace where all kinds of people can be themselves and do their best work. It’s when everyone’s voice is heard and valued that we are inspired to go beyond what’s been done before. That’s why we need people with diverse backgrounds, beliefs, and abilities to help us create the future, and we are proud to be an affirmative-action equal opportunity employer.
Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability, protected veteran status, age, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.