SRE/DevOps Engineer

Engineering  |  Open positions in Sunnyvale and Berkeley, CA  |  Full Time

Nefeli Networks is an exciting early-stage startup in the NFV space. This is an opportunity to get in on the ground floor at a well-funded company, working on an exciting new technology with a great team of developers. We are based in Berkeley, CA with an additional office in Sunnyvale, CA.

 

We are looking for a highly motivated DevOps/Site Reliability engineer to join our exceptional team. The candidate we are looking for is ready to design, automate and support our cloud infrastructure and back-end systems, and to handle technical integration with our partners. The ideal candidate would have some experience operating and supporting networking solutions and familiarity with automation tools and processes.

As a DevOps/SRE at Nefeli, you will play a critical role in helping us shape our software stack and hardware infrastructure. Your knowledge of design, analytics, development, coding, testing and application programming will help our development team satisfy customer business and functional requirements. You will also be instrumental in deploying systems at customer sites.

Responsibilities: 

Improve the whole product lifecycle through inception, design, deployment, operation and refinement 

Design, build and operate Cloud infrastructure to enable reliable and rapid deployment of microservices with effective monitoring and resilient operations

Work with development teams to make sure applications are production ready, scalable and reliable from the ground up

Identify and drive opportunities to improve automation for code deployment, management and visibility of application services

Develop tools and frameworks to automate operational tasks and the deployment of machines, services and applications

Write automation code for provisioning and operating infrastructure at massive scale

Establish end-to-end monitoring and alerting on all critical components of the applications, including availability, latency and overall system health 

Participate in the on-call rotation supporting the platform and/or the production application

Direct root-cause-corrective-action analysis of critical business and production issues

Develop standard methodologies for infrastructure orchestration and for troubleshooting application services in production

Represent DevOps/SRE in design reviews and work with Engineering teams on operational readiness

Technical Qualifications: 

BS Computer Science, Engineering or a related field, or equivalent professional experience

Experience with Unix/Linux operating systems internals and administration 

Experience with Python, Go and/or C++

Experience with CI/CD pipelines, GitHub and Jenkins

Good understanding of networking technologies such as SDN, NFV, SD-WAN and sound knowledge of Ethernet switch and routing technology

Good understanding of server and network virtualization, global infrastructure, distributed systems, load balancing and security

Experience with at least one configuration management solution, with hands-on experience in server virtualization (e.g., VMware ESXi, KVM, Hyper-V)

Expertise in configuration management with a framework such as Ansible, Chef, Puppet or Terraform

Experience in AWS or GCP cloud computing and its related services

Strong fundamentals in HTTP, including HTTP headers and Process and System API services; experience working with third-party RESTful APIs

Ability to debug and optimize code 

Passion for automation and monitoring instrumentation in the code

Knowledge of best practices related to security, performance, and disaster recovery 

 

Other Qualifications:

Ability to communicate effectively and succinctly 

Strong, systematic problem-solving skills and the ability to work with ambiguity

Excellent written and verbal communication, able to collaborate and rally support

Excellent interpersonal skills and the ability to work well in a team

Self-disciplined, self-managed and self-motivated, with a strong sense of ownership, urgency and drive; positive attitude with the ability to quickly learn new technologies and effectively manage parallel projects

Ability to diagnose and troubleshoot complex distributed systems handling high-volume transactions

Passion for learning, understanding, and dissecting new technologies quickly and independently

 

Preferred Qualifications: 

5+ years of related experience

Experience with modern logging/reporting tools such as Prometheus

Experience with networking (e.g., TCP/IP, routing, network topologies & hardware, SDN, NFV) 

Experience implementing monitoring tools such as Grafana, collectd, and Zabbix

Experience with etcd, NoSQL and time-series databases

Proven experience working with customers and vendors 

Proven leadership of small informal teams 
