SRE/DevOps Engineer

Engineering  |  Open positions in Sunnyvale and Berkeley, CA  |  Full Time

Nefeli Networks is an exciting early-stage startup in the NFV space. This is an opportunity to get in on the ground floor at a well-funded company, working on an exciting new technology with a great team of developers. We are based in Berkeley, CA, with an additional office in Sunnyvale, CA.

We are looking for a highly motivated DevOps/Site Reliability Engineer to join our exceptional team. The candidate we are looking for is ready to design, automate and support our cloud infrastructure and back-end systems, and to handle technical integration with our partners. The ideal candidate has some experience operating and supporting networking solutions and is familiar with automation tools and processes.

As a DevOps/SRE at Nefeli, you will play a critical role in helping us shape our software stack and hardware infrastructure. Your knowledge of design, analytics, development, coding, testing and application programming will help our development team satisfy customer business and functional requirements. You will also be instrumental in deploying systems at customer sites.

Responsibilities: 

  • Improve the whole product lifecycle through inception, design, deployment, operation and refinement 
  • Design, build and operate cloud infrastructure to enable reliable and rapid deployment of microservices with effective monitoring and resilient operations
  • Work with development teams to make sure applications are production ready, scalable and reliable from the ground up
  • Identify and drive opportunities to improve automation for code deployment, management and visibility of application services
  • Develop tools and frameworks to automate operational tasks and the deployment of machines, services and applications
  • Write automation code for provisioning and operating infrastructure at massive scale
  • Establish end-to-end monitoring and alerting on all critical components of the applications, including availability, latency and overall system health 
  • Participate in the on-call rotation supporting the platform and/or the production application
  • Direct root-cause and corrective-action analysis of critical business and production issues
  • Develop standard methodology for infrastructure orchestration and for troubleshooting application services in production
  • Represent DevOps/SRE in design reviews and work with Engineering teams on operational readiness

Technical Qualifications: 

  • BS in Computer Science, Engineering or a related field, or equivalent professional experience
  • Experience with Unix/Linux operating systems internals and administration 
  • Experience with Python, Go and/or C++
  • Experience with CI/CD pipelines, GitHub and Jenkins
  • Good understanding of networking technologies such as SDN, NFV and SD-WAN, and sound knowledge of Ethernet switching and routing technology
  • Good understanding of server and network virtualization, global infrastructure, distributed systems, load balancing and security
  • Experience with at least one configuration management solution, plus hands-on experience with server virtualization (e.g., VMware ESXi, KVM, Hyper-V)
  • Expertise in configuration management with a framework such as Ansible, Chef, Puppet or Terraform
  • Experience with AWS or GCP cloud computing and related services
  • Strong fundamentals in HTTP, including HTTP headers, and in Process and System API services; experience working with third-party RESTful APIs
  • Ability to debug and optimize code 
  • Passion for automation and monitoring instrumentation in the code
  • Knowledge of best practices related to security, performance, and disaster recovery 

Other Qualifications:

  • Ability to communicate effectively and succinctly 
  • Strong, systematic problem-solving skills and the ability to work with ambiguity
  • Excellent written and verbal communication skills, with the ability to collaborate and rally support
  • Excellent interpersonal skills and the ability to work well in a team
  • Self-disciplined, self-managed and self-motivated, with a strong sense of ownership, urgency and drive; positive attitude with the ability to quickly learn new technologies and effectively manage parallel projects
  • Ability to diagnose and troubleshoot complex distributed systems handling high-volume transactions
  • Passion for learning, understanding and dissecting new technologies quickly and independently

Preferred Qualifications: 

  • 5+ years of related experience
  • Experience with modern logging/reporting tools such as Prometheus
  • Experience with networking (e.g., TCP/IP, routing, network topologies & hardware, SDN, NFV) 
  • Experience implementing monitoring tools such as Grafana, collectd and Zabbix
  • Experience with etcd, NoSQL and time-series databases