Tokyo Amazon

Description
Join the Chaos Engineering team in Amazon Search. We perform experiments in production to harden Search against outages and make sure that whenever a customer searches for products, they find what they are looking for.

In This Role You Will

  • Design, implement, execute, and automate chaos experiments to continuously test Amazon Search's resilience against hardware failures, dependency outages, traffic spikes, and more.
  • Collaborate with service owners to remedy vulnerabilities, minimize blast radius and harden Amazon Search.
  • Research tools and practices in resilience engineering and adopt them as appropriate.

Joining this team, you'll experience the benefits of working in an entrepreneurial environment while leveraging the resources of Amazon (AMZN), one of the world's leading internet companies. We are a diverse, customer-obsessed, and passionate team located in Meguro, Tokyo.

Key job responsibilities

  • Develop and maintain our chaos experiment orchestrator
  • Design, execute, automate, and maintain chaos experiments
  • Develop and maintain our distributed load generator
  • Develop and maintain our petabyte-scale log archival and query service
  • Join a 12/12 on-call rotation for incident response and mitigation

Basic Qualifications

  • Experience programming with at least one modern language such as Python, Ruby, Golang, Java, C++, C#, Rust

Preferred Qualifications

  • Experience with Linux/Unix
  • Experience in networking, storage systems, operating systems and hands-on systems engineering
  • Experience with distributed operational health and performance monitoring systems

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit for more information. If the country/region you're applying in isn't listed, please contact your Recruiting Partner.

Company
- Amazon Japan G.K.

Job ID: A3154855



  • Tokyo Amazon Full time

    We perform experiments in production to harden Search against outages and make sure that whenever a customer searches for products, they find what they are looking for. · ...


  • Tokyo (株)ドワンゴ ¥8,000,000 - ¥11,000,000 per year

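    Responsible for AWS infrastructure design, build, and optimization (EKS, ECS, RDS/Aurora, IAM, VPC, and others); design and operation of Kubernetes environments (multi-cluster management, evaluating Service Mesh improvements, and so on); and IaC with Terraform/Terragrunt and CI/CD pipeline improvements. · ...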


  • Tokyo Woven

    We use the latest technologies to help engineering teams go faster, with safety as our top priority. Our modern, agile, and transparent services are designed to bring Woven by Toyota's vision of "Mobility to Love, Safety to Live" to life. · Provide technical leadership to the team by ...


  • Tokyo Woven by Toyota

    We are looking for a senior SRE engineer with a background in software engineering, observability, and cloud engineering to enhance production readiness and reliability. · ...

  • Cloud Data Engineer

    1 month ago


    Greater Tokyo Area Randstad Japan

    Randstad is partnered with a leading Life Insurance firm in their search for an experienced Cloud Data Engineer / SRE with specialization in Data projects. · ...


  • Tokyo Talisman Corporation

    Join a leading payment solutions provider to ensure quality and reliability of global payment systems. · ...


  • Greater Tokyo Area Randstad Japan

    Randstad is partnered with a leading Life Insurance firm in their search for an experienced Cloud SRE with specialization in Data projects. · ...

  • Senior Manager

    2 weeks ago


    Tokyo Rakuten $120,000 - $150,000 per year

    This position is for a Senior Manager to lead the IT infrastructure's stability and reliability as part of the Rakuten ecosystem. · We are looking for someone with experience in DevOps and customer success management who can work closely with other teams and stakeholders. · ...


  • Tokyo UiPath ¥1,800,000 - ¥2,200,000 per year

    We are seeking an experienced Principal Site Reliability Engineer to join our team in Tokyo. · This role will involve leading incident command and tactical response efforts for high-stakes technical events. · ...


  • Tokyo, Tokyo UiPath $120,000 - $160,000 per year

    This is a high-impact, principal level role designed for an engineer who excels in the heat of the moment. Operating with a high degree of autonomy, you will take operational leadership to restore the stability of UiPath’s large-scale distributed services. · ...


  • Tokyo UiPath

    This is a high-impact role for an engineer who excels in the heat of the moment. The principal site reliability engineer will take operational leadership to restore stability to UiPath's distributed services. · ...


  • Tokyo Computer Futures

    Lead infrastructure reliability at a fast-growing global SaaS company. · Lead and empower an engineering team to deliver reliable solutions · ...


  • Tokyo Treasure Data

    Oversee our Japan-based Site Reliability Engineering team to ensure availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning. · Manage a team of 5-8 Site Reliability Engineers by setting clear expectations and providing con ...


  • Tokyo Rakuten ¥2,000,000 - ¥2,800,000 per year

    We are looking for Entrepreneurial, Innovative, Growth-Oriented, and Customer-obsessed individuals to join our growing team to build the Telco of the Future. · Ensure high availability, resilience, and scalability across multi-region production environments through automation and ...


  • Greater Tokyo Area Randstad Japan

    The candidate will be responsible for architecting, developing and deploying solutions to automate data pipelines. They must have experience in application development with Python and designing distributed systems. · ...


  • Tokyo SMALL WORLD / Work in Japan?

    Job Summary · DevOps & Observability Platform Engineer (L2 Support) - Telecom BSS. · Responsibilities: Ensure operational excellence for internal DevOps and Observability platforms through proactive monitoring, alert handling, and initial troubleshooting. · ...


  • Tokyo Rakuten ¥6,000,000 - ¥9,000,000 per year

    We are looking for Entrepreneurial, Innovative, Growth-Oriented, and Customer-obsessed individuals to join our growing team to build the Telco of the Future. · This role contributes to the operational excellence of Rakuten's DevOps and Observability platforms. · Providing proacti ...


  • Tokyo G Talent

    The company has historically supported the operational efficiency of pharmacies by providing various solutions. The challenges facing the Japanese healthcare system are complex, making the power of technology indispensable. · ...


  • Tokyo サイバーリーズン・ジャパン(Cybereason Japan)

    We are seeking an experienced Hands-On Rust Engineering Team Lead to lead a team of talented engineers while remaining deeply involved in architecture, design, and development. · Design, develop, and maintain scalable backend services and selected front-end components for our p ...