We perform experiments in production to harden Search against outages and make sure that whenever a customer searches for products, they find what they are looking for.
In this role you will:
- Design, implement, execute, and automate chaos experiments to continuously test Amazon Search's resilience against hardware failures, dependency outages, traffic spikes, and more.
- Collaborate with service owners to remedy vulnerabilities, minimize blast radius and harden Amazon Search.
- Research tools and practices in resilience engineering and adopt them as appropriate.
Joining this team, you'll experience the benefits of working in an entrepreneurial environment while leveraging the resources of Amazon (AMZN), one of the world's leading internet companies.
We are a diverse, customer-obsessed, and passionate team located in Meguro, Tokyo.
Key job responsibilities:
- Develop and maintain our chaos experiment orchestrator
- Design, execute, automate, and maintain chaos experiments
- Develop and maintain our distributed load generator
- Develop and maintain our petabyte-scale log archival and query service
- Join a 12/12 on-call rotation for incident response and mitigation
Basic Qualifications:
- Experience programming with at least one modern language such as Python, Ruby, Golang, Java, C++, C#, Rust
Preferred Qualifications:
- Experience with Linux/Unix
- Experience in networking, storage systems, operating systems, and hands-on systems engineering
- Experience with distributed operational health and performance monitoring systems
If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit for more information.
If the country/region you're applying in isn't listed, please contact your Recruiting Partner.