Site Reliability Engineer jobs in Tokyo

Site Reliability Engineer

2 weeks ago

Tokyo Woven

About Woven by Toyota

Woven by Toyota is enabling Toyota's once-in-a-century transformation into a mobility company. Inspired by a legacy of innovating for the benefit of others, our mission is to challenge the current state of mobility through human-centric innovation, expanding what "mobility" means and how it serves society.
Our work centers on four pillars: AD/ADAS, our autonomous driving and advanced driver assist technologies; Arene, our software development platform for software-defined vehicles; Woven City, a test course for mobility; and Cloud & AI, the digital infrastructure powering our collaborative foundation. Business-critical functions empower these teams to execute, and together, we're working toward one bold goal: a world with zero accidents and enhanced well-being for all.
=========================================================================
TEAM

Our data platform team is working on accelerating autonomous driving by providing access to petabytes of data collected by our fleet of autonomous and non-autonomous vehicles. Efficient, fast and cost-effective access to data at large scale is key to tackling the hardest problems in AD/ADAS, from developing the Machine Learning (ML) models for perception and prediction of human driving patterns, to increasing the sophistication of our validation and simulation by identifying rare and interesting real-world driving situations. The data ecosystem developed by the Data Infrastructure team is a key building block for developing and testing modern AD/ADAS products that will impact millions of customers.
Our ML and Data pipelines are built on top of the open-source Flyte orchestration framework and are deployed to AWS. Pipeline code is written in Python. We leverage AWS S3, GCP BigQuery and Elasticsearch for data storage and search. We schedule our workloads on AWS EKS. Our infrastructure is spread across multiple regions and multiple cloud providers. We believe strongly in automation and testing to ensure delivery of robust and correct systems. We are a distributed team, working in Japan, the UK and the US.

WHO ARE WE LOOKING FOR?

The Data Infrastructure team is looking for engineers who are passionate about enabling the next generation of automotive software development. The right candidate will have excellent communication skills, solid coding skills, expertise in building scalable, reliable, highly available and fault-tolerant systems, and broad knowledge of software engineering and site reliability engineering in areas such as Large-Scale Data and Compute Infrastructure, Stream Processing, Kubernetes, High-Performance Networking, Observability and Infrastructure Automation.

RESPONSIBILITIES

Design, build, maintain, optimize and support large-scale, multi-region, multi-cloud compute and storage infrastructure powering our data platform and mission-critical services
Work with fellow Data Infrastructure engineers and Site Reliability engineers to ensure our systems are scalable, reliable, fault-tolerant, highly available, highly performant, and observable
Manage incidents, triage product or system issues, and debug, track, and resolve them by analyzing their root causes and their impact on users and operations
Work closely with other Data Infrastructure engineers, Site Reliability engineers, ML Platform engineers, Computer Vision and ML engineers on high-impact projects to create innovative solutions to problems in the self-driving space
Mentor junior engineers in their day-to-day work and drive best practices across the organization
Contribute to the long-term strategy for several of our systems and products

MINIMUM QUALIFICATIONS

Bachelor's degree in Computer Science, a related field, or equivalent practical experience
5+ years of experience with data structures/algorithms and professional software engineering in one or more programming languages (e.g., Python, Go, Java, C, C++)
3+ years of experience as a Site Reliability Engineer, working with Terraform, Docker, cloud-native technologies, networking and Kubernetes in production
Experience designing, deploying, monitoring and maintaining large-scale, fault-tolerant multi-region and/or multi-cloud distributed systems
Ability to debug & optimize code, to troubleshoot distributed systems and to automate routine tasks
Business-level proficiency in English speaking, reading and writing (e.g., technical documents, software documentation)

NICE TO HAVES

Master's degree in Computer Science
Experience working as a Software Engineer on data-intensive applications, data platforms, data pipelines, workflow orchestration, batch processing, and/or distributed databases
Experience working with RPC protocols and their formats, e.g., gRPC/protobuf, Apache Avro, etc.
Experience with cloud-based (e.g., AWS, GCP, Azure) microservice, event-driven, and distributed architectures
Experience working in a fast-paced environment, collaborating across teams and disciplines
Experience with data governance, data privacy and security
Business-level proficiency in Japanese

=========================================================================
Important Points
・All interviews will be arranged via Google Meet, unless otherwise stated.
・The same job descriptions are available in both English and Japanese; therefore, we kindly ask that you apply to only one version.
・We kindly request that you submit your resume in English, if possible. However, Japanese resumes are also acceptable. Please note that, depending on the English proficiency requirements of the role, we may request an English version of your resume later in the process.

WHAT WE OFFER
・Competitive Salary - Based on experience
・Work Hours - Flexible working time
・Paid Holiday - 20 days per year (prorated)
・Sick Leave - 6 days per year (prorated)
・Holiday - Sat & Sun, Japanese National Holidays, and other days defined by our company
・Japanese Social Insurance - Health Insurance, Pension, Workers' Comp, and Unemployment Insurance, Long-term care insurance
・Housing Allowance
・Retirement Benefits
・Rental Cars Support
・In-house Training Program (software study/language study)

Our Commitment
・We are an equal opportunity employer and value diversity.
・Any information we receive from you will be used only in the hiring and onboarding process. Please see our privacy notice for more details.

Software Engineer, Site Reliability

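19 hours ago

Tokyo Tailor ￥800,000 - ￥1,500,000 per year

We make the hard parts of building products easy so that anyone can become a product creator. We aim for a world where everyone can easily turn their own ideas into reality, where the boundary between business and engineering is removed, and where diverse expertise and technology can be integrated. · ...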
Site Reliability Engineer

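2 weeks ago

Tokyo BLOOMTECH, Inc ￥5,500,000 - ￥7,500,000 per year

You will take on the SRE (Site Reliability Engineering) position that drives our rapidly growing in-house service. Specifically, you will take the user's perspective of "how can we get more people to use the service, and use it more conveniently," and raise the service's reliability while running cycles of hypothesis, execution, and verification. · Setting and monitoring SLAs/SLOs/SLIs and improving the monitoring environment · Continuous updates of the OS, middleware, and so on · Incident response and bottleneck investigation and remediation · Stabilizing the operation of system environments that use multiple clouds such as AWS · Architecture improvement (microservices ...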
PlayStationNetwork Site Reliability Engineer

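19 hours ago

Japan, Tokyo Sony Interactive Entertainment ￥3,000,000 - ￥9,000,000 per year

We are recruiting members for the engineering team that designs, builds, and operates "PlayStation Network", the network service provided for PlayStationNetwork. As a site reliability engineer and a member of the server-side application development team, you will be responsible for ensuring the reliability, performance, efficiency, and security of the service. · Experience building, operating, and handling incidents on Linux servers · Basic knowledge of network protocols such as TCP/IP and HTTP · Ability to communicate openly with the people around you · ...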
Site Reliability Engineer

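2 days ago

Tokyo CLPS Global ￥7,680,000 - ￥11,520,000 per year

In system development and operations projects, you will be responsible for building and operating DevOps environments. You will handle technical coordination with clients in Japan and prepare documentation. · ...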
Senior Site Reliability Engineer

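2 weeks ago

Tokyo BLOOMTECH, Inc ￥8,000,000 - ￥18,000,000 per year

We aim to become a decacorn (a startup valued at USD 10 billion or more), and it is often said that to achieve this goal "we need to win with global × deep tech." · Against that backdrop, in order to take the lead in building "infrastructure that becomes the de facto standard" as a "Japan-born" "global × deep tech" company, we are currently expanding hiring rapidly, focusing on development team personnel. · ...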
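SRE (Site Reliability Engineer), Independent Contractor

19 hours ago

Tokyo Tailor ￥6,800,000 - ￥10,800,000 per year

We are seeking an SRE (Site Reliability Engineer) on an independent contractor basis. With the goal of making the hard parts of building products easy, we want to create a world where anyone can build products. · ...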
Site Reliability Engineering

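2 months ago

Greater Tokyo Area NetEase Games

NetEase Games is an IT company that operates internet services and online games and continues to grow independently, centered on China. Infrastructure & operations engineers (SRE) use software engineering methods to manage systems, solve problems, and automate operations, thereby reducing toil and improving service availability. · Handle operations for the NetEase interactive entertainment business · Design and select infrastructure environments suited to game servers according to each game's service architecture, performance requirements, and business situation · Set and monitor various operational metrics ...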
Customer Reliability Engineer

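2 months ago

Tokyo LY Corporation

You will work as a Customer Reliability Engineer. Drawing on deep domain knowledge and technical skill, and working together with the customer support (CS) team and development teams, you will resolve issues faced by customers of the Messaging Platform and Developer Product Platform and develop support tools. · Technical support for day-to-day CS operations · Investigation and problem resolution · Data extraction for information disclosure requests · Tool development for information disclosure requests · Handling of authentication, security, and audits · Development of support tools for CS work in secure rooms · CS ...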
Customer Reliability Engineer

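3 weeks ago

Tokyo LY Corporation ￥7,000,000 - ￥12,000,000 per year

Position overview · You will work as a CRE supporting e-commerce services centered on "LINE Gift". · A CRE is an engineer who carries out a variety of development work with the aim of ensuring user trust through engineering. CREs play a very important role in "LINE Gift", which has been used by a cumulative total of more than 35 million users to date. · The CRE team not only answers user inquiries as technical support, but also addresses the issues users face by considering, from a technical perspective, what makes a system easy for users to use, while continuing ...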
Site Reliability Engineer

4 weeks ago

Tokyo TEKsystems ￥5,000,000 - ￥10,000,000 per year

Site Reliability Engineer (SRE) – Azure Platform. Our client is expanding their engineering team to support a high-impact technology initiative with a target release in 2025. · Maintain and enhance the reliability of systems hosted on Azure. · Collaborate with DevOps, Development ...
Site Reliability Engineer

4 weeks ago

Tokyo Placeton Inc ￥4,500,000 - ￥9,000,000 per year

Ensure the availability, scalability, and performance of data platforms and services. · Design, implement, and operate reliable, large-scale data systems in collaboration with engineering teams. · Develop automation scripts and tools (Python, Bash, PowerShell, Spark, etc.) to imp ...
Site Reliability Engineer

5 days ago

Tokyo, Tokyo Aras Corporation ￥3,600,000 - ￥6,000,000 per year

We are looking for Cloud Operations and Site Reliability Engineers who will provide excellent solutions for the security, availability, performance, efficiency, change management, and monitoring of the service. · We will work with our world-class engineering team to drive excelle ...
Site Reliability Engineer

2 weeks ago

Tokyo, Japan AheadGroup ￥1,200,000 - ￥1,500,000 per year

Ahead Consulting is seeking a Site Reliability Engineer to join one of our Global E-Commerce clients to handle the onboarding of new large-scale services, design and maintain the search service, customize and optimize search services, troubleshoot and investigate issues, enhance ...
Site Reliability Engineer

1 month ago

Tokyo ZEALS ￥4,000,000 - ￥12,000,000 per year

We Are Zeals - Designing conversations. Driving conversions. · We have recently secured a Series E funding totaling $33.8M led by JIC Venture Growth Investments and Salesforce Ventures. · We're looking for a small group of elite teams to build the foundation of our global expansi ...
Site Reliability Engineer

4 weeks ago

Tokyo NEXUS CORPORATION ￥10,000,000 - ￥20,000,000 per year

Site Reliability Engineer responsible for building tooling to support automation, management, and reliability of systems, as well as working with business partners to define SLOs and SLIs and build robust monitoring solutions. · Build tooling to support the automation, management ...
Senior Site Reliability Engineer

1 month ago

Greater Tokyo Area Robert Half ￥4,500,000 - ￥9,000,000 per year

Join a fast-growing digital platform in Tokyo at a pivotal expansion phase. This is a rare chance to tackle technical scaling challenges for a subscription-based, user-driven service—impacting hundreds of thousands and aiming for millions globally. · Strengthen reliability and pe ...
Customer Reliability Engineer

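3 weeks ago

Tokyo LINEヤフー株式会社 ￥7,000,000 - ￥12,000,000

Position overview · You will work as a CRE supporting e-commerce services centered on "LINE Gift". · A CRE is an engineer who carries out a variety of development work with the aim of ensuring user trust through engineering. CREs play a very important role in "LINE Gift", which has been used by a cumulative total of more than 35 million users to date. · The CRE team not only answers user inquiries as technical support, but also addresses the issues users face by considering, from a technical perspective, what makes a system easy for users to use, while continuing ...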
Site Reliability Engineer II

2 months ago

Tokyo, Tokyo AXS ￥900,000 - ￥1,200,000 per year

The Site Reliability Engineer (SRE) II is responsible for designing, implementing, and maintaining scalable and reliable systems and applications. Focus on automation, monitoring, and incident response to ensure high system availability and performance. · Build and scale the tech ...
Site Reliability Engineering Manager

1 month ago

Tokyo Placeton Inc ￥10,000,000 - ￥20,000,000 per year

Ensure the availability, scalability, and performance of data platforms and services. · Develop automation scripts and tools (Python, Bash, PowerShell, Spark, etc.) to improve operational efficiency. · Build and maintain monitoring and alerting systems (Grafana, Kibana, Splunk, A ...
Site Reliability Engineer II

2 weeks ago

Tokyo, Japan AXS ￥6,000,000 - ￥12,000,000 per year

The Site Reliability Engineer (SRE) II is responsible for designing, implementing, and maintaining scalable and reliable systems and applications. Focus on automation, monitoring, and incident response to ensure high system availability and performance. · Build and scale the tech ...
Site Reliability Engineer(Bilingual)

2 months ago

Tokyo W3Global ￥6,000,000 - ￥12,000,000 per year

This role plays a crucial part in ensuring the reliability, availability, performance, and scalability of our infrastructure and services. You will work closely with the development team to balance system stability and rapid release cycles. · Design, build, and operate cloud infr ...
