Manager, Site Reliability Operations job in Tokyo

Manager, Site Reliability Operations

3 weeks ago

Tokyo Treasure Data Full time

Treasure Data:

At Treasure Data, we're on a mission to radically simplify how companies use data and AI to create connected customer experiences. Our intelligent customer data platform (CDP) drives revenue growth and operational efficiency across the enterprise to deliver powerful business outcomes.

We are thrilled that Forrester has recognized Treasure Data as a Leader in The Forrester Wave: Customer Data Platforms For B2C. It's an honor to be acknowledged for our efforts in advancing the CDP industry with cutting-edge AI and real-time capabilities.

Furthermore, Treasure Data employees are enthusiastic, data-driven, and customer-obsessed. We are a team of drivers—self-starters who take initiative, anticipate needs, and proactively jump in to solve problems. Our actions reflect our values of honesty, reliability, openness, and humility.

Your Role:

Your role will be to oversee our Japan-based Site Reliability Engineering team. Our SREs own our compute platform (AWS, Kubernetes, EC2, Lambda, ECS), our common tooling, and our overall site availability. They work directly with development teams to solve product challenges and provide education around best practices. As our SRE leader in Japan, you'll work closely with your North-America-based counterparts to design and implement solutions to solve high-scale challenges.

Managers at Treasure Data prioritize solving people and communication challenges before technical problems, but are still active technical contributors. They are eager to build effective and dynamic teams that iteratively and rapidly deliver resilient systems. The role requires working across product and engineering teams on complex problems whose solutions require in-depth analysis and evaluation of multiple competing factors, and identifying the best trade-offs for successful delivery.

This role requires leadership by example and will have you making regular individual contributions. You and your team will be directly responsible for platform solutions in these critical areas: availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning. Additionally, as a leader within the engineering organization, you'll take part in broader planning and ultimately align your team with its outcomes.

Success in this role requires a passion for helping others and improving their lives. You do this by working with people to make team collaboration more effective and by helping them simplify complex systems so that they are understandable and operable. You can communicate decisions, ideas, designs, and the operation of systems and services clearly and concisely and, more importantly, you derive a lot of satisfaction from teaching and enabling others to do the same.

Responsibilities & Duties:

Managing a team of 5-8 Site Reliability Engineers by setting clear expectations and providing continuous feedback.
Providing ongoing career coaching on both technical and non-technical areas of improvement.
Working with Engineering and Product stakeholders to organize and execute on large projects.
Planning and facilitating agile sprints and holding the team accountable to sprint deliverables.
Improving processes by introducing metrics, experimenting with improvements, and implementing new ways of working.
Assisting with incident coordination as part of our on-call rotation.
Assisting with system design activities to make the right tradeoffs that balance reliability and delivery speed, and communicating those decisions clearly.

Required Qualifications:

Proven experience as a people manager for a technical team, including coaching, performance management, and delivering difficult feedback when necessary.
Experience managing or supporting a distributed SRE or infrastructure team across multiple time zones.
Hands-on experience with at least one major cloud provider such as AWS, Azure, or GCP.
Working familiarity with infrastructure-as-code tools, including Terraform, CloudFormation, CDK, or Ansible.
Working knowledge of at least one programming language, such as Python, Java, Ruby, or JavaScript.
Experience leading or participating in production incident response, including incident command and post-incident review.
Demonstrated ability to lead complex, cross-team software or platform initiatives from planning through delivery.
Working knowledge of agile software development practices and backlog-driven delivery.
Understanding of cloud governance fundamentals, including cost management, patching, and secure system design.
Strong communication and leadership skills, with the ability to represent reliability concerns to engineering and senior leadership.

Language Requirements:

The official language for written and verbal communication for this position is English, but Japanese fluency is strongly preferred.

Physical Requirements:

Hybrid: 3 days per week in the Tokyo office.

Travel Requirements:

At least once a year for a team onsite.

About Treasure Data:

Treasure Data is the Intelligent Customer Data Platform (CDP) built for enterprise scale and powered by AI. Recognized as a Leader by Forrester and IDC, Treasure Data empowers the world's largest and most innovative companies to deliver hyper-personalized customer experiences at scale that increase revenue, reduce costs, and build trust.

Through unique capabilities such as the Diamond Record, AI Agent Foundry, and AI Decisioning with Real-Time Personalization, Treasure Data enables marketing and CX teams to personalize cross-channel engagement in real-time, optimize marketing spend while increasing ROI, and drive customer lifetime value through more intelligent retention and loyalty.

Our Dedication to You:

We value and promote diversity, equity, inclusion, and belonging in all aspects of our business and at all levels. Success comes from acknowledging, welcoming, and incorporating diverse perspectives.

Diverse representation alone is not the desired outcome. We also strive to create an inclusive culture that encourages growth, ownership of your role, and achieving innovation in new and unique ways. Your voice will be heard, and we will help amplify it.

Agencies and Recruiters:

We cannot consider your candidate(s) without a contract in place. Any resumes received without an active agreement will be considered gratis referrals to us. Thank you for your understanding and cooperation.

Site Reliability Engineer

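2 months ago

Tokyo TG Japan Inc. ￥15,000,000 - ￥20,000,000 per year

Design and build tools that support automation, operations management, and reliability improvement for the target systems · Build and support the operation of release pipelines for the target systems · As a member of the development/delivery team, embed SRE practices into solution design · Manage the full system lifecycle, from design and implementation through decommissioning · ...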
Site Reliability Engineer

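2 months ago

Tokyo CLPS Global ￥7,680,000 - ￥11,520,000 per year

You will be responsible for building and operating DevOps environments in system development and operations projects. You will handle technical coordination with Japan-side clients and prepare documentation. · ...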
Site Reliability Engineer

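2 weeks ago

Tokyo 株式会社パワーエックス

The SRE/DevOps team builds the critical foundations of PowerX's services to a high standard of quality, and develops and operates systems that help the business move faster and smarter · Challenges such as achieving high reliability for new services built on storage batteries · An environment where you can work with excellent SWEs · You can make your own design and technology choices and carry them forward · ...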
Site Reliability Engineer

2 months ago

Tokyo TG Japan Inc. ￥6,000,000 - ￥12,000,000 per year

A major European consulting firm is recruiting an SRE (Site Reliability Engineer). · ...
Network Site Reliability Engineer

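1 month ago

Tokyo PlayStation ￥3,600,000 - ￥12,000,000 per year

This is the engineering division responsible for the planning, design, development, and operation of PlayStationNetwork. It provides a broad range of offerings to consumers and game developers, from the client software that makes up the PlayStation lifecycle to platform services such as game content distribution and sales, online gaming, and social community features. · As a Site Reliability Engineer and a member of the server-side application development team, you will be responsible for ensuring the reliability, performance, efficiency, and security of the services. · ...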
Site Reliability Engineer

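7 days ago

Greater Tokyo Area BLOOMTECH, Inc

A stable customer base that includes more than 70% of the top 100 companies by market capitalization; flexible working through hybrid work and flextime; and the appeal of growing the infrastructure for a new product from scratch. · Group management at major companies competing in global markets is becoming ever more difficult because of M&A and overseas expansion. · Beyond simple maintenance and operations, you will be involved in a wide range of phases, from service design and development through long-term refinement. · ...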
Speeda - SRE (Site Reliability Engineer)

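2 days ago

Marunouchi, Chiyoda-ku, Tokyo, 株式会社ユーザベース

We are looking for engineers to build and operate the hybrid cloud that supports our in-house product Speeda and to improve its performance, reliability, and scalability. · Building a hybrid cloud using on-premises infrastructure, GCP, and AWS · Developing and operating microservices together with the development teams · Reducing toil · Operating Docker, Kubernetes, and Istio · ...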
1103_Site Reliability Engineer (SRE)

2 weeks ago

Tokyo TIER IV ￥5,800,000 - ￥16,500,000

SRE (Site Reliability Engineer)

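14 hours ago

Tokyo Tailor

Tailor Platform is a company whose mission is to create a world that removes the boundary between business and engineering and integrates diverse expertise and technology. Tailor Platform is a "Headless ERP for Enterprises" platform that helps enterprise companies build their core systems. · ...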
SRE(Site Reliability Engineer)

3 weeks ago

Ginza-itchome Station, Chuo-ku, Tokyo, 株式会社テックドクター Remote job

Site Reliability Engineer

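2 weeks ago

Tokyo PowerX, Inc.

The SRE/DevOps team, which develops and operates systems that deliver PowerX's services at a high standard of quality and help the business move faster and smarter, is looking for excellent software engineers. · ...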
Operations Manager

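2 months ago

Tokyo, パーソルキャリア株式会社 ￥2,400,000 - ￥3,600,000 per year

A must-see for candidates with the required organizational or project management experience · As a group company of a top global manufacturer in the logistics systems industry, the company operates and maintains conveyance equipment for airports. · ...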
Tokyo Operations Manager

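2 months ago

Tokyo LTL Language School ￥240,000 - ￥350,000

Must be a Japanese national with an interest in foreign and local cultures · A cheerful personality and enjoyment of talking with people from a variety of cultures · Good English skills · Resident in Tokyo · ...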
PlayStationNetwork Site Reliability Engineer

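14 hours ago

Japan, Tokyo Sony Interactive Entertainment

We are looking for members for the engineering team that designs, builds, and operates PlayStation Network services. · As an SRE and a member of the server-side application development team, ensure the reliability, performance, efficiency, and security of the services · ...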
Software Engineer, Site Reliability

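14 hours ago

Tokyo Tailor

Making the hard parts of building products easy, so that anyone can become a product builder: this is the world Tailor wants to realize. We are aiming for a world where anyone can easily bring their ideas to life, where the boundary between business and engineering is removed, and where diverse expertise and technology are integrated. We look forward to hearing from people who share this mission. · ...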
1103_Site Reliability Engineer (SRE)

1 month ago

Tokyo TIER IV

Job summary · ... Autoware-equipped self-driving vehicles around the world to ensure safety and reliability. ...
Senior Site Reliability Engineer /215918

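2 days ago

Higashi-Shimbashi, Minato-ku, Tokyo, 株式会社UPSTART Remote job ￥10,000,000 - ￥18,000,000 per year

Working with product managers who have deep knowledge of cloud infrastructure and data analytics platforms and with the leaders of the dotData product development team, you will clarify the requirements and specifications for the availability, reliability, security, and other qualities that the products and services need, evolve the system architecture incrementally, make full use of the latest technology to automate and streamline operations, and drive continuous operational improvement, so that services used by many customers are released continuously at a stable level of quality. In the medium to long term, there are also roles such as leading the team organizationally as an engineering manager or, as a staff engineer, on the technical side ...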
Senior SRE(Site Reliability Engineering)

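2 weeks ago

Greater Tokyo Area BLOOMTECH, Inc ￥5,000,000 - ￥13,000,000 per year

A position with great responsibility and reward: protecting the reliability of systems that hold more than ￥900 billion in customer assets across three locations in Japan, the US, and the EU. · Designing and building cloud and infrastructure with IaC · Improving operational efficiency through automated configuration management and operations · Designing mechanisms to prevent and detect service failures early, and handling failure recovery · ...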
