Data Engineer jobs in Ota-ku

Data Engineer

3 weeks ago

Ota-ku AI Robot Association

About AIRoA

The AI Robot Association (AIRoA) is launching a groundbreaking initiative: collecting one million hours of humanoid robot operation data with hundreds of robots, and leveraging it to train the world's most powerful Vision-Language-Action (VLA) models.

What makes AIRoA unique is not only the unprecedented scale of real-world data and humanoid platforms, but also our commitment to making everything open and accessible. We are building a shared "robot data ecosystem" where datasets, trained models, and benchmarks are available to everyone. Researchers around the world will be able to evaluate their models on standardized humanoid robots through our open evaluation platform.

For researchers, this means an opportunity to:

Work on fundamental challenges in robotics and AI: multimodal learning, tactile-rich manipulation, sim-to-real transfer, and large-scale benchmarking.
Access state-of-the-art infrastructure: hundreds of humanoid robots, GPU clusters, high-fidelity simulators, and a global-scale evaluation pipeline.
Collaborate with leading experts across academia and industry, and publish results that will shape the next decade of robotics.
Contribute to an initiative that will redefine the future of embodied AI—with all results made open to the world.

Key Responsibilities

You will play a critical role in building the data backbone powering next-generation robotics foundation models:

Design and implement large-scale data pipelines that cover the full lifecycle of high-quality datasets for robotics foundation models—collection, processing, curation, and publishing.
Design, build, and maintain data schemas, storage solutions, and query interfaces to enable VLA researchers to efficiently discover, query, and consume curated datasets.
Collaborate closely with VLA researchers to capture evolving data requirements and continuously improve data pipelines through analysis and experimentation.
Design and scale distributed data-processing pipelines capable of handling petabyte-scale multimodal datasets (e.g., RGB/Depth, point clouds) with full lineage and reproducibility.
Define data-quality metrics and build feedback loops to continuously monitor and improve data quality.

Requirements

Required Qualifications

【1. Academic & Professional】

Master's degree in Computer Science, Engineering, or related field (or equivalent practical experience).
5+ years professional experience in data engineering / data platform development.
Proven record of delivering production-grade, distributed data systems.

【2. ETL / Distributed Data Processing】

3+ years designing and operating large-scale ETL / ELT pipelines using Spark, Flink, Ray, or a similar distributed engine.
Hands-on experience designing pipelines with orchestration tools (Airflow, Kedro, Dagster).
Proven track record of optimizing workloads at 10 TB+/day scale.

【3. Lakehouse / Storage Architecture】

Designed or led implementations using Delta Lake, Apache Iceberg, or Hudi.
Integrated with Trino, Athena, Databricks SQL, or Glue/Unity Catalog.
Defined schema evolution, ACID compliance, partitioning strategy, time travel, and cost-performance optimization.
Managed metadata, lineage, and catalog governance.
Equivalent experience (e.g., BigQuery-based warehouse with versioned schema management) will also be recognized.

【4. Data Modeling / Quality / Governance】

Built bronze/silver/gold data layer structures with dbt or equivalent.
Defined and enforced data quality SLAs (freshness, completeness, accuracy).
Experience with Great Expectations, DataHub, OpenMetadata, or Monte Carlo.
Implemented schema versioning, audit logging, and lineage tracking.
Designed and owned data access control and catalog taxonomy.

【5. Domain Understanding & Business Value】

Collaborated with product / analytics / AI teams to align platform design with business KPIs.
Quantified platform impact (e.g., ↓30% compute cost, ↑3× query performance).
Can explain how architecture decisions drive measurable business outcomes.

Preferred Qualifications

Experience working with terabyte or petabyte-scale datasets.
Expertise in data lake table formats such as Apache Iceberg or Delta Lake, combined with query engines such as Trino and catalogs such as Nessie.
Expertise in distributed processing frameworks like Spark, Flink, or Ray.
Expertise in workflow tools such as Airflow, Kedro, or Dagster.
Experience in analyzing, monitoring, and managing data quality.

Other (language qualifications, etc.)

【Highly appreciated】 English proficiency at business level; Japanese proficiency a plus.

Benefits

There are currently no comparable projects in the world that collect data and develop foundation models on such a large scale. This is one of Japan's leading national projects, supported by a substantial investment of 20.5 billion yen from NEDO.

This position will play a crucial role in determining the success of the project. You will have broad discretion and responsibility, and we are confident that, if successful, you will gain both a great sense of achievement and the opportunity to make a meaningful contribution to society.

Furthermore, we strongly encourage engineers to actively build their careers through this project—for example, by publishing research papers and engaging in academic activities.

●Work location

Tokyo Ryutsu Center A Bldg. AW4-5, 6-1-1 Heiwajima, Ota-ku, Tokyo, Japan

Data Center Customer Operations Engineer II

1 month ago

Shinagawa-ku, Tokyo Equinix ￥6,000,000 - ￥8,000,000 per year

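Equinix is a global digital infrastructure company. Digital leaders gather on our trusted platform to interconnect the foundational infrastructure that powers their business success. Equinix gives customers access to all the right places, partners, and possibilities they need to accelerate their business advantage. With Equinix, customers can scale with agility, speed up the launch of digital services, deliver world-class customer experiences, and multiply their value. · ...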
Data Engineer

2 days ago

Tokyo Denodo ￥9,000,000 - ￥12,000,000 per year

At Denodo, as part of the customer success organization and a member of the global office team · We are a company providing nursing care services for the elderly. · ...
Data Engineer

5 days ago

Tokyo Tenth Revolution Group

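A major company in the real estate × technology industry is hiring a data engineer. This position is responsible for data management across the entire group. · Design, development, and operation of the data platform · Building out data pipelines (ETL / BI / Reverse ETL) · Data governance and security · ...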
Data Platform Engineer

22 hours ago

Minato SB Intuitions ￥6,500,000 - ￥18,000,000

+About SB Intuitions+ · Just as automobiles and airplanes, telephones and the internet once did, generative AI is now poised to profoundly transform human activity. ...
Data Engineer

2 months ago

Tokyo IBM ￥900,000 - ￥1,200,000 per year

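While leading the implementation of data utilization platforms, you will work as a technical expert alongside IBM Sales and services teams, developing solutions built on IBM storage technology and providing technical support for IBM data utilization platforms. · Solution development and training development using IBM data utilization products · IBM data management software: technical Q&A support for DB2 and publishing product technical information · Understanding and organizing customers' business challenges and defining data-platform approaches that solve them with AI and data · Applying skills centered on IBM products to support proposals, concept planning, design, and de ...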
Data Engineer

2 days ago

Tokyo Michael Page

Azure/Databricks · Grow your career in a global environment · Job summary · Azure/Databricks Data Engineer · ...
Data Engineer

1 day ago

Tokyo Michael Page

Building a modern data platform with Azure Databricks · Grow your career in a global environment · ...
Data Engineer/Fintech

1 month ago

Tokyo BLOOMTECH, Inc ￥8,000,000 - ￥15,000,000 per year

Data Engineer/Fintech: a global environment with an organizational culture where data engineers take the lead · A Fintech/big data startup founded out of the University of Tokyo. · ...
Data Engineer_FullyRemote

1 day ago

Tokyo BLOOMTECH, Inc Remote job

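One of Japan's leading startups, where many non-Japanese employees are thriving, is looking for a fully remote, fully flexible data engineer. The annual salary range is ¥50,000,000 to ¥140,000,000. · ...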
Software Engineer(data

2 days ago

Tokyo BLOOMTECH, Inc ￥6,000,000 - ￥15,000,000 per year

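A listed company with strong business performance × comprehensive benefits · Flextime × remote work (fully remote negotiable) · Many non-Japanese engineers are thriving · ...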
Data Engineer_FullyRemote

2 days ago

Tokyo BLOOMTECH, Inc

Tokyo · One of Japan's leading startups · Many non-Japanese employees are thriving · Fully remote / full flextime · ...
Data Engineer

3 weeks ago

Ota AI Robot Association Full time ￥2,000,000 - ￥2,800,000 per year

AIRoA is launching a groundbreaking initiative: collecting one million hours of humanoid robot operation data with hundreds of robots. · The AI Robot Association (AIRoA) is looking for a Data Engineer to play a critical role in building the data backbone powering next-generation ...
Data Engineer

3 weeks ago

Ota AIRoA (AI Robot Association)

AIRoA is launching an initiative to collect one million hours of robot operation data and train powerful VLA models. The project involves building a shared "robot data ecosystem" where datasets and trained models are available to everyone. · ...
Data Engineer

3 weeks ago

Ota-ku AI Robot Association ￥2,800,000 per year

The AI Robot Association (AIRoA) is launching a groundbreaking initiative: collecting one million hours of humanoid robot operation data with hundreds of robots, and leveraging it to train the world's most powerful Vision-Language-Action (VLA) models. · ...
Software Engineer(data

1 day ago

Tokyo BLOOMTECH, Inc Remote job

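We encourage ways of working that let each individual perform at their best. We have introduced a monthly flextime system with no core hours, and members from a wide range of fields work in their own style, adjusting their hours around personal plans or family circumstances and making use of remote work. · ...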
REALITY App/Data Engineer

3 weeks ago

Roppongi, Minato-ku, Tokyo, GREE Group Metaverse Business ￥2,000,000 - ￥2,800,000 per year

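This is a REALITY App/Data Engineer position. · We launched a project to address our current challenges, but it has stalled due to a shortage of people and skills. For this position, we are looking for engineers who can drive the "improvement of the data analytics platform." · ...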
1107_Backend Engineer - Data Platform

1 month ago

Tokyo TIER IV ￥483,000 - ￥1,166,000

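You will be responsible for developing and operating the data platform that supports the collection, search, and analysis of data generated by autonomous driving. · ...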
1107_Backend Engineer - Data Platform

2 weeks ago

Tokyo TIER IV

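You will be responsible for developing and operating the data platform that supports the collection, search, and analysis of data generated by autonomous driving. · Developing and operating data pipelines that collect and process driving data from vehicles · Developing and operating APIs that provide driving history and statistics · ...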
CoDC Junior Data Center Engineer

3 weeks ago

Tokyo wards, 株式会社バイオス ￥3,500,000 - ￥4,000,000 per year

If you are interested in support work such as data center or help desk operations and have basic knowledge of IT infrastructure, please apply. · ...
Data Engineer (MiiTel Platform Div)/287512

2 weeks ago

Marunouchi, Chiyoda-ku, Tokyo, 株式会社はーとふるセゾン

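You can work end to end, from design through operation, in an environment that integrates and leverages large datasets on the scale of tens of millions to hundreds of millions of records. A key feature of the role is that you are not confined to one area: you will engage with a broad range of both technical and business domains. There is also substantial room for automation and efficiency gains, and you will have the discretion to drive operational improvements. The culture actively encourages introducing and evaluating new technologies and tools, making this an ideal environment for technically minded engineers. · ...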
Data Center Customer Operations Engineer III

2 months ago

Ota-ku, Tokyo Equinix $40,000 - $80,000 per year

Applies acquired job skills to work on tasks that are semi-routine in nature. Focus is on semi-routine tasks within standard operating procedures. Supports the overall team. · ...
