Data Engineer

2 months ago

Ota AI Robot Association Full time

About AIRoA

The AI Robot Association (AIRoA) is launching a groundbreaking initiative: collecting one million hours of humanoid robot operation data with hundreds of robots, and leveraging it to train the world's most powerful Vision-Language-Action (VLA) models.

What makes AIRoA unique is not only the unprecedented scale of real-world data and humanoid platforms, but also our commitment to making everything open and accessible. We are building a shared "robot data ecosystem" where datasets, trained models, and benchmarks are available to everyone. Researchers around the world will be able to evaluate their models on standardized humanoid robots through our open evaluation platform.

For researchers, this means an opportunity to:

Work on fundamental challenges in robotics and AI: multimodal learning, tactile-rich manipulation, sim-to-real transfer, and large-scale benchmarking.
Access state-of-the-art infrastructure: hundreds of humanoid robots, GPU clusters, high-fidelity simulators, and a global-scale evaluation pipeline.
Collaborate with leading experts across academia and industry, and publish results that will shape the next decade of robotics.
Contribute to an initiative that will redefine the future of embodied AI—with all results made open to the world.

Key Responsibilities

You will play a critical role in building the data backbone powering next-generation robotics foundation models:

Design and implement large-scale data pipelines that cover the full lifecycle of high-quality datasets for robotics foundation models—collection, processing, curation, and publishing.
Design, build, and maintain data schemas, storage solutions, and query interfaces to enable VLA researchers to efficiently discover, query, and consume curated datasets.
Collaborate closely with VLA researchers to capture evolving data requirements and continuously improve data pipelines through analysis and experimentation.
Design and scale distributed data-processing pipelines capable of handling petabyte-scale multimodal datasets (e.g., RGB/Depth, point clouds) with full lineage and reproducibility.
Define data-quality metrics and build feedback loops to continuously monitor and improve data quality.

Required Qualifications

【1. Academic & Professional】

Master's degree in Computer Science, Engineering, or related field (or equivalent practical experience).
5+ years of professional experience in data engineering / data platform development.
Proven record of delivering production-grade, distributed data systems.

【2. ETL / Distributed Data Processing】

3+ years designing and operating large-scale ETL / ELT pipelines using Spark, Flink, Ray, or similar distributed engines.
Hands-on experience using orchestration tools (Airflow, Kedro, Dagster) and designing pipelines.
Proven track record of optimizing workloads at 10TB+/day scale.

【3. Lakehouse / Storage Architecture】

Designed or led implementations using Delta Lake, Apache Iceberg, or Hudi.
Integrated with Trino, Athena, Databricks SQL, or Glue/Unity Catalog.
Defined schema evolution, ACID compliance, partitioning strategy, time travel, and cost-performance optimization.
Managed metadata, lineage, and catalog governance.
Equivalent experience (e.g., BigQuery-based warehouse with versioned schema management) will also be recognized.

【4. Data Modeling / Quality / Governance】

Built bronze/silver/gold data layer structures with dbt or equivalent.
Defined and enforced data quality SLAs (freshness, completeness, accuracy).
Experience with Great Expectations, DataHub, OpenMetadata, or Monte Carlo.
Implemented schema versioning, audit logging, and lineage tracking.
Designed and owned data access control and catalog taxonomy.

【5. Domain Understanding & Business Value】

Collaborated with product / analytics / AI teams to align platform design with business KPIs.
Quantified platform impact (e.g., ↓30% compute cost, ↑3× query performance).
Can explain how architecture decisions drive measurable business outcomes.

Preferred Qualifications

Experience working with terabyte- or petabyte-scale datasets.
Expertise in data lake storage systems such as Apache Iceberg or Delta Lake with query systems such as Trino and catalog systems such as Nessie.
Expertise in distributed processing frameworks like Spark, Flink, or Ray.
Expertise in workflow tools such as Airflow, Kedro, or Dagster.
Experience in analyzing, monitoring, and managing data quality.

Others (language qualifications, etc.)

【Highly appreciated】 English proficiency at business level; Japanese proficiency a plus.

There are currently no comparable projects in the world that collect data and develop foundation models on such a large scale. This is one of Japan's leading national projects, supported by a substantial investment of 20.5 billion yen from NEDO.

This position will play a crucial role in determining the success of the project. You will have broad discretion and responsibility, and we are confident that, if successful, you will gain both a great sense of achievement and the opportunity to make a meaningful contribution to society.

Furthermore, we strongly encourage engineers to actively build their careers through this project—for example, by publishing research papers and engaging in academic activities.

●Work location

Tokyo Ryutsu Center A Bldg. AW4-5, 6-1-1 Heiwajima, Ota-ku, Tokyo, Japan

Data Engineer

2 weeks ago

Sakuragaokacho, Shibuya-ku, Tokyo, LegalOn Technologies, Inc. ￥77,000,000 - ￥107,000,000 per year

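We are a global legal AI company that combines advanced AI expertise with specialized knowledge of law and contracts. Since our founding in 2017, we have focused on developing AI-powered legal services and have raised a cumulative total of approximately ¥28.6 billion across funding rounds. · Take the lead in selecting data engineering technologies and in reviewing design and implementation deliverables · ...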
Data Engineer

4 weeks ago

Tokyo BLOOMTECH, Inc

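With the mission of "unleashing the potential of the manufacturing industry," we offer a data platform product for manufacturers. · Our product, launched in 2022, uses machine learning and other technologies to structure drawing data, often called the most important data in manufacturing, and links it with diverse information so that it can be used as an information asset. It is already used by customers ranging from major domestic manufacturers to machining companies and is growing rapidly. In 2023 we began selling overseas (United States, Thailand, Vietnam) and are accelerating our global expansion. · ...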
Data Engineer

1 month ago

Tokyo Tenth Revolution Group

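A major company in the real estate × technology sector is hiring a Data Engineer. This position is responsible for data management across the entire group. · Design, development, and operation of the data platform · Building out data pipelines (ETL / BI / Reverse ETL) · Data governance and security · ...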
Data Engineer

1 month ago

Tokyo Michael Page

Azure / Databricks: grow your career in a global environment · Job summary · Azure / Databricks Data Engineer · ...
Data Engineer

1 month ago

Tokyo Denodo ￥9,000,000 - ￥12,000,000 per year

At Denodo, as part of the Customer Success organization, as part of the global office team · A company that provides nursing-care services for the elderly. · ...
Data Engineer

1 month ago

Tokyo Michael Page

Build a modern data platform with Azure Databricks · Grow your career in a global environment · ...
Data Platform Engineer

1 month ago

Minato SB Intuitions ￥6,500,000 - ￥18,000,000

+About SB Intuitions+ · Just as the automobile, the airplane, the telephone, and the internet once did, generative AI is now poised to profoundly transform human activity. ...
Senior Data Engineer

4 weeks ago

Minato AXA Japan

Job description · Design and implement scalable, production-grade data platform infrastructure components using AWS, Databricks, and modern data stack architecture patterns. · ...
Software Engineer(data

1 month ago

Tokyo BLOOMTECH, Inc ￥6,000,000 - ￥15,000,000 per year

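Listed company with strong business performance × comprehensive benefits · Flextime × remote work (fully remote negotiable) · Many international engineers on the team · ...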
Data Engineer_FullyRemote

1 month ago

Tokyo BLOOMTECH, Inc

One of Japan's leading startups · Many international engineers on the team · Fully remote / full flextime · ...
Data Engineer_FullyRemote

1 month ago

Tokyo BLOOMTECH, Inc Remote job

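We are looking for a data engineer to work fully remote with full flextime at one of Japan's leading startups, where many international members are active. The annual salary range is ¥50 million to ¥140 million. · ...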
Software Engineer(data

1 week ago

Tokyo BLOOMTECH, Inc

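Listed company with strong business performance × comprehensive benefits · Flextime × remote work (fully remote negotiable) · Many international engineers on the team · ...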
1195_Senior Data Engineer / Data Platform Architect

3 weeks ago

Kitashinagawa, Shinagawa-ku, Tokyo, TIER IV, Inc.

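To realize the "democratization of autonomous driving," TIER IV is building a large-scale data platform that makes autonomous driving system development more efficient and raises its quality. The foundations for collecting data from vehicles and for a search system are already being built, but it is now urgent to handle growing data volumes, implement highly real-time stream processing, and foster a culture of data use across the company. · ...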
Software Engineer(data

1 month ago

Tokyo BLOOMTECH, Inc Remote job

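We encourage ways of working that let each individual perform at their best. Under our monthly flextime system with no core hours, members from many different fields work in their own style, adjusting their hours around personal plans or family circumstances and making use of remote work. · ...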
AI Data Platform Engineer

4 weeks ago

Iwamotocho, Chiyoda-ku, Tokyo, Apto

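Apto provides, as a product, the data generation and annotation platform that is essential to AI model development. · Having moved beyond the PoC phase, we are seeing production use, a rapid increase in data volume, and expanding use cases all at once, and the existing design is beginning to show its limits at scale · ...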
1195_Senior Data Engineer / Data Platform Architect

3 weeks ago

Tokyo TIER IV ￥628,000 - ￥1,232,667

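To realize the "democratization of autonomous driving," TIER IV is building a large-scale data platform that makes autonomous driving system development more efficient and raises its quality. The foundations for collecting data from vehicles and for a search system are already being built, but it is now urgent to handle growing data volumes, implement highly real-time stream processing, and foster a culture of data use across the company. We are therefore looking for an engineer/architect who can lead company-wide data strategy from a mid- to long-term perspective, design and operate the Data Lake and stream processing, and help build an organizational culture of data-driven decision-making. · ...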
AI Data Research Engineer

3 weeks ago

Iwamotocho, Chiyoda-ku, Tokyo, Apto

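We need a core engineer for the "data design and evaluation loop" that determines AI model performance. We want to welcome, as a core R&D member, an engineer who can raise model performance from the data side. · ...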
Data Engineer/D777

4 weeks ago

Asakusabashi, Taito-ku, Tokyo, TC Career Co., Ltd.

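The Data Engineer organizes internal data and builds, maintains, and operates the systems and infrastructure that let the business put it to use. Beyond building data-collection pipelines, we expect you to take the lead in promoting the use of CADDi's data. · ...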
【ISE】Data Engineer

4 weeks ago

Chiba IBM

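You will have many opportunities to collaborate across departments with ISE and IBM specialists to create new solutions that combine advanced technologies, broadening your range as a data engineer. · ...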
Data Engineer_fully remote

4 weeks ago

Greater Tokyo Area BLOOMTECH, Inc Remote job

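A huge volume of online and offline data is generated every day, and a key theme is properly assessing the value of the accumulated data and determining how it can be used to create, transform, and grow the business. · This is one of Japan's largest job-change services, with approximately 9.34 million registered members (as of the end of December 2024). · In this role, you will be responsible for the following tasks as a data engineer for the job-change service. · ...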
Data Engineer(MiiTel Platform Div)

3 weeks ago

Marunouchi, Chiyoda-ku, Tokyo, RevComm Inc.

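You can be involved end to end, from design through operation, in an environment where the challenge is integrating and using large datasets of tens of millions to hundreds of millions of records. A key feature of the role is that it is not confined to one area: you will engage with a wide range of both technical and business domains. · ...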