2026 · Active

LeRobot SO-101 Open-Source Arm

Building and training the flagship open-source SO-101 robotic arm in the Hugging Face LeRobot ecosystem using imitation learning.

LeRobot · Imitation Learning · Hardware

Hardware Assembly & Calibration

The SO-101 platform uses a leader/follower arm pair for human-in-the-loop data collection: an operator moves the leader arm and the follower arm mirrors it. We fully assembled both arms. Early on we hit hardware problems (a burnt servo and servo ID configuration issues), but with those resolved, the leader arm's teleoperation maps cleanly onto the follower arm and the setup passes all hardware calibration checks.

Edge Computing & Teleoperation

To keep the platform self-contained, we moved teleoperation and inference onto an NVIDIA Jetson flashed with JetPack 6.2. LeRobot runs natively on the Jetson, which synchronizes our two camera views (a side camera and a grip camera) in real time and writes joint-state telemetry directly into our dataset directories.
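To illustrate what synchronizing the camera views with joint-state telemetry involves, here is a minimal sketch that pairs each joint-state sample with the nearest frame from each camera by timestamp. It is not LeRobot's own synchronization code. The function names, the 20 ms tolerance, and the assumption that every stream provides a sorted, non-empty list of timestamps in seconds are ours, for illustration only.

```python
from bisect import bisect_left


def nearest_index(timestamps, t):
    """Return the index of the entry in sorted, non-empty `timestamps` closest to `t`."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Pick whichever neighbour is closer in time.
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1


def align_streams(joint_ts, side_ts, grip_ts, tolerance_s=0.02):
    """Pair each joint-state sample with the nearest side and grip frames.

    Returns (joint_index, side_index, grip_index) tuples. A joint sample is
    dropped when either camera has no frame within `tolerance_s` seconds.
    """
    aligned = []
    for j, t in enumerate(joint_ts):
        s = nearest_index(side_ts, t)
        g = nearest_index(grip_ts, t)
        if abs(side_ts[s] - t) <= tolerance_s and abs(grip_ts[g] - t) <= tolerance_s:
            aligned.append((j, s, g))
    return aligned
```

Dropping unmatched samples, rather than reusing a stale frame, keeps every stored observation consistent across cameras at the cost of a few lost timesteps.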

Early Iterations & Lessons Learned

Our initial behavior cloning trials focused on a basic insertion task: placing a block precisely within taped boundaries. We recorded 100 teleoperated episodes, but the first trained models behaved erratically. The policy overfit to the demonstrations and froze when it went out of distribution (OOD): in any state that differed slightly from the training data, it had no useful action to take. Because we had verified that the hardware calibration was solid, we concluded that the cause was insufficient dataset diversity. After broadening our training sweeps and adding variability to the data, the arm now completes the insertion task autonomously using an ACT policy.
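One way to make dataset diversity deliberate rather than accidental is to plan the block's starting position for each episode before recording. The sketch below spreads start positions over a grid of workspace cells so that no region is left out. It is a hypothetical planning helper, not part of our recorded protocol or of LeRobot, and the workspace dimensions, grid size, and function name are assumptions.

```python
import random


def plan_start_positions(n_episodes, x_range, y_range, grid=(5, 4), seed=0):
    """Return n_episodes (x, y) start positions spread evenly over a grid.

    The workspace is split into grid[0] x grid[1] cells. Each pass visits
    every cell once in random order and jitters the position inside the cell.
    """
    rng = random.Random(seed)  # fixed seed so the plan is reproducible
    cols, rows = grid
    cell_w = (x_range[1] - x_range[0]) / cols
    cell_h = (y_range[1] - y_range[0]) / rows
    cells = [(c, r) for c in range(cols) for r in range(rows)]

    plan = []
    while len(plan) < n_episodes:
        rng.shuffle(cells)
        for c, r in cells:
            if len(plan) == n_episodes:
                break
            x = x_range[0] + (c + rng.random()) * cell_w
            y = y_range[0] + (r + rng.random()) * cell_h
            plan.append((round(x, 3), round(y, 3)))
    return plan


# Hypothetical 30 cm x 20 cm workspace, in metres.
plan = plan_start_positions(100, x_range=(0.0, 0.3), y_range=(0.0, 0.2))
```

The operator then places the block at each planned position in turn, which avoids the natural tendency to reset the scene to the same spot every time.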

Current Bottlenecks & Next Steps

With the data pipeline proven, our current work focuses on faster on-device training and on more capable imitation-learning models:

  • On-Device Training Optimizations: Training ACT directly on the Jetson is currently too slow to be practical; even a 1,000-step run is extremely slow. We need to find training settings that suit the edge hardware.
  • Triangulating Vision Data: To reduce future OOD failures, we are setting up a third camera angle and expanding data collection to increase dataset diversity.
  • Transitioning to SmolVLA: With ACT working, we are researching Hugging Face's SmolVLA architecture as a way to improve generalizability.

Devlog

May 30, 2026

Tuning temporal ensembling: 20% → 50% success rate

Changing the temporal ensembling coefficient to 0.01 had an immediate and dramatic effect. The robot's success rate jumped from around 20% to 50% on the block grab task. Motion is noticeably smoother and the arm no longer overcorrects between action chunks. This is the single biggest improvement we've seen from a parameter change.
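For context, here is a minimal sketch of the weighting behind temporal ensembling, assuming the exponential scheme described in the ACT paper: the predictions that overlapping action chunks make for the current timestep are averaged with weights w_i = exp(-m * i), where i = 0 is the oldest prediction and m is the coefficient. The function names are ours for illustration and are not LeRobot's API.

```python
import math


def ensemble_weights(n, coeff=0.01):
    """Normalized weights w_i = exp(-coeff * i); index 0 is the OLDEST prediction."""
    raw = [math.exp(-coeff * i) for i in range(n)]
    total = sum(raw)
    return [w / total for w in raw]


def ensemble_action(predictions, coeff=0.01):
    """Blend overlapping predictions for a single timestep.

    predictions: list of action vectors (lists of floats), oldest first,
    one per action chunk that covers this timestep.
    """
    weights = ensemble_weights(len(predictions), coeff)
    dims = len(predictions[0])
    return [
        sum(w * p[d] for w, p in zip(weights, predictions))
        for d in range(dims)
    ]
```

With a coefficient of 0.01 the weights decay very slowly, so for a handful of overlapping predictions the blend is close to a plain average. That averaging is what damps the jumps between consecutive chunks.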

May 29, 2026

New cart setup and temporal averaging experiments planned

Thavin is building a new, more versatile setup mounted on a cart, with improved camera positioning for better image acquisition. We still need to establish a recording protocol for this new configuration. Tomorrow (May 30) we'll run rollouts with different values of the temporal averaging parameter (ACT's temporal ensembling coefficient). We expect improvements in both smoothness and success rate.

May 25, 2026

First rollout on 150-episode model: too fast, claw timing off

Tested the policy trained on 150 episodes of a single block grab. Two clear failure modes: the robot moves too fast, and the claw doesn't fully close before lifting. The arm grabs and immediately pulls up before the fingers have secured the block. Next recording session will focus on slower, more deliberate movements and spending more time holding the block against the surface before rising.

May 22, 2026

150 episodes recorded, ACT training complete

Thavin recorded 150 episodes of the blue block grab task. Training with ACT is complete (act_red_blue_lr1e-05). Attempting inference today to evaluate the model qualitatively. If performance looks promising, we'll move to quantitative logging with a defined success protocol.
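As a starting point for quantitative logging, the sketch below summarizes a batch of rollouts as a success rate with a 95% Wilson score interval, which makes it clear how much a result from a small number of trials can be trusted. The function names and the 10-rollout example are hypothetical and do not reflect our actual protocol or results.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def summarize(outcomes):
    """outcomes: list of booleans, one per rollout (True = task succeeded)."""
    trials = len(outcomes)
    successes = sum(outcomes)
    low, high = wilson_interval(successes, trials)
    return {
        "trials": trials,
        "successes": successes,
        "rate": successes / trials,
        "ci95": (low, high),
    }


# Hypothetical batch: 5 successes out of 10 rollouts.
report = summarize([True] * 5 + [False] * 5)
```

With only ten rollouts the interval is wide, so comparing two policies reliably needs either more trials per policy or a large gap between their success rates.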

Week of March 29, 2026

Jetson training bottleneck: 1000 steps takes too long

Training ACT directly on the Jetson is not viable at this scale: even a 1,000-step run is extremely slow. Next steps: find optimal on-device training conditions and set up a third camera to increase dataset diversity and reduce future out-of-distribution failures.
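To put a number on the bottleneck, a short timing run can be extrapolated to a full training run. The sketch below takes per-step durations measured by hand, skips the first few warm-up steps, and scales the median step time to the target step count. The helper names and the durations in the example are hypothetical, not measurements from our Jetson.

```python
from statistics import median


def estimate_training_time(step_durations_s, total_steps, warmup_steps=5):
    """Extrapolate total training time in seconds from measured step durations.

    step_durations_s must be non-empty. The first warmup_steps are ignored
    when enough samples exist, since early steps are often slower.
    """
    steady = step_durations_s[warmup_steps:] or step_durations_s
    return median(steady) * total_steps


def format_duration(seconds):
    """Render a duration in seconds as 'Hh MMm SSs'."""
    hours, rem = divmod(int(round(seconds)), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


# Hypothetical timings: two slow warm-up steps, then 2 s per step.
eta = estimate_training_time([5.0, 5.0, 2.0, 2.0, 2.0], 1000, warmup_steps=2)
print(format_duration(eta))
```

Running the same estimate after each change (batch size, image resolution, data loading) gives a quick way to compare training conditions without waiting for a full run.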

Week of March 22, 2026

ACT successfully trained on tape-boundary task

Training ACT on the task of placing a block within a taped boundary is working. The model completes the task autonomously. Next: set up the third camera angle and begin investigating SmolVLA as a potential upgrade in generalizability.

Week of March 15, 2026

Jetson setup complete, first training run, dataset diversity issues identified

Transferred primary teleoperation and inference to the Jetson (JetPack 6.2). LeRobot commands run natively. New camera setup established: side camera + grip camera. Recorded 100 episodes of placing a block within tape boundaries. First ACT training run showed erratic performance. The model overfits and goes out of distribution whenever the scene deviates slightly from what it saw in training. One model plateaued after collapsing to a constant output for the wrist roll motor. Verdict: hardware and calibration are solid. The issue is dataset diversity. Need broader training sweeps and more varied starting positions.
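A collapsed output like the constant wrist roll can be caught automatically by checking how much each joint varies, either in the recorded demonstrations or in the policy's predicted actions during a rollout. The sketch below flags joints whose standard deviation falls under a threshold. The function name, the threshold, and the example values are illustrative assumptions.

```python
from statistics import pstdev


def constant_joints(actions, joint_names, min_std=1e-3):
    """Return the names of joints whose values barely change.

    actions: list of per-timestep action vectors, one float per joint,
    in the same order as joint_names.
    """
    flagged = []
    for d, name in enumerate(joint_names):
        values = [step[d] for step in actions]
        if pstdev(values) < min_std:
            flagged.append(name)
    return flagged


# Illustrative data: the second joint never moves.
names = ["shoulder_pan", "wrist_roll"]
actions = [[0.1, 5.0], [0.2, 5.0], [0.3, 5.0]]
print(constant_joints(actions, names))
```

If a joint is flagged in the policy's output but not in the demonstrations, the model has collapsed on that joint. If it is flagged in both, the dataset itself never exercises that joint and needs more varied episodes.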