Unitree G1 - Phase 1: Baseline Locomotion (Frozen Arms)
This checkpoint contains a trained policy for the Unitree G1 humanoid robot that targets stable bipedal locomotion with the arms frozen.
Model Description
This is a PPO-trained locomotion policy for the Unitree G1 29-DOF humanoid robot, trained in NVIDIA Isaac Gym. Phase 1 focuses on establishing baseline walking behavior with arms held in a fixed position, allowing the legs and torso to learn stable locomotion patterns.
- Training Framework: Isaac Gym (GPU-accelerated physics simulation)
- Algorithm: Proximal Policy Optimization (PPO)
- Robot: Unitree G1 (29 DOF humanoid)
- Policy Type: Actor-Critic with continuous actions
Training Details
Configuration
- DOF: 29 (12 legs + 3 waist + 14 arms)
- Active DOF: 15 (legs + waist only, arms frozen)
- Parallel Environments: 4096
- Training Device: NVIDIA GeForce RTX 4080 SUPER
- Total Iterations: 5,000
- Training Time: ~1.2 hours
- Training Speed: 4,253 iterations/hour (~89,000-94,000 steps/second)
Training Command
python train_rl.py task=g1_curriculum_phase1 headless=true
Hyperparameters
- Learning Rate: Default PPO settings
- Batch Size: 4096 environments (98,304 transitions per iteration at the 24-step horizon)
- Horizon Length: 24 steps
- Discount Factor (gamma): 0.99
- GAE Lambda: 0.95
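The discount factor and GAE lambda listed above control how advantages are estimated from each 24-step rollout. The following is a minimal sketch of Generalized Advantage Estimation in plain Python, not the training framework's actual implementation. The function name `compute_gae` and its argument layout are assumptions made for illustration; only `gamma = 0.99`, `lam = 0.95`, and the horizon of 24 come from this card.

```python
from typing import List, Sequence


def compute_gae(
    rewards: Sequence[float],
    values: Sequence[float],
    last_value: float,
    dones: Sequence[bool],
    gamma: float = 0.99,  # discount factor from the list above
    lam: float = 0.95,    # GAE lambda from the list above
) -> List[float]:
    """Return GAE advantages for a single environment's rollout.

    dones[t] is True when the episode ended after step t.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    # Walk the rollout backwards so each step reuses the later estimate.
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error; bootstrapping stops at episode boundaries.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages


# Toy rollout: 24 steps, constant reward of 1.0, critic predicting 0 everywhere.
horizon = 24
advantages = compute_gae([1.0] * horizon, [0.0] * horizon, 0.0, [False] * horizon)
```

With these settings each step's advantage is a sum of later TD errors weighted by powers of `gamma * lam = 0.9405`, so rewards more than a few dozen steps ahead contribute little.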
Reward Design
- Alive Reward: 5.0 (primary focus on survival and stability)
- Arm Tracking: 0.0 (arms frozen in default position)
- Base Height: Maintains upright posture
- Orientation: Penalizes tilting
- Linear Velocity Tracking: Encourages forward motion
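As a rough illustration of how these terms could combine into a per-step reward, here is a hedged sketch. Only the alive weight (5.0) and the arm tracking weight (0.0) come from this card. Every other weight, the target height, the tracking kernel, and the function name `step_reward` are hypothetical placeholders, and the actual task may scale or shape the terms differently.

```python
import math

# Weights stated in the list above.
ALIVE_WEIGHT = 5.0
ARM_TRACKING_WEIGHT = 0.0

# Hypothetical values: this card does not state them.
BASE_HEIGHT_WEIGHT = -10.0
ORIENTATION_WEIGHT = -1.0
LIN_VEL_TRACKING_WEIGHT = 1.0
TARGET_BASE_HEIGHT = 0.78  # metres, assumed
TRACKING_SIGMA = 0.25      # width of the velocity tracking kernel, assumed


def step_reward(base_height, projected_gravity_xy, forward_vel, commanded_vel, arm_error):
    """Weighted sum of the reward terms for one non-terminated step."""
    alive = 1.0  # assumed: constant bonus while the robot has not fallen
    # Squared deviation from the assumed target height.
    height_penalty = (base_height - TARGET_BASE_HEIGHT) ** 2
    # Gravity projected into the base frame has zero x/y when upright.
    gx, gy = projected_gravity_xy
    orientation_penalty = gx * gx + gy * gy
    # Exponential kernel: 1.0 when the commanded velocity is matched exactly.
    velocity_tracking = math.exp(-((commanded_vel - forward_vel) ** 2) / TRACKING_SIGMA)
    return (
        ALIVE_WEIGHT * alive
        + ARM_TRACKING_WEIGHT * arm_error
        + BASE_HEIGHT_WEIGHT * height_penalty
        + ORIENTATION_WEIGHT * orientation_penalty
        + LIN_VEL_TRACKING_WEIGHT * velocity_tracking
    )


upright = step_reward(0.78, (0.0, 0.0), 0.5, 0.5, arm_error=0.3)
tilted = step_reward(0.78, (0.3, 0.0), 0.5, 0.5, arm_error=0.3)
```

Because the arm tracking weight is 0.0, the `arm_error` argument has no effect on the result, which mirrors the frozen-arm setup of Phase 1.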
Performance Metrics
| Metric | Value |
|---|---|
| Mean Reward | 69.46 |
| Episode Length | 836.7 steps (~84% of the 1000-step maximum) |
| Alive Reward | 3.9894 |
| Noise Std | 3.78 |
| Checkpoint Iteration | 5000 |
Training Progress
- Initial mean reward: ~0.0 (random policy)
- Final mean reward: 69.46 at iteration 5000 (steady improvement)
- Mean episode length reached 836.7 of a maximum 1000 steps
- Training converged smoothly without reward collapse
Validation
- ✅ Isaac Gym Playback: Robot walks stably with smooth forward locomotion and frozen arms
- ✅ No Falls: No instability observed during policy rollout
- ⏳ MuJoCo Sim2Sim: Validation pending
Usage
Prerequisites
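pip install torch
# Isaac Gym is not published on PyPI. Download the Isaac Gym Preview package from NVIDIA,
# then install it from the package's python/ directory:
pip install -e .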
Load and Run Policy
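import torch

# Load the checkpoint onto the CPU so it can be inspected without a GPU
checkpoint = torch.load("model_5000.pt", map_location="cpu")

# Inspect the top-level keys first; this assumes the checkpoint is a dict
print(list(checkpoint.keys()))

# Extract the policy weights
policy = checkpoint['model']  # Adjust key based on checkpoint structure

# Use policy for inference
# (Requires Isaac Gym environment setup)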
Observation Space
- Base linear velocity
- Base angular velocity
- Projected gravity
- Joint positions
- Joint velocities
- Previous actions
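To make the layout concrete, the sketch below concatenates the six components into one flat vector. It assumes the listed order, no per-term scaling, and that joint positions, joint velocities, and previous actions each cover all 29 DOF. None of that is confirmed by this card, and the helper `build_observation` is hypothetical. Under those assumptions the vector has 3 + 3 + 3 + 29 + 29 + 29 = 96 entries.

```python
from typing import List, Sequence

NUM_DOF = 29  # from the configuration in this card


def build_observation(
    base_lin_vel: Sequence[float],
    base_ang_vel: Sequence[float],
    projected_gravity: Sequence[float],
    joint_pos: Sequence[float],
    joint_vel: Sequence[float],
    prev_actions: Sequence[float],
) -> List[float]:
    """Concatenate the observation terms in the order listed above (assumed)."""
    parts = [base_lin_vel, base_ang_vel, projected_gravity,
             joint_pos, joint_vel, prev_actions]
    expected_sizes = [3, 3, 3, NUM_DOF, NUM_DOF, NUM_DOF]
    # Fail early if any term has the wrong length.
    for part, size in zip(parts, expected_sizes):
        if len(part) != size:
            raise ValueError(f"expected {size} values, got {len(part)}")
    obs: List[float] = []
    for part in parts:
        obs.extend(float(x) for x in part)
    return obs


# Robot standing still and upright: gravity points along -z in the base frame.
obs = build_observation(
    [0.0] * 3, [0.0] * 3, [0.0, 0.0, -1.0],
    [0.0] * NUM_DOF, [0.0] * NUM_DOF, [0.0] * NUM_DOF,
)
```

If the real environment also appends velocity commands or applies scaling factors, the length and values will differ, so check the environment's observation size before feeding this to the policy.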
Action Space
- 29 continuous actions (normalized joint position targets)
- Only 15 DOF are active (legs + waist)
- Arms receive zero-gain actions (effectively frozen)
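One way to picture the zero-gain behaviour is a per-joint action gain that is non-zero for the 15 active joints and zero for the 14 arm joints. The sketch below is an illustration under assumptions, not the environment's code: the joint ordering (active joints first), the gain value `0.25`, the clipping range, and the helper `joint_targets` are all hypothetical.

```python
from typing import List, Sequence

NUM_DOF = 29
NUM_ACTIVE = 15  # 12 leg + 3 waist joints

# Assumption: the active joints occupy the first 15 indices.
# Hypothetical gain of 0.25 rad per unit action for active joints, 0.0 for arms.
ACTION_GAIN = [0.25] * NUM_ACTIVE + [0.0] * (NUM_DOF - NUM_ACTIVE)


def joint_targets(actions: Sequence[float], default_pos: Sequence[float]) -> List[float]:
    """Map normalized actions to joint position targets around the default pose."""
    if len(actions) != NUM_DOF or len(default_pos) != NUM_DOF:
        raise ValueError(f"expected {NUM_DOF} values per argument")
    targets = []
    for action, q0, gain in zip(actions, default_pos, ACTION_GAIN):
        # Clip to the assumed normalized range before scaling.
        clipped = max(-1.0, min(1.0, action))
        # A zero gain leaves the target at the default position.
        targets.append(q0 + gain * clipped)
    return targets


# Full-scale action on every joint: only the legs and waist move.
targets = joint_targets([1.0] * NUM_DOF, [0.1] * NUM_DOF)
```

The policy still outputs 29 values, which keeps the network's action dimension unchanged when arm control is introduced in a later phase; only the gains would need to change.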
Limitations
- Arms are frozen in default position (no manipulation capability)
- Trained only for forward locomotion on flat terrain
- No arm tracking or upper-body control
- Sim-to-real transfer not yet validated
Next Steps
This checkpoint serves as the foundation for Phase 2 (Arm Awakening), where arm control is gradually introduced while maintaining locomotion stability.
Citation
If you use this model, please cite:
@misc{unitree_g1_phase1,
title={Unitree G1 Phase 1: Baseline Locomotion},
author={PathonAI},
year={2025},
howpublished={\url{https://huggingface.co/[your-username]/unitree-g1-phase1}},
}
Training Logs
- Run ID: Dec12_16-19-24_
- TensorBoard Logs: Available in training repository
- Full Training Log: See EXPERIMENTS_LOG.md in repository
License
MIT License - See repository for full license details