WorldString: Actionable World Representation

WorldString is a neural, interactive digital twin for skinned, articulable, and soft objects. It takes a keypoint-based state as input and outputs a 3D point cloud, and it models the state manifold of real-world objects by learning directly from point clouds or RGB-D video streams.

Links: Paper (arXiv) · Project Page · Data Generation · Code (this repo)

Release Plan

Milestones:

  • Project page
  • Data generation code (sim and real world)
  • Checkpoints and visualization demo (XHand, Unitree Go2, Unitree H1)
  • Open-source training data
  • Training code and model

Local demo preview

Local checkpoint visualization (python demo_worldstring.py): drag the joint sliders to compare the simulator ground truth (left) against the neural prediction (right) for XHand, Go2, and H1 on a single page.

Auto-looping preview of the unified Gradio demo. Full recording (MP4)


Installation (Local Demo)

The instructions below set up a Conda environment for checkpoint-based interactive visualization. No training data download is required for the demo.

Supported robots

| Robot | Simulator | Checkpoint | Demo tab |
| --- | --- | --- | --- |
| XHand Right | PyBullet | ckpts/xhand_right/xhand_best.pth | XHand Right |
| Unitree Go2 | MuJoCo | ckpts/go2/go2_best.pth | Unitree Go2 |
| Unitree H1 | MuJoCo | ckpts/h1/h1_best.pth | Unitree H1 |

1. Prerequisites

  • OS: Linux recommended (Ubuntu 20.04+). macOS may work for the CPU-only demo; Windows is not tested.
  • GPU (optional): NVIDIA GPU with CUDA for faster inference. CPU-only works, but the first render is slow.
  • Conda: Miniconda or Anaconda.
  • Git: to clone this repository.

2. Clone the repository

git clone git@github.com:MaureenZOU/worldstring.git
cd worldstring

3. Create and activate the Conda environment

We recommend Python 3.11:

conda create -n worldstring python=3.11 -y
conda activate worldstring

4. Install PyTorch

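With an NVIDIA GPU (recommended). The cu130 index below installs the CUDA 13.0 build of PyTorch; on a CUDA 12.x setup, substitute the index name that matches your CUDA version: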

pip install torch --index-url https://download.pytorch.org/whl/cu130

Verify:

python -c "import torch; print(torch.__version__, 'cuda:', torch.cuda.is_available())"

5. Install demo dependencies

pip install \
  gradio \
  numpy \
  scipy \
  pyyaml \
  plotly \
  open3d \
  pybullet \
  mujoco \
  trimesh

Versions tested locally:

| Package | Version |
| --- | --- |
| Python | 3.11 |
| torch | 2.12 |
| gradio | 6.16 |
| mujoco | 3.9 |
| pybullet | 3.2 |
| open3d | 0.19 |
| plotly | 6.8 |
| trimesh | 4.12 |
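
To compare your environment against the tested versions, a small helper like the one below can print what is actually installed. This is a sketch that is not part of the repository; `installed_versions` and `PACKAGES` are names chosen here for illustration, and the package names are taken from the pip commands above.

```python
from importlib import metadata

# Distribution names as used in the pip install commands above.
PACKAGES = ["torch", "gradio", "mujoco", "pybullet", "open3d", "plotly", "trimesh"]


def installed_versions(names):
    """Return {name: version string, or None if the package is not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:10s} {version or 'not installed'}")
```

Run it inside the activated worldstring environment; any line reading "not installed" points to a missing dependency.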

Note: flash_attn is optional. If it is not installed, the model automatically falls back to PyTorch scaled dot-product attention.
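
The fallback described in the note follows a common optional-dependency pattern. The sketch below only illustrates that pattern; it is not the repository's code, and `pick_attention_backend` is a hypothetical helper name. It checks whether flash_attn can be imported, without actually importing it.

```python
import importlib.util


def pick_attention_backend():
    """Return "flash_attn" if the package is importable, otherwise "sdpa".

    "sdpa" stands for torch.nn.functional.scaled_dot_product_attention,
    the PyTorch built-in that the note above names as the fallback.
    """
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attn"
    return "sdpa"


if __name__ == "__main__":
    print("attention backend:", pick_attention_backend())
```

Running it in the demo environment shows which path to expect before you launch the demo.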

6. Prepare checkpoints and demo assets

Ensure the following layout exists under ckpts/:

ckpts/
├── xhand_right/
│   ├── xhand_best.pth
│   ├── config.yaml
│   └── demo/current_pose/          # created at runtime; pose_current.da written here
├── go2/
│   ├── go2_best.pth
│   ├── config.yaml
│   └── demo/current_pose/
│       ├── go2_init_state.da       # reference init frame for keypoint binding
│       ├── init_world_min_max.json # normalization bounds for keypoints / inference
│       └── pose_joint_state_init.json
└── h1/
    ├── h1_best.pth
    ├── config.yaml
    └── demo/current_pose/
        ├── h1_init_state.da
        ├── init_world_min_max.json
        └── pose_joint_state_init.json

Robot meshes / URDFs are already under assets/:

assets/
├── xhand_right/urdf/xhand_right.urdf
├── go2/go2.xml
└── unitree_h1/h1.xml

If checkpoint files are distributed separately (e.g. Google Drive / Hugging Face), download them into the paths above before launching the demo.
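
Before launching, it can help to confirm that the layout is complete. The following is a minimal sketch, not a script shipped with the repository; `missing_files` and `REQUIRED` are illustrative names. The file list mirrors the ckpts/ tree above, and the XHand demo/current_pose/ directory is omitted because it is created at runtime.

```python
from pathlib import Path

# Files per robot directory, copied from the ckpts/ layout above.
POSE_DIR = "demo/current_pose"
REQUIRED = {
    "xhand_right": ["xhand_best.pth", "config.yaml"],
    "go2": [
        "go2_best.pth",
        "config.yaml",
        f"{POSE_DIR}/go2_init_state.da",
        f"{POSE_DIR}/init_world_min_max.json",
        f"{POSE_DIR}/pose_joint_state_init.json",
    ],
    "h1": [
        "h1_best.pth",
        "config.yaml",
        f"{POSE_DIR}/h1_init_state.da",
        f"{POSE_DIR}/init_world_min_max.json",
        f"{POSE_DIR}/pose_joint_state_init.json",
    ],
}


def missing_files(ckpt_root="ckpts"):
    """Return the required files (relative to ckpt_root) that do not exist."""
    root = Path(ckpt_root)
    return [
        f"{robot}/{rel}"
        for robot, rels in REQUIRED.items()
        for rel in rels
        if not (root / robot / rel).is_file()
    ]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing files under ckpts/:")
        for rel in missing:
            print("  ", rel)
    else:
        print("All required checkpoint files are present.")
```

Run it from the repository root so that the relative ckpts/ path resolves correctly.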

7. Run the unified demo (all three robots, one page)

From the repository root:

conda activate worldstring
python demo_worldstring.py

Open in your browser:

http://127.0.0.1:6040

Use the tabs XHand Right, Unitree Go2, and Unitree H1 to switch robots. Each tab has joint sliders on the left and two point-cloud views on the right:

  • Left plot: ground truth from the simulator mesh (PyBullet or MuJoCo)
  • Right plot: neural prediction from the loaded checkpoint

Click Submit after moving sliders, or Reset to Initial Pose to restore the default configuration.

First load: the page runs inference for all three robots once at startup and may take 1–3 minutes depending on GPU/CPU. Subsequent updates per tab are faster.

8. Run individual demos (optional)

| Script | Robot | URL |
| --- | --- | --- |
| python demo_xhand.py | XHand | http://127.0.0.1:6037 |
| python demo_go2_mujoco.py | Go2 | http://127.0.0.1:6038 |
| python demo_h1_mujoco.py | H1 | http://127.0.0.1:6039 |
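
If a page does not load, a quick check of whether anything is listening on the expected port can narrow down the problem. The sketch below is an illustration only; `is_listening` and `DEMO_PORTS` are names chosen here, and the port numbers are the ones listed in this README (6040 for the unified demo, 6037 to 6039 for the individual ones).

```python
import socket

# Ports as documented in this README.
DEMO_PORTS = {"unified": 6040, "xhand": 6037, "go2": 6038, "h1": 6039}


def is_listening(port, host="127.0.0.1", timeout=0.5):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    for name, port in DEMO_PORTS.items():
        state = "up" if is_listening(port) else "not reachable"
        print(f"{name:8s} http://127.0.0.1:{port}  {state}")
```

A port reported as "not reachable" usually means the corresponding script has not finished starting or was launched on a different port.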

9. Coordinate frames (visualization)

| Robot | GT & neural panels |
| --- | --- |
| XHand | Y-up normalized training frame |
| Go2 / H1 | MuJoCo Z-up world coordinates (inference output is denormalized with init_world_min_max.json) |

Model inputs are always normalized keypoints written to pose_current.da at runtime; only the displayed point clouds are transformed for consistent viewing.
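
The denormalization step for Go2 / H1 can be pictured as a per-axis min-max rescaling. The sketch below rests on two assumptions that this README does not confirm: that normalized coordinates lie in [0, 1] per axis, and that init_world_min_max.json stores the bounds under the hypothetical keys "min" and "max". Check the actual file and the repository code before relying on either.

```python
import json


def load_bounds(path):
    """Read per-axis bounds; the "min"/"max" key names are hypothetical."""
    with open(path) as f:
        data = json.load(f)
    return data["min"], data["max"]


def denormalize(points, world_min, world_max):
    """Map points from an assumed [0, 1] range per axis to world coordinates."""
    return [
        [lo + p * (hi - lo) for p, lo, hi in zip(point, world_min, world_max)]
        for point in points
    ]


if __name__ == "__main__":
    # Toy bounds, not values from the repository.
    world_min, world_max = [-1.0, -2.0, 0.0], [1.0, 2.0, 4.0]
    cloud = [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5], [1.0, 1.0, 1.0]]
    for point in denormalize(cloud, world_min, world_max):
        print(point)
```

Plain lists keep the example dependency-free; with real point clouds the same arithmetic is a single broadcasted NumPy expression.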


Repository layout (overview)

worldstring/
├── assets/              # Robot URDF / MJCF and meshes
├── ckpts/               # Pretrained weights, configs, demo init data
├── config/              # YAML config loader
├── dataset/             # Inference-time keypoint / voxel dataset
├── modeling/            # AWR model (PolytopeModel)
├── robot_backends/      # PyBullet / MuJoCo robots, keypoint trackers, demo sessions
├── demo_worldstring.py  # Unified Gradio demo (recommended)
├── demo_xhand.py
├── demo_go2_mujoco.py
└── demo_h1_mujoco.py

Training data generation (sim + real world) lives in the separate repo WorldString_data_gen.


Citation

If you use WorldString in your research, please cite:

@misc{xu2026worldstringactionableworldrepresentation,
      title={WorldString: Actionable World Representation},
      author={Kunqi Xu and Jitao Li and Jianglong Ye and Tianshu Tang and Isabella Liu and Sifei Liu and Xueyan Zou},
      year={2026},
      eprint={2605.18743},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2605.18743},
}
