1 change: 1 addition & 0 deletions MANIFEST.in
Original file line number Diff line number Diff line change
@@ -1,2 +1,3 @@
include VERSION
recursive-include configs/ *
recursive-include embodichain/gen_sim/action_agent_pipeline/generation/templates *.json
197 changes: 45 additions & 152 deletions docs/source/features/generative_sim/agents.md
@@ -1,175 +1,68 @@
# EmbodiAgent(aborted)
# Action Agent Pipeline

EmbodiAgent is a hierarchical multi-agent system that enables robots to perform complex manipulation tasks through closed-loop planning, code generation, and validation. The system combines vision-language models (VLMs) and large language models (LLMs) to translate high-level goals into executable robot actions.
The action-agent pipeline is the supported agent workflow for generated tabletop
manipulation tasks. It converts an image or an existing generated gym project
into a task-specific simulation config, asks the task model for a JSON task
graph, compiles that graph into atomic-action specs, and executes it through the
`AtomicActionsAgent-v3` environment.
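
The graph-to-spec step above can be sketched as follows. This is a minimal illustration only: the task-graph schema (`nodes` entries with `id`, `action`, `target`, `after`) and the function name `compile_task_graph` are assumptions made for this example, since the schema produced by the task model is not shown here.

```python
import json
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical task graph; the field names are assumptions for illustration.
TASK_GRAPH_JSON = """
{
  "nodes": [
    {"id": "place", "action": "place", "target": "basket", "after": ["pick"]},
    {"id": "pick", "action": "pick", "target": "target_object", "after": []}
  ]
}
"""


def compile_task_graph(graph: dict) -> list[dict]:
    """Order graph nodes so that every node runs after its prerequisites."""
    nodes = {node["id"]: node for node in graph["nodes"]}

    # Reject edges that point at nodes missing from the graph.
    for node in nodes.values():
        for dep in node["after"]:
            if dep not in nodes:
                raise ValueError(
                    f"Node {node['id']!r} depends on unknown node {dep!r}"
                )

    # TopologicalSorter takes {node: predecessors}; static_order() raises
    # graphlib.CycleError if the graph contains a cycle.
    order = TopologicalSorter(
        {node_id: node["after"] for node_id, node in nodes.items()}
    ).static_order()

    # Emit one flat action spec per node, in execution order.
    return [
        {"action": nodes[node_id]["action"], "target": nodes[node_id]["target"]}
        for node_id in order
    ]


action_specs = compile_task_graph(json.loads(TASK_GRAPH_JSON))
for spec in action_specs:
    print(spec)
```

Ordering the nodes before execution keeps the runtime loop trivial: it can walk the list front to back without re-checking dependencies.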

## Quick Start
The legacy Python-code generation agent stack has been removed. New demos and
task generation should use the modules under
`embodichain.gen_sim.action_agent_pipeline`.

### Prerequisites
Ensure you have access to Azure OpenAI or a compatible LLM endpoint.
## End-to-end Pipeline

```bash
# Set environment variables
export AZURE_OPENAI_ENDPOINT="https://your-endpoint.openai.azure.com/"
export AZURE_OPENAI_API_KEY="your-api-key"
```

### Using Different LLM/VLM APIs
Run image-to-scene, config generation, and agent execution in one command:

The system uses LangChain's `AzureChatOpenAI` by default. To use different LLM/VLM providers, you can modify the `create_llm` function in `embodichain/agents/hierarchy/llm.py`.

#### Azure OpenAI
```bash
export AZURE_OPENAI_ENDPOINT="https://your-endpoint.openai.azure.com/"
export AZURE_OPENAI_API_KEY="your-api-key"
export OPENAI_API_VERSION="2024-10-21" # Optional, defaults to "2024-10-21"
python -m embodichain.gen_sim.action_agent_pipeline.cli.run_agent_pipeline \
  --use-image2scene \
  --server "http://127.0.0.1:4523" \
  --image-name "demo1" \
  --task_description "Pick up the target object and place it in the basket." \
  --config-output-dir "gym_project/action_agent_pipeline/configs/demo1_text" \
  --task_name "Demo1_Text" \
  --target_body_scale 0.8 \
  --regenerate
```

#### OpenAI
To use OpenAI directly instead of Azure, modify `llm.py`:
```python
import os

from langchain_openai import ChatOpenAI
## Generate Config Only

def create_llm(*, temperature=0.0, model="gpt-4o"):
    return ChatOpenAI(
        temperature=temperature,
        model=model,
        api_key=os.getenv("OPENAI_API_KEY"),
    )
```
Use an existing gym project to generate the task config and agent config:

Then set:
```bash
export OPENAI_API_KEY="your-api-key"
```

#### Other Providers
You can use other LangChain-compatible providers by modifying the `create_llm` function, for example:

**Anthropic Claude:**
```python
import os

from langchain_anthropic import ChatAnthropic

def create_llm(*, temperature=0.0, model="claude-3-opus-20240229"):
    return ChatAnthropic(
        temperature=temperature,
        model=model,
        anthropic_api_key=os.getenv("ANTHROPIC_API_KEY"),
    )
python -m embodichain.gen_sim.action_agent_pipeline.cli.generate_action_agent_config \
  --gym_project "gym_project/environment/image2tabletop/downloads/example_gym_project" \
  --output_dir "gym_project/action_agent_pipeline/configs/demo_text" \
  --task_name "Demo_Text" \
  --task_description "Pick up the target object and place it in the basket." \
  --target_body_scale 0.8 \
  --overwrite
```

**Google Gemini:**
```python
import os

from langchain_google_genai import ChatGoogleGenerativeAI
## Run Generated Config

def create_llm(*, temperature=0.0, model="gemini-pro"):
    return ChatGoogleGenerativeAI(
        temperature=temperature,
        model=model,
        google_api_key=os.getenv("GOOGLE_API_KEY"),
    )
```

### Run the System

Run the agent system with the following command:
Run a previously generated config with the action-agent environment:

```bash
python embodichain/lab/scripts/run_agent.py \
  --task_name YourTask \
  --gym_config configs/gym/your_task/gym_config.yaml \
  --agent_config configs/gym/agent/your_agent/agent_config.json \
  --regenerate False
python -m embodichain.gen_sim.action_agent_pipeline.cli.run_agent \
  --task_name "Demo_Text" \
  --gym_config "gym_project/action_agent_pipeline/configs/demo_text/fast_gym_config.json" \
  --agent_config "gym_project/action_agent_pipeline/configs/demo_text/agent_config.json" \
  --regenerate
```

**Parameters:**
- `--task_name`: Name identifier for the task
- `--gym_config`: Path to the gym environment configuration file (``.json``, ``.yaml``, or ``.yml``)
- `--agent_config`: Path to the agent configuration file (defines prompts and agent behavior)
- `--regenerate`: If `True`, forces regeneration of plans/code even if cached
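
The `run_agent` command in this diff passes `--regenerate` as a bare flag rather than `--regenerate False`, which matches how an argparse `store_true` option behaves. The parser below is a hypothetical stand-in built only from the four flags listed above; it is a sketch under that assumption, not the actual CLI implementation.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in parser; the real CLI may define more options."""
    parser = argparse.ArgumentParser(description="Sketch of the run_agent flags")
    parser.add_argument("--task_name", required=True)
    parser.add_argument("--gym_config", required=True)
    parser.add_argument("--agent_config", required=True)
    # A bare flag: present means True, absent means False.
    parser.add_argument("--regenerate", action="store_true")
    return parser


common = [
    "--task_name", "Demo_Text",
    "--gym_config", "fast_gym_config.json",
    "--agent_config", "agent_config.json",
]
forced = build_parser().parse_args(common + ["--regenerate"])
cached = build_parser().parse_args(common)
print(forced.regenerate, cached.regenerate)
```

A bare flag avoids the usual pitfall of parsing the string `"False"` into a boolean by hand.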

## System Architecture

The system operates on a closed-loop control cycle:

- **Observe**: The `TaskAgent` perceives the environment via multi-view camera inputs.
- **Plan**: It decomposes the goal into natural language steps.
- **Code**: The `CodeAgent` translates steps into executable Python code using atomic actions.
- **Execute**: The code runs in the environment; runtime errors are caught immediately.
- **Validate**: The `ValidationAgent` analyzes the result images, selects the best camera angle, and judges success.
- **Refine**: If validation fails, feedback is sent back to the agents to regenerate the plan or code.

---

## Core Components

### TaskAgent
*Located in:* `embodichain/agents/hierarchy/task_agent.py`

Responsible for high-level reasoning. It parses visual observations and outputs a structured plan.

* For every step, it generates a specific condition (e.g., "The cup must be held by the gripper") which is used later by the ValidationAgent.
* Prompt Strategies:
    * `one_stage_prompt`: Direct VLM-to-Plan generation.
    * `two_stage_prompt`: Separates visual analysis from planning logic.

### CodeAgent
*Located in:* `embodichain/agents/hierarchy/code_agent.py`

Translates natural language plans into executable Python code using atomic actions from the action bank.

* Generates Python code that follows strict coding guidelines (no loops, only provided APIs)
* Executes code in a sandboxed environment with immediate error detection
* Uses Abstract Syntax Tree (AST) parsing to ensure code safety and correctness
* Supports few-shot learning through code examples in the configuration


### ValidationAgent
*Located in:* `embodichain/agents/hierarchy/validation_agent.py`

Closes the loop by verifying if the robot actually achieved what it planned.

* Uses a specialized LLM call (`select_best_view_dir`) to analyze images from all cameras and pick the single best angle that proves the action's outcome, ignoring irrelevant views.
* If an error occurs (runtime or logic), it generates a detailed explanation which is fed back to the `TaskAgent` or `CodeAgent` for the next attempt.

---

## Configuration Guide

The `Agent` configuration block controls the context provided to the LLMs. Prompt files are resolved in the following order:

1. **Config directory**: Task-specific prompt files in the same directory as the agent configuration file (e.g., `configs/gym/agent/pour_water_agent/`)
2. **Default prompts directory**: Reusable prompt templates in `embodichain/agents/prompts/`

| Parameter | Description | Typical Use |
| :--- | :--- | :--- |
| `task_prompt` | Task-specific goal description | "Pour water from the red cup to the blue cup." |
| `basic_background` | Physical rules & constraints | World coordinate system definitions, safety rules. |
| `atom_actions` | API Documentation | List of available functions (e.g., `drive(action='pick', ...)`). |
| `code_prompt` | Coding guidelines | "Use provided APIs only. Do not use loops." |
| `code_example` | Few-shot examples | Previous successful code snippets to guide style. |

---

## File Structure

```text
embodichain/agents/
├── hierarchy/
│   ├── agent_base.py       # Abstract base handling prompts & images
│   ├── task_agent.py       # Plan generation logic
│   ├── code_agent.py       # Code generation & AST execution engine
│   ├── validation_agent.py # Visual analysis & view selection
│   └── llm.py              # LLM configuration and instances
├── mllm/
│   └── prompt/             # Prompt templates (LangChain)
└── prompts/                # Agent prompt templates
```
## Runtime Shape

---
- `TaskAgent` produces a deterministic JSON graph.
- `CompileAgent` caches and validates the graph artifact.
- `AgenticGenSimEnv` registers `AtomicActionsAgent-v3` and exposes
`create_demo_action_list()`.
- Runtime graph execution calls atomic actions from
`embodichain.gen_sim.action_agent_pipeline.runtime`.
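
The "caches and validates" behaviour, together with the `--regenerate` flag, can be illustrated with a small sketch. Everything here is assumed for the example: the helper name `load_or_generate_graph`, the cache file naming, and the rule that a valid graph has a non-empty `nodes` list. It does not describe how `CompileAgent` is actually implemented.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def load_or_generate_graph(task_description, cache_dir, generate, regenerate=False):
    """Return a cached task graph, or generate, validate, and cache a new one."""
    # Key the artifact on the task description so different tasks do not collide.
    key = hashlib.sha256(task_description.encode("utf-8")).hexdigest()[:16]
    cache_file = Path(cache_dir) / f"task_graph_{key}.json"

    if cache_file.exists() and not regenerate:
        return json.loads(cache_file.read_text(encoding="utf-8"))

    graph = generate(task_description)
    # Minimal validation (assumed rule): the graph needs a non-empty node list.
    if not isinstance(graph.get("nodes"), list) or not graph["nodes"]:
        raise ValueError("task graph must contain a non-empty 'nodes' list")

    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(graph, indent=2), encoding="utf-8")
    return graph


calls = []


def fake_task_model(description):
    """Stand-in for the task model; records each call."""
    calls.append(description)
    return {"nodes": [{"id": "pick", "action": "pick"}]}


with tempfile.TemporaryDirectory() as tmp:
    first = load_or_generate_graph("pick the cube", tmp, fake_task_model)
    second = load_or_generate_graph("pick the cube", tmp, fake_task_model)
    third = load_or_generate_graph(
        "pick the cube", tmp, fake_task_model, regenerate=True
    )

print(len(calls))  # the second call is served from the cache
```

Validating before writing means a malformed graph never reaches the cache, so a later run cannot silently reuse it.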

## See Also

- [Online Data Streaming](../online_data.md) — Streaming live simulation data for training
- [RL Architecture](../../overview/rl/index.rst) — RL training pipeline and algorithms
- [Atomic Actions Tutorial](../../tutorial/atomic_actions.rst) — Action primitives used by the CodeAgent
- [SimReady Asset Pipeline](simready_pipeline.md) — Generating simulation-ready assets
- [Atomic Actions Tutorial](../../tutorial/atomic_actions.rst) — Atomic action primitives
- [Supported Tasks](../../resources/task/index.rst) — Available task environments
1 change: 1 addition & 0 deletions docs/source/features/generative_sim/index.rst
@@ -6,4 +6,5 @@ Generative Simulation collects EmbodiChain features for generating simulation-re
.. toctree::
:maxdepth: 2

Action Agent Pipeline <agents.md>
SimReady Asset Pipeline <simready_pipeline.md>
21 changes: 21 additions & 0 deletions embodichain/gen_sim/action_agent_pipeline/__init__.py
@@ -0,0 +1,21 @@
# ----------------------------------------------------------------------------
# Copyright (c) 2021-2026 DexForce Technology Co., Ltd.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ----------------------------------------------------------------------------

"""Action-agent graph compilation and atomic-action runtime."""

from __future__ import annotations

__all__: list[str] = []
24 changes: 24 additions & 0 deletions embodichain/gen_sim/action_agent_pipeline/agents/__init__.py
@@ -0,0 +1,24 @@
# ----------------------------------------------------------------------------
# Copyright (c) 2021-2026 DexForce Technology Co., Ltd.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ----------------------------------------------------------------------------

from __future__ import annotations

__all__ = [
    "agent_base",
    "compile_agent",
    "llm",
    "task_agent",
]
96 changes: 96 additions & 0 deletions embodichain/gen_sim/action_agent_pipeline/agents/agent_base.py
@@ -0,0 +1,96 @@
# ----------------------------------------------------------------------------
# Copyright (c) 2021-2026 DexForce Technology Co., Ltd.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ----------------------------------------------------------------------------

from __future__ import annotations

from abc import ABCMeta
import os

from embodichain.utils.utility import load_txt

__all__ = ["AgentBase"]


def _resolve_prompt_path(file_name: str, config_dir: str | None = None) -> str:
    # If absolute path, use directly
    if os.path.isabs(file_name):
        if os.path.exists(file_name):
            return file_name
        raise FileNotFoundError(f"Prompt file not found: {file_name}")

    # Try config directory first (for task-specific prompts)
    if config_dir:
        config_path = os.path.join(config_dir, file_name)
        if os.path.exists(config_path):
            return config_path

    # Try action_agent_pipeline/prompts directory for reusable prompts.
    agents_prompts_dir = os.path.join(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))), "prompts"
    )
    agents_path = os.path.join(agents_prompts_dir, file_name)
    if os.path.exists(agents_path):
        return agents_path

    # If still not found, raise error with search paths
    searched_paths = []
    if config_dir:
        searched_paths.append(f" - {config_dir}/{file_name}")
    searched_paths.append(f" - {agents_prompts_dir}/{file_name}")

    raise FileNotFoundError(
        f"Prompt file not found: {file_name}\n"
        f"Searched in:\n" + "\n".join(searched_paths)
    )


class AgentBase(metaclass=ABCMeta):
    def __init__(self, **kwargs) -> None:

        assert (
            "prompt_kwargs" in kwargs.keys()
        ), "Key prompt_kwargs must exist in config."

        for key, value in kwargs.items():
            setattr(self, key, value)

        # Get config directory if provided
        config_dir = kwargs.get("config_dir", None)
        if config_dir:
            config_dir = os.path.dirname(os.path.abspath(config_dir))

        # Preload and store prompt contents inside self.prompt_kwargs
        for key, val in self.prompt_kwargs.items():
            if val["type"] == "text":
                file_path = _resolve_prompt_path(val["name"], config_dir)
                val["content"] = load_txt(file_path)
            else:
                raise ValueError(
                    f"Now only support `text` type but {val['type']} is given."
                )

    def generate(self, *args, **kwargs):
        pass

    def act(self, *args, **kwargs):
        pass

    def get_composed_observations(self, **kwargs):
        ret = {}
        for key, val in self.prompt_kwargs.items():
            ret[key] = val["content"]
        ret.update(kwargs)
        return ret
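
The lookup order used by `_resolve_prompt_path` (absolute path, then the config directory, then the shared prompts directory) can be exercised in isolation. The sketch below is a simplified standalone re-implementation for illustration: `resolve_prompt`, the explicit `default_dir` parameter, and the file names are assumptions of this example and are not part of the module.

```python
import os
import tempfile


def resolve_prompt(file_name, config_dir, default_dir):
    """Mirror the lookup order: absolute path, config dir, then default dir."""
    if os.path.isabs(file_name):
        if os.path.exists(file_name):
            return file_name
        raise FileNotFoundError(f"Prompt file not found: {file_name}")

    # Task-specific prompts shadow the reusable defaults.
    for directory in (config_dir, default_dir):
        if directory:
            candidate = os.path.join(directory, file_name)
            if os.path.exists(candidate):
                return candidate

    raise FileNotFoundError(f"Prompt file not found: {file_name}")


with tempfile.TemporaryDirectory() as root:
    config_dir = os.path.join(root, "task_config")
    default_dir = os.path.join(root, "prompts")
    os.makedirs(config_dir)
    os.makedirs(default_dir)

    # The same file name exists in both places; a second file only in defaults.
    for directory in (config_dir, default_dir):
        with open(os.path.join(directory, "task_prompt.txt"), "w", encoding="utf-8") as f:
            f.write("placeholder prompt")
    with open(os.path.join(default_dir, "basic_background.txt"), "w", encoding="utf-8") as f:
        f.write("placeholder background")

    chosen = os.path.basename(
        os.path.dirname(resolve_prompt("task_prompt.txt", config_dir, default_dir))
    )
    fallback = os.path.basename(
        os.path.dirname(resolve_prompt("basic_background.txt", config_dir, default_dir))
    )

print(chosen, fallback)
```

Passing the default directory explicitly, instead of deriving it from `__file__`, is what makes this variant easy to test against temporary directories.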