first commit

Commit e6651b9a authored Oct 23, 2024 by richie.li
--- a/.gitignore
+++ b/.gitignore
+# These are some examples of commonly ignored file patterns.
+# You should customize this list as applicable to your project.
+# Learn more about .gitignore:
+#     https://www.atlassian.com/git/tutorials/saving-changes/gitignore
+
+# Node artifact files
+node_modules/
+dist/
+
+# Compiled Java class files
+*.class
+
+# Compiled Python bytecode
+*.py[cod]
+
+# Log files
+*.log
+
+# Package files
+*.jar
+
+# Maven
+target/
+
+# JetBrains IDE
+.idea/
+
+# Unit test reports
+TEST*.xml
+
+# Generated by MacOS
+.DS_Store
+
+# Generated by Windows
+Thumbs.db
+
+# Applications
+*.app
+*.exe
+*.war
+
+# Large media files
+*.mp4
+*.tiff
+*.avi
+*.flv
+*.mov
+*.wmv
+
+# VS Code
+.vscode
+# logs
+logs
+runs
+
+# other
+*.egg-info
+__pycache__
+
+*.swp
+
+MUJOCO_LOG.TXT
\ No newline at end of file
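Most entries in this `.gitignore` are shell-style globs. Python's `fnmatch.fnmatchcase` can roughly show which file names a few of them catch. This is only an approximation that assumes simple basename matching. Real `.gitignore` rules also handle directory-only patterns (a trailing `/`), anchoring, negation (`!`), and `**`, and `fnmatch` models none of these. The helper `is_ignored` is hypothetical.

```python
from fnmatch import fnmatchcase

# A subset of the basename patterns from the .gitignore above.
# Directory patterns such as node_modules/ are left out because
# fnmatch has no notion of "directory only".
PATTERNS = ["*.class", "*.py[cod]", "*.log", "*.jar", "TEST*.xml", "*.swp", "MUJOCO_LOG.TXT"]


def is_ignored(basename: str, patterns=PATTERNS) -> bool:
    """Return True if the basename matches any pattern (case-sensitive)."""
    return any(fnmatchcase(basename, p) for p in patterns)


if __name__ == "__main__":
    for name in ["train.pyc", "train.py", "TEST-unit.xml", "run.log", "model_100.pt"]:
        print(f"{name:15} ignored={is_ignored(name)}")
```

The character class in `*.py[cod]` covers `.pyc`, `.pyo`, and `.pyd` with a single pattern.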
--- a/README.md
+++ b/README.md
+English | [中文](README.zh_CN.md)
+## Introduction
+[AgiBot X1](https://www.zhiyuan-robot.com/qzproduct/169.html) is a modular, high-degree-of-freedom (DoF) humanoid robot developed and open-sourced by AgiBot. Its software uses AgiBot's open-source framework `AimRT` as middleware and relies on reinforcement learning for locomotion control.
+
+This project contains the reinforcement learning training code used by AgiBot X1. You can use it with the [inference software](https://aimrt.org/) provided for AgiBot X1 to debug walking on the real robot and in simulation. You can also import other robot models and train them.
+![](doc/id.jpg)
+
+## Getting Started
+
+### Install Dependencies
+1. Create a new Python 3.8 virtual environment:
+   - `conda create -n myenv python=3.8`.
+2. Install PyTorch 1.13 with CUDA 11.7:
+   - `conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia`
+3. Install NumPy 1.23:
+   - `conda install numpy=1.23`.
+4. Install Isaac Gym:
+   - Download and install Isaac Gym Preview 4 from https://developer.nvidia.com/isaac-gym.
+   - `cd isaacgym/python && pip install -e .`
+   - Run an example with `cd examples && python 1080_balls_of_solitude.py`.
+   - Consult `isaacgym/docs/index.html` for troubleshooting.
+5. Install the training code dependencies:
+   - Clone this repository.
+   - `pip install -e .`
+
+### Usage
+#### Train:
+```
+python scripts/train.py --task=x1_dh_stand --run_name=<run_name> --headless
+```
+- The trained model will be saved in `/log/<experiment_name>/exported_data/<date_time><run_name>/model_<iteration>.pt`, where `<experiment_name>` is defined in the config file.
+![](doc/train.gif)
+
+#### Play:
+```
+python scripts/play.py --task=x1_dh_stand --load_run=<date_time><run_name>
+```
+![](doc/play.gif)
+#### Generate the JIT Model:
+```
+python scripts/export_policy_dh.py --task=x1_dh_stand --load_run=<date_time><run_name>
+```
+- The JIT model will be saved in `log/exported_policies/<date_time>`.
+
+#### Generate the ONNX Model:
+```
+python scripts/export_onnx_dh.py --task=x1_dh_stand --load_run=<date_time>
+```
+- The ONNX model will be saved in `log/exported_policies/<date_time>`.
+
+#### Parameter Descriptions:
+- `task`: Task name.
+- `resume`: Resume training from a checkpoint.
+- `experiment_name`: Name of the experiment to run or load.
+- `run_name`: Name of the run.
+- `load_run`: Name of the run to load when `resume=True`. If -1, the last run is loaded.
+- `checkpoint`: Saved model checkpoint number. If -1, the last checkpoint is loaded.
+- `num_envs`: Number of environments to create.
+- `seed`: Random seed.
+- `max_iterations`: Maximum number of training iterations.
+
+### Add New Environments
+1. Create a new folder under the `envs/` directory. In that folder, create a configuration file `<your_env>_config.py` and an environment file `<your_env>_env.py`. The classes in these two files should inherit from `LeggedRobotCfg` and `LeggedRobot`, respectively.
+
+2. Place the URDF, mesh, and MJCF files of the new robot in the `resources/` folder.
+   - In `<your_env>_config.py`, configure the new robot's URDF path, PD gains, body names, `default_joint_angles`, `experiment_name`, and similar settings.
+
+3. Register the new robot in `humanoid/envs/__init__.py`.
+
+### sim2sim
+Use MuJoCo for sim2sim validation:
+  ```
+  python scripts/sim2sim.py --task=x1_dh_stand --load_model /path/to/exported_policies/
+  ```
+![](doc/mujoco.gif)
+### Joystick Usage
+We use the Logitech F710 gamepad. While `play.py` or `sim2sim.py` is running, press and hold button 4 and move the sticks. This makes the robot move forward or backward, strafe left or right, or rotate.
+![](doc/joy_map.jpg)
+
+| Button | Command |
+| ------ | :-----: |
+| 4 + 1- | Move forward |
+| 4 + 1+ | Move backward |
+| 4 + 0- | Strafe left |
+| 4 + 0+ | Strafe right |
+| 4 + 3- | Rotate counterclockwise |
+| 4 + 3+ | Rotate clockwise |
+
+
+## Directory Structure
+```
+.
+|— humanoid           # Main code directory
+|  |—algo             # Algorithm directory
+|  |—envs             # Environment directory
+|  |—scripts          # Script directory
+|  |—utils            # Utility and function directory
+|— logs               # Model directory
+|— resources          # Resource library
+|  |— robots          # Robot urdf, mjcf, mesh
+|— README.md          # README document
+```
+
+> References
+> * [GitHub - leggedrobotics/legged_gym: Isaac Gym Environments for Legged Robots](https://github.com/leggedrobotics/legged_gym)
+> * [GitHub - leggedrobotics/rsl_rl: Fast and simple implementation of RL algorithms, designed to run fully on GPU.](https://github.com/leggedrobotics/rsl_rl)
+> * [GitHub - roboterax/humanoid-gym: Humanoid-Gym: Reinforcement Learning for Humanoid Robot with Zero-Shot Sim2Real Transfer https://arxiv.org/abs/2404.05695](https://github.com/roboterax/humanoid-gym)
+
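The joystick table maps a held button plus an axis direction to a motion command. In the sketch below, `1-` is read as "axis 1 in its negative direction." The sketch also assumes the following:

- axis values lie in [-1, 1];
- button and axis numbers are 0-based indices;
- the velocity limits `MAX_VX`, `MAX_VY`, and `MAX_WZ` are hypothetical;
- forward, left, and counterclockwise are positive.

The real scaling and sign handling in `play.py`/`sim2sim.py` may differ.

```python
from dataclasses import dataclass

# Hypothetical limits; the real scripts may use different values.
MAX_VX = 1.0   # m/s, forward/backward
MAX_VY = 0.5   # m/s, strafe
MAX_WZ = 1.0   # rad/s, yaw rate


@dataclass
class Command:
    vx: float  # forward (+) / backward (-)
    vy: float  # left (+) / right (-)
    wz: float  # counterclockwise (+) / clockwise (-)


def joystick_to_command(axes, buttons, enable_button=4):
    """Map gamepad axes to a velocity command while the enable button is held.

    Per the table: axis 1 negative -> forward, axis 0 negative -> strafe left,
    axis 3 negative -> rotate counterclockwise, hence the sign flips below.
    """
    if not buttons[enable_button]:
        return Command(0.0, 0.0, 0.0)  # no command unless button 4 is held
    return Command(
        vx=-axes[1] * MAX_VX,
        vy=-axes[0] * MAX_VY,
        wz=-axes[3] * MAX_WZ,
    )


if __name__ == "__main__":
    axes = [0.0, -1.0, 0.0, 0.0]   # axis 1 reads -1
    buttons = [0, 0, 0, 0, 1]      # button 4 held
    print(joystick_to_command(axes, buttons))  # vx == MAX_VX (forward)
```

Gating on the enable button means releasing it immediately zeroes the command. This acts as a simple deadman switch.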
--- a/README.zh_CN.md
+++ b/README.zh_CN.md
+[English](README.md) | 中文
+
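+## Introduction
+[AgiBot X1 (智元灵犀X1)](https://www.zhiyuan-robot.com/qzproduct/169.html) is a modular, high-degree-of-freedom humanoid robot developed and open-sourced by AgiBot. The X1 software system uses AgiBot's open-source component `AimRT` as middleware and relies on reinforcement learning for motion control.
+
+This project contains the reinforcement learning training code used by AgiBot X1. You can use it with the [inference software](https://aimrt.org/) provided for AgiBot X1 to debug walking on the real robot and in simulation. You can also import other robot models and train them.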
+![](doc/id.jpg)
+
+## Running the Code
+
+### Install Dependencies
+1. Create a new Python 3.8 virtual environment:
+   - `conda create -n myenv python=3.8`.
+2. Install PyTorch 1.13 with CUDA 11.7:
+   - `conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia`
+3. Install NumPy 1.23:
+   - `conda install numpy=1.23`.
+4. Install Isaac Gym:
+   - Download and install Isaac Gym Preview 4 from https://developer.nvidia.com/isaac-gym.
+   - `cd isaacgym/python && pip install -e .`
+   - Run an example with `cd examples && python 1080_balls_of_solitude.py`.
+   - Consult `isaacgym/docs/index.html` for troubleshooting.
+5. Install the training code dependencies:
+   - Clone this repository.
+   - `pip install -e .`
+
+### Usage
+#### Train:
+```
+python scripts/train.py --task=x1_dh_stand --run_name=<run_name> --headless
+```
+- The trained model will be saved in `/log/<experiment_name>/exported_data/<date_time><run_name>/model_<iteration>.pt`, where `<experiment_name>` is defined in the config file.
+![](doc/train.gif)
+
+#### Play:
+```
+python scripts/play.py --task=x1_dh_stand --load_run=<date_time><run_name>
+```
+![](doc/play.gif)
+
+#### Generate the JIT Model:
+```
+python scripts/export_policy_dh.py --task=x1_dh_stand --load_run=<date_time><run_name>
+```
+- The JIT model will be saved in `log/exported_policies/<date_time>`.
+
+#### Generate the ONNX Model:
+```
+python scripts/export_onnx_dh.py --task=x1_dh_stand --load_run=<date_time>
+```
+- The ONNX model will be saved in `log/exported_policies/<date_time>`.
+
+#### Parameter Descriptions:
+- `task`: Task name.
+- `resume`: Resume training from a checkpoint.
+- `experiment_name`: Name of the experiment to run or load.
+- `run_name`: Name of the run.
+- `load_run`: Name of the run to load when `resume=True`. If -1, the last run is loaded.
+- `checkpoint`: Saved model checkpoint number. If -1, the last checkpoint is loaded.
+- `num_envs`: Number of environments to create.
+- `seed`: Random seed.
+- `max_iterations`: Maximum number of training iterations.
+
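+### Add New Environments
+1. Create a new folder under the `envs/` directory. In that folder, create a configuration file `<your_env>_config.py` and an environment file `<your_env>_env.py`. The classes in these two files should inherit from `LeggedRobotCfg` and `LeggedRobot`, respectively.
+
+2. Place the new robot's URDF, mesh, and MJCF files in the `resources/` folder.
+   - In `<your_env>_config.py`, configure the new robot's URDF path, PD gains, body names, `default_joint_angles`, `experiment_name`, and similar settings.
+
+3. Register your new robot in `humanoid/envs/__init__.py`.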
+
+### sim2sim
+Use MuJoCo for sim2sim validation:
+  ```
+  python scripts/sim2sim.py --task=x1_dh_stand --load_model /path/to/exported_policies/
+  ```
+![](doc/mujoco.gif)
+
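+### Joystick Usage
+We use the Logitech F710 gamepad. While `play.py` or `sim2sim.py` is running, hold button 4 and move the sticks to drive the robot forward or backward, strafe it left or right, or rotate it.
+![](doc/joy_map.jpg)
+
+| Button | Command |
+| ------ | :-----: |
+| 4 + 1- | Move forward |
+| 4 + 1+ | Move backward |
+| 4 + 0- | Strafe left |
+| 4 + 0+ | Strafe right |
+| 4 + 3- | Rotate counterclockwise |
+| 4 + 3+ | Rotate clockwise |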
+
+
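+## Directory Structure
+```
+.
+|— humanoid           # Main code directory
+|  |—algo             # Algorithm directory
+|  |—envs             # Environment directory
+|  |—scripts          # Script directory
+|  |—utils            # Utility and function directory
+|— logs               # Model directory
+|— resources          # Resource library
+|  |— robots          # Robot URDF, MJCF, mesh
+|— README.md          # README document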
+```
+
+> References:
+>
+> * [GitHub - leggedrobotics/legged_gym: Isaac Gym Environments for Legged Robots](https://github.com/leggedrobotics/legged_gym)
+> * [GitHub - leggedrobotics/rsl_rl: Fast and simple implementation of RL algorithms, designed to run fully on GPU.](https://github.com/leggedrobotics/rsl_rl)
+> * [GitHub - roboterax/humanoid-gym: Humanoid-Gym: Reinforcement Learning for Humanoid Robot with Zero-Shot Sim2Real Transfer https://arxiv.org/abs/2404.05695](https://github.com/roboterax/humanoid-gym)
+
+
+
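Both `load_run` and `checkpoint` accept `-1` to mean "the latest." The sketch below shows one way such a lookup can work. It assumes three things:

- run directory names sort chronologically as strings (for example, an ISO-style date prefix);
- checkpoints are named `model_<iteration>.pt`;
- `resolve_checkpoint` is a hypothetical helper, not the repository's own loader.

```python
import os
import re
import tempfile
from typing import Union

_MODEL_RE = re.compile(r"^model_(\d+)\.pt$")


def resolve_checkpoint(root: str, load_run: Union[str, int] = -1, checkpoint: int = -1) -> str:
    """Return the path of the requested checkpoint under root.

    load_run == -1   -> last run directory in sorted order (assumed chronological).
    checkpoint == -1 -> model file with the highest iteration number.
    """
    if load_run == -1:
        runs = sorted(d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d)))
        if not runs:
            raise FileNotFoundError(f"no runs under {root}")
        load_run = runs[-1]
    run_dir = os.path.join(root, str(load_run))

    if checkpoint == -1:
        iters = []
        for name in os.listdir(run_dir):
            match = _MODEL_RE.match(name)
            if match:
                iters.append(int(match.group(1)))
        if not iters:
            raise FileNotFoundError(f"no model_<iteration>.pt files in {run_dir}")
        checkpoint = max(iters)  # numeric max, not lexicographic
    return os.path.join(run_dir, f"model_{checkpoint}.pt")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as demo_root:
        run = os.path.join(demo_root, "2024-10-23_12-00-00_demo")
        os.makedirs(run)
        for it in (900, 1000):
            open(os.path.join(run, f"model_{it}.pt"), "w").close()
        print(resolve_checkpoint(demo_root))
```

Comparing iteration numbers as integers avoids a common pitfall. Sorted as strings, `model_900.pt` would come after `model_1000.pt`.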
--- a/doc/id.jpg
+++ b/doc/id.jpg
--- a/doc/joy_map.jpg
+++ b/doc/joy_map.jpg
--- a/doc/mujoco.gif
+++ b/doc/mujoco.gif
--- a/doc/play.gif
+++ b/doc/play.gif
--- a/doc/train.gif
+++ b/doc/train.gif
--- a/humanoid/__init__.py
+++ b/humanoid/__init__.py
+# SPDX-FileCopyrightText: Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2021 ETH Zurich, Nikita Rudin
+# SPDX-FileCopyrightText: Copyright (c) 2024 Beijing RobotEra TECHNOLOGY CO.,LTD. All rights reserved.
+# SPDX-License-Identifier: BSD-3-Clause
+
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are met:
+#
+# 1. Redistributions of source code must retain the above copyright notice, this
+# list of conditions and the following disclaimer.
+#
+# 2. Redistributions in binary form must reproduce the above copyright notice,
+# this list of conditions and the following disclaimer in the documentation
+# and/or other materials provided with the distribution.
+#
+# 3. Neither the name of the copyright holder nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# Copyright (c) 2024, AgiBot Inc. All rights reserved.
+
+import os
+
+LEGGED_GYM_ROOT_DIR = os.path.dirname(os.path.dirname(os.path.realpath(__file__)))
+LEGGED_GYM_ENVS_DIR = os.path.join(LEGGED_GYM_ROOT_DIR, 'humanoid', 'envs')
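`LEGGED_GYM_ROOT_DIR` strips two components from this file's resolved path: first the file name, then the `humanoid` package directory. What remains is the repository root. The illustration below uses `posixpath` on a hypothetical checkout path, `/home/user/agibot_x1_train`. It omits `realpath`, which in the original only resolves symlinks, so no filesystem access is needed.

```python
import posixpath


def repo_root_from(init_file: str) -> str:
    """Mirror of dirname(dirname(...)) applied to humanoid/__init__.py."""
    package_dir = posixpath.dirname(init_file)   # .../humanoid
    return posixpath.dirname(package_dir)        # repository root


init_file = "/home/user/agibot_x1_train/humanoid/__init__.py"  # hypothetical path
root = repo_root_from(init_file)
envs_dir = posixpath.join(root, "humanoid", "envs")
print(root)      # /home/user/agibot_x1_train
print(envs_dir)  # /home/user/agibot_x1_train/humanoid/envs
```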
--- a/humanoid/algo/__init__.py
+++ b/humanoid/algo/__init__.py
+# SPDX-FileCopyrightText: Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2021 ETH Zurich, Nikita Rudin
+# SPDX-FileCopyrightText: Copyright (c) 2024 Beijing RobotEra TECHNOLOGY CO.,LTD. All rights reserved.
+# SPDX-License-Identifier: BSD-3-Clause
+
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are met:
+#
+# 1. Redistributions of source code must retain the above copyright notice, this
+# list of conditions and the following disclaimer.
+#
+# 2. Redistributions in binary form must reproduce the above copyright notice,
+# this list of conditions and the following disclaimer in the documentation
+# and/or other materials provided with the distribution.
+#
+# 3. Neither the name of the copyright holder nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# Copyright (c) 2024, AgiBot Inc. All rights reserved.
+
+
+from .vec_env import VecEnv
+from .ppo import *
\ No newline at end of file
--- a/humanoid/algo/ppo/__init__.py
+++ b/humanoid/algo/ppo/__init__.py
+# SPDX-FileCopyrightText: Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2021 ETH Zurich, Nikita Rudin
+# SPDX-FileCopyrightText: Copyright (c) 2024 Beijing RobotEra TECHNOLOGY CO.,LTD. All rights reserved.
+# SPDX-License-Identifier: BSD-3-Clause
+
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are met:
+#
+# 1. Redistributions of source code must retain the above copyright notice, this
+# list of conditions and the following disclaimer.
+#
+# 2. Redistributions in binary form must reproduce the above copyright notice,
+# this list of conditions and the following disclaimer in the documentation
+# and/or other materials provided with the distribution.
+#
+# 3. Neither the name of the copyright holder nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# Copyright (c) 2024, AgiBot Inc. All rights reserved.
+
+from .dh_ppo import DHPPO
+from .dh_on_policy_runner import DHOnPolicyRunner
+from .actor_critic_dh import ActorCriticDH
+from .rollout_storage import RolloutStorage
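Among the classes exported here, `ActorCriticDH` treats each action dimension as an independent `Normal`. The mean comes from the actor, and the `std` is learned but state-independent. The class sums `log_prob` and `entropy` over the action dimension. The stdlib-only sketch below spells out those per-dimension formulas so the sums are easy to check. The function names are hypothetical, and the 12-dimensional example follows the "policy output(12)" comment in `actor_critic_dh.py`.

```python
import math
from typing import Sequence


def diag_gaussian_log_prob(x: Sequence[float], mean: Sequence[float], std: Sequence[float]) -> float:
    """Sum of per-dimension Normal log-densities (independent action dims)."""
    total = 0.0
    for xi, mu, sigma in zip(x, mean, std):
        total += -((xi - mu) ** 2) / (2 * sigma ** 2) - math.log(sigma) - 0.5 * math.log(2 * math.pi)
    return total


def diag_gaussian_entropy(std: Sequence[float]) -> float:
    """Sum of per-dimension Normal entropies; independent of the mean."""
    return sum(0.5 + 0.5 * math.log(2 * math.pi) + math.log(sigma) for sigma in std)


mean = [0.0] * 12   # 12 action dimensions
std = [1.0] * 12    # matches init_noise_std=1.0 at initialization
print(diag_gaussian_log_prob(mean, mean, std))
print(diag_gaussian_entropy(std))
```

Because the entropy depends only on `std`, the entropy bonus in PPO acts directly on the learned noise parameter.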
--- a/humanoid/algo/ppo/actor_critic_dh.py
+++ b/humanoid/algo/ppo/actor_critic_dh.py
+# SPDX-FileCopyrightText: Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2021 ETH Zurich, Nikita Rudin
+# SPDX-FileCopyrightText: Copyright (c) 2024 Beijing RobotEra TECHNOLOGY CO.,LTD. All rights reserved.
+# SPDX-License-Identifier: BSD-3-Clause
+
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are met:
+#
+# 1. Redistributions of source code must retain the above copyright notice, this
+# list of conditions and the following disclaimer.
+#
+# 2. Redistributions in binary form must reproduce the above copyright notice,
+# this list of conditions and the following disclaimer in the documentation
+# and/or other materials provided with the distribution.
+#
+# 3. Neither the name of the copyright holder nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# Copyright (c) 2024, AgiBot Inc. All rights reserved.
+
+import torch
+import torch.nn as nn
+from torch.distributions import Normal
+
+class ActorCriticDH(nn.Module):
+    def __init__(self,  num_short_obs,
+                        num_proprio_obs,
+                        num_critic_obs,
+                        num_actions,
+                        actor_hidden_dims=[256, 256, 256],
+                        critic_hidden_dims=[256, 256, 256],
+                        state_estimator_hidden_dims=[256, 128, 64],
+                        in_channels = 66,
+                        kernel_size=[6, 4],
+                        filter_size=[32, 16],
+                        stride_size=[3, 2],
+                        lh_output_dim=64,
+                        init_noise_std=1.0,
+                        activation = nn.ELU(),
+                        **kwargs):
+        if kwargs:
+            print("ActorCriticDH.__init__ got unexpected arguments, which will be ignored: " + str([key for key in kwargs.keys()]))
+        super(ActorCriticDH, self).__init__()
+
+        
+        # define actor net and critic net
+        # self.num_short_obs = int(cfg.env.num_single_obs * cfg.env.short_frame_stack), 5 history
+        # lh_output_dim is cnn output
+        # 3 is state estimator output
+        mlp_input_dim_a = num_short_obs + lh_output_dim + 3
+        # num_privileged_obs = int(c_frame_stack * single_num_privileged_obs), 3 history
+        mlp_input_dim_c = num_critic_obs
+
+        # Policy
+        actor_layers = []
+        actor_layers.append(nn.Linear(mlp_input_dim_a, actor_hidden_dims[0]))
+        actor_layers.append(activation)
+        for l in range(len(actor_hidden_dims)):
+            if l == len(actor_hidden_dims) - 1:
+                # num_actions policy output(12)
+                actor_layers.append(nn.Linear(actor_hidden_dims[l], num_actions))
+            else:
+                actor_layers.append(nn.Linear(actor_hidden_dims[l], actor_hidden_dims[l + 1]))
+                actor_layers.append(activation)
+        self.actor = nn.Sequential(*actor_layers)
+
+        # Value function
+        critic_layers = []
+        critic_layers.append(nn.Linear(mlp_input_dim_c, critic_hidden_dims[0]))
+        critic_layers.append(activation)
+        for l in range(len(critic_hidden_dims)):
+            if l == len(critic_hidden_dims) - 1:
+                critic_layers.append(nn.Linear(critic_hidden_dims[l], 1))
+            else:
+                critic_layers.append(nn.Linear(critic_hidden_dims[l], critic_hidden_dims[l + 1]))
+                critic_layers.append(activation)
+        self.critic = nn.Sequential(*critic_layers)
+
+        print(f"Actor MLP: {self.actor}")
+        print(f"Critic MLP: {self.critic}")
+
+        # Action noise
+        self.std = nn.Parameter(init_noise_std * torch.ones(num_actions))
+        self.distribution = None
+        # disable args validation for speedup
+        Normal.set_default_validate_args(False)
+        
+        #define long_history CNN
+        long_history_layers = []
+        self.in_channels = in_channels
+        cnn_output_dim = num_proprio_obs
+        for out_channels, kernel, stride in zip(filter_size, kernel_size, stride_size):
+            long_history_layers.append(nn.Conv1d(in_channels=in_channels, out_channels=out_channels, kernel_size=kernel, stride=stride))
+            long_history_layers.append(nn.ReLU())
+            # unpadded Conv1d output length: floor((L - k) / s) + 1
+            cnn_output_dim = (cnn_output_dim - kernel + stride) // stride
+            in_channels = out_channels
+        cnn_output_dim *= out_channels
+        long_history_layers.append(nn.Flatten())
+        long_history_layers.append(nn.Linear(cnn_output_dim, 128))
+        long_history_layers.append(nn.ELU())
+        long_history_layers.append(nn.Linear(128, lh_output_dim))
+        self.long_history = nn.Sequential(*long_history_layers)
+        print(f"long_history CNN: {self.long_history}")
+        
+        #define state_estimator MLP
+        # self.num_short_obs = int(cfg.env.num_single_obs * cfg.env.short_frame_stack), 5 history
+        self.num_short_obs = num_short_obs
+        state_estimator_input_dim = num_short_obs
+        state_estimator_output_dim = 3
+        state_estimator_layers = []
+        state_estimator_layers.append(nn.Linear(state_estimator_input_dim, state_estimator_hidden_dims[0]))
+        state_estimator_layers.append(activation)
+        for l in range(len(state_estimator_hidden_dims)):
+            if l == len(state_estimator_hidden_dims) - 1:
+                state_estimator_layers.append(nn.Linear(state_estimator_hidden_dims[l], state_estimator_output_dim))
+            else:
+                state_estimator_layers.append(nn.Linear(state_estimator_hidden_dims[l], state_estimator_hidden_dims[l + 1]))
+                state_estimator_layers.append(activation)
+        self.state_estimator = nn.Sequential(*state_estimator_layers)
+        print(f"state_estimator MLP: {self.state_estimator}")
+        
+        self.num_proprio_obs = num_proprio_obs
+
+    @staticmethod
+    # not used at the moment
+    def init_weights(sequential, scales):
+        [torch.nn.init.orthogonal_(module.weight, gain=scales[idx]) for idx, module in
+         enumerate(mod for mod in sequential if isinstance(mod, nn.Linear))]
+
+    def reset(self, dones=None):
+        pass
+
+    def forward(self):
+        raise NotImplementedError
+    
+    @property
+    def action_mean(self):
+        return self.distribution.mean
+
+    @property
+    def action_std(self):
+        return self.distribution.stddev
+    
+    @property
+    def entropy(self):
+        return self.distribution.entropy().sum(dim=-1)
+
+    def update_distribution(self, observations):
+        mean = self.actor(observations)
+        self.distribution = Normal(mean, mean*0. + self.std)
+
+    def act(self, observations, **kwargs):
+        short_history = observations[...,-self.num_short_obs:]
+        es_vel = self.state_estimator(short_history)
+        compressed_long_history = self.long_history(observations.view(-1, self.in_channels, self.num_proprio_obs))
+        actor_obs = torch.cat((short_history, es_vel, compressed_long_history),dim=-1)
+        self.update_distribution(actor_obs)
+        return self.distribution.sample()
+    
+    def get_actions_log_prob(self, actions):
+        return self.distribution.log_prob(actions).sum(dim=-1)
+
+    def act_inference(self, observations):
+        short_history = observations[...,-self.num_short_obs:]
+        es_vel = self.state_estimator(short_history)
+        compressed_long_history = self.long_history(observations.view(-1, self.in_channels, self.num_proprio_obs))
+        actor_obs = torch.cat((short_history, es_vel, compressed_long_history),dim=-1)
+        actions_mean = self.actor(actor_obs)
+        return actions_mean
+
+    def evaluate(self, critic_observations, **kwargs):
+        value = self.critic(critic_observations)
+        return value
\ No newline at end of file
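The long-history encoder reshapes the flat observation into `(batch, in_channels, num_proprio_obs)`. Assuming frames are stacked one after another, each of the `in_channels` frames becomes a Conv1d channel, and the convolution slides over the proprioceptive feature axis. The flattened size fed to the first `Linear` layer follows the unpadded Conv1d length formula `L_out = floor((L_in - k) / s) + 1`. This is the same value as the constructor's `(L - k + s) // s` update.

The pure-Python check below uses the default `kernel_size`, `filter_size`, and `stride_size`. The values `num_proprio_obs = 47` and a 5-frame short history are hypothetical and not taken from the X1 config.

```python
def conv1d_out_len(length: int, kernel: int, stride: int) -> int:
    """Output length of an unpadded, undilated Conv1d."""
    return (length - kernel) // stride + 1


def long_history_flat_dim(num_proprio_obs, kernel_sizes=(6, 4), filter_sizes=(32, 16), strides=(3, 2)):
    """Flattened CNN size, mirroring the loop in ActorCriticDH.__init__."""
    length = num_proprio_obs
    for k, s in zip(kernel_sizes, strides):
        length = conv1d_out_len(length, k, s)
    return length * filter_sizes[-1]  # channels of the last conv layer


num_proprio_obs = 47          # hypothetical per-frame observation size
num_short_obs = 5 * 47        # hypothetical: 5 short-history frames
lh_output_dim = 64            # default in ActorCriticDH

print(long_history_flat_dim(num_proprio_obs))   # 96
print(num_short_obs + lh_output_dim + 3)        # actor MLP input: 302
```

The `+ 3` is the state estimator's output, which the actor input concatenates alongside the short history and the CNN embedding.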
--- a/humanoid/algo/ppo/dh_on_policy_runner.py
+++ b/humanoid/algo/ppo/dh_on_policy_runner.py
--- a/humanoid/algo/ppo/dh_ppo.py
+++ b/humanoid/algo/ppo/dh_ppo.py
--- a/humanoid/algo/ppo/rollout_storage.py
+++ b/humanoid/algo/ppo/rollout_storage.py
--- a/humanoid/algo/vec_env.py
+++ b/humanoid/algo/vec_env.py
+# SPDX-FileCopyrightText: Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2021 ETH Zurich, Nikita Rudin
+# SPDX-FileCopyrightText: Copyright (c) 2024 Beijing RobotEra TECHNOLOGY CO.,LTD. All rights reserved.
+# SPDX-License-Identifier: BSD-3-Clause
+
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are met:
+#
+# 1. Redistributions of source code must retain the above copyright notice, this
+# list of conditions and the following disclaimer.
+#
+# 2. Redistributions in binary form must reproduce the above copyright notice,
+# this list of conditions and the following disclaimer in the documentation
+# and/or other materials provided with the distribution.
+#
+# 3. Neither the name of the copyright holder nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# Copyright (c) 2024, AgiBot Inc. All rights reserved.
+
+import torch
+from typing import Tuple, Union
+from abc import ABC, abstractmethod
+
+# minimal interface of the environment
+class VecEnv(ABC):
+    num_envs: int
+    num_obs: int
+    num_short_obs: int
+    num_privileged_obs: int
+    num_actions: int
+    max_episode_length: int
+    privileged_obs_buf: torch.Tensor
+    obs_buf: torch.Tensor 
+    rew_buf: torch.Tensor
+    reset_buf: torch.Tensor
+    episode_length_buf: torch.Tensor # current episode duration
+    extras: dict
+    device: torch.device
+    @abstractmethod
+    def step(self, actions: torch.Tensor) -> Tuple[torch.Tensor, Union[torch.Tensor, None], torch.Tensor, torch.Tensor, dict]:
+        pass
+    @abstractmethod
+    def reset(self, env_ids: Union[list, torch.Tensor]):
+        pass
+    @abstractmethod
+    def get_observations(self) -> torch.Tensor:
+        pass
+    @abstractmethod
+    def get_privileged_observations(self) -> Union[torch.Tensor, None]:
+        pass
\ No newline at end of file
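`VecEnv` only fixes the interface. `step` returns `(obs, privileged_obs, rewards, dones, extras)`, where `privileged_obs` may be `None`. Below is a toy, stdlib-only sketch of a class with the same method shapes, followed by a rollout loop over it. `CountingEnv` is hypothetical. It uses Python lists in place of `torch.Tensor` and does not subclass the real `VecEnv`, so it runs without PyTorch.

```python
from typing import List, Optional, Tuple


class CountingEnv:
    """Hypothetical toy env with VecEnv-like methods (lists instead of tensors)."""

    def __init__(self, num_envs: int = 2, max_episode_length: int = 3):
        self.num_envs = num_envs
        self.max_episode_length = max_episode_length
        self.episode_length_buf = [0] * num_envs

    def get_observations(self) -> List[List[float]]:
        return [[float(t)] for t in self.episode_length_buf]

    def get_privileged_observations(self) -> Optional[List[List[float]]]:
        return None  # no privileged critic observations in this toy

    def reset(self, env_ids: List[int]) -> None:
        for i in env_ids:
            self.episode_length_buf[i] = 0

    def step(self, actions: List[float]) -> Tuple[list, Optional[list], List[float], List[bool], dict]:
        self.episode_length_buf = [t + 1 for t in self.episode_length_buf]
        rewards = [-abs(a) for a in actions]  # penalize large actions
        dones = [t >= self.max_episode_length for t in self.episode_length_buf]
        self.reset([i for i, d in enumerate(dones) if d])  # auto-reset finished envs
        return self.get_observations(), self.get_privileged_observations(), rewards, dones, {}


env = CountingEnv()
obs = env.get_observations()
total_dones = 0
for _ in range(6):
    actions = [0.1] * env.num_envs  # stand-in for a policy's actions
    obs, priv_obs, rew, dones, extras = env.step(actions)
    total_dones += sum(dones)
print(total_dones)  # 4
```

Resetting finished environments inside `step` keeps the loop free of per-environment bookkeeping.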
--- a/humanoid/envs/__init__.py
+++ b/humanoid/envs/__init__.py
+# SPDX-FileCopyrightText: Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2021 ETH Zurich, Nikita Rudin
+# SPDX-FileCopyrightText: Copyright (c) 2024 Beijing RobotEra TECHNOLOGY CO.,LTD. All rights reserved.
+# SPDX-License-Identifier: BSD-3-Clause
+
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are met:
+#
+# 1. Redistributions of source code must retain the above copyright notice, this
+# list of conditions and the following disclaimer.
+#
+# 2. Redistributions in binary form must reproduce the above copyright notice,
+# this list of conditions and the following disclaimer in the documentation
+# and/or other materials provided with the distribution.
+#
+# 3. Neither the name of the copyright holder nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# Copyright (c) 2024, AgiBot Inc. All rights reserved.
+
+
+from humanoid import LEGGED_GYM_ROOT_DIR, LEGGED_GYM_ENVS_DIR
+from .base.legged_robot import LeggedRobot
+
+from .x1.x1_dh_stand_config import X1DHStandCfg, X1DHStandCfgPPO
+
+from .x1.x1_dh_stand_env import X1DHStandEnv
+
+from humanoid.utils.task_registry import task_registry
+
+task_registry.register("x1_dh_stand", X1DHStandEnv, X1DHStandCfg(), X1DHStandCfgPPO())
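Registration ties a task name to an environment class plus its environment and training configs. This is what lets a name such as `--task=x1_dh_stand` be looked up. The sketch below assumes a minimal registry built on name-keyed dictionaries. `SimpleTaskRegistry` and the `MyRobot*` classes are hypothetical stand-ins, not the actual implementation in `humanoid.utils.task_registry`.

```python
class SimpleTaskRegistry:
    """Hypothetical name -> (env class, env cfg, train cfg) registry."""

    def __init__(self):
        self.task_classes = {}
        self.env_cfgs = {}
        self.train_cfgs = {}

    def register(self, name, task_class, env_cfg, train_cfg):
        if name in self.task_classes:
            raise ValueError(f"task '{name}' is already registered")
        self.task_classes[name] = task_class
        self.env_cfgs[name] = env_cfg
        self.train_cfgs[name] = train_cfg

    def get_task_class(self, name):
        try:
            return self.task_classes[name]
        except KeyError:
            raise ValueError(f"unknown task '{name}'; registered: {sorted(self.task_classes)}") from None

    def get_cfgs(self, name):
        return self.env_cfgs[name], self.train_cfgs[name]


# Hypothetical new robot, following step 3 of "Add New Environments".
class MyRobotCfg: ...
class MyRobotCfgPPO: ...
class MyRobotEnv: ...


registry = SimpleTaskRegistry()
registry.register("my_robot_stand", MyRobotEnv, MyRobotCfg(), MyRobotCfgPPO())
env_cls = registry.get_task_class("my_robot_stand")
env_cfg, train_cfg = registry.get_cfgs("my_robot_stand")
```

Registering config instances rather than classes, as the line above does with `X1DHStandCfg()`, means each config's nested sections are already instantiated when the task is looked up.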
--- a/humanoid/envs/base/base_config.py
+++ b/humanoid/envs/base/base_config.py
+# SPDX-FileCopyrightText: Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2021 ETH Zurich, Nikita Rudin
+# SPDX-FileCopyrightText: Copyright (c) 2024 Beijing RobotEra TECHNOLOGY CO.,LTD. All rights reserved.
+# SPDX-License-Identifier: BSD-3-Clause
+
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are met:
+#
+# 1. Redistributions of source code must retain the above copyright notice, this
+# list of conditions and the following disclaimer.
+#
+# 2. Redistributions in binary form must reproduce the above copyright notice,
+# this list of conditions and the following disclaimer in the documentation
+# and/or other materials provided with the distribution.
+#
+# 3. Neither the name of the copyright holder nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# Copyright (c) 2024, AgiBot Inc. All rights reserved.
+
+import inspect
+
+class BaseConfig:
+    def __init__(self) -> None:
+        """Initializes all member classes recursively, replacing each nested class with an instance of it. Only the built-in `__class__` attribute is skipped."""
+        self.init_member_classes(self)
+
+    @staticmethod
+    def init_member_classes(obj):
+        # iterate over all attribute names
+        for key in dir(obj):
+            # disregard the built-in `__class__` attribute (the object's own type)
+            # if key.startswith("__"):
+            if key == "__class__":
+                continue
+            # get the corresponding attribute object
+            var = getattr(obj, key)
+            # check if the attribute is a class
+            if inspect.isclass(var):
+                # instantiate the class
+                i_var = var()
+                # set the attribute to the instance instead of the type
+                setattr(obj, key, i_var)
+                # recursively init members of the attribute
+                BaseConfig.init_member_classes(i_var)
\ No newline at end of file
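`BaseConfig.__init__` walks `dir(self)` and replaces every nested class with a fresh instance, recursively. The minimal sketch below re-declares the class so it runs on its own and applies it to a hypothetical `DemoCfg` (its fields are illustrative). It shows the practical effect as we read the code: each config object gets its own nested instances, so overriding a field on one object leaves other objects and the class-level defaults untouched.

```python
import inspect


class BaseConfig:
    def __init__(self) -> None:
        self.init_member_classes(self)

    @staticmethod
    def init_member_classes(obj):
        for key in dir(obj):
            if key == "__class__":  # skip the object's own type
                continue
            var = getattr(obj, key)
            if inspect.isclass(var):
                # Replace the nested class with an instance, then recurse into it.
                i_var = var()
                setattr(obj, key, i_var)
                BaseConfig.init_member_classes(i_var)


class DemoCfg(BaseConfig):  # hypothetical config
    class env:
        num_envs = 4096

        class noise:
            scale = 0.1


a = DemoCfg()
b = DemoCfg()
a.env.num_envs = 16  # sets an instance attribute on a.env only
```

Because the override lands on the instance `a.env`, `b.env.num_envs` and `DemoCfg.env.num_envs` still read the class default.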
--- a/humanoid/envs/base/base_task.py
+++ b/humanoid/envs/base/base_task.py
+# SPDX-FileCopyrightText: Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2021 ETH Zurich, Nikita Rudin
+# SPDX-FileCopyrightText: Copyright (c) 2024 Beijing RobotEra TECHNOLOGY CO.,LTD. All rights reserved.
+# SPDX-License-Identifier: BSD-3-Clause
+
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are met:
+#
+# 1. Redistributions of source code must retain the above copyright notice, this
+# list of conditions and the following disclaimer.
+#
+# 2. Redistributions in binary form must reproduce the above copyright notice,
+# this list of conditions and the following disclaimer in the documentation
+# and/or other materials provided with the distribution.
+#
+# 3. Neither the name of the copyright holder nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# Copyright (c) 2024, AgiBot Inc. All rights reserved.
+
+import sys
+from isaacgym import gymapi
+from isaacgym import gymutil
+import numpy as np
+import torch
+
+# Base class for RL tasks
+
+
+class BaseTask():
+
+    def __init__(self, cfg, sim_params, physics_engine, sim_device, headless):
+        self.gym = gymapi.acquire_gym()
+
+        self.sim_params = sim_params
+        self.physics_engine = physics_engine
+        self.sim_device = sim_device
+        sim_device_type, self.sim_device_id = gymutil.parse_device_str(
+            self.sim_device)
+        self.headless = headless
+
+        # env device is GPU only if sim is on GPU and use_gpu_pipeline=True, otherwise returned tensors are copied to CPU by PhysX.
+        if sim_device_type == 'cuda' and sim_params.use_gpu_pipeline:
+            self.device = self.sim_device
+        else:
+            self.device = 'cpu'
+
+        # graphics device for rendering, -1 for no rendering
+        self.graphics_device_id = self.sim_device_id
+        if self.headless:
+            self.graphics_device_id = -1
+
+        self.num_envs = cfg.env.num_envs
+        self.num_obs = cfg.env.num_observations
+        self.num_short_obs = int(cfg.env.num_single_obs * cfg.env.short_frame_stack)
+        self.num_privileged_obs = cfg.env.num_privileged_obs
+        self.num_actions = cfg.env.num_actions
+        self.num_single_obs = cfg.env.num_single_obs
+
+        # optimization flags for pytorch JIT
+        torch._C._jit_set_profiling_mode(False)
+        torch._C._jit_set_profiling_executor(False)
+
+        # allocate buffers
+        self.obs_buf = torch.zeros(
+            self.num_envs, self.num_obs, device=self.device, dtype=torch.float)
+        self.rew_buf = torch.zeros(
+            self.num_envs, device=self.device, dtype=torch.float)
+        # new reward buffers for exp rewards (negative and positive terms)
+        self.neg_reward_buf = torch.zeros(
+            self.num_envs, device=self.device, dtype=torch.float)
+        self.pos_reward_buf = torch.zeros(
+            self.num_envs, device=self.device, dtype=torch.float)
+
+        self.reset_buf = torch.ones(
+            self.num_envs, device=self.device, dtype=torch.long)
+        self.episode_length_buf = torch.zeros(
+            self.num_envs, device=self.device, dtype=torch.long)
+        self.time_out_buf = torch.zeros(
+            self.num_envs, device=self.device, dtype=torch.bool)
+        if self.num_privileged_obs is not None:
+            self.privileged_obs_buf = torch.zeros(
+                self.num_envs, self.num_privileged_obs, device=self.device, dtype=torch.float)
+        else:
+            self.privileged_obs_buf = None
+
+        self.extras = {}
+
+        # create envs, sim and viewer
+        self.create_sim()
+        self.gym.prepare_sim(self.sim)
+        self.enable_viewer_sync = True
+        self.viewer = None
+
+        # if running with a viewer, create it and set up keyboard shortcuts
+        if not self.headless:
+            self.viewer = self.gym.create_viewer(
+                self.sim, gymapi.CameraProperties())
+            # subscribe to keyboard shortcuts
+            self.gym.subscribe_viewer_keyboard_event(
+                self.viewer, gymapi.KEY_ESCAPE, "QUIT")
+            self.gym.subscribe_viewer_keyboard_event(
+                self.viewer, gymapi.KEY_V, "toggle_viewer_sync")
+
+        # attach a 720x480 camera sensor to the first env (created in both viewer and headless mode)
+        camera_properties = gymapi.CameraProperties()
+        camera_properties.width = 720
+        camera_properties.height = 480
+        self.camera_handle = self.gym.create_camera_sensor(
+            self.envs[0], camera_properties)
+
+    def get_observations(self):
+        return self.obs_buf
+
+    def get_privileged_observations(self):
+        return self.privileged_obs_buf
+
+    def get_rma_observations(self):
+        # rma_obs_buf is not allocated in BaseTask; a subclass must create it before this is called
+        return self.rma_obs_buf
+
+    def reset_idx(self, env_ids):
+        """Reset selected robots"""
+        raise NotImplementedError
+
+    def reset(self):
+        """ Reset all robots"""
+        self.reset_idx(torch.arange(self.num_envs, device=self.device))
+        obs, privileged_obs, _, _, _ = self.step(torch.zeros(
+            self.num_envs, self.num_actions, device=self.device, requires_grad=False))
+        return obs, privileged_obs
+
+    def step(self, actions):
+        raise NotImplementedError
+
+    def render(self, sync_frame_time=True):
+        if self.viewer:
+            # check for window closed
+            if self.gym.query_viewer_has_closed(self.viewer):
+                sys.exit()
+
+            # check for keyboard events
+            for evt in self.gym.query_viewer_action_events(self.viewer):
+                if evt.action == "QUIT" and evt.value > 0:
+                    sys.exit()
+                elif evt.action == "toggle_viewer_sync" and evt.value > 0:
+                    self.enable_viewer_sync = not self.enable_viewer_sync
+
+            # fetch results
+            if self.device != 'cpu':
+                self.gym.fetch_results(self.sim, True)
+
+            # step graphics
+            if self.enable_viewer_sync:
+                self.gym.step_graphics(self.sim)
+                self.gym.draw_viewer(self.viewer, self.sim, True)
+                if sync_frame_time:
+                    self.gym.sync_frame_time(self.sim)
+            else:
+                self.gym.poll_viewer_events(self.viewer)
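`BaseTask.__init__` derives two devices from its arguments: the tensor device for the env buffers and the graphics device used for rendering. The sketch below restates those two rules as a pure function so they can be checked without Isaac Gym. `parse_device_str` here is a simplified, hypothetical stand-in for `isaacgym.gymutil.parse_device_str`, not its actual implementation, and `select_devices` is an illustrative name.

```python
from typing import Tuple


def parse_device_str(device: str) -> Tuple[str, int]:
    """Hypothetical, simplified stand-in for isaacgym.gymutil.parse_device_str."""
    # "cuda:1" -> ("cuda", 1), "cuda" -> ("cuda", 0), anything else -> ("cpu", 0)
    if device.startswith("cuda"):
        _, _, index = device.partition(":")
        return "cuda", (int(index) if index else 0)
    return "cpu", 0


def select_devices(sim_device: str, use_gpu_pipeline: bool, headless: bool) -> Tuple[str, int]:
    """Return (env_device, graphics_device_id) following the rules in BaseTask.__init__."""
    sim_device_type, sim_device_id = parse_device_str(sim_device)
    # Buffers live on the GPU only if the sim runs on CUDA and the GPU pipeline is enabled;
    # otherwise PhysX copies results to the CPU, so the env buffers go there too.
    if sim_device_type == "cuda" and use_gpu_pipeline:
        env_device = sim_device
    else:
        env_device = "cpu"
    # -1 means no rendering (headless runs).
    graphics_device_id = -1 if headless else sim_device_id
    return env_device, graphics_device_id
```

For example, a CUDA sim with the GPU pipeline disabled still yields `"cpu"` as the env device, which is why the buffers allocated in `__init__` use `self.device` rather than `self.sim_device`.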
--- a/humanoid/envs/base/legged_robot.py
+++ b/humanoid/envs/base/legged_robot.py
--- a/humanoid/envs/base/legged_robot_config.py
+++ b/humanoid/envs/base/legged_robot_config.py
--- a/humanoid/envs/x1/x1_dh_stand_config.py
+++ b/humanoid/envs/x1/x1_dh_stand_config.py
--- a/humanoid/envs/x1/x1_dh_stand_env.py
+++ b/humanoid/envs/x1/x1_dh_stand_env.py
--- a/humanoid/scripts/export_onnx_dh.py
+++ b/humanoid/scripts/export_onnx_dh.py
+# SPDX-FileCopyrightText: Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2021 ETH Zurich, Nikita Rudin
+# SPDX-FileCopyrightText: Copyright (c) 2024 Beijing RobotEra TECHNOLOGY CO.,LTD. All rights reserved.
+# SPDX-License-Identifier: BSD-3-Clause
+
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are met:
+#
+# 1. Redistributions of source code must retain the above copyright notice, this
+# list of conditions and the following disclaimer.
+#
+# 2. Redistributions in binary form must reproduce the above copyright notice,
+# this list of conditions and the following disclaimer in the documentation
+# and/or other materials provided with the distribution.
+#
+# 3. Neither the name of the copyright holder nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# Copyright (c) 2024, AgiBot Inc. All rights reserved.
+
+from humanoid import LEGGED_GYM_ROOT_DIR
+import os
+from humanoid.envs import *
+from humanoid.utils import get_args, task_registry
+from datetime import datetime
+import torch
+
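+def get_load_path(root, load_run=-1, checkpoint=-1):
+    # Note: unlike humanoid.utils.helpers.get_load_path, this copy ignores `checkpoint`
+    # and returns the last file (by zero-padded name) in the selected run directory.
+    try:
+        runs = os.listdir(root)
+        runs.sort()
+        if "exported" in runs:
+            runs.remove("exported")
+        last_run = os.path.join(root, runs[-1])
+    except (OSError, IndexError):
+        raise ValueError("No runs in this directory: " + root)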
+    if load_run == -1:
+        load_run = last_run
+    else:
+        load_run = os.path.join(root, load_run)
+
+    models = [file for file in os.listdir(load_run)]
+    models.sort(key=lambda m: "{0:0>15}".format(m))
+    model = models[-1]
+
+    load_path = os.path.join(load_run, model)
+    return load_path
+
+def export_onnx(args):
+    env_cfg, train_cfg = task_registry.get_cfgs(name=args.task)
+    # load jit
+    log_root = os.path.join(LEGGED_GYM_ROOT_DIR, 'logs', train_cfg.runner.experiment_name, 'exported_policies')
+    model_path = get_load_path(log_root, load_run=args.load_run, checkpoint=args.checkpoint)
+    print("Load model from:", model_path)
+    jit_model = torch.jit.load(model_path)
+    jit_model.eval()
+    
+    current_date_time = datetime.now().strftime('%Y-%m-%d_%H-%M-%S')
+    root_path = os.path.join(LEGGED_GYM_ROOT_DIR, 'logs', 
+                            train_cfg.runner.experiment_name, 'exported_onnx',
+                            current_date_time)
+    os.makedirs(root_path, exist_ok=True)
+    file_name = args.task.split('_')[0] + "_policy.onnx"
+    path = os.path.join(root_path, file_name)
+    example_input = torch.randn(1, env_cfg.env.num_observations)
+    # export onnx model
+    torch.onnx.export(jit_model,               # JIT model
+                    example_input,             # model example input
+                    path,                      # model output path
+                    export_params=True,        # export model params
+                    opset_version=11,          # ONNX opset version
+                    do_constant_folding=True,  # optimize constant variable folding
+                    input_names=['input'],     # model input name
+                    output_names=['output'],   # model output name
+                    )
+    print("Export onnx model to: ", path)
+
+
+if __name__ == '__main__':
+    args = get_args()
+    if args.load_run is None:
+        args.load_run = -1
+    export_onnx(args)
\ No newline at end of file
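Both copies of `get_load_path` choose the newest checkpoint by sorting file names with the key `"{0:0>15}".format(m)`, which left-pads each name with zeros to 15 characters. Since the names share the `model_` prefix (the `model_{}.pt` pattern used in `helpers.get_load_path`), a shorter iteration number receives more leading zeros and sorts first; a plain string sort would instead rank `model_3.pt` or `model_20.pt` above `model_100.pt`. The sketch reproduces that key and, for comparison, a hypothetical alternative that parses the iteration number with a regular expression. Note that the padding trick no longer orders names correctly once they exceed 15 characters, and that the copy in `export_onnx_dh.py` does not filter on `"model"`.

```python
import re
from typing import List


def latest_checkpoint_padded(filenames: List[str]) -> str:
    """Same selection rule as helpers.get_load_path when checkpoint == -1."""
    models = [f for f in filenames if "model" in f]
    # Left-pad each name with zeros to width 15 so shorter numbers sort first.
    models.sort(key=lambda m: "{0:0>15}".format(m))
    return models[-1]


def latest_checkpoint_numeric(filenames: List[str]) -> str:
    """Hypothetical alternative: compare the parsed iteration number directly."""
    numbered = []
    for name in filenames:
        match = re.fullmatch(r"model_(\d+)\.pt", name)
        if match:
            numbered.append((int(match.group(1)), name))
    return max(numbered)[1]


files = ["model_20.pt", "model_100.pt", "model_3.pt", "events.out.tfevents"]
```

Both functions select `model_100.pt` here; the numeric version keeps working for arbitrarily long iteration numbers.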
--- a/humanoid/scripts/export_policy_dh.py
+++ b/humanoid/scripts/export_policy_dh.py
+# SPDX-FileCopyrightText: Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2021 ETH Zurich, Nikita Rudin
+# SPDX-FileCopyrightText: Copyright (c) 2024 Beijing RobotEra TECHNOLOGY CO.,LTD. All rights reserved.
+# SPDX-License-Identifier: BSD-3-Clause
+
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are met:
+#
+# 1. Redistributions of source code must retain the above copyright notice, this
+# list of conditions and the following disclaimer.
+#
+# 2. Redistributions in binary form must reproduce the above copyright notice,
+# this list of conditions and the following disclaimer in the documentation
+# and/or other materials provided with the distribution.
+#
+# 3. Neither the name of the copyright holder nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# Copyright (c) 2024, AgiBot Inc. All rights reserved.
+
+from humanoid import LEGGED_GYM_ROOT_DIR
+import os
+import copy
+
+from humanoid.envs import *
+from humanoid.utils import get_args, task_registry, Logger
+from humanoid.utils.helpers import get_load_path, class_to_dict
+from datetime import datetime
+
+import torch
+from humanoid.algo.ppo import ActorCriticDH
+
+
+class ExportedDH(torch.nn.Module):
+    def __init__(self, actor, long_history, state_estimator, num_short_obs, in_channels, num_proprio_obs):
+        super().__init__()
+        self.actor = copy.deepcopy(actor).cpu()
+        self.long_history = copy.deepcopy(long_history).cpu()
+        self.state_estimator = copy.deepcopy(state_estimator).cpu()
+        self.num_short_obs = num_short_obs
+        self.in_channels = in_channels
+        self.num_proprio_obs = num_proprio_obs
+    
+    def forward(self, observations):
+        short_history = observations[...,-self.num_short_obs:]
+        es_vel = self.state_estimator(short_history)
+        compressed_long_history = self.long_history(observations.view(-1, self.in_channels, self.num_proprio_obs))
+        actor_obs = torch.cat((short_history, es_vel, compressed_long_history),dim=-1)
+        actions_mean = self.actor(actor_obs)
+        return actions_mean
+    
+    def export(self, path):
+        self.to("cpu")
+        scripted_module = torch.jit.script(self)
+        scripted_module.save(path)
+    
+
+def export_policy(args):
+    env_cfg, train_cfg = task_registry.get_cfgs(name=args.task)
+    
+    train_cfg_dict = class_to_dict(train_cfg)
+    policy_cfg = train_cfg_dict["policy"]
+    num_critic_obs = env_cfg.env.num_privileged_obs
+    if env_cfg.terrain.measure_heights:
+        num_critic_obs = env_cfg.env.c_frame_stack * (env_cfg.env.single_num_privileged_obs + env_cfg.terrain.num_height)
+    num_short_obs = env_cfg.env.short_frame_stack * env_cfg.env.num_single_obs
+    actor_critic_class = eval(train_cfg_dict["runner"]["policy_class_name"])
+    actor_critic: ActorCriticDH = actor_critic_class(
+        num_short_obs, env_cfg.env.num_single_obs, num_critic_obs, env_cfg.env.num_actions, **policy_cfg
+    )
+    # load policy
+    log_root_encoder = os.path.join(LEGGED_GYM_ROOT_DIR, 'logs', train_cfg.runner.experiment_name, 'exported_data')
+    model_path = get_load_path(log_root_encoder, load_run=args.load_run, checkpoint=args.checkpoint)
+    print("Load model from:", model_path)
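+    # map tensors to CPU so the export also works on machines without the training GPU
+    loaded_dict = torch.load(model_path, map_location="cpu")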
+    actor_critic.load_state_dict(loaded_dict["model_state_dict"])
+    
+    exported_policy = ExportedDH(actor_critic.actor,
+                                 actor_critic.long_history,
+                                 actor_critic.state_estimator,
+                                 num_short_obs,
+                                 policy_cfg["in_channels"],
+                                 env_cfg.env.num_single_obs)
+
+    current_date_time = datetime.now().strftime('%Y-%m-%d_%H-%M-%S')
+    
+    root_path = os.path.join(LEGGED_GYM_ROOT_DIR, 'logs', 
+                            train_cfg.runner.experiment_name, 'exported_policies',
+                            current_date_time)
+    os.makedirs(root_path, exist_ok=True)
+    dir_name = "policy_dh.jit"
+    path = os.path.join(root_path, dir_name)
+    exported_policy.export(path)
+    print("Export policy to:", path)
+    
+if __name__ == '__main__':
+    EXPORT_POLICY = True
+    args = get_args()
+    if args.load_run is None:
+        args.load_run = -1
+    if args.checkpoint is None:
+        args.checkpoint = -1
+    export_policy(args)
+    
\ No newline at end of file
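`ExportedDH.forward` reads one flat observation in two ways: the last `num_short_obs` entries feed the state estimator, and the whole vector is reshaped to `(batch, in_channels, num_proprio_obs)` for the long-history encoder. The list-based sketch below shows which entries each path sees. It assumes (our assumption, not stated in the source) that the flat vector is `in_channels` frames of `num_single_obs` values concatenated with the newest frame last; `split_observation` is a hypothetical helper and the sizes are toy values.

```python
from typing import List, Tuple


def split_observation(obs: List[float], in_channels: int, num_proprio_obs: int,
                      short_frame_stack: int) -> Tuple[List[float], List[List[float]]]:
    """Mimic ExportedDH.forward's slicing for one flat observation vector."""
    # view(-1, in_channels, num_proprio_obs) only lines up if the sizes multiply out.
    if len(obs) != in_channels * num_proprio_obs:
        raise ValueError("flat observation must contain exactly in_channels frames")
    num_short_obs = short_frame_stack * num_proprio_obs
    # observations[..., -num_short_obs:]: the last `short_frame_stack` frames
    short_history = obs[-num_short_obs:]
    # observations.view(-1, in_channels, num_proprio_obs): one row per stacked frame
    long_rows = [obs[i * num_proprio_obs:(i + 1) * num_proprio_obs] for i in range(in_channels)]
    return short_history, long_rows


# Toy sizes: 4 stacked frames of 3 values each, short history of 2 frames.
short, long_rows = split_observation(list(range(12)), in_channels=4, num_proprio_obs=3,
                                     short_frame_stack=2)
```

Under this layout the short history is simply the last two rows of the long-history matrix, and the exported module expects each observation row to hold exactly `in_channels * num_single_obs` values.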
--- a/humanoid/scripts/play.py
+++ b/humanoid/scripts/play.py
--- a/humanoid/scripts/sim2sim.py
+++ b/humanoid/scripts/sim2sim.py
--- a/humanoid/scripts/train.py
+++ b/humanoid/scripts/train.py
+# SPDX-FileCopyrightText: Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2021 ETH Zurich, Nikita Rudin
+# SPDX-FileCopyrightText: Copyright (c) 2024 Beijing RobotEra TECHNOLOGY CO.,LTD. All rights reserved.
+# SPDX-License-Identifier: BSD-3-Clause
+
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are met:
+#
+# 1. Redistributions of source code must retain the above copyright notice, this
+# list of conditions and the following disclaimer.
+#
+# 2. Redistributions in binary form must reproduce the above copyright notice,
+# this list of conditions and the following disclaimer in the documentation
+# and/or other materials provided with the distribution.
+#
+# 3. Neither the name of the copyright holder nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# Copyright (c) 2024, AgiBot Inc. All rights reserved.
+
+
+from humanoid.envs import *
+from humanoid.utils import get_args, task_registry
+
+def train(args):
+    env, env_cfg = task_registry.make_env(name=args.task, args=args)
+    ppo_runner, train_cfg, log_dir = task_registry.make_alg_runner(env=env, name=args.task, args=args)
+    ppo_runner.learn(num_learning_iterations=train_cfg.runner.max_iterations, init_at_random_ep_len=False)
+
+if __name__ == '__main__':
+    args = get_args()
+    train(args)
--- a/humanoid/utils/__init__.py
+++ b/humanoid/utils/__init__.py
+# SPDX-FileCopyrightText: Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2021 ETH Zurich, Nikita Rudin
+# SPDX-FileCopyrightText: Copyright (c) 2024 Beijing RobotEra TECHNOLOGY CO.,LTD. All rights reserved.
+# SPDX-License-Identifier: BSD-3-Clause
+
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are met:
+#
+# 1. Redistributions of source code must retain the above copyright notice, this
+# list of conditions and the following disclaimer.
+#
+# 2. Redistributions in binary form must reproduce the above copyright notice,
+# this list of conditions and the following disclaimer in the documentation
+# and/or other materials provided with the distribution.
+#
+# 3. Neither the name of the copyright holder nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# Copyright (c) 2024, AgiBot Inc. All rights reserved.
+
+
+from .helpers import class_to_dict, get_load_path, get_args, export_policy_as_jit, set_seed, update_class_from_dict
+from .task_registry import task_registry
+from .logger import Logger
+from .math import *
+from .terrain import Terrain
\ No newline at end of file
--- a/humanoid/utils/helpers.py
+++ b/humanoid/utils/helpers.py
+# SPDX-FileCopyrightText: Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2021 ETH Zurich, Nikita Rudin
+# SPDX-FileCopyrightText: Copyright (c) 2024 Beijing RobotEra TECHNOLOGY CO.,LTD. All rights reserved.
+# SPDX-License-Identifier: BSD-3-Clause
+
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are met:
+#
+# 1. Redistributions of source code must retain the above copyright notice, this
+# list of conditions and the following disclaimer.
+#
+# 2. Redistributions in binary form must reproduce the above copyright notice,
+# this list of conditions and the following disclaimer in the documentation
+# and/or other materials provided with the distribution.
+#
+# 3. Neither the name of the copyright holder nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# Copyright (c) 2024, AgiBot Inc. All rights reserved.
+
+import os
+import copy
+import torch
+import numpy as np
+import random
+from isaacgym import gymapi
+from isaacgym import gymutil
+
+from humanoid import LEGGED_GYM_ROOT_DIR, LEGGED_GYM_ENVS_DIR
+
+
+def class_to_dict(obj) -> dict:
+    if not hasattr(obj, "__dict__"):
+        return obj
+    result = {}
+    for key in dir(obj):
+        if key.startswith("_"):
+            continue
+        element = []
+        val = getattr(obj, key)
+        if isinstance(val, list):
+            for item in val:
+                element.append(class_to_dict(item))
+        else:
+            element = class_to_dict(val)
+        result[key] = element
+    return result
+
+
+def update_class_from_dict(obj, dict):
+    for key, val in dict.items():
+        attr = getattr(obj, key, None)
+        if isinstance(attr, type):
+            update_class_from_dict(attr, val)
+        else:
+            setattr(obj, key, val)
+    return
+
+
+def set_seed(seed):
+    if seed == -1:
+        seed = np.random.randint(0, 10000)
+    print("Setting seed: {}".format(seed))
+
+    random.seed(seed)
+    np.random.seed(seed)
+    torch.manual_seed(seed)
+    os.environ["PYTHONHASHSEED"] = str(seed)
+    torch.cuda.manual_seed(seed)
+    torch.cuda.manual_seed_all(seed)
+    
+    # For cudnn backend to ensure reproducibility
+    torch.backends.cudnn.deterministic = True
+    torch.backends.cudnn.benchmark = False
+
+
+def parse_sim_params(args, cfg):
+    # code from Isaac Gym Preview 2
+    # initialize sim params
+    sim_params = gymapi.SimParams()
+
+    # set some values from args
+    if args.physics_engine == gymapi.SIM_FLEX:
+        if args.device != "cpu":
+            print("WARNING: Using Flex with GPU instead of PHYSX!")
+    elif args.physics_engine == gymapi.SIM_PHYSX:
+        sim_params.physx.use_gpu = args.use_gpu
+        sim_params.physx.num_subscenes = args.subscenes
+    sim_params.use_gpu_pipeline = args.use_gpu_pipeline
+
+    # if sim options are provided in cfg, parse them and update/override above:
+    if "sim" in cfg:
+        gymutil.parse_sim_config(cfg["sim"], sim_params)
+
+    # Override num_threads if passed on the command line
+    if args.physics_engine == gymapi.SIM_PHYSX and args.num_threads > 0:
+        sim_params.physx.num_threads = args.num_threads
+
+    return sim_params
+
+
+def get_load_path(root, load_run=-1, checkpoint=-1):
+    try:
+        runs = os.listdir(root)
+        runs.sort()
+        if "exported" in runs:
+            runs.remove("exported")
+        last_run = os.path.join(root, runs[-1])
+    except (OSError, IndexError):
+        raise ValueError("No runs in this directory: " + root)
+    if load_run == -1:
+        load_run = last_run
+    else:
+        load_run = os.path.join(root, load_run)
+
+    if checkpoint == -1:
+        models = [file for file in os.listdir(load_run) if "model" in file]
+        models.sort(key=lambda m: "{0:0>15}".format(m))
+        model = models[-1]
+    else:
+        model = "model_{}.pt".format(checkpoint)
+
+    load_path = os.path.join(load_run, model)
+    return load_path
+
+
+def update_cfg_from_args(env_cfg, cfg_train, args):
+    if env_cfg is not None:
+        # num envs
+        if args.num_envs is not None:
+            env_cfg.env.num_envs = args.num_envs
+    if cfg_train is not None:
+        # seed
+        if args.seed is not None:
+            cfg_train.seed = args.seed
+        # alg runner parameters
+        if args.max_iterations is not None:
+            cfg_train.runner.max_iterations = args.max_iterations
+        if args.resume:
+            cfg_train.runner.resume = args.resume
+        if args.experiment_name is not None:
+            cfg_train.runner.experiment_name = args.experiment_name
+        if args.run_name is not None:
+            cfg_train.runner.run_name = args.run_name
+        if args.load_run is not None:
+            cfg_train.runner.load_run = args.load_run
+        if args.checkpoint is not None:
+            cfg_train.runner.checkpoint = args.checkpoint
+
+    return env_cfg, cfg_train
+
+
+def get_args():
+    custom_parameters = [
+        {
+            "name": "--task",
+            "type": str,
+            "default": "x1_dh_stand",
+            "help": "Name of the registered task to run (e.g. x1_dh_stand).",
+        },
+        {
+            "name": "--resume",
+            "action": "store_true",
+            "default": False,
+            "help": "Resume training from a checkpoint",
+        },
+        {
+            "name": "--experiment_name",
+            "type": str,
+            "help": "Name of the experiment to run or load. Overrides config file if provided.",
+        },
+        {
+            "name": "--run_name",
+            "type": str,
+            "help": "Name of the run. Overrides config file if provided.",
+        },
+        {
+            "name": "--load_run",
+            "type": str,
+            "help": "Name of the run to load when resume=True. If omitted, the value from the config file is used (-1 there loads the last run).",
+        },
+        {
+            "name": "--checkpoint",
+            "type": int,
+            "help": "Saved model checkpoint number. If -1: will load the last checkpoint. Overrides config file if provided.",
+        },
+        {
+            "name": "--headless",
+            "action": "store_true",
+            "default": False,
+            "help": "Force display off at all times",
+        },
+        {
+            "name": "--horovod",
+            "action": "store_true",
+            "default": False,
+            "help": "Use horovod for multi-gpu training",
+        },
+        {
+            "name": "--rl_device",
+            "type": str,
+            "default": "cuda:0",
+            "help": "Device used by the RL algorithm (e.g. cpu, cuda:0, cuda:1)",
+        },
+        {
+            "name": "--num_envs",
+            "type": int,
+            "help": "Number of environments to create. Overrides config file if provided.",
+        },
+        {
+            "name": "--seed",
+            "type": int,
+            "help": "Random seed. Overrides config file if provided.",
+        },
+        {
+            "name": "--max_iterations",
+            "type": int,
+            "help": "Maximum number of training iterations. Overrides config file if provided.",
+        },
+    ]
+    # parse arguments
+    args = gymutil.parse_arguments(
+        description="RL Policy", custom_parameters=custom_parameters
+    )
+
+    # name alignment: expose the sim device under the names used by the envs
+    args.sim_device_id = args.compute_device_id
+    args.sim_device = args.sim_device_type
+    if args.sim_device == "cuda":
+        args.sim_device += f":{args.sim_device_id}"
+    return args
+
+
+def export_policy_as_jit(actor_critic, path):
+    # Export only the actor network as a TorchScript module on the CPU.
+    os.makedirs(path, exist_ok=True)
+    path = os.path.join(path, "policy_1.pt")
+    model = copy.deepcopy(actor_critic.actor).to("cpu")
+    scripted_module = torch.jit.script(model)
+    scripted_module.save(path)
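The override block at the end of `update_cfg_from_args`, together with the flags declared in `get_args`, follows one rule: a value flag overrides the config only when it was actually given (it is not `None`), while a `store_true` flag such as `--resume` can only switch a setting on. The sketch below reproduces that rule with plain `argparse` instead of `gymutil.parse_arguments`; `RunnerCfg`, `apply_overrides`, `sim_device_from` and the default values are hypothetical stand-ins, not part of this repository.

```python
import argparse
from dataclasses import dataclass


@dataclass
class RunnerCfg:
    # Hypothetical stand-in for cfg_train.runner; the defaults play the role of the config file.
    max_iterations: int = 3000
    resume: bool = False
    experiment_name: str = "default_experiment"
    load_run: str = "-1"
    checkpoint: int = -1


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="RL Policy")
    # Value flags default to None so "not given" can be told apart from any real value.
    parser.add_argument("--max_iterations", type=int)
    parser.add_argument("--experiment_name", type=str)
    parser.add_argument("--load_run", type=str)
    parser.add_argument("--checkpoint", type=int)
    # A store_true flag can only switch a setting on, never off.
    parser.add_argument("--resume", action="store_true", default=False)
    return parser


def apply_overrides(runner: RunnerCfg, args: argparse.Namespace) -> RunnerCfg:
    """Copy every CLI value that was actually provided onto the config (hypothetical helper)."""
    for name in ("max_iterations", "experiment_name", "load_run", "checkpoint"):
        value = getattr(args, name)
        if value is not None:
            setattr(runner, name, value)
    if args.resume:
        runner.resume = True
    return runner


def sim_device_from(device_type: str, device_id: int) -> str:
    # Mirrors the name-alignment step in get_args: only CUDA devices carry an index suffix.
    return f"{device_type}:{device_id}" if device_type == "cuda" else device_type


args = build_parser().parse_args(["--max_iterations", "500", "--resume"])
cfg = apply_overrides(RunnerCfg(), args)
print(cfg.max_iterations, cfg.resume, cfg.checkpoint)  # 500 True -1
print(sim_device_from("cuda", 0), sim_device_from("cpu", 0))  # cuda:0 cpu
```

Defaulting value flags to `None` is what lets an omitted flag leave the config file untouched; a non-`None` default would silently overwrite it.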
--- a/humanoid/utils/logger.py
+++ b/humanoid/utils/logger.py
--- a/humanoid/utils/math.py
+++ b/humanoid/utils/math.py
+# SPDX-FileCopyrightText: Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2021 ETH Zurich, Nikita Rudin
+# SPDX-FileCopyrightText: Copyright (c) 2024 Beijing RobotEra TECHNOLOGY CO.,LTD. All rights reserved.
+# SPDX-License-Identifier: BSD-3-Clause
+
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are met:
+#
+# 1. Redistributions of source code must retain the above copyright notice, this
+# list of conditions and the following disclaimer.
+#
+# 2. Redistributions in binary form must reproduce the above copyright notice,
+# this list of conditions and the following disclaimer in the documentation
+# and/or other materials provided with the distribution.
+#
+# 3. Neither the name of the copyright holder nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# Copyright (c) 2024, AgiBot Inc. All rights reserved.
+
+import torch
+from torch import Tensor
+import numpy as np
+from isaacgym.torch_utils import quat_apply, normalize
+from typing import Tuple
+
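+# @ torch.jit.script
+def quat_apply_yaw(quat, vec):
+    # Keep only the heading of the (x, y, z, w) quaternion by zeroing x and y,
+    # then rotate vec by the resulting yaw-only rotation.
+    quat_yaw = quat.clone().view(-1, 4)
+    quat_yaw[:, :2] = 0.
+    quat_yaw = normalize(quat_yaw)
+    return quat_apply(quat_yaw, vec)
+
+# @ torch.jit.script
+def wrap_to_pi(angles):
+    # Wrap angles to (-pi, pi]. The augmented assignments can modify the input tensor in place.
+    angles %= 2*np.pi
+    angles -= 2*np.pi * (angles > np.pi)
+    return angles
+
+# @ torch.jit.script
+def torch_rand_sqrt_float(lower, upper, shape, device):
+    # type: (float, float, Tuple[int, int], str) -> Tensor
+    # Sample in [lower, upper) with a signed-sqrt density that favors values near the bounds.
+    r = 2*torch.rand(*shape, device=device) - 1
+    r = torch.where(r<0., -torch.sqrt(-r), torch.sqrt(r))
+    r = (r + 1.) / 2.
+    return (upper - lower) * r + lower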
\ No newline at end of file
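To make the math in `math.py` easier to check, here is a scalar, stdlib-only sketch of the same operations. It assumes the isaacgym `(x, y, z, w)` quaternion ordering and re-implements `quat_apply` with the same `v + w*t + xyz × t` formula; `sqrt_biased` is a hypothetical helper that maps a single uniform sample exactly the way `torch_rand_sqrt_float` maps `torch.rand`. Note that the tensor version of `wrap_to_pi` uses augmented assignment (`%=`, `-=`), which for PyTorch tensors generally works in place, so callers that still need the unwrapped angles may want to pass a clone.

```python
import math
from typing import Tuple

Quat = Tuple[float, float, float, float]  # (x, y, z, w), assumed isaacgym ordering
Vec3 = Tuple[float, float, float]


def wrap_to_pi(angle: float) -> float:
    """Scalar version of wrap_to_pi: maps any angle into (-pi, pi]."""
    angle %= 2 * math.pi  # Python's % follows the divisor's sign, like torch.remainder
    if angle > math.pi:
        angle -= 2 * math.pi
    return angle


def quat_apply(q: Quat, v: Vec3) -> Vec3:
    """Rotate v by unit quaternion q: v' = v + w*t + xyz x t, with t = 2 * (xyz x v)."""
    x, y, z, w = q
    tx = 2 * (y * v[2] - z * v[1])
    ty = 2 * (z * v[0] - x * v[2])
    tz = 2 * (x * v[1] - y * v[0])
    return (
        v[0] + w * tx + (y * tz - z * ty),
        v[1] + w * ty + (z * tx - x * tz),
        v[2] + w * tz + (x * ty - y * tx),
    )


def quat_apply_yaw(q: Quat, v: Vec3) -> Vec3:
    """Zero the x/y components, renormalize, and rotate: only the heading is kept."""
    _, _, z, w = q
    n = math.hypot(z, w)  # assumes q is not a pure 180-degree tilt (z = w = 0)
    return quat_apply((0.0, 0.0, z / n, w / n), v)


def sqrt_biased(u: float, lower: float, upper: float) -> float:
    """Map one uniform sample u in [0, 1) the way torch_rand_sqrt_float maps torch.rand."""
    r = 2 * u - 1                                  # [0, 1) -> [-1, 1)
    r = -math.sqrt(-r) if r < 0 else math.sqrt(r)  # signed sqrt pushes mass toward +/-1
    r = (r + 1) / 2                                # back to [0, 1)
    return (upper - lower) * r + lower


# A body yawed by 90 degrees and then rolled by 30 degrees: q = q_yaw * q_roll.
yaw, roll = math.pi / 2, math.pi / 6
s, c = math.sin(yaw / 2), math.cos(yaw / 2)
sr, cr = math.sin(roll / 2), math.cos(roll / 2)
q = (c * sr, s * sr, s * cr, c * cr)

heading = quat_apply_yaw(q, (1.0, 0.0, 0.0))  # approximately (0, 1, 0)
up_yaw_only = quat_apply_yaw(q, (0.0, 0.0, 1.0))  # roll dropped: stays approximately (0, 0, 1)
up_full = quat_apply(q, (0.0, 0.0, 1.0))  # full rotation: approximately (0.5, 0, 0.866)
```

Projecting to yaw is what lets heading-relative quantities ignore the body's tilt: the full rotation tips the up-axis, while the yaw-only rotation leaves it vertical.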
--- a/humanoid/utils/task_registry.py
+++ b/humanoid/utils/task_registry.py
+# SPDX-FileCopyrightText: Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2021 ETH Zurich, Nikita Rudin
+# SPDX-FileCopyrightText: Copyright (c) 2024 Beijing RobotEra TECHNOLOGY CO.,LTD. All rights reserved.
+# SPDX-License-Identifier: BSD-3-Clause
+
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are met:
+#
+# 1. Redistributions of source code must retain the above copyright notice, this
+# list of conditions and the following disclaimer.
+#
+# 2. Redistributions in binary form must reproduce the above copyright notice,
+# this list of conditions and the following disclaimer in the documentation
+# and/or other materials provided with the distribution.
+#
+# 3. Neither the name of the copyright holder nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# Copyright (c) 2024, AgiBot Inc. All rights reserved.
+
+
+import os
+from typing import Tuple
+from datetime import datetime
+
+from humanoid.algo import VecEnv
+from humanoid.algo import DHOnPolicyRunner
+
+from humanoid import LEGGED_GYM_ROOT_DIR, LEGGED_GYM_ENVS_DIR
+from .helpers import get_args, update_cfg_from_args, class_to_dict, get_load_path, set_seed, parse_sim_params
+from humanoid.envs.base.legged_robot_config import LeggedRobotCfg, LeggedRobotCfgPPO
+
+class TaskRegistry():
+    def __init__(self):
+        self.task_classes = {}
+        self.env_cfgs = {}
+        self.train_cfgs = {}
+    
+    def register(self, name: str, task_class: VecEnv, env_cfg: LeggedRobotCfg, train_cfg: LeggedRobotCfgPPO):
+        self.task_classes[name] = task_class
+        self.env_cfgs[name] = env_cfg
+        self.train_cfgs[name] = train_cfg
+    
+    def get_task_class(self, name: str) -> VecEnv:
+        return self.task_classes[name]
+    
+    def get_cfgs(self, name) -> Tuple[LeggedRobotCfg, LeggedRobotCfgPPO]:
+        train_cfg = self.train_cfgs[name]
+        env_cfg = self.env_cfgs[name]
+        # copy seed
+        env_cfg.seed = train_cfg.seed
+        return env_cfg, train_cfg
+    
+    def make_env(self, name, args=None, env_cfg=None) -> Tuple[VecEnv, LeggedRobotCfg]:
+        """ Creates an environment either from a registered name or from the provided config file.
+
+        Args:
+            name (string): Name of a registered env.
+            args (Args, optional): Isaac Gym command line arguments. If None get_args() will be called. Defaults to None.
+            env_cfg (Dict, optional): Environment config file used to override the registered config. Defaults to None.
+
+        Raises:
+            ValueError: Error if no registered env corresponds to 'name'
+
+        Returns:
+            isaacgym.VecTaskPython: The created environment
+            Dict: the corresponding config file
+        """
+        # if no args passed get command line arguments
+        if args is None:
+            args = get_args()
+        # check if there is a registered env with that name
+        if name in self.task_classes:
+            task_class = self.get_task_class(name)
+        else:
+            raise ValueError(f"Task with name: {name} was not registered")
+        if env_cfg is None:
+            # load config files
+            env_cfg, _ = self.get_cfgs(name)
+        # override cfg from args (if specified)
+        env_cfg, _ = update_cfg_from_args(env_cfg, None, args)
+        set_seed(env_cfg.seed)
+        # parse sim params (convert to dict first)
+        sim_params = {"sim": class_to_dict(env_cfg.sim)}
+        sim_params = parse_sim_params(args, sim_params)
+        env = task_class(cfg=env_cfg,
+                         sim_params=sim_params,
+                         physics_engine=args.physics_engine,
+                         sim_device=args.sim_device,
+                         headless=args.headless)
+        # make_alg_runner() merges this config into the run config, so make_env() must be called first
+        self.env_cfg_for_wandb = env_cfg
+        return env, env_cfg
+
+    def make_alg_runner(self, env, name=None, args=None, train_cfg=None, log_root="default") -> Tuple[DHOnPolicyRunner, LeggedRobotCfgPPO, str]:
+        """ Creates the training algorithm either from a registered name or from the provided config file.
+
+        Args:
+            env (isaacgym.VecTaskPython): The environment to train (TODO: remove from within the algorithm)
+            name (string, optional): Name of a registered env. If None, the config file will be used instead. Defaults to None.
+            args (Args, optional): Isaac Gym command line arguments. If None get_args() will be called. Defaults to None.
+            train_cfg (Dict, optional): Training config file. If None 'name' will be used to get the config file. Defaults to None.
+            log_root (str, optional): Logging directory for Tensorboard. Set to None to avoid logging (at test time for example).
+                                      Logs will be saved in <log_root>/<date_time><run_name>. Defaults to "default"=<path_to_LEGGED_GYM>/logs/<experiment_name>/exported_data.
+
+        Raises:
+            ValueError: Error if neither 'name' nor 'train_cfg' is provided
+
+        Note:
+            If both 'name' and 'train_cfg' are provided, 'name' is ignored and a message is printed.
+            make_env() must have been called first, since its env config is merged into the runner config.
+
+        Returns:
+            DHOnPolicyRunner: The created runner
+            LeggedRobotCfgPPO: The corresponding training config
+            str: The log directory (None if log_root is None)
+        """
+        # if no args passed get command line arguments
+        if args is None:
+            args = get_args()
+        # if config files are passed use them, otherwise load from the name
+        if train_cfg is None:
+            if name is None:
+                raise ValueError("Either 'name' or 'train_cfg' must be not None")
+            # load config files
+            _, train_cfg = self.get_cfgs(name)
+        else:
+            if name is not None:
+                print(f"'train_cfg' provided -> Ignoring 'name={name}'")
+        # override cfg from args (if specified)
+        _, train_cfg = update_cfg_from_args(None, train_cfg, args)
+
+        current_date_time_str = datetime.now().strftime('%Y-%m-%d_%H-%M-%S')
+
+        if log_root=="default":
+            log_root = os.path.join(LEGGED_GYM_ROOT_DIR, 'logs', train_cfg.runner.experiment_name, 'exported_data')
+            log_dir = os.path.join(log_root, current_date_time_str + train_cfg.runner.run_name)
+        elif log_root is None:
+            log_dir = None
+        else:
+            log_dir = os.path.join(log_root, current_date_time_str + train_cfg.runner.run_name)
+        
+        train_cfg_dict = class_to_dict(train_cfg)
+        env_cfg_dict = class_to_dict(self.env_cfg_for_wandb)
+        # merged run config; on key collisions the env config entries win
+        all_cfg = {**train_cfg_dict, **env_cfg_dict}
+        
+        runner_class = eval(train_cfg_dict["runner_class_name"])
+        runner = runner_class(env, all_cfg, log_dir, device=args.rl_device)
+        # optionally restore weights from a previous run under log_root
+        resume = train_cfg.runner.resume
+        if resume:
+            # load previously trained model
+            resume_path = get_load_path(log_root, load_run=train_cfg.runner.load_run, checkpoint=train_cfg.runner.checkpoint)
+            print(f"Loading model from: {resume_path}")
+            runner.load(resume_path, load_optimizer=False)
+        return runner, train_cfg, log_dir
+
+# make global task registry
+task_registry = TaskRegistry()
\ No newline at end of file
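The `TaskRegistry` pattern above maps a task name to an env class plus its two config objects, copies the training seed into the env config, and builds the runner from a class name stored in the config. The stdlib-only sketch below mirrors that flow under stated assumptions: `EnvCfg`, `TrainCfg`, `DummyEnv`, `DummyRunner`, `MiniRegistry` and the `RUNNERS` table are hypothetical stand-ins, and it swaps the original `eval(train_cfg_dict["runner_class_name"])` for an explicit name-to-class dictionary, which is a design choice of this sketch rather than what the repository does.

```python
import os
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional, Tuple


@dataclass
class EnvCfg:  # hypothetical stand-in for LeggedRobotCfg
    seed: int = 0


@dataclass
class TrainCfg:  # hypothetical stand-in for LeggedRobotCfgPPO
    seed: int = 1
    runner_class_name: str = "DummyRunner"
    run_name: str = ""


class DummyEnv:  # hypothetical env class
    def __init__(self, cfg: EnvCfg):
        self.cfg = cfg


class DummyRunner:  # hypothetical runner class
    def __init__(self, env, log_dir: Optional[str]):
        self.env, self.log_dir = env, log_dir


# Explicit name -> class table, used here instead of eval() on a config string.
RUNNERS: Dict[str, type] = {"DummyRunner": DummyRunner}


class MiniRegistry:
    def __init__(self) -> None:
        self.task_classes: Dict[str, type] = {}
        self.env_cfgs: Dict[str, EnvCfg] = {}
        self.train_cfgs: Dict[str, TrainCfg] = {}

    def register(self, name: str, task_class: type, env_cfg: EnvCfg, train_cfg: TrainCfg) -> None:
        self.task_classes[name] = task_class
        self.env_cfgs[name] = env_cfg
        self.train_cfgs[name] = train_cfg

    def get_cfgs(self, name: str) -> Tuple[EnvCfg, TrainCfg]:
        env_cfg, train_cfg = self.env_cfgs[name], self.train_cfgs[name]
        env_cfg.seed = train_cfg.seed  # the training config's seed wins
        return env_cfg, train_cfg

    def make_env(self, name: str):
        if name not in self.task_classes:
            raise ValueError(f"Task with name: {name} was not registered")
        env_cfg, _ = self.get_cfgs(name)
        return self.task_classes[name](env_cfg), env_cfg

    def make_runner(self, env, name: str, log_root: Optional[str]):
        _, train_cfg = self.get_cfgs(name)
        log_dir = None
        if log_root is not None:
            stamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
            # Same naming as make_alg_runner: timestamp directly followed by run_name.
            log_dir = os.path.join(log_root, stamp + train_cfg.run_name)
        runner = RUNNERS[train_cfg.runner_class_name](env, log_dir)
        return runner, train_cfg, log_dir


registry = MiniRegistry()
registry.register("demo_task", DummyEnv, EnvCfg(), TrainCfg(seed=42))
env, env_cfg = registry.make_env("demo_task")
runner, train_cfg, log_dir = registry.make_runner(env, "demo_task", log_root=None)
print(env_cfg.seed, log_dir)  # 42 None
```

A lookup table fails loudly with a `KeyError` on an unknown runner name and cannot execute arbitrary expressions from a config file, at the cost of having to register each runner class explicitly.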
--- a/humanoid/utils/terrain.py
+++ b/humanoid/utils/terrain.py
--- a/resources/robots/x1/meshes/arm_r_wrist_a_ball.STL
+++ b/resources/robots/x1/meshes/arm_r_wrist_a_ball.STL
--- a/resources/robots/x1/meshes/arm_r_wrist_a_ball_simple.STL
+++ b/resources/robots/x1/meshes/arm_r_wrist_a_ball_simple.STL
--- a/resources/robots/x1/meshes/arm_r_wrist_a_loop.STL
+++ b/resources/robots/x1/meshes/arm_r_wrist_a_loop.STL
--- a/resources/robots/x1/meshes/arm_r_wrist_b_ball.STL
+++ b/resources/robots/x1/meshes/arm_r_wrist_b_ball.STL
--- a/resources/robots/x1/meshes/arm_r_wrist_b_loop.STL
+++ b/resources/robots/x1/meshes/arm_r_wrist_b_loop.STL
--- a/resources/robots/x1/meshes/arm_r_wrist_motor_a_link.STL
+++ b/resources/robots/x1/meshes/arm_r_wrist_motor_a_link.STL
--- a/resources/robots/x1/meshes/arm_r_wrist_motor_b_link.STL
+++ b/resources/robots/x1/meshes/arm_r_wrist_motor_b_link.STL
--- a/resources/robots/x1/meshes/base_link_simple.STL
+++ b/resources/robots/x1/meshes/base_link_simple.STL
--- a/resources/robots/x1/meshes/left_ankle_pitch.STL
+++ b/resources/robots/x1/meshes/left_ankle_pitch.STL
--- a/resources/robots/x1/meshes/left_ankle_roll.STL
+++ b/resources/robots/x1/meshes/left_ankle_roll.STL
--- a/resources/robots/x1/meshes/left_elbow_pitch.STL
+++ b/resources/robots/x1/meshes/left_elbow_pitch.STL
--- a/resources/robots/x1/meshes/left_elbow_yaw.STL
+++ b/resources/robots/x1/meshes/left_elbow_yaw.STL
--- a/resources/robots/x1/meshes/left_hip_pitch.STL
+++ b/resources/robots/x1/meshes/left_hip_pitch.STL
--- a/resources/robots/x1/meshes/left_hip_roll.STL
+++ b/resources/robots/x1/meshes/left_hip_roll.STL
--- a/resources/robots/x1/meshes/left_hip_yaw.STL
+++ b/resources/robots/x1/meshes/left_hip_yaw.STL
--- a/resources/robots/x1/meshes/left_knee_pitch.STL
+++ b/resources/robots/x1/meshes/left_knee_pitch.STL
--- a/resources/robots/x1/meshes/left_shoulder_pitch.STL
+++ b/resources/robots/x1/meshes/left_shoulder_pitch.STL
--- a/resources/robots/x1/meshes/left_shoulder_roll.STL
+++ b/resources/robots/x1/meshes/left_shoulder_roll.STL
--- a/resources/robots/x1/meshes/left_shoulder_yaw.STL
+++ b/resources/robots/x1/meshes/left_shoulder_yaw.STL
--- a/resources/robots/x1/meshes/left_wrist_pitch.STL
+++ b/resources/robots/x1/meshes/left_wrist_pitch.STL
--- a/resources/robots/x1/meshes/left_wrist_roll.STL
+++ b/resources/robots/x1/meshes/left_wrist_roll.STL
--- a/resources/robots/x1/meshes/leg_l_toe_a_ball.STL
+++ b/resources/robots/x1/meshes/leg_l_toe_a_ball.STL
--- a/resources/robots/x1/meshes/leg_l_toe_a_link.STL
+++ b/resources/robots/x1/meshes/leg_l_toe_a_link.STL
--- a/resources/robots/x1/meshes/leg_l_toe_a_loop.STL
+++ b/resources/robots/x1/meshes/leg_l_toe_a_loop.STL
--- a/resources/robots/x1/meshes/leg_l_toe_b_ball.STL
+++ b/resources/robots/x1/meshes/leg_l_toe_b_ball.STL
--- a/resources/robots/x1/meshes/leg_l_toe_b_link.STL
+++ b/resources/robots/x1/meshes/leg_l_toe_b_link.STL
--- a/resources/robots/x1/meshes/leg_l_toe_b_loop.STL
+++ b/resources/robots/x1/meshes/leg_l_toe_b_loop.STL
--- a/resources/robots/x1/meshes/leg_r_toe_a_ball.STL
+++ b/resources/robots/x1/meshes/leg_r_toe_a_ball.STL
--- a/resources/robots/x1/meshes/leg_r_toe_a_link.STL
+++ b/resources/robots/x1/meshes/leg_r_toe_a_link.STL
--- a/resources/robots/x1/meshes/leg_r_toe_a_loop.STL
+++ b/resources/robots/x1/meshes/leg_r_toe_a_loop.STL
--- a/resources/robots/x1/meshes/leg_r_toe_b_ball.STL
+++ b/resources/robots/x1/meshes/leg_r_toe_b_ball.STL
--- a/resources/robots/x1/meshes/leg_r_toe_b_link.STL
+++ b/resources/robots/x1/meshes/leg_r_toe_b_link.STL
--- a/resources/robots/x1/meshes/leg_r_toe_b_loop.STL
+++ b/resources/robots/x1/meshes/leg_r_toe_b_loop.STL
--- a/resources/robots/x1/meshes/lumber_pitch.STL
+++ b/resources/robots/x1/meshes/lumber_pitch.STL
--- a/resources/robots/x1/meshes/lumber_roll.STL
+++ b/resources/robots/x1/meshes/lumber_roll.STL
--- a/resources/robots/x1/meshes/lumber_yaw.STL
+++ b/resources/robots/x1/meshes/lumber_yaw.STL
--- a/resources/robots/x1/meshes/right_ankle_pitch.STL
+++ b/resources/robots/x1/meshes/right_ankle_pitch.STL
--- a/resources/robots/x1/meshes/right_ankle_roll.STL
+++ b/resources/robots/x1/meshes/right_ankle_roll.STL
--- a/resources/robots/x1/meshes/right_elbow_pitch.STL
+++ b/resources/robots/x1/meshes/right_elbow_pitch.STL
--- a/resources/robots/x1/meshes/right_elbow_yaw.STL
+++ b/resources/robots/x1/meshes/right_elbow_yaw.STL
--- a/resources/robots/x1/meshes/right_hand_down_1_link.STL
+++ b/resources/robots/x1/meshes/right_hand_down_1_link.STL
--- a/resources/robots/x1/meshes/right_hand_down_2_link.STL
+++ b/resources/robots/x1/meshes/right_hand_down_2_link.STL
--- a/resources/robots/x1/meshes/right_hand_down_3_link.STL
+++ b/resources/robots/x1/meshes/right_hand_down_3_link.STL
--- a/resources/robots/x1/meshes/right_hand_down_loop1_link.STL
+++ b/resources/robots/x1/meshes/right_hand_down_loop1_link.STL
--- a/resources/robots/x1/meshes/right_hand_down_loop2_link.STL
+++ b/resources/robots/x1/meshes/right_hand_down_loop2_link.STL
--- a/resources/robots/x1/meshes/right_hand_up_1_link.STL
+++ b/resources/robots/x1/meshes/right_hand_up_1_link.STL
--- a/resources/robots/x1/meshes/right_hand_up_2_link.STL
+++ b/resources/robots/x1/meshes/right_hand_up_2_link.STL
--- a/resources/robots/x1/meshes/right_hand_up_3_link.STL
+++ b/resources/robots/x1/meshes/right_hand_up_3_link.STL
--- a/resources/robots/x1/meshes/right_hand_up_loop1_link.STL
+++ b/resources/robots/x1/meshes/right_hand_up_loop1_link.STL
--- a/resources/robots/x1/meshes/right_hand_up_loop2_link.STL
+++ b/resources/robots/x1/meshes/right_hand_up_loop2_link.STL
--- a/resources/robots/x1/meshes/right_hip_pitch.STL
+++ b/resources/robots/x1/meshes/right_hip_pitch.STL
--- a/resources/robots/x1/meshes/right_hip_roll.STL
+++ b/resources/robots/x1/meshes/right_hip_roll.STL
--- a/resources/robots/x1/meshes/right_hip_yaw.STL
+++ b/resources/robots/x1/meshes/right_hip_yaw.STL
--- a/resources/robots/x1/meshes/right_knee_pitch.STL
+++ b/resources/robots/x1/meshes/right_knee_pitch.STL
--- a/resources/robots/x1/meshes/right_shoulder_pitch.STL
+++ b/resources/robots/x1/meshes/right_shoulder_pitch.STL
--- a/resources/robots/x1/meshes/right_shoulder_roll.STL
+++ b/resources/robots/x1/meshes/right_shoulder_roll.STL
--- a/resources/robots/x1/meshes/right_shoulder_yaw.STL
+++ b/resources/robots/x1/meshes/right_shoulder_yaw.STL
--- a/resources/robots/x1/meshes/right_wrist_pitch.STL
+++ b/resources/robots/x1/meshes/right_wrist_pitch.STL
--- a/resources/robots/x1/meshes/right_wrist_roll.STL
+++ b/resources/robots/x1/meshes/right_wrist_roll.STL
--- a/resources/robots/x1/meshes/waist_motor_a_ball.STL
+++ b/resources/robots/x1/meshes/waist_motor_a_ball.STL
--- a/resources/robots/x1/meshes/waist_motor_a_link.STL
+++ b/resources/robots/x1/meshes/waist_motor_a_link.STL
--- a/resources/robots/x1/meshes/waist_motor_a_loop.STL
+++ b/resources/robots/x1/meshes/waist_motor_a_loop.STL
--- a/resources/robots/x1/meshes/waist_motor_b_ball.STL
+++ b/resources/robots/x1/meshes/waist_motor_b_ball.STL
--- a/resources/robots/x1/meshes/waist_motor_b_link.STL
+++ b/resources/robots/x1/meshes/waist_motor_b_link.STL
--- a/resources/robots/x1/meshes/waist_motor_b_loop.STL
+++ b/resources/robots/x1/meshes/waist_motor_b_loop.STL
--- a/resources/robots/x1/mjcf/environment/flat.xml
+++ b/resources/robots/x1/mjcf/environment/flat.xml
+<mujoco model="flat">
+  <statistic center="0 0 0.55" extent="1.1"/>
+
+  <visual>
+    <headlight diffuse="0.6 0.6 0.6" ambient="0.3 0.3 0.3" specular="0 0 0"/>
+    <rgba haze="0.15 0.25 0.35 1"/>
+    <global azimuth="150" elevation="-20"/>
+  </visual>
+  <visual>
+    <rgba com="0.502 1.0 0 0.5" contactforce="0.98 0.4 0.4 1" contactpoint="1.0 1.0 0.6 0.4"/>
+    <scale com="1" forcewidth="0.03" contactwidth="0.01" contactheight="0.02" framewidth="0.05" framelength="0.6"/>
+    <map force="0.005"/>
+  </visual>
+  <asset>
+    <texture name="skybox" type="skybox" builtin="gradient" rgb1="0.2 0.3 0.4" rgb2="0 0 0" width="1000" height="1000" mark="random" random="0.001" markrgb="1 1 1"/>
+    <texture type="2d" name="groundplane" builtin="checker" mark="edge" rgb1="0.2 0.3 0.4" rgb2="0.1 0.2 0.3" markrgb="0.8 0.8 0.8" width="1000" height="1000"/>
+    <material name="groundplane" texture="groundplane" texuniform="true" texrepeat="5 5" reflectance="0.2"/>
+  </asset>
+
+  <worldbody>
+    <light pos="0 0 10" dir="0 0 -1" directional="true"/>
+    <geom name="floor" size="0 3 .125" type="plane" material="groundplane" conaffinity="7" condim="3" friction="1"/>
+  </worldbody>
+</mujoco>
--- a/resources/robots/x1/mjcf/robot/xyber_x1/xyber_x1_serial.xml
+++ b/resources/robots/x1/mjcf/robot/xyber_x1/xyber_x1_serial.xml
--- a/resources/robots/x1/mjcf/xyber_x1_flat.xml
+++ b/resources/robots/x1/mjcf/xyber_x1_flat.xml
--- a/resources/robots/x1/urdf/x1.urdf
+++ b/resources/robots/x1/urdf/x1.urdf
--- a/setup.py
+++ b/setup.py