Commit e6651b9a by richie.li

first commit

# These are some examples of commonly ignored file patterns.
# You should customize this list as applicable to your project.
# Learn more about .gitignore:
# https://www.atlassian.com/git/tutorials/saving-changes/gitignore
# Node artifact files
node_modules/
dist/
# Compiled Java class files
*.class
# Compiled Python bytecode
*.py[cod]
# Log files
*.log
# Package files
*.jar
# Maven
target/
dist/
# JetBrains IDE
.idea/
# Unit test reports
TEST*.xml
# Generated by macOS
.DS_Store
# Generated by Windows
Thumbs.db
# Applications
*.app
*.exe
*.war
# Large media files
*.mp4
*.tiff
*.avi
*.flv
*.mov
*.wmv
# VS Code
.vscode
# logs
logs
runs
# other
*.egg-info
__pycache__
*.swp
MUJOCO_LOG.TXT
English | [中文](README.zh_CN.md)
## Introduction
[AgiBot X1](https://www.zhiyuan-robot.com/qzproduct/169.html) is a modular, high-DoF humanoid robot developed and open-sourced by AgiBot. Its software stack is built on AgiBot's open-source framework `AimRT` as middleware, and its locomotion control is trained with reinforcement learning.
This project provides the reinforcement learning training code used by AgiBot X1. It can be used together with the [inference software](https://aimrt.org/) that ships with AgiBot X1 for walking debugging on the real robot and in simulation, or adapted to other robot models for training.
![](doc/id.jpg)
## Getting Started
### Install Dependencies
1. Create a new Python 3.8 virtual environment:
- `conda create -n myenv python=3.8`.
2. Install pytorch 1.13 and cuda-11.7:
- `conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia`
3. Install numpy-1.23:
- `conda install numpy=1.23`.
4. Install Isaac Gym:
- Download and install Isaac Gym Preview 4 from https://developer.nvidia.com/isaac-gym.
- `cd isaacgym/python && pip install -e .`
- Run an example with `cd examples && python 1080_balls_of_solitude.py`.
- Consult `isaacgym/docs/index.html` for troubleshooting.
5. Install the training code dependencies:
- Clone this repository.
- `pip install -e .`
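After installation, you can sanity-check the setup with a one-line import test (illustrative; note that `isaacgym` generally must be imported before `torch`):
```
python -c "from isaacgym import gymapi; import torch; print(torch.__version__, torch.cuda.is_available())"
```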
### Usage
#### Train:
```python scripts/train.py --task=x1_dh_stand --run_name=<run_name> --headless```
- The trained model is saved to `logs/<experiment_name>/exported_data/<date_time><run_name>/model_<iteration>.pt`, where `<experiment_name>` is defined in the config file.
![](doc/train.gif)
#### Play:
```python scripts/play.py --task=x1_dh_stand --load_run=<date_time><run_name>```
![](doc/play.gif)
#### Generate the JIT Model:
```python scripts/export_policy_dh.py --task=x1_dh_stand --load_run=<date_time><run_name>```
- The JIT model will be saved in `logs/exported_policies/<date_time>`
#### Generate the ONNX Model:
```python scripts/export_onnx_dh.py --task=x1_dh_stand --load_run=<date_time>```
- The ONNX model will be saved in `logs/exported_policies/<date_time>`
#### Parameter Descriptions:
- task: Task name.
- resume: Resume training from a checkpoint.
- experiment_name: Name of the experiment to run or load.
- run_name: Name of the run.
- load_run: Name of the run to load when resume=True; -1 loads the last run.
- checkpoint: Saved model checkpoint number; -1 loads the last checkpoint.
- num_envs: Number of environments to create.
- seed: Random seed.
- max_iterations: Maximum number of training iterations.
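For example, resuming an interrupted run combines several of these flags (values are illustrative):
```python scripts/train.py --task=x1_dh_stand --resume --load_run=<date_time><run_name> --checkpoint=-1 --headless```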
### Add New Environments
1. Create a new folder under the `envs/` directory, then create a configuration file `<your_env>_config.py` and an environment file `<your_env>_env.py` inside it. The two files should inherit from `LeggedRobotCfg` and `LeggedRobot`, respectively.
2. Place the URDF, mesh, and MJCF files of the new robot in the `resources/` folder.
   - Configure the URDF path, PD gains, body names, default_joint_angles, experiment_name, etc., for the new robot in `<your_env>_config.py`.
3. Register the new robot in `humanoid/envs/__init__.py` (a sketch of these steps follows below).
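A minimal sketch of these three steps for a hypothetical robot named `my_bot`, following the legged_gym-style config layout (all module paths, class names, and joint names below are illustrative):
```
# envs/my_bot/my_bot_config.py -- illustrative names throughout
from humanoid.envs.base.legged_robot_config import LeggedRobotCfg, LeggedRobotCfgPPO

class MyBotCfg(LeggedRobotCfg):
    class asset(LeggedRobotCfg.asset):
        file = '{LEGGED_GYM_ROOT_DIR}/resources/robots/my_bot/urdf/my_bot.urdf'

    class init_state(LeggedRobotCfg.init_state):
        default_joint_angles = {'left_hip_pitch_joint': 0.0}  # one entry per actuated joint

class MyBotCfgPPO(LeggedRobotCfgPPO):
    class runner(LeggedRobotCfgPPO.runner):
        experiment_name = 'my_bot'

# envs/my_bot/my_bot_env.py
from humanoid.envs.base.legged_robot import LeggedRobot

class MyBotEnv(LeggedRobot):
    pass  # override observation/reward methods as needed

# humanoid/envs/__init__.py -- register the new task
from humanoid.utils.task_registry import task_registry
from .my_bot.my_bot_config import MyBotCfg, MyBotCfgPPO
from .my_bot.my_bot_env import MyBotEnv

task_registry.register("my_bot", MyBotEnv, MyBotCfg(), MyBotCfgPPO())
```
After registration, the new task can be selected with `--task=my_bot` in the training and play scripts.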
### sim2sim
Use Mujoco for sim2sim validation:
```
python scripts/sim2sim.py --task=x1_dh_stand --load_model /path/to/exported_policies/
```
![](doc/mujoco.gif)
### Usage of Joystick
We use the Logitech F710 joystick. When running play.py or sim2sim.py, press and hold button 4 while moving the sticks to command the robot to move forward/backward, strafe left/right, or rotate.
![](doc/joy_map.jpg)
| Button | Command |
| -------------------- |:--------------------:|
| 4 + 1- | Move forward |
| 4 + 1+ | Move backward |
| 4 + 0- | Strafe left |
| 4 + 0+ | Strafe right |
| 4 + 3- | Rotate counterclockwise |
| 4 + 3+ | Rotate clockwise |
## Directory Structure
```
.
|— humanoid # Main code directory
| |—algo # Algorithm directory
| |—envs # Environment directory
| |—scripts # Script directory
| |—utilis # Utility and function directory
|— logs # Model directory
|— resources # Resource library
| |— robots # Robot urdf, mjcf, mesh
|— README.md # README document
```
> References
> * [GitHub - leggedrobotics/legged_gym: Isaac Gym Environments for Legged Robots](https://github.com/leggedrobotics/legged_gym)
> * [GitHub - leggedrobotics/rsl_rl: Fast and simple implementation of RL algorithms, designed to run fully on GPU.](https://github.com/leggedrobotics/rsl_rl)
> * [GitHub - roboterax/humanoid-gym: Humanoid-Gym: Reinforcement Learning for Humanoid Robot with Zero-Shot Sim2Real Transfer https://arxiv.org/abs/2404.05695](https://github.com/roboterax/humanoid-gym)
[English](README.md) | Chinese
## Introduction
[AgiBot X1](https://www.zhiyuan-robot.com/qzproduct/169.html) is a modular, high-DoF humanoid robot developed and open-sourced by AgiBot. Its software stack is built on AgiBot's open-source framework `AimRT` as middleware, and its locomotion control is trained with reinforcement learning.
This project provides the reinforcement learning training code used by AgiBot X1. It can be used together with the [inference software](https://aimrt.org/) that ships with AgiBot X1 for walking debugging on the real robot and in simulation, or adapted to other robot models for training.
![](doc/id.jpg)
## Getting Started
### Install Dependencies
1. Create a new Python 3.8 virtual environment:
- `conda create -n myenv python=3.8`.
2. Install pytorch 1.13 and cuda-11.7:
- `conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia`
3. Install numpy-1.23:
- `conda install numpy=1.23`.
4. Install Isaac Gym:
- Download and install Isaac Gym Preview 4 from https://developer.nvidia.com/isaac-gym.
- `cd isaacgym/python && pip install -e .`
- Run an example with `cd examples && python 1080_balls_of_solitude.py`.
- Consult `isaacgym/docs/index.html` for troubleshooting.
5. Install the training code dependencies:
- Clone this repository.
- `pip install -e .`
### Usage
#### Train:
```python scripts/train.py --task=x1_dh_stand --run_name=<run_name> --headless```
- The trained model is saved to `logs/<experiment_name>/exported_data/<date_time><run_name>/model_<iteration>.pt`, where `<experiment_name>` is defined in the config file.
![](doc/train.gif)
#### Play:
```python scripts/play.py --task=x1_dh_stand --load_run=<date_time><run_name>```
![](doc/play.gif)
#### Generate the JIT Model:
```python scripts/export_policy_dh.py --task=x1_dh_stand --load_run=<date_time><run_name>```
- The JIT model will be saved in `logs/exported_policies/<date_time>`
#### Generate the ONNX Model:
```python scripts/export_onnx_dh.py --task=x1_dh_stand --load_run=<date_time>```
- The ONNX model will be saved in `logs/exported_policies/<date_time>`
#### Parameter Descriptions:
- task: Task name.
- resume: Resume training from a checkpoint.
- experiment_name: Name of the experiment to run or load.
- run_name: Name of the run.
- load_run: Name of the run to load when resume=True; -1 loads the last run.
- checkpoint: Saved model checkpoint number; -1 loads the last checkpoint.
- num_envs: Number of environments to create.
- seed: Random seed.
- max_iterations: Maximum number of training iterations.
### Add New Environments
1. Create a new folder under the `envs/` directory, then create a configuration file `<your_env>_config.py` and an environment file `<your_env>_env.py` inside it. The two files should inherit from `LeggedRobotCfg` and `LeggedRobot`, respectively.
2. Place the URDF, mesh, and MJCF files of the new robot in the `resources/` folder.
   - Configure the URDF path, PD gains, body names, default_joint_angles, experiment_name, etc., for the new robot in `<your_env>_config.py`.
3. Register the new robot in `humanoid/envs/__init__.py`.
### sim2sim
Use Mujoco for sim2sim validation:
```
python scripts/sim2sim.py --task=x1_dh_stand --load_model /path/to/exported_policies/
```
![](doc/mujoco.gif)
### Usage of Joystick
We use the Logitech F710 joystick. When running play.py or sim2sim.py, press and hold button 4 while moving the sticks to command the robot to move forward/backward, strafe left/right, or rotate.
![](doc/joy_map.jpg)
| Button | Command |
| -------------------- |:--------------------:|
| 4 + 1- | Move forward |
| 4 + 1+ | Move backward |
| 4 + 0- | Strafe left |
| 4 + 0+ | Strafe right |
| 4 + 3- | Rotate counterclockwise |
| 4 + 3+ | Rotate clockwise |
## Directory Structure
```
.
|— humanoid # Main code directory
| |—algo # Algorithm directory
| |—envs # Environment directory
| |—scripts # Script directory
| |—utilis # Utility and function directory
|— logs # Model directory
|— resources # Resource library
| |— robots # Robot urdf, mjcf, mesh
|— README.md # README document
```
> References:
>
> * [GitHub - leggedrobotics/legged_gym: Isaac Gym Environments for Legged Robots](https://github.com/leggedrobotics/legged_gym)
> * [GitHub - leggedrobotics/rsl_rl: Fast and simple implementation of RL algorithms, designed to run fully on GPU.](https://github.com/leggedrobotics/rsl_rl)
> * [GitHub - roboterax/humanoid-gym: Humanoid-Gym: Reinforcement Learning for Humanoid Robot with Zero-Shot Sim2Real Transfer https://arxiv.org/abs/2404.05695](https://github.com/roboterax/humanoid-gym)
# SPDX-FileCopyrightText: Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-FileCopyrightText: Copyright (c) 2021 ETH Zurich, Nikita Rudin
# SPDX-FileCopyrightText: Copyright (c) 2024 Beijing RobotEra TECHNOLOGY CO.,LTD. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# 3. Neither the name of the copyright holder nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# Copyright (c) 2024, AgiBot Inc. All rights reserved.
import os
LEGGED_GYM_ROOT_DIR = os.path.dirname(os.path.dirname(os.path.realpath(__file__)))
LEGGED_GYM_ENVS_DIR = os.path.join(LEGGED_GYM_ROOT_DIR, 'humanoid', 'envs')
# SPDX-FileCopyrightText: Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-FileCopyrightText: Copyright (c) 2021 ETH Zurich, Nikita Rudin
# SPDX-FileCopyrightText: Copyright (c) 2024 Beijing RobotEra TECHNOLOGY CO.,LTD. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# 3. Neither the name of the copyright holder nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# Copyright (c) 2024, AgiBot Inc. All rights reserved.
from .vec_env import VecEnv
from .ppo import *
# SPDX-FileCopyrightText: Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-FileCopyrightText: Copyright (c) 2021 ETH Zurich, Nikita Rudin
# SPDX-FileCopyrightText: Copyright (c) 2024 Beijing RobotEra TECHNOLOGY CO.,LTD. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# 3. Neither the name of the copyright holder nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# Copyright (c) 2024, AgiBot Inc. All rights reserved.
from .dh_ppo import DHPPO
from .dh_on_policy_runner import DHOnPolicyRunner
from .actor_critic_dh import ActorCriticDH
from .rollout_storage import RolloutStorage
# SPDX-FileCopyrightText: Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-FileCopyrightText: Copyright (c) 2021 ETH Zurich, Nikita Rudin
# SPDX-FileCopyrightText: Copyright (c) 2024 Beijing RobotEra TECHNOLOGY CO.,LTD. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# 3. Neither the name of the copyright holder nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# Copyright (c) 2024, AgiBot Inc. All rights reserved.
import torch
import torch.nn as nn
from torch.distributions import Normal
class ActorCriticDH(nn.Module):
def __init__(self, num_short_obs,
num_proprio_obs,
num_critic_obs,
num_actions,
actor_hidden_dims=[256, 256, 256],
critic_hidden_dims=[256, 256, 256],
state_estimator_hidden_dims=[256, 128, 64],
in_channels = 66,
kernel_size=[6, 4],
filter_size=[32, 16],
stride_size=[3, 2],
lh_output_dim=64,
init_noise_std=1.0,
activation = nn.ELU(),
**kwargs):
if kwargs:
print("ActorCriticDH.__init__ got unexpected arguments, which will be ignored: " + str([key for key in kwargs.keys()]))
super(ActorCriticDH, self).__init__()
# define actor net and critic net
# self.num_short_obs = int(cfg.env.num_single_obs * cfg.env.short_frame_stack), 5 history
# lh_output_dim is cnn output
# 3 is state estimator output
mlp_input_dim_a = num_short_obs + lh_output_dim + 3
# num_privileged_obs = int(c_frame_stack * single_num_privileged_obs), 3 history
mlp_input_dim_c = num_critic_obs
# Policy
actor_layers = []
actor_layers.append(nn.Linear(mlp_input_dim_a, actor_hidden_dims[0]))
actor_layers.append(activation)
for l in range(len(actor_hidden_dims)):
if l == len(actor_hidden_dims) - 1:
# num_actions policy output(12)
actor_layers.append(nn.Linear(actor_hidden_dims[l], num_actions))
else:
actor_layers.append(nn.Linear(actor_hidden_dims[l], actor_hidden_dims[l + 1]))
actor_layers.append(activation)
self.actor = nn.Sequential(*actor_layers)
# Value function
critic_layers = []
critic_layers.append(nn.Linear(mlp_input_dim_c, critic_hidden_dims[0]))
critic_layers.append(activation)
for l in range(len(critic_hidden_dims)):
if l == len(critic_hidden_dims) - 1:
critic_layers.append(nn.Linear(critic_hidden_dims[l], 1))
else:
critic_layers.append(nn.Linear(critic_hidden_dims[l], critic_hidden_dims[l + 1]))
critic_layers.append(activation)
self.critic = nn.Sequential(*critic_layers)
print(f"Actor MLP: {self.actor}")
print(f"Critic MLP: {self.critic}")
# Action noise
self.std = nn.Parameter(init_noise_std * torch.ones(num_actions))
self.distribution = None
# disable args validation for speedup
Normal.set_default_validate_args = False
#define long_history CNN
long_history_layers = []
self.in_channels = in_channels
cnn_output_dim = num_proprio_obs
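        # Track the CNN output length analytically: each Conv1d maps a sequence of
        # length L to (L - kernel_size) // stride + 1, which equals the
        # (L - kernel_size + stride) // stride update used in the loop below.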
        for out_channels, ks, ss in zip(filter_size, kernel_size, stride_size):
            long_history_layers.append(nn.Conv1d(in_channels=in_channels, out_channels=out_channels, kernel_size=ks, stride=ss))
            long_history_layers.append(nn.ReLU())
            cnn_output_dim = (cnn_output_dim - ks + ss) // ss
            in_channels = out_channels
cnn_output_dim *= out_channels
long_history_layers.append(nn.Flatten())
long_history_layers.append(nn.Linear(cnn_output_dim, 128))
long_history_layers.append(nn.ELU())
long_history_layers.append(nn.Linear(128, lh_output_dim))
self.long_history = nn.Sequential(*long_history_layers)
print(f"long_history CNN: {self.long_history}")
#define state_estimator MLP
# self.num_short_obs = int(cfg.env.num_single_obs * cfg.env.short_frame_stack), 5 history
self.num_short_obs = num_short_obs
state_estimator_input_dim = num_short_obs
state_estimator_output_dim = 3
state_estimator_layers = []
state_estimator_layers.append(nn.Linear(state_estimator_input_dim, state_estimator_hidden_dims[0]))
state_estimator_layers.append(activation)
for l in range(len(state_estimator_hidden_dims)):
if l == len(state_estimator_hidden_dims) - 1:
state_estimator_layers.append(nn.Linear(state_estimator_hidden_dims[l], state_estimator_output_dim))
else:
state_estimator_layers.append(nn.Linear(state_estimator_hidden_dims[l], state_estimator_hidden_dims[l + 1]))
state_estimator_layers.append(activation)
self.state_estimator = nn.Sequential(*state_estimator_layers)
print(f"state_estimator MLP: {self.state_estimator}")
self.num_proprio_obs = num_proprio_obs
@staticmethod
# not used at the moment
def init_weights(sequential, scales):
[torch.nn.init.orthogonal_(module.weight, gain=scales[idx]) for idx, module in
enumerate(mod for mod in sequential if isinstance(mod, nn.Linear))]
def reset(self, dones=None):
pass
def forward(self):
raise NotImplementedError
@property
def action_mean(self):
return self.distribution.mean
@property
def action_std(self):
return self.distribution.stddev
@property
def entropy(self):
return self.distribution.entropy().sum(dim=-1)
def update_distribution(self, observations):
mean = self.actor(observations)
self.distribution = Normal(mean, mean*0. + self.std)
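    # act(): the tail of `observations` (the last num_short_obs values) is the short
    # observation history; the full tensor is also reshaped into
    # (in_channels, num_proprio_obs) frames for the long-history CNN. The actor then
    # consumes [short history, estimated linear velocity (3), compressed long history].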
def act(self, observations, **kwargs):
short_history = observations[...,-self.num_short_obs:]
es_vel = self.state_estimator(short_history)
compressed_long_history = self.long_history(observations.view(-1, self.in_channels, self.num_proprio_obs))
actor_obs = torch.cat((short_history, es_vel, compressed_long_history),dim=-1)
self.update_distribution(actor_obs)
return self.distribution.sample()
def get_actions_log_prob(self, actions):
return self.distribution.log_prob(actions).sum(dim=-1)
def act_inference(self, observations):
short_history = observations[...,-self.num_short_obs:]
es_vel = self.state_estimator(short_history)
compressed_long_history = self.long_history(observations.view(-1, self.in_channels, self.num_proprio_obs))
actor_obs = torch.cat((short_history, es_vel, compressed_long_history),dim=-1)
actions_mean = self.actor(actor_obs)
return actions_mean
def evaluate(self, critic_observations, **kwargs):
value = self.critic(critic_observations)
return value
# SPDX-FileCopyrightText: Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-FileCopyrightText: Copyright (c) 2021 ETH Zurich, Nikita Rudin
# SPDX-FileCopyrightText: Copyright (c) 2024 Beijing RobotEra TECHNOLOGY CO.,LTD. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# 3. Neither the name of the copyright holder nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# Copyright (c) 2024, AgiBot Inc. All rights reserved.
import os
import time
import torch
import wandb
import statistics
from collections import deque
from datetime import datetime
from .dh_ppo import DHPPO
from .actor_critic_dh import ActorCriticDH
from humanoid.algo.vec_env import VecEnv
from torch.utils.tensorboard import SummaryWriter
class DHOnPolicyRunner:
def __init__(self, env: VecEnv, train_cfg, log_dir=None, device="cpu"):
self.cfg = train_cfg["runner"]
self.alg_cfg = train_cfg["algorithm"]
self.policy_cfg = train_cfg["policy"]
self.all_cfg = train_cfg
self.wandb_run_name = (
datetime.now().strftime("%b%d_%H-%M-%S")
+ "_"
+ train_cfg["runner"]["experiment_name"]
+ "_"
+ train_cfg["runner"]["run_name"]
)
self.device = device
self.env = env
if self.env.num_privileged_obs is not None:
num_critic_obs = self.env.num_privileged_obs
else:
num_critic_obs = self.env.num_obs
if self.env.cfg.terrain.measure_heights:
num_critic_obs = self.env.cfg.env.c_frame_stack * (self.env.cfg.env.single_num_privileged_obs + self.env.cfg.terrain.num_height)
actor_critic_class = eval(self.cfg["policy_class_name"]) # ActorCritic
actor_critic: ActorCriticDH = actor_critic_class(
self.env.num_short_obs, self.env.num_single_obs, num_critic_obs, self.env.num_actions, **self.policy_cfg
).to(self.device)
alg_class = eval(self.cfg["algorithm_class_name"]) # PPO
self.alg: DHPPO = alg_class(actor_critic, device=self.device, **self.alg_cfg)
self.num_steps_per_env = self.cfg["num_steps_per_env"]
self.save_interval = self.cfg["save_interval"]
# init storage and model
self.alg.init_storage(
self.env.num_envs,
self.num_steps_per_env,
[self.env.num_obs],
[num_critic_obs],
[self.env.num_actions],
)
# Log
self.log_dir = log_dir
self.writer = None
self.tot_timesteps = 0
self.tot_time = 0
self.current_learning_iteration = 0
self.it = 0
_, _ = self.env.reset()
def learn(self, num_learning_iterations, init_at_random_ep_len=False):
# initialize writer
if self.log_dir is not None and self.writer is None:
self.writer = SummaryWriter(log_dir=self.log_dir, flush_secs=10)
if init_at_random_ep_len:
self.env.episode_length_buf = torch.randint_like(
self.env.episode_length_buf, high=int(self.env.max_episode_length)
)
obs = self.env.get_observations()
privileged_obs = self.env.get_privileged_observations()
critic_obs = privileged_obs if privileged_obs is not None else obs
obs, critic_obs = obs.to(self.device), critic_obs.to(self.device)
self.alg.actor_critic.train() # switch to train mode (for dropout for example)
ep_infos = []
rewbuffer = deque(maxlen=100)
lenbuffer = deque(maxlen=100)
cur_reward_sum = torch.zeros(
self.env.num_envs, dtype=torch.float, device=self.device
)
cur_episode_length = torch.zeros(
self.env.num_envs, dtype=torch.float, device=self.device
)
tot_iter = self.current_learning_iteration + num_learning_iterations
for it in range(self.current_learning_iteration, tot_iter):
self.it = it
start = time.time()
obs_std, obs_mean = torch.std_mean(obs, dim=0)
# Rollout
with torch.inference_mode():
for i in range(self.num_steps_per_env):
actions = self.alg.act(obs, critic_obs)
obs, privileged_obs, rewards, dones, infos = self.env.step(actions)
critic_obs = privileged_obs if privileged_obs is not None else obs
obs, critic_obs, rewards, dones = (
obs.to(self.device),
critic_obs.to(self.device),
rewards.to(self.device),
dones.to(self.device),
)
self.alg.process_env_step(rewards, dones, infos)
if self.log_dir is not None:
# Book keeping
if "episode" in infos:
ep_infos.append(infos["episode"])
cur_reward_sum += rewards
cur_episode_length += 1
new_ids = (dones > 0).nonzero(as_tuple=False)
rewbuffer.extend(
cur_reward_sum[new_ids][:, 0].cpu().numpy().tolist()
)
lenbuffer.extend(
cur_episode_length[new_ids][:, 0].cpu().numpy().tolist()
)
cur_reward_sum[new_ids] = 0
cur_episode_length[new_ids] = 0
stop = time.time()
collection_time = stop - start
# Learning step
start = stop
self.alg.compute_returns(critic_obs)
mean_value_loss, mean_surrogate_loss, mean_state_estimator_loss = self.alg.update()
stop = time.time()
learn_time = stop - start
if self.log_dir is not None:
self.log(locals())
if it % self.save_interval == 0:
self.save(os.path.join(self.log_dir, "model_{}.pt".format(it)))
ep_infos.clear()
self.current_learning_iteration += num_learning_iterations
self.save(
os.path.join(
self.log_dir, "model_{}.pt".format(self.current_learning_iteration)
)
)
def log(self, locs, width=80, pad=35):
self.tot_timesteps += self.num_steps_per_env * self.env.num_envs
self.tot_time += locs["collection_time"] + locs["learn_time"]
iteration_time = locs["collection_time"] + locs["learn_time"]
ep_string = f""
if locs["ep_infos"]:
for key in locs["ep_infos"][0]:
infotensor = torch.tensor([], device=self.device)
for ep_info in locs["ep_infos"]:
# handle scalar and zero dimensional tensor infos
if not isinstance(ep_info[key], torch.Tensor):
ep_info[key] = torch.Tensor([ep_info[key]])
if len(ep_info[key].shape) == 0:
ep_info[key] = ep_info[key].unsqueeze(0)
infotensor = torch.cat((infotensor, ep_info[key].to(self.device)))
value = torch.mean(infotensor)
self.writer.add_scalar("Episode/" + key, value, locs["it"])
ep_string += f"""{f'Mean episode {key}:':>{pad}} {value:.4f}\n"""
mean_std = self.alg.actor_critic.std.mean()
fps = int(
self.num_steps_per_env
* self.env.num_envs
/ (locs["collection_time"] + locs["learn_time"])
)
self.writer.add_scalar(
"Loss/value_function", locs["mean_value_loss"], locs["it"]
)
self.writer.add_scalar(
"Loss/surrogate", locs["mean_surrogate_loss"], locs["it"]
)
self.writer.add_scalar(
"Loss/state_estimator", locs["mean_state_estimator_loss"], locs["it"]
)
self.writer.add_scalar("Loss/learning_rate", self.alg.learning_rate, locs["it"])
self.writer.add_scalar("Policy/mean_noise_std", mean_std.item(), locs["it"])
self.writer.add_scalar("Perf/total_fps", fps, locs["it"])
self.writer.add_scalar(
"Perf/collection time", locs["collection_time"], locs["it"]
)
self.writer.add_scalar("Perf/learning_time", locs["learn_time"], locs["it"])
if len(locs["rewbuffer"]) > 0:
self.writer.add_scalar(
"Train/mean_reward", statistics.mean(locs["rewbuffer"]), locs["it"]
)
self.writer.add_scalar(
"Train/mean_episode_length",
statistics.mean(locs["lenbuffer"]),
locs["it"],
)
self.writer.add_scalar(
"Train/mean_reward/time",
statistics.mean(locs["rewbuffer"]),
self.tot_time,
)
self.writer.add_scalar(
"Train/mean_episode_length/time",
statistics.mean(locs["lenbuffer"]),
self.tot_time,
)
for i in range(47):
self.writer.add_scalar('Observation/obs_mean_'+str(i), locs['obs_mean'][i], locs['it'])
self.writer.add_scalar('Observation/obs_std_'+str(i), locs['obs_std'][i], locs['it'])
logstr = f" \033[1m Learning iteration {locs['it']}/{self.current_learning_iteration + locs['num_learning_iterations']} \033[0m "
if len(locs["rewbuffer"]) > 0:
log_string = (
f"""{'#' * width}\n"""
f"""{logstr.center(width, ' ')}\n\n"""
f"""{'Computation:':>{pad}} {fps:.0f} steps/s (collection: {locs[
'collection_time']:.3f}s, learning {locs['learn_time']:.3f}s)\n"""
f"""{'Value function loss:':>{pad}} {locs['mean_value_loss']:.4f}\n"""
f"""{'Surrogate loss:':>{pad}} {locs['mean_surrogate_loss']:.4f}\n"""
f"""{'State estimator loss:':>{pad}} {locs['mean_state_estimator_loss']:.4f}\n"""
f"""{'Mean action noise std:':>{pad}} {mean_std.item():.2f}\n"""
f"""{'Mean reward:':>{pad}} {statistics.mean(locs['rewbuffer']):.2f}\n"""
f"""{'Mean episode length:':>{pad}} {statistics.mean(locs['lenbuffer']):.2f}\n"""
)
# f"""{'Mean reward/step:':>{pad}} {locs['mean_reward']:.2f}\n"""
# f"""{'Mean episode length/episode:':>{pad}} {locs['mean_trajectory_length']:.2f}\n""")
else:
log_string = (
f"""{'#' * width}\n"""
f"""{logstr.center(width, ' ')}\n\n"""
f"""{'Computation:':>{pad}} {fps:.0f} steps/s (collection: {locs[
'collection_time']:.3f}s, learning {locs['learn_time']:.3f}s)\n"""
f"""{'Value function loss:':>{pad}} {locs['mean_value_loss']:.4f}\n"""
f"""{'Surrogate loss:':>{pad}} {locs['mean_surrogate_loss']:.4f}\n"""
f"""{'Mean action noise std:':>{pad}} {mean_std.item():.2f}\n"""
)
# f"""{'Mean reward/step:':>{pad}} {locs['mean_reward']:.2f}\n"""
# f"""{'Mean episode length/episode:':>{pad}} {locs['mean_trajectory_length']:.2f}\n""")
log_string += ep_string
log_string += (
f"""{'-' * width}\n"""
f"""{'Total timesteps:':>{pad}} {self.tot_timesteps}\n"""
f"""{'Iteration time:':>{pad}} {iteration_time:.2f}s\n"""
f"""{'Total time:':>{pad}} {self.tot_time:.2f}s\n"""
f"""{'ETA:':>{pad}} {self.tot_time / (locs['it'] + 1) * (
locs['num_learning_iterations'] - locs['it']):.1f}s\n"""
)
print(log_string)
def save(self, path, infos=None):
torch.save(
{
"model_state_dict": self.alg.actor_critic.state_dict(),
"optimizer_state_dict": self.alg.optimizer.state_dict(),
"es_optimizer_state_dict": self.alg.state_estimator_optimizer.state_dict(),
"iter": self.it,
"infos": infos,
},
path,
)
def load(self, path, load_optimizer=True):
loaded_dict = torch.load(path)
self.alg.actor_critic.load_state_dict(loaded_dict["model_state_dict"])
if load_optimizer:
self.alg.optimizer.load_state_dict(loaded_dict["optimizer_state_dict"])
self.alg.state_estimator_optimizer.load_state_dict(loaded_dict["es_optimizer_state_dict"])
self.current_learning_iteration = loaded_dict["iter"]
return loaded_dict["infos"]
def get_inference_policy(self, device=None):
self.alg.actor_critic.eval() # switch to evaluation mode (dropout for example)
if device is not None:
self.alg.actor_critic.to(device)
return self.alg.actor_critic.act_inference
def get_inference_critic(self, device=None):
self.alg.actor_critic.eval() # switch to evaluation mode (dropout for example)
if device is not None:
self.alg.actor_critic.to(device)
return self.alg.actor_critic.evaluate
# SPDX-FileCopyrightText: Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-FileCopyrightText: Copyright (c) 2021 ETH Zurich, Nikita Rudin
# SPDX-FileCopyrightText: Copyright (c) 2024 Beijing RobotEra TECHNOLOGY CO.,LTD. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# 3. Neither the name of the copyright holder nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# Copyright (c) 2024, AgiBot Inc. All rights reserved.
import torch
import torch.nn as nn
import torch.optim as optim
from .actor_critic_dh import ActorCriticDH
from .rollout_storage import RolloutStorage
class DHPPO:
actor_critic: ActorCriticDH
def __init__(self,
actor_critic,
num_learning_epochs=1,
num_mini_batches=1,
clip_param=0.2,
gamma=0.998,
lam=0.95,
value_loss_coef=1.0,
entropy_coef=0.0,
learning_rate=1e-3,
max_grad_norm=1.0,
lin_vel_idx = 45,
use_clipped_value_loss=True,
schedule="fixed",
desired_kl=0.01,
device='cpu',
):
self.device = device
self.desired_kl = desired_kl
self.schedule = schedule
self.learning_rate = learning_rate
# PPO components
self.actor_critic: ActorCriticDH = actor_critic
self.actor_critic.to(self.device)
self.storage = None # initialized later
self.optimizer = optim.Adam(self.actor_critic.parameters(),
lr=learning_rate)
self.state_estimator_optimizer = optim.Adam(self.actor_critic.state_estimator.parameters(),
lr=learning_rate)
self.transition = RolloutStorage.Transition()
# PPO parameters
self.clip_param = clip_param
self.num_learning_epochs = num_learning_epochs
self.num_mini_batches = num_mini_batches
self.value_loss_coef = value_loss_coef
self.entropy_coef = entropy_coef
self.gamma = gamma
self.lam = lam
self.max_grad_norm = max_grad_norm
self.use_clipped_value_loss = use_clipped_value_loss
self.num_short_obs = self.actor_critic.num_short_obs
self.lin_vel_idx = lin_vel_idx
def init_storage(self, num_envs, num_transitions_per_env, actor_obs_shape, critic_obs_shape, action_shape):
self.storage = RolloutStorage(num_envs, num_transitions_per_env, actor_obs_shape, critic_obs_shape, action_shape, None, self.device)
def test_mode(self):
self.actor_critic.test()
def train_mode(self):
self.actor_critic.train()
def act(self, obs, critic_obs):
# Compute the actions and values
self.transition.actions = self.actor_critic.act(obs).detach()
self.transition.values = self.actor_critic.evaluate(critic_obs).detach()
self.transition.actions_log_prob = self.actor_critic.get_actions_log_prob(self.transition.actions).detach()
self.transition.action_mean = self.actor_critic.action_mean.detach()
self.transition.action_sigma = self.actor_critic.action_std.detach()
# need to record obs and critic_obs before env.step()
self.transition.observations = obs
self.transition.critic_observations = critic_obs
return self.transition.actions
def process_env_step(self, rewards, dones, infos):
self.transition.rewards = rewards.clone()
self.transition.dones = dones
# Bootstrapping on time outs
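        # Time-outs are truncations rather than failures: add the discounted value
        # estimate to the reward so truncated episodes are not treated as terminal.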
if 'time_outs' in infos:
self.transition.rewards += self.gamma * torch.squeeze(self.transition.values * infos['time_outs'].unsqueeze(1).to(self.device), 1)
# Record the transition
self.storage.add_transitions(self.transition)
self.transition.clear()
self.actor_critic.reset(dones)
def compute_returns(self, last_critic_obs):
last_values= self.actor_critic.evaluate(last_critic_obs).detach()
self.storage.compute_returns(last_values, self.gamma, self.lam)
def update(self):
mean_value_loss = 0
mean_surrogate_loss = 0
mean_state_estimator_loss = 0
generator = self.storage.mini_batch_generator(self.num_mini_batches, self.num_learning_epochs)
for obs_batch, critic_obs_batch, actions_batch, target_values_batch, advantages_batch, returns_batch, old_actions_log_prob_batch, \
old_mu_batch, old_sigma_batch, hid_states_batch, masks_batch in generator:
self.actor_critic.act(obs_batch, masks=masks_batch, hidden_states=hid_states_batch[0])
state_estimator_input = obs_batch[:,-self.num_short_obs:]
est_lin_vel = self.actor_critic.state_estimator(state_estimator_input)
ref_lin_vel = critic_obs_batch[:,self.lin_vel_idx:self.lin_vel_idx+3].clone()
actions_log_prob_batch = self.actor_critic.get_actions_log_prob(actions_batch)
value_batch = self.actor_critic.evaluate(critic_obs_batch, masks=masks_batch, hidden_states=hid_states_batch[1])
mu_batch = self.actor_critic.action_mean
sigma_batch = self.actor_critic.action_std
entropy_batch = self.actor_critic.entropy
# KL
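            # Analytic KL divergence between the old and current diagonal Gaussians,
            # KL(old || new) = log(sigma / sigma_old)
            #                + (sigma_old^2 + (mu_old - mu)^2) / (2 * sigma^2) - 1/2,
            # summed over action dimensions and used to adapt the learning rate.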
            if self.desired_kl is not None and self.schedule == 'adaptive':
with torch.inference_mode():
kl = torch.sum(
torch.log(sigma_batch / old_sigma_batch + 1.e-5) + (torch.square(old_sigma_batch) + torch.square(old_mu_batch - mu_batch)) / (2.0 * torch.square(sigma_batch)) - 0.5, axis=-1)
kl_mean = torch.mean(kl)
if kl_mean > self.desired_kl * 2.0:
self.learning_rate = max(1e-5, self.learning_rate / 1.5)
elif kl_mean < self.desired_kl / 2.0 and kl_mean > 0.0:
self.learning_rate = min(1e-2, self.learning_rate * 1.5)
for param_group in self.optimizer.param_groups:
param_group['lr'] = self.learning_rate
# Surrogate loss
ratio = torch.exp(actions_log_prob_batch - torch.squeeze(old_actions_log_prob_batch))
surrogate = -torch.squeeze(advantages_batch) * ratio
surrogate_clipped = -torch.squeeze(advantages_batch) * torch.clamp(ratio, 1.0 - self.clip_param,
1.0 + self.clip_param)
surrogate_loss = torch.max(surrogate, surrogate_clipped).mean()
# Value function loss
if self.use_clipped_value_loss:
value_clipped = target_values_batch + (value_batch - target_values_batch).clamp(-self.clip_param,
self.clip_param)
value_losses = (value_batch - returns_batch).pow(2)
value_losses_clipped = (value_clipped - returns_batch).pow(2)
value_loss = torch.max(value_losses, value_losses_clipped).mean()
else:
value_loss = (returns_batch - value_batch).pow(2).mean()
# update all actor_critic.parameters()
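            # The state-estimator MSE is folded into the combined loss below, so the
            # main Adam optimizer (over all actor_critic parameters, including the
            # estimator) trains the velocity estimator jointly with actor and critic.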
loss = (surrogate_loss +
self.value_loss_coef * value_loss -
self.entropy_coef * entropy_batch.mean() +
torch.nn.MSELoss()(est_lin_vel, ref_lin_vel))
# Gradient step
self.optimizer.zero_grad()
loss.backward()
nn.utils.clip_grad_norm_(self.actor_critic.parameters(), self.max_grad_norm)
self.optimizer.step()
state_estimator_loss = torch.nn.MSELoss()(est_lin_vel, ref_lin_vel)
mean_value_loss += value_loss.item()
mean_surrogate_loss += surrogate_loss.item()
mean_state_estimator_loss += state_estimator_loss.item()
num_updates = self.num_learning_epochs * self.num_mini_batches
mean_value_loss /= num_updates
mean_surrogate_loss /= num_updates
mean_state_estimator_loss /= num_updates
self.storage.clear()
return mean_value_loss, mean_surrogate_loss, mean_state_estimator_loss
# SPDX-FileCopyrightText: Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-FileCopyrightText: Copyright (c) 2021 ETH Zurich, Nikita Rudin
# SPDX-FileCopyrightText: Copyright (c) 2024 Beijing RobotEra TECHNOLOGY CO.,LTD. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# 3. Neither the name of the copyright holder nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# Copyright (c) 2024, AgiBot Inc. All rights reserved.
import torch
class RolloutStorage:
class Transition:
def __init__(self):
self.observations = None
self.critic_observations = None
self.actions = None
self.rewards = None
self.dones = None
self.values = None
self.actions_log_prob = None
self.action_mean = None
self.action_sigma = None
self.hidden_states = None
self.next_proprio_obs = None
def clear(self):
self.__init__()
def __init__(self, num_envs, num_transitions_per_env, obs_shape, privileged_obs_shape, actions_shape, num_single_obs=None, device='cpu'):
self.device = device
self.obs_shape = obs_shape
self.privileged_obs_shape = privileged_obs_shape
self.actions_shape = actions_shape
# Core
self.observations = torch.zeros(num_transitions_per_env, num_envs, *obs_shape, device=self.device)
if privileged_obs_shape[0] is not None:
self.privileged_observations = torch.zeros(num_transitions_per_env, num_envs, *privileged_obs_shape, device=self.device)
else:
self.privileged_observations = None
self.rewards = torch.zeros(num_transitions_per_env, num_envs, 1, device=self.device)
self.actions = torch.zeros(num_transitions_per_env, num_envs, *actions_shape, device=self.device)
self.dones = torch.zeros(num_transitions_per_env, num_envs, 1, device=self.device).byte()
# For PPO
self.actions_log_prob = torch.zeros(num_transitions_per_env, num_envs, 1, device=self.device)
self.values = torch.zeros(num_transitions_per_env, num_envs, 1, device=self.device)
self.returns = torch.zeros(num_transitions_per_env, num_envs, 1, device=self.device)
self.advantages = torch.zeros(num_transitions_per_env, num_envs, 1, device=self.device)
self.mu = torch.zeros(num_transitions_per_env, num_envs, *actions_shape, device=self.device)
self.sigma = torch.zeros(num_transitions_per_env, num_envs, *actions_shape, device=self.device)
self.num_transitions_per_env = num_transitions_per_env
self.num_envs = num_envs
if num_single_obs is not None:
self.next_proprio_obs = torch.zeros(num_transitions_per_env, num_envs, num_single_obs, device=self.device)
self.num_single_obs = num_single_obs
# rnn
self.saved_hidden_states_a = None
self.saved_hidden_states_c = None
self.step = 0
def add_transitions(self, transition: Transition):
if self.step >= self.num_transitions_per_env:
raise AssertionError("Rollout buffer overflow")
self.observations[self.step].copy_(transition.observations)
if self.privileged_observations is not None: self.privileged_observations[self.step].copy_(transition.critic_observations)
self.actions[self.step].copy_(transition.actions)
self.rewards[self.step].copy_(transition.rewards.view(-1, 1))
self.dones[self.step].copy_(transition.dones.view(-1, 1))
self.values[self.step].copy_(transition.values)
self.actions_log_prob[self.step].copy_(transition.actions_log_prob.view(-1, 1))
self.mu[self.step].copy_(transition.action_mean)
self.sigma[self.step].copy_(transition.action_sigma)
self._save_hidden_states(transition.hidden_states)
if self.num_single_obs is not None:
self.next_proprio_obs[self.step].copy_(transition.next_proprio_obs)
self.step += 1
def _save_hidden_states(self, hidden_states):
if hidden_states is None or hidden_states==(None, None):
return
        # make a tuple out of GRU hidden states to match the LSTM format
hid_a = hidden_states[0] if isinstance(hidden_states[0], tuple) else (hidden_states[0],)
hid_c = hidden_states[1] if isinstance(hidden_states[1], tuple) else (hidden_states[1],)
# initialize if needed
if self.saved_hidden_states_a is None:
self.saved_hidden_states_a = [torch.zeros(self.observations.shape[0], *hid_a[i].shape, device=self.device) for i in range(len(hid_a))]
self.saved_hidden_states_c = [torch.zeros(self.observations.shape[0], *hid_c[i].shape, device=self.device) for i in range(len(hid_c))]
# copy the states
for i in range(len(hid_a)):
self.saved_hidden_states_a[i][self.step].copy_(hid_a[i])
self.saved_hidden_states_c[i][self.step].copy_(hid_c[i])
def clear(self):
self.step = 0
def compute_returns(self, last_values, gamma, lam):
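        # Generalized Advantage Estimation (GAE), computed backwards in time:
        #   delta_t = r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t)
        #   A_t     = delta_t + gamma * lam * (1 - done_t) * A_{t+1}
        # so returns[step] = A_t + V(s_t); advantages are normalized afterwards.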
advantage = 0
for step in reversed(range(self.num_transitions_per_env)):
if step == self.num_transitions_per_env - 1:
next_values = last_values
else:
next_values = self.values[step + 1]
next_is_not_terminal = 1.0 - self.dones[step].float()
delta = self.rewards[step] + next_is_not_terminal * gamma * next_values - self.values[step]
advantage = delta + next_is_not_terminal * gamma * lam * advantage
self.returns[step] = advantage + self.values[step]
# Compute and normalize the advantages
self.advantages = self.returns - self.values
self.advantages = (self.advantages - self.advantages.mean()) / (self.advantages.std() + 1e-8)
def get_statistics(self):
done = self.dones
done[-1] = 1
flat_dones = done.permute(1, 0, 2).reshape(-1, 1)
done_indices = torch.cat((flat_dones.new_tensor([-1], dtype=torch.int64), flat_dones.nonzero(as_tuple=False)[:, 0]))
trajectory_lengths = (done_indices[1:] - done_indices[:-1])
return trajectory_lengths.float().mean(), self.rewards.mean()
def mini_batch_generator(self, num_mini_batches, num_epochs=8):
batch_size = self.num_envs * self.num_transitions_per_env
mini_batch_size = batch_size // num_mini_batches
indices = torch.randperm(num_mini_batches*mini_batch_size, requires_grad=False, device=self.device)
observations = self.observations.flatten(0, 1)
if self.privileged_observations is not None:
critic_observations = self.privileged_observations.flatten(0, 1)
else:
critic_observations = observations
actions = self.actions.flatten(0, 1)
values = self.values.flatten(0, 1)
returns = self.returns.flatten(0, 1)
old_actions_log_prob = self.actions_log_prob.flatten(0, 1)
advantages = self.advantages.flatten(0, 1)
old_mu = self.mu.flatten(0, 1)
old_sigma = self.sigma.flatten(0, 1)
if self.num_single_obs is not None:
next_proprio_obs = self.next_proprio_obs.flatten(0, 1)
rewards = self.rewards.flatten(0, 1)
for epoch in range(num_epochs):
for i in range(num_mini_batches):
start = i*mini_batch_size
end = (i+1)*mini_batch_size
batch_idx = indices[start:end]
obs_batch = observations[batch_idx]
critic_observations_batch = critic_observations[batch_idx]
actions_batch = actions[batch_idx]
target_values_batch = values[batch_idx]
returns_batch = returns[batch_idx]
old_actions_log_prob_batch = old_actions_log_prob[batch_idx]
advantages_batch = advantages[batch_idx]
old_mu_batch = old_mu[batch_idx]
old_sigma_batch = old_sigma[batch_idx]
if self.num_single_obs is not None:
next_proprio_obs_batch = next_proprio_obs[batch_idx]
rewards_batch = rewards[batch_idx]
yield next_proprio_obs_batch, rewards_batch, obs_batch, critic_observations_batch, actions_batch, target_values_batch, advantages_batch, returns_batch, \
old_actions_log_prob_batch, old_mu_batch, old_sigma_batch, (None, None), None
else:
yield obs_batch, critic_observations_batch, actions_batch, target_values_batch, advantages_batch, returns_batch, \
old_actions_log_prob_batch, old_mu_batch, old_sigma_batch, (None, None), None
# SPDX-FileCopyrightText: Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-FileCopyrightText: Copyright (c) 2021 ETH Zurich, Nikita Rudin
# SPDX-FileCopyrightText: Copyright (c) 2024 Beijing RobotEra TECHNOLOGY CO.,LTD. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# 3. Neither the name of the copyright holder nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# Copyright (c) 2024, AgiBot Inc. All rights reserved.
import torch
from typing import Tuple, Union
from abc import ABC, abstractmethod
# minimal interface of the environment
class VecEnv(ABC):
num_envs: int
num_obs: int
num_short_obs: int
num_privileged_obs: int
num_actions: int
max_episode_length: int
privileged_obs_buf: torch.Tensor
obs_buf: torch.Tensor
rew_buf: torch.Tensor
reset_buf: torch.Tensor
episode_length_buf: torch.Tensor # current episode duration
extras: dict
device: torch.device
@abstractmethod
def step(self, actions: torch.Tensor) -> Tuple[torch.Tensor, Union[torch.Tensor, None], torch.Tensor, torch.Tensor, dict]:
pass
@abstractmethod
def reset(self, env_ids: Union[list, torch.Tensor]):
pass
@abstractmethod
def get_observations(self) -> torch.Tensor:
pass
@abstractmethod
def get_privileged_observations(self) -> Union[torch.Tensor, None]:
pass
# SPDX-FileCopyrightText: Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-FileCopyrightText: Copyright (c) 2021 ETH Zurich, Nikita Rudin
# SPDX-FileCopyrightText: Copyright (c) 2024 Beijing RobotEra TECHNOLOGY CO.,LTD. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# 3. Neither the name of the copyright holder nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# Copyright (c) 2024, AgiBot Inc. All rights reserved.
from humanoid import LEGGED_GYM_ROOT_DIR, LEGGED_GYM_ENVS_DIR
from .base.legged_robot import LeggedRobot
from .x1.x1_dh_stand_config import X1DHStandCfg, X1DHStandCfgPPO
from .x1.x1_dh_stand_env import X1DHStandEnv
from humanoid.utils.task_registry import task_registry
task_registry.register( "x1_dh_stand", X1DHStandEnv, X1DHStandCfg(), X1DHStandCfgPPO() )
# SPDX-FileCopyrightText: Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-FileCopyrightText: Copyright (c) 2021 ETH Zurich, Nikita Rudin
# SPDX-FileCopyrightText: Copyright (c) 2024 Beijing RobotEra TECHNOLOGY CO.,LTD. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# 3. Neither the name of the copyright holder nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# Copyright (c) 2024, AgiBot Inc. All rights reserved.
import inspect
class BaseConfig:
def __init__(self) -> None:
""" Initializes all member classes recursively. Ignores all namse starting with '__' (buit-in methods)."""
self.init_member_classes(self)
@staticmethod
def init_member_classes(obj):
# iterate over all attribute names
for key in dir(obj):
# disregard built-in attributes (only '__class__' needs to be skipped)
if key=="__class__":
continue
# get the corresponding attribute object
var = getattr(obj, key)
# check if the attribute is a class
if inspect.isclass(var):
# instantiate the class
i_var = var()
# set the attribute to the instance instead of the type
setattr(obj, key, i_var)
# recursively init members of the attribute
BaseConfig.init_member_classes(i_var)
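# --- Hedged usage sketch (illustrative only) ---
# BaseConfig turns nested class *definitions* into nested *instances*, so a
# config object can be edited per-instance without mutating class defaults:
#
#   class MyCfg(BaseConfig):
#       class env:               # plain nested class, never instantiated by hand
#           num_envs = 4096
#
#   cfg = MyCfg()
#   cfg.env.num_envs = 1024      # cfg.env is an instance; MyCfg.env is untouched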
# SPDX-FileCopyrightText: Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-FileCopyrightText: Copyright (c) 2021 ETH Zurich, Nikita Rudin
# SPDX-FileCopyrightText: Copyright (c) 2024 Beijing RobotEra TECHNOLOGY CO.,LTD. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# 3. Neither the name of the copyright holder nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# Copyright (c) 2024, AgiBot Inc. All rights reserved.
import sys
from isaacgym import gymapi
from isaacgym import gymutil
import numpy as np
import torch
# Base class for RL tasks
class BaseTask():
def __init__(self, cfg, sim_params, physics_engine, sim_device, headless):
self.gym = gymapi.acquire_gym()
self.sim_params = sim_params
self.physics_engine = physics_engine
self.sim_device = sim_device
sim_device_type, self.sim_device_id = gymutil.parse_device_str(
self.sim_device)
self.headless = headless
# env device is GPU only if sim is on GPU and use_gpu_pipeline=True; otherwise returned tensors are copied to CPU by PhysX.
if sim_device_type == 'cuda' and sim_params.use_gpu_pipeline:
self.device = self.sim_device
else:
self.device = 'cpu'
# graphics device for rendering, -1 for no rendering
self.graphics_device_id = self.sim_device_id
if self.headless:
self.graphics_device_id = -1
self.num_envs = cfg.env.num_envs
self.num_obs = cfg.env.num_observations
self.num_short_obs = int(cfg.env.num_single_obs * cfg.env.short_frame_stack)
self.num_privileged_obs = cfg.env.num_privileged_obs
self.num_actions = cfg.env.num_actions
self.num_single_obs = cfg.env.num_single_obs
# optimization flags for pytorch JIT
torch._C._jit_set_profiling_mode(False)
torch._C._jit_set_profiling_executor(False)
# allocate buffers
self.obs_buf = torch.zeros(
self.num_envs, self.num_obs, device=self.device, dtype=torch.float)
self.rew_buf = torch.zeros(
self.num_envs, device=self.device, dtype=torch.float)
# new reward buffers for exp rewards
self.neg_reward_buf = torch.zeros(
self.num_envs, device=self.device, dtype=torch.float)
self.pos_reward_buf = torch.zeros(
self.num_envs, device=self.device, dtype=torch.float)
self.reset_buf = torch.ones(
self.num_envs, device=self.device, dtype=torch.long)
self.episode_length_buf = torch.zeros(
self.num_envs, device=self.device, dtype=torch.long)
self.time_out_buf = torch.zeros(
self.num_envs, device=self.device, dtype=torch.bool)
if self.num_privileged_obs is not None:
self.privileged_obs_buf = torch.zeros(
self.num_envs, self.num_privileged_obs, device=self.device, dtype=torch.float)
else:
self.privileged_obs_buf = None
self.extras = {}
# create envs, sim and viewer
self.create_sim()
self.gym.prepare_sim(self.sim)
self.enable_viewer_sync = True
self.viewer = None
# if running with a viewer, set up keyboard shortcuts and camera
if not self.headless:
# subscribe to keyboard shortcuts
self.viewer = self.gym.create_viewer(
self.sim, gymapi.CameraProperties())
self.gym.subscribe_viewer_keyboard_event(
self.viewer, gymapi.KEY_ESCAPE, "QUIT")
self.gym.subscribe_viewer_keyboard_event(
self.viewer, gymapi.KEY_V, "toggle_viewer_sync")
camera_properties = gymapi.CameraProperties()
camera_properties.width = 720
camera_properties.height = 480
camera_handle = self.gym.create_camera_sensor(
self.envs[0], camera_properties)
self.camera_handle = camera_handle
else:
# create a camera sensor even when running headless (e.g. for offscreen rendering)
camera_properties = gymapi.CameraProperties()
camera_properties.width = 720
camera_properties.height = 480
camera_handle = self.gym.create_camera_sensor(
self.envs[0], camera_properties)
self.camera_handle = camera_handle
def get_observations(self):
return self.obs_buf
def get_privileged_observations(self):
return self.privileged_obs_buf
def get_rma_observations(self):
# note: self.rma_obs_buf is expected to be defined by a subclass
return self.rma_obs_buf
def reset_idx(self, env_ids):
"""Reset selected robots"""
raise NotImplementedError
def reset(self):
""" Reset all robots"""
self.reset_idx(torch.arange(self.num_envs, device=self.device))
obs, privileged_obs, _, _, _ = self.step(torch.zeros(
self.num_envs, self.num_actions, device=self.device, requires_grad=False))
return obs, privileged_obs
def step(self, actions):
raise NotImplementedError
def render(self, sync_frame_time=True):
if self.viewer:
# check for window closed
if self.gym.query_viewer_has_closed(self.viewer):
sys.exit()
# check for keyboard events
for evt in self.gym.query_viewer_action_events(self.viewer):
if evt.action == "QUIT" and evt.value > 0:
sys.exit()
elif evt.action == "toggle_viewer_sync" and evt.value > 0:
self.enable_viewer_sync = not self.enable_viewer_sync
# fetch results
if self.device != 'cpu':
self.gym.fetch_results(self.sim, True)
# step graphics
if self.enable_viewer_sync:
self.gym.step_graphics(self.sim)
self.gym.draw_viewer(self.viewer, self.sim, True)
if sync_frame_time:
self.gym.sync_frame_time(self.sim)
else:
self.gym.poll_viewer_events(self.viewer)
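# --- Hedged subclass contract sketch (illustrative only) ---
# BaseTask leaves step() and reset_idx() abstract; a minimal subclass only
# has to implement those two, e.g.:
#
#   class MyTask(BaseTask):
#       def reset_idx(self, env_ids):
#           ...  # reset the selected environments
#       def step(self, actions):
#           ...  # apply actions, simulate, refresh buffers
#           return (self.obs_buf, self.privileged_obs_buf,
#                   self.rew_buf, self.reset_buf, self.extras)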
# SPDX-FileCopyrightText: Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-FileCopyrightText: Copyright (c) 2021 ETH Zurich, Nikita Rudin
# SPDX-FileCopyrightText: Copyright (c) 2024 Beijing RobotEra TECHNOLOGY CO.,LTD. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# 3. Neither the name of the copyright holder nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# Copyright (c) 2024, AgiBot Inc. All rights reserved.
import os
import numpy as np
from isaacgym.torch_utils import *
from isaacgym import gymtorch, gymapi, gymutil
from collections import deque
import torch
from humanoid import LEGGED_GYM_ROOT_DIR
from humanoid.envs.base.base_task import BaseTask
# from humanoid.utils.terrain import Terrain
from humanoid.utils.math import quat_apply_yaw, wrap_to_pi, torch_rand_sqrt_float
from humanoid.utils.helpers import class_to_dict
from .legged_robot_config import LeggedRobotCfg
from humanoid.utils.terrain import Terrain
def copysign_new(a, b):
a = torch.tensor(a, device=b.device, dtype=torch.float)
a = a.expand_as(b)
return torch.abs(a) * torch.sign(b)
def get_euler_rpy(q):
qx, qy, qz, qw = 0, 1, 2, 3
# roll (x-axis rotation)
sinr_cosp = 2.0 * (q[..., qw] * q[..., qx] + q[..., qy] * q[..., qz])
cosr_cosp = q[..., qw] * q[..., qw] - q[..., qx] * \
q[..., qx] - q[..., qy] * q[..., qy] + q[..., qz] * q[..., qz]
roll = torch.atan2(sinr_cosp, cosr_cosp)
# pitch (y-axis rotation)
sinp = 2.0 * (q[..., qw] * q[..., qy] - q[..., qz] * q[..., qx])
pitch = torch.where(torch.abs(sinp) >= 1, copysign_new(
np.pi / 2.0, sinp), torch.asin(sinp))
# yaw (z-axis rotation)
siny_cosp = 2.0 * (q[..., qw] * q[..., qz] + q[..., qx] * q[..., qy])
cosy_cosp = q[..., qw] * q[..., qw] + q[..., qx] * \
q[..., qx] - q[..., qy] * q[..., qy] - q[..., qz] * q[..., qz]
yaw = torch.atan2(siny_cosp, cosy_cosp)
return roll % (2*np.pi), pitch % (2*np.pi), yaw % (2*np.pi)
def get_euler_xyz_tensor(quat):
r, p, w = get_euler_rpy(quat)
# stack r, p, w in dim1
euler_xyz = torch.stack((r, p, w), dim=-1)
euler_xyz[euler_xyz > np.pi] -= 2 * np.pi
return euler_xyz
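# --- Hedged sanity check (illustrative only) ---
# get_euler_xyz_tensor maps xyzw quaternions to roll/pitch/yaw wrapped into
# (-pi, pi]. The identity quaternion should therefore give zero angles:
#
#   q = torch.tensor([[0.0, 0.0, 0.0, 1.0]])   # (x, y, z, w)
#   assert torch.allclose(get_euler_xyz_tensor(q), torch.zeros(1, 3))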
class LeggedRobot(BaseTask):
def __init__(self, cfg: LeggedRobotCfg, sim_params, physics_engine, sim_device, headless):
""" Parses the provided config file,
calls create_sim() (which creates, simulation, terrain and environments),
initilizes pytorch buffers used during training
Args:
cfg (Dict): Environment config file
sim_params (gymapi.SimParams): simulation parameters
physics_engine (gymapi.SimType): gymapi.SIM_PHYSX (must be PhysX)
sim_device (string): device string, e.g. 'cuda:0' or 'cpu'
headless (bool): Run without rendering if True
"""
self.cfg = cfg
self.sim_params = sim_params
self.height_samples = None
self.debug_viz = False
self.init_done = False
self._parse_cfg(self.cfg)
super().__init__(self.cfg, sim_params, physics_engine, sim_device, headless)
if not self.headless:
self.set_camera(self.cfg.viewer.pos, self.cfg.viewer.lookat)
self._init_buffers()
self._prepare_reward_function()
self.init_done = True
def step(self, actions):
""" Apply actions, simulate, call self.post_physics_step()
Args:
actions (torch.Tensor): Tensor of shape (num_envs, num_actions_per_env)
"""
clip_actions = self.cfg.normalization.clip_actions
self.actions = torch.clip(actions, -clip_actions, clip_actions).to(self.device)
# step physics and render each frame
self.render()
for _ in range(self.cfg.control.decimation):
self.torques = self._compute_torques(self.actions).view(self.torques.shape)
self.gym.set_dof_actuation_force_tensor(self.sim, gymtorch.unwrap_tensor(self.torques))
self.gym.simulate(self.sim)
if self.device == 'cpu':
self.gym.fetch_results(self.sim, True)
self.gym.refresh_dof_state_tensor(self.sim)
if self.cfg.domain_rand.add_dof_lag:
q = self.dof_pos
dq = self.dof_vel
self.dof_lag_buffer[:,:,1:] = self.dof_lag_buffer[:,:,:self.cfg.domain_rand.dof_lag_timesteps_range[1]].clone()
self.dof_lag_buffer[:,:,0] = torch.cat((q, dq), 1).clone()
if self.cfg.domain_rand.add_dof_pos_vel_lag:
q = self.dof_pos
self.dof_pos_lag_buffer[:,:,1:] = self.dof_pos_lag_buffer[:,:,:self.cfg.domain_rand.dof_pos_lag_timesteps_range[1]].clone()
self.dof_pos_lag_buffer[:,:,0] = q.clone()
dq = self.dof_vel
self.dof_vel_lag_buffer[:,:,1:] = self.dof_vel_lag_buffer[:,:,:self.cfg.domain_rand.dof_vel_lag_timesteps_range[1]].clone()
self.dof_vel_lag_buffer[:,:,0] = dq.clone()
if self.cfg.domain_rand.add_imu_lag:
self.gym.refresh_actor_root_state_tensor(self.sim)
self.base_quat[:] = self.root_states[:, 3:7]
self.base_ang_vel[:] = quat_rotate_inverse(self.base_quat, self.root_states[:, 10:13])
self.base_euler_xyz = get_euler_xyz_tensor(self.base_quat)
self.imu_lag_buffer[:,:,1:] = self.imu_lag_buffer[:,:,:self.cfg.domain_rand.imu_lag_timesteps_range[1]].clone()
self.imu_lag_buffer[:,:,0] = torch.cat((self.base_ang_vel, self.base_euler_xyz ), 1).clone()
self.post_physics_step()
# return clipped obs, clipped states (None), rewards, dones and infos
clip_obs = self.cfg.normalization.clip_observations
self.obs_buf = torch.clip(self.obs_buf, -clip_obs, clip_obs)
if self.privileged_obs_buf is not None:
self.privileged_obs_buf = torch.clip(self.privileged_obs_buf, -clip_obs, clip_obs)
return self.obs_buf, self.privileged_obs_buf, self.rew_buf, self.reset_buf, self.extras
def reset(self):
""" Reset all robots"""
self.reset_idx(torch.arange(self.num_envs, device=self.device))
obs, privileged_obs, _, _, _ = self.step(torch.zeros(
self.num_envs, self.num_actions, device=self.device, requires_grad=False))
return obs, privileged_obs
def post_physics_step(self):
""" check terminations, compute observations and rewards
calls self._post_physics_step_callback() for common computations
calls self._draw_debug_vis() if needed
"""
self.gym.refresh_actor_root_state_tensor(self.sim)
self.gym.refresh_net_contact_force_tensor(self.sim)
self.gym.refresh_rigid_body_state_tensor(self.sim)
self.episode_length_buf += 1
self.common_step_counter += 1
# prepare quantities
self.base_quat[:] = self.root_states[:, 3:7]
self.base_lin_vel[:] = quat_rotate_inverse(self.base_quat, self.root_states[:, 7:10])
self.base_ang_vel[:] = quat_rotate_inverse(self.base_quat, self.root_states[:, 10:13])
self.projected_gravity[:] = quat_rotate_inverse(self.base_quat, self.gravity_vec)
self.base_euler_xyz = get_euler_xyz_tensor(self.base_quat)
self.feet_quat = self.rigid_state[:, self.feet_indices, 3:7]
self.feet_euler_xyz = get_euler_xyz_tensor(self.feet_quat)
self._post_physics_step_callback()
# compute observations, rewards, resets, ...
self.check_termination()
self.compute_reward()
env_ids = self.reset_buf.nonzero(as_tuple=False).flatten()
self.reset_idx(env_ids)
self.compute_observations() # in some cases a simulation step might be required to refresh some obs (for example body positions)
# get last status
self.last_last_actions[:] = torch.clone(self.last_actions[:])
self.last_actions[:] = self.actions[:]
self.last_dof_vel[:] = self.dof_vel[:]
self.last_root_vel[:] = self.root_states[:, 7:13]
self.last_rigid_state[:] = self.rigid_state[:]
self.last_contact_forces[:] = self.contact_forces[:]
self.last_torques[:] = self.torques[:]
if self.viewer and self.enable_viewer_sync and self.debug_viz:
self._draw_debug_vis()
def check_termination(self):
""" Check if environments need to be reset
"""
self.reset_buf = torch.any(torch.norm(self.contact_forces[:, self.termination_contact_indices, :], dim=-1) > 1., dim=1)
self.time_out_buf = self.episode_length_buf > self.max_episode_length # no terminal reward for time-outs
roll_cutoff = torch.abs(self.base_euler_xyz[:,0]) > 1.5
pitch_cutoff = torch.abs(self.base_euler_xyz[:,1]) > 1.5
self.reset_buf |= self.time_out_buf
self.reset_buf |= roll_cutoff
self.reset_buf |= pitch_cutoff
def reset_idx(self, env_ids):
""" Reset some environments.
Calls self._reset_dofs(env_ids), self._reset_root_states(env_ids), and self._resample_commands(env_ids)
[Optional] calls self._update_terrain_curriculum(env_ids), self.update_command_curriculum(env_ids) and
Logs episode info
Resets some buffers
Args:
env_ids (list[int]): List of environment ids which must be reset
"""
if len(env_ids) == 0:
return
# update curriculum
if self.cfg.terrain.curriculum:
# if the robot walked far enough, reset it on more difficult terrain
self._update_terrain_curriculum(env_ids)
# avoid updating command curriculum at each step since the maximum command is common to all envs
if self.cfg.commands.curriculum and (self.common_step_counter % self.max_episode_length==0):
# If the tracking reward is above 80% of the maximum, increase the range of commands
self.update_command_curriculum(env_ids)
# reset rand dof_pos and dof_vel=0
self._reset_dofs(env_ids)
# reset base position
self._reset_root_states(env_ids)
# sample cmd based on self.command_ranges
self._resample_commands(env_ids)
# Randomize joint parameters such as torque gain, friction, ...
self.randomize_dof_props(env_ids)
self._refresh_actor_dof_props(env_ids)
self.randomize_lag_props(env_ids)
# reset buffers
self.last_last_actions[env_ids] = 0.
self.actions[env_ids] = 0.
self.last_actions[env_ids] = 0.
self.last_torques[env_ids] = 0.
self.last_rigid_state[env_ids] = 0.
self.last_contact_forces[env_ids] = 0.
self.last_dof_vel[env_ids] = 0.
self.last_root_vel[env_ids] = 0.
self.feet_air_time[env_ids] = 0.
self.episode_length_buf[env_ids] = 0
self.reset_buf[env_ids] = 1
# fill extras
self.extras["episode"] = {}
for key in self.episode_sums.keys():
self.extras["episode"]['rew_' + key] = torch.mean(self.episode_sums[key][env_ids]) / self.max_episode_length_s
self.episode_sums[key][env_ids] = 0.
# log additional curriculum info
if self.cfg.terrain.mesh_type == "trimesh":
self.extras["episode"]["terrain_level"] = torch.mean(self.terrain_levels.float())
if self.cfg.commands.curriculum:
self.extras["episode"]["max_command_x"] = self.command_ranges["lin_vel_x"][1]
# send timeout info to the algorithm
if self.cfg.env.send_timeouts:
self.extras["time_outs"] = self.time_out_buf
self.gym.refresh_actor_root_state_tensor(self.sim)
self.gym.refresh_net_contact_force_tensor(self.sim)
self.gym.refresh_rigid_body_state_tensor(self.sim)
# recompute base quantities after reset to avoid stale projected gravity
self.base_quat[env_ids] = self.root_states[env_ids, 3:7]
self.base_euler_xyz = get_euler_xyz_tensor(self.base_quat)
self.projected_gravity[env_ids] = quat_rotate_inverse(self.base_quat[env_ids], self.gravity_vec[env_ids])
self.base_lin_vel[env_ids] = quat_rotate_inverse(self.base_quat[env_ids], self.root_states[env_ids, 7:10])
self.base_ang_vel[env_ids] = quat_rotate_inverse(self.base_quat[env_ids], self.root_states[env_ids, 10:13])
self.feet_quat[env_ids] = self.rigid_state[env_ids, self.feet_indices, 3:7]
self.feet_euler_xyz = get_euler_xyz_tensor(self.feet_quat)
def randomize_lag_props(self,env_ids):
""" random add lag
"""
if self.cfg.domain_rand.add_lag:
self.lag_buffer[env_ids, :, :] = 0.0
if self.cfg.domain_rand.randomize_lag_timesteps:
self.lag_timestep[env_ids] = torch.randint(self.cfg.domain_rand.lag_timesteps_range[0],
self.cfg.domain_rand.lag_timesteps_range[1]+1,(len(env_ids),),device=self.device)
if self.cfg.domain_rand.randomize_lag_timesteps_perstep:
self.last_lag_timestep[env_ids] = self.cfg.domain_rand.lag_timesteps_range[1]
else:
self.lag_timestep[env_ids] = self.cfg.domain_rand.lag_timesteps_range[1]
if self.cfg.domain_rand.add_dof_lag:
self.dof_lag_buffer[env_ids, :, :] = 0.0
if self.cfg.domain_rand.randomize_dof_lag_timesteps:
self.dof_lag_timestep[env_ids] = torch.randint(self.cfg.domain_rand.dof_lag_timesteps_range[0],
self.cfg.domain_rand.dof_lag_timesteps_range[1]+1, (len(env_ids),),device=self.device)
if self.cfg.domain_rand.randomize_dof_lag_timesteps_perstep:
self.last_dof_lag_timestep[env_ids] = self.cfg.domain_rand.dof_lag_timesteps_range[1]
else:
self.dof_lag_timestep[env_ids] = self.cfg.domain_rand.dof_lag_timesteps_range[1]
if self.cfg.domain_rand.add_imu_lag:
self.imu_lag_buffer[env_ids, :, :] = 0.0
if self.cfg.domain_rand.randomize_imu_lag_timesteps:
self.imu_lag_timestep[env_ids] = torch.randint(self.cfg.domain_rand.imu_lag_timesteps_range[0],
self.cfg.domain_rand.imu_lag_timesteps_range[1]+1, (len(env_ids),),device=self.device)
if self.cfg.domain_rand.randomize_imu_lag_timesteps_perstep:
self.last_imu_lag_timestep[env_ids] = self.cfg.domain_rand.imu_lag_timesteps_range[1]
else:
self.imu_lag_timestep[env_ids] = self.cfg.domain_rand.imu_lag_timesteps_range[1]
if self.cfg.domain_rand.add_dof_pos_vel_lag:
self.dof_pos_lag_buffer[env_ids, :, :] = 0.0
self.dof_vel_lag_buffer[env_ids, :, :] = 0.0
if self.cfg.domain_rand.randomize_dof_pos_lag_timesteps:
self.dof_pos_lag_timestep[env_ids] = torch.randint(self.cfg.domain_rand.dof_pos_lag_timesteps_range[0],
self.cfg.domain_rand.dof_pos_lag_timesteps_range[1]+1, (len(env_ids),),device=self.device)
if self.cfg.domain_rand.randomize_dof_pos_lag_timesteps_perstep:
self.last_dof_pos_lag_timestep[env_ids] = self.cfg.domain_rand.dof_pos_lag_timesteps_range[1]
else:
self.dof_pos_lag_timestep[env_ids] = self.cfg.domain_rand.dof_pos_lag_timesteps_range[1]
if self.cfg.domain_rand.randomize_dof_vel_lag_timesteps:
self.dof_vel_lag_timestep[env_ids] = torch.randint(self.cfg.domain_rand.dof_vel_lag_timesteps_range[0],
self.cfg.domain_rand.dof_vel_lag_timesteps_range[1]+1, (len(env_ids),),device=self.device)
if self.cfg.domain_rand.randomize_dof_vel_lag_timesteps_perstep:
self.last_dof_vel_lag_timestep[env_ids] = self.cfg.domain_rand.dof_vel_lag_timesteps_range[1]
else:
self.dof_vel_lag_timestep[env_ids] = self.cfg.domain_rand.dof_vel_lag_timesteps_range[1]
def compute_reward(self):
""" Compute rewards
Calls each reward function which had a non-zero scale (processed in self._prepare_reward_function())
adds each term to the episode sums and to the total reward
"""
self.rew_buf[:] = 0.
for i in range(len(self.reward_functions)):
name = self.reward_names[i]
rew = self.reward_functions[i]() * self.reward_scales[name]
self.rew_buf += rew
self.episode_sums[name] += rew
if self.cfg.rewards.only_positive_rewards:
self.rew_buf[:] = torch.clip(self.rew_buf[:], min=0.)
# add termination reward after clipping
if "termination" in self.reward_scales:
rew = self._reward_termination() * self.reward_scales["termination"]
self.rew_buf += rew
self.episode_sums["termination"] += rew
def set_camera(self, position, lookat):
""" Set camera position and direction
"""
cam_pos = gymapi.Vec3(position[0], position[1], position[2])
cam_target = gymapi.Vec3(lookat[0], lookat[1], lookat[2])
self.gym.viewer_camera_look_at(self.viewer, None, cam_pos, cam_target)
#------------- Callbacks --------------
def randomize_rigid_body_props(self, env_ids):
''' Randomise some of the rigid body properties of the actor in the given environments, i.e.
sample the mass, centre of mass position, friction and restitution.'''
if self.cfg.domain_rand.randomize_base_mass:
min_payload, max_payload = self.cfg.domain_rand.added_mass_range
self.payload_masses[env_ids] = torch_rand_float(min_payload, max_payload, (len(env_ids), 1), device=self.device)
if self.cfg.domain_rand.randomize_link_mass:
min_link_mass, max_link_mass = self.cfg.domain_rand.added_link_mass_range
self.link_masses[env_ids] = torch_rand_float(min_link_mass, max_link_mass, (len(env_ids), self.num_bodies-1), device=self.device)
if self.cfg.domain_rand.randomize_com:
comx_displacement, comy_displacement, comz_displacement = self.cfg.domain_rand.com_displacement_range
self.com_displacements[env_ids, :] = torch.cat((torch_rand_float(comx_displacement[0], comx_displacement[1], (len(env_ids), 1), device=self.device),
torch_rand_float(comy_displacement[0], comy_displacement[1], (len(env_ids), 1), device=self.device),
torch_rand_float(comz_displacement[0], comz_displacement[1], (len(env_ids), 1), device=self.device)),
dim=-1)
if self.cfg.domain_rand.randomize_link_com:
comx_displacement, comy_displacement, comz_displacement = self.cfg.domain_rand.link_com_displacement_range
self.link_com_displacements[env_ids, :, :] = torch.cat((torch_rand_float(comx_displacement[0], comx_displacement[1], (len(env_ids), self.num_bodies-1, 1), device=self.device),
torch_rand_float(comy_displacement[0], comy_displacement[1], (len(env_ids), self.num_bodies-1, 1), device=self.device),
torch_rand_float(comz_displacement[0], comz_displacement[1], (len(env_ids), self.num_bodies-1, 1), device=self.device)),
dim=-1)
if self.cfg.domain_rand.randomize_base_inertia:
inertia_x, inertia_y, inertia_z = self.cfg.domain_rand.base_inertial_range
self.base_inertia_x[env_ids, :, :] = torch_rand_float(inertia_x[0], inertia_x[1], (len(env_ids), 1), device=self.device)
self.base_inertia_y[env_ids, :, :] = torch_rand_float(inertia_y[0], inertia_y[1], (len(env_ids), 1), device=self.device)
self.base_inertia_z[env_ids, :, :] = torch_rand_float(inertia_z[0], inertia_z[1], (len(env_ids), 1), device=self.device)
if self.cfg.domain_rand.randomize_link_inertia:
inertia_x, inertia_y, inertia_z = self.cfg.domain_rand.link_inertial_range
self.link_inertia_x[env_ids, :, :] = torch_rand_float(inertia_x[0], inertia_x[1], (len(env_ids), self.num_bodies-1), device=self.device)
self.link_inertia_y[env_ids, :, :] = torch_rand_float(inertia_y[0], inertia_y[1], (len(env_ids), self.num_bodies-1), device=self.device)
self.link_inertia_z[env_ids, :, :] = torch_rand_float(inertia_z[0], inertia_z[1], (len(env_ids), self.num_bodies-1), device=self.device)
def randomize_dof_props(self, env_ids):
# Randomise the motor strength:
# randomize output torque
if self.cfg.domain_rand.randomize_torque:
motor_strength_ranges = self.cfg.domain_rand.torque_multiplier_range
self.torque_multi[env_ids] = torch_rand_float(motor_strength_ranges[0], motor_strength_ranges[1], (len(env_ids),self.num_actions), device=self.device)
# rand motor position offset
if self.cfg.domain_rand.randomize_motor_offset:
min_offset, max_offset = self.cfg.domain_rand.motor_offset_range
self.motor_offsets[env_ids, :] = torch_rand_float(min_offset, max_offset, (len(env_ids),self.num_actions), device=self.device)
# rand kp kd gain
if self.cfg.domain_rand.randomize_gains:
p_gains_range = self.cfg.domain_rand.stiffness_multiplier_range
d_gains_range = self.cfg.domain_rand.damping_multiplier_range
self.randomized_p_gains[env_ids] = torch_rand_float(p_gains_range[0], p_gains_range[1], (len(env_ids),self.num_actions), device=self.device) * self.p_gains
self.randomized_d_gains[env_ids] = torch_rand_float(d_gains_range[0], d_gains_range[1], (len(env_ids),self.num_actions), device=self.device) * self.d_gains
# randomize joint friction applied to the torque
if self.cfg.domain_rand.randomize_coulomb_friction:
joint_coulomb_range = self.cfg.domain_rand.joint_coulomb_range
joint_viscous_range = self.cfg.domain_rand.joint_viscous_range
self.randomized_joint_coulomb[env_ids] = torch_rand_float(joint_coulomb_range[0], joint_coulomb_range[1], (len(env_ids),self.num_actions), device=self.device)
self.randomized_joint_viscous[env_ids] = torch_rand_float(joint_viscous_range[0], joint_viscous_range[1], (len(env_ids),self.num_actions), device=self.device)
# rand joint friction set in sim
if self.cfg.domain_rand.randomize_joint_friction:
if self.cfg.domain_rand.randomize_joint_friction_each_joint:
for i in range(self.num_dofs):
range_key = f'joint_{i+1}_friction_range'
friction_range = getattr(self.cfg.domain_rand, range_key)
self.joint_friction_coeffs[env_ids, i] = torch_rand_float(friction_range[0], friction_range[1], (len(env_ids), 1), device=self.device).reshape(-1)
else:
joint_friction_range = self.cfg.domain_rand.joint_friction_range
self.joint_friction_coeffs[env_ids] = torch_rand_float(joint_friction_range[0], joint_friction_range[1], (len(env_ids), 1), device=self.device)
# rand joint damping set in sim
if self.cfg.domain_rand.randomize_joint_damping:
if self.cfg.domain_rand.randomize_joint_damping_each_joint:
for i in range(self.num_dofs):
range_key = f'joint_{i+1}_damping_range'
damping_range = getattr(self.cfg.domain_rand, range_key)
self.joint_damping_coeffs[env_ids, i] = torch_rand_float(damping_range[0], damping_range[1], (len(env_ids), 1), device=self.device).reshape(-1)
else:
joint_damping_range = self.cfg.domain_rand.joint_damping_range
self.joint_damping_coeffs[env_ids] = torch_rand_float(joint_damping_range[0], joint_damping_range[1], (len(env_ids), 1), device=self.device)
# rand joint armature inertia set in sim
if self.cfg.domain_rand.randomize_joint_armature:
if self.cfg.domain_rand.randomize_joint_armature_each_joint:
for i in range(self.num_dofs):
range_key = f'joint_{i+1}_armature_range'
armature_range = getattr(self.cfg.domain_rand, range_key)
self.joint_armatures[env_ids, i] = torch_rand_float(armature_range[0], armature_range[1], (len(env_ids), 1), device=self.device).reshape(-1)
else:
joint_armature_range = self.cfg.domain_rand.joint_armature_range
self.joint_armatures[env_ids] = torch_rand_float(joint_armature_range[0], joint_armature_range[1], (len(env_ids), 1), device=self.device)
def _process_rigid_shape_props(self, props, env_id):
""" Callback allowing to store/change/randomize the rigid shape properties of each environment.
Called during environment creation.
Base behavior: randomizes the friction of each environment
Args:
props (List[gymapi.RigidShapeProperties]): Properties of each shape of the asset
env_id (int): Environment id
Returns:
[List[gymapi.RigidShapeProperties]]: Modified rigid shape properties
"""
# rigid shapes set rand friction and restitution
if self.cfg.domain_rand.randomize_friction:
if env_id==0:
# prepare friction randomization
friction_range = self.cfg.domain_rand.friction_range
restitution_range = self.cfg.domain_rand.restitution_range
num_buckets = 256
bucket_ids = torch.randint(0, num_buckets, (self.num_envs, 1))
friction_buckets = torch_rand_float(friction_range[0], friction_range[1], (num_buckets,1), device='cpu')
restitution_buckets = torch_rand_float(restitution_range[0], restitution_range[1], (num_buckets,1), device='cpu')
self.friction_coeffs = friction_buckets[bucket_ids]
self.restitution_coeffs = restitution_buckets[bucket_ids]
for s in range(len(props)):
props[s].friction = self.friction_coeffs[env_id]
props[s].restitution = self.restitution_coeffs[env_id]
self.env_frictions[env_id] = self.friction_coeffs[env_id]
return props
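# --- Hedged sketch of the friction bucketing above (illustrative only) ---
# Rather than giving every env a unique friction value, a fixed pool of 256
# values is sampled once and each env draws an index into the pool, keeping
# the number of distinct physics materials small:
#
#   num_buckets, num_envs = 256, 4096          # illustrative sizes
#   buckets = torch_rand_float(0.1, 1.25, (num_buckets, 1), device='cpu')
#   ids = torch.randint(0, num_buckets, (num_envs, 1))
#   per_env_friction = buckets[ids]            # shape (num_envs, 1, 1)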
def _process_dof_props(self, props, env_id):
""" Callback allowing to store/change/randomize the DOF properties of each environment.
Called during environment creation.
Base behavior: stores position, velocity and torques limits defined in the URDF
Args:
props (numpy.array): Properties of each DOF of the asset
env_id (int): Environment id
Returns:
[numpy.array]: Modified DOF properties
"""
# all robots share the same DOF properties, so reading them from the first env is enough
if env_id==0:
self.dof_pos_limits = torch.zeros(self.num_dof, 2, dtype=torch.float, device=self.device, requires_grad=False)
self.dof_vel_limits = torch.zeros(self.num_dof, dtype=torch.float, device=self.device, requires_grad=False)
self.torque_limits = torch.zeros(self.num_dof, dtype=torch.float, device=self.device, requires_grad=False)
for i in range(len(props)):
self.dof_pos_limits[i, 0] = props["lower"][i].item() * self.cfg.safety.pos_limit
self.dof_pos_limits[i, 1] = props["upper"][i].item() * self.cfg.safety.pos_limit
self.dof_vel_limits[i] = props["velocity"][i].item() * self.cfg.safety.vel_limit
self.torque_limits[i] = props["effort"][i].item() * self.cfg.safety.torque_limit
# soft limits
m = (self.dof_pos_limits[i, 0] + self.dof_pos_limits[i, 1]) / 2
r = self.dof_pos_limits[i, 1] - self.dof_pos_limits[i, 0]
self.dof_pos_limits[i, 0] = m - 0.5 * r * self.cfg.rewards.soft_dof_pos_limit
self.dof_pos_limits[i, 1] = m + 0.5 * r * self.cfg.rewards.soft_dof_pos_limit
return props
def _process_rigid_body_props(self, props, env_id):
# props[0].mass is the total mass of all fixed links
# len(props) is the number of revolute joints + 1
# add a random payload to the fixed base body
if self.cfg.domain_rand.randomize_base_mass:
self.init_body_mass[env_id] = props[0].mass
props[0].mass += self.payload_masses[env_id]
self.body_mass[env_id] = props[0].mass
# rand all link mass and recalculate total mass
if self.cfg.domain_rand.randomize_link_mass:
for i in range(1, len(props)):
props[i].mass *= self.link_masses[env_id, i-1]
for i in range(1, len(props)):
self.total_mass[env_id] += props[i].mass
# rand fix body com
if self.cfg.domain_rand.randomize_com:
props[0].com = gymapi.Vec3(self.com_displacements[env_id, 0], self.com_displacements[env_id, 1],
self.com_displacements[env_id, 2])
# rand link com
if self.cfg.domain_rand.randomize_link_com:
for i in range(1, len(props)):
props[i].com = gymapi.Vec3(self.link_com_displacements[env_id, i-1, 0], self.link_com_displacements[env_id, i-1, 1],
self.link_com_displacements[env_id, i-1, 2])
# rand fix body inertia
if self.cfg.domain_rand.randomize_base_inertia:
props[0].inertia.x.x *= self.base_inertia_x[env_id]
props[0].inertia.y.y *= self.base_inertia_y[env_id]
props[0].inertia.z.z *= self.base_inertia_z[env_id]
# rand link inertia
if self.cfg.domain_rand.randomize_link_inertia:
for i in range(1, len(props)):
props[i].inertia.x.x *= self.link_inertia_x[env_id, i-1]
props[i].inertia.y.y *= self.link_inertia_y[env_id, i-1]
props[i].inertia.z.z *= self.link_inertia_z[env_id, i-1]
return props
def randomize_rigid_props(self,env_ids):
if self.cfg.domain_rand.randomize_friction:
min_friction, max_friction = self.cfg.domain_rand.friction_range
self.friction[env_ids, :] = torch_rand_float(min_friction, max_friction, (len(env_ids), 1), device=self.device)
def _refresh_actor_rigid_shape_props(self, env_ids):
''' Refresh the rigid shape properties of the actor in the given environments, i.e.
set the friction coefficients to the desired values.
'''
for env_id in env_ids:
rigid_shape_props = self.gym.get_actor_rigid_shape_properties(self.envs[env_id], 0)
for i in range(len(rigid_shape_props)):
rigid_shape_props[i].friction = self.friction[env_id, 0]
self.gym.set_actor_rigid_shape_properties(self.envs[env_id], 0, rigid_shape_props)
def _refresh_actor_rigid_body_props(self, env_ids):
''' Refresh the rigid body properties of the actor in the given environments, i.e.
set the payload_mass to the desired values.
'''
for env_id in env_ids:
rigid_body_props = self.gym.get_actor_rigid_body_properties(self.envs[env_id], 0)
rigid_body_props[0].mass = self.payload_masses[env_id] + self.init_body_mass[env_id]
self.gym.set_actor_rigid_body_properties(self.envs[env_id], 0, rigid_body_props)
# a function to update dof friction, damping, armature
def _refresh_actor_dof_props(self, env_ids):
''' Refresh the dof properties of the actor in the given environments, i.e.
dof friction, damping, armature
'''
for env_id in env_ids:
dof_props = self.gym.get_actor_dof_properties(self.envs[env_id], 0)
for i in range(self.num_dof):
if self.cfg.domain_rand.randomize_joint_friction:
if self.cfg.domain_rand.randomize_joint_friction_each_joint:
dof_props["friction"][i] *= self.joint_friction_coeffs[env_id, i]
else:
dof_props["friction"][i] *= self.joint_friction_coeffs[env_id, 0]
if self.cfg.domain_rand.randomize_joint_damping:
if self.cfg.domain_rand.randomize_joint_damping_each_joint:
dof_props["damping"][i] *= self.joint_damping_coeffs[env_id, i]
else:
dof_props["damping"][i] *= self.joint_damping_coeffs[env_id, 0]
if self.cfg.domain_rand.randomize_joint_armature:
if self.cfg.domain_rand.randomize_joint_armature_each_joint:
dof_props["armature"][i] = self.joint_armatures[env_id, i]
else:
dof_props["armature"][i] = self.joint_armatures[env_id, 0]
self.gym.set_actor_dof_properties(self.envs[env_id], 0, dof_props)
def _post_physics_step_callback(self):
''' Callback called before computing terminations, rewards, and observations
Default behaviour: Compute ang vel command based on target and heading, compute measured terrain heights and randomly push robots
'''
env_ids = (self.episode_length_buf % int(self.cfg.commands.resampling_time / self.dt)==0).nonzero(as_tuple=False).flatten()
self._resample_commands(env_ids)
if self.cfg.commands.heading_command:
forward = quat_apply(self.base_quat, self.forward_vec)
heading = torch.atan2(forward[:, 1], forward[:, 0])
self.commands[:, 2] = torch.clip(0.5*wrap_to_pi(self.commands[:, 3] - heading), -1., 1.)
if self.cfg.terrain.measure_heights:
self.measured_heights = self._get_heights()
if self.cfg.domain_rand.push_robots:
i = int(self.common_step_counter/self.cfg.domain_rand.update_step)
if i >= len(self.cfg.domain_rand.push_duration):
i = len(self.cfg.domain_rand.push_duration) - 1
duration = self.cfg.domain_rand.push_duration[i]/self.dt
if self.common_step_counter % self.cfg.domain_rand.push_interval <= duration:
self._push_robots()
else:
self.rand_push_force.zero_()
self.rand_push_torque.zero_()
if self.cfg.domain_rand.add_ext_force:
i = int(self.common_step_counter/self.cfg.domain_rand.add_update_step)
if i >= len(self.cfg.domain_rand.add_duration):
i = len(self.cfg.domain_rand.add_duration) - 1
duration = self.cfg.domain_rand.add_duration[i]/self.dt
if self.common_step_counter % self.cfg.domain_rand.ext_force_interval <= duration:
self._add_ext_force()
def _add_ext_force(self):
''' Apply a random external force and torque to the fixed base body
'''
apply_forces = torch.zeros((self.num_envs, self.num_bodies, 3), device=self.device, dtype=torch.float)
apply_torques = torch.zeros((self.num_envs, self.num_bodies, 3), device=self.device, dtype=torch.float)
force_xy = torch_rand_float(-self.cfg.domain_rand.ext_force_max_xy, self.cfg.domain_rand.ext_force_max_xy,(self.num_envs,2),device=self.device)
force_z = torch_rand_float(-self.cfg.domain_rand.ext_force_max_z, self.cfg.domain_rand.ext_force_max_z,(self.num_envs,1),device=self.device)
self.ext_forces = torch.cat((force_xy, force_z), 1)
self.ext_torques = torch_rand_float(-self.cfg.domain_rand.ext_torque_max, self.cfg.domain_rand.ext_torque_max,(self.num_envs,3),device=self.device)
# apply the force and torque to the base body (index 0)
apply_forces[:, 0, :] = self.ext_forces
apply_torques[:, 0, :] = self.ext_torques
self.gym.apply_rigid_body_force_tensors(self.sim, gymtorch.unwrap_tensor(apply_forces), gymtorch.unwrap_tensor(apply_torques), gymapi.ENV_SPACE)
def _resample_commands(self, env_ids):
""" Randommly select commands of some environments
Args:
env_ids (List[int]): Environments ids for which new commands are needed
"""
self.commands[env_ids, 0] = torch_rand_float(self.command_ranges["lin_vel_x"][0], self.command_ranges["lin_vel_x"][1], (len(env_ids), 1), device=self.device).squeeze(1)
self.commands[env_ids, 1] = torch_rand_float(self.command_ranges["lin_vel_y"][0], self.command_ranges["lin_vel_y"][1], (len(env_ids), 1), device=self.device).squeeze(1)
if self.cfg.commands.heading_command:
self.commands[env_ids, 3] = torch_rand_float(self.command_ranges["heading"][0], self.command_ranges["heading"][1], (len(env_ids), 1), device=self.device).squeeze(1)
else:
self.commands[env_ids, 2] = torch_rand_float(self.command_ranges["ang_vel_yaw"][0], self.command_ranges["ang_vel_yaw"][1], (len(env_ids), 1), device=self.device).squeeze(1)
# set small commands to zero
self.commands[env_ids, :2] *= (torch.norm(self.commands[env_ids, :2], dim=1) > 0.2).unsqueeze(1)
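# --- Hedged note on the command deadband above (illustrative only) ---
# Multiplying by the boolean mask zeroes any command whose xy-velocity norm
# is below 0.2, so the policy is also trained to stand still:
#
#   cmd = torch.tensor([[0.1, 0.1], [1.0, 0.0]])
#   cmd *= (torch.norm(cmd, dim=1) > 0.2).unsqueeze(1)
#   # -> [[0.0, 0.0], [1.0, 0.0]]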
def _compute_torques(self, actions):
""" Compute torques from actions.
Actions can be interpreted as position or velocity targets given to a PD controller, or directly as scaled torques.
[NOTE]: torques must have the same dimension as the number of DOFs, even if some DOFs are not actuated.
Args:
actions (torch.Tensor): Actions
Returns:
[torch.Tensor]: Torques sent to the simulation
"""
# pd controller
actions_scaled = actions * self.cfg.control.action_scale
if self.cfg.domain_rand.add_lag:
self.lag_buffer[:,:,1:] = self.lag_buffer[:,:,:self.cfg.domain_rand.lag_timesteps_range[1]].clone()
self.lag_buffer[:,:,0] = actions_scaled.clone()
if self.cfg.domain_rand.randomize_lag_timesteps_perstep:
self.lag_timestep = torch.randint(self.cfg.domain_rand.lag_timesteps_range[0],
self.cfg.domain_rand.lag_timesteps_range[1]+1,(self.num_envs,),device=self.device)
cond = self.lag_timestep > self.last_lag_timestep + 1
self.lag_timestep[cond] = self.last_lag_timestep[cond] + 1
self.last_lag_timestep = self.lag_timestep.clone()
self.lagged_actions_scaled = self.lag_buffer[torch.arange(self.num_envs),:,self.lag_timestep.long()]
else:
self.lagged_actions_scaled = actions_scaled
if self.cfg.domain_rand.randomize_gains:
p_gains = self.randomized_p_gains
d_gains = self.randomized_d_gains
else:
p_gains = self.p_gains
d_gains = self.d_gains
if self.cfg.domain_rand.randomize_coulomb_friction:
torques = p_gains * (self.lagged_actions_scaled + self.default_dof_pos - self.dof_pos + self.motor_offsets) -\
d_gains * self.dof_vel -\
self.randomized_joint_viscous * self.dof_vel - self.randomized_joint_coulomb * torch.sign(self.dof_vel)
else:
torques = p_gains * (self.lagged_actions_scaled + self.default_dof_pos - self.dof_pos + self.motor_offsets) - d_gains * self.dof_vel
if self.cfg.domain_rand.randomize_torque:
motor_strength_ranges = self.cfg.domain_rand.torque_multiplier_range
self.torque_multi = torch_rand_float(motor_strength_ranges[0], motor_strength_ranges[1], (self.num_envs,self.num_actions), device=self.device)
torques *= self.torque_multi
return torch.clip(torques, -self.torque_limits, self.torque_limits)
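# --- Hedged standalone sketch of the PD law above (illustrative only) ---
# With lag, coulomb friction, and torque randomization disabled, the
# controller reduces to a plain PD rule on joint positions:
#
#   def pd_torque(action, action_scale, default_pos, q, qd, kp, kd):
#       target = action * action_scale + default_pos
#       return kp * (target - q) - kd * qd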
def _reset_dofs(self, env_ids):
""" Resets DOF position and velocities of selected environmments
Positions are randomly selected within 0.5:1.5 x default positions.
Velocities are set to zero.
Args:
env_ids (List[int]): Environemnt ids
"""
# rand env reset self.dof_pos, which sync with self.dof_state
self.dof_pos[env_ids] = self.default_dof_pos + torch_rand_float(-0.1, 0.1, (len(env_ids), self.num_dof), device=self.device)
self.dof_vel[env_ids] = 0.
env_ids_int32 = env_ids.to(dtype=torch.int32)
self.gym.set_dof_state_tensor_indexed(self.sim,
gymtorch.unwrap_tensor(self.dof_state),
gymtorch.unwrap_tensor(env_ids_int32), len(env_ids_int32))
def _reset_root_states(self, env_ids):
""" Resets ROOT states position and velocities of selected environmments
Sets base position based on the curriculum
Selects randomized base velocities within -0.5:0.5 [m/s, rad/s]
Args:
env_ids (List[int]): Environemnt ids
"""
# base position
if self.custom_origins:
self.root_states[env_ids] = self.base_init_state
self.root_states[env_ids, :3] += self.env_origins[env_ids]
if self.cfg.terrain.curriculum:
platform = self.cfg.terrain.platform
self.root_states[env_ids, :2] += torch_rand_float(-platform/3, platform/3, (len(env_ids), 2), device=self.device) # xy position within platform/3 of the center
else:
terrain_length = self.cfg.terrain.terrain_length
self.root_states[env_ids, :2] += torch_rand_float(-terrain_length/2, terrain_length/2, (len(env_ids), 2), device=self.device) # xy position within terrain_length/2 of the center
else:
self.root_states[env_ids] = self.base_init_state
self.root_states[env_ids, :3] += self.env_origins[env_ids]
# base velocities
# if the base should be fixed, set fix_base_link=True in the asset config
if self.cfg.asset.fix_base_link:
self.root_states[env_ids, 7:13] = 0
self.root_states[env_ids, 2] += 1.8
env_ids_int32 = env_ids.to(dtype=torch.int32)
self.gym.set_actor_root_state_tensor_indexed(self.sim,
gymtorch.unwrap_tensor(self.root_states),
gymtorch.unwrap_tensor(env_ids_int32), len(env_ids_int32))
def _push_robots(self):
""" Random pushes the robots. Emulates an impulse by setting a randomized base velocity.
"""
max_vel = self.cfg.domain_rand.max_push_vel_xy
max_push_angular = self.cfg.domain_rand.max_push_ang_vel
self.root_states[:, 7:9] = torch_rand_float(-max_vel, max_vel, (self.num_envs, 2), device=self.device) # lin vel x/y
self.root_states[:, 10:13] = torch_rand_float(-max_push_angular, max_push_angular, (self.num_envs, 3), device=self.device) # ang vel
self.gym.set_actor_root_state_tensor(self.sim, gymtorch.unwrap_tensor(self.root_states))
def _update_terrain_curriculum(self, env_ids):
""" Implements the game-inspired curriculum.
Args:
env_ids (List[int]): ids of environments being reset
"""
# Implement Terrain curriculum
if not self.init_done:
# don't change on initial reset
return
distance = torch.norm(self.root_states[env_ids, :2] - self.env_origins[env_ids, :2], dim=1)
# robots that walked far enough progress to harder terrains
move_up = distance > self.terrain.env_length / 2
# robots that walked less than half of their required distance go to simpler terrains
move_down = (distance < torch.norm(self.commands[env_ids, :2], dim=1)*self.max_episode_length_s*0.5) * ~move_up
self.terrain_levels[env_ids] += 1 * move_up - 1 * move_down
# Robots that solve the last level are sent to a random one
self.terrain_levels[env_ids] = torch.where(self.terrain_levels[env_ids]>=self.max_terrain_level,
torch.randint_like(self.terrain_levels[env_ids], self.max_terrain_level),
torch.clip(self.terrain_levels[env_ids], 0)) # (the minimum level is zero)
self.env_origins[env_ids] = self.terrain_origins[self.terrain_levels[env_ids], self.terrain_types[env_ids]]
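# --- Hedged worked example of the promotion rule above (illustrative) ---
# An env that covered more than half the terrain length moves up a level,
# while one that covered less than half of its commanded distance moves down:
#
#   env_length = 8.0                     # illustrative terrain length [m]
#   walked, commanded_dist = 5.0, 20.0   # walked vs |cmd| * episode seconds
#   move_up = walked > env_length / 2            # True  -> level += 1
#   move_down = (walked < 0.5 * commanded_dist) and not move_up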
def update_command_curriculum(self, env_ids):
""" Implements a curriculum of increasing commands
Args:
env_ids (List[int]): ids of environments being reset
"""
# If the tracking reward is above 80% of the maximum, increase the range of commands
if torch.mean(self.episode_sums["tracking_lin_vel"][env_ids]) / self.max_episode_length > 0.8 * self.reward_scales["tracking_lin_vel"]:
self.command_ranges["lin_vel_x"][0] = np.clip(self.command_ranges["lin_vel_x"][0] - 0.25, -self.cfg.commands.max_curriculum/2, 0.)
self.command_ranges["lin_vel_x"][1] = np.clip(self.command_ranges["lin_vel_x"][1] + 0.5, 0., self.cfg.commands.max_curriculum)
#----------------------------------------
def _init_buffers(self):
""" Initialize torch tensors which will contain simulation states and processed quantities
"""
# get gym GPU state tensors
actor_root_state = self.gym.acquire_actor_root_state_tensor(self.sim)
dof_state_tensor = self.gym.acquire_dof_state_tensor(self.sim)
net_contact_forces = self.gym.acquire_net_contact_force_tensor(self.sim)
rigid_body_state = self.gym.acquire_rigid_body_state_tensor(self.sim)
self.gym.refresh_dof_state_tensor(self.sim)
self.gym.refresh_actor_root_state_tensor(self.sim)
self.gym.refresh_net_contact_force_tensor(self.sim)
self.gym.refresh_rigid_body_state_tensor(self.sim)
# create some wrapper tensors for different slices
self.root_states = gymtorch.wrap_tensor(actor_root_state)
self.dof_state = gymtorch.wrap_tensor(dof_state_tensor)
self.dof_pos = self.dof_state.view(self.num_envs, self.num_dof, 2)[..., 0]
self.dof_vel = self.dof_state.view(self.num_envs, self.num_dof, 2)[..., 1]
self.base_quat = self.root_states[:, 3:7]
self.base_euler_xyz = get_euler_xyz_tensor(self.base_quat)
self.contact_forces = gymtorch.wrap_tensor(net_contact_forces).view(self.num_envs, -1, 3) # shape: num_envs, num_bodies, xyz axis
self.rigid_state = gymtorch.wrap_tensor(rigid_body_state).view(self.num_envs, self.num_bodies, 13)
self.feet_quat = self.rigid_state[:, self.feet_indices, 3:7]
self.feet_euler_xyz = get_euler_xyz_tensor(self.feet_quat)
# initialize some data used later on
self.common_step_counter = 0
self.extras = {}
self.noise_scale_vec = self._get_noise_scale_vec(self.cfg)
self.gravity_vec = to_torch(get_axis_params(-1., self.up_axis_idx), device=self.device).repeat((self.num_envs, 1))
self.forward_vec = to_torch([1., 0., 0.], device=self.device).repeat((self.num_envs, 1))
self.torques = torch.zeros(self.num_envs, self.num_actions, dtype=torch.float, device=self.device, requires_grad=False)
self.last_torques = torch.zeros(self.num_envs, self.num_actions, dtype=torch.float, device=self.device, requires_grad=False)
self.actions = torch.zeros(self.num_envs, self.num_actions, dtype=torch.float, device=self.device, requires_grad=False)
self.last_actions = torch.zeros(self.num_envs, self.num_actions, dtype=torch.float, device=self.device, requires_grad=False)
self.last_last_actions = torch.zeros(self.num_envs, self.num_actions, dtype=torch.float, device=self.device, requires_grad=False)
self.last_rigid_state = torch.zeros_like(self.rigid_state)
self.last_contact_forces = torch.zeros_like(self.contact_forces)
self.last_dof_vel = torch.zeros_like(self.dof_vel)
self.last_root_vel = torch.zeros_like(self.root_states[:, 7:13])
self.commands = torch.zeros(self.num_envs, self.cfg.commands.num_commands, dtype=torch.float, device=self.device, requires_grad=False) # x vel, y vel, yaw vel, heading
self.commands_scale = torch.tensor([self.obs_scales.lin_vel, self.obs_scales.lin_vel, self.obs_scales.ang_vel], device=self.device, requires_grad=False,)
self.feet_air_time = torch.zeros(self.num_envs, self.feet_indices.shape[0], dtype=torch.float, device=self.device, requires_grad=False)
self.last_contacts = torch.zeros(self.num_envs, len(self.feet_indices), dtype=torch.bool, device=self.device, requires_grad=False)
self.base_lin_vel = quat_rotate_inverse(self.base_quat, self.root_states[:, 7:10])
self.base_ang_vel = quat_rotate_inverse(self.base_quat, self.root_states[:, 10:13])
self.projected_gravity = quat_rotate_inverse(self.base_quat, self.gravity_vec)
if self.cfg.terrain.measure_heights:
self.height_points = self._init_height_points()
self.measured_heights = 0
# joint positions offsets and PD gains
self.default_dof_pos = torch.zeros(self.num_dof, dtype=torch.float, device=self.device, requires_grad=False)
for i in range(self.num_dofs):
name = self.dof_names[i]
self.default_dof_pos[i] = self.cfg.init_state.default_joint_angles[name]
found = False
for dof_name in self.cfg.control.stiffness.keys():
if dof_name in name:
self.p_gains[i] = self.cfg.control.stiffness[dof_name]
self.d_gains[i] = self.cfg.control.damping[dof_name]
found = True
if not found:
self.p_gains[i] = 0.
self.d_gains[i] = 0.
if self.cfg.control.control_type in ["P", "V"]:
print(f"PD gain of joint {name} were not defined, setting them to zero")
self.rand_push_force = torch.zeros((self.num_envs, 3), dtype=torch.float32, device=self.device)
self.rand_push_torque = torch.zeros((self.num_envs, 3), dtype=torch.float32, device=self.device)
self.default_dof_pos = self.default_dof_pos.unsqueeze(0)
self.default_joint_pd_target = self.default_dof_pos.clone()
self.obs_history = deque(maxlen=self.cfg.env.frame_stack)
self.critic_history = deque(maxlen=self.cfg.env.c_frame_stack)
for _ in range(self.cfg.env.frame_stack):
self.obs_history.append(torch.zeros(
self.num_envs, self.cfg.env.num_single_obs, dtype=torch.float, device=self.device))
for _ in range(self.cfg.env.c_frame_stack):
if self.cfg.terrain.measure_heights:
self.critic_history.append(torch.zeros(
self.num_envs, self.cfg.env.single_num_privileged_obs + self.cfg.terrain.num_height, dtype=torch.float, device=self.device))
else:
self.critic_history.append(torch.zeros(
self.num_envs, self.cfg.env.single_num_privileged_obs, dtype=torch.float, device=self.device))
if self.cfg.domain_rand.add_lag:
self.lag_buffer = torch.zeros(self.num_envs,self.num_actions,self.cfg.domain_rand.lag_timesteps_range[1]+1,device=self.device)
if self.cfg.domain_rand.randomize_lag_timesteps:
self.lag_timestep = torch.randint(self.cfg.domain_rand.lag_timesteps_range[0],
self.cfg.domain_rand.lag_timesteps_range[1]+1,(self.num_envs,),device=self.device)
if self.cfg.domain_rand.randomize_lag_timesteps_perstep:
self.last_lag_timestep = torch.ones(self.num_envs,device=self.device,dtype=int) * self.cfg.domain_rand.lag_timesteps_range[1]
else:
self.lag_timestep = torch.ones(self.num_envs,device=self.device) * self.cfg.domain_rand.lag_timesteps_range[1]
if self.cfg.domain_rand.add_dof_lag:
self.dof_lag_buffer = torch.zeros(self.num_envs,self.num_actions * 2,self.cfg.domain_rand.dof_lag_timesteps_range[1]+1,device=self.device)
if self.cfg.domain_rand.randomize_dof_lag_timesteps:
self.dof_lag_timestep = torch.randint(self.cfg.domain_rand.dof_lag_timesteps_range[0],
self.cfg.domain_rand.dof_lag_timesteps_range[1]+1, (self.num_envs,),device=self.device)
if self.cfg.domain_rand.randomize_dof_lag_timesteps_perstep:
self.last_dof_lag_timestep = torch.ones(self.num_envs,device=self.device,dtype=int) * self.cfg.domain_rand.dof_lag_timesteps_range[1]
else:
self.dof_lag_timestep = torch.ones(self.num_envs,device=self.device) * self.cfg.domain_rand.dof_lag_timesteps_range[1]
if self.cfg.domain_rand.add_imu_lag:
self.imu_lag_buffer = torch.zeros(self.num_envs, 6, self.cfg.domain_rand.imu_lag_timesteps_range[1]+1,device=self.device)
if self.cfg.domain_rand.randomize_imu_lag_timesteps:
self.imu_lag_timestep = torch.randint(self.cfg.domain_rand.imu_lag_timesteps_range[0],
self.cfg.domain_rand.imu_lag_timesteps_range[1]+1, (self.num_envs,),device=self.device)
if self.cfg.domain_rand.randomize_imu_lag_timesteps_perstep:
self.last_imu_lag_timestep = torch.ones(self.num_envs,device=self.device,dtype=int) * self.cfg.domain_rand.imu_lag_timesteps_range[1]
else:
self.imu_lag_timestep = torch.ones(self.num_envs,device=self.device) * self.cfg.domain_rand.imu_lag_timesteps_range[1]
if self.cfg.domain_rand.add_dof_pos_vel_lag:
self.dof_pos_lag_buffer = torch.zeros(self.num_envs,self.num_actions,self.cfg.domain_rand.dof_pos_lag_timesteps_range[1]+1,device=self.device)
self.dof_vel_lag_buffer = torch.zeros(self.num_envs,self.num_actions,self.cfg.domain_rand.dof_vel_lag_timesteps_range[1]+1,device=self.device)
if self.cfg.domain_rand.randomize_dof_pos_lag_timesteps:
self.dof_pos_lag_timestep = torch.randint(self.cfg.domain_rand.dof_pos_lag_timesteps_range[0],
self.cfg.domain_rand.dof_pos_lag_timesteps_range[1]+1, (self.num_envs,),device=self.device)
if self.cfg.domain_rand.randomize_dof_pos_lag_timesteps_perstep:
self.last_dof_pos_lag_timestep = torch.ones(self.num_envs,device=self.device,dtype=int) * self.cfg.domain_rand.dof_pos_lag_timesteps_range[1]
else:
self.dof_pos_lag_timestep = torch.ones(self.num_envs,device=self.device) * self.cfg.domain_rand.dof_pos_lag_timesteps_range[1]
if self.cfg.domain_rand.randomize_dof_vel_lag_timesteps:
self.dof_vel_lag_timestep = torch.randint(self.cfg.domain_rand.dof_vel_lag_timesteps_range[0],
self.cfg.domain_rand.dof_vel_lag_timesteps_range[1]+1, (self.num_envs,),device=self.device)
if self.cfg.domain_rand.randomize_dof_vel_lag_timesteps_perstep:
self.last_dof_vel_lag_timestep = torch.ones(self.num_envs,device=self.device,dtype=int) * self.cfg.domain_rand.dof_vel_lag_timesteps_range[1]
else:
self.dof_vel_lag_timestep = torch.ones(self.num_envs,device=self.device) * self.cfg.domain_rand.dof_vel_lag_timesteps_range[1]
def _prepare_reward_function(self):
""" Prepares a list of reward functions, which will be called to compute the total reward.
Looks for self._reward_<REWARD_NAME>, where <REWARD_NAME> are names of all non-zero reward scales in the cfg.
"""
# remove zero scales + multiply non-zero ones by dt
for key in list(self.reward_scales.keys()):
scale = self.reward_scales[key]
if scale==0:
self.reward_scales.pop(key)
else:
self.reward_scales[key] *= self.dt
# prepare list of functions
self.reward_functions = []
self.reward_names = []
for name, scale in self.reward_scales.items():
if name=="termination":
continue
self.reward_names.append(name)
name = '_reward_' + name
self.reward_functions.append(getattr(self, name))
# reward episode sums
self.episode_sums = {name: torch.zeros(self.num_envs, dtype=torch.float, device=self.device, requires_grad=False)
for name in self.reward_scales.keys()}
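# --- Hedged extension example (illustrative only) ---
# Rewards are resolved by name as `_reward_<name>`, so adding a new term only
# requires a method plus a non-zero scale in the config, e.g. a hypothetical
# base-height penalty (the 0.9 m target is made up for illustration):
#
#   # in the config:  rewards.scales.base_height = -1.0
#   def _reward_base_height(self):
#       return torch.square(self.root_states[:, 2] - 0.9)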
def _create_ground_plane(self):
""" Adds a ground plane to the simulation, sets friction and restitution based on the cfg.
"""
plane_params = gymapi.PlaneParams()
plane_params.normal = gymapi.Vec3(0.0, 0.0, 1.0)
plane_params.static_friction = self.cfg.terrain.static_friction
plane_params.dynamic_friction = self.cfg.terrain.dynamic_friction
plane_params.restitution = self.cfg.terrain.restitution
self.gym.add_ground(self.sim, plane_params)
def _create_heightfield(self):
""" Adds a heightfield terrain to the simulation, sets parameters based on the cfg.
"""
hf_params = gymapi.HeightFieldParams()
hf_params.column_scale = self.terrain.cfg.horizontal_scale
hf_params.row_scale = self.terrain.cfg.horizontal_scale
hf_params.vertical_scale = self.terrain.cfg.vertical_scale
hf_params.nbRows = self.terrain.tot_cols
hf_params.nbColumns = self.terrain.tot_rows
hf_params.transform.p.x = -self.terrain.cfg.border_size
hf_params.transform.p.y = -self.terrain.cfg.border_size
hf_params.transform.p.z = 0.0
hf_params.static_friction = self.cfg.terrain.static_friction
hf_params.dynamic_friction = self.cfg.terrain.dynamic_friction
hf_params.restitution = self.cfg.terrain.restitution
self.gym.add_heightfield(self.sim, self.terrain.heightsamples, hf_params)
self.height_samples = torch.tensor(self.terrain.heightsamples).view(self.terrain.tot_rows, self.terrain.tot_cols).to(self.device)
def _create_trimesh(self):
""" Adds a triangle mesh terrain to the simulation, sets parameters based on the cfg.
# """
tm_params = gymapi.TriangleMeshParams()
tm_params.nb_vertices = self.terrain.vertices.shape[0]
tm_params.nb_triangles = self.terrain.triangles.shape[0]
tm_params.transform.p.x = -self.terrain.cfg.border_size
tm_params.transform.p.y = -self.terrain.cfg.border_size
tm_params.transform.p.z = 0.0
tm_params.static_friction = self.cfg.terrain.static_friction
tm_params.dynamic_friction = self.cfg.terrain.dynamic_friction
tm_params.restitution = self.cfg.terrain.restitution
self.gym.add_triangle_mesh(self.sim, self.terrain.vertices.flatten(order='C'), self.terrain.triangles.flatten(order='C'), tm_params)
self.height_samples = torch.tensor(self.terrain.heightsamples).view(self.terrain.tot_rows, self.terrain.tot_cols).to(self.device)
def create_sim(self):
""" Creates simulation, terrain and evironments
"""
self.up_axis_idx = 2 # 2 for z, 1 for y -> adapt gravity accordingly
self.sim = self.gym.create_sim(
self.sim_device_id, self.graphics_device_id, self.physics_engine, self.sim_params)
mesh_type = self.cfg.terrain.mesh_type
if mesh_type in ['heightfield', 'trimesh']:
self.terrain = Terrain(self.cfg.terrain, self.num_envs)
if mesh_type == 'plane':
self._create_ground_plane()
elif mesh_type == 'heightfield':
self._create_heightfield()
elif mesh_type == 'trimesh':
self._create_trimesh()
elif mesh_type is not None:
raise ValueError(
"Terrain mesh type not recognised. Allowed types are [None, plane, heightfield, trimesh]")
self._create_envs()
def _create_envs(self):
""" Creates environments:
1. loads the robot URDF/MJCF asset,
2. For each environment
2.1 creates the environment,
2.2 calls DOF and Rigid shape properties callbacks,
2.3 create actor with these properties and add them to the env
3. Store indices of different bodies of the robot
"""
asset_path = self.cfg.asset.file.format(LEGGED_GYM_ROOT_DIR=LEGGED_GYM_ROOT_DIR)
asset_root = os.path.dirname(asset_path)
asset_file = os.path.basename(asset_path)
asset_options = gymapi.AssetOptions()
asset_options.default_dof_drive_mode = self.cfg.asset.default_dof_drive_mode
asset_options.collapse_fixed_joints = self.cfg.asset.collapse_fixed_joints
asset_options.replace_cylinder_with_capsule = self.cfg.asset.replace_cylinder_with_capsule
asset_options.flip_visual_attachments = self.cfg.asset.flip_visual_attachments
asset_options.fix_base_link = self.cfg.asset.fix_base_link
asset_options.density = self.cfg.asset.density
asset_options.angular_damping = self.cfg.asset.angular_damping
asset_options.linear_damping = self.cfg.asset.linear_damping
asset_options.max_angular_velocity = self.cfg.asset.max_angular_velocity
asset_options.max_linear_velocity = self.cfg.asset.max_linear_velocity
asset_options.armature = self.cfg.asset.armature
asset_options.thickness = self.cfg.asset.thickness
asset_options.disable_gravity = self.cfg.asset.disable_gravity
robot_asset = self.gym.load_asset(self.sim, asset_root, asset_file, asset_options)
self.num_dof = self.gym.get_asset_dof_count(robot_asset)
self.num_bodies = self.gym.get_asset_rigid_body_count(robot_asset)
dof_props_asset = self.gym.get_asset_dof_properties(robot_asset)
rigid_shape_props_asset = self.gym.get_asset_rigid_shape_properties(robot_asset)
# save body names from the asset
body_names = self.gym.get_asset_rigid_body_names(robot_asset)
self.dof_names = self.gym.get_asset_dof_names(robot_asset)
self.num_bodies = len(body_names)
self.num_dofs = len(self.dof_names)
feet_names = [s for s in body_names if self.cfg.asset.foot_name in s]
print(body_names)
print(feet_names)
knee_names = [s for s in body_names if self.cfg.asset.knee_name in s]
penalized_contact_names = []
for name in self.cfg.asset.penalize_contacts_on:
penalized_contact_names.extend([s for s in body_names if name in s])
termination_contact_names = []
for name in self.cfg.asset.terminate_after_contacts_on:
termination_contact_names.extend([s for s in body_names if name in s])
base_init_state_list = self.cfg.init_state.pos + self.cfg.init_state.rot + self.cfg.init_state.lin_vel + self.cfg.init_state.ang_vel
self.base_init_state = to_torch(base_init_state_list, device=self.device, requires_grad=False)
start_pose = gymapi.Transform()
start_pose.p = gymapi.Vec3(*self.base_init_state[:3])
self.p_gains = torch.zeros(self.num_actions, dtype=torch.float, device=self.device, requires_grad=False)
self.d_gains = torch.zeros(self.num_actions, dtype=torch.float, device=self.device, requires_grad=False)
self.init_randomize_props()
self._get_env_origins()
env_lower = gymapi.Vec3(0., 0., 0.)
env_upper = gymapi.Vec3(0., 0., 0.)
self.actor_handles = []
self.envs = []
self.env_frictions = torch.zeros(self.num_envs, 1, dtype=torch.float32, device=self.device)
self.body_mass = torch.zeros(self.num_envs, 1, dtype=torch.float32, device=self.device, requires_grad=False)
self.init_body_mass = torch.zeros(self.num_envs, 1, dtype=torch.float32, device=self.device, requires_grad=False)
self.total_mass = torch.zeros(self.num_envs, 1, dtype=torch.float32, device=self.device, requires_grad=False)
self.randomize_rigid_body_props(torch.arange(self.num_envs, device=self.device))
self.randomize_dof_props(torch.arange(self.num_envs, device=self.device))
for i in range(self.num_envs):
# create env instance
env_handle = self.gym.create_env(self.sim, env_lower, env_upper, int(np.sqrt(self.num_envs)))
pos = self.env_origins[i].clone()
pos[:2] += torch_rand_float(-1., 1., (2,1), device=self.device).squeeze(1)
start_pose.p = gymapi.Vec3(*pos)
rigid_shape_props = self._process_rigid_shape_props(rigid_shape_props_asset, i)
self.gym.set_asset_rigid_shape_properties(robot_asset, rigid_shape_props)
actor_handle = self.gym.create_actor(env_handle, robot_asset, start_pose, self.cfg.asset.name, i, self.cfg.asset.self_collisions, 0)
dof_props = self._process_dof_props(dof_props_asset, i)
self.gym.set_actor_dof_properties(env_handle, actor_handle, dof_props)
body_props = self.gym.get_actor_rigid_body_properties(env_handle, actor_handle)
body_props = self._process_rigid_body_props(body_props, i)
self.gym.set_actor_rigid_body_properties(env_handle, actor_handle, body_props, recomputeInertia=True)
self.envs.append(env_handle)
self.actor_handles.append(actor_handle)
self._refresh_actor_dof_props(torch.arange(self.num_envs, device=self.device))
self.feet_indices = torch.zeros(len(feet_names), dtype=torch.long, device=self.device, requires_grad=False)
for i in range(len(feet_names)):
self.feet_indices[i] = self.gym.find_actor_rigid_body_handle(self.envs[0], self.actor_handles[0], feet_names[i])
self.knee_indices = torch.zeros(len(knee_names), dtype=torch.long, device=self.device, requires_grad=False)
for i in range(len(knee_names)):
self.knee_indices[i] = self.gym.find_actor_rigid_body_handle(self.envs[0], self.actor_handles[0], knee_names[i])
self.penalised_contact_indices = torch.zeros(len(penalized_contact_names), dtype=torch.long, device=self.device, requires_grad=False)
for i in range(len(penalized_contact_names)):
self.penalised_contact_indices[i] = self.gym.find_actor_rigid_body_handle(self.envs[0], self.actor_handles[0], penalized_contact_names[i])
self.termination_contact_indices = torch.zeros(len(termination_contact_names), dtype=torch.long, device=self.device, requires_grad=False)
for i in range(len(termination_contact_names)):
self.termination_contact_indices[i] = self.gym.find_actor_rigid_body_handle(self.envs[0], self.actor_handles[0], termination_contact_names[i])
def init_randomize_props(self):
''' Initialize torch tensors for random properties
'''
if self.cfg.domain_rand.randomize_base_mass:
self.payload_masses = torch.zeros(self.num_envs, 1, dtype=torch.float, device=self.device,requires_grad=False)
if self.cfg.domain_rand.randomize_com:
self.com_displacements = torch.zeros(self.num_envs, 3, dtype=torch.float, device=self.device,
requires_grad=False)
if self.cfg.domain_rand.randomize_base_inertia:
self.base_inertia_x = torch.ones(self.num_envs, 1, dtype=torch.float, device=self.device, requires_grad=False)
self.base_inertia_y = torch.ones(self.num_envs, 1, dtype=torch.float, device=self.device, requires_grad=False)
self.base_inertia_z = torch.ones(self.num_envs, 1, dtype=torch.float, device=self.device, requires_grad=False)
if self.cfg.domain_rand.randomize_link_mass:
self.link_masses = torch.ones(self.num_envs, self.num_bodies-1, dtype=torch.float, device=self.device,requires_grad=False)
if self.cfg.domain_rand.randomize_link_com:
self.link_com_displacements = torch.zeros(self.num_envs, self.num_bodies-1, 3, dtype=torch.float, device=self.device, requires_grad=False)
if self.cfg.domain_rand.randomize_link_inertia:
self.link_inertia_x = torch.ones(self.num_envs, self.num_bodies-1, dtype=torch.float, device=self.device, requires_grad=False)
self.link_inertia_y = torch.ones(self.num_envs, self.num_bodies-1, dtype=torch.float, device=self.device, requires_grad=False)
self.link_inertia_z = torch.ones(self.num_envs, self.num_bodies-1, dtype=torch.float, device=self.device, requires_grad=False)
if self.cfg.domain_rand.randomize_friction:
self.friction = torch.zeros(self.num_envs, 1, dtype=torch.float, device=self.device,requires_grad=False)
if self.cfg.domain_rand.randomize_joint_friction_each_joint:
self.joint_friction_coeffs = torch.ones(self.num_envs, self.num_dofs, dtype=torch.float, device=self.device,requires_grad=False)
else:
self.joint_friction_coeffs = torch.ones(self.num_envs, 1, dtype=torch.float, device=self.device,requires_grad=False)
if self.cfg.domain_rand.randomize_joint_damping_each_joint:
self.joint_damping_coeffs = torch.ones(self.num_envs, self.num_dofs, dtype=torch.float, device=self.device,requires_grad=False)
else:
self.joint_damping_coeffs = torch.ones(self.num_envs, 1, dtype=torch.float, device=self.device,requires_grad=False)
if self.cfg.domain_rand.randomize_joint_armature_each_joint:
self.joint_armatures = torch.zeros(self.num_envs, self.num_dofs, dtype=torch.float, device=self.device,requires_grad=False)
else:
self.joint_armatures = torch.zeros(self.num_envs, 1, dtype=torch.float, device=self.device,requires_grad=False)
if self.cfg.domain_rand.randomize_torque:
self.torque_multi = torch.ones(self.num_envs, self.num_actions, dtype=torch.float, device=self.device,requires_grad=False)
self.motor_offsets = torch.zeros(self.num_envs, self.num_actions, dtype=torch.float, device=self.device,requires_grad=False)
if self.cfg.domain_rand.randomize_gains:
self.randomized_p_gains = torch.zeros(self.num_envs,self.num_actions, dtype=torch.float, device=self.device, requires_grad=False) * self.p_gains
self.randomized_d_gains = torch.zeros(self.num_envs,self.num_actions, dtype=torch.float, device=self.device, requires_grad=False) * self.d_gains
if self.cfg.domain_rand.randomize_coulomb_friction:
self.randomized_joint_coulomb = torch.zeros(self.num_envs,self.num_actions, dtype=torch.float, device=self.device, requires_grad=False) * self.p_gains
self.randomized_joint_viscous = torch.zeros(self.num_envs,self.num_actions, dtype=torch.float, device=self.device, requires_grad=False) * self.d_gains
def _get_env_origins(self):
''' Sets environment origins. On rough terrain the origins are defined by the terrain platforms.
Otherwise create a grid.
'''
if self.cfg.terrain.mesh_type in ["heightfield", "trimesh"]:
self.custom_origins = True
self.env_origins = torch.zeros(self.num_envs, 3, device=self.device, requires_grad=False)
# put robots at the origins defined by the terrain
max_init_level = self.cfg.terrain.max_init_terrain_level
if not self.cfg.terrain.curriculum: max_init_level = self.cfg.terrain.num_rows - 1
self.terrain_levels = torch.randint(0, max_init_level+1, (self.num_envs,), device=self.device)
self.terrain_types = torch.div(torch.arange(self.num_envs, device=self.device), (self.num_envs/self.cfg.terrain.num_cols), rounding_mode='floor').to(torch.long)
self.max_terrain_level = self.cfg.terrain.num_rows
self.terrain_origins = torch.from_numpy(self.terrain.env_origins).to(self.device).to(torch.float)
self.env_origins[:] = self.terrain_origins[self.terrain_levels, self.terrain_types]
else:
self.custom_origins = False
self.env_origins = torch.zeros(self.num_envs, 3, device=self.device, requires_grad=False)
# create a grid of robots
num_cols = np.floor(np.sqrt(self.num_envs))
num_rows = np.ceil(self.num_envs / num_cols)
xx, yy = torch.meshgrid(torch.arange(num_rows), torch.arange(num_cols))
spacing = self.cfg.env.env_spacing
self.env_origins[:, 0] = spacing * xx.flatten()[:self.num_envs]
self.env_origins[:, 1] = spacing * yy.flatten()[:self.num_envs]
self.env_origins[:, 2] = 0.
def _parse_cfg(self, cfg):
''' Parse simulation, reward, command, terrain, random configuration
'''
self.dt = self.cfg.control.decimation * self.sim_params.dt
self.obs_scales = self.cfg.normalization.obs_scales
self.reward_scales = class_to_dict(self.cfg.rewards.scales)
self.command_ranges = class_to_dict(self.cfg.commands.ranges)
if self.cfg.terrain.mesh_type not in ['heightfield', 'trimesh']:
self.cfg.terrain.curriculum = False
self.max_episode_length_s = self.cfg.env.episode_length_s
self.max_episode_length = np.ceil(self.max_episode_length_s / self.dt)
self.cfg.domain_rand.push_interval = np.ceil(self.cfg.domain_rand.push_interval_s / self.dt)
self.cfg.domain_rand.ext_force_interval = np.ceil(self.cfg.domain_rand.ext_force_interval_s / self.dt)
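# Example, using the X1 values defined later in this repo
# (control.decimation = 10, sim.dt = 0.001 s): the policy timestep is
#   self.dt = 10 * 0.001 = 0.01 s  (100 Hz policy)
# and a 24 s episode gives max_episode_length = ceil(24 / 0.01) = 2400 steps.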
def _draw_debug_vis(self):
''' Draws visualizations for debugging (slows down simulation a lot).
Default behaviour: draws height measurement points
'''
# draw height lines
if not self.terrain.cfg.measure_heights:
return
self.gym.clear_lines(self.viewer)
self.gym.refresh_rigid_body_state_tensor(self.sim)
sphere_geom = gymutil.WireframeSphereGeometry(0.02, 4, 4, None, color=(1, 1, 0))
for i in range(self.num_envs):
base_pos = (self.root_states[i, :3]).cpu().numpy()
heights = self.measured_heights[i].cpu().numpy()
height_points = quat_apply_yaw(self.base_quat[i].repeat(heights.shape[0]), self.height_points[i]).cpu().numpy()
for j in range(heights.shape[0]):
x = height_points[j, 0] + base_pos[0]
y = height_points[j, 1] + base_pos[1]
z = heights[j]
sphere_pose = gymapi.Transform(gymapi.Vec3(x, y, z), r=None)
gymutil.draw_lines(sphere_geom, self.gym, self.viewer, self.envs[i], sphere_pose)
def _init_height_points(self):
""" Returns points at which the height measurments are sampled (in base frame)
Returns:
[torch.Tensor]: Tensor of shape (num_envs, self.num_height_points, 3)
"""
# get (x,y) coordinate based on measured_points
y = torch.tensor(self.cfg.terrain.measured_points_y, device=self.device, requires_grad=False)
x = torch.tensor(self.cfg.terrain.measured_points_x, device=self.device, requires_grad=False)
grid_x, grid_y = torch.meshgrid(x, y)
self.num_height_points = grid_x.numel()
points = torch.zeros(self.num_envs, self.num_height_points, 3, device=self.device, requires_grad=False)
points[:, :, 0] = grid_x.flatten()
points[:, :, 1] = grid_y.flatten()
return points
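# Example: with the default measured_points_x (17 values) and measured_points_y
# (11 values), the meshgrid yields num_height_points = 17 * 11 = 187 sample
# points per environment, each stored as (x, y, 0) in the base frame.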
def _get_heights(self, env_ids=None):
""" Samples heights of the terrain at required points around each robot.
The points are offset by the base's position and rotated by the base's yaw
Args:
env_ids (List[int], optional): Subset of environments for which to return the heights. Defaults to None.
Raises:
NameError: If the terrain mesh type is 'none' (heights cannot be measured).
Returns:
torch.Tensor: Terrain heights of shape (num_envs, num_height_points).
"""
if self.cfg.terrain.mesh_type == 'plane':
return torch.zeros(self.num_envs, self.num_height_points, device=self.device, requires_grad=False)
elif self.cfg.terrain.mesh_type == 'none':
raise NameError("Can't measure height with terrain mesh type 'none'")
if env_ids:
# height_points is in base frame
# points is in world frame
points = quat_apply_yaw(self.base_quat[env_ids].repeat(1, self.num_height_points), self.height_points[env_ids]) + (self.root_states[env_ids, :3]).unsqueeze(1)
else:
points = quat_apply_yaw(self.base_quat.repeat(1, self.num_height_points), self.height_points) + (self.root_states[:, :3]).unsqueeze(1)
points += self.terrain.cfg.border_size
# convert world positions to integer row/col indices in the height grid
points = (points/self.terrain.cfg.horizontal_scale).long()
# x indices for all sampled points of all robots
px = points[:, :, 0].view(-1)
# y indices for all sampled points of all robots
py = points[:, :, 1].view(-1)
# clip indices to stay inside the terrain grid
px = torch.clip(px, 0, self.height_samples.shape[0]-2)
py = torch.clip(py, 0, self.height_samples.shape[1]-2)
# take the minimum height over the cells (x,y), (x+1,y) and (x,y+1)
heights1 = self.height_samples[px, py]
heights2 = self.height_samples[px+1, py]
heights3 = self.height_samples[px, py+1]
heights = torch.min(heights1, heights2)
heights = torch.min(heights, heights3)
return heights.view(self.num_envs, -1) * self.terrain.cfg.vertical_scale
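# Example of the world-to-grid conversion above: with border_size = 25 m and
# horizontal_scale = 0.1 m, a point at x = 1.37 m lands at grid index
# int((1.37 + 25) / 0.1) = 263. The final height is the minimum over the cells
# (x, y), (x+1, y) and (x, y+1), scaled by vertical_scale, which makes the
# estimate conservative near terrain edges.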
# SPDX-FileCopyrightText: Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-FileCopyrightText: Copyright (c) 2021 ETH Zurich, Nikita Rudin
# SPDX-FileCopyrightText: Copyright (c) 2024 Beijing RobotEra TECHNOLOGY CO.,LTD. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# 3. Neither the name of the copyright holder nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# Copyright (c) 2024, AgiBot Inc. All rights reserved.
from .base_config import BaseConfig
class LeggedRobotCfg(BaseConfig):
class env:
short_frame_stack = 4
num_envs = 4096
num_observations = 235
num_privileged_obs = None # if not None, a privileged_obs_buf will be returned by step() (critic obs for asymmetric training); otherwise None is returned
num_actions = 12
env_spacing = 3. # not used with heightfields/trimeshes
send_timeouts = True # send time out information to the algorithm
episode_length_s = 20 # episode length in seconds
num_commands = 5
add_stand_bool = False # used for standing
class terrain:
mesh_type = 'trimesh' # "heightfield" # none, plane, heightfield or trimesh
horizontal_scale = 0.1 # [m]
vertical_scale = 0.005 # [m]
border_size = 25 # [m]
curriculum = True
static_friction = 1.0
dynamic_friction = 1.0
restitution = 0.
# rough terrain only:
measure_heights = False
# height measurement points around the robot
measured_points_x = [-0.8, -0.7, -0.6, -0.5, -0.4, -0.3, -0.2, -0.1, 0., 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8] # 1mx1.6m rectangle (without center line)
measured_points_y = [-0.5, -0.4, -0.3, -0.2, -0.1, 0., 0.1, 0.2, 0.3, 0.4, 0.5]
measured_base_points_x = [-0.8, -0.7, -0.6, -0.5, -0.4, -0.3, -0.2, -0.1, 0., 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8] # 1mx1.6m rectangle (without center line)
measured_base_points_y = [-0.5, -0.4, -0.3, -0.2, -0.1, 0., 0.1, 0.2, 0.3, 0.4, 0.5]
measured_feet_points_x = [-0.8, -0.7, -0.6, -0.5, -0.4, -0.3, -0.2, -0.1, 0., 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8] # 1mx1.6m rectangle (without center line)
measured_feet_points_y = [-0.5, -0.4, -0.3, -0.2, -0.1, 0., 0.1, 0.2, 0.3, 0.4, 0.5]
num_height = len(measured_points_x) * len(measured_points_y)
selected = False # select a unique terrain type and pass all arguments
terrain_kwargs = None # Dict of arguments for selected terrain
max_init_terrain_level = 5 # starting curriculum state
terrain_length = 8.
terrain_width = 8.
num_rows= 10 # number of terrain rows (levels)
num_cols = 20 # number of terrain cols (types)
platform = 3.
terrain_dict = {"flat": 0.15,
"rough flat": 0.15,
"rough slope up": 0.0,
"rough slope down": 0.0,
"slope up": 0.,
"slope down": 0.,
"stairs up": 0.35,
"stairs down": 0.25,
"discrete": 0.0,
"wave": 0.0,}
terrain_proportions = list(terrain_dict.values())
rough_flat_range = [0.005, 0.02] # meter
slope_range = [0, 0.4] # rad
rough_slope_range = [0.005, 0.02]
stair_width_range = [0.25, 0.25]
stair_height_range = [0.04, 0.1]
discrete_height_range = [0.05, 0.25]
slope_treshold = 0.75 # slopes above this threshold will be corrected to vertical surfaces
class commands:
curriculum = True
max_curriculum = 1.
num_commands = 4 # default: lin_vel_x, lin_vel_y, ang_vel_yaw, heading (in heading mode ang_vel_yaw is recomputed from heading error)
resampling_time = 10. # time before commands are changed [s]
heading_command = True # if true: compute ang vel command from heading error
class ranges:
lin_vel_x = [-1.0, 1.0] # min max [m/s]
lin_vel_y = [-1.0, 1.0] # min max [m/s]
ang_vel_yaw = [-1, 1] # min max [rad/s]
heading = [-3.14, 3.14]
class init_state:
pos = [0.0, 0.0, 1.] # x,y,z [m]
rot = [0.0, 0.0, 0.0, 1.0] # x,y,z,w [quat]
lin_vel = [0.0, 0.0, 0.0] # x,y,z [m/s]
ang_vel = [0.0, 0.0, 0.0] # x,y,z [rad/s]
default_joint_angles = { # target angles when action = 0.0
"joint_a": 0.,
"joint_b": 0.}
class control:
control_type = 'P' # P: position, V: velocity, T: torques
# PD Drive parameters:
stiffness = {'joint_a': 10.0, 'joint_b': 15.} # [N*m/rad]
damping = {'joint_a': 1.0, 'joint_b': 1.5} # [N*m*s/rad]
# action scale: target angle = actionScale * action + defaultAngle
action_scale = 0.5
# decimation: Number of control action updates @ sim DT per policy DT
decimation = 4
class asset:
file = ""
name = "legged_robot" # actor name
foot_name = "None" # name of the feet bodies, used to index body state and contact force tensors
penalize_contacts_on = []
terminate_after_contacts_on = []
disable_gravity = False
collapse_fixed_joints = True # merge bodies connected by fixed joints. Specific fixed joints can be kept by adding " <... dont_collapse="true">
fix_base_link = False # fix the base of the robot
default_dof_drive_mode = 3 # see GymDofDriveModeFlags (0 is none, 1 is pos tgt, 2 is vel tgt, 3 effort)
self_collisions = 0 # 1 to disable, 0 to enable...bitwise filter
replace_cylinder_with_capsule = True # replace collision cylinders with capsules, leads to faster/more stable simulation
flip_visual_attachments = True # Some .obj meshes must be flipped from y-up to z-up
density = 0.001
angular_damping = 0.
linear_damping = 0.
max_angular_velocity = 1000.
max_linear_velocity = 1000.
armature = 0.
thickness = 0.01
class domain_rand:
randomize_friction = False
friction_range = [0.2, 1.3]
restitution_range = [0.0, 0.4]
push_robots = False
push_interval_s = 4
update_step = 2000 * 60
push_duration = [0, 0.1, 0.2, 0.3]
max_push_vel_xy = 0.2
max_push_ang_vel = 0.2
add_ext_force = False
ext_force_max_xy = 10
ext_force_max_z = 5
ext_torque_max = 0
ext_force_interval_s = 10
add_update_step = 2000 * 60
add_duration = [0, 0.1, 0.2, 0.3]
continuous_push = False
max_push_force = 0.5
max_push_torque = 0.5
push_force_noise = 0.5
push_torque_noise = 0.5
randomize_base_mass = False
added_mass_range = [-2.5, 2.5]
randomize_com = False
com_displacement_range = [[-0.05, 0.05],
[-0.05, 0.05],
[-0.05, 0.05]]
randomize_link_com = False
link_com_displacement_range = [[-0.005, 0.005],
[-0.005, 0.005],
[-0.005, 0.005]]
randomize_base_inertia = False
base_inertial_range = [[0.98,1.02],
[0.98,1.02],
[0.98,1.02]]
randomize_link_inertia = False
link_inertial_range = [[0.98,1.02],
[0.98,1.02],
[0.98,1.02]]
randomize_gains = False
stiffness_multiplier_range = [0.8, 1.2] # Factor
damping_multiplier_range = [0.8, 1.2] # Factor
randomize_torque = False
torque_multiplier_range = [0.8, 1.2]
randomize_link_mass = False
added_link_mass_range = [0.9, 1.1]
randomize_motor_offset = False
motor_offset_range = [-0.035, 0.035] # Offset to add to the motor angles
randomize_joint_friction = False
randomize_joint_friction_each_joint = False
joint_friction_range = [0.01, 1.15] #multiplier
joint_1_friction_range = [0.01, 1.15]
joint_2_friction_range = [0.01, 1.15]
joint_3_friction_range = [0.01, 1.15]
joint_4_friction_range = [0.5, 1.3]
joint_5_friction_range = [0.5, 1.3]
joint_6_friction_range = [0.01, 1.15]
joint_7_friction_range = [0.01, 1.15]
joint_8_friction_range = [0.01, 1.15]
joint_9_friction_range = [0.5, 1.3]
joint_10_friction_range = [0.5, 1.3]
randomize_joint_damping = False
randomize_joint_damping_each_joint = False
joint_damping_range = [0.3, 1.5] #multiplier
joint_1_damping_range = [0.3, 1.5]
joint_2_damping_range = [0.3, 1.5]
joint_3_damping_range = [0.3, 1.5]
joint_4_damping_range = [0.9, 1.5]
joint_5_damping_range = [0.9, 1.5]
joint_6_damping_range = [0.3, 1.5]
joint_7_damping_range = [0.3, 1.5]
joint_8_damping_range = [0.3, 1.5]
joint_9_damping_range = [0.9, 1.5]
joint_10_damping_range = [0.9, 1.5]
randomize_joint_armature = False
randomize_joint_armature_each_joint = False
joint_armature_range = [0.0001, 0.05] # Factor
joint_1_armature_range = [0.0001, 0.05]
joint_2_armature_range = [0.0001, 0.05]
joint_3_armature_range = [0.0001, 0.05]
joint_4_armature_range = [0.0001, 0.05]
joint_5_armature_range = [0.0001, 0.05]
joint_6_armature_range = [0.0001, 0.05]
joint_7_armature_range = [0.0001, 0.05]
joint_8_armature_range = [0.0001, 0.05]
joint_9_armature_range = [0.0001, 0.05]
joint_10_armature_range = [0.0001, 0.05]
add_lag = False
randomize_lag_timesteps = True
randomize_lag_timesteps_perstep = False
lag_timesteps_range = [5, 70]
add_dof_lag = False # lag on the received signals (dof_pos and dof_vel); dof_pos and dof_vel share the same lag
randomize_dof_lag_timesteps = True
randomize_dof_lag_timesteps_perstep = False # rarely used, keep False
dof_lag_timesteps_range = [0, 40]
add_dof_pos_vel_lag = False # lag on the received signals (dof_pos and dof_vel); dof_pos and dof_vel have independent lags
randomize_dof_pos_lag_timesteps = True
randomize_dof_pos_lag_timesteps_perstep = False # rarely used, keep False
dof_pos_lag_timesteps_range = [7, 25]
randomize_dof_vel_lag_timesteps = True
randomize_dof_vel_lag_timesteps_perstep = False # rarely used, keep False
dof_vel_lag_timesteps_range = [7, 25]
add_imu_lag = False # lag on the IMU signal
randomize_imu_lag_timesteps = True
randomize_imu_lag_timesteps_perstep = False # rarely used, keep False
imu_lag_timesteps_range = [1, 10]
randomize_coulomb_friction = False
joint_coulomb_range = [0.1, 0.9]
joint_viscous_range = [0.10, 0.70]
class rewards:
class scales:
termination = -0.0
tracking_lin_vel = 0.
tracking_ang_vel = 0.
lin_vel_z = -0.
ang_vel_xy = -0.0
orientation = -0.
torques = -0.0
dof_vel = -0.
dof_acc = -0.0
base_height = -0.
feet_air_time = 0.0
collision = -0.
feet_stumble = -0.0
action_rate = -0.
stand_still = -0.
only_positive_rewards = True # if true negative total rewards are clipped at zero (avoids early termination problems)
tracking_sigma = 0.25 # tracking reward = exp(-error^2/sigma)
soft_dof_pos_limit = 1. # percentage of urdf limits, values above this limit are penalized
soft_dof_vel_limit = 1.
soft_torque_limit = 1.
max_contact_force = 100. # forces above this value are penalized
class normalization:
class obs_scales:
lin_vel = 2.0
ang_vel = 0.25
dof_pos = 1.0
dof_vel = 0.05
height_measurements = 5.0
clip_observations = 100.
clip_actions = 100.
class noise:
add_noise = True
noise_level = 1.0 # scales other values
class noise_scales:
dof_pos = 0.01
dof_vel = 1.5
lin_vel = 0.1
ang_vel = 0.2
gravity = 0.05
height_measurements = 0.1
# viewer camera:
class viewer:
ref_env = 0
pos = [10, 0, 6] # [m]
lookat = [11., 5, 3.] # [m]
class sim:
dt = 0.005
substeps = 1
gravity = [0., 0. ,-9.81] # [m/s^2]
up_axis = 1 # 0 is y, 1 is z
class physx:
num_threads = 10
solver_type = 1 # 0: pgs, 1: tgs
num_position_iterations = 4
num_velocity_iterations = 0
contact_offset = 0.01 # [m]
rest_offset = 0.0 # [m]
bounce_threshold_velocity = 0.5 #0.5 [m/s]
max_depenetration_velocity = 1.0
max_gpu_contact_pairs = 2**23 #2**24 -> needed for 8000 envs and more
default_buffer_size_multiplier = 5
contact_collection = 2 # 0: never, 1: last sub-step, 2: all sub-steps (default=2)
class LeggedRobotCfgPPO(BaseConfig):
seed = 1
runner_class_name = 'OnPolicyRunner'
class policy:
init_noise_std = 1.0
actor_hidden_dims = [512, 256, 128]
critic_hidden_dims = [512, 256, 128]
class algorithm:
# training params
value_loss_coef = 1.0
use_clipped_value_loss = True
clip_param = 0.2
entropy_coef = 0.01
num_learning_epochs = 5
num_mini_batches = 4 # mini batch size = num_envs*nsteps / nminibatches
learning_rate = 1.e-3 #5.e-4
schedule = 'adaptive' # could be adaptive, fixed
gamma = 0.99
lam = 0.95
desired_kl = 0.01
max_grad_norm = 1.
class runner:
policy_class_name = 'ActorCritic'
algorithm_class_name = 'PPO'
num_steps_per_env = 24 # per iteration
max_iterations = 1500 # number of policy updates
# logging
save_interval = 100 # check for potential saves every this many iterations
experiment_name = 'test'
run_name = ''
# load and resume
resume = False
load_run = -1 # -1 = last run
checkpoint = -1 # -1 = last saved model
resume_path = None # updated from load_run and chkpt
# SPDX-FileCopyrightText: Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-FileCopyrightText: Copyright (c) 2021 ETH Zurich, Nikita Rudin
# SPDX-FileCopyrightText: Copyright (c) 2024 Beijing RobotEra TECHNOLOGY CO.,LTD. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# 3. Neither the name of the copyright holder nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# Copyright (c) 2024, AgiBot Inc. All rights reserved.
from humanoid.envs.base.legged_robot_config import LeggedRobotCfg, LeggedRobotCfgPPO
class X1DHStandCfg(LeggedRobotCfg):
"""
Configuration class for the X1 humanoid robot.
"""
class env(LeggedRobotCfg.env):
# change the observation dim
frame_stack = 66 # total number of stacked history observation frames
short_frame_stack = 5 # short history length
c_frame_stack = 3 # number of stacked privileged (critic) observation frames
num_single_obs = 47
num_observations = int(frame_stack * num_single_obs)
single_num_privileged_obs = 73
single_linvel_index = 53
num_privileged_obs = int(c_frame_stack * single_num_privileged_obs)
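# With the values above, the stacked sizes resolve to
#   num_observations   = 66 * 47 = 3102
#   num_privileged_obs = 3 * 73  = 219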
num_actions = 12
num_envs = 4096
episode_length_s = 24 #episode length in seconds
use_ref_actions = False
num_commands = 5 # sin_pos cos_pos vx vy vz
class safety:
# safety factors
pos_limit = 1.0
vel_limit = 1.0
torque_limit = 0.85
class asset(LeggedRobotCfg.asset):
file = '{LEGGED_GYM_ROOT_DIR}/resources/robots/x1/urdf/x1.urdf'
xml_file = '{LEGGED_GYM_ROOT_DIR}/resources/robots/x1/mjcf/xyber_x1_flat.xml'
name = "x1"
foot_name = "ankle_roll"
knee_name = "knee_pitch"
terminate_after_contacts_on = ['base_link']
penalize_contacts_on = ["base_link"]
self_collisions = 0 # 1 to disable, 0 to enable...bitwise filter
flip_visual_attachments = False
replace_cylinder_with_capsule = False
fix_base_link = False
class terrain(LeggedRobotCfg.terrain):
# mesh_type = 'plane'
mesh_type = 'trimesh'
curriculum = False
# rough terrain only:
measure_heights = False
static_friction = 0.6
dynamic_friction = 0.6
terrain_length = 8.
terrain_width = 8.
num_rows = 20 # number of terrain rows (levels)
num_cols = 20 # number of terrain cols (types)
max_init_terrain_level = 5 # starting curriculum state
platform = 3.
terrain_dict = {"flat": 0.3,
"rough flat": 0.2,
"slope up": 0.2,
"slope down": 0.2,
"rough slope up": 0.0,
"rough slope down": 0.0,
"stairs up": 0.,
"stairs down": 0.,
"discrete": 0.1,
"wave": 0.0,}
terrain_proportions = list(terrain_dict.values())
rough_flat_range = [0.005, 0.01] # meter
slope_range = [0, 0.1] # rad
rough_slope_range = [0.005, 0.02]
stair_width_range = [0.25, 0.25]
stair_height_range = [0.01, 0.1]
discrete_height_range = [0.0, 0.01]
restitution = 0.
class noise(LeggedRobotCfg.noise):
add_noise = True
noise_level = 1.5 # scales other values
class noise_scales(LeggedRobotCfg.noise.noise_scales):
dof_pos = 0.02
dof_vel = 1.5
ang_vel = 0.2
lin_vel = 0.1
quat = 0.1
gravity = 0.05
height_measurements = 0.1
class init_state(LeggedRobotCfg.init_state):
pos = [0.0, 0.0, 0.7]
default_joint_angles = { # = target angles [rad] when action = 0.0
'left_hip_pitch_joint': 0.4,
'left_hip_roll_joint': 0.05,
'left_hip_yaw_joint': -0.31,
'left_knee_pitch_joint': 0.49,
'left_ankle_pitch_joint': -0.21,
'left_ankle_roll_joint': 0.0,
'right_hip_pitch_joint': -0.4,
'right_hip_roll_joint': -0.05,
'right_hip_yaw_joint': 0.31,
'right_knee_pitch_joint': 0.49,
'right_ankle_pitch_joint': -0.21,
'right_ankle_roll_joint': 0.0,
}
class control(LeggedRobotCfg.control):
# PD Drive parameters:
control_type = 'P'
stiffness = {'hip_pitch_joint': 30, 'hip_roll_joint': 40,'hip_yaw_joint': 35,
'knee_pitch_joint': 100, 'ankle_pitch_joint': 35, 'ankle_roll_joint': 35}
damping = {'hip_pitch_joint': 3, 'hip_roll_joint': 3.0,'hip_yaw_joint': 4,
'knee_pitch_joint': 10, 'ankle_pitch_joint': 0.5, 'ankle_roll_joint': 0.5}
# action scale: target angle = actionScale * action + defaultAngle
action_scale = 0.5
# decimation: Number of control action updates @ sim DT per policy DT
decimation = 10 # policy runs at 100 Hz (sim dt 0.001 s * decimation 10)
class sim(LeggedRobotCfg.sim):
dt = 0.001 # 1000 Hz simulation step
substeps = 1 # 2
up_axis = 1 # 0 is y, 1 is z
class physx(LeggedRobotCfg.sim.physx):
num_threads = 10
solver_type = 1 # 0: pgs, 1: tgs
num_position_iterations = 4
num_velocity_iterations = 0
contact_offset = 0.01 # [m]
rest_offset = 0.0 # [m]
bounce_threshold_velocity = 0.5 # 0.5 #0.5 [m/s]
max_depenetration_velocity = 1.0
max_gpu_contact_pairs = 2**23 # 2**24 -> needed for 8000 envs and more
default_buffer_size_multiplier = 5
# 0: never, 1: last sub-step, 2: all sub-steps (default=2)
contact_collection = 2
class domain_rand(LeggedRobotCfg.domain_rand):
randomize_friction = True
friction_range = [0.2, 1.3]
restitution_range = [0.0, 0.4]
# push
push_robots = True
push_interval_s = 4 # push the robot every push_interval_s seconds
update_step = 2000 * 24 # after this many steps, advance the push_duration index
push_duration = [0, 0.05, 0.1, 0.15, 0.2, 0.25] # push duration increases over training
max_push_vel_xy = 0.2
max_push_ang_vel = 0.2
randomize_base_mass = True
added_mass_range = [-3, 3] # base mass randomization range; base mass is the total mass of all fixed links
randomize_com = True
com_displacement_range = [[-0.05, 0.05],
[-0.05, 0.05],
[-0.05, 0.05]]
randomize_gains = True
stiffness_multiplier_range = [0.8, 1.2] # Factor
damping_multiplier_range = [0.8, 1.2] # Factor
randomize_torque = True
torque_multiplier_range = [0.8, 1.2]
randomize_link_mass = True
added_link_mass_range = [0.9, 1.1]
randomize_motor_offset = True
motor_offset_range = [-0.035, 0.035] # Offset to add to the motor angles
randomize_joint_friction = True
randomize_joint_friction_each_joint = False
joint_friction_range = [0.01, 1.15]
joint_1_friction_range = [0.01, 1.15]
joint_2_friction_range = [0.01, 1.15]
joint_3_friction_range = [0.01, 1.15]
joint_4_friction_range = [0.5, 1.3]
joint_5_friction_range = [0.5, 1.3]
joint_6_friction_range = [0.01, 1.15]
joint_7_friction_range = [0.01, 1.15]
joint_8_friction_range = [0.01, 1.15]
joint_9_friction_range = [0.5, 1.3]
joint_10_friction_range = [0.5, 1.3]
randomize_joint_damping = True
randomize_joint_damping_each_joint = False
joint_damping_range = [0.3, 1.5]
joint_1_damping_range = [0.3, 1.5]
joint_2_damping_range = [0.3, 1.5]
joint_3_damping_range = [0.3, 1.5]
joint_4_damping_range = [0.9, 1.5]
joint_5_damping_range = [0.9, 1.5]
joint_6_damping_range = [0.3, 1.5]
joint_7_damping_range = [0.3, 1.5]
joint_8_damping_range = [0.3, 1.5]
joint_9_damping_range = [0.9, 1.5]
joint_10_damping_range = [0.9, 1.5]
randomize_joint_armature = True
randomize_joint_armature_each_joint = False
joint_armature_range = [0.0001, 0.05] # Factor
joint_1_armature_range = [0.0001, 0.05]
joint_2_armature_range = [0.0001, 0.05]
joint_3_armature_range = [0.0001, 0.05]
joint_4_armature_range = [0.0001, 0.05]
joint_5_armature_range = [0.0001, 0.05]
joint_6_armature_range = [0.0001, 0.05]
joint_7_armature_range = [0.0001, 0.05]
joint_8_armature_range = [0.0001, 0.05]
joint_9_armature_range = [0.0001, 0.05]
joint_10_armature_range = [0.0001, 0.05]
add_lag = True
randomize_lag_timesteps = True
randomize_lag_timesteps_perstep = False
lag_timesteps_range = [5, 40]
add_dof_lag = True
randomize_dof_lag_timesteps = True
randomize_dof_lag_timesteps_perstep = False
dof_lag_timesteps_range = [0, 40]
add_dof_pos_vel_lag = False
randomize_dof_pos_lag_timesteps = False
randomize_dof_pos_lag_timesteps_perstep = False
dof_pos_lag_timesteps_range = [7, 25]
randomize_dof_vel_lag_timesteps = False
randomize_dof_vel_lag_timesteps_perstep = False
dof_vel_lag_timesteps_range = [7, 25]
add_imu_lag = False
randomize_imu_lag_timesteps = True
randomize_imu_lag_timesteps_perstep = False
imu_lag_timesteps_range = [1, 10]
randomize_coulomb_friction = True
joint_coulomb_range = [0.1, 0.9]
joint_viscous_range = [0.05, 0.1]
class commands(LeggedRobotCfg.commands):
curriculum = True
max_curriculum = 1.5
# default: lin_vel_x, lin_vel_y, ang_vel_yaw, heading (in heading mode ang_vel_yaw is recomputed from heading error)
num_commands = 4
resampling_time = 25. # time before commands are changed [s]
gait = ["walk_omnidirectional","stand","walk_omnidirectional"] # gait type during training
# proportion during whole life time
gait_time_range = {"walk_sagittal": [2,6],
"walk_lateral": [2,6],
"rotate": [2,3],
"stand": [2,3],
"walk_omnidirectional": [4,6]}
heading_command = False # if true: compute ang vel command from heading error
stand_com_threshold = 0.05 # if norm(lin_vel_x, lin_vel_y, ang_vel_yaw) < this, the robot should stand
sw_switch = True # whether to use stand_com_threshold
class ranges:
lin_vel_x = [-0.4, 1.2] # min max [m/s]
lin_vel_y = [-0.4, 0.4] # min max [m/s]
ang_vel_yaw = [-0.6, 0.6] # min max [rad/s]
heading = [-3.14, 3.14]
class rewards:
soft_dof_pos_limit = 0.98
soft_dof_vel_limit = 0.9
soft_torque_limit = 0.9
base_height_target = 0.61
foot_min_dist = 0.2
foot_max_dist = 1.0
# final_swing_joint_pos = final_swing_joint_delta_pos + default_pos
final_swing_joint_delta_pos = [0.25, 0.05, -0.11, 0.35, -0.16, 0.0, -0.25, -0.05, 0.11, 0.35, -0.16, 0.0]
target_feet_height = 0.03
target_feet_height_max = 0.06
feet_to_ankle_distance = 0.041
cycle_time = 0.7
# if true negative total rewards are clipped at zero (avoids early termination problems)
only_positive_rewards = True
# tracking reward = exp(-error*sigma)
tracking_sigma = 5
max_contact_force = 700 # forces above this value are penalized
class scales:
ref_joint_pos = 2.2
feet_clearance = 1.
feet_contact_number = 2.0
# gait
feet_air_time = 1.2
foot_slip = -0.1
feet_distance = 0.2
knee_distance = 0.2
# contact
feet_contact_forces = -0.01
# vel tracking
tracking_lin_vel = 1.8
tracking_ang_vel = 1.1
vel_mismatch_exp = 0.5 # lin_z; ang x,y
low_speed = 0.2
track_vel_hard = 0.5
# base pos
default_joint_pos = 1.0
orientation = 1.
feet_rotation = 0.3
base_height = 0.2
base_acc = 0.2
# energy
action_smoothness = -0.002
torques = -8e-9
dof_vel = -2e-8
dof_acc = -1e-7
collision = -1.
stand_still = 2.5
# limits
dof_vel_limits = -1
dof_pos_limits = -10.
dof_torque_limits = -0.1
class normalization:
class obs_scales:
lin_vel = 2.
ang_vel = 1.
dof_pos = 1.
dof_vel = 0.05
quat = 1.
height_measurements = 5.0
clip_observations = 100.
clip_actions = 100.
class X1DHStandCfgPPO(LeggedRobotCfgPPO):
seed = 5
runner_class_name = 'DHOnPolicyRunner' # DWLOnPolicyRunner
class policy:
init_noise_std = 1.0
actor_hidden_dims = [512, 256, 128]
critic_hidden_dims = [768, 256, 128]
state_estimator_hidden_dims=[256, 128, 64]
# for the long-history CNN only
kernel_size=[6, 4]
filter_size=[32, 16]
stride_size=[3, 2]
lh_output_dim = 64 # long-history output dim
in_channels = X1DHStandCfg.env.frame_stack
class algorithm(LeggedRobotCfgPPO.algorithm):
entropy_coef = 0.001
learning_rate = 1e-5
num_learning_epochs = 2
gamma = 0.994
lam = 0.9
num_mini_batches = 4
if X1DHStandCfg.terrain.measure_heights:
lin_vel_idx = (X1DHStandCfg.env.single_num_privileged_obs + X1DHStandCfg.terrain.num_height) * (X1DHStandCfg.env.c_frame_stack - 1) + X1DHStandCfg.env.single_linvel_index
else:
lin_vel_idx = X1DHStandCfg.env.single_num_privileged_obs * (X1DHStandCfg.env.c_frame_stack - 1) + X1DHStandCfg.env.single_linvel_index
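# Example: with measure_heights = False in this config,
#   lin_vel_idx = 73 * (3 - 1) + 53 = 199,
# i.e. the linear-velocity slot of the last frame in the stacked privileged
# observation (assuming the newest frame is appended last).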
class runner:
policy_class_name = 'ActorCriticDH'
algorithm_class_name = 'DHPPO'
num_steps_per_env = 24 # per iteration
max_iterations = 20000 # number of policy updates
# logging
save_interval = 100 # check for potential saves every this many iterations
experiment_name = 'x1_dh_stand'
run_name = ''
# load and resume
resume = False
load_run = -1 # -1 = last run
checkpoint = -1 # -1 = last saved model
resume_path = None # updated from load_run and chkpt
# SPDX-FileCopyrightText: Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-FileCopyrightText: Copyright (c) 2021 ETH Zurich, Nikita Rudin
# SPDX-FileCopyrightText: Copyright (c) 2024 Beijing RobotEra TECHNOLOGY CO.,LTD. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# 3. Neither the name of the copyright holder nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# Copyright (c) 2024, AgiBot Inc. All rights reserved.
from humanoid.envs.base.legged_robot_config import LeggedRobotCfg
from isaacgym.torch_utils import *
from isaacgym import gymtorch, gymapi
from humanoid.utils.math import wrap_to_pi
import torch
from humanoid.envs import LeggedRobot
from humanoid.utils.terrain import Terrain
def copysign_new(a, b):
a = torch.tensor(a, device=b.device, dtype=torch.float)
a = a.expand_as(b)
return torch.abs(a) * torch.sign(b)
def get_euler_rpy(q):
qx, qy, qz, qw = 0, 1, 2, 3
# roll (x-axis rotation)
sinr_cosp = 2.0 * (q[..., qw] * q[..., qx] + q[..., qy] * q[..., qz])
cosr_cosp = q[..., qw] * q[..., qw] - q[..., qx] * \
q[..., qx] - q[..., qy] * q[..., qy] + q[..., qz] * q[..., qz]
roll = torch.atan2(sinr_cosp, cosr_cosp)
# pitch (y-axis rotation)
sinp = 2.0 * (q[..., qw] * q[..., qy] - q[..., qz] * q[..., qx])
pitch = torch.where(torch.abs(sinp) >= 1, copysign_new(
np.pi / 2.0, sinp), torch.asin(sinp))
# yaw (z-axis rotation)
siny_cosp = 2.0 * (q[..., qw] * q[..., qz] + q[..., qx] * q[..., qy])
cosy_cosp = q[..., qw] * q[..., qw] + q[..., qx] * \
q[..., qx] - q[..., qy] * q[..., qy] - q[..., qz] * q[..., qz]
yaw = torch.atan2(siny_cosp, cosy_cosp)
return roll % (2*np.pi), pitch % (2*np.pi), yaw % (2*np.pi)
def get_euler_xyz_tensor(quat):
r, p, w = get_euler_rpy(quat)
# stack r, p, w in dim1
euler_xyz = torch.stack((r, p, w), dim=-1)
euler_xyz[euler_xyz > np.pi] -= 2 * np.pi
return euler_xyz
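# get_euler_xyz_tensor wraps roll/pitch/yaw into (-pi, pi]. An illustrative
# sanity check:
#   q = torch.tensor([[0., 0., 0., 1.]])  # identity quaternion (x, y, z, w)
#   get_euler_xyz_tensor(q)               # -> tensor([[0., 0., 0.]])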
class X1DHStandEnv(LeggedRobot):
'''
X1DHStandEnv is a class that represents a custom environment for a legged robot.
Args:
cfg (LeggedRobotCfg): Configuration object for the legged robot.
sim_params: Parameters for the simulation.
physics_engine: Physics engine used in the simulation.
sim_device: Device used for the simulation.
headless: Flag indicating whether the simulation should be run in headless mode.
Attributes:
last_feet_z (float): The z-coordinate of the last feet position.
feet_height (torch.Tensor): Tensor representing the height of the feet.
sim (gymtorch.GymSim): The simulation object.
terrain (Terrain): The terrain object.
up_axis_idx (int): The index representing the up axis.
command_input (torch.Tensor): Tensor representing the command input.
privileged_obs_buf (torch.Tensor): Tensor representing the privileged observations buffer.
obs_buf (torch.Tensor): Tensor representing the observations buffer.
obs_history (collections.deque): Deque containing the history of observations.
critic_history (collections.deque): Deque containing the history of critic observations.
Methods:
_push_robots(): Randomly pushes the robots by setting a randomized base velocity.
_get_phase(): Calculates the phase of the gait cycle.
_get_stance_mask(): Calculates the gait phase.
compute_ref_state(): Computes the reference state.
create_sim(): Creates the simulation, terrain, and environments.
_get_noise_scale_vec(cfg): Sets a vector used to scale the noise added to the observations.
step(actions): Performs a simulation step with the given actions.
compute_observations(): Computes the observations.
reset_idx(env_ids): Resets the environment for the specified environment IDs.
'''
def __init__(self, cfg: LeggedRobotCfg, sim_params, physics_engine, sim_device, headless):
super().__init__(cfg, sim_params, physics_engine, sim_device, headless)
self.last_feet_z = self.cfg.rewards.feet_to_ankle_distance
self.feet_height = torch.zeros((self.num_envs, 2), device=self.device)
self.ref_dof_pos = torch.zeros((self.num_envs, self.num_actions), device=self.device)
def _push_robots(self):
""" Random pushes the robots. Emulates an impulse by setting a randomized base velocity.
"""
max_vel = self.cfg.domain_rand.max_push_vel_xy
max_push_angular = self.cfg.domain_rand.max_push_ang_vel
self.rand_push_force[:, :2] = torch_rand_float(
-max_vel, max_vel, (self.num_envs, 2), device=self.device) # lin vel x/y
self.root_states[:, 7:9] = self.rand_push_force[:, :2]
self.rand_push_torque = torch_rand_float(
-max_push_angular, max_push_angular, (self.num_envs, 3), device=self.device) #angular vel xyz
self.root_states[:, 10:13] = self.rand_push_torque
self.gym.set_actor_root_state_tensor(
self.sim, gymtorch.unwrap_tensor(self.root_states))
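# The Isaac Gym actor root state tensor packs 13 values per actor: position
# [0:3], orientation quaternion [3:7], linear velocity [7:10] and angular
# velocity [10:13]; hence the slices 7:9 (xy linear velocity) and 10:13
# (angular velocity) above.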
def _get_phase(self):
cycle_time = self.cfg.rewards.cycle_time
if self.cfg.commands.sw_switch:
stand_command = (torch.norm(self.commands[:, :3], dim=1) <= self.cfg.commands.stand_com_threshold)
self.phase_length_buf[stand_command] = 0 # reset the phase counter for envs that are standing
# self.gait_start is randomly 0 or 0.5
phase = (self.phase_length_buf * self.dt / cycle_time + self.gait_start) * (~stand_command)
else:
phase = self.episode_length_buf * self.dt / cycle_time + self.gait_start
# phase keeps increasing; it is reset to 0 when the robot should stand
return phase
def _get_stance_mask(self):
# returns a float mask: 1 = stance, 0 = swing
phase = self._get_phase()
sin_pos = torch.sin(2 * torch.pi * phase)
stance_mask = torch.zeros((self.num_envs, 2), device=self.device)
# left foot stance
stance_mask[:, 0] = sin_pos >= 0
# right foot stance
stance_mask[:, 1] = sin_pos < 0
# Add double support phase
stance_mask[torch.abs(sin_pos) < 0.1] = 1
# stance_mask == 1 marks the stance leg
return stance_mask
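# Example: with sin_pos = sin(2*pi*phase), phase = 0.25 gives sin_pos = 1, so
# the left foot is in stance and the right foot swings; phase = 0.75 gives
# sin_pos = -1 and the roles flip. Whenever |sin_pos| < 0.1 (around the zero
# crossings) both entries are set to 1, giving a short double-support window
# at each step transition.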
def generate_gait_time(self,envs):
if len(envs) == 0:
return
# randomly sample a duration weight for each gait type
random_tensor_list = []
for i in range(len(self.cfg.commands.gait)):
name = self.cfg.commands.gait[i]
gait_time_range = self.cfg.commands.gait_time_range[name]
random_tensor_single = torch_rand_float(gait_time_range[0],
gait_time_range[1],
(len(envs), 1),device=self.device)
random_tensor_list.append(random_tensor_single)
random_tensor = torch.cat([random_tensor_list[i] for i in range(len(self.cfg.commands.gait))], dim=1)
current_sum = torch.sum(random_tensor,dim=1,keepdim=True)
# scaled_tensor store proportion for each gait type
scaled_tensor = random_tensor * (self.max_episode_length / current_sum)
scaled_tensor[:,1:] = scaled_tensor[:,:-1].clone()
scaled_tensor[:,0] *= 0.0
# self.gait_time accumulates gait duration ticks:
# self.gait_time = |__gait1__|__gait2__|__gait3__|
# reaching a boundary in self.gait_time triggers a gait command resample
self.gait_time[envs] = torch.cumsum(scaled_tensor,dim=1).int()
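# Example: for a hypothetical env whose sampled durations scale to [4, 2, 5]
# ticks (summing to max_episode_length), the shift-then-cumsum above yields
# switch points [0, 4, 6]; the env resamples its command when
# episode_length_buf reaches each of these tick counts.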
def _resample_commands(self):
""" Randommly select commands of some environments
Args:
env_ids (List[int]): Environments ids for which new commands are needed
"""
for i in range(len(self.cfg.commands.gait)):
# if an env has finished its current gait segment, resample commands for the next gait
env_ids = (self.episode_length_buf == self.gait_time[:,i]).nonzero(as_tuple=False).flatten()
if len(env_ids) > 0:
# build the resampler method name from the gait type
name = '_resample_' + self.cfg.commands.gait[i] + '_command'
# get function from self based on name
resample_command = getattr(self, name)
# resample_command stands for _resample_stand_command/_resample_walk_sagittal_command/...
resample_command(env_ids)
def _resample_stand_command(self, env_ids):
self.commands[env_ids, 0] = torch.zeros(len(env_ids), device=self.device)
self.commands[env_ids, 1] = torch.zeros(len(env_ids), device=self.device)
if self.cfg.commands.heading_command:
self.commands[env_ids, 3] = torch.zeros(len(env_ids), device=self.device)
else:
self.commands[env_ids, 2] = torch.zeros(len(env_ids), device=self.device)
def _resample_walk_sagittal_command(self, env_ids):
self.commands[env_ids, 0] = torch_rand_float(self.command_ranges["lin_vel_x"][0], self.command_ranges["lin_vel_x"][1], (len(env_ids), 1), device=self.device).squeeze(1)
self.commands[env_ids, 1] = torch.zeros(len(env_ids), device=self.device)
if self.cfg.commands.heading_command:
self.commands[env_ids, 3] = torch.zeros(len(env_ids), device=self.device)
else:
self.commands[env_ids, 2] = torch.zeros(len(env_ids), device=self.device)
def _resample_walk_lateral_command(self, env_ids):
self.commands[env_ids, 0] = torch.zeros(len(env_ids), device=self.device)
self.commands[env_ids, 1] = torch_rand_float(self.command_ranges["lin_vel_y"][0], self.command_ranges["lin_vel_y"][1], (len(env_ids), 1), device=self.device).squeeze(1)
if self.cfg.commands.heading_command:
self.commands[env_ids, 3] = torch.zeros(len(env_ids), device=self.device)
else:
self.commands[env_ids, 2] = torch.zeros(len(env_ids), device=self.device)
def _resample_rotate_command(self, env_ids):
self.commands[env_ids, 0] = torch.zeros(len(env_ids), device=self.device)
self.commands[env_ids, 1] = torch.zeros(len(env_ids), device=self.device)
if self.cfg.commands.heading_command:
self.commands[env_ids, 3] = torch_rand_float(self.command_ranges["heading"][0], self.command_ranges["heading"][1], (len(env_ids), 1), device=self.device).squeeze(1)
else:
self.commands[env_ids, 2] = torch_rand_float(self.command_ranges["ang_vel_yaw"][0], self.command_ranges["ang_vel_yaw"][1], (len(env_ids), 1), device=self.device).squeeze(1)
def _resample_walk_omnidirectional_command(self,env_ids):
self.commands[env_ids, 0] = torch_rand_float(self.command_ranges["lin_vel_x"][0], self.command_ranges["lin_vel_x"][1], (len(env_ids), 1), device=self.device).squeeze(1)
self.commands[env_ids, 1] = torch_rand_float(self.command_ranges["lin_vel_y"][0], self.command_ranges["lin_vel_y"][1], (len(env_ids), 1), device=self.device).squeeze(1)
if self.cfg.commands.heading_command:
self.commands[env_ids, 3] = torch_rand_float(self.command_ranges["heading"][0], self.command_ranges["heading"][1], (len(env_ids), 1), device=self.device).squeeze(1)
else:
self.commands[env_ids, 2] = torch_rand_float(self.command_ranges["ang_vel_yaw"][0], self.command_ranges["ang_vel_yaw"][1], (len(env_ids), 1), device=self.device).squeeze(1)
# self.commands[env_ids, :2] *= (torch.norm(self.commands[env_ids, :2], dim=1) > 0.05).unsqueeze(1)
def _post_physics_step_callback(self):
""" Callback called before computing terminations, rewards, and observations
Default behaviour: Compute ang vel command based on target and heading, compute measured terrain heights and randomly push robots
"""
self.phase_length_buf += 1
self._resample_commands()
if self.cfg.commands.heading_command:
forward = quat_apply(self.base_quat, self.forward_vec)
heading = torch.atan2(forward[:, 1], forward[:, 0])
self.commands[:, 2] = torch.clip(0.5*wrap_to_pi(self.commands[:, 3] - heading), -1., 1.)
if self.cfg.terrain.measure_heights:
# measure terrain heights around each robot
self.measured_heights = self._get_heights()
if self.cfg.domain_rand.push_robots:
i = int(self.common_step_counter/self.cfg.domain_rand.update_step)
if i >= len(self.cfg.domain_rand.push_duration):
i = len(self.cfg.domain_rand.push_duration) - 1
duration = self.cfg.domain_rand.push_duration[i]/self.dt
if self.common_step_counter % self.cfg.domain_rand.push_interval <= duration:
self._push_robots()
else:
self.rand_push_force.zero_()
self.rand_push_torque.zero_()
def compute_ref_state(self):
phase = self._get_phase()
sin_pos = torch.sin(2 * torch.pi * phase)
sin_pos_l = sin_pos.clone()
sin_pos_r = sin_pos.clone()
self.ref_dof_pos = torch.zeros_like(self.dof_pos)
# left swing
sin_pos_l[sin_pos_l > 0] = 0
self.ref_dof_pos[:, 0] = -sin_pos_l * self.cfg.rewards.final_swing_joint_delta_pos[0]
self.ref_dof_pos[:, 1] = -sin_pos_l * self.cfg.rewards.final_swing_joint_delta_pos[1]
self.ref_dof_pos[:, 2] = -sin_pos_l * self.cfg.rewards.final_swing_joint_delta_pos[2]
self.ref_dof_pos[:, 3] = -sin_pos_l * self.cfg.rewards.final_swing_joint_delta_pos[3]
self.ref_dof_pos[:, 4] = -sin_pos_l * self.cfg.rewards.final_swing_joint_delta_pos[4]
self.ref_dof_pos[:, 5] = -sin_pos_l * self.cfg.rewards.final_swing_joint_delta_pos[5]
# right
sin_pos_r[sin_pos_r < 0] = 0
self.ref_dof_pos[:, 6] = sin_pos_r * self.cfg.rewards.final_swing_joint_delta_pos[6]
self.ref_dof_pos[:, 7] = sin_pos_r * self.cfg.rewards.final_swing_joint_delta_pos[7]
self.ref_dof_pos[:, 8] = sin_pos_r * self.cfg.rewards.final_swing_joint_delta_pos[8]
self.ref_dof_pos[:, 9] = sin_pos_r * self.cfg.rewards.final_swing_joint_delta_pos[9]
self.ref_dof_pos[:, 10] = sin_pos_r * self.cfg.rewards.final_swing_joint_delta_pos[10]
self.ref_dof_pos[:, 11] = sin_pos_r * self.cfg.rewards.final_swing_joint_delta_pos[11]
self.ref_dof_pos[torch.abs(sin_pos) < 0.1] = 0.
# if use_ref_actions=True, action += ref_action
self.ref_action = 2 * self.ref_dof_pos
# self.ref_dof_pos set ref dof pos for swing leg, ref_dof_pos=0 for stance leg
self.ref_dof_pos += self.default_dof_pos
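# Illustration of the phase split above: with phase in [0, 1), sin(2*pi*phase) is
# positive on (0, 0.5) and negative on (0.5, 1). Clamping sin_pos_l to <= 0 makes the
# left-leg offsets active only in the second half-cycle, clamping sin_pos_r to >= 0
# makes the right-leg offsets active only in the first half-cycle, and |sin| < 0.1
# (near the crossings, i.e. double support) zeroes the offsets for both legs.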
def create_sim(self):
""" Creates simulation, terrain and evironments
"""
self.up_axis_idx = 2 # 2 for z, 1 for y -> adapt gravity accordingly
self.sim = self.gym.create_sim(
self.sim_device_id, self.graphics_device_id, self.physics_engine, self.sim_params)
mesh_type = self.cfg.terrain.mesh_type
if mesh_type in ['heightfield', 'trimesh']:
self.terrain = Terrain(self.cfg.terrain, self.num_envs)
if mesh_type == 'plane':
self._create_ground_plane()
elif mesh_type == 'heightfield':
self._create_heightfield()
elif mesh_type == 'trimesh':
self._create_trimesh()
elif mesh_type is not None:
raise ValueError(
"Terrain mesh type not recognised. Allowed types are [None, plane, heightfield, trimesh]")
self._create_envs()
def _get_noise_scale_vec(self, cfg):
""" Sets a vector used to scale the noise added to the observations.
[NOTE]: Must be adapted when changing the observations structure
Args:
cfg (Dict): Environment config file
Returns:
[torch.Tensor]: Vector of scales used to multiply a uniform distribution in [-1, 1]
"""
noise_vec = torch.zeros(
self.cfg.env.num_single_obs, device=self.device)
self.add_noise = self.cfg.noise.add_noise
noise_scales = self.cfg.noise.noise_scales
noise_vec[0: self.cfg.env.num_commands] = 0. # commands
noise_vec[self.cfg.env.num_commands: self.cfg.env.num_commands+self.num_actions] = noise_scales.dof_pos * self.obs_scales.dof_pos
noise_vec[self.cfg.env.num_commands+self.num_actions: self.cfg.env.num_commands+2*self.num_actions] = noise_scales.dof_vel * self.obs_scales.dof_vel
noise_vec[self.cfg.env.num_commands+2*self.num_actions: self.cfg.env.num_commands+3*self.num_actions] = 0. # previous actions
noise_vec[self.cfg.env.num_commands+3*self.num_actions: self.cfg.env.num_commands+3*self.num_actions + 3] = noise_scales.ang_vel * self.obs_scales.ang_vel # ang vel
noise_vec[self.cfg.env.num_commands+3*self.num_actions + 3: self.cfg.env.num_commands+3*self.num_actions + 6] = noise_scales.quat * self.obs_scales.quat # base euler angles
return noise_vec
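# Illustrative layout of noise_vec, assuming num_commands = 5 and num_actions = 12
# (the 47-dim single-step observation used by the X1 configs):
#   [0:5]   commands (no noise)           [5:17]  dof_pos        [17:29] dof_vel
#   [29:41] previous actions (no noise)   [41:44] base ang vel   [44:47] base euler angles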
def step(self, actions):
if self.cfg.env.use_ref_actions:
actions += self.ref_action
return super().step(actions)
def compute_observations(self):
phase = self._get_phase()
self.compute_ref_state()
sin_pos = torch.sin(2 * torch.pi * phase).unsqueeze(1)
cos_pos = torch.cos(2 * torch.pi * phase).unsqueeze(1)
stance_mask = self._get_stance_mask()
contact_mask = self.contact_forces[:, self.feet_indices, 2] > 5.
self.command_input = torch.cat(
(sin_pos, cos_pos, self.commands[:, :3] * self.commands_scale), dim=1)
# the critic observes un-lagged states
diff = self.dof_pos - self.ref_dof_pos
# privileged observations: 73 dims in total
privileged_obs_buf = torch.cat((
self.command_input, # 2 + 3
(self.dof_pos - self.default_joint_pd_target) * self.obs_scales.dof_pos, # 12
self.dof_vel * self.obs_scales.dof_vel, # 12
self.actions, # 12
diff, # 12
self.base_lin_vel * self.obs_scales.lin_vel, # 3
self.base_ang_vel * self.obs_scales.ang_vel, # 3
self.base_euler_xyz * self.obs_scales.quat, # 3
self.rand_push_force[:, :2], # 2
self.rand_push_torque, # 3
self.env_frictions, # 1
self.body_mass / 10., # 1 # total mass of all fixed links
stance_mask, # 2
contact_mask, # 2
), dim=-1)
# randomly apply the same lag to dof_pos and dof_vel
if self.cfg.domain_rand.add_dof_lag:
if self.cfg.domain_rand.randomize_dof_lag_timesteps_perstep:
self.dof_lag_timestep = torch.randint(self.cfg.domain_rand.dof_lag_timesteps_range[0],
self.cfg.domain_rand.dof_lag_timesteps_range[1]+1,(self.num_envs,),device=self.device)
cond = self.dof_lag_timestep > self.last_dof_lag_timestep + 1
self.dof_lag_timestep[cond] = self.last_dof_lag_timestep[cond] + 1
self.last_dof_lag_timestep = self.dof_lag_timestep.clone()
self.lagged_dof_pos = self.dof_lag_buffer[torch.arange(self.num_envs), :self.num_actions, self.dof_lag_timestep.long()]
self.lagged_dof_vel = self.dof_lag_buffer[torch.arange(self.num_envs), -self.num_actions:, self.dof_lag_timestep.long()]
# randomly apply independent lags to dof_pos and dof_vel
elif self.cfg.domain_rand.add_dof_pos_vel_lag:
if self.cfg.domain_rand.randomize_dof_pos_lag_timesteps_perstep:
self.dof_pos_lag_timestep = torch.randint(self.cfg.domain_rand.dof_pos_lag_timesteps_range[0],
self.cfg.domain_rand.dof_pos_lag_timesteps_range[1]+1,(self.num_envs,),device=self.device)
cond = self.dof_pos_lag_timestep > self.last_dof_pos_lag_timestep + 1
self.dof_pos_lag_timestep[cond] = self.last_dof_pos_lag_timestep[cond] + 1
self.last_dof_pos_lag_timestep = self.dof_pos_lag_timestep.clone()
self.lagged_dof_pos = self.dof_pos_lag_buffer[torch.arange(self.num_envs), :, self.dof_pos_lag_timestep.long()]
if self.cfg.domain_rand.randomize_dof_vel_lag_timesteps_perstep:
self.dof_vel_lag_timestep = torch.randint(self.cfg.domain_rand.dof_vel_lag_timesteps_range[0],
self.cfg.domain_rand.dof_vel_lag_timesteps_range[1]+1,(self.num_envs,),device=self.device)
cond = self.dof_vel_lag_timestep > self.last_dof_vel_lag_timestep + 1
self.dof_vel_lag_timestep[cond] = self.last_dof_vel_lag_timestep[cond] + 1
self.last_dof_vel_lag_timestep = self.dof_vel_lag_timestep.clone()
self.lagged_dof_vel = self.dof_vel_lag_buffer[torch.arange(self.num_envs), :, self.dof_vel_lag_timestep.long()]
# no lag on dof_pos and dof_vel
else:
self.lagged_dof_pos = self.dof_pos
self.lagged_dof_vel = self.dof_vel
# imu lag, including rpy and omega
if self.cfg.domain_rand.add_imu_lag:
if self.cfg.domain_rand.randomize_imu_lag_timesteps_perstep:
self.imu_lag_timestep = torch.randint(self.cfg.domain_rand.imu_lag_timesteps_range[0],
self.cfg.domain_rand.imu_lag_timesteps_range[1]+1,(self.num_envs,),device=self.device)
cond = self.imu_lag_timestep > self.last_imu_lag_timestep + 1
self.imu_lag_timestep[cond] = self.last_imu_lag_timestep[cond] + 1
self.last_imu_lag_timestep = self.imu_lag_timestep.clone()
self.lagged_imu = self.imu_lag_buffer[torch.arange(self.num_envs), :, self.imu_lag_timestep.long()] # long indices, consistent with the dof lag branches
self.lagged_base_ang_vel = self.lagged_imu[:,:3].clone()
self.lagged_base_euler_xyz = self.lagged_imu[:,-3:].clone()
# no imu lag
else:
self.lagged_base_ang_vel = self.base_ang_vel[:,:3]
self.lagged_base_euler_xyz = self.base_euler_xyz[:,-3:]
# obs q and dq
q = (self.lagged_dof_pos - self.default_dof_pos) * self.obs_scales.dof_pos
dq = self.lagged_dof_vel * self.obs_scales.dof_vel
# proprioceptive observations: 47 dims in total
obs_buf = torch.cat((
self.command_input, # 5 = 2D(sin cos) + 3D(vel_x, vel_y, ang_vel_yaw)
q, # 12
dq, # 12
self.actions, # 12
self.lagged_base_ang_vel * self.obs_scales.ang_vel, # 3
self.lagged_base_euler_xyz * self.obs_scales.quat, # 3
), dim=-1)
if self.cfg.env.num_single_obs == 48:
stand_command = (torch.norm(self.commands[:, :3], dim=1, keepdim=True) <= self.cfg.commands.stand_com_threshold)
obs_buf = torch.cat((obs_buf, stand_command),dim=1)
if self.cfg.terrain.measure_heights:
heights = torch.clip(self.root_states[:, 2].unsqueeze(1) - 0.5 - self.measured_heights, -1, 1.) * self.obs_scales.height_measurements
privileged_obs_buf = torch.cat((privileged_obs_buf.clone(), heights), dim=-1)
if self.add_noise:
# add obs noise
obs_now = obs_buf.clone() + (2 * torch.rand_like(obs_buf) -1) * self.noise_scale_vec * self.cfg.noise.noise_level
else:
obs_now = obs_buf.clone()
self.obs_history.append(obs_now)
self.critic_history.append(privileged_obs_buf)
obs_buf_all = torch.stack([self.obs_history[i]
for i in range(self.obs_history.maxlen)], dim=1) # N,T,K
self.obs_buf = obs_buf_all.reshape(self.num_envs, -1) # N, T*K
self.privileged_obs_buf = torch.cat([self.critic_history[i] for i in range(self.cfg.env.c_frame_stack)], dim=1)
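# Shape note: obs_history is a deque of the last maxlen (= frame_stack in the configs)
# single-step observations, so obs_buf_all is (num_envs, frame_stack, num_single_obs)
# and obs_buf flattens it to (num_envs, frame_stack * num_single_obs); the privileged
# buffer stacks c_frame_stack critic frames the same way.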
def reset_idx(self, env_ids):
""" Reset some environments.
Calls self._reset_dofs(env_ids), self._reset_root_states(env_ids), and self._resample_commands(env_ids)
[Optional] calls self._update_terrain_curriculum(env_ids) and self.update_command_curriculum(env_ids).
Logs episode info
Resets some buffers
Args:
env_ids (list[int]): List of environment ids which must be reset
"""
if len(env_ids) == 0:
return
# update curriculum
if self.cfg.terrain.curriculum:
self._update_terrain_curriculum(env_ids)
# avoid updating command curriculum at each step since the maximum command is common to all envs
if self.cfg.commands.curriculum and (self.common_step_counter % self.max_episode_length==0):
self.update_command_curriculum(env_ids)
# reset dof positions randomly and dof velocities to zero
self._reset_dofs(env_ids)
# reset base position
self._reset_root_states(env_ids)
# randomize joint parameters such as torque gains and friction
self.randomize_dof_props(env_ids)
self._refresh_actor_dof_props(env_ids)
self.randomize_lag_props(env_ids)
# reset buffers
self.last_last_actions[env_ids] = 0.
self.actions[env_ids] = 0.
self.last_actions[env_ids] = 0.
self.last_rigid_state[env_ids] = 0.
self.last_dof_vel[env_ids] = 0.
self.last_root_vel[env_ids] = 0.
self.feet_air_time[env_ids] = 0.
self.episode_length_buf[env_ids] = 0
self.phase_length_buf[env_ids] = 0
self.reset_buf[env_ids] = 1
# randomly choose a gait phase offset of 0 or 0.5
self.gait_start[env_ids] = torch.randint(0, 2, (len(env_ids),)).to(self.device) * 0.5
# resample commands
self.generate_gait_time(env_ids)
self._resample_commands()
# fill extras
self.extras["episode"] = {}
for key in self.episode_sums.keys():
self.extras["episode"]['rew_' + key] = torch.mean(self.episode_sums[key][env_ids]) / self.max_episode_length_s
self.episode_sums[key][env_ids] = 0.
# log additional curriculum info
if self.cfg.terrain.mesh_type == "trimesh":
self.extras["episode"]["terrain_level"] = torch.mean(self.terrain_levels.float())
if self.cfg.commands.curriculum:
self.extras["episode"]["max_command_x"] = self.command_ranges["lin_vel_x"][1]
# send timeout info to the algorithm
if self.cfg.env.send_timeouts:
self.extras["time_outs"] = self.time_out_buf
# fix reset gravity bug
self.gym.refresh_actor_root_state_tensor(self.sim)
self.gym.refresh_net_contact_force_tensor(self.sim)
self.gym.refresh_rigid_body_state_tensor(self.sim)
self.base_quat[env_ids] = self.root_states[env_ids, 3:7]
self.base_euler_xyz = get_euler_xyz_tensor(self.base_quat)
self.projected_gravity[env_ids] = quat_rotate_inverse(self.base_quat[env_ids], self.gravity_vec[env_ids])
self.base_lin_vel[env_ids] = quat_rotate_inverse(self.base_quat[env_ids], self.root_states[env_ids, 7:10])
self.base_ang_vel[env_ids] = quat_rotate_inverse(self.base_quat[env_ids], self.root_states[env_ids, 10:13])
self.feet_quat = self.rigid_state[:, self.feet_indices, 3:7]
self.feet_euler_xyz = get_euler_xyz_tensor(self.feet_quat)
# clear obs history buffer and privileged obs buffer
for i in range(self.obs_history.maxlen):
self.obs_history[i][env_ids] *= 0
for i in range(self.critic_history.maxlen):
self.critic_history[i][env_ids] *= 0
def _init_buffers(self):
""" Initialize torch tensors which will contain simulation states and processed quantities
"""
super()._init_buffers()
self.gait_time = torch.zeros(self.num_envs, len(self.cfg.commands.gait) ,dtype=torch.int, device=self.device, requires_grad=False)
self.phase_length_buf = torch.zeros(
self.num_envs, device=self.device, dtype=torch.long)
self.gait_start = torch.randint(0, 2, (self.num_envs,)).to(self.device)*0.5
# ================================================ Rewards ================================================== #
def _reward_ref_joint_pos(self):
"""
Calculates the reward based on the difference between the current joint positions and the target joint positions.
"""
joint_pos = self.dof_pos.clone()
pos_target = self.ref_dof_pos.clone()
stand_command = (torch.norm(self.commands[:, :3], dim=1) <= self.cfg.commands.stand_com_threshold)
pos_target[stand_command] = self.default_dof_pos.clone()
diff = joint_pos - pos_target
r = torch.exp(-2 * torch.norm(diff, dim=1)) - 0.2 * torch.norm(diff, dim=1).clamp(0, 0.5)
r[stand_command] = 1.0
return r
def _reward_feet_distance(self):
"""
Calculates the reward based on the distance between the feet. Penalizes feet that are too close to or too far from each other.
"""
foot_pos = self.rigid_state[:, self.feet_indices, :2]
foot_dist = torch.norm(foot_pos[:, 0, :] - foot_pos[:, 1, :], dim=1)
fd = self.cfg.rewards.foot_min_dist
max_df = self.cfg.rewards.foot_max_dist
d_min = torch.clamp(foot_dist - fd, -0.5, 0.)
d_max = torch.clamp(foot_dist - max_df, 0, 0.5)
return (torch.exp(-torch.abs(d_min) * 100) + torch.exp(-torch.abs(d_max) * 100)) / 2
def _reward_knee_distance(self):
"""
Calculates the reward based on the distance between the knees of the humanoid.
"""
knee_pos = self.rigid_state[:, self.knee_indices, :2]
knee_dist = torch.norm(knee_pos[:, 0, :] - knee_pos[:, 1, :], dim=1)
fd = self.cfg.rewards.foot_min_dist
max_df = self.cfg.rewards.foot_max_dist / 2
d_min = torch.clamp(knee_dist - fd, -0.5, 0.)
d_max = torch.clamp(knee_dist - max_df, 0, 0.5)
return (torch.exp(-torch.abs(d_min) * 100) + torch.exp(-torch.abs(d_max) * 100)) / 2
def _reward_foot_slip(self):
"""
Calculates the reward for minimizing foot slip. The reward is based on the contact forces
and the speed of the feet. A contact threshold is used to determine if the foot is in contact
with the ground. The speed of the foot is calculated and scaled by the contact conditions.
"""
contact = self.contact_forces[:, self.feet_indices, 2] > 5.
foot_speed_norm = torch.norm(self.rigid_state[:, self.feet_indices, 10:12], dim=2)
rew = torch.sqrt(foot_speed_norm)
rew *= contact
return torch.sum(rew, dim=1)
def _reward_feet_air_time(self):
"""
Calculates the reward for feet air time, promoting longer steps. This is achieved by
checking the first contact with the ground after being in the air. The air time is
limited to a maximum value for reward calculation.
"""
contact = self.contact_forces[:, self.feet_indices, 2] > 5.
stance_mask = self._get_stance_mask().clone()
stance_mask[torch.norm(self.commands[:, :3], dim=1) < 0.05] = 1
self.contact_filt = torch.logical_or(torch.logical_or(contact, stance_mask), self.last_contacts)
self.last_contacts = contact
first_contact = (self.feet_air_time > 0.) * self.contact_filt
self.feet_air_time += self.dt
air_time = self.feet_air_time.clamp(0, 0.5) * first_contact
self.feet_air_time *= ~self.contact_filt
return air_time.sum(dim=1)
def _reward_feet_contact_number(self):
"""
Calculates a reward based on the number of feet contacts aligning with the gait phase.
Rewards or penalizes depending on whether the foot contact matches the expected gait phase.
"""
contact = self.contact_forces[:, self.feet_indices, 2] > 5.
stance_mask = self._get_stance_mask().clone()
stance_mask[torch.norm(self.commands[:, :3], dim=1) <= self.cfg.commands.stand_com_threshold] = 1
reward = torch.where(contact == stance_mask, 1, -0.3)
return torch.mean(reward, dim=1)
def _reward_orientation(self):
"""
Calculates the reward for maintaining a flat base orientation. It penalizes deviation
from the desired base orientation using the base euler angles and the projected gravity vector.
"""
quat_mismatch = torch.exp(-torch.sum(torch.abs(self.base_euler_xyz[:, :2]), dim=1) * 10)
orientation = torch.exp(-torch.norm(self.projected_gravity[:, :2], dim=1) * 20)
return (quat_mismatch + orientation) / 2.
def _reward_feet_contact_forces(self):
"""
Calculates the reward for keeping contact forces within a specified range. Penalizes
high contact forces on the feet.
"""
return torch.sum((torch.norm(self.contact_forces[:, self.feet_indices, :], dim=-1) - self.cfg.rewards.max_contact_force).clip(0, 400), dim=1)
def _reward_default_joint_pos(self):
"""
Calculates the reward for keeping joint positions close to the default positions, applying
a strong penalty to hip yaw and roll deviations and a smaller penalty to overall joint deviation.
"""
joint_diff = self.dof_pos - self.default_joint_pd_target
left_yaw_roll = joint_diff[:, [1,2,5]]
right_yaw_roll = joint_diff[:, [7,8,11]]
yaw_roll = torch.norm(left_yaw_roll, dim=1) + torch.norm(right_yaw_roll, dim=1)
yaw_roll = torch.clamp(yaw_roll - 0.1, 0, 50)
return torch.exp(-yaw_roll * 100) - 0.01 * torch.norm(joint_diff, dim=1)
def _reward_base_height(self):
"""
Calculates the reward based on the robot's base height. Penalizes deviation from a target base height.
The reward is computed based on the height difference between the robot's base and the average height
of its feet when they are in contact with the ground.
"""
stance_mask = self._get_stance_mask()
measured_heights = torch.sum(
self.rigid_state[:, self.feet_indices, 2] * stance_mask, dim=1) / torch.sum(stance_mask, dim=1)
base_height = self.root_states[:, 2] - (measured_heights - self.cfg.rewards.feet_to_ankle_distance)
return torch.exp(-torch.abs(base_height - self.cfg.rewards.base_height_target) * 100)
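# Worked example (hypothetical numbers): with root z = 0.95, mean stance-foot z = 0.05,
# feet_to_ankle_distance = 0.04 and base_height_target = 0.93:
#   base_height = 0.95 - (0.05 - 0.04) = 0.94, reward = exp(-|0.94 - 0.93| * 100) ~= 0.37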
def _reward_base_acc(self):
"""
Computes the reward based on the base's acceleration. Penalizes high accelerations of the robot's base,
encouraging smoother motion.
"""
root_acc = self.last_root_vel - self.root_states[:, 7:13]
rew = torch.exp(-torch.norm(root_acc, dim=1) * 3)
return rew
def _reward_vel_mismatch_exp(self):
"""
Computes a reward based on the mismatch in the robot's linear and angular velocities.
Encourages the robot to maintain a stable velocity by penalizing large deviations.
"""
lin_mismatch = torch.exp(-torch.square(self.base_lin_vel[:, 2]) * 10)
ang_mismatch = torch.exp(-torch.norm(self.base_ang_vel[:, :2], dim=1) * 5.)
c_update = (lin_mismatch + ang_mismatch) / 2.
return c_update
def _reward_track_vel_hard(self):
"""
Calculates a reward for accurately tracking both linear and angular velocity commands.
Penalizes deviations from specified linear and angular velocity targets.
"""
# Tracking of linear velocity commands (xy axes)
lin_vel_error = torch.norm(
self.commands[:, :2] - self.base_lin_vel[:, :2], dim=1)
lin_vel_error_exp = torch.exp(-lin_vel_error * 10)
# Tracking of angular velocity commands (yaw)
ang_vel_error = torch.abs(
self.commands[:, 2] - self.base_ang_vel[:, 2])
ang_vel_error_exp = torch.exp(-ang_vel_error * 10)
linear_error = 0.2 * (lin_vel_error + ang_vel_error)
r = (lin_vel_error_exp + ang_vel_error_exp) / 2. - linear_error
return r
def _reward_tracking_lin_vel(self):
"""
Tracks linear velocity commands along the xy axes.
Calculates a reward based on how closely the robot's linear velocity matches the commanded values.
"""
stand_command = (torch.norm(self.commands[:, :3], dim=1) <= self.cfg.commands.stand_com_threshold)
lin_vel_error_square = torch.sum(torch.square(
self.commands[:, :2] - self.base_lin_vel[:, :2]), dim=1)
lin_vel_error_abs = torch.sum(torch.abs(
self.commands[:, :2] - self.base_lin_vel[:, :2]), dim=1)
r_square = torch.exp(-lin_vel_error_square * self.cfg.rewards.tracking_sigma)
r_abs = torch.exp(-lin_vel_error_abs * self.cfg.rewards.tracking_sigma * 2)
r = torch.where(stand_command, r_abs, r_square)
return r
def _reward_tracking_ang_vel(self):
"""
Tracks angular velocity commands for yaw rotation.
Computes a reward based on how closely the robot's angular velocity matches the commanded yaw values.
"""
stand_command = (torch.norm(self.commands[:, :3], dim=1) <= self.cfg.commands.stand_com_threshold)
ang_vel_error_square = torch.square(
self.commands[:, 2] - self.base_ang_vel[:, 2])
ang_vel_error_abs = torch.abs(
self.commands[:, 2] - self.base_ang_vel[:, 2])
r_square = torch.exp(-ang_vel_error_square * self.cfg.rewards.tracking_sigma)
r_abs = torch.exp(-ang_vel_error_abs * self.cfg.rewards.tracking_sigma * 2)
r = torch.where(stand_command, r_abs, r_square)
return r
def _reward_feet_clearance(self):
"""
Calculates reward based on the clearance of the swing leg from the ground during movement.
Encourages appropriate lift of the feet during the swing phase of the gait.
"""
# Compute feet contact mask
contact = self.contact_forces[:, self.feet_indices, 2] > 5.
# Get the z-position of the feet and compute the change in z-position
feet_z = self.rigid_state[:, self.feet_indices, 2] - self.cfg.rewards.feet_to_ankle_distance
delta_z = feet_z - self.last_feet_z
self.feet_height += delta_z
self.last_feet_z = feet_z
# Compute swing mask
swing_mask = 1 - self._get_stance_mask()
# the foot height should exceed the target height at the peak of the swing
rew_pos = (self.feet_height > self.cfg.rewards.target_feet_height) * (self.feet_height < self.cfg.rewards.target_feet_height_max)
rew_pos = torch.sum(rew_pos * swing_mask, dim=1)
self.feet_height *= ~contact
return rew_pos
def _reward_low_speed(self):
"""
Rewards or penalizes the robot based on its speed relative to the commanded speed.
This function checks if the robot is moving too slow, too fast, or at the desired speed,
and if the movement direction matches the command.
"""
# Calculate the absolute value of speed and command for comparison
absolute_speed = torch.abs(self.base_lin_vel[:, 0])
absolute_command = torch.abs(self.commands[:, 0])
# Define speed criteria for desired range
speed_too_low = absolute_speed < 0.5 * absolute_command
speed_too_high = absolute_speed > 1.2 * absolute_command
speed_desired = ~(speed_too_low | speed_too_high)
# Check if the speed and command directions are mismatched
sign_mismatch = torch.sign(
self.base_lin_vel[:, 0]) != torch.sign(self.commands[:, 0])
# Initialize reward tensor
reward = torch.zeros_like(self.base_lin_vel[:, 0])
# Assign rewards based on conditions
# Speed too low
reward[speed_too_low] = -1.0
# Speed too high
reward[speed_too_high] = 0.
# Speed within desired range
reward[speed_desired] = 1.2
# Sign mismatch has the highest priority
reward[sign_mismatch] = -2.0
return reward * (self.commands[:, 0].abs() > 0.05)
def _reward_torques(self):
"""
Penalizes the use of high torques in the robot's joints. Encourages efficient movement by minimizing
the necessary force exerted by the motors.
"""
return torch.sum(torch.square(self.torques), dim=1)
def _reward_ankle_torques(self):
"""
Penalizes the use of high torques in the robot's joints. Encourages efficient movement by minimizing
the necessary force exerted by the motors.
"""
ankle_idx = [4,5,10,11]
return torch.sum(torch.square(self.torques[:,ankle_idx]), dim=1)
def _reward_feet_rotation(self):
feet_euler_xyz = self.feet_euler_xyz
rotation = torch.sum(torch.square(feet_euler_xyz[:,:,:2]),dim=[1,2])
# rotation = torch.sum(torch.square(feet_euler_xyz[:,:,1]),dim=1)
r = torch.exp(-rotation*15)
return r
def _reward_dof_vel(self):
"""
Penalizes high velocities at the degrees of freedom (DOF) of the robot. This encourages smoother and
more controlled movements.
"""
return torch.sum(torch.square(self.dof_vel), dim=1)
def _reward_dof_acc(self):
"""
Penalizes high accelerations at the robot's degrees of freedom (DOF). This is important for ensuring
smooth and stable motion, reducing wear on the robot's mechanical parts.
"""
return torch.sum(torch.square((self.last_dof_vel - self.dof_vel) / self.dt), dim=1)
def _reward_collision(self):
"""
Penalizes collisions of the robot with the environment, specifically focusing on selected body parts.
This encourages the robot to avoid undesired contact with objects or surfaces.
"""
return torch.sum(1.*(torch.norm(self.contact_forces[:, self.penalised_contact_indices, :], dim=-1) > 0.1), dim=1)
def _reward_action_smoothness(self):
"""
Encourages smoothness in the robot's actions by penalizing large differences between consecutive actions.
This is important for achieving fluid motion and reducing mechanical stress.
"""
term_1 = torch.sum(torch.square(
self.last_actions - self.actions), dim=1)
term_2 = torch.sum(torch.square(
self.actions + self.last_last_actions - 2 * self.last_actions), dim=1)
term_3 = 0.05 * torch.sum(torch.abs(self.actions), dim=1)
return term_1 + term_2 + term_3
def _reward_termination(self):
# Terminal reward / penalty
return self.reset_buf * ~self.time_out_buf
def _reward_stand_still(self):
# penalize motion at zero commands
stand_command = (torch.norm(self.commands[:, :3], dim=1) <= self.cfg.commands.stand_com_threshold)
r = torch.exp(-torch.sum(torch.square(self.dof_pos - self.default_dof_pos), dim=1))
r = torch.where(stand_command, r.clone(),
torch.zeros_like(r))
return r
def _reward_feet_stumble(self):
# Penalize feet hitting vertical surfaces
return torch.any(torch.norm(self.contact_forces[:, self.feet_indices, :2], dim=2) >
5 * torch.abs(self.contact_forces[:, self.feet_indices, 2]), dim=1)
def _reward_dof_pos_limits(self):
# Penalize dof positions too close to the limit
out_of_limits = -(self.dof_pos - self.dof_pos_limits[:, 0]).clip(max=0.) # lower limit
out_of_limits += (self.dof_pos - self.dof_pos_limits[:, 1]).clip(min=0.)
return torch.sum(out_of_limits, dim=1)
def _reward_dof_vel_limits(self):
# Penalize dof velocities too close to the limit
# clip to max error = 1 rad/s per joint to avoid huge penalties
return torch.sum((torch.abs(self.dof_vel) - self.dof_vel_limits*self.cfg.rewards.soft_dof_vel_limit).clip(min=0., max=1.), dim=1)
def _reward_dof_torque_limits(self):
# penalize torques too close to the limit
return torch.sum((torch.abs(self.torques) - self.torque_limits*self.cfg.rewards.soft_torque_limit).clip(min=0.), dim=1)
\ No newline at end of file
# SPDX-FileCopyrightText: Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-FileCopyrightText: Copyright (c) 2021 ETH Zurich, Nikita Rudin
# SPDX-FileCopyrightText: Copyright (c) 2024 Beijing RobotEra TECHNOLOGY CO.,LTD. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# 3. Neither the name of the copyright holder nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# Copyright (c) 2024, AgiBot Inc. All rights reserved.
from humanoid import LEGGED_GYM_ROOT_DIR
import os
from humanoid.envs import *
from humanoid.utils import get_args, task_registry
from datetime import datetime
import torch
def get_load_path(root, load_run=-1, checkpoint=-1):
# NOTE: `checkpoint` is currently unused; the newest model file in the run is loaded.
try:
runs = os.listdir(root)
runs.sort()
if "exported" in runs:
runs.remove("exported")
last_run = os.path.join(root, runs[-1])
except Exception:
raise ValueError("No runs in this directory: " + root)
if load_run == -1:
load_run = last_run
else:
load_run = os.path.join(root, load_run)
models = os.listdir(load_run)
models.sort(key=lambda m: "{0:0>15}".format(m))
model = models[-1]
load_path = os.path.join(load_run, model)
return load_path
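# Illustrative directory layout (hypothetical run names), assuming runs sort
# lexicographically by date_time:
#   <root>/
#     2024-01-01_10-00-00run_a/model_500.pt
#     2024-01-02_09-30-00run_b/model_1000.pt   <- picked when load_run == -1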
def export_onnx(args):
env_cfg, train_cfg = task_registry.get_cfgs(name=args.task)
# load the JIT-scripted policy
log_root = os.path.join(LEGGED_GYM_ROOT_DIR, 'logs', train_cfg.runner.experiment_name, 'exported_policies')
model_path = get_load_path(log_root, load_run=args.load_run, checkpoint=args.checkpoint)
print("Load model from:", model_path)
jit_model = torch.jit.load(model_path)
jit_model.eval()
current_date_time = datetime.now().strftime('%Y-%m-%d_%H-%M-%S')
root_path = os.path.join(LEGGED_GYM_ROOT_DIR, 'logs',
train_cfg.runner.experiment_name, 'exported_onnx',
current_date_time)
os.makedirs(root_path, exist_ok=True)
file_name = args.task.split('_')[0] + "_policy.onnx"
path = os.path.join(root_path, file_name)
example_input = torch.randn(1, env_cfg.env.num_observations)
# export onnx model
torch.onnx.export(jit_model, # JIT model
example_input, # model example input
path, # model output path
export_params=True, # export model params
opset_version=11, # ONNX opset version
do_constant_folding=True, # optimize constant variable folding
input_names=['input'], # model input name
output_names=['output'], # model output name
)
print("Export onnx model to: ", path)
if __name__ == '__main__':
args = get_args()
if args.load_run is None:
args.load_run = -1
export_onnx(args)
\ No newline at end of file
# SPDX-FileCopyrightText: Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-FileCopyrightText: Copyright (c) 2021 ETH Zurich, Nikita Rudin
# SPDX-FileCopyrightText: Copyright (c) 2024 Beijing RobotEra TECHNOLOGY CO.,LTD. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# 3. Neither the name of the copyright holder nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# Copyright (c) 2024, AgiBot Inc. All rights reserved.
from humanoid import LEGGED_GYM_ROOT_DIR
import os
import copy
from humanoid.envs import *
from humanoid.utils import get_args, task_registry, Logger
from humanoid.utils.helpers import get_load_path, class_to_dict
from datetime import datetime
import torch
from humanoid.algo.ppo import ActorCriticDH
class ExportedDH(torch.nn.Module):
def __init__(self, actor, long_history, state_estimator, num_short_obs, in_channels, num_proprio_obs):
super().__init__()
self.actor = copy.deepcopy(actor).cpu()
self.long_history = copy.deepcopy(long_history).cpu()
self.state_estimator = copy.deepcopy(state_estimator).cpu()
self.num_short_obs = num_short_obs
self.in_channels = in_channels
self.num_proprio_obs = num_proprio_obs
def forward(self, observations):
short_history = observations[..., -self.num_short_obs:]
es_vel = self.state_estimator(short_history)
compressed_long_history = self.long_history(observations.view(-1, self.in_channels, self.num_proprio_obs))
actor_obs = torch.cat((short_history, es_vel, compressed_long_history),dim=-1)
actions_mean = self.actor(actor_obs)
return actions_mean
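# Input layout note: `observations` is the stacked observation history of shape
# (batch, frame_stack * num_single_obs). The last num_short_obs entries form the short
# history fed to the state estimator, while the full history is reshaped to
# (batch, in_channels, num_proprio_obs) for the long-history encoder; the actor input
# concatenates the short history, the estimated velocity and the compressed long history.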
def export(self, path):
self.to("cpu")
traced_script_module = torch.jit.script(self)
traced_script_module.save(path)
def export_policy(args):
env_cfg, train_cfg = task_registry.get_cfgs(name=args.task)
train_cfg_dict = class_to_dict(train_cfg)
policy_cfg = train_cfg_dict["policy"]
num_critic_obs = env_cfg.env.num_privileged_obs
if env_cfg.terrain.measure_heights:
num_critic_obs = env_cfg.env.c_frame_stack * (env_cfg.env.single_num_privileged_obs + env_cfg.terrain.num_height)
num_short_obs = env_cfg.env.short_frame_stack * env_cfg.env.num_single_obs
actor_critic_class = eval(train_cfg_dict["runner"]["policy_class_name"])
actor_critic: ActorCriticDH = actor_critic_class(
num_short_obs, env_cfg.env.num_single_obs, num_critic_obs, env_cfg.env.num_actions, **policy_cfg
)
# load policy
log_root_encoder = os.path.join(LEGGED_GYM_ROOT_DIR, 'logs', train_cfg.runner.experiment_name, 'exported_data')
model_path = get_load_path(log_root_encoder, load_run=args.load_run, checkpoint=args.checkpoint)
print("Load model from:", model_path)
loaded_dict = torch.load(model_path)
actor_critic.load_state_dict(loaded_dict["model_state_dict"])
exported_policy = ExportedDH(actor_critic.actor,
actor_critic.long_history,
actor_critic.state_estimator,
num_short_obs,
policy_cfg["in_channels"],
env_cfg.env.num_single_obs)
current_date_time = datetime.now().strftime('%Y-%m-%d_%H-%M-%S')
root_path = os.path.join(LEGGED_GYM_ROOT_DIR, 'logs',
train_cfg.runner.experiment_name, 'exported_policies',
current_date_time)
os.makedirs(root_path, exist_ok=True)
dir_name = "policy_dh.jit"
path = os.path.join(root_path, dir_name)
exported_policy.export(path)
print("Export policy to:", path)
if __name__ == '__main__':
args = get_args()
if args.load_run is None:
args.load_run = -1
if args.checkpoint is None:
args.checkpoint = -1
export_policy(args)
\ No newline at end of file
# SPDX-FileCopyrightText: Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-FileCopyrightText: Copyright (c) 2021 ETH Zurich, Nikita Rudin
# SPDX-FileCopyrightText: Copyright (c) 2024 Beijing RobotEra TECHNOLOGY CO.,LTD. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# 3. Neither the name of the copyright holder nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# Copyright (c) 2024, AgiBot Inc. All rights reserved.
import os
import cv2
import numpy as np
from isaacgym import gymapi
from humanoid import LEGGED_GYM_ROOT_DIR
# import isaacgym
from humanoid.envs import *
from humanoid.utils import get_args, export_policy_as_jit, task_registry, Logger
from isaacgym.torch_utils import *
import torch
from datetime import datetime
import pygame
from threading import Thread
x_vel_cmd, y_vel_cmd, yaw_vel_cmd = 0.0, 0.0, 0.0
joystick_use = True
joystick_opened = False
if joystick_use:
pygame.init()
try:
# get joystick
joystick = pygame.joystick.Joystick(0)
joystick.init()
joystick_opened = True
except Exception as e:
print(f"无法打开手柄:{e}")
# joystick thread exit flag
exit_flag = False
def handle_joystick_input():
global exit_flag, x_vel_cmd, y_vel_cmd, yaw_vel_cmd
while not exit_flag:
# get joystick input
pygame.event.get()
# update robot command
x_vel_cmd = -joystick.get_axis(1) * 1
y_vel_cmd = -joystick.get_axis(0) * 1
yaw_vel_cmd = -joystick.get_axis(3) * 1
pygame.time.delay(100)
if joystick_opened and joystick_use:
joystick_thread = Thread(target=handle_joystick_input)
joystick_thread.start()
def play(args):
env_cfg, train_cfg = task_registry.get_cfgs(name=args.task)
# override some parameters for testing
env_cfg.env.num_envs = min(env_cfg.env.num_envs, 10)
# env_cfg.terrain.mesh_type = 'trimesh'
env_cfg.terrain.mesh_type = 'plane'
env_cfg.terrain.num_rows = 5
env_cfg.terrain.num_cols = 5
env_cfg.terrain.max_init_terrain_level = 5
env_cfg.env.episode_length_s = 1000
env_cfg.noise.add_noise = False
env_cfg.domain_rand.randomize_friction = False
env_cfg.domain_rand.push_robots = False
env_cfg.domain_rand.continuous_push = False
env_cfg.domain_rand.randomize_base_mass = False
env_cfg.domain_rand.randomize_com = False
env_cfg.domain_rand.randomize_gains = False
env_cfg.domain_rand.randomize_torque = False
env_cfg.domain_rand.randomize_link_mass = False
env_cfg.domain_rand.randomize_motor_offset = False
env_cfg.domain_rand.randomize_joint_friction = False
env_cfg.domain_rand.randomize_joint_damping = False
env_cfg.domain_rand.randomize_joint_armature = False
env_cfg.domain_rand.randomize_lag_timesteps = False
env_cfg.noise.curriculum = False
env_cfg.commands.heading_command = False
train_cfg.seed = 123145
print("train_cfg.runner_class_name:", train_cfg.runner_class_name)
# prepare environment
env, _ = task_registry.make_env(name=args.task, args=args, env_cfg=env_cfg)
env.set_camera(env_cfg.viewer.pos, env_cfg.viewer.lookat)
# load policy
train_cfg.runner.resume = True
ppo_runner, train_cfg, _ = task_registry.make_alg_runner(env=env, name=args.task, args=args, train_cfg=train_cfg)
policy = ppo_runner.get_inference_policy(device=env.device)
# export policy as a jit module (used to run it from C++)
current_date_str = datetime.now().strftime('%Y-%m-%d')
current_time_str = datetime.now().strftime('%H-%M-%S')
if EXPORT_POLICY:
path = os.path.join(LEGGED_GYM_ROOT_DIR, 'logs', train_cfg.runner.experiment_name, '0_exported', 'policies')
export_policy_as_jit(ppo_runner.alg.actor_critic, path)
print('Exported policy as jit script to: ', path)
logger = Logger(env_cfg.sim.dt * env_cfg.control.decimation)
robot_index = 0 # which robot is used for logging
joint_index = 5 # which joint is used for logging
stop_state_log = 1000 # number of steps before plotting states
if RENDER:
camera_properties = gymapi.CameraProperties()
camera_properties.width = 1920
camera_properties.height = 1080
h1 = env.gym.create_camera_sensor(env.envs[0], camera_properties)
camera_offset = gymapi.Vec3(1, -1, 0.5)
camera_rotation = gymapi.Quat.from_axis_angle(gymapi.Vec3(-0.3, 0.2, 1),
np.deg2rad(135))
actor_handle = env.gym.get_actor_handle(env.envs[0], 0)
body_handle = env.gym.get_actor_rigid_body_handle(env.envs[0], actor_handle, 0)
env.gym.attach_camera_to_body(
h1, env.envs[0], body_handle,
gymapi.Transform(camera_offset, camera_rotation),
gymapi.FOLLOW_POSITION)
fourcc = cv2.VideoWriter_fourcc(*"mp4v")
video_dir = os.path.join(LEGGED_GYM_ROOT_DIR, 'videos')
experiment_dir = os.path.join(video_dir, train_cfg.runner.experiment_name)
video_path = os.path.join(experiment_dir, datetime.now().strftime('%b%d_%H-%M-%S') + args.run_name + '.mp4')
os.makedirs(experiment_dir, exist_ok=True) # also creates video_dir if needed
video = cv2.VideoWriter(video_path, fourcc, 50.0, (1920, 1080))
obs = env.get_observations()
np.set_printoptions(formatter={'float': '{:0.4f}'.format})
for i in range(10*stop_state_log):
actions = policy(obs.detach()) # * 0.
if FIX_COMMAND:
env.commands[:, 0] = 0.5 # 1.0
env.commands[:, 1] = 0
env.commands[:, 2] = 0
env.commands[:, 3] = 0.
else:
env.commands[:, 0] = x_vel_cmd
env.commands[:, 1] = y_vel_cmd
env.commands[:, 2] = yaw_vel_cmd
env.commands[:, 3] = 0.
obs, critic_obs, rews, dones, infos = env.step(actions.detach())
if RENDER:
env.gym.fetch_results(env.sim, True)
env.gym.step_graphics(env.sim)
env.gym.render_all_camera_sensors(env.sim)
img = env.gym.get_camera_image(env.sim, env.envs[0], h1, gymapi.IMAGE_COLOR)
img = np.reshape(img, (1080, 1920, 4))
img = cv2.cvtColor(img, cv2.COLOR_RGBA2BGR) # Isaac Gym returns RGBA; VideoWriter expects 3-channel BGR
video.write(img)
if stop_state_log * 0.2 < i < stop_state_log:
log_dict = {
'base_height' : env.root_states[robot_index, 2].item(),
'foot_z_l' : env.rigid_state[robot_index,4,2].item(),
'foot_z_r' : env.rigid_state[robot_index,9,2].item(),
'foot_forcez_l' : env.contact_forces[robot_index,4,2].item(),
'foot_forcez_r' : env.contact_forces[robot_index,9,2].item(),
'base_vel_x': env.base_lin_vel[robot_index, 0].item(),
'command_x': x_vel_cmd,
'base_vel_y': env.base_lin_vel[robot_index, 1].item(),
'command_y': y_vel_cmd,
'base_vel_z': env.base_lin_vel[robot_index, 2].item(),
'base_vel_yaw': env.base_ang_vel[robot_index, 2].item(),
'command_yaw': yaw_vel_cmd,
'dof_pos_target': actions[robot_index, joint_index].item() * env.cfg.control.action_scale, # joint selected by joint_index
'dof_pos': env.dof_pos[robot_index, joint_index].item(),
'dof_vel': env.dof_vel[robot_index, joint_index].item(),
'dof_torque': env.torques[robot_index, joint_index].item(),
'command_sin': obs[0,0].item(),
'command_cos': obs[0,1].item(),
}
# add per-joint targets, positions, torques and velocities
# (a separate index `j` avoids shadowing the outer loop variable `i`)
for j in range(env_cfg.env.num_actions):
log_dict[f'dof_pos_target[{j}]'] = actions[robot_index, j].item() * env.cfg.control.action_scale
log_dict[f'dof_pos[{j}]'] = env.dof_pos[robot_index, j].item()
log_dict[f'dof_torque[{j}]'] = env.torques[robot_index, j].item()
log_dict[f'dof_vel[{j}]'] = env.dof_vel[robot_index, j].item()
logger.log_states(dict=log_dict)
elif i == stop_state_log:
logger.plot_states()
# ====================== Log rewards ======================
if infos["episode"]:
num_episodes = torch.sum(env.reset_buf).item()
if num_episodes > 0:
logger.log_rewards(infos["episode"], num_episodes)
if RENDER:
video.release()
if __name__ == '__main__':
EXPORT_POLICY = False
RENDER = False
FIX_COMMAND = False
args = get_args()
play(args)
# SPDX-FileCopyrightText: Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-FileCopyrightText: Copyright (c) 2021 ETH Zurich, Nikita Rudin
# SPDX-FileCopyrightText: Copyright (c) 2024 Beijing RobotEra TECHNOLOGY CO.,LTD. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# 3. Neither the name of the copyright holder nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# Copyright (c) 2024, AgiBot Inc. All rights reserved.
import math
import numpy as np
import mujoco, mujoco_viewer
from collections import deque
from scipy.spatial.transform import Rotation as R
from humanoid import LEGGED_GYM_ROOT_DIR
from humanoid.envs import *
from humanoid.utils import Logger
import torch
import pygame
from threading import Thread
from humanoid.utils.helpers import get_load_path
import os
import time
x_vel_cmd, y_vel_cmd, yaw_vel_cmd = 0.0, 0.0, 0.0
joystick_use = True
joystick_opened = False
if joystick_use:
pygame.init()
try:
# get joystick
joystick = pygame.joystick.Joystick(0)
joystick.init()
joystick_opened = True
except Exception as e:
print(f"无法打开手柄:{e}")
# joystick thread exit flag
exit_flag = False
def handle_joystick_input():
global exit_flag, x_vel_cmd, y_vel_cmd, yaw_vel_cmd
while not exit_flag:
# get joystick input
pygame.event.get()
# update robot command
x_vel_cmd = -joystick.get_axis(1) * 1
y_vel_cmd = -joystick.get_axis(0) * 1
yaw_vel_cmd = -joystick.get_axis(3) * 1
pygame.time.delay(100)
if joystick_opened and joystick_use:
joystick_thread = Thread(target=handle_joystick_input)
joystick_thread.start()
class cmd:
vx = 0.0
vy = 0.0
dyaw = 0.0
def quaternion_to_euler_array(quat):
# Ensure quaternion is in the correct format [x, y, z, w]
x, y, z, w = quat
# Roll (x-axis rotation)
t0 = +2.0 * (w * x + y * z)
t1 = +1.0 - 2.0 * (x * x + y * y)
roll_x = np.arctan2(t0, t1)
# Pitch (y-axis rotation)
t2 = +2.0 * (w * y - z * x)
t2 = np.clip(t2, -1.0, 1.0)
pitch_y = np.arcsin(t2)
# Yaw (z-axis rotation)
t3 = +2.0 * (w * z + x * y)
t4 = +1.0 - 2.0 * (y * y + z * z)
yaw_z = np.arctan2(t3, t4)
# Returns roll, pitch, yaw in a NumPy array in radians
return np.array([roll_x, pitch_y, yaw_z])
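# Example: the identity rotation maps to zero Euler angles.
#   >>> quaternion_to_euler_array(np.array([0., 0., 0., 1.]))
#   array([0., 0., 0.])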
def get_obs(data,model):
'''Extracts an observation from the mujoco data structure
'''
q = data.qpos.astype(np.double)
dq = data.qvel.astype(np.double)
quat = data.sensor('body-orientation').data[[1, 2, 3, 0]].astype(np.double) # reorder MuJoCo (w, x, y, z) to (x, y, z, w) for scipy
r = R.from_quat(quat)
v = r.apply(data.qvel[:3], inverse=True).astype(np.double) # In the base frame
omega = data.sensor('body-angular-velocity').data.astype(np.double)
gvec = r.apply(np.array([0., 0., -1.]), inverse=True).astype(np.double)
foot_positions = []
foot_forces = []
base_pos = np.zeros(3, dtype=np.double) # fallback if no matching body name is found
for i in range(model.nbody):
body_name = mujoco.mj_id2name(model, mujoco.mjtObj.mjOBJ_BODY, i)
if body_name is None:
continue
if '5_link' in body_name or 'ankle_roll' in body_name: # foot bodies, according to model naming
foot_positions.append(data.xpos[i][2].copy().astype(np.double))
foot_forces.append(data.cfrc_ext[i][2].copy().astype(np.double))
if 'base_link' in body_name or 'waist_link' in body_name: # base body, according to model naming
base_pos = data.xpos[i][:3].copy().astype(np.double)
return (q, dq, quat, v, omega, gvec, base_pos, foot_positions, foot_forces)
def pd_control(target_q, q, kp, target_dq, dq, kd, cfg):
'''Calculates torques from position commands
'''
torque_out = (target_q + cfg.robot_config.default_dof_pos - q) * kp - dq * kd
return torque_out
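# Example: with target_q = 0 and q at default_dof_pos, the position term vanishes and
# the output is pure damping, torque_out = -dq * kd; a non-zero target_q shifts the
# setpoint relative to the default joint angles.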
def run_mujoco(policy, cfg, env_cfg):
"""
Run the Mujoco simulation using the provided policy and configuration.
Args:
policy: The policy used for controlling the simulation.
cfg: The configuration object containing simulation settings.
Returns:
None
"""
print("Load mujoco xml from:", cfg.sim_config.mujoco_model_path)
# load model xml
model = mujoco.MjModel.from_xml_path(cfg.sim_config.mujoco_model_path)
# simulation timestep
model.opt.timestep = cfg.sim_config.dt
# model data
data = mujoco.MjData(model)
num_actuated_joints = env_cfg.env.num_actions # This should match the number of actuated joints in your model
data.qpos[-num_actuated_joints:] = cfg.robot_config.default_dof_pos
mujoco.mj_step(model, data)
viewer = mujoco_viewer.MujocoViewer(model, data)
target_q = np.zeros((env_cfg.env.num_actions), dtype=np.double)
action = np.zeros((env_cfg.env.num_actions), dtype=np.double)
hist_obs = deque()
for _ in range(env_cfg.env.frame_stack):
hist_obs.append(np.zeros([1, env_cfg.env.num_single_obs], dtype=np.double))
count_lowlevel = 1
logger = Logger(cfg.sim_config.dt)
stop_state_log = 40000
np.set_printoptions(formatter={'float': '{:0.4f}'.format})
for _ in range(int(cfg.sim_config.sim_duration / cfg.sim_config.dt)):
# Obtain an observation
q, dq, quat, v, omega, gvec, base_pos, foot_positions, foot_forces = get_obs(data,model)
q = q[-env_cfg.env.num_actions:]
dq = dq[-env_cfg.env.num_actions:]
base_z = base_pos[2]
foot_z = foot_positions
foot_force_z = foot_forces
# 1000hz -> 100hz
if count_lowlevel % cfg.sim_config.decimation == 0:
####### for stand only #######
if hasattr(env_cfg.commands,"sw_switch"):
vel_norm = np.sqrt(x_vel_cmd**2 + y_vel_cmd**2 + yaw_vel_cmd**2)
if env_cfg.commands.sw_switch and vel_norm <= env_cfg.commands.stand_com_threshold:
count_lowlevel = 0
obs = np.zeros([1, env_cfg.env.num_single_obs], dtype=np.float32)
eu_ang = quaternion_to_euler_array(quat)
eu_ang[eu_ang > math.pi] -= 2 * math.pi
if env_cfg.env.num_commands == 5:
obs[0, 0] = math.sin(2 * math.pi * count_lowlevel * cfg.sim_config.dt / env_cfg.rewards.cycle_time)
obs[0, 1] = math.cos(2 * math.pi * count_lowlevel * cfg.sim_config.dt / env_cfg.rewards.cycle_time)
obs[0, 2] = x_vel_cmd * env_cfg.normalization.obs_scales.lin_vel
obs[0, 3] = y_vel_cmd * env_cfg.normalization.obs_scales.lin_vel
obs[0, 4] = yaw_vel_cmd * env_cfg.normalization.obs_scales.ang_vel
if env_cfg.env.num_commands == 3:
obs[0, 0] = x_vel_cmd * env_cfg.normalization.obs_scales.lin_vel
obs[0, 1] = y_vel_cmd * env_cfg.normalization.obs_scales.lin_vel
obs[0, 2] = yaw_vel_cmd * env_cfg.normalization.obs_scales.ang_vel
obs[0, env_cfg.env.num_commands:env_cfg.env.num_commands+env_cfg.env.num_actions] = (q - cfg.robot_config.default_dof_pos) * env_cfg.normalization.obs_scales.dof_pos
obs[0, env_cfg.env.num_commands+env_cfg.env.num_actions:env_cfg.env.num_commands+2*env_cfg.env.num_actions] = dq * env_cfg.normalization.obs_scales.dof_vel
obs[0, env_cfg.env.num_commands+2*env_cfg.env.num_actions:env_cfg.env.num_commands+3*env_cfg.env.num_actions] = action
obs[0, env_cfg.env.num_commands+3*env_cfg.env.num_actions:env_cfg.env.num_commands+3*env_cfg.env.num_actions+3] = omega
obs[0, env_cfg.env.num_commands+3*env_cfg.env.num_actions+3:env_cfg.env.num_commands+3*env_cfg.env.num_actions+6] = eu_ang
####### for stand only #######
if env_cfg.env.add_stand_bool:
vel_norm = np.sqrt(x_vel_cmd**2 + y_vel_cmd**2 + yaw_vel_cmd**2)
stand_command = (vel_norm <= env_cfg.commands.stand_com_threshold)
obs[0, -1] = stand_command
print(x_vel_cmd, y_vel_cmd, yaw_vel_cmd)
obs = np.clip(obs, -env_cfg.normalization.clip_observations, env_cfg.normalization.clip_observations)
hist_obs.append(obs)
hist_obs.popleft()
policy_input = np.zeros([1, env_cfg.env.num_observations], dtype=np.float32)
for i in range(env_cfg.env.frame_stack):
policy_input[0, i * env_cfg.env.num_single_obs : (i + 1) * env_cfg.env.num_single_obs] = hist_obs[i][0, :]
action[:] = policy(torch.tensor(policy_input))[0].detach().numpy()
action = np.clip(action, -env_cfg.normalization.clip_actions, env_cfg.normalization.clip_actions)
target_q = action * env_cfg.control.action_scale
target_dq = np.zeros((env_cfg.env.num_actions), dtype=np.double)
# Generate PD control
tau = pd_control(target_q, q, cfg.robot_config.kps,
target_dq, dq, cfg.robot_config.kds, cfg) # Calc torques
tau = np.clip(tau, -cfg.robot_config.tau_limit, cfg.robot_config.tau_limit) # Clamp torques
data.ctrl = tau
applied_tau = data.actuator_force
mujoco.mj_step(model, data)
viewer.render()
count_lowlevel += 1
idx = 5 # joint index used for the single-joint log entries
dof_pos_target = target_q + cfg.robot_config.default_dof_pos
if _ < stop_state_log:
log_dict = {
'base_height': base_z,
'foot_z_l': foot_z[0],
'foot_z_r': foot_z[1],
'foot_forcez_l': foot_force_z[0],
'foot_forcez_r': foot_force_z[1],
'base_vel_x': v[0],
'command_x': x_vel_cmd,
'base_vel_y': v[1],
'command_y': y_vel_cmd,
'base_vel_z': v[2],
'base_vel_yaw': omega[2],
'command_yaw': yaw_vel_cmd,
'dof_pos_target': dof_pos_target[idx],
'dof_pos': q[idx],
'dof_vel': dq[idx],
'dof_torque': applied_tau[idx],
'cmd_dof_torque': tau[idx],
}
# add per-joint targets, positions, torques and velocities
for i in range(env_cfg.env.num_actions):
log_dict[f'dof_pos_target[{i}]'] = dof_pos_target[i].item()
log_dict[f'dof_pos[{i}]'] = q[i].item()
log_dict[f'dof_torque[{i}]'] = applied_tau[i].item()
log_dict[f'dof_vel[{i}]'] = dq[i].item()
logger.log_states(dict=log_dict)
elif _ == stop_state_log:
logger.plot_states()
viewer.close()
if __name__ == '__main__':
import argparse
parser = argparse.ArgumentParser(description='Deployment script.')
parser.add_argument('--load_model', type=str,
help='Name of the run to load when resume=True. If -1: will load the last run. Overrides config file if provided.')
parser.add_argument('--task', type=str, required=True,
help='task name.')
args = parser.parse_args()
env_cfg, _ = task_registry.get_cfgs(name=args.task)
class Sim2simCfg():
class sim_config:
mujoco_model_path = env_cfg.asset.xml_file.format(LEGGED_GYM_ROOT_DIR=LEGGED_GYM_ROOT_DIR)
sim_duration = 100.0
dt = 0.001
decimation = 10
class robot_config:
# get PD gain
kps = np.array([env_cfg.control.stiffness[joint] for joint in env_cfg.control.stiffness.keys()]*2, dtype=np.double)
kds = np.array([env_cfg.control.damping[joint] for joint in env_cfg.control.damping.keys()]*2, dtype=np.double)
tau_limit = 500. * np.ones(env_cfg.env.num_actions, dtype=np.double) # joint torque limits
default_dof_pos = np.array(list(env_cfg.init_state.default_joint_angles.values()))
# load model
root_path = os.path.join(LEGGED_GYM_ROOT_DIR, 'logs', args.task, 'exported_policies')
if args.load_model is None:
# default to the most recent exported policy directory
jit_path = os.listdir(root_path)
jit_path.sort()
model_path = os.path.join(root_path, jit_path[-1])
else:
model_path = os.path.join(root_path, args.load_model)
jit_name = os.listdir(model_path)
jit_name.sort() # sort so the newest file is picked deterministically
model_path = os.path.join(model_path, jit_name[-1])
policy = torch.jit.load(model_path)
print("Load model from:", model_path)
run_mujoco(policy, Sim2simCfg(), env_cfg)
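# Usage sketch (script path assumed, matching the repo's scripts/ layout):
#   python scripts/sim2sim.py --task=x1_dh_stand --load_model=<date_time>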
\ No newline at end of file
# SPDX-FileCopyrightText: Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-FileCopyrightText: Copyright (c) 2021 ETH Zurich, Nikita Rudin
# SPDX-FileCopyrightText: Copyright (c) 2024 Beijing RobotEra TECHNOLOGY CO.,LTD. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# 3. Neither the name of the copyright holder nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# Copyright (c) 2024, AgiBot Inc. All rights reserved.
from humanoid.envs import *
from humanoid.utils import get_args, task_registry
def train(args):
env, env_cfg = task_registry.make_env(name=args.task, args=args)
ppo_runner, train_cfg, log_dir = task_registry.make_alg_runner(env=env, name=args.task, args=args)
ppo_runner.learn(num_learning_iterations=train_cfg.runner.max_iterations, init_at_random_ep_len=False)
if __name__ == '__main__':
args = get_args()
train(args)
# SPDX-FileCopyrightText: Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-FileCopyrightText: Copyright (c) 2021 ETH Zurich, Nikita Rudin
# SPDX-FileCopyrightText: Copyright (c) 2024 Beijing RobotEra TECHNOLOGY CO.,LTD. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# 3. Neither the name of the copyright holder nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# Copyright (c) 2024, AgiBot Inc. All rights reserved.
from .helpers import class_to_dict, get_load_path, get_args, export_policy_as_jit, set_seed, update_class_from_dict
from .task_registry import task_registry
from .logger import Logger
from .math import *
from .terrain import Terrain
# SPDX-FileCopyrightText: Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-FileCopyrightText: Copyright (c) 2021 ETH Zurich, Nikita Rudin
# SPDX-FileCopyrightText: Copyright (c) 2024 Beijing RobotEra TECHNOLOGY CO.,LTD. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# 3. Neither the name of the copyright holder nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# Copyright (c) 2024, AgiBot Inc. All rights reserved.
import os
import copy
import torch
import numpy as np
import random
from isaacgym import gymapi
from isaacgym import gymutil
from humanoid import LEGGED_GYM_ROOT_DIR, LEGGED_GYM_ENVS_DIR
def class_to_dict(obj) -> dict:
if not hasattr(obj, "__dict__"):
return obj
result = {}
for key in dir(obj):
if key.startswith("_"):
continue
element = []
val = getattr(obj, key)
if isinstance(val, list):
for item in val:
element.append(class_to_dict(item))
else:
element = class_to_dict(val)
result[key] = element
return result
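# Behavior sketch for class_to_dict (the demo class below is illustrative, not
# a real config): nested classes are converted recursively, lists are converted
# element-wise, and underscore-prefixed attributes are skipped.
#   class _Demo:
#       class inner:
#           gain = 1.0
#       names = ["a", "b"]
#   class_to_dict(_Demo)  # -> {'inner': {'gain': 1.0}, 'names': ['a', 'b']}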
def update_class_from_dict(obj, cfg_dict):
    for key, val in cfg_dict.items():
        attr = getattr(obj, key, None)
        if isinstance(attr, type):
            update_class_from_dict(attr, val)
        else:
            setattr(obj, key, val)
    return
def set_seed(seed):
if seed == -1:
seed = np.random.randint(0, 10000)
print("Setting seed: {}".format(seed))
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
os.environ["PYTHONHASHSEED"] = str(seed)
torch.cuda.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
# For cudnn backend to ensure reproducibility
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
def parse_sim_params(args, cfg):
# code from Isaac Gym Preview 2
# initialize sim params
sim_params = gymapi.SimParams()
# set some values from args
if args.physics_engine == gymapi.SIM_FLEX:
if args.device != "cpu":
print("WARNING: Using Flex with GPU instead of PHYSX!")
elif args.physics_engine == gymapi.SIM_PHYSX:
sim_params.physx.use_gpu = args.use_gpu
sim_params.physx.num_subscenes = args.subscenes
sim_params.use_gpu_pipeline = args.use_gpu_pipeline
# if sim options are provided in cfg, parse them and update/override above:
if "sim" in cfg:
gymutil.parse_sim_config(cfg["sim"], sim_params)
# Override num_threads if passed on the command line
if args.physics_engine == gymapi.SIM_PHYSX and args.num_threads > 0:
sim_params.physx.num_threads = args.num_threads
return sim_params
def get_load_path(root, load_run=-1, checkpoint=-1):
try:
runs = os.listdir(root)
runs.sort()
if "exported" in runs:
runs.remove("exported")
last_run = os.path.join(root, runs[-1])
except:
raise ValueError("No runs in this directory: " + root)
if load_run == -1:
load_run = last_run
else:
load_run = os.path.join(root, load_run)
if checkpoint == -1:
models = [file for file in os.listdir(load_run) if "model" in file]
models.sort(key=lambda m: "{0:0>15}".format(m))
model = models[-1]
else:
model = "model_{}.pt".format(checkpoint)
load_path = os.path.join(load_run, model)
return load_path
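# Resolution sketch (hypothetical directory layout): given
#   <root>/2024-01-01_10-00-00run1/model_50.pt
#   <root>/2024-01-02_09-30-00run2/model_100.pt
# get_load_path(root) returns the model_100.pt path: load_run=-1 picks the
# newest run directory by name, and checkpoint=-1 picks the last model_*.pt
# (the zero-padded sort key makes model_100.pt sort after model_99.pt).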
def update_cfg_from_args(env_cfg, cfg_train, args):
# seed
if env_cfg is not None:
# num envs
if args.num_envs is not None:
env_cfg.env.num_envs = args.num_envs
if cfg_train is not None:
if args.seed is not None:
cfg_train.seed = args.seed
# alg runner parameters
if args.max_iterations is not None:
cfg_train.runner.max_iterations = args.max_iterations
if args.resume:
cfg_train.runner.resume = args.resume
if args.experiment_name is not None:
cfg_train.runner.experiment_name = args.experiment_name
if args.run_name is not None:
cfg_train.runner.run_name = args.run_name
if args.load_run is not None:
cfg_train.runner.load_run = args.load_run
if args.checkpoint is not None:
cfg_train.runner.checkpoint = args.checkpoint
return env_cfg, cfg_train
def get_args():
custom_parameters = [
{
"name": "--task",
"type": str,
"default": "XBotL_free",
"help": "Resume training or start testing from a checkpoint. Overrides config file if provided.",
},
{
"name": "--resume",
"action": "store_true",
"default": False,
"help": "Resume training from a checkpoint",
},
{
"name": "--experiment_name",
"type": str,
"help": "Name of the experiment to run or load. Overrides config file if provided.",
},
{
"name": "--run_name",
"type": str,
"help": "Name of the run. Overrides config file if provided.",
},
{
"name": "--load_run",
"type": str,
"help": "Name of the run to load when resume=True. If -1: will load the last run. Overrides config file if provided.",
},
{
"name": "--checkpoint",
"type": int,
"help": "Saved model checkpoint number. If -1: will load the last checkpoint. Overrides config file if provided.",
},
{
"name": "--headless",
"action": "store_true",
"default": False,
"help": "Force display off at all times",
},
{
"name": "--horovod",
"action": "store_true",
"default": False,
"help": "Use horovod for multi-gpu training",
},
{
"name": "--rl_device",
"type": str,
"default": "cuda:0",
"help": "Device used by the RL algorithm, (cpu, gpu, cuda:0, cuda:1 etc..)",
},
{
"name": "--num_envs",
"type": int,
"help": "Number of environments to create. Overrides config file if provided.",
},
{
"name": "--seed",
"type": int,
"help": "Random seed. Overrides config file if provided.",
},
{
"name": "--max_iterations",
"type": int,
"help": "Maximum number of training iterations. Overrides config file if provided.",
},
]
# parse arguments
args = gymutil.parse_arguments(
description="RL Policy", custom_parameters=custom_parameters
)
    # name alignment
args.sim_device_id = args.compute_device_id
args.sim_device = args.sim_device_type
if args.sim_device == "cuda":
args.sim_device += f":{args.sim_device_id}"
return args
def export_policy_as_jit(actor_critic, path):
os.makedirs(path, exist_ok=True)
path = os.path.join(path, "policy_1.pt")
model = copy.deepcopy(actor_critic.actor).to("cpu")
traced_script_module = torch.jit.script(model)
traced_script_module.save(path)
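# Loading sketch: the exported TorchScript file can later be reloaded without
# the original network class definitions (the observation shape must match the
# actor's input layer):
#   policy = torch.jit.load("<export_dir>/policy_1.pt")
#   actions = policy(obs)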
# SPDX-FileCopyrightText: Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-FileCopyrightText: Copyright (c) 2021 ETH Zurich, Nikita Rudin
# SPDX-FileCopyrightText: Copyright (c) 2024 Beijing RobotEra TECHNOLOGY CO.,LTD. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# 3. Neither the name of the copyright holder nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# Copyright (c) 2024, AgiBot Inc. All rights reserved.
import matplotlib.pyplot as plt
import numpy as np
from collections import defaultdict
from multiprocessing import Process, Value
class Logger:
def __init__(self, dt):
self.state_log = defaultdict(list)
self.rew_log = defaultdict(list)
self.dt = dt
self.num_episodes = 0
self.plot_process = None
def log_state(self, key, value):
self.state_log[key].append(value)
    def log_states(self, state_dict):
        for key, value in state_dict.items():
            self.log_state(key, value)
    def log_rewards(self, reward_dict, num_episodes):
        for key, value in reward_dict.items():
            if 'rew' in key:
                self.rew_log[key].append(value.item() * num_episodes)
        self.num_episodes += num_episodes
def reset(self):
self.state_log.clear()
self.rew_log.clear()
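    # Usage sketch (keys and values are illustrative): accumulate per-step
    # states, then spawn the plotting processes.
    #   logger = Logger(dt=0.01)
    #   logger.log_states({"base_vel_x": 0.5, "command_x": 0.6})
    #   logger.plot_states()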
def plot_states(self):
self.plot_process = Process(target=self._plot)
self.plot_process1 = Process(target=self._plot_position)
# self.plot_process2 = Process(target=self._plot_torque)
# self.plot_process3 = Process(target=self._plot_vel)
# self.plot_process4 = Process(target=self._plot_tn_rms)
# self.plot_process5 = Process(target=self._plot_tn)
# self.plot_process6 = Process(target=self._plot_torque_vel)
self.plot_process7 = Process(target=self._plot_position1)
# self.plot_process8 = Process(target=self._plot_torque1)
# self.plot_process9 = Process(target=self._plot_vel1)
# self.plot_process10 = Process(target=self._plot_tn_rms1)
# self.plot_process11 = Process(target=self._plot_tn1)
# self.plot_process12 = Process(target=self._plot_torque_vel1)
self.plot_process.start()
self.plot_process1.start()
# self.plot_process2.start()
# self.plot_process3.start()
# self.plot_process4.start()
# self.plot_process5.start()
# self.plot_process6.start()
self.plot_process7.start()
# self.plot_process8.start()
# self.plot_process9.start()
# self.plot_process10.start()
# self.plot_process11.start()
# self.plot_process12.start()
def _plot(self):
nb_rows = 3
nb_cols = 3
fig, axs = plt.subplots(nb_rows, nb_cols)
for key, value in self.state_log.items():
time = np.linspace(0, len(value)*self.dt, len(value))
break
log= self.state_log
# plot base vel x
a = axs[0, 0]
if log["base_vel_x"]: a.plot(time, log["base_vel_x"], label='measured')
if log["command_x"]: a.plot(time, log["command_x"], label='commanded')
if log["command_sin"]: a.plot(time, log["command_sin"], label='commanded')
a.set(xlabel='time [s]', ylabel='base lin vel [m/s]', title='Base velocity x')
a.legend()
# plot base vel y
a = axs[0, 1]
if log["base_vel_y"]: a.plot(time, log["base_vel_y"], label='measured')
if log["command_y"]: a.plot(time, log["command_y"], label='commanded')
if log["command_cos"]: a.plot(time, log["command_cos"], label='commanded')
a.set(xlabel='time [s]', ylabel='base lin vel [m/s]', title='Base velocity y')
a.legend()
# plot base vel yaw
a = axs[0, 2]
if log["base_vel_yaw"]: a.plot(time, log["base_vel_yaw"], label='measured')
if log["command_yaw"]: a.plot(time, log["command_yaw"], label='commanded')
a.set(xlabel='time [s]', ylabel='base ang vel [rad/s]', title='Base velocity yaw')
a.legend()
# # plot base vel z
# a = axs[1, 0]
# if log["command_sin"]: a.plot(time, log["command_sin"], label='commanded')
# a.set(xlabel='time [s]', ylabel='command sin', title='Command Sin')
# a.legend()
# # plot contact forces
# a = axs[1, 1]
# if log["command_cos"]: a.plot(time, log["command_cos"], label='commanded')
# a.set(xlabel='time [s]', ylabel='command cos', title='Command Cos')
# a.legend()
plt.show()
def _plot_position(self):
nb_rows = 2
nb_cols = 3
fig, axs = plt.subplots(nb_rows, nb_cols)
for key, value in self.state_log.items():
time = np.linspace(0, len(value)*self.dt, len(value))
break
log= self.state_log
# plot position targets and measured positions
a = axs[0, 0]
if log["dof_pos[0]"]: a.plot(time, log["dof_pos[0]"], label='measured')
if log["dof_pos_target[0]"]: a.plot(time, log["dof_pos_target[0]"], label='target')
a.set(xlabel='time [s]', ylabel='Position [rad]', title='DOF Position[0]')
a.legend()
a = axs[0, 1]
if log["dof_pos[1]"]: a.plot(time, log["dof_pos[1]"], label='measured')
if log["dof_pos_target[1]"]: a.plot(time, log["dof_pos_target[1]"], label='target')
a.set(xlabel='time [s]', ylabel='Position [rad]', title='DOF Position[1]')
a.legend()
a = axs[0, 2]
if log["dof_pos[2]"]: a.plot(time, log["dof_pos[2]"], label='measured')
if log["dof_pos_target[2]"]: a.plot(time, log["dof_pos_target[2]"], label='target')
a.set(xlabel='time [s]', ylabel='Position [rad]', title='DOF Position[2]')
a.legend()
a = axs[1, 0]
if log["dof_pos[3]"]: a.plot(time, log["dof_pos[3]"], label='measured')
if log["dof_pos_target[3]"]: a.plot(time, log["dof_pos_target[3]"], label='target')
a.set(xlabel='time [s]', ylabel='Position [rad]', title='DOF Position[3]')
a.legend()
a = axs[1, 1]
if log["dof_pos[4]"]: a.plot(time, log["dof_pos[4]"], label='measured')
if log["dof_pos_target[4]"]: a.plot(time, log["dof_pos_target[4]"], label='target')
a.set(xlabel='time [s]', ylabel='Position [rad]', title='DOF Position[4]')
a.legend()
a = axs[1, 2]
if log["dof_pos[5]"]: a.plot(time, log["dof_pos[5]"], label='measured')
if log["dof_pos_target[5]"]: a.plot(time, log["dof_pos_target[5]"], label='target')
a.set(xlabel='time [s]', ylabel='Position [rad]', title='DOF Position[5]')
a.legend()
plt.show()
def _plot_position1(self):
nb_rows = 2
nb_cols = 3
fig, axs = plt.subplots(nb_rows, nb_cols)
for key, value in self.state_log.items():
time = np.linspace(0, len(value)*self.dt, len(value))
break
log= self.state_log
# plot position targets and measured positions
a = axs[0, 0]
if log["dof_pos[6]"]: a.plot(time, log["dof_pos[6]"], label='measured')
if log["dof_pos_target[6]"]: a.plot(time, log["dof_pos_target[6]"], label='target')
a.set(xlabel='time [s]', ylabel='Position [rad]', title='DOF Position[6]')
a.legend()
a = axs[0, 1]
if log["dof_pos[7]"]: a.plot(time, log["dof_pos[7]"], label='measured')
if log["dof_pos_target[7]"]: a.plot(time, log["dof_pos_target[7]"], label='target')
a.set(xlabel='time [s]', ylabel='Position [rad]', title='DOF Position[7]')
a.legend()
a = axs[0, 2]
if log["dof_pos[8]"]: a.plot(time, log["dof_pos[8]"], label='measured')
if log["dof_pos_target[8]"]: a.plot(time, log["dof_pos_target[8]"], label='target')
a.set(xlabel='time [s]', ylabel='Position [rad]', title='DOF Position[8]')
a.legend()
a = axs[1, 0]
if log["dof_pos[9]"]: a.plot(time, log["dof_pos[9]"], label='measured')
if log["dof_pos_target[9]"]: a.plot(time, log["dof_pos_target[9]"], label='target')
a.set(xlabel='time [s]', ylabel='Position [rad]', title='DOF Position[9]')
a.legend()
a = axs[1, 1]
if log["dof_pos[10]"]: a.plot(time, log["dof_pos[10]"], label='measured')
if log["dof_pos_target[10]"]: a.plot(time, log["dof_pos_target[10]"], label='target')
a.set(xlabel='time [s]', ylabel='Position [rad]', title='DOF Position[10]')
a.legend()
a = axs[1, 2]
if log["dof_pos[11]"]: a.plot(time, log["dof_pos[11]"], label='measured')
if log["dof_pos_target[11]"]: a.plot(time, log["dof_pos_target[11]"], label='target')
a.set(xlabel='time [s]', ylabel='Position [rad]', title='DOF Position[11]')
a.legend()
plt.show()
def _plot_torque(self):
nb_rows = 2
nb_cols = 3
fig, axs = plt.subplots(nb_rows, nb_cols)
for key, value in self.state_log.items():
time = np.linspace(0, len(value)*self.dt, len(value))
break
log= self.state_log
# plot position targets and measured positions
a = axs[0, 0]
if log["dof_torque[0]"]!=[]: a.plot(time, log["dof_torque[0]"], label='measured')
a.set(xlabel='time [s]', ylabel='Joint Torque [Nm]', title='Torque[0]')
a.legend()
a = axs[0,1]
if log["dof_torque[1]"]!=[]: a.plot(time, log["dof_torque[1]"], label='measured')
a.set(xlabel='time [s]', ylabel='Joint Torque [Nm]', title='Torque[1]')
a.legend()
a = axs[0, 2]
if log["dof_torque[2]"]!=[]: a.plot(time, log["dof_torque[2]"], label='measured')
a.set(xlabel='time [s]', ylabel='Joint Torque [Nm]', title='Torque[2]')
a.legend()
a = axs[1, 0]
if log["dof_torque[3]"]!=[]: a.plot(time, log["dof_torque[3]"], label='measured')
a.set(xlabel='time [s]', ylabel='Joint Torque [Nm]', title='Torque[3]')
a.legend()
a = axs[1, 1]
if log["dof_torque[4]"]!=[]: a.plot(time, log["dof_torque[4]"], label='measured')
a.set(xlabel='time [s]', ylabel='Joint Torque [Nm]', title='Torque[4]')
a.legend()
a = axs[1, 2]
if log["dof_torque[5]"]!=[]: a.plot(time, log["dof_torque[5]"], label='measured')
a.set(xlabel='time [s]', ylabel='Joint Torque [Nm]', title='Torque[5]')
a.legend()
plt.show()
def _plot_torque1(self):
nb_rows = 2
nb_cols = 3
fig, axs = plt.subplots(nb_rows, nb_cols)
for key, value in self.state_log.items():
time = np.linspace(0, len(value)*self.dt, len(value))
break
log= self.state_log
# plot position targets and measured positions
a = axs[0, 0]
if log["dof_torque[6]"]!=[]: a.plot(time, log["dof_torque[6]"], label='measured')
a.set(xlabel='time [s]', ylabel='Joint Torque [Nm]', title='Torque[6]')
a.legend()
a = axs[0,1]
if log["dof_torque[7]"]!=[]: a.plot(time, log["dof_torque[7]"], label='measured')
a.set(xlabel='time [s]', ylabel='Joint Torque [Nm]', title='Torque[7]')
a.legend()
a = axs[0, 2]
if log["dof_torque[8]"]!=[]: a.plot(time, log["dof_torque[8]"], label='measured')
a.set(xlabel='time [s]', ylabel='Joint Torque [Nm]', title='Torque[8]')
a.legend()
a = axs[1, 0]
if log["dof_torque[9]"]!=[]: a.plot(time, log["dof_torque[9]"], label='measured')
a.set(xlabel='time [s]', ylabel='Joint Torque [Nm]', title='Torque[9]')
a.legend()
a = axs[1, 1]
if log["dof_torque[10]"]!=[]: a.plot(time, log["dof_torque[10]"], label='measured')
a.set(xlabel='time [s]', ylabel='Joint Torque [Nm]', title='Torque[10]')
a.legend()
a = axs[1, 2]
if log["dof_torque[11]"]!=[]: a.plot(time, log["dof_torque[11]"], label='measured')
a.set(xlabel='time [s]', ylabel='Joint Torque [Nm]', title='Torque[11]')
a.legend()
plt.show()
def _plot_vel(self):
nb_rows = 2
nb_cols = 3
fig, axs = plt.subplots(nb_rows, nb_cols)
for key, value in self.state_log.items():
time = np.linspace(0, len(value)*self.dt, len(value))
break
log= self.state_log
# plot position targets and measured positions
a = axs[0, 0]
if log["dof_vel[0]"]: a.plot(time, log["dof_vel[0]"], label='measured')
if log["dof_vel_target[0]"]: a.plot(time, log["dof_vel_target[0]"], label='target')
        a.set(xlabel='time [s]', ylabel='Velocity [rad/s]', title='Joint Velocity[0]')
        a.legend()
        a = axs[0, 1]
        if log["dof_vel[1]"]: a.plot(time, log["dof_vel[1]"], label='measured')
if log["dof_vel_target[1]"]: a.plot(time, log["dof_vel_target[1]"], label='target')
a.set(xlabel='time [s]', ylabel='Velocity [rad/s]', title='Joint Velocity[1]')
a.legend()
a = axs[0, 2]
if log["dof_vel[2]"]: a.plot(time, log["dof_vel[2]"], label='measured')
if log["dof_vel_target[2]"]: a.plot(time, log["dof_vel_target[2]"], label='target')
a.set(xlabel='time [s]', ylabel='Velocity [rad/s]', title='Joint Velocity[2]')
a.legend()
a = axs[1, 0]
if log["dof_vel[3]"]: a.plot(time, log["dof_vel[3]"], label='measured')
if log["dof_vel_target[3]"]: a.plot(time, log["dof_vel_target[3]"], label='target')
a.set(xlabel='time [s]', ylabel='Velocity [rad/s]', title='Joint Velocity[3]')
a.legend()
a = axs[1, 1]
if log["dof_vel[4]"]: a.plot(time, log["dof_vel[4]"], label='measured')
if log["dof_vel_target[4]"]: a.plot(time, log["dof_vel_target[4]"], label='target')
a.set(xlabel='time [s]', ylabel='Velocity [rad/s]', title='Joint Velocity[4]')
a.legend()
a = axs[1, 2]
if log["dof_vel[5]"]: a.plot(time, log["dof_vel[5]"], label='measured')
if log["dof_vel_target[5]"]: a.plot(time, log["dof_vel_target[5]"], label='target')
a.set(xlabel='time [s]', ylabel='Velocity [rad/s]', title='Joint Velocity[5]')
a.legend()
plt.show()
def _plot_vel1(self):
nb_rows = 2
nb_cols = 3
fig, axs = plt.subplots(nb_rows, nb_cols)
for key, value in self.state_log.items():
time = np.linspace(0, len(value)*self.dt, len(value))
break
log= self.state_log
# plot position targets and measured positions
a = axs[0, 0]
if log["dof_vel[6]"]: a.plot(time, log["dof_vel[6]"], label='measured')
if log["dof_vel_target[6]"]: a.plot(time, log["dof_vel_target[6]"], label='target')
        a.set(xlabel='time [s]', ylabel='Velocity [rad/s]', title='Joint Velocity[6]')
        a.legend()
        a = axs[0, 1]
        if log["dof_vel[7]"]: a.plot(time, log["dof_vel[7]"], label='measured')
if log["dof_vel_target[7]"]: a.plot(time, log["dof_vel_target[7]"], label='target')
a.set(xlabel='time [s]', ylabel='Velocity [rad/s]', title='Joint Velocity[7]')
a.legend()
a = axs[0, 2]
if log["dof_vel[8]"]: a.plot(time, log["dof_vel[8]"], label='measured')
if log["dof_vel_target[8]"]: a.plot(time, log["dof_vel_target[8]"], label='target')
a.set(xlabel='time [s]', ylabel='Velocity [rad/s]', title='Joint Velocity[8]')
a.legend()
a = axs[1, 0]
if log["dof_vel[9]"]: a.plot(time, log["dof_vel[9]"], label='measured')
if log["dof_vel_target[9]"]: a.plot(time, log["dof_vel_target[9]"], label='target')
a.set(xlabel='time [s]', ylabel='Velocity [rad/s]', title='Joint Velocity[9]')
a.legend()
a = axs[1, 1]
if log["dof_vel[10]"]: a.plot(time, log["dof_vel[10]"], label='measured')
if log["dof_vel_target[4]"]: a.plot(time, log["dof_vel_target[4]"], label='target')
a.set(xlabel='time [s]', ylabel='Velocity [rad/s]', title='Joint Velocity[10]')
a.legend()
a = axs[1, 2]
if log["dof_vel[11]"]: a.plot(time, log["dof_vel[11]"], label='measured')
if log["dof_vel_target[5]"]: a.plot(time, log["dof_vel_target[11]"], label='target')
a.set(xlabel='time [s]', ylabel='Velocity [rad/s]', title='Joint Velocity[11]')
a.legend()
plt.show()
def _plot_tn_rms(self):
nb_rows = 2
nb_cols = 3
fig, axs = plt.subplots(nb_rows, nb_cols)
for key, value in self.state_log.items():
time = np.linspace(0, len(value)*self.dt, len(value))
break
log= self.state_log
a = axs[0, 0]
if log["dof_torque[0]"] != [] and log["dof_vel[0]"] != []:
vel_array = np.array(log["dof_vel[0]"])
torque_array = np.array(log["dof_torque[0]"])
rms_vel = np.sqrt(np.mean(vel_array**2))
rms_torque = np.sqrt(np.mean(torque_array**2))
a.plot(rms_vel, rms_torque, '*', label='measured')
a.set(xlabel='Velocity [rad/s]', ylabel='Joint Torque [Nm]', title='TN_RMS[0]')
a.legend()
a = axs[0,1]
if log["dof_torque[1]"] != [] and log["dof_vel[1]"] != []:
vel_array = np.array(log["dof_vel[1]"])
torque_array = np.array(log["dof_torque[1]"])
rms_vel = np.sqrt(np.mean(vel_array**2))
rms_torque = np.sqrt(np.mean(torque_array**2))
a.plot(rms_vel, rms_torque, '*', label='measured')
a.set(xlabel='Velocity [rad/s]', ylabel='Joint Torque [Nm]', title='TN_RMS[1]')
a.legend()
a = axs[0, 2]
if log["dof_torque[1]"] != [] and log["dof_vel[1]"] != []:
vel_array = np.array(log["dof_vel[1]"])
torque_array = np.array(log["dof_torque[1]"])
rms_vel = np.sqrt(np.mean(vel_array**2))
rms_torque = np.sqrt(np.mean(torque_array**2))
a.plot(rms_vel, rms_torque, '*', label='measured')
a.set(xlabel='Velocity [rad/s]', ylabel='Joint Torque [Nm]', title='TN_RMS[2]')
a.legend()
a = axs[1, 0]
if log["dof_torque[3]"] != [] and log["dof_vel[3]"] != []:
vel_array = np.array(log["dof_vel[3]"])
torque_array = np.array(log["dof_torque[3]"])
rms_vel = np.sqrt(np.mean(vel_array**2))
rms_torque = np.sqrt(np.mean(torque_array**2))
a.plot(rms_vel, rms_torque, '*', label='measured')
a.set(xlabel='Velocity [rad/s]', ylabel='Joint Torque [Nm]', title='TN_RMS[3]')
a.legend()
a = axs[1, 1]
if log["dof_torque[4]"] != [] and log["dof_vel[4]"] != []:
vel_array = np.array(log["dof_vel[4]"])
torque_array = np.array(log["dof_torque[4]"])
rms_vel = np.sqrt(np.mean(vel_array**2))
rms_torque = np.sqrt(np.mean(torque_array**2))
a.plot(rms_vel, rms_torque, '*', label='measured')
a.set(xlabel='Velocity [rad/s]', ylabel='Joint Torque [Nm]', title='TN_RMS[4]')
a.legend()
a = axs[1, 2]
if log["dof_torque[5]"] != [] and log["dof_vel[5]"] != []:
vel_array = np.array(log["dof_vel[5]"])
torque_array = np.array(log["dof_torque[5]"])
rms_vel = np.sqrt(np.mean(vel_array**2))
rms_torque = np.sqrt(np.mean(torque_array**2))
a.plot(rms_vel, rms_torque, '*', label='measured')
a.set(xlabel='Velocity [rad/s]', ylabel='Joint Torque [Nm]', title='TN_RMS[5]')
a.legend()
plt.show()
def _plot_tn_rms1(self):
nb_rows = 2
nb_cols = 3
fig, axs = plt.subplots(nb_rows, nb_cols)
for key, value in self.state_log.items():
time = np.linspace(0, len(value)*self.dt, len(value))
break
log= self.state_log
a = axs[0, 0]
if log["dof_torque[6]"] != [] and log["dof_vel[6]"] != []:
vel_array = np.array(log["dof_vel[6]"])
torque_array = np.array(log["dof_torque[6]"])
rms_vel = np.sqrt(np.mean(vel_array**2))
rms_torque = np.sqrt(np.mean(torque_array**2))
a.plot(rms_vel, rms_torque, '*', label='measured')
a.set(xlabel='Velocity [rad/s]', ylabel='Joint Torque [Nm]', title='TN_RMS[6]')
a.legend()
a = axs[0,1]
if log["dof_torque[7]"] != [] and log["dof_vel[7]"] != []:
vel_array = np.array(log["dof_vel[7]"])
torque_array = np.array(log["dof_torque[7]"])
rms_vel = np.sqrt(np.mean(vel_array**2))
rms_torque = np.sqrt(np.mean(torque_array**2))
a.plot(rms_vel, rms_torque, '*', label='measured')
a.set(xlabel='Velocity [rad/s]', ylabel='Joint Torque [Nm]', title='TN_RMS[7]')
a.legend()
a = axs[0, 2]
if log["dof_torque[8]"] != [] and log["dof_vel[8]"] != []:
vel_array = np.array(log["dof_vel[8]"])
torque_array = np.array(log["dof_torque[8]"])
rms_vel = np.sqrt(np.mean(vel_array**2))
rms_torque = np.sqrt(np.mean(torque_array**2))
a.plot(rms_vel, rms_torque, '*', label='measured')
a.set(xlabel='Velocity [rad/s]', ylabel='Joint Torque [Nm]', title='TN_RMS[8]')
a.legend()
a = axs[1, 0]
if log["dof_torque[9]"] != [] and log["dof_vel[9]"] != []:
vel_array = np.array(log["dof_vel[9]"])
torque_array = np.array(log["dof_torque[9]"])
rms_vel = np.sqrt(np.mean(vel_array**2))
rms_torque = np.sqrt(np.mean(torque_array**2))
a.plot(rms_vel, rms_torque, '*', label='measured')
a.set(xlabel='Velocity [rad/s]', ylabel='Joint Torque [Nm]', title='TN_RMS[9]')
a.legend()
a = axs[1, 1]
if log["dof_torque[10]"] != [] and log["dof_vel[10]"] != []:
vel_array = np.array(log["dof_vel[10]"])
torque_array = np.array(log["dof_torque[10]"])
rms_vel = np.sqrt(np.mean(vel_array**2))
rms_torque = np.sqrt(np.mean(torque_array**2))
a.plot(rms_vel, rms_torque, '*', label='measured')
a.set(xlabel='Velocity [rad/s]', ylabel='Joint Torque [Nm]', title='TN_RMS[10]')
a.legend()
a = axs[1, 2]
if log["dof_torque[11]"] != [] and log["dof_vel[11]"] != []:
vel_array = np.array(log["dof_vel[11]"])
torque_array = np.array(log["dof_torque[11]"])
rms_vel = np.sqrt(np.mean(vel_array**2))
rms_torque = np.sqrt(np.mean(torque_array**2))
a.plot(rms_vel, rms_torque, '*', label='measured')
a.set(xlabel='Velocity [rad/s]', ylabel='Joint Torque [Nm]', title='TN_RMS[11]')
a.legend()
plt.show()
def _plot_tn(self):
nb_rows = 2
nb_cols = 3
fig, axs = plt.subplots(nb_rows, nb_cols)
for key, value in self.state_log.items():
time = np.linspace(0, len(value)*self.dt, len(value))
break
log= self.state_log
a = axs[0, 0]
if log["dof_torque[0]"] != [] and log["dof_vel[0]"] != []:
vel_array = np.array(log["dof_vel[0]"])
torque_array = np.array(log["dof_torque[0]"])
abs_vel = np.abs(vel_array)
abs_torque = np.abs(torque_array)
a.plot(abs_vel, abs_torque, '*', label='measured')
a.set(xlabel='Velocity [rad/s]', ylabel='Joint Torque [Nm]', title='TN[0]')
a.legend()
a = axs[0,1]
if log["dof_torque[1]"] != [] and log["dof_vel[1]"] != []:
vel_array = np.array(log["dof_vel[1]"])
torque_array = np.array(log["dof_torque[1]"])
abs_vel = np.abs(vel_array)
abs_torque = np.abs(torque_array)
a.plot(abs_vel, abs_torque, '*', label='measured')
a.set(xlabel='Velocity [rad/s]', ylabel='Joint Torque [Nm]', title='TN[1]')
a.legend()
a = axs[0, 2]
if log["dof_torque[1]"] != [] and log["dof_vel[1]"] != []:
vel_array = np.array(log["dof_vel[1]"])
torque_array = np.array(log["dof_torque[1]"])
abs_vel = np.abs(vel_array)
abs_torque = np.abs(torque_array)
a.plot(abs_vel, abs_torque, '*', label='measured')
a.set(xlabel='Velocity [rad/s]', ylabel='Joint Torque [Nm]', title='TN[2]')
a.legend()
a = axs[1, 0]
if log["dof_torque[3]"] != [] and log["dof_vel[3]"] != []:
vel_array = np.array(log["dof_vel[3]"])
torque_array = np.array(log["dof_torque[3]"])
abs_vel = np.abs(vel_array)
abs_torque = np.abs(torque_array)
a.plot(abs_vel, abs_torque, '*', label='measured')
a.set(xlabel='Velocity [rad/s]', ylabel='Joint Torque [Nm]', title='TN[3]')
a.legend()
a = axs[1, 1]
if log["dof_torque[4]"] != [] and log["dof_vel[4]"] != []:
vel_array = np.array(log["dof_vel[4]"])
torque_array = np.array(log["dof_torque[4]"])
abs_vel = np.abs(vel_array)
abs_torque = np.abs(torque_array)
a.plot(abs_vel, abs_torque, '*', label='measured')
a.set(xlabel='Velocity [rad/s]', ylabel='Joint Torque [Nm]', title='TN[4]')
a.legend()
a = axs[1, 2]
if log["dof_torque[5]"] != [] and log["dof_vel[5]"] != []:
vel_array = np.array(log["dof_vel[5]"])
torque_array = np.array(log["dof_torque[5]"])
abs_vel = np.abs(vel_array)
abs_torque = np.abs(torque_array)
a.plot(abs_vel, abs_torque, '*', label='measured')
a.set(xlabel='Velocity [rad/s]', ylabel='Joint Torque [Nm]', title='TN[5]')
a.legend()
plt.show()
def _plot_tn1(self):
nb_rows = 2
nb_cols = 3
fig, axs = plt.subplots(nb_rows, nb_cols)
for key, value in self.state_log.items():
time = np.linspace(0, len(value)*self.dt, len(value))
break
log= self.state_log
a = axs[0, 0]
if log["dof_torque[6]"] != [] and log["dof_vel[6]"] != []:
vel_array = np.array(log["dof_vel[6]"])
torque_array = np.array(log["dof_torque[6]"])
abs_vel = np.abs(vel_array)
abs_torque = np.abs(torque_array)
a.plot(abs_vel, abs_torque, '*', label='measured')
a.set(xlabel='Velocity [rad/s]', ylabel='Joint Torque [Nm]', title='TN[6]')
a.legend()
a = axs[0,1]
if log["dof_torque[7]"] != [] and log["dof_vel[7]"] != []:
vel_array = np.array(log["dof_vel[7]"])
torque_array = np.array(log["dof_torque[7]"])
abs_vel = np.abs(vel_array)
abs_torque = np.abs(torque_array)
a.plot(abs_vel, abs_torque, '*', label='measured')
a.set(xlabel='Velocity [rad/s]', ylabel='Joint Torque [Nm]', title='TN[7]')
a.legend()
a = axs[0, 2]
if log["dof_torque[8]"] != [] and log["dof_vel[8]"] != []:
vel_array = np.array(log["dof_vel[8]"])
torque_array = np.array(log["dof_torque[8]"])
abs_vel = np.abs(vel_array)
abs_torque = np.abs(torque_array)
a.plot(abs_vel, abs_torque, '*', label='measured')
a.set(xlabel='Velocity [rad/s]', ylabel='Joint Torque [Nm]', title='TN[8]')
a.legend()
a = axs[1, 0]
if log["dof_torque[9]"] != [] and log["dof_vel[9]"] != []:
vel_array = np.array(log["dof_vel[9]"])
torque_array = np.array(log["dof_torque[9]"])
abs_vel = np.abs(vel_array)
abs_torque = np.abs(torque_array)
a.plot(abs_vel, abs_torque, '*', label='measured')
a.set(xlabel='Velocity [rad/s]', ylabel='Joint Torque [Nm]', title='TN[9]')
a.legend()
a = axs[1, 1]
if log["dof_torque[10]"] != [] and log["dof_vel[10]"] != []:
vel_array = np.array(log["dof_vel[10]"])
torque_array = np.array(log["dof_torque[10]"])
abs_vel = np.abs(vel_array)
abs_torque = np.abs(torque_array)
a.plot(abs_vel, abs_torque, '*', label='measured')
a.set(xlabel='Velocity [rad/s]', ylabel='Joint Torque [Nm]', title='TN[10]')
a.legend()
a = axs[1, 2]
if log["dof_torque[11]"] != [] and log["dof_vel[11]"] != []:
vel_array = np.array(log["dof_vel[11]"])
torque_array = np.array(log["dof_torque[11]"])
abs_vel = np.abs(vel_array)
abs_torque = np.abs(torque_array)
a.plot(abs_vel, abs_torque, '*', label='measured')
a.set(xlabel='Velocity [rad/s]', ylabel='Joint Torque [Nm]', title='TN[11]')
a.legend()
plt.show()
def _plot_torque_vel(self):
nb_rows = 2
nb_cols = 3
fig, axs = plt.subplots(nb_rows, nb_cols)
for key, value in self.state_log.items():
time = np.linspace(0, len(value)*self.dt, len(value))
break
log= self.state_log
# plot position targets and measured positions
a = axs[0, 0]
if log["dof_torque[0]"]!=[]: a.plot(time, log["dof_torque[0]"], label='measured_torque')
if log["dof_vel[0]"]: a.plot(time, log["dof_vel[0]"], label='measured_vel')
a.set(xlabel='time [s]', ylabel=' ', title='Velocity Torque[0]')
a.legend()
a = axs[0,1]
if log["dof_torque[1]"]!=[]: a.plot(time, log["dof_torque[1]"], label='measured_torque')
if log["dof_vel[1]"]: a.plot(time, log["dof_vel[1]"], label='measured_vel')
a.set(xlabel='time [s]', ylabel=' ', title='Velocity Torque[1]')
a.legend()
a = axs[0, 2]
if log["dof_torque[2]"]!=[]: a.plot(time, log["dof_torque[2]"], label='measured_torque')
if log["dof_vel[2]"]: a.plot(time, log["dof_vel[2]"], label='measured_vel')
a.set(xlabel='time [s]', ylabel=' ', title='Velocity Torque[2]')
a.legend()
a = axs[1, 0]
if log["dof_torque[3]"]!=[]: a.plot(time, log["dof_torque[3]"], label='measured_torque')
if log["dof_vel[3]"]: a.plot(time, log["dof_vel[3]"], label='measured_vel')
a.set(xlabel='time [s]', ylabel=' ', title='Velocity Torque[3]')
a.legend()
a = axs[1, 1]
if log["dof_torque[4]"]!=[]: a.plot(time, log["dof_torque[4]"], label='measured_torque')
if log["dof_vel[4]"]: a.plot(time, log["dof_vel[4]"], label='measured_vel')
a.set(xlabel='time [s]', ylabel=' ', title='Velocity Torque[4]')
a.legend()
a = axs[1, 2]
if log["dof_torque[5]"]!=[]: a.plot(time, log["dof_torque[5]"], label='measured_torque')
if log["dof_vel[5]"]: a.plot(time, log["dof_vel[5]"], label='measured_vel')
a.set(xlabel='time [s]', ylabel=' ', title='Velocity Torque[5]')
a.legend()
plt.show()
def _plot_torque_vel1(self):
nb_rows = 2
nb_cols = 3
fig, axs = plt.subplots(nb_rows, nb_cols)
for key, value in self.state_log.items():
time = np.linspace(0, len(value)*self.dt, len(value))
break
log= self.state_log
# plot position targets and measured positions
a = axs[0, 0]
if log["dof_torque[6]"]!=[]: a.plot(time, log["dof_torque[6]"], label='measured_torque')
if log["dof_vel[6]"]: a.plot(time, log["dof_vel[6]"], label='measured_vel')
a.set(xlabel='time [s]', ylabel=' ', title='Velocity Torque[6]')
a.legend()
a = axs[0,1]
if log["dof_torque[7]"]!=[]: a.plot(time, log["dof_torque[7]"], label='measured_torque')
if log["dof_vel[7]"]: a.plot(time, log["dof_vel[7]"], label='measured_vel')
a.set(xlabel='time [s]', ylabel=' ', title='Velocity Torque[7]')
a.legend()
a = axs[0, 2]
if log["dof_torque[8]"]!=[]: a.plot(time, log["dof_torque[8]"], label='measured_torque')
if log["dof_vel[8]"]: a.plot(time, log["dof_vel[8]"], label='measured_vel')
a.set(xlabel='time [s]', ylabel=' ', title='Velocity Torque[8]')
a.legend()
a = axs[1, 0]
if log["dof_torque[9]"]!=[]: a.plot(time, log["dof_torque[9]"], label='measured_torque')
if log["dof_vel[9]"]: a.plot(time, log["dof_vel[9]"], label='measured_vel')
a.set(xlabel='time [s]', ylabel=' ', title='Velocity Torque[9]')
a.legend()
a = axs[1, 1]
if log["dof_torque[10]"]!=[]: a.plot(time, log["dof_torque[10]"], label='measured_torque')
if log["dof_vel[10]"]: a.plot(time, log["dof_vel[10]"], label='measured_vel')
a.set(xlabel='time [s]', ylabel=' ', title='Velocity Torque[10]')
a.legend()
a = axs[1, 2]
if log["dof_torque[11]"]!=[]: a.plot(time, log["dof_torque[11]"], label='measured_torque')
if log["dof_vel[11]"]: a.plot(time, log["dof_vel[11]"], label='measured_vel')
a.set(xlabel='time [s]', ylabel=' ', title='Velocity Torque[11]')
a.legend()
plt.show()
def print_rewards(self):
print("Average rewards per second:")
for key, values in self.rew_log.items():
mean = np.sum(np.array(values)) / self.num_episodes
print(f" - {key}: {mean}")
print(f"Total number of episodes: {self.num_episodes}")
    def __del__(self):
        # terminate any plotting processes started by plot_states()
        for proc in (self.plot_process, getattr(self, 'plot_process1', None), getattr(self, 'plot_process7', None)):
            if proc is not None:
                proc.kill()
# SPDX-FileCopyrightText: Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-FileCopyrightText: Copyright (c) 2021 ETH Zurich, Nikita Rudin
# SPDX-FileCopyrightText: Copyright (c) 2024 Beijing RobotEra TECHNOLOGY CO.,LTD. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# 3. Neither the name of the copyright holder nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# Copyright (c) 2024, AgiBot Inc. All rights reserved.
import torch
from torch import Tensor
import numpy as np
from isaacgym.torch_utils import quat_apply, normalize
from typing import Tuple
# @ torch.jit.script
def quat_apply_yaw(quat, vec):
quat_yaw = quat.clone().view(-1, 4)
quat_yaw[:, :2] = 0.
quat_yaw = normalize(quat_yaw)
return quat_apply(quat_yaw, vec)
# @ torch.jit.script
def wrap_to_pi(angles):
angles %= 2*np.pi
angles -= 2*np.pi * (angles > np.pi)
return angles
# @ torch.jit.script
def torch_rand_sqrt_float(lower, upper, shape, device):
# type: (float, float, Tuple[int, int], str) -> Tensor
r = 2*torch.rand(*shape, device=device) - 1
r = torch.where(r<0., -torch.sqrt(-r), torch.sqrt(r))
r = (r + 1.) / 2.
return (upper - lower) * r + lower
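# Usage sketch (values are illustrative):
#   wrap_to_pi(torch.tensor([3.5, -4.0]))  # -> tensor([-2.7832,  2.2832])
# quat_apply_yaw zeroes the roll/pitch components of the quaternion before
# rotating, so it rotates a vector by the heading (yaw) only.
# torch_rand_sqrt_float(-1., 1., (4, 2), 'cpu') draws samples biased toward
# the interval bounds (square-root-shaped distribution).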
# SPDX-FileCopyrightText: Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-FileCopyrightText: Copyright (c) 2021 ETH Zurich, Nikita Rudin
# SPDX-FileCopyrightText: Copyright (c) 2024 Beijing RobotEra TECHNOLOGY CO.,LTD. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# 3. Neither the name of the copyright holder nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# Copyright (c) 2024, AgiBot Inc. All rights reserved.
import os
from typing import Tuple
from datetime import datetime
from humanoid.algo import VecEnv
from humanoid.algo import DHOnPolicyRunner
from humanoid import LEGGED_GYM_ROOT_DIR, LEGGED_GYM_ENVS_DIR
from .helpers import get_args, update_cfg_from_args, class_to_dict, get_load_path, set_seed, parse_sim_params
from humanoid.envs.base.legged_robot_config import LeggedRobotCfg, LeggedRobotCfgPPO
class TaskRegistry():
def __init__(self):
self.task_classes = {}
self.env_cfgs = {}
self.train_cfgs = {}
def register(self, name: str, task_class: VecEnv, env_cfg: LeggedRobotCfg, train_cfg: LeggedRobotCfgPPO):
self.task_classes[name] = task_class
self.env_cfgs[name] = env_cfg
self.train_cfgs[name] = train_cfg
def get_task_class(self, name: str) -> VecEnv:
return self.task_classes[name]
def get_cfgs(self, name) -> Tuple[LeggedRobotCfg, LeggedRobotCfgPPO]:
train_cfg = self.train_cfgs[name]
env_cfg = self.env_cfgs[name]
# copy seed
env_cfg.seed = train_cfg.seed
return env_cfg, train_cfg
def make_env(self, name, args=None, env_cfg=None) -> Tuple[VecEnv, LeggedRobotCfg]:
""" Creates an environment either from a registered namme or from the provided config file.
Args:
name (string): Name of a registered env.
            args (Args, optional): Isaac Gym command line arguments. If None get_args() will be called. Defaults to None.
env_cfg (Dict, optional): Environment config file used to override the registered config. Defaults to None.
Raises:
ValueError: Error if no registered env corresponds to 'name'
Returns:
isaacgym.VecTaskPython: The created environment
Dict: the corresponding config file
"""
# if no args passed get command line arguments
if args is None:
args = get_args()
# check if there is a registered env with that name
if name in self.task_classes:
task_class = self.get_task_class(name)
else:
raise ValueError(f"Task with name: {name} was not registered")
if env_cfg is None:
# load config files
env_cfg, _ = self.get_cfgs(name)
# override cfg from args (if specified)
env_cfg, _ = update_cfg_from_args(env_cfg, None, args)
set_seed(env_cfg.seed)
# parse sim params (convert to dict first)
sim_params = {"sim": class_to_dict(env_cfg.sim)}
sim_params = parse_sim_params(args, sim_params)
env = task_class( cfg=env_cfg,
sim_params=sim_params,
physics_engine=args.physics_engine,
sim_device=args.sim_device,
headless=args.headless)
self.env_cfg_for_wandb = env_cfg
return env, env_cfg
def make_alg_runner(self, env, name=None, args=None, train_cfg=None, log_root="default") -> Tuple[DHOnPolicyRunner, LeggedRobotCfgPPO]:
""" Creates the training algorithm either from a registered namme or from the provided config file.
Args:
env (isaacgym.VecTaskPython): The environment to train (TODO: remove from within the algorithm)
name (string, optional): Name of a registered env. If None, the config file will be used instead. Defaults to None.
            args (Args, optional): Isaac Gym command line arguments. If None get_args() will be called. Defaults to None.
train_cfg (Dict, optional): Training config file. If None 'name' will be used to get the config file. Defaults to None.
log_root (str, optional): Logging directory for Tensorboard. Set to 'None' to avoid logging (at test time for example).
Logs will be saved in <log_root>/<date_time>_<run_name>. Defaults to "default"=<path_to_LEGGED_GYM>/logs/<experiment_name>.
Raises:
            ValueError: Error if neither 'name' nor 'train_cfg' is provided
            Warning: If both 'name' and 'train_cfg' are provided, 'name' is ignored
Returns:
PPO: The created algorithm
Dict: the corresponding config file
"""
# if no args passed get command line arguments
if args is None:
args = get_args()
# if config files are passed use them, otherwise load from the name
if train_cfg is None:
if name is None:
raise ValueError("Either 'name' or 'train_cfg' must be not None")
# load config files
_, train_cfg = self.get_cfgs(name)
else:
if name is not None:
print(f"'train_cfg' provided -> Ignoring 'name={name}'")
# override cfg from args (if specified)
_, train_cfg = update_cfg_from_args(None, train_cfg, args)
current_date_time_str = datetime.now().strftime('%Y-%m-%d_%H-%M-%S')
if log_root=="default":
log_root = os.path.join(LEGGED_GYM_ROOT_DIR, 'logs', train_cfg.runner.experiment_name, 'exported_data')
log_dir = os.path.join(log_root, current_date_time_str + train_cfg.runner.run_name)
elif log_root is None:
log_dir = None
else:
log_dir = os.path.join(log_root, current_date_time_str + train_cfg.runner.run_name)
train_cfg_dict = class_to_dict(train_cfg)
env_cfg_dict = class_to_dict(self.env_cfg_for_wandb)
all_cfg = {**train_cfg_dict, **env_cfg_dict}
runner_class = eval(train_cfg_dict["runner_class_name"])
runner = runner_class(env, all_cfg, log_dir, device=args.rl_device)
#save resume path before creating a new log_dir
resume = train_cfg.runner.resume
if resume:
# load previously trained model
resume_path = get_load_path(log_root, load_run=train_cfg.runner.load_run, checkpoint=train_cfg.runner.checkpoint)
print(f"Loading model from: {resume_path}")
runner.load(resume_path, load_optimizer=False)
return runner, train_cfg, log_dir
# make global task registry
task_registry = TaskRegistry()
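# Usage sketch (mirrors the training entry point above; the task name is an
# example of a registered task):
#   args = get_args()
#   env, env_cfg = task_registry.make_env(name="x1_dh_stand", args=args)
#   runner, train_cfg, log_dir = task_registry.make_alg_runner(env=env, name="x1_dh_stand", args=args)
#   runner.learn(num_learning_iterations=train_cfg.runner.max_iterations)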
# SPDX-FileCopyrightText: Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-FileCopyrightText: Copyright (c) 2021 ETH Zurich, Nikita Rudin
# SPDX-FileCopyrightText: Copyright (c) 2024 Beijing RobotEra TECHNOLOGY CO.,LTD. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# 3. Neither the name of the copyright holder nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# Copyright (c) 2024, AgiBot Inc. All rights reserved.
import numpy as np
from numpy.random import choice
from scipy import interpolate
from isaacgym import terrain_utils
from humanoid.envs.base.legged_robot_config import LeggedRobotCfg
class Terrain:
def __init__(self, cfg: LeggedRobotCfg.terrain, num_robots) -> None:
self.cfg = cfg
self.num_robots = num_robots
self.type = cfg.mesh_type
if self.type in ["none", 'plane']:
return
# each sub terrain length
self.env_length = cfg.terrain_length
# each sub terrain width
self.env_width = cfg.terrain_width
# each terrain type proportion
cfg.terrain_proportions = np.array(cfg.terrain_proportions) / np.sum(cfg.terrain_proportions)
self.proportions = [np.sum(cfg.terrain_proportions[:i+1]) for i in range(len(cfg.terrain_proportions))]
self.cfg.num_sub_terrains = cfg.num_rows * cfg.num_cols
self.env_origins = np.zeros((cfg.num_rows, cfg.num_cols, 3))
        # platform size used by some terrain types (pit, gap, slope)
        self.platform = cfg.platform
        # terrain difficulty increases with the row index, from 0 up to max_difficulty
        self.max_difficulty = (cfg.num_rows-1)/cfg.num_rows
self.width_per_env_pixels = int(self.env_width / cfg.horizontal_scale)
self.length_per_env_pixels = int(self.env_length / cfg.horizontal_scale)
# border_size is whole terrain border
self.border = int(cfg.border_size/self.cfg.horizontal_scale)
# whole terrain cols
self.tot_cols = int(cfg.num_cols * self.width_per_env_pixels) + 2 * self.border
# whole terrain rows
self.tot_rows = int(cfg.num_rows * self.length_per_env_pixels) + 2 * self.border
self.height_field_raw = np.zeros((self.tot_rows , self.tot_cols), dtype=np.int16)
self.terrain_type = np.zeros((cfg.num_rows, cfg.num_cols))
self.idx = 0
if cfg.curriculum:
            self.curriculum()
elif cfg.selected:
self.selected_terrain()
else:
self.randomized_terrain()
self.heightsamples = self.height_field_raw
if self.type=="trimesh":
self.vertices, self.triangles = terrain_utils.convert_heightfield_to_trimesh( self.height_field_raw,
self.cfg.horizontal_scale,
self.cfg.vertical_scale,
self.cfg.slope_treshold)
def randomized_terrain(self):
for k in range(self.cfg.num_sub_terrains):
# Env coordinates in the world
(i, j) = np.unravel_index(k, (self.cfg.num_rows, self.cfg.num_cols))
choice = np.random.uniform(0, 1)
difficulty = np.random.choice([0.5, 0.75, 0.9])
terrain = self.make_terrain(choice, difficulty)
            # (i, j) selects the row/col position in the whole terrain
self.add_terrain_to_map(terrain, i, j)
    def curriculum(self):
for j in range(self.cfg.num_cols):
for i in range(self.cfg.num_rows):
difficulty = i / self.cfg.num_rows
choice = j / self.cfg.num_cols + 0.001
terrain = self.make_terrain(choice, difficulty)
self.add_terrain_to_map(terrain, i, j)
def selected_terrain(self):
terrain_type = self.cfg.terrain_kwargs.pop('type')
for k in range(self.cfg.num_sub_terrains):
# Env coordinates in the world
(i, j) = np.unravel_index(k, (self.cfg.num_rows, self.cfg.num_cols))
terrain = terrain_utils.SubTerrain("terrain",
width=self.width_per_env_pixels,
length=self.width_per_env_pixels,
vertical_scale=self.vertical_scale,
horizontal_scale=self.horizontal_scale)
eval(terrain_type)(terrain, **self.cfg.terrain_kwargs.terrain_kwargs)
self.add_terrain_to_map(terrain, i, j)
    # choice selects the terrain type; difficulty selects the row (higher rows are harder)
def make_terrain(self, choice, difficulty):
terrain = terrain_utils.SubTerrain( "terrain",
width=self.width_per_env_pixels,
length=self.width_per_env_pixels,
vertical_scale=self.cfg.vertical_scale,
horizontal_scale=self.cfg.horizontal_scale)
        rough_flat_min_height = - self.cfg.rough_flat_range[0] - difficulty * (self.cfg.rough_flat_range[1] - self.cfg.rough_flat_range[0]) / self.max_difficulty
        rough_flat_max_height = self.cfg.rough_flat_range[0] + difficulty * (self.cfg.rough_flat_range[1] - self.cfg.rough_flat_range[0]) / self.max_difficulty
        slope = self.cfg.slope_range[0] + difficulty * (self.cfg.slope_range[1] - self.cfg.slope_range[0]) / self.max_difficulty
        rough_slope_min_height = - self.cfg.rough_slope_range[0] - difficulty * (self.cfg.rough_slope_range[1] - self.cfg.rough_slope_range[0]) / self.max_difficulty
        rough_slope_max_height = self.cfg.rough_slope_range[0] + difficulty * (self.cfg.rough_slope_range[1] - self.cfg.rough_slope_range[0]) / self.max_difficulty
stair_width = self.cfg.stair_width_range[0] + difficulty * (self.cfg.stair_width_range[1] - self.cfg.stair_width_range[0]) / self.max_difficulty
stair_height = self.cfg.stair_height_range[0] + difficulty * (self.cfg.stair_height_range[1] - self.cfg.stair_height_range[0]) / self.max_difficulty
discrete_obstacles_height = self.cfg.discrete_height_range[0] + difficulty * (self.cfg.discrete_height_range[1] - self.cfg.discrete_height_range[0]) / self.max_difficulty
gap_size = 1. * difficulty
pit_depth = 1. * difficulty
amplitude = 0.2 + 0.333 * difficulty
if choice < self.proportions[0]:
idx = 1
return terrain
elif choice < self.proportions[1]:
idx = 2
            terrain_utils.random_uniform_terrain(terrain,
                                                 min_height=rough_flat_min_height,
                                                 max_height=rough_flat_max_height,
                                                 step=0.005,
                                                 downsampled_scale=0.2)
elif choice < self.proportions[3]:
idx = 4
if choice < self.proportions[2]:
idx = 3
slope *= -1
terrain_utils.pyramid_sloped_terrain(terrain,
slope=slope,
platform_size=self.platform)
            terrain_utils.random_uniform_terrain(terrain,
                                                 min_height=rough_slope_min_height,
                                                 max_height=rough_slope_max_height,
                                                 step=0.005,
                                                 downsampled_scale=0.2)
elif choice < self.proportions[5]:
idx = 6
if choice < self.proportions[4]:
idx = 5
slope *= -1
terrain_utils.pyramid_sloped_terrain(terrain,
slope=slope,
platform_size=self.platform)
elif choice < self.proportions[7]:
idx = 8
if choice<self.proportions[6]:
idx = 7
stair_height *= -1
terrain_utils.pyramid_stairs_terrain(terrain,
step_width=stair_width,
step_height=stair_height,
platform_size=self.platform)
elif choice < self.proportions[8]:
idx = 9
num_rectangles = 20
rectangle_min_size = 1.
rectangle_max_size = 2.
terrain_utils.discrete_obstacles_terrain(terrain,
discrete_obstacles_height,
rectangle_min_size,
rectangle_max_size,
num_rectangles,
platform_size=self.platform)
elif choice < self.proportions[9]:
idx = 10
terrain_utils.wave_terrain(terrain,
num_waves=3,
amplitude=amplitude)
elif choice < self.proportions[10]:
idx = 11
gap_terrain(terrain,
gap_size=gap_size,
platform_size=self.platform)
else:
idx = 12
pit_terrain(terrain,
depth=pit_depth,
platform_size=self.platform)
self.idx = idx
return terrain
    # row/col select the sub-terrain position in the whole terrain
def add_terrain_to_map(self, terrain, row, col):
i = row
j = col
# map coordinate system
start_x = self.border + i * self.length_per_env_pixels
end_x = self.border + (i + 1) * self.length_per_env_pixels
start_y = self.border + j * self.width_per_env_pixels
end_y = self.border + (j + 1) * self.width_per_env_pixels
self.height_field_raw[start_x: end_x, start_y:end_y] = terrain.height_field_raw
env_origin_x = (i + 0.5) * self.env_length
env_origin_y = (j + 0.5) * self.env_width
x1 = int((self.env_length/2. - 1) / terrain.horizontal_scale)
x2 = int((self.env_length/2. + 1) / terrain.horizontal_scale)
y1 = int((self.env_width/2. - 1) / terrain.horizontal_scale)
y2 = int((self.env_width/2. + 1) / terrain.horizontal_scale)
env_origin_z = np.max(terrain.height_field_raw[x1:x2, y1:y2])*terrain.vertical_scale
self.env_origins[i, j] = [env_origin_x, env_origin_y, env_origin_z]
self.terrain_type[i, j] = self.idx
def gap_terrain(terrain, gap_size, platform_size=1.):
gap_size = int(gap_size / terrain.horizontal_scale)
platform_size = int(platform_size / terrain.horizontal_scale)
center_x = terrain.length // 2
center_y = terrain.width // 2
x1 = (terrain.length - platform_size) // 2
x2 = x1 + gap_size
y1 = (terrain.width - platform_size) // 2
y2 = y1 + gap_size
terrain.height_field_raw[center_x-x2 : center_x + x2, center_y-y2 : center_y + y2] = -1000
terrain.height_field_raw[center_x-x1 : center_x + x1, center_y-y1 : center_y + y1] = 0
def pit_terrain(terrain, depth, platform_size=1.):
depth = int(depth / terrain.vertical_scale)
platform_size = int(platform_size / terrain.horizontal_scale / 2)
x1 = terrain.length // 2 - platform_size
x2 = terrain.length // 2 + platform_size
y1 = terrain.width // 2 - platform_size
y2 = terrain.width // 2 + platform_size
terrain.height_field_raw[x1:x2, y1:y2] = -depth
<mujoco model="flat">
<statistic center="0 0 0.55" extent="1.1"/>
<visual>
<headlight diffuse="0.6 0.6 0.6" ambient="0.3 0.3 0.3" specular="0 0 0"/>
<rgba haze="0.15 0.25 0.35 1"/>
<global azimuth="150" elevation="-20"/>
</visual>
<visual>
<rgba com="0.502 1.0 0 0.5" contactforce="0.98 0.4 0.4 1" contactpoint="1.0 1.0 0.6 0.4"/>
<scale com="1" forcewidth="0.03" contactwidth="0.01" contactheight="0.02" framewidth="0.05" framelength="0.6"/>
<map force="0.005"/>
</visual>
<asset>
<texture name="skybox" type="skybox" builtin="gradient" rgb1="0.2 0.3 0.4" rgb2="0 0 0" width="1000" height="1000" mark="random" random="0.001" markrgb="1 1 1"/>
<texture type="2d" name="groundplane" builtin="checker" mark="edge" rgb1="0.2 0.3 0.4" rgb2="0.1 0.2 0.3" markrgb="0.8 0.8 0.8" width="1000" height="1000"/>
<material name="groundplane" texture="groundplane" texuniform="true" texrepeat="5 5" reflectance="0.2"/>
</asset>
<worldbody>
<light pos="0 0 10" dir="0 0 -1" directional="true"/>
<geom name="floor" size="0 3 .125" type="plane" material="groundplane" conaffinity="7" condim="3" friction="1"/>
</worldbody>
</mujoco>
<mujoco model="zhiyuan scene">
<compiler angle='radian' eulerseq="XYZ" meshdir="../meshes"
autolimits="true" />
<option timestep="0.001" />
<visual>
<global azimuth="150" elevation="-20" />
<headlight ambient="0.3 0.3 0.3" diffuse="0.6 0.6 0.6" specular="0 0 0" />
<scale forcewidth="0.03" contactwidth="0.01" contactheight="0.02" com="1" framelength="0.6"
framewidth="0.05" />
<rgba haze="0.15 0.25 0.35 1" com="0.502 1 0 0.5" contactpoint="1 1 0.6 0.4"
contactforce="0.98 0.4 0.4 1" />
</visual>
<!-- <statistic extent="1.1" center="0 0 0.55" /> -->
<default>
<geom contype = "0" conaffinity = "0" solref = "0.005 1" condim = "3" friction = "1 1" />
<equality solref = "0.005 1" />
<default class = "zhiyuan">
</default>
<default class = "collision">
<geom contype = "1" group = "3" type = "capsule" />
<default class = "collision-left">
<geom contype = "2" conaffinity = "4" />
</default>
<default class = "collision-right">
<geom contype = "4" conaffinity = "2" />
</default>
</default>
</default>
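<!-- Collision bitmasks: visual geoms (contype 0, conaffinity 0) never collide; "collision" geoms (contype 1) pair only with the ground plane (conaffinity 7); "collision-left" (contype 2, conaffinity 4) and "collision-right" (contype 4, conaffinity 2) collide with each other and the ground, but not with themselves. -->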
<asset>
<mesh name="base_link_simple" file="base_link_simple.STL" />
<mesh name="body_yaw" file="lumber_yaw.STL" />
<mesh name="body_roll" file="lumber_roll.STL" />
<mesh name="body_pitch" file="lumber_pitch.STL" />
<mesh name="left_shoulder_pitch" file="left_shoulder_pitch.STL" />
<mesh name="left_shoudler_roll" file="left_shoulder_roll.STL" />
<mesh name="left_shoulder_yaw" file="left_shoulder_yaw.STL" />
<mesh name="left_elbow_pitch" file="left_elbow_pitch.STL" />
<mesh name="left_elbow_yaw" file="left_elbow_yaw.STL" />
<mesh name="left_wrist_pitch" file="left_wrist_pitch.STL" />
<mesh name="left_wrist_roll" file="left_wrist_roll.STL" />
<mesh name="right_shoulder_pitch" file="right_shoulder_pitch.STL" />
<mesh name="right_shoulder_roll" file="right_shoulder_roll.STL" />
<mesh name="right_shoulder_yaw" file="right_shoulder_yaw.STL" />
<mesh name="right_elbow_pitch" file="right_elbow_pitch.STL" />
<mesh name="right_elbow_yaw" file="right_elbow_yaw.STL" />
<mesh name="right_wrist_pitch" file="right_wrist_pitch.STL" />
<mesh name="right_wrist_roll" file="right_wrist_roll.STL" />
<mesh name="left_hip_pitch" file="left_hip_pitch.STL" />
<mesh name="left_hip_roll" file="left_hip_roll.STL" />
<mesh name="left_hip_yaw" file="left_hip_yaw.STL" />
<mesh name="left_knee_pitch" file="left_knee_pitch.STL" />
<mesh name="left_ankle_pitch" file="left_ankle_pitch.STL" />
<mesh name="left_ankle_roll" file="left_ankle_roll.STL" />
<mesh name="right_hip_pitch" file="right_hip_pitch.STL" />
<mesh name="right_hip_roll" file="right_hip_roll.STL" />
<mesh name="right_hip_yaw" file="right_hip_yaw.STL" />
<mesh name="right_knee_pitch" file="right_knee_pitch.STL" />
<mesh name="right_ankle_pitch" file="right_ankle_pitch.STL" />
<mesh name="right_ankle_roll" file="right_ankle_roll.STL" />
</asset>
<worldbody>
<body name = "x1-body" pos = "0 0 0.8">
<inertial pos="0.00252285 -0.00063439 0.03023409" mass="4.3041648" fullinertia="0.02680559 0.01083128 0.02180955 -5.49e-06 5.389e-05 -0.00011229 "/>
<freejoint name="floating_base"/>
<geom type="mesh" rgba="0.8 0.4 0 1" mesh="base_link_simple" />
<geom type="mesh" rgba="0.8 0.4 0 1" mesh="base_link_simple" class = "collision"/>
<camera name="track" pos="0 -3 1" quat="0.850651 0.525731 0 0" mode="track" />
<site name = "imu" size = "0.01" pos = "0 0 0" />
<body name="body_yaw" pos="0.00245 0 0.115534" quat="1 0 0 -2.27456e-05">
<inertial pos="-0.0180007 1.4e-06 0.0243298" quat="-7.72625e-06 0.78309 -2.54549e-05 0.621909" mass="0.362519" diaginertia="0.000655093 0.00045823 0.000342457"/>
<geom type='mesh' mesh='body_yaw' pos='0 0 0' euler='-0 0 -0' rgba="1 0.4 0.4 1" />
<body name="body_roll" pos="0 0 0.0405" quat="0.5 0.5 -0.5 -0.5">
<inertial pos="-0.00029919 0.0003642 -0.00019197" quat="0.773215 -0.138631 -0.0225684 0.618394" mass="0.0341207" diaginertia="5.46439e-06 4.90308e-06 4.81253e-06"/>
<geom type='mesh' mesh='body_roll' pos='0 0 0' euler='-0 0 -0' rgba="1 1 0.4 1"/>
<body name="body_pitch" pos="0 0 0" quat="0.707107 0 0.707107 0">
<inertial pos="-0.000617851 0.206789 -0.00114246" quat="0.70235 0.7111 -0.0266182 -0.0182289" mass="9.08107" diaginertia="0.154848 0.118223 0.0621153"/>
<geom type='mesh' mesh='body_pitch' pos='0 0 0' rgba="0.4 1 0.4 1" />
<geom type='mesh' mesh='body_pitch' pos='0 0 0' rgba="0.4 1 0.4 1" class="collision"/>
<body name="left_shoulder_pitch" pos="0 0.256 -0.1458" quat="0 0 0 -1" gravcomp="0">
<inertial pos="-0.00251212 -0.00149141 -0.0567447" quat="0.995975 -0.00474479 -0.0345991 0.0825443" mass="1.00679" diaginertia="0.00108562 0.000958162 0.000853398"/>
<geom type='mesh' mesh='left_shoulder_pitch' pos='0 0 0' euler='-0 0 -0'
rgba="1 0.4 0.4 1" />
<body name="left_shoudler_roll" pos="-0.0313 0 -0.0592" quat="0.707105 0 -0.707108 0" gravcomp="0">
<inertial pos="0.000154 0.0778464 -0.0271134" quat="0.765297 0.643563 -0.00747942 0.00959688" mass="0.691695" diaginertia="0.00133677 0.00119611 0.000636982"/>
<geom type='mesh' mesh='left_shoulder_roll' pos='0 0 0' euler='-0 0 -0'
rgba="1 1 0.4 1" />
<body name="left_shoulder_yaw" pos="0 0.1252 -0.0313" quat="0.5 0.5 0.5 -0.5" gravcomp="0">
<inertial pos="-2.68e-06 0.00182113 -0.00392718" quat="0.695603 -0.0321904 0.0318975 0.716996" mass="0.725594" diaginertia="0.00118335 0.00116041 0.000481204"/>
<geom type='mesh' mesh='left_shoulder_yaw' pos='0 0 0' euler='-0 0 -0'
rgba="0.4 1 0.4 1" />
<body name="left_elbow_pitch" pos="0 -0.031 -0.0365" quat="0.707107 -0.707107 0 0" gravcomp="0">
<inertial pos="2.604e-05 0.0752824 0.0268658" quat="0.597053 0.802202 -2.17373e-05 0.000362389" mass="0.697801" diaginertia="0.00116267 0.00100065 0.000614359"/>
<geom type='mesh' mesh='left_elbow_pitch' pos='0 0 0' euler='-0 0 -0'
rgba="0 0.5 1 1" />
<body name="left_elbow_yaw" pos="0 0.117 0.031" quat="0.499999 0.499999 -0.500001 0.500001" gravcomp="0">
<inertial pos="-0.00441829 -0.0011133 -0.0671345" quat="0.990089 0.12403 0.00491944 -0.065691" mass="0.803869" diaginertia="0.00198622 0.00186177 0.000729748"/>
<geom type='mesh' mesh='left_elbow_yaw' pos='0 0 0' euler='-0 0 -0'
rgba="0.8 1.0 0.8 1" />
<body name="left_wrist_pitch" pos="0.006 9.9998e-05 -0.1394" quat="0.707105 0.707108 0 0" gravcomp="0">
<inertial pos="-9.994e-05 0 0.00229425" quat="0.5 0.5 0.5 0.5" mass="0.00900738" diaginertia="5.7e-07 3.8e-07 2.4e-07"/>
<geom type='mesh' mesh='left_wrist_pitch' pos='0 0 0' euler='-0 0 -0'
rgba="1 0.5 1 1" />
<body name="left_wrist_roll" pos="0 0 0" quat="0.707106 7.94894e-07 0.707108 7.94894e-07" gravcomp="0">
<inertial pos="-0.0206623 -0.0314139 -0.00356965" quat="0.373433 0.844098 -0.329445 0.198775" mass="0.370899" diaginertia="0.000404077 0.000356841 0.000211598"/>
<geom type='mesh' mesh='left_wrist_roll' pos='0 0 0' euler='-0 0 -0'
rgba="1 0.4 0.4 1" />
</body>
</body>
</body>
</body>
</body>
</body>
</body>
<body name="right_shoulder_pitch" pos="0 0.256 0.1458" quat="-3.67321e-06 1 0 0" gravcomp="0">
<inertial pos="0.00250704 -0.00149173 -0.0567433" quat="0.995977 -0.00568417 0.0346063 -0.0824611" mass="1.00679" diaginertia="0.00108562 0.000958168 0.000853396"/>
<geom type='mesh' mesh='right_shoulder_pitch' pos='0 0 0' euler='-0 0 -0'
rgba="1 0.4 0.4 1" />
<body name="right_shoulder_roll" pos="0.0313 0 -0.0592" quat="-2.59734e-06 0.707105 2.59735e-06 0.707108" gravcomp="0">
<inertial pos="9.998e-05 -0.0778724 -0.0270563" quat="0.64355 0.765355 0.0062383 -0.00599871" mass="0.691426" diaginertia="0.00133761 0.00119704 0.000636294"/>
<geom type='mesh' mesh='right_shoulder_roll' pos='0 0 0' euler='-0 0 -0'
rgba="1 1 0.4 1" />
<body name="right_shoulder_yaw" pos="0 -0.1252 -0.0313" quat="0.707105 0.707108 0 0" gravcomp="0">
<inertial pos="-0.00189082 -0.00017102 0.0317511" quat="0.694675 -0.204745 -0.19648 0.660985" mass="0.604503" diaginertia="0.000550567 0.000505158 0.000491905"/>
<geom type='mesh' mesh='right_shoulder_yaw' pos='0 0 0' euler='-0 0 -0'
rgba="0.4 1 0.4 1" />
<body name="right_elbow_pitch" pos="-0.031 0 0.0365" quat="0.5 -0.5 0.5 -0.5" gravcomp="0">
<inertial pos="-2.604e-05 -0.0752983 0.0268563" quat="0.802034 0.597278 0.000362597 -2.24278e-05" mass="0.697801" diaginertia="0.00116302 0.00100116 0.000614171"/>
<geom type='mesh' mesh='right_elbow_pitch' pos='0 0 0' euler='-0 0 -0'
rgba="0 0.5 1 1" />
<body name="right_elbow_yaw" pos="0 -0.117 0.031" quat="0.499999 -0.499999 -0.500001 -0.500001" gravcomp="0">
<inertial pos="-0.00403154 -0.00910945 -0.047197" quat="0.994559 0.0793425 -0.0232128 -0.0633939" mass="0.573838" diaginertia="0.00105729 0.000985489 0.000647245"/>
<geom type='mesh' mesh='right_elbow_yaw' pos='0 0 0' euler='-0 0 -0'
rgba="0.8 1.0 0.8 1" />
<body name="right_wrist_pitch" pos="0.006 9.9998e-05 -0.1394" quat="0.707105 0.707108 0 0" gravcomp="0">
<inertial pos="-9.994e-05 0 0.00229425" quat="0.5 0.5 0.5 0.5" mass="0.00900738" diaginertia="5.7e-07 3.8e-07 2.4e-07"/>
<geom type='mesh' mesh='right_wrist_pitch' pos='0 0 0' euler='-0 0 -0'
rgba="1 0.5 1 1" />
<body name="right_wrist_roll" pos="0 0 0" quat="0.707106 7.94894e-07 0.707108 7.94894e-07" gravcomp="0">
<inertial pos="-0.0206623 -0.0314139 -0.00356965" quat="0.373433 0.844098 -0.329445 0.198775" mass="0.370899" diaginertia="0.000404077 0.000356841 0.000211598"/>
<geom type='mesh' mesh='right_wrist_roll' pos='0 0 0' euler='-0 0 -0'
rgba="1 0.4 0.4 1" />
</body>
</body>
</body>
</body>
</body>
</body>
</body>
</body>
</body>
</body>
<body name="left_hip_pitch_link" pos="0.00245 0.092277 -0.012143" quat="0.65328 0.270599 -0.270598 0.653282" gravcomp="0">
<inertial pos="-4.816e-05 -0.0107584 -0.0540778" quat="0.701067 -0.0888726 0.120485 0.697202" mass="1.67336" diaginertia="0.00256418 0.00234466 0.00215142"/>
<joint name="left_hip_pitch" type='hinge' pos="0 0 0" axis="0 0 1" damping='1'
range="-3.14 3.14" />
<geom type='mesh' mesh='left_hip_pitch' pos='0 0 0' euler='-0 0 -0' rgba="1 0.4 0.4 1"
/>
<body name="left_hip_roll_link" pos="0 -0.0405 -0.0589" quat="0.653281 0.653281 0.270598 -0.270598" gravcomp="0">
<inertial pos="0.00014604 -0.0412348 -0.0101534" quat="0.30122 0.638429 -0.639374 0.304757" mass="0.285622" diaginertia="0.000612958 0.000557294 0.000275288"/>
<joint name="left_hip_roll" type='hinge' pos="0 0 0" axis="0 0 -1"
damping='1'
range="-3.14 3.14" />
<geom type='mesh' mesh='left_hip_roll' pos='0 0 0' euler='-0 0 -0' rgba="1 1 0.4 1"
/>
<body name="left_hip_yaw_link" pos="0 -0.0838049 -0.0406" quat="0.707107 0.707107 0 0" gravcomp="0">
<inertial pos="-0.00212295 -3.062e-05 0.0916484" quat="0.999467 0.000387515 -0.0324521 -0.0035558" mass="2.73696" diaginertia="0.0127474 0.0124495 0.00338744"/>
<joint name="left_hip_yaw" type='hinge' pos="0 0 0" axis="0 0 -1"
damping='1'
range="-3.14 3.14" />
<geom type='mesh' mesh='left_hip_yaw' pos='0 0 0' euler='-0 0 -0'
rgba="0.4 1 0.4 1" />
<body name="lleft_knee_pitch_link" pos="-0.0337 0 0.1422" quat="0.5 -0.5 0.5 -0.5" gravcomp="0">
<inertial pos="-0.000466573 -0.136803 0.0283565" quat="0.549096 0.504801 -0.427727 0.510606" mass="1.68649" diaginertia="0.00989796 0.00939201 0.00196477"/>
<joint name="left_knee_pitch" type='hinge' pos="0 0 0" axis="0 0 1" damping='1'
range="-3.14 3.14" />
<geom type='mesh' mesh='left_knee_pitch' pos='0 0 0' euler='-0 0 -0' rgba="0 0.5 1 1"
/>
<body name="left_ankle_pitch_link" pos="0 -0.30494 0.0336" quat="0 0 -1 0" gravcomp="0">
<inertial pos="-2.32e-06 7.077e-05 -2.048e-05" quat="0.707107 0 0 0.707107" mass="0.0621721" diaginertia="2.373e-05 2.273e-05 6.68e-06"/>
<joint name="left_ankle_pitch" type='hinge' pos="0 0 0" axis="0 0 -1"
damping='1'
range="-3.14 3.14" />
<geom type='mesh' mesh='left_ankle_pitch' pos='0 0 0' euler='-0 0 -0'
rgba="0.8 1.0 0.8 1" />
<body name="left_ankle_roll_link" pos="0 0 0" quat="0.707107 0 0.707107 0" gravcomp="0">
<inertial pos="0.00017156 -0.025247 -0.00019936" quat="0.707585 0.0111979 -0.0139853 0.706401" mass="0.589712" diaginertia="0.00190069 0.00152114 0.000541391"/>
<joint name="left_ankle_roll" type='hinge' pos="0 0 0" axis="0 0 1"
damping='1'
range="-3.14 3.14" />
<geom type='mesh' mesh='left_ankle_roll' pos='0 0 0' euler='-0 0 -0'
rgba="1 0.5 1 1" />
<geom type = "sphere" size = "0.002" pos = "0.03 -0.0408 0.07" class = "collision" />
<geom type = "sphere" size = "0.002" pos = "-0.03 -0.0408 0.07" class = "collision" />
<geom type = "sphere" size = "0.002" pos = "0.03 -0.0408 -0.07" class = "collision" />
<geom type = "sphere" size = "0.002" pos = "-0.03 -0.0408 -0.07" class = "collision" />
<geom type = "sphere" size = "0.02" pos = "0.03 -0.0408 0.07" rgba = "255 0 0 0.5" />
<geom type = "sphere" size = "0.02" pos = "-0.03 -0.0408 0.07" rgba = "255 0 0 0.5" />
<geom type = "sphere" size = "0.02" pos = "0.03 -0.0408 -0.07" rgba = "255 0 0 0.5" />
<geom type = "sphere" size = "0.02" pos = "-0.03 -0.0408 -0.07" rgba = "255 0 0 0.5" />
</body>
</body>
</body>
</body>
</body>
</body>
<body name="right_hip_pitch_link" pos="0.00245 -0.092277 -0.012143" quat="0.65328 -0.270599 -0.270598 -0.653282" gravcomp="0">
<inertial pos="-0.00011915 0.0106148 -0.0540773" quat="0.704635 0.117558 -0.0919818 0.693692" mass="1.66657" diaginertia="0.00254519 0.00232121 0.00213313"/>
<joint name="right_hip_pitch" type='hinge' pos="0 0 0" axis="0 0 1"
damping='1'
range="-3.14 3.14" />
<geom type='mesh' mesh='right_hip_pitch' pos='0 0 0' euler='-0 0 -0'
rgba="1 0.4 0.4 1" />
<body name="right_hip_roll_link" pos="0 0.0405 -0.0589" quat="0.653281 0.653281 0.270598 -0.270598" gravcomp="0">
<inertial pos="-0.00027058 -0.0416462 0.0103426" quat="0.633195 0.29395 -0.315414 0.642785" mass="0.287688" diaginertia="0.000626566 0.000566795 0.000280389"/>
<joint name="right_hip_roll" type='hinge' pos="0 0 0" axis="0 0 1"
damping='1'
range="-3.14 3.14" />
<geom type='mesh' mesh='right_hip_roll' pos='0 0 0' euler='-0 0 -0' rgba="1 1 0.4 1"
/>
<body name="right_hip_yaw_link" pos="0 -0.0777549 0.0406" quat="0.7071 -0.7071 -0.00304418 0.00304418" gravcomp="0">
<inertial pos="-0.00241642 1.672e-05 -0.097047" quat="0.999478 -0.000259753 0.0322929 -0.00102692" mass="2.69635" diaginertia="0.0125794 0.0122506 0.00332533"/>
<joint name="right_hip_yaw" type='hinge' pos="0 0 0" axis="0 0 1"
damping='1'
range="-3.14 3.14" />
<geom type='mesh' mesh='right_hip_yaw' pos='0 0 0' euler='-0 0 -0'
rgba="0.4 1 0.4 1" />
<body name="right_knee_pitch_link" pos="-0.0347752 0 -0.147956" quat="0.497843 0.502148 0.502148 0.497843" gravcomp="0">
<inertial pos="0.000491832 -0.13745 0.0283321" quat="0.511647 0.42723 -0.504967 0.548361" mass="1.68377" diaginertia="0.00984174 0.00933685 0.00195957"/>
<joint name="right_knee_pitch" type='hinge' pos="0 0 0" axis="0 0 -1"
damping='1'
range="-3.14 3.14" />
<geom type='mesh' mesh='right_knee_pitch' pos='0 0 0' euler='-0 0 -0' rgba="0 0.5 1 1"
/>
<body name="right_ankle_pitch_link" pos="0 -0.30494 0.0336" quat="0 1 0 0" gravcomp="0">
<inertial pos="2.47e-06 -7.076e-05 -1.763e-05" quat="0.707107 -0.000207363 0.000207363 0.707107" mass="0.0621721" diaginertia="2.373e-05 2.273e-05 6.67999e-06"/>
<joint name="right_ankle_pitch" type='hinge' pos="0 0 0" axis="0 0 1"
damping='1'
range="-3.14 3.14" />
<geom type='mesh' mesh='right_ankle_pitch' pos='0 0 0' euler='-0 0 -0'
rgba="0.8 1.0 0.8 1" />
<body name="right_ankle_roll_link" pos="0 0 0" quat="0.707107 0 0.707107 0" gravcomp="0">
<inertial pos="-2.32e-06 0.0250789 0.00010556" quat="0.707007 -0.0123369 0.0123002 0.706992" mass="0.591828" diaginertia="0.00189406 0.00151845 0.00054237"/>
<joint name="right_ankle_roll" type='hinge' pos="0 0 0" axis="0 0 1"
damping='1'
range="-3.14 3.14" />
<geom type='mesh' mesh='right_ankle_roll' pos='0 0 0' euler='-0 0 -0'
rgba="1 0.5 1 1" />
<geom type = "sphere" size = "0.002" pos = "0.03 0.0408 0.07" class = "collision" />
<geom type = "sphere" size = "0.002" pos = "-0.03 0.0408 0.07" class = "collision" />
<geom type = "sphere" size = "0.002" pos = "0.03 0.0408 -0.07" class = "collision" />
<geom type = "sphere" size = "0.002" pos = "-0.03 0.0408 -0.07" class = "collision" />
<geom type = "sphere" size = "0.02" pos = "0.03 0.0408 0.07" rgba = "255 0 0 0.5" />
<geom type = "sphere" size = "0.02" pos = "-0.03 0.0408 0.07" rgba = "255 0 0 0.5" />
<geom type = "sphere" size = "0.02" pos = "0.03 0.0408 -0.07" rgba = "255 0 0 0.5" />
<geom type = "sphere" size = "0.02" pos = "-0.03 0.0408 -0.07" rgba = "255 0 0 0.5" />
</body>
</body>
</body>
</body>
</body>
</body>
</body>
</worldbody>
<actuator>
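<!-- Direct torque actuators: with the default gear of 1, ctrlrange bounds the commanded joint torque in Nm. -->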
<motor name = 'motor_left_hip_pitch' joint = 'left_hip_pitch' ctrlrange = '-150 150' />
<motor name = 'motor_left_hip_roll' joint = 'left_hip_roll' ctrlrange = '-50 50' />
<motor name = 'motor_left_hip_yaw' joint = 'left_hip_yaw' ctrlrange = '-50 50' />
<motor name = 'motor_left_knee_pitch' joint = 'left_knee_pitch' ctrlrange = '-150 150' />
<motor name = 'motor_left_ankle_pitch' joint = 'left_ankle_pitch' ctrlrange = '-18 18' />
<motor name = 'motor_left_ankle_roll' joint = 'left_ankle_roll' ctrlrange = '-18 18' />
<motor name = 'motor_right_hip_pitch' joint = 'right_hip_pitch' ctrlrange = '-150 150' />
<motor name = 'motor_right_hip_roll' joint = 'right_hip_roll' ctrlrange = '-50 50' />
<motor name = 'motor_right_hip_yaw' joint = 'right_hip_yaw' ctrlrange = '-50 50' />
<motor name = 'motor_right_knee_pitch' joint = 'right_knee_pitch' ctrlrange = '-150 150' />
<motor name = 'motor_right_ankle_pitch' joint = 'right_ankle_pitch' ctrlrange = '-18 18' />
<motor name = 'motor_right_ankle_roll' joint = 'right_ankle_roll' ctrlrange = '-18 18' />
</actuator>
<sensor>
<!-- imu sensor -->
<framequat name="body-orientation" objtype="site" objname="imu" noise="0"/>
<gyro name="body-angular-velocity" site="imu" noise="0.001"/>
<framepos name="body-linear-pos" objtype="site" objname="imu"/>
<velocimeter name="body-linear-vel" site="imu"/>
<accelerometer name="body-linear-acceleration" site="imu" noise="0.001"/>
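<!-- Per-joint position and velocity sensors; ordering matches the actuator list above. -->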
<jointpos name = 'jointpos_left_hip_pitch' joint = 'left_hip_pitch' noise = "0" />
<jointpos name = 'jointpos_left_hip_roll' joint = 'left_hip_roll' noise = "0" />
<jointpos name = 'jointpos_left_hip_yaw' joint = 'left_hip_yaw' noise = "0" />
<jointpos name = 'jointpos_left_knee_pitch' joint = 'left_knee_pitch' noise = "0" />
<jointpos name = 'jointpos_left_ankle_pitch' joint = 'left_ankle_pitch' noise = "0" />
<jointpos name = 'jointpos_left_ankle_roll' joint = 'left_ankle_roll' noise = "0" />
<jointpos name = 'jointpos_right_hip_pitch' joint = 'right_hip_pitch' noise = "0" />
<jointpos name = 'jointpos_right_hip_roll' joint = 'right_hip_roll' noise = "0" />
<jointpos name = 'jointpos_right_hip_yaw' joint = 'right_hip_yaw' noise = "0" />
<jointpos name = 'jointpos_right_knee_pitch' joint = 'right_knee_pitch' noise = "0" />
<jointpos name = 'jointpos_right_ankle_pitch' joint = 'right_ankle_pitch' noise = "0" />
<jointpos name = 'jointpos_right_ankle_roll' joint = 'right_ankle_roll' noise = "0" />
<jointvel name = 'jointvel_left_hip_pitch' joint = 'left_hip_pitch' noise = "0" />
<jointvel name = 'jointvel_left_hip_roll' joint = 'left_hip_roll' noise = "0" />
<jointvel name = 'jointvel_left_hip_yaw' joint = 'left_hip_yaw' noise = "0" />
<jointvel name = 'jointvel_left_knee_pitch' joint = 'left_knee_pitch' noise = "0" />
<jointvel name = 'jointvel_left_ankle_pitch' joint = 'left_ankle_pitch' noise = "0" />
<jointvel name = 'jointvel_left_ankle_roll' joint = 'left_ankle_roll' noise = "0" />
<jointvel name = 'jointvel_right_hip_pitch' joint = 'right_hip_pitch' noise = "0" />
<jointvel name = 'jointvel_right_hip_roll' joint = 'right_hip_roll' noise = "0" />
<jointvel name = 'jointvel_right_hip_yaw' joint = 'right_hip_yaw' noise = "0" />
<jointvel name = 'jointvel_right_knee_pitch' joint = 'right_knee_pitch' noise = "0" />
<jointvel name = 'jointvel_right_ankle_pitch' joint = 'right_ankle_pitch' noise = "0" />
<jointvel name = 'jointvel_right_ankle_roll' joint = 'right_ankle_roll' noise = "0" />
</sensor>
<keyframe>
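<!-- qpos layout: base position (3) + base orientation quaternion (4) + 12 leg joint angles, left leg then right. -->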
<key name = "home_default" qpos = "0 0 0.61 1 0 0 0 0.48891 0.06213 -0.33853 0.63204 -0.27224 0. -0.48891 -0.06213 0.33853 0.63204 -0.27224 0."/>
</keyframe>
</mujoco>
\ No newline at end of file
<mujoco model = "zhiyuan scene">
<include file = "robot/xyber_x1/xyber_x1_serial.xml" />
<include file = "environment/flat.xml" />
</mujoco>
<?xml version="1.0" encoding="utf-8"?>
<!-- This URDF was automatically created by SolidWorks to URDF Exporter! Originally created by Stephen Brawner (brawner@gmail.com)
Commit Version: 1.6.0-4-g7f85cfe Build Version: 1.6.7995.38578
For more information, please see http://wiki.ros.org/sw_urdf_exporter -->
<robot
name="x1">
<mujoco>
<compiler meshdir="../meshes/" balanceinertia="true" discardvisual="false" fusestatic="false"/>
</mujoco>
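<!-- Vendor extension read by MuJoCo's URDF importer; standard URDF parsers ignore this block. -->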
<link
name="base_link">
<inertial>
<origin
xyz="0.00252285 -0.00063439 0.03023409"
rpy="0 0 0" />
<mass
value="4.3041648" />
<inertia
ixx="0.02680559"
ixy="-5.49e-06"
ixz="5.389e-05"
iyy="0.01083128"
iyz="-0.00011229"
izz="0.02180955" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/base_link_simple.STL" />
</geometry>
<material
name="base_link_material">
<color
rgba="0.8 0.4 0 1" />
</material>
</visual>
<collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/base_link_simple.STL" />
</geometry>
</collision>
</link>
<link
name="lumber_yaw">
<inertial>
<origin
xyz="-0.01800066 1.4e-06 0.02432982"
rpy="0 0 0" />
<mass
value="0.36251906" />
<inertia
ixx="0.00035849"
ixy="0"
ixz="6.896e-05"
iyy="0.00045823"
iyz="-1e-08"
izz="0.00063906" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/lumber_yaw.STL" />
</geometry>
<material
name="lumber_yaw_material">
<color
rgba="1 0.4 0.4 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/lumber_yaw.STL" />
</geometry>
</collision> -->
</link>
<joint
name="lumber_yaw_joint"
type="fixed">
<origin
xyz="0.00244999999994827 0 0.115534033809005"
rpy="0 0 -4.54911392613975E-05" />
<parent
link="base_link" />
<child
link="lumber_yaw" />
<axis
xyz="0 0 1" />
<limit
lower="-3.14"
upper="3.14"
effort="0"
velocity="0" />
</joint>
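<!-- Torso and arm joints in this URDF are exported as type="fixed"; only the 12 leg joints further below are revolute. -->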
<link
name="lumber_roll">
<inertial>
<origin
xyz="-0.00029919 0.0003642 -0.00019197"
rpy="0 0 0" />
<mass
value="0.03412073" />
<inertia
ixx="4.93e-06"
ixy="1.3e-07"
ixz="0"
iyy="5.42e-06"
iyz="-9e-08"
izz="4.83e-06" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/lumber_roll.STL" />
</geometry>
<material
name="lumber_roll_material">
<color
rgba="1 0.4 0.4 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/lumber_roll.STL" />
</geometry>
</collision> -->
</link>
<joint
name="lumber_roll_joint"
type="fixed">
<origin
xyz="0 0 0.0405"
rpy="1.57079632679491 0 -1.5707963267949" />
<parent
link="lumber_yaw" />
<child
link="lumber_roll" />
<axis
xyz="0 0 1" />
<limit
lower="-3.14"
upper="3.14"
effort="0"
velocity="0" />
</joint>
<link
name="lumber_pitch">
<inertial>
<origin
xyz="0.00070244 0.21072612 -0.00117164"
rpy="0 0 0" />
<mass
value="8.8571074" />
<inertia
ixx="0.14777412"
ixy="-0.00392755"
ixz="0.00035997"
iyy="0.06085873"
iyz="-0.00078171"
izz="0.11164753" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/lumber_pitch.STL" />
</geometry>
<material
name="lumber_pitch_material">
<color
rgba="0.4 1 0.4 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/lumber_pitch.STL" />
</geometry>
</collision> -->
</link>
<joint
name="lumber_pitch_joint"
type="fixed">
<origin
xyz="0 0 0"
rpy="0 1.5707963267949 0" />
<parent
link="lumber_roll" />
<child
link="lumber_pitch" />
<axis
xyz="0 0 -1" />
<limit
lower="-3.14"
upper="3.14"
effort="0"
velocity="0" />
</joint>
<link
name="left_shoulder_pitch">
<inertial>
<origin
xyz="-0.00251212 -0.00149141 -0.05674475"
rpy="0 0 0" />
<mass
value="1.0067875" />
<inertia
ixx="0.00108106"
ixy="2.069e-05"
ixz="1.583e-05"
iyy="0.00096162"
iyz="1.04e-06"
izz="0.0008545" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/left_shoulder_pitch.STL" />
</geometry>
<material
name="left_shoulder_pitch_material">
<color
rgba="1 0.4 0.4 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/left_shoulder_pitch.STL" />
</geometry>
</collision> -->
</link>
<joint
name="left_shoulder_pitch_joint"
type="fixed">
<origin
xyz="0 0.255999999999995 -0.145800000000004"
rpy="0 0 -3.14159265358978" />
<parent
link="lumber_pitch" />
<child
link="left_shoulder_pitch" />
<axis
xyz="0 0 -1" />
<limit
lower="-3.14"
upper="3.14"
effort="0"
velocity="0" />
</joint>
<link
name="left_shoulder_roll">
<inertial>
<origin
xyz="0.000154 0.07784637 -0.02711338"
rpy="0 0 0" />
<mass
value="0.69169508" />
<inertia
ixx="0.00133669"
ixy="1.21e-06"
ixz="3.26e-06"
iyy="0.00065344"
iyz="9.451e-05"
izz="0.00117974" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/left_shoulder_roll.STL" />
</geometry>
<material
name="left_shoulder_roll_material">
<color
rgba="1 1 0.4 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/left_shoulder_roll.STL" />
</geometry>
</collision> -->
</link>
<joint
name="left_shoulder_roll_joint"
type="fixed">
<origin
xyz="-0.0313 0 -0.0592"
rpy="0 -1.5708 0" />
<parent
link="left_shoulder_pitch" />
<child
link="left_shoulder_roll" />
<axis
xyz="0 0 -1" />
<limit
lower="-3.14"
upper="3.14"
effort="0"
velocity="0" />
</joint>
<link
name="left_shoulder_yaw">
<inertial>
<origin
xyz="-2.68e-06 0.00182113 -0.00392718"
rpy="0 0 0" />
<mass
value="0.72559372" />
<inertia
ixx="0.00116043"
ixy="-5.8e-07"
ixz="1.27e-06"
iyy="0.00117758"
iyz="-6.33e-05"
izz="0.00048696" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/left_shoulder_yaw.STL" />
</geometry>
<material
name="left_shoulder_yaw_material">
<color
rgba="0.4 1 0.4 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/left_shoulder_yaw.STL" />
</geometry>
</collision> -->
</link>
<joint
name="left_shoulder_yaw_joint"
type="fixed">
<origin
xyz="0 0.1252 -0.0313000000000029"
rpy="1.57079632679488 1.5707963267949 0" />
<parent
link="left_shoulder_roll" />
<child
link="left_shoulder_yaw" />
<axis
xyz="0 0 1" />
<limit
lower="-3.14"
upper="3.14"
effort="0"
velocity="0" />
</joint>
<link
name="left_elbow_pitch">
<inertial>
<origin
xyz="2.604e-05 0.07528242 0.0268658"
rpy="0 0 0" />
<mass
value="0.69780135" />
<inertia
ixx="0.00116267"
ixy="2.7e-07"
ixz="1.6e-07"
iyy="0.00064619"
iyz="-0.00010622"
izz="0.00096882" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/left_elbow_pitch.STL" />
</geometry>
<material
name="left_elbow_pitch_material">
<color
rgba="0 0.5 1 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/left_elbow_pitch.STL" />
</geometry>
</collision> -->
</link>
<joint
name="left_elbow_pitch_joint"
type="fixed">
<origin
xyz="0 -0.031 -0.0365000000000021"
rpy="-1.5707963267949 0 0" />
<parent
link="left_shoulder_yaw" />
<child
link="left_elbow_pitch" />
<axis
xyz="0 0 1" />
<limit
lower="-3.14"
upper="3.14"
effort="0"
velocity="0" />
</joint>
<link
name="left_elbow_yaw">
<inertial>
<origin
xyz="3.656e-5 0.00424199 -0.05421422"
rpy="0 0 0" />
<mass
value="0.28787886" />
<inertia
ixx="0.00071486"
ixy="9e-08"
ixz="-3.7e-07"
iyy="0.00069793"
iyz="2.677e-05"
izz="0.00054487" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/left_elbow_yaw.STL" />
</geometry>
<material
name="left_elbow_yaw_material">
<color
rgba="0.8 1.0 0.8 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/left_elbow_yaw.STL" />
</geometry>
</collision> -->
</link>
<joint
name="left_elbow_yaw_joint"
type="fixed">
<origin
xyz="0 0.116999999999998 0.0310000000000001"
rpy="1.5707963267949 -1.5708 0" />
<parent
link="left_elbow_pitch" />
<child
link="left_elbow_yaw" />
<axis
xyz="0 0 1" />
<limit
lower="-3.14"
upper="3.14"
effort="0"
velocity="0" />
</joint>
<link
name="left_wrist_pitch">
<inertial>
<origin
xyz="-9.994e-05 0 0.00229425"
rpy="0 0 0" />
<mass
value="0.00900738" />
<inertia
ixx="2.4e-07"
ixy="0"
ixz="0"
iyy="5.7e-07"
iyz="0"
izz="3.8e-07" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/left_wrist_pitch.STL" />
</geometry>
<material
name="left_wrist_pitch_material">
<color
rgba="1 0.5 1 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/left_wrist_pitch.STL" />
</geometry>
</collision> -->
</link>
<joint
name="left_wrist_pitch_joint"
type="fixed">
<origin
xyz="0.006 9.9998E-05 -0.1394"
rpy="1.5708 0 0" />
<parent
link="left_elbow_yaw" />
<child
link="left_wrist_pitch" />
<axis
xyz="-1 0 0" />
<limit
lower="-3.14"
upper="3.14"
effort="0"
velocity="0" />
</joint>
<link
name="right_shoulder_pitch">
<inertial>
<origin
xyz="0.00250704 -0.00149173 -0.05674328"
rpy="0 0 0" />
<mass
value="1.0067875" />
<inertia
ixx="0.00108106"
ixy="-2.069e-05"
ixz="-1.583e-05"
iyy="0.00096162"
iyz="8.4e-07"
izz="0.0008545" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/right_shoulder_pitch.STL" />
</geometry>
<material
name="right_shoulder_pitch_material">
<color
rgba="1 0.4 0.4 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/right_shoulder_pitch.STL" />
</geometry>
</collision> -->
</link>
<joint
name="right_shoulder_pitch_joint"
type="fixed">
<origin
xyz="0 0.256 0.1458"
rpy="3.1416 0 0" />
<parent
link="lumber_pitch" />
<child
link="right_shoulder_pitch" />
<axis
xyz="0 0 1" />
<limit
lower="-3.14"
upper="3.14"
effort="0"
velocity="0" />
</joint>
<link
name="right_shoulder_roll">
<inertial>
<origin
xyz="9.998e-05 -0.07787239 -0.02705625"
rpy="0 0 0" />
<mass
value="0.69142604" />
<inertia
ixx="0.00133757"
ixy="-3.8e-07"
ixz="-2.53e-06"
iyy="0.00065281"
iyz="-9.481e-05"
izz="0.00118057" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/right_shoulder_roll.STL" />
</geometry>
<material
name="right_shoulder_roll_material">
<color
rgba="1 0.4 0.4 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/right_shoulder_roll.STL" />
</geometry>
</collision> -->
</link>
<joint
name="right_shoulder_roll_joint"
type="fixed">
<origin
xyz="0.0313 0 -0.0592"
rpy="3.1416 -1.5708 0" />
<parent
link="right_shoulder_pitch" />
<child
link="right_shoulder_roll" />
<axis
xyz="0 0 1" />
<limit
lower="-3.14"
upper="3.14"
effort="0"
velocity="0" />
</joint>
<link
name="right_shoulder_yaw">
<inertial>
<origin
xyz="-0.00189082 -0.00017102 0.03175107"
rpy="0 0 0" />
<mass
value="0.604503" />
<inertia
ixx="0.00050135"
ixy="2.4e-06"
ixz="6.05e-06"
iyy="0.00055045"
iyz="-1.7e-07"
izz="0.00049583" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/right_shoulder_yaw.STL" />
</geometry>
<material
name="right_shoulder_yaw_material">
<color
rgba="1 1 0.4 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/right_shoulder_yaw.STL" />
</geometry>
</collision> -->
</link>
<joint
name="right_shoulder_yaw_joint"
type="fixed">
<origin
xyz="0 -0.1252 -0.0313"
rpy="1.5708 0 0" />
<parent
link="right_shoulder_roll" />
<child
link="right_shoulder_yaw" />
<axis
xyz="0 0 1" />
<limit
lower="-3.14"
upper="3.14"
effort="0"
velocity="0" />
</joint>
<link
name="right_elbow_pitch">
<inertial>
<origin
xyz="-2.604e-05 -0.07529828 0.02685625"
rpy="0 0 0" />
<mass
value="0.69780135" />
<inertia
ixx="0.00116302"
ixy="2.7e-07"
ixz="-1.6e-07"
iyy="0.00064594"
iyz="0.00010623"
izz="0.00096939" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/right_elbow_pitch.STL" />
</geometry>
<material
name="right_elbow_pitch_material">
<color
rgba="0.4 1 0.4 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/right_elbow_pitch.STL" />
</geometry>
</collision> -->
</link>
<joint
name="right_elbow_pitch_joint"
type="fixed">
<origin
xyz="-0.0309999999999999 0 0.0365000000000022"
rpy="-1.5707963267949 0 -1.5707963267949" />
<parent
link="right_shoulder_yaw" />
<child
link="right_elbow_pitch" />
<axis
xyz="0 0 -1" />
<limit
lower="-3.14"
upper="3.14"
effort="0"
velocity="0" />
</joint>
<link
name="right_elbow_yaw">
<inertial>
<origin
xyz="-3.65e-5 -0.00422926 -0.05414537"
rpy="0 0 0" />
<mass
value="0.28835476" />
<inertia
ixx="0.00071572"
ixy="9e-08"
ixz="3.7e-07"
iyy="0.00069879"
iyz="-2.692e-05"
izz="0.00054493" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/right_elbow_yaw.STL" />
</geometry>
<material
name="right_elbow_yaw_material">
<color
rgba="0 0.5 1 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/right_elbow_yaw.STL" />
</geometry>
</collision> -->
</link>
<joint
name="right_elbow_yaw_joint"
type="fixed">
<origin
xyz="0 -0.116999999999998 0.0310000000000001"
rpy="-1.5707963267949 -1.5708 0" />
<parent
link="right_elbow_pitch" />
<child
link="right_elbow_yaw" />
<axis
xyz="0 0 -1" />
<limit
lower="-3.14"
upper="3.14"
effort="0"
velocity="0" />
</joint>
<link
name="right_wrist_pitch">
<inertial>
<origin
xyz="-9.994e-05 0 0.00229425"
rpy="0 0 0" />
<mass
value="0.00900738" />
<inertia
ixx="2.4e-07"
ixy="0"
ixz="0"
iyy="5.7e-07"
iyz="0"
izz="3.8e-07" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/right_wrist_pitch.STL" />
</geometry>
<material
name="right_wrist_pitch_material">
<color
rgba="0.8 1.0 0.8 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/right_wrist_pitch.STL" />
</geometry>
</collision> -->
</link>
<joint
name="right_wrist_pitch_joint"
type="fixed">
<origin
xyz="0.006 9.9998E-05 -0.1394"
rpy="1.5708 0 0" />
<parent
link="right_elbow_yaw" />
<child
link="right_wrist_pitch" />
<axis
xyz="1 0 0" />
<limit
lower="-3.14"
upper="3.14"
effort="0"
velocity="0" />
</joint>
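<!-- The waist_motor_a/b link, ball and loop bodies describe the waist actuation linkage; all of their joints are exported as fixed. -->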
<link
name="waist_motor_a_link">
<inertial>
<origin
xyz="-0.04046842 0 -0.00673515"
rpy="0 0 0" />
<mass
value="0.05818703" />
<inertia
ixx="1.004e-05"
ixy="-3e-08"
ixz="-1.16e-05"
iyy="5.053e-05"
iyz="0"
izz="4.77e-05" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/waist_motor_a_link.STL" />
</geometry>
<material
name="">
<color
rgba="0.776470588235294 0.756862745098039 0.737254901960784 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/waist_motor_a_link.STL" />
</geometry>
</collision> -->
</link>
<joint
name="waist_motor_a_link_joint"
type="fixed">
<origin
xyz="0 0.0749999999999926 0.0708000000001417"
rpy="0 0 0" />
<parent
link="lumber_pitch" />
<child
link="waist_motor_a_link" />
<axis
xyz="0 0 1" />
<limit
lower="-3.14"
upper="3.14"
effort="0"
velocity="0" />
</joint>
<link
name="waist_motor_a_ball">
<inertial>
<origin
xyz="-0.00179727 0.03749978 -3.559e-05"
rpy="0 0 0" />
<mass
value="0.03613902" />
<inertia
ixx="3.412e-05"
ixy="0"
ixz="0"
iyy="2.43e-06"
iyz="0"
izz="3.624e-05" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/waist_motor_a_ball.STL" />
</geometry>
<material
name="">
<color
rgba="0.776470588235294 0.756862745098039 0.737254901960784 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/waist_motor_a_ball.STL" />
</geometry>
</collision> -->
</link>
<joint
name="waist_motor_a_ball_joint"
type="fixed">
<origin
xyz="-0.065 0 -0.0108000000001461"
rpy="3.14159265358979 0 0" />
<parent
link="waist_motor_a_link" />
<child
link="waist_motor_a_ball" />
<axis
xyz="0 0 0" />
</joint>
<link
name="waist_motor_a_loop">
<inertial>
<origin
xyz="0 0 0.00052471"
rpy="0 0 0" />
<mass
value="0.01765296" />
<inertia
ixx="8e-07"
ixy="0"
ixz="0"
iyy="8e-07"
iyz="0"
izz="3.5e-07" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/waist_motor_a_loop.STL" />
</geometry>
<material
name="">
<color
rgba="1 1 1 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/waist_motor_a_loop.STL" />
</geometry>
</collision> -->
</link>
<joint
name="waist_motor_a_loop_joint"
type="fixed">
<origin
xyz="0 0.0749999999999924 0"
rpy="3.14159265358979 0 0" />
<parent
link="waist_motor_a_ball" />
<child
link="waist_motor_a_loop" />
<axis
xyz="0 0 0" />
</joint>
<link
name="waist_motor_b_link">
<inertial>
<origin
xyz="-0.04046842 0 -0.00673515"
rpy="0 0 0" />
<mass
value="0.05818703" />
<inertia
ixx="1.004e-05"
ixy="-3e-08"
ixz="-1.16e-05"
iyy="5.053e-05"
iyz="0"
izz="4.77e-05" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/waist_motor_b_link.STL" />
</geometry>
<material
name="">
<color
rgba="0.776470588235294 0.756862745098039 0.737254901960784 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/waist_motor_b_link.STL" />
</geometry>
</collision> -->
</link>
<joint
name="waist_motor_b_link_joint"
type="fixed">
<origin
xyz="0 0.0750000000000303 -0.0708000000000074"
rpy="3.14159265358979 0 0" />
<parent
link="lumber_pitch" />
<child
link="waist_motor_b_link" />
<axis
xyz="0 0 1" />
<limit
lower="-3.14"
upper="3.14"
effort="0"
velocity="0" />
</joint>
<link
name="waist_motor_b_ball">
<inertial>
<origin
xyz="3.559e-05 -0.03749978 0.00179727"
rpy="0 0 0" />
<mass
value="0.03613902" />
<inertia
ixx="3.624e-05"
ixy="0"
ixz="0"
iyy="2.43e-06"
iyz="0"
izz="3.412e-05" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/waist_motor_b_ball.STL" />
</geometry>
<material
name="">
<color
rgba="1 1 1 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/waist_motor_b_ball.STL" />
</geometry>
</collision> -->
</link>
<joint
name="waist_motor_b_ball_joint"
type="fixed">
<origin
xyz="-0.0650000000000492 0 -0.0108000000000005"
rpy="3.14159265358979 1.5707963267949 0" />
<parent
link="waist_motor_b_link" />
<child
link="waist_motor_b_ball" />
<axis
xyz="0 0 0" />
</joint>
<link
name="waist_motor_b_loop">
<inertial>
<origin
xyz="-0.00052471 0 0"
rpy="0 0 0" />
<mass
value="0.01765296" />
<inertia
ixx="3.5e-07"
ixy="0"
ixz="0"
iyy="8e-07"
iyz="0"
izz="8e-07" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/waist_motor_b_loop.STL" />
</geometry>
<material
name="">
<color
rgba="1 1 1 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/waist_motor_b_loop.STL" />
</geometry>
</collision> -->
</link>
<joint
name="waist_motor_b_loop_joint"
type="fixed">
<origin
xyz="0 -0.0749999999999924 0"
rpy="0 0 0" />
<parent
link="waist_motor_b_ball" />
<child
link="waist_motor_b_loop" />
<axis
xyz="0 0 0" />
</joint>
<link
name="left_hip_pitch">
<inertial>
<origin
xyz="-4.816e-05 -0.01075844 -0.05407776"
rpy="0 0 0" />
<mass
value="1.6733645" />
<inertia
ixx="0.00234427"
ixy="-2.8e-06"
ixz="-8.23e-06"
iyy="0.00252881"
iyz="-0.00011548"
izz="0.00218719" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/left_hip_pitch.STL" />
</geometry>
<material
name="left_hip_pitch_material">
<color
rgba="1 0.4 0.4 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/left_hip_pitch.STL" />
</geometry>
</collision> -->
</link>
<joint
name="left_hip_pitch_joint"
type="revolute">
<origin
xyz="0.00245 0.092277 -0.012143"
rpy="0 -0.7854 1.5708" />
<parent
link="base_link" />
<child
link="left_hip_pitch" />
<axis
xyz="0 0 1" />
<limit
lower="-3.14"
upper="3.14"
effort="150"
velocity="8" />
</joint>
<link
name="left_hip_roll">
<inertial>
<origin
xyz="0.00014604 -0.04123483 -0.01015339"
rpy="0 0 0" />
<mass
value="0.28562185" />
<inertia
ixx="0.00055729"
ixy="9.8e-07"
ixz="5.6e-07"
iyy="0.0004105"
iyz="-0.00016545"
izz="0.00047775" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/left_hip_roll.STL" />
</geometry>
<material
name="left_hip_roll_material">
<color
rgba="1 1 0.4 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/left_hip_roll.STL" />
</geometry>
</collision> -->
</link>
<joint
name="left_hip_roll_joint"
type="revolute">
<origin
xyz="0 -0.0405000000000076 -0.0589000000000007"
rpy="1.57079632679491 0.785398163397451 0" />
<parent
link="left_hip_pitch" />
<child
link="left_hip_roll" />
<axis
xyz="0 0 -1" />
<limit
lower="-3.14"
upper="3.14"
effort="150"
velocity="8" />
</joint>
<link
name="left_hip_yaw">
<inertial>
<origin
xyz="-0.00212295 -3.062e-05 0.09164842"
rpy="0 0 0" />
<mass
value="2.7369623" />
<inertia
ixx="0.01270795"
ixy="-2.44e-06"
ixz="0.00060592"
iyy="0.01244948"
iyz="4.78e-06"
izz="0.00342683" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/left_hip_yaw.STL" />
</geometry>
<material
name="left_hip_yaw_material">
<color
rgba="0.4 1 0.4 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/left_hip_yaw.STL" />
</geometry>
</collision> -->
</link>
<joint
name="left_hip_yaw_joint"
type="revolute">
<origin
xyz="0 -0.0838049159474928 -0.0406000000000136"
rpy="1.5707963267949 0 0" />
<parent
link="left_hip_roll" />
<child
link="left_hip_yaw" />
<axis
xyz="0 0 -1" />
<limit
lower="-3.14"
upper="3.14"
effort="50"
velocity="24" />
</joint>
<link
name="left_knee_pitch">
<inertial>
<origin
xyz="-0.0047354 -0.13101324 0.0279688"
rpy="0 0 0" />
<mass
value="1.5122608" />
<inertia
ixx="0.00806663"
ixy="-5.703e-05"
ixz="3.669e-05"
iyy="0.00174565"
iyz="0.00090864"
izz="0.00825577" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/left_knee_pitch.STL" />
</geometry>
<material
name="left_knee_pitch_material">
<color
rgba="0 0.5 1 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/left_knee_pitch.STL" />
</geometry>
</collision> -->
</link>
<joint
name="left_knee_pitch_joint"
type="revolute">
<origin
xyz="-0.0337000000000097 0 0.142200000000001"
rpy="-1.57079632679488 0 -1.5707963267949" />
<parent
link="left_hip_yaw" />
<child
link="left_knee_pitch" />
<axis
xyz="0 0 1" />
<limit
lower="-3.14"
upper="3.14"
effort="150"
velocity="8" />
</joint>
<link
name="left_ankle_pitch">
<inertial>
<origin
xyz="-2.32e-06 7.077e-05 -2.048e-05"
rpy="0 0 0" />
<mass
value="0.06217211" />
<inertia
ixx="2.273e-05"
ixy="0"
ixz="0"
iyy="2.373e-05"
iyz="0"
izz="6.68e-06" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/left_ankle_pitch.STL" />
</geometry>
<material
name="left_ankle_pitch_material">
<color
rgba="0 0.5 1 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/left_ankle_pitch.STL" />
</geometry>
</collision> -->
</link>
<joint
name="left_ankle_pitch_joint"
type="revolute">
<origin
xyz="0 -0.304939999999997 0.0336000000000411"
rpy="-3.14159265358979 0 3.14159265358979" />
<parent
link="left_knee_pitch" />
<child
link="left_ankle_pitch" />
<axis
xyz="0 0 -1" />
<limit
lower="-3.14"
upper="3.14"
effort="80"
velocity="10" />
</joint>
<link
name="left_ankle_roll">
<inertial>
<origin
xyz="0.00017156 -0.02524695 -0.00019936"
rpy="0 0 0" />
<mass
value="0.58971207" />
<inertia
ixx="0.00152113"
ixy="4.7e-07"
ixz="3.91e-06"
iyy="0.00189897"
iyz="4.837e-05"
izz="0.00054313" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/left_ankle_roll.STL" />
</geometry>
<material
name="left_ankle_roll_material">
<color
rgba="1 0.5 1 1" />
</material>
</visual>
<collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/left_ankle_roll.STL" />
</geometry>
</collision>
</link>
<joint
name="left_ankle_roll_joint"
type="revolute">
<origin
xyz="0 0 0"
rpy="0 1.5707963267949 0" />
<parent
link="left_ankle_pitch" />
<child
link="left_ankle_roll" />
<axis
xyz="0 0 1" />
<limit
lower="-3.14"
upper="3.14"
effort="80"
velocity="10" />
</joint>
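<!-- The leg_l_toe_a/b link, ball and loop bodies describe the left toe linkage; all of their joints are exported as fixed. -->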
<link
name="leg_l_toe_a_link">
<inertial>
<origin
xyz="-0.00875902 0.00339112 -0.00875449"
rpy="0 0 0" />
<mass
value="0.03114168" />
<inertia
ixx="4.33e-06"
ixy="1.42e-06"
ixz="-1.59e-06"
iyy="7.48e-06"
iyz="6.1e-07"
izz="8.76e-06" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/leg_l_toe_a_link.STL" />
</geometry>
<material
name="">
<color
rgba="1 1 1 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/leg_l_toe_a_link.STL" />
</geometry>
</collision> -->
</link>
<joint
name="leg_l_toe_a_link_joint"
type="fixed">
<origin
xyz="0.0198500000000011 -0.109961932116273 0.0336568378288508"
rpy="0 -1.5707963267949 0" />
<parent
link="left_knee_pitch" />
<child
link="leg_l_toe_a_link" />
<axis
xyz="0 0 -1" />
<limit
lower="-3.14"
upper="3.14"
effort="0"
velocity="0" />
</joint>
<link
name="leg_l_toe_a_ball">
<inertial>
<origin
xyz="-1.914e-5 -0.0941377 1.9e-6"
rpy="0 0 0" />
<mass
value="0.05674707" />
<inertia
ixx="0.00022106"
ixy="-1e-07"
ixz="0"
iyy="1.09e-06"
iyz="-1e-08"
izz="0.00022113" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/leg_l_toe_a_ball.STL" />
</geometry>
<material
name="">
<color
rgba="1 1 1 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/leg_l_toe_a_ball.STL" />
</geometry>
</collision> -->
</link>
<joint
name="leg_l_toe_a_ball_joint"
type="fixed">
<origin
xyz="-0.0233806454082247 0.00902193211627456 -0.0211500000000038"
rpy="0 0 0" />
<parent
link="leg_l_toe_a_link" />
<child
link="leg_l_toe_a_ball" />
<axis
xyz="0 0 0" />
</joint>
<link
name="leg_l_toe_a_loop">
<inertial>
<origin
xyz="0 0 -0.0006502"
rpy="0 0 0" />
<mass
value="0.0062017" />
<inertia
ixx="2.8e-07"
ixy="0"
ixz="0"
iyy="2.8e-07"
iyz="0"
izz="1.2e-07" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/leg_l_toe_a_loop.STL" />
</geometry>
<material
name="">
<color
rgba="1 1 1 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/leg_l_toe_a_loop.STL" />
</geometry>
</collision> -->
</link>
<joint
name="leg_l_toe_a_loop_joint"
type="fixed">
<origin
xyz="0 -0.195 0"
rpy="0 1.5707963267949 0" />
<parent
link="leg_l_toe_a_ball" />
<child
link="leg_l_toe_a_loop" />
<axis
xyz="0 0 0" />
</joint>
<link
name="leg_l_toe_b_link">
<inertial>
<origin
xyz="0.0089831 0.00345524 -0.00906395"
rpy="0 0 0" />
<mass
value="0.03161" />
<inertia
ixx="4.6e-06"
ixy="-1.46e-06"
ixz="1.73e-06"
iyy="7.81e-06"
iyz="6.7e-07"
izz="8.88e-06" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/leg_l_toe_b_link.STL" />
</geometry>
<material
name="">
<color
rgba="1 1 1 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/leg_l_toe_b_link.STL" />
</geometry>
</collision> -->
</link>
<joint
name="leg_l_toe_b_link_joint"
type="fixed">
<origin
xyz="0.0198499999999967 -0.164961932116271 0.0335431621711256"
rpy="0 -1.5707963267949 0" />
<parent
link="left_knee_pitch" />
<child
link="leg_l_toe_b_link" />
<axis
xyz="0 0 -1" />
<limit
lower="-3.14"
upper="3.14"
effort="0"
velocity="0" />
</joint>
<link
name="leg_l_toe_b_ball">
<inertial>
<origin
xyz="-2.566e-5 -0.06676392 2.566e-5"
rpy="0 0 0" />
<mass
value="0.0423305" />
<inertia
ixx="9.172e-05"
ixy="-7e-08"
ixz="0"
iyy="9.5e-07"
iyz="-7e-08"
izz="9.179e-05" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/leg_l_toe_b_ball.STL" />
</geometry>
<material
name="">
<color
rgba="1 1 1 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/leg_l_toe_b_ball.STL" />
</geometry>
</collision> -->
</link>
<joint
name="leg_l_toe_b_ball_joint"
type="fixed">
<origin
xyz="0.0233806454082588 0.00902193211627189 -0.0211500000000081"
rpy="0 0 0" />
<parent
link="leg_l_toe_b_link" />
<child
link="leg_l_toe_b_ball" />
<axis
xyz="0 0 0" />
</joint>
<link
name="leg_l_toe_b_loop">
<inertial>
<origin
xyz="0 0 0.00064901"
rpy="0 0 0" />
<mass
value="0.0062017" />
<inertia
ixx="2.8e-07"
ixy="0"
ixz="0"
iyy="2.8e-07"
iyz="0"
izz="1.2e-07" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/leg_l_toe_b_loop.STL" />
</geometry>
<material
name="">
<color
rgba="1 1 1 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/leg_l_toe_b_loop.STL" />
</geometry>
</collision> -->
</link>
<joint
name="leg_l_toe_b_loop_joint"
type="fixed">
<origin
xyz="0 -0.14 0"
rpy="0 1.5707963267949 0" />
<parent
link="leg_l_toe_b_ball" />
<child
link="leg_l_toe_b_loop" />
<axis
xyz="0 0 0" />
</joint>
<link
name="right_hip_pitch">
<inertial>
<origin
xyz="-0.00011915 0.01061483 -0.05407727"
rpy="0 0 0" />
<mass
value="1.6665735" />
<inertia
ixx="0.00232109"
ixy="6.27e-06"
ixz="-4.66e-06"
iyy="0.00250972"
iyz="0.00011538"
izz="0.00216871" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/right_hip_pitch.STL" />
</geometry>
<material
name="right_hip_pitch_material">
<color
rgba="1 0.4 0.4 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/right_hip_pitch.STL" />
</geometry>
</collision> -->
</link>
<joint
name="right_hip_pitch_joint"
type="revolute">
<origin
xyz="0.00245 -0.092277 -0.012143"
rpy="0 -0.7854 -1.5708" />
<parent
link="base_link" />
<child
link="right_hip_pitch" />
<axis
xyz="0 0 1" />
<limit
lower="-3.14"
upper="3.14"
effort="150"
velocity="8" />
</joint>
<link
name="right_hip_roll">
<inertial>
<origin
xyz="-0.00027058 -0.04164617 0.01034263"
rpy="0 0 0" />
<mass
value="0.28768793" />
<inertia
ixx="0.0005667"
ixy="-5.75e-06"
ixz="2.7e-06"
iyy="0.00041717"
iyz="0.00016914"
izz="0.00048988" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/right_hip_roll.STL" />
</geometry>
<material
name="right_hip_roll_material">
<color
rgba="1 1 0.4 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/right_hip_roll.STL" />
</geometry>
</collision> -->
</link>
<joint
name="right_hip_roll_joint"
type="revolute">
<origin
xyz="0 0.0405000000001067 -0.0588999999999981"
rpy="1.57079632679491 0.78539816339745 0" />
<parent
link="right_hip_pitch" />
<child
link="right_hip_roll" />
<axis
xyz="0 0 1" />
<limit
lower="-3.14"
upper="3.14"
effort="150"
velocity="8" />
</joint>
<link
name="right_hip_yaw">
<inertial>
<origin
xyz="-0.00241642 1.672e-05 -0.09704704"
rpy="0 0 0" />
<mass
value="2.6963535" />
<inertia
ixx="0.01254086"
ixy="-9.4e-07"
ixz="-0.00059613"
iyy="0.01225062"
iyz="-3.99e-06"
izz="0.00336389" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/right_hip_yaw.STL" />
</geometry>
<material
name="right_hip_yaw_material">
<color
rgba="0.4 1 0.4 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/right_hip_yaw.STL" />
</geometry>
</collision> -->
</link>
<joint
name="right_hip_yaw_joint"
type="revolute">
<origin
xyz="0 -0.0777549159474913 0.0405999999999983"
rpy="-1.57079632680723 0 0.00861027512635925" />
<parent
link="right_hip_roll" />
<child
link="right_hip_yaw" />
<axis
xyz="-0 0 1" />
<limit
lower="-3.14"
upper="3.14"
effort="50"
velocity="24" />
</joint>
<link
name="right_knee_pitch">
<inertial>
<origin
xyz="0.00477062 -0.13172342 0.02793834"
rpy="0 0 0" />
<mass
value="1.50954" />
<inertia
ixx="0.0080222"
ixy="3.758e-05"
ixz="-3.424e-05"
iyy="0.00174185"
iyz="0.00090582"
izz="0.00821165" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/right_knee_pitch.STL" />
</geometry>
<material
name="right_knee_pitch_material">
<color
rgba="0 0.5 1 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/right_knee_pitch.STL" />
</geometry>
</collision> -->
</link>
<joint
name="right_knee_pitch_joint"
type="revolute">
<origin
xyz="-0.0347752157308957 0 -0.147956063988164"
rpy="1.57940660192126 0 1.57079632822694" />
<parent
link="right_hip_yaw" />
<child
link="right_knee_pitch" />
<axis
xyz="0 0 -1" />
<limit
lower="-3.14"
upper="3.14"
effort="150"
velocity="8" />
</joint>
<link
name="right_ankle_pitch">
<inertial>
<origin
xyz="2.47e-06 -7.076e-05 -1.763e-05"
rpy="0 0 0" />
<mass
value="0.06217211" />
<inertia
ixx="2.273e-05"
ixy="0"
ixz="0"
iyy="2.373e-05"
iyz="-1e-08"
izz="6.68e-06" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/right_ankle_pitch.STL" />
</geometry>
<material
name="right_ankle_pitch_material">
<color
rgba="0 0.5 1 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/right_ankle_pitch.STL" />
</geometry>
</collision> -->
</link>
<joint
name="right_ankle_pitch_joint"
type="revolute">
<origin
xyz="0 -0.304939999999999 0.0335999999999944"
rpy="3.14159265358979 0 0" />
<parent
link="right_knee_pitch" />
<child
link="right_ankle_pitch" />
<axis
xyz="0 0 1" />
<limit
lower="-3.14"
upper="3.14"
effort="80"
velocity="10" />
</joint>
<link
name="right_ankle_roll">
<inertial>
<origin
xyz="-2.32e-06 0.02507893 0.00010556"
rpy="0 0 0" />
<mass
value="0.59182763" />
<inertia
ixx="0.00151845"
ixy="1e-08"
ixz="5e-08"
iyy="0.00189242"
iyz="-4.706e-05"
izz="0.00054401" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/right_ankle_roll.STL" />
</geometry>
<material
name="right_ankle_roll_material">
<color
rgba="1 0.5 1 1" />
</material>
</visual>
<collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/right_ankle_roll.STL" />
</geometry>
</collision>
</link>
<joint
name="right_ankle_roll_joint"
type="revolute">
<origin
xyz="0 0 0"
rpy="0 1.5707963267949 0" />
<parent
link="right_ankle_pitch" />
<child
link="right_ankle_roll" />
<axis
xyz="0 0 1" />
<limit
lower="-3.14"
upper="3.14"
effort="80"
velocity="10" />
</joint>
<link
name="leg_r_toe_a_link">
<inertial>
<origin
xyz="0.00876658 0.00337152 -0.00875449"
rpy="0 0 0" />
<mass
value="0.03114168" />
<inertia
ixx="4.34e-06"
ixy="-1.43e-06"
ixz="1.59e-06"
iyy="7.47e-06"
iyz="6.2e-07"
izz="8.76e-06" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/leg_r_toe_a_link.STL" />
</geometry>
<material
name="">
<color
rgba="1 1 1 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/leg_r_toe_a_link.STL" />
</geometry>
</collision> -->
</link>
<joint
name="leg_r_toe_a_link_joint"
type="fixed">
<origin
xyz="-0.0198500000000012 -0.109961932116264 0.0336568378288726"
rpy="0 1.5707963267949 0" />
<parent
link="right_knee_pitch" />
<child
link="leg_r_toe_a_link" />
<axis
xyz="0 0 -1" />
<limit
lower="-3.14"
upper="3.14"
effort="0"
velocity="0" />
</joint>
<link
name="leg_r_toe_a_ball">
<inertial>
<origin
xyz="-1.914e-5 -0.0941377 1.914e-5"
rpy="0 0 0" />
<mass
value="0.05674707" />
<inertia
ixx="0.00022106"
ixy="-1e-07"
ixz="0"
iyy="1.09e-06"
iyz="-1e-07"
izz="0.00022113" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/leg_r_toe_a_ball.STL" />
</geometry>
<material
name="">
<color
rgba="1 1 1 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/leg_r_toe_a_ball.STL" />
</geometry>
</collision> -->
</link>
<joint
name="leg_r_toe_a_ball_joint"
type="fixed">
<origin
xyz="0.0233806454082594 0.00902193211626567 -0.0211499999999971"
rpy="0 0 0" />
<parent
link="leg_r_toe_a_link" />
<child
link="leg_r_toe_a_ball" />
<axis
xyz="0 0 0" />
</joint>
<link
name="leg_r_toe_a_loop">
<inertial>
<origin
xyz="0 0 0.00064901"
rpy="0 0 0" />
<mass
value="0.0062017" />
<inertia
ixx="2.8e-07"
ixy="0"
ixz="0"
iyy="2.8e-07"
iyz="0"
izz="1.2e-07" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/leg_r_toe_a_loop.STL" />
</geometry>
<material
name="">
<color
rgba="1 1 1 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/leg_r_toe_a_loop.STL" />
</geometry>
</collision> -->
</link>
<joint
name="leg_r_toe_a_loop_joint"
type="fixed">
<origin
xyz="0 -0.195000000000005 0"
rpy="0 1.5707963267949 0" />
<parent
link="leg_r_toe_a_ball" />
<child
link="leg_r_toe_a_loop" />
<axis
xyz="0 0 0" />
</joint>
<link
name="leg_r_toe_b_link">
<inertial>
<origin
xyz="-0.00897565 0.00347454 -0.00906395"
rpy="0 0 0" />
<mass
value="0.03161" />
<inertia
ixx="4.59e-06"
ixy="1.46e-06"
ixz="-1.73e-06"
iyy="7.82e-06"
iyz="6.7e-07"
izz="8.88e-06" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/leg_r_toe_b_link.STL" />
</geometry>
<material
name="">
<color
rgba="1 1 1 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/leg_r_toe_b_link.STL" />
</geometry>
</collision> -->
</link>
<joint
name="leg_r_toe_b_link_joint"
type="fixed">
<origin
xyz="-0.0198500000000051 -0.164961932116276 0.0335431621711194"
rpy="0 1.5707963267949 0" />
<parent
link="right_knee_pitch" />
<child
link="leg_r_toe_b_link" />
<axis
xyz="0 0 -1" />
<limit
lower="-3.14"
upper="3.14"
effort="0"
velocity="0" />
</joint>
<link
name="leg_r_toe_b_ball">
<inertial>
<origin
xyz="-2.566e-5 -0.06676392 2.566e-5"
rpy="0 0 0" />
<mass
value="0.0423305" />
<inertia
ixx="9.172e-05"
ixy="-7e-08"
ixz="0"
iyy="9.5e-07"
iyz="-7e-08"
izz="9.179e-05" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/leg_r_toe_b_ball.STL" />
</geometry>
<material
name="">
<color
rgba="1 1 1 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/leg_r_toe_b_ball.STL" />
</geometry>
</collision> -->
</link>
<joint
name="leg_r_toe_b_ball_joint"
type="fixed">
<origin
xyz="-0.0233806454082563 0.00902193211627755 -0.0211499999999931"
rpy="0 0 0" />
<parent
link="leg_r_toe_b_link" />
<child
link="leg_r_toe_b_ball" />
<axis
xyz="0 0 0" />
</joint>
<link
name="leg_r_toe_b_loop">
<inertial>
<origin
xyz="0 0 -0.00064901"
rpy="0 0 0" />
<mass
value="0.0062017" />
<inertia
ixx="2.8e-07"
ixy="0"
ixz="0"
iyy="2.8e-07"
iyz="0"
izz="1.2e-07" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/leg_r_toe_b_loop.STL" />
</geometry>
<material
name="">
<color
rgba="1 1 1 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/leg_r_toe_b_loop.STL" />
</geometry>
</collision> -->
</link>
<joint
name="leg_r_toe_b_loop_joint"
type="fixed">
<origin
xyz="0 -0.14 0"
rpy="0 1.5707963267949 0" />
<parent
link="leg_r_toe_b_ball" />
<child
link="leg_r_toe_b_loop" />
<axis
xyz="0 0 0" />
</joint>
<link
name="arm_r_wrist_a_ball">
<inertial>
<origin
xyz="0.00076586 -0.01156825 4.912e-05"
rpy="0 0 0" />
<mass
value="0.23598732" />
<inertia
ixx="7.634e-05"
ixy="1.26e-06"
ixz="-3e-08"
iyy="2.519e-05"
iyz="-1e-07"
izz="7.977e-05" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/arm_r_wrist_a_ball.STL" />
</geometry>
<material
name="">
<color
rgba="1 1 1 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/arm_r_wrist_a_ball.STL" />
</geometry>
</collision> -->
</link>
<joint
name="arm_r_wrist_a_ball_joint"
type="fixed">
<origin
xyz="-0.009 -0.02 -0.0199"
rpy="1.5708 0 0" />
<parent
link="right_elbow_yaw" />
<child
link="arm_r_wrist_a_ball" />
<axis
xyz="0 0 0" />
</joint>
<link
name="arm_r_wrist_motor_a_link">
<inertial>
<origin
xyz="0 0 0.00973737"
rpy="0 0 0" />
<mass
value="0.00117982" />
<inertia
ixx="6e-08"
ixy="0"
ixz="0"
iyy="8e-08"
iyz="0"
izz="2e-08" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/arm_r_wrist_motor_a_link.STL" />
</geometry>
<material
name="">
<color
rgba="1 1 1 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/arm_r_wrist_motor_a_link.STL" />
</geometry>
</collision> -->
</link>
<joint
name="arm_r_wrist_motor_a_link_joint"
type="fixed">
<origin
xyz="0.0027628 -0.097124 0"
rpy="1.5708 0 0.028438" />
<parent
link="arm_r_wrist_a_ball" />
<child
link="arm_r_wrist_motor_a_link" />
<axis
xyz="0 0 1" />
<limit
lower="-3.14"
upper="3.14"
effort="0"
velocity="0" />
</joint>
<link
name="arm_r_wrist_a_loop">
<inertial>
<origin
xyz="0 0 -1.922e-05"
rpy="0 0 0" />
<mass
value="0.00548041" />
<inertia
ixx="8e-08"
ixy="0"
ixz="0"
iyy="8e-08"
iyz="0"
izz="7e-08" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/arm_r_wrist_a_loop.STL" />
</geometry>
<material
name="">
<color
rgba="1 1 1 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/arm_r_wrist_a_loop.STL" />
</geometry>
</collision> -->
</link>
<joint
name="arm_r_wrist_a_loop_joint"
type="fixed">
<origin
xyz="0 0 0.0132"
rpy="-1.5708 -0.028438 0" />
<parent
link="arm_r_wrist_motor_a_link" />
<child
link="arm_r_wrist_a_loop" />
<axis
xyz="0 0 0" />
</joint>
<link
name="arm_r_wrist_b_ball">
<inertial>
<origin
xyz="0.00121967 0.04417257 0"
rpy="0 0 0" />
<mass
value="0.03617567" />
<inertia
ixx="1.73e-05"
ixy="-4.1e-07"
ixz="0"
iyy="4.69e-06"
iyz="0"
izz="1.731e-05" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/arm_r_wrist_b_ball.STL" />
</geometry>
<material
name="">
<color
rgba="1 1 1 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/arm_r_wrist_b_ball.STL" />
</geometry>
</collision> -->
</link>
<joint
name="arm_r_wrist_b_ball_joint"
type="fixed">
<origin
xyz="-0.009 0.02 -0.0199"
rpy="-1.5708 0 0" />
<parent
link="right_elbow_yaw" />
<child
link="arm_r_wrist_b_ball" />
<axis
xyz="0 0 0" />
</joint>
<link
name="arm_r_wrist_motor_b_link">
<inertial>
<origin
xyz="0 0 0.00973737"
rpy="0 0 0" />
<mass
value="0.00117982" />
<inertia
ixx="6e-08"
ixy="0"
ixz="0"
iyy="8e-08"
iyz="0"
izz="2e-08" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/arm_r_wrist_motor_b_link.STL" />
</geometry>
<material
name="">
<color
rgba="1 1 1 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/arm_r_wrist_motor_b_link.STL" />
</geometry>
</collision> -->
</link>
<joint
name="arm_r_wrist_motor_b_link_joint"
type="fixed">
<origin
xyz="0.0027628 0.097124 0"
rpy="-1.5708 0 -0.028438" />
<parent
link="arm_r_wrist_b_ball" />
<child
link="arm_r_wrist_motor_b_link" />
<axis
xyz="0 0 1" />
<limit
lower="-3.14"
upper="3.14"
effort="0"
velocity="0" />
</joint>
<link
name="arm_r_wrist_b_loop">
<inertial>
<origin
xyz="0 0 -1.922e-05"
rpy="0 0 0" />
<mass
value="0.00548041" />
<inertia
ixx="8e-08"
ixy="0"
ixz="0"
iyy="8e-08"
iyz="0"
izz="7e-08" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/arm_r_wrist_b_loop.STL" />
</geometry>
<material
name="">
<color
rgba="1 1 1 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/arm_r_wrist_b_loop.STL" />
</geometry>
</collision> -->
</link>
<joint
name="arm_r_wrist_b_loop_joint"
type="fixed">
<origin
xyz="0 0 0.0132"
rpy="1.5708 -0.028438 0" />
<parent
link="arm_r_wrist_motor_b_link" />
<child
link="arm_r_wrist_b_loop" />
<axis
xyz="0 0 0" />
</joint>
<link
name="arm_l_wrist_a_ball">
<inertial>
<origin
xyz="0.00076586 -0.01156825 4.912e-05"
rpy="0 0 0" />
<mass
value="0.23598732" />
<inertia
ixx="7.634e-05"
ixy="1.26e-06"
ixz="-3e-08"
iyy="2.519e-05"
iyz="-1e-07"
izz="7.977e-05" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/arm_r_wrist_a_ball.STL" />
</geometry>
<material
name="">
<color
rgba="1 1 1 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/arm_r_wrist_a_ball.STL" />
</geometry>
</collision> -->
</link>
<joint
name="arm_l_wrist_a_ball_joint"
type="fixed">
<origin
xyz="-0.009 -0.02 -0.0199"
rpy="1.5708 0 0" />
<parent
link="left_elbow_yaw" />
<child
link="arm_l_wrist_a_ball" />
<axis
xyz="0 0 0" />
</joint>
<link
name="arm_l_wrist_motor_a_link">
<inertial>
<origin
xyz="0 0 0.00973737"
rpy="0 0 0" />
<mass
value="0.00117982" />
<inertia
ixx="6e-08"
ixy="0"
ixz="0"
iyy="8e-08"
iyz="0"
izz="2e-08" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/arm_r_wrist_motor_a_link.STL" />
</geometry>
<material
name="">
<color
rgba="1 1 1 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/arm_r_wrist_motor_a_link.STL" />
</geometry>
</collision> -->
</link>
<joint
name="arm_l_wrist_motor_a_link_joint"
type="fixed">
<origin
xyz="0.0027628 -0.097124 0"
rpy="1.5708 0 0.028438" />
<parent
link="arm_l_wrist_a_ball" />
<child
link="arm_l_wrist_motor_a_link" />
<axis
xyz="0 0 1" />
<limit
lower="-3.14"
upper="3.14"
effort="0"
velocity="0" />
</joint>
<link
name="arm_l_wrist_a_loop">
<inertial>
<origin
xyz="0 0 -1.922e-05"
rpy="0 0 0" />
<mass
value="0.00548041" />
<inertia
ixx="8e-08"
ixy="0"
ixz="0"
iyy="8e-08"
iyz="0"
izz="7e-08" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/arm_r_wrist_a_loop.STL" />
</geometry>
<material
name="">
<color
rgba="1 1 1 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/arm_r_wrist_a_loop.STL" />
</geometry>
</collision> -->
</link>
<joint
name="arm_l_wrist_a_loop_joint"
type="fixed">
<origin
xyz="0 0 0.0132"
rpy="-1.5708 -0.028438 0" />
<parent
link="arm_l_wrist_motor_a_link" />
<child
link="arm_l_wrist_a_loop" />
<axis
xyz="0 0 0" />
</joint>
<link
name="arm_l_wrist_b_ball">
<inertial>
<origin
xyz="0.00121967 0.04417257 0"
rpy="0 0 0" />
<mass
value="0.03617567" />
<inertia
ixx="1.73e-05"
ixy="-4.1e-07"
ixz="0"
iyy="4.69e-06"
iyz="0"
izz="1.731e-05" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/arm_r_wrist_b_ball.STL" />
</geometry>
<material
name="">
<color
rgba="1 1 1 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/arm_r_wrist_b_ball.STL" />
</geometry>
</collision> -->
</link>
<joint
name="arm_l_wrist_b_ball_joint"
type="fixed">
<origin
xyz="-0.009 0.02 -0.0199"
rpy="-1.5708 0 0" />
<parent
link="left_elbow_yaw" />
<child
link="arm_l_wrist_b_ball" />
<axis
xyz="0 0 0" />
</joint>
<link
name="arm_l_wrist_motor_b_link">
<inertial>
<origin
xyz="0.00076586 0.01156825 -4.912e-05"
rpy="0 0 0" />
<mass
value="0.23598732" />
<inertia
ixx="7.634e-05"
ixy="-1.26e-06"
ixz="3e-08"
iyy="2.519e-05"
iyz="-1e-07"
izz="7.977e-05" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/arm_r_wrist_motor_b_link.STL" />
</geometry>
<material
name="">
<color
rgba="1 1 1 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/arm_r_wrist_motor_b_link.STL" />
</geometry>
</collision> -->
</link>
<joint
name="arm_l_wrist_motor_b_link_joint"
type="fixed">
<origin
xyz="0.0027628 0.097124 0"
rpy="-1.5708 0 -0.028438" />
<parent
link="arm_l_wrist_b_ball" />
<child
link="arm_l_wrist_motor_b_link" />
<axis
xyz="0 0 1" />
<limit
lower="-3.14"
upper="3.14"
effort="0"
velocity="0" />
</joint>
<link
name="arm_l_wrist_b_loop">
<inertial>
<origin
xyz="0 0 0.00973737"
rpy="0 0 0" />
<mass
value="0.00117982" />
<inertia
ixx="6e-08"
ixy="0"
ixz="0"
iyy="8e-08"
iyz="0"
izz="2e-08" />
</inertial>
<visual>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/arm_r_wrist_b_loop.STL" />
</geometry>
<material
name="">
<color
rgba="1 1 1 1" />
</material>
</visual>
<!-- <collision>
<origin
xyz="0 0 0"
rpy="0 0 0" />
<geometry>
<mesh
filename="../meshes/arm_r_wrist_b_loop.STL" />
</geometry>
</collision> -->
</link>
<joint
name="arm_l_wrist_b_loop_joint"
type="fixed">
<origin
xyz="0 0 0.0132"
rpy="1.5708 -0.028438 0" />
<parent
link="arm_l_wrist_motor_b_link" />
<child
link="arm_l_wrist_b_loop" />
<axis
xyz="0 0 0" />
</joint>
</robot>
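
The URDF above defines many small helper links (toe linkages, wrist push-rods) whose closed-chain joints are marked `fixed` for simulation, so parent/child wiring mistakes are easy to introduce when editing the file by hand. Below is a minimal sanity-check sketch, assuming only the Python standard library; the file path is hypothetical and should be adjusted to wherever the URDF is stored in this repository:

```
import xml.etree.ElementTree as ET

def check_urdf(path):
    root = ET.parse(path).getroot()
    links = {link.get("name") for link in root.findall("link")}
    joints = root.findall("joint")
    for joint in joints:
        name = joint.get("name")
        # Every joint must reference links that are actually defined.
        for tag in ("parent", "child"):
            ref = joint.find(tag).get("link")
            if ref not in links:
                print(f"{name}: unknown {tag} link '{ref}'")
        # Revolute joints need a <limit>; fixed joints ignore axis/limit.
        if joint.get("type") == "revolute" and joint.find("limit") is None:
            print(f"{name}: revolute joint without <limit>")
    print(f"checked {len(links)} links and {len(joints)} joints")

if __name__ == "__main__":
    check_urdf("x1.urdf")  # hypothetical path -- adjust to this repo's layout
```

When the kinematic tree is consistent, the script prints only the summary line.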
# SPDX-FileCopyrightText: Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-FileCopyrightText: Copyright (c) 2021 ETH Zurich, Nikita Rudin
# SPDX-FileCopyrightText: Copyright (c) 2024 Beijing RobotEra TECHNOLOGY CO.,LTD. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# 3. Neither the name of the copyright holder nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# Copyright (c) 2024, AgiBot Inc. All rights reserved.
from setuptools import find_packages, setup

setup(
    name="humanoid",
    version="1.0.0",
    author="AgiBot",
    author_email="developer@zhiyuan-robot.com",
    license="BSD-3-Clause",
    description="Isaac Gym environments for legged robots",
    packages=find_packages(),
    install_requires=[
        "isaacgym",  # Isaac Gym Preview 4, installed separately from NVIDIA
        "tensorboard",
        # numpy is pinned below 1.24, which removed aliases (np.float)
        # that Isaac Gym Preview 4 still relies on.
        "numpy==1.23.5",
        "opencv-python",
        "mujoco==2.3.6",
        "mujoco-python-viewer",
        "matplotlib",
    ],
)