
Deep-reinforcement-learning-based gait pattern controller on an uneven terrain for humanoid robots

Figures & data

Figure 1. Reinforcement learning [22].

Figure 2. Type A fuzzy membership function.

Figure 3. Type B fuzzy membership function.

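The exact shapes of the type A and type B membership functions are those shown in Figures 2 and 3. Purely as an illustration of how such functions are evaluated, the sketch below implements two generic candidates in Python (a triangular and a trapezoidal set); the breakpoints and the idea of fuzzifying an acceleration sample are assumptions for this example, not values taken from the paper.

```python
import numpy as np

def triangular(x, a, b, c):
    """Triangular membership: rises from a, peaks at b, falls to c."""
    x = np.asarray(x, dtype=float)
    left = np.clip((x - a) / (b - a), 0.0, 1.0)
    right = np.clip((c - x) / (c - b), 0.0, 1.0)
    return np.minimum(left, right)

def trapezoidal(x, a, b, c, d):
    """Trapezoidal membership: flat top between b and c."""
    x = np.asarray(x, dtype=float)
    left = np.clip((x - a) / (b - a), 0.0, 1.0)
    right = np.clip((d - x) / (d - c), 0.0, 1.0)
    return np.minimum(left, right)

# Hypothetical fuzzification of one acceleration sample (m/s^2):
acc = 1.3
print(triangular(acc, 0.0, 1.0, 2.0))        # e.g. degree of "moderate tilt"
print(trapezoidal(acc, 1.0, 2.0, 3.0, 4.0))  # e.g. degree of "large tilt"
```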

Figure 4. Discrete wavelet transform process.

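Figure 4 outlines the discrete wavelet transform stage used to filter the accelerometer signal. A minimal sketch of one common way to realize such a filter in Python with the PyWavelets package follows; the wavelet family (db4), the decomposition level, and the soft-thresholding rule are assumptions chosen for illustration, not the paper's exact settings.

```python
import numpy as np
import pywt

def dwt_denoise(signal, wavelet="db4", level=3):
    """Decompose, soft-threshold the detail coefficients, reconstruct."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    # Universal threshold estimated from the finest detail band.
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thr = sigma * np.sqrt(2.0 * np.log(len(signal)))
    coeffs[1:] = [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(signal)]

# Example: smooth a noisy accelerometer-like trace (synthetic data).
t = np.linspace(0, 4, 512)
raw = np.sin(2 * np.pi * t) + 0.3 * np.random.randn(t.size)
clean = dwt_denoise(raw)
```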

Figure 5. System architecture.

Figure 6. Humanoid robot: (a) front view and (b) side view.

Figure 7. State of training with PPO2.

Figure 8. Staircase with steps of different heights.

Table 1. Parameters of gait patterns.
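
Table 1 lists the gait-pattern parameters that the learning controller adjusts on-line. One plausible way to expose such parameters to a reinforcement-learning library is as the action space of an OpenAI Gym environment; the skeleton below is a hypothetical sketch (the class name GaitEnv, the three placeholder parameters, and the reward are assumptions), not the environment used in the paper.

```python
import gym
import numpy as np
from gym import spaces

class GaitEnv(gym.Env):
    """Hypothetical environment: actions are offsets to gait parameters,
    observations are filtered torso accelerations (x, y, z)."""

    def __init__(self):
        # Placeholder gait parameters: step height, stride length, period.
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(3,), dtype=np.float32)
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(3,), dtype=np.float32)

    def reset(self):
        self.acc = np.zeros(3, dtype=np.float32)
        return self.acc

    def step(self, action):
        # In the real system the action would be sent to the gait generator
        # and the next IMU reading read back; here it is faked with noise.
        self.acc = (np.random.randn(3) * 0.1).astype(np.float32)
        reward = -float(np.linalg.norm(self.acc))  # placeholder: penalise shaking
        done = False
        return self.acc, reward, done, {}
```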

Figure 9. Changes in gait parameters while the robot is walking: (a) gait parameters without the PPO controller and (b) gait parameters while the PPO controller is operating.

Table 2. Parameters of PPO2.
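
Table 2 gives the PPO2 hyperparameters used for training. As a rough illustration of how such a configuration is typically wired up with the Stable Baselines (v2) implementation of PPO2, a sketch follows, reusing the hypothetical GaitEnv skeleton from the sketch after Table 1; every numeric value here is a placeholder, not a value from the table.

```python
from stable_baselines import PPO2
from stable_baselines.common.policies import MlpPolicy
from stable_baselines.common.vec_env import DummyVecEnv

# GaitEnv: the hypothetical gait-parameter environment sketched after Table 1.
env = DummyVecEnv([lambda: GaitEnv()])

model = PPO2(
    MlpPolicy,
    env,
    n_steps=128,           # rollout length per update (placeholder)
    learning_rate=2.5e-4,  # placeholder
    gamma=0.99,            # discount factor (placeholder)
    ent_coef=0.01,         # entropy bonus (placeholder)
    verbose=1,
)
model.learn(total_timesteps=100_000)  # placeholder training budget
model.save("ppo2_gait")
```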

Figure 10. Reward values during the training process.

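Figure 10 tracks the episode reward during training, and Figures 11-13 below show the acceleration records at reward levels of -32, 1.5, and 49. Purely as a loose illustration of acceleration-based reward shaping of this general kind, the snippet below rewards staying upright, penalizes large filtered-acceleration deviations, and adds a fall penalty; the functional form and all constants are assumptions, not the paper's reward.

```python
import numpy as np

def shaping_reward(acc_xyz, fell, alive_bonus=0.5, acc_weight=1.0, fall_penalty=10.0):
    """Hypothetical per-step reward from filtered accelerations (m/s^2).

    acc_xyz: deviation of (x, y, z) acceleration from quiet standing.
    fell:    True if the robot has fallen over on this step.
    """
    penalty = acc_weight * float(np.linalg.norm(acc_xyz))
    reward = alive_bonus - penalty
    if fell:
        reward -= fall_penalty
    return reward

# Example: a small wobble gives a mildly positive reward,
# a fall produces a large negative one.
print(shaping_reward(np.array([0.1, 0.05, 0.2]), fell=False))
print(shaping_reward(np.array([2.0, 1.5, 3.0]), fell=True))
```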

Figure 11. Acceleration record when the reward was −32.

Figure 12. Acceleration record when the reward was 1.5.

Figure 13. Acceleration record when the reward was 49.

Figure 14. Acceleration record after training.

Figure 15. Experimental results before training.

Figure 16. Final results after training.

Figure 17. Acceleration record in the x, y, and z directions obtained from the unprocessed sensor data.

Figure 18. Acceleration record in the x, y, and z directions processed by wavelet transform.

Figure 19. Acceleration record in the x, y, and z directions processed by the type A fuzzy membership function.

Figure 20. Acceleration record in the x, y, and z directions processed by the type B fuzzy membership function.

Table 3. Standard deviation of the PPO2 training results.

Table 4. Standard deviation of the TRPO training results.

Table 5. Standard deviation of the A2C training results.
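
Tables 3-5 compare PPO2, TRPO, and A2C through the standard deviation of the recorded accelerations after training. The small sketch below shows how that comparison metric can be computed with NumPy; the file names and the (N, 3) array layout are assumptions made for illustration.

```python
import numpy as np

# Each file is assumed to hold an (N, 3) array of x, y, z accelerations
# logged while the trained controller walks (placeholder file names).
runs = {
    "PPO2": "acc_ppo2.npy",
    "TRPO": "acc_trpo.npy",
    "A2C": "acc_a2c.npy",
}

for name, path in runs.items():
    acc = np.load(path)        # shape (N, 3)
    std_xyz = acc.std(axis=0)  # per-axis standard deviation
    print(f"{name}: std x/y/z = {std_xyz}")
```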

Figure 21. Experimental results on vibrating terrain.

Figure 22. Experimental results with impact disturbance.

Figure 23. Experiment with an additional motion (left turn) on vibrating terrain.

Figure 24. Experiment with an additional motion (right turn) on vibrating terrain.

Figure 25. Experiment with an additional motion (left turn) under impact disturbance.

Figure 26. Experiment with an additional motion (right turn) under impact disturbance.