Behaviour generation in humanoids by learning potential-based policies from constrained motion

Matthew Howard, Stefan Klanke, Michael Gienger, Christian Goerick & Sethu Vijayakumar
Pages 195-211 | Received 29 Sep 2008, Accepted 01 Feb 2009, Published online: 03 Apr 2009

Abstract

Movement generation that is consistent with observed or demonstrated behaviour is an efficient way to seed movement planning in complex, high-dimensional movement systems like humanoid robots. We present a method for learning potential-based policies from constrained motion data. In contrast to previous approaches to direct policy learning, our method can combine observations from a variety of contexts where different constraints are in force, to learn the underlying unconstrained policy in the form of its potential function. This allows us to generalise and predict behaviour where novel constraints apply. We demonstrate our approach on systems of varying complexity, including kinematic data from the ASIMO humanoid robot with 22 degrees of freedom.
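To make the setting concrete, the following is a minimal sketch (our own illustration under assumed definitions, not the authors' implementation) of how constrained observations arise from a potential-based policy: the unconstrained policy is the negative gradient of a potential, pi(x) = -grad phi(x), and under a constraint with Jacobian A the observed action is the null-space projection u = (I - A†A) pi(x). The quadratic potential and the constraint matrix below are hypothetical examples.

    import numpy as np

    # Hypothetical quadratic potential phi(x) = ||x - x_star||^2; its negative
    # gradient gives the unconstrained policy pi(x) = -2 (x - x_star).
    x_star = np.array([1.0, 0.5])

    def policy(x):
        return -2.0 * (x - x_star)

    def null_space_projection(A):
        # N = I - A^+ A projects an action onto the null space of the
        # constraint Jacobian A, removing the constrained directions.
        A = np.atleast_2d(A)
        return np.eye(A.shape[1]) - np.linalg.pinv(A) @ A

    # Example constraint forbidding motion along the direction (1, 1).
    A = np.array([[1.0, 1.0]])
    x = np.zeros(2)

    u_unconstrained = policy(x)                              # what the policy 'wants'
    u_observed = null_space_projection(A) @ u_unconstrained  # what is demonstrated
    print(u_unconstrained, u_observed)

Because each constraint removes different components of pi(x), combining projected observations from several constraints over-determines grad phi, which is what makes the unconstrained potential recoverable.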

Notes

1. For a review of DPL, please see (Billard et al. 2007) and references therein.

2. It should be noted that, as with all DPL approaches, the choice of state space is problem specific (Schaal et al. 2003) and, when used for imitation learning, depends on the correspondence between demonstrator and imitator. For example, if we wish to learn the policy a human demonstrator uses to wash a window, and transfer that behaviour to an imitator robot, an appropriate choice of x would be the Cartesian coordinates of the hand, which would correspond to the end-effector coordinates of the robot. Transfer of behaviour across non-isomorphic state spaces, for example where the demonstrator and imitator have different embodiments, is also possible by defining an appropriate state-action metric (Alissandrakis et al. 2007).

3. A† denotes the (unweighted) Moore–Penrose pseudoinverse of the matrix A.
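As a generic numerical illustration (not code from the paper), the pseudoinverse and the associated null-space projection N = I - A†A can be checked with standard routines; the matrix A here is an arbitrary example:

    import numpy as np

    A = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0]])   # arbitrary 2x3 constraint matrix

    A_pinv = np.linalg.pinv(A)        # Moore-Penrose pseudoinverse A^+
    N = np.eye(3) - A_pinv @ A        # projection onto the null space of A

    # Defining properties (up to rounding): A A^+ A = A and A N = 0.
    assert np.allclose(A @ A_pinv @ A, A)
    assert np.allclose(A @ N, 0.0)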

4. It should be noted that, in general, the orientation of the constraint plane onto which the policy is projected may vary with both state position and time.

5. It should be noted that these trajectories are not outliers in the sense of containing corrupt data, and could in fact be used for further training of the model. For example, one could take a hierarchical approach in which groups of strongly connected trajectories are aligned first, forming models consisting of groups of trajectories with good alignment. The process can then be repeated recursively, aligning these larger (but more weakly connected) groups until all of the data has been included (see the sketch below).
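The following is a schematic of that hierarchical grouping, under assumptions of our own: trajectory groups are held as plain Python lists, and align_score is a hypothetical pairwise alignment-quality measure with a merge threshold; the paper prescribes neither.

    def hierarchical_align(groups, align_score, threshold):
        # Repeatedly merge the most strongly connected pair of groups,
        # stopping once the best remaining pair falls below the threshold.
        while len(groups) > 1:
            pairs = [(i, j) for i in range(len(groups))
                     for j in range(i + 1, len(groups))]
            i, j = max(pairs, key=lambda p: align_score(groups[p[0]], groups[p[1]]))
            if align_score(groups[i], groups[j]) < threshold:
                break  # remaining groups are too weakly connected to merge
            merged = groups[i] + groups[j]  # concatenate the two trajectory lists
            groups = [g for k, g in enumerate(groups) if k not in (i, j)]
            groups.append(merged)
        return groups

Merging the most strongly connected pair first mirrors the note's suggestion of aligning well-matched trajectories before tackling the weakly connected groups.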

6. Since the goal of the experiments was to validate the proposed approach, we used policies known in closed form as ground truth. In the follow-up paper we apply our method to human motion capture data.

7. A detailed explanation of the error measures used can be found in Appendix B.

8. Please note that we also discard the outliers when evaluating the error statistics, since we can hardly expect to observe good performance in regions where the learnt model f(x) has seen no data.

9. 3 DOFs per hand × 2 hands.
