Mechanical and Robotics Research Information Center

Inferring Task Goals and Constraints using Bayesian Nonparametric Inverse Reinforcement Learning

Daehyung Park(Healthcare Robotics Lab, Institute for Robotics an)

USA | Proceedings of the Conference on Robot Learning

2020.

IRL, CBN-IRL

Cited by 7

■ View full text

Proceedings of the Conference on Robot Learning, PMLR 100:1005-1014, 2020.

http://proceedings.mlr.press/v100/park20a.html

■ Researchers

Daehyung Park, Michael Noseworthy, Rohan Paul, Subhro Roy, Nicholas Roy

Healthcare Robotics Lab, Institute for Robotics and Intelligent Machines, Georgia Institute of Technology, Atlanta, GA, USA

■ Abstract

Recovering an unknown reward function for complex manipulation tasks is the fundamental problem of Inverse Reinforcement Learning (IRL). Often, the recovered reward function fails to explicitly capture implicit constraints (e.g., axis alignment, force, or relative alignment) between the manipulator, the objects of interaction, and other entities in the workspace. Standard IRL approaches do not model the presence of locally-consistent constraints that may be active only in a section of a demonstration. This work introduces Constraint-based Bayesian Nonparametric Inverse Reinforcement Learning (CBN-IRL), which models the observed behaviour as a sequence of subtasks, each consisting of a goal and a set of locally-active constraints. CBN-IRL infers locally-active constraints given a single demonstration by identifying potential constraints and their activation space. Further, the nonparametric prior over subgoals constituting the task allows the model to adapt to the complexity of the demonstration. The inferred goals and constraints are then used to recover a control policy via constrained optimization. We evaluate the proposed model in simulated navigation and manipulation domains. CBN-IRL efficiently learns a compact representation for complex tasks that allows generalization in novel environments, outperforming state-of-the-art IRL methods. Finally, we demonstrate the model on two tool-manipulation tasks using a UR5 manipulator and show generalization to novel test scenarios.
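
The representation described in the abstract (a task as a sequence of subtasks, each with a goal and its own locally-active constraints) can be illustrated with a small sketch. This is not the authors' implementation and it does not cover the Bayesian nonparametric inference step. It assumes the goals and constraints have already been inferred, assumes a toy 2D grid world, and swaps in breadth-first search as a stand-in for the paper's constrained optimization. The names `Subtask`, `plan_subtask`, and `plan_task` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

State = Tuple[int, int]                # (x, y) cell in an assumed toy grid world
Constraint = Callable[[State], bool]   # returns True when a state satisfies the constraint


@dataclass
class Subtask:
    """Hypothetical container: one goal plus the constraints active while reaching it."""
    goal: State
    constraints: List[Constraint] = field(default_factory=list)


def plan_subtask(start: State, subtask: Subtask, size: int) -> Optional[List[State]]:
    """Shortest 4-connected path from start to the subtask goal.

    Every state entered after `start` must lie on the size x size grid and
    satisfy all of the subtask's constraints. Returns None if no such path exists.
    """
    def feasible(s: State) -> bool:
        x, y = s
        return 0 <= x < size and 0 <= y < size and all(c(s) for c in subtask.constraints)

    parents = {start: None}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        if s == subtask.goal:
            # Walk the parent links back to the start to rebuild the path.
            path = []
            while s is not None:
                path.append(s)
                s = parents[s]
            return path[::-1]
        x, y = s
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nxt not in parents and feasible(nxt):
                parents[nxt] = s
                queue.append(nxt)
    return None


def plan_task(start: State, subtasks: List[Subtask], size: int) -> List[State]:
    """Chain the subtasks: each one starts where the previous one ended."""
    path = [start]
    for subtask in subtasks:
        segment = plan_subtask(path[-1], subtask, size)
        if segment is None:
            raise ValueError(f"no feasible path to goal {subtask.goal}")
        path.extend(segment[1:])  # drop the duplicated joint state
    return path


# Assumed example: stay on the bottom row first, then stay in the right column.
demo = [
    Subtask(goal=(4, 0), constraints=[lambda s: s[1] == 0]),
    Subtask(goal=(4, 4), constraints=[lambda s: s[0] == 4]),
]
path = plan_task((0, 0), demo, size=5)
print(path)
```

Each constraint is active only inside its own subtask, which is the property the abstract contrasts with standard IRL: the bottom-row constraint shapes the first segment and is dropped once the first goal is reached.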
