Learning from limited demonstrations
Beomjoon Kim (MIT)


■ Researchers

Beomjoon Kim

School of Computer Science, McGill University

Amir-massoud Farahmand

School of Computer Science, McGill University

Joelle Pineau

School of Computer Science, McGill University

Doina Precup

School of Computer Science, McGill University


■ Abstract

We propose a Learning from Demonstration (LfD) algorithm that leverages expert data even when the demonstrations are very few or inaccurate. We achieve this by combining expert data with reinforcement signals gathered through trial-and-error interactions with the environment. The key idea of our approach, Approximate Policy Iteration with Demonstration (APID), is that the expert’s suggestions are used to define linear constraints that guide the optimization performed by Approximate Policy Iteration. We prove an upper bound on the Bellman error of the estimate computed by APID at each iteration. Moreover, we show empirically that APID outperforms pure Approximate Policy Iteration, a state-of-the-art LfD algorithm, and supervised learning in a variety of scenarios, including when very few and/or suboptimal demonstrations are available. Our experiments include simulations as well as a real robot path-finding task.
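
To make the idea of demonstration-derived constraints concrete, here is a minimal Python sketch. It is not the authors' implementation, and several pieces are assumptions for illustration only: a tabular action-value function Q, fixed regression targets standing in for one policy-evaluation step, a unit-margin constraint of the form Q(s, a_expert) - Q(s, a) >= 1 (linear in Q) relaxed into a hinge penalty, and the hypothetical names `fit_q_with_demo_constraints`, `alpha`, and `lam`. The abstract itself only states that the expert's suggestions define linear constraints on the optimization.

```python
import numpy as np


def fit_q_with_demo_constraints(rl_data, demos, n_states, n_actions,
                                alpha=1.0, lam=0.01, lr=0.1, steps=500):
    """Illustrative sketch (assumed form, not the paper's exact objective).

    Runs subgradient descent on:
      mean squared error to the RL targets
      + lam * ||Q||^2
      + alpha * mean hinge loss on the expert margin constraints.
    """
    q = np.zeros((n_states, n_actions))
    for _ in range(steps):
        # Gradient of the L2 regularizer.
        grad = 2.0 * lam * q
        # Regression term: pull Q(s, a) toward its fixed target y.
        for s, a, y in rl_data:
            grad[s, a] += 2.0 * (q[s, a] - y) / len(rl_data)
        # Demonstration term: soft version of the linear constraint
        # Q(s, a_expert) - Q(s, a) >= 1 for the best competing action a.
        for s, a_exp in demos:
            others = q[s].copy()
            others[a_exp] = -np.inf  # exclude the expert action
            a_best = int(np.argmax(others))
            if q[s, a_exp] - q[s, a_best] < 1.0:  # constraint violated
                grad[s, a_exp] -= alpha / len(demos)
                grad[s, a_best] += alpha / len(demos)
        q -= lr * grad
    return q


# Toy data (made up): (state, action, target) triples from interaction,
# and (state, expert_action) pairs from demonstrations.
rl_data = [(0, 0, 1.0), (0, 1, 0.0), (1, 1, 0.5)]
demos = [(0, 0), (2, 1)]
q = fit_q_with_demo_constraints(rl_data, demos, n_states=3, n_actions=2)
greedy = q.argmax(axis=1)  # greedy action per state
print(greedy)
```

In this toy setup, state 2 has no interaction data at all, so the greedy action there is determined entirely by the demonstration constraint. In state 1, which has no demonstration, it is determined entirely by the regression targets. Raising or lowering `alpha` changes how strongly the demonstrations are trusted relative to the reinforcement signal, which is the trade-off the abstract describes for few or suboptimal demonstrations.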

