Show simple item record

dc.contributor.advisor	Amrutur, Bharadwaj
dc.contributor.author	Gubbi Venkatesh, Sagar
dc.date.accessioned	2022-01-11T04:25:45Z
dc.date.available	2022-01-11T04:25:45Z
dc.date.submitted	2021
dc.identifier.uri	https://etd.iisc.ac.in/handle/2005/5587
dc.description.abstract	Robots that can operate in unstructured environments and collaborate with humans will play a major role in raising productivity and living standards as societies age. Unlike the robots currently used in industrial settings for repetitive tasks, they will have to be capable of perceiving the novel environments they encounter, dealing with the ambiguities of natural and intuitive communication with non-expert human operators, and manipulating objects in the environment in complex ways. This problem may be broadly divided into two areas: how to specify the task to the robot, and how to execute the specified task.

In the first part of this thesis, a Siamese neural network with a modified spatial attention layer is proposed for specifying, through visual cues, novel objects that the robot has not seen during training. Although Siamese networks have been used for detecting novel objects, the prevalent architectures require a cropped image of the object and cannot support natural and intuitive visual cues for indicating the object of interest in the scene. The proposed network enables non-expert human operators to specify new objects by using a laser pointer, by pointing with a finger, or through a video demonstration of the task. This is a weakly supervised learning problem in which the architecture learns the visual cue implicitly during training, without additional labels for the cue.

In the second part of the thesis, instructions in natural language are interpreted in the context of the visual scene so that the robot can determine which object to manipulate. A U-Net structure combined with an LSTM for language processing is proposed for resolving spatial relationships specified in the instruction. Although the U-Net architecture has been successfully applied to several computer vision problems, we show that it is useful not only for object detection but also in the stages after detection, for grounding the natural language instruction in the visual scene. We then go beyond merely specifying the object in natural language to specifying more complex tasks. Most current work on imitation learning for neural robot control has the network drive the actuators directly, with expert demonstrations collected using an input device such as a game controller or a virtual reality rig. In industrial settings, however, expert robot programmers write short programs to make the robot perform various tasks rather than teaching it with an analog controller. We investigate whether such expert-authored programs better capture the intention of the expert and whether they can be generated by a neural network. We propose using neural machine translation to translate instructions in English into Python code, which in turn accesses the objects detected in the scene and controls the robot to accomplish the specified task. We evaluate how such a translation model compares with current imitation learning methods on a variety of tasks specified in natural language.

The third part of the thesis concerns how to perform complex manipulation tasks. Imitation learning has emerged in recent years as a potent method for training neural networks to control robot actuators, but most existing methods ignore stochasticity in the training data. We discuss various ways in which stochasticity in teleoperated expert demonstrations can be accounted for when training policy networks, and evaluate how these approaches perform on several tasks. Most current visuomotor policy networks for imitation learning use convolutional layers to process the camera input. By construction, convolution is translation invariant, which poses a problem for the subsequent control layers when there are multiple instances of an object in the scene. We propose a modified spatial-softmax layer and use it in a policy network to learn, from teleoperated demonstrations, how to manipulate objects in the presence of multiple instances of the object of interest. We show that the proposed modification is essential to prevent the network from becoming confused when multiple instances are present. Subsequently, we consider the high-precision task of inserting a peg into a hole with a clearance of less than 10 µm. The imitation learning literature has largely focused on visuomotor manipulation tasks that require far less precision; for high-precision tasks it becomes necessary to rely on force sensors rather than visual feedback. We propose using generative adversarial reinforcement learning to learn this task from only a handful of teleoperated expert demonstrations.

This thesis contributes to the growing body of knowledge on using neural networks for robot control, making several contributions to both task specification and task execution. It proposes neural network architectures for specifying tasks intuitively through visual cues and natural language. It presents carefully designed experiments that reveal shortcomings in the prevalent visuomotor policy networks for executing tasks, and proposes architectural changes to address them. It also explores scenarios requiring high precision, where force sensors are more appropriate than vision.	en_US
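The abstract describes specifying novel objects with a Siamese network and a modified spatial attention layer. The sketch below is a minimal PyTorch illustration of the general idea, not the thesis's exact architecture: a shared backbone embeds both the scene and a query view of the object, a stand-in 1x1-convolution attention head pools the query branch, and a cosine-similarity heatmap localizes the object in the scene. All layer names, sizes, and the attention form are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseLocator(nn.Module):
    """Correlate a pooled query embedding against a scene feature map
    to produce a similarity heatmap over the scene (illustrative only)."""
    def __init__(self, backbone: nn.Module, dim: int = 256):
        super().__init__()
        self.backbone = backbone                       # shared weights for both branches
        self.attn = nn.Conv2d(dim, 1, kernel_size=1)   # hypothetical spatial attention head

    def forward(self, scene: torch.Tensor, query: torch.Tensor) -> torch.Tensor:
        f_scene = self.backbone(scene)                 # (B, D, H, W)
        f_query = self.backbone(query)                 # (B, D, h, w)
        # Attention-weighted pooling of the query branch into one vector.
        a = torch.sigmoid(self.attn(f_query))          # (B, 1, h, w)
        q = (f_query * a).flatten(2).sum(-1) / (a.flatten(2).sum(-1) + 1e-6)  # (B, D)
        # Cosine similarity between the query vector and every scene location.
        q = F.normalize(q, dim=1)
        f = F.normalize(f_scene, dim=1)
        return torch.einsum("bd,bdhw->bhw", q, f)      # similarity heatmap

In such a setup the weak supervision would come from training the heatmap against object locations alone, letting the attention head learn to latch onto the cue (laser dot, fingertip, or demonstrated grasp) without explicit cue labels.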
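For the language-grounding stage, the abstract couples a U-Net with an LSTM. A common pattern for this, sketched below under assumptions, is to encode the instruction with an LSTM, tile the sentence embedding over the visual feature map, and decode a mask over the referred object. The encoder-decoder here stands in for the actual U-Net (skip connections are omitted for brevity), and all sizes are illustrative.

import torch
import torch.nn as nn

class LanguageConditionedNet(nn.Module):
    """Fuse an LSTM sentence embedding into a convolutional
    encoder-decoder to ground an instruction in the scene."""
    def __init__(self, vocab_size: int, d: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d)
        self.lstm = nn.LSTM(d, d, batch_first=True)
        self.enc = nn.Sequential(
            nn.Conv2d(3, d, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(d, d, 3, stride=2, padding=1), nn.ReLU())
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(2 * d, d, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(d, 1, 4, stride=2, padding=1))

    def forward(self, image: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        _, (h, _) = self.lstm(self.embed(tokens))      # h[-1]: (B, d) sentence embedding
        feat = self.enc(image)                         # (B, d, H/4, W/4)
        lang = h[-1][:, :, None, None].expand(-1, -1, feat.size(2), feat.size(3))
        return self.dec(torch.cat([feat, lang], dim=1))  # (B, 1, H, W) mask logits

A mask decoded this way depends jointly on the image and the instruction, which is what allows spatial relations such as "the cup to the left of the plate" to select among otherwise identical objects.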
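Finally, the abstract's point about spatial softmax and multiple object instances can be made concrete. Below is a minimal sketch of the standard spatial-softmax keypoint layer (the common baseline, not the thesis's modified version): each channel's activation map is normalized into a probability distribution, and the layer outputs the expected (x, y) coordinates under that distribution. If a channel fires on two instances of an object, the expectation lands between them, which is the failure mode the proposed modification addresses.

import torch
import torch.nn.functional as F

def spatial_softmax(features: torch.Tensor) -> torch.Tensor:
    """features: (B, C, H, W) -> expected (x, y) keypoint per channel, (B, C, 2)."""
    b, c, h, w = features.shape
    probs = F.softmax(features.view(b, c, h * w), dim=-1).view(b, c, h, w)
    ys = torch.linspace(-1.0, 1.0, h, device=features.device)
    xs = torch.linspace(-1.0, 1.0, w, device=features.device)
    # Expected coordinates: a unimodal heatmap gives the object's position,
    # but a bimodal one (two instances) averages to a point between them.
    exp_y = (probs.sum(dim=3) * ys).sum(dim=-1)        # (B, C)
    exp_x = (probs.sum(dim=2) * xs).sum(dim=-1)        # (B, C)
    return torch.stack([exp_x, exp_y], dim=-1)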
dc.description.sponsorship	Visvesvaraya PhD Scheme	en_US
dc.language.iso	en_US	en_US
dc.rights	I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.	en_US
dc.subject	Imitation Learning	en_US
dc.subject	Learning in Robotics	en_US
dc.subject	Robotics	en_US
dc.subject	Machine learning	en_US
dc.subject.classification	Research Subject Categories::TECHNOLOGY::Electrical engineering, electronics and photonics	en_US
dc.title	Imitation Learning Techniques for Robot Manipulation	en_US
dc.type	Thesis	en_US
dc.degree.name	PhD	en_US
dc.degree.level	Doctoral	en_US
dc.degree.grantor	Indian Institute of Science	en_US
dc.degree.discipline	Engineering	en_US

