CAMBRIDGE, MA—When robots come across unfamiliar objects, they often struggle to account for a simple truth: Appearances aren’t everything. They may attempt to grasp a block, only to find out it’s a literal piece of cake.

The misleading appearance of that object could lead the robot to miscalculate physical properties like its weight and center of mass, causing it to choose the wrong grasp and apply more force than needed.

To address this issue, engineers at the Massachusetts Institute of Technology (MIT) Computer Science and Artificial Intelligence Laboratory (CSAIL) have developed a new way for robots to detect and interact with objects.

The Grasping Neural Process is a predictive physics model that infers these hidden traits in real time for more intelligent robotic grasping. It trains robots to infer invisible physical properties from a history of attempted grasps, and then to use those inferred properties to predict which grasps will work well in the future.
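
The paper's code is not reproduced here, but the setup can be sketched in a few lines of Python. Everything below, including the field names and the placeholder logic, is illustrative rather than the authors' implementation: a grasp history is a short list of attempted grasps and their outcomes, and the model's job is to map that history to success estimates for new candidate grasps.

```python
from dataclasses import dataclass

@dataclass
class GraspAttempt:
    """One past interaction with the object (hypothetical fields)."""
    pose: tuple      # gripper pose relative to the object, e.g. (x, y, z, yaw)
    force: float     # grip force applied, in newtons
    succeeded: bool  # did the object stay in the gripper?

def predict_success(history: list[GraspAttempt], candidates: list[tuple]) -> list[float]:
    """Map a short grasp history to a success estimate for each candidate pose.

    The Grasping Neural Process replaces this placeholder with a learned
    network; here the history's empirical success rate stands in so the
    sketch runs end to end.
    """
    if not history:
        return [0.5 for _ in candidates]  # no evidence about the object yet
    rate = sum(a.succeeded for a in history) / len(history)
    return [rate for _ in candidates]
```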

Methods that infer physical properties typically rely on traditional statistical techniques, which require many known grasps and a great deal of computation time to work well. The Grasping Neural Process enables robots to execute good grasps more efficiently by using far less interaction data. In addition, it finishes its computation in less than a tenth of a second, as opposed to the seconds or minutes required by traditional methods.

“As an engineer, it’s unwise to assume a robot knows all the necessary information it needs to grasp successfully,” says Michael Noseworthy, an MIT Ph.D. student in electrical engineering and computer science who is heading up the research project. “Without humans labeling the properties of an object, robots have traditionally needed to use a costly inference process. Our model helps robots do this much more efficiently, enabling the [machine] to imagine which grasps will inform the best result.” 

According to Noseworthy, robots that complete inference tasks have a three-part act to follow: Training, adaptation and testing. During the training step, robots practice on a fixed set of objects and learn how to infer physical properties from a history of successful (or unsuccessful) grasps.
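
In practice, training a model like this usually means turning each practice object's recorded grasps into many examples: the network sees a random subset of attempts as its "history" and must predict the outcomes of the grasps that were held out. The sampling scheme below is an assumption for illustration, not a detail taken from the paper.

```python
import random

def make_training_example(object_grasps, max_context=5):
    """Split one object's recorded grasps into a context history and targets.

    `object_grasps` is a list of (grasp_features, succeeded) pairs collected
    on a single training object. The context plays the role of the robot's
    history of successful (or unsuccessful) grasps; the targets are the
    grasps whose outcomes the network must predict during training.
    """
    assert len(object_grasps) >= 2, "need at least one context and one target grasp"
    grasps = list(object_grasps)
    random.shuffle(grasps)
    n_context = random.randint(1, min(max_context, len(grasps) - 1))
    return grasps[:n_context], grasps[n_context:]

# Example: eight recorded grasps on one object yield one training example.
recorded = [((0.1 * i, 0.0, 0.02 * i), i % 3 != 0) for i in range(8)]
context, targets = make_training_example(recorded)
```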

The new CSAIL model amortizes the inference of the objects' physics, meaning it trains a neural network to predict the output of an otherwise expensive statistical algorithm. Only a single pass through that network, using limited interaction data, is needed to simulate and predict which grasps will work best on different objects.
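
One common way to amortize this kind of inference is a conditional-neural-process-style network: an encoder embeds each (grasp, outcome) pair from the history, the embeddings are averaged into a single latent summary of the object's hidden physics, and a decoder scores candidate grasps against that summary, all in one forward pass. The architecture, layer sizes, and names below are illustrative assumptions, not the published model.

```python
import torch
import torch.nn as nn

class AmortizedGraspModel(nn.Module):
    """Sketch of amortized inference over an object's hidden physics."""

    def __init__(self, grasp_dim=6, latent_dim=64):
        super().__init__()
        # Encoder: embeds each (grasp features, outcome) pair from the history.
        self.encoder = nn.Sequential(
            nn.Linear(grasp_dim + 1, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        # Decoder: scores a candidate grasp against the latent object summary.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + grasp_dim, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, hist_grasps, hist_outcomes, candidates):
        # hist_grasps: (H, grasp_dim); hist_outcomes: (H, 1); candidates: (C, grasp_dim)
        pairs = torch.cat([hist_grasps, hist_outcomes], dim=-1)
        latent = self.encoder(pairs).mean(dim=0)            # one vector summarizing the object
        latent = latent.expand(candidates.shape[0], -1)     # broadcast to every candidate
        logits = self.decoder(torch.cat([latent, candidates], dim=-1))
        return torch.sigmoid(logits).squeeze(-1)            # success probability per candidate

# A single forward pass: a 3-grasp history scores 10 candidate grasps at once.
model = AmortizedGraspModel()
probs = model(torch.randn(3, 6), torch.randint(0, 2, (3, 1)).float(), torch.randn(10, 6))
```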

Then, the robot is introduced to an unfamiliar object during the adaptation phase. During this step, the Grasping Neural Process helps the robot experiment and update its grasp accordingly, learning which grips would work best. This tinkering phase prepares the machine for the final step: Testing, where the robot formally executes a task on an item with a new understanding of its properties.
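
A minimal sketch of that adapt-then-test loop appears below, with toy stand-ins so it runs without a robot or a trained network; the function and variable names are hypothetical, and where this sketch greedily retries its current best guess, the actual system can instead favor exploratory grasps expected to be informative.

```python
import random

def adapt_then_grasp(candidates, predict, try_grasp, n_probe=3):
    """Adaptation then testing: probe a few grasps, then commit to the best one.

    `predict(history, candidates)` returns a success estimate per candidate
    (for example, one forward pass of the learned model), and `try_grasp(pose)`
    executes a grasp on the physical object and reports whether it held.
    """
    history = []
    for _ in range(n_probe):                        # adaptation: exploratory grasps
        scores = predict(history, candidates)
        pose = candidates[scores.index(max(scores))]
        history.append((pose, try_grasp(pose)))     # record the outcome to refine the estimate
    scores = predict(history, candidates)           # testing: commit to the best grasp
    return candidates[scores.index(max(scores))]

# Toy stand-ins: random predictions and coin-flip grasp outcomes.
candidates = [(x / 10, 0.0, 0.05) for x in range(10)]
predict = lambda history, cands: [random.random() for _ in cands]
try_grasp = lambda pose: random.random() > 0.5
print(adapt_then_grasp(candidates, predict, try_grasp))
```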