Sometimes, our body gives us away. We say we are calm and a treacherous tic reveals that we are a bundle of nerves. For another person, this simple involuntary gesture has clear connotations: anxiety, agitation … But for a robot, our body language would go completely unnoticed.
Despite the latest advances in facial recognition and computer vision, machines are still not good at reading human expressions. The way we show our emotions involves a number of subtle movements that are difficult to detect and even harder to interpret, above all for those who, for the moment, neither feel nor suffer.
“We communicate almost as much with the movement of our body as with our voice,” says robotics researcher Yaser Sheikh. And he adds: “But computers are more or less blind to this.”
Sheikh is one of the scientists at the Robotics Institute of Carnegie Mellon University in the United States who have developed a system that could end that blindness. Ginés Hidalgo, a Spanish expert in robotic vision and machine learning, is also part of the team.
An intelligent observer
The tool, dubbed OpenPose, can follow the movement of people’s bodies in real time, including the face and hands, and even the fingers: it is the first time that software has detected the movement of individual finger joints. Picking up these details would allow robots to better understand the world around them and interact naturally with human beings.
The researchers trained the system to process each frame of a video individually. Thanks to computer vision and machine learning, it can monitor the activity of a single person or of a whole group simultaneously.
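The frame-by-frame idea can be sketched as follows. This is a minimal illustration, not OpenPose’s actual API: the function `detect_keypoints` is a hypothetical stand-in for the learned detector, and the keypoint values are dummy data.

```python
def detect_keypoints(frame):
    """Hypothetical stand-in for a learned pose detector: returns, for each
    person visible in the frame, a dict mapping body-part names to (x, y)
    image coordinates. A real system would run a neural network here."""
    return [{"nose": (120, 80), "left_wrist": (90, 200)}]

def process_video(frames):
    """Apply the detector independently to every frame, so a single person
    and a whole group are handled by the same per-frame machinery."""
    results = []
    for frame in frames:
        people = detect_keypoints(frame)  # one list of people per frame
        results.append(people)
    return results

poses = process_video([None, None, None])  # three dummy "frames"
```

Because each frame is processed on its own, the same pipeline scales from one person to a crowd simply by returning more entries per frame.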
OpenPose’s capabilities have applications in several technological areas. As mentioned, it could be used to improve interactions between robots and humans: domestic robots could respond to the emotions and gestures of their owners, whether children or the elderly.
Its ability to read body language could also pave the way for more interactive virtual and augmented reality environments and more intuitive user interfaces. Imagine that your computer or your new phone could tell from the confused expression on your face that you do not understand it. That might make it easier for you to learn how it works.
Hundreds of video cameras
For the system to detect something as fine as the movement of a finger, the researchers used a technique developed in the university’s Panoptic Studio, a laboratory equipped with a sophisticated camera system that can capture up to 100,000 different moving points at any given moment.
These experts used a capture technique that breaks down scenes recorded by more than 500 video cameras installed in a two-story geodesic dome (a kind of polygonal vault). The cameras captured the human body in different positions and from different angles, and all this visual information was used to build a database to work with later.
The next step was to feed the images to a tool known as a keypoint detector, which was responsible for identifying and labeling specific parts of the human anatomy. This software can also learn to associate body parts with specific people: knowing, for example, that someone’s hand is usually near their elbow allows it to track several people at once.
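The idea of grouping parts by spatial proximity can be illustrated with a toy example. This is a simplified sketch of the general principle (assigning each detected hand to the person whose elbow is nearest), not the actual association algorithm used by the researchers; all coordinates are made up.

```python
import math

def nearest(part, candidates):
    """Return the index of the candidate point closest to `part`."""
    return min(range(len(candidates)),
               key=lambda i: math.dist(part, candidates[i]))

# Elbows of two different people, as (x, y) image coordinates (dummy data).
elbows = [(100, 150), (400, 160)]
# Two detected hands whose owners are initially unknown.
hands = [(110, 180), (390, 190)]

# Assign each hand to the person whose elbow lies closest to it.
assignment = [nearest(hand, elbows) for hand in hands]
```

Here the first hand is matched to the first person and the second hand to the second, because hands tend to appear near their owner’s elbow, which is exactly the kind of regularity the detector learns.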
3D body language
The dome cameras had captured the images in two dimensions, but the researchers combined them computationally to reconstruct the poses in 3D. The objective was to help the algorithms understand what each body posture looks like from different perspectives.
Once fed with all this data, the system can recognize a hand, an arm or a leg in a given position, even if some parts are in shadow or missing from an image: it combines that view with others taken from different perspectives to fill in the missing areas.
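The step from many 2D views to a 3D position can be shown with standard linear triangulation. This is a textbook sketch of the general multi-view geometry involved, not the researchers’ specific pipeline; the two camera matrices below are toy examples.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation: recover a 3D point from its 2D
    projections x1 and x2 in two cameras with projection matrices P1, P2."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The 3D point (in homogeneous coordinates) is the null vector of A.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

# Two toy cameras: one at the origin, one shifted one unit along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

# Project a known 3D point into both views, then recover it.
X_true = np.array([0.5, 0.2, 3.0, 1.0])
x1 = P1 @ X_true; x1 = x1[:2] / x1[2]
x2 = P2 @ X_true; x2 = x2[:2] / x2[2]

X_est = triangulate(P1, P2, x1, x2)
```

With hundreds of cameras instead of two, the same principle gives very robust 3D positions: a joint hidden from one camera is still visible to many others.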
After this training, neither hundreds of cameras nor any especially heavy processing are needed for the system to interpret body language: the information stored in its database allows the software to run with a single camera and a laptop.
Nor is the geodesic dome necessary, so the technology is mobile and accessible enough to be used under almost any conditions. The code is available for others to experiment with.
Advantages of virtual reality
Currently, virtual reality (VR) video games that track users’ movements require them to wear some kind of sensor device, such as gloves or stickers. With the new system created by these scientists, players would not have to wear any accessories: the program could recognize their movements without outside help.
The software also has applications in robotics, in particular to improve communication between machines and humans. A domestic robot, for example, would know where to go if you pointed in a direction with your finger, and could learn to interpret body language to tell whether you are happy or angry.
Likewise, machines could develop better social skills: recognizing the mood of the people around them and whether it is a good moment to interrupt their conversations or chores. Autonomous cars that detect the intentions of pedestrians and systems that analyze the movements of athletes are other future possibilities.
Although the day when robots are truly empathetic is still far off, artificial intelligence research means that, at least, they are learning to recognize emotions better. Humans are not exactly easy to understand: we can say one thing and convey the opposite with our body language.