Remember this article when Facebook releases a new pair of glasses with cameras, microphones, and proximity sensors in a few years. Today, we’ll take a look at some “Ego4D” research published by Facebook AI. For this study, Facebook AI collaborated with “a consortium of 13 universities and labs from nine countries” that gathered “2,200 hours of first-person video shot in the wild, featuring over 700 participants going about their daily lives.”
The goal of the Facebook AI researchers working on this project is to create artificial intelligence that “understands the world from this point of view”* to “unlock a new era of immersive experiences.” They’re particularly interested in how augmented reality (AR) glasses and virtual reality (VR) headsets will “become as useful in everyday life as smartphones.”
*Here, they’re referring to a first-person viewpoint: the researchers worked with video shot from a first-person perspective, rather than the third-person perspective of the photos and video that AI models are typically trained on.
For this project, researchers identified five “benchmark challenges” that capture what an egocentric AI would need to understand. To be clear, Facebook isn’t currently collecting this data from real-world devices; everything is documented through first-person-perspective videos. The five benchmarks are as follows (a toy sketch of the first one appears after the list):
• Episodic memory: What happened when?
• Forecasting: What am I likely to do next?
• Hand and object manipulation: What am I doing?
• Audio-visual diarization: Who said what when?
• Social interaction: Who is interacting with whom?
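To make the episodic memory task concrete, here is a minimal Python sketch. Everything in it (class names, fields, the toy lookup table) is an assumption made for illustration, not Facebook AI’s actual Ego4D API: the real benchmark pairs a natural-language question about first-person footage with a model that localizes the moment in the video that answers it.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch of the "episodic memory" benchmark ("what happened
# when?"). None of these names come from Facebook AI's Ego4D code; they
# only illustrate the shape of the task: a natural-language question
# about first-person video, answered with the temporal window of the
# video where the answer can be seen.

@dataclass(frozen=True)
class MemoryQuery:
    video_id: str   # which first-person recording to search
    question: str   # e.g. "Where did I last see my keys?"

@dataclass(frozen=True)
class TemporalWindow:
    start_sec: float  # the answer starts here in the video...
    end_sec: float    # ...and ends here

# Toy lookup table standing in for a trained model's predictions.
TOY_PREDICTIONS = {
    ("kitchen_cam_01", "Where did I last see my keys?"): TemporalWindow(42.0, 47.5),
}

def answer(query: MemoryQuery) -> Optional[TemporalWindow]:
    """Return the video window answering the query, or None if unknown.
    A real system would localize this window with a learned model."""
    return TOY_PREDICTIONS.get((query.video_id, query.question))

if __name__ == "__main__":
    window = answer(MemoryQuery("kitchen_cam_01", "Where did I last see my keys?"))
    print(window)  # TemporalWindow(start_sec=42.0, end_sec=47.5)
```

The other four benchmarks follow the same pattern: first-person video in, a structured prediction out (a forecast of the next action, the objects being manipulated, a who-said-what-when transcript, or a map of who is interacting with whom).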
According to Facebook AI’s Ego4D project announcement, the data set used for this research is 20 times larger than any other “in terms of hours of footage.”
Take a look at the research paper “Ego4D: Around the World in 3,000 Hours of Egocentric Video,” published on arXiv, to learn more about this project.
Source: about.fb | ai.facebook