An ox cleaver to kill a chicken? 3D gesture recognition is not the only choice for VR interaction
Ever since humans invented tools, we have needed some way to connect with them, and gripping a tool by its handle is one such connection. In the age of electronics, interaction has only grown in importance: the remote control for the air conditioner, the keyboard and mouse for the computer, the gamepad for the game console. Without a simple and effective way to interact, a tool loses its meaning for its user. With the rise of virtual reality, finding the right interaction technology has become a top priority for practitioners.

1. Why does VR need a new form of interaction?

Unlike other technology products, the VR experience emphasizes immersion, and that immersion is created by isolating the user from the outside world, especially sight and hearing, so that the brain is deceived into a sense of presence in a virtual world detached from the real one. This creates a new problem: the body becomes invisible. The hands, the most important organs for interaction, and the feet, the most important for movement, cannot act on the virtual environment, and the user is reduced to a spectator in the virtual world.

When virtual reality first took off, the novelty was overwhelming, and users mainly cared about whether VR could create a convincing sense of immersion; the demand for interaction was not yet strong. But humans have subjective initiative: we want to manipulate and control the virtual world in order to feel that we exist in it. As the industry developed and the freshness of VR wore off, users were no longer satisfied with merely feeling present in a virtual world and began to pursue deeper immersion, hoping to interact with the virtual world itself.

Unfortunately, the big manufacturers pursued the technical sophistication of the output devices first and invested their technology and energy there, so the development of interaction lagged behind, leaving a shortage of interaction methods. The Oculus Rift, for example, initially had to make do with an Xbox gamepad. Borrowing the interaction of traditional electronic devices such as gamepads solves the problem temporarily, but it pulls the user back out of the virtual world and greatly reduces immersion. The Samsung Gear VR headset, a relatively mature product, offers a good visual experience, but its control pad is mounted on the right side of the headset, which means the user has to keep the right hand raised at all times, invisibly destroying the sense of immersion.

In two-dimensional screen interaction, almost every control command can be abstracted into a keystroke. In virtual reality, users want natural interaction: they hope to interact with the virtual world the same way humans interact with the real world, with higher immersion, high efficiency, and a low learning cost. Finding a new and appropriate form of interaction for virtual reality has therefore become genuinely necessary.

2. Why is gesture recognition the most popular of the known forms of interaction?

So far, there is no mature and universal control and interaction method in the VR field.
Here is a rough list of the interaction forms currently advocated in the industry: interaction through "eye tracking", "motion capture", "electromyography", "tactile feedback", "voice", "gesture tracking", "sensors", and so on. Each of these forms has its own advantages, but each also has real defects.

Take eye tracking: although many companies are researching it, almost none of the solutions are satisfactory, and none can yet provide accurate, real-time feedback. Or motion capture: the capture equipment on the market is only used in certain heavyweight scenarios, and the user needs a long time to put it on and calibrate it before use; the biggest pain point of this approach is the lack of feedback, so the user finds it hard to feel that an operation has taken effect. Tactile feedback, in turn, cannot adapt to a wider range of application scenarios. The three major VR headset makers, Oculus, Sony, and HTC with the Vive, have all adopted motion controllers as their standard interaction mode, but this serves only highly specialized game-like applications or light consumer applications; it is a settle-for-second-best compromise on the vendors' part, made because the early consumers of VR headsets are basically gamers. As for voice interaction, understanding human language is itself a big problem: simple commands work, complex ones do not, and getting the machine to understand and accurately execute an instruction is a major challenge.

For humans, the most natural and effective way to interact is through body movement, because even when language fails, you can still make yourself understood through gestures and mime. In VR, limbs and gestures can cover most interaction scenarios, especially fixed scenes with light interaction and moving scenes with heavy interaction, where the advantages of gestures stand out. Hand motion recognition has thus become the most popular of the known forms of interaction.

3. Is hand motion recognition only 3D gesture recognition?

When hand motion recognition comes up, everyone thinks first of Leap Motion. In fact, Leap Motion's solution is not the only one for hand motion recognition, nor is its principle the only one possible. It is simply that Oculus' strong support for Leap Motion, together with the high exposure of the Oculus Rift, made Leap Motion's 3D gesture recognition well known to the public. 3D gesture recognition is not the only option in the VR interaction field; hand recognition can be divided into three types: two-dimensional hand recognition, two-dimensional gesture recognition, and three-dimensional gesture recognition.

Two-dimensional hand recognition

Two-dimensional hand recognition, also called static two-dimensional gesture recognition, identifies the simplest class of gestures. From two-dimensional image information alone, this technique can recognize a few static hand shapes, such as a clenched fist or an open palm. Its representative company is Flutter, which was acquired by Google a year ago; with Flutter's software, a user can control a media player with a few hand shapes. "Static" is the defining feature of this technology: it recognizes only the state of a gesture, not its continuous change. Used for rock-paper-scissors, for example, it can recognize the hand states for rock, scissors, and paper, but it knows nothing about any other gesture. At bottom this is a pattern-matching technology: a computer vision algorithm analyzes the image and compares it against preset image patterns to work out what a gesture means. Its shortcomings are obvious: only the preset states can be identified, scalability is poor, the sense of control is weak, and it supports only the most basic human-computer interaction.
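To make the pattern-matching idea concrete, here is a minimal sketch in Python with OpenCV. The template file names, the similarity threshold, and the idea that one grayscale template per hand shape is enough are all illustrative assumptions, not any particular vendor's pipeline.

```python
import cv2

# Hypothetical template images, one per supported static hand shape.
TEMPLATES = {
    "fist": cv2.imread("fist.png", cv2.IMREAD_GRAYSCALE),
    "open_palm": cv2.imread("open_palm.png", cv2.IMREAD_GRAYSCALE),
}

def classify_static_gesture(frame_gray, threshold=0.8):
    """Compare the frame against each preset template and return the
    best-matching label above the threshold, or None if nothing fits."""
    best_label, best_score = None, threshold
    for label, template in TEMPLATES.items():
        # Normalized cross-correlation: scores close to 1.0 mean a match.
        result = cv2.matchTemplate(frame_gray, template, cv2.TM_CCOEFF_NORMED)
        _, score, _, _ = cv2.minMaxLoc(result)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

The weakness described above falls straight out of this structure: any hand shape without a preset template simply returns None, and adding a gesture means authoring and tuning a new template.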
Two-dimensional gesture recognition

Two-dimensional gesture recognition is a step harder than two-dimensional hand recognition, but it still involves essentially no depth information and stays on the two-dimensional plane. This technology recognizes not only hand shapes but also some simple two-dimensional gestures, such as waving at the camera. Its representative companies are PointGrab, EyeSight, and ExtremeReality, all from Israel. Two-dimensional gesture recognition has a dynamic element: it can track the movement of the hand and therefore recognize complex actions that combine hand shapes with hand motion. This genuinely extends gesture recognition onto the two-dimensional plane: we can not only play or pause a player with a hand shape, but also perform operations such as forward and back, page up and page down, and scrolling, all of which require information about changing two-dimensional coordinates. Although this technology is no different from two-dimensional hand recognition in its hardware requirements, it extracts much richer human-computer interaction thanks to more advanced computer vision algorithms. The experience improves by a grade as well, from pure state control to relatively rich planar control.
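To make "tracking movement on the two-dimensional plane" concrete, here is a small sketch of one classic approach: segment the hand by color, track its centroid from frame to frame, and classify the dominant displacement as a swipe. The HSV skin range and the travel threshold are rough illustrative assumptions; a real system would calibrate them per user and per lighting condition.

```python
import cv2
import numpy as np

# Assumed HSV skin-color range; purely illustrative values.
SKIN_LOW = np.array([0, 40, 60], dtype=np.uint8)
SKIN_HIGH = np.array([25, 180, 255], dtype=np.uint8)

def hand_centroid(frame_bgr):
    """Segment the hand by color and return its centroid (x, y), or None."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, SKIN_LOW, SKIN_HIGH)
    m = cv2.moments(mask)
    if m["m00"] == 0:  # no skin-colored pixels found
        return None
    return (m["m10"] / m["m00"], m["m01"] / m["m00"])

def classify_swipe(centroids, min_travel_px=80):
    """Given the centroid track over a gesture, name the dominant motion."""
    (x0, y0), (x1, y1) = centroids[0], centroids[-1]
    dx, dy = x1 - x0, y1 - y0
    if max(abs(dx), abs(dy)) < min_travel_px:
        return None  # too little movement to count as a gesture
    if abs(dx) > abs(dy):
        return "swipe_right" if dx > 0 else "swipe_left"
    return "swipe_down" if dy > 0 else "swipe_up"
```

Everything here is still x and y: exactly the two-dimensional coordinate-change information the text describes, with no depth involved.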
Three-dimensional gesture recognition

The input to three-dimensional gesture recognition is information that includes depth, which lets it recognize all kinds of hand shapes, gestures, and motions. Compared with the two two-dimensional techniques above, 3D gesture recognition can no longer rely on a single ordinary camera, because a single ordinary camera provides no depth information. Obtaining depth requires special hardware. At present there are three main hardware approaches in the industry, which, combined with newer and more advanced computer vision algorithms, make three-dimensional gesture recognition possible.

1. Structured light

The representative application of structured light is PrimeSense's first-generation Kinect. The basic principle is to mount a laser projector with a grating carrying a specific pattern in front of it. As the laser passes through the grating it is refracted, so the pattern finally lands on the surface of the object with a displacement. When the object is closer to the projector, the displacement caused by refraction is smaller; when the object is farther away, the displacement correspondingly grows. A camera then detects and captures the pattern projected onto the object's surface, and from the displacement of the pattern an algorithm can calculate the object's position and depth, thereby reconstructing the whole three-dimensional space.

In the case of Kinect's structured-light technology, the refraction-induced displacement is not pronounced at very close range, so depth cannot be computed accurately when the object is too close. The best working range is therefore about 1 to 4 meters.

2. Time of flight

Time of flight is the technology used by SoftKinetic, which supplies Intel with a 3D camera for gesture recognition; the same hardware technique is also used in Microsoft's new-generation Kinect. The basic principle is to mount a light emitter whose photons are reflected back when they hit the surface of an object. A special CMOS sensor captures the photons that the emitter sent out and the object's surface reflected back, yielding the photon's flight time. From the flight time the flight distance can be derived (the photon travels out and back, so the distance to the object is half the round trip times the speed of light), and with it the object's depth. Computationally, time of flight is the simplest of the 3D gesture recognition approaches: obtaining depth requires no computer vision calculation at all.

3. Multi-camera imaging

The representative products of multi-camera imaging are Leap Motion's eponymous device and Usens' Fingo. The basic principle is to use two or more cameras to capture images simultaneously, the way humans observe the world with two eyes and insects with many, and to compare the differences between the images the cameras obtain at the same instant, using algorithms to compute depth information for multi-angle 3D imaging. Let us briefly explain with two-camera imaging. Dual-camera ranging calculates depth from geometric principles: two cameras shoot the current environment and produce two photos of the same scene from different viewpoints, in effect simulating how human eyes work. Since the parameters of the two cameras and their relative positions are known, as soon as we find the position of the same object (say, a maple leaf) in the two different pictures, an algorithm can calculate that object's distance from the cameras, and its depth follows.

Multi-camera imaging has the lowest hardware requirements of the 3D gesture recognition technologies, but it is also the hardest to implement: it needs no additional special equipment and relies entirely on computer vision algorithms to match the same target across the images. Compared with structured light and time of flight, with their disadvantages of high cost and high power consumption, multi-camera imaging can deliver a "cheap and cheerful" 3D gesture recognition effect.
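Here is a minimal sketch of that two-camera geometry. For a rectified stereo pair, depth follows from the disparity d, the horizontal shift of the same feature between the two images, as Z = f * B / d, where f is the focal length in pixels and B is the baseline between the cameras. The focal length and baseline values below are illustrative assumptions, not any real device's calibration.

```python
import numpy as np

def depth_from_disparity(disparity_px, focal_px=700.0, baseline_m=0.06):
    """Pinhole stereo triangulation: Z = f * B / d."""
    d = np.asarray(disparity_px, dtype=np.float64)
    # Zero disparity means the feature is at infinity (or was never matched).
    with np.errstate(divide="ignore"):
        return np.where(d > 0, focal_px * baseline_m / d, np.inf)

# Example: the same maple leaf appears at x = 320 in the left image and
# x = 300 in the right image, so the disparity is 20 pixels.
print(depth_from_disparity(20.0))  # -> about 2.1 meters
```

As the text says, the hard part is not this arithmetic but reliably deciding which pixel in one image corresponds to which pixel in the other; that matching step is where all the heavy computer vision work lives.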
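For comparison, the time-of-flight computation described earlier really is as simple as claimed: one multiplication and one division, with no image matching at all. The nanosecond figure below is just a worked example.

```python
SPEED_OF_LIGHT = 299_792_458.0  # meters per second

def tof_distance(round_trip_seconds):
    """Time of flight: the photon travels out to the object and back,
    so the one-way distance is c * t / 2."""
    return SPEED_OF_LIGHT * round_trip_seconds / 2.0

# Example: a round trip of about 6.67 nanoseconds puts the surface
# roughly 1 meter away.
print(tof_distance(6.67e-9))  # -> about 1.0 meter
```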
4. No need for an ox cleaver to kill a chicken: how should interaction in VR be chosen?

Light interaction

Mobile VR devices generally cannot run heavy-experience VR content, so their interaction requirements stay at a lightweight level. Using 3D gesture recognition for light VR interaction really is a bit of killing a chicken with an ox cleaver. The interface users touch most often is the 2D touchscreen, and most UI is still designed in 2D; for this kind of content the depth information that 3D gesture recognition adds is over-engineered. Few people can reach much farther than a meter in front of themselves, and within that range the depth information shows no meaningful difference from a 2D plane.

A simple gesture interaction system, a single ordinary camera plus edge recognition, can therefore meet the interaction requirements of most current VR scenes, lower the threshold of gesture interaction, and quickly popularize the very concept of interacting by gesture. If it can also cooperate with voice interaction, it can satisfy the interaction needs of light VR applications in the short term.

Heavy interaction

PC-based VR devices are expensive and technologically demanding, and they can run heavy-experience VR content, so the demands on interaction are heavyweight as well. Using 3D gesture recognition in heavy VR interaction is truly putting good steel on the blade's edge: it can satisfy the user's heavy interaction requirements while providing better feedback and deeper immersion. When the user stands inside a three-dimensional scene, it is impossible to interact with the objects in that scene without depth information. Oculus and the HTC Vive currently ship controller solutions, but 3D gesture interaction is actually the more natural and comfortable approach. For complex 3D scenes, 3D gesture interaction is indispensable, and more realistic, more immersive 3D scene experiences are the future of VR content. In heavy-experience VR content, the spatial depth information is more complex and the application scenes more varied; only 3D gesture recognition can properly meet the requirements for accuracy, latency, and immersion.

As for how things will develop from here, history is always written by the people, and the choice of consumers is the choice of technical direction.