Behind Intel's acquisition of Movidius: Why do we need a dedicated CV processing chip?


(Movidius Myriad 2, Credit: Movidius)

Editor's Note: The author of this article is Zhu Yucong, CEO of Ying Yuji, the company behind Hand CV, a gesture interaction product based on the mobile phone camera.

Background: The technology industry has seen another major acquisition. Chip giant Intel has acquired Movidius, a Silicon Valley startup in the computer vision field whose main products are low-power vision processors: the Myriad series of VPUs. In August of this year, Intel also acquired Nervana, a company that specializes in deep learning, for $350 million. That acquisition gave Intel deep learning IP and concrete products to meet the chip demand of AI development and data centers.

To understand why Intel has made such big moves in so short a time, start with CVPR 2016.

(CVPR is the IEEE Conference on Computer Vision and Pattern Recognition. Its topics include, but are not limited to, object recognition and detection, high-level semantic understanding of images, faces, optimization methods, correspondence, camera localization, and 3D map construction (SLAM). CVPR is the largest annual gathering in computer vision. In 2016 it received 2,145 submissions and accepted 643 papers, an acceptance rate of 29.9%, and drew 3,600 attendees.)

At this, the world's top computer vision conference, roughly 70% of the papers were related to deep learning, according to incomplete statistics. In image classification, object detection, and semantic segmentation, deep learning has significantly outperformed traditional algorithms. According to Microsoft Research Asia, even in areas where traditional methods are still relatively mainstream, such as 3D vision and low-level image processing, many researchers have proposed their own solutions based on deep learning.

Take gesture recognition as an example. Traditional recognition schemes are mostly based on color spaces such as RGB, HSV, and YCbCr. However, these algorithms cannot rule out interference from skin-colored objects, and dark skin reduces their recognition accuracy. Other algorithms recognize gestures by extracting hand contour features, such as the HoG+SVM classification method, but they still cannot improve recognition accuracy in dim light or backlight. With deep learning, for example by training R-CNN on a large set of labeled gesture images, the resulting model handles gesture recognition against complex backgrounds and in low light much better than traditional solutions.
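To make the weakness of color-space methods concrete, here is a minimal sketch of a per-pixel skin test in YCbCr. It is not the method of any product mentioned in this article. The conversion uses the standard BT.601 full-range (JPEG) coefficients. The Cb/Cr thresholds are a commonly cited heuristic range and are an assumption here, and the sample pixel values are hypothetical and chosen only for illustration.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit RGB to full-range YCbCr (BT.601 coefficients, as in JPEG)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


# Assumption: a commonly cited heuristic skin range in the Cb/Cr plane.
CB_RANGE = (77, 127)
CR_RANGE = (133, 173)


def looks_like_skin(r, g, b):
    """Return True if the pixel's chroma falls inside the assumed skin range."""
    _, cb, cr = rgb_to_ycbcr(r, g, b)
    return CB_RANGE[0] <= cb <= CB_RANGE[1] and CR_RANGE[0] <= cr <= CR_RANGE[1]


# Hypothetical sample pixels (illustrative values, not measured data).
SAMPLES = {
    "well-lit hand": (200, 150, 120),
    "skin-toned wooden desk": (190, 140, 100),
    "blue shirt": (40, 80, 200),
    "hand in dim light": (30, 24, 22),
}

for name, rgb in SAMPLES.items():
    print(f"{name:24s} -> skin? {looks_like_skin(*rgb)}")
```

In this sketch the wooden desk passes the test just like the well-lit hand, while the dimly lit hand is rejected. A fixed color threshold has no way to tell these cases apart, which is the gap a learned model is meant to close.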

Deep learning brings unprecedented advances to computer vision, but its hardware and data requirements are equally unprecedented. Training a set of gestures with an R-CNN network requires about 100,000 pre-labeled images. Training also places very high computational demands on the GPU, and the training time is considerable. Even after an R-CNN or Faster R-CNN model has been trained on a high-performance platform, running the recognition algorithm on low-compute platforms (mobile phones, tablets) cannot deliver real-time performance and a high recognition rate at the same time. For example, the YOLO object detection algorithm can reach 45 FPS on a high-performance platform, but its mAP is only about 63.4. Faster R-CNN, which is more accurate, runs at only 7 FPS.
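The speed side of this trade-off is easier to read as per-frame latency. The sketch below converts the two frame rates quoted above into milliseconds per frame and checks them against a real-time target. The 30 FPS target and the 10x mobile slowdown factor are assumptions made only for illustration. They are not measurements.

```python
from dataclasses import dataclass


@dataclass
class Detector:
    name: str
    fps: float  # frames per second on a high-performance platform

    @property
    def latency_ms(self) -> float:
        """Time spent on one frame, in milliseconds."""
        return 1000.0 / self.fps


def meets_budget(det: Detector, target_fps: float = 30.0, slowdown: float = 1.0) -> bool:
    """Check whether a detector still reaches target_fps after a slowdown.

    slowdown is a hypothetical factor for a weaker device (1.0 = same hardware).
    """
    return det.fps / slowdown >= target_fps


# Frame rates as quoted in the article.
yolo = Detector("YOLO", 45.0)
faster_rcnn = Detector("Faster R-CNN", 7.0)

for det in (yolo, faster_rcnn):
    print(f"{det.name}: {det.latency_ms:.1f} ms/frame, "
          f"real-time on desktop: {meets_budget(det)}, "
          f"with assumed 10x mobile slowdown: {meets_budget(det, slowdown=10.0)}")
```

Under these assumptions, YOLO needs about 22 ms per frame and Faster R-CNN about 143 ms. Only YOLO clears a 30 FPS target on the high-performance platform, and neither does once a tenfold slowdown is applied.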

Running deep learning algorithms on devices with very low GPU performance, such as mobile phones, is still a considerable challenge today. Only by optimizing the algorithms can they run on the current mainstream Android and iOS platforms. In monocular gesture recognition, for example, the solutions available worldwide come from the Israeli company eyesight, from Superbreality, and from our own Hand CV. Superbreality's solution relies more on recognizing gesture contours, while Hand CV's solution combines color space, contours, and YOLO-based deep learning. The approach of solving computer vision problems through machine learning is thus gradually moving onto mobile platforms such as phones.

Since deep learning improves recognition accuracy so clearly, the computer vision field also needs a dedicated low-power processing chip on mobile devices, much as the iPhone 5s added the M7 coprocessor. Such a chip can take work off the CPU and GPU and handle deep learning workloads more efficiently. It can be optimized at the chip level for the characteristics of convolutional neural networks, which would help deep-learning-based computer vision algorithms become widespread on mobile devices.

As mentioned in the background, Intel is already positioning itself in this direction. Meanwhile, upstream chip supplier NVIDIA released the Jetson TX1 GPU module at the end of last year, aimed mainly at the artificial intelligence market.

(The Jetson TX1 GPU module includes a 256-core Maxwell-based GPU with teraflop-level floating-point performance, a 64-bit ARM Cortex-A57 CPU, 4GB of LPDDR4 memory (25.6GB/sec bandwidth), and 16GB of local storage. It also provides 2x2 802.11ac Wi-Fi and a 1Gb Ethernet port, and comes with the Jetson Linux software development kit. The module measures only 50 mm x 87 mm, about the size of a credit card. Despite this very small form factor, its performance should not be underestimated.)

The first customers of the Jetson TX1 GPU module include technology giants such as Microsoft, Amazon, Google, and IBM, which will build it into their own drones or robotic devices so that artificial intelligence applications run smoothly. The following shows one application of the chip combined with deep learning: the Kespry drone.

(Kespry Drone: Video)

Deep learning has crossed some gaps in computer vision that once seemed insurmountable. Its inherent drawback, the need for high computing performance, will be resolved as hardware matures. As AI technology develops and attracts more attention, computer vision applications are no longer confined to industrial use. They are gradually entering the mass market, for example road sign and lane analysis in dashcams, and gesture recognition for phone-based mobile VR. Implementing these functions well requires deep learning algorithms that run within limited CPU and GPU performance. A dedicated CV processing chip will therefore be the next growth area for the entire chip manufacturing industry. This is why Intel acquired both Movidius, which produces vision processors, and Nervana, a deep learning company, within a short period. And because mobile phones, tablets, and unmanned aerial vehicles are sensitive to power consumption, a chip dedicated to CV workloads must also run at low power.

In short, the problems encountered in computer vision can be solved better by deep learning, and the problem of running deep learning on mobile platforms will eventually be solved by low-power CV processing chips. Real AI is getting closer and closer to us.
