
Tesla AI Day: “Simulation Training” for Autonomous Driving, and Musk’s Ulterior Motive


As expected, Tesla AI Day, which started nearly 40 minutes late, focused on three things: the vision-based autonomous driving stack, the DOJO supercomputer with its D1 chip, and the Tesla Bot humanoid robot.

The last of these has little to do with the present. According to Tesla’s plan, a Tesla Bot prototype will not arrive until 2022, and telling the opening of that story this early looks like a move to placate investors.

Next to the science fiction of “Star Wars”, Tesla’s vector analysis, labeling, and simulation computation for its vision-based driving stack sit much closer to reality. In the earlier choice of perception route for autonomous driving, the two camps of pure vision versus vision-radar fusion marked the biggest divide between Tesla and other carmakers.

In Tesla’s view, “driving by eyes alone” cuts hardware cost, but it demands a mass of artificial-intelligence computation and analysis to handle everything that comes up on the road. The absence of radar data raises the difficulty of the software geometrically: simplifying the hardware at the perception layer forces an upgraded perception system and far more complex computation at the execution layer.

This is the solution closest to how humans drive, with one difference: through a camera, the vast world we marvel at is nothing but a collection of pixels. To discriminate among them, the system has to analyze the relationships between pixels and, through labeling, group them into objects that downstream driving decisions can work with.

To avoid problems such as image distortion and insufficient frame coverage, Tesla upgraded from its original 2D image labels to 4D vector labels spanning space and time.
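To make the 4D idea concrete, here is a minimal Python sketch; every name in it (VectorLabel4D, Track, is_static) is hypothetical, not Tesla’s actual schema. The point is that a label describes one physical object with a 3D pose plus a timestamp, so the same object persists across frames and cameras, and static versus dynamic falls out of its trajectory.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class VectorLabel4D:
    """A hypothetical space+time label: one physical object, not one pixel blob."""
    object_id: int          # stable identity across frames and cameras
    category: str           # e.g. "car", "pedestrian", "lane_boundary"
    x: float                # position in the vehicle/world frame (meters)
    y: float
    z: float
    t: float                # timestamp (seconds) -- the 4th dimension

@dataclass
class Track:
    """An object's trajectory: the sequence of its 4D labels over time."""
    object_id: int
    states: List[VectorLabel4D]

    def is_static(self, tol: float = 0.5) -> bool:
        """Crude static/dynamic split: did the object move more than tol meters?"""
        first, last = self.states[0], self.states[-1]
        moved = ((last.x - first.x) ** 2 + (last.y - first.y) ** 2) ** 0.5
        return moved < tol
```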

Today a Tesla carries 8 body cameras, each delivering 1280×960, 12-bit HDR images at 36 frames per second. The network fuses surrounding objects across views, separates static from dynamic elements and traces object boundaries along their trajectories in time, and leans on multi-head network branches, camera calibration, caching, queues, and other optimizations to simplify the neural-network computation.

Whether facing a semi-trailer too long to fit in a single view or an intersection with occluded boundaries, Tesla builds a rich street-level picture out of multi-angle imagery, data analysis, Transformer-based distance prediction, and the overlaying of different features, giving the downstream computation a solid perceptual basis.
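As a rough illustration of cross-camera fusion (a toy stand-in, not Tesla’s network), the PyTorch sketch below tags per-camera feature tokens with a learned camera embedding and lets a transformer encoder attend across all of them, so an object only partially visible to each camera can still be reasoned about as a whole:

```python
import torch
import torch.nn as nn

class MultiCamFusion(nn.Module):
    """Toy multi-camera fusion: 8 cameras -> shared token space -> transformer."""
    def __init__(self, n_cams: int = 8, feat_dim: int = 256):
        super().__init__()
        # Learned embedding telling the model which camera a token came from.
        self.cam_embed = nn.Embedding(n_cams, feat_dim)
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, cam_feats: torch.Tensor) -> torch.Tensor:
        # cam_feats: (batch, n_cams, n_tokens, feat_dim) from per-camera backbones
        b, c, n, d = cam_feats.shape
        cam_ids = torch.arange(c, device=cam_feats.device)
        tokens = cam_feats + self.cam_embed(cam_ids)[None, :, None, :]
        tokens = tokens.reshape(b, c * n, d)   # all cameras in one sequence
        return self.fusion(tokens)             # cross-camera attention

# Usage: fused = MultiCamFusion()(torch.randn(2, 8, 64, 256))
```

In a real stack the fused tokens would feed per-task output heads (the multi-head branches mentioned above); this sketch stops at the fused representation.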

So far, Tesla has applied this labeling to 1 billion images across 300 million distinct scenes, but for fully autonomous driving these labels are nowhere near enough.

To handle that volume, Tesla says it now employs a 1,000-person data-labeling team that works with its engineers on a fully custom labeling and analysis infrastructure. As efficiency has risen, Tesla has collected data over the same road many times, retired the old red-and-yellow “bounding boxes”, split environment scenes into point clouds, and uploaded them to the cloud, producing measured scenes that approach a “high-precision map”.
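A minimal sketch of what fusing repeated passes over one road might look like, assuming each drive supplies a local point cloud and an estimated pose; the function names and the 20 cm voxel size are illustrative, not Tesla’s pipeline:

```python
import numpy as np

def accumulate_drives(drives, poses):
    """Fuse point clouds from repeated passes over the same road.

    drives: list of (N_i, 3) arrays, points in each drive's local frame
    poses:  list of (4, 4) homogeneous transforms, local frame -> shared world frame
    Returns one combined (sum N_i, 3) cloud in the shared frame: the raw
    material for an HD-map-like reconstruction.
    """
    world_points = []
    for pts, T in zip(drives, poses):
        homo = np.hstack([pts, np.ones((len(pts), 1))])  # (N, 4) homogeneous
        world_points.append((homo @ T.T)[:, :3])         # apply the pose
    return np.vstack(world_points)

def voxel_downsample(cloud, voxel=0.2):
    """Deduplicate overlapping passes by keeping one point per 20 cm voxel."""
    keys = np.floor(cloud / voxel).astype(np.int64)
    _, idx = np.unique(keys, axis=0, return_index=True)
    return cloud[idx]
```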

The D1 chip arrives to power “simulation training” for autonomous driving

The harder problem in autonomous driving is coping with extremely complex road conditions. Routine collection builds up the label library quickly, but the shifting weather and the dynamics of pedestrians and vehicles met in real driving cannot be covered by engineers’ road tests alone. “Our data identification comes from simulated images. We need extreme conditions that are hard for human road-test engineers to collect, so we use different 3D simulated road scenes to gather the relevant data.”

By recording driving scenes as short clips, Tesla can gather 10,000 clips of comparably harsh environments every week, and automatic labeling ultimately yields accurate distance perception.

On top of that, in Autopilot simulation tests the computer can precisely label and place virtual vehicles and pedestrians, dropping them into all kinds of weather and scenarios. Through large-scale simulation training, “computer training” for autonomous driving becomes achievable.
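One way to picture that coverage is as a parameterized scenario grid. The sketch below is an invented toy (SimScenario and scenario_grid are not Tesla names), but it shows the key property of simulation: because the scene is constructed, every rendered frame comes with perfect ground-truth labels for free.

```python
import itertools
import random
from dataclasses import dataclass

@dataclass
class SimScenario:
    """Hypothetical simulation recipe; ground truth is known by construction."""
    weather: str        # "clear", "rain", "fog", "snow"
    time_of_day: str    # "day", "dusk", "night"
    n_vehicles: int
    n_pedestrians: int
    seed: int           # makes each variant reproducible

def scenario_grid(n_variants_per_cell: int = 3):
    """Enumerate weather x lighting, then randomize agent counts in each cell."""
    rng = random.Random(0)
    for weather, tod in itertools.product(
        ["clear", "rain", "fog", "snow"], ["day", "dusk", "night"]
    ):
        for _ in range(n_variants_per_cell):
            yield SimScenario(
                weather=weather,
                time_of_day=tod,
                n_vehicles=rng.randint(0, 30),
                n_pedestrians=rng.randint(0, 15),
                seed=rng.randrange(2**31),
            )
```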

At present, Tesla’s in-car network has been trained on 371 million images, yielding 480 million labels. Beyond dynamic objects such as people and cars, Tesla will go on to detect static objects and road topology, cover more vehicles and pedestrians, and add reinforcement learning. These hundreds of millions of training tasks will run in the DOJO supercomputing center on the D1 chip, and they can proceed smoothly now that the chip has been successfully developed and put into production.

It is understood that the D1 chip is built on TSMC’s 7nm process and delivers 362 TFLOPS of compute, with on-chip bandwidth of 10 TB/s and off-chip bandwidth as high as 4 TB/s.

25 of these chips can be assembled at wafer level into a training tile whose interconnect runs at 36 TB/s and whose compute reaches 9 PFLOPS. That means the new tile not only cuts latency sharply but also offers computing power far beyond existing products on the market; only a few dozen such tiles would match today’s top supercomputers.

In theory, 120 of these D1 training tiles combined into a server cluster would exceed 1.1 EFLOPS of compute. Against comparable products in the industry, Tesla claims 4x the performance, 1.3x the energy efficiency, and one fifth the physical footprint.
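The quoted figures are internally consistent, as a quick back-of-the-envelope check shows:

```python
# Sanity-checking the article's numbers (rounded).
d1_tflops = 362                       # one D1 chip
tile_pflops = 25 * d1_tflops / 1000   # 25 chips per training tile
print(tile_pflops)                    # 9.05 -> the quoted ~9 PFLOPS

cluster_eflops = 120 * tile_pflops / 1000  # 120 tiles per cluster
print(cluster_eflops)                 # 1.086 -> the quoted ~1.1 EFLOPS
```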

Efficient millisecond-level decisions make the driving experience tighter and smoother

Repeated simulation training, real-road data collection, and system training all widen the search and selection space of the vehicle’s decision layer. Ashok Elluswamy, Tesla’s director of Autopilot software, said Tesla uses a hybrid decision-making system: perception data first passes through a coarse search in vector space, and continuous optimization then refines the result into a smooth motion trajectory.

For example, when crossing an intersection shared with other vehicles turning left, going straight, or overtaking, the car has several decisions to choose from, such as slowing down and changing lanes early, accelerating and changing lanes late, or stopping to yield. Picking the better plan for the current road conditions relies on that coarse search.

Within 1.5 milliseconds, Tesla can search 2,500 lane-change timings, whittle the many alternatives down to a relatively smooth trajectory, and finally have the vehicle change lanes promptly while balancing comfort and safety.
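A toy sketch of that coarse-search-then-score idea (my own construction, not Tesla’s planner): enumerate candidate lane-change start times, roll out a simple trajectory for each, and keep the candidate with the lowest combined comfort-and-safety cost.

```python
import numpy as np

def rollout(t_change, horizon=6.0, dt=0.1):
    """Toy rollout: lateral offset follows a smooth 2 s ramp starting at t_change."""
    t = np.arange(0.0, horizon, dt)
    progress = np.clip((t - t_change) / 2.0, 0.0, 1.0)
    lateral = 3.5 * (3 * progress**2 - 2 * progress**3)  # smoothstep to next lane
    return t, lateral

def score(t, lateral, gap_closes_at=4.0):
    """Penalize jerk (comfort) and finishing after the traffic gap closes (safety)."""
    jerk = np.diff(lateral, n=3)
    comfort_cost = float(np.sum(jerk**2))
    done = t[np.argmax(lateral >= 3.4)] if lateral.max() >= 3.4 else np.inf
    safety_cost = 1e3 if done > gap_closes_at else 0.0
    return comfort_cost + safety_cost

# Coarse search: try 2,500 candidate lane-change timings, keep the cheapest.
candidates = np.linspace(0.0, 4.0, 2500)
best = min(candidates, key=lambda tc: score(*rollout(tc)))
print(f"best lane-change start: {best:.2f} s")
```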

In another case, the vehicle meets two-way traffic on a stretch wide enough for only a single car. Facing the first oncoming car, Tesla decides to slow down and keep edging forward. When a second oncoming car appears, Tesla chooses to pull aside and stop; the oncoming vehicle stops at the same moment, so Tesla decisively revises its decision and sets off again through the section.

On these tricky stretches, Tesla’s decision-making and planning lean toward how humans actually drive; the ultimate aim is to use fast route selection and driving decisions to maintain speed while giving users a safer, smoother product.

By comparison, a certain distance still separates Tesla’s perception and decision-making from the products other autonomous-driving companies have shown. And returning to the choice of perception route: without hardware such as high-precision maps and lidar, how long this neural-computing advantage can last, and whether the cars of the future can truly out-drive humans, remain problems Tesla must solve, and the key to whether it keeps its lead.

Of course, Musk’s attention may not really be here. In this period of rapid growth for Tesla’s technology, how to attract talent and give the market confidence to invest looks like the real significance of this AI Day.
