Decoding the "end-to-end" concept everyone is talking about: a simple guide to the latest trend in automotive technology

Since Tesla unveiled its end-to-end intelligent driving system, "FSD V12 beta", in the United States in August 2023, end-to-end has become the hottest topic in the automotive industry. As Huawei, Tucki, NIO, Li Auto and other companies have followed suit, almost every domestic manufacturer now devotes a large part of its launch events to promoting how powerful its end-to-end system is. Setting the marketing aside, what does "end-to-end" actually mean, and what does it mean for intelligent driving? Let's explain.

Before end-to-end intelligent driving systems reached mass production, every intelligent driving system used a modular design. Put simply, modular intelligent driving works like an assembly line with four main stages: perception, prediction, planning and control. First, the perception module processes data from the vehicle's sensors, such as radar and cameras, works out the precise positions and trajectories of the objects around the vehicle, and classifies them as pedestrians, bicycles, cars or trucks.

The perception module then passes this information to the prediction module, which estimates what the surrounding traffic participants will do next, for example whether nearby vehicles will turn, go straight or stop. Based on this analysis, the prediction module proposes one or more candidate paths and speeds for the vehicle.

The prediction module then sends these candidate plans to the planning module, which decides what the vehicle should actually do next based on the vehicle's own state, its navigation route and other information. Once the planning module has settled on a path and speed, it sends the command to the control module, which calculates and applies the required steering, braking and throttle inputs. Even a seemingly simple intelligent driving function is realized through all of these steps.
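To make the assembly line concrete, here is a minimal Python sketch of one pass through the four modules. Everything in it is a hypothetical simplification: perception starts from already-detected (kind, distance, speed) tuples rather than raw radar and camera data, prediction assumes every object keeps a constant speed, planning uses a single made-up gap rule, and control is a plain proportional controller. The names `perceive`, `predict`, `plan`, `control` and all numeric thresholds are illustrative, not any carmaker's code.

```python
from dataclasses import dataclass

@dataclass
class Obstacle:
    kind: str        # "pedestrian", "bicycle", "car" or "truck"
    distance: float  # metres ahead of the ego vehicle
    speed: float     # obstacle speed along the lane, m/s

@dataclass
class Command:
    steering: float  # radians; 0.0 means straight ahead
    throttle: float  # 0.0 to 1.0
    brake: float     # 0.0 to 1.0

def perceive(detections):
    """Perception (simplified): turn (kind, distance, speed) tuples into typed obstacles."""
    return [Obstacle(kind, dist, spd) for kind, dist, spd in detections]

def predict(obstacles, ego_speed, horizon=2.0):
    """Prediction: constant-velocity estimate of the gap to each obstacle after `horizon` s."""
    return [o.distance + (o.speed - ego_speed) * horizon for o in obstacles]

def plan(future_gaps, ego_speed, cruise_speed=25.0, safe_gap=20.0):
    """Planning: cruise if the road stays clear, otherwise slow down by 5 m/s."""
    if future_gaps and min(future_gaps) < safe_gap:
        return max(0.0, ego_speed - 5.0)
    return cruise_speed

def control(target_speed, ego_speed, gain=0.1):
    """Control: proportional controller mapping speed error to throttle or brake."""
    error = target_speed - ego_speed
    if error >= 0:
        return Command(steering=0.0, throttle=min(1.0, gain * error), brake=0.0)
    return Command(steering=0.0, throttle=0.0, brake=min(1.0, -gain * error))

def drive_step(detections, ego_speed):
    """One pass through the line: perception -> prediction -> planning -> control."""
    obstacles = perceive(detections)
    gaps = predict(obstacles, ego_speed)
    target = plan(gaps, ego_speed)
    return control(target, ego_speed)

# A slower car 30 m ahead: the predicted gap shrinks to 10 m, so the planner slows down.
print(drive_step([("car", 30.0, 10.0)], ego_speed=20.0))
# Command(steering=0.0, throttle=0.0, brake=0.5)
```

Each function consumes only the previous module's output, which is what lets separate teams own separate modules.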

As this shows, modular intelligent driving breaks driving behavior down into multiple steps, each with its own clear logic. Car companies and suppliers regard this as a very good approach, because different teams can own different modules, take full advantage of the division of labor, and move intelligent driving quickly from concept to mass production.

In addition, because a modular system has a clear framework in which every module has a defined function and responsibility, when a bug shows up in real-world use, car companies and suppliers can quickly trace it to its cause and fix it through an OTA update. For example, if the vehicle brakes unexpectedly at high speed, data analysis lets the carmaker determine whether the fault lies in bad data from the perception module or in a wrong judgment by the prediction and planning modules.
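Because each module hands a well-defined output to the next, a recorded log can be checked stage by stage. The sketch below assumes a hypothetical log format in which every stage's output is stored alongside a reference value (for example, hand-labelled ground truth). The first stage that disagrees is the most likely root cause, since later stages merely inherit its error.

```python
STAGES = ("perception", "prediction", "planning", "control")

def first_faulty_stage(recorded, reference):
    """Return the first stage whose logged output disagrees with the reference, or None."""
    for stage in STAGES:
        if recorded[stage] != reference[stage]:
            return stage
    return None

# Hypothetical log of a phantom-braking event on a clear highway.
recorded = {
    "perception": ["car at 30 m"],  # the system "saw" a car that is not there
    "prediction": [10.0],           # predicted gap in metres
    "planning": 15.0,               # target speed in m/s
    "control": 0.5,                 # brake command
}
reference = {
    "perception": [],               # ground truth: the road is empty
    "prediction": [],
    "planning": 25.0,
    "control": 0.0,
}
print(first_faulty_stage(recorded, reference))  # perception
```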

Although modular intelligent driving is convenient for mass production and bug fixing, it has to absorb a great many traffic rules and a great deal of driving experience before it can control a vehicle the way a human does, and all of this depends on engineers defining rules in advance, that is, turning traffic rules and driving experience into lines of code. But can engineers write code that covers every real-world driving scenario? Of course not. The industry has a classic example of this. Suppose you are driving down a narrow street lined with parked cars on both sides and a balloon suddenly floats out from one side. Common sense says a child may be about to run out after it, so the vehicle should brake immediately. Put the same scene on a highway, however, and an intelligent driving system that still brakes immediately is likely to cause a rear-end collision. In other words, unless engineers write a rule for this scenario in advance, for example telling the system not to brake for a balloon at highway speed, the intelligent driving system becomes a safety risk whenever it meets a similar situation.
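The balloon case can be written down as hand-coded rules to show why the rule count keeps growing. This is a hypothetical illustration: `balloon_rule_v1` is the naive rule written with narrow urban streets in mind, and `balloon_rule_v2` is the patch an engineer would have to add once the highway case is discovered. The 80 km/h threshold is an arbitrary assumption.

```python
def balloon_rule_v1(balloon_detected: bool) -> str:
    """Naive rule: a balloon may mean a child is about to run out, so always brake."""
    return "brake" if balloon_detected else "continue"

def balloon_rule_v2(balloon_detected: bool, road_type: str, speed_kmh: float) -> str:
    """Patched rule: the same balloon now needs context to decide."""
    if not balloon_detected:
        return "continue"
    if road_type == "highway" and speed_kmh > 80.0:
        return "continue"  # hard braking here risks a rear-end collision
    return "brake"

print(balloon_rule_v1(True))                    # brake (even at 110 km/h on a highway)
print(balloon_rule_v2(True, "highway", 110.0))  # continue
```

Every new context (rain, a truck close behind, a balloon on a slip road) needs yet another branch, which is how rule counts climb into the hundreds of thousands.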

According to Xpeng Motors, a reasonably mature mass-production intelligent driving system contains about 100,000 rules, while bringing such a system close to human level would require writing about 1 billion rules by hand. For software development, that is practically impossible. This is why traditional intelligent driving systems make mistakes to a greater or lesser degree in daily use, forcing drivers to take over.

For these reasons, car companies focused on autonomous driving have long tried to remove traditional intelligent driving's dependence on preset rules, and the end-to-end approach is the result. End-to-end essentially means replacing all the traditional sub-modules of perception, prediction, planning and control with neural networks, that is, replacing traditional algorithms and hand-written rules with learned models.

As a result, the end-to-end workflow is very different from the traditional modular one. The modular workflow runs through perception, prediction, planning and control in sequence, whereas the end-to-end workflow goes from sensor data (radar, cameras) through a neural network straight to driving parameters (steering, throttle, brake). In other words, the work of the traditional perception, prediction, planning and control modules is all done by the neural network.
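Structurally, an end-to-end system is one function from sensor data to driving parameters. The pure-Python sketch below shows only that interface: a tiny two-layer network (a stand-in for the far larger vision models used in practice) maps a hypothetical flat vector of sensor features to steering, throttle and brake. Its weights are random, so until it is trained the outputs carry no meaning; the point is that no separate perception, prediction or planning step appears anywhere in the code.

```python
import math
import random

def sigmoid(z):
    """Numerically stable squash of any real number into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def dense(x, layer, activation):
    """Fully connected layer: activation(W x + b), one value per output neuron."""
    weights, biases = layer
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

class EndToEndPolicy:
    """Sensor features in, driving parameters out; no hand-written rules in between."""

    def __init__(self, n_inputs, n_hidden=16, seed=0):
        rng = random.Random(seed)
        self.hidden = init_layer(n_inputs, n_hidden, rng)
        self.output = init_layer(n_hidden, 3, rng)

    def __call__(self, sensor_features):
        h = dense(sensor_features, self.hidden, math.tanh)
        steer, throttle, brake = dense(h, self.output, lambda z: z)
        return {
            "steering": math.tanh(steer),   # -1 to 1 (sign convention assumed)
            "throttle": sigmoid(throttle),  # 0 to 1
            "brake": sigmoid(brake),        # 0 to 1
        }

policy = EndToEndPolicy(n_inputs=4)
print(policy([0.2, -0.1, 0.7, 0.0]))  # untrained, so the numbers are meaningless
```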

As the workflow shows, the core technology in end-to-end driving is the neural network, which is also the foundation of today's AI. Over the past two years, AI has performed impressively in speech, text, images and video, as most people will have noticed. Applying neural networks to cars means the intelligent driving system can be trained continuously so that it learns to cope with increasingly complex driving environments.

At the functional level, then, the biggest change end-to-end brings is that the system can learn on its own, something traditional modular intelligent driving cannot do. When faced with all kinds of unexpected real-world driving scenarios, an end-to-end system can work out an appropriate response through its neural network instead of relying on rules written by hand in advance, which offers a way to handle the endless variety of real driving scenarios. For example, Tesla FSD V11, which was not end-to-end, contained more than 300,000 lines of code, while the end-to-end FSD V12 cut this to 2,000 lines, yet V12 drives more like a human than V11.
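The idea of "learning instead of writing rules" can be illustrated with a toy form of learning from human demonstrations (behaviour cloning). Assume we log, for each moment, the time headway to the car in front and whether the human driver braked; a one-feature logistic regression fitted by gradient descent then finds a braking threshold by itself, even though no engineer typed one in. The data, the single feature and the training settings are hypothetical, and real end-to-end systems learn from video at vastly larger scale.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Hypothetical demonstrations: (time headway in seconds, did the human brake? 1 = yes)
demos = [(0.5, 1), (1.0, 1), (1.5, 1), (1.8, 1),
         (2.2, 0), (2.5, 0), (3.0, 0), (4.0, 0)]

def train(data, lr=0.5, epochs=5000):
    """Fit p(brake) = sigmoid(w * headway + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = sigmoid(w * x + b) - y  # derivative of log loss w.r.t. the logit
            grad_w += err * x / n
            grad_b += err / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

w, b = train(demos)

def should_brake(headway_s):
    """Use the learned model in place of a hand-written threshold rule."""
    return sigmoid(w * headway_s + b) > 0.5

print(f"learned threshold: about {-b / w:.2f} s")
print(should_brake(0.8), should_brake(3.5))  # True False
```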

In theory, end-to-end is an ideal technology, but in practice it is not yet that reliable. This is because our understanding of neural networks is still limited, which is why they are often called "black boxes". As the figure above illustrates, with a white box we understand exactly the causal logic linking the system's inputs and outputs, but once the input passes through a black box, nobody can explain why it turned into that particular output.

For example, when a modular intelligent driving system makes an obvious logical error, the carmaker can quickly pinpoint the module at fault and write a new rule by hand. With an end-to-end system, however, the carmaker cannot tell which parameter or structure inside the complex neural network is responsible.

For this reason, an end-to-end intelligent driving system can sometimes make sensible decisions in very complicated scenarios, yet at other times make very basic mistakes, such as failing to recognize traffic lights, which is why some describe it as having "a very high ceiling and a very low floor". Given these risks, the end-to-end systems launched by Huawei and Tucki do not rely entirely on learning; they still include many hand-written rules as a safety net for the system.
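A hybrid design of this kind can be sketched as a rule layer wrapped around the network's output. This is a hypothetical illustration, not Huawei's or Tucki's actual architecture: the scene keys, the 50 m distance and the 0.6 brake level are all assumptions. The network proposes a command, and a hand-written rule overrides it whenever a red light is close.

```python
import math

def apply_safety_rules(command, scene):
    """Hypothetical hand-written rule layer applied on top of a neural policy's output."""
    safe = dict(command)  # copy so the network's original output is left untouched
    near_stop_line = scene.get("distance_to_stop_line_m", math.inf) < 50.0
    if scene.get("traffic_light") == "red" and near_stop_line:
        safe["throttle"] = 0.0
        safe["brake"] = max(safe["brake"], 0.6)
    return safe

neural_output = {"steering": 0.0, "throttle": 0.4, "brake": 0.0}  # network missed the red light
print(apply_safety_rules(neural_output, {"traffic_light": "red", "distance_to_stop_line_m": 30.0}))
# {'steering': 0.0, 'throttle': 0.0, 'brake': 0.6}
```

The learned policy still drives in the common case; the rule only bounds its worst-case behaviour, which is the trade-off between the "high ceiling" and the "low floor".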

In terms of technology trends, end-to-end is clearly the future direction of high-level intelligent driving. However, because our understanding of neural networks is still incomplete, the end-to-end systems of various companies still fall short of the ideal at this stage. Moreover, unlike existing modular systems, how capable an end-to-end model is depends heavily on training with massive amounts of real-world data. Only after training on massive data can a neural network grow from a small model into a usable large model, which means high-level intelligent driving demands enormous computing power and data during development, effectively raising the barrier to entry. This is also why some owners report that their cars drive worse than before after upgrading to end-to-end intelligent driving: it is the growing pain of training large models. Since brands differ greatly in how much end-to-end training they can do, the performance gap between different brands' intelligent driving systems may gradually widen in the future.