Wang Liang, Chief R&D Architect for Baidu’s Intelligent Driving Group (IDG), offered his take on FSD v12’s touted end-to-end learning. “The term ‘end-to-end’ has gained popularity with Tesla FSD V12, but internally, we use it with caution,” Wang Liang said flatly. “We believe it’s not truly end-to-end, as there’s still a significant layered structure within the system.”
So what exactly is Wang Liang getting at? Well, he argues that despite Tesla’s shift to a more data-driven decision model, there’s still a fairly traditional visual perception pipeline feeding that model: “Tesla has already established strong visual perception capabilities, as demonstrated in Tesla FSD v11. This latest step primarily replaces rule-based decision-making with a data-driven approach,” he explained. “I personally believe in this direction.”
But under the hood, Wang Liang believes FSD v12 is still leaning on Tesla’s robust computer vision models to extract road structure and objects before sending that processed data to the driving policy model. It’s not a true single neural net making decisions straight from raw sensor data. “If other players rush to emulate Tesla and adopt an end-to-end approach without having reached a similar level of visual perception capability, it’s unclear how much data and computing power would be required to train such a complex and opaque system,” Wang Liang warned. “It would be an incredibly challenging task, even more so than building an atomic bomb.”
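To make that distinction concrete, here’s a minimal sketch in plain Python. The names (`perception_model`, `policy_model`, and so on) are hypothetical stand-ins, not Tesla’s actual components: in the layered design Wang Liang describes, a perception stage emits a structured scene that a separate policy consumes, whereas a “true” end-to-end system would be a single opaque mapping from raw pixels to controls.

```python
# Illustrative sketch only: hypothetical types and names, not Tesla's architecture.
# Contrasts a layered pipeline (perception -> structured scene -> policy) with a
# single raw-pixels-to-controls mapping.

from dataclasses import dataclass
from typing import List


@dataclass
class SceneObject:
    kind: str           # e.g. "car", "pedestrian"
    distance_m: float   # range to the object
    bearing_deg: float  # angle relative to the ego vehicle


@dataclass
class Scene:
    lane_curvature: float        # simplified road-structure summary
    objects: List[SceneObject]   # detected obstacles


@dataclass
class Controls:
    steering: float  # -1.0 (full left) .. 1.0 (full right)
    throttle: float  # 0.0 .. 1.0


def perception_model(raw_pixels: List[float]) -> Scene:
    """Stand-in for a vision stack: raw pixels -> structured scene description."""
    return Scene(lane_curvature=0.02,
                 objects=[SceneObject("car", distance_m=35.0, bearing_deg=-2.0)])


def policy_model(scene: Scene) -> Controls:
    """Stand-in for a learned driving policy operating on the structured scene."""
    throttle = 0.3 if all(o.distance_m > 20.0 for o in scene.objects) else 0.0
    return Controls(steering=scene.lane_curvature, throttle=throttle)


def layered_drive(raw_pixels: List[float]) -> Controls:
    """The layered structure Wang Liang describes: perception feeds the policy."""
    return policy_model(perception_model(raw_pixels))


def end_to_end_drive(raw_pixels: List[float]) -> Controls:
    """A 'true' end-to-end system: one opaque mapping from pixels to controls."""
    # In practice this would be a single large neural network.
    return Controls(steering=0.02, throttle=0.3)


if __name__ == "__main__":
    frame = [0.0] * 16  # placeholder for a camera frame
    print(layered_drive(frame))
    print(end_to_end_drive(frame))
```

The point of the toy example is structural: in the layered version, the interface between perception and policy is an explicit, inspectable scene description, which is exactly the “significant layered structure” Wang Liang says still exists inside FSD v12.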
LiDARs Loaded: Wang Liang Concedes the Near-Term Value of Hybrid Perception
Another huge topic dominating the autonomous vehicle sphere is the bitter LiDAR vs pure vision sensor debate, with Tesla steadfastly rebuffing LiDAR while most other major players embrace it—at least for now.
Wang Liang lands somewhere in the middle: “LiDAR offers a more readily achievable solution in the short term. For Chinese autonomous vehicle players, incorporating LiDAR significantly reduces the time needed to develop a mature perception model, allowing for quicker deployment,” he said. “This approach is not contradictory to pursuing pure vision systems.”
His rationale? LiDAR returns are sparser and carry less semantic detail than camera imagery, but they provide direct range measurements, and combining the two gets you closer to Level 4 “mind off” autonomy much faster than holding out for pure vision to catch up. But if you’re only shooting for Level 2 driver assistance? The cost of LiDAR may not justify the modest gains. “If the goal remains L2+ autonomy, the cost of LiDAR might not be justifiable, as the return on investment wouldn’t be significant,” he noted.
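For a sense of why that shortcut matters, here’s a minimal, hypothetical late-fusion sketch (the names `CameraDetection`, `LidarPoint`, and `fuse_depth` are illustrative, not any vendor’s API): because LiDAR reports measured range directly, a camera detection can be assigned trustworthy depth with a simple geometric match, whereas a pure-vision stack has to learn that depth from images over far more data and training time.

```python
# Illustrative late-fusion sketch, not Baidu's implementation; all names are
# hypothetical. LiDAR supplies measured range, so a camera detection can be
# given depth without a learned vision-only depth estimator.

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CameraDetection:
    label: str
    bearing_deg: float  # direction of the detection from the camera


@dataclass
class LidarPoint:
    bearing_deg: float  # direction of the LiDAR return
    range_m: float      # measured distance


def fuse_depth(det: CameraDetection,
               cloud: List[LidarPoint],
               tolerance_deg: float = 2.0) -> Optional[float]:
    """Attach a measured LiDAR range to a camera detection by matching bearing."""
    candidates = [p.range_m for p in cloud
                  if abs(p.bearing_deg - det.bearing_deg) <= tolerance_deg]
    return min(candidates) if candidates else None  # nearest return wins


if __name__ == "__main__":
    detection = CameraDetection(label="pedestrian", bearing_deg=5.0)
    cloud = [LidarPoint(bearing_deg=4.5, range_m=18.2),
             LidarPoint(bearing_deg=30.0, range_m=60.0)]
    print(detection.label, "at", fuse_depth(detection, cloud), "m")
```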
So in typical Big Tech fashion, Baidu seems to be preaching a strategic “cross-modal” approach, hedging its bets on both LiDAR and vision until one definitively takes the lead.
But one thing’s for sure: Thoughtful insights like Wang Liang’s cut through the endless stream of breathless autonomous vehicle hype and give us a more nuanced, reality-grounded perspective. And in this rapidly evolving field, a little pragmatic nuance could go a long way.