FSD to Use Pure Vision AI
May 10, 2021
By Nuno C.
Elon tweeted that v9 of the FSD beta would remove its reliance on radar completely and instead make decisions based purely on vision. Humans don't have radar, after all, so it seems like a logical step, and it tells us Tesla is feeling much more confident in its vision AI.
Radar and vision each have their advantages, but radar has thus far been much more reliable in detecting objects and determining speed. If you've ever noticed your Tesla being able to detect two vehicles in front of you when you can only see the one directly ahead of you, that's radar at work.
In this situation, the radio waves from the radar sensor bounce underneath the car in front of you, continue traveling, and detect another object ahead even though that object is hidden from direct view.
It really is one of those wow moments where you can feel the future and the ability for AI-powered cars to drive better than humans one day. It's baby steps and slowly we'll see more and more of these situations where the vehicle simply sees or does something we could never do.
There's no doubting that more sensors could provide a more reliable and accurate interpretation of the real world as they each have their own advantages. In an ideal world a vehicle with radar, lidar, vision, ultrasonic sensors and even audio processing would provide the best solution. However, more sensors and systems come at a price, resulting in increased vehicle cost and system complexity.
After all, humans are relatively safe drivers with two "cameras" and vision alone. If Tesla can completely solve vision, they'll easily be able to achieve superhuman driving capabilities. Teslas have eight cameras, facing in all directions. They're able to analyze all of them concurrently and make much more accurate interpretations than we ever could in the same amount of time.
Tristan on Twitter recently had some great insight into Tesla vision AI and how they're going to replace radar. Here's what Tristan had to say:
"We recently got some insight into how Tesla is going to replace radar in the recent firmware updates + some nifty ML model techniques
From the binaries we can see that they've added velocity and acceleration outputs. These predictions in addition to the existing xyz outputs give much of the same information that radar traditionally provides (distance + velocity + acceleration).
For autosteer on city streets, you need to know the velocity and acceleration of cars in all directions but radar is only pointing forward. If it's accurate enough to make a left turn, radar is probably unnecessary for the most part.
How can a neural network figure out velocity and acceleration from static images you ask?
They've recently switched to something that appears to be styled on a Recurrent Neural Network.
Net structure is unknown (LSTM?) but they're providing the net with a queue of the 15 most recent hidden states. Seems quite a bit easier to train than normal RNNs which need to learn to encode historical data and can have issues like vanishing gradients for longer time windows.
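To make the "queue of the 15 most recent hidden states" idea concrete, here is a minimal sketch of such a fixed-length queue. This is not Tesla's code: the class name, the two-value "state" and the loop are all hypothetical, and only the window length of 15 comes from Tristan's description.

```python
from collections import deque

QUEUE_LEN = 15  # number of past states kept, per the description above


class StateQueue:
    """Fixed-length FIFO of the most recent per-frame state vectors (hypothetical)."""

    def __init__(self, maxlen=QUEUE_LEN):
        self._states = deque(maxlen=maxlen)

    def push(self, state):
        # Appending to a full deque silently drops the oldest entry.
        self._states.append(list(state))

    def window(self):
        # Oldest first, newest last: the history a downstream net would receive.
        return list(self._states)


queue = StateQueue()
for frame_idx in range(20):
    # Toy 2-value "hidden state"; real ones would be large tensors.
    queue.push([float(frame_idx), float(frame_idx) * 0.5])

window = queue.window()
print(len(window), window[0], window[-1])  # 15 [5.0, 2.5] [19.0, 9.5]
```

The point of the design is that the history is handed to the net explicitly rather than being compressed into a single recurrent state that the net has to learn to maintain.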
The velocity and acceleration predictions are new. By giving the net the last 15 frames (~1s) of data, I'd expect you can train a highly accurate net to predict velocity + acceleration based off of the learned time series.
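To see why roughly a second of position history is enough to recover velocity and acceleration, here is a classical finite-difference sketch. It is a stand-in for intuition only, not the neural network described above; the function name, the 15 frames-per-second spacing and the sample trajectory are assumptions made for the example.

```python
def estimate_motion(positions, dt):
    """Estimate latest velocity and mean acceleration from evenly spaced positions."""
    if len(positions) < 3:
        raise ValueError("need at least 3 positions")
    # First differences: average velocity over each frame interval.
    velocities = [(b - a) / dt for a, b in zip(positions, positions[1:])]
    # Second differences: acceleration between neighbouring intervals.
    accelerations = [(b - a) / dt for a, b in zip(velocities, velocities[1:])]
    mean_acc = sum(accelerations) / len(accelerations)
    # Each first difference is the velocity at the midpoint of its interval,
    # so shift the last one forward by half a frame to reach the newest frame.
    latest_vel = velocities[-1] + mean_acc * dt / 2
    return latest_vel, mean_acc


dt = 1.0 / 15.0  # assumed frame spacing: 15 frames covering about one second
# Hypothetical lead car: 30 m ahead, closing speed 10 m/s, decelerating at 2 m/s^2.
positions = [30.0 + 10.0 * (i * dt) - 1.0 * (i * dt) ** 2 for i in range(15)]

velocity, acceleration = estimate_motion(positions, dt)
print(round(velocity, 3), round(acceleration, 3))
```

On clean data the differences recover the motion almost exactly. Real per-frame position estimates are noisy, which is presumably where a learned model over the whole window would do better than naive differencing.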
They've already been using these queue based RNNs with the normal position nets for a few months presumably to improve stability of the predictions.
This matches with the recent public statements from Tesla about new models training on video instead of static images.
To evaluate the performance compared to radar, I bet Tesla has run some feature importance techniques on the models and radar importance has probably dropped quite a bit with the new nets. See tools like https://captum.ai for more info.
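As a rough illustration of what a feature importance check does, here is a permutation-importance sketch on a toy model. Note that this is a different and simpler technique than the gradient-based attribution methods Captum provides, and everything here (the model, its weights, the data) is hypothetical rather than anything extracted from Tesla's nets.

```python
import random


def model(camera_speed, radar_speed):
    # Hypothetical stand-in for a trained net that leans mostly on vision.
    return 0.9 * camera_speed + 0.1 * radar_speed


def mean_abs_error(rows, targets):
    return sum(abs(model(c, r) - t) for (c, r), t in zip(rows, targets)) / len(rows)


def permutation_importance(rows, targets, column, rng):
    """Increase in error after shuffling one input column across samples."""
    baseline = mean_abs_error(rows, targets)
    shuffled = [row[column] for row in rows]
    rng.shuffle(shuffled)
    permuted = [
        tuple(shuffled[i] if j == column else value for j, value in enumerate(row))
        for i, row in enumerate(rows)
    ]
    return mean_abs_error(permuted, targets) - baseline


rng = random.Random(0)
true_speeds = [rng.uniform(0.0, 30.0) for _ in range(200)]
rows = [(s, s) for s in true_speeds]  # noise-free toy: both sensors read the true speed
targets = true_speeds

camera_importance = permutation_importance(rows, targets, 0, rng)
radar_importance = permutation_importance(rows, targets, 1, rng)
print(camera_importance > radar_importance)
```

Scrambling an input the model barely uses hurts accuracy only a little, so a low radar score on a real model would suggest the net has learned to get by on the cameras.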
I still think that radar is going to stick around for quite a while for highway usage since the current camera performance in rain and snow isn't great.
NoA often disables in mild rain. City streets might behave better since the relative rain speed is lower.
One other nifty trick they've recently added is a task to rectify the images before feeding them into the neural nets.
This is common in classical CV applications, so I'm surprised it only popped up in the last couple of months.
This makes a lot of sense since it means that the nets don't need to learn the lens distortion. It also likely makes it a lot easier for the nets to correlate objects across multiple cameras since the movement is now much more linear.
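For a sense of what removing lens distortion involves, here is a minimal sketch using a one-parameter radial distortion model on normalized image coordinates. The coefficient `k1`, the sample point and both function names are assumptions for illustration; real camera calibrations use more parameters, and nothing here reflects Tesla's actual rectification task.

```python
def distort(x, y, k1):
    """Apply simple radial distortion to an ideal (undistorted) point."""
    r2 = x * x + y * y
    scale = 1.0 + k1 * r2
    return x * scale, y * scale


def undistort(xd, yd, k1, iterations=20):
    """Invert the radial model by fixed-point iteration."""
    x, y = xd, yd  # start from the distorted point as the initial guess
    for _ in range(iterations):
        r2 = x * x + y * y
        scale = 1.0 + k1 * r2
        x, y = xd / scale, yd / scale
    return x, y


k1 = -0.2  # hypothetical barrel-distortion coefficient
ideal = (0.5, 0.3)
seen = distort(ideal[0], ideal[1], k1)
recovered = undistort(seen[0], seen[1], k1)
print(seen, recovered)
```

Doing this once, with known calibration values, spares the network from having to learn the same geometric correction from data.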
For more background on LSTMs (Long Short-Term Memory) see https://towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21
They're tricky to train because they need to encode history which is fed into future runs. The more times you pass the state, the more the earlier frames are diluted, hence "vanishing gradients".
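The dilution effect is easy to reproduce with a toy scalar recurrence. The sketch below assumes a one-unit network, h_t = tanh(w * h_{t-1}), with a hypothetical weight of 0.5; it only illustrates the general phenomenon and says nothing about the actual nets.

```python
import math


def gradient_through_time(w, h0, steps):
    """Return d h_steps / d h_0 for the recurrence h_t = tanh(w * h_{t-1})."""
    h = h0
    grad = 1.0
    for _ in range(steps):
        h = math.tanh(w * h)
        # Chain rule: d tanh(w * h_prev) / d h_prev = w * (1 - tanh(...)^2)
        grad *= w * (1.0 - h * h)
    return grad


w, h0 = 0.5, 0.8  # hypothetical recurrent weight and initial state
grad_1 = gradient_through_time(w, h0, 1)
grad_15 = gradient_through_time(w, h0, 15)
grad_50 = gradient_through_time(w, h0, 50)
print(grad_1, grad_15, grad_50)
```

Each step multiplies the gradient by a factor no larger than 0.5 here, so the influence of the first frame shrinks geometrically with the number of steps. Feeding the net a short explicit queue of past states sidesteps that long chain.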
Tesla's FSD beta v9 will be a big step forward from what FSD beta users have been using, where the system still relied on radar. And it'll be an even bigger leap from what non-beta testers currently have access to. We can't wait. Now where's that button?
"Almost ready with FSD Beta V9.0. Step change improvement is massive, especially for weird corner cases & bad weather. Pure vision, no radar." - Elon Musk (@elonmusk), April 9, 2021