While Tesla held back its main new Autopilot feature from the version 9 release, it still deployed a new neural net for Autopilot. According to a new analysis, the network is far larger than its predecessor and has impressive new capabilities.
Based on the new capabilities of Autopilot under version 9, we already knew that the new computer vision neural net had to be significantly updated.
It can now track vehicles and other objects all around the car – meaning that it makes better use of the 8 cameras around the car and not just the front-facing ones.
We now have a better understanding of just how significant Tesla’s neural net update in version 9 is. TMC member Jimmy_d, a deep learning expert who has access to the software and has been sharing his thoughts on each update, has produced an interesting analysis of version 9.
Jimmy confirmed that Tesla has now deployed a new unified camera network that handles all 8 cameras.
He also listed a few other main changes:
- Same weight file being used for all cameras (this has pretty interesting implications and previously V8 main/narrow seems to have had separate weights for each camera)
- Processed resolution of 3 front cameras and back camera: 1280×960 (full camera resolution)
- Processed resolution of pillar and repeater cameras: 640×480 (1/2×1/2 of camera’s true resolution)
- All cameras: 3 color channels, 2 frames (2 frames also has very interesting implications); V8 used 640×416, 2 color channels, 1 frame, and only for the main and narrow cameras
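To get a rough sense of scale, the figures in the list above can be turned into a count of raw input values fed to the network per inference. This is only a back-of-the-envelope sketch: the `CameraInput` class and variable names are hypothetical, and it assumes the pillar and repeater cameras account for the remaining four of the eight cameras. It measures input size only, not the number of weights in the network.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CameraInput:
    """One group of cameras sharing the same processed input shape."""
    count: int      # number of cameras in the group
    width: int      # processed width in pixels
    height: int     # processed height in pixels
    channels: int   # color channels per pixel
    frames: int     # frames fed to the network at once

    def values(self) -> int:
        # Total raw input values contributed by this group.
        return self.count * self.width * self.height * self.channels * self.frames


# V9: 3 front cameras + back camera at full resolution,
# plus (assumed) 4 pillar/repeater cameras at half resolution per axis.
V9_INPUTS = [
    CameraInput(count=4, width=1280, height=960, channels=3, frames=2),
    CameraInput(count=4, width=640, height=480, channels=3, frames=2),
]

# V8: main and narrow cameras only.
V8_INPUTS = [
    CameraInput(count=2, width=640, height=416, channels=2, frames=1),
]


def total_values(groups) -> int:
    """Sum the raw input values over all camera groups."""
    return sum(group.values() for group in groups)


v9_total = total_values(V9_INPUTS)
v8_total = total_values(V8_INPUTS)
ratio = v9_total / v8_total

print(f"V9 input values: {v9_total:,}")
print(f"V8 input values: {v8_total:,}")
print(f"V9 / V8 ratio:   {ratio:.1f}x")
```

Under these assumptions, V9 takes in roughly 35 times as many raw input values as V8. That figure says nothing definitive about compute cost, which also depends on the network architecture.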
Those changes add up to a much larger neural network that requires a lot more processing power.
Jimmy estimates that it might already be pushing the limits of the onboard computer, which would explain why Tesla is working on a computer upgrade.
He tried to communicate just how much bigger the neural net on v9 is compared to v8:
“This V9 network is a monster, and that’s not the half of it. When you increase the number of parameters (weights) in an NN by a factor of 5 you don’t just get 5 times the capacity and need 5 times as much training data. In terms of expressive capacity increase it’s more akin to a number with 5 times as many digits. So if V8’s expressive capacity was 10, V9’s capacity is more like 100,000. It’s a mind boggling expansion of raw capacity. And likewise the amount of training data doesn’t go up by a mere 5x. It probably takes at least thousands and perhaps millions of times more data to fully utilize a network that has 5x as many parameters.
This network is far larger than any vision NN I’ve seen publicly disclosed and I’m just reeling at the thought of how much data it must take to train it. I sat on this estimate for a long time because I thought that I must have made a mistake. But going over it again and again I find that it’s not my calculations that were off, it’s my expectations that were off.”
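The analogy in the quote can be read as capacity growing exponentially, rather than linearly, with the parameter count. The toy sketch below simply reproduces the quote's own numbers (10 becoming 100,000 for a 5x increase). The function names are hypothetical and this is an illustration of the analogy, not an actual measure of a neural network's capacity.

```python
def linear_capacity(old_capacity: int, param_factor: int) -> int:
    """Naive expectation: capacity scales in step with parameter count."""
    return old_capacity * param_factor


def analogy_capacity(old_capacity: int, param_factor: int) -> int:
    """The quote's analogy: capacity is raised to the power of the factor."""
    return old_capacity ** param_factor


v8_capacity = 10   # illustrative value taken from the quote
param_factor = 5   # 5x as many parameters, per the quote

print(linear_capacity(v8_capacity, param_factor))    # naive linear scaling
print(analogy_capacity(v8_capacity, param_factor))   # scaling per the analogy
```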
Based on his analysis, version 9 appears to be more than an incremental step when it comes to computer vision.
The deep learning expert sees Tesla playing into its strengths with the update:
“Scaling computational power, training data, and industrial resources plays to Tesla’s strengths and involves less uncertainty than potentially more powerful but less mature techniques. At the same time Tesla is doubling down on their ‘vision first / all neural networks’ approach and, as far as I can tell, it seems to be going well.”
We are also starting to get a better understanding of what Autopilot can see, thanks to an effort by Tesla hackers.
Here’s a look at Tesla’s previous Autopilot software’s recognition of roadside structures: