We live in a very strange era of Artificial Intelligence. Tesla CEO Elon Musk did some dire fear-mongering about AI becoming so intelligent it wants to murder us all. I don’t buy this, not because I doubt an advanced machine intelligence could find a suitable motive, but because I don’t think we’re anywhere close to real artificial intelligence. Even the name is deceptive, because what we’re calling AI now isn’t really “intelligence,” at least not as we understand it in humans. Maybe it’s just a bunch of if-then-else statements, maybe it’s more, but whatever it is, it sure is weird. Like many people, I’ve been playing with the text-to-image generating system called Craiyon, formerly known as DALL-E Mini, which was modeled on the much more advanced DALL-E. Of course, the only good use for a computer is making it do things with cars, so I tried a lot of car-related inputs to see what I’d get. The results are, I think, fascinating.
I only have the most rudimentary understanding of how the system actually works, but from what I’ve read, the full DALL-E system uses something known as a Generative Pre-Trained Transformer, a type of neural network that uses a mechanism called attention to weigh how relevant each part of its input is to every other part. The system is trained with text-image pairs from the internet, meaning that it sort of recognizes what certain images depict based on their captions, and has a vast library of these to pull from.
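For the curious, the “relevance” idea at the heart of a transformer is usually implemented as attention: each piece of the input is scored against a query, the scores become weights, and the weights decide how much each piece contributes to the result. Here’s a toy sketch in plain Python, assuming made-up 2-D vectors for three input tokens; a real model learns these vectors, uses far more dimensions, and runs many attention “heads” in parallel. The function and variable names are hypothetical, not anything from DALL-E’s actual code.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    Each key is scored against the query, the scores are divided by
    sqrt(d), softmaxed into weights, and the weights average the values.
    """
    d = len(query)
    scores = [dot(query, k) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    dim = len(values[0])
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return out, weights


# Hypothetical toy input: three tokens with 2-D keys and values.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [1.0, 0.0]

out, weights = attention(query, keys, values)
print([round(w, 3) for w in weights], [round(x, 3) for x in out])
```

The first and third tokens point partly in the same direction as the query, so they get equal, larger weights, and the second token contributes less to the output.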
Here’s the simple explanation of how it works, according to the DALL-E Mini site:
The model is trained by looking at millions of images from the internet with their associated captions. Over time, it learns how to draw an image from a text prompt. Some of the concepts are learnt from memory as it may have seen similar images. However, it can also learn how to create unique images that don’t exist such as “the Eiffel tower is landing on the moon” by combining multiple concepts together. Several models are combined together to achieve these results:
an image encoder that turns raw images into a sequence of numbers with its associated decoder
a model that turns a text prompt into an encoded image
a model that judges the quality of the images generated for better filtering
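That last step, judging and filtering, can be pictured as a ranking problem: turn the prompt and each candidate image into vectors, score how closely they point in the same direction, and keep the best few. Below is a minimal sketch, assuming the embeddings already exist (the numbers here are invented; a real system would get them from trained text and image encoders) and using cosine similarity as the score. That scoring choice is a common one, but it is an assumption here, not a description of Craiyon’s exact filter, and the `rerank` helper is hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def rerank(text_embedding, candidates, top_k):
    """Sort candidate images by similarity to the prompt, keep the best top_k.

    `candidates` is a list of (name, image_embedding) pairs.
    """
    scored = [(cosine_similarity(text_embedding, emb), name)
              for name, emb in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:top_k]]


# Hypothetical embeddings standing in for real encoder outputs.
prompt = [0.9, 0.1, 0.3]
candidates = [
    ("candidate_a", [0.8, 0.2, 0.4]),
    ("candidate_b", [0.1, 0.9, 0.0]),
    ("candidate_c", [0.9, 0.0, 0.2]),
]

print(rerank(prompt, candidates, top_k=2))
```

Generating several candidates and throwing most away is cheap insurance: the generator only has to get it right some of the time, as long as the judge can tell which times those were.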
I’m not clear on how it specifically “draws” the images; it seems to take elements from images, somehow re-process them, and composite them together, I think? I’m really not sure, but one thing I can say is that the results are simultaneously impressive and a bit disturbing. The images suggest something that has a lot of skill and observational understanding without any overall comprehension. It can draw things that look like cars, even specific cars or cars of a given era, but it’s also very clear it has no idea what a car actually is.
According to the more detailed technical explainer, images are built from what appear to be discrete tokens, mosaicked together:
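The “discrete tokens” idea can be illustrated with a toy version of what’s usually called vector quantization: chop the image into small patches, replace each patch with the index of the closest entry in a fixed “codebook” of patches, and you’ve turned a picture into a sequence of numbers. Decoding pastes each codebook patch back into its slot in the grid, which is where the mosaic look comes from. This sketch assumes a tiny 4x4 grayscale image and a hand-made three-entry codebook, and all the helper names are hypothetical; real encoders learn their codebooks, and real decoders are neural networks rather than a simple paste.

```python
def split_into_patches(image, size):
    """Cut a grayscale image (list of rows) into size x size patches,
    in row-major order, each flattened into a list of pixel values."""
    patches = []
    for top in range(0, len(image), size):
        for left in range(0, len(image[0]), size):
            patch = [image[r][c]
                     for r in range(top, top + size)
                     for c in range(left, left + size)]
            patches.append(patch)
    return patches


def nearest_code(patch, codebook):
    """Index of the codebook entry with the smallest squared distance."""
    def dist(code):
        return sum((p - q) ** 2 for p, q in zip(patch, code))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def encode(image, codebook, size):
    """Image -> sequence of integer tokens, one per patch."""
    return [nearest_code(p, codebook) for p in split_into_patches(image, size)]


def decode(tokens, codebook, size, patches_per_row):
    """Tokens -> image, by pasting each token's codebook patch into a grid."""
    height = (len(tokens) // patches_per_row) * size
    width = patches_per_row * size
    image = [[0] * width for _ in range(height)]
    for t, token in enumerate(tokens):
        top = (t // patches_per_row) * size
        left = (t % patches_per_row) * size
        for i, value in enumerate(codebook[token]):
            image[top + i // size][left + i % size] = value
    return image


codebook = [
    [0, 0, 0, 0],          # token 0: dark patch
    [255, 255, 255, 255],  # token 1: bright patch
    [0, 255, 0, 255],      # token 2: vertical stripe
]
image = [
    [10, 10, 250, 240],
    [5, 0, 255, 250],
    [0, 250, 20, 30],
    [10, 240, 0, 0],
]

tokens = encode(image, codebook, size=2)
print(tokens)
print(decode(tokens, codebook, size=2, patches_per_row=2))
```

The decoded image is a cleaned-up approximation of the original, built entirely from codebook pieces. Scale that up to thousands of learned pieces and you get something that can reproduce textures and details convincingly without any notion of what they belong to.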
The system manages to get many details right but has no idea what they are, what they’re for, or really anything. It’s hard to explain, but I think it is something genuinely new in the timeline of images that humans have had some hand in creating. So, with that in mind, let’s look at some AI-generated cars!
All of these start with text prompts to get the system going. I began the way you should begin any car-related whatever, by asking it to show me some “Tatra cars in city.” Here’s the result:
Well, those do resemble Tatras! Severely mutated Tatras, sure, but there’s definitely some Tatra T87 in there. Though, the middle row, rightmost car looks kind of like a Jaguar Mk2 or something. I think it’s really interesting how, despite getting so much generally right about the idiosyncratic stylistic design of a Tatra, it somehow doesn’t understand really basic fundamental things about cars like how they’re almost always bilaterally symmetrical.
In case you forgot what a Tatra looks like, here you go (via Wikimedia Commons):
It’s also interesting to note that most of the streets are cobblestone, likely because most pictures of Tatras are taken in picturesque old Eastern European cities with these sorts of streets.
Let’s try making it come up with something that doesn’t really exist, but is a combination of two things that definitely do. Like a Bugatti pickup truck:
Huh! This is interesting, because it seems to have almost exclusively chosen old, 30s-era Bugattis as opposed to modern ones like a Veyron or Chiron. Some of those trucks do have the old school Bugatti horse-collar grille, and look like pretty cool, sometimes hot-rodded, pickup trucks. Again, there’s no understanding of automotive symmetry or even the fundamental layout of a car. That bottom middle one also looks like some kind of cool three-wheel cyclecar experiment. Generally, it looks like the AI found these sorts of examples and merged them together:
What if we try another carmaker-that-doesn’t-make-trucks type of experiment? Like a Lexus dump truck?
Huh. Surprisingly boring! Hardly any Lexus at all in these! Also, look how mind-hurtingly confused that top middle one is.
Okay, let’s try one more incongruous car type with carmaker. This time, how about seeing what a Velorex – a tiny, leather-bodied Czech three-wheeled microcar – limousine might be like. Here’s a real-world Velorex, to give you a reference:
…and here’s what a computer thinks a Velorex limo might be like:
It clearly had a lot of good Velorex sources, because all of these do have a very Velorex look, and I suppose some of them are longer and seem to have four doors, which would qualify as a limo, based on Velorex standards, so I’ll call this one a victory.
Let’s get a little self-referential and see what happens here. What if I just ask it to draw “The Autopian?”
Strangely, for this input it seems like it took almost all the visual references from one story I wrote a bit ago, about a 1906 magic lantern show that seems to be one of the first car-related movie-like things ever. That bottom middle pic is almost directly lifted from one of those magic lantern slides. But why did it focus on that for “The Autopian?” And who the hell are those old scientist-looking dudes?
Keeping self-referential, I wonder what it thinks the co-founders of this dump, David Tracy and me, look like:
Yikes. These digital ghouls dress better than we do, but I think I can safely say we’re better looking. I can’t think of any pictures where David and I would have been posed like these, in environments like these, so I have no idea where it’s pulling its data from for this.
[Editor’s Note: Good Lord I wish I could unsee that. -DT]
Let’s move to more technical stuff; I wonder how the AI will do if I ask it to draw me a diagram of a horizontally-opposed, or flat-four, engine?
This is interesting for a few reasons, even if none of them is really a flat-four layout. What’s interesting is that they do sort of have a diagrammatic look to them, so the AI understood that much, and you can see that a lot of these appear to have finned cylinders like an air-cooled engine, which would make sense because the most common flat-fours likely to be found online in diagrams are air-cooled VW engines.
[Editor’s Note: These are amazing! -DT]
Some of these look like air-cooled Fiat 500 motors, some look like large V8s, and almost all of them have floppy-looking pulleys and some Escher-esque belt routing paths. Also interesting are the callouts; I don’t believe the AI understands that they aren’t part of the engine itself, because, again, it has no idea what it’s drawing, really.
Let’s see how it does with just a request for an engine image, not a diagram. Something exciting, like a turbocharged V8:
There are big turbo pipes and some things that could even be superchargers, and what could be valve covers – these all definitely suggest V8 engines, even if the specifics are pretty fuzzy and some of the parts look like a fully-popped Jiffy Pop pan.
What if we try to do something really specific, like ask the AI to re-create this scene from a famous 1985 Citroën commercial involving Grace Jones, a Citroën CX2, and a garage shaped like a giant Grace Jones head. This ad:
Everything about this commercial feels like it would be at home in an AI-generated space. So let’s see what it does when I ask it to draw “Grace Jones driving Citroën into giant head”:
Okay, so, it’s light on the Citroën and giant head part, but, honestly, these are kind of great interpretations of Grace Jones, ones that I feel like she would appreciate. They might be the most recognizable specific person I’ve ever seen this system generate, if I’m honest.
Let’s try something simple and straightforward and dear to my heart this time: taillights.
Those are all fairly plausible taillights! They all feel quite modern, at least 1990s-era and up, and there seems to be a fondness for the inset round lamp, sometimes with an amber indicator, sometimes not. It’s interesting how they all seem to generally have that sort of lights-at-the-corners tombstone shape like so many cars, like, say, a Kia Spectra or something generic like that.
I got a very unexpected result from something I thought would be more specific, “semaphore turn signal.” You know, those funny-looking little arms that were used as turn indicators on some European cars back in the 30s to the 60s or so:
I do love the look of the result, though, which feels like early 20th century modern art, like a Mondrian or Kandinsky or something:
I should have called them “trafficators.” In fact, let’s try that:
Okay, that’s not any better. Let’s move on to something else, like taking well-known cars and putting them in unexpected situations. Like a Beetle racing at Le Mans:
That worked pretty well! I like how it doesn’t really seem to differentiate between key circular elements like wheels, lights, and the gumball-number on the doors. These are all quite recognizable Beetles, though. Let’s try with some other highly-identifiable cars. Like a Ford Model T, but desert racing in Baja:
Hot damn, look at that! We even have some action shots here! Let’s replace the Model T with a BMW Isetta:
Those are very clearly Isettas, and they’re off-roading!
Okay, one last set of experiments. I’m curious to see how well the AI categorizes different stylistic eras for cars, so let’s come up with a nonsense car – a Saturn Tesla Wagon – and add the year, starting from the 1960s, and see how different the results look. Here’s a 1963 Saturn Tesla Wagon:
These definitely feel like late 1950s/early 1960s cars, with all the expected design language and details. Most seem American, too. Let’s see what it thinks the 1970s looked like, automotively:
Yeah, these do feel very 1970s! The earth-tone-heavy color palette is right, the proportions feel right, and this time we have some interior shots as well. These still feel quite American, but with some Euro-looking details. On to the 1980s:
Very 1980s, especially 1980s GM. At this point we’re getting some real-world Saturn design influences, mostly in the front ends, even though Saturns didn’t hit the roads until the 1990s. Again, though, Craiyon/DALL-E Mini does seem very capable of creating likely car images based on an era’s general design traits. On to the 1990s!
Here we’re finally getting a lot of Saturn design influences, which makes sense, as there are plenty of 1990s Saturn wagon references. To the 2000s!
Again, very Saturn-y, but more importantly, very much early 2000s-era looking design. Let’s keep going!
We’re finally seeing some Tesla influence here (look at the vertical screen right above here) though the top left one reminds me of a Saab 9-3 wagon, a bit, with the clear taillights and overall look.
Let’s jump a tiny bit into the future now, and see what it does when we ask for a 2024 car:
Interestingly, because the year is in the future, we have the blank backgrounds you’d see for concept car renderings, and some very sleek, concept car designs.
I think it’s worth playing around with this tool; it’s kind of amazing that we have free and easy access to something like this at all. What I really find fascinating is how the system fails, possibly more so than how it succeeds. It can do some truly remarkable things, and at the same time I can’t think of a better way to convey the limitations of the current state of AI than seeing where it misses the mark.
So many of these sorts of systems are “black boxes,” meaning no one is entirely sure what’s going on inside them, which is all the more reason to try things out and see what you get. With that, I’ll leave you with this important set of images:
[Editor’s Note: This is incredible. -DT]
Looks like the AI creates images like Morty created phrases on the “Edge of Tomorty” episode. It guesses every pixel until it meets some threshold of acceptability under the training model it has.
As far as I could understand it, the AI doesn’t draw a VW Beetle, but instead creates a grid of pixels that satisfy the statistical probability of being in the right place when compared to known Beetle pictures (and combines the probabilities for the mashup prompts. I guess?)
Anyway, I think that current implementations of AI are to a real mind what the 1000 monkeys are to Shakespeare. We just got better and better at compressing the typing time and churning out the coherent outputs.
To be clear, this is not to diss the work being done, or even pretend that I can do something better (or, as a matter of fact, something at all in this area). I’m just writing down the analogies my monkey brain uses to understand this kind of stuff 🙂
Damn, I wish we could post pictures. “Camouflage Karmann Ghia” produces some cool results.
Should see what Brown Diesel Manual Wagon produces
Salvador Dali, your car is ready.
God the term AI is thrown around loosely these days.
For me, Torch’s inputs are the most fun part of articles like this one. That said, the Model T baja results are fucking cool.
Ralph Steadman pug.
It allowed me to realise my fantasy of looking at Margaret Thatcher and the Queen in bikinis at the beach.
I wasn’t afraid of AI until I saw this.
I think this is far from intelligence but very entertaining. Seems like it pulls x images of each word and creates an average and then tries to autodraw into one picture. Rudimentary Excel program but with pictures.
Lee Iacocca should have been eating a hot dog, a Yocco’s hot dog, since his uncle founded that company.
Yoccos was how us PA dutch pronounced Iacocca
I am fascinated by these images. Clearly it has images of Turbo V8s but it tries to pull the essential elements of all of them and meld them together. Rather, living up to its Dall-E name, it tries to MELT them all together.
Thanks Torch, now I won’t get anything done all afternoon.
Too bad Lee Iacocca isn’t still around to see this.
Torch, when you are done with your brain, you should donate it to science, because I don’t think it works like other people’s. I’m not complaining, just making an observation.
I don’t know, remember when the Abby Normal brain was donated in Young Frankenstein?
I love this article.
also that semaphore turn signal batch, *chef’s kiss*, several of those belong in the MoMA.
Now we need The Bishop’s take on a Saturn Tesla wagon from various eras. For science!
I just tried “Christ on a bike.”
And I can’t stop giggling.
Oh jeez, “Carroll Shelby Muppet” is delightful. I really wish we could post images in comments.
I tried Jeremy Clarkson on a bike but the system chickened out and just made the neck longer and cropped out the face
That is some… Delightful nightmare fuel? Some of those results are pretty good, but they really do illustrate the system’s limits of comprehension. It would be interesting to see a human designer make a rational drawing based on some of the computer’s better attempts.
I’m currently trying “Pixar CARS internal anatomy cross-section diagram”
The middle rightmost Lee Iacocca is special. Haha. He ate so many sandwiches that his lips actually turned into a sandwich.
Also Jesus Christ the David and Torch one is going to haunt me for the rest of my days.
https://64.media.tumblr.com/71fce07d3ab24ecbf5d97b9bbacd538f/f1d21173c9a1b9a0-26/s540x810/90b9b351c22613be12462a9cbab270621c246590.gifv
It looks like an ad for a bad burn victim clinic
Please make your main logo the bottom left entry for Autopian. It kind of looks like a car driving into a guy’s neck.
The Grace Jones ad is very Grace Jones. I would like to think she has owned a garage shaped like her head.
Came here to say, “bottom right!” Looks like a Monty Python upperclass twit of an Autopian logo!
Unfortunately (or perhaps unsurprisingly), “Jello picnic” was underwhelming.
He’s bludgeoned himself to death!
I am so fortunate to exist in this timeline where this article exists and I was able to read it.
Thank you, Great Spaghetti Monster, for making it possible.
This is great research. I now understand Toyota’s styling department so much better.
Back in my day we didn’t need fancy computer programs for this.
A hit of decent acid would produce the same result.