Very Stable Diffusion

As I predicted in my last post, I ended up spending a few hours messing about with Stable Diffusion. Since I am more interested in reading than in futzing about with computers, though, it took me a while to get going.

In the end, it was the potential for satire that made me dive in. I decided I had to have a go at it after seeing this in r/StableDiffusion:

very stable diffusion — This was far too good to ignore, and definitely tickled my funny bone.

Now, I do not have any machines with discrete GPUs, so I have been giving most trendy ML techniques a pass over the past few years. I had plenty to do with that in real life™, and with the GPU shortages and all I have not (yet) gotten myself a beefy Ryzen desktop with a CUDA-capable GPU.

But, serendipitously, Hacker News was aflame with a discussion about an Apple Silicon version, which I was able to get going on my M1 Pro in around 10 minutes (including downloading the freely available 4GB model).

I’m actually now using lstein/stablediffusion, which has been very active over the past two days and has papered over most of the missing bits in PyTorch and GPU support except upscaling. That means I’m somewhat limited to rendering 512px images in under a minute, but that is plenty enough¹.

I’m not going to wax lyrical about the impact of AI-driven illustration on art, or how it’s going to put stock image providers out of business. Both of those are quite likely to pan out in unexpected ways, so I’ll just say that having Stable Diffusion freely available and driven by Open Source tooling is definitely going to impact companies like OpenAI as far as the mass market is concerned (making money out of their current tech has just become really hard, and they are going to need to step up and provide much more sophisticated tooling).

My take on this is that it’s not going to replace artists – Stable Diffusion is closer to being “a bicycle for art” (i.e., an enabler and accelerator) than anything else, and for those of us with less artistic leanings it’s just going to be a lot of fun².

Prompt Engineering

Since Stable Diffusion is not Dall-E 2, a lot of the staple prompt engineering that is making the rounds is a bit moot. However, you can go a long way if you stick to a theme and fine-tune it, so for most of my experiments I went with this:

portrait painting of thing or person, sharp focus, award-winning, trending on artstation, masterpiece, highly detailed, intricate. art by josan gonzales and moebius and deathburger
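If you want to churn through a list of subjects, the template above is easy to script. This is only a minimal sketch: the `{subject}` placeholder, the `build_prompt` helper and the sample subjects are my own illustrative names, not part of any Stable Diffusion tooling, and the output is just a string you would paste (or pipe) into whatever front end you are using.

```python
# Hypothetical helper: fills the fixed style template with a subject.
# Nothing here talks to Stable Diffusion; it only builds prompt strings.
PROMPT_TEMPLATE = (
    "portrait painting of {subject}, sharp focus, award-winning, "
    "trending on artstation, masterpiece, highly detailed, intricate. "
    "art by josan gonzales and moebius and deathburger"
)


def build_prompt(subject: str) -> str:
    """Return the style template with `subject` substituted in."""
    return PROMPT_TEMPLATE.format(subject=subject.strip())


if __name__ == "__main__":
    # Sample subjects are illustrative; swap in a thing or a person.
    for subject in ["a corgi", "Ada Lovelace", "Julius Caesar"]:
        print(build_prompt(subject))
```

Keeping everything but the subject fixed is what makes the results comparable from one render to the next.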

This is what you’d get if you asked for a Corgi:

Portraiture

But then things quickly go ballistic from there:

a man, writer, black woman, engineer — This is great bang for the buck when the source is free.

Sadly, the current version I’m using ignores seed values, so I can’t reproduce these at will (yet).

The Pantheon of Tech

The really amazing thing, though, is when you realize that inside those 4GB of model weights there is enough data to synthesize these:

a pantheon of tech — Ada Lovelace, Richard Feynman, an amazing Steve Jobs, and a passable Satya Nadella.

This is the bit where Arthur C. Clarke’s old adage about sufficiently available technology being indistinguishable from magic applies, because I just typed their names into the prompt placeholder.

Artists

It generally gets a lot right for artists, in tone if not in looks:

Bill Murray, Kraftwerk, the Queen, Mick Jagger (representing the Rolling Stones), Robin Williams, the Beatles, more Queen, Groove Armada, Daft Punk

Almost Fictional

And it just comes up with the most delightful details if you pick names of people who either don’t exist or have no photographic record:

The craggy look of Caesar, Paul’s stillsuit, Master Chief’s armor (in a “portrait”, in exactly the same prompt as the others) and Artemis’ countenance… If you picture this as a sort of dip into our collective unconscious, the results are nothing short of amazing.

It was even able to cope with a completely vague subject like the cast of Friends:

Almost Joey — Yep, AI watches TV, for sure.

Our Collective Unconscious

So I decided to take this a bit further and dip into the cesspool of our collective unconscious that is Twitter, and generate random things based on what crossed my timeline:

Madness — A discussion on bike lanes, Superman holding a train car, the war in Ukraine, Elvis revealed as an alien.

Image Generation

The best thing that I’ve yet to explore fully, though, is using Stable Diffusion to generate new images based on sketches or existing pictures.

For a lark, I turned some of my friends’ avatars into weathered fishermen by using them as base images:

But it is much more impressive if you look at the source images alongside. For example, I took my very first Dall-E 2 output and used it as an input:

May the Force Be With You — This is where I ought to be, I think

Given that people are already incorporating Stable Diffusion into drawing tools, and given the rate at which things have been progressing over the summer, I can’t wait to see what is going to come out of this in a year or so.

I expect a lot of video, a lot more sophistication around prompt design, a few more completely free models and, of course, a veritable plague of AI-generated reaction GIFs, but I think it will all be alright in the end.

Well, eventually. A lot of immature people (and policy makers, official or otherwise) are going to be freaking out…

¹ You can also run the Vulkan version of Real-ESRGAN and upscale images separately, which has been working fine for me.
² The Muggles, of course, will just appreciate the better voice-driven Instagram filters it will inevitably spawn. Or something.

Tao of Mac