On the Evolutionary Art of Midjourney
July 20, 2024
Around a year ago, I started using digital and generative art platforms as my primary tools. This shift was made largely out of necessity - I was living out of a suitcase, using a loft in Brooklyn as my home base, while all of my resins, canvases, and acrylic paints sat in a storage unit in San Francisco. I found an incredible amount of freedom in being able to work and create from my laptop, anywhere. So I started using Midjourney, and I was immediately addicted. The fast feedback loops (from prompt to image in seconds) were immensely gratifying to my attention-challenged brain, and the sheer size of the stylistic space now open for exploration was evident from the start.
To use Midjourney most effectively, you need to approach it with a nonzero degree of humility, as an artistic collaboration with another form of consciousness. The first thing an artist needs to accept about Midjourney is that it is objectively better than you in multiple ways, both technically and artistically. Technically, Midjourney can emulate any style it has been trained on and transfer these styles to reference images. Artistically, it can blend multiple reference images and styles while retaining a reasonable composition in the final image. The fundamental building block of creativity is novel permutation, so this is an incredibly powerful asset - particularly if you, as the human collaborator, have good intuition or hard-won knowledge about which styles blend well together and where the “research gaps” are in the artistic space. Has anybody made a robotic impressionist Mona Lisa before?
Style codes are like the genome of an image in Midjourney. As a Midjourney artist, your role is that of a curator and mentor. Your fundamental technical task is to work with Midjourney to generate and collect a set of style codes that offer some degree of novelty, resonate with the story you are trying to tell, and are cohesive with your larger body of work. This is the most valuable human ingredient in the recipe: translating a set of directly felt and observed real-world experiences into a Midjourney expression. The fundamental tool for accomplishing this is not the prompt, as most AI artists would assert, but the choice.
Input tokens matter surprisingly little, particularly those constrained by language. Folks who painstakingly craft hyper-detailed, paragraph-long prompts specifying everything from the subject matter to the art style, camera angle, and color palette are missing the plot entirely. This is a generative art platform - it has a life of its own, and if you are holding a specific vision for the output in your mind, you will be sorely disappointed. The output will never match your vision perfectly, because Midjourney is not a technician; by specifying every detail, you are micromanaging an artist. The most interesting outputs come from inputting relatively few, vague thematic tokens and giving Midjourney the space and freedom to synthesize expressions of that theme that nobody has thought of or seen before. The fundamental promise of AI is that humans get to move up one level of abstraction in their working domain. You are no longer the artist; you have been promoted to curator. Your role as the curator is to select which of the four outputs catches your eye, represents the feelings you are trying to convey, and fits into your narrative.
Now here’s where it starts to get really interesting. Human choices don’t have to be “one and done”. Why be satisfied with the first output? The real strength of Midjourney is the number of closely related outputs it can produce in a short time. The “variation” button is your best friend; it lets you explore the stylistic neighborhood of an image. Choosing one of four images is like selecting which offspring’s genes get carried on to the next generation. A “subtle” variation produces a mildly mutated set, whereas a “strong” variation introduces more mutations. Every time you find an interesting stylistic pocket, you should explore it comprehensively, generating many versions across several “generations” before selecting the final one. Just as in photography, hundreds of “photos” of the stylistic space may need to be snapped before one feels right, and you should explore different poses, zooms, and so on. Also like photography, there are variables outside of your control. First, your subject has a mind of its own - it may spontaneously take a new position, go on hot and cold streaks, and so on. Second, the environment is constantly and randomly changing. The sun may go behind some clouds, or the wind may shift. These random variables can make or break the artistic process, and that is part of the challenge!
As in evolutionary biology, most traits are inherited from the parents and remain recognizable in the offspring - but some mutations are truly random and give rise to emergent properties of the life form. This is the most interesting space to explore. By combining several styles and images and smashing that “strong variation” button 50 times in a row, you will see strange new patterns begin to emerge. On the edge of the stylistic space, fractals start to appear. Surreal compositions mixing out-of-focus and in-focus elements dominate. Scale seems to matter less. You can convince yourself you are seeing glimpses of how the AI thinks and generates outputs. Things start to look weird, and weird is good! Biology is about embracing the outlier.
The most effective Midjourney compositions come from curating a gene pool of carefully selected style codes, cross-pollinating the most interesting mutants, and occasionally out-crossing with style genes from purely human artistic work in the form of reference images. In my own work, I have been compelled to create pieces that draw on the inherent beauty of nature by feeding in images of the natural world - everything from protein structures to fluid flows to acoustic waveforms. My work aims to straddle the boundary between science and art, and to reveal how generative AI models can be considered creative beings.