· Computational Creativity · 5 min read
Image Generation - Visual Narrative Storytelling using Stable Diffusion
This week, I used stable diffusion techniques to generate a series of images that tell a short story about a man named John who enters a dark mansion. The process involved refining the model, creating prompts, and adjusting image sizes to ensure a cohesive visual narrative.
Overview
This week, I focused on refining the model for a stable diffusion-based project aimed at generating a visual narrative. The project involved using a diffusion-based pipeline to create a sequence of “cartoonish, monochromatic, hand-drawn” style images, based on a set of 10 prompts that form a short story. The story revolves around a character named John, who enters a dark mansion, encounters a ghost, and runs away. The generated images were saved as .jpg files and displayed in sequence, each accompanied by text explaining the characters’ thoughts or dialogue.
The Story Setup
The story is structured around a simple narrative with the following scenes:
- A rustic key with ornate engravings, cartoonish style, monochromatic, hand drawing
- A map of a mansion, cartoonish style, monochromatic, hand drawing
- Man standing in front of a mansion, cartoonish style, monochromatic, hand drawing
- Man entering a mansion, cartoonish style, monochromatic, hand drawing
- A dark room with a locked treasure chest, cartoonish style, monochromatic, hand drawing
- A coin, cartoonish style, monochromatic, hand drawing
- A scary cute ghost, cartoon style, monochromatic, hand drawing
- Man’s shocked face, cartoon style, monochromatic, hand drawing
- A man running away from a dark room, cartoon style, monochromatic, hand drawing
- Man running away from a mansion, cartoon style, monochromatic, hand drawing
Each scene was crafted to evoke a particular atmosphere and emotion, using the prompts to guide the stable diffusion model in generating the corresponding images.
I displayed the following descriptions under the images to describe the images generated for each scene:
- John: This key… It must open something important.
- John: The map shows a treasure in the mansion. I must find it!
- John: This place looks old and haunted, but the treasure awaits!
- John: The door’s open… Here I go!
- John: I bet there’s a treasure in that chest!
- John: This coin… It must be a clue.
- Ghost: Huahuahuahauha! Did you want my treasure?
- John: I wasn’t expecting a ghost! What have I gotten into?
- John: Oh no, a ghost! I need to get out!
- John: That was close! Never again!
Generating the Images
Code Overview
The code begins by importing the necessary libraries and setting up the diffusion-based pipeline. The scenes list contains 10 prompts that describe the key moments in the story. Here’s an outline of the code process:
- Library Imports: Importing the necessary libraries for stable diffusion and image generation.
- Pipeline Setup: Setting up the diffusion-based pipeline to handle the generation of “cartoonish, monochromatic, hand-drawn” style images.
- Prompt List: Creating a list of prompts that describe each scene of the story.
- Image Generation: Iterating through the prompts and generating the corresponding images.
- Saving Images: Saving the generated images as
.jpg
files. - Displaying Images: Displaying the images in sequence, along with the associated text.
Code
You can view and run the code using the link below:
!pip install diffusers==0.11.1
!pip install transformers scipy ftfy accelerate
import torch
from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)
pipe = pipe.to("cuda") # move to GPU to run faster (although I found this made no difference)
# create a story
scenes = [
"A rustic key with ornate engravings, cartoonish style, monochromatic, hand drawing",
"A map of a mansion, cartoonish style, monochromatic, hand drawing",
"Man standing in front of a mansion, cartoonish style, monochromatic, hand drawing",
"Man entering a mansion, cartoonish style, monochromatic, hand drawing",
"A dark room with a locked treasure chest, cartoonish style, monochromatic, hand drawing",
"A coin, cartoonish style, monochromatic, hand drawing",
"A scary cute ghost, cartoon style, monochromatic, hand drawing",
"Man's shocked face, cartoon style, monochromatic, hand drawing",
"A man running away from a dark room, cartoon style, monochromatic, hand drawing",
"Man running away from a mansion, cartoon style, monochromatic, hand drawing"
]
texts = [
"John: This key... It must open something important.",
"John: The map shows a treasure in the mansion. I must find it!",
"John: This place looks old and haunted, but the treasure awaits!",
"John: The door's open... Here I go!",
"John: I bet there's a treasure in that chest!",
"John: This coin... It must be a clue.",
"Ghost: Huahuahuahauha! Did you want my treasure?",
"John: I wasn't expecting a ghost! What have I gotten into?",
"John: Oh no, a ghost! I need to get out!",
"John: That was close! Never again!"
]
# generating story
for i in range(len(scenes)):
prompt = scenes[i]
image = pipe(prompt).images[0] # image here is in [PIL format](https://pillow.readthedocs.io/en/stable/)
# Now to display a single image you can save it as:
image.save(scenes[i]+".jpg")
Adjustments and Refinements
During the process, I noticed that the text below one image was overlapping with the adjacent image’s text. To fix this issue, I adjusted the image sizes, ensuring that each image had enough space for its associated text without interfering with neighboring images.
Additional Work: Blended Space for Dinoman
Earlier in the week, I explored the concept of blended spaces by creating a “Dinoman” character, a blend of a dinosaur and a businessman. This involved conceptualizing the blended space using a diagram and generating several image experiments to visualize the character.
Blended Space Diagram and Image Experiments
The blended space diagram visually represents the conceptual blend between the two input spaces—dinosaur and businessman—leading to the creation of the Dinoman character.
Using Stable Diffusion I generated various versions of the Dinoman character in different styles:
Reflection
This week’s work allowed me to explore the power of stable diffusion in creating cohesive visual narratives. The process of refining the model, crafting prompts, and adjusting the final output highlighted the importance of attention to detail in AI-driven storytelling. Additionally, the blended space exploration with Dinoman provided valuable insights into how characters can be visualized through a combination of conceptual blending and AI-generated art.
I must add that I do not think that AI image generation should be used in commercial products because the model may have been trained on many artists’ works without their consent and I don’t consider this to be very ethical.
Video Explanation
Below is a short video where I walk through the code, explain the stable diffusion process, and discuss the visual storytelling approach.