Image Generation - Visual Narrative Storytelling using Stable Diffusion

Overview

This week, I focused on refining the model for a stable diffusion-based project aimed at generating a visual narrative. The project involved using a diffusion-based pipeline to create a sequence of “cartoonish, monochromatic, hand-drawn” style images, based on a set of 10 prompts that form a short story. The story revolves around a character named John, who enters a dark mansion, encounters a ghost, and runs away. The generated images were saved as .jpg files and displayed in sequence, each accompanied by text explaining the characters’ thoughts or dialogue.

Stable Diffusion 10 Generated Scenes

The Story Setup

The story is structured around a simple narrative with the following scenes:

A rustic key with ornate engravings, cartoonish style, monochromatic, hand drawing
A map of a mansion, cartoonish style, monochromatic, hand drawing
Man standing in front of a mansion, cartoonish style, monochromatic, hand drawing
Man entering a mansion, cartoonish style, monochromatic, hand drawing
A dark room with a locked treasure chest, cartoonish style, monochromatic, hand drawing
A coin, cartoonish style, monochromatic, hand drawing
A scary cute ghost, cartoon style, monochromatic, hand drawing
Man’s shocked face, cartoon style, monochromatic, hand drawing
A man running away from a dark room, cartoon style, monochromatic, hand drawing
Man running away from a mansion, cartoon style, monochromatic, hand drawing

Each scene was crafted to evoke a particular atmosphere and emotion, using the prompts to guide the stable diffusion model in generating the corresponding images.

I displayed the following descriptions under the images to describe the images generated for each scene:

John: This key… It must open something important.
John: The map shows a treasure in the mansion. I must find it!
John: This place looks old and haunted, but the treasure awaits!
John: The door’s open… Here I go!
John: I bet there’s a treasure in that chest!
John: This coin… It must be a clue.
Ghost: Huahuahuahauha! Did you want my treasure?
John: I wasn’t expecting a ghost! What have I gotten into?
John: Oh no, a ghost! I need to get out!
John: That was close! Never again!

Generating the Images

Code Overview

The code begins by importing the necessary libraries and setting up the diffusion-based pipeline. The scenes list contains 10 prompts that describe the key moments in the story. Here’s an outline of the code process:

Library Imports: Importing the necessary libraries for stable diffusion and image generation.
Pipeline Setup: Setting up the diffusion-based pipeline to handle the generation of “cartoonish, monochromatic, hand-drawn” style images.
Prompt List: Creating a list of prompts that describe each scene of the story.
Image Generation: Iterating through the prompts and generating the corresponding images.
Saving Images: Saving the generated images as .jpg files.
Displaying Images: Displaying the images in sequence, along with the associated text.

Code

You can view and run the code using the link below:

Google Drive Link to Code

!pip install diffusers==0.11.1
!pip install transformers scipy ftfy accelerate

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)
pipe = pipe.to("cuda") # move to GPU to run faster (although I found this made no difference)

# create a story
scenes = [
    "A rustic key with ornate engravings, cartoonish style, monochromatic, hand drawing",
    "A map of a mansion, cartoonish style, monochromatic, hand drawing",
    "Man standing in front of a mansion, cartoonish style, monochromatic, hand drawing",
    "Man entering a mansion, cartoonish style, monochromatic, hand drawing",
    "A dark room with a locked treasure chest, cartoonish style, monochromatic, hand drawing",
    "A coin, cartoonish style, monochromatic, hand drawing",
    "A scary cute ghost, cartoon style, monochromatic, hand drawing",
    "Man's shocked face, cartoon style, monochromatic, hand drawing",
    "A man running away from a dark room, cartoon style, monochromatic, hand drawing",
    "Man running away from a mansion, cartoon style, monochromatic, hand drawing"
]

texts = [
    "John: This key... It must open something important.",
    "John: The map shows a treasure in the mansion. I must find it!",
    "John: This place looks old and haunted, but the treasure awaits!",
    "John: The door's open... Here I go!",
    "John: I bet there's a treasure in that chest!",
    "John: This coin... It must be a clue.",
    "Ghost: Huahuahuahauha! Did you want my treasure?",
    "John: I wasn't expecting a ghost! What have I gotten into?",
    "John: Oh no, a ghost! I need to get out!",
    "John: That was close! Never again!"
]

# generating story
for i in range(len(scenes)):
  prompt = scenes[i]
  image = pipe(prompt).images[0]  # image here is in [PIL format](https://pillow.readthedocs.io/en/stable/)

  # Now to display a single image you can save it as:
  image.save(scenes[i]+".jpg")

During the process, I noticed that the text below one image was overlapping with the adjacent image’s text. To fix this issue, I adjusted the image sizes, ensuring that each image had enough space for its associated text without interfering with neighboring images.

Additional Work: Blended Space for Dinoman

Earlier in the week, I explored the concept of blended spaces by creating a “Dinoman” character, a blend of a dinosaur and a businessman. This involved conceptualizing the blended space using a diagram and generating several image experiments to visualize the character.

Blended Space Diagram and Image Experiments

The blended space diagram visually represents the conceptual blend between the two input spaces—dinosaur and businessman—leading to the creation of the Dinoman character.

Using Stable Diffusion I generated various versions of the Dinoman character in different styles:

Blended Space Diagram

Reflection

This week’s work allowed me to explore the power of stable diffusion in creating cohesive visual narratives. The process of refining the model, crafting prompts, and adjusting the final output highlighted the importance of attention to detail in AI-driven storytelling. Additionally, the blended space exploration with Dinoman provided valuable insights into how characters can be visualized through a combination of conceptual blending and AI-generated art.

I must add that I do not think that AI image generation should be used in commercial products because the model may have been trained on many artists’ works without their consent and I don’t consider this to be very ethical.

Video Explanation

Below is a short video where I walk through the code, explain the stable diffusion process, and discuss the visual storytelling approach.

Image Generation - Visual Narrative Storytelling using Stable Diffusion

Overview

The Story Setup

Generating the Images

Code Overview

Code

Additional Work: Blended Space for Dinoman

Blended Space Diagram and Image Experiments

Reflection

Video Explanation

Related Posts

Exploring Text Generation with The Iliad - Training and Evaluating AI Models

Image Classifier - Hand Shadow Puppet AI Application using p5.js

National Space Centre Game

Python Galaxy Visualisation

Overview

The Story Setup

Generating the Images

Code Overview

Code

Adjustments and Refinements

Additional Work: Blended Space for Dinoman

Blended Space Diagram and Image Experiments

Reflection

Video Explanation

Related Posts

Exploring Text Generation with The Iliad - Training and Evaluating AI Models

Image Classifier - Hand Shadow Puppet AI Application using p5.js

National Space Centre Game

Python Galaxy Visualisation