Lesson 28 - Integrating with AI

Today, GPT-4o (gpt-image-1) and Nano banana (gemini-2.5-flash-image) have significantly lowered the barrier to image editing. From a human-computer interaction perspective, the combination of a chat interface and a canvas is becoming increasingly popular: the chat history with the model naturally records an image's modification history, while a freely draggable canvas makes selecting images and processing them in parallel feel natural. For more details, see UI for AI.

The image below shows Lovart's product interface, which uses Konva.js, mentioned in our Lesson 21 - Transformer, as its underlying technology. Although primarily focused on image editing, it doesn't abandon common graphic-editor features, such as the layer list hidden by default in the bottom-left corner and a left toolbar that can also insert basic shapes.

Lovart

Recraft is also testing chat functionality. In my observation, canvas and chat are becoming the two main entry points for this type of editor:

Recraft chat

In this lesson, we'll integrate Nano banana to enrich our image editing features.

Integrating Models

To use Nano banana, I chose fal.ai over Google's official generative-ai SDK. The reason is that a unified API makes it easier to compare other image generation models, such as qwen-image-edit or FLUX.1 Kontext.

There are many other aggregators like OpenRouter that offer similar SDKs. Taking the image generation interface as an example, you only need to pass in a prompt to receive the URL of the generated image along with the model's original text response:

ts
import { fal } from '@fal-ai/client';

const result = await fal.subscribe('fal-ai/gemini-25-flash-image', {
    input: {
        prompt: '',
    },
});
console.log(result.data); // { images: [{ url: 'https://...' }], description: 'Sure, this is your image:' }

The image edit API also accepts a list of image URLs as a parameter. Passing base64-encoded data URLs instead may still trigger warnings such as "Unable to read image information". fal.ai therefore provides a file upload interface, which lets us upload local images as soon as they are added to the canvas.
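
As a minimal sketch, assuming the client's storage API (fal.storage.upload), the upload can happen when a local image is dropped onto the canvas; onImageAdded and its return value are illustrative:

ts
import { fal } from '@fal-ai/client';

// Hypothetical handler called when a local image file is added to the canvas.
// fal.storage.upload hosts the file and returns a public URL that the edit API can read.
async function onImageAdded(file: File): Promise<string> {
    const url = await fal.storage.upload(file);
    return url;
}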

API Design

We need a single API responsible for both generating and editing images. In both scenarios the parameters are identical: a prompt and a list of reference images.

ts
import { fal } from '@fal-ai/client';

api.createOrEditImage = async (
    isEdit: boolean,
    prompt: string,
    image_urls: string[],
): Promise<{ images: { url: string }[]; description: string }> => {
    // Choose the edit endpoint when reference images should be modified,
    // otherwise fall back to plain text-to-image generation.
    const result = await fal.subscribe(
        isEdit
            ? 'fal-ai/gemini-25-flash-image/edit'
            : 'fal-ai/gemini-25-flash-image',
        {
            input: {
                prompt,
                image_urls,
            },
        },
    );
    return result.data;
};

Chatbox

The chat box provides another starting point beyond the canvas.

Remove background

Double-click an image to enter edit mode:

ts
private async removeBackground() {
    this.removingBackground = true;
    // The image is stored in the node's fill; pass it as the single reference image.
    const { images } = await this.api.createOrEditImage(
        true,
        'Remove background from the image',
        [this.node.fill],
    );
    if (images.length > 0) {
        this.api.runAtNextTick(() => {
            // Swap in the edited result and record it in the undo history.
            this.api.updateNode(this.node, { fill: images[0].url });
            this.api.record();
            this.removingBackground = false;
        });
    } else {
        this.removingBackground = false;
    }
}

Inpainting

Inpainting is suitable for erasing or modifying selected objects in an image while keeping the other parts unchanged.

https://www.recraft.ai/docs#inpaint-image

Inpainting replaces or modifies specific parts of an image. It uses a mask to identify the areas to be filled in, where white pixels represent the regions to inpaint, and black pixels indicate the areas to keep intact, i.e. the white pixels are filled based on the input provided in the prompt.

When users draw a closed area using a simple editor, it needs to be converted into a mask parameter to pass to the API. This mask is essentially a grayscale image:

inpainting in gpt-4o
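
As a rough sketch of that conversion (the point-list shape and the use of OffscreenCanvas are assumptions, not tied to any particular editor API), the Canvas 2D API can rasterize the drawn path into a black-and-white mask matching the source image size:

ts
// Rasterize a user-drawn closed path into a grayscale mask:
// white = area to inpaint, black = area to keep.
async function createMask(
    points: { x: number; y: number }[],
    width: number,
    height: number,
): Promise<Blob> {
    const canvas = new OffscreenCanvas(width, height);
    const ctx = canvas.getContext('2d')!;

    // Black background marks the regions to keep intact.
    ctx.fillStyle = 'black';
    ctx.fillRect(0, 0, width, height);

    // Fill the closed path in white to mark the regions to regenerate.
    ctx.fillStyle = 'white';
    ctx.beginPath();
    points.forEach((p, i) => (i === 0 ? ctx.moveTo(p.x, p.y) : ctx.lineTo(p.x, p.y)));
    ctx.closePath();
    ctx.fill();

    return canvas.convertToBlob({ type: 'image/png' });
}

The resulting blob can then be uploaded through the same file upload interface described above and passed as the mask parameter.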

This is where the value of an editor becomes apparent; even simple editing features matter. Recraft mentions three points: https://www.recraft.ai/blog/inpainting-with-ai-how-to-edit-images-with-precision-using-recraft

  1. Ease of zooming in and out - after all, inpainting is a precision operation, so canvas zooming is crucial.
  2. AI inpainting that uses segmentation models like SAM to select regions automatically.
  3. Creative flexibility.

Create mask

We offer multiple interactive methods for users to generate masks:

  1. Lesson 26 - Selection tool
  2. Lesson 25 - Drawing mode and brush

Using SAM via WebGPU

Beyond letting users define the modification area as precisely as possible, it would be even better if the selection could be made through simpler interactions, such as a single click.

Smart select in Midjourney

In Lesson 1 - Hardware abstraction layers, we introduced the advantages of WebGPU (Figma also recently upgraded its rendering engine). Beyond rendering, its compute shader support makes browser-side GPGPU possible.

Image Segmentation in the Browser with Segment Anything Model 2
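
As a rough sketch of click-to-select in the browser, assuming the SAM support in transformers.js (the model checkpoint, the input_points option, and the WebGPU device flag should all be verified against its docs):

ts
import { SamModel, AutoProcessor, RawImage } from '@huggingface/transformers';

// Click-to-select: turn a single click into a segmentation mask.
async function segmentAtPoint(imageUrl: string, x: number, y: number) {
    // 'webgpu' asks transformers.js to run the model on WebGPU when available.
    const model = await SamModel.from_pretrained('Xenova/slimsam-77-uniform', { device: 'webgpu' });
    const processor = await AutoProcessor.from_pretrained('Xenova/slimsam-77-uniform');

    const image = await RawImage.read(imageUrl);
    // The click becomes a positive point prompt in image coordinates.
    const inputs = await processor(image, { input_points: [[[x, y]]] });
    const outputs = await model(inputs);

    // Resize the predicted masks back to the original image size.
    return processor.post_process_masks(
        outputs.pred_masks,
        inputs.original_sizes,
        inputs.reshaped_input_sizes,
    );
}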

Combining Multiple Images

Using the canvas gives us extra positional information about images, which is often difficult to describe with language. For example, we can drag a teacup to any position on a desktop image and composite the two into one image.
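
A minimal sketch of that flow, assuming a hypothetical exportSelectionToCanvas helper that renders the arranged nodes at their canvas positions, plus the upload and createOrEditImage APIs from above:

ts
import { fal } from '@fal-ai/client';

// Flatten the user's arrangement (e.g. a teacup dragged onto a desktop image)
// into a single reference image, then ask the model to blend it.
async function compositeSelection(prompt: string) {
    // Hypothetical helper: draws the selected nodes at their canvas positions.
    const canvas = exportSelectionToCanvas();
    const blob = await new Promise<Blob>((resolve) =>
        canvas.toBlob((b) => resolve(b!), 'image/png'),
    );
    const referenceUrl = await fal.storage.upload(blob);

    // The positional relationship is already baked into the reference image,
    // so the prompt only needs to describe the desired blending.
    return api.createOrEditImage(true, prompt, [referenceUrl]);
}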

Outpainting

OpenAI doesn't provide a corresponding API for this feature yet. Let's first see how Recraft does it: https://www.recraft.ai/blog/ai-outpainting-how-to-expand-images

Outpainting allows users to expand an image beyond its original frame — especially useful for completing cropped images or adding more background scenery.

It is suitable for keeping selected objects in the image unchanged while, for example, replacing the background:

Outpainting in Recraft

Or expanding outward:

Outpainting in Recraft

Currently, GPT-4o only supports three fixed output sizes, while Nano banana needs workarounds to produce arbitrary image sizes, such as passing in a blank image of the desired size as a reference and emphasizing it in the prompt. Canvas operations make this feel very natural: users only need to drag to the appropriate size, and the application automatically generates the blank reference image through the Canvas API.
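
A sketch of that workaround, assuming the createOrEditImage API from above and fal.storage.upload for hosting the blank reference; the prompt wording is illustrative:

ts
import { fal } from '@fal-ai/client';

// Outpainting workaround: generate a blank reference image at the size the user
// dragged to, then ask the model to extend the original image onto it.
async function outpaint(imageUrl: string, width: number, height: number) {
    const canvas = new OffscreenCanvas(width, height);
    const ctx = canvas.getContext('2d')!;
    ctx.fillStyle = 'white';
    ctx.fillRect(0, 0, width, height);

    const blankUrl = await fal.storage.upload(
        await canvas.convertToBlob({ type: 'image/png' }),
    );

    // Illustrative prompt: emphasize that the output should match the blank image's size.
    return api.createOrEditImage(
        true,
        `Expand the first image to fill the entire ${width}x${height} canvas of the second (blank) image, continuing the scene naturally`,
        [imageUrl, blankUrl],
    );
}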

Layer separation

Raster to vector

Many online and open-source tools, such as vtracer, offer solutions based on traditional image processing.

However, this approach does not yield satisfactory results for text processing:

Raster to vector in lottiefiles. source: https://lottiefiles.com/tools/raster-to-vector

The reason is that such algorithms typically run in the following stages, and the first stage does not distinguish text from graphics that are actually suitable for vectorization:

  1. “Path walking” converts pixels into paths
  2. Paths are simplified into polygons
  3. Attempts are made to smooth the polygons
source: https://www.visioncortex.org/vtracer-docs#path-walking

Split background and text

First, use an OCR-like tool to identify the text regions and generate a mask from them. Then run a standard inpainting pass with that mask so the model regenerates those regions, producing a background image without text.
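
A sketch of the first step, assuming tesseract.js for OCR (its word-level bbox fields are worth verifying) combined with the same mask-rasterization approach shown earlier:

ts
import Tesseract from 'tesseract.js';

// Detect text regions and rasterize their bounding boxes into an inpainting mask.
async function createTextMask(imageUrl: string, width: number, height: number): Promise<Blob> {
    const { data } = await Tesseract.recognize(imageUrl, 'eng');

    const canvas = new OffscreenCanvas(width, height);
    const ctx = canvas.getContext('2d')!;
    ctx.fillStyle = 'black'; // keep everything by default
    ctx.fillRect(0, 0, width, height);

    ctx.fillStyle = 'white'; // regions to regenerate
    for (const word of data.words) {
        const { x0, y0, x1, y1 } = word.bbox;
        ctx.fillRect(x0, y0, x1 - x0, y1 - y0);
    }

    return canvas.convertToBlob({ type: 'image/png' });
}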

FLUX-Text: A Simple and Advanced Diffusion Transformer Baseline for Scene Text Editing

text editing with flux-text

Font recognition

Next, we need to identify style attributes such as the font family and font size within the text region.

TextStyleBrush: Transfer of Text Aesthetics from a Single Example

Adobe Photoshop provides Match fonts:

Select a font from the list of similar fonts in the Match Fonts dialog box

whatfontis provides a public API that matches a specified area within an image against the closest fonts in its library:

json
[
    {
        "title": "Abril Fatface",
        "url": "https://www.whatfontis.com/FF_Abril-Fatface.font",
        "image": "https://www.whatfontis.com/img16/A/B/FF_Abril-FatfaceA.png"
    }
]

Finally, overlay all the layers: the regenerated background at the bottom, with editable text nodes using the matched fonts on top.

MCP

MCP: What It Is and Why It Matters

Instead of only having a GUI or API that humans use, you get an AI interface “for free.” This idea has led to the concept of “MCP-first development”, where you build the MCP server for your app before or alongside the GUI.

The Figma MCP Server, for example, can manipulate documents through the Figma API.
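
Following the MCP-first idea, here is a minimal sketch of exposing our own createOrEditImage as an MCP tool, assuming the official TypeScript SDK (@modelcontextprotocol/sdk) and zod; the tool name and response format are illustrative:

ts
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { z } from 'zod';

const server = new McpServer({ name: 'image-editor', version: '1.0.0' });

// Expose image generation/editing as a tool an agent can call.
server.tool(
    'edit_image',
    {
        prompt: z.string(),
        image_urls: z.array(z.string()).default([]),
    },
    async ({ prompt, image_urls }) => {
        const { images, description } = await api.createOrEditImage(
            image_urls.length > 0, // edit when reference images are provided
            prompt,
            image_urls,
        );
        return {
            content: [{ type: 'text' as const, text: `${description} ${images[0]?.url ?? ''}` }],
        };
    },
);

await server.connect(new StdioServerTransport());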
