It turns out that the image generation AI “Stable Diffusion” can also achieve excellent image compression

Stable Diffusion, released to the public in August 2022, is an AI that automatically generates images from the text you enter. Software engineer Matthias Bühlmann explains how Stable Diffusion can be used not only for image generation but also as a powerful lossy image compression codec.
Stable Diffusion Based Image Compression | by Matthias Bühlmann | September, 2022 | Medium
https://matthias-buehlmann.medium.com/stable-diffusion-based-image-compression-6f1f0a399202
The following images all show a photo of the San Francisco cityscape at 512×512 pixels. The first is compressed as JPEG, the second as WebP, and the third with Stable Diffusion.

Candy shop display. The first is JPEG, the second is WebP, and the third is Stable Diffusion compression.

Alpaca face. The first is JPEG, the second is WebP, and the third is Stable Diffusion compression.

In all three cases, the image compressed with Stable Diffusion has the least visible noise and the smallest file size after compression.
When generating images, Stable Diffusion uses a Variational Autoencoder (VAE) to encode images from image space into a latent space and to decode them back. The latent representation is a low-resolution (64×64), high-precision (4×32 bits) representation of a source image (512×512 pixels at 3×8 or 4×8 bits).
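To put the two representations side by side, the sizes quoted above can be worked out directly. The following is only a back-of-the-envelope sketch; the variable names are made up for illustration and the 24 bpp (3×8 bit) source format is assumed.

```python
# Size of a 512x512 source image at 24 bits per pixel (3 channels x 8 bits).
source_bits = 512 * 512 * 3 * 8

# Size of the 64x64 latent (hidden-space) representation:
# 4 channels x 32 bits = 128 bits per position.
latent_bits = 64 * 64 * 4 * 32

source_bytes = source_bits // 8
latent_bytes = latent_bits // 8
ratio = source_bits / latent_bits

print(f"source: {source_bytes} bytes")
print(f"latent: {latent_bytes} bytes")
print(f"the latent is {ratio:.0f}x smaller than the source")
```

The VAE alone therefore already shrinks the raw data considerably, before any further quantization is applied.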
For example, the following 512 × 512 pixel 24 bpp image……

Encoding to a 64×64 pixel 128 bpp image with VAE looks like this.

And the following is the encoded image decoded back to a 512×512 pixel, 24 bpp image. At first glance it looks as if it has simply returned to its original state, but there is in fact some loss: for example, the letters written on the alpaca’s collar have become a little harder to read.

In order to use Stable Diffusion as an image compression codec, Bühlmann investigated how efficiently the latent representation generated by the VAE can be compressed. Downsampling the latents or applying an existing lossy image compression method to them resulted in significant degradation of the reconstructed image. On the other hand, VAE decoding turned out to be very robust against quantization of the latents.
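As a rough illustration of what quantizing such values to 8 bits means, here is a minimal sketch of a uniform quantizer in plain Python. The value range `LO`..`HI` and the sample values are assumptions chosen for the example; they are not taken from Bühlmann's implementation, which may scale the values differently.

```python
def quantize(values, lo, hi):
    """Map floats in [lo, hi] to integer codes 0..255 (out-of-range values are clamped)."""
    scale = 255.0 / (hi - lo)
    return [min(255, max(0, round((v - lo) * scale))) for v in values]


def dequantize(codes, lo, hi):
    """Map integer codes 0..255 back to floats in [lo, hi]."""
    step = (hi - lo) / 255.0
    return [lo + c * step for c in codes]


# Hypothetical sample values and range, for illustration only.
samples = [-3.2, -0.75, 0.0, 0.4, 2.9]
LO, HI = -4.0, 4.0

codes = quantize(samples, LO, HI)
restored = dequantize(codes, LO, HI)
max_error = max(abs(a - b) for a, b in zip(samples, restored))

print(codes)
print(max_error)
```

For in-range values, the round-trip error of such a quantizer is at most half a quantization step, which here is (HI - LO) / 255 / 2.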
Quantizing the latents produced by the VAE to 8 bits gives a data size of 64 × 64 × 4 × 8 bits = 131,072 bits = about 16.4 kB. Using a palette of 256 entries together with Floyd-Steinberg dithering reduces the data size to 64 × 64 × 8 bits + 256 × 4 × 8 bits = 40,960 bits = about 5.12 kB.
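The arithmetic for the two sizes can be checked with a few lines of Python. This is just a worked calculation of the figures in the text, with 1 kB taken as 1,000 bytes; the variable names are illustrative.

```python
# 8-bit quantization: 64x64 positions, 4 channels, 8 bits each.
quantized_bits = 64 * 64 * 4 * 8

# Palettized version: one 8-bit palette index per position,
# plus a palette of 256 entries with 4 channels x 8 bits each.
index_bits = 64 * 64 * 8
palette_bits = 256 * 4 * 8
palettized_bits = index_bits + palette_bits

quantized_kb = quantized_bits / 8 / 1000
palettized_kb = palettized_bits / 8 / 1000

print(f"quantized:  {quantized_bits} bits = {quantized_kb} kB")
print(f"palettized: {palettized_bits} bits = {palettized_kb} kB")
```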
Floyd-Steinberg dithering introduces noise into the palettized latents, which distorts the decoded result. However, since Stable Diffusion works by removing noise from latents, the decoded result can be brought closer to the original image by running a few denoising iterations.
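To show where this noise comes from, here is a minimal sketch of Floyd-Steinberg error diffusion (dithering) in plain Python. It is deliberately simplified: it works on a single-channel 2D grid with a scalar palette, whereas the scheme described above uses 4-channel entries, and the function name and test data are made up for illustration.

```python
def floyd_steinberg(image, palette):
    """Snap each value of a 2D grid to its nearest palette value, diffusing the error."""
    h, w = len(image), len(image[0])
    work = [row[:] for row in image]  # working copy, so the input stays untouched
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            old = work[y][x]
            new = min(palette, key=lambda p: abs(p - old))
            out[y][x] = new
            err = old - new
            # Standard Floyd-Steinberg weights: 7/16 right, 3/16 below-left,
            # 5/16 below, 1/16 below-right.
            if x + 1 < w:
                work[y][x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    work[y + 1][x - 1] += err * 3 / 16
                work[y + 1][x] += err * 5 / 16
                if x + 1 < w:
                    work[y + 1][x + 1] += err * 1 / 16
    return out


# A flat mid-gray patch dithered to a two-entry palette.
flat = [[0.5] * 8 for _ in range(8)]
dithered = floyd_steinberg(flat, [0.0, 1.0])
mean = sum(map(sum, dithered)) / 64

for row in dithered:
    print("".join("#" if v else "." for v in row))
```

Each output value is far from the input value of 0.5, yet the average over the patch stays close to it. That fine-grained pattern is the kind of noise a few denoising iterations are then expected to smooth out.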
However, compression with Stable Diffusion has the property that it affects the content of the image itself rather than its apparent quality. Since version 1.4 of Stable Diffusion cannot preserve small text and faces well in the latent space, an image may look clean at first glance and yet actually differ in content from the original. Bühlmann says that if this problem is solved in version 1.5, the usefulness of Stable Diffusion as an image compression algorithm will increase further.
