figured it was a good idea to document the image encoding format i’m currently using in veadotube (it’s not used in any currently available versions, though)! it’s a simple delta-encoding format, hence the name.
each channel is encoded separately, compressing delta values with prefix codes. the compressed image is in RGB with alpha, with all channels having 8-bit depth (0 to 255).
in order to compress a channel, each value is first replaced with the difference between it and the previous value, keeping the first value intact. like this:
```
original: 6 7 8  7 11 14 14 14 14 14
delta:    6 1 1 -1  4  3  0  0  0  0
```
these deltas are stored as signed bytes, meaning that they range from -128 to 127 and take advantage of byte overflows/underflows:
```
original: 0 255 0 240
delta:    0  -1 1 -16
```
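the two delta passes can be sketched in python like this (function names are mine, just for illustration):

```python
def delta_encode(values):
    """turn a channel of 0..255 values into signed byte deltas."""
    out, prev = [], 0
    for v in values:
        d = (v - prev) & 0xFF                    # wrapping byte difference
        out.append(d - 256 if d >= 128 else d)   # reinterpret as signed -128..127
        prev = v
    return out

def delta_decode(deltas):
    """undo the delta pass; the wrapping add reverses the overflow trick."""
    out, prev = [], 0
    for d in deltas:
        prev = (prev + d) & 0xFF
        out.append(prev)
    return out
```

note how, with `prev` starting at 0, the first value comes out intact, and the 0 → 255 step above becomes a tiny -1 instead of a huge jump.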
the delta values are then encoded as prefix codes of variable bit length:
| abs value    | sequence                              |
|--------------|---------------------------------------|
| x = 0        | 1 0                                   |
| x = 1 ~ 2    | 1 1 [1 bit, x - 1] [sign bit]         |
| x = 3 ~ 4    | 0 1 0 [1 bit, x - 3] [sign bit]       |
| x = 5 ~ 8    | 0 1 1 [2 bits, x - 5] [sign bit]      |
| x = 9 ~ 16   | 0 0 1 0 [3 bits, x - 9] [sign bit]    |
| x = 17 ~ 32  | 0 0 1 1 [4 bits, x - 17] [sign bit]   |
| x = 33 ~ 64  | 0 0 0 1 0 [5 bits, x - 33] [sign bit] |
| x = 65 ~ 128 | 0 0 0 1 1 [6 bits, x - 65] [sign bit] |
the sign bit is 1 for positive, and 0 for negative. there's also a special bit sequence that must be used when 0 is repeated at least 7 times in a row; it encodes runs of up to 262 zeros:
0 0 0 0 [8 bits, x - 7]
it’s important to note that bit sequences are read/written from the least significant bit to the most significant.
each encoded channel is padded to an exact byte length, so bits from more than one channel are never stored in the same byte.
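putting the table, the zero-run code, and the bit order together, here's a rough python sketch of a channel encoder. i'm assuming the multi-bit payloads are also written least-significant-bit first, and all identifiers here are made up for illustration:

```python
class BitWriter:
    """packs bits into bytes, least significant bit first."""
    def __init__(self):
        self.data = bytearray()
        self.bitpos = 0

    def write(self, value, nbits):
        # append nbits of value, LSB first
        for i in range(nbits):
            if self.bitpos % 8 == 0:
                self.data.append(0)
            if (value >> i) & 1:
                self.data[-1] |= 1 << (self.bitpos % 8)
            self.bitpos += 1
    # padding to a whole byte is automatic: unused trailing bits stay 0

# (prefix bits as an LSB-first int, prefix length, payload bits, payload base)
BRACKETS = [
    (0b11,    2, 1, 1),   # x = 1 ~ 2
    (0b010,   3, 1, 3),   # x = 3 ~ 4
    (0b110,   3, 2, 5),   # x = 5 ~ 8
    (0b0100,  4, 3, 9),   # x = 9 ~ 16
    (0b1100,  4, 4, 17),  # x = 17 ~ 32
    (0b01000, 5, 5, 33),  # x = 33 ~ 64
    (0b11000, 5, 6, 65),  # x = 65 ~ 128
]

def put_delta(w, d):
    x = abs(d)
    if x == 0:
        w.write(0b01, 2)                    # the "1 0" code for a single zero
        return
    for prefix, plen, nbits, base in BRACKETS:
        if x < base + (1 << nbits):
            w.write(prefix, plen)
            w.write(x - base, nbits)        # payload
            w.write(1 if d > 0 else 0, 1)   # sign bit: 1 positive, 0 negative
            return

def encode_channel(deltas):
    w = BitWriter()
    i = 0
    while i < len(deltas):
        run = 0
        while i + run < len(deltas) and deltas[i + run] == 0:
            run += 1
        if run >= 7:                        # zero-run code: "0 0 0 0" + 8 bits
            n = min(run, 262)
            w.write(0b0000, 4)
            w.write(n - 7, 8)
            i += n
        else:
            put_delta(w, deltas[i])
            i += 1
    return bytes(w.data)
```

a run longer than 262 zeros just gets split: one maximum-length run code, then whatever encoding the remainder calls for.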
here’s the entire data buffer itself:
- start offset of the encoded R channel, 4 bytes
- start offset of the encoded G channel, 4 bytes
- start offset of the encoded B channel, 4 bytes
- encoded alpha channel
- encoded R channel
- encoded G channel
- encoded B channel
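slicing that container looks roughly like this in python — assuming the offsets are measured from the start of the whole buffer, so the alpha channel sits between the 12-byte offset table and the R offset (the function name is mine):

```python
import struct

def split_channels(buf):
    """split a VeadoDelta buffer into (alpha, r, g, b) byte slices."""
    # three little-endian unsigned 32-bit offsets at the front
    r_off, g_off, b_off = struct.unpack_from('<III', buf, 0)
    alpha = buf[12:r_off]   # alpha starts right after the offset table
    return alpha, buf[r_off:g_off], buf[g_off:b_off], buf[b_off:]
```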
a few notes:
- it doesn’t store image dimensions, as those are stored elsewhere in the avatar file
- the offset values are stored as little-endian
- RGB channels are encoded by skipping the pixels where alpha = 0
- decoding of RGB channels can be threaded, with the alpha channel decoded beforehand so alpha = 0 pixels can be skipped
- encoding, on the other hand, can be fully threaded :] just encode each channel in its own buffer, and then combine them in the final buffer.
i employed this encoding because i needed something to replace the previous solution: the image in the avatar file used to be stored as the original image file. this meant that if the user imported a PNG file, that same PNG file was kept intact in the avatar file, so every time an avatar was loaded the entire image file needed to be parsed again, which can be slow.
images in memory also took a lot of space. i had two options here:
- storing the image in the GPU memory only, meaning that most animated images take a lot of GPU space, as a lot of those have repeated frames, and data deduplication (aka only storing one copy of the same thing) in this scenario is pretty hard to achieve in Unity
- storing the image both in the GPU and CPU memory, so data deduplication is possible. but also keeping an entire raw image in CPU is heavy as hell
the latter option is what mini 1.4 does, and it’s not ideal, honestly – it’s partly why mini currently has a 2048x2048 limit, so that people don’t innocently crash veado with heavy images.
so the idea was to have an encoding that met the following requirements:
- simple enough so it can be compressed & decompressed fairly quickly
- optimised for 2D transparent sprites, as that’s the common vtuber scenario
- resulting file size comparable to PNG and the like, considering the above item – this way it can be saved with the avatar file
i tried to go with QOI, which works for most purposes, but i realised that i wanted to take advantage of threading. so i went the “write my own encoding” route, which isn’t as insane as it sounds – as i’ve been working through image file decoding, i’ve seen that most project file formats (Photoshop, MediBang, SAI, whatnot) roll their own solutions for encoding images for their own purposes (and most do delta encoding as well!)
in the end, VeadoDelta works quite well! it seemed to achieve the file size goal from my tests (remind me to properly post all the tests here! i don’t have them with me right now), so it’s what i’m using for future versions of veado :]
it has one caveat: in the worst case (a white pixel followed by a grey pixel and so on), an encoded channel takes up 1.25x the original size. that rarely happens, but if the encoding ends up the same size as the raw image or larger, it’s simply thrown away and the raw image is stored instead. thus, before decoding, the program must check whether the stored data is smaller than what the raw image would take – if so, decode! otherwise, just use the buffer straight away.
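that fallback rule can be sketched like this (the helper names and the pluggable `encode` function are hypothetical, just to show the decision on both sides):

```python
def pack_channel(raw, encode):
    """store whichever of raw vs encoded is smaller; ties keep the raw bytes."""
    encoded = encode(raw)
    return encoded if len(encoded) < len(raw) else raw

def is_encoded(stored, raw_size):
    """decoder-side check: data smaller than the raw channel must be encoded."""
    return len(stored) < raw_size
```

since the encoded form is only ever kept when it's strictly smaller, the size comparison alone is enough to tell the two cases apart.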
it’s also honestly surprising how image processing is taking a lot more of my time than i expected, but i guess it makes sense, as i’m making an app that in its essence simply puts images together!