Tag Archives: computer vision

Photo Mosaics with SciKit, AWS, and KDTree

I’m not sure why, but a few weeks ago I got the bug to make photo mosaics. They’re quite fun. I planned out the simple infrastructure on a commute home from work. I found a repo that got me started. Then, my friend Google Gemini and I started to make changes. Why didn’t I write improvements? Because we need to see the results first.

Infrastructure, Tiles, and Starter Code

I created my own repo here. It is a fork for codebox/mosaic, a public Python repo. (Credit to (codebox, MrEbbinghuas, and Hatns for the great project.) I forked the repo, and added some quality of life tools, such as scripts to handle uploading, the splicing of videos to photos, and then photos to mosaic tiles etc.

The infrastructure was basic. I ran the code on an AWS Ec2 instance, originally a c7a.2xLarge (8 cpus, 32gb of memory), but later had to upgrade to a r7a.2xLarge (8 cpus, 64gb of memory) because OOM process kills. I also went richie-rich with an io2 drive, and provisioned over 30k iops. I regret nothing. No other infrastructure was required.

To generate tiles, I gathered videos from the Library of Congress, NASA, and public domain films. I added videos from my personal iPhone. The script, splicer.py, sampled the videos at 1 frame per second, and each frame became two different square tiles. This produced about 190k tiles. For context, there are about 170k frames in an average-length feature film.

Photos to Start

What will my mosaics look like, and how long will it take? Here are three start photos. All photos were resized to 600px on the largest side before any processing.

Aurora Luna Wynterstarr, from the DnD 5e Campaign “Strixhaven: a Curriclum of Choas”
My Cousin Ariel’s adorable Cat
Owl Coffee Mug with Latte!

Results of Original Code

The original code from codebox/mosaic produced the following images. Each photo took about 14 minutes to complete. The total processing time was 1 hour and 13 minutes. I processed a total of five photos, but the others aren’t great for the blog.

These photos came off incredibly crisp, smooth, and a bit pixelated/clumpy. We can see up close that in many cases, the same image is used over and over.

Additionally the original code, scans all 190k possible tiles when comparing it a portion of the original. Is there a way to get a prune the mosaic tiles down tile-to-be-replaced of the original photo?

Add KD Tree and SSIM

I significantly reorganized the code. I asked my friend Google Gemini if there would be a way to organize the tiles by average pixel color into buckets. From there, you could take the average color of the tile to be replaced against a smaller set of tiles, rather than all 190k. I asked it if there was something better than a Python dictionary for such a large amount of data to do it.

The suggestion was Scipy’s KDTree it indexes the mosaic tiles using their average color. Then, I can query it for a list of mosaic tiles that are k distance from that average. It does so in three dimensions (RGB), which I suppose why it is call a query_ball_tree.

The original code relied on a highly procedural method in the TileFitter class to compare the similarity between the tile-to-be-replaced and each of the 190k mosaic tiles. I replaced this with TillFitterSciKit, which used a tool from scikit-image to discover similarity.

The Structural Similarity Image Model (SSIM) returns a value between 0.0 and 1.0, which 1.0 being a perfect math. In these experiments, the average of the best tile found was about 0.43

Both the original code and my fork required mosaic tiles to be processed before matching. In my fork, I process the mosaic tiles once, and then store the results in a cached .gz file on the hard drive to speed things along.

My hope was a faster processing time at minimal expense in quality.

Results with KDTree

We did much better with time on these images. The tile caching process took about 5 minutes. The images took about 90 seconds each to complete. The entire process took about fifteen minutes for five images, the caching process included.

But how do they look?

Aurora has freckles now?
Cat has some green?

Mosaic tiles are clumpy. Less clumpy. But still clumpy.

So did things improve? I’m not sure because that is an aesthetic judgment. Things got faster, but also odd. The KDtree refines the search per tile-to-be-replaced with k mosaic tiles whose average color is the k closest or matches its average color. Wouldn’t we expect the best match out of 190k to also be within k distance from the average color of the original frame? That doesn’t appear to be the case though. More on that later.

Reducing Repetition

I did come up with a way to reduce the repetition of images. Remember that the match is float between 0.0 and 1.0, with 1.0 being a prefect match.

I added code that did this. All mosaic tiles start with a penalty of 0.0. When a mosaic is evaluated, subtract its current penalty from its actual SSIM score. When a mosaic tile is declared the best match, add a small value to its penalty (e.g. 0.03).

This means each time a tile is declared a ‘best match’, it has a higher threshold to be the ‘best match’ when it is evaluated again. The higher the value, the messier the mosaics looked.

E.g. here are some mosaics run with .05

And here are the results with a penalty of .15

Which looks right? That again is up to anyone’s visual taste. But we know for sure which ones are less clumpy and pixelated looking.

So why didn’t the KDTree pruning work?

To reiterate, the KDTree organizes mosaic tiles by average color. When we compare mosaic tiles to tiles-to-be-replaced, we take the average color of the later. We then grab mosaic tiles that are k distance from that. It is intuitive to expect the best match of tile-to-be-replaced to be close or exact in average color. This whole process is to speed up the matching work (accomplished) with minimal degradation in quality (not so much accomplished).

Put simply, why do things look like a better match in the codebox/mosaic code, than the code that used the KDTree? Here are few possibilities (and I genuinely do not know the answer).

First, “average color” might not be the right metric to organize frames by. A cloudy grey sky over an ocean might produce the same “average color” as a black and white flag with a thin turquoise line in in the middle. Aurora Lune Wynterstarr’s new freckles suggest this might be the case.

Secondly, the way I get “average color” is rather crude, simple, and the first thing my buddy Gemini told me to try. In their defense, they didn’t suggest some other methods. Maybe it’s time to find a better average.

Third, the method of grabbing the average is good, but the k distance is not high enough. It might be the case that I have many near-perfect matches of an average color, but the code excludes what SSIM would rate the best match. I find this unlikely, but not impossible.

Finally, the scikit-image SSIM might not be the right method of comparison, or it needs to be tweaked. It’s entirely possibly that this SSIM produces a lower quality result than the plodding, procedural, comparison of the original code.

I won’t know the answer to these until I experiment more, and as the experimenting runs up my AWS charges, I doubt I’ll visit it again anytime soon.

Until then, I’m happy with the state of my mosaic project.