Tag Archives: ai

Photo Mosaics with SciKit, AWS, and KDTree

I’m not sure why, but a few weeks ago I got the bug to make photo mosaics. They’re quite fun. I planned out the simple infrastructure on a commute home from work. I found a repo that got me started. Then, my friend Google Gemini and I started to make changes. Why didn’t I write improvements? Because we need to see the results first.

Infrastructure, Tiles, and Starter Code

I created my own repo here. It is a fork of codebox/mosaic, a public Python repo. (Credit to codebox, MrEbbinghuas, and Hatns for the great project.) I forked the repo and added some quality-of-life tools, such as scripts to handle uploading, splicing videos into photos, and cutting photos into mosaic tiles.

The infrastructure was basic. I ran the code on an AWS EC2 instance, originally a c7a.2xlarge (8 CPUs, 32 GB of memory), but later had to upgrade to an r7a.2xlarge (8 CPUs, 64 GB of memory) because of OOM process kills. I also went richie-rich with an io2 drive and provisioned over 30k IOPS. I regret nothing. No other infrastructure was required.

To generate tiles, I gathered videos from the Library of Congress, NASA, and public domain films. I added videos from my personal iPhone. The script, splicer.py, sampled the videos at 1 frame per second, and each frame became two different square tiles. This produced about 190k tiles. For context, there are about 170k frames in an average-length feature film.
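For a sense of scale, the tile count can be worked backwards into hours of footage, and the two-tiles-per-frame step can be sketched as a pair of crop boxes. This is only a sketch, not the actual splicer.py: the function names are made up, and taking the two squares from opposite ends of the frame is an assumption for illustration.

```python
def square_crop_boxes(width, height):
    """Return two (left, top, right, bottom) square boxes from opposite ends of a frame.

    Assumption: the real splicer.py may pick its two squares differently.
    """
    side = min(width, height)
    first = (0, 0, side, side)
    # Anchor the second square to the far edge of the longer dimension.
    second = (width - side, height - side, width, height)
    return first, second


def footage_hours(tile_count, tiles_per_frame=2, frames_per_second=1):
    """Hours of video needed to produce tile_count tiles at the given sampling rate."""
    frames = tile_count / tiles_per_frame
    return frames / frames_per_second / 3600


boxes = square_crop_boxes(1920, 1080)
hours = footage_hours(190_000)
print(boxes)
print(round(hours, 1))
```

At one frame per second and two tiles per frame, 190k tiles works out to roughly 26 hours of sampled video.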

Photos to Start

What will my mosaics look like, and how long will it take? Here are three start photos. All photos were resized to 600px on the largest side before any processing.

Aurora Luna Wynterstarr, from the DnD 5e Campaign “Strixhaven: A Curriculum of Chaos”
My Cousin Ariel’s adorable Cat
Owl Coffee Mug with Latte!

Results of Original Code

The original code from codebox/mosaic produced the following images. Each photo took about 14 minutes to complete. The total processing time was 1 hour and 13 minutes. I processed a total of five photos, but the others aren’t great for the blog.

These photos came off incredibly crisp, smooth, and a bit pixelated/clumpy. We can see up close that in many cases, the same image is used over and over.

Additionally, the original code scans all 190k possible tiles when comparing them to each portion of the original. Is there a way to prune the mosaic tiles down to a shortlist for each tile-to-be-replaced of the original photo?

Add KD Tree and SSIM

I significantly reorganized the code. I asked my friend Google Gemini if there would be a way to organize the tiles into buckets by average pixel color. From there, you could compare the average color of the tile to be replaced against a smaller set of tiles, rather than all 190k. I also asked whether there was something better than a Python dictionary for indexing such a large amount of data.

The suggestion was SciPy’s KDTree, which indexes the mosaic tiles using their average color. Then, I can query it for a list of mosaic tiles that are within k distance of that average. It does so in three dimensions (RGB), which I suppose is why the method is called query_ball_tree.
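The lookup can be sketched with SciPy as follows. This is a minimal illustration with random stand-in colors, not the code from my fork: the variable names, the radius of 30, and the tile count are all made up. It uses query_ball_point, which takes a single query point; query_ball_tree does the same kind of search between two trees, and the real code may use either.

```python
import numpy as np
from scipy.spatial import KDTree

rng = np.random.default_rng(seed=0)

# Stand-in for the real tile library: one average RGB color per mosaic tile.
tile_colors = rng.integers(0, 256, size=(5000, 3)).astype(float)

# Build the index once; each tile is a point in 3-D RGB space.
tree = KDTree(tile_colors)

# Average color of the tile-to-be-replaced (made-up value).
target_color = np.array([120.0, 80.0, 200.0])

# Indices of every tile whose average color lies within `radius` of the target.
radius = 30.0
candidate_ids = tree.query_ball_point(target_color, r=radius)

# Alternative: the 10 nearest tiles, however far away they are.
distances, nearest_ids = tree.query(target_color, k=10)

print(len(candidate_ids), "candidates within radius")
print(nearest_ids)
```

Only the candidates that come back from the tree need the expensive similarity comparison, which is where the time savings come from.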

The original code relied on a highly procedural method in the TileFitter class to compare the similarity between the tile-to-be-replaced and each of the 190k mosaic tiles. I replaced this with TillFitterSciKit, which used a tool from scikit-image to discover similarity.

The Structural Similarity Index Measure (SSIM) returns a value that is typically between 0.0 and 1.0 (it can technically dip below zero), with 1.0 being a perfect match. In these experiments, the average of the best tile found was about 0.43.
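To make the score less of a black box, here is a simplified SSIM written in plain NumPy. It is a sketch under a big assumption: it computes the statistics once over the whole image, whereas scikit-image's implementation averages the score over many small sliding windows, so the numbers will not match the library's. The function name global_ssim is made up.

```python
import numpy as np


def global_ssim(a, b, data_range=255.0):
    """SSIM computed once over the whole image instead of over sliding windows."""
    a = a.astype(np.float64)
    b = b.astype(np.float64)
    # Small constants that keep the ratios stable when means or variances are near zero.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov_ab = ((a - mu_a) * (b - mu_b)).mean()
    luminance = (2 * mu_a * mu_b + c1) / (mu_a**2 + mu_b**2 + c1)
    contrast_structure = (2 * cov_ab + c2) / (var_a + var_b + c2)
    return luminance * contrast_structure


rng = np.random.default_rng(seed=0)
tile = rng.integers(0, 256, size=(32, 32)).astype(np.uint8)
inverted = 255 - tile  # same structure, opposite brightness

same_score = global_ssim(tile, tile)
inverted_score = global_ssim(tile, inverted)
print(same_score, inverted_score)
```

An image compared with itself scores 1.0, while its photographic negative scores below zero, which is why the 0.0 to 1.0 range is a rule of thumb rather than a guarantee.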

Both the original code and my fork required mosaic tiles to be processed before matching. In my fork, I process the mosaic tiles once, and then store the results in a cached .gz file on the hard drive to speed things along.
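The build-once, reuse-later pattern looks roughly like this. It is a sketch only: the pickle-inside-gzip format, the file name, and the fake_build stand-in are assumptions for illustration, and the actual cache layout in the repo may differ.

```python
import gzip
import os
import pickle
import tempfile


def load_or_build_cache(cache_path, build_fn):
    """Return cached tile data if present, otherwise build it and save it gzip-compressed."""
    if os.path.exists(cache_path):
        # Only unpickle files you created yourself; pickle can execute code on load.
        with gzip.open(cache_path, "rb") as fh:
            return pickle.load(fh)
    data = build_fn()
    with gzip.open(cache_path, "wb") as fh:
        pickle.dump(data, fh, protocol=pickle.HIGHEST_PROTOCOL)
    return data


build_calls = []


def fake_build():
    """Stand-in for the slow step that opens and processes every tile image."""
    build_calls.append(1)
    return {"tile_0001.png": (120, 80, 200), "tile_0002.png": (10, 10, 10)}


cache_file = os.path.join(tempfile.mkdtemp(), "tiles_cache.pkl.gz")
first = load_or_build_cache(cache_file, fake_build)   # builds and writes the cache
second = load_or_build_cache(cache_file, fake_build)  # reads the cache, no rebuild
print(first == second, len(build_calls))
```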

My hope was a faster processing time at minimal expense in quality.

Results with KDTree

We did much better with time on these images. The tile caching process took about 5 minutes. The images took about 90 seconds each to complete. The entire process took about fifteen minutes for five images, the caching process included.

But how do they look?

Aurora has freckles now?
Cat has some green?

Mosaic tiles are clumpy. Less clumpy. But still clumpy.

So did things improve? I’m not sure because that is an aesthetic judgment. Things got faster, but also odd. The KDTree narrows the search for each tile-to-be-replaced to the mosaic tiles whose average color matches, or lies within k distance of, its own average color. Wouldn’t we expect the best match out of 190k to also be within k distance of the average color of the original frame? That doesn’t appear to be the case though. More on that later.

Reducing Repetition

I did come up with a way to reduce the repetition of images. Remember that the match score is a float between 0.0 and 1.0, with 1.0 being a perfect match.

I added code that did this. All mosaic tiles start with a penalty of 0.0. When a mosaic tile is evaluated, subtract its current penalty from its actual SSIM score. When a mosaic tile is declared the best match, add a small value to its penalty (e.g. 0.03).

This means each time a tile is declared a ‘best match’, it has a higher threshold to be the ‘best match’ when it is evaluated again. The higher the value, the messier the mosaics looked.
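The penalty bookkeeping can be sketched in a few lines. The tile names and scores below are made up, and pick_tile is a hypothetical helper rather than the function in my fork; the 0.03 step is the example value from above.

```python
from collections import defaultdict

PENALTY_STEP = 0.03  # amount added each time a tile wins

penalties = defaultdict(float)  # every tile starts with a penalty of 0.0


def pick_tile(ssim_scores):
    """Pick the tile with the best penalized score, then raise its penalty.

    ssim_scores maps a tile id to its raw SSIM score for the current patch.
    """
    best_id = max(ssim_scores, key=lambda tid: ssim_scores[tid] - penalties[tid])
    penalties[best_id] += PENALTY_STEP
    return best_id


# Three identical patches in a row, so the raw scores never change.
scores = {"sky": 0.45, "sea": 0.43, "fog": 0.40}
picks = [pick_tile(scores) for _ in range(3)]
print(picks)  # ['sky', 'sea', 'sky']
```

Without the penalty, "sky" would win all three times. With it, the runner-up gets a turn as soon as the winner's adjusted score drops below it.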

E.g. here are some mosaics run with .05

And here are the results with a penalty of .15

Which looks right? That again is up to anyone’s visual taste. But we know for sure which ones are less clumpy and pixelated looking.

So why didn’t the KDTree pruning work?

To reiterate, the KDTree organizes mosaic tiles by average color. When we compare mosaic tiles to tiles-to-be-replaced, we take the average color of the latter. We then grab mosaic tiles that are within k distance of that. It is intuitive to expect the best match of a tile-to-be-replaced to be close or exact in average color. This whole process is to speed up the matching work (accomplished) with minimal degradation in quality (not so much accomplished).

Put simply, why do things look like a better match in the codebox/mosaic code than in the code that used the KDTree? Here are a few possibilities (and I genuinely do not know the answer).

First, “average color” might not be the right metric to organize frames by. A cloudy grey sky over an ocean might produce the same “average color” as a black and white flag with a thin turquoise line in the middle. Aurora Luna Wynterstarr’s new freckles suggest this might be the case.
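The point is easy to demonstrate with two synthetic tiles. This is a toy example with made-up pixel values, not tiles from the actual library: one tile is flat grey, the other is half black and half near-white, and both collapse to the same average color.

```python
import numpy as np

# A flat mid-grey tile.
grey = np.full((8, 8, 3), 127, dtype=np.uint8)

# A tile whose left half is black and right half is almost white.
stripes = np.zeros((8, 8, 3), dtype=np.uint8)
stripes[:, 4:] = 254

# Average color per channel: the single point each tile becomes in the KDTree.
avg_grey = grey.reshape(-1, 3).mean(axis=0)
avg_stripes = stripes.reshape(-1, 3).mean(axis=0)

# How different the tiles really are, pixel by pixel.
mean_abs_diff = np.abs(grey.astype(int) - stripes.astype(int)).mean()

print(avg_grey, avg_stripes)  # identical averages
print(mean_abs_diff)          # yet every pixel differs by 127
```

To a color-indexed tree these two tiles are the same point, so either could be shortlisted for a patch that only one of them resembles.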

Secondly, the way I get “average color” is rather crude, simple, and the first thing my buddy Gemini told me to try. In their defense, they didn’t suggest some other methods. Maybe it’s time to find a better average.

Third, the method of grabbing the average might be fine, but the k distance might not be high enough. It might be the case that I have many near-perfect matches of an average color, but the code excludes what SSIM would rate the best match. I find this unlikely, but not impossible.

Finally, the scikit-image SSIM might not be the right method of comparison, or it needs to be tweaked. It’s entirely possible that this SSIM produces a lower quality result than the plodding, procedural comparison of the original code.

I won’t know the answer to these until I experiment more, and as the experimenting runs up my AWS charges, I doubt I’ll visit it again anytime soon.

Until then, I’m happy with the state of my mosaic project.

A Candid experiment with AI engineering

AI will make you 10x faster. That is the promise I have heard. I’ve also heard that AI will turn non-technical people into technical people. Or more accurately, lower the bar for app development.

But the most important thing I know about AI (and this thought is by no means original to me!) is that you can’t know what AI can do, until you work with it.

Consequently, I embarked on a new project with the expectation to be 10x faster. This project is my public repository, and I’ll write how many hours it took later in this blog.

The Project

The project is a machine learning related web application. A user will click through a series of randomly chosen fantasy character portraits (thank you, Nexus Mods!) and label their binary gender (this is about KISS principle, not about erasure of the non binary genders!). Then, with scikit-learn, we’ll train a model on the collection of images and gender, and see how it goes.

For deployment, this will use AWS and deploy through Github. Resources used will be Fargate, s3, lambdas, an autoscaler, a load balancer, an RDS database, etc. Terraform will be used for Infrastructure as code.

I picked this project because it combines tech I know extremely well (Python, AWS, SQL), tech I am recently comfortable with (Terraform), and tech that I am far from an expert in (Docker, Machine Learning, nginx).

What AI services Are to Be used?

I thought I’d check out v0 by Vercel, Cursor, and Google’s Gemini for this project. None of these are paid versions. I also did not rely on them for the same tasks or equally. Anything I say about their merits is therefore limited.

How were these services used? I started with Google Gemini and described the machine learning problem I wanted to solve. My question was intentionally broad, “How can I go through hundreds of images, and label them according to a category (e.g. ‘dog’ or ‘cat’)? I want to use Python and scikit-learn.” Google Gemini guided me to a model I’ve seen before (Logistic Regression). It further explained that I would need to pre-process images into numpy arrays. I asked follow-up questions, telling it that not all of the images had the same dimensions. It followed up with suggestions on padding with scikit-image.
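The pre-processing step can be sketched like this. It is a minimal illustration, not the project's code: it swaps in NumPy's np.pad rather than scikit-image, pads with zeros on the bottom and right (an arbitrary choice), and uses tiny made-up grayscale arrays in place of real portraits.

```python
import numpy as np


def pad_to_shape(image, target_height, target_width):
    """Zero-pad a 2-D grayscale image on the bottom and right to the target size."""
    pad_h = target_height - image.shape[0]
    pad_w = target_width - image.shape[1]
    return np.pad(image, ((0, pad_h), (0, pad_w)), mode="constant", constant_values=0)


# Made-up grayscale "portraits" with different dimensions.
images = [np.ones((4, 6)), np.ones((5, 3)), np.ones((2, 2))]

max_h = max(img.shape[0] for img in images)
max_w = max(img.shape[1] for img in images)

# One flattened row per image, giving a (n_samples, n_features) matrix.
X = np.stack([pad_to_shape(img, max_h, max_w).ravel() for img in images])
print(X.shape)  # (3, 30)
```

A matrix of this shape, paired with one label per row, is what a scikit-learn estimator such as LogisticRegression expects in fit().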

I used V0 to outline the project. I asked it to “give me a flask project that shows an image. Beneath the image is a button for ‘male’ or ‘female’. Put that project in a folder that is called ‘web-app’…” and so forth. I then asked it to provide CSS “like a website from 1998”. Then I asked it for a few Terraform files “to deploy this like it will get ECS Fargate, and make the cheapest possible EFS for storage” and “can you write a JSON object for the IAM permissions that are required to deploy this” etc. I also asked it to remove the JavaScript from the Flask app, simply to see if it could refactor.

Finally, I downloaded the V0 project as a zip file, unzipped it, made several manual changes, and created a git repo. From then on, I relied on Cursor. I asked Cursor what it thought the project was attempting to do, and it gave a correct answer and wrote some documentation. I asked it to create .tf files for a Lambda and a corresponding Lambda file in a specific directory. It did so. The same went for Docker files, updates to IAM permissions, and more .tf files.

Cursor wrote the GitHub workflows. These are flows I can read and understand, but have never written myself. I gave it prompts like “I’d like to write a check for the PEP8 standards. It should run on any pull request into main, and refuse the pull request if any Python file fails the PEP8 test.”

The project proceeded from there in Cursor.

Great Results from AI

V0 impressed me. The easy-to-understand interface didn’t only write the correct code, but it also arranged the project’s folder structure in the way I had asked it to. I did this a few other times as well, without prompting it for a folder structure, and it produced results that were sensible and understandable.

The Flask application CSS did change as I asked: it produced a 1998-style website, in all of its nostalgic hideousness. It evoked the early web fascination, like a large image loading in a mere 30 seconds! I might have asked it to simulate a 56k dial-up hiss next.

Gemini assisted brainstorming like a subservient, technically astute butler. If I asked a question that I knew would be ‘the wrong’ question (“What is the best machine learning model for image categorization?”), it answered rightly that there is no ‘best option’ and summarized the advantages of some models over others. I advised it that I was concerned about too much memory usage in a single ECS task. It offered me options of training in batches or training in parallel. It provided streams of coding examples for both.

Were the answers from Gemini the best answers, or did it hallucinate? That I can’t answer. Scikit-learn is something I’ve used for less than a year. Still, its answers were fluid, helpful, and offered ideas to investigate further.

Cursor, where I did the bulk of the editing, proved useful in both brainstorming and coding. I checked what it would say if I asked, “Should I store numpy arrays in S3 or EFS? I don’t expect them to be accessed continuously, but I will need to train a model with them. That model will run in ECS.” It replied with what I thought made sense: EFS is much more expensive than S3, and has the advantage of lightly coupling containers from Amazon Services. It could be shared as a common network drive across several ECS tasks. Several prompts followed. Eventually, it recommended S3 due to cost, especially since I also told it I did not plan on training a model across parallel containers.

The writing of novel code also impressed me. When I asked it for a Lambda or a GitHub workflow, it wrote with certain patterns I did not recognize. I asked it questions like “why did you institute that variable as ‘None’ on line 12?” or “why are the Terraform Plan and Terraform Apply commands separate?” It politely tutored me on best practices, explaining why these design patterns were common. Finally, I was able to update my workflows without any documentation reading. I asked Cursor to disable or reenable runs on pull requests, pushes, etc., and it modified them correctly.

As my project grew, it did reasonably well at managing IAM permissions, particularly when it came to consolidating and organizing policy documents. More on that later, though.

What Cursor Didn’t do so Well

I love code that is clean and easy to read. PEP8 helps me keep it that way. Cursor had a different opinion on all that. I wrote broad prompts like “re-format this file for pep8 standards” or even “run pycodestyle, and correct any formatting errors.” Neither worked. The latter test surprised me the most. The pycodestyle command is lucidly blunt about what you need to fix. Cursor could never get it done efficiently, and I continue to check PEP8 manually.

Cursor responded to a prompt such as ‘make a reverse proxy in this folder. Make sure it has Docker file. Also, use gunicorn to handle the headers.’ I expected the simplicity I had seen through a Udemy course. What I got was not exactly that, of course, and it was flawed. Cursor placed header information in both the Docker file and the related gunicorn file, which caused errors at deployment.

Another Docker-related example was this: I got errors in which a Docker file could not find a related file to import when I deployed it, even though it built fine from a terminal command. Cursor suggested a fix that involved searching several directories up, and performing a recursive search for the apparently absent file. Turns out the error was in my Deploy.yml. I had the docker build command ending in ‘.’ as opposed to the intended ‘..’, which made this a remarkably simple error to identify and fix.

These are only two examples in which Cursor built a Rube Goldberg machine to solve an issue, when best practices call for Occam’s Razor. With these occasional Rube Goldberg solutions, Cursor proved as error-prone as any human. A change in one file to fix one problem caused a break in functionality in another file, causing another problem. I learned to ask Cursor to double-check, sometimes with colorful language.

Cursor proved incredibly time-saving when it came to IAM permissions. It was great at analyzing them and refactoring them as needed. But I did notice a flaw: it didn’t seem to follow least privilege consistently. Frequently, the first suggestion for permissions invoked the dreaded asterisks on resources. Furthermore, it was seldom accurate in predicting all the permissions that would be needed to deploy a resource. Even prompts like “Make sure the resource has read permissions, to the s3 bucket. Apply write permissions to only keys that begin with ‘write-directory’” would miss critical permissions.
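For reference, here is roughly what that prompt is asking for, built as a Python dict so it can be dumped to JSON. This is a sketch under assumptions: the bucket name is hypothetical, build_s3_policy is a made-up helper, and a real deployment may need more (for example KMS permissions on an encrypted bucket). Note that listing the bucket is a separate action on a separate ARN from reading its objects, which is exactly the kind of detail that tends to go missing.

```python
import json


def build_s3_policy(bucket, write_prefix):
    """Build an IAM policy document: read anywhere in the bucket, write only under a prefix."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],  # bucket-level action, bucket ARN
            },
            {
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"{bucket_arn}/*"],  # object-level action, object ARNs
            },
            {
                "Sid": "WriteUnderPrefixOnly",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": [f"{bucket_arn}/{write_prefix}*"],
            },
        ],
    }


# "example-portrait-bucket" is a placeholder, not a real bucket.
policy = build_s3_policy("example-portrait-bucket", "write-directory")
print(json.dumps(policy, indent=2))
```

No statement uses a bare asterisk as its resource, and the only write permission is scoped to keys that start with the prefix.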

Closing thoughts

Did AI make me 10x faster at making this project? At the time of this writing, I’ve put about 16 hours of work into it, and I estimate it is about 60% complete. It did make things faster, but not to such an astronomical degree. If you’d like me to set up a GeoCities site, with some raw JavaScript, that can be finished before my Napster finishes a download, though.

Will AI turn a non-technical person into a technical person? A qualified yes, if that non-technical person wants to use AI as a teaching aide, rather than a skill replacement. If you are a non-technical person, ready to deploy a cool app to AWS cloud, consider this: when I first learned cloud development, I somehow made a git commit that publicly exposed (non-root) AWS access credentials. Next, I got a surprise $635 bill. Amazon had disabled the compromised IAM user. They forgave the bill, but I still went region by region, resource by resource, to ensure nothing was left running. This is one of hundreds of ways things can go wrong if you don’t know what to ask for.

This leads to another point. AI can give you ideas, recommendations, and explanations of trade-offs, but it cannot make decisions for you. I chose to deploy this project with Fargate because it was simple (see above, KISS method), but that decision was pretty unconstrained. I’m not considering pre-existing infrastructure, as one would in a professional environment. I am barely considering the cost too. I would under no circumstances trust an AI to tell me “how much will running this project cost?” or “will Fargate be costlier than EKS?” because it likely will hallucinate the math. It would flatter me with “great question!” while doing so.

There is one final thought though. Even though AI did not make this project exponentially faster to deploy, it did make it overall easier. As I debugged and re-deployed, it was nice to have an integrated terminal to ask questions, rather than have four browser tabs open between official documentation, Stack Overflow, a repo of sample code etc. If I updated or introduced a variable name, I did not have to scour my code for every single reference either. I also grew in skill as I asked it questions about why it structured code a certain way. It answered questions when I didn’t understand a particular error in deployment, in a CloudWatch log, or even locally.

The explanations have always been fruitful, even if I make mental notes to double check for accuracy.

Thanks for reading, and remember, this is only one developer’s experience!

How not to use AI

I’d like to share a story, and then I’d like you, the reader, to imagine what you’d do in the boss’s shoes.

You are reviewing a new client’s contract and looking at some numbers. Most of it is good, but one of the numbers is off. It appears, based on your prior experience, that it must have been calculated differently. Perhaps it was a typo. Maybe the wrong column from an Excel sheet got copied over. Perhaps there is something exceptional in this contract that you’re not aware of. So you send a message back to the employee who sent it to you and ask them to explain.

You get an email back. It appears slightly off-topic, but at least it’s a reply. The explanation you get confirms your first thought: this particular number was calculated differently than usual. You send a follow-up email and get another reply back. The answer continues to explain reasons for the change in calculation, but the answer does not satisfy you. So you speak with the employee in person to have them elaborate further.

At this point, you discover that this employee has been replying to recent emails with ChatGPT. You also suspect that this is how the contract, of which you have found one error, was generated to begin with.

How do you feel about this employee at this point? Perhaps not only as a subordinate, but also as a colleague, or worse, the person you yourself report to?

This anecdote is only partially fiction. A similar incident happened to a friend of mine. Except in this case, he was a direct report of someone who relied on ChatGPT to communicate. Needless to say, he finds his position less than satisfactory.

Does Tech Make us too Lazy now?

It’s hard to know where to start with this, but I recollect a psych journal written in the early days of social media. It repeated several variations of a double-blind experiment. Each iteration tried to account for a different variable. Yet the same conclusion came up again and again: people frequently confused easy access to information, via the internet, with personal mastery of that information.

A more recent article from Futurism states a related conclusion: If you rely too much on AI, you atrophy your ability to think critically.

Yet I wonder, might there not be many out there who shrug their shoulders and say, ‘so what’? Or at the very least, behave that way as they type ‘ChatGPT, how do I get out of trouble with my boss today?’

AI is not your Relationship

It’s easy to say someone is lazy for replying to their co-workers with AI responses, but that’s not quite enough, is it? Communication, especially written communication, is always subjective and relational between the writer and the audience. I feel silly pointing this out, but ChatGPT doesn’t know your co-worker like you do. It cannot therefore know your co-worker’s state of mind, behavior, expectations, or emotional state. It cannot understand what stakes might matter either. In the case of this anecdote, ChatGPT would not know if this miscalculation might have legal or financial consequences. It cannot know if this is a trivial or a deal-breaking change.

AI cannot choose the right words or mention any relevant contextual points in your replies. Does this matter, though? Well, that depends on how much you value your working relationship with someone. If someone sees an email from a co-worker and considers a splat, cut, and paste reply via ChatGPT to be enough of a reply, I can’t help but think they value the relationship with the sender very little.

This says nothing of the failures to communicate that evidently follow.

AI is not your brain

Now let’s consider a deeper problem here. In the story described, an employee did something that resulted in this strange point of data. The employee could not explain the decision. In all probability, this is because the employee did not make a decision.

The employee did not think.

Most of our work is habit. We do not always understand why we do something, but we would at least need to know if we had done something on autopilot. “I’m sorry, boss. I used our usual template contract and did not account for a currency exchange rate for this client,” could have been a good reply. But that’s not what happened here. The employee needed ChatGPT to make up an answer to a question she did not understand, despite it being her responsibility to do so.

As the Futurism article noted, ‘atrophied and unprepared’ is not the right way for your brain to work. The mind is a muscle like any other, and if it is not used well, it can weaken. Many of us, myself included, are knowledge workers, so the sharpness of our minds is as important as an industrial worker’s hammer-wielding muscle. We cannot completely substitute thinking with automation. Even if we could, do we want to release important details and company secrets into an AI?

Can a knowledge worker, who prefers not to think, be an excellent employee?

“But AI is a tool!” some might point out. “A tool like a calculator! Or a book! You do not do complex math in your mind, nor commit paragraphs to memory like a bronze-age cleric!” That is correct, of course. I have no more desire to do complicated math or memorize books than I do to row a boat, burn candles for light, or even send postal letters to distant relatives. Humans are tool makers. Let us celebrate our amazing tools!

But one podcaster pointed out that there is no thinking that is not verbal processing. I add that there is little emotional processing that is not also verbal. It can’t be said enough: doing both must be a habit. It must be exercised to be done well.

Do we value the content of our minds and hearts so little that we prefer ChatGPT to do that for us? To be clear, AI can be useful in helping us think better, but it should not think for us. Doing so outsources our reasoning to an algorithm that’s well-known for making things up. It furthermore makes us dependent on a corporation whose motivations are opaque. It exposes us to something that can be manipulative.

In short, thinking poorly leads to a lack of freedom.

The consequences of bad thinking can’t be enumerated enough. We all know terrible examples of the ‘sunk cost fallacy.’ We might consider the 19th century ‘end times’ cults, where people upended their social and financial lives for nothing. We might shake our heads (even if we privately chuckle) when conspiracists get in trouble with the law. Thousands are in their graves because they believed their easy access to information about medicine made them experts in a novel virus. I’m sure many of them imagined themselves as bold Copernicans. If only they understood how difficult it is to know something new.

If these aren’t enough examples, I have some digital currency to sell you. Your favorite TikToker promises it’s not a rug pull.

This is not to say that AI will necessarily deceive or manipulate. What I am saying is that an ‘atrophied and unprepared mind’ is an easy target for manipulation. In this light, a clumsy series of e-mails from one employee to her boss is trivial. What’s more important is: do you wish to be free and have power over yourself? If that’s so, then you’ll have to do the work of emotional and cognitive processing on your own.

At a minimum, we ought to be able to communicate why we made a particular decision in business. So please, don’t ask ChatGPT to do everything for you.

How I use AI: AI as Coworker?

The question will inevitably be asked, how do I use AI?

To begin, I am generally a slow adopter of consumer tech. I don’t think I owned an iPod until about 2009. I never adopted any social media that came out after Instagram. I only recently, as of last month, purchased a new Kindle. My previous, first-generation, Kindle is a mere fourteen years old. I’ve subscribed to Spotify for less than a year.

I am this way because I am skeptical, without being cynical, about any new developments in technology. The reason can be summarized in a conversation with a Dwarf NPC in an obscure video game called Arcanum. The dwarf comments that the short lives of humans make them zealous for industrious achievement. When new tech comes along, the first thing a human asks is “What can I use this for?” when the question ought to have been “What is the cost of its use?”

While I don’t think it’s an either-or question, asking both kept me from resorting to ChatGPT every time I encountered a new LeetCode problem.

But AI has been around for years, and I have seen what it can do for me. My guidebook has been Co-Intelligence: Living and Working with AI by Ethan Mollick. Ethan Mollick proposes that AI can be (among other things) a co-worker and a tutor.

AI as Co-Worker, or better yet, ‘legacy code’

The first thing I did was see what ChatGPT could do. I prompted it for some code with something like this: “Create a Cloudformation YAML file. In it, deploy a Serverless API gateway using Lambdas with only one GET endpoint. Have it return ‘hello world’ on the API path /hello. Write the Lambda code in JavaScript.”

I got a YAML file. I read through it. I did not understand everything I read, but I referenced AWS documentation to understand what I looked at. I estimate it would have taken me four or five hours to write that same code from scratch.

I went through a few iterations. I asked it to add a public S3 bucket (you know, for web hosting). I asked it to change the Lambda code to Python. I asked it to add two more endpoints, including a POST request. This request did nothing but bounce back the request body to you with a 200 status. Each time the YAML file was updated with the correct code.
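The Python version of that Lambda would look something like the sketch below. It is not the code ChatGPT gave me: it assumes the event shape of an API Gateway REST API with Lambda proxy integration (httpMethod, path, body), and the /echo path in the smoke test is a made-up name.

```python
import json


def lambda_handler(event, context):
    """Handle GET /hello and echo the body of any POST request."""
    method = event.get("httpMethod")
    path = event.get("path")

    if method == "GET" and path == "/hello":
        return {"statusCode": 200, "body": "hello world"}

    if method == "POST":
        # Bounce the request body straight back to the caller.
        return {"statusCode": 200, "body": event.get("body") or ""}

    return {"statusCode": 404, "body": json.dumps({"message": "not found"})}


# Local smoke test with hand-built events; no AWS account needed.
hello = lambda_handler({"httpMethod": "GET", "path": "/hello"}, None)
echo = lambda_handler({"httpMethod": "POST", "path": "/echo", "body": '{"a": 1}'}, None)
print(hello)
print(echo)
```

Because the handler is just a function that takes a dict, it can be exercised locally like this before any YAML gets deployed.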

Well, mostly correct. That same afternoon I tried to deploy in CloudFormation. I got several errors. Some of those errors were the result of permissions. I had not enabled my IAM account to deploy S3 buckets or Lambdas. But others were because the YAML code tried to set both a bucket ACL and bucket policy. AWS recommends using only one method or the other.

As I read through AWS documents to fix all that, I learned I had asked the wrong question. AWS doesn’t recommend public S3 buckets for simple websites anymore. The right way to do this is to use a private bucket and set it up as a source for a CloudFront distribution, which adds a whole new level of complexity to a YAML file.

I began this process assuming that AI could be a tutor, as had been recommended in Co-Intelligence. However, I realized the mental process I went through is more like working with a co-worker, or maybe even a previous worker. What had I done to deploy this project? I examined code that had a bug, debugged it, and then optimized its legacy setup for more up-to-date best practices in AWS. In short, it was much like picking up a forgotten project that has, for some reason, now become important to the company again.

This was my first, but not only, observation in using AI. I have not, for instance, talked about AI as tutor.

That’s for another blog and another time though.

Thanks for reading.