A Candid experiment with AI engineering

AI will make you 10x faster. That is the promise I have heard. I’ve also heard that AI will turn non-technical people into technical people. Or more accurately, lower the bar for app development.

But the most important thing I know about AI (and this thought is by no means original to me!) is that you can’t know what AI can do until you work with it.

Consequently, I embarked on a new project expecting to be 10x faster. The project is in my public repository, and I’ll say how many hours it took later in this post.

The Project

The project is a machine-learning web application. A user will click through a series of randomly chosen fantasy character portraits (thank you, Nexus Mods!) and label their binary gender (this is about the KISS principle, not about erasure of non-binary genders!). Then, with scikit-learn, we’ll train a model on the collection of images and labels, and see how it goes.

For deployment, the app will run on AWS and deploy through GitHub. Resources will include Fargate, S3, Lambdas, an autoscaler, a load balancer, an RDS database, etc. Terraform will provide the Infrastructure as Code.

I picked this project because it combines tech I know extremely well (Python, AWS, SQL), tech that is new to me but that I’m comfortable with (Terraform), and tech in which I am far from an expert (Docker, machine learning, nginx).

What AI Services Are to Be Used?

I thought I’d check out v0 by Vercel, Cursor, and Google’s Gemini for this project. None of these were paid versions. I also did not rely on them for the same tasks, or equally, so anything I say about their merits is limited.

How were these services used? I started with Google Gemini and described the machine learning problem I wanted to solve. My question was intentionally broad: “How can I go through hundreds of images and label them according to a category (e.g. ‘dog’ or ‘cat’)? I want to use Python and scikit-learn.” Gemini guided me to a model I had seen before (logistic regression). It further explained that I would need to pre-process the images into numpy arrays. I asked follow-up questions, telling it that not all of the images had the same dimensions. It followed up with suggestions on padding them with skimage.
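
To make that concrete, here is a rough sketch of the pipeline Gemini pointed me toward, not its literal output: pad every image to a common shape, flatten the pixels into feature vectors, and fit a logistic regression. The `paths` and `labels` variables, the target shape, and the assumption of RGB inputs are all placeholders of mine.

```python
import numpy as np
from skimage.io import imread
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

TARGET = (128, 128, 3)  # assumed common shape for RGB portraits

def to_features(path):
    """Read an image, zero-pad (and crop) it to TARGET, then flatten."""
    img = imread(path)  # assumes an RGB image with shape (H, W, 3)
    pad = [(0, max(t - s, 0)) for s, t in zip(img.shape, TARGET)]
    img = np.pad(img, pad)[:TARGET[0], :TARGET[1], :TARGET[2]]
    return (img / 255.0).ravel()

# paths and labels stand in for the collected portrait data
X = np.stack([to_features(p) for p in paths])
y = np.array(labels)  # e.g. 0 = "male", 1 = "female"

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"holdout accuracy: {model.score(X_test, y_test):.2f}")
```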

I used V0 to make an outline of the project. I asked it to “give me a Flask project that shows an image. Beneath the image is a button for ‘male’ or ‘female’. Put that project in a folder that is called ‘web-app’…” and so forth. I then asked it for CSS “like a website from 1998”. Then I asked it for a few Terraform files “to deploy this as if it will go to ECS Fargate, and make the cheapest possible EFS for storage” and “can you write a JSON object for the IAM permissions that are required to deploy this”, etc. I also asked it to remove the JavaScript from the Flask app, simply to see if it could refactor.
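
The skeleton V0 produced was along these lines. This is a minimal reconstruction, not V0’s actual code; the inline template and the two helper functions are hypothetical stand-ins.

```python
from flask import Flask, redirect, render_template_string, url_for

app = Flask(__name__)

# Bare-bones page: one portrait, one button per label
PAGE = """
<img src="{{ url_for('static', filename=image) }}" alt="fantasy portrait">
<form method="post" action="{{ url_for('label', choice='male') }}">
  <button>male</button>
</form>
<form method="post" action="{{ url_for('label', choice='female') }}">
  <button>female</button>
</form>
"""

@app.route("/")
def index():
    image = pick_random_portrait()  # placeholder: choose an unlabeled image
    return render_template_string(PAGE, image=image)

@app.route("/label/<choice>", methods=["POST"])
def label(choice):
    record_label(choice)  # placeholder: persist the label for training
    return redirect(url_for("index"))
```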

Finally, I downloaded the V0 project as a zip file, unzipped it, made several manual changes, and created a git repo. From then on, I relied on Cursor. I asked Cursor what it thought the project was attempting to do; it gave a correct answer and wrote some documentation. I asked it to create .tf files for a Lambda, plus the corresponding Lambda source file, in a specific directory. It did so. The same went for Dockerfiles, updates to IAM permissions, and more .tf files.

Cursor wrote the GitHub workflows. These are files I can read and understand, but had never written myself. I described what I wanted with prompts like “I’d like to write a check for PEP8 standards. It should run on any pull request into main, and block the pull request if any Python file fails the PEP8 test.”

The project proceeded from there in Cursor.

Great Results from AI

V0 impressed me. Its easy-to-understand interface not only wrote correct code, it also arranged the project’s folder structure the way I had asked. I tried this a few more times without prompting for a folder structure, and it produced results that were sensible and understandable.

The Flask application CSS changed just as I asked: V0 produced a 1998-style website in all of its nostalgic hideousness. It evoked that early-web fascination of watching a large image load over a mere 30 seconds! I might ask it to simulate a 56k dial-up hiss next.

Gemini assisted brainstorming like a subservient, technically astute butler. If I asked a question that I knew was ‘the wrong’ question (“What is the best machine learning model for image categorization?”), it answered rightly that there is no ‘best’ option and summarized the advantages of some models over others. I told it I was concerned about using too much memory in a single ECS task. It offered me the options of training in batches or training in parallel, and provided streams of code examples for both.
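
The batch option looked roughly like this. One caveat: scikit-learn’s LogisticRegression has no partial_fit, so an out-of-core version typically swaps in SGDClassifier with a logistic loss, as in this sketch of mine; `batch_paths` and `load_batch` are placeholders.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# SGDClassifier with log_loss behaves like logistic regression but
# supports incremental training, so the ECS task never needs to hold
# every image in memory at once.
model = SGDClassifier(loss="log_loss")
classes = np.array([0, 1])  # all labels must be declared on the first call

for paths in batch_paths:                 # placeholder: e.g. 100 paths at a time
    X_batch, y_batch = load_batch(paths)  # placeholder: read and flatten images
    model.partial_fit(X_batch, y_batch, classes=classes)
```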

Were the answers from Gemini the best answers, or did it hallucinate? That I can’t answer. Scikit-learn is something I’ve used for less than a year. Still, its answers were fluid, helpful, and offered ideas to investigate.

Cursor, where I did the bulk of the editing, proved useful in both brainstorming and coding. I checked what it would say if I asked “should I store numpy arrays in S3 or EFS? I don’t expect them to be accessed continuously, but I will need to train a model with them. That model will run in ECS.” It gave a reply I thought made sense: EFS is much more expensive than S3, but it has the advantage of acting like a shared network drive that several ECS tasks can mount at once. It wound up recommending S3 due to cost, especially since I also told it I did not plan on training a model across parallel containers.

The writing of novel code also impressed me. When I asked it for a Lambda or a GitHub workflow, it wrote with certain patterns I didn’t recognize. I asked it questions like “why did you initialize that variable to ‘None’ on line 12?” or “why are the terraform plan and terraform apply commands separate?” It politely tutored me on best practices, explaining why these design patterns were common. Eventually, I was able to update my workflows without any documentation reading: I asked Cursor to disable or re-enable runs on pull requests, pushes, etc., and it modified them correctly.
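
As an illustration of the kind of pattern it explained (this is my own toy example, not Cursor’s code): initializing a variable to None lets later logic tell whether any branch ever assigned it a real value.

```python
def find_match(records, predicate):
    result = None              # sentinel: "nothing found yet"
    for record in records:
        if predicate(record):
            result = record
            break
    if result is None:         # distinguishes "no match" from a falsy match
        raise LookupError("no matching record")
    return result
```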

As my project grew, Cursor did reasonably well at managing IAM permissions, particularly when it came to consolidating and organizing policy documents. More on that later, though.

What Cursor Didn’t Do So Well

I love code that is clean and easy to read, and PEP8 helps me keep it that way. Cursor had a different opinion on all that. I wrote broad prompts like “re-format this file for PEP8 standards” or even “run pycodestyle, and correct any formatting errors.” Neither worked. The latter surprised me the most: the pycodestyle command is lucidly blunt about what you need to fix, yet Cursor could never get it done reliably, and I continue to check PEP8 manually.

Cursor responded to a prompt such as ‘make a reverse proxy in this folder. Make sure it has a Dockerfile. Also, use gunicorn to handle the headers.’ I expected the simplicity I had seen in a Udemy course. What I got was not exactly that, and it was flawed: Cursor placed header information in both the Dockerfile and the related gunicorn config, which caused errors at deployment.

Another Docker-related example: when deployed, a Docker image could not find a file it needed to import, even though it built fine from a terminal command. Cursor suggested a fix that involved going several directories up and performing a recursive search for the apparently absent file. It turned out the error was in my Deploy.yml: I had the docker build command ending in ‘.’ as opposed to the intended ‘..’, making this a remarkably simple error to identify and fix.

These are only two examples in which Cursor built a Rube Goldberg machine to solve an issue when best practices called for Occam’s Razor. Beyond the occasional Rube Goldberg solution, Cursor proved as error-prone as any human: a change in one file to fix one problem would break functionality in another file, causing another problem. I learned to ask Cursor to double-check its work, sometimes with colorful language.

Cursor proved incredibly time-saving when it came to IAM permissions. It was great at analyzing them and refactoring them as needed. But I did notice a flaw: it didn’t follow least privilege consistently. Frequently, its first suggestion for permissions invoked the dreaded asterisk on resources. Furthermore, it was almost never accurate in predicting all the permissions that would be needed to deploy a resource. Even prompts like “Make sure the resource has read permissions to the S3 bucket. Apply write permissions to only keys that begin with ‘write-directory’” would miss critical permissions.

Closing thoughts

Did AI make me 10x faster at building this project? At the time of this writing, I’ve put about 16 hours of work into it, and I estimate it is about 60% complete. AI did make things faster, but not to such an astronomical degree. If you’d like me to set up a GeoCities site with some raw JavaScript, though, that can be finished before my Napster download completes.

Will AI turn a non-technical person into a technical person? A qualified yes, if that non-technical person wants to use AI as a teaching aide rather than a skill replacement. If you are a non-technical person ready to deploy a cool app to the AWS cloud, consider this: when I first learned cloud development, I somehow made a git commit that publicly exposed (non-root) AWS access credentials. Next, I got a surprise $635 bill. Amazon had disabled the compromised IAM user, and they forgave the bill, but I still went region by region, resource by resource, to ensure nothing was left running. This is one of hundreds of ways things can go wrong if you don’t know what to ask for.

This leads to another point. AI can give you ideas, recommendations, and explanations of trade-offs, but it cannot make decisions for you. I chose to deploy this project with Fargate because it was simple (see above: KISS), but that decision was pretty unconstrained. I’m not weighing it against pre-existing infrastructure, as one would in a professional environment, and I am barely considering cost. I would under no circumstances trust an AI to tell me “how much will running this project cost?” or “will Fargate be costlier than EKS?” because it will likely hallucinate the math. It would flatter me with “great question!” while doing so.

There is one final thought. Even though AI did not make this project exponentially faster to deploy, it did make it easier overall. As I debugged and re-deployed, it was nice to have an integrated terminal to ask questions in, rather than four browser tabs open across official documentation, Stack Overflow, a repo of sample code, etc. If I updated or introduced a variable name, I did not have to scour my code for every single reference, either. I also grew in skill as I asked questions about why it structured code a certain way. It answered my questions when I didn’t understand a particular error in deployment, a CloudWatch log, or even a local failure.

The explanations have always been fruitful, even if I make mental notes to double-check them for accuracy.

Thanks for reading, and remember, this is only one developer’s experience!

How not to use AI

I’d like to share a story, and then I’d like you, the reader, to imagine what you’d do in the boss’s shoes.

You are reviewing a new client’s contract and looking at some numbers. Most of it is good, but one of the numbers is off. It appears, based on your prior experience, that it must have been calculated differently. Perhaps it was a typo. Maybe the wrong column from an Excel sheet got copied over. Perhaps there is something exceptional in this contract that you’re not aware of. So you send a message back to the employee who sent it to you and ask them to explain.

You get an email back. It appears slightly off-topic, but at least it’s a reply. The explanation you get confirms your first thought: this particular number was calculated differently than usual. You send a follow-up email and get another reply back. The answer continues to explain reasons for the change in calculation, but the answer does not satisfy you. So you speak with the employee in person to have them elaborate further.

At this point, you discover that this employee has been replying to recent emails with ChatGPT. You also suspect that this is how the contract, in which you have found one error, was generated to begin with.

How do you feel about this employee at this point? Perhaps not only as a subordinate, but also as a colleague, or worse, as the person you yourself report to?

This anecdote is only partially fiction. Something similar happened to a friend of mine, except in his case, he was the direct report of someone who relied on ChatGPT to communicate. Needless to say, he finds his position less than satisfactory.

Does Tech Make us too Lazy now?

It’s hard to know where to start with this, but I recall a psychology paper from the early days of social media. It repeated several variations of a double-blind experiment, each iteration accounting for a different variable. Yet the same conclusion came up again and again: people frequently confuse easy access to information via the internet with personal mastery of that information.

A more recent article from Futurism states a related conclusion: If you rely too much on AI, you atrophy your ability to think critically.

Yet I wonder: might there not be many out there who shrug their shoulders and say ‘so what’? Or at the very least, behave that way as they type ‘ChatGPT, how do I get out of trouble with my boss today?’

AI is not your Relationship

It’s easy to say someone is lazy for replying to their co-workers with AI responses, but that’s not quite enough, is it? Communication, especially written communication, is always subjective and relational between the writer and the audience. I feel silly pointing this out, but ChatGPT doesn’t know your co-worker like you do. It cannot, therefore, know your co-worker’s state of mind, behavior, expectations, or emotional state. Nor can it understand what stakes might matter. In the case of this anecdote, ChatGPT could not know whether the miscalculation might have legal or financial consequences. It cannot know if this is a trivial change or a deal-breaking one.

AI cannot choose the right words or raise the relevant contextual points in your replies. Does this matter, though? Well, that depends on how much you value your working relationship with someone. If someone sees an email from a co-worker and considers a copied-and-pasted ChatGPT response reply enough, I can’t help but think they value the relationship with the sender very little.

And this says nothing of the failures to communicate that inevitably follow.

AI is not your brain

Now let’s consider a deeper problem. In the story described above, an employee did something that resulted in a strange data point. The employee could not explain the decision. In all probability, this is because the employee never made a decision.

The employee did not think.

Most of our work is habit. We do not always understand why we do something, but we should at least know when we have done something on autopilot. “I’m sorry, boss. I used our usual template contract and did not account for a currency exchange rate for this client,” could have been a good reply. But that’s not what happened here. The employee needed ChatGPT to make up an answer to a question she did not understand, despite it being her responsibility to understand it.

As the Futurism article noted, ‘atrophied and unprepared’ is not the way you want your brain to be. The mind is a muscle like any other: if it is not used well, it weakens. Many of us, myself included, are knowledge workers, so the sharpness of our minds is as important as the hammer-wielding muscle of an industrial worker. We cannot completely substitute automation for thinking. And even if we could, do we really want to release important details and company secrets into an AI?

Can a knowledge worker who prefers not to think be an excellent employee?

“But AI is a tool!” some might point out. “A tool like a calculator! Or a book! You do not do complex math in your head, nor commit paragraphs to memory like a Bronze Age cleric!” That is correct, of course. I have no more desire to do complicated math or memorize books than I do to row a boat, burn candles for light, or send postal letters to distant relatives. Humans are toolmakers. Let us celebrate our amazing tools!

But one podcaster pointed out that there is no thinking that is not verbal processing. I would add that there is little emotional processing that is not also verbal. It can’t be said enough: both must be habits. They must be exercised to be done well.

Do we value the content of our minds and hearts so little that we prefer ChatGPT to do that work for us? To be clear, AI can be useful in helping us think better, but it should not think for us. Letting it do so outsources our reasoning to an algorithm that is well known for making things up. It makes us dependent on a corporation whose motivations are opaque. It exposes us to something that can be manipulative.

In short, thinking poorly leads to a lack of freedom.

The consequences of bad thinking are too many to enumerate. We all know terrible examples of the ‘sunk cost fallacy.’ We might consider the 19th-century ‘end times’ cults, where people upended their social and financial lives for nothing. We might shake our heads (even if we privately chuckle) when conspiracists get in trouble with the law. Thousands are in their graves because they believed their easy access to information about medicine made them experts in a novel virus. I’m sure many of them imagined themselves as bold Copernicans. If only they understood how difficult it is to know something new.

If these aren’t enough examples, I have some digital currency to sell you. Your favorite TikToker promises it’s not a rug pull.

This is not to say that AI will necessarily deceive or manipulate. What I am saying is that an ‘atrophied and unprepared’ mind is an easy target for manipulation. In this light, a clumsy series of emails from one employee to her boss is trivial. The more important question is this: do you wish to be free and have power over yourself? If so, then you’ll have to do the work of emotional and cognitive processing on your own.

At a minimum, we ought to be able to communicate why we made a particular decision in business. So please, don’t ask ChatGPT to do everything for you.

How I use AI: AI as Coworker?

The question will inevitably be asked: how do I use AI?

To begin, I am generally a slow adopter of consumer tech. I don’t think I owned an iPod until about 2009. I never adopted any social media that came out after Instagram. Only last month did I purchase a new Kindle; my previous, first-generation Kindle is a mere fourteen years old. I’ve subscribed to Spotify for less than a year.

I am this way because I am skeptical, without being cynical, about new developments in technology. The reason can be summarized by a conversation with a dwarf NPC in an obscure video game called Arcanum. The dwarf comments that the short lives of humans make them zealous for industrious achievement. When new tech comes along, the first thing a human asks is “What can I use this for?” when the question ought to be “What is the cost of its use?”

While I don’t think it’s an either-or question, asking both kept me from resorting to ChatGPT every time I encountered a new problem on Leetcode.

But AI has been around for years now, and I have seen what it can do for me. My guidebook has been Co-intelligence: Living and Working with AI by Ethan Mollick, who proposes that AI can be (among other things) a co-worker and a tutor.

AI as Co-Worker, or better yet, ‘legacy code’

The first thing I did was see what ChatGPT could do. I prompted it for some code with something like this: “Create a Cloudformation YAML file. In it, deploy a Serverless API gateway using Lambdas with only one GET endpoint. Have it return ‘hello world’ on the API path /hello. Write the Lambda code in JavaScript.”

I got a YAML file. I read through it. I did not understand everything, but I referenced AWS documentation to make sense of what I was looking at. I estimate it would have taken me four or five hours to write that same code from scratch.

I went through a few iterations. I asked it to add a public S3 bucket (you know, for web hosting). I asked it to change the Lambda code to Python. I asked it to add two more endpoints, including a POST request that did nothing but bounce the request body back to the sender with a 200 status. Each time, the YAML file was updated with the correct code.
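
The Python Lambda behind that POST endpoint looked something like this sketch, reconstructed from memory rather than copied from ChatGPT’s output:

```python
import json

def lambda_handler(event, context):
    # With API Gateway's proxy integration, the request body arrives
    # as a string on the event object.
    body = event.get("body") or ""
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"echo": body}),
    }
```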

Well, mostly correct. That same afternoon I tried to deploy it with CloudFormation and got several errors. Some were the result of permissions: I had not enabled my IAM account to deploy S3 buckets or Lambdas. But others arose because the YAML tried to set both a bucket ACL and a bucket policy, and AWS recommends using only one method or the other.

As I read through AWS documentation to fix all that, I learned I had asked the wrong question. AWS doesn’t recommend public S3 buckets for simple websites anymore. The right way to do this is to use a private bucket and set it up as the origin for a CloudFront distribution, which adds a whole new level of complexity to the YAML file.

I began this process assuming that AI could be a tutor, as recommended in Co-intelligence. However, I realized the mental process I went through was more like working with a co-worker, or maybe even a former employee. What had I done to deploy this project? I examined code that had a bug, debugged it, and then updated its legacy setup to match current AWS best practices. In short, it was much like picking up a forgotten project that has, for some reason, become important to the company again.

This was my first, but not only, observation from using AI. I have not, for instance, talked about AI as a tutor.

That’s for another blog and another time though.

Thanks for reading.

Why were photos spawning in my S3 Bucket?

I didn’t ask for them. Yes, the code is intended to make them, but I did not trigger the code that makes them. So why were photos spawning in my S3 bucket the other day?

Let’s go over some context first. Yesterday, I expanded on my Udacity Meme Generator project, taking it out of local deployment and into the cloud. The purpose of the site is to take random photos and random quotes and turn them into random memes. The source photos and resulting memes are stored in an S3 bucket. The code runs on EC2 instances in an autoscaling group behind a load balancer. The CloudFormation YAML file can be viewed here.

My task yesterday involved experimenting with fonts. I wanted the EC2 instances to pull fonts from the same S3 bucket, and I created several test scripts to get it right. While testing, I expected a dozen or so random memes to be generated in my S3 bucket. Instead, I saw hundreds.

Unsure how that happened, I cleared them out of the folder, then ssh’ed into one EC2 instance and continued my work. As I generated memes with the new fonts, I noticed more random memes being generated with the old font; worse, it looked like they were generated from the website itself. I wasn’t loading the website, so I still didn’t get it.

I deleted the spawned memes again. I wondered if someone was trolling my application, but that seemed unlikely. I thought my test files might be making multiple calls, or somehow recursing over themselves. That wasn’t it.

Finally, I looked at my CloudFormation YAML file. At the time this problem was happening, the load balancer’s health check simply targeted the application root (HealthCheckPath: /).

My Flask application, meanwhile, generated a brand-new meme on every request to that root route. That is its intended behavior for real visitors.

Can you see the issue here?

I followed the KISS method in setting up the original health check. It called the root path, which essentially asks, “hey, can you load this page yet?” The EC2 instance could load that page, so it was ‘healthy’. And in loading that page, it also created a new meme and put it into the S3 bucket.

Health checks repeat. The repeated checks kept finding healthy instances, and those healthy instances kept spawning memes.

To fix this, I made a proper health check, pointing the target group at a dedicated path instead (HealthCheckPath: /health).

And then I added a matching, KISS-simple /health route to the Flask app.
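
A minimal sketch of the two routes, assuming the dedicated route is /health and with the meme-generating view as a placeholder:

```python
from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    # Real visitors get a meme; this is what the old health check kept firing
    return make_random_meme_page()  # placeholder for the meme-generating view

@app.route("/health")
def health():
    # The new health-check target: cheap, side-effect-free, 200 if the app is up
    return "OK", 200
```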

And now the memes only spawn when I tell them to.

Why I Got out of IT and into Software

Since this is a blog about my professional life, I can’t think of a better first post than this: how did I go from IT to Software?

To begin, the second iteration of my career in IT began around 2012, when I was back in the United States from a stint teaching abroad in South Korea. I supervised a call center that supported Windows machines remotely. Our service was surveillance, and it was hardly glamorous. Still, I rebuilt my technical skills there over about three years. From one of the best managers I ever had, I also learned how to be a manager myself, and how to work well in a corporate team.

I continued in similar work supporting both Windows and Apple computers for several years. I helped network admins set up fiber. I created my own RAID arrays. I designed a new AD environment from scratch. I’ve probably crimped miles of ethernet cable. I recovered data that outgoing employees tried to delete. I did all of that for several companies around Los Angeles, including three different studios.

The Warner Brothers lot was the best place to play the Harry Potter mobile game.

It was during this time that I developed an elementary appreciation for programming. This was something I never did in college, because I wasn’t exactly a mathematical savant. But I had tired of not understanding how code works, so I enrolled in Udacity’s Intro to Programming course to get a better idea. I also learned Git version control during that time.

It probably isn’t a coincidence that I took to Python quickly. When I write in Python, I experience coding less like mathematics and more like analytical writing driven by informal logic. Writing code felt like reading terse (or sometimes dense) philosophy, which I was quite used to.

Within about three months, I was experimenting with my own APIs and Angular webpages. In several IT positions, I wrote customized PowerShell modules, often using concepts like classes, to great success.

There is one day I can still remember, at one of those studios, that finally made me determined to get out of IT. A team of us walked into a cavernous brick room about the size of a house and saw little there except empty server racks, the debris of abandoned ethernet cables, and rumors that the place was haunted (I’m not joking). We were deploying a new desktop for a network engineer, who spoke of this room the way a wise old village elder tells an explorer never to go to that lost, plague-ridden jungle city.

The story was simple: everything had gone to the cloud. Honestly, I don’t know why that network engineer was even in the room.

I contemplated that, along with my years of deploying servers, routers, and firewalls, and came to a pretty quick conclusion: I don’t even like going into server rooms anyway, so I might as well adapt to the times, learn Infrastructure as Code, and pursue cloud tech forever.

About six months after that epiphany, I took a job troubleshooting software hosted in the AWS cloud. My contributions to that company are a story for another time.

May I never crimp ethernet cables again.