Programming Using AI

I've been thinking about this article for a while, but it is only recently that I have been able to sit down and properly think it through; or at least, collate all of my thoughts into a single article.

Over the last couple of years the term "AI" has become a marketing term that is bandied about (and abused) by all sorts of companies, all with the promise of making life easier.

In this article we will define the term AI in the context of programming, look at some services that you can use to produce code, and go through some pros and cons of using AI systems to code.

What Is An AI?

The term AI is a generic term that encompasses a number of different technologies. For the purposes of this article, when I say "AI" I mean a "large language model", or LLM: a machine learning model designed for natural language processing tasks and trained on huge amounts of text.

By "large amounts of text" I mean very large amounts of text. OpenAI, the company behind ChatGPT, haven't released the exact numbers of data used to train their models, but it is estimated that around 10 trillion words were used to train GPT-4. To put this into context, the entire Lord of the Rings trilogy contains around 480,000 words, so that would be the equivalent of training using 21 million books the size of the trilogy. That's a lot of data.

All this data is used to perform one task: estimating what the next word in a sequence should be. The vast amount of data fed into the model gives it a huge statistical base from which to predict that next word, and this is how these models are able to produce sentences that look like they were written by a person.
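
To make the "next word" idea concrete, here is a toy sketch in PHP. This is not how an LLM works internally (they use neural networks over tokens rather than raw word counts), but it shows the same basic idea of picking the statistically most likely next word.

// Count which word follows which in some training text, then predict
// the most likely follower. A crude stand-in for next-word prediction.
$trainingText = 'the cat sat on the mat the cat ate the fish';
$words = explode(' ', $trainingText);

// Build a table of word => [follower => count].
$followers = [];
for ($i = 0; $i < count($words) - 1; $i++) {
    $followers[$words[$i]][$words[$i + 1]] =
        ($followers[$words[$i]][$words[$i + 1]] ?? 0) + 1;
}

// Predict the most likely word to follow "the".
$candidates = $followers['the'];
arsort($candidates);
echo array_key_first($candidates) . "\n"; // Prints "cat".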

When it comes to writing code, the same systems reproduce code in similar ways to what they have already seen. If you train a model on lots of coding examples from around the internet, you can then get it to produce code when given certain requirements.

It is important to remember the following points when thinking about large language models.

  • It is a statistical model that produces output based on token analysis and statistical likelihood of outcome.
  • It is not conscious or intelligent; it just mimics intelligence.

With that in mind, let's look at some services that allow us to create code using AI.

What Services Exist?

I won't go through lots of different services here, but I'll highlight a few of the more popular/well known ones that I have used to generate code. If you do know of any more then please let me know!

Instead of just linking to the code, I thought I would ask each of the models to generate a quicksort in PHP and post the results. This is an interesting question to ask because the quicksort algorithm, whilst simple, can be a bit tricky to get right due to the recursion and the different arrays being passed between the recursive calls.

I ran the resulting code through the tests I created for my PHP sorting exercise repository, which contains unit tests to check that the quick sort has sorted an array of numbers correctly.
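
The tests themselves boil down to something like the following sketch. This is illustrative rather than the exact contents of the repository, and it assumes a quicksort() function is available to test.

use PHPUnit\Framework\TestCase;

// An illustrative test: a quicksort implementation should produce the
// same output as PHP's built-in sort() for the same input.
class QuickSortTest extends TestCase
{
    public function testSortsNumbersCorrectly(): void
    {
        $input = [5, 3, 8, 1, 9, 2];

        $expected = $input;
        sort($expected); // Use the built-in sort as the oracle.

        $this->assertSame($expected, quicksort($input));
    }

    public function testHandlesEmptyAndSingleElementArrays(): void
    {
        $this->assertSame([], quicksort([]));
        $this->assertSame([5], quicksort([5]));
    }
}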

And yes, I know there are a few more services out there. These are the few that I have had experience with.

ChatGPT

Link: https://chatgpt.com/ 

ChatGPT has a number of different language models that can be used to generate both text and code.

I'm using the free tier, which only has access to the "GPT-4o mini" model, and whilst the text that ChatGPT returns is impressive, the code this model produces is pretty bad.

Here is the result of the quicksort question.

function quicksort(array $array): array {
    // Base case: arrays with 0 or 1 elements are already sorted
    if (count($array) < 2) {
        return $array;
    }

    // Choose a pivot element
    $pivot = $array[0];

    // Partition the array into two subarrays
    $less = array_filter(array_slice($array, 1), fn($x) => $x <= $pivot);
    $greater = array_filter(array_slice($array, 1), fn($x) => $x > $pivot);

    // Recursively sort the subarrays and merge them with the pivot
    return array_merge(quicksort($less), [$pivot], quicksort($greater));
}

// Example usage
$array = [3, 6, 8, 10, 1, 2, 1];
$sortedArray = quicksort($array);

echo "Original array: " . implode(", ", $array) . "\n";
echo "Sorted array: " . implode(", ", $sortedArray) . "\n";

At first inspection, this appears to be ok, but after running the code it produced an error.

Undefined array key 0
test.php:8

The issue is caused by the use of the array_filter() function to remove items that are less than or equal to, or greater than, the pivot value. array_filter() preserves array keys, so the resulting array can be missing key 0, which is required for the next step in the recursion. As a result, if the first item is filtered out of the array then the recursive call to quicksort() will receive an array that causes an error when attempting to read the pivot value.

Indeed, the AI gave me a list of numbers, "[3, 6, 8, 10, 1, 2, 1]", that actually caused this issue to occur when sorting them.
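
For what it's worth, the fix here is small: wrapping the filtered arrays in array_values() re-indexes them so that key 0 always exists before the recursive calls are made. Something like this would do it:

// Re-index the filtered arrays so that key 0 always exists, which is
// what the recursive calls rely on when reading the pivot.
$less = array_values(array_filter(array_slice($array, 1), fn($x) => $x <= $pivot));
$greater = array_values(array_filter(array_slice($array, 1), fn($x) => $x > $pivot));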

Google Gemini

Link: https://gemini.google.com 

Gemini uses Google's Gemini 1.5 Flash model, which is available to use for free. There is also a "pro" version that is apparently more capable of logical reasoning and uses the "Gemini-Exp-1206" model, which has more code and maths capabilities.

Here is the result of the quicksort question.

function quickSort(array $arr): array {
    $length = count($arr);

    if ($length <= 1) {
        return $arr; // Base case: already sorted
    }

    $pivot = $arr[0]; // Choose the first element as the pivot
    $left = $right = [];

    for ($i = 1; $i < $length; $i++) {
        if ($arr[$i] < $pivot) {
            $left[] = $arr[$i];
        } else {
            $right[] = $arr[$i];
        }
    }

    return array_merge(quickSort($left), [$pivot], quickSort($right));
}


// Example usage:
$unsortedArray = [5, 2, 8, 1, 9, 4, 7, 3, 6];
$sortedArray = quickSort($unsortedArray);

echo "Unsorted Array: " . implode(", ", $unsortedArray) . "\n";
echo "Sorted Array: " . implode(", ", $sortedArray) . "\n";

This works perfectly, and also doesn't suffer from the issue that ChatGPT introduced. There are even some small optimisations done to speed things up, like storing the count of the array in a variable so that we don't have to do that more than once.

Gemini also gave me a very detailed breakdown of what the code was doing. It also explains that the pivot "Uses the first element as the pivot for simplicity. While other pivot selection strategies (like random pivot) can improve average performance, this is a clear and common implementation.", which is a good observation about pivot selection strategies.
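
As a quick illustration of that point, swapping Gemini's code over to a random pivot is only a couple of lines of change. This is my own tweak rather than anything the model produced; it would replace the $pivot = $arr[0]; line.

// Swap a randomly chosen element into position 0 and use that as the
// pivot. A random pivot avoids quicksort's worst-case O(n^2) behaviour
// on input that is already sorted.
$randomIndex = random_int(0, $length - 1);
[$arr[0], $arr[$randomIndex]] = [$arr[$randomIndex], $arr[0]];
$pivot = $arr[0];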

GitHub CoPilot

Link: https://github.com/features/copilot

GitHub CoPilot was recently made available for free (for a limited number of responses) and has a number of models available to select from. As CoPilot is a dedicated code completion tool I wanted to include it in this analysis. The tool can be used via the GitHub website, but it can also be embedded into a number of IDEs for use within projects.

I had a go with the GPT-4o model from Azure to see how it would fare with the question. Here is the result of the quicksort question.

function quicksort(array $array): array {
    // Base case: arrays with 0 or 1 element are already sorted
    if (count($array) < 2) {
        return $array;
    }
    
    // Select the pivot element
    $pivot = $array[0];
    
    // Partition the array into three parts
    $less = []; // Elements less than the pivot
    $greater = []; // Elements greater than the pivot
    
    for ($i = 1; $i < count($array); $i++) {
        if ($array[$i] <= $pivot) {
            $less[] = $array[$i];
        } else {
            $greater[] = $array[$i];
        }
    }
    
    // Recursively sort the partitions and combine them with the pivot
    return array_merge(quicksort($less), [$pivot], quicksort($greater));
}

// Example usage:
$array = [3, 6, 8, 10, 1, 2, 1];
$sortedArray = quicksort($array);

echo "Sorted array: " . implode(", ", $sortedArray);

This implementation is almost exactly the same as the one produced by Gemini, which makes me wonder if they are derived from the same source. The example numbers at the end are the same as those in the ChatGPT output.

Codeium

Link: https://codeium.com/live/general

Codeium can be used on the site, but can also be integrated into more than 40 IDEs, which is a greater number of integrations than CoPilot. It's not clear how they created their model or if it is a wrapper around another AI system.

Here is the result of the quicksort question.

function quicksort($array) {
    if (count($array) < 2) {
        return $array;
    }
    
    $pivot = $array[0];
    $left = $right = [];
    
    for ($i = 1; $i < count($array); $i++) {
        if ($array[$i] < $pivot) {
            $left[] = $array[$i];
        } else {
            $right[] = $array[$i];
        }
    }
    
    return array_merge(quicksort($left), [$pivot], quicksort($right));
}

// Usage example
$array = [3, 6, 8, 10, 1, 2, 1];
$sortedArray = quicksort($array);
print_r($sortedArray);

This is almost exactly the same code that CoPilot produced, even down to the numbers in the example list at the end.

JetBrains

Link: https://www.jetbrains.com/ai/

I use the JetBrains IDE PHPStorm, and have done for a few years now. It's a pretty solid system with lots of features, and last year JetBrains introduced their own AI that integrates with their suite of IDEs. I have been using the JetBrains AI for a few weeks and have really not been impressed by the results. It tends to get the context of what I'm doing wrong and will add in code that misses the point.

On one occasion I was writing some code to extract information from a Drupal database using the Drupal entity system; I was trying to load items in the theme layer to display information connected to a page. Out of interest, I gave the AI a go at the code I was writing, and it added code that would loop through all of the entities I was looking at and update their fields to static values. This would effectively have destroyed the site I was working on.

So what about the quicksort algorithm? The JetBrains AI has a prompt you can use to ask specific questions, so I was able to produce this.

function quicksort(array $array): array
{
    if (count($array) < 2) {
        return $array; // Base case: arrays with 0 or 1 element are already sorted
    }

    $pivot = $array[0]; // Choose the pivot (first element of the array in this case)
    $less = [];         // Elements less than the pivot
    $greater = [];      // Elements greater than the pivot

    // Partition the array
    for ($i = 1, $len = count($array); $i < $len; $i++) {
        if ($array[$i] <= $pivot) {
            $less[] = $array[$i];
        } else {
            $greater[] = $array[$i];
        }
    }

    // Recursively sort the subarrays and combine
    return array_merge(quicksort($less), [$pivot], quicksort($greater));
}

This works fine, although it is pretty much the same as the Gemini, CoPilot, and Codeium results. Even the wording of the comments is similar.

Analysis Of Services

What is strange is that the output from all of these bots is a combination of outputs from different sources. I did some searching with the generated output to see if I could tell where it originally came from. The quicksort algorithms produced by Gemini, CoPilot, Codeium, and JetBrains look like they are based on an 11 year old reddit post in /r/dailyprogrammer. The quicksort generated by ChatGPT appears to be derived from an example in this post.

Oddly, the number sequence "[3, 6, 8, 10, 1, 2, 1]" is used on a number of different sites that talk about sorting algorithms, including the quicksort algorithm. These large language models appear to have combined their output from several different sources.

Whilst these examples are impressive, they are essentially copied and reworded from existing examples found on the internet, some of which have flaws that are reproduced without understanding.

To me, this just seems like copy and paste with extra steps.

Pros Of Using AI For Programming

Here are some positive aspects of using AI when working as a programmer.

  • It can be interacted with using normal text.
  • It can generate explanations for (simple) code, showing what code is doing line by line.
  • Generates quick solutions to simple or mundane problems.

I honestly couldn't think of any other advantage of using AI for programming. So, let's look at the cons.

Cons Of Using AI For Programming

Here are some disadvantages of using AI when working as a programmer.

  • AIs don't really understand the problems you ask them about. They only appear to understand what the problem is, producing language that makes it look like they have understood everything.

    A good example of this is any task involving mathematics. LLMs don't understand maths and often stumble quite hard when asked to perform calculations. The reason for this is that LLMs only reproduce what they see in their data, which means that if lots of people write "1+1=3" then this will be accepted as a "fact" by the LLM and reproduced.

    Some AI models work around this problem, but only by adding rules that the AI should follow when faced with certain questions.

    It's quite a telling sign that content is AI generated when the statistics it includes don't hold up; simple things like percentages that don't add up to 100%. This problem is exacerbated with complex mathematics and algebra, where the numbers involved in the problem are removed in favour of variable assignments.

  • A lot of the data used by ChatGPT and others is derived from open source projects and online resources. OpenAI have never stated explicitly where their training data comes from, but even my example above shows that these models are just pulling from the web.

    The issue here is that the vast majority of this code is either highly contextual or not great quality. This isn't a dig at open source at all; it's just a fact about source code that is freely available. For every finely crafted open source project that has great code quality (with coding standards and tests) there are dozens of WordPress plugins that look like they have been deliberately obfuscated.

    This causes the AI engines to reproduce this trash code because it's "statistically significant". They look at the terrible code quality of those WordPress plugins and conclude that this must be the way to write code, because most of the code they have seen is like that.

    There are settings, like "temperature", that change how the next token is selected and so vary the output (there's a short sketch of the idea after this list), but that certainly won't resolve the issue, as you will just end up mixing up code examples and producing utter nonsense.

  • AI "hallucinations" are very dangerous when generating code as it can lead to developers wasting time attempting to fit AI generated code into systems. AI will often tell developers to create files that do not have any impact on the system at all.

    I've had all sorts of nonsense returned to me from various different AIs. Everything from files that don't exist to a syntax that just mushes together PHP and YAML in a single file. There was clearly no understanding of the structure of the file or how it would actually work.

    One notable example of this from my own experience was when I asked ChatGPT to write a plugin for Drupal SearchAPI. It suggested I create a *.plugin.yml file, which doesn't exist as a thing. I can imagine a developer who isn't familiar with that system adding the file, wondering why their changes aren't being picked up, and then spending hours trying to debug it.

  • The answers AIs produce are just wrong enough that you wouldn't know they were wrong unless you already knew the subject.

    This is more subtle, and more connected to writing than programming, but if you ask an AI to produce some documentation on a subject then you may need to be an expert on that subject already to spot the mistakes.

    As an example, I asked ChatGPT to create an article about the logging service in Drupal to see what it produced. Though the grammar and structure of the article were good, when I looked at the details presented I realised it was nonsense. A beginner to the system would read it and accept it as the truth, but the document would actually do more harm than good.

  • AIs don't really understand system architectures. They only "know" about smaller aspects of programming, so getting an AI to generate anything more than a few lines of code will produce unpredictable results.

    The more complex a system is, the more that AIs will struggle to produce code for it. I'm sure you can easily get an AI to spit out code that creates a CRUD system in a number of different languages as there are a million examples of this all over the internet. But what about integrating with a specialist API? Or creating a system that has multiple object patterns working together? I've not been able to get any AI to produce a coherent answer for those tasks.

  • Relying on AI generated code means that you either need to rewrite it later or you will be implementing code into your application that you don't understand well enough to maintain. Doing this long term has a detrimental effect on both your skills and the code base.

    Adding code that you are responsible for, but don't understand, to a project is a sure fire recipe for disaster.

  • Studies have shown that there is a 30% increase in code churn when using AI to generate code. The term "code churn" relates to code that is committed to a system and then changed after the fact, and the study showed that teams using AI to generate code had to change it 30% more often than teams who didn't. That study specifically looked at CoPilot, but the problem is universal: if you accept AI generated code into your application without fully understanding it and testing it first then you will introduce bugs.
  • AI chat bots have been caught leaking sensitive information. The interactions you have with a chat bot are used to train the AI, which means that if you enter any code into these AI systems there is a high chance that it will form part of an answer for another user.

    It's easy to say that you should never input any sensitive information or intellectual property into chat bots. The issue is, however, that with IDE integration it becomes easy for the AI to read your source code. You MUST be absolutely sure that your proprietary code isn't being sent upstream or used to train the AI.

  • Finally, but perhaps most critically, there's also the environmental impact of using AI. Training and using AI is eating resources at planet-consuming rates. It is estimated that every response ChatGPT produces costs a litre of water. I can't get behind a technology that will destroy the world in an effort to produce correct looking text.
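
Since I mentioned the "temperature" setting above, here is a small sketch of what it actually does. This is my own illustration of the general sampling technique, not any particular vendor's API.

// Reshape a set of next-token probabilities using a temperature value,
// then draw a token at random from the reshaped distribution. A low
// temperature makes the most likely token dominate; a high temperature
// flattens the distribution so unlikely tokens appear more often.
function sampleToken(array $probabilities, float $temperature): string
{
    $weights = [];
    foreach ($probabilities as $token => $p) {
        $weights[$token] = $p ** (1 / $temperature);
    }

    // Draw a token at random, weighted by the reshaped values.
    $draw = mt_rand() / mt_getrandmax() * array_sum($weights);
    foreach ($weights as $token => $weight) {
        $draw -= $weight;
        if ($draw <= 0) {
            return $token;
        }
    }
    return array_key_last($weights);
}

// At a temperature of 0.1 this almost always returns "cat"; at 2.0 the
// other tokens appear far more often.
echo sampleToken(['cat' => 0.7, 'dog' => 0.2, 'emu' => 0.1], 0.7) . "\n";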

There are more issues here, but I wanted to focus on just programming in this article.

Will I Use AI?

Frankly, no.

I'm happy to use some minimal autocomplete features, but using AI to write code for me seems like a recipe for disaster. As a professional coder I need to fully understand all of the code I write, including the context of the system that I'm writing that code in.

By allowing AI to write code for me I open myself up to bugs that would be quite difficult to spot otherwise. I'm also hesitant to use AI as I know I will get complacent and end up relying on it, which would ultimately affect my learning process.

I feel especially sorry for junior developers who have come to rely on AI, as it means they aren't growing as coders as much as they could. Without writing code, making mistakes, seeing errors, and learning from those mistakes, they are destined to stay at mediocre levels and never progress.

When stuck on problems I've asked different AI agents for help, and the answers they produce have always been either complete gibberish, unreadable code, or just plain wrong. Even with simple tasks like reformatting files I end up with nonsense that just wastes my time.

That's the crux of the matter really. Time. The use of AI agents keeps being advertised as "saving time", but I see absolutely no evidence of that at all.

I also tend to work on a lot of proprietary code, and so the prospect of leaking that code accidentally due to an overzealous AI agent fills me with dread. This would be in violation of a number of NDAs that I've signed, and my career would essentially be over.

As for writing articles for me, that's also a solid no.

For #! code I made the choice early on that I would never use automatic writing or AI agents to write articles. I really wanted to ensure that the quality of the articles I create is of a high standard. As a result, I haven't looked at content creation or even auto summarisation of content for the site.

I will occasionally come across articles on other websites (mostly Medium for some reason) where the post seems fine at face value. Upon closer inspection it is clear that the entire thing makes no sense at all and I question whether the article was just auto-generated and posted without thought. I do not want to be associated with a fraudulent site that generates AI copy to con readers.

I want to be very clear on that point. I will never use AI to generate content for this site, nor will I accept AI generated content from other authors. This includes images and other artwork: if I do need an image created I will either create it myself or commission it to be created for me.

This doesn't mean that I won't be using AI for anything. I have had some success with using AI for pattern matching where simple string matching wasn't working. I've also gotten a demo of a retrieval augmented generation (RAG) search system working, and that worked very well. I appreciate that AI has a place within systems for certain tasks, but for the task of programming I don't think we are there yet.
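
For the curious, the general shape of a RAG search is simple to sketch out: find the documents most relevant to a question and hand them to the model as context, so the answer is grounded in your own content rather than whatever is in the training data. The callLlm() function below is a placeholder for whatever model API you happen to be using, and the word-overlap scoring is a crude stand-in for a proper search index or vector embeddings.

// Score documents by how many words they share with the query and
// return the identifiers of the best matches.
function retrieveDocuments(string $query, array $documents, int $limit = 3): array
{
    $queryWords = array_unique(str_word_count(strtolower($query), 1));
    $scores = [];
    foreach ($documents as $id => $text) {
        $docWords = str_word_count(strtolower($text), 1);
        $scores[$id] = count(array_intersect($queryWords, $docWords));
    }
    arsort($scores);
    return array_slice(array_keys($scores), 0, $limit);
}

// Build a prompt from the retrieved documents and ask the model to
// answer using only that context.
function answerQuestion(string $question, array $documents): string
{
    $context = '';
    foreach (retrieveDocuments($question, $documents) as $id) {
        $context .= $documents[$id] . "\n";
    }

    $prompt = "Answer using only this context:\n{$context}\nQuestion: {$question}";

    // callLlm() is a placeholder for a call to whatever LLM you use.
    return callLlm($prompt);
}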

Conclusion

I will agree that what I have written here are largely opinions, although I have tried to back up some of my points with facts and links to sources.

Ultimately, if you use AI for anything more than simple code completion then you are asking for problems. This is especially the case if you are learning how to code, or how to use a system, and are using AI agents to do it. If you accept their output without questioning it then you'll end up wasting time.

Indeed, The Register recently reported on developers' first-hand experience of using a system called Devin. The Devin AI bot is meant to be an end to end development AI that integrates with email, Slack, and deployment processes to assist in building and launching applications.

Unfortunately, the Devin bot "rarely worked" and would often waste entire days failing to understand what wasn't supported, hallucinating features that didn't exist. The developers in charge of the project had to keep a very careful eye on what it did, as tasks it had successfully performed before would fail in "complex, time-consuming ways".

I have seen first hand developers generate code using ChatGPT and commit this to a codebase without really understanding what it is doing. And that really concerns me.

One example was a junior developer who wanted to create some code that parsed a CSV of URLs and used that information for tests. The code was pretty impressive for a junior, but it had some subtle bugs that had been missed in the code review. When I asked him to fix them he admitted he didn't know what the code did and had problems changing it. I then realised why the code had been produced so quickly: it had all just been spat out by AI with little change, and he had committed it without attempting to understand what it was doing.

I've also seen developers use ChatGPT as a kind of help desk support, with answers that were utter nonsense. A friend (who I have a lot of respect for) had problems starting his computer due to a filesystem issue, so he just asked ChatGPT what to do. The response included a command to "mount the chat-gpt filesystem", which doesn't exist, but he was blindly trying to mount it anyway. We got the computer booting again, but only once we stopped using AI to troubleshoot.

I find these examples really indicative of the use of AI in the software industry: developers relying on information produced by AI without understanding or questioning it; managers touting the virtues of using AI without really understanding what it is doing; products forcing AI into everything they can just because it seems magical, only to realise it is telling people to eat glue or selling brand new cars for $1.

I think we need to take a step back and really think about what AI is doing in a given context before just letting it loose.

On the other hand, maybe the use of AI agents to generate source code is a brilliant idea and should be encouraged.

Why? Because it will create lots of applications with poorly implemented, insecure, slow, and outright broken code that the original developer doesn't understand, and which will need highly skilled developers to unpick and put right.

The future is looking good! :)
