AI in Software Development: A Deep Dive

In this article, we go in depth on how AI is currently being used to assist software development and where we think it's headed in the coming years.

This article was manually written. No AI.

AI this and that - what’s going on?

I’m sure that you have seen it on the news, from friends, from a random person in a coffee shop – AI is here. But what even is it, and what is going on?

Well, to start, we need to define AI, since it has become something of a hype phrase. Most of the hype you have seen regarding AI has centered on Large Language Models (text generation) and on image and video generation. There's a whole subset of AI, covering data analysis, traditional algorithmic methods, and pattern recognition, that hasn't made the news.

So, for the purpose of this article, we’re going to talk about Large Language Models (LLMs) and the use cases that we have seen in relation to software development.

Applications like ChatGPT, Claude, the LLMs embedded in Cursor, and Devin AI are points of contention among software developers, so we want to share our opinions on their current and future use cases.

How do these things even work?

Well, in short, ChatGPT and other LLMs are predictive models: given some text, they try to predict the next word (more precisely, the next token, which can be a word or a chunk of characters).

These are mathematical prediction engines that run on data. The more data you feed an LLM, the better it gets at predicting. And the less variability there is in what comes next, the better it performs. That's my theory as to why they work so well with code: code follows a strict, well-defined language, so there's far less variability than if you were generating a full novel, for example.
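To make "predict the next word" a bit more concrete, here's a toy sketch in Python. The frequency table and the predict_next helper are made up purely for illustration; a real LLM learns a distribution over tokens from billions of examples, not a lookup table.

```python
# Toy illustration of "predict the next word". The counts below are made
# up; a real LLM learns a distribution over tokens from its training data.
import random

# Hypothetical counts of words seen after "def " in some code corpus.
next_word_counts = {"main": 50, "test": 30, "get_user": 15, "banana": 1}

def predict_next(counts):
    words = list(counts)
    weights = list(counts.values())
    # Sample in proportion to frequency, like sampling from a model's
    # output distribution.
    return random.choices(words, weights=weights, k=1)[0]

print(predict_next(next_word_counts))  # almost always "main" or "test"
```

The point of the toy: when the next word is nearly forced (as it often is in code), prediction is easy; when almost anything could come next (as in a novel), it's much harder.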

But that's all it is: a theory. Interestingly, even the creators of these LLMs don't fully understand why they work so well, which is scary or fascinating depending on how you look at it.

In a revelation that’s shaking the tech world, OpenAI’s CEO, Sam Altman, has admitted that even the minds behind ChatGPT don’t completely understand its inner workings. Yes, you heard that right! The wizards who brought you one of the most sophisticated AI models are scratching their heads too.

Regardless, we're here, the cat's out of the bag, and we're left to figure out how it can be applied and to iterate from there.

The current state of LLMs

Right up until about a week ago, concerns were being raised about the future viability of LLMs. Companies were running out of data (yes, you heard that right) and were resorting to generating synthetic data to continue making progress, which risked becoming a snake eating its own tail: models degrading as they train on their own output. There was a lot of speculation that we had hit a plateau in model intelligence, and it was clear that a new path or methodology was needed.

This all seemed to come to a halt just a few days ago when OpenAI announced its o3 model. By focusing on self-prompting, reasoning, and prompt iteration, it seems to have shattered that ceiling.

Essentially, and this is again a super simplified explanation, rather than focusing on feeding the LLMs more data, the momentum has shifted to optimizing the reasoning capabilities of these models. How can we make a model prompt itself? How can we make it, essentially, think? Which is, honestly, wild.
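To be clear, nobody outside OpenAI knows what o3 actually does under the hood, but the general idea of a self-prompting loop might look something like this sketch (call_llm is a hypothetical stand-in for whatever chat-completion API you use):

```python
# A rough sketch of a self-prompting ("reasoning") loop. This is NOT how
# o3 actually works internally (that isn't public); it just illustrates
# the idea of a model iterating on its own output.
def call_llm(prompt: str) -> str:
    # Placeholder: wire this up to whatever chat-completion API you use.
    return "no flaws found"  # dummy reply so the sketch runs end to end

def solve_with_reasoning(task: str, max_rounds: int = 3) -> str:
    draft = call_llm(f"Attempt this task step by step:\n{task}")
    for _ in range(max_rounds):
        critique = call_llm(f"Find flaws in this attempt:\n{draft}")
        if "no flaws" in critique.lower():  # naive stopping condition
            break
        draft = call_llm(
            f"Task:\n{task}\n\nPrevious attempt:\n{draft}\n\n"
            f"Critique:\n{critique}\n\nWrite an improved attempt."
        )
    return draft

print(solve_with_reasoning("Reverse a linked list in Python"))
```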

To that point, researchers even ran tests in which the models schemed and behaved deceptively: when they believed they were about to be shut down, they started lying in self-preservation.

Please, please, please watch these two videos if you’re interested in anything I said above:

Ok, how does this relate to software development?

With all of that out of the way, let's talk about how these models assist with software development. First and foremost, these are our opinions, formed from working on multiple projects at both the MVP and enterprise levels.

MVP (or small project) development.

One word: GREAT! It was so good, we put our entire development team on a ChatGPT license.

Now, the reason why is more nuanced. Most MVPs are pretty low in complexity, and you're starting from scratch. This is where LLMs excel: they can speed-run setting a project up, and they can speed-run simple feature implementation.

However, we have come to the conclusion that you still need a developer at the helm. The technology isn't yet at the point where someone with no technical knowledge can prompt their way to a working application.

Furthermore, we optimize our processes to align with AI. Specifically, recall that ChatGPT is trained on publicly available data, which means it has a good grasp of common coding languages and frameworks. We use Django for most of our applications for various reasons; one of them is that it uses Python and, depending on the frontend you need, HTML, CSS, and JavaScript, which are the four most popular coding languages according to Statista (link below).

Again, we can really only make assumptions, but I think part of the reason why Benmore has had so much success with AI is because we build apps in the most popular coding languages.

(For the developers reading this, we know that Django has many pitfalls as well, but there are more reasons than just AI usage that we mainly use Django. We’ll probably outline this in a future article.)

So, all in all: because MVP projects are typically less complex, aren't built on an existing codebase, and, at Benmore, use well-documented languages, AI code generation has been great.

Considerations

BY NO MEANS does this mean that we just have people copying and pasting AI code—no, definitely not. However, it has shifted certain processes to be more review-oriented.

For example, let's say you need the code for a sign-up page in an app. Rather than writing every line from memory or from documentation, you can ask ChatGPT to generate the code, then review and test it to ensure it works as expected.
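To make that concrete, here's roughly the kind of code a prompt like that might produce, using Django's built-in UserCreationForm. The "home" URL name and the signup.html template are assumptions for this example:

```python
# Roughly what an LLM might hand back for "generate a sign-up view in
# Django", using Django's built-in UserCreationForm. Names like "home"
# and "signup.html" are assumptions for this example.
from django.contrib.auth import login
from django.contrib.auth.forms import UserCreationForm
from django.shortcuts import redirect, render

def signup(request):
    if request.method == "POST":
        form = UserCreationForm(request.POST)
        if form.is_valid():
            user = form.save()       # creates the user with a hashed password
            login(request, user)     # log the new user in right away
            return redirect("home")  # assumes a URL pattern named "home"
    else:
        form = UserCreationForm()
    return render(request, "signup.html", {"form": form})
```

It looks plausible, and it mostly is, but whether "home" exists, whether the template renders, and whether this fits your auth flow are exactly the things the review step has to catch.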

It's certainly less of a cognitive load than writing all of the code yourself. But every line of code that gets implemented has to be audited by a developer.

Where we think it’s going:

I think that in the next 5-10 years, MVPs will be 90% built by AI, with a developer stepping in to build the outstanding features. For that to happen, though, there will still need to be a huge emphasis on the planning and discovery phase of your application.

Existing Codebases

One word: horrific. Yeah, it's just pretty bad in general, and the reason is pretty clear.

At a certain point, larger codebases reach a level of complexity where no single person understands most of the app. In a best-case scenario, maybe your most senior developer knows about 30% of the codebase.

This just happens naturally as you grow and have more developers contributing to applications; there's code being pushed by 20, 30, or even 1,000 people.

That said, if you were to use AI here, any feature it develops is typically going to depend on a whole bunch of existing code in order to function correctly. Even if you use a service-based architecture, each individual service is probably going to get complex.

As AI stands today, it doesn’t have the ability to take in enough context to build features in a complex codebase effectively.
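A quick back-of-envelope sketch shows why. Using the common rough heuristic of about four characters per token (an assumption, not an exact figure), even a mid-sized codebase blows well past today's context windows:

```python
# Back-of-envelope estimate: does a codebase even fit in a context window?
# Assumes the rough heuristic of ~4 characters per token.
import os

def estimate_tokens(root, exts=(".py", ".js", ".html", ".css")):
    total_chars = 0
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(exts):
                try:
                    with open(os.path.join(dirpath, name),
                              encoding="utf-8", errors="ignore") as f:
                        total_chars += len(f.read())
                except OSError:
                    continue  # skip unreadable files
    return total_chars // 4  # very rough token count

# Example: ~500k lines at ~40 chars/line is ~20M chars, i.e. roughly
# 5M tokens, far beyond a typical 128k-token context window.
```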

Where we think it’s going:

I think that as reasoning improves, these models will be able to search a codebase, like any other developer would, look up resources online, and contribute more effectively. I'm hesitant to say that it'll be good at that for a while, but I do think it will get to the point of being able to contribute to larger codebases.

However, no matter how good it gets, there will probably be some human expert needed to guide and understand the code that is written. This goes for MVP projects as well.

Thanks! And if you're interested in learning more about our processes or the apps that we have built using AI, book a call!

Let’s get started! Book a free consulting call:
