How long do you think until AI writes and debugs code better than the average programmer?

PumpkinDrama@reddthat.com · 10 months ago

stinerman [Ohio]@midwest.social · 10 months ago

The code has always been the easy part. The hard part is getting requirements that make sense. It will continue to be the case with bots doing the coding.

2xsaiko@discuss.tchncs.de · 10 months ago

Yes. This is the new “visual programming will let executives write programs themselves”, except this time the technology (assuming OP means LLMs, since both of the ones in his image post seem to be LLMs) is completely unsuitable from the start.

ezchili@iusearchlinux.fyi · 10 months ago

I’ve had a 100% failure rate on simple requirements that need a small twist on well-known solutions:

“Make a pathfinding function for a 2d grid” - fine

“Make a pathfinding function for a 2d grid, but we can only move 15 cells at a time” - fails on lesser models, which keep handing you the same A* as in the first one

“Make a pathfinding function for a 2d grid, but we can only move 15 cells at a time, also, some cells are on fire and must be avoided if possible, but if there is no other path possible then you’re allowed to use fire cells as fallback” - Never works

For that last one, none of the models give a solution that fits this very simple requirement. They either always avoid fire or give fire a higher cost, which doesn’t fit the requirement at all.

A higher cost means that if you’ve got a path that’s 15 tiles long without fire but way shorter with fire, then sure, some fire is fine! And if you could reach your destination in 15 tiles but need to step on 1 fire tile, it will count that as 15-something, and that’s too long.

Except no, that’s not what you asked.

If you try to tell it that, GPT-4 flip-flops between avoiding fire and tweaking the cost of tiles.

It fails because all the pathfinding literature talks about is the default approach and cost/heuristic functions. That won’t cut it here: you have to touch the meat of the algorithm, and no one ever covers that (because that’s just programming; it depends on what you need to do, and there are infinite ways you could do it to fit infinite business requirements).
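For illustration, here is a minimal Python sketch of the “avoid fire unless there is no other way” requirement. It is one possible reading, not the only one, and it assumes a grid of strings where `#` is a wall and `F` is fire, 4-directional movement, and that “15 cells at a time” means a budget of at most 15 moves. The function names are hypothetical. Instead of giving fire tiles a higher cost, it runs a step-limited BFS twice: first with fire treated as impassable, and only if that finds nothing within the budget, again with fire allowed.

```python
from collections import deque
from typing import Optional

Cell = tuple[int, int]


def bfs_within_budget(grid: list[str], start: Cell, goal: Cell,
                      max_steps: int, allow_fire: bool) -> Optional[list[Cell]]:
    """Shortest path (by move count) from start to goal using at most max_steps moves.

    Assumed grid encoding: '#' = wall, 'F' = fire, anything else is walkable.
    """
    rows, cols = len(grid), len(grid[0])
    prev: dict[Cell, Optional[Cell]] = {start: None}
    dist: dict[Cell, int] = {start: 0}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk the predecessor links back to the start.
            path = []
            node: Optional[Cell] = cell
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        if dist[cell] >= max_steps:
            continue  # move budget used up; don't expand this cell
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            tile = grid[nr][nc]
            if tile == '#' or (tile == 'F' and not allow_fire):
                continue
            nxt = (nr, nc)
            if nxt not in dist:
                dist[nxt] = dist[cell] + 1
                prev[nxt] = cell
                queue.append(nxt)
    return None


def find_path(grid: list[str], start: Cell, goal: Cell,
              max_steps: int = 15) -> Optional[list[Cell]]:
    # Pass 1: fire is off limits entirely.
    path = bfs_within_budget(grid, start, goal, max_steps, allow_fire=False)
    if path is not None:
        return path
    # Pass 2: only when no fire-free path fits the budget, fall back to fire cells.
    return bfs_within_budget(grid, start, goal, max_steps, allow_fire=True)
```

Because the budget is checked against the move count rather than a weighted cost, a route that needs one fire tile but fits in 15 moves is never rejected as “15-something”. In the fallback pass this returns a shortest path, not necessarily the one with the fewest fire tiles; minimizing fire there would need a search over (cell, moves used) states.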

dQw4w9WgXcQ@lemm.ee · 10 months ago

I’ve had a lot more success with debugging than with writing code. I had a problem adjusting the sample rate of a certain metrics framework in a Java application, and Stack Overflow failed me, both when searching for an answer and when asking the question. However, when I asked GPT-3.5 in some desperation, I received a great answer that pinpointed the necessary adjustment.

However, asking it to write simple code snippets, e.g. for migrating to a different Elasticsearch client framework, has not gone great. I’m often met with confidently wrong answers.

JohnDClay@sh.itjust.works · 10 months ago

The average person who has programmed something, or the average professional career programmer? I doubt AI will be great at dealing with the really weird bugs and conflicts any time soon.

👍Maximum Derek👍@discuss.tchncs.de · 10 months ago

Individual scripts/modules and even simple microservices: not long… provided the AI isn’t actively poisoning itself right now.

Writing, securing, and maintaining complex applications: We’d need another breakthrough.

Since my role is often solutions architecture, I’ve been worried that cloud systems engineering is something that’s immediately vulnerable. But after working with AWS’s Q for a couple of hours, I am less worried. That said, if someone made an AI to create a cloud provider that is well (and accurately) documented, consistent in functionality and UX, and actually has all the features announced in its own blog posts, then AI might be able to run it.

PumpkinDrama@reddthat.com · 10 months ago

Google Gemini Powered AlphaCode 2 Technical Report

Gemini Ultra scored 74.4% on HumanEval, surpassing GPT-4’s 67%. AlphaCode 2 solves 43% of problems in recent Codeforces rounds within 10 attempts, and even with the time penalty taken into account, it ranks at around the 85th percentile or higher. So AlphaCode 2 already beats 85% of people in top programming competitions (whose participants are already better than 99% of engineers out there). I believe AI already writes better short code than the average programmer, but I don’t think it can debug code yet. I’d say it will need a platform to test and iteratively rewrite its code, and I don’t see that happening in less than 3 years.