Probably Dance

I can program and like games


How LLMs Keep on Getting Better

If you look at the source code of a modern open-source LLM, it looks very similar to the transformer described in the “Attention Is All You Need” paper from 2017. It’s just a stack of exactly three components: attention blocks, matmuls, and norm layers. The big algorithmic changes, like Mamba 2 or linear attention variants, aren’t really used yet. But look closer and almost everything has changed in the details.
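To make "a stack of exactly three components" concrete, here is a minimal sketch in plain Python. It is not taken from any real model's source: the single attention head, the ReLU MLP, the gain-free RMS norm, the pre-norm residual layout, and all function and parameter names (`block`, `run_stack`, `w_up`, and so on) are assumptions chosen to keep the example short.

```python
import math
import random

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def rms_norm(x, eps=1e-6):
    """Norm layer: rescale every row to roughly unit root-mean-square (no learned gain)."""
    out = []
    for row in x:
        rms = math.sqrt(sum(v * v for v in row) / len(row) + eps)
        out.append([v / rms for v in row])
    return out

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention(x, wq, wk, wv, wo):
    """Single-head causal self-attention: token i only looks at tokens 0..i."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = 1.0 / math.sqrt(len(q[0]))
    mixed = []
    for i, qi in enumerate(q):
        scores = [scale * sum(a * b for a, b in zip(qi, k[j])) for j in range(i + 1)]
        weights = softmax(scores)
        mixed.append([sum(w * v[j][d] for j, w in enumerate(weights))
                      for d in range(len(v[0]))])
    return matmul(mixed, wo)

def mlp(x, w_up, w_down):
    """The 'matmuls': project up, apply ReLU (an assumption), project back down."""
    hidden = [[max(0.0, h) for h in row] for row in matmul(x, w_up)]
    return matmul(hidden, w_down)

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def block(x, p):
    """One layer: norm -> attention -> residual add, then norm -> MLP -> residual add."""
    x = add(x, causal_attention(rms_norm(x), p["wq"], p["wk"], p["wv"], p["wo"]))
    x = add(x, mlp(rms_norm(x), p["w_up"], p["w_down"]))
    return x

def random_matrix(rng, rows, cols):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def make_block(rng, dim):
    """Random, untrained weights for one layer (hypothetical helper)."""
    return {
        "wq": random_matrix(rng, dim, dim),
        "wk": random_matrix(rng, dim, dim),
        "wv": random_matrix(rng, dim, dim),
        "wo": random_matrix(rng, dim, dim),
        "w_up": random_matrix(rng, dim, 4 * dim),
        "w_down": random_matrix(rng, 4 * dim, dim),
    }

def run_stack(x, blocks):
    """The whole 'model': the same kind of block repeated, then a final norm."""
    for p in blocks:
        x = block(x, p)
    return rms_norm(x)
```

Calling `run_stack(x, blocks)` on a 5 x 8 list of token vectors with two blocks from `make_block(random.Random(0), 8)` returns another 5 x 8 list. Embeddings, the output projection, multiple heads, and position information are all left out. Those omitted and simplified parts are the kind of details where most of the changes happen.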

The story of how LLMs keep on getting better is one of pushing for big and little improvements in a hundred different directions. Turns out hill climbing can get you to a really good place if you just climb along enough dimensions. This makes it hard to notice changes as they’re happening because they’re so small, so let’s look at the last two years and see how many small changes added up to the big improvements we saw.


How I use LLMs to program

Studies have shown that LLMs help novice programmers more than experienced programmers. This matches my experience. At work I see that interns or new hires have some LLM window open almost all the time. I use them maybe once a week. But you could say the same thing about Stack Overflow. I used it all the time when I started programming. Now I use it occasionally. While it’s easy to point at the obvious issues with LLMs, I think they are also clearly a net positive on average. So how do LLMs help me?

Big plus: Languages that I don’t use as often

I don’t often write SQL statements. I can obviously write the simple ones, but SQL is a language that has all the features you could ever possibly want, and I don’t know how to use them and don’t know how to google for them. So I ask an LLM. Similarly for JavaScript/CSS/HTML programming. I used to hate doing web frontend work, now it’s not so bad because LLMs can help me get out of the tricky edge cases.

I have also used LLMs to translate functionality from one language to another. E.g. if I know what a function is called in C++ but I can’t find an equivalent one in the standard library of another language, an LLM will often do a decent first pass of rewriting the C++ function in the other language.
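As an illustration of that kind of translation, take `std::next_permutation` from C++. The choice of function is my own hypothetical example. Python's standard library has `itertools.permutations`, but no in-place "step to the next permutation" call, so a first-pass rewrite might look roughly like this:

```python
def next_permutation(items):
    """Rearrange items in place into the next lexicographic permutation.

    Returns True if there was a next permutation. Otherwise the list is
    reset to ascending order and False is returned, mirroring the
    behavior of std::next_permutation.
    """
    # Find the rightmost position whose element is smaller than its successor.
    i = len(items) - 2
    while i >= 0 and items[i] >= items[i + 1]:
        i -= 1
    if i < 0:
        # The whole list is non-increasing: this was the last permutation.
        items.reverse()
        return False
    # Find the rightmost element that is greater than items[i] and swap them.
    j = len(items) - 1
    while items[j] <= items[i]:
        j -= 1
    items[i], items[j] = items[j], items[i]
    # The suffix is still non-increasing; reverse it to make it the smallest.
    items[i + 1:] = reversed(items[i + 1:])
    return True
```

For example, calling it on `[1, 2, 3]` returns `True` and leaves the list as `[1, 3, 2]`. A rewrite like this is only a starting point, so it is worth checking the edge cases (empty lists, duplicates) against the C++ behavior.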

Small minus: The code is overly generic
