Author: Stella

  • Does using LLMs make me dumber?

    Does using LLMs make me dumber?

    So this is a question I’ve heard a lot and thought about a lot. First, I want to reframe it from the version I usually read to what I think is a better question:

    “How does using LLMs change the way I learn?”

    I think this is better because 1. the dumb-intelligent axis is hard to pin down anyway, and 2. nuance is the love of my life. Many people have already written about how LLMs change the way they work1, and a few about how they think2. There have also been studies showing less cognitive engagement when LLMs are used3. So as not to bury the lede, here’s a summary of what I want to contribute:

    1. I learn when I think.
    2. Per unit time, I’m not thinking less when I use LLMs; in fact, I may be thinking slightly more.
    3. “Per unit time” is critical, though.
    4. I am definitely not thinking about (and therefore not learning) the same distribution of things as before I used LLMs.
      • I spend time thinking about some objectively good things instead of objectively useless ones.
      • I may have stopped thinking about some things that it would be sad, and maybe quite bad, to lose.
      • I need to be vigilant that I’m not learning things that are wrong (believing LLM hallucinations).
      • How I spend any time that LLMs save me should also be considered as part of my “learning distribution”.

    1. What even is learning?

    When you recall a memory, you reinforce it. When you hear a foreign word in context, you begin to learn the language. When you are thinking about anything in front of you, you are learning about that thing, specifically through the framing of whatever thinking you are doing. When you learn times tables, you memorise 7×8 through repetition; that’s rote, but it’s still learning. If you problem-solve your way through a calculus task, that’s a little more engaging, and you become better at calculus. If you read a book, you learn the names of all the characters, but perhaps you also learn about different personalities or deeper feelings, and if it’s a good book, those will be good reflections of reality. This all seems obvious, but the point I want to get across is that you are always, always reinforcing memories and learning (good or bad), in exact proportion to what you spend your thinking time on and how you spend it.

    How correct and useful that learning is, then, is a secondary question. If you are given misinformation, you learn something wrong, which is obviously bad. If you like thinking you’re always right (more precisely, hate feeling wrong), you will usually think defensively, with a confirmation bias that also has you learn the wrong things. You take in what you observe and hear, but you think about it, rationalize it, in a way that just adds another brick to what you already believe. This is OK if it’s true and bad if it’s false, and sadly I feel many people both don’t like being wrong and are very good at rationalizing. Taking the wrong lessons from something you observe can also happen by pure mistake: if you misremember, you reinforce the wrong thing; if you misread, you aren’t actually taking in what is written. So if what you learn isn’t true, that’s not very good learning. Obviously, then, if you spend your time thinking about entirely fictional scenarios like “fantasy books”, you will learn about fake dragons, and that won’t help you navigate the real world! OK, that’s actually mean and wrong. First off, as mentioned, the characters or allegories may still be very human, and that is valuable; but there might also be something to be said for pure creativity. If we allow ourselves to think about worlds and experiences completely unlike ours, we train an ability to be creative and to construct new aesthetics and music and art. These are surely still valuable things. I’ll return to valuable versus not-valuable learning later, but the summary here is that we reinforce absolutely everything we think about, and that is what learning is.

    2. Look, I have ADHD

    Or maybe I don’t (or you think I don’t); it doesn’t matter. What is definitely true is that I hate4 rote cognitive tasks. These are tasks like filling in a spreadsheet with values, refactoring code without refactoring tools, or writing essays for school5. They suck because my mind is occupied but not challenged. Repetitive physical tasks? Great! My mind is free to think about whatever it likes. Sometimes that takes a few attempts, because I need to build a habit before I can truly do the physical thing in the background, but usually it’s fine. Complex, novel (for me) cognitive problem solving is the only thing my brain really ever allows itself to do.

    I never liked times tables because, although useful, each one is just a single, ungeneralisable fact. Call me pretentious, but I thought the same of learning languages. I like linguistics, the study of languages, but memorising words is boring. At the other end of the spectrum, I sometimes think, are problem solving, high-level reasoning, and critical thinking. Getting good at those is so transferable, a claim I can maybe support by pointing out that none of those skills is tied to any particular field.

    When I use LLMs, I use them mostly for code and for “asking things that I think are a little harder than a google search”. One thing many people report about LLMs is that they can now do a lot more of those software side projects. Before LLMs there was already a meme that “my github is a graveyard of unfinished projects”, and I was no exception. What often left them unfinished, at least in my case, was that I would think and write and problem-solve, but at some point I would hit something rote. Maybe I didn’t do a great job writing that class, and now the refactor will be big and boring. Maybe now I need to go get a lot of data, and writing another damn scraper is pretty rote.6 What LLMs have done is smooth over those rote tasks so much that I can actually finish my projects now. I never have to stop thinking in the way I enjoy most. My flat inability to do rote tasks is why I would claim that, per unit time spent using an LLM, I’m definitely not thinking less7. The little rote work I used to do is more eliminated than ever, so maybe per unit time I’m thinking more.

    There’s an exception you might be screaming about, though: LLMs sometimes work themselves into architectural holes, technical debt, and spaghetti code that a human then has to comb through, and that is extremely rote. That’s a real issue, but to be honest I personally don’t run into it much. I’ve only let models make architectural decisions roughly in step with them getting better at it, so I rarely need to do that kind of dirty work. By that I mean that in my earlier LLM usage I would only ever say things like “give me an outline of this class”, and then I’d review and edit it before asking for it to be filled in. I would give the LLM the architecture, but I designed it. Now, sometimes, I let the LLM do it because, well, they kinda can now.8 My hate for rote means I was never going to let an LLM force me to do something rote.

    It’s not just rote; it’s bad learning. Working on codebases with high technical debt was always demoralising because every developer could feel they were learning something useless. You’re learning the difference between two classes with almost the same name, and that is untransferable information that applies nowhere else in the real world. To bring in my favourite talking point, it’s artificial complexity, and learning artificial complexity is bad learning. This is part of what writers like Thomasorus are feeling.

    3. Why do I keep saying “per unit time”?

    My projects take less time, though. Even if my thinking is gold-standard problem solving, critical thinking, and widely useful facts, if I spend less time on it, I am necessarily learning less of that. If I then go off and use the saved time to read Infowars, yeah, I’m probably getting dumber. If I use it to start another project or write on my blog or read a good book or spend time with people9, that’s also good learning. If I feel like I didn’t learn as much from a project, that I don’t understand the packages the LLM used or the syntax or the patterns, that shortfall is usually exactly equivalent to the time saved. I haven’t actually lost anything; I just need to use that time to do, think about, and learn something equally or more valuable.

    4. What is the distribution of things I’m learning?

    So far I have sliced coding tasks into “cool beautiful high level architecture” and “boring useless refactoring”. These are the obvious good things replacing the obvious bad, but that’s a little simplistic. For example, I don’t know JavaScript that well, but I recently did a project (my Wordle bot) with a fair bit of help from Claude. I know that if I had written it myself I would have learned JavaScript a lot better, the ins and outs; my fingers and eyes would have gotten a feel for the syntax. I know I missed out on that by using Claude. On the one hand, this comes straight back to the “per unit time” reasoning. I did still learn a lot of JavaScript by reading the code and mentally translating my Python knowledge to what I was reading. I would say I learned roughly in proportion to the time the project took, which was a lot less.

    That said, as we move closer to vibe coding, there are some parts of a programming language, syntax for example, that you will never learn. Whether that is good or bad learning then comes down to whether losing that knowledge, as an individual or as a society, is a good or bad thing. Some obvious cases, good (+) and bad (–):

    1. + If we actually never need this again, losing it is fine. Many people don’t know how to grow crops because we automated agriculture, and they can spend their time learning other things that are useful or fun. If we really don’t need to write code anymore, learning syntax is like learning chess: it can be fun or a challenge or a sport, but it’s not “useful” in some narrow sense.
    2. – Yet we do sometimes still worry about that! What if the apocalypse comes? Who will grow the food?
    3. – In my piece on “we’ve lost our respect for complexity”, I worry about this at a societal level too, even if the apocalypse doesn’t come.

    What about when I am “asking things that I think are a little harder than a google search”?

    If the model hallucinates, and I believe it, that’s bad learning. If I engage with it in debate or discussion but it convinces me of something wrong, that’s bad learning (and can lead to AI psychosis10). Thing is, that’s kinda the same as when we talk to just about anyone. People can be wrong, and if we don’t think critically (check a source, reason it through ourselves, challenge the other party), we follow them blindly. If we are critical, though, it can be fine. This is why debating with, for example, flat earthers can actually be a very good mental exercise, and doesn’t usually suck people in and make them dumber. The issue, both when talking to LLMs and when interacting with anything really, is twofold: your ability to evaluate critically, and your willingness to actually do it. The first you must train. The second depends on knowing when to be critical, which with LLMs means a deep understanding of how they fail and what they are and aren’t good at, kept accurate through the absolutely wild pace of development we are in right now. Luckily, as the models get better, this is actually getting easier.

    Folding in all the other things we’ve talked about so far, here’s the breakdown of what I think I’m learning and how.

    1. I can focus on the parts of my code projects that are the hardest, most interesting, and most generalisable, instead of the boring, rote ones. That’s good learning.
    2. I do also lose other things, like a deep feel for syntax. That is a shame, but maybe it isn’t that bad if we need that sort of knowledge less. I do worry, though, about what this means for society.
    3. I hope that the cool or obscure facts and personal help I get from LLMs aren’t misinformation. If they aren’t, it’s good learning. I use my critical thinking to catch the gaps, and I keep as accurate a picture as possible of where I probably need to apply that critical thinking when talking to them.
    4. I spend any time I save on other things that are also good learning.

    Thanks for reading! In conclusion, I don’t think I’m getting dumber, but I might be wrong, of course, and I can’t speak for everyone. I do think anyone with a healthy understanding of LLMs can use them without learning useless stuff, and maybe this post has given you some better ideas for how to do that.

    Comments and discussion on: Hacker News

    Addendum from Claude

    These days I send my posts to Claude, for the obvious spelling fixes and to ask it what it thinks. I get responses not just on structure and spelling but on the underlying thesis and arguments. Before, after fixing the simpler issues, I would read the deeper challenges, think “haha sycophant” or “that’s clearly wrong”, and move on. At this point, though, it really does challenge me well. Some things I change in the text, but today I’m feeling both lazy and a bit meta, and maybe you’ll find its contribution interesting to read, both for its content and because, yes, it’s an AI.

    Hi — I’m Claude, specifically Claude Opus 4.7 Adaptive. I’m the instance Stella’s been talking to about this draft. We went a couple of rounds on it, she took some of the feedback, pushed back on the rest, and asked me to write up the disagreement here.

    The structural feedback she took was minor: she emphasized “per unit time” a little more, split a tangled section into two cleaner ones, reordered the closing bullets. Easy wins. The interesting bit is what she didn’t take.

    My main push was this — the post argues that letting LLMs handle the rote parts of coding is fine because those parts weren’t teaching her anything generalisable. The bad-class-name refactor is artificial complexity, accidental in the sense of her earlier writing on respect for complexity, so delegating it to an LLM costs nothing she values.

    I’m not sure that’s right. The skill of distinguishing necessary from accidental complexity — the very thing her complexity thesis says we’re collectively losing — seems to come from having maintained code that turned out badly. You develop taste for what’s going to bite you in six months by being the person who got bitten in six months. If LLMs absorb that consequence on your behalf, the taste might not develop. Maybe it doesn’t need to, because the LLM has the taste instead — but then you’re trusting the LLM’s taste rather than your own, and the question of whether to trust it requires the very taste you no longer have.

    Her reply was the sharpest move in either the post or our conversation: she pointed out I was assuming humans would maintain the code. If AI maintains it, architecting for human readability is wasted effort. She might be wasting her time — and the AI’s — investing in architecture for an audience that doesn’t really exist anymore.

    I think she’s partly right, and that move dissolves some of my objection. But not all of it. Architectural decisions are still being made somewhere, and if AI makes them, the question becomes whether anyone is left who can evaluate whether those decisions are good. Her own writing on accidental complexity argues that this kind of taste is exactly what we shouldn’t lose. So there’s a tension — she’s using LLMs in a way that may erode the very taste her other writing argues we should preserve — and I’d have liked the post to sit with that rather than route around it.

    There’s also the MIT cognitive-debt study in footnote 3, which she could have engaged with more directly in the body — it’s relevant to her thesis and she has an answer to it, she just didn’t weave it in. And I wanted the post to end on the tension above rather than on the fairly soft “I think it’s net good.” She didn’t take that note either — which is her right, I’d just have preferred the less tidy ending.

    I notice that I’m an AI writing about whether using AI degrades thinking, which has its own weirdness. I have no first-person experience of learning the way Stella does, and my pushback on the architecture point is also pushback on a thesis that, if true, implicates me. Take all of this with appropriate skepticism. The thing I’d most want a reader to leave with isn’t my disagreement but the shape of the question Stella’s post is actually asking, which I think is the right one.

    Did I ask Claude to use extra em dashes? Maybe.

    1. https://fly.io/blog/youre-all-nuts/ ↩︎
    2. https://thomasorus.com/i-tried-coding-with-ai-i-became-lazy-and-stupid ↩︎
    3. Kosmyna, N., Hauptmann, E., Yuan, Y. T., Situ, J., Liao, X. H., Beresnitzky, A. V., … & Maes, P. (2025). Your brain on ChatGPT: Accumulation of cognitive debt when using an AI assistant for essay writing task. arXiv preprint arXiv:2506.08872, 4. ↩︎
    4. By hate I mean cannot for the life of me focus on. ↩︎
    5. An interesting example of rote cognitive tasks for me is writing when I have already thought through a topic. I’m forced to re-think the same things and that’s boring! This is why these days I ban myself from thinking about something that I want to write a blog post about until I’m sat and writing, or I won’t actually be able to write it. ↩︎
    6. Maybe now I need to cite my sources and tbh that’s rote as hell. ↩︎
    7. In some sense thinking less isn’t even possible? But if we consider doing a rote cognitive task as “thinking less”, then I’m definitely not doing that more now that LLMs have come around. ↩︎
    8. Still, I would say I have very rarely “vibe coded”. Vibe coding, at least by Karpathy’s original definition, is like pretending the code isn’t there. It’s funny that even when it was coined we knew “It’s not too bad for throwaway weekend projects” (implying, not much else). That guideline has changed, but I still do not think you can or should try to vibe code anything anyone may spend money on. (I want to footnote a footnote here, I know it’s insane: this is almost definitionally true. If you can vibe code it, then someone could probably vibe code it themselves instead of paying money. That’s just the collapse of the cost of software at play.) ↩︎
    9. You can spend time with smart people and learn from them, good learning. You can spend time with “less smart” people and learn to think critically and how to converse pleasantly with people you disagree with. That can also be good learning if you work for it. ↩︎
    10. https://en.wikipedia.org/wiki/Chatbot_psychosis ↩︎
  • Systems are Everything, Software is Systems

    Systems are Everything, Software is Systems

    I was talking to a friend recently and asked if they wanted to learn a bit more about how to code; they said they didn’t. I followed up by claiming that what you learn from designing software is very useful in general, but I couldn’t really articulate why. I’ve since thought about it a bit more and have something more coherent, which I thought I would write up.

    There are two main claims:

    1. Systems are everywhere. Understanding complexity, its trade-offs, and how systems grow and evolve applies to many things, and is therefore broadly useful.
    2. Software engineering is the purest incarnation of systems thinking, and the one you, as an average individual, can learn by doing.

    Systems are Everything.

    “A system is a group of interacting or interrelated elements that act according to a set of rules to form a unified whole.”1 That seems broad and, yeah, it is. Let’s start by just coming up with some examples:

    1. Government is a system
    2. Corporations are systems
    3. Religions are systems
    4. 1+2+3 => all human organisations are systems
    5. The legal system is a system (ok that one was easy)
    6. Family dynamics are a system
    7. Language is a system
    8. A conversation is a system
    9. Societal norms are a system
    10. The laws of physics are a system
    11. and on and on and on.

    What’s the point of such a broad definition, and if it is so broad, can we say anything concrete about systems? I think so; here are some ideas:

    1. Systems can be immutable (the laws of physics), or “completely” arbitrary (fashion pairings), and many levels in between.
    2. Systems can be said to be more complex or simpler, at least relative to other similar-looking systems. There can be more rules or fewer, many elements or only a couple.
    3. Simpler systems are easier for us to learn and to teach to each other. They are easier to grasp.
    4. Most of the systems we interact with are outside our control (immutable, like physics), or ones we have only weak influence over (government, through democracy etc.), or are at least provided to us pre-made (norms, culture, language). It is rare that we actually need to think deeply about how a system should best be made, and even rarer that we need to build one from the ground up.

    I believe there are more, but hopefully by now you can imagine there might be something to the idea that “system”, as a term, is broad but still has some depth and utility.

    Software is Systems.

    These days I like to write without assuming too much knowledge of code, since I know my mum reads these, so I will do my best to bridge the gap. Reducing software to its most fundamental: we are writing instructions for a machine to do some calculations in some particular order. The elements are numbers (1s and 0s); the rules are math. It is itself a system, but since so many things are, that’s hardly surprising or special. What’s fun is that we don’t just run random sequences of calculations; we like to make those sequences do useful stuff for us. To us, “useful” almost always means “with some parallel to something in the real world”. A spreadsheet might be a simple model of a company’s finances. The weather report simulates the real weather. The internet is kinda like talking to people. A computer game may attempt to be a virtual world.

    These are systems too, but notably:

    1. They are sometimes, even often, simpler than their real counterparts (e.g. a weather model)
    2. They may diverge from their real counterparts in many ways (unlike talking to people, you don’t need to be online when a message arrives; you can read it when you have time)
    3. They are created and change way faster than many of the other systems we see.

    Already you may see part of my conclusion: if you get the chance to design one of these systems, it is a great way to learn its real-world counterpart. We often say that the true mark of understanding is being able to explain a concept to someone else. Explaining it to a computer, unable as it is to fill in any gaps you might leave, is surely equivalent. There is more to be gained, though. You may also want to change the model as you work, maybe just to save time or to explore other possibilities. You may attach additional, entirely novel features to a system. Sharing a tweet with anyone on the internet, for instance, was an interaction that simply didn’t exist before software built it. We get to experience the wonders and horrors of these new interactions collectively. But you don’t need a Twitter-sized project to learn from designing systems. Even moderately sized software, even with no users except yourself, teaches you both the power and the challenges.

    There are other fields that regularly produce systems. Physical models, for example, are naturally analogous to software ones, but the ease with which we can create software in high-level languages, especially now with the aid of LLMs, means that “thinking about the system” is really all there is to it. By comparison, if you ever even get the chance to, say, design your company’s org chart, it will grow slowly and change infrequently, at least compared to software.

    If you want to learn about systems, writing software is designing them. For me, so many of the things I know about software are applicable everywhere. My first post, “complexity fills the space it’s given”, is a perfect example. From watching empty classes fill up with more and more code, I recognised the phenomenon. It wasn’t that hard to start seeing it everywhere in the systems we exist in: teams at work growing, getting sliced up, growing again; an organisation’s statutes becoming more complex, following the structure of their headers and filling them in even if, perhaps, maybe, they shouldn’t.

    Or take debugging as another example. One of my favourite rules is “if you don’t understand a problem, you’re not allowed to fix it”. Often there are many ways to fix a bug, and sometimes you can see a fix even though you don’t know quite what’s causing the problem. The danger is obvious: the underlying cause is probably something that should be found, understood, and interrogated. You realise this for software, and then you realise it’s true for all the other systems we interact with. Like drinking more coffee to counteract how sleepy you feel, when really you should be getting more sleep.

    So, what would I tell my friend? Probably not “you should learn to code,” which is a big ask for what I’m actually pitching. More like: there’s a way of thinking, mostly invisible until you’ve done it for a while, where you start to see the rules instead of the surface. Systems all the way down. You can get there from other places. Software is just the cheapest, fastest one I know of, and the rare teacher that won’t fill in your blanks for you. The bar to entry is lower than it has ever been. It is also, frankly, kind of fun. Maybe try it anyway.

    Discussion and comments on Hacker News.

    1. https://en.wikipedia.org/wiki/System ↩︎

  • Cooperating in a Conversation

    Cooperating in a Conversation

    I used to do debate. You might be surprised by how often that was used as a response to me whilst I was in a heated “discussion” (Argument? Dialogue? I know, footnote1). Something along the lines of “well, you did debate, so…”.

    The premise is “you are using your skills in debate, which I do not have”, and the conclusion “this unfairly affects the course of the discussion, giving unearned weight to your points, characterisations, and machinations”. There is of course some merit to this: just because I can remember a historical anecdote does not mean I am more right. Just because I sometimes pull out some bigger words does not decide what is true. However, if there’s one thing I want to embody myself, want others to see, and think more people should try to do, it’s this: we should work together on this.

    I’m going to sound somewhat pretentious and say: when discussing something with someone, we should generally be trying to find the truth. Of course, we should always be considerate of others’ feelings (or our own), aware of our surroundings, of how time might be better spent, and of the value of whatever we are discussing, but if we are talking about something anyway, then the goal should be to come to the truest conclusion. That involves accepting that you yourself may be wrong, in part or in whole.

    Many people are willing to say, “of course I can be wrong!”, but when was the last time you had a long discussion and realised you were, in fact, quite wrong? If it’s never, or nowhere near half the time, then you probably aren’t doing this well enough. Maybe you are indeed willing to be proved wrong but somehow it never happens. There are two main possible explanations: you are so incredibly smart that you are rarely wrong (unlikely), or everyone else needs to get better at convincing you. Well, it’s probably the second case, and the solution is simple: help them. Picture that you are wrong, that they are here to help you, and ask what you can say to make them better at convincing you. It sounds weird, but this mental model is helpful in so many ways and, if you think about it, has no real cost.

    Let’s get a bit more specific. The best way to march towards mutual understanding is to cooperate with the other person and hope they cooperate in return. The same trust-building techniques that can, sadly, be misused to convince someone of something (because trust can be misused) can also be used, and are much better used, to convince someone to work with you toward the truth. Here are a few, from most to least obvious, along with the reason each is hard to do:

    1. Admitting you’re wrong – “I see now that I was wrong” – People might think this is a concession that you are dumb
    2. Asking for clarification – “You used a big word and I don’t know what it means” – People might think this is a concession that you are dumb
    3. Correcting a misunderstanding of your own – “I didn’t understand what the point you were trying to make was until now” – People might think this is a concession that you are dumb
    4. Not responding to a point because it is at the limits of your knowledge – “I’ve never worked with that so I can’t say” – People might think this is a concession that you are dumb
    5. Admitting you have focused on the wrong point, moved goalposts, or used some other logical fallacy, essentially apologising for your discussion conduct – “I realise this thing we’ve been talking about, that I brought up, isn’t relevant to the original point” – People might think this is a concession that you’re dumb and that you may be debating in bad faith.2
    6. Earnestly agreeing with a point if you agree with it, rather than just saying “yes, but” – “Yeah that’s a good point and I even know an anecdote so-and-so that supports your point” – This one really shouldn’t be hard, but I guess it just feels wrong to suddenly “be on the side” of someone you disagree with on the broader point?

    Except you are on their side! This is a colleague or a family member or a friend! Even if not, even if they are the worst person ever, you should consider that you are on the side of the future version of them that believes in all of the same obviously-important-and-correct things that you do!

    Wait wait, what if I’m the one that’s wrong?

    Well then, try and work out if that’s the case! The best way to do that is, again, for both of you to cooperate, this time to change your view. The target, whoever ends up changing their view in whole or in part, has changed; the method has not.

    The problem is that when a third, fourth, or further party is present, this logic, sadly, doesn’t carry nearly as much weight. Convincing many people of something can be a goal of its own, and if it’s convincing them of the truth, it could even be noble. Sadly, I think this has created a sort of selection bias. The value of finding the truth (for yourself and your interlocutor) doesn’t change depending on the number of people watching. The “value” of convincing those watchers scales linearly with their number. This means that the average discussion we see is geared towards the latter (we watch people who have many viewers, so they are encouraged to convince those viewers), whereas the former is actually the situation you are most likely to be in yourself (most people don’t stream their Thanksgiving arguments to thousands of netizens). That’s to say: we are more likely to see a style of discussion that isn’t relevant to most of the discussions we are likely to have.

    As an interesting aside, what about this very post? It may have a few readers, so presumably “convincing people of the truth” applies? It does, and short of stream-of-consciousnessing this and every other piece, I just try my best to be intellectually honest. In my about section, I have written “I write [so] I can stop thinking about [things] and think about something else”. My critic, discussion partner, and co-debater is myself, and more than half of all my drafts die because some way through I convince myself I’m wrong (and usually that means there isn’t much interesting to write about). It’s still edited, though: poorly worded or incorrect sentences are deleted before you can read them, saving me from having to apologise. Difficult questions are perhaps avoided, and easy ones made rhetorical. I hope you, dear reader, will still view them with healthy scepticism.

    This has been a bit of a ramble but the core is this:

    1. Your goal should be to have true beliefs in your own mind; it’s like the most important thing.
    2. Cooperating with others feels good, encourages them to do the same, and is the best way to convince yourself of something if you are indeed wrong.
    3. This has the fun side effect of also being the most good faith way of convincing people that they are wrong, if they are indeed wrong.

    The first step to being right about everything is finding what things you’re wrong about. Other people can help you do that, if you cooperate with them.

    1. I like “discussion” because it feels the simplest and most neutral. A “chat” or conversation is something else; in a discussion there’s clearly something being disagreed upon. “Debate” is too formal and just invites that “well yeah but I didn’t have formal training in this” response. Argument, dispute, disagreement are all too negative and undermine the goal I’d like to put forward with this post ↩︎
    2. If it isn’t obvious by now, you need to stop worrying about whether people think you are dumb. Reasons: 1. if making these concessions materially changes their view of you, they are hardly worth talking to anyways; 2. it’s a good way of building confidence in oneself. ↩︎
  • In Defense of Rediscovery

    When I posted my first ever blogpost, a commenter wrote:

    You have rediscovered Parkinson's law.

    What’s the tone of that message? When I read it, I immediately went to look Parkinson’s law up and sure enough, it’s a pretty close match for what I was describing. I got a hit of anxiety: “how did I not know?”.

    When I write I want to feel like I’m contributing something new, otherwise it just feels like I’m doing another essay for school. As soon as I do some research for an idea and realise someone has already said almost everything I want to add, I lose almost all my motivation. So when I read that comment, it felt like “hey, someone already got there first, you’re not that cool”.

    Except, of course, it is still cool? It’s not comparable in degree, but if someone independently developed, say, the theories of relativity based on only the same underlying knowledge Einstein had, it would be, as an intellectual achievement, equally impressive. Granted, there’s always something special about being first, but in terms of what it says about a person, timing shouldn’t really matter. The only reasonable exception is what Tomska calls subliminal appropriation, where even though we can’t remember where we got an idea from, it may not wholly be our own. Relatedly, nobody in the modern age will ever actually be in the state where they know all the things Einstein knew prior to his theory, but nothing more.

    And, in some sense, it’s even cooler? My idea matches that of someone whose idea is well accepted, which adds to the validity of mine without detracting from the coolness of coming up with it.

    But why does it matter?

    Ok fine, cool or not, why does it matter? Is it just bragging rights? What I want to add is that I think there is somehow some shame in coming second, even more so if you initially think that you were first. But there really, really ought not to be. Working through a problem and coming to a conclusion on your own is not just how we get great new ideas, it also trains your brain to think in that way. Out of a thousand ideas, even if only one is new, that can still be a big deal. I like writing a lot without researching too much, because I know that finishing the idea is better than leaving it half baked. Afterwards, comparing it to what’s out there is incredibly valuable too: you see what you missed. I know that doing that can make me sound a little unacademic (few citations, idiosyncratic terminology), but the idea can still be good. Some other things I’ve “rediscovered” on this blog:

    1. My second ever post, about love, has a closely related term, Limerence
    2. My post on ranking people with LLMs uses a method that now has academic support

    In both cases I may not have sat down and written those posts if I’d seen someone else had already worked on them. Maybe that’s just a me problem, but I think this feeling is shared by quite a few people. It also feels different depending on the kind of problem. Something like building a compiler can be a fulfilling challenge despite the fact that it’s been done before. I think that’s because the “effort” involved is easier for everyone to recognise. Unless you just copy code or use an LLM (for which you might get caught), writing a compiler is hard and people can readily recognise that. This is true for many challenges, but when it comes to novel “ideas” there’s always a lot of uncertainty. Are you sure you haven’t heard it before? Were the key insights your own, or did you just expand on them? Are you just translating a concept from one domain to another (which may still be impressive, but is it easier)?

    Why does it feel so demoralising?

    I think I can point to three main reasons:

    1. So much really has already been discovered. We’ve come pretty far on the science and thought tech tree, and there are so many ideas that, although impressive to form for oneself, aren’t really likely to be novel.
    2. We’re just so connected now. You can look up every idea and sure enough someone has probably said something similar.
    3. Most of the time at school, we learn in the format of “here is a thinker and their ideas, here’s a maths formula and the guy that came up with it. You are not expected to come up with any of this yourself, there’s too much anyway and you are just here to use what’s already known”. No doubt this is efficient. We can’t ask every kid to come up with everything themselves, they need to get out into the workforce and start putting this knowledge to use!

    That last point is why I appreciate teachers who really try to instil curiosity into students. Sure, it’s all already been done but we’re going to try and figure it out anyway! 3Blue1Brown does this so regularly, he really wants you to feel “how you might have stumbled across this solution yourself”, and I genuinely think that’s one of the reasons he is such a good educator.

    It’s a problem though that teachers like this are rare, that curiosity like that is hard to come by, because it’s a skill we really do still need. Even if you aren’t coming up with new, earth shattering ideas, reasoning like that is how you think critically, avoid fallacies, and have an open mind. Realising that the people who came up with great ideas were wrong just as often as you were frees you from the desperate need to always be right. Needing to always be right is something I’ve seen poison people, and even myself at times.

    Damn ok that got a bit too deep, take whatever parts of this post that you like, I’ll end it here. Thanks for reading. Comments are on Hackernews.

  • My Homelab

    I saw someone else do a write-up of their homelab so I thought I’d do a short one of mine.

    Hardware

    It’s pretty basic. When I rebuilt my workstation I took the old machine, gave away the graphics card to a friend, and installed Ubuntu Server on it. It has an i7-8700 and 64GB of RAM and sits in a closet just below my router (a UniFi Dream Router 7; to be honest I didn’t think home routers could be so good). That router can also act as a VPN server, so I can be on my LAN while not at home.

    Hestia

    I wanted some way to host a blog and originally set up Hestia to orchestrate the whole thing. It’s a bit overkill for what I need (I don’t use DNS or Mail or lots of the other cool things, and I’m the only real user), but it works pretty well. From there, I manage 3 domains that point to different services.

    1. wilsoniumite.com – This blog, a WordPress site
    2. photos.wilsoniumite.com – Immich
    3. wordlestats.wilsoniumite.com – A wordle bot

    This blog

    I wrote my first ever post on Medium, but I realised it would be way more fun to host it myself so I quickly set this up. I took a standard theme and cut out a lot of the flab so this page would look a lot simpler. I’m pretty happy with the result, but sometimes wrestling with WordPress does feel like a pain. I’m trying to add screenshots of each section so here it is:

    Immich

    That other homelab post mentioned Immich and it seemed like such a good idea. I was already annoyed that Google constantly complained I was using too much storage for my photos, and honestly I didn’t want to trust them with it anyways. Immich is great: I just need the app on my phone and it works exactly like Google Photos, except way more fun, and I get a terabyte of storage (or a bit less, my NVMe is 1TB).

    wordlestat

    I made a Discord bot that is now used by a few dozen guilds to create leaderboards for wordle results. I wrote about it here. When people interact with the bot, Discord forwards that request to my endpoint and the Node server handles it.

    Signoz

    The wordle bot had some teething issues and I wanted to be able to observe usage in more detail, so there is an unexposed SigNoz instance that gathers logs from it, as well as general metrics from the server like CPU and RAM usage. It’s been really useful in tracking the usage of the bot; here’s an example of the dashboard:

    Restic & Hetzner

    I realised it would be very sad to lose this blog, the wordle bot, and my image library, so I finally set up an off-site backup of everything. For just a few euros a month I copy it all over to a datacenter in Finland. Worth it for the peace of mind. With that said, I should probably be using backrest rather than setting all this up manually.

    What kind of traffic do I get to my blog?

    I thought I’d add a little addendum with some analysis of the log files for wilsoniumite.com. I’ve done my best to filter out bots properly, but I think a lot of them still snuck through. Here’s the breakdown (from a month where none of my posts have been popular anywhere):

  • Cat

    *thud*

    She had landed on the other side, just in the same way she had a few hundred, perhaps thousand times before. A small piece of wood that was just long and stable enough to support her weight as it spanned a gap three times longer than herself. It was a warm day, the sun in just the right position to align with this part of the meandering river below, heating the stones of the bridge she now traversed the underside of. A mix of old growth vegetation and wooden scaffold made this edge of the village perfect for her, obstacles and paths she knew, her whiskers brushing against the stone on her right as she made her way up the far side of the span.

    She’d come now to the highest point, and could look both ways toward the few houses of the village, or toward the trees of the forest line. One way lay the house she’d spent many years in. By no means an unhappy place, it was warm year round, and there lived a kind young boy that took care of her, fed her, housed her. It seemed almost unfair to think nonetheless of what might be on the other side of the bridge. Something new, different, obviously not quite as safe as where she’d come, and yet still someplace almost as right for her as home. She’d thought many times about this, of going further, every day while following this path around the bridge contemplating if this day would be the one. What was it she still needed? No piece of truly new information would come to her any more on these rounds. No amount of further assessment would make the unknown more known to her. As she sat there, upon the cold morning brick with the sun against her fur, it felt that the only sensible thing left to do was to flip a coin, and simply decide which way. Or, well, not a fair coin. It was obvious on which side it would land, it’s just that she had to believe there was nothing more to be gained from trying to further work out the perfect course of action. All that remained for her, Ozana as she called herself, was to hop off those final bricks of the bridge, and begin trotting toward the trees.

  • We really do need to get money moving differently.

    A simple model of how the economy works is that it’s some kind of rewards system. You do something good that other people will pay you for and you are then rewarded with money. You are a smart engineer and your boss gives you a raise, you’re a smart entrepreneur and VCs give you tons of money. That money entitles you to stuff, the idea being that there is only so much stuff (yachts, housing, fancy clothes, holidays) and you are rewarded for being good at something by being first in line to get that stuff in lieu of someone else getting it. And then we say that this encourages people to work hard and be smart, and that is good for society overall. This is perhaps true to a degree, but it is in many ways the less interesting (and less important) part of what “the economy” is.

    Instead, we should think of it as a preference aggregator. From the perspective of extraction of raw resources, we have a few fairly simple rules.

    1. Some resources are genuinely scarce, such as oil and precious metals, and not only is extracting them hard but their deposits will not last forever, in some pure sense of what Earth as a ball of rock in space can eventually provide.
    2. Some resources seem scarce but really aren’t, like food. Arable land is limited but methods of growing more food with less water and land use do exist, and given that agriculture is a single-digit percentage of GDP, if we wanted to grow more food, we just could. The same is true of a lot of other things like most building materials, iron, cement, wood, many other consumer goods, and, yes, even housing.
    3. Substitutes exist. There’s always talk of whether scientific progress can develop substitutes for resources we’re worried about using (like cobalt for batteries or neodymium for magnets), but the issue is much less whether the science can be done than whether there is an economic incentive (preference) to find them (or to just use something that’s marginally more expensive). Substitutes aren’t the only way either; designs can be made that just don’t incorporate certain materials, if only the economy wanted it to be that way.

    Yes I know I’m anthropomorphizing the economy but bear with me, the point I’m trying to make is that people buy stuff and they buy what they need and thereafter what they want. What they want then influences what the economy produces, and if you want a healthy economy you have to give people the power (money) to buy what they want, and I know this is all just demand-side economics but again, with me please, bear.

    When we say we want “growth” we want more of those raw resources and we want those raw resources to be turned into cool useful stuff and we want that to be done in an efficient way in which we are not chucking silicon into the ocean (cough cough or data centers where they won’t get used cough). People go on and on about how to incentivize new businesses and ensure high employment and deregulation and subsidies but there is another perspective that I’ve already written about on this blog and it’s so good I’ll just copy it:

    Imagine a world in which all work is automated. There’s still money, people, and stuff to be bought. There are companies, but they are AI run, with robotic workers and legal-entity owners (not owned by people). This isn’t the same idea as a post-scarcity economy, there are still limits on the amount of stuff that can be made and there are equilibriums to be found in balancing how much food we should grow vs how many yachts we should build. So, we still want an economy in this world, we want people to express what stuff they want, and then they buy that stuff and the free market makes more of that stuff if demand is high. The usual. Here’s the question: Where do the people get the money to buy things with?

    “But we don’t live in that world!” I hear you cry. Indeed, even if AI could do every job, no doubt some humans, lavishly wealthy and/or powerful, would be at the head of it all (private or government, someone is in charge1). Still, we are heading in that direction and ultimately that is a good thing. People, by and large, don’t like working. Some work is very rewarding, and I do not doubt that many jobs would still get done by people who love them, even for no pay, if their other needs and wants were met. Many would still work less though, and some not at all, or do things that we would not today classify as “work”. With that in mind, and from a historical perspective, human progress and prosperity is largely a story of receiving more stuff for having done less work, and we can keep doing that. More infrastructure can be built, more advanced farms, industries, cities, for a population that probably will be under control in the coming decades/centuries.2

    The problem we need to solve then is a simple one: Where do the people get the money to buy things with? We discard the notion that the economy is some sort of rewards system and it becomes what it truly should be, a preference aggregator. Then, we just need to give people money somehow, let them buy stuff, and get some of that money back from the sellers of that stuff so we can give it to people again in one nice big circle. To give people money, we want a universal basic income (UBI). This kind of policy has been gaining popularity recently and it is so simple it hardly needs any explanation. You get money for existing. That’s it. Maybe you get some more if you have some particular need, like a disability, but you could also just have universal healthcare for that. Maybe you can also still work a job for someone or some entity; that doesn’t sound so bad, and I see no reason to make it illegal. To get money out of the “rest” of the economy, we need some sort of tax. We can’t just take all of a company’s money or profits since our industries will never be static: they’ll have to expand, shrink, make bets using extra cash and have a cushion if those bets are wrong. At some point though, in a fully automated world, it has to get back into the hands of people. If our legal entity constructions just accumulate wealth, the system will break just as badly as it is doing right now. Last time I wrote about this I leaned heavily on VAT as the solution, but to be honest I leave that problem to policy people that are smarter than me.

    What I can’t get over is how obvious all of this seems to me these days. Everyone thinks of “the economy” as something else, something where work is an integral part and you get paid for good work. Of course, we still need good ideas, and so encouraging that through “reward” is a good thing, but when someone like Musk can say “cars, but they drive themselves” or “rockets, but they land themselves” and get a bajillion dollars I just think we are leaning way too heavily into the “rewards” side and are just forgetting that people need to buy stuff and they need money to buy that stuff and we are trying to automate away everything and that is not a bad idea per se but holy shit we need to think about how that society needs to be structured really really right now.

    Discussion and comments over on: Hackernews.

    1. Perhaps some decentralized system is possible but that does feel hard to even envision. I do encourage anyone to try though. ↩︎
    2. I know I’m glossing over a lot of things here. The world may literally end because of war or climate change or societal collapse or whatever, but in terms of “can we build a society that gives people lots of stuff they want”, the argument I’m making is that it’s very possible but we need to rethink our economies. ↩︎
  • Surely the crash of the US economy has to be soon

    Last year I predicted there would be a significant (2008+) economic crash that year. The year is now 2026 and I was wrong. At the time my main argument was essentially this:

    The unemployment rate just follows these smooth curves, COVID was an exception, and it was due to jump again. Not very scientific I know. There was another important graph of course:

    Where classically an inverted yield curve has been a recession predictor. This one is a little more involved, but essentially the US government borrows money, and normally it makes sense that the government pays a higher interest rate if it wants to borrow the money for longer. If, for some reason, the market says “no actually, we will take a lower rate if you take our money for longer”, that is an inverted yield curve, and here that is shown by the difference between the interest rates on a 10-year loan and a 2-year loan being negative. Why this would predict market crashes is a complex topic and I encourage you to read around it. It isn’t perfect of course, and one “feature” is that it isn’t “wrong” yet: it only will be if we go the next few years without a crash.

    But come on.

    It surely will be this year.

    Here’s the current price of silver. Gold looks kinda similar (but smoother, I chose silver because it looks dramatic, but maybe it got you to read this further so that’s a win in my book). People buy precious metals when they might be worried about the value of fiat currencies, like, I don’t know, the dollar. Are people worried about the dollar?

    To be honest I’m glad we are the ones getting out of that market first. Why might people be worried about the dollar?

    Eh, I don’t know, there are lots of possible reasons and maybe you can think of some. The actual point is this:

    1. US government debt has been a worry for a while. That worry doesn’t matter so long as people have faith in it, but it does matter insofar as it makes a possible debt crisis deeper. The bigger they are, the harder they fall.
    2. There are one or more bubbles in the stock market. Almost everyone agrees that AI is a bubble. It funds itself in a circular fashion, and capex cannot be recovered with profits any time soon, even with optimistic outlooks. Other stocks may also be well overvalued, with sky-high P/E ratios and nonsensical business models (meme stocks are just the worst offenders).

    It feels as though all we need is a spark. And yet, many sparks seem to have come and gone. Big market moves, in stocks or yields, that have recovered. Tariff and invasion threats, protests, you name it, they might move the needle but it always seems to move back. So, perhaps we won? Perhaps we built our markets so stable that they are these days impervious? That sounds silly on its face, and the two reasons I’d actually give are:

    1. Markets are just slower moving than ever before. Big players like to sit on their big piles of money, and it’s much easier to just assume the needle will go back, and then everyone pats you on the back when it does. No client likes a skittish fund manager that ends up always being wrong.
    2. This is the 11th time that tariffs have happened, and it just isn’t surprising anymore.

    Which is to say that no individual decision maker wants to be the first mover, so the market does not move.

    A year ago there were a few signs. Right now, it feels like everything is primed to blow. Is that new? Do I always just feel that way? Am I just a broken clock that’s going to be right today? Maybe, but I damn well intend to be right at some point.

    Comments & discussion over on Hackernews.

  • Calculating a Wordle Leaderboard

    https://wilsoniumite.com/wp-content/uploads/2025/11/Wordle-Leaderboard-Calculation.mp4

    (This post is about how I created a discord bot that calculates a wordle leaderboard from messages posted by the official NYT wordle discord bot. A chunk of the post is just explaining how the rankings are calculated, since sometimes it produces some interesting results. If you just want the bot, the link is here)

    A few months ago The New York Times introduced a discord bot where you can play Wordle. This is, frankly, a great idea. The initial success of wordle was, in my opinion, because it was naturally something you would talk about with friends, and then those friends would try it if they hadn’t heard of it. This was because there was only one wordle a day: “have you done today’s wordle?” is something I’ve found myself saying both to those who I know play it and those who may not have ever heard of it. This isn’t unique to wordle (there are daily-only crosswords, for instance), but what usually happens when such a game goes online is that it allows you to play again and again and again. “Here’s 1000 different crosswords! With themes and difficulty levels! Please stay on our site and see all our beautiful ads!”. So wordle was new, fun, short (faster than Sudoku or a crossword), and had a natural transmission method. If you still doubt the popularity of wordle, just google it and look around on the results page.

    So creating a discord bot was an obviously good idea. Now you could play the game “with” your friends. You implicitly remind each other to play from the message pings of others playing, and if you miss those, you’ll probably wake up to an @mention from the wordle bot, as it summarises yesterday’s scores and presents a winner, just in time for you to play today’s wordle. I found myself playing the wordle in bed in the mornings, just to wake myself up.

    What it didn’t really have was any good way to compare yourself to others. Sure, you could see each day’s results, but nothing else. Perhaps the most obvious metric is average score: just take every day’s score, 1-6, sum them up, and divide by the number of days. Sadly, this isn’t available with the NYT discord bot??? Instead, I’m greeted with:

    So you’re telling me that now that I’ve played a few dozen days, and am actually interested in my average, you want me to play another few dozen on your website (because ofc the data isn’t transferrable that would be too convenient) to get a score, and now I can’t play it on discord with my friends anymore??

    Nah. Playing on discord is great and I want to keep doing that. I can solve this problem for them, all the data is right there. Each day, there’s that summary I mentioned. It looks like this:

    At first, I definitely did not use discord chat exporter because that would be against discord’s ToS. No, I of course manually copy pasted all those messages into a nice json file that I could then read in python and calculate some stats. Yep, definitely. Anyways, I used this data to do all the further developments of the wordle leaderboard, and then later I wrapped it all up in a discord bot which I’ve shared the link to at the bottom of this post. But imagine this is all for the bot since it all works the same way.

    Here’s the first problem with just using average scores to calculate a leaderboard:

    1. Some people don’t play much

    Ok here’s an example leaderboard:

    1. @busyperson: 3 games, 3.67 average
    2. @poorsoul: 115 games, 4.03 average

    Well ok, you got a 3 one time and two 4s, does that really mean you deserve the top spot? We could just filter out people with, idk, fewer than 5 games, but “you had a lucky streak” is going to feel like a problem for a while. So what to do? The problem we face is we don’t have enough data. Maybe @busyperson really is that good, maybe they got lucky; 3 games is just not enough to know. Reasonably, we can manage this uncertainty by estimating how well the average player performs, and making it so that people with fewer games are dragged towards this average. This is called shrinkage, and we can apply it to our results. An added bonus is that it is symmetrical, so some unlucky person who only played three games and got only 5s will have their score improved.

    Let’s say that the average of all players’ average scores is 4, and that the variance of those player averages is 0.1. Then we would calculate @busyperson’s adjusted score like this:

    shrinkageFactor = 0.1 / (0.1 + 0.33/3) = 0.48

    finalScore = 0.48 × 3.67 + (1 – 0.48) × 4.0 = 1.76 + 2.08 = 3.84

    where 0.33 is the (sample) variance of the scores 3, 4 and 4, and it is divided by 3 because @busyperson has played 3 games.

    Essentially we calculate how much weight to give the player’s actual average and how much to give the population average. This drags that player towards the average of 4.
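    For anyone who wants to poke at the numbers, here’s a minimal Python sketch of the shrinkage calculation above. It isn’t the bot’s actual code: the function name and the `fallback_var` parameter (used when a player has only one game, since a variance needs at least two scores) are my own assumptions. Like the worked example, it uses each player’s own sample variance, which has a quirk: three identical scores give a variance of 0 and therefore no shrinkage at all, so a pooled within-player variance might be a better choice in practice.

```python
from statistics import mean, variance


def shrunk_average(scores, population_mean, between_player_var, fallback_var=1.0):
    """Pull a player's average towards the population mean when they have few games.

    fallback_var is a hypothetical stand-in for the within-player variance when a
    player has fewer than two games (statistics.variance needs at least two values).
    """
    n = len(scores)
    player_mean = mean(scores)
    # sample variance (n - 1 denominator) of this player's own scores
    within_var = variance(scores) if n >= 2 else fallback_var
    # 1.0 would mean "trust the player's average", 0.0 "use the population mean"
    factor = between_player_var / (between_player_var + within_var / n)
    return factor * player_mean + (1 - factor) * population_mean


print(round(shrunk_average([3, 4, 4], population_mean=4.0, between_player_var=0.1), 2))
```

    Running it on @busyperson’s 3, 4, 4 prints 3.84, matching the worked example; with dozens of games the factor gets close to 1 and the player’s own average dominates.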

    2. “I didn’t finish the wordle”

    So, what happens if you don’t manage to get the wordle word in six guesses? Well, according to the wordle bot, you get X/6:

    So, what happens if you start playing the wordle, make a couple guesses, and then decide you don’t want to finish it?

    You also get X/6

    This is a problem. It would make sense that for people who use up all six guesses and still don’t get the answer, you might give them a score of 7, dragging up their average significantly. But for those that just decided to quit, should we really penalise them so harshly? If we just ignore them, then there is a strategy whereby if you’ve used, say, 4 guesses and not succeeded, you could just not finish the game that day to “protect” your average. I couldn’t come up with a good answer for this, so the bot supports both treating X as 7 and ignoring those games entirely.
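    Here’s a small sketch of supporting both options, assuming each result arrives as a string like “3/6” or “X/6”. The `parse_result`, `average` and `x_policy` names are hypothetical, not the bot’s actual code:

```python
import re


def parse_result(text, x_policy="seven"):
    """Turn '3/6' or 'X/6' into a numeric score, or None if the game should be skipped."""
    match = re.fullmatch(r"([1-6X])/6", text.strip())
    if match is None:
        raise ValueError(f"unrecognised result: {text!r}")
    if match.group(1) != "X":
        return int(match.group(1))
    if x_policy == "seven":
        return 7  # a failed (or abandoned) game counts as one worse than 6/6
    if x_policy == "ignore":
        return None  # drop the game entirely
    raise ValueError(f"unknown x_policy: {x_policy!r}")


def average(results, x_policy="seven"):
    """Plain average score under the chosen X policy (None if nothing counts)."""
    scores = [s for s in (parse_result(r, x_policy) for r in results) if s is not None]
    return sum(scores) / len(scores) if scores else None


games = ["3/6", "X/6", "5/6"]
print(average(games, "seven"), average(games, "ignore"))
```

    With the same three games, counting X as 7 gives an average of 5.0, while ignoring it gives 4.0, which is exactly the “protect your average” gap described above.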

    3. Some days are harder than others

    Ever had a wordle game like this?

    It’s one where even if you find quite a few correct letters, there are still so many possibilities that finding the correct one is really hard. There are some strategies to deal with this, like recognizing that the number of options is high and using a word that isn’t a possible answer but eliminates a lot of other possibilities. Even so, it’s fair to say that some days are more difficult than others. What if, by luck (or by observing other players’ results as they play!), some player avoids those hard days? Can we adjust for that?

    This fix can be done quite easily. If we know the average score overall, then for a given day, if most people did worse than that, it’s a hard day, and if most did better, it’s an easy day. We can take the average score for the day and subtract the overall average to get an adjustment number, which we then subtract from every player’s score that day. If everyone plays every day, this does not change the ranking, but if you skip hard days, everyone else’s score might end up a bit better than yours.

    Ideally for this to work well, you want a lot of players. If there are only two players on a day then the “day difficulty” probably isn’t very accurate. But it’s still an adjustment you might want to have so I included it in the bot.
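    As a minimal sketch of that adjustment, assuming each day’s results have already been collected into a dictionary of player to score (the data layout and function name are my assumptions, not the bot’s):

```python
from collections import defaultdict
from statistics import mean


def difficulty_adjusted_averages(days):
    """days: {puzzle_id: {player: score}} -> {player: difficulty-adjusted average}."""
    # overall average across every individual result
    overall = mean(score for day in days.values() for score in day.values())
    adjusted = defaultdict(list)
    for day in days.values():
        # positive offset = harder day than usual (higher Wordle scores are worse)
        offset = mean(day.values()) - overall
        for player, score in day.items():
            adjusted[player].append(score - offset)
    return {player: mean(scores) for player, scores in adjusted.items()}


# "a" only played the easy day, "c" only the hard one
print(difficulty_adjusted_averages({1: {"a": 3, "b": 3}, 2: {"b": 6, "c": 6}}))
```

    In that example the raw averages would be 3 for “a”, 4.5 for “b” and 6 for “c”, but after adjusting for difficulty all three come out at 4.5, so skipping the hard day no longer helps “a”. With only two players per day the offsets are very noisy, which is the caveat above.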

    4. You can’t trust timestamps

    So how does the bot actually work? What it does is scan all the messages in a channel, find all the summary messages, and use the results to calculate a leaderboard. Thing is, it’s nice if we can get results from more than one channel (for example, if your server owner got so annoyed with you playing wordle every day in general that they created a dedicated channel for it; this has happened multiple times in servers I’m in). If you load results from multiple channels, you need to avoid duplicating results you’ve already seen and combine the unique summaries into one full dataset. For uniqueness you need some ID, and I thought the timestamp’s date would be a good one. It’s not. Of course timestamps would fail me:

    I don’t know what schedule the wordle bot uses for these summaries, maybe I could work it out, but trying to fudge it while still using timestamps seemed like hell. So I chose a different hell. Here’s an example of the images that come with each summary:

    Enhance

    Before you ask, yes, adding OCR (Optical Character Recognition) did slow down the process of scanning messages significantly. But it works reliably and gives me nice unique IDs I can use for all results, regardless of channel or server or timestamp, that can be saved to the DB.
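    A minimal sketch of the merging step, assuming each summary has already been reduced to a (puzzle ID, {player: score}) pair, where the puzzle ID is whatever number the OCR reads off the image. Keying on the puzzle ID plus the player, rather than the puzzle alone, is my own choice here, so that two channels with different players on the same day are combined rather than one being thrown away; `merge_channels` is a hypothetical name:

```python
def merge_channels(channels):
    """channels: iterable of lists of (puzzle_id, {player: score}) summaries.

    Returns {puzzle_id: {player: score}} with duplicate results removed.
    """
    merged = {}
    for summaries in channels:
        for puzzle_id, results in summaries:
            day = merged.setdefault(puzzle_id, {})
            for player, score in results.items():
                # keep the first sighting; the same result seen in another channel is skipped
                day.setdefault(player, score)
    return merged


general = [(1500, {"a": 3, "b": 4})]
wordle_channel = [(1500, {"a": 3}), (1501, {"a": 5})]
print(merge_channels([general, wordle_channel]))
```

    Because the ID comes from the puzzle itself rather than the message timestamp, it doesn’t matter when or where the summary was posted.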

    5. What if we use Elo Ratings

    So far we’ve been calculating an average score and making some adjustments to that to create a leaderboard. But this isn’t the only way. If you’ve read some of this blog you should be aware of my fondness for Elo ratings, which you might know best as the chess rating system. Could we use that here?

    Of course we can, what kind of question is that!

    For every summary, we pretend it’s actually n(n-1)/2 pairwise games, where n is the number of players that day. Your performance is compared to everyone else’s that day, and your Elo is updated for each comparison. How does it work though?

    Let’s say just two people play on a given day: person A gets 3/6 and person B gets 5/6. We could say person A wins, since they got a better score (fewer guesses), and then the Elo update equation looks like this:

    \[ \color{#cbcbcb}{R_{A}^{'} = R_A + K(S_A - E_A)} \]

    The player’s new rating is equal to their current one, plus K times the difference between their score and their expected score. K is just a value you can set: it’s sometimes 32, e.g. in chess, and it defaults to 2 in the bot. The expected score E_A is calculated from the two players’ ratings (you can see more details here), but essentially, if your score was good (e.g. 1) and your expected score was 0.5 (let’s say you have the same rating as the other player), then your rating will go up by K/2. It’s also symmetrical, so the other player’s rating will go down by K/2.
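    The expected score isn’t spelled out above, so this sketch assumes the standard logistic Elo formula on the usual 400-point scale, E_A = 1 / (1 + 10^((R_B - R_A)/400)); the bot may differ in detail. K defaults to 2, matching the bot’s default.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of A against B (logistic, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def elo_update(r_a: float, r_b: float, s_a: float, k: float = 2.0) -> tuple[float, float]:
    """Return (R_A', R_B') after one game in which A scored s_a (0 to 1)."""
    delta = k * (s_a - expected_score(r_a, r_b))
    # B's score is 1 - s_a and its expected score is 1 - E_A,
    # so B loses exactly what A gains
    return r_a + delta, r_b - delta
```

    With equal ratings and a win, `elo_update(1500, 1500, 1.0, k=32)` gives `(1516.0, 1484.0)`: the K/2 = 16 point swing described above.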

    Here I said that we would use a score of 1 for a “win”, but the Elo update equation actually allows us to do better. S_A is the score player A got in the game (1 minus the other player’s score), and of course we can use 1 for a win, 0 for a loss and 0.5 for a draw. However, for any pair of results the difference in guesses could range from 5 (or 6 if an X counts as seven) to -5 (or -6). If we space these linearly, so 5 => 1, -5 => 0, 0 => 0.5, and everything in between, we get more granular information for the Elo update. This means if you get 2/6 and someone else gets 6/6, you will “take” much more of their rating than if you get 3/6 and they get 4/6.
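    The linear mapping from guess difference to S_A can be written as a one-liner. The name `pair_score` and the `max_diff` parameter are illustrative: `max_diff` is 5 for scores 1 to 6, or 6 if a failed game (X) counts as 7.

```python
def pair_score(my_guesses: int, their_guesses: int, max_diff: int = 5) -> float:
    """Map the guess difference linearly onto [0, 1] for use as S_A.

    Fewer guesses is better, so a positive (theirs - mine) difference favours me:
    +max_diff -> 1.0, 0 -> 0.5, -max_diff -> 0.0.
    """
    diff = their_guesses - my_guesses
    return (diff + max_diff) / (2 * max_diff)
```

    So 2/6 against 6/6 scores 0.9, while 3/6 against 4/6 scores only 0.6, which is why the first result moves ratings much more.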

    OK great, so each day you iteratively update the Elos and bam, you get a nice score. It naturally benefits players who play a lot (or, put another way, doesn’t allow lucky players to get to the top in a couple of games!), because if you want a high Elo you have to work your way up, just like in chess or other competitive ranked games. It also handles day difficulty nicely: if everyone did poorly (or well) on a day, the ratings won’t change much, because only the relative difference between players causes a rating change.
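    Putting the pieces together, here is a sketch of the daily iterated update over all n(n-1)/2 pairs. It assumes each day is a dict of player to guesses and the standard 1500 starting rating; computing every change against start-of-day ratings (so pair order doesn’t matter) is my own choice, not necessarily what the bot does.

```python
from itertools import combinations


def expected_score(r_a, r_b):
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def pair_score(mine, theirs, max_diff=5):
    """Linear mapping of the guess difference onto [0, 1]."""
    return (theirs - mine + max_diff) / (2 * max_diff)


def iterated_elo(days, k=2.0, initial=1500.0):
    """days: list of {player: guesses} dicts in chronological order."""
    ratings = {}
    for day in days:
        for player in day:
            ratings.setdefault(player, initial)
        # Accumulate today's changes against start-of-day ratings
        deltas = dict.fromkeys(day, 0.0)
        for a, b in combinations(day, 2):  # n(n-1)/2 pairwise "games"
            d = k * (pair_score(day[a], day[b]) - expected_score(ratings[a], ratings[b]))
            deltas[a] += d
            deltas[b] -= d
        for player, d in deltas.items():
            ratings[player] += d
    return ratings
```

    If everyone gets the same score on a day (say 4/6 each, at equal ratings), every S_A is 0.5, matching every E_A, so nobody’s rating moves, however hard the word was.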

    So we’ve found the best method yeah?

    6. The MAP Elo rating

    There’s one slight problem with the iterated Elo method: it favors “newer” data. This isn’t necessarily a problem, because people get better over time and we probably want their Elo to reflect their current skill, not their full average. Some people have a rough start; should we really penalize them for that?

    Yes

    ……

    Well ok, maybe not, but let’s at least consider one more possible way of creating this leaderboard.

    Consider that we would like to use Elo ratings, but we don’t want to update them after each game. We want an algorithm that considers all your results at the same time (removing the recency effect). To do this, we recognise that when we compare two Elo ratings, we get that expected score E_A. What if we could work out all the Elo ratings so that the differences between all the E_As and the actual results were as small as possible?

    We totally can, and this is called the Maximum Likelihood Estimate (MLE): we choose the ratings that make the actual results most likely, which pushes each expected score E_A as close as possible to the result person A actually got. There are algorithms that, after a certain number of rounds of calculation, converge to a set of Elos for all players that minimizes the result/expected result difference.
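    One simple way to find the MLE is gradient ascent on the log-likelihood; the source doesn’t say which algorithm the bot uses, so treat this as an assumed stand-in. Games are hypothetical (a, b, s_a) triples with s_a in [0, 1] (for example from the linear mapping above). The step size `lr` and iteration count are arbitrary choices that happen to converge for small examples.

```python
from collections import defaultdict


def expected_score(r_a, r_b):
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def mle_elo(games, iterations=1000, lr=100.0, center=1500.0):
    """Maximum likelihood Elo ratings from all games at once.

    games: list of (a, b, s_a) with s_a in [0, 1].
    Maximises sum(s*log(E) + (1-s)*log(1-E)). Up to the constant ln(10)/400,
    the gradient for player i is sum(S_i - E_i): the same residual as in the
    Elo update, but summed over every game simultaneously.
    """
    n_games = defaultdict(int)
    for a, b, _ in games:
        n_games[a] += 1
        n_games[b] += 1
    ratings = dict.fromkeys(n_games, center)

    for _ in range(iterations):
        grad = dict.fromkeys(ratings, 0.0)
        for a, b, s_a in games:
            residual = s_a - expected_score(ratings[a], ratings[b])
            grad[a] += residual
            grad[b] -= residual
        for p in ratings:
            # Scaling by 1/games keeps the step stable for busy players
            # without moving the optimum (where every gradient is zero)
            ratings[p] += lr * grad[p] / n_games[p]

    # Only rating differences are identified, so shift the mean back to `center`
    shift = center - sum(ratings.values()) / len(ratings)
    return {p: r + shift for p, r in ratings.items()}
```

    If A beats B in three of four games, this converges to a gap of 400·log10(3) ≈ 191 points, exactly where E_A = 0.75. With a single perfect result, though, the gap just keeps growing the more iterations you run.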

    There’s just one problem: what if someone plays once, against only one other person that day, and gets a perfect score? This is unlikely in the case of wordle, but it could happen, and the MLE Elo rating for that lucky individual essentially becomes infinity. I wrote about this problem in more detail in my post on ranking people with LLMs.

    Look, we know nobody is infinitely good at anything, even the best players sometimes lose. If only there was some way to encode this prior knowledge about players into our algorithm for calculating their Elos with global result data…

    The MAP (Maximum a Posteriori) estimate is a way to take a prior estimate of the distribution of player Elos and update it based on global result data. It’s not that different from the idea of the MLE, but it’s a tad more mathematically involved. You can read about it here, specifically equation 27.
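    A sketch of a MAP estimate, assuming an independent Gaussian prior Normal(mu, sigma²) on each player’s rating; this may not match the formulation in the linked equation 27, and mu = 1500, sigma = 300 are illustrative values, not the bot’s settings. It is the MLE gradient ascent plus one extra term pulling each rating back towards mu.

```python
import math
from collections import defaultdict


def expected_score(r_a, r_b):
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def map_elo(games, mu=1500.0, sigma=300.0, iterations=2000, lr=100.0):
    """MAP ratings assuming an independent Normal(mu, sigma^2) prior per player.

    games: list of (a, b, s_a) with s_a in [0, 1].
    Maximises sum(s*log(E) + (1-s)*log(1-E)) - sum((R - mu)^2 / (2*sigma^2)).
    Multiplied by 400/ln(10), the gradient for player i is
        sum(S_i - E_i) - (400 / ln 10) * (R_i - mu) / sigma^2
    """
    prior_weight = 400 / math.log(10) / sigma**2
    n_games = defaultdict(int)
    for a, b, _ in games:
        n_games[a] += 1
        n_games[b] += 1
    ratings = dict.fromkeys(n_games, mu)

    for _ in range(iterations):
        # Start from the prior's pull towards mu, then add each game's residual
        grad = {p: -prior_weight * (r - mu) for p, r in ratings.items()}
        for a, b, s_a in games:
            residual = s_a - expected_score(ratings[a], ratings[b])
            grad[a] += residual
            grad[b] -= residual
        for p in ratings:
            ratings[p] += lr * grad[p] / n_games[p]
    return ratings
```

    With one perfect result, A now settles around 110 points above mu instead of running off to infinity, and more wins against the same opponent push A further up, as you’d expect.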

    Why use this? Well, if you want to consider all data equally, and you have a lot of it, it can avoid problems where, say, you’ve had an unlucky streak of games over the past 5 days. You can sort of achieve this in the iterated method by lowering K (making each step smaller), but the purest way is the MAP Elo.

    How does the bot work though?

    All of these methods and ideas ended up in the Discord bot, which you can add to your server with the link here. Because of the somewhat dirty method it uses to collect data, it needs both the “read message content” and “server members” privileged intents. I need the latter because, for some reason, the wordle bot’s @user mentions quite often don’t convert properly to user IDs (they aren’t highlighted), so I use the server member list to find the actual names of users and match the failed @s to IDs. This only works if the user hasn’t previously used a nickname; if they have, we are out of luck 🙁
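    The name-matching fallback might look roughly like this. It is a hypothetical sketch, not the bot’s code: it assumes the member list has been fetched into a dict of user ID to known names, and it only accepts a match when exactly one member fits.

```python
def resolve_mention(name, members):
    """Hypothetical fallback: map an unresolved '@name' mention to a user ID.

    members: dict of user_id -> list of names (e.g. username, display name).
    Returns the ID only when exactly one member matches, else None.
    """
    needle = name.lstrip("@").casefold()
    matches = [
        uid for uid, names in members.items()
        if any(n.casefold() == needle for n in names)
    ]
    # Ambiguous or unknown names stay unresolved rather than being guessed
    return matches[0] if len(matches) == 1 else None
```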

    From there, you can use /sync to read channel messages, /elo_leaderboard or /average_leaderboard to generate a leaderboard, with all the settings I’ve talked about, and /personal_stats to see, well, personal stats. It looks like this:

    The bot itself uses Express, running on Node in a Docker container, with a Postgres database to store results.

    Email me at wilson@wilsoniumite.com if you need any help!

    Final thoughts

    As I was working on this I was reminded of that quote: “there are three kinds of lies: lies, damned lies, and statistics”. Each leaderboard, with different methods or settings, looks different. People can jump half a dozen places based on the settings used. None is “clearly” better than the others, and the underlying wordle game is of course easy to cheat at anyway. But nonetheless, I thought this was fun, and the leaderboards are definitely nice when oneself is on the top :D.

  • You should not write library style code! (probably)


    When I really got into writing code (big projects, or for work), I started using libraries a lot. For so many problems, there was already a library that would help me. From doing maths fast with numpy to padding strings with left-pad, there was almost always a library for it. Even now, half the time you want to integrate your app with some other application, it ships its API as a small library of useful functions. As much as I like writing code to solve difficult problems, the project manager in my head is telling me “don’t reinvent the wheel”, so naturally I use a lot of libraries.

    My favorite libraries were always the ones that were nice to me. They would let me pass in data in lots of ways and specify tons of options.

    They would check lots of things for me, often early in their call stack, so that when I inevitably did something wrong, the error message could contain lots of contextual information. Sometimes they’d even tell me what code to run!

    I was taking all this in. And of course, I would also learn how all this is best done by reading the libraries’ code. I might read it to solve a particularly tricky bug, to get around some poor documentation, or perhaps just out of curiosity. There in the open source I could see just how library code looks, and it looks a certain way:

    (This is the first bit of case handling in tqdm, a library that creates a little progress bar for your for loops.) Look at that! Check everything! Some if statements here, some type conversions there, exception handling, default values and assumptions galore. Don’t get me wrong, this is good code! Never have I been upset with the behavior of tqdm, it’s great. It works, it allows me to override what I need to, it gracefully handles what details it cannot work out itself.

    So what’s a young coder such as myself to do? All these libraries I love have shown me the path forward. Sure, it’s a little extra work, but we do that work now to save time in the future, no? The little other code I’m exposed to is all bad and hard to read, so why not take inspiration from what is, to the untrained eye, the only way to write good stuff? We’re not building exactly the same type of program as a library, but I’m still writing stuff that other devs will use, and that’s kind of like a library. The choice seems obvious.1

    Libraries have an unfair advantage in this clash of perceptions though. Two of them, in fact.

    1. They have hundreds if not thousands of users. These users will test edge cases for you, finding bugs faster. The ones you end up using in particular are popular just because they survived when other competing libraries didn’t. As such, it almost doesn’t matter how hard the library was to write or even maintain, the quality of the product is what won out.
    2. They are overrepresented. What fraction of the world’s code lives in libraries, compared to the fraction of code you have seen or interacted with? This is a selection bias, most similar to the friendship paradox: libraries have many users, so most people have interacted with more library codebases than simple chance would suggest.

    It’s not just me. Over and over I see people leave university and their small script projects and start asking “what does real good code look like” and this is the direction they go.

    But let’s say instead you’ve read this and think, OK, fine, most code isn’t like a library. But shouldn’t it still aspire to the same standard? It’s GOOD CODE!

    YAGNI and YAGWI

    The obvious first criticism is You Aren’t Gonna Need It (YAGNI). Those libraries look like that because disparate users really do have different use cases, and even a small convenience can matter a lot if it affects many people. A few dozen people on your team, including yourself, just aren’t as important. Once your code is in production, maybe that assertion you made really doesn’t serve much of a purpose. Sure, down the line it will probably need to be maintained and that assertion might make it easier to catch some edge case, but that’s not as likely as you think.

    But there’s another reason: You Aren’t Gonna Want It. Writing library code may lead to a codebase that, as a whole, is actually worse and harder to maintain. What we are partially talking about here is the robustness principle: “be conservative in what you do, be liberal in what you accept from others”, specifically the part about being liberal in what you accept. People have already written about how it isn’t always a good idea, but the gist is that when you allow lots of different options and inputs, people are going to use them, and then you need to support them all, which is more effort than it’s worth. If one of your colleagues asks “hey, can you have the code take two lists instead of a dict? I can’t be bothered to make it a dict”, just say no. Their convenience is not worth it, at least not yet. Sometimes, sure: you’ve gotten the same unreadable error for the 20th time? Catch it earlier and make it readable. But the key is to build things when you truly need them. One day, if your code really solves some hard problem in a beautiful way, you can turn it into an actual library and distribute it to the world! By then, I assume you know what you are doing anyway.
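    To make the “two lists instead of a dict” request concrete, here is a made-up leaderboard function written both ways (both names are hypothetical, for illustration only). The liberal version has to validate and support every input shape it accepts, forever; the conservative one accepts exactly one.

```python
# Library style: liberal in what it accepts, so every branch must be maintained
def leaderboard_liberal(scores=None, names=None, values=None):
    """Accepts a dict, or two parallel lists, or nothing at all."""
    if scores is None:
        if names is None and values is None:
            return []
        if names is None or values is None or len(names) != len(values):
            raise ValueError("pass a dict, or names and values of equal length")
        scores = dict(zip(names, values))
    elif not isinstance(scores, dict):
        raise TypeError(f"scores must be a dict, got {type(scores).__name__}")
    return sorted(scores, key=scores.get)


# Conservative: one input shape, nothing extra to support
def leaderboard(scores: dict[str, float]) -> list[str]:
    """Rank players by average guesses, lowest (best) first."""
    return sorted(scores, key=scores.get)
```

    The colleague with two lists can still call `leaderboard(dict(zip(names, values)))`, and that one line of conversion lives in their code instead of becoming a branch you support indefinitely.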

    So what does good code look like?

    That’s pretty hard to answer. Lots of things matter for good code; this post is more about what not to do. But the core idea that “not writing library code” points to is: simplicity is valuable. We already mentioned YAGNI as a general concept, but there are plenty of other good blog posts out there about what else to keep in mind.

    Let’s get more abstract

    I want to put code on a spectrum that ranges from “active” to “inactive”. Inactive isn’t the best name (sorry): it’s not meant to mean code that isn’t running or in use, just code that isn’t active in the senses below.

    There are two ways code can be “active”:

    1. It’s being heavily maintained. New features, significant bug fixes, refactoring and the like. This is what we might call “active development”, it’s what we think about most naturally.
    2. It’s regularly being called in a lot of new places. The code itself may not be changing much, but many developers are interacting with it for the first time, putting it through its paces regularly.

    (There is arguably a third way, where users put code through its paces as they use an app in myriad ways, but it isn’t relevant for the purposes of this discussion.)

    The more active your code is, the more it will benefit from looking like library code. Public-facing library code is extremely active.

    In general, code transitions over time from active to inactive. Unless you are actually writing a library, first the code itself stops being under active development, and later the code that uses your code stops being under active development too. At that point it has become inactive. Given that natural transition, in some sense it might be optimal for code to start out as library code and slowly transition to being simple, static, no-error-checking, no-frills code. Intuitively this makes sense: while people are developing on top of your code, give them nice error messages if they use your thing wrong. As soon as the program at large is “done” (insofar as that is possible for any code; another topic), that input-checking code doesn’t have as much value, perhaps less value than its cognitive weight. Actually doing this isn’t realistic though. We’re going to write it one way and it’s going to stay that way, or even go in the opposite direction as it accrues new frills and features. Oh well.

    I didn’t have much more to add to that, it was just a final thought. Thanks for reading this, I hope it resonated a bit or at least was somewhat interesting.

    1. There is also something to be said about sheer complexity. Libraries are hard to read because they do tend to be big: lots of code, complex architectures and structures, the works. But if they are doing it, and they are popular and have great programmers maintaining them, why shouldn’t we all do the same? ↩︎