Bias Compounds, Variance Washes Out

(convergentthinking.sh)

21 points | by jxmorris12 2 days ago

3 comments

  • jstanley 1 hour ago
    This post exhibits a specific LLM tell that I haven't seen mentioned before. It's the style where inanimate objects or concepts are treated as actors, using verbs as if they were actually the one doing something.

    * bias compounds

    * variance diffuses

    * configs store parameters

    * BF16 + RNE (6 bytes) plateaus

    * errors repeat

    * six bytes match ten

    This sort of thing reads really well and conveys the idea in very few words. It's good writing! But in my experience humans don't generally "let nouns verb" as much as LLMs do. Maybe we're just not as clever with words.

    • ForceBru 46 minutes ago
      LLM tell? Inanimate objects and concepts are treated as actors all the time: the series converges, the function reaches its maximum, the sun shines, the wind blows, history repeats itself, words rhyme, interest compounds, etc.

      What's wrong with "configs store parameters"? I guess "parameters are stored in configs" could be more correct, but IMO it means exactly the same thing and sounds just as natural. "Six bytes match ten" is shorthand for "the performance of the algorithm that uses six bytes of storage matches the performance of the algorithm that uses ten bytes of storage". But here we have "performance matches", which is an inanimate concept doing something, so is this an LLM smell too?

      • jstanley 24 minutes ago
        It's not wrong, I tried to make this clear. It's good. It's just unusual, in my experience, for humans to use that kind of wording.

        Yes, everyone says the sun shines and the wind blows; those are specific idioms. No one says bias compounds or variance diffuses or six bytes match ten.

        I'm not saying they shouldn't! They probably should! It's just that LLMs say it much more than humans do.

        > "Six bytes match ten" is shorthand for "the performance of the algorithm that uses six bytes of storage matches the performance of the algorithm that uses ten bytes of storage".

        Yes, I understand this and support it. I am emphatically not saying it is bad writing. It's an unbelievably brilliant piece of terse writing that most human writers would not stumble upon in the course of writing the post.

        • strogonoff 13 minutes ago
          I recommend thinking twice before penalising potential human creativity by saying that a turn of speech that is novel to you is a sign of LLM use. If you base your judgement on an “unusual phrase”, that should be a sign that you are probably unable to tell.
        • thaumasiotes 7 minutes ago
          > It's not wrong, I tried to make this clear. It's good. It's just unusual, in my experience, for humans to use that kind of wording.

          But... it's not unusual in the slightest.

  • nnevod 37 minutes ago
    This feels very much like dithering. I am no expert by far, so I'm likely missing something.
    • yccs27 12 minutes ago
      Yeah, this is basically stochastic dithering applied to numeric floating point quantization instead of image color quantization.

      This makes me wonder whether you could apply different dithering approaches to numeric computations. You cannot use error diffusion or similar methods, because you don't have information about neighboring pixels/computations. Using low-discrepancy sequences might work to reduce stochastic noise, but it could also reintroduce bias for some computations.
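
      A minimal sketch of the dithering analogy, assuming a toy setup that is not from the post: a running sum stored on a coarse grid, with small increments added one at a time. The function names, the grid size STEP and the increment are all made up for illustration. Round-to-nearest drops every increment the same way, so the error is a bias that compounds, while stochastic rounding rounds up with probability equal to the fractional part, so the error is noise that averages out.

```python
import math
import random


def round_nearest(x, step):
    """Round x to the nearest multiple of step (deterministic)."""
    return round(x / step) * step


def round_stochastic(x, step, rng):
    """Round x to a neighbouring multiple of step.

    Rounds up with probability equal to the fractional position of x
    between the two grid points, so the result is unbiased on average.
    """
    scaled = x / step
    low = math.floor(scaled)
    frac = scaled - low
    return (low + (1 if rng.random() < frac else 0)) * step


def accumulate(rounder, n_steps, increment, step):
    """Add `increment` n_steps times, re-quantizing after every addition."""
    total = 0.0
    for _ in range(n_steps):
        total = rounder(total + increment, step)
    return total


# Hypothetical toy parameters: the grid is far coarser than the increment.
STEP = 1.0
INCREMENT = 0.1
N = 10_000

rng = random.Random(0)  # seeded so the run is repeatable
nearest_total = accumulate(round_nearest, N, INCREMENT, STEP)
stochastic_total = accumulate(
    lambda x, s: round_stochastic(x, s, rng), N, INCREMENT, STEP
)

print("exact sum:        ", N * INCREMENT)
print("round-to-nearest: ", nearest_total)
print("stochastic:       ", stochastic_total)
```

      In this toy case round-to-nearest never leaves zero, because each 0.1 is rounded away before the next one arrives. The stochastic version should land near the exact sum of 1000, with a spread of a few tens from run to run.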

  • ongy 1 hour ago
    Gotta admit, I was expecting some hiring/social bias topic.

    This was quite interesting though. Surprised to see it work so well on a real example.