Another realization might be that the humans whose output ChatGPT was trained on were probably already 40% wrong about everything. But let’s not think about that either. AI Bad!
This is a salient point that’s well worth discussing. We should not be training large language models on any supposedly factual information that people put out. It’s super easy to call out a bad research study and have it retracted. But you can’t just explain to an AI that that study was wrong, you have to completely retrain it every time. Exacerbating this issue is the way that people tend to view large language models as somehow objective describers of reality, because they’re synthetic and emotionless. In truth, an AI holds exactly the same biases as the people who put together the data it was trained on.
I’ll bait. Let’s think:
- there are three humans who are 98% right about what they say, and where they know they might be wrong, they say so
- now there is an llm (fuck capitalization, I hate how capitals get shoved everywhere) trained on their output
- now the llm is asked about the topic and computes an answer string
- by definition, that answer string can contain all the probably-wrong things without the proper indicators (“might”, “under such and such circumstances”, etc.)

If you want to say a 40%-wrong llm means 40%-wrong sources, prove me wrong.
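The steps above can be put into rough numbers with a toy model. This is a hypothetical sketch, not something anyone in the thread computed: it assumes every source claim is right with probability p = 0.98, that the model's answer drops the hedges and stitches together k claims, and that those claims are independent. Under those assumptions an answer is fully correct with probability p^k.

```python
# Toy model (hypothetical assumptions, not from the thread):
# - each source claim is correct with probability p
# - the llm's answer strips the hedges and combines k independent claims
# The answer is fully correct only if all k claims are correct: p ** k.

def answer_correct_probability(p: float, k: int) -> float:
    """Probability that an answer built from k independent claims,
    each correct with probability p, contains no wrong claim."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    if k < 0:
        raise ValueError("k must be non-negative")
    return p ** k


if __name__ == "__main__":
    # Sources that are 98% right, answers of increasing length.
    for k in (1, 5, 10, 25):
        wrong = 1.0 - answer_correct_probability(0.98, k)
        print(f"k={k:2d}: chance of at least one unhedged error = {wrong:.0%}")
```

With k = 25 the chance of at least one unhedged error comes out near 40%, even though every individual source claim is 98% right. Whether real answers behave like independent draws is exactly the assumption the argument rests on.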
It’s more up to you to prove that a hypothetical edge case you dreamed up is more likely than what happens in a normal bell curve. Given the size of typical LLM training data this seems futile, but if that’s how you want to spend your time, hey, knock yourself out.
Lol. Be my guest and knock yourself out, dreaming you know things.
Yes, it is. But not in, like, a moral sense. It’s just not good at doing things.