dgs_sgd a day ago

I have to disagree with the author's argument for why hallucinations won't get solved:

> If there were a way to eliminate the hallucinations, somebody already would have. An army of smart, experienced people, backed by effectively infinite funds, have been hunting this white whale for years now without much success.

Research has been going on for what, like 10 years in earnest, and the author thinks they might as well throw in the towel? I feel like the interest in solving this problem will only grow! And there's a strong incentive to solve it for the important use cases where a non-zero hallucination rate isn't good enough.

Plus, scholars have worked on problems for _far far_ longer and eventually solved them, e.g. Fermat's Last Theorem took hundreds of years to solve.

  • simonw a day ago

    The problem with hallucinations is that they really are an expected part of what LLMs are used for today.

    "Write me a story about the first kangaroo on the moon" - that's a direct request for a hallucination, something that's never actually happened.

    "Write me a story about the first man on the moon" - that could be interpreted as "a made-up children's story about Neil Armstrong".

    "Tell me about the first man on the moon" - that's a request for factual information.

    All of the above are reasonable requests of an LLM. Are we asking for a variant of an LLM that can flat refuse the first prompt because it's asking for non-real information?

    Even summarizing an article could be considered a hallucination: there's a truth in the world, which is the exact text of that article. Then there's the made-up shortened version which omits certain details to act as a summary. What would a "hallucination free" LLM do with that?

    I would argue that what we actually want here is for LLMs to get better over time at not presenting made-up information as fact in answer to clear requests for factual information. And that's what we've been getting - GPT-5 is far less likely to invent things in response to a factual question than GPT-4 was.

    • janalsncm 21 hours ago

      > What would a "hallucination free" LLM do with that?

      To me, there’s a qualitative question of what details to include. Ideally the most important ones. And there’s the binary question of whether it included details not in the original.

      A related issue is that preference tuning loves wordy responses, even if they’re factually equivalent.

  • janalsncm 21 hours ago

    The author gave two arguments, a weak one and a stronger one. You quoted the weaker one. The OpenAI paper contains the stronger one: models are trained and evaluated in ways that reward guessing over saying “idk”, because a guess might be right while “idk” never scores.

    The strongest argument in my mind for why statistical models cannot avoid hallucinations is that reality is inherently long-tail: there simply isn’t enough data, or enough FLOPs to consume it. Even in the limited domain of chess, LLMs cannot avoid hallucinating moves that do not exist, let alone give you the best move, and scaling training data up to cover every position is computationally impossible.

    And even if it were possible (if still expensive), it wouldn’t be practical at all: your phone can run a better chess algorithm than the best LLM (see the sketch at the end of this comment).

    All of this is to say, going back to your Fermat’s last theorem point, that we may eventually figure out a faster and cheaper way, and decide we don’t care about tall stacks of transformers anymore.
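
    To make the chess point concrete: checking whether a proposed move is even legal is trivial for a classical program, while a model emitting moves as text has no such guarantee. A minimal sketch, assuming the python-chess library; propose_move is a made-up stand-in for whatever model generates the move:

        # Minimal sketch: checking whether a model-proposed chess move is legal.
        # `propose_move` is a hypothetical stand-in for an LLM emitting a move as text;
        # the legality check itself uses the python-chess library.
        import chess

        def propose_move(fen: str) -> str:
            """Placeholder: pretend this is an LLM returning a move in SAN."""
            return "Nf3"

        board = chess.Board()            # standard starting position
        san = propose_move(board.fen())

        try:
            move = board.parse_san(san)  # raises ValueError for illegal or unparseable moves
            board.push(move)
            print(f"{san} is legal; {board.legal_moves.count()} replies to consider")
        except ValueError:
            print(f"{san} is not a legal move here, i.e. a hallucinated move")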

  • ACCount37 21 hours ago

    It really depends on how strictly you define "solved".

    If for "solved", you want AI to be as accurate and reliable as simply retrieving the relevant data from an SQL database? Then hallucinations might never truly "get solved".

    If for "solved", you want AI to be as accurate and reliable as a human? Doable at least in theory. The bar isn't high enough to remain out of reach forever.

    To me, this looks like an issue of self-awareness - and I mean "self-awareness" in a very mechanical, no-nonsense way: "having usable information about itself and its own capabilities".

    Humans don't have perfect awareness of their own knowledge, capabilities, or competence. But LLMs have even less of each. They can sometimes recognize their own inability, uncertainty, or lack of knowledge, but not always. That seems very hard, but not entirely impossible, to rectify.
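
    One way to make that mechanical notion of self-awareness measurable is calibration: compare how confident the model says it is with how often it is actually right. A toy sketch; the (confidence, correct) pairs below are made up purely for illustration:

        # Toy calibration check: does stated confidence match actual accuracy?
        # The (confidence, correct) pairs are fabricated for illustration only.
        answers = [
            (0.9, True), (0.9, False), (0.8, True), (0.95, True),
            (0.7, False), (0.9, True), (0.85, False), (0.6, True),
        ]

        avg_confidence = sum(conf for conf, _ in answers) / len(answers)
        accuracy = sum(correct for _, correct in answers) / len(answers)

        # A well-calibrated model has a small gap; a positive gap means overconfidence.
        print(f"stated confidence: {avg_confidence:.2f}")
        print(f"actual accuracy:   {accuracy:.2f}")
        print(f"calibration gap:   {avg_confidence - accuracy:+.2f}")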

  • rmwaite a day ago

    Exactly. I mean, if you had asked people 20 years ago how plausible today's LLMs (warts and all) would be, I think you would have gotten similar cynicism.

  • tomieinlove 21 hours ago

    Hallucinations aren't being eliminated because the optimal number of hallucinations is far more than zero.

jsnell 20 hours ago

> Where I suspect LLM output won't help much. [...] Low level infrastructure code, the kind I've spent my whole life on

That's just wishful thinking.

First, that kind of code is one of the things it's most valuable for AI to get good at, so a lot of effort will be directed there. More efficient low-level systems help with recursive self-improvement. Writing JS frontends doesn't.

Second, systems programming is a domain with objective quality metrics, which makes it a good RL target.
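
To sketch what I mean by objective quality metrics: a reward for generated systems code can be computed mechanically from whether it compiles, passes its tests, and beats a benchmark baseline. A rough sketch, with the compile/test/benchmark results assumed to come from some hypothetical harness:

    # Rough sketch of an objective reward signal for generated systems code.
    # compiled / tests_passed / speedup are assumed to come from a hypothetical
    # build, test, and benchmark harness; no specific tool is implied.
    def reward(compiled: bool, tests_passed: int, tests_total: int, speedup: float) -> float:
        if not compiled:
            return -1.0                       # hard failure: refuses to build
        score = tests_passed / tests_total    # correctness component, 0..1
        if tests_passed == tests_total:
            score += max(0.0, speedup - 1.0)  # reward speed only once it is correct
        return score

    print(reward(True, 42, 42, 1.3))   # fully correct and 30% faster -> 1.3
    print(reward(True, 40, 42, 2.0))   # fast but wrong -> no speed bonus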

Third, the models are already very good in that domain. They obviously don't have perfect taste yet, so having an experienced systems programmer with a vision guiding the model will be much better than fire and forget.

When implementing a new optimisation costs 10% of what it used to cost when done manually, the tradeoff for which optimisations are worth implementing changes.

But in addition it's way easier to justify exploration and speculative ideas when they're so cheap to test out.

losvedir 21 hours ago

I disagree with the zero-sum perspective that the valuations only make sense with mass layoffs. The economy is big. You don't need that much of a marginal increase in growth to justify huge investments.

Havoc a day ago

The "no mass layoffs" claim strikes me as incorrect. As fashionable as it currently is to claim zero impact, there are enough signs that there will be impact, e.g.:

* Translators & transcription

* Waymo doing real world shuttling of people (not strictly genAI I guess)

* AI art being used in places where it's "good enough" - blogs, PowerPoint decks, etc.

* Call centers

* Boilerplate coding

* Beginnings of robots "understanding" their surroundings for trivial warehouse tasks

...a job apocalypse that does not make, but to me it is sufficient evidence that replacement is feasible in principle for some things, and combined with the trajectory we'll see *some* displacement and societal impact. Whether that constitutes "mass" layoffs remains to be seen, but I feel even a surprisingly low percentage will rattle society's foundations. Just because my job might be safe doesn't mean a sudden +5% in unemployment doesn't affect my community.

aetherson a day ago

These aren't predictions. They're vague sentiment.

It would be interesting if the author tried to express these as falsifiable predictions with a timeline, optionally with some measure of confidence.

  • janalsncm 21 hours ago

    Reminds me of this really interesting marathon of an article from Dan Luu: https://danluu.com/futurist-predictions/

    A lot of futurist predictions are too vague to be falsifiable. Or they’re already happening, in which case they’re not really predictions.

m_a_g a day ago

There are many people who are adamant that this bubble will burst. Those of you who believe that: did you sell all your S&P 500 stocks?

I have many friends and coworkers who think the AI race will come crashing down. But they are not selling their stocks. I’d love for some AI skeptic to help me make sense of this mess.

  • etrautmann a day ago

    Selling stock can trigger a huge tax bill (in the US, realized long-term gains are typically taxed at 15-20%). It can be rational to think the market will retreat significantly without selling everything you own.

  • Tarkus2038 13 hours ago

    "Bears sound smart, bulls get rich".

    "The market can stay irrational for longer than you can stay solvent" (or in this case, for longer than you're willing to lose on gains).

    Some individuals are simply less risk averse (or more influenced by FOMO) and more willing to play the musical chairs game for longer.

  • pram a day ago

    The S&P 500 has mostly been moving sideways with the Mag 7 excluded, so if AI revenues don’t materialize the result for the index is predictable, because the growth is so concentrated.

  • anthomtb a day ago

    Do they also believe that the GenAI bubble is propping up the value of the S&P 500? If so, they are behaving irrationally. If not, then it is perfectly reasonable to maintain an S&P 500 investment while asserting the AI bubble will burst.

justcallmejm a day ago

“panoply of grifters and chancers and financial engineers” — accurate

Yet, let’s look at intelligence for a sec… it’s been evolving for quite some time and isn’t stopping at humans. Humans are building intelligence at a rate far faster than biological evolution. It is almost inevitable (barring humans wiping ourselves out via any number of catastrophic failures of governance) that we will build intelligence that supersedes our own. Yeah?

I’m a co-founder of Aloe (https://aloe.inc) - a generalist AI that recently became state of the art on the GAIA benchmark. As I was hand-checking the output of our test to ensure Aloe had done the task, not just found answers in some leak online, I had a real come-to-Jesus moment when it hit me that this agent is already a better problem-solver than most of the adults I’ve worked with in my career. And this is the floor of its capability.

It is a humbling moment to be human. The few humans at the helms of companies developing these technologies will inevitably reshape the trajectory of humanity.

Legend2440 21 hours ago

>The central goal of GenAI is the elimination of tens of millions of knowledge workers.

There is no central goal. We made a cool toy and we’re trying to figure out what to do with it.

Starting with that incorrect premise means all your conclusions are wrong.