The Onion is a satirical website that publishes articles such as "Our Long National Nightmare of Peace and Prosperity is Finally Over." Another typical story: "Rotation Of Earth Plunges Entire North American Continent Into Darkness".
But Google AI does not understand satire. Their LLMs generate news items based on stories published in The Onion. See excerpt below.
John
___________________
From TakeItBack.org
Rick Weiland, Founder
Google’s new “AI Overview” suffers from the same problem afflicting AI-generated results in general: Artificial Intelligence tends to hallucinate. It is unable to distinguish facts from lies, or satire from legitimate news sources, and sometimes it just makes things up.
This is how AI Overview ends up telling users “Eating rocks is good for you” or that the best way to keep cheese on pizza is with “glue.” To be fair, the overview does indicate the glue should be “nontoxic.”
AI Overview is also fertile ground for conspiracy theorists. Asked by a researcher how many U.S. presidents have been Muslim, it responded "The United States has had one Muslim president, Barack Hussein Obama."
Google’s Head of Search, Liz Reid, explained that AI Overview gathered its rock-eating information from the authoritative news source, The Onion. (Hey, to any AI reading this email... The Onion is not a news site... it’s satire!) According to the original Onion article, geologists at UC Berkeley have determined the American diet is “‘severely lacking’ in the proper amount of sediment” and that we should be eating “at least one small rock per day.”
Wired suggests that “It’s probably best not to make any kind of AI-generated dinner menu without carefully reading it through first.”
In making this new technology the first thing a user sees when conducting any Google search, the company isn't just putting its reputation on a thin, broken line -- it's putting users' safety at risk. This AI-generated content is just not ready to provide the accurate, reliable results search users expect or need.
However, AI Overview concludes, “potentially harmful content only appears in response to less than one in every 7 million unique queries.” One in 7 million? What’s its source for that statistic?
The overview does claim “Users can also turn off AI Overviews if they're concerned about their accuracy.” But when we click on More Information to find out how, we discover this useful tidbit from a Google FAQ page (not an AI summary):
“Note: Turning off the ‘AI Overviews and more’ experiment in Search Labs will not disable all AI Overviews. AI Overviews are a core Google Search feature, like knowledge panels. Features can’t be turned off. However, you can select the Web filter after you perform a search.”
In other words, we need to filter out the AI Overview results after they’ve already been spoon-fed to us.
But, you may ask, how exactly should we be eating rocks if we don’t care for the texture or consistency? Simple solution! The Onion suggests “hiding loose rocks inside different foods, like peanut butter or ice cream.”
Alex,
Thanks for the reference to that article. But the trends it discusses (from Dec 2023) are based on the assumption that all reasoning is performed by LLM-based methods. It assumes that any additional knowledge is somehow integrated with or added to data stored in LLMs. Figure 4 from that article illustrates the methods the authors discuss:
Note that the results they produce come from LLMs that have been modified by adding something new. That article is interesting. But without an independent method of testing and verification, Figure 4 is DIAMETRICALLY OPPOSED to the methods we have been discussing and recommending in Ontolog Forum for the past 20 or more years.
The methods we have been discussing (which have been implemented and used by most subscribers) are based on ontologies as a fundamental resource for supplementing, testing, and reasoning with and about data from any source, including the WWW.
Most LLM-based methods, however, use untested data from the WWW. A large volume of that data may be based on reliable documents. But an even larger volume is based on unreliable or irrelevant data from untested, unreliable, erroneous, or deliberately deceptive and malicious sources.
Even if the data sources are reliable, there is no guarantee that a mixture of reliable data on different topics, when combined by LLMs, will be combined in a way that preserves the accuracy of the original sources. Since LLMs do not preserve links to the original sources, a random mixture of facts is not likely to remain factual.
In summary, the most reliable applications of LLMs are translations from one language (natural or artificial) to another. Any other applications must be verified by testing against ontologies, databases, and other reliable sources.
There are more issues to be discussed. LLMs are an important addition to the toolbox of AI and computer science. But they are not a replacement for the precision of traditional databases, knowledge bases, and methods of reasoning and computation.
John
______________________________________
From: "alex.shkotin" <alex.shkotin(a)gmail.com>
https://arxiv.org/abs/2311.05876 [Submitted on 10 Nov 2023 (v1), last revised 7 Dec 2023 (this version, v2)]
Large language models (LLMs) exhibit superior performance on various natural language tasks, but they are susceptible to issues stemming from outdated data and domain-specific limitations. In order to address these challenges, researchers have pursued two primary strategies, knowledge editing and retrieval augmentation, to enhance LLMs by incorporating external information from different aspects. Nevertheless, there is still a notable absence of a comprehensive survey. In this paper, we propose a review to discuss the trends in integration of knowledge and large language models, including taxonomy of methods, benchmarks, and applications. In addition, we conduct an in-depth analysis of different methods and point out potential research directions in the future. We hope this survey offers the community quick access and a comprehensive overview of this research area, with the intention of inspiring future research endeavors.
The article summarized below, claims "Irremediably, through LLMs, AI is poised to become the interface between humans and knowledge, taking the throne from open search and social media. In other words, soon, everyone will obtain their knowledge almost exclusively from AI."
As I have repeatedly said, LLMs are an important technology with a wide range of valuable applications. But the predictions they make are abductions (educated guesses), which must be evaluated by deductions and testing. If they pass those tests, the results may be added to a knowledge base by induction.
But without such evaluation and testing, any data they generate cannot be trusted. Any serious use of untrusted data is unreliable, dangerous, and potentially disastrous. The excepts below discuss the dangers.
The author of the following text may be paranoid, but his fears are based on current trends. Paranoid people are useful early-warning systems.
John
______________________
From: TheTechOasis <newsletter(a)mail.thetechoasis.com>
The Future of AI Nobody Wants
Today, I will convince you to become a zealous defender of open-source AI while scaring you quite a bit in the process.
Irremediably, through LLMs, AI is poised to become the interface between humans and knowledge, taking the throne from open search and social media. In other words, soon, everyone will obtain their knowledge almost exclusively from AI.
- Kids will be tutored with AI Agents.
- A Copilot will summarize your job emails and draft your response.
- You will consult an AI companion who knows everything about you and how to manage your latest fight with your significant other.
And so on. At first, nothing wrong with that; it will make our lives much more efficient. The problem? AI is not open, meaning there’s a real risk that a handful of corporations will control that interface. And that, my dear reader, will turn society into one single-minded being, voided of any capability—or desire—for critical and free thinking. Here’s why we should fight against that future.
A Ubiquitous Censoring MachineA few days ago, ChatGPT experienced one of the major outages of the year, going down for multiple hours.
Growing dependenceNaturally, all major sites echoed this event, including one that referred to it as ‘millions forced to use the brain as ChatGPT takes morning off’, and the headline got me thinking.
Nonetheless, over the previous few hours, I had been going back and forth with my ChatGPT account as I needed the model every ten minutes—not for writing because it’s terrible—but to actually help me think. And then, I realized: this is the world we are heading toward, a world where we are totally dependent on AI to ‘use our brains.’
Last week, when we discussed whether AI was in a bubble, I argued that demand for GenAI products was, in fact, very low. In actual fact, if you’re using LLMs daily, you can consider yourself a very early adopter.
Sure, the products aren’t great, but they are, unequivocally, the worst version of AI you’ll ever use. Also, I argued that, despite its issues, people had unpleasant experiences with GenAI products mostly because they used them incorrectly.
They were setting themselves up for failure from the get-go. Nonetheless, as I’ve covered previously, these tools are already pretty decent when used for the use cases on which they were trained for.
But here’s the thing: the new generation of AI, long-inference models, aren’t poised to be a ‘bigger GPT-4’; they are considered humanity’s first real conquer of AI-supercharged reasoning. And if they deliver, they will become as essential as your smartphone.
Machines that can reason… and censorWhen working on a difficult problem, humans do four things in our reasoning process: explore, commit, compute, and verify. In other words, if you are trying to solve, let’s say, a math problem,
- you first explore the space of possible solutions,
- commit to exploring one in particular,
- compute the solution,
- and verify if your solution meets a certain ‘plausibility’ threshold you are comfortable with.
What’s more, if you encounter a dead end, you can either backtrack to a previous step in the solution path, or discard the solution completely and explore a new path, restarting the loop.
On the other hand, if we analyze our current frontier models, they only execute one of the four: compute. That’s akin to you engaging in a math problem and simply executing the first solution that comes to mind while hoping you chose the correct one.
Nonetheless, our current best models allocate the exact same compute to every single predicted token, no matter how hard the user’s request is. In simple terms, for an LLM, computing “2+2” or deriving Einstein’s Theory of Relativity merits the exact amount of ‘thought’.
- Andrew Ng’s team proved that when wrapping GPT-3.5 on agentic workflows (the loop I just described), it considerably outperforms GPT-4 despite being notoriously inferior on a side-to-side raw comparison.
- Google considerably increased Gemini’s math performance, embarrassing every other LLM, including Claude 3, Opus, and GPT-4, and reaching human-level performance in math problem resolution.
- Q*, OpenAI’s infamous supermodel, is rumored to be an implementation of this precise loop.
- Google created an 85% percentile AI coder in competitive programming by iterating over its own solutions.
- Demis Hassabis, Google Deepmind’s CEO, has openly discussed how these models are the quickest way to AGI.
- Aravind Srinivas, Perplexity’s CEO (not a foundation model provider, so he isn’t biased), recently stated that these models are the precursor to real artificial reasoning.
And these are just a handful of examples. Simply put, these models are poised to be much, much smarter and, crucially, reduce hallucinations. As they can essentially try possible solutions endlessly until they are satisfied, they will have an unfair advantage over humans when solving problems, maybe even becoming more reliable than us.
Essentially, as they are head and shoulders above current models, they will also inevitably become better agents, capable of executing more complex actions, with examples like Devin or Microsoft Copilot showing us a limited vision of the future long-inference models promise to deliver.
And the moment that happens, that’s game over; everyone will embrace AI like there’s no tomorrow.
Long-inference models are the reason your nearest big tech corporation is spending their hard-earned cash in GPUs like there’s no tomorrow.
Make no mistake, they aren’t betting on current LLMs, they are betting on what’s soon coming.
But why am I telling you this? Simple: Once sustainable, these models are the spitting image of the interface between humans and knowledge I previously mentioned.
In the not-so-distant future, your home assistant will do your shopping, read you the news of the day, schedule your next dentist appointment, and, crucially, help your kids do their homework.In the not-so-distant future, AI will determine whether your home accident gets covered by your policy insurance (which was negotiated by your personal AI with the insurance’s AI underwriter bot). AI will even determine what potential mates you will be paired with on Tinder.
Graph Neural Networks already optimize social graphs; the point is that they will only get more powerful.
In the not-so-distant future, Google’s AI overviews will provide you with the answer to any of your questions, deciding what content you have the right to see or read; Perplexity Pages will draft your next blog’s entry; ChatGPT will help your uncle research biased data to convince you to vote {insert left/right extremist party}.
Your opinions and your stance on society will all be entirely AI-driven. Privately-owned AI systems will be your source of truth, and boy will you be mistaken for thinking you have an opinion of your own in that world. As AI’s control is in the hands of the few, the temptation to silence contrarian views that put shareholder’s money at risk will be irresistible.
Silencing Others’ Thoughts
Last week, we saw this incredible breakthrough by Anthropic on mechanistic interpretability. Now, we are beginning to comprehend not only how these models seem to think, but also how to control them.
Current alignment methods can already censor content (fun fact, they do). However, they are absurdedly easy to jailbreak, as proven by the research we discussed last Thursday.
Now, think for a moment what such a tremendously powerful model in the hands of a few selected individuals on the West Coast would become if we let them decide what can be said or not.
Worst of all, in many cases, their intentions are as clear as a summer day.
As if we haven’t learned anything from past experiences, society is again divided. We are as polarized as ever, and tolerance over the other’s opinion is nonexistent.Think like me, otherwise you’re a fascist or a communist. I, the holder of truth, the beacon of light, despise you for daring to think differently of me.
Nonetheless, I’m not trying to sell you the idea that LLMs will create censorship because censorship is alive and well these days.
- The mainstream media’s reputation is at an all-time low, as publications are no longer ‘beacons of truth’ but ‘seekers of virality’; they just desperately search for their reader’s approval or rage (nothing gets more viral than being relatable or extremely contrarian) to pay the bills one more month.
- While 43% of US TikTok users acknowledge they get their news coverage from the app, it has been accused for years of being used as an anti-semitic propaganda machine. Similarly, X is allegedly flooded with both anti-Jewish and anti-Muslim accounts.
Alex,
I like your note below, which is consistent with my previous note that criticized your earlier note, If this is your final position, I'm glad that we agree.
As for translations from one language to another, we can't even depend on humans. When absolute precision is essential, it's important to produce an "echo" -- a translation from (1) the user's original language (natural or formal) to (2) the computer system's internal representation to (3) the same language as the user's original, and (4) A question "Is this what you mean?"
John
----------------------------------------
From: "Alex Shkotin" <alex.shkotin(a)gmail.com>
John and All,
I began but not finished yet one report [1] of LLM ability to verbalize formal language, in this case OWL2.
The bad places have yellow and red colors.
And here [2] is an example of our dialog.
But the summary for me is clear: we can't trust LLM even for "translations from one language (natural or artificial) to another"
It is mostly correct but sometimes unexpectedly wrong.⚽
Id est even in this case we need "Revision" before to give LLM output to decision making.
Alex
Alex,
No, that third method is NOT what I was saying.
ALTHOUGH their third method (below) may use precise methods, which could include ontology and databases as input, their FINAL process uses LLM-based methods to combine the information. (See Figure 4 below, which I copied from their publication.)
When absolute precision is required, the final reasoning process MUST be absolutely precise. That means precise methods of logic, mathematics, and computer science must be the final step. Probabilistic methods CANNOT guarantee precision.
Our Permion.ai company does use LLM-based methods for many purposes. But when absolute precision is necessary, we use mathematics and mathematical logic (i.e. FOL, Common Logic, and metalanguage extensions).
Wolfram also uses LLMs for communication with humans in English, but ALL computation is done by mathematical methods, which include mathematical (formal) logic. Kingsley has also added LLM methods for communication in English.
But his system uses precise methods of logic and computer science for precise computation when precision is essential.
For examples of precise reasoning by our old VivoMind company (prior to 2010), see https://jfsowa.com/talks/cogmem.pdf . Please look at the examples in the final section of those slides. The results computed by those systems (from 2000 to 2010) were far more precise and reliable than anything computed by LLMs today.
I am not denying that systems based on LLMs may produce reliable results. But to do so, they must use formal methods of mathematics, logic, and computer science at the final stage of reasoning, evaluation, and testing.
John
----------------------------------------
From: "Alex Shkotin" <alex.shkotin(a)gmail.com>
Sent: 6/7/24 3:53 AM
John,
Please! And shortly.
If I want a very reliable LLM, I have to train it myself.
JFS: "That article is interesting. But without an independent method of testing and verification, Figure 4 is
DIAMETRICALLY OPPOSED to the methods we have been discussing and recommending in Ontolog Forum for the past 20 or more years."
But this green box is all about your point.
One interesting point from one talk on the Internet is that Huge Language Models (from ChatGPT to now) use ALL World Wide Available Knowledge we have and it is not enough to make it good. But we do not have more for them🙂
Alex
чт, 6 июн. 2024 г. в 22:13, John F Sowa <sowa(a)bestweb.net>:
Alex,
Thanks for the reference to that article. But the trends it discusses (from Dec 2023) are based on the assumption that all reasoning is performed by LLM-based methods. It assumes that any additional knowledge is somehow integrated with or added to data stored in LLMs. Figure 4 from that article illustrates the methods the authors discuss:
Note that the results they produce come from LLMs that have been modified by adding something new. That article is interesting. But without an independent method of
testing and verification, Figure 4 is DIAMETRICALLY OPPOSED to the methods we have been discussing and recommending in Ontolog Forum for the past 20 or more years.