
Much like their tech creators, AI is hallucinating

HOSTS Alec Renehan & Sascha Kelly | 10 June, 2023

We’re back! Did you miss us?

Today Alec and Sascha discuss a legal debacle involving a lawsuit against Colombian airline, Avianca. Plaintiff Roberto Mata claimed to be injured by a serving cart during a flight, but a brief presented by his lawyer, Steven Schwartz, created chaos. The brief, prepared with the assistance of ChatGPT – an AI program – cited several non-existent court cases. Schwartz admitted to his reliance on AI for legal research, sparking a discussion about the reliability of artificial intelligence in law. We ask – why is AI hallucinating, and what do we have to do to stop believing it?

Tell us what you think of The Dive – email us at thedive@equitymates.com. Follow our Instagram here, or find out more here. Stay engaged with the Equity Mates community by joining our forum

In the spirit of reconciliation, Equity Mates Media and the hosts of The Dive acknowledge the Traditional Custodians of country throughout Australia and their connections to land, sea and community. We pay our respects to their elders past and present and extend that respect to all Aboriginal and Torres Strait Islander people today. 

*****

This podcast is intended for education and entertainment purposes. Any advice is general advice only, and has not taken into account your personal financial circumstances, needs or objectives. 

Before acting on general advice, you should consider if it is relevant to your needs and read the relevant Product Disclosure Statement. And if you are unsure, please speak to a financial professional. 

Equity Mates Media operates under Australian Financial Services Licence 540697.

The Dive is part of the Acast Creator Network.

Sascha: [00:00:03] I'm Sascha Kelly and welcome to The Dive, the podcast that asks why business news has to be your business. As well as producing podcasts and hosting The Dive, I also occasionally write things for Equity Mates, like our Get Started Investing newsletter. So yesterday I was putting together a list of artificial intelligence ETFs that you could add to a watch list. I mean, AI is everywhere at the moment, and I thought that would be a kind of fun article to write. As I finished my research, I asked ChatGPT to summarise a website that I'd found. Within a few moments the program had written me an article with investment options called Tech Titans, Emerging AI Stars and Ethical AI Funds. To be honest, they sounded pretty good, and that's what made me think, hang on, I haven't heard of any of these ETFs before. It only took me a few more moments of reading the output to realise that none of these existed. That's why I hadn't heard of them, and I would be better served just doing the hard work and writing up my own research. On the other side of the world, a lawyer named Steven Schwartz decided to do pretty much the same as me, turning to ChatGPT to help him finish some work. The program helped him finish a ten-page brief, and the product cited more than half a dozen cases. Except, after he submitted it, no one could find these references, because ChatGPT had completely made them up. Much like their tech creators at Burning Man, AI is hallucinating. It's Wednesday, the 7th of June, and today I want to know: why is this technology telling us lies, and what are the repercussions for those of us who believe it? To talk about this today, I'm joined by my colleague here at Equity Mates and, I shouldn't sell him short, the co-founder. It's Alec Renehan. And Alec, welcome back to The Dive. 

Alec: [00:01:57] Good to be back, Sascha. It's been a long month without The Dive. We're a man down. Hopefully Darcy is listening to this, even as he's on the other side of the world. But it's good to be back. 

Sascha: [00:02:09] Yeah, we might have a UK correspondent sometime soon, but let's get straight into it. Tell me about the legal case that led to this lawyer screwing up. What exactly happened? 

Alec: [00:02:20] So a man named Roberto Mata sued the Colombian airline Avianca, saying he was injured when a metal serving cart struck his knee during a flight to New York. One of the lawyers representing Roberto Mata was Steven Schwartz. Now Avianca, the airline, asked a Manhattan federal judge to toss out the case. And Mata's lawyers, including Steven Schwartz, vehemently objected to this, and they submitted a ten-page brief in response that cited more than half a dozen relevant court decisions. 

Sascha: [00:02:57] And this is where all the trouble begins. It's this ten page brief that sets off another chain of events. 

Alec: [00:03:05] That's right. So the brief cited a number of cases. There was Martinez v. Delta Airlines, Zicherman v. Korean Airlines, and of course the very famous case Varghese v. China Southern Airlines, with its discussion of, quote, the tolling effect of the automatic stay on a statute of limitations. Just one hitch, Sascha. None of those cases were real, and that discussion wasn't real either. The airline's lawyers, when they read the brief, couldn't find the cases. The judge himself read the brief and couldn't find the cases. ChatGPT hallucinated them all. 

Sascha: [00:03:43] I love that when ChatGPT makes stuff up, it doesn't make stuff up by halves. It gives you really catchy names and it really commits to giving you something that you could sell. 

Alec: [00:03:53] And speaks with so much certainty as well. 

Sascha: [00:03:56] So he used ChatGPT to write the brief and that's what happened. It just completely went astray and made up all these references for him, and he just took them sight unseen. 

Alec: [00:04:09] Yeah, that's right. Now, a little bit about Steven Schwartz. He is a lawyer at the firm Levidow and Oberman, and he's practised law in New York for three decades. 

Sascha: [00:04:20] So he's pretty experienced. 

Alec: [00:04:21] Yeah, some would say he should know better, but I guess ChatGPT is new and we're all trying to figure it out. When the judge and the opposing lawyers couldn't find any of these cases, he realised he'd screwed up in a big way. He threw himself on the mercy of the court, saying in an affidavit that he used ChatGPT to do his legal research, quote, A source that has revealed itself to be unreliable. 

Sascha: [00:04:50] I've got to give him some kudos for owning up to it as soon as it became apparent. But was it basically that he just didn't want to check his work? He was running out of time, wanted to cut corners? 

Alec: [00:05:00] No, Sascha, he did check his work. He asked ChatGPT to verify that the cases were real. And it said, yes, they were. Now, this story has almost finished playing out. But Schwartz and one of his colleagues who is also representing Roberto Mata, Peter LoDuca, face a hearing on the 8th of June that will decide on possible sanctions. 

Audio Clip: [00:05:24] My worst fears are that we cause significant harm, that we, the field, the technology, the industry, cause significant harm to the world. I think if this technology goes wrong, it can go quite wrong, and we want to be vocal about that. We want to work with the government to prevent that from happening. But we try to be very clear-eyed about what the downside case is and the work that we have to do to mitigate that. 

Sascha: [00:05:48] Look, as much as we laugh at Steven Schwartz, this isn't the first example of fake facts, and the term AI hallucinations has actually been coined to talk about this phenomenon. I asked ChatGPT to explain exactly what AI hallucinations are, and it says: AI hallucination refers to instances where artificial intelligence systems generate outputs that are not based on real data or observations, but rather fabricate information or generate misleading results. And it also reassured me: it's important to note that AI hallucinations are not intentional efforts by AI systems to deceive or mislead. They are the result of limitations and biases inherent in the training data and algorithms used. Researchers are actively working to address these issues and improve the accuracy and reliability of AI systems. Alec, that's a lot of words, and to me it just reads: sometimes we make things up, this is a fancy way of explaining why, and please don't hate us. 

Alec: [00:06:50] Yeah. When I think about this topic, it really reminds me of how we were taught about computers and the internet back in the day when we were at school. Remember how, like, you were never allowed to cite Wikipedia as a source? And there was a whole bunch of, like, you can't just trust online sources, you've got to go to reliable sources, and all of that stuff. And over time, the internet developed more and it got more reliable. But also we got smarter about how we use Google and what sources we use and stuff like that. I feel like in time we'll look back and there will be a similar conversation around how we use AI and how we use it properly and the prompts we use, similar to how we use Google properly to find information and not be misled. 

Sascha: [00:07:36] Yeah, I think it's the multiple sources thing. That's what I remember getting drilled into me at school: you can read it on Wikipedia, but try to find another reference that isn't also citing Wikipedia somewhere else on the internet. 

Alec: [00:07:49] A tip for any high school or university students out there. Just go to Wikipedia, see what references are being referenced on Wikipedia and then claim them as your reference. 

Sascha: [00:08:00] Yeah, that's a good old trick that served us well. So Alec, what are some other examples of how AI is hallucinating at the moment? 

Alec: [00:08:09] So you and Steven are certainly not alone, AI is hallucinating all over the world. Probably the most notable one we saw in the early days of this chatbot explosion was when Google first unveiled its chatbot, Bard. It explained that the James Webb Space Telescope had captured the very first pictures of a planet outside of our solar system. Not true. ChatGPT famously has claimed that a few countries have left the European Union, which wasn't true. It has also claimed that the Philadelphia Eagles won the Super Bowl in 2023. Not true. This one I found particularly fascinating. The New York Times asked ChatGPT when The New York Times first reported on artificial intelligence. Its answer was July 10, 1956, in an article titled Machines Will Be Capable of Learning, Solving Problems, Scientists Predict, about a conference at Dartmouth College. Now, the conference in 1956 was real, but the article was not. That in itself is mildly interesting. But what I found more interesting was that The New York Times then went to the other chatbots. They asked Microsoft's Bing and Google's Bard the same question and they got the same answer. Microsoft's Bing even provided The New York Times with a URL, a link to the 1956 article, that looked very much like a real New York Times web link. But that web page, much like the article, never existed. So if you try to triple-source your work by asking all three chatbots, be careful, because they might all be having the same hallucination. 

Sascha: [00:09:57] This makes me think it's so interesting. We've invented this program to help alleviate our workloads, but in doing so we're creating more work, having to triple-check the facts that it's providing us. So let's take a quick break there, but then we're going to come back and you've promised to share why this is happening, and also how I can change my prompts to limit these hallucinations when I ask ChatGPT for answers. Welcome back to The Dive, the podcast that asks why business news has to be your business. I'm joined by my colleague Alec Renehan, and today we're talking about AI hallucinations. Alec, before the break, we had some fun talking about how ridiculous some of these answers can be, but they're going to present a real challenge as we come to use generative AI more and more. I mean, I love using it. I'd love to be able to rely on it. Why is it hallucinating? Why is it giving me these false answers?

Alec: [00:10:58] Yes. So when we think about technology, and software in particular, traditionally software was designed, you know, one line of code at a time, and the output was very carefully designed and very carefully controlled. But this technology, generative AI, is a little bit different. Instead of controlling the output, we train it. Well, not 'we', I've had nothing to do with it, but people far smarter than me train it on incredible amounts of data. And then all generative AI tries to do is predict what the next word or the next thing is in a sequence of words or a sequence of things, based on all of the data it's been trained on. It's not there to decide what is true and what is not true. It's there to decide what is next. And the internet is full of a lot of useful information, a lot of factual information, but also a lot of misinformation. And chatbots absorb it all. They've been trained on it all. And when they're trying to predict what word comes next or how to answer a question, they draw on it all. As a result, sometimes we get great answers and sometimes we don't. And when we don't, we get the irrelevant, the nonsensical or the factually incorrect answers. We've come up with this catch-all term, hallucination. There was an internal Microsoft document that got leaked, and it said the new AI systems are, quote, built to be persuasive, not truthful, and also that, quote, this means that outputs can look very realistic but include statements that aren't true. So it's, I guess, not a feature, but it is a known bug with this technology. 
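To make that 'predict the next word' idea concrete, here is a deliberately tiny sketch in Python. It is nothing like a real large language model, and the toy corpus and example output are made up for illustration, but it shows the key point Alec is making: the model only answers 'what usually comes next?', never 'is this true?'.

```python
# A toy next-word predictor (a bigram counter), purely for illustration.
# It "learns" from three made-up sentences, then always emits whichever word
# most often followed the previous one. It has no concept of truth at all.
from collections import Counter, defaultdict

corpus = (
    "the eagles won the super bowl . "
    "the eagles lost the super bowl . "
    "the eagles won the nfc championship ."
).split()

# Count how often each word follows each other word in the training text.
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def predict_next(word: str) -> str:
    """Return whichever word most often followed `word` in the training text."""
    return following[word].most_common(1)[0][0]

# Generate a few words starting from "the": fluent-sounding, not fact-checked.
word = "the"
generated = [word]
for _ in range(5):
    word = predict_next(word)
    generated.append(word)

print(" ".join(generated))  # prints: the eagles won the eagles won
```

Scale the same idea up to billions of parameters and most of the internet as training data, and you get text that is far more fluent and confident, with exactly the same indifference to whether it is true.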

Sascha: [00:12:40] Man, that really hits. So when you say persuasive, not truthful. There was also an amazing description in The New York Times that you mentioned to me earlier that I thought really gave me a clear perspective on how this program works. 

Alec: [00:12:55] Well, as a musician yourself, Sascha, and a big fan of classical music, I thought this would stand out for you. Quoting The New York Times: think of the chatbots as jazz musicians. They can digest huge amounts of information, like, say, every song that has ever been written, and then riff on the results. They have the ability to stitch together ideas in surprising and creative ways, but they also play wrong notes with absolute confidence. 

Sascha: [00:13:23] Nothing like a jazz musician just leaning into a blue note. So Alec, the natural next question is: what is being done to stop it? Is anything being done?

Alec: [00:13:34] Yeah, there's a lot being done to stop it, but we should be very clear from the outset. No one has stopped it yet. Sundar Pichai, the CEO of Google and Alphabet, recently acknowledged that, quote, No one in the field has yet solved the hallucination problems. 

Audio Clip: [00:13:49] All models do have this as an issue. It's a matter of intense debate. I think we'll make progress. 

Alec: [00:13:58] So don't think that there's one chatbot that you can rely on that won't hallucinate. Companies like Google, Microsoft and OpenAI are all working to solve these problems. OpenAI has worked to refine the chatbot using feedback from human testers. This is a technique called reinforcement learning, where the system is sort of trying to get a better understanding of what it should and shouldn't do. Microsoft has found that the longer a conversation goes with a chatbot, the more chance of it hallucinating, so it's worked to limit the length of conversations with its Bing chatbot. But OpenAI has also tried something new, which is quite interesting. They're now training the AI model on each step of reasoning when it's arriving at an answer, and they're rewarding each correct logical step rather than just rewarding the correct final conclusion. This is a process known as process supervision, as opposed to outcome supervision, and it's meant to lead to better results, but also more explainable results from the AI. But I think it's important to say this is the cutting edge of technology. The people working on it don't fully understand how these chatbots work. And so there's a lot of effort going on, but no one is there yet. And somewhat counterintuitively, as these chatbots get better, the hallucinations actually become more dangerous, because we trust them more and we think they're providing more truthful information. But then, when those few hallucinations still do slip through the cracks, we really get tripped up. 
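As a rough, hypothetical illustration of that difference, the sketch below hard-codes which reasoning steps a human reviewer marked as correct (real systems learn a reward model from many such labels rather than hard-coding them). Outcome supervision grades only the final answer, while process supervision grades every step along the way.

```python
# Hypothetical comparison of outcome supervision vs process supervision.
# The reasoning steps and their labels are invented for illustration only.
from typing import List, Tuple

# Each tuple is (reasoning step, did a human reviewer mark it correct?)
reasoning_chain: List[Tuple[str, bool]] = [
    ("The flight was international, so the relevant convention applies.", True),
    ("The claim was filed within the limitation period.", False),
    ("Therefore the case should not be dismissed.", True),
]

def outcome_reward(chain: List[Tuple[str, bool]]) -> float:
    """Outcome supervision: only the final conclusion is graded."""
    return 1.0 if chain[-1][1] else 0.0

def process_reward(chain: List[Tuple[str, bool]]) -> float:
    """Process supervision: every intermediate step is graded and averaged."""
    return sum(1.0 for _, step_is_correct in chain if step_is_correct) / len(chain)

print(outcome_reward(reasoning_chain))  # 1.0: looks perfect if you only grade the answer
print(process_reward(reasoning_chain))  # about 0.67: the flawed middle step drags the score down
```

The hope, as Alec says, is that grading the steps and not just the answer nudges the model towards chains of reasoning that hold up, which also makes the final output easier to explain.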

Sascha: [00:15:42] Wow. We're almost lured into a false sense of security. It just makes me feel like it's AI's world and we're all just living in it now. But, Alec, while people work on the solutions, there are actual concrete and practical tips that we can take into our prompts to help minimise the risk of hallucinations happening when we're asking for results, right? 

Alec: [00:16:05] Yeah. I think the first one is just stay sceptical. Remember Steven Schwartz, and don't submit a ten-page brief, a uni essay or work to your boss without checking it with a good old-fashioned Google search. 

Sascha: [00:16:19] Yeah, it feels like a new life motto or something. Hashtag stay sceptical. 

Alec: [00:16:23] Yeah. I think that's just the gist of it. And then there's some work being done on how you ask questions, which you might see referred to as prompt engineering. Zapier published an article with a few recommendations about how you ask questions that can reduce the chance of AI hallucination, so I'll just run through the list quickly. First one: limit the possible outcomes. Yes-or-no questions are better than open-ended ones. Secondly, pack relevant sources and data into the question. Give the AI more information to go off in your question. This is called grounding your prompts. A third one that I've seen a little bit on Twitter, but I can't say I've done it myself: give your AI a specific role. Start your question with 'you are a historian that only relies on primary sources' and then ask your question. And then a fourth one they suggested: be really explicit. Tell it what you want and what you don't want. So, for example: I want to know about the history of Mount Olympus in Greece. Please exclude all literary or fictional references to Mount Olympus and return only real historical details. Give it more information to go off and there's less of a chance of it hallucinating. So I guess be sceptical, always check your work with a Google search, and if you're a lawyer like Steven Schwartz, check your citations. If you're a lawyer in Australia, AustLII might be a good place to start. 
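As a hypothetical sketch of what those tips look like in practice, here is one way to combine them in Python. The helper function, example topic and wording are all made up for illustration, and you would pass the resulting prompt to whichever chatbot or API you use.

```python
# A made-up prompt-building helper that combines the four tips Alec mentions:
# constrain the answer, ground it in your own source material, give the AI a
# role, and be explicit about what you do and don't want.

def build_grounded_prompt(role: str, source_text: str, question: str, exclusions: str) -> str:
    """Assemble a single prompt string from the four prompt-engineering tips."""
    return (
        f"You are {role}.\n"                                         # tip 3: give the AI a specific role
        "Use only the source material below to answer.\n"
        f"--- SOURCE ---\n{source_text}\n--- END SOURCE ---\n"       # tip 2: ground the prompt in your own data
        f"Question (answer yes or no, then explain briefly): {question}\n"  # tip 1: limit the possible outcomes
        f"Do not include: {exclusions}\n"                            # tip 4: be explicit about what you don't want
        "If the source material does not contain the answer, say so instead of guessing."
    )

prompt = build_grounded_prompt(
    role="a historian who relies only on primary sources",
    source_text="(paste the article or data you have already verified here)",
    question="Does this source mention Mount Olympus in Greece?",
    exclusions="literary or fictional references to Mount Olympus",
)
print(prompt)
```

None of this guarantees a truthful answer, which is why the 'stay sceptical and check it with a Google search' step still matters.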

Sascha: [00:17:53] Thanks, Alec, for those really practical suggestions. It's great to be back and it's great to be looking at these stories again. I'm really excited to see what we're going to talk about on Friday. As a reminder, if you missed The Dive as well, the best thing you can do is recommend us to a friend. Word of mouth is the most powerful way for a podcast to grow, and we love getting in front of new listeners. But until next time, I'll talk to you soon. 

Alec: [00:18:17] Sounds good.

 


Meet your hosts

  • Alec Renehan

    Alec Renehan

    Alec developed an interest in investing after realising he was spending all that he was earning. Investing became his form of 'forced saving'. While his first investment, Slater and Gordon (SGH), was a resounding failure, he learnt a lot from that experience. He hopes to share those lessons with others through the podcast and help people realise that if he can make money investing, anyone can.
  • Sascha Kelly

    Sascha Kelly

    When Sascha turned 18, she was given $500 of birthday money by her parents and told to invest it. She didn't. It sat in her bank account and did nothing until she was 25, when she finally bought a book on investing, spent 6 months researching, developing analysis paralysis, until she eventually pulled the trigger on a pretty boring LIC that's given her an 11% average return in the years since.
