Episode 29 AI Full Transcript

Mad AIs and Poisoned LLMs: Fighting Digital Insanity with Michael Silva

Michael Silva  ·  April 16, 2024  ·  48:35

◆ ◆ ◆
Speakers: Joe Patti (Host), Adam Roth (Host), UNKNOWN (Guest), Michael Silva (Guest)
Joe Patti00:05

All right. Welcome to the Security Cocktail Hour. I'm Joe Patti.

Adam Roth00:09

I'm Adam Roth.

Joe Patti00:12

Yes, you're Adam Roth. That's right. I forget sometimes. That's good that you remembered it. It is Sunday morning, or not quite Sunday morning anymore, but it's good to remember who you are.

UNKNOWN00:21

That's cool.

Joe Patti00:23

How are you doing today, Adam?

Adam Roth00:24

I'm good. I don't know if you know, in Staten Island, it's about 65 degrees and the wind's coming from the southwest.

Joe Patti00:34

Okay, I think you need a lot more coffee because here, across the river or whatever in Jersey, it's about like 15 degrees. So, you know, slight difference. But anyway, today we have yet another guest. Today we have Michael Silva. Michael, welcome to the show.

Michael Silva00:53

Hey guys, thanks so much, Joe. Adam, it's a pleasure to have you guys here on Sunday.

Joe Patti00:56

Thank you, thank you very much. Thanks for joining. And Michael is, like us, an industry veteran of 24 years. He has a bachelor's in IT and a master's in cybersecurity, which, you see, we're old, the two of us, we're too old to have, or at least I'm too old to have, a degree in cybersecurity. I came through before that was around.

Adam Roth01:17

I have a master's in cybersecurity. What are you talking about? Oh, but you got it more recently. Yes, sir. Sorry, sir. Yes, you always like to say that. You feel better?

Joe Patti01:24

I feel so much better. Okay. Michael also has the CISSP, which is a security certification, another one that I don't have. He's been an adjunct professor for 11 years and runs two companies, an IT company for over 13 years, and he started Emily AI four months ago. He works in New York City, lives on Long Island, and has four kids under eight and a hundred-pound goldendoodle. Wow. So Michael, let me guess, you must drive some big-ass Suburban or something like a bus, right?

Michael Silva02:03

I'm about one seat short of that, and if my wife has her way we'll be looking for a daughter, because I got all boys. No, we actually had two cars for a while. We did. And then I sold two of them at the height of the used car market and upgraded to a six-seat Tesla Model X, which is actually surprisingly good in the wintertime. I know we get a lot of questions around that, but it's actually awesome. It preheats at like five o'clock in the morning. I get in, it's 80 degrees in the interior at 7 a.m. It's actually way better than not. Yeah. I would have guessed a minivan. I mean, I am one seat short of that, Adam, in fairness.

Joe Patti02:39

Yeah. Well, you know, that's... I guess it's cool where we are because, you know, we're joking, like, it is January here. But this is a week when, earlier this week, we were seeing, like in Chicago and the Midwest, people with Teslas that were frozen solid, or they wouldn't charge or something. Yeah, the charging stations. Yeah, yeah, and you know, I was thinking to myself at first, I said, gee, I'll never buy an electric car because of that, and then I thought to myself, I would never move to one of those places anyway, it's too damn cold for me. So, you know the difference, Joe, you're damned if you do and damned if you don't. If you have a power outage, you can't charge a Tesla, but guess what?

Adam Roth03:21

The gas pumps don't work either, and I experienced that twice. Really? Yeah, the gas pumps don't work. With Hurricane Irene and Hurricane Sandy, when we had the blackouts, when we had no power, you're sitting out waiting for gas, you can't get gas, so.

Joe Patti03:39

There was no gas for Hurricane Sandy for a bit, that was frustrating.

Adam Roth03:42

Well, you can't get the pumps, you can't get, yeah, crazy.

Joe Patti03:46

Alright, but for today, we are, you know, connected here on the internet, civilization has not collapsed, and we're not frozen solid. And we're here to talk about, you know, not quite electric cars, really, but AI, a pretty intense subject. And, you know, Michael, we've talked about it before, a lot of people have different takes, but you are a real practitioner here. You are building AI. You are way deep in the thick of it, in part as a response to the fact that, you know, the general tools out there bring a lot of risks. We've talked about it, particularly for enterprises, for companies, and, you know, we need to protect their data. They can't just throw it out to OpenAI or whoever. So, you know, what are your thoughts on that? I know you've got a few.

Michael Silva04:44

Yeah, well, there's quite a few. So I'll start with the most interesting ones first, I think. So if you look at the history of warfare between humans, right? War originally started with, you know, physical weapons and things like that. Today, we're in an age of information war, which is kind of interesting, right? Like propaganda and information kind of lead the way. And one of the biggest problems publicly that we have is we can't determine whether or not information that we're seeing or reading is actually genuine or true. And this is becoming even more prevalent, more pronounced with, you know, videos being generated, synthetic audio. And these things do have their places, but the problem is that ultimately, because of the nature of the way that generative AI works, it's going to take that information and create something that seems very compelling, either in writing, which is where ChatGPT is a big problem, right? It gives you an answer that looks correct, but may not be correct, right? Then you've got video, where even if you look at the upcoming election campaign, you know, there are people creating virtual bots, uploading content, so you can ask questions to that person, "person" I say, right? And, you know, it appears to be real, right? But it's actually not. But at the end of the day, it's all being powered by text on the back end, right? That's really ultimately what's powering this. And if you really break down the structure of what information looks like, where it comes from to make these tools work, this is where it gets to be a little bit scary, right? Because OpenAI and every large language model is, in essence, effectively just a really good search engine to draw a connection from your question out to the answer as fast as possible, right? So the question that you really need to ask yourself is, where's the information coming from? And I think that in large language models that use public data sets, like ChatGPT and others, they've expressly stated that they use training data from sites like Reddit, which I use, but for obvious reasons don't really trust. And they use feedback from other users that contribute answers, which may contain proprietary, private, or incorrect information, as responses to new people's questions. So this really begs the question: when you get these very convincing written answers, and you get these very convincing audio tapes that sound just like the person, sounds just like Taylor Swift promoting some new product that she never actually promoted, or a politician saying something they didn't say, can you really trust it? And I think that this is really the fundamental problem that we have with AI today: can you trust what you're seeing? Because it looks correct, but that $10 million national security slash personal sanity question is, is it correct?
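To make Michael's "really good search engine" framing concrete, here is a deliberately tiny, hypothetical next-word predictor (not anything discussed on the show, and nothing like a real LLM in scale): it can only echo patterns present in whatever text it was fed, which is the garbage-in, garbage-out point.

```python
# Toy next-word predictor, for illustration only: a poisoned or low-quality
# corpus yields confident-looking nonsense, because the model can only repeat
# what it has seen.
import random
from collections import defaultdict

def train_bigrams(corpus: str) -> dict[str, list[str]]:
    """Record, for every word, which words followed it in the corpus."""
    words = corpus.lower().split()
    model: dict[str, list[str]] = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        model[current].append(nxt)
    return model

def generate(model: dict[str, list[str]], start: str, length: int = 8) -> str:
    """Repeatedly pick a plausible next word based only on the training data."""
    word, output = start, [start]
    for _ in range(length):
        followers = model.get(word)
        if not followers:
            break
        word = random.choice(followers)
        output.append(word)
    return " ".join(output)

if __name__ == "__main__":
    # Made-up corpus: if the "training data" says the sky is orange, so will the output.
    corpus = "the sky is blue today . the sky is orange today . the sky is orange always"
    model = train_bigrams(corpus)
    print(generate(model, "the"))
```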

Joe Patti07:21

Well, I agree with you on that, but you know, you actually brought up two, um, I think two sides of it that are really interesting. One is the very obvious deliberate deception, deepfakes, fake videos, people doing stuff like that, which, you know, yes, that's a big problem. But in a sense, the other one we've talked about, the general accuracy of it, is a bit more subtle, because you're right. It is, in a sense, the same old thing, garbage in, garbage out. If you give it incorrect data and it's trained on it, it will give back incorrect data. It will reflect all the biases put into it. And you're right, it looks very convincing, extremely convincing.

Adam Roth08:04

Joe, we've touched upon this in other episodes, but not to this level, right? We've spoken about scams where people have been virtually kidnapped. And even at one point, a gentleman killed himself because the chat, sorry, I don't want to say specific tool, but the AI told him that he should no longer live and he ended up killing himself. And, you know, there are consequences for AI. I know military wants to use it in warfare for identification purposes and quick responses to threats. But AI, while it's an incredibly, incredibly, incredibly good tool, it's incredibly dangerous because let's just say you do have a tool and let's say it is proper and somebody hijacks that tool, a nation state, a threat actor, whatever it is, there are terrible consequences where things are gonna happen. And then the last part I'll say about this part of it here is there's nowhere or no one place to authenticate the level of authenticity of any one tool, one source of data, one source of video, one source of anything. So while 50 people might say it's correct, 50 people might say it's not correct. And that's what becomes an issue about social media.

Joe Patti09:25

Yeah, that's right. So we've covered some of this, but what do you do about it, I guess, is the question. I mean, I've heard some people say it can be dangerous, but it's like anything else, you have to use it correctly or whatever. Michael, what do you see as some of the potential solutions and ways to use this effectively?

Michael Silva09:48

Yeah, well, I think what Adam's alluding to here is a supply chain attack on an LLM's underlying logic, right? Which has actually already happened. So in December of 2022, there was an open source machine learning framework called PyTorch, which had a supply chain attack. There actually was a compromise where a malicious version of, I believe it was torchtriton, which is one of the dependencies of PyTorch, was uploaded to the Python package repository. And that malware could execute code on the machines of PyTorch users that installed the infected packages. You know, that's one example. I know that there was another poisoned model, a version of GPT, I think it was GPT-J-6B or something, I forget the exact version, but the model was uploaded to Hugging Face. And, you know, unfortunately, anybody that used that poisoned model could unknowingly generate and disseminate false information. And there's lots of others where, you know, we could just go down the list. But the point is that there's this new version of the old-school open source problem, right? Which is that when anybody can edit it and anybody can distribute it, then how do you know whether or not it's legit? And this is ultimately just the latest flavor of ice cream in that old problem. And I think that a lot of what we're seeing here, as interesting as everything AI is, it's almost like a new version of problems that existed before. So Joe, to your point, your question is, how do we deal with that, right? So if you have these incredibly effective advancements in AI, I mean, A, how do you know which AIs are risky or not? And B, how can you trust them? And I think that the direct answer to this question is, you're depending in large part on the organization that's hosting it and running it, and then any software that's built around it. You also have to trust that that software only uses trusted AI models. Because if you think about it, any AI model that's drawing from accurate data, that uses predictable execution on the LLM on the back end, right, is probably going to be fairly safe and predictable, right? Particularly when you compare it against an open source model where you download it, you run it, you're uploading your documents, and it gives you an answer and you're like, hey, cool. But what really happened, right? That's the real question. So I think, not to go on too long about the topic, but if I was going to pick some of the top guys that are doing this really well, you can look at Microsoft, right? Microsoft and Satya Nadella did a fantastic job with Sam Altman and particularly the board of directors. I'm not sure if you guys saw that story, you guys know what happened with that, right?
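As one concrete, purely illustrative mitigation for the kind of dependency poisoning Michael describes: verify that what you downloaded matches a digest you obtained from a trusted source before installing it. The file name and digest below are placeholders, not real PyTorch or torchtriton values; pip can do the same check automatically with a hash-pinned requirements file and `pip install --require-hashes -r requirements.txt`.

```python
# Sketch: check a downloaded artifact against a known-good SHA-256 before use.
import hashlib
import sys

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file so large wheels never have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path: str, expected_hex: str) -> None:
    actual = sha256_of(path)
    if actual != expected_hex.lower():
        sys.exit(f"HASH MISMATCH for {path}: got {actual}, expected {expected_hex}")
    print(f"{path}: OK")

if __name__ == "__main__":
    # Placeholder file name and digest, for illustration only.
    verify("example-package-1.0.0.whl",
           "0000000000000000000000000000000000000000000000000000000000000000")
```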

Joe Patti12:18

Oh, he did quite a job by the board of directors.

Michael Silva12:23

I've never seen a staff mutiny outside of a union as hard and as public as when everybody at OpenAI said, fire the board or we're going to Microsoft. And then Satya said, hey, I've got jobs for all you guys, right? So what ended up happening there was OpenAI ended up putting in a new board. And now they have, effectively, a separate copy of their code running, where Altman and his team are developing two different copies of the same code, except one of them is going to Microsoft, and the other one is staying open source. So the question is, which one are you going to trust? And if you look at these bigger models, like ChatGPT and others, they have huge security problems. And this is not even to count, Joe, what you were alluding to before around, you know, the content of this data. You know, later I'll talk to you guys about mad AI disease, which is kind of a crazy topic in and of itself. But, you know, on the Microsoft side, I don't want to say it's completely immune to it, but it's systemically designed to protect against it, right? Because, like, Microsoft writes code for a living. That's what they do. So they're going to take the LLM and dissect it and make sure it runs correctly. It's very similar to Apple. You know, they're looking at the code and they're saying, is it predictable? Can my customers trust it? When I put my data into it, is it going to produce a consistent output? And those tools, you know, those development tools, if you're building something, are really critical components. If you choose the wrong tools, I mean, A, you're probably going to get something weird coming out of it at some point because of the environment it's running in. It may even put your data at risk. Actually, it will put your data at risk. But B, the bigger problem is none of your customers are going to know about it, right? They find out when there's a huge breach and, whoops, hey, guess what? Now we're on that supply chain attack list. It's a horrible place to be unless it's built from the ground up with security in mind.

Adam Roth14:03

Well, there's two things I wanted to touch upon. One, and I'm not an expert at this, right, Michael? You are. If enough people start putting things into any type of AI, they can normalize data that's not normally normalized. And what I mean by that, for the audience, is, if I keep on telling you that the sky is orange, eventually that sky really will be orange according to that artificial intelligence. And the second thing I wanted to bring up is one thing that Joe and I used to deal with, especially working at a law firm, is the issue with people using search engines, you know, whether it's Google or Bing, and you keep on searching for things, and that data resides within those search engines, proprietary information. So when people work with AI, they end up like, oh, I want to write code for my organization, and they write code that might be proprietary, but now it becomes searchable by certain other people. So we have to be very careful what we upload, what we put into AI, even though we want to use AI. You want to be able to put it in more of a sandbox, or compartmentalize it, so that nobody else can get access to it, which becomes an issue within itself. This is evident in uploading data to VirusTotal or search engines or anywhere else.

Joe Patti15:22

Yeah, agreed. Yeah. You know, again, one of the things to remember with these AIs that you use is, if you're using one that's up in the cloud from someone, it's a cloud application. It's a SaaS application, software as a service. You're uploading data into it, and you've got to trust where you're sending it. Another thing I find really interesting, Michael, that you were bringing up is the way that, you know, OpenAI kind of split, and the whole story of that board change and everything. It's a whole interesting discussion of, you know, corporate governance and stuff, and probably only the people involved really know the real story. But one very interesting aspect of it, as you were talking about it, is that OpenAI, as I understand it, actually had a board with a very unique mission and a very unique type of governance. They said their primary focus was to see that AI was being used ethically and responsibly, and the board claimed that that was driving their decisions. But, you know, those things are often at odds with business. So it's interesting to see that the solution to this, the evolution, is that they would now have two tracks: the more altruistic track, but also the business track of stuff going to Microsoft. And it is very interesting to note that those two things can have very different goals. And I think it again gets to the point that you need to know who you're dealing with and what they're building and whether it suits you. That's crucial no matter what you're doing, especially with this.

Michael Silva17:11

Yeah, for sure. I mean, you know, if you look at where the internet started, right, that's old, right? When the internet started, it was a group of computers between universities looking to talk to each other. And if you really want to go super far back, you're gonna go back to, like, ALOHAnet and the rest, but that's, you know, ancient by today's technology. But I think that you're highlighting an important point here, which is, you know, how are you using this technology? And I think that that is probably the scariest and most exciting part of my job, right? As I'm looking at this, I'm seeing tools that do have legitimate purposes being misused. Like Adam, you brought up VirusTotal before. I mean, in May of 2021, there was a ransomware attack that targeted the Health Service Executive of Ireland. And some of the stolen data was uploaded to VirusTotal by attackers, and they used it to see if the files were encrypted or not. But then these files were downloaded by other VirusTotal users, who then leaked the personal information of the HSE victims, right? Yeah. And I think there was another one in January 2022. There's an article called, I think it was VirusTotal Hacking, you guys can look that up, it's a similar story. So, you know, really the question there is, in the history of the way things have been, have things really changed? And I think the answer is no, right? From a human perspective, right? Theft is still theft, deception is still deception, you know, these things are still going on. But the challenge is that it's becoming increasingly difficult to understand where the data is coming from and what the answer is. And Adam, you hit it spot on the head with the information warfare propaganda comment, which is, you know, if everybody says this is a certain way... I mean, you know, everybody knows, right, history is written by the, what is it, history is written by the victors, right, I think is the exact phrase. Yeah, so this is just the way it is. You know, some people have the foresight to be able to see, you know, this is what we want to do going forward. But, you know, some people will look back and say, well, that's the way it was, and that's everything I see in writing, so that's the way it must have been. And that's kind of a scary place to be when you're talking about what's recorded and how you're getting that information back out to the present day.

Adam Roth19:17

And that touches upon another episode we had with an attorney, where we were discussing Google, and what a lot of people just don't realize, and I know this, like I wouldn't want my conversations leaked, right? You know, your conversations are constantly being recorded by Google. You're listened to by Google sometimes just to kind of figure out how to set up their listening techniques. I'm not saying they listen to you because they want to hear what you're having for breakfast, but they use it for quality assurance, right? So if your conversations are constantly recorded by Apple and Google and all these other organizations, what if they start using it to normalize AI, right? What if all that data can be used to recreate you? You know, it's crazy, right? Some threat actor says, oh, I'm going to hack Google, I'm going to get access to Joe's account, and then they're going to make Joe have conversations he never had. I was watching on one of these neighbor apps or something like that: never say yes if a scammer calls you. They'll use that word, yes, to replay it to somebody and then use it to get information about you. Oh, do I have permission to release your information? Yes. You know, so people claim there are certain phrases that you can say, and then they can use AI to make any conversation they want with that person.

Joe Patti20:45

Yet with these AIs, they don't even need a specific word anymore, if I understand it. They can realistically recreate any word. Of course, but I'm just saying what people said. It's gotten worse, as far as that goes. That's pretty scary.

Michael Silva21:00

Yeah, yeah, I think it goes back to education, Joe, really, at the end. You know, Adam, there's actually a pretty famously published story, I guess it was a congressman actually, if I remember correctly, who had his son, right, calling saying he was stuck and he needed money wired or whatever. I'm not sure if you guys saw it, I wrote about the story a couple weeks ago. But, you know, this congressman went up and talked about the convincing nature of the call, and how very quickly his son, right, not really his son, pretty quickly switched to text messaging. And the text messages followed through with all sorts of demands. And, you know, the father's sitting here thinking that this is him actually talking to his son, until he decided, let me call him back, called him back, and his son's like, what are you talking about? Right? So I think there's this mutual authentication that needs to occur. And you know, things like passwordless authentication, one-time tokens, things like that are very important. You know, there is almost an improved psychology, I would say, that needs to be in place for most people today to understand that they might be talking to AI. They might, and they may not know it.
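Since one-time tokens come up here as a mutual-authentication aid, here is a small, hypothetical sketch of how a time-based one-time code (the RFC 6238 TOTP scheme most authenticator apps use) can be computed with nothing but Python's standard library. The shared secret shown is a made-up example value, not a real credential, and this is only an illustration of the idea Michael mentions in passing, not anything built on the show.

```python
# Illustrative TOTP sketch: both parties derive the same short-lived code
# from a shared secret, so a caller who can't produce it doesn't get trusted.
import base64
import hashlib
import hmac
import struct
import time

def totp(secret_b32: str, digits: int = 6, step: int = 30) -> str:
    """Derive the current code from a shared secret and the current time window."""
    key = base64.b32decode(secret_b32, casefold=True)
    counter = int(time.time()) // step                  # current 30-second window
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                             # dynamic truncation (RFC 4226)
    code = (struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF) % (10 ** digits)
    return str(code).zfill(digits)

if __name__ == "__main__":
    shared_secret = "JBSWY3DPEHPK3PXP"  # example value only, not a real secret
    print("current one-time code:", totp(shared_secret))
```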

Joe Patti22:07

I think we need to look forward to a future that is way closer than we think. I mean, even within the next few months, where you can get, you know, say, a video call from someone. You can hear their voice, you can see their face, and it's not them, and it's completely interactive and they're having a conversation. And to go even further with it, it may use phrases they usually use, speak in the manner they usually do. And what does that leave us with, for all our friends? Do we need to have a set of secret code words or something to use, like duress passwords, for everyone we know? It's really kind of crazy when you extrapolate it out.

Adam Roth22:50

It's funny because we brought this up too, right? So if I'm having a conversation with maybe Joe, and Joe and I, it seems like we really know each other. I mean, obviously we do. And I'll say, I've used challenge questions with people, not Joe specifically. I'm like, am I speaking to the right person, right? You know, hey, do you remember what we did last year in March? Yeah, I do. Well, then tell me what. Adam, why are you doing this to me? Do you have a cold or something? Yeah, I have a cold. Your voice does not sound perfectly right. So I'm just using a challenge question. Now, people can think it's paranoia, but in this day and age, with all this AI, you gotta be really careful. I'm gonna date myself, Michael. About 40-something years ago, I was watching a movie, and in the movie, they started killing these actors and making them into virtual videos. What they wanted to do is, they didn't want to pay these actors anymore for what they were doing. Do you remember that movie? Can you tell me what movie this was?

Joe Patti23:57

I don't remember this one.

Adam Roth23:58

I'm gonna find it for you. It was 40 years ago. And I was watching, I'm like, wow. And I keep on thinking to myself, even recently, what did they know 40 years ago that we're only figuring out now, you know? So it's crazy.

Joe Patti24:10

Well, science fiction, I guess someone thought of it.

Adam Roth24:12

Well, it's like Star Trek, right? It's like Star Trek.

Joe Patti24:14

Now they're, well, yeah, now they're actually doing that. I saw some show, I think it's opening up in London. Elvis too. Yeah, they're basically faking Elvis.

Adam Roth24:23

Yeah, yeah.

Joe Patti24:23

He's gonna do a new show.

Adam Roth24:25

And one of the, I don't know if it was the winning act on America's Got Talent, they had that virtual box, you remember that, Michael? Where they did, like, Simon and everyone else singing a song, and it looked like it was really them. It was a really good act.

Michael Silva24:41

I think it was George Carlin who just got like a whole hour-long comedy tape released with all of his material. And it actually sounds really good. And it's really funny. Really? Yeah, I think it came out like a week or two ago. But Adam, to your point on the challenge questions, I mean, that's great. I actually love doing that. You know, my close friends and I have kind of a running joke where that type of thing starts to happen. But you know, if you're really super paranoid, you can also start making up stories and get people to agree to them along the way and just see how far it goes. You know, like if you start giving them a little bit of incorrect information and they agree, then it gets weirder and weirder and weirder. Oh yeah, I remember that. I remember that. I'm going to be very careful.

Joe Patti25:18

I'll tell you the problem I have, and you remember this, Adam, when we, uh, when we were talking about Eric, a lot of people tell me, Oh, don't you remember years ago, we did this, we did this. And I'm like, you know, I think there was a lot of drinking involved. I don't quite know. I think I got that story locked down.

Adam Roth25:33

So anyway, I'm going to be very careful here. My challenge questions on a lot of sites have answers that have nothing to do with my real life. It's just questions I had specific answers for. So like, if you know my history, like, um, what is my mother's maiden name? It's not the real maiden name.

Joe Patti25:54

So do you put down like your favorite sport is balloon racing or something like that?

Adam Roth25:58

My favorite, my mother, I put down my mother's maiden name as Patti.

Joe Patti26:04

I was gonna say like some Italian name or something.

Michael Silva26:07

Adam, I got a funny story to tell you on this one. So my son, my eight-year-old son, is really interested in security. And he got a tablet from my wife. My wife authenticated him into the tablet that she uses, right? That's problem number one. I wasn't aware that was going on. So I pull up to Whole Foods one day to pick up a food delivery order. I'm with my sons, because you need something to do with the boys. And the guy rolls down the window and says, oh, do you have an order? And I said, yeah, we do. And he goes, what's your name? I said, Michael Silva. And he says, no, that's not the name on the order. I'm like, are you sure? He goes, yeah, yeah. I said, is it my wife's name? Is it Georgina Silva? He says, no, it's not that either. And I'm like, what the hell is it? And the guy looks at it and starts smirking. And I'm like, what the fuck? Like, what are you reading there, bro? And then my kid in the back goes, Dad, it's hacker 1234. So here I am at Whole Foods, and I've got to look at the guy and say, I got an order for hacker one, two, three, four. And the guy goes, yep, that's the one. And I look back and I say, what the hell is this? So my kid actually went into my wife's Amazon account and changed his name. And, you know, he wanted to be like a hacker. And then he got something in the mail, like targeted mail addressed to hacker one, two, three, four, and showed it to me. It was like a football promotion from Amazon video or something. He was so proud of himself. So yeah, I think you're right. It doesn't have to be your name.

Adam Roth27:24

So I do that stuff at, what's it called, Starbucks. I always use like a fake name, you know, and sometimes I try not to be really, really bad. I haven't done this in years though, but I mean, I might say something funny, like when they read the name, it sounds like a phrase. But yeah, I don't use my real name sometimes either, so.

Joe Patti27:44

I thought you were going to say you can put it down as, like, Homer Sexual or something, like Bart Simpson.

Michael Silva27:51

Let me tell you this, though. Every night I play Xbox, right? Just for fun, I just catch up with my friends, and Xbox restricts the names that you can use based on certain profanity filters. So you want to talk about, you know, crowdsourcing creativity. There's about a hundred people in each game. And every game, there's one guy that's got a name that's just not right, but it's not profane either. And I actually have a written version of it. I'll share it with you at some point. It's cool. They managed to slip something through. It's amazing. Phenomenal.

Adam Roth28:20

That happens with license plates too. Places like New York State and all the different DMVs, they have all these words that you can't use. And every once in a while, somebody's able to come up with something profane. They put it in there, and then like three, four months later, they go, we're changing your plate, you can't use that plate. I hear stories about that too.

Joe Patti28:41

Like, you know, now you see, that's a good use of generative AI. Just like, look, I need a bunch of suggestions for a screen name that's not profane but has the connotation of this, you know? You got it. But on a more serious note. So again, we've talked a bit about the deliberate deception. There's deliberate deception, there's inaccuracies or biases and things coming through because that's what the model is trained on. And as if that's not enough, sometimes, just because of the way the AI works, you don't get the right answer back. There's the suggestion that it's almost like, like Ridley used to say, it's not about the answers, it's about the questions. It's probably impossible to know if you get an answer that's right, but supposedly you can ask better questions, or maybe know the questions that you shouldn't be asking. I don't know, Michael, have you encountered that or any thoughts in that area? Yeah, we do. We encounter this all the time.

Michael Silva29:54

Okay, great. I mean, really, I hate to say it, but the answer is so simple you wouldn't believe it. It depends on the data that you're using to answer the question. That's ultimately what it comes down to, right? So I promised you guys I'd explain mad AI disease earlier. So the gentleman who coined "mad AI disease," I'll share this picture with you, it's unbelievable. It's got like three cows that look actually insane, with the names of the three popular AI search algorithms above them. It's a gentleman named Gus, I can't take credit for it. But there's actually an academic white paper written on it, and I think either Vice or another news outlet wrote a similar story. The output of these LLMs is becoming the input of the LLMs, right? And this is a huge problem. So I think the research paper that I posted showed something like, you know, 40 or 50% of the new content that's coming out is AI generated at this point. So everybody wants to be creative, you know, by taking real human content and then publishing the AI results of that. And then what happens with these major LLMs? How do they work? Very simple. They go out and they index a whole bunch of data. They create search indexes based on that. They take your prompts. They point those prompts at the search data. They get the results back. Then they take those results, the chunks of that data, and they answer you in natural language. That's how it works. The problem is when your starting data set is poor quality, and you regurgitate more poor quality data back into the same pool again, and that cycle just continues. That's kind of like what happened with mad cow disease.
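The index, retrieve, and answer loop Michael walks through is essentially retrieval-augmented generation. Below is a deliberately minimal, hypothetical sketch of that flow: the scoring is plain word overlap instead of real embeddings, the documents are made up, and answer_with_llm() is a stub standing in for an actual model call, so treat it as an illustration of the shape of the pipeline rather than anyone's production system.

```python
# Minimal sketch of index -> retrieve -> answer over trusted documents.

def tokenize(text: str) -> set[str]:
    return set(text.lower().split())

def build_index(documents: list[str]) -> list[tuple[set[str], str]]:
    """'Index' each trusted chunk as its bag of words plus the original text."""
    return [(tokenize(doc), doc) for doc in documents]

def retrieve(index: list[tuple[set[str], str]], question: str, k: int = 2) -> list[str]:
    """Point the prompt at the index and pull back the best-matching chunks."""
    query = tokenize(question)
    ranked = sorted(index, key=lambda item: len(item[0] & query), reverse=True)
    return [doc for _, doc in ranked[:k]]

def answer_with_llm(question: str, context: list[str]) -> str:
    """Stub: a real system would send this assembled prompt to a language model."""
    return ("Answer using only this context:\n"
            + "\n".join(f"- {chunk}" for chunk in context)
            + f"\n\nQuestion: {question}")

if __name__ == "__main__":
    trusted_docs = [
        "Our refund policy allows returns within 30 days of purchase.",
        "Support hours are 9am to 5pm Eastern, Monday through Friday.",
    ]
    index = build_index(trusted_docs)
    question = "When can I get a refund?"
    chunks = retrieve(index, question)
    print(answer_with_llm(question, chunks))  # inspect what the model would see
```

The design point matches Michael's argument: the model only ever sees the chunks you hand it, so the quality of the answer is bounded by the quality of the data you chose to index.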

Adam Roth31:23

Oh, I get it. That's really nasty. Why don't you describe mad cow disease for real? Or are we going to leave that alone?

Joe Patti31:40

Mad cow disease comes from, essentially, cannibalism. When they feed beef products back to the cows, it makes them crazy.

Adam Roth31:51

It leads to a degenerative brain disease. Okay, you feel better? I described it. But it's not so much that it's poor quality, it's poisoned data. How do you say it? Poisoned.

Michael Silva32:09

Yeah. Well, this is the whole point, right? By the way, the gentleman's name is Gus Bekdash. That's the gentleman's name. Thank you, Gus. Thank you, Gus. We'll tag you on this later. No, but you're 100% right, Adam. And that is really the problem, however you look at it: the algorithm itself can be great. But, you know, OpenAI's algorithms, and really any of these algorithms, are really, like I said at the start of the video here, they're search engines, right? They're ways to get an answer. It's to connect your question to an answer as quickly as possible. And to your point, the truth comes out of that data. It's all about where the data comes from. So the real question here is, how do you take the power of these LLMs and put it against data that you know you trust, to get an answer with all the benefits of the large language model, but have it be factually correct?

Adam Roth33:04

So let me throw this at you, Michael, right? There are SEOs and people who say, hey, guess what? I can get you to the top of the search engines. And how do they really do that? They kind of sort of poison the search engines with certain, you know, metadata and words and everything else. So let me ask you this: is there anybody out there that's doing the same thing with AI? Can you hire somebody to get a certain answer? Can you game it?

Michael Silva33:30

Yeah. Yeah. So this is a really good question, right? This goes back to the start of, you know, can teachers identify AI-written papers, right? That's more or less the question you're asking. And the answer is no, you really can't. But it becomes increasingly likely over time that older versions of that data will be identified as AI generated. And this is one of the things that makes me laugh so hard whenever I see somebody, some congressman or politician, talking about AI watermarking. I mean, that's the most insane thing I've ever heard of. Like there's just no... How would you describe AI watermarking for us? This is kind of a ludicrous concept, right? Like in theory, it sounds great, right? If you're familiar with steganography in computer systems and security, it's the art of hiding information within information, right? You know, it's like, oh, the first letter of every sentence creates this, you know, crazy message for the next guy. Or, you know, if you look really closely at the way these words are organized, and you sum the, you know, alphabet position or the length of each word, it creates a new sentence. Okay, this has been around forever, right? That's steganography, hiding information within information. But do they do that with an AI so the text it spits out will have some kind of noticeable pattern? No, I'm getting there. That's not the case. No, no, it's okay. It's okay. That's the traditional way you would go about it. AI, because of its generative nature, is creating an answer for you based on what it has seen in the past as the next most likely thing to say. That's how it works, right? It looks at what is likely to be correct and then returns that back. So it's not actually cognizant of what it's writing, which is why it makes my head spin when people talk about these systems becoming sentient, because it's more like staring into a mirror of your own data, right? It's not sentient. It's just a great copy of you, right? But it's not aware of what's going on, right? I mean, maybe some future version would be, but you can't watermark that output, because the output that's coming out is creative. And even if you could, right, then the question becomes, how do you even accurately identify it? And there have been lots of people who've tried to do this, and you cannot today, you cannot. What you can do, to get back to the SEO point, is look for enough patterns in data that looks like generative AI output, you know, "in conclusion" at the end of the paragraph, multiple bullet points and all that. There's a format to it, but it's not a perfect science. It's a degree of confidence. And that degree of confidence is so poor right now that nobody can trust it.
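To make the first-letter steganography example concrete, here is a toy, hypothetical snippet: it pulls the first letter of each sentence out of a cover paragraph to reveal a hidden word. It is only an illustration of "hiding information within information," not how anyone actually watermarks model output, which is Michael's point about why the idea breaks down for generative text.

```python
# Toy "first letter of every sentence" steganography, for illustration only.

def extract_hidden(text: str) -> str:
    """Read the first letter of each sentence to recover the hidden word."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return "".join(sentence[0] for sentence in sentences).lower()

cover_text = ("Hackers rarely announce themselves. "
              "Every log entry matters. "
              "Look twice at unexpected attachments. "
              "Patching early saves weekends.")

print(extract_hidden(cover_text))  # prints "help"
```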

Adam Roth36:01

I got a weird thing about watermarking. Did you, do you know that your printer at home watermarks a serial number every time it prints something?

Joe Patti36:12

Is that like, is that a conspiracy theory?

Adam Roth36:14

No, it's not. No, Google it. So if you, if all printers... Let's ask the AI. Michael's checking it now. Every modern-day printer puts these super small pixels that encode a serial number at the bottom of the page, and people don't realize it. And that was to prevent counterfeit money and other things. It's crazy, especially if you're printing out a whole paper.

Michael Silva36:50

I just checked this on AI and it says that some printers do, some printers don't. However, the current pattern is printers sold to a gentleman named Adam Roth tend to have a higher degree. No, that's not true. I'm just joking. The answer is it's a mix according to AI. Yeah.

Joe Patti37:11

Okay. So Michael, we're getting into this now. So let me ask you this. Like I said, I know how generative AI works, how it's essentially predictive. They say it can create, but I mean, the more I look at it, the more I wonder, knowing that they largely do text processing. I mean, there are other things too, where they write code and that kind of data analysis. But when we're talking about the text processing part of it, is it really creative, or is it transformative but it looks like it's creative? You know?

Michael Silva37:54

Yeah, I think it's a combination of both of them, you know. I think that the concepts of truth and creativity are almost like independent philosophical questions, right? So, you know, to Adam's point earlier about what color the sky is. Um, you know, when you create something out of nothing, I mean, yes, it is creating something, right, because you're giving it a question, you didn't have anything as a result, and it's creating something. But, well, you're creating something, but is it truly?

Joe Patti38:19

Is it creating new information, something of substance, or is it taking what it already has or what you give it and just transforming it into something else?

Michael Silva38:30

This is a really good question. This borders on the copyright question and the data ownership question. Data that's produced can be a copy, right? It can be a complete copy-paste. I've seen that a lot. Actually, it can also be unique. And because it is a gray area, there are enough instances of people who will complain that their data was effectively, you know, pirated for lack of a better phrase, copy-pasted, and then somebody else is taking credit, and they say, oh, it's the AI tool's fault. And this is a whole other question around, you know, whether or not data that's public is actually public. And, you know, it goes to that crazy question of, like, well, your wifi is leaking into my house, therefore I can use it. You know what I mean? If you look at what Microsoft's official position on this is, they'll pay your legal bills, right? Microsoft will actually, if someone else produces a copyright claim against content that you have produced using Microsoft's generative AI, Microsoft will pay your bills. Now, is Microsoft actually doing this? I mean, probably, I have to assume that they are, but to me it just seems like, you know, they're just puffing their chest as the, you know, 800-pound gorilla in the room going, you can't touch us, we're too big.

Joe Patti39:44

Well, that's it. I mean, I'll tell you, I found that really interesting when it came out and I think it is a huge differentiator for them from a business perspective. And, you know, yes, they are the 800 pound gorilla and everything, they're not afraid of lawyers, but I mean, the way things are going, it only surprised me because the potential liability for that is,

Adam Roth40:05

beyond massive, I mean. Now you got me thinking though, like we're talking about data that exists, but if you go to like a, if you go to an AI application and you say, what's the meaning of life? They're going to get that from 5,000 other publications that are on the internet. If you say, is there a God? He's doing it right now. Is there a God?

Michael Silva40:32

What's the answer? I want to just confirm a number here, because I'm pretty sure it's 42. Sorry. Yeah, but sometimes it's, you know, better.

Joe Patti40:40

It's 42.

Adam Roth40:41

Very funny. Is there a God? Is that copyrighted? Can a bot, can an AI bot, you know, have a consciousness? Because it can't, right?

Joe Patti40:53

It's not, but... But the thing is, what really happens, as I understand it, Michael, and you can correct me because you know more about this than I do, probably, is when you say, is there a God, it essentially goes and takes all this stuff it's been trained on, on that subject, all these thousands of articles written by someone else, and grabs it and produces something based on that.

Adam Roth41:18

But it's regurgitating, Joe.

Joe Patti41:19

It's regurgitating. Well, it's regurgitating. And I am not a lawyer. But I believe that people are claiming that that is essentially a derivative work when they're doing that.

Adam Roth41:31

It's not like the bot is saying, I had five, sorry, Michael, 42 sources, and those sources have led me to believe, in my own consciousness, that there is a God. It's not coming up with its own information. It's coming up with information that's derived from multiple sources. I'm waiting for the day that we have AI and it can come up with its own thoughts.

Joe Patti41:59

That actually gets into a very deep philosophical question too, because you say it comes up with its own thoughts. However, if I ask you, is there a God? Well, you didn't invent your answer all by yourself. True, but it's also based on your own training and the culture you come from and the things you've read.

Adam Roth42:21

I'm also looking for Sarah Connor. Have you seen Sarah Connor? No, I haven't. Move on to the next one. I'd like to see.

Michael Silva42:36

I'll be back.

Adam Roth42:38

That's when we start talking about Skynet, right?

Joe Patti42:42

Yeah. When we start pulling out the shorts and Edgar, you know, we're getting to the end of the road here. It's getting weird, but this stuff kind of does get to the heart of a lot of the AI discussion. But Michael, I do agree with you. I think its level of creativity is much, much lower than people think it is. I do not think it's anywhere near.

Adam Roth43:03

Being conscious. The only reason why I bring this up is because, you know, Michael's talking about cyber warfare. And yeah, I know, Joe, you're going to make fun of me. I want to do a PhD. My PhD is going to revolve around cyber warfare. Yeah, I know that we have kinetic wars, but I really, truly, honestly believe, with all these countries moving towards AI, whether we're going to have cyber attacks or manipulate data to have a cyber war, to have these computers and these networks take on a kind of sort of semi-consciousness that says, I want to do these things to this country, whether it's shutting down power plants because it sees a threat to AI, or it decides to launch a nuclear weapon, that's where warfare becomes crazy. And we have no type of treaties other than nuclear, biological, and, like, basically kinetic warfare. We have no treaties based on AI and no treaties based on cyber warfare, really, at a worldwide level. There are some countries that have their own little small treaties about cyber warfare. But when we start talking about AI, we get into a very dangerous place, because we have taken that level of control out of humans and put it into artificial intelligence.

Michael Silva44:29

Yeah. I mean, Adam, I've got some thoughts on that for you. I don't know if you want to hear them or not.

Adam Roth44:32

I definitely do.

Michael Silva44:34

Okay.

Joe Patti44:35

Yeah. We're headed towards last call here, even though we're not drinking. So yeah, let's make the really frightening stuff how we wrap it up. Definitely want to hear what you think.

Michael Silva44:44

Well, in fairness, we started with information warfare. So Adam, there's a couple of bullet points, just to kind of keep it succinct. First is just personal experience. I'm not a warfare expert by any stretch, but the people that sign those treaties are the people that are going to abide by them. It's not those people I'm worried about. It's the people who have access to the same tools, and they take AES-256 and they create Mujahideen Secrets 2. Those guys don't give a shit about a treaty. They're using the same technology, but they're using it in really nefarious ways. So I feel like the threat level in general is just going to be constant. And I think that the amount of information that is now going to be generated, with the ease of these tools, is increasing. I think the likelihood of this information being poisoned, like you said, is really high. And I think it's getting higher because these tools are getting easier to use. So, you know, really, from a cybersecurity standpoint, the message that I would have for all of our listeners today, and really for anybody who's in this field, is: make sure that you trust where you're getting your information from. That's really what it comes down to. We're pivoting towards a place where information will start or stop wars. It starts and stops financial aid from coming and going to countries. You know, it changes public opinion for things like elections and other things like that. Whether or not that information is true is in large part due to the volume. And nation-state actors have figured this out, mass posting on public forums like Reddit, you know, with bots and things like that. This is not new, propaganda is not new, right? It's just a lot easier now. So the question that I would pose is, where are you getting this information from? Can you trust the tool that brought you that information? Because if not, I think there's real risk to just saying, oh, I read it, therefore it must be true.

Adam Roth46:28

But trust is subjective also, Michael.

Michael Silva46:31

Of course, yeah, it just depends on where it's coming from. Yeah, I agree.

Joe Patti46:35

Yeah, but you know, it's funny how, getting to the end here, we end up really, again, at the basics and the fundamentals of security: that it's about data, it's about trust, it's about who you're talking to. It really still comes down to that. And it seems like it always comes down to that, no matter what the new technology is. All right. Well, this has been an unexpectedly heavy discussion, but we really got into it. That was very cool. Michael, thank you so much for joining. This has really been interesting and a lot of fun.

Michael Silva47:11

Yeah. It's a real pleasure. I really appreciate the invite. Great questions, by the way, from you and Adam. We're really just scratching the surface of this, so, you know, I'd love to talk about it further. But if any listeners have any questions, I'm sure I'll be happy to answer them for them. It's a real pleasure to talk with you today. Absolutely. Thanks, Adam.

Adam Roth47:27

My final last words are, as we look to the future, we're going to see things like autonomous vehicles driving on their own, trying to understand whether or not there's a possible collision or things like that, drone deliveries trying to establish where to land and how to deliver things, a lot of AI in that. And then what's going to be really scary, my final last words are, when you start having offices call you for your doctor's appointment, and it's not a human, and they're asking you for information to validate who you are, and you're talking to an AI bot, and you don't know if that bot is real or not. So these are gonna be crazy things that'll be coming out in the next couple of years.

Joe Patti48:13

Okay, Michael, Adam, thank you both. Thanks everyone for listening. And oh, don't forget, like, subscribe, follow. Send us feedback, we'd love to talk to you. And have a good day, everyone.