The Oracle Database + AI: Global Strategy Announcement
- 7 May 2024
Oracle hosted a groundbreaking online event with Juan Loaiza and Larry Ellison that you won't want to miss - the Oracle Database + AI: Global Strategy Announcement.
In today's rapidly evolving business landscape, leveraging cutting-edge technologies like AI is crucial for staying competitive. Oracle, a global leader in database solutions, is at the forefront of this revolution, seamlessly integrating AI into its database offerings to empower businesses like never before.
Subtitles for the video
Kendall Fisher: 0:09 Hi, everyone, welcome to the global launch of Oracle Database 23ai. To kick off this monumental release, we've got Juan Loaiza, Executive Vice President of Mission-Critical Database Technologies, and a special guest, our founder and CTO, Larry Ellison. So without further ado, over to you, Juan.
Juan Loaiza: 0:32 Thanks, Kendall. And thanks, Larry, for joining me today for a discussion about Oracle Database 23ai. This is a game-changing release for the industry. We have over 300 major new features in Oracle Database 23ai, and thousands of enhancements. There are three main focus areas that we looked at: one was AI, another is developers, and the other is mission critical. Today we're going to discuss some of the major, game-changing features in this release. So let's start with AI. Probably the most important feature in this release for AI is AI Vector Search. Just to give everyone a little bit of background, AI Vector Search is a new capability that allows the database to search data by its content. So for example, you can search for documents by their content - not the words in them, but the overall content and concepts in them. You can search for images, you can search for videos, you can search for audio, and you can also search relational data by its conceptual content. Your thoughts on AI Vector Search, Larry?
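To make the idea concrete, here is a rough sketch of what a content-based similarity query might look like from an application using the python-oracledb driver. The table, columns, and embedding vector are hypothetical, and the SQL is only illustrative of the vector-distance-style search described here, not copied from the product documentation.

```python
# Illustrative sketch only: similarity search over document embeddings stored
# in Oracle Database 23ai via the python-oracledb driver. Table and column
# names are invented; the ORDER BY VECTOR_DISTANCE(...) query mirrors the
# content-based search described above - verify exact syntax in the docs.
import array

def find_similar_docs(connection, query_embedding, top_k=5):
    """Return the top_k documents whose stored embeddings are closest to
    query_embedding, i.e. the documents most similar in content."""
    vec = array.array("f", query_embedding)  # 32-bit float vector bind value
    with connection.cursor() as cur:
        cur.execute(
            """
            SELECT doc_id, title
              FROM documents
             ORDER BY VECTOR_DISTANCE(embedding, :qv, COSINE)
             FETCH FIRST :k ROWS ONLY
            """,
            qv=vec,
            k=top_k,
        )
        return cur.fetchall()
```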
Larry Ellison: 1:41 Well, converting everything to vectors - converting an image, converting a string of base pairs of DNA - reducing everything to a vector allows the database to actually find similar pictures. So if I want to find all the pictures of Juan Loaiza in my album, I can do that. Actually, I tried that yesterday. Juan, I don't have any pictures of you. But I confirmed that with Oracle's vector search, the system allows us, as you say, to search the contents of the data and find similarities. We've had vectors, vector searches, and similarity searches for a very long time - vectorization is not one of the newer things - but now it's all stored in the database. We can convert virtually anything: language, for example - that's how the large language models work, they convert language into vectors and look at phrases that are similar. They can find phrases that are identical or almost identical; that's how they track down plagiarism, by the way. Getting away with plagiarism is now virtually impossible, because I can do a vector search, and it doesn't have to be an exact match of what you wrote - if it's very similar to what you wrote, the vector search will find it. This interesting ability to convert data into vectors and then find vectors that are similar to one another is what they call being close in the vector space.
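As a toy illustration of what "close in the vector space" means, here is a small self-contained example with made-up embedding vectors; a real system would obtain them from an embedding model applied to text, images, or DNA sequences.

```python
# Toy example of "close in the vector space": cosine similarity between
# embeddings. The three vectors are made up for illustration.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

original   = [0.90, 0.10, 0.30]   # embedding of an original paragraph
paraphrase = [0.85, 0.15, 0.32]   # near-identical wording -> nearby vector
unrelated  = [0.10, 0.80, 0.05]   # different topic -> distant vector

print(cosine_similarity(original, paraphrase))  # close to 1.0: likely the same content
print(cosine_similarity(original, unrelated))   # much lower: different content
```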
Juan Loaiza: 3:23 Larry, what are your thoughts on vectors? Should these be delivered as a standalone product, or should they be delivered as a feature of a database?
Larry Ellison: 3:34 I think it's a feature. You know, there were XML databases - I'm old enough to remember XML databases. Every time there was a new data type, a new feature, someone thought it was a great opportunity to very quickly develop and deliver a new kind of database. XML is maybe my favorite example - it didn't last very long. We had object databases, we have relational databases. But the answer, I think, is always that all of your data should be in one place. It just makes life much easier to ask a question, a query that spans data types, because you can't anticipate the queries you'd like to ask. The hardest problem usually is finding the data. But if you keep all the data in one database, that requires that that database handle every data type, whether it's a graph data type, whether it's a vector, whether it's XML, whether it's a JSON document, whether it's a SQL table. So we think the right way to approach this problem is to have a database that can manage all of your data, and do it in a highly performant and very economical way.
Juan Loaiza: 4:58 It's interesting new technology. And, you know, other areas are finding anomalies, which can be things like fraud, or broken parts in a factory, things like that. So it has a lot of different applications. And of course, one of the main ones is to combine it with LLMs, with AI chat. So now you can basically do natural-language AI chat with the data in your database. For example, you can ask it questions about your bank account, or your phone account, or your medical history, and the Oracle database will take that question and use AI Vector Search to find relevant data in the database - your medical records, your account data, your phone records - and then we can answer the question using the LLM, again in natural language. So that's a big new thing.
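A minimal sketch of the flow described here - embed the question, use vector search to retrieve the relevant private records, then let an LLM answer in natural language. The helper functions embed(), vector_search(), and ask_llm() are hypothetical placeholders, not part of any specific product API.

```python
# Hedged sketch of retrieval-augmented chat over your own data: the question
# drives a vector search, and the retrieved rows become context for the LLM.
# embed(), vector_search(), and ask_llm() are hypothetical placeholders.
def answer_with_my_data(question: str) -> str:
    q_vec = embed(question)                       # text -> embedding vector
    records = vector_search(q_vec, top_k=5)       # most relevant rows from the database
    context = "\n".join(r.text for r in records)  # e.g. medical, account, or phone records
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return ask_llm(prompt)                        # natural-language answer from the LLM
```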
Larry Ellison: 5:46 Well, they call that retrieval-augmented generation. We can do that a couple of ways. We can do it with prompt engineering, where you actually include some of the data before you ask your question - you give the large language model some context before you ask your question - but that doesn't scale very well. Specializing and personalizing large language models with your data is something you can do with the Oracle vector database. So you have a foundational model that is trained, but it doesn't have your banking records, and it doesn't have your personal email - that's not public data that these large language models are trained on. But you can specialize these large language models with your personal data; that's one of the things we can do. You can specialize these large language models with the latest clinical research papers that have come out in the last month or so, and make sure the foundational model is up to date - up to the minute - because the very latest news is in the vector database. The other stuff was available when the foundational model was initially trained.
Juan Loaiza: 7:09 Yeah, so that combination of the database and the LLM is going to be amazing for users - a lot of amazing new capabilities that were never possible before. We've also added AI all over the place: we've added Exadata acceleration for AI, we've added AI to GoldenGate, so you can distribute your vectors across your entire enterprise. You can pull data into Oracle Database 23ai that you want to vectorize and search on without having to upgrade your production database right away. So there's a lot of new capabilities. Do you want to describe what our pricing philosophy is for this AI technology inside the Oracle database?
Larry Ellison: 7:49 As you and I discussed before we made the decision, we think this is a transformational technology, artificial intelligence. And we think that the Oracle database - the Oracle vector database - is a highly scalable, very powerful, highly secure system that can store your personal data, highly secure data, medical record data, those kinds of things, keep that data very, very secure, and use that data to make the neural network smarter. We think that is so important that we want many people to see this capability and use it as quickly as possible. Therefore, as my mother would say, it comes with the meal: there is no extra charge for these AI capabilities. They're built in to 23ai, the latest version of our database, and everyone can use them.
Juan Loaiza: 8:49 So it's just there. AI is the new normal - it's just going to be everywhere, and we're making sure it's available for all your data. So let's move on to another topic: developers. Another big focus for us is making developers more productive and making data easier to use by developers. A big part of that was the unification of JSON, graph, and relational. We've had JSON and graph available in the Oracle database for a lot of years - we have industry-leading technologies - but up to now you had to choose one. You formatted your data as either relational, JSON, or graph, and once you'd chosen that format, you got all the benefits of that format, but you also got all the limitations and drawbacks of that format. The big new thing in 23ai is that we've unified these. Now you can store your data once, and then one part of the app can treat it as relational and access it using SQL; another part of the app can treat it as JSON, both reading and writing it as JSON using JSON APIs, so you get all the benefits of JSON; and another part can treat it as a graph and run graph queries. So we think this is a huge development, not just for us, but for the entire industry, and it will greatly simplify development and make developers much more productive. You've been around this whole issue of representing objects, which has been a problem for databases for decades, and you've seen the evolution of this thing. What are your thoughts on this new technology?
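As a rough sketch of what that unification looks like to a developer, the snippet below defines a duality view over a relational table and then reads the same data both as rows and as JSON documents. The table, view, and column names are invented, and the CREATE JSON RELATIONAL DUALITY VIEW statement paraphrases the 23ai feature rather than quoting its exact syntax, so treat it as illustrative only.

```python
# Illustrative only: one copy of the data, two access paths. Names are made
# up and the duality-view DDL is a paraphrase of the 23ai feature; check the
# documentation for the exact syntax before using it.
DUALITY_VIEW_DDL = """
CREATE JSON RELATIONAL DUALITY VIEW orders_dv AS
  SELECT JSON {'orderId'  : o.order_id,
               'customer' : o.customer_name,
               'total'    : o.total_amount}
    FROM orders o WITH INSERT UPDATE DELETE
"""

RELATIONAL_QUERY = "SELECT order_id, total_amount FROM orders WHERE total_amount > :t"
JSON_QUERY = (
    "SELECT data FROM orders_dv "
    "WHERE JSON_VALUE(data, '$.total' RETURNING NUMBER) > :t"
)

def both_views_of_same_data(connection, threshold):
    """One part of the app reads rows with SQL, another reads JSON documents
    through the duality view - same underlying data either way."""
    with connection.cursor() as cur:
        cur.execute(RELATIONAL_QUERY, t=threshold)
        rows = cur.fetchall()
        cur.execute(JSON_QUERY, t=threshold)
        docs = cur.fetchall()
    return rows, docs
```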
Larry Ellison: 10:19 Well, I can just tell you from our own experience in building applications. The first thing people decide is: am I going to build fundamentally an object schema, or am I going to build fundamentally a relational schema - you know, two-dimensional arrays? And there seemed to be no right answer, because at Oracle we built some of our applications with primarily object schemas, which we had to manage at the application layer, and we built some of them with underlying relational schemas, and then we'd do a mapping between the objects and the relations, because a lot of the transactions, a lot of the data entry, is much more convenient with JSON objects. Plus, people don't like to completely define their relational schema before they start programming. A lot of programmers resist SQL - I mean, NoSQL used to mean no SQL; now it means not only SQL, because they decided they really didn't want to give up SQL entirely. But a lot of programmers want to get started building their application and start collecting information as quickly as possible, and they don't want to work out in advance what their schema is going to be. So what we did is let you define your JSON objects, and we generate the relational schema from that. The beauty of that is you can then query all your data, which is very hard to do if you actually use an object database - having a powerful query language is not something object databases come with, and it is fundamental to relational. So we have all the power of a SQL query language sitting on top of the objects, and you can view them as objects, as JSON objects, or as relations; they coexist. You do not have to predefine the schema - you're a programmer, you can get started writing your application. But the trouble people get into when they write their application without predefining the schema is that when they want to make a change, when they want to evolve that schema, add things to it, modify things, it's very difficult to do with an object-like schema. With an underlying relational schema, schema evolution is quite easy. So there are advantages to objects, and there are advantages to relations, to the two-dimensional tables. With Oracle Database 23ai you get both. You get all the advantages of objects - from your point of view, you can think of it entirely as an object database - and afterwards, if you say, wait a second, I need a powerful query language and I need to do some schema evolution: no problem. It really is both. You get the best of both worlds with this unification, and it's completely seamless. Some users can think of it one way, and somebody else thinks of it the other way. It all works.
Juan Loaiza: 13:13 Yeah. So this is a problem that we've been trying to crack for decades, and we feel like we're there now. This is it. And actually, once you figure it out, it's much simpler than some of the things that we tried in the past. It's interesting.
Larry Ellison: 13:27 Oh, it's all automated. We used to have to do this at the application level - it was never completely transparent to move from objects to relations; the application had to be aware of it. Now that's not true. The application developer, the users, can see objects if they want to, or flip a switch and see that same data as relations. They can operate through both sets of
Juan Loaiza: 13:54 APIs, right? And graph is yet another benefit: you can run these graph queries to navigate the data. Right?
Larry Ellison: 14:00 Yeah, graph is yet another example of a model that we've incorporated. So we really have unified three separate models in 23ai, right.
Juan Loaiza: 14:10 And it's not even application by application, it's use case by use case. So you can use it in one format, then use it in a different format.
Larry Ellison: 14:18 Oh, exactly. So you can build all your data-entry applications using the JSON schema, and you can build all your analytic applications using SQL tables and the SQL language. And those two different use cases - same application, same data, same everything, just different use cases: analytics versus data entry.
Juan Loaiza: 14:40 Yeah, that's great. So you mentioned NoSQL. In the last decade there was a big push for NoSQL and also for getting rid of transactions - transactions were considered at best an annoyance and at worst kind of a problem - and so there was a move to push transaction management into the application. Now, one of the things we've done with Oracle Database 23ai is address some of the long-standing problems with transactions, because transactions did have some issues. Among those are stateless transactions, like REST transactions: how do we read data and write it back and make sure that nobody wrote it while I was looking at it? Another one is long-running transactions: we don't want to lock a row for a long time, which blocks out other users. And another one is microservices: how do you keep the microservices consistent? These are areas that we put a lot of work into. I think that trend toward moving transactions into the application is really kind of dying out, because people have been trying it and it's not working so well. So now I think we really get back to data integrity - the whole idea of data integrity, of data consistency, is something a lot of people have lost track of. Your thoughts on this?
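One of the problems named here - read data, hand it to a stateless client, then write it back only if nobody changed it in the meantime - is classically handled with an optimistic version check. Below is a minimal, generic sketch of that pattern with hypothetical table and column names; it illustrates the problem being solved rather than 23ai's specific stateless-transaction features.

```python
# Generic optimistic-concurrency sketch for the stateless "read it, edit it
# offline, write it back" case. Table and column names are hypothetical.
def read_account(cursor, account_id):
    cursor.execute(
        "SELECT balance, version FROM accounts WHERE id = :id",
        id=account_id,
    )
    return cursor.fetchone()  # (balance, version) handed to the stateless client

def write_account(cursor, account_id, new_balance, expected_version):
    cursor.execute(
        """
        UPDATE accounts
           SET balance = :b, version = version + 1
         WHERE id = :id AND version = :v
        """,
        b=new_balance,
        id=account_id,
        v=expected_version,
    )
    if cursor.rowcount == 0:
        # Someone else wrote the row while we were looking at it:
        # re-read the current state and retry instead of clobbering it.
        raise RuntimeError("write conflict - data changed since it was read")
```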
Larry Ellison: 15:53 Well, again, it's a matter of rediscovery. Transactions can have complexities, certainly long-running transactions holding locks for too long. You know, Oracle has always had a unique locking model - we have update locks, but we really don't have read locks, which is very unusual; I'm not going to spend a lot of time talking about it, but it already solves a lot of long-running transaction problems. Still, there were certain cases where long-running transactions were problems with the Oracle database as well, and we've addressed that. And by the way, by moving this into the application, you're simply saying: rather than solving the problem once in the database, so the application developers don't have to worry about it at all, every time you write an application you have to handle data consistency in the application. That makes your application almost impossible to write, and it makes your application ten times harder to read. It's not really a very good idea. The more you can push down in the stack, the better, to ensure you've got data consistency. Same thing with security: you want to push that down. You don't want the application developer to be responsible for security, and you don't want the application developer to be responsible for data consistency - that should be the responsibility of the database. The application developer focuses on getting the job done, building that application, and inherits from below, from the database: security, consistency, reliability. All of that should be done at the database level, not the application level.
Juan Loaiza: 17:30 Yeah - the idea where the integrity of your data is only as good as your worst developer on his worst day. What do you think of that idea, Larry? Is that a good plan?
Larry Ellison: 17:40 Well, we're a very unusual hyperscaler. We provide infrastructure, things like Autonomous Linux and the Oracle Autonomous Database, and we also build a lot of applications - whether it's in healthcare, or ERP, or human resources - across a large number of different industries. And again, I think you want those application developers to be highly productive, so you don't want to burden them with the problem of getting data consistency right. Plus, you really don't want to have to audit every single application to make sure data consistency is programmed properly in the application. That should be something you simply inherit from the underlying infrastructure, from the underlying database. Let's keep the application developers focused on what the application does, not on underlying problems like data consistency.
Juan Loaiza: 18:40 So here's another related issue where new technology has come around: caching in the middle tier - caching the database in the middle tier. There are a number of products now in the market that cache data, and the way this works is that the application developer reads the data from the back-end tier, puts it manually into the cache, and then manages its consistency in the cache. So this is kind of what's in the market. One of the features that we've added in Oracle Database 23ai is called True Cache, and what this is, very simply, is a real cache, a true cache, not a faux cache. We have a cache in the middle tier, and the developer just runs the regular database operations against it - queries, SQL, JSON. If it finds data that's not in there, it will automatically pull the data from the database; you don't have to do that manually. And once it pulls the data in, the database will automatically keep it consistent, so you don't have to worry about, hey, somebody changed the data, now I have a stale copy and I've got to deal with it. So again, a very similar thing. What are your thoughts on this whole area of caching in the middle tier?
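For contrast, here is a sketch of the manual middle-tier caching pattern described above, where the application itself copies rows into the cache and carries the consistency burden. The cache and db objects are hypothetical stand-ins for a cache client and a database connection; the point is the bookkeeping that True Cache is meant to remove, not any particular product's API.

```python
# Manual middle-tier caching, as commonly hand-rolled in applications: the
# app pulls rows into the cache and must keep them consistent itself.
# `cache` and `db` are hypothetical stand-ins for a cache client and a
# database connection.
def get_customer(cache, db, customer_id):
    row = cache.get(f"customer:{customer_id}")
    if row is None:                                   # cache miss: the app does the work
        row = db.query_one("SELECT * FROM customers WHERE id = :id", id=customer_id)
        cache.set(f"customer:{customer_id}", row, ttl=60)
    return row                                        # may already be stale

def update_customer(cache, db, customer_id, fields):
    db.update("customers", customer_id, fields)
    # The application must remember to invalidate every cached copy itself;
    # miss one, and readers keep seeing stale data.
    cache.delete(f"customer:{customer_id}")
```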
Larry Ellison: 19:48 Well, I think what people were doing at the application level, building their own middle-tier caches, was really just the market telling us they wanted better performance. You know, someone once joked that the three ways to improve performance are caching, caching, and caching. I think that's a bit of a simplification, but there's an element of truth in it. So having this mid-tier cache for the database is very important - but once again, it shouldn't be done at the application layer. The application should just expect the database to provide that kind of performance by keeping a consistent cache for it in the middle tier. Developers should get the improved performance without making any extra effort in their application, and that's what we've done by providing True Cache.
Juan Loaiza: 20:38 Yeah, basically it's impossible for the application to keep it consistent - you can't, even if you try - because you don't know when someone's changing the data in the back end. And your data goes forward and backward in time, because part of the data you loaded earlier and part later, and it's changed. Going forward and backward in time in the application - that's a hard problem.
Larry Ellison: 20:59 It is a hard problem. But a lot of the things they take on at the application layer are really a cry for help: okay, look, I just need this performance, so I'm going to try to do it this way in the application because you're not providing it for me. So we watched what was going on in the marketplace, and we realized we could dramatically improve Oracle's performance from where we were. The idea of a mid-tier cache was a good one - but let's make sure we do it properly, in the database, and not burden the application developer with it.
Juan Loaiza: 21:32 Yeah, that's exactly right. So let's talk about one more thing, which is globally distributed data. There are more and more countries passing data sovereignty, data residency laws. That means that a country's data has to be stored inside the country. However, if you're an enterprise, you want to see the data as one global unit, regardless of what country it's in, so you can run your operations on it. For that, we've introduced something we used to call a sharded database; now it's called a globally distributed database. It has two main functions. One is you can place your data wherever you want in the world - European data goes in Europe, Indian data goes in India, American data goes in America, everybody's happy, and it's all transparent. The other thing, of course, is you get much higher scalability. This is a technology we've been working on for a long time, and we've enhanced it a lot in 23ai. We've added something called Raft replication, which allows much faster failover - single-digit-second failover when you have some kind of an issue. So what are your thoughts on this whole data regulation, data sovereignty thing? Presumably it's only going to increase over time.
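As a toy illustration of the data-placement side (not of Oracle's actual sharding mechanics, which are transparent to the application), the sketch below routes each row to the region its country's residency rules require. Region names and connect_to_region() are hypothetical.

```python
# Toy data-residency routing: each customer row is written to the region its
# country's law requires, while the application keeps one logical view.
# Region names and connect_to_region() are hypothetical.
RESIDENCY_POLICY = {
    "SE": "eu-stockholm",   # Swedish data stays in Sweden/EU
    "IN": "ap-mumbai",      # Indian data stays in India
    "US": "us-ashburn",     # American data stays in the United States
}

def region_for(country_code: str) -> str:
    try:
        return RESIDENCY_POLICY[country_code]
    except KeyError:
        raise ValueError(f"no residency rule configured for {country_code!r}")

def insert_customer(customer: dict) -> None:
    region = region_for(customer["country"])
    connection = connect_to_region(region)   # hypothetical per-region connection helper
    with connection.cursor() as cur:
        cur.execute(
            "INSERT INTO customers (id, name, country) VALUES (:1, :2, :3)",
            [customer["id"], customer["name"], customer["country"]],
        )
    connection.commit()
```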
Larry Ellison: 22:41 So we keep the illusion of a single database, even though it is not a single database - it's partitioned geographically, and there are also partitions that exist for reliability purposes and performance purposes. You know, there are caches, and there are lots of copies of that data. But again, automatically, we adhere to the different data sovereignty laws in every country. The Swedish laws might be somewhat different from the Ukrainian laws, and we implement the requirements of what data has to be in Sweden and what data has to be in Ukraine - we implement that all automatically. From the point of view of the organization that owns the data and is running the application, it looks like a unified global database; but from the point of view of regulatory compliance, we partition it and obey all of the rules. So once again, we're trying to make life easier for people who are developing these applications and have to build global applications yet comply with local sovereignty laws.
Juan Loaiza: 23:58 Yeah. So overall, we've talked about a number of these technologies, and they're not just better - they do things that were never possible before: the AI Vector Search, the LLMs, the ability to do transactions across microservices, long-running, distributed; things like True Cache; the distributed database we talked about; the JSON unification with relational and graph. These aren't just additional features. These are things that have been pain points for the whole data management industry for decades, forever, and we're addressing them in this release. It's really, I believe, monumental. So what do you think of the overall package?
Larry Ellison: 24:36 Well, with the overall package, again, we solve a lot of problems. The object-relational unification is, I think, one of the biggest things we've ever done at the logical level of the database - one of the most important things we've ever done in terms of ending the debate: should it be a relational schema, or should it be an object schema? The answer is simply yes, it needs to be both at the same time, because that debate never ended. But now you can have the best of both worlds, and we can actually deliver that. AI, as you say, is this extraordinary new technology. And, you know, these models are trained on all the data on the internet - all the language on the internet, and lots and lots of images as well; it's not just language that they train on. But they're not trained on your personal data, they're not trained on your corporate data, and they're not trained on a lot of medical records, because that data is private. The great thing about our vector database is that it allows you to supplement the training of the foundational model - whether it's ChatGPT from OpenAI, or Grok, or Llama, or what have you - with your personal data, with your proprietary corporate data, with private data, without having to disclose that data to the people who are building those models. So you can specialize the AI for your company, for you personally, for a particular topic like a certain type of cancer, and make sure you've got all of that data in a single place, so the model is smarter than it otherwise would have been, because you can safely add your private data to it. That data should be anonymized - the patient identifiers won't be there if you're collecting a lot of cancer data - but the anonymized data will produce much better recommendations to doctors about how to treat a cancer patient. Having that data there means we've got better recommendations on how doctors treat cancer, and I think that's a very big deal. Or on Wall Street, much better recommendations on what stock to buy next. You know, we're focused on healthcare at Oracle, building healthcare applications, but there are lots of applications for this technology.
Juan Loaiza: 27:16 And all the new transaction capabilities, transparent caching, you know, distributed data. So I think it's not just another release - it's really kind of a monumental milestone for the industry.
Larry Ellison: 27:26 No, absolutely. I mean, AI - I remember I gave a presentation a little while ago asking whether AI is the most important technology in the history of IT, and I think I said maybe, we'll find out very soon. And the answer is probably yes.
Juan Loaiza: 27:41 Maybe transistors and AI - those are the two big things. Okay, this has been a great conversation, Larry. Thank you.
Kendall Fisher: 27:51 Thank you so much, Juan and Larry - what an exciting conversation. And hey, for all of you tuning in, if you want to learn more about Oracle Database 23ai, go to oracle.com/database. Bye!