12/19/24 - Episode 7 - HyperionAI
Text Summary
Eric Snyder: Hello everyone and welcome to webinar number seven. I'm Eric Snyder, I'm the executive director of the University of Rochester Medical Center's Wilmot Cancer Institute and its technology and innovation team. Today I'm joined by Kevin Desousa, who is our principal developer for the Hyperion AI foundational platform we're about to show you.
We chose to present this platform next for several important reasons. Over the past six months especially, I've observed a concerning amount of misinformation circulating about AI platforms in healthcare. Many individuals in this space, while not intentionally misleading, lack the deep technical expertise necessary to build out and fully understand the complexities of these systems. Unfortunately, their misconceptions are often shaping the narrative in healthcare. I've heard frequently repeated claims about AI that are simply not true, and these misconceptions, while not malicious, can have significant consequences when they influence decisions in a space as critical as healthcare. This issue is compounded by the fact that we often lack the kind of interdisciplinary expertise needed to navigate these challenges. Without extensive domain knowledge in AI and development, as well as in AI ethics, the conversation can be dominated by individuals who, though they mean well, don't have the required depth of knowledge we're speaking about here, as well as by vendors who frankly have largely dismantled their ethics teams. A particularly troubling trend I've noticed in healthcare AI is the growing enthusiasm for adopting enterprise GPT systems under the belief that they are secure and HIPAA compliant.
I want to caution everyone against this adoption. When using these systems, you're still sending your data to third parties, often multiple ones, such as OpenAI and Microsoft. We've already seen breaches involving GPT systems, several to be precise, and the question becomes: are we truly willing to take such risks with protected health information? In our group, the answer is a resounding no. That's why everything Kevin is about to show you runs completely on premise. This approach eliminates the need for costly external architecture, keeps security local, and remains fully scalable to the enterprise.
So, before I hand it over to Kevin, I want to make one more note of the exceptional talent he and the team bring to this project. Having someone with his skill set embedded within an academic healthcare team is incredibly rare. As many of you are, I'm sure, aware, the diversity of knowledge and talent on this team is truly extraordinary. So, with that, I'll step off my soapbox and hand it over to Kevin, our brilliant colleague from the north. Kevin, take it away.
Kevin Desousa: Thank you so much, Eric. It feels great being brought back on this webinar series to do another talk.
With that in mind, I'll share my screen really quickly, and like Eric was saying, today I'm going to be talking about Hyperion AI: secure and scalable AI solutions that healthcare can trust. Before I get into it, I wanted to take a moment to go over what you can expect in today's presentation. First, we'll start with an overview and the general purpose of Hyperion AI: what is it, and what is it able to do? We'll talk about some of Hyperion AI's key features, move over to a demo of the system, and then talk about some key applications that we've used Hyperion AI for on our team. Then we'll cover some future enhancements and move on to discussion.
What is HyperionAI?:
So, the first question that comes to mind: what is Hyperion AI, or HAI for short? Hyperion AI is our vision for a platform that offers high-performance artificial intelligence in a secure, on-premise environment. With the recent rise of AI and these AI tech companies, we feel it's growing more important to emphasize the need to keep AI local. To achieve this, Hyperion AI offers an interface that matches the current industry standard, similar to what OpenAI offers. This allows our users to take tools and libraries that were originally designed with OpenAI in mind and, with one change, simply pointing them to Hyperion AI, use them with our software. Currently, we offer three different types of models on our system. First are chat models, or completions; this is the kind of model you typically interact with when you use something like ChatGPT, and it's by far the most popular endpoint users expect, so we offer it as well. Secondly, we offer transcriptions, a range of Whisper models allowing users to run secure, local transcriptions of audio files. And lastly, we support embeddings, a technology that transforms uploaded documents into a format that LLMs, or large language models, can actually process and understand. This enables what's also called retrieval augmented generation, for those familiar with the term.
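Because the interface mirrors OpenAI's API, existing tooling only needs its base URL repointed. A minimal sketch of that one-line change, using only the standard library to assemble (not send) the request; the hostname, endpoint path, and model name here are placeholders, not the team's actual deployment:

```python
import json
from urllib.request import Request

# Placeholder on-premise base URL -- the only change an OpenAI-targeted
# tool needs is to point here instead of https://api.openai.com/v1.
HYPERION_BASE_URL = "https://hyperion.example.org/v1"

def build_chat_request(model: str, user_message: str) -> Request:
    """Assemble an OpenAI-style /chat/completions request without sending it."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return Request(
        url=f"{HYPERION_BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("llama-3.1-8b", "Hi, how are you?")
print(req.full_url)  # the on-premise endpoint, nothing leaves the institution
```

A library or no-code tool built against OpenAI's API would make the same swap through whatever base-URL setting it exposes.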
Speaking of models, Hyperion AI is both future proof and backwards compatible in its AI model support. The first of these makes sense: you always want to be able to use the latest and greatest AI model technologies when they come out.
But it's the latter that we find to be so important: backwards compatibility. With these AI vendors, it's all too common that whenever they introduce new models, which is good, they also deprecate or remove previous iterations of their models.
And while this doesn't sound bad initially, it can actually cause issues for those developing apps or programs with these AI tools in mind. For every model that gets removed, you now have to go back to your program or app and spend the time, money, and personnel to update the model your program is targeting. And typically, when models update or new models get released, the prompt engineering, the prompts you use with these models, has to change, because what you were using before does not apply to the new models, and sometimes this can be a larger lift. Whereas with Hyperion AI, you get the choice of whether to upgrade to new models, keep old models, or do both.
In addition, we've designed Hyperion AI to scale to your hardware. We found in our experience that it's too common for institutions to have very large data centers with vast compute resources, yet still pay extreme amounts to AI vendors like OpenAI when they're capable of serving these models on premise. So, with Hyperion AI, we've designed it to take advantage of all these servers automatically so that you can serve your AI models quickly and efficiently. Now, it's important to mention that we're able to support all of what I just mentioned while being on premise. And this is critical because it helps you increase your security by not sending your data to these AI vendors who, although they claim HIPAA compliance, operate closed-source systems, so we have no way to actually verify that claim. As I mentioned before, it also decreases costs because you're using onsite compute that already exists; you're not giving money to external companies that have to pay for their own compute and make a profit on top of that. And lastly, you have greater privacy, because you know that your conversations, your chats, and your audio files are not being sent to some company whose data policy is unknown.
Features:
So, with all that being said, I wanted to give a quick demo of the Hyperion AI system. Since Hyperion AI is a more foundational, background technology, the application we chose to demo it with is one we've designed internally called Chat Hyperion AI. The goal of the system is to offer an interface similar to ChatGPT, but for Hyperion AI. As you can see with the interface here, I can come in and type a question, like "Hi, how are you?", and the model responds immediately. As you can see, the speed is quite fast due to our scaling technology. If you come up to the top left, you can see all the different models that we support, and this list can grow as much as you want with models available online. But the feature I think is the coolest is the document uploading capability. If I click this document uploading option and upload this social media guidelines document, taken from our social media page, I can let that process for a short second, and once the document is uploaded, it's made available to these AI models through Hyperion AI. Then I can come in and say something along the lines of "What are UR social media guidelines?", and once I hit enter, the model will immediately respond with the relevant information from the document. Over here, I also have a document listing laptop recommendations for new students, so I can ask "What are some good laptops for new students?", and again, our AI system is able to pull the relevant information, feed it to the LLM, and effectively generate a response.
And finally, I can type something along the lines of "Draft a social media post for Twitter outlining these laptops for new students," and it will take the relevant information we just pulled up, combine it with the social media document, and draft a post that could then be sent externally. This highlights the power of Hyperion AI: we're able to offer all these different models, so if one model isn't performing as it should for a use case, we can easily switch to one of the others available here, and we can intertwine that with document uploading to provide rich context.
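Under the hood, document uploading of this kind typically works by embedding document chunks as vectors, then retrieving the chunk nearest the query before handing it to the chat model. A toy sketch of that retrieval step; the 3-dimensional "embeddings" below are made up for illustration, where a real deployment would obtain them from the embeddings endpoint:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Standard cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve(query_vec: list[float], chunks: dict[str, list[float]], k: int = 1) -> list[str]:
    """Return the k chunk names whose vectors lie closest to the query."""
    ranked = sorted(chunks, key=lambda name: cosine_similarity(query_vec, chunks[name]),
                    reverse=True)
    return ranked[:k]

# Made-up toy vectors standing in for real embedding-model output.
chunks = {
    "social_media_guidelines": [0.9, 0.1, 0.0],
    "laptop_recommendations": [0.1, 0.9, 0.2],
}
print(retrieve([0.2, 0.8, 0.1], chunks))  # → ['laptop_recommendations']
```

The retrieved chunk's text is then prepended to the chat prompt, which is how a question about laptops ends up answered from the laptop document rather than the guidelines document.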
That being said, I wanted to take a second to move away from the demo and highlight some of the applications of Hyperion AI that we have running locally. The first application I wanted to highlight is EOS, and I actually gave a webinar talk on EOS a few webinars back, so if you're interested in more information, feel free to watch that video. To summarize EOS in one sentence, it's a multi-level data relationship platform. In this case, it's using Hyperion AI's models to allow for complex natural language queries. If you look at the video on the right, you can see with the search bar on top that, depending on what the user types, we query and filter the nodes on screen. This uses Hyperion AI to transform the user's natural language queries into SQL queries that can be processed by the database to show relevant information. The cool part about how the system was designed is that it was built with prompt injection in mind, making it impossible for prompts to be injected into the system or for malicious users to pull information that would otherwise not be accessible to them.
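One common way to achieve the kind of injection resistance Kevin describes is to validate model-generated SQL before it ever reaches the database. A hedged sketch of such a check; the table allowlist and rules here are illustrative, not EOS's actual implementation:

```python
import re

# Illustrative allowlist -- only these tables may appear in generated SQL.
ALLOWED_TABLES = {"nodes", "edges"}

def is_safe_query(sql: str) -> bool:
    """Accept only a single read-only SELECT over allowlisted tables."""
    statement = sql.strip().rstrip(";")
    if ";" in statement:                           # reject stacked statements
        return False
    if not re.match(r"(?i)^\s*select\b", statement):
        return False                               # read-only: SELECT only
    # Every table named after FROM or JOIN must be on the allowlist,
    # and at least one table must be referenced.
    referenced = re.findall(r"(?i)\b(?:from|join)\s+([A-Za-z_]\w*)", statement)
    return bool(referenced) and all(t.lower() in ALLOWED_TABLES for t in referenced)

print(is_safe_query("SELECT name FROM nodes WHERE kind = 'trial'"))  # True
print(is_safe_query("DROP TABLE nodes"))                             # False
print(is_safe_query("SELECT * FROM patients"))                       # False
```

A production system would typically layer this with a read-only database role, so even a query that slips past the text check cannot modify data or reach restricted tables.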
Another example we have is an application called Calliope, designed by Eric Snyder. This is a clinical transcription and analysis tool, and in this case it uses Hyperion AI's transcription model support to process audio files and turn them into transcriptions securely on premise. You can see with the screenshot on the right that, given an audio file, we're able to transform it into a line-by-line transcription and then generate sentiment analysis for each line, highlighting key parts of the transcript alongside the audio. So, just another example of how we're able to leverage the different models Hyperion AI offers, all on premise.
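To make the post-transcription step concrete, here is a toy illustration of tagging each transcript line with a crude keyword-based sentiment. The lexicons and scoring are purely illustrative stand-ins, not Calliope's actual method:

```python
# Tiny keyword lexicons -- purely illustrative, not Calliope's approach.
POSITIVE = {"great", "good", "improved", "better"}
NEGATIVE = {"pain", "worse", "tired", "concerned"}

def tag_sentiment(line: str) -> str:
    """Label a transcript line by counting lexicon hits."""
    words = {w.strip(".,!?").lower() for w in line.split()}
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

# A made-up transcript, as a Whisper-style model might return line by line.
transcript = [
    "The patient reports feeling much better this week.",
    "Still some pain in the mornings.",
    "Follow-up scheduled for next month.",
]
for line in transcript:
    print(tag_sentiment(line), "-", line)
```

In practice a chat model served by the platform could replace the lexicon lookup, with each line sent as a short classification prompt, but the per-line structure stays the same.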
And the last application I'm going to highlight today is Theia, a clinical trial matching tool designed and iterated on by Anna Brown. This app utilizes Hyperion AI to summarize trials based on existing patient criteria. Due to the large data set within Hyperion that contains patient and trial information, we're able to give this information to Hyperion AI using its chat models, have it parse through trials, determine key inclusion and exclusion criteria that might be relevant for a particular patient, and then make that available to a user. Now, it's important to note with this application that on this team we are all big fans of human-in-the-loop machine learning, meaning we don't have the machine make the final decision. We let the user make that decision at the end, but we use the machine to put all the information right in front of them. A final item I'll mention is that within Theia, we also use Hyperion AI to generate an AI explanation, which you can see at the very top there, to give a bite-sized snippet of all the information on the screen.
Future Enhancements:
So, before I end the talk for today, I wanted to take a second to go over some future enhancements of Hyperion AI and Chat Hyperion AI. First, what we're working on adding is model fine-tuning, which effectively allows users or researchers, given a model that already exists on Hyperion AI, to give it their data and train the model to be even better or more accurate on that specific corpus or data set. We think this is so important to offer on Hyperion AI because, while existing AI vendors do offer this support, you again have to send your data to their platforms, and some tasks require sending extremely sensitive information. So, we believe it's very important that we offer this on premise, so we can ensure these models are being trained with accurate and critical data while also keeping that data safe.
Secondly, we're also looking to offer a model upload webpage, a simple webpage that users with administrative permissions can come to. Let's say you're an AI researcher and you've created some cool new AI model; this will allow you to upload that model and make it available to your institution, so that other researchers can use it freely without having to jump through the hoops normally required to use those models.
Thirdly, we're planning on adding a moderation API. If you've used OpenAI's systems, it's the mechanism within ChatGPT that will say, "Hey, based on what you're inputting, I can't generate a response; maybe it's too insensitive." Along those lines, we want to offer that within Hyperion AI as well.
And then lastly, we're looking to offer multimodal models. So those are models that, aside from just generating text, can generate things like images as well.
Then, moving over to the Chat Hyperion AI side, we're looking to add PHI censoring. Given a user's inputted information, if we deem it too sensitive to be shared on a platform such as this, the system will automatically deny the request and alert the user: "Hey, maybe you shouldn't be uploading such sensitive information." We're also working on text-to-query through Hyperion. Since Hyperion captures such vast medical data, we're looking to see if we can use Hyperion AI to help generate queries for more novice researchers who are looking to query the system and could use some support with that.
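A minimal sketch of what such a PHI check might look like: flag inputs matching common identifier shapes and reject them before they reach the model. The patterns below are illustrative examples (SSN-shaped and MRN-shaped strings), not the planned implementation:

```python
import re

# Illustrative identifier patterns; a real PHI filter would be far broader.
PHI_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # SSN-shaped number
    re.compile(r"\bMRN[:\s]*\d{6,}\b", re.I),    # medical-record-number-shaped
]

def looks_like_phi(text: str) -> bool:
    """Return True if the input matches any sensitive-identifier pattern."""
    return any(p.search(text) for p in PHI_PATTERNS)

def submit(text: str) -> str:
    """Reject and warn instead of forwarding suspected PHI to the model."""
    if looks_like_phi(text):
        return "Rejected: this looks like sensitive information you shouldn't share here."
    return "Accepted"

print(submit("Patient SSN is 123-45-6789"))           # rejected
print(submit("What are UR social media guidelines?")) # accepted
```

Regex rules like these catch only well-formed identifiers; a deployed censor would likely combine them with a model-based classifier for free-text PHI such as names and addresses.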
And then lastly, auto-loading information from the web. And if you've used ChatGPT before, this will be fairly familiar for you, but this is allowing the model to actually go to the internet to pull relevant information should it deem it necessary to answer your specific question.
Well, with that being said, before I end it off today, I just wanted to highlight the group that I'm a part of: the Technology and Innovation Group. If you're interested in anything I talked about today, whether it be Hyperion AI or the team itself, feel free to scan any of these QR codes on screen now. Otherwise, thank you.
Eric Snyder: Thanks, Kevin. So, we have several questions. I just want to say really quickly, too, as the designer of Calliope, how easy and fast it is to use Hyperion AI as a foundational platform; it took less than a day for me to build Calliope because I'm just making API calls to Kevin's system. Some of these questions are in the same vein, so I'll combine a few. But let's start with:
Q/A Session:
Q: Does the platform log usage from the API routes?
A: It does, but that specific question is actually where we did some careful design work, because part of designing Hyperion AI, again, was making sure everything was secure and private. Within Hyperion AI, we log the user that made the request and the type of request, so whether it went to an audio model or a chat model. But by design, we don't log the request contents. That being said, if this were deployed at an institution where it was extremely critical, say when working with medical data, that the request contents also be logged, we can enable that. Currently, we have it disabled for user privacy.
Q: I'm in a different space and I saw this webinar through a post, but I could see this type of platform having impact where I am in manufacturing. Is this centered only for healthcare, or would this be easy to move to other industries?
A: That's a perfect question. With Hyperion AI, although we focus a lot on healthcare because that's the space we're currently working in, Hyperion has actually been created with other industries in mind. We've designed it purposely to be as portable as possible, so you could take it and plant it in whatever industry you find suitable. And especially with that model fine-tuning feature we're working on, we want to make it super easy for you to train these AI models for your industry. In manufacturing, for example, if you have a lot of manufacturing documents, we want it to be easy for you to upload those documents and have the models fine-tuned for your use cases.
Q: How are you controlling access to the platform and then to the applications you just showed that are using the integration?
A: That's a good question. It's actually kind of funny because it's multi-part. With Hyperion AI, gating is handled by a platform we created called Hyperion Author, or Hyperion Auth for short. Within Hyperion as a foundational platform, we're able to gate access to individual applications, but we've also added support to gate individual access to AI within those applications, so should a user not be allowed to, say, utilize AI support, they will be immediately barred from using it within any one of those applications.
Q: So, my center currently uses a GPT-like clone, but it's expensive and time-consuming and isn't on-premise anyway. So I appreciate this presentation even more, especially the on-premise comments. Can you give me an idea of how long it took you to build this out?
A: This is actually the funny part. Truthfully, as with any software, and as I'm sure many developers can attest, the initial hump in creating software is usually the quickest; it's the fine-tuning to make sure it works in certain scenarios, and the bug-checking, that takes longer. As far as the initial development of this platform with these three model types, excluding the bug fixing, we had it up in less than a month, I would say one to two weeks. More time gets spent making sure it's compatible with all the models out there. But the initial basics, where it could take in a request, send it to whatever model was deemed necessary, and respond, that was done within a week or two. More complexity came from bug fixing and scalability, making sure it adapts to several systems and load-balances evenly. That took a little more time.
Q: If I don't know programming, do you think I could figure this out? I think what we're referring to is the API platform.
A: It's actually funny that you mention that, because I believe yes, you can. The reason I say that is that a lot of tools, and I can't name them off the top of my head, have been created for OpenAI-styled frameworks and interfaces. And since we're using a framework that is extremely similar, almost drop-in compatible with these applications, as long as you point them to the system, you can take advantage of these no-code environments for things like using the chat models or the other AI models. Building on that, we've been working on updating the chat app because we want it to be the initial landing page for people who maybe don't have a programming background to interact with Hyperion AI. We want to be able to offer audio transcriptions, chatting, and eventually model fine-tuning there. Once we have that, we want it to be the main landing page, but in the meantime you can take external tools built for OpenAI's API and use them with the system.
Q: Is there a limit to how many queries I can enter, like with GPT?
A: So, this is where we think we're able to offer a bigger leg up compared to something like ChatGPT. With any model, you do run into issues with context window size: how many tokens, how many words, you can feed the model. But with OpenAI systems, unless you pay the subscription fees, which are growing more and more expensive, you are limited in how many tokens you can actually send to these models, whereas with this system, there is no cap. You're only limited by the model's capacity itself, and with newer models like Llama 3.2 or Llama 3.3, we're starting to see context window sizes of 128,000 tokens, which let you handle, say, an entire code base of raw code plus your additional queries on top of it and still stay within that window. So that's where I think we're able to offer a leg up compared to some of our competition.
Eric Snyder: Sounds good. So, I think a lot of these remaining questions are somewhat similar to each other, and we'll end up posting all of them on the website in the next couple of days. It usually takes us about three days depending on what's going on, but I think I'll stop there because we are way over. We try to keep these pretty short, but anything I've missed will be posted.
Any questions that I've missed will be posted on the website in the next three days, but thanks, Kevin, for showing the amazing platform. I think that foundational platform is going to be pretty awesome. It allows for such rapid development, as you saw with Calliope and Theia, and we have several more applications that are going to be using it in the next couple of months.
So I'll close it there. This will all be posted on the website in the next three days with any questions I didn't answer, and thanks everyone for joining.
Kevin Desousa: Thank you, everyone.