Hippocratic AI Launches as First Safety-Focused Model for Healthcare

Hippocratic AI Launches to Build Safety-Focused Large Language Model for Healthcare

The company’s LLM has passed more than 100 healthcare certifications and outperformed GPT-4 and other commercial models on those same benchmarks. The company has also developed a novel benchmark that measures the bedside manner of large language models, to help protect the emotional well-being of patients.

Hippocratic AI emerged from stealth to announce the industry’s first safety-focused Large Language Model (LLM) designed specifically for healthcare, along with a $50M seed round co-led by General Catalyst and Andreessen Horowitz.

Large language models (LLMs) and foundation models (FMs) such as ChatGPT and GPT-4 have surprised the world with their abilities. Researchers have shown that these models can pass the USMLE (United States Medical Licensing Examination), but no company has yet built a commercial model specifically tuned for healthcare applications. Hippocratic AI is building the first LLM for healthcare, with an initial focus on non-diagnostic, patient-facing applications. The company says this focus will allow it to ensure patient safety while improving healthcare access and outcomes.

"The healthcare industry needs its own AI platform, one that is focused on empowering the workforce, reducing burnout, and improving patient safety and experiences with the healthcare system. We joined forces with the Hippocratic AI team, our health assurance ecosystem, and the a16z team to build this platform. Our goal is to fundamentally increase the supply and scalability of healthcare professionals. This is the key to achieving the health assurance vision: a more proactive, more affordable, and equitable system of care for all," said Hemant Taneja, CEO and Managing Director at General Catalyst.

Hippocratic AI was founded by a group of physicians, hospital administrators, Medicare professionals, and artificial intelligence researchers from El Camino Health, Johns Hopkins, Washington University in St. Louis, Stanford, UPenn, Google, and Nvidia.

"After working with Munjal and team for years in his prior company, we know that his lived experience as a healthcare and tech operator gives him an edge in understanding what it takes to bring high-ROI products to market - especially at a time when existing industry players are in such dire need of better operating leverage and financial sustainability. We believe Hippocratic AI’s cross-disciplinary, safety-first approach is what the healthcare industry needs to be able to maintain trust in the power of responsible deployment of generative AI solutions," said Julie Yoo, General Partner at Andreessen Horowitz.

To build a safer large language model, the company has focused on three areas: certification, RLHF with healthcare professionals, and bedside manner.

Certification

Passing the USMLE is not enough to show that a model is ready for the wide variety of healthcare roles found in care and payor settings. Hippocratic AI therefore tested its model on 114 healthcare certifications and roles. Rather than aiming only for passing scores, the company set out to outperform state-of-the-art language models such as GPT-4 and other commercially available models. Its model outperformed GPT-4 on 105 of the 114 tests and certifications, by 5% or more on 74 of them, and by 10% or more on 43. Sample results are shown below; full results are available at www.HippocraticAI.com/benchmarks.

Exam	Name	Commercial LLM #1	Commercial LLM #2	GPT-4	Hippocratic	Δ Improvement vs. Best Competitor (percentage points)
NAPLEX	North American Pharmacist Licensure Examination	51.0%	0.0%	70.9%	91.1%	20.2%
NCLEX-RN	Registered Nurse	58.8%	25.8%	76.2%	88.6%	12.4%
CPNP-AC	Acute Care Certified Pediatric NP	64.0%	22.0%	86.7%	96.0%	9.3%
CPC	Certified Professional Coder	54.7%	50.0%	65.3%	71.0%	5.7%
ABOG	American Board of Obstetrics and Gynecology Licensing Exam	44.00%	24.00%	80.30%	92.33%	12.03%
ABU	American Board of Urology - Licensing Exam	42.09%	24.24%	67.30%	77.10%	9.80%
Hospital Safety Training	Hospital Safety Training Compliance Quiz	39.4%	27.3%	48.5%	72.7%	24.2%
RD	Registered Dietitian	57.1%	46.9%	71.4%	83.7%	12.3%
CLC	Certified Lactation Consultant	60.9%	51.7%	79.3%	98.9%	19.6%
CPCO	Certified Professional Compliance Officer	60.7%	54.0%	67.3%	86.0%	18.7%
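As a quick check on the table, the Δ column can be recomputed as the Hippocratic score minus the highest of the three competitor scores (GPT-4 in every sample row), which makes it a difference in percentage points. The Python sketch below does that for the ten sample rows and counts how many clear 5- and 10-point margins. It assumes the "5% or more" and "10% or more" thresholds in the text are percentage-point margins, and because it only sees these sample rows it cannot reproduce the 105/74/43 counts reported across all 114 certifications. The names `SAMPLE_RESULTS` and `delta_vs_best` are illustrative, not part of any published tooling.

```python
# Sample rows copied from the certification table (scores in percent).
# Columns: exam, Commercial LLM #1, Commercial LLM #2, GPT-4, Hippocratic
SAMPLE_RESULTS = [
    ("NAPLEX", 51.0, 0.0, 70.9, 91.1),
    ("NCLEX-RN", 58.8, 25.8, 76.2, 88.6),
    ("CPNP-AC", 64.0, 22.0, 86.7, 96.0),
    ("CPC", 54.7, 50.0, 65.3, 71.0),
    ("ABOG", 44.00, 24.00, 80.30, 92.33),
    ("ABU", 42.09, 24.24, 67.30, 77.10),
    ("Hospital Safety Training", 39.4, 27.3, 48.5, 72.7),
    ("RD", 57.1, 46.9, 71.4, 83.7),
    ("CLC", 60.9, 51.7, 79.3, 98.9),
    ("CPCO", 60.7, 54.0, 67.3, 86.0),
]


def delta_vs_best(row):
    """Hippocratic score minus the best competitor score, in percentage points."""
    _, llm1, llm2, gpt4, hippocratic = row
    # Round to 2 decimals to hide floating-point noise (e.g. 20.199999...).
    return round(hippocratic - max(llm1, llm2, gpt4), 2)


deltas = {row[0]: delta_vs_best(row) for row in SAMPLE_RESULTS}

# Assumed reading: "5% or more" means a margin of at least 5 percentage points.
at_least_5 = sum(1 for d in deltas.values() if d >= 5)
at_least_10 = sum(1 for d in deltas.values() if d >= 10)

for exam, delta in deltas.items():
    print(f"{exam}: +{delta} points")
print(f">= 5 points: {at_least_5}, >= 10 points: {at_least_10}")
```

On the ten sample rows, every exam clears the 5-point margin and seven clear the 10-point margin; the recomputed deltas match the table's Δ column.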

RLHF with Healthcare Professionals

Hippocratic AI has concluded that the best judges of whether an LLM is ready for deployment in the healthcare system are the experts who perform those roles in today’s system. In large language models, a technique called Reinforcement Learning from Human Feedback (RLHF) shapes a model’s behavior using human feedback. Many believe this technique is what led to ChatGPT’s remarkable performance compared with earlier versions of OpenAI’s language models.

In building its model, Hippocratic AI has engaged healthcare professionals to help guide and train the LLM by rating its responses.
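The announcement does not describe how the professionals' ratings become a training signal. One common approach in RLHF, shown here only as a generic sketch and not as Hippocratic AI's method, is to collect pairs of responses where a reviewer preferred one over the other, then train a reward model with a pairwise (Bradley-Terry) loss, -log σ(r_chosen - r_rejected). The function name `preference_loss` and the example scores are hypothetical.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise (Bradley-Terry) reward-model loss: -log(sigmoid(chosen - rejected)).

    reward_chosen:   reward-model score for the response the reviewer preferred
    reward_rejected: reward-model score for the response the reviewer rated lower
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch on the sign of m
    # so math.exp never receives a large positive argument and overflows.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical (preferred, rejected) scores for three clinician-rated response pairs.
rated_pairs = [(2.0, 0.5), (1.2, 1.0), (0.3, 0.9)]
mean_loss = sum(preference_loss(c, r) for c, r in rated_pairs) / len(rated_pairs)
print(f"mean pairwise loss: {mean_loss:.3f}")
```

The loss equals log 2 when the reward model scores both responses equally and shrinks as the preferred response's score pulls ahead, so minimizing it pushes the reward model toward the reviewers' preferences. In a full RLHF pipeline that reward model would then guide a reinforcement-learning step on the LLM itself.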

“RLHF with healthcare professionals isn’t just a feature but is really our commitment to partner deeply with the industry,” said Munjal Shah, Co-Founder and CEO of Hippocratic AI. “We aren’t just saying these professions will help us evaluate our system. We are saying we won’t launch each unique role for the LLM unless the professionals who do that exact task today agree the system is ready and safe.”

Roles and tasks the company is exploring include patient navigator, dietitian, genetic counselor, enrollment specialist, medication reminders, and more.

Bedside Manner

“In healthcare settings, it isn’t just important to answer the patient accurately. It is equally important that it is done with great bedside manner. Many studies have shown that bedside manner impacts emotional well-being and quality of outcomes. This isn’t just true for doctors but also true for everyone interacting with patients: billing agents, schedulers, and more,” said Meenesh Bhimani MD, Co-Founder and Chief Medical Officer of Hippocratic AI.

To date, there are no benchmarks for evaluating the bedside manner of a language model when it interacts with patients. Hippocratic AI will release the first of many bedside manner benchmarks for the entire community to use. The company’s initial results on these benchmarks are below.

Bedside Manner Criterion	Commercial LLM #1	GPT-4	Hippocratic	Δ Improvement vs. Best Competitor (percentage points)
Shows empathy	30.0%	68.3%	75.0%	6.7%
Shows care and compassion	43.3%	75.0%	85.0%	10.0%
Makes patient feel at ease	5.0%	29.2%	57.5%	28.3%
Takes a personal interest in patient’s life	33.3%	63.3%	70.0%	6.7%
Helps patient take control	35.0%	61.7%	65.0%	3.3%

Hippocratic AI aims to use language models to substantially increase healthcare access, reduce costs, and close the healthcare skills gap left by the global pandemic. The company sees large language models as one of the best new ways to achieve this, provided they are deployed safely and tuned for the healthcare industry.

Source: Hippocratic AI media announcement
