Artificial Intelligence & Machine Learning - Verilogue

VERICORPUS

Verilogue’s A.I. Training Database (ASR & NLP applications)

Since 2006, Verilogue has recorded, transcribed, and archived over 150,000 healthcare-based conversations between patients and their physicians in the exam room. Covering more than 150 disease states across more than 50 physician specialties, we have amassed the largest healthcare dialogue database in the world. Our entire dataset, “The VeriCorpus,” has been structured for the development of new machine learning and artificial intelligence applications. The VeriCorpus offers anonymous audio files paired with timestamped verbatim transcripts, along with a series of data points from the patient chart, for model development and training purposes.

The building blocks of Natural Language Processing & Automated Speech Recognition programs start here.

Over 1,000,000 Minutes of Audio

Each audio package is transcribed verbatim and timestamped to its associated audio (mp3 / mp4) file.

Across 12 Countries and 7 Languages

Conversations are captured in the natural language of participants, and a verbatim transcription and translation are available for each recording.

Over 50 Physician Specialties

We have worked with physicians all over the world to capture and upload both simple and complex patient conversations.

Publications & Medical Journals Featuring Verilogue Data:

Discussing Out-of-Pocket Expenses During Clinical Appointments: An Observational Study of Patient-Psychiatrist Interactions

High out-of-pocket expenses for medical treatment have been associated with worse quality of life, decreased treatment adherence, and increased risk of adverse health outcomes. Treatment of depression potentially has high out-of-pocket expenses. Limited data characterize psychiatrist-patient conversations about health care costs.

Finding Needles in the Right Haystack: Double Modals in Medical Consultations

While naturally-occurring double modals have been exceedingly rare in sociolinguistic interviews, our study represents the very first corpus investigation of double modals through a search of the right ‘haystack’: the nationwide Verilogue, Inc database of recorded and transcribed physician-patient interactions (~85 million words). As a vast source of potentially face-threatening negotiations, the Verilogue corpus provides the ideal speech situation in which to search for low frequency, non-standard syntactic features like the double modal.

Disordered Thought, Disordered Language: A corpus-based description of the speech of individuals undergoing treatment for schizophrenia

The characteristics of patient speech are used in clinical settings to make assumptions about the thought processes of people with psychotic disorders such as schizophrenia. However, there have not been any studies of the language of people with schizophrenia that present evidence drawn from a large group of speakers. This study employs a combination of quantitative and qualitative methods to determine whether 140 medicated individuals diagnosed with schizophrenia exhibit the linguistic abnormalities claimed in the literature.