The World of Language Models in Healthcare
Unlocking the Future: How Language Models are Revolutionizing Healthcare
Section 1: What Makes LLMs Possible?
1. Learning on Its Own - Self-supervision
How is this possible? It starts with a simple idea: self-supervised training, a learning approach in which a machine learning model learns without relying on explicitly labeled examples. Instead, the model generates its own training objective from the input data, removing the need for human-annotated data, which can be time-consuming and expensive to produce.
A common type of self-supervision is an autoregressive training objective, in which the model is trained to predict the next word or token in a sequence, given the previous words or tokens. The objective is to maximize the likelihood of the correct next token given the context.
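To make this concrete, here is a minimal sketch of the autoregressive objective in PyTorch. The toy model (an embedding layer plus a linear layer) is a stand-in for a real LLM; the key point is that every position is trained to predict the token that follows it.

```python
# A minimal sketch of the autoregressive objective, assuming PyTorch and a
# toy vocabulary; the "model" here is a stand-in, not a real LLM.
import torch
import torch.nn as nn

vocab_size = 100
embed_dim = 32

# Toy "language model": embedding -> linear layer producing next-token logits.
model = nn.Sequential(nn.Embedding(vocab_size, embed_dim),
                      nn.Linear(embed_dim, vocab_size))

tokens = torch.randint(0, vocab_size, (1, 16))  # one sequence of 16 token ids
logits = model(tokens)                          # shape: (1, 16, vocab_size)

# Shift by one: position t predicts token t+1. Maximizing the likelihood of
# the correct next token is equivalent to minimizing this cross-entropy loss.
loss = nn.functional.cross_entropy(
    logits[:, :-1, :].reshape(-1, vocab_size),  # predictions for positions 0..14
    tokens[:, 1:].reshape(-1)                   # targets: tokens at positions 1..15
)
loss.backward()  # gradients nudge the model toward better next-token predictions
```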
2. Building Language Skills - Transformers
Training in this manner is often the first stage in training LLMs (the "pre-trained" in generative pre-trained transformer, or GPT) and helps the model learn language structure, grammar, and semantics. The transformer is a deep neural network architecture designed to efficiently capture relationships and dependencies between elements in a sequence, such as words in a sentence.
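The core mechanism that lets transformers capture these dependencies is self-attention. The sketch below shows a simplified single-head version in PyTorch; real transformers add learned query/key/value projections, multiple attention heads, residual connections, and feed-forward layers.

```python
# A minimal sketch of scaled dot-product self-attention, the heart of the
# transformer; simplified (no learned projections, single head).
import math
import torch

def self_attention(x):
    """x: (seq_len, d_model). Every position attends to every other position,
    which is how the transformer relates words across a sentence."""
    d = x.size(-1)
    q, k, v = x, x, x                        # in practice, learned projections of x
    scores = q @ k.T / math.sqrt(d)          # pairwise relevance between positions
    weights = torch.softmax(scores, dim=-1)  # each row sums to 1
    return weights @ v                       # weighted mix of all positions

x = torch.randn(5, 8)     # 5 tokens, 8-dimensional embeddings
out = self_attention(x)   # (5, 8): each token is now informed by full context
```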
Section 2: Training and Tuning LMs for Medical Applications
3. Getting Better at Specific Jobs - Tuning
Tuning refers to the process of adapting a pre-trained LLM, such as GPT, to perform well on a specific task or domain. This involves training the model on a smaller labeled dataset that is specific to the target task. (During this process, the model’s weights and parameters are updated using pertinent examples to optimize its performance on the target task.)
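As an illustration, here is a hedged sketch of this kind of tuning using the Hugging Face Transformers library. The model and dataset names are generic placeholders chosen for illustration, not anything specific to a medical setting.

```python
# A sketch of task-specific tuning with Hugging Face Transformers; the model
# and dataset below are illustrative placeholders.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)
from datasets import load_dataset

model_name = "distilbert-base-uncased"  # a small general-purpose pre-trained model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# A small labeled dataset specific to the target task (here, sentiment labels).
dataset = load_dataset("imdb", split="train[:1000]")
dataset = dataset.map(lambda ex: tokenizer(ex["text"], truncation=True), batched=True)

# Fine-tuning updates the pre-trained weights using task-specific examples.
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="tuned-model", num_train_epochs=1),
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```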
The GPT-3 model was trained on 45 terabytes of text data comprising roughly 500 billion tokens (1 token is approximately 4 characters, or three-fourths of a word, for English text). In comparison, OpenAI has not publicly disclosed the size of GPT-4’s training data, though it is widely believed to be substantially larger.
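The "roughly 4 characters per token" rule of thumb is easy to check with OpenAI's open-source tiktoken tokenizer. Note that the encoding used below (cl100k_base) is from the GPT-3.5/GPT-4 era; GPT-3 used an earlier encoding, so the exact count here is only illustrative.

```python
# Checking the characters-per-token rule of thumb with the tiktoken library.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
text = "Language models are revolutionizing healthcare."
tokens = enc.encode(text)

# Characters per token; averages around 4 for typical English text.
print(len(text) / len(tokens))
```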
4. Learning to Follow Instructions - Instruction Tuning
Instruction tuning refers to a kind of tuning in which training is done on a dataset containing pairs of instructions and the corresponding desired outputs or responses.
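Here is a minimal sketch of what such a dataset looks like. The template is illustrative (projects such as Stanford Alpaca use a similar "### Instruction / ### Response" format): instruction-response pairs are rendered into plain text, and the model is trained on these strings with the same next-token objective as before.

```python
# A minimal sketch of instruction-tuning data; the examples and template are
# illustrative, loosely following the Stanford Alpaca format.
examples = [
    {"instruction": "Summarize the patient's chief complaint.",
     "response": "The patient reports three days of chest pain on exertion."},
    {"instruction": "List common side effects of metformin.",
     "response": "Nausea, diarrhea, and abdominal discomfort are most common."},
]

TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n{response}"

# Training on these rendered strings teaches the model to produce the desired
# response when it sees an instruction.
training_texts = [TEMPLATE.format(**ex) for ex in examples]
print(training_texts[0])
```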
Section 3: Medical LLMs in Action
5. Medical Language Models - Helping Doctors
Although general-purpose LLMs can perform many medically relevant tasks, they have not been exposed to medical records during self-supervised training, and they are not specifically instruction tuned for any medical task. Medical records can be viewed as sequences of time-stamped clinical events represented by medical codes and textual documents, which can serve as the training data for a language model. Wornow et al reviewed the training data and the kind of self-supervision used by more than 80 medical language models and found two categories. https://www.nature.com/articles/s41746-023-00879-8
6. Medical Documents
First, there are medical LLMs that are trained on documents. The self-supervision takes the form of learning to predict the next word in a textual document, such as a progress note or a PubMed abstract, conditioned on the prior words seen. These models are therefore similar in their anatomy to general-purpose LLMs (eg, GPT-3) but are trained on clinical or biomedical text.
Researchers at the Center for Research on Foundation Models at Stanford University created a model called Alpaca that, with 4% as many parameters as OpenAI’s text-davinci-003, matched its performance and cost roughly $600 to create. https://github.com/tatsu-lab/stanford_alpaca
7. Medical Codes
Second, there are medical LLMs that are trained on the time-stamped sequence of medical codes in a patient’s entire record. Here, the self-supervision takes the form of learning the probability of the next day’s set of codes, or learning how much time elapses until a certain code is seen. As a result, the sequence and timing of medical events across a patient’s entire record are considered.
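A hedged sketch of the underlying data structure: a patient record as a time-ordered sequence of (timestamp, code) events, from which self-supervised targets such as "what comes next, and when" can be derived. The ICD-10 codes below are illustrative, not any specific model's schema.

```python
# An illustrative representation of a patient record as time-stamped events.
from datetime import date

patient_record = [
    (date(2020, 3, 1), "I10"),     # ICD-10: essential hypertension
    (date(2021, 6, 14), "E11.9"),  # ICD-10: type 2 diabetes
    (date(2023, 1, 2), "I63.9"),   # ICD-10: cerebral infarction (stroke)
]

# Self-supervised targets come directly from the record itself, eg,
# "given the events so far, which code appears next, and after how long?"
for (t0, code0), (t1, code1) in zip(patient_record, patient_record[1:]):
    gap_days = (t1 - t0).days
    print(f"after {code0}: next code {code1} in {gap_days} days")
```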
As a concrete example, given the code for “hypertension,” these models learn when a code for a stroke, myocardial infarction, or kidney failure is likely to occur. When provided with a patient’s medical record as input, such models output not text but a machine-understandable “representation” of that patient, referred to as an “embedding”: a fixed-length, high-dimensional vector representing the patient’s medical record. Such embeddings can be used to build models for predicting 30-day readmissions, long hospital lengths of stay, and inpatient mortality using less training data (as few as 100 examples). https://www.sciencedirect.com/science/article/pii/S1532046420302653?via%3Dihub
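To illustrate that last step, here is a minimal sketch in Python: random vectors stand in for the embeddings a pre-trained medical LLM would produce, and a simple logistic regression is fit on 100 labeled examples, echoing the readmission example above.

```python
# A sketch of the downstream use of patient embeddings; the vectors here are
# random stand-ins for embeddings from a pre-trained medical LLM.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# 100 labeled patients, each represented by a fixed-length 768-dim vector.
X = rng.normal(size=(100, 768))   # patient embeddings (simulated here)
y = rng.integers(0, 2, size=100)  # 1 = readmitted within 30 days (simulated)

# Because the embedding already encodes the record, a simple classifier
# trained on as few as 100 examples can suffice.
clf = LogisticRegression(max_iter=1000).fit(X, y)

new_patient = rng.normal(size=(1, 768))      # embedding of a new patient's record
print(clf.predict_proba(new_patient)[0, 1])  # predicted readmission risk
```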