Top Large Language Model Readings of the Week: Applications in Politics, Healthcare, Economics, and More



As the power and reach of large language models (LLMs) continue to grow, so too does their potential to reshape various fields, from healthcare to central banking. This week’s recommended readings provide valuable insights into how LLMs are applied across diverse sectors, including politics, medical record analysis, quantitative economics, and more. These carefully curated articles and papers explore real-world applications, technical innovations, and the implications of LLM usage in complex environments, making them essential for anyone interested in the current and future role of AI-driven language models.

1. The Politics of AI: An Evaluation of Political Preferences in Large Language Models from a European Perspective

In this insightful report, the author addresses the limitations in understanding AI's political tendencies by conducting a unique set of experiments. He prompted 24 top-performing language models to generate detailed, open-ended answers to politically sensitive questions. This analysis offers a fascinating look into the responses generated by these models, shedding light on potential biases and tendencies in AI language generation. If you're curious about the political dimensions of AI, this report provides essential findings and context (Centre for Policy Studies).

2. Quantitative Economics with Deep Learning

We argue that deep learning provides a promising avenue for taming the curse of dimensionality in quantitative economics. We begin by exploring the unique challenges posed by solving dynamic equilibrium models, especially the feedback loop between individual agents’ decisions and the aggregate consistency conditions required by equilibrium. Following this, we introduce deep neural networks and demonstrate their application by solving the stochastic neoclassical growth model. Next, we compare deep neural networks with traditional solution methods in quantitative economics. We conclude with a survey of neural network applications in quantitative economics and offer reasons for cautious optimism (Galo Nuño).
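To make the core idea concrete, here is a toy sketch of approximating a growth-model policy function with a small neural network. It uses the special case of the neoclassical growth model with log utility and full depreciation, where the savings policy has the known closed form k' = αβk^α, so the network can be trained against exact targets. The network size, learning rate, and training setup are illustrative assumptions, not details from the paper.

```python
import numpy as np

# Toy sketch: fit the savings policy of the neoclassical growth model with a
# one-hidden-layer neural network. In the special case used here (log utility,
# full depreciation) the exact policy is k' = alpha * beta * k^alpha, which we
# use to generate training targets. All hyperparameters are illustrative.
rng = np.random.default_rng(0)
alpha, beta = 0.36, 0.96

k = rng.uniform(0.05, 0.5, size=(256, 1))   # capital levels (inputs)
k_next = alpha * beta * k**alpha            # exact policy (targets)

H = 16                                      # hidden units
W1 = rng.normal(0, 1.0, (1, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 1.0, (H, 1)); b2 = np.zeros(1)

def forward(x):
    h = np.tanh(x @ W1 + b1)
    return h, h @ W2 + b2

lr = 0.05
losses = []
for step in range(2000):
    h, pred = forward(k)
    err = pred - k_next
    losses.append(float(np.mean(err**2)))
    # Backpropagation by hand for the two layers (gradient of mean squared error).
    g = 2 * err / len(k)
    gW2 = h.T @ g; gb2 = g.sum(0)
    dh = (g @ W2.T) * (1 - h**2)            # tanh'(z) = 1 - tanh(z)^2
    gW1 = k.T @ dh; gb1 = dh.sum(0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

print(f"MSE: {losses[0]:.4f} -> {losses[-1]:.6f}")
```

In one dimension this is overkill, but the same recipe scales to the high-dimensional state spaces where grid-based methods break down, which is the paper's motivation.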

3. CB-LMs: Language Models for Central Banking

We introduce central bank language models (CB-LMs), specialised encoder-only language models retrained on a comprehensive corpus of central bank speeches, policy documents and research papers. We show that CB-LMs outperform their foundational models in predicting masked words in central bank idioms. Some CB-LMs not only outperform their foundational models, but also surpass state-of-the-art generative Large Language Models (LLMs) in classifying monetary policy stance from Federal Open Market Committee (FOMC) statements. In more complex scenarios, requiring sentiment classification of extensive news related to the US monetary policy, we find that the largest LLMs outperform the domain-adapted encoder-only models. However, deploying such large LLMs presents substantial challenges for central banks in terms of confidentiality, transparency, replicability and cost-efficiency (BIS).
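The masked-word evaluation mentioned above can be sketched in a few lines: the model must fill the masked slot in a central-bank idiom, and we score how often the gold word appears among its top guesses. The idioms and "model predictions" below are made up for illustration; a real run would come from an encoder-only model's fill-mask output.

```python
# Hedged sketch of masked-word evaluation on central-bank idioms.
# Sentences, gold words, and ranked guesses are illustrative, not real model output.
examples = [
    # (sentence with mask, gold word, model's ranked guesses)
    ("The committee decided to keep rates on [MASK].", "hold",
     ["hold", "track", "target"]),
    ("Inflation expectations remain well [MASK].", "anchored",
     ["anchored", "contained", "behaved"]),
    ("The Bank stands ready to act as lender of last [MASK].", "resort",
     ["resort", "instance", "chance"]),
    ("Policy will remain [MASK] for an extended period.", "accommodative",
     ["restrictive", "accommodative", "unchanged"]),
]

def top_k_accuracy(examples, k):
    """Fraction of idioms whose gold word is among the model's top-k guesses."""
    hits = sum(gold in guesses[:k] for _, gold, guesses in examples)
    return hits / len(examples)

print(f"top-1: {top_k_accuracy(examples, 1):.2f}")   # 3/4 = 0.75
print(f"top-3: {top_k_accuracy(examples, 3):.2f}")   # 4/4 = 1.00
```

The point of the benchmark is that domain idioms like "lender of last resort" are easy for a model retrained on central-bank text and surprisingly hard for general-purpose ones.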

4. LLMD: A Large Language Model for interpreting Longitudinal Medical Records

We introduce LLMD, a large language model designed to analyze a patient's medical history based on their medical records. Along with domain knowledge, LLMD is trained on a large corpus of records collected over time and across facilities, as well as tasks and labels that make nuanced connections among them. This approach is critical to an accurate picture of patient health, and has distinctive advantages over models trained on knowledge alone, unlabeled records, structured EHR data, or records from a single health system.

The recipe for LLMD continues pretraining a foundational model on both domain knowledge and the contents of millions of records. These span an average of 10 years of care and as many as 140 care sites per patient. LLMD is then instruction fine-tuned on structuring and abstraction tasks. The former jointly identify and normalize document metadata, provenance information, clinical named entities, and ontology mappings, while the latter roll these into higher-level representations, such as the continuous era of time a patient was on a medication. LLMD is deployed within a layered validation system that includes continual random audits and review by experts, e.g., based on uncertainty, disease-specific rules, or use case.
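The abstraction step described above can be illustrated with a small interval-merging routine: individual medication mentions extracted from records are rolled up into continuous eras. The data, field names, and 90-day gap tolerance are assumptions for illustration, not details from the paper.

```python
from datetime import date, timedelta

# Toy sketch of rolling medication mentions into continuous eras.
# The 90-day gap tolerance is an assumed parameter, not one from the paper.
GAP = timedelta(days=90)   # mentions closer than this are treated as one era

def medication_eras(mentions):
    """mentions: list of (drug, date) pairs -> {drug: [(start, end), ...]}"""
    by_drug = {}
    for drug, d in mentions:
        by_drug.setdefault(drug, []).append(d)
    eras = {}
    for drug, dates in by_drug.items():
        dates.sort()
        spans = [[dates[0], dates[0]]]
        for d in dates[1:]:
            if d - spans[-1][1] <= GAP:
                spans[-1][1] = d           # extend the current era
            else:
                spans.append([d, d])       # long gap: start a new era
        eras[drug] = [tuple(s) for s in spans]
    return eras

mentions = [
    ("metformin", date(2019, 1, 10)),
    ("metformin", date(2019, 3, 2)),
    ("metformin", date(2020, 6, 1)),   # >90-day gap, so a second era begins
    ("lisinopril", date(2019, 2, 14)),
]
print(medication_eras(mentions))
```

The hard part LLMD handles, of course, is extracting and normalizing those mentions from messy records in the first place; the rollup itself is the easy final step.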

LLMD exhibits large gains over both more-powerful generalized models and domain-specific models. On medical knowledge benchmarks, LLMD-8B achieves state-of-the-art accuracy on PubMedQA text responses, besting orders-of-magnitude larger models. On production tasks, we show that LLMD significantly outperforms all other models evaluated, and among alternatives, large general-purpose LLMs like GPT-4o are more accurate than models emphasizing medical knowledge. We find strong evidence that accuracy on today's medical benchmarks is not the most significant factor when analyzing real-world patient data, an insight with implications for future medical LLMs (arXiv).

5. Measuring short-form factuality in large language models

We present SimpleQA, a benchmark that evaluates the ability of language models to answer short, fact-seeking questions. We prioritized two properties in designing this eval. First, SimpleQA is challenging, as it is adversarially collected against GPT-4 responses. Second, responses are easy to grade, because questions are created such that there exists only a single, indisputable answer. Each answer in SimpleQA is graded as either correct, incorrect, or not attempted. A model with ideal behavior would get as many questions correct as possible while not attempting the questions for which it is not confident it knows the correct answer. SimpleQA is a simple, targeted evaluation for whether models “know what they know,” and our hope is that this benchmark will remain relevant for the next few generations of frontier models (OpenAI).
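The three-way grading scheme described above can be turned into metrics in a few lines: overall accuracy penalizes abstention, while accuracy conditioned on attempted questions rewards a model that abstains when unsure. The grades below are made-up illustrative data, and the metric names are my own labels for the two quantities.

```python
# Minimal sketch of scoring under SimpleQA-style grades: each answer is
# "correct", "incorrect", or "not_attempted". Grades here are illustrative.
grades = ["correct", "correct", "incorrect", "not_attempted",
          "correct", "not_attempted", "incorrect", "correct"]

def simpleqa_metrics(grades):
    n = len(grades)
    correct = grades.count("correct")
    attempted = n - grades.count("not_attempted")
    return {
        "overall_correct": correct / n,                  # abstaining costs you here
        "correct_given_attempted": correct / attempted,  # rewards calibrated abstention
    }

m = simpleqa_metrics(grades)
print(m)   # overall: 4/8; given attempted: 4/6
```

The tension between the two numbers is the point: a model can raise "correct given attempted" by answering only when confident, which is exactly the "knows what it knows" behavior the benchmark probes.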

This week’s selections offer a glimpse into the transformative power of large language models across different fields, each application bringing its own unique challenges and breakthroughs. Whether you're interested in enhancing political data analysis, streamlining medical records, improving economic forecasting, or understanding LLM factuality in dynamic environments, these readings provide an informative foundation. As LLMs continue to evolve, keeping up with these applications will be crucial in understanding the broader implications of AI on society. Stay tuned for more insights and stay ahead in this rapidly advancing landscape.

Feel free to follow me on X
