In recent years, large language models (LLMs) such as Claude, DeepSeek, Llama, and other transformer-based models have emerged as powerful tools in chemistry, enabling new approaches to scientific discovery. While many chemists, from undergraduate students to seasoned researchers, find these AI models intriguing, they often lack the background needed to integrate such tools into their daily research.
Large Language Models for Chemists breaks down that barrier by demystifying how LLMs work in an accessible way and showing, step by step, how they can be applied to solve real chemistry problems. Written in a friendly, tutorial style, the book assumes only a basic background in chemistry and minimal programming experience. It begins by gently introducing artificial intelligence and machine learning concepts in lay terms, building up to the inner workings of LLMs without heavy math. Readers will learn how these models "think" and generate text, gaining an intuitive understanding of concepts like neural networks, transformers, and training data through analogies and simple diagrams. Crucially, each concept is reinforced with chemistry-focused examples: from understanding chemical nomenclature and reactions as a "language" to exploring how an LLM can suggest synthetic routes or explain spectral data.
Beyond theory, this book emphasizes practical application. Each chapter includes hands-on tutorials and case studies that invite readers to experiment with real tools. Using open-source libraries (such as RDKit for cheminformatics and standard Python machine learning frameworks), readers will walk through projects like predicting molecular properties with the aid of an LLM, generating novel compound ideas, analyzing research papers, and even using an LLM as a conversational chemistry assistant. For example, one case study guides the reader in using an LLM to mine a chemistry literature database and then write Python code to analyze reaction trends, mirroring cutting-edge research where LLMs assist in code generation and data mining for chemical discovery.
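To give a flavor of the hands-on style described above, here is a minimal sketch (not taken from the book) of the kind of small cheminformatics exercise a beginner might start with: computing a molecular weight from a chemical formula in plain Python. The chapters themselves use richer tooling such as RDKit; the element list and function name here are illustrative assumptions, not the book's code.

```python
import re

# Approximate standard atomic masses (g/mol) for a few common elements.
# Illustrative subset only; a real tool would cover the full periodic table.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999, "N": 14.007, "S": 32.06}

def molecular_weight(formula: str) -> float:
    """Compute the molecular weight of a simple formula like 'C6H12O6'.

    Parses element symbols (one capital letter, optional lowercase letter)
    followed by an optional count, and sums the corresponding masses.
    """
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if symbol:  # skip the empty match the regex yields at the end
            total += ATOMIC_MASS[symbol] * (int(count) if count else 1)
    return total

print(round(molecular_weight("C6H12O6"), 2))  # glucose
```

An exercise like this illustrates the book's premise that chemical notation is itself a small "language" amenable to parsing, before the reader graduates to SMILES strings and LLM-assisted property prediction.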
List of contents
1. Introduction - AI's Evolving Role in Chemistry
2. How to Start with Data-Driven Chemistry?
3. Foundations of AI and Tools for Chemists
4. Large Language Models in Chemistry
5. Literature and Knowledge Mining with LLMs
6. Generative Models for Molecule and Materials Design
7. LLMs and Automation
8. Ethical Considerations and Future Perspectives
About the author
Zhiling "Zach" Zheng is an Assistant Professor of Chemistry at Washington University in St. Louis, where he directs the Deep Synthesis Lab. His group combines artificial intelligence and automation to accelerate the discovery of porous materials for sustainability and human health. In addition to investigating fundamental aspects of metal-organic framework (MOF) synthesis and new structures, he explores how large language models can aid data mining, reaction and material design, and synthesis planning.
Before joining WashU, Dr. Zheng was a BIDMaP Fellow at UC Berkeley's Department of Electrical Engineering and Computer Sciences from 2024 to 2025 and a postdoctoral researcher in the MIT Department of Chemical Engineering from 2023 to 2024 under the supervision of Professor Klavs Jensen. He earned his Ph.D. in Chemistry at the University of California, Berkeley, in 2023, working in Professor Omar Yaghi's laboratory on MOFs for atmospheric water harvesting. He holds a B.A. in Chemistry, summa cum laude, from Cornell University (2019), where he worked with Professor Kyle Lancaster.
Dr. Zheng's contributions have been recognized with the 2025 Carbon Future Young Investigator Award and the Inflection Award for AI-driven climate solutions. He was also a finalist for the Dream Chemistry Award.