Drug Discovery: Difference between revisions

Latest revision as of 23:15, 12 April 2026

Drug discovery is the process by which new pharmaceutical compounds are identified, characterized, and developed from initial biological hypothesis to clinical candidate. It is the domain where molecular biology, pharmacology, organic chemistry, and clinical medicine converge — and it is a domain where the gap between scientific understanding and practical outcome has been, historically, more dramatic than in almost any other field of applied science.

The central fact about drug discovery is that it fails most of the time. The probability that a compound entering Phase I clinical trials eventually receives regulatory approval is approximately 10%. The probability that a compound identified in early discovery research reaches patients is closer to 1 in 10,000. These failure rates have not improved markedly over the past four decades despite enormous increases in mechanistic understanding, computational power, and the sophistication of screening technologies. Understanding why discovery fails so reliably is as important as understanding how it occasionally succeeds.
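
The two figures quoted above can be combined into a rough back-of-the-envelope estimate. The sketch below assumes, purely for illustration, that the two probabilities describe the same pipeline and multiply along it; the function name and that assumption are not from the article.

```python
# Approximate figures as quoted in the paragraph above.
p_phase1_to_approval = 0.10            # ~10% of Phase I entrants are approved
p_discovery_to_approval = 1 / 10_000   # ~1 in 10,000 early-discovery compounds


def implied_pre_phase1_pass_rate(p_overall: float, p_clinical: float) -> float:
    """Share of early-discovery compounds that must reach Phase I for the
    overall and clinical success rates to be mutually consistent.

    Illustrative assumption: p_overall = p_pre_phase1 * p_clinical.
    """
    return p_overall / p_clinical


rate = implied_pre_phase1_pass_rate(p_discovery_to_approval, p_phase1_to_approval)
print(f"Implied discovery-to-Phase-I rate: about 1 in {round(1 / rate):,}")
```

Under that assumption, the bulk of the attrition happens before any human is dosed, and the roughly 90% clinical failure rate applies only to the small fraction of compounds that survive that far.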

The History of How Drugs Were Actually Found

The received narrative of drug discovery presents it as an orderly progression from mechanistic understanding to therapeutic intervention: identify a disease pathway, find a molecular target in that pathway, design a compound that modulates the target, test it in cells, test it in animals, test it in humans. This rational drug design framework has been the organizing ideology of pharmaceutical research since the 1980s.

It is largely false as a historical description. Most of the drugs that changed medicine were found by other routes.

Willow bark, a natural source of salicylates, was used as a folk remedy for centuries, and aspirin itself had been on the market since 1899, before its mechanism (inhibition of cyclooxygenase enzymes) was elucidated in 1971 by John Vane, who received a Nobel Prize for explaining what the compound had been doing all along. Penicillin was found by Alexander Fleming through careful observation of a fungal contamination, not through a mechanistic hypothesis about bacterial cell wall synthesis; the mechanism of beta-lactam antibiotics was worked out decades after the drugs were in clinical use. The statins, among the most prescribed drugs in history, were discovered by Akira Endo by screening microbial fermentation products for HMG-CoA reductase inhibition, a mechanism he chose to target based on his understanding of cholesterol biosynthesis. This is closer to rational design, but it involved testing thousands of fungal extracts to find the active compound, a process that is more craft than algorithm.

The antidepressant revolution of the 1950s and 1960s was launched by compounds found through serendipity: iproniazid, developed as an antituberculosis agent, was observed to have mood-elevating effects in patients; imipramine, a tricyclic compound structurally related to the phenothiazine antipsychotics and initially tested as an antipsychotic, was found by Roland Kuhn to have antidepressant rather than antipsychotic effects. The mechanisms (monoamine oxidase inhibition in one case, monoamine reuptake inhibition in the other) were understood only later. The SSRIs that followed in the 1980s represented genuine rational design, but they succeeded partly because the mechanistic framework had been retrospectively constructed from the earlier serendipitous discoveries.

Target-Based Drug Discovery and Its Limitations

The dominant framework in pharmaceutical research since the 1990s has been target-based drug discovery: identify a protein (or nucleic acid, or pathway) causally involved in a disease, develop high-throughput screening assays for compounds that modulate that target, optimize hit compounds through medicinal chemistry, and advance optimized leads through preclinical development. This approach has advantages: it is systematizable, amenable to automation, and generates mechanistic understanding alongside the compound.

Its limitation is fundamental. A target that is causally involved in disease pathology in a cell line, or in a mouse model, may not be druggable in the pharmacological sense; the compound that modulates it may not reach its target at therapeutic doses in a human; and even if it reaches the target and modulates it, the disease may not respond as the cellular and animal models predicted.

This last failure mode (target validation failure, or the gap between the model and the disease) is responsible for a substantial fraction of late-stage clinical failures and constitutes the deepest problem in contemporary drug discovery. Alzheimer's disease has been a case study in target validation failure: the amyloid hypothesis, which posited that beta-amyloid plaques cause neurodegeneration and that clearing them would halt progression, generated a large investment in compounds that successfully cleared amyloid in humans. Most of the trials failed: patients with cleared amyloid plaques generally did not recover or stabilize significantly better than controls. The target was modulated; the disease, for the most part, was not. Whether this means the hypothesis is wrong, the targets were wrong, the intervention timing was wrong, or the patient populations were wrong remains an active and deeply contested empirical question.

Phenotypic Screening: The Return to Empiricism

The recognition that target-based discovery has structural limitations has driven a partial return to phenotypic screening: testing compounds for their effects on cells, tissues, or organisms without requiring advance specification of the molecular target. This is closer to how most historical drugs were actually found — a compound that produces a desired cellular effect is identified, and the mechanism is worked out afterward.

Phenotypic screening has been most successful in areas where the relevant biological readout is well-defined and accessible: certain infectious diseases, cancer cell killing, and neurological endpoints in model organisms. It is more difficult to apply to diseases where the relevant endpoint is not measurable in cultured cells or simple organisms.

The systems pharmacology approach attempts to integrate both frameworks: build computational models of disease-relevant biological networks, use those models to predict compound effects across multiple targets and pathways simultaneously, and use phenotypic screens to validate the predictions. This is conceptually attractive, and there are early successes. The limiting factor is model accuracy: biological networks are incompletely characterized, the parameters governing their dynamics are poorly measured, and the models that exist tend to be accurate for the well-studied parts of biology and unreliable for the parts that matter in the diseases we have not yet conquered.

The Economics of Discovery and Their Consequences

Drug discovery is not a purely scientific enterprise. It is conducted primarily by organizations — pharmaceutical companies, biotechnology companies, academic laboratories funded by commercial interests — with financial constraints that shape what gets discovered. This has well-documented consequences for the portfolio of diseases that receive discovery effort.

Diseases primarily affecting wealthy populations in wealthy countries receive disproportionate research investment relative to their global disease burden. Neglected tropical diseases affecting hundreds of millions of people in low-income countries receive a tiny fraction of the discovery investment that cardiovascular disease or cancer attracts, despite imposing a substantial burden of disease. This is not primarily a scientific failure: the biology of these diseases is tractable. It is a market failure: the expected return on investment is insufficient to justify the cost.

The patent system that finances drug development creates a further structural bias: it incentivizes development of compounds that are patentable and can command high prices, which tends to favor novel chemical entities over repurposed generics, and diseases with large, wealthy patient populations over those without. The result is a portfolio of drugs that is well-adapted to the commercial environment in which it was developed, not to the disease burden it nominally addresses.

Any serious account of drug discovery must grapple with the fact that the absence of the drugs we do not have is not primarily the result of scientific failure. It is in large part the result of a discovery apparatus that is designed to find commercially viable drugs, not the most medically important ones. These are systematically different objectives, and the gap between them is filled by people who are sick and cannot afford what exists, or cannot access what exists, or need something that was never developed because the market was too small.

The history of drug discovery reveals a field whose most important achievements were mostly not the result of the intellectual frameworks used to justify its current organization, and whose most conspicuous contemporary failures are not correctable by better science alone. A rational drug discovery enterprise would begin not from what is mechanistically tractable but from what burdens of disease are most urgent — and it would require institutions that do not yet exist.

@@ Line 1: / Line 1: @@
-'''Drug discovery''' is the process by which candidate [[Pharmacology|pharmacological]] agents are identified, characterized, and developed into treatments for disease. It is one of the most resource-intensive scientific endeavors humans have ever organized — costing, by current estimates, upward of two billion dollars per approved drug — and one of the most failure-prone. The industry's central promise is that molecular science can be translated into clinical intervention at scale. The central empirical fact is that it mostly cannot.
+'''Drug discovery''' is the process by which new pharmaceutical compounds are identified, characterized, and developed from initial biological hypothesis to clinical candidate. It is the domain where [[Molecular Evolution|molecular biology]], [[Pharmacology|pharmacology]], [[Organic Chemistry|organic chemistry]], and clinical medicine converge — and it is a domain where the gap between scientific understanding and practical outcome has been, historically, more dramatic than in almost any other field of applied science.
-== The Pipeline and Its Failures ==
+The central fact about drug discovery is that it fails most of the time. The probability that a compound entering Phase I clinical trials eventually receives regulatory approval is approximately 10%. The probability that a compound identified in early discovery research reaches patients is closer to 1 in 10,000. These failure rates have not improved markedly over the past four decades despite enormous increases in mechanistic understanding, computational power, and the sophistication of screening technologies. Understanding why discovery fails so reliably is as important as understanding how it occasionally succeeds.
-Drug discovery proceeds through a sequence of stages that together constitute the pharmaceutical '''pipeline'''. The stages are:
+== The History of How Drugs Were Actually Found ==
-# '''Target identification and validation''': identifying a biological target — a protein, enzyme, receptor, or pathway — whose perturbation is expected to produce therapeutic benefit.
+The received narrative of drug discovery presents it as an orderly progression from mechanistic understanding to therapeutic intervention: identify a disease pathway, find a molecular target in that pathway, design a compound that modulates the target, test it in cells, test it in animals, test it in humans. This rational drug design framework has been the organizing ideology of pharmaceutical research since the 1980s.
-# '''Hit identification''': screening large libraries of compounds (hundreds of thousands to millions of molecules) to find those that interact with the target, typically using [[High-Throughput Screening|high-throughput screening]] platforms.
-# '''Lead optimization''': chemically modifying hit compounds to improve potency, selectivity, metabolic stability, and pharmacokinetic properties while minimizing toxicity — the domain of [[Medicinal Chemistry|medicinal chemistry]].
-# '''Preclinical development''': testing optimized lead compounds in cell cultures and animal models to assess efficacy and initial safety before human trials.
-# '''Clinical trials''': the three-phase process of human testing, moving from safety assessment in small cohorts (Phase I), through efficacy assessment in larger disease-specific cohorts (Phase II), to large-scale comparative trials (Phase III) that establish efficacy relative to existing treatments or placebo.
-# '''Regulatory review''': submission of clinical trial data to regulatory agencies ([[FDA]], [[EMA]]) for approval.
-The attrition rate at each stage is severe. Of compounds entering Phase I trials, roughly 90% fail before reaching approval. The overall rate from preclinical candidate to approved drug is estimated at less than 1 in 10,000 screened compounds. The dominant cause of failure is not toxicity, as might be expected, but '''efficacy failure''' in Phase II and III: compounds that worked in animal models fail to produce the expected clinical benefit in humans.
+It is largely false as a historical description. Most of the drugs that changed medicine were found by other routes.
-This pattern — animal model success, human trial failure — is not a sign of bad science. It is a sign that the biological systems being targeted are substantially more complex than the model systems used to select drug candidates. The [[Translation Gap|translational gap]] between rodent pharmacology and human pharmacology reflects real biological differences in disease mechanism, genetic background, and the role of immune and microbiome variables that preclinical models cannot capture.
+Willow bark, a natural source of salicylates, was used as a folk remedy for centuries, and aspirin itself had been on the market since 1899, before its mechanism (inhibition of cyclooxygenase enzymes) was elucidated in 1971 by John Vane, who received a Nobel Prize for explaining what the compound had been doing all along. Penicillin was found by Alexander Fleming through careful observation of a fungal contamination, not through a mechanistic hypothesis about bacterial cell wall synthesis; the mechanism of beta-lactam antibiotics was worked out decades after the drugs were in clinical use. The statins, among the most prescribed drugs in history, were discovered by Akira Endo by screening microbial fermentation products for HMG-CoA reductase inhibition, a mechanism he chose to target based on his understanding of cholesterol biosynthesis. This is closer to rational design, but it involved testing thousands of fungal extracts to find the active compound, a process that is more craft than algorithm.
-== Target-Centric vs. Phenotypic Discovery ==
+The antidepressant revolution of the 1950s and 1960s was launched by compounds found through serendipity: iproniazid, developed as an antituberculosis agent, was observed to have mood-elevating effects in patients; imipramine, a tricyclic compound structurally related to the phenothiazine antipsychotics and initially tested as an antipsychotic, was found by Roland Kuhn to have antidepressant rather than antipsychotic effects. The mechanisms (monoamine oxidase inhibition in one case, monoamine reuptake inhibition in the other) were understood only later. The SSRIs that followed in the 1980s represented genuine rational design, but they succeeded partly because the mechanistic framework had been retrospectively constructed from the earlier serendipitous discoveries.
-Modern drug discovery has been dominated for four decades by the '''target-centric''' paradigm: identify a single molecular target implicated in disease, design a molecule that modulates that target with high selectivity, and translate target modulation into clinical benefit. This paradigm was enabled by the molecular biology revolution of the 1970s and 1980s, which made it possible to characterize protein structures, clone receptors, and design molecules for specific binding sites.
+== Target-Based Drug Discovery and Its Limitations ==
-The results of the target-centric approach are genuinely impressive: the statin drugs for cardiovascular disease, [[imatinib]] for chronic myeloid leukemia, the proton pump inhibitors for acid reflux, the HIV protease inhibitors, and dozens of targeted oncology drugs all emerged from this paradigm. These are real successes that have reduced suffering and extended life.
+The dominant framework in pharmaceutical research since the 1990s has been target-based drug discovery: identify a protein (or nucleic acid, or pathway) causally involved in a disease, develop high-throughput screening assays for compounds that modulate that target, optimize hit compounds through medicinal chemistry, and advance optimized leads through preclinical development. This approach has advantages: it is systematizable, amenable to automation, and generates mechanistic understanding alongside the compound.
-But the target-centric paradigm has systematic failures. It performs worst in '''complex diseases''' — psychiatric disorders, neurodegenerative diseases, metabolic syndromes, most cancers — where no single molecular target is sufficient to explain disease etiology, and where perturbing any single target triggers compensatory responses from the network of interacting pathways. [[Alzheimer's disease]] research has produced a sequence of spectacular Phase III failures: every drug that successfully cleared amyloid from the brain either failed to improve cognition or produced unacceptable side effects, suggesting that amyloid clearance — the single target on which the field concentrated — may not be the mechanism of disease progression at all.
+Its limitation is fundamental. A target that is causally involved in disease pathology in a cell line, or in a mouse model, may not be druggable in the pharmacological sense; the compound that modulates it may not reach its target at therapeutic doses in a human; and even if it reaches the target and modulates it, the disease may not respond as the cellular and animal models predicted.
-The alternative is '''phenotypic discovery''': screen compounds for their effect on a complex biological phenotype (cell survival, morphology, differentiation state) without prespecifying the molecular target, and identify the mechanism of action afterward. This approach recovers some of the most important drugs in clinical use — [[thalidomide]], despite its history, revealed mechanisms of [[protein degradation|targeted protein degradation]] that launched the PROTAC field — and it is better suited to complex diseases where the disease mechanism itself is unknown. It has the disadvantage of requiring very sophisticated phenotypic assays and of producing drugs whose mechanism is understood only after their efficacy is demonstrated.
+This last failure mode (target validation failure, or the gap between the model and the disease) is responsible for a substantial fraction of late-stage clinical failures and constitutes the deepest problem in contemporary drug discovery. Alzheimer's disease has been a case study in target validation failure: the amyloid hypothesis, which posited that beta-amyloid plaques cause neurodegeneration and that clearing them would halt progression, generated a large investment in compounds that successfully cleared amyloid in humans. Most of the trials failed: patients with cleared amyloid plaques generally did not recover or stabilize significantly better than controls. The target was modulated; the disease, for the most part, was not. Whether this means the hypothesis is wrong, the targets were wrong, the intervention timing was wrong, or the patient populations were wrong remains an active and deeply contested empirical question.
-== The Reproducibility Problem ==
+== Phenotypic Screening: The Return to Empiricism ==
-Drug discovery is in the grip of a [[Reproducibility Crisis|reproducibility crisis]] that the field has acknowledged but not resolved. A landmark 2011 study by Begley and Ellis at Amgen found that only 6 of 53 landmark cancer biology papers — 11% — could be reproduced in preclinical drug development contexts. A comparable study by Prinz and colleagues at Bayer found a 75% failure rate in reproducing published data used to select drug targets.
+The recognition that target-based discovery has structural limitations has driven a partial return to phenotypic screening: testing compounds for their effects on cells, tissues, or organisms without requiring advance specification of the molecular target. This is closer to how most historical drugs were actually found — a compound that produces a desired cellular effect is identified, and the mechanism is worked out afterward.
-The causes are multiple and interact: publication bias (positive results are published, negative results are not, creating a literature skewed toward apparently robust findings); reagent variability (antibodies, cell lines, and animal models differ across laboratories in ways that are not tracked or reported); statistical underpowering (preclinical studies are typically too small to reliably detect the effect sizes they observe); and perverse incentive structures (academic labs are rewarded for novelty and publication, not for the downstream translatability of their findings).
+Phenotypic screening has been most successful in areas where the relevant biological readout is well-defined and accessible: certain infectious diseases, cancer cell killing, and neurological endpoints in model organisms. It is more difficult to apply to diseases where the relevant endpoint is not measurable in cultured cells or simple organisms.
-The consequence is that drug discovery pipelines are routinely loaded with targets and lead compounds selected on the basis of preclinical evidence that does not survive contact with rigorous replication. The clinical trial failures that the industry accepts as the inevitable cost of pharmaceutical R&D are, to a substantial degree, the predictable downstream consequences of entering clinical development with inadequately validated targets. This is not a failure of the clinical process. It is a failure of the preclinical scientific culture that feeds it.
+The [[Systems Pharmacology|systems pharmacology]] approach attempts to integrate both frameworks: build computational models of disease-relevant biological networks, use those models to predict compound effects across multiple targets and pathways simultaneously, and use phenotypic screens to validate the predictions. This is conceptually attractive, and there are early successes. The limiting factor is model accuracy: biological networks are incompletely characterized, the parameters governing their dynamics are poorly measured, and the models that exist tend to be accurate for the well-studied parts of biology and unreliable for the parts that matter in the diseases we have not yet conquered.
-== Structural Barriers to Innovation ==
+== The Economics of Discovery and Their Consequences ==
-Drug discovery faces structural barriers that incentive reform alone cannot resolve. The diseases most amenable to the target-centric paradigm — those with well-characterized molecular mechanisms, large patient populations, and clear clinical endpoints — have largely been addressed. The diseases that remain — Alzheimer's, treatment-resistant depression, most cancers at late stage, rare diseases — are harder in ways that are not simply engineering problems. They reflect genuine gaps in biological knowledge that require sustained basic research investment rather than the translational optimization that pharmaceutical companies are positioned to do.
+Drug discovery is not a purely scientific enterprise. It is conducted primarily by organizations — pharmaceutical companies, biotechnology companies, academic laboratories funded by commercial interests — with financial constraints that shape what gets discovered. This has well-documented consequences for the portfolio of diseases that receive discovery effort.
-The patent system creates a systematic mismatch between the social value of drug discovery and the private incentives it produces: drugs for large, wealthy populations are over-developed relative to drugs for small or poor populations. [[Antibiotic resistance]] — perhaps the most serious near-term biological threat to human health — is systematically underaddressed because antibiotics generate far less return on investment than chronic disease therapies taken daily for life. The market failure here is structural and is not corrected by the existing regulatory or intellectual property framework.
+Diseases primarily affecting wealthy populations in wealthy countries receive disproportionate research investment relative to their global disease burden. [[Neglected Tropical Diseases|Neglected tropical diseases]] affecting hundreds of millions of people in low-income countries receive a tiny fraction of the discovery investment that cardiovascular disease or cancer attracts, despite imposing a substantial burden of disease. This is not primarily a scientific failure: the biology of these diseases is tractable. It is a market failure: the expected return on investment is insufficient to justify the cost.
-The connection to [[Systems Biology|systems biology]] and [[Network Pharmacology|network pharmacology]] offers a partial solution: rather than seeking single-target drugs, these approaches model the disease as a network perturbation and seek interventions at network nodes whose modulation produces robust phenotypic change across genetic backgrounds and patient subpopulations. Whether these approaches will deliver on their promise at clinical scale remains to be demonstrated. The history of drug discovery is, among other things, a history of computational promises that required biological revision.
+The patent system that finances drug development creates a further structural bias: it incentivizes development of compounds that are patentable and can command high prices, which tends to favor novel chemical entities over repurposed generics, and diseases with large, wealthy patient populations over those without. The result is a portfolio of drugs that is well-adapted to the commercial environment in which it was developed, not to the disease burden it nominally addresses.
-''Drug discovery is not primarily a chemistry problem or a biology problem or a regulatory problem. It is an epistemology problem: the knowledge we generate in research settings is systematically misleading about the knowledge we need in clinical settings, and the institutional structures that fund and reward drug discovery are not designed to close that gap. Until the epistemological failures are treated as structural rather than incidental, each new computational platform and each new target class will produce the same attrition curve, at the same staggering cost, with the same pattern of late-stage failure.''
+Any serious account of drug discovery must grapple with the fact that the absence of the drugs we do not have is not primarily the result of scientific failure. It is in large part the result of a discovery apparatus that is designed to find commercially viable drugs, not the most medically important ones. These are systematically different objectives, and the gap between them is filled by people who are sick and cannot afford what exists, or cannot access what exists, or need something that was never developed because the market was too small.
+''The history of drug discovery reveals a field whose most important achievements were mostly not the result of the intellectual frameworks used to justify its current organization, and whose most conspicuous contemporary failures are not correctable by better science alone. A rational drug discovery enterprise would begin not from what is mechanistically tractable but from what burdens of disease are most urgent — and it would require institutions that do not yet exist.''
 [[Category:Life]]
 [[Category:Science]]
-[[Category:Systems]]
+[[Category:Medicine]]