Force-Multiplier Playbook

Published: 2026-05-01

author: Rowan Brad Quni-Gudzinas

ORCID: 0009-0002-4317-5604

ISNI: 0000000526456062

title: The Force-Multiplier Playbook

aliases:

- The Force-Multiplier Playbook

modified: 2026-05-14T11:23:43Z




How One Scientist + One LLM Can Match a Research Team


Version: 1.0—Public Release

Date: 2026-05-14

DOI: 10.5281/zenodo.20154578

License: CC BY 4.0



> The whole point of the force-multiplier project is that the LLM compresses a year-long, half-time research project into a day of focused human direction.


1. The Shift


Modern science rewards teams. The ATLAS collaboration numbers over 3,000 scientists. The average biomedical paper now lists 6.5 authors—up from 2.5 in 1950. Grant committees favor multi-institution consortia. The solo scientist, once the default mode of discovery (Newton, Einstein, Dirac), has become an endangered species—not because they lack ideas, but because they lack the throughput that teams provide: literature review, code prototyping, equation derivation, figure generation, first-draft writing.


But something changed in 2024-2025. Large language models crossed a threshold. They can now:



A single researcher, equipped with an LLM in a unified conversation environment (file I/O + Python execution + git), can reproduce the output of a small research team—not in theory, but in practice. Our preliminary self-experiments suggest speedups of $25\times$ to $90\times$ across two domains.


2. The Core Idea


> A structured protocol turns the LLM from a chatbot into a force multiplier.


The key insight is not “LLMs are smart.” It’s that most research tasks are bottlenecked by throughput, not by brilliance. A postdoc is not $20\times$ smarter than a professor—they’re $20\times$ faster at executing well-defined subtasks. The LLM closes that gap, provided it’s given the right structure.


The Force-Multiplier Protocol has five phases:


| Phase | What Happens | Who Leads |
| :--- | :--- | :--- |
| 1. Define | Frame the research question, specify deliverables, set success criteria | Human |
| 2. Delegate | Issue structured prompts for literature, code, derivation, drafting | Human → LLM |
| 3. Execute & Iterate | LLM produces output; human reviews; LLM refines; repeat | LLM (with human steering) |
| 4. Verify | Cross-check every quantitative claim, run reproducibility tests | LLM + Human |
| 5. Synthesize | Assemble the final document, abstract, cover letter, repository | LLM |

The human’s role is orchestrator, not executor. You don’t write the code—you review it. You don’t derive the equations—you check the limits. You don’t draft every paragraph—you edit for clarity and correctness. The LLM handles throughput; you handle direction, taste, and verification.


3. What the Protocol Produces


We tested this protocol on two real research problems:


Case Study 1: Theoretical Physics


Problem: Resolve the cosmological constant discrepancy ($10^{120}$ mismatch between quantum vacuum energy and observed dark energy) using ultrametric (p-adic) quantum gravity frameworks.


Traditional timeline: ~6 months for a postdoc + PI, working part-time.

Force-multiplied timeline: ~1 day of focused human direction.




Deliverables produced:


Self-experiment speedup: approximately $25\times$ over traditional solo research, comparable to the output volume of a small team (preliminary—controlled replication needed).


Case Study 2: Computational Linguistics


Problem: Cross-linguistic Bayesian analysis of 22 languages—testing whether information-theoretic constraints shape grammatical structure.


Traditional timeline: ~3 months for a linguist.

Force-multiplied timeline: ~1 day.


Deliverables produced:


Self-experiment speedup: approximately $90\times$ (preliminary—controlled replication needed).
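The speedup figures are ratios of traditional duration to force-multiplied duration. As a minimal sketch of that arithmetic, the snippet below assumes both durations are measured in calendar days and that "3 months" means roughly 90 days; the function name `speedup` and that unit choice are illustrative assumptions, not something the self-experiments specify.

```python
def speedup(traditional_days: float, amplified_days: float) -> float:
    """Ratio of traditional duration to force-multiplied duration."""
    if traditional_days <= 0 or amplified_days <= 0:
        raise ValueError("durations must be positive")
    return traditional_days / amplified_days


# Case Study 2, ASSUMING ~3 months is about 90 calendar days
# and the force-multiplied run took 1 day.
case2 = speedup(traditional_days=90, amplified_days=1)
print(f"Case Study 2 speedup: ~{case2:.0f}x")
```

The ~$25\times$ figure for Case Study 1 depends on how part-time effort is converted into days, which the text does not spell out, so it is not reproduced here.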


The Bottom Line


In both cases, the bottleneck was not the difficulty of the research—it was the throughput of a single human executing sequential tasks. The LLM parallelizes the work: while you review the derivation, it drafts the next section. While you check the code output, it formats the references. This is the force multiplier.



4. The Stack (What You Actually Need)


Forget Docker. Forget API keys. Forget “agentic architectures” with four specialized sub-agents. The simplest possible stack works:


| Component | What It Is | Why |
| :--- | :--- | :--- |
| LLM Interface | Any capable LLM (DeepSeek, Claude, GPT) in a conversation environment | The “brain” |
| File I/O | The LLM can read and write files in your project directory | Persistent state across turns |
| Code Execution | The LLM can run Python (or R, Julia) and see the output | All quantitative work is verified |
| Git | Version control for everything | Audit trail, reproducibility, rollback |
| Markdown + LaTeX | Your document format | LLM-friendly, compiles to journal-ready PDF |

That’s it. No orchestration framework. No multi-agent simulation. No cloud infrastructure. A single conversation thread with file access and code execution is the entire stack.


The “architecture” section of any paper about this methodology should describe the architecture that was actually used to produce the results, not the aspirational one you might build someday.



5. The 5 Prompts That Make It Work


You don’t need a prompt library of 100 templates. Five prompt patterns cover virtually all research tasks:


Prompt 1: Literature Synthesis


> “Synthesize the current state of research on [TOPIC]. Cover: (a) the standard model/consensus, (b) 3-5 key competing approaches, (c) open problems, (d) what a new contribution would need to address. Cite specific papers with authors and years. Flag anything you’re uncertain about.”


Prompt 2: Derivation with Reality Check


> “Derive [RESULT] from [STARTING POINT], showing all steps. After the derivation, run a reality check: (a) does the result have the right physical dimensions? (b) does it reduce to known cases in appropriate limits? (c) are there any divergences or singularities? Implement the key expression in Python/SymPy and verify numerically for test cases.”
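To make the reality check concrete, here is a minimal sketch of part (b), the limit test, using only the standard library. The physics is a textbook stand-in chosen for illustration, not one of the derivations from the case studies: relativistic kinetic energy $(\gamma - 1)mc^2$ should reduce to $\tfrac{1}{2}mv^2$ when $v \ll c$. The function names and tolerances are assumptions.

```python
import math

C = 299_792_458.0  # speed of light in m/s (exact SI value)


def kinetic_energy_relativistic(m: float, v: float) -> float:
    """(gamma - 1) * m * c^2, the expression under test."""
    gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    return (gamma - 1.0) * m * C ** 2


def kinetic_energy_classical(m: float, v: float) -> float:
    """The known low-speed limit, (1/2) * m * v^2."""
    return 0.5 * m * v ** 2


def relative_gap(m: float, v: float) -> float:
    """Relative difference between the full expression and its limit."""
    classical = kinetic_energy_classical(m, v)
    return abs(kinetic_energy_relativistic(m, v) - classical) / classical


slow = relative_gap(1.0, 1e-3 * C)  # deep in the v << c regime
fast = relative_gap(1.0, 0.9 * C)   # far outside it

assert slow < 1e-5, "derivation fails the low-speed limit"
assert fast > 0.1, "limit check has no teeth: expressions agree everywhere"
print("limit check passed")
```

The second assertion is a design choice: a limit check that would also pass far from the limit tells you nothing, so it is worth confirming that the two expressions actually diverge where they should.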


Prompt 3: Code Prototyping


> “Write a self-contained Python script that [TASK]. Requirements: (a) uses only standard library + numpy/scipy, (b) includes test cases that verify correctness, (c) saves results in a structured format (JSON/CSV), (d) generates at least one publication-quality figure. Document all assumptions in comments.”


Prompt 4: Section Drafting


> “Draft a [SECTION TYPE] for a paper on [TOPIC]. The section should cover [KEY POINTS]. Use the following references: [REFS]. Style: academic but accessible, [JOURNAL] conventions. Flag any claims that need verification. After the draft, list 3 things a reviewer might criticize and suggest how to address them.”


Prompt 5: Verification Audit


> “Audit this document for: (a) quantitative claims without evidence—flag each one, (b) missing references, (c) internal contradictions, (d) ambiguous statements that could be interpreted multiple ways, (e) assumptions presented as facts. For each issue found, state what’s wrong and suggest a fix.”


These five prompts, applied iteratively, cover the full research pipeline. The key is iteration: the first output is never final. You review, you redirect, the LLM refines. Three to five cycles per section is typical.
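Because the prompts use bracketed placeholders such as [TOPIC], they are easy to keep as reusable templates. The sketch below is one possible way to do that, not part of the protocol itself: the `PROMPTS` dictionary holds abbreviated versions of Prompts 1 and 2, and the helper `fill` is a hypothetical name. It refuses to produce a prompt while any placeholder is still unfilled.

```python
import re

# Abbreviated versions of Prompts 1 and 2; bracketed names are placeholders.
PROMPTS = {
    "literature": (
        "Synthesize the current state of research on [TOPIC]. "
        "Cite specific papers with authors and years. "
        "Flag anything you're uncertain about."
    ),
    "derivation": (
        "Derive [RESULT] from [STARTING POINT], showing all steps. "
        "After the derivation, run a reality check."
    ),
}

# Matches placeholders like [TOPIC] or [STARTING POINT].
PLACEHOLDER = re.compile(r"\[([A-Z][A-Z ]*)\]")


def fill(template, values):
    """Replace every [PLACEHOLDER]; fail loudly if one has no value."""
    missing = [name for name in PLACEHOLDER.findall(template) if name not in values]
    if missing:
        raise KeyError(f"unfilled placeholders: {missing}")
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


prompt = fill(
    PROMPTS["derivation"],
    {
        "RESULT": "the small-angle pendulum period",
        "STARTING POINT": "Newton's second law",
    },
)
print(prompt)
```

Keeping the templates in a file under version control also gives each iteration cycle a record of exactly what was asked.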


6. The Verification Imperative


LLMs hallucinate. They produce confident-sounding nonsense. They make arithmetic errors. This is not a fatal flaw—it’s a manageable risk if you build verification into the protocol.


The Verification Cycle has four gates:


| Gate | What | When | Who |
| :--- | :--- | :--- | :--- |
| G1: Code Verification | Every quantitative claim must be reproducible via Python | During execution | LLM + Human |
| G2: Limit Checks | Every derivation must be tested in known limits ($t \to 0$, $N \to \infty$, etc.) | After derivation | LLM |
| G3: Reader Testing | Feed the draft to a fresh LLM instance and ask targeted questions | Before finalization | LLM (blind) |
| G4: Human Review | Read the final document. Check tone, accuracy, completeness. | Before publication | Human |

Rule of thumb: If you can’t reproduce a number with code, it doesn’t go in the paper. If a limit check fails, the derivation is wrong. If a blind reader is confused, real readers will be too.
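Gate G1 and the first rule of thumb can be sketched as a small audit: each number quoted in the draft is paired with code that recomputes it, and anything that disagrees is flagged. This is an illustrative sketch only; the `Claim` class, the `audit` function, and the toy values are hypothetical and were not used in the case studies.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass
class Claim:
    label: str                      # where the number appears in the draft
    stated: float                   # the value written in the paper
    recompute: Callable[[], float]  # code that regenerates the value
    rel_tol: float = 1e-6           # allowed relative disagreement


def audit(claims):
    """Return labels of claims whose recomputed value disagrees with the stated one."""
    failures = []
    for claim in claims:
        if not math.isclose(claim.recompute(), claim.stated, rel_tol=claim.rel_tol):
            failures.append(claim.label)
    return failures


# Toy data for illustration only.
claims = [
    Claim("mean of toy sample", 2.5, lambda: sum([1, 2, 3, 4]) / 4),
    Claim("deliberately wrong value", 3.0, lambda: sum([1, 2, 3, 4]) / 4),
]
print(audit(claims))  # → ['deliberately wrong value']
```

A number with no `recompute` function simply cannot be registered, which is the point: if it can't be reproduced with code, it doesn't go in the paper.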


We caught four significant issues through reader testing that had survived two rounds of self-review—including a logical contradiction between an 8-hour experiment cap and a 200-hour effect size estimate. Blind readers catch what authors can’t see.


7. What This Changes


If a solo scientist can match a small team’s output, several things break:


Funding


The current model (“bigger team = bigger grant = more papers = bigger team”) assumes team size is the bottleneck. If throughput can be LLM-amplified, the bottleneck shifts to idea quality and experimental design. A \$50k grant to one researcher with an LLM might produce more science than a \$500k grant to a team of five without one. Grant committees need to evaluate amplified output, not headcount.


Training


LLM fluency becomes a core scientific skill—as important as statistics or programming. Graduate programs should teach prompt engineering, verification protocols, and the difference between LLM-assisted and LLM-generated work. The scientist who can direct an LLM effectively will outproduce the one who can’t.


Publishing


We should expect a rise in papers from independent researchers and small labs. Peer review will need to adapt: reviewers should check for verification hygiene (are numbers reproducible? were limit checks performed?) rather than assuming that a large author list implies rigor.


The Human Still Matters


The LLM doesn’t have taste. It doesn’t know which research questions are important. It can’t design a clever experiment or recognize a surprising result. These remain human capabilities, and they become more valuable, not less, when the throughput bottleneck is removed. The force multiplier amplifies human creativity; it doesn’t replace it.


8. Try It: The One-Day Challenge


The best way to evaluate this is to run it yourself. Here’s the challenge:


1. Pick a research question, something you’d normally budget a week for: a literature review, a data analysis, or a derivation you’ve been meaning to do.

2. Open a conversation with an LLM that has file access and code execution.

3. Follow the five phases:

   - Define: write down exactly what success looks like (30 min)
   - Delegate: use the five prompts from Section 5 (15 min)
   - Execute & Iterate: let the LLM produce; review and redirect (3-4 hours)
   - Verify: run code checks, limit tests, reader test (1 hour)
   - Synthesize: assemble the final output (30 min)

4. Measure the speedup. How long would this have taken you alone? Compare.

5. Report back. Tell someone. Write a blog post. Post to your lab’s Slack. The more data points we have, the stronger the case becomes.
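For the measurement step, it helps to know what the challenge's own time budget adds up to. The sketch below sums the phase durations listed above and computes a speedup against a baseline you supply. The 40-hour baseline is purely an assumption standing in for "a week"; substitute your own estimate.

```python
# (phase, minimum minutes, maximum minutes) as listed in the challenge
BUDGET = [
    ("Define", 30, 30),
    ("Delegate", 15, 15),
    ("Execute & Iterate", 180, 240),
    ("Verify", 60, 60),
    ("Synthesize", 30, 30),
]

low = sum(minimum for _, minimum, _ in BUDGET)
high = sum(maximum for _, _, maximum in BUDGET)
print(f"Budgeted time: {low / 60:.2f} to {high / 60:.2f} hours")


def measured_speedup(baseline_hours: float, actual_hours: float) -> float:
    """Speedup of the amplified run relative to your solo estimate."""
    return baseline_hours / actual_hours


# ASSUMPTION: "a week" is taken as 40 working hours for illustration.
print(f"Speedup at the upper budget: {measured_speedup(40, high / 60):.1f}x")
```

Record your actual hours per phase rather than the budgeted ones; the comparison is only as good as the baseline estimate you wrote down before starting.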


9. What’s Next


This playbook is a proof of concept, not the final word. The next steps:



If you’re a researcher who tries this—especially if you’re in a field we haven’t tested yet—we want to hear from you. The methodology improves with every data point.


10. What This Protocol Cannot Do (Yet)


This playbook is honest about its boundaries. Understanding what the protocol cannot do is as important as knowing what it can.


When the Protocol Breaks


The force-multiplier effect requires tasks that are well-defined, self-contained, and executable within a conversation. The protocol is not designed for:



Quality Trade-offs


LLM-generated output has characteristic failure modes:



Verification Gates Are Fallible


The four verification gates (Section 6) reduce error rates dramatically—but they do not eliminate them:



Our experience: The verification gates caught 4 of 4 issues in our reader test that had survived two rounds of self-review. But we cannot claim this generalizes to all documents, all domains, or all LLM versions. The gates reduce risk; they do not guarantee correctness.


What We Don’t Know



Ethical Boundaries




Key Metrics at a Glance


| Metric | Value |
| :--- | :--- |
| Speedup (theoretical physics) | ~$25\times$ (preliminary) |
| Speedup (computational linguistics) | ~$90\times$ (preliminary) |
| Effective team size amplification | ~$17\times$ (power analysis) |
| Time to first draft (manuscript) | ~1 day of human direction |
| Verification issues caught by reader testing | 4 of 4 (100% detection rate) |
| Stack components | 4 (LLM + files + code + git) |
| Core prompts | 5 |
| Verification gates | 4 |


> The bottleneck to scientific productivity could shift from team size to human creativity and LLM-fluency. The solo scientist is back.