P0 Actionable Fixes - What to Do
Date: November 27, 2025
Status: ACTIONABLE
Summary: What's Broken and What's Fixable
| Tool | Problem | Fixable? | How |
|---|---|---|---|
| BioRxiv | API has NO search endpoint | NO | Replace with Europe PMC |
| PubMed | No query preprocessing | YES | Add query cleaner |
| ClinicalTrials | No filters applied | YES | Add filter params |
| Magentic Framework | Nothing wrong | N/A | Already working |
FIX 1: Replace BioRxiv with Europe PMC (30 min)
Why BioRxiv Can't Be Fixed
The bioRxiv API only has this endpoint:
https://api.biorxiv.org/details/{server}/{date-range}/{cursor}/json
This returns papers by date, not by keyword. There is NO search endpoint.
Proof: I queried medrxiv/2024-01-01/2024-01-02 and got:
- "Global risk of Plasmodium falciparum" (malaria)
- "Multiple Endocrine Neoplasia in India"
- "Acupuncture for Acute Musculoskeletal Pain"
None of these are about Long COVID because the API doesn't search.
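To make the limitation concrete, here is a minimal sketch of how the details endpoint is addressed. The helper names `build_details_url` and `filter_by_keyword` are hypothetical, and the `title`/`abstract` keys in the sample records are assumptions for illustration only. The URL accepts a server, a date range, and a cursor, so any keyword matching would have to happen client-side after downloading every paper in the range.

```python
def build_details_url(server: str, start: str, end: str, cursor: int = 0) -> str:
    """Build a bioRxiv/medRxiv details URL; note there is no keyword slot."""
    return f"https://api.biorxiv.org/details/{server}/{start}/{end}/{cursor}/json"


def filter_by_keyword(papers: list, keyword: str) -> list:
    """Hypothetical client-side filter over already-downloaded records."""
    needle = keyword.lower()
    return [
        p for p in papers
        if needle in (p.get("title", "") + " " + p.get("abstract", "")).lower()
    ]


# Titles taken from the date-range query described above (abstracts omitted).
sample = [
    {"title": "Global risk of Plasmodium falciparum", "abstract": ""},
    {"title": "Multiple Endocrine Neoplasia in India", "abstract": ""},
    {"title": "Acupuncture for Acute Musculoskeletal Pain", "abstract": ""},
]

url = build_details_url("medrxiv", "2024-01-01", "2024-01-02")
matches = filter_by_keyword(sample, "long covid")
```

Filtering this way only ever sees the papers inside the requested date window, which is why it is not a substitute for a real search endpoint.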
Europe PMC Has Search + Preprints
curl "https://www.ebi.ac.uk/europepmc/webservices/rest/search?query=long+covid+treatment&resultType=core&pageSize=3&format=json"
Returns 283,058 results including:
- "Long COVID Treatment No Silver Bullets, Only a Few Bronze BBs"
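If only preprints are wanted, the query string itself can carry the restriction. The sketch below assumes that Europe PMC's `SRC:PPR` field filter selects preprint records; confirm this against the Europe PMC search syntax reference before relying on it. `build_search_params` is a hypothetical helper, not part of the existing code.

```python
def build_search_params(query: str, max_results: int = 10,
                        preprints_only: bool = False) -> dict:
    """Build query parameters for the Europe PMC REST search endpoint."""
    if preprints_only:
        # Assumption: SRC:PPR limits results to the preprint source.
        query = f"({query}) AND SRC:PPR"
    return {
        "query": query,
        "resultType": "core",
        "pageSize": max_results,
        "format": "json",
    }


params = build_search_params("long covid treatment", 3, preprints_only=True)
```

The resulting dictionary can be passed as `params=` to an HTTP client in the same way the curl example passes its query string.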
The Fix
Replace src/tools/biorxiv.py with src/tools/europepmc.py:
"""Europe PMC preprint and paper search tool."""
import httpx

from src.utils.models import Citation, Evidence


class EuropePMCTool:
    """Search Europe PMC for papers and preprints."""

    BASE_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"

    @property
    def name(self) -> str:
        return "europepmc"

    async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
        """Search Europe PMC (includes preprints from bioRxiv/medRxiv)."""
        params = {
            "query": query,
            "resultType": "core",
            "pageSize": max_results,
            "format": "json",
        }
        async with httpx.AsyncClient(timeout=30.0) as client:
            response = await client.get(self.BASE_URL, params=params)
            response.raise_for_status()
            data = response.json()
        results = data.get("resultList", {}).get("result", [])
        return [self._to_evidence(r) for r in results]

    def _to_evidence(self, result: dict) -> Evidence:
        """Convert Europe PMC result to Evidence."""
        title = result.get("title", "Untitled")
        abstract = result.get("abstractText", "No abstract")
        doi = result.get("doi", "")
        pub_year = result.get("pubYear", "Unknown")
        source = result.get("source", "europepmc")
        # Mark preprints
        pub_type = result.get("pubTypeList", {}).get("pubType", [])
        is_preprint = "Preprint" in pub_type
        content = f"{'[PREPRINT] ' if is_preprint else ''}{abstract[:1800]}"
        return Evidence(
            content=content,
            citation=Citation(
                source="europepmc" if not is_preprint else "preprint",
                title=title[:500],
                url=f"https://doi.org/{doi}" if doi else "",
                date=str(pub_year),
            ),
            relevance=0.75 if is_preprint else 0.9,
        )
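The preprint check in `_to_evidence` assumes `pubTypeList.pubType` is always a list of strings. A slightly more defensive version is sketched below. The fallbacks it handles (a `null` `pubTypeList`, a bare string for `pubType`, and `source == "PPR"` as a preprint marker) are assumptions about possible response shapes, not confirmed behavior, and `is_preprint` is a hypothetical helper.

```python
def is_preprint(result: dict) -> bool:
    """Best-effort preprint detection on a single Europe PMC result dict."""
    # `or {}` / `or []` also cover keys that are present but set to None.
    pub_types = (result.get("pubTypeList") or {}).get("pubType") or []
    if isinstance(pub_types, str):
        pub_types = [pub_types]
    has_preprint_type = any(
        isinstance(t, str) and t.lower() == "preprint" for t in pub_types
    )
    # Assumption: the preprint source code is "PPR".
    return has_preprint_type or result.get("source") == "PPR"
```

A helper like this could replace the two lines under `# Mark preprints` without changing the rest of the conversion.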
FIX 2: Add PubMed Query Preprocessing (1 hour)
Current Problem
User enters: What medications show promise for Long COVID?
PubMed receives: What medications show promise for Long COVID?
The question words pollute the search.
The Fix
Add src/tools/query_utils.py:
"""Query preprocessing utilities."""
import re

# Question words to remove
QUESTION_WORDS = {
    "what", "which", "how", "why", "when", "where", "who",
    "is", "are", "can", "could", "would", "should", "do", "does",
    "show", "promise", "help", "treat", "cure",
}

# Medical synonyms to expand
SYNONYMS = {
    "long covid": ["long COVID", "PASC", "post-COVID syndrome", "post-acute sequelae"],
    "alzheimer": ["Alzheimer's disease", "AD", "Alzheimer dementia"],
    "cancer": ["neoplasm", "tumor", "malignancy", "carcinoma"],
}


def preprocess_pubmed_query(raw_query: str) -> str:
    """Convert natural language to cleaner PubMed query."""
    # Lowercase
    query = raw_query.lower()
    # Remove question marks
    query = query.replace("?", "")
    # Remove question words
    words = query.split()
    words = [w for w in words if w not in QUESTION_WORDS]
    query = " ".join(words)
    # Expand synonyms
    for term, expansions in SYNONYMS.items():
        if term in query:
            # Add OR clause
            expansion = " OR ".join([f'"{e}"' for e in expansions])
            query = query.replace(term, f"({expansion})")
    return query.strip()
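One caveat with plain substring replacement: a key such as "alzheimer" also matches inside "alzheimer's", leaving a stray `'s` after the OR group. The sketch below shows one possible alternative using a single-pass, word-boundary regex. It is an illustration under assumptions, not the implementation above: `expand_synonyms` is a hypothetical helper and the synonym table is shortened for the example.

```python
import re

# Shortened, illustrative synonym table (same shape as SYNONYMS above).
SYNONYMS = {
    "long covid": ["long COVID", "PASC", "post-COVID syndrome"],
    "alzheimer": ["Alzheimer's disease", "Alzheimer dementia"],
}

# One alternation over all keys; the optional 's absorbs a possessive suffix.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in SYNONYMS) + r")(?:'s)?\b",
    re.IGNORECASE,
)


def expand_synonyms(query: str) -> str:
    """Replace each whole-word synonym key with a quoted OR group."""
    def _group(match: re.Match) -> str:
        expansions = SYNONYMS[match.group(1).lower()]
        return "(" + " OR ".join(f'"{e}"' for e in expansions) + ")"

    # re.sub scans the input once, so inserted text is never re-expanded.
    return _PATTERN.sub(_group, query)


expanded = expand_synonyms("alzheimer's drugs")
```

Because the substitution is a single pass, an expansion that happens to contain another key is not expanded a second time.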
Then update src/tools/pubmed.py:
from src.tools.query_utils import preprocess_pubmed_query

async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
    # Preprocess query
    clean_query = preprocess_pubmed_query(query)
    search_params = self._build_params(
        db="pubmed",
        term=clean_query,  # Use cleaned query
        retmax=max_results,
        sort="relevance",
    )
    # ... rest unchanged
FIX 3: Add ClinicalTrials.gov Filters (30 min)
Current Problem
The tool currently returns ALL trials, including withdrawn, terminated, and observational studies.
The Fix
The API supports filter.overallStatus and other filters. Update src/tools/clinicaltrials.py:
async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
    params: dict[str, str | int] = {
        "query.term": query,
        "pageSize": min(max_results, 100),
        "fields": "|".join(self.FIELDS),
        # ADD THESE FILTERS:
        "filter.overallStatus": "COMPLETED|RECRUITING|ACTIVE_NOT_RECRUITING",
        # Only interventional studies (not observational)
        "aggFilters": "studyType:int",
    }
    # ... rest unchanged
Note: I tested the API. It supports filtering, but the exact syntax differs slightly from the snippet above, so check the API docs before relying on these parameter values.
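Since the exact filter syntax still needs checking, it may help to isolate parameter construction in one small function that is easy to adjust and test offline. The sketch below is one way to do that. `build_trial_params` is a hypothetical helper, and expressing the interventional-only restriction as `filter.advanced` with the expression `AREA[StudyType]INTERVENTIONAL` is an assumption to verify against the ClinicalTrials.gov API documentation, offered as an alternative to `aggFilters`.

```python
DEFAULT_STATUSES = ("COMPLETED", "RECRUITING", "ACTIVE_NOT_RECRUITING")


def build_trial_params(query: str, max_results: int = 10,
                       statuses: tuple = DEFAULT_STATUSES,
                       interventional_only: bool = True) -> dict:
    """Build query parameters for a ClinicalTrials.gov study search."""
    params = {
        "query.term": query,
        "pageSize": min(max_results, 100),
        # Pipe-separated status list, as in the snippet above.
        "filter.overallStatus": "|".join(statuses),
    }
    if interventional_only:
        # Assumption: an advanced-filter expression on the StudyType area.
        params["filter.advanced"] = "AREA[StudyType]INTERVENTIONAL"
    return params


trial_params = build_trial_params("long COVID", max_results=500)
```

Keeping the filters in one place means that only this function changes if the documented syntax turns out to differ.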
What NOT to Change
Microsoft Agent Framework - WORKING
I verified:
from agent_framework import MagenticBuilder, ChatAgent
from agent_framework.openai import OpenAIChatClient
# All imports OK
orchestrator = MagenticOrchestrator(max_rounds=2)
workflow = orchestrator._build_workflow()
# Workflow built successfully
The Magentic agents are correctly wired:
- SearchAgent → GPT-5.1
- JudgeAgent → GPT-5.1
- HypothesisAgent → GPT-5.1
- ReportAgent → GPT-5.1
The framework is fine. The tools it calls are broken.
Priority Order
- Replace BioRxiv → Immediate, fundamental
- Add PubMed preprocessing → High impact, easy
- Add ClinicalTrials filters → Medium impact, easy
Test After Fixes
# Test Europe PMC
uv run python -c "
import asyncio
from src.tools.europepmc import EuropePMCTool
tool = EuropePMCTool()
results = asyncio.run(tool.search('long covid treatment', 3))
for r in results:
    print(r.citation.title)
"
# Test PubMed with preprocessing
uv run python -c "
from src.tools.query_utils import preprocess_pubmed_query
q = 'What medications show promise for Long COVID?'
print(preprocess_pubmed_query(q))
# Should output: medications for (\"long COVID\" OR \"PASC\" OR \"post-COVID syndrome\" OR \"post-acute sequelae\")
"
After These Fixes
The Magentic workflow will:
- SearchAgent calls search_pubmed("long COVID treatment") → Gets RELEVANT papers
- SearchAgent calls search_preprints("long COVID treatment") → Gets RELEVANT preprints via Europe PMC
- SearchAgent calls search_clinical_trials("long COVID") → Gets INTERVENTIONAL trials only
- JudgeAgent evaluates GOOD evidence
- HypothesisAgent generates hypotheses from GOOD evidence
- ReportAgent synthesizes GOOD report
The framework will work once we feed it good data.