# P0 Audit: Microsoft Agent Framework (Magentic) & Search Tools
**Date:** November 27, 2025
**Auditor:** Claude Code
**Status:** VERIFIED
---
## TL;DR
| Component | Status | Verdict |
|-----------|--------|---------|
| Microsoft Agent Framework | ✅ WORKING | Correctly wired, no bugs |
| GPT-5.1 Model Config | ✅ CORRECT | Using `gpt-5.1` as configured |
| Search Tools | ❌ BROKEN | Root cause of garbage results |
**The orchestration framework is fine. The search layer is garbage.**
---
## Microsoft Agent Framework Verification
### Import Test: PASSED
```python
from agent_framework import MagenticBuilder, ChatAgent
from agent_framework.openai import OpenAIChatClient
# All imports successful
```
### Agent Creation Test: PASSED
```python
from src.agents.magentic_agents import create_search_agent
search_agent = create_search_agent()
# SearchAgent created: SearchAgent
# Description: Searches biomedical databases (PubMed, ClinicalTrials.gov, bioRxiv)
```
### Workflow Build Test: PASSED
```python
from src.orchestrator_magentic import MagenticOrchestrator
orchestrator = MagenticOrchestrator(max_rounds=2)
workflow = orchestrator._build_workflow()
# Workflow built successfully: <class 'agent_framework._workflows._workflow.Workflow'>
```
### Model Configuration: CORRECT
```python
settings.openai_model = "gpt-5.1"  # ✅ Using GPT-5.1, not GPT-4o
settings.openai_api_key = True  # ✅ API key is set
```
---
## What Magentic Provides (Working)
1. **Multi-Agent Coordination**
- Manager agent orchestrates SearchAgent, JudgeAgent, HypothesisAgent, ReportAgent
- Uses `MagenticBuilder().with_standard_manager()` for coordination
2. **ChatAgent Pattern**
- Each agent has internal LLM (GPT-5.1)
- Can call tools via `@ai_function` decorator
- Has proper instructions for domain-specific tasks
3. **Workflow Streaming**
- Events: `MagenticAgentMessageEvent`, `MagenticFinalResultEvent`, etc.
- Real-time UI updates via `workflow.run_stream(task)`
4. **State Management**
- `MagenticState` persists evidence across agents
- `get_bibliography()` tool for ReportAgent
---
## What's Actually Broken: The Search Tools
### File: `src/agents/tools.py`
The Magentic agents call these tools:
- `search_pubmed` → Uses `PubMedTool`
- `search_clinical_trials` → Uses `ClinicalTrialsTool`
- `search_preprints` → Uses `BioRxivTool`
**These tools are the problem, not the framework.**
---
## Search Tool Bugs (Detailed)
### BUG 1: BioRxiv API Does Not Support Search
**File:** `src/tools/biorxiv.py:248-286`
```python
# This fetches the FIRST 100 papers from the last 90 days
# It does NOT search by keyword - the API doesn't support that
url = f"{self.BASE_URL}/{self.server}/{interval}/0/json"
# Then filters client-side for keywords
matching = self._filter_by_keywords(papers, query_terms, max_results)
```
**Problem:**
- Fetches the first 100 papers in the date window, regardless of topic
- Filters for ANY keyword match in title/abstract
- "Long COVID medications" returns papers about "calf muscles" because they mention "COVID" once
**Fix:** Remove BioRxiv or use Europe PMC (which has actual search)
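To make the failure mode concrete, here is a minimal sketch of ANY-keyword matching next to a stricter ALL-terms variant. The function names and the two sample papers are hypothetical, not taken from `src/tools/biorxiv.py`. Note that stricter matching only reduces false positives; it cannot recover relevant papers that were never in the fetched page.

```python
def matches_any(text: str, terms: list[str]) -> bool:
    """True if at least one term appears in the text (the behavior described above)."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in terms)


def matches_all(text: str, terms: list[str]) -> bool:
    """True only if every term appears in the text (a stricter alternative)."""
    lowered = text.lower()
    return all(term.lower() in lowered for term in terms)


# Hypothetical sample data for illustration only.
papers = [
    {
        "title": "Calf muscle adaptations after immobilization",
        "abstract": "Recruitment was delayed by COVID restrictions.",
    },
    {
        "title": "Antiviral therapy in long COVID",
        "abstract": "We evaluate medications for long COVID symptoms.",
    },
]
terms = ["long", "covid", "medications"]

any_hits = [p["title"] for p in papers if matches_any(f'{p["title"]} {p["abstract"]}', terms)]
all_hits = [p["title"] for p in papers if matches_all(f'{p["title"]} {p["abstract"]}', terms)]
```

With ANY-matching both papers pass, because the calf muscle abstract mentions "COVID" once; with ALL-matching only the long COVID paper survives.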
---
### BUG 2: PubMed Query Not Optimized
**File:** `src/tools/pubmed.py:54-71`
```python
search_params = self._build_params(
    db="pubmed",
    term=query,  # RAW USER QUERY - no preprocessing!
    retmax=max_results,
    sort="relevance",
)
```
**Problem:**
- User enters: "What medications show promise for Long COVID?"
- PubMed receives: `What medications show promise for Long COVID?`
- Should receive: `("long covid"[Title/Abstract] OR "PASC"[Title/Abstract]) AND (treatment[Title/Abstract] OR drug[Title/Abstract])`
**Fix:** Add query preprocessing:
1. Strip question words (what, which, how, etc.)
2. Expand medical synonyms (Long COVID → PASC, Post-COVID)
3. Use MeSH terms for better recall
---
### BUG 3: ClinicalTrials.gov No Filtering
**File:** `src/tools/clinicaltrials.py`
Returns ALL trials including:
- Withdrawn trials
- Terminated trials
- Observational studies (not drug interventions)
- Phase 1 (no efficacy data)
**Fix:** Filter by:
- `studyType=INTERVENTIONAL`
- `phase=PHASE2,PHASE3,PHASE4`
- `status=COMPLETED,ACTIVE_NOT_RECRUITING,RECRUITING`
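The same criteria can also be applied client-side as a safety net. This is a minimal sketch, assuming each study is a dict shaped like the ClinicalTrials.gov API v2 JSON (`protocolSection.statusModule.overallStatus`, plus `studyType` and `phases` under `protocolSection.designModule`). The helper names and sample records are hypothetical, and the field paths should be checked against a real response before relying on them.

```python
ALLOWED_STATUSES = {"COMPLETED", "ACTIVE_NOT_RECRUITING", "RECRUITING"}
ALLOWED_PHASES = {"PHASE2", "PHASE3", "PHASE4"}


def is_relevant_trial(study: dict) -> bool:
    """Keep interventional Phase 2-4 trials with a usable status."""
    protocol = study.get("protocolSection", {})
    status = protocol.get("statusModule", {}).get("overallStatus")
    design = protocol.get("designModule", {})
    phases = set(design.get("phases", []))
    return (
        status in ALLOWED_STATUSES
        and design.get("studyType") == "INTERVENTIONAL"
        # A combined PHASE1/PHASE2 trial is kept because it overlaps the allowed set.
        and bool(phases & ALLOWED_PHASES)
    )


def make_study(status: str, study_type: str, phases: list[str]) -> dict:
    """Build a hypothetical record in the assumed response shape."""
    return {
        "protocolSection": {
            "statusModule": {"overallStatus": status},
            "designModule": {"studyType": study_type, "phases": phases},
        }
    }


studies = [
    make_study("COMPLETED", "INTERVENTIONAL", ["PHASE3"]),
    make_study("WITHDRAWN", "INTERVENTIONAL", ["PHASE2"]),
    make_study("RECRUITING", "OBSERVATIONAL", []),
    make_study("RECRUITING", "INTERVENTIONAL", ["PHASE1"]),
]
kept = [s for s in studies if is_relevant_trial(s)]
```

Only the first record passes: the others are withdrawn, observational, or Phase 1 only.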
---
## Evidence: Garbage In → Garbage Out
When the Magentic SearchAgent calls these tools:
```
SearchAgent: "Find evidence for Long COVID medications"
    │
    ▼
search_pubmed("Long COVID medications")
    → Returns 1 semi-relevant paper (raw query hits)
search_preprints("Long COVID medications")
    → Returns garbage (BioRxiv API doesn't search)
    → "Calf muscle adaptations" (has "COVID" somewhere)
    → "Ophthalmologist work-life balance" (mentions COVID)
search_clinical_trials("Long COVID medications")
    → Returns all trials, no filtering
    │
    ▼
JudgeAgent receives garbage evidence
    │
    ▼
HypothesisAgent can't generate good hypotheses from garbage
    │
    ▼
ReportAgent produces garbage report
```
**The framework is doing its job. It's orchestrating agents correctly. But the agents are being fed garbage data.**
---
## Recommended Fixes
### Priority 1: Delete or Fix BioRxiv (30 min)
**Option A: Delete it**
```python
# In src/agents/tools.py, remove:
# from src.tools.biorxiv import BioRxivTool
# _biorxiv = BioRxivTool()
# @ai_function search_preprints(...)
```
**Option B: Replace with Europe PMC**
Europe PMC indexes preprints and has a proper search API:
```
https://www.ebi.ac.uk/europepmc/webservices/rest/search?query=long+covid+treatment&format=json
```
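A minimal sketch of building that request with the standard library. It assumes the `SRC:PPR` source filter restricts results to preprints and that hits arrive under `resultList.result`, as described in the Europe PMC REST documentation. The helper names are hypothetical, and the sample payload is made up for illustration rather than copied from a real response.

```python
from urllib.parse import urlencode

EUROPE_PMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_preprint_search_url(query: str, max_results: int = 10) -> str:
    """Build a Europe PMC search URL limited to preprints (assumed SRC:PPR filter)."""
    params = {
        "query": f"({query}) AND SRC:PPR",
        "format": "json",
        "pageSize": max_results,
    }
    return f"{EUROPE_PMC_SEARCH}?{urlencode(params)}"


def extract_titles(payload: dict) -> list[str]:
    """Pull titles out of a decoded JSON response, tolerating missing keys."""
    results = payload.get("resultList", {}).get("result", [])
    return [item["title"] for item in results if "title" in item]


url = build_preprint_search_url("long covid treatment", max_results=5)

# Made-up payload in the assumed response shape, for illustration only.
sample_payload = {"resultList": {"result": [{"title": "Example preprint"}, {"id": "PPR0"}]}}
titles = extract_titles(sample_payload)
```

Because the server does the keyword matching, the client-side filtering step from `BioRxivTool` would no longer be needed.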
### Priority 2: Fix PubMed Query (1 hour)
Add query preprocessor:
```python
def preprocess_query(raw_query: str) -> str:
    """Convert natural language to PubMed query syntax."""
    # Strip question words
    # Expand medical synonyms
    # Add field tags [Title/Abstract]
    # Return optimized query
```
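One way the stub could be filled in is sketched below. The stop-word list and synonym table are small illustrative assumptions, not a vetted medical vocabulary, and MeSH expansion is left out. Every group is tagged `[Title/Abstract]` and the groups are joined with `AND`.

```python
import re

FIELD_TAG = "[Title/Abstract]"

# Illustrative only: a real list would be larger and reviewed by a domain expert.
STOP_WORDS = {
    "what", "which", "how", "why", "who", "when", "where", "is", "are",
    "do", "does", "can", "for", "the", "a", "an", "of", "in", "to",
    "show", "shows", "promise",
}

# Hypothetical synonym table keyed by lowercase word or phrase.
SYNONYMS = {
    "long covid": ["long covid", "PASC", "post-COVID"],
    "medications": ["medication", "treatment", "drug"],
}


def _or_group(terms: list[str]) -> str:
    """Join terms into one parenthesized OR group with field tags."""
    return "(" + " OR ".join(f'"{term}"{FIELD_TAG}' for term in terms) + ")"


def preprocess_query(raw_query: str) -> str:
    """Convert a natural-language question into a fielded PubMed query."""
    # Lowercase and replace punctuation (except hyphens) with spaces.
    text = re.sub(r"[^\w\s-]", " ", raw_query.lower())
    groups = []
    # Handle multi-word phrases first so they are not split into tokens.
    for phrase, synonyms in SYNONYMS.items():
        if " " in phrase and phrase in text:
            groups.append(_or_group(synonyms))
            text = text.replace(phrase, " ")
    # Remaining single words: drop stop words, expand known synonyms.
    for token in text.split():
        if token in STOP_WORDS:
            continue
        groups.append(_or_group(SYNONYMS.get(token, [token])))
    # Fall back to the raw query if nothing usable is left.
    return " AND ".join(groups) if groups else raw_query.strip()


optimized = preprocess_query("What medications show promise for Long COVID?")
```

The result can be passed as `term` in place of the raw user query. A table-driven approach keeps the expansion rules easy to review, at the cost of missing any term that is not listed.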
### Priority 3: Filter ClinicalTrials (30 min)
Add parameters to API call:
```python
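params = {
    "query.term": query,
    # Same status list as proposed under BUG 3
    "filter.overallStatus": "COMPLETED,ACTIVE_NOT_RECRUITING,RECRUITING",
    # API v2 has no `filter.studyType` parameter; study type and phase go
    # through `filter.advanced` (verify the expression against the API reference)
    "filter.advanced": "AREA[StudyType]INTERVENTIONAL AND AREA[Phase](PHASE2 OR PHASE3 OR PHASE4)",
    "pageSize": max_results,
}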
```
---
## Conclusion
**Microsoft Agent Framework: NO BUGS FOUND**
- Imports work ✓
- Agent creation works ✓
- Workflow building works ✓
- Model config correct (GPT-5.1) ✓
- Streaming events work ✓
**Search Tools: CRITICALLY BROKEN**
- BioRxiv: API doesn't support search (fundamental)
- PubMed: No query optimization (fixable)
- ClinicalTrials: No filtering (fixable)
**Recommendation:**
1. Delete BioRxiv immediately (unusable)
2. Add PubMed query preprocessing
3. Add ClinicalTrials filtering
4. Then the Magentic multi-agent system will work as designed