# P0 Audit: Microsoft Agent Framework (Magentic) & Search Tools

**Date:** November 27, 2025
**Auditor:** Claude Code
**Status:** VERIFIED

---

## TL;DR

| Component | Status | Verdict |
|-----------|--------|---------|
| Microsoft Agent Framework | ✅ WORKING | Correctly wired, no bugs |
| GPT-5.1 Model Config | ✅ CORRECT | Using `gpt-5.1` as configured |
| Search Tools | ❌ BROKEN | Root cause of garbage results |

**The orchestration framework is fine. The search layer is garbage.**

---

## Microsoft Agent Framework Verification

### Import Test: PASSED
```python
from agent_framework import MagenticBuilder, ChatAgent
from agent_framework.openai import OpenAIChatClient
# All imports successful
```

### Agent Creation Test: PASSED
```python
from src.agents.magentic_agents import create_search_agent
search_agent = create_search_agent()
# SearchAgent created: SearchAgent
# Description: Searches biomedical databases (PubMed, ClinicalTrials.gov, bioRxiv)
```

### Workflow Build Test: PASSED
```python
from src.orchestrator_magentic import MagenticOrchestrator
orchestrator = MagenticOrchestrator(max_rounds=2)
workflow = orchestrator._build_workflow()
# Workflow built successfully: <class 'agent_framework._workflows._workflow.Workflow'>
```

### Model Configuration: CORRECT
```python
settings.openai_model = "gpt-5.1"  # ✅ Using GPT-5.1, not GPT-4o
settings.openai_api_key = True     # ✅ API key is set
```

---

## What Magentic Provides (Working)

1. **Multi-Agent Coordination**
   - Manager agent orchestrates SearchAgent, JudgeAgent, HypothesisAgent, ReportAgent
   - Uses `MagenticBuilder().with_standard_manager()` for coordination

2. **ChatAgent Pattern**
   - Each agent has internal LLM (GPT-5.1)
   - Can call tools via `@ai_function` decorator
   - Has proper instructions for domain-specific tasks

3. **Workflow Streaming**
   - Events: `MagenticAgentMessageEvent`, `MagenticFinalResultEvent`, etc.
   - Real-time UI updates via `workflow.run_stream(task)`

4. **State Management**
   - `MagenticState` persists evidence across agents
   - `get_bibliography()` tool for ReportAgent

---

## What's Actually Broken: The Search Tools

### File: `src/agents/tools.py`

The Magentic agents call these tools:
- `search_pubmed` → Uses `PubMedTool`
- `search_clinical_trials` → Uses `ClinicalTrialsTool`
- `search_preprints` → Uses `BioRxivTool`

**These tools are the problem, not the framework.**

---

## Search Tool Bugs (Detailed)

### BUG 1: BioRxiv API Does Not Support Search

**File:** `src/tools/biorxiv.py:248-286`

```python
# This fetches the FIRST 100 papers from the last 90 days
# It does NOT search by keyword - the API doesn't support that
url = f"{self.BASE_URL}/{self.server}/{interval}/0/json"

# Then filters client-side for keywords
matching = self._filter_by_keywords(papers, query_terms, max_results)
```

**Problem:**
- Fetches the first 100 papers in the date window, in chronological order, with no regard to topic
- Filters for ANY keyword match in title/abstract
- "Long COVID medications" returns papers about "calf muscles" because they mention "COVID" once

**Fix:** Remove BioRxiv or use Europe PMC (which has actual search)
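
The false positives in this bug follow from matching on ANY query term. The sketch below is not the code in `src/tools/biorxiv.py`; it is a minimal illustration, with made-up titles and hypothetical helper names (`tokenize`, `matches_any`, `matches_all`), of how a loose any-term filter lets an off-topic paper through while a stricter all-terms filter rejects it.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matches_any(query: str, document: str) -> bool:
    """Loose filter: True if the document shares at least one term with the query."""
    return bool(tokenize(query) & tokenize(document))


def matches_all(query: str, document: str) -> bool:
    """Strict filter: True only if every query term appears in the document."""
    return tokenize(query) <= tokenize(document)


# Hypothetical titles, written for this example only.
query = "long covid medications"
off_topic = "Calf muscle adaptations in runners during the COVID lockdown"
on_topic = "Candidate medications for long COVID: a review"

print(matches_any(query, off_topic))  # True  (shares only "covid")
print(matches_all(query, off_topic))  # False
print(matches_all(query, on_topic))   # True
```

Stricter matching only reduces noise within the 100 papers already fetched. It does not make the endpoint a search API, so the fix above still applies.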

---

### BUG 2: PubMed Query Not Optimized

**File:** `src/tools/pubmed.py:54-71`

```python
search_params = self._build_params(
    db="pubmed",
    term=query,  # RAW USER QUERY - no preprocessing!
    retmax=max_results,
    sort="relevance",
)
```

**Problem:**
- User enters: "What medications show promise for Long COVID?"
- PubMed receives: `What medications show promise for Long COVID?`
- Should receive: `("long covid"[Title/Abstract] OR "PASC"[Title/Abstract]) AND (treatment[Title/Abstract] OR drug[Title/Abstract])`

**Fix:** Add query preprocessing:
1. Strip question words (what, which, how, etc.)
2. Expand medical synonyms (Long COVID → PASC, Post-COVID)
3. Use MeSH terms for better recall

---

### BUG 3: ClinicalTrials.gov No Filtering

**File:** `src/tools/clinicaltrials.py`

Returns ALL trials including:
- Withdrawn trials
- Terminated trials
- Observational studies (not drug interventions)
- Phase 1 (no efficacy data)

**Fix:** Filter by:
- `studyType=INTERVENTIONAL`
- `phase=PHASE2,PHASE3,PHASE4`
- `status=COMPLETED,ACTIVE_NOT_RECRUITING,RECRUITING`

---

## Evidence: Garbage In → Garbage Out

When the Magentic SearchAgent calls these tools:

```
SearchAgent: "Find evidence for Long COVID medications"
    │
    ▼
search_pubmed("Long COVID medications")
    → Returns 1 semi-relevant paper (raw query hits)

search_preprints("Long COVID medications")
    → Returns garbage (BioRxiv API doesn't search)
    → "Calf muscle adaptations" (has "COVID" somewhere)
    → "Ophthalmologist work-life balance" (mentions COVID)

search_clinical_trials("Long COVID medications")
    → Returns all trials, no filtering
    │
    ▼
JudgeAgent receives garbage evidence
    │
    ▼
HypothesisAgent can't generate good hypotheses from garbage
    │
    ▼
ReportAgent produces garbage report
```

**The framework is doing its job. It's orchestrating agents correctly. But the agents are being fed garbage data.**

---

## Recommended Fixes

### Priority 1: Delete or Fix BioRxiv (30 min)

**Option A: Delete it**
```python
# In src/agents/tools.py, remove:
# from src.tools.biorxiv import BioRxivTool
# _biorxiv = BioRxivTool()
# @ai_function search_preprints(...)
```

**Option B: Replace with Europe PMC**
Europe PMC indexes preprints and has a proper search API:
```
https://www.ebi.ac.uk/europepmc/webservices/rest/search?query=long+covid+treatment&format=json
```
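
A replacement tool could build that request and read the response roughly as follows. This is a sketch under assumptions, not an existing module in this repo: the function names are hypothetical, and the `SRC:PPR` preprint restriction, the `pageSize` and `resultType` parameters, and the `resultList.result` response layout should all be checked against the Europe PMC REST documentation before use. No network call is made here.

```python
from urllib.parse import urlencode

EUROPE_PMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_preprint_search_url(query: str, max_results: int = 10) -> str:
    """Build a Europe PMC search URL restricted to preprints."""
    params = {
        # Assumption: SRC:PPR limits results to the preprint source.
        "query": f"({query}) AND SRC:PPR",
        "format": "json",
        "pageSize": max_results,
        "resultType": "core",
    }
    return f"{EUROPE_PMC_SEARCH}?{urlencode(params)}"


def extract_results(payload: dict) -> list[dict]:
    """Pull id and title out of a decoded JSON response, tolerating missing keys."""
    results = payload.get("resultList", {}).get("result", [])
    return [{"id": r.get("id"), "title": r.get("title")} for r in results]


# Usage with a hand-written sample payload (not real API output).
url = build_preprint_search_url("long covid treatment", max_results=5)
sample = {"resultList": {"result": [{"id": "PPR1", "title": "Example preprint"}]}}
print(url)
print(extract_results(sample))
```

Keeping URL construction and response parsing as separate pure functions makes both testable without hitting the live service.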

### Priority 2: Fix PubMed Query (1 hour)

Add query preprocessor:
```python
def preprocess_query(raw_query: str) -> str:
    """Convert natural language to PubMed query syntax."""
    # Strip question words
    # Expand medical synonyms
    # Add field tags [Title/Abstract]
    # Return optimized query
```
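
One way the stub above might be filled in is sketched here. The stop-word list and synonym table are tiny, hand-picked illustrations (assumptions, not a curated vocabulary), and MeSH expansion is left out entirely. The function falls back to the raw query when nothing survives filtering.

```python
import re

# Illustrative lists only; a real deployment would need a curated vocabulary.
STOP_WORDS = {
    "what", "which", "how", "why", "does", "do", "is", "are", "can",
    "show", "shows", "promise", "for", "the", "a", "an", "of", "in",
}
SYNONYMS = {
    "long covid": ["long covid", "PASC", "post-COVID"],
}


def preprocess_query(raw_query: str) -> str:
    """Convert a natural-language question into PubMed field-tagged syntax."""
    # Lowercase and replace punctuation (except hyphens) with spaces.
    text = re.sub(r"[^\w\s-]", " ", raw_query.lower())
    groups = []
    # Replace each known phrase with an OR-group of its synonyms.
    for phrase, variants in SYNONYMS.items():
        if phrase in text:
            tagged = " OR ".join(f'"{v}"[Title/Abstract]' for v in variants)
            groups.append(f"({tagged})")
            text = text.replace(phrase, " ")
    # Tag every remaining word that is not a stop word.
    for word in text.split():
        if word not in STOP_WORDS:
            groups.append(f"{word}[Title/Abstract]")
    # Fall back to the original query if everything was filtered out.
    return " AND ".join(groups) or raw_query.strip()


print(preprocess_query("What medications show promise for Long COVID?"))
```

Combining the groups with `AND` favors precision; whether that is the right trade-off against recall depends on how many results the downstream agents can usefully judge.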

### Priority 3: Filter ClinicalTrials (30 min)

Add parameters to API call:
```python
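# Parameter names should be checked against the ClinicalTrials.gov API v2
# reference before use; the study-type filter in particular may need a
# different form.
params = {
    "query.term": query,
    # Same status list as the fix proposed under BUG 3
    "filter.overallStatus": "COMPLETED,ACTIVE_NOT_RECRUITING,RECRUITING",
    "filter.studyType": "INTERVENTIONAL",
    "pageSize": max_results,
}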
```
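
If a filter cannot be expressed server-side, the same criteria can be applied to the returned records. The sketch below assumes each study is a dict shaped like the API v2 response (`protocolSection.statusModule.overallStatus`, `protocolSection.designModule.studyType` and `.phases`); that layout and the helper name `is_relevant_trial` are assumptions to verify against real responses.

```python
ALLOWED_STATUSES = {"COMPLETED", "ACTIVE_NOT_RECRUITING", "RECRUITING"}
ALLOWED_PHASES = {"PHASE2", "PHASE3", "PHASE4"}


def is_relevant_trial(study: dict) -> bool:
    """Keep interventional Phase 2-4 trials with a usable status."""
    protocol = study.get("protocolSection", {})
    status = protocol.get("statusModule", {}).get("overallStatus")
    design = protocol.get("designModule", {})
    study_type = design.get("studyType")
    phases = set(design.get("phases", []))
    return (
        status in ALLOWED_STATUSES
        and study_type == "INTERVENTIONAL"
        and bool(phases & ALLOWED_PHASES)
    )


# Hand-written sample records (not real trial data).
studies = [
    {"protocolSection": {
        "statusModule": {"overallStatus": "COMPLETED"},
        "designModule": {"studyType": "INTERVENTIONAL", "phases": ["PHASE2"]},
    }},
    {"protocolSection": {
        "statusModule": {"overallStatus": "WITHDRAWN"},
        "designModule": {"studyType": "INTERVENTIONAL", "phases": ["PHASE3"]},
    }},
    {"protocolSection": {
        "statusModule": {"overallStatus": "COMPLETED"},
        "designModule": {"studyType": "OBSERVATIONAL"},
    }},
]
kept = [s for s in studies if is_relevant_trial(s)]
print(len(kept))  # 1
```

Records missing any of these fields are dropped rather than kept, which errs toward less noise for the JudgeAgent.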

---

## Conclusion

**Microsoft Agent Framework: NO BUGS FOUND**
- Imports work ✅
- Agent creation works ✅
- Workflow building works ✅
- Model config correct (GPT-5.1) ✅
- Streaming events work ✅

**Search Tools: CRITICALLY BROKEN**
- BioRxiv: API doesn't support search (fundamental)
- PubMed: No query optimization (fixable)
- ClinicalTrials: No filtering (fixable)

**Recommendation:**
1. Delete BioRxiv immediately (unusable), or replace it with Europe PMC
2. Add PubMed query preprocessing
3. Add ClinicalTrials filtering
4. Then the Magentic multi-agent system will work as designed