VibecoderMcSwaggins committed
Commit 8290bc9 · unverified · 1 Parent(s): fa1717e

fix(audit): comprehensive P1-P4 audit fixes (#45)


* docs(audit): add remaining issues analysis for 2024-12-13

Cross-reference new audit findings against previous PR #44 fixes.
Categorizes P1-P4 issues that remain to be addressed.

Key findings:
- P1: max_concurrent_jobs default too high (10 -> 1)
- P1: timeout not propagated to direct invocation
- P2: Several unused settings (hf_cache_dir, temp_dir, deepisles_repo_path)
- P2: Concurrency check after expensive validation
- P3/P4: Documentation and cleanup items

Many issues from the audit were already fixed in PR #44 (confirmed).

* fix(audit): address remaining P1-P2 issues and remove config slop

P1 Fixes (Safety Critical):
- Change max_concurrent_jobs default from 10 to 1 (GPU OOM prevention)
- Pass timeout parameter through to direct invocation path

Config Slop Removed (speculative placeholders that duplicated native functionality):
- hf_cache_dir: use HF_HOME env var instead (already set in Dockerfile)
- temp_dir: use TMPDIR env var instead (Python native)
- deepisles_repo_path: no use case (direct mode only runs in container)
- DEEPISLES_PATH env var: removed from Dockerfile (unused)

P2 Fixes:
- Pre-check concurrency limit before expensive list_case_ids() validation
- Change direct invocation logging from INFO to DEBUG (prevent log explosion)
- Apply log settings (STROKE_DEMO_LOG_LEVEL/FORMAT) to FastAPI startup
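The concurrency pre-check is a reordering: do the cheap capacity check before the expensive dataset lookup. A toy in-memory registry standing in for the real (atomic) job store — names here are hypothetical:

```python
# Toy in-memory job registry; the real app uses an atomic job store.
active_jobs: set[str] = set()
MAX_CONCURRENT_JOBS = 1


def expensive_list_case_ids() -> list[str]:
    # Stands in for a HuggingFace dataset load.
    return ["sub-stroke0001"]


def submit_job(job_id: str, case_id: str) -> str:
    # Cheap capacity check first: refuse immediately when at capacity...
    if len(active_jobs) >= MAX_CONCURRENT_JOBS:
        return "busy"
    # ...and only then pay for the expensive case-ID validation.
    if case_id not in expensive_list_case_ids():
        return "unknown case"
    active_jobs.add(job_id)
    return "accepted"


first = submit_job("job-1", "sub-stroke0001")
second = submit_job("job-2", "sub-stroke0001")
print(first, second)  # → accepted busy
```

Note this sketch is not race-free; the actual fix layers the pre-check on top of an atomic `create_job_if_under_limit`, which PR #44 already introduced.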

P3/P4 Cleanup:
- Update package description from Gradio to React SPA + FastAPI
- Fix CORS comment (HEAD -> OPTIONS for preflight)
- Completely rewrite docs/guides/configuration.md with current settings

All 157 tests pass. ruff/mypy clean.

* fix(audit): P2 prediction path mismatch and P3 .env.example cleanup

Validated findings from external agent re-review:

P2 Fix - Prediction URL mismatch (Docker mode only):
- deepisles.py find_prediction_mask() may return path in results/ subdir
- URL contract expects files directly in output_dir
- Added copy step to ensure prediction is at expected location
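The copy step can be sketched as follows; `ensure_prediction_at_expected_path` is a hypothetical helper name, not the repo's actual function:

```python
import shutil
import tempfile
from pathlib import Path


def ensure_prediction_at_expected_path(found_mask: Path, output_dir: Path) -> Path:
    """Copy a prediction found in a subdirectory (e.g. results/) up to
    output_dir, where the file-URL contract expects it."""
    expected = output_dir / found_mask.name
    if found_mask.resolve() != expected.resolve():
        shutil.copy2(found_mask, expected)
    return expected


# Demonstration with a temp directory standing in for the job output dir.
output_dir = Path(tempfile.mkdtemp())
nested = output_dir / "results"
nested.mkdir()
(nested / "prediction.nii.gz").write_bytes(b"mask")

final = ensure_prediction_at_expected_path(nested / "prediction.nii.gz", output_dir)
print(final.exists())  # → True
```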

P3 Fix - .env.example stale reference:
- Removed STROKE_DEMO_TEMP_DIR (setting was removed, use native TMPDIR)

Bug doc: POST-AUDIT-FINDINGS-2025-12-13.md with full analysis

* fix(audit): address P3/P4 findings from round 2 review

P3 fixes:
- main.py: add unused-ignore to torch type: ignore for env-independence
- deployment.md: complete rewrite for React+FastAPI architecture
- quickstart.md: reorder options, React SPA first, Gradio marked legacy

P4 fixes:
- direct.py: move path/command logging from INFO to DEBUG
- fixtures.ts: update URLs to match /files/{jobId}/{caseId}/ contract
- cli.py + data/__init__.py: wire --dataset arg to list_case_ids()

Docs:
- Consolidated audit findings in docs/bugs/AUDIT-FINDINGS-2025-12-13.md
- Removed redundant root-level bug docs

Test results: 157 passed (backend), 70 passed (frontend)

* docs(audit): fix P3/P4 doc drift from round 3 review

P3 fixes:
- quickstart.md/deployment.md/README.md: add --extra api/gradio to install
- README.md: React+FastAPI as primary, Gradio as legacy option 4
- TECHNICAL_DEBT.md: mark as resolved, note React migration superseded Gradio fixes

P4 fixes:
- deployment.md: clarify HF_SPACES set by Dockerfile, not auto by HF
- direct.py: docstring mentions FastAPI not Gradio
- NiiVueViewer.test.tsx: comment noting simplified URLs for component testing

Test results: 157 passed (backend), 70 passed (frontend)

* fix(coderabbit): address valid review feedback

Accepted:
- docs/bugs/NEXT-CONCERNS.md: add `text` language to fenced code block
- deepisles.py: add try/except for shutil.copy2 with DeepISLESError
- deepisles.py: improve timeout docstring (default value, None behavior)

Rejected (with justification):
- TECHNICAL_DEBT.md date: "December 2025" is correct (current year)
- fixtures.ts relative URLs: API returns absolute URLs, fixtures should match

Test results: 157 passed (backend), ruff/mypy clean

.env.example CHANGED
````diff
@@ -14,7 +14,7 @@ STROKE_DEMO_DEEPISLES_USE_GPU=true
 
 # Paths
 STROKE_DEMO_RESULTS_DIR=./results
-# STROKE_DEMO_TEMP_DIR=/tmp/custom_temp
+# Note: For temp directory, use the native TMPDIR environment variable
 
 # UI
 STROKE_DEMO_GRADIO_SERVER_NAME=0.0.0.0
````
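The replacement note leans on Python's `tempfile` module reading `TMPDIR` natively, which can be checked in a few lines:

```python
import os
import tempfile

scratch = tempfile.mkdtemp()    # any existing writable directory works
os.environ["TMPDIR"] = scratch  # what an operator would set
tempfile.tempdir = None         # clear the cached default so TMPDIR is re-read

print(tempfile.gettempdir())    # prints the scratch path
```

Resetting `tempfile.tempdir` matters only when the module already computed a default in the same process; a fresh process with `TMPDIR` exported picks it up automatically.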
Dockerfile CHANGED
````diff
@@ -59,10 +59,6 @@ RUN uv pip install --no-deps -e .
 ENV HF_SPACES=1
 ENV DEEPISLES_DIRECT_INVOCATION=1
 
-# Point to DeepISLES location for direct invocation
-# DeepISLES code is at /app in the base image
-ENV DEEPISLES_PATH=/app
-
 # Ensure HuggingFace cache uses our writable directory
 ENV HF_HOME=/home/user/demo/cache
````
README.md CHANGED
````diff
@@ -74,8 +74,8 @@ This project provides a complete end-to-end workflow:
 git clone https://github.com/The-Obstacle-Is-The-Way/stroke-deepisles-demo.git
 cd stroke-deepisles-demo
 
-# Install dependencies
-uv sync
+# Install dependencies (includes FastAPI backend)
+uv sync --extra api
 ```
 
 ### Running the Demo
@@ -85,11 +85,15 @@ uv sync
 docker pull isleschallenge/deepisles
 ```
 
-2. **Launch the UI**:
+2. **Launch the Web UI** (React + FastAPI):
 ```bash
-uv run python -m stroke_deepisles_demo.ui.app
+# Terminal 1: Start backend
+uv run uvicorn stroke_deepisles_demo.api.main:app --reload --port 7860
+
+# Terminal 2: Start frontend
+cd frontend && npm install && npm run dev
 ```
-Open [http://localhost:7860](http://localhost:7860) in your browser.
+Open [http://localhost:5173](http://localhost:5173) in your browser.
 
 3. **Run via CLI**:
 ```bash
@@ -100,6 +104,12 @@ uv sync
 uv run stroke-demo run --case sub-stroke0001
 ```
 
+4. **Legacy Gradio UI** (optional):
+```bash
+uv sync --extra gradio
+uv run python -m stroke_deepisles_demo.ui.app
+```
+
 ## Documentation
 
 - [Quickstart Guide](docs/guides/quickstart.md)
````
docs/TECHNICAL_DEBT.md CHANGED
````diff
@@ -1,12 +1,17 @@
 # Technical Debt and Known Issues
 
-> **Last Audit**: December 2025 (Revision 6)
+> **Last Audit**: December 2025 (Revision 7)
 > **Auditor**: Claude Code + External Senior Review
-> **Status**: P0 BLOCKER - NiiVue/WebGL integration broken on HF Spaces
+> **Status**: All P0/P1/P2 issues resolved
 
 ## Summary
 
-**CRITICAL ISSUE**: The NiiVue 3D viewer does not work on HuggingFace Spaces due to Gradio architecture limitations. See Issue #24.
+All critical issues have been resolved. The former P0 blocker (NiiVue/WebGL on HF Spaces)
+was bypassed by migrating to a **React SPA + FastAPI** architecture, which is now the
+primary deployment target.
+
+> **Note**: The sections below document the **historical Gradio-based approach** and its
+> resolution. They are preserved for context but describe legacy architecture.
 
 | Severity | Count | Description | Status |
 |----------|-------|-------------|--------|
@@ -90,6 +95,9 @@ See: `docs/specs/19-perf-base64-to-file-urls.md`
 
 ## Conclusion
 
-The codebase is **production-ready for all features EXCEPT the Interactive 3D Viewer (NiiVue)**. All other technical debt items are resolved.
+The codebase is **production-ready**. The P0 blocker was resolved by migrating to React + FastAPI:
+
+- **Primary UI**: React SPA with NiiVue (works correctly on HF Spaces)
+- **Legacy UI**: Gradio (preserved for backwards compatibility, NiiVue broken on HF Spaces)
 
-**Next step:** Implement Gradio Custom Component per spec #28 to fix the P0 blocker.
+The Gradio Custom Component approach (spec #28) was superseded by the React migration.
````
docs/bugs/AUDIT-FINDINGS-2025-12-13.md ADDED
@@ -0,0 +1,143 @@
````markdown
# Audit Findings - 2025-12-13 (Final)

**Date:** 2025-12-13
**Branch:** `dev`
**Status:** ALL FIXED ✓

---

## Summary

Two rounds of external agent audit findings validated and fixed:

### Round 1 Findings (P1-P4) - All Fixed in Previous Commit

| Issue | Severity | Status |
|-------|----------|--------|
| max_concurrent_jobs=10 unsafe | P1 | ✓ Fixed |
| Timeout not passed to direct invocation | P1 | ✓ Fixed |
| Slop settings removed (hf_cache_dir, temp_dir, deepisles_repo_path) | P2 | ✓ Fixed |
| Prediction path mismatch (Docker mode) | P2 | ✓ Fixed |
| .env.example stale TEMP_DIR reference | P3 | ✓ Fixed |
| Concurrency pre-check, logging, package description | P2-P4 | ✓ Fixed |

### Round 2 Findings (P3-P4) - Fixed in This Commit

| Issue | Severity | Status |
|-------|----------|--------|
| mypy type: ignore becomes env-dependent | P3 | ✓ Fixed |
| deployment.md describes outdated Gradio+DinD | P3 | ✓ Fixed |
| quickstart.md missing React+FastAPI path | P3 | ✓ Fixed |
| direct.py logs paths at INFO | P4 | ✓ Fixed |
| frontend fixture URLs don't match contract | P4 | ✓ Fixed |
| CLI --dataset arg "(not used yet)" | P4 | ✓ Fixed |
| pytest-asyncio config | P4 | N/A (pkg not installed) |

---

## Round 2 Fixes Detail

### P3: mypy type: ignore[import-not-found] environment-dependent

**Problem:** On machines with torch installed, mypy complains about unused ignore.

**File:** `src/stroke_deepisles_demo/api/main.py:54`

**Fix:** Added `unused-ignore` to suppress both warnings:
```python
import torch  # type: ignore[import-not-found,unused-ignore]
```

### P3: deployment.md described outdated architecture

**Problem:** Docs described Docker-in-Docker + Gradio as primary deployment.

**File:** `docs/guides/deployment.md`

**Fix:** Complete rewrite reflecting:
- React SPA frontend (Static SDK Space)
- FastAPI backend (Docker SDK Space with GPU)
- Direct invocation mode (subprocess to conda env)
- Gradio marked as legacy

### P3: quickstart.md missing React+FastAPI path

**Problem:** Only showed Gradio UI as primary option.

**File:** `docs/guides/quickstart.md`

**Fix:** Reordered options:
1. React SPA + FastAPI (Recommended)
2. CLI
3. Python API
4. Legacy Gradio UI

### P4: direct.py logged paths at INFO

**Problem:** Full input paths logged at INFO level, potentially exposing sensitive path info.

**File:** `src/stroke_deepisles_demo/inference/direct.py:158-164, 189`

**Fix:** Changed both logging calls from `logger.info` to `logger.debug`.

### P4: frontend fixture URLs don't match API contract

**Problem:** Fixture URLs were `/files/dwi.nii.gz` but actual API returns `/files/{jobId}/{caseId}/dwi.nii.gz`.

**File:** `frontend/src/test/fixtures.ts:14-15`

**Fix:** Updated URLs to match contract:
```typescript
dwiUrl: "http://localhost:7860/files/test-job-123/sub-stroke0001/dwi.nii.gz",
predictionUrl: "http://localhost:7860/files/test-job-123/sub-stroke0001/prediction.nii.gz",
```

### P4: CLI --dataset arg not wired

**Problem:** Help text said "(not used yet)".

**Files:**
- `src/stroke_deepisles_demo/cli.py:23`
- `src/stroke_deepisles_demo/data/__init__.py:34`

**Fix:**
1. Added `source` parameter to `list_case_ids()` function
2. Updated CLI to pass `args.dataset` to `list_case_ids(source=args.dataset)`
3. Updated help text to "HF dataset ID or local path"

### P4: pytest-asyncio config warning

**Claim:** pytest-asyncio emits deprecation warning about unset `asyncio_default_fixture_loop_scope`.

**Investigation:**
- pytest-asyncio is NOT installed
- No async tests in test suite
- Warning was likely from different environment

**Fix:** N/A - removed non-applicable config that was causing "unknown config option" warnings.

---

## Test Verification

```text
Backend:  157 passed (pytest)
Frontend: 70 passed (vitest)
Linting:  All checks passed (ruff)
Types:    No issues (mypy)
```

---

## Files Modified (Round 2)

| File | Change |
|------|--------|
| `src/stroke_deepisles_demo/api/main.py` | mypy ignore fix |
| `src/stroke_deepisles_demo/inference/direct.py` | INFO → DEBUG logging |
| `src/stroke_deepisles_demo/data/__init__.py` | Added source param to list_case_ids |
| `src/stroke_deepisles_demo/cli.py` | Wired --dataset arg |
| `docs/guides/deployment.md` | Complete rewrite for React+FastAPI |
| `docs/guides/quickstart.md` | Reordered options, React first |
| `frontend/src/test/fixtures.ts` | Fixed URL contract |
| `pyproject.toml` | Removed non-applicable pytest-asyncio config |
````
docs/bugs/NEXT-CONCERNS.md CHANGED
````diff
@@ -1,38 +1,76 @@
-# Next Concerns - Critical Architecture Debt
+# Next Concerns - Config Audit & Fixes Complete
 
-**Status:** VALIDATED - All claims verified from first principles
-**Priority:** P0/P1 - Production reproducibility and config integrity at risk
+**Date:** 2025-12-13
+**Branch:** `fix/remaining-audit-issues`
+**Status:** IMPLEMENTED - Ready for review
 
 ---
 
-## PART 1: CONFIG DRIFT (BUG-009) - RESOLVED
+## Summary of Changes
 
-### Status
-- **Consolidated:** `api/config.py` deleted.
-- **SSOT:** `core/config.py` now holds all API settings.
-- **Env Vars:** `Dockerfile` updated to use `STROKE_DEMO_*` prefix.
-- **Validation:** Tests pass, env var overrides work.
+### P1 Fixes (Safety Critical)
+
+| Issue | Fix | File |
+|-------|-----|------|
+| max_concurrent_jobs=10 unsafe | Changed default to 1 | `config.py:97-98` |
+| Timeout not passed to direct invocation | Added timeout parameter | `deepisles.py:222-223, 318` |
+
+### Slop Removed (Never Actually Needed)
+
+| Setting | Reason | Status |
+|---------|--------|--------|
+| `hf_cache_dir` | Duplicates `HF_HOME` (already set in Dockerfile) | **REMOVED** |
+| `temp_dir` | Duplicates `TMPDIR` (Python native) | **REMOVED** |
+| `deepisles_repo_path` | No valid use case (direct mode only in container) | **REMOVED** |
+| `DEEPISLES_PATH` env var | Unused in direct.py | **REMOVED** from Dockerfile |
+
+### P2 Fixes
+
+| Issue | Fix | File |
+|-------|-----|------|
+| Concurrency check after expensive validation | Added pre-check before list_case_ids() | `routes.py:95-101` |
+| Direct invocation logs verbose at INFO | Changed to DEBUG | `direct.py:200-206` |
+| FastAPI doesn't apply log settings | Added setup_logging() at startup | `main.py:47-48` |
+
+### P3/P4 Cleanup
+
+| Issue | Fix | File |
+|-------|-----|------|
+| Package description mentions Gradio | Updated to React SPA + FastAPI | `__init__.py:1` |
+| HEAD comment wrong (should be OPTIONS) | Fixed comment | `main.py:112` |
+| Configuration docs outdated | Completely rewritten | `docs/guides/configuration.md` |
 
 ---
 
-## PART 2: DEPENDENCY PINNING (BUG-012) - RESOLVED
+## Test Results
 
-### Status
-- **Base Image:** Pinned to `sha256:848c9eceb67dbc585bcb37f093389d142caeaa98878bd31039af04ef297a5af4`.
-- **Lock File:** Dockerfile now uses `uv sync --frozen` to respect `uv.lock`.
-- **Path:** Dockerfile adds `.venv/bin` to `PATH` for correct execution.
-- **Dependency Migration:** Migrated from hard-forked `datasets` to maintained `neuroimaging-go-brrrr` extension (v0.2.1) + standard `datasets` library. Validated end-to-end in Docker.
-- **Validation:** Docker build succeeds, runtime verifies settings load and modules importable.
+```text
+157 passed, 7 deselected in 12.73s
+ruff check: All checks passed!
+ruff format: 27 files already formatted
+mypy: Success: no issues found in 27 source files
+```
 
 ---
 
-## PART 3: FRONTEND CONFIG - NO ACTION NEEDED
+## Files Changed
 
-### Status
-- Keeping current build-time `.env.production` approach.
-- No immediate need for runtime variables via `window.huggingface.variables`.
+1. `src/stroke_deepisles_demo/core/config.py` - Removed slop, fixed max_concurrent_jobs
+2. `src/stroke_deepisles_demo/inference/deepisles.py` - Pass timeout to direct invocation
+3. `src/stroke_deepisles_demo/inference/direct.py` - DEBUG logging for verbose output
+4. `src/stroke_deepisles_demo/api/routes.py` - Pre-check concurrency before validation
+5. `src/stroke_deepisles_demo/api/main.py` - setup_logging() at startup, fix OPTIONS comment
+6. `src/stroke_deepisles_demo/__init__.py` - Updated package description
+7. `Dockerfile` - Removed unused DEEPISLES_PATH env var
+8. `docs/guides/configuration.md` - Complete rewrite with current settings
 
 ---
 
-**Validated:** 2025-12-12
-**Status:** COMPLETED
+## What's Left (P4 - Optional)
+
+These are exported in the public API, so removing them could be a breaking change. Left as-is:
+- `DatasetInfo` class (defined but never instantiated)
+- `create_staging_directory` (has tests, is a utility function)
+
+These are minor doc issues, not affecting functionality:
+- Broken references to archived spec docs in some files
````
docs/bugs/REMAINING-ISSUES-2024-12-13.md ADDED
@@ -0,0 +1,260 @@
````markdown
# Remaining Issues - Post-Audit 2024-12-13

**Context:** Deep audit performed after PR #44 merged (security + config wiring fixes).
**Method:** Cross-referenced new audit against existing fixes. Validated each claim from first principles.

---

## Executive Summary

The previous PR #44 addressed many issues. This document captures **genuinely new findings** that were not previously addressed. Issues are prioritized P1-P4.

**Scope:** Fix these in branch `fix/remaining-audit-issues`

---

## P1 - High Priority (Should Fix)

### P1-001: Default max_concurrent_jobs=10 is unsafe for single GPU

**Location:** `src/stroke_deepisles_demo/core/config.py:98`

**Issue:** Default of 10 concurrent jobs can trivially trigger GPU OOM on a single GPU (HF Spaces T4 has 16GB VRAM). Even 2-3 concurrent DeepISLES runs can cause thrashing.

**Current:**
```python
max_concurrent_jobs: int = 10
```

**Fix:** Change default to 1 (safest for demo). Document tuning guidance.

**Severity:** High - can crash the demo under moderate load.

---

### P1-002: Timeout not propagated to direct invocation path

**Location:** `src/stroke_deepisles_demo/inference/deepisles.py:310-315`

**Issue:** `_run_via_direct_invocation()` doesn't pass timeout to `run_deepisles_direct()`, making `STROKE_DEMO_DEEPISLES_TIMEOUT_SECONDS` ineffective on HF Spaces.

**Current:**
```python
def _run_via_direct_invocation(..., fast: bool) -> DeepISLESResult:
    # No timeout parameter!
    result = run_deepisles_direct(
        dwi_path=dwi_path,
        adc_path=adc_path,
        output_dir=output_dir,
        flair_path=flair_path,
        fast=fast,
        # timeout missing!
    )
```

**Fix:** Add `timeout` parameter to `_run_via_direct_invocation()` and pass it through.

---

## P2 - Medium Priority (Tech Debt)

### P2-001: hf_cache_dir setting exists but is not used

**Location:** `src/stroke_deepisles_demo/core/config.py:80` + `src/stroke_deepisles_demo/data/loader.py:224`

**Issue:** `Settings.hf_cache_dir` is defined but `datasets.load_dataset()` doesn't receive `cache_dir=`. Operators setting `STROKE_DEMO_HF_CACHE_DIR` get no effect.

**Fix:** Pass `cache_dir=settings.hf_cache_dir` when set, or remove the setting.

**Decision:** Wire it in (per previous directive: wire in settings properly).

---

### P2-002: temp_dir setting exists but is not used

**Location:** `src/stroke_deepisles_demo/core/config.py:92` + `src/stroke_deepisles_demo/pipeline.py:118`

**Issue:** `Settings.temp_dir` defined but pipeline uses `tempfile.mkdtemp()` without `dir=` parameter.

**Fix:** Use `dir=settings.temp_dir` when set.

---

### P2-003: deepisles_repo_path setting exists but is not used

**Location:** `src/stroke_deepisles_demo/core/config.py:89` + `src/stroke_deepisles_demo/inference/direct.py:32-35`

**Issue:** `Settings.deepisles_repo_path` defined but direct.py hardcodes `/app` and `/opt/conda`.

**Fix:** Use setting to derive adapter path and cwd. Also wire `DEEPISLES_PATH` env var (Dockerfile:64).

---

### P2-004: Concurrency limit checked after expensive validation

**Location:** `src/stroke_deepisles_demo/api/routes.py:95-110`

**Issue:** `list_case_ids()` is called (potentially expensive HF dataset load) before checking the concurrency limit. Wasted work when the limit is already reached.

**Fix:** Reorder to check concurrency limit first, then validate case_id.

---

### P2-005: Direct invocation logs full stdout/stderr at INFO/WARN

**Location:** `src/stroke_deepisles_demo/inference/direct.py:200-205`

**Issue:** Can explode log volume with DeepISLES's verbose output, and make JSON logging invalid.

**Fix:** Log at DEBUG level, or truncate/summarize at INFO.

---

### P2-006: FastAPI doesn't apply log settings

**Location:** `src/stroke_deepisles_demo/api/main.py`

**Issue:** `STROKE_DEMO_LOG_LEVEL` and `STROKE_DEMO_LOG_FORMAT` are only applied in Gradio entrypoints. FastAPI startup doesn't call `setup_logging()`.

**Fix:** Call `setup_logging(settings.log_level, format_style=settings.log_format)` in FastAPI startup.

---

## P3 - Low Priority (Documentation/Cleanup)

### P3-001: DEEPISLES_PATH env var set but unused

**Location:** `Dockerfile:64`

**Issue:** `ENV DEEPISLES_PATH=/app` is set but direct.py doesn't read it.

**Fix:** Either wire it in (ties to P2-003) or remove from Dockerfile.

---

### P3-002: Package description still mentions Gradio

**Location:** `src/stroke_deepisles_demo/__init__.py:1`

**Issue:** Says "Gradio visualization" but the primary UI is now a React SPA.

**Fix:** Update description.

---

### P3-003: gradio_niivueviewer in extras but commented out in code

**Location:** `pyproject.toml:46-50` + `src/stroke_deepisles_demo/ui/components.py:7-9`

**Issue:** Extra maintenance surface for unused component.

**Fix:** Remove from extras if truly deprecated, or document scope.

---

### P3-004: Docs reference non-existent spec files

**Location:** Multiple (`deepisles.py:9`, `direct.py:15`, `viewer.py:8-9`)

**Issue:** References to `docs/specs/07-hf-spaces-deployment.md` and `docs/specs/19-perf-base64-to-file-urls.md` which don't exist (may have been archived).

**Fix:** Update doc references or point to archive location.

---

## P4 - Nitpicks (Optional Cleanup)

### P4-001: app.py doesn't pass gradio_show_error

**Location:** `app.py:32-38`

**Issue:** Root app.py doesn't pass `show_error=settings.gradio_show_error` like ui/app.py does.

**Fix:** Pass consistently.

---

### P4-002: "json" log format is not valid JSON

**Location:** `src/stroke_deepisles_demo/core/logging.py:30`

**Issue:** JSON format doesn't escape message content; breaks if messages contain quotes/newlines.

**Fix:** Use a proper JSON formatter (e.g., `python-json-logger`).

---

### P4-003: create_staging_directory is unused

**Location:** `src/stroke_deepisles_demo/data/staging.py:93`

**Issue:** Dead code surface.

**Fix:** Remove or integrate via temp_dir setting.

---

### P4-004: DatasetInfo class is unused

**Location:** `src/stroke_deepisles_demo/data/loader.py:45`

**Issue:** Defined but never instantiated.

**Fix:** Remove or wire into API responses.

---

### P4-005: HEAD method comment is incorrect

**Location:** `src/stroke_deepisles_demo/api/main.py:108-109`

**Issue:** Comment says "HEAD for preflight checks" but CORS preflight uses OPTIONS.

**Fix:** Fix comment.

---

## Already Fixed (PR #44) - Confirmed

These issues from the audit were **already addressed**:

| Issue | Status | Evidence |
|-------|--------|----------|
| Docker build missing `--extra api` | FIXED | Dockerfile:44 |
| hf_dataset_id not wired | FIXED | loader.py:217 |
| hf_token not wired | FIXED | loader.py:218-224 |
| deepisles_docker_image not wired | FIXED | deepisles.py:153-155 |
| deepisles_timeout_seconds not wired | FIXED | pipeline.py:93-95 |
| deepisles_use_gpu not wired | FIXED | pipeline.py:91-93 |
| TOCTOU race in concurrency | FIXED | routes.py:107-110 (atomic create_job_if_under_limit) |
| File extension allowlist | FIXED | files.py:28 |
| Path traversal in subject_id | FIXED | loader.py:111-115 |
| Stale StaticFiles comment | FIXED | Dockerfile:70 |

---

## Intentional / Non-Issues

| Claim | Verdict | Reason |
|-------|---------|--------|
| No auth/rate limiting | NON-ISSUE | Intentional for public demo |
| Dataset reload per request | ACCEPTABLE | Demo scale (149 cases), adds negligible latency |
| Docker-mode file URL mismatch | NON-ISSUE | `find_prediction_mask()` returns actual path, pipeline copies to `results_dir/{case_id}/` correctly |

---

## Fix Order

1. **P1-001**: max_concurrent_jobs default (safety)
2. **P1-002**: Timeout propagation to direct invocation
3. **P2-004**: Reorder concurrency check before validation
4. **P2-001, P2-002, P2-003**: Wire in remaining dead settings
5. **P2-005, P2-006**: Logging fixes
6. **P3-\***: Documentation cleanup
7. **P4-\***: Nitpicks (time permitting)

---

**Auditor:** Claude Code
**Date:** 2024-12-13
**Target Branch:** `fix/remaining-audit-issues`
````
docs/guides/configuration.md CHANGED
@@ -1,25 +1,58 @@
1
  # Configuration
2
 
3
- All settings can be configured via environment variables.
4
 
5
  ## Environment Variables
6
 
 
 
7
  | Variable | Default | Description |
8
  |----------|---------|-------------|
9
  | `STROKE_DEMO_LOG_LEVEL` | `INFO` | Logging level (DEBUG, INFO, WARNING, ERROR) |
10
  | `STROKE_DEMO_LOG_FORMAT` | `simple` | Log format (simple, detailed, json) |
11
- | `STROKE_DEMO_HF_DATASET_ID` | `YongchengYAO/ISLES24-MR-Lite` | HuggingFace dataset ID |
12
- | `STROKE_DEMO_HF_CACHE_DIR` | `None` | Custom HF cache directory |
13
- | `STROKE_DEMO_HF_TOKEN` | `None` | HuggingFace API token (for private datasets) |
 
 
 
 
 
 
 
 
 
 
 
14
  | `STROKE_DEMO_DEEPISLES_DOCKER_IMAGE` | `isleschallenge/deepisles` | DeepISLES Docker image |
15
- | `STROKE_DEMO_DEEPISLES_FAST_MODE` | `true` | Use single-model mode |
16
- | `STROKE_DEMO_DEEPISLES_TIMEOUT_SECONDS` | `1800` | Inference timeout |
17
- | `STROKE_DEMO_DEEPISLES_USE_GPU` | `true` | Use GPU acceleration |
18
- | `STROKE_DEMO_TEMP_DIR` | `None` | Scratch directory for intermediate files |
19
- | `STROKE_DEMO_RESULTS_DIR` | `./results` | Directory for output files |
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
20
  | `STROKE_DEMO_GRADIO_SERVER_NAME` | `0.0.0.0` | Gradio server host |
21
  | `STROKE_DEMO_GRADIO_SERVER_PORT` | `7860` | Gradio server port |
22
  | `STROKE_DEMO_GRADIO_SHARE` | `false` | Create public Gradio link |
 
23
 
24
  ## Using .env File
25
 
@@ -28,7 +61,7 @@ Create a `.env` file in the project root:
28
  ```bash
29
  STROKE_DEMO_LOG_LEVEL=DEBUG
30
  STROKE_DEMO_DEEPISLES_USE_GPU=false
31
- STROKE_DEMO_RESULTS_DIR=/data/results
32
  ```
33
 
34
  ## Programmatic Configuration
 
  # Configuration

+ All settings can be configured via environment variables with the `STROKE_DEMO_` prefix.

  ## Environment Variables

+ ### Logging
+
  | Variable | Default | Description |
  |----------|---------|-------------|
  | `STROKE_DEMO_LOG_LEVEL` | `INFO` | Logging level (DEBUG, INFO, WARNING, ERROR) |
  | `STROKE_DEMO_LOG_FORMAT` | `simple` | Log format (simple, detailed, json) |
+
+ ### HuggingFace
+
+ | Variable | Default | Description |
+ |----------|---------|-------------|
+ | `STROKE_DEMO_HF_DATASET_ID` | `hugging-science/isles24-stroke` | HuggingFace dataset ID |
+ | `STROKE_DEMO_HF_TOKEN` | `None` | HuggingFace API token (for private/gated datasets) |
+
+ > **Note:** To control the HF cache location, use the native `HF_HOME` env var (already set in the Dockerfile).
+
+ ### DeepISLES Inference
+
+ | Variable | Default | Description |
+ |----------|---------|-------------|
  | `STROKE_DEMO_DEEPISLES_DOCKER_IMAGE` | `isleschallenge/deepisles` | DeepISLES Docker image |
+ | `STROKE_DEMO_DEEPISLES_FAST_MODE` | `true` | Use SEALS-only mode (faster, no FLAIR needed) |
+ | `STROKE_DEMO_DEEPISLES_TIMEOUT_SECONDS` | `1800` | Inference timeout (30 minutes) |
+ | `STROKE_DEMO_DEEPISLES_USE_GPU` | `true` | Use GPU acceleration (Docker mode only) |
+
+ ### Paths
+
+ | Variable | Default | Description |
+ |----------|---------|-------------|
+ | `STROKE_DEMO_RESULTS_DIR` | `/tmp/stroke-results` | Directory for job result files |
+
+ > **Note:** To control the temp file location, use the native `TMPDIR` env var (Python's tempfile module respects it).
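The `TMPDIR` note can be verified with the stdlib alone; `tempfile.gettempdir()` caches its choice, so the sketch below resets the cache before re-reading the env var:

```python
import os
import tempfile

# tempfile caches the chosen directory; clearing the cache forces it to
# re-read TMPDIR, which is why no custom temp_dir setting is needed.
os.environ["TMPDIR"] = "/tmp"
tempfile.tempdir = None
print(tempfile.gettempdir())  # /tmp
```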
+
+ ### API Settings
+
+ | Variable | Default | Description |
+ |----------|---------|-------------|
+ | `STROKE_DEMO_MAX_CONCURRENT_JOBS` | `1` | Max concurrent inference jobs (increase for multi-GPU) |
+ | `STROKE_DEMO_FRONTEND_ORIGINS` | `["http://localhost:5173", "http://localhost:3000"]` | CORS allowed origins |
+ | `STROKE_DEMO_BACKEND_PUBLIC_URL` | `None` | Public URL for file links (auto-detected if not set) |
+
+ ### Gradio UI (Legacy)
+
+ | Variable | Default | Description |
+ |----------|---------|-------------|
  | `STROKE_DEMO_GRADIO_SERVER_NAME` | `0.0.0.0` | Gradio server host |
  | `STROKE_DEMO_GRADIO_SERVER_PORT` | `7860` | Gradio server port |
  | `STROKE_DEMO_GRADIO_SHARE` | `false` | Create public Gradio link |
+ | `STROKE_DEMO_GRADIO_SHOW_ERROR` | `false` | Show full tracebacks (security: keep false in prod) |

  ## Using .env File

  ```bash
  STROKE_DEMO_LOG_LEVEL=DEBUG
  STROKE_DEMO_DEEPISLES_USE_GPU=false
+ STROKE_DEMO_MAX_CONCURRENT_JOBS=2
  ```

  ## Programmatic Configuration
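As a minimal sketch of the `STROKE_DEMO_` prefix convention (the project itself uses pydantic's `BaseSettings`; the helper below is purely illustrative):

```python
import os

def load_setting(name: str, default: str, prefix: str = "STROKE_DEMO_") -> str:
    # An env var with the prefixed, upper-cased field name overrides the default.
    return os.environ.get(prefix + name.upper(), default)

os.environ["STROKE_DEMO_MAX_CONCURRENT_JOBS"] = "2"
print(load_setting("max_concurrent_jobs", "1"))  # 2
print(load_setting("log_level", "INFO"))  # INFO
```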
docs/guides/deployment.md CHANGED
@@ -1,39 +1,125 @@
  # Deployment

- The demo is designed to be deployed on Hugging Face Spaces.

- ## Hugging Face Spaces

- 1. **Create a Space**: Go to [huggingface.co/spaces](https://huggingface.co/spaces) and create a new Space.
-    * **SDK**: Docker (Recommended for custom dependencies) or Gradio
-    * **Hardware**: GPU is recommended for DeepISLES inference.

- 2. **Configure Dockerfile (if using Docker SDK)**:
-    Ensure the Dockerfile installs Python 3.11, uv, and pulls the DeepISLES image (or handles it appropriately, though Spaces might restrict running Docker-in-Docker).

- *Note*: Since DeepISLES runs as a Docker container, running it inside a HF Space (which is a container) requires Docker-in-Docker (DinD) or a compatible runtime. If DinD is not supported, you might need to adapt the inference to run directly in the python environment if possible (DeepISLES source code integration instead of Docker wrapper), but this project wraps the Docker image.

- **Standard Deployment (Gradio SDK)**:
- The project includes `app.py` at the root for standard Gradio deployment. However, checking `requirements.txt` or `pyproject.toml` is needed.

- For standard Gradio Spaces, you need to ensure `docker` command is available if you stick to the current architecture. Most HF Spaces do not support running `docker run`.

- **Alternative**: Use a VM (AWS/GCP/Azure) with Docker installed.

- ## Local Deployment

- 1. **Build/Pull**:
-    ```bash
-    docker pull isleschallenge/deepisles
-    ```

- 2. **Run App**:
-    ```bash
-    uv run python -m stroke_deepisles_demo.ui.app
-    ```

- ## Environment Variables

- Configure the deployment using environment variables (Secrets in HF Spaces):
- - `STROKE_DEMO_HF_TOKEN`: Read-only token for accessing datasets if private.
- - `STROKE_DEMO_DEEPISLES_USE_GPU`: Set to `false` if deploying on CPU-only instance.

  # Deployment

+ The demo consists of two components deployed to Hugging Face Spaces:
+ 1. **Backend (FastAPI)**: Docker SDK Space running DeepISLES inference
+ 2. **Frontend (React SPA)**: Static SDK Space hosting the viewer UI

+ ## Architecture

+ ```
+ ┌─────────────────────┐      ┌─────────────────────┐
+ │  Frontend HF Space  │      │  Backend HF Space   │
+ │  (Static SDK)       │─────▶│  (Docker SDK + GPU) │
+ │  React + NiiVue     │      │  FastAPI + DeepISLES│
+ └─────────────────────┘      └─────────────────────┘
+ ```

+ ## Backend: HuggingFace Spaces (Docker SDK)

+ ### Prerequisites
+ - HuggingFace account
+ - Space with GPU allocation (T4-small minimum for inference)

+ ### Steps

+ 1. **Create a Docker SDK Space**:
+    - Go to [huggingface.co/spaces](https://huggingface.co/spaces)
+    - SDK: **Docker** (required for custom dependencies)
+    - Hardware: **T4-small** or better (DeepISLES requires GPU)

+ 2. **Push your code**:
+    ```bash
+    git remote add hf https://huggingface.co/spaces/YOUR_ORG/YOUR_SPACE
+    git push hf main
+    ```

+ 3. **Configure Secrets** (Settings → Secrets):
+    - `HF_TOKEN`: Read-only token for gated datasets (optional)
+    - `STROKE_DEMO_FRONTEND_ORIGINS`: JSON array of allowed frontend origins

+ ### How It Works

+ The Dockerfile:
+ - Uses `isleschallenge/deepisles` as base (includes nnU-Net, SEALS, weights)
+ - Installs the demo package in `/home/user/demo` (avoids overwriting DeepISLES at `/app`)
+ - Runs FastAPI on port 7860 (HF Spaces default)
+ - Uses **direct invocation** (subprocess to conda env) instead of Docker-in-Docker

+ ### Environment Variables

+ | Variable | Default | Description |
+ |----------|---------|-------------|
+ | `HF_SPACES` | `1` | Set by Dockerfile; triggers direct invocation mode |
+ | `DEEPISLES_DIRECT_INVOCATION` | `1` | Set by Dockerfile; forces subprocess mode |
+ | `STROKE_DEMO_FRONTEND_ORIGINS` | `[]` | JSON array of CORS-allowed origins |
+ | `HF_TOKEN` | (none) | For gated datasets |
+
+ Note: HuggingFace sets `SPACE_ID` automatically, but our detection uses `HF_SPACES`, which we set explicitly in the Dockerfile for clarity.
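The mode selection described above reduces to an environment check; a sketch (the helper name is illustrative, not the project's actual function):

```python
import os

def use_direct_invocation() -> bool:
    # Either flag, set in the Dockerfile, forces subprocess mode instead of
    # Docker-in-Docker (which HF Spaces does not support).
    return (
        os.environ.get("HF_SPACES") == "1"
        or os.environ.get("DEEPISLES_DIRECT_INVOCATION") == "1"
    )

os.environ["HF_SPACES"] = "1"
print(use_direct_invocation())  # True
```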
+
+ ## Frontend: HuggingFace Spaces (Static SDK)

+ ### Steps

+ 1. **Create a Static SDK Space**:
+    - SDK: **Static**
+    - No hardware needed (static files only)

+ 2. **Build and deploy**:
+    ```bash
+    cd frontend
+    npm install
+    VITE_API_URL=https://your-backend.hf.space npm run build
+    # Copy dist/* to your Static Space
+    ```

+ 3. **Configure API URL**:
+    Set `VITE_API_URL` at build time to point to your backend Space.

+ ## Local Development

+ ### Backend Only
+ ```bash
+ docker pull isleschallenge/deepisles
+ uv sync --extra api
+ uv run uvicorn stroke_deepisles_demo.api.main:app --reload --port 7860
+ ```

+ ### Frontend Only
+ ```bash
+ cd frontend
+ npm install
+ VITE_API_URL=http://localhost:7860 npm run dev
+ ```

+ ### Full Stack
+ ```bash
+ # Terminal 1: Backend
+ uv run uvicorn stroke_deepisles_demo.api.main:app --reload --port 7860
+
+ # Terminal 2: Frontend
+ cd frontend && npm run dev
+ ```

+ ## Legacy: Gradio UI

+ The project includes a legacy Gradio interface at `app.py`:
+ ```bash
+ uv run python -m stroke_deepisles_demo.ui.app
+ ```

+ This is provided for backwards compatibility but is not the primary deployment target.
+ The Gradio UI connects directly to DeepISLES without the job queue.

+ ## Troubleshooting

+ ### "GPU not available" warning
+ - Ensure your Space has GPU hardware allocated (T4-small minimum)
+ - Check Space settings → Hardware

+ ### CORS errors in browser
+ - Set `STROKE_DEMO_FRONTEND_ORIGINS` to include your frontend URL
+ - Format: `'["https://your-frontend.hf.space"]'`

+ ### Inference timeouts
+ - Default timeout is 30 minutes (`STROKE_DEMO_DEEPISLES_TIMEOUT_SECONDS`)
+ - T4-small handles most cases; larger volumes may need more GPU memory
docs/guides/quickstart.md CHANGED
@@ -6,6 +6,7 @@ Get started with stroke-deepisles-demo in 5 minutes.

  - Python 3.11+
  - Docker (for DeepISLES inference)
  - ~10GB disk space (for Docker image and datasets)

  ## Installation
@@ -15,8 +16,8 @@ Get started with stroke-deepisles-demo in 5 minutes.
  git clone https://github.com/The-Obstacle-Is-The-Way/stroke-deepisles-demo.git
  cd stroke-deepisles-demo

- # Install
- uv sync
  ```

  ## Pull DeepISLES Docker Image
@@ -27,11 +28,19 @@ docker pull isleschallenge/deepisles

  ## Run Locally

- ### Option 1: Gradio UI

  ```bash
- uv run python -m stroke_deepisles_demo.ui.app
- # Open http://localhost:7860
  ```

  ### Option 2: CLI
@@ -54,6 +63,20 @@ print(f"Dice score: {result.dice_score:.3f}")
  print(f"Prediction: {result.prediction_mask}")
  ```

  ## Configuration

  Set environment variables or create a `.env` file:
 

  - Python 3.11+
  - Docker (for DeepISLES inference)
+ - Node.js 18+ (for frontend development)
  - ~10GB disk space (for Docker image and datasets)

  ## Installation

  git clone https://github.com/The-Obstacle-Is-The-Way/stroke-deepisles-demo.git
  cd stroke-deepisles-demo

+ # Install Python dependencies (includes FastAPI for backend)
+ uv sync --extra api
  ```

  ## Pull DeepISLES Docker Image

  ## Run Locally

+ ### Option 1: React SPA + FastAPI (Recommended)
+
+ Full-featured web interface with 3D NIfTI visualization via NiiVue.

  ```bash
+ # Terminal 1: Start FastAPI backend
+ uv run uvicorn stroke_deepisles_demo.api.main:app --reload --port 7860
+
+ # Terminal 2: Start React frontend
+ cd frontend
+ npm install
+ npm run dev
+ # Open http://localhost:5173
  ```

  ### Option 2: CLI

  print(f"Prediction: {result.prediction_mask}")
  ```

+ ### Option 4: Legacy Gradio UI
+
+ The original Gradio interface is still available for backwards compatibility:
+
+ ```bash
+ # Install Gradio extra (not included by default)
+ uv sync --extra gradio
+
+ uv run python -m stroke_deepisles_demo.ui.app
+ # Open http://localhost:7860
+ ```
+
+ Note: The Gradio UI does not support the async job queue or progress tracking.

  ## Configuration

  Set environment variables or create a `.env` file:
frontend/src/components/__tests__/NiiVueViewer.test.tsx CHANGED
@@ -23,6 +23,7 @@ vi.mock("@niivue/niivue", () => ({
  }));

  describe("NiiVueViewer", () => {
    const defaultProps = {
      backgroundUrl: "http://localhost:7860/files/dwi.nii.gz",
    };

  }));

  describe("NiiVueViewer", () => {
+   // Note: URLs are simplified for component testing (actual API uses /files/{jobId}/{caseId}/)
    const defaultProps = {
      backgroundUrl: "http://localhost:7860/files/dwi.nii.gz",
    };
frontend/src/test/fixtures.ts CHANGED
@@ -11,8 +11,9 @@ export const mockCasesResponse: CasesResponse = {
  };

  export const mockSegmentationResult: SegmentationResult = {
-   dwiUrl: "http://localhost:7860/files/dwi.nii.gz",
-   predictionUrl: "http://localhost:7860/files/prediction.nii.gz",
    metrics: {
      caseId: "sub-stroke0001",
      diceScore: 0.847,

  };

  export const mockSegmentationResult: SegmentationResult = {
+   // URLs match actual API contract: /files/{jobId}/{caseId}/(unknown)
+   dwiUrl: "http://localhost:7860/files/test-job-123/sub-stroke0001/dwi.nii.gz",
+   predictionUrl: "http://localhost:7860/files/test-job-123/sub-stroke0001/prediction.nii.gz",
    metrics: {
      caseId: "sub-stroke0001",
      diceScore: 0.847,
src/stroke_deepisles_demo/__init__.py CHANGED
@@ -1,4 +1,4 @@
- """stroke-deepisles-demo: HF datasets + DeepISLES + Gradio visualization."""

  __version__ = "0.1.0"

+ """stroke-deepisles-demo: DeepISLES stroke segmentation with React SPA + FastAPI backend."""

  __version__ = "0.1.0"

src/stroke_deepisles_demo/api/main.py CHANGED
@@ -25,7 +25,7 @@ from stroke_deepisles_demo.api.files import files_router
  from stroke_deepisles_demo.api.job_store import init_job_store
  from stroke_deepisles_demo.api.routes import router
  from stroke_deepisles_demo.core.config import get_settings
- from stroke_deepisles_demo.core.logging import get_logger

  logger = get_logger(__name__)

@@ -42,12 +42,16 @@ async def lifespan(_app: FastAPI) -> AsyncIterator[None]:
      - Stop cleanup scheduler
      """
      # Startup
-     logger.info("Starting stroke segmentation API...")
      settings = get_settings()

      # Check for GPU availability (DeepISLES requires GPU)
      try:
-         import torch  # type: ignore[import-not-found]

          if not torch.cuda.is_available():
              logger.warning(
@@ -105,7 +109,7 @@ app.add_middleware(
      CORSMiddleware,
      allow_origins=get_settings().frontend_origins,
      allow_credentials=False,  # Not needed - no cookies/auth
-     allow_methods=["GET", "POST", "HEAD"],  # HEAD for preflight checks
      allow_headers=["Content-Type", "Range"],  # Range needed for partial content requests
      expose_headers=["Content-Range", "Content-Length", "Accept-Ranges"],  # NiiVue needs these
  )

  from stroke_deepisles_demo.api.job_store import init_job_store
  from stroke_deepisles_demo.api.routes import router
  from stroke_deepisles_demo.core.config import get_settings
+ from stroke_deepisles_demo.core.logging import get_logger, setup_logging

  logger = get_logger(__name__)

      - Stop cleanup scheduler
      """
      # Startup
      settings = get_settings()

+     # Apply log settings from environment (STROKE_DEMO_LOG_LEVEL, STROKE_DEMO_LOG_FORMAT)
+     setup_logging(settings.log_level, format_style=settings.log_format)
+
+     logger.info("Starting stroke segmentation API...")
+
      # Check for GPU availability (DeepISLES requires GPU)
      try:
+         import torch  # type: ignore[import-not-found,unused-ignore]

          if not torch.cuda.is_available():
              logger.warning(
      CORSMiddleware,
      allow_origins=get_settings().frontend_origins,
      allow_credentials=False,  # Not needed - no cookies/auth
+     allow_methods=["GET", "POST", "OPTIONS"],  # OPTIONS for CORS preflight
      allow_headers=["Content-Type", "Range"],  # Range needed for partial content requests
      expose_headers=["Content-Range", "Content-Length", "Accept-Ranges"],  # NiiVue needs these
  )
src/stroke_deepisles_demo/api/routes.py CHANGED
@@ -92,7 +92,15 @@ def create_segment_job(
      store = get_job_store()
      settings = get_settings()

-     # Validate case_id exists before creating job
      valid_cases = list_case_ids()
      if body.case_id not in valid_cases:
          raise HTTPException(
@@ -105,6 +113,7 @@ def create_segment_job(
      backend_url = get_backend_base_url(request)

      # Atomic concurrency limit + job creation (prevents TOCTOU race)
      job = store.create_job_if_under_limit(
          job_id, body.case_id, body.fast_mode, settings.max_concurrent_jobs
      )

      store = get_job_store()
      settings = get_settings()

+     # Pre-check concurrency limit before expensive validation
+     # This is a cheap check; the actual limit is enforced atomically below
+     if store.get_active_job_count() >= settings.max_concurrent_jobs:
+         raise HTTPException(
+             status_code=503,
+             detail="Server busy: too many active jobs. Please try again later.",
+         )
+
+     # Validate case_id exists (only after passing concurrency pre-check)
      valid_cases = list_case_ids()
      if body.case_id not in valid_cases:
          raise HTTPException(

      backend_url = get_backend_base_url(request)

      # Atomic concurrency limit + job creation (prevents TOCTOU race)
+     # The pre-check above is just an optimization; this is the authoritative check
      job = store.create_job_if_under_limit(
          job_id, body.case_id, body.fast_mode, settings.max_concurrent_jobs
      )
src/stroke_deepisles_demo/cli.py CHANGED
@@ -20,7 +20,7 @@ def main(argv: list[str] | None = None) -> int:

      # List command
      list_parser = subparsers.add_parser("list", help="List available cases")
-     list_parser.add_argument("--dataset", default=None, help="HF dataset ID (not used yet)")

      # Run command
      run_parser = subparsers.add_parser("run", help="Run segmentation")
@@ -44,10 +44,10 @@ def main(argv: list[str] | None = None) -> int:
      return 0


- def cmd_list(args: argparse.Namespace) -> int:  # noqa: ARG001
      """Handle 'list' command."""
      try:
-         case_ids = list_case_ids()
          print(f"Found {len(case_ids)} cases:")
          for i, cid in enumerate(case_ids):
              print(f"[{i}] {cid}")

      # List command
      list_parser = subparsers.add_parser("list", help="List available cases")
+     list_parser.add_argument("--dataset", default=None, help="HF dataset ID or local path")

      # Run command
      run_parser = subparsers.add_parser("run", help="Run segmentation")

      return 0


+ def cmd_list(args: argparse.Namespace) -> int:
      """Handle 'list' command."""
      try:
+         case_ids = list_case_ids(source=args.dataset)
          print(f"Found {len(case_ids)} cases:")
          for i, cid in enumerate(case_ids):
              print(f"[{i}] {cid}")
src/stroke_deepisles_demo/core/config.py CHANGED
@@ -76,8 +76,8 @@ class Settings(BaseSettings):
      log_format: Literal["simple", "detailed", "json"] = "simple"

      # HuggingFace
      hf_dataset_id: str = "hugging-science/isles24-stroke"
-     hf_cache_dir: Path | None = None
      hf_token: str | None = Field(default=None, repr=False)  # Hidden from logs

      # DeepISLES
@@ -85,17 +85,16 @@ class Settings(BaseSettings):
      deepisles_fast_mode: bool = True  # SEALS-only (ISLES'22 winner, no FLAIR needed)
      deepisles_timeout_seconds: int = 1800  # 30 minutes
      deepisles_use_gpu: bool = True
-     # Path to DeepISLES repo (for direct invocation mode)
-     deepisles_repo_path: Path | None = None

      # Paths
-     temp_dir: Path | None = None
      # Results directory - MUST be /tmp for HF Spaces (only /tmp is writable)
      results_dir: Path = Path("/tmp/stroke-results")

      # API Settings
-     # Concurrency control
-     max_concurrent_jobs: int = 10

      # CORS - frontend origins allowed to call this API
      frontend_origins: list[str] = Field(default=["http://localhost:5173", "http://localhost:3000"])

      log_format: Literal["simple", "detailed", "json"] = "simple"

      # HuggingFace
+     # Note: To control HF cache location, use HF_HOME env var (set in Dockerfile)
      hf_dataset_id: str = "hugging-science/isles24-stroke"
      hf_token: str | None = Field(default=None, repr=False)  # Hidden from logs

      # DeepISLES
      deepisles_fast_mode: bool = True  # SEALS-only (ISLES'22 winner, no FLAIR needed)
      deepisles_timeout_seconds: int = 1800  # 30 minutes
      deepisles_use_gpu: bool = True

      # Paths
+     # Note: To control temp location, use TMPDIR env var (Python tempfile respects it)
      # Results directory - MUST be /tmp for HF Spaces (only /tmp is writable)
      results_dir: Path = Path("/tmp/stroke-results")

      # API Settings
+     # Concurrency control - default to 1 for single-GPU safety (T4 has 16GB VRAM)
+     # Increase via STROKE_DEMO_MAX_CONCURRENT_JOBS if you have multiple GPUs
+     max_concurrent_jobs: int = 1

      # CORS - frontend origins allowed to call this API
      frontend_origins: list[str] = Field(default=["http://localhost:5173", "http://localhost:3000"])
src/stroke_deepisles_demo/data/__init__.py CHANGED
@@ -31,11 +31,14 @@ def get_case(case_id: str | int) -> CaseFiles:
      return dataset.get_case(case_id)


- def list_case_ids() -> list[str]:
      """List all available case IDs.

      Uses context manager to ensure HuggingFace temp files are cleaned up.
      This prevents unbounded disk usage from accumulating temp NIfTI files.
      """
-     with load_isles_dataset() as dataset:
          return dataset.list_case_ids()

      return dataset.get_case(case_id)


+ def list_case_ids(source: str | None = None) -> list[str]:
      """List all available case IDs.

+     Args:
+         source: HuggingFace dataset ID or local path. If None, uses default from settings.
+
      Uses context manager to ensure HuggingFace temp files are cleaned up.
      This prevents unbounded disk usage from accumulating temp NIfTI files.
      """
+     with load_isles_dataset(source=source) as dataset:
          return dataset.list_case_ids()
src/stroke_deepisles_demo/inference/deepisles.py CHANGED
@@ -12,6 +12,7 @@ See:

  from __future__ import annotations

  import time
  from dataclasses import dataclass
  from typing import TYPE_CHECKING
@@ -205,6 +206,24 @@ def _run_via_docker(
      # Find the prediction mask
      prediction_path = find_prediction_mask(output_dir)

      elapsed = time.time() - start_time

      return DeepISLESResult(
@@ -220,6 +239,7 @@ def _run_via_direct_invocation(
      *,
      flair_path: Path | None,
      fast: bool,
  ) -> DeepISLESResult:
      """
      Run DeepISLES via direct Python invocation.
@@ -234,9 +254,10 @@ def _run_via_direct_invocation(
      adc_path = input_dir / "adc.nii.gz"

      logger.info(
-         "Running DeepISLES via direct invocation: input=%s, fast=%s",
          input_dir,
          fast,
      )

      result = run_deepisles_direct(
@@ -245,6 +266,7 @@ def _run_via_direct_invocation(
          output_dir=output_dir,
          flair_path=flair_path,
          fast=fast,
      )

      return DeepISLESResult(
@@ -275,7 +297,8 @@ def run_deepisles_on_folder(
          output_dir: Where to write results (default: input_dir/results)
          fast: If True, use single-model mode (faster, slightly less accurate)
          gpu: If True, use GPU acceleration (only affects Docker mode)
-         timeout: Maximum seconds to wait for inference (only affects Docker mode)

      Returns:
          DeepISLESResult with path to prediction mask
@@ -312,6 +335,7 @@ def run_deepisles_on_folder(
          output_dir=output_dir,
          flair_path=flair_path,
          fast=fast,
      )
  else:
      logger.info("Using Docker-based DeepISLES invocation")

  from __future__ import annotations

+ import shutil
  import time
  from dataclasses import dataclass
  from typing import TYPE_CHECKING

      # Find the prediction mask
      prediction_path = find_prediction_mask(output_dir)

+     # BUG-FIX: Ensure prediction is at expected URL location
+     # DeepISLES may write to a results/ subdirectory, but the API URL contract
+     # expects files directly in output_dir. Copy to expected location if needed.
+     if prediction_path.parent != output_dir and prediction_path.exists():
+         expected_path = output_dir / prediction_path.name
+         logger.debug(
+             "Copying prediction from %s to %s (URL path fix)",
+             prediction_path,
+             expected_path,
+         )
+         try:
+             shutil.copy2(prediction_path, expected_path)
+             prediction_path = expected_path
+         except OSError as e:
+             raise DeepISLESError(
+                 f"Failed to copy prediction from {prediction_path} to {expected_path}: {e}"
+             ) from e
+
      elapsed = time.time() - start_time

      return DeepISLESResult(

      *,
      flair_path: Path | None,
      fast: bool,
+     timeout: float,
  ) -> DeepISLESResult:
      """
      Run DeepISLES via direct Python invocation.

      adc_path = input_dir / "adc.nii.gz"

      logger.info(
+         "Running DeepISLES via direct invocation: input=%s, fast=%s, timeout=%s",
          input_dir,
          fast,
+         timeout,
      )

      result = run_deepisles_direct(
          output_dir=output_dir,
          flair_path=flair_path,
          fast=fast,
+         timeout=timeout,
      )

      return DeepISLESResult(

          output_dir: Where to write results (default: input_dir/results)
          fast: If True, use single-model mode (faster, slightly less accurate)
          gpu: If True, use GPU acceleration (only affects Docker mode)
+         timeout: Maximum seconds to wait for inference (default: 1800, i.e. 30 min).
+             Docker mode accepts None for no timeout; direct mode converts None to 1800.

      Returns:
          DeepISLESResult with path to prediction mask

          output_dir=output_dir,
          flair_path=flair_path,
          fast=fast,
+         timeout=timeout if timeout is not None else 1800,
      )
  else:
      logger.info("Using Docker-based DeepISLES invocation")
src/stroke_deepisles_demo/inference/direct.py CHANGED
@@ -3,7 +3,7 @@
  This module provides subprocess-based invocation of DeepISLES when running
  on HF Spaces. We use subprocess because:
  - DeepISLES runs in a conda env with Python 3.8
- - Our Gradio app requires Python 3.10+ for modern dependencies
  - The two environments are incompatible, so we bridge via subprocess

  Usage:
@@ -155,7 +155,8 @@ def run_deepisles_direct(
      # Create output directory
      output_dir.mkdir(parents=True, exist_ok=True)

-     logger.info(
          "Running DeepISLES via subprocess: dwi=%s, adc=%s, flair=%s, fast=%s",
          dwi_path,
          adc_path,
@@ -186,7 +187,7 @@ def run_deepisles_direct(
      if fast:
          cmd.append("--fast")

-     logger.info("Subprocess command: %s", " ".join(cmd))

      try:
          result = subprocess.run(
@@ -197,11 +198,13 @@ def run_deepisles_direct(
              cwd="/app",  # Run from DeepISLES directory
          )

-         # Log output
          if result.stdout:
-             logger.info("DeepISLES stdout:\n%s", result.stdout)
          if result.stderr:
-             logger.warning("DeepISLES stderr:\n%s", result.stderr)

          # Check for failure
          if result.returncode != 0:

  This module provides subprocess-based invocation of DeepISLES when running
  on HF Spaces. We use subprocess because:
  - DeepISLES runs in a conda env with Python 3.8
+ - Our FastAPI backend requires Python 3.11+ for modern dependencies
  - The two environments are incompatible, so we bridge via subprocess

  Usage:

      # Create output directory
      output_dir.mkdir(parents=True, exist_ok=True)

+     # Log paths at DEBUG to avoid exposing potentially sensitive path info at INFO
+     logger.debug(
          "Running DeepISLES via subprocess: dwi=%s, adc=%s, flair=%s, fast=%s",
          dwi_path,
          adc_path,

      if fast:
          cmd.append("--fast")

+     logger.debug("Subprocess command: %s", " ".join(cmd))

      try:
          result = subprocess.run(

              cwd="/app",  # Run from DeepISLES directory
          )

+         # Log verbose output at DEBUG to avoid log explosion
+         # DeepISLES produces extensive stdout/stderr that would overwhelm INFO logs
          if result.stdout:
+             logger.debug("DeepISLES stdout:\n%s", result.stdout)
          if result.stderr:
+             # Log stderr at DEBUG unless it's a failure (handled below)
+             logger.debug("DeepISLES stderr:\n%s", result.stderr)

          # Check for failure
          if result.returncode != 0: