Commit 53c2d1b · 0 parent(s)
Recreate lite branch for HF Space (no data/ tracked)
- .gitignore +33 -0
- .streamlit/config.toml +3 -0
- README.md +38 -0
- app.py +889 -0
- eval/.gitkeep +0 -0
- requirements.txt +14 -0
- test_mosaic.csv +97 -0
.gitignore ADDED
@@ -0,0 +1,33 @@
+# Python
+__pycache__/
+*.pyc
+
+# Streamlit local secrets (never commit)
+.streamlit/secrets.toml
+
+# Editor junk
+.DS_Store
+MOSAICapp.code-workspace
+
+# Caches
+.cache/
+**/__pycache__/
+
+# Big/binary artifacts (belt & suspenders)
+*.npy
+*.npz
+*.pt
+*.bin
+*.gguf
+
+# Ignore *everything* under data/, keep only .gitkeep files
+data/
+!data/.gitkeep
+!data/**/.gitkeep
+
+# (Optional) Do the same for eval/ if you don't want outputs tracked
+eval/
+!eval/.gitkeep
+!eval/**/.gitkeep
+data/*/preprocessed/cache/
+data/
.streamlit/config.toml ADDED
@@ -0,0 +1,3 @@
+[server]
+headless = true
+maxUploadSize = 500 # MB; adjust if people upload large CSVs
README.md ADDED
@@ -0,0 +1,38 @@
+---
+title: MOSAICapp
+colorFrom: indigo
+colorTo: blue
+sdk: docker
+pinned: false
+---
+
+# MOSAIC Topic Dashboard
+
+A Streamlit app for BERTopic-based topic modelling with sentence-transformers embeddings.
+**No data bundled** — upload CSV with one text column (any of: `reflection_answer_english`, `reflection_answer`, `text`, `report`).
+
+## Lite Version (Free Hardware)
+
+This Hugging Face Space runs the **`lite` version** of the app.
+
+To make it run on free "CPU basic" hardware, the **LLM-based topic labeling feature has been disabled**. The app will use BERTopic's default keyword-based labels instead.
+
+For the full, original version with LLM features (which requires paid GPU hardware), please see the `main` branch of the [original GitHub repository](https://github.com/romybeaute/MOSAICapp).
+
+## Run Locally (Full Version)
+
+To run the full version on your local machine:
+
+```bash
+# Clone the main branch
+git clone https://github.com/romybeaute/MOSAICapp.git
+cd MOSAICapp
+
+# Install requirements
+pip install -r requirements.txt
+
+# Download NLTK data
+python -c "import nltk; nltk.download('punkt')"
+
+# Run the app
+streamlit run app.py
app.py ADDED
@@ -0,0 +1,889 @@
+"""
+File: app.py
+Description: Streamlit app for advanced topic modeling on Innerspeech dataset
+             with BERTopic, UMAP, HDBSCAN. (LLM features disabled for lite deployment)
+Last Modified: 06/11/2025
+@author: r.beaut
+"""
+
+# =====================================================================
+# Imports
+# =====================================================================
+
+from pathlib import Path
+import sys
+# from llama_cpp import Llama # <-- REMOVED
+import streamlit as st
+import pandas as pd
+import numpy as np
+import re
+import os
+import nltk
+import json
+
+# from mosaic.path_utils import CFG, raw_path, proc_path, eval_path, project_root
+
+
+try:
+    from mosaic.path_utils import CFG, raw_path, proc_path, eval_path, project_root # type: ignore
+except Exception:
+    # Minimal stand-in so the app works anywhere (Streamlit Cloud, local without MOSAIC, etc.)
+    def _env(key: str, default: str) -> Path:
+        val = os.getenv(key, default)
+        return Path(val).expanduser().resolve()
+
+    # Defaults: app-local data/ eval/ that are safe on Cloud
+    _DATA_ROOT = _env("MOSAIC_DATA", str(Path(__file__).parent / "data"))
+    _BOX_ROOT = _env("MOSAIC_BOX", str(Path(__file__).parent / "data" / "raw"))
+    _EVAL_ROOT = _env("MOSAIC_EVAL", str(Path(__file__).parent / "eval"))
+
+    CFG = {
+        "data_root": str(_DATA_ROOT),
+        "box_root": str(_BOX_ROOT),
+        "eval_root": str(_EVAL_ROOT),
+    }
+
+    def project_root() -> Path:
+        return Path(__file__).resolve().parent
+
+    def raw_path(*parts: str) -> Path:
+        return _BOX_ROOT.joinpath(*parts)
+
+    def proc_path(*parts: str) -> Path:
+        return _DATA_ROOT.joinpath(*parts)
+
+    def eval_path(*parts: str) -> Path:
+        return _EVAL_ROOT.joinpath(*parts)
+
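Reviewer note: the fallback branch resolves each root folder from an environment variable and otherwise uses a folder next to `app.py`. The sketch below mirrors that lookup in isolation. The helper name `env_path` and the directories under `/tmp` are assumptions made for illustration and do not appear in the commit.

```python
import os
from pathlib import Path


def env_path(key: str, default: str) -> Path:
    """Return the directory named by an environment variable, else the default."""
    return Path(os.getenv(key, default)).expanduser().resolve()


# Hypothetical application folder, used only for this illustration.
app_dir = Path("/tmp/mosaic_demo")

# Case 1: the variable is unset, so the app-local default is used.
os.environ.pop("MOSAIC_DATA", None)
default_root = env_path("MOSAIC_DATA", str(app_dir / "data"))

# Case 2: the variable is set, so it overrides the default.
os.environ["MOSAIC_DATA"] = "/tmp/shared_mosaic"
override_root = env_path("MOSAIC_DATA", str(app_dir / "data"))

print(default_root)
print(override_root)
```

Because the lookup runs once when the module is imported, a variable set later in the script would not change roots that were already resolved.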
+# BERTopic stack
+from bertopic import BERTopic
+# from bertopic.representation import LlamaCPP # <-- REMOVED
+# from llama_cpp import Llama # <-- REMOVED
+from sentence_transformers import SentenceTransformer
+
+# Clustering/dimensionality reduction
+from sklearn.feature_extraction.text import CountVectorizer
+from umap import UMAP
+from hdbscan import HDBSCAN
+
+# Visualisation
+import datamapplot
+import matplotlib.pyplot as plt
+from huggingface_hub import hf_hub_download
+
+
+
+# =====================================================================
+# 0. Constants & Helper Functions
+# =====================================================================
+
+
+def _slugify(s: str) -> str:
+    s = s.strip()
+    s = re.sub(r"[^A-Za-z0-9._-]+", "_", s)
+    return s or "DATASET"
+
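Reviewer note: `_slugify` turns the sidebar text into a folder name, and the caller upper-cases the result. The sketch below copies the function body under the name `slugify` and runs it on a few made-up dataset names to show the edge cases (whitespace-only input, non-ASCII letters, path separators).

```python
import re


def slugify(s: str) -> str:
    """Copy of the commit's _slugify logic, renamed for this illustration."""
    s = s.strip()
    # Every run of characters outside A-Z, a-z, 0-9, '.', '_' and '-' becomes one underscore.
    s = re.sub(r"[^A-Za-z0-9._-]+", "_", s)
    return s or "DATASET"


# Made-up dataset names, not taken from the commit.
examples = ["Inner Speech 2025", "   ", "étude/pilot", "v1.2-final"]
folders = {name: slugify(name).upper() for name in examples}

for name, folder in folders.items():
    print(repr(name), "->", folder)
```

Accented letters are replaced rather than transliterated, so two different names can map to the same folder.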
+
+ACCEPTABLE_TEXT_COLUMNS = [
+    "reflection_answer_english",
+    "reflection_answer",
+    "text",
+    "report",
+]
+
+def _pick_text_column(df: pd.DataFrame) -> str | None:
+    """Return the first matching text column."""
+    for col in ACCEPTABLE_TEXT_COLUMNS:
+        if col in df.columns:
+            return col
+    return None
+
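Reviewer note: the column picker walks the preference list, not the CSV header, so the winner depends on the order of `ACCEPTABLE_TEXT_COLUMNS`. The sketch below shows this with plain lists of column names standing in for `df.columns`. The function name `pick_text_column` and the sample headers are assumptions for illustration.

```python
from typing import Optional, Sequence

ACCEPTABLE_TEXT_COLUMNS = [
    "reflection_answer_english",
    "reflection_answer",
    "text",
    "report",
]


def pick_text_column(columns: Sequence[str]) -> Optional[str]:
    """Return the highest-priority accepted name present in `columns`."""
    for col in ACCEPTABLE_TEXT_COLUMNS:
        if col in columns:
            return col
    return None


# "text" comes first in this header, but "reflection_answer" ranks higher.
by_priority = pick_text_column(["id", "text", "reflection_answer"])

# Matching is exact and case-sensitive, so "Text" is not recognised.
no_match = pick_text_column(["id", "Text"])

print(by_priority, no_match)
```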
+
+def _set_from_env_or_secrets(key: str):
+    """Allow hosting: value can come from environment or from Streamlit secrets."""
+    if os.getenv(key):
+        return
+    try:
+        val = st.secrets.get(key, None)
+    except Exception:
+        val = None
+    if val:
+        os.environ[key] = str(val)
+
+# Enable both MOSAIC_DATA and MOSAIC_BOX automatically
+for _k in ("MOSAIC_DATA", "MOSAIC_BOX"):
+    _set_from_env_or_secrets(_k)
+
+
+
+@st.cache_data
+def count_clean_reports(csv_path: str) -> int:
+    df = pd.read_csv(csv_path)
+    col = _pick_text_column(df)
+    if col is None:
+        return 0
+    if col != "reflection_answer_english":
+        df = df.rename(columns={col: "reflection_answer_english"})
+    df.dropna(subset=["reflection_answer_english"], inplace=True)
+    df["reflection_answer_english"] = df["reflection_answer_english"].astype(str)
+    df = df[df["reflection_answer_english"].str.strip() != ""]
+    return len(df)
+
+
+def ensure_sentence_tokenizer():
+    """
+    Make sure NLTK sentence tokenizer data is available.
+
+    Newer NLTK (3.9+) uses 'punkt_tab' for sent_tokenize(),
+    older versions use 'punkt'
+    """
+    for resource in ("punkt_tab", "punkt"):
+        try:
+            nltk.data.find(f"tokenizers/{resource}")
+            return
+        except LookupError:
+            # Try to download it
+            try:
+                nltk.download(resource)
+                return
+            except Exception as e:
+                print(f"Failed to download NLTK resource '{resource}': {e}")
+
+    # If we reach here, we didn't manage to get any tokenizer
+    raise LookupError("Could not load NLTK punkt or punkt_tab tokenizer data.")
+
+
+
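Reviewer note: `ensure_sentence_tokenizer` returns right after calling the download, whatever the outcome. To my understanding `nltk.download` reports failure through a `False` return value rather than an exception, so the second resource might never be tried; that reading is an assumption worth checking against the installed NLTK version. The sketch below shows one possible "try each resource, accept only a confirmed success" loop. It makes no NLTK calls: `is_available` and `download` are hypothetical stand-ins injected as plain functions.

```python
from typing import Callable, Iterable


def ensure_resource(
    names: Iterable[str],
    is_available: Callable[[str], bool],
    download: Callable[[str], bool],
) -> str:
    """Return the first resource that is present or downloads successfully."""
    for name in names:
        # Short-circuit: only download when the resource is not already present,
        # and only accept the download when it reports success.
        if is_available(name) or download(name):
            return name
    raise LookupError("Could not load any of the requested resources.")


# Simulated environment: nothing installed, and only 'punkt' can be fetched.
installed = set()


def fake_available(name: str) -> bool:
    return name in installed


def fake_download(name: str) -> bool:
    if name == "punkt":
        installed.add(name)
        return True
    return False


chosen = ensure_resource(("punkt_tab", "punkt"), fake_available, fake_download)
print(chosen)
```

Passing the two checks as callables keeps the fallback order testable without network access.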
+# =====================================================================
+# 1. Streamlit app setup
+# =====================================================================
+
+st.set_page_config(page_title="MOSAIC Dashboard", layout="wide")
+st.title("Mapping of Subjective Accounts into Interpreted Clusters (MOSAIC): Topic Modelling Dashboard for Phenomenological Reports")
+
+# add if use, please cite the following paper:
+st.markdown(
+    """
+    _If you use this tool in your research, please cite the following paper:_\n
+    **Beauté, R., et al. (2025).**
+    **Mapping of Subjective Accounts into Interpreted Clusters (MOSAIC): Topic Modelling and LLM applied to Stroboscopic Phenomenology**
+    https://arxiv.org/abs/2502.18318
+    """
+)
+
+# ROOT = project_root()
+# sys.path.append(str(ROOT / "MULTILINGUAL"))
+
+
+
+# =====================================================================
+# 2. Dataset paths (using MOSAIC structure)
+# =====================================================================
+
+# DATASET = "INNERSPEECH"
+
+# --- Choose dataset/project name (drives folder names) ---
+ds_input = st.sidebar.text_input("Project/Dataset name", value="MOSAIC", key="dataset_name_input")
+DATASET_DIR = _slugify(ds_input).upper()
+
+
+RAW_DIR = raw_path(DATASET_DIR)
+PROC_DIR = proc_path(DATASET_DIR, "preprocessed")
+EVAL_DIR = eval_path(DATASET_DIR)
+CACHE_DIR = PROC_DIR / "cache"
+
+PROC_DIR.mkdir(parents=True, exist_ok=True)
+CACHE_DIR.mkdir(parents=True, exist_ok=True)
+EVAL_DIR.mkdir(parents=True, exist_ok=True)
+
+
+with st.sidebar.expander("About the dataset name", expanded=False):
+    st.markdown(
+        f"""
+- The name above is converted to **UPPER CASE** and used as a folder name.
+- If the folder doesn’t exist, it will be **created**:
+  - Preprocessed CSVs: `{PROC_DIR}`
+  - Exports (results): `{EVAL_DIR}`
+- If you choose **Use preprocessed CSV on server**, I’ll list CSVs in `{PROC_DIR}`.
+- If you **upload** a CSV, it will be saved to `{PROC_DIR}/uploaded.csv`.
+        """.strip()
+    )
+
+# DATASETS = {
+#     "API Translation (Batched)": str(PROC_DIR / "innerspeech_translated_batched_API.csv"),
+#     "Local Translation (Llama)": str(PROC_DIR / "innerspeech_dataset_translated_llama.csv"),
+# }
+
+def _list_server_csvs(proc_dir: Path) -> list[str]:
+    return [str(p) for p in sorted(proc_dir.glob("*.csv"))]
+
+# will be populated later (after CSV_PATH is known)
+DATASETS = None # keep name for clarity; we’ll fill it when rendering the sidebar
+
+
+HISTORY_FILE = str(PROC_DIR / "run_history.json")
+
+
+
+# =====================================================================
+# 3. Embedding & LLM loaders
+# =====================================================================
+
+@st.cache_resource
+def load_embedding_model(model_name):
+    st.info(f"Loading embedding model '{model_name}'...")
+    return SentenceTransformer(model_name)
+
+
+# @st.cache_resource
+# def load_llm_model():
+#     """Loads LlamaCPP quantised model for topic labeling."""
+#     model_repo = "NousResearch/Meta-Llama-3-8B-Instruct-GGUF"
+#     model_file = "Meta-Llama-3-8B-Instruct-Q4_K_M.gguf"
+#     model_path = hf_hub_download(repo_id=model_repo, filename=model_file)
+#     return Llama(model_path=model_path, n_gpu_layers=-1, n_ctx=8192,
+#                  stop=["Q:", "\n"], verbose=False)
+
+
+@st.cache_data
+def load_precomputed_data(docs_file, embeddings_file):
+    docs = np.load(docs_file, allow_pickle=True).tolist()
+    emb = np.load(embeddings_file, allow_pickle=True)
+    return docs, emb
+
+
+
+# =====================================================================
+# 4. Topic modeling function
+# =====================================================================
+
+def get_config_hash(cfg):
+    return json.dumps(cfg, sort_keys=True)
+
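Reviewer note: despite its name, `get_config_hash` returns a canonical JSON string rather than a digest. Sorting the keys makes two dictionaries with the same content produce the same string, which is what a cache key needs. JSON has no tuple type, which would explain why `perform_topic_modeling` converts `ngram_range` back with `tuple(...)` after `json.loads`. The sketch below shows both effects; the parameter values are made up for illustration.

```python
import json


def get_config_hash(cfg: dict) -> str:
    """Serialise a config deterministically so it can act as a cache key."""
    return json.dumps(cfg, sort_keys=True)


# Same settings, different insertion order (values are illustrative only).
cfg_a = {"use_vectorizer": True, "vectorizer_params": {"ngram_range": (1, 2), "min_df": 2}}
cfg_b = {"vectorizer_params": {"min_df": 2, "ngram_range": (1, 2)}, "use_vectorizer": True}

key_a = get_config_hash(cfg_a)
key_b = get_config_hash(cfg_b)

# Round trip: the tuple comes back as a list and has to be restored by hand.
restored = json.loads(key_a)
ngram_after_json = restored["vectorizer_params"]["ngram_range"]
restored["vectorizer_params"]["ngram_range"] = tuple(ngram_after_json)

print(key_a == key_b)
print(type(ngram_after_json).__name__, restored["vectorizer_params"]["ngram_range"])
```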
+
+@st.cache_data
+def perform_topic_modeling(_docs, _embeddings, config_hash):
+    """Fit BERTopic using cached result."""
+
+    _docs = list(_docs)
+    _embeddings = np.asarray(_embeddings)
+    if _embeddings.dtype == object or _embeddings.ndim != 2:
+        try:
+            _embeddings = np.vstack(_embeddings)
+        except Exception:
+            st.error(f"Embeddings are invalid (dtype={_embeddings.dtype}, ndim={_embeddings.ndim}). "
+                     "Please click **Prepare Data** to regenerate.")
+            st.stop()
+    _embeddings = np.ascontiguousarray(_embeddings, dtype=np.float32)
+
+    if _embeddings.shape[0] != len(_docs):
+        st.error(f"Mismatch between docs and embeddings: len(docs)={len(_docs)} vs "
+                 f"embeddings.shape[0]={_embeddings.shape[0]}. "
+                 "Delete the cached files for this configuration and regenerate.")
+        st.stop()
+
+    config = json.loads(config_hash)
+
+    # Prepare vectorizer parameters
+    if "ngram_range" in config["vectorizer_params"]:
+        config["vectorizer_params"]["ngram_range"] = tuple(config["vectorizer_params"]["ngram_range"])
+
+    # Load LLM for labeling
+    # llm = load_llm_model() # <-- REMOVED
+
+    # prompt = """Q:
+    # You are an expert in micro-phenomenology. The following documents are reflections from participants about their experience.
+    # I have a topic that contains the following documents:
+    # [DOCUMENTS]
+    # The topic is described by the following keywords: '[KEYWORDS]'.
+    # Based on the above information, give a short, informative label (5–10 words).
+    # A:"""
+
+    # rep_model = {
+    #     "LLM": LlamaCPP(llm, prompt=prompt, nr_docs=25, doc_length=300, tokenizer="whitespace")
+    # }
+
+    # <-- MODIFIED: Use BERTopic's default representation instead of LLM
+    rep_model = None
+
+    umap_model = UMAP(
+        random_state=42, metric="cosine",
+        **config["umap_params"]
+    )
+    hdbscan_model = HDBSCAN(
+        metric="euclidean", prediction_data=True,
+        **config["hdbscan_params"]
+    )
+    vectorizer_model = CountVectorizer(**config["vectorizer_params"]) if config["use_vectorizer"] else None
+
+    nr_topics_val = None if config["bt_params"]["nr_topics"] == "auto" \
+        else int(config["bt_params"]["nr_topics"])
+
+    topic_model = BERTopic(
+        umap_model=umap_model,
+        hdbscan_model=hdbscan_model,
+        vectorizer_model=vectorizer_model,
+        representation_model=rep_model,
+        top_n_words=config["bt_params"]["top_n_words"],
+        nr_topics=nr_topics_val,
+        verbose=False
+    )
+
+    topics, _ = topic_model.fit_transform(_docs, _embeddings)
+    info = topic_model.get_topic_info()
+
+    outlier_pct = 0
+    if -1 in info.Topic.values:
+        outlier_pct = (info.Count[info.Topic == -1].iloc[0] / info.Count.sum()) * 100
+
+    # <-- MODIFIED: Use default topic names instead of LLM labels
+    # Get the default keyword-based names generated by BERTopic
+    topic_info = topic_model.get_topic_info()
+    # Create a map of {topic_id: "topic_name", ...}
+    name_map = topic_info.set_index('Topic')['Name'].to_dict()
+    # Map each document's topic_id to its name
+    all_labels = [name_map[topic] for topic in topics]
+
+
+    reduced = UMAP(
+        n_neighbors=15, n_components=2, min_dist=0.0,
+        metric="cosine", random_state=42
+    ).fit_transform(_embeddings)
+
+    return topic_model, reduced, all_labels, len(info)-1, outlier_pct
+
+
+
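Reviewer note: the function reports the share of documents assigned to topic `-1` (BERTopic's outlier label) and a topic count computed as `len(info)-1`. The sketch below reproduces both numbers from a plain list of topic ids, without BERTopic or pandas. The helper names and the sample assignment are assumptions for illustration.

```python
from collections import Counter
from typing import Sequence


def outlier_percentage(topics: Sequence[int]) -> float:
    """Percentage of documents assigned to the outlier topic (-1)."""
    counts = Counter(topics)
    total = sum(counts.values())
    return 100.0 * counts.get(-1, 0) / total if total else 0.0


def count_real_topics(topics: Sequence[int]) -> int:
    """Number of distinct topics, not counting the outlier label."""
    return len(set(topics) - {-1})


# Made-up per-document topic assignment.
topics = [0, 0, 1, -1, 2, -1, 1, 0]
pct = outlier_percentage(topics)
n_topics = count_real_topics(topics)

# Same three topics, but no outliers at all.
clean_topics = [0, 0, 1, 2, 1, 0]
n_topics_clean = count_real_topics(clean_topics)

print(pct, n_topics, n_topics_clean)
```

Counting via a set stays correct when no document is an outlier, whereas subtracting one from the row count assumes an outlier row is always present.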
+# =====================================================================
+# 5. CSV → documents → embeddings pipeline
+# =====================================================================
+
+def generate_and_save_embeddings(csv_path, docs_file, emb_file,
+                                 selected_embedding_model,
+                                 split_sentences, device):
+
+    # ---------------------
+    # Load & clean CSV
+    # ---------------------
+    st.info(f"Reading and preparing CSV: {csv_path}")
+    df = pd.read_csv(csv_path)
+
+    col = _pick_text_column(df)
+    if col is None:
+        st.error("CSV must contain one of: " + ", ".join(ACCEPTABLE_TEXT_COLUMNS))
+        return
+
+    if col != "reflection_answer_english":
+        df = df.rename(columns={col: "reflection_answer_english"})
+
+    df.dropna(subset=["reflection_answer_english"], inplace=True)
+    df["reflection_answer_english"] = df["reflection_answer_english"].astype(str)
+    df = df[df["reflection_answer_english"].str.strip() != ""]
+    reports = df["reflection_answer_english"].tolist()
+
+    # ---------------------
+    # Sentence / report granularity
+    # ---------------------
+    if split_sentences:
+        try:
+            ensure_sentence_tokenizer()
+        except LookupError as e:
+            st.error(f"Failed to load NLTK sentence tokenizer data: {e}")
+            st.stop()
+
+        sentences = [s for r in reports for s in nltk.sent_tokenize(r)]
+        docs = [s for s in sentences if len(s.split()) > 2]
+    else:
+        docs = reports
+
+    np.save(docs_file, np.array(docs, dtype=object))
+    st.success(f"Prepared {len(docs)} documents")
+
+    # ---------------------
+    # Embeddings
+    # ---------------------
+    st.info(f"Encoding {len(docs)} documents with {selected_embedding_model} on {device}")
+
+    model = load_embedding_model(selected_embedding_model)
+
+    encode_device = None
+    batch_size = 32
+    if device == "CPU":
+        encode_device = "cpu"
+        batch_size = 64
+
+    embeddings = model.encode(
+        docs,
+        show_progress_bar=True,
+        batch_size=batch_size,
+        device=encode_device,
+        convert_to_numpy=True
+    )
+    embeddings = np.asarray(embeddings, dtype=np.float32)
+    np.save(emb_file, embeddings)
+
+    st.success("Embedding generation complete!")
+    st.balloons()
+    st.rerun()
+
+
+
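Reviewer note: at sentence granularity the pipeline splits each report and keeps only sentences with more than two words, so very short reports can disappear from the corpus entirely. The sketch below illustrates that filter. It swaps `nltk.sent_tokenize` for a naive regex splitter so that it runs without NLTK data; the splitter, the function names and the sample reports are assumptions, and the real tokenizer will split some texts differently.

```python
import re
from typing import List


def naive_sent_split(text: str) -> List[str]:
    """Crude stand-in for nltk.sent_tokenize: split after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def reports_to_docs(reports: List[str], split_sentences: bool) -> List[str]:
    """Return whole reports, or their sentences with more than two words."""
    if not split_sentences:
        return list(reports)
    sentences = [s for r in reports for s in naive_sent_split(r)]
    return [s for s in sentences if len(s.split()) > 2]


# Made-up reports for illustration.
reports = [
    "I saw colours. Then spirals appeared everywhere. Wow!",
    "Nothing much.",
]

sentence_docs = reports_to_docs(reports, split_sentences=True)
report_docs = reports_to_docs(reports, split_sentences=False)

print(sentence_docs)
print(len(report_docs))
```

In this example the second report contributes no document at sentence level, which is worth keeping in mind when comparing counts between the two granularities.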
+# =====================================================================
+# 6. Sidebar — dataset, upload, parameters
+# =====================================================================
+
+# --- User CSV upload / server dataset ---
+st.sidebar.header("Data Input Method")
+
+source = st.sidebar.radio(
+    "Choose data source",
+    ("Use preprocessed CSV on server", "Upload my own CSV"),
+    index=0,
+    key="data_source",
+)
+
+uploaded_csv_path = None
+CSV_PATH = None # will be set in the chosen branch
+
+# if source == "Use preprocessed CSV on server":
+#     # Show dataset selector ONLY in this branch
+#     selected_dataset_name = st.sidebar.selectbox(
+#         "Choose a dataset",
+#         list(DATASETS.keys()),
+#         key="dataset_name",
+#     )
+#     CSV_PATH = DATASETS[selected_dataset_name]
+
+# else: # Upload my own CSV
+#     up = st.sidebar.file_uploader("Upload a CSV", type=["csv"], key="upload_csv")
+#     if up is not None:
+#         tmp_df = pd.read_csv(up)
+#         col = _pick_text_column(tmp_df)
+#         if col is None:
+#             st.error("CSV must contain a text column such as: " + ", ".join(ACCEPTABLE_TEXT_COLUMNS))
+#             st.stop()
+#         if col != "reflection_answer_english":
+#             tmp_df = tmp_df.rename(columns={col: "reflection_answer_english"})
+#         uploaded_csv_path = str((PROC_DIR / "uploaded.csv").resolve())
+#         tmp_df.to_csv(uploaded_csv_path, index=False)
+#         st.success(f"Uploaded CSV saved to {uploaded_csv_path}")
+#         CSV_PATH = uploaded_csv_path
+#     else:
+#         st.info("Upload a CSV to continue.")
+#         st.stop()
+if source == "Use preprocessed CSV on server":
+    # List preprocessed CSVs inside this dataset’s folder
+    available = _list_server_csvs(PROC_DIR)
+    if not available:
+        st.info(f"No CSVs found in {PROC_DIR}. Switch to 'Upload my own CSV' or change the dataset name.")
+        st.stop()
+    selected_csv = st.sidebar.selectbox("Choose a preprocessed CSV", available, key="server_csv_select")
+    CSV_PATH = selected_csv
+else:
+    up = st.sidebar.file_uploader("Upload a CSV", type=["csv"], key="upload_csv")
+    if up is not None:
+        tmp_df = pd.read_csv(up)
+        col = _pick_text_column(tmp_df)
+        if col is None:
+            st.error("CSV must contain a text column such as: " + ", ".join(ACCEPTABLE_TEXT_COLUMNS))
+            st.stop()
+        if col != "reflection_answer_english":
+            tmp_df = tmp_df.rename(columns={col: "reflection_answer_english"})
+        # Save into THIS dataset’s preprocessed folder
+        uploaded_csv_path = str((PROC_DIR / "uploaded.csv").resolve())
+        tmp_df.to_csv(uploaded_csv_path, index=False)
+        st.success(f"Uploaded CSV saved to {uploaded_csv_path}")
+        CSV_PATH = uploaded_csv_path
+    else:
+        st.info("Upload a CSV to continue.")
+        st.stop()
+
+
+
+
+# --- Subsample ---
+st.sidebar.subheader("Data Granularity & Subsampling")
+
+selected_granularity = st.sidebar.checkbox("Split reports into sentences", value=True)
+granularity_label = "sentences" if selected_granularity else "reports"
+
+subsample_perc = st.sidebar.slider("Data sampling (%)", 10, 100, 100, 5)
+
+# line break
+st.sidebar.markdown("---")
+
+
+
+# --- Embedding model selection ---
+st.sidebar.header("Model Selection")
+
+
+
+selected_embedding_model = st.sidebar.selectbox("Choose an embedding model", (
+    "intfloat/multilingual-e5-large-instruct",
+    "Qwen/Qwen3-Embedding-0.6B",
+    "BAAI/bge-small-en-v1.5",
+    "sentence-transformers/all-mpnet-base-v2",
+))
+
+
+
+
+
+# --- Device selection ---
+# st.sidebar.header("Data Preparation")
+selected_device = st.sidebar.radio(
+    "Processing device",
+    ["GPU (MPS)", "CPU"],
+    index=0,
+)
+
+
+
+
+
+
+# =====================================================================
+# 7. Precompute filenames and pipeline triggers
+# =====================================================================
+
+def get_precomputed_filenames(csv_path, model_name, split_sentences):
+    base = os.path.splitext(os.path.basename(csv_path))[0]
+    safe_model = re.sub(r"[^a-zA-Z0-9_-]", "_", model_name)
+    suf = "sentences" if split_sentences else "reports"
+    return (
+        str(CACHE_DIR / f"precomputed_{base}_{suf}_docs.npy"),
+        str(CACHE_DIR / f"precomputed_{base}_{safe_model}_{suf}_embeddings.npy"),
+    )
+
+DOCS_FILE, EMBEDDINGS_FILE = get_precomputed_filenames(
+    CSV_PATH, selected_embedding_model, selected_granularity
+)
+
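Reviewer note: the cache filenames are derived from the CSV basename, the granularity and, for embeddings only, the model name. The sketch below reproduces the naming scheme with the cache folder passed in as an argument instead of the module-level `CACHE_DIR`. The function name `precomputed_filenames` and the example paths are assumptions for illustration.

```python
import os
import re
from pathlib import Path
from typing import Tuple


def precomputed_filenames(
    cache_dir: Path, csv_path: str, model_name: str, split_sentences: bool
) -> Tuple[Path, Path]:
    """Build the docs and embeddings cache paths for one configuration."""
    base = os.path.splitext(os.path.basename(csv_path))[0]
    # '/' and '.' in model ids are not filename-safe here, so they become '_'.
    safe_model = re.sub(r"[^a-zA-Z0-9_-]", "_", model_name)
    suffix = "sentences" if split_sentences else "reports"
    docs = cache_dir / f"precomputed_{base}_{suffix}_docs.npy"
    emb = cache_dir / f"precomputed_{base}_{safe_model}_{suffix}_embeddings.npy"
    return docs, emb


# Hypothetical cache folder and upload path.
cache_dir = Path("data/MOSAIC/preprocessed/cache")
docs_file, emb_file = precomputed_filenames(
    cache_dir, "data/MOSAIC/preprocessed/uploaded.csv", "BAAI/bge-small-en-v1.5", True
)

print(docs_file.name)
print(emb_file.name)
```

Since every upload is saved as `uploaded.csv`, two different uploads under the same dataset name appear to map to the same cache files, so clearing the cache between uploads seems advisable.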
# --- Cache management (after DOCS_FILE / EMBEDDINGS_FILE exist) ---
st.sidebar.markdown("### Cache")
if st.sidebar.button("Clear cached files for this configuration", use_container_width=True):
    try:
        for p in (DOCS_FILE, EMBEDDINGS_FILE):
            if os.path.exists(p):
                os.remove(p)
        # also clear Streamlit caches tied to these functions
        try:
            load_precomputed_data.clear()  # st.cache_data func
        except Exception:
            pass
        try:
            perform_topic_modeling.clear()  # st.cache_data func
        except Exception:
            pass

        st.success("Deleted cached docs/embeddings and cleared caches. Click **Prepare Data** again.")
        st.rerun()
    except Exception as e:
        st.error(f"Failed to delete cache files: {e}")

# add line break
st.sidebar.markdown("---")

# =====================================================================
# 8. Prepare Data OR Run Analysis
# =====================================================================

if not os.path.exists(EMBEDDINGS_FILE):
    st.warning(f"No precomputed embeddings found for this configuration ({granularity_label} / {selected_embedding_model}).")

    if st.button("Prepare Data for This Configuration"):
        generate_and_save_embeddings(
            CSV_PATH, DOCS_FILE, EMBEDDINGS_FILE,
            selected_embedding_model, selected_granularity, selected_device
        )

else:
    # Load cached data
    docs, embeddings = load_precomputed_data(DOCS_FILE, EMBEDDINGS_FILE)

    # Coerce to 2-D float array even if saved as object
    embeddings = np.asarray(embeddings)
    if embeddings.dtype == object or embeddings.ndim != 2:
        try:
            embeddings = np.vstack(embeddings).astype(np.float32)
        except Exception:
            st.error("Cached embeddings are invalid. Please regenerate them for this configuration.")
            st.stop()

    # Subsample
    if subsample_perc < 100:
        n = int(len(docs) * (subsample_perc / 100))
        # Fixed seed (the value 0 is arbitrary) so every Streamlit rerun selects
        # the same rows. An unseeded draw would change docs between the run and
        # the later rerun that renders and exports the results.
        idx = np.random.default_rng(0).choice(len(docs), size=n, replace=False)
        docs = [docs[i] for i in idx]
        embeddings = embeddings[idx, :]  # keep it 2-D
        st.warning(f"Running analysis on {subsample_perc}% subsample ({len(docs)} documents)")

    # st.metric("Documents to Analyze", len(docs), granularity_label)
    # --- Dataset summary metrics ---
    st.subheader("Dataset summary")
    n_reports = count_clean_reports(CSV_PATH)  # total cleaned reports in CSV
    unit = "sentences" if selected_granularity else "reports"
    n_units = len(docs)  # actual units analyzed

    c1, c2 = st.columns(2)
    c1.metric("Reports in CSV (cleaned)", n_reports)
    c2.metric(f"Units analysed ({unit})", n_units)

    # --- Parameter controls ---
    st.sidebar.header("Model Parameters")

    use_vectorizer = st.sidebar.checkbox("Use CountVectorizer", value=True)

    with st.sidebar.expander("Vectorizer"):
        ng_min = st.slider("Min N-gram", 1, 5, 1)
        ng_max = st.slider("Max N-gram", 1, 5, 2)
        min_df = st.slider("Min Doc Freq", 1, 50, 1)
        stopwords = st.select_slider("Stopwords", options=[None, "english"], value=None)

    with st.sidebar.expander("UMAP"):
        um_n = st.slider("n_neighbors", 2, 50, 15)
        um_c = st.slider("n_components", 2, 20, 5)
        um_d = st.slider("min_dist", 0.0, 1.0, 0.0)

    with st.sidebar.expander("HDBSCAN"):
        hs = st.slider("min_cluster_size", 5, 100, 10)
        hm = st.slider("min_samples", 2, 100, 5)

    with st.sidebar.expander("BERTopic"):
        nr_topics = st.text_input("nr_topics", value="auto")
        top_n_words = st.slider("top_n_words", 5, 25, 10)

    # --- Build config ---
    current_config = {
        "embedding_model": selected_embedding_model,
        "granularity": granularity_label,
        "subsample_percent": subsample_perc,
        "use_vectorizer": use_vectorizer,
        "vectorizer_params": {
            "ngram_range": (ng_min, ng_max),
            "min_df": min_df,
            "stop_words": stopwords,
        },
        "umap_params": {
            "n_neighbors": um_n,
            "n_components": um_c,
            "min_dist": um_d,
        },
        "hdbscan_params": {
            "min_cluster_size": hs,
            "min_samples": hm,
        },
        "bt_params": {
            "nr_topics": nr_topics,
            "top_n_words": top_n_words,
        },
    }

    # --- Run Button ---
    run_button = st.sidebar.button("Run Analysis", type="primary")

    # =================================================================
    # 9. Visualization & History Tabs
    # =================================================================
    main_tab, history_tab = st.tabs(["Main Results", "Run History"])

    def load_history():
        path = HISTORY_FILE
        if not os.path.exists(path):
            return []
        try:
            # Context manager so the file handle is closed even if parsing fails
            with open(path, encoding="utf-8") as f:
                data = json.load(f)
        except Exception:
            return []
        # --- migrate old keys for backward-compat ---
        for e in data:
            if "outlier_pct" not in e and "outlier_perc" in e:
                e["outlier_pct"] = e.pop("outlier_perc")
        return data


    def save_history(h):
        # Context manager so the history is flushed and the handle closed
        with open(HISTORY_FILE, "w", encoding="utf-8") as f:
            json.dump(h, f, indent=2)

    if "history" not in st.session_state:
        st.session_state.history = load_history()

    if run_button:

        if not isinstance(embeddings, np.ndarray):
            embeddings = np.asarray(embeddings)

        if embeddings.dtype == object or embeddings.ndim != 2:
            try:
                embeddings = np.vstack(embeddings).astype(np.float32)
            except Exception:
                st.error("Cached embeddings are invalid (object/ragged). "
                         "Clear the cached files in the sidebar, then click **Prepare Data** to regenerate.")
                st.stop()

        if embeddings.shape[0] != len(docs):
            st.error(f"len(docs)={len(docs)} but embeddings.shape[0]={embeddings.shape[0]}.\n"
                     "Likely stale cache (e.g., switched sentences↔reports or model). "
                     "Use the **Clear cached files for this configuration** button in the sidebar and regenerate.")
            st.stop()

        with st.spinner("Performing topic modeling..."):
            model, reduced, labels, n_topics, outlier_pct = perform_topic_modeling(
                docs, embeddings, get_config_hash(current_config)
            )
        st.session_state.latest_results = (model, reduced, labels)

        # Save in history
        entry = {
            "timestamp": str(pd.Timestamp.now()),
            "config": current_config,
            "num_topics": n_topics,
            "outlier_pct": f"{outlier_pct:.2f}%",
            "llm_labels": [  # names taken from the Name column of BERTopic's topic info
                name for name in model.get_topic_info().Name.values
                if ("Unlabelled" not in name and "outlier" not in name)
            ],
        }
        st.session_state.history.insert(0, entry)
        save_history(st.session_state.history)
        st.rerun()

    # --- MAIN TAB ---
    with main_tab:
        if "latest_results" in st.session_state:
            tm, reduced, labs = st.session_state.latest_results

            st.subheader("Experiential Topics Visualisation")
            fig, _ = datamapplot.create_plot(reduced, labs)
            st.pyplot(fig)

            st.subheader("Topic Info")
            st.dataframe(tm.get_topic_info())

            # --- Export: one row per topic (topic_id, LLM topic_name, texts) ---
            st.subheader("Export results (one row per topic)")

            # 1) Pull LLM labels directly from BERTopic's representation
            full_reps = tm.get_topics(full=True)
            llm_reps = full_reps.get("LLM", {})  # {topic_id: [(label, score), ...], ...}

            # Build topic_id -> LLM label map; fall back to Name if missing
            llm_names = {}
            for tid, vals in llm_reps.items():
                try:
                    llm_names[tid] = (vals[0][0] or "").strip().strip('"').strip(".")
                except Exception:
                    llm_names[tid] = "Unlabelled"

            # When the model has no "LLM" representation, this fallback supplies
            # every topic name (per the original note, it is now the main path).
            if not llm_names:
                # Fallback: whatever BERTopic put in Name
                st.caption("Note: Using default keyword-based topic names.")
                llm_names = tm.get_topic_info().set_index("Topic")["Name"].to_dict()

            # 2) Per-document assignments for current docs
            #    (docs must be the same documents, in the same order, the model was fitted on)
            doc_info = tm.get_document_info(docs)[["Document", "Topic"]]

            # 3) Optionally remove outliers
            include_outliers = st.checkbox("Include outlier topic (-1)", value=False)
            if not include_outliers:
                doc_info = doc_info[doc_info["Topic"] != -1]

            # 4) Group texts by topic
            grouped = (
                doc_info.groupby("Topic")["Document"]
                .apply(list)
                .reset_index(name="texts")
            )

            # 5) Attach LLM names
            grouped["topic_name"] = grouped["Topic"].map(llm_names).fillna("Unlabelled")

            # 6) Reorder/rename columns
            export_topics = grouped.rename(columns={"Topic": "topic_id"})[
                ["topic_id", "topic_name", "texts"]
            ].sort_values("topic_id").reset_index(drop=True)

            # ---- CSV + JSONL outputs ----
            # Fixed separator (no textbox); change if you prefer another, e.g. " || "
            SEP = "\n"

            # Flatten lists for CSV
            export_csv = export_topics.copy()
            export_csv["texts"] = export_csv["texts"].apply(lambda lst: SEP.join(map(str, lst)))

            base = os.path.splitext(os.path.basename(CSV_PATH))[0]
            gran = "sentences" if selected_granularity else "reports"
            csv_name = f"topics_{base}_{gran}.csv"
            jsonl_name = f"topics_{base}_{gran}.jsonl"
            csv_path = (EVAL_DIR / csv_name).resolve()
            jsonl_path = (EVAL_DIR / jsonl_name).resolve()

            cL, cC, cR = st.columns(3)

            with cL:
                if st.button("Save CSV to eval/", use_container_width=True):
                    try:
                        export_csv.to_csv(csv_path, index=False)
                        st.success(f"Saved CSV → {csv_path}")
                    except Exception as e:
                        st.error(f"Failed to save CSV: {e}")

            with cC:
                # JSONL preserves the list structure of texts
                if st.button("Save JSONL to eval/", use_container_width=True):
                    try:
                        with open(jsonl_path, "w", encoding="utf-8") as f:
                            for _, row in export_topics.iterrows():
                                rec = {
                                    "topic_id": int(row["topic_id"]),
                                    "topic_name": row["topic_name"],
                                    "texts": list(map(str, row["texts"])),
                                }
                                f.write(json.dumps(rec, ensure_ascii=False) + "\n")
                        st.success(f"Saved JSONL → {jsonl_path}")
                    except Exception as e:
                        st.error(f"Failed to save JSONL: {e}")

            with cR:
                st.download_button(
                    "Download CSV",
                    data=export_csv.to_csv(index=False).encode("utf-8"),
                    file_name=csv_name,
                    mime="text/csv",
                    use_container_width=True,
                )

            st.caption("Preview (one row per topic)")
            st.dataframe(export_csv.head(10))

        else:
            st.info("Click 'Run Analysis' to begin.")

    # --- HISTORY TAB ---
    with history_tab:
        st.subheader("Run History")
        if not st.session_state.history:
            st.info("No runs yet.")
        else:
            for i, entry in enumerate(st.session_state.history):
                with st.expander(f"Run {i+1} — {entry['timestamp']}"):
                    st.write(f"**Topics:** {entry['num_topics']}")
                    st.write(f"**Outliers:** {entry.get('outlier_pct', entry.get('outlier_perc', 'N/A'))}")
                    st.write("**Topic Labels (default keywords):**")
                    st.write(entry["llm_labels"])
                    with st.expander("Show full configuration"):
                        st.json(entry["config"])
|
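The subsampling step in section 8 of app.py draws one set of row indices and applies it to both the document list and the embedding matrix, so the two stay aligned. A minimal standalone sketch of that pattern follows. The helper name `subsample_aligned`, the `seed` parameter, and the toy data are assumptions made for illustration and are not part of the app. A fixed seed is assumed so that repeated calls (for example, Streamlit reruns) pick the same rows.

```python
import numpy as np


def subsample_aligned(docs, embeddings, percent, seed=0):
    """Return a random subset of docs plus the matching embedding rows.

    Hypothetical helper: one index array is drawn and used for both inputs,
    so row i of the returned matrix still belongs to returned document i.
    """
    embeddings = np.asarray(embeddings)
    if embeddings.ndim != 2 or embeddings.shape[0] != len(docs):
        raise ValueError("embeddings must be 2-D with one row per document")
    n = int(len(docs) * (percent / 100))
    rng = np.random.default_rng(seed)  # seeded, so the draw is repeatable
    idx = rng.choice(len(docs), size=n, replace=False)
    return [docs[i] for i in idx], embeddings[idx, :]


# Toy data: document i has the embedding row [2*i, 2*i + 1].
docs = [f"doc {i}" for i in range(10)]
emb = np.arange(20, dtype=np.float32).reshape(10, 2)

sub_docs, sub_emb = subsample_aligned(docs, emb, percent=50)
print(len(sub_docs), sub_emb.shape)
```

Because the generator is seeded inside the function, calling it twice with the same arguments returns the same subset, which keeps any cache keyed on the inputs usable across reruns.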
eval/.gitkeep
ADDED
File without changes
requirements.txt
ADDED
@@ -0,0 +1,14 @@
streamlit>=1.37
pandas
numpy
nltk
bertopic
umap-learn
hdbscan
scikit-learn
sentence-transformers
datamapplot
huggingface_hub
matplotlib
# llama-cpp-python
# https://github.com/romybeaute/MOSAIC/tree/mosaic2.1/src/mosaic
test_mosaic.csv
ADDED
@@ -0,0 +1,97 @@
reflection_answer
I felt gentle waves of light across my vision.
"A calm, spacious feeling with slow breathing."
"Subtle pulsing behind the eyes, then deep relaxation."
"Memories surfaced briefly, then faded without emotion."
I noticed patterns when I closed my eyes.
Time felt slower for a few moments.
Warmth in my hands and a sense of clarity.
Sound felt crisper and more present.
I felt gentle waves of light across my vision.
"A calm, spacious feeling with slow breathing."
"Subtle pulsing behind the eyes, then deep relaxation."
"Memories surfaced briefly, then faded without emotion."
I noticed patterns when I closed my eyes.
Time felt slower for a few moments.
Warmth in my hands and a sense of clarity.
Sound felt crisper and more present.
I felt gentle waves of light across my vision.
"A calm, spacious feeling with slow breathing."
"Subtle pulsing behind the eyes, then deep relaxation."
"Memories surfaced briefly, then faded without emotion."
I noticed patterns when I closed my eyes.
Time felt slower for a few moments.
Warmth in my hands and a sense of clarity.
Sound felt crisper and more present.
I felt gentle waves of light across my vision.
"A calm, spacious feeling with slow breathing."
"Subtle pulsing behind the eyes, then deep relaxation."
"Memories surfaced briefly, then faded without emotion."
I noticed patterns when I closed my eyes.
Time felt slower for a few moments.
Warmth in my hands and a sense of clarity.
Sound felt crisper and more present.
I felt gentle waves of light across my vision.
"A calm, spacious feeling with slow breathing."
"Subtle pulsing behind the eyes, then deep relaxation."
"Memories surfaced briefly, then faded without emotion."
I noticed patterns when I closed my eyes.
Time felt slower for a few moments.
Warmth in my hands and a sense of clarity.
Sound felt crisper and more present.
I felt gentle waves of light across my vision.
"A calm, spacious feeling with slow breathing."
"Subtle pulsing behind the eyes, then deep relaxation."
"Memories surfaced briefly, then faded without emotion."
I noticed patterns when I closed my eyes.
Time felt slower for a few moments.
Warmth in my hands and a sense of clarity.
Sound felt crisper and more present.
I felt gentle waves of light across my vision.
"A calm, spacious feeling with slow breathing."
"Subtle pulsing behind the eyes, then deep relaxation."
"Memories surfaced briefly, then faded without emotion."
I noticed patterns when I closed my eyes.
Time felt slower for a few moments.
Warmth in my hands and a sense of clarity.
Sound felt crisper and more present.
I felt gentle waves of light across my vision.
"A calm, spacious feeling with slow breathing."
"Subtle pulsing behind the eyes, then deep relaxation."
"Memories surfaced briefly, then faded without emotion."
I noticed patterns when I closed my eyes.
Time felt slower for a few moments.
Warmth in my hands and a sense of clarity.
Sound felt crisper and more present.
I felt gentle waves of light across my vision.
"A calm, spacious feeling with slow breathing."
"Subtle pulsing behind the eyes, then deep relaxation."
"Memories surfaced briefly, then faded without emotion."
I noticed patterns when I closed my eyes.
Time felt slower for a few moments.
Warmth in my hands and a sense of clarity.
Sound felt crisper and more present.
I felt gentle waves of light across my vision.
"A calm, spacious feeling with slow breathing."
"Subtle pulsing behind the eyes, then deep relaxation."
"Memories surfaced briefly, then faded without emotion."
I noticed patterns when I closed my eyes.
Time felt slower for a few moments.
Warmth in my hands and a sense of clarity.
Sound felt crisper and more present.
I felt gentle waves of light across my vision.
"A calm, spacious feeling with slow breathing."
"Subtle pulsing behind the eyes, then deep relaxation."
"Memories surfaced briefly, then faded without emotion."
I noticed patterns when I closed my eyes.
Time felt slower for a few moments.
Warmth in my hands and a sense of clarity.
Sound felt crisper and more present.
I felt gentle waves of light across my vision.
"A calm, spacious feeling with slow breathing."
"Subtle pulsing behind the eyes, then deep relaxation."
"Memories surfaced briefly, then faded without emotion."
I noticed patterns when I closed my eyes.
Time felt slower for a few moments.
Warmth in my hands and a sense of clarity.
Sound felt crisper and more present.