romybeaute committed on
Commit 53c2d1b · 0 Parent(s)

Recreate lite branch for HF Space (no data/ tracked)

Files changed (7)
  1. .gitignore +33 -0
  2. .streamlit/config.toml +3 -0
  3. README.md +38 -0
  4. app.py +889 -0
  5. eval/.gitkeep +0 -0
  6. requirements.txt +14 -0
  7. test_mosaic.csv +97 -0
.gitignore ADDED
@@ -0,0 +1,33 @@
+ # Python
+ __pycache__/
+ *.pyc
+
+ # Streamlit local secrets (never commit)
+ .streamlit/secrets.toml
+
+ # Editor junk
+ .DS_Store
+ MOSAICapp.code-workspace
+
+ # Caches
+ .cache/
+ **/__pycache__/
+
+ # Big/binary artifacts (belt & suspenders)
+ *.npy
+ *.npz
+ *.pt
+ *.bin
+ *.gguf
+
+ # Ignore *everything* under data/, keep only .gitkeep files
+ data/
+ !data/.gitkeep
+ !data/**/.gitkeep
+
+ # (Optional) Do the same for eval/ if you don't want outputs tracked
+ eval/
+ !eval/.gitkeep
+ !eval/**/.gitkeep
+ data/*/preprocessed/cache/
+ data/
.streamlit/config.toml ADDED
@@ -0,0 +1,3 @@
+ [server]
+ headless = true
+ maxUploadSize = 500 # MB; adjust if people upload large CSVs
README.md ADDED
@@ -0,0 +1,38 @@
+ ---
+ title: MOSAICapp
+ colorFrom: indigo
+ colorTo: blue
+ sdk: docker
+ pinned: false
+ ---
+
+ # MOSAIC Topic Dashboard
+
+ A Streamlit app for BERTopic-based topic modelling with sentence-transformers embeddings.
+ **No data bundled** — upload a CSV with one text column (any of: `reflection_answer_english`, `reflection_answer`, `text`, `report`).
+
+ ## Lite Version (Free Hardware)
+
+ This Hugging Face Space runs the **`lite` version** of the app.
+
+ To make it run on free "CPU basic" hardware, the **LLM-based topic labeling feature has been disabled**. The app uses BERTopic's default keyword-based labels instead.
+
+ For the full, original version with LLM features (which requires paid GPU hardware), see the `main` branch of the [original GitHub repository](https://github.com/romybeaute/MOSAICapp).
+
+ ## Run Locally (Full Version)
+
+ To run the full version on your local machine:
+
+ ```bash
+ # Clone the main branch
+ git clone https://github.com/romybeaute/MOSAICapp.git
+ cd MOSAICapp
+
+ # Install requirements
+ pip install -r requirements.txt
+
+ # Download NLTK sentence-tokenizer data (NLTK 3.9+ uses punkt_tab, older versions punkt)
+ python -c "import nltk; nltk.download('punkt'); nltk.download('punkt_tab')"
+
+ # Run the app
+ streamlit run app.py
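The README's input contract (one text column, under any of four accepted names) can be approximated with the standard-library `csv` module. This is a sketch only: `pick_and_clean` is a hypothetical helper, while the app itself does this with pandas in `_pick_text_column` and `count_clean_reports`, taking the first accepted name found in the order listed. Pandas additionally treats markers such as `NA` as missing values, which this sketch does not.

```python
from __future__ import annotations

import csv
import io

# Accepted text columns, in the priority order the app checks them.
ACCEPTABLE_TEXT_COLUMNS = ["reflection_answer_english", "reflection_answer", "text", "report"]


def pick_and_clean(csv_text: str) -> tuple[str | None, list[str]]:
    """Hypothetical helper: return (chosen column, non-empty texts)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    # First accepted column present in the header wins.
    col = next((c for c in ACCEPTABLE_TEXT_COLUMNS if c in fieldnames), None)
    if col is None:
        return None, []
    texts = []
    for row in reader:
        value = row.get(col)
        # Drop missing and whitespace-only entries.
        if value is not None and value.strip():
            texts.append(value)
    return col, texts


sample = "id,text\n1,I saw colours.\n2,   \n3,Geometric patterns appeared.\n"
print(pick_and_clean(sample))
# → ('text', ['I saw colours.', 'Geometric patterns appeared.'])
```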
app.py ADDED
@@ -0,0 +1,889 @@
+ """
+ File: app.py
+ Description: Streamlit app for advanced topic modeling on Innerspeech dataset
+ with BERTopic, UMAP, HDBSCAN. (LLM features disabled for lite deployment)
+ Last Modified: 06/11/2025
+ @author: r.beaut
+ """
+
+ # =====================================================================
+ # Imports
+ # =====================================================================
+
+ from pathlib import Path
+ import sys
+ # from llama_cpp import Llama # <-- REMOVED
+ import streamlit as st
+ import pandas as pd
+ import numpy as np
+ import re
+ import os
+ import nltk
+ import json
+
+ # from mosaic.path_utils import CFG, raw_path, proc_path, eval_path, project_root
+
+
+ try:
+     from mosaic.path_utils import CFG, raw_path, proc_path, eval_path, project_root # type: ignore
+ except Exception:
+     # Minimal stand-in so the app works anywhere (Streamlit Cloud, local without MOSAIC, etc.)
+     def _env(key: str, default: str) -> Path:
+         val = os.getenv(key, default)
+         return Path(val).expanduser().resolve()
+
+     # Defaults: app-local data/ eval/ that are safe on Cloud
+     _DATA_ROOT = _env("MOSAIC_DATA", str(Path(__file__).parent / "data"))
+     _BOX_ROOT = _env("MOSAIC_BOX", str(Path(__file__).parent / "data" / "raw"))
+     _EVAL_ROOT = _env("MOSAIC_EVAL", str(Path(__file__).parent / "eval"))
+
+     CFG = {
+         "data_root": str(_DATA_ROOT),
+         "box_root": str(_BOX_ROOT),
+         "eval_root": str(_EVAL_ROOT),
+     }
+
+     def project_root() -> Path:
+         return Path(__file__).resolve().parent
+
+     def raw_path(*parts: str) -> Path:
+         return _BOX_ROOT.joinpath(*parts)
+
+     def proc_path(*parts: str) -> Path:
+         return _DATA_ROOT.joinpath(*parts)
+
+     def eval_path(*parts: str) -> Path:
+         return _EVAL_ROOT.joinpath(*parts)
+
+ # BERTopic stack
+ from bertopic import BERTopic
+ # from bertopic.representation import LlamaCPP # <-- REMOVED
+ # from llama_cpp import Llama # <-- REMOVED
+ from sentence_transformers import SentenceTransformer
+
+ # Clustering/dimensionality reduction
+ from sklearn.feature_extraction.text import CountVectorizer
+ from umap import UMAP
+ from hdbscan import HDBSCAN
+
+ # Visualisation
+ import datamapplot
+ import matplotlib.pyplot as plt
+ from huggingface_hub import hf_hub_download
+
+
+
+ # =====================================================================
+ # 0. Constants & Helper Functions
+ # =====================================================================
+
+
+ def _slugify(s: str) -> str:
+     s = s.strip()
+     s = re.sub(r"[^A-Za-z0-9._-]+", "_", s)
+     return s or "DATASET"
+
+
+ ACCEPTABLE_TEXT_COLUMNS = [
+     "reflection_answer_english",
+     "reflection_answer",
+     "text",
+     "report",
+ ]
+
+ def _pick_text_column(df: pd.DataFrame) -> str | None:
+     """Return the first matching text column."""
+     for col in ACCEPTABLE_TEXT_COLUMNS:
+         if col in df.columns:
+             return col
+     return None
+
+
+ def _set_from_env_or_secrets(key: str):
+     """Allow hosting: value can come from environment or from Streamlit secrets."""
+     if os.getenv(key):
+         return
+     try:
+         val = st.secrets.get(key, None)
+     except Exception:
+         val = None
+     if val:
+         os.environ[key] = str(val)
+
+ # Enable both MOSAIC_DATA and MOSAIC_BOX automatically
+ for _k in ("MOSAIC_DATA", "MOSAIC_BOX"):
+     _set_from_env_or_secrets(_k)
+
+
+
+ @st.cache_data
+ def count_clean_reports(csv_path: str) -> int:
+     df = pd.read_csv(csv_path)
+     col = _pick_text_column(df)
+     if col is None:
+         return 0
+     if col != "reflection_answer_english":
+         df = df.rename(columns={col: "reflection_answer_english"})
+     df.dropna(subset=["reflection_answer_english"], inplace=True)
+     df["reflection_answer_english"] = df["reflection_answer_english"].astype(str)
+     df = df[df["reflection_answer_english"].str.strip() != ""]
+     return len(df)
+
+
+ def ensure_sentence_tokenizer():
+     """
+     Make sure NLTK sentence tokenizer data is available.
+
+     Newer NLTK (3.9+) uses 'punkt_tab' for sent_tokenize(),
+     older versions use 'punkt'.
+     """
+     for resource in ("punkt_tab", "punkt"):
+         try:
+             nltk.data.find(f"tokenizers/{resource}")
+             return
+         except LookupError:
+             # Try to download it; nltk.download() usually signals failure by
+             # returning False rather than raising, so check its result
+             try:
+                 if nltk.download(resource):
+                     return
+             except Exception as e:
+                 print(f"Failed to download NLTK resource '{resource}': {e}")
+
+     # If we reach here, we didn't manage to get any tokenizer
+     raise LookupError("Could not load NLTK punkt or punkt_tab tokenizer data.")
+
+
+
+ # =====================================================================
+ # 1. Streamlit app setup
+ # =====================================================================
+
+ st.set_page_config(page_title="MOSAIC Dashboard", layout="wide")
+ st.title("Mapping of Subjective Accounts into Interpreted Clusters (MOSAIC): Topic Modelling Dashboard for Phenomenological Reports")
+
+ # Ask users to cite the MOSAIC paper
+ st.markdown(
+     """
+ _If you use this tool in your research, please cite the following paper:_\n
+ **Beauté, R., et al. (2025).**
+ **Mapping of Subjective Accounts into Interpreted Clusters (MOSAIC): Topic Modelling and LLM applied to Stroboscopic Phenomenology**
+ https://arxiv.org/abs/2502.18318
+ """
+ )
+
+ # ROOT = project_root()
+ # sys.path.append(str(ROOT / "MULTILINGUAL"))
+
+
+
+ # =====================================================================
+ # 2. Dataset paths (using MOSAIC structure)
+ # =====================================================================
+
+ # DATASET = "INNERSPEECH"
+
+ # --- Choose dataset/project name (drives folder names) ---
+ ds_input = st.sidebar.text_input("Project/Dataset name", value="MOSAIC", key="dataset_name_input")
+ DATASET_DIR = _slugify(ds_input).upper()
+
+
+ RAW_DIR = raw_path(DATASET_DIR)
+ PROC_DIR = proc_path(DATASET_DIR, "preprocessed")
+ EVAL_DIR = eval_path(DATASET_DIR)
+ CACHE_DIR = PROC_DIR / "cache"
+
+ PROC_DIR.mkdir(parents=True, exist_ok=True)
+ CACHE_DIR.mkdir(parents=True, exist_ok=True)
+ EVAL_DIR.mkdir(parents=True, exist_ok=True)
+
+
+ with st.sidebar.expander("About the dataset name", expanded=False):
+     st.markdown(
+         f"""
+ - The name above is converted to **UPPER CASE** and used as a folder name.
+ - If the folder doesn’t exist, it will be **created**:
+   - Preprocessed CSVs: `{PROC_DIR}`
+   - Exports (results): `{EVAL_DIR}`
+ - If you choose **Use preprocessed CSV on server**, I’ll list CSVs in `{PROC_DIR}`.
+ - If you **upload** a CSV, it will be saved to `{PROC_DIR}/uploaded.csv`.
+ """.strip()
+     )
+
+ # DATASETS = {
+ #     "API Translation (Batched)": str(PROC_DIR / "innerspeech_translated_batched_API.csv"),
+ #     "Local Translation (Llama)": str(PROC_DIR / "innerspeech_dataset_translated_llama.csv"),
+ # }
+
+ def _list_server_csvs(proc_dir: Path) -> list[str]:
+     return [str(p) for p in sorted(proc_dir.glob("*.csv"))]
+
+ # will be populated later (after CSV_PATH is known)
+ DATASETS = None # keep name for clarity; we’ll fill it when rendering the sidebar
+
+
+ HISTORY_FILE = str(PROC_DIR / "run_history.json")
+
+
+
+ # =====================================================================
+ # 3. Embedding & LLM loaders
+ # =====================================================================
+
+ @st.cache_resource
+ def load_embedding_model(model_name):
+     st.info(f"Loading embedding model '{model_name}'...")
+     return SentenceTransformer(model_name)
+
+
+ # @st.cache_resource
+ # def load_llm_model():
+ #     """Loads LlamaCPP quantised model for topic labeling."""
+ #     model_repo = "NousResearch/Meta-Llama-3-8B-Instruct-GGUF"
+ #     model_file = "Meta-Llama-3-8B-Instruct-Q4_K_M.gguf"
+ #     model_path = hf_hub_download(repo_id=model_repo, filename=model_file)
+ #     return Llama(model_path=model_path, n_gpu_layers=-1, n_ctx=8192,
+ #                  stop=["Q:", "\n"], verbose=False)
+
+
+ @st.cache_data
+ def load_precomputed_data(docs_file, embeddings_file):
+     docs = np.load(docs_file, allow_pickle=True).tolist()
+     emb = np.load(embeddings_file, allow_pickle=True)
+     return docs, emb
+
+
+
+ # =====================================================================
+ # 4. Topic modeling function
+ # =====================================================================
+
+ def get_config_hash(cfg):
+     return json.dumps(cfg, sort_keys=True)
+
+
+ @st.cache_data
+ def perform_topic_modeling(_docs, _embeddings, config_hash):
+     """Fit BERTopic and cache the result.
+
+     Streamlit does not hash underscore-prefixed arguments, so the cache key is
+     config_hash alone: it must capture everything that changes docs/embeddings.
+     """
+
+     _docs = list(_docs)
+     _embeddings = np.asarray(_embeddings)
+     if _embeddings.dtype == object or _embeddings.ndim != 2:
+         try:
+             _embeddings = np.vstack(_embeddings)
+         except Exception:
+             st.error(f"Embeddings are invalid (dtype={_embeddings.dtype}, ndim={_embeddings.ndim}). "
+                      "Please click **Prepare Data** to regenerate.")
+             st.stop()
+     _embeddings = np.ascontiguousarray(_embeddings, dtype=np.float32)
+
+     if _embeddings.shape[0] != len(_docs):
+         st.error(f"Mismatch between docs and embeddings: len(docs)={len(_docs)} vs "
+                  f"embeddings.shape[0]={_embeddings.shape[0]}. "
+                  "Delete the cached files for this configuration and regenerate.")
+         st.stop()
+
+     config = json.loads(config_hash)
+
+     # Prepare vectorizer parameters (JSON turned the ngram_range tuple into a list)
+     if "ngram_range" in config["vectorizer_params"]:
+         config["vectorizer_params"]["ngram_range"] = tuple(config["vectorizer_params"]["ngram_range"])
+
+     # Load LLM for labeling
+     # llm = load_llm_model() # <-- REMOVED
+
+     # prompt = """Q:
+     # You are an expert in micro-phenomenology. The following documents are reflections from participants about their experience.
+     # I have a topic that contains the following documents:
+     # [DOCUMENTS]
+     # The topic is described by the following keywords: '[KEYWORDS]'.
+     # Based on the above information, give a short, informative label (5–10 words).
+     # A:"""
+
+     # rep_model = {
+     #     "LLM": LlamaCPP(llm, prompt=prompt, nr_docs=25, doc_length=300, tokenizer="whitespace")
+     # }
+
+     # <-- MODIFIED: Use BERTopic's default representation instead of LLM
+     rep_model = None
+
+     umap_model = UMAP(
+         random_state=42, metric="cosine",
+         **config["umap_params"]
+     )
+     hdbscan_model = HDBSCAN(
+         metric="euclidean", prediction_data=True,
+         **config["hdbscan_params"]
+     )
+     vectorizer_model = CountVectorizer(**config["vectorizer_params"]) if config["use_vectorizer"] else None
+
+     nr_topics_val = None if config["bt_params"]["nr_topics"] == "auto" \
+         else int(config["bt_params"]["nr_topics"])
+
+     topic_model = BERTopic(
+         umap_model=umap_model,
+         hdbscan_model=hdbscan_model,
+         vectorizer_model=vectorizer_model,
+         representation_model=rep_model,
+         top_n_words=config["bt_params"]["top_n_words"],
+         nr_topics=nr_topics_val,
+         verbose=False
+     )
+
+     topics, _ = topic_model.fit_transform(_docs, _embeddings)
+     info = topic_model.get_topic_info()
+
+     outlier_pct = 0.0
+     if -1 in info.Topic.values:
+         outlier_pct = (info.Count[info.Topic == -1].iloc[0] / info.Count.sum()) * 100
+     # Count real topics only: topic -1 is the outlier bucket and may be absent
+     n_topics = int((info.Topic != -1).sum())
+
+     # <-- MODIFIED: Use default topic names instead of LLM labels
+     # Map {topic_id: "topic_name", ...} from BERTopic's keyword-based names
+     name_map = info.set_index("Topic")["Name"].to_dict()
+     # Map each document's topic_id to its name; outliers get "Unlabelled",
+     # which datamapplot treats as noise by default
+     all_labels = ["Unlabelled" if topic == -1 else name_map[topic] for topic in topics]
+
+
+     reduced = UMAP(
+         n_neighbors=15, n_components=2, min_dist=0.0,
+         metric="cosine", random_state=42
+     ).fit_transform(_embeddings)
+
+     return topic_model, reduced, all_labels, n_topics, outlier_pct
+
+
+
+ # =====================================================================
+ # 5. CSV → documents → embeddings pipeline
+ # =====================================================================
+
+ def generate_and_save_embeddings(csv_path, docs_file, emb_file,
+                                  selected_embedding_model,
+                                  split_sentences, device):
+
+     # ---------------------
+     # Load & clean CSV
+     # ---------------------
+     st.info(f"Reading and preparing CSV: {csv_path}")
+     df = pd.read_csv(csv_path)
+
+     col = _pick_text_column(df)
+     if col is None:
+         st.error("CSV must contain one of: " + ", ".join(ACCEPTABLE_TEXT_COLUMNS))
+         return
+
+     if col != "reflection_answer_english":
+         df = df.rename(columns={col: "reflection_answer_english"})
+
+     df.dropna(subset=["reflection_answer_english"], inplace=True)
+     df["reflection_answer_english"] = df["reflection_answer_english"].astype(str)
+     df = df[df["reflection_answer_english"].str.strip() != ""]
+     reports = df["reflection_answer_english"].tolist()
+
+     # ---------------------
+     # Sentence / report granularity
+     # ---------------------
+     if split_sentences:
+         try:
+             ensure_sentence_tokenizer()
+         except LookupError as e:
+             st.error(f"Failed to load NLTK sentence tokenizer data: {e}")
+             st.stop()
+
+         sentences = [s for r in reports for s in nltk.sent_tokenize(r)]
+         docs = [s for s in sentences if len(s.split()) > 2]
+     else:
+         docs = reports
+
+     np.save(docs_file, np.array(docs, dtype=object))
+     st.success(f"Prepared {len(docs)} documents")
+
+     # ---------------------
+     # Embeddings
+     # ---------------------
+     st.info(f"Encoding {len(docs)} documents with {selected_embedding_model} on {device}")
+
+     model = load_embedding_model(selected_embedding_model)
+
+     encode_device = None
+     batch_size = 32
+     if device == "CPU":
+         encode_device = "cpu"
+         batch_size = 64
+
+     embeddings = model.encode(
+         docs,
+         show_progress_bar=True,
+         batch_size=batch_size,
+         device=encode_device,
+         convert_to_numpy=True
+     )
+     embeddings = np.asarray(embeddings, dtype=np.float32)
+     np.save(emb_file, embeddings)
+
+     st.success("Embedding generation complete!")
+     st.balloons()
+     st.rerun()
+
+
+
+ # =====================================================================
+ # 6. Sidebar — dataset, upload, parameters
+ # =====================================================================
+
+ # --- User CSV upload / server dataset ---
+ st.sidebar.header("Data Input Method")
+
+ source = st.sidebar.radio(
+     "Choose data source",
+     ("Use preprocessed CSV on server", "Upload my own CSV"),
+     index=0,
+     key="data_source",
+ )
+
+ uploaded_csv_path = None
+ CSV_PATH = None # will be set in the chosen branch
+
+ # if source == "Use preprocessed CSV on server":
+ #     # Show dataset selector ONLY in this branch
+ #     selected_dataset_name = st.sidebar.selectbox(
+ #         "Choose a dataset",
+ #         list(DATASETS.keys()),
+ #         key="dataset_name",
+ #     )
+ #     CSV_PATH = DATASETS[selected_dataset_name]
+
+ # else:  # Upload my own CSV
+ #     up = st.sidebar.file_uploader("Upload a CSV", type=["csv"], key="upload_csv")
+ #     if up is not None:
+ #         tmp_df = pd.read_csv(up)
+ #         col = _pick_text_column(tmp_df)
+ #         if col is None:
+ #             st.error("CSV must contain a text column such as: " + ", ".join(ACCEPTABLE_TEXT_COLUMNS))
+ #             st.stop()
+ #         if col != "reflection_answer_english":
+ #             tmp_df = tmp_df.rename(columns={col: "reflection_answer_english"})
+ #         uploaded_csv_path = str((PROC_DIR / "uploaded.csv").resolve())
+ #         tmp_df.to_csv(uploaded_csv_path, index=False)
+ #         st.success(f"Uploaded CSV saved to {uploaded_csv_path}")
+ #         CSV_PATH = uploaded_csv_path
+ #     else:
+ #         st.info("Upload a CSV to continue.")
+ #         st.stop()
+ if source == "Use preprocessed CSV on server":
+     # List preprocessed CSVs inside this dataset’s folder
+     available = _list_server_csvs(PROC_DIR)
+     if not available:
+         st.info(f"No CSVs found in {PROC_DIR}. Switch to 'Upload my own CSV' or change the dataset name.")
+         st.stop()
+     selected_csv = st.sidebar.selectbox("Choose a preprocessed CSV", available, key="server_csv_select")
+     CSV_PATH = selected_csv
+ else:
+     up = st.sidebar.file_uploader("Upload a CSV", type=["csv"], key="upload_csv")
+     if up is not None:
+         tmp_df = pd.read_csv(up)
+         col = _pick_text_column(tmp_df)
+         if col is None:
+             st.error("CSV must contain a text column such as: " + ", ".join(ACCEPTABLE_TEXT_COLUMNS))
+             st.stop()
+         if col != "reflection_answer_english":
+             tmp_df = tmp_df.rename(columns={col: "reflection_answer_english"})
+         # Save into THIS dataset’s preprocessed folder
+         uploaded_csv_path = str((PROC_DIR / "uploaded.csv").resolve())
+         tmp_df.to_csv(uploaded_csv_path, index=False)
+         st.success(f"Uploaded CSV saved to {uploaded_csv_path}")
+         CSV_PATH = uploaded_csv_path
+     else:
+         st.info("Upload a CSV to continue.")
+         st.stop()
+
+
+
+
+ # --- Subsample ---
+ st.sidebar.subheader("Data Granularity & Subsampling")
+
+ selected_granularity = st.sidebar.checkbox("Split reports into sentences", value=True)
+ granularity_label = "sentences" if selected_granularity else "reports"
+
+ subsample_perc = st.sidebar.slider("Data sampling (%)", 10, 100, 100, 5)
+
+ # line break
+ st.sidebar.markdown("---")
+
+
+
+ # --- Embedding model selection ---
+ st.sidebar.header("Model Selection")
+
+
+
+ selected_embedding_model = st.sidebar.selectbox("Choose an embedding model", (
+     "intfloat/multilingual-e5-large-instruct",
+     "Qwen/Qwen3-Embedding-0.6B",
+     "BAAI/bge-small-en-v1.5",
+     "sentence-transformers/all-mpnet-base-v2",
+ ))
+
+
+
+
+
+ # --- Device selection ---
+ # st.sidebar.header("Data Preparation")
+ selected_device = st.sidebar.radio(
+     "Processing device",
+     ["GPU (MPS)", "CPU"],
+     index=0,
+ )
+
+
+
+
+
+
+ # =====================================================================
+ # 7. Precompute filenames and pipeline triggers
+ # =====================================================================
+
+ def get_precomputed_filenames(csv_path, model_name, split_sentences):
+     base = os.path.splitext(os.path.basename(csv_path))[0]
+     safe_model = re.sub(r"[^a-zA-Z0-9_-]", "_", model_name)
+     suf = "sentences" if split_sentences else "reports"
+     return (
+         str(CACHE_DIR / f"precomputed_{base}_{suf}_docs.npy"),
+         str(CACHE_DIR / f"precomputed_{base}_{safe_model}_{suf}_embeddings.npy"),
+     )
+
+ DOCS_FILE, EMBEDDINGS_FILE = get_precomputed_filenames(
+     CSV_PATH, selected_embedding_model, selected_granularity
+ )
+
+ # --- Cache management (after DOCS_FILE / EMBEDDINGS_FILE exist) ---
+ st.sidebar.markdown("### Cache")
+ if st.sidebar.button("Clear cached files for this configuration", use_container_width=True):
+     try:
+         for p in (DOCS_FILE, EMBEDDINGS_FILE):
+             if os.path.exists(p):
+                 os.remove(p)
+         # also clear Streamlit caches tied to these functions
+         try:
+             load_precomputed_data.clear() # st.cache_data func
+         except Exception:
+             pass
+         try:
+             perform_topic_modeling.clear() # st.cache_data func
+         except Exception:
+             pass
+
+         st.success("Deleted cached docs/embeddings and cleared caches. Click **Prepare Data** again.")
+         st.rerun()
+     except Exception as e:
+         st.error(f"Failed to delete cache files: {e}")
+
+ # add line break
+ st.sidebar.markdown("---")
+
+
+
+ # =====================================================================
+ # 8. Prepare Data OR Run Analysis
+ # =====================================================================
+
+ if not os.path.exists(EMBEDDINGS_FILE):
+     st.warning(f"No precomputed embeddings found for this configuration ({granularity_label} / {selected_embedding_model}).")
+
+     if st.button("Prepare Data for This Configuration"):
+         generate_and_save_embeddings(
+             CSV_PATH, DOCS_FILE, EMBEDDINGS_FILE,
+             selected_embedding_model, selected_granularity, selected_device
+         )
+
+ else:
+     # Load cached data
+     docs, embeddings = load_precomputed_data(DOCS_FILE, EMBEDDINGS_FILE)
+
+     # Coerce to 2-D float array even if saved as object
+     embeddings = np.asarray(embeddings)
+     if embeddings.dtype == object or embeddings.ndim != 2:
+         try:
+             embeddings = np.vstack(embeddings).astype(np.float32)
+         except Exception:
+             st.error("Cached embeddings are invalid. Please regenerate them for this configuration.")
+             st.stop()
+
+     # Subsample
+     if subsample_perc < 100:
+         n = int(len(docs) * (subsample_perc / 100))
+         # Fixed seed: Streamlit reruns the script on every interaction, and an
+         # unseeded draw would give a different subsample on each rerun, so the
+         # cached topic model would no longer match the docs shown/exported
+         rng = np.random.default_rng(42)
+         idx = rng.choice(len(docs), size=n, replace=False)
+         docs = [docs[i] for i in idx]
+         embeddings = np.asarray(embeddings)
+         embeddings = embeddings[idx, :] # same indices keep docs/embeddings aligned; stays 2-D
+         st.warning(f"Running analysis on {subsample_perc}% subsample ({len(docs)} documents)")
+
+     # st.metric("Documents to Analyze", len(docs), granularity_label)
+     # --- Dataset summary metrics ---
+     st.subheader("Dataset summary")
+     n_reports = count_clean_reports(CSV_PATH) # total cleaned reports in CSV
+     unit = "sentences" if selected_granularity else "reports"
+     n_units = len(docs) # actual units analyzed
+
+     c1, c2 = st.columns(2)
+     c1.metric("Reports in CSV (cleaned)", n_reports)
+     c2.metric(f"Units analysed ({unit})", n_units)
+
+     # --- Parameter controls ---
+     st.sidebar.header("Model Parameters")
+
+     use_vectorizer = st.sidebar.checkbox("Use CountVectorizer", value=True)
+
+     with st.sidebar.expander("Vectorizer"):
+         ng_min = st.slider("Min N-gram", 1, 5, 1)
+         ng_max = st.slider("Max N-gram", 1, 5, 2)
+         min_df = st.slider("Min Doc Freq", 1, 50, 1)
+         stopwords = st.select_slider("Stopwords", options=[None, "english"], value=None)
+
+     with st.sidebar.expander("UMAP"):
+         um_n = st.slider("n_neighbors", 2, 50, 15)
+         um_c = st.slider("n_components", 2, 20, 5)
+         um_d = st.slider("min_dist", 0.0, 1.0, 0.0)
+
+     with st.sidebar.expander("HDBSCAN"):
+         hs = st.slider("min_cluster_size", 5, 100, 10)
+         hm = st.slider("min_samples", 2, 100, 5)
+
+     with st.sidebar.expander("BERTopic"):
+         nr_topics = st.text_input("nr_topics", value="auto")
+         top_n_words = st.slider("top_n_words", 5, 25, 10)
+
+     # --- Build config ---
+     current_config = {
+         # Source file is part of the cache key for perform_topic_modeling,
+         # which does not hash its docs/embeddings arguments
+         "csv_path": CSV_PATH,
+         "embedding_model": selected_embedding_model,
+         "granularity": granularity_label,
+         "subsample_percent": subsample_perc,
+         "use_vectorizer": use_vectorizer,
+         "vectorizer_params": {
+             "ngram_range": (ng_min, ng_max),
+             "min_df": min_df,
+             "stop_words": stopwords,
+         },
+         "umap_params": {
+             "n_neighbors": um_n,
+             "n_components": um_c,
+             "min_dist": um_d,
+         },
+         "hdbscan_params": {
+             "min_cluster_size": hs,
+             "min_samples": hm,
+         },
+         "bt_params": {
+             "nr_topics": nr_topics,
+             "top_n_words": top_n_words,
+         },
+     }
+
+     # --- Run Button ---
+     run_button = st.sidebar.button("Run Analysis", type="primary")
+
+
+     # =================================================================
+     # 9. Visualization & History Tabs
+     # =================================================================
+     main_tab, history_tab = st.tabs(["Main Results", "Run History"])
+
+     def load_history():
+         path = HISTORY_FILE
+         if not os.path.exists(path):
+             return []
+         try:
+             # Context manager closes the file even if parsing fails
+             with open(path) as f:
+                 data = json.load(f)
+         except Exception:
+             return []
+         # --- migrate old keys for backward-compat ---
+         for e in data:
+             if "outlier_pct" not in e and "outlier_perc" in e:
+                 e["outlier_pct"] = e.pop("outlier_perc")
+         return data
+
+
+     def save_history(h):
+         with open(HISTORY_FILE, "w") as f:
+             json.dump(h, f, indent=2)
+
+     if "history" not in st.session_state:
+         st.session_state.history = load_history()
+
+     if run_button:
+
+         if not isinstance(embeddings, np.ndarray):
+             embeddings = np.asarray(embeddings)
+
+         if embeddings.dtype == object or embeddings.ndim != 2:
+             try:
+                 embeddings = np.vstack(embeddings).astype(np.float32)
+             except Exception:
+                 st.error("Cached embeddings are invalid (object/ragged). Click **Prepare Data** to regenerate.")
+                 st.stop()
+
+         if embeddings.shape[0] != len(docs):
+             st.error(f"len(docs)={len(docs)} but embeddings.shape[0]={embeddings.shape[0]}.\n"
+                      "Likely stale cache (e.g., switched sentences↔reports or model). "
+                      "Use **Clear cached files for this configuration** in the sidebar and regenerate.")
+             st.stop()
+
+
+         with st.spinner("Performing topic modeling..."):
+             model, reduced, labels, n_topics, outlier_pct = perform_topic_modeling(
+                 docs, embeddings, get_config_hash(current_config)
+             )
+         st.session_state.latest_results = (model, reduced, labels)
+
+         # Save in history
+         topic_info = model.get_topic_info()
+         entry = {
+             "timestamp": str(pd.Timestamp.now()),
+             "config": current_config,
+             "num_topics": n_topics,
+             "outlier_pct": f"{outlier_pct:.2f}%",
+             # Lite build: BERTopic's keyword-based names (key name kept for older
+             # history files); the outlier topic -1 is excluded by id, since its
+             # default name contains neither "Unlabelled" nor "outlier"
+             "llm_labels": topic_info.loc[topic_info.Topic != -1, "Name"].tolist(),
+         }
+         st.session_state.history.insert(0, entry)
+         save_history(st.session_state.history)
+         st.rerun()
+
757
+ # --- MAIN TAB ---
758
+ with main_tab:
759
+ if "latest_results" in st.session_state:
760
+ tm, reduced, labs = st.session_state.latest_results
761
+
762
+ st.subheader("Experiential Topics Visualisation")
763
+ fig, _ = datamapplot.create_plot(reduced, labs)
764
+ st.pyplot(fig)
765
+
766
+ st.subheader("Topic Info")
767
+ st.dataframe(tm.get_topic_info())
768
+
769
+
770
+ # --- Export: one row per topic (topic_id, LLM topic_name, texts) ---
771
+ st.subheader("Export results (one row per topic)")
772
+
773
+ # 1) Pull LLM labels directly from BERTopic's representation
774
+ full_reps = tm.get_topics(full=True)
775
+ llm_reps = full_reps.get("LLM", {}) # {topic_id: [(label, score), ...], ...}
776
+
777
+ # Build topic_id -> LLM label map; fall back to Name if missing
778
+ llm_names = {}
779
+ for tid, vals in llm_reps.items():
780
+ try:
781
+ llm_names[tid] = (vals[0][0] or "").strip().strip('"').strip(".")
782
+ except Exception:
783
+ llm_names[tid] = "Unlabelled"
784
+
785
+ # With LLM labelling disabled (lite build), this fallback is the normal path
786
+ if not llm_names:
787
+ # Fallback: BERTopic's default keyword-based "Name" column
788
+ st.caption("Note: Using default keyword-based topic names.")
789
+ llm_names = tm.get_topic_info().set_index("Topic")["Name"].to_dict()
790
+
791
+ # 2) Per-document assignments for current docs
792
+ doc_info = tm.get_document_info(docs)[["Document", "Topic"]]
793
+
794
+ # 3) Optionally remove outliers
795
+ include_outliers = st.checkbox("Include outlier topic (-1)", value=False)
796
+ if not include_outliers:
797
+ doc_info = doc_info[doc_info["Topic"] != -1]
798
+
799
+ # 4) Group texts by topic
800
+ grouped = (
801
+ doc_info.groupby("Topic")["Document"]
802
+ .apply(list)
803
+ .reset_index(name="texts")
804
+ )
805
+
806
+ # 5) Attach topic names (LLM labels or keyword-based fallback)
807
+ grouped["topic_name"] = grouped["Topic"].map(llm_names).fillna("Unlabelled")
808
+
809
+ # 6) Reorder/rename columns
810
+ export_topics = grouped.rename(columns={"Topic": "topic_id"})[
811
+ ["topic_id", "topic_name", "texts"]
812
+ ].sort_values("topic_id").reset_index(drop=True)
813
+
814
+ # ---- CSV + JSONL outputs ----
815
+ # Separator used to join each topic's texts into a single CSV cell
816
+ # ("\n" keeps one text per line; the CSV writer quotes such cells)
817
+ SEP = "\n"  # alternative: " || " for single-line cells
818
+
819
+ # Flatten lists for CSV
820
+ export_csv = export_topics.copy()
821
+ export_csv["texts"] = export_csv["texts"].apply(lambda lst: SEP.join(map(str, lst)))
822
+
823
+ base = os.path.splitext(os.path.basename(CSV_PATH))[0]
824
+ gran = "sentences" if selected_granularity else "reports"
825
+ csv_name = f"topics_{base}_{gran}.csv"
826
+ jsonl_name = f"topics_{base}_{gran}.jsonl"
827
+ csv_path = (EVAL_DIR / csv_name).resolve()
828
+ jsonl_path = (EVAL_DIR / jsonl_name).resolve()
829
+
830
+ cL, cC, cR = st.columns(3)
831
+
832
+ with cL:
833
+ if st.button("Save CSV to eval/", use_container_width=True):
834
+ try:
835
+ export_csv.to_csv(csv_path, index=False)
836
+ st.success(f"Saved CSV → {csv_path}")
837
+ except Exception as e:
838
+ st.error(f"Failed to save CSV: {e}")
839
+
840
+ with cC:
841
+ # JSONL preserves the list structure of texts
842
+ if st.button("Save JSONL to eval/", use_container_width=True):
843
+ try:
844
+ with open(jsonl_path, "w", encoding="utf-8") as f:
845
+ for _, row in export_topics.iterrows():
846
+ rec = {
847
+ "topic_id": int(row["topic_id"]),
848
+ "topic_name": row["topic_name"],
849
+ "texts": list(map(str, row["texts"])),
850
+ }
851
+ f.write(json.dumps(rec, ensure_ascii=False) + "\n")
852
+ st.success(f"Saved JSONL → {jsonl_path}")
853
+ except Exception as e:
854
+ st.error(f"Failed to save JSONL: {e}")
855
+
856
+ with cR:
857
+ st.download_button(
858
+ "Download CSV",
859
+ data=export_csv.to_csv(index=False).encode("utf-8"),
860
+ file_name=csv_name,
861
+ mime="text/csv",
862
+ use_container_width=True,
863
+ )
864
+
865
+ st.caption("Preview (one row per topic)")
866
+ st.dataframe(export_csv.head(10))
867
+
868
+
869
+
870
+
871
+
872
+
873
+ else:
874
+ st.info("Click 'Run Analysis' to begin.")
875
+
876
+ # --- HISTORY TAB ---
877
+ with history_tab:
878
+ st.subheader("Run History")
879
+ if not st.session_state.history:
880
+ st.info("No runs yet.")
881
+ else:
882
+ for i, entry in enumerate(st.session_state.history):
883
+ with st.expander(f"Run {i+1} — {entry['timestamp']}"):
884
+ st.write(f"**Topics:** {entry['num_topics']}")
885
+ st.write(f"**Outliers:** {entry.get('outlier_pct', entry.get('outlier_perc', 'N/A'))}")
886
+ st.write("**Topic Labels (default keywords):**")
887
+ st.write(entry["llm_labels"])
888
+ with st.expander("Show full configuration"):
889
+ st.json(entry["config"])
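The export block in the app.py diff reshapes per-document topic assignments into one row per topic, then writes a flattened CSV and a list-preserving JSONL file. Below is a minimal standalone sketch of that reshaping, assuming a `(Document, Topic)` DataFrame shaped like the one BERTopic's `get_document_info` returns and a plain `{topic_id: name}` dict. The helpers `build_topic_export` and `to_jsonl` and the sample rows are hypothetical, and neither Streamlit nor BERTopic is needed to run it.

```python
import json

import pandas as pd


def build_topic_export(doc_info, names, include_outliers=False, sep="\n"):
    """Return (one-row-per-topic DataFrame with list texts, CSV-ready copy).

    doc_info: DataFrame with "Document" and "Topic" columns (hypothetical input).
    names: mapping topic_id -> display name; missing ids become "Unlabelled".
    """
    # Drop BERTopic's outlier topic (-1) unless explicitly requested
    if not include_outliers:
        doc_info = doc_info[doc_info["Topic"] != -1]

    # Collect each topic's documents into a list, preserving input order
    grouped = (
        doc_info.groupby("Topic")["Document"]
        .apply(list)
        .reset_index(name="texts")
    )
    grouped["topic_name"] = grouped["Topic"].map(names).fillna("Unlabelled")
    export = (
        grouped.rename(columns={"Topic": "topic_id"})[["topic_id", "topic_name", "texts"]]
        .sort_values("topic_id")
        .reset_index(drop=True)
    )

    # CSV cells cannot hold lists, so join each topic's texts with `sep`
    export_csv = export.copy()
    export_csv["texts"] = export_csv["texts"].apply(lambda lst: sep.join(map(str, lst)))
    return export, export_csv


def to_jsonl(export):
    """Serialise the list-valued export as JSON Lines (one topic per line)."""
    lines = []
    for row in export.itertuples(index=False):
        rec = {
            "topic_id": int(row.topic_id),
            "topic_name": row.topic_name,
            "texts": [str(t) for t in row.texts],
        }
        lines.append(json.dumps(rec, ensure_ascii=False))
    return "\n".join(lines) + "\n"


# Hypothetical sample: four documents, one of them assigned to the outlier topic
doc_info = pd.DataFrame({
    "Document": ["waves of light", "slow breathing", "patterns", "time slowed"],
    "Topic": [0, 1, 0, -1],
})
names = {0: "0_light_patterns", 1: "1_breathing_calm"}  # hypothetical keyword names
export, export_csv = build_topic_export(doc_info, names)
print(export_csv)
print(to_jsonl(export))
```

Keeping the list-valued frame separate from the CSV copy mirrors the app's split: JSONL keeps each topic's texts as a real list, while the CSV has to join them into one string per cell.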
eval/.gitkeep ADDED
File without changes
requirements.txt ADDED
@@ -0,0 +1,14 @@
1
+ streamlit>=1.37
2
+ pandas
3
+ numpy
4
+ nltk
5
+ bertopic
6
+ umap-learn
7
+ hdbscan
8
+ scikit-learn
9
+ sentence-transformers
10
+ datamapplot
11
+ huggingface_hub
12
+ matplotlib
13
+ # llama-cpp-python
14
+ # https://github.com/romybeaute/MOSAIC/tree/mosaic2.1/src/mosaic
test_mosaic.csv ADDED
@@ -0,0 +1,97 @@
1
+ reflection_answer
2
+ I felt gentle waves of light across my vision.
3
+ "A calm, spacious feeling with slow breathing."
4
+ "Subtle pulsing behind the eyes, then deep relaxation."
5
+ "Memories surfaced briefly, then faded without emotion."
6
+ I noticed patterns when I closed my eyes.
7
+ Time felt slower for a few moments.
8
+ Warmth in my hands and a sense of clarity.
9
+ Sound felt crisper and more present.
10
+ I felt gentle waves of light across my vision.
11
+ "A calm, spacious feeling with slow breathing."
12
+ "Subtle pulsing behind the eyes, then deep relaxation."
13
+ "Memories surfaced briefly, then faded without emotion."
14
+ I noticed patterns when I closed my eyes.
15
+ Time felt slower for a few moments.
16
+ Warmth in my hands and a sense of clarity.
17
+ Sound felt crisper and more present.
18
+ I felt gentle waves of light across my vision.
19
+ "A calm, spacious feeling with slow breathing."
20
+ "Subtle pulsing behind the eyes, then deep relaxation."
21
+ "Memories surfaced briefly, then faded without emotion."
22
+ I noticed patterns when I closed my eyes.
23
+ Time felt slower for a few moments.
24
+ Warmth in my hands and a sense of clarity.
25
+ Sound felt crisper and more present.
26
+ I felt gentle waves of light across my vision.
27
+ "A calm, spacious feeling with slow breathing."
28
+ "Subtle pulsing behind the eyes, then deep relaxation."
29
+ "Memories surfaced briefly, then faded without emotion."
30
+ I noticed patterns when I closed my eyes.
31
+ Time felt slower for a few moments.
32
+ Warmth in my hands and a sense of clarity.
33
+ Sound felt crisper and more present.
34
+ I felt gentle waves of light across my vision.
35
+ "A calm, spacious feeling with slow breathing."
36
+ "Subtle pulsing behind the eyes, then deep relaxation."
37
+ "Memories surfaced briefly, then faded without emotion."
38
+ I noticed patterns when I closed my eyes.
39
+ Time felt slower for a few moments.
40
+ Warmth in my hands and a sense of clarity.
41
+ Sound felt crisper and more present.
42
+ I felt gentle waves of light across my vision.
43
+ "A calm, spacious feeling with slow breathing."
44
+ "Subtle pulsing behind the eyes, then deep relaxation."
45
+ "Memories surfaced briefly, then faded without emotion."
46
+ I noticed patterns when I closed my eyes.
47
+ Time felt slower for a few moments.
48
+ Warmth in my hands and a sense of clarity.
49
+ Sound felt crisper and more present.
50
+ I felt gentle waves of light across my vision.
51
+ "A calm, spacious feeling with slow breathing."
52
+ "Subtle pulsing behind the eyes, then deep relaxation."
53
+ "Memories surfaced briefly, then faded without emotion."
54
+ I noticed patterns when I closed my eyes.
55
+ Time felt slower for a few moments.
56
+ Warmth in my hands and a sense of clarity.
57
+ Sound felt crisper and more present.
58
+ I felt gentle waves of light across my vision.
59
+ "A calm, spacious feeling with slow breathing."
60
+ "Subtle pulsing behind the eyes, then deep relaxation."
61
+ "Memories surfaced briefly, then faded without emotion."
62
+ I noticed patterns when I closed my eyes.
63
+ Time felt slower for a few moments.
64
+ Warmth in my hands and a sense of clarity.
65
+ Sound felt crisper and more present.
66
+ I felt gentle waves of light across my vision.
67
+ "A calm, spacious feeling with slow breathing."
68
+ "Subtle pulsing behind the eyes, then deep relaxation."
69
+ "Memories surfaced briefly, then faded without emotion."
70
+ I noticed patterns when I closed my eyes.
71
+ Time felt slower for a few moments.
72
+ Warmth in my hands and a sense of clarity.
73
+ Sound felt crisper and more present.
74
+ I felt gentle waves of light across my vision.
75
+ "A calm, spacious feeling with slow breathing."
76
+ "Subtle pulsing behind the eyes, then deep relaxation."
77
+ "Memories surfaced briefly, then faded without emotion."
78
+ I noticed patterns when I closed my eyes.
79
+ Time felt slower for a few moments.
80
+ Warmth in my hands and a sense of clarity.
81
+ Sound felt crisper and more present.
82
+ I felt gentle waves of light across my vision.
83
+ "A calm, spacious feeling with slow breathing."
84
+ "Subtle pulsing behind the eyes, then deep relaxation."
85
+ "Memories surfaced briefly, then faded without emotion."
86
+ I noticed patterns when I closed my eyes.
87
+ Time felt slower for a few moments.
88
+ Warmth in my hands and a sense of clarity.
89
+ Sound felt crisper and more present.
90
+ I felt gentle waves of light across my vision.
91
+ "A calm, spacious feeling with slow breathing."
92
+ "Subtle pulsing behind the eyes, then deep relaxation."
93
+ "Memories surfaced briefly, then faded without emotion."
94
+ I noticed patterns when I closed my eyes.
95
+ Time felt slower for a few moments.
96
+ Warmth in my hands and a sense of clarity.
97
+ Sound felt crisper and more present.
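The README says the app accepts a CSV with one text column named `reflection_answer_english`, `reflection_answer`, `text` or `report`; this test file uses `reflection_answer` and repeats the same 8 sentences 12 times (96 rows). A minimal sketch of picking the first matching column, assuming the README's order is the priority order (the app's actual selection code is not part of this diff). `load_texts` and `TEXT_COLUMNS` are hypothetical names.

```python
import io

import pandas as pd

# Column names listed in the README; treating this order as the priority is an assumption
TEXT_COLUMNS = ["reflection_answer_english", "reflection_answer", "text", "report"]


def load_texts(csv_source):
    """Return (column_name, list_of_texts) for the first recognised text column."""
    df = pd.read_csv(csv_source)
    for col in TEXT_COLUMNS:
        if col in df.columns:
            # Drop missing values and empty strings after trimming whitespace
            texts = df[col].dropna().astype(str).str.strip()
            return col, texts[texts != ""].tolist()
    raise ValueError(f"No text column found; expected one of {TEXT_COLUMNS}")


# A three-row excerpt in the same format as test_mosaic.csv (quoted cells may contain commas)
sample = io.StringIO(
    "reflection_answer\n"
    "I felt gentle waves of light across my vision.\n"
    '"A calm, spacious feeling with slow breathing."\n'
    "I felt gentle waves of light across my vision.\n"
)
col, texts = load_texts(sample)
print(col, len(texts), len(set(texts)))  # reflection_answer 3 2
```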