Session history (Oct 25, 2025)
This document captures the key decisions, features added, and workflows established in the current development session so that future runs have quick context.
Highlights
- Added a new plot: daily_volume_and_sentiment.png, showing bars for total volume (posts + replies) and lines for positive % and negative % per day.
- Improved the daily activity chart with in-plot match labels (team abbreviations), density controls, and dynamic width/height.
- Implemented matchday sentiment rollups and plots: matchday_sentiment_overall.csv/.png, matchday_posts_volume_vs_sentiment.png.
- Integrated multiple sentiment backends:
  - VADER (default)
  - Transformers (local model at models/sentiment-distilbert)
  - Local GPT via Ollama (JSON {label, confidence} mapped to compound) with graceful fallback to VADER
- Labeled data workflow:
  - src/apply_labels.py merges labels back into posts/replies as sentiment_label
  - Analyzer reuses sentiment_label when present
  - src/plot_labeled.py provides QA plots
- Convenience: created a run_all alias to run the full pipeline from scratch (scrape → replies → fixtures → analyze) non-interactively.
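
As a rough illustration of the Ollama backend noted above, the sketch below maps a JSON {label, confidence} reply onto a VADER-style compound score in [-1, 1] and falls back to a caller-supplied scorer when the reply is unusable. The function names, the label vocabulary, and the sign-times-confidence mapping are assumptions made for illustration; the actual logic lives in src/gpt_sentiment.py and may differ.

```python
import json
from typing import Callable

# Assumed label vocabulary; the real backend may use different labels.
_SIGN = {"positive": 1.0, "neutral": 0.0, "negative": -1.0}


def label_to_compound(label: str, confidence: float) -> float:
    """Map a (label, confidence) pair to a compound-like score in [-1, 1]."""
    sign = _SIGN[label.strip().lower()]  # raises KeyError on unknown labels
    clipped = max(0.0, min(1.0, float(confidence)))  # keep confidence in [0, 1]
    return sign * clipped


def score_reply(raw_reply: str, text: str, fallback: Callable[[str], float]) -> float:
    """Parse a JSON model reply; use fallback(text) if the reply is unusable."""
    try:
        payload = json.loads(raw_reply)
        return label_to_compound(payload["label"], payload["confidence"])
    except (ValueError, KeyError, TypeError, AttributeError):
        # Malformed JSON, missing keys, or unexpected types:
        # defer to the fallback scorer (for example, a VADER wrapper).
        return fallback(text)
```

Passing the fallback as a plain callable keeps the sketch independent of any particular VADER package; any function that takes text and returns a compound score would fit.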
Key files and outputs
- Code
  - src/analyze_csv.py: analyzer with plots and matchday integration (now with module docstring)
  - src/gpt_sentiment.py, src/transformer_sentiment.py, src/auto_label_sentiment.py, src/apply_labels.py, src/plot_labeled.py
  - scripts/aliases.zsh: includes run_all, apply_labels_and_analyze, and more
- Outputs (examples)
  - data/daily_activity_stacked.png
  - data/daily_volume_and_sentiment.png
  - data/posts_heatmap_hour_dow.png
  - data/sentiment_by_tag_posts.png
  - data/matchday_sentiment_overall.csv/.png
  - data/matchday_posts_volume_vs_sentiment.png
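
For orientation, the per-day series behind data/daily_volume_and_sentiment.png (total volume plus positive % and negative %) could be computed along these lines. This is a minimal standard-library sketch: the input shape of (date, label) pairs and the function name are hypothetical, and how the real analyzer derives the positive and negative classes is not stated here.

```python
from collections import defaultdict


def daily_volume_and_sentiment(rows):
    """Aggregate (date_str, label) pairs covering posts and replies.

    Returns {date: (total, positive_pct, negative_pct)}, ordered by date.
    """
    counts = defaultdict(lambda: {"total": 0, "positive": 0, "negative": 0})
    for day, label in rows:
        bucket = counts[day]
        bucket["total"] += 1
        if label in ("positive", "negative"):
            bucket[label] += 1  # neutral items only count toward the total
    return {
        day: (
            c["total"],
            100.0 * c["positive"] / c["total"],
            100.0 * c["negative"] / c["total"],
        )
        for day, c in sorted(counts.items())
    }
```

The totals would feed the bars and the two percentages the lines; ISO-formatted date strings sort chronologically, which is why a plain sort is enough here.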
Important flags (analyze)
- Sizing: --plot-width-scale, --plot-max-width, --plot-height
- Labels: --activity-top-n, --labels-max-per-day, --labels-per-line, --labels-stagger-rows, --labels-band-y, --labels-annotate-mode
- Sentiment backends: --sentiment-backend vader|transformers|gpt, plus --transformers-model or --gpt-model/--gpt-base-url
- Emoji: --emoji-mode keep|demojize|strip and --emoji-boost
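
To show how the backend and emoji flags fit together, here is a hedged argparse sketch covering a subset of them. It assumes the analyzer parses its options with argparse; the defaults other than vader, and the model name in the usage line, are placeholders rather than values taken from src/analyze_csv.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical subset of the analyzer flags listed above."""
    parser = argparse.ArgumentParser(description="Sketch of analyzer flags")
    parser.add_argument(
        "--sentiment-backend",
        choices=["vader", "transformers", "gpt"],
        default="vader",  # VADER is the documented default backend
    )
    parser.add_argument("--transformers-model", default=None)
    parser.add_argument("--gpt-model", default=None)
    parser.add_argument("--gpt-base-url", default=None)
    parser.add_argument(
        "--emoji-mode",
        choices=["keep", "demojize", "strip"],
        default="keep",  # assumed default; the real one may differ
    )
    return parser


# Example: select the GPT backend with a placeholder model name.
args = build_parser().parse_args(
    ["--sentiment-backend", "gpt", "--gpt-model", "some-local-model"]
)
print(args.sentiment_backend, args.gpt_model, args.emoji_mode)
```

Using choices makes argparse reject an unknown backend or emoji mode at parse time instead of failing later in the run.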
Aliases summary
- run_all [CH] [START] [END] [POSTS] [REPLIES] [FIXTURES] [TAGS] [SESS_SCRAPE] [SESS_REPLIES] [CONC] [BACKEND] [MODEL] [GPT_MODEL] [GPT_URL]
  - Full pipeline, non-interactive; defaults are set in scripts/aliases.zsh
- apply_labels_and_analyze [LABELED_CSV] [POSTS_IN] [REPLIES_IN] [POSTS_OUT] [REPLIES_OUT]
- analyze_transformers, analyze_emoji, analyze_combined, fast_replies, chunked_forwards, plot_labeled
Old vs New outputs
- We maintain side-by-side outputs under data/old and data/new when running the legacy vs labeled pipelines.
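
One way to line up the two output trees for comparison is to pair files that share a name in both directories. The helper below is a sketch only: pair_outputs is a hypothetical name, and it assumes both pipelines write identically named files directly into data/old and data/new.

```python
from pathlib import Path


def pair_outputs(old_dir: str, new_dir: str, suffix: str = ".png") -> list[tuple[Path, Path]]:
    """Return (old, new) path pairs for files present in both directories."""
    old_files = {p.name: p for p in Path(old_dir).glob(f"*{suffix}")}
    new_files = {p.name: p for p in Path(new_dir).glob(f"*{suffix}")}
    shared = sorted(old_files.keys() & new_files.keys())  # names found on both sides
    return [(old_files[name], new_files[name]) for name in shared]


# Usage, from the repository root:
#   for old_path, new_path in pair_outputs("data/old", "data/new"):
#       print(old_path, "<->", new_path)
```

Files that exist on only one side are skipped silently, so it may be worth logging the unmatched names when the two runs are expected to produce the same set of plots.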
Next ideas
- Per-club matchday sentiment breakdowns (fixture-level small multiples)
- Side-by-side montage generation for old vs new plots