  • Rust 86.4%
  • Python 5.8%
  • Svelte 4.1%
  • TypeScript 2.6%
  • Shell 0.9%
crates fix soulseek downloader 2026-05-18 20:09:41 -07:00
data first pass 2026-03-27 17:28:36 -07:00
docs backlog update 2026-05-17 14:37:16 -07:00
frontend backlog update 2026-05-17 14:37:16 -07:00
mk genre import 2026-05-13 19:44:48 -07:00
src fix soulseek downloader 2026-05-18 20:09:41 -07:00
tests fix soulseek downloader 2026-05-18 20:09:41 -07:00
tools genre import 2026-05-13 19:44:48 -07:00
.gitignore todo 2026-05-14 00:36:07 -07:00
backlog-methodology.md backlog checkpoint 2026-04-15 13:41:33 -07:00
Cargo.lock rename, backlog 2026-05-16 22:44:24 -07:00
Cargo.toml rename, backlog 2026-05-16 22:44:24 -07:00
Dockerfile genre import 2026-05-13 19:44:48 -07:00
README.md improvements 2026-05-14 20:21:31 -07:00
rust-toolchain.toml first pass 2026-03-27 17:28:36 -07:00
TODO.md first pass 2026-03-27 17:28:36 -07:00
todo.md todo 2026-05-14 00:36:07 -07:00
TUTORIAL.md improvements 2026-05-14 20:21:31 -07:00

Music Recommender

A Bay Area-first music discovery engine that starts from upcoming local shows, then helps you figure out what to listen to next.

Current scope

  • Ingest local shows from 19hz and Foopee via the structured crawler on localhost:11235
  • Capture editorial recommendation signals from NPR, The Needle Drop, and KEXP
  • Store artists, venues, events, genres, images, and source evidence in SQLite
  • Enrich artists with Every Noise and MusicBrainz genre data
  • Produce genre-first artist recommendations using simple likes and dislikes
  • Import persistent loved-artist assertions from CSV and use them as cautious taste signals
  • Record attended shows directly from the CLI or CSV with artist/venue resolution and unresolved review support
  • Use stored loved-artist and attended-show history as light, explicit taste signals in recommendation scoring

Quick start

cargo run -- init-db
cargo run -- doctor
cargo run -- musicbrainz-health
cargo run -- add-loved-artist Broadcast --source manual --note "favorite band"
cargo run -- import-loved-artists ./loved-artists.csv --default-source manual
cargo run -- list-loved-artists
cargo run -- add-attended-show --artist Broadcast --venue "Fox Theater" --source manual
cargo run -- import-attended-shows ./attended-shows.csv --default-source history.csv
cargo run -- list-attended-shows --unresolved-only
cargo run -- completions --all-shells --output-dir ./completions
cargo run -- ingest-all --artist-limit 500
cargo run -- recommend --liked-genre punk --disliked-genre edm
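
The import-attended-shows command above reads an artists column whose values are split by |. As a rough illustration only, the splitting step might look like the sketch below. The function name split_artists is hypothetical, and trimming whitespace and dropping empty entries are assumptions, not confirmed behavior of the real importer.

```rust
/// Hypothetical helper: split one `artists` CSV cell on `|`.
/// Assumption: surrounding whitespace is trimmed and empty entries are dropped;
/// the real importer may treat these cases differently.
fn split_artists(cell: &str) -> Vec<String> {
    cell.split('|')
        .map(str::trim)
        .filter(|name| !name.is_empty())
        .map(str::to_owned)
        .collect()
}
```

Under these assumptions, a cell such as Broadcast | Visages yields two artist names, each of which would then go through the artist resolution path described in the notes.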

Notes

  • The app expects the crawler service at http://localhost:11235/crawl
  • The app now defaults to XDG locations: ~/.cache/music-recommender, ~/.config/music-recommender, and ~/.local/share/music-recommender
  • On first run, the app writes a config template to ~/.config/music-recommender/config.toml
  • Point the app at your own MusicBrainz instance by setting musicbrainz_domain = "192.168.50.119:5000" or musicbrainz_domain = "https://musicbrainz.example.com" in the config; the app derives the /ws/2 base URL automatically
  • musicbrainz_base_url still exists as an explicit full-URL override and takes precedence over musicbrainz_domain
  • crawler_url in config may be a comma-separated fallback list
  • Set searx_search_url and crawl4ai_url in ~/.config/music-recommender/config.toml or via SEARX_SEARCH_URL / CRAWL4AI_URL; both accept comma-separated failover lists tried in order, and crawl4ai_url can be either the bare domain or the full /md endpoint
  • The default SQLite database lives at ~/.local/share/music-recommender/music-recommender.sqlite3
  • On first run, the bundled Every Noise seed file is copied into ~/.local/share/music-recommender/everynoise_genres_20260317_135315.json unless MUSIC_RECOMMENDER_EVERYNOISE_PATH is set
  • Crawler and MusicBrainz responses are cached under ~/.cache/music-recommender
  • Show crawler responses default to a 24-hour TTL and signal crawler responses default to a 48-hour TTL
  • Configure crawler TTLs with crawler_show_cache_ttl_secs and crawler_signal_cache_ttl_secs
  • The crawler also has configurable timeout and retry settings: crawler_timeout_secs, crawler_max_retries, and crawler_retry_base_delay_ms
  • Set musicbrainz_user_agent in ~/.config/music-recommender/config.toml or via MUSIC_RECOMMENDER_MUSICBRAINZ_USER_AGENT; use a real contactable value like music-recommender/0.1.0 (you@example.com)
  • MusicBrainz lookups retry transient failures; for public MusicBrainz the default is a small self-throttle with parallelism 1, while self-hosted MusicBrainz defaults to no throttle and higher musicbrainz_parallelism
  • Tune MusicBrainz behavior with musicbrainz_min_interval_ms, musicbrainz_max_retries, musicbrainz_retry_base_delay_ms, and musicbrainz_parallelism
  • Use cargo run -- doctor to print resolved config and endpoint reachability
  • Use cargo run -- musicbrainz-health to run a fast MusicBrainz-specific probe that validates both artist search and artist lookup with usable genre data
  • Use cargo run -- import-loved-artists ./loved-artists.csv --default-source manual to store explicit loved-artist assertions from CSV and immediately try to enrich them with MusicBrainz genres
  • Use cargo run -- add-loved-artist Broadcast --source manual for quick one-off loved-artist entry without preparing a CSV first
  • Use cargo run -- list-loved-artists to inspect the stored per-source loved-artist assertions, including notes, confidence, and any persisted MBID
  • Use cargo run -- add-attended-show --artist "Visages" --venue "Public Works" --source manual to log attended shows directly from the CLI
  • Use cargo run -- import-attended-shows ./attended-shows.csv --default-source history.csv to import attended-show history from CSV using an artists column split by |
  • Use cargo run -- ingest-signals --source kexp --refresh to ingest a broader recent KEXP review set plus a recent KEXP playlist snapshot
  • add-attended-show tries exact local resolution first, then MusicBrainz for artists; unresolved venue/artist names are preserved instead of discarded
  • import-attended-shows uses the same resolution path and also merges likely duplicates so a corrected typo re-import can land on the earlier unresolved row instead of creating a second copy
  • Use cargo run -- list-attended-shows --unresolved-only to review typo-prone or still-unmatched attended-show rows and see likely local candidates
  • KEXP review mentions and KEXP playlist plays are now stored as separate editorial sources, so recommendation buzz/source diversity can treat them comparably to NPR and The Needle Drop support
  • KEXP review and playlist support can also weakly reinforce an artist's already-strong genres during enrichment, so KEXP affects genre overlap as well as editorial buzz
  • If the local crawler is down, ingest-signals --source kexp now warns and continues with the playlist snapshot instead of failing the whole KEXP ingest
  • Use -v or --verbose on any command to print detailed request/progress output
  • Imported loved artists and resolved attended-show artists now feed recommend automatically as light exact-match and cautious genre-overlap signals
  • recommend now defaults to compact output; add --explain if you want the full accumulated reason text
  • recommend now caches results under ~/.cache/music-recommender/recommend for 15 minutes and invalidates when recommendation-relevant data changes
  • Use cargo run -- recommend --refresh to force a recompute, or cargo run -- recommend --no-cache to bypass the recommendation cache entirely
  • Use cargo run -- recommend --after 10d --before 30d to limit recommendations to shows happening within a relative day window
  • Use cargo run -- recommend --after 2026-05-01 --before 2026-06-30 to filter by absolute show-day bounds
  • --after and --before work at the local event-day level and combine with --horizon-days; invalid ranges where --after resolves later than --before fail fast
  • Use cargo run -- completions --all-shells --output-dir ./completions to generate Bash, Zsh, Fish, Elvish, Nushell, and PowerShell completions plus an install guide
  • Use cargo run -- completions --shell zsh --stdout to print a single shell script directly to stdout
  • Use cargo run -- ingest-all --artist-limit 500 to run ingest-shows, ingest-signals, and enrich-artists in one command
  • Add --check-musicbrainz to enrich-artists or ingest-all to run the fast MusicBrainz preflight before any MusicBrainz-backed work starts and fail early if it does not pass
  • ingest-all fails fast by default; add --allow-signal-ingest-failures if you want it to continue into enrichment when editorial signal fetching fails
  • Use --cache-ttl-hours <n> on ingest-shows or ingest-signals to override crawler TTLs for that run
  • Use --show-cache-ttl-hours <n> and --signal-cache-ttl-hours <n> on ingest-all to override stage-specific crawler TTLs
  • Use --cache-forever on ingest-shows or ingest-signals to reuse the last crawler response regardless of age
  • Use --show-cache-forever and --signal-cache-forever on ingest-all to reuse stale cache for those stages
  • Use --refresh to skip cache reads and write fresh responses, or --no-cache to skip both reads and writes entirely
  • The Rust implementation avoids unsafe
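
The crawler cache flags described in the notes (TTL overrides, --cache-forever, --refresh, --no-cache) can be summarized as a small decision table. The sketch below is one possible way to model it, not the project's actual code: the names CacheMode, CachePlan, and plan are hypothetical, and treating an entry whose age equals the TTL as still fresh is an assumption. The 24-hour and 48-hour defaults come from the notes.

```rust
use std::time::Duration;

/// Default TTLs stated in the notes: 24 hours for shows, 48 hours for signals.
const SHOW_TTL: Duration = Duration::from_secs(24 * 60 * 60);
const SIGNAL_TTL: Duration = Duration::from_secs(48 * 60 * 60);

/// Hypothetical model of the cache-related flags.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CacheMode {
    /// Default, or --cache-ttl-hours <n>: reuse entries no older than the TTL.
    Ttl(Duration),
    /// --cache-forever: reuse the last response regardless of age.
    Forever,
    /// --refresh: skip cache reads, write the fresh response.
    Refresh,
    /// --no-cache: skip both reads and writes.
    NoCache,
}

/// What a single fetch should do with the cache.
#[derive(Debug, PartialEq, Eq)]
struct CachePlan {
    use_cached: bool,
    write_fresh: bool,
}

/// Decide how to treat the cache, given the age of any existing entry.
/// `cached_age` is `None` when nothing is cached yet.
fn plan(mode: CacheMode, cached_age: Option<Duration>) -> CachePlan {
    let reuse = CachePlan { use_cached: true, write_fresh: false };
    let fetch_and_store = CachePlan { use_cached: false, write_fresh: true };
    match (mode, cached_age) {
        (CacheMode::NoCache, _) => CachePlan { use_cached: false, write_fresh: false },
        (CacheMode::Refresh, _) => fetch_and_store,
        (CacheMode::Forever, Some(_)) => reuse,
        // Assumption: an entry exactly as old as the TTL still counts as fresh.
        (CacheMode::Ttl(ttl), Some(age)) if age <= ttl => reuse,
        // Missing or expired entry: fetch and store a fresh response.
        _ => fetch_and_store,
    }
}
```

In this model, a 30-hour-old show response would be refetched under the default show TTL but reused under --cache-forever, while the same age would still be fresh for signals under their 48-hour default.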