You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
 
 
Music/docs/superpowers/specs/2026-05-30-itunes-date-back...

97 lines
4.7 KiB

# iTunes/Music → app DB date & stats backfill (one-time)
Date: 2026-05-30
## Problem
`ScannerService.extractMetadata` sets `dateAdded: Date()` at scan time
(`Music/Services/ScannerService.swift:186`), so every track's "added date" in the
app DB is really its *scan* date, not the date the user originally added it in
Apple Music. The user wants the **true `Date Added`** (and, since Music.app tracks
them too, `Play Count`, `Rating`, and last-played) copied from their Apple Music
library into the app's SQLite database.
## Context (verified)
- The app is **sandboxed**. Current `PRODUCT_BUNDLE_IDENTIFIER` is `com.staxriver.mu`
(HEAD and working tree; not part of the uncommitted diff). The live DB is therefore at:
`~/Library/Containers/com.staxriver.mu/Data/Library/Application Support/Music/db.sqlite`
- **The real app and real library are on a different computer.** This machine only has a
3-track dev DB. The script must be portable and is intended to run on the other Mac.
- The user confirmed the audio files are the **same files** Apple Music references (they
live inside Apple Music's media folder, e.g.
`~/Music/Music/Media.localized/Music/...`). So the join key is the **file path**.
- `tracks.dateAdded` is a GRDB `.datetime` column, stored as the string
`YYYY-MM-DD HH:MM:SS.SSS` in UTC (confirmed from existing rows, e.g.
`2026-05-24 06:46:01.713`). GRDB is lenient on read, so `....000` round-trips.
- App rating scale is **0–5 stars** (`TrackTableView.swift:284` renders
`String(repeating: "★", count: track.rating)`). Music.app stores 0–100, so map `// 20`.
## Approach
A single **stdlib-only Python 3 script**, run once. Source of truth is the Music.app
**File ▸ Library ▸ Export Library…** XML plist (chosen over live AppleScript: no
Automation prompt, no timeouts, trivially parseable with `plistlib`). On matched tracks
it does a **blunt overwrite** of all four fields, with Music.app as the source of truth.
## Matching (the bug-prone part)
Join Music.app `Location` to `tracks.fileURL` on a **normalized decoded POSIX path**,
not raw URL strings. `norm_path()`:
1. strip leading `file://`, then optional `localhost` host segment,
2. percent-decode (`urllib.parse.unquote`),
3. `unicodedata.normalize("NFC", …)` — neutralizes the accented-filename NFC/NFD mismatch
between APFS storage and the two URL string sources,
4. strip a trailing slash.
Music tracks with no `Location` (Apple Music streaming entries) are skipped.
## Field mapping (matched rows only; blunt overwrite)
| Column | XML key | Rule |
|----------------|------------------|-------------------------------------------------------------|
| `dateAdded` | `Date Added` | `%Y-%m-%d %H:%M:%S.000` UTC. If absent, keep existing (col is NOT NULL). |
| `playCount` | `Play Count` | integer, `0` if absent. |
| `rating` | `Rating` (0–100) | `// 20` → 0–5, `0` if absent. |
| `lastPlayedAt` | `Play Date UTC` | same date format, or `NULL` if absent. |
## Safety
- **Dry-run by default**: prints match rate, a sample of before→after changes, and the
counts + samples of unmatched-in-DB and unmatched-in-XML. Writes nothing.
- `--apply`: first copies `db.sqlite` + `-wal` + `-shm` to a timestamped backup, then
performs all writes in a single transaction, then `PRAGMA wal_checkpoint(TRUNCATE)`.
Reversible by restoring the backup.
- The app must be **quit** before running so the sandbox DB isn't mid-write.
## CLI
```
python3 scripts/backfill_itunes_dates.py --xml <Library.xml> [--db <path>] [--apply] [--self-test]
```
- Default `--db`: `~/Library/Containers/com.staxriver.mu/Data/Library/Application Support/Music/db.sqlite`
computed from `$HOME`, so it resolves on the other Mac too.
## Testing
`scripts/test_backfill_itunes_dates.py` (stdlib `unittest`):
- `norm_path`: NFC/NFD equivalence, `file://localhost/` form, percent-encoding,
filenames with spaces/`#`/parentheses/apostrophes.
- `build_updates`: date formatting, rating `// 20`, playCount & lastPlayed present/absent,
unmatched-row handling.
- Integration: a temp SQLite DB with the real `tracks` schema seeded with the user's actual
3 track paths + scan dates and a synthetic Library.xml → `apply` → assert rows updated.
## Delivery
Script + test live in the repo under `scripts/`. The user commits (via `/commit`), pushes,
pulls on the real machine, runs **File ▸ Library ▸ Export Library…** there, then runs the
dry-run, eyeballs the match rate, and re-runs with `--apply`.
## Out of scope
Scanning the library into the app (the user does that in-app first), ongoing/automatic sync,
and non-file (streaming) tracks.