# CSV Cleaner

Profile and clean tabular data (CSV/TSV/Excel) with pandas.

## What it does

- Profiles a file: row/col counts, dtypes, null counts, duplicates, examples.
- Cleans explicitly: trim whitespace, dedupe, snake_case headers, drop empty columns, coerce column types.
- Always writes to a new file; never mutates the original.

## Files

| File | Purpose |
|------|---------|
| `SKILL.md` | Instructions the agent receives on activation |
| `scripts/clean.py` | Profile and clean a CSV/TSV/Excel file |
| `references/recipes.md` | Common cleaning recipes |

## Requirements

Installs `pandas` (and `openpyxl` for Excel) on first use.

## License

Apache-2.0.

---
name: csv-cleaner
display_name: CSV Cleaner
description: "Profile and clean tabular data (CSV/TSV/Excel): inspect columns, types, nulls, and duplicates, then trim whitespace, fix types, dedupe, and standardize headers. Use when the user has a messy spreadsheet or CSV and wants it profiled or cleaned before analysis. Do NOT use for charting (use a plotting skill) or for writing complex documents."
license: Apache-2.0
---

# CSV Cleaner

Profile a tabular file and apply safe, explicit cleaning steps with pandas.

## When to use

The user has a CSV/TSV/Excel file that is messy (inconsistent headers, stray whitespace, wrong types, nulls, or duplicate rows) and wants it profiled or cleaned.

## Execution steps

1. **Profile first**: `python scripts/clean.py profile data.csv`. It reports row/col counts, per-column dtype, null counts, duplicate-row count, and example values. Always profile before changing anything.
2. **Decide the cleaning steps** with the user based on the profile. See `references/recipes.md` for common fixes.
3. **Clean**: `python scripts/clean.py clean data.csv -o clean.csv` with flags:
   - `--strip` trim whitespace in string cells
   - `--dedupe` drop exact duplicate rows
   - `--snake-headers` normalize headers to snake_case
   - `--drop-empty-cols` drop all-null columns
   - `--coerce col:type` coerce a column (`int`, `float`, `datetime`, `str`)
4. **Report** what changed (rows before/after, columns affected) and the output path.

## Rules

- Profile before cleaning; never mutate blindly.
- Every transformation must be explicit and reported; no silent dropping.
- Preserve the original file; always write to a new `-o` output.

## Available resources

- `scripts/clean.py`: profile and clean a CSV/TSV/Excel file (installs pandas).
- `references/recipes.md`: common cleaning recipes and when to apply them.
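
## Illustrative sketch

The profile and clean steps above could be approximated in pandas roughly as follows. This is a minimal sketch, not the contents of `scripts/clean.py`: the function names `load_table`, `profile`, `snake_case`, and `clean`, their keyword arguments, and the order in which the steps run are all assumptions made for illustration. The keyword arguments are only meant to mirror the documented flags.

```python
import re
from pathlib import Path

import pandas as pd


def load_table(path):
    """Hypothetical loader: choose a pandas reader from the file suffix."""
    suffix = Path(path).suffix.lower()
    if suffix in {".xlsx", ".xls"}:
        return pd.read_excel(path)  # .xlsx needs openpyxl installed
    sep = "\t" if suffix == ".tsv" else ","
    return pd.read_csv(path, sep=sep)


def profile(df):
    """Summarize shape, dtypes, null counts, duplicate rows, and example values."""
    return {
        "rows": len(df),
        "cols": df.shape[1],
        "dtypes": {c: str(t) for c, t in df.dtypes.items()},
        "nulls": {c: int(n) for c, n in df.isna().sum().items()},
        "duplicate_rows": int(df.duplicated().sum()),
        "examples": {c: df[c].dropna().head(3).tolist() for c in df.columns},
    }


def snake_case(name):
    """Lowercase a header and collapse non-alphanumeric runs into underscores."""
    return re.sub(r"[^0-9a-zA-Z]+", "_", str(name).strip()).strip("_").lower()


def clean(df, strip=False, dedupe=False, snake_headers=False,
          drop_empty_cols=False, coerce=None):
    """Return a cleaned copy of df; the input frame is never modified."""
    out = df.copy()
    if snake_headers:
        out.columns = [snake_case(c) for c in out.columns]
    if strip:
        # Trim only string cells; numbers and nulls pass through unchanged.
        for col in out.columns:
            out[col] = out[col].map(
                lambda v: v.strip() if isinstance(v, str) else v
            )
    if drop_empty_cols:
        out = out.dropna(axis=1, how="all")
    # Assumption: coerce keys name columns as they are after header normalization.
    for col, kind in (coerce or {}).items():
        if kind == "int":
            out[col] = pd.to_numeric(out[col]).astype("Int64")  # nullable integer
        elif kind == "float":
            out[col] = pd.to_numeric(out[col]).astype("float64")
        elif kind == "datetime":
            out[col] = pd.to_datetime(out[col])
        elif kind == "str":
            out[col] = out[col].astype("string")
        else:
            raise ValueError(f"unsupported type: {kind}")
    if dedupe:
        # Run after stripping so rows that differ only by whitespace collapse.
        out = out.drop_duplicates()
    return out
```

In this sketch, a caller would profile the loaded frame, agree on the steps with the user, call `clean(...)`, and then write the result to a new path with `to_csv(output_path, index=False)` so the original file stays untouched. Comparing `len(df)` and the column list before and after gives the numbers for the report step.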
CSV Cleaner by langbot-team
Profile and clean tabular data (CSV/TSV/Excel): types, nulls, duplicates, headers.