bytefreq is the command-line version of DataRadar, designed for power users who need to profile large datasets (millions of rows) on their desktop or in CI/CD pipelines. Built in Rust with multi-threaded processing, it's fast, reliable, and open source.
You'll need the Rust toolchain installed:
# Install Rust (if not already installed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# Install bytefreq from the GitHub repository
cargo install --git https://github.com/minkymorgan/bytefreq
# Verify the installation
bytefreq --version
# Pipe CSV data to bytefreq
cat data.csv | bytefreq
# Use high-grain Unicode masking
cat data.csv | bytefreq -g HU
# Profile NDJSON data
cat data.json | bytefreq -f json
# Low-grain masking for compressed patterns
cat data.json | bytefreq -f json -g LU
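To give a feel for what the grain settings mean, here is a rough sketch of mask-based profiling. It assumes the usual convention (uppercase letters become `A`, lowercase become `a`, digits become `9`, and low grain collapses runs of repeated mask characters). The function names `mask_high` and `mask_low` are hypothetical, the sketch is ASCII-only, and it is not bytefreq's actual implementation.

```shell
#!/bin/sh
# Illustrative only: an ASCII approximation of mask-based profiling.
# mask_high and mask_low are hypothetical helpers, not part of bytefreq.

# High grain: one mask character per input character.
mask_high() {
    printf '%s\n' "$1" |
        LC_ALL=C sed -e 's/[A-Z]/A/g' -e 's/[a-z]/a/g' -e 's/[0-9]/9/g'
}

# Low grain: same mask, with runs of repeated mask characters squeezed to one.
mask_low() {
    mask_high "$1" | tr -s 'Aa9'
}

mask_high 'SW1A 2AA'   # prints: AA9A 9AA
mask_low  'SW1A 2AA'   # prints: A9A 9A
```

High grain keeps field lengths visible, while low grain groups values of different lengths under the same pattern, which gives a shorter report.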
# Build with Excel support (the excel feature is required for -f excel)
cargo install --git https://github.com/minkymorgan/bytefreq --features excel
# Profile a specific sheet of an Excel file
bytefreq -f excel --excel-path data.xlsx --sheet 1
# Analyze character frequencies (useful for encoding issues)
cat data.csv | bytefreq -r CP
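As a rough illustration of why frequency counts help with encoding problems, the sketch below tallies raw bytes using only standard POSIX tools. It is not how bytefreq produces its report, it works on bytes rather than characters, and `byte_freq` is a hypothetical helper name.

```shell
#!/bin/sh
# Illustrative only: count how often each byte value (in hex) occurs on stdin.
# byte_freq is a hypothetical helper, not part of bytefreq.
byte_freq() {
    od -An -v -tx1 |                 # dump every byte as two hex digits
        tr -s ' ' '\n' |             # put one hex value on each line
        awk 'NF { count[$0]++ } END { for (b in count) print b, count[b] }' |
        sort                         # order the report by byte value
}

printf 'aab' | byte_freq
# prints:
# 61 2
# 62 1
```

Running `byte_freq < data.csv` and finding values such as `0d` (carriage return) or the sequence `ef`, `bb`, `bf` (a UTF-8 byte order mark) is often the first hint of a line-ending or encoding mismatch.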
# Generate enhanced JSON with data quality assertions
cat data.csv | bytefreq -e
# Flat enhanced JSON (easier parsing)
cat data.csv | bytefreq -E
Full documentation, examples, and source code are available on GitHub: https://github.com/minkymorgan/bytefreq
If your datasets go beyond millions of rows into billions or trillions of data points, check out our enterprise Spark-based solution: