bytefreq CLI

Command-Line Data Quality Tool

bytefreq is the command-line version of DataRadar, designed for power users who need to profile large datasets (millions of rows) on their desktop or in CI/CD pipelines. Built in Rust with multi-threaded processing, it's fast, reliable, and open source.

Features

Installation

Prerequisites

You'll need the Rust toolchain installed:

# Install Rust (if not already installed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Install from GitHub

cargo install --git https://github.com/minkymorgan/bytefreq

Verify Installation

bytefreq --version
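
If the version check reports that the command cannot be found, a common cause is that Cargo's binary directory (by default $HOME/.cargo/bin) is not on your PATH. Below is a minimal sketch of a check you might run in a script or CI step. The check_tool helper is a hypothetical name used for illustration and is not part of bytefreq.

```shell
# check_tool is a hypothetical helper: it reports whether a command is on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found at $(command -v "$1")"
  else
    echo "$1: not found on PATH" >&2
    return 1
  fi
}

# cargo install places binaries in $HOME/.cargo/bin unless CARGO_HOME or
# CARGO_INSTALL_ROOT points somewhere else.
check_tool bytefreq || echo "hint: add \$HOME/.cargo/bin to PATH, then retry"
```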

Quick Start

Basic CSV Profiling

# Pipe CSV data to bytefreq
cat data.csv | bytefreq

# Use high-grain Unicode masking
cat data.csv | bytefreq -g HU

JSON Profiling

# Profile NDJSON data
cat data.json | bytefreq -f json

# Low-grain masking for compressed patterns
cat data.json | bytefreq -f json -g LU
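
The -g option selects the masking grain. As a rough illustration of the idea behind mask-based profiling, the sketch below approximates it for ASCII input using standard tools. A high-grain mask replaces each uppercase letter with A, each lowercase letter with a, and each digit with 9; a low-grain mask additionally collapses runs of the same mask character. The mask_high and mask_low functions are hypothetical names for this illustration, and the exact rules bytefreq applies (particularly for non-ASCII Unicode text) may differ.

```shell
# Illustrative approximation only, ASCII input assumed.
# mask_high: map character classes to a single representative character.
mask_high() {
  LC_ALL=C sed -e 's/[A-Z]/A/g' -e 's/[a-z]/a/g' -e 's/[0-9]/9/g'
}

# mask_low: same mapping, then squeeze repeated A, a, and 9 characters.
mask_low() {
  mask_high | LC_ALL=C tr -s 'Aa9'
}

printf 'SW1A 2AA\n' | mask_high   # prints: AA9A 9AA
printf 'SW1A 2AA\n' | mask_low    # prints: A9A 9A
```

Counting how often each mask occurs in a column gives a compact picture of its formats, and rarely occurring masks are often the values worth inspecting first.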

Excel File Profiling

# Excel support requires a build with the excel feature enabled
cargo install --git https://github.com/minkymorgan/bytefreq --features excel

# Profile a specific sheet of an Excel file
bytefreq -f excel --excel-path data.xlsx --sheet 1

Character Profiling

# Analyze character frequencies (useful for encoding issues)
cat data.csv | bytefreq -r CP
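
To see why character-level counts help with encoding problems, the sketch below tallies raw byte values using standard POSIX tools. It only illustrates the idea and does not reproduce the output of bytefreq -r CP; byte_freq is a hypothetical helper name. In such a tally, unexpected bytes such as ef bb bf (a UTF-8 byte order mark) or 0d (a carriage return) stand out.

```shell
# byte_freq is a hypothetical helper: prints "<hex byte> <count>" per line,
# most frequent byte first.
byte_freq() {
  od -An -v -tx1 |      # dump every input byte as two hex digits
    tr -s ' ' '\n' |    # put one byte value on each line
    grep . |            # drop empty lines
    sort | uniq -c |    # count occurrences of each byte value
    sort -rn |          # order by count, highest first
    awk '{ print $2, $1 }'
}

# Example: the most frequent byte in "aab" is 61 (the letter a).
printf 'aab' | byte_freq
```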

Enhanced JSON Output

# Generate enhanced JSON with data quality assertions
cat data.csv | bytefreq -e

# Flat enhanced JSON (easier parsing)
cat data.csv | bytefreq -E
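
For CI/CD use, the flat enhanced JSON output can be written to a file and kept as a build artifact. Below is a minimal sketch, assuming bytefreq is on PATH and writes its report to standard output as in the examples above. The profile_to_file function and the file names are hypothetical.

```shell
# profile_to_file INPUT OUTPUT
# Hypothetical wrapper: profiles INPUT with bytefreq -E and writes the
# result to OUTPUT, failing early if the tool or the input is missing.
profile_to_file() {
  input=$1
  output=$2
  if ! command -v bytefreq >/dev/null 2>&1; then
    echo "bytefreq not found on PATH" >&2
    return 127
  fi
  if [ ! -r "$input" ]; then
    echo "cannot read $input" >&2
    return 1
  fi
  bytefreq -E < "$input" > "$output"
}

# Example: profile_to_file data.csv profile.json
```

Reading the input with a redirect is equivalent to the cat pipelines above and lets the wrapper report a missing file before bytefreq starts.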

Documentation

Full documentation, examples, and source code are available on GitHub:

https://github.com/minkymorgan/bytefreq

Need More Power?

If you're working with datasets well beyond millions of rows (billions or trillions of data points), check out our enterprise Spark-based solution:

Explore Enterprise Options