Command-Line Interface Guide

SPIDB includes a command-line interface (CLI) for downloading the SPIDB datasets from Kaggle and building a database from them, without writing any Python code.

Overview

The SPIDB CLI provides two main commands:

spidb download - Download datasets from Kaggle
spidb build - Build databases from downloaded datasets

Installation

To use the CLI, install SPIDB with CLI support:

pip install "git+https://github.com/dkadyrov/spidb.git#egg=spidb[cli]"

Quick Start

Download and Build in One Step

# Download both datasets and build database
spidb download --build

This will:

Download A-SPID and M-SPID datasets to data/
Create database at data/spi.db
Generate 60-second records
Generate samples for all channels

Download Only

# Download both datasets
spidb download

# Download specific dataset
spidb download --dataset aspids
spidb download --dataset mspids

Command Reference

`spidb download`

Download SPIDB datasets from Kaggle and optionally build the database.

Syntax

spidb download [OPTIONS]

Options

Option	Type	Default	Description
`--dataset`	choice	`both`	Which dataset(s) to download: `aspids`, `mspids`, or `both`
`--output`, `-o`	path	`data`	Output directory for datasets
`--build`	flag	-	Build database after downloading
`--db-name`	string	`spi.db`	Database filename
`--duration`	int	`60`	Record duration in seconds
`--no-records`	flag	-	Skip record generation
`--no-samples`	flag	-	Skip sample generation
`--quiet`, `-q`	flag	-	Quiet mode (less output)
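When you drive the CLI from Python (for example, from a notebook), a small wrapper can turn the `spidb download` options above into an argument list and catch an invalid `--dataset` value before anything runs. This is a minimal sketch. `download_command`, `run_download` and `DATASET_CHOICES` are hypothetical helpers, not part of SPIDB. The sketch assumes the `spidb` executable is on your PATH, and it assumes the build-related flags only take effect together with `--build`.

```python
import subprocess
from typing import List, Optional

DATASET_CHOICES = ("aspids", "mspids", "both")  # values documented for --dataset


def download_command(
    dataset: str = "both",
    output: str = "data",
    build: bool = False,
    db_name: Optional[str] = None,
    duration: Optional[int] = None,
    no_records: bool = False,
    no_samples: bool = False,
    quiet: bool = False,
) -> List[str]:
    """Return the argv list for `spidb download`, using only the documented flags."""
    if dataset not in DATASET_CHOICES:
        raise ValueError(f"--dataset must be one of {DATASET_CHOICES}, got {dataset!r}")
    argv = ["spidb", "download", "--dataset", dataset, "--output", output]
    if build:
        argv.append("--build")
    # Assumption: the flags below presumably only take effect together with --build
    if db_name is not None:
        argv += ["--db-name", db_name]
    if duration is not None:
        argv += ["--duration", str(duration)]
    if no_records:
        argv.append("--no-records")
    if no_samples:
        argv.append("--no-samples")
    if quiet:
        argv.append("--quiet")
    return argv


def run_download(**options) -> None:
    """Run `spidb download` (hypothetical helper); raises CalledProcessError on a non-zero exit."""
    subprocess.run(download_command(**options), check=True)
```

For example, `run_download(build=True, no_samples=True)` runs `spidb download --dataset both --output data --build --no-samples`. Given the defaults, that is equivalent to `spidb download --build --no-samples`. Passing a list instead of a shell string avoids quoting problems with paths that contain spaces.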

Examples

Download both datasets to default location:

spidb download

Download and build with custom settings:

spidb download --build --duration 30 --db-name custom.db

Download only acoustic dataset to custom directory:

spidb download --dataset aspids --output /path/to/data

Download quietly without building:

spidb download -q

Download and build database without samples:

spidb download --build --no-samples

`spidb build`

Build SPIDB database from previously downloaded datasets.

Syntax

spidb build [OPTIONS]

Options

Option	Type	Default	Description
`--data-dir`	path	`data`	Directory containing downloaded datasets
`--db-name`	string	`spi.db`	Database filename
`--duration`	int	`60`	Record duration in seconds
`--no-records`	flag	-	Skip record generation
`--no-samples`	flag	-	Skip sample generation

Examples

Build with default settings:

spidb build

Build with custom duration:

spidb build --duration 120

Build from custom data directory:

spidb build --data-dir /path/to/datasets

Build database structure only (no records/samples):

spidb build --no-records --no-samples

Build with records but no samples:

spidb build --no-samples

Workflows

First-Time Setup

Complete setup for new users:

# 1. Install SPIDB with CLI
pip install "git+https://github.com/dkadyrov/spidb.git#egg=spidb[cli]"

# 2. Download and build everything
spidb download --build

# 3. Verify database
python -c "from spidb import Database; db = Database('data/spi.db'); print(f'Samples: {db.session.query(db.Sample).count()}')"
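The database can also be checked without SPIDB's model classes, by listing every table and its row count. This sketch assumes `spi.db` is a SQLite file. The `.db` suffix suggests this, but the guide does not say so. `open_readonly` and `table_row_counts` are hypothetical helpers, and the table names they report are whatever SPIDB actually creates.

```python
import sqlite3
from typing import Dict


def open_readonly(db_path: str) -> sqlite3.Connection:
    """Open an existing SQLite file read-only, so a mistyped path raises instead of creating a new file."""
    return sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)


def table_row_counts(conn: sqlite3.Connection) -> Dict[str, int]:
    """Return {table_name: row_count} for every user table in the database."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    counts = {}
    for name in names:
        quoted = '"' + name.replace('"', '""') + '"'  # quote the name as an SQL identifier
        counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts


# Usage (assumes data/spi.db exists and is SQLite):
# conn = open_readonly("data/spi.db")
# for table, count in table_row_counts(conn).items():
#     print(f"{table}: {count}")
# conn.close()
```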

Rebuild Database with Different Settings

If you want to rebuild with different record durations:

# Datasets are already downloaded, just rebuild
spidb build --duration 30 --db-name spi_30s.db
spidb build --duration 90 --db-name spi_90s.db

Update Datasets

To download the latest dataset versions:

# Remove old datasets
rm -rf data/aspids data/mspids

# Download fresh copies
spidb download

# Rebuild database
spidb build
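`rm -rf` is not available in the Windows Command Prompt. A cross-platform equivalent of the removal step is sketched below in Python. `remove_datasets` is a hypothetical helper, not part of SPIDB. It deletes only the dataset folders and leaves `spi.db` and any other files in the data directory untouched.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def remove_datasets(
    data_dir: str = "data", names: Iterable[str] = ("aspids", "mspids")
) -> List[Path]:
    """Delete downloaded dataset folders under data_dir and return the paths removed."""
    removed = []
    for name in names:
        path = Path(data_dir) / name
        if path.is_dir():  # silently skip datasets that were never downloaded
            shutil.rmtree(path)
            removed.append(path)
    return removed
```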

Separate Download and Build

For slower connections or testing:

# Step 1: Download datasets (can be interrupted and resumed)
spidb download --dataset aspids
spidb download --dataset mspids

# Step 2: Build database later
spidb build

Understanding the Output

Download Output

[1/6] Select datasets to download
Datasets: A-SPID, M-SPID

[2/6] Checking for Kaggle package...
✓ Kaggle package found

[3/6] Checking Kaggle API credentials...
✓ Credentials found at: ~/.kaggle/kaggle.json

[4/6] Setting up download directory...
✓ Download directory: /path/to/data

[5/6] Downloading 2 dataset(s)...
--- Dataset 1/2: A-SPID ---
  ✓ Downloaded successfully

--- Dataset 2/2: M-SPID ---
  ✓ Downloaded successfully

✓ All datasets downloaded successfully!

Build Output

[7/8] Building Database
Database location: /path/to/data/spi.db

✓ Database created

Populating A-SPID...
  Metadata: /path/to/data/aspids/metadata.json
  Audio files: /path/to/data/aspids
✓ Populated

Generating records (duration: 60s)...
✓ Records generated

Generating samples...
✓ Samples generated

✓ Database built successfully
  Location: /path/to/data/spi.db
  Size: 1.23 MB

Output Structure

After running spidb download --build, you’ll have:

data/
├── aspids/
│   ├── metadata.json
│   └── 2023/
│       ├── 04_13/
│       ├── 04_24/
│       └── 05_09/
├── mspids/
│   ├── metadata.json
│   └── 2024/
│       └── ...
└── spi.db
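To see which datasets are present before running `spidb build`, you can look for each dataset's `metadata.json` in the layout above. This is a sketch only. It assumes a dataset counts as downloaded when its `metadata.json` exists, and the CLI's own detection logic may differ. `find_datasets`, `missing_datasets` and `DATASETS` are hypothetical names.

```python
from pathlib import Path
from typing import Dict, List

# Folder name -> display name, as in the directory layout above
DATASETS = {"aspids": "A-SPID", "mspids": "M-SPID"}


def find_datasets(data_dir: str = "data") -> Dict[str, bool]:
    """Map each dataset folder to whether its metadata.json is present."""
    root = Path(data_dir)
    return {name: (root / name / "metadata.json").is_file() for name in DATASETS}


def missing_datasets(data_dir: str = "data") -> List[str]:
    """Display names of datasets that do not appear to be downloaded."""
    return [DATASETS[name] for name, ok in find_datasets(data_dir).items() if not ok]
```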

Troubleshooting

“Kaggle package required”

Install the Kaggle package:

pip install kaggle

Or reinstall with CLI extras:

pip install "git+https://github.com/dkadyrov/spidb.git#egg=spidb[cli]"
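If the error persists after installing, `kaggle` may have gone into a different Python environment from the one that runs `spidb`. The check below asks the current interpreter whether `kaggle` is importable without actually importing it. This matters because, in some versions, importing `kaggle` tries to authenticate immediately and fails when credentials are missing. `has_package` is a hypothetical helper.

```python
import importlib.util


def has_package(name: str) -> bool:
    """True if `name` can be imported by this interpreter; the package is not actually imported."""
    return importlib.util.find_spec(name) is not None


# Usage: run with the same interpreter that runs spidb
# if not has_package("kaggle"):
#     print("kaggle is missing; install it with: python -m pip install kaggle")
```

Using `python -m pip install kaggle` rather than a bare `pip` ties the installation to that same interpreter.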

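“Kaggle credentials not found”

Set up your Kaggle API credentials:

1. Go to https://www.kaggle.com/settings/account
2. Click “Create New Token”; this downloads kaggle.json
3. Move kaggle.json to ~/.kaggle/ (create the directory if it does not exist)
4. On Linux/Mac, restrict its permissions: chmod 600 ~/.kaggle/kaggle.json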
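After these steps, the sketch below checks the local file for the usual problems: missing file, invalid JSON, missing `username` or `key`, and permissions too open on Linux/Mac. It inspects only the file on disk and cannot tell whether the token is still valid on Kaggle. `check_kaggle_credentials` is a hypothetical helper, and it assumes the default location `~/.kaggle/kaggle.json`.

```python
import json
import os
import stat
from pathlib import Path
from typing import List, Optional


def check_kaggle_credentials(path: Optional[Path] = None) -> List[str]:
    """Return a list of problems with kaggle.json; an empty list means it looks usable."""
    if path is None:
        path = Path.home() / ".kaggle" / "kaggle.json"  # assumed default location
    if not path.is_file():
        return [f"{path} not found"]
    try:
        data = json.loads(path.read_text())
    except json.JSONDecodeError as exc:
        return [f"{path} is not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return [f"{path} should contain a JSON object"]
    problems = [f"missing '{key}' in {path}" for key in ("username", "key") if not data.get(key)]
    # On Linux/Mac the file should be private to its owner (chmod 600)
    if os.name == "posix" and path.stat().st_mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append(f"{path} is readable by other users; run: chmod 600 {path}")
    return problems
```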

“No datasets found”

Make sure you’ve downloaded datasets first:

spidb download

Then build:

spidb build

Download Fails

Common issues:

Dataset terms not accepted
- Visit each dataset's page on Kaggle (A-SPID, M-SPID) and accept its terms
Invalid credentials
- Regenerate the API token on Kaggle
- Replace ~/.kaggle/kaggle.json with the newly downloaded file
Network issues
- Check your internet connection
- Try downloading one dataset at a time with --dataset aspids and --dataset mspids
- Use the --quiet flag to reduce console output
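For transient network failures, downloading one dataset at a time can be combined with a simple retry loop. Retrying will not help when the dataset terms have not been accepted or the credentials are invalid. Whether a re-run resumes a partial download or starts over is not stated in this guide. `run_with_retries` is a hypothetical helper, and the usage assumes `spidb` is on your PATH.

```python
import subprocess
import sys
import time
from typing import Sequence


def run_with_retries(argv: Sequence[str], attempts: int = 3, delay: float = 5.0) -> int:
    """Run a command, retrying after a pause while it exits non-zero; return the last exit code."""
    code = 1
    for attempt in range(1, attempts + 1):
        code = subprocess.run(list(argv)).returncode
        if code == 0:
            break
        if attempt < attempts:
            print(f"attempt {attempt}/{attempts} failed (exit {code}); retrying in {delay}s",
                  file=sys.stderr)
            time.sleep(delay)
    return code


# Usage: download one dataset at a time, retrying each on failure
# for name in ("aspids", "mspids"):
#     if run_with_retries(["spidb", "download", "--dataset", name, "--quiet"]) != 0:
#         sys.exit(f"download of {name} failed")
```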

“Cannot build database - spidb package not properly installed”

This error means the CLI could not load SPIDB's build functions. Reinstall the package:

pip install --force-reinstall "git+https://github.com/dkadyrov/spidb.git#egg=spidb[cli]"

Advanced Usage

Custom Database Location

# Download to custom location
spidb download --output ~/my_data

# Build from custom location
spidb build --data-dir ~/my_data --db-name ~/databases/spi.db

Multiple Databases

Create databases with different configurations:

# Full database with samples
spidb build --db-name spi_full.db

# Lightweight database without samples
spidb build --db-name spi_light.db --no-samples

# Archive with longer records
spidb build --db-name spi_archive.db --duration 300

Scripting

Use in shell scripts:

#!/bin/bash
# Stop at the first failing command or unset variable
set -euo pipefail

# Setup script
echo "Setting up SPIDB..."

# Download if either dataset directory is missing
if [ ! -d "data/aspids" ] || [ ! -d "data/mspids" ]; then
    echo "Downloading datasets..."
    spidb download
fi

# Build multiple configurations
for duration in 30 60 120; do
    echo "Building database with ${duration}s records..."
    spidb build --duration "$duration" --db-name "spi_${duration}s.db"
done

echo "Setup complete!"

Next Steps

See Usage for Python API examples
See Database for schema details
See Models for data model reference
