Command-Line Interface Guide

SPIDB includes a command-line interface (CLI) for downloading the datasets from Kaggle and building the database without writing any Python code.

Overview

The SPIDB CLI provides two main commands:

  • spidb download - Download datasets from Kaggle

  • spidb build - Build databases from downloaded datasets

Installation

To use the CLI, install SPIDB with CLI support:

pip install "git+https://github.com/dkadyrov/spidb.git#egg=spidb[cli]"

Quick Start

Download and Build in One Step

# Download both datasets and build database
spidb download --build

This will:

  1. Download A-SPID and M-SPID datasets to data/

  2. Create database at data/spi.db

  3. Generate 60-second records

  4. Generate samples for all channels

Download Only

# Download both datasets
spidb download

# Download specific dataset
spidb download --dataset aspids
spidb download --dataset mspids

Command Reference

spidb download

Download SPIDB datasets from Kaggle and optionally build the database.

Syntax

spidb download [OPTIONS]

Options

Option         Type    Default  Description
-------------  ------  -------  -----------------------------------------------------
--dataset      choice  both     Which dataset(s) to download: aspids, mspids, or both
--output, -o   path    data     Output directory for datasets
--build        flag    -        Build the database after downloading
--db-name      string  spi.db   Database filename
--duration     int     60       Record duration in seconds
--no-records   flag    -        Skip record generation
--no-samples   flag    -        Skip sample generation
--quiet, -q    flag    -        Quiet mode (less output)

Examples

Download both datasets to default location:

spidb download

Download and build with custom settings:

spidb download --build --duration 30 --db-name custom.db

Download only acoustic dataset to custom directory:

spidb download --dataset aspids --output /path/to/data

Download quietly without building:

spidb download -q

Download and build database without samples:

spidb download --build --no-samples

spidb build

Build SPIDB database from previously downloaded datasets.

Syntax

spidb build [OPTIONS]

Options

Option         Type    Default  Description
-------------  ------  -------  ----------------------------------------
--data-dir     path    data     Directory containing downloaded datasets
--db-name      string  spi.db   Database filename
--duration     int     60       Record duration in seconds
--no-records   flag    -        Skip record generation
--no-samples   flag    -        Skip sample generation

Examples

Build with default settings:

spidb build

Build with custom duration:

spidb build --duration 120

Build from custom data directory:

spidb build --data-dir /path/to/datasets

Build database structure only (no records/samples):

spidb build --no-records --no-samples

Build with records but no samples:

spidb build --no-samples
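Both option tables can be read as one parser definition. The sketch below is a hypothetical argparse model assembled only from the documented option names, types, and defaults; SPIDB's real CLI may be built differently, and make_parser is an illustrative name, not part of SPIDB. It makes the overlap explicit: --db-name, --duration, --no-records, and --no-samples are accepted by both commands, while --dataset, --output, --build, and --quiet belong to download and --data-dir belongs to build.

```python
import argparse


def make_parser() -> argparse.ArgumentParser:
    """Hypothetical model of the documented `spidb` options (not SPIDB's source)."""
    parser = argparse.ArgumentParser(prog="spidb")
    sub = parser.add_subparsers(dest="command", required=True)

    # Options listed in both tables, with the documented defaults.
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument("--db-name", default="spi.db")
    common.add_argument("--duration", type=int, default=60)
    common.add_argument("--no-records", action="store_true")
    common.add_argument("--no-samples", action="store_true")

    # Options that only `spidb download` accepts.
    download = sub.add_parser("download", parents=[common])
    download.add_argument("--dataset", choices=["aspids", "mspids", "both"], default="both")
    download.add_argument("--output", "-o", default="data")
    download.add_argument("--build", action="store_true")
    download.add_argument("--quiet", "-q", action="store_true")

    # Options that only `spidb build` accepts.
    build = sub.add_parser("build", parents=[common])
    build.add_argument("--data-dir", default="data")
    return parser


# Parse one of the download examples from this guide.
args = make_parser().parse_args(
    ["download", "--build", "--duration", "30", "--db-name", "custom.db"]
)
print(args.dataset, args.duration, args.db_name, args.build)  # prints: both 30 custom.db True
```

Parsing that example this way shows that --dataset keeps its default of both, so the command downloads both datasets before building.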

Workflows

First-Time Setup

Complete setup for new users:

# 1. Install SPIDB with CLI
pip install "git+https://github.com/dkadyrov/spidb.git#egg=spidb[cli]"

# 2. Download and build everything
spidb download --build

# 3. Verify database
python -c "from spidb import Database; db = Database('data/spi.db'); print(f'Samples: {db.session.query(db.Sample).count()}')"
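For a quick sanity check that does not depend on the SPIDB Python API, the standard-library sqlite3 module can report row counts directly. This assumes spi.db is an SQLite file (suggested by the .db extension, but not stated in this guide); the table names you see depend on SPIDB's schema (see Database), and table_row_counts is a hypothetical helper, not part of SPIDB.

```python
from __future__ import annotations

import sqlite3
from contextlib import closing
from pathlib import Path


def table_row_counts(db_path: str = "data/spi.db") -> dict:
    """Return {table_name: row_count} for every user table in an SQLite file.

    Assumption: the SPIDB database is SQLite. The file is opened read-only,
    so a mistyped path raises sqlite3.OperationalError instead of silently
    creating a new, empty database.
    """
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
            )
        ]
        counts = {}
        for name in names:
            # Quote the identifier so unusual table names are handled safely.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts
```

After a build, calling table_row_counts("data/spi.db") should show non-zero counts for the tables that hold records and samples, unless you built with --no-records or --no-samples.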

Rebuild Database with Different Settings

If you want to rebuild with different record durations:

# Datasets are already downloaded, just rebuild
spidb build --duration 30 --db-name spi_30s.db
spidb build --duration 90 --db-name spi_90s.db

Update Datasets

Download latest dataset versions:

# Remove old datasets
rm -rf data/aspids data/mspids

# Download fresh copies
spidb download

# Rebuild database
spidb build

Separate Download and Build

For slower connections or testing:

# Step 1: Download datasets (can be interrupted and resumed)
spidb download --dataset aspids
spidb download --dataset mspids

# Step 2: Build database later
spidb build

Understanding the Output

Download Output

[1/6] Select datasets to download
Datasets: A-SPID, M-SPID

[2/6] Checking for Kaggle package...
✓ Kaggle package found

[3/6] Checking Kaggle API credentials...
✓ Credentials found at: ~/.kaggle/kaggle.json

[4/6] Setting up download directory...
✓ Download directory: /path/to/data

[5/6] Downloading 2 dataset(s)...
--- Dataset 1/2: A-SPID ---
  ✓ Downloaded successfully

--- Dataset 2/2: M-SPID ---
  ✓ Downloaded successfully

✓ All datasets downloaded successfully!

Build Output

[7/8] Building Database
Database location: /path/to/data/spi.db

✓ Database created

Populating A-SPID...
  Metadata: /path/to/data/aspids/metadata.json
  Audio files: /path/to/data/aspids
✓ Populated

Generating records (duration: 60s)...
✓ Records generated

Generating samples...
✓ Samples generated

✓ Database built successfully
  Location: /path/to/data/spi.db
  Size: 1.23 MB

Output Structure

After running spidb download --build, you’ll have:

data/
├── aspids/
│   ├── metadata.json
│   └── 2023/
│       ├── 04_13/
│       ├── 04_24/
│       └── 05_09/
├── mspids/
│   ├── metadata.json
│   └── 2024/
│       └── ...
└── spi.db
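To confirm that a download or build finished before opening the database, a script can test for the entries in the tree above. missing_outputs is a hypothetical helper; it checks only the top-level files shown here, because the dated subfolders (2023/04_13/ and so on) differ between datasets.

```python
from pathlib import Path

# Entries expected after `spidb download --build`, taken from the tree above.
# Dated subfolders vary between datasets, so they are not checked.
EXPECTED = ("aspids/metadata.json", "mspids/metadata.json", "spi.db")


def missing_outputs(data_dir="data", expected=EXPECTED):
    """Return the expected entries that do not exist under data_dir."""
    root = Path(data_dir)
    return [rel for rel in expected if not (root / rel).exists()]
```

After spidb download --dataset aspids without --build, for example, this would report the M-SPID metadata file and spi.db as missing.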

Troubleshooting

“Kaggle package required”

Install the Kaggle package directly:

pip install kaggle

Or reinstall with CLI extras:

pip install "git+https://github.com/dkadyrov/spidb.git#egg=spidb[cli]"

“Kaggle credentials not found”

Set up your Kaggle API credentials:

  1. Go to https://www.kaggle.com/settings/account

  2. Click “Create New Token”

  3. Move kaggle.json to ~/.kaggle/

  4. On Linux/Mac: chmod 600 ~/.kaggle/kaggle.json
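A quick way to rule out the most common credential problems is to inspect the token file yourself. The helper below is a hypothetical sketch, not SPIDB's or Kaggle's own check: it assumes the standard kaggle.json layout (a JSON object with username and key fields) at the default ~/.kaggle/ location and, on Linux/macOS, flags permissions looser than 600.

```python
import json
import os
import stat
from pathlib import Path


def check_kaggle_credentials(path=None):
    """Return a list of problems with a kaggle.json file (empty list = looks OK)."""
    path = Path(path) if path else Path.home() / ".kaggle" / "kaggle.json"
    if not path.is_file():
        return [f"{path} not found"]
    try:
        data = json.loads(path.read_text())
    except ValueError:  # includes json.JSONDecodeError
        return [f"{path} is not valid JSON"]
    problems = []
    for field in ("username", "key"):
        if not (isinstance(data, dict) and data.get(field)):
            problems.append(f"missing '{field}' in {path}")
    # On Linux/macOS the token should be readable by its owner only (chmod 600).
    if os.name == "posix" and stat.S_IMODE(path.stat().st_mode) & 0o077:
        problems.append(f"{path} is readable by other users; run chmod 600")
    return problems
```

Calling check_kaggle_credentials() with no argument inspects ~/.kaggle/kaggle.json; an empty list means none of these checks failed.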

“No datasets found”

Make sure you’ve downloaded datasets first:

spidb download

Then build:

spidb build

Download Fails

Common issues:

  1. Haven’t accepted dataset terms

    • Visit each dataset's page on Kaggle and accept its terms:

    • A-SPID

    • M-SPID

  2. Invalid credentials

    • Regenerate token on Kaggle

    • Replace ~/.kaggle/kaggle.json

  3. Network issues

    • Check internet connection

    • Try downloading one dataset at a time

    • Use --quiet flag to reduce output
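When the connection is unreliable, downloading one dataset at a time can be automated with retries. This is a hypothetical wrapper around the documented spidb download --dataset and --quiet options; it assumes the command exits with a non-zero status on failure, which this guide does not state. The run and sleep parameters exist so the retry logic can be exercised without touching the network.

```python
import subprocess
import time


def download_each(datasets=("aspids", "mspids"), attempts=3, delay=30.0,
                  run=subprocess.run, sleep=time.sleep):
    """Download datasets one at a time, retrying each on failure.

    Assumption: `spidb download` returns a non-zero exit status when a
    download fails. Returns the datasets that still failed after all attempts.
    """
    failed = []
    for name in datasets:
        cmd = ["spidb", "download", "--dataset", name, "--quiet"]
        for attempt in range(1, attempts + 1):
            if run(cmd).returncode == 0:
                break
            if attempt < attempts:
                sleep(delay * attempt)  # linear back-off: 30 s, 60 s, ...
        else:
            # The loop never hit `break`, so every attempt failed.
            failed.append(name)
    return failed
```

Retrying per dataset means a failure on M-SPID does not force another download of A-SPID.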

“Cannot build database - spidb package not properly installed”

The build functions aren’t available. Reinstall:

pip install --force-reinstall "git+https://github.com/dkadyrov/spidb.git#egg=spidb[cli]"

Advanced Usage

Custom Database Location

# Download to custom location
spidb download --output ~/my_data

# Build from custom location
spidb build --data-dir ~/my_data --db-name ~/databases/spi.db

Multiple Databases

Create databases with different configurations:

# Full database with samples
spidb build --db-name spi_full.db

# Lightweight database without samples
spidb build --db-name spi_light.db --no-samples

# Archive with longer records
spidb build --db-name spi_archive.db --duration 300

Scripting

Use in shell scripts:

#!/bin/bash
set -euo pipefail  # stop at the first failing command

# Setup script
echo "Setting up SPIDB..."

# Download if either dataset is missing
if [ ! -d "data/aspids" ] || [ ! -d "data/mspids" ]; then
    echo "Downloading datasets..."
    spidb download
fi

# Build multiple configurations
for duration in 30 60 120; do
    echo "Building database with ${duration}s records..."
    spidb build --duration "$duration" --db-name "spi_${duration}s.db"
done

echo "Setup complete!"
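On systems without bash (for example, Windows), a similar setup can be driven from Python through subprocess. This is a hypothetical sketch that only calls the documented spidb download and spidb build options; setup is an illustrative name, and passing --output and --data-dir explicitly just spells out the default data directory.

```python
import subprocess
from pathlib import Path


def setup(data_dir="data", durations=(30, 60, 120), run=subprocess.run):
    """Download the datasets if needed, then build one database per duration."""
    data = Path(data_dir)
    commands = []
    # Download only if either dataset directory is missing.
    if not all((data / name).is_dir() for name in ("aspids", "mspids")):
        commands.append(["spidb", "download", "--output", str(data)])
    for duration in durations:
        commands.append(["spidb", "build", "--data-dir", str(data),
                         "--duration", str(duration),
                         "--db-name", f"spi_{duration}s.db"])
    for cmd in commands:
        print("Running:", " ".join(cmd))
        run(cmd, check=True)  # raises CalledProcessError on the first failure
    return commands
```

Using check=True gives the same stop-on-first-error behavior as set -e in a shell script.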

Next Steps

  • See Usage for Python API examples

  • See Database for schema details

  • See Models for data model reference