Command-Line Interface Guide¶
SPIDB includes a command-line interface for downloading datasets and building databases without writing any Python code.
Overview¶
The SPIDB CLI provides two main commands:
spidb download - Download datasets from Kaggle
spidb build - Build databases from downloaded datasets
Installation¶
To use the CLI, install SPIDB with CLI support:
pip install "git+https://github.com/dkadyrov/spidb.git#egg=spidb[cli]"
Quick Start¶
Download and Build in One Step¶
# Download both datasets and build database
spidb download --build
This will:
Download A-SPID and M-SPID datasets to data/
Create database at data/spi.db
Generate 60-second records
Generate samples for all channels
Download Only¶
# Download both datasets
spidb download
# Download specific dataset
spidb download --dataset aspids
spidb download --dataset mspids
Command Reference¶
spidb download¶
Download SPIDB datasets from Kaggle and optionally build the database.
Syntax¶
spidb download [OPTIONS]
Options¶
| Option | Type | Default | Description |
|---|---|---|---|
| --dataset | choice | | Which dataset(s) to download: aspids or mspids; both are downloaded when the option is omitted |
| --output | path | data/ | Output directory for datasets |
| --build | flag | - | Build database after downloading |
| --db-name | string | spi.db | Database filename |
| --duration | int | 60 | Record duration in seconds |
| --no-records | flag | - | Skip record generation |
| --no-samples | flag | - | Skip sample generation |
| -q | flag | - | Quiet mode (less output) |
Examples¶
Download both datasets to default location:
spidb download
Download and build with custom settings:
spidb download --build --duration 30 --db-name custom.db
Download only acoustic dataset to custom directory:
spidb download --dataset aspids --output /path/to/data
Download quietly without building:
spidb download -q
Download and build database without samples:
spidb download --build --no-samples
spidb build¶
Build SPIDB database from previously downloaded datasets.
Syntax¶
spidb build [OPTIONS]
Options¶
| Option | Type | Default | Description |
|---|---|---|---|
| --data-dir | path | data/ | Directory containing downloaded datasets |
| --db-name | string | spi.db | Database filename |
| --duration | int | 60 | Record duration in seconds |
| --no-records | flag | - | Skip record generation |
| --no-samples | flag | - | Skip sample generation |
Examples¶
Build with default settings:
spidb build
Build with custom duration:
spidb build --duration 120
Build from custom data directory:
spidb build --data-dir /path/to/datasets
Build database structure only (no records/samples):
spidb build --no-records --no-samples
Build with records but no samples:
spidb build --no-samples
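When the CLI is driven from Python, it can help to assemble the argument list in one place. The following is a minimal sketch and not part of SPIDB: build_command is a hypothetical helper that uses only the flags shown in the examples above and returns a list suitable for subprocess.run.

```python
from typing import List, Optional


def build_command(
    data_dir: Optional[str] = None,
    db_name: Optional[str] = None,
    duration: Optional[int] = None,
    records: bool = True,
    samples: bool = True,
) -> List[str]:
    """Return the argv list for a `spidb build` invocation.

    Hypothetical helper: options left at None are omitted so that the
    CLI falls back to its own defaults.
    """
    argv = ["spidb", "build"]
    if data_dir is not None:
        argv += ["--data-dir", data_dir]
    if db_name is not None:
        argv += ["--db-name", db_name]
    if duration is not None:
        argv += ["--duration", str(duration)]
    if not records:
        argv.append("--no-records")
    if not samples:
        argv.append("--no-samples")
    return argv
```

For example, subprocess.run(build_command(duration=120), check=True) would correspond to the "Build with custom duration" example, and raises if the command exits with a non-zero status.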
Workflows¶
First-Time Setup¶
Complete setup for new users:
# 1. Install SPIDB with CLI
pip install "git+https://github.com/dkadyrov/spidb.git#egg=spidb[cli]"
# 2. Download and build everything
spidb download --build
# 3. Verify database
python -c "from spidb import Database; db = Database('data/spi.db'); print(f'Samples: {db.session.query(db.Sample).count()}')"
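As an alternative check that does not depend on the SPIDB Python API, the database file can be inspected directly. This is a sketch under one assumption: that spi.db is a regular SQLite file, which the .db extension suggests but this guide does not confirm. list_tables is a hypothetical helper name.

```python
import sqlite3
from pathlib import Path
from typing import List


def list_tables(db_path: str) -> List[str]:
    """Return the table names in a SQLite database file, sorted by name.

    Assumes the file is SQLite. The database is opened read-only through
    a file: URI, so a wrong path raises sqlite3.OperationalError instead
    of silently creating an empty database.
    """
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Calling list_tables("data/spi.db") after a successful build should return a non-empty list if the assumption holds; the actual table names depend on the SPIDB schema.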
Rebuild Database with Different Settings¶
If you want to rebuild with different record durations:
# Datasets are already downloaded, just rebuild
spidb build --duration 30 --db-name spi_30s.db
spidb build --duration 90 --db-name spi_90s.db
Update Datasets¶
Download latest dataset versions:
# Remove old datasets
rm -rf data/aspids data/mspids
# Download fresh copies
spidb download
# Rebuild database
spidb build
Separate Download and Build¶
For slower connections or testing:
# Step 1: Download datasets (can be interrupted and resumed)
spidb download --dataset aspids
spidb download --dataset mspids
# Step 2: Build database later
spidb build
Understanding the Output¶
Download Output¶
[1/6] Select datasets to download
Datasets: A-SPID, M-SPID
[2/6] Checking for Kaggle package...
✓ Kaggle package found
[3/6] Checking Kaggle API credentials...
✓ Credentials found at: ~/.kaggle/kaggle.json
[4/6] Setting up download directory...
✓ Download directory: /path/to/data
[5/6] Downloading 2 dataset(s)...
--- Dataset 1/2: A-SPID ---
✓ Downloaded successfully
--- Dataset 2/2: M-SPID ---
✓ Downloaded successfully
✓ All datasets downloaded successfully!
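Scripts that wrap the CLI may want to track progress from the numbered step lines shown above. The sketch below parses lines of the form "[current/total] message". It assumes the output format matches this sample, which may differ between SPIDB versions; parse_step is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# Matches step headers such as "[5/6] Downloading 2 dataset(s)..."
STEP_RE = re.compile(r"^\[(\d+)/(\d+)\]\s+(.*)$")


def parse_step(line: str) -> Optional[Tuple[int, int, str]]:
    """Return (current, total, message) for a step line, else None."""
    match = STEP_RE.match(line.strip())
    if match is None:
        return None
    current, total, message = match.groups()
    return int(current), int(total), message
```

Lines that are not step headers, such as the "✓" status lines, yield None and can be passed through unchanged.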
Build Output¶
[7/8] Building Database
Database location: /path/to/data/spi.db
✓ Database created
Populating A-SPID...
Metadata: /path/to/data/aspids/metadata.json
Audio files: /path/to/data/aspids
✓ Populated
Generating records (duration: 60s)...
✓ Records generated
Generating samples...
✓ Samples generated
✓ Database built successfully
Location: /path/to/data/spi.db
Size: 1.23 MB
Output Structure¶
After running spidb download --build, you’ll have:
data/
├── aspids/
│   ├── metadata.json
│   └── 2023/
│       ├── 04_13/
│       ├── 04_24/
│       └── 05_09/
├── mspids/
│   ├── metadata.json
│   └── 2024/
│       └── ...
└── spi.db
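The layout above can be checked programmatically before running further steps. This is a minimal sketch: missing_paths is a hypothetical helper, and it only checks the three fixed entries named in the tree (the two metadata.json files and spi.db), not the dated subdirectories.

```python
from pathlib import Path
from typing import List

# Entries taken from the directory tree above, relative to the data root.
EXPECTED = ["aspids/metadata.json", "mspids/metadata.json", "spi.db"]


def missing_paths(root: str = "data") -> List[str]:
    """Return the expected entries that do not exist under root."""
    base = Path(root)
    return [rel for rel in EXPECTED if not (base / rel).exists()]
```

An empty result means all three entries are present. A result of ["spi.db"] would suggest the datasets were downloaded but the database has not been built yet.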
Troubleshooting¶
“Kaggle package required”¶
Install the Kaggle package:
pip install kaggle
Or reinstall with CLI extras:
pip install "git+https://github.com/dkadyrov/spidb.git#egg=spidb[cli]"
“Kaggle credentials not found”¶
Set up your Kaggle API credentials:
Go to https://www.kaggle.com/settings/account
Click “Create New Token”
Move kaggle.json to ~/.kaggle/
On Linux/Mac:
chmod 600 ~/.kaggle/kaggle.json
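The credential setup above can be verified with a short script. This is a sketch, not the check SPIDB itself performs: check_kaggle_credentials is a hypothetical helper that looks for the file, confirms it is JSON containing the username and key fields that a Kaggle API token holds, and on POSIX systems flags permissions looser than 600.

```python
import json
import os
import stat
from pathlib import Path
from typing import List


def check_kaggle_credentials(path: str = "~/.kaggle/kaggle.json") -> List[str]:
    """Return a list of problems found; an empty list means none."""
    cred = Path(path).expanduser()
    if not cred.is_file():
        return [f"{cred} does not exist"]
    try:
        data = json.loads(cred.read_text())
    except (OSError, ValueError) as exc:
        return [f"{cred} is not readable JSON: {exc}"]
    if not isinstance(data, dict):
        return [f"{cred} does not contain a JSON object"]

    problems = []
    for field in ("username", "key"):
        if field not in data:
            problems.append(f"{cred} is missing the '{field}' field")
    # Permission bits are only meaningful on POSIX systems (Linux/Mac).
    if os.name == "posix":
        mode = stat.S_IMODE(cred.stat().st_mode)
        if mode & 0o077:
            problems.append(
                f"{cred} is accessible by other users (mode {mode:o}); run chmod 600"
            )
    return problems
```

Printing the returned list before running spidb download gives an early hint about which of the setup steps was missed.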
“No datasets found”¶
Make sure you’ve downloaded datasets first:
spidb download
Then build:
spidb build
Download Fails¶
Common issues:
“Cannot build database - spidb package not properly installed”¶
The build functions aren’t available. Reinstall:
pip install --force-reinstall "git+https://github.com/dkadyrov/spidb.git#egg=spidb[cli]"
Advanced Usage¶
Custom Database Location¶
# Download to custom location
spidb download --output ~/my_data
# Build from custom location
spidb build --data-dir ~/my_data --db-name ~/databases/spi.db
Multiple Databases¶
Create databases with different configurations:
# Full database with samples
spidb build --db-name spi_full.db
# Lightweight database without samples
spidb build --db-name spi_light.db --no-samples
# Archive with longer records
spidb build --db-name spi_archive.db --duration 300
Scripting¶
Use in shell scripts:
#!/bin/bash
# Setup script
echo "Setting up SPIDB..."
# Download if not exists
if [ ! -d "data/aspids" ]; then
    echo "Downloading datasets..."
    spidb download
fi
# Build multiple configurations
for duration in 30 60 120; do
    echo "Building database with ${duration}s records..."
    spidb build --duration "${duration}" --db-name "spi_${duration}s.db"
done
echo "Setup complete!"
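The same setup logic can be written in Python when a shell is not available. The sketch below mirrors the script above but separates planning from execution: plan_setup is a hypothetical helper that only returns the commands to run, so they can be reviewed or logged first.

```python
from pathlib import Path
from typing import List, Sequence


def plan_setup(
    workdir: str = ".", durations: Sequence[int] = (30, 60, 120)
) -> List[List[str]]:
    """Return the spidb commands the setup script would run, in order."""
    commands: List[List[str]] = []
    # Download only if the A-SPID directory is absent, as in the shell script.
    if not (Path(workdir) / "data" / "aspids").is_dir():
        commands.append(["spidb", "download"])
    # One build per record duration, each written to its own database file.
    for duration in durations:
        commands.append(
            [
                "spidb", "build",
                "--duration", str(duration),
                "--db-name", f"spi_{duration}s.db",
            ]
        )
    return commands
```

To execute the plan, each entry can be passed to subprocess.run(argv, check=True, cwd=workdir), which stops at the first failing command.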