# Command-Line Interface Guide

SPIDB includes a command-line interface for downloading the datasets and building databases without writing any Python code.

## Overview

The SPIDB CLI provides two main commands:

- `spidb download` - Download datasets from Kaggle
- `spidb build` - Build databases from downloaded datasets

## Installation

To use the CLI, install SPIDB with CLI support:

```bash
pip install "git+https://github.com/dkadyrov/spidb.git#egg=spidb[cli]"
```

## Quick Start

### Download and Build in One Step

```bash
# Download both datasets and build database
spidb download --build
```

This will:

1. Download the A-SPID and M-SPID datasets to `data/`
2. Create the database at `data/spi.db`
3. Generate 60-second records
4. Generate samples for all channels

### Download Only

```bash
# Download both datasets
spidb download

# Download specific dataset
spidb download --dataset aspids
spidb download --dataset mspids
```

## Command Reference

### `spidb download`

Download SPIDB datasets from Kaggle and optionally build the database.

#### Syntax

```bash
spidb download [OPTIONS]
```

#### Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `--dataset` | choice | `both` | Which dataset(s) to download: `aspids`, `mspids`, or `both` |
| `--output`, `-o` | path | `data` | Output directory for datasets |
| `--build` | flag | - | Build database after downloading |
| `--db-name` | string | `spi.db` | Database filename |
| `--duration` | int | `60` | Record duration in seconds |
| `--no-records` | flag | - | Skip record generation |
| `--no-samples` | flag | - | Skip sample generation |
| `--quiet`, `-q` | flag | - | Quiet mode (less output) |
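
When the CLI is driven from Python, the options in the table can be assembled into an argument list. The following is a minimal sketch, not part of SPIDB: `download_argv` is a hypothetical helper that only uses the flags listed above, and it assumes that `--db-name`, `--duration`, `--no-records`, and `--no-samples` only matter when `--build` is given.

```python
import shlex


def download_argv(dataset="both", output="data", build=False, db_name="spi.db",
                  duration=60, records=True, samples=True, quiet=False):
    """Assemble an argument list for `spidb download` (hypothetical helper)."""
    if dataset not in ("aspids", "mspids", "both"):
        raise ValueError(f"unknown dataset: {dataset!r}")
    argv = ["spidb", "download", "--dataset", dataset, "--output", str(output)]
    if build:
        # Assumption: build-related flags are only relevant with --build.
        argv += ["--build", "--db-name", db_name, "--duration", str(duration)]
        if not records:
            argv.append("--no-records")
        if not samples:
            argv.append("--no-samples")
    if quiet:
        argv.append("--quiet")
    return argv


# Print the command line instead of running it, so the sketch has no side effects.
print(shlex.join(download_argv(build=True, duration=30)))
```

To execute the command rather than print it, the list can be passed to `subprocess.run(argv, check=True)`, which raises an exception if `spidb` exits with a non-zero status.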

#### Examples

**Download both datasets to default location:**
```bash
spidb download
```

**Download and build with custom settings:**
```bash
spidb download --build --duration 30 --db-name custom.db
```

**Download only acoustic dataset to custom directory:**
```bash
spidb download --dataset aspids --output /path/to/data
```

**Download quietly without building:**
```bash
spidb download -q
```

**Download and build database without samples:**
```bash
spidb download --build --no-samples
```

### `spidb build`

Build SPIDB database from previously downloaded datasets.

#### Syntax

```bash
spidb build [OPTIONS]
```

#### Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `--data-dir` | path | `data` | Directory containing downloaded datasets |
| `--db-name` | string | `spi.db` | Database filename |
| `--duration` | int | `60` | Record duration in seconds |
| `--no-records` | flag | - | Skip record generation |
| `--no-samples` | flag | - | Skip sample generation |

#### Examples

**Build with default settings:**
```bash
spidb build
```

**Build with custom duration:**
```bash
spidb build --duration 120
```

**Build from custom data directory:**
```bash
spidb build --data-dir /path/to/datasets
```

**Build database structure only (no records/samples):**
```bash
spidb build --no-records --no-samples
```

**Build with records but no samples:**
```bash
spidb build --no-samples
```

## Workflows

### First-Time Setup

Complete setup for new users:

```bash
# 1. Install SPIDB with CLI
pip install "git+https://github.com/dkadyrov/spidb.git#egg=spidb[cli]"

# 2. Download and build everything
spidb download --build

# 3. Verify database
python -c "from spidb import Database; db = Database('data/spi.db'); print(f'Samples: {db.session.query(db.Sample).count()}')"
```
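
If the one-liner above fails at import time, the database file can still be inspected directly. The sketch below assumes `spi.db` is a SQLite file (suggested by the extension, but not stated in this guide) and uses only the standard library. `list_tables` is a hypothetical helper, and the table names it returns depend on the SPIDB schema.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the table names in a SQLite file (assumes spi.db is SQLite)."""
    path = Path(db_path)
    if not path.is_file():
        raise FileNotFoundError(f"database not found: {path}")
    # Open read-only so the check can neither create nor modify the file.
    uri = path.resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Calling `list_tables("data/spi.db")` after a build should return a non-empty list; an empty list would suggest the file was created but never populated.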

### Rebuild Database with Different Settings

If you want to rebuild with different record durations:

```bash
# Datasets are already downloaded, just rebuild
spidb build --duration 30 --db-name spi_30s.db
spidb build --duration 90 --db-name spi_90s.db
```

### Update Datasets

Download the latest dataset versions:

```bash
# Remove old datasets
rm -rf data/aspids data/mspids

# Download fresh copies
spidb download

# Rebuild database
spidb build
```

### Separate Download and Build

For slower connections or testing:

```bash
# Step 1: Download datasets (can be interrupted and resumed)
spidb download --dataset aspids
spidb download --dataset mspids

# Step 2: Build database later
spidb build
```

## Understanding the Output

### Download Output

```
[1/6] Select datasets to download
Datasets: A-SPID, M-SPID

[2/6] Checking for Kaggle package...
✓ Kaggle package found

[3/6] Checking Kaggle API credentials...
✓ Credentials found at: ~/.kaggle/kaggle.json

[4/6] Setting up download directory...
✓ Download directory: /path/to/data

[5/6] Downloading 2 dataset(s)...
--- Dataset 1/2: A-SPID ---
  ✓ Downloaded successfully

--- Dataset 2/2: M-SPID ---
  ✓ Downloaded successfully

✓ All datasets downloaded successfully!
```

### Build Output

```
[7/8] Building Database
Database location: /path/to/data/spi.db

✓ Database created

Populating A-SPID...
  Metadata: /path/to/data/aspids/metadata.json
  Audio files: /path/to/data/aspids
✓ Populated

Generating records (duration: 60s)...
✓ Records generated

Generating samples...
✓ Samples generated

✓ Database built successfully
  Location: /path/to/data/spi.db
  Size: 1.23 MB
```

## Output Structure

After running `spidb download --build`, you'll have:

```
data/
├── aspids/
│   ├── metadata.json
│   └── 2023/
│       ├── 04_13/
│       ├── 04_24/
│       └── 05_09/
├── mspids/
│   ├── metadata.json
│   └── 2024/
│       └── ...
└── spi.db
```
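
The layout above can be checked from a script before running further steps. This is a minimal sketch under the assumption that the three files shown in the tree are the ones that matter; `missing_paths` and `EXPECTED` are hypothetical names, not part of SPIDB.

```python
from pathlib import Path

# Paths taken from the tree above, relative to the output directory.
EXPECTED = ("aspids/metadata.json", "mspids/metadata.json", "spi.db")


def missing_paths(data_dir="data", expected=EXPECTED):
    """Return the expected files that are absent under data_dir."""
    root = Path(data_dir)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_paths()
    print("layout OK" if not missing else f"missing: {missing}")
```

If only one dataset was downloaded, pass a shorter `expected` tuple so the other dataset is not reported as missing.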

## Troubleshooting

### "Kaggle package required"

Install the Kaggle package directly:
```bash
pip install kaggle
```

Or reinstall with CLI extras:
```bash
pip install "git+https://github.com/dkadyrov/spidb.git#egg=spidb[cli]"
```

### "Kaggle credentials not found"

Set up your Kaggle API credentials:

1. Go to https://www.kaggle.com/settings/account
2. Click "Create New Token"
3. Move `kaggle.json` to `~/.kaggle/`
4. On Linux/Mac: `chmod 600 ~/.kaggle/kaggle.json`
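
The result of these steps can be verified with a short script. The sketch below is an illustration, not part of SPIDB or the Kaggle client: `check_kaggle_credentials` is a hypothetical helper, and it assumes the token file carries the usual `username` and `key` fields.

```python
import json
import os
import stat
from pathlib import Path


def check_kaggle_credentials(path=Path.home() / ".kaggle" / "kaggle.json"):
    """Return a list of problems found with a kaggle.json file (empty if none)."""
    path = Path(path)
    if not path.is_file():
        return [f"{path} does not exist"]
    try:
        token = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return [f"{path} is not valid JSON"]
    if not isinstance(token, dict):
        return [f"{path} does not contain a JSON object"]
    problems = []
    # Assumption: the token file uses the "username" and "key" fields.
    for field in ("username", "key"):
        if not token.get(field):
            problems.append(f"missing field: {field}")
    # Permission bits are only meaningful on POSIX systems (step 4).
    if os.name == "posix" and stat.S_IMODE(path.stat().st_mode) & 0o077:
        problems.append("file is accessible to other users; run chmod 600")
    return problems
```

An empty list means the file is present, parseable, and (on Linux/Mac) restricted to the current user. It does not confirm that Kaggle accepts the token.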

### "No datasets found"

Make sure you've downloaded datasets first:
```bash
spidb download
```

Then build:
```bash
spidb build
```

### Download Fails

Common issues:

1. **Haven't accepted dataset terms**
   - Visit dataset page on Kaggle and accept terms
   - [A-SPID](https://www.kaggle.com/datasets/dkadyrov/stored-product-insect-database-spidb-aspids)
   - [M-SPID](https://www.kaggle.com/datasets/dkadyrov/stored-product-insect-database-spidb-mspids)

2. **Invalid credentials**
   - Regenerate token on Kaggle
   - Replace `~/.kaggle/kaggle.json`

3. **Network issues**
   - Check internet connection
   - Try downloading one dataset at a time
   - Use `--quiet` flag to reduce output

### "Cannot build database - spidb package not properly installed"

The build functions aren't available. Reinstall:
```bash
pip install --force-reinstall "git+https://github.com/dkadyrov/spidb.git#egg=spidb[cli]"
```

## Advanced Usage

### Custom Database Location

```bash
# Download to custom location
spidb download --output ~/my_data

# Build from custom location
spidb build --data-dir ~/my_data --db-name ~/databases/spi.db
```

### Multiple Databases

Create databases with different configurations:

```bash
# Full database with samples
spidb build --db-name spi_full.db

# Lightweight database without samples
spidb build --db-name spi_light.db --no-samples

# Archive with longer records
spidb build --db-name spi_archive.db --duration 300
```

### Scripting

Use in shell scripts:

```bash
#!/bin/bash

# Setup script
echo "Setting up SPIDB..."

# Download if not exists
if [ ! -d "data/aspids" ]; then
    echo "Downloading datasets..."
    spidb download
fi

# Build multiple configurations
for duration in 30 60 120; do
    echo "Building database with ${duration}s records..."
    spidb build --duration "$duration" --db-name "spi_${duration}s.db"
done

echo "Setup complete!"
```
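
The same workflow can be written in Python for environments without a POSIX shell. This is a rough equivalent of the script above, offered as a sketch: `plan` and `run` are hypothetical helpers, and the only SPIDB-specific parts are the documented `--output`, `--data-dir`, `--duration`, and `--db-name` options.

```python
import subprocess
from pathlib import Path


def plan(data_dir="data", durations=(30, 60, 120)):
    """Return the spidb commands to run, mirroring the shell script above."""
    commands = []
    # Download only when the A-SPID directory is absent, as in the script.
    if not (Path(data_dir) / "aspids").is_dir():
        commands.append(["spidb", "download", "--output", str(data_dir)])
    for duration in durations:
        commands.append(
            ["spidb", "build", "--data-dir", str(data_dir),
             "--duration", str(duration), "--db-name", f"spi_{duration}s.db"]
        )
    return commands


def run(commands):
    """Run each command in order; check=True stops at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Separating `plan` from `run` makes it possible to print or log the commands before anything is executed; `run(plan())` then performs the full setup.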

## Next Steps

- See [Usage](usage.md) for Python API examples
- See [Database](database.md) for schema details
- See [Models](models.md) for data model reference