# Command-Line Interface Guide

SPIDB includes a powerful command-line interface for downloading datasets and building databases without writing any Python code.

## Overview

The SPIDB CLI provides two main commands:

- `spidb download` - Download datasets from Kaggle
- `spidb build` - Build databases from downloaded datasets

## Installation

To use the CLI, install SPIDB with CLI support:

```bash
pip install "git+https://github.com/dkadyrov/spidb.git#egg=spidb[cli]"
```

## Quick Start

### Download and Build in One Step

```bash
# Download both datasets and build database
spidb download --build
```

This will:

1. Download A-SPID and M-SPID datasets to `data/`
2. Create database at `data/spi.db`
3. Generate 60-second records
4. Generate samples for all channels

### Download Only

```bash
# Download both datasets
spidb download

# Download specific dataset
spidb download --dataset aspids
spidb download --dataset mspids
```

## Command Reference

### `spidb download`

Download SPIDB datasets from Kaggle and optionally build the database.
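Because the download step depends on Kaggle API credentials, a wrapper script may want to confirm that a token file is present before calling `spidb download`. The following is a minimal sketch, not part of SPIDB: the function name `kaggle_credentials_ok` is hypothetical, and it assumes the token lives at `~/.kaggle/kaggle.json`, the location that appears in the sample download output in this guide. It only checks that the file exists, is readable, and is non-empty; it does not validate the token itself.

```bash
# Hypothetical pre-flight helper (not provided by SPIDB).
# Returns 0 if a non-empty, readable Kaggle token file exists at the given path.
# Defaults to ~/.kaggle/kaggle.json when no path is passed.
kaggle_credentials_ok() {
    local cred_file="${1:-$HOME/.kaggle/kaggle.json}"
    [ -s "$cred_file" ] && [ -r "$cred_file" ]
}

# Example use (commented out so the snippet has no side effects):
# kaggle_credentials_ok && spidb download
```

Keeping the check in a function lets a setup script fail early with its own message instead of starting a download that cannot authenticate.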
#### Syntax

```bash
spidb download [OPTIONS]
```

#### Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `--dataset` | choice | `both` | Which dataset(s) to download: `aspids`, `mspids`, or `both` |
| `--output`, `-o` | path | `data` | Output directory for datasets |
| `--build` | flag | - | Build database after downloading |
| `--db-name` | string | `spi.db` | Database filename |
| `--duration` | int | `60` | Record duration in seconds |
| `--no-records` | flag | - | Skip record generation |
| `--no-samples` | flag | - | Skip sample generation |
| `--quiet`, `-q` | flag | - | Quiet mode (less output) |

#### Examples

**Download both datasets to default location:**

```bash
spidb download
```

**Download and build with custom settings:**

```bash
spidb download --build --duration 30 --db-name custom.db
```

**Download only acoustic dataset to custom directory:**

```bash
spidb download --dataset aspids --output /path/to/data
```

**Download quietly without building:**

```bash
spidb download -q
```

**Download and build database without samples:**

```bash
spidb download --build --no-samples
```

### `spidb build`

Build SPIDB database from previously downloaded datasets.
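Since `spidb build` works on datasets that were downloaded earlier, a script can check the data directory before invoking it. This sketch assumes the layout shown under Output Structure in this guide, where each dataset folder (`aspids`, `mspids`) contains a `metadata.json`. The function name `list_downloaded_datasets` is hypothetical and is not part of SPIDB.

```bash
# Hypothetical helper (not provided by SPIDB).
# Prints the name of each dataset folder under the data directory that
# contains a metadata.json file. Returns 1 if none are found.
list_downloaded_datasets() {
    local data_dir="${1:-data}"
    local found=1
    local name
    for name in aspids mspids; do
        if [ -f "$data_dir/$name/metadata.json" ]; then
            echo "$name"
            found=0
        fi
    done
    return "$found"
}

# Example use (commented out so the snippet has no side effects):
# list_downloaded_datasets data >/dev/null && spidb build --data-dir data
```

The return code makes the helper usable in a condition, while the printed names can be logged to show which datasets the build will see.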
#### Syntax

```bash
spidb build [OPTIONS]
```

#### Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `--data-dir` | path | `data` | Directory containing downloaded datasets |
| `--db-name` | string | `spi.db` | Database filename |
| `--duration` | int | `60` | Record duration in seconds |
| `--no-records` | flag | - | Skip record generation |
| `--no-samples` | flag | - | Skip sample generation |

#### Examples

**Build with default settings:**

```bash
spidb build
```

**Build with custom duration:**

```bash
spidb build --duration 120
```

**Build from custom data directory:**

```bash
spidb build --data-dir /path/to/datasets
```

**Build database structure only (no records/samples):**

```bash
spidb build --no-records --no-samples
```

**Build with records but no samples:**

```bash
spidb build --no-samples
```

## Workflows

### First-Time Setup

Complete setup for new users:

```bash
# 1. Install SPIDB with CLI
pip install "git+https://github.com/dkadyrov/spidb.git#egg=spidb[cli]"

# 2. Download and build everything
spidb download --build

# 3. Verify database
python -c "from spidb import Database; db = Database('data/spi.db'); print(f'Samples: {db.session.query(db.Sample).count()}')"
```

### Rebuild Database with Different Settings

If you want to rebuild with different record durations:

```bash
# Datasets are already downloaded, just rebuild
spidb build --duration 30 --db-name spi_30s.db
spidb build --duration 90 --db-name spi_90s.db
```

### Update Datasets

Download latest dataset versions:

```bash
# Remove old datasets
rm -rf data/aspids data/mspids

# Download fresh copies
spidb download

# Rebuild database
spidb build
```

### Separate Download and Build

For slower connections or testing:

```bash
# Step 1: Download datasets (can be interrupted and resumed)
spidb download --dataset aspids
spidb download --dataset mspids

# Step 2: Build database later
spidb build
```

## Understanding the Output

### Download Output

```
[1/6] Select datasets to download
Datasets: A-SPID, M-SPID

[2/6] Checking for Kaggle package...
✓ Kaggle package found

[3/6] Checking Kaggle API credentials...
✓ Credentials found at: ~/.kaggle/kaggle.json

[4/6] Setting up download directory...
✓ Download directory: /path/to/data

[5/6] Downloading 2 dataset(s)...

--- Dataset 1/2: A-SPID ---
✓ Downloaded successfully

--- Dataset 2/2: M-SPID ---
✓ Downloaded successfully

✓ All datasets downloaded successfully!
```

### Build Output

```
[7/8] Building Database
Database location: /path/to/data/spi.db
✓ Database created

Populating A-SPID...
Metadata: /path/to/data/aspids/metadata.json
Audio files: /path/to/data/aspids
✓ Populated

Generating records (duration: 60s)...
✓ Records generated

Generating samples...
✓ Samples generated

✓ Database built successfully
Location: /path/to/data/spi.db
Size: 1.23 MB
```

## Output Structure

After running `spidb download --build`, you'll have:

```
data/
├── aspids/
│   ├── metadata.json
│   └── 2023/
│       ├── 04_13/
│       ├── 04_24/
│       └── 05_09/
├── mspids/
│   ├── metadata.json
│   └── 2024/
│       └── ...
└── spi.db
```

## Troubleshooting

### "Kaggle package required"

Install the Kaggle package directly:

```bash
pip install kaggle
```

Or reinstall SPIDB with the CLI extras:

```bash
pip install "git+https://github.com/dkadyrov/spidb.git#egg=spidb[cli]"
```

### "Kaggle credentials not found"

Set up your Kaggle API credentials:

1. Go to https://www.kaggle.com/settings/account
2. Click "Create New Token"
3. Move `kaggle.json` to `~/.kaggle/`
4. On Linux/Mac: `chmod 600 ~/.kaggle/kaggle.json`

### "No datasets found"

Make sure you've downloaded the datasets first:

```bash
spidb download
```

Then build:

```bash
spidb build
```

### Download Fails

Common issues:

1. **Dataset terms not accepted**
   - Visit the dataset page on Kaggle and accept the terms
   - [A-SPID](https://www.kaggle.com/datasets/dkadyrov/stored-product-insect-database-spidb-aspids)
   - [M-SPID](https://www.kaggle.com/datasets/dkadyrov/stored-product-insect-database-spidb-mspids)
2. **Invalid credentials**
   - Regenerate the token on Kaggle
   - Replace `~/.kaggle/kaggle.json`
3. **Network issues**
   - Check your internet connection
   - Try downloading one dataset at a time
   - Use the `--quiet` flag to reduce output

### "Cannot build database - spidb package not properly installed"

The build functions aren't available. Reinstall:

```bash
pip install --force-reinstall "git+https://github.com/dkadyrov/spidb.git#egg=spidb[cli]"
```

## Advanced Usage

### Custom Database Location

```bash
# Download to custom location
spidb download --output ~/my_data

# Build from custom location
spidb build --data-dir ~/my_data --db-name ~/databases/spi.db
```

### Multiple Databases

Create databases with different configurations:

```bash
# Full database with samples
spidb build --db-name spi_full.db

# Lightweight database without samples
spidb build --db-name spi_light.db --no-samples

# Archive with longer records
spidb build --db-name spi_archive.db --duration 300
```

### Scripting

Use in shell scripts:

```bash
#!/bin/bash
# Setup script

echo "Setting up SPIDB..."

# Download the datasets if they are not already present
if [ ! -d "data/aspids" ]; then
    echo "Downloading datasets..."
    spidb download
fi

# Build multiple configurations
for duration in 30 60 120; do
    echo "Building database with ${duration}s records..."
    spidb build --duration "$duration" --db-name "spi_${duration}s.db"
done

echo "Setup complete!"
```

## Next Steps

- See [Usage](usage.md) for Python API examples
- See [Database](database.md) for schema details
- See [Models](models.md) for data model reference