Tudor Sitaru 4f73b3036e
2025-11-11 11:28:01 +00:00

Image Downloader Script

A Python script to download images from a REST API that provides endpoints for listing assets and downloading them in full resolution.

Features

  • Concurrent Downloads: Download multiple images simultaneously for better performance
  • Error Handling: Robust error handling with detailed logging
  • Progress Tracking: Real-time progress bar with download statistics
  • Resume Support: Skip already downloaded files
  • Flexible API Integration: Supports various API response formats
  • Filename Sanitization: Automatically handles invalid characters in filenames
  • File Timestamps: Preserves original file modification dates from API
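
As an illustration of the filename sanitization feature, here is a minimal sketch. The function name `sanitize_filename`, the character set, and the `unnamed` fallback are assumptions made for this example; the script's own rules may differ.

```python
import re

# Characters that are commonly rejected in filenames on Windows or POSIX
# systems (assumed set, not taken from the script).
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize_filename(name: str, fallback: str = "unnamed") -> str:
    """Replace unsafe characters with underscores and trim dots and spaces."""
    cleaned = _INVALID_CHARS.sub("_", name).strip(" .")
    # An empty result (for example from "..") falls back to a placeholder.
    return cleaned or fallback
```

A name such as `a/b:c?.jpg` would be stored as `a_b_c_.jpg` under these rules.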

Installation

  1. Clone or download this repository
  2. Install the required dependencies:
pip install -r requirements.txt

Usage

Basic Usage

python image_downloader.py \
  --api-url "https://api.example.com" \
  --list-endpoint "/assets" \
  --download-endpoint "/download" \
  --output-dir "./images" \
  --api-key "your_api_key_here"

Advanced Usage

python image_downloader.py \
  --api-url "https://api.example.com" \
  --list-endpoint "/assets" \
  --download-endpoint "/download" \
  --output-dir "./images" \
  --max-concurrent 10 \
  --timeout 60 \
  --api-key "your_api_key_here"

Parameters

  • --api-url: Base URL of the API (required)
  • --list-endpoint: Endpoint to get the list of assets (required)
  • --download-endpoint: Endpoint to download individual assets (required)
  • --output-dir: Directory to save downloaded images (required)
  • --max-concurrent: Maximum number of concurrent downloads (default: 5)
  • --timeout: Request timeout in seconds (default: 30)
  • --api-key: API key for authentication (x-api-key header)
  • --email: Email for login authentication
  • --password: Password for login authentication
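
The parameters above map naturally onto an `argparse` parser. The following is a sketch only: the function name `build_parser` and the help strings are assumptions, and the real script may treat the required options differently when a config file is used.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented command-line options."""
    parser = argparse.ArgumentParser(description="Download images from a REST API")
    parser.add_argument("--api-url", required=True, help="Base URL of the API")
    parser.add_argument("--list-endpoint", required=True, help="Endpoint that lists assets")
    parser.add_argument("--download-endpoint", required=True, help="Endpoint that serves assets")
    parser.add_argument("--output-dir", required=True, help="Directory for downloaded images")
    parser.add_argument("--max-concurrent", type=int, default=5, help="Concurrent downloads")
    parser.add_argument("--timeout", type=int, default=30, help="Request timeout in seconds")
    parser.add_argument("--api-key", help="API key sent in the x-api-key header")
    parser.add_argument("--email", help="Email for login authentication")
    parser.add_argument("--password", help="Password for login authentication")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```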

Authentication

The script supports two authentication methods:

API Key Authentication

  • Uses x-api-key header for list endpoint
  • Uses key parameter for download endpoint
  • Configure with --api-key parameter or api_key in config file

Login Authentication

  • Performs login to /v1/auth/login endpoint
  • Uses session token for list endpoint
  • Uses key parameter for download endpoint
  • Configure with --email and --password parameters or in config file

Note: Use only one authentication method at a time. If both an API key and login credentials are supplied, the API key takes precedence.
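
The precedence rule can be sketched as a small helper. The function name `choose_auth_method` and the returned labels are hypothetical and only illustrate the documented behaviour.

```python
from typing import Optional


def choose_auth_method(
    api_key: Optional[str],
    email: Optional[str],
    password: Optional[str],
) -> str:
    """Return which authentication method to use; the API key wins."""
    if api_key:
        return "api_key"
    if email and password:
        return "login"
    raise ValueError("Provide either an API key or both email and password")
```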

API Integration

The script is designed to work with REST APIs that follow these patterns:

List Endpoint

The list endpoint should return a JSON response with asset information. The script supports these common formats:

// Array of assets
[
  {"id": "1", "filename": "image1.jpg", "url": "..."},
  {"id": "2", "filename": "image2.png", "url": "..."}
]

// Object with data array
{
  "data": [
    {"id": "1", "filename": "image1.jpg"},
    {"id": "2", "filename": "image2.png"}
  ]
}

// Object with results array
{
  "results": [
    {"id": "1", "filename": "image1.jpg"},
    {"id": "2", "filename": "image2.png"}
  ]
}
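
One way to accept all three response shapes is to normalize the decoded JSON into a plain list. This is a sketch under that assumption; `extract_assets` is a hypothetical name and not necessarily how the script does it.

```python
from typing import Any, List


def extract_assets(payload: Any) -> List[dict]:
    """Return the asset list from any of the supported response shapes."""
    # Shape 1: the response is already an array of assets.
    if isinstance(payload, list):
        return payload
    # Shapes 2 and 3: the array is nested under "data" or "results".
    if isinstance(payload, dict):
        for key in ("data", "results"):
            value = payload.get(key)
            if isinstance(value, list):
                return value
    # Unknown shape: report no assets rather than raising.
    return []
```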

Download Endpoint

The download endpoint should accept an asset ID and return the image file. Common patterns:

  • GET /download/{asset_id}
  • GET /assets/{asset_id}/download
  • GET /images/{asset_id}

ParentZone API Format:

  • GET /v1/media/{asset_id}/full?key={api_key}&u={updated_timestamp}
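
A download URL in the ParentZone format could be assembled as follows. The helper name `build_download_url` is an assumption for illustration; the query parameter names `key` and `u` come from the pattern above.

```python
from urllib.parse import urlencode


def build_download_url(
    api_url: str,
    download_endpoint: str,
    asset_id: str,
    api_key: str,
    updated: str,
) -> str:
    """Build /{endpoint}/{asset_id}/full?key=...&u=... on top of the base URL."""
    base = f"{api_url.rstrip('/')}/{download_endpoint.strip('/')}/{asset_id}/full"
    # urlencode percent-encodes characters such as ':' in the timestamp.
    query = urlencode({"key": api_key, "u": updated})
    return f"{base}?{query}"
```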

Asset Object Fields

The script looks for these fields in asset objects:

Required for identification:

  • id, asset_id, image_id, file_id, uuid, or key

Optional for better filenames:

  • fileName: Preferred filename (ParentZone API)
  • filename: Alternative filename field
  • name: Alternative name
  • title: Display title
  • mimeType: MIME type for proper file extension (ParentZone API)
  • content_type: Alternative MIME type field

Required for ParentZone API downloads:

  • updated: Timestamp used in download URL parameter and file modification time
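
The field lookup described above might be implemented along these lines. The priority order within each group simply follows the order listed here, which is an assumption, and the helper names are hypothetical.

```python
import mimetypes
import os
from typing import Optional, Sequence

ID_FIELDS = ("id", "asset_id", "image_id", "file_id", "uuid", "key")
NAME_FIELDS = ("fileName", "filename", "name", "title")
MIME_FIELDS = ("mimeType", "content_type")


def first_present(asset: dict, fields: Sequence[str]) -> Optional[str]:
    """Return the first non-empty value among the given fields, as a string."""
    for field in fields:
        value = asset.get(field)
        if value not in (None, ""):
            return str(value)
    return None


def resolve_filename(asset: dict) -> str:
    """Pick a filename, falling back to the asset ID plus a MIME-based extension."""
    asset_id = first_present(asset, ID_FIELDS)
    if asset_id is None:
        raise ValueError("Asset has no recognizable identifier field")
    name = first_present(asset, NAME_FIELDS) or asset_id
    # Only derive an extension when the chosen name does not already have one.
    if not os.path.splitext(name)[1]:
        mime = first_present(asset, MIME_FIELDS)
        extension = mimetypes.guess_extension(mime) if mime else None
        name += extension or ""
    return name
```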

Examples

Example 1: ParentZone API with API Key

python image_downloader.py \
  --api-url "https://api.parentzone.me" \
  --list-endpoint "/v1/gallery" \
  --download-endpoint "/v1/media" \
  --output-dir "./parentzone_images" \
  --api-key "your_api_key_here"

Example 2: ParentZone API with Login

python image_downloader.py \
  --api-url "https://api.parentzone.me" \
  --list-endpoint "/v1/gallery" \
  --download-endpoint "/v1/media" \
  --output-dir "./parentzone_images" \
  --email "your_email@example.com" \
  --password "your_password_here"

Example 3: API with Authentication

The script supports API key authentication via the --api-key parameter. For other authentication methods, you can modify the script to send custom headers:

# In the get_asset_list method, pass custom headers with the request:
headers = {
    'Authorization': 'Bearer your_token_here',
    'Content-Type': 'application/json'
}
async with session.get(url, headers=headers, timeout=self.timeout) as response:
    response.raise_for_status()  # Fail early on HTTP error statuses
    data = await response.json()

Example 4: Custom Response Format

If your API returns a different format, you can modify the get_asset_list method:

# For API that returns: {"images": [...]}
if 'images' in data:
    assets = data['images']

Output

The script creates:

  1. Downloaded Images: All images are saved to the specified output directory with original modification timestamps
  2. Log File: download.log in the output directory with detailed information
  3. Progress Display: Real-time progress bar showing:
    • Total assets
    • Successfully downloaded
    • Failed downloads
    • Skipped files (already exist)
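
The "skipped" count relies on detecting files that already exist. A minimal sketch of such a check is shown below; the name `already_downloaded` and the rule that an empty file does not count as downloaded are assumptions, not confirmed behaviour of the script.

```python
from pathlib import Path


def already_downloaded(output_dir: str, filename: str) -> bool:
    """Return True when a non-empty file with this name is already present."""
    target = Path(output_dir) / filename
    # Treat zero-byte files as incomplete so they are fetched again (assumed rule).
    return target.is_file() and target.stat().st_size > 0
```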

File Timestamps

The downloader automatically sets the file modification time to match the updated timestamp from the API response. This preserves the original file dates and helps with:

  • File Organization: Files can be sorted by their original creation/update dates
  • Backup Systems: Backup tools can properly identify changed files
  • Media Libraries: Media management software can display correct dates
  • Data Integrity: Maintains the temporal relationship between files
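
Setting the modification time can be done with `os.utime`. The sketch below assumes the `updated` value is an ISO 8601 string such as `2025-10-07T14:52:04Z`; the actual format returned by the API is not documented here, and `apply_updated_timestamp` is a hypothetical name.

```python
import os
from datetime import datetime, timezone


def apply_updated_timestamp(path: str, updated: str) -> float:
    """Set the file's access and modification times from an ISO 8601 string."""
    # Older Python versions do not parse a trailing "Z", so normalize it first.
    parsed = datetime.fromisoformat(updated.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    timestamp = parsed.timestamp()
    os.utime(path, (timestamp, timestamp))
    return timestamp
```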

Error Handling

The script handles various error scenarios:

  • Network Errors: Retries and continues with other downloads
  • Invalid Responses: Logs errors and continues
  • File System Errors: Creates directories and handles permission issues
  • API Errors: Logs HTTP errors and continues

Performance

  • Concurrent Downloads: Configurable concurrency (default: 5)
  • Connection Pooling: Efficient HTTP connection reuse
  • Chunked Downloads: Memory-efficient large file handling
  • Progress Tracking: Real-time feedback on download progress
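
Configurable concurrency is commonly implemented with an `asyncio.Semaphore`. The sketch below shows that pattern with a caller-supplied `download_one` coroutine standing in for the real HTTP download; both names are hypothetical and the script's internals may differ.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable


async def download_all(
    asset_ids: Iterable[str],
    download_one: Callable[[str], Awaitable[None]],
    max_concurrent: int = 5,
) -> Dict[str, bool]:
    """Run download_one for every asset, at most max_concurrent at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(asset_id: str):
        async with semaphore:
            try:
                await download_one(asset_id)
                return asset_id, True
            except Exception:
                # A failed asset is recorded and does not stop the others.
                return asset_id, False

    results = await asyncio.gather(*(guarded(a) for a in asset_ids))
    return dict(results)
```

The returned mapping of asset ID to success flag can feed the downloaded and failed counters shown in the progress display.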

Troubleshooting

Common Issues

  1. "No assets found": Check your list endpoint URL and response format
  2. "Failed to fetch asset list": Verify API URL and network connectivity
  3. "Content type is not an image": API might be returning JSON instead of image data
  4. Permission errors: Check write permissions for the output directory

Debug Mode

For detailed debugging, you can modify the logging level:

logging.basicConfig(level=logging.DEBUG)

License

This script is provided as-is for educational and personal use.

Contributing

Feel free to submit issues and enhancement requests!
