parentzone_downloader/docs/README.md
Tudor Sitaru d8637ac2ea
repo restructure
2025-10-14 21:58:54 +01:00


# Image Downloader Script
A Python script to download images from a REST API that provides endpoints for listing assets and downloading them in full resolution.
## Features
- **Concurrent Downloads**: Download multiple images simultaneously for better performance
- **Error Handling**: Robust error handling with detailed logging
- **Progress Tracking**: Real-time progress bar with download statistics
- **Resume Support**: Skip already downloaded files
- **Flexible API Integration**: Supports various API response formats
- **Filename Sanitization**: Automatically handles invalid characters in filenames
- **File Timestamps**: Preserves original file modification dates from API
## Installation
1. Clone or download this repository
2. Install the required dependencies:
```bash
pip install -r requirements.txt
```
## Usage
### Basic Usage
```bash
python image_downloader.py \
  --api-url "https://api.example.com" \
  --list-endpoint "/assets" \
  --download-endpoint "/download" \
  --output-dir "./images" \
  --api-key "your_api_key_here"
```
### Advanced Usage
```bash
python image_downloader.py \
  --api-url "https://api.example.com" \
  --list-endpoint "/assets" \
  --download-endpoint "/download" \
  --output-dir "./images" \
  --max-concurrent 10 \
  --timeout 60 \
  --api-key "your_api_key_here"
```
### Parameters
- `--api-url`: Base URL of the API (required)
- `--list-endpoint`: Endpoint to get the list of assets (required)
- `--download-endpoint`: Endpoint to download individual assets (required)
- `--output-dir`: Directory to save downloaded images (required)
- `--max-concurrent`: Maximum number of concurrent downloads (default: 5)
- `--timeout`: Request timeout in seconds (default: 30)
- `--api-key`: API key for authentication (x-api-key header)
- `--email`: Email for login authentication
- `--password`: Password for login authentication
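
The options above could be declared with `argparse` roughly as follows. This is a sketch based only on the parameter list in this README; the script's actual parser may differ, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the options documented above."""
    parser = argparse.ArgumentParser(description="Download images from a REST API")
    parser.add_argument("--api-url", required=True, help="Base URL of the API")
    parser.add_argument("--list-endpoint", required=True, help="Endpoint that lists assets")
    parser.add_argument("--download-endpoint", required=True, help="Endpoint that serves assets")
    parser.add_argument("--output-dir", required=True, help="Directory for downloaded images")
    parser.add_argument("--max-concurrent", type=int, default=5, help="Concurrent downloads")
    parser.add_argument("--timeout", type=int, default=30, help="Request timeout in seconds")
    parser.add_argument("--api-key", help="API key sent as the x-api-key header")
    parser.add_argument("--email", help="Email for login authentication")
    parser.add_argument("--password", help="Password for login authentication")
    return parser


# Parse a minimal command line to show the documented defaults.
args = build_parser().parse_args([
    "--api-url", "https://api.example.com",
    "--list-endpoint", "/assets",
    "--download-endpoint", "/download",
    "--output-dir", "./images",
])
print(args.max_concurrent, args.timeout)  # 5 30
```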
## Authentication
The script supports two authentication methods:
### API Key Authentication
- Uses `x-api-key` header for list endpoint
- Uses `key` parameter for download endpoint
- Configure with `--api-key` parameter or `api_key` in config file
### Login Authentication
- Performs login to `/v1/auth/login` endpoint
- Uses session token for list endpoint
- Uses `key` parameter for download endpoint
- Configure with `--email` and `--password` parameters or in config file
**Note**: Use only one authentication method at a time. If both an API key and login credentials are supplied, the API key takes precedence.
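
The precedence rule can be sketched as below. The helper names `choose_auth_method` and `list_request_headers` are illustrative assumptions and are not taken from the script; only the `x-api-key` header name comes from this README.

```python
from typing import Dict, Optional


def choose_auth_method(api_key: Optional[str] = None,
                       email: Optional[str] = None,
                       password: Optional[str] = None) -> str:
    """Return "api_key" or "login" (hypothetical helper).

    The API key wins whenever it is present, matching the note above.
    """
    if api_key:
        return "api_key"
    if email and password:
        return "login"
    raise ValueError("Provide --api-key, or both --email and --password")


def list_request_headers(api_key: str) -> Dict[str, str]:
    """Headers for the list endpoint when using API key authentication."""
    return {"x-api-key": api_key}


print(choose_auth_method(api_key="abc", email="a@example.com", password="pw"))  # api_key
print(choose_auth_method(email="a@example.com", password="pw"))  # login
```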
## API Integration
The script is designed to work with REST APIs that follow these patterns:
### List Endpoint
The list endpoint should return a JSON response with asset information. The script supports these common formats:
```json
// Array of assets
[
  {"id": "1", "filename": "image1.jpg", "url": "..."},
  {"id": "2", "filename": "image2.png", "url": "..."}
]

// Object with data array
{
  "data": [
    {"id": "1", "filename": "image1.jpg"},
    {"id": "2", "filename": "image2.png"}
  ]
}

// Object with results array
{
  "results": [
    {"id": "1", "filename": "image1.jpg"},
    {"id": "2", "filename": "image2.png"}
  ]
}
```
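
A minimal sketch of how the three shapes above might be normalized into one list. `extract_assets` is a hypothetical helper, not necessarily how `get_asset_list` is implemented.

```python
from typing import Any, Dict, List


def extract_assets(payload: Any) -> List[Dict[str, Any]]:
    """Return the asset list from any of the response shapes shown above."""
    # Shape 1: the response is already a list of assets.
    if isinstance(payload, list):
        return payload
    # Shapes 2 and 3: the list is nested under "data" or "results".
    if isinstance(payload, dict):
        for key in ("data", "results"):
            value = payload.get(key)
            if isinstance(value, list):
                return value
    # Unknown shape: report no assets rather than guessing.
    return []


print(len(extract_assets({"data": [{"id": "1"}, {"id": "2"}]})))  # 2
```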
### Download Endpoint
The download endpoint should accept an asset ID and return the image file. Common patterns:
- `GET /download/{asset_id}`
- `GET /assets/{asset_id}/download`
- `GET /images/{asset_id}`
**ParentZone API Format:**
- `GET /v1/media/{asset_id}/full?key={api_key}&u={updated_timestamp}`
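
As an illustration, the ParentZone URL pattern could be assembled like this. The function name and the sample values are assumptions; in particular, the exact format of the `updated` value is not specified in this README, so a plain numeric string is used here.

```python
from urllib.parse import urlencode


def build_parentzone_download_url(api_url: str, download_endpoint: str,
                                  asset_id: str, api_key: str, updated: str) -> str:
    """Build GET {download_endpoint}/{asset_id}/full?key=...&u=... (hypothetical helper)."""
    base = f"{api_url.rstrip('/')}/{download_endpoint.strip('/')}/{asset_id}/full"
    # urlencode escapes any characters that are not safe in a query string.
    return f"{base}?{urlencode({'key': api_key, 'u': updated})}"


url = build_parentzone_download_url(
    "https://api.parentzone.me", "/v1/media", "12345", "abc123", "1700000000"
)
print(url)
# https://api.parentzone.me/v1/media/12345/full?key=abc123&u=1700000000
```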
### Asset Object Fields
The script looks for these fields in asset objects:
**Required for identification:**
- `id`, `asset_id`, `image_id`, `file_id`, `uuid`, or `key`
**Optional for better filenames:**
- `fileName`: Preferred filename (ParentZone API)
- `filename`: Alternative filename field
- `name`: Alternative name
- `title`: Display title
- `mimeType`: MIME type for proper file extension (ParentZone API)
- `content_type`: Alternative MIME type field
**Required for ParentZone API downloads:**
- `updated`: Timestamp used in download URL parameter and file modification time
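
The field lookup described above might look like the following sketch. The lookup order, the sanitization pattern, and the `asset_<id>` fallback name are assumptions made for illustration; only the field names come from this README.

```python
import mimetypes
import re
from typing import Any, Dict, Optional, Sequence

ID_FIELDS = ("id", "asset_id", "image_id", "file_id", "uuid", "key")
NAME_FIELDS = ("fileName", "filename", "name", "title")
MIME_FIELDS = ("mimeType", "content_type")


def first_present(asset: Dict[str, Any], fields: Sequence[str]) -> Optional[Any]:
    """Return the first non-empty value among the given field names."""
    for field in fields:
        value = asset.get(field)
        if value not in (None, ""):
            return value
    return None


def choose_filename(asset: Dict[str, Any]) -> str:
    """Pick a safe filename for an asset (hypothetical helper)."""
    asset_id = first_present(asset, ID_FIELDS)
    if asset_id is None:
        raise ValueError("asset has no recognised identifier field")
    name = str(first_present(asset, NAME_FIELDS) or f"asset_{asset_id}")
    # Replace characters that are invalid in filenames on common platforms.
    name = re.sub(r'[<>:"/\\|?*]', "_", name)
    if "." not in name:
        # No extension yet: derive one from the MIME type when available.
        mime = first_present(asset, MIME_FIELDS)
        extension = mimetypes.guess_extension(mime) if mime else None
        name += extension or ""
    return name


print(choose_filename({"id": "1", "fileName": "a/b.jpg"}))  # a_b.jpg
print(choose_filename({"uuid": "x", "title": "Sports day", "mimeType": "image/png"}))  # Sports day.png
```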
## Examples
### Example 1: ParentZone API with API Key
```bash
python image_downloader.py \
  --api-url "https://api.parentzone.me" \
  --list-endpoint "/v1/gallery" \
  --download-endpoint "/v1/media" \
  --output-dir "./parentzone_images" \
  --api-key "your_api_key_here"
```
### Example 2: ParentZone API with Login
```bash
python image_downloader.py \
  --api-url "https://api.parentzone.me" \
  --list-endpoint "/v1/gallery" \
  --download-endpoint "/v1/media" \
  --output-dir "./parentzone_images" \
  --email "your_email@example.com" \
  --password "your_password_here"
```
### Example 3: API with Authentication
The script supports API key authentication via the `--api-key` parameter. For other authentication methods, such as bearer tokens, you can modify the script to send custom headers:
```python
# In the get_asset_list method, pass custom headers with the request:
headers = {
    'Authorization': 'Bearer your_token_here',
    'Content-Type': 'application/json'
}
async with session.get(url, headers=headers, timeout=self.timeout) as response:
    response.raise_for_status()  # surface HTTP errors early
    data = await response.json()  # then parse the asset list as usual
```
### Example 4: Custom Response Format
If your API returns a different format, you can modify the `get_asset_list` method:
```python
# For API that returns: {"images": [...]}
if 'images' in data:
    assets = data['images']
```
## Output
The script creates:
1. **Downloaded Images**: All images are saved to the specified output directory with original modification timestamps
2. **Log File**: `download.log` in the output directory with detailed information
3. **Progress Display**: Real-time progress bar showing:
   - Total assets
   - Successfully downloaded
   - Failed downloads
   - Skipped files (already exist)
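
The "skipped files" count comes from the resume behaviour. A minimal sketch of such a check is shown below; treating an empty file as not yet downloaded is an assumption, and `should_skip` is a hypothetical name.

```python
import tempfile
from pathlib import Path


def should_skip(target: Path) -> bool:
    """Hypothetical resume check: skip when a non-empty file already exists."""
    return target.is_file() and target.stat().st_size > 0


# Demonstrate the three cases in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    done = Path(tmp) / "image1.jpg"
    done.write_bytes(b"\xff\xd8\xff")  # already downloaded
    empty = Path(tmp) / "image2.jpg"
    empty.touch()                      # interrupted, zero bytes
    missing = Path(tmp) / "image3.jpg"  # never downloaded
    skip_flags = [should_skip(p) for p in (done, empty, missing)]

print(skip_flags)  # [True, False, False]
```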
### File Timestamps
The downloader automatically sets the file modification time to match the `updated` timestamp from the API response. This preserves the original file dates and helps with:
- **File Organization**: Files can be sorted by their original creation or update dates
- **Backup Systems**: Backup tools can properly identify changed files
- **Media Libraries**: Media management software can display correct dates
- **Data Integrity**: Maintains the temporal relationship between files
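
Setting the modification time can be done with `os.utime`. The sketch below assumes `updated` is an ISO 8601 string such as `2024-01-15T10:30:00Z`; the real API may use a different format, and `apply_updated_timestamp` is a hypothetical helper.

```python
import os
import tempfile
from datetime import datetime, timezone


def apply_updated_timestamp(path: str, updated: str) -> float:
    """Set the file's access and modification times from an ISO 8601 string."""
    # Older Python versions do not accept a trailing "Z", so normalise it.
    parsed = datetime.fromisoformat(updated.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    timestamp = parsed.timestamp()
    os.utime(path, (timestamp, timestamp))
    return timestamp


# Demonstrate on a temporary file.
with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as tmp:
    demo_path = tmp.name
applied = apply_updated_timestamp(demo_path, "2024-01-15T10:30:00Z")
mtime_matches = abs(os.path.getmtime(demo_path) - applied) < 1
os.remove(demo_path)
print(mtime_matches)  # True
```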
## Error Handling
The script handles various error scenarios:
- **Network Errors**: Retries and continues with other downloads
- **Invalid Responses**: Logs errors and continues
- **File System Errors**: Creates directories and handles permission issues
- **API Errors**: Logs HTTP errors and continues
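
One way to get "retry, log, and continue" behaviour is a small wrapper around each download. This is a sketch only: the attempt count, the backoff delays, the logger name, and the `fetch` callable are all assumptions, and the script's own retry policy may differ.

```python
import asyncio
import logging

logger = logging.getLogger("image_downloader")  # hypothetical logger name


async def fetch_with_retries(fetch, asset_id, attempts=3, base_delay=0.5):
    """Call `fetch(asset_id)`, retrying network errors with exponential backoff.

    Returns None after the final failure so the caller can move on to
    the next asset instead of aborting the whole run.
    """
    for attempt in range(1, attempts + 1):
        try:
            return await fetch(asset_id)
        except (OSError, asyncio.TimeoutError) as exc:
            logger.warning("Attempt %d/%d for %s failed: %s",
                           attempt, attempts, asset_id, exc)
            if attempt < attempts:
                await asyncio.sleep(base_delay * 2 ** (attempt - 1))
    logger.error("Giving up on %s after %d attempts", asset_id, attempts)
    return None


async def _demo():
    calls = 0

    async def flaky_fetch(asset_id):
        # Simulated download that fails twice, then succeeds.
        nonlocal calls
        calls += 1
        if calls < 3:
            raise ConnectionError("simulated network failure")
        return f"bytes-for-{asset_id}"

    result = await fetch_with_retries(flaky_fetch, "42", attempts=3, base_delay=0)
    return result, calls


retry_result, retry_calls = asyncio.run(_demo())
print(retry_result, retry_calls)  # bytes-for-42 3
```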
## Performance
- **Concurrent Downloads**: Configurable concurrency (default: 5)
- **Connection Pooling**: Efficient HTTP connection reuse
- **Chunked Downloads**: Memory-efficient large file handling
- **Progress Tracking**: Real-time feedback on download progress
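
Configurable concurrency is commonly implemented with an `asyncio.Semaphore`. The sketch below uses a fake download coroutine so it runs without network access; `download_all` and `fake_fetch` are illustrative names, not the script's own.

```python
import asyncio


async def download_all(asset_ids, fetch, max_concurrent=5):
    """Run `fetch` for every asset with at most `max_concurrent` in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(asset_id):
        async with semaphore:
            return await fetch(asset_id)

    # return_exceptions=True keeps one failure from cancelling the rest.
    return await asyncio.gather(*(bounded(a) for a in asset_ids),
                                return_exceptions=True)


async def _demo():
    active = 0
    peak = 0

    async def fake_fetch(asset_id):
        # Stand-in for a real download; tracks how many run at once.
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)
        active -= 1
        return asset_id

    results = await download_all(range(20), fake_fetch, max_concurrent=5)
    return results, peak


results, peak = asyncio.run(_demo())
print(len(results), peak)  # 20 5
```

Raising `--max-concurrent` increases throughput only until the server or the network becomes the bottleneck, so the limit is worth tuning per API.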
## Troubleshooting
### Common Issues
1. **"No assets found"**: Check your list endpoint URL and response format
2. **"Failed to fetch asset list"**: Verify API URL and network connectivity
3. **"Content type is not an image"**: API might be returning JSON instead of image data
4. **Permission errors**: Check write permissions for the output directory
### Debug Mode
For detailed debugging, you can modify the logging level:
```python
logging.basicConfig(level=logging.DEBUG)
```
## License
This script is provided as-is for educational and personal use.
## Contributing
Feel free to submit issues and enhancement requests!