# Image Downloader Script

A Python script to download images from a REST API that provides endpoints for listing assets and downloading them in full resolution.

## Features

- **Concurrent Downloads**: Download multiple images simultaneously for better performance
- **Error Handling**: Robust error handling with detailed logging
- **Progress Tracking**: Real-time progress bar with download statistics
- **Resume Support**: Skip already downloaded files
- **Flexible API Integration**: Supports various API response formats
- **Filename Sanitization**: Automatically handles invalid characters in filenames
- **File Timestamps**: Preserves original file modification dates from API

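To illustrate the filename sanitization feature, here is a minimal sketch. The helper name `sanitize_filename`, the character set it replaces, and the `"unnamed"` fallback are assumptions for illustration; the script's actual rules may differ.

```python
import re

# Characters that are invalid in Windows filenames, plus ASCII control characters.
# This set is an assumption; the script may use a different one.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize_filename(name: str, replacement: str = "_") -> str:
    """Return a filesystem-safe version of `name` (hypothetical helper)."""
    cleaned = _INVALID_CHARS.sub(replacement, name)
    # Strip surrounding whitespace, then trailing dots and spaces,
    # which Windows does not allow at the end of a filename.
    cleaned = cleaned.strip().rstrip(". ")
    # Fall back to a placeholder if nothing usable is left.
    return cleaned or "unnamed"
```

For example, `sanitize_filename("a/b:c.jpg")` yields `a_b_c.jpg`.
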
## Installation

1. Clone or download this repository
2. Install the required dependencies:

```bash
pip install -r requirements.txt
```

## Usage

### Basic Usage

```bash
python image_downloader.py \
  --api-url "https://api.example.com" \
  --list-endpoint "/assets" \
  --download-endpoint "/download" \
  --output-dir "./images" \
  --api-key "your_api_key_here"
```

### Advanced Usage

```bash
python image_downloader.py \
  --api-url "https://api.example.com" \
  --list-endpoint "/assets" \
  --download-endpoint "/download" \
  --output-dir "./images" \
  --max-concurrent 10 \
  --timeout 60 \
  --api-key "your_api_key_here"
```

### Parameters

- `--api-url`: Base URL of the API (required)
- `--list-endpoint`: Endpoint to get the list of assets (required)
- `--download-endpoint`: Endpoint to download individual assets (required)
- `--output-dir`: Directory to save downloaded images (required)
- `--max-concurrent`: Maximum number of concurrent downloads (default: 5)
- `--timeout`: Request timeout in seconds (default: 30)
- `--api-key`: API key for authentication (x-api-key header)
- `--email`: Email for login authentication
- `--password`: Password for login authentication

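The parameters above map naturally onto an `argparse` parser. The following is a sketch of how such a parser might look, not the script's actual code; the function name `build_parser`, the help strings, and the integer type for `--timeout` are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (illustrative sketch)."""
    parser = argparse.ArgumentParser(description="Download images from a REST API")
    # Required connection and output settings.
    parser.add_argument("--api-url", required=True, help="Base URL of the API")
    parser.add_argument("--list-endpoint", required=True, help="Endpoint listing assets")
    parser.add_argument("--download-endpoint", required=True, help="Endpoint for downloads")
    parser.add_argument("--output-dir", required=True, help="Directory for saved images")
    # Optional tuning flags with the documented defaults.
    parser.add_argument("--max-concurrent", type=int, default=5)
    parser.add_argument("--timeout", type=int, default=30)
    # Optional credentials; all default to None when omitted.
    parser.add_argument("--api-key")
    parser.add_argument("--email")
    parser.add_argument("--password")
    return parser


# Parse the flags from the basic usage example, minus the API key.
args = build_parser().parse_args([
    "--api-url", "https://api.example.com",
    "--list-endpoint", "/assets",
    "--download-endpoint", "/download",
    "--output-dir", "./images",
])
```

With only the required flags supplied, `args.max_concurrent` is `5` and `args.timeout` is `30`.
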
## Authentication

The script supports two authentication methods:

### API Key Authentication
- Uses the `x-api-key` header for the list endpoint
- Uses the `key` parameter for the download endpoint
- Configure with the `--api-key` parameter or `api_key` in the config file

### Login Authentication
- Logs in via the `/v1/auth/login` endpoint
- Uses the session token for the list endpoint
- Uses the `key` parameter for the download endpoint
- Configure with the `--email` and `--password` parameters or in the config file

**Note**: Only one authentication method should be used at a time. If both are supplied, the API key takes precedence over login credentials.

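The precedence rule in the note can be sketched as a small selection function. The names `choose_auth_method` and `list_request_headers` are hypothetical, and the `"none"` result for missing credentials is an assumption about behaviour the README does not describe.

```python
from typing import Dict, Optional


def choose_auth_method(api_key: Optional[str],
                       email: Optional[str],
                       password: Optional[str]) -> str:
    """Pick an authentication method; the API key wins if both are given."""
    if api_key:
        return "api_key"
    # Login needs both halves of the credential pair.
    if email and password:
        return "login"
    # Assumption: no usable credentials means no authentication.
    return "none"


def list_request_headers(api_key: Optional[str]) -> Dict[str, str]:
    """Headers for the list endpoint when API key authentication is used."""
    return {"x-api-key": api_key} if api_key else {}
```

How the session token from a login is attached to list requests is not specified here, so the sketch leaves that part out.
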
## API Integration

The script is designed to work with REST APIs that follow these patterns:

### List Endpoint
The list endpoint should return a JSON response with asset information. The script supports these common formats:

```json
// Array of assets
[
  {"id": "1", "filename": "image1.jpg", "url": "..."},
  {"id": "2", "filename": "image2.png", "url": "..."}
]

// Object with data array
{
  "data": [
    {"id": "1", "filename": "image1.jpg"},
    {"id": "2", "filename": "image2.png"}
  ]
}

// Object with results array
{
  "results": [
    {"id": "1", "filename": "image1.jpg"},
    {"id": "2", "filename": "image2.png"}
  ]
}
```

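One way to handle the three response shapes above is to normalize them into a plain list. This is a sketch under the assumption that only a bare array, a `data` array, or a `results` array needs to be recognized; the helper name `extract_assets` is hypothetical.

```python
from typing import Any, List


def extract_assets(payload: Any) -> List[dict]:
    """Normalize a decoded list-endpoint response into a list of assets."""
    # Format 1: the response is already an array of assets.
    if isinstance(payload, list):
        return payload
    # Formats 2 and 3: the array is wrapped under "data" or "results".
    if isinstance(payload, dict):
        for key in ("data", "results"):
            value = payload.get(key)
            if isinstance(value, list):
                return value
    # Unknown shape: report no assets rather than raising.
    return []
```

Returning an empty list for unrecognized shapes is a design choice for the sketch; it lets the caller report "No assets found" instead of crashing.
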
### Download Endpoint
The download endpoint should accept an asset ID and return the image file. Common patterns:

- `GET /download/{asset_id}`
- `GET /assets/{asset_id}/download`
- `GET /images/{asset_id}`

**ParentZone API Format:**
- `GET /v1/media/{asset_id}/full?key={api_key}&u={updated_timestamp}`

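The ParentZone URL pattern above can be assembled with the standard library. This is a sketch only: `build_download_url` is a hypothetical name, and URL-encoding the `key` and `u` values is an assumption about what the API accepts.

```python
from urllib.parse import quote, urlencode


def build_download_url(api_url: str, download_endpoint: str,
                       asset_id: str, key: str, updated: str) -> str:
    """Build a URL of the form {base}/{asset_id}/full?key=...&u=..."""
    # Join base URL and endpoint without doubling or dropping slashes.
    base = api_url.rstrip("/") + "/" + download_endpoint.strip("/")
    # Percent-encode the asset ID so it is safe as a single path segment.
    path = f"{base}/{quote(str(asset_id), safe='')}/full"
    # Encode the query parameters named in the documented pattern.
    query = urlencode({"key": key, "u": updated})
    return f"{path}?{query}"
```

Called with `"https://api.parentzone.me"`, `"/v1/media"`, `"abc123"`, `"KEY"`, and `"1700000000"`, it returns `https://api.parentzone.me/v1/media/abc123/full?key=KEY&u=1700000000`.
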
### Asset Object Fields

The script looks for these fields in asset objects:

**Required for identification:**
- `id`, `asset_id`, `image_id`, `file_id`, `uuid`, or `key`

**Optional for better filenames:**
- `fileName`: Preferred filename (ParentZone API)
- `filename`: Alternative filename field
- `name`: Alternative name
- `title`: Display title
- `mimeType`: MIME type for proper file extension (ParentZone API)
- `content_type`: Alternative MIME type field

**Required for ParentZone API downloads:**
- `updated`: Timestamp used in download URL parameter and file modification time

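The field lookup described above could work roughly as follows. This is a sketch: the lookup order within each group follows the order the fields are listed in, which is an assumption, and the `EXTENSIONS` table and helper names are illustrative only.

```python
import os
from typing import Any, Optional, Sequence

ID_FIELDS = ("id", "asset_id", "image_id", "file_id", "uuid", "key")
NAME_FIELDS = ("fileName", "filename", "name", "title")
MIME_FIELDS = ("mimeType", "content_type")
# Small illustrative mapping; a real script may cover more types.
EXTENSIONS = {"image/jpeg": ".jpg", "image/png": ".png",
              "image/gif": ".gif", "image/webp": ".webp"}


def first_present(asset: dict, fields: Sequence[str]) -> Optional[Any]:
    """Return the first non-empty value among `fields`, or None."""
    for field in fields:
        value = asset.get(field)
        if value not in (None, ""):
            return value
    return None


def asset_filename(asset: dict) -> str:
    """Derive a filename from an asset object (hypothetical helper)."""
    asset_id = first_present(asset, ID_FIELDS)
    if asset_id is None:
        raise ValueError("asset has no recognizable identifier field")
    # Prefer an explicit name; fall back to the identifier.
    name = str(first_present(asset, NAME_FIELDS) or asset_id)
    # Only append an extension when the name does not already have one.
    if not os.path.splitext(name)[1]:
        name += EXTENSIONS.get(first_present(asset, MIME_FIELDS), "")
    return name
```

An asset such as `{"uuid": "u-7", "mimeType": "image/png"}` would be saved as `u-7.png` under these assumptions.
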
## Examples

### Example 1: ParentZone API with API Key
```bash
python image_downloader.py \
  --api-url "https://api.parentzone.me" \
  --list-endpoint "/v1/gallery" \
  --download-endpoint "/v1/media" \
  --output-dir "./parentzone_images" \
  --api-key "your_api_key_here"
```

### Example 2: ParentZone API with Login
```bash
python image_downloader.py \
  --api-url "https://api.parentzone.me" \
  --list-endpoint "/v1/gallery" \
  --download-endpoint "/v1/media" \
  --output-dir "./parentzone_images" \
  --email "your_email@example.com" \
  --password "your_password_here"
```

### Example 3: API with Other Authentication
The script supports API key authentication via the `--api-key` parameter. For other authentication methods, you can modify the script to include custom headers:

```python
# In the get_asset_list method, add headers:
headers = {
    'Authorization': 'Bearer your_token_here',
    'Content-Type': 'application/json'
}
async with session.get(url, headers=headers, timeout=self.timeout) as response:
    ...  # existing response handling stays unchanged
```

### Example 4: Custom Response Format
If your API returns a different format, you can modify the `get_asset_list` method:

```python
# For an API that returns: {"images": [...]}
if 'images' in data:
    assets = data['images']
```

## Output

The script creates:

1. **Downloaded Images**: All images are saved to the specified output directory with original modification timestamps
2. **Log File**: `download.log` in the output directory with detailed information
3. **Progress Display**: Real-time progress bar showing:
   - Total assets
   - Successfully downloaded
   - Failed downloads
   - Skipped files (already exist)

### File Timestamps

The downloader automatically sets each file's modification time to match the `updated` timestamp from the API response. This preserves the original file dates and helps with:

- **File Organization**: Files can be sorted by their original update dates
- **Backup Systems**: Backup tools can properly identify changed files
- **Media Libraries**: Media management software can display correct dates
- **Data Integrity**: Maintains the temporal relationship between files

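Setting the modification time can be done with `os.utime`. The sketch below assumes `updated` is an ISO 8601 string such as `2024-01-02T03:04:05Z`; the actual format returned by the API is not documented here, and `apply_updated_timestamp` is a hypothetical name.

```python
import os
from datetime import datetime, timezone


def apply_updated_timestamp(path: str, updated: str) -> float:
    """Set the access and modification times of `path` from an ISO 8601 string."""
    # datetime.fromisoformat rejects a trailing "Z" before Python 3.11,
    # so rewrite it as an explicit UTC offset first.
    moment = datetime.fromisoformat(updated.replace("Z", "+00:00"))
    # Assumption: timestamps without an offset are treated as UTC.
    if moment.tzinfo is None:
        moment = moment.replace(tzinfo=timezone.utc)
    mtime = moment.timestamp()
    os.utime(path, (mtime, mtime))
    return mtime
```

The function returns the epoch value it applied, which makes it easy to log or verify against `os.path.getmtime`.
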
## Error Handling

The script handles various error scenarios:

- **Network Errors**: Retries and continues with other downloads
- **Invalid Responses**: Logs errors and continues
- **File System Errors**: Creates directories and handles permission issues
- **API Errors**: Logs HTTP errors and continues

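The retry behaviour for network errors might look like the sketch below. The retry count, the exponential backoff, and the helper name `with_retries` are all assumptions; the README does not say how many times or how often the script retries.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_retries(make_attempt: Callable[[], Awaitable[T]],
                       attempts: int = 3,
                       base_delay: float = 0.5) -> T:
    """Await `make_attempt()` up to `attempts` times on network-style errors."""
    last_error: Exception = RuntimeError("no attempts were made")
    for attempt in range(attempts):
        try:
            return await make_attempt()
        except (OSError, asyncio.TimeoutError) as exc:
            last_error = exc
            # Back off exponentially, but do not sleep after the final try.
            if attempt < attempts - 1:
                await asyncio.sleep(base_delay * 2 ** attempt)
    # Every attempt failed: surface the last error to the caller,
    # which can log it and move on to the next download.
    raise last_error
```

Passing a factory rather than a coroutine object matters here, because a coroutine can only be awaited once and each retry needs a fresh one.
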
## Performance

- **Concurrent Downloads**: Configurable concurrency (default: 5)
- **Connection Pooling**: Efficient HTTP connection reuse
- **Chunked Downloads**: Memory-efficient large file handling
- **Progress Tracking**: Real-time feedback on download progress

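Configurable concurrency is commonly implemented with an `asyncio.Semaphore`. The sketch below shows that technique with a hypothetical `run_limited` helper; whether the script uses a semaphore or another mechanism is an assumption.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterable, List


async def run_limited(jobs: Iterable[Callable[[], Awaitable[Any]]],
                      max_concurrent: int = 5) -> List[Any]:
    """Run coroutine factories with at most `max_concurrent` in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(job: Callable[[], Awaitable[Any]]) -> Any:
        # Each job waits here until one of the slots is free.
        async with semaphore:
            return await job()

    # return_exceptions=True keeps one failed download from cancelling the rest;
    # failures appear as exception objects in the result list.
    return await asyncio.gather(*(guarded(job) for job in jobs),
                                return_exceptions=True)
```

Results come back in the same order as the jobs, so a caller can pair each result or exception with its asset.
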
## Troubleshooting

### Common Issues

1. **"No assets found"**: Check your list endpoint URL and response format
2. **"Failed to fetch asset list"**: Verify API URL and network connectivity
3. **"Content type is not an image"**: API might be returning JSON instead of image data
4. **Permission errors**: Check write permissions for the output directory

### Debug Mode

For detailed debugging, you can modify the logging level:

```python
logging.basicConfig(level=logging.DEBUG)
```

## License

This script is provided as-is for educational and personal use.

## Contributing

Feel free to submit issues and enhancement requests!