first commit
242
README.md
Normal file
@@ -0,0 +1,242 @@
# Image Downloader Script

A Python script to download images from a REST API that provides endpoints for listing assets and downloading them in full resolution.

## Features

- **Concurrent Downloads**: Download multiple images simultaneously for better performance
- **Error Handling**: Robust error handling with detailed logging
- **Progress Tracking**: Real-time progress bar with download statistics
- **Resume Support**: Skip already downloaded files
- **Flexible API Integration**: Supports various API response formats
- **Filename Sanitization**: Automatically handles invalid characters in filenames
- **File Timestamps**: Preserves original file modification dates from API

## Installation

1. Clone or download this repository
2. Install the required dependencies:

```bash
pip install -r requirements.txt
```

## Usage

### Basic Usage

```bash
python image_downloader.py \
  --api-url "https://api.example.com" \
  --list-endpoint "/assets" \
  --download-endpoint "/download" \
  --output-dir "./images" \
  --api-key "your_api_key_here"
```

### Advanced Usage

```bash
python image_downloader.py \
  --api-url "https://api.example.com" \
  --list-endpoint "/assets" \
  --download-endpoint "/download" \
  --output-dir "./images" \
  --max-concurrent 10 \
  --timeout 60 \
  --api-key "your_api_key_here"
```

### Parameters

- `--api-url`: Base URL of the API (required)
- `--list-endpoint`: Endpoint to get the list of assets (required)
- `--download-endpoint`: Endpoint to download individual assets (required)
- `--output-dir`: Directory to save downloaded images (required)
- `--max-concurrent`: Maximum number of concurrent downloads (default: 5)
- `--timeout`: Request timeout in seconds (default: 30)
- `--api-key`: API key for authentication (x-api-key header)
- `--email`: Email for login authentication
- `--password`: Password for login authentication
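
The flags listed above could be declared with `argparse` roughly as follows. This is a minimal sketch, not the script's actual parser: the function name `build_parser` and the help strings are assumptions, and only the flag names, required flags, and defaults come from the list.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(description="Download images from a REST API")
    # Required flags
    parser.add_argument("--api-url", required=True, help="Base URL of the API")
    parser.add_argument("--list-endpoint", required=True, help="Endpoint that lists assets")
    parser.add_argument("--download-endpoint", required=True, help="Endpoint that serves assets")
    parser.add_argument("--output-dir", required=True, help="Directory for downloaded images")
    # Optional tuning flags with the documented defaults
    parser.add_argument("--max-concurrent", type=int, default=5, help="Maximum concurrent downloads")
    parser.add_argument("--timeout", type=int, default=30, help="Request timeout in seconds")
    # Optional authentication flags
    parser.add_argument("--api-key", help="API key for authentication")
    parser.add_argument("--email", help="Email for login authentication")
    parser.add_argument("--password", help="Password for login authentication")
    return parser


# Parse the same arguments as the basic usage example, minus the API key.
args = build_parser().parse_args([
    "--api-url", "https://api.example.com",
    "--list-endpoint", "/assets",
    "--download-endpoint", "/download",
    "--output-dir", "./images",
])
print(args.max_concurrent, args.timeout)
```

`argparse` converts dashes to underscores, so `--max-concurrent` is read back as `args.max_concurrent`.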

## Authentication

The script supports two authentication methods:

### API Key Authentication
- Uses the `x-api-key` header for the list endpoint
- Uses the `key` parameter for the download endpoint
- Configure with the `--api-key` parameter or `api_key` in the config file

### Login Authentication
- Logs in through the `/v1/auth/login` endpoint
- Uses the session token for the list endpoint
- Uses the `key` parameter for the download endpoint
- Configure with the `--email` and `--password` parameters or in the config file

**Note**: Use only one authentication method at a time. If both are supplied, the API key takes precedence over login credentials.
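
The precedence rule can be sketched as a small selection function. The names `choose_auth_method` and `api_key_headers` are hypothetical and the script's real logic may differ. How the login session token is attached to requests is not documented here, so the sketch leaves it out.

```python
from typing import Dict, Optional


def choose_auth_method(api_key: Optional[str],
                       email: Optional[str],
                       password: Optional[str]) -> Optional[str]:
    """Return 'api_key', 'login', or None; the API key wins when both are given."""
    if api_key:
        return "api_key"
    if email and password:
        return "login"
    return None


def api_key_headers(api_key: str) -> Dict[str, str]:
    """Headers for the list endpoint when authenticating with an API key."""
    return {"x-api-key": api_key}


# Both methods supplied: the API key takes precedence.
print(choose_auth_method("secret", "me@example.com", "hunter2"))
# Only login credentials supplied.
print(choose_auth_method(None, "me@example.com", "hunter2"))
```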

## API Integration

The script is designed to work with REST APIs that follow these patterns:

### List Endpoint
The list endpoint should return a JSON response with asset information. The script supports these common formats:

```json
// Array of assets
[
  {"id": "1", "filename": "image1.jpg", "url": "..."},
  {"id": "2", "filename": "image2.png", "url": "..."}
]

// Object with data array
{
  "data": [
    {"id": "1", "filename": "image1.jpg"},
    {"id": "2", "filename": "image2.png"}
  ]
}

// Object with results array
{
  "results": [
    {"id": "1", "filename": "image1.jpg"},
    {"id": "2", "filename": "image2.png"}
  ]
}
```
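
One way to normalize these three shapes into a single list is sketched below. The helper name `extract_assets` is an assumption for illustration and is not necessarily how `get_asset_list` does it.

```python
from typing import Any, Dict, List


def extract_assets(data: Any) -> List[Dict[str, Any]]:
    """Return the asset list from any of the three supported response shapes."""
    # Shape 1: the response is already a bare array of assets.
    if isinstance(data, list):
        return data
    # Shapes 2 and 3: an object wrapping the array under "data" or "results".
    if isinstance(data, dict):
        for key in ("data", "results"):
            value = data.get(key)
            if isinstance(value, list):
                return value
    # Unknown shape: report no assets rather than raising.
    return []


print(len(extract_assets([{"id": "1"}, {"id": "2"}])))
print(len(extract_assets({"data": [{"id": "1"}]})))
print(len(extract_assets({"unexpected": True})))
```

Returning an empty list for unrecognized shapes matches the "No assets found" symptom described under Troubleshooting.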

### Download Endpoint
The download endpoint should accept an asset ID and return the image file. Common patterns:

- `GET /download/{asset_id}`
- `GET /assets/{asset_id}/download`
- `GET /images/{asset_id}`

**ParentZone API Format:**
- `GET /v1/media/{asset_id}/full?key={api_key}&u={updated_timestamp}`
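
As an illustration, the ParentZone-style URL can be assembled with the standard library. The function name `build_download_url` is hypothetical, and percent-encoding the ID and query values is an assumption about sensible behavior rather than something the script is known to do.

```python
from urllib.parse import quote, urlencode


def build_download_url(api_url: str, download_endpoint: str,
                       asset_id: str, api_key: str, updated: str) -> str:
    """Build GET {api_url}{download_endpoint}/{asset_id}/full?key=...&u=..."""
    # Join base and endpoint without doubling or dropping slashes.
    base = api_url.rstrip("/") + "/" + download_endpoint.strip("/")
    # Encode the query string; dict order is preserved (key first, then u).
    query = urlencode({"key": api_key, "u": updated})
    return f"{base}/{quote(str(asset_id), safe='')}/full?{query}"


url = build_download_url("https://api.parentzone.me", "/v1/media",
                         "abc123", "KEY", "1700000000")
print(url)
```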

### Asset Object Fields

The script looks for these fields in asset objects:

**Required for identification:**
- `id`, `asset_id`, `image_id`, `file_id`, `uuid`, or `key`

**Optional for better filenames:**
- `fileName`: Preferred filename (ParentZone API)
- `filename`: Alternative filename field
- `name`: Alternative name
- `title`: Display title
- `mimeType`: MIME type for proper file extension (ParentZone API)
- `content_type`: Alternative MIME type field

**Required for ParentZone API downloads:**
- `updated`: Timestamp used in download URL parameter and file modification time
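
The field lookup described above might look like the following sketch. The helper names, the fallback name `asset_<id>`, and the set of characters treated as invalid are all assumptions. Only the field names and their order come from the lists above.

```python
import mimetypes
import re
from typing import Any, Dict, Optional

ID_FIELDS = ("id", "asset_id", "image_id", "file_id", "uuid", "key")
NAME_FIELDS = ("fileName", "filename", "name", "title")
TYPE_FIELDS = ("mimeType", "content_type")


def get_asset_id(asset: Dict[str, Any]) -> Optional[str]:
    """Return the first identifier field that is present, as a string."""
    for field in ID_FIELDS:
        value = asset.get(field)
        if value is not None and value != "":
            return str(value)
    return None


def get_filename(asset: Dict[str, Any]) -> str:
    """Pick a filename, sanitize it, and add an extension from the MIME type."""
    name = next((str(asset[f]) for f in NAME_FIELDS if asset.get(f)), None)
    if name is None:
        name = f"asset_{get_asset_id(asset)}"  # assumed fallback naming
    # Replace characters that are invalid on common filesystems.
    name = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "_", name).strip()
    # Only guess an extension when the name does not already carry one.
    if "." not in name:
        mime = next((asset[f] for f in TYPE_FIELDS if asset.get(f)), None)
        extension = mimetypes.guess_extension(mime) if mime else None
        name += extension or ""
    return name


print(get_filename({"id": 1, "fileName": "a/b:c.jpg"}))
print(get_filename({"id": 7, "title": "Cat", "mimeType": "image/png"}))
```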

## Examples

### Example 1: ParentZone API with API Key
```bash
python image_downloader.py \
  --api-url "https://api.parentzone.me" \
  --list-endpoint "/v1/gallery" \
  --download-endpoint "/v1/media" \
  --output-dir "./parentzone_images" \
  --api-key "your_api_key_here"
```

### Example 2: ParentZone API with Login
```bash
python image_downloader.py \
  --api-url "https://api.parentzone.me" \
  --list-endpoint "/v1/gallery" \
  --download-endpoint "/v1/media" \
  --output-dir "./parentzone_images" \
  --email "your_email@example.com" \
  --password "your_password_here"
```

### Example 3: API with Authentication
The script supports API key authentication via the `--api-key` parameter. For other authentication schemes, such as bearer tokens, modify the script to send custom headers:

```python
# In the get_asset_list method, add headers:
headers = {
    'Authorization': 'Bearer your_token_here',
    'Content-Type': 'application/json'
}
async with session.get(url, headers=headers, timeout=self.timeout) as response:
    ...  # the existing response handling continues here
```

### Example 4: Custom Response Format
If your API returns a different format, you can modify the `get_asset_list` method:

```python
# For an API that returns: {"images": [...]}
if 'images' in data:
    assets = data['images']
```

## Output

The script creates:

1. **Downloaded Images**: All images are saved to the specified output directory with original modification timestamps
2. **Log File**: `download.log` in the output directory with detailed information
3. **Progress Display**: Real-time progress bar showing:
   - Total assets
   - Successfully downloaded
   - Failed downloads
   - Skipped files (already exist)

### File Timestamps

The downloader automatically sets each file's modification time to the `updated` timestamp from the API response. This preserves the original file dates and helps with:

- **File Organization**: Files can be sorted by their original creation/update dates
- **Backup Systems**: Backup tools can properly identify changed files
- **Media Libraries**: Media management software can display correct dates
- **Data Integrity**: Maintains the temporal relationship between files
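
Setting the modification time can be done with `os.utime`. The sketch below assumes `updated` is an ISO 8601 string such as `2024-01-02T03:04:05Z` and that values without an offset are UTC. The actual format returned by the API is not documented here, and both function names are hypothetical.

```python
import os
from datetime import datetime, timezone


def parse_updated(updated: str) -> float:
    """Convert an ISO 8601 timestamp string to seconds since the epoch."""
    # datetime.fromisoformat rejects a trailing "Z" before Python 3.11.
    parsed = datetime.fromisoformat(updated.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.timestamp()


def apply_updated_timestamp(path: str, updated: str) -> None:
    """Set both access and modification time of `path` to `updated`."""
    mtime = parse_updated(updated)
    os.utime(path, (mtime, mtime))


print(parse_updated("1970-01-01T00:00:10Z"))
```

Calling `apply_updated_timestamp(path, asset["updated"])` after a file has been fully written would give it the API's date instead of the download time.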

## Error Handling

The script handles various error scenarios:

- **Network Errors**: Retries and continues with other downloads
- **Invalid Responses**: Logs errors and continues
- **File System Errors**: Creates directories and handles permission issues
- **API Errors**: Logs HTTP errors and continues

## Performance

- **Concurrent Downloads**: Configurable concurrency (default: 5)
- **Connection Pooling**: Efficient HTTP connection reuse
- **Chunked Downloads**: Memory-efficient large file handling
- **Progress Tracking**: Real-time feedback on download progress
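
A common way to cap concurrency in `asyncio` code is a semaphore, sketched below with a fake download coroutine so it runs without network access. The names `run_limited` and `fake_download` are hypothetical, and the script may bound concurrency differently. Failures are returned rather than raised so that one bad asset does not stop the others.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterable, List

stats = {"active": 0, "peak": 0}  # tracks how many workers run at once


async def run_limited(items: Iterable[Any],
                      worker: Callable[[Any], Awaitable[Any]],
                      max_concurrent: int = 5) -> List[Any]:
    """Run `worker` over `items`, with at most `max_concurrent` in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(item: Any) -> Any:
        async with semaphore:
            try:
                return await worker(item)
            except Exception as exc:
                return exc  # record the failure and keep going

    # gather preserves the order of the inputs in its results
    return await asyncio.gather(*(guarded(item) for item in items))


async def fake_download(asset_id: int) -> str:
    """Stand-in for a real download; sleeps instead of fetching."""
    stats["active"] += 1
    stats["peak"] = max(stats["peak"], stats["active"])
    await asyncio.sleep(0.01)
    stats["active"] -= 1
    return f"asset_{asset_id}.jpg"


results = asyncio.run(run_limited(range(20), fake_download, max_concurrent=5))
print(len(results), stats["peak"])
```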

## Troubleshooting

### Common Issues

1. **"No assets found"**: Check your list endpoint URL and response format
2. **"Failed to fetch asset list"**: Verify API URL and network connectivity
3. **"Content type is not an image"**: The API may be returning JSON instead of image data
4. **Permission errors**: Check write permissions for the output directory

### Debug Mode

For detailed debugging, set the logging level to `DEBUG`:

```python
logging.basicConfig(level=logging.DEBUG)
```

## License

This script is provided as-is for educational and personal use.

## Contributing

Feel free to submit issues and enhancement requests!