repo restructure
All checks were successful: Build Docker Image / build (push) succeeded in 1m3s

Author: Tudor Sitaru
Date:   2025-10-14 21:58:54 +01:00
Parent: e062b51b4b
Commit: d8637ac2ea

69 changed files with 781 additions and 4710 deletions

View File

@@ -13,19 +13,18 @@ COPY requirements.txt .
 RUN pip install --no-cache-dir -r requirements.txt

 # Copy application files
-COPY *.py ./
-COPY *config.json ./
+COPY src/ ./src/
+COPY config/ ./config/

 # Create output directories
-RUN mkdir -p /app/snapshots /app/logs
+RUN mkdir -p /app/data/snapshots /app/data/logs

 # Copy scripts
-COPY scheduler.sh ./
-COPY startup.sh ./
-RUN chmod +x scheduler.sh startup.sh
+COPY scripts/ ./scripts/
+RUN chmod +x scripts/*.sh

 # Copy cron configuration
-COPY crontab /etc/cron.d/parentzone-downloader
+COPY scripts/crontab /etc/cron.d/parentzone-downloader
 RUN chmod 0644 /etc/cron.d/parentzone-downloader
 RUN crontab /etc/cron.d/parentzone-downloader
@@ -40,7 +39,7 @@ ENV PYTHONPATH=/app
 EXPOSE 8080

 # Expose volume for persistent data
-VOLUME ["/app/snapshots", "/app/logs", "/app/parentzone_images"]
+VOLUME ["/app/data/snapshots", "/app/data/logs", "/app/data/parentzone_images"]

 # Start all services using startup script
-CMD ["./startup.sh"]
+CMD ["./scripts/startup.sh"]

View File

@@ -1,12 +0,0 @@
{
  "api_url": "https://api.parentzone.me",
  "output_dir": "snapshots",
  "api_key": "YOUR_API_KEY_HERE",
  "email": "your-email@example.com",
  "password": "your-password",
  "date_from": "2021-01-01",
  "date_to": null,
  "type_ids": [15],
  "max_pages": null,
  "debug_mode": false
}

View File

@@ -13,17 +13,16 @@ services:
       # Timezone for cron scheduling
       - TZ=${TZ:-UTC}
     volumes:
-      # Persistent storage for snapshots and logs
-      - ./snapshots:/app/snapshots
-      - ./logs:/app/logs
-      # Mount your config file
-      - ./config.json:/app/config.json:ro
+      # Persistent storage for data
+      - ./data/snapshots:/app/data/snapshots
+      - ./data/logs:/app/data/logs
+      - ./data/parentzone_images:/app/data/parentzone_images
+      # Mount configuration directory
+      - ./config:/app/config:ro
     ports:
       - "8080:8080"
     restart: unless-stopped

 volumes:
-  snapshots:
-    driver: local
-  logs:
+  data:
     driver: local

docs/README.md (new file, 242 lines)
View File

@@ -0,0 +1,242 @@
# Image Downloader Script
A Python script to download images from a REST API that provides endpoints for listing assets and downloading them in full resolution.
## Features
- **Concurrent Downloads**: Download multiple images simultaneously for better performance
- **Error Handling**: Robust error handling with detailed logging
- **Progress Tracking**: Real-time progress bar with download statistics
- **Resume Support**: Skip already downloaded files
- **Flexible API Integration**: Supports various API response formats
- **Filename Sanitization**: Automatically handles invalid characters in filenames
- **File Timestamps**: Preserves original file modification dates from API
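As an illustration of what the filename sanitization above has to handle, here is a minimal sketch (the script's actual rules may differ):

```python
import re

def sanitize_filename(name: str, max_length: int = 255) -> str:
    """Replace characters that are invalid on common filesystems."""
    # Strip path separators, control characters, and other reserved characters
    cleaned = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "_", name)
    # Collapse runs of underscores and trim trailing/leading spaces and dots
    cleaned = re.sub(r"_+", "_", cleaned).strip(" .")
    return cleaned[:max_length] or "unnamed"
```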
## Installation
1. Clone or download this repository
2. Install the required dependencies:
```bash
pip install -r requirements.txt
```
## Usage
### Basic Usage
```bash
python image_downloader.py \
  --api-url "https://api.example.com" \
  --list-endpoint "/assets" \
  --download-endpoint "/download" \
  --output-dir "./images" \
  --api-key "your_api_key_here"
```
### Advanced Usage
```bash
python image_downloader.py \
  --api-url "https://api.example.com" \
  --list-endpoint "/assets" \
  --download-endpoint "/download" \
  --output-dir "./images" \
  --max-concurrent 10 \
  --timeout 60 \
  --api-key "your_api_key_here"
```
### Parameters
- `--api-url`: Base URL of the API (required)
- `--list-endpoint`: Endpoint to get the list of assets (required)
- `--download-endpoint`: Endpoint to download individual assets (required)
- `--output-dir`: Directory to save downloaded images (required)
- `--max-concurrent`: Maximum number of concurrent downloads (default: 5)
- `--timeout`: Request timeout in seconds (default: 30)
- `--api-key`: API key for authentication (x-api-key header)
- `--email`: Email for login authentication
- `--password`: Password for login authentication
## Authentication
The script supports two authentication methods:
### API Key Authentication
- Uses `x-api-key` header for list endpoint
- Uses `key` parameter for download endpoint
- Configure with `--api-key` parameter or `api_key` in config file
### Login Authentication
- Performs login to `/v1/auth/login` endpoint
- Uses session token for list endpoint
- Uses `key` parameter for download endpoint
- Configure with `--email` and `--password` parameters or in config file
**Note**: Only one authentication method should be used at a time. API key takes precedence over login credentials.
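For reference, a config file carrying the same options as the CLI parameters looks like this (field names taken from the sample config shipped in this repository; values are placeholders):

```json
{
  "api_url": "https://api.parentzone.me",
  "output_dir": "snapshots",
  "api_key": "YOUR_API_KEY_HERE",
  "email": "your-email@example.com",
  "password": "your-password",
  "date_from": "2021-01-01",
  "date_to": null,
  "type_ids": [15],
  "max_pages": null,
  "debug_mode": false
}
```

Set either `api_key` or the `email`/`password` pair, not both.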
## API Integration
The script is designed to work with REST APIs that follow these patterns:
### List Endpoint
The list endpoint should return a JSON response with asset information. The script supports these common formats:
```json
// Array of assets
[
  {"id": "1", "filename": "image1.jpg", "url": "..."},
  {"id": "2", "filename": "image2.png", "url": "..."}
]

// Object with a "data" array
{
  "data": [
    {"id": "1", "filename": "image1.jpg"},
    {"id": "2", "filename": "image2.png"}
  ]
}

// Object with a "results" array
{
  "results": [
    {"id": "1", "filename": "image1.jpg"},
    {"id": "2", "filename": "image2.png"}
  ]
}
```
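A small helper that normalizes the three response shapes above into a plain list can be sketched as follows (illustrative; the script's own parsing may differ):

```python
def extract_assets(data):
    """Return the asset list from any of the supported response shapes."""
    if isinstance(data, list):
        return data  # bare array of assets
    if isinstance(data, dict):
        for key in ("data", "results"):
            if key in data:
                return data[key]  # wrapped array
    return []  # unrecognized shape: nothing to download
```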
### Download Endpoint
The download endpoint should accept an asset ID and return the image file. Common patterns:
- `GET /download/{asset_id}`
- `GET /assets/{asset_id}/download`
- `GET /images/{asset_id}`
**ParentZone API Format:**
- `GET /v1/media/{asset_id}/full?key={api_key}&u={updated_timestamp}`
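Assembling that URL can be sketched like this (the function name and signature are illustrative, not the script's actual API):

```python
from urllib.parse import urlencode

def build_download_url(api_url: str, asset_id: str, api_key: str, updated: str) -> str:
    """Build a full-resolution media URL in the ParentZone format shown above."""
    query = urlencode({"key": api_key, "u": updated})  # key first, then u
    return f"{api_url}/v1/media/{asset_id}/full?{query}"
```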
### Asset Object Fields
The script looks for these fields in asset objects:
**Required for identification:**
- `id`, `asset_id`, `image_id`, `file_id`, `uuid`, or `key`
**Optional for better filenames:**
- `fileName`: Preferred filename (ParentZone API)
- `filename`: Alternative filename field
- `name`: Alternative name
- `title`: Display title
- `mimeType`: MIME type for proper file extension (ParentZone API)
- `content_type`: Alternative MIME type field
**Required for ParentZone API downloads:**
- `updated`: Timestamp used in download URL parameter and file modification time
## Examples
### Example 1: ParentZone API with API Key
```bash
python image_downloader.py \
  --api-url "https://api.parentzone.me" \
  --list-endpoint "/v1/gallery" \
  --download-endpoint "/v1/media" \
  --output-dir "./parentzone_images" \
  --api-key "your_api_key_here"
```
### Example 2: ParentZone API with Login
```bash
python image_downloader.py \
  --api-url "https://api.parentzone.me" \
  --list-endpoint "/v1/gallery" \
  --download-endpoint "/v1/media" \
  --output-dir "./parentzone_images" \
  --email "your_email@example.com" \
  --password "your_password_here"
```
### Example 3: API with Authentication
The script supports API key authentication via the `--api-key` parameter. For other authentication schemes, you can modify the script to send custom headers:
```python
# In the get_asset_list method, add headers:
headers = {
    'Authorization': 'Bearer your_token_here',
    'Content-Type': 'application/json'
}
async with session.get(url, headers=headers, timeout=self.timeout) as response:
    data = await response.json()
```
### Example 4: Custom Response Format
If your API returns a different format, you can modify the `get_asset_list` method:
```python
# For an API that returns: {"images": [...]}
if 'images' in data:
    assets = data['images']
```
## Output
The script creates:
1. **Downloaded Images**: All images are saved to the specified output directory with original modification timestamps
2. **Log File**: `download.log` in the output directory with detailed information
3. **Progress Display**: Real-time progress bar showing:
- Total assets
- Successfully downloaded
- Failed downloads
- Skipped files (already exist)
### File Timestamps
The downloader automatically sets the file modification time to match the `updated` timestamp from the API response. This preserves the original file dates and helps with:
- **File Organization**: Files are sorted by their original creation/update dates
- **Backup Systems**: Backup tools can properly identify changed files
- **Media Libraries**: Media management software can display correct dates
- **Data Integrity**: Maintains the temporal relationship between files
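Applying the `updated` value to a file on disk can be sketched like this (assumes the timestamp parses as ISO 8601; `apply_updated_timestamp` is an illustrative name, not the script's API):

```python
import os
from datetime import datetime, timezone

def apply_updated_timestamp(path: str, updated: str) -> None:
    """Set the file's access/modification times to the API's `updated` value."""
    dt = datetime.fromisoformat(updated)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive timestamps are UTC
    ts = dt.timestamp()
    os.utime(path, (ts, ts))  # (atime, mtime)
```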
## Error Handling
The script handles various error scenarios:
- **Network Errors**: Retries and continues with other downloads
- **Invalid Responses**: Logs errors and continues
- **File System Errors**: Creates directories and handles permission issues
- **API Errors**: Logs HTTP errors and continues
## Performance
- **Concurrent Downloads**: Configurable concurrency (default: 5)
- **Connection Pooling**: Efficient HTTP connection reuse
- **Chunked Downloads**: Memory-efficient large file handling
- **Progress Tracking**: Real-time feedback on download progress
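The chunked-download point above can be illustrated with the aiohttp streaming pattern (a sketch, not the script's actual code; `session` is assumed to be an `aiohttp.ClientSession`):

```python
CHUNK_SIZE = 64 * 1024  # read 64 KiB at a time to bound memory use

async def download_to_file(session, url: str, path: str) -> None:
    """Stream a response body to disk instead of buffering it all in memory."""
    async with session.get(url) as response:
        response.raise_for_status()
        with open(path, "wb") as fh:
            # iter_chunked yields the body in fixed-size pieces
            async for chunk in response.content.iter_chunked(CHUNK_SIZE):
                fh.write(chunk)
```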
## Troubleshooting
### Common Issues
1. **"No assets found"**: Check your list endpoint URL and response format
2. **"Failed to fetch asset list"**: Verify API URL and network connectivity
3. **"Content type is not an image"**: API might be returning JSON instead of image data
4. **Permission errors**: Check write permissions for the output directory
### Debug Mode
For detailed debugging, you can modify the logging level:
```python
logging.basicConfig(level=logging.DEBUG)
```
## License
This script is provided as-is for educational and personal use.
## Contributing
Feel free to submit issues and enhancement requests!

View File

@@ -12,12 +12,12 @@ import sys
 import os
 from pathlib import Path

-# Add the current directory to the path so we can import modules
-sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
+# Add the parent directory to the path so we can import modules
+sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

-from auth_manager import AuthManager
-from asset_tracker import AssetTracker
-from image_downloader import ImageDownloader
+from src.auth_manager import AuthManager
+from src.asset_tracker import AssetTracker
+from src.image_downloader import ImageDownloader


 async def demo_asset_tracking():
@@ -32,8 +32,7 @@ async def demo_asset_tracking():
     # Setup logging
     logging.basicConfig(
-        level=logging.INFO,
-        format='%(asctime)s - %(levelname)s - %(message)s'
+        level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s"
     )

     try:
@@ -78,7 +77,7 @@ async def demo_asset_tracking():
         email=email,
         password=password,
         track_assets=True,
-        max_concurrent=3
+        max_concurrent=3,
     )

     # First download run
@@ -104,12 +103,7 @@ async def demo_asset_tracking():
     print("Running again - should detect no new assets...")

     # Reset stats for second run
-    downloader.stats = {
-        'total': 0,
-        'successful': 0,
-        'failed': 0,
-        'skipped': 0
-    }
+    downloader.stats = {"total": 0, "successful": 0, "failed": 0, "skipped": 0}

     # Second download run
     await downloader.download_all_assets()
@@ -118,7 +112,7 @@ async def demo_asset_tracking():
     print(f" Assets to download: {downloader.stats['total']}")
     print(f" New downloads: {downloader.stats['successful']}")

-    if downloader.stats['total'] == 0:
+    if downloader.stats["total"] == 0:
         print(" ✅ Perfect! No new assets found - all are up to date!")
     else:
         print(f" Downloaded: {downloader.stats['successful']}")
@@ -149,6 +143,7 @@ async def demo_asset_tracking():
     except Exception as e:
         print(f"❌ Demo failed with error: {e}")
         import traceback
+
         traceback.print_exc()
         return False
@@ -177,7 +172,7 @@ def show_usage():

 async def main():
     """Main function."""
-    if len(sys.argv) > 1 and sys.argv[1] in ['--help', '-h']:
+    if len(sys.argv) > 1 and sys.argv[1] in ["--help", "-h"]:
         show_usage()
         return 0

View File

@@ -1,9 +1,9 @@
 # ParentZone Downloaders Cron Schedule
 # Run both downloaders daily at 2:00 AM
-0 2 * * * /app/scheduler.sh >> /var/log/cron.log 2>&1
+0 2 * * * /app/scripts/scheduler.sh >> /var/log/cron.log 2>&1

 # Keep cron log file from growing too large (weekly cleanup)
 0 3 * * 0 find /var/log -name "cron.log" -size +100M -exec truncate -s 50M {} \; 2>/dev/null || true

 # Cleanup old snapshot files (keep last 90 days)
-30 3 * * 0 find /app/snapshots -name "*.html" -mtime +90 -delete 2>/dev/null || true
+30 3 * * 0 find /app/data/snapshots -name "*.html" -mtime +90 -delete 2>/dev/null || true

View File

@@ -3,10 +3,10 @@
 # ParentZone Downloaders Daily Scheduler
 # This script runs both the config downloader and snapshot downloader

-LOG_DIR="/app/logs"
+LOG_DIR="/app/data/logs"
 LOG_FILE="$LOG_DIR/scheduler_$(date +%Y%m%d).log"
-SNAPSHOT_CONFIG_FILE="/app/snapshot_config.json"
-ASSET_CONFIG_FILE="/app/parentzone_config.json"
+SNAPSHOT_CONFIG_FILE="/app/config/snapshot_config.json"
+ASSET_CONFIG_FILE="/app/config/parentzone_config.json"

 # Create log directory if it doesn't exist
 mkdir -p "$LOG_DIR"
@@ -54,7 +54,7 @@ cd /app
 # Run config-based asset downloader
 if [ "$SKIP_ASSET_DOWNLOADER" = false ]; then
-    run_with_logging "python3 config_downloader.py --config $ASSET_CONFIG_FILE" "Config Asset Downloader"
+    run_with_logging "python3 src/config_downloader.py --config $ASSET_CONFIG_FILE" "Config Asset Downloader"
     asset_result=$?
 else
     log_message "SKIPPED: Config Asset Downloader (configuration file not found)"
@@ -62,15 +62,15 @@ else
 fi

 # Run config-based snapshot downloader
-run_with_logging "python3 config_snapshot_downloader.py --config $SNAPSHOT_CONFIG_FILE" "Config Snapshot Downloader"
+run_with_logging "python3 src/config_snapshot_downloader.py --config $SNAPSHOT_CONFIG_FILE" "Config Snapshot Downloader"
 config_result=$?

 # Run regular snapshot downloader with environment variables
 if [ -n "$API_KEY" ]; then
-    run_with_logging "python3 snapshot_downloader.py --api-key $API_KEY --output-dir snapshots" "Snapshot Downloader (API Key)"
+    run_with_logging "python3 src/snapshot_downloader.py --api-key $API_KEY --output-dir data/snapshots" "Snapshot Downloader (API Key)"
     snapshot_result=$?
 elif [ -n "$EMAIL" ] && [ -n "$PASSWORD" ]; then
-    run_with_logging "python3 snapshot_downloader.py --email $EMAIL --password $PASSWORD --output-dir snapshots" "Snapshot Downloader (Email/Password)"
+    run_with_logging "python3 src/snapshot_downloader.py --email $EMAIL --password $PASSWORD --output-dir data/snapshots" "Snapshot Downloader (Email/Password)"
     snapshot_result=$?
 else
     log_message "WARNING: No authentication method provided via environment variables, skipping direct snapshot downloader"

View File

@@ -5,8 +5,8 @@
 set -e

-LOG_DIR="/app/logs"
-SNAPSHOTS_DIR="/app/snapshots"
+LOG_DIR="/app/data/logs"
+SNAPSHOTS_DIR="/app/data/snapshots"

 # Create directories if they don't exist
 mkdir -p "$LOG_DIR"
@@ -30,7 +30,7 @@ log_message "Cron daemon started"
 # Start web server in the background
 log_message "Starting web server on port 8080..."
-python3 webserver.py --host 0.0.0.0 --port 8080 --snapshots-dir "$SNAPSHOTS_DIR" &
+python3 src/webserver.py --host 0.0.0.0 --port 8080 --snapshots-dir "$SNAPSHOTS_DIR" &
 WEB_SERVER_PID=$!
 log_message "Web server started with PID: $WEB_SERVER_PID"

24 binary image files deleted (sizes 555 KiB to 2.0 MiB; contents not shown).

View File

@@ -1,162 +0,0 @@
2025-09-05 22:23:50,764 - INFO - Starting snapshot download with configuration
2025-09-05 22:23:50,764 - INFO - Date range: 2021-10-18 to 2025-09-05
2025-09-05 22:23:50,764 - INFO - Type IDs: [15]
2025-09-05 22:23:50,764 - INFO - Output directory: ./snapshots_test
2025-09-05 22:23:50,764 - INFO - Max pages limit: 2
2025-09-05 22:23:50,764 - INFO - Starting snapshot download for period 2021-10-18 to 2025-09-05
2025-09-05 22:23:50,764 - INFO - Attempting login authentication...
2025-09-05 22:23:50,765 - INFO - Attempting login for tudor.sitaru@gmail.com
2025-09-05 22:23:51,594 - INFO - Login response status: 200
2025-09-05 22:23:51,594 - INFO - Login successful
2025-09-05 22:23:51,594 - INFO - Selected account: Tudor Sitaru at Noddy's Nursery School (ID: e518bd01-e516-4b3c-aefa-bcb369823a2e)
2025-09-05 22:23:51,594 - INFO - Creating session for user ID: e518bd01-e516-4b3c-aefa-bcb369823a2e
2025-09-05 22:23:51,994 - INFO - Create session response status: 200
2025-09-05 22:23:51,995 - INFO - Session creation successful
2025-09-05 22:23:51,995 - INFO - API key obtained successfully
2025-09-05 22:23:51,996 - INFO - Login authentication successful
2025-09-05 22:23:51,996 - INFO - Starting snapshot fetch from 2021-10-18 to 2025-09-05
2025-09-05 22:23:51,996 - INFO - Fetching snapshots (first page): https://api.parentzone.me/v1/posts?dateFrom=2021-10-18&dateTo=2025-09-05&typeIDs%5B%5D=15
2025-09-05 22:23:52,398 - INFO - Retrieved 25 snapshots (first page)
2025-09-05 22:23:52,398 - INFO - Page 1: 25 snapshots (total: 25)
2025-09-05 22:23:52,399 - INFO - Fetching snapshots (cursor: eyJsYXN0SUQiOjIzODE4...): https://api.parentzone.me/v1/posts?dateFrom=2021-10-18&dateTo=2025-09-05&cursor=eyJsYXN0SUQiOjIzODE4NTcsImxhc3RTdGFydFRpbWUiOiIyMDI0LTEwLTIzVDE0OjEyOjAwIn0%3D&typeIDs%5B%5D=15
2025-09-05 22:23:52,708 - INFO - Retrieved 25 snapshots (cursor: eyJsYXN0SUQiOjIzODE4...)
2025-09-05 22:23:52,708 - INFO - Page 2: 25 snapshots (total: 50)
2025-09-05 22:23:52,708 - INFO - Reached maximum pages limit: 2
2025-09-05 22:23:52,708 - INFO - Total snapshots fetched: 50
2025-09-05 22:23:52,715 - INFO - Generated HTML file: snapshots_test/snapshots_2021-10-18_to_2025-09-05.html
2025-09-05 23:02:05,096 - INFO - Starting snapshot download with configuration
2025-09-05 23:02:05,097 - INFO - Date range: 2021-10-18 to 2025-09-05
2025-09-05 23:02:05,097 - INFO - Type IDs: [15]
2025-09-05 23:02:05,097 - INFO - Output directory: ./snapshots_test
2025-09-05 23:02:05,097 - INFO - Max pages limit: 2
2025-09-05 23:02:05,097 - INFO - Starting snapshot download for period 2021-10-18 to 2025-09-05
2025-09-05 23:02:05,097 - INFO - Attempting login authentication...
2025-09-05 23:02:05,097 - INFO - Attempting login for tudor.sitaru@gmail.com
2025-09-05 23:02:05,767 - INFO - Login response status: 200
2025-09-05 23:02:05,767 - INFO - Login successful
2025-09-05 23:02:05,767 - INFO - Selected account: Tudor Sitaru at Noddy's Nursery School (ID: e518bd01-e516-4b3c-aefa-bcb369823a2e)
2025-09-05 23:02:05,767 - INFO - Creating session for user ID: e518bd01-e516-4b3c-aefa-bcb369823a2e
2025-09-05 23:02:06,174 - INFO - Create session response status: 200
2025-09-05 23:02:06,175 - INFO - Session creation successful
2025-09-05 23:02:06,175 - INFO - API key obtained successfully
2025-09-05 23:02:06,176 - INFO - Login authentication successful
2025-09-05 23:02:06,176 - INFO - Starting snapshot fetch from 2021-10-18 to 2025-09-05
2025-09-05 23:02:06,176 - INFO - Fetching snapshots (first page): https://api.parentzone.me/v1/posts?dateFrom=2021-10-18&dateTo=2025-09-05&typeIDs%5B%5D=15
2025-09-05 23:02:06,600 - INFO - Retrieved 25 snapshots (first page)
2025-09-05 23:02:06,600 - INFO - Page 1: 25 snapshots (total: 25)
2025-09-05 23:02:06,600 - INFO - Fetching snapshots (cursor: eyJsYXN0SUQiOjIzODE4...): https://api.parentzone.me/v1/posts?dateFrom=2021-10-18&dateTo=2025-09-05&cursor=eyJsYXN0SUQiOjIzODE4NTcsImxhc3RTdGFydFRpbWUiOiIyMDI0LTEwLTIzVDE0OjEyOjAwIn0%3D&typeIDs%5B%5D=15
2025-09-05 23:02:06,997 - INFO - Retrieved 25 snapshots (cursor: eyJsYXN0SUQiOjIzODE4...)
2025-09-05 23:02:06,997 - INFO - Page 2: 25 snapshots (total: 50)
2025-09-05 23:02:06,998 - INFO - Reached maximum pages limit: 2
2025-09-05 23:02:06,998 - INFO - Total snapshots fetched: 50
2025-09-05 23:02:06,998 - INFO - Attempting login authentication...
2025-09-05 23:02:06,998 - INFO - Attempting login for tudor.sitaru@gmail.com
2025-09-05 23:02:07,608 - INFO - Login response status: 200
2025-09-05 23:02:07,608 - INFO - Login successful
2025-09-05 23:02:07,608 - INFO - Selected account: Tudor Sitaru at Noddy's Nursery School (ID: e518bd01-e516-4b3c-aefa-bcb369823a2e)
2025-09-05 23:02:07,608 - INFO - Creating session for user ID: e518bd01-e516-4b3c-aefa-bcb369823a2e
2025-09-05 23:02:07,895 - INFO - Create session response status: 200
2025-09-05 23:02:07,896 - INFO - Session creation successful
2025-09-05 23:02:07,896 - INFO - API key obtained successfully
2025-09-05 23:02:07,897 - INFO - Login authentication successful
2025-09-05 23:02:07,897 - INFO - Downloading media file: DCC724DD-0E3C-445D-BB6A-628C355533F2.jpeg
2025-09-05 23:02:08,250 - INFO - Successfully downloaded media: DCC724DD-0E3C-445D-BB6A-628C355533F2.jpeg
2025-09-05 23:02:08,251 - INFO - Downloading media file: e4e51387-1fee-4129-bd47-e49523b26697.jpeg
2025-09-05 23:02:08,445 - INFO - Successfully downloaded media: e4e51387-1fee-4129-bd47-e49523b26697.jpeg
2025-09-05 23:02:08,447 - INFO - Downloading media file: 7ED768A6-16A7-480A-B238-34B1DB87BDE6.jpeg
2025-09-05 23:02:08,700 - INFO - Successfully downloaded media: 7ED768A6-16A7-480A-B238-34B1DB87BDE6.jpeg
2025-09-05 23:02:08,700 - INFO - Downloading media file: 6CE82D8D-FAE8-4CD3-987F-A9F0BDD57919.jpeg
2025-09-05 23:02:09,026 - INFO - Successfully downloaded media: 6CE82D8D-FAE8-4CD3-987F-A9F0BDD57919.jpeg
2025-09-05 23:02:09,026 - INFO - Downloading media file: 04F440B5-549B-48E5-A480-4CEB0B649834.jpeg
2025-09-05 23:02:09,402 - INFO - Successfully downloaded media: 04F440B5-549B-48E5-A480-4CEB0B649834.jpeg
2025-09-05 23:02:09,403 - INFO - Downloading media file: AB2FE0B6-0932-4179-A3AE-933E05FA8519.jpeg
2025-09-05 23:02:09,861 - INFO - Successfully downloaded media: AB2FE0B6-0932-4179-A3AE-933E05FA8519.jpeg
2025-09-05 23:02:09,861 - INFO - Downloading media file: 466557B6-6ED0-4750-BA37-EC6DF92CB18B.jpeg
2025-09-05 23:02:10,242 - INFO - Successfully downloaded media: 466557B6-6ED0-4750-BA37-EC6DF92CB18B.jpeg
2025-09-05 23:02:10,243 - INFO - Downloading media file: 7268DAC2-8275-47DA-8A0D-FA659F850C31.jpeg
2025-09-05 23:02:10,510 - INFO - Successfully downloaded media: 7268DAC2-8275-47DA-8A0D-FA659F850C31.jpeg
2025-09-05 23:02:10,511 - INFO - Downloading media file: 692E5DAF-0D7B-433F-AA94-75CC265F1A59.jpeg
2025-09-05 23:02:10,815 - INFO - Successfully downloaded media: 692E5DAF-0D7B-433F-AA94-75CC265F1A59.jpeg
2025-09-05 23:02:10,815 - INFO - Downloading media file: CCE3933F-84FD-4A6D-987A-77993183A054.jpeg
2025-09-05 23:02:11,036 - INFO - Successfully downloaded media: CCE3933F-84FD-4A6D-987A-77993183A054.jpeg
2025-09-05 23:02:11,036 - INFO - Downloading media file: 2A5EE1D8-A113-43F8-9416-316287DE3E8F.jpeg
2025-09-05 23:02:11,243 - INFO - Successfully downloaded media: 2A5EE1D8-A113-43F8-9416-316287DE3E8F.jpeg
2025-09-05 23:02:11,243 - INFO - Downloading media file: 80702FD5-DF2C-4EC3-948C-70EBAE7C4BFF.jpeg
2025-09-05 23:02:11,460 - INFO - Successfully downloaded media: 80702FD5-DF2C-4EC3-948C-70EBAE7C4BFF.jpeg
2025-09-05 23:02:11,460 - INFO - Downloading media file: 1BC2789D-99B7-4CC5-84F3-AEA1F0CB39B2.jpeg
2025-09-05 23:02:11,727 - INFO - Successfully downloaded media: 1BC2789D-99B7-4CC5-84F3-AEA1F0CB39B2.jpeg
2025-09-05 23:02:11,728 - INFO - Downloading media file: BA2B3A67-356C-4D22-9FA2-2CF2040EC080.jpeg
2025-09-05 23:02:11,969 - INFO - Successfully downloaded media: BA2B3A67-356C-4D22-9FA2-2CF2040EC080.jpeg
2025-09-05 23:02:11,969 - INFO - Downloading media file: F3411311-E3CE-4A74-84CB-372DA00F80B7.jpeg
2025-09-05 23:02:12,233 - INFO - Successfully downloaded media: F3411311-E3CE-4A74-84CB-372DA00F80B7.jpeg
2025-09-05 23:02:12,233 - INFO - Downloading media file: 1715613184982FE8C3F62-2F0C-4A43-8F57-864F5BA9E112.jpeg.jpg
2025-09-05 23:02:12,448 - INFO - Successfully downloaded media: 1715613184982FE8C3F62-2F0C-4A43-8F57-864F5BA9E112.jpeg.jpg
2025-09-05 23:02:12,448 - INFO - Downloading media file: 171561318498211415BA1-6E38-4D1C-8962-8ED04199856D.jpeg.jpg
2025-09-05 23:02:12,675 - INFO - Successfully downloaded media: 171561318498211415BA1-6E38-4D1C-8962-8ED04199856D.jpeg.jpg
2025-09-05 23:02:12,676 - INFO - Downloading media file: 07B7B911-58C7-4998-BBDE-A773351854D5.jpeg
2025-09-05 23:02:13,209 - INFO - Successfully downloaded media: 07B7B911-58C7-4998-BBDE-A773351854D5.jpeg
2025-09-05 23:02:13,209 - INFO - Downloading media file: 1073B5D1-D162-4D78-8135-45447BA04CAB.jpeg
2025-09-05 23:02:14,432 - INFO - Successfully downloaded media: 1073B5D1-D162-4D78-8135-45447BA04CAB.jpeg
2025-09-05 23:02:14,433 - INFO - Downloading media file: 25E15BAA-58B3-47C8-BEC9-D777ED71A0AB.jpeg
2025-09-05 23:02:14,707 - INFO - Successfully downloaded media: 25E15BAA-58B3-47C8-BEC9-D777ED71A0AB.jpeg
2025-09-05 23:02:14,707 - INFO - Downloading media file: C959CBD6-A829-43AB-87CF-732269921ADB.jpeg
2025-09-05 23:02:15,058 - INFO - Successfully downloaded media: C959CBD6-A829-43AB-87CF-732269921ADB.jpeg
2025-09-05 23:02:15,058 - INFO - Downloading media file: 045D878D-47E3-4EB5-B9DB-36B9B63299E9.jpeg
2025-09-05 23:02:15,349 - INFO - Successfully downloaded media: 045D878D-47E3-4EB5-B9DB-36B9B63299E9.jpeg
2025-09-05 23:02:15,350 - INFO - Downloading media file: 6BC18F39-5C1A-43FB-AD64-0D5AB616A292.jpeg
2025-09-05 23:02:15,634 - INFO - Successfully downloaded media: 6BC18F39-5C1A-43FB-AD64-0D5AB616A292.jpeg
2025-09-05 23:02:15,635 - INFO - Downloading media file: D827391F-6BB7-4F61-B315-FB791E5ADC2F.jpeg
2025-09-05 23:02:15,918 - INFO - Successfully downloaded media: D827391F-6BB7-4F61-B315-FB791E5ADC2F.jpeg
2025-09-05 23:02:15,920 - INFO - Generated HTML file: snapshots_test/snapshots_2021-10-18_to_2025-09-05.html

File diff suppressed because it is too large

src/__init__.py (new file, +44 lines)

@@ -0,0 +1,44 @@
"""
ParentZone Downloader - Source Package
This package contains the core application modules for the ParentZone Downloader.
Modules:
- asset_tracker: Track downloaded assets to avoid re-downloads
- auth_manager: Handle authentication with ParentZone API
- config_downloader: Configuration-based image downloader
- config_snapshot_downloader: Configuration-based snapshot downloader
- image_downloader: Download images from ParentZone API
- snapshot_downloader: Download snapshots from ParentZone API
- webserver: Web server to serve downloaded snapshots
"""
__version__ = "1.0.0"
__author__ = "ParentZone Downloader Team"
# Import main classes for easier access
try:
    from .asset_tracker import AssetTracker
    from .auth_manager import AuthManager
    from .config_downloader import ConfigImageDownloader
    from .config_snapshot_downloader import ConfigSnapshotDownloader
    from .image_downloader import ImageDownloader
    from .snapshot_downloader import SnapshotDownloader
    from .webserver import SnapshotsWebServer

    __all__ = [
        "AssetTracker",
        "AuthManager",
        "ConfigImageDownloader",
        "ConfigSnapshotDownloader",
        "ImageDownloader",
        "SnapshotDownloader",
        "SnapshotsWebServer",
    ]
except ImportError as e:
    # Handle case where dependencies might not be available
    __all__ = []
    import warnings

    warnings.warn(f"Some modules could not be imported: {e}")


@@ -24,12 +24,12 @@ from tqdm import tqdm
 # Import the auth manager and asset tracker
 try:
-    from auth_manager import AuthManager
+    from src.auth_manager import AuthManager
 except ImportError:
     AuthManager = None
 
 try:
-    from asset_tracker import AssetTracker
+    from src.asset_tracker import AssetTracker
 except ImportError:
     AssetTracker = None
@@ -453,8 +453,8 @@ Examples:
   python config_downloader.py --config config.json
 
   # Create a config file first:
-  cp config_example.json my_config.json
-  # Edit my_config.json with your API details
+  cp config/config_example.json config/my_config.json
+  # Edit config/my_config.json with your API details
   python config_downloader.py --config my_config.json
 """,
 )


@@ -16,9 +16,11 @@ from pathlib import Path
 # Import the snapshot downloader
 try:
-    from snapshot_downloader import SnapshotDownloader
+    from src.snapshot_downloader import SnapshotDownloader
 except ImportError:
-    print("Error: snapshot_downloader.py not found. Please ensure it's in the same directory.")
+    print(
+        "Error: snapshot_downloader.py not found. Please ensure it's in the same directory."
+    )
     exit(1)
@@ -35,38 +37,47 @@ class ConfigSnapshotDownloader:
         # Create the underlying snapshot downloader
         self.downloader = SnapshotDownloader(
-            api_url=self.config.get('api_url', 'https://api.parentzone.me'),
-            output_dir=self.config.get('output_dir', 'snapshots'),
-            api_key=self.config.get('api_key'),
-            email=self.config.get('email'),
-            password=self.config.get('password')
+            api_url=self.config.get("api_url", "https://api.parentzone.me"),
+            output_dir=self.config.get("output_dir", "snapshots"),
+            api_key=self.config.get("api_key"),
+            email=self.config.get("email"),
+            password=self.config.get("password"),
         )
 
     def load_config(self, config_file: str) -> dict:
         """Load configuration from JSON file."""
         try:
-            with open(config_file, 'r') as f:
+            with open(config_file, "r") as f:
                 config = json.load(f)
 
             # Validate required authentication
-            has_api_key = 'api_key' in config and config['api_key']
-            has_credentials = 'email' in config and 'password' in config and config['email'] and config['password']
+            has_api_key = "api_key" in config and config["api_key"]
+            has_credentials = (
+                "email" in config
+                and "password" in config
+                and config["email"]
+                and config["password"]
+            )
 
             if not has_api_key and not has_credentials:
-                raise ValueError("Either 'api_key' or both 'email' and 'password' must be provided in config")
+                raise ValueError(
+                    "Either 'api_key' or both 'email' and 'password' must be provided in config"
+                )
 
             # Set defaults for optional fields
-            config.setdefault('api_url', 'https://api.parentzone.me')
-            config.setdefault('output_dir', 'snapshots')
-            config.setdefault('type_ids', [15])
-            config.setdefault('max_pages', None)
+            config.setdefault("api_url", "https://api.parentzone.me")
+            config.setdefault("output_dir", "snapshots")
+            config.setdefault("type_ids", [15])
+            config.setdefault("max_pages", None)
 
             # Set default date range (last year) if not specified
-            if 'date_from' not in config or not config['date_from']:
-                config['date_from'] = (datetime.now() - timedelta(days=365)).strftime("%Y-%m-%d")
+            if "date_from" not in config or not config["date_from"]:
+                config["date_from"] = (datetime.now() - timedelta(days=365)).strftime(
+                    "%Y-%m-%d"
+                )
 
-            if 'date_to' not in config or not config['date_to']:
-                config['date_to'] = datetime.now().strftime("%Y-%m-%d")
+            if "date_to" not in config or not config["date_to"]:
+                config["date_to"] = datetime.now().strftime("%Y-%m-%d")
 
             return config
@@ -77,17 +88,14 @@ class ConfigSnapshotDownloader:
     def setup_logging(self):
         """Setup logging configuration."""
-        output_dir = Path(self.config['output_dir'])
+        output_dir = Path(self.config["output_dir"])
         output_dir.mkdir(exist_ok=True)
 
-        log_file = output_dir / 'snapshots.log'
+        log_file = output_dir / "snapshots.log"
         logging.basicConfig(
             level=logging.INFO,
-            format='%(asctime)s - %(levelname)s - %(message)s',
-            handlers=[
-                logging.FileHandler(log_file),
-                logging.StreamHandler()
-            ]
+            format="%(asctime)s - %(levelname)s - %(message)s",
+            handlers=[logging.FileHandler(log_file), logging.StreamHandler()],
         )
         self.logger = logging.getLogger(__name__)
@@ -99,19 +107,21 @@ class ConfigSnapshotDownloader:
             Path to the generated HTML file
         """
         self.logger.info("Starting snapshot download with configuration")
-        self.logger.info(f"Date range: {self.config['date_from']} to {self.config['date_to']}")
+        self.logger.info(
+            f"Date range: {self.config['date_from']} to {self.config['date_to']}"
+        )
         self.logger.info(f"Type IDs: {self.config['type_ids']}")
         self.logger.info(f"Output directory: {self.config['output_dir']}")
 
-        if self.config.get('max_pages'):
+        if self.config.get("max_pages"):
             self.logger.info(f"Max pages limit: {self.config['max_pages']}")
 
         try:
             html_file = await self.downloader.download_snapshots(
-                type_ids=self.config['type_ids'],
-                date_from=self.config['date_from'],
-                date_to=self.config['date_to'],
-                max_pages=self.config.get('max_pages')
+                type_ids=self.config["type_ids"],
+                date_from=self.config["date_from"],
+                date_to=self.config["date_to"],
+                max_pages=self.config.get("max_pages"),
             )
 
             return html_file
@@ -131,10 +141,10 @@ class ConfigSnapshotDownloader:
         print(f"Date To: {self.config['date_to']}")
         print(f"Type IDs: {self.config['type_ids']}")
 
-        auth_method = "API Key" if self.config.get('api_key') else "Email/Password"
+        auth_method = "API Key" if self.config.get("api_key") else "Email/Password"
         print(f"Authentication: {auth_method}")
 
-        if self.config.get('max_pages'):
+        if self.config.get("max_pages"):
             print(f"Max Pages: {self.config['max_pages']}")
         print("=" * 60)
@@ -151,11 +161,11 @@ def create_example_config():
         "max_pages": null,
         "api_key": "your-api-key-here",
         "email": "your-email@example.com",
-        "password": "your-password-here"
+        "password": "your-password-here",
     }
 
     config_file = Path("snapshot_config_example.json")
-    with open(config_file, 'w') as f:
+    with open(config_file, "w") as f:
         json.dump(example_config, f, indent=2)
 
     print(f"✅ Example configuration created: {config_file}")
@@ -196,30 +206,27 @@ Notes:
 - 'date_from' and 'date_to' default to last year if not specified
 - 'type_ids' defaults to [15] (snapshot type)
 - 'max_pages' limits pages fetched (useful for testing)
-    """
+    """,
     )
+    parser.add_argument("--config", help="Path to the JSON configuration file")
     parser.add_argument(
-        '--config',
-        help='Path to the JSON configuration file'
+        "--create-example",
+        action="store_true",
+        help="Create an example configuration file and exit",
     )
     parser.add_argument(
-        '--create-example',
-        action='store_true',
-        help='Create an example configuration file and exit'
+        "--show-config",
+        action="store_true",
+        help="Show configuration summary before downloading",
     )
     parser.add_argument(
-        '--show-config',
-        action='store_true',
-        help='Show configuration summary before downloading'
-    )
-    parser.add_argument(
-        '--debug',
-        action='store_true',
-        help='Enable debug mode with detailed server response logging'
+        "--debug",
+        action="store_true",
+        help="Enable debug mode with detailed server response logging",
     )
 
     args = parser.parse_args()


@@ -26,20 +26,30 @@ import hashlib
 # Import the auth manager and asset tracker
 try:
-    from auth_manager import AuthManager
+    from src.auth_manager import AuthManager
 except ImportError:
     AuthManager = None
 
 try:
-    from asset_tracker import AssetTracker
+    from src.asset_tracker import AssetTracker
 except ImportError:
     AssetTracker = None
 
 
 class ImageDownloader:
-    def __init__(self, api_url: str, list_endpoint: str, download_endpoint: str,
-                 output_dir: str, max_concurrent: int = 5, timeout: int = 30, api_key: str = None,
-                 email: str = None, password: str = None, track_assets: bool = True):
+    def __init__(
+        self,
+        api_url: str,
+        list_endpoint: str,
+        download_endpoint: str,
+        output_dir: str,
+        max_concurrent: int = 5,
+        timeout: int = 30,
+        api_key: str = None,
+        email: str = None,
+        password: str = None,
+        track_assets: bool = True,
+    ):
         """
         Initialize the image downloader.
@@ -55,9 +65,9 @@ class ImageDownloader:
             password: Password for login authentication
             track_assets: Whether to enable asset tracking to avoid re-downloads
         """
-        self.api_url = api_url.rstrip('/')
-        self.list_endpoint = list_endpoint.lstrip('/')
-        self.download_endpoint = download_endpoint.lstrip('/')
+        self.api_url = api_url.rstrip("/")
+        self.list_endpoint = list_endpoint.lstrip("/")
+        self.download_endpoint = download_endpoint.lstrip("/")
         self.output_dir = Path(output_dir)
         self.max_concurrent = max_concurrent
         self.timeout = timeout
@@ -72,11 +82,11 @@ class ImageDownloader:
         # Setup logging
         logging.basicConfig(
             level=logging.INFO,
-            format='%(asctime)s - %(levelname)s - %(message)s',
+            format="%(asctime)s - %(levelname)s - %(message)s",
             handlers=[
-                logging.FileHandler(self.output_dir / 'download.log'),
-                logging.StreamHandler()
-            ]
+                logging.FileHandler(self.output_dir / "download.log"),
+                logging.StreamHandler(),
+            ],
         )
         self.logger = logging.getLogger(__name__)
@@ -86,17 +96,14 @@ class ImageDownloader:
             self.asset_tracker = AssetTracker(storage_dir=str(self.output_dir))
             self.logger.info("Asset tracking enabled")
         elif track_assets:
-            self.logger.warning("Asset tracking requested but AssetTracker not available")
+            self.logger.warning(
+                "Asset tracking requested but AssetTracker not available"
+            )
         else:
             self.logger.info("Asset tracking disabled")
 
         # Track download statistics
-        self.stats = {
-            'total': 0,
-            'successful': 0,
-            'failed': 0,
-            'skipped': 0
-        }
+        self.stats = {"total": 0, "successful": 0, "failed": 0, "skipped": 0}
 
     async def authenticate(self):
         """Perform login authentication if credentials are provided."""
@@ -111,10 +118,16 @@ class ImageDownloader:
                 self.logger.error("Login authentication failed")
                 raise Exception("Login authentication failed")
         elif self.email or self.password:
-            self.logger.warning("Both email and password must be provided for login authentication")
-            raise Exception("Both email and password must be provided for login authentication")
+            self.logger.warning(
+                "Both email and password must be provided for login authentication"
+            )
+            raise Exception(
+                "Both email and password must be provided for login authentication"
+            )
 
-    async def get_asset_list(self, session: aiohttp.ClientSession) -> List[Dict[str, Any]]:
+    async def get_asset_list(
+        self, session: aiohttp.ClientSession
+    ) -> List[Dict[str, Any]]:
         """
         Fetch the list of assets from the API.
@@ -132,13 +145,15 @@ class ImageDownloader:
             # Use API key if provided
             if self.api_key:
-                headers['x-api-key'] = self.api_key
+                headers["x-api-key"] = self.api_key
             # Use login authentication if provided
             elif self.auth_manager and self.auth_manager.is_authenticated():
                 headers.update(self.auth_manager.get_auth_headers())
 
-            async with session.get(url, headers=headers, timeout=self.timeout) as response:
+            async with session.get(
+                url, headers=headers, timeout=self.timeout
+            ) as response:
                 response.raise_for_status()
                 data = await response.json()
@@ -147,12 +162,12 @@ class ImageDownloader:
                 assets = data
             elif isinstance(data, dict):
                 # Common patterns for API responses
-                if 'data' in data:
-                    assets = data['data']
-                elif 'results' in data:
-                    assets = data['results']
-                elif 'items' in data:
-                    assets = data['items']
+                if "data" in data:
+                    assets = data["data"]
+                elif "results" in data:
+                    assets = data["results"]
+                elif "items" in data:
+                    assets = data["items"]
                 else:
                     assets = [data]  # Single asset
             else:
@@ -179,7 +194,7 @@ class ImageDownloader:
         asset_id = None
 
         # Common field names for asset identifiers
-        id_fields = ['id', 'asset_id', 'image_id', 'file_id', 'uuid', 'key']
+        id_fields = ["id", "asset_id", "image_id", "file_id", "uuid", "key"]
         for field in id_fields:
             if field in asset:
                 asset_id = asset[field]
@@ -192,12 +207,11 @@ class ImageDownloader:
         # Build download URL with required parameters
         from urllib.parse import urlencode
 
-        params = {
-            'key': self.api_key,
-            'u': asset.get('updated', '')
-        }
-        download_url = urljoin(self.api_url, f"/v1/media/{asset_id}/full?{urlencode(params)}")
+        params = {"key": self.api_key, "u": asset.get("updated", "")}
+        download_url = urljoin(
+            self.api_url, f"/v1/media/{asset_id}/full?{urlencode(params)}"
+        )
 
         return download_url
 
     def get_filename(self, asset: Dict[str, Any], url: str) -> str:
@@ -212,27 +226,27 @@ class ImageDownloader:
             Filename for the asset
         """
         # Try to get filename from asset metadata
-        if 'fileName' in asset:
-            filename = asset['fileName']
-        elif 'filename' in asset:
-            filename = asset['filename']
-        elif 'name' in asset:
-            filename = asset['name']
-        elif 'title' in asset:
-            filename = asset['title']
+        if "fileName" in asset:
+            filename = asset["fileName"]
+        elif "filename" in asset:
+            filename = asset["filename"]
+        elif "name" in asset:
+            filename = asset["name"]
+        elif "title" in asset:
+            filename = asset["title"]
         else:
             # Extract filename from URL
             parsed_url = urlparse(url)
             filename = os.path.basename(parsed_url.path)
 
         # If no extension, try to get it from content-type or add default
-        if '.' not in filename:
-            if 'mimeType' in asset:
-                ext = self._get_extension_from_mime(asset['mimeType'])
-            elif 'content_type' in asset:
-                ext = self._get_extension_from_mime(asset['content_type'])
+        if "." not in filename:
+            if "mimeType" in asset:
+                ext = self._get_extension_from_mime(asset["mimeType"])
+            elif "content_type" in asset:
+                ext = self._get_extension_from_mime(asset["content_type"])
             else:
-                ext = '.jpg'  # Default extension
+                ext = ".jpg"  # Default extension
             filename += ext
 
         # Sanitize filename
@@ -251,35 +265,39 @@ class ImageDownloader:
     def _get_extension_from_mime(self, mime_type: str) -> str:
         """Get file extension from MIME type."""
         mime_to_ext = {
-            'image/jpeg': '.jpg',
-            'image/jpg': '.jpg',
-            'image/png': '.png',
-            'image/gif': '.gif',
-            'image/webp': '.webp',
-            'image/bmp': '.bmp',
-            'image/tiff': '.tiff',
-            'image/svg+xml': '.svg'
+            "image/jpeg": ".jpg",
+            "image/jpg": ".jpg",
+            "image/png": ".png",
+            "image/gif": ".gif",
+            "image/webp": ".webp",
+            "image/bmp": ".bmp",
+            "image/tiff": ".tiff",
+            "image/svg+xml": ".svg",
         }
-        return mime_to_ext.get(mime_type.lower(), '.jpg')
+        return mime_to_ext.get(mime_type.lower(), ".jpg")
 
     def _sanitize_filename(self, filename: str) -> str:
         """Sanitize filename by removing invalid characters."""
         # Remove or replace invalid characters
         invalid_chars = '<>:"/\\|?*'
         for char in invalid_chars:
-            filename = filename.replace(char, '_')
+            filename = filename.replace(char, "_")
 
         # Remove leading/trailing spaces and dots
-        filename = filename.strip('. ')
+        filename = filename.strip(". ")
 
         # Ensure filename is not empty
         if not filename:
-            filename = 'image'
+            filename = "image"
 
         return filename
 
-    async def download_asset(self, session: aiohttp.ClientSession, asset: Dict[str, Any],
-                             semaphore: asyncio.Semaphore) -> bool:
+    async def download_asset(
+        self,
+        session: aiohttp.ClientSession,
+        asset: Dict[str, Any],
+        semaphore: asyncio.Semaphore,
+    ) -> bool:
         """
         Download a single asset.
@@ -300,7 +318,7 @@ class ImageDownloader:
                 # Check if file already exists and we're not tracking assets
                 if filepath.exists() and not self.asset_tracker:
                     self.logger.info(f"Skipping (unknown) (already exists)")
-                    self.stats['skipped'] += 1
+                    self.stats["skipped"] += 1
                     return True
 
                 self.logger.info(f"Downloading (unknown) from {download_url}")
@@ -309,34 +327,46 @@ class ImageDownloader:
                     response.raise_for_status()
 
                     # Get content type to verify it's an image
-                    content_type = response.headers.get('content-type', '')
-                    if not content_type.startswith('image/'):
-                        self.logger.warning(f"Content type is not an image: {content_type}")
+                    content_type = response.headers.get("content-type", "")
+                    if not content_type.startswith("image/"):
+                        self.logger.warning(
+                            f"Content type is not an image: {content_type}"
+                        )
 
                     # Download the file
-                    async with aiofiles.open(filepath, 'wb') as f:
+                    async with aiofiles.open(filepath, "wb") as f:
                         async for chunk in response.content.iter_chunked(8192):
                             await f.write(chunk)
 
                     # Set file modification time to match the updated timestamp
-                    if 'updated' in asset:
+                    if "updated" in asset:
                         try:
                             from datetime import datetime
                             import os
 
                             # Parse the ISO timestamp
-                            updated_time = datetime.fromisoformat(asset['updated'].replace('Z', '+00:00'))
+                            updated_time = datetime.fromisoformat(
+                                asset["updated"].replace("Z", "+00:00")
+                            )
                             # Set file modification time
-                            os.utime(filepath, (updated_time.timestamp(), updated_time.timestamp()))
-                            self.logger.info(f"Set file modification time to {asset['updated']}")
+                            os.utime(
+                                filepath,
+                                (updated_time.timestamp(), updated_time.timestamp()),
+                            )
+                            self.logger.info(
+                                f"Set file modification time to {asset['updated']}"
+                            )
                         except Exception as e:
-                            self.logger.warning(f"Failed to set file modification time: {e}")
+                            self.logger.warning(
+                                f"Failed to set file modification time: {e}"
+                            )
 
                     # Mark asset as downloaded in tracker
                     if self.asset_tracker:
                         self.asset_tracker.mark_asset_downloaded(asset, filepath, True)
 
                     self.logger.info(f"Successfully downloaded (unknown)")
-                    self.stats['successful'] += 1
+                    self.stats["successful"] += 1
                     return True
 
         except Exception as e:
@@ -347,8 +377,10 @@ class ImageDownloader:
                 filepath = self.output_dir / filename
                 self.asset_tracker.mark_asset_downloaded(asset, filepath, False)
 
-            self.logger.error(f"Failed to download asset {asset.get('id', 'unknown')}: {e}")
-            self.stats['failed'] += 1
+            self.logger.error(
+                f"Failed to download asset {asset.get('id', 'unknown')}: {e}"
+            )
+            self.stats["failed"] += 1
             return False
 
     async def download_all_assets(self, force_redownload: bool = False):
@@ -364,7 +396,9 @@ class ImageDownloader:
         connector = aiohttp.TCPConnector(limit=100, limit_per_host=30)
         timeout = aiohttp.ClientTimeout(total=self.timeout)
 
-        async with aiohttp.ClientSession(connector=connector, timeout=timeout) as session:
+        async with aiohttp.ClientSession(
+            connector=connector, timeout=timeout
+        ) as session:
             try:
                 # Perform authentication if needed
                 await self.authenticate()
@@ -380,24 +414,27 @@ class ImageDownloader:
                 # Filter for new/modified assets if tracking is enabled
                 if self.asset_tracker and not force_redownload:
                     assets = self.asset_tracker.get_new_assets(all_assets)
-                    self.logger.info(f"Found {len(assets)} new/modified assets to download")
+                    self.logger.info(
+                        f"Found {len(assets)} new/modified assets to download"
+                    )
 
                     if len(assets) == 0:
                         self.logger.info("All assets are up to date!")
                         return
                 else:
                     assets = all_assets
                     if force_redownload:
-                        self.logger.info("Force redownload enabled - downloading all assets")
+                        self.logger.info(
+                            "Force redownload enabled - downloading all assets"
+                        )
 
-                self.stats['total'] = len(assets)
+                self.stats["total"] = len(assets)
 
                 # Create semaphore to limit concurrent downloads
                 semaphore = asyncio.Semaphore(self.max_concurrent)
 
                 # Create tasks for all downloads
                 tasks = [
-                    self.download_asset(session, asset, semaphore)
-                    for asset in assets
+                    self.download_asset(session, asset, semaphore) for asset in assets
                 ]
 
                 # Download all assets with progress bar
@@ -405,11 +442,13 @@ class ImageDownloader:
                     for coro in asyncio.as_completed(tasks):
                         result = await coro
                         pbar.update(1)
-                        pbar.set_postfix({
-                            'Success': self.stats['successful'],
-                            'Failed': self.stats['failed'],
-                            'Skipped': self.stats['skipped']
-                        })
+                        pbar.set_postfix(
+                            {
+                                "Success": self.stats["successful"],
+                                "Failed": self.stats["failed"],
+                                "Skipped": self.stats["skipped"],
+                            }
+                        )
 
             except Exception as e:
                 self.logger.error(f"Error during download process: {e}")
@@ -441,84 +480,75 @@ Examples:
         --output-dir "./images" \\
         --max-concurrent 10 \\
         --timeout 60
-        """
+        """,
     )
 
     parser.add_argument(
-        '--api-url',
+        "--api-url",
         required=True,
-        help='Base URL of the API (e.g., https://api.example.com)'
+        help="Base URL of the API (e.g., https://api.example.com)",
     )
     parser.add_argument(
-        '--list-endpoint',
+        "--list-endpoint",
         required=True,
-        help='Endpoint to get the list of assets (e.g., /assets or /images)'
+        help="Endpoint to get the list of assets (e.g., /assets or /images)",
     )
     parser.add_argument(
-        '--download-endpoint',
+        "--download-endpoint",
         required=True,
-        help='Endpoint to download individual assets (e.g., /download or /assets)'
+        help="Endpoint to download individual assets (e.g., /download or /assets)",
     )
     parser.add_argument(
-        '--output-dir',
-        required=True,
-        help='Directory to save downloaded images'
+        "--output-dir", required=True, help="Directory to save downloaded images"
     )
     parser.add_argument(
-        '--max-concurrent',
+        "--max-concurrent",
         type=int,
         default=5,
-        help='Maximum number of concurrent downloads (default: 5)'
+        help="Maximum number of concurrent downloads (default: 5)",
     )
     parser.add_argument(
-        '--timeout',
+        "--timeout",
         type=int,
         default=30,
-        help='Request timeout in seconds (default: 30)'
+        help="Request timeout in seconds (default: 30)",
     )
     parser.add_argument(
-        '--api-key',
-        help='API key for authentication (x-api-key header)'
-    )
-    parser.add_argument(
-        '--email',
-        help='Email for login authentication'
-    )
-    parser.add_argument(
-        '--password',
-        help='Password for login authentication'
+        "--api-key", help="API key for authentication (x-api-key header)"
     )
+    parser.add_argument("--email", help="Email for login authentication")
+    parser.add_argument("--password", help="Password for login authentication")
     parser.add_argument(
-        '--no-tracking',
-        action='store_true',
-        help='Disable asset tracking (will re-download all assets)'
+        "--no-tracking",
+        action="store_true",
+        help="Disable asset tracking (will re-download all assets)",
     )
     parser.add_argument(
-        '--force-redownload',
-        action='store_true',
-        help='Force re-download of all assets, even if already tracked'
+        "--force-redownload",
+        action="store_true",
+        help="Force re-download of all assets, even if already tracked",
     )
     parser.add_argument(
-        '--show-stats',
-        action='store_true',
-        help='Show asset tracking statistics and exit'
+        "--show-stats",
+        action="store_true",
+        help="Show asset tracking statistics and exit",
     )
     parser.add_argument(
-        '--cleanup',
-        action='store_true',
-        help='Clean up metadata for missing files and exit'
+        "--cleanup",
+        action="store_true",
+        help="Clean up metadata for missing files and exit",
     )
 
     args = parser.parse_args()
@@ -546,11 +576,13 @@ Examples:
         api_key=args.api_key,
         email=args.email,
         password=args.password,
-        track_assets=not args.no_tracking
+        track_assets=not args.no_tracking,
     )
 
     try:
-        asyncio.run(downloader.download_all_assets(force_redownload=args.force_redownload))
+        asyncio.run(
+            downloader.download_all_assets(force_redownload=args.force_redownload)
+        )
     except KeyboardInterrupt:
         print("\nDownload interrupted by user")
     except Exception as e:


@@ -21,15 +21,21 @@ import aiofiles
 # Import the auth manager
 try:
-    from auth_manager import AuthManager
+    from src.auth_manager import AuthManager
 except ImportError:
     AuthManager = None
 
 
 class SnapshotDownloader:
-    def __init__(self, api_url: str = "https://api.parentzone.me",
-                 output_dir: str = "snapshots", api_key: str = None,
-                 email: str = None, password: str = None, debug_mode: bool = False):
+    def __init__(
+        self,
+        api_url: str = "https://api.parentzone.me",
+        output_dir: str = "snapshots",
+        api_key: str = None,
+        email: str = None,
+        password: str = None,
+        debug_mode: bool = False,
+    ):
         """
         Initialize the snapshot downloader.
@@ -41,7 +47,7 @@ class SnapshotDownloader:
            password: Password for login authentication
            debug_mode: Enable detailed server response logging
        """
-        self.api_url = api_url.rstrip('/')
+        self.api_url = api_url.rstrip("/")
        self.snapshots_endpoint = "/v1/posts"
        self.output_dir = Path(output_dir)
        self.output_dir.mkdir(parents=True, exist_ok=True)
@@ -62,39 +68,36 @@ class SnapshotDownloader:
        # Standard headers based on the curl command
        self.headers = {
-            'accept': 'application/json, text/plain, */*',
-            'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8,ro;q=0.7',
-            'origin': 'https://parentzone.me',
-            'priority': 'u=1, i',
-            'sec-ch-ua': '"Not;A=Brand";v="99", "Google Chrome";v="139", "Chromium";v="139"',
-            'sec-ch-ua-mobile': '?0',
-            'sec-ch-ua-platform': '"macOS"',
-            'sec-fetch-dest': 'empty',
-            'sec-fetch-mode': 'cors',
-            'sec-fetch-site': 'same-site',
-            'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/139.0.0.0 Safari/537.36',
-            'x-client-version': '3.54.0'
+            "accept": "application/json, text/plain, */*",
+            "accept-language": "en-GB,en-US;q=0.9,en;q=0.8,ro;q=0.7",
+            "origin": "https://parentzone.me",
+            "priority": "u=1, i",
+            "sec-ch-ua": '"Not;A=Brand";v="99", "Google Chrome";v="139", "Chromium";v="139"',
+            "sec-ch-ua-mobile": "?0",
+            "sec-ch-ua-platform": '"macOS"',
+            "sec-fetch-dest": "empty",
+            "sec-fetch-mode": "cors",
+            "sec-fetch-site": "same-site",
+            "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/139.0.0.0 Safari/537.36",
+            "x-client-version": "3.54.0",
        }

        # Statistics
        self.stats = {
-            'total_snapshots': 0,
-            'pages_fetched': 0,
-            'failed_requests': 0,
-            'generated_files': 0
+            "total_snapshots": 0,
+            "pages_fetched": 0,
+            "failed_requests": 0,
+            "generated_files": 0,
        }

    def setup_logging(self):
        """Setup logging configuration."""
-        log_file = self.output_dir / 'snapshots.log'
+        log_file = self.output_dir / "snapshots.log"
        logging.basicConfig(
            level=logging.INFO,
-            format='%(asctime)s - %(levelname)s - %(message)s',
-            handlers=[
-                logging.FileHandler(log_file),
-                logging.StreamHandler()
-            ]
+            format="%(asctime)s - %(levelname)s - %(message)s",
+            handlers=[logging.FileHandler(log_file), logging.StreamHandler()],
        )
        self.logger = logging.getLogger(__name__)
@@ -109,8 +112,8 @@ class SnapshotDownloader:
                self.logger.info("Login authentication successful")
                # Use the API key from auth manager
                auth_headers = self.auth_manager.get_auth_headers()
-                if 'x-api-key' in auth_headers:
-                    self.api_key = auth_headers['x-api-key']
+                if "x-api-key" in auth_headers:
+                    self.api_key = auth_headers["x-api-key"]
            else:
                self.logger.error("Login authentication failed")
                raise Exception("Login authentication failed")
@@ -120,18 +123,21 @@ class SnapshotDownloader:
        headers = self.headers.copy()

        if self.api_key:
-            headers['x-api-key'] = self.api_key
+            headers["x-api-key"] = self.api_key
        elif self.auth_manager and self.auth_manager.is_authenticated():
            headers.update(self.auth_manager.get_auth_headers())

        return headers

-    async def fetch_snapshots_page(self, session: aiohttp.ClientSession,
-                                   type_ids: List[int] = [15],
-                                   date_from: str = "2021-10-18",
-                                   date_to: str = None,
-                                   cursor: str = None,
-                                   per_page: int = 100) -> Dict[str, Any]:
+    async def fetch_snapshots_page(
+        self,
+        session: aiohttp.ClientSession,
+        type_ids: List[int] = [15],
+        date_from: str = "2021-10-18",
+        date_to: str = None,
+        cursor: str = None,
+        per_page: int = 100,
+    ) -> Dict[str, Any]:
        """
        Fetch a single page of snapshots from the API using cursor-based pagination.
@@ -151,17 +157,17 @@ class SnapshotDownloader:
        # Build query parameters
        params = {
-            'dateFrom': date_from,
-            'dateTo': date_to,
+            "dateFrom": date_from,
+            "dateTo": date_to,
        }

        # Add cursor for pagination (skip for first request)
        if cursor:
-            params['cursor'] = cursor
+            params["cursor"] = cursor

        # Add type IDs - API expects typeIDs[]=15 format
        for type_id in type_ids:
-            params[f'typeIDs[]'] = type_id
+            params[f"typeIDs[]"] = type_id

        # Build URL with parameters
        query_string = urlencode(params, doseq=True)
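The `doseq=True` call above flattens sequence values into one repeated query parameter per element; note that the loop stores a single scalar per key, so with more than one type ID only the last survives, whereas a list value would emit them all. A stdlib-only sketch of the difference:

```python
from urllib.parse import urlencode

# A scalar value yields one pair; a list with doseq=True yields one
# pair per element ("typeIDs[]" is the key used by the code above).
scalar = urlencode({"typeIDs[]": 15, "dateFrom": "2021-10-18"}, doseq=True)
listed = urlencode({"typeIDs[]": [15, 16]}, doseq=True)

print(scalar)  # typeIDs%5B%5D=15&dateFrom=2021-10-18
print(listed)  # typeIDs%5B%5D=15&typeIDs%5B%5D=16
```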
@@ -184,21 +190,25 @@ class SnapshotDownloader:
                        print(f"Status Code: {response.status}")
                        print(f"Headers: {dict(response.headers)}")
                        print(f"Response Type: {type(data)}")
-                        print(f"Response Keys: {list(data.keys()) if isinstance(data, dict) else 'Not a dict'}")
+                        print(
+                            f"Response Keys: {list(data.keys()) if isinstance(data, dict) else 'Not a dict'}"
+                        )
                        print(f"Posts count: {len(data.get('posts', []))}")
                        print(f"Cursor: {data.get('cursor', 'None')}")
-                        if len(data.get('posts', [])) <= 3:  # Only print full data if few posts
+                        if (
+                            len(data.get("posts", [])) <= 3
+                        ):  # Only print full data if few posts
                            print(f"Full Response Data:")
                            print(json.dumps(data, indent=2, default=str))
                        print("=" * 50)

                    # The API returns snapshots in 'posts' field
-                    snapshots = data.get('posts', [])
-                    cursor_value = data.get('cursor')
+                    snapshots = data.get("posts", [])
+                    cursor_value = data.get("cursor")

                    page_info = f"cursor: {cursor[:20]}..." if cursor else "first page"
                    self.logger.info(f"Retrieved {len(snapshots)} snapshots ({page_info})")
-                    self.stats['pages_fetched'] += 1
+                    self.stats["pages_fetched"] += 1

                    # Return the actual API response format
                    return data
@@ -206,14 +216,17 @@ class SnapshotDownloader:
        except Exception as e:
            page_info = f"cursor: {cursor[:20]}..." if cursor else "first page"
            self.logger.error(f"Failed to fetch snapshots ({page_info}): {e}")
-            self.stats['failed_requests'] += 1
+            self.stats["failed_requests"] += 1
            raise

-    async def fetch_all_snapshots(self, session: aiohttp.ClientSession,
-                                  type_ids: List[int] = [15],
-                                  date_from: str = "2021-10-18",
-                                  date_to: str = None,
-                                  max_pages: int = None) -> List[Dict[str, Any]]:
+    async def fetch_all_snapshots(
+        self,
+        session: aiohttp.ClientSession,
+        type_ids: List[int] = [15],
+        date_from: str = "2021-10-18",
+        date_to: str = None,
+        max_pages: int = None,
+    ) -> List[Dict[str, Any]]:
        """
        Fetch all snapshots across all pages using cursor-based pagination.
@@ -231,7 +244,9 @@ class SnapshotDownloader:
        cursor = None
        page_count = 0

-        self.logger.info(f"Starting snapshot fetch from {date_from} to {date_to or 'now'}")
+        self.logger.info(
+            f"Starting snapshot fetch from {date_from} to {date_to or 'now'}"
+        )

        while True:
            page_count += 1
@@ -246,17 +261,19 @@ class SnapshotDownloader:
            )

            # Extract snapshots from response
-            snapshots = response.get('posts', [])
-            new_cursor = response.get('cursor')
+            snapshots = response.get("posts", [])
+            new_cursor = response.get("cursor")

            if not snapshots:
                self.logger.info("No more snapshots found (empty posts array)")
                break

            all_snapshots.extend(snapshots)
-            self.stats['total_snapshots'] += len(snapshots)
+            self.stats["total_snapshots"] += len(snapshots)

-            self.logger.info(f"Page {page_count}: {len(snapshots)} snapshots (total: {len(all_snapshots)})")
+            self.logger.info(
+                f"Page {page_count}: {len(snapshots)} snapshots (total: {len(all_snapshots)})"
+            )

            # If no cursor returned, we've reached the end
            if not new_cursor:
@@ -273,7 +290,9 @@ class SnapshotDownloader:
        self.logger.info(f"Total snapshots fetched: {len(all_snapshots)}")
        return all_snapshots
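The loop above follows the standard cursor-pagination pattern: fetch, append, continue while a cursor comes back, stop on an empty page or a missing cursor. A synchronous stand-in (a fake `fetch` in place of the real API call) shows the shape:

```python
# Fake paged responses keyed by cursor; None stands for the first page.
PAGES = {
    None: {"posts": [1, 2], "cursor": "abc"},
    "abc": {"posts": [3], "cursor": None},
}

def fetch(cursor):
    # Stand-in for fetch_snapshots_page: returns posts plus next cursor.
    return PAGES[cursor]

def fetch_all():
    items, cursor = [], None
    while True:
        resp = fetch(cursor)
        posts = resp.get("posts", [])
        if not posts:  # empty posts array -> done
            break
        items.extend(posts)
        cursor = resp.get("cursor")
        if not cursor:  # no cursor returned -> last page
            break
    return items

print(fetch_all())  # [1, 2, 3]
```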
-    async def format_snapshot_html(self, snapshot: Dict[str, Any], session: aiohttp.ClientSession) -> str:
+    async def format_snapshot_html(
+        self, snapshot: Dict[str, Any], session: aiohttp.ClientSession
+    ) -> str:
        """
        Format a single snapshot as HTML.
@@ -284,34 +303,46 @@ class SnapshotDownloader:
            HTML string for the snapshot
        """
        # Extract key information from ParentZone snapshot format
-        snapshot_id = snapshot.get('id', 'unknown')
-        content = snapshot.get('notes', '')  # Don't escape HTML in notes field
-        start_time = snapshot.get('startTime', '')
-        snapshot_type = snapshot.get('type', 'Snapshot')
+        snapshot_id = snapshot.get("id", "unknown")
+        content = snapshot.get("notes", "")  # Don't escape HTML in notes field
+        start_time = snapshot.get("startTime", "")
+        snapshot_type = snapshot.get("type", "Snapshot")

        # Format dates
-        start_date = self.format_date(start_time) if start_time else 'Unknown'
+        start_date = self.format_date(start_time) if start_time else "Unknown"

        # Extract additional information
-        author = snapshot.get('author', {})
-        author_forename = author.get('forename', '') if author else ''
-        author_surname = author.get('surname', '') if author else ''
-        author_name = html.escape(f"{author_forename} {author_surname}".strip()) if author else 'Unknown'
+        author = snapshot.get("author", {})
+        author_forename = author.get("forename", "") if author else ""
+        author_surname = author.get("surname", "") if author else ""
+        author_name = (
+            html.escape(f"{author_forename} {author_surname}".strip())
+            if author
+            else "Unknown"
+        )

        # Extract child information (if any)
-        child = snapshot.get('child', {})
-        child_forename = child.get('forename', '') if child else ''
-        child_name = html.escape(f"{child.get('forename', '')} {child.get('surname', '')}".strip()) if child else ''
+        child = snapshot.get("child", {})
+        child_forename = child.get("forename", "") if child else ""
+        child_name = (
+            html.escape(
+                f"{child.get('forename', '')} {child.get('surname', '')}".strip()
+            )
+            if child
+            else ""
+        )

        # Create title in format: "Child Forename by Author Forename Surname"
        if child_forename and author_forename:
-            title = html.escape(f"{child_forename} by {author_forename} {author_surname}".strip())
+            title = html.escape(
+                f"{child_forename} by {author_forename} {author_surname}".strip()
+            )
        else:
            title = html.escape(f"Snapshot {snapshot_id}")

        # Extract location/activity information
-        activity = snapshot.get('activity', {})
-        activity_name = html.escape(activity.get('name', '')) if activity else ''
+        activity = snapshot.get("activity", {})
+        activity_name = html.escape(activity.get("name", "")) if activity else ""

        # Build HTML
        html_content = f"""
@@ -327,12 +358,12 @@ class SnapshotDownloader:
            </div>

            <div class="snapshot-content">
-                {f'<div class="snapshot-author">👤 Author: {author_name}</div>' if author_name != 'Unknown' else ''}
-                {f'<div class="snapshot-child">👶 Child: {child_name}</div>' if child_name else ''}
-                {f'<div class="snapshot-activity">🎯 Activity: {activity_name}</div>' if activity_name else ''}
+                {f'<div class="snapshot-author">👤 Author: {author_name}</div>' if author_name != "Unknown" else ""}
+                {f'<div class="snapshot-child">👶 Child: {child_name}</div>' if child_name else ""}
+                {f'<div class="snapshot-activity">🎯 Activity: {activity_name}</div>' if activity_name else ""}

                <div class="snapshot-description">
-                    <div class="notes-content">{content if content else '<em>No description provided</em>'}</div>
+                    <div class="notes-content">{content if content else "<em>No description provided</em>"}</div>
                </div>

                {await self.format_snapshot_media(snapshot, session)}
@@ -343,112 +374,130 @@ class SnapshotDownloader:
        return html_content.strip()

-    async def format_snapshot_media(self, snapshot: Dict[str, Any], session: aiohttp.ClientSession) -> str:
+    async def format_snapshot_media(
+        self, snapshot: Dict[str, Any], session: aiohttp.ClientSession
+    ) -> str:
        """Format media attachments for a snapshot."""
        media_html = ""

        # Check for media (images and other files)
-        media = snapshot.get('media', [])
-        images = [m for m in media if m.get('type') == 'image']
+        media = snapshot.get("media", [])
+        images = [m for m in media if m.get("type") == "image"]

        if images:
            media_html += '<div class="snapshot-images">\n'
-            media_html += '<h4>📸 Images:</h4>\n'
+            media_html += "<h4>📸 Images:</h4>\n"
            media_html += '<div class="image-grid">\n'

            for image in images:
                # Download the image file
                local_path = await self.download_media_file(session, image)
-                image_name = html.escape(image.get('fileName', 'Image'))
+                image_name = html.escape(image.get("fileName", "Image"))

                if local_path:
                    media_html += f'<div class="image-item">\n'
                    media_html += f'  <img src="{local_path}" alt="{image_name}" loading="lazy">\n'
                    media_html += f'  <p class="image-caption">{image_name}</p>\n'
                    media_html += f'  <p class="image-meta">Updated: {self.format_date(image.get("updated", ""))}</p>\n'
-                    media_html += f'</div>\n'
+                    media_html += f"</div>\n"
                else:
                    # Fallback to API URL if download failed
-                    image_url = f"{self.api_url}/v1/media/{image.get('id')}/full" if image.get('id') else ''
+                    image_url = (
+                        f"{self.api_url}/v1/media/{image.get('id')}/full"
+                        if image.get("id")
+                        else ""
+                    )
                    if image_url:
                        media_html += f'<div class="image-item">\n'
                        media_html += f'  <img src="{image_url}" alt="{image_name}" loading="lazy">\n'
-                        media_html += f'  <p class="image-caption">{image_name} (online)</p>\n'
+                        media_html += (
+                            f'  <p class="image-caption">{image_name} (online)</p>\n'
+                        )
                        media_html += f'  <p class="image-meta">Updated: {self.format_date(image.get("updated", ""))}</p>\n'
-                        media_html += f'</div>\n'
+                        media_html += f"</div>\n"

-            media_html += '</div>\n</div>\n'
+            media_html += "</div>\n</div>\n"

        # Check for non-image media as attachments
-        attachments = [m for m in media if m.get('type') != 'image']
+        attachments = [m for m in media if m.get("type") != "image"]

        if attachments:
            media_html += '<div class="snapshot-attachments">\n'
-            media_html += '<h4>📎 Attachments:</h4>\n'
+            media_html += "<h4>📎 Attachments:</h4>\n"
            media_html += '<ul class="attachment-list">\n'

            for attachment in attachments:
                # Download the attachment file
                local_path = await self.download_media_file(session, attachment)
-                attachment_name = html.escape(attachment.get('fileName', 'Attachment'))
-                attachment_type = attachment.get('mimeType', 'unknown')
+                attachment_name = html.escape(attachment.get("fileName", "Attachment"))
+                attachment_type = attachment.get("mimeType", "unknown")

                if local_path:
                    media_html += f'  <li><a href="{local_path}" target="_blank">{attachment_name} ({attachment_type})</a></li>\n'
                else:
                    # Fallback to API URL if download failed
-                    attachment_url = f"{self.api_url}/v1/media/{attachment.get('id')}/full" if attachment.get('id') else ''
+                    attachment_url = (
+                        f"{self.api_url}/v1/media/{attachment.get('id')}/full"
+                        if attachment.get("id")
+                        else ""
+                    )
                    if attachment_url:
                        media_html += f'  <li><a href="{attachment_url}" target="_blank">{attachment_name} ({attachment_type}) - online</a></li>\n'
                    else:
-                        media_html += f'  <li>{attachment_name} ({attachment_type})</li>\n'
+                        media_html += (
+                            f"  <li>{attachment_name} ({attachment_type})</li>\n"
+                        )

-            media_html += '</ul>\n</div>\n'
+            media_html += "</ul>\n</div>\n"

        return media_html

    def format_snapshot_metadata(self, snapshot: Dict[str, Any]) -> str:
        """Format additional metadata for a snapshot."""
        metadata_html = '<div class="snapshot-metadata">\n'
-        metadata_html += '<h4> Additional Information:</h4>\n'
+        metadata_html += "<h4> Additional Information:</h4>\n"
        metadata_html += '<div class="metadata-grid">\n'

        # Add any additional fields that might be interesting
        metadata_fields = [
-            ('code', 'Code'),
-            ('frameworkIndicatorCount', 'Framework Indicators'),
-            ('signed', 'Signed Status'),
-            ('type', 'Type')
+            ("code", "Code"),
+            ("frameworkIndicatorCount", "Framework Indicators"),
+            ("signed", "Signed Status"),
+            ("type", "Type"),
        ]

        for field, label in metadata_fields:
            value = snapshot.get(field)
            if value:
                if isinstance(value, list):
-                    value = ', '.join(str(v) for v in value)
+                    value = ", ".join(str(v) for v in value)
                metadata_html += f'<div class="metadata-item">\n'
-                metadata_html += f'  <strong>{label}:</strong> {html.escape(str(value))}\n'
-                metadata_html += f'</div>\n'
+                metadata_html += (
+                    f"  <strong>{label}:</strong> {html.escape(str(value))}\n"
+                )
+                metadata_html += f"</div>\n"

        # Raw JSON data (collapsed by default)
        metadata_html += '<details class="raw-data">\n'
-        metadata_html += '<summary>🔍 Raw JSON Data</summary>\n'
+        metadata_html += "<summary>🔍 Raw JSON Data</summary>\n"
        metadata_html += '<pre class="json-data">\n'
        metadata_html += html.escape(json.dumps(snapshot, indent=2, default=str))
-        metadata_html += '\n</pre>\n'
-        metadata_html += '</details>\n'
+        metadata_html += "\n</pre>\n"
+        metadata_html += "</details>\n"

-        metadata_html += '</div>\n</div>\n'
+        metadata_html += "</div>\n</div>\n"

        return metadata_html

    def format_date(self, date_string: str) -> str:
        """Format a date string for display."""
        try:
            # Try to parse ISO format date
-            dt = datetime.fromisoformat(date_string.replace('Z', '+00:00'))
+            dt = datetime.fromisoformat(date_string.replace("Z", "+00:00"))
            return dt.strftime("%Y-%m-%d %H:%M:%S")
        except:
            return date_string
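On Python versions before 3.11, `datetime.fromisoformat` rejects a trailing `Z`, which is why the code above maps it to an explicit `+00:00` offset first. A stand-alone copy of the helper (using `ValueError` rather than the bare `except` above) behaves like this:

```python
from datetime import datetime

def format_date(date_string: str) -> str:
    # Same shim as the method above: map the ISO-8601 "Z" suffix to an
    # explicit UTC offset so fromisoformat accepts it on older Pythons.
    try:
        dt = datetime.fromisoformat(date_string.replace("Z", "+00:00"))
        return dt.strftime("%Y-%m-%d %H:%M:%S")
    except ValueError:
        return date_string

print(format_date("2021-10-18T09:30:00Z"))  # 2021-10-18 09:30:00
print(format_date("not-a-date"))            # not-a-date
```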
-    async def download_media_file(self, session: aiohttp.ClientSession, media: Dict[str, Any]) -> Optional[str]:
+    async def download_media_file(
+        self, session: aiohttp.ClientSession, media: Dict[str, Any]
+    ) -> Optional[str]:
        """
        Download a media file to the assets folder.
@@ -459,11 +508,11 @@ class SnapshotDownloader:
        Returns:
            Relative path to downloaded file, or None if download failed
        """
-        media_id = media.get('id')
+        media_id = media.get("id")
        if not media_id:
            return None

-        filename = media.get('fileName', f'media_{media_id}')
+        filename = media.get("fileName", f"media_{media_id}")
        # Sanitize filename
        filename = self._sanitize_filename(filename)
@@ -480,11 +529,13 @@ class SnapshotDownloader:
            self.logger.info(f"Downloading media file: (unknown)")

            headers = self.get_auth_headers()
-            async with session.get(download_url, headers=headers, timeout=30) as response:
+            async with session.get(
+                download_url, headers=headers, timeout=30
+            ) as response:
                response.raise_for_status()

                # Download the file
-                async with aiofiles.open(filepath, 'wb') as f:
+                async with aiofiles.open(filepath, "wb") as f:
                    async for chunk in response.content.iter_chunked(8192):
                        await f.write(chunk)
@@ -502,19 +553,20 @@ class SnapshotDownloader:
        # Remove or replace invalid characters
        invalid_chars = '<>:"/\\|?*'
        for char in invalid_chars:
-            filename = filename.replace(char, '_')
+            filename = filename.replace(char, "_")

        # Remove leading/trailing spaces and dots
-        filename = filename.strip('. ')
+        filename = filename.strip(". ")

        # Ensure filename is not empty
        if not filename:
-            filename = 'media_file'
+            filename = "media_file"

        return filename
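The sanitizer above is pure string manipulation, so it is easy to exercise in isolation (a module-level copy of `_sanitize_filename` under a hypothetical name):

```python
def sanitize_filename(filename: str) -> str:
    # Mirror of _sanitize_filename above: replace characters that are
    # invalid on common filesystems, trim dots/spaces, never return "".
    invalid_chars = '<>:"/\\|?*'
    for char in invalid_chars:
        filename = filename.replace(char, "_")
    filename = filename.strip(". ")
    if not filename:
        filename = "media_file"
    return filename

print(sanitize_filename('photo: "day 1".jpg'))  # photo_ _day 1_.jpg
print(sanitize_filename("..."))                 # media_file
```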
-    async def generate_html_file(self, snapshots: List[Dict[str, Any]],
-                                 date_from: str, date_to: str) -> Path:
+    async def generate_html_file(
+        self, snapshots: List[Dict[str, Any]], date_from: str, date_to: str
+    ) -> Path:
        """
        Generate an HTML file containing all snapshots.
@@ -528,9 +580,7 @@ class SnapshotDownloader:
        """
        # Sort snapshots by start time (newest first)
        sorted_snapshots = sorted(
-            snapshots,
-            key=lambda x: x.get('startTime', ''),
-            reverse=True
+            snapshots, key=lambda x: x.get("startTime", ""), reverse=True
        )

        # Generate filename
@@ -538,19 +588,22 @@ class SnapshotDownloader:
        filepath = self.output_dir / filename

        # Generate HTML content
-        html_content = await self.generate_html_template(sorted_snapshots, date_from, date_to)
+        html_content = await self.generate_html_template(
+            sorted_snapshots, date_from, date_to
+        )

        # Write to file
-        with open(filepath, 'w', encoding='utf-8') as f:
+        with open(filepath, "w", encoding="utf-8") as f:
            f.write(html_content)

        self.logger.info(f"Generated HTML file: {filepath}")
-        self.stats['generated_files'] += 1
+        self.stats["generated_files"] += 1

        return filepath

-    async def generate_html_template(self, snapshots: List[Dict[str, Any]],
-                                     date_from: str, date_to: str) -> str:
+    async def generate_html_template(
+        self, snapshots: List[Dict[str, Any]], date_from: str, date_to: str
+    ) -> str:
        """Generate the complete HTML template."""

        # Generate individual snapshot HTML
@@ -560,7 +613,9 @@ class SnapshotDownloader:
        connector = aiohttp.TCPConnector(limit=100, limit_per_host=30)
        timeout = aiohttp.ClientTimeout(total=30)

-        async with aiohttp.ClientSession(connector=connector, timeout=timeout) as session:
+        async with aiohttp.ClientSession(
+            connector=connector, timeout=timeout
+        ) as session:
            # Authenticate session for media downloads
            await self.authenticate()
@@ -604,7 +659,7 @@ class SnapshotDownloader:
        <footer class="page-footer">
            <p>Generated by ParentZone Snapshot Downloader</p>
-            <p>Total snapshots: {len(snapshots)} | Pages fetched: {self.stats['pages_fetched']}</p>
+            <p>Total snapshots: {len(snapshots)} | Pages fetched: {self.stats["pages_fetched"]}</p>
        </footer>
    </div>
@@ -991,9 +1046,13 @@ class SnapshotDownloader:
        });
        """

-    async def download_snapshots(self, type_ids: List[int] = [15],
-                                 date_from: str = None, date_to: str = None,
-                                 max_pages: int = None) -> Path:
+    async def download_snapshots(
+        self,
+        type_ids: List[int] = [15],
+        date_from: str = None,
+        date_to: str = None,
+        max_pages: int = None,
+    ) -> Path:
        """
        Download all snapshots and generate HTML file.
@@ -1013,13 +1072,17 @@ class SnapshotDownloader:
        if date_to is None:
            date_to = datetime.now().strftime("%Y-%m-%d")

-        self.logger.info(f"Starting snapshot download for period {date_from} to {date_to}")
+        self.logger.info(
+            f"Starting snapshot download for period {date_from} to {date_to}"
+        )

        # Create aiohttp session
        connector = aiohttp.TCPConnector(limit=100, limit_per_host=30)
        timeout = aiohttp.ClientTimeout(total=30)

-        async with aiohttp.ClientSession(connector=connector, timeout=timeout) as session:
+        async with aiohttp.ClientSession(
+            connector=connector, timeout=timeout
+        ) as session:
            try:
                # Authenticate if needed
                await self.authenticate()
@@ -1077,64 +1140,53 @@ Examples:
  # Specify output directory
  python3 snapshot_downloader.py --api-key KEY --output-dir ./my_snapshots
-        """
-    )
-    parser.add_argument(
-        '--api-key',
-        help='API key for authentication'
-    )
-    parser.add_argument(
-        '--email',
-        help='Email for login authentication'
-    )
-    parser.add_argument(
-        '--password',
-        help='Password for login authentication'
-    )
-    parser.add_argument(
-        '--date-from',
-        help='Start date in YYYY-MM-DD format (default: 1 year ago)'
-    )
-    parser.add_argument(
-        '--date-to',
-        help='End date in YYYY-MM-DD format (default: today)'
-    )
-    parser.add_argument(
-        '--type-ids',
-        nargs='+',
-        type=int,
-        default=[15],
-        help='Type IDs to filter by (default: [15])'
-    )
-    parser.add_argument(
-        '--output-dir',
-        default='snapshots',
-        help='Directory to save snapshot files (default: snapshots)'
-    )
-    parser.add_argument(
-        '--max-pages',
-        type=int,
-        help='Maximum number of cursor pages to fetch (for testing)'
-    )
-    parser.add_argument(
-        '--api-url',
-        default='https://api.parentzone.me',
-        help='ParentZone API URL (default: https://api.parentzone.me)'
-    )
-    parser.add_argument(
-        '--debug',
-        action='store_true',
-        help='Enable debug mode with detailed server response logging'
-    )
+        """,
+    )
+    parser.add_argument("--api-key", help="API key for authentication")
+    parser.add_argument("--email", help="Email for login authentication")
+    parser.add_argument("--password", help="Password for login authentication")
+    parser.add_argument(
+        "--date-from", help="Start date in YYYY-MM-DD format (default: 1 year ago)"
+    )
+    parser.add_argument(
+        "--date-to", help="End date in YYYY-MM-DD format (default: today)"
+    )
+    parser.add_argument(
+        "--type-ids",
+        nargs="+",
+        type=int,
+        default=[15],
+        help="Type IDs to filter by (default: [15])",
+    )
+    parser.add_argument(
+        "--output-dir",
+        default="snapshots",
+        help="Directory to save snapshot files (default: snapshots)",
+    )
+    parser.add_argument(
+        "--max-pages",
+        type=int,
+        help="Maximum number of cursor pages to fetch (for testing)",
+    )
+    parser.add_argument(
+        "--api-url",
+        default="https://api.parentzone.me",
+        help="ParentZone API URL (default: https://api.parentzone.me)",
+    )
+    parser.add_argument(
+        "--debug",
+        action="store_true",
+        help="Enable debug mode with detailed server response logging",
+    )

    args = parser.parse_args()
@@ -1160,19 +1212,21 @@ Examples:
        api_key=args.api_key,
        email=args.email,
        password=args.password,
-        debug_mode=args.debug
+        debug_mode=args.debug,
    )

    if args.debug:
        print("🔍 DEBUG MODE ENABLED - Detailed server responses will be printed")

    # Download snapshots
-    html_file = asyncio.run(downloader.download_snapshots(
-        type_ids=args.type_ids,
-        date_from=args.date_from,
-        date_to=args.date_to,
-        max_pages=args.max_pages
-    ))
+    html_file = asyncio.run(
+        downloader.download_snapshots(
+            type_ids=args.type_ids,
+            date_from=args.date_from,
+            date_to=args.date_to,
+            max_pages=args.max_pages,
+        )
+    )

    if html_file:
        print(f"\n✅ Success! Snapshots downloaded and saved to: {html_file}")