Asset Tracking System
This document describes the asset tracking system implemented for the ParentZone Downloader, which intelligently identifies and downloads only new or modified assets, avoiding unnecessary re-downloads.
Overview
The asset tracking system consists of two main components:
- AssetTracker (`asset_tracker.py`) - Manages local metadata and identifies new/modified assets
- ImageDownloader Integration - Enhanced downloader with asset tracking capabilities
Features
🎯 Smart Asset Detection
- New Assets: Automatically detects assets that haven't been downloaded before
- Modified Assets: Identifies assets that have changed since the last download, based on the timestamp, size, checksum, and ETag fields reported by the API
- Unchanged Assets: Efficiently skips assets that are already up-to-date locally
📊 Comprehensive Tracking
- Metadata Storage: Stores asset metadata in JSON format for persistence
- File Integrity: Tracks file sizes, modification times, and content hashes
- Download History: Maintains records of successful and failed downloads
🧹 Maintenance Features
- Cleanup: Removes metadata for files that no longer exist on disk
- Statistics: Provides detailed statistics about tracked assets
- Validation: Ensures consistency between metadata and actual files
Quick Start
Basic Usage with Asset Tracking
# Download only new/modified assets (default behavior)
python3 image_downloader.py \
--api-url "https://api.parentzone.me" \
--list-endpoint "/v1/media/list" \
--download-endpoint "/v1/media" \
--output-dir "./downloaded_images" \
--email "your-email@example.com" \
--password "your-password"
Advanced Options
# Disable asset tracking (download all assets)
python3 image_downloader.py [options] --no-tracking
# Force re-download of all assets
python3 image_downloader.py [options] --force-redownload
# Show asset tracking statistics
python3 image_downloader.py [options] --show-stats
# Clean up metadata for missing files
python3 image_downloader.py [options] --cleanup
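The four flags above are independent boolean switches. As a rough illustration of how such flags could be declared and mapped onto the downloader's settings, here is a minimal `argparse` sketch. The `build_parser` function and the default output directory are assumptions made for this example; this is not the project's actual argument parser.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering only the tracking-related flags."""
    parser = argparse.ArgumentParser(prog="image_downloader.py")
    parser.add_argument("--output-dir", default="./downloaded_images")
    parser.add_argument("--no-tracking", action="store_true",
                        help="Disable asset tracking (download all assets)")
    parser.add_argument("--force-redownload", action="store_true",
                        help="Download all assets regardless of tracking")
    parser.add_argument("--show-stats", action="store_true",
                        help="Display asset statistics")
    parser.add_argument("--cleanup", action="store_true",
                        help="Clean up metadata for missing files")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    # Tracking is on unless --no-tracking is given.
    track_assets = not args.no_tracking
    print(f"track_assets={track_assets}, force_redownload={args.force_redownload}")
```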
Asset Tracker API
Basic Usage
from asset_tracker import AssetTracker
# Initialize tracker
tracker = AssetTracker(storage_dir="downloaded_images")
# Get new assets that need downloading
api_assets = [...] # Assets from API response
new_assets = tracker.get_new_assets(api_assets)
# Mark an asset as downloaded
tracker.mark_asset_downloaded(asset, filepath, success=True)
# Get statistics
stats = tracker.get_stats()
Key Methods
get_new_assets(api_assets: List[Dict]) -> List[Dict]
Identifies new or modified assets that need to be downloaded.
Parameters:
- api_assets: List of asset dictionaries from API response
Returns:
- List of assets that need to be downloaded
Example:
# API returns 100 assets, but only 5 are new/modified
api_assets = await fetch_assets_from_api()
new_assets = tracker.get_new_assets(api_assets)
print(f"Need to download {len(new_assets)} out of {len(api_assets)} assets")
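The filtering step can be pictured as a single pass over the API response. The sketch below is an assumption about how such a filter might work, not the actual `get_new_assets` implementation: `select_assets_to_download`, `key_of`, and `hash_of` are hypothetical names, and retrying previously failed downloads is an assumed policy.

```python
from typing import Any, Callable, Dict, List

Asset = Dict[str, Any]


def select_assets_to_download(
    api_assets: List[Asset],
    metadata: Dict[str, Dict[str, Any]],
    key_of: Callable[[Asset], str],
    hash_of: Callable[[Asset], str],
) -> List[Asset]:
    """Return the assets that are new, previously failed, or changed."""
    pending = []
    for asset in api_assets:
        record = metadata.get(key_of(asset))
        if record is None:
            pending.append(asset)   # never seen before
        elif not record.get("success"):
            pending.append(asset)   # assumed policy: retry failed downloads
        elif record.get("content_hash") != hash_of(asset):
            pending.append(asset)   # change-detection hash differs
    return pending
```

Passing the key and hash functions as arguments keeps the filter independent of how assets are identified or compared.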
mark_asset_downloaded(asset: Dict, filepath: Path, success: bool)
Records that an asset has been downloaded (or attempted).
Parameters:
- asset: Asset dictionary from API
- filepath: Local path where asset was saved
- success: Whether download was successful
cleanup_missing_files()
Removes metadata entries for files that no longer exist on disk.
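A cleanup of this kind amounts to checking each recorded path and dropping the entries whose file is gone. The following is a minimal sketch under the assumption that metadata is a dictionary of records with a `filepath` field, as in the metadata example in this document. `prune_missing` is a hypothetical name, not the tracker's method.

```python
from pathlib import Path
from typing import Any, Dict, List


def prune_missing(metadata: Dict[str, Dict[str, Any]]) -> List[str]:
    """Delete records whose file no longer exists; return the removed keys."""
    missing = [
        key
        for key, record in metadata.items()
        if not Path(record.get("filepath") or "").is_file()
    ]
    # Collect first, delete afterwards, so the dict is not mutated mid-iteration.
    for key in missing:
        del metadata[key]
    return missing
```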
get_stats() -> Dict
Returns comprehensive statistics about tracked assets.
Returns:
{
'total_tracked_assets': 150,
'successful_downloads': 145,
'failed_downloads': 5,
'existing_files': 140,
'missing_files': 10,
'total_size_bytes': 524288000,
'total_size_mb': 500.0
}
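In the sample above, 524288000 bytes divided by 1024 × 1024 gives exactly 500.0, so the megabyte figure uses binary megabytes. The sketch below shows one way such a summary could be derived from the metadata records. `summarize` is a hypothetical name, and counting only the sizes of files that still exist is an assumption.

```python
from pathlib import Path
from typing import Any, Dict

BYTES_PER_MB = 1024 * 1024


def summarize(metadata: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Build a statistics dictionary from metadata records (illustrative)."""
    records = list(metadata.values())
    successful = sum(1 for r in records if r.get("success"))
    existing = [r for r in records if Path(r.get("filepath") or "").is_file()]
    total_bytes = sum(r.get("file_size", 0) for r in existing)
    return {
        "total_tracked_assets": len(records),
        "successful_downloads": successful,
        "failed_downloads": len(records) - successful,
        "existing_files": len(existing),
        "missing_files": len(records) - len(existing),
        "total_size_bytes": total_bytes,
        "total_size_mb": round(total_bytes / BYTES_PER_MB, 2),
    }
```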
Metadata Storage
File Structure
Asset metadata is stored in {output_dir}/asset_metadata.json:
{
"asset_001": {
"asset_id": "asset_001",
"filename": "family_photo.jpg",
"filepath": "/path/to/downloaded_images/family_photo.jpg",
"download_date": "2024-01-15T10:30:00",
"success": true,
"content_hash": "d41d8cd98f00b204e9800998ecf8427e",
"file_size": 1024000,
"file_modified": "2024-01-15T10:30:00",
"api_data": {
"id": "asset_001",
"name": "family_photo.jpg",
"updated": "2024-01-01T10:00:00Z",
"size": 1024000,
"mimeType": "image/jpeg"
}
}
}
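Because the metadata is plain JSON keyed by asset ID, it can be read and written with the standard library alone. The sketch below assumes the file layout shown above. `load_metadata` and `save_metadata` are hypothetical helpers, and writing through a temporary file is a design choice of this example rather than documented behavior.

```python
import json
from pathlib import Path
from typing import Any, Dict

DEFAULT_METADATA_FILE = "asset_metadata.json"


def load_metadata(output_dir: str, filename: str = DEFAULT_METADATA_FILE) -> Dict[str, Any]:
    """Return the stored metadata, or an empty dict on first use."""
    path = Path(output_dir) / filename
    if not path.exists():
        return {}
    with path.open("r", encoding="utf-8") as fh:
        return json.load(fh)


def save_metadata(output_dir: str, metadata: Dict[str, Any],
                  filename: str = DEFAULT_METADATA_FILE) -> None:
    """Write metadata to a temporary file, then move it into place."""
    path = Path(output_dir) / filename
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    tmp.replace(path)  # avoids leaving a half-written metadata file behind
```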
Asset Identification
Assets are identified using the following priority:
- `id` field
- `assetId` field
- `uuid` field
- MD5 hash of asset data (fallback)
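The priority order can be expressed as a short lookup with a hashing fallback. This is a minimal sketch, not the tracker's own `_get_asset_key`: the name `asset_key` and the choice to serialize the asset as JSON with sorted keys before hashing are assumptions.

```python
import hashlib
import json
from typing import Any, Dict

ID_FIELDS = ("id", "assetId", "uuid")  # checked in priority order


def asset_key(asset: Dict[str, Any]) -> str:
    """Return a stable key for an asset (illustrative)."""
    for field in ID_FIELDS:
        value = asset.get(field)
        if value not in (None, ""):  # skip missing or empty identifiers
            return str(value)
    # Fallback: MD5 of the asset data. Sorted keys make the digest
    # independent of the order in which the API lists the fields.
    payload = json.dumps(asset, sort_keys=True, default=str).encode("utf-8")
    return hashlib.md5(payload).hexdigest()
```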
Change Detection
Assets are considered modified if their content hash changes. The hash is based on:
- `updated` timestamp
- `modified` timestamp
- `lastModified` timestamp
- `size` field
- `checksum` field
- `etag` field
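Hashing only the change-related fields means that edits to unrelated fields, such as a display name, do not trigger a re-download. The sketch below illustrates the idea. `change_hash` and `is_modified` are hypothetical names, and the `field=value` serialization joined with `|` is an assumption rather than the tracker's actual format.

```python
import hashlib
from typing import Any, Dict

CHANGE_FIELDS = ("updated", "modified", "lastModified", "size", "checksum", "etag")


def change_hash(asset: Dict[str, Any]) -> str:
    """MD5 over the change-related fields that are present in the asset."""
    parts = [f"{field}={asset[field]}" for field in CHANGE_FIELDS if field in asset]
    return hashlib.md5("|".join(parts).encode("utf-8")).hexdigest()


def is_modified(asset: Dict[str, Any], stored_hash: str) -> bool:
    """True when the asset's current hash differs from the stored one."""
    return change_hash(asset) != stored_hash
```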
Integration with ImageDownloader
Automatic Integration
When asset tracking is enabled (default), the ImageDownloader automatically:
- Initializes Tracker: Creates an `AssetTracker` instance
- Filters Assets: Only downloads new/modified assets
- Records Downloads: Marks successful/failed downloads in metadata
- Provides Feedback: Shows statistics about skipped vs downloaded assets
Example Integration
import asyncio

from image_downloader import ImageDownloader


async def main():
    # Asset tracking enabled by default
    downloader = ImageDownloader(
        api_url="https://api.parentzone.me",
        list_endpoint="/v1/media/list",
        download_endpoint="/v1/media",
        output_dir="./images",
        email="user@example.com",
        password="password",
        track_assets=True,  # Default: True
    )
    # First run: Downloads all assets
    await downloader.download_all_assets()
    # Second run: Skips unchanged assets, downloads only new/modified ones
    await downloader.download_all_assets()


# download_all_assets is a coroutine, so it must run inside an event loop
asyncio.run(main())
Testing
Unit Tests
# Run comprehensive asset tracking tests
python3 test_asset_tracking.py
# Output shows:
# ✅ Basic tracking test passed!
# ✅ Modified asset detection test passed!
# ✅ Cleanup functionality test passed!
# ✅ Integration test completed!
Live Demo
# Demonstrate asset tracking with real API
python3 demo_asset_tracking.py
# Shows:
# - Authentication process
# - Current asset status
# - First download run (downloads new assets)
# - Second run (skips all assets)
# - Final statistics
Performance Benefits
Network Efficiency
- Fewer Download Requests: Only new or changed assets are requested from the download endpoint
- Bandwidth Savings: Skips unchanged assets entirely
- Faster Sync: Subsequent runs complete much faster
Storage Efficiency
- No Duplicates: Prevents downloading the same asset multiple times
- Smart Cleanup: Removes metadata for deleted files
- Size Tracking: Monitors total storage usage
Example Performance Impact
First Run: 150 assets → Downloaded 150 (100%)
Second Run: 150 assets → Downloaded 0 (0%) - All up to date!
Third Run: 155 assets → Downloaded 5 (3.2%) - Only new ones
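The percentages above are the share of listed assets that were actually downloaded, rounded to one decimal place. A small sketch of that arithmetic, with `download_ratio` as a hypothetical helper name:

```python
def download_ratio(downloaded: int, total: int) -> float:
    """Percentage of listed assets that had to be downloaded."""
    if total == 0:
        return 0.0  # nothing listed, nothing downloaded
    return round(100 * downloaded / total, 1)


# The three runs from the example above.
for downloaded, total in [(150, 150), (0, 150), (5, 155)]:
    print(f"{total} assets -> downloaded {downloaded} ({download_ratio(downloaded, total)}%)")
```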
Troubleshooting
Common Issues
"No existing metadata file found"
This is normal for first-time usage. The system will create the metadata file automatically.
"File missing, removing from metadata"
The cleanup process found files that were deleted outside the application. This is normal maintenance.
Asset tracking not working
Ensure AssetTracker is properly imported and asset tracking is enabled:
# Check if tracking is enabled
if downloader.asset_tracker:
    print("Asset tracking is enabled")
else:
    print("Asset tracking is disabled")
Manual Maintenance
Reset All Tracking
# Remove metadata file to start fresh
rm downloaded_images/asset_metadata.json
Clean Up Missing Files
python3 image_downloader.py --cleanup --output-dir "./downloaded_images"
View Statistics
python3 image_downloader.py --show-stats --output-dir "./downloaded_images"
Configuration
Environment Variables
# Disable asset tracking globally
export DISABLE_ASSET_TRACKING=1
# Set custom metadata filename
export ASSET_METADATA_FILE="my_assets.json"
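For illustration, the two variables could be read as follows. This sketch rests on assumptions: that only the value `1` disables tracking, and that the default filename is the `asset_metadata.json` mentioned under Metadata Storage. `tracking_settings` is a hypothetical helper, not part of the downloader.

```python
import os
from typing import Any, Dict, Mapping, Optional


def tracking_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Derive tracking options from environment variables (illustrative)."""
    env = os.environ if env is None else env
    # Assumption: only the exact value "1" switches tracking off.
    disabled = env.get("DISABLE_ASSET_TRACKING", "") == "1"
    return {
        "track_assets": not disabled,
        "metadata_file": env.get("ASSET_METADATA_FILE", "asset_metadata.json"),
    }


if __name__ == "__main__":
    print(tracking_settings())
```

Accepting the environment as a parameter makes the function easy to exercise without modifying the real process environment.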
Programmatic Configuration
# Custom metadata file location
tracker = AssetTracker(
    storage_dir="./images",
    metadata_file="custom_metadata.json"
)
# Disable tracking for specific downloader
downloader = ImageDownloader(
    # ... other params ...
    track_assets=False
)
Future Enhancements
Planned Features
- Parallel Metadata Updates: Concurrent metadata operations
- Cloud Sync: Sync metadata across multiple devices
- Asset Versioning: Track multiple versions of the same asset
- Batch Operations: Bulk metadata operations for large datasets
- Web Interface: Browser-based asset management
Extensibility
The asset tracking system is designed to be extensible:
# Custom asset identification
class CustomAssetTracker(AssetTracker):
    def _get_asset_key(self, asset):
        # Custom logic for asset identification
        return f"{asset.get('category')}_{asset.get('id')}"

    def _get_asset_hash(self, asset):
        # Custom logic for change detection
        return super()._get_asset_hash(asset)
API Reference
AssetTracker Class
| Method | Description | Parameters | Returns |
|---|---|---|---|
| `__init__` | Initialize tracker | storage_dir, metadata_file | `None` |
| `get_new_assets` | Find new/modified assets | api_assets: List[Dict] | `List[Dict]` |
| `mark_asset_downloaded` | Record download | asset, filepath, success | `None` |
| `is_asset_downloaded` | Check if downloaded | asset: Dict | `bool` |
| `is_asset_modified` | Check if modified | asset: Dict | `bool` |
| `cleanup_missing_files` | Remove stale metadata | None | None |
| `get_stats` | Get statistics | None | Dict |
| `print_stats` | Print formatted stats | None | None |
ImageDownloader Integration
| Parameter | Type | Default | Description |
|---|---|---|---|
| `track_assets` | `bool` | `True` | Enable asset tracking |

| Method | Description | Parameters |
|---|---|---|
| `download_all_assets` | Download assets | force_redownload: bool = False |
Command Line Options
| Option | Description |
|---|---|
| `--no-tracking` | Disable asset tracking |
| `--force-redownload` | Download all assets regardless of tracking |
| `--show-stats` | Display asset statistics |
| `--cleanup` | Clean up missing file metadata |
Contributing
To contribute to the asset tracking system:
- Test Changes: Run `python3 test_asset_tracking.py`
- Update Documentation: Modify this README as needed
- Follow Patterns: Use existing code patterns and error handling
- Add Tests: Include tests for new functionality
License
This asset tracking system is part of the ParentZone Downloader project.