# Asset Tracking System

This document describes the asset tracking system implemented for the ParentZone Downloader, which intelligently identifies and downloads only new or modified assets, avoiding unnecessary re-downloads.

## Overview

The asset tracking system consists of two main components:

1. **AssetTracker** (`asset_tracker.py`) - Manages local metadata and identifies new/modified assets
2. **ImageDownloader Integration** - Enhanced downloader with asset tracking capabilities

## Features

### 🎯 Smart Asset Detection

- **New Assets**: Automatically detects assets that haven't been downloaded before
- **Modified Assets**: Identifies assets that have changed since last download (based on timestamp, size, etc.)
- **Unchanged Assets**: Efficiently skips assets that are already up-to-date locally

### 📊 Comprehensive Tracking

- **Metadata Storage**: Stores asset metadata in JSON format for persistence
- **File Integrity**: Tracks file sizes, modification times, and content hashes
- **Download History**: Maintains records of successful and failed downloads

### 🧹 Maintenance Features

- **Cleanup**: Removes metadata for files that no longer exist on disk
- **Statistics**: Provides detailed statistics about tracked assets
- **Validation**: Ensures consistency between metadata and actual files

## Quick Start

### Basic Usage with Asset Tracking

```bash
# Download only new/modified assets (default behavior)
python3 image_downloader.py \
  --api-url "https://api.parentzone.me" \
  --list-endpoint "/v1/media/list" \
  --download-endpoint "/v1/media" \
  --output-dir "./downloaded_images" \
  --email "your-email@example.com" \
  --password "your-password"
```

### Advanced Options

```bash
# Disable asset tracking (download all assets)
python3 image_downloader.py [options] --no-tracking

# Force re-download of all assets
python3 image_downloader.py [options] --force-redownload

# Show asset tracking statistics
python3 image_downloader.py [options] --show-stats

# Clean up metadata for missing files
python3 image_downloader.py [options] --cleanup
```

## Asset Tracker API

### Basic Usage

```python
from asset_tracker import AssetTracker

# Initialize tracker
tracker = AssetTracker(storage_dir="downloaded_images")

# Get new assets that need downloading
api_assets = [...]  # Assets from API response
new_assets = tracker.get_new_assets(api_assets)

# Mark an asset as downloaded
tracker.mark_asset_downloaded(asset, filepath, success=True)

# Get statistics
stats = tracker.get_stats()
```

### Key Methods

#### `get_new_assets(api_assets: List[Dict]) -> List[Dict]`

Identifies new or modified assets that need to be downloaded.

**Parameters:**

- `api_assets`: List of asset dictionaries from API response

**Returns:**

- List of assets that need to be downloaded

**Example:**

```python
# API returns 100 assets, but only 5 are new/modified
api_assets = await fetch_assets_from_api()
new_assets = tracker.get_new_assets(api_assets)
print(f"Need to download {len(new_assets)} out of {len(api_assets)} assets")
```

#### `mark_asset_downloaded(asset: Dict, filepath: Path, success: bool)`

Records that an asset has been downloaded (or attempted).

**Parameters:**

- `asset`: Asset dictionary from API
- `filepath`: Local path where asset was saved
- `success`: Whether download was successful

#### `cleanup_missing_files()`

Removes metadata entries for files that no longer exist on disk.

#### `get_stats() -> Dict`

Returns comprehensive statistics about tracked assets.
**Returns:**

```python
{
    'total_tracked_assets': 150,
    'successful_downloads': 145,
    'failed_downloads': 5,
    'existing_files': 140,
    'missing_files': 10,
    'total_size_bytes': 524288000,
    'total_size_mb': 500.0
}
```

## Metadata Storage

### File Structure

Asset metadata is stored in `{output_dir}/asset_metadata.json`:

```json
{
  "asset_001": {
    "asset_id": "asset_001",
    "filename": "family_photo.jpg",
    "filepath": "/path/to/downloaded_images/family_photo.jpg",
    "download_date": "2024-01-15T10:30:00",
    "success": true,
    "content_hash": "d41d8cd98f00b204e9800998ecf8427e",
    "file_size": 1024000,
    "file_modified": "2024-01-15T10:30:00",
    "api_data": {
      "id": "asset_001",
      "name": "family_photo.jpg",
      "updated": "2024-01-01T10:00:00Z",
      "size": 1024000,
      "mimeType": "image/jpeg"
    }
  }
}
```

### Asset Identification

Assets are identified using the following priority:

1. `id` field
2. `assetId` field
3. `uuid` field
4. MD5 hash of asset data (fallback)

### Change Detection

Assets are considered modified if their content hash changes. The hash is based on:

- `updated` timestamp
- `modified` timestamp
- `lastModified` timestamp
- `size` field
- `checksum` field
- `etag` field

## Integration with ImageDownloader

### Automatic Integration

When asset tracking is enabled (default), the `ImageDownloader` automatically:

1. **Initializes Tracker**: Creates an `AssetTracker` instance
2. **Filters Assets**: Only downloads new/modified assets
3. **Records Downloads**: Marks successful/failed downloads in metadata
4. **Provides Feedback**: Shows statistics about skipped vs downloaded assets

### Example Integration

```python
import asyncio

from image_downloader import ImageDownloader


async def main():
    # Asset tracking enabled by default
    downloader = ImageDownloader(
        api_url="https://api.parentzone.me",
        list_endpoint="/v1/media/list",
        download_endpoint="/v1/media",
        output_dir="./images",
        email="user@example.com",
        password="password",
        track_assets=True,  # Default: True
    )

    # First run: Downloads all assets
    await downloader.download_all_assets()

    # Second run: Skips unchanged assets, downloads only new/modified ones
    await downloader.download_all_assets()


# `await` is only valid inside a coroutine, so drive it with asyncio.run()
asyncio.run(main())
```

## Testing

### Unit Tests

```bash
# Run comprehensive asset tracking tests
python3 test_asset_tracking.py

# Output shows:
# ✅ Basic tracking test passed!
# ✅ Modified asset detection test passed!
# ✅ Cleanup functionality test passed!
# ✅ Integration test completed!
```

### Live Demo

```bash
# Demonstrate asset tracking with real API
python3 demo_asset_tracking.py

# Shows:
# - Authentication process
# - Current asset status
# - First download run (downloads new assets)
# - Second run (skips all assets)
# - Final statistics
```

## Performance Benefits

### Network Efficiency

- **Reduced API Calls**: Only downloads assets that have changed
- **Bandwidth Savings**: Skips unchanged assets entirely
- **Faster Sync**: Subsequent runs complete much faster

### Storage Efficiency

- **No Duplicates**: Prevents downloading the same asset multiple times
- **Smart Cleanup**: Removes metadata for deleted files
- **Size Tracking**: Monitors total storage usage

### Example Performance Impact

```
First Run:  150 assets → Downloaded 150 (100%)
Second Run: 150 assets → Downloaded 0 (0%) - All up to date!
Third Run:  155 assets → Downloaded 5 (3.2%) - Only new ones
```

## Troubleshooting

### Common Issues

#### "No existing metadata file found"

This is normal for first-time usage. The system will create the metadata file automatically.
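To illustrate why this message is harmless, here is a minimal sketch of how a loader might treat a missing metadata file: it falls back to an empty mapping and the file is written on the first save. The helper names `load_metadata` and `save_metadata` are hypothetical and are not taken from `asset_tracker.py`; the real `AssetTracker` may behave differently.

```python
import json
from pathlib import Path


def load_metadata(metadata_path: Path) -> dict:
    """Return stored metadata, or an empty dict when no file exists yet."""
    if not metadata_path.exists():
        # First run: nothing has been tracked yet, so start with no entries.
        return {}
    with metadata_path.open("r", encoding="utf-8") as fh:
        return json.load(fh)


def save_metadata(metadata_path: Path, metadata: dict) -> None:
    """Write metadata as JSON, creating the output directory if needed."""
    metadata_path.parent.mkdir(parents=True, exist_ok=True)
    with metadata_path.open("w", encoding="utf-8") as fh:
        json.dump(metadata, fh, indent=2)
```

Under these assumptions, a first run starts from an empty dictionary, and the file only appears on disk once the first download has been recorded and saved.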
#### "File missing, removing from metadata"

The cleanup process found files that were deleted outside the application. This is normal maintenance.

#### Asset tracking not working

Ensure `AssetTracker` is properly imported and asset tracking is enabled:

```python
# Check if tracking is enabled
if downloader.asset_tracker:
    print("Asset tracking is enabled")
else:
    print("Asset tracking is disabled")
```

### Manual Maintenance

#### Reset All Tracking

```bash
# Remove metadata file to start fresh
rm downloaded_images/asset_metadata.json
```

#### Clean Up Missing Files

```bash
python3 image_downloader.py --cleanup --output-dir "./downloaded_images"
```

#### View Statistics

```bash
python3 image_downloader.py --show-stats --output-dir "./downloaded_images"
```

## Configuration

### Environment Variables

```bash
# Disable asset tracking globally
export DISABLE_ASSET_TRACKING=1

# Set custom metadata filename
export ASSET_METADATA_FILE="my_assets.json"
```

### Programmatic Configuration

```python
# Custom metadata file location
tracker = AssetTracker(
    storage_dir="./images",
    metadata_file="custom_metadata.json"
)

# Disable tracking for specific downloader
downloader = ImageDownloader(
    # ... other params ...
    track_assets=False
)
```

## Future Enhancements

### Planned Features

- **Parallel Metadata Updates**: Concurrent metadata operations
- **Cloud Sync**: Sync metadata across multiple devices
- **Asset Versioning**: Track multiple versions of the same asset
- **Batch Operations**: Bulk metadata operations for large datasets
- **Web Interface**: Browser-based asset management

### Extensibility

The asset tracking system is designed to be extensible:

```python
# Custom asset identification
class CustomAssetTracker(AssetTracker):
    def _get_asset_key(self, asset):
        # Custom logic for asset identification
        return f"{asset.get('category')}_{asset.get('id')}"

    def _get_asset_hash(self, asset):
        # Custom logic for change detection
        return super()._get_asset_hash(asset)
```

## API Reference

### AssetTracker Class

| Method | Description | Parameters | Returns |
|--------|-------------|------------|---------|
| `__init__` | Initialize tracker | `storage_dir`, `metadata_file` | None |
| `get_new_assets` | Find new/modified assets | `api_assets: List[Dict]` | `List[Dict]` |
| `mark_asset_downloaded` | Record download | `asset`, `filepath`, `success` | None |
| `is_asset_downloaded` | Check if downloaded | `asset: Dict` | `bool` |
| `is_asset_modified` | Check if modified | `asset: Dict` | `bool` |
| `cleanup_missing_files` | Remove stale metadata | None | None |
| `get_stats` | Get statistics | None | `Dict` |
| `print_stats` | Print formatted stats | None | None |

### ImageDownloader Integration

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `track_assets` | `bool` | `True` | Enable asset tracking |

| Method | Description | Parameters |
|--------|-------------|------------|
| `download_all_assets` | Download assets | `force_redownload: bool = False` |

### Command Line Options

| Option | Description |
|--------|-------------|
| `--no-tracking` | Disable asset tracking |
| `--force-redownload` | Download all assets regardless of tracking |
| `--show-stats` | Display asset statistics |
| `--cleanup` | Clean up missing file metadata |

## Contributing

To contribute to the asset tracking system:

1. **Test Changes**: Run `python3 test_asset_tracking.py`
2. **Update Documentation**: Modify this README as needed
3. **Follow Patterns**: Use existing code patterns and error handling
4. **Add Tests**: Include tests for new functionality

## License

This asset tracking system is part of the ParentZone Downloader project.
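
## Appendix: Change Detection Sketch

The identification priority and the change-detection fields listed under Metadata Storage can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the names `asset_key`, `asset_hash`, and `needs_download` are hypothetical, hashing a sorted JSON dump with MD5 is one possible choice, and retrying previously failed downloads is an assumption the source does not confirm.

```python
import hashlib
import json

# Field names taken from the Asset Identification and Change Detection lists.
ID_FIELDS = ("id", "assetId", "uuid")
CHANGE_FIELDS = ("updated", "modified", "lastModified", "size", "checksum", "etag")


def _md5_of(data: dict) -> str:
    """Hash a dict deterministically (assumed scheme: sorted JSON + MD5)."""
    payload = json.dumps(data, sort_keys=True, default=str)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def asset_key(asset: dict) -> str:
    """Pick the first available identifier, else fall back to a hash."""
    for field in ID_FIELDS:
        value = asset.get(field)
        if value:
            return str(value)
    return _md5_of(asset)


def asset_hash(asset: dict) -> str:
    """Hash only the fields that signal a change in the remote asset."""
    relevant = {f: asset[f] for f in CHANGE_FIELDS if f in asset}
    return _md5_of(relevant)


def needs_download(asset: dict, metadata: dict) -> bool:
    """True for new assets, changed assets, and (assumed) failed downloads."""
    entry = metadata.get(asset_key(asset))
    if entry is None or not entry.get("success"):
        return True
    return entry.get("content_hash") != asset_hash(asset)
```

Because only the change-related fields feed the hash in this sketch, renaming an asset on the server without touching its timestamp or size would not trigger a re-download.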