Last commit: d8637ac2ea "repo restructure" by Tudor Sitaru, 2025-10-14 21:58:54 +01:00

ParentZone Snapshots Web Server

A built-in web server that serves your downloaded snapshot HTML files and their assets through a clean, responsive web interface.

Features

  • 📂 Directory Listing: Browse all your snapshot files with file sizes and modification dates
  • 🖼️ Asset Serving: Properly serves images, CSS, and other assets referenced in HTML files
  • 📱 Responsive Design: Works great on desktop, tablet, and mobile devices
  • 🔒 Security: Path traversal protection and secure file serving
  • 📊 Request Logging: Detailed logging of all web requests
  • Caching: Optimized caching headers for better performance

Quick Start

The web server starts automatically when you run the Docker container:

# Build and start with docker-compose
docker-compose up -d

# Or build and run manually
docker build -t parentzone-downloader .
docker run -d -p 8080:8080 -v "$(pwd)/snapshots:/app/snapshots" parentzone-downloader

The web interface will be available at: http://localhost:8080

Running Standalone

You can also run the web server independently:

# Start web server with default settings
python webserver.py

# Custom port and directory
python webserver.py --port 3000 --snapshots-dir ./my-snapshots

# Bind to all interfaces
python webserver.py --host 0.0.0.0 --port 8080

Configuration Options

Command Line Arguments

| Argument | Default | Description |
|---|---|---|
| --snapshots-dir | ./snapshots | Directory containing snapshot files |
| --port | 8080 | Port to run the server on |
| --host | 0.0.0.0 | Host interface to bind to |

Examples

# Serve from custom directory on port 3000
python webserver.py --snapshots-dir /path/to/snapshots --port 3000

# Local access only
python webserver.py --host 127.0.0.1

# Production setup
python webserver.py --host 0.0.0.0 --port 80 --snapshots-dir /var/snapshots

Web Interface

Main Directory Page

  • Clean Layout: Modern, responsive design with file cards
  • File Information: Shows file names, sizes, and last modified dates
  • Sorting: Files are sorted by modification date (newest first)
  • Direct Links: Click any file name to view the snapshot
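The listing behaviour above (HTML files only, human-readable sizes, newest first) can be sketched with the standard library. This is a minimal illustration, not the code in webserver.py. The names SnapshotEntry, human_size and list_snapshots are hypothetical, and the sketch assumes that only top-level *.html files are listed.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from pathlib import Path


@dataclass
class SnapshotEntry:
    name: str           # file name, used as the link target
    size: str           # human-readable size, e.g. "1.5 KB"
    modified: datetime  # last-modified time (local time)


def human_size(num_bytes: int) -> str:
    """Format a byte count with 1024-based units."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1024:
            return f"{int(size)} B" if unit == "B" else f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TB"


def list_snapshots(directory: Path) -> list[SnapshotEntry]:
    """Collect top-level *.html files, newest first."""
    entries = []
    for path in directory.glob("*.html"):
        if not path.is_file():
            continue  # skip directories whose names happen to end in .html
        stat = path.stat()
        entries.append(
            SnapshotEntry(path.name, human_size(stat.st_size), datetime.fromtimestamp(stat.st_mtime))
        )
    # Sort by modification time, most recent first
    entries.sort(key=lambda entry: entry.modified, reverse=True)
    return entries
```

An index handler could then render each entry as a file card that links to entry.name.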

File Serving

  • HTML Files: Served with proper content types and encoding
  • Assets: Images, CSS, JS, and other assets are served correctly
  • Caching: Efficient browser caching for better performance
  • Security: Path traversal protection prevents unauthorized access

URL Structure

| URL Pattern | Description | Example |
|---|---|---|
| / | Main directory listing | http://localhost:8080/ |
| /{filename}.html | Serve HTML snapshot file | http://localhost:8080/snapshots_2024-01-01.html |
| /assets/{path} | Serve asset files | http://localhost:8080/assets/images/photo.jpg |
| /{filename}.{ext} | Serve other files | http://localhost:8080/snapshots.log |

Docker Integration

Environment Variables

The web server respects these environment variables when running in Docker:

  • SNAPSHOTS_DIR: Directory to serve files from (default: /app/snapshots)
  • WEB_PORT: Port for the web server (default: 8080)
  • WEB_HOST: Host interface to bind to (default: 0.0.0.0)
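One plausible way to combine these variables with the command-line flags is to use the environment variables as argparse defaults, so that an explicit flag always wins. This is a sketch under that assumption. The README does not say how webserver.py resolves conflicts between the two sources, and parse_args is a hypothetical helper.

```python
import argparse
import os


def parse_args(argv=None) -> argparse.Namespace:
    """Parse CLI flags, using environment variables as the fallback defaults."""
    parser = argparse.ArgumentParser(description="ParentZone snapshots web server (sketch)")
    # Environment variables are read at call time and take precedence over the hard-coded defaults
    parser.add_argument("--snapshots-dir", default=os.environ.get("SNAPSHOTS_DIR", "./snapshots"))
    parser.add_argument("--port", type=int, default=int(os.environ.get("WEB_PORT", "8080")))
    parser.add_argument("--host", default=os.environ.get("WEB_HOST", "0.0.0.0"))
    # An explicit flag on the command line always wins over the environment
    return parser.parse_args(argv)
```

Under this assumption, with WEB_PORT=9000 set, python webserver.py would listen on port 9000, while python webserver.py --port 3000 would still use port 3000.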

Volume Mounts

Make sure your snapshots directory is properly mounted:

# docker-compose.yml
volumes:
  - ./snapshots:/app/snapshots  # Your local snapshots folder
  - ./logs:/app/logs            # Log files

Port Mapping

The default port 8080 is exposed and mapped in the Docker setup:

# docker-compose.yml
ports:
  - "8080:8080"  # Host:Container

To use a different port:

ports:
  - "3000:8080"  # Access via http://localhost:3000

File Types Supported

HTML Files

  • Snapshot files: Main HTML files with embedded images and styles
  • Content-Type: text/html; charset=utf-8
  • Features: Full HTML rendering with linked assets

Asset Files

  • Images: JPG, PNG, GIF, WebP, SVG, ICO
  • Stylesheets: CSS files
  • Scripts: JavaScript files
  • Data: JSON files
  • Documents: PDF files
  • Logs: TXT and LOG files

Content Type Detection

The server automatically detects content types based on file extensions:

content_types = {
    ".html": "text/html; charset=utf-8",
    ".css": "text/css; charset=utf-8",
    ".js": "application/javascript; charset=utf-8",
    ".jpg": "image/jpeg",
    ".png": "image/png",
    ".pdf": "application/pdf",
    # ... and more
}
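A fuller version of the lookup above might keep explicit entries for the types listed under File Types Supported and fall back to Python's mimetypes module for anything else. The extra entries (for example .ico as image/x-icon) and the application/octet-stream fallback are assumptions and are not taken from webserver.py.

```python
import mimetypes
from pathlib import PurePosixPath

# Explicit entries for the documented file types; text formats carry a UTF-8 charset.
CONTENT_TYPES = {
    ".html": "text/html; charset=utf-8",
    ".css": "text/css; charset=utf-8",
    ".js": "application/javascript; charset=utf-8",
    ".json": "application/json",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
    ".svg": "image/svg+xml",
    ".ico": "image/x-icon",
    ".pdf": "application/pdf",
    ".txt": "text/plain; charset=utf-8",
    ".log": "text/plain; charset=utf-8",
}


def content_type_for(filename: str) -> str:
    """Return a Content-Type for filename based on its extension."""
    suffix = PurePosixPath(filename).suffix.lower()  # ".JPG" and ".jpg" are treated alike
    if suffix in CONTENT_TYPES:
        return CONTENT_TYPES[suffix]
    # Fall back to the platform's MIME table, then to a generic binary type
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or "application/octet-stream"
```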

Security Features

Path Traversal Protection

The server prevents access to files outside the snapshots directory:

  • /snapshots_2024-01-01.html - Allowed
  • /assets/images/photo.jpg - Allowed
  • /../../../etc/passwd - Blocked
  • /../../config.json - Blocked
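A common way to implement this check is to resolve the requested path and confirm that it still lies inside the snapshots directory. The sketch below assumes it receives the raw URL path, still percent-encoded, so that encoded forms such as ..%2f are decoded before the check. resolve_safe is a hypothetical name, and the actual logic in webserver.py may differ.

```python
from pathlib import Path
from typing import Optional
from urllib.parse import unquote


def resolve_safe(root: Path, url_path: str) -> Optional[Path]:
    """Map a URL path to a path under root; return None if it would escape root."""
    root = root.resolve()
    relative = unquote(url_path).lstrip("/")  # decode %2e%2e etc. and drop leading slashes
    if "\x00" in relative:
        return None  # NUL bytes are never valid in file names
    candidate = (root / relative).resolve()  # collapses ".." and follows symlinks
    try:
        candidate.relative_to(root)
    except ValueError:
        return None  # outside the snapshots directory
    return candidate
```

A handler would answer None with an error status such as 403 or 404. It should still confirm that the returned path is an existing file before serving it.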

Safe File Serving

  • Only serves files from designated directories
  • Validates all file paths before serving
  • Returns proper HTTP error codes for invalid requests
  • Logs suspicious access attempts

Performance Optimization

Caching Headers

The server sets appropriate caching headers:

  • HTML files: Cache-Control: public, max-age=3600 (1 hour)
  • Asset files: Cache-Control: public, max-age=86400 (24 hours)
  • Last-Modified: Proper modification time headers
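These headers can be built with the standard library's email.utils module, which formats and parses HTTP-style dates. The not_modified helper for If-Modified-Since requests is included for illustration only, because whether webserver.py sends 304 responses is not stated. The names cache_headers and not_modified are assumptions.

```python
from datetime import timezone
from email.utils import formatdate, parsedate_to_datetime
from pathlib import PurePosixPath
from typing import Dict, Optional

HTML_MAX_AGE = 3600    # 1 hour for snapshot pages
ASSET_MAX_AGE = 86400  # 24 hours for images, CSS, JS, ...


def cache_headers(filename: str, mtime: float) -> Dict[str, str]:
    """Build Cache-Control and Last-Modified headers for a file."""
    is_html = PurePosixPath(filename).suffix.lower() == ".html"
    max_age = HTML_MAX_AGE if is_html else ASSET_MAX_AGE
    return {
        "Cache-Control": f"public, max-age={max_age}",
        "Last-Modified": formatdate(mtime, usegmt=True),  # e.g. "Thu, 01 Jan 1970 00:00:00 GMT"
    }


def not_modified(if_modified_since: Optional[str], mtime: float) -> bool:
    """True if the client's copy is current, i.e. the server may answer 304."""
    if not if_modified_since:
        return False
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return False  # unparseable header: ignore it and send the full file
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # "-0000" dates parse as naive
    # HTTP dates have 1-second resolution, so compare whole seconds
    return int(mtime) <= since.timestamp()
```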

Connection Handling

  • Built on aiohttp for high-performance async handling
  • Efficient file serving with proper buffer sizes
  • Graceful error handling and recovery
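In aiohttp, web.FileResponse already streams files from disk, so a handler rarely needs to manage buffers itself. Purely to illustrate what chunked serving means, here is a standard-library generator that reads a file in fixed-size pieces. The 64 KiB chunk size is an assumed value and is not necessarily the one webserver.py or aiohttp uses.

```python
from pathlib import Path
from typing import Iterator

CHUNK_SIZE = 64 * 1024  # assumed buffer size; tune for your disk and network


def iter_file_chunks(path: Path, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield a file in fixed-size chunks so large files never sit fully in memory."""
    with path.open("rb") as fh:
        # read() returns b"" at end of file, which ends the loop
        while chunk := fh.read(chunk_size):
            yield chunk
```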

Logging

Request Logging

All requests are logged with details:

2024-01-15 10:30:45 - webserver - INFO - 192.168.1.100 - GET /snapshots_2024-01-01.html - 200 - 0.045s
2024-01-15 10:30:46 - webserver - INFO - 192.168.1.100 - GET /assets/images/photo.jpg - 200 - 0.012s
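The log lines above follow a standard logging layout of timestamp, logger name, level and message. The sketch below reproduces that shape. It assumes a logger named "webserver" and a hypothetical access_message helper for the message part, and it is not taken from webserver.py.

```python
import logging

LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"  # no milliseconds, matching the examples

logger = logging.getLogger("webserver")


def access_message(client_ip: str, method: str, path: str, status: int, elapsed: float) -> str:
    """Format the per-request part of an access log line."""
    return f"{client_ip} - {method} {path} - {status} - {elapsed:.3f}s"


def configure_logging(level: int = logging.INFO) -> None:
    """Attach a root handler so "webserver" records use LOG_FORMAT."""
    logging.basicConfig(level=level, format=LOG_FORMAT, datefmt=DATE_FORMAT)
```

If you call configure_logging() and then logger.info(access_message("192.168.1.100", "GET", "/snapshots_2024-01-01.html", 200, 0.045)), the result is a line shaped like the first example above, stamped with the current time.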

Error Logging

Errors and security events are logged:

2024-01-15 10:31:00 - webserver - WARNING - Attempted path traversal: ../../../etc/passwd
2024-01-15 10:31:05 - webserver - ERROR - Error serving file unknown.html: File not found

Log Location

  • Docker: Logs to /app/logs/startup.log and container stdout
  • Standalone: Logs to console and any configured log files

Troubleshooting

Common Issues

Port Already in Use

# Error: Address already in use
# Solution: Use a different port
python webserver.py --port 8081
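Before starting the server, you can check whether another process is already listening on the port. This standard-library sketch tries to connect to the port. port_in_use is a hypothetical helper, and it only detects listeners that are reachable at the given host.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a process is already accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0
```

For example, a launcher script could print a hint such as "try --port 8081" when port_in_use(8080) returns True.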

Permission Denied

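# Error: Permission denied (port 80)
# Solution: ports below 1024 require root privileges on most Linux systems,
# so either run with sudo or use an unprivileged port (1024 or higher)
sudo python webserver.py --port 80
# Or
python webserver.py --port 8080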

No Files Visible

  • Check that the snapshots directory exists and contains HTML files
  • Verify that the directory is readable by the server process
  • Check that the Docker volume mounts are correct

Assets Not Loading

  • Ensure assets directory exists within snapshots folder
  • Check that asset files are properly referenced in HTML
  • Verify file permissions on asset files

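AttributeError: 'Application' object has no attribute 'remote'

This error has been seen with older aiohttp setups. A common cause is a middleware written as async def middleware(request, handler) but registered without the @web.middleware decorator. aiohttp 3.x then treats it as an old-style middleware factory and passes the Application, not the Request, as the first argument. The web server has been updated so that it does not depend on request.remote to identify clients:

  • Uses request.transport.get_extra_info("peername") for the client IP
  • Handles cases where the transport is not available (for example, when the connection has already closed)
  • Falls back to "unknown" for client identification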
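The fallback chain described above can be written as a small helper. This is a sketch and not the function in webserver.py. It accepts any object with a transport attribute, such as an aiohttp Request, and assumes TCP peers, for which get_extra_info("peername") returns a tuple whose first element is the host.

```python
from typing import Any


def client_ip(request: Any) -> str:
    """Best-effort client IP from the request transport, with an "unknown" fallback."""
    transport = getattr(request, "transport", None)
    if transport is None:
        return "unknown"  # no transport attribute, or the connection is gone
    peername = transport.get_extra_info("peername")
    # TCP peers are (host, port) for IPv4 or (host, port, flowinfo, scope_id) for IPv6
    if isinstance(peername, (tuple, list)) and peername:
        return str(peername[0])
    return "unknown"  # e.g. None, or a non-TCP socket
```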

Debug Mode

For more verbose logging, modify the logging level:

# In webserver.py (force=True, Python 3.8+, replaces any earlier basicConfig setup)
import logging
logging.basicConfig(level=logging.DEBUG, force=True)

Health Check

Test if the server is running:

# Check if server responds
curl http://localhost:8080/

# Check specific file
curl -I http://localhost:8080/snapshots_2024-01-01.html
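For scripted checks, such as a Docker HEALTHCHECK or a cron job, the same probe can be written with urllib from the standard library. is_healthy is a hypothetical name, and treating any response other than 200 as unhealthy is an assumption.

```python
import urllib.error
import urllib.request


def is_healthy(url: str = "http://localhost:8080/", timeout: float = 5.0) -> bool:
    """Return True if the server answers url with HTTP 200 within timeout seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status (HTTPError)
        return False
```

A check script could end with raise SystemExit(0 if is_healthy() else 1) so that its exit code reports the result.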

Development

Adding New Features

The web server is designed to be easily extensible:

from aiohttp import web

app = web.Application()  # or reuse the existing app in webserver.py

# Add new route
async def custom_handler(request):
    return web.Response(text="Custom response")

# Register route
app.router.add_get("/custom", custom_handler)

Custom Styling

You can customize the directory listing appearance by modifying the CSS in _generate_index_html().

API Endpoints

Consider adding REST API endpoints for programmatic access:

from aiohttp import web

# Example: JSON API for file listing
async def api_files(request):
    files = get_file_list()  # Your logic here; must return JSON-serialisable data
    return web.json_response(files)

app.router.add_get("/api/files", api_files)  # app: the server's web.Application

Production Deployment

Reverse Proxy Setup

For production, consider using nginx as a reverse proxy:

server {
    listen 80;
    server_name your-domain.com;

    location / {
        proxy_pass http://localhost:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
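Behind this proxy, the TCP peer seen by the web server is nginx itself (127.0.0.1 here), so peername-based logging records the proxy's address rather than the visitor's. Whether webserver.py reads X-Real-IP is not stated. The sketch below shows one way to use it safely by trusting the header only when the connection comes from a known proxy. TRUSTED_PROXIES and effective_client_ip are assumptions.

```python
import ipaddress
from typing import Mapping

# Assumption: only a proxy on the same host is trusted to set X-Real-IP.
TRUSTED_PROXIES = {ipaddress.ip_address("127.0.0.1"), ipaddress.ip_address("::1")}


def effective_client_ip(peer_ip: str, headers: Mapping[str, str]) -> str:
    """Prefer X-Real-IP, but only when the TCP peer is a trusted proxy."""
    try:
        peer = ipaddress.ip_address(peer_ip)
    except ValueError:
        return peer_ip  # e.g. "unknown": nothing to verify, pass it through
    if peer in TRUSTED_PROXIES:
        real_ip = headers.get("X-Real-IP")
        if real_ip:
            return real_ip.strip()
    # Direct clients cannot spoof their address by sending the header themselves
    return peer_ip
```

In aiohttp, request.headers is case-insensitive, so headers.get("X-Real-IP") also matches lowercase header names. A plain dict, as used in a quick test, is case-sensitive.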

SSL/HTTPS

Add SSL termination at the reverse proxy level:

server {
    listen 443 ssl;
    server_name your-domain.com;
    ssl_certificate /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;

    location / {
        proxy_pass http://localhost:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}

Process Management

Use systemd or supervisor to manage the web server process:

# /etc/systemd/system/parentzone-webserver.service
[Unit]
Description=ParentZone Web Server
After=network.target

[Service]
Type=simple
User=parentzone
WorkingDirectory=/opt/parentzone
ExecStart=/usr/bin/python3 webserver.py
Restart=always

[Install]
WantedBy=multi-user.target

Contributing

The web server is part of the ParentZone Downloader project. To contribute:

  1. Fork the repository
  2. Make your changes to webserver.py
  3. Test thoroughly
  4. Submit a pull request

License

This web server is part of the ParentZone Downloader project and follows the same license terms.