Fetching Historical Tracker Reports

This guide explains how to use the fetch_all_tracker_reports.py script to retrieve historical location data for all trackers.

When to Use This Script

You should use this script when:

  1. You notice missing tracker data in the system
  2. The system has experienced an outage or downtime period
  3. You are onboarding new trackers and want to import their historical data
  4. You need to recover or back up data

Prerequisites

Before running the script, ensure you have:

  1. Access to the Apple FindMy API via an anisette server
  2. Database connection credentials configured in environment variables or .env files
  3. The findmy Python package and its dependencies installed

For more detailed information, see the Fetch All Tracker Reports Details guide.

Environment Configuration

The script can use environment variables from three possible sources, listed in order of precedence (a loading sketch follows the variable table below):

  1. .env file in the project root
  2. fetcher/.env file
  3. fetcher/.env.example file

The following environment variables are used:

Variable              Description                        Default
ANISETTE_SERVER       URL of the Anisette server         (required)
DB_HOST               Database host                      localhost
DB_PORT               Database port                      5432
DB_NAME               Database name                      postgres
DB_USER               Database user                      postgres
DB_PASSWORD           Database password                  postgres
REDIS_HOST            Redis host                         localhost
REDIS_PORT            Redis port                         6379
REDIS_PASSWORD        Redis password                     (empty)
REDIS_DB              Redis database number              0
MAX_REPORT_AGE_DAYS   Default number of days to fetch    7
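
To replicate this loading order outside the container, something like the following works. This is a minimal sketch using the python-dotenv package, not necessarily the script's exact loading code:

import os
from pathlib import Path
from dotenv import load_dotenv  # python-dotenv

# Load candidate files in precedence order; with override=False, values set
# by an earlier file (or already present in the environment) are kept.
for candidate in (".env", "fetcher/.env", "fetcher/.env.example"):
    if Path(candidate).exists():
        load_dotenv(candidate, override=False)

# Read settings, falling back to the documented defaults.
anisette_server = os.environ["ANISETTE_SERVER"]  # required, no default
db_host = os.getenv("DB_HOST", "localhost")
db_port = int(os.getenv("DB_PORT", "5432"))
max_report_age_days = int(os.getenv("MAX_REPORT_AGE_DAYS", "7"))

Because override=False is used, a value defined in the project-root .env always wins over the copies under fetcher/.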

Basic Usage

The script is included in the tracker-report-fetcher Docker container. You can run it using docker compose exec:

docker compose -f fetcher/compose.yml exec tracker-report-fetcher python fetch_all_tracker_reports.py

This will:

  1. Use environment variables from the Docker container (set from your .env file)
  2. Fetch reports for all trackers for the past 7 days
  3. Process trackers in batches of 10
  4. Use a 5-second delay between batches (a sketch of this batching pattern follows)
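
The following is a rough illustration of that batch-and-delay loop. The helper fetch_reports_for_tracker is hypothetical and stands in for the script's actual per-tracker fetch logic:

import time

def process_in_batches(trackers, batch_size=10, batch_delay=5):
    """Process trackers in fixed-size batches, pausing between batches."""
    total_batches = (len(trackers) + batch_size - 1) // batch_size
    for batch_number in range(total_batches):
        batch = trackers[batch_number * batch_size:(batch_number + 1) * batch_size]
        print(f"Processing batch {batch_number + 1}/{total_batches}")
        for tracker in batch:
            fetch_reports_for_tracker(tracker)  # hypothetical per-tracker fetch
        if batch_number + 1 < total_batches:
            time.sleep(batch_delay)  # throttle to reduce API load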

Advanced Options

The script supports several command-line options for customization:

Days to Fetch

Control how far back in time to fetch reports:

docker compose -f fetcher/compose.yml exec tracker-report-fetcher python fetch_all_tracker_reports.py --days 14

This fetches reports for the past 14 days instead of the default 7 days.

Batch Size

Control how many trackers to process in each batch:

docker compose -f fetcher/compose.yml exec tracker-report-fetcher python fetch_all_tracker_reports.py --batch-size 20

Larger batch sizes process more trackers at once but may increase memory usage and API load.

Batch Delay

Control the delay between processing batches:

docker compose -f fetcher/compose.yml exec tracker-report-fetcher python fetch_all_tracker_reports.py --batch-delay 10

Longer delays reduce the risk of API rate limiting but increase total processing time.

Combining Options

You can combine multiple options:

docker compose -f fetcher/compose.yml exec tracker-report-fetcher python fetch_all_tracker_reports.py --days 10 --batch-size 15 --batch-delay 8
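
For reference, these flags could be defined with argparse roughly as follows. The flag names match this guide; the defaults and internals are assumptions, with MAX_REPORT_AGE_DAYS feeding the --days default as described in the environment table:

import argparse
import os

def parse_args():
    """Command-line options described in this guide (illustrative sketch)."""
    parser = argparse.ArgumentParser(description="Fetch historical tracker reports")
    parser.add_argument("--days", type=int,
                        default=int(os.getenv("MAX_REPORT_AGE_DAYS", "7")),
                        help="How many days of reports to fetch")
    parser.add_argument("--batch-size", type=int, default=10,
                        help="Number of trackers to process per batch")
    parser.add_argument("--batch-delay", type=int, default=5,
                        help="Seconds to wait between batches")
    return parser.parse_args()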

Monitoring Progress

The script provides detailed logging to help you monitor progress:

  1. It logs when it starts processing each batch
  2. It logs when it starts processing each tracker
  3. It reports how many reports were found and stored for each tracker
  4. It provides a summary of total reports stored at the end

Example log output:

INFO - Processing batch 1/5
INFO - Processing tracker Tracker1 (ID: 123)
INFO - Found 15 reports for tracker Tracker1
INFO - Stored 10 new reports for tracker Tracker1
INFO - Processing tracker Tracker2 (ID: 124)
INFO - Found 8 reports for tracker Tracker2
INFO - Stored 8 new reports for tracker Tracker2
INFO - Batch complete. Stored 18 reports in this batch.
INFO - Waiting 5 seconds before next batch...
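
Log lines in this format can be produced with a logging configuration along the following lines; this is an illustration, not necessarily the script's exact setup:

import logging

# Emit messages as "LEVEL - message", matching the example output above.
logging.basicConfig(level=logging.INFO, format="%(levelname)s - %(message)s")
logger = logging.getLogger(__name__)

logger.info("Processing batch 1/5")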

After Running the Script

After the script completes:

  1. The continuous aggregates and materialized view are automatically refreshed:
    • The location_history_hourly continuous aggregate is refreshed for the last 48 hours
    • The location_history_daily continuous aggregate is refreshed for the last 7 days
    • The location_history materialized view is refreshed to include all updated data
  2. New location data will be available in the GraphQL API
  3. The tracker's last_report_received timestamp will be updated

The script uses a comprehensive refresh approach that ensures all data layers are updated:

def refresh_all_views(conn):
    """
    Refresh all continuous aggregates and the materialized view.
    This ensures that all location data is properly aggregated and available for querying.
    """
    try:
        logger.info("Refreshing continuous aggregates and materialized views...")
        with conn.cursor() as cur:
            # Set a longer statement timeout for large tables
            cur.execute("SET statement_timeout = '20min';")

            # Get the current time in database format
            cur.execute("SELECT NOW();")
            current_time = cur.fetchone()[0]

            # Calculate time bounds for refresh
            # Refresh data from the last 48 hours to now for hourly aggregate
            start_time_hourly = current_time - datetime.timedelta(hours=48)

            # Refresh hourly aggregate with time bounds
            logger.info("Refreshing hourly aggregate...")
            cur.execute(
                "CALL refresh_continuous_aggregate('location_history_hourly', %s, %s);",
                (start_time_hourly, current_time)
            )

            # Refresh data from the last 7 days to now for daily aggregate
            start_time_daily = current_time - datetime.timedelta(days=7)

            # Refresh daily aggregate with time bounds
            logger.info("Refreshing daily aggregate...")
            cur.execute(
                "CALL refresh_continuous_aggregate('location_history_daily', %s, %s);",
                (start_time_daily, current_time)
            )

            # Refresh the materialized view
            logger.info("Refreshing location_history materialized view...")
            cur.execute("REFRESH MATERIALIZED VIEW CONCURRENTLY location_history;")

        logger.info("All views refreshed successfully")
        return True
    except Exception as e:
        logger.error(f"Error refreshing views: {str(e)}")
        return False

This approach ensures that:

  1. The continuous aggregates are refreshed with time bounds for efficiency
  2. The materialized view is refreshed to include the updated aggregates
  3. All data is available for querying through the GraphQL API
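
If you ever need to trigger this refresh outside the script, refresh_all_views can be called with an ordinary database connection. The sketch below assumes psycopg2 (consistent with the %s placeholders above) and the DB_* variables from the environment table:

import os
import psycopg2

# Connect using the documented DB_* settings (psycopg2 is an assumption).
conn = psycopg2.connect(
    host=os.getenv("DB_HOST", "localhost"),
    port=int(os.getenv("DB_PORT", "5432")),
    dbname=os.getenv("DB_NAME", "postgres"),
    user=os.getenv("DB_USER", "postgres"),
    password=os.getenv("DB_PASSWORD", "postgres"),
)
conn.autocommit = True  # refresh_continuous_aggregate cannot run inside a transaction block

if not refresh_all_views(conn):
    raise SystemExit("View refresh failed; check the logs above")
conn.close()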

Troubleshooting

Authentication Issues

If you encounter authentication issues:

  1. Check that the anisette server URL is correct
  2. Verify that the account credentials are valid
  3. Try removing the account.json file to force re-authentication

Missing Data

If you're still missing data after running the script:

  1. Try increasing the --days parameter to fetch older data
  2. Check that the trackers have valid private_key and hashed_advertisement_key values (a quick check is sketched below)
  3. Verify that the trackers are actually reporting data to Apple's FindMy network
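
A helper like the following can list trackers whose key material is missing or empty. The trackers table and its id and name columns are assumptions about the schema, and conn is a database connection like the one opened in the previous sketch:

def find_trackers_missing_keys(conn):
    """Return (id, name) for trackers whose FindMy key material is missing or empty."""
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT id, name
            FROM trackers
            WHERE private_key IS NULL OR private_key = ''
               OR hashed_advertisement_key IS NULL OR hashed_advertisement_key = '';
            """
        )
        return cur.fetchall()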

Continuous Aggregate Refresh Issues

If the continuous aggregates aren't being refreshed properly:

  1. Check the logs for any errors during the refresh process
  2. Manually refresh the continuous aggregates:
    -- In PostgreSQL
    CALL refresh_continuous_aggregate('location_history_hourly', NOW() - INTERVAL '48 hours', NOW());
    CALL refresh_continuous_aggregate('location_history_daily', NOW() - INTERVAL '7 days', NOW());
    REFRESH MATERIALIZED VIEW CONCURRENTLY location_history;
  3. Check if the TimescaleDB extension is properly installed and configured
  4. Verify that the continuous aggregates exist and are properly defined

Database Connection Issues

If you encounter database connection issues:

  1. Check the database connection environment variables (DB_HOST, DB_PORT, DB_NAME, DB_USER, DB_PASSWORD)
  2. Verify that the database is running and accessible
  3. Check for any firewall or network issues (a quick reachability check is sketched below)
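
The reachability check can be as simple as the following; it only confirms that the database port is reachable, not that the credentials are valid:

import os
import socket

host = os.getenv("DB_HOST", "localhost")
port = int(os.getenv("DB_PORT", "5432"))
try:
    # Only verifies that the port accepts connections.
    with socket.create_connection((host, port), timeout=5):
        print(f"TCP connection to {host}:{port} succeeded")
except OSError as exc:
    print(f"Cannot reach {host}:{port}: {exc}")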

API Rate Limiting

If you encounter API rate limiting:

  1. Increase the --batch-delay parameter
  2. Decrease the --batch-size parameter
  3. Try running the script during off-peak hours