Fetching Historical Tracker Reports

This guide explains how to use the fetch_all_tracker_reports.py script to retrieve historical location data for all trackers.

When to Use This Script

You should use this script when:

  1. You notice missing tracker data in the system
  2. The system has experienced an outage or downtime period
  3. You are onboarding new trackers and want to import their historical data
  4. You need to recover or back up data

Prerequisites

Before running the script, ensure you have:

  1. Access to the Apple FindMy API via an anisette server
  2. Database connection credentials configured in environment variables or .env files
  3. The findmy Python package and its dependencies installed

For more detailed information, see the Fetch All Tracker Reports Details guide.

Environment Configuration

The script can use environment variables from three possible sources, listed in order of precedence (a loading sketch follows the variable table below):

  1. .env file in the project root
  2. fetcher/.env file
  3. fetcher/.env.example file

The following environment variables are used:

Variable              Description                        Default
ANISETTE_SERVER       URL of the Anisette server         (required)
DB_HOST               Database host                      localhost
DB_PORT               Database port                      5432
DB_NAME               Database name                      postgres
DB_USER               Database user                      postgres
DB_PASSWORD           Database password                  postgres
REDIS_HOST            Redis host                         localhost
REDIS_PORT            Redis port                         6379
REDIS_PASSWORD        Redis password                     (empty)
REDIS_DB              Redis database number              0
MAX_REPORT_AGE_DAYS   Default number of days to fetch    7
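
To replicate this loading order outside the container, something like the following works. This is a minimal sketch using the python-dotenv package, not necessarily the script's exact loading code:

import os
from pathlib import Path
from dotenv import load_dotenv  # python-dotenv

# Load candidate files in precedence order; with override=False, values set
# by an earlier file (or already present in the environment) are kept.
for candidate in (".env", "fetcher/.env", "fetcher/.env.example"):
    if Path(candidate).exists():
        load_dotenv(candidate, override=False)

# Read settings, falling back to the documented defaults.
anisette_server = os.environ["ANISETTE_SERVER"]  # required, no default
db_host = os.getenv("DB_HOST", "localhost")
db_port = int(os.getenv("DB_PORT", "5432"))
max_report_age_days = int(os.getenv("MAX_REPORT_AGE_DAYS", "7"))

Because override=False is used, a value defined in the project-root .env always wins over the copies under fetcher/.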

Basic Usage

The script is included in the tracker-report-fetcher Docker container. You can run it using docker compose exec:

docker compose -f fetcher/compose.yml exec tracker-report-fetcher python fetch_all_tracker_reports.py

This will:

  1. Use environment variables from the Docker container (set from your .env file)
  2. Fetch reports for all trackers for the past 7 days
  3. Process trackers in batches of 10
  4. Use a 5-second delay between batches (a sketch of this batching pattern follows)
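
The following is a rough illustration of that batch-and-delay loop. The helper fetch_reports_for_tracker is hypothetical and stands in for the script's actual per-tracker fetch logic:

import time

def process_in_batches(trackers, batch_size=10, batch_delay=5):
    """Process trackers in fixed-size batches, pausing between batches."""
    total_batches = (len(trackers) + batch_size - 1) // batch_size
    for batch_number in range(total_batches):
        batch = trackers[batch_number * batch_size:(batch_number + 1) * batch_size]
        print(f"Processing batch {batch_number + 1}/{total_batches}")
        for tracker in batch:
            fetch_reports_for_tracker(tracker)  # hypothetical per-tracker fetch
        if batch_number + 1 < total_batches:
            time.sleep(batch_delay)  # throttle to reduce API load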

Advanced Options

The script supports several command-line options for customization:

Days to Fetch

Control how far back in time to fetch reports:

docker compose -f fetcher/compose.yml exec tracker-report-fetcher python fetch_all_tracker_reports.py --days 14

This fetches reports for the past 14 days instead of the default 7 days.

Batch Size

Control how many trackers to process in each batch:

docker compose -f fetcher/compose.yml exec tracker-report-fetcher python fetch_all_tracker_reports.py --batch-size 20

Larger batch sizes process more trackers at once but may increase memory usage and API load.

Batch Delay

Control the delay between processing batches:

docker compose -f fetcher/compose.yml exec tracker-report-fetcher python fetch_all_tracker_reports.py --batch-delay 10

Longer delays reduce the risk of API rate limiting but increase total processing time.

Combining Options

You can combine multiple options:

docker compose -f fetcher/compose.yml exec tracker-report-fetcher python fetch_all_tracker_reports.py --days 10 --batch-size 15 --batch-delay 8
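
For reference, these flags could be defined with argparse roughly as follows. The flag names match this guide; the defaults and internals are assumptions, with MAX_REPORT_AGE_DAYS feeding the --days default as described in the environment table:

import argparse
import os

def parse_args():
    """Command-line options described in this guide (illustrative sketch)."""
    parser = argparse.ArgumentParser(description="Fetch historical tracker reports")
    parser.add_argument("--days", type=int,
                        default=int(os.getenv("MAX_REPORT_AGE_DAYS", "7")),
                        help="How many days of reports to fetch")
    parser.add_argument("--batch-size", type=int, default=10,
                        help="Number of trackers to process per batch")
    parser.add_argument("--batch-delay", type=int, default=5,
                        help="Seconds to wait between batches")
    return parser.parse_args()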

Monitoring Progress

The script provides detailed logging to help you monitor progress:

  1. It logs when it starts processing each batch
  2. It logs when it starts processing each tracker
  3. It reports how many reports were found and stored for each tracker
  4. It provides a summary of total reports stored at the end

Example log output:

INFO - Processing batch 1/5
INFO - Processing tracker Tracker1 (ID: 123)
INFO - Found 15 reports for tracker Tracker1
INFO - Stored 10 new reports for tracker Tracker1
INFO - Processing tracker Tracker2 (ID: 124)
INFO - Found 8 reports for tracker Tracker2
INFO - Stored 8 new reports for tracker Tracker2
INFO - Batch complete. Stored 18 reports in this batch.
INFO - Waiting 5 seconds before next batch...
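
Log lines in this format can be produced with a logging configuration along the following lines; this is an illustration, not necessarily the script's exact setup:

import logging

# Emit messages as "LEVEL - message", matching the example output above.
logging.basicConfig(level=logging.INFO, format="%(levelname)s - %(message)s")
logger = logging.getLogger(__name__)

logger.info("Processing batch 1/5")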

After Running the Script

After the script completes:

  1. The continuous aggregates and materialized view are automatically refreshed:
    • The location_history_hourly continuous aggregate is refreshed for the last 48 hours
    • The location_history_daily continuous aggregate is refreshed for the last 7 days
    • The location_history materialized view is refreshed to include all updated data
  2. New location data will be available in the GraphQL API
  3. The tracker's last_report_received timestamp will be updated

The script uses a comprehensive refresh approach that ensures all data layers are updated:

def refresh_all_views(conn):
    """
    Refresh all continuous aggregates and the materialized view.
    This ensures that all location data is properly aggregated and available for querying.
    """
    try:
        logger.info("Refreshing continuous aggregates and materialized views...")
        with conn.cursor() as cur:
            # Set a longer statement timeout for large tables
            cur.execute("SET statement_timeout = '20min';")

            # Get the current time in database format
            cur.execute("SELECT NOW();")
            current_time = cur.fetchone()[0]

            # Calculate time bounds for refresh
            # Refresh data from the last 48 hours to now for hourly aggregate
            start_time_hourly = current_time - datetime.timedelta(hours=48)

            # Refresh hourly aggregate with time bounds
            logger.info("Refreshing hourly aggregate...")
            cur.execute(
                "CALL refresh_continuous_aggregate('location_history_hourly', %s, %s);",
                (start_time_hourly, current_time)
            )

            # Refresh data from the last 7 days to now for daily aggregate
            start_time_daily = current_time - datetime.timedelta(days=7)

            # Refresh daily aggregate with time bounds
            logger.info("Refreshing daily aggregate...")
            cur.execute(
                "CALL refresh_continuous_aggregate('location_history_daily', %s, %s);",
                (start_time_daily, current_time)
            )

            # Refresh the materialized view
            logger.info("Refreshing location_history materialized view...")
            cur.execute("REFRESH MATERIALIZED VIEW CONCURRENTLY location_history;")

        logger.info("All views refreshed successfully")
        return True
    except Exception as e:
        logger.error(f"Error refreshing views: {str(e)}")
        return False

This approach ensures that:

  1. The continuous aggregates are refreshed with time bounds for efficiency
  2. The materialized view is refreshed to include the updated aggregates
  3. All data is available for querying through the GraphQL API
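
If you ever need to trigger this refresh outside the script, refresh_all_views can be called with an ordinary database connection. The sketch below assumes psycopg2 (consistent with the %s placeholders above) and the DB_* variables from the environment table:

import os
import psycopg2

# Connect using the documented DB_* settings (psycopg2 is an assumption).
conn = psycopg2.connect(
    host=os.getenv("DB_HOST", "localhost"),
    port=int(os.getenv("DB_PORT", "5432")),
    dbname=os.getenv("DB_NAME", "postgres"),
    user=os.getenv("DB_USER", "postgres"),
    password=os.getenv("DB_PASSWORD", "postgres"),
)
conn.autocommit = True  # refresh_continuous_aggregate cannot run inside a transaction block

if not refresh_all_views(conn):
    raise SystemExit("View refresh failed; check the logs above")
conn.close()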

Troubleshooting

Authentication Issues

If you encounter authentication issues:

  1. Check that the anisette server URL is correct
  2. Verify that the account credentials are valid
  3. Try removing the account.json file to force re-authentication

Missing Data

If you're still missing data after running the script:

  1. Try increasing the --days parameter to fetch older data
  2. Check that the trackers have valid private_key and hashed_advertisement_key values (a quick check is sketched below)
  3. Verify that the trackers are actually reporting data to Apple's FindMy network
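
A helper like the following can list trackers whose key material is missing or empty. The trackers table and its id and name columns are assumptions about the schema, and conn is a database connection like the one opened in the previous sketch:

def find_trackers_missing_keys(conn):
    """Return (id, name) for trackers whose FindMy key material is missing or empty."""
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT id, name
            FROM trackers
            WHERE private_key IS NULL OR private_key = ''
               OR hashed_advertisement_key IS NULL OR hashed_advertisement_key = '';
            """
        )
        return cur.fetchall()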

Continuous Aggregate Refresh Issues

If the continuous aggregates aren't being refreshed properly:

  1. Check the logs for any errors during the refresh process
  2. Manually refresh the continuous aggregates:
    -- In PostgreSQL
    CALL refresh_continuous_aggregate('location_history_hourly', NOW() - INTERVAL '48 hours', NOW());
    CALL refresh_continuous_aggregate('location_history_daily', NOW() - INTERVAL '7 days', NOW());
    REFRESH MATERIALIZED VIEW CONCURRENTLY location_history;
  3. Check if the TimescaleDB extension is properly installed and configured
  4. Verify that the continuous aggregates exist and are properly defined

Database Connection Issues

If you encounter database connection issues:

  1. Check the database connection environment variables (DB_HOST, DB_PORT, DB_NAME, DB_USER, DB_PASSWORD)
  2. Verify that the database is running and accessible
  3. Check for any firewall or network issues (a quick reachability check is sketched below)
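
The reachability check can be as simple as the following; it only confirms that the database port is reachable, not that the credentials are valid:

import os
import socket

host = os.getenv("DB_HOST", "localhost")
port = int(os.getenv("DB_PORT", "5432"))
try:
    # Only verifies that the port accepts connections.
    with socket.create_connection((host, port), timeout=5):
        print(f"TCP connection to {host}:{port} succeeded")
except OSError as exc:
    print(f"Cannot reach {host}:{port}: {exc}")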

API Rate Limiting

If you encounter API rate limiting:

  1. Increase the --batch-delay parameter
  2. Decrease the --batch-size parameter
  3. Try running the script during off-peak hours