Fetching Historical Tracker Reports
This guide explains how to use the fetch_all_tracker_reports.py script to retrieve historical location data for all trackers.
When to Use This Script
You should use this script when:
- You notice missing tracker data in the system
- The system has experienced an outage or downtime period
- You are onboarding new trackers and want to import their historical data
- You need to recover or back up historical location data
Prerequisites
Before running the script, ensure you have:
- Access to the Apple FindMy API via an anisette server
- Database connection credentials configured in environment variables or .env files
- The `findmy` Python package and its dependencies installed
For more detailed information, see the Fetch All Tracker Reports Details guide.
Environment Configuration
The script can use environment variables from three possible sources (in order of precedence):
1. `.env` file in the project root
2. `fetcher/.env` file
3. `fetcher/.env.example` file
The following environment variables are used:
| Variable | Description | Default |
|---|---|---|
| ANISETTE_SERVER | URL of the Anisette server | (required) |
| DB_HOST | Database host | localhost |
| DB_PORT | Database port | 5432 |
| DB_NAME | Database name | postgres |
| DB_USER | Database user | postgres |
| DB_PASSWORD | Database password | postgres |
| REDIS_HOST | Redis host | localhost |
| REDIS_PORT | Redis port | 6379 |
| REDIS_PASSWORD | Redis password | (empty) |
| REDIS_DB | Redis database number | 0 |
| MAX_REPORT_AGE_DAYS | Default number of days to fetch | 7 |
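For illustration, the precedence order could be implemented by loading the first file that exists. This is a minimal sketch assuming the python-dotenv package; the script's actual loading logic may differ:

```python
from pathlib import Path

from dotenv import load_dotenv  # assumes python-dotenv is installed

# Candidate files in order of precedence; the first one that exists wins.
for candidate in (Path(".env"), Path("fetcher/.env"), Path("fetcher/.env.example")):
    if candidate.exists():
        load_dotenv(candidate)
        break
```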
Basic Usage
The script is included in the tracker-report-fetcher Docker container. You can run it using docker compose exec:
```bash
docker compose -f fetcher/compose.yml exec tracker-report-fetcher python fetch_all_tracker_reports.py
```
This will:
- Use environment variables from the Docker container (set from your `.env` file)
- Fetch reports for all trackers for the past 7 days
- Process trackers in batches of 10
- Use a 5-second delay between batches
Advanced Options
The script supports several command-line options for customization:
Days to Fetch
Control how far back in time to fetch reports:
```bash
docker compose -f fetcher/compose.yml exec tracker-report-fetcher python fetch_all_tracker_reports.py --days 14
```
This fetches reports for the past 14 days instead of the default 7 days.
Batch Size
Control how many trackers to process in each batch:
```bash
docker compose -f fetcher/compose.yml exec tracker-report-fetcher python fetch_all_tracker_reports.py --batch-size 20
```
Larger batch sizes process more trackers at once but may increase memory usage and API load.
Batch Delay
Control the delay between processing batches:
```bash
docker compose -f fetcher/compose.yml exec tracker-report-fetcher python fetch_all_tracker_reports.py --batch-delay 10
```
Longer delays reduce the risk of API rate limiting but increase total processing time.
Combining Options
You can combine multiple options:
```bash
docker compose -f fetcher/compose.yml exec tracker-report-fetcher python fetch_all_tracker_reports.py --days 10 --batch-size 15 --batch-delay 8
```
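For reference, the option handling and batching described above could be implemented roughly as in the sketch below. This is illustrative only: `load_trackers` and `fetch_reports_for_tracker` are hypothetical helpers, and the actual script may structure this differently.

```python
import argparse
import time


def parse_args():
    parser = argparse.ArgumentParser(description="Fetch historical tracker reports")
    parser.add_argument("--days", type=int, default=7, help="How many days of history to fetch")
    parser.add_argument("--batch-size", type=int, default=10, help="Trackers processed per batch")
    parser.add_argument("--batch-delay", type=int, default=5, help="Seconds to wait between batches")
    return parser.parse_args()


def main():
    args = parse_args()
    trackers = load_trackers()  # hypothetical helper that reads trackers from the database
    # Process trackers in fixed-size batches, pausing between batches to limit API load.
    for start in range(0, len(trackers), args.batch_size):
        for tracker in trackers[start:start + args.batch_size]:
            fetch_reports_for_tracker(tracker, days=args.days)  # hypothetical helper
        if start + args.batch_size < len(trackers):
            time.sleep(args.batch_delay)


if __name__ == "__main__":
    main()
```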
Monitoring Progress
The script provides detailed logging to help you monitor progress:
- It logs when it starts processing each batch
- It logs when it starts processing each tracker
- It reports how many reports were found and stored for each tracker
- It provides a summary of total reports stored at the end
Example log output:
```
INFO - Processing batch 1/5
INFO - Processing tracker Tracker1 (ID: 123)
INFO - Found 15 reports for tracker Tracker1
INFO - Stored 10 new reports for tracker Tracker1
INFO - Processing tracker Tracker2 (ID: 124)
INFO - Found 8 reports for tracker Tracker2
INFO - Stored 8 new reports for tracker Tracker2
INFO - Batch complete. Stored 18 reports in this batch.
INFO - Waiting 5 seconds before next batch...
```
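If you want to reproduce this format in your own tooling, a standard logging setup along these lines produces similar output (the exact format string used by the script is an assumption):

```python
import logging

# "INFO - message" style output, similar to the example above.
logging.basicConfig(level=logging.INFO, format="%(levelname)s - %(message)s")
logger = logging.getLogger(__name__)

logger.info("Processing batch %d/%d", 1, 5)
```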
After Running the Script
After the script completes:
- The continuous aggregates and materialized view are automatically refreshed:
  - The `location_history_hourly` continuous aggregate is refreshed for the last 48 hours
  - The `location_history_daily` continuous aggregate is refreshed for the last 7 days
  - The `location_history` materialized view is refreshed to include all updated data
- New location data will be available in the GraphQL API
- The tracker's `last_report_received` timestamp will be updated
The script uses a comprehensive refresh approach that ensures all data layers are updated:
```python
import datetime
import logging

# Module-level logger, assumed to be configured elsewhere in the script.
logger = logging.getLogger(__name__)


def refresh_all_views(conn):
    """
    Refresh all continuous aggregates and the materialized view.

    This ensures that all location data is properly aggregated and available for querying.
    """
    try:
        logger.info("Refreshing continuous aggregates and materialized views...")
        with conn.cursor() as cur:
            # Set a longer statement timeout for large tables
            cur.execute("SET statement_timeout = '20min';")

            # Get the current time in database format
            cur.execute("SELECT NOW();")
            current_time = cur.fetchone()[0]

            # Refresh data from the last 48 hours to now for the hourly aggregate
            start_time_hourly = current_time - datetime.timedelta(hours=48)
            logger.info("Refreshing hourly aggregate...")
            cur.execute(
                "CALL refresh_continuous_aggregate('location_history_hourly', %s, %s);",
                (start_time_hourly, current_time)
            )

            # Refresh data from the last 7 days to now for the daily aggregate
            start_time_daily = current_time - datetime.timedelta(days=7)
            logger.info("Refreshing daily aggregate...")
            cur.execute(
                "CALL refresh_continuous_aggregate('location_history_daily', %s, %s);",
                (start_time_daily, current_time)
            )

            # Refresh the materialized view
            logger.info("Refreshing location_history materialized view...")
            cur.execute("REFRESH MATERIALIZED VIEW CONCURRENTLY location_history;")

        logger.info("All views refreshed successfully")
        return True
    except Exception as e:
        logger.error(f"Error refreshing views: {str(e)}")
        return False
```
This approach ensures that:
- The continuous aggregates are refreshed with time bounds for efficiency
- The materialized view is refreshed to include the updated aggregates
- All data is available for querying through the GraphQL API
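If you ever need to trigger the same refresh outside the script, you can call a function like `refresh_all_views` yourself. The sketch below is illustrative, assuming psycopg2 and the connection settings from the environment table above:

```python
import os

import psycopg2

# Build a connection from the same environment variables the fetcher uses.
conn = psycopg2.connect(
    host=os.getenv("DB_HOST", "localhost"),
    port=int(os.getenv("DB_PORT", "5432")),
    dbname=os.getenv("DB_NAME", "postgres"),
    user=os.getenv("DB_USER", "postgres"),
    password=os.getenv("DB_PASSWORD", "postgres"),
)
# refresh_continuous_aggregate cannot run inside an explicit transaction block,
# so use autocommit for the refresh calls.
conn.autocommit = True
try:
    refresh_all_views(conn)
finally:
    conn.close()
```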
Troubleshooting
Authentication Issues
If you encounter authentication issues:
- Check that the anisette server URL is correct
- Verify that the account credentials are valid
- Try removing the account.json file to force re-authentication
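For example, assuming the cached account state lives at fetcher/account.json (the exact path depends on your deployment), it can be removed like this so that the next run re-authenticates:

```python
from pathlib import Path

# Path is an assumption; point this at wherever your deployment stores account.json.
account_file = Path("fetcher/account.json")
if account_file.exists():
    account_file.unlink()
    print("Removed account.json; the next run will re-authenticate.")
```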
Missing Data
If you're still missing data after running the script:
- Try increasing the --days parameter to fetch older data
- Check that the trackers have valid `private_key` and `hashed_advertisement_key` values (see the query sketch after this list)
- Verify that the trackers are actually reporting data to Apple's FindMy network
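To check the key material mentioned above, you can query the database directly. The sketch below assumes a trackers table with name, private_key, and hashed_advertisement_key columns; the table and column layout of your schema may differ:

```python
import os

import psycopg2

conn = psycopg2.connect(
    host=os.getenv("DB_HOST", "localhost"),
    port=int(os.getenv("DB_PORT", "5432")),
    dbname=os.getenv("DB_NAME", "postgres"),
    user=os.getenv("DB_USER", "postgres"),
    password=os.getenv("DB_PASSWORD", "postgres"),
)
with conn, conn.cursor() as cur:
    # List trackers that cannot be fetched because key material is missing.
    cur.execute(
        "SELECT id, name FROM trackers "
        "WHERE private_key IS NULL OR hashed_advertisement_key IS NULL;"
    )
    for tracker_id, name in cur.fetchall():
        print(f"Tracker {name} (ID: {tracker_id}) is missing key material")
conn.close()
```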
Continuous Aggregate Refresh Issues
If the continuous aggregates aren't being refreshed properly:
- Check the logs for any errors during the refresh process
- Manually refresh the continuous aggregates:
  ```sql
  -- In PostgreSQL
  CALL refresh_continuous_aggregate('location_history_hourly', NOW() - INTERVAL '48 hours', NOW());
  CALL refresh_continuous_aggregate('location_history_daily', NOW() - INTERVAL '7 days', NOW());
  REFRESH MATERIALIZED VIEW CONCURRENTLY location_history;
  ```
- Check if the TimescaleDB extension is properly installed and configured
- Verify that the continuous aggregates exist and are properly defined
Database Connection Issues
If you encounter database connection issues:
- Check the database connection environment variables (DB_HOST, DB_PORT, DB_NAME, DB_USER, and DB_PASSWORD, as listed above)
- Verify that the database is running and accessible
- Check for any firewall or network issues
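A quick connectivity check from the same environment can help rule out configuration problems. This sketch assumes psycopg2 and the variables from the environment table above:

```python
import os

import psycopg2

try:
    conn = psycopg2.connect(
        host=os.getenv("DB_HOST", "localhost"),
        port=int(os.getenv("DB_PORT", "5432")),
        dbname=os.getenv("DB_NAME", "postgres"),
        user=os.getenv("DB_USER", "postgres"),
        password=os.getenv("DB_PASSWORD", "postgres"),
        connect_timeout=5,
    )
except psycopg2.OperationalError as exc:
    print(f"Database unreachable: {exc}")
else:
    print("Database connection OK")
    conn.close()
```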
API Rate Limiting
If you encounter API rate limiting:
- Increase the --batch-delay parameter
- Decrease the --batch-size parameter
- Try running the script during off-peak hours