# Tracker Report Fetching
This document explains how tracker reports are fetched from the FindMy API and stored in the database.
## Overview
The system includes dedicated scripts for fetching location reports from the FindMy API. These scripts:
- Authenticate with the FindMy API
- Fetch reports for trackers
- Store the reports in the database
- Update materialized views for efficient querying
## Report Data Structure
Each location report contains the following key information:
| Field | Type | Description |
|---|---|---|
| `hashed_adv_key` | String | Hashed advertisement key that identifies the tracker |
| `timestamp` | DateTime | When the location was recorded |
| `location` | Coordinates | Geographic coordinates (latitude, longitude) |
| `confidence` | Integer | Confidence level (1-3, where 3 is highest) |
| `horizontal_accuracy` | Float | Accuracy radius in meters |
| `nearest_city` | String | Nearest city (populated by the geocoding process) |
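For reference, this structure maps naturally onto a small Python dataclass. The sketch below is illustrative only; the field names follow the table, while the class name and the split of `location` into latitude/longitude are assumptions:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class LocationReport:
    """Illustrative container mirroring one row of the table above."""
    hashed_adv_key: str                  # identifies the tracker
    timestamp: datetime                  # when the location was recorded
    latitude: float                      # geographic coordinates
    longitude: float
    confidence: int                      # 1 (low) to 3 (high)
    horizontal_accuracy: float           # accuracy radius in meters
    nearest_city: Optional[str] = None   # filled in later by geocoding
```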
## Confidence and Accuracy
As of April 2025, the system uses two fields to represent location data quality:
- **Confidence**: A scale from 1-3 indicating the overall reliability of the location data
  - 1: Low confidence
  - 2: Medium confidence
  - 3: High confidence
- **Horizontal Accuracy**: The accuracy radius in meters, with lower values indicating more precise location data
By default, the system filters location history records to include only medium- and high-confidence reports (`confidence >= 2`).
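A minimal sketch of that default filter, assuming a psycopg2 connection and the `location_reports` schema shown later in this document (the connection string and key value are placeholders):

```python
import psycopg2

MIN_CONFIDENCE = 2  # medium or better; mirrors the documented default

conn = psycopg2.connect("dbname=findmy")  # connection details are assumptions
with conn.cursor() as cur:
    cur.execute(
        """
        SELECT hashed_adv_key, timestamp, confidence, horizontal_accuracy
        FROM location_reports
        WHERE hashed_adv_key = %s AND confidence >= %s
        ORDER BY timestamp DESC
        """,
        ("<hashed-adv-key>", MIN_CONFIDENCE),  # placeholder key
    )
    rows = cur.fetchall()
```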
## Fetcher Scripts
### `tracker_report_fetcher.py`
The main script for fetching reports for all trackers. It:
- Runs continuously as a service
- Uses a Redis-based queue to manage tracker processing
- Fetches reports for each tracker at regular intervals
- Handles authentication and token refresh
- Stores reports in the database
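The shape of that service loop might look roughly like the following. This is a sketch under assumptions: the injected callables stand in for the script's internals (popping the next due tracker from the Redis queue, fetching from the FindMy API and inserting into the database, and rescheduling), and are not the script's actual function names:

```python
import time
from typing import Callable, Optional

def run_forever(
    pop_due_tracker: Callable[[], Optional[str]],
    fetch_and_store: Callable[[str], None],
    requeue: Callable[[str, bool], None],
    poll_interval: float = 60.0,
) -> None:
    """Hypothetical outline of the fetcher's main service loop."""
    while True:
        tracker = pop_due_tracker()     # next tracker whose queue slot is due
        if tracker is None:
            time.sleep(poll_interval)   # nothing due yet; poll again later
            continue
        try:
            fetch_and_store(tracker)    # call the FindMy API, INSERT the reports
            requeue(tracker, True)      # success: schedule the next regular fetch
        except Exception:
            requeue(tracker, False)     # failure: queue metadata drives backoff
```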
### `fetch_specific_tracker_reports.py`
A utility script for fetching reports for a specific tracker. It:
- Takes a tracker ID as input
- Fetches reports for that tracker only
- Can be run manually or scheduled
- Useful for debugging or filling gaps in data
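Its command-line surface might look like this argparse sketch; the flag name, default, and the commented-out helper are assumptions rather than the script's actual interface:

```python
import argparse

def main() -> None:
    parser = argparse.ArgumentParser(
        description="Fetch FindMy reports for a single tracker."
    )
    parser.add_argument("tracker_id", help="ID of the tracker to fetch")
    parser.add_argument(
        "--days", type=int, default=7,
        help="how far back to fetch reports (illustrative flag)",
    )
    args = parser.parse_args()
    # fetch_and_store(args.tracker_id, args.days) would do the real work;
    # it is a hypothetical helper, not part of the documented script.
    print(f"Would fetch {args.days} days of reports for {args.tracker_id}")

if __name__ == "__main__":
    main()
```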
### `test_horizontal_accuracy.py`
A utility script for testing the `horizontal_accuracy` field. It:
- Takes a tracker ID as input
- Fetches reports for that tracker
- Prints detailed information about each report, including the `horizontal_accuracy` field
- Useful for verifying that the `horizontal_accuracy` field is being correctly retrieved from the API
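The heart of such a check can be a short loop over the fetched reports; the dict-shaped reports below are an assumption for illustration:

```python
def print_report_details(reports: list[dict]) -> None:
    """Print per-report fields, flagging any missing horizontal_accuracy."""
    for r in reports:
        acc = r.get("horizontal_accuracy")
        status = f"{acc:.1f} m" if acc is not None else "MISSING"
        print(f"{r['timestamp']}  conf={r['confidence']}  accuracy={status}")
```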
## Implementation Details
### Database Storage
Reports are stored in the `location_reports` table with the following SQL:

```sql
INSERT INTO location_reports
    (hashed_adv_key, timestamp, location, confidence, horizontal_accuracy, nearest_city)
VALUES (%s, %s, ST_GeomFromText(%s, 4326), %s, %s, %s)
```
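The `%s` placeholders suggest execution through a driver such as psycopg2. The sketch below shows how the parameters could line up with the placeholders; the report dict and helper name are assumptions. Note that PostGIS WKT puts longitude first:

```python
def insert_report(cur, report: dict) -> None:
    """Bind one report to the INSERT statement above (sketch)."""
    # PostGIS expects WKT as 'POINT(lon lat)', longitude first.
    wkt = f"POINT({report['longitude']} {report['latitude']})"
    cur.execute(
        """
        INSERT INTO location_reports
            (hashed_adv_key, timestamp, location, confidence, horizontal_accuracy, nearest_city)
        VALUES (%s, %s, ST_GeomFromText(%s, 4326), %s, %s, %s)
        """,
        (
            report["hashed_adv_key"],
            report["timestamp"],
            wkt,
            report["confidence"],
            report["horizontal_accuracy"],
            report.get("nearest_city"),  # usually NULL here; geocoding fills it later
        ),
    )
```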
### Queue Management
The tracker report fetcher uses a Redis-based queue system to manage which trackers to process and when. This ensures efficient processing across multiple instances and provides resilience against failures.
#### Queue Initialization
When the fetcher starts:
- It checks if the queue already has entries
- If the queue is empty, it queries the database for all trackers
- It adds all trackers to the queue with staggered processing times
- Each tracker is assigned metadata including success/failure counts and backoff factors
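One natural way to implement this with Redis is a sorted set scored by next-due time, plus a per-tracker metadata hash. The key names and layout below are assumptions for illustration, not the fetcher's actual schema:

```python
import time
import redis

QUEUE_KEY = "tracker:queue"     # sorted set: member = tracker id, score = next due time
META_KEY = "tracker:meta:{}"    # per-tracker hash with retry bookkeeping

def initialize_queue(r: redis.Redis, tracker_ids: list[str], spacing: float = 5.0) -> None:
    """Enqueue all trackers with staggered due times (illustrative)."""
    if r.zcard(QUEUE_KEY) > 0:
        return  # queue already has entries; leave it alone
    now = time.time()
    for i, tracker_id in enumerate(tracker_ids):
        # Stagger due times so trackers are not all processed at once.
        r.zadd(QUEUE_KEY, {tracker_id: now + i * spacing})
        r.hset(META_KEY.format(tracker_id), mapping={
            "successes": 0, "failures": 0, "backoff_factor": 1,
        })
```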
#### Queue Verification
To ensure the queue remains populated:
- The system periodically checks if the queue is empty
- If empty, it automatically reinitializes the queue with all trackers
- After database reconnection, it verifies queue status and reinitializes if needed
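Continuing the sketch above, the periodic check reduces to testing the sorted set's cardinality and re-running initialization when it hits zero:

```python
def verify_queue(r: "redis.Redis", load_tracker_ids) -> None:
    """Reinitialize the queue if it has drained (continues the sketch above)."""
    if r.zcard(QUEUE_KEY) == 0:                  # queue has emptied out
        initialize_queue(r, load_tracker_ids())  # load_tracker_ids queries the database
```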
#### Error Recovery
The system includes robust error recovery mechanisms:
- Database connection issues are handled with exponential backoff reconnection
- After successful reconnection, the queue is verified and reinitialized if empty
- Failed tracker processing is tracked and retried with increasing backoff intervals
- Stalled work (trackers being processed for too long) is detected and reclaimed
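The retry schedule implied by this list could be computed as below; the base delay, cap, and stall threshold are illustrative values, not the system's actual settings:

```python
import time

BASE_DELAY = 60.0        # seconds; first retry after one minute (assumed)
MAX_DELAY = 6 * 3600.0   # cap retries at six hours (assumed)
STALL_SECONDS = 900.0    # reclaim work in flight longer than 15 minutes (assumed)

def next_retry_delay(failures: int) -> float:
    """Exponential backoff: 60s, 120s, 240s, ... capped at MAX_DELAY."""
    return min(BASE_DELAY * (2 ** failures), MAX_DELAY)

def is_stalled(claimed_at: float) -> bool:
    """A tracker claimed too long ago is assumed lost and eligible for reclaim."""
    return time.time() - claimed_at > STALL_SECONDS
```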
### Materialized Views
The system uses TimescaleDB continuous aggregates to create materialized views of location history:
- `location_history_hourly`: Aggregates location reports by hour
- `location_history_daily`: Aggregates hourly data by day
- `location_history`: A materialized view combining recent hourly data with older daily data
These views include the confidence and horizontal_accuracy fields, allowing for efficient filtering and querying.
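As a sketch of what the hourly aggregate can look like, executed from Python: the bucket size, the aggregate choices, and the use of TimescaleDB's `last()` are assumptions, not the system's exact view definitions:

```python
import psycopg2

# Illustrative TimescaleDB continuous aggregate; the real views may differ.
CREATE_HOURLY_VIEW = """
CREATE MATERIALIZED VIEW location_history_hourly
WITH (timescaledb.continuous) AS
SELECT
    hashed_adv_key,
    time_bucket('1 hour', timestamp) AS bucket,
    last(location, timestamp)        AS location,            -- newest point in the hour
    max(confidence)                  AS confidence,          -- best confidence seen
    min(horizontal_accuracy)         AS horizontal_accuracy  -- tightest radius seen
FROM location_reports
GROUP BY hashed_adv_key, bucket
"""

conn = psycopg2.connect("dbname=findmy")  # connection string is an assumption
conn.autocommit = True  # continuous aggregates must be created outside a transaction
with conn.cursor() as cur:
    cur.execute(CREATE_HOURLY_VIEW)
```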
## Configuration
The fetcher scripts can be configured through environment variables or command-line arguments:
- `ANISETTE_SERVER`: URL of the Anisette server for authentication
- `DB_HOST`, `DB_PORT`, `DB_NAME`, `DB_USER`, `DB_PASSWORD`: Database connection details
- `FETCH_INTERVAL`: Time between fetch operations (in seconds)
- `MAX_KEYS_PER_BATCH`: Maximum number of trackers to process in a batch
- `MAX_REPORT_AGE_DAYS`: Maximum age of reports to fetch (in days)
- `REQUEST_INTERVAL_HOURS`: Hours between requests for each tracker
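A sketch of how a script might read these settings from the environment; the fallback defaults are illustrative, not the system's actual defaults:

```python
import os

ANISETTE_SERVER = os.environ.get("ANISETTE_SERVER", "http://localhost:6969")
DB_HOST = os.environ.get("DB_HOST", "localhost")
DB_PORT = int(os.environ.get("DB_PORT", "5432"))
DB_NAME = os.environ.get("DB_NAME", "findmy")
DB_USER = os.environ.get("DB_USER", "findmy")
DB_PASSWORD = os.environ.get("DB_PASSWORD", "")
FETCH_INTERVAL = int(os.environ.get("FETCH_INTERVAL", "60"))           # seconds
MAX_KEYS_PER_BATCH = int(os.environ.get("MAX_KEYS_PER_BATCH", "10"))
MAX_REPORT_AGE_DAYS = int(os.environ.get("MAX_REPORT_AGE_DAYS", "7"))
REQUEST_INTERVAL_HOURS = int(os.environ.get("REQUEST_INTERVAL_HOURS", "1"))
```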
## Error Handling
The scripts include robust error handling:
- Authentication failures are retried with exponential backoff
- Database connection issues are handled with reconnection logic and queue reinitialization
- Queue emptiness is detected and automatically corrected through periodic verification
- API rate limiting is respected with appropriate delays
- Failed tracker processing is retried with exponential backoff (up to a configurable maximum)
- Stalled work detection reclaims trackers from failed or crashed worker processes
- Worker heartbeats ensure reliable detection of failed processes
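A worker heartbeat in this style can be a periodically refreshed Redis key with a TTL: once the key expires, the worker is considered dead and its claimed trackers become eligible for reclaiming. The key name and interval below are assumptions:

```python
import time
import redis

HEARTBEAT_TTL = 30  # seconds; an expired key marks the worker as dead (assumed value)

def heartbeat_loop(r: redis.Redis, worker_id: str) -> None:
    """Refresh this worker's liveness key until the process exits (sketch)."""
    key = f"worker:heartbeat:{worker_id}"
    while True:
        r.set(key, int(time.time()), ex=HEARTBEAT_TTL)  # refresh the TTL each beat
        time.sleep(HEARTBEAT_TTL / 3)                   # beat well inside the TTL
```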
## Deployment
The main fetcher script can be deployed as:
- A systemd service (recommended for production)
- A Docker container (using the provided Dockerfile)
- A standalone Python script (for development or testing)
See the Multi-Container Setup guide for details on containerized deployment.