# Tracker Report Fetching
This document explains how tracker reports are fetched from the FindMy API and stored in the database.
## Overview
The system includes dedicated scripts for fetching location reports from the FindMy API. These scripts:
- Authenticate with the FindMy API
- Fetch reports for trackers
- Store the reports in the database
- Update materialized views for efficient querying
## Report Data Structure
Each location report contains the following key information:
| Field | Type | Description |
|---|---|---|
| `hashed_adv_key` | String | Hashed advertisement key that identifies the tracker |
| `timestamp` | DateTime | When the location was recorded |
| `location` | Coordinates | Geographic coordinates (latitude, longitude) |
| `confidence` | Integer | Confidence level (1-3, where 3 is highest) |
| `horizontal_accuracy` | Float | Accuracy radius in meters |
| `nearest_city` | String | Nearest city (populated by the geocoding process) |
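For reference, this structure maps naturally onto a small Python dataclass. The sketch below is illustrative only; the field names follow the table, while the class name and the split of `location` into latitude/longitude are assumptions:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class LocationReport:
    """Illustrative container mirroring one row of the table above."""
    hashed_adv_key: str                  # identifies the tracker
    timestamp: datetime                  # when the location was recorded
    latitude: float                      # geographic coordinates
    longitude: float
    confidence: int                      # 1 (low) to 3 (high)
    horizontal_accuracy: float           # accuracy radius in meters
    nearest_city: Optional[str] = None   # filled in later by geocoding
```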
## Confidence and Accuracy
As of April 2025, the system uses two fields to represent location data quality:
- **Confidence**: A scale from 1-3 indicating the overall reliability of the location data
  - 1: Low confidence
  - 2: Medium confidence
  - 3: High confidence
- **Horizontal Accuracy**: The accuracy radius in meters, with lower values indicating more precise location data
By default, the system filters location history records to include only medium- and high-confidence reports (`confidence >= 2`).
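A minimal sketch of that default filter, assuming a psycopg2 connection and the `location_reports` schema shown later in this document (the connection string and key value are placeholders):

```python
import psycopg2

MIN_CONFIDENCE = 2  # medium or better; mirrors the documented default

conn = psycopg2.connect("dbname=findmy")  # connection details are assumptions
with conn.cursor() as cur:
    cur.execute(
        """
        SELECT hashed_adv_key, timestamp, confidence, horizontal_accuracy
        FROM location_reports
        WHERE hashed_adv_key = %s AND confidence >= %s
        ORDER BY timestamp DESC
        """,
        ("<hashed-adv-key>", MIN_CONFIDENCE),  # placeholder key
    )
    rows = cur.fetchall()
```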
## Fetcher Scripts
### `tracker_report_fetcher.py`
The main script for fetching reports for all trackers. It:
- Runs continuously as a service
- Uses a Redis-based queue to manage tracker processing
- Fetches reports for each tracker at regular intervals
- Handles authentication and token refresh
- Stores reports in the database
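The shape of that service loop might look roughly like the following. This is a sketch under assumptions: the injected callables stand in for the script's internals (popping the next due tracker from the Redis queue, fetching from the FindMy API and inserting into the database, and rescheduling), and are not the script's actual function names:

```python
import time
from typing import Callable, Optional

def run_forever(
    pop_due_tracker: Callable[[], Optional[str]],
    fetch_and_store: Callable[[str], None],
    requeue: Callable[[str, bool], None],
    poll_interval: float = 60.0,
) -> None:
    """Hypothetical outline of the fetcher's main service loop."""
    while True:
        tracker = pop_due_tracker()     # next tracker whose queue slot is due
        if tracker is None:
            time.sleep(poll_interval)   # nothing due yet; poll again later
            continue
        try:
            fetch_and_store(tracker)    # call the FindMy API, INSERT the reports
            requeue(tracker, True)      # success: schedule the next regular fetch
        except Exception:
            requeue(tracker, False)     # failure: queue metadata drives backoff
```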
### `fetch_specific_tracker_reports.py`
A utility script for fetching reports for a specific tracker. It:
- Takes a tracker ID as input
- Fetches reports for that tracker only
- Can be run manually or scheduled
- Useful for debugging or filling gaps in data
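Its command-line surface might look like this argparse sketch; the flag name, default, and the commented-out helper are assumptions rather than the script's actual interface:

```python
import argparse

def main() -> None:
    parser = argparse.ArgumentParser(
        description="Fetch FindMy reports for a single tracker."
    )
    parser.add_argument("tracker_id", help="ID of the tracker to fetch")
    parser.add_argument(
        "--days", type=int, default=7,
        help="how far back to fetch reports (illustrative flag)",
    )
    args = parser.parse_args()
    # fetch_and_store(args.tracker_id, args.days) would do the real work;
    # it is a hypothetical helper, not part of the documented script.
    print(f"Would fetch {args.days} days of reports for {args.tracker_id}")

if __name__ == "__main__":
    main()
```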
### `test_horizontal_accuracy.py`
A utility script for testing the `horizontal_accuracy` field. It:
- Takes a tracker ID as input
- Fetches reports for that tracker
- Prints detailed information about each report, including the `horizontal_accuracy` field
- Useful for verifying that the `horizontal_accuracy` field is being correctly retrieved from the API
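The heart of such a check can be a short loop over the fetched reports; the dict-shaped reports below are an assumption for illustration:

```python
def print_report_details(reports: list[dict]) -> None:
    """Print per-report fields, flagging any missing horizontal_accuracy."""
    for r in reports:
        acc = r.get("horizontal_accuracy")
        status = f"{acc:.1f} m" if acc is not None else "MISSING"
        print(f"{r['timestamp']}  conf={r['confidence']}  accuracy={status}")
```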
## Implementation Details
### Database Storage
Reports are stored in the `location_reports` table with the following SQL:

```sql
INSERT INTO location_reports
    (hashed_adv_key, timestamp, location, confidence, horizontal_accuracy, nearest_city)
VALUES (%s, %s, ST_GeomFromText(%s, 4326), %s, %s, %s)
```
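The `%s` placeholders suggest execution through a driver such as psycopg2. The sketch below shows how the parameters could line up with the placeholders; the report dict and helper name are assumptions. Note that PostGIS WKT puts longitude first:

```python
def insert_report(cur, report: dict) -> None:
    """Bind one report to the INSERT statement above (sketch)."""
    # PostGIS expects WKT as 'POINT(lon lat)', longitude first.
    wkt = f"POINT({report['longitude']} {report['latitude']})"
    cur.execute(
        """
        INSERT INTO location_reports
            (hashed_adv_key, timestamp, location, confidence, horizontal_accuracy, nearest_city)
        VALUES (%s, %s, ST_GeomFromText(%s, 4326), %s, %s, %s)
        """,
        (
            report["hashed_adv_key"],
            report["timestamp"],
            wkt,
            report["confidence"],
            report["horizontal_accuracy"],
            report.get("nearest_city"),  # usually NULL here; geocoding fills it later
        ),
    )
```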
### Queue Management
The tracker report fetcher uses a Redis-based queue system to manage which trackers to process and when. This ensures efficient processing across multiple instances and provides resilience against failures.
#### Queue Initialization
When the fetcher starts:
- It checks if the queue already has entries
- If the queue is empty, it queries the database for all trackers
- It adds all trackers to the queue with staggered processing times
- Each tracker is assigned metadata including success/failure counts and backoff factors
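One natural way to implement this with Redis is a sorted set scored by next-due time, plus a per-tracker metadata hash. The key names and layout below are assumptions for illustration, not the fetcher's actual schema:

```python
import time
import redis

QUEUE_KEY = "tracker:queue"     # sorted set: member = tracker id, score = next due time
META_KEY = "tracker:meta:{}"    # per-tracker hash with retry bookkeeping

def initialize_queue(r: redis.Redis, tracker_ids: list[str], spacing: float = 5.0) -> None:
    """Enqueue all trackers with staggered due times (illustrative)."""
    if r.zcard(QUEUE_KEY) > 0:
        return  # queue already has entries; leave it alone
    now = time.time()
    for i, tracker_id in enumerate(tracker_ids):
        # Stagger due times so trackers are not all processed at once.
        r.zadd(QUEUE_KEY, {tracker_id: now + i * spacing})
        r.hset(META_KEY.format(tracker_id), mapping={
            "successes": 0, "failures": 0, "backoff_factor": 1,
        })
```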
#### Queue Verification
To ensure the queue remains populated:
- The system periodically checks if the queue is empty
- If empty, it automatically reinitializes the queue with all trackers
- After database reconnection, it verifies queue status and reinitializes if needed
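Continuing the sketch above, the periodic check reduces to testing the sorted set's cardinality and re-running initialization when it hits zero:

```python
def verify_queue(r: "redis.Redis", load_tracker_ids) -> None:
    """Reinitialize the queue if it has drained (continues the sketch above)."""
    if r.zcard(QUEUE_KEY) == 0:                  # queue has emptied out
        initialize_queue(r, load_tracker_ids())  # load_tracker_ids queries the database
```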
#### Error Recovery
The system includes robust error recovery mechanisms:
- Database connection issues are handled with exponential backoff reconnection
- After successful reconnection, the queue is verified and reinitialized if empty
- Failed tracker processing is tracked and retried with increasing backoff intervals
- Stalled work (trackers being processed for too long) is detected and reclaimed
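The retry schedule implied by this list could be computed as below; the base delay, cap, and stall threshold are illustrative values, not the system's actual settings:

```python
import time

BASE_DELAY = 60.0        # seconds; first retry after one minute (assumed)
MAX_DELAY = 6 * 3600.0   # cap retries at six hours (assumed)
STALL_SECONDS = 900.0    # reclaim work in flight longer than 15 minutes (assumed)

def next_retry_delay(failures: int) -> float:
    """Exponential backoff: 60s, 120s, 240s, ... capped at MAX_DELAY."""
    return min(BASE_DELAY * (2 ** failures), MAX_DELAY)

def is_stalled(claimed_at: float) -> bool:
    """A tracker claimed too long ago is assumed lost and eligible for reclaim."""
    return time.time() - claimed_at > STALL_SECONDS
```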
### Materialized Views
The system uses TimescaleDB continuous aggregates to create materialized views of location history:
- `location_history_hourly`: Aggregates location reports by hour
- `location_history_daily`: Aggregates hourly data by day
- `location_history`: A materialized view combining recent hourly data with older daily data
These views include the confidence and horizontal_accuracy fields, allowing for efficient filtering and querying.
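As a sketch of what the hourly aggregate can look like, executed from Python: the bucket size, the aggregate choices, and the use of TimescaleDB's `last()` are assumptions, not the system's exact view definitions:

```python
import psycopg2

# Illustrative TimescaleDB continuous aggregate; the real views may differ.
CREATE_HOURLY_VIEW = """
CREATE MATERIALIZED VIEW location_history_hourly
WITH (timescaledb.continuous) AS
SELECT
    hashed_adv_key,
    time_bucket('1 hour', timestamp) AS bucket,
    last(location, timestamp)        AS location,            -- newest point in the hour
    max(confidence)                  AS confidence,          -- best confidence seen
    min(horizontal_accuracy)         AS horizontal_accuracy  -- tightest radius seen
FROM location_reports
GROUP BY hashed_adv_key, bucket
"""

conn = psycopg2.connect("dbname=findmy")  # connection string is an assumption
conn.autocommit = True  # continuous aggregates must be created outside a transaction
with conn.cursor() as cur:
    cur.execute(CREATE_HOURLY_VIEW)
```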
## Configuration
The fetcher scripts can be configured through environment variables or command-line arguments:
- `ANISETTE_SERVER`: URL of the Anisette server for authentication
- `DB_HOST`, `DB_PORT`, `DB_NAME`, `DB_USER`, `DB_PASSWORD`: Database connection details
- `FETCH_INTERVAL`: Time between fetch operations (in seconds)
- `MAX_KEYS_PER_BATCH`: Maximum number of trackers to process in a batch
- `MAX_REPORT_AGE_DAYS`: Maximum age of reports to fetch (in days)
- `REQUEST_INTERVAL_HOURS`: Hours between requests for each tracker
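A sketch of how a script might read these settings from the environment; the fallback defaults are illustrative, not the system's actual defaults:

```python
import os

ANISETTE_SERVER = os.environ.get("ANISETTE_SERVER", "http://localhost:6969")
DB_HOST = os.environ.get("DB_HOST", "localhost")
DB_PORT = int(os.environ.get("DB_PORT", "5432"))
DB_NAME = os.environ.get("DB_NAME", "findmy")
DB_USER = os.environ.get("DB_USER", "findmy")
DB_PASSWORD = os.environ.get("DB_PASSWORD", "")
FETCH_INTERVAL = int(os.environ.get("FETCH_INTERVAL", "60"))           # seconds
MAX_KEYS_PER_BATCH = int(os.environ.get("MAX_KEYS_PER_BATCH", "10"))
MAX_REPORT_AGE_DAYS = int(os.environ.get("MAX_REPORT_AGE_DAYS", "7"))
REQUEST_INTERVAL_HOURS = int(os.environ.get("REQUEST_INTERVAL_HOURS", "1"))
```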
## Error Handling
The scripts include robust error handling:
- Authentication failures are retried with exponential backoff
- Database connection issues are handled with reconnection logic and queue reinitialization
- Queue emptiness is detected and automatically corrected through periodic verification
- API rate limiting is respected with appropriate delays
- Failed tracker processing is retried with exponential backoff (up to a configurable maximum)
- Stalled work detection reclaims trackers from failed or crashed worker processes
- Worker heartbeats ensure reliable detection of failed processes
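A worker heartbeat in this style can be a periodically refreshed Redis key with a TTL: once the key expires, the worker is considered dead and its claimed trackers become eligible for reclaiming. The key name and interval below are assumptions:

```python
import time
import redis

HEARTBEAT_TTL = 30  # seconds; an expired key marks the worker as dead (assumed value)

def heartbeat_loop(r: redis.Redis, worker_id: str) -> None:
    """Refresh this worker's liveness key until the process exits (sketch)."""
    key = f"worker:heartbeat:{worker_id}"
    while True:
        r.set(key, int(time.time()), ex=HEARTBEAT_TTL)  # refresh the TTL each beat
        time.sleep(HEARTBEAT_TTL / 3)                   # beat well inside the TTL
```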
## Deployment
The main fetcher script can be deployed as:
- A systemd service (recommended for production)
- A Docker container (using the provided Dockerfile)
- A standalone Python script (for development or testing)
See the Multi-Container Setup guide for details on containerized deployment.