Performance Monitoring Architecture

This document provides a detailed overview of the performance monitoring system's architecture, implementation, and technical design.

Architecture Overview

The performance monitoring system is designed to track and analyze the performance of various components of the Tracker GraphQL API. It follows a modular architecture that integrates with the existing system components.

Core Components

1. Metrics Store

The metrics store is an in-memory data structure that collects and aggregates performance data:

from collections import defaultdict

# Global performance metrics storage
_metrics_store = {
    "resolvers": defaultdict(list),
    "queries": defaultdict(list),
    "cache": {
        "hits": 0,
        "misses": 0,
        "hit_times": [],
        "miss_times": [],
    },
}

This store maintains:

  • Resolver execution times
  • Database query execution times
  • Cache hit/miss statistics

2. Resolver Tracking

Resolver tracking is implemented using a decorator pattern:

def track_resolver(func):
    """
    Decorator to track resolver performance.
    """
    @wraps(func)
    async def wrapper(*args, **kwargs):
        if not _config["enabled"]:
            return await func(*args, **kwargs)

        # Clean old metrics periodically
        _clean_old_metrics()

        # Start timing
        start_time = time.time()

        # Call the original function
        result = await func(*args, **kwargs)

        # Calculate execution time
        execution_time = time.time() - start_time

        # Store metrics
        metric = {
            "execution_time": execution_time,
            "timestamp": datetime.now(),
            "args_count": len(args),
            "kwargs_count": len(kwargs),
        }
        _metrics_store["resolvers"][func.__name__].append(metric)

        # Log performance data
        if _config["log_all_metrics"] or execution_time >= _config["slow_resolver_threshold"]:
            log_level = (
                logging.WARNING
                if execution_time >= _config["slow_resolver_threshold"]
                else logging.INFO
            )
            logger.log(
                log_level,
                f"PERFORMANCE: Resolver {func.__name__} took {execution_time:.4f}s"
            )

        # Attach performance data to result if it's a dict
        if _config["include_in_response"] and isinstance(result, dict):
            result["_performance"] = {
                "executionTime": execution_time,
                "resolverName": func.__name__,
            }

        return result

    return wrapper

This decorator:

  1. Times the execution of resolver functions
  2. Stores metrics in the metrics store
  3. Logs performance data
  4. Optionally includes performance data in the response
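
A hypothetical usage sketch, showing the decorator applied to an Ariadne-style resolver; the field name, the load_projects helper, and the context layout are illustrative assumptions rather than part of the actual schema:

@query.field("projects")
@track_resolver
async def resolve_projects(obj, info, **kwargs):
    # Timing, metric storage, and logging happen inside the decorator.
    # load_projects and info.context["db"] are hypothetical placeholders.
    return {"items": await load_projects(info.context["db"])}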

3. Database Query Tracking

Database query tracking is implemented using a mixin class that is combined with SQLAlchemy's Session class to time query execution:

class TimedSessionMixin:
    """
    Mixin class to add query timing to a SQLAlchemy session.
    """

    def execute(self, statement, *args, **kwargs):
        """
        Execute a statement and track its execution time.
        """
        if not _config["enabled"]:
            return super().execute(statement, *args, **kwargs)

        # Start timing
        start_time = time.time()

        # Call the original method
        result = super().execute(statement, *args, **kwargs)

        # Calculate execution time
        execution_time = time.time() - start_time

        # Extract query information and store metrics
        # ...

        return result

This mixin:

  1. Times the execution of database queries
  2. Extracts query type and table information
  3. Stores metrics in the metrics store
  4. Logs performance data
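
The extraction and storage step is elided above; a minimal sketch of what it could look like, assuming metrics are keyed by query type and that rendering the statement with str() is acceptable (the helper name and stored fields are assumptions, not the actual implementation):

# Hypothetical sketch of the elided extraction and storage step.
def _record_query_metric(statement, execution_time):
    sql_text = str(statement)
    query_type = sql_text.strip().split(" ", 1)[0].upper() or "UNKNOWN"  # e.g. SELECT

    _metrics_store["queries"][query_type].append({
        "execution_time": execution_time,
        "timestamp": datetime.now(),
        "query": sql_text[:200],  # truncated to keep the store small
    })

    if _config["log_all_metrics"] or execution_time >= _config["slow_query_threshold"]:
        logger.info(f"PERFORMANCE: {query_type} query took {execution_time:.4f}s")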

4. Cache Tracking

Cache tracking is implemented using simple functions that are called during cache operations:

def track_cache_hit(execution_time: float):
    """
    Track a cache hit.
    """
    if not _config["enabled"]:
        return

    _metrics_store["cache"]["hits"] += 1
    _metrics_store["cache"]["hit_times"].append(execution_time)

    if _config["log_all_metrics"]:
        logger.info(f"CACHE HIT: took {execution_time:.4f}s")


def track_cache_miss(execution_time: float):
    """
    Track a cache miss.
    """
    if not _config["enabled"]:
        return

    _metrics_store["cache"]["misses"] += 1
    _metrics_store["cache"]["miss_times"].append(execution_time)

    if _config["log_all_metrics"]:
        logger.info(f"CACHE MISS: took {execution_time:.4f}s")

These functions:

  1. Track cache hits and misses
  2. Record execution times
  3. Log performance data
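
A hypothetical example of how a caching layer might call these functions; the cache object, key format, and loader are illustrative placeholders:

async def get_project_cached(cache, project_id, loader):
    # Hypothetical cache lookup; only the track_* calls come from this module.
    start_time = time.time()
    value = cache.get(f"project:{project_id}")
    if value is not None:
        track_cache_hit(time.time() - start_time)
        return value

    # Cache miss: fall back to the loader and record the total time taken.
    value = await loader(project_id)
    cache.set(f"project:{project_id}", value)
    track_cache_miss(time.time() - start_time)
    return value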

5. Metrics Aggregation

Metrics are aggregated using statistical functions:

from statistics import mean, median

def get_resolver_metrics():
    """
    Get aggregated resolver performance metrics.
    """
    _clean_old_metrics()

    result = {}
    for resolver_name, metrics in _metrics_store["resolvers"].items():
        if not metrics:
            continue

        execution_times = [m["execution_time"] for m in metrics]
        result[resolver_name] = {
            "count": len(metrics),
            "avg_time": mean(execution_times) if execution_times else 0,
            "median_time": median(execution_times) if execution_times else 0,
            "min_time": min(execution_times) if execution_times else 0,
            "max_time": max(execution_times) if execution_times else 0,
            "total_time": sum(execution_times) if execution_times else 0,
            "last_execution_time": metrics[-1]["execution_time"] if metrics else 0,
            "last_execution": metrics[-1]["timestamp"].isoformat() if metrics else None,
        }

    return result

This function:

  1. Cleans old metrics based on retention policy
  2. Calculates statistical measures (mean, median, min, max, etc.)
  3. Returns aggregated metrics
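
Cache statistics can be aggregated in the same way; a minimal sketch of such a helper (the function name and derived fields such as hit_rate are assumptions, not the documented API):

# Hypothetical aggregation helper for cache statistics.
def get_cache_metrics():
    cache = _metrics_store["cache"]
    total = cache["hits"] + cache["misses"]
    return {
        "hits": cache["hits"],
        "misses": cache["misses"],
        "hit_rate": cache["hits"] / total if total else 0,
        "avg_hit_time": mean(cache["hit_times"]) if cache["hit_times"] else 0,
        "avg_miss_time": mean(cache["miss_times"]) if cache["miss_times"] else 0,
    }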

6. GraphQL Integration

The performance monitoring system is integrated with GraphQL through a dedicated resolver:

@query.field("_performanceMetrics")
def resolve_performance_metrics(*_):
    """Resolver for the _performanceMetrics query field."""
    logger.debug("Performance metrics resolver called")

    # Get all metrics
    metrics = get_all_metrics()

    # Convert metrics to GraphQL format
    # ...

    return {
        "resolvers": resolver_metrics,
        "queries": query_metrics,
        "cache": cache_metrics,
        "config": config,
    }

This resolver:

  1. Retrieves all metrics
  2. Converts them to the format expected by the GraphQL schema
  3. Returns the metrics as a GraphQL response
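
The conversion step is elided above; a minimal sketch of what it might look like, assuming get_all_metrics() returns the aggregated dictionaries shown earlier and that the schema expects a list of per-resolver objects (the camelCase field names are assumptions):

# Hypothetical conversion of aggregated resolver metrics into a list of
# GraphQL-friendly objects; the field names may not match the actual schema.
resolver_metrics = [
    {
        "name": name,
        "count": data["count"],
        "avgTime": data["avg_time"],
        "maxTime": data["max_time"],
        "lastExecution": data["last_execution"],
    }
    for name, data in metrics["resolvers"].items()
]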

Database Integration

The performance monitoring system integrates with the database through a custom session class:

class PerformanceSession(TimedSessionMixin, ResilientSession):
    """
    Session class that combines resilient connections with performance monitoring.
    """
    # The mixin is listed first so that its execute() wraps the underlying
    # session's execute() through the method resolution order.
    pass

SessionLocal = sessionmaker(
    autocommit=False, autoflush=False, bind=engine, class_=PerformanceSession
)

This integration:

  1. Combines resilient connections with performance monitoring
  2. Tracks all database operations automatically
  3. Provides detailed metrics on query performance
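
A brief usage sketch; any statement executed through a session created from SessionLocal is timed without extra code at the call site:

# Hypothetical usage: execute() calls on this session are timed automatically.
from sqlalchemy import text

db = SessionLocal()
try:
    rows = db.execute(text("SELECT 1")).fetchall()
finally:
    db.close()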

Configuration System

The performance monitoring system is configurable through a central configuration object:

from datetime import timedelta

# Configuration
_config = {
    "enabled": True,
    "include_in_response": True,
    "log_all_metrics": True,
    "retention_period": timedelta(hours=1),
    "slow_query_threshold": 0.5,  # seconds
    "slow_resolver_threshold": 1.0,  # seconds
}

def configure(**kwargs):
    """Configure performance monitoring settings."""
    global _config
    for key, value in kwargs.items():
        if key in _config and value is not None:
            _config[key] = value

This configuration system allows:

  1. Enabling/disabling performance monitoring
  2. Configuring logging behavior
  3. Setting thresholds for slow operations
  4. Configuring data retention
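
For example, a production deployment might tighten the defaults; the specific values below are illustrative:

from datetime import timedelta

configure(
    include_in_response=False,     # don't expose timing data to API clients
    log_all_metrics=False,         # log only slow operations
    slow_resolver_threshold=0.5,   # seconds
    retention_period=timedelta(minutes=30),
)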

Data Retention

The performance monitoring system includes a data retention mechanism:

def _clean_old_metrics():
    """Remove metrics older than the retention period."""
    if not _config["retention_period"]:
        return

    cutoff_time = datetime.now() - _config["retention_period"]

    # Clean resolver metrics
    for resolver_name in list(_metrics_store["resolvers"].keys()):
        _metrics_store["resolvers"][resolver_name] = [
            m for m in _metrics_store["resolvers"][resolver_name]
            if m["timestamp"] >= cutoff_time
        ]

    # Clean query metrics
    # ...

    # Clean cache metrics
    # ...

This mechanism:

  1. Removes metrics older than the configured retention period
  2. Prevents memory leaks from accumulating metrics
  3. Ensures the metrics store remains efficient

Security Considerations

The performance monitoring system includes several security considerations:

  1. Access Control: The _performanceMetrics query should be restricted to administrative users (see the sketch after this list)
  2. Data Exposure: Performance data may expose information about the application's internal structure
  3. Production Configuration: In production, consider disabling include_in_response to prevent leaking performance data to regular users
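
A minimal sketch of how the access-control point could be enforced inside the resolver, assuming the GraphQL context exposes the authenticated user (the context layout and is_admin attribute are assumptions):

# Hypothetical access check; the context layout and user model are assumptions.
@query.field("_performanceMetrics")
def resolve_performance_metrics(obj, info):
    user = info.context.get("user")
    if user is None or not getattr(user, "is_admin", False):
        raise PermissionError("Performance metrics are restricted to administrators")

    # ... existing metrics aggregation and conversion ...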

Performance Impact

The performance monitoring system is designed to have minimal impact on the application's performance:

  1. Conditional Execution: Monitoring can be disabled entirely
  2. Efficient Storage: In-memory storage with configurable retention
  3. Minimal Overhead: Simple timing operations with low overhead
  4. Configurable Logging: Logging can be limited to slow operations only

Integration Points

The performance monitoring system integrates with the application at several points:

  1. Resolver Execution: Through the @track_resolver decorator
  2. Database Operations: Through the TimedSessionMixin class
  3. Cache Operations: Through the track_cache_hit and track_cache_miss functions
  4. GraphQL Schema: Through the _performanceMetrics query
  5. Response Generation: Through the optional _performance field in responses

Future Enhancements

Potential future enhancements to the performance monitoring system include:

  1. Persistent Storage: Storing metrics in a database for long-term analysis
  2. Alerting System: Sending alerts when performance thresholds are exceeded
  3. Dashboard Integration: Creating a dedicated dashboard for performance monitoring
  4. Tracing Integration: Integrating with distributed tracing systems
  5. Automated Optimization: Suggesting optimizations based on performance data