Performance Monitoring Architecture
This document provides a detailed overview of the performance monitoring system's architecture, implementation, and technical design.
Architecture Overview
The performance monitoring system is designed to track and analyze the performance of various components of the Tracker GraphQL API. It follows a modular architecture that integrates with the existing system components.
Core Components
1. Metrics Store
The metrics store is an in-memory data structure that collects and aggregates performance data:
```python
# Global performance metrics storage
_metrics_store = {
    "resolvers": defaultdict(list),
    "queries": defaultdict(list),
    "cache": {
        "hits": 0,
        "misses": 0,
        "hit_times": [],
        "miss_times": [],
    },
}
```
This store maintains:
- Resolver execution times
- Database query execution times
- Cache hit/miss statistics
2. Resolver Tracking
Resolver tracking is implemented using a decorator pattern:
```python
def track_resolver(func):
    """
    Decorator to track resolver performance.
    """
    @wraps(func)
    async def wrapper(*args, **kwargs):
        if not _config["enabled"]:
            return await func(*args, **kwargs)

        # Clean old metrics periodically
        _clean_old_metrics()

        # Start timing
        start_time = time.time()

        # Call the original function
        result = await func(*args, **kwargs)

        # Calculate execution time
        execution_time = time.time() - start_time

        # Store metrics
        metric = {
            "execution_time": execution_time,
            "timestamp": datetime.now(),
            "args_count": len(args),
            "kwargs_count": len(kwargs),
        }
        _metrics_store["resolvers"][func.__name__].append(metric)

        # Log performance data
        if _config["log_all_metrics"] or execution_time >= _config["slow_resolver_threshold"]:
            log_level = logging.WARNING if execution_time >= _config["slow_resolver_threshold"] else logging.INFO
            logger.log(
                log_level,
                f"PERFORMANCE: Resolver {func.__name__} took {execution_time:.4f}s"
            )

        # Attach performance data to result if it's a dict
        if _config["include_in_response"] and isinstance(result, dict):
            result["_performance"] = {
                "executionTime": execution_time,
                "resolverName": func.__name__,
            }

        return result
    return wrapper
```
This decorator:
- Times the execution of resolver functions
- Stores metrics in the metrics store
- Logs performance data
- Optionally includes performance data in the response
3. Database Query Tracking
Database query tracking is implemented using a mixin class that extends SQLAlchemy's session:
```python
class TimedSessionMixin:
    """
    Mixin class to add query timing to a SQLAlchemy session.
    """
    def execute(self, statement, *args, **kwargs):
        """
        Execute a statement and track its execution time.
        """
        if not _config["enabled"]:
            return super().execute(statement, *args, **kwargs)

        # Start timing
        start_time = time.time()

        # Call the original method
        result = super().execute(statement, *args, **kwargs)

        # Calculate execution time
        execution_time = time.time() - start_time

        # Extract query information and store metrics
        # ...

        return result
```
This mixin:
- Times the execution of database queries
- Extracts query type and table information
- Stores metrics in the metrics store
- Logs performance data
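The cooperative `super()` call is what makes the mixin composable. The following standalone sketch substitutes a toy base class for SQLAlchemy's `Session` so it runs without a database; the mixin body is the same shape as above:

```python
import time

_config = {"enabled": True}
query_times = []  # stand-in for the metrics store

class ToySession:
    """Stand-in for SQLAlchemy's Session; just 'executes' a statement."""
    def execute(self, statement, *args, **kwargs):
        time.sleep(0.005)  # stand-in for real query work
        return f"result of {statement}"

class TimedSessionMixin:
    def execute(self, statement, *args, **kwargs):
        if not _config["enabled"]:
            return super().execute(statement, *args, **kwargs)
        start_time = time.time()
        result = super().execute(statement, *args, **kwargs)
        query_times.append(time.time() - start_time)
        return result

# The mixin must come first in the MRO so its execute() wraps the base class's
class TimedSession(TimedSessionMixin, ToySession):
    pass

session = TimedSession()
result = session.execute("SELECT 1")
print(result, len(query_times))
```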
4. Cache Tracking
Cache tracking is implemented using simple functions that are called during cache operations:
```python
def track_cache_hit(execution_time: float):
    """
    Track a cache hit.
    """
    if not _config["enabled"]:
        return
    _metrics_store["cache"]["hits"] += 1
    _metrics_store["cache"]["hit_times"].append(execution_time)
    if _config["log_all_metrics"]:
        logger.info(f"CACHE HIT: took {execution_time:.4f}s")

def track_cache_miss(execution_time: float):
    """
    Track a cache miss.
    """
    if not _config["enabled"]:
        return
    _metrics_store["cache"]["misses"] += 1
    _metrics_store["cache"]["miss_times"].append(execution_time)
    if _config["log_all_metrics"]:
        logger.info(f"CACHE MISS: took {execution_time:.4f}s")
```
These functions:
- Track cache hits and misses
- Record execution times
- Log performance data
5. Metrics Aggregation
Metrics are aggregated using statistical functions:
```python
def get_resolver_metrics():
    """
    Get aggregated resolver performance metrics.
    """
    _clean_old_metrics()
    result = {}
    for resolver_name, metrics in _metrics_store["resolvers"].items():
        if not metrics:
            continue
        execution_times = [m["execution_time"] for m in metrics]
        result[resolver_name] = {
            "count": len(metrics),
            "avg_time": mean(execution_times) if execution_times else 0,
            "median_time": median(execution_times) if execution_times else 0,
            "min_time": min(execution_times) if execution_times else 0,
            "max_time": max(execution_times) if execution_times else 0,
            "total_time": sum(execution_times) if execution_times else 0,
            "last_execution_time": metrics[-1]["execution_time"] if metrics else 0,
            "last_execution": metrics[-1]["timestamp"].isoformat() if metrics else None,
        }
    return result
```
This function:
- Cleans old metrics based on retention policy
- Calculates statistical measures (mean, median, min, max, etc.)
- Returns aggregated metrics
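The aggregation step can be exercised in isolation over hand-written sample data (the retention sweep is omitted here; `resolve_user` and the timings are illustrative):

```python
from statistics import mean, median
from datetime import datetime

# Hand-written sample metrics for one hypothetical resolver
sample_metrics = {
    "resolve_user": [
        {"execution_time": 0.1, "timestamp": datetime(2024, 1, 1, 12, 0, 0)},
        {"execution_time": 0.2, "timestamp": datetime(2024, 1, 1, 12, 0, 5)},
        {"execution_time": 0.3, "timestamp": datetime(2024, 1, 1, 12, 0, 10)},
    ]
}

def aggregate(metrics_by_resolver):
    result = {}
    for name, metrics in metrics_by_resolver.items():
        times = [m["execution_time"] for m in metrics]
        result[name] = {
            "count": len(metrics),
            "avg_time": mean(times),
            "median_time": median(times),
            "min_time": min(times),
            "max_time": max(times),
            "total_time": sum(times),
            "last_execution": metrics[-1]["timestamp"].isoformat(),
        }
    return result

stats = aggregate(sample_metrics)
print(stats["resolve_user"]["median_time"])  # 0.2
```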
6. GraphQL Integration
The performance monitoring system is integrated with GraphQL through a dedicated resolver:
```python
@query.field("_performanceMetrics")
def resolve_performance_metrics(*_):
    """Resolver for the _performanceMetrics query field."""
    logger.debug("Performance metrics resolver called")

    # Get all metrics
    metrics = get_all_metrics()

    # Convert metrics to GraphQL format
    # ...

    return {
        "resolvers": resolver_metrics,
        "queries": query_metrics,
        "cache": cache_metrics,
        "config": config,
    }
```
This resolver:
- Retrieves all metrics
- Converts them to the format expected by the GraphQL schema
- Returns the metrics as a GraphQL response
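A client with appropriate access could then query the field. The exact sub-field names depend on the schema definition, so the selection below is illustrative only:

```graphql
query {
  _performanceMetrics {
    resolvers { name count avgTime maxTime }
    cache { hits misses }
    config { enabled slowResolverThreshold }
  }
}
```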
Database Integration
The performance monitoring system integrates with the database through a custom session class:
```python
class PerformanceSession(ResilientSession, TimedSessionMixin):
    """
    Session class that combines resilient connections with performance monitoring.
    """
    pass

SessionLocal = sessionmaker(
    autocommit=False, autoflush=False, bind=engine, class_=PerformanceSession
)
```
This integration:
- Combines resilient connections with performance monitoring
- Tracks all database operations automatically
- Provides detailed metrics on query performance
Configuration System
The performance monitoring system is configurable through a central configuration object:
```python
# Configuration
_config = {
    "enabled": True,
    "include_in_response": True,
    "log_all_metrics": True,
    "retention_period": timedelta(hours=1),
    "slow_query_threshold": 0.5,  # seconds
    "slow_resolver_threshold": 1.0,  # seconds
}

def configure(**kwargs):
    """Configure performance monitoring settings."""
    global _config
    for key, value in kwargs.items():
        if key in _config and value is not None:
            _config[key] = value
```
This configuration system allows:
- Enabling/disabling performance monitoring
- Configuring logging behavior
- Setting thresholds for slow operations
- Configuring data retention
Data Retention
The performance monitoring system includes a data retention mechanism:
```python
def _clean_old_metrics():
    """Remove metrics older than the retention period."""
    if not _config["retention_period"]:
        return

    cutoff_time = datetime.now() - _config["retention_period"]

    # Clean resolver metrics
    for resolver_name in list(_metrics_store["resolvers"].keys()):
        _metrics_store["resolvers"][resolver_name] = [
            m for m in _metrics_store["resolvers"][resolver_name]
            if m["timestamp"] >= cutoff_time
        ]

    # Clean query metrics
    # ...

    # Clean cache metrics
    # ...
```
This mechanism:
- Removes metrics older than the configured retention period
- Prevents memory leaks from accumulating metrics
- Ensures the metrics store remains efficient
Security Considerations
The performance monitoring system includes several security considerations:
- Access Control: The `_performanceMetrics` query should be restricted to administrative users
- Data Exposure: Performance data may expose information about the application's internal structure
- Production Configuration: In production, consider disabling `includeInResponse` to prevent leaking performance data to regular users
Performance Impact
The performance monitoring system is designed to have minimal impact on the application's performance:
- Conditional Execution: Monitoring can be disabled entirely
- Efficient Storage: In-memory storage with configurable retention
- Minimal Overhead: Simple timing operations with low overhead
- Configurable Logging: Logging can be limited to slow operations only
Integration Points
The performance monitoring system integrates with the application at several points:
- Resolver Execution: Through the `@track_resolver` decorator
- Database Operations: Through the `TimedSessionMixin` class
- Cache Operations: Through the `track_cache_hit` and `track_cache_miss` functions
- GraphQL Schema: Through the `_performanceMetrics` query
- Response Generation: Through the optional `_performance` field in responses
Future Enhancements
Potential future enhancements to the performance monitoring system include:
- Persistent Storage: Storing metrics in a database for long-term analysis
- Alerting System: Sending alerts when performance thresholds are exceeded
- Dashboard Integration: Creating a dedicated dashboard for performance monitoring
- Tracing Integration: Integrating with distributed tracing systems
- Automated Optimization: Suggesting optimizations based on performance data