Performance Monitoring Architecture

This document provides a detailed overview of the performance monitoring system's architecture, implementation, and technical design.

Architecture Overview

The performance monitoring system is designed to track and analyze the performance of various components of the Tracker GraphQL API. It follows a modular architecture that integrates with the existing system components.

Core Components

1. Metrics Store

The metrics store is an in-memory data structure that collects and aggregates performance data:

from collections import defaultdict

# Global performance metrics storage
_metrics_store = {
    "resolvers": defaultdict(list),
    "queries": defaultdict(list),
    "cache": {
        "hits": 0,
        "misses": 0,
        "hit_times": [],
        "miss_times": [],
    },
}

This store maintains:

  • Resolver execution times
  • Database query execution times
  • Cache hit/miss statistics

2. Resolver Tracking

Resolver tracking is implemented using a decorator pattern:

def track_resolver(func):
    """
    Decorator to track resolver performance.
    """
    @wraps(func)
    async def wrapper(*args, **kwargs):
        if not _config["enabled"]:
            return await func(*args, **kwargs)

        # Clean old metrics periodically
        _clean_old_metrics()

        # Start timing
        start_time = time.time()

        # Call the original function
        result = await func(*args, **kwargs)

        # Calculate execution time
        execution_time = time.time() - start_time

        # Store metrics
        metric = {
            "execution_time": execution_time,
            "timestamp": datetime.now(),
            "args_count": len(args),
            "kwargs_count": len(kwargs),
        }
        _metrics_store["resolvers"][func.__name__].append(metric)

        # Log performance data
        if _config["log_all_metrics"] or execution_time >= _config["slow_resolver_threshold"]:
            log_level = (
                logging.WARNING
                if execution_time >= _config["slow_resolver_threshold"]
                else logging.INFO
            )
            logger.log(
                log_level,
                f"PERFORMANCE: Resolver {func.__name__} took {execution_time:.4f}s"
            )

        # Attach performance data to result if it's a dict
        if _config["include_in_response"] and isinstance(result, dict):
            result["_performance"] = {
                "executionTime": execution_time,
                "resolverName": func.__name__,
            }

        return result

    return wrapper

This decorator:

  1. Times the execution of resolver functions
  2. Stores metrics in the metrics store
  3. Logs performance data
  4. Optionally includes performance data in the response
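
A hypothetical usage sketch, showing the decorator applied to an Ariadne-style resolver; the field name, the load_projects helper, and the context layout are illustrative assumptions rather than part of the actual schema:

@query.field("projects")
@track_resolver
async def resolve_projects(obj, info, **kwargs):
    # Timing, metric storage, and logging happen inside the decorator.
    # load_projects and info.context["db"] are hypothetical placeholders.
    return {"items": await load_projects(info.context["db"])}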

3. Database Query Tracking

Database query tracking is implemented using a mixin class that is combined with SQLAlchemy's Session class to time query execution:

class TimedSessionMixin:
    """
    Mixin class to add query timing to a SQLAlchemy session.
    """

    def execute(self, statement, *args, **kwargs):
        """
        Execute a statement and track its execution time.
        """
        if not _config["enabled"]:
            return super().execute(statement, *args, **kwargs)

        # Start timing
        start_time = time.time()

        # Call the original method
        result = super().execute(statement, *args, **kwargs)

        # Calculate execution time
        execution_time = time.time() - start_time

        # Extract query information and store metrics
        # ...

        return result

This mixin:

  1. Times the execution of database queries
  2. Extracts query type and table information
  3. Stores metrics in the metrics store
  4. Logs performance data
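
The extraction and storage step is elided above; a minimal sketch of what it could look like, assuming metrics are keyed by query type and that rendering the statement with str() is acceptable (the helper name and stored fields are assumptions, not the actual implementation):

# Hypothetical sketch of the elided extraction and storage step.
def _record_query_metric(statement, execution_time):
    sql_text = str(statement)
    query_type = sql_text.strip().split(" ", 1)[0].upper() or "UNKNOWN"  # e.g. SELECT

    _metrics_store["queries"][query_type].append({
        "execution_time": execution_time,
        "timestamp": datetime.now(),
        "query": sql_text[:200],  # truncated to keep the store small
    })

    if _config["log_all_metrics"] or execution_time >= _config["slow_query_threshold"]:
        logger.info(f"PERFORMANCE: {query_type} query took {execution_time:.4f}s")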

4. Cache Tracking

Cache tracking is implemented using simple functions that are called during cache operations:

def track_cache_hit(execution_time: float):
    """
    Track a cache hit.
    """
    if not _config["enabled"]:
        return

    _metrics_store["cache"]["hits"] += 1
    _metrics_store["cache"]["hit_times"].append(execution_time)

    if _config["log_all_metrics"]:
        logger.info(f"CACHE HIT: took {execution_time:.4f}s")


def track_cache_miss(execution_time: float):
    """
    Track a cache miss.
    """
    if not _config["enabled"]:
        return

    _metrics_store["cache"]["misses"] += 1
    _metrics_store["cache"]["miss_times"].append(execution_time)

    if _config["log_all_metrics"]:
        logger.info(f"CACHE MISS: took {execution_time:.4f}s")

These functions:

  1. Track cache hits and misses
  2. Record execution times
  3. Log performance data
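
A hypothetical example of how a caching layer might call these functions; the cache object, key format, and loader are illustrative placeholders:

async def get_project_cached(cache, project_id, loader):
    # Hypothetical cache lookup; only the track_* calls come from this module.
    start_time = time.time()
    value = cache.get(f"project:{project_id}")
    if value is not None:
        track_cache_hit(time.time() - start_time)
        return value

    # Cache miss: fall back to the loader and record the total time taken.
    value = await loader(project_id)
    cache.set(f"project:{project_id}", value)
    track_cache_miss(time.time() - start_time)
    return value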

5. Metrics Aggregation

Metrics are aggregated using statistical functions:

from statistics import mean, median

def get_resolver_metrics():
    """
    Get aggregated resolver performance metrics.
    """
    _clean_old_metrics()

    result = {}
    for resolver_name, metrics in _metrics_store["resolvers"].items():
        if not metrics:
            continue

        execution_times = [m["execution_time"] for m in metrics]
        result[resolver_name] = {
            "count": len(metrics),
            "avg_time": mean(execution_times) if execution_times else 0,
            "median_time": median(execution_times) if execution_times else 0,
            "min_time": min(execution_times) if execution_times else 0,
            "max_time": max(execution_times) if execution_times else 0,
            "total_time": sum(execution_times) if execution_times else 0,
            "last_execution_time": metrics[-1]["execution_time"] if metrics else 0,
            "last_execution": metrics[-1]["timestamp"].isoformat() if metrics else None,
        }

    return result

This function:

  1. Cleans old metrics based on retention policy
  2. Calculates statistical measures (mean, median, min, max, etc.)
  3. Returns aggregated metrics
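
Cache statistics can be aggregated in the same way; a minimal sketch of such a helper (the function name and derived fields such as hit_rate are assumptions, not the documented API):

# Hypothetical aggregation helper for cache statistics.
def get_cache_metrics():
    cache = _metrics_store["cache"]
    total = cache["hits"] + cache["misses"]
    return {
        "hits": cache["hits"],
        "misses": cache["misses"],
        "hit_rate": cache["hits"] / total if total else 0,
        "avg_hit_time": mean(cache["hit_times"]) if cache["hit_times"] else 0,
        "avg_miss_time": mean(cache["miss_times"]) if cache["miss_times"] else 0,
    }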

6. GraphQL Integration

The performance monitoring system is integrated with GraphQL through a dedicated resolver:

@query.field("_performanceMetrics")
def resolve_performance_metrics(*_):
    """Resolver for the _performanceMetrics query field."""
    logger.debug("Performance metrics resolver called")

    # Get all metrics
    metrics = get_all_metrics()

    # Convert metrics to GraphQL format
    # ...

    return {
        "resolvers": resolver_metrics,
        "queries": query_metrics,
        "cache": cache_metrics,
        "config": config,
    }

This resolver:

  1. Retrieves all metrics
  2. Converts them to the format expected by the GraphQL schema
  3. Returns the metrics as a GraphQL response
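
The conversion step is elided above; a minimal sketch of what it might look like, assuming get_all_metrics() returns the aggregated dictionaries shown earlier and that the schema expects a list of per-resolver objects (the camelCase field names are assumptions):

# Hypothetical conversion of aggregated resolver metrics into a list of
# GraphQL-friendly objects; the field names may not match the actual schema.
resolver_metrics = [
    {
        "name": name,
        "count": data["count"],
        "avgTime": data["avg_time"],
        "maxTime": data["max_time"],
        "lastExecution": data["last_execution"],
    }
    for name, data in metrics["resolvers"].items()
]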

Database Integration

The performance monitoring system integrates with the database through a custom session class:

class PerformanceSession(TimedSessionMixin, ResilientSession):
    """
    Session class that combines resilient connections with performance monitoring.
    """
    # The mixin is listed first so that its execute() wraps the underlying
    # session's execute() through the method resolution order.
    pass

SessionLocal = sessionmaker(
    autocommit=False, autoflush=False, bind=engine, class_=PerformanceSession
)

This integration:

  1. Combines resilient connections with performance monitoring
  2. Tracks all database operations automatically
  3. Provides detailed metrics on query performance
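
A brief usage sketch; any statement executed through a session created from SessionLocal is timed without extra code at the call site:

# Hypothetical usage: execute() calls on this session are timed automatically.
from sqlalchemy import text

db = SessionLocal()
try:
    rows = db.execute(text("SELECT 1")).fetchall()
finally:
    db.close()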

Configuration System

The performance monitoring system is configurable through a central configuration object:

from datetime import timedelta

# Configuration
_config = {
    "enabled": True,
    "include_in_response": True,
    "log_all_metrics": True,
    "retention_period": timedelta(hours=1),
    "slow_query_threshold": 0.5,  # seconds
    "slow_resolver_threshold": 1.0,  # seconds
}

def configure(**kwargs):
    """Configure performance monitoring settings."""
    global _config
    for key, value in kwargs.items():
        if key in _config and value is not None:
            _config[key] = value

This configuration system allows:

  1. Enabling/disabling performance monitoring
  2. Configuring logging behavior
  3. Setting thresholds for slow operations
  4. Configuring data retention
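
For example, a production deployment might tighten the defaults; the specific values below are illustrative:

from datetime import timedelta

configure(
    include_in_response=False,     # don't expose timing data to API clients
    log_all_metrics=False,         # log only slow operations
    slow_resolver_threshold=0.5,   # seconds
    retention_period=timedelta(minutes=30),
)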

Data Retention

The performance monitoring system includes a data retention mechanism:

def _clean_old_metrics():
    """Remove metrics older than the retention period."""
    if not _config["retention_period"]:
        return

    cutoff_time = datetime.now() - _config["retention_period"]

    # Clean resolver metrics
    for resolver_name in list(_metrics_store["resolvers"].keys()):
        _metrics_store["resolvers"][resolver_name] = [
            m for m in _metrics_store["resolvers"][resolver_name]
            if m["timestamp"] >= cutoff_time
        ]

    # Clean query metrics
    # ...

    # Clean cache metrics
    # ...

This mechanism:

  1. Removes metrics older than the configured retention period
  2. Prevents memory leaks from accumulating metrics
  3. Ensures the metrics store remains efficient

Security Considerations

The performance monitoring system includes several security considerations:

  1. Access Control: The _performanceMetrics query should be restricted to administrative users (see the sketch after this list)
  2. Data Exposure: Performance data may expose information about the application's internal structure
  3. Production Configuration: In production, consider disabling include_in_response to prevent leaking performance data to regular users
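
A minimal sketch of how the access-control point could be enforced inside the resolver, assuming the GraphQL context exposes the authenticated user (the context layout and is_admin attribute are assumptions):

# Hypothetical access check; the context layout and user model are assumptions.
@query.field("_performanceMetrics")
def resolve_performance_metrics(obj, info):
    user = info.context.get("user")
    if user is None or not getattr(user, "is_admin", False):
        raise PermissionError("Performance metrics are restricted to administrators")

    # ... existing metrics aggregation and conversion ...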

Performance Impact

The performance monitoring system is designed to have minimal impact on the application's performance:

  1. Conditional Execution: Monitoring can be disabled entirely
  2. Efficient Storage: In-memory storage with configurable retention
  3. Minimal Overhead: Simple timing operations with low overhead
  4. Configurable Logging: Logging can be limited to slow operations only

Integration Points

The performance monitoring system integrates with the application at several points:

  1. Resolver Execution: Through the @track_resolver decorator
  2. Database Operations: Through the TimedSessionMixin class
  3. Cache Operations: Through the track_cache_hit and track_cache_miss functions
  4. GraphQL Schema: Through the _performanceMetrics query
  5. Response Generation: Through the optional _performance field in responses

Future Enhancements

Potential future enhancements to the performance monitoring system include:

  1. Persistent Storage: Storing metrics in a database for long-term analysis
  2. Alerting System: Sending alerts when performance thresholds are exceeded
  3. Dashboard Integration: Creating a dedicated dashboard for performance monitoring
  4. Tracing Integration: Integrating with distributed tracing systems
  5. Automated Optimization: Suggesting optimizations based on performance data