Skip to content

12. Performance Characteristics

Latency Breakdown (typical)

Total Request Time: ~100-200ms

┌─────────────────────────────────────────────────────┐
│ Total: ~150ms                                       │
├─────────────────────────────────────────────────────┤
│ ┌─────────────────────────────────────┐             │
│ │ Java Gateway: ~20ms                 │             │
│ │ - Parse request: 2ms                │             │
│ │ - Validate data: 3ms                │             │
│ │ - HTTP call: 10ms                   │             │
│ │ - Response building: 5ms            │             │
│ └─────────────────────────────────────┘             │
│ ┌─────────────────────────────────────┐             │
│ │ Network: ~10-30ms                   │             │
│ │ - Request transmission: 5ms         │             │
│ │ - Response transmission: 5ms        │             │
│ │ - Latency: 0-20ms                   │             │
│ └─────────────────────────────────────┘             │
│ ┌─────────────────────────────────────┐             │
│ │ Python Service: ~80-120ms           │             │
│ │ - Request parsing: 5ms              │             │
│ │ - Model prediction: 70-100ms        │             │
│ │ - Response formatting: 5ms          │             │
│ └─────────────────────────────────────┘             │
└─────────────────────────────────────────────────────┘

Throughput (requests per second)

  • Single instance: 10-50 RPS (requests per second)
  • With horizontal scaling: Linear increase with load balancing
  • Bottleneck: Python model inference (70-100ms per prediction)

Resource Usage

Service CPU Memory Disk
Java Gateway 0.1-0.2 cores 512 MB 100 MB
Python Service 0.5-1.0 cores 1-2 GB 200 MB + model size
Model Storage N/A N/A 50-500 MB per model