13. Scalability Considerations

Horizontal Scaling

```mermaid
graph TB
    LB["Load Balancer<br/>nginx/haproxy"]

    subgraph Gateway["Java Gateway Fleet"]
        JG1["Instance 1"]
        JG2["Instance 2"]
        JG3["Instance N"]
    end

    subgraph Python["Python Service Fleet"]
        PS1["Instance 1"]
        PS2["Instance 2"]
        PS3["Instance N"]
    end

    Cache["Model Cache<br/>Shared Volume"]

    LB -->|Round Robin| JG1
    LB -->|Round Robin| JG2
    LB -->|Round Robin| JG3

    JG1 -->|Load Balanced| PS1
    JG2 -->|Load Balanced| PS2
    JG3 -->|Load Balanced| PS3

    PS1 -->|Read| Cache
    PS2 -->|Read| Cache
    PS3 -->|Read| Cache

    style LB fill:#ffeb3b
    style Cache fill:#e8f5e9
```
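The round-robin tier at the top of the diagram could be expressed with a plain nginx `upstream` block; this is a minimal sketch, and the hostnames, ports, and upstream name below are placeholders, not part of the actual deployment:

```nginx
# Illustrative nginx config: round-robin across the Java gateway fleet.
# Hostnames and ports are hypothetical.
upstream java_gateway {
    # Round robin is nginx's default balancing method, so no directive is needed.
    server gateway-1:8080;
    server gateway-2:8080;
    server gateway-3:8080;
}

server {
    listen 80;

    location / {
        proxy_pass http://java_gateway;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

Because round robin is stateless, gateway instances can be added or removed by editing the `upstream` block and reloading nginx, without touching application code.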

Vertical Scaling

  • Increase container memory for the Python service (e.g., 1 GB → 4 GB)
  • Increase CPU allocation for the Java gateway (e.g., 1 core → 4 cores)
  • Enable GPU acceleration for the Python service (CUDA-enabled Docker image)
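These knobs map onto container runtime settings. A hedged sketch using Docker Compose, assuming Compose v2-style `deploy.resources` and the NVIDIA Container Toolkit for GPU access; the service and image names are illustrative:

```yaml
# Illustrative docker-compose.yml fragment showing vertical-scaling knobs.
services:
  python-service:
    image: python-service:latest      # placeholder image name
    deploy:
      resources:
        limits:
          memory: 4G                  # raised from 1G
        reservations:
          devices:
            - driver: nvidia          # requires NVIDIA Container Toolkit on the host
              count: 1
              capabilities: [gpu]

  java-gateway:
    image: java-gateway:latest        # placeholder image name
    deploy:
      resources:
        limits:
          cpus: "4.0"                 # raised from 1 core
```

Unlike the horizontal-scaling path, these changes require restarting the affected containers, so they are best applied during a maintenance window or behind the load balancer one instance at a time.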