Project repository at: https://github.com/CynicDog/Envoy-xDS-server-in-python
Decoding the Envoy Control Plane: An xDS Overview
So there I was, trying to teach Envoy how to think.
The idea is solid: instead of hardcoding your proxy config in static YAML, you spin up a separate control plane that tells Envoy what to do — live, over gRPC.
The xDS API family is extensive, and it can take a bit of time to understand what each component does. Here’s a breakdown of the key members:
- LDS (Listener Discovery Service) – Configures which ports Envoy should listen on and how to process incoming traffic.
- CDS (Cluster Discovery Service) – Defines clusters, representing the upstream services Envoy can connect to.
- RDS (Route Discovery Service) – Specifies routing rules, such as directing requests for `/users` to the user service.
- EDS (Endpoint Discovery Service) – Provides the actual IP addresses of healthy instances within a cluster.
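For reference, each of these maps to a v3 type URL that Envoy stamps on its `DiscoveryRequest`s. The URLs below are the standard Envoy values, shown as a plain Python dict:

```python
# Standard v3 type URLs Envoy uses to identify each resource type.
TYPE_URLS = {
    "LDS": "type.googleapis.com/envoy.config.listener.v3.Listener",
    "CDS": "type.googleapis.com/envoy.config.cluster.v3.Cluster",
    "RDS": "type.googleapis.com/envoy.config.route.v3.RouteConfiguration",
    "EDS": "type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment",
}
```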
And here’s the catch: these configs need to arrive in the right order. You can’t route to a cluster that doesn’t exist. You can’t define an endpoint before the cluster is known. Enter ADS — Aggregated Discovery Service.
Instead of juggling four separate gRPC streams (one per service), ADS gives you a single, unified stream where everything flows through one channel: CDS → EDS → LDS → RDS, in the order Envoy expects. Envoy connects once, and the control plane sequences the updates.
SotW ADS in Action
Envoy’s docs spell out four xDS transport variants along two axes—State-of-the-World vs. incremental, and per-type vs. aggregated streams:
- State of the World (Basic xDS): SotW, separate gRPC stream for each resource type
- Incremental xDS: incremental, separate gRPC stream for each resource type
- Aggregated Discovery Service (ADS): SotW, aggregate stream for all resource types
- Incremental ADS: incremental, aggregate stream for all resource types
Our Python control plane sits in the Aggregated Discovery Service (ADS) + State-of-the-World corner:
- Single gRPC stream for everything. Envoy calls only `StreamAggregatedResources` on our `AggregatedDiscoveryServiceServicer` (sketched below), and carries LDS, CDS, RDS, EDS, etc., as logical sub-streams over that one channel.
- Full snapshots on every update. Whenever Envoy’s `version_info` is empty, doesn’t match our server version, or it NACKs, we bundle up all of the requested resources of that type and send them together: no delta logic, no piecemeal updates.
- Guaranteed ordering. By delivering Clusters → Endpoints → Listeners → Routes on the same wire, we sidestep “I got my routes before my cluster existed” errors.
If you later need “send-only-what-changed” behavior or lazy loading at scale, you can switch to Incremental ADS (`DeltaAggregatedResources`). But for a lean, easy-to-debug demo, SotW ADS gives us exactly the right mix of simplicity and correctness.
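Here is what that servicer’s skeleton can look like, assuming `grpcio` plus Python bindings generated from the Envoy v3 protos (module paths vary by how you generate them; `build_snapshot_for` is a hypothetical helper, stubbed out here):

```python
from concurrent import futures

import grpc
from envoy.service.discovery.v3 import ads_pb2_grpc, discovery_pb2

def build_snapshot_for(type_url):
    # Hypothetical helper: return a list of google.protobuf.Any holding
    # every resource of the given type (Listeners, Clusters, ...).
    return []

class AggregatedDiscoveryServiceServicer(
    ads_pb2_grpc.AggregatedDiscoveryServiceServicer
):
    def StreamAggregatedResources(self, request_iterator, context):
        # One bidirectional stream carries every resource type; the
        # type_url on each request tells us which sub-stream it is.
        for request in request_iterator:
            # A real server would skip pure ACKs here (see the
            # handshake section below).
            yield discovery_pb2.DiscoveryResponse(
                version_info="1",
                type_url=request.type_url,
                # SotW: every resource of this type, packed as Any.
                resources=build_snapshot_for(request.type_url),
            )

server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))
ads_pb2_grpc.add_AggregatedDiscoveryServiceServicer_to_server(
    AggregatedDiscoveryServiceServicer(), server
)
server.add_insecure_port("0.0.0.0:5678")
```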
First Contact: How Envoy Finds Its Control Plane
Before our Python server can work its magic, the Envoy proxy needs to be told one simple, critical thing: where to find the xDS server. This is the sole job of the static `envoy.yaml` file. It’s the “bootstrap” configuration that gets the proxy’s lights on and points it in the right direction.
My `envoy.yaml` shows this perfectly. It’s mostly static housekeeping, but the key sections bridge the static world with our dynamic one.
- The Control Plane Address: The `dynamic_resources` section is where the magic begins. It tells Envoy that all its configuration (Listeners, Clusters, Routes) will come from a dynamic source via the Aggregated Discovery Service (`ads_config`). Critically, it points to a static cluster named `xds_cluster` for this purpose.
- A Static Cluster for a Dynamic World: This `xds_cluster` is a bit of an anachronism, a static cluster defined in a file, but it’s essential. Its only purpose is to define a host (`xds-server`) and a port (`5678`) so Envoy knows how to establish the initial gRPC connection to our Python server.
Think of it this way: our `envoy.yaml` isn’t the destination. It’s the ignition key and the map to the real destination: our Python xDS server.
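For reference, here is a minimal sketch of those two sections in an Envoy v3 bootstrap. The exact file in the repo may differ (for instance in the cluster’s discovery type); the HTTP/2 protocol options are needed because xDS runs over gRPC:

```yaml
dynamic_resources:
  ads_config:
    api_type: GRPC
    transport_api_version: V3
    grpc_services:
      - envoy_grpc:
          cluster_name: xds_cluster
  lds_config:
    ads: {}
    resource_api_version: V3
  cds_config:
    ads: {}
    resource_api_version: V3

static_resources:
  clusters:
    - name: xds_cluster
      type: STRICT_DNS
      # gRPC requires HTTP/2 on the upstream connection.
      typed_extension_protocol_options:
        envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
          "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
          explicit_http_config:
            http2_protocol_options: {}
      load_assignment:
        cluster_name: xds_cluster
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: xds-server
                      port_value: 5678
```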
The Handshake: Keeping Envoy and the Control Plane in Sync
After Envoy connects to the xDS server, they exchange configuration updates through a handshake using ACKs. Envoy requests configs, the server sends them, and Envoy confirms receipt by sending an ACK with the config version. This tells the server Envoy successfully applied the update.
If something goes wrong, Envoy sends a NACK instead: it rejects the update, keeps its last good configuration, and attaches an error detail so the server can correct and resend. These ACKs and NACKs keep both sides synchronized, preventing Envoy from running outdated or broken configs.
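On the server side, every incoming `DiscoveryRequest` doubles as an ACK or a NACK. A minimal sketch of telling them apart, assuming generated Python bindings for the Envoy v3 protos (module paths vary by toolchain) and a hypothetical per-snapshot version string:

```python
from envoy.service.discovery.v3 import discovery_pb2

SERVER_VERSION = "1"  # hypothetical current snapshot version

def classify(request: discovery_pb2.DiscoveryRequest) -> str:
    if request.HasField("error_detail"):
        # Envoy rejected the last response and is still running its
        # previous good config; log the error and decide whether to
        # resend a corrected snapshot.
        return "NACK"
    if request.version_info == SERVER_VERSION:
        # Envoy applied the last response for this type_url: a pure
        # ACK, so there is nothing new to send.
        return "ACK"
    # Empty or stale version: initial request, reconnect, or outdated
    # config, so respond with a full SotW snapshot.
    return "NEEDS_UPDATE"
```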
Rebuilding Envoy’s Request Path in Python
Once connected, our Python server takes over. It doesn’t just respond to xDS requests—it dynamically reassembles Envoy’s full L4–L7 request path using nothing but Protobuf APIs.
At the mesh edge, `generate_listener_config()` sets up a Listener on port `15001`, Envoy’s entry point for downstream traffic. The Listener kicks off a filter chain starting with an `HttpConnectionManager`, which upgrades raw TCP into structured HTTP requests.
Each request then passes through a minimal set of HTTP filters. The only one we use is `envoy.filters.http.router`, which routes requests based on the attached `RouteConfiguration`.
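A sketch of how `generate_listener_config()` can assemble that chain with generated proto bindings (module paths vary by toolchain; the route table is inlined here for brevity and is described in the next paragraph):

```python
from google.protobuf import any_pb2
from envoy.config.core.v3 import address_pb2
from envoy.config.listener.v3 import listener_pb2, listener_components_pb2
from envoy.config.route.v3 import route_pb2, route_components_pb2
from envoy.extensions.filters.http.router.v3 import router_pb2
from envoy.extensions.filters.network.http_connection_manager.v3 import (
    http_connection_manager_pb2 as hcm_pb2,
)

def pack(msg):
    # Wrap a message in a google.protobuf.Any, as typed_config expects.
    wrapped = any_pb2.Any()
    wrapped.Pack(msg)
    return wrapped

def generate_listener_config():
    # Route every request (prefix "/") to the httpbin_service cluster.
    route_config = route_pb2.RouteConfiguration(
        name="local_route",
        virtual_hosts=[route_components_pb2.VirtualHost(
            name="backend",
            domains=["*"],
            routes=[route_components_pb2.Route(
                match=route_components_pb2.RouteMatch(prefix="/"),
                route=route_components_pb2.RouteAction(cluster="httpbin_service"),
            )],
        )],
    )
    # HttpConnectionManager turns raw TCP into HTTP; the router filter
    # is the terminal HTTP filter that executes the route table above.
    hcm = hcm_pb2.HttpConnectionManager(
        stat_prefix="ingress_http",
        route_config=route_config,
        http_filters=[hcm_pb2.HttpFilter(
            name="envoy.filters.http.router",
            typed_config=pack(router_pb2.Router()),
        )],
    )
    return listener_pb2.Listener(
        name="listener_15001",
        address=address_pb2.Address(
            socket_address=address_pb2.SocketAddress(
                address="0.0.0.0", port_value=15001,
            )
        ),
        filter_chains=[listener_components_pb2.FilterChain(
            filters=[listener_components_pb2.Filter(
                name="envoy.filters.network.http_connection_manager",
                typed_config=pack(hcm),
            )],
        )],
    )
```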
In our case, all paths (prefix `/`) forward to a single cluster, `httpbin_service`, built by `generate_cluster_config()`. This cluster is resolved via `LOGICAL_DNS`, using the hostname `httpbin`. Since all services (Envoy, the xDS server, and the `httpbin` backend) run inside the same Docker network, Docker’s internal DNS allows Envoy to resolve `httpbin` directly to its container IP.
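And a matching sketch of `generate_cluster_config()`, under the same binding assumptions (the upstream port, 80 here, is whatever the `httpbin` container exposes):

```python
from google.protobuf import duration_pb2
from envoy.config.cluster.v3 import cluster_pb2
from envoy.config.core.v3 import address_pb2
from envoy.config.endpoint.v3 import endpoint_pb2, endpoint_components_pb2

def generate_cluster_config():
    # LOGICAL_DNS lets Docker's embedded DNS resolve "httpbin" to the
    # container IP at connection time.
    return cluster_pb2.Cluster(
        name="httpbin_service",
        type=cluster_pb2.Cluster.LOGICAL_DNS,
        connect_timeout=duration_pb2.Duration(seconds=5),
        load_assignment=endpoint_pb2.ClusterLoadAssignment(
            cluster_name="httpbin_service",
            endpoints=[endpoint_components_pb2.LocalityLbEndpoints(
                lb_endpoints=[endpoint_components_pb2.LbEndpoint(
                    endpoint=endpoint_components_pb2.Endpoint(
                        address=address_pb2.Address(
                            socket_address=address_pb2.SocketAddress(
                                address="httpbin", port_value=80,
                            )
                        )
                    )
                )]
            )],
        ),
    )
```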
The full flow (Listener → Filter chain → Routing → Cluster resolution) is served dynamically over gRPC by the `XdsServer`, which implements Envoy’s Aggregated Discovery Service (ADS). It supports stateful versioning and content hashing to avoid sending redundant updates.
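The content-hashing idea is simple: derive the version from the serialized resources, and only push when the digest changes. A minimal sketch:

```python
import hashlib

def snapshot_version(resources) -> str:
    # Deterministic serialization keeps the hash stable across runs,
    # so an unchanged snapshot never triggers a redundant push.
    digest = hashlib.sha256()
    for resource in resources:
        digest.update(resource.SerializeToString(deterministic=True))
    return digest.hexdigest()[:12]
```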
This compact Python server mirrors a production-ready Envoy setup, but swaps static YAML for dynamic logic—giving you full visibility and control over how Envoy processes requests, step by step.
Conclusion
Getting an xDS server up and running is one thing — but truly understanding the path to that point is a whole different story. This project felt like a microcosm of modern microservice challenges: complex, sometimes maddening, but rewarding.
What really stuck with me is that this isn’t about static files anymore. Static YAML is simple to read, but brittle and slow to change. xDS flips the script: it’s a living, breathing system — a continuous control loop that adapts on the fly.
Our Python server is a mini version of that philosophy. It’s not a static snapshot; it’s an active, intelligent agent streaming updates, ready to evolve as the network changes.
This isn’t just about building a functional server—it’s about adopting a new mindset: treating dynamic networking as code. The future isn’t defined by flawless YAML files, but by systems that communicate, adapt, and reconfigure themselves in real time.