Tang Sicheng‘s blog: August 2018

Yes, this is just another blog post about API rate limiting. There are already tons of articles on this topic all over the Internet [1] [2], so this post is not about the benefits and algorithms of API rate limiting, but about some practical problems we ran into when implementing it in our API Gateway platform.

Problems

Various rate limiting criteria.

Besides the normal "my endpoint can only be accessed X times in Y minutes/hours" situation, we need the rate limiting rules to be more flexible and configurable. For example, you can protect your API with a certain threshold, but if an attacker brute-forces this API, the threshold will be reached easily and the “good” users will be blocked as well.
Sometimes we need to rate limit an API based on client IP, and sometimes based on user ID or user token.

Be agile when facing incidents.

Incidents and outages are normal. Instead of trying to eliminate them, we should be agile when they happen. We need to change the rate limiting rules in real time; restarting the application to load a new value from an external source is not an option.

Non-invasive integration with the service.

We do have an in-house rate limiting SDK. Generally, it works like this:
So for every API that needs to be protected by rate limiting, you need to duplicate some code and release a new version. As a self-managed platform, we want to provide an easier way for engineers.

Solutions

After rounds of discussion, we came up with a configuration like this:

It is very straightforward except for the "keys" field. This field is used to generate a key for each request matched by the "pattern". In the example above, "keys" contains only "IP", so for each unique IP the thresholds are 50 requests per minute and 500 requests per hour.
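To make the role of "keys" concrete, here is a minimal sketch of how a key could be derived from a rule. It is not our actual implementation: the class names, the `/api/login.*` pattern, the `USER_ID` attribute and the key format are all assumptions made up for illustration. Only the "pattern" and "keys" fields and the 50/minute and 500/hour thresholds come from the description above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical rule shape: it mirrors the fields described in the post
// ("pattern", "keys", thresholds), but the real schema may differ.
final class RateLimitRule {
    final Pattern pattern;                 // which request paths the rule applies to
    final List<String> keys;               // request attributes that form the counter key
    final Map<String, Integer> thresholds; // window name -> allowed request count

    RateLimitRule(String pattern, List<String> keys, Map<String, Integer> thresholds) {
        this.pattern = Pattern.compile(pattern);
        this.keys = keys;
        this.thresholds = thresholds;
    }

    boolean matches(String path) {
        return pattern.matcher(path).matches();
    }

    // Joins the pattern with the value of every attribute named in "keys",
    // so requests that share those values share one set of counters.
    String counterKey(Map<String, String> requestAttributes) {
        StringBuilder sb = new StringBuilder(pattern.pattern());
        for (String k : keys) {
            sb.append('|').append(k).append('=')
              .append(requestAttributes.getOrDefault(k, "unknown"));
        }
        return sb.toString();
    }
}

final class RateLimitKeyDemo {
    // Assumed example: limit a login endpoint per client IP.
    static RateLimitRule exampleRule() {
        Map<String, Integer> thresholds = new HashMap<>();
        thresholds.put("MINUTE", 50);
        thresholds.put("HOUR", 500);
        return new RateLimitRule("/api/login.*", Arrays.asList("IP"), thresholds);
    }

    public static void main(String[] args) {
        RateLimitRule rule = exampleRule();
        Map<String, String> request = new HashMap<>();
        request.put("IP", "203.0.113.7");
        request.put("USER_ID", "alice");
        if (rule.matches("/api/login")) {
            // The user ID is ignored because "keys" only lists "IP".
            System.out.println(rule.counterKey(request));
        }
    }
}
```

With "keys" set to only "IP", two users behind the same address share one counter. Listing a user attribute in "keys" as well would give each IP and user combination its own counter.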

Other details:

We use RateLimitJ as our rate limiting SDK, which provides both Redis-based and in-memory repositories.
We extend Netflix Archaius: the configuration is stored as a JSON string in DynamoDB, and the application uses that JSON string as a dynamic object. Every time DynamoDB has a new value, the dynamic object refreshes automatically.
A new custom Zuul filter is added. When a request is rate limited, Zuul returns the response immediately without routing.
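The three pieces above fit together roughly as in the sketch below. To keep it self-contained it uses only the JDK: a simple fixed-window counter in memory stands in for RateLimitJ and Redis, an `AtomicReference` stands in for the Archaius dynamic object, and the `check` method stands in for the decision the Zuul filter makes. All names are hypothetical and this is not our production code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Stdlib-only stand-in for the real setup (RateLimitJ + Redis + Archaius + Zuul).
final class DynamicRateLimiter {
    static final int TOO_MANY_REQUESTS = 429; // HTTP status for a limited request
    static final int PASS = 0;                // hypothetical marker: keep routing

    // Window length in seconds -> allowed count. Replaced as a whole when the
    // configuration changes, so no restart is needed to pick up new limits.
    private final AtomicReference<Map<Long, Integer>> thresholds;

    // One counter per key, window length and window index. Old buckets are never
    // evicted in this sketch; Redis key expiry would take care of that.
    private final Map<String, AtomicInteger> counters = new ConcurrentHashMap<>();

    DynamicRateLimiter(Map<Long, Integer> initial) {
        this.thresholds = new AtomicReference<>(new HashMap<>(initial));
    }

    // Called when the external configuration store delivers a new value.
    void onConfigChange(Map<Long, Integer> updated) {
        thresholds.set(new HashMap<>(updated));
    }

    // Counts the request in every window and returns 429 if any window is
    // over its threshold, which is where the filter would skip routing.
    int check(String key, long nowSeconds) {
        boolean overLimit = false;
        for (Map.Entry<Long, Integer> t : thresholds.get().entrySet()) {
            long window = t.getKey();
            String bucket = key + ':' + window + ':' + (nowSeconds / window);
            int count = counters.computeIfAbsent(bucket, b -> new AtomicInteger())
                                .incrementAndGet();
            if (count > t.getValue()) {
                overLimit = true;
            }
        }
        return overLimit ? TOO_MANY_REQUESTS : PASS;
    }

    public static void main(String[] args) {
        Map<Long, Integer> limits = new HashMap<>();
        limits.put(60L, 50);     // 50 requests per minute
        limits.put(3600L, 500);  // 500 requests per hour
        DynamicRateLimiter limiter = new DynamicRateLimiter(limits);
        long now = System.currentTimeMillis() / 1000;
        for (int i = 1; i <= 52; i++) {
            int result = limiter.check("IP=203.0.113.7", now);
            if (result == TOO_MANY_REQUESTS) {
                System.out.println("request " + i + " rejected");
            }
        }
    }
}
```

Note that this sketch counts in fixed windows for brevity, which is not necessarily how RateLimitJ counts, and its counters live in a single JVM. Keeping the counters in Redis is what lets several gateway instances enforce one shared limit.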

Performance

We ran some stress tests on API rate limiting. The environment was:

Zuul: EC2 c5.2xlarge, with 10GB JVM, G1 garbage collector
Redis: cache.r4.large
Downstream service: a cluster with 10 instances

JMeter setup: 50 threads, each making 4000 requests (200,000 requests in total).

Here are the results of the three test rounds:

Fig. 1. Without rate limiting

Fig. 2. With rate limiting of 500000 requests/hour

Fig. 3. With rate limiting of 100000 requests/hour

From the results, we can see that:

Redis-based rate limiting introduces 3 to 5 milliseconds of latency, which is also confirmed by our debug log.
After the API is rate limited, Zuul returns "TOO_MANY_REQUESTS" without routing the request to the downstream service. In production, this will reduce the average response time, because the p50 response time there is around 300 milliseconds, whereas a rejected request returns right after the rate limit check.

Tuesday, August 21, 2018

API rate limiting use Zuul and Redis
