Request Hedging
Explains what request hedging is and how you can configure it.
Request Hedging
Overview
Hedging is one of two configurable retry policies supported by gRPC. With hedging, a gRPC client sends multiple copies of the same request to different backends and uses the first response it receives. Subsequently, the client cancels any outstanding requests and forwards the response to the application.
Hedging is a technique to reduce tail latency in large scale distributed
systems. While naive implementations could add significant load to the backend
servers, it is possible to get most of the latency reduction effects while
increasing load only modestly. For an in-depth discussion on tail latencies, see the seminal article, The Tail
At Scale, by Jeff Dean and Luiz André
Barroso. Hedging is configurable via gRPC Service Config, at a per-method granularity.
The configuration contains the following knobs: When the application makes an RPC call that contains a When a successful response is received (in response to any of the hedged
requests), all outstanding hedged requests are canceled and the response is
returned to the client application layer. If an error response with a non-fatal status code (controlled by the
If all instances of a hedged RPC fail, there are no additional retry attempts.
Essentially, hedging can be seen as retrying the original RPC before a failure
is even received. If server pushback that specifies not to retry is received in response to a
hedged request, no further hedged requests should be issued for the call. gRPC provides a way to throttle hedged RPCs to prevent server overload.
Throttling can be configured via the Service Config as well using the
For each server name, the gRPC client maintains a With hedging, the first request is always sent out, but subsequent hedged
requests are sent only if The only requests that are counted as failures for the throttling policy are the
ones that fail with a status code that qualifies as a non-fatal status code, or
that receive a pushback response indicating not to retry. This avoids conflating
server failure with responses to malformed requests (such as the
Servers may explicitly pushback by setting metadata in their response to the
client. If the pushback says not to retry, no further hedged requests will be
sent. If the pushback says to retry after a given delay, the next hedged request
(if any) will be issued after the given delay has elapsed. Server pushback is specified using the metadata key, Use cases
Configuring hedging in gRPC
"hedgingPolicy": {
"maxAttempts": INTEGER,
"hedgingDelay": JSON proto3 Duration type,
"nonFatalStatusCodes": JSON array of grpc status codes (int or string)
}
maxAttempts
: maximum number of in-flight requests while waiting for a
successful response. This is a mandatory field, and must be specified. If the
specified value is greater than 5
, gRPC uses a value of 5
.hedgingDelay
: amount of time that needs to elapse before the client sends out
the next request while waiting for a successful response. This field is
optional, and if left unspecified, results in maxAttempts
number of requests
all sent out at the same time.nonFatalStatusCodes
: an optional list of grpc status codes. If any of hedged
requests fails with a status code that is not present in this list, all
outstanding requests are canceled and the response is returned to the
application.Hedging policy
hedgingPolicy
configuration in the Service Config, the original RPC is sent immediately, as
with a standard non-hedged call. After hedgingDelay
has elapsed without a
successful response, the second RPC will be issued. If neither RPC has received
a response after hedgingDelay
has elapsed again, a third RPC is sent, and so
on, up to maxAttempts
. gRPC call deadlines apply to the entire chain of hedged
requests. Once the deadline has passed, the operation fails regardless of
in-flight RPCS, and regardless of the hedging configuration.nonFatalStatusCodes
field) is received from a hedged request, then the next
hedged request in line is sent immediately, shortcutting its hedging delay. If
any other status code is received, all outstanding RPCs are canceled and the
error is returned to the client application layer.Throttling Hedged RPCs
RetryThrottlingPolicy
message. The throttling configuration contains the
following:"retryThrottling": {
"maxTokens": 10,
"tokenRatio": 0.1
}
token_count
which is
initially set to max_tokens
. Every outgoing RPC (regardless of service or
method invoked) changes token_count
as follows:token_count
by 1
.token_count
by token_ratio
.token_count
is greater than the threshold (defined
as max_tokens / 2
). If token_count
is less than or equal to the threshold,
hedged requests do not block. Instead they are canceled, and if there are no
other already-sent hedged RPCs the failure is returned to the client
application.INVALID_ARGUMENT
status code).Server Pushback
grpc-retry-pushback-ms
.
The value is an ASCII encoded signed 32-bit integer with no unnecessary leading
zeros that represents how many milliseconds to wait before sending the next
hedged request. If the value for pushback is negative or unparseble, then it
will be seen as the server asking the client not to retry at all.Resources
Language Support
Language Example Java Java example C++ Not yet available Go Not yet supported