RFC: Fine-grained timeout configuration
Status: Implemented
For a summarized list of proposed changes, see the Changes Checklist section.
While it is currently possible for users to implement request timeouts by racing operation send futures against timeout futures, this RFC proposes a more ergonomic solution that would also enable users to set timeouts for things like TLS negotiation and "time to first byte".
Terminology
There's a lot of terminology to define, so I've broken it up into three sections.
General terms
- Smithy Client: An `aws_smithy_client::Client<C, M, R>` struct that is responsible for gluing together the connector, middleware, and retry policy. This is not generated and lives in the `aws-smithy-client` crate.
- Fluent Client: A code-generated `Client<C, M, R>` that has methods for each service operation on it. A fluent builder is generated alongside it to make construction easier.
- AWS Client: A specialized Fluent Client that defaults to using a `DynConnector`, `AwsMiddleware`, and `Standard` retry policy.
- Shared Config: An `aws_types::Config` struct that is responsible for storing shared configuration data that is used across all services. This is not generated and lives in the `aws-types` crate.
- Service-specific Config: A code-generated `Config` that has methods for setting service-specific configuration. Each `Config` is defined in the `config` module of its parent service. For example, the S3-specific config struct is `use`able from `aws_sdk_s3::config::Config` and re-exported as `aws_sdk_s3::Config`. In this case, "service" refers to an AWS offering like S3.
HTTP stack terms
- Service: A trait defined in the `tower-service` crate. The lowest level of abstraction we deal with when making HTTP requests. Services act directly on data to transform and modify that data. A Service is what eventually turns a request into a response.
- Layer: Layers are a higher-order abstraction over services that are used to compose multiple services together, creating a new service from that combination. Nothing prevents us from manually wrapping services within services, but Layers allow us to do it in a flexible and generic manner. Layers don't directly act on data but instead can wrap an existing service with additional functionality, creating a new service. Layers can be thought of as middleware. NOTE: The use of Layers can produce compiler errors that are difficult to interpret, and defining a layer requires a large amount of boilerplate code.
- Middleware: a term with several meanings,
  - Generically speaking, middleware are similar to Services and Layers in that they modify requests and responses.
  - In the SDK, "Middleware" refers to a layer that can be wrapped around a `DispatchService`. In practice, this means that the resulting `Service` (and the inner service) must meet the bound `T: Service<operation::Request, Response=operation::Response, Error=SendOperationError>`.
    - Note: This doesn't apply to the middlewares we use when generating presigned requests because those don't wrap a `DispatchService`.
  - The most notable example of a Middleware is the AwsMiddleware. Other notable examples include MapRequest, AsyncMapRequest, and ParseResponse.
- DispatchService: The innermost part of a group of nested services. The Service that actually makes an HTTP call on behalf of a request. Responsible for parsing success and error responses.
- Connector: a term with several meanings,
  - DynConnectors (a struct that implements DynConnect) are Services with their specific type erased so that we can do dynamic dispatch.
  - A term from `hyper` for any object that implements the Connect trait. Really just an alias for tower_service::Service. Sometimes referred to as a `Connection`.
- Stage: A form of middleware that's not related to `tower`. These currently function as a way of transforming requests and don't have the ability to transform responses.
- Stack: A higher-order abstraction over Layers, defined in the `tower` crate. Layers wrap services in one another, and Stacks wrap layers within one another.
Timeout terms
- Connect Timeout: A limit on the amount of time, measured from the initial connect attempt on a socket, allowed for completing the connect handshake.
  - TODO: The runtime is based on Hyper, which reuses connections and doesn't currently have a way of guaranteeing that a fresh connection will be used for a given request.
- TLS Negotiation Timeout: A limit on the amount of time a TLS handshake takes from when the CLIENT HELLO message is sent to the time the client and server have fully negotiated ciphers and exchanged keys.
- Time to First Byte Timeout: Sometimes referred to as a "read timeout." A limit on the amount of time an application waits to read the first byte over an established, open connection after writing the request.
- HTTP Request Timeout For A Single Attempt: A limit on the amount of time between when the first byte is sent over an established, open connection and when the last byte is received from the service.
- HTTP Request Timeout For Multiple Attempts: This timeout acts like the previous timeout but constrains the total time it takes to make a request plus any retries.
  - NOTE: In a way, this is already possible in that users are free to race requests against timer futures with `futures::future::select` or to use `tokio::time::timeout`. See relevant discussion in hyper#1097.
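To make the difference between the last two timeouts concrete, here is a minimal synchronous model. It is only an illustration: `Outcome` and `simulate` are hypothetical names, no SDK types or real I/O are involved, and it assumes that an attempt which hits the per-attempt limit is retried immediately with no backoff.

```rust
use std::time::Duration;

/// Hypothetical result of a simulated call; not an SDK type.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// An attempt finished within both limits (1-based attempt number).
    Succeeded { attempt: usize },
    /// Every attempt hit the per-attempt limit.
    AttemptsExhausted,
    /// The budget shared by all attempts ran out.
    CallTimedOut,
}

/// Walks through `latencies` (how long each attempt would take if left alone)
/// and applies a per-attempt limit plus a limit on the call as a whole.
fn simulate(latencies: &[Duration], attempt_limit: Duration, call_limit: Duration) -> Outcome {
    let mut elapsed = Duration::ZERO;
    for (index, latency) in latencies.iter().enumerate() {
        // An attempt never runs longer than the per-attempt limit.
        let spent = (*latency).min(attempt_limit);
        elapsed += spent;
        // The overall limit covers the time spent on all attempts so far.
        if elapsed > call_limit {
            return Outcome::CallTimedOut;
        }
        if *latency <= attempt_limit {
            return Outcome::Succeeded { attempt: index + 1 };
        }
    }
    Outcome::AttemptsExhausted
}
```

With a 2 second attempt limit, three attempts that would each take 5 seconds consume 6 seconds in total, so a 5 second call limit cuts the call off during the third attempt even though no single attempt exceeded its own limit by much.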
Configuring timeouts
Just like with Retry Behavior Configuration, these settings can be configured in several places and have the same precedence rules (paraphrased here for clarity).
- Service-specific config builders
- Shared config builders
- Environment variables
- Profile config file (e.g., `~/.aws/credentials`)
The above list is in order of decreasing precedence. For example, configuration set in an app will override values from environment variables.
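The precedence rule can be sketched as a chain of fallbacks. `TimeoutSources` and its fields are hypothetical names used only for this illustration. They are not part of the SDK, and the sketch assumes each source has already been parsed into an `Option<Duration>`.

```rust
use std::time::Duration;

/// Hypothetical holder for one timeout value as seen by each source.
#[derive(Default)]
struct TimeoutSources {
    service_config: Option<Duration>,
    shared_config: Option<Duration>,
    environment: Option<Duration>,
    profile: Option<Duration>,
}

impl TimeoutSources {
    /// Returns the value from the highest-precedence source that set one,
    /// or `None` if no source configured this timeout.
    fn resolve(&self) -> Option<Duration> {
        self.service_config
            .or(self.shared_config)
            .or(self.environment)
            .or(self.profile)
    }
}
```

A value set only in the environment and the profile resolves to the environment's value; setting it on the service-specific config as well makes that one win.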
Configuration options
The table below details the specific ways each timeout can be configured. In all cases, valid values are non-negative floats representing the number of seconds before a timeout is triggered.
Timeout | Environment Variable | AWS Config Variable | Builder Method |
---|---|---|---|
Connect | AWS_CONNECT_TIMEOUT | connect_timeout | connect_timeout |
TLS Negotiation | AWS_TLS_NEGOTIATION_TIMEOUT | tls_negotiation_timeout | tls_negotiation_timeout |
Time To First Byte | AWS_READ_TIMEOUT | read_timeout | read_timeout |
HTTP Request - single attempt | AWS_API_CALL_ATTEMPT_TIMEOUT | api_call_attempt_timeout | api_call_attempt_timeout |
HTTP Request - all attempts | AWS_API_CALL_TIMEOUT | api_call_timeout | api_call_timeout |
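As a rough sketch of the "non-negative float of seconds" rule, a value read from an environment variable or profile could be validated like this. `parse_timeout_secs` is a hypothetical helper, not an SDK function, and treating every invalid value as "not set" (rather than reporting an error) is an assumption made for brevity.

```rust
use std::time::Duration;

/// Parses a timeout given in seconds, e.g. the value of `AWS_CONNECT_TIMEOUT`.
/// Returns `None` for anything that isn't a finite, non-negative number.
fn parse_timeout_secs(raw: &str) -> Option<Duration> {
    let secs: f64 = raw.trim().parse().ok()?;
    // `try_from_secs_f64` rejects negative, NaN, infinite, and overflowing values.
    Duration::try_from_secs_f64(secs).ok()
}
```

Note that `"0"` parses successfully here; whether a zero timeout should mean "fail immediately" or "disabled" is not something this sketch decides.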
SDK-specific defaults set by AWS service teams
QUESTION: How does the SDK currently handle these defaults?
Prior Art
- hjr3/hyper-timeout is a `Connector` for hyper that enables setting connect, read, and write timeouts
- sfackler/tokio-io-timeout provides timeouts for tokio IO operations. Used within `hyper-timeout`.
- tokio::time::sleep_until creates a `Future` that completes after some time has elapsed. Used within `tokio-io-timeout`.
Behind the scenes
Timeouts are achieved by racing a future against a `tokio::time::Sleep` future. The question, then, is "how can I create a future that represents a condition I want to watch for?". For example, in the case of a `ConnectTimeout`, how do we watch an ongoing request to see if it's completed the connect-handshake? Our current stack of Middleware acts on requests at different levels of granularity. The timeout Middlewares will be no different.
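The racing idea can be sketched without any runtime. The `Timeout` future below polls the wrapped work first and the timer second. `Timeout`, `TimedOut`, and `poll_once` are hypothetical names for this illustration; the sleep is any future resolving to `()` (with tokio it would be a `tokio::time::Sleep`), and both futures are required to be `Unpin` purely to avoid pin projection in a short example.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical error returned when the timer finishes first.
#[derive(Debug, PartialEq)]
struct TimedOut;

/// Races `inner` against `sleep`.
struct Timeout<F, S> {
    inner: F,
    sleep: S,
}

impl<F, S> Future for Timeout<F, S>
where
    F: Future + Unpin,
    S: Future<Output = ()> + Unpin,
{
    type Output = Result<F::Output, TimedOut>;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // Poll the real work first so a result that is already available wins.
        if let Poll::Ready(output) = Pin::new(&mut self.inner).poll(cx) {
            return Poll::Ready(Ok(output));
        }
        // Otherwise check whether the timer has fired.
        match Pin::new(&mut self.sleep).poll(cx) {
            Poll::Ready(()) => Poll::Ready(Err(TimedOut)),
            Poll::Pending => Poll::Pending,
        }
    }
}

/// A waker that does nothing, so the sketch can be exercised without a runtime.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls a future exactly once.
fn poll_once<Fut: Future + Unpin>(fut: &mut Fut) -> Poll<Fut::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    Pin::new(fut).poll(&mut cx)
}
```

The hard part the RFC is pointing at is not this wrapper but choosing what `inner` is: for a connect timeout it has to be a future that resolves when the handshake completes, which is only observable inside the connector.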
Middlewares for AWS Client requests
View AwsMiddleware in GitHub
    #[derive(Debug, Default)]
    #[non_exhaustive]
    pub struct AwsMiddleware;

    impl<S> tower::Layer<S> for AwsMiddleware {
        type Service = <AwsMiddlewareStack as tower::Layer<S>>::Service;

        fn layer(&self, inner: S) -> Self::Service {
            let credential_provider = AsyncMapRequestLayer::for_mapper(CredentialsStage::new());
            let signer = MapRequestLayer::for_mapper(SigV4SigningStage::new(SigV4Signer::new()));
            let endpoint_resolver = MapRequestLayer::for_mapper(AwsAuthStage);
            let user_agent = MapRequestLayer::for_mapper(UserAgentStage::new());
            // Layers added first see the request first; `inner` is the service being wrapped.
            ServiceBuilder::new()
                .layer(endpoint_resolver)
                .layer(user_agent)
                .layer(credential_provider)
                .layer(signer)
                .service(inner)
        }
    }
The above code is only included for context. This RFC doesn't define any timeouts specific to AWS so `AwsMiddleware` won't require any changes.
Middlewares for Smithy Client requests
View aws_smithy_client::Client::call_raw in GitHub
    impl<C, M, R> Client<C, M, R>
    where
        C: bounds::SmithyConnector,
        M: bounds::SmithyMiddleware<C>,
        R: retry::NewRequestPolicy,
    {
        // ...other methods omitted

        pub async fn call_raw<O, T, E, Retry>(
            &self,
            input: Operation<O, Retry>,
        ) -> Result<SdkSuccess<T>, SdkError<E>>
        where
            R::Policy: bounds::SmithyRetryPolicy<O, T, E, Retry>,
            bounds::Parsed<<M as bounds::SmithyMiddleware<C>>::Service, O, Retry>:
                Service<Operation<O, Retry>, Response=SdkSuccess<T>, Error=SdkError<E>> + Clone,
        {
            let connector = self.connector.clone();
            let mut svc = ServiceBuilder::new()
                // Create a new request-scoped policy
                .retry(self.retry_policy.new_request_policy())
                .layer(ParseResponseLayer::<O, Retry>::new())
                // These layers can be considered as occurring in order. That is, first invoke the
                // customer-provided middleware, then dispatch over the wire.
                .layer(&self.middleware)
                .layer(DispatchLayer::new())
                .service(connector);
            svc.ready().await?.call(input).await
        }
    }
The Smithy Client creates a new `Stack` of services to handle each request it sends. Specifically:
- A method `retry` is used to set the retry handler. The configuration for this was set during creation of the `Client`.
- `ParseResponseLayer` inserts a service for transforming responses into operation-specific outputs or errors. The `O` generic parameter of `input` is what decides exactly how the transformation is implemented.
- A middleware stack that was included during `Client` creation is inserted into the stack. In the case of the AWS SDK, this would be `AwsMiddleware`.
- `DispatchLayer` inserts a service for transforming an `operation::Request` into an `http::Request`. It's also responsible for re-attaching the property bag from the Operation that triggered the request.
- The innermost `Service` is a `DynConnector` wrapping a `hyper` client (which one depends on which TLS implementation was enabled by cargo features).
The HTTP Request Timeout For A Single Attempt and HTTP Request Timeout For Multiple Attempts can be implemented at this level. The same `Layer` can be used to create both `TimeoutService`s. The `TimeoutLayer` would require two inputs:
- `sleep_fn`: A runtime-specific implementation of `sleep`. The SDK is currently `tokio`-based and would default to `tokio::time::sleep` (this default is set in the `aws_smithy_async::rt::sleep` module).
- The duration of the timeout as a `std::time::Duration`
The resulting code would look like this:
    impl<C, M, R> Client<C, M, R>
    where
        C: bounds::SmithyConnector,
        M: bounds::SmithyMiddleware<C>,
        R: retry::NewRequestPolicy,
    {
        // ...other methods omitted

        pub async fn call_raw<O, T, E, Retry>(
            &self,
            input: Operation<O, Retry>,
        ) -> Result<SdkSuccess<T>, SdkError<E>>
        where
            R::Policy: bounds::SmithyRetryPolicy<O, T, E, Retry>,
            bounds::Parsed<<M as bounds::SmithyMiddleware<C>>::Service, O, Retry>:
                Service<Operation<O, Retry>, Response=SdkSuccess<T>, Error=SdkError<E>> + Clone,
        {
            let connector = self.connector.clone();
            let sleep_fn = aws_smithy_async::rt::sleep::default_async_sleep();
            let mut svc = ServiceBuilder::new()
                // Outermost layer: limits the whole call, including every retry.
                // `sleep_fn` is cloned because the per-attempt layer below needs it too.
                .layer(TimeoutLayer::new(
                    sleep_fn.clone(),
                    self.timeout_config.api_call_timeout(),
                ))
                // Create a new request-scoped policy
                .retry(self.retry_policy.new_request_policy())
                // Inside the retry layer: limits each individual attempt.
                .layer(TimeoutLayer::new(
                    sleep_fn,
                    self.timeout_config.api_call_attempt_timeout(),
                ))
                .layer(ParseResponseLayer::<O, Retry>::new())
                // These layers can be considered as occurring in order. That is, first invoke the
                // customer-provided middleware, then dispatch over the wire.
                .layer(&self.middleware)
                .layer(DispatchLayer::new())
                .service(connector);
            svc.ready().await?.call(input).await
        }
    }
Note: Our HTTP client supports multiple TLS implementations. We'll likely have to implement this feature once per library.
Timeouts will be implemented in the following places:
- HTTP request timeout for multiple requests will be implemented as the outermost Layer in `Client::call_raw`.
- HTTP request timeout for a single request will be implemented within `RetryHandler::retry`.
- Time to first byte, TLS negotiation, and connect timeouts will be implemented within the central `hyper` connector.
Changes checklist
Changes are broken into two sections:
- HTTP request timeouts (single or multiple attempts) are implementable as layers within our current stack
- Other timeouts will require changes to our dependencies and may be slower to implement
Implementing HTTP request timeouts
- Add `TimeoutConfig` to `smithy-types`
- Add `TimeoutConfigProvider` to `aws-config`
  - Add provider that fetches config from environment variables
  - Add provider that fetches config from profile
- Add `timeout` method to `aws_types::Config` for setting timeout configuration
- Add `timeout` method to generated `Config`s too
- Create a generic `TimeoutService` and accompanying `Layer`
  - `TimeoutLayer` should accept a `sleep` function so that it doesn't have a hard dependency on `tokio`
- Insert a `TimeoutLayer` before the `RetryPolicy` to handle timeouts for multiple-attempt requests
- Insert a `TimeoutLayer` after the `RetryPolicy` to handle timeouts for single-attempt requests
- Add tests for timeout behavior
  - test multi-request timeout triggers after 3 slow retries
  - test single-request timeout triggers correctly
  - test single-request timeout doesn't trigger if request completes in time