Design Overview

The AWS Rust SDK aims to provide an official, high-quality & complete interface to AWS services. We plan to eventually use the CRT to provide signing & credential management. The Rust SDK will provide first-class support for the CRT as well as Tokio & Hyper. The Rust SDK empowers advanced customers to bring their own HTTP/IO implementations.

Our design choices are guided by our Tenets.

Acknowledgments

The design builds on the learnings, ideas, hard work, and GitHub issues of the 142 Rusoto contributors & thousands of users who built this first and learned the hard way.

External API Overview

The Rust SDK is "modular" meaning that each AWS service is its own crate. Each crate provides two layers to access the service:

  1. The "fluent" API. For most use cases, a high level API that ties together connection management and serialization will be the quickest path to success.
#[tokio::main]
async fn main() {
    let client = dynamodb::Client::from_env();
    let tables = client
        .list_tables()
        .limit(10)
        .send()
        .await.expect("failed to load tables");
}
  1. The "low-level" API: It is also possible for customers to assemble the pieces themselves. This offers more control over operation construction & dispatch semantics:
#[tokio::main]
async fn main() {
    let conf = dynamodb::Config::builder().build();
    let conn = aws_hyper::Client::https();
    let operation = dynamodb::ListTables::builder()
        .limit(10)
        .build(&conf)
        .expect("invalid operation");
    let tables = conn.call(operation).await.expect("failed to list tables");
}

The Fluent API is implemented as a thin wrapper around the core API to improve ergonomics.

Internals

The Rust SDK is built on Tower Middleware, Tokio & Hyper. We're continuing to iterate on the internals to enable running the AWS SDK in other executors & HTTP stacks. As an example, you can see a demo of adding reqwest as a custom HTTP stack to gain access to its HTTP Proxy support!

For more details about the SDK internals, see Operation Design.

Code Generation

The Rust SDK is code generated from Smithy models, using Smithy code generation utilities. The code generation is written in Kotlin. More details can be found in the Smithy section.

Rust SDK Design Tenets

Unless you know better ones! These are our tenets today, but we'd love your thoughts. Do you wish we had different priorities? Let us know by opening an issue or starting a discussion.

  1. Batteries included, but replaceable. The AWS SDK for Rust should provide a best-in-class experience for many use cases, but customers will use the SDK in unique and unexpected ways. Meet customers where they are; strive to be compatible with their tools. Provide mechanisms that allow customers to make different choices.
  2. Make common problems easy to solve. The AWS SDK for Rust should make common problems solvable. Guide customers to patterns that set them up for long-term success.
  3. Design for the Future. The AWS SDK for Rust should evolve with AWS without breaking existing customers. APIs will evolve in unpredictable directions, new protocols will gain adoption, and new services will be created that we never could have imagined. Don’t simplify or unify code today that prevents evolution tomorrow.

Details, Justifications, and Ramifications

Batteries included, but replaceable.

Some customers will use the Rust SDK as their first experience with async Rust, and potentially their first experience with Rust at all. They may not be familiar with Tokio or the concept of an async executor. We are not afraid to have an opinion about the best solution for most customers.

Other customers will come to the SDK with specific requirements. Perhaps they're integrating the SDK into a much larger project that uses async_std. Maybe they need to set custom headers, modify the user agent, or audit every request. They should be able to use the Rust SDK without forking it to meet their needs.

Make common problems easy to solve

If solving a common problem isn’t obvious from the API, it should be obvious from the documentation. The SDK should guide users towards the best solutions for common tasks, first with well-named methods, second with documentation, and third with real-world usage examples. Provide misuse-resistant APIs. Async Rust has the potential to introduce subtle bugs; the Rust SDK should help customers avoid them.

Design for the Future

APIs evolve in unpredictable ways, and it's crucial that the SDK can evolve without breaking existing customers. This means designing the SDK so that fundamental changes to the internals can be made without altering the external interface we surface to customers:

  • Keep the shared core as small & opaque as possible.
  • Don’t leak our internal dependencies to customers.
  • With every design choice, consider, "Can I reverse this choice in the future?"

This may not result in DRY code, and that’s OK! Auto-generated code has different goals and tradeoffs than code that has been written by hand.

Design FAQ

What is Smithy?

Smithy is the interface design language used by AWS services. smithy-rs allows users to generate a Rust client for any Smithy-based service (pending protocol support), including those outside of AWS.

Why is there one crate per service?

  1. Compilation time: Although it's possible to use cargo features to conditionally compile individual services, we decided that this added significant complexity to the generated code. In Rust the "unit of compilation" is a Crate, so by using smaller crates we can get better compilation parallelism. Furthermore, ecosystem services like docs.rs have an upper limit on the maximum amount of time required to build an individual crate—if we packaged the entire SDK as a single crate, we would quickly exceed this limit.

  2. Versioning: It is expected that over time we may major-version-bump individual services. New updates will be pushed for some AWS service nearly every day. Maintaining separate crates allows us to only increment versions for the relevant pieces that change. See Independent Crate Versioning for more info.

Why don't the SDK service crates implement serde::Serialize or serde::Deserialize for any types?

  1. Compilation time: serde makes heavy use of several crates (proc-macro2, quote, and syn) that are very expensive to compile. Several service crates are already quite large and adding a serde dependency would increase compile times beyond what we consider acceptable. When we last checked, adding serde derives made compilation 23% slower.

  2. Misleading results: We can't use serde for serializing requests to AWS or deserializing responses from AWS because both sides of that process would require too much customization. Adding serialize/deserialize impls for operations has the potential to confuse users when they find it doesn't actually capture all the necessary information (like headers and trailers) sent in a request or received in a response.

In the future, we may add serde support behind a feature gate. However, we would only support this for operation Input and Output structs with the aim of making SDK-related tests easier to set up and run.

I want to add new request building behavior. Should I add that functionality to the make_operation codegen or write a request-altering middleware?

The main question to ask yourself in this case is "is this new behavior relevant to all services or is it only relevant to some services?"

  • If the behavior is relevant to all services: Behavior like this should be defined as a middleware. Behavior like this is often AWS-specific and may not be relevant to non-AWS smithy clients. Middlewares are defined outside of codegen. One example of behavior that should be defined as a middleware is request signing because all requests to AWS services must be signed.
  • If the behavior is only relevant to some services/depends on service model specifics: Behavior like this should be defined within make_operation. Avoid defining AWS-specific behavior within make_operation. One example of behavior that should be defined in make_operation is checksum validation because only some AWS services have APIs that support checksum validation.

"Wait a second" I hear you say, "checksum validation is part of the AWS smithy spec, not the core smithy spec. Why is that behavior defined in make_operation?" The answer is that that feature only applies to some operations and we don't want to codegen a middleware that only supports a subset of operations for a service.

Smithy

The Rust SDK uses Smithy models and code generation tooling to generate an SDK. Smithy is an open source IDL (interface design language) developed by Amazon. Although the Rust SDK uses Smithy models for AWS services, smithy-rs and Smithy models in general are not AWS specific.

Design documentation here covers both our implementation of Smithy Primitives (e.g. simple shape) as well as more complex Smithy traits like Endpoint.

Internals

Smithy introduces a few concepts that are defined here:

  1. Shape: The core Smithy primitive. A smithy model is composed of nested shapes defining an API.

  2. Symbol: A representation of a type, including namespaces and any dependencies required to use that type. A shape can be converted into a symbol by a SymbolVisitor. A SymbolVisitor maps shapes to types in your programming language (e.g. Rust). In the Rust SDK, see SymbolVisitor.kt. Symbol visitors are composable; many specific behaviors are mixed in via small & focused symbol providers, e.g. support for the streaming trait is mixed in separately.

  3. Writer: Writers are code generation primitives that collect code prior to being written to a file. Writers enable language specific helpers to be added to simplify codegen for a given language. For example, smithy-rs adds rustBlock to RustWriter to create a "Rust block" of code.

    writer.rustBlock("struct Model") {
        model.fields.forEach { field ->
            write("${field.name}: #T", field.symbol)
        }
    }
    

    This would produce something like:

    struct Model {
        field1: u32,
        field2: String
    }
  4. Generators: A Generator, e.g. StructureGenerator, UnionGenerator generates more complex Rust code from a Smithy model. Protocol generators pull these individual tools together to generate code for an entire service / protocol.

A developer's view of code generation can be found in this document.

Simple Shapes

Smithy Type      Rust Type
blob             Vec<u8>
boolean          bool
string           String
byte             i8
short            i16
integer          i32
long             i64
float            f32
double           f64
bigInteger       BigInteger (not implemented yet)
bigDecimal       BigDecimal (not implemented yet)
timestamp        DateTime
document         Document

Big Numbers

Rust has no big-number types in its standard library, nor a universally accepted large-number crate. Until one is stabilized, a string representation is a reasonable compromise:

#![allow(unused)]
fn main() {
pub struct BigInteger(String);
pub struct BigDecimal(String);
}

This will enable us to add helpers over time as requested. Users will also be able to define their own conversions into their preferred large-number libraries.
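
As a usage sketch under the string representation above (the wrapper is re-declared locally here, and the as_str accessor is hypothetical rather than part of the SDK):

pub struct BigInteger(String);

impl BigInteger {
    /// Hypothetical accessor exposing the underlying string representation.
    pub fn as_str(&self) -> &str {
        &self.0
    }
}

// A user-defined conversion into a preferred integer type; plain i128 via std here,
// but the same pattern applies to crates like num-bigint.
fn to_i128(value: &BigInteger) -> Option<i128> {
    value.as_str().parse().ok()
}

fn main() {
    let n = BigInteger("170141183460469231731687303715884105727".to_string());
    assert_eq!(to_i128(&n), Some(i128::MAX));
}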

As of 5/23/2021 BigInteger / BigDecimal are not included in AWS models. Implementation is tracked here.

Timestamps

chrono is the current de facto datetime library in Rust, but it is pre-1.0. DateTimes are represented by an SDK-defined structure modeled on std::time::Duration from the Rust standard library.

#![allow(unused)]

fn main() {
/// DateTime in time.
///
/// DateTime in time represented as seconds and sub-second nanos since
/// the Unix epoch (January 1, 1970 at midnight UTC/GMT).
///
/// This type can be converted to/from the standard library's [`SystemTime`]:
/// ```rust
/// # fn doc_fn() -> Result<(), aws_smithy_types::date_time::ConversionError> {
/// # use aws_smithy_types::date_time::DateTime;
/// # use std::time::SystemTime;
/// use std::convert::TryFrom;
///
/// let the_millennium_as_system_time = SystemTime::try_from(DateTime::from_secs(946_713_600))?;
/// let now_as_date_time = DateTime::from(SystemTime::now());
/// # Ok(())
/// # }
/// ```
///
/// The [`aws-smithy-types-convert`](https://crates.io/crates/aws-smithy-types-convert) crate
/// can be used for conversions to/from other libraries, such as
/// [`time`](https://crates.io/crates/time) or [`chrono`](https://crates.io/crates/chrono).
#[derive(PartialEq, Eq, Hash, Clone, Copy)]
pub struct DateTime {
    pub(crate) seconds: i64,
    /// Subsecond nanos always advances the wallclock time, even for times where seconds is negative
    ///
    /// Bigger subsecond nanos => later time
    pub(crate) subsecond_nanos: u32,
}

}

Functions in the aws-smithy-types-convert crate provide conversions to other crates, such as time or chrono.

Strings

Rust has two different String representations:

  • String, an owned, heap allocated string.
  • &str, a reference to a string, owned elsewhere.

In an ideal world, input shapes, where there is no reason for the strings to be owned, would use &'a str. Outputs would likely use String. However, Smithy does not provide a distinction between input and output shapes.

A third compromise could be storing Arc<String>, an atomically reference-counted pointer to a String. This may be ideal for certain advanced users, but it is likely to confuse most users and produces worse ergonomics. This is an open design area where we will seek user feedback. Rusoto uses String, and there has been one feature request to date to change that.

Current models represent strings as String.
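
One mitigation for the ergonomics concern is that builder setters can accept impl Into<String>, letting callers pass either &str or String. A small sketch with an illustrative builder (not generated code):

#[derive(Default, Debug)]
struct ListTablesInputBuilder {
    exclusive_start_table_name: Option<String>,
}

impl ListTablesInputBuilder {
    // Accepting impl Into<String> lets the caller choose between borrowed and owned input.
    fn exclusive_start_table_name(mut self, name: impl Into<String>) -> Self {
        self.exclusive_start_table_name = Some(name.into());
        self
    }
}

fn main() {
    let owned = String::from("Books");
    // A &str literal is converted; an owned String moves in without an extra copy.
    let _from_str = ListTablesInputBuilder::default().exclusive_start_table_name("Books");
    let _from_string = ListTablesInputBuilder::default().exclusive_start_table_name(owned);
}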

Document Types

Smithy defines the concept of "Document Types":

[Documents represent] protocol-agnostic open content that is accessed like JSON data. Open content is useful for modeling unstructured data that has no schema, data that can't be modeled using rigid types, or data that has a schema that evolves outside of the purview of a model. The serialization format of a document is an implementation detail of a protocol and MUST NOT have any effect on the types exposed by tooling to represent a document value.

Individual protocols define their own document serialization behavior, with some protocols such as AWS and EC2 Query not supporting document types.

Recursive Shapes

Note: Throughout this document, the word "box" always refers to a Rust Box<T>, a heap allocated pointer to T, and not the Smithy concept of boxed vs. unboxed.

Recursive shapes pose a problem for Rust, because the following Rust code will not compile:

#![allow(unused)]
fn main() {
struct TopStructure {
    intermediate: IntermediateStructure
}

struct IntermediateStructure {
    top: Option<TopStructure>
}
}
  |
3 | struct TopStructure {
  | ^^^^^^^^^^^^^^^^^^^ recursive type has infinite size
4 |     intermediate: IntermediateStructure
  |     ----------------------------------- recursive without indirection
  |
  = help: insert indirection (e.g., a `Box`, `Rc`, or `&`) at some point to make `main::TopStructure` representable

This occurs because Rust types must have a size known at compile time. The way around this, as the message suggests, is to Box the offending type; smithy-rs implements this design in RecursiveShapeBoxer.kt.

There is a touch of trickiness: only one element in the cycle needs to be boxed, but we must select it deterministically so that the same element is chosen across multiple codegen runs. To do this, the Rust SDK will:

  1. Topologically sort the graph of shapes.
  2. Identify cycles that do not pass through an existing Box, List, Set, or Map.
  3. For each cycle, select the earliest shape alphabetically & mark it as Box in the Smithy model by attaching the custom RustBoxTrait to the member.
  4. Go back to step 1.

This would produce valid Rust:

#![allow(unused)]
fn main() {
struct TopStructure {
    intermediate: IntermediateStructure
}

struct IntermediateStructure {
    top: Box<Option<TopStructure>>
}
}

Backwards Compatibility Note!

Box<T> is not generally compatible with T in Rust. There are several unlikely but valid model changes that will cause the SDK to generate code that may break customers. If these become problematic, all are avoidable with customizations.

  1. A recursive link is added to an existing structure. This causes a member that was not boxed before to become Box<T>.

    Workaround: Mark the new member as Box in a customization.

  2. A field is removed from a structure that removes the recursive dependency. The SDK would generate T instead of Box<T>.

    Workaround: Mark the member that used to be boxed as Box in a customization. The Box will be unnecessary, but we will keep it for backwards compatibility.

Aggregate Shapes

Smithy Type      Rust Type
List             Vec<Member>
Set              Vec<Member>
Map              HashMap<String, Value>
Structure        struct
Union            enum

Most generated types are controlled by SymbolVisitor.

List

List objects in Smithy are transformed into vectors in Rust. Based on the output of the NullableIndex, the generated list may be Vec<T> or Vec<Option<T>>.

Set

Because floats are not Hashable in Rust, for simplicity smithy-rs translates all sets into Vec<T> instead of HashSet<T>. In the future, a breaking change may be made to introduce a library-provided wrapper type for sets.

Map

Because keys MUST be strings in Smithy maps, we avoid the hashability issue encountered with Set. There are optimizations that could be considered (e.g. since these maps will probably never be modified); however, pending customer feedback, Smithy maps become HashMap<String, V> in Rust.

Structure

See StructureGenerator.kt for more details.

A Smithy structure becomes a struct in Rust. Backwards compatibility & usability concerns lead to a few design choices:

  1. As specified by NullableIndex, fields are Option<T> when Smithy models them as nullable.
  2. All structs are marked #[non_exhaustive]
  3. All structs implement Debug & PartialEq. Structs do not implement Eq because a float member may be added in the future.
  4. Struct fields are public. Public struct fields allow for split borrows. When working with output objects this significantly improves ergonomics, especially with optional fields.
    let out = dynamo::ListTablesOutput::new();
    out.some_field.unwrap(); // <- partial move, impossible with an accessor
  5. Builders are generated for structs that provide ergonomic and backwards compatible constructors. A builder for a struct is always available via the convenience method SomeStruct::builder()
  6. Structures manually implement debug: In order to support the sensitive trait, a Debug implementation for structures is manually generated.

Example Structure Output

Smithy Input:

@documentation("<p>Contains I/O usage metrics...")
structure IOUsage {
    @documentation("... elided")
    ReadIOs: ReadIOs,
    @documentation("... elided")
    WriteIOs: WriteIOs
}

long ReadIOs

long WriteIOs

Rust Output:

/// <p>Contains I/O usage metrics for a command that was invoked.</p>
#[non_exhaustive]
#[derive(std::clone::Clone, std::cmp::PartialEq)]
pub struct IoUsage {
    /// <p>The number of read I/O requests that the command made.</p>
    pub read_i_os: i64,
    /// <p>The number of write I/O requests that the command made.</p>
    pub write_i_os: i64,
}
impl std::fmt::Debug for IoUsage {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        let mut formatter = f.debug_struct("IoUsage");
        formatter.field("read_i_os", &self.read_i_os);
        formatter.field("write_i_os", &self.write_i_os);
        formatter.finish()
    }
}
/// See [`IoUsage`](crate::model::IoUsage)
pub mod io_usage {
    /// A builder for [`IoUsage`](crate::model::IoUsage)
    #[non_exhaustive]
    #[derive(Debug, Clone, Default)]
    pub struct Builder {
        read_i_os: std::option::Option<i64>,
        write_i_os: std::option::Option<i64>,
    }
    impl Builder {
        /// <p>The number of read I/O requests that the command made.</p>
        pub fn read_i_os(mut self, inp: i64) -> Self {
            self.read_i_os = Some(inp);
            self
        }
         /// <p>The number of read I/O requests that the command made.</p>
        pub fn set_read_i_os(mut self, inp: Option<i64>) -> Self {
            self.read_i_os = inp;
            self
        }
        /// <p>The number of write I/O requests that the command made.</p>
        pub fn write_i_os(mut self, inp: i64) -> Self {
            self.write_i_os = Some(inp);
            self
        }
        /// <p>The number of write I/O requests that the command made.</p>
        pub fn set_write_i_os(mut self, inp: Option<i64>) -> Self {
            self.write_i_os = inp;
            self
        }
        /// Consumes the builder and constructs a [`IoUsage`](crate::model::IoUsage)
        pub fn build(self) -> crate::model::IoUsage {
            crate::model::IoUsage {
                read_i_os: self.read_i_os.unwrap_or_default(),
                write_i_os: self.write_i_os.unwrap_or_default(),
            }
        }
    }
}
impl IoUsage {
    /// Creates a new builder-style object to manufacture [`IoUsage`](crate::model::IoUsage)
    pub fn builder() -> crate::model::io_usage::Builder {
        crate::model::io_usage::Builder::default()
    }
}

Union

Smithy Union is modeled as enum in Rust.

  1. Generated enums must be marked #[non_exhaustive].
  2. Generated enums must provide an Unknown variant. If parsing receives an unknown input that doesn't match any of the given union variants, Unknown should be constructed. Tracking Issue.
  3. Union members (enum variants) are not nullable, because Smithy union members cannot contain null values.
  4. When union members contain references to other shapes, we generate a wrapping variant (see below).
  5. Union members do not require #[non_exhaustive], because changing the shape targeted by a union member is not backwards compatible.
  6. is_variant and as_variant helper functions are generated to improve ergonomics.

Generated Union Example

The union generated for a simplified dynamodb::AttributeValue Smithy:

namespace test

union AttributeValue {
    @documentation("A string value")
    string: String,
    bool: Boolean,
    bools: BoolList,
    map: ValueMap
}

map ValueMap {
    key: String,
    value: AttributeValue
}

list BoolList {
    member: Boolean
}

Rust:

#[non_exhaustive]
#[derive(std::clone::Clone, std::cmp::PartialEq, std::fmt::Debug)]
pub enum AttributeValue {
    /// a string value
    String(std::string::String),
    Bool(bool),
    Bools(std::vec::Vec<bool>),
    Map(std::collections::HashMap<std::string::String, crate::model::AttributeValue>),
}

impl AttributeValue {
    pub fn as_bool(&self) -> Result<&bool, &crate::model::AttributeValue> {
        if let AttributeValue::Bool(val) = &self { Ok(&val) } else { Err(self) }
    }
    pub fn is_bool(&self) -> bool {
        self.as_bool().is_ok()
    }
    pub fn as_bools(&self) -> Result<&std::vec::Vec<bool>, &crate::model::AttributeValue> {
        if let AttributeValue::Bools(val) = &self { Ok(&val) } else { Err(self) }
    }
    pub fn is_bools(&self) -> bool {
        self.as_bools().is_ok()
    }
    pub fn as_map(&self) -> Result<&std::collections::HashMap<std::string::String, crate::model::AttributeValue>, &crate::model::AttributeValue> {
        if let AttributeValue::Map(val) = &self { Ok(&val) } else { Err(self) }
    }
    pub fn is_map(&self) -> bool {
        self.as_map().is_ok()
    }
    pub fn as_string(&self) -> Result<&std::string::String, &crate::model::AttributeValue> {
        if let AttributeValue::String(val) = &self { Ok(&val) } else { Err(self) }
    }
    pub fn is_string(&self) -> bool {
        self.as_string().is_ok()
    }
}

Backwards Compatibility

AWS SDKs require that clients can evolve in a backwards compatible way as new fields and operations are added. The types generated by smithy-rs are specifically designed to meet these requirements. Specifically, the transformations detailed in the sections below (adding a new operation, adding a new member to a structure, and adding a new union variant) must not break compilation when upgrading to a new version.

However, the following changes are not backwards compatible:

  • An error is removed from an operation.

In general, the best tool in Rust to solve these issues is the #[non_exhaustive] attribute, which is explored in detail below.

New Operation Added

Before

$version: "1"
namespace s3

service S3 {
    operations: [GetObject]
}

After

$version: "1"
namespace s3

service S3 {
    operations: [GetObject, PutObject]
}

Adding support for a new operation is backwards compatible because SDKs do not expose any sort of "service trait" that provides an interface over an entire service. This prevents clients from inheriting or implementing an interface that would be broken by the addition of a new operation.

New member added to structure

Summary

  • Structures are marked #[non_exhaustive]
  • Structures must be instantiated using builders
  • Structures must not derive Default in the event that required fields are added in the future.

In general, adding a new public member to a structure in Rust is not backwards compatible. However, by applying the #[non_exhaustive] attribute to the structures generated by the Rust SDK, the Rust compiler prevents users from using our structs in ways that would block new fields from being added in the future. Note: in this context, the optionality of the fields is irrelevant.

Specifically, #[non_exhaustive] prohibits the following patterns:

  1. Direct structure instantiation:

    fn foo() {
        let ip_addr = IpAddress { addr: "192.168.1.1" };
    }

    If a new member is_local: boolean was added to the IpAddress structure, this code would not compile. To enable users to still construct our structures while maintaining backwards compatibility, all structures expose a builder, accessible via SomeStruct::builder():

    fn foo() {
        let ip_addr = IpAddress::builder().addr("192.168.1.1").build();
    }
  2. Structure destructuring:

    fn foo() {
        let IpAddress { addr } = some_ip_addr();
    }

    This will also fail to compile if a new member is added. However, because of #[non_exhaustive], the .. multifield wildcard MUST be included when destructuring, so the pattern keeps compiling when new fields are added in the future:

    fn foo() {
        let IpAddress { addr, .. } = some_ip_addr();
    }

Validation & Required Members

Adding a required member to a structure is not considered backwards compatible. When a required member is added to a structure:

  1. The builder will change to become fallible, meaning that instead of returning T it will return Result<T, BuildError>.
  2. Previous builder invocations that did not set the new field will stop compiling if this was the first required field, because the return type of build() changes.
  3. Otherwise, previous builder invocations will now return a BuildError because the required field is unset.
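
To make the builder change concrete, here is a minimal, self-contained sketch; IpAddress, Builder, and BuildError are hypothetical stand-ins for generated code, not SDK types.

#[derive(Debug)]
pub struct IpAddress {
    pub addr: String, // a required member added in a later model version
}

#[derive(Debug)]
pub struct BuildError(&'static str);

#[derive(Default)]
pub struct Builder {
    addr: Option<String>,
}

impl Builder {
    pub fn addr(mut self, addr: impl Into<String>) -> Self {
        self.addr = Some(addr.into());
        self
    }

    /// Fallible build: returns Err when the required addr member was never set.
    pub fn build(self) -> Result<IpAddress, BuildError> {
        Ok(IpAddress {
            addr: self.addr.ok_or(BuildError("addr is required"))?,
        })
    }
}

fn main() {
    // Old call sites that never set addr still use the same builder API,
    // but now surface a BuildError at runtime instead of silently defaulting.
    assert!(Builder::default().build().is_err());
    assert!(Builder::default().addr("192.168.1.1").build().is_ok());
}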

New union variant added

Similar to structures, #[non_exhaustive] also applies to unions. In order to allow new union variants to be added in the future, all unions (enum in Rust) generated by the Rust SDK must be marked with #[non_exhaustive]. Note: because new fields cannot be added to union variants, the union variants themselves do not need to be #[non_exhaustive]. To support new variants from services, each union contains an Unknown variant. By marking Unknown as non_exhaustive, we prevent customers from instantiating it directly.

#[non_exhaustive]
#[derive(std::clone::Clone, std::cmp::PartialEq, std::fmt::Debug)]
pub enum AttributeValue {
    B(aws_smithy_types::Blob),
    Bool(bool),
    Bs(std::vec::Vec<aws_smithy_types::Blob>),
    L(std::vec::Vec<crate::model::AttributeValue>),
    M(std::collections::HashMap<std::string::String, crate::model::AttributeValue>),
    N(std::string::String),
    Ns(std::vec::Vec<std::string::String>),
    Null(bool),
    S(std::string::String),
    Ss(std::vec::Vec<std::string::String>),

    // By marking `Unknown` as non_exhaustive, we prevent client code from instantiating it directly.
    #[non_exhaustive]
    Unknown,
}
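
For illustration, client code consuming this #[non_exhaustive] union must include a wildcard arm when matching; that wildcard is what keeps existing matches compiling when the service adds new variants or when Unknown is returned. A small sketch using the AttributeValue union above:

fn describe(value: &AttributeValue) -> String {
    match value {
        AttributeValue::S(s) => format!("string: {s}"),
        AttributeValue::N(n) => format!("number: {n}"),
        // Required by #[non_exhaustive]: covers Unknown and any future variants.
        _ => "other or unrecognized attribute value".to_string(),
    }
}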

Smithy Client

smithy-rs provides the ability to generate a client whose operations are defined by a Smithy model. The documents referenced here explain aspects of the client in greater detail.

What is the orchestrator?

At a very high level, an orchestrator is a process for transforming requests into responses. Please enjoy this fancy chart:

flowchart TB
	A(Orchestrate)-->|Input|B(Request serialization)
	B-->|Transmit Request|C(Connection)
	C-->|Transmit Response|D(Response deserialization)
	D-->|Success|E("Ok(Output)")
	D-->|Unretryable Failure|F("Err(SdkError)")
	D-->|Retryable Failure|C

This process is also referred to as the "request/response lifecycle." In this example, the types of "transmit request" and "transmit response" are protocol-dependent. Typical operations use HTTP, but we plan to support other protocols like MQTT in the future.

In addition to the above steps, the orchestrator must also handle:

  • Endpoint resolution: figuring out which URL to send a request to.
  • Authentication, identity resolution, and request signing: Figuring out who is sending the request, their credentials, and how we should insert the credentials into a request.
  • Interceptors: Running lifecycle hooks at each point in the request/response lifecycle.
  • Runtime Plugins: Resolving configuration from config builders.
  • Retries: Categorizing responses from services and deciding whether to retry and how long to wait before doing so.
  • Trace Probes: A sink for events that occur during the request/response lifecycle.

How is an orchestrator configured?

While the structure of an orchestrator is fixed, the actions it takes during its lifecycle are highly configurable. Users have two ways to configure this process:

  • Runtime Plugins:
    • When can these be set? Any time before calling orchestrate.
    • When are they called by the orchestrator? In two batches, at the very beginning of orchestrate.
    • What can they do?
      • They can set configuration to be used by the orchestrator or in interceptors.
      • They can set interceptors.
    • Are they user-definable? No. At present, only smithy-rs maintainers may define these.
  • Interceptors:
    • When can these be set? Any time before calling orchestrate.
    • When are they called by the orchestrator? At each step in the request-response lifecycle.
    • What can they do?
      • They can set configuration to be used by the orchestrator or in interceptors.
      • They can log information.
      • Depending on when they're run, they can modify the input, transmit request, transmit response, and the output/error.
    • Are they user-definable? Yes.

Configuration for a request is constructed by runtime plugins just after calling orchestrate. Configuration is stored in a ConfigBag: a hash map keyed on a type's TypeId (an opaque value, managed by the Rust compiler, that identifies a type).
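
As a minimal illustration of that idea using only the standard library (the real ConfigBag is layered and has many more features than this sketch), a TypeId-keyed map looks roughly like the following; Bag and MaxAttempts are illustrative names:

use std::any::{Any, TypeId};
use std::collections::HashMap;

// A TypeId-keyed map: each concrete type has at most one entry.
#[derive(Default)]
struct Bag {
    entries: HashMap<TypeId, Box<dyn Any + Send + Sync>>,
}

impl Bag {
    fn set<T: Any + Send + Sync>(&mut self, value: T) {
        self.entries.insert(TypeId::of::<T>(), Box::new(value));
    }

    fn get<T: Any + Send + Sync>(&self) -> Option<&T> {
        self.entries
            .get(&TypeId::of::<T>())
            .and_then(|boxed| boxed.downcast_ref::<T>())
    }
}

#[derive(Debug, PartialEq)]
struct MaxAttempts(u32); // an illustrative config value

fn main() {
    let mut bag = Bag::default();
    bag.set(MaxAttempts(3));
    assert_eq!(bag.get::<MaxAttempts>(), Some(&MaxAttempts(3)));
}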

What does the orchestrator do?

The orchestrator's work is divided into four phases:

NOTE: If an interceptor fails, then the other interceptors for that lifecycle event are still run. All resulting errors are collected and emitted together.

  1. Building the ConfigBag and mounting interceptors.
    • This phase is fallible.
    • An interceptor context is created. This will hold request and response objects, making them available to interceptors.
    • All runtime plugins set at the client-level are run. These plugins can set config and mount interceptors. Any "read before execution" interceptors that have been set get run.
    • All runtime plugins set at the operation-level are run. These plugins can also set config and mount interceptors. Any new "read before execution" interceptors that have been set get run.
  2. Request Construction
    • This phase is fallible.
    • The "read before serialization" and "modify before serialization" interceptors are called.
    • The input is serialized into a transmit request.
    • The "read after serialization" and "modify before retry loop" interceptors are called.
    • Before making an attempt, the retry handler is called to check if an attempt should be made. The retry handler makes this decision for an initial attempt as well as for the retry attempts. If an initial attempt should be made, then the orchestrator enters the Dispatch phase. Otherwise, a throttling error is returned.
  3. Request Dispatch
    • This phase is fallible. This phase's tasks are performed in a loop. Retryable request failures will be retried, and unretryable failures will end the loop.
    • The "read before attempt" interceptors are run.
    • An endpoint is resolved according to an endpoint resolver. The resolved endpoint is then applied to the transmit request.
    • The "read before signing" and "modify before signing" interceptors are run.
    • An identity and a signer are resolved according to an authentication resolver. The signer then signs the transmit request with the identity.
    • The "read after signing", "read before transmit", and "modify before transmit" interceptors are run.
    • The transmit request is passed into the connection, and a transmit response is received.
    • The "read after transmit", "read before deserialization", and "modify before deserialization" interceptors are run.
    • The transmit response is deserialized.
    • The "read after attempt" and "modify before attempt completion" interceptors are run.
    • The retry strategy is called to check if a retry is necessary. If a retry is required, the Dispatch phase restarts. Otherwise, the orchestrator enters the Response Handling phase.
  4. Response Handling
    • This phase is fallible.
    • The "read after deserialization" and "modify before completion" interceptors are run.
    • Events are dispatched to any trace probes that the user has set.
    • The "read after execution" interceptors are run.

At the end of all this, the response is returned. If an error occurred at any point, then the response will contain one or more errors, depending on what failed. Otherwise, the output will be returned.

How is the orchestrator implemented in Rust?

Avoiding generics at all costs

In designing the orchestrator, we sought to solve the problems we had with the original smithy client. The client made heavy use of generics, allowing for increased performance, but at the cost of increased maintenance burden and increased compile times. The Rust compiler, usually very helpful, isn't well-equipped to explain trait errors when bounds are this complex, and so the resulting client was difficult to extend. Trait aliases would have helped, but they're not (at the time of writing) available.

The type signatures for the old client and its call method:

impl<C, M, R> Client<C, M, R>
where
    C: bounds::SmithyConnector,
    M: bounds::SmithyMiddleware<C>,
    R: retry::NewRequestPolicy,
{
    pub async fn call<O, T, E, Retry>(&self, op: Operation<O, Retry>) -> Result<T, SdkError<E>>
        where
            O: Send + Sync,
            E: std::error::Error + Send + Sync + 'static,
            Retry: Send + Sync,
            R::Policy: bounds::SmithyRetryPolicy<O, T, E, Retry>,
            Retry: ClassifyRetry<SdkSuccess<T>, SdkError<E>>,
            bounds::Parsed<<M as bounds::SmithyMiddleware<C>>::Service, O, Retry>:
            Service<Operation<O, Retry>, Response=SdkSuccess<T>, Error=SdkError<E>> + Clone,
    {
        self.call_raw(op).await.map(|res| res.parsed)
    }

    pub async fn call_raw<O, T, E, Retry>(
        &self,
        op: Operation<O, Retry>,
    ) -> Result<SdkSuccess<T>, SdkError<E>>
        where
            O: Send + Sync,
            E: std::error::Error + Send + Sync + 'static,
            Retry: Send + Sync,
            R::Policy: bounds::SmithyRetryPolicy<O, T, E, Retry>,
            Retry: ClassifyRetry<SdkSuccess<T>, SdkError<E>>,
        // This bound is not _technically_ inferred by all the previous bounds, but in practice it
        // is because _we_ know that there is only implementation of Service for Parsed
        // (ParsedResponseService), and it will apply as long as the bounds on C, M, and R hold,
        // and will produce (as expected) Response = SdkSuccess<T>, Error = SdkError<E>. But Rust
        // doesn't know that -- there _could_ theoretically be other implementations of Service for
        // Parsed that don't return those same types. So, we must give the bound.
            bounds::Parsed<<M as bounds::SmithyMiddleware<C>>::Service, O, Retry>:
            Service<Operation<O, Retry>, Response=SdkSuccess<T>, Error=SdkError<E>> + Clone,
    {
        // The request/response lifecycle
    }
}

The type signature for the new orchestrate method:

pub async fn orchestrate(
    input: Input,
    runtime_plugins: &RuntimePlugins,
    // Currently, SdkError is HTTP-only. We currently use it for backwards-compatibility purposes.
    // The `HttpResponse` generic will likely be removed in the future.
) -> Result<Output, SdkError<Error, HttpResponse>> {
	// The request/response lifecycle
}

"Wait a second," I hear you ask. "I see an Input and Output there, but you're not declaring any generic type arguments. What gives?"

I'm glad you asked. Generally, when you need trait-based polymorphism but aren't willing to use generic type arguments, you must Box. Polymorphism is then achieved through dynamic dispatch instead of static dispatch, which comes with a small runtime cost.

So, what are Input and Output? They're our own special flavor of a boxed trait object.

pub type Input = TypeErasedBox;
pub type Output = TypeErasedBox;
pub type Error = TypeErasedBox;

/// A new-type around `Box<dyn Any + Send + Sync>`
#[derive(Debug)]
pub struct TypeErasedBox {
    inner: Box<dyn Any + Send + Sync>,
}

The orchestrator itself doesn't know about any concrete types. Instead, it passes boxed data between the various components of the request/response lifecycle. Individual components access data in two ways:

  1. From the ConfigBag:
  • (with an accessor) let retry_strategy = cfg.retry_strategy();
  • (with the get method) let retry_strategy = cfg.get::<Box<dyn RetryStrategy>>()
  2. From the InterceptorContext:
  • (owned) let put_object_input: PutObjectInput = ctx.take_input().unwrap().downcast().unwrap();
  • (by reference) let put_object_input = ctx.input().unwrap().downcast_ref::<PutObjectInput>().unwrap();

Users can only call ConfigBag::get or downcast a TypeErasedBox to types they have access to, which allows maintainers to ensure encapsulation. For example: a plugin writer may declare a private type, place it in the config bag, and then later retrieve it. Because the type is private, only code in the same crate/module can ever insert or retrieve it. Therefore, there's less worry that someone will depend on a hidden, internal detail and no worry they'll accidentally overwrite a type in the bag.

NOTE: When inserting values into a config bag, using one of the set_<component> methods is always preferred, as this prevents mistakes related to inserting similar, but incorrect types.

The actual code

The current implementation of orchestrate is defined here, in the aws-smithy-runtime crate. Related code can be found in the aws-smithy-runtime-api crate.

Frequently asked questions

Why can't users create and use their own runtime plugins?

We chose to hide the runtime plugin API from users because we are concerned that exposing it will cause more problems than it solves. Instead, we encourage users to use interceptors. This is because, when setting a runtime plugin, any existing runtime plugin with the same type will be replaced. For example, there can only be one retry strategy or response deserializer. Errors resulting from unintentionally overriding a plugin would be difficult for users to diagnose, and would consume valuable development time.

Why does the orchestrator exist?

The orchestrator exists because there is an AWS-internal initiative to bring the architecture of all AWS SDKs closer to one another.

Why does this document exist when there's already an orchestrator RFC?

Because RFCs become outdated as designs evolve. It is our intention to keep this document up to date with our current implementation.

Identity and Auth in Clients

The Smithy specification establishes several auth related modeling traits that can be applied to operation and service shapes. To briefly summarize:

  • The auth schemes that are supported by a service are declared on the service shape
  • Operation shapes MAY specify the subset of service-defined auth schemes they support. If none are specified, then all service-defined auth schemes are supported.

A smithy code generator MUST support at least one auth scheme for every modeled operation, but it need not support ALL modeled auth schemes.

This design document establishes how smithy-rs implements this specification.

Terminology

  • Auth: Either a shorthand that represents both of the authentication and authorization terms below, or an ambiguous representation of one of them. In this doc, this term will always refer to both.
  • Authentication: The process of proving an entity is who they claim they are, sometimes referred to as AuthN.
  • Authorization: The process of granting an authenticated entity the permission to do something, sometimes referred to as AuthZ.
  • Identity: The information required for authentication.
  • Signing: The process of attaching metadata to a request that allows a server to authenticate that request.

Overview of Smithy Client Auth

There are two stages to identity and auth:

  1. Configuration
  2. Execution

The configuration stage

First, let's establish the aspects of auth that can be configured from the model at codegen time.

  • Data
    • AuthSchemeOptionResolverParams: parameters required to resolve auth scheme options. These parameters are allowed to come from both the client config and the operation input structs.
    • AuthSchemes: a list of auth schemes that can be used to sign HTTP requests. This information comes directly from the service model.
    • AuthSchemeProperties: configuration from the auth scheme for the signer.
    • IdentityResolvers: list of available identity resolvers.
  • Implementations
    • IdentityResolver: resolves an identity for use in authentication. There can be multiple identity resolvers that need to be selected from.
    • Signer: a signing implementation that signs an HTTP request.
    • ResolveAuthSchemeOptions: resolves a list of auth scheme options for a given operation and its inputs.

As it is undocumented (at the time of writing), this document assumes that the code generator creates one service-level runtime plugin and one operation-level runtime plugin per operation, henceforth referred to as the service runtime plugin and operation runtime plugin.

The code generator emits code to add identity resolvers and HTTP auth schemes to the config bag in the service runtime plugin. It then emits code to register an interceptor in the operation runtime plugin that reads the operation input to generate the auth scheme option resolver params (which also get added to the config bag).

The execution stage

At a high-level, the process of resolving an identity and signing a request looks as follows:

  1. Retrieve the AuthSchemeOptionResolverParams from the config bag. The AuthSchemeOptionResolverParams allow client config and operation inputs to play a role in which auth scheme option is selected.
  2. Retrieve the ResolveAuthSchemeOptions impl from the config bag, and use it to resolve the auth scheme options available with the AuthSchemeOptionResolverParams. The returned auth scheme options are in priority order.
  3. Retrieve the IdentityResolvers list from the config bag.
  4. For each auth scheme option:
    1. Attempt to find an HTTP auth scheme for that auth scheme option in the config bag (from the AuthSchemes list).
    2. If an auth scheme is found:
      1. Use the auth scheme to extract the correct identity resolver from the IdentityResolvers list.
      2. Retrieve the Signer implementation from the auth scheme.
      3. Use the IdentityResolver to resolve the identity needed for signing.
      4. Sign the request with the identity, and break out of the loop from step #4.

In general, it is assumed that if an HTTP auth scheme exists for an auth scheme option, then an identity resolver also exists for that auth scheme option. Otherwise, the auth option was configured incorrectly during codegen.
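
The loop in step 4 can be sketched as follows. This is an illustrative, self-contained outline with simplified stand-in types, not the real aws-smithy-runtime-api traits or signatures:

#[derive(Debug, PartialEq)]
struct AuthSchemeId(&'static str);

struct AuthScheme {
    id: &'static str,
}

fn resolve_identity(_scheme: &AuthScheme) -> Result<String, String> {
    Ok("identity".to_string()) // stand-in for calling the scheme's IdentityResolver
}

fn sign_request(_scheme: &AuthScheme, _identity: &str) -> Result<(), String> {
    Ok(()) // stand-in for Signer::sign_http_request
}

// Steps 4.1 through 4.2.4: walk the prioritized options, find a registered scheme,
// resolve an identity, sign, and stop at the first scheme that works.
fn select_and_sign(
    options_in_priority_order: &[AuthSchemeId],
    registered_schemes: &[AuthScheme],
) -> Result<(), String> {
    for option in options_in_priority_order {
        if let Some(scheme) = registered_schemes.iter().find(|s| s.id == option.0) {
            let identity = resolve_identity(scheme)?;
            sign_request(scheme, &identity)?;
            return Ok(());
        }
    }
    Err("no auth scheme was found for any auth scheme option".to_string())
}

fn main() {
    let options = [AuthSchemeId("sigv4"), AuthSchemeId("http-bearer")];
    let schemes = [AuthScheme { id: "sigv4" }];
    assert!(select_and_sign(&options, &schemes).is_ok());
}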

How this looks in Rust

The client will use trait objects and dynamic dispatch for the IdentityResolver, Signer, and AuthSchemeOptionResolver implementations. Generics could potentially be used, but the number of generic arguments and trait bounds in the orchestrator would balloon to unmaintainable levels if each configurable implementation in it was made generic.

These traits look like this:

#[derive(Clone, Debug)]
pub struct AuthSchemeId {
    scheme_id: &'static str,
}

pub trait ResolveAuthSchemeOptions: Send + Sync + Debug {
    fn resolve_auth_scheme_options<'a>(
        &'a self,
        params: &AuthSchemeOptionResolverParams,
    ) -> Result<Cow<'a, [AuthSchemeId]>, BoxError>;
}

pub trait IdentityResolver: Send + Sync + Debug {
    fn resolve_identity(&self, config: &ConfigBag) -> BoxFallibleFut<Identity>;
}

pub trait Signer: Send + Sync + Debug {
    /// Return a signed version of the given request using the given identity.
    ///
    /// If the provided identity is incompatible with this signer, an error must be returned.
    fn sign_http_request(
        &self,
        request: &mut HttpRequest,
        identity: &Identity,
        auth_scheme_endpoint_config: AuthSchemeEndpointConfig<'_>,
        runtime_components: &RuntimeComponents,
        config_bag: &ConfigBag,
    ) -> Result<(), BoxError>;
}

IdentityResolver and Signer implementations are both given an Identity, but will need to understand what the concrete data type underlying that identity is. The Identity struct uses a Arc<dyn Any> to represent the actual identity data so that generics are not needed in the traits:

#[derive(Clone, Debug)]
pub struct Identity {
    data: Arc<dyn Any + Send + Sync>,
    expiration: Option<SystemTime>,
}

Identities can often be cached and reused across several requests, which is why the Identity uses Arc rather than Box. This also reduces the allocations required. The signer implementations will use downcasting to access the identity data types they understand. For example, with AWS SigV4, it might look like the following:

fn sign_http_request(
    &self,
    request: &mut HttpRequest,
    identity: &Identity,
    auth_scheme_endpoint_config: AuthSchemeEndpointConfig<'_>,
    runtime_components: &RuntimeComponents,
    config_bag: &ConfigBag,
) -> Result<(), BoxError> {
    let aws_credentials = identity.data::<Credentials>()
        .ok_or_else(|| "The SigV4 signer requires AWS credentials")?;
    let secret_key = &aws_credentials.secret_access_key;
    // -- snip --
}

Also note that identity data structs are expected to censor their own sensitive fields, as Identity implements the automatically derived Debug trait.

Challenges with this Identity design

A keen observer would note that there is an expiration field on Identity, and may ask, "what about non-expiring identities?" This is the result of a limitation of Box<dyn Any>: it can only be downcast to concrete types. There is no way to downcast to a dyn Trait, since the information required to determine whether the concrete type implements that trait is lost at compile time (a std::any::TypeId only encodes information about the concrete type).

In an ideal world, it would be possible to extract the expiration like this:

pub trait ExpiringIdentity {
    fn expiration(&self) -> SystemTime;
}

let identity: Identity = some_identity();
if let Some(expiration) = identity.data::<&dyn ExpiringIdentity>().map(ExpiringIdentity::expiration) {
    // make a decision based on that expiration
}

Theoretically, you should be able to save off additional type information alongside the Box<dyn Any> and use unsafe code to transmute to known traits, but this is difficult to implement in practice, and it adds otherwise avoidable unsafe code to a security-critical part of the SDK.

The expiration field is a special case that is allowed onto the Identity struct directly since identity cache implementations will always need to be aware of this piece of information, and having it as an Option still allows for non-expiring identities.

Ultimately, this design constrains Signer implementations to concrete types. There is no world where a Signer can operate across multiple unknown identity data types via a trait, and that should be OK, since the signer implementation can always be wrapped with an implementation that is aware of the concrete type provided by the identity resolver and can do any necessary conversions.

Detailed Error Explanations

This page collects detailed explanations for some errors. If you encounter an error and are interested in learning more about what it means and why it occurs, check here.

If you can't find the explanation on this page, please file an issue asking for it to be added.

"Connection encountered an issue and should not be re-used. Marking it for closure"

The SDK clients each maintain their own connection pool (except when they share an HttpClient). By the convention of some services, when a request fails due to a transient error, that connection should not be re-used for a retry. Instead, it should be dropped and a new connection created. This prevents clients from repeatedly sending requests over a failed connection.

This feature is referred to as "connection poisoning" internally.

Transient Errors

When requests to a service time out, or when a service responds with a 500, 502, 503, or 504 error, it's considered a 'transient error'. Transient errors are often resolved by making another request.

When retrying transient errors, the SDKs may avoid re-using connections to overloaded or otherwise unavailable service endpoints, choosing instead to establish a new connection. This behavior is referred to internally as "connection poisoning" and is configurable.

To configure this behavior, set the reconnect_mode in an SDK client config's RetryConfig.
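
A hedged sketch of that configuration follows; the module paths, method name, and enum variant below are assumptions about the aws-smithy-types API rather than something confirmed by this document:

// NOTE: exact crate paths, method names, and variants here are assumptions.
use aws_smithy_types::retry::{ReconnectMode, RetryConfig};

fn retry_config_keeping_connections() -> RetryConfig {
    RetryConfig::standard()
        // Keep pooled connections even after transient errors, instead of the
        // default reconnect-on-transient-error ("connection poisoning") behavior.
        .with_reconnect_mode(ReconnectMode::ReuseAllConnections)
}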

Smithy Server

Smithy Rust provides the ability to generate a server whose operations are provided by the customer.

Middleware

The following document provides a brief survey of the various positions at which middleware can be inserted in Smithy Rust.

We use the Pokémon service as a reference model throughout.

/// A Pokémon species forms the basis for at least one Pokémon.
@title("Pokémon Species")
resource PokemonSpecies {
    identifiers: {
        name: String
    },
    read: GetPokemonSpecies,
}

/// A user's current Pokémon storage.
resource Storage {
    identifiers: {
        user: String
    },
    read: GetStorage,
}

/// The Pokémon Service allows you to retrieve information about Pokémon species.
@title("Pokémon Service")
@restJson1
service PokemonService {
    version: "2021-12-01",
    resources: [PokemonSpecies, Storage],
    operations: [
        GetServerStatistics,
        DoNothing,
        CapturePokemon,
        CheckHealth
    ],
}

Introduction to Tower

Smithy Rust is built on top of tower.

Tower is a library of modular and reusable components for building robust networking clients and servers.

The tower library is centered around two main interfaces, the Service trait and the Layer trait.

The Service trait can be thought of as an asynchronous function from a request to a response, async fn(Request) -> Result<Response, Error>, coupled with a mechanism to handle back pressure, while the Layer trait can be thought of as a way of decorating a Service, transforming either the request or response.

Middleware in tower typically conforms to the following pattern, a Service implementation of the form

#![allow(unused)]
fn main() {
pub struct NewService<S> {
    inner: S,
    /* auxiliary data */
}
}

and a complementary

#![allow(unused)]
fn main() {
extern crate tower;
pub struct NewService<S> { inner: S }
use tower::{Layer, Service};

pub struct NewLayer {
    /* auxiliary data */
}

impl<S> Layer<S> for NewLayer {
    type Service = NewService<S>;

    fn layer(&self, inner: S) -> Self::Service {
        NewService {
            inner,
            /* auxiliary fields */
        }
    }
}
}

The NewService modifies the behavior of the inner Service S while the NewLayer takes auxiliary data and constructs NewService<S> from S.

Customers are then able to stack middleware by composing Layers using combinators such as ServiceBuilder::layer and Stack.
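
A self-contained sketch of that stacking, assuming tower's util feature and a Tokio runtime; LogLayer and AuthLayer are illustrative no-op placeholders rather than Smithy-provided middleware:

use tower::{service_fn, Layer, Service, ServiceBuilder, ServiceExt};

// Two no-op layers standing in for real middleware.
struct LogLayer;
impl<S> Layer<S> for LogLayer {
    type Service = S;
    fn layer(&self, inner: S) -> S {
        inner
    }
}

struct AuthLayer;
impl<S> Layer<S> for AuthLayer {
    type Service = S;
    fn layer(&self, inner: S) -> S {
        inner
    }
}

#[tokio::main]
async fn main() {
    // Layers compose outside-in: a request passes through LogLayer, then AuthLayer,
    // before reaching the innermost service.
    let mut svc = ServiceBuilder::new()
        .layer(LogLayer)
        .layer(AuthLayer)
        .service(service_fn(|req: String| async move {
            Ok::<_, std::convert::Infallible>(req)
        }));

    let response = svc.ready().await.unwrap().call("ping".to_string()).await.unwrap();
    assert_eq!(response, "ping");
}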

Applying Middleware

One of the primary goals is to provide configurability and extensibility through the application of middleware. The customer is able to apply Layers in a variety of key places during the request/response lifecycle. The following schematic labels each configurable middleware position from A to D:

stateDiagram-v2
    state in <<fork>>
    state "GetPokemonSpecies" as C1
    state "GetStorage" as C2
    state "DoNothing" as C3
    state "..." as C4
    direction LR
    [*] --> in : HTTP Request
    UpgradeLayer --> [*]: HTTP Response
    state A {
        state PokemonService {
            state RoutingService {
                in --> UpgradeLayer: HTTP Request
                in --> C2: HTTP Request
                in --> C3: HTTP Request
                in --> C4: HTTP Request
                state B {
                    state C1 {
                        state C {
                            state UpgradeLayer {
                                direction LR
                                [*] --> Handler: Model Input
                                Handler --> [*] : Model Output
                                state D {
                                    Handler
                                }
                            }
                        }
                    }
                    C2
                    C3
                    C4
                }
            }
        }
    }
    C2 --> [*]: HTTP Response
    C3 --> [*]: HTTP Response
    C4 --> [*]: HTTP Response

where UpgradeLayer is the Layer converting Smithy model structures to HTTP structures and the RoutingService is responsible for routing requests to the appropriate operation.

A. Outer Middleware

The output of the Smithy service builder provides the user with a Service<http::Request, Response = http::Response> implementation. A Layer can be applied around the entire Service.

#![allow(unused)]
fn main() {
extern crate aws_smithy_http_server;
extern crate pokemon_service_server_sdk;
extern crate tower;
use std::time::Duration;
struct TimeoutLayer;
impl TimeoutLayer { fn new(t: Duration) -> Self { Self }}
impl<S> Layer<S> for TimeoutLayer { type Service = S; fn layer(&self, svc: S) -> Self::Service { svc } }
use pokemon_service_server_sdk::{input::*, output::*, error::*};
let handler = |req: GetPokemonSpeciesInput| async { Result::<GetPokemonSpeciesOutput, GetPokemonSpeciesError>::Ok(todo!()) };
use aws_smithy_http_server::protocol::rest_json_1::{RestJson1, router::RestRouter};
use aws_smithy_http_server::routing::{Route, RoutingService};
use pokemon_service_server_sdk::{PokemonServiceConfig, PokemonService};
use tower::Layer;

let config = PokemonServiceConfig::builder().build();

// This is a HTTP `Service`.
let app = PokemonService::builder(config)
    .get_pokemon_species(handler)
    /* ... */
    .build()
    .unwrap();
let app: PokemonService<RoutingService<RestRouter<Route>, RestJson1>>  = app;

// Construct `TimeoutLayer`.
let timeout_layer = TimeoutLayer::new(Duration::from_secs(3));

// Apply a 3 second timeout to all responses.
let app = timeout_layer.layer(app);
}

B. Route Middleware

A single layer can be applied to all routes inside the Router. This exists as a method on the PokemonServiceConfig builder object, which is passed into the service builder.

#![allow(unused)]
fn main() {
extern crate tower;
extern crate pokemon_service_server_sdk;
extern crate aws_smithy_http_server;
use tower::{util::service_fn, Layer};
use std::time::Duration;
use aws_smithy_http_server::protocol::rest_json_1::{RestJson1, router::RestRouter};
use aws_smithy_http_server::routing::{Route, RoutingService};
use pokemon_service_server_sdk::{input::*, output::*, error::*};
let handler = |req: GetPokemonSpeciesInput| async { Result::<GetPokemonSpeciesOutput, GetPokemonSpeciesError>::Ok(todo!()) };
struct MetricsLayer;
impl MetricsLayer { pub fn new() -> Self { Self } }
impl<S> Layer<S> for MetricsLayer { type Service = S; fn layer(&self, svc: S) -> Self::Service { svc } }
use pokemon_service_server_sdk::{PokemonService, PokemonServiceConfig};

// Construct `MetricsLayer`.
let metrics_layer = MetricsLayer::new();

let config = PokemonServiceConfig::builder().layer(metrics_layer).build();

let app = PokemonService::builder(config)
    .get_pokemon_species(handler)
    /* ... */
    .build()
    .unwrap();
let app: PokemonService<RoutingService<RestRouter<Route>, RestJson1>>  = app;
}

Note that requests pass through this middleware immediately after routing succeeds, and therefore will not encounter it if routing fails. This means that the MetricsLayer in the example above does not observe requests which fail to route. This contrasts with position A, where all requests and responses pass through the middleware when entering and leaving the service.

C. Operation Specific HTTP Middleware

A "HTTP layer" can be applied to specific operations.

#![allow(unused)]
fn main() {
extern crate tower;
extern crate pokemon_service_server_sdk;
extern crate aws_smithy_http_server;
use tower::{util::service_fn, Layer};
use std::time::Duration;
use pokemon_service_server_sdk::{operation_shape::GetPokemonSpecies, input::*, output::*, error::*};
use aws_smithy_http_server::protocol::rest_json_1::{RestJson1, router::RestRouter};
use aws_smithy_http_server::routing::{Route, RoutingService};
use aws_smithy_http_server::{operation::OperationShapeExt, plugin::*, operation::*};
let handler = |req: GetPokemonSpeciesInput| async { Result::<GetPokemonSpeciesOutput, GetPokemonSpeciesError>::Ok(todo!()) };
struct LoggingLayer;
impl LoggingLayer { pub fn new() -> Self { Self } }
impl<S> Layer<S> for LoggingLayer { type Service = S; fn layer(&self, svc: S) -> Self::Service { svc } }
use pokemon_service_server_sdk::{PokemonService, PokemonServiceConfig, scope};

scope! {
    /// Only log on `GetPokemonSpecies` and `GetStorage`
    struct LoggingScope {
        includes: [GetPokemonSpecies, GetStorage]
    }
}

// Construct `LoggingLayer`.
let logging_plugin = LayerPlugin(LoggingLayer::new());
let logging_plugin = Scoped::new::<LoggingScope>(logging_plugin);
let http_plugins = HttpPlugins::new().push(logging_plugin);

let config = PokemonServiceConfig::builder().http_plugin(http_plugins).build();

let app = PokemonService::builder(config)
    .get_pokemon_species(handler)
    /* ... */
    .build()
    .unwrap();
let app: PokemonService<RoutingService<RestRouter<Route>, RestJson1>>  = app;
}

This middleware transforms the operation's HTTP requests and responses.

D. Operation Specific Model Middleware

A "model layer" can be applied to specific operations.

#![allow(unused)]
fn main() {
extern crate tower;
extern crate pokemon_service_server_sdk;
extern crate aws_smithy_http_server;
use tower::{util::service_fn, Layer};
use pokemon_service_server_sdk::{operation_shape::GetPokemonSpecies, input::*, output::*, error::*};
let handler = |req: GetPokemonSpeciesInput| async { Result::<GetPokemonSpeciesOutput, GetPokemonSpeciesError>::Ok(todo!()) };
use aws_smithy_http_server::{operation::*, plugin::*};
use aws_smithy_http_server::protocol::rest_json_1::{RestJson1, router::RestRouter};
use aws_smithy_http_server::routing::{Route, RoutingService};
struct BufferLayer;
impl BufferLayer { pub fn new(size: usize) -> Self { Self } }
impl<S> Layer<S> for BufferLayer { type Service = S; fn layer(&self, svc: S) -> Self::Service { svc } }
use pokemon_service_server_sdk::{PokemonService, PokemonServiceConfig, scope};

scope! {
    /// Only buffer on `GetPokemonSpecies` and `GetStorage`
    struct BufferScope {
        includes: [GetPokemonSpecies, GetStorage]
    }
}

// Construct `BufferLayer`.
let buffer_plugin = LayerPlugin(BufferLayer::new(3));
let buffer_plugin = Scoped::new::<BufferScope>(buffer_plugin);
let config = PokemonServiceConfig::builder().model_plugin(buffer_plugin).build();

let app = PokemonService::builder(config)
    .get_pokemon_species(handler)
    /* ... */
    .build()
    .unwrap();
let app: PokemonService<RoutingService<RestRouter<Route>, RestJson1>>  = app;
}

In contrast to position C, this middleware transforms the operation's modelled inputs and modelled outputs.

Plugin System

Suppose we want to apply a different Layer to every operation. In this case, position B (PokemonService::layer) will not suffice because it applies a single Layer to all routes, and while position C (Operation::layer) would work, it would require the customer to construct the Layer by hand for every operation.

Consider the following middleware:

#![allow(unused)]
fn main() {
extern crate aws_smithy_http_server;
extern crate tower;
use aws_smithy_http_server::shape_id::ShapeId;
use std::task::{Context, Poll};
use tower::Service;

/// A [`Service`] that adds a print log.
pub struct PrintService<S> {
    inner: S,
    operation_id: ShapeId,
    service_id: ShapeId
}

impl<R, S> Service<R> for PrintService<S>
where
    S: Service<R>,
{
    type Response = S::Response;
    type Error = S::Error;
    type Future = S::Future;

    fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
        self.inner.poll_ready(cx)
    }

    fn call(&mut self, req: R) -> Self::Future {
        println!("Hi {} in {}", self.operation_id.name(), self.service_id.name());
        self.inner.call(req)
    }
}
}

The plugin system provides a way to construct and then apply Layers in positions C and D, using the protocol and operation shape as parameters.

An example of a PrintPlugin which prints the operation name:

#![allow(unused)]
fn main() {
extern crate aws_smithy_http_server;
use aws_smithy_http_server::shape_id::ShapeId;
pub struct PrintService<S> { inner: S, operation_id: ShapeId, service_id: ShapeId }
use aws_smithy_http_server::{plugin::Plugin, operation::OperationShape, service::ServiceShape};

/// A [`Plugin`] for a service builder to add a [`PrintService`] over operations.
#[derive(Debug)]
pub struct PrintPlugin;

impl<Ser, Op, T> Plugin<Ser, Op, T> for PrintPlugin
where
    Ser: ServiceShape,
    Op: OperationShape,
{
    type Output = PrintService<T>;

    fn apply(&self, inner: T) -> Self::Output {
        PrintService {
            inner,
            operation_id: Op::ID,
            service_id: Ser::ID,
        }
    }
}
}

You can provide a custom method to add your plugin to a collection of HttpPlugins or ModelPlugins via an extension trait. For example, for HttpPlugins:

#![allow(unused)]
fn main() {
extern crate aws_smithy_http_server;
pub struct PrintPlugin;
impl aws_smithy_http_server::plugin::HttpMarker for PrintPlugin { }
use aws_smithy_http_server::plugin::{HttpPlugins, PluginStack};

/// This provides a [`print`](PrintExt::print) method on [`HttpPlugins`].
pub trait PrintExt<ExistingPlugins> {
    /// Causes all operations to print the operation name when called.
    ///
    /// This works by applying the [`PrintPlugin`].
    fn print(self) -> HttpPlugins<PluginStack<PrintPlugin, ExistingPlugins>>;
}

impl<ExistingPlugins> PrintExt<ExistingPlugins> for HttpPlugins<ExistingPlugins> {
    fn print(self) -> HttpPlugins<PluginStack<PrintPlugin, ExistingPlugins>> {
        self.push(PrintPlugin)
    }
}
}

This allows for:

#![allow(unused)]
fn main() {
extern crate pokemon_service_server_sdk;
extern crate aws_smithy_http_server;
use aws_smithy_http_server::plugin::{PluginStack, Plugin};
struct PrintPlugin;
impl<Ser, Op, T> Plugin<Ser, Op, T> for PrintPlugin { type Output = T; fn apply(&self, svc: T) -> Self::Output { svc }}
impl aws_smithy_http_server::plugin::HttpMarker for PrintPlugin { }
trait PrintExt<EP> { fn print(self) -> HttpPlugins<PluginStack<PrintPlugin, EP>>; }
impl<EP> PrintExt<EP> for HttpPlugins<EP> { fn print(self) -> HttpPlugins<PluginStack<PrintPlugin, EP>> { self.push(PrintPlugin) }}
use pokemon_service_server_sdk::{operation_shape::GetPokemonSpecies, input::*, output::*, error::*};
let handler = |req: GetPokemonSpeciesInput| async { Result::<GetPokemonSpeciesOutput, GetPokemonSpeciesError>::Ok(todo!()) };
use aws_smithy_http_server::protocol::rest_json_1::{RestJson1, router::RestRouter};
use aws_smithy_http_server::routing::{Route, RoutingService};
use aws_smithy_http_server::plugin::{IdentityPlugin, HttpPlugins};
use pokemon_service_server_sdk::{PokemonService, PokemonServiceConfig};

let http_plugins = HttpPlugins::new()
    // [..other plugins..]
    // The custom method!
    .print();
let config = PokemonServiceConfig::builder().http_plugin(http_plugins).build();
let app /* : PokemonService<Route<B>> */ = PokemonService::builder(config)
    .get_pokemon_species(handler)
    /* ... */
    .build()
    .unwrap();
let app: PokemonService<RoutingService<RestRouter<Route>, RestJson1>>  = app;
}

The custom print method hides the details of the Plugin trait from the average consumer. They interact with the utility methods on HttpPlugins and enjoy the self-contained documentation.

Instrumentation

A Smithy Rust server uses the tracing crate to provide instrumentation. The customer is responsible for setting up a Subscriber in order to ingest and process events - Smithy Rust makes no prescription on the choice of Subscriber. A common choice is the tracing_subscriber crate, as used in the Example below.

Events are emitted and spans are opened by the aws-smithy-http-server, aws-smithy-http-server-python, and generated crates. The default target is always used (the tracing macros default to using the module path where the span or event originated as the target, but it may be overridden), and therefore spans and events can be filtered using the EnvFilter and/or Targets filters with crate and module paths.

For example,

RUST_LOG=aws_smithy_http_server=warn,aws_smithy_http_server_python=error

and

#![allow(unused)]
fn main() {
extern crate tracing_subscriber;
extern crate tracing;
use tracing_subscriber::filter;
use tracing::Level;
let filter = filter::Targets::new().with_target("aws_smithy_http_server", Level::DEBUG);
}

In general, Smithy Rust is conservative when using high-priority log levels:

  • ERROR
    • Fatal errors, resulting in the termination of the service.
    • Requires immediate remediation.
  • WARN
    • Non-fatal errors, resulting in incomplete operation.
    • Indicates service misconfiguration, transient errors, or future changes in behavior.
    • Requires inspection and remediation.
  • INFO
    • Informative events, which occur inside normal operating limits.
    • Used for large state transitions, e.g. startup/shutdown.
  • DEBUG
    • Informative and sparse events, which occur inside normal operating limits.
    • Used to debug coarse-grained progress of service.
  • TRACE
    • Informative and frequent events, which occur inside normal operating limits.
    • Used to debug fine-grained progress of service.
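As a hedged illustration of these conventions in handler code (the authenticate function and its messages are hypothetical; only the tracing macros themselves are assumed):

use tracing::{debug, trace, warn};

// Hypothetical authentication fragment following the level conventions above.
fn authenticate(passcode: Option<&str>) -> bool {
    trace!("checking passcode header");                 // TRACE: fine-grained progress
    debug!("attempting to authenticate storage user");  // DEBUG: coarse-grained progress
    match passcode {
        Some(_) => true,
        None => {
            warn!("authentication failed");             // WARN: non-fatal, incomplete operation
            false
        }
    }
}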

Spans over the Request/Response lifecycle

Smithy Rust is built on top of tower, which means that middleware can be used to encompass different periods of the lifecycle of the request and response and identify them with a span.

An open-source example of such a middleware is TraceLayer provided by the tower-http crate.

Smithy provides an out-of-the-box middleware which:

  • Opens a DEBUG level span prior to request handling, including the operation name and the request URI and headers.
  • Emits a DEBUG level event after request handling, including the response headers and status code.

This is enabled via the instrument method provided by the aws_smithy_http_server::instrumentation::InstrumentExt trait.

#![allow(unused)]
fn main() {
extern crate aws_smithy_http_server;
extern crate pokemon_service_server_sdk;
use pokemon_service_server_sdk::{operation_shape::GetPokemonSpecies, input::*, output::*, error::*};
let handler = |req: GetPokemonSpeciesInput| async { Result::<GetPokemonSpeciesOutput, GetPokemonSpeciesError>::Ok(todo!()) };
use aws_smithy_http_server::{
  instrumentation::InstrumentExt,
  plugin::{IdentityPlugin, HttpPlugins}
};
use aws_smithy_http_server::protocol::rest_json_1::{RestJson1, router::RestRouter};
use aws_smithy_http_server::routing::{Route, RoutingService};
use pokemon_service_server_sdk::{PokemonServiceConfig, PokemonService};

let http_plugins = HttpPlugins::new().instrument();
let config = PokemonServiceConfig::builder().http_plugin(http_plugins).build();
let app = PokemonService::builder(config)
  .get_pokemon_species(handler)
  /* ... */
  .build()
  .unwrap();
let app: PokemonService<RoutingService<RestRouter<Route>, RestJson1>>  = app;
}

Example

The Pokémon service example, located at /examples/pokemon-service, sets up a tracing Subscriber as follows:

#![allow(unused)]
fn main() {
extern crate tracing_subscriber;
use tracing_subscriber::{prelude::*, EnvFilter};

/// Setup `tracing::subscriber` to read the log level from RUST_LOG environment variable.
pub fn setup_tracing() {
    let format = tracing_subscriber::fmt::layer().pretty();
    let filter = EnvFilter::try_from_default_env()
        .or_else(|_| EnvFilter::try_new("info"))
        .unwrap();
    tracing_subscriber::registry().with(format).with(filter).init();
}
}

Running the Pokémon service example using

RUST_LOG=aws_smithy_http_server=debug,pokemon_service=debug cargo r

and then using cargo t to run integration tests against the server, yields the following logs:

  2022-09-27T09:13:35.372517Z DEBUG aws_smithy_http_server::instrumentation::service: response, headers: {"content-type": "application/json", "content-length": "17"}, status_code: 200 OK
    at /smithy-rs/rust-runtime/aws-smithy-http-server/src/logging/service.rs:47
    in aws_smithy_http_server::instrumentation::service::request with operation: get_server_statistics, method: GET, uri: /stats, headers: {"host": "localhost:13734"}

  2022-09-27T09:13:35.374104Z DEBUG pokemon_service: attempting to authenticate storage user
    at pokemon-service/src/lib.rs:184
    in aws_smithy_http_server::instrumentation::service::request with operation: get_storage, method: GET, uri: /pokedex/{redacted}, headers: {"passcode": "{redacted}", "host": "localhost:13734"}

  2022-09-27T09:13:35.374152Z DEBUG pokemon_service: authentication failed
    at pokemon-service/src/lib.rs:188
    in aws_smithy_http_server::instrumentation::service::request with operation: get_storage, method: GET, uri: /pokedex/{redacted}, headers: {"passcode": "{redacted}", "host": "localhost:13734"}

  2022-09-27T09:13:35.374230Z DEBUG aws_smithy_http_server::instrumentation::service: response, headers: {"content-type": "application/json", "x-amzn-errortype": "NotAuthorized", "content-length": "2"}, status_code: 401 Unauthorized
    at /smithy-rs/rust-runtime/aws-smithy-http-server/src/logging/service.rs:47
    in aws_smithy_http_server::instrumentation::service::request with operation: get_storage, method: GET, uri: /pokedex/{redacted}, headers: {"passcode": "{redacted}", "host": "localhost:13734"}

Interactions with Sensitivity

Instrumentation interacts with Smithy's sensitive trait.

Sensitive data MUST NOT be exposed in things like exception messages or log output. Application of this trait SHOULD NOT affect wire logging (i.e., logging of all data transmitted to and from servers or clients).

For this reason, Smithy runtime will never use tracing to emit events or open spans that include any sensitive data. This means that the customer can ingest all logs from aws-smithy-http-server and aws-smithy-http-server-* without fear of violating the sensitive trait.

The Smithy runtime will not, and cannot, prevent the customer violating the sensitive trait within the operation handlers and custom middleware. It is the responsibility of the customer not to violate the sensitive contract of their own model; care must be taken.

Smithy shapes can be sensitive while being coupled to the HTTP request/responses via the HTTP binding traits. This poses a risk when ingesting events which naively capture request/response information. The instrumentation middleware provided by Smithy Rust respects the sensitive trait and will replace sensitive data in its span and event with {redacted}. This feature can be seen in the Example above. For debugging purposes these redactions can be prevented using the aws-smithy-http-server feature flag, unredacted-logging.

Some examples of inadvertently leaking sensitive information:

  • Ingesting tracing events and spans from third-party crates which do not respect sensitivity.
    • A concrete example of this would be enabling events from hyper or tokio.
  • Applying middleware which ingests events including HTTP payloads or any other part of the HTTP request/response which can be bound.

Accessing Un-modelled Data

For every Smithy Operation an input, output, and optional error are specified. This in turn constrains the function signature of the handler provided to the service builder - the input to the handler must be the input specified by the operation etc.

But what if we, the customer, want to access data in the handler which is not modelled by our Smithy model? Smithy Rust provides an escape hatch in the form of the FromParts trait. In axum these are referred to as "extractors".

/// Provides a protocol aware extraction from a [`Request`]. This borrows the
/// [`Parts`], in contrast to [`FromRequest`].
pub trait FromParts<Protocol>: Sized {
    /// The type of the failures yielded by extraction attempts.
    type Rejection: IntoResponse<Protocol>;

    /// Extracts `self` from a [`Parts`] synchronously.
    fn from_parts(parts: &mut Parts) -> Result<Self, Self::Rejection>;
}

Here Parts is the struct containing all items in a http::Request except for the HTTP body.

A prolific example of a FromParts implementation is Extension<T>:

/// Generic extension type stored in and extracted from [request extensions].
///
/// This is commonly used to share state across handlers.
///
/// If the extension is missing it will reject the request with a `500 Internal
/// Server Error` response.
///
/// [request extensions]: https://docs.rs/http/latest/http/struct.Extensions.html
#[derive(Debug, Clone)]
pub struct Extension<T>(pub T);

/// The extension has not been added to the [`Request`](http::Request) or has been previously removed.
#[derive(Debug, Error)]
#[error("the `Extension` is not present in the `http::Request`")]
pub struct MissingExtension;

impl<Protocol> IntoResponse<Protocol> for MissingExtension {
    fn into_response(self) -> http::Response<BoxBody> {
        let mut response = http::Response::new(empty());
        *response.status_mut() = StatusCode::INTERNAL_SERVER_ERROR;
        response
    }
}

impl<Protocol, T> FromParts<Protocol> for Extension<T>
where
    T: Send + Sync + 'static,
{
    type Rejection = MissingExtension;

    fn from_parts(parts: &mut http::request::Parts) -> Result<Self, Self::Rejection> {
        parts.extensions.remove::<T>().map(Extension).ok_or(MissingExtension)
    }
}

This allows the service builder to accept the following handler

async fn handler(input: ModelInput, extension: Extension<SomeStruct>) -> ModelOutput {
    /* ... */
}

where ModelInput and ModelOutput are specified by the Smithy Operation and SomeStruct is a struct which has been inserted, by middleware, into the http::Request::extensions.
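For completeness, here is a hedged sketch of middleware which inserts such a value; InsertSomeStructService and SomeStruct are hypothetical names, and only http::Request::extensions_mut and the tower::Service trait are assumed:

use std::task::{Context, Poll};
use tower::Service;

// Hypothetical shared state inserted by the middleware.
#[derive(Clone)]
struct SomeStruct;

// Middleware that inserts `SomeStruct` into the request extensions so that
// `Extension<SomeStruct>` extraction succeeds in the handler.
#[derive(Clone)]
struct InsertSomeStructService<S> {
    inner: S,
    value: SomeStruct,
}

impl<S, B> Service<http::Request<B>> for InsertSomeStructService<S>
where
    S: Service<http::Request<B>>,
{
    type Response = S::Response;
    type Error = S::Error;
    type Future = S::Future;

    fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
        self.inner.poll_ready(cx)
    }

    fn call(&mut self, mut req: http::Request<B>) -> Self::Future {
        req.extensions_mut().insert(self.value.clone());
        self.inner.call(req)
    }
}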

Up to 32 structures implementing FromParts can be provided to the handler with the constraint that they must be provided after the ModelInput:

async fn handler(input: ModelInput, ext1: Extension<SomeStruct1>, ext2: Extension<SomeStruct2>, other: Other /* : FromParts */, /* ... */) -> ModelOutput {
    /* ... */
}

Note that the parts.extensions.remove::<T>() in Extension::from_parts means that passing multiple Extension<SomeStruct> arguments of the same type to the handler will cause extraction to fail after the first. The first extraction failure to occur is serialized via the IntoResponse trait (notice type Rejection: IntoResponse<Protocol>) and returned.

The FromParts trait is public, so customers have the ability to specify their own implementations:

struct CustomerDefined {
    /* ... */
}

impl<P> FromParts<P> for CustomerDefined {
    type Rejection = /* ... */;

    fn from_parts(parts: &mut Parts) -> Result<Self, Self::Rejection> {
        // Construct `CustomerDefined` using the request headers.
        let header_value = parts.headers.get("header-name").ok_or(/* ... */)?;
        Ok(CustomerDefined { /* ... */ })
    }
}

async fn handler(input: ModelInput, arg: CustomerDefined) -> ModelOutput {
    /* ... */
}

The Anatomy of a Service

What is Smithy? At a high level, it's a grammar for specifying services while leaving the business logic undefined. A Smithy Service specifies a collection of function signatures in the form of Operations, whose purpose is to encapsulate business logic. A Smithy implementation should, for each Smithy Service, provide a builder, which accepts functions conforming to said signatures, and returns a service subject to the semantics specified by the model.

This survey is not concerned with the actual Kotlin implementation of the code generator; it instead focuses on the structure of the generated Rust code and how it relates to the Smithy model. The intended audience is new contributors and users interested in internal details.

During the survey we will use the pokemon.smithy model as a reference:

/// A Pokémon species forms the basis for at least one Pokémon.
@title("Pokémon Species")
resource PokemonSpecies {
    identifiers: {
        name: String
    },
    read: GetPokemonSpecies,
}

/// A users current Pokémon storage.
resource Storage {
    identifiers: {
        user: String
    },
    read: GetStorage,
}

/// The Pokémon Service allows you to retrieve information about Pokémon species.
@title("Pokémon Service")
@restJson1
service PokemonService {
    version: "2021-12-01",
    resources: [PokemonSpecies, Storage],
    operations: [
        GetServerStatistics,
        DoNothing,
        CapturePokemon,
        CheckHealth
    ],
}

Smithy Rust will use this model to produce the following API:

#![allow(unused)]
fn main() {
extern crate pokemon_service_server_sdk;
extern crate aws_smithy_http_server;
use aws_smithy_http_server::protocol::rest_json_1::{RestJson1, router::RestRouter};
use aws_smithy_http_server::routing::{Route, RoutingService};
use pokemon_service_server_sdk::{input::*, output::*, error::*, operation_shape::*, PokemonServiceConfig, PokemonService};
// A handler for the `GetPokemonSpecies` operation (the `PokemonSpecies` resource).
async fn get_pokemon_species(input: GetPokemonSpeciesInput) -> Result<GetPokemonSpeciesOutput, GetPokemonSpeciesError> {
    todo!()
}

let config = PokemonServiceConfig::builder().build();

// Use the service builder to create `PokemonService`.
let pokemon_service = PokemonService::builder(config)
    // Pass the handler directly to the service builder...
    .get_pokemon_species(get_pokemon_species)
    /* other operation setters */
    .build()
    .expect("failed to create an instance of the Pokémon service");
let pokemon_service: PokemonService<RoutingService<RestRouter<Route>, RestJson1>>  = pokemon_service;
}

Operations

A Smithy Operation specifies the input, output, and possible errors of an API operation. One might characterize a Smithy Operation as syntax for specifying a function type.

We represent this in Rust using the OperationShape trait:

#![allow(unused)]
fn main() {
extern crate aws_smithy_http_server;
use aws_smithy_http_server::shape_id::ShapeId;
pub trait OperationShape {
    /// The name of the operation.
    const ID: ShapeId;

    /// The operation input.
    type Input;
    /// The operation output.
    type Output;
    /// The operation error. [`Infallible`](std::convert::Infallible) in the case where no error
    /// exists.
    type Error;
}
use aws_smithy_http_server::operation::OperationShape as OpS;
impl<T: OpS> OperationShape for T {
  const ID: ShapeId = <T as OpS>::ID;
  type Input = <T as OpS>::Input;
  type Output = <T as OpS>::Output;
  type Error = <T as OpS>::Error;
}
}

For each Smithy Operation shape,

/// Retrieve information about a Pokémon species.
@readonly
@http(uri: "/pokemon-species/{name}", method: "GET")
operation GetPokemonSpecies {
    input: GetPokemonSpeciesInput,
    output: GetPokemonSpeciesOutput,
    errors: [ResourceNotFoundException],
}

the following implementation is generated

#![allow(unused)]
fn main() {
extern crate pokemon_service_server_sdk;
extern crate aws_smithy_http_server;
use aws_smithy_http_server::{operation::OperationShape, shape_id::ShapeId};
use pokemon_service_server_sdk::{input::*, output::*, error::*};
/// Retrieve information about a Pokémon species.
pub struct GetPokemonSpecies;

impl OperationShape for GetPokemonSpecies {
    const ID: ShapeId = ShapeId::new("com.aws.example#GetPokemonSpecies", "com.aws.example", "GetPokemonSpecies");

    type Input = GetPokemonSpeciesInput;
    type Output = GetPokemonSpeciesOutput;
    type Error = GetPokemonSpeciesError;
}
}

where GetPokemonSpeciesInput, GetPokemonSpeciesOutput are both generated from the Smithy structures and GetPokemonSpeciesError is an enum generated from the errors: [ResourceNotFoundException].

Note that the GetPokemonSpecies marker structure is a zero-sized type (ZST), and therefore does not exist at runtime - it is a way to attach operation-specific data on an entity within the type system.
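For instance, a hedged sketch of a generic helper (print_operation_name is hypothetical) which reads that type-level information through the OperationShape trait:

use aws_smithy_http_server::operation::OperationShape;
use pokemon_service_server_sdk::operation_shape::GetPokemonSpecies;

// No value of `Op` is ever constructed; only the associated constant and types
// attached by `OperationShape` are used.
fn print_operation_name<Op: OperationShape>() {
    println!("operation: {}", Op::ID.name());
}

fn main() {
    print_operation_name::<GetPokemonSpecies>();
}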

The following nomenclature will aid us in our survey. We describe a tower::Service as a "model service" if its request and response are Smithy structures, as defined by the OperationShape trait - the GetPokemonSpeciesInput, GetPokemonSpeciesOutput, and GetPokemonSpeciesError described above. Similarly, we describe a tower::Service as a "HTTP service" if its request and response are http structures - http::Request and http::Response.

The constructors exist on the marker ZSTs as an extension trait to OperationShape, namely OperationShapeExt:

#![allow(unused)]
fn main() {
extern crate aws_smithy_http_server;
use aws_smithy_http_server::operation::*;
/// An extension trait over [`OperationShape`].
pub trait OperationShapeExt: OperationShape {
    /// Creates a new [`Service`] for well-formed [`Handler`]s.
    fn from_handler<H, Exts>(handler: H) -> IntoService<Self, H>
    where
        H: Handler<Self, Exts>,
        Self: Sized;

    /// Creates a new [`Service`] for well-formed [`Service`](tower::Service)s.
    fn from_service<S, Exts>(svc: S) -> Normalize<Self, S>
    where
        S: OperationService<Self, Exts>,
        Self: Sized;
}
use aws_smithy_http_server::operation::OperationShapeExt as OpS;
impl<T: OpS> OperationShapeExt for T {
  fn from_handler<H, Exts>(handler: H) -> IntoService<Self, H> where H: Handler<Self, Exts>, Self: Sized { <T as OpS>::from_handler(handler) }
  fn from_service<S, Exts>(svc: S) -> Normalize<Self, S> where S: OperationService<Self, Exts>, Self: Sized { <T as OpS>::from_service(svc) }
}
}

Observe that there are two constructors provided: from_handler which takes a H: Handler and from_service which takes a S: OperationService. In both cases Self is passed as a parameter to the traits - this constrains handler: H and svc: S to the signature given by the implementation of OperationShape on Self.

The Handler and OperationService both serve a similar purpose - they provide a common interface for converting to a model service S.

  • The Handler<GetPokemonSpecies> trait covers all async functions taking GetPokemonSpeciesInput and asynchronously returning a Result<GetPokemonSpeciesOutput, GetPokemonSpeciesError>.
  • The OperationService<GetPokemonSpecies> trait covers all tower::Services with request GetPokemonSpeciesInput, response GetPokemonSpeciesOutput, and error GetPokemonSpeciesError.

The from_handler constructor is used in the following way:

#![allow(unused)]
fn main() {
extern crate pokemon_service_server_sdk;
extern crate aws_smithy_http_server;
use pokemon_service_server_sdk::{
    input::GetPokemonSpeciesInput,
    output::GetPokemonSpeciesOutput,
    error::GetPokemonSpeciesError,
    operation_shape::GetPokemonSpecies
};
use aws_smithy_http_server::operation::OperationShapeExt;

async fn get_pokemon_service(input: GetPokemonSpeciesInput) -> Result<GetPokemonSpeciesOutput, GetPokemonSpeciesError> {
    todo!()
}

let operation = GetPokemonSpecies::from_handler(get_pokemon_service);
}

Alternatively, the from_service constructor can be used:

#![allow(unused)]
fn main() {
extern crate pokemon_service_server_sdk;
extern crate aws_smithy_http_server;
extern crate tower;
use pokemon_service_server_sdk::{
    input::GetPokemonSpeciesInput,
    output::GetPokemonSpeciesOutput,
    error::GetPokemonSpeciesError,
    operation_shape::GetPokemonSpecies
};
use aws_smithy_http_server::operation::OperationShapeExt;
use std::task::{Context, Poll};
use tower::Service;

struct Svc {
    /* ... */
}

impl Service<GetPokemonSpeciesInput> for Svc {
    type Response = GetPokemonSpeciesOutput;
    type Error = GetPokemonSpeciesError;
    type Future = /* Future<Output = Result<Self::Response, Self::Error>> */
    std::future::Ready<Result<Self::Response, Self::Error>>;

    fn poll_ready(&mut self, ctx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
        todo!()
    }

    fn call(&mut self, input: GetPokemonSpeciesInput) -> Self::Future {
        todo!()
    }
}

let svc: Svc = Svc { /* ... */ };
let operation = GetPokemonSpecies::from_service(svc);
}

To summarize, a model service can be constructed from a Handler or an OperationService, subject to the constraints of an OperationShape. More detailed information on these conversions is provided in the Handler and OperationService Rust docs.

Serialization and Deserialization

A Smithy protocol specifies the serialization/deserialization scheme: how a HTTP request is transformed into a modelled input, and a modelled output into a HTTP response. This is formalized using the FromRequest and IntoResponse traits:

#![allow(unused)]
fn main() {
extern crate aws_smithy_http_server;
extern crate http;
use aws_smithy_http_server::body::BoxBody;
use std::future::Future;
/// Provides a protocol aware extraction from a [`Request`]. This consumes the
/// [`Request`], in contrast to [`FromParts`].
pub trait FromRequest<Protocol, B>: Sized {
    type Rejection: IntoResponse<Protocol>;
    type Future: Future<Output = Result<Self, Self::Rejection>>;

    /// Extracts `self` from a [`Request`] asynchronously.
    fn from_request(request: http::Request<B>) -> Self::Future;
}

/// A protocol aware function taking `self` to [`http::Response`].
pub trait IntoResponse<Protocol> {
    /// Performs a conversion into a [`http::Response`].
    fn into_response(self) -> http::Response<BoxBody>;
}
use aws_smithy_http_server::request::FromRequest as FR;
impl<P, B, T: FR<P, B>> FromRequest<P, B> for T {
  type Rejection = <T as FR<P, B>>::Rejection;
  type Future = <T as FR<P, B>>::Future;
  fn from_request(request: http::Request<B>) -> Self::Future {
      <T as FR<P, B>>::from_request(request)
  }
}
use aws_smithy_http_server::response::IntoResponse as IR;
impl<P, T: IR<P>> IntoResponse<P> for T {
  fn into_response(self) -> http::Response<BoxBody> { <T as IR<P>>::into_response(self) }
}
}

Note that both traits are parameterized by Protocol. These protocols exist as ZST marker structs:

#![allow(unused)]
fn main() {
extern crate aws_smithy_http_server;
use aws_smithy_http_server::protocol::{
  aws_json_10::AwsJson1_0 as _,
  aws_json_11::AwsJson1_1 as _,
  rest_json_1::RestJson1 as _,
  rest_xml::RestXml as _,
};
/// [AWS REST JSON 1.0 Protocol](https://awslabs.github.io/smithy/2.0/aws/protocols/aws-restjson1-protocol.html).
pub struct RestJson1;

/// [AWS REST XML Protocol](https://awslabs.github.io/smithy/2.0/aws/protocols/aws-restxml-protocol.html).
pub struct RestXml;

/// [AWS JSON 1.0 Protocol](https://awslabs.github.io/smithy/2.0/aws/protocols/aws-json-1_0-protocol.html).
pub struct AwsJson1_0;

/// [AWS JSON 1.1 Protocol](https://awslabs.github.io/smithy/2.0/aws/protocols/aws-json-1_1-protocol.html).
pub struct AwsJson1_1;
}
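As a hedged illustration of why the Protocol parameter matters, a rejection type can be serialized differently per protocol. The sketch below, with a hypothetical TeapotRejection and a stubbed empty() body (mirroring the Extension example later in this document), implements IntoResponse only for RestJson1; a RestXml implementation could render the error differently:

use aws_smithy_http_server::{body::BoxBody, protocol::rest_json_1::RestJson1, response::IntoResponse};
use http::StatusCode;

fn empty() -> BoxBody { todo!() } // body construction stubbed, as in the `Extension` example

// Hypothetical rejection type.
struct TeapotRejection;

// Error serialization is protocol-specific, hence the `RestJson1` parameter.
impl IntoResponse<RestJson1> for TeapotRejection {
    fn into_response(self) -> http::Response<BoxBody> {
        let mut response = http::Response::new(empty());
        *response.status_mut() = StatusCode::IM_A_TEAPOT;
        response
    }
}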

Upgrading a Model Service

We can "upgrade" a model service to a HTTP service using FromRequest and IntoResponse described in the prior section:

stateDiagram-v2
    direction LR
    HttpService: HTTP Service
    [*] --> from_request: HTTP Request
    state HttpService {
        direction LR
        ModelService: Model Service
        from_request --> ModelService: Model Input
        ModelService --> into_response: Model Output
    }
    into_response --> [*]: HTTP Response

This is formalized by the Upgrade<Protocol, Op, S> HTTP service. The tower::Service implementation is approximately:

impl<P, Op, S> Service<http::Request> for Upgrade<P, Op, S>
where
    Input: FromRequest<P, B>,
    S: Service<Input>,
    S::Response: IntoResponse<P>,
    S::Error: IntoResponse<P>,
{
    async fn call(&mut self, request: http::Request) -> http::Response {
        let model_request = match <Op::Input as FromRequest<P, B>>::from_request(request).await {
            Ok(ok) => ok,
            Err(err) => return err.into_response()
        };
        let model_response = self.model_service.call(model_request).await;
        model_response.into_response()
    }
}

When we call GetPokemonSpecies::from_handler or GetPokemonSpecies::from_service, the model service produced, S, will meet the constraints above.

There is an associated Plugin, UpgradePlugin which constructs Upgrade from a service.

The upgrade procedure is finalized by the application of the Layer L, referenced in Operation<S, L>. In this way the entire upgrade procedure takes an Operation<S, L> and returns a HTTP service.

stateDiagram-v2
    direction LR
    [*] --> UpgradePlugin: HTTP Request
    state HttpPlugin {
        state UpgradePlugin {
            direction LR
            [*] --> S: Model Input
            S --> [*] : Model Output
            state ModelPlugin {
                S
            }
        }
    }
    UpgradePlugin --> [*]: HTTP Response

Note that the S is specified by logic written, in Rust, by the customer, whereas UpgradePlugin is specified entirely by the Smithy model via the protocol, HTTP bindings, etc.

Routers

Different protocols supported by Smithy enjoy different routing mechanisms, for example, AWS JSON 1.0 uses the X-Amz-Target header to select an operation, whereas AWS REST XML uses the HTTP label trait.

Despite their differences, all routing mechanisms satisfy a common interface. This is formalized using the Router trait:

#![allow(unused)]
fn main() {
extern crate aws_smithy_http_server;
extern crate http;
/// An interface for retrieving an inner [`Service`] given a [`http::Request`].
pub trait Router<B> {
    type Service;
    type Error;

    /// Matches a [`http::Request`] to a target [`Service`].
    fn match_route(&self, request: &http::Request<B>) -> Result<Self::Service, Self::Error>;
}
}

which provides the ability to determine an inner HTTP service from a collection using a &http::Request.
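To make the interface concrete, here is a hedged sketch of a toy implementation which routes purely on the URI path; PathRouter and RouteNotFound are hypothetical names, the trait is repeated locally so the sketch is self-contained, and only the http crate is assumed:

use std::collections::HashMap;

// Local copy of the `Router` interface shown above.
pub trait Router<B> {
    type Service;
    type Error;
    fn match_route(&self, request: &http::Request<B>) -> Result<Self::Service, Self::Error>;
}

// Returned when no route matches the request.
pub struct RouteNotFound;

// A toy router which selects the inner service by exact URI path.
pub struct PathRouter<S> {
    routes: HashMap<String, S>,
}

impl<S: Clone, B> Router<B> for PathRouter<S> {
    type Service = S;
    type Error = RouteNotFound;

    fn match_route(&self, request: &http::Request<B>) -> Result<Self::Service, Self::Error> {
        self.routes
            .get(request.uri().path())
            .cloned()
            .ok_or(RouteNotFound)
    }
}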

Types which implement the Router trait are converted to a HTTP service via the RoutingService struct:

/// A [`Service`] using a [`Router`] `R` to redirect messages to specific routes.
///
/// The `Protocol` parameter is used to determine the serialization of errors.
pub struct RoutingService<R, Protocol> {
    router: R,
    _protocol: PhantomData<Protocol>,
}

impl<R, P> Service<http::Request> for RoutingService<R, P>
where
    R: Router<B>,
    R::Service: Service<http::Request, Response = http::Response>,
    R::Error: IntoResponse<P> + Error,
{
    type Response = http::Response;
    type Error = /* implementation detail */;

    async fn call(&mut self, req: http::Request<B>) -> Result<Self::Response, Self::Error> {
        match self.router.match_route(&req) {
            // Successfully routed, use the routes `Service::call`.
            Ok(ok) => ok.oneshot(req).await,
            // Failed to route, use the `R::Error`s `IntoResponse<P>`.
            Err(error) => {
                debug!(%error, "failed to route");
                Err(Box::new(error.into_response()))
            }
        }
    }
}

The RoutingService is the final piece necessary to form a functioning composition - it is used to aggregate together the HTTP services, created via the upgrade procedure, into a single HTTP service which can be presented to the customer.

stateDiagram
state in <<fork>>
    direction LR
    [*] --> in
    state RouterService {
        direction LR
        in -->  ServiceA
        in --> ServiceB
        in --> ServiceC
    }
    ServiceA --> [*]
    ServiceB --> [*]
    ServiceC --> [*]

Plugins

A Plugin is a [tower::Layer] with two extra type parameters, Service and Operation, corresponding to the Smithy Service and Smithy Operation. This allows the middleware to be parameterized by them and to change behavior depending on the context in which it's applied.

#![allow(unused)]
fn main() {
extern crate aws_smithy_http_server;
pub trait Plugin<Service, Operation, T> {
    type Output;

    fn apply(&self, input: T) -> Self::Output;
}
use aws_smithy_http_server::plugin::Plugin as Pl;
impl<Ser, Op, T, U: Pl<Ser, Op, T>> Plugin<Ser, Op, T> for U {
  type Output = <U as Pl<Ser, Op, T>>::Output;
  fn apply(&self, input: T) -> Self::Output { <U as Pl<Ser, Op, T>>::apply(self, input) }
}
}

An example Plugin implementation can be found in /examples/pokemon-service/src/plugin.rs.

Plugins can be applied in two places:

  • HTTP plugins, which are applied pre-deserialization/post-serialization, acting on HTTP requests/responses.
  • Model plugins, which are applied post-deserialization/pre-serialization, acting on model inputs/outputs/errors.
stateDiagram-v2
    direction LR
    [*] --> S: HTTP Request
    state HttpPlugin {
        state UpgradePlugin {
            state ModelPlugin {
                S
            }
        }
    }
    S --> [*]: HTTP Response

The service builder API requires plugins to be specified upfront - they must be registered in the config object, which is passed as an argument to builder. Plugins cannot be modified afterwards.

You might find yourself wanting to apply multiple plugins to your service. This can be accommodated via [HttpPlugins] and [ModelPlugins].

#![allow(unused)]
fn main() {
extern crate aws_smithy_http_server;
use aws_smithy_http_server::plugin::HttpPlugins;
use aws_smithy_http_server::plugin::IdentityPlugin as LoggingPlugin;
use aws_smithy_http_server::plugin::IdentityPlugin as MetricsPlugin;

let http_plugins = HttpPlugins::new().push(LoggingPlugin).push(MetricsPlugin);
}

The plugins' runtime logic is executed in registration order. In the example above, LoggingPlugin runs first and MetricsPlugin runs last.

If you are vending a plugin, you can leverage HttpPlugins or ModelPlugins as an extension point: you can add custom methods to it using an extension trait. For example:

#![allow(unused)]
fn main() {
extern crate aws_smithy_http_server;
use aws_smithy_http_server::plugin::{HttpPlugins, PluginStack};
use aws_smithy_http_server::plugin::IdentityPlugin as LoggingPlugin;
use aws_smithy_http_server::plugin::IdentityPlugin as AuthPlugin;

pub trait AuthPluginExt<CurrentPlugins> {
    fn with_auth(self) -> HttpPlugins<PluginStack<AuthPlugin, CurrentPlugins>>;
}

impl<CurrentPlugins> AuthPluginExt<CurrentPlugins> for HttpPlugins<CurrentPlugins> {
    fn with_auth(self) -> HttpPlugins<PluginStack<AuthPlugin, CurrentPlugins>> {
        self.push(AuthPlugin)
    }
}

let http_plugins = HttpPlugins::new()
    .push(LoggingPlugin)
    // Our custom method!
    .with_auth();
}

Builders

The service builder is the primary public API, generated for every Smithy Service. At a high level, the service builder takes as input a function for each Smithy Operation and returns a single HTTP service. These functions, known as handlers, must have signatures matching the constraints of the corresponding Smithy model.

You can create an instance of a service builder by calling builder on the corresponding service struct.

#![allow(unused)]
fn main() {
extern crate aws_smithy_http_server;
use aws_smithy_http_server::routing::Route;
/// The service builder for [`PokemonService`].
///
/// Constructed via [`PokemonService::builder`].
pub struct PokemonServiceBuilder<Body, HttpPl, ModelPl> {
    capture_pokemon_operation: Option<Route<Body>>,
    empty_operation: Option<Route<Body>>,
    get_pokemon_species: Option<Route<Body>>,
    get_server_statistics: Option<Route<Body>>,
    get_storage: Option<Route<Body>>,
    health_check_operation: Option<Route<Body>>,
    http_plugin: HttpPl,
    model_plugin: ModelPl,
}
}

The builder provides several setter methods for each Smithy Operation in the Smithy Service:

    pub fn get_pokemon_species<HandlerType, HandlerExtractors, UpgradeExtractors>(self, handler: HandlerType) -> Self
    where
        HandlerType:Handler<GetPokemonSpecies, HandlerExtractors>,

        ModelPl: Plugin<
            PokemonService,
            GetPokemonSpecies,
            IntoService<GetPokemonSpecies, HandlerType>
        >,
        UpgradePlugin::<UpgradeExtractors>: Plugin<
            PokemonService,
            GetPokemonSpecies,
            ModelPlugin::Output
        >,
        HttpPl: Plugin<
            PokemonService,
            GetPokemonSpecies,
            UpgradePlugin::<UpgradeExtractors>::Output
        >,
    {
        let svc = GetPokemonSpecies::from_handler(handler);
        let svc = self.model_plugin.apply(svc);
        let svc = UpgradePlugin::<UpgradeExtractors>::new()
            .apply(svc);
        let svc = self.http_plugin.apply(svc);
        self.get_pokemon_species_custom(svc)
    }

    pub fn get_pokemon_species_service<S, ServiceExtractors, UpgradeExtractors>(self, service: S) -> Self
    where
        S: OperationService<GetPokemonSpecies, ServiceExtractors>,

        ModelPl: Plugin<
            PokemonService,
            GetPokemonSpecies,
            Normalize<GetPokemonSpecies, S>
        >,
        UpgradePlugin::<UpgradeExtractors>: Plugin<
            PokemonService,
            GetPokemonSpecies,
            ModelPlugin::Output
        >,
        HttpPl: Plugin<
            PokemonService,
            GetPokemonSpecies,
            UpgradePlugin::<UpgradeExtractors>::Output
        >,
    {
        let svc = GetPokemonSpecies::from_service(service);
        let svc = self.model_plugin.apply(svc);
        let svc = UpgradePlugin::<UpgradeExtractors>::new().apply(svc);
        let svc = self.http_plugin.apply(svc);
        self.get_pokemon_species_custom(svc)
    }

    pub fn get_pokemon_species_custom<S>(mut self, svc: S) -> Self
    where
        S: Service<Request<Body>, Response = Response<BoxBody>, Error = Infallible>,
    {
        self.get_pokemon_species = Some(Route::new(svc));
        self
    }

Handlers and operations are upgraded to a Route as soon as they are registered against the service builder. You can think of Route as a boxing layer in disguise.

You can transform a builder instance into a complete service (PokemonService) using one of the following methods:

  • build. The transformation fails if one or more operations do not have a registered handler;
  • build_unchecked. The transformation never fails, but we return 500s for all operations that do not have a registered handler.
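For instance, a minimal sketch using build_unchecked, assuming the same setup as the earlier builder examples (the handler stub is hypothetical, and build_unchecked is assumed to accept a partially configured builder):

use pokemon_service_server_sdk::{input::*, output::*, error::*, PokemonService, PokemonServiceConfig};

let handler = |req: GetPokemonSpeciesInput| async { Result::<GetPokemonSpeciesOutput, GetPokemonSpeciesError>::Ok(todo!()) };
let config = PokemonServiceConfig::builder().build();

// Only one operation is registered; any unregistered operation falls back to a
// default route returning `500 Internal Server Error`.
let app = PokemonService::builder(config)
    .get_pokemon_species(handler)
    .build_unchecked();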

Both builder methods take care of:

  1. Pairing each handler with the routing information for the corresponding operation;
  2. Collecting all (routing_info, handler) pairs into a Router;
  3. Transforming the Router implementation into a HTTP service via RoutingService;
  4. Wrapping the RoutingService in a newtype given by the service name, PokemonService.

The final outcome, an instance of PokemonService, looks roughly like this:

#![allow(unused)]
fn main() {
extern crate aws_smithy_http_server;
use aws_smithy_http_server::{routing::RoutingService, protocol::rest_json_1::{router::RestRouter, RestJson1}};
/// The Pokémon Service allows you to retrieve information about Pokémon species.
#[derive(Clone)]
pub struct PokemonService<S> {
    router: RoutingService<RestRouter<S>, RestJson1>,
}
}

The following schematic summarizes the composition:

stateDiagram-v2
    state in <<fork>>
    state "GetPokemonSpecies" as C1
    state "GetStorage" as C2
    state "DoNothing" as C3
    state "..." as C4
    direction LR
    [*] --> in : HTTP Request
    UpgradePlugin --> [*]: HTTP Response
    state PokemonService {
        state RoutingService {
            in --> UpgradePlugin: HTTP Request
            in --> C2: HTTP Request
            in --> C3: HTTP Request
            in --> C4: HTTP Request
            state C1 {
                state HttpPlugin {
                    state UpgradePlugin {
                        direction LR
                        [*] --> S: Model Input
                        S --> [*] : Model Output
                        state ModelPlugin {
                            S
                        }
                    }
                }
            }
            C2
            C3
            C4
        }

    }
    C2 --> [*]: HTTP Response
    C3 --> [*]: HTTP Response
    C4 --> [*]: HTTP Response

Accessing Unmodelled Data

An additional omitted detail is that we provide an "escape hatch" allowing Handlers and OperationServices to accept data that isn't modelled. In addition to accepting Op::Input they can accept additional arguments which implement the FromParts trait:

#![allow(unused)]
fn main() {
extern crate aws_smithy_http_server;
extern crate http;
use http::request::Parts;
use aws_smithy_http_server::response::IntoResponse;
/// Provides a protocol aware extraction from a [`Request`]. This borrows the
/// [`Parts`], in contrast to [`FromRequest`].
pub trait FromParts<Protocol>: Sized {
    /// The type of the failures yielded by extraction attempts.
    type Rejection: IntoResponse<Protocol>;

    /// Extracts `self` from a [`Parts`] synchronously.
    fn from_parts(parts: &mut Parts) -> Result<Self, Self::Rejection>;
}
use aws_smithy_http_server::request::FromParts as FP;
impl<P, T: FP<P>> FromParts<P> for T {
  type Rejection = <T as FP<P>>::Rejection;
  fn from_parts(parts: &mut Parts) -> Result<Self, Self::Rejection> { <T as FP<P>>::from_parts(parts) }
}
}

This differs from the FromRequest trait, introduced in Serialization and Deserialization, in that it's synchronous and has non-consuming access to Parts, rather than the entire Request.

pub struct Parts {
    pub method: Method,
    pub uri: Uri,
    pub version: Version,
    pub headers: HeaderMap<HeaderValue>,
    pub extensions: Extensions,
    /* private fields */
}

This is commonly used to access types stored within Extensions which have been inserted by a middleware. An Extension struct implements FromParts to support this use case:

#![allow(unused)]
fn main() {
extern crate aws_smithy_http_server;
extern crate http;
extern crate thiserror;
use aws_smithy_http_server::{body::BoxBody, request::FromParts, response::IntoResponse};
use http::status::StatusCode;
use thiserror::Error;
fn empty() -> BoxBody { todo!() }
/// Generic extension type stored in and extracted from [request extensions].
///
/// This is commonly used to share state across handlers.
///
/// If the extension is missing it will reject the request with a `500 Internal
/// Server Error` response.
///
/// [request extensions]: https://docs.rs/http/latest/http/struct.Extensions.html
#[derive(Debug, Clone)]
pub struct Extension<T>(pub T);

impl<Protocol, T> FromParts<Protocol> for Extension<T>
where
    T: Clone + Send + Sync + 'static,
{
    type Rejection = MissingExtension;

    fn from_parts(parts: &mut http::request::Parts) -> Result<Self, Self::Rejection> {
        parts.extensions.remove::<T>().map(Extension).ok_or(MissingExtension)
    }
}

/// The extension has not been added to the [`Request`](http::Request) or has been previously removed.
#[derive(Debug, Error)]
#[error("the `Extension` is not present in the `http::Request`")]
pub struct MissingExtension;

impl<Protocol> IntoResponse<Protocol> for MissingExtension {
    fn into_response(self) -> http::Response<BoxBody> {
        let mut response = http::Response::new(empty());
        *response.status_mut() = StatusCode::INTERNAL_SERVER_ERROR;
        response
    }
}
}

Generating Common Service Code

This document introduces the project and how code is being generated. It is written for developers who want to start contributing to smithy-rs.

Folder structure

The project is divided into:

  • /codegen-core: contains common code to be used for both client and server code generation
  • /codegen-client: client code generation. Depends on codegen-core
  • /codegen-server: server code generation. Depends on codegen-core
  • /aws: the AWS Rust SDK; it deals with AWS services specifically. Its folder structure mirrors the project's, with its own rust-runtime and codegen
  • /rust-runtime: the generated client and server crates may depend on crates in this folder. Crates here are not code generated. The only crate that is not published is inlineable, which contains common functions used by other crates, copied into the source crate

Crates in /rust-runtime (informally referred to as "runtime crates") are added to a crate's dependency only when used. For example, if a model uses event streams, the generated crates will depend on aws-smithy-eventstream.

Generating code

smithy-rs's entry points are Smithy code-generation plugins; it is not a standalone command. One entry point is RustCodegenPlugin::execute, which inherits from SmithyBuildPlugin in smithy-build. Code generation is written in Kotlin and shares common, non-Rust-specific code with the Smithy Java repository. The plugins are invoked through the Smithy Gradle plugin.

The comment at the beginning of execute describes what a Decorator is and uses the following terms:

  • Context: contains the model being generated, projection and settings for the build
  • Decorator: (also referred to as customizations) customizes how code is being generated. AWS services are required to sign with the SigV4 protocol, and a decorator adds Rust code to sign requests and responses. Decorators are applied in reverse order of being added and have a priority order.
  • Writer: creates files and adds content; it supports templating, using # for substitutions
  • Location: the file where a symbol will be written to

The only task of a RustCodegenPlugin is to construct a CodegenVisitor and call its execute() method.

CodegenVisitor::execute() is given a Context and decorators, and visits the shapes in the input model to generate code.

CodegenVisitor and RustCodegenPlugin have corresponding server versions, as does any component where the client and server implementations differ, such as error type generation.

Objects used throughout code generation are:

  • Symbol: a node in a graph, an abstraction that represents the qualified name of a type; symbols reference and depend on other symbols, and have some common properties among languages (such as a namespace or a definition file). For Rust, we add properties to include more metadata about a symbol, such as its type
  • RustType: Option<T>, HashMap, ... along with their namespaces of origin such as std::collections
  • RuntimeType: the information to locate a type, plus the crates it depends on
  • ShapeId: an immutable object that identifies a Shape

Useful conversions are:

SymbolProvider.toSymbol(shape)

where SymbolProvider constructs symbols for shapes. Some symbols require creating other symbols and types; event streams and other streaming shapes are an example. Symbol providers are all applied in order; if a shape uses a reserved keyword in Rust, its name is converted to a new name by a symbol provider, and all other providers will work with this new symbol.

Model.expectShape(shapeId)

Each model has a shapeId to shape map; this method returns the shape associated with this shapeId.

Some objects implement a transform method that only changes the input model, so that code generation will work on that new model. This is used, for example, to add a trait to a shape.

CodegenVisitor is a ShapeVisitor. For all services in the input model, shapes are converted into Rust; the visitor has dedicated code paths for constructing a service, a structure, and so on.

Code generation flows from writers to files, and entities are (mostly) generated only on an as-needed basis. The complete result is a Rust crate, in which all dependencies are written into their modules and lib.rs is generated. execute() ends by running cargo fmt, both to avoid having to emit correctly formatted Rust from the Writers and to ensure the generated code follows the styling rules.

RFCs

What is an RFC?: An RFC is a document that proposes a change to smithy-rs or the AWS Rust SDK. Request for Comments means a request for discussion and oversight about the future of the project from maintainers, contributors and users.

When should I write an RFC?: The AWS Rust SDK team proactively decides to write RFCs for major features or complex changes that we feel require extra scrutiny. However, the process can be used to request feedback on any change. Even changes that seem obvious and simple at first glance can be improved once a group of interested and experienced people have a chance to weigh in.

Who can submit an RFC?: An RFC can be submitted by anyone. In most cases, RFCs are authored by SDK maintainers, but everyone is welcome to submit RFCs.

Where do I start?: If you're ready to write and submit an RFC, please start a GitHub discussion with a summary of what you're trying to accomplish first. That way, the AWS Rust SDK team can ensure they have the bandwidth to review and shepherd the RFC through the whole process before you've expended effort in writing it. Once you've gotten the go-ahead, start with the RFC template.

Previously Submitted RFCs


AWS Configuration RFC

Status: Implemented. For an ordered list of proposed changes see: Proposed changes.

An AWS SDK loads configuration from multiple locations. Some of these locations can be loaded synchronously. Some are async. Others may actually use AWS services such as STS or SSO.

This document proposes an overhaul to the configuration design to facilitate three things:

  1. Future-proof: It should be easy to add additional sources of region and credentials, sync and async, from many sources, including code-generated AWS services.
  2. Ergonomic: There should be one obvious way to create an AWS service client. Customers should be able to easily customize the client to make common changes. It should encourage sharing of things that are expensive to create.
  3. Shareable: A config object should be usable to configure multiple AWS services.

Usage Guide

The following is an imagined usage guide if this RFC were implemented.

Getting Started

Using the SDK requires two crates:

  1. aws-sdk-<someservice>: The service you want to use (e.g. dynamodb, s3, sesv2)
  2. aws-config: AWS metaconfiguration. This crate contains all of the logic to load configuration for the SDK (regions, credentials, retry configuration, etc.)

Add the following to your Cargo.toml:

[dependencies]
aws-sdk-dynamo = "0.1"
aws-config = "0.5"

tokio = { version = "1", features = ["full"] }

Let's write a small example project to list tables:

use aws_sdk_dynamodb as dynamodb;

#[tokio::main]
async fn main() -> Result<(), dynamodb::Error> {
    let config = aws_config::load_from_env().await;
    let dynamodb = dynamodb::Client::new(&config);
    let resp = dynamodb.list_tables().send().await?;
    println!("my tables: {:?}", resp.tables.unwrap_or_default());
    Ok(())
}

Tip: Every AWS service exports a top level Error type (e.g. aws_sdk_dynamodb::Error). Individual operations return specific error types that contain only the error variants returned by the operation. Because all the individual errors implement Into<dynamodb::Error>, you can use dynamodb::Error as the return type along with ?.
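As a hedged example of this pattern, re-using the imagined list_tables API from the snippet above (print_tables is a hypothetical helper):

use aws_sdk_dynamodb as dynamodb;

// Returning the service-level `Error` lets `?` convert the operation-specific
// error via its `Into<dynamodb::Error>` implementation.
async fn print_tables(client: &dynamodb::Client) -> Result<(), dynamodb::Error> {
    let resp = client.list_tables().send().await?;
    println!("my tables: {:?}", resp.tables.unwrap_or_default());
    Ok(())
}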

Next, we'll explore some other ways to configure the SDK. Perhaps you want to override the region loaded from the environment with your region. In this case, we'll want more control over how we load config, using aws_config::from_env() directly:

use aws_sdk_dynamodb as dynamodb;

#[tokio::main]
async fn main() -> Result<(), dynamodb::Error> {
    let region_provider = RegionProviderChain::default_provider().or_else("us-west-2");
    let config = aws_config::from_env().region(region_provider).load().await;
    let dynamodb = dynamodb::Client::new(&config);
    let resp = dynamodb.list_tables().send().await?;
    println!("my tables: {:?}", resp.tables.unwrap_or_default());
    Ok(())
}

Sharing configuration between multiple services

The Config produced by aws-config can be used with any AWS service. If we wanted to read our Dynamodb DB tables aloud with Polly, we could create a Polly client as well. First, we'll need to add Polly to our Cargo.toml:

[dependencies]
aws-sdk-dynamo = "0.1"
aws-sdk-polly = "0.1"
aws-config = "0.5"

tokio = { version = "1", features = ["full"] }

Then, we can use the shared configuration to build both service clients. The region override will apply to both clients:

use aws_sdk_dynamodb as dynamodb;
use aws_sdk_polly as polly;
use aws_sdk_polly::model::{OutputFormat, VoiceId};
use aws_types::region::Region;
use std::error::Error;
use tokio::io::AsyncWriteExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> { // error type changed to `Box<dyn Error>` because we now have dynamo and polly errors
    let config = aws_config::from_env().region(Region::new("us-west-2")).load().await;

    let dynamodb = dynamodb::Client::new(&config);
    let polly = polly::Client::new(&config);

    let resp = dynamodb.list_tables().send().await?;
    let tables = resp.tables.unwrap_or_default();
    let table_sentence = format!("my dynamo DB tables are: {}", tables.join(", "));
    let audio = polly.synthesize_speech()
        .output_format(OutputFormat::Mp3)
        .text(table_sentence)
        .voice_id(VoiceId::Joanna)
        .send()
        .await?;

    // Get MP3 data from the response and save it
    let mut blob = audio
        .audio_stream
        .collect()
        .await
        .expect("failed to read data");

    let mut file = tokio::fs::File::create("tables.mp3")
        .await
        .expect("failed to create file");

    file.write_all_buf(&mut blob)
        .await
        .expect("failed to write to file");
    Ok(())
}

Specifying a custom credential provider

If you have your own source of credentials, you may opt-out of the standard credential provider chain.

To do this, implement the ProvideCredentials trait.

NOTE: aws_types::Credentials already implements ProvideCredentials. If you want to use the SDK with static credentials, you're already done!
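
For that static-credentials case, a minimal sketch (assuming Credentials::from_keys as the constructor) looks like this:

use aws_types::Credentials;

#[tokio::main]
async fn main() {
    // Static credentials passed directly where a credentials provider is expected.
    // `Credentials::from_keys` is assumed here purely for illustration.
    let creds = Credentials::from_keys("AKIDEXAMPLE", "example-secret-key", None);
    let config = aws_config::from_env().credentials_provider(creds).load().await;
    let _ = config;
}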

use aws_types::credentials::{ProvideCredentials, provide_credentials::future, Result};

struct MyCustomProvider;

impl MyCustomProvider {
    pub async fn load_credentials(&self) -> Result {
        todo!() // A regular async function
    }
}

impl ProvideCredentials for MyCustomProvider {
    fn provide_credentials<'a>(&'a self) -> future::ProvideCredentials<'a>
        where
            Self: 'a,
    {
        future::ProvideCredentials::new(self.load_credentials())
    }
}

Hint: If your credential provider is not asynchronous, you can use future::ProvideCredentials::ready instead to save an allocation.
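
A sketch of a synchronous provider using that constructor (the StaticProvider struct and its field are made up for illustration):

use aws_types::credentials::{provide_credentials::future, Credentials, ProvideCredentials};

struct StaticProvider {
    credentials: Credentials,
}

impl ProvideCredentials for StaticProvider {
    fn provide_credentials<'a>(&'a self) -> future::ProvideCredentials<'a>
    where
        Self: 'a,
    {
        // No `.await` required, so no boxed future needs to be allocated
        future::ProvideCredentials::ready(Ok(self.credentials.clone()))
    }
}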

After writing your custom provider, you'll use it when constructing the configuration:

#[tokio::main]
async fn main() {
    let config = aws_config::from_env().credentials_provider(MyCustomProvider).load().await;
    let dynamodb = dynamodb::Client::new(&config);
}

Proposed Design

Achieving this design consists of three major changes:

  1. Add a Config struct to aws-types. This struct holds the shared configuration values but contains no logic to construct them. It represents what configuration SDKs need, not how to load that information from the environment.
  2. Create the aws-config crate. aws-config contains the logic to load configuration from the environment. No generated service clients will depend on aws-config. This is critical to avoid circular dependencies and to allow aws-config to depend on other AWS services. aws-config contains individual providers as well as a pre-assembled default provider chain for region and credentials. It will also contain crate features to automatically bring in HTTPS and async-sleep implementations.
  3. Remove all "business logic" from aws-types. aws-types should be an interface-only crate that is extremely stable. The ProvideCredentials trait should move into aws-types. The region provider trait which only exists to support region-chaining will move out of aws-types into aws-config.

Services will continue to generate their own Config structs. These will continue to be customizable as they are today, however, they won't have any default resolvers built in. Each AWS service's Config will implement From<&aws_types::SharedConfig>. A convenience method to new() a fluent client directly from a shared config will also be generated.
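
A sketch of the two resulting construction paths (the from_conf constructor name is an assumption; the rest follows this RFC):

use aws_sdk_dynamodb as dynamodb;

#[tokio::main]
async fn main() {
    let shared = aws_config::load_from_env().await;

    // 1. Convert the shared config into the service config via the generated `From` impl...
    let dynamo_conf = dynamodb::Config::from(&shared);
    let client_a = dynamodb::Client::from_conf(dynamo_conf);

    // 2. ...or go straight from shared config to a client with the generated convenience constructor
    let client_b = dynamodb::Client::new(&shared);

    let _ = (client_a, client_b);
}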

Shared Config Implementation

This RFC proposes adding region and credentials provider support to the shared config. A future RFC will propose integration with HTTP settings, HTTPS connectors, and async sleep.

struct Config {
    // private fields
    ...
}

impl Config {
    pub fn region(&self) -> Option<&Region> {
        self.region.as_ref()
    }

    pub fn credentials_provider(&self) -> Option<SharedCredentialsProvider> {
        self.credentials_provider.clone()
    }

    pub fn builder() -> Builder {
        Builder::default()
    }
}

The Builder for Config allows customers to provide individual overrides and handles the insertion of the default chain for regions and credentials.

Sleep + Connectors

Sleep and Connector are both runtime-dependent features. aws-config will define rt-tokio, rustls, and native-tls optional features. This centralizes the Tokio/Hyper dependency, eventually removing the need for each service to maintain its own Tokio/Hyper features.

Although not proposed in this RFC, shared config will eventually gain support for creating an HTTPS client from HTTP settings.

The .build() method on ::Config

Currently, the .build() method on service config will fill in defaults. As part of this change, .build() called on the service config with missing properties will fill in "empty" defaults. If no credentials provider is given, a NoCredentials provider will be set, and Region will remain as None.
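
A sketch of that defaulting behavior (the NoCredentials provider and the field names are assumptions for illustration):

impl Builder {
    pub fn build(self) -> Config {
        Config {
            // Region stays `None` rather than being resolved from the environment
            region: self.region,
            // A provider that always fails is inserted instead of the default chain
            credentials_provider: self
                .credentials_provider
                .unwrap_or_else(|| SharedCredentialsProvider::new(NoCredentials)),
        }
    }
}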

Stability and Versioning

The introduction of Config to aws-types is not without risks. If a customer depends on a version of aws-config that uses an incompatible version of Config, they will get confusing compiler errors.

An example of a problematic set of dependent versions:

┌─────────────────┐                 ┌───────────────┐
│ aws-types = 0.1 │                 │aws-types= 0.2 │
└─────────────────┘                 └───────────────┘
           ▲                                 ▲
           │                                 │
           │                                 │
           │                                 │
 ┌─────────┴─────────────┐          ┌────────┴───────┐
 │aws-sdk-dynamodb = 0.5 │          │aws-config = 0.6│
 └───────────┬───────────┘          └───────┬────────┘
             │                              │
             │                              │
             │                              │
             │                              │
             │                              │
             ├─────────────────────┬────────┘
             │ my-lambda-function  │
             └─────────────────────┘

To mitigate this risk, we will need to make aws-types essentially permanently stable. Changes to aws-types need to be made with extreme care. This will ensure that two versions of aws-types never end up in a customer's dependency tree.

We will dramatically reduce the surface area of aws-types to contain only interfaces.

Several breaking changes will be made as part of this, notably, the profile file parsing will be moved out of aws-types.

Finally, to mitigate this risk even further, services will pub use items from aws-types directly which means that even if a dependency mismatch exists, it is still possible for customers to work around it.
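
For example, a generated service crate might re-export the shared items along these lines (paths are illustrative):

// In a generated service crate (e.g. aws-sdk-dynamodb): re-export the shared types so
// customers can name this crate's versions even if a dependency mismatch exists.
pub use aws_types::config::Config;
pub use aws_types::region::Region;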

Changes Checklist

  • ProvideRegion becomes async using a newtype'd future.
  • AsyncProvideCredentials is removed. ProvideCredentials becomes async using a newtype'd future.
  • ProvideCredentials moved into aws-types. Credentials moved into aws-types
  • Create aws-config.
  • Profile-file parsing moved into aws-config, region chain & region environment loaders moved to aws-config.
  • os_shim_internal moved to ??? aws-smithy-types?
  • Add Config to aws-types. Ensure that it's set up to add new members while remaining backwards compatible.
  • Code generate From<&SharedConfig> for <everyservice>::Config
  • Code generate <everyservice>::Client::new(&shared_config)
  • Remove <everyservice>::from_env

Open Issues

  • Connector construction needs to be a function of HTTP settings
  • An AsyncSleep should be added to aws-types::Config

RFC: Supporting multiple HTTP versions for SDKs that use Event Stream

Status: Accepted

For a summarized list of proposed changes, see the Changes Checklist section.

Most AWS SDK operations use HTTP/1.1, but bi-directional streaming operations that use the Event Stream message framing format need to use HTTP/2 (h2).

Smithy models can also customize which HTTP versions are used in each individual protocol trait. For example, @restJson1 has attributes http and eventStreamHttp to list out the versions that should be used in a priority order.

There are two problems in play that this doc attempts to solve:

  1. Connector Creation: Customers need to be able to create connectors with the HTTP settings they desire, and these custom connectors must align with what the Smithy model requires.
  2. Connector Selection: The generated code must be able to select the connector that best matches the requirements from the Smithy model.

Terminology

Today, there are three layers of Client that are easy to confuse, so to make this document easier to follow, the following terms will be used:

  • Connector: An implementor of Tower's Service trait that converts a request into a response. This is typically a thin wrapper around a Hyper client.
  • Smithy Client: An aws_smithy_client::Client<C, M, R> struct that is responsible for gluing together the connector, middleware, and retry policy. This isn't intended to be used directly.
  • Fluent Client: A code generated Client<C, M, R> that has methods for each service operation on it. A fluent builder is generated alongside it to make construction easier.
  • AWS Client: A specialized Fluent Client that uses a DynConnector, DefaultMiddleware, and Standard retry policy.

All of these are just called Client in code today. This is something that could be clarified in a separate refactor.

How Clients Work Today

Fluent clients currently keep a handle to a single Smithy client, which is a wrapper around the underlying connector. When constructing operation builders, this handle is Arc cloned and given to the new builder instances so that their send() calls can initiate a request.

The generated fluent client code ends up looking like this:

struct Handle<C, M, R> {
    client: aws_smithy_client::Client<C, M, R>,
    conf: crate::Config,
}

pub struct Client<C, M, R = Standard> {
    handle: Arc<Handle<C, M, R>>,
}

Functions are generated per operation on the fluent client to gain access to the individual operation builders. For example:

pub fn assume_role(&self) -> fluent_builders::AssumeRole<C, M, R> {
    fluent_builders::AssumeRole::new(self.handle.clone())
}

The fluent operation builders ultimately implement send(), which chooses the one and only Smithy client out of the handle to make the request with:

pub struct AssumeRole<C, M, R> {
    handle: std::sync::Arc<super::Handle<C, M, R>>,
    inner: crate::input::assume_role_input::Builder,
}

impl<C, M, R> AssumeRole<C, M, R> where ...{
    pub async fn send(self) -> Result<AssumeRoleOutput, SdkError<AssumeRoleError>> where ... {
        // Setup code omitted ...

        // Make the actual request
        self.handle.client.call(op).await
    }
}

Smithy clients are constructed from a connector, as shown:

let connector = Builder::new()
    .https()
    .middleware(...)
    .build();
let client = Client::with_config(connector, Config::builder().build());

The https() method on the Builder constructs the actual Hyper client, and is driven off Cargo features to select the correct TLS implementation. For example:

#[cfg(feature = "rustls")]
pub fn https() -> Https {
    let https = hyper_rustls::HttpsConnector::with_native_roots();
    let client = hyper::Client::builder().build::<_, SdkBody>(https);
    // HyperAdapter is a Tower `Service` request -> response connector that just calls the Hyper client
    crate::hyper_impls::HyperAdapter::from(client)
}

Solving the Connector Creation Problem

Customers need to be able to provide HTTP settings, such as timeouts, for all connectors that the clients use. These should come out of the SharedConfig when it is used. Connector creation also needs to be customizable so that alternate HTTP implementations can be used, or so that a fake implementation can be used for tests.

To accomplish this, SharedConfig will have a make_connector member. A customer would configure it as such:

let config = some_shared_config_loader()
    .with_http_settings(my_http_settings)
    .with_make_connector(|reqs: &MakeConnectorRequirements| {
        Some(MyCustomConnector::new(reqs))
    })
    .load()
    .await;

The passed in MakeConnectorRequirements will hold the customer-provided HttpSettings as well as any Smithy-modeled requirements, which will just be HttpVersion for now. The MakeConnectorRequirements struct will be marked non_exhaustive so that new requirements can be added to it as the SDK evolves.
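
A sketch of that struct (the field names are assumptions based on this RFC):

#[non_exhaustive]
#[derive(Debug, Clone)]
pub struct MakeConnectorRequirements {
    /// Customer-provided HTTP settings (timeouts, etc.)
    pub http_settings: HttpSettings,
    /// The HTTP version required (or preferred) by the Smithy model
    pub http_version: HttpVersion,
}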

A default make_connector implementation would be provided that creates a Hyper connector based on the Cargo feature flags. This might look something like this:

#[cfg(feature = "rustls")]
pub fn default_connector(reqs: &HttpRequirements) -> HyperAdapter {
    let https = hyper_rustls::HttpsConnector::with_native_roots();
    let mut builder = hyper::Client::builder();
    builder = configure_settings(builder, &reqs.http_settings);
    if let Http2 = &reqs.http_version {
        builder = builder.http2_only(true);
    }
    HyperAdapter::from(builder.build::<_, SdkBody>(https))
}

For any given service, make_connector could be called multiple times to create connectors for all required HTTP versions and settings.

Note: the make_connector returns an Option since an HTTP version may not be required, but rather, preferred according to a Smithy model. For operations that list out ["h2", "HTTP/1.1"] as the desired versions, a customer could choose to provide only an HTTP 1 connector, and the operation should still succeed.

Solving the Connector Selection Problem

Each service operation needs to be able to select a connector that meets its requirements best from the customer provided connectors. Initially, the only selection criteria will be the HTTP version, but later when per-operation HTTP settings are implemented, the connector will also need to be keyed off of those settings. Since connector creation is not a cheap process, connectors will need to be cached after they are created.

This caching is currently handled by the Handle in the fluent client, which holds on to the Smithy client. This cache needs to be adjusted to:

  • Support multiple connectors, keyed off of the customer provided HttpSettings, and also off of the Smithy modeled requirements.
  • Be lazy initialized. Services that have a mix of Event Stream and non-streaming operations shouldn't create an HTTP/2 client if the customer doesn't intend to use the Event Stream operations that require it.

To accomplish this, the Handle will hold a cache that is optimized for many reads and few writes:

#[derive(Debug, Hash, Eq, PartialEq)]
struct ConnectorKey {
    http_settings: HttpSettings,
    http_version: HttpVersion,
}

struct Handle<C, M, R> {
    clients: RwLock<HashMap<HttpRequirements<'static>, aws_smithy_client::Client<C, M, R>>>,
    conf: crate::Config,
}

pub struct Client<C, M, R = Standard> {
    handle: Arc<Handle<C, M, R>>,
}

With how the generics are organized, the connector type will have to be the same between HTTP implementations, but this should be fine since it is generally a thin wrapper around a separate HTTP implementor. For cases where it is not, the custom connector type can host its own dyn Trait solution.

The HttpRequirements struct will hold HttpSettings as copy-on-write so that it can be used for cache lookup without having to clone HttpSettings:

struct HttpRequirements<'a> {
    http_settings: Cow<'a, HttpSettings>,
    http_version: HttpVersion,
}

impl<'a> HttpRequirements<'a> {
    // Needed for converting a borrowed HttpRequirements into an owned cache key for cache population
    pub fn into_owned(self) -> HttpRequirements<'static> {
        Self {
            http_settings: Cow::Owned(self.http_settings.into_owned()),
            http_version: self.http_version,
        }
    }
}

With the cache established, each operation needs to be aware of its requirements. The code generator will be updated to store a prioritized list of HttpVersion in the property bag in an input's make_operation() method. This prioritized list will come from the Smithy protocol trait's http or eventStreamHttp attribute, depending on the operation. The fluent client will then pull this list out of the property bag so that it can determine which connector to use. This indirection is necessary so that an operation still holds all information needed to make a service call from the Smithy client directly.
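
A minimal sketch of what that make_operation() insertion could look like (HttpVersion and HttpVersionList are assumptions from this RFC, not existing SDK items):

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HttpVersion {
    Http1_1,
    Http2,
}

/// Prioritized list of HTTP versions for an operation
#[derive(Debug, Clone)]
pub struct HttpVersionList(pub Vec<HttpVersion>);

fn insert_http_versions(properties: &mut aws_smithy_http::property_bag::PropertyBag) {
    // Priority order taken from the protocol trait's `eventStreamHttp` attribute
    properties.insert(HttpVersionList(vec![HttpVersion::Http2, HttpVersion::Http1_1]));
}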

Note: This may be extended in the future to be more than just HttpVersion, for example, when per-operation HTTP setting overrides are implemented. This doc is not attempting to solve that problem.

In the fluent client, this will look as follows:

impl<C, M, R> AssumeRole<C, M, R> where ... {
    pub async fn send(self) -> Result<AssumeRoleOutput, SdkError<AssumeRoleError>> where ... {
        let input = self.create_input()?;
        let op = input.make_operation(&self.handle.conf)?;

        // Grab the `make_connector` implementation
        let make_connector = self.handle.conf.make_connector();

        // Acquire the prioritized HttpVersion list
        let http_versions = op.properties().get::<HttpVersionList>();

        // Make the actual request (using default HttpSettings until modifying those is implemented)
        let client = self.handle
            .get_or_create_client(make_connector, &default_http_settings(), &http_versions)
            .await?;
        client.call(op).await
    }
}

If an operation requires a specific protocol version and the make_connector implementation can't provide it, then the get_or_create_client() function will return an SdkError::ConstructionFailure indicating the error.

Changes Checklist

  • Create HttpVersion in aws-smithy-http with Http1_1 and Http2
  • Refactor existing https() connector creation functions to take HttpVersion
  • Add make_connector to SharedConfig, and wire up the https() functions as a default
  • Create HttpRequirements in aws-smithy-http
  • Implement the connector cache on Handle
  • Implement function to calculate a minimum required set of HTTP versions from a Smithy model in the code generator
  • Update the make_operation code gen to put an HttpVersionList into the operation property bag
  • Update the fluent client send() function code gen to grab the HTTP version list and acquire the correct connector with it
  • Add required defaulting for models that don't set the optional http and eventStreamHttp protocol trait attributes

RFC: API for Presigned URLs

Status: Implemented

For a summarized list of proposed changes, see the Changes Checklist section.

Several AWS services allow for presigned requests in URL form, which is described well by S3's documentation on authenticating requests using query parameters.

This doc establishes the customer-facing API for creating these presigned URLs and how they will be implemented in a generic fashion in the SDK codegen.

Terminology

To differentiate between the clients that are present in the generated SDK today, the following terms will be used throughout this doc:

  • Smithy Client: An aws_smithy_client::Client<C, M, R> struct that is responsible for gluing together the connector, middleware, and retry policy. This is not generated and lives in the aws-smithy-client crate.
  • Fluent Client: A code-generated Client<C, M, R> that has methods for each service operation on it. A fluent builder is generated alongside it to make construction easier.

Presigned URL config

Today, presigned URLs take an expiration time that's not part of the service API. The SDK will make this configurable as a separate struct so that there's no chance of name collisions, and so that additional fields can be added in the future. Fields added later will require defaulting for backwards compatibility.

Customers should also be able to set a start time on the presigned URL so that they can generate URLs that become active in the future. An optional start_time option will be available, defaulting to SystemTime::now().

Construction of PresigningConfig can be done with a builder, but a PresigningConfig::expires_in convenience function will be provided to bypass the builder for the most frequent use-case.

#[non_exhaustive]
#[derive(Debug, Clone)]
pub struct PresigningConfig {
    start_time: SystemTime,
    expires_in: Duration,
}

#[non_exhaustive]
#[derive(Debug)]
pub struct Builder {
    start_time: Option<SystemTime>,
    expires_in: Option<Duration>,
}

impl Builder {
    pub fn start_time(self, start_time: SystemTime) -> Self { ... }
    pub fn set_start_time(&mut self, start_time: Option<SystemTime>) { ... }

    pub fn expires_in(self, expires_in: Duration) -> Self { ... }
    pub fn set_expires_in(&mut self, expires_in: Option<Duration>) { ... }

    // Validates `expires_in` is no greater than one week
    pub fn build(self) -> Result<PresigningConfig, Error> { ... }
}

impl PresigningConfig {
    pub fn expires_in(expires_in: Duration) -> PresigningConfig {
        Self::builder().expires_in(expires_in).build().unwrap()
    }

    pub fn builder() -> Builder { ... }
}

Construction of PresigningConfig will validate that expires_in is no greater than one week, as this is the longest supported expiration time for SigV4. This validation will result in a panic.
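
A sketch of the builder's validation (the error variants here are assumptions for illustration):

use std::time::{Duration, SystemTime};

const ONE_WEEK: Duration = Duration::from_secs(7 * 24 * 60 * 60);

impl Builder {
    pub fn build(self) -> Result<PresigningConfig, Error> {
        let expires_in = self.expires_in.ok_or(Error::ExpiresInRequired)?;
        if expires_in > ONE_WEEK {
            // SigV4 only supports expiration up to one week
            return Err(Error::ExpiresInDurationTooLong);
        }
        Ok(PresigningConfig {
            start_time: self.start_time.unwrap_or_else(SystemTime::now),
            expires_in,
        })
    }
}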

It's not inconceivable that PresigningConfig will need additional service-specific parameters as customizations, so it will be code generated with each service rather than living in a shared location.

Fluent Presigned URL API

The generated fluent builders for operations that support presigning will have a presigned() method in addition to send() that will return a presigned URL rather than sending the request. For S3's GetObject, the usage of this will look as follows:

let config = aws_config::load_from_env().await;
let client = s3::Client::new(&config);
let presigning_config = PresigningConfig::expires_in(Duration::from_secs(86400));
let presigned: PresignedRequest = client.get_object()
    .bucket("example-bucket")
    .key("example-object")
    .presigned(presigning_config)
    .await?;

This API requires a client, and for use-cases where no actual service calls need to be made, customers should be able to create presigned URLs without the overhead of an HTTP client. Once the HTTP Versions RFC is implemented, the underlying HTTP client won't be created until the first service call, so there will be no HTTP client overhead to this approach.

In a step away from the general pattern of keeping fluent client capabilities in line with Smithy client capabilities, creating presigned URLs directly from the Smithy client will not be supported. This is for two reasons:

  • The Smithy client is not code generated, so adding a method to do presigning would apply to all operations, but not all operations can be presigned.
  • Presigned URLs are not currently a Smithy concept (although this may change soon).

The result of calling presigned() is a PresignedRequest, which is a wrapper with delegating functions around http::Request<()> so that the request method and additional signing headers are also made available. This is necessary since there are some presignable POST operations that require the signature to be in the headers rather than the query.
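
A sketch of that wrapper (the exact method set is illustrative):

#[derive(Debug)]
pub struct PresignedRequest(http::Request<()>);

impl PresignedRequest {
    /// The URI to request, with the signature encoded in the query string
    pub fn uri(&self) -> &http::Uri {
        self.0.uri()
    }

    /// The HTTP method to use (some presignable operations are POSTs)
    pub fn method(&self) -> &http::Method {
        self.0.method()
    }

    /// Additional headers that must be sent along with the request
    pub fn headers(&self) -> &http::HeaderMap {
        self.0.headers()
    }
}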

Note: Presigning needs to be async because the underlying credentials provider used to sign the request may need to make service calls to acquire the credentials.

Input Presigned URL API

Even though generating a presigned URL through the fluent client doesn't necessitate an HTTP client, it will be clearer that this is the case by allowing the creation of presigned URLs directly from an input. This would look as follows:

let config = aws_config::load_from_env().await;
let presigning_config = PresigningConfig::expires_in(Duration::from_secs(86400));
let presigned: PresignedRequest = GetObjectInput::builder()
    .bucket("example-bucket")
    .key("example-bucket")
    .presigned(&config, presigning_config)
    .await?;

Creating the URL through the input will exercise the same code path as creating it through the client, but it will be more apparent that the overhead of a client isn't present.

Behind the scenes

From an SDK's perspective, the following are required to make a presigned URL:

  • Valid request input
  • Endpoint
  • Credentials to sign with
  • Signing implementation

The AWS middleware provides everything except the request, and the request is provided as part of the fluent builder API. The generated code needs to be able to run the middleware to fully populate a request property bag, but not actually dispatch the request. The expires_in value from the presigning config needs to be piped all the way through to the signer. Additionally, the SigV4 signing needs to be adjusted to do query param signing, which is slightly different from its header signing.

Today, request dispatch looks as follows:

  1. The customer creates a new fluent builder by calling client.operation_name(), fills in inputs, and then calls send().
  2. send():
    1. Builds the final input struct, and then calls its make_operation() method with the stored config to create a Smithy Operation.
    2. Calls the underlying Smithy client with the operation.
  3. The Smithy client constructs a Tower Service with AWS middleware and a dispatcher at the bottom, and then executes it.
  4. The middleware acquires and adds the required signing parameters (region, credentials, endpoint, etc.) to the request property bag.
  5. The SigV4 signing middleware signs the request by adding HTTP headers to it.
  6. The dispatcher makes the actual HTTP request and returns the response all the way back up the Tower.

Presigning will take advantage of a lot of these same steps, but will cut out the Operation and replace the dispatcher with a presigned URL generator:

  1. The customer creates a new fluent builder by calling client.operation_name(), fills in inputs, and then calls presigned().
  2. presigned():
    1. Builds the final input struct, calls the make_operation() method with the stored config, and then extracts the request from the operation (discarding the rest).
    2. Mutates the OperationSigningConfig in the property bag to:
      • Change the signature_type to HttpRequestQueryParams so that the signer runs the correct signing logic.
      • Set expires_in to the value given by the customer in the presigning config.
    3. Constructs a Tower Service with AwsMiddleware layered in, and a PresignedUrlGeneratorLayer at the bottom.
    4. Calls the Tower Service and returns its result
  3. The AwsMiddleware will sign the request.
  4. The PresignedUrlGeneratorLayer directly returns the request since all of the work is done by the middleware.

It should be noted that the presigned() function above is on the generated input struct, so implementing this for the input API is identical to implementing it for the fluent client.

All the code for the new make_request() is already in the existing make_operation() and will just need to be split out.
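
As a sketch of the PresignedUrlGeneratorLayer idea, the terminal Tower service simply hands back the already-signed request instead of dispatching it (the request/response types here are simplified stand-ins, not the SDK's actual operation types):

use std::task::{Context, Poll};
use tower::Service;

pub struct PresignedUrlGenerator;

impl Service<http::Request<()>> for PresignedUrlGenerator {
    type Response = http::Request<()>;
    type Error = std::convert::Infallible;
    type Future = std::future::Ready<Result<Self::Response, Self::Error>>;

    fn poll_ready(&mut self, _cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
        Poll::Ready(Ok(()))
    }

    fn call(&mut self, req: http::Request<()>) -> Self::Future {
        // All signing work happened in the middleware above; return the request
        // so it can be converted into a `PresignedRequest`.
        std::future::ready(Ok(req))
    }
}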

Modeling Presigning

AWS models don't currently have any information about which operations can be presigned. To work around this, the Rust SDK will create a synthetic trait to model presigning with, and apply this trait to known presigned operations via customization. The code generator will look for this synthetic trait when creating the fluent builders and inputs to know if a presigned() method should be added.

Avoiding name collision

If a presignable operation input has a member named presigned, then there will be a name collision with the function to generate a presigned URL. To mitigate this, RustReservedWords will be updated to rename the presigned member to presigned_value similar to how send is renamed.

Changes Checklist

  • Update aws-sigv4 to support query param signing
  • Create PresignedOperationSyntheticTrait
  • Customize models for known presigned operations
  • Create PresigningConfig and its builder
  • Implement PresignedUrlGeneratorLayer
  • Create new AWS codegen decorator to:
    • Add new presigned() method to input code generator
    • Add new presigned() method to fluent client generator
  • Update RustReservedWords to reserve presigned()
  • Add integration test to S3
  • Add integration test to Polly
  • Add examples for using presigning for:
    • S3 GetObject and PutObject
    • Polly SynthesizeSpeech

RFC: Retry Behavior

Status: Implemented

For a summarized list of proposed changes, see the Changes Checklist section.

It is not currently possible for users of the SDK to configure a client's maximum number of retry attempts. This RFC establishes a method for users to set the number of retries to attempt when calling a service and would allow users to disable retries entirely. This RFC would introduce breaking changes to the retry module of the aws-smithy-client crate.

Terminology

  • Smithy Client: An aws_smithy_client::Client<C, M, R> struct that is responsible for gluing together the connector, middleware, and retry policy. This is not generated and lives in the aws-smithy-client crate.
  • Fluent Client: A code-generated Client<C, M, R> that has methods for each service operation on it. A fluent builder is generated alongside it to make construction easier.
  • AWS Client: A specialized Fluent Client that defaults to using a DynConnector, AwsMiddleware, and Standard retry policy.
  • Shared Config: An aws_types::Config struct that is responsible for storing shared configuration data that is used across all services. This is not generated and lives in the aws-types crate.
  • Service-specific Config: A code-generated Config that has methods for setting service-specific configuration. Each Config is defined in the config module of its parent service. For example, the S3-specific config struct is useable from aws_sdk_s3::config::Config and re-exported as aws_sdk_s3::Config.
  • Standard retry behavior: The standard set of retry rules across AWS SDKs. This mode includes a standard set of errors that are retried, and support for retry quotas. The default maximum number of attempts with this mode is three, unless max_attempts is explicitly configured.
  • Adaptive retry behavior: Adaptive retry mode dynamically limits the rate of AWS requests to maximize success rate. This may be at the expense of request latency. Adaptive retry mode is not recommended when predictable latency is important.
    • Note: supporting the "adaptive" retry behavior is considered outside the scope of this RFC

Configuring the maximum number of retries

This RFC will demonstrate (with examples) the following ways that Users can set the maximum number of retry attempts:

  • By calling the Config::retry_config(..) or Config::disable_retries() methods when building a service-specific config
  • By calling the Config::retry_config(..) or Config::disable_retries() methods when building a shared config
  • By setting the AWS_MAX_ATTEMPTS environment variable

The above list is in order of decreasing precedence, e.g. setting maximum retry attempts with the max_attempts builder method will override a value set by AWS_MAX_ATTEMPTS.

The default number of retries is 3 as specified in the AWS SDKs and Tools Reference Guide.

Setting an environment variable

Here's an example app that logs your AWS user's identity

use aws_sdk_sts as sts;

#[tokio::main]
async fn main() -> Result<(), sts::Error> {
    let config = aws_config::load_from_env().await;

    let sts = sts::Client::new(&config);
    let resp = sts.get_caller_identity().send().await?;
    println!("your user id: {}", resp.user_id.unwrap_or_default());
    Ok(())
}

Then, in your terminal:

# Set the env var before running the example program
export AWS_MAX_ATTEMPTS=5
# Run the example program
cargo run

Calling a method on an AWS shared config

Here's an example app that creates a shared config with custom retry behavior and then logs your AWS user's identity

use aws_sdk_sts as sts;
use aws_types::retry_config::StandardRetryConfig;

#[tokio::main]
async fn main() -> Result<(), sts::Error> {
    let retry_config = StandardRetryConfig::builder().max_attempts(5).build();
    let config = aws_config::from_env().retry_config(retry_config).load().await;

    let sts = sts::Client::new(&config);
    let resp = sts.get_caller_identity().send().await?;
    println!("your user id: {}", resp.user_id.unwrap_or_default());
    Ok(())
}

Calling a method on service-specific config

Here's an example app that creates a service-specific config with custom retry behavior and then logs your AWS user's identity

use aws_sdk_sts as sts;
use aws_types::retry_config::StandardRetryConfig;

#[tokio::main]
async fn main() -> Result<(), sts::Error> {
    let config = aws_config::load_from_env().await;
    let retry_config = StandardRetryConfig::builder().max_attempts(5).build();
    let sts_config = sts::config::Config::from(&config).retry_config(retry_config).build();

    let sts = sts::Client::new(&sts_config);
    let resp = sts.get_caller_identity().send().await?;
    println!("your user id: {}", resp.user_id.unwrap_or_default());
    Ok(())
}

Disabling retries

Here's an example app that creates a shared config that disables retries and then logs your AWS user's identity

use aws_sdk_sts as sts;
use aws_types::config::Config;

#[tokio::main]
async fn main() -> Result<(), sts::Error> {
    let config = aws_config::from_env().disable_retries().load().await;
    let sts_config = sts::config::Config::from(&config).build();

    let sts = sts::Client::new(&sts_config);
    let resp = sts.get_caller_identity().send().await?;
    println!("your user id: {}", resp.user_id.unwrap_or_default());
    Ok(())
}

Retries can also be disabled by explicitly passing the RetryConfig::NoRetries enum variant to the retry_config builder method:

use aws_sdk_sts as sts;
use aws_types::retry_config::RetryConfig;

#[tokio::main]
async fn main() -> Result<(), sts::Error> {
    let config = aws_config::load_from_env().await;
    let sts_config = sts::config::Config::from(&config).retry_config(RetryConfig::NoRetries).build();

    let sts = sts::Client::new(&sts_config);
    let resp = sts.get_caller_identity().send().await?;
    println!("your user id: {}", resp.user_id.unwrap_or_default());
    Ok(())
}

Behind the scenes

Currently, when users want to send a request, the following occurs:

  1. The user creates either a shared config or a service-specific config
  2. The user creates a fluent client for the service they want to interact with and passes the config they created. Internally, this creates an AWS client with a default retry policy
  3. The user calls an operation builder method on the client which constructs a request
  4. The user sends the request by awaiting the send() method
  5. The smithy client creates a new Service and attaches a copy of its retry policy
  6. The Service is called, sending out the request and retrying it according to the retry policy

After this change, the process will work like this:

  1. The user creates either a shared config or a service-specific config
    • If AWS_MAX_ATTEMPTS is set to zero, this is invalid and we will log it with tracing::warn. However, this will not error until a request is made
    • If AWS_MAX_ATTEMPTS is 1, retries will be disabled
    • If AWS_MAX_ATTEMPTS is greater than 1, retries will be attempted at most as many times as is specified
    • If the user creates the config with the .disable_retries builder method, retries will be disabled
    • If the user creates the config with the retry_config builder method, retry behavior will be set according to the RetryConfig they passed
  2. The user creates a fluent client for the service they want to interact with and passes the config they created
    • Provider precedence will determine what retry behavior is actually set, working like how Region is set
  3. The user calls an operation builder method on the client which constructs a request
  4. The user sends the request by awaiting the send() method
  5. The smithy client creates a new Service and attaches a copy of its retry policy
  6. The Service is called, sending out the request and retrying it according to the retry policy

These changes will be made in such a way that they enable us to add the "adaptive" retry behavior at a later date without introducing a breaking change.
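
A minimal sketch of the AWS_MAX_ATTEMPTS handling described in step 1 above (RetryConfig, StandardRetryConfig, and the NoRetries variant are names from this RFC; the helper function itself is illustrative):

fn retry_config_from_env() -> Option<RetryConfig> {
    let max_attempts: u32 = std::env::var("AWS_MAX_ATTEMPTS").ok()?.parse().ok()?;
    match max_attempts {
        0 => {
            // Invalid: warn now; the error surfaces when a request is actually made
            tracing::warn!("AWS_MAX_ATTEMPTS must be greater than zero");
            None
        }
        1 => Some(RetryConfig::NoRetries),
        n => Some(RetryConfig::Standard(
            StandardRetryConfig::builder().max_attempts(n).build(),
        )),
    }
}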

Changes checklist

  • Create new Kotlin decorator RetryConfigDecorator
    • Based on RegionDecorator.kt
    • This decorator will live in the codegen project because it has relevance outside the SDK
  • Breaking changes:
    • Rename aws_smithy_client::retry::Config to StandardRetryConfig
    • Rename aws_smithy_client::retry::Config::with_max_retries method to with_max_attempts in order to follow AWS convention
    • Passing 0 to with_max_attempts will panic with a helpful, descriptive error message
  • Create non-exhaustive aws_types::retry_config::RetryConfig enum wrapping structs that represent specific retry behaviors
    • A NoRetries variant that disables retries. Doesn't wrap a struct since it doesn't need to contain any data
    • A Standard variant that enables the standard retry behavior. Wraps a StandardRetryConfig struct.
  • Create aws_config::meta::retry_config::RetryConfigProviderChain
  • Create aws_config::meta::retry_config::ProvideRetryConfig
  • Create EnvironmentVariableMaxAttemptsProvider struct
    • Setting AWS_MAX_ATTEMPTS=0 and trying to load from env will panic with a helpful, descriptive error message
  • Add retry_config method to aws_config::ConfigLoader
  • Update AwsFluentClientDecorator to correctly configure the max retry attempts of its inner aws_hyper::Client based on the passed-in Config
  • Add tests
    • Test that setting retry_config to 1 disables retries
    • Test that setting retry_config to n limits retries to n where n is a non-zero integer
    • Test that correct precedence is respected when overriding retry behavior in a service-specific config
    • Test that correct precedence is respected when overriding retry behavior in a shared config
    • Test that creating a config from env if AWS_MAX_ATTEMPTS=0 will panic with a helpful, descriptive error message
    • Test that setting invalid max_attempts=0 with a StandardRetryConfig will panic with a helpful, descriptive error message

RFC: Smithy Rust Service Framework

Status: RFC

The Rust Smithy Framework is a full-fledged service framework whose main responsibility is to handle request lifecycles from beginning to end. It takes care of input de-serialization, operation execution, output serialization, error handling, and provides facilities to fulfill the requirements below.

Requirements

Smithy model-driven code generation

Server side code is generated from Smithy models and implements operations, input and output structures, and errors defined in the service model.

Performance

This new framework is built with performance in mind. It avoids unnecessary allocations and prefers borrowed types, managing lifetimes so that a request body is stored in memory only once and not cloned where possible.

The code is implemented on solid and widely used foundations. It uses Hyper to handle HTTP requests, the Tokio ecosystem for asynchronous (non-blocking) operations, and Tower to implement middleware such as timeouts, rate limiting, retries, and more. CPU-intensive operations are scheduled on a separate thread pool to avoid blocking the event loop.

It uses axum, an HTTP framework built on top of the technologies mentioned above, which handles routing, request extraction, response building, and worker lifecycle. Axum is a relatively thin layer on top of Hyper and adds very little overhead, so its performance is comparable to Hyper's.

The framework should allow customers to use the built-in HTTP server or select other transport implementations that can be more performant or better suited than HTTP for their use case.

Extensibility

We want to deliver an extensible framework that can plug in components, both during code generation and at runtime, for specific scenarios that cannot be covered during generation. These components are developed against a standard interface provided by the framework itself.

Observability

Being able to report and trace the status of the service is vital for the success of any product. The framework is integrated with tracing and allows non-blocking I/O through the asynchronous tracing appender.

Metrics and logging are built with extensibility in mind, allowing customers to plug their own handlers following a well defined interface provided by the framework.

Client generation

Client generation is deferred to the various Smithy implementations.

Benchmarking

Benchmarking the framework is key: customers cannot adopt anything that compromises their fundamental business objectives around latency and performance.

Model validation

The generated service code is responsible for validating the model constraints of input structures.

RFC: Service-specific middleware

Status: Implemented

For a summarized list of proposed changes, see the Changes Checklist section.

Currently, all services use a centralized AwsMiddleware that is defined in the (poorly named) aws-hyper crate. This poses a number of long term risks and limitations:

  1. When creating a Smithy Client directly for a given service, customers are forced to implicitly assume that the service uses stock AwsMiddleware. This prevents us from ever changing the middleware stack for a service in the future.
  2. It is impossible / impractical in the current situation to alter the middleware stack for a given service. For services like S3, we will almost certainly want to customize endpoint middleware in a way that is currently impossible.

In light of these limitations, this RFC proposes moving middleware into each generated service. aws-inlineable will be used to host and test the middleware stack. Each service will then define a public middleware module containing their middleware stack.

Terminology

  • Middleware: A tower layer that augments operation::Request -> operation::Response for things like signing and endpoint resolution.
  • Aws Middleware: A specific middleware stack that meets the requirements for AWS services.
  • Smithy Client: An aws_smithy_client::Client<C, M, R> struct that is responsible for gluing together the connector, middleware, and retry policy. This is not generated and lives in the aws-smithy-client crate.
  • Fluent Client: A code-generated Client<C, M, R> that has methods for each service operation on it. A fluent builder is generated alongside it to make construction easier.
  • AWS Client: A specialized Fluent Client that defaults to using a DynConnector, AwsMiddleware, and Standard retry policy.
  • Shared Config: An aws_types::Config struct that is responsible for storing shared configuration data that is used across all services. This is not generated and lives in the aws-types crate.
  • Service-specific Config: A code-generated Config that has methods for setting service-specific configuration. Each Config is defined in the config module of its parent service. For example, the S3-specific config struct is useable from aws_sdk_s3::config::Config and re-exported as aws_sdk_s3::Config.

Detailed Design

Currently, AwsMiddleware is defined in aws-hyper. As part of this change, an aws-inlineable dependency will be added containing code that is largely identical. This will be exposed in a public middleware module in all generated services. At some future point, we could even expose a baseline set of default middleware for whitelabel Smithy services to make them easier to use out-of-the-box.
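
Mirroring the Smithy client construction snippet shown earlier in this document, a customer could plug a service's middleware stack into their own client roughly like this (the DefaultMiddleware name is illustrative):

let connector = Builder::new()
    .https()
    // The service's middleware stack, exposed from its public `middleware` module
    .middleware(aws_sdk_s3::middleware::DefaultMiddleware::new())
    .build();
let client = Client::with_config(connector, Config::builder().build());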

The ClientGenerics parameter of the AwsFluentClientGenerator will be updated to become a RuntimeType, enabling loading the type directly. This has the advantage of making it fairly easy to do per-service middleware stacks since we can easily configure AwsFluentClientGenerator to insert different types based on the service id.

Changes Checklist

  • Move aws-hyper into aws-inlineable. Update comments as needed including with a usage example about how customers can augment it.
  • Refactor ClientGenerics to contain a RuntimeType instead of a string, and update AwsFluentClientDecorator accordingly.
  • Update all code and examples that use aws-hyper to use service-specific middleware.
  • Push an updated README to aws-hyper deprecating the package, explaining what happened. Do not yank previous versions since those will be relied on by older SDK versions.

RFC: Split Release Process

Status: Implemented in smithy-rs#986 and aws-sdk-rust#351

At the time of writing, the aws-sdk-rust repository is used exclusively for the entire release process of both the Rust runtime crates from smithy-rs as well as the AWS runtime crates and the AWS SDK. This worked well when smithy-rs was only used for the AWS SDK, but now that it's also used for server codegen, there are issues around publishing the server-specific runtime crates since they don't belong to the SDK.

This RFC proposes a new split-release process so that the entire smithy-rs runtime can be published separately before the AWS SDK is published.

Terminology

  • Smithy Runtime Crate: A crate that gets published to crates.io and supports the code generated by smithy-rs. These crates don't provide any SDK-only functionality. These crates can support client and/or server code, and clients or servers may use only a subset of them.
  • AWS Runtime Crate: A crate of SDK-specific code that supports the code generated by the aws/codegen module in smithy-rs. These also get published to crates.io.
  • Publish-ready Bundle: A build artifact that is ready to publish to crates.io without additional steps (such as running the publisher tool's fix-manifests subcommand). Publishing one group of crates before another is not considered an additional step for this definition.
  • Releaser: A developer, automated process, or combination of the two that performs the actual release.

Requirements

At a high level, the requirements are: publish from both smithy-rs and aws-sdk-rust while preserving our current level of confidence in the quality of the release. This can be enumerated as:

  1. All Smithy runtime crates must be published together from smithy-rs
  2. AWS runtime crates and the SDK must be published together from aws-sdk-rust
  3. CI on smithy-rs must give confidence that the Smithy runtime crates, AWS runtime crates, and SDK are all at the right quality bar for publish.
  4. CI on the aws-sdk-rust repository must give confidence that the AWS SDK and its runtime crates are at the right quality bar for publish. To do this successfully, it must run against the exact versions of the Smithy runtime crates the code was generated against both before AND after they have been published to crates.io.

Background: How Publishing Worked Before

The publish process to crates.io relied on copying all the Smithy runtime crates into the final aws-sdk-rust repository. Overall, the process looked as follows:

  1. smithy-rs generates a complete aws-sdk-rust source bundle at CI time
  2. The releaser copies the generated bundle over to aws-sdk-rust
  3. The releaser runs the publisher fix-manifests subcommand to correct the Cargo.toml files generated by smithy-rs
  4. The aws-sdk-rust CI performs one last pass on the code to verify it's sound
  5. The releaser runs the publisher publish subcommand to push all the crates up to crates.io

Proposed Solution

CI in smithy-rs will be revised to generate two separate build artifacts where it previously generated just an SDK artifact. It will have two build targets that get executed from CI to generate these artifacts:

  • rust-runtime:assemble - Generates a publish-ready bundle of Smithy runtime crates.
  • aws:sdk:assemble - Generates a publish-ready bundle of AWS runtime crates, SDK crates, and just the Smithy runtime crates that are used by the SDK.

The aws-sdk-rust repository will have a new next branch that has its own set of CI workflows and branch protection rules. The releaser will take the aws:sdk:assemble artifact and apply it directly to this next branch as would have previously been done against the main branch. The main branch will continue to have the same CI as next.

When it's time to cut a release, the releaser will do the following:

  1. Tag smithy-rs with the desired version number
  2. Wait for CI to build artifacts for the tagged release
  3. Pull-request the SDK artifacts over to aws-sdk-rust/next (this will be automated in the future)
  4. Pull-request merge aws-sdk-rust/next into aws-sdk-rust/main
  5. Wait for successful CI in main
  6. Tag release for main
  7. Publish SDK with publisher tool

The server team can then download the rust-runtime:assemble build artifact for the tagged release in smithy-rs, and publish the aws-smithy-http-server crate from there.

Avoiding mistakes by disallowing creation of publish-ready bundles outside of CI

It should be difficult to accidentally publish a locally built set of crates. To add friction to this, the smithy-rs build process will look for the existence of the GITHUB_ACTIONS=true environment variable. If this environment variable is not set, then it will pass a flag to the Rust codegen plugin that tells it to emit a publish = false under [package] in the generated Cargo.toml.

This could be easily circumvented, but the goal is to reduce the chances of accidentally publishing crates rather than making it impossible.

Alternatives Considered

Publish Smithy runtime crates from smithy-rs build artifacts

This approach is similar to the proposed solution, except that the SDK would not publish the Smithy runtime crates. The aws-sdk-rust/main branch would have a small tweak to its CI so that the SDK is tested against the Smithy runtime crates that are published to crates.io. This CI process would look as follows:

  1. Shallow clone aws-sdk-rust with the revision being tested
  2. Run a script to remove the path argument for the Smithy runtime crate dependencies for every crate in aws-sdk-rust. For example,
aws-smithy-types = { version = "0.33.0", path = "../aws-smithy-types" }

Would become:

aws-smithy-types = { version = "0.33.0" }
  3. Run the tests as usual

When it's time to cut a release, the releaser will do the following:

  1. Tag smithy-rs with the desired version number
  2. Wait for CI to build artifacts for the tagged release
  3. Pull-request the SDK artifacts over to aws-sdk-rust/next
  4. Wait for successful CI in aws-sdk-rust/next
  5. Download the Smithy runtime crates build artifact and publish it to crates.io
  6. Pull-request merge aws-sdk-rust/next into aws-sdk-rust/main
  7. Wait for successful CI in main (this time actually running against the crates.io Smithy runtime crates)
  8. Tag release for main
  9. Publish SDK with publisher tool

Keep Smithy runtime crates in smithy-rs

This approach is similar to the previous alternative, except that the aws-sdk-rust repository won't have a snapshot of the Smithy runtime crates, and an additional step needs to be performed during CI for the next branch so that it looks as follows:

  1. Make a shallow clone of aws-sdk-rust/next
  2. Retrieve the smithy-rs commit hash that was used to generate the SDK from a file that was generated alongside the rest of the build artifacts from smithy-rs and copied into aws-sdk-rust.
  3. Make a shallow clone of smithy-rs at the correct commit hash
  4. Use a script to add a [patch] section to all the AWS SDK crates to point to the Smithy runtime crates from the local clone of smithy-rs. For example:
# The dependencies section is left alone, but is here for context
[dependencies]
# Some version of aws-smithy-types that isn't on crates.io yet, referred to as `<unreleased>` below
aws-smithy-types = "<unreleased>"

# This patch section gets added by the script
[patch.crates-io]
aws-smithy-types = { version = "<unreleased>", path = "path/to/local/smithy-rs/rust-runtime/aws-smithy-types"}
  5. Run CI as normal.

Note: smithy-rs would need to do the same patching in CI as aws-sdk-rust/next since the generated SDK would not have path dependencies for the Smithy runtime crates (since they are a publish-ready bundle intended for landing in aws-sdk-rust). The script that does this patching could live in smithy-rs and be reused by aws-sdk-rust.

The disadvantage of this approach is that a customer having an issue with the current release wouldn't be able to get a fix sooner by patching their own project's crate manifest to use the aws-sdk-rust/next branch before a release is cut since their project wouldn't be able to find the unreleased Smithy runtime crates.

Changes Checklist

  • In smithy-rs:
    • Move publisher tool from aws-sdk-rust into smithy-rs
    • Modify aws:sdk:assemble target to run the publisher fix-manifests subcommand
    • Add rust-runtime:assemble target that generates publish-ready Smithy runtime crates
    • Add CI step to create Smithy runtime bundle artifact
    • Add GITHUB_ACTIONS=true env var check for setting the publish flag in generated AND runtime manifests
    • Revise publisher tool to publish from an arbitrary directory
  • In aws-sdk-rust:
    • Implement CI for the aws-sdk-rust/next branch
    • Remove the publisher tool
  • Update release process documentation

Summary

Status: Implemented

Smithy models paginated responses. Customers of Smithy-generated code & the Rust SDK will have an improved user experience if code is generated to support this. Fundamentally, paginators are a way to automatically make a series of requests with the SDK, where subsequent requests automatically forward output from the previous responses. There is nothing a paginator does that a user could not do manually; they merely simplify the common task of interacting with paginated APIs. Specifically, a paginator will resend the original request but with inputToken updated to the value of the previous outputToken.

In this RFC, we propose modeling paginated data as a Stream of output shapes.

  • When an output is paginated, a paginate() method will be added to the high level builder
  • An <OperationName>Paginator struct will be generated into the paginator module.
  • If items is modeled, paginate().items() will be added to produce the paginated items. <OperationName>PaginatorItems will be generated into the paginator module.

The Stream trait enables customers to use a number of abstractions including simple looping and collect()ing all data in a single call. A paginator will resend the original input, but with the field marked inputToken set to the value of outputToken in the previous output.

Usage example:

let paginator = client
    .list_tables()
    .paginate()
    .items()
    .page_size(10)
    .send()
    .await;
let tables: Result<Vec<_>, _> = paginator.collect().await;

Paginators are lazy and only retrieve pages when polled by a client.

Details

Paginators will be generated into the paginator module of service crates. Currently, paginators are not feature gated, but this could be considered in the future. A paginator struct captures 2 pieces of data:

// dynamodb/src/paginator.rs
struct ListTablesPaginator<C, M, R> {
    // holds the low-level client and configuration
    handle: Arc<Handle<C, M, R>>,

    // input builder to construct the actual input on demand
    builder: ListTablesInputBuilder
}

In addition to the basic usage example above, when pageSize is modeled, customers can specify the page size during pagination:

let mut tables = vec![];
let mut pages = client
    .list_tables()
    .paginate()
    .page_size(20)
    .send();
while let Some(next_page) = pages.try_next().await? {
    // pages of 20 items requested from DynamoDb
    tables.extend(next_page.table_names.unwrap_or_default().into_iter());
}

Paginators define a public method send(). This method returns impl Stream<Item = Result<OperationOutput, OperationError>>. This uses FnStream defined in the aws-smithy-async crate, which enables demand-driven execution of a closure. A rendezvous channel is used which will block on send until demand exists.

When modeled by Smithy, a page_size method (which automatically sets the appropriate page-size parameter) and an items() method (which returns an automatically flattened paginator) are also generated. Note: page_size directly sets the modeled parameter on the internal builder. This means that a value set for page size will override any previously set value for that field.

// Generated paginator for ListTables
impl<C, M, R> ListTablesPaginator<C, M, R>
{
  /// Set the page size
  pub fn page_size(mut self, limit: i32) -> Self {
    self.builder.limit = Some(limit);
    self
  }

  /// Create a flattened paginator
  ///
  /// This paginator automatically flattens results using `table_names`. Queries to the underlying service
  /// are dispatched lazily.
  pub fn items(self) -> crate::paginator::ListTablesPaginatorItems<C, M, R> {
    crate::paginator::ListTablesPaginatorItems(self)
  }

  /// Create the pagination stream
  ///
  /// _Note:_ No requests will be dispatched until the stream is used (e.g. with [`.next().await`](tokio_stream::StreamExt::next)).
  pub async fn send(
    self,
  ) -> impl tokio_stream::Stream<
    Item = std::result::Result<
      crate::output::ListTablesOutput,
      aws_smithy_http::result::SdkError<crate::error::ListTablesError>,
    >,
  > + Unpin
  {
    // Move individual fields out of self for the borrow checker
    let builder = self.builder;
    let handle = self.handle;
    fn_stream::FnStream::new(move |tx| {
      Box::pin(async move {
        // Build the input for the first time. If required fields are missing, this is where we'll produce an early error.
        let mut input = match builder.build().map_err(|err| {
          SdkError::ConstructionFailure(err.into())
        }) {
          Ok(input) => input,
          Err(e) => {
            let _ = tx.send(Err(e)).await;
            return;
          }
        };
        loop {
          let op = match input.make_operation(&handle.conf).await.map_err(|err| {
            SdkError::ConstructionFailure(err.into())
          }) {
            Ok(op) => op,
            Err(e) => {
              let _ = tx.send(Err(e)).await;
              return;
            }
          };
          let resp = handle.client.call(op).await;
          // If the input member is None or it was an error
          let done = match resp {
            Ok(ref resp) => {
              input.exclusive_start_table_name = crate::lens::reflens_structure_crate_output_list_tables_output_last_evaluated_table_name(resp).cloned();
              input.exclusive_start_table_name.is_none()
            }
            Err(_) => true,
          };
          if let Err(_) = tx.send(resp).await {
            // receiving end was dropped
            return;
          }
          if done {
            return;
          }
        }
      })
    })
  }
}

On Box::pin: The stream returned by AsyncStream does not implement Unpin. Unfortunately, this makes iteration require an invocation of pin_mut! and generates several hundred lines of compiler errors. Box::pin seems a worthwhile trade off to improve the user experience.

On the + Unpin bound: Because auto-traits leak across impl Trait boundaries, + Unpin prevents accidental regressions in the generated code which would break users.

On the crate::reflens::...: We use LensGenerator.kt to generate potentially complex accessors to deeply nested fields.

Updates to ergonomic clients

The builders generated by ergonomic clients will gain the following method if they represent an operation marked with the Smithy @paginated trait:

/// Create a paginator for this request
///
/// Paginators are used by calling [`send().await`](crate::paginator::ListTablesPaginator::send) which returns a [`Stream`](tokio_stream::Stream).
pub fn paginate(self) -> crate::paginator::ListTablesPaginator<C, M, R> {
  crate::paginator::ListTablesPaginator::new(self.handle, self.inner)
}

Discussion Areas

On send().await

Calling send().await is not necessary from an API perspective—we could have the paginators implement Stream directly. However, it enables using impl Trait syntax and also makes the API consistent with other SDK APIs.

On tokio_stream::Stream

Currently, the core trait we use is tokio_stream::Stream. This is a re-export of Stream from futures-core. There are a few other choices:

  1. Re-export Stream from tokio_stream.
  2. Use futures_util directly

On Generics

Currently, the paginators forward the generics from the client (C, M, R) along with their fairly annoying bounds. However, if we wanted to we could simplify this and erase all the generics when the paginator was created. Since everything is code generated, there isn't actually much duplicated code in the generator, just in the generated code.

Changes Checklist

  • Create and test FnStream abstraction
  • Generate page-level paginators
  • Generate .items() paginators
  • Generate doc hints pointing people to paginators
  • Integration test using mocked HTTP traffic against a generated paginator for a real service
  • Integration test using real traffic

RFC: Examples Consolidation

Status: Implemented

Currently, the AWS Rust SDK's examples are duplicated across awslabs/aws-sdk-rust, smithy-lang/smithy-rs, and awsdocs/aws-doc-sdk-examples. The smithy-rs repository was formerly the source of truth for examples, with the examples being copied over to aws-sdk-rust as part of the release process, and examples were manually copied over to aws-doc-sdk-examples so that they could be included in the developer guide.

Now that the SDK is more stable with less frequent breaking changes, the aws-doc-sdk-examples repository can become the source of truth so long as the examples are tested against smithy-rs and continue to be copied into aws-sdk-rust.

Requirements

  1. Examples are authored and maintained in aws-doc-sdk-examples
  2. Examples are no longer present in smithy-rs
  3. CI in smithy-rs checks out examples from aws-doc-sdk-examples and builds them against the generated SDK. Success for this CI job is optional for merging since there can be a time lag between identifying that examples are broken and fixing them.
  4. Examples must be copied into aws-sdk-rust so that the examples for a specific version of the SDK can be easily referenced.
  5. Examples must be verified in aws-sdk-rust prior to merging into the main branch.

Example CI in smithy-rs

A CI job will be added to smithy-rs that:

  1. Depends on the CI job that generates the full AWS SDK
  2. Checks out the aws-doc-sdk-examples repository
  3. Modifies example Cargo.toml files to point to the newly generated AWS SDK crates
  4. Runs cargo check on each example

This job will not be required to pass for branch protection, but will let us know that examples need to be updated before the next release.

Auto-sync to aws-sdk-rust from smithy-rs changes

The auto-sync job that copies generated code from smithy-rs into the aws-sdk-rust/next branch will be updated to check out the aws-doc-sdk-examples repository and copy the examples into aws-sdk-rust. The example Cargo.toml files will also be updated to point to the local crate paths as part of this process.

The aws-sdk-rust CI already requires examples to compile, so merging next into main, the step required to perform a release, will be blocked until the examples are fixed.

In the event the examples don't work on the next branch, developers and example writers will need to be able to point the examples in aws-doc-sdk-examples to the generated SDK in next so that they can verify their fixes. This can be done by hand, or a tool can be written to automate it if a significant number of examples need to be fixed.

Process Risks

There are a couple of risks with this approach:

  1. Risk: Examples are broken and an urgent fix needs to be released.

    Possible mitigations:

    1. Revert the change that broke the examples and then add the urgent fix
    2. Create a patch branch in aws-sdk-rust based off an older version of smithy-rs, apply the fix there, and merge that into main.
  2. Risk: A larger project requires changes to examples prior to GA, but multiple releases need to occur before the project completion.

    Possible mitigations:

    1. If the required changes compile against the older SDK, then just make the changes to the examples.
    2. Feature gate any incremental new functionality in smithy-rs, and work on example changes on a branch in aws-doc-sdk-examples. When wrapping up the project, remove the feature gating and merge the examples into the main branch.

Alternatives

aws-sdk-rust as the source of truth

Alternatively, the examples could reside in aws-sdk-rust, be referenced from smithy-rs CI, and get copied into aws-doc-sdk-examples for inclusion in the user guide.

Pros:

  • Prior to GA, fixing examples after making breaking changes to the SDK would be easier. Otherwise, Cargo.toml files have to be temporarily modified to point to the aws-sdk-rust/next branch in order to make fixes.
  • If a customer discovers examples via the aws-sdk-rust repository rather than via the SDK user guide, then it would be more obvious how to make changes to examples. At time of writing, the examples in the user guide link to the aws-doc-sdk-examples repository, so if the examples are discovered that way, then updating them should already be clear.

Cons:

  • Tooling would need to be built to sync examples from aws-sdk-rust into aws-doc-sdk-examples so that they could be incorporated into the user guide.
  • Creates a circular dependency between the aws-sdk-rust and smithy-rs repositories. CI in smithy-rs needs to exercise examples, which would be in aws-sdk-rust, and aws-sdk-rust has its code generated by smithy-rs. This is workable, but may lead to problems later on.

The tooling to auto-sync from aws-sdk-rust into aws-doc-sdk-examples will likely cost more than tooling to temporarily update Cargo.toml files to make example fixes (if that tooling is even necessary).

Changes Checklist

  • Add example CI job to smithy-rs
  • Diff examples in smithy-rs and aws-doc-sdk-examples and move desired differences into aws-doc-sdk-examples
  • Apply example fix PRs from aws-sdk-rust into aws-doc-sdk-examples
  • Update smithy-rs CI to copy examples from aws-doc-sdk-examples rather than from smithy-rs
  • Delete examples from smithy-rs

RFC: Waiters

Status: Accepted

Waiters are a convenient polling mechanism to wait for a resource to become available or to be deleted. For example, a waiter could be used to wait for an S3 bucket to be created after a call to the CreateBucket API, and this would only require a small amount of code rather than building out an entire polling mechanism manually.

At the highest level, a waiter is a simple polling loop (pseudo-Rust):

// Track state that contains the number of attempts made and the previous delay
let mut state = initial_state();

loop {
    // Poll the service
    let result = poll_service().await;

    // Classify the action that needs to be taken based on the Smithy model
    match classify(result) {
        // If max attempts hasn't been exceeded, then retry after a delay. Otherwise, error.
        Retry => if state.should_retry() {
            let delay = state.next_retry();
            sleep(delay).await;
        } else {
            return error_max_attempts();
        }
        // Otherwise, if the termination condition was met, return the output
        Terminate(result) => return result,
    }
}

In the AWS SDK for Rust, waiters can be added without making any backwards breaking changes to the current API. This doc outlines the approach to add them in this fashion, but does NOT examine code generating response classification from JMESPath expressions, which can be left to the implementer without concern for the overall API.

Terminology

Today, there are three layers of Client that are easy to confuse, so to keep things clear, the following terms will be used:

  • Connector: An implementor of Tower's Service trait that converts a request into a response. This is typically a thin wrapper around a Hyper client.
  • Smithy Client: An aws_smithy_client::Client<C, M, R> struct that is responsible for gluing together the connector, middleware, and retry policy. This isn't intended to be used directly.
  • Fluent Client: A code generated Client<C, M, R> that has methods for each service operation on it. A fluent builder is generated alongside it to make construction easier.
  • AWS Client: A specialized Fluent Client that uses a DynConnector, DefaultMiddleware, and Standard retry policy.

All of these are just called Client in code today. This is something that could be clarified in a separate refactor.

Requirements

Waiters must adhere to the Smithy waiter specification. To summarize:

  1. Waiters are specified by the Smithy @waitable trait
  2. Retry during polling must be exponential backoff with jitter, with the min/max delay times and max attempts configured by the @waitable trait
  3. The SDK's built-in retry needs to be replaced by the waiter's retry since the Smithy model can specify retry conditions that are contrary to the defaults. For example, an error that would otherwise be retried by default might be the termination condition for the waiter.
  4. Classification of the response must be code generated based on the JMESPath expression in the model.

Waiter API

To invoke a waiter, customers will only need to invoke a single function on the AWS Client. For example, if waiting for a S3 bucket to exist, it would look like the following:

// Request bucket creation
client.create_bucket()
    .bucket_name("my-bucket")
    .send()
    .await?;

// Wait for it to be created
client.wait_until_bucket_exists()
    .bucket_name("my-bucket")
    .send()
    .await?;

The call to wait_until_bucket_exists() will return a waiter-specific fluent builder with a send() function that will start the polling and return a future.

To avoid name conflicts with other API methods, the waiter functions can be added to the client via trait:

pub trait WaitUntilBucketExists {
    fn wait_until_bucket_exists(&self) -> crate::waiter::bucket_exists::Builder;
}

This trait would be implemented for the service's fluent client (which will necessitate making the fluent client's handle field pub(crate)).

Waiter Implementation

A waiter trait implementation will merely return a fluent builder:

impl WaitUntilBucketExists for Client {
    fn wait_until_bucket_exists(&self) -> crate::waiter::bucket_exists::Builder {
        crate::waiter::bucket_exists::Builder::new()
    }
}

This builder will have a short send() function to kick off the actual waiter implementation:

impl Builder {
    // ... existing fluent builder codegen can be reused to create all the setters and constructor

    pub async fn send(self) -> Result<HeadBucketOutput, SdkError<HeadBucketError>> {
        // Builds an input from this builder
        let input = self.inner.build().map_err(|err| aws_smithy_http::result::SdkError::ConstructionFailure(err.into()))?;
        // Passes in the client's handle, which contains a Smithy client and client config
        crate::waiter::bucket_exists::wait(self.handle, input).await
    }
}

This wait function needs to, in a loop similar to the pseudo-code in the beginning, convert the given input into an operation, replace the default response classifier on it with a no-retry classifier, and then determine what to do next based on that classification:

pub async fn wait(
    handle: Arc<Handle<DynConnector, DynMiddleware<DynConnector>, retry::Standard>>,
    input: HeadBucketInput,
) -> Result<HeadBucketOutput, SdkError<HeadBucketError>> {
    loop {
        let operation = input
            .make_operation(&handle.conf)
            .await
            .map_err(|err| {
                aws_smithy_http::result::SdkError::ConstructionFailure(err.into())
            })?;
        // Assume `ClassifyRetry` trait is implemented for `NeverRetry` to always return `RetryKind::Unnecessary`
        let operation = operation.with_retry_classifier(NeverRetry::new());

        let result = handle.client.call(operation).await;
        match classify_result(&input, result) {
            AcceptorState::Retry => {
                // The sleep implementation is available here from `handle.conf.sleep_impl`
                unimplemented!("Check if another attempt should be made and calculate delay time if so")
            }
            AcceptorState::Terminate(output) => return output,
        }
    }
}

fn classify_result(
    input: &HeadBucketInput,
    result: Result<HeadBucketOutput, SdkError<HeadBucketError>>,
) -> AcceptorState<HeadBucketOutput, SdkError<HeadBucketError>> {
    unimplemented!(
        "The Smithy model would dictate conditions to check here to produce an `AcceptorState`"
    )
}

The retry delay time should be calculated by the same exponential backoff with jitter code that the default RetryHandler uses in aws-smithy-client. This function will need to be split up and made available to the waiter implementations so that just the delay can be calculated.
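
For illustration, a minimal sketch of an exponential-backoff-with-full-jitter calculation is shown below; the function and parameter names are hypothetical and not the actual aws-smithy-client code, but the shape of the calculation (exponential growth, a cap, and a random jitter factor) is what would be shared with the waiter implementation.

use std::time::Duration;

// Hypothetical helper: compute the delay before the next attempt using exponential
// backoff capped at `max_delay`, with full jitter applied on top.
fn backoff_with_jitter(attempt: u32, base: Duration, max_delay: Duration) -> Duration {
    // Exponential growth: base * 2^attempt (the shift is clamped to avoid overflow)
    let exponential = base.as_millis().saturating_mul(1 << attempt.min(16)) as u64;
    let capped = Duration::from_millis(exponential).min(max_delay);
    // Full jitter: scale the capped delay by a random factor in [0, 1)
    capped.mul_f64(rand::random::<f64>())
}

fn main() {
    for attempt in 0..5 {
        let delay = backoff_with_jitter(attempt, Duration::from_millis(100), Duration::from_secs(20));
        println!("attempt {attempt}: waiting {delay:?} before retrying");
    }
}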

Changes Checklist

  • Codegen fluent builders for waiter input and their send() functions
  • Codegen waiter invocation traits
  • Commonize exponential backoff with jitter delay calculation
  • Codegen wait() functions with delay and max attempts configuration from Smithy model
  • Codegen classify_result() functions based on JMESPath expressions in Smithy model

RFC: Publishing the Alpha SDK to Crates.io

Status: Implemented

The AWS SDK for Rust and its supporting Smithy crates need to be published to crates.io so that customers can include them in their projects and also publish crates of their own that depend on them.

This doc proposes a short-term solution for publishing to crates.io. This approach is intended to be executed manually by a developer using scripts and an SOP no more than once per week, and should require less than a dev week to implement.

Terminology

  • AWS SDK Crate: A crate that provides a client for calling a given AWS service, such as aws-sdk-s3 for calling S3.
  • AWS Runtime Crate: Any runtime crate that the AWS SDK generated code relies on, such as aws-types.
  • Smithy Runtime Crate: Any runtime crate that the smithy-rs generated code relies on, such as smithy-types.

Requirements

Versioning

Cargo uses semver for versioning, with a major.minor.patch-pre format:

  • major: Incompatible API changes
  • minor: Added functionality in backwards compatible manner
  • patch: Backwards compatible bug fixes
  • pre: Pre-release version tag (omitted for normal releases)

For now, AWS SDK crates (including aws-config) will maintain a consistent major and minor version number across all services. The latest version of aws-sdk-s3 will always have the same major.minor version as the latest aws-sdk-dynamodb, for example. The patch version is allowed to be different between service crates, but it is unlikely that we will make use of patch versions throughout alpha and dev preview. Smithy runtime crates will have different version numbers from the AWS SDK crates, but will also maintain a consistent major.minor.

The pre version tag will be alpha during the Rust SDK alpha, and will be removed once the SDK is in dev preview.

During alpha, the major version will always be 0, and the minor will be bumped for all published crates for every release. A later RFC may change the process during dev preview.

Yanking

Mistakes will inevitably be made, and a mechanism is needed to yank packages while keeping the latest version of the SDK successfully consumable from crates.io. To keep this simple, the entire published batch of crates will be yanked if any crate in that batch needs to be yanked. For example, if 260 crates were published in a batch, and it turns out there's a problem that requires yanking one of them, then all 260 will be yanked. Attempting to do partial yanking will require a lot of effort and be difficult to get right. Yanking should be a last resort.

Concrete Scenarios

The following changes will be bundled together as a minor version bump during weekly releases:

  • AWS model updates
  • New features
  • Bug fixes in runtime crates or codegen

In exceptional circumstances, a patch version will be issued if the fix doesn't require API breaking changes:

  • CVE discovered in a runtime crate
  • Buggy update to a runtime crate

In the event of a CVE being discovered in an external dependency, if the external dependency is internal to a crate, then a patch revision can be issued for that crate to correct it. Otherwise if the CVE is in a dependency that is part of the public API, a minor revision will be issued with an expedited release.

For a CVE in generated code, a minor revision will be issued with an expedited release.

Proposal

The short-term approach builds off our pre-crates.io weekly release process. That process was the following:

  1. Run script to update AWS models
  2. Manually update AWS SDK version in aws/sdk/gradle.properties in smithy-rs
  3. Tag smithy-rs
  4. Wait for GitHub actions to generate AWS SDK using newly released smithy-rs
  5. Check out aws-sdk-rust, delete existing SDK code, unzip generated SDK in place, and update readme
  6. Tag aws-sdk-rust

To keep things simple:

  • The Smithy runtime crates will have the same smithy-rs version
  • All AWS crates will have the same AWS SDK version
  • patch revisions are exceptional and will be one-off manually published by a developer

All runtime crate version numbers in smithy-rs will be locked at 0.0.0-smithy-rs-head. This is a fake version number that gets replaced when generating the SDK.

The SDK generator script in smithy-rs will be updated to:

  • Replace Smithy runtime crate versions with the smithy-rs version from aws/sdk/gradle.properties
  • Replace AWS runtime crate versions with AWS SDK version from aws/sdk/gradle.properties
  • Add correct version numbers to all path dependencies in all the final crates that end up in the build artifacts

This will result in all the crates having the correct version and manifests when imported into aws-sdk-rust. From there, a script needs to be written to determine crate dependency order, and publish crates (preferably with throttling and retry) in the correct order. This script needs to be able to recover from an interruption part way through publishing all the crates, and it also needs to output a list of all crate versions published together. This crate list will be commented on the release issue so that yanking the batch can be done if necessary.
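
As a sketch of what that publish script's ordering step could look like (assuming a simple map of each crate to its in-repo dependencies, which a real tool would read from the Cargo.toml manifests), a batch can be published layer by layer: anything whose in-repo dependencies are already on crates.io is safe to publish next.

use std::collections::{BTreeMap, BTreeSet};

// Hypothetical helper: produce a publish order from crate dependency data.
fn publish_order(deps: &BTreeMap<&str, BTreeSet<&str>>) -> Vec<String> {
    let mut remaining: BTreeMap<&str, BTreeSet<&str>> = deps.clone();
    let mut order = Vec::new();
    while !remaining.is_empty() {
        // Publish every crate whose in-repo dependencies have already been published.
        let ready: Vec<&str> = remaining
            .iter()
            .filter(|(_, d)| d.iter().all(|dep| !remaining.contains_key(dep)))
            .map(|(name, _)| *name)
            .collect();
        assert!(!ready.is_empty(), "dependency cycle detected");
        for name in ready {
            remaining.remove(name);
            order.push(name.to_string());
        }
    }
    order
}

fn main() {
    let mut deps = BTreeMap::new();
    deps.insert("aws-sdk-s3", BTreeSet::from(["aws-types", "aws-smithy-types"]));
    deps.insert("aws-types", BTreeSet::from(["aws-smithy-types"]));
    deps.insert("aws-smithy-types", BTreeSet::new());
    // Prints: ["aws-smithy-types", "aws-types", "aws-sdk-s3"]
    println!("{:?}", publish_order(&deps));
}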

The new release process would be:

  1. Run script to update AWS models
  2. Manually update both the AWS SDK version and the smithy-rs version in aws/sdk/gradle.properties in smithy-rs
  3. Tag smithy-rs
  4. Wait for automation to sync changes to aws-sdk-rust/next
  5. Cut a PR to merge aws-sdk-rust/next into aws-sdk-rust/main
  6. Tag aws-sdk-rust
  7. Run publish script

Short-term Changes Checklist

  • Prepare runtime crate manifests for publication to crates.io (https://github.com/smithy-lang/smithy-rs/pull/755)
  • Update SDK generator to set correct crate versions (https://github.com/smithy-lang/smithy-rs/pull/755)
  • Write bulk publish script
  • Write bulk yank script
  • Write automation to sync smithy-rs to aws-sdk-rust

RFC: Independent Crate Versioning

Status: RFC

During its alpha and dev preview releases, the AWS SDK for Rust adopted a short-term solution for versioning and publishing to crates.io. This doc proposes a long-term versioning strategy that will carry the SDK from dev preview into general availability.

This strategy will be implemented in two phases:

  1. Dev Preview: The SDK will break with its current version strategy of maintaining consistent major.minor version numbers.
  2. Stability and 1.x: This phase begins when the SDK becomes generally available. The major version will be bumped to 1, and backwards breaking changes will no longer be allowed without a major version bump to all crates in the SDK.

Terminology

  • AWS SDK Crate: A crate that provides a client for calling a given AWS service, such as aws-sdk-s3 for calling S3.
  • AWS Runtime Crate: Any runtime crate that the AWS SDK generated code relies on, such as aws-types.
  • Smithy Runtime Crate: Any runtime crate that the smithy-rs generated code relies on, such as smithy-types.

Requirements

Versioning

Cargo uses semver for versioning, with a major.minor.patch-pre format:

  • major: Incompatible API changes
  • minor: Added functionality in backwards compatible manner
  • patch: Backwards compatible bug fixes
  • pre: Pre-release version tag (omitted for normal releases)

In the new versioning strategy, the minor version number will no longer be coordinated across all SDK and Smithy runtime crates.

During phase 1, the major version will always be 0, and the following scheme will be used:

  • minor:
    • New features
    • Breaking changes
    • Dependency updates for dependencies that are part of the public API
    • Model updates with API changes
    • For code-generated crates: when a newer version of smithy-rs is used to generate the crate
  • patch:
    • Bug fixes that do not break backwards compatibility
    • Model updates that only have documentation changes

During phase 2:

  • major: Breaking changes
  • minor:
    • Changes that aren't breaking
    • Dependency updates for dependencies that are part of the public API
    • Model updates with API changes
    • For code-generated crates: when a newer version of smithy-rs is used to generate the crate
  • patch:
    • Bug fixes that do not break backwards compatibility
    • Model updates that only have documentation changes

During phase 2, bumps to the major version must be coordinated across all SDK and runtime crates.

Release Identification

Since there will no longer be one SDK "version", release tags will be dates in YYYY-MM-DD format rather than version numbers. Additionally, the SDK's user agent string will need to include a separate service version number (this requirement has already been implemented).

Yanking

It must be possible to yank an entire release with a single action. The publisher tool must be updated to understand which crate versions were released with a given release tag, and be able to yank all the crates published from that tag.

Phase 1: Dev Preview

Phase 1 will address the following challenges introduced by uncoordinating the major.minor versions:

  • Tracking of versions associated with a release tag
  • Creation of version bump process for code generated crates
  • Enforcement of version bump process in runtime crates
  • Yanking of versions associated with a release tag

Version Tracking

A new manifest file will be introduced in the root of aws-sdk-rust named versions.toml that describes all versioning information for any given commit in the repository. In the main branch, the versions.toml in tagged commits will become the source of truth for which crate versions belong to that release, as well as additional metadata that's required for maintaining version process in the future.

The special 0.0.0-smithy-rs-head version that is used prior to Phase 1 for maintaining the runtime crate versions will no longer be used (as detailed in Versioning for Runtime Crates).

This format will look as follows:

smithy_rs_version = "<release-tag|commit-hash>"

[aws-smithy-types]
version = "0.50.1"

[aws-config]
version = "0.40.0"

[aws-sdk-s3]
version = "0.89.0"
model_hash = "<hash>"

# ...

The auto-sync tool is responsible for maintaining this file. When it generates a new SDK, it will take the version numbers from runtime crates directly, and it will use the rules from the next section to determine the version numbers for the generated crates.

Versioning for Code Generated (SDK Service) Crates

Code generated crates will have their minor version bumped when the version of smithy-rs used to generate them changes, or when model updates with API changes are made. Three pieces of information are required to handle this process: the previously released version number, the smithy-rs version used to generate the code, and the level of model updates being applied. For the last of these, if several model updates affect only documentation but even one update affects an API, the whole batch is treated as affecting an API and requires a minor version bump.

The previously released version number will be retrieved from crates.io using its API. The smithy-rs version used during code generation will become a build artifact that is saved to versions.toml in aws-sdk-rust. During phase 1, the tooling required to know if a model is a documentation-only change will not be available, so all model changes will result in a minor version bump during this phase.

Overall, determining a generated crate's version number looks as follows:

flowchart TD
    start[Generate crate version] --> smithyrschanged{A. smithy-rs changed?}
    smithyrschanged -- Yes --> minor1[Minor version bump]
    smithyrschanged -- No --> modelchanged{B. model changed?}
    modelchanged -- Yes --> minor2[Minor version bump]
    modelchanged -- No --> keep[Keep current version]
  • A: smithy-rs changed?: Compare the smithy_rs_version in the previous versions.toml with the next versions.toml file, and if the values are different, consider smithy-rs to have changed.
  • B: model changed?: Similarly, compare the model_hash for the crate in versions.toml.
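
A minimal sketch of that decision is shown below; the types are hypothetical stand-ins for data that real tooling would read from the previous and next versions.toml files (and from crates.io for the previous version).

// Hypothetical inputs; real tooling would read these from the previous and next
// `versions.toml` files (and the previous version from crates.io).
#[derive(Clone, Copy, PartialEq, Debug)]
struct Version { major: u32, minor: u32, patch: u32 }

struct GeneratedCrate<'a> {
    previous_version: Version,
    previous_smithy_rs: &'a str,
    next_smithy_rs: &'a str,
    previous_model_hash: &'a str,
    next_model_hash: &'a str,
}

fn next_version(c: &GeneratedCrate) -> Version {
    let smithy_rs_changed = c.previous_smithy_rs != c.next_smithy_rs; // A
    let model_changed = c.previous_model_hash != c.next_model_hash;   // B
    if smithy_rs_changed || model_changed {
        // During phase 1 every model change is treated as an API change,
        // so both paths result in a minor version bump.
        Version { minor: c.previous_version.minor + 1, patch: 0, ..c.previous_version }
    } else {
        // Neither smithy-rs nor the model changed: keep the current version.
        c.previous_version
    }
}

fn main() {
    let c = GeneratedCrate {
        previous_version: Version { major: 0, minor: 12, patch: 0 },
        previous_smithy_rs: "release-2022-01-01",
        next_smithy_rs: "release-2022-01-08",
        previous_model_hash: "abc",
        next_model_hash: "abc",
    };
    // smithy-rs changed, so this prints a minor bump: Version { major: 0, minor: 13, patch: 0 }
    println!("{:?}", next_version(&c));
}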

Versioning for Runtime Crates

The old scheme of all runtime crates in smithy-rs having a fake 0.0.0-smithy-rs-head version number with a build step to replace those with a consistent major.minor will be removed. These runtime crates will begin having their actual next version number in the Cargo.toml file in smithy-rs.

This introduces a new problem where a developer can forget to bump a runtime crate version, so a method of process enforcement needs to be introduced. This will be done through CI when merging into smithy-rs/main and repeated when merging into aws-sdk-rust/main.

The following checks need to be run for runtime crates:

flowchart TD
    A[Check runtime crate] --> B{A. Crate has changed?}
    B -- Yes --> C{B. Minor bumped?}
    B -- No --> H{C. Version changed?}
    C -- Yes --> K[Pass]
    C -- No --> E{D. Patch bumped?}
    E -- Yes --> F{E. Semverver passes?}
    E -- No --> L[Fail]
    F -- Yes --> D[Pass]
    F -- No --> G[Fail]
    H -- Yes --> I[Fail]
    H -- No --> J[Pass]
  • A: Crate has changed? The crate's source files and manifest will be hashed for the previous version and the next version. If these hashes match, then the crate is considered unchanged.
  • B: Minor bumped? The previous version is compared against the next version to see if the minor version number was bumped.
  • C: Version changed? The previous version is compared against the next version to see if it changed.
  • D: Patch bumped? The previous version is compared against the next version to see if the patch version number was bumped.
  • E: Semverver passes? Runs rust-semverver against the old and new versions of the crate.
    • If semverver fails to run (for example, if it needs to be updated to the latest nightly to succeed), then fail CI saying that either semverver needs maintenance, or that a minor version bump is required.
    • If semverver results in errors, fail CI indicating a minor version bump is required.
    • If semverver passes, then pass CI.
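
As a sketch, the same decision can be expressed as code; the boolean inputs are hypothetical and, in CI, would come from hashing crate sources, comparing manifest versions, and running rust-semverver.

// Hypothetical CI check mirroring the flowchart above.
#[derive(Debug)]
enum CheckResult { Pass, Fail(&'static str) }

fn check_runtime_crate(
    crate_changed: bool,    // A: source/manifest hash differs from the previous release
    version_changed: bool,  // C: version number differs from the previous release
    minor_bumped: bool,     // B: minor version was bumped
    patch_bumped: bool,     // D: patch version was bumped
    semverver_passed: bool, // E: rust-semverver reported no breaking changes
) -> CheckResult {
    if !crate_changed {
        if version_changed {
            CheckResult::Fail("version was bumped but the crate did not change")
        } else {
            CheckResult::Pass
        }
    } else if minor_bumped {
        CheckResult::Pass
    } else if patch_bumped {
        if semverver_passed {
            CheckResult::Pass
        } else {
            CheckResult::Fail("semverver found breaking changes; a minor bump is required")
        }
    } else {
        CheckResult::Fail("the crate changed but its version was not bumped")
    }
}

fn main() {
    // A changed crate with only a patch bump must pass semverver to pass CI.
    println!("{:?}", check_runtime_crate(true, true, false, true, true));
}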

When running semverver, the path dependencies of the crate under examination should be updated to be crates.io references if there were no changes in those crates since the last publish to crates.io. Otherwise, the types referenced from those crates in the public API will always result in breaking changes since, as far as the Rust compiler is concerned, they are different types originating from separate path-dependency crates.

For CI, the aws-sdk-rust/main branch's versions.toml file is the source of truth for the previous release's crate versions and source code.

Yanking

The publisher tool will be updated to read the versions.toml to yank all versions published in a release. This process will look as follows:

  1. Take a path to a local clone of the aws-sdk-rust repository
  2. Confirm the working tree is currently unmodified and on a release tag.
  3. Read versions.toml and print out summary of crates to yank
  4. Confirm with user before proceeding
  5. Yank crates

Changes Checklist

  • Update rust-semverver to a newer nightly that can compile aws-smithy-client
  • Establish initial versions.toml in aws-sdk-rust/main
  • Set version numbers in runtime crates in smithy-rs
  • Update the auto-sync tool to generate versions.toml
  • Create CI tool to check runtime crate version
    • Integrate with smithy-rs/main CI
    • Integrate with aws-sdk-rust/main CI
  • Update CI to verify no older runtime crates are used. For example, if aws-smithy-client is bumped to 0.50.0, then verify no crates (generated or runtime) depend on 0.49.0 or lower.

Estimate: 2-4 dev weeks

Phase 2: Stability and 1.x

When stabilizing to 1.x, the version process will stay the same, but the minor version bumps caused by bumping runtime crate versions, updating models, or changing the code generator will become candidates for automatic upgrade per semver. At that point, no further API breaking changes can be made without a major version bump.

RFC: Callback APIs for ByteStream and SdkBody

Status: RFC

Adding a callback API to ByteStream and SdkBody will enable developers using the SDK to implement things like checksum validations and 'read progress' callbacks.

The Implementation

Note that comments starting with '//' are not necessarily going to be included in the actual implementation and are intended as clarifying comments for the purposes of this RFC.

// in aws_smithy_http::callbacks...

/// A callback that, when inserted into a request body, will be called for corresponding lifecycle events.
trait BodyCallback: Send {
   /// This lifecycle function is called for each chunk **successfully** read. If an error occurs while reading a chunk,
   /// this method will not be called. This method takes `&mut self` so that implementors may modify an implementing
   /// struct/enum's internal state. Implementors may return an error.
   fn update(&mut self, #[allow(unused_variables)] bytes: &[u8]) -> Result<(), BoxError> { Ok(()) }

   /// This callback is called once all chunks have been read. If the callback encountered one or more errors
   /// while running `update`s, this is how those errors are raised. Implementors may return a [`HeaderMap`][HeaderMap]
   /// that will be appended to the HTTP body as a trailer. This is only useful to do for streaming requests.
   fn trailers(&self) -> Result<Option<HeaderMap<HeaderValue>>, BoxError> { Ok(None) }

   /// Create a new `BodyCallback` from an existing one. This is called when a `BodyCallback` needs to be
   /// re-initialized with default state. For example: when a request has a body that needs to be
   /// rebuilt, all callbacks for that body need to be run again but with a fresh internal state.
   fn make_new(&self) -> Box<dyn BodyCallback>;
}

impl BodyCallback for Box<dyn BodyCallback> {
   // Delegate to the inner trait object rather than recursing into this impl
   fn update(&mut self, bytes: &[u8]) -> Result<(), BoxError> { (**self).update(bytes) }
   fn trailers(&self) -> Result<Option<HeaderMap<HeaderValue>>, BoxError> { (**self).trailers() }
   fn make_new(&self) -> Box<dyn BodyCallback> { (**self).make_new() }
}

The changes we need to make to ByteStream:

(The current version of ByteStream and Inner can be seen here.)

// in `aws_smithy_http::byte_stream`...

// We add a new method to `ByteStream` for inserting callbacks
impl ByteStream {
    // ...other impls omitted

    // A "builder-style" method for setting callbacks
    pub fn with_body_callback(&mut self, body_callback: Box<dyn BodyCallback>) -> &mut Self {
        self.inner.with_body_callback(body_callback);
        self
    }
}

impl Inner<SdkBody> {
    // `Inner` wraps an `SdkBody` which has a "builder-style" function for adding callbacks.
    pub fn with_body_callback(&mut self, body_callback: Box<dyn BodyCallback>) -> &mut Self {
        self.body.with_body_callback(body_callback);
        self
    }
}

The changes we need to make to SdkBody:

(The current version of SdkBody can be seen here.)

// In aws_smithy_http::body...

#[pin_project]
pub struct SdkBody {
    #[pin]
    inner: Inner,
    rebuild: Option<Arc<dyn (Fn() -> Inner) + Send + Sync>>,
    // We add a `Vec` to store the callbacks
    #[pin]
    callbacks: Vec<Box<dyn BodyCallback>>,
}

impl SdkBody {
    // We update the various fns that create `SdkBody`s to create an empty `Vec` to store callbacks.
    // Those updates are very simple so I've omitted them from this code example.

    fn poll_inner(
        self: Pin<&mut Self>,
        cx: &mut Context<'_>,
    ) -> Poll<Option<Result<Bytes, Error>>> {
        let mut this = self.project();
        // This block is old. I've included for context.
        let polling_result = match this.inner.project() {
            InnerProj::Once(ref mut opt) => {
                let data = opt.take();
                match data {
                    Some(bytes) if bytes.is_empty() => Poll::Ready(None),
                    Some(bytes) => Poll::Ready(Some(Ok(bytes))),
                    None => Poll::Ready(None),
                }
            }
            InnerProj::Streaming(body) => body.poll_data(cx).map_err(|e| e.into()),
            InnerProj::Dyn(box_body) => box_body.poll_data(cx),
            InnerProj::Taken => {
                Poll::Ready(Some(Err("A `Taken` body should never be polled".into())))
            }
        };

        // This block is new.
        match &polling_result {
            // When we get some bytes back from polling, pass those bytes to each callback in turn
            Poll::Ready(Some(Ok(bytes))) => {
               for callback in this.callbacks.iter_mut() {
                  // Callbacks can run into errors when reading bytes. They'll be surfaced here
                  callback.update(bytes)?;
               }
            }
            // When we're done polling for bytes, run each callback's `trailers()` method. If any calls to
            // `trailers()` return an error, propagate that error up. Otherwise, continue.
            Poll::Ready(None) => {
                for callback_result in this.callbacks.iter().map(BodyCallback::trailers) {
                    if let Err(e) = callback_result {
                        return Poll::Ready(Some(Err(e)));
                    }
                }
            }
            _ => (),
        }

        // Now that we've inspected the polling result, all that's left to do is to return it.
        polling_result
    }

    // This function now has the added responsibility of cloning callback functions (but with fresh state)
    // in the case that the `SdkBody` needs to be rebuilt.
    pub fn try_clone(&self) -> Option<Self> {
        self.rebuild.as_ref().map(|rebuild| {
            let next = rebuild();
            let callbacks = self
                .callbacks
                .iter()
                .map(|callback| callback.make_new())
                .collect();

            Self {
                inner: next,
                rebuild: self.rebuild.clone(),
                callbacks,
            }
        })
    }

    pub fn with_callback(&mut self, callback: Box<dyn BodyCallback>) -> &mut Self {
        self.callbacks.push(callback);
        self
    }
}

/// Given two [`HeaderMap`][HeaderMap]s, merge them together and return the merged `HeaderMap`. If the
/// two `HeaderMap`s share any keys, values from the right `HeaderMap` will be appended to the left `HeaderMap`.
///
/// # Example
///
/// ```rust
/// let header_name = HeaderName::from_static("some_key");
///
/// let mut left_hand_side_headers = HeaderMap::new();
/// left_hand_side_headers.insert(
///     header_name.clone(),
///     HeaderValue::from_str("lhs value").unwrap(),
/// );
///
/// let mut right_hand_side_headers = HeaderMap::new();
/// right_hand_side_headers.insert(
///     header_name.clone(),
///     HeaderValue::from_str("rhs value").unwrap(),
/// );
///
/// let merged_header_map =
///     append_merge_header_maps(left_hand_side_headers, right_hand_side_headers);
/// let merged_values: Vec<_> = merged_header_map
///     .get_all(header_name.clone())
///     .into_iter()
///     .collect();
///
/// // Will print 'some_key: ["lhs value", "rhs value"]'
/// println!("{}: {:?}", header_name.as_str(), merged_values);
/// ```
fn append_merge_header_maps(
    mut lhs: HeaderMap<HeaderValue>,
    rhs: HeaderMap<HeaderValue>,
) -> HeaderMap<HeaderValue> {
    let mut last_header_name_seen = None;
    for (header_name, header_value) in rhs.into_iter() {
        // If a yielded item has `None` for the `HeaderName`, then its header name
        // is the same as that of the previously yielded item. The first yielded
        // item will always have a `HeaderName` set.
        // https://docs.rs/http/latest/http/header/struct.HeaderMap.html#method.into_iter-2
        match (&mut last_header_name_seen, header_name) {
            (_, Some(header_name)) => {
                lhs.append(header_name.clone(), header_value);
                last_header_name_seen = Some(header_name);
            }
            (Some(header_name), None) => {
                lhs.append(header_name.clone(), header_value);
            }
            (None, None) => unreachable!(),
        };
    }

    lhs
}

impl http_body::Body for SdkBody {
    // The other methods have been omitted because they haven't changed

    fn poll_trailers(
        self: Pin<&mut Self>,
        _cx: &mut Context<'_>,
    ) -> Poll<Result<Option<HeaderMap<HeaderValue>>, Self::Error>> {
        let mut header_maps = Vec::new();
        for callback in self.callbacks.iter() {
            match callback.trailers() {
                Ok(Some(header_map)) => header_maps.push(header_map),
                Ok(None) => {}
                // early return if a callback encountered an error
                Err(e) => return Poll::Ready(Err(e)),
            }
        }
        // Merge any `HeaderMap`s from the callbacks together, one by one.
        let header_map = header_maps.into_iter().reduce(append_merge_header_maps);

        Poll::Ready(Ok(header_map))
    }
}

Implementing Checksums

What follows is a simplified example of how this API could be used to introduce checksum validation for outgoing request payloads. In this example, the checksum calculation is fallible and no validation takes place. All it does is calculate the checksum of some data and then return that checksum when trailers is called. This is fine because it's being used to calculate the checksum of a streaming body for a request.

#[derive(Default)]
struct Crc32cChecksumCallback {
    state: Option<u32>,
}

impl BodyCallback for Crc32cChecksumCallback {
    fn update(&mut self, bytes: &[u8]) -> Result<(), BoxError> {
        self.state = Some(match self.state {
            Some(crc) => crc32c_append(crc, bytes),
            None => crc32c(bytes),
        });

       Ok(())
    }

    fn trailers(&self) -> Result<Option<HeaderMap<HeaderValue>>, BoxError> {
        let mut header_map = HeaderMap::new();
        // This checksum name is an Amazon standard and would be a `const` in the real implementation
        let key = HeaderName::from_static("x-amz-checksum-crc32c");
        // If no data was provided to this callback and no CRC was ever calculated, we return zero as the checksum.
        let crc = self.state.unwrap_or_default();
        // Convert the CRC to a string, base 64 encode it, and then convert it into a `HeaderValue`.
        let value = HeaderValue::from_str(&base64::encode(crc.to_string())).expect("base64 will always produce valid header values");

        header_map.insert(key, value);

        Ok(Some(header_map))
    }

    fn make_new(&self) -> Box<dyn BodyCallback> {
        Box::new(Crc32cChecksumCallback::default())
    }
}

NOTE: If Crc32cChecksumCallback needed to validate a response, then we could modify it to check its internal state against a target checksum value, and calling trailers would produce an error if the values didn't match.
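
A minimal sketch of that validating variant is shown below; it reuses the BodyCallback trait and crc32c helpers from the examples above, and the expected value would come from, for example, the response's checksum header (the struct and field names are hypothetical).

struct Crc32cChecksumValidator {
    state: Option<u32>,
    expected: u32,
}

impl BodyCallback for Crc32cChecksumValidator {
    fn update(&mut self, bytes: &[u8]) -> Result<(), BoxError> {
        self.state = Some(match self.state {
            Some(crc) => crc32c_append(crc, bytes),
            None => crc32c(bytes),
        });
        Ok(())
    }

    fn trailers(&self) -> Result<Option<HeaderMap<HeaderValue>>, BoxError> {
        let calculated = self.state.unwrap_or_default();
        if calculated != self.expected {
            // Surface the mismatch as an error instead of appending a trailer
            return Err(format!(
                "CRC32C mismatch: expected {}, calculated {}",
                self.expected, calculated
            )
            .into());
        }
        // Nothing needs to be appended to the body when validating a response
        Ok(None)
    }

    fn make_new(&self) -> Box<dyn BodyCallback> {
        // Reset the running state but keep the expected value
        Box::new(Crc32cChecksumValidator { state: None, expected: self.expected })
    }
}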

In order to use this in a request, we'd modify codegen for that request's service.

  1. We'd check if the user had requested validation and also check if they'd pre-calculated a checksum.
  2. If validation was requested but no pre-calculated checksum was given, we'd create a callback similar to the one above
  3. Then, we'd create a new checksum callback and:
    • (if streaming) we'd set the checksum callback on the request body object
    • (if non-streaming) we'd immediately read the body and call BodyCallback::update manually. Once all data was read, we'd get the checksum by calling trailers and insert that data as a request header.

RFC: Fine-grained timeout configuration

Status: Implemented

For a summarized list of proposed changes, see the Changes Checklist section.

While it is currently possible for users to implement request timeouts by racing operation send futures against timeout futures, this RFC proposes a more ergonomic solution that would also enable users to set timeouts for things like TLS negotiation and "time to first byte".

Terminology

There's a lot of terminology to define, so I've broken it up into three sections.

General terms

  • Smithy Client: An aws_smithy_client::Client<C, M, R> struct that is responsible for gluing together the connector, middleware, and retry policy. This is not generated and lives in the aws-smithy-client crate.
  • Fluent Client: A code-generated Client<C, M, R> that has methods for each service operation on it. A fluent builder is generated alongside it to make construction easier.
  • AWS Client: A specialized Fluent Client that defaults to using a DynConnector, AwsMiddleware, and Standard retry policy.
  • Shared Config: An aws_types::Config struct that is responsible for storing shared configuration data that is used across all services. This is not generated and lives in the aws-types crate.
  • Service-specific Config: A code-generated Config that has methods for setting service-specific configuration. Each Config is defined in the config module of its parent service. For example, the S3-specific config struct is useable from aws_sdk_s3::config::Config and re-exported as aws_sdk_s3::Config. In this case, "service" refers to an AWS offering like S3.

HTTP stack terms

  • Service: A trait defined in the tower-service crate. The lowest level of abstraction we deal with when making HTTP requests. Services act directly on data to transform and modify that data. A Service is what eventually turns a request into a response.
  • Layer: Layers are a higher-order abstraction over services that is used to compose multiple services together, creating a new service from that combination. Nothing prevents us from manually wrapping services within services, but Layers allow us to do it in a flexible and generic manner. Layers don't directly act on data but instead can wrap an existing service with additional functionality, creating a new service. Layers can be thought of as middleware. NOTE: The use of Layers can produce compiler errors that are difficult to interpret and defining a layer requires a large amount of boilerplate code.
  • Middleware: a term with several meanings,
    • Generically speaking, middleware are similar to Services and Layers in that they modify requests and responses.
    • In the SDK, "Middleware" refers to a layer that can be wrapped around a DispatchService. In practice, this means that the resulting Service (and the inner service) must meet the bound T: Service<operation::Request, Response = operation::Response, Error = SendOperationError>.
      • Note: This doesn't apply to the middlewares we use when generating presigned request because those don't wrap a DispatchService.
    • The most notable example of a Middleware is the AwsMiddleware. Other notable examples include MapRequest, AsyncMapRequest, and ParseResponse.
  • DispatchService: The innermost part of a group of nested services. The Service that actually makes an HTTP call on behalf of a request. Responsible for parsing success and error responses.
  • Connector: a term with several meanings,
    • DynConnectors (a struct that implements DynConnect) are Services with their specific type erased so that we can do dynamic dispatch.
    • A term from hyper for any object that implements the Connect trait. Really just an alias for tower_service::Service. Sometimes referred to as a Connection.
  • Stage: A form of middleware that's not related to tower. These currently function as a way of transforming requests and don't have the ability to transform responses.
  • Stack: higher order abstraction over Layers defined in the tower crate e.g. Layers wrap services in one another and Stacks wrap layers within one another.

Timeout terms

  • Connect Timeout: A limit on the amount of time after making an initial connect attempt on a socket to complete the connect-handshake.
    • TODO: the runtime is based on Hyper, which reuses connections and doesn't currently have a way of guaranteeing that a fresh connection will be used for a given request.
  • TLS Negotiation Timeout: A limit on the amount of time a TLS handshake takes from when the CLIENT HELLO message is sent to the time the client and server have fully negotiated ciphers and exchanged keys.
  • Time to First Byte Timeout: Sometimes referred to as a "read timeout." A limit on the amount of time an application takes to attempt to read the first byte over an established, open connection after a write request.
  • HTTP Request Timeout For A Single Attempt: A limit on the amount of time it takes for the first byte to be sent over an established, open connection and when the last byte is received from the service.
  • HTTP Request Timeout For Multiple Attempts: This timeout acts like the previous timeout but constrains the total time it takes to make a request plus any retries.

Configuring timeouts

Just like with Retry Behavior Configuration, these settings can be configured in several places and have the same precedence rules (paraphrased here for clarity).

  1. Service-specific config builders
  2. Shared config builders
  3. Environment variables
  4. Profile config file (e.g., ~/.aws/credentials)

The above list is in order of decreasing precedence e.g. configuration set in an app will override values from environment variables.

Configuration options

The table below details the specific ways each timeout can be configured. In all cases, valid values are non-negative floats representing the number of seconds before a timeout is triggered.

Timeout                       | Environment Variable          | AWS Config Variable       | Builder Method
------------------------------|-------------------------------|---------------------------|---------------------------
Connect                       | AWS_CONNECT_TIMEOUT           | connect_timeout           | connect_timeout
TLS Negotiation               | AWS_TLS_NEGOTIATION_TIMEOUT   | tls_negotiation_timeout   | tls_negotiation_timeout
Time To First Byte            | AWS_READ_TIMEOUT              | read_timeout              | read_timeout
HTTP Request - single attempt | AWS_API_CALL_ATTEMPT_TIMEOUT  | api_call_attempt_timeout  | api_call_attempt_timeout
HTTP Request - all attempts   | AWS_API_CALL_TIMEOUT          | api_call_timeout          | api_call_timeout

SDK-specific defaults set by AWS service teams

QUESTION: How does the SDK currently handle these defaults?

Prior Art

  • hjr3/hyper-timeout is a Connector for hyper that enables setting connect, read, and write timeouts
  • sfackler/tokio-io-timeout provides timeouts for tokio IO operations. Used within hyper-timeout.
  • tokio::time::sleep_until creates a Future that completes after some time has elapsed. Used within tokio-io-timeout.

Behind the scenes

Timeouts are achieved by racing a future against a tokio::time::Sleep future. The question, then, is "how can I create a future that represents a condition I want to watch for?". For example, in the case of a ConnectTimeout, how do we watch an ongoing request to see if it's completed the connect-handshake? Our current stack of Middleware acts on requests at different levels of granularity. The timeout Middlewares will be no different.
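
As a minimal illustration of the underlying racing technique (not the proposed TimeoutLayer itself), an operation future can be raced against tokio::time::sleep with tokio::select!; the hypothetical fake_operation below stands in for a real request future.

use std::time::Duration;

// Stand-in for a future that sends a request and waits for the response
async fn fake_operation() -> Result<&'static str, &'static str> {
    tokio::time::sleep(Duration::from_millis(50)).await;
    Ok("response")
}

#[tokio::main]
async fn main() {
    let timeout = Duration::from_millis(10);
    let result = tokio::select! {
        // Whichever future completes first decides the outcome
        res = fake_operation() => res,
        _ = tokio::time::sleep(timeout) => Err("the operation timed out"),
    };
    // Prints `Err("the operation timed out")` because the sleep finishes first
    println!("{result:?}");
}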

Middlewares for AWS Client requests

View AwsMiddleware in GitHub

#[derive(Debug, Default)]
#[non_exhaustive]
pub struct AwsMiddleware;
impl<S> tower::Layer<S> for AwsMiddleware {
  type Service = <AwsMiddlewareStack as tower::Layer<S>>::Service;

  fn layer(&self, inner: S) -> Self::Service {
    let credential_provider = AsyncMapRequestLayer::for_mapper(CredentialsStage::new());
    let signer = MapRequestLayer::for_mapper(SigV4SigningStage::new(SigV4Signer::new()));
    let endpoint_resolver = MapRequestLayer::for_mapper(AwsAuthStage);
    let user_agent = MapRequestLayer::for_mapper(UserAgentStage::new());
    ServiceBuilder::new()
            .layer(endpoint_resolver)
            .layer(user_agent)
            .layer(credential_provider)
            .layer(signer)
            .service(inner)
  }
}

The above code is only included for context. This RFC doesn't define any timeouts specific to AWS so AwsMiddleware won't require any changes.

Middlewares for Smithy Client requests

View aws_smithy_client::Client::call_raw in GitHub

impl<C, M, R> Client<C, M, R>
  where
          C: bounds::SmithyConnector,
          M: bounds::SmithyMiddleware<C>,
          R: retry::NewRequestPolicy,
{
  // ...other methods omitted
  pub async fn call_raw<O, T, E, Retry>(
    &self,
    input: Operation<O, Retry>,
  ) -> Result<SdkSuccess<T>, SdkError<E>>
    where
            R::Policy: bounds::SmithyRetryPolicy<O, T, E, Retry>,
            bounds::Parsed<<M as bounds::SmithyMiddleware<C>>::Service, O, Retry>:
            Service<Operation<O, Retry>, Response=SdkSuccess<T>, Error=SdkError<E>> + Clone,
  {
    let connector = self.connector.clone();

    let mut svc = ServiceBuilder::new()
            // Create a new request-scoped policy
            .retry(self.retry_policy.new_request_policy())
            .layer(ParseResponseLayer::<O, Retry>::new())
            // These layers can be considered as occurring in order. That is, first invoke the
            // customer-provided middleware, then dispatch the request over the wire.
            .layer(&self.middleware)
            .layer(DispatchLayer::new())
            .service(connector);

    svc.ready().await?.call(input).await
  }
}

The Smithy Client creates a new Stack of services to handle each request it sends. Specifically:

  • A method retry is used to set the retry handler. The configuration for this was set during creation of the Client.
  • ParseResponseLayer inserts a service for transforming responses into operation-specific outputs or errors. The O generic parameter of input is what decides exactly how the transformation is implemented.
  • A middleware stack that was included during Client creation is inserted into the stack. In the case of the AWS SDK, this would be AwsMiddleware.
  • DispatchLayer inserts a service for transforming an http::Request into an operation::Request. It's also responsible for re-attaching the property bag from the Operation that triggered the request.
  • The innermost Service is a DynConnector wrapping a hyper client (which one depends on which TLS implementation was enabled by cargo features).

The HTTP Request Timeout For A Single Attempt and HTTP Request Timeout For Multiple Attempts can be implemented at this level. The same Layer can be used to create both TimeoutServices. The TimeoutLayer would require two inputs:

  • sleep_fn: A runtime-specific implementation of sleep. The SDK is currently tokio-based and would default to tokio::time::sleep (this default is set in the aws_smithy_async::rt::sleep module.)
  • The duration of the timeout as a std::time::Duration

The resulting code would look like this:

impl<C, M, R> Client<C, M, R>
  where
          C: bounds::SmithyConnector,
          M: bounds::SmithyMiddleware<C>,
          R: retry::NewRequestPolicy,
{
  // ...other methods omitted
  pub async fn call_raw<O, T, E, Retry>(
    &self,
    input: Operation<O, Retry>,
  ) -> Result<SdkSuccess<T>, SdkError<E>>
    where
            R::Policy: bounds::SmithyRetryPolicy<O, T, E, Retry>,
            bounds::Parsed<<M as bounds::SmithyMiddleware<C>>::Service, O, Retry>:
            Service<Operation<O, Retry>, Response=SdkSuccess<T>, Error=SdkError<E>> + Clone,
  {
    let connector = self.connector.clone();
    let sleep_fn = aws_smithy_async::rt::sleep::default_async_sleep();

    let mut svc = ServiceBuilder::new()
            .layer(TimeoutLayer::new(
              // Clone so the sleep implementation can also be used by the inner timeout layer
              sleep_fn.clone(),
              self.timeout_config.api_call_timeout(),
            ))
            // Create a new request-scoped policy
            .retry(self.retry_policy.new_request_policy())
            .layer(TimeoutLayer::new(
              sleep_fn,
              self.timeout_config.api_call_attempt_timeout(),
            ))
            .layer(ParseResponseLayer::<O, Retry>::new())
            // These layers can be considered as occurring in order. That is, first invoke the
            // customer-provided middleware, then dispatch over the wire.
            .layer(&self.middleware)
            .layer(DispatchLayer::new())
            .service(connector);

    svc.ready().await?.call(input).await
  }
}

Note: Our HTTP client supports multiple TLS implementations. We'll likely have to implement this feature once per library.

Timeouts will be implemented in the following places:

  • HTTP request timeout for multiple requests will be implemented as the outermost Layer in Client::call_raw.
  • HTTP request timeout for a single request will be implemented within RetryHandler::retry.
  • Time to first byte, TLS negotiation, and connect timeouts will be implemented within the central hyper connector.

Changes checklist

Changes are broken into two sections:

  • Timeouts for HTTP requests (single or multiple attempts) are implementable as layers within our current stack
  • Other timeouts will require changes to our dependencies and may be slower to implement

Implementing HTTP request timeouts

  • Add TimeoutConfig to smithy-types
  • Add TimeoutConfigProvider to aws-config
    • Add provider that fetches config from environment variables
    • Add provider that fetches config from profile
  • Add timeout method to aws_types::Config for setting timeout configuration
  • Add timeout method to generated Configs too
  • Create a generic TimeoutService and accompanying Layer (a sketch follows this list)
    • TimeoutLayer should accept a sleep function so that it doesn't have a hard dependency on tokio
  • Insert a TimeoutLayer before the RetryPolicy to handle timeouts for multiple-attempt requests
  • Insert a TimeoutLayer after the RetryPolicy to handle timeouts for single-attempt requests
  • Add tests for timeout behavior
    • test multi-request timeout triggers after 3 slow retries
    • test single-request timeout triggers correctly
    • test single-request timeout doesn't trigger if request completes in time
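
To make the checklist concrete, here is a minimal sketch of what a generic TimeoutService and its accompanying TimeoutLayer could look like. This is not the final implementation: it takes a boxed sleep function rather than the SDK's sleep abstraction, uses a hypothetical TimeoutError type, assumes the inner service's error can be converted from that type, and boxes the response future for simplicity.

use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll};
use std::time::Duration;

use futures_util::future::{select, Either};
use tower::{Layer, Service};

/// A boxed, runtime-agnostic sleep function: given a duration, it returns a
/// future that resolves once that duration has elapsed.
type SleepFn = Arc<dyn Fn(Duration) -> Pin<Box<dyn Future<Output = ()> + Send>> + Send + Sync>;

/// Hypothetical error returned when the inner service takes longer than the configured timeout.
#[derive(Debug)]
pub struct TimeoutError(pub Duration);

pub struct TimeoutLayer {
    sleep_fn: SleepFn,
    timeout: Duration,
}

impl TimeoutLayer {
    pub fn new(sleep_fn: SleepFn, timeout: Duration) -> Self {
        Self { sleep_fn, timeout }
    }
}

impl<S> Layer<S> for TimeoutLayer {
    type Service = TimeoutService<S>;

    fn layer(&self, inner: S) -> Self::Service {
        TimeoutService {
            inner,
            sleep_fn: self.sleep_fn.clone(),
            timeout: self.timeout,
        }
    }
}

pub struct TimeoutService<S> {
    inner: S,
    sleep_fn: SleepFn,
    timeout: Duration,
}

impl<S, Req> Service<Req> for TimeoutService<S>
where
    S: Service<Req>,
    S::Error: From<TimeoutError>,
    S::Future: Send + 'static,
{
    type Response = S::Response;
    type Error = S::Error;
    type Future = Pin<Box<dyn Future<Output = Result<S::Response, S::Error>> + Send>>;

    fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
        self.inner.poll_ready(cx)
    }

    fn call(&mut self, req: Req) -> Self::Future {
        // Race the inner service's response future against the sleep future;
        // whichever completes first decides the outcome.
        let sleep = (self.sleep_fn)(self.timeout);
        let timeout = self.timeout;
        let response = Box::pin(self.inner.call(req));
        Box::pin(async move {
            match select(response, sleep).await {
                Either::Left((response, _sleep)) => response,
                Either::Right(((), _response)) => Err(TimeoutError(timeout).into()),
            }
        })
    }
}

Taking the sleep function as an input, rather than calling tokio::time::sleep directly, is what keeps the layer free of a hard tokio dependency, per the checklist item above.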

RFC: How Cargo "features" should be used in the SDK and runtime crates

Status: Accepted

Some background on features

What is a feature? Here's a definition from the Cargo Book section on features:

Cargo "features" provide a mechanism to express conditional compilation and optional dependencies. A package defines a set of named features in the [features] table of Cargo.toml, and each feature can either be enabled or disabled. Features for the package being built can be enabled on the command-line with flags such as --features. Features for dependencies can be enabled in the dependency declaration in Cargo.toml.

We use features in a majority of our runtime crates and in all of our SDK crates. For example, aws-sigv4 uses them to enable event streams. Another common use case is exhibited by aws-sdk-s3 which uses them to enable the tokio runtime and the TLS implementation used when making requests.

Features should be additive

The Cargo book has this to say:

When a dependency is used by multiple packages, Cargo will use the union of all features enabled on that dependency when building it. This helps ensure that only a single copy of the dependency is used.

A consequence of this is that features should be additive. That is, enabling a feature should not disable functionality, and it should usually be safe to enable any combination of features. A feature should not introduce a SemVer-incompatible change.

What does this mean for the SDK?

Despite the constraints outlined above, we should use features in the SDKs because of the benefits they bring:

  • Features enable users to avoid compiling code that they won't be using. Additionally, features allow both general and specific control of compiled code, serving the needs of both novice and expert users.
  • A single feature in a crate can activate or deactivate multiple features exposed by that crate's dependencies, freeing the user from having to specifically activate or deactivate them.
  • Features can help users understand what a crate is capable of in the same way that looking at a graph of a crate's modules can.

When using features, we should adhere to the guidelines outlined below.

Avoid writing code that relies on only activating one feature from a set of mutually exclusive features.

As noted earlier in an excerpt from the Cargo book:

enabling a feature should not disable functionality, and it should usually be safe to enable any combination of features. A feature should not introduce a SemVer-incompatible change.

#[cfg(feature = "rustls")]
impl<M, R> ClientBuilder<(), M, R> {
    /// Connect to the service over HTTPS using Rustls.
    pub fn tls_adapter(self) -> ClientBuilder<Adapter<crate::conns::Https>, M, R> {
        self.connector(Adapter::builder().build(crate::conns::https()))
    }
}

#[cfg(feature = "native-tls")]
impl<M, R> ClientBuilder<(), M, R> {
    /// Connect to the service over HTTPS using the native TLS library on your platform.
    pub fn tls_adapter(
        self,
    ) -> ClientBuilder<Adapter<hyper_tls::HttpsConnector<hyper::client::HttpConnector>>, M, R> {
        self.connector(Adapter::builder().build(crate::conns::native_tls()))
    }
}

When the example code above is compiled with both features enabled, compilation will fail with a "duplicate definitions with name tls_adapter" error. Also, note that the return type of the function differs between the two versions. This is a SemVer-incompatible change.

Here's an updated version of the example that fixes these issues:

#[cfg(feature = "rustls")]
impl<M, R> ClientBuilder<(), M, R> {
    /// Connect to the service over HTTPS using Rustls.
    pub fn rustls(self) -> ClientBuilder<Adapter<crate::conns::Https>, M, R> {
        self.connector(Adapter::builder().build(crate::conns::https()))
    }
}

#[cfg(feature = "native-tls")]
impl<M, R> ClientBuilder<(), M, R> {
    /// Connect to the service over HTTPS using the native TLS library on your platform.
    pub fn native_tls(
        self,
    ) -> ClientBuilder<Adapter<hyper_tls::HttpsConnector<hyper::client::HttpConnector>>, M, R> {
        self.connector(Adapter::builder().build(crate::conns::native_tls()))
    }
}

Both features can now be enabled at once without creating a conflict. Since the two methods have different names, it's fine for them to have different return types.

This is real code, see it in context

We should avoid using #[cfg(not(feature = "some-feature"))]

At the risk of seeming repetitive, the Cargo book says:

enabling a feature should not disable functionality, and it should usually be safe to enable any combination of features

Conditionally compiling code when a feature is not activated can make it hard for users and maintainers to reason about what will happen when they activate a feature. This is also a sign that a feature may not be "additive".

NOTE: It's ok to use #[cfg(not())] to conditionally compile code based on a user's OS. It's also useful when controlling what code gets rendered when testing or when generating docs.

One case where using not is acceptable is when providing a fallback when no features are set:

#[cfg(feature = "rt-tokio")]
pub fn default_async_sleep() -> Option<Arc<dyn AsyncSleep>> {
    Some(sleep_tokio())
}

#[cfg(not(feature = "rt-tokio"))]
pub fn default_async_sleep() -> Option<Arc<dyn AsyncSleep>> {
    None
}

Don't default to defining "default features"

Because Cargo will use the union of all features enabled on a dependency when building it, we should be wary of marking features as default. Once we mark features as default, users that want to exclude the code and dependencies brought in by those features will have a difficult time doing so. One need look no further than this issue, submitted by a user who wanted to use Native TLS and struggled to make sure that Rustls was actually disabled. (That issue was resolved in this PR, which removed default features from our runtime crates.) This is not to say that we should never define default features: having defaults for the most common use cases means less work for most users.

When a default feature providing some functionality is disabled, active features must not automatically replace that functionality

As the SDK is currently designed, the TLS implementation in use can change depending on which features are pulled in. If a user disables default-features (which include rustls) and activates the native-tls feature, then we automatically use native-tls when making requests. For an example of what this looks like from the user's perspective, see this example.

This RFC proposes that any configurable functionality should have a single default, and that the default should only be provided when a corresponding default feature is active. If default-features are disabled, then so is the corresponding default functionality. In its place would be functionality that fails fast with a message describing why it failed (a default was deactivated but the user didn't set a replacement) and what the user should do to fix it (with links to documentation and examples where necessary). We should use compile-time errors to communicate failures with users, or panics for cases that can't be evaluated at compile time.

For example: say you have a crate with features a, b, and c that all provide some version of functionality foo, and feature a is part of default-features. When no-default-features = true but features b and c are active, don't automatically fall back to b or c. Instead, emit an error with a message like this:

"When default features are disabled, you must manually set foo. Features b and c active; You can use one of those. See an example of setting a custom foo here: link-to-docs.amazon.com/setting-foo"

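As a rough sketch of this fail-fast behavior, reusing the hypothetical foo functionality and the a/b/c features from the example above (none of these names are real SDK items), the crate could expose its default only behind the default feature and otherwise require the user to supply a replacement:

// Hypothetical sketch only: `Foo` and the features `a`, `b`, and `c` are the
// placeholders from the example above, not real SDK names.

/// Placeholder for some piece of configurable functionality.
pub struct Foo;

impl Foo {
    #[cfg(feature = "a")]
    fn variant_a() -> Self {
        Foo
    }
}

/// The default `Foo`, available only while the default feature `a` is active.
#[cfg(feature = "a")]
fn default_foo() -> Option<Foo> {
    Some(Foo::variant_a())
}

/// With default features disabled there is deliberately no automatic fallback
/// to `b` or `c`, even if those features are active.
#[cfg(not(feature = "a"))]
fn default_foo() -> Option<Foo> {
    None
}

/// Resolve the `Foo` to use, failing fast with an actionable message when the
/// default was deactivated and the user didn't set a replacement.
pub fn resolve_foo(user_provided: Option<Foo>) -> Foo {
    user_provided.or_else(default_foo).unwrap_or_else(|| {
        panic!(
            "When default features are disabled, you must manually set `foo`. \
             See an example of setting a custom `foo` here: link-to-docs.amazon.com/setting-foo"
        )
    })
}
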
Further reading

RFC: Supporting Flexible Checksums

Status: Implemented

We can't currently update the S3 SDK because we don't support the new "Flexible Checksums" feature. This RFC describes this new feature and details how we should implement it in smithy-rs.

What is the "Flexible Checksums" feature?

S3 has previously supported MD5 checksum validation of data. Now, it supports more checksum algorithms like CRC32, CRC32C, SHA-1, and SHA-256. This validation is available when putting objects to S3 and when getting them from S3. For more information, see this AWS News Blog post.

Implementing Checksums

Checksum callbacks were introduced as a result of the acceptance of RFC0013. This RFC proposes a refactor of those callbacks, as well as several new wrappers for SdkBody that will provide new functionality.

Refactoring aws-smithy-checksums

TL;DR: This refactor of aws-smithy-checksums:

  • Removes the "callback" terminology: As a word, "callback" doesn't carry any useful information, and doesn't aid in understanding.

  • Removes support for the BodyCallback API: Instead of adding checksum callbacks to a body, we're going to use "body wrapping". "Body wrapping" is demonstrated in the ChecksumBody, AwsChunkedBody, and ChecksumValidatedBody sections.

    NOTE: This doesn't remove the BodyCallback trait. That trait will still exist; we just won't use it.

  • Updates terminology to focus on "headers" instead of "trailers": Because the types we deal with in this module are named for HTTP headers, I chose to use that terminology instead. My hope is that this will be less strange to people reading this code.

  • Adds fn checksum_algorithm_to_checksum_header_name: a function that's used in generated code to set a checksum request header.

  • Adds fn checksum_header_name_to_checksum_algorithm: a function that's used in generated code when creating a checksum-validating response body.

  • Adds new checksum-related "body wrapping" HTTP body types: These are defined in the body module and will be shown later in this RFC.

// In aws-smithy-checksums/src/lib.rs
//! Checksum calculation and verification callbacks

use aws_smithy_types::base64;

use bytes::Bytes;
use http::header::{HeaderMap, HeaderName, HeaderValue};
use sha1::Digest;
use std::io::Write;

pub mod body;

// Valid checksum algorithm names
pub const CRC_32_NAME: &str = "crc32";
pub const CRC_32_C_NAME: &str = "crc32c";
pub const SHA_1_NAME: &str = "sha1";
pub const SHA_256_NAME: &str = "sha256";

pub const CRC_32_HEADER_NAME: HeaderName = HeaderName::from_static("x-amz-checksum-crc32");
pub const CRC_32_C_HEADER_NAME: HeaderName = HeaderName::from_static("x-amz-checksum-crc32c");
pub const SHA_1_HEADER_NAME: HeaderName = HeaderName::from_static("x-amz-checksum-sha1");
pub const SHA_256_HEADER_NAME: HeaderName = HeaderName::from_static("x-amz-checksum-sha256");

// Preserved for compatibility purposes. This should never be used by users, only within smithy-rs
const MD5_NAME: &str = "md5";
const MD5_HEADER_NAME: HeaderName = HeaderName::from_static("content-md5");

/// Given a `&str` representing a checksum algorithm, return the corresponding `HeaderName`
/// for that checksum algorithm.
pub fn checksum_algorithm_to_checksum_header_name(checksum_algorithm: &str) -> HeaderName {
    if checksum_algorithm.eq_ignore_ascii_case(CRC_32_NAME) {
        CRC_32_HEADER_NAME
    } else if checksum_algorithm.eq_ignore_ascii_case(CRC_32_C_NAME) {
        CRC_32_C_HEADER_NAME
    } else if checksum_algorithm.eq_ignore_ascii_case(SHA_1_NAME) {
        SHA_1_HEADER_NAME
    } else if checksum_algorithm.eq_ignore_ascii_case(SHA_256_NAME) {
        SHA_256_HEADER_NAME
    } else if checksum_algorithm.eq_ignore_ascii_case(MD5_NAME) {
        MD5_HEADER_NAME
    } else {
        // TODO what's the best way to handle this case?
        HeaderName::from_static("x-amz-checksum-unknown")
    }
}

/// Given a `HeaderName` representing a checksum algorithm, return the name of that algorithm
/// as a `&'static str`.
pub fn checksum_header_name_to_checksum_algorithm(
    checksum_header_name: &HeaderName,
) -> &'static str {
    if checksum_header_name == CRC_32_HEADER_NAME {
        CRC_32_NAME
    } else if checksum_header_name == CRC_32_C_HEADER_NAME {
        CRC_32_C_NAME
    } else if checksum_header_name == SHA_1_HEADER_NAME {
        SHA_1_NAME
    } else if checksum_header_name == SHA_256_HEADER_NAME {
        SHA_256_NAME
    } else if checksum_header_name == MD5_HEADER_NAME {
        MD5_NAME
    } else {
        // TODO what's the best way to handle this case?
        "unknown-checksum-algorithm"
    }
}

/// When a response has to be checksum-verified, we have to check possible headers until we find the
/// header with the precalculated checksum. Because a service may send back multiple headers, we have
/// to check them in order based on how fast each checksum is to calculate.
pub const CHECKSUM_HEADERS_IN_PRIORITY_ORDER: [HeaderName; 4] = [
    CRC_32_C_HEADER_NAME,
    CRC_32_HEADER_NAME,
    SHA_1_HEADER_NAME,
    SHA_256_HEADER_NAME,
];

type BoxError = Box<dyn std::error::Error + Send + Sync>;

/// Checksum algorithms are used to validate the integrity of data. Structs that implement this trait
/// can be used as checksum calculators. This trait requires Send + Sync because these checksums are
/// often used in a threaded context.
pub trait Checksum: Send + Sync {
    /// Given a slice of bytes, update this checksum's internal state.
    fn update(&mut self, bytes: &[u8]) -> Result<(), BoxError>;
    /// Either return this checksum as a `HeaderMap` containing one HTTP header, or return an error
    /// describing why checksum calculation failed.
    fn headers(&self) -> Result<Option<HeaderMap<HeaderValue>>, BoxError>;
    /// Return the `HeaderName` used to represent this checksum algorithm
    fn header_name(&self) -> HeaderName;
    /// "Finalize" this checksum, returning the calculated value as `Bytes` or an error that
    /// occurred during checksum calculation. To print this value in a human-readable hexadecimal
    /// format, you can print it using Rust's builtin [formatter].
    ///
    /// _**NOTE:** typically, "finalizing" a checksum in Rust will take ownership of the checksum
    /// struct. In this method, we clone the checksum's state before finalizing because checksums
    /// may be used in a situation where taking ownership is not possible._
    ///
    /// [formatter]: https://doc.rust-lang.org/std/fmt/trait.UpperHex.html
    fn finalize(&self) -> Result<Bytes, BoxError>;
    /// Return the size of this checksum algorithm's resulting checksum, in bytes. For example, the
    /// CRC32 checksum algorithm calculates a 32-bit checksum, so a CRC32 checksum struct
    /// implementing this trait method would return 4.
    fn size(&self) -> u64;
}

/// Create a new `Box<dyn Checksum>` from an algorithm name. Valid algorithm names are defined as
/// `const`s in this module.
pub fn new_checksum(checksum_algorithm: &str) -> Box<dyn Checksum> {
    if checksum_algorithm.eq_ignore_ascii_case(CRC_32_NAME) {
        Box::new(Crc32::default())
    } else if checksum_algorithm.eq_ignore_ascii_case(CRC_32_C_NAME) {
        Box::new(Crc32c::default())
    } else if checksum_algorithm.eq_ignore_ascii_case(SHA_1_NAME) {
        Box::new(Sha1::default())
    } else if checksum_algorithm.eq_ignore_ascii_case(SHA_256_NAME) {
        Box::new(Sha256::default())
    } else if checksum_algorithm.eq_ignore_ascii_case(MD5_NAME) {
        // It's possible to create an MD5 and we do this in some situations for compatibility.
        // We deliberately hide this from users so that they don't go using it.
        Box::new(Md5::default())
    } else {
        panic!("unsupported checksum algorithm '{}'", checksum_algorithm)
    }
}

#[derive(Debug, Default)]
struct Crc32 {
    hasher: crc32fast::Hasher,
}

impl Crc32 {
    fn update(&mut self, bytes: &[u8]) -> Result<(), BoxError> {
        self.hasher.update(bytes);

        Ok(())
    }

    fn headers(&self) -> Result<Option<HeaderMap<HeaderValue>>, BoxError> {
        let mut header_map = HeaderMap::new();
        header_map.insert(Self::header_name(), self.header_value());

        Ok(Some(header_map))
    }

    fn finalize(&self) -> Result<Bytes, BoxError> {
        Ok(Bytes::copy_from_slice(
            &self.hasher.clone().finalize().to_be_bytes(),
        ))
    }

    // Size of the checksum in bytes
    fn size() -> u64 {
        4
    }

    fn header_name() -> HeaderName {
        CRC_32_HEADER_NAME
    }

    fn header_value(&self) -> HeaderValue {
        // We clone the hasher because `Hasher::finalize` consumes `self`
        let hash = self.hasher.clone().finalize();
        HeaderValue::from_str(&base64::encode(u32::to_be_bytes(hash)))
            .expect("will always produce a valid header value from a CRC32 checksum")
    }
}

impl Checksum for Crc32 {
    fn update(
        &mut self,
        bytes: &[u8],
    ) -> Result<(), Box<(dyn std::error::Error + Send + Sync + 'static)>> {
        Self::update(self, bytes)
    }
    fn headers(
        &self,
    ) -> Result<Option<HeaderMap>, Box<(dyn std::error::Error + Send + Sync + 'static)>> {
        Self::headers(self)
    }
    fn header_name(&self) -> HeaderName {
        Self::header_name()
    }
    fn finalize(&self) -> Result<Bytes, BoxError> {
        Self::finalize(self)
    }
    fn size(&self) -> u64 {
        Self::size()
    }
}

#[derive(Debug, Default)]
struct Crc32c {
    state: Option<u32>,
}

impl Crc32c {
    fn update(&mut self, bytes: &[u8]) -> Result<(), BoxError> {
        self.state = match self.state {
            Some(crc) => Some(crc32c::crc32c_append(crc, bytes)),
            None => Some(crc32c::crc32c(bytes)),
        };

        Ok(())
    }

    fn headers(&self) -> Result<Option<HeaderMap<HeaderValue>>, BoxError> {
        let mut header_map = HeaderMap::new();
        header_map.insert(Self::header_name(), self.header_value());

        Ok(Some(header_map))
    }

    fn finalize(&self) -> Result<Bytes, BoxError> {
        Ok(Bytes::copy_from_slice(
            &self.state.unwrap_or_default().to_be_bytes(),
        ))
    }

    // Size of the checksum in bytes
    fn size() -> u64 {
        4
    }

    fn header_name() -> HeaderName {
        CRC_32_C_HEADER_NAME
    }

    fn header_value(&self) -> HeaderValue {
        // If no data was provided to this callback and no CRC was ever calculated, return zero as the checksum.
        let hash = self.state.unwrap_or_default();
        HeaderValue::from_str(&base64::encode(u32::to_be_bytes(hash)))
            .expect("will always produce a valid header value from a CRC32C checksum")
    }
}

impl Checksum for Crc32c {
    fn update(
        &mut self,
        bytes: &[u8],
    ) -> Result<(), Box<(dyn std::error::Error + Send + Sync + 'static)>> {
        Self::update(self, bytes)
    }
    fn headers(
        &self,
    ) -> Result<Option<HeaderMap>, Box<(dyn std::error::Error + Send + Sync + 'static)>> {
        Self::headers(self)
    }
    fn header_name(&self) -> HeaderName {
        Self::header_name()
    }
    fn finalize(&self) -> Result<Bytes, BoxError> {
        Self::finalize(self)
    }
    fn size(&self) -> u64 {
        Self::size()
    }
}

#[derive(Debug, Default)]
struct Sha1 {
    hasher: sha1::Sha1,
}

impl Sha1 {
    fn update(&mut self, bytes: &[u8]) -> Result<(), BoxError> {
        self.hasher.write_all(bytes)?;

        Ok(())
    }

    fn headers(&self) -> Result<Option<HeaderMap<HeaderValue>>, BoxError> {
        let mut header_map = HeaderMap::new();
        header_map.insert(Self::header_name(), self.header_value());

        Ok(Some(header_map))
    }

    fn finalize(&self) -> Result<Bytes, BoxError> {
        Ok(Bytes::copy_from_slice(
            self.hasher.clone().finalize().as_slice(),
        ))
    }

    // Size of the checksum in bytes
    fn size() -> u64 {
        20
    }

    fn header_name() -> HeaderName {
        SHA_1_HEADER_NAME
    }

    fn header_value(&self) -> HeaderValue {
        // We clone the hasher because `Hasher::finalize` consumes `self`
        let hash = self.hasher.clone().finalize();
        HeaderValue::from_str(&base64::encode(&hash[..]))
            .expect("will always produce a valid header value from a SHA-1 checksum")
    }
}

impl Checksum for Sha1 {
    fn update(
        &mut self,
        bytes: &[u8],
    ) -> Result<(), Box<(dyn std::error::Error + Send + Sync + 'static)>> {
        Self::update(self, bytes)
    }
    fn headers(
        &self,
    ) -> Result<Option<HeaderMap>, Box<(dyn std::error::Error + Send + Sync + 'static)>> {
        Self::headers(self)
    }
    fn header_name(&self) -> HeaderName {
        Self::header_name()
    }
    fn finalize(&self) -> Result<Bytes, BoxError> {
        Self::finalize(self)
    }
    fn size(&self) -> u64 {
        Self::size()
    }
}

#[derive(Debug, Default)]
struct Sha256 {
    hasher: sha2::Sha256,
}

impl Sha256 {
    fn update(&mut self, bytes: &[u8]) -> Result<(), BoxError> {
        self.hasher.write_all(bytes)?;

        Ok(())
    }

    fn headers(&self) -> Result<Option<HeaderMap<HeaderValue>>, BoxError> {
        let mut header_map = HeaderMap::new();
        header_map.insert(Self::header_name(), self.header_value());

        Ok(Some(header_map))
    }

    fn finalize(&self) -> Result<Bytes, BoxError> {
        Ok(Bytes::copy_from_slice(
            self.hasher.clone().finalize().as_slice(),
        ))
    }

    // Size of the checksum in bytes
    fn size() -> u64 {
        32
    }

    fn header_name() -> HeaderName {
        SHA_256_HEADER_NAME
    }

    fn header_value(&self) -> HeaderValue {
        // We clone the hasher because `Hasher::finalize` consumes `self`
        let hash = self.hasher.clone().finalize();
        HeaderValue::from_str(&base64::encode(&hash[..]))
            .expect("will always produce a valid header value from a SHA-256 checksum")
    }
}

impl Checksum for Sha256 {
    fn update(
        &mut self,
        bytes: &[u8],
    ) -> Result<(), Box<(dyn std::error::Error + Send + Sync + 'static)>> {
        Self::update(self, bytes)
    }
    fn headers(
        &self,
    ) -> Result<Option<HeaderMap>, Box<(dyn std::error::Error + Send + Sync + 'static)>> {
        Self::headers(self)
    }
    fn header_name(&self) -> HeaderName {
        Self::header_name()
    }
    fn finalize(&self) -> Result<Bytes, BoxError> {
        Self::finalize(self)
    }
    fn size(&self) -> u64 {
        Self::size()
    }
}

#[derive(Debug, Default)]
struct Md5 {
    hasher: md5::Md5,
}

impl Md5 {
    fn update(&mut self, bytes: &[u8]) -> Result<(), BoxError> {
        self.hasher.write_all(bytes)?;

        Ok(())
    }

    fn headers(&self) -> Result<Option<HeaderMap<HeaderValue>>, BoxError> {
        let mut header_map = HeaderMap::new();
        header_map.insert(Self::header_name(), self.header_value());

        Ok(Some(header_map))
    }

    fn finalize(&self) -> Result<Bytes, BoxError> {
        Ok(Bytes::copy_from_slice(
            self.hasher.clone().finalize().as_slice(),
        ))
    }

    // Size of the checksum in bytes
    fn size() -> u64 {
        16
    }

    fn header_name() -> HeaderName {
        MD5_HEADER_NAME
    }

    fn header_value(&self) -> HeaderValue {
        // We clone the hasher because `Hasher::finalize` consumes `self`
        let hash = self.hasher.clone().finalize();
        HeaderValue::from_str(&base64::encode(&hash[..]))
            .expect("will always produce a valid header value from an MD5 checksum")
    }
}

impl Checksum for Md5 {
    fn update(
        &mut self,
        bytes: &[u8],
    ) -> Result<(), Box<(dyn std::error::Error + Send + Sync + 'static)>> {
        Self::update(self, bytes)
    }
    fn headers(
        &self,
    ) -> Result<Option<HeaderMap>, Box<(dyn std::error::Error + Send + Sync + 'static)>> {
        Self::headers(self)
    }
    fn header_name(&self) -> HeaderName {
        Self::header_name()
    }
    fn finalize(&self) -> Result<Bytes, BoxError> {
        Self::finalize(self)
    }
    fn size(&self) -> u64 {
        Self::size()
    }
}

// We have existing tests for the checksums, those don't require an update

ChecksumBody

When creating a checksum-validated request with an in-memory request body, we can read the body, calculate a checksum, and insert the checksum header, all before sending the request. When creating a checksum-validated request with a streaming request body, we don't have that luxury. Instead, we must calculate a checksum while sending the body, and append that checksum as a trailer.

We will accomplish this by wrapping the SdkBody that requires validation within a ChecksumBody. Afterwards, we'll need to wrap the ChecksumBody in yet another layer which we'll discuss in the AwsChunkedBody and AwsChunkedBodyOptions section.

// In aws-smithy-checksums/src/body.rs
use crate::{new_checksum, Checksum};

use aws_smithy_http::body::SdkBody;
use aws_smithy_http::header::append_merge_header_maps;
use aws_smithy_types::base64;

use bytes::{Buf, Bytes};
use http::header::HeaderName;
use http::{HeaderMap, HeaderValue};
use http_body::{Body, SizeHint};
use pin_project::pin_project;

use std::fmt::Display;
use std::pin::Pin;
use std::task::{Context, Poll};

/// A `ChecksumBody` will calculate the checksum of a request body as it's being sent. Once the body
/// has been completely read, it'll append a trailer with the calculated checksum.
#[pin_project]
pub struct ChecksumBody<InnerBody> {
    #[pin]
    inner: InnerBody,
    #[pin]
    checksum: Box<dyn Checksum>,
}

impl ChecksumBody<SdkBody> {
    /// Given an `SdkBody` and the name of a checksum algorithm as a `&str`, create a new
    /// `ChecksumBody<SdkBody>`. Valid checksum algorithm names are defined in this crate's
    /// [root module](super).
    ///
    /// # Panics
    ///
    /// This will panic if the given checksum algorithm is not supported.
    pub fn new(body: SdkBody, checksum_algorithm: &str) -> Self {
        Self {
            checksum: new_checksum(checksum_algorithm),
            inner: body,
        }
    }

    /// Return the name of the trailer that will be emitted by this `ChecksumBody`
    pub fn trailer_name(&self) -> HeaderName {
        self.checksum.header_name()
    }

    /// Calculate and return the sum of the lengths of:
    /// - the checksum, when base64 encoded
    /// - the trailer name
    /// - the trailer separator
    ///
    /// This is necessary for calculating the true size of the request body for certain
    /// content-encodings.
    pub fn trailer_length(&self) -> u64 {
        let trailer_name_size_in_bytes = self.checksum.header_name().as_str().len() as u64;
        let base64_encoded_checksum_size_in_bytes = base64::encoded_length(self.checksum.size());

        (trailer_name_size_in_bytes
            // HTTP trailer names and values may be separated by either a single colon or a single
            // colon and a whitespace. In the AWS Rust SDK, we use a single colon.
            + ":".len() as u64
            + base64_encoded_checksum_size_in_bytes)
    }

    fn poll_inner(
        self: Pin<&mut Self>,
        cx: &mut Context<'_>,
    ) -> Poll<Option<Result<Bytes, aws_smithy_http::body::Error>>> {
        let this = self.project();
        let inner = this.inner;
        let mut checksum = this.checksum;

        match inner.poll_data(cx) {
            Poll::Ready(Some(Ok(mut data))) => {
                let len = data.chunk().len();
                let bytes = data.copy_to_bytes(len);

                if let Err(e) = checksum.update(&bytes) {
                    return Poll::Ready(Some(Err(e)));
                }

                Poll::Ready(Some(Ok(bytes)))
            }
            Poll::Ready(None) => Poll::Ready(None),
            Poll::Ready(Some(Err(e))) => Poll::Ready(Some(Err(e))),
            Poll::Pending => Poll::Pending,
        }
    }
}

impl http_body::Body for ChecksumBody<SdkBody> {
    type Data = Bytes;
    type Error = aws_smithy_http::body::Error;

    fn poll_data(
        self: Pin<&mut Self>,
        cx: &mut Context<'_>,
    ) -> Poll<Option<Result<Self::Data, Self::Error>>> {
        self.poll_inner(cx)
    }

    fn poll_trailers(
        self: Pin<&mut Self>,
        cx: &mut Context<'_>,
    ) -> Poll<Result<Option<HeaderMap<HeaderValue>>, Self::Error>> {
        let this = self.project();
        match (
            this.checksum.headers(),
            http_body::Body::poll_trailers(this.inner, cx),
        ) {
            // If everything is ready, return trailers, merging them if we have more than one map
            (Ok(outer_trailers), Poll::Ready(Ok(inner_trailers))) => {
                let trailers = match (outer_trailers, inner_trailers) {
                    // Values from the inner trailer map take precedence over values from the outer map
                    (Some(outer), Some(inner)) => Some(append_merge_header_maps(inner, outer)),
                    // If only one or neither produced trailers, just combine the `Option`s with `or`
                    (outer, inner) => outer.or(inner),
                };
                Poll::Ready(Ok(trailers))
            }
            // If the inner poll is Ok but the outer body's checksum callback encountered an error,
            // return the error
            (Err(e), Poll::Ready(Ok(_))) => Poll::Ready(Err(e)),
            // Otherwise return the result of the inner poll.
            // It may be pending or it may be ready with an error.
            (_, inner_poll) => inner_poll,
        }
    }

    fn is_end_stream(&self) -> bool {
        self.inner.is_end_stream()
    }

    fn size_hint(&self) -> SizeHint {
        let body_size_hint = self.inner.size_hint();
        match body_size_hint.exact() {
            Some(size) => {
                let checksum_size_hint = self.checksum.size();
                SizeHint::with_exact(size + checksum_size_hint)
            }
            // TODO is this the right behavior?
            None => {
                let checksum_size_hint = self.checksum.size();
                let mut summed_size_hint = SizeHint::new();
                summed_size_hint.set_lower(body_size_hint.lower() + checksum_size_hint);

                if let Some(body_size_hint_upper) = body_size_hint.upper() {
                    summed_size_hint.set_upper(body_size_hint_upper + checksum_size_hint);
                }

                summed_size_hint
            }
        }
    }
}

// The tests I have written are omitted from this RFC for brevity. The request body checksum calculation and trailer size calculations are all tested.
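
For a sense of how this type is meant to be driven, here's a minimal usage sketch in the style of the tests at the end of this RFC; the algorithm name and example body are arbitrary:

use aws_smithy_checksums::body::ChecksumBody;
use aws_smithy_http::body::SdkBody;
use http_body::Body;

#[tokio::main]
async fn main() {
    // Wrap a body; reading it drives the checksum calculation.
    let mut body = ChecksumBody::new(SdkBody::from("Hello world"), "crc32");

    while let Some(chunk) = body.data().await {
        let _bytes = chunk.expect("body reads successfully");
    }

    // Once the body is exhausted, the checksum is emitted as a trailer,
    // e.g. `x-amz-checksum-crc32: <base64-encoded checksum>`.
    let trailers = body
        .trailers()
        .await
        .expect("checksum calculation succeeded")
        .expect("a checksum trailer is present");
    println!("{:?}", trailers);
}

That trailer is what the aws-chunked encoding described later in this RFC writes into the request body.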

ChecksumValidatedBody

Users may request checksum validation for response bodies. That capability is provided by ChecksumValidatedBody, which will calculate a checksum as the response body is being read. Once all data has been read, the calculated checksum is compared to a precalculated checksum set during body creation. If the checksums don't match, then the body will emit an error.

// In aws-smithy-checksums/src/body.rs
/// A response body that will calculate a checksum as it is read. If all data is read and the
/// calculated checksum doesn't match a precalculated checksum, this body will emit an
/// [aws_smithy_http::body::Error].
#[pin_project]
pub struct ChecksumValidatedBody<InnerBody> {
    #[pin]
    inner: InnerBody,
    #[pin]
    checksum: Box<dyn Checksum>,
    precalculated_checksum: Bytes,
}

impl ChecksumValidatedBody<SdkBody> {
    /// Given an `SdkBody`, the name of a checksum algorithm as a `&str`, and a precalculated
    /// checksum represented as `Bytes`, create a new `ChecksumValidatedBody<SdkBody>`.
    /// Valid checksum algorithm names are defined in this crate's [root module](super).
    ///
    /// # Panics
    ///
    /// This will panic if the given checksum algorithm is not supported.
    pub fn new(body: SdkBody, checksum_algorithm: &str, precalculated_checksum: Bytes) -> Self {
        Self {
            checksum: new_checksum(checksum_algorithm),
            inner: body,
            precalculated_checksum,
        }
    }

    fn poll_inner(
        self: Pin<&mut Self>,
        cx: &mut Context<'_>,
    ) -> Poll<Option<Result<Bytes, aws_smithy_http::body::Error>>> {
        let this = self.project();
        let inner = this.inner;
        let mut checksum = this.checksum;

        match inner.poll_data(cx) {
            Poll::Ready(Some(Ok(mut data))) => {
                let len = data.chunk().len();
                let bytes = data.copy_to_bytes(len);

                if let Err(e) = checksum.update(&bytes) {
                    return Poll::Ready(Some(Err(e)));
                }

                Poll::Ready(Some(Ok(bytes)))
            }
            // Once the inner body has stopped returning data, check the checksum
            // and return an error if it doesn't match.
            Poll::Ready(None) => {
                let actual_checksum = {
                    match checksum.finalize() {
                        Ok(checksum) => checksum,
                        Err(err) => {
                            return Poll::Ready(Some(Err(err)));
                        }
                    }
                };
                if *this.precalculated_checksum == actual_checksum {
                    Poll::Ready(None)
                } else {
                    // So many parens it's starting to look like LISP
                    Poll::Ready(Some(Err(Box::new(Error::checksum_mismatch(
                        this.precalculated_checksum.clone(),
                        actual_checksum,
                    )))))
                }
            }
            Poll::Ready(Some(Err(e))) => Poll::Ready(Some(Err(e))),
            Poll::Pending => Poll::Pending,
        }
    }
}

/// Errors related to checksum calculation and validation
#[derive(Debug, Eq, PartialEq)]
#[non_exhaustive]
pub enum Error {
    /// The actual checksum didn't match the expected checksum. The checksummed data has been
    /// altered since the expected checksum was calculated.
    ChecksumMismatch { expected: Bytes, actual: Bytes },
}

impl Error {
    /// Given an expected checksum and an actual checksum in `Bytes` form, create a new
    /// `Error::ChecksumMismatch`.
    pub fn checksum_mismatch(expected: Bytes, actual: Bytes) -> Self {
        Self::ChecksumMismatch { expected, actual }
    }
}

impl Display for Error {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> Result<(), std::fmt::Error> {
        match self {
            Error::ChecksumMismatch { expected, actual } => write!(
                f,
                "body checksum mismatch. expected body checksum to be {:x} but it was {:x}",
                expected, actual
            ),
        }
    }
}

impl std::error::Error for Error {}

impl http_body::Body for ChecksumValidatedBody<SdkBody> {
    type Data = Bytes;
    type Error = aws_smithy_http::body::Error;

    fn poll_data(
        self: Pin<&mut Self>,
        cx: &mut Context<'_>,
    ) -> Poll<Option<Result<Self::Data, Self::Error>>> {
        self.poll_inner(cx)
    }

    fn poll_trailers(
        self: Pin<&mut Self>,
        cx: &mut Context<'_>,
    ) -> Poll<Result<Option<HeaderMap<HeaderValue>>, Self::Error>> {
        self.project().inner.poll_trailers(cx)
    }

    // Even once the inner body returns true for is_end_stream, we still need to
    // verify the checksum; therefore, we always return false here.
    fn is_end_stream(&self) -> bool {
        false
    }

    fn size_hint(&self) -> SizeHint {
        self.inner.size_hint()
    }
}

// The tests I have written are omitted from this RFC for brevity. The response body checksum verification is tested.
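
Similarly, here's a minimal usage sketch for the response side. For the sake of a self-contained example, the precalculated checksum is computed locally with crc32fast; in the real SDK it would come from the checksum response header:

use aws_smithy_checksums::body::ChecksumValidatedBody;
use aws_smithy_http::body::SdkBody;
use bytes::Bytes;
use http_body::Body;

#[tokio::main]
async fn main() {
    // In the real SDK this value is taken from a checksum response header.
    let precalculated = Bytes::copy_from_slice(&crc32fast::hash(b"Hello world").to_be_bytes());

    let mut body =
        ChecksumValidatedBody::new(SdkBody::from("Hello world"), "crc32", precalculated);

    // Reading the body drives checksum calculation; the final poll returns an
    // error if the calculated checksum doesn't match the precalculated one.
    while let Some(chunk) = body.data().await {
        match chunk {
            Ok(bytes) => println!("read {} bytes", bytes.len()),
            Err(e) => panic!("checksum validation failed: {}", e),
        }
    }
}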

AwsChunkedBody and AwsChunkedBodyOptions

In order to send a request with checksum trailers, we must use an AWS-specific content encoding called aws-chunked. This encoding requires that we:

  • Divide the original body content into one or more chunks. For our purposes we only ever use one chunk.
  • Append a hexadecimal chunk size header to each chunk.
  • Suffix each chunk with a CRLF (carriage return line feed).
  • Send a 0 and CRLF to close the original body content section.
  • Send trailers as part of the request body, suffixing each with a CRLF.
  • Send a final CRLF to close the request body.

As an example, sending a regular request body with a SHA-256 checksum would look similar to this:

PUT SOMEURL HTTP/1.1
x-amz-checksum-sha256: ZOyIygCyaOW6GjVnihtTFtIS9PNmskdyMlNKiuyjfzw=
Content-Length: 11
...

Hello world

and the aws-chunked version would look like this:

PUT SOMEURL HTTP/1.1
x-amz-trailer: x-amz-checksum-sha256
x-amz-decoded-content-length: 11
Content-Encoding: aws-chunked
Content-Length: 87
...

B\r\n
Hello world\r\n
0\r\n
x-amz-checksum-sha256:ZOyIygCyaOW6GjVnihtTFtIS9PNmskdyMlNKiuyjfzw=\r\n
\r\n

NOTES:

  • In the second example, B is the hexadecimal representation of 11.
  • Authorization and other headers are omitted from the examples above for brevity.
  • When using aws-chunked content encoding, S3 requires that we send the x-amz-decoded-content-length with the length of the original body content.

This encoding scheme is performed by AwsChunkedBody and configured with AwsChunkedBodyOptions.

// In aws-http/src/content_encoding.rs
use aws_smithy_checksums::body::ChecksumBody;
use aws_smithy_http::body::SdkBody;

use bytes::{Buf, Bytes, BytesMut};
use http::{HeaderMap, HeaderValue};
use http_body::{Body, SizeHint};
use pin_project::pin_project;

use std::pin::Pin;
use std::task::{Context, Poll};

const CRLF: &str = "\r\n";
const CHUNK_TERMINATOR: &str = "0\r\n";

/// Content encoding header value constants
pub mod header_value {
    /// Header value denoting "aws-chunked" encoding
    pub const AWS_CHUNKED: &str = "aws-chunked";
}

/// Options used when constructing an [`AwsChunkedBody`][AwsChunkedBody].
#[derive(Debug, Default)]
#[non_exhaustive]
pub struct AwsChunkedBodyOptions {
    /// The total size of the stream. For unsigned encoding this implies that
    /// there will only be a single chunk containing the underlying payload,
    /// unless ChunkLength is also specified.
    pub stream_length: Option<u64>,
    /// The maximum size of each chunk to be sent.
    ///
    /// If ChunkLength and stream_length are both specified, the stream will be
    /// broken up into chunk_length chunks. The encoded length of the aws-chunked
    /// encoding can still be determined as long as all trailers, if any, have a
    /// fixed length.
    pub chunk_length: Option<u64>,
    /// The length of each trailer sent within an `AwsChunkedBody`. Necessary in
    /// order to calculate the total size of the body accurately.
    pub trailer_lens: Vec<u64>,
}

impl AwsChunkedBodyOptions {
    /// Create a new [`AwsChunkedBodyOptions`][AwsChunkedBodyOptions]
    pub fn new() -> Self {
        Self::default()
    }

    /// Set stream length
    pub fn with_stream_length(mut self, stream_length: u64) -> Self {
        self.stream_length = Some(stream_length);
        self
    }

    /// Set chunk length
    pub fn with_chunk_length(mut self, chunk_length: u64) -> Self {
        self.chunk_length = Some(chunk_length);
        self
    }

    /// Set a trailer len
    pub fn with_trailer_len(mut self, trailer_len: u64) -> Self {
        self.trailer_lens.push(trailer_len);
        self
    }
}

#[derive(Debug, PartialEq, Eq)]
enum AwsChunkedBodyState {
    WritingChunkSize,
    WritingChunk,
    WritingTrailers,
    Closed,
}

/// A request body compatible with `Content-Encoding: aws-chunked`
///
/// Chunked-Body grammar is defined in [ABNF] as:
///
/// ```txt
/// Chunked-Body    = *chunk
///                   last-chunk
///                   chunked-trailer
///                   CRLF
///
/// chunk           = chunk-size CRLF chunk-data CRLF
/// chunk-size      = 1*HEXDIG
/// last-chunk      = 1*("0") CRLF
/// chunked-trailer = *( entity-header CRLF )
/// entity-header   = field-name ":" OWS field-value OWS
/// ```
/// For more info on what the abbreviations mean, see https://datatracker.ietf.org/doc/html/rfc7230#section-1.2
///
/// [ABNF]:https://en.wikipedia.org/wiki/Augmented_Backus%E2%80%93Naur_form
#[derive(Debug)]
#[pin_project]
pub struct AwsChunkedBody<InnerBody> {
    #[pin]
    inner: InnerBody,
    #[pin]
    state: AwsChunkedBodyState,
    options: AwsChunkedBodyOptions,
}

// Currently, we only use this in terms of a streaming request body with checksum trailers
type Inner = ChecksumBody<SdkBody>;

impl AwsChunkedBody<Inner> {
    /// Wrap the given body in an outer body compatible with `Content-Encoding: aws-chunked`
    pub fn new(body: Inner, options: AwsChunkedBodyOptions) -> Self {
        Self {
            inner: body,
            state: AwsChunkedBodyState::WritingChunkSize,
            options,
        }
    }

    fn encoded_length(&self) -> Option<u64> {
        if self.options.chunk_length.is_none() && self.options.stream_length.is_none() {
            return None;
        }

        let mut length = 0;
        let stream_length = self.options.stream_length.unwrap_or_default();
        if stream_length != 0 {
            if let Some(chunk_length) = self.options.chunk_length {
                let num_chunks = stream_length / chunk_length;
                length += num_chunks * get_unsigned_chunk_bytes_length(chunk_length);
                let remainder = stream_length % chunk_length;
                if remainder != 0 {
                    length += get_unsigned_chunk_bytes_length(remainder);
                }
            } else {
                length += get_unsigned_chunk_bytes_length(stream_length);
            }
        }

        // End chunk
        length += CHUNK_TERMINATOR.len() as u64;

        // Trailers
        for len in self.options.trailer_lens.iter() {
            length += len + CRLF.len() as u64;
        }

        // Encoding terminator
        length += CRLF.len() as u64;

        Some(length)
    }
}

fn prefix_with_chunk_size(data: Bytes, chunk_size: u64) -> Bytes {
    // Len is the size of the entire chunk as defined in `AwsChunkedBodyOptions`
    let mut prefixed_data = BytesMut::from(format!("{:X?}\r\n", chunk_size).as_bytes());
    prefixed_data.extend_from_slice(&data);

    prefixed_data.into()
}

fn get_unsigned_chunk_bytes_length(payload_length: u64) -> u64 {
    let hex_repr_len = int_log16(payload_length);
    hex_repr_len + CRLF.len() as u64 + payload_length + CRLF.len() as u64
}

fn trailers_as_aws_chunked_bytes(
    total_length_of_trailers_in_bytes: u64,
    trailer_map: Option<HeaderMap>,
) -> Bytes {
    use std::fmt::Write;

    // On 32-bit operating systems, we might not be able to convert the u64 to a usize, so we just
    // use `String::new` in that case.
    let mut trailers = match usize::try_from(total_length_of_trailers_in_bytes) {
        Ok(total_length_of_trailers_in_bytes) => {
            String::with_capacity(total_length_of_trailers_in_bytes)
        }
        Err(_) => String::new(),
    };
    let mut already_wrote_first_trailer = false;

    if let Some(trailer_map) = trailer_map {
        for (header_name, header_value) in trailer_map.into_iter() {
            match header_name {
                // New name, new value
                Some(header_name) => {
                    if already_wrote_first_trailer {
                        // First trailer shouldn't have a preceding CRLF, but every trailer after it should
                        trailers.write_str(CRLF).unwrap();
                    } else {
                        already_wrote_first_trailer = true;
                    }

                    trailers.write_str(header_name.as_str()).unwrap();
                    trailers.write_char(':').unwrap();
                }
                // Same name, new value
                None => {
                    trailers.write_char(',').unwrap();
                }
            }
            trailers.write_str(header_value.to_str().unwrap()).unwrap();
        }
    }

    // Write CRLF to end the body
    trailers.write_str(CRLF).unwrap();
    // If we wrote at least one trailer, we need to write an extra CRLF
    if total_length_of_trailers_in_bytes != 0 {
        trailers.write_str(CRLF).unwrap();
    }

    trailers.into()
}

impl Body for AwsChunkedBody<Inner> {
    type Data = Bytes;
    type Error = aws_smithy_http::body::Error;

    fn poll_data(
        self: Pin<&mut Self>,
        cx: &mut Context<'_>,
    ) -> Poll<Option<Result<Self::Data, Self::Error>>> {
        tracing::info!("polling AwsChunkedBody");
        let mut this = self.project();

        match *this.state {
            AwsChunkedBodyState::WritingChunkSize => match this.inner.poll_data(cx) {
                Poll::Ready(Some(Ok(data))) => {
                    // A chunk must be prefixed by chunk size in hexadecimal
                    tracing::info!("writing chunk size and start of chunk");
                    *this.state = AwsChunkedBodyState::WritingChunk;
                    let total_chunk_size = this
                        .options
                        .chunk_length
                        .or(this.options.stream_length)
                        .unwrap_or_default();
                    Poll::Ready(Some(Ok(prefix_with_chunk_size(data, total_chunk_size))))
                }
                Poll::Ready(None) => {
                    tracing::info!("chunk was empty, writing last-chunk");
                    *this.state = AwsChunkedBodyState::WritingTrailers;
                    Poll::Ready(Some(Ok(Bytes::from("0\r\n"))))
                }
                Poll::Ready(Some(Err(e))) => Poll::Ready(Some(Err(e))),
                Poll::Pending => Poll::Pending,
            },
            AwsChunkedBodyState::WritingChunk => match this.inner.poll_data(cx) {
                Poll::Ready(Some(Ok(mut data))) => {
                    tracing::info!("writing rest of chunk data");
                    Poll::Ready(Some(Ok(data.copy_to_bytes(data.len()))))
                }
                Poll::Ready(None) => {
                    tracing::info!("no more chunk data, writing CRLF and last-chunk");
                    *this.state = AwsChunkedBodyState::WritingTrailers;
                    Poll::Ready(Some(Ok(Bytes::from("\r\n0\r\n"))))
                }
                Poll::Ready(Some(Err(e))) => Poll::Ready(Some(Err(e))),
                Poll::Pending => Poll::Pending,
            },
            AwsChunkedBodyState::WritingTrailers => {
                return match this.inner.poll_trailers(cx) {
                    Poll::Ready(Ok(trailers)) => {
                        *this.state = AwsChunkedBodyState::Closed;
                        let total_length_of_trailers_in_bytes =
                            this.options.trailer_lens.iter().fold(0, |acc, n| acc + n);

                        Poll::Ready(Some(Ok(trailers_as_aws_chunked_bytes(
                            total_length_of_trailers_in_bytes,
                            trailers,
                        ))))
                    }
                    Poll::Pending => Poll::Pending,
                    Poll::Ready(Err(e)) => Poll::Ready(Some(Err(e))),
                };
            }
            AwsChunkedBodyState::Closed => {
                return Poll::Ready(None);
            }
        }
    }

    fn poll_trailers(
        self: Pin<&mut Self>,
        _cx: &mut Context<'_>,
    ) -> Poll<Result<Option<HeaderMap<HeaderValue>>, Self::Error>> {
        // Trailers were already appended to the body because of the content encoding scheme
        Poll::Ready(Ok(None))
    }

    fn is_end_stream(&self) -> bool {
        self.state == AwsChunkedBodyState::Closed
    }

    fn size_hint(&self) -> SizeHint {
        SizeHint::with_exact(
            self.encoded_length()
                .expect("Requests made with aws-chunked encoding must have known size")
                as u64,
        )
    }
}

// Used for finding how many hexadecimal digits it takes to represent a base 10 integer
fn int_log16<T>(mut i: T) -> u64
where
    T: std::ops::DivAssign + PartialOrd + From<u8> + Copy,
{
    let mut len = 0;
    let zero = T::from(0);
    let sixteen = T::from(16);

    while i > zero {
        i /= sixteen;
        len += 1;
    }

    len
}

#[cfg(test)]
mod tests {
    use super::AwsChunkedBody;
    use crate::content_encoding::AwsChunkedBodyOptions;
    use aws_smithy_checksums::body::ChecksumBody;
    use aws_smithy_http::body::SdkBody;
    use bytes::Buf;
    use bytes_utils::SegmentedBuf;
    use http_body::Body;
    use std::io::Read;

    #[tokio::test]
    async fn test_aws_chunked_encoded_body() {
        let input_text = "Hello world";
        let sdk_body = SdkBody::from(input_text);
        let checksum_body = ChecksumBody::new(sdk_body, "sha256");
        let aws_chunked_body_options = AwsChunkedBodyOptions {
            stream_length: Some(input_text.len() as u64),
            chunk_length: None,
            trailer_lens: vec![
                "x-amz-checksum-sha256:ZOyIygCyaOW6GjVnihtTFtIS9PNmskdyMlNKiuyjfzw=".len() as u64,
            ],
        };
        let mut aws_chunked_body = AwsChunkedBody::new(checksum_body, aws_chunked_body_options);

        let mut output = SegmentedBuf::new();
        while let Some(buf) = aws_chunked_body.data().await {
            output.push(buf.unwrap());
        }

        let mut actual_output = String::new();
        output
            .reader()
            .read_to_string(&mut actual_output)
            .expect("Doesn't cause IO errors");

        let expected_output = "B\r\nHello world\r\n0\r\nx-amz-checksum-sha256:ZOyIygCyaOW6GjVnihtTFtIS9PNmskdyMlNKiuyjfzw=\r\n\r\n";

        // Verify data is complete and correctly encoded
        assert_eq!(expected_output, actual_output);

        assert!(
            aws_chunked_body
                .trailers()
                .await
                .expect("checksum generation was without error")
                .is_none(),
            "aws-chunked encoded bodies don't have normal HTTP trailers"
        );
    }

    #[tokio::test]
    async fn test_empty_aws_chunked_encoded_body() {
        let sdk_body = SdkBody::from("");
        let checksum_body = ChecksumBody::new(sdk_body, "sha256");
        let aws_chunked_body_options = AwsChunkedBodyOptions {
            stream_length: Some(0),
            chunk_length: None,
            trailer_lens: vec![
                "x-amz-checksum-sha256:47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=".len() as u64,
            ],
        };
        let mut aws_chunked_body = AwsChunkedBody::new(checksum_body, aws_chunked_body_options);

        let mut output = SegmentedBuf::new();
        while let Some(buf) = aws_chunked_body.data().await {
            output.push(buf.unwrap());
        }

        let mut actual_output = String::new();
        output
            .reader()
            .read_to_string(&mut actual_output)
            .expect("Doesn't cause IO errors");

        let expected_output =
            "0\r\nx-amz-checksum-sha256:47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=\r\n\r\n";

        // Verify data is complete and correctly encoded
        assert_eq!(expected_output, actual_output);

        assert!(
            aws_chunked_body
                .trailers()
                .await
                .expect("checksum generation was without error")
                .is_none(),
            "aws-chunked encoded bodies don't have normal HTTP trailers"
        );
    }
}

Sigv4 Update

When sending checksum-verified requests with a streaming body, we must update the usual signing process. Instead of signing the request based on the request body's checksum, we sign it with a special header:

Authorization: <computed authorization header value using "STREAMING-UNSIGNED-PAYLOAD-TRAILER">
x-amz-content-sha256: STREAMING-UNSIGNED-PAYLOAD-TRAILER

Setting STREAMING-UNSIGNED-PAYLOAD-TRAILER tells the signer that we're sending an unsigned streaming body that will be followed by trailers.

We can achieve this by:

  • Adding a new variant to SignableBody:
    /// A signable HTTP request body
    #[derive(Debug, Clone, Eq, PartialEq)]
    #[non_exhaustive]
    pub enum SignableBody<'a> {
        // existing variants have been omitted for brevity...
    
        /// An unsigned payload with trailers
        ///
        /// StreamingUnsignedPayloadTrailer is used for streaming requests where the contents of the
        /// body cannot be known prior to signing **AND** which include HTTP trailers.
        StreamingUnsignedPayloadTrailer,
    }
  • Updating the CanonicalRequest::payload_hash method to include the new SignableBody variant:
    fn payload_hash<'b>(body: &'b SignableBody<'b>) -> Cow<'b, str> {
        // Payload hash computation
        //
        // Based on the input body, set the payload_hash of the canonical request:
        // Either:
        // - compute a hash
        // - use the precomputed hash
        // - use `UnsignedPayload`
        // - use `StreamingUnsignedPayloadTrailer`
        match body {
            SignableBody::Bytes(data) => Cow::Owned(sha256_hex_string(data)),
            SignableBody::Precomputed(digest) => Cow::Borrowed(digest.as_str()),
            SignableBody::UnsignedPayload => Cow::Borrowed(UNSIGNED_PAYLOAD),
            SignableBody::StreamingUnsignedPayloadTrailer => {
                Cow::Borrowed(STREAMING_UNSIGNED_PAYLOAD_TRAILER)
            }
        }
    }
  • (in generated code) Inserting the SignableBody into the request property bag when making a checksum-verified streaming request:
    if self.checksum_algorithm.is_some() {
        request
            .properties_mut()
            .insert(aws_sig_auth::signer::SignableBody::StreamingUnsignedPayloadTrailer);
    }

It's possible to send aws-chunked requests where each chunk is signed individually. Because this feature isn't strictly necessary for flexible checksums, I've avoided implementing it.

Inlineables

In order to avoid writing lots of Rust in Kotlin, I have implemented request and response building functions as inlineables:

  • Building checksum-validated requests with in-memory request bodies:
    // In aws/rust-runtime/aws-inlineable/src/streaming_body_with_checksum.rs
    /// Given a `&mut http::request::Request` and a checksum algorithm name, calculate a checksum and
    /// then modify the request to include the checksum as a header.
    pub fn build_checksum_validated_request(
        request: &mut http::request::Request<aws_smithy_http::body::SdkBody>,
        checksum_algorithm: &str,
    ) -> Result<(), aws_smithy_http::operation::BuildError> {
        let data = request.body().bytes().unwrap_or_default();
    
        let mut checksum = aws_smithy_checksums::new_checksum(checksum_algorithm);
        checksum
            .update(data)
            .map_err(|err| aws_smithy_http::operation::BuildError::Other(err))?;
        let checksum = checksum
            .finalize()
            .map_err(|err| aws_smithy_http::operation::BuildError::Other(err))?;
    
        request.headers_mut().insert(
            aws_smithy_checksums::checksum_algorithm_to_checksum_header_name(checksum_algorithm),
            aws_smithy_types::base64::encode(&checksum[..])
                .parse()
                .expect("base64-encoded checksums are always valid header values"),
        );
    
        Ok(())
    }
  • Building checksum-validated requests with streaming request bodies:
    /// Given an `http::request::Builder`, `SdkBody`, and a checksum algorithm name, return a
    /// `Request<SdkBody>` with checksum trailers where the content is `aws-chunked` encoded.
    pub fn build_checksum_validated_request_with_streaming_body(
        request_builder: http::request::Builder,
        body: aws_smithy_http::body::SdkBody,
        checksum_algorithm: &str,
    ) -> Result<http::Request<aws_smithy_http::body::SdkBody>, aws_smithy_http::operation::BuildError> {
        use http_body::Body;
    
        let original_body_size = body
            .size_hint()
            .exact()
            .expect("body must be sized if checksum is requested");
        let body = aws_smithy_checksums::body::ChecksumBody::new(body, checksum_algorithm);
        let checksum_trailer_name = body.trailer_name();
        let aws_chunked_body_options = aws_http::content_encoding::AwsChunkedBodyOptions::new()
            .with_stream_length(original_body_size as usize)
            .with_trailer_len(body.trailer_length() as usize);
    
        let body = aws_http::content_encoding::AwsChunkedBody::new(body, aws_chunked_body_options);
        let encoded_content_length = body
            .size_hint()
            .exact()
            .expect("encoded_length must return known size");
        let request_builder = request_builder
            .header(
                http::header::CONTENT_LENGTH,
                http::HeaderValue::from(encoded_content_length),
            )
            .header(
                http::header::HeaderName::from_static("x-amz-decoded-content-length"),
                http::HeaderValue::from(original_body_size),
            )
            .header(
                http::header::HeaderName::from_static("x-amz-trailer"),
                checksum_trailer_name,
            )
            .header(
                http::header::CONTENT_ENCODING,
                aws_http::content_encoding::header_value::AWS_CHUNKED.as_bytes(),
            );
    
        let body = aws_smithy_http::body::SdkBody::from_dyn(http_body::combinators::BoxBody::new(body));
    
        request_builder
            .body(body)
            .map_err(|err| aws_smithy_http::operation::BuildError::Other(Box::new(err)))
    }
  • Building checksum-validated responses:
    /// Given a `Response<SdkBody>`, checksum algorithm name, and pre-calculated checksum, return a
    /// `Response<SdkBody>` where the body will processed with the checksum algorithm and checked
    /// against the pre-calculated checksum.
    pub fn build_checksum_validated_sdk_body(
        body: aws_smithy_http::body::SdkBody,
        checksum_algorithm: &str,
        precalculated_checksum: bytes::Bytes,
    ) -> aws_smithy_http::body::SdkBody {
        let body = aws_smithy_checksums::body::ChecksumValidatedBody::new(
            body,
            checksum_algorithm,
            precalculated_checksum.clone(),
        );
        aws_smithy_http::body::SdkBody::from_dyn(http_body::combinators::BoxBody::new(body))
    }
    
    /// Given the name of a checksum algorithm and a `HeaderMap`, extract the checksum value from the
    /// corresponding header as `Some(Bytes)`. If the header is unset, return `None`.
    pub fn check_headers_for_precalculated_checksum(
        headers: &http::HeaderMap<http::HeaderValue>,
    ) -> Option<(&'static str, bytes::Bytes)> {
        for header_name in aws_smithy_checksums::CHECKSUM_HEADERS_IN_PRIORITY_ORDER {
            if let Some(precalculated_checksum) = headers.get(&header_name) {
                let checksum_algorithm =
                    aws_smithy_checksums::checksum_header_name_to_checksum_algorithm(&header_name);
                let precalculated_checksum =
                    bytes::Bytes::copy_from_slice(precalculated_checksum.as_bytes());
    
                return Some((checksum_algorithm, precalculated_checksum));
            }
        }
    
        None
    }

Codegen

Codegen will be updated to insert the appropriate inlineable functions for operations that are tagged with the @httpChecksum trait. Some operations will require an MD5 checksum fallback if the user hasn't set a checksum themselves.

Users also have the option of supplying a precalculated checksum of their own. This is already handled by our current header insertion logic and won't require updating the existing implementation. Because this checksum validation behavior is AWS-specific, it will be defined in SDK codegen.
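
To make the fallback behavior concrete, the following is a minimal, hypothetical sketch of the decision codegen could emit for such an operation. The ChecksumAlgorithm enum and resolve_checksum_algorithm helper are illustrative stand-ins, not SDK types or generated code:

#[derive(Debug, PartialEq)]
enum ChecksumAlgorithm {
    Crc32,
    Sha256,
    Md5,
}

fn resolve_checksum_algorithm(
    user_choice: Option<ChecksumAlgorithm>,
    md5_fallback_required: bool,
) -> Option<ChecksumAlgorithm> {
    match (user_choice, md5_fallback_required) {
        // The user picked an algorithm: honour it.
        (Some(algorithm), _) => Some(algorithm),
        // No choice was made but the service requires a checksum: fall back to MD5.
        (None, true) => Some(ChecksumAlgorithm::Md5),
        // No choice and no requirement: send no checksum at all.
        (None, false) => None,
    }
}

fn main() {
    assert_eq!(
        resolve_checksum_algorithm(None, true),
        Some(ChecksumAlgorithm::Md5)
    );
    assert_eq!(
        resolve_checksum_algorithm(Some(ChecksumAlgorithm::Sha256), true),
        Some(ChecksumAlgorithm::Sha256)
    );
}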

Implementation Checklist

  • Implement codegen for building checksum-validated requests:
    • In-memory request bodies
      • Support MD5 fallback behavior for services that enable it.
    • Streaming request bodies
  • Implement codegen for building checksum-validated responses:

RFC: Customizable Client Operations

Status: Implemented

For a summarized list of proposed changes, see the Changes Checklist section.

SDK customers occasionally need to add additional HTTP headers to requests, and currently, the SDK has no easy way to accomplish this. At time of writing, the lower level Smithy client has to be used to create an operation, and then the HTTP request augmented on that operation type. For example:

let input = SomeOperationInput::builder().some_value(5).build()?;

let operation = {
    let op = input.make_operation(&service_config).await?;
    let (request, response) = op.into_request_response();

    let request = request.augment(|req, _props| {
        req.headers_mut().insert(
            HeaderName::from_static("x-some-header"),
            HeaderValue::from_static("some-value")
        );
        Result::<_, Infallible>::Ok(req)
    })?;

    Operation::from_parts(request, response)
};

let response = smithy_client.call(operation).await?;

This approach is difficult both to discover and to implement since it requires acquiring a Smithy client rather than using the generated fluent client, and it's anything but ergonomic.

This RFC proposes an easier way to augment requests that is compatible with the fluent client.

Terminology

  • Smithy Client: A aws_smithy_client::Client<C, M, R> struct that is responsible for gluing together the connector, middleware, and retry policy.
  • Fluent Client: A code generated Client that has methods for each service operation on it. A fluent builder is generated alongside it to make construction easier.

Proposal

The code generated fluent builders returned by the fluent client should have a method added to them, similar to send, but that returns a customizable request. The customer experience should look as follows:

let response = client.some_operation()
    .some_value(5)
    .customize()
    .await?
    .mutate_request(|mut req| {
        req.headers_mut().insert(
            HeaderName::from_static("x-some-header"),
            HeaderValue::from_static("some-value")
        );
    })
    .send()
    .await?;

This new async customize method would return the following:

pub struct CustomizableOperation<O, R> {
    handle: Arc<Handle>,
    operation: Operation<O, R>,
}

impl<O, R> CustomizableOperation<O, R> {
    // Allows for customizing the operation's request
    fn map_request<E>(
        mut self,
        f: impl FnOnce(Request<SdkBody>) -> Result<Request<SdkBody>, E>,
    ) -> Result<Self, E> {
        let (request, response) = self.operation.into_request_response();
        let request = request.augment(|req, _props| f(req))?;
        self.operation = Operation::from_parts(request, response);
        Ok(self)
    }

    // Convenience for `map_request` where infallible direct mutation of the request is acceptable
    fn mutate_request(
        self,
        f: impl FnOnce(&mut Request<SdkBody>) -> (),
    ) -> Self {
        self.map_request(|mut req| {
            f(&mut req);
            Result::<_, Infallible>::Ok(req)
        })
        .expect("infallible")
    }

    // Allows for customizing the entire operation
    fn map_operation<E>(
        mut self,
        f: impl FnOnce(Operation<O, R>) -> Result<Operation<O, R>, E>,
    ) -> Result<Self, E> {
        self.operation = f(self.operation)?;
        Ok(self)
    }

    // Direct access to read the request
    fn request(&self) -> &Request<SdkBody> {
        self.operation.request()
    }

    // Direct access to mutate the request
    fn request_mut(&mut self) -> &mut Request<SdkBody> {
        self.operation.request_mut()
    }

    // Sends the operation's request
    async fn send<T, E>(self) -> Result<T, SdkError<E>>
    where
        O: ParseHttpResponse<Output = Result<T, E>> + Send + Sync + Clone + 'static,
        E: std::error::Error,
        R: ClassifyResponse<SdkSuccess<T>, SdkError<E>> + Send + Sync,
    {
        self.handle.client.call(self.operation).await
    }
}

Additionally, for those who want to avoid closures, the Operation type will have request and request_mut methods added to it to get direct access to its underlying HTTP request.

The CustomizableOperation type will then mirror these functions so that the experience can look as follows:

let mut operation = client.some_operation()
    .some_value(5)
    .customize()
    .await?;
operation.request_mut()
    .headers_mut()
    .insert(
        HeaderName::from_static("x-some-header"),
        HeaderValue::from_static("some-value")
    );
let response = operation.send().await?;

Why not remove async from customize to make this more ergonomic?

In the proposal above, customers must await the result of customize in order to get the CustomizableOperation. This is a result of the underlying map_operation function that customize needs to call being async, which was made async during the implementation of customizations for Glacier (see #797, #801, and #1474). It is possible to move these Glacier customizations into middleware to make map_operation sync, but keeping it async is much more future-proof since if a future customization or feature requires it to be async, it won't be a breaking change in the future.

Why the name customize?

Alternatively, the name build could be used, but this increases the odds that customers won't realize they can call send directly and will instead go through a longer build/send chain even when customization isn't needed:

client.some_operation()
    .some_value()
    .build() // Oops, didn't need to do this
    .send()
    .await?;

vs.

client.some_operation()
    .some_value()
    .send()
    .await?;

Additionally, no AWS services at the time of writing have a member named customize that would conflict with the new function, so adding it would not be a breaking change.

Changes Checklist

  • Create CustomizableOperation as an inlinable, and code generate it into client so that it has access to Handle
  • Code generate the customize method on fluent builders
  • Update the RustReservedWords class to include customize
  • Add ability to mutate the HTTP request on Operation
  • Add examples for both approaches
  • Comment on older discussions asking about how to do this with this improved approach

RFC: Logging in the Presence of Sensitive Data

Status: Accepted

Smithy provides a sensitive trait which exists as a @sensitive field annotation syntactically and has the following semantics:

Sensitive data MUST NOT be exposed in things like exception messages or log output. Application of this trait SHOULD NOT affect wire logging (i.e., logging of all data transmitted to and from servers or clients).

This RFC is concerned with solving the problem of honouring this specification in the context of logging.

Progress has been made towards this goal in the form of the Sensitive Trait PR, which uses code generation to remove sensitive fields from Debug implementations.

The problem remains open due to the existence of HTTP binding traits and a lack of clearly defined user guidelines which customers may follow to honour the specification.

This RFC proposes:

  • A new logging middleware is generated and applied to each OperationHandler Service.
  • A developer guideline is provided on how to avoid violating the specification.

Terminology

  • Model: A Smithy Model, usually pertaining to the one in use by the customer.
  • Runtime crate: A crate existing within the rust-runtime folder, used to implement shared functionalities that do not have to be code-generated.
  • Service: The tower::Service trait. The lowest level of abstraction we deal with when making HTTP requests. Services act directly on data to transform and modify that data. A Service is what eventually turns a request into a response.
  • Middleware: Broadly speaking, middleware modify requests and responses. Concretely, these exist as implementations of Layer or as a Service wrapping an inner Service.
  • Potentially sensitive: Data that could be bound to a sensitive field of a structure, for example via the HTTP Binding Traits.

Background

HTTP Binding Traits

Smithy provides various HTTP binding traits. These allow protocols to configure a HTTP request by way of binding fields to parts of the request. For this reason sensitive data might be unintentionally leaked through logging of a bound request.

Trait               Configurable
httpHeader          Headers
httpPrefixHeaders   Headers
httpLabel           URI
httpPayload         Payload
httpQuery           Query Parameters
httpResponseCode    Status Code

Each of these configurable parts must therefore be logged cautiously.
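
As a small illustration of the risk, consider a field bound to the URI via httpLabel, as in the Inventory example shown later in this RFC: the value ends up verbatim in any log line that prints the request URI. The snippet below is illustrative only; the "pikachu" value is made up.

fn main() {
    // `name` is marked @sensitive and bound into the URI via @httpLabel.
    let name = "pikachu";
    let uri = format!("/inventory/{name}");

    // A naive access log now leaks the sensitive value.
    println!("request received: GET {uri}");
}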

Scope and Guidelines

It would be infeasible to forbid the logging of sensitive data altogether using the type system. With the current API, the customer will always have an opportunity to log a request containing sensitive data before it enters the Service<Request<B>> that we provide to them.

// The API provides us with a `Service<Request<B>>`
let app: Router = OperationRegistryBuilder::default().build().expect("unable to build operation registry").into();

// We can use `ServiceExt::map_request` to log a request containing potentially sensitive data
let app = app.map_request(|request| {
        info!(?request);
        request
    });

A more subtle violation of the specification may occur when the customer enables verbose logging - a third-party dependency might simply log data marked as sensitive, for example tokio or hyper.

These two cases illustrate that smithy-rs can only prevent violation of the specification in a restricted scope - logs emitted from generated code and the runtime crates. A smithy-rs specific guideline should be available to the customer which outlines how to avoid violating the specification in areas outside of our control.

Routing

The sensitivity and HTTP bindings are declared within specific structures/operations. For this reason, in the general case, it's unknowable whether or not any given part of a request is sensitive until we determine which operation is tasked with handling the request and hence which fields are bound. Implementation-wise, this means that any middleware applied before routing has taken place cannot log anything potentially sensitive without performing routing logic itself.

Note that:

  • We are not required to deserialize the entire request before we can make judgments on what data is sensitive or not - only which operation it has been routed to.
  • We are permitted to emit logs prior to routing when:
    • they contain no potentially sensitive data, or
    • the request failed to route, in which case it's not subject to the constraints of an operation.

Runtime Crates

The crates existing in rust-runtime are not code generated - their source code is agnostic to the specific model in use. For this reason, if such a crate wants to log potentially sensitive data, there must be a way to conditionally toggle that log without manipulating the source code. Any proposed solution must acknowledge this concern.

Proposal

This proposal serves to honor the sensitivity specification via code generation of a logging middleware which is aware of the sensitivity, together with a developer contract disallowing logging potentially sensitive data in the runtime crates. A developer guideline should be provided in addition to the middleware.

All data known to be sensitive should be replaced with "{redacted}" when logged. Implementation-wise, this means that tracing::Events and tracing::Spans of the form debug!(field = "sensitive data") and span!(..., field = "sensitive data") must become debug!(field = "{redacted}") and span!(..., field = "{redacted}").

Debug Logging

Developers might want to observe sensitive data for debugging purposes. It should be possible to opt-out of the redactions by enabling a feature flag unredacted-logging (which is disabled by default).

To prevent excessive branches such as

if cfg!(feature = "unredacted-logging") {
    debug!(%data, "logging here");
} else {
    debug!(data = "{redacted}", "logging here");
}

the following wrapper should be provided from a runtime crate:

pub struct Sensitive<T>(T);

impl<T> Debug for Sensitive<T>
where
    T: Debug
{
    fn fmt(&self, f: &mut Formatter<'_>) -> Result<(), Error> {
        if cfg!(feature = "unredacted-logging") {
            self.0.fmt(f)
        } else {
            "{redacted}".fmt(f)
        }
    }
}

impl<T> Display for Sensitive<T>
where
    T: Display
{
    fn fmt(&self, f: &mut Formatter<'_>) -> Result<(), Error> {
        if cfg!(feature = "unredacted-logging") {
            self.0.fmt(f)
        } else {
            "{redacted}".fmt(f)
        }
    }
}

In which case the branch above becomes

debug!(sensitive_data = %Sensitive(data));

Code Generated Logging Middleware

Using the smithy model, for each operation, a logging middleware should be generated. Through the model, the code generation knows which fields are sensitive and which HTTP bindings exist, therefore the logging middleware can be carefully crafted to avoid leaking sensitive data.

As a request enters this middleware it should record the method, HTTP headers, and URI in a tracing::span. As a response leaves this middleware it should record the HTTP headers and status code in a tracing::debug.

The following model

@readonly
@http(uri: "/inventory/{name}", method: "GET")
operation Inventory {
    input: Product,
    output: Stocked
}

@input
structure Product {
    @required
    @sensitive
    @httpLabel
    name: String
}

@output
structure Stocked {
    @sensitive
    @httpResponseCode
    code: String,
}

should generate the following

// NOTE: This code is intended to show behavior - it does not compile

pub struct InventoryLogging<S> {
    inner: S,
    operation_name: &'static str
}

impl<S> InventoryLogging<S> {
    pub fn new(inner: S) -> Self {
        Self {
            inner
        }
    }
}

impl<B, S> Service<Request<B>> for InventoryLogging<S>
where
    S: Service<Request<B>>
{
    type Response = Response<BoxBody>;
    type Error = S::Error;
    type Future = /* Implementation detail */;

    fn call(&mut self, request: Request<B>) -> Self::Future {
        // Remove sensitive data from parts of the HTTP
        let uri = /* redact {name} from URI */;
        let headers = /* no redactions */;

        let fut = async {
            let response = self.inner.call(request).await;
            let status_code = /* redact status code */;
            let headers = /* no redactions */;

            debug!(%status_code, ?headers, "response");

            response
        };

        // Instrument the future with a span
        let span = debug_span!("request", operation = %self.operation_name, method = %request.method(), %uri, ?headers);
        fut.instrument(span)
    }
}
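
For reference, the span/event mechanics relied upon above can be reproduced in isolation. The following is a minimal runnable sketch, assuming the tracing, tracing-subscriber, and futures crates; it is not the generated middleware itself:

use tracing::{debug, debug_span, Instrument};

async fn handle() -> u16 {
    200
}

fn main() {
    tracing_subscriber::fmt()
        .with_max_level(tracing::Level::DEBUG)
        .init();

    // Already-redacted values are safe to record on the span.
    let uri = "/inventory/{redacted}";
    let fut = async {
        let status_code = handle().await;
        // Emitted within the request span because of `instrument` below.
        debug!(%status_code, "response");
    }
    .instrument(debug_span!("request", method = "GET", %uri));

    futures::executor::block_on(fut);
}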

HTTP Debug/Display Wrappers

The Service::call path, seen in Code Generated Logging Middleware, is latency-sensitive. Careful implementation is required to avoid excess allocations during redaction of sensitive data. Wrapping Uri and HeaderMap and providing a new Display/Debug implementation which skips over the sensitive data is preferable to allocating a new String/HeaderMap and then mutating it.

These wrappers should be provided alongside the Sensitive struct described in Debug Logging. If they are implemented on top of Sensitive, they will inherit the same behavior - allowing redactions to be toggled using unredacted-logging feature flag.
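
As a sketch of the idea, the wrapper below writes a redacted form of a path directly to the formatter without building a new String. The SensitiveUri type and the "redact everything after the first path segment" rule are illustrative only; real wrappers would derive their redactions from the model's HTTP bindings.

use std::fmt::{Display, Formatter, Result};

struct SensitiveUri<'a> {
    path: &'a str,
}

impl Display for SensitiveUri<'_> {
    fn fmt(&self, f: &mut Formatter<'_>) -> Result {
        // Write the redacted form directly to the formatter instead of allocating.
        let mut segments = self.path.split('/').filter(|s| !s.is_empty());
        match segments.next() {
            Some(first) => {
                write!(f, "/{first}")?;
                for _ in segments {
                    write!(f, "/{{redacted}}")?;
                }
                Ok(())
            }
            None => write!(f, "/"),
        }
    }
}

fn main() {
    let uri = SensitiveUri { path: "/inventory/pikachu" };
    // Prints "/inventory/{redacted}" without allocating a redacted copy of the path.
    println!("{uri}");
}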

Middleware Position

This logging middleware should be applied outside of the OperationHandler after its construction in the (generated) operation_registry.rs file. The middleware should preserve the associated types of the OperationHandler (Response = Response<BoxBody>, Error = Infallible) to cause minimal disruption.

An easy position to apply the logging middleware is illustrated below in the form of Logging{Operation}::new:

let empty_operation = LoggingEmptyOperation::new(operation(registry.empty_operation));
let get_pokemon_species = LoggingPokemonSpecies::new(operation(registry.get_pokemon_species));
let get_server_statistics = LoggingServerStatistics::new(operation(registry.get_server_statistics));
let routes = vec![
    (BoxCloneService::new(empty_operation), empty_operation_request_spec),
    (BoxCloneService::new(get_pokemon_species), get_pokemon_species_request_spec),
    (BoxCloneService::new(get_server_statistics), get_server_statistics_request_spec),
];
let router = aws_smithy_http_server::routing::Router::new_rest_json_router(routes);

Although an acceptable first step, putting logging middleware here is suboptimal - the Router allows a tower::Layer to be applied to the operation by using the Router::layer method. This middleware will be applied outside of the logging middleware and, as a result, will not be subject to the span of any middleware. Therefore, the Router must be changed to allow for middleware to be applied within the logging middleware rather than outside of it.

This is a general problem, not specific to this proposal. For example, Use Request Extensions must also solve this problem.

Fortunately, this problem is separable from the actual implementation of the logging middleware and we can get immediate benefit by application of it in the suboptimal position described above.

Logging within the Router

There is a need for logging within the Router implementation - this is a crucial area of business logic. As mentioned in the Routing section, we are permitted to log potentially sensitive data in cases where requests fail to get routed to an operation.

In the case of AWS JSON 1.0 and 1.1 protocols, the request URI is always /, putting it outside of the reach of the @sensitive trait. We therefore have the option to log it before routing occurs. We make a choice not to do this in order to remove the special case - relying on the logging layer to log URIs when appropriate.

Developer Guideline

A guideline should be made available, which includes:

Alternative Proposals

All of the following proposals are compatible with, and benefit from, Debug Logging, HTTP Debug/Display Wrappers, and Developer Guideline portions of the main proposal.

The main proposal disallows the logging of potentially sensitive data in the runtime crates, instead opting for a dedicated code generated logging middleware. In contrast, the following proposals all seek ways to accommodate logging of potentially sensitive data in the runtime crates.

Use Request Extensions

Request extensions can be used to adjoin data to a Request as it passes through the middleware. Concretely, they exist as the type map http::Extensions, accessed via http::Request::extensions and http::Request::extensions_mut.

These can be used to provide data to middleware interested in logging potentially sensitive data.

struct Sensitivity {
    /* Data concerning which parts of the request are sensitive */
}

struct Middleware<S> {
    inner: S
}

impl<B, S> Service<Request<B>> for Middleware<S> {
    /* ... */

    fn call(&mut self, request: Request<B>) -> Self::Future {
        if let Some(sensitivity) = request.extensions().get::<Sensitivity>() {
            if sensitivity.is_method_sensitive() {
                debug!(method = %request.method());
            }
        }

        /* ... */

        self.inner.call(request)
    }
}

A middleware layer, dedicated to inserting the Sensitivity struct into the extensions of each incoming request, must be code generated (much in the same way as the logging middleware).

impl<B, S> Service<Request<B>> for SensitivityInserter<S>
where
    S: Service<Request<B>>
{
    /* ... */

    fn call(&mut self, request: Request<B>) -> Self::Future {
        let sensitivity = Sensitivity {
            /* .. */
        };
        request.extensions_mut().insert(sensitivity);

        self.inner.call(request)
    }
}

Advantages

  • Applicable to all middleware which takes http::Request<B>.
  • Does not pollute the API of the middleware - code internal to middleware simply inspects the request's extensions and performs logic based on its value.

Disadvantages

  • The sensitivity and HTTP bindings are known at compile time whereas the insertion/retrieval of the extension data is done at runtime.
    • http::Extensions is approximately a HashMap<u64, Box<dyn Any>> so lookup/insertion involves indirection/cache misses/heap allocation.

Accommodate the Sensitivity in Middleware API

Sensitivity could instead be a parameter passed to middleware during construction. This is similar in nature to Use Request Extensions, except that the Sensitivity is provided to the middleware up front rather than retrieved from each request at runtime.

struct Middleware<S> {
    inner: S,
    sensitivity: Sensitivity
}

impl<S> Middleware<S> {
    pub fn new(inner: S) -> Self { /* ... */ }

    pub fn new_with_sensitivity(inner: S, sensitivity: Sensitivity) -> Self { /* ... */ }
}

impl<B, S> Service<Request<B>> for Middleware<S> {
    /* ... */

    fn call(&mut self, request: Request<B>) -> Self::Future {
        if self.sensitivity.is_method_sensitive() {
            debug!(method = %Sensitive(request.method()));
        }

        /* ... */

        self.inner.call(request)
    }
}

The code generation would then be responsible for constructing a Sensitivity for each operation. Additionally, if any middleware is applied to an operation, the code generation would be responsible for passing that middleware the appropriate Sensitivity before applying it.

Advantages

  • Applicable to all middleware.
  • As the Sensitivity struct will be known statically, the compiler will remove branches, making it cheap.

Disadvantages

  • Pollutes the API of middleware.

Redact values using a tracing Layer

Distinct from tower::Layer, a tracing::Layer is a "composable handler for tracing events". It would be possible to write an implementation which would filter out events which contain sensitive data.

Examples of filtering tracing::Layers already exist in the form of the EnvFilter and Targets. It is unlikely that we'll be able to leverage them for our use, but the underlying principle remains the same - the tracing::Layer inspects tracing::Events/tracing::Spans, filtering them based on some criteria.

Code generation would need to be used to produce the filtering criteria from the models. Internal developers would need to adhere to a common set of field names in order for them to be subject to the filtering. Spans would need to be opened after routing occurs in order for the tracing::Layer to know which operation the Events are being produced within and hence which filtering rules to apply.

Advantages

  • Applicable to all middleware.
  • Good separation of concerns:
    • Does not pollute the API of the middleware
    • No specific logic required within middleware.

Disadvantages

  • Complex implementation.
  • Not necessarily fast.
  • tracing::Layers seem to only support filtering entire Events, rather than more fine-grained removal of fields.

Changes Checklist

RFC: Errors for event streams

Status: Implemented

For a summarized list of proposed changes, see the Changes Checklist section.

This RFC defines how client and server will use errors defined in @streaming unions (event streams).

The user experience if this RFC is implemented

In the current version of smithy-rs, customers who want to use errors in event streams need to use them as so:

stream! {
    yield Ok(EventStreamUnion::ErrorVariant ...)
}

Furthermore, there is no support for errors in event streams being terminal; that is, when an error is sent, it does not signal termination and thus does not complete the stream.

This RFC proposes to make changes to:

  • terminate the stream upon receiving a modeled error
  • change the API so that customers will write their business logic in a more Rust-like experience:
stream! {
    yield Err(EventStreamUnionError::ErrorKind ...)
}

Thus any Err(_) from the stream is terminal, rather than any Ok(x) with x being matched against the set of modeled variant errors in the union.

How to actually implement this RFC

In order to implement this feature:

  • Errors modeled in streaming unions are going to be treated like operation errors
    • They are in the error:: namespace
    • They have the same methods operation errors have (name on the server, metadata on the client and so on)
    • They are not variants in the corresponding error structure
  • Errors need to be marshalled and unmarshalled
  • Receiver must treat any error coming from the other end as terminal

The code examples below have been generated using the following model:

@http(uri: "/capture-pokemon-event/{region}", method: "POST")
operation CapturePokemonOperation {
    input: CapturePokemonOperationEventsInput,
    output: CapturePokemonOperationEventsOutput,
    errors: [UnsupportedRegionError, ThrottlingError]
}

@input
structure CapturePokemonOperationEventsInput {
    @httpPayload
    events: AttemptCapturingPokemonEvent,

    @httpLabel
    @required
    region: String,
}

@output
structure CapturePokemonOperationEventsOutput {
    @httpPayload
    events: CapturePokemonEvents,
}

@streaming
union AttemptCapturingPokemonEvent {
    event: CapturingEvent,
    masterball_unsuccessful: MasterBallUnsuccessful,
}

structure CapturingEvent {
    @eventPayload
    payload: CapturingPayload,
}

structure CapturingPayload {
    name: String,
    pokeball: String,
}

@streaming
union CapturePokemonEvents {
    event: CaptureEvent,
    invalid_pokeball: InvalidPokeballError,
    throttlingError: ThrottlingError,
}

structure CaptureEvent {
    @eventHeader
    name: String,
    @eventHeader
    captured: Boolean,
    @eventHeader
    shiny: Boolean,
    @eventPayload
    pokedex_update: Blob,
}

@error("server")
structure UnsupportedRegionError {
    @required
    region: String,
}
@error("client")
structure InvalidPokeballError {
    @required
    pokeball: String,
}
@error("server")
structure MasterBallUnsuccessful {
    @required
    message: String,
}
@error("client")
structure ThrottlingError {}

Wherever irrelevant, documentation and other lines are stripped out from the code examples below.

Errors in streaming unions

The error in AttemptCapturingPokemonEvent is modeled as follows.

On the client,

pub struct AttemptCapturingPokemonEventError {
    pub kind: AttemptCapturingPokemonEventErrorKind,
    pub(crate) meta: aws_smithy_types::Error,
}
pub enum AttemptCapturingPokemonEventErrorKind {
    MasterBallUnsuccessful(crate::error::MasterBallUnsuccessful),
    Unhandled(Box<dyn std::error::Error + Send + Sync + 'static>),
}

On the server,

pub enum AttemptCapturingPokemonEventError {
    MasterBallUnsuccessful(crate::error::MasterBallUnsuccessful),
}

Both are modeled as normal errors, where the error name is the union's name suffixed with Error. In fact, both the client and server generate operation errors and event stream errors the same way.

Event stream errors have their own marshaller. To make it work for users to stream errors, EventStreamSender<>, in addition to the union type T, takes an error type E; that is, the AttemptCapturingPokemonEventError in the example. This means that an error from the stream is marshalled and sent as a data structure similarly to the union's non-error members.

On the other side, the Receiver<> needs to terminate the stream upon receiving any error. A terminated stream has no more data, and using it after termination is always a bug.
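
To illustrate just the terminal behavior, here is a self-contained sketch that uses a plain Iterator in place of the SDK's Receiver type; the helper name and example values are made up for illustration:

fn drain_until_error<T, E>(events: impl Iterator<Item = Result<T, E>>) -> (Vec<T>, Option<E>) {
    let mut received = Vec::new();
    for event in events {
        match event {
            Ok(value) => received.push(value),
            // The first error terminates the stream; nothing after it is read.
            Err(err) => return (received, Some(err)),
        }
    }
    (received, None)
}

fn main() {
    let events = vec![Ok(1), Ok(2), Err("masterball unsuccessful"), Ok(3)];
    let (received, error) = drain_until_error(events.into_iter());
    assert_eq!(received, vec![1, 2]);
    assert_eq!(error, Some("masterball unsuccessful"));
}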

An example of how errors can be used on clients, extracted from this test:

yield Err(AttemptCapturingPokemonEventError::new(
    AttemptCapturingPokemonEventErrorKind::MasterBallUnsuccessful(MasterBallUnsuccessful::builder().build()),
    Default::default()
));

Because unions can be used in input or output of more than one operation, errors must be generated once as they are in the error:: namespace.

Changes checklist

  • Errors are in the error:: namespace and created as operation errors
  • Errors can be sent to the stream
  • Errors terminate the stream
  • Customers' experience using errors mirrors the Rust way: Err(error::StreamingError ...)

RFC: Service Builder Improvements

Status: Accepted

One might characterize smithy-rs as a tool for transforming a Smithy service into a tower::Service builder. A Smithy model only partially defines the behavior of the generated service - handlers must be passed to the builder before the tower::Service is fully specified. This builder structure is the primary API surface we provide to the customer; as a result, it is important that it meets their needs.

This RFC proposes a new builder, deprecating the existing one, which addresses API deficiencies and takes steps to improve performance.

Terminology

  • Model: A Smithy Model, usually pertaining to the one in use by the customer.
  • Smithy Service: The entry point of an API that aggregates resources and operations together within a Smithy model. Described in detail here.
  • Service: The tower::Service trait is an interface for writing network applications in a modular and reusable way. Services act on requests to produce responses.
  • Service Builder: A tower::Service builder, generated from a Smithy service, by smithy-rs.
  • Middleware: Broadly speaking, middleware modify requests and responses. Concretely, these exist as implementations of Layer or as a Service wrapping an inner Service.
  • Handler: A closure defining the behavior of a particular request after routing. These are provided to the service builder to complete the description of the service.

Background

To provide context for the proposal we perform a survey of the current state of affairs.

The following is a reference model we will use throughout the RFC:

operation Operation0 {
    input: Input0,
    output: Output0
}

operation Operation1 {
    input: Input1,
    output: Output1
}

@restJson1
service Service0 {
    operations: [
        Operation0,
        Operation1,
    ]
}

We have purposely omitted details from the model that are unimportant to describing the proposal. We also omit distracting details from the Rust snippets. Code generation is linear in the sense that code snippets can be assumed to extend to multiple operations in a predictable way. In the case where we do want to speak generally about an operation and its associated types, we use {Operation}; for example, {Operation}Input is the input type of an unspecified operation.

Here is a quick example of what a customer might write when using the service builder:

async fn handler0(input: Operation0Input) -> Operation0Output {
    todo!()
}

async fn handler1(input: Operation1Input) -> Operation1Output {
    todo!()
}

let app: Router = OperationRegistryBuilder::default()
    // Use the setters
    .operation0(handler0)
    .operation1(handler1)
    // Convert to `OperationRegistry`
    .build()
    .unwrap()
    // Convert to `Router`
    .into();

During the survey we touch on the major mechanisms used to achieve this API.

Handlers

A core concept in the service builder is the Handler trait:

pub trait Handler<T, Input> {
    async fn call(self, req: http::Request) -> http::Response;
}

Its purpose is to provide an even interface over closures of the form FnOnce({Operation}Input) -> impl Future<Output = {Operation}Output> and FnOnce({Operation}Input, State) -> impl Future<Output = {Operation}Output>. It's this abstraction which allows the customers to supply both async fn handler(input: {Operation}Input) -> {Operation}Output and async fn handler(input: {Operation}Input, state: Extension<S>) -> {Operation}Output to the service builder.

We generate Handler implementations for said closures in ServerOperationHandlerGenerator.kt:

impl<Fun, Fut> Handler<(), Operation0Input> for Fun
where
    Fun: FnOnce(Operation0Input) -> Fut,
    Fut: Future<Output = Operation0Output>,
{
    async fn call(self, request: http::Request) -> http::Response {
        let input = /* Create `Operation0Input` from `request: http::Request` */;

        // Use closure on the input
        let output = self(input).await;

        let response = /* Create `http::Response` from `output: Operation0Output` */
        response
    }
}

impl<Fun, Fut> Handler<Extension<S>, Operation0Input> for Fun
where
    Fun: FnOnce(Operation0Input, Extension<S>) -> Fut,
    Fut: Future<Output = Operation0Output>,
{
    async fn call(self, request: http::Request) -> http::Response {
        let input = /* Create `Operation0Input` from `request: http::Request` */;

        // Use closure on the input and fetched extension data
        let extension = Extension(request.extensions().get::<T>().clone());
        let output = self(input, extension).await;

        let response = /* Create `http::Response` from `output: Operation0Output` */
        response
    }
}

Creating {Operation}Input from a http::Request and http::Response from a {Operation}Output involves protocol aware serialization/deserialization, for example, it can involve the HTTP binding traits. The RuntimeError enumerates error cases such as serialization/deserialization failures, extensions().get::<T>() failures, etc. We omit error handling in the snippet above, but, in full, it also involves protocol aware conversions from the RuntimeError to http::Response. The reader should make note of the influence of the model on the different sections of this procedure.

The request.extensions().get::<T>() present in the Fun: FnOnce(Operation0Input, Extension<S>) -> Fut implementation is the current approach to injecting state into handlers. The customer is required to apply an AddExtensionLayer to the output of the service builder so that, when the request reaches the handler, the extensions().get::<T>() will succeed.
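
The type map mechanism itself can be illustrated in isolation. The following is a self-contained sketch using the http crate directly; the State type is a hypothetical stand-in for customer state, and the insert below is what an extension layer would do for every request:

use http::Request;

#[derive(Clone, Debug, PartialEq)]
struct State {
    counter: u64,
}

fn main() {
    let mut request = Request::new(());
    // What the layer does on the way in.
    request.extensions_mut().insert(State { counter: 3 });

    // What the generated `Handler` implementation does before calling the closure.
    let state = request
        .extensions()
        .get::<State>()
        .cloned()
        .expect("extension layer was not applied");
    assert_eq!(state, State { counter: 3 });
}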

To convert the closures described above into a Service an OperationHandler is used:

pub struct OperationHandler<H, T, Input> {
    handler: H,
}

impl<H, T, Input, B> Service<Request<B>> for OperationHandler<H, T, Input>
where
    H: Handler<T, Input>,
{
    type Response = http::Response;
    type Error = Infallible;

    #[inline]
    fn poll_ready(&mut self, _cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
        Poll::Ready(Ok(()))
    }

    async fn call(&mut self, req: Request<B>) -> Result<Self::Response, Self::Error> {
        Ok(self.handler.call(req).await)
    }
}

Builder

The service builder we provide to the customer is the OperationRegistryBuilder, generated from ServerOperationRegistryGenerator.kt.

Currently, the reference model would generate the following OperationRegistryBuilder and OperationRegistry:

pub struct OperationRegistryBuilder<Op0, In0, Op1, In1> {
    operation0: Option<Op0>,
    operation1: Option<Op1>,
}

pub struct OperationRegistry<Op0, In0, Op1, In1> {
    operation0: Op0,
    operation1: Op1,
}

The OperationRegistryBuilder includes a setter per operation, and a fallible build method:

impl<Op0, In0, Op1, In1> OperationRegistryBuilder<Op0, In0, Op1, In1> {
    pub fn operation0(mut self, value: Op0) -> Self {
        self.operation0 = Some(value);
        self
    }
    pub fn operation1(mut self, value: Op1) -> Self {
        self.operation1 = Some(value);
        self
    }
    pub fn build(
        self,
    ) -> Result<OperationRegistry<Op0, In0, Op1, In1>, OperationRegistryBuilderError> {
        Ok(OperationRegistry {
            operation0: self.operation0.ok_or(/* OperationRegistryBuilderError */)?,
            operation1: self.operation1.ok_or(/* OperationRegistryBuilderError */)?,
        })
    }
}

The OperationRegistry does not include any methods of its own; however, it does enjoy a From<OperationRegistry> for Router<B> implementation:

impl<B, Op0, In0, Op1, In1> From<OperationRegistry<Op0, In0, Op1, In1>> for Router<B>
where
    Op0: Handler<B, In0, Operation0Input>,
    Op1: Handler<B, In1, Operation1Input>,
{
    fn from(registry: OperationRegistry<Op0, In0, Op1, In1>) -> Self {
        let operation0_request_spec = /* Construct Operation0 routing information */;
        let operation1_request_spec = /* Construct Operation1 routing information */;

        // Convert handlers into boxed services
        let operation0_svc = Box::new(OperationHandler::new(registry.operation0));
        let operation1_svc = Box::new(OperationHandler::new(registry.operation1));

        // Initialize the protocol specific router
        // We demonstrate it here with `new_rest_json_router`, but note that there is a different router constructor
        // for each protocol.
        aws_smithy_http_server::routing::Router::new_rest_json_router(vec![
            (
                operation0_request_spec,
                operation0_svc
            ),
            (
                operation1_request_spec,
                operation1_svc
            )
        ])
    }
}

Router

The aws_smithy_http_server::routing::Router provides the protocol-aware routing of requests to their target operation. It exists as

pub struct Route {
    service: Box<dyn Service<http::Request, Response = http::Response>>,
}

enum Routes {
    RestXml(Vec<(Route, RequestSpec)>),
    RestJson1(Vec<(Route, RequestSpec)>),
    AwsJson1_0(TinyMap<String, Route>),
    AwsJson11(TinyMap<String, Route>),
}

pub struct Router {
    routes: Routes,
}

and enjoys the following Service<http::Request> implementation:

impl Service<http::Request> for Router
{
    type Response = http::Response;
    type Error = Infallible;

    fn poll_ready(&mut self, _: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
        Poll::Ready(Ok(()))
    }

    async fn call(&mut self, request: http::Request) -> Result<Self::Response, Self::Error> {
        match &self.routes {
            Routes::/* protocol */(routes) => {
                let route: Result<Route, _> = /* perform route matching logic */;
                match route {
                    Ok(ok) => ok.oneshot().await,
                    Err(err) => /* Convert routing error into http::Response */
                }
            }
        }
    }
}

Alongside the protocol-specific constructors, Router includes a layer method. This provides a way for the customer to apply a tower::Layer to all routes. For every protocol, Router::layer has approximately the same behavior:

let new_routes = old_routes
    .into_iter()
    // Apply the layer
    .map(|route| layer.layer(route))
    // Re-box the service, to restore `Route` type
    .map(|svc| Box::new(svc))
    // Collect the iterator back into a collection (`Vec` or `TinyMap`)
    .collect();

Comparison to Axum

Historically, smithy-rs has borrowed from axum. Despite various divergences, the code bases still have much in common.

To identify where the implementations should differ we should classify in what ways the use cases differ. There are two primary areas which we describe below.

Extractors and Responses

In axum there is a notion of Extractor, which allows the customer to easily define a decomposition of an incoming http::Request by specifying the arguments to the handlers. For example,

async fn request(Json(payload): Json<Value>, Query(params): Query<HashMap<String, String>>, headers: HeaderMap) {
    todo!()
}

is a valid handler - each argument satisfies the axum::extract::FromRequest trait, and therefore satisfies one of axum's blanket Handler implementations:

macro_rules! impl_handler {
    ( $($ty:ident),* $(,)? ) => {
        impl<F, Fut, Res, $($ty,)*> Handler<($($ty,)*)> for F
        where
            F: FnOnce($($ty,)*) -> Fut + Clone + Send + 'static,
            Fut: Future<Output = Res> + Send,
            Res: IntoResponse,
            $( $ty: FromRequest + Send,)*
        {
            fn call(self, req: http::Request) -> Self::Future {
                async {
                    let mut req = RequestParts::new(req);

                    $(
                        let $ty = match $ty::from_request(&mut req).await {
                            Ok(value) => value,
                            Err(rejection) => return rejection.into_response(),
                        };
                    )*

                    let res = self($($ty,)*).await;

                    res.into_response()
                }
            }
        }
    };
}

The implementations of Handler in axum and smithy-rs follow a similar pattern - convert http::Request into the closure's input, run the closure, convert the output of the closure to http::Response.

In smithy-rs we do not need a general notion of "extractor" - the http::Request decomposition is specified by the Smithy model, whereas in axum it's defined by the handlers signature. Despite the Smithy specification the customer may still want an "escape hatch" to allow them access to data outside of the Smithy service inputs, for this reason we should continue to support a restricted notion of extractor. This will help support use cases such as passing lambda_http::Context through to the handler despite it not being modeled in the Smithy model.

Dual to FromRequest is the axum::response::IntoResponse trait. This plays the role of converting the output of the handler to http::Response. Again, the difference between axum and smithy-rs is that smithy-rs has the conversion from {Operation}Output to http::Response specified by the Smithy model, whereas in axum the customer is free to specify a return type which implements axum::response::IntoResponse.

Routing

The Smithy model not only specifies the http::Request decomposition and http::Response composition for a given service, it also determines the routing. The From<OperationRegistry> implementation, described in Builder, yields a fully formed router based on the protocol and http traits specified.

This is in contrast to axum, where the user specifies the routing by use of various combinators included on the axum::Router, applied to other tower::Services. In an axum application one might encounter the following code:

let user_routes = Router::new().route("/:id", /* service */);

let team_routes = Router::new().route("/", /* service */);

let api_routes = Router::new()
    .nest("/users", user_routes)
    .nest("/teams", team_routes);

let app = Router::new().nest("/api", api_routes);

Note that, in axum handlers are eagerly converted to a tower::Service (via IntoService) before they are passed into the Router. In contrast, in smithy-rs, handlers are passed into a builder and then the conversion to tower::Service is performed (via OperationHandler).

Introducing state to handlers in axum is done in the same way as smithy-rs, described briefly in Handlers - a layer is used to insert state into incoming http::Requests and the Handler implementation pops it out of the type map layer. In axum, if a customer wanted to scope state to all routes within /users/ they are able to do the following:

async fn handler(Extension(state): Extension</* State */>) -> /* Return Type */ {}

let api_routes = Router::new()
    .nest("/users", user_routes.layer(Extension(/* state */)))
    .nest("/teams", team_routes);

In smithy-rs a customer is only able to apply a layer around the aws_smithy_http::routing::Router or around every route via the layer method described above.

Proposal

The proposal is presented as a series of compatible transforms to the existing service builder, each paired with a motivation. Most of these can be independently implemented, and it is stated in the cases where an interdependency exists.

Although presented as a mutation to the existing service builder, the actual implementation should exist as an entirely separate builder, living in a separate namespace, reusing code generation from the old builder, while exposing a new Rust API. Preserving the old API surface will prevent breakage and make it easier to perform comparative benchmarks and testing.

Remove two-step build procedure

As described in Builder, the customer is required to perform two conversions. One from OperationRegistryBuilder via OperationRegistryBuilder::build, the second from OperationRegistryBuilder to Router via the From<OperationRegistry> for Router implementation. The intermediary stop at OperationRegistry is not required and can be removed.

Statically check for missing Handlers

As described in Builder, the OperationRegistryBuilder::build method is fallible - it yields a runtime error when one of the handlers has not been set.

    pub fn build(
        self,
    ) -> Result<OperationRegistry<Op0, In0, Op1, In1>, OperationRegistryBuilderError> {
        Ok(OperationRegistry {
            operation0: self.operation0.ok_or(/* OperationRegistryBuilderError */)?,
            operation1: self.operation1.ok_or(/* OperationRegistryBuilderError */)?,
        })
    }

We can do away with fallibility if we allow Op0 and Op1 to switch types during the build and remove the Option from around the fields. The OperationRegistryBuilder then becomes:

struct OperationRegistryBuilder<Op0, In0, Op1, In1> {
    operation0: Op0,
    operation1: Op1
}

impl<Op0, In0, Op1, In1> OperationRegistryBuilder<Op0, In0, Op1, In1> {
    pub fn operation0<NewOp0>(self, value: NewOp0) -> OperationRegistryBuilder<NewOp0, In0, Op1, In1> {
        OperationRegistryBuilder {
            operation0: value,
            operation1: self.operation1
        }
    }
    pub fn operation1<NewOp1>(self, value: NewOp1) -> OperationRegistryBuilder<Op0, In0, NewOp1, In1> {
        OperationRegistryBuilder {
            operation0: self.operation0,
            operation1: value
        }
    }
}

impl<B, Op0, In0, Op1, In1> OperationRegistryBuilder<Op0, In0, Op1, In1>
where
    Op0: Handler<B, In0, Operation0Input>,
    Op1: Handler<B, In1, Operation1Input>,
{
    pub fn build(self) -> OperationRegistry<Op0, In0, Op1, In1> {
        OperationRegistry {
            operation0: self.operation0,
            operation1: self.operation1,
        }
    }
}

The customer will now get a compile time error rather than a runtime error when they fail to specify a handler.
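
The mechanism behind this compile-time check can be demonstrated in isolation. Below is a self-contained sketch of the type-state pattern; the Missing marker and two-field Builder are illustrative stand-ins, not the generated code:

struct Missing;

struct Builder<A, B> {
    a: A,
    b: B,
}

impl Builder<Missing, Missing> {
    fn new() -> Self {
        Builder { a: Missing, b: Missing }
    }
}

impl<A, B> Builder<A, B> {
    // Each setter switches a type parameter from `Missing` to the provided value.
    fn a<NewA>(self, a: NewA) -> Builder<NewA, B> {
        Builder { a, b: self.b }
    }
    fn b<NewB>(self, b: NewB) -> Builder<A, NewB> {
        Builder { a: self.a, b }
    }
}

// `build` only exists once both fields have been provided (here: as `String`s).
impl Builder<String, String> {
    fn build(self) -> (String, String) {
        (self.a, self.b)
    }
}

fn main() {
    let built = Builder::new()
        .a("first".to_string())
        .b("second".to_string())
        .build();
    assert_eq!(built, ("first".to_string(), "second".to_string()));

    // Forgetting a setter is now a compile-time error rather than a runtime one:
    // let _ = Builder::new().a("first".to_string()).build();
    // ^ no method named `build` found for struct `Builder<String, Missing>`
}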

Switch From<OperationRegistry> for Router to an OperationRegistry::build method

To construct a Router, the customer must either give a type ascription

let app: Router = /* Service builder */.into();

or be explicit about the Router namespace

let app = Router::from(/* Service builder */);

If we switch from a From<OperationRegistry> for Router to a build method on OperationRegistry the customer may simply

let app = /* Service builder */.build();

There already exists a build method taking OperationRegistryBuilder to OperationRegistry, this is removed in Remove two-step build procedure. These two transforms pair well together for this reason.

Operations as Middleware Constructors

As mentioned in Comparison to Axum: Routing and Handlers, the smithy-rs service builder accepts handlers and only converts them into a tower::Service during the final conversion into a Router. There are downsides to this:

  1. The customer has no opportunity to apply middleware to a specific operation before they are all collected into Router. The Router does have a layer method, described in Router, but this applies the middleware uniformly across all operations.
  2. The builder has no way to apply middleware around customer applied middleware. A concrete example of where this would be useful is described in the Middleware Position section of RFC: Logging in the Presence of Sensitive Data.
  3. The customer has no way of expressing readiness of the underlying operation - all handlers are converted to services with Service::poll_ready returning Poll::Ready(Ok(())).

The three use cases described above are supported by axum by virtue of the Router::route method accepting a tower::Service. The reader should consider a similar approach where the service builder setters accept a tower::Service<http::Request, Response = http::Response> rather than the Handler.

Throughout this section we purposely ignore the existence of handlers accepting state alongside the {Operation}Input; this class of handlers serves as a distraction and can be accommodated with small perturbations from each approach.

Approach A: Customer uses OperationHandler::new

It's possible to make progress with a small changeset by requiring that the customer eagerly use OperationHandler::new rather than having it applied internally within From<OperationRegistry> for Router (see Handlers). The builder and setter would then become:

pub struct OperationRegistryBuilder<Op0, Op1> {
    operation0: Option<Op0>,
    operation1: Option<Op1>
}

impl<Op0, Op1> OperationRegistryBuilder<Op0, Op1> {
    pub fn operation0(mut self, value: Op0) -> Self {
        self.operation0 = Some(value);
        self
    }
}

The API usage would then become

async fn handler0(input: Operation0Input) -> Operation0Output {
    todo!()
}

// Create a `Service<http::Request, Response = http::Response, Error = Infallible>` eagerly
let svc = OperationHandler::new(handler0);

// Middleware can be applied at this point
let operation0 = /* An HTTP `tower::Layer` */.layer(svc);

OperationRegistryBuilder::default()
    .operation0(operation0)
    /* ... */

Note that this requires that the OperationRegistryBuilder stores services, rather than Handlers. An unintended and superficial benefit of this is that we are able to drop In{n} from the OperationRegistryBuilder<Op0, In0, Op1, In1> - only Op{n} remains and it parametrizes each operation's tower::Service.

It is still possible to retain the original API which accepts Handler by introducing the following setters:

impl<Op0, Op1> OperationRegistryBuilder<Op0, Op1> {
    fn operation0_handler<H: Handler>(self, handler: H) -> OperationRegistryBuilder<OperationHandler<H>, Op1> {
        OperationRegistryBuilder {
            operation0: OperationHandler::new(handler),
            operation1: self.operation1
        }
    }
}

There are two points at which the customer might want to apply middleware: around tower::Service<{Operation}Input, Response = {Operation}Output> and tower::Service<http::Request, Response = http::Response>, that is, before and after the serialization/deserialization is performed. The change described only succeeds in the latter, and therefore is only a partial solution to (1).

This solves (2), the service builder may apply additional middleware around the service.

This does not solve (3), as the customer is not able to provide a tower::Service<{Operation}Input, Response = {Operation}Output>.

Approach B: Operations as Middleware

In order to achieve all three we model operations as middleware:

pub struct Operation0<S> {
    inner: S,
}

impl<S> Service<http::Request> for Operation0<S>
where
    S: Service<Operation0Input, Response = Operation0Output, Error = Infallible>
{
    type Response = http::Response;
    type Error = Infallible;

    fn poll_ready(&mut self, cx: &mut Context) -> Poll<Result<(), Self::Error>> {
        // We defer to the inner service for readiness
        self.inner.poll_ready(cx)
    }

    async fn call(&mut self, request: http::Request) -> Result<Self::Response, Self::Error> {
        let input = /* Create `Operation0Input` from `request: http::Request` */;

        self.inner.call(input).await;

        let response = /* Create `http::Response` from `output: Operation0Output` */
        response
    }
}

Notice the similarity between this and the OperationHandler; the only real difference is that we hold an inner service rather than a closure. In this way we have separated all of the model-aware serialization/deserialization, noted in Handlers, into this middleware.

A consequence of this is that the user-facing Operation0 must have two constructors:

  • from_service, which takes a tower::Service<Operation0Input, Response = Operation0Output>.
  • from_handler, which takes an async Operation0Input -> Operation0Output.

A brief example of how this might look:

use tower::util::{ServiceFn, service_fn};

impl<S> Operation0<S> {
    pub fn from_service(inner: S) -> Self {
        Self {
            inner,
        }
    }
}

impl<F> Operation0<ServiceFn<F>> {
    pub fn from_handler(inner: F) -> Self {
        // Using `service_fn` here isn't strictly correct - there is slight misalignment of closure signatures. This
        // still serves to illustrate the proposal.
        Operation0::from_service(service_fn(inner))
    }
}

The API usage then becomes:

async fn handler(input: Operation0Input) -> Operation0Output {
    todo!()
}

// These are both `tower::Service` and hence can have middleware applied to them
let operation_0 = Operation0::from_handler(handler);
let operation_1 = Operation1::from_service(/* some service */);

OperationRegistryBuilder::default()
    .operation0(operation_0)
    .operation1(operation_1)
    /* ... */

Approach C: Operations as Middleware Constructors

While Approach B solves all three problems, it fails to adequately model the Smithy semantics. An operation cannot uniquely define a tower::Service without reference to a parent Smithy service - information concerning serialization/deserialization and error modes is all inherited from the Smithy service an operation is used within. In this way, Operation0 should not be a standalone middleware, but should become middleware once accepted by the service builder.

Any solution which provides an {Operation} structure and wishes it to be accepted by multiple service builders must deal with this problem. We currently build one library per service and hence have duplicate structures when service closures overlap. This means we wouldn't run into this problem today, but it would be a future obstruction if we wanted to reduce the amount of generated code.

use tower::layer::util::{Stack, Identity};
use tower::util::{ServiceFn, service_fn};

// This takes the same form as `Operation0` defined in the previous attempt. The difference being that this is now
// private.
struct Service0Operation0<S> {
    inner: S
}

impl<S> Service<http::Request> for Service0Operation0<S>
where
    S: Service<Operation0Input, Response = Operation0Output, Error = Infallible>
{
    /* Same as above */
}

pub struct Operation0<S, L> {
    inner: S,
    layer: L
}

impl<S> Operation0<S, Identity> {
    pub fn from_service(inner: S) -> Self {
        Self {
            inner,
            layer: Identity::new()
        }
    }
}

impl<F> Operation0<ServiceFn<F>, Identity> {
    pub fn from_handler(inner: F) -> Self {
        Operation0::from_service(service_fn(inner))
    }
}

impl<S, L> Operation0<S, L> {
    pub fn layer<NewL>(self, layer: NewL) -> Operation0<S, Stack<L, NewL>> {
        Operation0 {
            inner: self.inner,
            layer: Stack::new(self.layer, layer)
        }
    }

    pub fn logging(self, /* args */) -> Operation0<S, Stack<L, LoggingLayer>> {
        Operation0 {
            inner: self.inner,
            layer: Stack::new(self.layer, LoggingLayer::new(/* args */))
        }
    }

    pub fn auth(self, /* args */) -> Operation0<S, Stack<L, AuthLayer>> {
        Operation0 {
            inner: self.inner,
            layer: Stack::new(self.layer, /* Construct auth middleware */)
        }
    }
}

impl<Op0, Op1> OperationRegistryBuilder<Op0, Op1> {
    pub fn operation0<S, L>(self, operation: Operation0<S, L>) -> OperationRegistryBuilder<<L as Layer<Service0Operation0<S>>>::Service, Op1>
    where
        L: Layer<Service0Operation0<S>>
    {
        // Convert `Operation0` to a `tower::Service`.
        let http_svc = Service0Operation0 { inner: operation.inner };
        // Apply the layers
        let operation0 = operation.layer.layer(http_svc);
        /* Store `operation0` in the builder */
    }
}

Notice that we get some additional type safety here when compared to Approach A and Approach B - operation0 accepts an Operation0 rather than a general tower::Service. We also get a namespace for utility methods - notice the logging and auth methods above.
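
For illustration, building on the sketch above, usage might take the following shape (layer arguments and the remaining registrations are elided, and the handler is the one defined earlier):

let operation_0 = Operation0::from_handler(handler)
    .logging(/* args */)
    .auth(/* args */);

OperationRegistryBuilder::default()
    .operation0(operation_0)
    /* ... */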

The RFC favours this approach out of all those presented.

Approach D: Add more methods to the Service Builder

An alternative to Approach C is to simply add more methods to the service builder while internally storing a tower::Service:

  • operation0_from_service, accepts a tower::Service<Operation0Input, Response = Operation0Output>.
  • operation0_from_handler, accepts an async Fn(Operation0Input) -> Operation0Output.
  • operation0_layer, accepts a tower::Layer<Op0>.

This is functionally similar to Approach C except that all composition is done internal to the service builder and the namespace lives in the method names rather than on the {Operation} struct.
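
For illustration, a usage-level sketch of Approach D might look as follows. The operation0_* method names come from the list above; operation1_from_service is an extrapolation for the second operation, and the layer argument is elided:

async fn handler0(input: Operation0Input) -> Operation0Output {
    todo!()
}

OperationRegistryBuilder::default()
    .operation0_from_handler(handler0)
    .operation0_layer(/* An HTTP `tower::Layer` */)
    .operation1_from_service(/* some service */)
    /* ... */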

Service parameterized Routers

Currently the Router stores Box<dyn tower::Service<http::Request, Response = http::Response>>. As a result, the Router::layer method, seen in Router, must re-box a service after every tower::Layer is applied. The heap allocation in Box::new is not itself a cause for concern, because Routers are typically constructed once at startup; however, one might expect the added indirection to regress performance while the server is running.

Having the service type parameterized as Router<S> allows us to write:

impl<S> Router<S> {
    fn layer<L>(self, layer: &L) -> Router<L::Service>
    where
        L: Layer<S>
    {
        /* Same internal implementation without boxing */
    }
}

Protocol specific Routers

Currently there is a single Router structure, described in Router, situated in the rust-runtime/aws-smithy-http-server crate, which is output by the service builder. This, roughly, takes the form of an enum listing the different protocols.

#[derive(Debug)]
enum Routes {
    RestXml(/* Container */),
    RestJson1(/* Container */),
    AwsJson1_0(/* Container */),
    AwsJson1_1(/* Container */),
}

Recall the form of the Service::call method, given in Router, which involved matching on the protocol and then performing protocol specific logic.

Two downsides of modelling Router in this way are:

  • Router is larger and has more branches than a protocol specific implementation.
  • If a third-party wanted to extend smithy-rs to additional protocols Routes would have to be extended. A synopsis of this obstruction is presented in Should we generate the Router type issue.

After applying the Switch From<OperationRegistry> for Router to an OperationRegistry::build method transform, code generation is free to choose the return type based on the model. This allows for a scenario where @restJson1 causes the service builder to output a specific RestJson1Router.

Protocol specific Errors

Currently, protocol specific routing errors are either:

  • Converted to RuntimeErrors and then http::Response (see unknown_operation).
  • Converted directly to a http::Response (see method_not_allowed). This is an outlier to the common pattern.

The from_request functions yield protocol specific errors which are converted to RequestRejections then RuntimeErrors (see ServerHttpBoundProtocolGenerator.kt).

In these scenarios protocol specific errors are converted into RuntimeError before being converted to an http::Response via the into_response method.

Two downsides of this are:

  • RuntimeError enumerates all possible errors across all existing protocols, so is larger than modelling the errors for a specific protocol.
  • If a third-party wanted to extend smithy-rs to additional protocols with differing failure modes RuntimeError would have to be extended. As in Protocol specific Errors, a synopsis of this obstruction is presented in Should we generate the Router type issue.

Switching from RuntimeError to protocol specific errors which satisfy a common interface, IntoResponse, would resolve these problems.
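
As a simplified sketch of what that common interface might look like: the real trait would be protocol and body aware, RestJson1RoutingError is an illustrative stand-in for a protocol specific error, and a String body is used purely for brevity.

use http::{Response, StatusCode};

pub trait IntoResponse {
    fn into_response(self) -> Response<String>;
}

// A protocol specific routing error - smaller than the catch-all `RuntimeError`.
#[derive(Debug)]
pub enum RestJson1RoutingError {
    NotFound,
    MethodNotAllowed,
}

impl IntoResponse for RestJson1RoutingError {
    fn into_response(self) -> Response<String> {
        let status = match self {
            Self::NotFound => StatusCode::NOT_FOUND,
            Self::MethodNotAllowed => StatusCode::METHOD_NOT_ALLOWED,
        };
        Response::builder()
            .status(status)
            .body(String::new())
            .expect("valid response")
    }
}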

Type erasure with the name of the Smithy service

Currently the service builder is named OperationRegistryBuilder. Despite the name being model agnostic, the OperationRegistryBuilder changes whenever the associated Smithy service changes. Renaming OperationRegistryBuilder to {Service}Builder would reflect the relationship between the builder and the Smithy service and prevent naming conflicts if multiple service builders exist in the same namespace.

Similarly, the output of the service builder is Router. This ties the output of the service builder to a structure in rust-runtime. Introducing a type erasure here around Router using a newtype named {Service} would:

  • Ensure we are free to change the implementation of {Service} without changing the Router implementation.
  • Hide the router type, which is determined by the protocol specified in the model.
  • Allow us to put a builder method on {Service} which returns {Service}Builder.

This is compatible with Protocol specific Routers: we simply newtype the protocol specific router rather than Router.
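
A sketch of the newtype, with a stub standing in for the code generated protocol specific router (all names here are illustrative):

// Stub standing in for the code generated, protocol specific router.
struct RestJson1Router;

#[derive(Default)]
pub struct Service0Builder {
    /* operation setters elided */
}

pub struct Service0 {
    // Private field: the router type never appears in the public API.
    router: RestJson1Router,
}

impl Service0 {
    pub fn builder() -> Service0Builder {
        Service0Builder::default()
    }
}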

With both of these changes the API would take the form:

let service_0: Service0 = Service0::builder()
    /* use the setters */
    .build()
    .unwrap()
    .into();

With Remove two-step build procedure, Switch From<OperationRegistry> for Router to a OperationRegistry::build method, and Statically check for missing Handlers we obtain the following API:

let service_0: Service0 = Service0::builder()
    /* use the setters */
    .build();

Combined Proposal

A combination of all the proposed transformations results in the following API:

struct Context {
    /* fields */
}

async fn handler(input: Operation0Input) -> Operation0Output {
    todo!()
}

async fn handler_with_ext(input: Operation0Input, extension: Extension<Context>) -> Operation0Output {
    todo!()
}

struct Operation1Service {
    /* fields */
}

impl Service<Operation1Input> for Operation1Service {
    type Response = Operation1Output;

    /* implementation */
}

struct Operation1ServiceWithExt {
    /* fields */
}

impl Service<(Operation1Input, Extension<Context>)> for Operation1ServiceWithExt {
    type Response = Operation1Output;

    /* implementation */
}

// Create an operation from a handler
let operation_0 = Operation0::from_handler(handler);

// Create an operation from a handler with extension
let operation_0 = Operation0::from_handler(handler_with_ext);

// Create an operation from a `tower::Service`
let operation_1_svc = Operation1Service { /* initialize */ };
let operation_1 = Operation1::from_service(operation_1_svc);

// Create an operation from a `tower::Service` with extension
let operation_1_svc = Operation1ServiceWithExt { /* initialize */ };
let operation_1 = Operation1::from_service(operation_1_svc);

// Apply a layer
let operation_0 = operation_0.layer(/* layer */);

// Use the service builder
let service_0 = Service0::builder()
    .operation_0(operation_0)
    .operation_1(operation_1)
    .build();

A toy implementation of the combined proposal is presented in this PR.

Changes Checklist

RFC: Dependency Versions

Status: Accepted

Applies to: Client and Server

This RFC outlines how Rust dependency versions are selected for the smithy-rs project, and strives to meet the following semi-conflicting goals:

  • Dependencies are secure
  • Vended libraries have dependency ranges that overlap other Rust libraries as much as possible

When in conflict, the security goal takes priority over the compatibility goal.

Categorization of Crates

The Rust crates within smithy-rs can be divided up into two categories:

  1. Library Crates: Crates that are published to crates.io with the intention that other projects will depend on them via their Cargo.toml files. This category does NOT include binaries that are published to crates.io with the intention of being installed with cargo install.
  2. Application Crates: All examples, binaries, tools, standalone tests, or other crates that are not published to crates.io with the intent of being depended on by other projects.

All generated crates must be considered library crates even if they're not published since they are intended to be pulled into other Rust projects with other dependencies.

Support crates for Applications

The aws-smithy-http-server-python crate doesn't fit the categorization rules well since it is a runtime crate for a generated Rust application with bindings to Python. This RFC establishes this crate as an application crate since it needs to pull in application-specific dependencies such as tracing-subscriber in order to implement its full feature set.

Dependency Version Rules

Application crates should use the latest versions of dependencies, but must use a version greater than or equal to the minimum secure version as determined by the RUSTSEC advisories database. Library crates must use the minimum secure version. This is illustrated at a high level below:

graph TD
    S[Add Dependency] --> T{Crate Type?}
    T -->|Application Crate?| A[Use latest version]
    T -->|Library Crate?| L[Use minimum secure version]

What is a minimum secure version when there are multiple major versions?

If a dependency has multiple supported major versions, then the latest major version must be selected unless there is a compelling reason to do otherwise (such as the previous major version having been previously exposed in our public API). Choosing newer major versions will reduce the amount of upgrade work that needs to be done at a later date when support for the older version is inevitably dropped.

Changes Checklist

Some work needs to be done to establish these guidelines:

  • Establish automation for enforcing minimum secure versions for the direct dependencies of library crates

RFC: Error Context and Compatibility

Status: Implemented

Applies to: Generated clients and shared rust-runtime crates

This RFC proposes a pattern for writing Rust errors to provide consistent error context AND forwards/backwards compatibility. The goal is to strike a balance between these four goals:

  1. Errors are forwards compatible, and changes to errors are backwards compatible
  2. Errors are idiomatic and ergonomic. It is easy to match on them and extract additional information for cases where that's useful. The type system prevents errors from being used incorrectly (for example, incorrectly retrieving context for a different error variant)
  3. Error messages are easy to debug
  4. Errors implement best practices with Rust's Error trait (for example, implementing the optional source() function where possible)

Note: This RFC is not about error backwards compatibility when it comes to error serialization/deserialization for transfer over the wire. The Smithy protocols cover that aspect.

Past approaches in smithy-rs

This section examines some examples found in aws-config that illustrate different problems that this RFC will attempt to solve, and calls out what was done well, and what could be improved upon.

Case study: InvalidFullUriError

To start, let's examine InvalidFullUriError (doc comments omitted):

#[derive(Debug)]
#[non_exhaustive]
pub enum InvalidFullUriError {
    #[non_exhaustive] InvalidUri(InvalidUri),
    #[non_exhaustive] NoDnsService,
    #[non_exhaustive] MissingHost,
    #[non_exhaustive] NotLoopback,
    DnsLookupFailed(io::Error),
}

impl Display for InvalidFullUriError {
    fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
        match self {
            InvalidFullUriError::InvalidUri(err) => write!(f, "URI was invalid: {}", err),
            InvalidFullUriError::MissingHost => write!(f, "URI did not specify a host"),
            // ... omitted ...
        }
    }
}

impl Error for InvalidFullUriError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            InvalidFullUriError::InvalidUri(err) => Some(err),
            InvalidFullUriError::DnsLookupFailed(err) => Some(err),
            _ => None,
        }
    }
}

This error does a few things well:

  1. Using #[non_exhaustive] on the enum allows new errors to be added in the future.
  2. Breaking out different error types allows for more useful error messages, potentially with error-specific context. Customers can match on these different error variants to change their program flow, although it's not immediately obvious if such use cases exist for this error.
  3. The error cause is available through the Error::source() impl for variants that have a cause.

However, there are also a number of things that could be improved:

  1. All tuple/struct enum members are public, and InvalidUri is an error from the http crate. Exposing a type from another crate can potentially lock the GA SDK into a specific crate version if breaking changes are ever made to the exposed types. In this specific case, it prevents using alternate HTTP implementations that don't use the http crate.
  2. DnsLookupFailed is missing #[non_exhaustive], so new members can never be added to it.
  3. Use of enum tuples, even with #[non_exhaustive], adds friction to evolving the API since the tuple members cannot be named.
  4. Printing the source error in the Display impl leads to error repetition by reporters that examine the full source chain.
  5. The source() impl has a _ match arm, which means future implementers could forget to propagate a source when adding new error variants.
  6. The error source can be downcasted to InvalidUri type from http in customer code. This is a leaky abstraction where customers can start to rely on the underlying library the SDK uses in its implementation, and if that library is replaced/changed, it can silently break the customer's application. Note: later in the RFC, I'll demonstrate why fixing this issue is not practical.

Case study: ProfileParseError

Next, let's look at a much simpler error. The ProfileParseError is focused purely on the parsing logic for the SDK config file:

#[derive(Debug, Clone)]
pub struct ProfileParseError {
    location: Location,
    message: String,
}

impl Display for ProfileParseError {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "error parsing {} on line {}:\n  {}",
            self.location.path, self.location.line_number, self.message
        )
    }
}

impl Error for ProfileParseError {}

What this error does well:

  • The members are private, so #[non_exhaustive] isn't even necessary
  • The error is completely opaque (maximizing compatibility) while still being debuggable thanks to the flexible messaging

What could be improved:

  • It needlessly implements Clone, which may prevent it from holding an error source in the future since errors are often not Clone.
  • In the future, if more error variants are needed, a private inner error kind enum could be added to change messaging, but there's not a nice way to expose new variant-specific information to the customer.
  • Programmatic access to the error Location may be desired, but this can be trivially added in the future without a breaking change by adding an accessor method.

Case study: code generated client errors

The SDK currently generates errors such as the following (from S3):

#[non_exhaustive]
pub enum Error {
    BucketAlreadyExists(BucketAlreadyExists),
    BucketAlreadyOwnedByYou(BucketAlreadyOwnedByYou),
    InvalidObjectState(InvalidObjectState),
    NoSuchBucket(NoSuchBucket),
    NoSuchKey(NoSuchKey),
    NoSuchUpload(NoSuchUpload),
    NotFound(NotFound),
    ObjectAlreadyInActiveTierError(ObjectAlreadyInActiveTierError),
    ObjectNotInActiveTierError(ObjectNotInActiveTierError),
    Unhandled(Box<dyn Error + Send + Sync + 'static>),
}

Each error variant gets its own struct, which can hold error-specific contextual information. Except for the Unhandled variant, both the error enum and the details on each variant are extensible. The Unhandled variant should move the error source into a struct so that its type can be hidden. Otherwise, the code generated errors are already aligned with the goals of this RFC.
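
A sketch of what hiding the source type might look like - the struct and its constructor below are illustrative, not the generated code:

#[derive(Debug)]
pub struct Unhandled {
    // Private field: the concrete source type is no longer part of the public API,
    // but it remains reachable through `Error::source` on the parent error.
    source: Box<dyn std::error::Error + Send + Sync + 'static>,
}

impl Unhandled {
    pub fn new(source: Box<dyn std::error::Error + Send + Sync + 'static>) -> Self {
        Self { source }
    }
}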

Approaches from other projects

std::io::Error

The standard library uses an Error struct with an accompanying ErrorKind enum for its IO error. Roughly:

#[derive(Debug)]
#[non_exhaustive]
pub enum ErrorKind {
    NotFound,
    // ... omitted ...
    Other,
}

#[derive(Debug)]
pub struct Error {
    kind: ErrorKind,
    source: Box<dyn std::error::Error + Send + Sync>,
}

What this error does well:

  • It is extensible since the ErrorKind is non-exhaustive
  • It has an Other error type that can be instantiated by users in unit tests, making it easier to unit test error handling

What could be improved:

  • There isn't an ergonomic way to add programmatically accessible error-specific context to this error in the future
  • The source error can be downcasted, which could be a trap for backwards compatibility.

Hyper 1.0

Hyper has outlined some problems they want to address with errors for the coming 1.0 release. To summarize:

  • It's difficult to match on specific errors (Hyper 0.x's Error relies on is_x methods for error matching rather than enum matching).
  • Error reporters duplicate information since the hyper 0.x errors include the display of their error sources
  • Error::source() can leak internal dependencies

Opaque Error Sources

There is discussion in the errors working group about how to avoid leaking internal dependency error types through error source downcasting. One option is to create an opaque error wrapping new-type that removes the ability to downcast to the other library's error. This, however, can be circumvented via unsafe code, and also breaks the ability for error reporters to properly display the error (for example, if the error has backtrace information, that would be inaccessible to the reporter).

This situation might improve if the nightly request_value/request_ref/provide functions on std::error::Error are stabilized, since then contextual information needed for including things such as a backtrace could still be retrieved through the opaque error new-type.

This RFC proposes that error types from other libraries not be directly exposed in the API, but rather, be exposed indirectly through Error::source as &dyn Error + 'static.

Errors should not require downcasting to be useful. Downcasting the error's source should be a last resort, and with the understanding that the type could change at a later date with no compile-time guarantees.
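
For example, a customer can still walk the source chain and downcast as a last resort. A minimal sketch (the function name is hypothetical); nothing guarantees that the concrete type will stay the same in a future release:

use std::error::Error;

// Returns true if any error in the source chain is an `std::io::Error`.
fn caused_by_io_error(err: &dyn Error) -> bool {
    let mut source = err.source();
    while let Some(current) = source {
        if current.is::<std::io::Error>() {
            return true;
        }
        source = current.source();
    }
    false
}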

Error Proposal

Taking a customer's perspective, there are two broad categories of errors:

  1. Actionable: Errors that can/should influence program flow; where it's useful to do different work based on additional error context or error variant information
  2. Informative: Errors that inform that something went wrong, but where it's not useful to match on the error to change program flow

This RFC proposes that a consistent pattern be introduced to cover these two use cases for all errors in the public API for the Rust runtime crates and generated client crates.

Actionable error pattern

Actionable errors are represented as enums. If an error variant has an error source or additional contextual information, it must use a separate context struct that is referenced via tuple in the enum. For example:

// Good: new error types can be added in the future
#[non_exhaustive]
pub enum Error {
    // Good: This is exhaustive and uses a tuple, but its sole member is an extensible struct with private fields
    VariantA(VariantA),

    // Bad: The fields are directly exposed and can't have accessor methods. The error
    // source type also can't be changed at a later date since it is part of the public API.
    #[non_exhaustive]
    VariantB {
        some_additional_info: u32,
        source: AnotherError // AnotherError is from this crate
    },

    // Bad: There's no way to add additional contextual information to this error in the future, even
    // though it is non-exhaustive. Changing it to a tuple or struct later leads to compile errors in existing
    // match statements.
    #[non_exhaustive]
    VariantC,

    // Bad: Not extensible if additional context is added later (unless that context can be added to `AnotherError`)
    #[non_exhaustive]
    VariantD(AnotherError),

    // Bad: Not extensible. If new context is added later (for example, a second endpoint), there's no way to name it.
    #[non_exhaustive]
    VariantE(Endpoint, AnotherError),

    // Bad: Exposes another library's error type in the public API,
    // which makes upgrading or replacing that library a breaking change
    #[non_exhaustive]
    VariantF {
        source: http::uri::InvalidUri
    },

    // Bad: The error source type is public, and even though it's a boxed error, it won't
    // be possible to change it to an opaque error type later (for example, if/when
    // opaque errors become practical due to standard library stabilizations).
    #[non_exhaustive]
    VariantG {
        source: Box<dyn Error + Send + Sync + 'static>,
    }
}

pub struct VariantA {
    some_field: u32,
    // This is private, so it's fine to reference the external library's error type
    source: http::uri::InvalidUri
}

impl VariantA {
    pub fn some_field(&self) -> u32 {
        self.some_field
    }
}

Error variants that contain a source must return it from the Error::source method. The source implementation should not use the catch all (_) match arm, as this makes it easy to miss adding a new error variant's source at a later date.
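
As a self-contained illustration (with hypothetical names, separate from the enum above), an exhaustive source implementation might look like this:

use std::error::Error;
use std::fmt;

#[derive(Debug)]
struct VariantAContext {
    source: std::num::ParseIntError,
}

#[derive(Debug)]
#[non_exhaustive]
enum ExampleError {
    VariantA(VariantAContext),
    #[non_exhaustive]
    VariantB,
}

impl fmt::Display for ExampleError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::VariantA(_) => write!(f, "variant a"),
            Self::VariantB => write!(f, "variant b"),
        }
    }
}

impl Error for ExampleError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        // Every variant is matched explicitly: adding a variant later forces a decision
        // about its source instead of silently falling into a `_` arm.
        match self {
            Self::VariantA(context) => Some(&context.source),
            Self::VariantB => None,
        }
    }
}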

The error Display implementation must not include the source in its output:

// Good
impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::VariantA(_) => write!(f, "variant a"),
            Self::VariantB { some_additional_info, .. } => write!(f, "variant b ({some_additional_info})"),
            // ... and so on
        }
    }
}

// Bad
impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::VariantA(_) => write!(f, "variant a"),
            // Bad: includes the source in the `Display` output, which leads to duplicate error information
            Self::VariantB { some_additional_info, source } => write!(f, "variant b ({some_additional_info}): {source}"),
            // ... and so on
        }
    }
}

Informative error pattern

Informative errors must be represented as structs. If error messaging changes based on an underlying cause, then a private error kind enum can be used internally for this purpose. For example:

#[derive(Debug)]
pub struct InformativeError {
    some_additional_info: u32,
    source: AnotherError,
}

impl fmt::Display for InformativeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "some informative message with {}", self.some_additional_info)
    }
}

impl Error for InformativeError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}

In general, informative errors should be referenced by variants in actionable errors since they cannot be converted to actionable errors at a later date without a breaking change. This is not a hard rule, however. Use your best judgement for the situation.
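
For example, building on the InformativeError defined above, an actionable error could reference it as a variant's context struct, matching the "good" tuple pattern shown earlier:

#[derive(Debug)]
#[non_exhaustive]
pub enum ActionableError {
    // The sole member is the extensible `InformativeError` struct, so new context can be
    // added to it later without breaking this enum.
    SomethingWentWrong(InformativeError),
}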

Displaying full error context

In code where errors are logged rather than returned to the customer, the full error source chain must be displayed. This will be made easy by placing a DisplayErrorContext struct in aws-smithy-types that can be used as a wrapper to get better error formatting:

tracing::warn!(err = %DisplayErrorContext(err), "some message");

This might be implemented as follows:

#[derive(Debug)]
pub struct DisplayErrorContext<E: Error>(pub E);

impl<E: Error> fmt::Display for DisplayErrorContext<E> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write_err(f, &self.0)?;
        // Also add a debug version of the error at the end
        write!(f, " ({:?})", self)
    }
}

fn write_err(f: &mut fmt::Formatter<'_>, err: &dyn Error) -> fmt::Result {
    write!(f, "{}", err)?;
    if let Some(source) = err.source() {
        write!(f, ": ")?;
        write_err(f, source)?;
    }
    Ok(())
}

Changes Checklist

  • Update every struct/enum that implements Error in all the non-server Rust runtime crates
  • Hide error source type in Unhandled variant in code generated errors
  • Remove Clone from ProfileParseError and any others that have it

Error Code Review Checklist

This is a checklist meant to aid code review of new errors:

  • The error fits either the actionable or informative pattern
  • If the error is informative, it's clear that it will never be expanded with additional variants in the future
  • The Display impl does not write the error source to the formatter
  • The catch all _ match arm is not used in the Display or Error::source implementations
  • Error types from external libraries are not exposed in the public API
  • Error enums are #[non_exhaustive]
  • Error enum variants that don't have a separate error context struct are #[non_exhaustive]
  • Error context is exposed via accessors rather than by public fields
  • Actionable errors and their context structs are in an error submodule for any given module. They are not mixed with other non-error code

RFC: Evolving the new service builder API

Status: Accepted

Applies to: Server

RFC 20 introduced a new service builder API. It supports fine-grained configuration at multiple levels (per-handler middlewares, router middlewares, plugins) while trying to prevent some misconfiguration issues at compile-time (i.e. missing operation handlers). There is consensus that the new API is an improvement over the pre-existing OperationRegistryBuilder/OperationRegistry, which is now on its way to deprecation in one of the next releases.

This RFC builds on top of RFC 20 to explore an alternative API design prior to its stabilisation. The API proposed in this RFC has been manually implemented for the Pokemon service. You can find the code here.

Overview

Type-heavy builders can lead to a poor developer experience when it comes to writing function signatures, conditional branches and clarity of error messages. This RFC provides examples of the issues we are trying to mitigate and showcases an alternative design for the service builder, cutting the number of generic parameters from 2N+1 to 2, where N is the number of operations on the service. We rely on eagerly upgrading the registered handlers and operations to Route<B> to achieve this reduction.

Goals:

  • Maximise API ergonomics, with a particular focus on the developer experience for Rust beginners.

Strategy:

  • Reduce type complexity, exposing a less generic API;
  • Provide clearer errors when the service builder is misconfigured.

Trade-offs:

  • Reduce compile-time safety. Missing handlers will be detected at runtime instead of compile-time.

Constraints:

  • There should be no significant degradation in runtime performance (i.e. startup time for applications).

Handling missing operations

Let's start by reviewing the API proposed in RFC 20. We will use the Pokemon service as our driving example throughout the RFC. This is what the startup code looks like:

#[tokio::main]
pub async fn main() {
    // [...]
    let app = PokemonService::builder()
        .get_pokemon_species(get_pokemon_species)
        .get_storage(get_storage)
        .get_server_statistics(get_server_statistics)
        .capture_pokemon(capture_pokemon)
        .do_nothing(do_nothing)
        .check_health(check_health)
        .build();

    // Setup shared state and middlewares.
    let shared_state = Arc::new(State::default());
    let app = app.layer(&AddExtensionLayer::new(shared_state));

    // Start the [`hyper::Server`].
    let bind: SocketAddr = /* */;
    let server = hyper::Server::bind(&bind).serve(app.into_make_service());
    // [...]
}

The builder is infallible: we are able to verify at compile-time that all handlers have been provided using the typestate builder pattern.
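
For readers unfamiliar with the pattern, here is a minimal, self-contained illustration of the typestate idea, deliberately unrelated to the generated code: build only exists once the handler slot has moved from a Missing marker type to a Set type.

struct Missing;
struct Set<H>(H);

struct Builder<CheckHealth> {
    check_health: CheckHealth,
}

impl Builder<Missing> {
    fn new() -> Self {
        Builder { check_health: Missing }
    }
}

impl<CheckHealth> Builder<CheckHealth> {
    fn check_health<H>(self, handler: H) -> Builder<Set<H>> {
        Builder { check_health: Set(handler) }
    }
}

impl<H> Builder<Set<H>> {
    // `build` is only available once the handler has been provided.
    fn build(self) -> H {
        self.check_health.0
    }
}

fn main() {
    // Compiles: the handler was provided before `build`.
    let _handler = Builder::new().check_health(|| ()).build();
    // Does not compile: `build` is not defined for `Builder<Missing>`.
    // let _ = Builder::new().build();
}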

Compiler errors cannot be tuned

What happens if we stray away from the happy path? We might forget, for example, to add the check_health handler. The compiler greets us with this error:

error[E0277]: the trait bound `MissingOperation: Upgradable<AwsRestJson1, CheckHealth, (), _, IdentityPlugin>` is not satisfied
  --> pokemon-service/src/bin/pokemon-service.rs:38:10
   |
38 |         .build();
   |          ^^^^^ the trait `Upgradable<AwsRestJson1, CheckHealth, (), _, IdentityPlugin>` is not implemented for `MissingOperation`
   |
   = help: the following other types implement trait `Upgradable<Protocol, Operation, Exts, B, Plugin>`:
             FailOnMissingOperation
             Operation<S, L>

The compiler complains that MissingOperation does not implement the Upgradable trait. Neither MissingOperation nor Upgradable appear in the startup code we looked at. This is likely to be the first time the developer sees those traits, assuming they haven't spent time getting familiar with aws-smithy-http-server's internals. The help section is unhelpful, if not actively misdirecting. How can the developer figure out that the issue lies with check_health? They need to inspect the generic parameters attached to Upgradable in the code label or the top-level error message - we see, among other things, a CheckHealth parameter. That is the hint they need to follow to move forward.

We unfortunately do not have agency on the compiler error we just examined. Rust does not expose hooks for crate authors to tweak the errors returned when a type does not implement a trait we defined. All implementations of the typestate builder pattern accept this shortcoming in exchange for compile-time safety.

Is it a good tradeoff in our case?

The cost of a runtime error

If build returns an error, the HTTP server is never launched. The application fails to start.

Let's examine the cost of this runtime error along two dimensions:

  • Impact on developer productivity;
  • Impact on end users.

We'd love for this issue to be caught on the developer machine - it provides the shortest feedback loop. Unlike with the typestate builder approach, the issue won't be surfaced by a cargo check or cargo build invocation. It should be surfaced by executing the application test suite, assuming that the developer has written at least one integration test - e.g. a test that passes a request to the call method exposed by PokemonService, or one that launches a full-blown instance of the application which is then probed via an HTTP client.

If there are no integration tests, the issue won't be detected on the developer machine nor in CI. Nonetheless, it is unlikely to cause any end-user impact even if it manages to escape detection and reach production. The deployment will never complete if they are using a progressive rollout strategy: instances of the new version will crash as soon as they are launched, never getting a chance to mark themselves as healthy; all traffic will keep being handled by the old version, with no visible impact on end users of the application.

Given the above, we think that the impact of a runtime error is low enough to be worth exploring designs that do not guarantee compile-safety for the builder API1.

Providing clear feedback

Moving from a compile-time error to a runtime error does not require extensive refactoring. The definition of PokemonServiceBuilder goes from:

pub struct PokemonServiceBuilder<
    Op1,
    Op2,
    Op3,
    Op4,
    Op5,
    Op6,
    Exts1 = (),
    Exts2 = (),
    Exts3 = (),
    Exts4 = (),
    Exts5 = (),
    Exts6 = (),
    Pl = aws_smithy_http_server::plugin::IdentityPlugin,
> {
    check_health: Op1,
    do_nothing: Op2,
    get_pokemon_species: Op3,
    get_server_statistics: Op4,
    capture_pokemon: Op5,
    get_storage: Op6,
    #[allow(unused_parens)]
    _exts: std::marker::PhantomData<(Exts1, Exts2, Exts3, Exts4, Exts5, Exts6)>,
    plugin: Pl,
}

to:

pub struct PokemonServiceBuilder<
    Op1,
    Op2,
    Op3,
    Op4,
    Op5,
    Op6,
    Exts1 = (),
    Exts2 = (),
    Exts3 = (),
    Exts4 = (),
    Exts5 = (),
    Exts6 = (),
    Pl = aws_smithy_http_server::plugin::IdentityPlugin,
> {
    check_health: Option<Op1>,
    do_nothing: Option<Op2>,
    get_pokemon_species: Option<Op3>,
    get_server_statistics: Option<Op4>,
    capture_pokemon: Option<Op5>,
    get_storage: Option<Op6>,
    #[allow(unused_parens)]
    _exts: std::marker::PhantomData<(Exts1, Exts2, Exts3, Exts4, Exts5, Exts6)>,
    plugin: Pl,
}

All operation fields are now Option-wrapped. We introduce a new MissingOperationsError error to hold the names of the missing operations and their respective setter methods:

#[derive(Debug)]
pub struct MissingOperationsError {
    service_name: &'static str,
    operation_names2setter_methods: HashMap<&'static str, &'static str>,
}

impl Display for MissingOperationsError { /* */ }
impl std::error::Error for MissingOperationsError {}

which is then used as the error type returned by build (not shown here for brevity). We can now try again to stray from the happy path by forgetting to register a handler for the CheckHealth operation. The code compiles just fine this time, but the application fails when launched via cargo run:

<timestamp> ERROR pokemon_service: You must specify a handler for all operations attached to the `Pokemon` service.
We are missing handlers for the following operations:
- com.aws.example#CheckHealth

Use the dedicated methods on `PokemonServiceBuilder` to register the missing handlers:
- PokemonServiceBuilder::check_health

The error speaks the language of the domain, Smithy's interface definition language: it mentions operations, services, handlers. Understanding the error requires no familiarity with smithy-rs' internal type machinery or advanced trait patterns in Rust. We can also provide actionable suggestions: Rust beginners should be able to easily process the information, rectify the mistake and move on quickly.

Simplifying PokemonServiceBuilder's signature

Let's take a second look at the (updated) definition of PokemonServiceBuilder:

pub struct PokemonServiceBuilder<
    Op1,
    Op2,
    Op3,
    Op4,
    Op5,
    Op6,
    Exts1 = (),
    Exts2 = (),
    Exts3 = (),
    Exts4 = (),
    Exts5 = (),
    Exts6 = (),
    Pl = aws_smithy_http_server::plugin::IdentityPlugin,
> {
    check_health: Option<Op1>,
    do_nothing: Option<Op2>,
    get_pokemon_species: Option<Op3>,
    get_server_statistics: Option<Op4>,
    capture_pokemon: Option<Op5>,
    get_storage: Option<Op6>,
    #[allow(unused_parens)]
    _exts: std::marker::PhantomData<(Exts1, Exts2, Exts3, Exts4, Exts5, Exts6)>,
    plugin: Pl,
}

We have 13 generic parameters:

  • 1 for plugins (Pl);
  • 2 for each operation (OpX and ExtsX);

All those generic parameters were necessary when we were using the typestate builder pattern. They kept track of which operation handlers were missing: if any OpX was set to MissingOperation when calling build -> compilation error!

Do we still need all those generic parameters if we move forward with this RFC? You might be asking yourselves: why do those generics bother us? Is there any harm in keeping them around? We'll look at the impact of those generic parameters on two scenarios:

  • Branching in startup logic;
  • Breaking down a monolithic startup function into multiple smaller functions.

Branching -> "Incompatible types"

Conditional statements appear quite often in the startup logic for an application (or in the setup code for its integration tests). Let's consider a toy example: if a check_database flag is set to true, we want to register a different check_health handler - one that takes care of pinging the database to make sure it's up.

The "obvious" solution would look somewhat like this:

let check_database: bool = /* */;
let app = if check_database {
    app.check_health(check_health)
} else {
    app.check_health(check_health_with_database)
};
app.build();

The compiler is not pleased:

error[E0308]: `if` and `else` have incompatible types
  --> pokemon-service/src/bin/pokemon-service.rs:39:9
   |
36 |       let app = if check_database {
   |  _______________-
37 | |         app.check_health(check_health)
   | |         ------------------------------ expected because of this
38 | |     } else {
39 | |         app.check_health(check_health_with_database)
   | |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected fn item, found a different fn item
40 | |     };
   | |_____- `if` and `else` have incompatible types
   |
   = note: expected struct `PokemonServiceBuilder<Operation<IntoService<_, fn(CheckHealthInput) -> impl Future<Output =
    CheckHealthOutput> {check_health}>>, _, _, _, _, _, _, _, _, _, _, _>`
              found struct `PokemonServiceBuilder<Operation<IntoService<_, fn(CheckHealthInput) -> impl Future<Output =
    CheckHealthOutput> {check_health_with_database}>>, _, _, _, _, _, _, _, _, _, _, _>`

The developer must be aware of the following facts to unpack the error message:

  1. The two branches of an if/else statement need to return the same type.
  2. Each function or closure has its own unique type (represented as fn(CheckHealthInput) -> impl Future<Output = CheckHealthOutput> {check_health} for check_health);
  3. The handler function type becomes part of the overall PokemonServiceBuilder type, a cog in the larger Op1 generic parameter used to hold the handler for the CheckHealth operation (i.e. Operation<IntoService<_, fn(CheckHealthInput) -> impl Future<Output = CheckHealthOutput> {check_health}>>);

The second fact requires an intermediate understanding of Rust's closures and opaque types (impl Trait). It's quite likely to confuse Rust beginners.

The developer has three options to move forward:

  1. Convert check_health and check_health_with_database into a common type that can be passed as a handler to PokemonServiceBuilder::check_health;
  2. Invoke the build method inside the two branches in order to return a "plain" PokemonService<Route<B>> from both branches.
  3. Embed the configuration parameter (check_database) in the application state, retrieve it inside check_health and perform the branching there.

I can't easily see a way to accomplish 1) using the current API. Pursuing 2) is straightforward with a single conditional:

let check_database: bool = /* */;
let app = if check_database {
    app.check_health(check_health).build()
} else {
    app.check_health(check_health_with_database).build()
};

It becomes more cumbersome when we have more than a single conditional:

let check_database: bool = /* */;
let include_cpu_statistics: bool = /* */;
match (check_database, include_cpu_statistics) {
    (true, true) => app
        .check_health(check_health_with_database)
        .get_server_statistics(get_server_statistics_with_cpu)
        .build(),
    (true, false) => app
        .check_health(check_health_with_database)
        .get_server_statistics(get_server_statistics)
        .build(),
    (false, true) => app
        .check_health(check_health)
        .get_server_statistics(get_server_statistics_with_cpu)
        .build(),
    (false, false) => app
        .check_health(check_health)
        .get_server_statistics(get_server_statistics)
        .build(),
}

A lot of repetition compared to the code for the "obvious" approach:

let check_database: bool = /* */;
let include_cpu_statistics: bool = /* */;
let app = if check_database {
    app.check_health(check_health)
} else {
    app.check_health(check_health_with_database)
};
let app = if include_cpu_statistics {
    app.get_server_statistics(get_server_statistics_with_cpu)
} else {
    app.get_server_statistics(get_server_statistics)
};
app.build();

The obvious approach becomes viable if we stop embedding the handler function type in PokemonServiceBuilder's overall type.

Refactoring into smaller functions -> Prepare for some type juggling!

Services with a high number of routes can lead to fairly long startup routines. Developers might be tempted to break down the startup routine into smaller functions, grouping together operations with common requirements (similar domain, same middlewares, etc.).

What does the signature of those smaller functions look like? The service builder must be one of the arguments if we want to register handlers. We must also return it to allow the orchestrating function to finish the application setup (our setters take ownership of self).

A first sketch:

fn partial_setup(builder: PokemonServiceBuilder) -> PokemonServiceBuilder {
    /* */
}

The compiler demands to see those generic parameters in the signature:

error[E0107]: missing generics for struct `PokemonServiceBuilder`
  --> pokemon-service/src/bin/pokemon-service.rs:28:27
   |
28 | fn partial_setup(builder: PokemonServiceBuilder) -> PokemonServiceBuilder {
   |                           ^^^^^^^^^^^^^^^^^^^^^ expected at least 6 generic arguments
   |
note: struct defined here, with at least 6 generic parameters: `Op1`, `Op2`, `Op3`, `Op4`, `Op5`, `Op6`

error[E0107]: missing generics for struct `PokemonServiceBuilder`
  --> pokemon-service/src/bin/pokemon-service.rs:28:53
   |
28 | fn partial_setup(builder: PokemonServiceBuilder) -> PokemonServiceBuilder {
   |                                                     ^^^^^^^^^^^^^^^^^^^^^ expected at least 6 generic arguments
   |
note: struct defined here, with at least 6 generic parameters: `Op1`, `Op2`, `Op3`, `Op4`, `Op5`, `Op6`

We could try to nudge the compiler into inferring them:

fn partial_setup(
    builder: PokemonServiceBuilder<_, _, _, _, _, _>,
) -> PokemonServiceBuilder<_, _, _, _, _, _> {
    /* */
}

but that won't fly either:

error[E0121]: the placeholder `_` is not allowed within types on item signatures for return types
  --> pokemon-service/src/bin/pokemon-service.rs:30:28
   |
30 | ) -> PokemonServiceBuilder<_, _, _, _, _, _> {
   |                            ^  ^  ^  ^  ^  ^ not allowed in type signatures
   |                            |  |  |  |  |
   |                            |  |  |  |  not allowed in type signatures
   |                            |  |  |  not allowed in type signatures
   |                            |  |  not allowed in type signatures
   |                            |  not allowed in type signatures
   |                            not allowed in type signatures

We must type it all out:

fn partial_setup<Op1, Op2, Op3, Op4, Op5, Op6>(
    builder: PokemonServiceBuilder<Op1, Op2, Op3, Op4, Op5, Op6>,
) -> PokemonServiceBuilder<Op1, Op2, Op3, Op4, Op5, Op6> {
    builder
}

That compiles, at last. Let's try to register an operation handler now:

fn partial_setup<Op1, Op2, Op3, Op4, Op5, Op6>(
    builder: PokemonServiceBuilder<Op1, Op2, Op3, Op4, Op5, Op6>,
) -> PokemonServiceBuilder<Op1, Op2, Op3, Op4, Op5, Op6> {
    builder.get_server_statistics(get_server_statistics)
}

That looks innocent, but it doesn't fly:

error[E0308]: mismatched types
  --> pokemon-service/src/bin/pokemon-service.rs:31:5
   |
28 | fn partial_setup<Op1, Op2, Op3, Op4, Op5, Op6>(
   |                                 --- this type parameter
29 |     builder: PokemonServiceBuilder<Op1, Op2, Op3, Op4, Op5, Op6>,
30 | ) -> PokemonServiceBuilder<Op1, Op2, Op3, Op4, Op5, Op6> {
   |      --------------------------------------------------- expected `PokemonServiceBuilder<Op1, Op2, Op3, Op4, Op5, Op6>` because of return type
31 |     builder.get_server_statistics(get_server_statistics)
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected type parameter `Op4`, found struct `Operation`
   |
   = note: expected struct `PokemonServiceBuilder<_, _, _, Op4, _, _, _>`
              found struct `PokemonServiceBuilder<_, _, _, Operation<IntoService<GetServerStatistics, fn(GetServerStatisticsInput, Extension<Arc<State>>) -> impl Future<Output = GetServerStatisticsOutput> {get_server_statistics}>>, _, _, _>

By registering a handler we have changed the corresponding OpX generic parameter. Fixing this error requires some non-trivial type gymnastics - I gave up after trying for ~15 minutes.

Cut them down: going from 2N+1 to 2 generic parameters

The previous two examples should have convinced you that the 2N+1 generic parameters on PokemonServiceBuilder harm the ergonomics of our API. Can we get rid of them?

Yes! Let's look at one possible approach:

pub struct PokemonServiceBuilder<Body, Plugin> {
    check_health: Option<Route<Body>>,
    do_nothing: Option<Route<Body>>,
    get_pokemon_species: Option<Route<Body>>,
    get_server_statistics: Option<Route<Body>>,
    capture_pokemon: Option<Route<Body>>,
    get_storage: Option<Route<Body>>,
    plugin: Plugin,
}

We no longer store the raw handlers inside PokemonServiceBuilder. We eagerly upgrade the operation handlers to a Route instance when they are registered with the builder.

impl<Body, Plugin> PokemonServiceBuilder<Body, Plugin> {
    pub fn get_pokemon_species<Handler, Extensions>(mut self, handler: Handler) -> Self
    /* Complex trait bounds */
    {
        let route = Route::new(Operation::from_handler(handler).upgrade(&self.plugin));
        self.get_pokemon_species = Some(route);
        self
    }

    /* other setters and methods */
}

The existing API performs the upgrade when build is called, forcing PokemonServiceBuilder to store the raw handlers and keep two generic parameters around (OpX and ExtsX) for each operation. The proposed API requires plugins to be specified upfront, when creating an instance of the builder. They cannot be modified after a PokemonServiceBuilder instance has been built:

impl PokemonService<()> {
    /// Constructs a builder for [`PokemonService`].
    pub fn builder<Body, Plugin>(plugin: Plugin) -> PokemonServiceBuilder<Body, Plugin> {
        PokemonServiceBuilder {
            check_health: None,
            do_nothing: None,
            get_pokemon_species: None,
            get_server_statistics: None,
            capture_pokemon: None,
            get_storage: None,
            plugin,
        }
    }
}

This constraint guarantees that all operation handlers are upgraded to a Route using the same set of plugins.

Having to specify all plugins upfront is unlikely to have a negative impact on developers currently using smithy-rs. We have seen how cumbersome it is to break the startup logic into different functions using the current service builder API. Developers are most likely specifying all plugins and routes in the same function even if the current API allows them to intersperse route registrations and plugin registrations: they would simply have to re-order their registration statements to adopt the API proposed in this RFC.
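
As a side benefit, every setter now returns the same PokemonServiceBuilder<Body, Plugin> type, so the "obvious" branching from the earlier example compiles as written. A sketch, reusing the handlers and the ColorPlugin from the surrounding examples:

let builder = PokemonService::builder(ColorPlugin::new());

let check_database: bool = /* */;
let builder = if check_database {
    builder.check_health(check_health)
} else {
    builder.check_health(check_health_with_database)
};

// `build` reports any missing handlers at runtime rather than at compile-time.
let app = builder
    .get_pokemon_species(get_pokemon_species)
    /* remaining setters */
    .build();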

Alternatives: allow new plugins to be registered after builder creation

The new design prohibits the following invocation style:

let plugin = ColorPlugin::new();
PokemonService::builder(plugin)
    // [...]
    .get_pokemon_species(get_pokemon_species)
    // Add PrintPlugin
    .print()
    .get_storage(get_storage)
    .build()

We could choose to remove this limitation and allow handlers to be upgraded using a different set of plugins depending on where they were registered. In the snippet above, for example, we would have:

  • get_pokemon_species is upgraded using just the ColorPlugin;
  • get_storage is upgraded using both the ColorPlugin and the PrintPlugin.

There are no technical obstacles preventing us from implementing this API, but I believe it could easily lead to confusion and runtime surprises due to a mismatch between what the developer might expect PrintPlugin to apply to (all handlers) and what it actually applies to (handlers registered after .print()).

We can provide developers with other mechanisms to register plugins for a single operation or a subset of operations without introducing ambiguity. For attaching additional plugins to a single operation, we could introduce a blanket Pluggable implementation for all operations in aws-smithy-http-server:

impl<P, Op, Pl, S, L> Pluggable<Pl> for Operation<S, L> where Pl: Plugin<P, Op, S, L> {
    type Output = Operation<Pl::Service, Pl::Layer>;

    fn apply(self, new_plugin: Pl) -> Self::Output {
        new_plugin.map(self)
    }
}

which would allow developers to invoke op.apply(MyPlugin) or call extensions methods such as op.print() where op is an Operation. For attaching additional plugins to a subgroup of operations, instead, we could introduce nested builders:

let initial_plugins = ColorPlugin;
let mut builder = PokemonService::builder(initial_plugins)
    .get_pokemon_species(get_pokemon_species);
let additional_plugins = PrintPlugin;
// PrintPlugin will be applied to all handlers registered on the scoped builder returned by `scoped`.
let nested_builder = builder.scoped(additional_plugins)
    .get_storage(get_storage)
    .capture_pokemon(capture_pokemon)
    // Register all the routes on the scoped builder with the parent builder.
    // API names are definitely provisional and bikesheddable.
    .attach(builder);
let app = builder.build();

Both proposals are outside the scope of this RFC, but they are shown here for illustrative purposes.

Alternatives: lazy and eager-on-demand type erasure

A lot of our issues stem from type mismatch errors: we are encoding the type of our handlers into the overall type of the service builder and, as a consequence, we end up modifying that type every time we set a handler or modify its state. Type erasure is a common approach for mitigating these issues - reduce those generic parameters to a common type to avoid the mismatch errors. This whole RFC can be seen as a type erasure proposal - done eagerly, as soon as the handler is registered, using Option<Route<B>> as our "common type" after erasure.

We could try to strike a different balance - i.e. avoid performing type erasure eagerly, but allow developers to erase types on demand. Based on my analysis, this could happen in two ways:

  1. We cast handlers into a Box<dyn Upgradable<Protocol, Operation, Exts, Body, Plugin>> to which we can later apply plugins (lazy type erasure);
  2. We upgrade registered handlers to Route<B> and apply plugins in the process (eager type erasure on-demand).

Both approaches come with implementation challenges of their own; let's set those aside for the time being and focus on what the ergonomics would look like, assuming we can actually perform type erasure. In practice, we are going to assume that:

  • In approach 1), we can call .boxed() on a registered operation and get a Box<dyn Upgradable> back;
  • In approach 2), we can call .erase() on the entire service builder and convert all registered operations to Route<B> while keeping the MissingOperation entries as they are. After erase has been called, you can no longer register plugins (or, alternatively, the plugins you register will only apply to handlers registered afterwards).

We are going to explore both approaches under the assumption that we want to preserve compile-time verification for missing handlers. If we are willing to abandon compile-time verification, we get better ergonomics since all OpX and ExtsX generic parameters can be erased (i.e. we no longer need to worry about MissingOperation).

On Box<dyn Upgradable<Protocol, Operation, Exts, Body, Plugin>>

This is the current definition of the Upgradable trait:

/// Provides an interface to convert a representation of an operation to a HTTP [`Service`](tower::Service) with
/// canonical associated types.
pub trait Upgradable<Protocol, Operation, Exts, Body, Plugin> {
    type Service: Service<http::Request<Body>, Response = http::Response<BoxBody>>;

    /// Performs an upgrade from a representation of an operation to a HTTP [`Service`](tower::Service).
    fn upgrade(self, plugin: &Plugin) -> Self::Service;
}

In order to perform type erasure, we need to determine:

  • what type parameters we are going to pass as generic arguments to Upgradable;
  • what type we are going to use for the associated type Service.

We have:

  • there is a single known protocol for a service, therefore we can set Protocol to its concrete type (e.g. AwsRestJson1);
  • each handler refers to a different operation, therefore we cannot erase the Operation and the Exts parameters;
  • both Body and Plugin appear as generic parameters on the service builder itself, therefore we can set them to the same type;
  • we can use Route<B> to normalize the Service associated type.

The above leaves us with two unconstrained type parameters, Operation and Exts, for each operation. Those unconstrained type parameters leak into the type signature of the service builder itself. We therefore find ourselves having, again, 2N+2 type parameters.

Branching

Going back to the branching example:

let check_database: bool = /* */;
let builder = if check_database {
    builder.check_health(check_health)
} else {
    builder.check_health(check_health_with_database)
};
let app = builder.build();

In approach 1), we could leverage the .boxed() method to convert the actual OpX type into a Box<dyn Upgradable>, thus ensuring that both branches return the same type:

let check_database: bool = /* */;
let builder = if check_database {
    builder.check_health_operation(Operation::from_handler(check_health).boxed())
} else {
    builder.check_health_operation(Operation::from_handler(check_health_with_database).boxed())
};
let app = builder.build();

The same cannot be done when conditionally registering a route, because on the else branch we cannot convert MissingOperation into a Box<dyn Upgradable>: MissingOperation doesn't implement Upgradable, which is the pillar on which our entire compile-time safety story is built.

// This won't compile!
let builder = if check_database {
    builder.check_health_operation(Operation::from_handler(check_health).boxed())
} else {
    builder
};

In approach 2), we can erase the whole builder in both branches when they both register a route:

let check_database: bool = /* */;
let boxed_builder = if check_database {
    builder.check_health(check_health).erase()
} else {
    builder.check_health(check_health_with_database).erase()
};
let app = boxed_builder.build();

but, like in approach 1), we will still get a type mismatch error if one of the two branches leaves the route unset.

Refactoring into smaller functions

Developers would still have to spell out all generic parameters when writing a function that takes in a builder as a parameter:

fn partial_setup<Op1, Op2, Op3, Op4, Op5, Op6, Body, Plugin>(
    builder: PokemonServiceBuilder<Op1, Op2, Op3, Op4, Op5, Op6, Body, Plugin>,
) -> PokemonServiceBuilder<Op1, Op2, Op3, Op4, Op5, Op6, Body, Plugin> {
    builder
}

Writing the signature after having modified the builder becomes easier though. In approach 1), they can explicitly change the touched operation parameters to the boxed variant:

fn partial_setup<Op1, Op2, Op3, Op4, Op5, Op6, Exts4, Body, Plugin>(
    builder: PokemonServiceBuilder<Op1, Op2, Op3, Op4, Op5, Op6, Body, Plugin>,
) -> PokemonServiceBuilder<
    Op1, Op2, Op3,
    Box<dyn Upgradable<AwsRestJson1, GetServerStatistics, Exts4, Body, Plugin>>,
    Op5, Op6, Body, Plugin,
> {
    builder.get_server_statistics(get_server_statistics)
}

It becomes trickier in approach 2), since to retain compile-time safety on the builder we expect erase to map MissingOperation into MissingOperation. Therefore, we can't write something like this:

fn partial_setup<Body, Op1, Op2, Op3, Op4, Op5, Op6>(
    builder: PokemonServiceBuilder<Op1, Op2, Op3, Op4, Op5, Op6>,
) -> PokemonServiceBuilder<Route<Body>, Route<Body>, Route<Body>, Route<Body>, Route<Body>, Route<Body>> {
    builder.get_server_statistics(get_server_statistics).erase()
}

The compiler would reject it since it can't guarantee that all other operations can be erased to a Route<B>. This is likely to require something along the lines of:

fn partial_setup<Body, Op1, Op2, Op3, Op4, Op5, Op6>(
    builder: PokemonServiceBuilder<Op1, Op2, Op3, Op4, Op5, Op6>,
) -> PokemonServiceBuilder<<Op1 as TypeErase>::Erased, <Op2 as TypeErase>::Erased, <Op3 as TypeErase>::Erased, <Op4 as TypeErase>::Erased, <Op5 as TypeErase>::Erased, <Op6 as TypeErase>::Erased>
where
    // Omitting a bunch of likely needed additional generic parameters and bounds here
    Op1: TypeErase,
    Op2: TypeErase,
    Op3: TypeErase,
    Op4: TypeErase,
    Op5: TypeErase,
    Op6: TypeErase,
{
    builder.get_server_statistics(get_server_statistics).erase()
}

Summary

Both approaches force us to have a number of generic parameters that scales linearly with the number of operations on the service, affecting the ergonomics of the resulting API in both the branching and the refactoring scenarios. We believe that the ergonomics advantages of the proposal advanced by this RFC outweigh the limitation of having to specify your plugins upfront, when creating the builder instance.

Builder extensions: what now?

The Pluggable trait was an interesting development out of RFC 20: it allows you to attach methods to a service builder using an extension trait.

/// An extension to service builders to add the `print()` function.
pub trait PrintExt: aws_smithy_http_server::plugin::Pluggable<PrintPlugin> {
    /// Causes all operations to print the operation name when called.
    ///
    /// This works by applying the [`PrintPlugin`].
    fn print(self) -> Self::Output
        where
            Self: Sized,
    {
        self.apply(PrintPlugin)
    }
}

This pattern needs to be revisited if we want to move forward with this RFC, since new plugins cannot be registered after the builder has been instantiated. My recommendation would be to implement Pluggable for PluginStack, providing the same pattern ahead of the creation of the builder:

// Currently you'd have to go for `PluginStack::new(IdentityPlugin, IdentityPlugin)`,
// but that can be smoothed out even if this RFC isn't approved.
let plugin_stack = PluginStack::default()
    // Use the extension method
    .print();
let app = PokemonService::builder(plugin_stack)
    .get_pokemon_species(get_pokemon_species)
    .get_storage(get_storage)
    .get_server_statistics(get_server_statistics)
    .capture_pokemon(capture_pokemon)
    .do_nothing(do_nothing)
    .build()?;

Playing around with the design

The API proposed in this RFC has been manually implemented for the Pokemon service. You can find the code here.

Changes checklist

1. The impact of a runtime error on developer productivity can be further minimised by encouraging adoption of integration testing; this can be achieved, among other options, by authoring guides that highlight its benefits and provide implementation guidance.

RFC: RequestID in business logic handlers

Status: Implemented

Applies to: server

For a summarized list of proposed changes, see the Changes Checklist section.

Terminology

  • RequestID: a service-wide unique identifier for a request
  • UUID: a universally unique identifier

RequestID is an element that uniquely identifies a client request. RequestID is used by services to map all logs, events and specific data to a single operation. This RFC discusses whether and how smithy-rs can make that value available to customers.

Services use a RequestID to collect logs related to the same request and trace its flow through the various operations, to help clients debug requests by sharing this value, and, in some cases, to perform their business logic. A RequestID is unique across a service, at least within a certain timeframe.

For the purposes above, this value must be set by the service.

Having the client send the value brings the following challenges:

  • The client could repeatedly send the same RequestID
  • The client could send no RequestID
  • The client could send a malformed or malicious RequestID (like in 1 and 2).

To minimise the attack surface and provide a uniform experience to customers, servers should generate the value. However, services should be free to read the ID sent by clients in HTTP headers: it is common for services to read the request ID a client sends, record it and send it back upon success. A client may want to send the same value to multiple services. Services should still decide to have their own unique request ID per actual call.

RequestIDs are not to be used by multiple services, but only within a single service.

The user experience if this RFC is implemented

The proposal is to implement a RequestId type and make it available to middleware and business logic handlers, through FromParts and as a Service. To aid customers already relying on clients' request IDs, there will be two types: ClientRequestId and ServerRequestId.

  1. Implementing FromParts for Extension<RequestId> gives customers the ability to write their handlers:
pub async fn handler(
    input: input::Input,
    request_id: Extension<ServerRequestId>,
) -> ...
pub async fn handler(
    input: input::Input,
    request_id: Extension<ClientRequestId>,
) -> ...

ServerRequestId and ClientRequestId will be injected into the extensions by a layer. This layer can also be used to open a span that will log the request ID: subsequent logs will be in the scope of that span.

  2. ServerRequestId format:

Common formats for RequestIDs are:

  • UUID: a random string, represented in hex, of 128 bits from IETF RFC 4122: 7c038a43-e499-4162-8e70-2d4d38595930
  • The hash of a sequence such as date+thread+server: 734678902ea938783a7200d7b2c0b487
  • A verbose description: current_ms+hostname+increasing_id

For privacy reasons, any format that provides service details should be avoided. A random string is preferred. The proposed format is to use UUID, version 4.
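For illustration, a minimal sketch of what ServerRequestId could look like under this proposal, assuming the uuid crate is used to generate version 4 UUIDs (the RFC does not mandate a specific crate):

use uuid::Uuid;

/// Sketch only: a server-generated request ID backed by a version 4 UUID.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ServerRequestId(Uuid);

impl ServerRequestId {
    /// Generates a new, random request ID.
    pub fn new() -> Self {
        Self(Uuid::new_v4())
    }
}

impl std::fmt::Display for ServerRequestId {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "{}", self.0)
    }
}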

A Service that inserts a RequestId in the extensions will be implemented as follows:

impl<R, S> Service<http::Request<R>> for ServerRequestIdProvider<S>
where
    S: Service<http::Request<R>>,
{
    type Response = S::Response;
    type Error = S::Error;
    type Future = S::Future;

    fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
        self.inner.poll_ready(cx)
    }

    fn call(&mut self, mut req: http::Request<R>) -> Self::Future {
        req.extensions_mut().insert(ServerRequestId::new());
        self.inner.call(req)
    }
}

For client request IDs, the process will be, in order:

  • If a header is found matching one of the possible ones, use it
  • Otherwise, None

Option is used to distinguish whether a client had provided an ID or not.

impl<R, S> Service<http::Request<R>> for ClientRequestIdProvider<S>
where
    S: Service<http::Request<R>>,
{
    type Response = S::Response;
    type Error = S::Error;
    type Future = S::Future;

    fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
        self.inner.poll_ready(cx)
    }

    fn call(&mut self, mut req: http::Request<R>) -> Self::Future {
        for possible_header in &self.possible_headers {
            if let Some(id) = req.headers().get(possible_header) {
                // Clone the header value so the borrow of `req` ends before its extensions are mutated
                let id = ClientRequestId::new(id.clone());
                req.extensions_mut().insert(Some(id));
                return self.inner.call(req);
            }
        }
        req.extensions_mut().insert(None::<ClientRequestId>);
        self.inner.call(req)
    }
}

The string representation of a generated ID will be valid for this regex:

  • For ServerRequestId: /^[A-Za-z0-9_-]{0,48}$/
  • For ClientRequestId: see the spec

Although the generated ID is opaque, this gives customers a guarantee as to what they can expect, even if the server ID is ever updated to a different format.

Changes checklist

  • Implement ServerRequestId: a new() function that generates a UUID, with Display, Debug and ToStr implementations
  • Implement ClientRequestId: new() that wraps a string (the header value) and the header in which the value could be found, with Display, Debug and ToStr implementations
  • Implement FromParts for Extension<ServerRequestId>
  • Implement FromParts for Extension<ClientRequestId>

Changes since the RFC has been approved

This RFC has been changed to only implement ServerRequestId.

RFC: Constraint traits

Status: Implemented.

See the description of the PR that laid the foundation for the implementation of constraint traits for a complete reference. See the Better Constraint Violations RFC too for subsequent improvements to this design.

See the uber tracking issue for pending work.

Constraint traits are used to constrain the values that can be provided for a shape.

For example, given the following Smithy model,

@range(min: 18)
integer Age

the integer Age must take values greater than or equal to 18.

Constraint traits are most useful when enforced as part of input model validation to a service. When a server receives a request whose contents deserialize to input data that violates the modeled constraints, the operation execution's preconditions are not met, and as such rejecting the request without executing the operation is expected behavior.

Constraint traits can also be applied to operation output member shapes, but the expectation is that service implementations not fail to render a response when an output value does not meet the specified constraints. From awslabs/smithy#1039:

This might seem counterintuitive, but our philosophy is that a change in server-side state should not be hidden from the caller unless absolutely necessary. Refusing to service an invalid request should always prevent server-side state changes, but refusing to send a response will not, as there's generally no reasonable route for a server implementation to unwind state changes due to a response serialization failure.

In general, clients should not enforce constraint traits in generated code. Clients must also never enforce constraint traits when sending requests. This is because:

  • addition and removal of constraint traits are backwards-compatible from a client's perspective (although this is not documented anywhere in the Smithy specification),
  • the client may have been generated with an older version of the model; and
  • the most recent model version might have lifted some constraints.

On the other hand, server SDKs constitute the source of truth for the service's behavior, so they interpret the model in all its strictness.

The Smithy spec defines 8 constraint traits: enum, idRef, length, pattern, private, range, required, and uniqueItems.

The idRef and private traits are enforced at SDK generation time by the awslabs/smithy libraries and bear no relation to generated Rust code.

The only constraint trait whose enforcement is (and should be) generated by smithy-rs clients is the enum trait, which is rendered as a Rust enum.

The required trait is already and only enforced by smithy-rs servers since #1148.

That leaves 4 traits: length, pattern, range, and uniqueItems.

Implementation

This section addresses how to implement and enforce the length, pattern, range, and uniqueItems traits. We will use the length trait applied to a string shape as a running example. The implementation of this trait mostly carries over to the other three.

Example implementation for the length trait

Consider the following Smithy model:

@length(min: 1, max: 69)
string NiceString

The central idea to the implementation of constraint traits is: parse, don't validate. Instead of code-generating a Rust String to represent NiceString values and perform the validation at request deserialization, we can leverage Rust's type system to guarantee domain invariants. We can generate a wrapper tuple struct that parses the string's value and is "tight" in the set of values it can accept:

pub struct NiceString(String);

impl TryFrom<String> for NiceString {
    type Error = nice_string::ConstraintViolation;

    fn try_from(value: String) -> Result<Self, Self::Error> {
        let num_code_points = value.chars().count();
        if 1 <= num_code_points && num_code_points <= 69 {
            Ok(Self(value))
        } else {
            Err(nice_string::ConstraintViolation::Length(num_code_points))
        }
    }
}

(Note that we're using the linear time check chars().count() instead of len() on the input value, since the Smithy specification says the length trait counts the number of Unicode code points when applied to string shapes.)
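As a standalone illustration of the difference (not generated code):

assert_eq!("é".len(), 2);           // `len()` counts UTF-8 bytes
assert_eq!("é".chars().count(), 1); // `chars().count()` counts Unicode code points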

The goal is to enforce, at the type-system level, that these constrained structs always hold valid data. It should be impossible for the service implementer, without resorting to unsafe Rust, to construct a NiceString that violates the model. The actual check is performed in the implementation of TryFrom<InnerType> for the generated struct, which makes it convenient to use the ? operator for error propagation. Each constrained struct will have a related std::error::Error enum type to signal the first parsing failure, with one enum variant per applied constraint trait:

pub mod nice_string {
    pub enum ConstraintViolation {
        /// Validation error holding the number of Unicode code points found, when a value between `1` and
        /// `69` (inclusive) was expected.
        Length(usize),
    }

    impl std::error::Error for ConstraintViolation {}
}

std::error::Error requires Display and Debug. We will #[derive(Debug)], unless the shape also has the sensitive trait, in which case we will just print the name of the struct:

impl std::fmt::Debug for ConstraintViolation {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        let mut formatter = f.debug_struct("ConstraintViolation");
        formatter.finish()
    }
}

Display is used to produce human-friendlier representations. Its implementation might be called when formatting a 400 HTTP response message in certain protocols, for example.
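For the running example, the generated Display implementation could look like the following sketch (the exact message wording is illustrative; the impl would live alongside the enum in the nice_string module):

impl std::fmt::Display for ConstraintViolation {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            Self::Length(length) => write!(
                f,
                "expected a string with length between 1 and 69 Unicode code points (inclusive), but found one of length {length}"
            ),
        }
    }
}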

Request deserialization

We will continue to deserialize the different parts of the HTTP message into the regular Rust standard library types. However, just before the deserialization function returns, we will convert the type into the wrapper tuple struct that will eventually be handed over to the operation handler. This is what we're already doing when deserializing strings into enums. For example, given the Smithy model:

@enum([
    { name: "Spanish", value: "es" },
    { name: "English", value: "en" },
    { name: "Japanese", value: "jp" },
])
string Language

the code the client generates when deserializing a string from a JSON document into the Language enum is (excerpt):

...
match key.to_unescaped()?.as_ref() {
    "language" => {
        builder = builder.set_language(
            aws_smithy_json::deserialize::token::expect_string_or_null(
                tokens.next(),
            )?
            .map(|s| {
                s.to_unescaped()
                    .map(|u| crate::model::Language::from(u.as_ref()))
            })
            .transpose()?,
        );
    }
    _ => aws_smithy_json::deserialize::token::skip_value(tokens)?,
}
...

Note how the String gets converted to the enum via Language::from().

impl std::convert::From<&str> for Language {
    fn from(s: &str) -> Self {
        match s {
            "es" => Language::Spanish,
            "en" => Language::English,
            "jp" => Language::Japanese,
            other => Language::Unknown(other.to_owned()),
        }
    }
}

For constrained shapes we would do the same to parse the inner deserialized value into the wrapper tuple struct, except for these differences:

  1. For enums, the client generates an Unknown variant that "contains new variants that have been added since this code was generated". The server does not need such a variant (#1187).
  2. Conversions into the tuple struct are fallible (try_from() instead of from()). These errors will result in a my_struct::ConstraintViolation.
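A minimal sketch of what that fallible conversion could look like for an optional, constrained member, reusing the NiceString type from the running example (parse_nice_string is a hypothetical helper, not generated code):

fn parse_nice_string(
    raw: Option<String>,
) -> Result<Option<NiceString>, nice_string::ConstraintViolation> {
    // `transpose()` turns `Option<Result<_, _>>` into `Result<Option<_>, _>`,
    // letting `?` propagate the first constraint violation out of the deserializer.
    raw.map(NiceString::try_from).transpose()
}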

length trait

We will enforce the length constraint by calling len() on Rust's Vec (list and set shapes), HashMap (map shapes) and our aws_smithy_types::Blob (blob shapes).

We will enforce the length constraint trait on String (string shapes) by calling .chars().count().

pattern trait

The pattern trait

restricts string shape values to a specified regular expression.

We will implement this using the regex crate's is_match. We will use once_cell to compile the regex only the first time it is required.
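A minimal sketch of that check, assuming an illustrative @pattern value of ^[a-z]+$ (the real regex comes from the model):

use once_cell::sync::Lazy;
use regex::Regex;

// Compiled once, on first use, then reused for every request.
static PATTERN: Lazy<Regex> = Lazy::new(|| Regex::new("^[a-z]+$").unwrap());

fn matches_pattern(value: &str) -> bool {
    PATTERN.is_match(value)
}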

uniqueItems trait

The uniqueItems trait

indicates that the items in a List MUST be unique.

If the list shape is sparse, more than one null value violates this constraint.

We will enforce this by copying references to the Vec's elements into a HashSet and checking that the sizes of both containers coincide.
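A minimal sketch of that check (not the generated code; it assumes the element type implements Eq and Hash):

use std::collections::HashSet;
use std::hash::Hash;

fn items_are_unique<T: Eq + Hash>(items: &[T]) -> bool {
    // Duplicates collapse in the set, making it smaller than the original Vec.
    let distinct: HashSet<&T> = items.iter().collect();
    distinct.len() == items.len()
}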

Trait precedence and naming of the tuple struct

From the spec:

Some constraints can be applied to shapes as well as structure members. If a constraint of the same type is applied to a structure member and the shape that the member targets, the trait applied to the member takes precedence.

structure ShoppingCart {
    @range(min: 7, max:12)
    numberOfItems: PositiveInteger
}

@range(min: 1)
integer PositiveInteger

In the above example,

the range trait applied to numberOfItems takes precedence over the one applied to PositiveInteger. The resolved minimum will be 7, and the maximum 12.

When the constraint trait is applied to a member shape, the tuple struct's name will be the PascalCased name of the member shape, NumberOfItems.
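A sketch of that member-level constrained type, mirroring the NiceString implementation above with the resolved range 7 to 12 (the Range variant name is illustrative):

pub mod number_of_items {
    #[derive(Debug)]
    pub enum ConstraintViolation {
        /// The value fell outside the resolved range `7..=12`.
        Range(i32),
    }
}

pub struct NumberOfItems(i32);

impl TryFrom<i32> for NumberOfItems {
    type Error = number_of_items::ConstraintViolation;

    fn try_from(value: i32) -> Result<Self, Self::Error> {
        if (7..=12).contains(&value) {
            Ok(Self(value))
        } else {
            Err(number_of_items::ConstraintViolation::Range(value))
        }
    }
}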

Unresolved questions

  1. Should we code-generate unsigned integer types (u16, u32, u64) when the range trait is applied with min set to a value greater than or equal to 0?
    • A user has even suggested to use the std::num::NonZeroUX types (e.g. NonZeroU64) when range is applied with min set to a value greater than 0.
    • UPDATE: This requires further design work. There are interoperability concerns: for example, the positive range of a u32 is strictly greater than that of an i32, so clients wouldn't be able to receive values within the non-overlapping range.
  2. In request deserialization, should we fail with the first violation and immediately render a response, or attempt to parse the entire request and provide a complete and structured report?
    • UPDATE: We will provide a response containing all violations. See the "Collecting Constraint Violations" section in the Better Constraint Violations RFC.
  3. Should we provide a mechanism for the service implementer to construct a Rust type violating the modeled constraints in their business logic e.g. a T::new_unchecked() constructor? This could be useful (1) when the user knows the provided inner value does not violate the constraints and doesn't want to incur the performance penalty of the check; (2) when the struct is in a transient invalid state. However:
    • (2) is arguably a modelling mistake and a separate struct to represent the transient state would be a better approach,
    • the user could use unsafe Rust to bypass the validation; and
    • adding this constructor is a backwards-compatible change, so it can always be added later if this feature is requested.
    • UPDATE: We decided to punt on this until users express interest.

Alternative design

An alternative design with less public API surface would be to perform constraint validation at request deserialization, but hand over a regular "loose" type (e.g. String instead of NiceString) that allows for values violating the constraints. If we were to take this approach, we could implement it by wrapping the incoming value in the aforementioned tuple struct to perform the validation, and then immediately unwrapping it.
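A minimal sketch of that approach, reusing the NiceString example (into_inner is a hypothetical accessor on the wrapper; the generated code could equally destructure the tuple struct):

fn validate_then_unwrap(raw: String) -> Result<String, nice_string::ConstraintViolation> {
    // Run the same checks, then hand the plain `String` to the handler.
    NiceString::try_from(raw).map(|validated| validated.into_inner())
}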

Comparative advantages:

  • Validation remains an internal detail of the framework. If the semantics of a constraint trait change, the service's behavior is still affected in a backwards-incompatible way, but user code does not break.
  • Less "invasive". Baking validation in the generated type might be deemed as the service framework overreaching responsibilities.

Comparative disadvantages:

  • It becomes possible to send responses with invalid operation outputs. All the service framework could do is log the validation errors.
  • Baking validation at the type-system level gets rid of an entire class of logic errors.
  • Less idiomatic (this is subjective). The pattern of wrapping a more primitive type to guarantee domain invariants is widespread in the Rust ecosystem. The standard library makes use of it extensively.

Note that both designs are backwards incompatible in the sense that you can't migrate from one to the other without breaking user code.

UPDATE: We ended up implementing both designs, adding a flag to opt into the alternative design. Refer to the mentions of the publicConstrainedTypes flag in the description of the Builders of builders PR.

RFC: Client Crate Organization

Status: Implemented

Applies to: clients (and may impact servers due to shared codegen)

This RFC proposes changing the organization structure of the generated client crates to:

  1. Make discovery in the crate documentation easier.
  2. Facilitate re-exporting types from runtime crates in related modules without name collisions.
  3. Facilitate feature gating operations for faster compile times in the future.

Previous Organization

Previously, crates were organized as such:

.
├── client
|   ├── fluent_builders
|   |   └── <One fluent builder per operation>
|   ├── Builder (*)
|   └── Client
├── config
|   ├── retry
|   |   ├── RetryConfig (*)
|   |   ├── RetryConfigBuilder (*)
|   |   └── RetryMode (*)
|   ├── timeout
|   |   ├── TimeoutConfig (*)
|   |   └── TimeoutConfigBuilder (*)
|   ├── AsyncSleep (*)
|   ├── Builder
|   ├── Config
|   └── Sleep (*)
├── error
|   ├── <One module per error to contain a single struct named `Builder`>
|   ├── <One struct per error named `${error}`>
|   ├── <One struct per operation named `${operation}Error`>
|   └── <One enum per operation named `${operation}ErrorKind`>
├── http_body_checksum (empty)
├── input
|   ├── <One module per input to contain a single struct named `Builder`>
|   └── <One struct per input named `${operation}Input`>
├── lens (empty)
├── middleware
|   └── DefaultMiddleware
├── model
|   ├── <One module per shape to contain a single struct named `Builder`>
|   └── <One struct per shape>
├── operation
|   ├── customize
|   |   ├── ClassifyRetry (*)
|   |   ├── CustomizableOperation
|   |   ├── Operation (*)
|   |   ├── RetryKind (*)
|   └── <One struct per operation>
├── output
|   ├── <One module per output to contain a single struct named `Builder`>
|   └── <One struct per output named `${operation}Output`>
├── paginator
|   ├── <One struct per paginated operation named `${operation}Paginator`>
|   └── <Zero to one struct(s) per paginated operation named `${operation}PaginatorItems`>
├── presigning
|   ├── config
|   |   ├── Builder
|   |   ├── Error
|   |   └── PresigningConfig
|   └── request
|       └── PresignedRequest
├── types
|   ├── AggregatedBytes (*)
|   ├── Blob (*)
|   ├── ByteStream (*)
|   ├── DateTime (*)
|   └── SdkError (*)
├── AppName (*)
├── Client
├── Config
├── Credentials (*)
├── Endpoint (*)
├── Error
├── ErrorExt (for some services)
├── PKG_VERSION
└── Region (*)

(*) - signifies that a type is re-exported from one of the runtime crates

Proposed Changes

This RFC proposes reorganizing types by operation first and foremost, and then rearranging other pieces to reduce codegen collision risk.

Establish a pattern for builder organization

Builders (distinct from fluent builders) are generated alongside all inputs, outputs, models, and errors. They all follow the same overall pattern (where shapeType is Input, Output, or empty for models/errors):

.
└── module
    ├── <One module per shape to contain a single struct named `Builder`>
    └── <One struct per shape named `${prefix}${shapeType}`>

This results in large lists of modules that all have exactly one item in them, which makes browsing the documentation difficult, and introduces the possibility of name collisions when re-exporting modules from the runtime crates.

Builders should adopt a prefix and go into a single builders module, similar to how the fluent builders currently work:

.
└── module
    ├── builders
    |   └── <One struct per shape named `${prefix}${shapeType}Builder`>
    └── <One struct per shape named `${prefix}${shapeType}`>

Organize code generated types by operation

All code generated for an operation that isn't shared between operations will go into operation-specific modules. This includes inputs, outputs, errors, parsers, and paginators. Types shared across operations will remain in another module (discussed below), and serialization/deserialization logic for those common types will also reside in that common location for now. If operation feature gating occurs in the future, further optimization can be done to track which of these are used by feature, or they can be reorganized (this would be discussed in a future RFC and is out of scope here).

With code generated operations living in crate::operation, there is a high chance of name collision with the customize module. To resolve this, customize will be moved into crate::client.

The new crate::operation module will look as follows:

.
└── operation
    └── <One module per operation named after the operation in lower_snake_case>
        ├── paginator
        |   ├── `${operation}Paginator`
        |   └── `${operation}PaginatorItems`
        ├── builders
        |   ├── `${operation}FluentBuilder`
        |   ├── `${operation}InputBuilder`
        |   └── `${operation}OutputBuilder`
        ├── `${operation}Error`
        ├── `${operation}Input`
        ├── `${operation}Output`
        └── `${operation}Parser` (private/doc hidden)

Reorganize the crate root

The crate root should only host the most frequently used types, or phrased differently, the types that are critical to making a service call with default configuration, or that are required for the most frequent config changes (such as setting credentials, or changing the region/endpoint).

Previously, the following were exported in root:

.
├── AppName
├── Client
├── Config
├── Credentials
├── Endpoint
├── Error
├── ErrorExt (for some services)
├── PKG_VERSION
└── Region

The AppName is infrequently set, and will be moved into crate::config. Customers are encouraged to use the aws-config crate to resolve credentials, region, and endpoint. Thus, these types no longer need to be at the top level, and will be moved into crate::config. ErrorExt will be moved into crate::error, but Error will stay in the crate root so that customers that alias the SDK crate can easily reference it in their Results:

use aws_sdk_s3 as s3;

fn some_function(/* ... */) -> Result<(), s3::Error> {
    /* ... */
}

The PKG_VERSION should move into a new meta module, which can also include other values in the future such as the SHA-256 hash of the model used to produce the crate, or the version of smithy-rs that generated it.

Conditionally remove Builder from crate::client

Previously, the Smithy Client builder was re-exported alongside the SDK fluent Client so that non-SDK clients could easily customize the underlying Smithy client by using the fluent client's Client::with_config function or From<aws_smithy_client::client::Client<C, M, R>> trait implementation.

This makes sense for non-SDK clients where customization of the connector and middleware types is supported generically, but less sense for SDKs since the SDK clients are hardcoded to use DynConnector and DynMiddleware.

Thus, the Smithy client Builder should not be re-exported for SDKs.

Create a primitives module

Previously, crate::types held re-exported types from aws-smithy-types that are used by code generated structs/enums.

This module will be renamed to crate::primitives so that the name types can be repurposed in the next section.

Repurpose the types module

The name model is meaningless outside the context of code generation (although there is precedent since both the Java V2 and Kotlin SDKs use the term). Previously, this module held all the generated structs/enums that are referenced by inputs, outputs, and errors.

This RFC proposes that this module be renamed to types, and that all code generated types for shapes that are reused between operations (basically anything that is not an input, output, or error) be moved here. This would look as follows:

.
└── types
    ├── error
    |   ├── builders
    |   |   └── <One struct per error named `${error}Builder`>
    |   └── <One struct per error named `${error}`>
    ├── builders
    |   └── <One struct per shape named `${shape}Builder`>
    └── <One struct per shape>

Customers using the fluent builder should be able to just use ${crate}::types::*; to immediately get access to all the shared types needed by the operations they are calling.

Additionally, moving the top-level code generated error types into crate::types will eliminate a name collision issue in the crate::error module.

Repurpose the original crate::error module

The error module is significantly smaller after all the code generated error types are moved out of it. This top-level module is now available for re-exports and utilities.

The following will be re-exported in crate::error:

  • aws_smithy_http::result::SdkError
  • aws_smithy_types::error::display::DisplayErrorContext

For crates that have an ErrorExt, it will also be moved into crate::error.

Flatten the presigning module

The crate::presigning module only has four members, so it should be flattened from:

.
└── presigning
    ├── config
    |   ├── Builder
    |   ├── Error
    |   └── PresigningConfig
    └── request
        └── PresignedRequest

to:

.
└── presigning
    ├── PresigningConfigBuilder
    ├── PresigningConfigError
    ├── PresigningConfig
    └── PresignedRequest

At the same time, Builder and Error will be renamed to PresigningConfigBuilder and PresigningConfigError respectively since these will rarely be referred to directly (preferring PresigningConfig::builder() instead; the error will almost always be unwrapped).

Remove the empty modules

The lens and http_body_checksum modules have nothing inside them, and their documentation descriptions are not useful to customers:

lens: Generated accessors for nested fields

http_body_checksum: Functions for modifying requests and responses for the purposes of checksum validation

These modules hold private functions that are used by other generated code, and should just be made private or #[doc(hidden)] if necessary.

New Organization

All combined, the following is the new publicly visible organization:

.
├── client
|   ├── customize
|   |   ├── ClassifyRetry (*)
|   |   ├── CustomizableOperation
|   |   ├── Operation (*)
|   |   └── RetryKind (*)
|   ├── Builder (only in non-SDK crates) (*)
|   └── Client
├── config
|   ├── retry
|   |   ├── RetryConfig (*)
|   |   ├── RetryConfigBuilder (*)
|   |   └── RetryMode (*)
|   ├── timeout
|   |   ├── TimeoutConfig (*)
|   |   └── TimeoutConfigBuilder (*)
|   ├── AppName (*)
|   ├── AsyncSleep (*)
|   ├── Builder
|   ├── Config
|   ├── Credentials (*)
|   ├── Endpoint (*)
|   ├── Region (*)
|   └── Sleep (*)
├── error
|   ├── DisplayErrorContext (*)
|   ├── ErrorExt (for some services)
|   └── SdkError (*)
├── meta
|   └── PKG_VERSION
├── middleware
|   └── DefaultMiddleware
├── operation
|   └── <One module per operation named after the operation in lower_snake_case>
|       ├── paginator
|       |   ├── `${operation}Paginator`
|       |   └── `${operation}PaginatorItems`
|       ├── builders
|       |   ├── `${operation}FluentBuilder`
|       |   ├── `${operation}InputBuilder`
|       |   └── `${operation}OutputBuilder`
|       ├── `${operation}Error`
|       ├── `${operation}Input`
|       ├── `${operation}Output`
|       └── `${operation}Parser` (private/doc hidden)
├── presigning
|   ├── PresigningConfigBuilder
|   ├── PresigningConfigError
|   ├── PresigningConfig
|   └── PresignedRequest
├── primitives
|   ├── AggregatedBytes (*)
|   ├── Blob (*)
|   ├── ByteStream (*)
|   └── DateTime (*)
├── types
|   ├── error
|   |   ├── builders
|   |   |   └── <One struct per error named `${error}Builder`>
|   |   └── <One struct per error named `${error}`>
|   ├── builders
|   |   └── <One struct per shape named `${shape}Builder`>
|   └── <One struct per shape>
├── Client
├── Config
└── Error

(*) - signifies that a type is re-exported from one of the runtime crates

Changes Checklist

  • Move crate::AppName into crate::config
  • Move crate::PKG_VERSION into a new crate::meta module
  • Move crate::Endpoint into crate::config
  • Move crate::Credentials into crate::config
  • Move crate::Region into crate::config
  • Move crate::operation::customize into crate::client
  • Finish refactor to decouple client/server modules
  • Organize code generated types by operation
  • Reorganize builders
  • Rename crate::types to crate::primitives
  • Rename crate::model to crate::types
  • Move crate::error into crate::types
  • Only re-export aws_smithy_client::client::Builder for non-SDK clients (remove from SDK clients)
  • Move crate::ErrorExt into crate::error
  • Re-export aws_smithy_types::error::display::DisplayErrorContext and aws_smithy_http::result::SdkError in crate::error
  • Move crate::paginator into crate::operation
  • Flatten crate::presigning
  • Hide or remove crate::lens and crate::http_body_checksum
  • Move fluent builders into crate::operation::x::builders
  • Remove/hide operation ParseResponse implementations in crate::operation
  • Update "Crate Organization" top-level section in generated crate docs
  • Update all module docs
  • Break up modules/files so that they're not 30k lines of code
    • models/types; each struct/enum should probably get its own file with pub-use
    • models/types::builders: now this needs to get split up
    • client.rs
  • Fix examples
  • Write changelog

RFC: Endpoints 2.0

Status: RFC

For a summarized list of proposed changes, see the Changes Checklist section.

This RFC defines how the Rust SDK will integrate with the next generation of endpoint resolution logic (Endpoints 2.0). Endpoints 2.0 defines a rules language for resolving endpoints. The Rust SDK will code-generate Rust code from this intermediate language and use this to create service-specific endpoint resolvers.

Endpoints 2.0 will be a core feature and be available for generic clients as well as the AWS SDK.

Terminology

  • Generic client: In reference to features/code that is not AWS specific and is supported for all Smithy clients.
  • Rules language: A JSON-based rules language used to resolve endpoints
  • Smithy Endpoint: An endpoint, as returned from the rules-language. This contains a URI, headers, and configuration map of String -> Document (properties). This must undergo another level of transformation before it can be used as an AwsEndpoint.
  • AWS Endpoint: An endpoint with explicit signing configuration applied. AWS Endpoints need to contain region & service metadata to control signing.
  • Middleware: A transformation applied to a request, prior to request dispatch
  • Endpoint Parameters: A code-generated structure for each service which contains service-specific (and general) endpoint parameters.

The user experience if this RFC is implemented

Overview

SDKs will generate a new, public, endpoint module. The module will contain a Params structure and a DefaultResolver. Supporting these modules, a private endpoints_impl module will be generated.

Why generate two modules?

Generating two separate modules, endpoint and endpoints_impl, ensures that we don't have namespace collisions between hand-written and generated code.

SDK middleware will be updated to use the new smithy_types::Endpoint. During request construction in make_operation, a Smithy endpoint will be inserted into the property bag. The endpoint middleware will be updated to extract the Smithy endpoint from the property bag and set the request endpoint & signing information accordingly (see Converting a Smithy Endpoint to an AWS Endpoint).

The following flow chart traces the endpoints 2.0 influence on a request via the green boxes.

flowchart TD
    globalConfig("SDK global configuration (e.g. region provider, UseFIPS, etc.)")

    serviceConfig("Modeled, service specific configuration information (clientContextParams)")

    operationConfig("Operation-specific configuration (S3 Bucket, accountId, etc.)")

    getObject["S3::GetObject"]

    params["Create endpoint parameters"]

    evaluate["Evaluate ruleset"]

    rules["Generated Endpoint Ruleset for S3"]

    middleware["Apply endpoint & properties to request via endpoint middleware"]



    style getObject fill:green,stroke:#333,stroke-width:4px
    style params fill:green,stroke:#333,stroke-width:4px
    style evaluate fill:green,stroke:#333,stroke-width:4px
    style middleware fill:green,stroke:#333,stroke-width:4px

    getObject ==> params
    globalConfig ---> params
    operationConfig --> params
    serviceConfig ---> params

    rules --> evaluate
    params --> evaluate
    evaluate --> middleware

Overriding Endpoints

In the general case, users will not be impacted by Endpoints 2.0 with one exception: today, users can provide a global endpoint provider that can override different services. There is a single ResolveAwsEndpoint trait that is shared across all services. However, this isn't the case for Endpoints 2.0 where the trait actually has a generic parameter:

pub trait ResolveEndpoint<T>: Send + Sync {
    fn resolve_endpoint(&self, params: &T) -> Result<Endpoint, BoxError>;
}

The trait itself would then be parameterized by a service-specific endpoint parameters type, e.g. aws_sdk_s3::endpoint::Params. The endpoint parameters we would use for S3 (e.g. including Bucket) are different from the endpoint parameters we might use for a service like DynamoDB which, today, doesn't have any custom endpoint behavior.

Going forward, we will provide two different avenues for customers to customize endpoints:

  1. Configuration driven URL override. This mechanism hasn't been specified, but suppose that the Rust SDK supported an SDK_ENDPOINT environment variable. This variable would be an input to the existing endpoint resolver machinery and would be backwards compatible with other SDKs (e.g. by prefixing the bucket as a host label for S3).
  2. Wholesale endpoint resolver override. In this case, customers would gain access to all endpoint parameters and be able to write their own resolver.

This RFC proposes making the following changes:

  1. For the current global ability to override an endpoint, instead of accepting an AwsEndpoint, accept a URI. This will simplify the interface for most customers who don't actually need logic-driven endpoint construction. The Endpoint that can be set will be passed in as the SDK::Endpoint built-in. This will be renamed to endpoint_url for clarity. All AWS services MUST accept the SDK::Endpoint built-in.
  2. For complex, service-specific behavior, customers will be able to provide a service-specific endpoint resolver at client construction time. This resolver will be parameterized with the service-specific parameters type (e.g. aws_sdk_s3::endpoint::Params). Finally, customers will be able to access the default_resolver() for AWS services directly. This will enable them to utilize the default S3 endpoint resolver in their resolver implementation.

Example: overriding the endpoint URI globally

#[tokio::main]
async fn main() {
    let sdk_conf = aws_config::from_env().endpoint_url("http://localhost:8123").load().await;
    let dynamo = aws_sdk_dynamodb::Client::new(&sdk_conf);
    // snip ...
}

Example: overriding the endpoint resolver for a service

/// Resolve to Localhost when an environment variable is set
struct CustomDdbResolver;

impl ResolveEndpoint<aws_sdk_dynamodb::endpoint::Params> for CustomDdbResolver {
    fn resolve_endpoint(&self, params: &Params) -> Result<Endpoint, EndpointResolutionError> {
        // custom resolver to redirect to DDB local if a flag is set
        let base_endpoint = aws_sdk_dynamodb::endpoint::default_resolver().resolve_endpoint(params).expect("valid endpoint should be resolved");
        if env::var("LOCAL") == Ok("true") {
            // update the URI on the returned endpoint to localhost while preserving the other properties
            Ok(base_endpoint.builder().uri("http://localhost:8888").build())
        } else {
            Ok(base_endpoint)
        }
    }
}

#[tokio::main]
async fn main() {
    let conf = aws_config::load_from_env().await;
    let ddb_conf = aws_sdk_dynamodb::config::Builder::from(&conf).endpoint_resolver(CustomDdbResolver);
    let dynamodb = aws_sdk_dynamodb::Client::from_conf(ddb_conf);
}

Note: generic clients cannot use endpoint_url, because endpoint_url depends on endpoint rules and generic clients do not necessarily have rules. However, they can use the impl<T> ResolveEndpoint<T> for &'static str { ... } implementation.
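A sketch of what that blanket implementation could look like (the Endpoint builder methods shown are assumptions, since this RFC leaves the builder unspecified):

impl<T> ResolveEndpoint<T> for &'static str {
    fn resolve_endpoint(&self, _params: &T) -> Result<Endpoint, EndpointResolutionError> {
        // A static string resolves to itself; the parameters are ignored.
        Ok(Endpoint::builder().url(*self).build())
    }
}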

What about alternative S3 implementations? How do we say "don't put prefix bucket on this?"

For cases where users want to use the provided URL directly with no modification users will need to rely on service specific configuration, like forcing path style addressing for S3.

Alternative Design: Context Aware Endpoint Trait

Optional addition: we could add an additional EndpointResolver parameter to SdkConfig that exposes a global trait where Params is &dyn Any (see the Context Aware Endpoint Traits section under Alternative Designs below). If both were set, a runtime panic would alert users to the misconfiguration.

New Endpoint Traits

The new endpoint resolution trait and Endpoint struct will be available for generic clients. AWS endpoint middleware will pull the Endpoint out of the property bag and read the properties to determine auth/signing + any other AWS metadata that may be required.

An example of the Endpoint struct is below. This struct will be in aws-smithy-types, however, it should initially be gated with documentation warning about stability.

The Endpoint Struct

// module: `aws_smithy_types::endpoint`
// potential optimization to reduce / remove allocations for keys which are almost always static
// this can also just be `String`
type MaybeStatic<T> = Cow<'static, T>;

/// Endpoint
#[derive(Debug, PartialEq)]
pub struct Endpoint {
    // Note that this allows `Endpoint` to contain an invalid URI. During conversion to an actual endpoint,
    // the middleware can fail, returning a `ConstructionFailure` to the user
    url: MaybeStatic<str>,
    headers: HashMap<MaybeStatic<str>, Vec<MaybeStatic<str>>>,
    properties: HashMap<MaybeStatic<str>, aws_smithy_types::Document>,
}

// not shown:
// - impl block with standard accessors
// - builder, designed to be invoked / used by generated code

What's an Endpoint property?

Endpoint properties, on their own, have no intrinsic meaning. Endpoint properties have established conventions for AWS SDKs. Other Smithy implementors may choose a different pattern. For AWS SDKs, the authSchemes key is an ordered list of authentication/signing schemes supported by the Endpoint that the SDK should use.

To produce an Endpoint struct, we have a ResolveEndpoint trait which is both generic over its parameters and "smithy-generic":

// module: `smithy_types::endpoint` or `aws_smithy_client`??
pub trait ResolveEndpoint<Params>: Send + Sync {
    /// Resolves an `Endpoint` for `Params`
    fn resolve_endpoint(&self, params: &Params) -> Result<aws_smithy_types::Endpoint, EndpointResolutionError>;
}

All Smithy services that have the @endpointRuleSet trait applied to the service shape will code generate a default endpoint resolver implementation. The default endpoint resolver MUST be public, so that customers can delegate to it if they wish to override the endpoint resolver.

Endpoint Params

We've mentioned "service specific endpoint parameters" a few times. In Endpoints 2.0, we will code generate Endpoint Parameters for every service based on their rules. Note: the endpoint parameters themselves are generated solely from the ruleset. The Smithy model provides additional information about parameter binding, but that only influences how the parameters are set, not how they are generated.

Example Params struct for S3:

#[non_exhaustive]
#[derive(std::clone::Clone, std::cmp::PartialEq, std::fmt::Debug)]
/// Configuration parameters for resolving the correct endpoint
pub struct Params {
    pub(crate) bucket: std::option::Option<std::string::String>,
    pub(crate) region: std::option::Option<std::string::String>,
    pub(crate) use_fips: bool,
    pub(crate) use_dual_stack: bool,
    pub(crate) endpoint: std::option::Option<std::string::String>,
    pub(crate) force_path_style: std::option::Option<bool>,
    pub(crate) accelerate: bool,
    pub(crate) disable_access_points: std::option::Option<bool>,
    pub(crate) disable_mrap: std::option::Option<bool>,
}

impl Params {
    /// Create a builder for [`Params`]
    pub fn builder() -> crate::endpoint_resolver::Builder {
        crate::endpoint_resolver::Builder::default()
    }
    /// Gets the value for bucket
    pub fn bucket(&self) -> std::option::Option<&str> {
        self.bucket.as_deref()
    }
    /// Gets the value for region
    pub fn region(&self) -> std::option::Option<&str> {
        self.region.as_deref()
    }
    /// Gets the value for use_fips
    pub fn use_fips(&self) -> std::option::Option<bool> {
        Some(self.use_fips)
    }
    /// Gets the value for use_dual_stack
    pub fn use_dual_stack(&self) -> std::option::Option<bool> {
        Some(self.use_dual_stack)
    }
    // ... more accessors
}

The default endpoint resolver

When an endpoint ruleset is present, Smithy will code generate an endpoint resolver from that ruleset. The endpoint resolver MUST be a struct so that it can store/cache computations (such as a partition resolver that has compiled regexes).

pub struct DefaultEndpointResolver {
    partition_resolver: PartitionResolver
}

impl ResolveEndpoint<crate::endpoint::Params> for DefaultEndpointResolver {
    fn resolve_endpoint(&self, params: &Params) -> Result<aws_smithy_types::Endpoint, EndpointResolutionError> {
        // delegate to private impl
        crate::endpoints_impl::resolve_endpoint(params)
    }
}

DefaultEndpointResolver MUST be publicly accessible and offer both a default constructor and the ability to configure resolution behavior (e.g. by supporting the addition of extra partitions).

How to actually implement this RFC

To describe how this feature will work, let's take a step-by-step path through endpoint resolution.

  1. A user defines a service client, possibly with some client specific configuration like region.

    @clientContextParams are code generated onto the client Config (see Code generating client context params below).

  2. A user invokes an operation like s3::GetObject. A params object is created. In the body of make_operation(), this is passed to config.endpoint_resolver to load a generic endpoint. The Result of the endpoint resolution is written into the property bag.

  3. The generic smithy middleware (SmithyEndpointStage) sets the request endpoint.

  4. The AWS auth middleware (AwsAuthStage) reads the endpoint out of the property bag and applies signing overrides.

  5. The request is signed & dispatched

The other major piece of implementation required is actually implementing the rules engine. To learn more about rules-engine internals, skip to implementing the rules engine.

Code generating client context params

When a smithy model uses the @clientContextParams trait, we need to generate client params onto the Rust SDK. This is a Smithy-native feature. This should be implemented as a "standard" config decorator that reads traits from the current model.

Kotlin Snippet for Client context params
class ClientContextDecorator(ctx: ClientCodegenContext) : NamedSectionGenerator<ServiceConfig>() {
    private val contextParams = ctx.serviceShape.getTrait<ClientContextParamsTrait>()?.parameters.orEmpty().toList()
        .map { (key, value) -> ContextParam.fromClientParam(key, value, ctx.symbolProvider) }

    data class ContextParam(val name: String, val type: Symbol, val docs: String?) {
        companion object {
            private fun toSymbol(shapeType: ShapeType, symbolProvider: RustSymbolProvider): Symbol =
                symbolProvider.toSymbol(
                    when (shapeType) {
                        ShapeType.STRING -> StringShape.builder().id("smithy.api#String").build()
                        ShapeType.BOOLEAN -> BooleanShape.builder().id("smithy.api#Boolean").build()
                        else -> TODO("unsupported type")
                    }
                )

            fun fromClientParam(
                name: String,
                definition: ClientContextParamDefinition,
                symbolProvider: RustSymbolProvider
            ): ContextParam {
                return ContextParam(
                    RustReservedWords.escapeIfNeeded(name.toSnakeCase()),
                    toSymbol(definition.type, symbolProvider),
                    definition.documentation.orNull()
                )
            }
        }
    }

    override fun section(section: ServiceConfig): Writable {
        return when (section) {
            is ServiceConfig.ConfigStruct -> writable {
                contextParams.forEach { param ->
                    rust("pub (crate) ${param.name}: #T,", param.type.makeOptional())
                }
            }
            ServiceConfig.ConfigImpl -> emptySection
            ServiceConfig.BuilderStruct -> writable {
                contextParams.forEach { param ->
                    rust("${param.name}: #T,", param.type.makeOptional())
                }
            }
            ServiceConfig.BuilderImpl -> writable {
                contextParams.forEach { param ->
                    param.docs?.also { docs(it) }
                    rust(
                        """
                        pub fn ${param.name}(mut self, ${param.name}: #T) -> Self {
                            self.${param.name} = Some(${param.name});
                            self
                        }
                        """,
                        param.type
                    )
                }
            }
            ServiceConfig.BuilderBuild -> writable {
                contextParams.forEach { param ->
                    rust("${param.name}: self.${param.name},")
                }
            }
            else -> emptySection
        }
    }
}

Creating Params

Params will be created and utilized in generic code generation.

make_operation() needs to load the parameters from several configuration sources. These sources have a priority order. To handle this, we will load from all sources in reverse priority order, allowing higher priority sources to override values set by lower priority ones.

Implementation of operation decorator
class EndpointParamsDecorator(
    private val ctx: ClientCodegenContext,
    private val operationShape: OperationShape,
) : OperationCustomization() {
    val idx = ContextIndex.of(ctx.model)
    private val ruleset = EndpointRuleset.fromNode(ctx.serviceShape.expectTrait<EndpointRuleSetTrait>().ruleSet)

    override fun section(section: OperationSection): Writable {
        return when (section) {
            is OperationSection.MutateInput -> writable {
                rustTemplate(
                    """
                    let params = #{Params}::builder()
                        #{builder:W}.expect("invalid endpoint");
                    """,
                    "Params" to EndpointParamsGenerator(ruleset).paramsStruct(),
                    "builder" to builderFields(section)
                )
            }
            is OperationSection.MutateRequest -> writable {
                rust("// ${section.request}.properties_mut().insert(params);")
            }
            else -> emptySection
        }
    }

    private fun builderFields(section: OperationSection.MutateInput) = writable {
        val memberParams = idx.getContextParams(operationShape)
        val builtInParams = ruleset.parameters.toList().filter { it.isBuiltIn }
        // first load builtins and their defaults
        builtInParams.forEach { param ->
            val defaultProviders = section.endpointCustomizations.mapNotNull { it.defaultFor(param, section.config) }
            if (defaultProviders.size > 1) {
                error("Multiple providers provided a value for the builtin $param")
            }
            defaultProviders.firstOrNull()?.also { defaultValue ->
                rust(".set_${param.name.rustName()}(#W)", defaultValue)
            }
        }
        // these can be overridden with client context params
        idx.getClientContextParams(ctx.serviceShape).forEach { (name, _param) ->
            rust(".set_${name.toSnakeCase()}(${section.config}.${name.toSnakeCase()}.as_ref())")
        }

        // lastly, allow these to be overridden by members
        memberParams.forEach { (memberShape, param) ->
            rust(".set_${param.name.toSnakeCase()}(${section.input}.${ctx.symbolProvider.toMemberName(memberShape)}.as_ref())")
        }
        rust(".build()")
    }
}

Loading values for builtIns

The fundamental point of builtIn values is enabling other code generators to define where these values come from. Because of that, we will need to expose the ability to customize AwsBuiltIns. One way to do this is with a new customization type, EndpointCustomization:

fun endpointCustomizations(
    clientCodegenContext: C,
    operation: OperationShape,
    baseCustomizations: List<EndpointCustomization>
): List<EndpointCustomization> = baseCustomizations


abstract class EndpointCustomization {
    abstract fun defaultFor(parameter: Parameter, config: String): Writable?
}

Customizations have the ability to specify the default value for a parameter. (Of course, these customizations need to be wired in properly.)

Converting a Smithy Endpoint to an AWS Endpoint

A Smithy endpoint has an untyped, string->Document collection of properties. We need to interpret these properties to handle actually resolving an endpoint. As part of the AwsAuthStage, we load authentication schemes from the endpoint properties and use these to configure signing on the request.

Note: Authentication schemes are NOT required as part of an endpoint. When the auth schemes are not set, the default authentication should be used. The Rust SDK will set SigningRegion and SigningName in the property bag by default as part of make_operation.

Implementing the rules engine

The Rust SDK code generator converts the rules into Rust code that is compiled into the SDK.

Changes checklist

Rules Engine

  • Endpoint rules code generator
  • Endpoint params code generator
  • Endpoint tests code generator
  • Implement ruleset standard library functions as inlineables. Note: pending future refactoring work, the aws. functions will need to be integrated into the smithy core endpoint resolver.
  • Implement partition function & ability to customize partitions

SDK Integration
  • Add a Smithy endpoint resolver to the service config, with a default that loads the default endpoint resolver.
  • Update SdkConfig to accept a URI instead of an implementation of ResolveAwsEndpoint. This change can be done standalone.
  • Remove/deprecate the ResolveAwsEndpoint trait and replace it with the vanilla Smithy trait. Potentially, provide a bridge.
  • Update make_operation to write a smithy::Endpoint into the property bag
  • Update AWS Endpoint middleware to work off of a smithy::Endpoint
  • Wire the endpoint override to the SDK::Endpoint builtIn parameter
  • Remove the old smithy endpoint

Alternative Designs

Context Aware Endpoint Traits

An alternative design that could provide more flexibility is a context-aware endpoint trait where the return type would give context about the endpoint being returned. This would, for example, allow a customer to say explicitly "don't modify this endpoint":

enum ContextualEndpoint {
    /// Just the URI please. Pass it into the default endpoint resolver as a baseline
    Uri { uri: Uri, immutable: bool },

    /// A fully resolved, ready to rumble endpoint. Don't bother hitting the default endpoint resolver, just use what
    /// I've got.
    AwsEndpoint(AwsEndpoint)
}

trait ResolveGlobalEndpoint {
    fn resolve_endpoint(params: &dyn Any) -> Result<ContextualEndpoint, EndpointResolutionError>;
}

Service clients would then use ResolveGlobalEndpoint, optionally specified from SdkConfig, to perform routing decisions.

RFC: SDK Credential Cache Type Safety

Status: Implemented in smithy-rs#2122

Applies to: AWS SDK for Rust

At time of writing (2022-10-11), the SDK's credentials provider can be customized by providing:

  1. A profile credentials file to modify the default provider chain
  2. An instance of one of the credentials providers implemented in aws-config, such as the AssumeRoleCredentialsProvider, ImdsCredentialsProvider, and so on.
  3. A custom struct that implements the ProvideCredentials trait

The problem this RFC examines is that when options 2 and 3 above are exercised, the customer needs to be aware of credentials caching and put in additional effort to ensure caching is set up correctly (and that double caching doesn't occur). This is especially difficult to get right since some built-in credentials providers (such as AssumeRoleCredentialsProvider) already have caching, while most others do not and need to be wrapped in LazyCachingCredentialsProvider.

The goal of this RFC is to create an API where Rust's type system ensures caching is set up correctly, or explicitly opted out of.

CredentialsCache and ConfigLoader::credentials_cache

A new config method named credentials_cache() will be added to ConfigLoader and the generated service Config builders that takes a CredentialsCache instance. This CredentialsCache will be a struct with several functions on it to create and configure the cache.

Client creation will ultimately be responsible for taking this CredentialsCache instance and wrapping the given (or default) credentials provider.

The CredentialsCache would look as follows:

enum Inner {
    Lazy(LazyConfig),
    // Eager doesn't exist today, so this is purely for illustration
    Eager(EagerConfig),
    // Custom may not be implemented right away
    // Not naming or specifying the custom cache trait for now since its out of scope
    Custom(Box<dyn SomeCacheTrait>),

    NoCaching,
}
pub struct CredentialsCache {
    inner: Inner,
}

impl CredentialsCache {
    // These methods use default cache settings
    pub fn lazy() -> Self { /* ... */ }
    pub fn eager() -> Self { /* ... */ }

    // Unprefixed methods return a builder that can take customizations
    pub fn lazy_builder() -> LazyBuilder { /* ... */ }
    pub fn eager_builder() -> EagerBuilder { /* ... */ }

    // Later, when custom implementations are supported
    pub fn custom(cache_impl: Box<dyn SomeCacheTrait>) -> Self { /* ... */ }

    pub(crate) fn create_cache(
        self,
        provider: Box<dyn ProvideCredentials>,
        sleep_impl: Arc<dyn AsyncSleep>
    ) -> SharedCredentialsProvider {
        // Note: SharedCredentialsProvider would get renamed to SharedCredentialsCache.
        // This code is using the old name to make it clearer that it already exists,
        // and the rename is called out in the change checklist.
        SharedCredentialsProvider::new(
            match self.inner {
                Inner::Lazy(settings) => LazyCachingCredentialsProvider::new(provider, settings.time, /* ... */),
                Inner::Eager(_settings) => unimplemented!(),
                Inner::Custom(_cache) => unimplemented!(),
                Inner::NoCaching => unimplemented!(),
            }
        )
    }
}

Using a struct over a trait prevents custom caching implementations, but if customization is desired, a Custom variant could be added to the inner enum that has its own trait that customers implement.

The SharedCredentialsProvider needs to be updated to take a cache implementation in addition to the impl ProvideCredentials + 'static. A sealed trait could be added to facilitate this.
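
For illustration, here is a minimal sketch of the sealed-trait pattern that could be used for this; every name below (CacheCredentialsSealed, LazySketch) is hypothetical and not part of the SDK's actual API.

// The classic sealed-trait pattern: the trait is publicly nameable, but only
// this crate can implement it because the `Sealed` supertrait lives in a
// private module.
mod sealed {
    pub trait Sealed {}
}

pub trait CacheCredentialsSealed: sealed::Sealed + Send + Sync {
    fn describe(&self) -> &'static str;
}

pub struct LazySketch;

impl sealed::Sealed for LazySketch {}
impl CacheCredentialsSealed for LazySketch {
    fn describe(&self) -> &'static str {
        "lazy credentials cache (sketch)"
    }
}

// Downstream crates can accept `impl CacheCredentialsSealed` in their APIs,
// but they cannot add their own implementations.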

Customers that don't care about credential caching can configure credential providers without needing to think about it:

let sdk_config = aws_config::from_env()
    .credentials_provider(ImdsCredentialsProvider::builder().build())
    .load()
    .await;

However, if they want to customize the caching, they can do so without modifying the credentials provider at all (in case they want to use the default):

let sdk_config = aws_config::from_env()
    .credentials_cache(CredentialsCache::eager())
    .load()
    .await;

The credentials_cache will default to CredentialsCache::lazy() if not provided.

Changes Checklist

  • Remove cache from AssumeRoleProvider
  • Implement CredentialsCache with its Lazy variant and builder
  • Add credentials_cache method to ConfigLoader
  • Refactor ConfigLoader to take CredentialsCache instead of impl ProvideCredentials + 'static
  • Refactor SharedCredentialsProvider to take a cache implementation in addition to an impl ProvideCredentials + 'static
  • Remove ProvideCredentials impl from LazyCachingCredentialsProvider
  • Rename LazyCachingCredentialsProvider -> LazyCredentialsCache
  • Refactor the SDK Config code generator to be consistent with ConfigLoader
  • Write changelog upgrade instructions
  • Fix examples (if there are any for configuring caching)

Appendix: Alternatives Considered

Alternative A: ProvideCachedCredentials trait

In this alternative, aws-types has a ProvideCachedCredentials in addition to ProvideCredentials. All individual credential providers (such as ImdsCredentialsProvider) implement ProvideCredentials, while credential caches (such as LazyCachingCredentialsProvider) implement the ProvideCachedCredentials. The ConfigLoader would only take impl ProvideCachedCredentials.

This allows customers to provide their own caching solution by implementing ProvideCachedCredentials, while requiring that caching be done correctly through the type system since ProvideCredentials is only useful inside the implementation of ProvideCachedCredentials.

Caching can be opted out of by creating a NoCacheCredentialsProvider that implements ProvideCachedCredentials without any caching logic, although this wouldn't be recommended and this provider wouldn't be vended in aws-config.
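
A minimal sketch of what such an opt-out wrapper could look like; the traits here are simplified, synchronous stand-ins for the (hypothetical) ProvideCachedCredentials and the existing ProvideCredentials so the example stays self-contained.

#[derive(Clone, Debug)]
pub struct Credentials; // stand-in for the real credentials type

// Simplified, synchronous stand-ins for the two traits discussed above.
pub trait ProvideCredentials {
    fn provide_credentials(&self) -> Credentials;
}

pub trait ProvideCachedCredentials {
    fn provide_cached_credentials(&self) -> Credentials;
}

/// Opts out of caching: every call goes straight to the wrapped provider.
pub struct NoCacheCredentialsProvider<P>(pub P);

impl<P: ProvideCredentials> ProvideCachedCredentials for NoCacheCredentialsProvider<P> {
    fn provide_cached_credentials(&self) -> Credentials {
        // No caching logic at all; delegate on every call.
        self.0.provide_credentials()
    }
}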

Example configuration:

// Compiles
let sdk_config = aws_config::from_env()
    .credentials(
        LazyCachingCredentialsProvider::builder()
            .load(ImdsCredentialsProvider::new())
            .build()
    )
    .load()
    .await;

// Doesn't compile
let sdk_config = aws_config::from_env()
    // Wrong type: doesn't implement `ProvideCachedCredentials`
    .credentials(ImdsCredentialsProvider::new())
    .load()
    .await;

Another method could be added to ConfigLoader that makes it easier to use the default cache:

let sdk_config = aws_config::from_env()
    .credentials_with_default_cache(ImdsCredentialsProvider::new())
    .load()
    .await;

Pros/cons

  • :+1: It's flexible, and somewhat enforces correct cache setup through types.
  • :+1: Removes the possibility of double caching since the cache implementations won't implement ProvideCredentials.
  • :-1: Customers may unintentionally implement ProvideCachedCredentials instead of ProvideCredentials for a custom provider, and then not realize they're not benefiting from caching.
  • :-1: The documentation needs to make it very clear what the differences are between ProvideCredentials and ProvideCachedCredentials since they will look identical.
  • :-1: It's possible to implement both ProvideCachedCredentials and ProvideCredentials, which breaks the type safety goals.

Alternative B: CacheCredentials trait

This alternative is similar to alternative A, except that the cache trait is distinct from ProvideCredentials so that it's more apparent when mistakenly implementing the wrong trait for a custom credentials provider.

A CacheCredentials trait would be added that looks as follows:

pub trait CacheCredentials: Send + Sync + Debug {
    async fn cached(&self, now: SystemTime) -> Result<Credentials, CredentialsError>;
}

Instances implementing CacheCredentials need to own the ProvideCredentials implementation to make both lazy and eager credentials caching possible.
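
A minimal sketch of a cache that owns its provider, with the trait simplified to a synchronous, infallible signature so the example is self-contained; the expiry handling and locking are deliberately naive.

use std::fmt::Debug;
use std::sync::Mutex;
use std::time::{Duration, SystemTime};

#[derive(Clone, Debug)]
pub struct Credentials {
    pub expires_at: SystemTime,
}

pub trait ProvideCredentials: Send + Sync + Debug {
    fn provide_credentials(&self) -> Credentials;
}

// Simplified stand-in for the `CacheCredentials` trait described above.
pub trait CacheCredentials: Send + Sync + Debug {
    fn cached(&self, now: SystemTime) -> Credentials;
}

#[derive(Debug)]
pub struct LazyCacheSketch<P> {
    provider: P,                      // the cache owns the provider
    last: Mutex<Option<Credentials>>, // last credentials served
}

impl<P: ProvideCredentials> CacheCredentials for LazyCacheSketch<P> {
    fn cached(&self, now: SystemTime) -> Credentials {
        let mut last = self.last.lock().unwrap();
        match &*last {
            // Still comfortably valid: serve the cached credentials.
            Some(creds) if creds.expires_at > now + Duration::from_secs(60) => creds.clone(),
            // Expired or never loaded: refresh from the owned provider.
            _ => {
                let fresh = self.provider.provide_credentials();
                *last = Some(fresh.clone());
                fresh
            }
        }
    }
}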

The configuration examples look identical to Option A.

Pros/cons

  • :+1: It's flexible, and enforces correct cache setup through types slightly better than Option A.
  • :+1: Removes the possibility of double caching since the cache implementations won't implement ProvideCredentials.
  • :-1: Customers can still unintentionally implement the wrong trait and miss out on caching when creating custom credentials providers, but it will be more apparent than in Option A.
  • :-1: It's possible to implement both CacheCredentials and ProvideCredentials, which breaks the type safety goals.

Alternative C: CredentialsCache struct with composition

The struct approach posits that customers don't need or want to implement custom credential caching, but at the same time, doesn't make it impossible to add custom caching later.

The idea is that there would be a struct called CredentialsCache that specifies the desired caching approach for a given credentials provider:

pub struct LazyCache {
    credentials_provider: Arc<dyn ProvideCredentials>,
    // ...
}

pub struct EagerCache {
    credentials_provider: Arc<dyn ProvideCredentials>,
    // ...
}

pub struct CustomCache {
    credentials_provider: Arc<dyn ProvideCredentials>,
    // Not naming or specifying the custom cache trait for now since its out of scope
    cache: Arc<dyn SomeCacheTrait>
}

enum CredentialsCacheInner {
    Lazy(LazyCache),
    // Eager doesn't exist today, so this is purely for illustration
    Eager(EagerCache),
    // Custom may not be implemented right away
    Custom(CustomCache),
}

pub struct CredentialsCache {
    inner: CredentialsCacheInner,
}

impl CredentialsCache {
    // Methods prefixed with `default_` just use the default cache settings
    pub fn default_lazy(provider: impl ProvideCredentials + 'static) -> Self { /* ... */ }
    pub fn default_eager(provider: impl ProvideCredentials + 'static) -> Self { /* ... */ }

    // Unprefixed methods return a builder that can take customizations
    pub fn lazy(provider: impl ProvideCredentials + 'static) -> LazyBuilder { /* ... */ }
    pub fn eager(provider: impl ProvideCredentials + 'static) -> EagerBuilder { /* ... */ }

    pub(crate) fn create_cache(
        self,
        sleep_impl: Arc<dyn AsyncSleep>
    ) -> SharedCredentialsProvider {
        // ^ Note: SharedCredentialsProvider would get renamed to SharedCredentialsCache.
        // This code is using the old name to make it clearer that it already exists,
        // and the rename is called out in the change checklist.
        SharedCredentialsProvider::new(
            match self.inner {
                CredentialsCacheInner::Lazy(cache) => LazyCachingCredentialsProvider::new(cache.credentials_provider, /* ... */),
                CredentialsCacheInner::Eager(_cache) => unimplemented!(),
                CredentialsCacheInner::Custom(_custom) => unimplemented!(),
            }
        )
    }
}

Using a struct over a trait prevents custom caching implementations, but if customization is desired, a Custom variant could be added to the inner enum that has its own trait that customers implement.

The SharedCredentialsProvider needs to be updated to take a cache implementation rather than impl ProvideCredentials + 'static. A sealed trait could be added to facilitate this.

Configuration would look as follows:

let sdk_config = aws_config::from_env()
    .credentials(CredentialsCache::default_lazy(ImdsCredentialsProvider::builder().build()))
    .load()
    .await;

The credentials setter on ConfigLoader would take only a CredentialsCache as an argument so that the SDK cannot be configured without credentials caching; if opting out of caching becomes a use case, a CredentialsCache::NoCache variant could be added.

Like alternative A, a convenience method can be added to make using the default cache easier:

let sdk_config = aws_config::from_env()
    .credentials_with_default_cache(ImdsCredentialsProvider::builder().build())
    .load()
    .await;

In the future if custom caching is added, it would look as follows:

let sdk_config = aws_config::from_env()
    .credentials(
        CredentialsCache::custom(ImdsCredentialsProvider::builder().build(), MyCache::new())
    )
    .load()
    .await;

The ConfigLoader wouldn't be able to immediately set its credentials provider since other values from the config are needed to construct the cache (such as sleep_impl). Thus, the credentials setter would merely save off the CredentialsCache instance, and then when load is called, the complete SharedCredentialsProvider would be constructed:

pub async fn load(self) -> SdkConfig {
    // ...
    let credentials_provider = self.credentials_cache.create_cache(sleep_impl);
    // ...
}

Pros/cons

  • :+1: Removes the possibility of missing out on caching when implementing a custom provider.
  • :+1: Removes the possibility of double caching since the cache implementations won't implement ProvideCredentials.
  • :-1: Requires thinking about caching when only wanting to customize the credentials provider
  • :-1: Requires a lot of boilerplate in aws-config for the builders, enum variant structs, etc.

RFC: Finding New Home for Credential Types

Status: Implemented in smithy-rs#2108

Applies to: clients

This RFC supplements RFC 28 and discusses, for the design selected there, where to place the types for credentials providers, credentials caching, and everything else that comes with them.

It is assumed that the primary motivation behind the introduction of type safe credentials caching remains the same as the preceding RFC.

Assumptions

This document assumes that the following items in the changes checklist in the preceding RFC have been implemented:

  • Implement CredentialsCache with its Lazy variant and builder
  • Add the credentials_cache method to ConfigLoader
  • Rename SharedCredentialsProvider to SharedCredentialsCache
  • Remove ProvideCredentials impl from LazyCachingCredentialsProvider
  • Rename LazyCachingCredentialsProvider -> LazyCredentialsCache
  • Refactor the SDK Config code generator to be consistent with ConfigLoader

Problems

Here is how an attempt to implement the design selected in the preceding RFC runs into an obstacle. Consider this code snippet we are planning to support:

let sdk_config = aws_config::from_env()
    .credentials_cache(CredentialsCache::lazy())
    .load()
    .await;

let client = aws_sdk_s3::Client::new(&sdk_config);

A CredentialsCache created by CredentialsCache::lazy() above will internally go through three crates before the variable client has been created:

  1. aws-config: after it has been passed to aws_config::ConfigLoader::credentials_cache
// in lib.rs

impl ConfigLoader {
    // --snip--
    pub fn credentials_cache(mut self, credentials_cache: CredentialsCache) -> Self {
        self.credentials_cache = Some(credentials_cache);
        self
    }
    // --snip--
}
  2. aws-types: after aws_config::ConfigLoader::load has passed it to aws_types::sdk_config::Builder::credentials_cache
// in sdk_config.rs

impl Builder {
    // --snip--
    pub fn credentials_cache(mut self, cache: CredentialsCache) -> Self {
        self.set_credentials_cache(Some(cache));
        self
    }
    // --snip--
}
  3. aws-sdk-s3: after aws_sdk_s3::Client::new has been called with the variable sdk_config
// in client.rs

impl Client {
    // --snip--
    pub fn new(sdk_config: &aws_types::sdk_config::SdkConfig) -> Self {
        Self::from_conf(sdk_config.into())
    }
    // --snip--
}

calls

// in config.rs

impl From<&aws_types::sdk_config::SdkConfig> for Builder {
    fn from(input: &aws_types::sdk_config::SdkConfig) -> Self {
        let mut builder = Builder::default();
        builder = builder.region(input.region().cloned());
        builder.set_endpoint_resolver(input.endpoint_resolver().clone());
        builder.set_retry_config(input.retry_config().cloned());
        builder.set_timeout_config(input.timeout_config().cloned());
        builder.set_sleep_impl(input.sleep_impl());
        builder.set_credentials_cache(input.credentials_cache().cloned());
        builder.set_credentials_provider(input.credentials_provider().cloned());
        builder.set_app_name(input.app_name().cloned());
        builder.set_http_connector(input.http_connector().cloned());
        builder
    }
}

impl From<&aws_types::sdk_config::SdkConfig> for Config {
    fn from(sdk_config: &aws_types::sdk_config::SdkConfig) -> Self {
        Builder::from(sdk_config).build()
    }
}

What this all means is that CredentialsCache needs to be accessible from aws-config, aws-types, and aws-sdk-s3 (SDK client crates, to be more generic). We originally assumed that CredentialsCache would be defined in aws-config along with LazyCredentialsCache, but the assumption no longer holds because aws-types and aws-sdk-s3 do not depend upon aws-config.

Therefore, we need to find a new place in which to create credentials caches accessible from the aforementioned crates.

Proposed Solution

We propose to move the following items to a new crate called aws-credential-types:

  • All items in aws_types::credentials and their dependencies
  • All items in aws_config::meta::credentials and their dependencies

For the first bullet point, we move types and traits associated with credentials out of aws-types. Crucially, the ProvideCredentials trait now lives in aws-credential-types.

For the second bullet point, we move the items related to credentials caching. CredentialsCache with its Lazy variant and builder lives in aws-credential-types, and CredentialsCache::create_cache will be marked as pub. One adjustment we make, though, is that LazyCredentialsCache depends on aws_types::os_shim_internal::TimeSource, so we need to move TimeSource into aws-credential-types as well.
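
A small sketch of what imports could look like once the move is complete; the exact module paths inside aws-credential-types are assumptions based on the layout described above, not a confirmed public API.

// Both the cache configuration and the provider trait would come from the
// new crate, so aws-types, aws-config, and SDK client crates can all depend
// on it without creating a cycle.
use aws_credential_types::cache::CredentialsCache;
use aws_credential_types::provider::ProvideCredentials;

fn accepts_both(cache: CredentialsCache, provider: impl ProvideCredentials + 'static) {
    // Placeholder body; a real caller would hand these to the config builder.
    let _ = (cache, provider);
}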

A result of the above arrangement will give us the following module dependencies (only showing what's relevant):

[Figure: Selected design (module dependencies)]

  • :+1: aws_types::sdk_config::Builder and a service client config::Builder can create a SharedCredentialsCache with a concrete type of credentials cache.
  • :+1: It avoids cyclic crate dependencies.
  • :-1: There is one more AWS runtime crate to maintain and version.

Rejected Alternative

An alternative design is to move the following items to a separate crate (tentatively called aws-XXX):

  • All items in aws_types::sdk_config, i.e. SdkConfig and its builder
  • All items in aws_types::credentials and their dependencies
  • All items in aws_config::meta::credentials and their dependencies

The reason for the first bullet point is that the builder needs to be somewhere it has access to the credentials caching factory function, CredentialsCache::create_cache. The factory function is in aws-XXX and if the builder stayed in aws-types, it would cause a cyclic dependency between those two crates.

A result of the above arrangement will give us the following module dependencies:

[Figure: Option A (module dependencies)]

We have dismissed this design mainly because we want to move as little as possible out of the aws-types crate. Another downside is that SdkConfig sitting together with the items for credentials providers & caching does not give us a coherent mental model for the aws-XXX crate, making it difficult to choose the right name for XXX.

Changes Checklist

The following list does not repeat what is listed in the preceding RFC, but it does include new items beyond those mentioned in the Assumptions section:

  • Create aws-credential-types
  • Move all items in aws_types::credentials and their dependencies to the aws-credential-types crate
  • Move all items in aws_config::meta::credentials and their dependencies to the aws-credential-types crate
  • Update use statements and fully qualified names in the affected places

RFC: Serialization and Deserialization

Status: RFC

Applies to: Output, Input, and Builder types, as well as DateTime, Document, Blob, and Number implemented in the aws_smithy_types crate.

Terminology

  • Builder Refers to the generated builder data types, which convert themselves into a corresponding data type upon being built, e.g. the builder for aws_sdk_dynamodb::input::PutItemInput.
  • serde Refers to the serde crate.
  • Serialize Refers to the Serialize trait available in the serde crate.
  • Deserialize Refers to the Deserialize trait available in the serde crate.

Overview

We are going to implement the Serialize and Deserialize traits from the serde crate for some data types. The affected data types are:

  • builder data types
  • operation Input types
  • operation Output types
  • data types that builder types may have on their field(s)
  • aws_smithy_types::DateTime
  • aws_smithy_types::Document
  • aws_smithy_types::Blob
  • aws_smithy_types::Number

DateTime and Blob use different serialization/deserialization formats for human-readable and non-human-readable output; we must emphasize that these two formats are not compatible with each other. The reasons are explained in the Blob and DateTime sections.

Additionally, we add fn set_fields to fluent builders to allow users to set the data they deserialized to fluent builders.

Lastly, we emphasize that this RFC does NOT aim to serialize the entire response or request or implement serde traits on data types for server-side code.

Use Case

Users have requested serde traits to be implemented on data types in the Rust SDK. We have created this RFC with the following use cases in mind.

  1. [request]: Serialize/Deserialize of models for Lambda events #269
  2. Tests as suggested in the design FAQ.
  3. Building tools

Feature Gate

Enabling Feature

To enable any of the features from this RFC, users must pass --cfg aws-sdk-unstable to rustc.

You can do this by setting an environment variable or in .cargo/config.toml.

  • specifying it in .cargo/config.toml
[build]
rustflags = ["--cfg", "aws-sdk-unstable"]
  • As an environment variable
export RUSTFLAGS="--cfg aws-sdk-unstable"
cargo build

We considered allowing users to enable this feature on a crate-level.

e.g.

[dependencies]
aws_sdk_dynamodb = { version = "0.22.0", features = ["unstable", "serialize"] }

Compared to the cfg approach, it is a lot easier for users to enable this feature. However, we believe that the cfg approach ensures users won't enable this feature by surprise, and communicates that features behind this gate can be taken away or experience breaking changes at any time in the future.

Feature Gate for Serialization and De-serialization

Serde traits are implemented behind feature gates. Serialize is implemented behind serde-serialize, while Deserialize is implemented behind serde-deserialize. Users must also pass the --cfg aws-sdk-unstable flag to rustc to expose these features.

We considered giving each feature a dedicated feature gate such as unstable-serde-serialize. In that case, we would need to rename the feature gates entirely once they leave unstable status, which would force users to change their code base. We concluded that this brings no benefit to users.

Furthermore, we considered naming the feature gates serialize/deserialize. However, this would be confusing for users if we later add support for a different serialization/deserialization framework such as deser. Thus, to emphasize that the traits come from the serde crate, we decided on serde-serialize/serde-deserialize.

Keeping both features behind the same feature gate

We considered keeping both features behind the same feature gate. There is no significant difference in the complexity of implementation. We do not see any benefit in keeping them behind the same feature gate as this will only increase compile time when users do not need one of the features.

Different feature gates for different data types

We considered implementing different feature gates for output, input, and their corresponding data types. For example, output and input types can have output-serde-* and input-serde-*. We are unable to do this as relevant metadata is not available during the code-gen.

Implementation

Smithy Types

aws_smithy_types is a crate that implements Smithy's data types. These data types must implement serde traits as well since the SDK uses them.

Blob

Serialize and Deserialize are not implemented with the derive macro.

In the human-readable format, Blob is serialized as a base64-encoded string, and any data to be deserialized into this type must be base64-encoded. Encoding must be carried out by the base64::encode function available from the aws_smithy_types crate. The non-human-readable format serializes Blob with fn serialize_bytes.

  • Reason behind the implementation of human-readable format

The aws_smithy_types crate comes with functions for encoding/decoding base 64, which makes the implementation simpler. Additionally, the AWS CLI and the AWS SDKs for other languages require data to be base64-encoded when a Blob is required as input.

We also considered serializing them with serialize_bytes without base64-encoding them first. In that case, the resulting representation depends on the implementation of the serialization library.

There are many different crates, so we decided to survey how some of the most popular crates implement this feature.

library | version | implementation | all-time downloads on crates.io as of writing (Dec 2022)
serde_json | 1.0 | Array of numbers | 109,491,713
toml | 0.5.9 | Array of numbers | 63,601,994
serde_yaml | 0.9.14 | Unsupported | 23,767,300

First of all, bytes could have hundreds of elements; reading an array of hundreds of numbers will never be a pleasant experience, and it is especially troubling when you are writing data for test cases. Additionally, it has come to our attention that some crates simply don't support it, which would hinder users' productivity and tie their hands.

For the reasons described above, we believe that it is crucial to encode bytes as a string, and base64 is preferable to other encoding schemes such as base 16, base 32, or Ascii85.

  • Reason behind the implementation of a non-human readable format

We considered using the same logic for the non-human-readable format as well. However, readability is not necessary there, and non-human-readable formats tend to emphasize resource efficiency; a base64-encoded string would take up more space, which is not what users would want.

Thus, we believe that implementing a tailored serialization logic would be beneficial to the users.
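
A sketch of the human-readable/non-human-readable split, written against a stand-in newtype so the example is self-contained; the real implementation would live on aws_smithy_types::Blob and use its base64 helpers.

use serde::{Serialize, Serializer};

// Stand-in for aws_smithy_types::Blob.
pub struct BlobSketch(Vec<u8>);

impl Serialize for BlobSketch {
    fn serialize<S: Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
        if serializer.is_human_readable() {
            // Human-readable formats (e.g. JSON): a base64-encoded string.
            serializer.serialize_str(&aws_smithy_types::base64::encode(&self.0))
        } else {
            // Non-human-readable formats: raw bytes via serialize_bytes.
            serializer.serialize_bytes(&self.0)
        }
    }
}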

DateTime

Serialize and Deserialize are not implemented with the derive macro. In the human-readable format, DateTime is serialized in RFC 3339 format, and it expects the value to be in RFC 3339 format when deserialized.

The non-human-readable format serializes DateTime as a tuple of u32 and i64; the i64 corresponds to the seconds field and the u32 to subsecond_nanos.

  • Reason behind the implementation of a human-readable format

For serialization, DateTime already implements a function to encode itself into RFC 3339 format. For deserialization, it would be possible to accept other formats as well; we can add this later if we find it reasonable.

  • Reason behind the implementation of a non-human readable format

Serializing them as a tuple of two integers results in smaller data and requires less computing power than any string-based format. A tuple is also smaller than a map as it does not require field tags.

Document

Serialize and Deserialize are implemented with the derive macro. Additionally, Document uses the container attribute #[serde(untagged)]. Serde can distinguish each variant without tagging thanks to the difference in each variant's data types.

Number

Serialize and Deserialize are implemented with the derive macro. Additionally, Number uses the container attribute #[serde(untagged)].

Serde can distinguish each variant without a tag as each variant's content is different.
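
A short, self-contained sketch of the untagged representation; the variants mirror aws_smithy_types::Number, but the enum is defined locally for the example.

use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, Debug, PartialEq)]
#[serde(untagged)]
enum NumberSketch {
    PosInt(u64),
    NegInt(i64),
    Float(f64),
}

fn main() {
    // No tag is needed: each variant's JSON shape is distinguishable.
    let n: NumberSketch = serde_json::from_str("-7").unwrap();
    assert_eq!(n, NumberSketch::NegInt(-7));
    assert_eq!(serde_json::to_string(&NumberSketch::Float(1.5)).unwrap(), "1.5");
}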

Builder Types and Non-Builder Types

Builder types and non-Builder types implement Serialize and Deserialize with the derive macro.

Example:

#[cfg_attr(
    all(aws-sdk-unstable, feature = "serde-serialize"),
    derive(serde::Serialize)
)]
#[cfg_attr(
    all(aws-sdk-unstable, feature = "serde-deserialize"),
    derive(serde::Deserialize)
)]
#[non_exhaustive]
#[derive(std::clone::Clone, std::cmp::PartialEq)]
pub struct UploadPartCopyOutput {
  ...
}

Enum Representation

serde allows programmers to choose one of four enum representations (internally tagged, externally tagged, adjacently tagged, and untagged) when serializing an enum.

untagged

With the untagged representation, serialized data cannot always be deserialized. For example, aws_sdk_dynamodb::model::AttributeValue has Null(bool) and Bool(bool) variants, whose serialized values cannot be distinguished without a tag.

internal

Internal tagging results in a compile-time error: using a #[serde(tag = "...")] attribute on an enum containing a tuple variant is rejected by the compiler.

external and adjacent

We are left with external and adjacent tagging. External tagging is serde's default. This RFC could be implemented either way.

The resulting size of the serialized data is smaller when tagged externally, as adjacent tagging will require a tag even when a variant has no content.

For the reasons mentioned above, we serialize enums with external tagging.
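
To illustrate, here is a self-contained sketch of external tagging on a simplified stand-in for AttributeValue; the one-key map named after the variant is what keeps Null(true) and Bool(true) distinguishable.

use serde::{Deserialize, Serialize};

// Simplified stand-in; serde's default representation is externally tagged.
#[derive(Serialize, Deserialize, Debug, PartialEq)]
enum AttributeValueSketch {
    Null(bool),
    Bool(bool),
    S(String),
}

fn main() {
    let json = serde_json::to_string(&AttributeValueSketch::Bool(true)).unwrap();
    assert_eq!(json, r#"{"Bool":true}"#);

    let back: AttributeValueSketch = serde_json::from_str(r#"{"Null":true}"#).unwrap();
    assert_eq!(back, AttributeValueSketch::Null(true));
}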

Data Types to Skip Serialization/Deserialization

We are going to skip serialization and deserialization of fields that have the datatype that corresponds to @streaming blob from smithy. Any fields with these data types are tagged with #[serde(skip)].

When a field is skipped, its value will be assigned from the Default trait implementation on deserialization.

As of writing, aws_smithy_http::byte_stream::ByteStream is the only data type that is affected by this decision.

Here is an example of data types affected by this decision:

  • aws_sdk_s3::input::put_object_input::PutObjectInput

We considered serializing them as bytes; however, it could take some time for a stream to reach its end, and the resulting serialized data may be too big to fit into RAM.

Here is an example snippet.

#[allow(missing_docs)]
#[cfg_attr(
    all(aws-sdk-unstable, feature = "serde-serialize"),
    derive(serde::Serialize)
)]
#[cfg_attr(
    all(aws-sdk-unstable, feature = "serde-deserialize"),
    derive(serde::Deserialize)
)]
#[non_exhaustive]
#[derive(std::fmt::Debug)]
pub struct PutObjectInput {
    pub acl: std::option::Option<crate::model::ObjectCannedAcl>,
    pub body: aws_smithy_http::byte_stream::ByteStream,
    // ... other fields
}
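
A quick, self-contained illustration of the skip-and-Default behavior described above; the struct and field names are made up for the example.

use serde::Deserialize;

#[derive(Deserialize, Debug)]
struct SketchInput {
    name: String,
    // Stand-in for a ByteStream-like streaming field.
    #[serde(skip)]
    body: Vec<u8>,
}

fn main() {
    // `body` is absent from the JSON; deserialization fills it from Default.
    let input: SketchInput = serde_json::from_str(r#"{ "name": "example" }"#).unwrap();
    assert_eq!(input.name, "example");
    assert!(input.body.is_empty());
}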

Data types to exclude from ser/de code generation

For data types that include @streaming union in any of their fields, we do NOT implement serde traits.

As of writing, the following Rust data types correspond to @streaming union:

  • aws_smithy_http::event_stream::Receiver
  • aws_smithy_http::event_stream::EventStreamSender

Here is an example of a data type affected by this decision:

  • aws_sdk_transcribestreaming::client::fluent_builders::StartMedicalStreamTranscription

We considered skipping the relevant fields on serialization and generating a custom deserialization function that creates an event stream that always results in an error when a user tries to send or receive data. However, we believe that our decision is justified for the following reasons:

  • This holds for all operations that feature event streams, since the stream is ephemeral (tied to the HTTP connection) and is effectively unusable after serialization and deserialization
  • Most event stream operations don't have fields that go along with them, making the stream the sole component in them, which makes ser/de not so useful
  • SDKs that use event streams, such as aws-sdk-transcribestreaming, have only just over 5000 all-time downloads, with recent downloads just under 1000, as of writing (2023/01/21); this makes the implementation difficult to justify since it would impact a smaller number of people

Serde traits implemented on Builder of Output Types

Output data types, such as aws_sdk_dynamodb::output::UpdateTableOutput, have builder types. These builder types are available to users; however, no API requires users to build the data types themselves.

We considered removing traits from these data types.

Removing serde traits from these types would help reduce compile time; however, builder types can be useful, for example, for testing. We have prepared examples here.

fn set_fields to allow users to use externally created Input

Currently, to set values on fluent builders, users must call the setter method for each field. The SDK does not have a method that allows users to use a deserialized Input directly. Thus, we add a new method, fn set_fields, to the fluent builders. This method accepts an input and replaces all parameters held by the fluent builder with the new ones.

pub fn set_fields(mut self, input: path::to::input_type) -> Self {
    self.inner = input;
    self
}

Users can use fn set_fields to replace the parameters in fluent builders. You can find examples here.

Other Concerns

Model evolution

SDK will introduce new fields and we may see new data types in the future.

We believe that this will not be a problem.

Introduction of New Fields

Most fields are of type Option<T>. When a user deserializes data written before the new fields were introduced, the new fields will be assigned None.

If a field isn't an Option, serde uses the Default trait to generate a value to fill the field, unless custom deserialization/serialization is specified. If the new field is not an Option<T> type and has no Default implementation, we must implement custom deserialization logic.

In the case of serialization, the introduction of new fields will not be an issue unless the data format requires a schema. (e.g. parquet, avro) However, this is outside the scope of this RFC.

Introduction of New Data Type

If a new field introduces a new data type, it will not require any additional work if the data type can derive serde traits.

If the data type cannot derive serde traits on its own, then we have two options. To clarify, this is the same approach we took in the Data Types to Skip section.

  1. Skip: we simply skip serializing/de-serializing the field. However, we may need to implement custom serialization/de-serialization logic if the value is not wrapped in Option.
  2. Custom serialization/de-serialization logic: we implement tailored serialization/de-serialization logic.

Either way, we will mention this on the generated docs to avoid surprising users.

e.g.

#[derive(serde::Serialize, serde::Deserialize)]
struct OutputV1 {
  string_field: Option<String>
}

#[derive(serde::Serialize, serde::Deserialize)]
struct OutputV2 {
  string_field: Option<String>,
  // this will always be treated as None value by serde
  #[serde(skip)]
  skip_not_serializable: Option<SomeComplexDataType>,
  // We can implement a custom serialization logic
  #[serde(serialize_with = "custom_serialization_logic", deserialize_with = "custom_deserialization_logic")]
  not_derive_able: SomeComplexDataType,
  // Serialization will be skipped, and de-serialization will be handled with the function provided on default tag
  #[serde(skip, default = "default_value")]
  skip_with_custom: DataTypeWithoutDefaultTrait,
}

Discussions

Sensitive Information

If serialized data contains sensitive information, it will not be masked. We note on every affected struct field that it may contain such information, to ensure that users are aware of this.

Compile Time

We ran the following benchmark on a c6a.2xlarge instance with 50 GB of gp2 SSD storage. The commit hash of the code is a8e2e19129aead4fbc8cf0e3d34df0188a62de9f.

It clearly shows an increase in compile time. Users are advised to consider the use of software such as sccache or mold to reduce the compile time.

  • aws-sdk-dynamodb

    • when compiled with debug profile

      command | real time | user time | sys time
      cargo build | 0m35.728s | 2m24.243s | 0m11.868s
      cargo build --features unstable-serde-serialize | 0m38.079s | 2m26.082s | 0m11.631s
      cargo build --features unstable-serde-deserialize | 0m45.689s | 2m34.000s | 0m11.978s
      cargo build --all-features | 0m48.959s | 2m45.688s | 0m13.359s
    • when compiled with release profile

      command | real time | user time | sys time
      cargo build --release | 0m52.040s | 5m0.841s | 0m11.313s
      cargo build --release --features unstable-serde-serialize | 0m53.153s | 5m4.069s | 0m11.577s
      cargo build --release --features unstable-serde-deserialize | 1m0.107s | 5m10.231s | 0m11.699s
      cargo build --release --all-features | 1m3.198s | 5m26.076s | 0m12.311s
  • aws-sdk-ec2

    • when compiled with debug profile

      command | real time | user time | sys time
      cargo build | 1m20.041s | 2m14.592s | 0m6.611s
      cargo build --features unstable-serde-serialize | 2m0.555s | 4m24.881s | 0m16.131s
      cargo build --features unstable-serde-deserialize | 3m10.857s | 5m34.246s | 0m18.844s
      cargo build --all-features | 3m31.473s | 6m1.052s | 0m19.681s
    • when compiled with release profile

      command | real time | user time | sys time
      cargo build --release | 2m29.480s | 9m19.530s | 0m15.957s
      cargo build --release --features unstable-serde-serialize | 2m45.002s | 9m43.098s | 0m16.886s
      cargo build --release --features unstable-serde-deserialize | 3m47.531s | 10m52.017s | 0m18.404s
      cargo build --release --all-features | 3m45.208s | 8m46.168s | 0m10.211s

Misleading Results

The SDK team previously expressed concern that serialized data may be misleading. We believe that the features implemented as part of this RFC do not produce misleading results, as we focus on builder types and their corresponding data types, which are mapped to serde's data model with the derive macro.

Appendix

Use Case Examples

use aws_sdk_dynamodb::{Client, Error};

async fn example(read_builder: bool) -> Result<(), Error> {
    // getting the client
    let shared_config = aws_config::load_from_env().await;
    let client = Client::new(&shared_config);

    // de-serializing input's builder types and input types from json
    let deserialized_input = if read_builder {
      let parameter: aws_sdk_dynamodb::input::list_tables_input::Builder =
          serde_json::from_str(include_str!("./builder.json")).expect("valid builder JSON");
      parameter
          .set_exclusive_start_table_name(Some("some_name".to_string()))
          .build()
          .expect("valid input")
    } else {
      let input: aws_sdk_dynamodb::input::ListTablesInput =
          serde_json::from_str(include_str!("./input.json")).expect("valid input JSON");
      input
    };

    // sending request using the deserialized input
    let res = client.list_tables().set_fields(deserialized_input).send().await?;
    println!("DynamoDB tables: {:?}", res.table_names);

    let out: aws_sdk_dynamodb::output::ListTablesOutput = {
      // say you want some of the fields to have certain values
      let out_builder: aws_sdk_dynamodb::output::list_tables_output::Builder = serde_json::from_str(r#"
        {
          "table_names": [ "table1", "table2" ]
        }
      "#).expect("valid output JSON");
      // but you don't really care about some other values
      out_builder
          .set_last_evaluated_table_name(res.last_evaluated_table_name().map(ToString::to_string))
          .build()
    };
    assert_eq!(res, out);

    // serializing json output
    let json_output = serde_json::to_string(&res).unwrap();
    // you can save the serialized output
    println!("{}", json_output);
    Ok(())
}

Changes checklist

  • Implement human-readable serialization for DateTime and Blob in aws_smithy_types
  • Implement non-human-readable serialization for DateTime and Blob in aws_smithy_types
  • Implement Serialize and Deserialize for relevant data types in aws_smithy_types
  • Modify Kotlin's codegen so that generated Builder and non-Builder types implement Serialize and Deserialize
  • Add feature gate for Serialize and Deserialize
  • Prepare examples
  • Prepare reproducible compile time benchmark

RFC: Providing fallback credentials on external timeout

Status: Implemented in smithy-rs#2246

Applies to: client

For a summarized list of proposed changes, see the Changes Checklist section.

This RFC proposes a fallback mechanism for credentials providers on external timeout (see the Terminology section), allowing them to continue serving (possibly expired) credentials for the sake of overall reliability of the intended service; the IMDS credentials provider is an example that must fulfill such a requirement to support static stability.

Terminology

  • External timeout: The name of the timeout that occurs when a duration elapses before an async call to provide_credentials returns. In this case, provide_credentials returns no credentials.
  • Internal timeout: The name of the timeout that occurs when a duration elapses before an async call to some function, inside the implementation of provide_credentials, returns. Examples include connection timeouts, TLS negotiation timeouts, and HTTP request timeouts. Implementations of provide_credentials may handle these failures at their own discretion e.g. by returning (possibly expired) credentials or a CredentialsError.
  • Static stability: Continued availability of a service in the face of impaired dependencies.

Assumption

This RFC is concerned only with external timeouts, as the cost of poor API design is much higher in this case than for internal timeouts. The former will affect a public trait implemented by all credentials providers whereas the latter can be handled locally by individual credentials providers without affecting one another.

Problem

We have mentioned static stability. Supporting it calls for the following functional requirement, among others:

  • REQ 1: Once a credentials provider has served credentials, it should continue serving them in the event of a timeout (whether internal or external) while obtaining refreshed credentials.

Today, we have the following trait method to obtain credentials:

fn provide_credentials<'a>(&'a self) -> future::ProvideCredentials<'a>
where
    Self: 'a,

This method returns a future, which can be raced against a timeout future as demonstrated by the following code snippet from LazyCredentialsCache:

let timeout_future = self.sleeper.sleep(self.load_timeout); // by default self.load_timeout is 5 seconds.
// --snip--
let future = Timeout::new(provider.provide_credentials(), timeout_future);
let result = cache
   .get_or_load(|| async move {
        let credentials = future.await.map_err(|_err| {
            CredentialsError::provider_timed_out(load_timeout)
        })??;
        // --snip--
    }).await;
// --snip--

This creates an external timeout for provide_credentials. If timeout_future wins the race, a future for provide_credentials gets dropped, timeout_future returns an error, and the error is mapped to CredentialsError::ProviderTimedOut and returned. This makes it impossible for the variable provider above to serve credentials as stated in REQ 1.

A more complex use case involves CredentialsProviderChain. It is a manifestation of the chain of responsibility pattern and keeps calling the provide_credentials method on each credentials provider down the chain until credentials are returned by one of them. In addition to REQ 1, we have the following functional requirement with respect to CredentialsProviderChain:

  • REQ 2: Once a credentials provider in the chain has returned credentials, it should continue serving them even in the event of a timeout (whether internal or external) without falling back to another credentials provider.

Referring back to the code snippet above, we analyze two relevant cases (and suppose provider 2 below must meet REQ 1 and REQ 2 in each case):

Case 1: Provider 2 successfully loaded credentials but later failed to do so because an external timeout kicked in.

[Figure: chain-provider-ext-timeout-1]

The figure above illustrates an example. This CredentialsProviderChain consists of three credentials providers. When CredentialsProviderChain::provide_credentials is called, provider 1's provide_credentials is called but does not find credentials so passes the torch to provider 2, which in turn successfully loads credentials and returns them. The next time the method is called, provider 1 does not find credentials but neither does provider 2 this time, because an external timeout by timeout_future given to the whole chain kicked in and the future is dropped while provider 2's provide_credentials was running. Given the functional requirements, provider 2 should return the previously available credentials but today the code snippet from LazyCredentialsCache returns a CredentialsError::ProviderTimedOut instead.

Case 2: Provider 2 successfully loaded credentials but later was not reached because its preceding provider was still running when an external timeout kicked in.

[Figure: chain-provider-ext-timeout-2]

The figure above illustrates an example with the same setting as the previous figure. Again, when CredentialsProviderChain::provide_credentials is called the first time, provider 1 does not find credentials but provider 2 does. The next time the method is called, provider 1 is still executing provide_credentials and then an external timeout by timeout_future kicked in. Consequently, the execution of CredentialsProviderChain::provide_credentials has been terminated. Given the functional requirements, provider 2 should return the previously available credentials but today the code snippet from LazyCredentialsCache returns CredentialsError::ProviderTimedOut instead.

Proposal

To address the problem in the previous section, we propose to add a new method to the ProvideCredentials trait called fallback_on_interrupt. This method allows credentials providers to have a fallback mechanism on an external timeout and to serve credentials to users if needed. There are two options as to how it is implemented, either as a synchronous primitive or as an asynchronous primitive.

Option A: Synchronous primitive

pub trait ProvideCredentials: Send + Sync + std::fmt::Debug {
    // --snip--

    fn fallback_on_interrupt(&self) -> Option<Credentials> {
        None
    }
}
  • :+1: Users can be guided to use only synchronous primitives when implementing fallback_on_interrupt.
  • :-1: It cannot support cases where fallback credentials are asynchronously retrieved.
  • :-1: It may turn into a blocking operation if it takes longer than it should.

Option B: Asynchronous primitive

mod future {
    // --snip--

    // This cannot use `OnlyReady` in place of `BoxFuture` because
    // when a chain of credentials providers implements its own
    // `fallback_on_interrupt`, it needs to await fallback credentials
    // in its inner providers. Thus, `BoxFuture` is required.
    pub struct FallbackOnInterrupt<'a>(NowOrLater<Option<Credentials>, BoxFuture<'a, Option<Credentials>>>);

    // impls for FallbackOnInterrupt similar to those for the ProvideCredentials future newtype
}

pub trait ProvideCredentials: Send + Sync + std::fmt::Debug {
    // --snip--

    fn fallback_on_interrupt<'a>(&'a self) -> future::FallbackOnInterrupt<'a> {
        future::FallbackOnInterrupt::ready(None)
    }
}
  • :+1: It is async from the beginning, so less likely to introduce a breaking change.
  • :-1: We may have to consider yet another timeout for fallback_on_interrupt itself.

Option A cannot be reversed later if we decide to support asynchronously retrieving fallback credentials, whereas option B allows us to continue supporting both ready and pending futures when retrieving the fallback credentials. However, fallback_on_interrupt is supposed to return credentials that have been set aside in case provide_credentials times out. To express that intent, we choose option A and document that users should NOT fetch new credentials in fallback_on_interrupt.
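
A minimal sketch of a provider that sets credentials aside for later; the trait here is a simplified, synchronous stand-in for ProvideCredentials so the example stays self-contained. The point is that fallback_on_interrupt only hands back credentials that were already loaded; it never fetches new ones.

use std::sync::Mutex;

#[derive(Clone, Debug)]
pub struct Credentials; // stand-in for the real credentials type

// Simplified stand-in for the real (async) ProvideCredentials trait.
pub trait ProvideCredentialsSketch: Send + Sync {
    fn provide_credentials(&self) -> Credentials;
    fn fallback_on_interrupt(&self) -> Option<Credentials> {
        None
    }
}

#[derive(Default)]
pub struct StaticallyStableProvider {
    last_served: Mutex<Option<Credentials>>,
}

impl ProvideCredentialsSketch for StaticallyStableProvider {
    fn provide_credentials(&self) -> Credentials {
        let creds = Credentials; // stand-in for an actual (possibly remote) load
        *self.last_served.lock().unwrap() = Some(creds.clone());
        creds
    }

    fn fallback_on_interrupt(&self) -> Option<Credentials> {
        // Serve whatever was loaded last, possibly expired, instead of failing.
        self.last_served.lock().unwrap().clone()
    }
}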

The user experience for the code snippet in question will look like this once this proposal is implemented:

let timeout_future = self.sleeper.sleep(self.load_timeout); // by default self.load_timeout is 5 seconds.
// --snip--
let future = Timeout::new(provider.provide_credentials(), timeout_future);
let result = cache
    .get_or_load(|| {
        async move {
           let credentials = match future.await {
                Ok(creds) => creds?,
                Err(_err) => match provider.fallback_on_interrupt() { // can provide fallback credentials
                    Some(creds) => creds,
                    None => return Err(CredentialsError::provider_timed_out(load_timeout)),
                }
            };
            // --snip--
        }
    }).await;
// --snip--

How to actually implement this RFC

Almost all credentials providers do not have to implement their own fallback_on_interrupt except for CredentialsProviderChain (ImdsCredentialsProvider also needs to implement fallback_on_interrupt when we are adding static stability support to it but that is outside the scope of this RFC).

Considering the two cases we analyzed above, implementing CredentialsProviderChain::fallback_on_interrupt is not so straightforward. Keeping track of whose turn in the chain it is to call provide_credentials when an external timeout has occurred is a challenging task. Even if we figured it out, that would still not satisfy Case 2 above, because it was provider 1 that was actively running when the external timeout kicked in, but the chain should return credentials from provider 2, not from provider 1.

With that in mind, consider instead the following approach:

impl ProvideCredentials for CredentialsProviderChain {
    // --snip--

    fn fallback_on_interrupt(&self) -> Option<Credentials> {
        for (_, provider) in &self.providers {
            match provider.fallback_on_interrupt() {
                creds @ Some(_) => return creds,
                None => {}
            }
        }
        None
    }
}

CredentialsProviderChain::fallback_on_interrupt will invoke each provider's fallback_on_interrupt method until credentials are returned by one of them. It ensures that the updated code snippet for LazyCredentialsCache can return credentials from provider 2 in both Case 1 and Case 2. Even if timeout_future wins the race, the execution subsequently calls provider.fallback_on_interrupt() to obtain fallback credentials from provider 2, assuming provider 2's fallback_on_interrupt is implemented to return fallback credentials accordingly.

The downside of this simple approach is that the behavior is not clear if more than one credentials provider in the chain can return credentials from their fallback_on_interrupt. Note, however, that it is the exception rather than the norm for a provider's fallback_on_interrupt to return fallback credentials, at least at the time of writing (01/13/2023). The fact that it returns fallback credentials means that the provider successfully loaded credentials at least once, and it usually continues serving credentials on subsequent calls to provide_credentials.

Should we have more than one provider in the chain that can potentially return fallback credentials from fallback_on_interrupt, we could make the behavior of CredentialsProviderChain configurable, controlling in what order and how each fallback_on_interrupt is executed. See the Possible enhancement section for more details. The use case described there is an extreme edge case, but it's worth exploring what options are available to us with the proposed design.

Alternative

In this section, we will describe an alternative approach that we ended up dismissing as unworkable.

Instead of fallback_on_interrupt, we considered the following method to be added to the ProvideCredentials trait:

pub trait ProvideCredentials: Send + Sync + std::fmt::Debug {
    // --snip--

    /// Returns a future that provides credentials within the given `timeout`.
    ///
    /// The default implementation races `provide_credentials` against
    /// a timeout future created from `timeout`.
    fn provide_credentials_with_timeout<'a>(
        &'a self,
        sleeper: Arc<dyn AsyncSleep>,
        timeout: Duration,
    ) -> future::ProvideCredentials<'a>
    where
        Self: 'a,
    {
        let timeout_future = sleeper.sleep(timeout);
        let future = Timeout::new(self.provide_credentials(), timeout_future);
        future::ProvideCredentials::new(async move {
            let credentials = future
                .await
                .map_err(|_err| CredentialsError::provider_timed_out(timeout))?;
            credentials
        })
    }
}

provide_credentials_with_timeout encapsulated the timeout race and allowed users to specify how long the external timeout for provide_credentials would be. The code snippet from LazyCredentialsCache then looked like

let sleeper = Arc::clone(&self.sleeper);
let load_timeout = self.load_timeout; // by default self.load_timeout is 5 seconds.
// --snip--
let result = cache
    .get_or_load(|| {
        async move {
            let credentials = provider
                .provide_credentials_with_timeout(sleeper, load_timeout)
                .await?;
            // --snip--
        }
    }).await;
// --snip--

However, implementing CredentialsProviderChain::provide_credentials_with_timeout quickly ran into the following problem:

impl ProvideCredentials for CredentialsProviderChain {
    // --snip--

    fn provide_credentials_with_timeout<'a>(
        &'a self,
        sleeper: Arc<dyn AsyncSleep>,
        timeout: Duration,
    ) -> future::ProvideCredentials<'a>
    where
        Self: 'a,
    {
        future::ProvideCredentials::new(self.credentials_with_timeout(sleeper, timeout))
    }
}

impl CredentialsProviderChain {
    // --snip--

    async fn credentials_with_timeout(
        &self,
        sleeper: Arc<dyn AsyncSleep>,
        timeout: Duration,
    ) -> provider::Result {
        for (_, provider) in &self.providers {
            match provider
                .provide_credentials_with_timeout(Arc::clone(&sleeper), /* how do we calculate timeout for each provider ? */)
                .await
            {
                Ok(credentials) => {
                    return Ok(credentials);
                }
                Err(CredentialsError::ProviderTimedOut(_)) => {
                    // --snip--
                }
                Err(err) => {
                   // --snip--
                }
           }
        }
        Err(CredentialsError::provider_timed_out(timeout))
    }
}

There are mainly two problems with this approach. The first problem is that as shown above, there is no sensible way to calculate a timeout for each provider in the chain. The second problem is that exposing a parameter like timeout at a public trait's level is giving too much control to users; delegating overall timeout to the individual provider means each provider has to get it right.

Changes checklist

  • Add fallback_on_interrupt method to the ProvideCredentials trait with the default implementation
  • Implement CredentialsProviderChain::fallback_on_interrupt
  • Implement DefaultCredentialsChain::fallback_on_interrupt
  • Add unit tests for Case 1 and Case 2

Possible enhancement

We will describe how the behavior of CredentialsProviderChain::fallback_on_interrupt could be customized. This is only meant to demonstrate how far the proposed design can be extended; we currently do not have concrete use cases that require what we present in this section.

As described in the Proposal section, CredentialsProviderChain::fallback_on_interrupt traverses the chain from the head to the tail and returns the first fallback credentials found. This precedence policy works most of the time, but when we have more than one provider in the chain that can potentially return fallback credentials, it could break in the following edge case (we are still basing our discussion on the code snippet from LazyCredentialsCache, but ignore REQ 1 and REQ 2 for the sake of simplicity).

(Figure: fallback_on_interrupt appendix diagram)

During the first call to CredentialsProviderChain::provide_credentials, provider 1 fails to load credentials, maybe due to an internal timeout, and then provider 2 succeeds in loading its credentials (call them credentials 2) and internally stores them for Provider2::fallback_on_interrupt to return them subsequently. During the second call, provider 1 succeeds in loading credentials (call them credentials 1) and internally stores them for Provider1::fallback_on_interrupt to return them subsequently. Suppose, however, that credentials 1's expiry is earlier than credentials 2's expiry. Finally, during the third call, CredentialsProviderChain::provide_credentials did not complete due to an external timeout. CredentialsProviderChain::fallback_on_interrupt then returns credentials 1, when it should return credentials 2 whose expiry is later, because of the precedence policy.

This is a case where CredentialsProviderChain::fallback_on_interrupt requires the recency policy for fallback credentials found in provider 1 and provider 2, not the precedence policy. The following figure shows how we can set up such a chain:

(Figure: heterogeneous policies for fallback_on_interrupt)

The outermost chain is a CredentialsProviderChain and follows the precedence policy for fallback_on_interrupt. It contains a sub-chain that, in turn, contains provider 1 and provider 2. This sub-chain implements its own fallback_on_interrupt to realize the recency policy for fallback credentials found in provider 1 and provider 2. Conceptually, we have

pub struct FallbackRecencyChain {
    provider_chain: CredentialsProviderChain,
}

impl ProvideCredentials for FallbackRecencyChain {
    fn provide_credentials<'a>(&'a self) -> future::ProvideCredentials<'a>
    where
        Self: 'a,
    {
        // Can follow the precedence policy for loading credentials
        // if it chooses to do so.
    }

    fn fallback_on_interrupt(&self) -> Option<Credentials> {
        // Iterate over `self.provider_chain` and return
        // fallback credentials whose expiry is the most recent.
    }
}

We can then compose the entire chain like so:

let provider_1 = /* ... */
let provider_2 = /* ... */
let provider_3 = /* ... */

let sub_chain = CredentialsProviderChain::first_try("Provider1", provider_1)
    .or_else("Provider2", provider_2);

let recency_chain = /* Create a FallbackRecencyChain with sub_chain */

let final_chain = CredentialsProviderChain::first_try("fallback_recency", recency_chain)
    .or_else("Provider3", provider_3);

The fallback_on_interrupt method on final_chain still traverses from the head to the tail, but once it hits recency_chain, fallback_on_interrupt on recency_chain respects the expiry of fallback credentials found in its inner providers.

What we have presented in this section can be generalized thanks to chain composability. We could have different sub-chains, each implementing its own policy for fallback_on_interrupt.
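
A minimal sketch of such a recency policy is shown below. This is illustrative only: it assumes a hypothetical providers() accessor on the inner CredentialsProviderChain and relies on fallback credentials carrying an expiry.

use std::time::SystemTime;
use aws_credential_types::Credentials;

impl FallbackRecencyChain {
    /// Recency policy sketch: ask every inner provider for its fallback
    /// credentials and keep the set whose expiry is the latest.
    fn fallback_on_interrupt(&self) -> Option<Credentials> {
        self.provider_chain
            .providers() // hypothetical accessor yielding `(name, provider)` pairs
            .filter_map(|(_name, provider)| provider.fallback_on_interrupt())
            // Credentials without an expiry sort lowest here; flip this if they should win instead.
            .max_by_key(|credentials| credentials.expiry().unwrap_or(SystemTime::UNIX_EPOCH))
    }
}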

RFC: Better Constraint Violations

Status: Accepted

Applies to: server

During and after the design and the core implementation of constraint traits in the server SDK, some problems relating to constraint violations were identified. This RFC sets out to explain and address three of them: impossible constraint violations, collecting constraint violations, and "tightness" of constraint violations. The RFC explains each of them in turn, solving them in an iterative and pedagogical manner, i.e. the solution of a problem depends on the previous ones having been solved with their proposed solutions. The three problems are meant to be addressed atomically in one changeset (see the Checklist section).

Note: code snippets from generated SDKs in this document are abridged so as to be didactic and relevant to the point being made. They are accurate with regards to commit 2226fe.

Terminology

The design and the description of the PR where the core implementation of constraint traits was made are recommended prior reading to understand this RFC.

  • Shape closure: the set of shapes a shape can "reach", including itself.
  • Transitively constrained shape: a shape whose closure includes:
    1. a shape with a constraint trait attached,
    2. a (member) shape with a required trait attached,
    3. an enum shape; or
    4. an intEnum shape.
  • A directly constrained shape is any of these:
    1. a shape with a constraint trait attached,
    2. a (member) shape with a required trait attached,
    3. an enum shape,
    4. an intEnum shape; or
    5. a structure shape with at least one required member shape.
  • Constrained type: the Rust type a constrained shape gets rendered as. For shapes that are not structure, union, enum or intEnum shapes, these are wrapper newtypes.

In the absence of a qualifier, "constrained shape" should be interpreted as "transitively constrained shape".

Impossible constraint violations

Background

A constrained type has a fallible constructor by virtue of it implementing the TryFrom trait. The error type this constructor may yield is known as a constraint violation:

impl TryFrom<UnconstrainedType> for ConstrainedType {
    type Error = ConstraintViolation;

    fn try_from(value: UnconstrainedType) -> Result<Self, Self::Error> {
        ...
    }
}

The ConstraintViolation type is a Rust enum with one variant per way "constraining" the input value may fail. So, for example, the following Smithy model:

structure A {
    @required
    member: String,
}

Yields:

/// See [`A`](crate::model::A).
pub mod a {
    #[derive(std::cmp::PartialEq, std::fmt::Debug)]
    /// Holds one variant for each of the ways the builder can fail.
    pub enum ConstraintViolation {
        /// `member` was not provided but it is required when building `A`.
        MissingMember,
    }
}

Constraint violations are always Rust enums, even if they only have one variant.

Constraint violations can occur in application code:

use my_server_sdk::model;

let res = model::a::Builder::default().build(); // We forgot to set `member`.

match res {
    Ok(a) => { ... },
    Err(e) => {
        assert_eq!(model::a::ConstraintViolation::MissingMember, e);
    }
}

Problem

Currently, the constraint violation types we generate are used by both:

  1. the server framework upon request deserialization; and
  2. by users in application code.

However, the kinds of constraint violations that can occur in application code can sometimes be a strict subset of those that can occur during request deserialization.

Consider the following model:

@length(min: 1, max: 69)
map LengthMap {
    key: String,
    value: LengthString
}

@length(min: 2, max: 69)
string LengthString

This produces:

pub struct LengthMap(
    pub(crate) std::collections::HashMap<std::string::String, crate::model::LengthString>,
);

impl
    std::convert::TryFrom<
        std::collections::HashMap<std::string::String, crate::model::LengthString>,
    > for LengthMap
{
    type Error = crate::model::length_map::ConstraintViolation;

    /// Constructs a `LengthMap` from an
    /// [`std::collections::HashMap<std::string::String,
    /// crate::model::LengthString>`], failing when the provided value does not
    /// satisfy the modeled constraints.
    fn try_from(
        value: std::collections::HashMap<std::string::String, crate::model::LengthString>,
    ) -> Result<Self, Self::Error> {
        let length = value.len();
        if (1..=69).contains(&length) {
            Ok(Self(value))
        } else {
            Err(crate::model::length_map::ConstraintViolation::Length(length))
        }
    }
}

pub mod length_map {
    pub enum ConstraintViolation {
        Length(usize),
        Value(
            std::string::String,
            crate::model::length_string::ConstraintViolation,
        ),
    }
    ...
}

Observe how the ConstraintViolation::Value variant is never constructed. Indeed, it is impossible for this variant to be constructed in application code: a user has to provide the try_from constructor with a map whose values are already constrained LengthStrings, so the constructor only needs to enforce the map's @length trait.

The reason why these seemingly "impossible violations" are being generated is because they can arise during request deserialization. Indeed, the server framework deserializes requests into fully unconstrained types. These are types holding unconstrained types all the way through their closures. For instance, in the case of structure shapes, builder types (the unconstrained type corresponding to the structure shape) hold builders all the way down.

In the case of the above model, below is the alternate pub(crate) constructor the server framework uses upon deserialization. Observe how LengthMapOfLengthStringsUnconstrained is fully unconstrained and how the try_from constructor can yield ConstraintViolation::Value.

pub(crate) mod length_map_of_length_strings_unconstrained {
    #[derive(Debug, Clone)]
    pub(crate) struct LengthMapOfLengthStringsUnconstrained(
        pub(crate) std::collections::HashMap<std::string::String, std::string::String>,
    );

    impl std::convert::TryFrom<LengthMapOfLengthStringsUnconstrained>
        for crate::model::LengthMapOfLengthStrings
    {
        type Error = crate::model::length_map_of_length_strings::ConstraintViolation;
        fn try_from(value: LengthMapOfLengthStringsUnconstrained) -> Result<Self, Self::Error> {
            let res: Result<
                std::collections::HashMap<std::string::String, crate::model::LengthString>,
                Self::Error,
            > = value
                .0
                .into_iter()
                .map(|(k, v)| {
                    let v: crate::model::LengthString = v
                        .try_into()
                        .map_err(|inner| Self::Error::Value(k.clone(), inner))?;

                    Ok((k, v))
                })
                .collect();
            let hm = res?;
            Self::try_from(hm)
        }
    }
}

In conclusion, the user is currently exposed to an internal detail of how the framework operates that has no bearing on their application code. They shouldn't be exposed to impossible constraint violation variants in their Rust docs, nor have to match on these variants when handling errors.

Note: this comment alludes to the problem described above.

Solution proposal

The problem can be mitigated by adding #[doc(hidden)] to the internal variants and #[non_exhaustive] to the enum. We're already doing this in some constraint violation types.
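
For illustration, a hedged sketch of that mitigation applied to the LengthMap violation enum from above (not necessarily the exact generated output):

#[non_exhaustive]
pub enum ConstraintViolation {
    Length(usize),
    #[doc(hidden)]
    Value(
        std::string::String,
        crate::model::length_string::ConstraintViolation,
    ),
}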

However, a "less leaky" solution is achieved by splitting the constraint violation type into two types, which this RFC proposes:

  1. one for use by the framework, with pub(crate) visibility, named ConstraintViolationException; and
  2. one for use by user application code, with pub visibility, named ConstraintViolation.

pub mod length_map {
    pub enum ConstraintViolation {
        Length(usize),
    }
    pub (crate) enum ConstraintViolationException {
        Length(usize),
        Value(
            std::string::String,
            crate::model::length_string::ConstraintViolation,
        ),
    }
}

Note that, to some extent, the spirit of this approach is already currently present in the case of builder types when publicConstrainedTypes is set to false:

  1. ServerBuilderGenerator.kt renders the usual builder type that enforces constraint traits, setting its visibility to pub (crate), for exclusive use by the framework.
  2. ServerBuilderGeneratorWithoutPublicConstrainedTypes.kt renders the builder type the user is exposed to: this builder does not take in constrained types and does not enforce all modeled constraints.

Collecting constraint violations

Background

Constrained operations are currently required to have smithy.framework#ValidationException as a member in their errors property. This is the shape that is rendered in responses when a request contains data that violates the modeled constraints.

The shape is defined in the smithy-validation-model Maven package, as follows:

$version: "2.0"

namespace smithy.framework

/// A standard error for input validation failures.
/// This should be thrown by services when a member of the input structure
/// falls outside of the modeled or documented constraints.
@error("client")
structure ValidationException {

    /// A summary of the validation failure.
    @required
    message: String,

    /// A list of specific failures encountered while validating the input.
    /// A member can appear in this list more than once if it failed to satisfy multiple constraints.
    fieldList: ValidationExceptionFieldList
}

/// Describes one specific validation failure for an input member.
structure ValidationExceptionField {
    /// A JSONPointer expression to the structure member whose value failed to satisfy the modeled constraints.
    @required
    path: String,

    /// A detailed description of the validation failure.
    @required
    message: String
}

list ValidationExceptionFieldList {
    member: ValidationExceptionField
}

It was mentioned in the constraint traits RFC, and is implicit in the definition of Smithy's smithy.framework.ValidationException shape, that server frameworks should respond to the client with a complete collection of the errors encountered during constraint trait enforcement.

Problem

As of writing, the TryFrom constructor of constrained types whose shapes have more than one constraint trait attached can only yield a single error. For example, the following shape:

@pattern("[a-f0-5]*")
@length(min: 5, max: 10)
string LengthPatternString

Yields:

pub struct LengthPatternString(pub(crate) std::string::String);

impl LengthPatternString {
    fn check_length(
        string: &str,
    ) -> Result<(), crate::model::length_pattern_string::ConstraintViolation> {
        let length = string.chars().count();

        if (5..=10).contains(&length) {
            Ok(())
        } else {
            Err(crate::model::length_pattern_string::ConstraintViolation::Length(length))
        }
    }

    fn check_pattern(
        string: String,
    ) -> Result<String, crate::model::length_pattern_string::ConstraintViolation> {
        let regex = Self::compile_regex();

        if regex.is_match(&string) {
            Ok(string)
        } else {
            Err(crate::model::length_pattern_string::ConstraintViolation::Pattern(string))
        }
    }

    pub fn compile_regex() -> &'static regex::Regex {
        static REGEX: once_cell::sync::Lazy<regex::Regex> = once_cell::sync::Lazy::new(|| {
            regex::Regex::new(r#"[a-f0-5]*"#).expect(r#"The regular expression [a-f0-5]* is not supported by the `regex` crate; feel free to file an issue under https://github.com/smithy-lang/smithy-rs/issues for support"#)
        });

        &REGEX
    }
}

impl std::convert::TryFrom<std::string::String> for LengthPatternString {
    type Error = crate::model::length_pattern_string::ConstraintViolation;

    /// Constructs a `LengthPatternString` from an [`std::string::String`],
    /// failing when the provided value does not satisfy the modeled constraints.
    fn try_from(value: std::string::String) -> Result<Self, Self::Error> {
        Self::check_length(&value)?;

        let value = Self::check_pattern(value)?;

        Ok(Self(value))
    }
}

Observe how a failure to adhere to the @length trait will short-circuit the evaluation of the constructor, even though the value might also fail to adhere to the @pattern trait.

Similarly, constrained structures fail upon encountering the first member that violates a constraint.

Additionally, in framework request deserialization code:

  • collections whose members are constrained fail upon encountering the first member that violates the constraint,
  • maps whose keys and/or values are constrained fail upon encountering the first violation; and
  • structures whose members are constrained fail upon encountering the first member that violates the constraint.

In summary, any shape that is transitively constrained yields types whose constructors (both the internal one and the user-facing one) currently short-circuit upon encountering the first violation.

Solution proposal

The deserializing architecture lends itself to be easily refactored so that we can collect constraint violations before returning them. Indeed, note that deserializers enforce constraint traits in a two-step phase: first, the entirety of the unconstrained value is deserialized, then constraint traits are enforced by feeding the entire value to the TryFrom constructor.

Let's consider a ConstraintViolations type (note the plural) that represents a collection of constraint violations that can occur within user application code. Roughly:

pub struct ConstraintViolations<T>(pub(crate) Vec<T>);

impl<T> IntoIterator<Item = T> for ConstraintViolations<T> { ... }

impl std::convert::TryFrom<std::string::String> for LengthPatternString {
    type Error = ConstraintViolations<crate::model::length_pattern_string::ConstraintViolation>;

    fn try_from(value: std::string::String) -> Result<Self, Self::Error> {
        // Check constraints and collect violations.
        ...
    }
}
  • The main reason for wrapping a vector in ConstraintViolations as opposed to directly returning the vector is forwards-compatibility: we may want to expand ConstraintViolations with conveniences.
  • If the constrained type can only ever yield a single violation, we will dispense with ConstraintViolations and keep directly returning the crate::model::shape_name::ConstraintViolation type.

We will analogously introduce a ConstraintViolationExceptions type that represents a collection of constraint violations that can occur within the framework's request deserialization code. This type will be pub(crate) and will be the one the framework will map to Smithy's ValidationException that eventually gets serialized into the response.
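
The exact mapping code is out of scope for this RFC, but a minimal, self-contained sketch of the idea (every type and function name below is illustrative, not the generated API) could look like:

struct ValidationExceptionField {
    path: String,
    message: String,
}

struct ValidationException {
    message: String,
    field_list: Vec<ValidationExceptionField>,
}

struct ConstraintViolationExceptions<T>(Vec<T>);

/// Hypothetical: each collected violation knows how to describe itself as a field.
trait AsValidationExceptionField {
    fn as_validation_exception_field(&self, path: &str) -> ValidationExceptionField;
}

fn into_validation_exception<T: AsValidationExceptionField>(
    violations: ConstraintViolationExceptions<T>,
) -> ValidationException {
    let field_list: Vec<ValidationExceptionField> = violations
        .0
        .iter()
        .map(|violation| violation.as_validation_exception_field("/"))
        .collect();
    ValidationException {
        message: format!("{} constraint violation(s) occurred", field_list.len()),
        field_list,
    }
}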

Collecting constraint violations may constitute a DOS attack vector

This is a problem that already exists as of writing, but that collecting constraint violations highlights, so it is a good opportunity, from a pedagogical perspective, to explain it here. Consider the following model:

@length(max: 3)
list ListOfPatternStrings {
    member: PatternString
}

@pattern("expensive regex to evaluate")
string PatternString

Our implementation currently enforces constraints from the leaf to the root: when enforcing the @length constraint, the TryFrom constructor the server framework uses gets a Vec<String> and first checks the members adhere to the @pattern trait, and only after is the @length trait checked. This means that if a client sends a request with n >>> 3 list members, the expensive check runs n times, when a constant-time check inspecting the length of the input vector would have sufficed to reject the request. Additionally, we may want to avoid serializing n ValidationExceptionFields due to performance concerns.

  1. One possibility to circumvent this is to make the @length validator special, having it bound the other validators by effectively permuting the order of the checks so that validation can short-circuit.
    • In general, it's unclear what constraint traits should cause short-circuiting. A probably reasonable rule of thumb is to include traits that can be attached directly to aggregate shapes: as of writing, that would be @uniqueItems on list shapes and @length on list shapes.
  2. Another possibility is to do nothing and value complete validation exception response messages over trying to mitigate this with special handling. One could argue that these kinds of DoS attack vectors should be taken care of with a separate solution, e.g. a layer that bounds a request body's size to a reasonable default (see how Axum added this). We will provide a similar request body limiting mechanism regardless.

This RFC advocates for implementing the first option, arguing that it's fair to say that the framework should return an error that is as informative as possible, but it doesn't necessarily have to be complete. However, we will also write a layer, applied by default to all server SDKs, that bounds a request body's size to a reasonable (yet high) default. Relying on users to manually apply the layer is dangerous, since such a configuration is trivially exploitable. Users can always manually apply the layer again to their resulting service if they want to further restrict a request's body size.
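
As a rough illustration of the body-limiting idea (this is not the layer the RFC commits to writing; it assumes the tower and tower-http crates and an already-built service app):

use tower::Layer;
use tower_http::limit::{RequestBodyLimit, RequestBodyLimitLayer};

/// Wrap any tower service so that request bodies larger than 2 MiB are
/// rejected before they ever reach the deserializers.
fn bound_request_body_size<S>(app: S) -> RequestBodyLimit<S> {
    RequestBodyLimitLayer::new(2 * 1024 * 1024).layer(app)
}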

"Tightness" of constraint violations

Problem

ConstraintViolationExceptions is not "tight" in that there's nothing in the type system that indicates to the user, when writing the custom validation error mapping function, that the iterator will not return a sequence of ConstraintViolationExceptions that is actually impossible to occur in practice.

Recall that ConstraintViolationExceptions are enums that model both direct constraint violations as well as transitive ones. For example, given the model:

@length(min: 1, max: 69)
map LengthMap {
    key: String,
    value: LengthString
}

@length(min: 2, max: 69)
string LengthString

The corresponding ConstraintViolationException Rust type for the LengthMap shape is:

pub mod length_map {
    pub enum ConstraintViolation {
        Length(usize),
    }
    pub (crate) enum ConstraintViolationException {
        Length(usize),
        Value(
            std::string::String,
            crate::model::length_string::ConstraintViolationException,
        ),
    }
}

ConstraintViolationExceptions is just a container over this type:

pub struct ConstraintViolationExceptions<T>(pub(crate) Vec<T>);

impl<T> IntoIterator<Item = T> for ConstraintViolationExceptions<T> { ... }

There might be multiple map values that fail to adhere to the constraints in LengthString, which would make the iterator yield multiple length_map::ConstraintViolationException::Values; however, at most one length_map::ConstraintViolationException::Length can be yielded in practice. This might be obvious to the service owner when inspecting the model and the Rust docs, but it's not expressed in the type system.

The above tightness problem has been formulated in terms of ConstraintViolationExceptions, because the fact that ConstraintViolationExceptions contain transitive constraint violations highlights the tightness problem. Note, however, that the tightness problem also afflicts ConstraintViolations.

Indeed, consider the following model:

@pattern("[a-f0-5]*")
@length(min: 5, max: 10)
string LengthPatternString

This would yield:

pub struct ConstraintViolations<T>(pub(crate) Vec<T>);

impl<T> IntoIterator<Item = T> for ConstraintViolations<T> { ... }

pub mod length_pattern_string {
    pub enum ConstraintViolation {
        Length(usize),
        Pattern(String)
    }
}

impl std::convert::TryFrom<std::string::String> for LengthPatternString {
    type Error = ConstraintViolations<crate::model::length_pattern_string::ConstraintViolation>;

    fn try_from(value: std::string::String) -> Result<Self, Self::Error> {
        // Check constraints and collect violations.
        ...
    }
}

Observe how the iterator of an instance of ConstraintViolations<crate::model::length_pattern_string::ConstraintViolation> may, a priori, yield e.g. the length_pattern_string::ConstraintViolation::Length variant twice, when it's clear that the iterator should contain at most one of each of length_pattern_string::ConstraintViolation's variants.

Final solution proposal

We propose a tighter API design.

  1. We replace the enums with structs whose members are all Option-al, representing all the constraint violations that can occur.
  2. For list shapes and map shapes:
    1. we implement IntoIterator on an additional struct Members representing only the violations that can occur on the collection's members.
    2. we add a non Option-al field to the struct representing the constraint violations of type Members.

Let's walk through an example. Take the last model:

@pattern("[a-f0-5]*")
@length(min: 5, max: 10)
string LengthPatternString

This would yield, as per the first substitution:

pub mod length_pattern_string {
    pub struct ConstraintViolations {
        pub length: Option<constraint_violation::Length>,
        pub pattern: Option<constraint_violation::Pattern>,
    }

    pub mod constraint_violation {
        pub struct Length(usize);
        pub struct Pattern(String);
    }
}

impl std::convert::TryFrom<std::string::String> for LengthPatternString {
    type Error = length_pattern_string::ConstraintViolations;

    // The error type returned by this constructor, `ConstraintViolations`,
    // will always have _at least_ one member set.
    fn try_from(value: std::string::String) -> Result<Self, Self::Error> {
        // Check constraints and collect violations.
        ...
    }
}
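
A hedged sketch of what such a collecting constructor body could look like, reusing the check logic and compile_regex helper from the earlier snippet (the actual generated code may differ):

// Conceptually lives alongside the generated code, where the tuple struct
// fields above are visible.
fn try_from(
    value: std::string::String,
) -> Result<LengthPatternString, length_pattern_string::ConstraintViolations> {
    let length = value.chars().count();
    let length_violation = if (5..=10).contains(&length) {
        None
    } else {
        Some(length_pattern_string::constraint_violation::Length(length))
    };

    let pattern_violation = if LengthPatternString::compile_regex().is_match(&value) {
        None
    } else {
        Some(length_pattern_string::constraint_violation::Pattern(value.clone()))
    };

    if length_violation.is_none() && pattern_violation.is_none() {
        Ok(LengthPatternString(value))
    } else {
        Err(length_pattern_string::ConstraintViolations {
            length: length_violation,
            pattern: pattern_violation,
        })
    }
}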

We now expand the model to highlight the second step of the algorithm:

@length(min: 1, max: 69)
map LengthMap {
    key: String,
    value: LengthString
}

This gives us:

pub mod length_map {
    pub struct ConstraintViolations {
        pub length: Option<constraint_violation::Length>,

        // Would be `Option<T>` in the case of an aggregate shape that is _not_ a
        // list shape or a map shape.
        pub member_violations: constraint_violation::Members,
    }

    pub mod constraint_violation {
        // Note that this could now live outside the `length_map` module and be
        // reused across all `@length`-constrained shapes, if we expanded it with
        // another `usize` indicating the _modeled_ value in the `@length` trait; by
        // keeping it inside `length_map` we can hardcode that value in the
        // implementation of e.g. error messages.
        pub struct Length(usize);

        pub struct Members(pub(crate) Vec<Member>);

        pub struct Member {
            // If the map's key shape were constrained, we'd have a `key`
            // field here too.

            value: Option<Value>
        }

        pub struct Value(
            std::string::String,
            crate::model::length_string::ConstraintViolation,
        );

        impl IntoIterator<Item = Member> for Members { ... }
    }
}

The above examples have featured the tight API design with ConstraintViolations. Of course, we will apply the same design in the case of ConstraintViolationExceptions. For the sake of completeness, let's expand our model yet again with a structure shape:

structure A {
    @required
    member: String,

    @required
    length_map: LengthMap,
}

And this time let's feature both the resulting ConstraintViolationExceptions and ConstraintViolations types:

pub mod a {
    pub struct ConstraintViolationExceptions {
        // All fields must be `Option`, despite the members being `@required`,
        // since no violations for their values might have occurred.

        pub missing_member_exception: Option<constraint_violation_exception::MissingMember>,
        pub missing_length_map_exception: Option<constraint_violation_exception::MissingLengthMap>,
        pub length_map_exceptions: Option<crate::model::length_map::ConstraintViolationExceptions>,
    }

    pub mod constraint_violation_exception {
        pub struct MissingMember;
        pub struct MissingLengthMap;
    }

    pub struct ConstraintViolations {
        pub missing_member: Option<constraint_violation::MissingMember>,
        pub missing_length_map: Option<constraint_violation::MissingLengthMap>,
    }

    pub mod constraint_violation {
        pub struct MissingMember;
        pub struct MissingLengthMap;
    }
}

As can be intuited, the only differences are that:

  • ConstraintViolationExceptions hold transitive violations while ConstraintViolations only need to expose direct violations (as explained in the Impossible constraint violations section),
  • ConstraintViolationExceptions have members suffixed with _exception, and the corresponding module name is suffixed likewise.

Note that while the constraint violation (exception) type names are plural, the module names are always singular.

We also make a conscious decision, in the case of structure shapes, to make the types of all members Option-al, for simplicity. Another choice would have been to make length_map_exceptions non Option-al and, in the case where no violations in LengthMap values occurred, set length_map::ConstraintViolations::length to None and have length_map::ConstraintViolations::member_violations eventually yield an empty iterator. However, it's best that we use the expressiveness of Options at the earliest ("highest" in the shape hierarchy) opportunity: if a member is Some, it means it (eventually) reaches data.

Checklist

Unfortunately, while this RFC could be implemented iteratively (i.e. solve each of the problems in turn), it would introduce too much churn and throwaway work: solving the tightness problem requires a more or less complete overhaul of the constraint violations code generator. It's best that all three problems be solved in the same changeset.

  • Generate ConstraintViolations and ConstraintViolationExceptions types so as to not reify impossible constraint violations, add the ability to collect constraint violations, and solve the "tightness" problem of constraint violations.
  • Special-case generated request deserialization code for operations using @length and @uniqueItems constrained shapes whose closures reach other constrained shapes so that the validators for these two traits short-circuit upon encountering a number of inner constraint violations above a certain threshold.
  • Write and expose a layer, applied by default to all generated server SDKs, that bounds a request body's size to a reasonable (yet high) default, to prevent trivial DoS attacks.

RFC: Improving access to request IDs in SDK clients

Status: Implemented in #2129

Applies to: AWS SDK clients

At time of writing, customers can retrieve a request ID in one of four ways in the Rust SDK:

  1. For error cases where the response parsed successfully, the request ID can be retrieved via accessor method on operation error. This also works for unmodeled errors so long as the response parsing succeeds.
  2. For error cases where a response was received but parsing fails, the response headers can be retrieved from the raw response on the error, but customers have to manually extract the request ID from those headers (there's no convenient accessor method).
  3. For all error cases where the request ID header was sent in the response, customers can call SdkError::into_service_error to transform the SdkError into an operation error, which has a request_id accessor on it.
  4. For success cases, the customer can't retrieve the request ID at all if they use the fluent client. Instead, they must manually make the operation and call the underlying Smithy client so that they have access to SdkSuccess, which provides the raw response where the request ID can be manually extracted from headers.

Only one of these mechanisms is convenient and ergonomic. The rest need considerable improvements. Additionally, the request ID should be attached to tracing events where possible so that enabling debug logging reveals the request IDs without any code changes being necessary.

This RFC proposes changes to make the request ID easier to access.

Terminology

  • Request ID: A unique identifier assigned to and associated with a request to AWS that is sent back in the response headers. This identifier is useful to customers when requesting support.
  • Operation Error: Operation errors are code generated for each operation in a Smithy model. They are an enum of every possible modeled error that that operation can respond with, as well as an Unhandled variant for any unmodeled or unrecognized errors.
  • Modeled Errors: Any error that is represented in a Smithy model with the @error trait.
  • Unmodeled Errors: Errors that a service responds with that do not appear in the Smithy model.
  • SDK Clients: Clients generated for the AWS SDK, including "adhoc" or "one-off" clients.
  • Smithy Clients: Any clients not generated for the AWS SDK, excluding "adhoc" or "one-off" clients.

SDK/Smithy Purity

Before proposing any changes, the topic of purity needs to be covered. Request IDs are not currently a Smithy concept. However, at time of writing, the request ID concept is leaked into the non-SDK rust runtime crates and generated code via the generic error struct and the request_id functions on generated operation errors (e.g., GetObjectError example in S3).

This RFC attempts to remove these leaks from Smithy clients.

Proposed Changes

First, we'll explore making it easier to retrieve a request ID from errors, and then look at making it possible to retrieve them from successful responses. To see the customer experience of these changes, see the Example Interactions section below.

Make request ID retrieval on errors consistent

One could argue that customers being able to convert a SdkError into an operation error that has a request ID on it is sufficient. However, there's no way to write a function that takes an error from any operation and logs a request ID, so it's still not ideal.

The aws-http crate needs to have a RequestId trait on it to facilitate generic request ID retrieval:

pub trait RequestId {
    /// Returns the request ID if it's available.
    fn request_id(&self) -> Option<&str>;
}

This trait will be implemented for SdkError in aws-http where it is declared, complete with logic to pull the request ID header out of the raw HTTP responses (it will always return None for event stream Message responses; an additional trait may need to be added to aws-smithy-http to facilitate access to the headers). This logic will try different request ID header names in order of probability since AWS services have a couple of header name variations. x-amzn-requestid is the most common, with x-amzn-request-id being the second most common.
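
For illustration, the header lookup could be sketched like this (the helper name is hypothetical, and the real implementation may check additional header names):

use http::HeaderMap;

/// Check the most common request ID header names in order of probability.
fn extract_request_id(headers: &HeaderMap) -> Option<&str> {
    headers
        .get("x-amzn-requestid")
        .or_else(|| headers.get("x-amzn-request-id"))
        .and_then(|value| value.to_str().ok())
}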

aws-http will also implement RequestId for aws_smithy_types::error::Error, and the request_id method will be removed from aws_smithy_types::error::Error. Places that construct Error will place the request ID into its extras field, where the RequestId trait implementation can retrieve it.

A codegen decorator will be added to sdk-codegen to implement RequestId for operation errors, and the existing request_id accessors will be removed from CombinedErrorGenerator in codegen-core.

With these changes, customers can directly access request IDs from SdkError and operations errors by importing the RequestId trait. Additionally, the Smithy/SDK purity is improved since both places where request IDs are leaked to Smithy clients will be resolved.

Implement RequestId for outputs

To make it possible to retrieve request IDs when using the fluent client, the new RequestId trait can be implemented for outputs.

Some services (e.g., Transcribe Streaming) model the request ID header in their outputs, while other services (e.g., Directory Service) model a request ID field on errors. In some cases, services take RequestId as a modeled input (e.g., IoT Event Data). It follows that it is possible, but unlikely, that a service could have a field named RequestId that is not the same concept in the future.

Thus, name collisions are going to be a concern for putting a request ID accessor on output. However, if it is implemented as a trait, then this concern is partially resolved. In the vast majority of cases, importing RequestId will provide the accessor without any confusion. In cases where it is already modeled and is the same concept, customers will likely just use it and not even realize they didn't import the trait. The only concern is future cases where it is modeled as a separate concept, and as long as customers don't import RequestId for something else in the same file, that confusion can be avoided.

In order to implement RequestId for outputs, either the original response needs to be stored on the output, or the request ID needs to be extracted earlier and stored on the output. The latter will lead to a small amount of header lookup code duplication.

In either case, the StructureGenerator needs to be customized in sdk-codegen (Appendix B outlines an alternative approach to this and why it was dismissed). This will be done by adding customization hooks to StructureGenerator similar to the ones for ServiceConfigGenerator so that a sdk-codegen decorator can conditionally add fields and functions to any generated structs. A hook will also be needed to add additional trait impl blocks.

Once the hooks are in place, a decorator will be added to store either the original response or the request ID on outputs, and then the RequestId trait will be implemented for them. The ParseResponse trait implementation will be customized to populate this new field.

Note: To avoid name collisions of the request ID or response on the output struct, these fields can be prefixed with an underscore. It shouldn't be possible for SDK fields to code generate with this prefix given the model validation rules in place.

Implement RequestId for Operation and operation::Response

In the case that a customer wants to ditch the fluent client, it should still be easy to retrieve a request ID. To do this, aws-http will provide RequestId implementations for Operation and operation::Response. These implementations will likely make the other RequestId implementations easier to implement as well.

Implement RequestId for Result

The Result returned by the SDK should directly implement RequestId when both its Ok and Err variants implement RequestId. This will make it possible for a customer to feed the return value from send() directly to a request ID logger.
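
That blanket implementation could look roughly like the following (a sketch, not necessarily the final code):

impl<O, E> RequestId for Result<O, E>
where
    O: RequestId,
    E: RequestId,
{
    fn request_id(&self) -> Option<&str> {
        match self {
            Ok(output) => output.request_id(),
            Err(err) => err.request_id(),
        }
    }
}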

Example Interactions

Generic Handling Case

// A re-export of the RequestId trait
use aws_sdk_service::primitives::RequestId;

fn my_request_id_logging_fn(request_id: &dyn RequestId) {
    println!("request ID: {:?}", request_id.request_id());
}

let result = client.some_operation().send().await?;
my_request_id_logging_fn(&result);

Success Case

use aws_sdk_service::primitives::RequestId;

let output = client.some_operation().send().await?;
println!("request ID: {:?}", output.request_id());

Error Case with SdkError

use aws_sdk_service::primitives::RequestId;

match client.some_operation().send().await {
    Ok(_) => { /* handle OK */ }
    Err(err) => {
        println!("request ID: {:?}", err.request_id());
    }
}

Error Case with operation error

use aws_sdk_service::primitives::RequestId;

match client.some_operation().send().await {
    Ok(_) => { /* handle OK */ }
    Err(err) => match err.into_service_error() {
        err @ SomeOperationError::SomeError(_) => { println!("request ID: {:?}", err.request_id()); }
        _ => { /* don't care */ }
    }
}

Changes Checklist

  • Create the RequestId trait in aws-http
  • Implement for errors
    • Implement RequestId for SdkError in aws-http
    • Remove request_id from aws_smithy_types::error::Error, and store request IDs in its extras instead
    • Implement RequestId for aws_smithy_types::error::Error in aws-http
    • Remove generation of request_id accessors from CombinedErrorGenerator in codegen-core
  • Implement for outputs
    • Add customization hooks to StructureGenerator
    • Add customization hook to ParseResponse
    • Add customization hook to HttpBoundProtocolGenerator
    • Customize output structure code gen in sdk-codegen to add either a request ID or a response field
    • Customize ParseResponse in sdk-codegen to populate the outputs
  • Implement RequestId for Operation and operation::Response
  • Implement RequestId for Result<O, E> where O and E both implement RequestId
  • Re-export RequestId in generated crates
  • Add integration tests for each request ID access point

Appendix A: Alternate solution for access on successful responses

Alternatively, for successful responses, a second send method (that is difficult to name) would be added to the fluent client that has a return value including both the output and the request ID (or the entire response).

This solution was dismissed due to difficulty naming, and the risk of name collision.

Appendix B: Adding RequestId as a string to outputs via model transform

The request ID could be stored on outputs by doing a model transform in sdk-codegen to add a RequestId member field. However, this causes problems when an output already has a RequestId field, and requires the addition of a synthetic trait to skip binding the field in the generated serializers/deserializers.

Smithy Orchestrator

Status: Implemented

Applies to: The smithy client

This RFC proposes a new process for constructing client requests and handling service responses. This new process is intended to:

  • Improve the user experience by
    • Simplifying several aspects of sending a request
    • Adding more extension points to the request/response lifecycle
  • Improve the maintainer experience by
    • Making our SDK more similar in structure to other AWS SDKs
    • Simplifying many aspects of the request/response lifecycle
    • Making room for future changes

Additionally, functionality that the SDKs currently provide, like retries, logging, and auth, will be incorporated into this new process in such a way as to make it more configurable and understandable.

This RFC references but is not the source of truth on:

  • Interceptors: To be described in depth in a future RFC.
  • Runtime Plugins: To be described in depth in a future RFC.

TLDR;

When a smithy client communicates with a smithy service, messages are handled by an "orchestrator." The orchestrator runs in two main phases:

  1. Constructing configuration.
    • This process is user-configurable with "runtime plugins."
    • Configuration is stored in a typemap.
  2. Transforming a client request into a server response.
    • This process is user-configurable with "interceptors."
    • Interceptors are functions that are run by "hooks" in the request/response lifecycle.

Terminology

  • SDK Client: A high-level abstraction allowing users to make requests to remote services.
  • Remote Service: A remote API that a user wants to use. Communication with a remote service usually happens over HTTP. The remote service is usually, but not necessarily, an AWS service.
  • Operation: A high-level abstraction representing an interaction between an SDK Client and a remote service.
  • Input Message: A modeled request passed into an SDK client. For example, S3’s ListObjectsRequest.
  • Transport Request Message: A message that can be transmitted to a remote service. For example, an HTTP request.
  • Transport Response Message: A message that can be received from a remote service. For example, an HTTP response.
  • Output Message: A modeled response or exception returned to an SDK client caller. For example, S3’s ListObjectsResponse or NoSuchBucketException.
  • The request/response lifecycle: The process by which an SDK client makes requests and receives responses from a remote service. This process is enacted and managed by the orchestrator.
  • Orchestrator: The code within an SDK client that handles the process of making requests and receiving responses from remote services. The orchestrator is configurable by modifying the runtime plugins it's built from. The orchestrator is responsible for calling interceptors at the appropriate times in the request/response lifecycle.
  • Interceptor/Hook: A generic extension point within the orchestrator. Supports "anything that someone should be able to do", NOT "anything anyone might want to do". These hooks are:
    • Either read-only or read/write.
    • Able to read and modify the Input, Transport Request, Transport Response, or Output messages.
  • Runtime Plugin: Runtime plugins are similar to interceptors, but they act on configuration instead of requests and responses. Both users and services may define runtime plugins. Smithy also defines several default runtime plugins used by most clients. See the F.A.Q. for a list of plugins with descriptions.
  • ConfigBag: A typemap that's equivalent to http::Extensions. Used to store configuration for the orchestrator.

The user experience if this RFC is implemented

For many users, the changes described by this RFC will be invisible. Making a request with an orchestrator-based SDK client looks very similar to the way requests were made pre-RFC:

let sdk_config = aws_config::load_from_env().await;
let client = aws_sdk_s3::Client::new(&sdk_config);
let res = client.get_object()
    .bucket("a-bucket")
    .key("a-file.txt")
    .send()
    .await;

match res {
    Ok(res) => println!("success: {:?}", res),
    Err(err) => eprintln!("failure: {:?}", err)
};

Users may further configure clients and operations with runtime plugins, and they can modify requests and responses with interceptors. We'll examine each of these concepts in the following sections.

Service clients and operations are configured with runtime plugins

The exact implementation of runtime plugins is left for another RFC. That other RFC will be linked here once it's written. To get an idea of what they may look like, see the "Layered configuration, stored in type maps" section of this RFC.

Runtime plugins construct and modify client configuration. Plugin initialization is the first step of sending a request, and plugins set in later steps can override the actions of earlier plugins. Plugin ordering is deterministic and non-customizable.

While AWS services define a default set of plugins, users may define their own plugins, and set them by calling the appropriate methods on a service's config, client, or operation. Plugins are specifically meant for constructing service and operation configuration. If a user wants to define behavior that should occur at specific points in the request/response lifecycle, then they should instead consider defining an interceptor.

Requests and responses are modified by interceptors

Interceptors are similar to middlewares, in that they are functions that can read and modify request and response state. However, they are more restrictive than middlewares in that they can't modify the "control flow" of the request/response lifecycle. This is intentional. Interceptors can be registered on a client or operation, and the orchestrator is responsible for calling interceptors at the appropriate time. Users MUST NOT perform blocking IO within an interceptor. Interceptors are sync, and are not intended to perform large amounts of work. This makes them easier to reason about and use. Depending on when they are called, interceptors may read and modify input messages, transport request messages, transport response messages, and output messages. Additionally, all interceptors may write to a context object that is shared between all interceptors.

Currently supported hooks

  1. Read Before Execution (Read-Only): Before anything happens. This is the first thing the SDK calls during operation execution.
  2. Modify Before Serialization (Read/Write): Before the input message given by the customer is marshalled into a transport request message. Allows modifying the input message.
  3. Read Before Serialization (Read-Only): The last thing the SDK calls before marshaling the input message into a transport message.
  4. Read After Serialization (Read-Only): The first thing the SDK calls after marshaling the input message into a transport message.
  5. (Retry Loop)
    1. Modify Before Retry Loop (Read/Write): The last thing the SDK calls before entering the retry loop. Allows modifying the transport message.
    2. Read Before Attempt (Read-Only): The first thing the SDK calls “inside” of the retry loop.
    3. Modify Before Signing (Read/Write): Before the transport request message is signed. Allows modifying the transport message.
    4. Read Before Signing (Read-Only): The last thing the SDK calls before signing the transport request message.
    5. Read After Signing (Read-Only): The first thing the SDK calls after signing the transport request message.
    6. Modify Before Transmit (Read/Write): Before the transport request message is sent to the service. Allows modifying the transport message.
    7. Read Before Transmit (Read-Only): The last thing the SDK calls before sending the transport request message.
    8. Read After Transmit (Read-Only): The last thing the SDK calls after receiving the transport response message.
    9. Modify Before Deserialization (Read/Write): Before the transport response message is unmarshaled. Allows modifying the transport response message.
    10. Read Before Deserialization (Read-Only): The last thing the SDK calls before unmarshalling the transport response message into an output message.
    11. Read After Deserialization (Read-Only): The last thing the SDK calls after unmarshaling the transport response message into an output message.
    12. Modify Before Attempt Completion (Read/Write): Before the retry loop ends. Allows modifying the unmarshaled response (output message or error).
    13. Read After Attempt (Read-Only): The last thing the SDK calls “inside” of the retry loop.
  6. Modify Before Execution Completion (Read/Write): Before the execution ends. Allows modifying the unmarshaled response (output message or error).
  7. Read After Execution (Read-Only): After everything has happened. This is the last thing the SDK calls during operation execution.

Interceptor context

As mentioned above, interceptors may read/write a context object that is shared between all interceptors:

pub struct InterceptorContext<ModReq, TxReq, TxRes, ModRes> {
    // a.k.a. the input message
    modeled_request: ModReq,
    // a.k.a. the transport request message
    tx_request: Option<TxReq>,
    // a.k.a. the output message
    modeled_response: Option<ModRes>,
    // a.k.a. the transport response message
    tx_response: Option<TxRes>,
    // A type-keyed map
    properties: SharedPropertyBag,
}

The optional request and response types in the interceptor context can only be accessed by interceptors that are run after specific points in the request/response lifecycle. Rather than go into depth in this RFC, I leave that to a future "Interceptors RFC."

How to implement this RFC

Integrating with the orchestrator

Imagine we have some sort of request signer. This signer doesn't refer to any orchestrator types. All it needs is a HeaderMap along with two strings, and will return a signature in string form.

struct Signer;

impl Signer {
    fn sign(&self, headers: &http::HeaderMap, signing_name: &str, signing_region: &str) -> String {
        todo!()
    }
}

Now imagine things from the orchestrator's point of view. It requires something that implements an AuthOrchestrator trait, which will be responsible for resolving the correct auth scheme, identity, and signer for an operation, as well as signing the request:

pub trait AuthOrchestrator<Req>: Send + Sync + Debug {
    fn auth_request(&self, req: &mut Req, cfg: &ConfigBag) -> Result<(), BoxError>;
}

// And it calls that `AuthOrchestrator` like so:
fn invoke() {
    // code omitted for brevity

    // Get the request to be signed
    let tx_req_mut = ctx.tx_request_mut().expect("tx_request has been set");
    // Fetch the auth orchestrator from the bag
    let auth_orchestrator = cfg
        .get::<Box<dyn AuthOrchestrator<Req>>>()
        .ok_or("missing auth orchestrator")?;
    // Auth the request
    auth_orchestrator.auth_request(tx_req_mut, cfg)?;

    // code omitted for brevity
}

The specific implementation of the AuthOrchestrator is what brings these two things together:

struct Sigv4AuthOrchestrator;

impl AuthOrchestrator<http::Request<SdkBody>> for Sigv4AuthOrchestrator {
    fn auth_request(&self, req: &mut http::Request<SdkBody>, cfg: &ConfigBag) -> Result<(), BoxError> {
        let signer = Signer;
        let signing_name = cfg.get::<SigningName>().ok_or(Error::MissingSigningName)?;
        let signing_region = cfg.get::<SigningRegion>().ok_or(Error::MissingSigningRegion)?;
        let headers = req.headers_mut();

        let signature = signer.sign(headers, signing_name, signing_region);
        match cfg.get::<SignatureLocation>() {
            Some(SignatureLocation::Query) => req.query.set("sig", signature),
            Some(SignatureLocation::Header) => req.headers_mut().insert("sig", signature),
            None => return Err(Error::MissingSignatureLocation),
        };

        Ok(())
    }
}

This intermediate code should be free from as much logic as possible. Whenever possible, we must maintain this encapsulation. Doing so will make the Orchestrator more flexible, maintainable, and understandable.

Layered configuration, stored in type maps

Type map: A data structure where stored values are keyed by their type. Hence, only one value can be stored for a given type.

See typemap, type-map, http::Extensions, and actix_http::Extensions for examples.
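
A quick illustration of the type-map idea using http::Extensions directly (illustrative only; the Region type here is a stand-in):

use http::Extensions;

#[derive(Debug, Clone, PartialEq)]
struct Region(&'static str);

fn main() {
    let mut extensions = Extensions::new();
    extensions.insert(Region("us-east-1"));
    // Inserting another value of the same type replaces the previous one.
    extensions.insert(Region("eu-west-1"));
    assert_eq!(extensions.get::<Region>(), Some(&Region("eu-west-1")));
}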

 let conf: ConfigBag = aws_config::from_env()
    // Configuration can be common to all smithy clients
    .with(RetryConfig::builder().disable_retries().build())
    // Or, protocol-specific
    .with(HttpClient::builder().build())
    // Or, AWS-specific
    .with(Region::from("us-east-1"))
    // Or, service-specific
    .with(S3Config::builder().force_path_style(false).build())
    .await;

let client = aws_sdk_s3::Client::new(&conf);

client.list_buckets()
    .customize()
    // Configuration can be set on operations as well as clients
    .with(HttpConfig::builder().conn(some_other_conn).build())
    .send()
    .await;

Setting configuration that will not be used wastes memory and can make debugging more difficult. Therefore, configuration defaults are only set when they're relevant. For example, if a smithy service doesn't support HTTP, then no HTTP client will be set.

What is "layered" configuration?

Configuration has precedence. Configuration set on an operation will override configuration set on a client, and configuration set on a client will override default configuration. However, configuration with a higher precedence can also augment configuration with a lower precedence. For example:

let conf: ConfigBag = aws_config::from_env()
    .with(
        SomeConfig::builder()
            .option_a(1)
            .option_b(2)
            .option_c(3)
    )
    .build()
    .await;

let client = aws_sdk_s3::Client::new(&conf);

client.list_buckets()
    .customize()
    .with(
        SomeConfig::builder()
            .option_a(0)
            .option_b(Value::Inherit)
            .option_c(Value::Unset)
    )
    .build()
    .send()
    .await;

In the above example, when the option_a, option_b, and option_c values of SomeConfig are accessed, they'll return:

  • option_a: 0
  • option_b: 2
  • option_c: No value

Config values are wrapped in a special enum called Value with three variants:

  • Value::Set: A set value that will override values from lower layers.
  • Value::Unset: An explicitly unset value that will override values from lower layers.
  • Value::Inherit: An explicitly unset value that will inherit a value from a lower layer.

Builders are defined like this:

struct SomeBuilder<T> {
    value: Value<T>,
}

impl<T> SomeBuilder<T> {
    fn new() -> Self {
        // By default, config values inherit from lower-layer configs
        Self { value: Value::Inherit }
    }

    fn some_field(&mut self, value: impl Into<Value<T>>) -> &mut Self {
        self.value = value.into();
        self
    }
}

Because of impl Into<Value<T>>, users don't need to reference the Value enum unless they want to "unset" a value.
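
A hedged sketch of that conversion (the real Value type and its impls may differ):

enum Value<T> {
    Set(T),
    Unset,
    Inherit,
}

// Lets callers write `builder.some_field(42)` without naming `Value`;
// `Value::Unset` and `Value::Inherit` still have to be passed explicitly.
impl<T> From<T> for Value<T> {
    fn from(value: T) -> Self {
        Value::Set(value)
    }
}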

Layer separation and precedence

Codegen defines default sets of interceptors and runtime plugins at various "levels":

  1. AWS-wide defaults set by codegen.
  2. Service-wide defaults set by codegen.
  3. Operation-specific defaults set by codegen.

Likewise, users may mount their own interceptors and runtime plugins:

  1. The AWS config level, e.g. aws_types::Config.
  2. The service config level, e.g. aws_sdk_s3::Config.
  3. The operation config level, e.g. aws_sdk_s3::Client::get_object.

Configuration is resolved in a fixed manner by reading the "lowest level" of config available, falling back to "higher levels" only when no value has been set. Therefore, at least 3 separate ConfigBags are necessary, and user configuration has precedence over codegen-defined default configuration. With that in mind, resolution of configuration would look like this:

  1. Check user-set operation config.
  2. Check codegen-defined operation config.
  3. Check user-set service config.
  4. Check codegen-defined service config.
  5. Check user-set AWS config.
  6. Check codegen-defined AWS config.
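Expressed as code, this resolution order amounts to a first-match fallback chain. The function below is a sketch with hypothetical parameter names, not a real API:

/// Resolve a config value by checking each layer in precedence order and
/// returning the first value found. Each argument stands in for a lookup at
/// one of the six levels listed above.
fn resolve<'a, T>(
    user_operation: Option<&'a T>,
    codegen_operation: Option<&'a T>,
    user_service: Option<&'a T>,
    codegen_service: Option<&'a T>,
    user_aws: Option<&'a T>,
    codegen_aws: Option<&'a T>,
) -> Option<&'a T> {
    user_operation
        .or(codegen_operation)
        .or(user_service)
        .or(codegen_service)
        .or(user_aws)
        .or(codegen_aws)
}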

The aws-smithy-orchestrator crate

I've omitted some of the error conversion to shorten this example and make it easier to understand. The real version will be messier.

/// `In`: The input message e.g. `ListObjectsRequest`
/// `Req`: The transport request message e.g. `http::Request<SmithyBody>`
/// `Res`: The transport response message e.g. `http::Response<SmithyBody>`
/// `Out`: The output message. A `Result` containing either:
///     - The 'success' output message e.g. `ListObjectsResponse`
///     - The 'failure' output message e.g. `NoSuchBucketException`
pub async fn invoke<In, Req, Res, T>(
    input: In,
    interceptors: &mut Interceptors<In, Req, Res, Result<T, BoxError>>,
    runtime_plugins: &RuntimePlugins,
    cfg: &mut ConfigBag,
) -> Result<T, BoxError>
    where
        // The input must be Clone in case of retries
        In: Clone + 'static,
        Req: 'static,
        Res: 'static,
        T: 'static,
{
    let mut ctx: InterceptorContext<In, Req, Res, Result<T, BoxError>> =
        InterceptorContext::new(input);

    runtime_plugins.apply_client_configuration(cfg)?;
    interceptors.client_read_before_execution(&ctx, cfg)?;

    runtime_plugins.apply_operation_configuration(cfg)?;
    interceptors.operation_read_before_execution(&ctx, cfg)?;

    interceptors.read_before_serialization(&ctx, cfg)?;
    interceptors.modify_before_serialization(&mut ctx, cfg)?;

    let request_serializer = cfg
        .get::<Box<dyn RequestSerializer<In, Req>>>()
        .ok_or("missing serializer")?;
    let req = request_serializer.serialize_request(ctx.modeled_request_mut(), cfg)?;
    ctx.set_tx_request(req);

    interceptors.read_after_serialization(&ctx, cfg)?;
    interceptors.modify_before_retry_loop(&mut ctx, cfg)?;

    loop {
        make_an_attempt(&mut ctx, cfg, interceptors).await?;
        interceptors.read_after_attempt(&ctx, cfg)?;
        interceptors.modify_before_attempt_completion(&mut ctx, cfg)?;

        let retry_strategy = cfg
            .get::<Box<dyn RetryStrategy<Result<T, BoxError>>>>()
            .ok_or("missing retry strategy")?;
        let mod_res = ctx
            .modeled_response()
            .expect("it's set during 'make_an_attempt'");
        if retry_strategy.should_retry(mod_res, cfg)? {
            continue;
        }

        interceptors.modify_before_completion(&mut ctx, cfg)?;
        let trace_probe = cfg
            .get::<Box<dyn TraceProbe>>()
            .ok_or("missing trace probes")?;
        trace_probe.dispatch_events(cfg);
        interceptors.read_after_execution(&ctx, cfg)?;

        break;
    }

    let (modeled_response, _) = ctx.into_responses()?;
    modeled_response
}

// Making an HTTP request can fail for several reasons, but we still need to
// call lifecycle events when that happens. Therefore, we define this
// `make_an_attempt` function to make error handling simpler.
async fn make_an_attempt<In, Req, Res, T>(
    ctx: &mut InterceptorContext<In, Req, Res, Result<T, BoxError>>,
    cfg: &mut ConfigBag,
    interceptors: &mut Interceptors<In, Req, Res, Result<T, BoxError>>,
) -> Result<(), BoxError>
    where
        In: Clone + 'static,
        Req: 'static,
        Res: 'static,
        T: 'static,
{
    interceptors.read_before_attempt(ctx, cfg)?;

    let tx_req_mut = ctx.tx_request_mut().expect("tx_request has been set");
    let endpoint_orchestrator = cfg
        .get::<Box<dyn EndpointOrchestrator<Req>>>()
        .ok_or("missing endpoint orchestrator")?;
    endpoint_orchestrator.resolve_and_apply_endpoint(tx_req_mut, cfg)?;

    interceptors.modify_before_signing(ctx, cfg)?;
    interceptors.read_before_signing(ctx, cfg)?;

    let tx_req_mut = ctx.tx_request_mut().expect("tx_request has been set");
    let auth_orchestrator = cfg
        .get::<Box<dyn AuthOrchestrator<Req>>>()
        .ok_or("missing auth orchestrator")?;
    auth_orchestrator.auth_request(tx_req_mut, cfg)?;

    interceptors.read_after_signing(ctx, cfg)?;
    interceptors.modify_before_transmit(ctx, cfg)?;
    interceptors.read_before_transmit(ctx, cfg)?;

    // The connection consumes the request but we need to keep a copy of it
    // within the interceptor context, so we clone it here.
    let res = {
        let tx_req = ctx.tx_request_mut().expect("tx_request has been set");
        let connection = cfg
            .get::<Box<dyn Connection<Req, Res>>>()
            .ok_or("missing connector")?;
        connection.call(tx_req, cfg).await?
    };
    ctx.set_tx_response(res);

    interceptors.read_after_transmit(ctx, cfg)?;
    interceptors.modify_before_deserialization(ctx, cfg)?;
    interceptors.read_before_deserialization(ctx, cfg)?;
    let tx_res = ctx.tx_response_mut().expect("tx_response has been set");
    let response_deserializer = cfg
        .get::<Box<dyn ResponseDeserializer<Res, Result<T, BoxError>>>>()
        .ok_or("missing response deserializer")?;
    let res = response_deserializer.deserialize_response(tx_res, cfg)?;
    ctx.set_modeled_response(res);

    interceptors.read_after_deserialization(ctx, cfg)?;

    Ok(())
}

Traits

At various points in the execution of invoke, trait objects are fetched from the ConfigBag. These are preliminary definitions of those traits:

pub trait TraceProbe: Send + Sync + Debug {
    fn dispatch_events(&self, cfg: &ConfigBag) -> BoxFallibleFut<()>;
}

pub trait RequestSerializer<In, TxReq>: Send + Sync + Debug {
    fn serialize_request(&self, req: &mut In, cfg: &ConfigBag) -> Result<TxReq, BoxError>;
}

pub trait ResponseDeserializer<TxRes, Out>: Send + Sync + Debug {
    fn deserialize_response(&self, res: &mut TxRes, cfg: &ConfigBag) -> Result<Out, BoxError>;
}

pub trait Connection<TxReq, TxRes>: Send + Sync + Debug {
    fn call(&self, req: &mut TxReq, cfg: &ConfigBag) -> BoxFallibleFut<TxRes>;
}

pub trait RetryStrategy<Out>: Send + Sync + Debug {
    fn should_retry(&self, res: &Out, cfg: &ConfigBag) -> Result<bool, BoxError>;
}

pub trait AuthOrchestrator<Req>: Send + Sync + Debug {
    fn auth_request(&self, req: &mut Req, cfg: &ConfigBag) -> Result<(), BoxError>;
}

pub trait EndpointOrchestrator<Req>: Send + Sync + Debug {
    fn resolve_and_apply_endpoint(&self, req: &mut Req, cfg: &ConfigBag) -> Result<(), BoxError>;
    fn resolve_auth_schemes(&self) -> Result<Vec<String>, BoxError>;
}

F.A.Q.

  • The orchestrator is a large and complex feature, with many moving parts. How can we ensure that multiple people can contribute in parallel?
    • By defining the entire orchestrator and agreeing on its structure, we can then move on to working on individual runtime plugins and interceptors.
  • What is the precedence of interceptors?
    • The precedence of interceptors is as follows:
      • Interceptors registered via Smithy default plugins.
      • (AWS Services only) Interceptors registered via AWS default plugins.
      • Interceptors registered via service-customization plugins.
      • Interceptors registered via client-level plugins.
      • Interceptors registered via client-level configuration.
      • Interceptors registered via operation-level plugins.
      • Interceptors registered via operation-level configuration.
  • What runtime plugins will be defined in smithy-rs?
    • RetryStrategy: Configures how requests are retried.
    • TraceProbes: Configures locations to which SDK metrics are published.
    • EndpointProviders: Configures which hostname an SDK will call when making a request.
    • HTTPClients: Configures how remote services are called.
    • IdentityProviders: Configures how customers identify themselves to remote services.
    • HTTPAuthSchemes & AuthSchemeResolvers: Configures how customers authenticate themselves to remote services.
    • Checksum Algorithms: Configures how an SDK calculates request and response checksums.

Changes checklist

  • Create a new aws-smithy-runtime crate.
    • Add orchestrator implementation
    • Define the orchestrator/runtime plugin interface traits
      • TraceProbe
      • RequestSerializer<In, TxReq>
      • ResponseDeserializer<TxRes, Out>
      • Connection<TxReq, TxRes>
      • RetryStrategy<Out>
      • AuthOrchestrator<Req>
      • EndpointOrchestrator<Req>
  • Create a new aws-smithy-runtime-api crate.
    • Add ConfigBag module
    • Add retries module
      • Add rate_limiting sub-module
    • Add interceptors module
      • Interceptor trait
      • InterceptorContext impl
    • Add runtime_plugins module
  • Create a new integration test that ensures the orchestrator works.

RFC: Collection Defaults

Status: Implemented

Applies to: client

For a summarized list of proposed changes, see the Changes Checklist section.

This RFC proposes a breaking change to how generated clients automatically provide default values for collections. Currently, the accessors that the SDK generates for List fields return optional values:

    /// <p> Container for elements related to a particular part.
    pub fn parts(&self) -> Option<&[crate::types::Part]> {
        self.parts.as_deref()
    }

This is almost never what users want and leads to code noise when using collections:

async fn get_builds() {
    let project = codebuild
        .list_builds_for_project()
        .project_name(build_project)
        .send()
        .await?;
    let build_ids = project
        .ids()
        .unwrap_or_default();
    //  ^^^^^^^^^^^^^^^^^^ this is pure noise
}

This RFC proposes unwrapping into default values in our accessor methods.

Terminology

  • Accessor: The Rust SDK defines accessor methods for fields on modeled structures to make them more convenient for users
  • Struct field: The accessors point to concrete fields on the struct itself.

The user experience if this RFC is implemented

In the current version of the SDK, users must call .unwrap_or_default() frequently. Once this RFC is implemented, users will be able to use these accessors directly. In the rare case where users need to distinguish between None and [], we will direct users towards model.<field>.is_some().

async fn get_builds() {
    let project = codebuild
        .list_builds_for_project()
        .project_name(build_project)
        .send()
        .await?;
    let build_ids = project.ids();
    // Goodbye to this line:
    //    .unwrap_or_default();
}

How to actually implement this RFC

In order to implement this feature, we need to update the code-generated accessors for lists and maps to add .unwrap_or_default(). Because we are returning slices, unwrap_or_default() does not produce any additional allocations for empty collections.
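Applied to the accessor shown earlier, the regenerated code would look roughly like this:

    /// <p>Container for elements related to a particular part.</p>
    ///
    /// If the field was not set, an empty slice is returned instead of `None`.
    pub fn parts(&self) -> &[crate::types::Part] {
        self.parts.as_deref().unwrap_or_default()
    }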

Could this be implemented for HashMap?

This works for lists because we are returning a slice (allowing a statically owned &[] to be returned). If we want to support HashMaps in the future, this is possible by using OnceCell to create empty HashMaps for the requisite types. This would allow us to return references to those empty maps.
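A sketch of that approach, using the standard library's OnceLock (the std equivalent of OnceCell) and hypothetical type names:

use std::collections::HashMap;
use std::sync::OnceLock;

/// Returns a reference to a shared, statically owned empty map.
fn empty_string_map() -> &'static HashMap<String, String> {
    static EMPTY: OnceLock<HashMap<String, String>> = OnceLock::new();
    EMPTY.get_or_init(|| HashMap::new())
}

struct Output {
    metadata: Option<HashMap<String, String>>,
}

impl Output {
    /// Accessor that flattens `None` into a reference to the shared empty map.
    fn metadata(&self) -> &HashMap<String, String> {
        self.metadata.as_ref().unwrap_or_else(|| empty_string_map())
    }
}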

Isn't this handled by the default trait?

No, many existing API types don't implement the Default trait.

Changes checklist

Estimated total work: 2 days

  • Update accessor method generation to auto flatten lists
  • Update docs for accessors to guide users to .field.is_some() if they MUST determine if the field was set.

RFC: Eliminating Public http dependencies

Status: Accepted

Applies to: client

For a summarized list of proposed changes, see the Changes Checklist section.

This RFC defines how we plan to refactor the SDK to allow the SDK to consume a 1.0 version of hyper, http-body, and http at a later date. Currently, hyper is 0.14.x and a 1.0 release candidate series is in progress. However, there are open questions that may significantly delay the launch of these three crates. We do not want to tie the 1.0 of the Rust SDK to these crates.

Terminology

  • http-body: A crate (and trait) defining how HTTP bodies work. Notably, the change from 0.* to 1.0 changes http-body to operate on frames instead of having separate methods.
  • http (crate): a low level crate of http primitives (no logic, just requests and responses)
  • ossified dependency: An ossified dependency describes a dependency that, when a new version is released, cannot be utilized without breaking changes. For example, if the mutate_request function on every operation operates on &mut http::Request where http = 0.2, that dependency is "ossified." Compare this to a function that offers the ability to convert something into an http = 0.2 request: since http = 1 and http = 0.2 are largely equivalent, the existence of this function does not prevent us from using http = 1 in the future. In general terms, functions that operate on references are much more likely to ossify: there is no practical way to mutate an http = 0.2 request if you have an http = 1 request other than a time-consuming clone-and-reconvert process.

Why is this important?

Performance: At some point in the future, hyper = 1, http = 1, and http-body = 1 will be released. It takes ~1-2 microseconds to rebuild an HTTP request. If we assume that hyper = 1 will only operate on http = 1 requests, then if we can't use http = 1 requests internally, our only way of supporting hyper = 1 will be to convert the HTTP request at dispatch time. Besides pinning us to a potentially unsupported version of the http crate, this will prevent us from directly dispatching requests in an efficient manner. With a total overhead of 20µs for the SDK, 1µs is not insignificant, and it grows as the number of request headers grows. A benchmark should be run for a realistic HTTP request, e.g. one that we send to S3.

Hyper Upgrade: Hyper 1 is significantly more flexible than Hyper 0.14.x, especially with respect to connection management & pooling. If we don't make these changes, the upgrade to Hyper 1.x could be significantly more challenging.

Security Fixes: If we're still on http = 0.* and a vulnerability is identified, we may end up needing to manually contribute the patch. The http crate is not trivial and contains parsing logic and optimized code (including a non-trivial amount of unsafe). See this GitHub issue. Notably, one issue may be unsound and result in changing the public API.

API Friendliness: If we ship with an API that publicly exposes customers to http = 0.*, we have that API forever. We have to consider that we aren't shipping the Rust SDK for this month or even this year, but probably the Rust SDK for the next 5-10 years.

Future CRT Usage: If we make this change, we enable a future where we can use the CRT HTTP request type natively without needing a last-minute conversion to the CRT HTTP Request type.

struct HttpRequest {
  inner: Inner
}

enum Inner {
  Httpv0(http_0::Request),
  Httpv1(http_1::Request),
  Crt(aws_crt_http::Request)
}

The user experience if this RFC is implemented

Customers are impacted in 3 main locations:

  1. HTTP types in Interceptors
  2. HTTP types in customize(...)
  3. HTTP types in Connectors

In all three of these cases, users would interact with our http wrapper types instead.

In the current version of the SDK, we expose public dependencies on the http crate in several key places:

  1. The sigv4 crate. The sigv4 crate currently operates directly on many types from the http crate. This is unnecessary and actually makes the crate more difficult to use. Although http may be used internally, http will be removed from the public API of this crate.
  2. Interceptor Context: interceptors can mutate the HTTP request through an unshielded interface. This requires creating a wrapper layer around http::Request and updating already written interceptors.
  3. aws-config: http::Response and uri
  4. A long tail of exposed requests and responses in the runtime crates. Many of these crates will be removed post-orchestrator so this can be temporarily delayed.

How to actually implement this RFC

Enabling API evolution

One key mechanism that we SHOULD use for allowing our APIs to evolve in the future is usage of ~ version bounds for the runtime crates after releasing 1.0.

Http Request Wrapper

In order to enable HTTP evolution, we will create a set of wrapper structures around http::Request and http::Response. These will use http = 0 internally. Since the HTTP crate itself is quite small, including private dependencies on both versions of the crate is a workable solution. In general, we will aim for an API that is close to drop-in compatible to the HTTP crate while ensuring that a different crate could be used as the backing storage.

// since it's our type, we can default `SdkBody`
pub struct Request<B = SdkBody> {
    // this uses the http = 0.2 request. In the future, we can make an internal enum to allow storing an http = 1
    http_0: http::Request<B>
}

Conversion to/from http::Request: One key property here is that although converting to/from an http::Request can be expensive, this is not ossification of the API. This is because the API can support converting from/to both http = 0 and http = 1 in the future—because it offers mutation of the request via a unified interface, the request would only need to be converted once for dispatch if there was a mismatch (instead of repeatedly). At some point in the future, the http = 0 representation could be deprecated and removed or feature gated.
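A sketch of what those conversions could look like for the wrapper above; the method name is illustrative, not final:

// Callers can round-trip through `http = 0.2` today; a matching pair of
// conversions for `http = 1` could be added later without breaking anyone.
impl<B> Request<B> {
    /// Convert back into an `http = 0.2` request, e.g. at dispatch time.
    pub fn into_http02x(self) -> http::Request<B> {
        self.http_0
    }
}

impl<B> From<http::Request<B>> for Request<B> {
    fn from(http_0: http::Request<B>) -> Self {
        Self { http_0 }
    }
}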

Challenges

  1. Creating an HTTP API which is forwards compatible, idiomatic and "truthful" without relying on existing types from Hyper—e.g. when adding a header, we need to account for the possibility that a header is invalid.
  2. Allow for future forwards-compatible evolution in the API—A lot of thought went into the http crate API w.r.t method parameters, types, and generics. Although we can aim for a simpler solution in some cases (e.g. accepting &str instead of HeaderName), we need to be careful that we do so while allowing API evolution.

Removing the SigV4 HTTP dependency

The SigV4 crate signs a number of HTTP types directly. We should change it to accept strings, and when appropriate, iterators of strings for headers.

Removing the HTTP dependency from generated clients

Generated clients currently include a public HTTP dependency in customize. This should be changed to accept our HTTP wrapper type instead or be restricted to a subset of operations (e.g. add_header) while forcing users to add an interceptor if they need full control.

Changes checklist

  • Create the http::Request wrapper. Carefully audit for compatibility without breaking changes. 5 Days.
  • Refactor currently written interceptors to use the wrapper: 2 days.
  • Refactor the SigV4 crate to remove the HTTP dependency from the public interface: 2 days.
  • Add / validate support for SdkBody http-body = 1.0rc.2 either in a PR or behind a feature gate. Test this to ensure it works with Hyper. Some previous work here exists: 1 week
  • Remove http::Response and Uri from the public exposed types in aws-config: 1-4 days.
  • Long tail of other usages: 1 week
  • Implement ~ versions for SDK Crate => runtime crate dependencies: 1 week

RFC: The HTTP Wrapper Type

Status: RFC

Applies to: client

For a summarized list of proposed changes, see the Changes Checklist section.

This RFC defines the API of our wrapper types around http::Request and http::Response. For more information about why we are wrapping these types, see RFC 0036: The HTTP Dependency.

Terminology

  • Extensions / "Request Extensions": The http crate Request/Response types include a typed property bag to store additional metadata along with the request.

The user experience if this RFC is implemented

In the current version of the SDK, external customers and internal code interact directly with the http crate. Once this RFC is implemented, interactions at the public API level will occur with our own http types instead.

Our types aim to be nearly drop-in-compatible for types in the http crate, however:

  1. We will not expose existing HTTP types in public APIs in ways that are ossified.
  2. When possible, we aim to simplify the APIs to make them easier to use.
  3. We will add SDK specific helper functionality when appropriate, e.g. first-level support for applying an endpoint to a request.

How to actually implement this RFC

We will need to add two types, HttpRequest and HttpResponse.

To string or not to String

Our header library restricts header names and values to Strings (UTF-8).

The http library is very precise in its representation: it allows HeaderValues that are both a superset and a subset of String (a superset because headers support arbitrary binary data, but a subset because headers cannot contain control characters like \n).

Although technically allowed, headers containing arbitrary binary data are not widely supported. Generally, Smithy protocols will use base-64 encoding when storing binary data in headers.

Finally, it's nicer for users if they can stay in "string land". Because of this, HttpRequest and Response expose header names and values as strings. Internally, the current design uses HeaderName and HeaderValue, however, there is a gate on construction that enforces that values are valid UTF-8.

This is a one-way door: .as_str() would panic in the future if we allowed non-string values into headers.

Where should these types live?

These types will be used by all orchestrator functionality, so they will be housed in aws-smithy-runtime-api.

What's in and what's out?

At the outset, these types focus on supporting the most ossified usages: &mut modification of HTTP types. They do not support construction of HTTP types, other than impl From<http::Request> and From<http::Response>. We will also make it possible to use http::HeaderName / http::HeaderValue in a zero-cost way.

The AsHeaderComponent trait

All header insertion methods accept impl AsHeaderComponent. This allows us to provide a nice user experience while taking advantage of zero-cost usage of 'static str. We will seal this trait to prevent external implementations (a sketch of the sealing pattern follows this list). We will have separate implementations for:

  • &'static str
  • String
  • http02x::HeaderName
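A sketch of the sealing pattern; the method shown is illustrative, not the crate's actual API (http02x in this RFC refers to the http = 0.2 crate, written here as http):

mod sealed {
    // Only the types listed above implement `Sealed`, so only they can
    // implement `AsHeaderComponent`.
    pub trait Sealed {}
    impl Sealed for &'static str {}
    impl Sealed for String {}
    impl Sealed for http::HeaderName {}
}

pub trait AsHeaderComponent: sealed::Sealed {
    /// Convert into a header-safe string, failing on invalid input.
    fn into_header_string(self) -> Result<String, Box<dyn std::error::Error + Send + Sync>>;
}

impl AsHeaderComponent for &'static str {
    fn into_header_string(self) -> Result<String, Box<dyn std::error::Error + Send + Sync>> {
        Ok(self.to_owned())
    }
}

impl AsHeaderComponent for String {
    fn into_header_string(self) -> Result<String, Box<dyn std::error::Error + Send + Sync>> {
        Ok(self)
    }
}

impl AsHeaderComponent for http::HeaderName {
    fn into_header_string(self) -> Result<String, Box<dyn std::error::Error + Send + Sync>> {
        Ok(self.as_str().to_owned())
    }
}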

Additional Functionality

Our wrapper type will add the following additional functionality:

  1. Support for self.try_clone()
  2. Support for &mut self.apply_endpoint(...)

Handling failure

There is no stdlib type that cleanly defines what may be placed into headers—String is too broad (even if we restrict to ASCII). This RFC proposes moving fallibility to the APIs:

impl HeadersMut<'_> {
    pub fn try_insert(
        &mut self,
        key: impl AsHeaderComponent,
        value: impl AsHeaderComponent,
    ) -> Result<Option<String>, BoxError> {
        // ...
    }
}

This allows us to offer user-friendly types while still avoiding runtime panics. We also offer insert and append which panic on invalid values.

Request Extensions

There is ongoing work which MAY restrict HTTP extensions to clone types. We will preempt that by:

  1. Preventing Extensions from being present when initially constructing our HTTP request wrapper.
  2. Forbidding non-clone extensions from being inserted into the wrapped request.

This also enables supporting request extensions for different downstream providers by allowing cloning into different extension types.

Proposed Implementation

Proposed Implementation of `request`
{{#include ../../../rust-runtime/aws-smithy-runtime-api/src/client/http/request.rs}}

Future Work

Currently, the only way to construct a Request is from a compatible type (e.g. http02x::Request).

Changes checklist

  • Implement initial implementation and test it against the SDK as written
  • Add test suite of HTTP wrapper
  • External design review
  • Update the SigV4 crate to remove http API dependency
  • Update the SDK to use the new type (breaking change)

RFC: User-configurable retry classification

Status: Implemented

Applies to: client

For a summarized list of proposed changes, see the Changes Checklist section.

This RFC defines the user experience and implementation of user-configurable retry classification. Custom retry classifiers enable users to change what responses are retried while still allowing them to rely on defaults set by SDK authors when desired.

Terminology

  • Smithy Service: An HTTP service, whose API is modeled with the Smithy IDL.
  • Smithy Client: An HTTP client generated by smithy-rs from a .smithy model file.
  • AWS SDK: A smithy client that's specifically configured to work with an AWS service.
  • Operation: A modeled interaction with a service, defining the proper input and expected output shapes, as well as important metadata related to request construction. "Sending" an operation implies sending one or more HTTP requests to a Smithy service, and then receiving an output or error in response.
  • Orchestrator: The client code which manages the request/response pipeline. The orchestrator is responsible for:
    • Constructing, serializing, and sending requests.
    • Receiving, deserializing, and (optionally) retrying requests.
    • Running interceptors (not covered in this RFC) and handling errors.
  • Runtime Component: A part of the orchestrator responsible for a specific function. Runtime components are used by the orchestrator itself, may depend on specific configuration, and must not be changed by interceptors. Examples include the endpoint resolver, retry strategy, and request signer.
  • Runtime Plugin: Code responsible for setting runtime components and related configuration. Runtime plugins defined by codegen are responsible for setting default configuration and altering the behavior of Smithy clients, including the AWS SDKs.

How the orchestrator should model retries

A Retry Strategy is the process by which the orchestrator determines when and how to retry failed requests. Only one retry strategy may be set at any given time. During its operation, the retry strategy relies on a series of Retry Classifiers to determine if and how a failed request should be retried. Retry classifiers each have a Retry Classifier Priority so that regardless of whether they are set during config or operation construction, they'll always run in a consistent order.

Classifiers are each run in turn by the retry strategy:

pub fn run_classifiers_on_ctx(
    classifiers: impl Iterator<Item = SharedRetryClassifier>,
    ctx: &InterceptorContext,
) -> RetryAction {
    // By default, don't retry
    let mut result = RetryAction::NoActionIndicated;

    for classifier in classifiers {
        let new_result = classifier.classify_retry(ctx);

        // If the result is `NoActionIndicated`, continue to the next classifier
        // without overriding any previously-set result.
        if new_result == RetryAction::NoActionIndicated {
            continue;
        }

        // Otherwise, set the result to the new result.
        tracing::trace!(
            "Classifier '{}' set the result of classification to '{}'",
            classifier.name(),
            new_result
        );
        result = new_result;

        // If the result is `RetryForbidden`, stop running classifiers.
        if result == RetryAction::RetryForbidden {
            tracing::trace!("retry classification ending early because a `RetryAction::RetryForbidden` was emitted",);
            break;
        }
    }

    result
}

NOTE: User-defined retry strategies are responsible for calling run_classifiers_on_ctx.

Lower-priority classifiers run first, but the retry actions they return may be overridden by higher-priority classifiers. Classification stops immediately if any classifier returns RetryAction::RetryForbidden.

The user experience if this RFC is implemented

In the current version of the SDK, users are unable to configure retry classification, except by defining a custom retry strategy. Once this RFC is implemented, users will be able to define and set their own classifiers.

Defining a custom classifier

#[derive(Debug)]
struct CustomRetryClassifier;

impl ClassifyRetry for CustomRetryClassifier {
    fn classify_retry(
        &self,
        ctx: &InterceptorContext,
    ) -> RetryAction {
        // Check for a result
        let output_or_error = ctx.output_or_error();
        // Check for an error
        let error = match output_or_error {
            // Typically, when the response is OK or unset
            // then `RetryAction::NoActionIndicated` is returned.
            Some(Ok(_)) | None => return RetryAction::NoActionIndicated,
            Some(Err(err)) => err,
        };

        todo!("inspect the error to determine if a retry attempt should be made.")
    }

    fn name(&self) -> &'static str { "my custom retry classifier" }

    fn priority(&self) -> RetryClassifierPriority {
        RetryClassifierPriority::default()
    }
}

Choosing a retry classifier priority

Sticking with the default priority is often the best choice. Classifiers should restrict the number of cases they can handle in order to avoid having to compete with other classifiers. When two classifiers would classify a response in two different ways, the priority system gives us the ability to decide which classifier should be respected.

Internally, priority is implemented with a simple numeric system. In order to give the smithy-rs team the flexibility to make future changes, this numeric system is private and inaccessible to users. Instead, users may set the priority of classifiers relative to one another with the with_lower_priority_than and with_higher_priority_than methods:

impl RetryClassifierPriority {
    /// Create a new `RetryClassifierPriority` with lower priority than the given priority.
    pub fn with_lower_priority_than(other: Self) -> Self { ... }

    /// Create a new `RetryClassifierPriority` with higher priority than the given priority.
    pub fn with_higher_priority_than(other: Self) -> Self { ... }
}

For example, if it was important for our CustomRetryClassifier in the previous example to run before the default HttpStatusCodeClassifier, a user would define the CustomRetryClassifier priority like this:

impl ClassifyRetry for CustomRetryClassifier {
    fn priority(&self) -> RetryClassifierPriority {
        RetryClassifierPriority::with_lower_priority_than(RetryClassifierPriority::http_status_code_classifier())
    }
}

The priorities of the three default retry classifiers (HttpStatusCodeClassifier, ModeledAsRetryableClassifier, and TransientErrorClassifier) are all public for this purpose. Users may ONLY set a retry priority relative to an existing retry priority.

RetryAction and RetryReason

Retry classifiers communicate to the retry strategy by emitting RetryActions:

/// The result of running a [`ClassifyRetry`] on a [`InterceptorContext`].
#[non_exhaustive]
#[derive(Clone, Eq, PartialEq, Debug, Default)]
pub enum RetryAction {
    /// When a classifier can't run or has no opinion, this action is returned.
    ///
    /// For example, if a classifier requires a parsed response and response parsing failed,
    /// this action is returned. If all classifiers return this action, no retry should be
    /// attempted.
    #[default]
    NoActionIndicated,
    /// When a classifier runs and thinks a response should be retried, this action is returned.
    RetryIndicated(RetryReason),
    /// When a classifier runs and decides a response must not be retried, this action is returned.
    ///
    /// This action stops retry classification immediately, skipping any following classifiers.
    RetryForbidden,
}

When a retry is indicated by a classifier, the action will contain a RetryReason:

/// The reason for a retry.
#[non_exhaustive]
#[derive(Clone, Eq, PartialEq, Debug)]
pub enum RetryReason {
    /// When an error is received that should be retried, this reason is returned.
    RetryableError {
        /// The kind of error.
        kind: ErrorKind,
        /// A server may tell us to retry only after a specific time has elapsed.
        retry_after: Option<Duration>,
    },
}

NOTE: RetryReason currently only has a single variant, but it's defined as an enum for forward compatibility purposes.

RetryAction's impl defines several convenience methods:

impl RetryAction {
    /// Create a new `RetryAction` indicating that a retry is necessary.
    pub fn retryable_error(kind: ErrorKind) -> Self {
        Self::RetryIndicated(RetryReason::RetryableError {
            kind,
            retry_after: None,
        })
    }

    /// Create a new `RetryAction` indicating that a retry is necessary after an explicit delay.
    pub fn retryable_error_with_explicit_delay(kind: ErrorKind, retry_after: Duration) -> Self {
        Self::RetryIndicated(RetryReason::RetryableError {
            kind,
            retry_after: Some(retry_after),
        })
    }

    /// Create a new `RetryAction` indicating that a retry is necessary because of a transient error.
    pub fn transient_error() -> Self {
        Self::retryable_error(ErrorKind::TransientError)
    }

    /// Create a new `RetryAction` indicating that a retry is necessary because of a throttling error.
    pub fn throttling_error() -> Self {
        Self::retryable_error(ErrorKind::ThrottlingError)
    }

    /// Create a new `RetryAction` indicating that a retry is necessary because of a server error.
    pub fn server_error() -> Self {
        Self::retryable_error(ErrorKind::ServerError)
    }

    /// Create a new `RetryAction` indicating that a retry is necessary because of a client error.
    pub fn client_error() -> Self {
        Self::retryable_error(ErrorKind::ClientError)
    }
}

Setting classifiers

The interface for setting classifiers is very similar to the interface for setting interceptors:

// All service configs support these setters. Operations support a nearly identical API.
impl ServiceConfigBuilder {
    /// Add type implementing ClassifyRetry that will be used by the RetryStrategy
    /// to determine what responses should be retried.
    ///
    /// A retry classifier configured by this method will run according to its priority.
    pub fn retry_classifier(mut self, retry_classifier: impl ClassifyRetry + 'static) -> Self {
        self.push_retry_classifier(SharedRetryClassifier::new(retry_classifier));
        self
    }

    /// Add a SharedRetryClassifier that will be used by the RetryStrategy to
    /// determine what responses should be retried.
    ///
    /// A retry classifier configured by this method will run according to its priority.
    pub fn push_retry_classifier(&mut self, retry_classifier: SharedRetryClassifier) -> &mut Self {
        self.runtime_components.push_retry_classifier(retry_classifier);
        self
    }

    /// Set SharedRetryClassifiers for the builder, replacing any that were
    /// previously set.
    pub fn set_retry_classifiers(&mut self, retry_classifiers: impl IntoIterator<Item = SharedRetryClassifier>) -> &mut Self {
        self.runtime_components.set_retry_classifiers(retry_classifiers.into_iter());
        self
    }
}
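Hypothetical usage of the setter above, assuming the CustomRetryClassifier from the earlier example is in scope:

// Register the custom classifier on a service config; it will run according
// to its priority alongside the default classifiers.
let config = aws_sdk_s3::Config::builder()
    .retry_classifier(CustomRetryClassifier)
    .build();
let client = aws_sdk_s3::Client::from_conf(config);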

Default classifiers

Smithy clients have three classifiers enabled by default:

  • ModeledAsRetryableClassifier: Checks for errors that are marked as retryable in the smithy model. If one is encountered, returns RetryAction::RetryIndicated. Requires a parsed response.
  • TransientErrorClassifier: Checks for timeout, IO, and connector errors. If one is encountered, returns RetryAction::RetryIndicated. Requires a parsed response.
  • HttpStatusCodeClassifier: Checks the HTTP response's status code. By default, this classifies 500, 502, 503, and 504 errors as RetryAction::RetryIndicated. The list of retryable status codes may be customized when creating this classifier with the HttpStatusCodeClassifier::new_from_codes method.

AWS clients enable the three smithy classifiers as well as one more by default:

  • AwsErrorCodeClassifier: Checks for errors with AWS error codes marking them as either transient or throttling errors. If one is encountered, returns RetryAction::RetryIndicated. Requires a parsed response. This classifier will also check the HTTP response for an x-amz-retry-after header. If one is set, then the returned RetryAction will include the explicit delay.

The priority order of these classifiers is as follows:

  1. (highest priority) TransientErrorClassifier
  2. ModeledAsRetryableClassifier
  3. AwsErrorCodeClassifier
  4. (lowest priority) HttpStatusCodeClassifier

The priority order of the default classifiers is not configurable. However, it's possible to wrap a default classifier in a newtype and set your desired priority when implementing the ClassifyRetry trait, delegating the classify_retry and name methods to the inner classifier, as sketched below.
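A sketch of that newtype approach, assuming the default HttpStatusCodeClassifier and the priority constructors shown earlier:

#[derive(Debug)]
struct ReprioritizedStatusCodeClassifier(HttpStatusCodeClassifier);

impl ClassifyRetry for ReprioritizedStatusCodeClassifier {
    fn classify_retry(&self, ctx: &InterceptorContext) -> RetryAction {
        // Delegate classification to the wrapped default classifier.
        self.0.classify_retry(ctx)
    }

    fn name(&self) -> &'static str {
        // Delegate the name as well.
        self.0.name()
    }

    fn priority(&self) -> RetryClassifierPriority {
        // Report a different priority than the wrapped classifier, e.g. one
        // higher than the default HttpStatusCodeClassifier priority.
        RetryClassifierPriority::with_higher_priority_than(
            RetryClassifierPriority::http_status_code_classifier(),
        )
    }
}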

Disable default classifiers

Disabling the default classifiers is possible, but not easy. They are set at different points during config and operation construction, and must be unset at each of those places. A far simpler solution is to implement your own classifier that has the highest priority.

Still, if completely removing the other classifiers is desired, use the set_retry_classifiers method on the config to replace the config-level defaults and then set a config override on the operation that does the same.

How to actually implement this RFC

In order to implement this feature, we must:

  • Update the current retry classification system so that individual classifiers as well as collections of classifiers can be easily composed together.
  • Create two new configuration mechanisms for users that allow them to customize retry classification at the service level and at the operation level.
  • Update retry classifiers so that they may 'short-circuit' the chain, ending retry classification immediately.

The ClassifyRetry trait

/// The result of running a [`ClassifyRetry`] on a [`InterceptorContext`].
#[non_exhaustive]
#[derive(Clone, Eq, PartialEq, Debug)]
pub enum RetryAction {
    /// When an error is received that should be retried, this action is returned.
    Retry(ErrorKind),
    /// When the server tells us to retry after a specific time has elapsed, this action is returned.
    RetryAfter(Duration),
    /// When a response should not be retried, this action is returned.
    NoRetry,
}

/// Classifies what kind of retry is needed for a given [`InterceptorContext`].
pub trait ClassifyRetry: Send + Sync + fmt::Debug {
    /// Run this classifier on the [`InterceptorContext`] to determine if the previous request
    /// should be retried. If the classifier makes a decision, `Some(RetryAction)` is returned.
    /// Classifiers may also return `None`, signifying that they have no opinion of whether or
    /// not a request should be retried.
    fn classify_retry(
        &self,
        ctx: &InterceptorContext,
        preceding_action: Option<RetryAction>,
    ) -> Option<RetryAction>;

    /// The name of this retry classifier.
    ///
    /// Used for debugging purposes.
    fn name(&self) -> &'static str;

    /// The priority of this retry classifier. Classifiers with a higher priority will run before
    /// classifiers with a lower priority. Classifiers with equal priorities make no guarantees
    /// about which will run first.
    fn priority(&self) -> RetryClassifierPriority {
        RetryClassifierPriority::default()
    }
}

Resolving the correct order of multiple retry classifiers

Because each classifier has a defined priority, and because RetryClassifierPriority implements PartialOrd and Ord, the standard library's sort method may be used to correctly arrange classifiers. The RuntimeComponents struct is responsible for storing classifiers, so it's also responsible for sorting them whenever a new classifier is added. Thus, when a retry strategy fetches the list of classifiers, they'll already be in the expected order.

Questions and answers

  • Q: Should retry classifiers be fallible?
    • A: I think no, because of the added complexity. If we make them fallible then we'll have to decide what happens when classifiers fail. Do we skip them or does classification end? The retry strategy is responsible for calling the classifiers, so it should be responsible for deciding how to handle a classifier error. I don't foresee a use case where an error returned by a classifier would be interpreted either by classifiers following the failed classifier or by the retry strategy.

Changes checklist

  • Add retry classifiers field and setters to RuntimeComponents and RuntimeComponentsBuilder.
    • Add unit tests ensuring that classifier priority is respected by RuntimeComponents::retry_classifiers, especially when multiple layers of config are in play.
  • Add codegen customization allowing users to set retry classifiers on service configs.
  • Add codegen for setting default classifiers at the service level.
    • Add integration tests for setting classifiers at the service level.
  • Add codegen for setting default classifiers that require knowledge of operation error types at the operation level.
    • Add integration tests for setting classifiers at the operation level.
  • Implement retry classifier priority.
    • Add unit tests for retry classifier priority.
  • Update existing tests that would fail for lack of a retry classifier.

RFC: Forward Compatible Errors

Status: RFC

Applies to: client

For a summarized list of proposed changes, see the Changes Checklist section.

This RFC defines an approach for making it forwards-compatible to convert unmodeled Unhandled errors into modeled ones. This occurs as servers update their models to include errors that were previously unmodeled.

Currently, SDK errors are not forward compatible in this way. If a customer matches Unhandled in addition to the _ branch and a new variant is added, they will fail to match the new variant. We currently handle this issue with enums by preventing useful information from being readable from the Unknown variant.

This is related to ongoing work on the non_exhaustive_omitted_patterns lint which would produce a compiler warning when a new variant was added even when _ was used.

Terminology

For purposes of discussion, consider the following error:

#[non_exhaustive]
pub enum AbortMultipartUploadError {
    NoSuchUpload(NoSuchUpload),
    Unhandled(Unhandled),
}
  • Modeled Error: An error with an named variant, e.g. NoSuchUpload above
  • Unmodeled Error: Any other error, e.g. if the server returned ValidationException for the above operation.
  • Error code: All errors across all protocols provide a code, a unique identifier for an error across the service closure.

The user experience if this RFC is implemented

In the current version of the SDK, users match the Unhandled variant. They can then read the code from the Unhandled variant because Unhandled implements the ProvideErrorMetadata trait as well as the standard-library std::error::Error trait.

Note: It's possible to write correct code today because the operation-level and service-level errors already expose code() via ProvideErrorMetadata. This RFC describes mechanisms to guide customers to write forward-compatible code.

fn docs() {
    match client.get_object().send().await {
        Ok(obj) => { ... },
        Err(e) => match e.into_service_error() {
            GetObjectError::NotFound => { ... },
            GetObjectError::Unhandled(err) if err.code() == "ValidationException" => { ... }
            other => { /** do something with this variant */ }
        }
    }
}

We must instead guide customers into the following pattern:

fn docs() {
    match client.get_object().send().await {
        Ok(obj) => { ... },
        Err(e) => match e.into_service_error() {
            GetObjectError::NotFound => { ... },
            err if err.code() == "ValidationException" => { ... },
            err => warn!("{}", err.code()),
        }
    }
}

In this example, because customers are not matching on the Unhandled variant explicitly, this code remains forward compatible when ValidationException is introduced in the future.

Guiding Customers to this Pattern

There are two areas we need to handle:

  1. Prevent customers from extracting useful information from Unhandled
  2. Alert customers currently matching on Unhandled about what to use instead. For example, the following code is still problematic:
        match err {
            GetObjectError::NotFound => { ... },
            err @ GetObject::Unhandled(_) if err.code() == Some("ValidationException") => { ... }
        }

For 1, we need to remove the ProvideErrorMetadata trait implementation from Unhandled. We would instead expose this through a layer of indirection so that generated code can still read the data.

For 2, we would deprecate the Unhandled variants with a message clearly indicating how this code should be written.

How to actually implement this RFC

Locking down Unhandled

In order to prevent accidental matching on Unhandled, we need to make it hard to extract useful information from Unhandled itself. We will do this by removing the ProvideErrorMetadata trait implementation and exposing the following method:

#[doc(hidden)]
/// Introspect the error metadata of this error.
///
/// This method should NOT be used from external code because matching on `Unhandled` directly is a backwards-compatibility
/// hazard. See `RFC-0039` for more information.
pub fn introspect(&self) -> impl ProvideErrorMetadata + '_ {
   struct Introspected<'a>(&'a Unhandled);
   impl ProvideErrorMetadata for Introspected<'_> { ... }
   Introspected(self)
}

Generated code would then use introspect when supporting top-level ErrorMetadata (e.g. for aws_sdk_s3::Error).

Deprecating the Variant

The Unhandled variant will be deprecated to prevent users from matching on it inadvertently.

enum GetObjectError {
   NotFound(NotFound),
   #[deprecated(note = "Matching on `Unhandled` directly is a backwards compatibility hazard. Use `err if err.error_code() == ...` instead. See [here](<docs about using errors>) for more information.")]
   Unhandled(Unhandled)
}

Changes checklist

  • Generate code to deprecate unhandled variants. Determine the best way to allow Unhandled to continue to be constructed in client code
  • Generate code to deprecate the Unhandled variant for the service meta-error. Consider how this interacts with non-service errors.
  • Update Unhandled to make it useless on its own and expose information via an Introspect doc hidden struct.
  • Update developer guide to address this issue.
  • Changelog & Upgrade Guidance

RFC: Behavior Versions

Status: RFC

Applies to: client

For a summarized list of proposed changes, see the Changes Checklist section.

This RFC describes "Behavior Versions," a mechanism to allow SDKs to ship breaking behavioral changes like a new retry strategy, while allowing customers who rely on extremely consistent behavior to evolve at their own pace.

By adding behavior major versions (BMV) to the Rust SDK, we will make it possible to ship new secure/recommended defaults to new customers without impacting legacy customers.

The fundamental issue stems from our inability to communicate and decouple releases of service updates and behavior within a single major version.

Both legacy and new SDKs need to alter their default behavior over time. Historically, this caused new customers on legacy SDKs to be subject to legacy defaults, even when a better alternative existed.

For new SDKs, a GA cutline presents difficult choices around timeline and features that can’t be added later without altering behavior.

Both of these use cases are addressed by Behavior Versions.

The user experience if this RFC is implemented

In the current version of the SDK, users can construct clients without indicating any sort of behavior major version. Once this RFC is implemented, there will be two ways to set a behavior major version:

  1. In code via aws_config::defaults(BehaviorVersion::latest()) and <service>::Config::builder().behavior_version(...). This will also work for config_override.
  2. By enabling behavior-version-latest in either aws-config (which brings back from_env) OR a specific generated SDK crate
# Cargo.toml
[dependencies]
aws-config = { version = "1", features = ["behavior-version-latest"] }
# OR
aws-sdk-s3 = { version = "1", features = ["behavior-version-latest"] }

If no BehaviorVersion is set, the client will panic during construction.

BehaviorVersion is an opaque struct with initializers like ::latest() and ::v2023_11_09(). Downstream code can check the version by calling methods like ::supports_v1().

When new BMV are added, the previous version constructor will be marked as deprecated. This serves as a mechanism to alert customers that a new BMV exists to allow them to upgrade.
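As a sketch of option 1, client construction would look something like this (exact names may differ from the final API):

// Set the behavior version in code when building shared config, then
// construct a service client from it.
let sdk_config = aws_config::defaults(aws_config::BehaviorVersion::latest())
    .load()
    .await;
let s3 = aws_sdk_s3::Client::new(&sdk_config);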

How to actually implement this RFC

In order to implement this feature, we need to create a BehaviorVersion struct, add config options to SdkConfig and aws-config, and wire it throughout the stack.

/// Behavior major-version of the client
///
/// Over time, new best-practice behaviors are introduced. However, these behaviors might not be backwards
/// compatible. For example, a change which introduces new default timeouts or a new retry-mode for
/// all operations might be the ideal behavior but could break existing applications.
#[derive(Debug, Clone)]
pub struct BehaviorVersion {
    // currently there is only 1 MV so we don't actually need anything in here.
    _private: (),
}

To help customers migrate, we are including deprecated from_env hooks that set behavior-version-latest. This allows customers to see that they are missing the required cargo feature and add it to remove the deprecation warning.

Internally, BehaviorVersion will become an additional field on <client>::Config. It is not ever stored in the ConfigBag or in RuntimePlugins.

When constructing the set of "default runtime plugins," the default runtime plugin parameters will be passed the BehaviorVersion. This will select the correct runtime plugin. Logging will clearly indicate which plugin was selected.

Design Alternatives Considered

An original design was also considered that made BMV optional and relied on documentation to steer customers in the right direction. This was deemed too weak of a mechanism to ensure that customers aren't broken by unexpected changes.

Changes checklist

  • Create BehaviorVersion and the BMV runtime plugin
  • Add BMV as a required runtime component
  • Wire up setters throughout the stack
  • Add tests of BMV (set via aws-config, cargo features & code params)
  • Remove aws_config::from_env deprecation stand-ins (we decided to persist these deprecations)
  • Update generated usage examples

RFC: Improve Client Error Ergonomics

Status: Implemented

Applies to: clients

This RFC proposes some changes to code generated errors to make them easier to use for customers. With the SDK and code generated clients, customers have two primary use-cases that should be made easy without compromising the compatibility rules established in RFC-0022:

  1. Checking the error type
  2. Retrieving information specific to that error type

Case Study: Handling an error in S3

The following is an example of handling errors with S3 with the latest generated (and unreleased) SDK as of 2022-12-07:

let result = client
    .get_object()
    .bucket(BUCKET_NAME)
    .key("some-key")
    .send()
    .await;
match result {
    Ok(_output) => { /* Do something with the output */ }
    Err(err) => match err.into_service_error() {
        GetObjectError { kind, .. } => match kind {
            GetObjectErrorKind::InvalidObjectState(value) => println!("invalid object state: {:?}", value),
            GetObjectErrorKind::NoSuchKey(_) => println!("object didn't exist"),
        }
        err @ GetObjectError { .. } if err.code() == Some("SomeUnmodeledError") => {}
        err @ _ => return Err(err.into()),
    },
}

The refactor that implemented RFC-0022 added the into_service_error() method on SdkError that infallibly converts the SdkError into the concrete error type held by the SdkError::ServiceError variant. This improvement lets customers discard transient failures and immediately handle modeled errors returned by the service.

Despite this, the code is still quite verbose.

Proposal: Combine Error and ErrorKind

At time of writing, each operation has both an Error and ErrorKind type generated. The Error type holds information that is common across all operation errors: message, error code, "extra" key/value pairs, and the request ID.

The ErrorKind is always nested inside the Error, which results in the verbose nested matching shown in the case study above.

To make error handling more ergonomic, the code generated Error and ErrorKind types should be combined. Hypothetically, this would allow for the case study above to look as follows:

let result = client
    .get_object()
    .bucket(BUCKET_NAME)
    .key("some-key")
    .send()
    .await;
match result {
    Ok(_output) => { /* Do something with the output */ }
    Err(err) => match err.into_service_error() {
        GetObjectError::InvalidObjectState(value) => {
            println!("invalid object state: {:?}", value);
        }
        err if err.is_no_such_key() => {
            println!("object didn't exist");
        }
        err if err.code() == Some("SomeUnmodeledError") => {}
        err @ _ => return Err(err.into()),
    },
}

If a customer only cares about checking one specific error type, they can also do:

match result {
    Ok(_output) => { /* Do something with the output */ }
    Err(err) => {
        let err = err.into_service_error();
        if err.is_no_such_key() {
            println!("object didn't exist");
        } else {
            return Err(err);
        }
    }
}

The downside of this is that combining the error types requires adding the general error metadata to each generated error struct so that it's accessible by the enum error type. However, this aligns with our tenet of making things easier for customers even if it makes it harder for ourselves.

Changes Checklist

  • Merge the ${operation}Error/${operation}ErrorKind code generators to only generate an ${operation}Error enum:
    • Preserve the is_${variant} methods
    • Preserve error metadata by adding it to each individual variant's context struct
  • Write upgrade guidance
  • Fix examples

RFC: File-per-change changelog

Status: Implemented

Applies to: client and server

For a summarized list of proposed changes, see the Changes Checklist section.

Historically, the smithy-rs and AWS SDK for Rust's changelogs and release notes have been generated from the changelogger tool in tools/ci-build/changelogger. This is a tool built specifically for development and release of smithy-rs, and it requires developers to add changelog entries to a root CHANGELOG.next.toml file. Upon release, the [[smithy-rs]] entries in this file go into the smithy-rs release notes, and the [[aws-sdk-rust]] entries are associated with a smithy-rs release commit hash, and added to the aws/SDK_CHANGELOG.next.json for incorporation into the AWS SDK's changelog when it releases.

This system has gotten us far, but it has always made merging PRs into main more difficult since the central CHANGELOG.next.toml file is almost always a merge conflict for two PRs with changelog entries.

This RFC proposes a new approach to change logging that will remedy the merge conflict issue, and explains how this can be done without disrupting the current release process.

The proposed developer experience

There will be a changelog/ directory in the smithy-rs root where developers can add changelog entry Markdown files. Any file name can be picked for these entries. Suggestions are the development branch name for the change, or the PR number.

The changelog entry format will change to make it easier to duplicate entries across both smithy-rs and aws-sdk-rust, a common use-case.

This new format will make use of Markdown front matter in the YAML format. This change in format has a couple benefits:

  • It's easier to write change entries in Markdown than in a TOML string.
  • Escaping special characters (such as quotes) inside a TOML string is awkward, so moving the message body out of TOML expands the range of text that can be part of a message.

While it would be preferable to use TOML for the front matter (and there are libraries that support that), it will use YAML so that GitHub's Markdown renderer will recognize it.

A changelog entry file will look as follows:

---
# Adding `aws-sdk-rust` here duplicates this entry into the SDK changelog.
applies_to: ["client", "server", "aws-sdk-rust"]
authors: ["author1", "author2"]
references: ["smithy-rs#1234", "aws-sdk-rust#1234"]
# The previous `meta` section is broken up into its constituents:
breaking: false
# This replaces "tada":
new_feature: false
bug_fix: false
---

Some message for the change.

Implementation

When a release is performed, the release script will generate the release notes, update the CHANGELOG.md file, copy SDK changelog entries into the SDK, and delete all the files in changelog/.

SDK Entries

The SDK changelog entries currently end up in aws/SDK_CHANGELOG.next.json, and each entry is given age and since_commit entries. The age is a number that starts at zero, and gets incremented with every smithy-rs release. When it reaches a hardcoded threshold, that entry is removed from aws/SDK_CHANGELOG.next.json. The SDK release process uses the since_commit to determine which changelog entries go into the next SDK release's changelog.

The SDK release process doesn't write back to smithy-rs, and history has shown that it can't, since doing so leads to all sorts of release issues as PRs get merged into smithy-rs while a release is in progress. Thus, the age/since_commit mechanism needs to stay in place.

The aws/SDK_CHANGELOG.next.json will stay in place in its current format without changes. Its JSON format is capable of escaping characters in the message string, so it will be compatible with the transition from TOML to Markdown with YAML front matter.

The SDK_CHANGELOG.next.json file has had merge conflicts in the past, but this only happened when the release process wasn't followed correctly. If we're consistent with our release process, it should never have conflicts.

Safety requirements

Implementation will be tricky since it needs to be done without disrupting the existing release process. The biggest area of risk is the SDK sync job that generates individual commits in the aws-sdk-rust repo for each commit in the smithy-rs release. Fortunately, the changelogger is invoked a single time at the very end of that process, and only the latest changelogger version included in the build image is used. Thus, we can safely refactor the changelogger tool so long as its command-line interface remains backwards compatible. (We could change the CLI interface as well, but that would require synchronizing the smithy-rs changes with changes to the SDK release scripts.)

At a high level, these requirements must be observed to do this refactor safely:

  • The CLI for the changelogger render subcommand MUST stay the same, or have minimal backwards compatible changes made to it.
  • The SDK_CHANGELOG.next.json format can change, but MUST remain a single JSON file. If it is changed at all, the existing file MUST be transitioned to the new format, and a mechanism MUST be in place for making sure it is the correct format after merging with other PRs. It's probably better to leave this file alone though, or make any changes to it backwards compatible.

Future Improvements

After the initial migration, additional niceties could be added such as pulling authors from git history rather than needing to explicitly state them (at least by default; there should always be an option to override the author in case a maintainer adds a changelog entry on behalf of a contributor).

Changes checklist

  • Refactor changelogger and smithy-rs-tool-common to separate the changelog serialization format from the internal representation used for rendering and splitting.
  • Implement deserialization for the new Markdown entry format
  • Incorporate new format into the changelogger render subcommand
  • Incorporate new format into the changelogger split subcommand
  • Port existing CHANGELOG.next.toml to individual entries
  • Update sdk-lints to fail if CHANGELOG.next.toml exists at all to avoid losing changelog entries during merges.
  • Dry-run test against the smithy-rs release process.
  • Dry-run test against the SDK release process.

RFC: Identity Cache Partitions

Status: Implemented

Applies to: AWS SDK for Rust

Motivation

In the example below, two clients are created from the same shared SdkConfig instance and each invokes a fictitious operation. Assume both operations use the same auth scheme, relying on the same identity resolver.

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {

    let config = aws_config::defaults(BehaviorVersion::latest())
        .load()
        .await;

    let c1 = aws_sdk_foo::Client::new(&config);
    c1.foo_operation().send().await;

    let c2 = aws_sdk_bar::Client::new(&config);
    c2.bar_operation().send().await;

    Ok(())
}

There are two problems with this currently.

  1. The identity resolvers (e.g. credentials_provider for SigV4) are re-used but we end up with a different IdentityCachePartition each time a client is created.

    • More specifically, this happens every time a SharedIdentityResolver is created. The From<SdkConfig> conversion sets the credentials provider as the identity resolver for the auth scheme. Internally this is converted to a SharedIdentityResolver, which creates a new partition (if it were already a SharedIdentityResolver this would be detected and a new instance would not be created, so it must be a SharedCredentialsProvider or SharedTokenProvider that is being converted). The end result is that the credentials provider from shared config is re-used, but the cache partition differs, so a cache miss occurs the first time any new client created from that shared config needs credentials.
  2. The SdkConfig does not create an identity cache by default. Even if the partitioning is fixed, any clients created from a shared config instance will end up with their own identity cache which also results in having to resolve identity again. Only if a user supplies an identity cache explicitly when creating shared config would it be re-used across different clients.

Design intent

Identity providers and identity caching are intentionally decoupled. This allows caching behavior to be more easily customized and centrally configured while also removing the need for each identity provider to have to implement caching. There is some fallout from sharing an identity cache though. This is fairly well documented on IdentityCachePartition itself.

/// ...
///
/// Identities need cache partitioning because a single identity cache is used across
/// multiple identity providers across multiple auth schemes. In addition, a single auth scheme
/// may have many different identity providers due to operation-level config overrides.
///
/// ...
pub struct IdentityCachePartition(...)

Cache partitioning allows for different identity types to be stored in the same cache instance as long as they are assigned to a different partition. Partitioning also solves the issue of overriding configuration on a per operation basis where it would not be the correct or desired behavior to re-use or overwrite the cache if a different resolver is used.

In other words cache partitioning is effectively tied to a particular instance of an identity resolver. Re-using the same instance of a resolver SHOULD be allowed to share a cache partition. The fact that this isn't the case today is an oversight in how types are wrapped and threaded through the SDK.

The user experience if this RFC is implemented

In the current version of the SDK, users are unable to share cached results of identity resolvers via shared SdkConfig across clients.

Once this RFC is implemented, users that create clients via SdkConfig with the latest behavior version will share a default identity cache. Shared identity resolvers (e.g. credentials_provider, token_provider, etc) will provide their own cache partition that is re-used instead of creating a new one each time a provider is converted into a SharedIdentityResolver.

Default behavior

let config = aws_config::defaults(BehaviorVersion::latest())
    .load()
    .await;

let c1 = aws_sdk_foo::Client::new(&config);
c1.foo_operation().send().await;


let c2 = aws_sdk_bar::Client::new(&config);
// will re-use credentials/identity resolved via c1
c2.bar_operation().send().await;

Operations invoked on c2 will see the results of cached identities resolved by client c1 (for operations that use the same identity resolvers). The creation of a default identity cache in SdkConfig, when one is not explicitly provided, will be added behind a new behavior version.

Opting out

Users can disable the shared identity cache by explicitly setting it to None. This will result in each client creating their own identity cache.

let config = aws_config::defaults(BehaviorVersion::latest())
    // new method similar to `no_credentials()` to disable default cache setup
    .no_identity_cache()
    .load()
    .await;

let c1 = aws_sdk_foo::Client::new(&config);
c1.foo_operation().send().await;


let c2 = aws_sdk_bar::Client::new(&config);
c2.bar_operation().send().await;

The same can be achieved by explicitly supplying a new identity cache to a client:


let config = aws_config::defaults(BehaviorVersion::latest())
    .load()
    .await;

let c1 = aws_sdk_foo::Client::new(&config);
c1.foo_operation().send().await;

let modified_config = aws_sdk_bar::Config::from(&config)
    .to_builder()
    .identity_cache(IdentityCache::lazy().build())
    .build();

// uses its own identity cache
let c2 = aws_sdk_bar::Client::from_conf(modified_config);
c2.bar_operation().send().await;

Interaction with operation config override

How per-operation configuration override behaves depends on what is provided as the identity resolver.

let config = aws_config::defaults(BehaviorVersion::latest())
    .load()
    .await;

let c1 = aws_sdk_foo::Client::new(&config);

let scoped_creds = my_custom_provider();
let config_override = c1
        .config()
        .to_builder()
        .credentials_provider(scoped_creds);

// override config for two specific operations

c1.operation1()
    .customize()
    .config_override(config_override.clone())
    .send()
    .await;

c1.operation2()
    .customize()
    .config_override(config_override)
    .send()
    .await;

By default, if an identity resolver does not provide its own cache partition, then operation1 and operation2 will be wrapped in new SharedIdentityResolver instances and get distinct cache partitions. If my_custom_provider() provides its own cache partition, then operation2 will see the results cached by operation1.

Users can control this by wrapping their provider in a SharedCredentialsProvider, which will claim its own cache partition.


let scoped_creds = SharedCredentialsProvider::new(my_custom_provider());
let config_override = c1
        .config()
        .to_builder()
        .set_credentials_provider(Some(scoped_creds));
...

How to actually implement this RFC

In order to implement this RFC, implementations of ResolveIdentity need to be able to provide their own cache partition.

pub trait ResolveIdentity: Send + Sync + Debug {
    ...

    /// Returns the identity cache partition associated with this identity resolver.
    ///
    /// By default this returns `None`, and cache partitioning is left up to `SharedIdentityResolver`.
    /// If shared instances of this type should use the same partition, override this
    /// method and return a claimed partition.
    fn cache_partition(&self) -> Option<IdentityCachePartition> {
        None
    }

}

Crucially, cache partitions must remain globally unique, so this method returns IdentityCachePartition, which is unique by construction. It doesn't matter whether partitions are claimed early by an implementation of ResolveIdentity or at the time they are wrapped in SharedIdentityResolver.

This is because SdkConfig stores instances of SharedCredentialsProvider (or SharedTokenProvider) rather than SharedIdentityResolver, which is what currently knows about cache partitioning. By allowing implementations of ResolveIdentity to provide their own partition, SharedCredentialsProvider can claim a partition at construction time and return it, which re-uses the same partition anywhere that the provider is shared.

#[derive(Clone, Debug)]
pub struct SharedCredentialsProvider(Arc<dyn ProvideCredentials>, IdentityCachePartition);

impl SharedCredentialsProvider {
    pub fn new(provider: impl ProvideCredentials + 'static) -> Self {
        Self(Arc::new(provider), IdentityCachePartition::new())
    }
}

impl ResolveIdentity for SharedCredentialsProvider {
    ...

    fn cache_partition(&self) -> Option<IdentityCachePartition> {
        Some(self.1)
    }
}
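The sketch above assumes IdentityCachePartition values are cheap to copy and unique by construction. As a purely illustrative example (the actual runtime implementation may differ), such uniqueness could be achieved with a global atomic counter:

use std::sync::atomic::{AtomicUsize, Ordering};

// Every call to `new()` hands out a fresh value, so two partitions only compare
// equal if one was copied from the other.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct IdentityCachePartition(usize);

impl IdentityCachePartition {
    pub fn new() -> Self {
        static NEXT: AtomicUsize = AtomicUsize::new(0);
        Self(NEXT.fetch_add(1, Ordering::Relaxed))
    }
}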

Additionally a new behavior version must be introduced that conditionally creates a default IdentityCache on SdkConfig if not explicitly configured (similar to how credentials provider works internally).

Alternatives Considered

SdkConfig internally stores SharedCredentialsProvider/SharedTokenProvider. Neither of these types knows anything about cache partitioning. One alternative would be to create and store a SharedIdentityResolver for each identity resolver type.

pub struct SdkConfig {
    ...
    credentials_provider: Option<SharedCredentialsProvider>,
    credentials_identity_provider: Option<SharedIdentityResolver>,
    token_provider: Option<SharedTokenProvider>,
    token_identity_provider: Option<SharedIdentityResolver>,
}

Setting one of the identity resolver types, like credentials_provider, would also create and set the equivalent SharedIdentityResolver, which would claim a cache partition. When generating the From<SdkConfig> implementations, the identity resolver type would be favored.

There are a few downsides to this approach:

  1. SdkConfig would have to expose accessor methods for the equivalents (e.g. credentials_identity_provider(&self) -> Option<&SharedIdentityResolver>). This creates additional noise and confusion, as well as the chance of the types being used incorrectly.
  2. Every new identity type added to SdkConfig would have to be sure to use SharedIdentityResolver.

The advantage of the proposed approach of letting ResolveIdentity implementations provide a cache partition means SdkConfig does not need to change. It also gives customers more control over whether an identity resolver implementation shares a cache partition or not.

Changes checklist

  • Add new cache_partition() method to ResolveIdentity
  • Update SharedIdentityResolver::new to use the new cache_partition() method on the resolver to determine if a new cache partition should be created or not
  • Claim a cache partition when SharedCredentialsProvider is created and override the new ResolveIdentity method
  • Claim a cache partition when SharedTokenProvider is created and override the new ResolveIdentity method
  • Introduce new behavior version
  • Conditionally (gated on behavior version) create a new default IdentityCache on SdkConfig if not explicitly configured
  • Add a new no_identity_cache() method to ConfigLoader that marks the identity cache as explicitly unset

RFC: Environment-defined service configuration

Status: RFC

Applies to: client

For a summarized list of proposed changes, see the Changes Checklist section.

In the AWS SDK for Rust today, customers are limited to setting global configuration variables in their environment; they cannot set service-specific variables. Other SDKs and the AWS CLI do allow for setting service-specific variables.

This RFC proposes an implementation that would enable users to set service-specific variables in their environment.

Terminology

  • Global configuration: configuration which will be used for requests to any service. May be overridden by service-specific configuration.
  • Service-specific configuration: configuration which will be used for requests only to a specific service.
  • Configuration variable: A key-value pair that defines configuration e.g. key = value, key: value, KEY=VALUE, etc.
    • Key and value as used in this RFC refer to each half of a configuration variable.
  • Sub-properties: When parsing config variables from a profile file, sub-properties are a newline-delimited list of key-value pairs in an indented block following a <service name>=\n line. For an example, see the Profile File Configuration section of this RFC where sub-properties are declared for two different services.

The user experience if this RFC is implemented

While users can already set global configuration in their environment, this RFC proposes two new ways to set service-specific configuration in their environment.

Environment Variables

When defining service-specific configuration with an environment variable, all keys are formatted like so:

"AWS" + "_" + "<config key in CONST_CASE>" + "_" + "<service ID in CONST_CASE>"

As an example, setting an endpoint URL for different services would look like this:

export AWS_ENDPOINT_URL=http://localhost:4444
export AWS_ENDPOINT_URL_ELASTICBEANSTALK=http://localhost:5555
export AWS_ENDPOINT_URL_DYNAMODB=http://localhost:6666

The first variable sets a global endpoint URL. The second variable overrides the first variable, but only for the Elastic Beanstalk service. The third variable overrides the first variable, but only for the DynamoDB service.

Profile File Configuration

When defining service-specific configuration in a profile file, it looks like this:

[profile dev]
services = testing-s3-and-eb
endpoint_url = http://localhost:9000

[services testing-s3-and-eb]
s3 =
  endpoint_url = http://localhost:4567
elasticbeanstalk =
  endpoint_url = http://localhost:8000

When dev is the active profile, all services will use the http://localhost:9000 endpoint URL except where it is overridden. Because the dev profile references the testing-s3-and-eb services, and because two service-specific endpoint URLs are set, those URLs will override the http://localhost:9000 endpoint URL when making requests to S3 (http://localhost:4567) and Elastic Beanstalk (http://localhost:8000).

Configuration Precedence

When configuration is set in multiple places, the value used is determined in this order of precedence:

highest precedence

  1. EXISTING Programmatic client configuration
  2. NEW Service-specific environment variables
  3. EXISTING Global environment variables
  4. NEW Service-specific profile file variables in the active profile
  5. EXISTING Global profile file variables in the active profile

lowest precedence
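As a hedged illustration of this ordering (a sketch only; the endpoint values refer back to the environment variables exported earlier):

// Assuming AWS_ENDPOINT_URL and AWS_ENDPOINT_URL_DYNAMODB are exported as shown above.
let config = aws_config::defaults(BehaviorVersion::latest()).load().await;

// Programmatic client configuration (1) takes precedence over the service-specific
// environment variable (2), which in turn beats the global environment variable (3).
let dynamodb_config = aws_sdk_dynamodb::config::Builder::from(&config)
    .endpoint_url("http://localhost:7777")
    .build();
let dynamodb_client = aws_sdk_dynamodb::Client::from_conf(dynamodb_config);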

How to actually implement this RFC

This RFC may be implemented in several steps which are detailed below.

Sourcing service-specific config from the environment and profile

aws_config::profile::parser::ProfileSet is responsible for storing the active profile and all profile configuration data. Currently, it only tracks sso_session and profile sections, so it must be updated to store arbitrary sections, their properties, and sub-properties. These sections will be publicly accessible via a new method ProfileSet::other_sections which returns a ref to a Properties struct.

The Properties struct is defined as follows:

type SectionKey = String;
type SectionName = String;
type PropertyName = String;
type SubPropertyName = String;
type PropertyValue = String;

/// A key for a property value.
///
/// ```txt
/// # An example AWS profile config section with properties and sub-properties
/// [section-key section-name]
/// property-name = property-value
/// property-name =
///   sub-property-name = property-value
/// ```
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct PropertiesKey {
    section_key: SectionKey,
    section_name: SectionName,
    property_name: PropertyName,
    sub_property_name: Option<SubPropertyName>,
}

impl PropertiesKey {
    /// Create a new builder for a `PropertiesKey`.
    pub fn builder() -> Builder {
        Default::default()
    }
}

// The builder code is omitted from this RFC. It allows users to set each field
// individually and then build a PropertiesKey

/// A map of [`PropertiesKey`]s to property values.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
pub struct Properties {
    inner: HashMap<PropertiesKey, PropertyValue>,
}

impl Properties {
    /// Create a new empty [`Properties`].
    pub fn new() -> Self {
        Default::default()
    }

    #[cfg(test)]
    pub(crate) fn new_from_slice(slice: &[(PropertiesKey, PropertyValue)]) -> Self {
        let mut properties = Self::new();
        for (key, value) in slice {
            properties.insert(key.clone(), value.clone());
        }
        properties
    }

    /// Insert a new key/value pair into this map.
    pub fn insert(&mut self, properties_key: PropertiesKey, value: PropertyValue) {
        let _ = self
            .inner
            // If we don't clone then we don't get to log a useful warning for a value getting overwritten.
            .entry(properties_key.clone())
            .and_modify(|v| {
                tracing::trace!("overwriting {properties_key}: was {v}, now {value}");
                *v = value.clone();
            })
            .or_insert(value);
    }

    /// Given a [`PropertiesKey`], return the corresponding value, if any.
    pub fn get(&self, properties_key: &PropertiesKey) -> Option<&PropertyValue> {
        self.inner.get(properties_key)
    }
}
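Assuming the omitted builder exposes one setter per field (the method names below are hypothetical), storing and retrieving the s3 sub-property from the earlier profile example might look like this:

// Key for: [services testing-s3-and-eb] -> s3 -> endpoint_url = http://localhost:4567
let key = PropertiesKey::builder()
    .section_key("services")
    .section_name("testing-s3-and-eb")
    .property_name("s3")
    .sub_property_name("endpoint_url")
    .build()
    .expect("all fields set");

let mut properties = Properties::new();
properties.insert(key.clone(), "http://localhost:4567".to_string());
assert_eq!(
    properties.get(&key).map(String::as_str),
    Some("http://localhost:4567")
);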

The aws_config::env module remains unchanged. It already provides all the necessary functionality.

Exposing valid service configuration during <service>::Config construction

Environment variables (from Env) and profile variables (from EnvConfigSections) must be available during the conversion of SdkConfig to <service>::Config. To accomplish this, we'll define a new trait LoadServiceConfig and implement it for EnvServiceConfig which will be stored in the SdkConfig struct.

/// A struct used with the [`LoadServiceConfig`] trait to extract service config from the user's environment.
// [profile active-profile]
// services = dev
//
// [services dev]
// service-id =
//   config-key = config-value
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct ServiceConfigKey<'a> {
    service_id: &'a str,
    profile: &'a str,
    env: &'a str,
}

impl<'a> ServiceConfigKey<'a> {
    /// Create a new [`ServiceConfigKey`] builder struct.
    pub fn builder() -> builder::Builder<'a> {
        Default::default()
    }
    /// Get the service ID.
    pub fn service_id(&self) -> &'a str {
        self.service_id
    }
    /// Get the profile key.
    pub fn profile(&self) -> &'a str {
        self.profile
    }
    /// Get the environment key.
    pub fn env(&self) -> &'a str {
        self.env
    }
}

/// Implementers of this trait can provide service config defined in a user's environment.
pub trait LoadServiceConfig: fmt::Debug + Send + Sync {
    /// Given a [`ServiceConfigKey`], return the value associated with it.
    fn load_config(&self, key: ServiceConfigKey<'_>) -> Option<String>;
}

#[derive(Debug)]
pub(crate) struct EnvServiceConfig {
    pub(crate) env: Env,
    pub(crate) env_config_sections: EnvConfigSections,
}

impl LoadServiceConfig for EnvServiceConfig {
    fn load_config(&self, key: ServiceConfigKey<'_>) -> Option<String> {
        let (value, _source) = EnvConfigValue::new()
            .env(key.env())
            .profile(key.profile())
            .service_id(key.service_id())
            .load(&self.env, Some(&self.env_config_sections))?;

        Some(value.to_string())
    }
}

Code generation

We need to check two things when constructing the service config:

  • The service's ID
  • The service's supported configuration variables

We only have this information once we get to the service level. Because of that, we must use code generation to define:

  • What config to look for in the environment
  • How to validate that config

Codegen for configuration must be updated for all config variables that we want to support. As an example, here's how we'd update the RegionDecorator to check for service-specific regions:

class RegionDecorator : ClientCodegenDecorator {
    // ...
    override fun extraSections(codegenContext: ClientCodegenContext): List<AdHocCustomization> {
        return usesRegion(codegenContext).thenSingletonListOf {
            adhocCustomization<SdkConfigSection.CopySdkConfigToClientConfig> { section ->
                rust(
                    """
                    ${section.serviceConfigBuilder}.set_region(
                        ${section.sdkConfig}
                            .service_config()
                            .and_then(|conf| {
                                conf.load_config(service_config_key($envKey, $profileKey))
                                    .map(Region::new)
                            })
                            .or_else(|| ${section.sdkConfig}.region().cloned()),
                    );
                    """,
                )
            }
        }
    }
    // ...

To construct the keys necessary to locate the service-specific configuration, we generate a service_config_key function for each service crate:

class ServiceEnvConfigDecorator : ClientCodegenDecorator {
    override val name: String = "ServiceEnvConfigDecorator"
    override val order: Byte = 10

    override fun extras(
        codegenContext: ClientCodegenContext,
        rustCrate: RustCrate,
    ) {
        val rc = codegenContext.runtimeConfig
        val serviceId = codegenContext.serviceShape.sdkId().toSnakeCase().dq()
        rustCrate.withModule(ClientRustModule.config) {
            Attribute.AllowDeadCode.render(this)
            rustTemplate(
                """
                fn service_config_key<'a>(
                    env: &'a str,
                    profile: &'a str,
                ) -> aws_types::service_config::ServiceConfigKey<'a> {
                    #{ServiceConfigKey}::builder()
                        .service_id($serviceId)
                        .env(env)
                        .profile(profile)
                        .build()
                        .expect("all field sets explicitly, can't fail")
                }
                """,
                "ServiceConfigKey" to AwsRuntimeType.awsTypes(rc).resolve("service_config::ServiceConfigKey"),
            )
        }
    }
}

Changes checklist

  • In aws-types:
    • Add new service_config: Option<Arc<dyn LoadServiceConfig>> field to SdkConfig and builder.
    • Add setters and getters for the new service_config field.
    • Add a new service_config module.
      • Add new ServiceConfigKey struct and builder.
      • Add new LoadServiceConfig trait.
  • In aws-config:
    • Move profile parsing out of aws-config into aws-runtime.
    • Deprecate the aws-config reëxports and direct users to aws-runtime.
    • Add a new EnvServiceConfig struct and implement LoadServiceConfig for it.
    • Update ConfigLoader to set the service_config field in SdkConfig.
    • Update all default providers to use the new EnvConfigValue::validate method.
  • In aws-runtime:
    • Rename all profile-related code moved from aws-config to aws-runtime so that it's easier to understand in light of the API changes we're making.
    • Add a new struct PropertiesKey and Properties to store profile data.
  • Add an integration test that ensures service-specific config has the expected precedence.
  • Update codegen to generate a method to easily construct ServiceConfigKeys.
  • Update codegen to generate code that loads service-specific config from the environment for a limited initial set of config variables:
    • Region
    • Endpoint URL
    • Endpoint-related "built-ins" like use_arn_region and disable_multi_region_access_points.
  • Write a guide for users.
    • Explain to users how they can determine a service's ID.

Contributing

This is a collection of written resources for smithy-rs and SDK contributors.

Writing and debugging a low-level feature that relies on HTTP

Background

This article came about as a result of all the difficulties I encountered while developing the request checksums feature laid out in the internal-only Flexible Checksums spec (the feature is also highlighted in this public blog post.) I spent much more time developing the feature than I had anticipated. In this article, I'll talk about:

  • How the SDK sends requests with a body
  • How the SDK sends requests with a streaming body
  • The various issues I encountered and how I addressed them
  • Key takeaways for contributors developing similar low-level features

How the SDK sends requests with a body

All interactions between the SDK and a service are modeled as "operations". Operations contain:

  • A base HTTP request (with a potentially streaming body)
  • A typed property bag of configuration options
  • A fully generic response handler

Users create operations piecemeal with a fluent builder. The options set in the builder are then used to create the inner HTTP request, becoming headers or triggering specific request-building functionality (in this case, calculating a checksum and attaching it either as a header or a trailer).

Here's an example from the QLDB SDK of creating a body from inputs and inserting it into the request to be sent:

let body = aws_smithy_http::body::SdkBody::from(
    crate::operation_ser::serialize_operation_crate_operation_send_command(&self)?,
);

if let Some(content_length) = body.content_length() {
    request = aws_smithy_http::header::set_request_header_if_absent(
        request,
        http::header::CONTENT_LENGTH,
        content_length,
    );
}
let request = request.body(body).expect("should be valid request");

Almost all request body creation in the SDKs looks like that. Note how it automatically sets the Content-Length header whenever the size of the body is known; this will be relevant later. The body is read into memory and can be inspected before the request is sent. This allows for things like calculating a checksum and then inserting it into the request as a header.
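As a hedged sketch of that last point (not the SDK's actual checksum code, and the helper name is made up), hashing an in-memory body and attaching the result as a header might look like this:

use base64::Engine as _;
use sha2::{Digest, Sha256};

// Hypothetical helper: hash the in-memory body, base64-encode the digest, and
// attach it to the request as a header before dispatch.
fn attach_sha256_checksum(request: &mut http::Request<()>, body_bytes: &[u8]) {
    let digest = Sha256::digest(body_bytes);
    let encoded = base64::engine::general_purpose::STANDARD.encode(digest);
    request.headers_mut().insert(
        http::header::HeaderName::from_static("x-amz-checksum-sha256"),
        http::header::HeaderValue::from_str(&encoded).expect("base64 is a valid header value"),
    );
}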

How the SDK sends requests with a streaming body

Often, sending a request with a streaming body looks much the same. However, it's not possible to read a streaming body until you've sent the request. Any metadata that needs to be calculated by inspecting the body must be sent as trailers. Additionally, some metadata, like Content-Length, can't be sent as a trailer at all. MDN maintains a helpful list of metadata that can only be sent as a header.

// When trailers are set, we must send an AWS-specific header that lists them named `x-amz-trailer`.
// For example, when sending a SHA256 checksum as a trailer,
// we have to send an `x-amz-trailer` header telling the service to watch out for it:
request
    .headers_mut()
    .insert(
        http::header::HeaderName::from_static("x-amz-trailer"),
        http::header::HeaderValue::from_static("x-amz-checksum-sha256"),
    );

The issues I encountered while implementing checksums for streaming request bodies

Content-Encoding: aws-chunked

When sending a request body with trailers, we must use an AWS-specific content encoding called aws-chunked. Encoding a request body as aws-chunked requires us to know the length of each chunk we're going to send before we send it. We have to prefix each chunk with its size in bytes, represented by one or more hexadecimal digits. To close the body, we send a final chunk with a size of zero. For example, the body "Hello world" would look like this when encoded:

B\r\n
Hello world\r\n
0\r\n

When sending a request body encoded in this way, we need to set two length headers:

  • Content-Length is the length of the entire request body, including the chunk size prefix and zero terminator. In the example above, this would be 19.
  • x-amz-decoded-content-length is the length of the decoded request body. In the example above, this would be 11.

NOTE: Content-Encoding is distinct from Transfer-Encoding. It's possible to construct a request with both Content-Encoding: chunked AND Transfer-Encoding: chunked, although we don't ever need to do that for SDK requests.
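Putting the two length headers into practice for the "Hello world" example (a hedged sketch, assuming request is the http::Request being built, as in the earlier snippets):

// Encoded body:  "B\r\nHello world\r\n0\r\n" -> 19 bytes (Content-Length)
// Decoded body:  "Hello world"               -> 11 bytes (x-amz-decoded-content-length)
let decoded_body = "Hello world";
let encoded_body = format!("{:X}\r\n{}\r\n0\r\n", decoded_body.len(), decoded_body);
assert_eq!(encoded_body.len(), 19);

request.headers_mut().insert(
    http::header::CONTENT_LENGTH,
    http::header::HeaderValue::from(encoded_body.len()),
);
request.headers_mut().insert(
    http::header::HeaderName::from_static("x-amz-decoded-content-length"),
    http::header::HeaderValue::from(decoded_body.len()),
);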

S3 requires a Content-Length unless you also set Transfer-Encoding: chunked

S3 does not require you to send a Content-Length header if you set the Transfer-Encoding: chunked header. That's very helpful because it's not always possible to know the total length of a stream of bytes if that's what you're constructing your request body from. However, when sending trailers, this part of the spec can be misleading.

  1. When sending a streaming request, we must send metadata like checksums as trailers
  2. To send a request body with trailers, we must set the Content-Encoding: aws-chunked header
  3. When using aws-chunked encoding for a request body, we must set the x-amz-decoded-content-length header with the pre-encoding length of the request body.

This means that we can't actually avoid having to know and specify the length of the request body when sending a request to S3. This turns out to not be much of a problem in common use of the SDKs because most streaming request bodies are constructed from files. In these cases we can ask the operating system for the file size before sending the request, as sketched below. So long as that size doesn't change while the request is being sent, all is well. In any other case, the request will fail.
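For the file-backed case, a hedged sketch using the ByteStream API as it existed at the time of writing:

// `from_path` asks the operating system for the file's size up front, so the SDK
// knows the body length before the request is sent. If the file changes size while
// the request is in flight, the request will fail.
let body = aws_sdk_s3::types::ByteStream::from_path("my-large-file.bin")
    .await
    .expect("file exists and is readable");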

Adding trailers to a request changes the size of that request

Headers don't count towards the size of a request body, but trailers do. That means we need to take trailers (which aren't sent until after the body) into account when setting the Content-Length header (which is sent before the body). This means that without setting Transfer-Encoding: chunked, the SDKs only support trailers of known length. In the case of checksums, we're lucky because they're always going to be the same size. We must also take into account the fact that checksum values are base64 encoded before being set (this lengthens them).
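As a quick arithmetic aside (glossing over the exact trailer serialization): a SHA-256 digest is always 32 bytes, and base64 always expands it to 44 characters, which is why the checksum trailer's length is known ahead of time.

let digest_len: usize = 32; // SHA-256 always produces a 32-byte digest
let base64_len = (digest_len + 2) / 3 * 4; // base64 turns every 3 bytes into 4 characters
assert_eq!(base64_len, 44); // the trailer value's length never changes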

hyper supports HTTP request trailers but isn't compatible with Content-Encoding: aws-chunked

This was a big source of confusion for me, and I only figured out what was happening with the help of @seanmonstar. When using aws-chunked encoding, the trailers have to be appended to the body as part of poll_data instead of relying on the poll_trailers method. The working http_body::Body implementation of an aws-chunked encoded body looked like this:

impl Body for AwsChunkedBody<Inner> {
    type Data = Bytes;
    type Error = aws_smithy_http::body::Error;

    fn poll_data(
        self: Pin<&mut Self>,
        cx: &mut Context<'_>,
    ) -> Poll<Option<Result<Self::Data, Self::Error>>> {
        let this = self.project();
        if *this.already_wrote_trailers {
            return Poll::Ready(None);
        }

        if *this.already_wrote_chunk_terminator {
            return match this.inner.poll_trailers(cx) {
                Poll::Ready(Ok(trailers)) => {
                    *this.already_wrote_trailers = true;
                    let total_length_of_trailers_in_bytes = this.options.trailer_lens.iter().sum();

                    Poll::Ready(Some(Ok(trailers_as_aws_chunked_bytes(
                        total_length_of_trailers_in_bytes,
                        trailers,
                    ))))
                }
                Poll::Pending => Poll::Pending,
                Poll::Ready(Err(e)) => Poll::Ready(Some(Err(e))),
            };
        };

        match this.inner.poll_data(cx) {
            Poll::Ready(Some(Ok(mut data))) => {
                let bytes = if *this.already_wrote_chunk_size_prefix {
                    data.copy_to_bytes(data.len())
                } else {
                    // A chunk must be prefixed by chunk size in hexadecimal
                    *this.already_wrote_chunk_size_prefix = true;
                    let total_chunk_size = this
                        .options
                        .chunk_length
                        .or(this.options.stream_length)
                        .unwrap_or_default();
                    prefix_with_total_chunk_size(data, total_chunk_size)
                };

                Poll::Ready(Some(Ok(bytes)))
            }
            Poll::Ready(None) => {
                *this.already_wrote_chunk_terminator = true;
                Poll::Ready(Some(Ok(Bytes::from("\r\n0\r\n"))))
            }
            Poll::Ready(Some(Err(e))) => Poll::Ready(Some(Err(e))),
            Poll::Pending => Poll::Pending,
        }
    }

    fn poll_trailers(
        self: Pin<&mut Self>,
        _cx: &mut Context<'_>,
    ) -> Poll<Result<Option<HeaderMap<HeaderValue>>, Self::Error>> {
        // When using aws-chunked content encoding, trailers have to be appended to the body
        Poll::Ready(Ok(None))
    }

    fn is_end_stream(&self) -> bool {
        self.already_wrote_trailers
    }

    fn size_hint(&self) -> SizeHint {
        SizeHint::with_exact(
            self.encoded_length()
                .expect("Requests made with aws-chunked encoding must have known size")
                as u64,
        )
    }
}

"The stream is closing early, and I don't know why"

In my early implementation of http_body::Body for an aws-chunked encoded body, the body wasn't being completely read out. The problem turned out to be that I was delegating to the is_end_stream trait method of the inner body. Because the innermost body had no knowledge of the trailers I needed to send, it was reporting that the stream had ended. The fix was to instead rely on the outermost body's knowledge of its own state in order to determine if all data had been read.

What helped me to understand the problems and their solutions

  • Reaching out to others that had specific knowledge of a problem: Talking to a developer that had tackled this feature for another SDK was a big help. Special thanks is due to @jasdel and the Go v2 SDK team. Their implementation of an aws-chunked encoded body was the basis for my own implementation.

  • Avoiding codegen: The process of updating codegen code and then running codegen for each new change you make is slow compared to running codegen once at the beginning of development and then just manually editing the generated SDK as necessary. I still needed to run ./gradlew :aws:sdk:relocateAwsRuntime :aws:sdk:relocateRuntime whenever I made changes to a runtime crate but that was quick because it's just copying the files. Keep as much code out of codegen as possible. It's much easier to modify/debug Rust than it is to write a working codegen module that does the same thing. Whenever possible, write the codegen modules later, once the design has settled.

  • Using the Display impl for errors: The Display impl for an error can often contain helpful info that might not be visible when printing with the Debug impl. Case in point was an error I was getting because of the is_end_stream issue. When Debug printed, the error looked like this:

    DispatchFailure(ConnectorError { err: hyper::Error(User(Body), hyper::Error(BodyWriteAborted)), kind: User })

    That wasn't too helpful for me on its own. I looked into the hyper source code and found that the Display impl contained a helpful message, so I matched into the error and printed the hyper::Error with the Display impl:

    user body write aborted: early end, expected 2 more bytes
    

    This helped me understand that I wasn't encoding things correctly and was missing a CRLF.

  • Echo Server: I first used netcat and then later a small echo server written in Rust to see the raw HTTP request being sent out by the SDK as I was working on it. The Rust SDK supports setting custom endpoints for requests. This is often used to send requests to something like LocalStack, but I used it to send requests to localhost instead:

    #[tokio::test]
    async fn test_checksum_on_streaming_request_against_s3() {
        let sdk_config = aws_config::from_env()
            .endpoint_resolver(Endpoint::immutable("http://localhost:8080".parse().expect("valid URI")))
            .load().await;
        let s3_client = aws_sdk_s3::Client::new(&sdk_config);
    
        let input_text = b"Hello world";
        let _res = s3_client
            .put_object()
            .bucket("some-real-bucket")
            .key("test.txt")
            .body(aws_sdk_s3::types::ByteStream::from_static(input_text))
            .checksum_algorithm(ChecksumAlgorithm::Sha256)
            .send()
            .await
            .unwrap();
    }

    The echo server was based off of an axum example and looked like this:

    use axum::{
      body::{Body, Bytes},
      http::{request::Parts, Request, StatusCode},
      middleware::{self, Next},
      response::IntoResponse,
      routing::put,
      Router,
    };
    use std::net::SocketAddr;
    use tracing_subscriber::{layer::SubscriberExt, util::SubscriberInitExt};
    
    #[tokio::main]
    async fn main() {
      tracing_subscriber::registry().with(tracing_subscriber::EnvFilter::new(
        std::env::var("RUST_LOG").unwrap_or_else(|_| "trace".into()),
      ))
      .with(tracing_subscriber::fmt::layer())
      .init();
    
      let app = Router::new()
          .route("/", put(|| async move { "200 OK" }))
          .layer(middleware::from_fn(print_request_response));
    
      let addr = SocketAddr::from(([127, 0, 0, 1], 3000));
      tracing::debug!("listening on {}", addr);
      axum::Server::bind(&addr)
          .serve(app.into_make_service())
          .await
          .unwrap();
    }
    
    async fn print_request_response(
      req: Request<Body>,
      next: Next<Body>,
    ) -> Result<impl IntoResponse, (StatusCode, String)> {
        let (parts, body) = req.into_parts();
    
        print_parts(&parts).await;
        let bytes = buffer_and_print("request", body).await?;
        let req = Request::from_parts(parts, Body::from(bytes));
    
        let res = next.run(req).await;
    
        Ok(res)
    }
    
    async fn print_parts(parts: &Parts) {
        tracing::debug!("{:#?}", parts);
    }
    
    async fn buffer_and_print<B>(direction: &str, body: B) -> Result<Bytes, (StatusCode, String)>
    where
      B: axum::body::HttpBody<Data = Bytes>,
      B::Error: std::fmt::Display,
    {
        let bytes = match hyper::body::to_bytes(body).await {
            Ok(bytes) => bytes,
            Err(err) => {
                return Err((
                    StatusCode::BAD_REQUEST,
                    format!("failed to read {} body: {}", direction, err),
                ));
            }
        };
    
        if let Ok(body) = std::str::from_utf8(&bytes) {
            tracing::debug!("{} body = {:?}", direction, body);
        }
    
        Ok(bytes)
    }
