RFC: Constraint traits

Status: Implemented.

See the description of the PR that laid the foundation for the implementation of constraint traits for a complete reference. See the Better Constraint Violations RFC too for subsequent improvements to this design.

See the uber tracking issue for pending work.

Constraint traits are used to constrain the values that can be provided for a shape.

For example, given the following Smithy model,

@range(min: 18)
integer Age

the integer Age must take values greater than or equal to 18.

Constraint traits are most useful when enforced as part of input model validation to a service. When a server receives a request whose contents deserialize to input data that violates the modeled constraints, the operation execution's preconditions are not met, and as such rejecting the request without executing the operation is expected behavior.

Constraint traits can also be applied to operation output member shapes, but the expectation is that service implementations not fail to render a response when an output value does not meet the specified constraints. From awslabs/smithy#1039:

This might seem counterintuitive, but our philosophy is that a change in server-side state should not be hidden from the caller unless absolutely necessary. Refusing to service an invalid request should always prevent server-side state changes, but refusing to send a response will not, as there's generally no reasonable route for a server implementation to unwind state changes due to a response serialization failure.

In general, clients should not enforce constraint traits in generated code. Clients must also never enforce constraint traits when sending requests. This is because:

- addition and removal of constraint traits are backwards-compatible from a client's perspective (although this is not documented anywhere in the Smithy specification);
- the client may have been generated with an older version of the model; and
- the most recent model version might have lifted some constraints.

On the other hand, server SDKs constitute the source of truth for the service's behavior, so they interpret the model in all its strictness.

The Smithy spec defines 8 constraint traits: enum, idRef, length, pattern, private, range, required, and uniqueItems.

The idRef and private traits are enforced at SDK generation time by the awslabs/smithy libraries and bear no relation to generated Rust code.

The only constraint trait that smithy-rs clients enforce in generated code, and the only one they should enforce, is the enum trait, which renders Rust enums.

The required trait is already enforced, by smithy-rs servers only, since #1148.

That leaves 4 traits: length, pattern, range, and uniqueItems.

Implementation

This section addresses how to implement and enforce the length, pattern, range, and uniqueItems traits. We will use the length trait applied to a string shape as a running example. The implementation of this trait mostly carries over to the other three.

Example implementation for the `length` trait

Consider the following Smithy model:

@length(min: 1, max: 69)
string NiceString

The central idea behind the implementation of constraint traits is: parse, don't validate. Instead of code-generating a Rust String to represent NiceString values and performing the validation at request deserialization, we can leverage Rust's type system to guarantee domain invariants. We can generate a wrapper tuple struct that parses the string's value and is "tight" in the set of values it can accept:

pub struct NiceString(String);

impl TryFrom<String> for NiceString {
    type Error = nice_string::ConstraintViolation;

    fn try_from(value: String) -> Result<Self, Self::Error> {
        let num_code_points = value.chars().count();
        if 1 <= num_code_points && num_code_points <= 69 {
            Ok(Self(value))
        } else {
            Err(nice_string::ConstraintViolation::Length(num_code_points))
        }
    }
}

(Note that we're using the linear time check chars().count() instead of len() on the input value, since the Smithy specification says the length trait counts the number of Unicode code points when applied to string shapes.)

The goal is to enforce, at the type-system level, that these constrained structs always hold valid data. It should be impossible for the service implementer, without resorting to unsafe Rust, to construct a NiceString that violates the model. The actual check is performed in the implementation of TryFrom<InnerType> for the generated struct, which makes it convenient to use the ? operator for error propagation. Each constrained struct will have a related std::error::Error enum type to signal the first parsing failure, with one enum variant per applied constraint trait:

pub mod nice_string {
    pub enum ConstraintViolation {
        /// Validation error holding the number of Unicode code points found, when a value between `1` and
        /// `69` (inclusive) was expected.
        Length(usize),
    }

    impl std::error::Error for ConstraintViolation {}
}

std::error::Error requires Display and Debug. We will #[derive(Debug)], unless the shape also has the sensitive trait, in which case we will just print the name of the struct:

impl std::fmt::Debug for ConstraintViolation {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        let mut formatter = f.debug_struct("ConstraintViolation");
        formatter.finish()
    }
}

Display is used to produce human-friendlier representations. Its implementation might be called when formatting a 400 HTTP response message in certain protocols, for example.
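To show how these pieces fit together, here is a minimal, self-contained sketch. It is not the code smithy-rs actually generates: the wording of the Display message is an assumption made for this example, and the shape is assumed not to carry the sensitive trait, so Debug is simply derived.

```rust
use std::convert::TryFrom;
use std::fmt;

pub mod nice_string {
    /// One variant per constraint trait applied to the shape.
    #[derive(Debug, PartialEq)]
    pub enum ConstraintViolation {
        /// Number of Unicode code points found.
        Length(usize),
    }
}

impl fmt::Display for nice_string::ConstraintViolation {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Hypothetical wording, chosen for this example only.
            nice_string::ConstraintViolation::Length(found) => write!(
                f,
                "expected between 1 and 69 Unicode code points (inclusive), found {}",
                found
            ),
        }
    }
}

// `Error` only requires `Debug` and `Display`, both provided above.
impl std::error::Error for nice_string::ConstraintViolation {}

#[derive(Debug, PartialEq)]
pub struct NiceString(String);

impl TryFrom<String> for NiceString {
    type Error = nice_string::ConstraintViolation;

    fn try_from(value: String) -> Result<Self, Self::Error> {
        // Count code points, not bytes, as the `length` trait requires for strings.
        let num_code_points = value.chars().count();
        if (1..=69).contains(&num_code_points) {
            Ok(Self(value))
        } else {
            Err(nice_string::ConstraintViolation::Length(num_code_points))
        }
    }
}
```

Because the violation type implements std::error::Error, `NiceString::try_from(raw)?` can be used in any function whose error type converts from it, such as `Box<dyn std::error::Error>`.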

Request deserialization

We will continue to deserialize the different parts of the HTTP message into the regular Rust standard library types. However, just before the deserialization function returns, we will convert the type into the wrapper tuple struct that will eventually be handed over to the operation handler. This is what we're already doing when deserializing strings into enums. For example, given the Smithy model:

@enum([
    { name: "Spanish", value: "es" },
    { name: "English", value: "en" },
    { name: "Japanese", value: "jp" },
])
string Language

the code the client generates when deserializing a string from a JSON document into the Language enum is (excerpt):

...
match key.to_unescaped()?.as_ref() {
    "language" => {
        builder = builder.set_language(
            aws_smithy_json::deserialize::token::expect_string_or_null(
                tokens.next(),
            )?
            .map(|s| {
                s.to_unescaped()
                    .map(|u| crate::model::Language::from(u.as_ref()))
            })
            .transpose()?,
        );
    }
    _ => aws_smithy_json::deserialize::token::skip_value(tokens)?,
}
...

Note how the String gets converted to the enum via Language::from().

impl std::convert::From<&str> for Language {
    fn from(s: &str) -> Self {
        match s {
            "es" => Language::Spanish,
            "en" => Language::English,
            "jp" => Language::Japanese,
            other => Language::Unknown(other.to_owned()),
        }
    }
}

For constrained shapes we would do the same to parse the inner deserialized value into the wrapper tuple struct, except for these differences:

- For enums, the client generates an Unknown variant that "contains new variants that have been added since this code was generated". The server does not need such a variant (#1187).
- Conversions into the tuple struct are fallible (try_from() instead of from()). These errors will result in a my_struct::ConstraintViolation.
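The fallible conversion described above could look roughly like the following sketch. The function `parse_nice_string_member` is hypothetical and stands in for the last step of a generated deserializer; it is not an actual smithy-rs API. It mirrors the `.map(...).transpose()` shape of the enum excerpt, with `try_from` in place of `from`.

```rust
use std::convert::TryFrom;

pub mod nice_string {
    #[derive(Debug, PartialEq)]
    pub enum ConstraintViolation {
        /// Number of Unicode code points found.
        Length(usize),
    }
}

#[derive(Debug, PartialEq)]
pub struct NiceString(String);

impl TryFrom<String> for NiceString {
    type Error = nice_string::ConstraintViolation;

    fn try_from(value: String) -> Result<Self, Self::Error> {
        let num_code_points = value.chars().count();
        if (1..=69).contains(&num_code_points) {
            Ok(Self(value))
        } else {
            Err(nice_string::ConstraintViolation::Length(num_code_points))
        }
    }
}

/// Hypothetical final deserialization step: the member has already been read
/// as a plain, optional `String`; only now is it parsed into the constrained type.
pub fn parse_nice_string_member(
    raw: Option<String>,
) -> Result<Option<NiceString>, nice_string::ConstraintViolation> {
    // `Option<Result<_, _>>` becomes `Result<Option<_>, _>`, so an absent
    // member stays `None` while a present but invalid one yields an error.
    raw.map(NiceString::try_from).transpose()
}
```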

`length` trait

We will enforce the length constraint by calling len() on Rust's Vec (list and set shapes), HashMap (map shapes) and our aws_smithy_types::Blob (blob shapes).

We will enforce the length constraint trait on String (string shapes) by calling .chars().count().
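For collection shapes the check has the same structure as for strings, only the way the length is obtained differs. A minimal sketch, assuming a hypothetical list shape `NameList` and map shape `Scores`, both with `@length(min: 1, max: 3)`; neither shape nor the flat `ConstraintViolation` enum below comes from this RFC.

```rust
use std::collections::HashMap;
use std::convert::TryFrom;

#[derive(Debug, PartialEq)]
pub enum ConstraintViolation {
    /// Number of elements (or entries) found.
    Length(usize),
}

/// Hypothetical list shape constrained with `@length(min: 1, max: 3)`.
#[derive(Debug, PartialEq)]
pub struct NameList(Vec<String>);

impl TryFrom<Vec<String>> for NameList {
    type Error = ConstraintViolation;

    fn try_from(value: Vec<String>) -> Result<Self, Self::Error> {
        // `Vec::len()` is constant time, unlike the code-point count for strings.
        let length = value.len();
        if (1..=3).contains(&length) {
            Ok(Self(value))
        } else {
            Err(ConstraintViolation::Length(length))
        }
    }
}

/// Hypothetical map shape constrained with `@length(min: 1, max: 3)`.
#[derive(Debug, PartialEq)]
pub struct Scores(HashMap<String, i32>);

impl TryFrom<HashMap<String, i32>> for Scores {
    type Error = ConstraintViolation;

    fn try_from(value: HashMap<String, i32>) -> Result<Self, Self::Error> {
        // For maps, the length is the number of key-value pairs.
        let length = value.len();
        if (1..=3).contains(&length) {
            Ok(Self(value))
        } else {
            Err(ConstraintViolation::Length(length))
        }
    }
}
```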

`pattern` trait

The pattern trait "restricts string shape values to a specified regular expression."

We will implement this by using the regex crate's is_match. We will use once_cell to compile the regex only the first time it is required.

`uniqueItems` trait

The uniqueItems trait "indicates that the items in a List MUST be unique."

If the list shape is sparse, more than one null value violates this constraint.

We will enforce this by copying references to the Vec's elements into a HashSet and checking that the sizes of both containers coincide.
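The check described above can be sketched with the standard library alone. The helper name `has_unique_items` is hypothetical; the sketch assumes element types that implement `Eq` and `Hash`.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Returns `true` when no two elements of `items` compare equal.
pub fn has_unique_items<T: Eq + Hash>(items: &[T]) -> bool {
    // Collect references, so the elements themselves are never cloned.
    let unique: HashSet<&T> = items.iter().collect();
    // Duplicates collapse in the set, so the sizes differ if any exist.
    unique.len() == items.len()
}
```

If a sparse list is represented as a slice of `Option<T>`, two `None` elements compare equal, so more than one null value is reported as a violation, matching the rule quoted above.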

Trait precedence and naming of the tuple struct

From the spec:

Some constraints can be applied to shapes as well as structure members. If a constraint of the same type is applied to a structure member and the shape that the member targets, the trait applied to the member takes precedence.

structure ShoppingCart {
    @range(min: 7, max: 12)
    numberOfItems: PositiveInteger
}

@range(min: 1)
integer PositiveInteger

In the above example, the range trait applied to numberOfItems takes precedence over the one applied to PositiveInteger. The resolved minimum will be 7, and the maximum 12.

When the constraint trait is applied to a member shape, the tuple struct's name will be the PascalCased name of the member shape, NumberOfItems.
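For the ShoppingCart example, the resulting type might look like the sketch below. It assumes the Smithy integer shape maps to `i32` and that the violation enum carries the offending value; both details are assumptions made for illustration, not taken from the generated code.

```rust
use std::convert::TryFrom;

pub mod number_of_items {
    #[derive(Debug, PartialEq)]
    pub enum ConstraintViolation {
        /// The value that fell outside the resolved range.
        Range(i32),
    }
}

/// Named after the member shape `numberOfItems`, not after its target `PositiveInteger`.
#[derive(Debug, PartialEq)]
pub struct NumberOfItems(i32);

impl TryFrom<i32> for NumberOfItems {
    type Error = number_of_items::ConstraintViolation;

    fn try_from(value: i32) -> Result<Self, Self::Error> {
        // The member-level `@range(min: 7, max: 12)` takes precedence over
        // `@range(min: 1)` on `PositiveInteger`, so only 7..=12 is accepted.
        if (7..=12).contains(&value) {
            Ok(Self(value))
        } else {
            Err(number_of_items::ConstraintViolation::Range(value))
        }
    }
}
```

Note that a value such as 3 satisfies the constraint on PositiveInteger but is still rejected here, because the member's range is the one that applies.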

Unresolved questions

Should we code-generate unsigned integer types (u16, u32, u64) when the range trait is applied with min set to a value greater than or equal to 0?
- A user has even suggested using the std::num::NonZeroUX types (e.g. NonZeroU64) when range is applied with min set to a value greater than 0.
- UPDATE: This requires further design work. There are interoperability concerns: for example, the positive range of a u32 is strictly greater than that of an i32, so clients wouldn't be able to receive values within the non-overlapping range.
In request deserialization, should we fail with the first violation and immediately render a response, or attempt to parse the entire request and provide a complete and structured report?
- UPDATE: We will provide a response containing all violations. See the "Collecting Constraint Violations" section in the Better Constraint Violations RFC.
Should we provide a mechanism for the service implementer to construct a Rust type violating the modeled constraints in their business logic e.g. a T::new_unchecked() constructor? This could be useful (1) when the user knows the provided inner value does not violate the constraints and doesn't want to incur the performance penalty of the check; (2) when the struct is in a transient invalid state. However:
- (2) is arguably a modelling mistake and a separate struct to represent the transient state would be a better approach,
- the user could use unsafe Rust to bypass the validation; and
- adding this constructor is a backwards-compatible change, so it can always be added later if this feature is requested.
- UPDATE: We decided to punt on this until users express interest.
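The interoperability concern raised under the first question can be made concrete with a small sketch. The helper `fits_in_i32` is hypothetical; it only shows that part of the `u32` range cannot be represented by a client that models the same shape as `i32`.

```rust
use std::convert::TryFrom;

/// Returns whether a value a server could hold in a `u32` is representable
/// in the `i32` a client would use for the same shape.
pub fn fits_in_i32(value: u32) -> bool {
    // The conversion fails for every value above `i32::MAX`.
    i32::try_from(value).is_ok()
}
```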

Alternative design

An alternative design with less public API surface would be to perform constraint validation at request deserialization, but hand over a regular "loose" type (e.g. String instead of NiceString) that allows for values violating the constraints. If we were to implement this approach, we could do so by wrapping the incoming value in the aforementioned tuple struct to perform the validation and then immediately unwrapping it.
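A minimal sketch of this wrap-then-unwrap approach follows. The function `validate_nice_string` is a hypothetical name; in this design the wrapper type would stay private to the framework and the handler would only ever see a `String`.

```rust
use std::convert::TryFrom;

pub mod nice_string {
    #[derive(Debug, PartialEq)]
    pub enum ConstraintViolation {
        /// Number of Unicode code points found.
        Length(usize),
    }
}

// Not `pub`: in the alternative design the wrapper is an internal detail.
struct NiceString(String);

impl TryFrom<String> for NiceString {
    type Error = nice_string::ConstraintViolation;

    fn try_from(value: String) -> Result<Self, Self::Error> {
        let num_code_points = value.chars().count();
        if (1..=69).contains(&num_code_points) {
            Ok(Self(value))
        } else {
            Err(nice_string::ConstraintViolation::Length(num_code_points))
        }
    }
}

/// Validates via the wrapper, then immediately hands back the loose type.
pub fn validate_nice_string(value: String) -> Result<String, nice_string::ConstraintViolation> {
    NiceString::try_from(value).map(|parsed| parsed.0)
}
```

The returned `String` carries no proof of validity, so nothing stops later code from replacing it with a value that violates the constraint.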

Comparative advantages:

Validation remains an internal detail of the framework. If the semantics of a constraint trait change, the behavior of the service is still backwards-incompatibly affected, but user code is not.
Less "invasive". Baking validation in the generated type might be deemed as the service framework overreaching responsibilities.

Comparative disadvantages:

It becomes possible to send responses with invalid operation outputs. All the service framework could do is log the validation errors.
It gives up validation at the type-system level, which gets rid of an entire class of logic errors.
Less idiomatic (this is subjective). The pattern of wrapping a more primitive type to guarantee domain invariants is widespread in the Rust ecosystem. The standard library makes use of it extensively.

Note that both designs are backwards incompatible in the sense that you can't migrate from one to the other without breaking user code.

UPDATE: We ended up implementing both designs, adding a flag to opt into the alternative design. Refer to the mentions of the publicConstrainedTypes flag in the description of the Builders of builders PR.
