RFC: Constraint traits
Status: Implemented.
See the description of the PR that laid the foundation for the implementation of constraint traits for a complete reference. See the Better Constraint Violations RFC too for subsequent improvements to this design.
See the uber tracking issue for pending work.
Constraint traits are used to constrain the values that can be provided for a shape.
For example, given the following Smithy model,
@range(min: 18)
integer Age
the integer Age must take values greater than or equal to 18.
Constraint traits are most useful when enforced as part of input model validation to a service. When a server receives a request whose contents deserialize to input data that violates the modeled constraints, the operation execution's preconditions are not met, and as such rejecting the request without executing the operation is expected behavior.
Constraint traits can also be applied to operation output member shapes, but the expectation is that service implementations not fail to render a response when an output value does not meet the specified constraints. From awslabs/smithy#1039:
This might seem counterintuitive, but our philosophy is that a change in server-side state should not be hidden from the caller unless absolutely necessary. Refusing to service an invalid request should always prevent server-side state changes, but refusing to send a response will not, as there's generally no reasonable route for a server implementation to unwind state changes due to a response serialization failure.
In general, clients should not enforce constraint traits in generated code. Clients must also never enforce constraint traits when sending requests. This is because:
- addition and removal of constraint traits are backwards-compatible from a client's perspective (although this is not documented anywhere in the Smithy specification),
- the client may have been generated with an older version of the model; and
- the most recent model version might have lifted some constraints.
On the other hand, server SDKs constitute the source of truth for the service's behavior, so they interpret the model in all its strictness.
The Smithy spec defines 8 constraint traits:
- enum
- idRef
- length
- pattern
- private
- range
- required
- uniqueItems
The idRef and private traits are enforced at SDK generation time by the awslabs/smithy libraries and bear no relation to generated Rust code. The only constraint trait whose enforcement is (and should be) generated by smithy-rs clients is the enum trait, which renders Rust enums.
The required trait is already enforced, and only by smithy-rs servers, since #1148.
That leaves 4 traits: length, pattern, range, and uniqueItems.
Implementation
This section addresses how to implement and enforce the length, pattern, range, and uniqueItems traits. We will use the length trait applied to a string shape as a running example. The implementation of this trait mostly carries over to the other three.
Example implementation for the length trait
Consider the following Smithy model:
@length(min: 1, max: 69)
string NiceString
The central idea to the implementation of constraint traits is: parse, don't
validate. Instead of code-generating a Rust String
to represent NiceString
values and performing the validation at request deserialization, we can
leverage Rust's type system to guarantee domain invariants. We can generate a
wrapper tuple struct that parses the string's value and is "tight" in the
set of values it can accept:
pub struct NiceString(String);

impl TryFrom<String> for NiceString {
    type Error = nice_string::ConstraintViolation;

    fn try_from(value: String) -> Result<Self, Self::Error> {
        let num_code_points = value.chars().count();
        if 1 <= num_code_points && num_code_points <= 69 {
            Ok(Self(value))
        } else {
            Err(nice_string::ConstraintViolation::Length(num_code_points))
        }
    }
}
(Note that we're using the linear-time check chars().count() instead of len() on the input value, since the Smithy specification says the length trait counts the number of Unicode code points when applied to string shapes.)
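For illustration (this snippet is not part of the generated code), the two differ on a string of multi-byte characters:

let s = "こんにちは";
assert_eq!(s.len(), 15);          // len() counts UTF-8 bytes
assert_eq!(s.chars().count(), 5); // chars().count() counts Unicode code points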
The goal is to enforce, at the type-system level, that these constrained
structs always hold valid data. It should be impossible for the service
implementer, without resorting to unsafe
Rust, to construct a NiceString
that violates the model. The actual check is performed in the implementation of
TryFrom<InnerType>
for the generated struct, which makes it convenient to use
the ?
operator for error propagation. Each constrained struct will have a
related std::error::Error
enum type to signal the first parsing failure,
with one enum variant per applied constraint trait:
pub mod nice_string {
    pub enum ConstraintViolation {
        /// Validation error holding the number of Unicode code points found, when a value between `1` and
        /// `69` (inclusive) was expected.
        Length(usize),
    }

    impl std::error::Error for ConstraintViolation {}
}
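As a sketch of how code in the crate might construct the constrained type and propagate a violation with the ? operator (the surrounding function is hypothetical, not generated):

fn build_nice_string(raw: String) -> Result<NiceString, nice_string::ConstraintViolation> {
    // `?` propagates the constraint violation to the caller.
    let nice = NiceString::try_from(raw)?;
    Ok(nice)
}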
std::error::Error requires Display and Debug. We will #[derive(Debug)], unless the shape also has the sensitive trait, in which case we will just print the name of the struct:
impl std::fmt::Debug for ConstraintViolation {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        let mut formatter = f.debug_struct("ConstraintViolation");
        formatter.finish()
    }
}
Display is used to produce human-friendlier representations. Its implementation might be called when formatting a 400 HTTP response message in certain protocols, for example.
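A possible Display implementation for the running example could look like the following; the exact message wording is illustrative, not prescribed by this RFC:

impl std::fmt::Display for ConstraintViolation {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            ConstraintViolation::Length(found) => write!(
                f,
                "expected a string between 1 and 69 Unicode code points in length, but found {} code points",
                found
            ),
        }
    }
}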
Request deserialization
We will continue to deserialize the different parts of the HTTP message into
the regular Rust standard library types. However, just before the
deserialization function returns, we will convert the type into the wrapper
tuple struct that will eventually be handed over to the operation handler. This
is what we're already doing when deserializing strings into enums. For example, given the Smithy model:
@enum([
{ name: "Spanish", value: "es" },
{ name: "English", value: "en" },
{ name: "Japanese", value: "jp" },
])
string Language
the code the client generates when deserializing a string from a JSON document
into the Language
enum is (excerpt):
...
match key.to_unescaped()?.as_ref() {
    "language" => {
        builder = builder.set_language(
            aws_smithy_json::deserialize::token::expect_string_or_null(
                tokens.next(),
            )?
            .map(|s| {
                s.to_unescaped()
                    .map(|u| crate::model::Language::from(u.as_ref()))
            })
            .transpose()?,
        );
    }
    _ => aws_smithy_json::deserialize::token::skip_value(tokens)?,
}
...
Note how the String gets converted to the enum via Language::from().
impl std::convert::From<&str> for Language {
    fn from(s: &str) -> Self {
        match s {
            "es" => Language::Spanish,
            "en" => Language::English,
            "jp" => Language::Japanese,
            other => Language::Unknown(other.to_owned()),
        }
    }
}
For constrained shapes we would do the same to parse the inner deserialized value into the wrapper tuple struct, except for these differences:
- For enums, the client generates an Unknown variant that "contains new variants that have been added since this code was generated". The server does not need such a variant (#1187).
- Conversions into the tuple struct are fallible (try_from() instead of from()). These errors will result in a my_struct::ConstraintViolation; see the sketch after this list.
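For illustration, a hand-written sketch of that final conversion step on the server side, assuming the deserializer has already produced an Option<String> for a member targeting NiceString (the function name is hypothetical):

// Hypothetical last step of a server-side deserializer: parse the deserialized
// String into the constrained wrapper before handing it to the operation handler.
fn parse_nice_string(
    value: Option<String>,
) -> Result<Option<NiceString>, nice_string::ConstraintViolation> {
    value.map(NiceString::try_from).transpose()
}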
length trait
We will enforce the length constraint by calling len() on Rust's Vec (list and set shapes), HashMap (map shapes), and our aws_smithy_types::Blob (blob shapes). We will enforce the length constraint trait on String (string shapes) by calling .chars().count().
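As a sketch, enforcing length on a hypothetical list shape ItemList of strings carrying @length(min: 1, max: 10) could look like this, with item_list::ConstraintViolation defined analogously to nice_string::ConstraintViolation:

pub struct ItemList(Vec<String>);

impl TryFrom<Vec<String>> for ItemList {
    type Error = item_list::ConstraintViolation;

    fn try_from(value: Vec<String>) -> Result<Self, Self::Error> {
        // For list shapes, len() (the number of elements) is the right measure.
        let len = value.len();
        if 1 <= len && len <= 10 {
            Ok(Self(value))
        } else {
            Err(item_list::ConstraintViolation::Length(len))
        }
    }
}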
pattern trait
The pattern trait restricts string shape values to a specified regular expression. We will implement this by using the regex crate's is_match. We will use once_cell to compile the regex only the first time it is required.
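A sketch of this for a hypothetical @pattern("^[a-z]+$") string LowercaseString shape, assuming the regex and once_cell crates as dependencies:

use once_cell::sync::Lazy;
use regex::Regex;

// Compiled once, the first time the pattern is needed.
static PATTERN: Lazy<Regex> =
    Lazy::new(|| Regex::new("^[a-z]+$").expect("the modeled pattern must be a valid regex"));

pub struct LowercaseString(String);

pub mod lowercase_string {
    #[derive(Debug)]
    pub enum ConstraintViolation {
        /// The provided string did not match the modeled pattern.
        Pattern(String),
    }
}

impl TryFrom<String> for LowercaseString {
    type Error = lowercase_string::ConstraintViolation;

    fn try_from(value: String) -> Result<Self, Self::Error> {
        if PATTERN.is_match(&value) {
            Ok(Self(value))
        } else {
            Err(lowercase_string::ConstraintViolation::Pattern(value))
        }
    }
}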
uniqueItems trait
The uniqueItems trait indicates that the items in a list MUST be unique. If the list shape is sparse, more than one null value violates this constraint. We will enforce this by copying references to the Vec's elements into a HashSet and checking that the sizes of both containers coincide.
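A sketch of this check for a hypothetical @uniqueItems list UniqueStrings shape, with unique_strings::ConstraintViolation defined analogously to the previous examples:

use std::collections::HashSet;

pub struct UniqueStrings(Vec<String>);

impl TryFrom<Vec<String>> for UniqueStrings {
    type Error = unique_strings::ConstraintViolation;

    fn try_from(value: Vec<String>) -> Result<Self, Self::Error> {
        // Copy references into a set; if both sizes coincide, all items are unique.
        let unique: HashSet<&String> = value.iter().collect();
        if unique.len() == value.len() {
            Ok(Self(value))
        } else {
            Err(unique_strings::ConstraintViolation::UniqueItems)
        }
    }
}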
Trait precedence and naming of the tuple struct
From the spec:
Some constraints can be applied to shapes as well as structure members. If a constraint of the same type is applied to a structure member and the shape that the member targets, the trait applied to the member takes precedence.
structure ShoppingCart {
    @range(min: 7, max: 12)
    numberOfItems: PositiveInteger
}
@range(min: 1)
integer PositiveInteger
In the above example,
the range trait applied to numberOfItems takes precedence over the one applied to PositiveInteger. The resolved minimum will be 7, and the maximum 12.
When the constraint trait is applied to a member shape, the tuple struct's name will be the PascalCased name of the member shape, NumberOfItems.
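For the example above, the generated wrapper could look roughly as follows (the exact shape of the violation variant is illustrative):

pub struct NumberOfItems(i32);

impl TryFrom<i32> for NumberOfItems {
    type Error = number_of_items::ConstraintViolation;

    fn try_from(value: i32) -> Result<Self, Self::Error> {
        // The member-level @range(min: 7, max: 12) takes precedence over the
        // shape-level @range(min: 1) on PositiveInteger.
        if (7..=12).contains(&value) {
            Ok(Self(value))
        } else {
            Err(number_of_items::ConstraintViolation::Range(value))
        }
    }
}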
Unresolved questions
- Should we code-generate unsigned integer types (u16, u32, u64) when the range trait is applied with min set to a value greater than or equal to 0?
    - A user has even suggested to use the std::num::NonZeroUX types (e.g. NonZeroU64) when range is applied with min set to a value greater than 0.
    - UPDATE: This requires further design work. There are interoperability concerns: for example, the positive range of a u32 is strictly greater than that of an i32, so clients wouldn't be able to receive values within the non-overlapping range.
- In request deserialization, should we fail with the first violation and
immediately render a response, or attempt to parse the entire request and
provide a complete and structured report?
    - UPDATE: We will provide a response containing all violations. See the "Collecting Constraint Violations" section in the Better Constraint Violations RFC.
- Should we provide a mechanism for the service implementer to construct a Rust type violating the modeled constraints in their business logic, e.g. a T::new_unchecked() constructor? This could be useful (1) when the user knows the provided inner value does not violate the constraints and doesn't want to incur the performance penalty of the check; (2) when the struct is in a transient invalid state. However:
    - (2) is arguably a modelling mistake and a separate struct to represent the transient state would be a better approach,
    - the user could use unsafe Rust to bypass the validation; and
    - adding this constructor is a backwards-compatible change, so it can always be added later if this feature is requested.
    - UPDATE: We decided to punt on this until users express interest.
Alternative design
An alternative design with less public API surface would be to perform
constraint validation at request deserialization, but hand over a regular
"loose" type (e.g. String
instead of NiceString
) that allows for values
violating the constraints. If we were to take this approach, we could implement it by wrapping the incoming value in the aforementioned tuple struct to perform the validation, and immediately unwrapping it.
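A sketch of that validate-and-discard step, assuming this code has access to the wrapper's inner field (e.g. it lives in the same module as the NiceString from the running example):

// Validate by constructing the wrapper, then immediately unwrap it back into
// the "loose" standard library type handed to the operation handler.
fn deserialize_nice_string(raw: String) -> Result<String, nice_string::ConstraintViolation> {
    Ok(NiceString::try_from(raw)?.0)
}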
Comparative advantages:
- Validation remains an internal detail of the framework. If the semantics of a constraint trait change, the service's behavior is still affected in a backwards-incompatible way, but user code does not break.
- Less "invasive". Baking validation into the generated type might be deemed as the service framework overreaching its responsibilities.
Comparative disadvantages:
- It becomes possible to send responses with invalid operation outputs. All the service framework could do is log the validation errors.
- We lose the ability to bake validation in at the type-system level, which gets rid of an entire class of logic errors.
- Less idiomatic (this is subjective). The pattern of wrapping a more primitive type to guarantee domain invariants is widespread in the Rust ecosystem. The standard library makes use of it extensively.
Note that both designs are backwards incompatible in the sense that you can't migrate from one to the other without breaking user code.
UPDATE: We ended up implementing both designs, adding a flag to opt into the
alternative design. Refer to the mentions of the publicConstrainedTypes flag in the description of the Builders of builders PR.