In my previous post I argued that the existing tools available to us as developers largely optimize for either developer velocity or scale, particularly when building backend applications. I postulate that if we had a framework that derived as much as possible from a schema definition plus associated metadata, that framework could optimize for both. To enable this, we first need a better way to write schemas.
Introducing sroto, a library and associated command-line tool to generate .proto files (and thus protobuf schema definitions) from jsonnet, a data templating language.
The basic problem with the way we approach schemas is:
- A schema definition is just data.
- Writing data is annoying and we want to write code.
- So we use the application code to write the data.
- Whoops, now we can’t untangle it and therefore scalability is hard.
So let’s break this cycle.
A schema definition is just data
Schemas (API, database, or otherwise) are generally structured data that itself conforms to a schema. The OpenAPI schema is defined here. A JSON object that conforms to that schema can be interpreted as an OpenAPI schema.
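For example, a minimal OpenAPI document is nothing more than a JSON object. Here's a hand-written sketch (the API title and path are made up for illustration):
{
  "openapi": "3.0.3",
  "info": {"title": "Echo API", "version": "1.0.0"},
  "paths": {
    "/v1/example/echo": {
      "post": {
        "responses": {"200": {"description": "The echoed message."}}
      }
    }
  }
}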
Similarly, while .proto files look like code, they’re actually just data. That schema is defined here. The way protoc generates code for the different languages is that a subprocess (e.g. protoc-gen-go) consumes a serialized FileDescriptorSet protobuf message. Hypothetically, you could write a FileDescriptorSet in the canonical JSON format, transform it into a serialized protobuf message, and feed that into a standard protoc code generator. However, one of the big benefits of .proto files is that it’s easy to understand the schema, and once a developer gets comfortable with .proto files it becomes easy to mentally map a .proto file to the generated code that the developer interacts with.
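To make that hypothetical concrete, here is a rough, hand-written sketch of a FileDescriptorSet in the canonical JSON format describing a single message with one string field (the names are made up; the keys follow the JSON mapping of descriptor.proto). It carries the same information as a small .proto file, but is noticeably harder to read:
{
  "file": [{
    "name": "example/v1.proto",
    "package": "example.v1",
    "syntax": "proto3",
    "messageType": [{
      "name": "Greeting",
      "field": [{
        "name": "text",
        "number": 1,
        "type": "TYPE_STRING",
        "label": "LABEL_OPTIONAL",
        "jsonName": "text"
      }]
    }]
  }]
}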
Writing data is annoying
grpc-gateway is a project that enables you to expose your gRPC APIs as REST APIs. This starts out simple enough, with a minimal implementation looking like this (sample taken from the readme):
syntax = "proto3";
package your.service.v1;
option go_package = "github.com/yourorg/yourprotos/gen/go/your/service/v1";
import "google/api/annotations.proto";
message StringMessage {
string value = 1;
}
service YourService {
rpc Echo(StringMessage) returns (StringMessage) {
option (google.api.http) = {
post: "/v1/example/echo"
body: "*"
};
}
}
Easy enough. The additional metadata on the schema is pretty minimal: just give the endpoint in the post field of the option and say where to find the body of the request. Now suppose your frontend developers come to you and ask for OpenAPI schemas. The message starts looking like:
message StringMessage {
  option (grpc.gateway.protoc_gen_openapiv2.options.openapiv2_schema) = {
    json_schema: {
      description: "StringMessage wraps a string as a message.",
      required: ["value"]
    },
    example: "{\"value\": \"This is a message to echo.\"}"
  };

  string value = 1;
}
Here’s a sample file with this metadata added in all the different places it can be specified.
I think this functionality is great and backend engineers who write REST APIs should provide schemas that can be consumed by clients. But at the same time, I also think it’s unrealistic for engineers to write schemas in this particular way because:
- It’s not composable. If I wanted to define a common UUID field or similar field, there would be a lot of copying of metadata.
- There’s no way to access helpers. In the example field I even had to manually escape the quotation marks!
- Standardization can only be enforced through a protoc plugin. In larger organizations, you want to make it easy to do the right thing by providing libraries that increase developer velocity but also incidentally create standardization. Since these plugins aren’t that common to write, standardization is then primarily enforced through code reviews, which are imperfect.
We want to write code
This is where sroto comes in. I wanted to create a new way to write .proto files that makes it easier to do the right thing while also feeling very similar to writing an actual .proto file. Because schemas are data, using a data templating language was a natural choice, since this class of language is focused on solving exactly the problem of writing data. It enables a high degree of expressiveness for this use case while remaining relatively simple. Jsonnet, which I picked as the language of choice for sroto, is a popular and simple data templating language that fit the objectives perfectly. (Of course, similar languages like cue could also be used, and the project documentation lays out how one might go about reusing some of sroto’s functionality to enable a cue frontend.)
One of the goals was to make the jsonnet code “feel” like protobuf definitions as much as possible. This required me to play around with the API and experiment with a few different languages. As an example, here’s the above message and service definition, before any options:
local sroto = import "sroto.libsonnet";

sroto.File("your/service/v1.proto", "your.service.v1", {
  StringMessage: sroto.Message({
    value: sroto.StringField(1),
  }),
  YourService: sroto.Service({
    Echo: sroto.UnaryMethod("StringMessage", "StringMessage"),
  }),
})
As you can see, this looks a lot like a .proto file, just structured more as object composition than as a code-like format. To “compile” these files I created a tool that feels a lot like protoc, called srotoc. It can be installed by running:
go install github.com/tomlinford/sroto/cmd/srotoc@latest
srotoc also embeds a jsonnet interpreter and the sroto.libsonnet library file to simplify installation. srotoc takes in *.jsonnet files and looks for a --proto_out argument. So if the file above is at your/service/v1.jsonnet, you can compile it by running:
srotoc --proto_out=. your/service/v1.jsonnet
Let’s now add the options from the initial protobuf example to the jsonnet example:
local sroto = import "sroto.libsonnet";

local option(option, value) = {options+: [{type: option, value: value}]};
local openAPISchemaOption(value) = option({
  filename: "protoc-gen-openapiv2/options/annotations.proto",
  package: "grpc.gateway.protoc_gen_openapiv2.options",
  name: "openapiv2_schema",
}, value);
local httpOption(value) = option({
  filename: "google/api/annotations.proto",
  package: "google.api",
  name: "http",
}, value);
local postMethod(url, input_type, output_type) =
  sroto.UnaryMethod(input_type, output_type) + httpOption({
    post: url,
    body: "*",
  });

sroto.File("your/service/v1.proto", "your.service.v1", {
  StringMessage: sroto.Message({
    value: sroto.StringField(1),
  }) + openAPISchemaOption({
    json_schema: {
      description: "StringMessage wraps a string as a message.",
      required: ["value"],
    },
    example: std.manifestJsonMinified({
      value: "This is a message to echo.",
    }),
  }),
  YourService: sroto.Service({
    Echo: postMethod("/v1/example/echo", "StringMessage", "StringMessage"),
  }),
}) + option("go_package", "github.com/yourorg/yourprotos/gen/go/your/service/v1")
As you can see, this is truly code, not data. We can refactor common functionality to keep things DRY and build out libraries that provide general helpers. For instance, if your organization has a standard convention for protobuf filenames, protobuf package names, and Go package names, that could mean implementing a function that takes in a generic package name plus the file declarations and returns a sroto.File with those fields set as expected.
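As a rough sketch of what such a helper might look like (the orgFile name and the go_package convention here are hypothetical; the sroto.File and option patterns are the same ones used above):
local sroto = import "sroto.libsonnet";

local option(option, value) = {options+: [{type: option, value: value}]};

// Hypothetical org-wide helper: derive the .proto filename, the protobuf
// package, and the go_package option from one logical package name.
local orgFile(package, declarations) =
  sroto.File(std.strReplace(package, ".", "/") + ".proto", package, declarations)
  + option("go_package", "github.com/yourorg/yourprotos/gen/go/" + std.strReplace(package, ".", "/"));

orgFile("your.service.v1", {
  StringMessage: sroto.Message({
    value: sroto.StringField(1),
  }),
})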
Many additional examples are provided in the sroto readme.
Generate code for the application
If other --*_out arguments are passed in, srotoc will also call protoc in a subprocess, so that generating the .proto files and the generated protobuf code files happens in one step (of course, this requires the protobuf toolchain to be installed). So, for instance, to generate all .proto files and *_pb2.py files in a directory, you could run something like:
find . -name '*.jsonnet' | xargs srotoc --proto_out=. --python_out=.
Note that this also automatically takes the .proto files generated by srotoc and passes them into the protoc subcommand.
Furthermore, the metadata provided in the schema could be used to generate additional code. For example, the OpenAPI schema could be consumed to generate validation code on the server side and client libraries on the client side. This code generation could be implemented as a protoc plugin, which can leverage the existing ecosystem using tools like protoc-gen-star.
Scalability from the isolation between the schema and application code
A nice side-effect of using a data templating language like jsonnet is that it’s impossible to write application code in it. As a result, the primary service using the API (i.e. the service serving the API) is just as much a consumer of the schema as everyone else. Without this isolation, it can simply be too tempting to define the schema inside the primary service.
Typically, when a schema is written in a general-purpose programming language, we leverage the language’s type system. While this approach can feel faster at the outset, it quickly causes issues. Type systems vary a lot between languages; features that are natural in one type system may be unsupported in others. Furthermore, more expressive languages tend to have more feature-rich type systems, but this generally comes at the cost of slower runtime or compilation speeds. Ultimately, though, the real problem with repurposing a type system to generate a schema is that a schema is data, and type systems are a poor way to generate data.