I’ve been quite impressed with DigitalOcean’s App Platform service. The team there struck a good balance of providing some extensibility while not going so far as to undermine the core value proposition of the service: simplicity. As a quick example: this blog is written using Hugo. It’s hosted on GitHub, and using the DigitalOcean admin console to start hosting the blog was a breeze (and it’s free!).

That said, the more interesting use cases of App Platform come when it’s combined with Pulumi (I’ll assume some familiarity with Infrastructure-as-Code solutions and how Pulumi differentiates itself). To get started, here’s the Pulumi Python code that’s used to host this blog:

import pulumi_digitalocean as digitalocean

digitalocean.App("blog", spec=digitalocean.AppSpecArgs(
    name="blog",
    region="sfo",
    domain_names=[digitalocean.AppSpecDomainNameArgs(
        name="tomlinford.com",
        type="PRIMARY",
        zone="tomlinford.com",  # Makes DigitalOcean manage the domain name
    )],
    static_sites=[digitalocean.AppSpecStaticSiteArgs(
        name="blog",
        # Builds to the public directory, which App Platform auto-discovers for
        # static sites.
        build_command="hugo -d public",
        environment_slug="hugo",
        github=digitalocean.AppSpecStaticSiteGithubArgs(
            branch="main",
            deploy_on_push=True,  # Any new commit pushed auto-updates the blog!
            repo="tomlinford/blog",
        ),
    )],
))

Very easy! This can also obviously just get checked in with the rest of the repo, inside an infra directory for instance.

Let’s break down a more complex example from the Pulumi reference for App Platform. The inline comments are mine:

import pulumi_digitalocean as digitalocean

digitalocean.App("mono-repo-example", spec=digitalocean.AppSpecArgs(
    databases=[digitalocean.AppSpecDatabaseArgs(  # Managed postgres database
        engine="PG",
        name="starter-db",
        production=False,
    )],
    domains=[{  # Basically the same as the above example, just more weakly typed
        "name": "foo.example.com",
    }],
    name="mono-repo-example",
    region="ams",
    services=[digitalocean.AppSpecServiceArgs(  # API service
        environment_slug="go",
        github=digitalocean.AppSpecServiceGithubArgs(
            branch="main",
            deploy_on_push=True,
            repo="username/repo",
        ),
        http_port=3000,
        instance_count=2,
        instance_size_slug="professional-xs",
        name="api",
        routes=[digitalocean.AppSpecServiceRouteArgs(
            path="/api",
        )],
        run_command="bin/api",
        source_dir="api/",
    )],
    static_sites=[digitalocean.AppSpecStaticSiteArgs(  # Client-side web app
        build_command="npm run build",
        github=digitalocean.AppSpecStaticSiteGithubArgs(
            branch="main",
            deploy_on_push=True,
            # This example uses a monorepo, and can just reference the same repo
            repo="username/repo",
        ),
        name="web",
        routes=[digitalocean.AppSpecStaticSiteRouteArgs(
            path="/",
        )],
    )],
))

What didn’t click for me the first time I saw this was how extensible this is as a starting point. There are a bunch of different directions one could go:

Improve the build process

1. Use a stable branch to allow for better testing before automatic deploys

With deploy_on_push=True, any new commit immediately gets deployed. This requires fairly comprehensive pre-commit testing, but even that can miss cases. Typically this happens when two developers are working on the same bit of code and both land after the initial tests have passed. Ultimately it’s best to re-run the full suite of tests before a deployment and wait for that to be green.

To do this, one can set up a build job to watch the main branch, run tests, and then, if the tests pass, bring the stable branch up to main. That then triggers the deploy. To actually make the change, any reference to "main" would just need to get switched to "stable", and of course this could just be done with a constant at the top of the file (or by refactoring out the GithubArgs).
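
Here’s a minimal sketch of what that refactor could look like (the DEPLOY_BRANCH constant and github_args helper are my own naming, not anything App Platform prescribes):

import pulumi_digitalocean as digitalocean

# One place to flip when moving from deploy-on-main to deploy-on-stable.
DEPLOY_BRANCH = "stable"

def github_args(repo: str) -> digitalocean.AppSpecServiceGithubArgs:
    return digitalocean.AppSpecServiceGithubArgs(
        branch=DEPLOY_BRANCH,
        deploy_on_push=True,  # still auto-deploys, but only commits the build job has promoted
        repo=repo,
    )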

Note this is basically the same strategy as “git flow”, but with main instead of dev and stable instead of master.

2. Specify a Dockerfile instead of using buildpacks

By default, App Platform uses buildpacks, which use some heuristics to figure out how to build the app. Naturally, this doesn’t cover every scenario (e.g. users of Python Poetry are out of luck). To fix this, one can just define a Dockerfile listing the build steps and then set the dockerfile_path argument for each service/etc.
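
For example, the api service from the earlier spec could be switched over like this (assuming the repo actually keeps its Dockerfile at api/Dockerfile):

import pulumi_digitalocean as digitalocean

api = digitalocean.AppSpecServiceArgs(
    name="api",
    # Skips buildpack detection entirely; the path is relative to the repo root.
    dockerfile_path="api/Dockerfile",
    # environment_slug is no longer needed, since the Dockerfile defines the build.
    github=digitalocean.AppSpecServiceGithubArgs(
        branch="main",
        deploy_on_push=True,
        repo="username/repo",
    ),
    http_port=3000,
    source_dir="api/",
)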

3. Build images independently

Of course, for maximum control teams will want to build images themselves. This permits using a set of base images as “golden images,” which can provide a more secure baseline.

DigitalOcean has a container registry that can be used for this purpose. Built images can be pushed to the registry, and then the github directive would be swapped out for something like:

image=digitalocean.AppSpecServiceImageArgs(  # AppSpecJobImageArgs etc. for other component types
    registry_type="DOCR",  # DigitalOcean Container Registry
    repository="mono-repo-example-server",
    tag="latest",
)

At this point though, the release process would also likely need to be iterated on, since swapping out to use the image directly creates a need to trigger releases explicitly.

Improve the release/deploy process

The great thing here is that the release process is already pretty good. By doing #1 above, we already have pretty comprehensive testing and CD. Making changes from here starts creating trade-offs.

1. Switch to tagged releases

Tagged releases have some benefits when switching off of CD. One case is protecting against backwards-incompatible database changes. For example, when adding a new column to a table in a relational database, one would first make the column nullable, wait, and then in a separate release and migration backfill values in the column and make the column non-nullable. Bundling a set of commits into a single release enables validation that the set doesn’t create these types of issues.
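
To make that concrete, here’s what the two releases’ migrations might contain (the table and column names are hypothetical, and the SQL is Postgres-flavored):

# Release N: add the column as nullable, so code that doesn't set it keeps working.
RELEASE_N_MIGRATION = "ALTER TABLE users ADD COLUMN signup_source TEXT"

# Release N+1, once every writer populates the column: backfill, then tighten.
RELEASE_N_PLUS_1_MIGRATION = """
UPDATE users SET signup_source = 'unknown' WHERE signup_source IS NULL;
ALTER TABLE users ALTER COLUMN signup_source SET NOT NULL;
"""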

Switching to tagged releases could be done with the github directive – for each release, teams could make a new branch and update the branch= argument. Of course, tagged releases would typically be done in tandem with building images independently.
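
A sketch of that branch-pinning approach, using pulumi’s stack configuration to hold the release branch (the releaseBranch config key and branch naming scheme are my own convention):

import pulumi
import pulumi_digitalocean as digitalocean

config = pulumi.Config()
# Cutting a release becomes: create the branch, run
# "pulumi config set releaseBranch release-2021.05", then "pulumi up".
release_branch = config.require("releaseBranch")

github = digitalocean.AppSpecServiceGithubArgs(
    branch=release_branch,
    deploy_on_push=True,  # hotfixes pushed to the release branch still deploy
    repo="username/repo",
)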

2. Build out a UI for managing releases and deploys

Every engineering team wants to get to the point where deploys are just a matter of clicking a button in a web UI and sticking around for a bit while the rollout occurs and automated checks run. This could be built in a simple way by storing the version to deploy in Spaces storage and then retrieving the value in pulumi with the get_spaces_bucket_object function.
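
Here’s a minimal sketch of that lookup (the bucket name, key layout, and get_version helper are all assumptions; the deploy UI would be responsible for writing the object):

import pulumi_digitalocean as digitalocean

def get_version(app_name: str) -> str:
    # Hypothetical helper: the deploy UI writes a small text object like
    # "v1.2.3" to a Spaces bucket, and pulumi reads it back at deploy time.
    obj = digitalocean.get_spaces_bucket_object(
        bucket="release-metadata",  # assumed bucket name
        region="sfo3",
        key=f"{app_name}/version",
    )
    return obj.body.strip()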

Switch to raw kubernetes

Ultimately, App Platform will hit its limit as a team’s app scales. For instance, there’s currently no way to set up autoscaling, since each service only takes instance_count, a single number governing the number of “instances” in a “component.”

Nonetheless, pulumi has a kubernetes provider, which can effectively replace all the yaml typically associated with kubernetes deployments. And since pulumi can of course also control managed DigitalOcean kubernetes clusters, teams can set up kubernetes and deploy to kubernetes in the same place.

So, this means that we can effectively treat App Platform as an interface which can be re-implemented in either App Platform or kubernetes.

Let’s play around with this idea. First we refactor out some pieces, making some assumptions about project layout:

from typing import Union

import pulumi_digitalocean as digitalocean
from pulumi.output import Output

def digitalocean_app_platform_app(
    name: str,
    domain_name: str,
    version: Union[str, Output[str]],
    api_min_instance_count: int,
    api_max_instance_count: int,
):
    # Can't auto-scale, so client must be aware that these numbers have to be the same.
    assert api_min_instance_count == api_max_instance_count

    return digitalocean.App(name, spec=digitalocean.AppSpecArgs(
        # Very similar to above
    ))

app = digitalocean_app_platform_app(
    name="example",
    domain_name="example.com",
    version=get_version("example"),  # Could read from an object bucket for the version.
    api_min_instance_count=2,
    api_max_instance_count=2,
)

Now the exact same function can be implemented for kubernetes:

from typing import Union

import pulumi
import pulumi_digitalocean as digitalocean
import pulumi_kubernetes as kubernetes
from pulumi.output import Output
from pulumi_kubernetes.apps import v1 as kube_apps
from pulumi_kubernetes.core import v1 as kube_core

def kubernetes_app(
    name: str,
    domain_name: str,
    version: Union[str, Output[str]],
    api_min_instance_count: int,
    api_max_instance_count: int,
):
    cluster_args = digitalocean.KubernetesClusterArgs(
        # See https://www.pulumi.com/docs/reference/pkg/digitalocean/kubernetescluster/
    )
    cluster = digitalocean.KubernetesCluster(f"{name}-cluster", args=cluster_args)
    kube_provider = kubernetes.Provider(f"{name}-do-k8s", kubernetes.ProviderArgs(
        kubeconfig=cluster.kube_configs[0].raw_config,  # pull from the cluster above
    ))
    # Also need to set up the keys for pulling from the registry and store as a kube secret.
    deployment = kube_apps.Deployment("api-deployment", args=kube_apps.DeploymentArgs(
        # Replacement for YAML would effectively be here, but need to include the secrets
        # for pulling from the registry.
    ), opts=pulumi.ResourceOptions(provider=kube_provider))  # target the cluster above

This obviously gets complicated quite quickly. But ultimately, with this approach it’s up to individual teams to decide when the added complexity is worth the trade-off. In the meantime, developers building business logic are happily building their APIs and UIs, creating new apps as they need. And if and when a migration ultimately happens, everything can be managed through code in pulumi.

Why this matters

DigitalOcean’s App Platform seems to be the first PaaS offering that lets teams go from nothing to code in production in under a day without vendor lock-in, thanks to its credible migration paths out.

The alternatives come with higher vendor lock-in:

  1. Google App Engine. Super painful to migrate out of, since App Engine gets so close to the code.
  2. Heroku. Engineers I’ve talked to who use Heroku have told me how hard it is to migrate off. And Heroku server costs can really add up, creating a strong desire for an alternative approach.

The most common path nowadays that takes more time is to use a managed kubernetes service, where quite a bit of that time goes to understanding what’s going on under the hood.

Anyways, props to the DigitalOcean team working on this feature. It sits at quite an interesting abstraction boundary, and any engineer working on infrastructure would be wise to play around with it to see where that boundary lies.