I’ve just gone through this process on GCP and omg was it painful. Not only are there so many options for building your app, but the products offered through GCP have some serious limitations that may affect the entire infrastructure set up.

What I wanted to set up

First off, let me explain what I wanted to have. Pretty simple really:

  • A backend API service that was fast and reliable, and could be called by multiple clients
  • A web client and mobile clients which call the API
  • Everything is secure!
  • Everything is serverless

And the guiding principles for the set up and infrastructure choices were as follows (in order of importance):

  • Security
  • Performance (latency and cost)
  • Simplicity

Initial investigations

I investigated A LOT of options out there. I had already chosen GCP as my platform, mostly because I was already familiar with it. It’s definitely not the most cost-effective option, but in the grand scheme of things this wasn’t a major issue (price differences aren’t that large). Plus, having worked at Google for over a decade, I was familiar not just with GCP but also with the technology under the hood.

So I set out to look into various options.

Option 1: Fully develop on Firebase

So the first thing I looked into was Firebase. It’s super easy to do things like auth, call APIs (via Cloud Functions), or even query Firestore directly.

But very quickly I realised I couldn’t rely on this set up: I needed a bit more control over my APIs. Moving away from it meant sacrificing some features, though. Specifically:

Pros:

  • By using Cloud Functions via Firebase you get all the auth handled (mostly). You may need to handle CORS, but that’s it
  • You also get to use the Firestore security rules, which is a super nice feature
  • It’s also very easy to call your functions in any client. Firebase has functions built in (web client example)

However, the limitations of Cloud Functions were the major reason for not going with this approach. Cloud Functions are great for small pieces of code that are triggered by various events, but they’re not suitable for building a reliable API.

Cons:

  • Portability: It doesn’t make sense to containerize Cloud Functions. They are small pieces of code with the runtime, server and endpoints abstracted away (e.g. a Python Cloud Function runs on Flask), which makes it more difficult to switch cloud provider in the future
  • Local development: It’s difficult to spin up all your functions for local development, whereas putting everything into one Cloud Run container means all your endpoints are in one place
  • Cost and latency: Cloud Run will be cheaper in the long run due to concurrency. Cloud Functions only sends one request at a time to a function instance, whereas a Cloud Run instance can handle multiple requests concurrently. And since you’re paying for instance uptime, Cloud Run becomes cheaper and faster as the product scales
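To make the concurrency point concrete, here is a rough back-of-the-envelope sketch. It estimates how many instances are needed for a steady load using Little's law (requests in flight = arrival rate × average latency). The traffic numbers and the concurrency setting of 80 are made-up assumptions for illustration, not measurements from my project.

```python
import math


def instances_needed(requests_per_second: float, avg_latency_s: float, concurrency: int) -> int:
    """Estimate instances needed for a steady load, using Little's law."""
    # Average number of requests being processed at any moment.
    in_flight = requests_per_second * avg_latency_s
    # Each instance can hold `concurrency` of those requests at once.
    return max(1, math.ceil(in_flight / concurrency))


# Hypothetical load: 200 requests/s, each taking 0.25 s on average.
one_at_a_time = instances_needed(200, 0.25, concurrency=1)   # one request per instance
shared = instances_needed(200, 0.25, concurrency=80)         # many requests per instance
print(one_at_a_time, shared)
```

Under these assumptions, one-request-per-instance needs dozens of instances where a concurrent container needs one. Real costs depend on CPU and memory per instance too, so treat this as intuition rather than a pricing model.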

Option 2: Use Cloud Run for API and FE client, but use Firebase in parallel

This option is definitely more complex and requires a lot more infrastructure set up vs option 1, but it was worth the cost, especially considering my “performance” guiding principle.

However I still wanted Google to do as much security as possible. So after a lot of research, I decided this would be the best set up for my API:

And very similarly for my FE client:

This high-level set up met most of my requirements, especially security wise. From a security perspective, these are the features that are available to you:

  • Clients will use the Firebase auth library to sign in and authenticate users. The tokens (ID and refresh) are stored securely, and can be used to call the API
  • Load balancer comes packaged with Cloud Armor, securing against DDoS attacks
  • IAP ensures only authorized people (defined in the cloud project) can access a specific service. This was only used to protect dev instances from the public
  • API Gateway handles the typical security that is available in the OpenAPI / Swagger spec. But it’s critical to know that Google’s API Gateway doesn’t support the OpenAPI v3 spec. That aside, the gateway can do things like verify Firebase ID tokens, verify API keys and then call the Cloud Run service with a specific service account (meaning Cloud Run can be locked down)
  • Cloud Run has two main controls: 1) public vs private ingress and 2) whether authorization is required. Since the API Gateway doesn’t run on a VPC within the cloud project, ingress has to be set to public (hopefully this changes soon). But authorization can be turned on so that only principals with the appropriate permissions can access the service
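One practical consequence of letting the gateway verify tokens is that the backend doesn't need to re-verify them. As far as I understand it, the gateway passes the verified token's claims to the backend in an X-Apigateway-Api-Userinfo header, as base64url-encoded JSON. Below is a minimal sketch of decoding such a header, assuming that format. The sample claims are invented for the demo.

```python
import base64
import json


def decode_userinfo(header_value: str) -> dict:
    """Decode a base64url-encoded JSON payload, restoring any stripped padding."""
    padded = header_value + "=" * (-len(header_value) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


# Hypothetical claims, encoded the same way purely for demonstration.
sample_claims = {"user_id": "abc123", "email": "user@example.com"}
sample_header = (
    base64.urlsafe_b64encode(json.dumps(sample_claims).encode()).decode().rstrip("=")
)

claims = decode_userinfo(sample_header)
print(claims["user_id"])
```

This only decodes. It is safe to trust the result only when the service is locked down so that nothing but the gateway's service account can call it.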

I also spent a lot of time trying to resolve CORS issues. Given my domain set up (the API living on api.example.com and the web client on app.example.com) and some limitations in the API Gateway and Cloud Run, this became a real problem for me (more on this later).
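For context on why this setup is awkward: before a cross-origin call that carries an Authorization header, the browser sends an OPTIONS preflight request, and that preflight does not include the Authorization header itself. Anything in the chain that demands auth on every request can therefore reject it. The sketch below shows the headers a preflight response generally needs. It is a generic illustration, not the workaround I ended up with, and the allowed origin, methods and max age are assumptions.

```python
ALLOWED_ORIGINS = {"https://app.example.com"}  # assumed web client origin


def preflight_headers(origin: str, requested_headers: str = "") -> dict:
    """Return CORS headers for an OPTIONS preflight, or {} if the origin is not allowed."""
    if origin not in ALLOWED_ORIGINS:
        return {}
    return {
        # Echo the specific origin rather than "*".
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Methods": "GET, POST, PUT, DELETE, OPTIONS",
        # Echo what the browser asked for, or fall back to a sensible default.
        "Access-Control-Allow-Headers": requested_headers or "Authorization, Content-Type",
        "Access-Control-Max-Age": "3600",  # let the browser cache the preflight result
        "Vary": "Origin",
    }


print(preflight_headers("https://app.example.com"))
```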

And the final thing to consider was CSRF. Since we’re not using cookies (I made that decision whilst exploring option 1), CSRF wasn’t something I had to worry about. Firebase authentication stores tokens in indexedDB and doesn’t use cookies.

Option 3: Build everything from scratch

Since I’m building a startup, this was not an option. There’s no need to build the infrastructure when Google’s done so much of it already. Plus the more code I write, the more opportunity for bugs.

This might be something I move to in the future, especially if we start requiring more customization options vs what Cloud Run offers. Since Cloud Run is built on Kubernetes, that’s a natural direction to move in.

My infrastructure decisions (so far)

Having chosen option 2, I did compromise simplicity and perhaps some performance, but I still benefited from all the security features from option 1.

API

Since I chose to build on Cloud Run, I needed to choose a framework to build on. I didn’t have to consider security here too much since a lot was handled upstream, and the rest was super similar for all frameworks. So according to my guiding principles, performance and simplicity were next on the priority list. I read an excellent performance comparison made between FastAPI, Express.js, Flask and Nest.js, and combined with me being comfortable with Python, the choice was clear - FastAPI.

But this is where I faced some issues when trying to deploy my API using Google’s API Gateway:

  • FastAPI outputs OpenAPI v3 spec, but the API Gateway only supports Swagger 2.0 (so I had to convert it manually)
  • I also had to manually add securityDefinitions for the API Gateway to handle things like Firebase ID token validation and API key validation - but this wasn’t a big deal
  • CORS however became a real problem, mostly because the API Gateway does not handle CORS properly today (details), and Cloud Run also has some limitations (details), meaning even though FastAPI had middleware to handle CORS, it didn’t matter
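To give a feel for the securityDefinitions step, here is a minimal sketch that adds Firebase and API key definitions to an already-converted Swagger 2.0 spec held as a Python dict. The x-google-* keys follow the shape Google documents for Firebase auth, but the project ID, the definition names (firebase, api_key) and the choice to require both on every operation are assumptions for illustration, not my exact config.

```python
import copy

FIREBASE_JWKS_URI = (
    "https://www.googleapis.com/service_accounts/v1/metadata/x509/"
    "securetoken@system.gserviceaccount.com"
)


def add_gateway_security(spec: dict, project_id: str) -> dict:
    """Return a copy of a Swagger 2.0 spec with Firebase and API key security added."""
    out = copy.deepcopy(spec)  # leave the caller's spec untouched
    out["securityDefinitions"] = {
        "firebase": {
            "type": "oauth2",
            "flow": "implicit",
            "authorizationUrl": "",
            "x-google-issuer": f"https://securetoken.google.com/{project_id}",
            "x-google-jwks_uri": FIREBASE_JWKS_URI,
            "x-google-audiences": project_id,
        },
        "api_key": {"type": "apiKey", "name": "key", "in": "query"},
    }
    # One requirement object with two keys means both must be satisfied.
    out["security"] = [{"firebase": [], "api_key": []}]
    return out


# Hypothetical project ID and an empty converted spec.
spec = add_gateway_security({"swagger": "2.0", "paths": {}}, "my-project-id")
print(spec["securityDefinitions"]["firebase"]["x-google-issuer"])
```

Running something like this as a post-processing step after the v3 to 2.0 conversion means the additions don't have to be redone by hand every time the API changes.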

It took me a week to get all of this working, discovering limitations and issues at every turn and testing out a large number of different configurations across all the components I had set up (load balancer, IAP, gateway, Cloud Run). It’s safe to say that an “out-of-the-box” solution doesn’t exist, so I had to implement a bunch of workarounds. And honestly, all of this is SUPER complex, so I documented everything because I knew that by the following week I would have forgotten it all.

In any case, the API is up and running now, and is super secure, super fast and as simple as I could get it.

Web client

So far in my journey, I haven’t defined all the details here yet, but given the infrastructure choices so far, naturally the client will live on Cloud Run. If anyone has any suggestions though, I’m all ears 🙂

So.. looking back at my principles:

  • Secure
  • Performant
  • Simple

Security isn’t something I have to focus on too much here since all sensitive stuff is handled by Firebase authentication and the API. So on to performance and simplicity.

I first made a list of some options to consider:

FE framework: runtime (language)

  • Django templating: Django (Python)
  • Jinja templating: Flask (Python)
  • Vanilla JS: Flask (Python), Node.js (JavaScript)
  • React: Node.js (JavaScript)
  • Angular: Node.js (JavaScript)
  • Vue: Node.js (JavaScript)
  • Svelte / SvelteKit: Node.js (JavaScript)
  • Phoenix LiveView: Phoenix LiveView (Elixir)

Ditching Django and Jinja

Since I’ve built a nice REST API, the first two options are out of consideration even though they’re definitely the most simple approaches. I don’t want to do browser page reloads all the time, and even though these frameworks are great for static, server-side rendered websites, they’re not the most performant when it comes to dynamic sites. And so it’s highly likely I’ll use one of these frameworks for the marketing site (probably Jinja + Flask), but not for the main app.

Ditching Angular and Vue

This wasn’t a hard decision. From what I’ve seen, React comes out ahead of both on the things I care about, especially the size of the developer community.

Ditching Phoenix LiveView

This is a really interesting framework, and has all the things I want. Server-side rendering, but dynamically updated on the client side. However Elixir is not a language I want to learn, so simply because of that it’s out.

Ditching vanilla JS

Even though I liked the idea of not having to learn a framework, the benefits outweigh the costs. Mostly because of reusable components, and libraries that handle all sorts of things (e.g. routing). Oh and also all frameworks bind JS variables to HTML, which is a ton of boilerplate JS to write all the time.

Choosing between React and Svelte

This is where I’m at currently, and haven’t made a final decision yet.

Performance: Svelte is a compiler, so it should be more performant than React since there’s no virtual DOM. Svelte does its work at build time and ships a vanilla JS bundle that updates the DOM directly, whereas React does its virtual DOM diffing at runtime. Svelte also tends to produce much smaller bundles, with far fewer lines of code written.

State management: This is the next thing I looked into. With React you typically end up wrestling with an external library such as Redux once your state grows, but in Svelte stores are built in. And so this is a +1 for Svelte, especially since everything I read about state management in React is making me want to run.

Support & community: React being the most popular framework by far obviously has a great community and a lot of 3rd party support. Svelte is in comparison very new, but the community is growing. So choosing React is the obvious decision from this perspective. It’s also the winner from the perspective of hiring developers in the future - the hiring funnel will be much larger.

Production readiness: React has been proven in production. There are many examples incl. Facebook, Netflix, Uber, etc. But I’ve also seen some interesting production uses of Svelte (see this Twitter mega thread), incl. Apple Music, Spotify, Ikea, Square, YCombinator and many more.

Learning curve: I guess this is more of a subjective consideration. Personally not knowing Svelte or React, and having had a quick play around with both, I’m confident that I’ll be more productive with Svelte. In the minimal testing I’ve done, I already prefer the built-in reactivity in Svelte vs useState in React.

The conclusion here is that I’m leaning more towards Svelte, but if there are any other frameworks or decision points I should consider I’d love to hear from you!

Mobile clients

Given where we are in the product discovery process, apps aren’t on the roadmap as of yet. The plan is to build and iterate on the web client, make improvements quickly based on user feedback and other factors, and when we have a stable product we’ll move on to building the mobile clients.