Bringing a new SaaS to life (part 1)

by Eric Z. Ayers in Tech

Early system concept

Last Fall, a former colleague and I started working on an idea for a new product we'd like to launch in a startup. One of the customer needs was a very quick response time (50ms) to a query that would likely require a database lookup. We noodled out a requirements document for a proof-of-concept and got some advice from an experienced DevOps manager. Based on what we wanted to do, we very quickly settled on a couple of core technologies:

  • Google Spanner for the interactive database
  • PubSub for messaging between service components

Both of these choices meant we would be deploying on Google Cloud Platform.

I've had experience running services backed by Spanner at previous jobs, but no first-hand experience developing against it or deploying it. The attractive thing about Spanner is that it is super reliable, scalable, and can be deployed across an entire continent as a single database. At first, I was reluctant for cost reasons. It felt like the most expensive choice, but we used the pricing tool and volume numbers pulled out of thin air to convince ourselves that costs would be manageable, and that at large scale it might even be the most cost-efficient choice.

Great! We'd settled on a single cloud provider. That narrowed down a number of the design choices. Now we had other choices to make for the front end and back end.

For the front end, we decided to make a single page web app and use React + TypeScript. Neither one of us is a front end engineer, but agentic programming seems to be working pretty well with that combination.

I was much more interested in the back end, as that's my forte. Suddenly, there were lots of choices to make that felt like "type 1" decisions: one-way doors that would be difficult to change once made.

  • How should we store our raw data for later analysis by data engineers?
  • What language should we use?
  • Which Web Framework should we use?
  • How should we deploy our service?
  • Should we use dedicated instances or serverless (Cloud Run) deploys?

I agonized over these choices, so it's worth recounting the decision making process I went through.

Methodology

As I mentioned, early in the process we consulted with an experienced engineer to make some high level design choices. That was invaluable advice. Going forward, I had to make more decisions, mostly on my own. To make these decisions, I used a number of sources.

  • My own experience
  • Newsletters and Hacker News, which I started reading more diligently to see which ideas were currently buzzy
  • GCP training materials and documentation
  • AI agents, for surfacing options and as a starting point for researching specific solutions
  • My tech network, which I reconnected with

I have to say, I was worried that my personal experience was going to be dated, years behind, since I hadn't been working in the industry for the past four years. In some ways it was, but most of the things I'd learned have only evolved. The concepts are still pretty much the same.

Reading Hacker News and doing some side quests into documentation has been super helpful. I feel like I've squirreled away tons of useful ideas into issues to work on later. I wrote a few Google Docs with ideas I wanted to try, and many of them I was able to start using.

As for using AI Agents, that has been a true game changer. Working alone is tough for many reasons. With an AI Agent, there is always "someone" there to bounce ideas off of, help me research, or review my decisions. I've been using mostly Gemini for my research, but also ChatGPT and Claude. One of my favorite things to do when I'm out walking around is to turn on the live/voice mode and just have conversations with the AI, asking it questions and then probing into details as far as I want to go.

Reconnecting with some of my former co-workers is one of the things I love most about getting back into Software Development. It's not just the technology that I love, but sharing it with people. And everyone these days is talking about AI. There's no better way to learn how to make use of it than sharing best practices with a fellow practitioner.

Datastore

As I mentioned, we had chosen Google's Cloud Spanner for our primary datastore. But there's also the option to use a mix of Postgres and Spanner, and to throw in GCS (object storage) and BigQuery as well.

For the database layer, I saw there were choices. Spanner offers a native client where you read and write rows directly, almost like a columnar store. It also supports a SQL (DML) layer. I was hoping I'd be able to use a standard library that would let me target Postgres if needed. Maybe I could use Postgres in development and staging environments instead of Spanner to save on costs. Some database middleware might give me that option.

What I decided to do was to use tried-and-true ORM-style programming on top of the most popular SQL library for the language I chose. Something about ORMs has always driven me nuts: all the wasted bandwidth and the copying of data across multiple layers with each database access. On the other hand, you work with objects that look like in-memory objects and map naturally to the database. That lets you spend less time thinking about moving data in and out of the database and focus on the business logic of the code. Maybe the performance wouldn't be too bad...
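To make the ORM style concrete (jumping ahead a bit: the language I eventually picked, as the next sections explain, was Python, where SQLAlchemy is the dominant SQL toolkit), here's a minimal sketch. The `Customer` model is made up for illustration; it runs against in-memory SQLite here, and swapping the engine URL is how you'd retarget Postgres (or Spanner, via a dialect package) without touching the model code.

```python
from sqlalchemy import Column, Integer, String, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

# A hypothetical model: one class maps to one table, attributes to columns.
class Customer(Base):
    __tablename__ = "customers"
    id = Column(Integer, primary_key=True)
    email = Column(String(255), unique=True)

# In-memory SQLite for the sketch; swap the URL to target Postgres/Spanner.
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    # Business logic works with plain objects; the ORM handles the SQL.
    session.add(Customer(email="alice@example.com"))
    session.commit()
    row = session.scalar(
        select(Customer).where(Customer.email == "alice@example.com")
    )
    print(row.email)
```

The copying the ORM does between layers is exactly the overhead I was worried about, but you can see why it keeps the business logic clean.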

As for deciding whether to store in flatfiles, BigQuery, or Spanner: for now, I decided to just throw everything into Spanner. That was going to be the most expensive option, but it seemed too early to optimize for cost. And the cost wasn't crazy expensive at prototype scale.

Language

I read a blog post from Steve Yegge arguing that language choices don't matter anymore. Gosh, maybe he is right? I mean, I never thought JavaScript would become a popular server-side language, but he was right about that. Nevertheless, I did need to make a choice, because the code has to be written in... something.

There are a number of languages I feel very comfortable with. Java and JavaScript are super popular, and I have lots of experience with them.

I almost immediately dismissed Java, even though it's the language I have the most experience with. Java servers have a very large memory footprint, and although it is fast enough, I felt it might be a bit too cumbersome to start out as a prototype. JavaScript is quick and nimble, but I like lots of guardrails on my code. TypeScript might have fit the bill, and heck, they use JavaScript for the SpaceX Dragon capsule user interface. But no, I'm sorry. With apologies to Steve Yegge, I just couldn't bring myself to write a backend in JavaScript. Especially not after having to constantly reset the entertainment console in my old Tesla.

I considered golang. I loved writing code in C, and Go is probably the closest modern language to C. And I knew that goroutines are lightweight, efficient, and easy to write. Since Go is a compiled language I knew it was fast. But I wasn't experienced in golang. And wasn't it garbage collected? Ugh, I didn't want to debug weird performance issues. And since I wasn't experienced, it didn't feel like a great time to learn a totally new language.

Speaking of new languages, last summer during the break from school, I spent quite some time learning to write Rust. My son and I did the previous year's Advent of Code puzzles as a way to learn. I started liking Rust a lot. I especially thought it was super cool that it protects programmers from themselves by preventing whole classes of bugs, and that, if done right, extending the code is almost as simple as fixing all the compiler errors. Also, if I never have to debug a null pointer exception again in my life, well, I'm fine with that.

By then, I had written enough Rust to know that a lot of libraries that seemed like they would be super popular were actually thinly supported. If I were to write in Rust, I felt like hiring new engineers might be tough. It's not an easy language to learn. Then again, maybe it would make finding great candidates easier for precisely the same reason. I think the final decision was made when I pulled up Google's official Spanner driver for Rust: "Experimental code. Do not use in production." If you can't depend on your database, then you are out of luck.

Then there was Python. I loved Perl back in the day, so naturally, I took to Python when I encountered it. It was like a version of Perl where the syntax was actually well thought out. I've written lots of Python code in my career, and there have been some interesting developments in Python in the past few years: optional typing, asynchronous programming, and multiple great frameworks. I knew I'd be using AI agents to write code, and as a top-ten language, Python certainly fit the bill.

As for performance, I knew Python wasn't the fastest language. It's interpreted and although it's fast to write and easy to change, you pay for that convenience at runtime. In fact, in an old system we had a C and a Python implementation that ran side by side. The Python implementation was almost 100x slower than the C version.

In the end, I decided to use Python for, well, all the reasons mentioned above. In short, I'm good at it, the community supports it well, and it's quick to write. And hey, that 100x-slower anecdote came from over 20 years ago; how bad could the performance be?

Web Server Framework

Now that I'd chosen the language, it was time to decide which libraries I'd use. Unlike in JavaScript, in Python, I had a relatively narrow field of Web frameworks to choose from. I knew I wanted to use something popular with strong community support.

Django is a rich and powerful server framework. But when I looked through what all of those integrations actually did, most of them felt irrelevant. Our front end was going to be a single page app; we didn't need a ton of server-side-rendered plugins. I'm trying to write microservices here, not a full-fledged webapp. It felt complicated and kind of overkill.

I looked at the Flask library. It's super popular, lightweight, and well supported. But it is a little dated: at its core it's a synchronous (WSGI) framework.

Finally, I saw FastAPI. It's similar to Flask, but opinionated about using async Python programming. Async programming felt a little complicated at first, but it suits Python well. I knew that threading was inefficient in Python because the Global Interpreter Lock (GIL) allows only one thread to execute Python bytecode at a time, bottlenecking multi-threaded programs. I had gotten a good taste of writing async code over the summer using Rust. The more I thought about it, the more I liked the idea. It seemed like there were far fewer 'gotchas' than when coordinating heavily threaded code in C or Java.
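To see why async pays off for an I/O-bound service, here's a tiny self-contained sketch (the `fake_db_call` name and delays are made up): three simulated waits run concurrently on one thread, so total wall time is roughly the longest wait, not the sum.

```python
import asyncio
import time

async def fake_db_call(name: str, delay: float) -> str:
    # Stand-in for awaiting real I/O (a Spanner query, a PubSub publish, ...)
    await asyncio.sleep(delay)
    return f"{name}: done"

async def main() -> list[str]:
    start = time.monotonic()
    # gather() interleaves all three coroutines on a single thread.
    results = await asyncio.gather(
        fake_db_call("query-a", 0.2),
        fake_db_call("query-b", 0.2),
        fake_db_call("query-c", 0.2),
    )
    elapsed = time.monotonic() - start
    print(f"finished in {elapsed:.2f}s")  # ~0.2s total, not 0.6s
    return results

results = asyncio.run(main())
```

FastAPI builds on exactly this model: each `async def` endpoint awaits its I/O, letting a single worker interleave many requests despite the GIL.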

Infrastructure Provisioning

I've worked a lot in cloud development at Google and Square, but never had full responsibility for the design and deployment of the infrastructure.

At Google, developers just threw code over the fence, and DevOps mostly figured out how to run it. I managed a small website at Google deployed somewhere in the vast cavernous datacenters. One time, I actually got a tour of a datacenter and got to peek at one of the servers where it was running.

At Square, our cloud infrastructure was entirely home grown, modeled after best practices at other big companies where many of us had worked before. As I was leaving, Square was busy migrating all of that home grown infrastructure into cloud services.

Fortunately, I've done enough networking and system administration to feel comfortable around all the concepts used in modern cloud infrastructure. Now, all I had to do was run a few commands or press some buttons on the screen... Oh my goodness, it was really complicated, and it was hard to know everything that needed to be wired together.

I decided to deploy using Terraform, which I had never used before, mainly because I was enamored with the idea of infrastructure as code. Or was it really because I've never met a gcloud command that I liked? It amazes me how unfriendly and esoteric that tool is. Terraform was much less hit-and-miss, and if I made a mistake, or wanted to make a second environment, I could always recreate everything from scratch. And I had AI to help me write it. I knew that the DevOps folks swore by Terraform. This was one case where the decision was a no-brainer.
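As a taste of what infrastructure as code looks like, here's a hedged Terraform sketch of a Cloud Run service; the resource type is real, but the name, region, and image path are placeholders, not our actual config.

```hcl
# Minimal sketch of a Cloud Run service (names/region/image are placeholders).
resource "google_cloud_run_v2_service" "backend" {
  name     = "backend"
  location = "us-central1"

  template {
    containers {
      image = "us-central1-docker.pkg.dev/my-project/app/backend:latest"
    }
    scaling {
      min_instance_count = 0 # scale to zero: no cost while idle
      max_instance_count = 3
    }
  }
}
```

Recreating a second environment is then mostly a matter of re-applying the same config with different variables, which is exactly the property I wanted.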

Instance types to use for Deployment

When you start looking at hosting your app on the cloud, there is a huge menu of machine types and sizes to choose from, all with different pricing. Then, once you make that decision, you have to coordinate how they stop, start, and communicate with each other. I had used the precursor to Kubernetes at Google. It was complicated! I hear that Kubernetes follows in the same vein.

I was not very receptive to the serverless deployment pitch. In theory, you get automatic provisioning and no fuss scaling. In practice, our team at Square had found that it was slow and expensive. Would Cloud Run (GCP's serverless offering for web services) prove to be the same?

In the end, I decided to use Cloud Run simply because it was cheap and easy to set up. I've used dedicated instances before, but never Kubernetes. I was eager to get up and running quickly, and Cloud Run promised to do that. The infrastructure expert said using Cloud Run was a no-brainer for a proof of concept. And it was so inexpensive: I could configure multiple environments super cheaply (because you don't pay for idle instances). It sounded perfect for bootstrapping an unfunded startup.

Reflection

I made those major decisions 6 months ago, which seems like an eternity. Did I make the right choices? I'd love to hear what you think. But do you think I'm going to add a comments section to a blog in 2026? Ain't nobody got time to deal with that. Let me just say that I felt like they were good decisions, knowing what I knew then. Within 90 days, I had a working prototype of our backend deployed to GCP that had almost all the functionality we came up with in August for our MVP. And I learned a lot.

Maybe you would be surprised to hear that almost every one of these decisions is being revisited at this point in development, not even 6 months in. Fortunately, my assumptions about them being type 1 decisions are turning out to be wrong. I look forward to writing some more about the twists and turns that I found along the way.