Rewriting a large codebase is almost never a team’s plan A — or even their plan C. Joel Spolsky called it “the single worst strategic mistake that any software company can make.” For our Samsara Driver App, built on React Native, it was the right call. In this post, I’ll talk about why that was, how we shipped a large-scale rewrite successfully to meet a critical deadline, and how that rewrite back in 2019 helped us build the foundation for launching four new apps by 2022.
A Looming 64-Bit Deadline
Let’s set the stage: It’s 2018, and we hear a scary piece of news. Starting in August 2019, all updates shipped to the Google Play Store will have to support 64-bit architecture. At the time, we had one mobile application, available in both the Play Store and the iOS App Store: the Samsara Driver App, which professional drivers use for things like talking to their dispatchers, recording vehicle inspections, and tracking their hours. While we had lots of developers doing mobile work and building new features, at that point, we didn’t have any mobile-only engineers specifically focused on infrastructure.
Our app is built on React Native, which has been great for us. We love how it lets us structure our teams autonomously: everyone can contribute to mobile, and engineers build features that work on both Android and iOS without a lot of overhead. Using React Native has been a conscious decision to help developers ramp up quickly and contribute across our entire stack; we use Go for any code that runs on devices in our control, and React + TypeScript for all code that runs on devices we don’t control, like web and mobile apps.
But with this news from the Play Store, we suddenly had a major problem. Our app was built on React Native v0.34, which only supports 32-bit architecture, and we hadn’t upgraded versions, well, ever. We only had a short amount of time to ship a 64-bit app, or we’d have no way to push fixes and improvements to our users. The latest version of React Native at the time, v0.59, was the first version to include 64-bit support. So, what options did that give us?
The Solution Space
Given the business constraint “We need to ship a 64-bit Android app to the Play Store,” we considered a few high-level approaches:
1. Back-port 64-bit support from React Native v0.59 to v0.34.
2. Write a new, bare-bones Android app from scratch without React Native, but continue to use React Native for iOS.
3. Upgrade the React Native version of our app in place to achieve 64-bit support.
4. Make a brand new, empty React Native v0.59 app and move our code into it, feature by feature, making changes where necessary.
We quickly ruled out options 1 and 2.
A back-port seemed intractable for multiple reasons: there were a huge number of commits between React Native v0.34 and v0.59, we’d end up with a very niche codebase carrying a lot of debt, and we relied on third-party libraries that also needed 64-bit upgrades.
Rewriting our app in native Android was also relatively easy to rule out — we’re very happy with how React Native lets us structure our engineering organization where developers can learn a small number of things and contribute to all parts of our codebase. If we switched to native Android, we’d give up this benefit — we’d either need to couple feature owners to a dedicated Android team, or all those feature owners would have to learn to be productive in the Android ecosystem.
Options 3 and 4 were more interesting.
Upgrading React Native In Place
Intuitively, upgrading our app in place seemed like the best option. Developers at Facebook and others we spoke with in the React Native community recommended that we approach the upgrade version by version. So, we gave that a try.
After a few developer-months, we’d migrated from v0.34 to v0.37 and hit a wall. Here’s what we learned along the way:
- This development model is “you vs the compiler”: you install the new React Native version, try to build the app, get an error, and try to change code until the error goes away. Then you get another error. We found it really difficult to estimate how many errors were left in a given upgrade and how long they would take to resolve. Furthermore, this work couldn’t be parallelized across developers, since you only see one error at a time.
- Unlike with some of the other options, which allowed for partial degrees of success, in this case, success was binary. Either we would get every file in the app up to v0.59 support in time, or we wouldn’t, putting us in the same bad position as before.
- Third-party libraries were tricky! Along the chain of dependencies, there could be incompatible releases (e.g. a library required by React Native v0.38 depends on a second library that requires React Native v0.45).
- By design, this process requires a lot of redundant work. Multiple files end up getting rewritten multiple times to handle the nuances of new versions along the upgrade path.
In the end, the uncertainty in this approach was a blocker for us. We couldn’t estimate. We couldn’t parallelize. We might get really far and then find that, for example, version 0.46 was super difficult and required tons of code changes. If any of these surprises happened, we wouldn’t have any 64-bit support in time for the deadline.
Therefore, we followed the least likely path: the complete rewrite.
Choosing To Do a Rewrite
Approaching this project as a rewrite, as opposed to using one of the strategies described above, had the most favorable properties for us. In summary:
- We’d end up with a future-proof codebase without substantial debt.
- We could continue to leverage our existing expertise and cross-platform team ownership model.
- We knew we would eventually succeed.
- We could reason about the migration timeline without too many unknowns.
- The work was parallelizable, freeing teams to work on their own features on their own schedule.
- We had a gradient failure mode. Even if things didn’t go well, we could still have a shippable app with at least some features in time for the deadline.
At this point, we’d lived in our 32-bit app for a while, and we had built up a list of qualities we might want a fresh start to have. We’d want our codebase to optimize for:
- testability, especially integration tests of user behavior;
- spinning up new apps relatively easily;
- sharing code and infrastructure across apps; and, relatedly,
- making sure infrequently-modified apps don’t get stuck on old infrastructure (like React Native v0.34).
Fortunately, as of 2018 we’d already begun working on these improvements and even had a new React Native v0.59 codebase to start from. Customers had been asking for an app designed for fleet managers (as opposed to drivers), and in investigating how to launch this quickly, we discovered that more than 50% of our Samsara Driver App codebase was infrastructure that could be shared across future apps: things like logging, offline reconciliation, navigation, the event bus, auth, storage, Code Push, and our component library.
So, we’d built a proof-of-concept React Native v0.59 codebase that could produce multiple apps from shared code. It used build-time environment variables (“flavors” on Android and “targets” on iOS) to produce distinct app bundles, each of which could include different code and open to its own initial screen. We ended up using this codebase to rebuild the Driver App.
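To make the multi-app idea concrete, here is a minimal sketch of how one build-time value could select an app from shared code. Everything in it is hypothetical: the target names, screen names, module lists, and the `resolveApp` helper are illustrative assumptions, not the actual Samsara code or build configuration.

```typescript
// Hypothetical app targets; a real build would inject one of these
// through an Android flavor or an iOS target.
type AppTarget = "driver" | "fleet";

interface AppDefinition {
  displayName: string;
  initialScreen: string;
  modules: string[];
}

// Infrastructure every app gets (names are illustrative).
const SHARED_MODULES = ["logging", "auth", "storage", "navigation"];

const APP_DEFINITIONS: Record<AppTarget, AppDefinition> = {
  driver: {
    displayName: "Driver",
    initialScreen: "DriverHome",
    modules: SHARED_MODULES.concat(["inspections", "hours"]),
  },
  fleet: {
    displayName: "Fleet",
    initialScreen: "FleetOverview",
    modules: SHARED_MODULES.concat(["vehicle-list"]),
  },
};

function isAppTarget(value: string): value is AppTarget {
  return value === "driver" || value === "fleet";
}

// Fail loudly at startup if the build passed an unknown target,
// rather than silently booting the wrong app.
function resolveApp(target: string | undefined): AppDefinition {
  if (target === undefined || !isAppTarget(target)) {
    throw new Error(`Unknown app target: ${String(target)}`);
  }
  return APP_DEFINITIONS[target];
}

// In a real app the argument would come from the build-time variable.
const selectedApp = resolveApp("fleet");
console.log(selectedApp.displayName, selectedApp.initialScreen);
```

Keeping the per-app differences in one small table like this is one way to keep everything else shared by default.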
We knew that a full-scale rewrite would require lots of time and attention. We didn’t yet have a good idea of exactly how much time, but we had a feeling we might be down to the wire with the Google deadline. While some of us were planning the rewrite and building proofs of concept, other team members worked on fortifying our existing app. A serious latent production issue in the old codebase or an urgent customer request could take time away from the rewrite at a crucial moment, so we wanted to reduce these risks as much as possible. We built some new monitoring tools, cleaned up some long-standing bugs and code debt, and delivered some relatively quick UX wins for our Driver App users.
Testing a Proof of Concept
We knew we could build new React Native apps in a fairly consistent, short timeframe because of our recent experience with the fleet management app. But we didn’t know how a rewrite of an existing app would compare — would it be faster and easier because we already knew what the behavior of the app should be? Or would it be slower and more difficult because we’d have to dig through code to find those behavioral expectations and reproduce them faithfully?
To find out, we built a proof of concept. We chose a medium-sized feature area from our Driver App: vehicle inspection reports. Then we spent a two-week, two-developer sprint building out the functionality in our new codebase. One of us knew a lot about the new codebase, and the other knew a lot about vehicle inspection reports in the old codebase. We tried two approaches, each for a handful of the feature’s modules:
- Treat it like a brownfield port: copy-and-paste the module from the old codebase into the new one, and make changes until it works.
- Treat it like a greenfield feature: read the old code carefully and write many failing tests in the new codebase describing its behavior, then write code from scratch against those tests.
To our surprise, we found the second approach much more effective. We ended up with better code and better tests, and the development process was faster and smoother.
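As an illustration of the second approach, the sketch below writes the expected behavior down as test cases first, confirms that a stub fails them, and only then implements the feature. The inspection rules, type names, and the tiny `runCases` helper are invented for this example; they are assumptions, not the real Driver App behavior or test tooling.

```typescript
type ItemResult = "pass" | "fail";

interface InspectionItem {
  label: string;
  result: ItemResult;
}

type ReportStatus = "incomplete" | "safe" | "defects-found";

interface BehaviorCase {
  description: string;
  items: InspectionItem[];
  expected: ReportStatus;
}

// Step 1: describe the old code's behavior as cases (hypothetical rules).
const behaviorCases: BehaviorCase[] = [
  { description: "an empty report is incomplete", items: [], expected: "incomplete" },
  {
    description: "all items passing means safe",
    items: [{ label: "brakes", result: "pass" }],
    expected: "safe",
  },
  {
    description: "any failed item means defects were found",
    items: [
      { label: "brakes", result: "pass" },
      { label: "lights", result: "fail" },
    ],
    expected: "defects-found",
  },
];

// Runs every case against an implementation and returns failure messages.
function runCases(impl: (items: InspectionItem[]) => ReportStatus): string[] {
  const failures: string[] = [];
  for (const c of behaviorCases) {
    const actual = impl(c.items);
    if (actual !== c.expected) {
      failures.push(`${c.description}: expected ${c.expected}, got ${actual}`);
    }
  }
  return failures;
}

// Step 2: a stub, so the cases fail before any real code exists.
const stubStatus = (_items: InspectionItem[]): ReportStatus => "incomplete";

// Step 3: write the implementation from scratch against the cases.
function reportStatus(items: InspectionItem[]): ReportStatus {
  if (items.length === 0) return "incomplete";
  for (const item of items) {
    if (item.result === "fail") return "defects-found";
  }
  return "safe";
}

console.log(runCases(stubStatus).length); // 2 failing cases
console.log(runCases(reportStatus).length); // 0 failing cases
```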
Following the proof of concept, we developed a runbook for how to apply this approach to other features. At this point, we had a repeatable process for moving features into the new codebase, and we had a baseline that could help us estimate how long a set of features might take. But we still didn’t have a firm grasp of the size of the project.
Before work could begin in earnest on rewriting our app, we needed to understand how to allocate engineers to this project. If we had five full-time people, would we get the app done in time for the deadline? What about ten people? How much buffer would we have? What was the critical path: which features had to get done before others could start? A Gantt chart could help answer these questions.
We started by skimming each file in the old codebase and recording every feature we found in OmniPlan. We made our best guesses at which features required special domain expertise, which features required special infrastructure expertise, and which features needed to be completed before others. At this point, we knew roughly how many person-months of labor the project would take. To learn how to spend those person-months, there was one more thing we had to factor in: onboarding.
We liked the model of pairing domain experts with new-codebase experts. We figured that after a few weeks of working together, a domain expert would become a new-codebase expert. Over time, we would grow our pool of codebase experts and increase our capacity. We incorporated this expectation into the Gantt chart. We also incorporated the people themselves — who is a domain expert on what? How many developers know about each feature?
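The dependency side of this planning can be sketched in a few lines: given estimated durations and "must finish first" relationships, the critical path is the longest chain of dependent work. The feature names and week counts below are invented for illustration (they are not the real estimates), and `criticalPath` is a hypothetical helper that ignores staffing and onboarding, which a planning tool like OmniPlan would also model.

```typescript
interface Task {
  name: string;
  weeks: number;
  dependsOn: string[];
}

interface CriticalPath {
  totalWeeks: number;
  path: string[];
}

// Finds the longest chain of dependent tasks (the critical path).
function criticalPath(tasks: Task[]): CriticalPath {
  const byName: Record<string, Task> = {};
  for (const task of tasks) byName[task.name] = task;

  const memo: Record<string, CriticalPath> = {};
  const inProgress: Record<string, boolean> = {};

  // Longest chain that ends with the given task, memoized per task.
  function longestChainEndingAt(taskName: string): CriticalPath {
    const cached = memo[taskName];
    if (cached !== undefined) return cached;
    const task = byName[taskName];
    if (task === undefined) throw new Error(`Unknown task: ${taskName}`);
    if (inProgress[taskName]) throw new Error(`Dependency cycle at: ${taskName}`);
    inProgress[taskName] = true;

    let longest: CriticalPath = { totalWeeks: 0, path: [] };
    for (const dep of task.dependsOn) {
      const candidate = longestChainEndingAt(dep);
      if (candidate.totalWeeks > longest.totalWeeks) longest = candidate;
    }
    inProgress[taskName] = false;

    const chain: CriticalPath = {
      totalWeeks: longest.totalWeeks + task.weeks,
      path: longest.path.concat([taskName]),
    };
    memo[taskName] = chain;
    return chain;
  }

  let best: CriticalPath = { totalWeeks: 0, path: [] };
  for (const task of tasks) {
    const candidate = longestChainEndingAt(task.name);
    if (candidate.totalWeeks > best.totalWeeks) best = candidate;
  }
  return best;
}

// Illustrative numbers only.
const rewritePlan: Task[] = [
  { name: "shared-infra", weeks: 4, dependsOn: [] },
  { name: "compliance", weeks: 16, dependsOn: ["shared-infra"] },
  { name: "inspections", weeks: 6, dependsOn: ["shared-infra"] },
  { name: "messaging", weeks: 5, dependsOn: [] },
];

const critical = criticalPath(rewritePlan);
console.log(critical.path.join(" -> "), critical.totalWeeks); // shared-infra -> compliance 20
```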
We learned two main things from this exercise:
1. We had a clear critical path
Our app has around six big feature areas, with Compliance (features to help drivers follow the law) being the largest. By developing infrastructure just-in-time for features that required it, along with implementing our pair programming strategy to gradually grow the new-codebase expertise of our Compliance domain experts, we’d barely make it on time.
Other areas were much easier, and teams had flexibility around when they could work on their features while still shipping on time. This helped us focus even more on the critical path of Compliance.
2. The project was feasible(!)
If we could get everyone up to speed on the new mobile codebase and only hit the expected (large) number of surprises, we could get a feature-complete app in time for the deadline.
I can’t over-emphasize how helpful it was to have a document that said the project was feasible. On a personal level, it helped emotionally; I could refer to that Gantt chart every time I felt overwhelmed by this massive rewrite (everyone knows that a rewrite is the worst thing you can do in software engineering), and I could feel reasonably confident that we’d finish in time for the critical deadline. And it helped tactically; I could look managers in the eye, hold up the chart, and say “If your feature is going to make it into the new app, this document says that X number of people from your team need to work on it.”
Execution… and the Last Mile
Once we had a plan in place, completing the project was just a matter of execution. Accounting for surprises that might come up along the way, our plan gave us six months of feature implementation, followed by an additional two or three months of optimization and QA before releasing the new version of our Driver App to the majority of our users. Three months into feature implementation, we’d be able to start beta testing with some customers who used only a small number of features, and if everything went well we could expand the test group from there.
Accounting for development time beyond the initial feature implementation turned out to be crucial. We had about 20 person-weeks of “last-mile” issues to get through after we were feature-complete. Last-mile issues fell into two categories: bugs and performance.
One bug in particular was interesting. The two app versions stored a user’s username (used to pre-fill the login form) differently, and we hadn’t written migration code to carry a username stored by the old app over to the new one. This meant that the very first time you logged in to the new app, you’d have to enter your username again. Well, many people didn’t remember their usernames. Even in a tiny beta, there was a week where this was the top issue that drivers called our support team about! It’s always good to get a dose of user feedback. 🙂
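The missing piece here is a one-time migration that runs before the login form is shown. A minimal sketch follows, under stated assumptions: the storage keys, the JSON format, and the synchronous in-memory `MemoryStore` are all hypothetical stand-ins (real React Native storage is asynchronous, and the actual formats used by the two apps are not described in this post).

```typescript
interface KeyValueStore {
  get(key: string): string | null;
  set(key: string, value: string): void;
  remove(key: string): void;
}

// In-memory stand-in for device storage, for illustration only.
class MemoryStore implements KeyValueStore {
  private data: Record<string, string> = {};

  get(key: string): string | null {
    const value = this.data[key];
    return value === undefined ? null : value;
  }

  set(key: string, value: string): void {
    this.data[key] = value;
  }

  remove(key: string): void {
    delete this.data[key];
  }
}

const LEGACY_USERNAME_KEY = "username"; // hypothetical old-app key (plain string)
const CURRENT_LOGIN_KEY = "login.prefill"; // hypothetical new-app key (JSON)

// Returns the username to pre-fill, migrating the old value if needed.
function migrateStoredUsername(store: KeyValueStore): string | null {
  const current = store.get(CURRENT_LOGIN_KEY);
  if (current !== null) {
    // Already migrated (or written by the new app): nothing to do.
    const parsed = JSON.parse(current) as { username: string };
    return parsed.username;
  }

  const legacy = store.get(LEGACY_USERNAME_KEY);
  if (legacy === null) return null; // fresh install, nothing stored

  // Write the new format first, then drop the old key.
  store.set(CURRENT_LOGIN_KEY, JSON.stringify({ username: legacy }));
  store.remove(LEGACY_USERNAME_KEY);
  return legacy;
}

const deviceStore = new MemoryStore();
deviceStore.set(LEGACY_USERNAME_KEY, "driver42");
console.log(migrateStoredUsername(deviceStore)); // driver42
```

Writing the new value before removing the old one means an interruption between the two steps leaves the username recoverable on the next launch.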
As for performance, a first-pass implementation of anything in React Native tends not to be very snappy on low-end devices. Because of our foundation of shared code and a small number of widely-used abstractions, we were able to build performance instrumentation and make optimizations in a small number of places that had a massive impact on all our apps. But some code areas still needed in-depth optimization. Over a month or so, we built tooling to detect wasted React renders and needless cache misses, and we whittled down the issues until the new app was at least as fast as the old app for all interactions (and much faster for many).
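One common way to spot wasted renders is to compare a component's props between consecutive renders and flag any render where nothing changed. The sketch below shows that idea only; `RenderTracker` and `changedProps` are hypothetical names, the comparison is a shallow strict-equality check, and none of this is the tooling described above.

```typescript
type Props = Record<string, unknown>;

// Names of props whose values differ by strict (shallow) equality.
function changedProps(prev: Props, next: Props): string[] {
  const changed: string[] = [];
  const keys = Object.keys(prev).concat(Object.keys(next));
  for (const key of keys) {
    if (changed.indexOf(key) === -1 && prev[key] !== next[key]) {
      changed.push(key);
    }
  }
  return changed;
}

// Counts renders where a component received shallowly identical props.
class RenderTracker {
  private lastProps: Record<string, Props> = {};
  private wasted: Record<string, number> = {};

  // Returns true when this render is counted as wasted.
  record(component: string, props: Props): boolean {
    const previous = this.lastProps[component];
    this.lastProps[component] = props;
    if (previous === undefined) return false; // first render is never wasted
    if (changedProps(previous, props).length > 0) return false;
    this.wasted[component] = (this.wasted[component] || 0) + 1;
    return true;
  }

  wastedCount(component: string): number {
    return this.wasted[component] || 0;
  }
}

const tracker = new RenderTracker();
const route = { id: 7 };
tracker.record("RouteCard", { route, selected: false }); // first render
tracker.record("RouteCard", { route, selected: false }); // same props: wasted
tracker.record("RouteCard", { route: { id: 7 }, selected: false }); // new object identity: counted as a change
console.log(tracker.wastedCount("RouteCard")); // 1
```

The third call shows the limit of a shallow check: a freshly created but equivalent object looks like a change, so a fuller tool would also want to flag props that change identity without changing content.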
Now, it’s two years later, and the Samsara Driver App is better than ever. Literally every metric we track — performance, error rates, customer sentiment, and others — is better than it was in the old app. To this day, we still get feedback in our developer experience surveys about how much easier it is to write Driver App code now than it used to be before the rewrite. Plus, our multi-app ecosystem is strong! We’ve built five customer-facing mobile apps from this codebase, and introducing a new app isn’t scary anymore.
So, that’s why we rewrote our biggest mobile app in 2019. In the future, we’ll post about a variety of other React Native mobile development stories and get into specifics around some cool things we’ve built. If these topics interest you, please reach out! I’m always happy to chat about mobile development. Plus, Samsara is a great place to work on React Native, and we’re hiring!