Rebuilding DocumentCloud’s Frontend

I spent a large part of last year rebuilding DocumentCloud. A rebuild is always a risky proposition, so I thought I'd take a moment to explain why it was the right solution and how we approached it.

In 2023, we added new features, like an improved Add-Ons browser, and redesigned some existing components, which created inconsistency between the old and new visual languages. At the same time, we were redeveloping existing components to support TypeScript and improved network handling, in line with modern best practices. The codebase was becoming increasingly fractured as we continued developing with new methods and technologies.

After building on the existing site through 2023, it became clear that the existing codebase was an obstacle to speedy development. The site ran on a bespoke Svelte SPA framework and state manager. While this was the best solution when the last version was developed in 2018-19, solutions like SvelteKit had since eclipsed it. We were spending as much time working around quirks in the custom framework as we were shipping new features. To speed up development, we'd have to either invest time in updating the existing framework or replace it.

Early in 2024, Chris Amico created an experimental branch that used SvelteKit as an application framework. Within a few weeks, he had listing and searching working to the point that we could evaluate it against the existing site. A few things immediately stood out: the code was much simpler and the site ran much faster. This was the validation we needed to keep going. Around this time, Chris also indexed all of the existing application's functionality to understand how much work a rebuild would take.

This approach was aided by DocumentCloud's decoupled architecture: the backend API runs as a separate service from the frontend application, which itself is the single largest consumer of the API. This architecture meant we could swap out the UI layer without worrying about disrupting the core functionality of the service.
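To illustrate what that decoupling looks like from the frontend's side, here is a minimal sketch in TypeScript. It is not our actual code: the base URL, the `/documents/search/` path, the response shape, and the names `buildSearchUrl` and `searchDocuments` are all hypothetical placeholders. The idea is only that the UI layer talks to the API through a thin function that accepts whatever `fetch` implementation it is handed, so the UI around it can be replaced without touching the backend.

```typescript
// Hypothetical response shapes, for illustration only.
// These are NOT DocumentCloud's real API schema.
interface DocumentSummary {
  id: number;
  title: string;
}

interface SearchResponse {
  results: DocumentSummary[];
}

// A minimal fetch-like signature. Passing fetch in as an argument lets the
// same function run with a framework-provided fetch, the global fetch,
// or a stub in tests.
type FetchLike = (url: string) => Promise<{
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}>;

// Placeholder base URL (an assumption, not a real endpoint).
const API_BASE = "https://api.example.org";

// Build the search URL for a query string, escaping user input.
function buildSearchUrl(query: string, base: string = API_BASE): string {
  return `${base}/documents/search/?q=${encodeURIComponent(query)}`;
}

// Ask the (hypothetical) API for matching documents and return the results.
// The frontend only depends on this contract, not on how the backend works.
async function searchDocuments(
  query: string,
  fetchFn: FetchLike,
  base: string = API_BASE
): Promise<DocumentSummary[]> {
  const response = await fetchFn(buildSearchUrl(query, base));
  if (!response.ok) {
    throw new Error(`Search failed with status ${response.status}`);
  }
  const body = (await response.json()) as SearchResponse;
  return body.results;
}
```

In a SvelteKit route, a function like this could be called from a `load` function using the `fetch` that SvelteKit passes in, though how the calls are actually organized is a design choice this sketch does not try to capture.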

By the end of March, we had a functional prototype to share with the rest of the MuckRock team. We demonstrated the gains in speed and simplicity, while sharing the checklist of features to implement. Our next step would be to implement a feature-complete domain within a month. We figured that if we could achieve that next milestone without exceeding an estimated timeframe, then we'd have the confidence to proceed with the rest of the project.

Fortunately, we reached that milestone ahead of our deadline. We were already feeling the productivity benefits of switching to SvelteKit, and felt confident that the rebuild would proceed on schedule. With buy-in from the rest of the team, I charted out the next five months of development, focusing on addressing high-risk areas first. These were the most complex features and functionality, the ones most critical to the application's utility, or the ones where we needed to understand more in order to succeed.

One-by-one, we checked off those high-risk areas until all we were left with was the long checklist of more mundane features: CRUD operations, interface elements, and error handling. Along the way, we would periodically refactor to reflect improved knowledge or evolve the codebase structure.

The last few months were spent winnowing down our open tasks until we launched a beta in early October, which was followed by a wide release. During our beta period, we monitored Sentry for crash reports and issues, and checked in on submissions to a free-form feedback form we created. These two channels, one quantitative and one qualitative, gave us strong feedback on where we needed to focus before the wide release.

Now that the new DocumentCloud is live, we've been able to release changes much faster. We're now introducing new features and fixing issues raised through our feedback mechanisms. The challenge now is scaling and optimization: the DocumentCloud service has millions of visitors a month, and these issues didn't reveal themselves until we had completely migrated to the new version.

While a rebuild is always risky, I think we managed the risk extremely well by continually validating our approach as we iterated towards the final solution. The outcome is now a much simpler, much faster application and a speedier development cycle, both of which let MuckRock more effectively and efficiently serve the millions of DocumentCloud visitors and thousands of newsrooms who rely on the service for their reporting.