TL;DR

Issue 1: Every project created between April 18 11:30 UTC and April 22 11:39 UTC was treated as invalid on requests to api.web3modal.com, because the API was still querying the temporary replica, which was out of sync with newly created projects. 494 developers created a project during that period.

Issue 2: While we were fixing Issue 1 and redeploying a new version of the worker, a second bug caused requests for all projects to receive a 400 Bad Request response.

Summary

Root Cause

While upgrading our Supabase instance to the latest Postgres version, we had to point our workers to a temporary replica to avoid downtime. That process involves updating environment variables manually through the Cloudflare dashboard UI twice: once before the upgrade, to point to the replica, and once after the upgrade, to point back to the original DB.

Issue 1: The first issue occurred because the environment variables were not properly updated to point back to the original DB.
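
A misconfiguration of this kind can be caught by a small check run after each manual change. The following is a minimal sketch, not a description of our actual tooling: the variable name DATABASE_URL, the host names, and the helper functions are all hypothetical and only illustrate the idea of comparing the configured database host against the expected one.

```python
from urllib.parse import urlsplit


def db_host(database_url: str) -> str:
    """Return the lowercase host name from a Postgres connection URL."""
    host = urlsplit(database_url).hostname
    if host is None:
        raise ValueError("connection URL has no host")
    return host


def points_at_primary(env: dict[str, str], expected_host: str) -> bool:
    """True if the (hypothetical) DATABASE_URL variable targets the expected host."""
    return db_host(env["DATABASE_URL"]) == expected_host.lower()


# Hypothetical values for illustration only: the variable still targets the replica.
env_after_upgrade = {
    "DATABASE_URL": "postgresql://worker:secret@replica.db.example.com:5432/postgres"
}
if not points_at_primary(env_after_upgrade, "primary.db.example.com"):
    print("DATABASE_URL does not point at the primary database")
```

Run as a step after each of the two manual updates, a check like this would turn a silent misconfiguration into an immediate, visible failure.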

Issue 2: The second issue occurred because of a connection issue between our Cloudflare worker and Supabase.
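
One way to keep a database connection failure from surfacing as a client error is to separate "project not found" from "backend unreachable" before choosing a status code. This is a sketch under assumptions: this document does not describe how the worker maps errors to responses, and the Lookup enum, the status_for function, and the choice of 503 are hypothetical.

```python
from enum import Enum


class Lookup(Enum):
    """Possible outcomes of looking up a project ID (hypothetical)."""
    FOUND = "found"
    NOT_FOUND = "not_found"
    BACKEND_ERROR = "backend_error"


def status_for(lookup: Lookup) -> int:
    """Map a project lookup outcome to an HTTP status code."""
    if lookup is Lookup.FOUND:
        return 200
    if lookup is Lookup.NOT_FOUND:
        return 400  # the client sent a project ID that does not exist
    return 503  # the database could not be reached; the project may still be valid


print(status_for(Lookup.BACKEND_ERROR))
```

With this split, an outage on our side is reported as a server-side problem instead of telling every developer that their project is invalid.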

5 Whys

  1. Why were new projects considered invalid when making requests to the API?
  2. Why was the API querying data from the temporary replica?
  3. Why were the environment variables not updated?
  4. Why were the updates to the environment variables mishandled?
  5. Why was there no logging or alerting in place?
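
The alerting gap raised in the last question could be narrowed with a simple check on the share of rejected requests. The sketch below is illustrative only: the 20% threshold, the function names, and the assumption that recent response status codes are available as a list are all hypothetical.

```python
def rejection_rate(status_codes: list[int]) -> float:
    """Share of responses in the window that were HTTP 400."""
    if not status_codes:
        return 0.0
    rejected = sum(1 for code in status_codes if code == 400)
    return rejected / len(status_codes)


def should_alert(status_codes: list[int], threshold: float = 0.2) -> bool:
    """True if the share of 400 responses reaches the (hypothetical) threshold."""
    return rejection_rate(status_codes) >= threshold


# Hypothetical window of recent responses: half of them were rejected.
window = [200, 400, 400, 200]
if should_alert(window):
    print("High share of 400 responses; check the database configuration")
```

A threshold on the rate, not on the absolute count, keeps the check meaningful at both low and high traffic.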

What could we have done better?