debugging the new site
The refresh needed a refresh. That turned into a framework switch and a change to how the site is rendered. But let me explain why...
the launch
Last September, I relaunched this site as a multi-faceted experiment. I wanted to dive into Remix as a potential React-based framework for rebuilding some existing side projects. The website itself was long overdue for a refresh (I launched the Gatsby version of this site some years ago). I also wanted to learn more about hosting sites and applications on edge hosting providers, so I launched it on Vercel. And I stuck with Sanity as the CMS because I really do enjoy it as a developer and a writer. I've also really come to enjoy their GROQ query language, more than GraphQL.
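To give a taste of why GROQ clicked for me, here's a hedged sketch (the `post` schema and its field names are illustrative assumptions, not this site's actual content model): a single filter-and-projection expression that resolves a document reference inline, no resolver layer required.

```typescript
// A hypothetical GROQ query: fetch the five most recent posts,
// resolving each post's author reference inline via `->`.
// ("post", "publishedAt", and "author" are assumed schema names.)
const recentPosts = `
  *[_type == "post"] | order(publishedAt desc)[0...5]{
    title,
    slug,
    "authorName": author->name
  }
`;

// With @sanity/client, this would run as roughly:
//   const posts = await client.fetch(recentPosts);
console.log(recentPosts);
```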
The launch was awesome! The site was fast, Remix was great... and then the problems started.
overages and queries
I started noticing that my resource usage on sanity.io had ballooned well past their generous free limits (at the peak, I was averaging 44K requests A DAY to CDN content endpoints, without the site traffic to match). I could not, for the life of me, figure out why (I wish Sanity's "Usage" panel also showed request sources, but maybe that's a limitation of the free account).
I racked my brain. I occasionally hopped into Sanity's Slack. And the entire time, I was now paying for my CMS usage, which would have been fine if, again, I had the traffic to match.
An engineer at Sanity replied to a support request I had sent in a frustrated and exasperated state, and the reply gave me the clue I needed to figure out what was happening: there were 5 particular queries firing every 10 seconds (some of them unoptimized, some of them with remnants of Sanity GraphQL queries).
The 5 and the 10 were the keys to unlocking wtf was happening.
These 5 queries were used to generate my new homepage (most recent post, content areas, galleries, random photos, and something else).
As for the 10 seconds: the default timeout for a serverless function invocation on Vercel is 10 seconds. That meant these queries were not timing out (which removed my theories about retry logic being the issue), and I probably had some sort of issue with how I was generating identifiers for caching requests in Redis (that one was a bit harder to trace). I was also able to rule out errant requests in my own logic: no forgotten long-polling, no errant hook causing re-renders that would then cause more data requests, things like that.
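To illustrate the kind of cache-key issue I mean, here's a sketch with hypothetical names (this is not my actual code): if the key ever includes something unstable per request, every lookup misses, and every miss becomes another query to Sanity. Deriving the key only from the query text and its (sorted) params keeps it stable.

```typescript
import { createHash } from "node:crypto";

// Derive a stable Redis cache key from the query itself.
// If the key instead included anything request-specific (a timestamp,
// a random ID, params serialized in a varying order), every request
// would miss the cache and fall through to the CMS.
function cacheKeyForQuery(
  query: string,
  params: Record<string, unknown> = {}
): string {
  // Sort the param keys so { a: 1, b: 2 } and { b: 2, a: 1 } hash identically.
  const sortedParams = JSON.stringify(params, Object.keys(params).sort());
  const digest = createHash("sha256")
    .update(query)
    .update(sortedParams)
    .digest("hex");
  return `sanity:query:${digest}`;
}
```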
I confirmed this in a couple of ways. First, I used a log drain via Logtail to see my incoming data. I also started adding extra debugging logs, user agents and IPs in particular, to rule out anything malicious.
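Those debugging logs amounted to something like this hypothetical helper (the header names are the standard ones; the function itself is a sketch, not my exact code):

```typescript
// Pull the fields worth logging out of an incoming request, so the
// log drain can group traffic by client. Works with the standard
// Fetch API Request that Remix loaders receive.
function requestDebugInfo(request: Request) {
  const headers = request.headers;
  return {
    userAgent: headers.get("user-agent") ?? "unknown",
    // Behind a proxy (as on Vercel), the original client IP arrives
    // as the first entry in x-forwarded-for.
    ip: headers.get("x-forwarded-for")?.split(",")[0]?.trim() ?? "unknown",
    url: request.url,
  };
}
```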
The second way was using the OpenTelemetry Node packages to auto-instrument the Remix app (auto-instrumenting document load on the frontend and HTTP requests on the backend helped rule out a lot of the theories I had at first).
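The backend side of that setup looks roughly like the following sketch (the package names are the real OpenTelemetry ones; the OTLP exporter URL is an assumption, point it at whatever collector you use):

```typescript
// instrumentation.ts - a minimal Node auto-instrumentation sketch.
import { NodeSDK } from "@opentelemetry/sdk-node";
import { getNodeAutoInstrumentations } from "@opentelemetry/auto-instrumentations-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";

const sdk = new NodeSDK({
  // Assumed collector endpoint; swap in your own OTLP target.
  traceExporter: new OTLPTraceExporter({ url: "http://localhost:4318/v1/traces" }),
  // Auto-instruments http and friends, so every outbound request to
  // the CMS shows up as a span without manual wiring.
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();
```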
While I wasn't able to figure out the singular thing triggering the requests every 10 seconds, I was able to narrow it down to an interesting mix of search crawlers and bots. I didn't see the same IP addresses every 10 seconds, but I did see the same cluster of IP addresses with a detectable frequency. And it was happening often enough that hosting the site the way I had been, dynamically rendering on every request, may not have been the right approach for my goals.
learnings & the path ahead
Hosting content-driven sites in different ways (edge cached, function invoked, server hosted, etc.) requires different considerations. My site doesn't have to be dynamically rendered on every request. Remix doesn't do static generation (IIRC), and while I could have switched to hosting the site on something like Fly or DigitalOcean, I didn't want that overhead for this site. I wanted something I could update with the occasional content change with little friction.
the new approach
I settled on rebuilding the site with Next.js as a statically generated site. It felt like an easy and somewhat obvious solution, and not just because I'm hosting the site on Vercel: moving to Next.js meant that full Analytics support as well as Image Optimization were available with zero configuration changes, so that was nice.
Porting from Remix to Next.js was really simple, whether using the `pages` folder or the new beta `app` folder. Perhaps I'll write more on this in the future, but a lot of my changes looked very similar to these:
```tsx
// 1. swap all Link usages from remix to next

// remix
import { Link } from "@remix-run/react";
// usage
<Link to="url-here" />

// nextjs
import Link from "next/link";
// usage
<Link href="url-here" />
```
```ts
// 2. port all loaders into getStaticProps/getServerSideProps
// (nextjs pages folder setup)

// remix
export let loader: LoaderFunction = async () => {
  const query = `{
    "author": ${PostQueries.Author},
    "posts": ${PostQueries.AllPosts},
  }`
  const initialData = await makeQuery(query)
  return {
    ...initialData,
    randomImages: await getRandomImages(),
    contentTypes: await getPostCountByType(),
  }
}

// nextjs - static site
export const getStaticProps = async ({ params }: any) => {
  const query = `{
    "author": ${PostQueries.Author},
    "posts": ${PostQueries.AllPosts},
  }`
  const content = await makeQuery(query)
  return {
    props: {
      content,
    },
  }
}

// nextjs - dynamic site
export const getServerSideProps = async ({ params }: any) => {
  const query = `{
    "author": ${PostQueries.Author},
    "posts": ${PostQueries.AllPosts},
  }`
  const content = await makeQuery(query)
  return {
    props: {
      content,
    },
  }
}
```
I enjoyed rebuilding the site in Next.js, and it took me a matter of hours spread across a handful of days. I can also see how easy it would be to move sites from Next back to Remix (if it were beneficial to do so).
epilogue
I wasn't able to pin this challenge on one singular root cause. This is not an indictment or critique of Remix or of Vercel. It's actually yet another reminder to use the best tool for the problem being solved, and to always have some tooling in place to see what's happening in your applications and hosting environments.
All in all, my site is now back up and running. There are still some remaining tasks, like new content sections to add, but I can do those once the ideas are fully baked. The little rabbit hole I ended up diving down was frustrating at first, but worth it in the end.
Thanks for reading, and if you like this, I've got a lot more drafts to publish now that my site is live and stable again.
Until next time...