DeepSmith

Jun 26 · SEO & AI Visibility

17 min read

How to Make JavaScript-Heavy Sites Visible to LLMs: A Developer's Implementation Guide

Avinash Saurabh
Avinash Saurabh · CO-Founder & CEO
Make JavaScript-Heavy Sites Visible to LLMs:

Let's get straight to it. The fix is simple, but it requires discipline: make your critical content and structured data exist in the initial server response HTML. Use Server-Side Rendering (SSR), Static Site Generation (SSG), or prerendering to get it done.

Don't count on client-side hydration to get content into the DOM. By the time your JavaScript runs, most AI crawlers are already gone. The only way to know you’ve fixed it is to test the raw HTML and do no-JS fetches, not by just loading the page in Chrome.

I’ve seen this movie before. Maybe you're living it right now. Your pages look perfect in the browser. Your team shipped a polished SPA with buttery-smooth transitions, lazy-loaded sections, and a slick tabbed interface. And yet, your competitors keep showing up in ChatGPT answers while your brand is invisible. You want to fix it, but you can’t rewrite the whole app. Engineering’s backlog is already the length of a CVS receipt.

That gap between “works in the browser” and “visible to AI” is exactly what this guide is about. I’ve been in that gap. This is the playbook we built to get out of it.

Here’s what we’ll cover:

  • What AI crawlers actually fetch (and why you can’t count on JS execution)
  • How to choose SSR, SSG, or prerendering on a per-route basis
  • The common SPA patterns that silently hide your content from LLMs
  • Framework-specific steps for Next.js, Nuxt, and vanilla React SPAs
  • How to embed schema so AI bots can actually use it
  • A verification process and measurement loop to make sure your fixes stick

Do LLM Crawlers Render JavaScript or Only Read Raw HTML?

Assume they only read HTML. If your content isn't in the initial server response, you have to treat it as invisible to AI crawlers. That’s the working assumption that will save you a world of pain.

Here’s the thing you have to internalize. There are three different ways your site gets seen:

  • Browser rendering: The full orchestra. Full JavaScript execution, DOM hydration, user interactions firing. This is the complete picture your users (and you) see.
  • Googlebot rendering: This is a two-step process. Googlebot fetches the HTML first, then comes back later (sometimes minutes, sometimes days) to render the JavaScript. It’s not instant, but it eventually happens.
  • LLM/AI crawlers: They fetch the raw HTML. Some might execute a tiny bit of JavaScript; many don't. None of them are waiting around for your React hydration to finish before they start extracting content. There’s just no standard behavior here.

The key distinction is between rendering and extraction. Even if a crawler technically runs some JS, its content extraction pipeline may have already finished. AI systems are built to harvest text and schema from raw responses. They aren't waiting for a React tree to mount.

Content locationVisible to AI crawler?
Present in initial HTML responseYes — reliably
Injected after JS hydrationUnreliable — often missed
Loaded on scroll or user interactionNo — effectively invisible

What "counts" as visible content for AI engines?

Visible means it’s in the fetched response body. Not in the DOM after your scripts run. Not in a CSS pseudo-element. Not hiding behind a consent banner that blocks the page from rendering.

Something to keep in mind: many AI pipelines will down-convert your beautiful HTML into simple text or Markdown before processing it. This means that clean, readable body content often gets more weight than heavily nested layouts or metadata in the <head>. Your H1, your paragraphs, your JSON-LD, all of it needs to live in the initial HTML response.


How Do You Choose Between SSR, SSG, and Prerendering for AI Visibility?

Decide this route-by-route, not for the whole app. The goal isn't to convert everything. That’s a recipe for a six-month project that never ships. The goal is to make sure your most important pages, the ones that drive pipeline and deserve to be cited, deliver HTML-first content.

Just so we’re on the same page with definitions:

  • SSR (Server-Side Rendering): The server generates HTML on every single request, pulling fresh data. This is great for dynamic content.
  • SSG (Static Site Generation): The HTML is generated once at build time. It’s fast, cheap to serve, and perfect for content that doesn’t change often.
  • Prerendering: You take a static HTML snapshot of each route and serve that to crawlers (or everyone). Think of it as a bridge for SPAs that can't migrate to a new framework just yet.

Route-by-route guidance

Not every page on your site is created equal. You need to prioritize based on citation potential and business impact. Here’s how we think about it:

  • Blog posts, docs, glossary pages: SSG is your best friend here. This content is stable, high-value for citations, and the build-time generation is a one-time cost.
  • Product pages, pricing, comparison pages: SSR or SSG with frequent rebuilds. These pages drive pipeline and should be winning citations for high-intent queries.
  • Landing pages: SSG. These are static by nature. There’s no reason they shouldn’t be fully rendered as HTML.
  • App shell, authenticated dashboards, user settings: Client-side rendering is totally fine. These pages aren't going to earn AI citations anyway.
ApproachBest forTrade-offCommon failure mode
SSGStable content (blog, docs, glossary)Build time grows with scaleStale content if rebuild isn't triggered
SSRDynamic content (pricing, personalized pages)Infra cost, caching complexitySlow TTFB without proper caching layer
PrerenderingExisting SPAs needing quick winsSnapshot can go stale; limited interactivityPrerender cache not invalidated after content updates

One pattern I really like is SSR with a stale-while-revalidate caching strategy. It's a practical way to get HTML-first benefits for pages that need to be fresh, but without hammering your server on every single page view for high-traffic routes.


What SPA Patterns Make Content Invisible to LLMs Even When Humans Can See It?

Here’s my rule of thumb: if disabling JavaScript makes the content disappear, you have to assume AI engines can't see it.

I’ve been burned by every single one of these. Here are the specific patterns that catch teams off guard.

Empty shell + hydration delay Your server sends back <div id="root"></div> and a prayer. React mounts, fetches its data, and renders a beautiful page. A human sees it all. An AI crawler sees an empty div. The fix: Make sure the server renders the actual content before sending the response.

Lazy-loaded main content Lazy-loading images with loading="lazy" is a great performance win. But if your text content—your hero copy, your feature descriptions—is lazy-loaded by a JavaScript Intersection Observer, any crawler that doesn't run JS will never see it. The fix: Lazy load your images and non-critical UI. Keep your core body copy in the initial HTML payload.

Infinite scroll without a fallback Infinite scroll is great for engagement, but it depends on a user scrolling. Bots don't scroll. If your blog index or product list only loads more items via a scroll event, everything past that first screen is invisible. The fix: Always include paginated links (like <a href="/blog/page/2">) in the HTML as a fallback for crawlers and non-JS users.

Tabs and accordions that load content on click This one gets teams all the time. If your product page hides feature details behind tabs, and clicking a tab fires a JS fetch to get the content, that content might as well not exist for a crawler. The fix: Render all the tab content in the initial HTML, then use CSS to control what’s visible. The content exists in the DOM; JavaScript just toggles which panel is shown.

CSS pseudo-element content Any text you put in a ::before or ::after pseudo-element is not in the DOM, so it won’t be extracted. On the other hand, text hidden with display: none or visibility: hidden is in the HTML and will be seen, though you open a different can of worms by deliberately hiding content.


How Do You Implement SSR/SSG/Prerendering in Next.js, Nuxt, or a React SPA Without Rewriting Everything?

Don't boil the ocean. I can't say this enough. Pick 10–20 of your most critical routes, get them to be HTML-first, and then iterate. This is a game of inches, not a single grand gesture. Here’s the playbook.

Step 1: Route inventory and prioritization

Get your team in a room (or a Zoom) and list out every route in your app. Tag each one:

  • Must be AI-visible: These are your citation targets. Product pages, pricing, comparison pages, blog posts, documentation.
  • Can stay client-side: Dashboards, settings pages, auth flows, anything behind a login wall.

Focus all your energy on that first list. Ignore the second for now.

Step 2: Next.js

Next.js is built for this, which makes the work fairly surgical. For your priority routes:

  • Use server-side data fetching. That means fetching data at request time for SSR or at build time for SSG. This ensures content is in the HTML before it ever reaches the client.
  • Be ruthless about use client. If a component contains copy that should be earning citations, it needs to render on the server.
  • Verify that your H1, hero copy, and any structured data are present before hydration runs.

Anti-patterns to hunt down on your priority routes:

  • Fetching hero copy or product descriptions inside a useEffect.
  • Injecting your JSON-LD schema on component mount.
  • Hiding content behind an interaction like a modal, tab, or accordion that is loaded via a JS fetch.

Step 3: Nuxt

Nuxt starts with SSR enabled by default, which is great, but I’ve seen plenty of teams accidentally opt out of it.

  • Confirm SSR mode is active for your target routes (no ssr: false in your global config).
  • Avoid wrapping components that contain core page content in <ClientOnly>.
  • For static content like your blog or docs, use nuxt generate to pre-build the HTML at deploy time.

Anti-patterns:

  • Client-only plugins that intercept or delay the rendering of key content.
  • Fetching dynamic data only on the client for content that could and should be static.

Step 4: React SPA (no SSR framework yet)

If a migration to Next.js or Nuxt isn’t happening this quarter, prerendering is your fastest path to visibility.

  • Use a prerendering service or a build-time tool to capture static HTML snapshots of your priority routes.
  • Serve those snapshots to crawlers (or just serve them to everyone for simplicity).
  • Start planning for a phased migration to an SSR-capable framework. Prerendering is a great bridge, but managing cache invalidation can become a real headache over time.

Progressive enhancement patterns

This mindset applies no matter what framework you use. The core idea is simple: ship a complete, readable HTML document, then enhance it with JavaScript.

  • Tabs/accordions: Render all the content in the HTML, then use CSS classes to show/hide the panels. JavaScript’s job is just to update a class on click.
  • Infinite scroll: Include your <a> pagination links in the HTML. JavaScript can intercept those clicks to load content inline, but the links are there as a fallback.
  • Filters and sorts: Render the default, unfiltered result set on the server. JavaScript can then handle filtering or sorting on the client.

Our minimal acceptance criteria for a route

A route is "LLM-visible" on our team when it passes this checklist:

  • The H1 is in the raw HTML response.
  • The core body copy is in the initial HTML (not a loading spinner).
  • Internal links exist as <a> tags in the HTML.
  • JSON-LD structured data is present in the HTML <head> or <body>, not injected after mount.
  • Disabling JavaScript leaves the content readable. The interactivity can break, but the content cannot disappear.

How Do You Make Schema (JSON-LD) and Key Page Facts Visible and Maintainable?

Generate your JSON-LD in server-rendered HTML, and make sure the same facts are stated in plain English in the body copy.

I learned this one the hard way. We spent a week crafting the perfect Product schema, deployed it with a tag manager, and... nothing happened. The reason? The schema didn't exist in the initial HTML response. For Googlebot, this might eventually work. But for AI crawlers that don't execute JS, that schema was completely invisible.

The body-first principle

Don't rely on schema alone. Generate your JSON-LD on the server from the exact same data source that populates your visible content. This is the only way to guarantee that your schema and your body copy can't drift apart. Keep your entity names consistent, too. If you're "Acme Corp" in the schema, don't call yourself "Acme" in the copy and "AcmeCorp" in the footer.

A minimal server-rendered Article schema looks like this:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "How to Make JavaScript-Heavy Sites Visible to LLMs",
  "author": {
    "@type": "Organization",
    "name": "Your Company"
  },
  "datePublished": "2025-01-15"
}
</script>

This script block has to be in the HTML delivered by the server, not added by a script that runs after the page loads.

Common mistakes I've seen:

  • Schema added in a React useEffect or Vue mounted() hook.
  • Schema placed behind an A/B test variant that bots will never see.
  • Schema that references a product name or feature set that doesn't match the visible page content.

How Do You Verify (and Keep Verifying) That LLMs Can See Your Content After Launch?

Fixing your rendering is step one. Making sure it stays fixed is the real job. Your verification process has to be a repeatable test, not a one-time check you did on deploy day.

Tests you can run right now

View Source (not DevTools): In Chrome, right-click and hit "View Page Source." This shows you the raw, unadulterated server response. This is what AI crawlers see. Look for your H1, the first few paragraphs of body copy, your internal links, and your JSON-LD block.

JavaScript-disabled test: In Chrome DevTools, go to Settings → Debugger and check "Disable JavaScript." Then reload the page. Your core content must still be readable. It's okay if interactivity breaks, but the content can't vanish.

Raw HTTP fetch: My favorite quick-and-dirty test. Use curl -s https://yourpage.com/pricing | grep "your expected headline" to confirm the response body contains what you think it does. This is the closest you can get to seeing what a bot receives.

Google Search Console URL Inspection: The "Test Live URL" feature is a good sanity check. It shows you what Googlebot fetches. It’s not a perfect proxy for LLM crawlers, but if Googlebot can't find your content, it’s a sign of a deeper problem.

Common blockers to check:

  • Is your CDN serving different content to bots vs. users? Test with different user agents.
  • Does a consent banner hide the body content until it's accepted? That's blocking crawlers, too.
  • Are geo-restricted CDN rules serving blank pages to certain IP ranges?

The ongoing measurement loop

Technical fixes are just table stakes. The real proof is tracking citation and mention outcomes over time. You need to build a repeatable loop:

  1. Pick 10–20 prompts your ideal customers ask AI platforms.
  2. Map each prompt to a priority page on your site.
  3. Verify that page passes all the HTML visibility tests above.
  4. Publish or improve the content to make it the absolute best answer to that prompt.
  5. Track your mention and citation rates for those prompts over time.

This is where the process moves from engineering to marketing. All these technical checks are just leading indicators. They tell you if you've set up the system correctly, but they don't tell you if you're actually winning.

This is a measurement problem. We use a tool called DeepSmith for this. Its AI Visibility features let us track our brand mention and citation rates across ChatGPT, Gemini, Perplexity, and others, right down to the prompt level. It shows us which of our pages are earning citations and which are still invisible. It also benchmarks us against competitors, so we can see who is winning on which prompts and prioritize our efforts. DeepSmith doesn't fix your rendering, but it closes the measurement loop that makes it impossible to know if all this technical work is actually paying off.

What to report to leadership:

  • Leading indicators: What percentage of our target prompts have a dedicated, HTML-visible page? What’s our technical pass rate across priority routes?
  • Lagging indicators: What is our brand mention rate by platform? How is our citation rate trending over the last 90 days? What’s our share of voice vs. our top competitors?

Optional CI automation: For your most critical routes, you can even add a regression test to your CI pipeline that fetches the raw HTML and asserts that key text selectors exist. That way, if a deploy accidentally breaks SSR on your pricing page, the build fails before it ever reaches production.


Frequently asked questions

What's the easiest way to check what an LLM crawler sees on my page?

Use "View Page Source" in your browser. Not DevTools, the actual "View Page Source" option. That shows you the raw HTML. Then, disable JavaScript in your browser settings and reload the page. If your main content is still there, you’re on the right track.

Do I need SSR for every page to be visible in ChatGPT, Claude, or Gemini?

Nope. And you shouldn't try. That's a classic over-engineering trap. Focus your energy where it counts: your citation-target pages like blog posts, product pages, pricing, and docs. Your interactive app pages (dashboards, settings) can stay client-side.

How can I make a React SPA visible to AI crawlers without migrating frameworks?

Prerendering is your fastest path forward. Use a tool to capture static HTML snapshots of your most important pages and serve those to visitors (or at least to bot user agents). It’s a great bridge solution, though managing cache invalidation can be a pain long-term.

Why is my JSON-LD schema not showing up in AI results?

Most likely, you're injecting it with JavaScript after the page loads (like in a `useEffect` or through a tag manager). AI crawlers that don’t run JS will never see it. You have to move your schema generation to the server so it’s part of the initial HTML response.

What content must be in the initial HTML for AI visibility?

The bare minimum is your H1, your main body copy (not a loading spinner), your key internal navigation links, and any JSON-LD you want the AI to see. The test is simple: if it's not there when you "View Page Source," it doesn't count.

Will tabbed content, accordions, or lazy loading hurt AI visibility?

Yes, if the content inside them only loads when a user interacts with them. If clicking a tab fires a JS fetch to get the content, that content is invisible to crawlers. The fix is to render all that content in the initial HTML and just use CSS to show or hide it by default.

How long does it take for AI engines to reflect changes after I fix rendering?

There’s no clear answer here. Unlike Google, most LLM providers don't have a public "Search Console" or published crawl schedules. The timing is opaque. We've seen it take weeks or even months for content changes to show up as citation changes. That’s why you need a consistent habit of tracking prompts over time, rather than waiting for a big signal that your one fix "worked."