Schema markup has been a staple of SEO for years. But with AI-powered search and large language models now shaping how people find information, a new debate has taken over SEO communities: does structured data actually matter to LLMs? Or is it just another technical SEO myth dressed up in fresh clothes?

We dug into recent research, real-world experiments, and industry commentary to give you our honest take. The short answer is nuanced. Schema still matters for traditional search. Its role in LLMs, though, is a different story.

What schema actually does (and what it was built for)

Schema markup, or structured data, is code you add to your website that helps search engines understand the content of a page more precisely. Instead of leaving Google to guess whether “42” refers to an age, a price, or a rating, schema tells it explicitly. It was designed to bridge the gap between how humans write and how machines parse meaning.
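For instance, a minimal JSON-LD snippet (the product name here is hypothetical) can state outright that "42" is a price, not an age or a rating:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Example Widget",
  "offers": {
    "@type": "Offer",
    "price": "42",
    "priceCurrency": "USD"
  }
}
</script>
```

A search engine that parses this does not have to infer anything from surrounding copy: the number is explicitly typed as an Offer price in US dollars.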

For traditional search, the value is clear. Certain schema types unlock rich results in the SERPs, which can meaningfully improve click-through rates. A few standout examples:

  • VideoObject schema gets your content into video carousels in Google search.
  • Review snippet schema adds star ratings next to your listing, making it stand out visually.
  • Organization schema feeds information into knowledge panels.
  • Product schema, which Google recently expanded to support product variants, helps e-commerce pages appear correctly in shopping results.
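As a sketch of the e-commerce case, Google's product variant support builds on schema.org's ProductGroup type, roughly along these lines (the product names and sizes are invented for illustration):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "ProductGroup",
  "name": "Example T-Shirt",
  "variesBy": ["https://schema.org/size"],
  "hasVariant": [
    { "@type": "Product", "name": "Example T-Shirt (M)", "size": "M" },
    { "@type": "Product", "name": "Example T-Shirt (L)", "size": "L" }
  ]
}
</script>
```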

Google continues to invest in structured data. If it were pointless, that investment would not be happening. The mistake is conflating schema’s value for Google Search with its value for AI-driven tools. Those are two different systems operating in two different ways.

What the research actually says about LLMs and schema

One of the most telling experiments on this topic comes from a well-documented SEO study in which a researcher built a page for a completely fictional company called Duckier T-Shirts and added entirely made-up schema types, including things like “flock name,” “waddle style,” and “quack volume.” When asked for the company address, ChatGPT and Perplexity both responded with the nonsense values from the fake schema.

The conclusion was straightforward: LLMs are not reading schema as schema. They are picking up text from the HTML source, and that text happens to live inside structured data tags. Valid or invalid, real or invented, if the text looks relevant to a query, the model will use it. Schema is not being interpreted in the explicit, machine-readable sense it was designed for.
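You can see why this happens with a naive text scrape. The sketch below is illustrative only (real crawlers are more sophisticated), but it shows that when you collect every text node from a page with a standard HTML parser, the JSON-LD payload comes along as ordinary text, indistinguishable from body copy:

```python
from html.parser import HTMLParser

class TextCollector(HTMLParser):
    """Collects every text chunk on a page, including script contents."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)

# Hypothetical page inspired by the fake-schema experiment described above.
page = """<html><head>
<script type="application/ld+json">
{"@type": "Organization", "address": "Pond 7, Duck Lane"}
</script>
</head><body><p>Welcome to Duckier T-Shirts.</p></body></html>"""

collector = TextCollector()
collector.feed(page)
# The JSON-LD address arrives as just another text chunk alongside the body copy.
print(collector.chunks)
```

A model fed this flat stream of text has no structural reason to treat the address differently from the paragraph, which is consistent with the fake-schema results.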

That lines up with what other SEO practitioners have observed in their own testing. The view from hands-on experimentation is that LLMs do not need schema to extract structured meaning from a page because they are already capable of interpreting the body content directly. Testing suggests LLMs prioritize body text over any other part of a web page roughly 95% of the time.

Google’s own search team has also stated clearly that structured data does not improve rankings. Its purpose is to enable specific search features in the SERPs, not to signal quality to ranking algorithms.

The Microsoft exception worth paying attention to

Not everything points in the same direction. A Principal Product Manager at Microsoft Bing has publicly confirmed that schema markup helps Microsoft’s LLMs, including Copilot, understand content. That is a direct statement from a platform maker, not a blogger speculating.

On the OpenAI side, an SEO professional observed that OpenAI’s bots were crawling JSON data at significantly higher rates than standard HTML. That is an observation, not a confirmed policy, but it is worth noting.

So we have a confirmed yes for Microsoft Copilot. For ChatGPT, Gemini, and Perplexity, there are no official statements. The honest position is that we simply do not know for certain, and anyone who tells you definitively that it does or does not matter for those systems is working from incomplete information.

Why people argue schema does not help LLMs

Two arguments come up most frequently in this debate, and both have some substance but also some gaps.

The first is that LLMs themselves say they do not use schema. When someone asks ChatGPT whether it reads structured data, it often says no. The problem with this argument is that LLMs can and do hallucinate or give inaccurate answers about their own internal processes. Self-reported behavior from a language model is not a reliable source of ground truth.

The second argument involves tokenization. When an LLM processes text, it breaks input down into numerical tokens. A sentence like “Hello, world!” becomes a series of IDs — each word or subword mapped to a number. The argument goes that at that level of abstraction, the model cannot meaningfully distinguish between schema tags and regular HTML copy.
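A toy greedy tokenizer makes the point concrete. The vocabulary and IDs below are invented for illustration; real models learn byte-pair vocabularies with tens of thousands of entries, but the principle is the same: a schema tag would become just another run of token IDs with no special status.

```python
# Invented toy vocabulary; IDs are arbitrary and not from any real model.
vocab = {"Hello": 1, ",": 2, " world": 3, "!": 4}

def tokenize(text, vocab):
    """Greedy longest-match tokenization of text into integer IDs."""
    tokens, i = [], 0
    while i < len(text):
        for end in range(len(text), i, -1):  # try the longest piece first
            piece = text[i:end]
            if piece in vocab:
                tokens.append(vocab[piece])
                i = end
                break
        else:
            raise ValueError(f"no vocab entry covers {text[i:]!r}")
    return tokens

print(tokenize("Hello, world!", vocab))  # one ID per subword
```

In a real vocabulary, a string like `<script type="application/ld+json">` would simply decompose into its own sequence of subword IDs, processed the same way as any sentence.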

This argument has some logic to it, but a counterpoint stands out: LLMs can generate valid, correct schema markup on demand. That implies some degree of understanding about what schema is and how it works. Whether that understanding extends to treating it differently at inference time is less clear, and the tokenization argument does not conclusively answer that question.

What hands-on testing shows about AI and structured data

Research published by the team at WordLift complicates the simple “schema doesn’t matter” take. The findings showed that sites with well-implemented structured data appeared more accurately in AI-generated responses, while sites without it were sometimes misrepresented or ignored. The researchers noted a clear pattern: visibility varied between different types of LLM tools, and context mattered a great deal.

One finding from that research deserves particular attention for anyone building AI-optimized content. When an LLM accesses a page directly, rather than through a search engine retrieval layer, it often cannot see JSON-LD schema. JSON-LD sits in the HTML head as a script tag, separate from the visible body content. Microdata embedded directly in the HTML is more likely to be picked up in those direct-access scenarios.

For teams that want to maximize visibility across both search engines and AI agents, the implication is practical: consider a dual structured data strategy. Use JSON-LD for traditional search engine indexing, and supplement with microdata and semantic HTML for direct agent access. Both have distinct advantages depending on which system is processing your content.
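In practice, a dual setup might look like the sketch below (the organization is hypothetical): JSON-LD in the head for search engine crawlers, and microdata travelling with the visible body content for agents that fetch the page directly.

```html
<head>
  <!-- JSON-LD: parsed by search engine crawlers during indexing -->
  <script type="application/ld+json">
  {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Co",
    "url": "https://www.example.com"
  }
  </script>
</head>
<body>
  <!-- Microdata: attached directly to the visible content -->
  <div itemscope itemtype="https://schema.org/Organization">
    <span itemprop="name">Example Co</span>
  </div>
</body>
```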

How LLMs actually retrieve and process web content

A lot of confusion in this debate comes from misunderstanding how LLM-powered tools like Perplexity, ChatGPT with search, or Microsoft Copilot actually work. They are not traditional search engines with a full index of the web. Most operate as wrappers around existing search infrastructure, querying Google or Bing and processing the top results.

This means that if your page ranks well in Google, an LLM-powered tool is more likely to find and cite it. Schema can play a role here indirectly: if it helps you get rich results that improve your click-through rate and strengthen your rankings, it feeds into LLM visibility as a second-order effect. The LLM is not reading your schema. It is benefiting from the fact that your schema helped you rank, and ranking helped you get retrieved. If you want to measure how that visibility translates into business outcomes, these AI search KPIs are a good place to start.

The takeaway from practitioners testing this in the field is consistent: SEO is still common sense. You prove your relevance, you get found. If you make sense for a given search, AI will find you. The fundamentals have not changed as much as some would have you believe.

The sameAs schema angle no one talks about enough

While the debate around schema and LLMs tends to focus on article or product schema, there is a schema type that gets less attention but arguably delivers more value: sameAs.

The sameAs property sits inside Organization or Person markup and links your website to your social profiles, your Wikipedia page, your industry directory listings, and other verified external sources. Its purpose is entity recognition. When Google can confidently associate your website with a consistent entity across the web, it trusts you more. That trust tends to accelerate ranking gains, particularly in YMYL (Your Money Your Life) categories where Google applies stricter scrutiny.

For newer websites, sameAs can be a meaningful tactic for getting Google to validate your brand identity. Researchers in the entity SEO space have found that 30 to 40 indexed, associated profiles is roughly the threshold at which Google tends to start trusting and validating a domain more consistently. That trust feeds into the ranking factors that eventually affect AI search visibility too.

Practical tips for sameAs implementation:

  • Add your Twitter, LinkedIn, YouTube, and Facebook profiles as sameAs URLs.
  • Include an About.me page as the first sameAs URL, as it can link out to your full citation profile.
  • Keep the list focused. A handful of high-quality, indexed profiles outperforms a list of 50 weak ones.
  • Make sure the information in your sameAs schema is also visible in the body text of your About page. Google builds entity graphs from readable content, and schema reinforces what it already infers rather than replacing it.
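Putting those tips together, a sameAs block inside Organization markup might look like this (all URLs are hypothetical placeholders):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Co",
  "url": "https://www.example.com",
  "sameAs": [
    "https://about.me/exampleco",
    "https://twitter.com/exampleco",
    "https://www.linkedin.com/company/exampleco",
    "https://www.youtube.com/@exampleco"
  ]
}
</script>
```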

Our take on schema for AI search

We would not dismiss schema across the board, and we would not oversell it as an AI search silver bullet either. The clearest, most defensible positions are:

  • Schema for rich results is still worth doing. If a schema type is actively generating rich results in the SERPs for keywords you are targeting, use it. Better CTR feeds ranking, and ranking feeds AI visibility.
  • Schema for e-commerce is non-negotiable. Product and Product Variant schema are actively supported by Google and directly affect how your products appear. Ignoring them is leaving visibility on the table.
  • Schema for Microsoft Copilot matters. That is confirmed. If your audience uses Bing or Copilot-integrated tools, structured data is a legitimate optimization.
  • Schema as a direct LLM ranking signal is unconfirmed. For ChatGPT, Gemini, and Perplexity, there is no official confirmation that schema is processed as intended. Body content quality and traditional ranking factors remain your primary levers.
  • If you want LLM visibility, focus on ranking and body copy. That is what the retrieval layer uses, and that is what LLMs most consistently process.

The broader lesson here is one we find ourselves repeating often: SEO is changing fast right now, and the people who will fare best are the ones who follow evidence rather than hype. Schema is not dead. It is not a magic bullet for AI either. It is a tool with specific use cases, and using it well means knowing which use cases actually apply.

As AI models continue shifting from processing plain text toward interpreting structured data more deliberately (a pattern researchers are beginning to track), this debate will keep changing. We will be watching, testing, and updating our thinking as the evidence comes in.

By Nikola
