Content Curation

Content Curation

Content curation on Lytics involves scanning your website and other content to ingest topics and build content affinities. Properly setting up the curation process is key to enabling use cases such as promoting relevant ads and delivering targeted web content.

This document gives an overview of important concepts and considerations to make while curating your content during the early stages of implementation. It will also help determine if any custom content curation should be planned for.

Ensure that any content customization is performed to support your campaign execution and audience building, not for the sake of customization itself.


Once the Lytics JavaScript tag is installed, Lytics automatically begins crawling the content on your website using Natural Language Processing (NLP) engines. Put simply, this means that:

  • Lytics will index your content producing a list of all your content. By default, content includes your web pages and images.

  • Lytics will crawl the content via NLP providers (such as Google NLP or Diffbot, and extract topics associated with that content.

Over time, as users interact with your content, Lytics identifies user content affinity levels for various topics. While this functionality begins working right away without you having to do anything, there are various things to consider to ensure Lytics is bringing in the content you’ll need to execute use cases.

What domains should Lytics scan?

By default, Lytics will scan all content on your domains. You can specify which domains Lytics will scan by adding domains to the "Domain Allowlist", which can be modified by navigating to Account Settings > Content.

If there are only certain sections of your website Lytics should be scanning, you can customize by allowing or blocking specific paths as well. For example, if you have a blog section and other pages won’t be relevant to gauge what users are interested in, you can add the path /blog/ to the list of allowed paths.

You can see which domains have been improved in the domain and path settings on the Content Classification page. Visit the content account settings documentation for detailed instructions on how to add more domains or paths.

Are there any paths that should be avoided?

A website may have sections that should not be scanned for topics such as password reset pages or any pages hidden behind a log-in (e.g. /password-reset/ or /admin/). These paths can be blocked in your Account Settings under Content.

Which NLP service should I be using?

Lytics uses Google NLP which pulls from their knowledge graph/taxonomy. If you determine you're not getting enough topics from your content, Lytics can use Diffbot in addition to Google NLP, which has more loose associations between topics and content. You'll bring in more topics, but may be slightly surprised by what you see!

If your brand is international, you may need to consider which languages are supported by each NLP service. Please consult the provider documentation for a list of languages supported by Google NLP and Diffbot. Another option is to turn NLP completely off and use only custom topics.

For more details on each service Lytics uses, see NLP services.


If you aren’t seeing the content you expected to, note it may take some time for Lytics to crawl all of your content. By default, monthly limits exist for scanning new content (see Content Enrichment limits below). If Lytics scans all new content without having reached the limit, Lytics will move on to older content until it is all scanned. Please allow time for this.

Also, consider how far back should Lytics be scanning. For example, it is likely unnecessary to scan content from 2 years ago. If people are no longer interacting with that content, use Account Settings to set a date to start the scan from.

Content Enrichment limits

Lytics will scan and enrich up to 20,000 URLs per month by default. This limit is designed to act as a guard rail to ensure good filter hygiene is in place. Most accounts do not publish close to 20,000 pieces of distinct content per month. If you believe your account is hitting this limit, please check with your Lytics Implementation team. Once confirmed, you can consider the following options:

  • Are there any domains or content paths that you can block? This will likely be part of the solution. See Are there any paths that should be avoided? above for more information.
  • Do you need Lytics to increase the limit?

Robot directives

Your domain likely has a robot directive (e.g. that provides instructions to crawlers on how to crawl or index your content. Lytics will follow these directives. While typically not an issue, it’s worth turning to your directives when troubleshooting any missing content or information.


You may want to build a collection of content based on publish date or author. Consider the following if this is the case:

If you have metadata on your website, is it using OpenGraph? Open Graph tags will populate the following default values in Lytics:
title, image, published_time, description, and lytics:topics. You can check the lytics_content query in your account to check if all the Open Graph tags you need are being picked up.

If you are not using Open Graph, Lytics may not be picking up any meta tags automatically. The quickest way to check that Lytics is bringing is to use our find a document tool to view the data Lytics has for a specific piece of content. Navigate to Content > Find a Document and search for the URL of the piece of content you would like to view.

If you make changes to your content's metadata and would like to preview those changes you can use our manual classification tool located at the bottom of the Content > Classification dashboard. Enter the URL with the updated metadata and click Get Details. Lytics will scan the document and display a preview of the updated metadata. If you are satisfied with the changes click Complete Classification. If not, you can make additional edits and preview again.

Alternatively, you can check by navigating to Content > Collections. Try to build content collections by author or publish date as these are the most commonly used filters. If the content doesn’t come up as expected, you may need to curate your tags.

Custom topics

For many users outside of publishing who may not have rich content, NLP derived topics aren’t enough. To accommodate this, Lytics can add custom topics via the metadata. The easiest way to do this is by adding the lytics:topics meta tag. Read more on providing custom topics.

If you already have topics in your metadata using a different meta tag than the above, it’s possible Lytics may be able to bring those in as well by making a change to your account settings. Speak with your implementation team about this.

Once this setting is changed and content is rescanned, you will be able to build content collections with this topic, and users' affinities will be generated for these new topics.

Total number of topics

In the Lytics UI, you will see a max of 500 topics. Lytics keeps all of your topics, but only the top 500 are surfaced in the UI.

  • As you block topics, Lytics will backfill to show 500.
  • You can allow specific topics to ensure that they make the top 500.

If you choose to allow topics, make sure that the topics actually exist - either as custom topics or are being generated by NLP. Case sensitivity is important when adding topics to the allow list. For example, allowing ABBA is different than allowing Abba.

Other content

If you have other content on your site outside of web pages or images (e.g. PDFs) that you’d like to derive topics from and have them generate affinities, Lytics will need to develop a plan to bring those in using our APIs.

Document properties

All topics - NLP derived or custom topics - will allow for two things:

  1. Building a content collection with the topics.
  2. Assign affinities to users for those topics.

There are instances where you may want to build collections based on a topic, but not have them generate affinities. For example, a collection of featured sale items, SKUs, genres, etc. You can set these as document properties in your metadata to allow for this.