Classification

The Content Classification page provides visibility into how your content is being scraped, indexed, and classified by the Lytics Content Affinity Engine. The Classification page consists of two sections: Classification Dashboard and Classification Activity.

The Classification Dashboard displays various attributes of your Content, including site name, author, URL path, and Topic information. The Classification Dashboard is intended to provide a high-level understanding of your Content and can be used as a starting point for your Content-oriented use cases.

The Classification Activity section shows how many documents are being classified by the Lytics Content Engine and provides information about which domains are set up for Classification. By default, Lytics will attempt to classify 20,000 documents per month; this total includes both new documents and the periodic re-classification of old documents. This section can help you identify whether you have exceeded your monthly quota and whether the Lytics Content Engine is working as expected.

The Lytics Content Engine is controlled by multiple workflows that run in the background. The workflow that classifies content runs hourly, so you can expect to see updates to your classification activity every hour, provided you have not yet reached the quota for the current month.

📘

If your account reached the content classification quota in prior weeks, the chart for the current week will not have any data because content has not been classified during that time period. Change the date range to see when content was last classified.

Domain and Path Settings

The Domain and Path Settings display important account information to verify that Lytics is classifying the right content that will be used in your marketing initiatives. To adjust what content gets classified, your account admin can specify the list of approved domains and ignored paths via your Lytics Content settings.


  • Classification Quota - the progress bar indicates how your account is tracking towards the monthly classification quota, which is set to 20,000 documents per month by default.
  • Approved Domains - any URLs that contain one of the approved domains will be classified. This list should include the primary domains for your company. Examples for Lytics include lytics.com, learn.lytics.com, etc.
  • Ignored Paths - any URLs that match at least one of the ignored paths will not be classified. Examples of ignored paths include /blog, /search, /userprofile, etc.
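
The approved-domain and ignored-path rules above can be sketched as a simple filter. This is only an illustration, not how Lytics implements the check: the function name `should_classify`, the subdomain matching, and the prefix matching on path segments are all assumptions, and the settings reuse the examples from the list.

```python
from urllib.parse import urlsplit

# Hypothetical settings, taken from the examples in the list above.
APPROVED_DOMAINS = ["lytics.com", "learn.lytics.com"]
IGNORED_PATHS = ["/blog", "/search", "/userprofile"]


def should_classify(url, approved_domains=APPROVED_DOMAINS, ignored_paths=IGNORED_PATHS):
    """Return True if the URL is on an approved domain and not under an ignored path."""
    # Accept URLs written without a scheme, e.g. "lytics.com/pricing".
    parts = urlsplit(url if "://" in url else "//" + url)
    host = (parts.hostname or "").lower()

    # Assumption: a host matches if it equals an approved domain or is a subdomain of it.
    on_approved = any(host == d or host.endswith("." + d) for d in approved_domains)
    if not on_approved:
        return False

    path = parts.path or "/"
    # Assumption: an ignored path matches exactly or as a prefix ending on a "/" boundary.
    ignored = any(path == p or path.startswith(p.rstrip("/") + "/") for p in ignored_paths)
    return not ignored
```

Under these assumptions, `should_classify("https://lytics.com/blog/my-post")` is rejected because of the /blog ignored path, while `should_classify("https://learn.lytics.com/docs/classification")` is accepted. Running a sample of your site's URLs through a check like this can help estimate how much content would count against the monthly quota.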

🚧

If your account is exceeding the classification quota, it's likely an indicator that your account settings aren't sufficiently filtering out content that shouldn't be indexed. If you still need a larger quota, reach out to your Account Manager to discuss your options.

Manual Classification

The Manual Classification module allows you to preview how a single document will be classified by Lytics. You can use this to resolve any issues with how your page is set up before it’s added to the Lytics content corpus.

Simply enter the URL of the document you would like to preview and click Get Details. This will allow you to see things like topics extracted from the document as well as any metadata Lytics scrapes from the document.

Once you’re happy with the results of the classification, you can click Complete Classification to add the document to the content corpus. The document and its topics then become available for use in personalization, such as recommendations or content affinity.

URL Normalization

As Lytics ingests web-based content, it attempts to resolve duplicate URLs and create links between documents, much like a search engine would. To that end, Lytics respects robots.txt directives and resolves canonical URLs when present, among other steps.

Lytics attempts to sanitize URLs as much as possible before ingesting them into the Content Affinity Engine. Sanitization includes removing all URL parameters and cleaning URL syntax. This happens via an LQL function called urlmain.
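
To make the effect of sanitization concrete, here is a minimal sketch of the kind of cleanup described above. It is not the urlmain function and is not guaranteed to match its output: `sanitize_url` is a hypothetical helper, and keeping the scheme, lowercasing the host, and collapsing repeated slashes are assumptions about what "cleaning URL syntax" might involve.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def sanitize_url(url):
    """Drop the query string and fragment, and tidy the host and path."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()  # assumption: hosts compare case-insensitively
    if parts.port:
        host = f"{host}:{parts.port}"
    # Assumption: repeated slashes are collapsed and an empty path becomes "/".
    path = re.sub(r"/{2,}", "/", parts.path) or "/"
    # Empty query and fragment: all URL parameters are removed.
    return urlunsplit((scheme, host, path, "", ""))
```

With this sketch, `https://learn.lytics.com/docs/page?utm_source=email` and `https://Learn.Lytics.com//docs/page#top` both reduce to `https://learn.lytics.com/docs/page`, which illustrates how URLs that differ only by tracking parameters can resolve to a single document.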