Milly Software

URL Sync: Keep Your Shopify AI Chat Knowledge Base in Lockstep With Your Help Center

Sitemap-driven knowledge base hydration for Shopify AI chat — bulk import help-center URLs, schedule auto-sync, and keep the widget's KB current without manual upkeep.

Viet Le
co-founder · Milly Software

The single biggest blocker to making an AI chat widget useful is the knowledge base. Until the merchant's policies, FAQs, and product docs are loaded, the AI can't answer questions about anything specific to that store. Until enough of the catalog's edge cases are covered (returns, shipping cutoffs, sizing, warranty), the widget feels generic.

Hand-loading KB entries works for the first ten. It breaks at a hundred. Most merchants who'd benefit from a KB-first chat experience already have a help center, a docs site, or a policies page somewhere — content the AI should know about. The friction is moving that content into the widget's KB without a manual paste-each-page slog. URL Sync exists for exactly that.

The sitemap inspector flow

URL Sync lives as a tab in the Knowledge Base section of the dashboard. The flow is short:

  1. Paste a sitemap URL — typically https://yourstore.com/sitemap.xml or a help center's dedicated sitemap.
  2. The inspector parses it, handling both the flat urlset shape and the sitemapindex shape (it follows nested sitemap files one level deep), and shows the full URL list grouped by section.
  3. Filter by hide patterns — common excludes are /products/*, /collections/*, /blog/*. Most merchants want to import their policy and help-content URLs, not their product pages (those sync separately as products, not KB entries).
  4. Select the URLs to import and confirm.
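To make the parsing step concrete, here is a minimal sketch of how an inspector could handle both sitemap shapes. It is not Milly's actual implementation: the function names collect_urls and group_by_section, the injected fetch callable, and the choice to group by first path segment are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Callable
from urllib.parse import urlparse

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def collect_urls(xml_text: str, fetch: Callable[[str], str], depth: int = 0) -> list[str]:
    """Return page URLs from a urlset, following a sitemapindex one level deep.

    `fetch` is a hypothetical callable that returns the XML text for a URL.
    """
    root = ET.fromstring(xml_text)
    locs = [el.text.strip() for el in root.iter(f"{SITEMAP_NS}loc") if el.text]
    if root.tag == f"{SITEMAP_NS}urlset":
        return locs
    if root.tag == f"{SITEMAP_NS}sitemapindex":
        if depth >= 1:
            return []  # an index nested inside an index is not followed
        urls: list[str] = []
        for child_url in locs:
            urls.extend(collect_urls(fetch(child_url), fetch, depth + 1))
        return urls
    raise ValueError(f"unexpected root element: {root.tag}")


def group_by_section(urls: list[str]) -> dict[str, list[str]]:
    """Group URLs by first path segment (an assumed definition of 'section')."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        groups[segments[0] if segments else "/"].append(url)
    return dict(groups)
```

Passing fetch in as a parameter keeps the parsing logic testable without network access; in practice it would wrap whatever HTTP client the service already uses.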

The selected URLs land in the KB as draft entries. Drafts don't serve to the widget yet — the merchant reviews, cleans up titles or content if needed, and publishes when the entry is ready. This is deliberate: importing 200 URLs and immediately serving all of their content to the AI is a recipe for noisy retrieval. Drafts let the merchant promote what's actually useful.
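The filter-then-draft behavior can be sketched in a few lines. This is an illustration under assumptions, not the product's code: KBEntry, its field names, the "draft" and "published" status strings, and the helper functions are hypothetical, and glob-style matching against the URL path is one plausible reading of how hide patterns work.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from urllib.parse import urlparse


@dataclass
class KBEntry:
    # Hypothetical entry shape; field names are assumptions.
    url: str
    source_type: str = "sitemap"
    status: str = "draft"  # drafts are not served to the widget
    auto_sync: bool = False


def apply_hide_patterns(urls: list[str], hide_patterns: list[str]) -> list[str]:
    """Drop URLs whose path matches any glob-style hide pattern."""
    def hidden(url: str) -> bool:
        path = urlparse(url).path
        return any(fnmatchcase(path, pattern) for pattern in hide_patterns)

    return [url for url in urls if not hidden(url)]


def import_as_drafts(urls: list[str]) -> list[KBEntry]:
    """Create one draft entry per selected URL."""
    return [KBEntry(url=url) for url in urls]


def servable(entries: list[KBEntry]) -> list[KBEntry]:
    """Only published entries are visible to retrieval."""
    return [entry for entry in entries if entry.status == "published"]
```

With the three common excludes from step 3, a list containing /products/shoe, /blog/news, and /pages/returns would keep only the returns page, and that page would stay out of the widget until its status is changed to published.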

The auto-sync flag

Help-center content drifts. A return policy gets updated, a shipping cutoff date moves, a warranty page gets new sub-sections. Importing once and forgetting means the KB and the source-of-truth content slowly diverge until the AI starts giving outdated answers.

Each KB entry imported via URL Sync carries an auto_sync flag. When set, a daily cron re-fetches the URL, re-extracts the content, regenerates embeddings, and updates the entry in place. The merchant doesn't have to track which pages they've loaded — auto-sync covers it.
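As a rough sketch of what one pass of that daily job might look like, assuming the fetch, extract, and embed steps are injected as plain callables: SyncedEntry, run_daily_sync, and the parameter names are hypothetical, and the real job presumably also handles errors, retries, and persistence.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SyncedEntry:
    # Hypothetical entry shape; only auto_sync is named in the article.
    url: str
    content: str = ""
    embedding: list[float] = field(default_factory=list)
    auto_sync: bool = False


def run_daily_sync(
    entries: list[SyncedEntry],
    fetch: Callable[[str], str],           # URL -> raw page HTML
    extract: Callable[[str], str],         # raw HTML -> cleaned text
    embed: Callable[[str], list[float]],   # text -> embedding vector
) -> int:
    """Refresh every entry with auto_sync set; return how many were refreshed."""
    refreshed = 0
    for entry in entries:
        if not entry.auto_sync:
            continue  # behaves like a manual entry, left untouched
        entry.content = extract(fetch(entry.url))
        entry.embedding = embed(entry.content)  # regenerate from the new text
        refreshed += 1
    return refreshed
```

The entry is updated in place rather than replaced, so anything that references it by identity keeps pointing at the current content.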

For pages that don't need this (archived policies, permanent FAQs, content that's stable by design), auto-sync stays off and the entry behaves like a manual entry: edit when you want, leave alone otherwise.

Sitemap as a first-class source type

The KB schema models source_type as an enum: manual, sitemap, PDF (and now markdown — added later in v2.1.4). Each entry carries the type it was created with, which lets the Documents tab filter intelligently — by default, the Documents tab hides PDF and sitemap-imported entries because they live in their own dedicated tabs (PDFs in the Documents upload pane, sitemap entries in URL Sync).

The merchant sees one unified KB at retrieval time (the AI searches across all entries regardless of source) but a cleanly partitioned view at edit time. Manual entries stay in the manual editor; PDFs in the PDF list; URL-imported entries in URL Sync. No mixing types in the wrong UI.
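One way to picture the split between edit-time views and retrieval is the sketch below. The enum values mirror the source types named above, but the class names, the lowercase string values, and the view functions are assumptions for illustration rather than the actual schema.

```python
from dataclasses import dataclass
from enum import Enum


class SourceType(Enum):
    MANUAL = "manual"
    SITEMAP = "sitemap"
    PDF = "pdf"
    MARKDOWN = "markdown"


@dataclass
class Entry:
    # Hypothetical minimal entry; each one keeps the type it was created with.
    title: str
    source_type: SourceType


# Types the Documents tab hides by default, per the description above.
DOCUMENTS_TAB_HIDDEN = frozenset({SourceType.PDF, SourceType.SITEMAP})


def documents_tab_view(entries: list[Entry]) -> list[Entry]:
    """Edit-time view: everything except PDF and sitemap-imported entries."""
    return [e for e in entries if e.source_type not in DOCUMENTS_TAB_HIDDEN]


def url_sync_view(entries: list[Entry]) -> list[Entry]:
    """Edit-time view for the URL Sync tab: sitemap-imported entries only."""
    return [e for e in entries if e.source_type is SourceType.SITEMAP]


def retrieval_pool(entries: list[Entry]) -> list[Entry]:
    """Retrieval ignores source type: the AI searches across all entries."""
    return list(entries)
```

Keeping the partition in the view functions rather than in storage means a new source type only needs an enum value and, if it gets its own tab, one more filter.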

Why this changes onboarding

For a merchant onboarding to Milly Chat with a substantial existing help center, URL Sync turns a multi-day KB seeding process into a single afternoon. Drop in the sitemap URL, filter out the URL patterns you don't want, bulk import as drafts, walk through the list once to publish, and set auto_sync on the entries that need it.

Deako's onboarding sprint in mid-April is the working example: 216 URLs from their help center imported in one session, reviewed and promoted across two days, auto_sync on the entries tied to active product policies. Without URL Sync, that KB seeding would have stretched across a week of manual paste-and-format work — and the merchant would have skipped half the content because the manual cost was too high.

The takeaway at the merchant level: a chat widget's first-month value scales with how much KB context it can draw on. URL Sync gets a meaningful KB in place early enough for that first month to be useful rather than a placeholder.
