If You Run Flash Sales, Your Chat Widget's Job Isn't the Rush — It's the Lead-Up

A five-day mega-sale on a major Shopify storefront, six promo-coded flash windows, and the data that reframed what an AI chat widget actually does at peak intent.

Viet Le
co-founder · Milly Software

Five days. Six promo-coded flash windows. A major Shopify storefront on Milly Chat at 80% rollout, search-bar format, with a 14-day attribution window. Going in, the question we thought we'd answer was the obvious one: does the widget hold up when the rush hits?

It did. But that turned out to be the less interesting answer. The data underneath flipped our framing of what the widget is actually doing at peak load.

The flash sale we thought we were measuring

The store: an enterprise apparel-and-accessories merchant running on Shopify. Five days of mega-sale, with six promo-coded windows ranging from six-minute US flashes to a one-hour international slot. The widget covered 80% of traffic at the search-bar format. Attribution ran on a 14-day window — wider than our default 7d, opted into a month before the campaign.

The hypothesis going in was straightforward: when 528 orders land in a six-minute window, the kind of pressure that creates a 9-checkouts-per-second peak, does the chat handler stay responsive? We had framed it as a load test.

Total mega-week orders ran +165% over a comparable baseline week — moving from 1,575 baseline orders / ~$126K to 4,176 orders / ~$329K mega-week revenue. Widget-shown order coverage moved from 66% baseline to 73% during the mega week. The handler held; nothing exploded; the rush passed.
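
The lift figures here are plain percentage changes against the baseline week. A minimal sketch of that arithmetic, using the rounded figures quoted in this article; the helper name `pct_lift` is illustrative and not part of any Milly tooling:

```python
def pct_lift(baseline: float, observed: float) -> float:
    """Percentage change of `observed` relative to `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (observed - baseline) / baseline * 100


# Order counts are exact; revenue uses the rounded ~$126K and ~$329K figures.
orders_lift = pct_lift(1_575, 4_176)
revenue_lift = pct_lift(126_000, 329_000)

print(f"orders: +{orders_lift:.0f}%")
print(f"revenue (from rounded figures): +{revenue_lift:.0f}%")
```

Because the revenue inputs are rounded, the revenue lift computed this way is approximate.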

Then we looked at where the chat-touched orders actually concentrated.

Chat-touched orders concentrated 8.5× at peak intent

We define an interactor order the same way the dashboard does: an order placed by a visitor who had previously opened the widget within the configured attribution window. Same metric, three time scales:

  • Pre-sale week (baseline): 28 interactor orders out of 1,575 total — 1.8%
  • Mega-week overall: 355 interactor orders out of 4,176 total — 8.5%
  • Inside the five US flash windows specifically: 307 interactor orders out of 2,007 total — 15.3%

Inside the flash windows, the interactor share of orders was 8.5× the baseline share (15.3% vs 1.8%). Visitors who had chatted with the widget in the prior 14 days concentrated their purchases at the moment of peak intent.
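
The shares and the 8.5× figure come down to simple ratios of the counts listed above. A small sketch, assuming the counts exactly as quoted; the `Period` class is a hypothetical container for illustration, not a dashboard object:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Period:
    name: str
    interactor_orders: int
    total_orders: int

    @property
    def share(self) -> float:
        """Fraction of orders placed by visitors who had opened the widget."""
        return self.interactor_orders / self.total_orders


baseline = Period("pre-sale week", 28, 1_575)
mega_week = Period("mega-week", 355, 4_176)
us_flashes = Period("US flash windows", 307, 2_007)

for period in (baseline, mega_week, us_flashes):
    print(f"{period.name}: {period.share:.1%}")

# Ratio of the rounded percentages (15.3 / 1.8), which matches the 8.5x quoted.
rounded_ratio = round(us_flashes.share * 100, 1) / round(baseline.share * 100, 1)
# Ratio of the unrounded shares, which comes out slightly higher.
exact_ratio = us_flashes.share / baseline.share

print(f"concentration: {rounded_ratio:.1f}x rounded, {exact_ratio:.1f}x unrounded")
```

The quoted 8.5× corresponds to the ratio of the rounded percentages; computed from the raw counts, the ratio is about 8.6×.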

Mega-week interactor orders ran +1,168% vs baseline; interactor revenue moved from ~$7K baseline to ~$25K (+242%). That isn't a load story — that's a primer story. The widget had been doing its work somewhere else.

The lead-up effect — chat volume nearly doubled before the flashes

The 14-day window before the mega-sale tells the story. Daily chat averages over the two-week baseline:

  • ~143 conversations/day
  • ~296 unique visitors opening search/day
  • ~371 widget opens/day
  • ~202 messages/day

Then the campaign launched on Monday. Even though the Monday flash misfired (more on that below), the chat surge was instant. By Tuesday, conversation volume had jumped to 303/day, unique searchers to 643/day, messages to 405/day. Wednesday hit the chat peak — 731 unique visitors opened search, 399 messages, 299 conversations.

Same day, BIG60 produced the highest single-window order count of the mega-sale (528 orders in six minutes). The chat volume tracked the buying intent in real time — not as a coincidence, but as the same underlying signal showing up in two surfaces.

Across the full mega-week, conversation volume ran 2.0× baseline. Unique searchers ran 2.1× baseline. The widget wasn't handling the rush; it was handling everyone who was thinking about the rush.

Inside the six-minute flash windows

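The five US flashes ran six minutes each, thirty minutes of cumulative flash time across the week. Per-window interactor share for the four US flashes that fired on schedule and the international slot (the Monday flash is covered below):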

  • Tue SAVE60 — 526 orders / ~$32K, 16.3% interactor
  • Wed BIG60 — 528 orders / ~$33K, 15.5% interactor
  • Thu URBAN60 — 452 orders / ~$28K, 15.0% interactor
  • Fri LAST60 — 368 orders / ~$24K, 19.3% interactor
  • Fri GLOBAL60 (international, 60 min) — 102 orders / ~$9K, 11.8% interactor

Across the four US flashes that fired on schedule, 15-19% of orders came from a visitor who had chatted in the prior fourteen days. That cohort produced ~$21K of the ~$25K mega-week interactor revenue, about 85%, in the cumulative thirty minutes of US flash time.

The Monday flash misfired: it was sent at 5:15am PT instead of the scheduled 12:15pm PT, hitting mostly EU buyers. It produced 133 orders with a 3.0% interactor rate, above the 1.8% pre-sale baseline but far below the 15-19% of the other US flashes. The EU audience hadn't built up the same widget exposure as US visitors. The misfire is, in its own way, a control: when the flash hits a population the widget hasn't primed, the concentration effect largely disappears.

A sale-aware widget, by design

The lead-up effect didn't happen by accident. In the days before the campaign launched, the merchant pre-loaded knowledge-base articles describing the mega-sale itself — the codes, the windows, the eligible products, the price tiers. Visitors who chatted with the widget in the prior two weeks weren't getting generic AI answers about discounts; they were getting merchant-authored, sale-specific responses:

  • "Any deals coming up?" → enriched, with the actual upcoming windows
  • "When's your next sale?" → grounded in the merchant's own KB entries
  • "Will [product] go on sale?" → answered with the eligible-products list

This reframes what the widget's product positioning actually is. It isn't just a chat widget. It's merchant-aware infrastructure that can be enriched campaign-by-campaign — sale articles loaded for a flash week, product-launch FAQ loaded the day of a launch, holiday shipping cutoffs loaded for December. The widget's answers track what the merchant is actually doing at any given moment, because the merchant tells it.
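
One way to picture campaign-by-campaign enrichment is a knowledge base whose articles carry a validity period, so the widget only grounds answers in what currently applies. The sketch below is an assumption for illustration: `KBArticle`, `active_articles`, and all titles and dates are hypothetical, and it does not describe how Milly Chat actually stores or retrieves articles.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class KBArticle:
    """A merchant-authored article and the period during which it applies."""
    title: str
    body: str
    active_from: datetime
    active_until: datetime


def active_articles(articles: list[KBArticle], now: datetime) -> list[KBArticle]:
    """Return the articles whose validity period contains `now`."""
    return [a for a in articles if a.active_from <= now < a.active_until]


# Placeholder content and dates, not the merchant's actual campaign data.
kb = [
    KBArticle(
        title="Mega-sale flash windows",
        body="Codes, window times, eligible products, price tiers.",
        active_from=datetime(2030, 3, 1),
        active_until=datetime(2030, 3, 16),
    ),
    KBArticle(
        title="Holiday shipping cutoffs",
        body="Order-by dates for December delivery.",
        active_from=datetime(2030, 12, 1),
        active_until=datetime(2030, 12, 26),
    ),
]

in_lead_up = active_articles(kb, datetime(2030, 3, 5))
print([a.title for a in in_lead_up])
```

Scoping articles by date keeps a finished campaign's codes from leaking into later answers, which is one plausible reason to load and retire sale content per campaign.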

The data above is what that looks like measured: 2.0× baseline conversation volume in the lead-up, 8.5× concentration of those conversations into peak-intent purchases, 85% of the mega-week interactor revenue materializing in thirty minutes of cumulative flash time.

Holding up under load

The technical robustness story is real, just not the headline. The peak window (BIG60) averaged ~88 orders/minute (528 orders in six minutes, roughly 1.5 per second), with a peak of about 9 checkouts per second. Widget cart-attribute injection held: only 1.5% of orders ended up attribution-unknown at peak, a clean signal that visitor IDs were being captured even under maximum checkout pressure.

Counter-intuitively, mega-week attribution-unknown actually improved vs the baseline week (18% → 9%). The intent surge produced more reliable visitor identification, not less.

Chat handler load during the six-minute flashes: 1-2 messages per minute. The flash buyers weren't chatting during the rush; they were checking out. The handler was nearly idle while the cart-attribute injection layer absorbed up to nine checkouts a second. Different surfaces, different loads, both held.

The widget's job is the two weeks before

The original framing, "widget scales with traffic", describes a load test. The data describes something else:

Search-widget conversation volume roughly doubled in the lead-up to the flashes (2.0× the 143/day baseline across the mega-week). Inside the on-schedule US flash windows, 15-19% of orders came from visitors who had chatted in the prior fourteen days, an interactor share about 8.5× the baseline. That primer cohort produced ~$21K of the ~$25K mega-week chat-attributed revenue (about 85%) in thirty cumulative minutes of flash time. The widget isn't running the rush. It's running the two weeks before.

For merchants who run flash sales — three takeaways from the data:

  • Pre-load the knowledge base for the campaign. Sale-specific KB articles in the lead-up are what turn the widget from a passive question-answerer into an active campaign primer. Load the codes, windows, eligible products, and price tiers: anything a curious visitor might ask, the widget should be able to answer authoritatively before the campaign launches.
  • Widen the attribution window beyond seven days. Many of the primer chats in this dataset happened 8-14 days before the flashes. At a 7d window, that signal would be substantially undercounted, and the lead-up effect would look much smaller than it actually is. 14d caught the full picture.
  • Treat under-exposed audience segments differently. The Monday misfire to EU buyers (3.0% interactor rate) and the GLOBAL60 international hour (11.8%) both ran below the US-flash rate of 15-19%. The implication: an audience that hasn't built up a widget-exposure baseline doesn't convert at the primed rate. Wider international rollout, with enough lead time for the same primer effect to compound, is where the next lift probably lives.
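
The attribution-window takeaway can be made concrete with the interactor definition used earlier: an order counts if the visitor opened the widget within the configured window before ordering. A minimal sketch under that definition; `is_interactor_order` and the timestamps are hypothetical and not the dashboard's actual implementation:

```python
from datetime import datetime, timedelta


def is_interactor_order(order_time, widget_opens, window_days):
    """True if any widget open falls within `window_days` before the order."""
    window = timedelta(days=window_days)
    return any(
        timedelta(0) <= order_time - opened <= window
        for opened in widget_opens
    )


# Placeholder timestamps: a single widget open ten days before the order.
order_time = datetime(2030, 3, 13, 12, 17)
widget_opens = [order_time - timedelta(days=10)]

counted_at_7d = is_interactor_order(order_time, widget_opens, window_days=7)
counted_at_14d = is_interactor_order(order_time, widget_opens, window_days=14)

print(f"7d window: {counted_at_7d}, 14d window: {counted_at_14d}")
```

A chat that happened ten days before the purchase is invisible at a 7d window and counted at 14d, which is the undercounting the second takeaway describes.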

The handler held during the flash. That's the load test answer. But the value the merchant captured was already in the bag two weeks earlier.
