Case Summarization: It Started With One HR Case

4 min read

1) At a glance

  • Problem: HR fulfillers lose time reconstructing context after handoffs and writing resolution notes.

  • Approach: Pilot + two-wave research (productivity validation → post-live adoption learnings) → codify a reusable pattern → scale to LSD + CLM.

  • Key decisions: Summary depth, placement, refresh behavior, editability and provenance, attachment handling, sentiment framing.

  • Result: A workflow-native summarization pattern designed for trust and adoption, not just output quality.


2) The person at the center: the HR Service fulfiller

Monday morning. An HR fulfiller opens a case that has already been touched by two other people. Notes are scattered, context is implied, and an attachment might contain the detail that matters.

Before they can help the employee, they do two invisible jobs:

  1. reconstruct what happened

  2. write what they did in a way the next person can trust

Nobody celebrates this work, but it is the work. And it quietly consumes time, focus, and quality across every handoff. This case started there. With one HR case and one goal: reduce the hidden work without creating new risk.



3) Why “live” was the beginning, not the finish

Once summarization exists in the product, the question changes. It is no longer “can AI summarize?” It becomes “will agents rely on it inside the workflow they already use?”

In HR work, trust is fragile. Agents handle sensitive situations, ambiguous context, and decisions that need to be explainable. A summary that is slightly off does not feel like a small error. It can feel risky.

So we treated this as an adoption and workflow-fit problem: discoverability, freshness cues, attachment handling, and clear authorship when summary content is reused in work notes.


4) My role

Design Manager, HR Service Delivery (HR pilot owner)

HRSD was one of the pilot teams alongside CSM and ITSM. I staffed a designer from my team to partner closely with the AI Platform research team and bring the HR fulfiller lens into a cross-workflow capability.

My job was to own the HR outcome. I:

  • stayed closely plugged into how the study was framed, run, and synthesized

  • guided HR-specific decisions and tradeoffs

  • reviewed iterations end to end and held the quality bar

  • communicated progress and implications back to CBWF design leadership

  • treated the pilot as the foundation for a reusable workflow pattern

Team credit: Design execution was led by my designer in partnership with UXR. I owned direction, reviews, and decision-making.


5) The moment it clicked

| Before | After |
| --- | --- |
| The fulfiller scans work notes, comments, and attachments | The case opens with a summary that surfaces the story, recent changes, and what matters next |
| They stitch the story together and figure out what changed recently | They verify only the risky bits, not everything |
| Then they write resolution notes from scratch | When closing the loop, they reuse and correct content with clear authorship, so documentation becomes faster without losing accountability |

The hidden tax is rereading, cross-checking, and guesswork. This only works when the summary is discoverable, clearly updated, and safe to reuse.


6) What we did

We approached this in two waves, because the questions changed over time.

Wave 1: Validate productivity (controlled)

We measured whether summarization could reduce time spent on:

  • getting up to speed during handoffs

  • writing resolution notes at closure

This established the baseline value proposition: used at the right moments, summarization can reduce time spent on context reconstruction and documentation.



Wave 2: Post-live reality check (adoption and trust)

Once the feature existed in the world, we studied how agents evaluated it in practice:

  • Can I find it quickly in my workspace?

  • Does it feel current?

  • Does it include what I need, especially attachments?

  • What does it mean for trust and accountability when reused in work notes?

  • Where does sentiment fit, and how might it affect agents emotionally?

This second wave is where the design nuance lived.


(Study scope: who we tested with, and what environments we tested in, across HR, CSM, and ITSM.)


7) What we learned

Agents wanted more than a recap

Agents were not asking for “a nicer paragraph.” They were trying to avoid manual searching and reduce verification.

They wanted:

  • temporal context, not just content

  • sources and transparency, not magic

  • attachment awareness, not surprises

  • a path forward, not a dead-end card

Discoverability was the feature

When agents could not immediately see the output, they assumed the feature failed. Some clicked summarize repeatedly, not because they loved it, but because the UI did not make the result obvious. In support work, “out of sight” becomes “not reliable.”

Update visibility was a trust lever

A summary is only useful if it feels current. Subtle update cues get missed in real work. If the agent misses an update, they do not think “I overlooked it,” they think “this isn’t reliable.”

Change-highlighting had to match mental models

Agents liked “show me what changed” because it reduces comparison work. But it only works when it is precise, as the sketch after this list illustrates:

  • highlight only what is new

  • keep it visible long enough to notice

  • include attachment-related updates too
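
To make that precision concrete, here is a minimal TypeScript sketch of how “highlight only what is new” could be computed between two refreshes. The segment model, the name diffSummary, and the highlight duration are illustrative assumptions, not the shipped implementation.

```typescript
// A minimal sketch of precise change-highlighting between two summary
// refreshes. SummarySegment, diffSummary, and the duration constant
// are illustrative assumptions, not product APIs.

interface SummarySegment {
  id: string;               // stable key for a summary bullet or section
  text: string;
  fromAttachment: boolean;  // attachment-derived updates must be flagged too
}

interface HighlightedSegment extends SummarySegment {
  isNew: boolean;           // true only for content added or changed since the last refresh
}

function diffSummary(
  previous: SummarySegment[],
  current: SummarySegment[],
): HighlightedSegment[] {
  const seen = new Map(previous.map((s) => [s.id, s.text] as const));
  // Highlight only segments that are genuinely new or whose text changed;
  // everything carried over verbatim stays unhighlighted.
  return current.map((s) => ({
    ...s,
    isNew: !seen.has(s.id) || seen.get(s.id) !== s.text,
  }));
}

// Keep the highlight visible long enough to notice, not a brief flash;
// the exact duration here is a placeholder.
const HIGHLIGHT_DURATION_MS = 8_000;
```

Because attachment-derived segments flow through the same diff, attachment-related updates cannot slip past the highlight.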

Attachments defined whether the summary felt complete

For HR cases, attachments often contain the real answer. The requirement was not perfect summarization of every file. It was clarity and predictable behavior.

Sentiment had potential, but raised real risk

Sentiment intrigued agents, but workflow purpose was unclear. People wanted it to update automatically, worried it could be used against them, and flagged emotional toll from constant negative exposure.

8) The decisions that shaped what shipped

The decisions fell into six areas: summary depth, placement, refresh behavior, editability and provenance, attachment handling, and sentiment framing. The sections that follow show how each played out.

9) Hard moment

The toughest conversation was not about AI summary quality. It was about where the summary deserved to live.

Workspace teams were cautious about adding another persistent surface. The fear was clutter and distraction. From the fulfiller side, the fear was different: if the summary is buried, it might as well not exist. In support work, anything hard to find quickly becomes “not reliable.”

We resolved it by shifting the debate from opinion to workflow truth:

  • the summary had to be visible where decisions happen, not in a secondary corner

  • if we were going to keep it lightweight, we owed agents strong cues for freshness and change

That unlocked the compromise: keep the surface scannable, but invest in discoverability and update signaling so it earned its spot.


10) What changed in the experience

These were the concrete UX directions that emerged and guided iteration:

  • strengthened update visibility so agents could not miss when the summary needed attention

  • aligned change-highlighting to mark only newly added content, long enough to notice

  • clarified expectations around sources and transparency so trust did not rely on blind faith

  • surfaced attachments as first-class context and made inclusion behavior predictable

  • clarified authorship when summary content is reused for resolution notes (see the provenance sketch after this list)

  • treated sentiment as a supported signal only when it has a clear purpose, safe framing, and reliable update behavior

This is what made summarization feel less like “AI text” and more like a workflow tool.
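
One way to picture the authorship decision: model reused summary text as spans that carry provenance, so an agent’s edit flips a span to human authorship while preserving the link back to the source summary. The sketch below works under that assumption; ProvenanceSpan, NoteDraft, and applyEdit are illustrative names, not the product’s data model.

```typescript
// A hedged sketch of keeping authorship unambiguous when summary text
// is reused in work notes. All types and names are illustrative.

type Author = "ai" | "human";

interface ProvenanceSpan {
  text: string;
  author: Author;           // who produced this exact text
  sourceSummaryId?: string; // set when the span was copied from a summary
}

interface NoteDraft {
  spans: ProvenanceSpan[];
}

// When an agent edits an AI-sourced span, the edited text becomes
// human-authored while the link to the summary is preserved.
function applyEdit(draft: NoteDraft, index: number, newText: string): NoteDraft {
  const spans = draft.spans.map((span, i) =>
    i === index ? { ...span, text: newText, author: "human" as Author } : span,
  );
  return { spans };
}
```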


11) Outcomes

In controlled tasks, agents completed handoff ramp-up and resolution documentation faster with AI summaries than without, across HR, CSM, and ITSM. In the post-live wave, we saw that adoption depended less on wording and more on discoverability, freshness cues, predictable attachment handling, and clear authorship when content is reused.


12) Trust guardrails

This work was not about replacing agent judgment. It was about reducing hidden work without creating new risk.

  • Human oversight is assumed: the summary reduces scanning, it does not eliminate verification in sensitive cases

  • Freshness is visible: update states and change-highlighting prevent silent drift (a small state sketch follows this list)

  • Transparency is designed: sources and inclusion rules reduce blind trust

  • Authorship is clear: edited reuse in work notes does not blur what is AI and what is human

  • Sentiment is handled carefully: emotional impact and misuse concerns are treated as design constraints, not edge cases
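
A minimal sketch of what “freshness is visible” could mean as explicit states, assuming new case activity must never rewrite the summary silently. The state names and transition below are assumptions for illustration, not the shipped design.

```typescript
// Explicit freshness states for the summary surface; new case activity
// moves the surface into a visible "stale" state that invites a refresh
// rather than rewriting the summary behind the agent's back.

type Freshness =
  | { state: "current"; generatedAt: Date }
  | { state: "stale"; generatedAt: Date; newActivitySince: number }
  | { state: "refreshing" };

function onCaseActivity(f: Freshness): Freshness {
  switch (f.state) {
    case "current":
      return { state: "stale", generatedAt: f.generatedAt, newActivitySince: 1 };
    case "stale":
      return { ...f, newActivitySince: f.newActivitySince + 1 };
    case "refreshing":
      return f; // activity during a refresh is folded into the next result
  }
}
```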

13) Leadership moves that mattered

  • I ran recurring reviews focused on summary placement, visibility, and refresh behavior because those choices determined adoption more than wording.

  • I pushed for provenance clarity once “edit and reuse into work notes” entered the conversation, because trust collapses when authorship is ambiguous.

  • I kept design leadership aligned with a simple split: what we could improve quickly now, and what required deeper research next (open prompting and quick actions).

14) From pilot to pattern to adoption (HRSD → LSD + CLM)

We did not treat HRSD as a one-off. We treated it as the proving ground for a reusable pattern for workflow-native AI summaries.

Reusable backbone we carried forward (modeled in the sketch after this list):

  • a shared summary information model (what it must answer to reduce context reconstruction)

  • trust cues and update behavior (clear refresh states, visible “what changed,” avoid silent updates)

  • placement principles (close to where decisions happen, not buried)

  • provenance decisions (how human edits and AI text are distinguished when reused)

  • attachment expectations (clarity on what was included and what wasn’t)
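
As a hedged sketch of that shared information model, the interface below captures what a workflow-native summary must answer to reduce context reconstruction. Every name and field here is an illustrative assumption distilled from the pattern above, not a platform schema.

```typescript
// The shared summary information model, sketched as a TypeScript
// interface. Field names are illustrative assumptions.

interface CaseSummary {
  story: string;            // what has happened so far, in brief
  recentChanges: string[];  // temporal context: what changed since last view
  nextSteps: string[];      // a path forward, not a dead-end card
  sources: string[];        // records the summary drew from, for transparency
  attachments: {
    name: string;
    includedInSummary: boolean; // predictable inclusion behavior
  }[];
  generatedAt: Date;        // feeds the freshness cues described earlier
}

// Example shape (illustrative values only):
const example: CaseSummary = {
  story: "Employee reported a payroll discrepancy; two fulfillers have triaged it.",
  recentChanges: ["Payroll team attached a corrected statement"],
  nextSteps: ["Confirm the correction with the employee"],
  sources: ["work notes", "comments", "corrected_statement.pdf"],
  attachments: [{ name: "corrected_statement.pdf", includedInSummary: true }],
  generatedAt: new Date(),
};
```

Per-workflow tuning in LSD and CLM then adjusts which fields carry the most weight, not the backbone itself.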

After proving the pattern in HR fulfillment, we extended it into LSD and CLM. The core stayed consistent, while each area tuned surface details and content emphasis based on workflow and risk profile.

What HR taught that generalized: freshness cues, attachment clarity, and authorship signals matter across domains because they govern trust.
What needed tuning in LSD/CLM: which sections carried the most weight and how “next steps” should be expressed based on workflow ownership and risk.


15) Shipped and iterated vs Next and research

| Shipped and iterated (pilot-driven UX direction) | Next and research track |
| --- | --- |
| Stronger update visibility so agents notice when the summary changed | Open prompting patterns: how agents want to ask follow-ups and drive actions |
| More precise change-highlighting so agents can see what’s new | Quick actions: what actions belong near the summary, and how to customize by org |
| Clearer attachment handling expectations and predictable behaviors | Deeper source transparency patterns (best UX for citations at scale) |
| Clearer authorship and provenance when content is reused in work notes | Sentiment positioning only after workflow purpose and emotional impact are resolved |

16) What we chose not to do

  • We didn’t treat sentiment as ready to scale until workflow purpose and emotional implications were clearer.

  • We didn’t bolt open prompting onto the summary as an afterthought. We separated it as a dedicated research track so interaction patterns could be designed intentionally.


17) What I would do differently

I would pair qualitative insights with lightweight instrumentation earlier (a sketch follows this list):

  • repeated clicks on summarize as a discoverability signal

  • time spent verifying summary vs using it directly

  • attachment open rates to guide where summarization investment matters most
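
A minimal sketch of what that instrumentation could look like, assuming a simple event stream; the event names and the emit transport are hypothetical.

```typescript
// Lightweight adoption telemetry, sketched as a typed event stream.
// Each signal maps to a concrete adoption question from the list above.

type SummaryEvent =
  | { type: "summarize_clicked"; caseId: string; clickCountInSession: number } // repeats hint at a discoverability problem
  | { type: "summary_verified"; caseId: string; verifySeconds: number }        // verify time vs direct use
  | { type: "attachment_opened"; caseId: string; attachmentName: string };     // where summarization investment matters

function emit(event: SummaryEvent): void {
  // Placeholder transport; a real pipeline would batch and send these.
  console.log(JSON.stringify(event));
}

// Example: a second click on "Summarize" within one session is a
// discoverability signal, not an engagement win. (Hypothetical case ID.)
emit({ type: "summarize_clicked", caseId: "HRC0012345", clickCountInSession: 2 });
```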
