When AI Becomes Labour, Not Software

Context

Most AI products are still designed as tools. Smarter tools, faster tools, but tools all the same.

At the same time, a different reality is emerging. AI systems are executing work end-to-end, operating continuously, and being evaluated on outcomes rather than usage. In practice, they are functioning as employees.

This case study explores what changes when AI is treated as labour rather than software, and how product decisions must evolve as a result.

The Problem

Treating AI as “just another feature” creates three systemic failures.

1. Broken ownership
Software has users. Labour has accountability. Most AI products define neither clearly.

2. Misaligned value measurement
Feature adoption metrics fail to capture whether work is actually getting done.

3. Organisational friction
Teams bolt AI onto workflows without redesigning handoffs, escalation paths, or governance.

The result is predictable: impressive demos, stalled pilots, and limited real-world impact.

After several pilots showed strong model performance but weak operational adoption, I pushed to reframe the core question:

How should products be designed when AI is expected to behave like a member of the workforce?

Reframing the Product

The critical shift was conceptual, not technical.

Instead of asking:
“What tasks can AI assist with?”

The product lens became:
“What work can this AI own, and under what conditions should it stop?”

That reframing drove every downstream decision, from system boundaries to pricing and compliance.

Key Product Decisions

  1. AI Requires an Employment Model, Not a Feature Spec

Once AI is treated as labour, it needs the same structural primitives as a human worker:

  • A defined scope of responsibility

  • Clear authority boundaries

  • Performance expectations

  • Escalation rules

  • Offboarding mechanisms

This led to designing AI agents with:

  • Explicit job definitions rather than open-ended capabilities

  • Task ownership that could be audited

  • Hard stop conditions instead of silent failure modes

This reduced operational risk and materially increased trust in regulated deployment environments.
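To make this concrete, here is a minimal sketch of what such a job definition could look like as configuration. The structure, field names, and values are illustrative assumptions, not the product's actual schema.

from dataclasses import dataclass

@dataclass
class AgentJobDefinition:
    """Illustrative 'employment contract' for an AI agent. All fields are hypothetical."""
    scope: str                               # defined scope of responsibility
    allowed_actions: list[str]               # clear authority boundaries
    performance_targets: dict[str, float]    # performance expectations
    escalation_rules: list[str]              # conditions that force a human handoff
    hard_stop_conditions: list[str]          # stop explicitly instead of failing silently
    offboarding: str                         # how the agent is retired or replaced

triage_agent = AgentJobDefinition(
    scope="Triage routine cases below a fixed complexity threshold",
    allowed_actions=["classify_case", "request_information", "route_to_specialist"],
    performance_targets={"resolution_completeness": 0.95, "max_rework_rate": 0.05},
    escalation_rules=["confidence below threshold", "case flagged as high risk"],
    hard_stop_conditions=["missing source record", "suspected safety issue"],
    offboarding="Disable the agent and route all work back to the human queue",
)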

  2. Human–AI Handoffs Must Be Designed, Not Assumed

Most failed AI implementations collapse at the handoff.

The product explicitly separated work into:

  • Low-risk autonomous execution

  • Conditional execution with approval

  • Mandatory human control

Rather than generic “human-in-the-loop” assumptions, handoffs were triggered by:

  • Confidence thresholds

  • Risk classification

  • Contextual signals such as ambiguity or emotional volatility

In one healthcare workflow, early versions that were optimised for throughput ended up increasing downstream clinical review time. Reclassifying the agent as a “junior worker” with mandatory escalation thresholds reduced total human time per case, despite slower raw execution.
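A minimal sketch of how such tiered handoffs could be encoded is shown below. The confidence threshold and signal names are illustrative assumptions, not the rules used in any of the deployments described here.

from enum import Enum

class ExecutionMode(Enum):
    AUTONOMOUS = "low-risk autonomous execution"
    NEEDS_APPROVAL = "conditional execution with approval"
    HUMAN_ONLY = "mandatory human control"

def route_task(confidence: float, risk_class: str, signals: set[str]) -> ExecutionMode:
    """Decide who owns a task: the agent, the agent with approval, or a human.

    Threshold values and signal names are illustrative, not production rules.
    """
    # Contextual signals such as ambiguity or emotional volatility always escalate.
    if risk_class == "high" or signals & {"ambiguous", "emotionally_volatile"}:
        return ExecutionMode.HUMAN_ONLY
    # Medium risk, or low confidence, requires an explicit human approval step.
    if risk_class == "medium" or confidence < 0.80:
        return ExecutionMode.NEEDS_APPROVAL
    # Only high-confidence, low-risk work runs autonomously.
    return ExecutionMode.AUTONOMOUS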

In a revenue operations context, the same handoff principles were encoded as explicit, governed rules.

Decision thresholds

  • Below 40 percent risk: informational only

  • 40–65 percent: manager review recommended

  • Above 65 percent: mandatory human review with action logging

Ownership and review

  • Thresholds were owned by RevOps

  • Reviewed quarterly with Sales and Finance leadership

  • Adjusted as sales motion or market conditions changed

Overrides

  • All recommendations could be overridden

  • Overrides required a reason

  • Override data fed back into evaluation and process review

This preserved judgement while enforcing accountability.
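A sketch of how these rules might be enforced in practice follows. The tier boundaries come from the thresholds above; the schema and function names are assumptions made for illustration.

from dataclasses import dataclass
from datetime import datetime, timezone

def review_tier(risk_percent: float) -> str:
    """Map a risk score to the review tier described above."""
    if risk_percent > 65:
        return "mandatory human review with action logging"
    if risk_percent >= 40:
        return "manager review recommended"
    return "informational only"

@dataclass
class OverrideRecord:
    """One human override of an AI recommendation (illustrative schema)."""
    task_id: str
    recommended_action: str
    chosen_action: str
    reason: str                 # every override must carry a stated reason
    overridden_by: str
    timestamp: datetime

def record_override(log: list, task_id: str, recommended: str,
                    chosen: str, reason: str, user: str) -> OverrideRecord:
    if not reason.strip():
        raise ValueError("Overrides require a reason.")
    record = OverrideRecord(task_id, recommended, chosen, reason, user,
                            datetime.now(timezone.utc))
    log.append(record)          # override data feeds evaluation and process review
    return record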

  3. Performance Is Measured on Outcomes, Not Activity

Traditional software metrics were deliberately deprioritised.

Instead of:

  • Usage

  • Engagement

  • Feature adoption

The AI was evaluated like labour:

  • Cost per unit of work

  • Resolution completeness

  • Time to outcome

  • Human oversight load

This surfaced uncomfortable truths early, particularly where AI created downstream rework rather than genuine efficiency. It also made ROI discussions concrete rather than speculative.
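As an illustration, labour-style metrics of this kind can be computed from plain task records. The field names and record structure here are assumptions made for the sketch, not the actual evaluation pipeline.

def labour_metrics(tasks: list[dict], run_cost: float) -> dict[str, float]:
    """Compute outcome-style metrics over a batch of task records.

    Each task dict is assumed to carry 'resolved' (bool), 'hours_to_outcome'
    (float) and 'human_minutes' (float). Field names are illustrative.
    """
    completed = [t for t in tasks if t["resolved"]]
    return {
        "cost_per_unit_of_work": run_cost / max(len(completed), 1),
        "resolution_completeness": len(completed) / max(len(tasks), 1),
        "avg_time_to_outcome_hours":
            sum(t["hours_to_outcome"] for t in completed) / max(len(completed), 1),
        "human_oversight_minutes_per_task":
            sum(t["human_minutes"] for t in tasks) / max(len(tasks), 1),
    }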

  4. Pricing Must Reflect Labour Economics, Not SaaS Norms

Seat-based pricing breaks down when AI operates independently of humans.

The model shifted toward:

  • Outcome-based pricing where work completion could be measured

  • Consumption models tied to task volume and complexity

  • Clear comparison against equivalent human cost

This framing simplified procurement conversations and forced internal discipline around performance and value delivery.
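A rough sketch of the human-cost comparison follows. Every number and parameter here is invented purely for illustration; it is not the actual pricing.

def price_summary(tasks_completed: int, price_per_outcome: float,
                  human_hours_per_task: float, human_hourly_cost: float) -> dict:
    """Compare outcome-based AI pricing with the equivalent human cost."""
    ai_cost = tasks_completed * price_per_outcome
    human_cost = tasks_completed * human_hours_per_task * human_hourly_cost
    return {
        "ai_cost": ai_cost,
        "equivalent_human_cost": human_cost,
        "savings_ratio": human_cost / ai_cost if ai_cost else float("inf"),
    }

# Hypothetical example: 1,000 completed tasks priced at 2.50 per outcome,
# versus 0.5 human hours per task at 40.00 per hour.
print(price_summary(1000, 2.50, 0.5, 40.0))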

  5. Compliance Is a Product Capability, Not a Legal Afterthought

In regulated environments, AI that behaves like labour inherits labour-level scrutiny.

The product incorporated:

  • Auditability of decisions and actions

  • Clear attribution of responsibility

  • Predictable update and change-control paths

Rather than slowing adoption, this became a differentiator. Buyers were not looking for maximal autonomy. They were looking for controlled reliability.
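To illustrate what auditability and attribution can mean in product terms, here is a sketch of a decision audit record. The schema is an assumption, not the product's actual format.

from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class DecisionAuditEntry:
    """Immutable audit record for one agent decision (illustrative schema)."""
    task_id: str
    agent_version: str        # ties the decision to a change-controlled release
    decision: str
    inputs_digest: str        # hash of the inputs used, for reproducibility
    responsible_owner: str    # the accountable human or team
    timestamp: datetime

entry = DecisionAuditEntry(
    task_id="task-0042",
    agent_version="2025.06.1",
    decision="route_to_specialist",
    inputs_digest="sha256-placeholder",
    responsible_owner="operations-team",
    timestamp=datetime.now(timezone.utc),
)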

What This Case Study Demonstrates

This work was not about building an AI agent.

It was about:

  • Recognising a category shift before it becomes obvious

  • Translating abstract AI capability into concrete product decisions

  • Designing for second-order effects inside real organisations

  • Treating governance, economics, and change management as first-class product concerns

Most AI PM portfolios stop at what a model can do.
This case study focuses on what organisations must be ready to live with.

Why This Matters Now

AI is collapsing the boundary between software and labour. Products that ignore this will continue to struggle with trust, scale, and value realisation.

The next generation of successful AI products will not win because they are smarter.
They will win because they are designed to work responsibly inside human systems.