When AI Becomes Labour, Not Software

Context

Most AI products are still designed as tools. Smarter tools, faster tools, but tools all the same.

At the same time, a different reality is emerging. AI systems are executing work end-to-end, operating continuously, and being evaluated on outcomes rather than usage. In practice, they are functioning as employees.

This case study explores what changes when AI is treated as labour rather than software, and how product decisions must evolve as a result.

The Problem

Treating AI as “just another feature” creates three systemic failures.

1. Broken ownership
Software has users. Labour has accountability. Most AI products define neither clearly.

2. Misaligned value measurement
Feature adoption metrics fail to capture whether work is actually getting done.

3. Organisational friction
Teams bolt AI onto workflows without redesigning handoffs, escalation paths, or governance.

The result is predictable: impressive demos, stalled pilots, and limited real-world impact.

After several pilots showed strong model performance but weak operational adoption, I pushed to reframe the core question:

How should products be designed when AI is expected to behave like a member of the workforce?

Reframing the Product

The critical shift was conceptual, not technical.

Instead of asking:
“What tasks can AI assist with?”

The product lens became:
“What work can this AI own, and under what conditions should it stop?”

That reframing drove every downstream decision, from system boundaries to pricing and compliance.

Key Product Decisions

  1. AI Requires an Employment Model, Not a Feature Spec

Once AI is treated as labour, it needs the same structural primitives as a human worker:

  • A defined scope of responsibility

  • Clear authority boundaries

  • Performance expectations

  • Escalation rules

  • Offboarding mechanisms

This led to designing AI agents with:

  • Explicit job definitions rather than open-ended capabilities

  • Task ownership that could be audited

  • Hard stop conditions instead of silent failure modes

This reduced operational risk and materially increased trust in regulated deployment environments.
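To make this concrete, here is a minimal sketch of what such a job definition could look like as configuration. The structure, field names, and values are illustrative assumptions, not the product's actual schema.

from dataclasses import dataclass

@dataclass
class AgentJobDefinition:
    """Illustrative 'employment contract' for an AI agent. All fields are hypothetical."""
    scope: str                               # defined scope of responsibility
    allowed_actions: list[str]               # clear authority boundaries
    performance_targets: dict[str, float]    # performance expectations
    escalation_rules: list[str]              # conditions that force a human handoff
    hard_stop_conditions: list[str]          # stop explicitly instead of failing silently
    offboarding: str                         # how the agent is retired or replaced

triage_agent = AgentJobDefinition(
    scope="Triage routine cases below a fixed complexity threshold",
    allowed_actions=["classify_case", "request_information", "route_to_specialist"],
    performance_targets={"resolution_completeness": 0.95, "max_rework_rate": 0.05},
    escalation_rules=["confidence below threshold", "case flagged as high risk"],
    hard_stop_conditions=["missing source record", "suspected safety issue"],
    offboarding="Disable the agent and route all work back to the human queue",
)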

  2. Human–AI Handoffs Must Be Designed, Not Assumed

Most failed AI implementations collapse at the handoff.

The product explicitly separated work into:

  • Low-risk autonomous execution

  • Conditional execution with approval

  • Mandatory human control

Rather than generic “human-in-the-loop” assumptions, handoffs were triggered by:

  • Confidence thresholds

  • Risk classification

  • Contextual signals such as ambiguity or emotional volatility

In one healthcare workflow, early versions that were optimised for throughput ended up increasing downstream clinical review time. Reclassifying the agent as a “junior worker” with mandatory escalation thresholds reduced total human time per case, despite slower raw execution.
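A minimal sketch of how such tiered handoffs could be encoded is shown below. The confidence threshold and signal names are illustrative assumptions, not the rules used in any of the deployments described here.

from enum import Enum

class ExecutionMode(Enum):
    AUTONOMOUS = "low-risk autonomous execution"
    NEEDS_APPROVAL = "conditional execution with approval"
    HUMAN_ONLY = "mandatory human control"

def route_task(confidence: float, risk_class: str, signals: set[str]) -> ExecutionMode:
    """Decide who owns a task: the agent, the agent with approval, or a human.

    Threshold values and signal names are illustrative, not production rules.
    """
    # Contextual signals such as ambiguity or emotional volatility always escalate.
    if risk_class == "high" or signals & {"ambiguous", "emotionally_volatile"}:
        return ExecutionMode.HUMAN_ONLY
    # Medium risk, or low confidence, requires an explicit human approval step.
    if risk_class == "medium" or confidence < 0.80:
        return ExecutionMode.NEEDS_APPROVAL
    # Only high-confidence, low-risk work runs autonomously.
    return ExecutionMode.AUTONOMOUS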

In a revenue operations context, the same handoff principles were encoded as explicit, governed rules.

Decision thresholds

  • Below 40 percent risk: informational only

  • 40–65 percent: manager review recommended

  • Above 65 percent: mandatory human review with action logging

Ownership and review

  • Thresholds were owned by RevOps

  • Reviewed quarterly with Sales and Finance leadership

  • Adjusted as sales motion or market conditions changed

Overrides

  • All recommendations could be overridden

  • Overrides required a reason

  • Override data fed back into evaluation and process review

This preserved judgement while enforcing accountability.
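A sketch of how these rules might be enforced in practice follows. The tier boundaries come from the thresholds above; the schema and function names are assumptions made for illustration.

from dataclasses import dataclass
from datetime import datetime, timezone

def review_tier(risk_percent: float) -> str:
    """Map a risk score to the review tier described above."""
    if risk_percent > 65:
        return "mandatory human review with action logging"
    if risk_percent >= 40:
        return "manager review recommended"
    return "informational only"

@dataclass
class OverrideRecord:
    """One human override of an AI recommendation (illustrative schema)."""
    task_id: str
    recommended_action: str
    chosen_action: str
    reason: str                 # every override must carry a stated reason
    overridden_by: str
    timestamp: datetime

def record_override(log: list, task_id: str, recommended: str,
                    chosen: str, reason: str, user: str) -> OverrideRecord:
    if not reason.strip():
        raise ValueError("Overrides require a reason.")
    record = OverrideRecord(task_id, recommended, chosen, reason, user,
                            datetime.now(timezone.utc))
    log.append(record)          # override data feeds evaluation and process review
    return record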

  3. Performance Is Measured on Outcomes, Not Activity

Traditional software metrics were deliberately deprioritised.

Instead of:

  • Usage

  • Engagement

  • Feature adoption

The AI was evaluated like labour:

  • Cost per unit of work

  • Resolution completeness

  • Time to outcome

  • Human oversight load

This surfaced uncomfortable truths early, particularly where AI created downstream rework rather than genuine efficiency. It also made ROI discussions concrete rather than speculative.
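As an illustration, labour-style metrics of this kind can be computed from plain task records. The field names and record structure here are assumptions made for the sketch, not the actual evaluation pipeline.

def labour_metrics(tasks: list[dict], run_cost: float) -> dict[str, float]:
    """Compute outcome-style metrics over a batch of task records.

    Each task dict is assumed to carry 'resolved' (bool), 'hours_to_outcome'
    (float) and 'human_minutes' (float). Field names are illustrative.
    """
    completed = [t for t in tasks if t["resolved"]]
    return {
        "cost_per_unit_of_work": run_cost / max(len(completed), 1),
        "resolution_completeness": len(completed) / max(len(tasks), 1),
        "avg_time_to_outcome_hours":
            sum(t["hours_to_outcome"] for t in completed) / max(len(completed), 1),
        "human_oversight_minutes_per_task":
            sum(t["human_minutes"] for t in tasks) / max(len(tasks), 1),
    }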

  4. Pricing Must Reflect Labour Economics, Not SaaS Norms

Seat-based pricing breaks down when AI operates independently of humans.

The model shifted toward:

  • Outcome-based pricing where work completion could be measured

  • Consumption models tied to task volume and complexity

  • Clear comparison against equivalent human cost

This framing simplified procurement conversations and forced internal discipline around performance and value delivery.
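A rough sketch of the human-cost comparison follows. Every number and parameter here is invented purely for illustration; it is not the actual pricing.

def price_summary(tasks_completed: int, price_per_outcome: float,
                  human_hours_per_task: float, human_hourly_cost: float) -> dict:
    """Compare outcome-based AI pricing with the equivalent human cost."""
    ai_cost = tasks_completed * price_per_outcome
    human_cost = tasks_completed * human_hours_per_task * human_hourly_cost
    return {
        "ai_cost": ai_cost,
        "equivalent_human_cost": human_cost,
        "savings_ratio": human_cost / ai_cost if ai_cost else float("inf"),
    }

# Hypothetical example: 1,000 completed tasks priced at 2.50 per outcome,
# versus 0.5 human hours per task at 40.00 per hour.
print(price_summary(1000, 2.50, 0.5, 40.0))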

  5. Compliance Is a Product Capability, Not a Legal Afterthought

In regulated environments, AI that behaves like labour inherits labour-level scrutiny.

The product incorporated:

  • Auditability of decisions and actions

  • Clear attribution of responsibility

  • Predictable update and change-control paths

Rather than slowing adoption, this became a differentiator. Buyers were not looking for maximal autonomy. They were looking for controlled reliability.
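To illustrate what auditability and attribution can mean in product terms, here is a sketch of a decision audit record. The schema is an assumption, not the product's actual format.

from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class DecisionAuditEntry:
    """Immutable audit record for one agent decision (illustrative schema)."""
    task_id: str
    agent_version: str        # ties the decision to a change-controlled release
    decision: str
    inputs_digest: str        # hash of the inputs used, for reproducibility
    responsible_owner: str    # the accountable human or team
    timestamp: datetime

entry = DecisionAuditEntry(
    task_id="task-0042",
    agent_version="2025.06.1",
    decision="route_to_specialist",
    inputs_digest="sha256-placeholder",
    responsible_owner="operations-team",
    timestamp=datetime.now(timezone.utc),
)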

What This Case Study Demonstrates

This work was not about building an AI agent.

It was about:

  • Recognising a category shift before it becomes obvious

  • Translating abstract AI capability into concrete product decisions

  • Designing for second-order effects inside real organisations

  • Treating governance, economics, and change management as first-class product concerns

Most AI PM portfolios stop at what a model can do.
This case study focuses on what organisations must be ready to live with.

Why This Matters Now

AI is collapsing the boundary between software and labour. Products that ignore this will continue to struggle with trust, scale, and value realisation.

The next generation of successful AI products will not win because they are smarter.
They will win because they are designed to work responsibly inside human systems.