Why measuring ROI on generative AI feels so difficult.
Classical IT projects are usually easy to monitor:
Generative AI complicates this picture:
A coding assistant may touch dozens of repositories and teams. A content generator may improve email campaigns, landing pages, and sales documents all at once. Who “owns” the value?
A sales copilot does not close deals on its own: it improves proposal quality, speed, and consistency, which in turn lift win rates and deal volumes.
Models hallucinate, drift over time, and perform unevenly across tasks. Quality and risk have to be part of your ROI model.
Most projects begin as experiments that build capabilities: domain-tuned models, reusable prompts, improved pipelines. Much of their value lies in future uses you cannot price precisely today.
Despite all this, you absolutely can calculate ROI. It just requires a slightly different mindset and a bit more discipline.
The first principle of measuring generative AI ROI is deceptively simple:
Start with the business problem, not the model.
Before you discuss embeddings, fine-tuning, or APIs, you should be able to say, in a sentence or two, what business outcome you are after and how you will measure it.
Examples:
Goal: Reduce average handle time and improve first-contact resolution.
Metrics: AHT, FCR, ticket backlog, cost per ticket.
Goal: Ship faster at the same or better quality.
Metrics: Cycle time, lead time, bugs per LOC, deployment frequency.
Goal: Improve the quality and quantity of personalised content.
Metrics: Campaign launch speed, variants tested, conversion rate, pipeline created.
At the macro level, the opportunity is massive. McKinsey estimates that generative AI could add $2.6-4.4 trillion per year in value across the use cases it analysed, which would lift the overall economic impact of AI by 15-40% (McKinsey & Company). But that top-down figure only becomes meaningful once you break it down into individual bottom-up projects with quantifiable KPIs.
A simple ROI framework for generative AI.
We will anchor on a simple, CFO-friendly ROI formula:
ROI = (Total Benefits − Total Costs) / Total Costs
For generative AI projects, it is useful to split this into four buckets:
For reporting, we generally suggest:
This keeps your ROI story rigorous without pretending you can precisely price every strategic benefit.
The KPI set generative AI needs is slightly different from traditional automation. Google Cloud, for instance, groups gen AI success KPIs into model accuracy, operational efficiency, user interaction, and financial impact (Google Cloud).
You can think of it in five layers.
These are the figures your CFO cares about:
Example:
Suppose a customer support copilot cuts the cost per ticket from ₹150 to ₹110 and you handle 500,000 tickets a year. The gross savings are:
(₹150 − ₹110) × 500,000 = ₹20,000,000 per year
That figure can be compared directly to your project and run costs.
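To make the formula and the ticket example above concrete, here is a minimal Python sketch. The run-cost figure used in the comparison is a hypothetical placeholder, not a number from the example.

```python
def roi(total_benefits: float, total_costs: float) -> float:
    """ROI = (Total Benefits - Total Costs) / Total Costs."""
    return (total_benefits - total_costs) / total_costs

# Support copilot example from above: cost per ticket drops from Rs 150 to Rs 110
tickets_per_year = 500_000
gross_savings = (150 - 110) * tickets_per_year   # Rs 20,000,000 per year

# Hypothetical first-year project + run costs (placeholder, for illustration only)
total_costs = 8_000_000

print(f"Gross savings: Rs {gross_savings:,}")
print(f"ROI: {roi(gross_savings, total_costs):.0%}")   # 150% on these assumptions
```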
These are about doing more with the same people:
IBM's productivity research stresses comparing results against a control group that does not use AI; otherwise you cannot tell whether improvements come from the AI system or from external factors (IBM).
Generative AI touches humans – customers, partners, internal users. So track:
Example:
If an AI-assisted chatbot lifts FCR to 75%, that translates into both lower cost (fewer follow-ups) and better experience (higher CSAT).
Because generative AI can hallucinate or produce non-compliant content, ROI has to account for quality and risk:
Monitoring these ensures you are not saving money while quietly taking on more risk.
These are not business outcomes in themselves, but they explain why you may not be seeing ROI yet:
If ROI is low and adoption is low, you have a change management problem, not necessarily a technology one.
You cannot measure improvement without knowing where you started. This is where most gen AI projects stumble.
To establish a baseline:
For example, for a support workflow:
This baseline is your “without AI” scenario – the reference point for every ROI calculation.
The next question: how do you know the changes you see are actually caused by AI? The golden rule is to treat your AI rollout as a real experiment.
Where feasible, run an A/B test:
IBM recommends measuring explicitly against a non-AI control group so that productivity gains are attributed correctly (IBM). This can be as simple as:
Group A: Has access to the code assistant;
Group B: Does not;
Compare cycle time, throughput, and output volume over 4-8 weeks.
Seasonality (e.g. busy and quiet periods).
Major changes (new product launches, org restructuring).
Use multiple time windows (e.g. 3-6 months before vs 3-6 months after).
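Whether you use an A/B test or a before/after comparison, the underlying arithmetic is simple. Below is a minimal sketch, assuming you have per-person cycle times for a pilot group and a control group; the sample figures are purely illustrative.

```python
from statistics import mean

# Illustrative cycle times (days per work item) during the pilot window
pilot_group = [4.2, 3.8, 4.5, 3.9, 4.1, 4.0, 3.7, 4.3, 4.4, 3.6]    # with the assistant
control_group = [5.4, 5.1, 5.6, 4.9, 5.3, 5.2, 5.0, 5.5, 5.7, 4.8]  # without

pilot_avg, control_avg = mean(pilot_group), mean(control_group)
improvement = (control_avg - pilot_avg) / control_avg

print(f"Control average: {control_avg:.1f} days, pilot average: {pilot_avg:.1f} days")
print(f"Relative cycle-time reduction: {improvement:.1%}")
```

In a real rollout you would also check statistical significance and watch quality metrics alongside speed.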
Test the AI solution on a carefully selected group:
What you learn here becomes your reference point for scaling.
Now we get into the mechanics of monetising AI impact.
Suppose:
A marketing team of 10 people spends 40% of its time creating first drafts of campaign copy.
In your experience, a generative AI assistant cuts that drafting time in half.
Assuming a loaded cost of ₹2,000,000 per person per year:
Drafting time per person = 40% of total time
Time saved = 40% × 50% = 20% of total time
So the value of that time per person:
20% × ₹2,000,000 = ₹400,000
Across 10 people:
₹400,000 × 10 = ₹4,000,000 per year of capacity freed.
You now have choices:
Redirect that capacity to higher-value work (more campaigns, more experiments).
Avoid future cost (e.g. you stay flat next year instead of hiring three more people).
Either way, the financial value is real – even if you never reduce headcount outright.
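The arithmetic above fits in a few lines; the figures are the illustrative ones from this example.

```python
team_size = 10
loaded_cost_per_person = 2_000_000   # Rs per person per year
drafting_share = 0.40                # 40% of time goes to first drafts
drafting_reduction = 0.50            # the assistant halves drafting time

time_saved_share = drafting_share * drafting_reduction         # 20% of total time
value_per_person = time_saved_share * loaded_cost_per_person   # Rs 400,000
capacity_freed = value_per_person * team_size                   # Rs 4,000,000 per year

print(f"Capacity freed: Rs {capacity_freed:,.0f} per year")
```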
For revenue-based use cases (personalised recommendations, sales emails, pricing suggestions) the approach is straightforward:
Baseline conversion rate: 3%
Post-AI conversion rate: 3.6% (from an A/B test)
Average order value: ₹5,000
Monthly traffic: 200,000 visitors
Baseline monthly revenue:
0.03 × 200,000 × ₹5,000 = ₹30,000,000
Post-AI monthly revenue:
0.036 × 200,000 × ₹5,000 = ₹36,000,000
Incremental revenue:
₹36,000,000 − ₹30,000,000 = ₹6,000,000 per month
≈ ₹72,000,000 per year (subject to confidence intervals and seasonality)
You then adjust this for gross margin to estimate the impact on profit, not just revenue.
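A short sketch of the same uplift calculation, including the gross-margin adjustment; the 40% margin is a hypothetical figure added for illustration.

```python
baseline_cr = 0.030        # baseline conversion rate
post_ai_cr = 0.036         # post-AI conversion rate (from the A/B test)
aov = 5_000                # average order value, Rs
monthly_visitors = 200_000
gross_margin = 0.40        # hypothetical margin, not from the example

incremental_monthly_revenue = (post_ai_cr - baseline_cr) * monthly_visitors * aov
incremental_annual_revenue = incremental_monthly_revenue * 12
incremental_annual_profit = incremental_annual_revenue * gross_margin

print(f"Incremental revenue: Rs {incremental_annual_revenue:,.0f} per year")
print(f"Incremental gross profit: Rs {incremental_annual_profit:,.0f} per year")
```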
7.3 Risk and quality improvements.
For risk use cases (e.g. AI assisting with contract review, fraud detection, or compliance checks), estimate the reduction in expected loss:
Expected loss = Likelihood of an incident × Monetary cost of the incident
Suppose AI-assisted review cuts critical errors in high-severity contracts from 5 per year to 2 per year, and the average high-severity contract incident costs you ₹10,000,000 in penalties, rework, and lost opportunity. Then:
Baseline expected loss: 5 × ₹10,000,000 = ₹50,000,000
Post-AI expected loss: 2 × ₹10,000,000 = ₹20,000,000
Your risk-adjusted benefit is therefore:
₹30,000,000 per year
It may feel fuzzy, but regulators and risk teams think this way all the time. You can, too.
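The expected-loss arithmetic can be captured the same way, using the contract-review figures above.

```python
def expected_loss(incidents_per_year: float, cost_per_incident: float) -> float:
    """Expected loss = likelihood/frequency of incidents x cost per incident."""
    return incidents_per_year * cost_per_incident

cost_per_incident = 10_000_000                    # Rs per high-severity contract incident
baseline = expected_loss(5, cost_per_incident)    # Rs 50,000,000 per year
post_ai = expected_loss(2, cost_per_incident)     # Rs 20,000,000 per year

risk_adjusted_benefit = baseline - post_ai        # Rs 30,000,000 per year
print(f"Risk-adjusted benefit: Rs {risk_adjusted_benefit:,.0f} per year")
```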
Generative AI projects have a different cost structure from traditional automation.
They are often front-loaded, at least in the early stages of use.
As a rough pattern, people and change costs are often as large as, or larger than, the pure technology cost. Change management is the biggest bottleneck: Microsoft CEO Satya Nadella has noted that the hardest part of AI is not the technology but getting people to change how they work (Business Insider).
In your ROI model, include explicit line items for:
That way, nobody is surprised later.
Let's walk through a simplified case: an AI coding assistant for your engineering team.
50 developers
Average loaded cost per developer: ₹3,000,000 per year.
Total engineering cost per year: ₹150,000,000.
Time spent on mechanical tasks (boilerplate, tests, refactoring): 35%
You run a pilot:
The assistant is rolled out to 10 developers for 8 weeks.
You compare them against a control group of 10 similar developers.
Results:
The pilot group ships 25% faster with no increase in defect rate.
Subjective feedback: less boilerplate work, more time on design and reviews.
Assume the assistant reclaims 15% of developer time over a full year (not 25%, to allow for noise and ramp-up):
Time value per developer:
15% × ₹3,000,000 = ₹450,000
Across 50 developers:
₹450,000 × 50 = ₹22,500,000 of potential annual capacity.
You may decide that:
Half of that is realised as hard cost avoidance (fewer contractors, less overtime, fewer delays).
Half is realised as throughput – more features, a faster roadmap – which links indirectly to revenue.
So you might conservatively claim ₹11,250,000 of hard savings and treat the rest as strategic / throughput benefit.
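A quick sketch of the capacity calculation and the hard-savings split, using the case-study figures.

```python
developers = 50
loaded_cost = 3_000_000    # Rs per developer per year
time_reclaimed = 0.15      # conservative 15%, rather than the 25% observed in the pilot

annual_capacity = developers * loaded_cost * time_reclaimed   # Rs 22,500,000
hard_savings = annual_capacity * 0.5                          # Rs 11,250,000 claimed as hard savings
strategic_value = annual_capacity - hard_savings              # remainder treated as throughput / strategic

print(f"Capacity: Rs {annual_capacity:,.0f}; hard savings: Rs {hard_savings:,.0f}")
```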
Code productivity gains of this kind are already visible across the industry: Nadella has said that up to 30% of Microsoft's code is now written by AI, and that this share will keep growing (New York Post). That gives you external confirmation that your assumptions are not outlandish.
Assume:
AI tool licences and API costs: ₹6,000,000 per year.
Implementation and integration (one-time): ₹4,000,000.
Training / change management (year 1): ₹2,000,000.
Total first-year cost:
₹6,000,000 + ₹4,000,000 + ₹2,000,000 = ₹12,000,000
Using only the hard savings (₹11,250,000):
ROI = (Benefits − Costs) / Costs
= (₹11,250,000 − ₹12,000,000) / ₹12,000,000
≈ −6.25%
If you stopped there, year one would look like negative ROI (which is typical of foundational projects).
But if you include:
The other ₹11,250,000 of capacity as strategic and throughput value
The fact that one-time costs fall away in year 2 (integration is not repeated)
then the picture changes. In year two, assuming costs drop to:
Licences / API: ₹6,000,000
Ongoing training and monitoring: ₹1,000,000
Total: ₹7,000,000
Benefits (still ₹11,250,000 in hard savings):
ROI (Year 2) = (₹11,250,000 − ₹7,000,000) / ₹7,000,000
≈ 60.7%
This is a far more realistic way to express ROI: multi-year, with a clear distinction between foundational investment and steady-state returns.
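The two-year view is easy to reproduce. The sketch below reuses the same ROI formula and the cost figures above, counting only the hard savings as benefits.

```python
def roi(benefits: float, costs: float) -> float:
    return (benefits - costs) / costs

hard_savings = 11_250_000   # Rs per year, from the capacity calculation

year1_costs = 6_000_000 + 4_000_000 + 2_000_000   # licences/API + integration + change management
year2_costs = 6_000_000 + 1_000_000               # licences/API + ongoing training and monitoring

print(f"Year 1 ROI: {roi(hard_savings, year1_costs):.1%}")   # about -6.25%
print(f"Year 2 ROI: {roi(hard_savings, year2_costs):.1%}")   # about 60.7%
```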
Most organisations do not have a single AI project; they have a portfolio:
Payback period: how long until you break even?
NPV: the net present value of the cash flows.
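Payback period and NPV are easy to sanity-check in a few lines. The cash flows below are illustrative placeholders, not numbers from the case study; year 0 is the upfront investment and later entries are net annual benefits.

```python
def npv(rate, cash_flows):
    """Net present value, where cash_flows[0] occurs today (year 0)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

def payback_period(cash_flows):
    """First year in which cumulative cash flow turns non-negative (None if never)."""
    cumulative = 0.0
    for t, cf in enumerate(cash_flows):
        cumulative += cf
        if cumulative >= 0:
            return t
    return None

# Illustrative project: Rs 12M invested now, Rs 6M net benefit in each of the next three years
flows = [-12_000_000, 6_000_000, 6_000_000, 6_000_000]
print(f"NPV at a 10% discount rate: Rs {npv(0.10, flows):,.0f}")
print(f"Payback period: {payback_period(flows)} years")
```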
Benchmarking helps you sanity-check your numbers. One industry analysis of generative AI ROI, for example, found leading financial services firms realising direct value of USD 3.50-8.00 per dollar invested, through operational efficiency and personalised experiences (NextBuild).
You do not have to hit those figures, but if your model suggests you will never even make ₹1.10 back for every ₹1 you put in, it may mean:
Whenever we help teams reason about ROI, we see the same mistakes repeated over and over.
“The AI saves everyone 20% of their time, so we saved 20% of payroll.”
In reality:
Time saved only has value when it is reallocated to higher-value work or avoids an expense (e.g. a hire, overtime, vendor spend).
Be specific: what will you avoid spending, or stop doing, because of that freed capacity?
Halving handle time sounds great, until you realise:
Error rates doubled.
Complaints increased.
Regulatory risk went up.
That is why quality and safety KPIs belong front and centre on your ROI dashboard, not in the appendix.
It is easy to assume that everyone in the target population will use your AI feature every day.
In practice:
Some users will not trust it at first.
Some managers will resist changing their processes.
Some workflows are not yet a good fit for current model quality.
A more realistic approach:
Model adoption in stages (e.g. 20% in Q1, 40% in Q2, 60% in Q3).
Combine usage logs with survey data to tighten your assumptions, as in the short sketch below.
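A minimal sketch of how staged adoption discounts the headline benefit. The quarterly adoption rates follow the staging above (with Q4 assumed to hold at 60%), and the full-adoption benefit figure is a hypothetical placeholder.

```python
full_adoption_annual_benefit = 12_000_000      # Rs, hypothetical benefit at 100% adoption
quarterly_adoption = [0.20, 0.40, 0.60, 0.60]  # Q1-Q4; the Q4 level is an assumption

# Each quarter contributes a quarter of the annual benefit, scaled by that quarter's adoption
realised_benefit = sum(adoption * full_adoption_annual_benefit / 4
                       for adoption in quarterly_adoption)

print(f"Year-1 realised benefit: Rs {realised_benefit:,.0f}")  # Rs 5,400,000 vs the Rs 12,000,000 headline
```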
Examples:
Support time savings and engineering time savings both roll up to the same headcount line.
Marketing's revenue uplift and sales' revenue uplift claim the same deals.
Be strict about where the value is recognised on the P&L and make sure every rupee is counted only once.
Sometimes the biggest win of an AI project is not its immediate ROI but:
Building clean, well-governed data pipelines.
Creating reusable templates and prompts.
Upskilling your people to work effectively with AI.
These are platform investments: they may not pay back within a single project, but they compound as you layer on more use cases.
However sophisticated the models, generative AI is ultimately about people: it should make them more efficient and free them to focus on uniquely human judgement and creativity.
McKinsey's research on AI in the workplace suggests that AI can unlock new levels of productivity and creativity for people rather than simply automating them away (McKinsey & Company). In the same spirit, Satya Nadella has said that AI will not replace humans, but will change how they work and lead (Business Insider).
As you build the business case, it helps to tell that story explicitly:
What will people stop doing (monotonous, low-value work)?
What will they start doing (strategy, experimentation, relationship-building)?
How will this reshape the employee value proposition and help with retention and attraction?
These are not “soft” benefits. Over time, organisations that weave AI into how people actually work will speed up, while those that treat AI as a side project will fall behind.
A practical checklist you can use tomorrow.
If you are about to green-light or revisit a generative AI project, here is a quick checklist you can use right now:
Can we state the business problem in a single sentence?
Which metric do we want to move?
Do we have 3-6 months of baseline data?
Have we mapped the current process end to end?
Which financial, productivity, quality, experience, and adoption KPIs will we track?
How often will we review them?
Do we have a control group, or at least a convincing before/after comparison?
What are the pilot duration and sample size?
How will we convert metric changes into rupees/dollars?
How much of the time savings do we treat as hard savings versus capacity?
Have we included both one-time and ongoing costs?
Are training and change management explicitly budgeted?
How are we tracking hallucinations, bias, and non-compliance?
Who owns the system after go-live?
If the pilot succeeds, where else can we apply the same pattern?
How do this project's ROI and strategic value compare with our other AI projects?
If you can answer these questions, you are already ahead of most of the market.
Generative AI has reached the point where the technology is maturing and adoption is gaining momentum. In McKinsey's 2023 AI survey, roughly one-third of organisations said they were already using generative AI in at least one business function, less than a year after most of the tools launched (McKinsey & Company). And further research suggests generative AI has only begun to open up the productivity frontier (McKinsey & Company).
For individual organisations, however, it all comes down to one thing: disciplined measurement.
Or, in Andrew Ng's “new electricity” analogy: AI is powerful, but you still have to wire the building, install the meters, and decide which machine to switch on first.
That is how we think about generative AI at TAV Tech Solutions: not as a buzzword, but as an investment category that gets the same rigour as any other capital decision, with a bit more imagination about what is possible.
Build that discipline today and you will not just be experimenting with generative AI, you will be generating returns on it.
At TAV Tech Solutions, our content team turns complex technology into clear, actionable insights. With expertise in cloud, AI, software development, and digital transformation, we create content that helps leaders and professionals understand trends, explore real-world applications, and make informed decisions with confidence.
Content Team | TAV Tech Solutions
Let’s connect and build innovative software solutions to unlock new revenue-earning opportunities for your venture