AI isn't about cost cutting, and it will be empowering for individuals
It's not 2022 any more; I don't think our regulators and governments have noticed, and so I don't understand how to comply with the latest standards
This week, the Department of Industry, Science and Resources released the Voluntary AI Safety Standard and asked for feedback from the general public. Lots of very smart people in government, academia, and industry contributed to it. It's well written, clear, and easy to understand. It makes more sense than ASIC deciding to study whether the 81st-best model is good at summarisation. Have a look at the DISR page yourself.
https://www.industry.gov.au/publications/voluntary-ai-safety-standard/10-guardrails
I should be excited and happy. But I've spent the last few days wondering why I have a problem with it.
Background: I lecture in computer science (in AI), I run a small business myself, and I have consulting clients running the gamut from tiny to multinationals. If anyone is in a position to do something with these standards, it would be me. But I can’t see how I will be able to comply with it long term, and I don’t see any of my clients complying either.
I think the fundamental problem is that I have some very different assumptions about how AI will be deployed in Australia in the future.
The Safety Standard was written with the idea that AI will be centrally controlled and primarily run the way AI projects have traditionally been run. There also seems to be an undercurrent through all the examples that AI will be used for cost reduction.
My view is that the future is actually going to be about individually-controlled mass personalisation and customisation. AI for cost cutting only is a strategy (and not a particularly good one). AI as a tool for value creation is a better strategy, and one that will out-perform the cost-cutting-only crowd.
I should be more humble about it, but I'm pretty sure I'm right and the cohort that wrote the AI safety standard is wrong.
In 1440, were people expecting printing presses to be owned by the church and royalty and used just to reduce duplication costs? That would have been a reasonable thought at the time, because bishops and kings were the only ones who commissioned large printing jobs — but it falls apart when you realise that Johann Fust was a (wealthy) commoner who could pay for the invention on his own. Likewise, AI as a centrally-managed cost-reducer flies in the face of the evidence of what people use it for.
Rules and Regulations
When I think about some of the guidelines, rules, and regulations that I have had to work with in my IT consulting career, the best ones are those that codified timeless good practice.
Sarbanes-Oxley enforced things like separation of duties, told us not to run day-to-day operations with admin privileges, and made us ensure that everything exceptional and privileged was logged.
PCI codified that you need to keep your software patched if it has a security vulnerability.
They both sound silly and trivial, but lots of people weren’t doing those things, and so guidelines for good practice were helpful.
The worst guidelines, rules, and regulations are those that codify a world that no longer exists.
Bill Burr, the author of NIST's original password guidance, has famously apologised for it. Rotating passwords in the presence of two-factor authentication isn't even seen as a worthwhile thing to do anymore.
There was a multinational I did some work for that swore by an obscure cybersecurity guidelines document which mentioned requiring strong passwords. From this they concluded that the cryptographically strong secure shell protocol (which can authenticate with public-private keys instead of passwords) had to be banned, and that the plaintext, insecure telnet protocol was how all computers should be accessed. How this was supposed to improve cybersecurity is anybody's guess, but it was codified in rules written in a past era, and they continued to comply long after it made no sense.
Good AI guidelines should have best practices for the world that we are going into, not the best practices for the world that's already in the past.
Can we guess what that future would look like?
Trying to live in the future
I always try to take on board Paul Buchheit’s advice to live in the future. So I have been doing an experiment for the last couple of months, forcing myself to think how I would do things if I had unlimited access to high quality AI. I have run up rather scary bills with OpenAI and Anthropic as a result!
Let me talk about two projects that have been taking up my time.
Reducing Karol’s accent
This is Karol Binkowski. He has a Polish accent, and I wanted to remove it from his videos to see if students do better if they can understand their lecturer more easily. He has put together a lot of video content — this is definitely not a task I wanted to do by hand.
But as I sat down to start writing a program to automate this task, I stepped back to wonder what an AI-first solution would look like.
If you just prompt a language model to process hundreds or even thousands of lines of tasks, it pretty quickly gets confused or makes a mistake. Even if it is able to comply with instructions 95% of the time, when you have a long sequence of instructions — 40 or more — the probability of it succeeding is pretty slim.
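To put a rough number on it, assuming each step fails independently: at 95% per-step reliability, a 40-step run succeeds only about one time in eight.

```python
# Chance that a 40-step sequence succeeds end-to-end, assuming each step
# independently succeeds 95% of the time.
p_step, n_steps = 0.95, 40
print(f"{p_step ** n_steps:.1%}")  # about 12.9%
```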
So I put together a framework for writing sequences and steps in those sequences. Each step has an action and a verification to make sure that it's worked correctly, and then some sort of sequencing to work out what to do based on what it's seen.
Here are the first couple of steps in the sequence for my accent reduction.
- Order: 1
- Action Prompt: List the project libraries; try to find one that matches the thing that I am calling `{LECTURE}`.
- Forced Action Function: `list_project_libraries`
- Verification Prompt: None
- Forced Verification Function: None
- Sequencing Prompt: If a library exists that matches the name of the lecture, then jump to "Select the multimedia library", otherwise, jump to "Create the multimedia library."
- Order: 2
- Action Prompt: Create a multimedia project library called `{LECTURE}` where `{LECTURE}` is a pseudonym as discussed elsewhere.
- Forced Action Function: `create_project_library`
- Verification Prompt: List the multimedia project libraries.
- Forced Verification Function: `list_project_libraries`
- Sequencing Prompt: If the multimedia project library just got created successfully, move on to "Select the multimedia library." Otherwise, stop with failure now.
...
I wouldn’t call it elegant, but it got the 22-step sequence done reliably on every video.
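For the curious, here is a rough Python sketch of the driver loop behind it. The step fields mirror the listing above; `llm.act` and `llm.decide_next` are hypothetical stand-ins for whatever chat-API wrapper you use, not functions from the actual framework.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Step:
    """One step: an action, an optional verification, and a sequencing
    decision that chooses the next step (or stops) based on what was seen."""
    name: str
    action_prompt: str
    forced_action: Optional[Callable[[], str]] = None
    verification_prompt: Optional[str] = None
    forced_verification: Optional[Callable[[], str]] = None
    sequencing_prompt: str = ""

def run_sequence(steps: dict[str, Step], start: str, llm) -> None:
    """Walk the model through the sequence one small step at a time."""
    current: Optional[str] = start
    while current is not None:
        step = steps[current]
        # Perform the action, forcing a specific tool call if one is required.
        action_result = llm.act(step.action_prompt, force_tool=step.forced_action)
        # Verify the action actually worked before deciding what happens next.
        verification = None
        if step.verification_prompt or step.forced_verification:
            verification = llm.act(step.verification_prompt,
                                   force_tool=step.forced_verification)
        # Ask the model, guided by the sequencing prompt, which named step
        # comes next; it returns None to stop with success or failure.
        current = llm.decide_next(step.sequencing_prompt,
                                  action_result, verification)
```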
Symmachus
For the Symmachus project I translated a little over eight thousand hitherto-untranslated Koine Greek documents.
Initially I tried to do this by scraping the Duke Database of Papyri one page at a time with an AI-controlled bot. But it was too slow for my liking and was also getting a little expensive.
So instead, I wrote (well, I got Anthropic Claude to write) a small Python program that iterated through every papyrus and manuscript and invoked ChatGPT on each document separately, rather than trying to have one prompt that covered everything. The prompt ended up like this:
Extract the Koine Greek from this document as best as possible; if there's any Latin, extract that too and then provide a translation into English. Output in HTML format
I didn't have to write a parser to decipher the HTML of the Duke Database website — I was able to sidestep that piece of programming with an LLM.
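For a sense of scale, here is a hedged reconstruction of the loop; the model name is a placeholder and the list of document URLs is left empty rather than guessing at the real paths.

```python
# Sketch of the per-document loop. The URL list and model name are
# placeholders; the real script differs in its details.
import requests
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = ("Extract the Koine Greek from this document as best as possible; "
          "if there's any Latin, extract that too and then provide a "
          "translation into English. Output in HTML format")

document_urls: list[str] = []  # fill with the papyrus pages to process

for url in document_urls:
    # Hand the raw page straight to the model: no hand-written HTML parser.
    html = requests.get(url, timeout=30).text
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": f"{PROMPT}\n\n{html}"}],
    )
    out_name = url.rstrip("/").split("/")[-1] + ".html"
    with open(out_name, "w", encoding="utf-8") as f:
        f.write(response.choices[0].message.content)
```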
Still, I was disappointed that I had to write some code at all, because I had wanted the project to be a demonstration to all the papyrologists and archaeologists out there of how to automate their work. Maybe I’ll try again next year and see if it is cost-effective and practical by then.
Y’all get a superpower
The observation to make is that prompting is a kind of substitute for programming.
I always tell people that learning how to program a computer is a superpower. If you can automate something, you don’t have to do it yourself. It can happen automatically. You have successfully tricked some dumb rocks into working for you. If that’s not a superpower, I don’t know what is.
That superpower is becoming available to everyone.
We have hints that show us dimly what that looks like already.
After two 45-minute lessons, an eight-year-old girl can use Cursor to build a web app with a colour scheme that only an eight-year-old girl would appreciate. (Skip to the end to see it.)
You can ask Anthropic Claude to write some quite sophisticated web apps for you and generally succeed.
But I don’t think that’s the future of end-user initiated automation. That’s just some minor debris from the collision of the future with the present.
The priesthood of all users
Let me try a concrete example. Here's what I expect out of customer relationship management systems in the next few years.
You are a salesperson and you have just made a video call to a client. The call has been transcribed. (That's a pretty tame prediction given that people are already doing it; I use fathom.video and krisp.ai for this today.)
The transcription will be passed over to the Customer Relationship Management system, where some processing will be done. There will be a system prompt that says something like:
Summarize the key points from this transcript. If there are any action items for other people in the organisation, add them to their to-do list.
There will be a company-level prompt that will say something like:
If the transcript suggests evidence of collusion, fraud or bribery, notify the integrity team.
And there's likely to be a per-user prompt. The salesperson might have a prompt that says something like:
When I refer to a client as being a dolphin, this means that they are a small client that may be viable if we have a number of similar clients — only together are they enough to make a whale worth chasing. Tag those clients with "Dolphin" and compare it to any other documents that have been tagged "Dolphin." If you find sufficient similarity in their needs with another client then give me a reminder follow-up for the other client that was similar.
Note that the salesperson in this scenario is programming an automation, but they would hardly recognise it as such. To them it would just feel like giving instructions to a PA.
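Under the hood, the CRM only has to stitch the three layers into a single model call. A minimal sketch, with every name invented rather than taken from any real CRM API:

```python
# Hypothetical glue code inside a CRM: combine the three prompt layers
# into one model call when a transcript arrives. All names are invented.
def process_transcript(transcript: str, llm,
                       system_prompt: str,   # vendor-level instructions
                       company_prompt: str,  # organisation-level rules
                       user_prompt: str) -> str:  # the salesperson's own rules
    instructions = "\n\n".join([system_prompt, company_prompt, user_prompt])
    # llm.complete stands in for whatever chat API the vendor wires in.
    return llm.complete(instructions=instructions, input=transcript)
```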
The old world: automations were under the control of the high priesthood of IT systems developers who would take the beseechments of the staff and determine what could and would be automated.
The new world: the ability to create additional automations using natural language will be a common requirement for any respectable enterprise software. For small to medium enterprises, software that doesn't have automation capability will quickly fall out of the market for lack of buyers. End users will describe processing that needs to be done.
This will happen even without new software
Large language models aren’t large. The largest open-source model easily fits on a USB thumb drive that you can buy at Officeworks. Ollama and AnythingLLM let you run these models for free, locally, so that there’s no security problem. Even today, every computer that you could possibly buy or be issued (even in the most archaic of IT departments) can easily run a small model like phi3 or gemma2. Nearly everyone can run something like a small llama3.1 (8 billion parameter) model on the laptop they have today.
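For example, once Ollama is installed and a model has been pulled (`ollama pull gemma2`), a local call is a single HTTP request to its default port. A minimal sketch:

```python
# Query a locally running model through Ollama's REST API.
# Assumes Ollama is on its default port and the model has been pulled.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "gemma2",  # or phi3, llama3.1, etc.
        "prompt": "Summarise this email in one sentence: ...",
        "stream": False,    # return one JSON object instead of a stream
    },
    timeout=120,
)
print(resp.json()["response"])
```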
Even if no further improvements happen to model distillation to allow smaller models, we are only a small hardware upgrade away from running almost every open-source model that’s available today. A mixed-mode model that can understand a screenshot and write an AutoHotkey script is current technology, and prompting a model like that is something users can do themselves, empowering the user to do the automations that they think are useful and necessary for their jobs.
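As a sketch of what that looks like with a hosted multimodal model today (the screenshot path, model name, and instruction are illustrative placeholders):

```python
# Send a screenshot to a multimodal model and ask for an AutoHotkey script
# that automates what is on screen. Path, model, and prompt are placeholders.
import base64
from openai import OpenAI

client = OpenAI()
with open("screenshot.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder for any vision-capable model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Write an AutoHotkey script that automates the "
                     "data-entry task shown in this screenshot."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```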
In the same way that no IT department will bother tracking how many windows you have open, no sane organisation will be tracking how many different prompts you have in your library to automate your day-to-day tasks. Maybe they could, but it would be overwhelming and pointless. Most of the prompts you write will be part of trivial AI systems that you used to solve one problem once. There will be so much automation happening that the big challenge of governance over the next five years will be creating a culture of encouraging people to share the successful and interesting prompts and automations that they have created.
We don’t train models around here any more
The world changed on November 30th, 2022. (Actually, it changes every day, but I’m trying to make a rhetorical point.) Prior to that, AI projects followed a very particular workflow that involved:
- Data collection
- Data curation
- Training a model
- Deployment
- Monitoring
It was a task that could only be done by experienced professionals, and a generation of students have come through universities doing data science degrees in order to fit into that world.
But in the last 12 months, that lifecycle of deploying AI solutions has died.
In my consulting business, the creation of bespoke models dried up, and everybody just uses large language models for everything. When I ask my academic colleagues, I find that most of them are of the opinion that there's no point in training a new model for anything unless you are a major vendor with a billion-dollar budget.
The major models have already been trained on all the public data that you could reasonably expect to be able to use in your own training. Their larger parameter counts mean that they can deliver accuracy that a pokey in-house solution in Australia will never be able to match.
There are a few exceptions to this.
Statisticians are still creating linear models. Sometimes I do some explainable non-parametric models. These are not for making predictions; they are for gaining insight into a data set. This used to be a large task. Now it's a trivial task if you ask ChatGPT to do it.
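The kind of thing ChatGPT now hands back when you ask for an insight model looks roughly like this; the dataset and column names are invented placeholders.

```python
# A small linear model fitted purely to understand a dataset, not to
# predict from it. The file and column names are invented placeholders.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("customers.csv")
model = smf.ols("spend ~ age + region + tenure", data=df).fit()
print(model.summary())  # coefficients, confidence intervals, R-squared
```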
Occasionally I see models being fine-tuned for some specific niche purpose: for example, a small-vocabulary model that needs to operate very quickly in a memory- or CPU-constrained environment.
In my research I'm training new models using radically new techniques to see if there are other ways of creating AI.
But I digress. For the vast majority of use cases, the giant models (ChatGPT, Claude, Gemini) already have internalised all public information and datasets, so you can get what you want simply by prompt engineering.
Prompt engineering isn't the purview of a "priesthood" of tech folks: lots of people have learned how to do prompt engineering. Lake Macquarie Council just launched a traineeship for staff who want to learn, because they can’t afford the salaries that computer science students are asking.
Where before we would have needed specialised, technically trained people to program something to interact with an ML model, we now see:
- Individuals make their own systems by prompting AI to do the task.
- Automation of that is done through things like Zapier or n8n or equivalent.
So AI systems will proliferate and be created by individuals.
Commercially, individuals who do that will be rewarded because of their efficiency, and if we insist on saying that particular guardrails and deployment practices and procedures must be observed through a centralised process, then:
- We will remove dynamism and creativity from that economy or company;
- and/or people will just ignore the guardrails.
People respond to incentives. The organisations with incentive structures that hold back staff from using AI will, *gasp*, be less efficient and successful than those organisations that encourage their staff to use AI to be efficient and to create new products and services.
Rules of thumb
Putting those last two sections together, I have two quick rules-of-thumb for identifying irrelevant discussions that are carry-overs from pre-2022:
It is irrelevant if it talks about "monitoring the input data sets" (no-one in Australia will be training significant models) or “addressing bias in training data”.
It is irrelevant if it assumes that deployment of an AI system is a large organisationally-visible centrally-managed activity which can be put through a careful process.
These are the ideas that people are still clinging to, because they learned about them pre-2022 and they seemed important then. They don’t really apply any more, and they will apply even less in the future.
AI Safety Standard
Which lets me get back to the AI Safety Standard. When I apply my rules-of-thumb for irrelevance to the principles, the result isn’t great:
- #1 "Establish, implement and publish an accountability process" — this assumes accountability at the leadership level, when most AI usage, development and monitoring will be at the individual level. If there is a process and it conflicts with the incentives staff see, it will be ignored.
- #2 "Establish and implement a risk management process" — there’s that word “process” again. Good luck with that. It assumes that there is a controlled list of AI systems and that there is a formal commissioning process for new AI capabilities. You can want to have this, you can even establish the process, but it won’t actually happen because most new AI capabilities will arise from staff spontaneously doing things.
- #3 “Protect AI systems and implement data governance systems” — this talks about how to manage the input data for training. Since there won’t be any model training, this is irrelevant.
- #4 “Test AI models and systems to evaluate model performance” — this assumes that deployment is a large and visible activity which is distinct from the day-to-day operational tasks of staff optimising their jobs. Making sure there are tools for evaluating the performance of different prompts is a great idea: I hope we see that in lots of applications. Metrics from that will be very useful. I also predict that not many people will use that functionality, because an AI-powered automation that you create for yourself only has to pass your own evaluation criteria, which are likely to be very loose. You can suggest to people that they be more careful in their evaluations, but if the incentives are to get more done faster, that will get dropped very early.
- #5 “Enable human control or intervention in an AI system” — this is actually sort-of OK, although there are footnotes talking about "procurement". But it's not really an AI thing: that's to do with having accountability of outcomes, which would be a true and necessary thing with or without AI involvement.
- #6 “Inform end-users regarding AI-enabled decisions” — we can legislate the transparency part if we feel like it, but it will devolve into a standardised email footer attached to everything that says "AI may have contributed in whole or in part to this response." This seems a bit pointless.
- #7 “Establish processes for people impacted” — this is a necessity for systems with or without AI. So this isn’t an AI principle. It’s just good systems design, even if you implemented the system using teams of clerks.
- #8 “Be transparent across the AI supply chain” — this assumes a distinction between developer and deployer, and that deployment is an organisationally-visible activity. Sometimes that distinction will exist — we’ll still buy solutions from vendors — but a lot of the time the deployment of a new AI capability will be an end-user updating a prompt in a text box in a system.
- #9 “Keep and maintain records to allow third parties to assess compliance with guardrails” — this fundamentally assumes that all AI systems are organisationally-visible. If AI systems are being created, shared, and modified by end-users, and their individual value is low, then who is going to care enough to assess compliance for the vast majority of AI systems?
- #10 “Engage the stakeholders” — again, it assumes that AI systems are organisationally-visible, that there are identified stakeholders for each AI system and that those stakeholders are distinct from the owners and developers of those systems. Sometimes this will be true, of course. But I suspect that it will be far from universal.
It feels like this is a case study in why you shouldn't continue to use pre-2022 thinking about AI: 8/10 don’t look to be relevant, and the remaining 2/10 that are relevant apply regardless of whether AI is involved or not. I hope this isn’t going to be made mandatory — it will just be a major time sink that doesn’t achieve much.
What should be done?
If the Department of Industry, Science and Resources wants to provide guidance on the use of supervised machine learning (which is what the pre-2022 thinking matches), then the 10 guidelines will be fine. They could make them mandatory if they want: the number of supervised machine learning projects in Australia today is so low that it would have no discernible effect on the economy.
But let’s not pretend that this is relevant for 2024-2029 AI. I’ve tried to put together a list of things that DISR could work on that would be relevant and helpful. This isn’t a comprehensive list, and obviously it’s not as well thought out as what a large team of thinkers could put together.
Mandatory separation of concerns for AI
An AI can’t own a bank account, but it can have exclusive control over a cryptocurrency wallet.
An AI system that can arrange for human beings to do things and then pay for their services is a dangerous combination.
It’s dangerous from an AI alignment perspective. It’s also dangerous from a justice perspective.
AI can’t be held responsible for anything. The normal corollary of that is that AI can’t own or control assets; if AI has control over assets directly, and autonomy to act as it sees fit with those assets, then it has authority without responsibility. That never ends well.
If something terrible happens, we need to be able to trace the initiation of an activity back to a human being so that we can hold them responsible for the outcome. If the chain of command leads back to an AI and no further, there may be no way of achieving justice.
Setting up a system that can lead to that sort of injustice should be a crime, even if no terrible outcome has happened as yet.
In the same way that we outlaw precursor chemicals to dangerous drugs and have laws against conspiring to commit treason, we need laws that prevent a single AI system from being set up with both capabilities (command and payment), especially if there's no human oversight.
Labour frameworks to handle changes of duties
Our best estimates for present-day AI are that it gives white-collar workers a 25% productivity improvement. That sounds good! We need that!
But put another way, it means that 25% of white-collar workers' duties at work are likely to change radically over the next few years. That doesn’t guarantee that Fair Work would consider it a constructive dismissal, but it could well be a significant factor.
We urgently need a framework for allowing employers and employees to renegotiate contracts and job descriptions that will arise from this very large disruptive change.
Worker protections for automation
We need to codify regulations so that if an employee has found a way to automate a large chunk of their job, they cannot be summarily dismissed. The onus should be on the employer to find ways to redeploy the employee who has automated their job out of existence, or otherwise to keep paying them at the same rate as their previous (now automated) role.
Otherwise we will find that employees will be reluctant to optimise and Australia's appalling productivity growth will continue.
Conclusion
This was a much longer essay than I planned, but I guess I’m making good progress on my book on the next five years of AI.
I also wrote it in a hurry, so that there’s enough time for a bit of a debate to play out before DISR finalises any proposals. I fully expect that I’ve made some errors of logic or left some non sequiturs in here.
What have I missed? What’s wrong with my vision of how AI will play out? Have I unreasonably dismissed the 10 guidelines?