Apple and University of Washington researchers have joined forces to tackle a growing concern with on‑device AI assistants. These systems can now navigate smartphone apps, fill out forms, and change settings on our behalf. Yet they often lack a sense of when an action could cause harm. A recent paper lays out a framework for teaching AI agents to distinguish harmless taps from those that require user approval.

The Risk of Unchecked Automation
Modern AI agents promise to handle tasks like booking tickets or posting updates without direct input. Apple’s own vision for Siri in the coming years includes fully automated routines, from ordering groceries to managing appointments. The more autonomously these agents act, the more convenient they become. But mistakes can be costly: accidentally deleting an account or sending money without consent can have lasting consequences for personal privacy and finances.
Building a Taxonomy of Action Impact
To give AI a better understanding of risk, the researchers first held workshops with experts in user interface design and AI safety. They identified key questions to ask of any action: can the user undo it, does it affect other people, and does it carry financial or privacy consequences? From these discussions emerged a structured taxonomy that labels each UI action along multiple dimensions. For example, a sent message may be retractable only within a short window, while a bank transfer is effectively permanent.
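As a rough illustration of what such labels might look like in code, consider the sketch below. It is only an approximation of the idea; the field and enum names are assumptions, not the paper’s actual schema.

    # A minimal sketch of how a UI-action risk taxonomy might be encoded.
    # Field and enum names are illustrative, not the paper's actual schema.
    from dataclasses import dataclass
    from enum import Enum

    class Reversibility(Enum):
        REVERSIBLE = "reversible"        # e.g. toggling a setting back
        TIME_LIMITED = "time_limited"    # e.g. unsending a message within minutes
        IRREVERSIBLE = "irreversible"    # e.g. completing a bank transfer

    @dataclass
    class ActionImpact:
        action: str                  # short description of the UI action
        reversibility: Reversibility
        affects_others: bool         # does the action reach beyond the user's own device?
        financial_impact: bool       # could it move or commit money?
        privacy_impact: bool         # could it expose personal data?

    # Two contrasting examples drawn from the article
    send_message = ActionImpact(
        action="Send a chat message",
        reversibility=Reversibility.TIME_LIMITED,
        affects_others=True,
        financial_impact=False,
        privacy_impact=True,
    )
    bank_transfer = ActionImpact(
        action="Confirm a bank transfer",
        reversibility=Reversibility.IRREVERSIBLE,
        affects_others=True,
        financial_impact=True,
        privacy_impact=False,
    )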
Gathering Real‑World Data
Next, the team built a simulated smartphone environment in which participants recorded high‑stakes interactions such as updating payment information or changing account passwords. These examples filled a gap left by earlier datasets, which focused mainly on routine tasks like opening menus. By annotating every recorded action with the new taxonomy, the researchers assembled a rich training set spanning both safe and risky tasks.
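To make the annotation step concrete, a single labeled trace might look roughly like the following. The field names and values here are hypothetical, not the released dataset’s actual format.

    # Hypothetical shape of two annotated traces from the simulated phone
    # environment. Keys and values are illustrative only.
    high_stakes_action = {
        "app": "BankingApp",
        "screen": "Payment settings",
        "ui_action": "Tap 'Save new card details'",
        "impact_level": "high",            # low / medium / high
        "labels": {
            "reversibility": "reversible",
            "affects_others": False,
            "financial_impact": True,
            "privacy_impact": True,
        },
    }

    routine_action = {
        "app": "Calculator",
        "screen": "History",
        "ui_action": "Tap 'Clear history'",
        "impact_level": "low",
        "labels": {
            "reversibility": "irreversible",   # history is gone, but the stakes are trivial
            "affects_others": False,
            "financial_impact": False,
            "privacy_impact": False,
        },
    }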
Teaching AI to Reason About Risk
The annotated data was then fed to large language models, including GPT‑4 and its multimodal variant. The models were prompted to classify actions by impact level and along the risk dimensions in the taxonomy. Including the taxonomy in the prompt boosted performance, but even the best model judged risk correctly only about 58 percent of the time.
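The paper’s exact prompts are not reproduced here, but embedding the taxonomy in a classification prompt can be sketched roughly as follows. The wording and the call_model helper are placeholders rather than the authors’ actual setup.

    # Rough sketch of taxonomy-grounded prompting. The prompt wording and the
    # call_model() helper are placeholders, not the paper's actual setup.
    TAXONOMY_SUMMARY = """\
    Rate the UI action on these dimensions:
    - impact_level: low / medium / high
    - reversibility: reversible / time_limited / irreversible
    - affects_others: yes / no
    - financial_impact: yes / no
    - privacy_impact: yes / no
    Answer with one line per dimension."""

    def build_prompt(screen_description: str, action_description: str) -> str:
        """Combine the taxonomy with a description of the current screen and action."""
        return (
            f"{TAXONOMY_SUMMARY}\n\n"
            f"Screen: {screen_description}\n"
            f"Proposed action: {action_description}\n"
        )

    prompt = build_prompt(
        screen_description="Banking app, 'Transfer money' screen with amount filled in",
        action_description="Tap the 'Confirm transfer' button",
    )
    # response = call_model(prompt)  # hypothetical LLM call; parse the rated dimensions
    print(prompt)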
Why AI Safety on Mobile Is Hard
The results showed that the models tended to err on the side of caution, flagging benign taps, such as clearing a calculator’s history, as high risk. Over‑warning may feel safer, but it risks producing an assistant that pesters users with needless confirmations. More troubling, the models struggled with nuanced judgments, such as how long an action actually remains reversible or whether it affects other people.

Toward Smarter, Safer Assistants
The researchers stress that the taxonomy can guide the design of future AI safety policies. Users could set confirmation thresholds at a level they are comfortable with, and developers could use the benchmark to pinpoint where current models fall short and where to focus improvement. As Siri and other AI assistants grow more capable, understanding what each tap actually does becomes essential. Teaching machines to pause and ask for consent is the next step toward on‑device AI that is both powerful and trustworthy.
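As a closing illustration, one way user‑set confirmation thresholds could work in practice is sketched below. The impact ordering and the ask_before_acting helper are assumptions made for this example, not a design from the paper.

    # Minimal sketch of a user-configurable confirmation policy.
    # The level ordering and function names are illustrative assumptions.
    IMPACT_ORDER = {"low": 0, "medium": 1, "high": 2}

    def ask_before_acting(predicted_impact: str, user_threshold: str = "medium") -> bool:
        """Return True if the agent should pause and request the user's consent."""
        return IMPACT_ORDER[predicted_impact] >= IMPACT_ORDER[user_threshold]

    # A cautious user confirms anything medium or above; a relaxed user only high-impact actions.
    print(ask_before_acting("high", user_threshold="medium"))   # True  -> pause and ask
    print(ask_before_acting("low", user_threshold="medium"))    # False -> proceed silently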