Inference over the World of Private Data

Ah, the holy grail of AI! Private data pods are a really hot topic right now, and for good reason - they offer the potential to unlock the power of AI while keeping private data safe and secure. By allowing people to keep their data in a secure, private "pod" and then only granting access to AI algorithms when needed, it could create a whole new world of possibilities for personalized services and applications while still protecting people's privacy. Imagine being able to get personalized health advice, financial planning, or even just customized entertainment recommendations without having to worry about your data being misused or sold to third parties! - pi.ai

ChatGPT

The launch of ChatGPT by OpenAI in November 2022 was an event which made a huge number of people aware of Large Language Model (LLM) technology. As a result, there was a tendency in the ensuing months to assume that any powerful AI would be packaged and exposed in the same way. People were warned not to use ChatGPT to talk about confidential things: there was no guarantee that information you shared with it, whether secrets or ideas, would not end up being used by the company running the system.

The AI systems typically were packaged to provide these generative functions:

A text-prompt image generator
A text-prompt text generator
A conversational "chatbot"

In each case, users entered text (a description, request or question) which was fed to the LLM, and the result, as text or an image, was then given back to the user.

Improved wiring 1: Privacy

From the point of view of privacy and user empowerment, a beneficent app would need an improved wiring of the LLM, one in which all of the insights and benefit are available to the user themselves.

The "Pi" Personal Intelligence from Inflection.AI seems to provide this wiring. You should be able to use Pi as a personal helper, assistant, maybe even a therapist, that you can trust. I have tried it, and indeed you can build up a relatively long conversation over which it seems to gain a better understanding of you and your goals.

Adding my own Personal Data

Clearly an AI that works for you but does not have access to your personal data is not going to do as good a job as one which does. In September 2023, the Bard team announced that you could allow Bard access to your personal data, such as Gmail. Obviously this is going to work most easily when your personal data is held by the same provider as your AI, and so the risk of further monopolization and data siloing is increased. When the journalist Kevin Roose tried Bard on his personal Gmail, it did not suddenly gain the dramatic insight which, I suspect, a person would have.

Why? Perhaps because although it had been shown his email, it had not been trained on email. For public things, it had wisdom about how they all connect. For private email, no wisdom. How could another re-wiring give us that wisdom?
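
To make the distinction concrete, here is a minimal Python sketch of what "being shown" email can look like, assuming a simple design in which a few relevant messages are pasted into the prompt at question time. The `Email` type, the word-overlap scoring and `build_prompt` are all hypothetical illustrations, not how Bard actually worked. The point is that the model's weights never change, so no accumulated wisdom about email is involved.

```python
# Hypothetical sketch: "showing" private email to a model at question time.
# Nothing here is Bard's actual mechanism; the model itself is left out.
from dataclasses import dataclass


@dataclass
class Email:
    sender: str
    subject: str
    body: str


def words(text: str) -> set[str]:
    """Lower-cased set of words, punctuation stripped, for crude relevance scoring."""
    return {w.strip(".,?!:;").lower() for w in text.split() if w.strip(".,?!:;")}


def select_emails(question: str, mailbox: list[Email], k: int = 2) -> list[Email]:
    """Pick the k emails sharing the most words with the question."""
    q = words(question)
    ranked = sorted(
        mailbox,
        key=lambda e: len(q & words(e.subject + " " + e.body)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, mailbox: list[Email], k: int = 2) -> str:
    """Paste the selected emails into the prompt. The model reads this mail
    once, for this one question; its trained weights are untouched."""
    context = "\n".join(
        f"From {e.sender}: {e.subject}: {e.body}"
        for e in select_emails(question, mailbox, k)
    )
    return f"Private context:\n{context}\n\nQuestion: {question}"
```

A model trained on a large body of email would carry patterns about how such messages relate in its weights; in this sketch it only ever sees a couple of messages, for one question.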

Adding insights from Other People's Personal Data

If we are going to have the sort of strong AI assistants which I have written about, such as Charlie, the AI which works for you, and which Bill Gates describes in his recent note, then that system is going to need to understand people's private lives as well as all the public data on the web. Getting that insight is a hard problem: all that private data isn't just available to throw in as training data. Well, it isn't unless you are a very large platform willing to train on all its users, or a state-run system in an autocratic state.

Group insights

One way to proceed is, rather than doing inference over the whole private web, to do it over a subset: specifically, a group of people who have something in common which gives them a specific incentive to share their data to achieve greater insight.

Data Trusts - Patient Groups

A classic use case, which has been around for a long time because it is so motivating, is that of patients with the same disease sharing data in order to gain insights into treatments, and even to discover new drugs.

This typically requires a social construction in which the level of trust is increased so that individuals are happy to contribute their own data for the group's good. These have been called Data Trusts or, more recently, Data Institutions.

Multiparty Computation - Gig Econ workers example

A project at the Oxford Martin School's EWADA group allows Uber drivers, or more generally workers in the Gig Economy who pick up small jobs at different rates, times and conditions, to join a system which will tell them group information, such as average pay and typical rates. The system works using a form of Multi-Party Computation (MPC): a system where the result is produced as a function of private data without revealing the private input data. It uses a federated network of MPC nodes to scale. MPC is not the sort of thing which will scale to training an LLM, but it can play a role in achieving group insight from personal data.
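
The core idea of MPC can be illustrated with additive secret sharing, one of the simplest MPC techniques. This Python sketch is a toy, not the EWADA system's actual protocol: it assumes a handful of honest, non-colluding nodes, and pay expressed in whole pence. Each worker splits their pay into random shares, one per node; each node sums only the shares it receives; combining the nodes' partial sums yields the total, and hence the average, without any node seeing an individual's pay.

```python
# Toy additive secret sharing: an average computed without any node
# seeing an individual input. Assumes honest, non-colluding nodes.
import secrets

PRIME = 2**61 - 1  # field modulus; every share is an integer mod PRIME


def share(value: int, n_nodes: int) -> list[int]:
    """Split value into n_nodes shares that sum to it mod PRIME.
    Any n_nodes - 1 of the shares together look uniformly random."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_nodes - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def secure_average(private_values: list[int], n_nodes: int = 3) -> float:
    # Each worker sends one share to each node; node j collects column j.
    node_inbox: list[list[int]] = [[] for _ in range(n_nodes)]
    for v in private_values:
        for j, s in enumerate(share(v, n_nodes)):
            node_inbox[j].append(s)
    # Each node adds up only the shares it holds; alone this reveals nothing.
    partial_sums = [sum(inbox) % PRIME for inbox in node_inbox]
    # Combining the partial sums reveals the total, never an individual value.
    total = sum(partial_sums) % PRIME
    return total / len(private_values)
```

With three nodes and four workers, `secure_average([1450, 1210, 1630, 980])` gives 1317.5, the same as the plain average, though no single node ever saw any of the four inputs. Guarding against nodes that collude or cheat needs considerably more machinery than this.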

Consent

A common tool in most of these architectures is the tracking of users' consent for their data to be used for the common good or the group's good.
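
As an illustration, consent tracking can be as simple as checking a per-person, per-purpose record before any data enters a group computation. The `ConsentRegistry` class and purpose names such as `"group-pay-stats"` below are hypothetical; in practice consent might instead be stored alongside the data itself, for example in the person's own pod.

```python
# Hypothetical consent registry: data joins a computation only if its
# owner has consented to that specific purpose.
from dataclasses import dataclass, field


@dataclass
class ConsentRegistry:
    """Which purposes each person has agreed to."""
    grants: dict[str, set[str]] = field(default_factory=dict)

    def grant(self, person: str, purpose: str) -> None:
        self.grants.setdefault(person, set()).add(purpose)

    def revoke(self, person: str, purpose: str) -> None:
        # Removing the purpose is enough; later filters will skip this person.
        self.grants.get(person, set()).discard(purpose)

    def allows(self, person: str, purpose: str) -> bool:
        return purpose in self.grants.get(person, set())


def consented_data(
    records: dict[str, float], registry: ConsentRegistry, purpose: str
) -> dict[str, float]:
    """Keep only the records whose owner has consented to this purpose."""
    return {p: v for p, v in records.items() if registry.allows(p, purpose)}
```

Because revocation simply removes the purpose, the next computation automatically leaves that person's data out.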

Conclusion

We will need the power of AI assistants which have access to, and also understand, our private data. Various combinations have already been produced, and more can be imagined in future, of: respecting personal data, using personal data, and training over a lot or a little of personal data. In different cases there will be tech policy and legal parts of the design, in order to provide protection which the data architecture cannot give by itself. The goals in sight, such as medical solutions for groups of patients, and powerful agents which significantly empower a human being, are very exciting. Obviously the Solid Protocol makes each combination easier, especially when coupled with standard consent management.

Disclaimer

This is a space in which things are changing rapidly and people have new ideas all the time, and so there is no attempt here to survey the field thoroughly. These are just some systems and combinations I have come across.

References

Zhao et al. Libertas: Privacy-Preserving Computation for Decentralised Personal Data Stores.
Ramesh et al. Decentralised, Scalable and Privacy-Preserving Synthetic Data Generation.
Jack Hardinges. What is a data trust? ODI, 2018.
Jack Hardinges. Data trusts in 2020. ODI, 2020.
Inflection. Introducing Pi, Your Personal AI. May 2, 2023.
Bill Gates. AI is about to completely change how you use computers. November 9, 2023.
Kevin Roose. Google’s Bard Just Got More Powerful. It’s Still Erratic. September 20, 2023.
Yury Pinsky. Bard can now connect to your Google apps and services. September 19, 2023.

Tim BL

Inference from Private Data