Are tech firms using your event data to train their AI?
28 August 2023
If you’ve read our blog before (or had the misfortune to be cornered by any of us at a drinks reception) you’ll know we’re big on the idea that event owners and attendees (not their technology suppliers) should own and control their own data.
We’ve written and spoken at length about platforms that (quietly) assume the role of data controller and end up owning all your participant data – often using it to market other people’s events to your customers (and God knows what else).
Well, this month we learned about a new battleground that’s opening up in the event tech data privacy space. And yes, (surprise, surprise) it’s to do with AI.
Many tech companies, especially those in the event space, are sitting on a treasure trove of customer data, and they’re coming to realise that this data can be valuable for training artificial intelligence models – both their own and (potentially) those of third-party clients or partners.
Zoomgate
Videoconferencing behemoth Zoom inadvertently brought this issue to the world’s attention last month when it made a number of changes to its terms of service relating to the way it uses people’s data and the content of their calls.
There was widespread outrage, in particular at a clause in which users agreed to Zoom’s ‘access, use, collection, creation, modification, distribution, processing, sharing, maintenance, and storage’ of data ‘for any purpose’.
The new terms said that data could be used for a variety of functions, including ‘machine learning or artificial intelligence’ such as training new artificial intelligence models.
Users feared that the expansive rules would mean that Zoom could, for instance, use the data from meetings and webinars to train generative AI systems. A number of other tech giants have faced similar opprobrium over fears that they could be gathering user data with a view to training AI systems on it, and customers have become increasingly concerned about the invasion of privacy – and loss of ownership – that this could represent.
During a few days which Zoom’s PR department would probably prefer to forget, the company sought to explain that its terms were misunderstood and updated them with a new line intended to make clear that chats would not be used to train AI systems. ‘Zoom will not use audio, video or chat customer content to train our artificial intelligence models without your consent,’ the terms now read.
The backlash forced Zoom to provide some transparency around how and when it is using people’s data. Like many other companies this year, Zoom has upped its AI capabilities, adding features which allow clients to summarise meetings without having to record an entire session. It’s the kind of handy functionality lots of workplace tools have been rolling out recently, including those from Google and Microsoft.
Should we be OK with this?
Generative AI’s ability to speed up tasks and improve working efficiency is clear. But it needs to be trained on vast datasets. Ensuring this is done ethically and responsibly, with respect for privacy, copyright and other laws, is less straightforward.
The central question is: should event owners and attendees be able to opt out of having their data used to train generative AI systems?
AI needs data to train on if it’s going to get smarter – so which data should be used? Your event website content? What about your attendee lists and profiles? How about streaming video content of presentations? Direct messaging and 1:1 calls? Chat messages, attendee engagement data, polls, questions … You see where this is going.
While there are clearly potential benefits of integrating AI with products like Zoom, this mustn’t happen at the expense of customer (and attendee) privacy.
Zoom might be the current poster child for this, but they’re not the first (and won’t be the last) tech company to train new AI products on user behaviour – with or without users’ knowledge. In fact, this is just the next phase of a process that is already widespread across the internet.
‘The concept of technology platforms and devices collecting and analysing your data is nothing new,’ says Azadeh Williams, who sits on the board of the Global AI Ethics Institute.
‘Social media platforms and apps have been doing this for years.’ Facebook famously allowed third-party developers to scrape personal info, and Google illegally tracked the location of Android users. The major players in Silicon Valley are now primed to seize your content to speed development of artificial intelligence.
Take Google. The company says AI systems should be able to mine publishers’ work unless companies opt out – a position that would seem to turn copyright law on its head, and one that experts believe could harm the interests of smaller content creators.
They just want to improve the service they give me, no?
Well, they might want to improve the service, but there could be a bunch of other stuff going on.
One big mistake is to assume that the data a technology company might collect for AI training is much the same as the data it collects about service use, says Claude Mandy, Chief Evangelist, Data Security at Symmetry Systems.
Technology companies have been using data about their customers’ use of services for a long time. Until now, this has generally been limited to metadata about the usage itself, rather than the actual content or data being generated by or stored in the services.
And while both involve customer data, there’s a big difference between data about the customer and data of the customer.
We’re very used to clicking boxes on platforms consenting to let our usage data be collected and analysed to improve that service. But in the age of AI, you might find it’s being used for a whole lot more.
‘There is no equivalence between using customer data to improve the user experience and for training AI. This is apples and oranges,’ cautions Denis Mandich, co-founder of Qrypt and former member of the US intelligence community.
‘AI has the additional risk of being individually predictive, putting people and companies in jeopardy.’
Consider the following example:
A start-up uses a virtual event platform to run internal technical discussions as well as customer conferences about its game-changing new product. In doing so, the start-up inevitably creates lots of video assets discussing its proprietary tech, as well as records of chat, messaging, Q&A and other content – some of it commercially sensitive. Of course, it carefully restricts the audiences for these meetings.
But what if the event platform is making customer data available to train AI? A generative tool, like ChatGPT, trained on this data could potentially be a great source of information for a competitor to that start-up.
Oops!
According to Mandich, this would be an example of where ‘the issue is about the content, not the user experience for video / audio, quality, GUI, etc’.
Tech giants including Amazon and Microsoft have already warned employees not to share sensitive internal info with AI tools such as ChatGPT, for fear that confidential material could be leaked. If they’re concerned about a competitor’s AI getting a glimpse at their corporate secrets, it stands to reason that ordinary end users are vulnerable to the same exposure.
This is just the beginning
Zoom and other tech companies are scrambling to suck up as much available information as they can, because AI developers face a shortage of high-quality data in the decades ahead – a sort of informational bottleneck that threatens to slow the progress of machine learning models.
Nate Sharadin, a fellow at the non-profit Center for AI Safety, says ‘Companies used to collect data but not know what to do with it’.
‘Now, there’s something profitable they can do: sell it on to model developers, or to data curators.’
Event technology is a potential goldmine for training AI because pretty much everything in the world is discussed exhaustively at events. And not only do you have all those presentations, you have people’s reactions to that content. You have their (typically expert and nuanced) opinions about literally anything you could possibly want – from nuclear physics to ice cream marketing.
At this point, whether something is spoken during a meeting or typed into a chat window makes little difference, given how good AI transcription has become. If anything, there’s more information in the spoken sentence, in the form of intonation and (if the camera is on) facial expressions.
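To give a sense of how low that barrier now is, here’s a minimal sketch of speech-to-text using the open-source openai-whisper package – the model choice and file name are illustrative placeholders, not anything we know a particular platform to be running:

```python
# Minimal speech-to-text sketch (illustrative only).
# Assumes: pip install openai-whisper, plus a local recording called panel_recording.mp3.
import whisper

model = whisper.load_model("base")                 # small general-purpose transcription model
result = model.transcribe("panel_recording.mp3")   # hypothetical recording of a panel session

print(result["text"])                              # everything said, now machine-readable text
```

Once a session exists as plain text, it can be indexed, analysed or dropped into a training pipeline just as easily as a chat log.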
For virtual event platforms and videoconferencing tools, the potential applications might seem harmless. Maybe they’re thinking of using this data to teach AI to predict who will be the next person to speak in a panel, so that they can be highlighted on screen automatically? That doesn’t sound so bad.
Except that the platform – and everyone else in the market – is now incentivised to hoard as much information as they can. And once you have all that data, it’s tempting to try to do more things with it.
Are they even encrypting my data?
Then there’s the cyber security dimension. Today’s AI systems are very much capable of remembering, and then leaking, their vast stores of training data. Instead of just existing on your primary vendor’s servers, your data may also end up on the servers of multiple third parties, all using it for machine learning purposes.
Maybe all these instances of your data are really well encrypted every time they’re transmitted and wherever they’re at rest. Or, you know, maybe they’re not.
Either way, your vulnerability to a data breach just went up.
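For what it’s worth, encrypting data at rest isn’t exotic. Here’s a minimal sketch of symmetric encryption using the widely used Python cryptography library (the message and key handling are purely illustrative) – the hard part isn’t the code, it’s knowing whether every party holding a copy of your data actually does this, and who controls the keys.

```python
# Minimal encryption-at-rest sketch using the cryptography library (pip install cryptography).
# The message is illustrative; in practice the key would live in a key-management service,
# not alongside the data it protects.
from cryptography.fernet import Fernet

key = Fernet.generate_key()
fernet = Fernet(key)

plaintext = b"Q&A transcript: attendee asks about the unreleased product roadmap"
ciphertext = fernet.encrypt(plaintext)             # what should be sitting on disk

assert fernet.decrypt(ciphertext) == plaintext     # only a key-holder can recover the content
```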
Is it worth it?
Does the value of AI-enhanced features outweigh the risk – and do we really know what the risk is?
If the big event platforms and virtual meeting tools end up where Zoom was forced to go – i.e. committing not to use customer content to train their AI models without explicit consent – then the onus of security will continue to fall on event owners and attendees, rather than on the corporations analysing their inputs and behaviours.
Of course, history tells us that not all vendors will be transparent. For many, the lure of monetising customer data via AI use cases will be irresistible. And they’ll make clients read through hundreds of pages of Terms of Service to find the clause that lets them do it.
At AttendZen, we’ve already decided. No use of customer data to train AI models. Ever. No need to opt out – we’re just not going near it. Instead, we’ll continue to mine our own anonymised service data and actual customer feedback to make our own tech work better.
Of course, we’re not owned by a venture capital firm that calls us up and orders us to quietly change our Terms of Service because it’s just read an article in the Financial Times about how much money can be made selling data to brokers for AI training.
We’re betting that platforms that do play fast and loose with clients’ content will end up paying a high price in terms of trust.
Out there in the wider world, privacy will continue to erode, especially when the technology is moving so fast that legislation can’t keep up to provide safeguards.
So as individual consumers and professionals, we need to be more aware of the value of our data, and how it is being used when we ‘exchange’ it for services or enhancements.
We’re all going to have to get better at clicking NO, DISAGREE or OPT OUT.