How voice recognition will affect privacy in the Internet of Things
An Internet of Things (IoT) device using voice recognition requires the collection and interpretation of sounds. These sounds may extend far beyond the voice commands being given and cannot avoid being captured. How sound is captured and processed may lead users to believe an organization is either respectful or a voyeur.
@evankirstel: How voice recognition will affect #privacy in the Internet of Things on @CIOonline #IoT
A key differentiator for devices being created for the Internet of Things (IoT) is usability. Traditional user interfaces are being eschewed in favor of voice recognition. In fact, some studies have shown voice recognition to be faster and more accurate than traditional user interface methods.
The use of voice recognition requires the collection and interpretation of sounds. A device needs to determine when it is being addressed and then send the sounds that immediately follow to a server for interpretation. This presents some unique privacy challenges. An organization needs to be sensitive to:
Some of these items are discussed in the following sections.
Notice that the focus in the above bullet points is on sound, not voice. While devices may be responding to voice commands, they will capture any sound that is present in the environment when the voice commands are being given. Looking to the future, it would not be unreasonable for a device to respond to a non-verbal sound such as a clap, a whistle, a door opening or closing, or even a dog barking.
Unlike devices that collect information through traditional interfaces, devices that respond to sound must be constantly “listening.” The device must determine when it should perform some function in response to an instruction or a query. Often this is triggered by a speaker saying a keyword such as “Alexa,” “Siri” or “OK Google.” The sounds that follow are then sent to a server so that a response can be generated.
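The keyword gating described above can be sketched roughly as follows. This is a minimal illustration, not how any particular assistant works. It assumes that audio arrives as discrete frames and that a hypothetical on-device `is_keyword` detector can flag the frame containing the wake word. Real detectors score a sliding window of audio rather than a single frame. The names `gate_after_keyword` and `capture_frames` are illustrative.

```python
from typing import Callable, Iterable, List

# Hypothetical type: one short chunk of audio (e.g. 20 ms of samples).
Frame = bytes


def gate_after_keyword(
    frames: Iterable[Frame],
    is_keyword: Callable[[Frame], bool],
    capture_frames: int,
) -> List[List[Frame]]:
    """Return the audio segments that would be sent to the server.

    Every frame is inspected locally. Frames outside a capture window are
    dropped on the device; only the `capture_frames` frames immediately
    following a detected keyword are kept as an upload segment.
    """
    segments: List[List[Frame]] = []
    current: List[Frame] = []
    remaining = 0  # frames still to capture for the current request
    for frame in frames:
        if remaining > 0:
            # Inside the window: everything audible is captured,
            # including any background sound.
            current.append(frame)
            remaining -= 1
            if remaining == 0:
                segments.append(current)
                current = []
        elif is_keyword(frame):
            remaining = capture_frames  # open a new capture window
    if current:  # the stream ended in the middle of a capture window
        segments.append(current)
    return segments
```

Consider the frames `[b"tv", b"WAKE", b"turn", b"on", b"lights", b"dog", b"WAKE", b"stop"]` with a three-frame window. Only `[b"turn", b"on", b"lights"]` and `[b"stop"]` would be uploaded. Anything audible inside a window would still travel with the command, such as a background conversation.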
The sound that follows the keyword may include a voice with a command. It will also include any background conversations that are occurring. In fact, the sound may indicate what tools are being used in the environment (a drill or a mixer, for example), what music is preferred, what animals are present, how many people are in the vicinity, what subjects those people are discussing, or what TV or radio programs are being watched. Each of these sounds reveals something about the user.
As non-verbal commands begin to be used to initiate activities on a device, reliance on a key phrase becomes moot. However, sending all sound to a server for interpretation is costly, inefficient, and certainly a privacy concern. If preserving privacy is an objective, more sound processing will need to be done locally on the device. After the device interprets a sound locally, it may send a traditional data-based message (albeit with some sound attached) to a server. That message would indicate what action to take or what information is needed to provide a response.
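Here is a rough sketch of local interpretation followed by a data-based message. It assumes a hypothetical on-device classifier that produces a label and a confidence score. The `ACTIONS` table, the 0.8 threshold and the message fields are also illustrative assumptions. For simplicity this sketch attaches no audio at all. A real design might attach a short clip when the server genuinely needs it.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class LocalResult:
    """Output of a hypothetical on-device sound classifier."""
    label: str         # e.g. "clap", "door_open", "dog_bark"
    confidence: float  # 0.0 to 1.0


# Hypothetical mapping from locally recognized sounds to device actions.
ACTIONS = {"clap": "toggle_lights", "door_open": "turn_on_hallway_light"}


def build_message(result: LocalResult, device_id: str,
                  threshold: float = 0.8) -> Optional[str]:
    """Return a JSON message for the server, or None if nothing is sent."""
    if result.confidence < threshold:
        return None  # uncertain sound: discard it on the device
    action = ACTIONS.get(result.label)
    if action is None:
        return None  # recognized, but not a command: it never leaves the device
    # Only the derived event is transmitted; the raw audio stays local.
    return json.dumps({"device": device_id, "event": result.label, "action": action})
```

With this approach, a barking dog that is not mapped to any action produces no network traffic. A confident clap produces only a small structured message.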
The sound collected by the device and sent to the server will certainly be used to respond to the command or query. There are two other uses for the collected sounds that I would consider “safe.”
One is to improve the services offered by the device. This may include creating a profile of a device user. For example, suppose a user asks every evening at 10 p.m. for the lights to be shut off. Might the device then learn to ask at 10 p.m., “Shall I turn off the lights?”
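A simple way to spot such habits is to count how often each command is given at each hour of the day. The following is a minimal sketch under that assumption. The `(command, hour)` history format and the threshold of five occurrences are hypothetical. A real profile would likely also weigh recency and day of the week.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def suggest_routines(
    history: Iterable[Tuple[str, int]],  # (command, hour of day 0-23)
    min_occurrences: int = 5,
) -> List[Tuple[str, int]]:
    """Return (command, hour) pairs repeated often enough to offer proactively."""
    counts = Counter(history)  # tally identical (command, hour) pairs
    return sorted(pair for pair, n in counts.items() if n >= min_occurrences)
```

Consider a history containing seven `("lights_off", 22)` requests and two `("play_music", 18)` requests. Only `("lights_off", 22)` qualifies at the default threshold, so only that suggestion would be offered.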
Similarly, when people communicate, we use previous actions and context as shorthand. For example, I have a dog named Lucy, and Lucy favors one brand of dog food. My wife understands what product I am referring to when I say “we need more food for Lucy.” By building a profile, a device could recognize that Lucy is a dog and which dog food it helped me order in the past. It could then respond properly when I tell it to “order more food for Lucy.”
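The lookup in the Lucy example can be sketched as two steps. First resolve who "Lucy" is, then find the matching past order. The `Profile` structure and its field names are hypothetical, and "Brand X kibble" is a placeholder product. The sketch also assumes the command has already been parsed into a name and an item.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Profile:
    # Named entities learned from earlier interactions, e.g. {"lucy": "dog"}.
    entities: Dict[str, str] = field(default_factory=dict)
    # Last product ordered per category, e.g. {"dog food": "Brand X kibble"}.
    past_orders: Dict[str, str] = field(default_factory=dict)


def resolve_reorder(profile: Profile, name: str, item: str) -> Optional[str]:
    """Map 'order more <item> for <name>' to a concrete product, if known."""
    kind = profile.entities.get(name.lower())  # "Lucy" -> "dog"
    if kind is None:
        return None  # unknown name: the device should ask a follow-up question
    category = f"{kind} {item}"  # "dog" + "food" -> "dog food"
    return profile.past_orders.get(category)
```

Returning `None` for an unknown name leaves the device free to ask a clarifying question rather than guess.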
The second “safe” use for the sound is to improve how the server interprets sound. This can be done with anonymized or pseudonymized sounds.
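One common pseudonymization technique is to replace the account identifier with a keyed hash (HMAC) before a sample is used to improve the server's processing. The same user always maps to the same pseudonym. However, anyone who lacks the secret key cannot link that pseudonym back to the account. The record layout and the list of dropped fields below are hypothetical. This sketch pseudonymizes only the metadata, and the voice in the audio itself may still identify the speaker.

```python
import hashlib
import hmac
from typing import Any, Dict

# Hypothetical fields that would tie a sample to a person or household.
LINKING_FIELDS = ("device_id", "location", "account_email")


def pseudonymize(record: Dict[str, Any], secret_key: bytes) -> Dict[str, Any]:
    """Return a pseudonymized copy of a voice-sample record."""
    out = dict(record)  # leave the caller's record untouched
    # Keyed hash: stable per user, unlinkable without secret_key.
    out["user"] = hmac.new(
        secret_key, record["user"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    for key in LINKING_FIELDS:
        out.pop(key, None)  # drop direct identifiers if present
    return out
```

Keeping the key separate from the training data means that the team improving the recognizer can group samples by speaker. That team never learns who the speaker is.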
Another possible use is to add the captured background sounds to a user profile. Technology exists, for example, to identify a song, a TV show, or a movie from captured sound alone. A voice-driven IoT device could examine the background sound, identify the song, TV show, or movie, and add it to a user profile. There would clearly be a market for this information. I suggest, however, that this type of use may be perceived as voyeur-like and as an affront to a user’s privacy, akin to an in-home Peeping Tom.
The personal information that IoT devices hold could include access to retailer accounts, financial information (such as which stocks are watched), search history, and access to other devices. This access will allow a device user to easily place an order, turn on lights, unlock doors, or check order status. A device should have mechanisms to authenticate the user making a request. It should also verify that the user has the right to access the personal information or to have the requested action performed.
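Here is a minimal sketch of such a check. Sensitive actions require a recognized, authorized speaker, while low-risk actions do not. The action names, the `Request` fields and the 0.9 match threshold are all hypothetical. The sketch also assumes a separate voice-match step that yields a speaker ID and a confidence score. Because a recording of an authorized voice could potentially fool a voice match, a real design might add a PIN or a second factor for actions such as unlocking doors.

```python
from dataclasses import dataclass
from typing import Optional, Set

# Hypothetical policy: actions touching personal information or physical
# security require an authorized speaker; everything else is allowed.
SENSITIVE_ACTIONS = {"unlock_door", "place_order", "read_order_status"}


@dataclass
class Request:
    action: str
    speaker_id: Optional[str]  # from a hypothetical voice match; None if unknown
    match_score: float         # confidence of that match, 0.0 to 1.0


def is_allowed(req: Request, authorized: Set[str], min_score: float = 0.9) -> bool:
    """Decide whether the device may act on this request."""
    if req.action not in SENSITIVE_ACTIONS:
        return True  # e.g. turning on lights needs no identity check
    return (
        req.speaker_id is not None
        and req.speaker_id in authorized
        and req.match_score >= min_score
    )
```

Under this policy, anyone in the room can turn on the lights. An order is placed only when the voice matches an authorized household member with high confidence.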
Laws and regulations that govern the processing of personal information must be followed. For example, capturing sound from children may require the capturing organization to obtain parental permission, or to have some other compensating control, before capturing the sound. Sounds captured in the EU may require that a legal basis be established before the sound is transferred outside the EU for processing. Organizations must also take into account that voice patterns are considered biometrics.
The privacy legal and regulatory environment is rapidly evolving. These requirements must be constantly monitored for changes. In the absence of statutes or regulations, a privacy professional must provide well-founded guidance to their organization to anticipate how the requirements may develop.
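One way to keep such rules enforceable and easy to update is to route each captured sound through a single decision function. The sketch below encodes only the two examples given above, in simplified form, and every flag and outcome name is a hypothetical assumption. Which rules actually apply depends on jurisdiction and on legal guidance. Ideally, permission for children is obtained before any capture. Where a device only learns afterward that a child may be speaking, this sketch discards the sound rather than keeping it.

```python
from dataclasses import dataclass


@dataclass
class CaptureContext:
    speaker_may_be_child: bool
    parental_consent: bool         # or another compensating control
    captured_in_eu: bool
    destination_outside_eu: bool   # would the interpreting server be outside the EU?
    transfer_basis_recorded: bool  # has a legal basis for that transfer been established?


def processing_decision(ctx: CaptureContext) -> str:
    """Return 'discard', 'keep_local', or 'upload' for one captured sound."""
    # Possible child speaker without parental permission: do not keep the sound.
    if ctx.speaker_may_be_child and not ctx.parental_consent:
        return "discard"
    # EU capture headed outside the EU without an established legal basis:
    # interpret it on the device instead of transferring it.
    if ctx.captured_in_eu and ctx.destination_outside_eu and not ctx.transfer_basis_recorded:
        return "keep_local"
    return "upload"
```

Because the requirements keep evolving, concentrating them in one place like this makes each change a small, reviewable edit.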
Regardless of how the above-mentioned items are addressed, it is important that a privacy notice be provided explaining how the device collects information, how the information is used, how it is protected, who it is shared with, how long it is retained, and how it is ultimately destroyed.
This article is published as part of the IDG Contributor Network.