Deepfake phishing: Can you trust that call from the CEO?
Deepfakes use artificial intelligence (AI) to create extremely convincing fake audio or video recordings of a person. These recordings can make it appear that someone did or said something they never did.
The ability to create deepfakes has existed for years, but the technology has improved drastically. With modern artificial intelligence and growing access to audio and video of people on the Internet, deepfakes are more convincing and easier to create than ever before.
How deepfakes work
Modern deepfakes are made possible by sophisticated artificial neural networks (ANNs). Using a library of training data, an ANN can be taught to extract patterns and trends in the data to create the desired output. In cybersecurity, one potential application of an ANN is as a classifier to identify benign or malicious network traffic based on a model that it builds using a training set of many labeled examples.
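As a toy illustration of that classifier idea, the sketch below trains a small feed-forward network on synthetic "flow" features; the feature names, values, and labels are invented for illustration, not drawn from a real dataset.

```python
# Hypothetical example: training an ANN classifier on labeled traffic features.
# The features and data here are synthetic, invented purely for illustration.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Synthetic "network flow" features: [bytes sent, packets/sec, distinct ports]
benign = rng.normal(loc=[500, 10, 2], scale=[100, 3, 1], size=(500, 3))
malicious = rng.normal(loc=[5000, 80, 30], scale=[800, 20, 8], size=(500, 3))

X = np.vstack([benign, malicious])
y = np.array([0] * 500 + [1] * 500)  # 0 = benign, 1 = malicious

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A small feed-forward ANN learns a decision boundary from labeled examples
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
clf.fit(X_train, y_train)
print(f"accuracy: {clf.score(X_test, y_test):.2f}")
```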
To create a deepfake, an ANN is trained using audio or video of the desired subject. This training enables the network to learn how the subject looks or sounds under various conditions, such as a person's characteristic facial expressions or the details of their intonation and accent.
Once the ANN has built a model of the subject, it can overlay the subject's image or voice onto a recording of an actor. The actor does or says something, and the network uses its knowledge of the subject to make it appear that the subject did it instead. The result is a highly convincing video or audio recording that can be used for blackmail, social engineering, psyops, or other purposes.
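Many classic video deepfakes rest on an autoencoder with one shared encoder and a separate decoder per identity: both faces are compressed into a common latent space, and swapping decoders at inference time transfers the subject's face onto the actor's pose. The sketch below illustrates this idea in PyTorch; the network sizes, training loop, and random tensors standing in for face crops are illustrative assumptions, not a production pipeline (real systems add face alignment, GAN losses and far more training).

```python
# A minimal sketch of the classic face-swap architecture: one shared encoder,
# one decoder per identity. All dimensions here are illustrative assumptions.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Compresses a 64x64 RGB face crop into a shared latent code."""
    def __init__(self, latent_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, latent_dim),
        )
    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    """Reconstructs a face from the latent code; one decoder per identity."""
    def __init__(self, latent_dim=256):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 64 * 16 * 16)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),    # 16 -> 32
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),  # 32 -> 64
        )
    def forward(self, z):
        return self.net(self.fc(z).view(-1, 64, 16, 16))

encoder = Encoder()
decode_actor, decode_subject = Decoder(), Decoder()

# Training: each decoder learns to reconstruct its own identity from the
# shared latent space (random tensors here stand in for real face crops).
actor_faces = torch.rand(8, 3, 64, 64)
subject_faces = torch.rand(8, 3, 64, 64)
opt = torch.optim.Adam(
    list(encoder.parameters()) + list(decode_actor.parameters())
    + list(decode_subject.parameters()), lr=1e-3)
loss_fn = nn.MSELoss()

for _ in range(3):  # a real model trains for many thousands of steps
    loss = (loss_fn(decode_actor(encoder(actor_faces)), actor_faces)
            + loss_fn(decode_subject(encoder(subject_faces)), subject_faces))
    opt.zero_grad()
    loss.backward()
    opt.step()

# The swap: encode the actor's frame, then decode it with the *subject's*
# decoder, yielding the subject's face with the actor's pose and expression.
fake_frame = decode_subject(encoder(actor_faces[:1]))
```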
For audio deepfakes, an actor may not even be required. Text-to-speech programs can already duplicate a person's voice from only a few seconds of recorded audio. This makes it possible for an attacker to hold a live phone conversation, typing responses into a computer and playing the faked audio back to the target.
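As one illustration, open-source tools such as Coqui TTS already expose few-shot voice cloning behind a simple API. The snippet below is a sketch under assumptions: the model name is one published example, the file paths are placeholders, and the exact API may vary between library versions.

```python
# Illustrative sketch of few-shot voice cloning with the open-source Coqui TTS
# library (https://github.com/coqui-ai/TTS). Model name and file paths are
# examples only; the API may differ between versions.
from TTS.api import TTS

# XTTS v2 is a multilingual model that clones a voice from a short sample
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

tts.tts_to_file(
    text="Please process the wire transfer before end of day.",
    speaker_wav="ceo_sample.wav",   # a few seconds of the target's voice
    language="en",
    file_path="cloned_reply.wav",
)
```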
The deepfake threat
Deepfake technology has already been used in multiple social engineering attacks. In early 2020, a bank manager was tricked into transferring $35 million to an attacker-controlled account.
The transfer was authorized over a phone call. The attackers used deepfake technology to duplicate the voice of a company director whom the bank manager had previously spoken with. The caller claimed that the $35 million was needed to close an acquisition, and the bank manager had also received emails from the director and from a lawyer hired to close the acquisition.
This incident demonstrates the power of believable deepfake technology. Some of the ways deepfake technology can be used and abused include:
- Sophisticated vishing: Vishing attacks are already an effective form of social engineering since people are more willing to trust phone calls, especially when they think they can recognize the caller’s voice. Deepfake phishing can make these attacks even more effective, as demonstrated by this $35 million theft.
- Out-of-band validation: In security awareness training, users are often trained to validate unusual requests out-of-band, such as calling the sender of an email. With deepfakes, an attacker may defeat this validation process by masquerading as the other party and validating the request.
- Voice authentication: Voiceprints are used as a form of user authentication in some phone-based systems, such as banking support lines. If deepfakes can convincingly imitate the subject’s voice, they render this form of authentication worthless.
- Coercion and blackmail: While trickery is the most common technique used in social engineering attacks, threat actors also use fear and blackmail to coerce their victims. Instead of just claiming that the attacker has video of a target in a compromising position, deepfakes make it possible to send convincing “proof” as well, strengthening their attacks.
- “Smart” devices: Voice assistants like Alexa respond to voice commands and can be configured to recognize a particular user’s voice. Deepfakes could allow an attacker to issue commands to these devices to record audio or take other harmful actions.
The impacts of deepfakes on security training
Corporate security training programs commonly struggle to keep up with evolving attack vectors. For example, phishing attacks over SMS or corporate collaboration platforms are effective because employees are conditioned to think that phishing is only a threat in their email inbox. Performing essentially the same attack via a different medium catches targets off guard, when they are more likely to fall for the scam.
With deepfakes, corporate training on the vishing threat becomes both more important and more complicated. Detecting a phishing email from a legitimate email address is hard enough, but with deepfakes and caller ID spoofing, an attacker can make a call that appears to come from a trusted party and sounds exactly like them.
Deepfakes will make it necessary to ensure that security training is prevention-focused rather than detection-focused. If an employee genuinely can’t differentiate a real phone call from a deepfake, the organization needs authentication mechanisms and verification processes in place to confirm that an unusual or potentially dangerous request is legitimate before an employee carries it out.
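One minimal sketch of such a verification process, under stated assumptions, is a one-time code issued over a separate, pre-registered channel that must match before a sensitive request is executed. The function names, request IDs, and the notify() delivery stub below are hypothetical; a real deployment would hook into an existing MFA, chat, or ticketing system.

```python
# A minimal sketch of out-of-band request verification. Names and the
# notify() stub are hypothetical placeholders for a real delivery channel.
import hmac
import secrets

PENDING: dict[str, str] = {}  # request ID -> expected one-time code

def notify(channel: str, message: str) -> None:
    """Stub for delivery over a pre-registered channel (SMS, chat, etc.)."""
    print(f"[{channel}] {message}")

def start_verification(request_id: str, registered_channel: str) -> None:
    """Issue a one-time code out-of-band when a sensitive request arrives."""
    code = f"{secrets.randbelow(10**6):06d}"
    PENDING[request_id] = code
    notify(registered_channel, f"Confirm request {request_id} with code {code}")

def approve_request(request_id: str, supplied_code: str) -> bool:
    """Only execute the request if the code from the separate channel matches."""
    expected = PENDING.pop(request_id, None)
    return expected is not None and hmac.compare_digest(expected, supplied_code)

# A caller who only controls the phone line never sees the real code.
start_verification("wire-4812", registered_channel="director-registered-sms")
print(approve_request("wire-4812", "000000"))  # attacker's guess -> False
```

The key design point is that the confirmation travels over a channel the attacker does not control, so even a perfect voice clone on the phone line cannot complete the request on its own.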