PATENT DROP: The speech recognition dilemma
Plus: Anduril and Boeing’s Game of Drones; Intel’s plan to break down communication barriers
Happy Thursday and welcome to Patent Drop!
This morning, we’re taking a look at why speech recognition matters so much to Big Tech companies, and the barriers that stand in the way. Plus, we’ll check out Boeing and Anduril’s new drones; and Intel’s plan to make communication easier for the hearing impaired.
But before we get into it, we wanted to ask: How do you stay competitive in today’s job market? Today’s Sponsor Brilliant is helping millions master essential concepts in math, AI, computer science, and more in just minutes a day. Ditch the lecture videos and opt for interactive learning—try Brilliant free for 30 days and score 20% off your premium subscription today.
Let’s take a peek.
#1. Say that again?
Tech companies seem to be really keen on making sure they hear you right.
To start, Baidu filed a patent application for “sound source localization tech,” which, as the name implies, helps a digital assistant determine the “direction of a sound source.” Using voice processing and deep learning, this tech works by taking in audio data from users’ commands and requests, marking those clips as coming from certain directions and training a neural network with those clips. Pinpointing exactly where a sound is coming from helps the virtual assistant with accuracy and speed of response.
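For a rough feel of what "direction of a sound source" means in practice, here is a minimal sketch of a classic, non-neural approach: comparing when the same sound reaches two microphones. To be clear, this is not Baidu's method (the filing describes training a neural network on direction-labeled clips). The function names, the two-microphone setup and the speed-of-sound constant are all assumptions made for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in room-temperature air (assumption)


def best_lag(left, right, max_lag):
    """Return the lag (in samples) at which `right` best matches `left`.

    A positive lag means the sound reached the right microphone later.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, sample in enumerate(left):
            j = i + lag
            if 0 <= j < len(right):
                score += sample * right[j]  # cross-correlation at this lag
        if score > best_score:
            best, best_score = lag, score
    return best


def direction_degrees(left, right, sample_rate, mic_spacing_m):
    """Estimate a bearing: 0 is straight ahead, positive leans toward the left mic."""
    # The delay can never exceed the time sound needs to travel between the mics.
    max_lag = int(math.ceil(mic_spacing_m / SPEED_OF_SOUND * sample_rate))
    delay_s = best_lag(left, right, max_lag) / sample_rate
    # Clamp so rounding never pushes asin outside its domain.
    ratio = max(-1.0, min(1.0, delay_s * SPEED_OF_SOUND / mic_spacing_m))
    return math.degrees(math.asin(ratio))
```

Feed it two short recordings of the same clap and it returns an angle. A learned model like the one Baidu describes would presumably cope better with echoes and background noise, which this sketch ignores entirely.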
Separately, Baidu also seeks to patent a system for “synthesizing speech,” which essentially reduces excess noise in audio data to better understand the commands that a user gives. For example, if you ask a smart home to call up a friend, this system will be able to better drown out the noise of your dishwasher, your TV and your upstairs neighbor stomping about.
And of course, Baidu is not the only one working on this: Google filed a patent for “Multi-Talker Overlapping Speech Recognition,” which helps its smart devices straighten out commands when people just won’t stop talking over each other. This tech works by tracking the start and end times of multiple overlapping lines of speech, and filling in the gaps when the speech gets too jumbled. The goal is to track who is saying what, and whether any speaker was directing their speech toward the device.
So if you decide to host a dinner party with your friends, and your parents, and your cousins and your in-laws, your smart speaker will be able to hear you over the cacophony if you ask it to play music or set a timer.
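The "start and end times" part of Google's idea is easy to picture in code. Below is a minimal sketch that assumes some upstream step has already split the audio into per-speaker segments. The `Segment` type and `overlaps` function are hypothetical names, not anything from the filing, and the hard part (actually recognizing the jumbled words) is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    speaker: str
    start: float  # seconds
    end: float    # seconds


def overlaps(segments):
    """Return (speaker_a, speaker_b, start, end) for every pair of segments that overlap."""
    ordered = sorted(segments, key=lambda s: s.start)
    found = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second.start >= first.end:
                break  # sorted by start, so no later segment can overlap `first`
            found.append((first.speaker, second.speaker,
                          second.start, min(first.end, second.end)))
    return found
```

If the host speaks from 0.0 to 2.5 seconds and a guest cuts in at 2.0, this flags the half-second from 2.0 to 2.5 as the stretch where a recognizer would need to untangle two voices.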
If you’ve read Patent Drop over the last few months, you’ll know that this is not the first time we’ve talked about speech recognition in this newsletter. Tons (and tons and tons) of big tech firms are trying to get speech recognition right, accounting for everything from the emotions in your voice to the number of people in your home to the secrets you want to keep between you and your smart speaker.
So why do tech companies care so much? Like other kinds of AI, speech recognition tech can be harnessed for much more than a Siri or Alexa device. It can be put to use in healthcare contexts with things like dictating clinical reports or generating prescriptions. In classrooms, it could help special needs kids improve reading and writing skills, and in courtrooms, it could automate court reporting.
“These are just a few examples of the potential applications of speech recognition technology,” Iu Ayala Portella, CEO and founder of machine learning consultancy Gradient Insight, told me in an email. “As the technology continues to improve and evolve, we are likely to see even more innovative uses in a wide range of fields.”
Another potential reason? The better and more widespread these companies can make speech recognition, the more access to data they have. And in today’s tech ecosystem, data is gold: the more access to it your company has, the more powerful it is.
Yet, there are significant barriers to getting speech recognition right, said Eldad Postan-Koren, CEO of enterprise AI company WINN.AI. For one, accents among non-native English speakers often make speech recognition difficult, and the tech companies working on this tech “need to take into consideration that the majority of the English speakers in the world have accents.” And speech recognition still hasn’t perfected its handling of filler noises in speech, like pauses or stutters.
Perhaps the biggest barrier, however, is context, said Postan-Koren. “Communication is not about the words, it’s about the nuances: Your body language, cultural understanding, things like that are all important context.”
But the nature of AI is that it’s always improving, he said. It’s only a matter of time before tech develops enough that these mountains look like molehills.
#2. Game of Drones
If there’s anything to take away from the hundreds of patent applications flowing in from the U.S. Patent and Trademark Office every week, it’s that everybody loves a drone.
Let’s start with Boeing. Insitu, a subsidiary of Boeing that focuses on defense and government customers, filed a patent application for a system to “guide an unmanned aerial vehicle for recovery.” This tech relies on an aerial tether line with a sensor on it and a transmitter that guides the drone to “engage the tether line” to stop its flight, ensnare it and drag it down.
This system aims to retrieve a drone or a drone’s payload in mid-air “without need for a runway,” in a situation where a landing could damage it, such as on a boat. Using the sensor, this tech can also accurately guide a drone in windy and stormy conditions by adjusting its position based on data from the sensor on the tether line.
Next up, we’ve got a patent from defense tech company Anduril. The company is researching a “counter drone system” which it claims can detect whether or not a drone is a threat, and intercept if it is. The system uses a “counter drone” that patrols a geofenced area, with sensors indicating if another drone is a “potential target” through data such as when and where it came from, or if it’s associated with whoever is operating the counter drone.
If the other drone is indicated to be a threat, the counter drone can then attempt to “destroy, disable, or capture the threat drone.”
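The patrol-and-triage logic described above can be sketched in a few lines. This is a toy under loud assumptions: a circular geofence centered on the origin, a simple allow-list standing in for "associated with whoever is operating the counter drone," and hypothetical names (`Track`, `classify`) throughout. Anduril's filing does not spell out its logic at this level, and a real system would weigh far more sensor data.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    drone_id: str
    x_m: float  # position east of the fence center, in meters
    y_m: float  # position north of the fence center, in meters


def inside_geofence(track, radius_m):
    """True when the track lies within a circular fence centered on the origin."""
    return math.hypot(track.x_m, track.y_m) <= radius_m


def classify(track, radius_m, friendly_ids):
    """Label a track as 'ignore', 'friendly', or 'potential target'."""
    if not inside_geofence(track, radius_m):
        return "ignore"  # outside the patrolled area
    if track.drone_id in friendly_ids:
        return "friendly"  # associated with the operator (assumed allow-list)
    return "potential target"
```

Only tracks that come back as a "potential target" would move on to the destroy, disable, or capture decision, which this sketch deliberately leaves out.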
While consumer-grade drones aren’t exactly hard to get these days, the tech laid out in these patent applications is far more advanced than something you could get from a Best Buy. Insitu and Anduril’s advanced tech would most likely be applied exclusively to defense and military contexts, and represent a drop in a massive bucket of drone innovations in defense.
Drones are one of the fastest-growing segments of defense, said Jeffrey Starr, chief marketing officer at D-Fend Solutions. The tech can be used for intelligence gathering, battlefield strategy, and supply drops as we see in Insitu’s patent. Major military contractors like Lockheed Martin, Northrop Grumman, and Boeing have all turned their attention to drone tech in some way, becoming key frontrunners in the market as they win massively lucrative contracts for drones.
With the growing attention, drone tech is only getting better, Starr said.
“Technology is making UAVs faster, harder to detect, and more durable,” Starr said. “They can fly increasingly longer distances and carry heavier payloads. They are also becoming easier to operate… This in turn stimulates (counter-drone) innovation.”
As drone tech grows, so do countermeasures against it, Starr said, such as we see in Anduril’s patent. Innovations in counter drones include tech that can detect hostile drones with pinpoint accuracy, discreetly capture hostile drones without collateral damage, or complete a “cyber takeover” of a hostile drone to safely take control and land it. Development of this tech is expanding as military reliance on drone tech grows and “commercial and ‘Do-it-Yourself’ UAVs” continue to be readily available, he said.
“UAVs are used in various forms of attacks and intelligence gathering, and countermeasures must be rapidly developed in a never-ending cat and mouse game combined with leapfrog on steroids,” Starr said.
I guess when you play the Game of Drones, you win or you fly (away).
SPONSORED BY BRILLIANT
AI Is Moving Quickly–Don’t Get Left Behind
Artificial Intelligence and the technologies of tomorrow are transforming the world at lightning speeds.
So how do you not just keep up, but get ahead? With Brilliant.
As a tiny daily habit, Brilliant teaches you math, computer science, and more in the time it takes to drink a cup of coffee.
Used by 10+ million lifelong learners: Research shows that interactive learning is 6x more effective than passive learning (i.e. lecture videos). Brilliant deploys hands-on, visual exercises across each bite-sized lesson so you can master core concepts quickly.
Designed for all ages and abilities
Over 50,000 5-star reviews on iOS and Google Play Store
Hailed by The New York Times, Business Insider, Wired, and others
Ready to upgrade your brain? Boost your analytical skills and try Brilliant free for 30 days. Patent Drop readers will even receive a 20% discount on premium memberships.
#3. Intel’s auto-translator
Intel wants to help break down communication barriers for the hard of hearing.
The company is seeking to patent a system for “automatic translation” between sign language and spoken language in a video call. Basically, this system takes in audio data from one user and translates it into signs held up by a “signing avatar” displayed on the screen. Then, when the other user signs back, the system captures this information and translates it into spoken word for the first user. Intel also notes that this system may include “text translations” for both parties, in which the spoken words are captioned in real-time for the hard-of-hearing user, and the signs are translated in real-time for the hearing user.
Intel noted that this tech could reduce the cost of hiring an interpreter, and preserve privacy between the two parties in a conversation by removing the need for one. The company also noted that this tech could help teach people sign language through “self-training.”
In the filing, Intel said traditional communication methods, like an interpreter, handwriting back and forth, informal gestures, and one-directional speech-to-text solutions, “can be hard to use, expensive, not available on-demand, slow, incomplete, and/or ambiguous.”
Intel’s tech has several potential benefits. Of course, there's the obvious benefit that it could help break down communication barriers between the hearing and hard-of-hearing folks. But, as Intel noted in the patent, it could also help people learn sign language, both actively as a “self-training” exercise and passively just by using the tech, therefore potentially helping prepare for in-person communication as well.
But creating a program like Intel’s isn’t as simple as it sounds. For one, sign language is incredibly context-rich, from conveying the emotions in the phrases being signed to the language itself constantly evolving. Programming software to take all of this into account is far from an easy task.
Producing this tech also means facing the same challenge as other speech-to-text technologies: deciding what exactly to include. For example, if it’s displaying signs from audio input exactly as the other person is saying it, this tech may pick up on misspoken words, stutters or filler words such as “like” and “um,” potentially leading to confusion. If it’s using what the National Deaf Center calls “meaning-for-meaning” interpretations, the tech would “eliminate false starts and misspeaks,” though this could be a more difficult engineering endeavor.
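To see why the cleaned-up route is the harder one, consider the crudest possible version: a word filter. The sketch below is an illustration only. The filler list and the `clean_transcript` name are assumptions, not anything from Intel's filing or the National Deaf Center.

```python
import re

FILLERS = {"um", "uh", "er", "like"}  # illustrative list, not from the filing


def clean_transcript(text):
    """Drop filler words and collapse immediately repeated words (a crude stutter fix)."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    cleaned = []
    for word in words:
        if word in FILLERS:
            continue  # skip "um", "like", and friends
        if cleaned and cleaned[-1] == word:
            continue  # skip an immediate repeat such as "to to"
        cleaned.append(word)
    return " ".join(cleaned)
```

It tidies "Um, I I want, like, to to call Sam" into "i want to call sam," but it would also strip the verb from "I like pizza." Telling those two uses of "like" apart takes an understanding of meaning, which is exactly where the engineering gets difficult.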
While Intel’s patent could help in some contexts, its tech faces the same problems that many speech recognition developers are trying to push through.
Haven’t had enough yet? We’ve got you covered.
Microsoft wants to prevent your burnout. The company seeks to patent “personalized health and wellbeing tools” that monitor for signs of a user’s stress using machine learning, combining behavioral signals (heart rate, respiration, typing speed, etc.) with contextual information (messages and calendar events).
Zoom wants to help you feel ready for your meetings. The company filed patent applications for tech that lets you preview conference topics and conference participants prior to joining – so you won’t be surprised if you see your boss on-screen during a call you thought was just with your coworkers.
Sony wants to put you in the driver's seat… without letting you get in the car. The company wants to patent a “hyper realistic drive simulation” to help train your kids to not crash once they get behind the wheel IRL (because Forza and Mario Kart aren’t exactly the best driving teachers).
What else is new?
TikTok CEO Shou Chew testified before Congress for the first time this morning. He faced harsh criticism from lawmakers and rebutted allegations of Chinese spying on the app.
AI chatbot startup Character.AI raised $150 million in a funding round led by a16z, bringing its valuation to $1 billion.
The SEC warned Coinbase that it might be in violation of securities laws. The company said this warning wouldn’t impact its services.
Learning AI and more in minutes a day. How do you stay on top of technology when it’s constantly changing? With bite-sized, daily lessons from Brilliant. Millions of go-getters use Brilliant every day to master AI, math, neural networks, and more. Why? Brilliant applies highly-visual, hands-on learning techniques to help you go beyond memorization (it’s also 6x more effective than lecture videos). Start building your skills today in just a few minutes. Patent Drop readers get a 30-day free trial PLUS 20% off a premium subscription.
Have any comments, tips or suggestions? Drop us a line! Email at email@example.com or shoot us a DM on Twitter @patentdrop.