The founders of Glenbrook Networks – a father and daughter team – have spent more than 20 years building the sophisticated and nuanced data collection tools that the company deploys for its clients. Along the way, they’ve been awarded 14 patents between them, in areas like natural language processing, optical character recognition and speech recognition. These two are definitely Data Geniuses.
Edward grew up in Moscow and earned his first degree, in mathematics, at Lomonosov Moscow State University; his PhD from the Russian Academy of Sciences sat at the intersection of mathematics and computer science. Early on, he was part of the team that built an AI-based system that became the first world champion in computer chess.
A few years later, he co-founded Cognitive Technology Corporation, which built one of the earliest OCR engines, used by many companies including Canon, Corel and HP. Then, in the early ’90s, he co-founded Accent, Inc., an early but prominent player in the speech recognition space, with Bill Gates among its investors.
Then came a company called BetterAccent, which Julia and Edward co-founded soon after Julia completed her undergraduate studies at UC Berkeley and her graduate work at Cornell, studying both mathematics and computer science.
BetterAccent’s mission was to help non-native English speakers learn to speak English with the right stress, rhythm and intonation – an extremely difficult task for speakers of languages that operate very differently from English. These aspects of language vary dramatically from one tongue to another, and they are the difference between speaking a language correctly in a technical sense and actually being understood by native speakers.
BetterAccent’s system created a way to visualize facets of pronunciation while students were learning, helping them “see” which syllables to emphasize, how long or short syllables should be, and whether the intonation should be upward or downward, to sound like a native speaker. As the market moved, they saw an opportunity to sell their considerable IP to Rosetta Stone.
And then Glenbrook was born. In many ways, it is the culmination of their 20+ years of learning in AI, machine learning, natural language processing, speech recognition and optical character recognition.
In essence, Glenbrook’s patented tools allow them to find, interpret and collect data that most other data collectors would consider impossible to reach. Their work is sometimes referred to as “deep web trawling”: they can find, retrieve and structure data hidden behind forms, and extract unindexed data from the deep web that no search engine – not even Google – reaches.
They’re also able to interpret data. For example, with their background in OCR, they’re able to extract data from things like PDFs. They don’t just extract the characters, words and sentences from the document; they interpret those characters so that the right data can be ported into the corresponding field in a database. Think of a PDF of a vendor invoice: pulling out the characters and numbers and the spacing between them and dumping them in a heap somewhere isn’t all that helpful. Glenbrook’s tools are “taught” to understand what the words and sections mean, and what the numbers represent – for example distinguishing between dates, currency values and quantities. With that kind of interpretation, companies can extract that data and drop it into, say, their A/P system without costly and error-prone data entry.
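To make the idea concrete, here is a deliberately minimal sketch – not Glenbrook’s actual (patented) pipeline – of what “interpreting” rather than merely extracting looks like: after OCR has produced raw text, tokens are classified by what they represent (date, currency, quantity) before being dropped into typed database fields. The field names and patterns are invented for illustration.

```python
import re

# Toy patterns for one invoice layout; a real system learns these.
DATE_RE = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")
MONEY_RE = re.compile(r"\$\d[\d,]*\.\d{2}")
QTY_RE = re.compile(r"\bQty:\s*(\d+)\b")

def interpret_invoice_text(text: str) -> dict:
    """Map OCR'd invoice text into typed fields for an A/P system."""
    date = DATE_RE.search(text)
    money = MONEY_RE.findall(text)
    qty = QTY_RE.search(text)
    return {
        "invoice_date": date.group(0) if date else None,
        # Naive assumption for the sketch: the largest currency
        # value on the page is the invoice total.
        "total": max(money, key=lambda m: float(m.replace("$", "").replace(",", "")))
                 if money else None,
        "quantity": int(qty.group(1)) if qty else None,
    }

sample = "Invoice 08/15/2023  Qty: 12 widgets @ $4.50  Total due: $54.00"
print(interpret_invoice_text(sample))
# {'invoice_date': '08/15/2023', 'total': '$54.00', 'quantity': 12}
```

The point of the sketch is the output shape: typed fields that can flow straight into an A/P system, instead of an undifferentiated heap of characters.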
Another really interesting use of their systems is interpreting multi-lingual data. Pretend you’re a company in the US, and you have a sales team in Tokyo. Naturally, the sales team is going to do everything in Japanese, including tracking their sales activities by company and contact. But when HQ in Chicago wants a company-wide look at their prospect or customer lists, they need a way to not just translate the data, but interpret it. In this example, given the character-based writing systems used for proper names in Asian markets, using an automated translator like Google Translate would bring back a puddle of useless mush.
To solve this problem, Glenbrook preps and runs a complex “learning” process to understand what is meant by the characters – interpreting their meaning by fully understanding that country’s language, standards of communication and culture, and the context – to arrive at an accurate language conversion for individuals’ names and company names. For example, imagine Google Translate trying to translate the company name Bridgestone from English into Japanese Kanji: you don’t want the company name in your database to come across as “an arched stone path over the flowing river”. Instead you want the name that Japanese speakers actually use, which in this case is written phonetically in Katakana. You want it to be Bridgestone so that you can marry it up with all your other Bridgestone contacts in your Salesforce or other CRM system.
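A toy illustration of why this is a matching problem rather than a translation problem – emphatically not Glenbrook’s method, and with a hand-built, hypothetical lookup table – maps the phonetic Katakana rendering of a company name back to the canonical Latin-alphabet name your CRM already knows:

```python
import unicodedata

# Hypothetical table of known Katakana renderings; a real system would
# learn these mappings from language, culture and context, not a dict.
KNOWN_RENDERINGS = {
    "ブリヂストン": "Bridgestone",
    "トヨタ": "Toyota",
    "キヤノン": "Canon",
}

def canonical_company_name(name: str) -> str:
    """Return the CRM-canonical name for a Katakana rendering, if known."""
    # NFKC folds half-width Katakana and full-width Latin variants
    # into one form before the lookup.
    key = unicodedata.normalize("NFKC", name).strip()
    return KNOWN_RENDERINGS.get(key, key)

print(canonical_company_name("ブリヂストン"))  # Bridgestone
```

With the canonical name recovered, the Tokyo team’s records merge cleanly with the rest of the company’s Bridgestone contacts instead of landing as duplicates.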
Another – but certainly not the last – example of what their technology can do is finding data that is otherwise un-findable. In some cases the data exists on the web but is unstructured and unindexed, hidden behind forms, or without addressable URLs. In these cases, Glenbrook can prepare a routine to find, collect and interpret the data from multiple sources, and then synthesize it into a coherent dataset for use.
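To give a feel for the “structure data behind forms” idea, here is a minimal sketch under invented assumptions: suppose submitting a site’s search form returns an HTML page whose hits live in `<li class="result">` elements. A real deep-web pipeline (Glenbrook’s is patented and far more involved) would first discover the form, fill it in and submit it, then structure each results page roughly like this:

```python
from html.parser import HTMLParser

class ResultExtractor(HTMLParser):
    """Pull text out of <li class="result"> elements on a results page."""

    def __init__(self):
        super().__init__()
        self.in_result = False
        self.records = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == "li" and ("class", "result") in attrs:
            self.in_result = True

    def handle_data(self, data):
        if self.in_result and data.strip():
            self.records.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_result = False

# A stand-in for a page returned after a form submission.
page = '<ul><li class="result">Acme Corp</li><li class="result">Globex</li></ul>'
parser = ResultExtractor()
parser.feed(page)
print(parser.records)  # ['Acme Corp', 'Globex']
```

Each results page becomes a list of records that can then be merged with data from other sources into one coherent dataset.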
I could go on. But in short: for “impossible” data requirements, keep Glenbrook in mind.
Mobile: +1 303-506-6947