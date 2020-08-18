This is the web version of Eye on A.I., a weekly newsletter covering artificial intelligence and business. To get it delivered weekly to your in-box, sign up here.

Back in January, I wrote a big story about the ongoing revolution in natural language processing, or NLP: A.I. systems that can manipulate and, to some degree, “understand” language.

Language processing is now entering a kind of golden age, in which once impossible tasks are increasingly within reach. These new systems are already starting to transform how businesses operate—and they stand poised to do so in a much bigger way in the coming years.

This summer has seen some startling examples of what these methods can accomplish. The most discussed breakthrough has been OpenAI’s GPT-3, which can generate long passages of coherent prose from a human-written prompt of just a line or two. In many cases, what the system generates is indistinguishable from human-written text.

GPT-3 is, for the moment, still something of a party trick—it is difficult to control, for instance, whether what the system generates is factually accurate, or to filter out racist or misogynistic ideas that it might have picked up from its large training set (which included not only the complete works of Shakespeare, but such repositories of human virtue as Reddit). But some companies are starting to build real products around it: One is creating a system that will generate complete emails from just a few bullet points. And a legal technology firm is experimenting with GPT-3 to see if it can aid in litigation discovery and compliance.

Another San Francisco A.I. company, Primer, creates software that helps analyze documents. It counts a number of U.S. intelligence agencies among its customers. Today it unveils a website, Primer Labs, that showcases three NLP systems it built in the past year and allows anyone to upload any text to play around with the tech.

I had interviewed John Bohannon, Primer’s Director of Science, back in December for that feature about NLP. Last week, I caught up with him again by Zoom. Bohannon told me things have only accelerated since we first talked.

He describes what is happening in NLP as “an industrial revolution,” where it is now becoming possible to string together multiple NLP tools—much the same way a mechanical engineer might combine boilers, flywheels, conveyor belts and presses—to create systems that can do real work in real businesses. And building these systems is getting easier and easier. “What used to take months,” he says, “now takes a week.”

Bohannon gave me early access to Primer Labs to let me experiment on texts of my own choosing.

The first tool: question-answering.

Upload any document and you can then ask questions in natural language to prompt the system to find an answer in the text. The system also suggests questions that you might want to ask.

The software was fantastic at answering a series of questions about a simple news story on Joe Biden’s selection of Kamala Harris as his veep pick.

However, when I uploaded a 2012 Securities and Exchange Commission filing from the pharmaceutical giant Merck that runs to 159 pages and about 100,000 words, its performance was hit-and-miss. When I asked it what Merck’s sales were in 2011, it returned the correct answer: $48 billion. But when I asked it what the company’s operating profit was, I received a message that the software “was having trouble answering that particular question.” And when I asked it what the company’s revenue recognition policies were, I received the inaccurate but hilarious reply that “non-GAAP EPS is the company’s revenue recognition policies.”
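To make the idea of answering questions from a document concrete, here is a minimal sketch in Python. It is not Primer's method, which the company has not described here. It is a crude word-overlap heuristic: pick the sentence that shares the most content words with the question, and give up when nothing overlaps. The function names, the stopword list and the `min_overlap` threshold are all assumptions made for illustration.

```python
import re

# Hypothetical stopword list, kept tiny for illustration.
STOPWORDS = {"a", "an", "the", "is", "was", "were", "what", "who", "when",
             "where", "did", "of", "in", "to", "as", "his", "her", "for",
             "on", "and"}

def content_words(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}

def answer(question, document, min_overlap=1):
    """Return the sentence sharing the most content words with the question.

    Returns None when no sentence shares at least `min_overlap` words,
    a stand-in for a "having trouble answering" message.
    """
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    q = content_words(question)
    best = max(sentences, key=lambda s: len(q & content_words(s)))
    if len(q & content_words(best)) < min_overlap:
        return None
    return best

doc = ("Joe Biden selected Kamala Harris as his running mate. "
       "Merck reported sales of $48 billion in 2011.")
print(answer("What were Merck's sales in 2011?", doc))
print(answer("What was the operating profit?", doc))
```

A heuristic like this finds facts that are stated in the same words as the question and fails on anything that requires paraphrase or reasoning. That gap is a large part of why modern systems rely on trained language models instead.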

The next Primer tool: “named entity recognition.”

This is the task of identifying all the proper names in a document and figuring out which pronouns in the text refer to which people or which organizations. This task is relatively easy—if time-consuming—for humans, but it’s historically stumped computers. It is a good example of a skill that is now within software’s grasp thanks to the NLP revolution. In benchmark tests Primer has published, its system has outperformed similar software created by Google and Facebook.

I tried to stump Primer’s software by giving it a passage about the 19th-century French authors George Sand and Victor Hugo. I was hoping that the fact that Sand is the male nom de plume of a female writer (her real name was Amantine Lucile Aurore Dupin) would confuse the system when it had to decide whether the pronoun “he” belonged to Sand or Hugo. But, to my surprise, the system performed flawlessly, understanding that every “he” in the passage referred to Hugo while “she” referred to Sand.
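For a sense of why the pronoun task is hard, consider the simplest possible approach, sketched below in Python. It is a rule-based toy, not Primer's system: it links each “he” or “she” to the most recently mentioned person listed under that pronoun in a hand-written table. The table (`ENTITY_PRONOUN`), the alias map and the function name are hypothetical. The toy only works because the answer is typed in by hand, which is exactly the knowledge a real system has to infer from context.

```python
import re

# Hypothetical hand-written lexicon: the "knowledge" a real system must learn.
ENTITY_PRONOUN = {"George Sand": "she", "Victor Hugo": "he"}
ALIASES = {"Sand": "George Sand", "Hugo": "Victor Hugo"}

def resolve_pronouns(text):
    """Link each he/she to the most recent entity listed with that pronoun."""
    # Longest names first so "George Sand" is matched before "Sand".
    names = sorted(list(ENTITY_PRONOUN) + list(ALIASES), key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(n) for n in names) + r"|[Hh]e|[Ss]he)\b"
    )
    last_seen = {}  # pronoun -> most recently mentioned matching entity
    links = []
    for match in pattern.finditer(text):
        token = match.group(1)
        if token.lower() in ("he", "she"):
            links.append((token, last_seen.get(token.lower())))
        else:
            entity = ALIASES.get(token, token)
            last_seen[ENTITY_PRONOUN[entity]] = entity
    return links

passage = ("George Sand admired Victor Hugo. "
           "He wrote Les Misérables, and she wrote Indiana.")
print(resolve_pronouns(passage))
```

Remove the lookup table and the toy has nothing to go on. A name like “George” would push any naive gender guess toward “he,” which is the trap I was trying to set.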

The final and perhaps most difficult task Primer Labs’ tools perform: summarization.

Accurately summarizing long documents is difficult for humans too. And gauging how useful a summary is can be highly subjective. But Primer came up with a clever way to automatically judge summary quality based on BERT, a very large language model that Google created and has made freely available. BERT is what is known as a “masked language model,” because its training consists of learning how to correctly guess what a hidden word in a text is. Primer’s metric, called BLANC, judges summaries by assessing how much better BERT performs in this fill-in-the-blank game when it is given the summary. The better BERT does, the better the summary. Thanks to BLANC, Primer was able to train a summarization tool that can generate pretty fluent summaries.
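The logic of the fill-in-the-blank test can be sketched in a few lines of Python. This is a toy under loud assumptions, not Primer's implementation. In place of BERT, it uses a stand-in “guesser” that gets a hidden word right only if that word appears somewhere in the text it can see. The comparison against a contentless filler of the same length is also an assumption about how the baseline is set, as are the function names and the choice to hide only words of five letters or more.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def toy_guess_is_correct(hidden_word, visible_words):
    """Stand-in for BERT: 'guesses' right only if the word is visible."""
    return hidden_word in visible_words

def fill_in_accuracy(document, helper_text, min_len=5):
    """Hide each longer word in turn and count how often the guesser recovers it."""
    doc = tokenize(document)
    helper = set(tokenize(helper_text))
    hits = total = 0
    for i, word in enumerate(doc):
        if len(word) < min_len:
            continue  # hide only longer, content-like words
        visible = helper | set(doc[:i]) | set(doc[i + 1:])
        total += 1
        hits += toy_guess_is_correct(word, visible)
    return hits / total if total else 0.0

def blanc_like_score(document, summary):
    """Accuracy gain from seeing the summary instead of contentless filler."""
    filler = ". " * len(tokenize(summary))
    return fill_in_accuracy(document, summary) - fill_in_accuracy(document, filler)

article = ("AstraZeneca raced ahead of rivals to develop a vaccine. "
           "The company rebuilt its research pipeline.")
print(blanc_like_score(article, "AstraZeneca rebuilt its pipeline and developed a vaccine"))
print(blanc_like_score(article, "The weather was pleasant"))
```

A summary that carries the document's key terms raises the score, and an irrelevant one leaves it at zero. Swapping the toy guesser for a real masked language model is what turns this from a keyword check into a measure of how much the summary actually helps a reader reconstruct the text.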

I fed Primer’s summarization tool a feature story I wrote for the August/September double issue about how AstraZeneca has managed to leap ahead of its Big Pharma rivals in the quest for a COVID-19 vaccine. I was impressed at how well the software did in abstracting the lengthy article. It captured key points about AstraZeneca’s corporate turnaround as well as the importance of a COVID-19 vaccine.

But the system is still far from perfect. Another part of the tool tries to reduce the text to just a handful of key bullet points instead of whole paragraphs. Here the results were strangely off-base: The software fixated on factual information from an anecdote at the beginning of the article that was not essential, and yet missed crucial points contained further down in the body of the piece.

For a laugh, I fed the system T.S. Eliot’s “The Love Song of J. Alfred Prufrock.” Bohannon had warned me that the software would struggle to summarize more creative writing, particularly poetry, and the results were not pretty. Other than the fact that “the women come and go, talking of Michelangelo,” the system wasn’t really sure what was happening. A lot of high school students could probably sympathize. But no English teacher would give Primer’s results high marks. (Interestingly, GPT-3 isn’t half bad at writing poetry. But that doesn’t mean it has any real understanding of what it’s writing.)

Then again, poetry is probably not the most pressing business case for Primer’s products. Summarization is a huge potential market. In 1995, the average daily reading requirement of a U.S. intelligence analyst assigned to follow the events in one country was just 20,000 words (or about the equivalent of two New Yorker longreads). By 2016, the same analyst’s daily reading load was estimated at 200,000 words—more than the most capable speed reader could possibly skim in 24 hours. This phenomenon is affecting analysts in finance and law too, and is a huge issue for people in the sciences trying to keep up with the explosion in published research. (In fact, to help out during the pandemic, Primer has created a site that summarizes each day’s new research papers on COVID-19.)

So the NLP revolution has arrived not a moment too soon. Automated tools that help condense and summarize and extract information from written text are becoming more and more essential. Today’s NLP isn’t perfect—but it is getting good enough to make a difference.

And with that, here’s the rest of this week’s A.I. news.

This story has been updated to correct the year in which U.S. intelligence analysts’ average daily reading load was 20,000 words. It was 1995, not 1956.