Facebook researchers use maths for better translations

Updated 13 October 2019


  • Facebook researchers say rendering words into figures and exploiting mathematical similarities between languages is a promising avenue
  • Allowing as many people as possible worldwide to communicate is not just an altruistic goal, but also good business

PARIS: Designers of machine translation tools still mostly rely on dictionaries to make a foreign language understandable. But now there is a new way: numbers.

Facebook researchers say rendering words into figures and exploiting mathematical similarities between languages is a promising avenue — even if a universal communicator a la Star Trek remains a distant dream.

Powerful automatic translation is a big priority for Internet giants. Allowing as many people as possible worldwide to communicate is not just an altruistic goal, but also good business.

Facebook, Google and Microsoft as well as Russia’s Yandex, China’s Baidu and others are constantly seeking to improve their translation tools.

Facebook has artificial intelligence experts on the job at one of its research labs in Paris. Up to 200 languages are currently used on Facebook, said Antoine Bordes, European co-director of fundamental AI research for the social network.

Automatic translation is currently based on having large databases of identical texts in both languages to work from. But for many language pairs there just aren’t enough such parallel texts.

That’s why researchers have been looking for another method, like the system developed by Facebook which creates a mathematical representation for words.

Each word becomes a “vector” in a space of several hundred dimensions. Words that have close associations in the spoken language also find themselves close to each other in this vector space.

“For example, if you take the words ‘cat’ and ‘dog’, semantically, they are words that describe a similar thing, so they will be extremely close together physically” in the vector space, said Guillaume Lample, one of the system’s designers.

“If you take words like Madrid, London, Paris, which are European capital cities, it’s the same idea.”
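
As a rough illustration of what "close together" means here, the sketch below measures cosine similarity between word vectors. The three-dimensional vectors and the helper names (`toy_vectors`, `cosine_similarity`, `nearest`) are invented for this example and are not taken from Facebook's system, which works with several hundred dimensions learned from text.

```python
import math

# Toy 3-dimensional vectors, made up for illustration only. Real word
# vectors have several hundred dimensions and are learned from large
# amounts of text.
toy_vectors = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "Paris": [0.1, 0.2, 0.9],
    "London": [0.2, 0.1, 0.95],
}


def cosine_similarity(u, v):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(word, vectors):
    """Return the other word whose vector is most similar to the given word's."""
    return max(
        (w for w in vectors if w != word),
        key=lambda w: cosine_similarity(vectors[word], vectors[w]),
    )


if __name__ == "__main__":
    for word in toy_vectors:
        print(word, "->", nearest(word, toy_vectors))
```

With these made-up numbers, the animals pair up with each other and so do the capitals, which is the behavior Lample describes.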

These language maps can then be linked to one another using algorithms — at first roughly, but eventually becoming more refined, until entire phrases can be matched without too many errors.
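
One simple way to picture the linking step is to fit a rotation that lays one language map over the other. The sketch below is a toy under loud assumptions: the two-dimensional maps are invented, the French map is the English one turned by 90 degrees by construction, and the fit uses two known word pairs as anchors, which is exactly the kind of dictionary help the researchers are trying to do without. It shows a two-dimensional special case of a rotation fit (orthogonal Procrustes), not the article's actual algorithm.

```python
import math

# Hypothetical 2-D "language maps". By construction, each French point is
# the matching English point rotated by 90 degrees counterclockwise.
english = {"cat": (1.0, 0.0), "dog": (0.9, 0.2), "paris": (0.0, 1.0), "london": (0.2, 0.9)}
french = {"chat": (0.0, 1.0), "chien": (-0.2, 0.9), "paris": (-1.0, 0.0), "londres": (-0.9, 0.2)}

# Assumed anchor pairs used only to fit the rotation.
seed_pairs = [("cat", "chat"), ("paris", "paris")]


def fit_rotation(pairs, src, tgt):
    """Return the angle that best rotates source points onto their targets."""
    dot = cross = 0.0
    for s_word, t_word in pairs:
        xs, ys = src[s_word]
        xt, yt = tgt[t_word]
        dot += xs * xt + ys * yt
        cross += xs * yt - ys * xt
    return math.atan2(cross, dot)


def rotate(point, angle):
    """Rotate a 2-D point counterclockwise by the given angle in radians."""
    x, y = point
    return (x * math.cos(angle) - y * math.sin(angle),
            x * math.sin(angle) + y * math.cos(angle))


def translate(word, angle, src, tgt):
    """Map a source word into the target space and return the closest target word."""
    mapped = rotate(src[word], angle)
    return min(tgt, key=lambda w: math.dist(mapped, tgt[w]))


if __name__ == "__main__":
    angle = fit_rotation(seed_pairs, english, french)
    for word in ("dog", "london"):
        print(word, "->", translate(word, angle, english, french))
```

Here the words left out of the anchor pairs, "dog" and "london", still land next to their French counterparts once the maps are lined up. Real language maps are far noisier and never match this neatly.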

Lample said results are already promising. For the English-Romanian language pair, Facebook’s current machine translation system is “equal or maybe a bit worse” than the word vector system, he said.

But for the rarer language pair of English-Urdu, where Facebook’s traditional system doesn’t have many bilingual texts to reference, the word vector system is already superior, he said.

But could the method allow translation from, say, Basque into the language of an Amazonian tribe? In theory, yes, said Lample, but in practice a large body of written text is needed to map the language, something lacking in Amazonian tribal languages.

“If you have just tens of thousands of phrases, it won’t work. You need several hundreds of thousands,” he said.

Experts at France’s CNRS national scientific center said the approach Lample has taken for Facebook could produce useful results, even if it doesn’t result in perfect translations.

Thierry Poibeau of CNRS’s Lattice laboratory, which also does research into machine translation, called the word vector approach “a conceptual revolution.”

He said “translating without parallel data” — dictionaries or versions of the same documents in both languages — “is something of the Holy Grail” of machine translation.

“But the question is what level of performance can be expected” from the word vector method, said Poibeau. The method “can give an idea of the original text” but the capability for a good translation every time remains unproven.

Francois Yvon, a researcher at CNRS’s Computer Science Laboratory for Mechanics and Engineering Sciences, said “the linking of languages is much more difficult” when they are far removed from one another.

“The manner of denoting concepts in Chinese is completely different from French,” he added.

However, even imperfect translations can be useful, said Yvon, and could prove sufficient to track hate speech, a major priority for Facebook.


US impeachment hearings grab media spotlight



  • Televised hearings into allegations about Trump’s dealings with Ukraine will begin in the US this week

WASHINGTON: This week will mark a new and unparalleled chapter in Donald Trump’s tumultuous presidency, as the Democratic-led impeachment probe goes public with televised hearings into allegations about Trump’s dealings with Ukraine.

Beginning on Wednesday, three witnesses will publicly detail their concerns, previously expressed behind closed doors, that the Trump administration sought to tie military aid to Ukraine to an investigation of the Republican president’s potential Democratic rival for the presidency, Joe Biden.

The testimony will be carried by major broadcast and cable networks and is expected to be viewed by millions, who will watch current and former officials from Trump’s own administration begin to outline a case for his potential removal from office.

It has been 20 years since Americans last witnessed impeachment proceedings, when Republicans brought charges against then-Democratic President Bill Clinton.

Democrats in the US House of Representatives argue Trump abused his authority in pressing the Ukrainian government to investigate Biden and his son Hunter, who was on the board of a Ukrainian energy company, Burisma.

Representative Eric Swalwell, a Democrat on the House Intelligence Committee, which will hold the hearings on Wednesday and Friday this week, accused Trump on Sunday of “extortion.”

“We have enough evidence from the depositions that we’ve done to warrant bringing this forward, evidence of an extortion scheme, using taxpayer dollars to ask a foreign government to investigate the president’s opponent,” Swalwell said on CBS’ “Face the Nation.”

Trump argued on Twitter that he was not guilty of misconduct and that the probe was politically driven. “NOTHING WAS DONE WRONG!” he wrote on Sunday.

Democrats consider the open hearings to be crucial to building public support for a formal impeachment vote against Trump. If that occurs, the Republican-controlled Senate would hold a trial on the charges. Republicans have so far shown little support for removing Trump from office, which would require a two-thirds vote in the Senate.

The House Intelligence Committee will first hear from William Taylor, the top US diplomat in Ukraine, who told the committee in closed-door testimony that he was unhappy US aid to the country was held up by the administration.

Taylor said he also became uncomfortable with what he described as an “irregular channel” of people involved in Ukraine policy, including Rudy Giuliani, the president’s personal lawyer.

George Kent, a senior State Department official who oversees Ukraine, will appear at Wednesday’s hearing as well. Kent was also concerned about Giuliani’s role in conducting shadow diplomacy, and has testified that he was cut out of the decision-making loop on Ukraine matters.

On Friday, the committee will hear from former US Ambassador to Ukraine Marie Yovanovitch. She says she was ousted from her post after Giuliani and his allies mounted a campaign against her with what she called “unfounded and false claims by people with clearly questionable motives.”

Democrats are likely to call further witnesses after this week.

House Republicans on Saturday released their list of witnesses they would like brought before the committee, including Hunter Biden and the yet-unnamed whistleblower who first brought the complaint against Trump over his July 25 call with Ukrainian President Volodymyr Zelenskiy.

Intelligence Committee Chairman Adam Schiff, a Democrat, is unlikely to summon either to testify, and even some Republicans have opposed the push from Trump and some of his supporters that the whistleblower be identified.

“I think we should be protecting the identity of the whistleblower,” Will Hurd, a former CIA officer and a Republican member of the committee, said on the “Fox News Sunday” program, “because how we treat this whistleblower will impact whistleblowers in the future.”