Training an Arabic LLM that reflects local values

Advances in the large language models that underpin generative AI are changing everything, from medicine and education to entertainment.

Our relationship with technology is becoming more intimate as machines change from passive tools into active assistants that amplify our innate human abilities.

This new era poses both a challenge and an opportunity for the Middle East.

The challenge is that the leaders in this new field, such as OpenAI’s ChatGPT and Google’s Gemini, come from Silicon Valley, or from China, where my team at 01.AI has built models that rival their American counterparts. In Europe, too, startups such as France’s Mistral have entered the race.

The opportunity is for the Middle East to join this league and make sure its voice is heard.

Inspired by my latest trip to Riyadh, I decided to test how the current crop of AI models would handle a simple request. I imagined myself as a young Saudi getting ready to host a dinner party and asked ChatGPT to prepare a menu.

The food it recommended sounded delicious — stuffed grape leaves, tabouleh salad, mandi and stuffed dates. But the beverages were a problem.

Aside from drinks such as mint lemonade and jallab, a mixture of dates, grape molasses and rose water, ChatGPT also offered this: “For alcoholic beverages, you could offer a selection of international wines, beers, or non-alcoholic mocktails.”

To its credit, when I repeated the question, it offered only non-alcoholic drinks.

If a model recommends breaking both the law and cultural norms, imagine how it might answer more sensitive questions about politics or religion. Indeed, researchers have shown that some models exhibit anti-Muslim bias.

My modest test underlines the urgent need to develop an Arabic large language model that reflects local values.

The first step to building this is creating enough high-quality Arabic digitized data to properly train a new generation of models.

Although there are 400 million Arabic speakers, only an estimated 2 percent of online content is in Arabic. Meta’s open-source LLM, Llama, is trained overwhelmingly on English data, with Arabic making up less than 0.1 percent of it.

This scarcity naturally skews the results. To fix it, either a visionary entrepreneur or a government-backed organization should collect and digitize the many existing Arabic books and convert them into training data for Arabic models.
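Figures such as the "less than 0.1 percent" share depend on how a corpus is counted. The Python sketch below is only a rough illustration of how one might estimate the Arabic share of a collection of digitized texts. It counts the letters that fall in the main Arabic Unicode blocks. The function names and the choice of blocks are assumptions made for this example. Counting characters is also a simplification, because training-data statistics are often reported in tokens or documents instead.

```python
# Main Arabic Unicode blocks. This is an assumption for illustration:
# rarer ranges such as Arabic Extended-B and the legacy presentation
# forms are deliberately left out.
ARABIC_RANGES = [
    (0x0600, 0x06FF),  # Arabic
    (0x0750, 0x077F),  # Arabic Supplement
    (0x08A0, 0x08FF),  # Arabic Extended-A
]


def is_arabic(ch: str) -> bool:
    """Return True if the character's code point lies in an Arabic block."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ARABIC_RANGES)


def arabic_share(documents: list[str]) -> float:
    """Fraction of alphabetic characters across all documents that are Arabic.

    Digits, punctuation, spaces and combining diacritics are skipped
    because str.isalpha() is False for them.
    """
    arabic = total = 0
    for doc in documents:
        for ch in doc:
            if ch.isalpha():
                total += 1
                if is_arabic(ch):
                    arabic += 1
    return arabic / total if total else 0.0


if __name__ == "__main__":
    sample = ["hello", "مرحبا"]  # five Latin letters, five Arabic letters
    print(arabic_share(sample))
```

For example, `arabic_share(["hello", "مرحبا"])` returns `0.5`, since each string contains five letters. A real data pipeline would also need deduplication and quality filtering before any such measurement means much.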

Once the data is gathered, it can be fed into pre-training, the breakthrough process in which a model reads trillions of words and builds its own virtual concept space, or model of the world. This concept space has been shown to be mostly in English and Chinese.

Adding a sizable number of texts in Arabic, which has enormous cultural output and significance, will make the concept space more knowledgeable about Arabic and more balanced in its concepts and views.

After pre-training, the model needs to be fine-tuned with data and labels from the Arab world so that it aligns with the values of the region. This would set it apart from American models, which are aligned with US values, and Chinese models, which reflect Chinese values.

The collection of alignment data, the coordination of human labeling and the alignment process will need to be done in-region by AI experts.

Finally, safety modules will need to be added to ensure legal compliance and to avoid harm. These will also need to be developed locally.

The above steps will create localized, sovereign models that reflect the traditions of the Middle East. Whether privately developed or government-backed, such a model could be the foundation for a new wave of Arabic AI innovation.

A new Arabic-enhanced large language model could encourage entrepreneurs and developers to build new applications tailored to the needs of their nations.

Imagine an AI tool that could find, summarize, organize and write insightful content, an AI teacher that makes learning fun and customized, an AI doctor that is more knowledgeable than any human, an AI engineer that can write software and applications, and an AI assistant that knows its owner better than the owner themselves.

The Arab world did not play a leading role in the PC, internet and mobile eras. In the AI era, it will be different.

This transformation will be no easy feat. It will require an unprecedented investment of money, energy and human capital.

Middle Eastern leaders such as Saudi Crown Prince Mohammed bin Salman have shown that they have the vision, determination and resources to lead their countries into the future.

Standing on my hotel balcony in Jeddah recently, overlooking the King Abdullah University of Science and Technology, I saw part of that vision coming to fruition.

Universities such as KAUST and the Mohamed bin Zayed University of Artificial Intelligence in the UAE are striking examples of the resources that have already been poured into this transformation.

These world-class academic institutions can attract and retain top-tier global talent. It is especially important to bring in the world’s best computer engineers to help fulfill this vision of an AI-powered future.

Our team at 01.AI has shown what a group of talented and motivated computer scientists can achieve in just one year. With the right commitment of resources and drawing upon the best talent, countries like Saudi Arabia can easily catch up with their global peers.

The Middle East can also lead the world in the use of renewables to run power-hungry generative AI models.

As it seeks to diversify its economy, Saudi Arabia is actively promoting the use of alternative energy sources such as solar, which could power server farms and reduce their carbon footprint — a growing concern as AI becomes more widespread.

It may take time for countries to figure out their strategy for building a sovereign AI. But it is critical for the Arab world to quickly catalyze the creation of culturally appropriate LLMs and build a rich ecosystem to allow AI-powered Arabic apps to blossom.

A recent encounter with a female sales assistant at a computer store in Riyadh served as an apt reminder of what is at stake. Dressed in jeans and sporting a tattoo, she embodied the transformative changes that the country is undergoing.

“Where are you from?” I asked. “I’m Saudi,” she said. “One day I want to be Saudi Arabia’s Elon Musk.” I hope that on my next visit she will pitch me a homegrown AI app.

Kai-Fu Lee is a computer scientist, CEO of 01.AI, chairman of Sinovation Ventures, former president of Google China, and author of “AI 2041” and “AI Superpowers”
 

Disclaimer: Views expressed by writers in this section are their own and do not necessarily reflect Arab News' point of view

Ukraine, US teams ready to meet in Saudi Arabia in ‘coming days’: Zelensky

Kyiv, Ukraine: Officials from Ukraine and the United States could meet in Saudi Arabia in the coming days for a second round of peace talks, Ukrainian President Volodymyr Zelensky said Wednesday.
“Ukrainian and American teams are ready to meet in Saudi Arabia in the coming days to continue coordinating steps toward peace,” Zelensky wrote on X.

PM Sharif hails investor confidence as Pakistan Stock Exchange crosses 119,000 points

  • KSE-100 index soared by over 1,400 points after trading commenced, hitting an all-time high of 119,421.81
  • Shehbaz Sharif says his government is trying to ensure conducive business and investment environment

ISLAMABAD: Prime Minister Shehbaz Sharif on Thursday lauded the Pakistan Stock Exchange (PSX) for surpassing the significant 119,000-point threshold for the first time, saying the market’s bullish opening session mirrored growing investor confidence.
The benchmark KSE-100 index soared by over 1,400 points shortly after trading commenced, reaching an all-time high of 119,421.81 points.
The market rally coincides with Sharif’s four-day visit to Saudi Arabia, where he met separately with Crown Prince Mohammed bin Salman and Investment Minister Khalid Al-Falih to discuss strengthening economic cooperation.
“The positive trajectory in the Pakistan Stock Exchange signifies the increasing trust of traders and investors in the government’s economic policies,” the prime minister said in a statement issued by his office from Jeddah.
“The government is providing all necessary facilities on a priority basis to ensure a conducive environment for business and investment in the country,” he added.
Earlier this month, an International Monetary Fund (IMF) team concluded its review of Pakistan’s economic reforms under a $7 billion loan program.
The international lender described Pakistan’s progress as “strong,” though its mission departed without finalizing the staff-level agreement.
The IMF’s positive assessment, nevertheless, led to bullish sentiment in the market, despite recent upticks in militant violence.
The ongoing PSX session saw the KSE-100 index dip below its earlier high, trading at 118,526.63 points at the time of filing this report.


One person dies as migrants aim to cross English Channel
Updated 20 March 2025
  • Both the British and French governments have made tackling migrants crossing the English Channel illegally a high priority

PARIS: One person has died after a boat carrying migrants trying to cross the English Channel from France got into difficulties overnight, said a local French authority on Thursday.
The French local authority responsible for the North Sea and English Channel regions said 15 people had been rescued and brought back to shore at the port of Gravelines, near Dunkirk.
Both the British and French governments have made tackling migrants crossing the English Channel illegally – often in perilous conditions as they travel in dinghies or small boats – a high priority.
Data in January showed Britain’s Labour government had removed 16,400 illegal migrants since coming to power last July, marking the highest rate of such removals since 2018, although Labour’s political opponents say the government needs to do more.


At least 70 Palestinians killed in Israeli strikes across Gaza, health authorities say

  • Medics say Israeli strikes targeted several houses in northern and southern areas of the Gaza Strip
  • Since Tuesday, airstrikes have killed 510 Palestinians, with more than half of them women and children

GAZA/CAIRO: At least 70 Palestinians were killed and dozens wounded in Israeli airstrikes across Gaza on Thursday, after Israel resumed its bombing campaign on the enclave, a Gaza health official said.

Medics said Israeli strikes targeted several houses in northern and southern areas of the Gaza Strip. There was no immediate comment from Israel.

On Wednesday, the Israeli military said its forces had resumed ground operations in central and southern Gaza, after a ceasefire that had broadly held since January collapsed.

The renewed ground operations came a day after more than 400 Palestinians were killed in airstrikes in one of the deadliest episodes since the beginning of the conflict in October 2023.

Since Tuesday, airstrikes have killed 510 Palestinians, with more than half of them women and children, the health official said.

The Israeli military said its operations extended Israel’s control over the Netzarim Corridor, which bisects Gaza, and were a “focused” maneuver aimed at creating a partial buffer zone between the north and the south of the enclave.

The Palestinian militant group Hamas said the ground operation and the incursion into the Netzarim Corridor were a “new and dangerous violation” of the two-month-old ceasefire agreement. In a statement, the group reaffirmed its commitment to the deal and called on mediators to “assume their responsibilities.”

Palestinian mourners pray over the bodies of victims of overnight Israeli airstrikes on the Gaza Strip at Al-Ahli Arab hospital, also known as the Baptist hospital, in Gaza City ahead of their burial on March 18, 2025. (AFP)

Speaking to Reuters on Thursday, a Hamas official said mediators had stepped up their efforts with the two warring sides but added that “no breakthrough has yet been made.”

The group has made no clear threats to retaliate.

The war started after Hamas militants attacked Israeli communities on October 7, 2023, killing 1,200 and taking more than 250 hostages, by Israeli tallies.

Activists gather on Wall Street in front of a property owned by President Donald Trump following renewed attacks on Gaza by Israel on March 19, 2025 in New York City. (Getty Images via AFP)

More than 49,000 Palestinians have been killed in the ensuing conflict, according to Gaza’s health authorities, with the enclave reduced to rubble.


Sudan army close to taking control of Presidential Palace from RSF, state TV says

  • Marks a significant shift in the two-year-old conflict that threatens to fracture the country
  • The war has led to what the UN calls the world’s largest humanitarian crisis

Sudan’s army is close to taking control of the Presidential Palace in Khartoum from the paramilitary Rapid Support Forces, state TV reported on Thursday, in a significant milestone in a two-year-old conflict that threatens to fracture the country.

The RSF quickly took the palace and most of the capital at the outbreak of war in April 2023, but the Sudanese Armed Forces have in recent months staged a comeback and inched toward the palace along the River Nile.

The RSF, which earlier this year began establishing a parallel government, maintains control of parts of Khartoum and neighboring Omdurman, as well as western Sudan, where it is fighting to take control of the army’s last stronghold in Darfur, Al-Fashir.

The taking of the capital could hasten the army’s full takeover of central Sudan, and harden the east-west territorial division of the country between the two forces.

Both sides have vowed to continue fighting for the remainder of the country, and no efforts at peace talks have materialized.

The war erupted amid a power struggle between Sudan’s army and the RSF ahead of a planned transition to civilian rule.

World’s largest humanitarian crisis

The conflict has led to what the UN calls the world’s largest humanitarian crisis, causing famine in several locations and disease across the country. Both sides have been accused of war crimes, while the RSF has also been charged with genocide. Both forces deny the charges.

The fight for the Presidential Palace has raged over the past several weeks, with the RSF fighting fiercely to maintain control, including via snipers placed around surrounding downtown buildings. Its leader, Mohamed Hamdan Dagalo, instructed troops earlier this week not to give up the palace.

Late on Wednesday into Thursday morning, explosions could be heard from airstrikes and drone attacks by the army targeting central Khartoum, witnesses and military sources told Reuters. The army has long maintained the advantage of air power over the RSF, though the paramilitary group has shown evidence of increased drone capabilities recently.

On the Telegram messenger app, the RSF said its forces were making advances toward the Army General Command, also in central Khartoum, and eyewitnesses said the force was attacking from southern Khartoum.

The army’s advance in central Sudan since late last year has been welcomed by many people, who had been displaced by the RSF, which has been accused of widespread looting and arbitrary killings, and of occupying homes and neighborhoods.

The RSF denies the charges and says individual perpetrators will be brought to justice.

Hundreds of thousands of people have returned to their homes in central Sudan, though late on Wednesday activists in Omdurman warned that some soldiers have engaged in robbery. The military has routinely denied such allegations.