Inception, a subsidiary of the Abu Dhabi-based artificial intelligence firm G42, together with the California-based company Cerebras and the Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), has developed a large language model (LLM) called Jais.
The model is named after the highest peak in the UAE. It has 13 billion parameters and was trained over three weeks on a supercomputer jointly developed by G42 and Cerebras. The training dataset includes 116 billion Arabic tokens and 279 billion English tokens. According to the creators of the bilingual model, it captures the complexity and nuances of the Arabic language.
Andrew Jackson, CEO of Inception, made the bold claim that the model, developed with the participation of his company's specialists, sets the standard for AI development in the Middle East and guarantees that the Arabic language, with all its depth and historical heritage, will have its place in the digital world.
MBZUAI President Eric Xing said that building an LLM of this caliber required advanced AI research as well as a detailed understanding of the Arabic language in all its diversity and with all its specific features.
The UAE Ministry of Industry and Advanced Technology, the Abu Dhabi Department of Health, the UAE Ministry of Foreign Affairs, First Abu Dhabi Bank, Abu Dhabi National Oil Company, Etihad Airways, and the technology group e& have become partners of Jais.
The new model is intended for the more than 400 million Arabic speakers around the world. It is not, however, the first LLM launched in the Middle East. In the UAE, the state-run Technology Innovation Institute, located in Masdar City, has already released a separate open-source language model called Falcon. Jais handles Arabic more accurately, because the institute's model was not trained on Arabic.
Jais was also built from the outset with the goal of having a foundation that is not centered on the United States. As a result, the model has a better grasp of the socio-cultural context of the Middle East and of the worldview common in the region. Jais can generate content in Modern Standard Arabic and in various Arabic dialects.
The developers also claim that the new LLM is competitive with English-language models of a similar size, even though it was trained on fewer English tokens.
The Jais team said that what the model learned from the English data informed its understanding of the Arabic data, and vice versa.
Many advanced LLMs, including OpenAI's GPT-4, Google's PaLM, and Meta's open-source LLaMA, also understand Arabic and can generate content in it. As generative AI is commercialized worldwide, the technology is increasingly being adapted to different cultural settings, and such adaptation is itself a competitive advantage in the AI market.
Work is already underway on a multilingual LLM for global telecommunications companies. According to experts, developers working on such projects face a shortage of high-quality online data in languages other than English.
Jais was trained on material published by Arab media and on content posted on social media platforms. The training data also included Arabic instructions alongside code that is predominantly in English.
Analysts expect that as more countries develop their own LLMs, new differences will emerge in what these models are able to generate. In China, for example, where rules regulating the AI industry came into force in August, systems are already banned from generating material that, in the authorities' view, contains narratives threatening state stability.
The UAE has imposed similar restrictions on Jais. The model will not produce material whose content conflicts in one way or another with the cultural and religious values of the Middle East. Nor will Jais generate content that is inconsistent with the value systems of those involved in creating the LLM.
The involvement of the UAE National Security Adviser, Sheikh Tahnoon bin Zayed al-Nahyan, in the development of the model has raised concern that the technology could become a tool for advancing the political aims of Middle Eastern autocrats. Such concerns about how Jais might be used have led the United States to restrict exports of NVIDIA artificial intelligence chips to several countries in the Middle East, without specifying which states are affected.
As we reported earlier, OpenAI has launched ChatGPT for businesses.