‘Open source’ AI isn’t truly open — here’s how researchers can reclaim the term


Some 50 years ago this month, the Homebrew Computer Club — a do-it-yourself group of computer enthusiasts and hobbyists — began meeting in Menlo Park, California, fostering a culture of collaboration, knowledge exchange and the open sharing of software. These values, which helped to shape the open-source movement, are now being subverted by some artificial intelligence (AI) companies.

Many foundational AI models are labelled as ‘open source’ because their architecture, including the neural networks’ structure and design, is made freely available. Yet little information is disclosed about how the models were trained. As executive director of the Open Source Initiative (OSI), based in Palo Alto, California, my priority since 2022 has been to clarify what the term actually means in the AI era.

Decades of free access to non-proprietary software, such as RStudio for statistical computing and OpenFOAM for fluid dynamics, have hastened scientific discovery. Open-source software protects research integrity by ensuring reproducibility. It also fosters global collaboration, allowing scientists to freely share data and solutions.

Conventional open-source licences are built around source code, which is easy to share with full transparency, but AI systems are different. They rely heavily on training data that often come from proprietary sources or are protected by privacy laws, such as those covering health-care information.

As AI drives discoveries in fields ranging from genomics to climate modelling, the lack of a robust consensus on what is and isn’t open-source AI is worrying. In the future, the scientific community could find its access limited to closed corporate systems and unverifiable models.

For AI systems to align with typical open-source software, they must uphold the freedom to use, study, modify and share their underlying models. Although many AI models that use the ‘open source’ tag are free to use and share, the inability to access the training data and source code severely restricts deeper study and modification. For example, an analysis by OSI found that several popular large language models, such as Llama 2 and Llama 3.x (developed by Meta), Grok (X), Phi-2 (Microsoft) and Mixtral (Mistral AI), are incompatible with open-source principles. By contrast, models such as OLMo, developed by the Allen Institute for AI, a non-profit organization in Seattle, Washington, and community-led projects such as LLM360’s CrystalCoder — a language model tailored to perform both programming and natural-language tasks — better uphold OSI’s vision of open source.

One main reason why some companies might be misusing the open-source label is to sidestep provisions of the European Union’s 2024 AI Act, which exempts free and open-source software from strict scrutiny. This practice of claiming openness while restricting access to key components, such as information about the training data, is called openwashing.


