Open-source and artificial intelligence (AI) developers and leaders agree that open-source AI is important. Despite the best efforts of the Open Source Initiative (OSI) to create an open-source AI definition (OSAID), there is still much disagreement on what should and shouldn’t be included in an OSAID. Springing from this disagreement, the newly formed Open Source Alliance (OSA) has released its take on OSAID: the Open Weight Definition (OWD).
The OWD is a new framework intended to strike a balance between closed AI and fully open-source AI. The framework is designed, its creators say, to address the complexities and challenges posed by the rapid development of AI technology. It aims to provide a clear standard for what constitutes “open source” in AI models, particularly large language models (LLMs).
Weights are fundamental components in AI. They are the numerical values associated with the connections between nodes across different layers of an AI model. These values are learned from the training data during the machine learning training process. Specifically, the OWD includes:
- Model Weights Accessibility: The definition emphasizes making model weights available to developers and researchers.
- Dataset Information: While not requiring full access to training data, the definition stresses the need for detailed information about dataset contents and collection methods.
- Architecture Transparency: The framework encourages disclosure of model architecture information to facilitate improvements and modifications.
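To make the term concrete, here is a minimal Python sketch of what "weights" are in practice. It is an illustration only, not anything taken from the OWD: the tiny two-layer network, the function names (`layer`, `forward`, `count_weights`), and every number in it are assumptions made up for this example. Real models learn their weights during training and hold billions of them.

```python
# Toy two-layer network. The "weights" are simply the numbers in these lists.
# All values are invented for illustration; real weights are learned in training.

# Weights connecting 3 input nodes to 2 hidden nodes (one row per hidden node).
hidden_weights = [
    [0.5, -1.0, 0.25],
    [1.5, 0.0, -0.5],
]

# Weights connecting the 2 hidden nodes to 1 output node.
output_weights = [
    [1.0, -2.0],
]


def relu(x):
    """Common activation function: pass positives through, clamp negatives to 0."""
    return max(0.0, x)


def layer(weights, inputs, activation=None):
    """Compute one layer: each output node is a weighted sum of the inputs."""
    outputs = []
    for row in weights:
        total = sum(w * x for w, x in zip(row, inputs))
        outputs.append(activation(total) if activation else total)
    return outputs


def forward(inputs):
    """Run the inputs through the hidden layer, then the output layer."""
    hidden = layer(hidden_weights, inputs, relu)
    return layer(output_weights, hidden)


def count_weights(*matrices):
    """Count the individual numbers across all weight matrices."""
    return sum(len(row) for matrix in matrices for row in matrix)


if __name__ == "__main__":
    print(forward([2.0, 1.0, 4.0]))
    print(count_weights(hidden_weights, output_weights))
```

Publishing a model's weights amounts to publishing lists like `hidden_weights` and `output_weights`, only vastly larger. The architecture (here, the `forward` function) tells you how to combine them, and neither one reveals what data produced the numbers, which is why the definition treats weights, architecture, and dataset information as separate items.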
Amanda Brock, OpenUK’s CEO, said she supports the OWD: “The Alliance is being driven to broaden the engagement across multiple organizations currently competing to ensure better global collaboration. This first step of sharing an approach to defining open weights is in line with the disaggregation of AI and defining the level of openness of the disaggregated but critical component, whether that be data, weight, or model. … It certainly seems to be more practical and workable than a small group creating a definition that isn’t fit for purpose.”
This final comment was in reference to the OSI’s OSAID, which Brock has opposed. Indeed, the OSA has seized upon the open-source AI issue to attempt to replace the OSI. In January, its founder, Sam Johnston, said in a press release: “Data has tested the boundaries of the Open Source Definition (OSD), which is proven on openness but lacking on completeness beyond source code components.” By adding OWD to the OSD, Johnston wants to create an Open Source 2.0.
Brock added that despite the publication of the OSAID last October, “the OSI is ‘at the start of the journey’ with the definition. In my mind, this shows that the approach of trying to define ‘open source AI’ is wrong. Rather we should follow this disaggregated approach to the challenge and look at the underlying ‘technology,’ including the training data, and what it means to be open. Open source doesn’t define law, and it should not. It’s about what enables anyone to use the technology’s ‘source’, including data for any purpose.”
Brock concluded: “The reality and accuracy of this must be understood in assessing risk and liability. So for today, The Alliance announcement of a definition of open weight is a welcome one.”
In response to the OWD announcement, Stefano Maffulli, the OSI’s executive director, said: “Communities build standards and definitions. The Linux Foundation community already has a definition of open weights in the Model Openness Framework.”
The Linux Foundation isn’t the only party that has addressed open-weight standardization. Prominent open-source lawyer Heather Meeker has also weighed in. Meeker wrote: “In the realm of AI, there’s a fundamental misunderstanding that needs to be addressed — the assumption that the principles of open source software licensing can directly apply to Neural Net Weights (NNWs). The misconception stems from conflating two different artifacts — software source code and NNWs.”
She continued: “NNWs are different. They represent the ‘knowledge’ an artificial neural network has learned and are often stored as large matrices of numbers. Unlike source code, NNWs are not human-readable or debuggable. […] Open source’s foundational freedoms — to run, study, redistribute, and modify software — do not translate easily to NNWs. While you can run and distribute NNWs, studying and modifying them is non-trivial, or functionally impossible.”
You can share NNWs under an open-source-style license, such as Meeker’s proposed Open Weights Permissive License. But, as she noted: “This definition focuses instead on the original idea of openness, and preserving the original goals of Freedom Zero of free software and open source.”
Maffulli said: “The OSI is watching to see what AI practitioners actually do. Like the LF’s work, OSI’s definitions are developed by and with the community. This was the case with the original Open Source Definition that was developed on top of 20+ years of free software communities building and releasing software. It’s what we’ve done with AI: the community has led the process to define Open Source AI.”
In an interview, Meeker added: “I hope the various definitional efforts (the OSI’s Open Source AI definition, The Open Weights Definition I first published in 2022, and this new definition) can converge. Unfortunately, though, it seems likely none of these definitions will become a de facto standard like the Open Source definition — they have all been eclipsed by disparate regulatory frameworks and privacy regulations and vendors who are setting practices in a highly concentrated market.”
What this boils down to is that we’re still debating what precisely open-source AI looks like. True, open-source leaders agree that simply calling an AI program or its data open source, as Meta did with Llama, doesn’t make it so. But we’re still nowhere close to finding unity in an open-source AI definition.