The battle between open source and proprietary software is well known, but the tensions that have permeated the software industry for decades have also found their way into the burgeoning field of artificial intelligence, where they’ve sparked fierce debate.
The New York Times recently gave Meta CEO Mark Zuckerberg a glowing write-up, noting that his embrace of “open source AI” has made him popular in Silicon Valley once again. The problem is that Meta’s Llama-branded large language models aren’t really open source.

Or are they?
By most estimates, no. But the question highlights how the concept of “open source AI” is only going to become more contentious in the years ahead. This is something the Open Source Initiative (OSI), under the leadership of executive director Stefano Maffulli (pictured above), has been tackling for more than two years through a global effort spanning conferences, workshops, panels, webinars, and reports.
AI is not software code
OSI has stewarded the Open Source Definition (OSD) for more than 25 years, establishing how the term “open source” applies, or should apply, to software. Licenses that meet this definition range from very permissive to not so permissive.
But transferring traditional software licensing and naming conventions to AI is problematic. Joseph Jacks, open source evangelist and founder of VC firm OSS Capital, goes so far as to say that “there is no such thing as open source AI,” noting that “open source was invented explicitly for software source code.”
By contrast, “neural network weights” (NNWs) is the term used in the AI world for the parameters or coefficients a network learns during training, and they cannot be meaningfully compared to software.
“Neural net weights are not software source code; they cannot be read or debugged by humans,” Jacks points out. “Furthermore, the fundamental rights of open source do not apply in quite the same way to NNWs.”
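Jacks’ point can be made concrete with a toy contrast (everything below is illustrative, not any real model): a function is source code that a human can read and debug, while a weight matrix is just a block of learned numbers with no legible logic inside.

```python
import random

random.seed(0)

# "Weights": nothing but learned-looking floats. There is no logic here
# for a human to read, step through, or debug.
weights = [[random.gauss(0.0, 0.02) for _ in range(4)] for _ in range(3)]

# Source code, by contrast, is legible: its behavior can be inspected,
# reasoned about, and modified line by line.
def relu(x):
    """Return x if positive, else 0.0: readable, debuggable logic."""
    return x if x > 0 else 0.0
```

The "freedoms" of open source (to study, modify, and share) map cleanly onto `relu` but only awkwardly onto `weights`, which is the asymmetry Jacks is pointing at.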
This led Jacks and OSS Capital colleague Heather Meeker to come up with their own definition of sorts, centered on the concept of “open weights.”
So before we arrive at a meaningful definition of “open source AI,” we can see that trying to get there will create some inherent tensions: how can we agree on a definition if we can’t agree that the “thing” we’re defining exists?
Maffulli agrees.
“You’re right,” he told TechCrunch. “One of the early discussions was whether we should even call this open source AI, but everyone was already using that term.”
This reflects part of the challenge in the broader field of AI, where there is debate over whether what we call “AI” today really is AI, or merely powerful systems trained to find patterns across reams of data. But most naysayers have accepted that the “AI” moniker is here to stay, and see no point in fighting it.
Founded in 1998, OSI is a non-profit public benefit corporation that focuses on advocacy, education, and a wide range of open source related activities with the Open Source Definition at its core. Today, the organization relies on sponsors for funding and includes such notable members as Amazon, Google, Microsoft, Cisco, Intel, Salesforce, and Meta.
Meta’s involvement with OSI is of particular interest right now in relation to the notion of “open source AI.” Although Meta hangs its AI hat on the open source peg, the company places notable restrictions on how its Llama models can be used: yes, they are free to use for research and commercial purposes, but app developers with more than 700 million monthly users must request a special license from Meta, which it grants entirely at its own discretion.
Simply put, Meta’s Big Tech brethren can whistle if they want in.
Meta’s language around its LLMs is somewhat malleable. The company called its Llama 2 model open source, but with the arrival of Llama 3 in April it retreated somewhat from the term, using phrases such as “openly available” and “openly accessible” instead. In some places, though, it still refers to the model as “open source.”
“Everyone else involved in this conversation agrees completely that Llama itself cannot be considered open source,” Maffulli said. “People I’ve spoken with who work at Meta know that it’s a bit of a stretch.”
On top of that, one might argue there is a conflict of interest here: a company that has demonstrated a desire to piggyback on the open source brand is also providing funds to the stewards of the “definition.”
This is one reason why OSI has been looking to diversify its funding, recently securing a grant from the Sloan Foundation to support its multi-stakeholder, global effort to define open source AI. TechCrunch can reveal the grant is worth about $250,000, and Maffulli hopes it will help shift perceptions of OSI’s reliance on corporate funding.
“One of the things the Sloan grant makes even clearer is that we could say goodbye to Meta’s money at any time,” Maffulli said. “We could have done that even before the Sloan grant came in, because I know we’re going to receive donations from others, and Meta knows that very well. They are not interfering with any of this [process], and neither are Microsoft, GitHub, Amazon, or Google; they absolutely understand that they cannot interfere.”
A working definition of open source AI
The current draft of the Open Source AI Definition, version 0.0.8, consists of three core parts: a “preamble” that sets out the document’s scope, the Open Source AI Definition itself, and a checklist of the components required for an open-source-compliant AI system.
According to the current draft, open source AI systems must grant the freedom to use the system for any purpose without asking permission, the freedom for others to study how the system works and inspect its components, and the freedom to modify and share the system for any purpose.
But one of the biggest challenges has been around data: can an AI system be classified as “open source” if the company hasn’t made its training datasets available to others? Maffulli says it’s more important to know where the data came from and how developers labeled, deduplicated, and filtered it. It’s also important to have access to the code that was used to assemble the dataset from its various sources.
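As a sketch of what “the code used to assemble the dataset” might look like in miniature (all names and records below are hypothetical), the toy pipeline deduplicates and filters raw text while recording provenance. Releasing code like this documents how a dataset was constructed even when the raw text itself cannot be redistributed.

```python
import hashlib

def assemble(records):
    """Deduplicate and filter raw (source, text) records, keeping a provenance log."""
    seen, kept, log = set(), [], []
    for source, text in records:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:                 # exact-duplicate removal
            log.append((source, "dropped: duplicate"))
            continue
        if len(text.split()) < 3:          # toy quality filter
            log.append((source, "dropped: too short"))
            continue
        seen.add(digest)
        kept.append(text)
        log.append((source, "kept"))
    return kept, log

corpus, provenance = assemble([
    ("site-a", "the quick brown fox"),
    ("site-b", "the quick brown fox"),   # duplicate of site-a, will be dropped
    ("site-c", "too short"),             # fails the quality filter
])
```

A reader of this code learns exactly which filtering and dedup rules shaped the corpus, which is the kind of transparency Maffulli is arguing for.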
“Knowing that information is much better than simply having the dataset without the rest of the information,” Maffulli said.
While it would be nice to have access to the full dataset (OSI lists this as an “optional” component), Maffulli says that in many cases this is neither possible nor practical. This may be because the dataset contains confidential or copyrighted information that the developers are not permitted to redistribute. Moreover, there are techniques for training machine learning models, such as federated learning, differential privacy, and homomorphic encryption, in which the underlying data is never actually shared or disclosed.
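One of those techniques, differential privacy, can be sketched in a few lines: a statistic about the data is released only after calibrated noise is added, so that no individual record can be inferred. The example below is an illustrative toy (the values and parameters are arbitrary), not a production mechanism.

```python
import math
import random

def dp_count(values, threshold, epsilon, rng):
    """Release a count with Laplace noise (the Laplace mechanism).

    A counting query has sensitivity 1, so noise drawn from
    Laplace(scale=1/epsilon) yields epsilon-differential privacy.
    """
    true_count = sum(1 for v in values if v > threshold)
    # Sample Laplace noise via inverse transform sampling.
    u = rng.random() - 0.5
    noise = -(1.0 / epsilon) * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise

rng = random.Random(42)
noisy = dp_count([1, 5, 9, 12], threshold=4, epsilon=1.0, rng=rng)
# The true count is 3; the released value is 3 plus bounded-variance noise.
```

The point for the open source debate: a model (or statistic) trained this way can be shared openly even though its dataset, by design, never can be.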
And this perfectly highlights the fundamental difference between “open source software” and “open source AI”: they may be similar in intent, but they are not comparable on an equal footing, and it is this difference that the OSI tries to capture in its definition.
In software, source code and binary code are two views of the same artifact: they reflect the same program in different forms. However, a training dataset and the subsequent trained model are different things. Using the same dataset does not necessarily allow you to consistently recreate the same model.
“There’s a lot of statistical and random logic that happens during training, so it can’t be replicated in the same way as software,” Maffulli added.
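That non-determinism shows up even in a toy setting. The sketch below (entirely illustrative) trains the same simple perceptron twice on the same four data points, differing only in the random initialization; the learned parameters come out different, which is precisely why a dataset alone does not reproduce a model.

```python
import random

def train_perceptron(data, seed, epochs=20, lr=0.1):
    """Train a 1-feature perceptron; only the random init depends on the seed."""
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)  # random initialization
    for _ in range(epochs):
        for x, y in data:
            pred = 1 if w * x + b > 0 else 0
            w += lr * (y - pred) * x
            b += lr * (y - pred)
    return w, b

data = [(-2, 0), (-1, 0), (1, 1), (2, 1)]
w1, b1 = train_perceptron(data, seed=1)
w2, b2 = train_perceptron(data, seed=2)
# Same data, same algorithm, different seeds: the learned parameters differ.
```

Compiling the same source code twice yields the same binary; training the same dataset twice, as here, does not yield the same weights.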
Thus, an open source AI system should instead be replicable by following clear instructions, and this is where the checklist component of the Open Source AI Definition comes into play. It draws on a recently published academic paper called “The Model Openness Framework: Promoting Completeness and Openness for Reproducibility, Transparency, and Usability in Artificial Intelligence.”
The paper proposes the Model Openness Framework (MOF), a classification system that evaluates machine learning models “based on their completeness and openness.” MOF requires that certain components of AI model development, such as details about training methods and model parameters, be “included and released under an appropriate open license.”
Steady state
OSI calls its official releases of definitions “stable versions,” much like a company might release an application that has been thoroughly tested and debugged before prime time. OSI deliberately avoids calling them “final releases” because parts of the definitions are likely to evolve.
“We can’t expect this definition to last 26 years like the Open Source Definition,” Maffulli says. “I don’t think the first part of the definition, like ‘what is an AI system,’ will change much. But the part we refer to in the checklist, the list of components, is technology-dependent. Who knows what the technology will look like tomorrow?”
The stable version of the Open Source AI Definition is due to be approved at the All Things Open conference at the end of October. In the meantime, OSI is embarking on a global roadshow spanning five continents, seeking more “diverse input” on how “open source AI” should be defined going forward, though any final changes are likely to be little more than “small tweaks” here and there.
“This is the final stretch,” says Maffulli. “We have a fully functional definition; we have all the pieces we need. We have the checklist, so we’re making sure there are no surprises, no systems that we should be including or excluding.”