Language Input for Data & Society (LIDS): Licensing Data for Collectives

9/4/2025

Abstract

This paper explores the challenges of adapting intellectual property frameworks for the era of large language models (LLMs), with a focus on data sovereignty, community rights, and equitable benefit distribution. We examine the limitations of current approaches, such as the fair use doctrine and Creative Commons licenses, which often fail to capture the nuanced, relational social norms that govern knowledge and data within communities, particularly those with low-resource languages or Indigenous backgrounds. In direct response to these challenges, we propose a generative framework, Language Input for Data & Society (LIDS). Drawing on principles from evolutionary economics, Indigenous data sovereignty, and participatory design, LIDS is intended to be compatible with more community-centric licensing architectures. Our goal is to suggest a potential direction for developing agreements that might better operationalize complex social values and incentives. The legal-technical infrastructures of LIDS treat data licensing as both a collective and a personal structure, while acknowledging its significant technical and social foundations and challenges.

Keywords: Artificial Intelligence, License, Data, Dignity, Community, Governance, Māori