Data Governance

Data isn't oil, whatever tech commentators tell you: it's people's lives

June 4, 2021

Read more at The Guardian

To address barriers that historically have impeded ethical and responsible research (ER2) practices, research institutions need to foster a culture of integrity and trustworthiness.  Scientific discovery hinges on data analytics, but data systems are rife with biases and encumbrances that inhibit the ethical conduct of science.  This study applies an innovative Indigenous data governance framework to review institutional norms and practices that promote or inhibit ethical design, outcomes, and approaches across the STEM research landscape.  Indigenous data sovereignty draws on the United Nations Declaration on the Rights of Indigenous Peoples (UNDRIP) that reaffirms the rights of Indigenous Peoples to control data about their peoples, lands, and resources.  The CARE Principles for Indigenous Data Governance enhance and extend the FAIR Principles for data interoperability and reuse by centering equity and ethics as core guiding principles alongside those set out by FAIR.  These concepts form a basis for normative standards for collective data rights that impact research agendas for data privacy, future use, reuse, and data stewardship.

Our research goals are to advance university policies that improve research and data cultures through a comparative analysis of Indigenous rights literature and policies within the academic research setting.  This project is sponsored by the Ethical and Responsible Research (NSF/ER2) program.

The complex, fast-moving nature of technological development is a persistent challenge to the deliberative process of lawmaking.  In the absence of regulations and coordinated guidance, the private sector has driven the norms and practices that societies adopt in response to new data tools and platforms.  This project works with stakeholders in industry, government, civil society, and academia to build a regulatory framework for policymakers that establishes the productive boundaries that support innovation, protect individuals, and grow the economy.  This forum is a partnership with the University of Auckland and INGSA.

The urgent need to coordinate up-to-date taxonomies on hazards and health presents a challenge to researchers across risk-related disciplines; meanwhile, long-outdated vocabularies that describe bodies and gender, ethnic and Indigenous people, and myriad underserved communities continue to influence the shape and direction of research, the built environment, public services, and future technologies.  Moving toward FAIR interoperability standards will require the timely harmonization of vocabularies, both to reflect advances in scientific knowledge and to support the sociotechnical goals of that knowledge.  Effective guidelines will need to account for cultural considerations, technical issues, and workflow requirements, both to ensure timely updates to vocabularies when a term is adopted within a community of practice and to meet challenges across disciplines, silos, and sectors.

CEDS will lead a workshop to produce standards and implementation guidelines for the periodic review and update of taxonomies and terminologies in support of the CODATA Decadal Programme.

This recommended practice details the rules by which the provenance of Indigenous Peoples’ data should be described and recorded. This recommended practice outlines the core parameters for providing and digitally embedding provenance information for Indigenous Peoples’ data. The recommended practice establishes common descriptors and controlled vocabulary for provenance, including recommendations for metadata fields that can be used across industry sectors, including machine learning (ML) and artificial intelligence (AI) contexts, biodiversity and genomic science innovation and other associated databases. This recommended practice supports proper and appropriate disclosure of originating data information. The recommended practice supports the long-term identification of Indigenous Peoples’ data for future use, connecting data to people and place, and when appropriate, supporting future benefit sharing options.  This activity is organized by IEEE-P2890.

Data in Society

Designing ethical artificial intelligence systems that can reduce or mitigate bias requires unpacking complex and tightly coupled sociotechnical problems.  These include competing definitions and fluid conceptions of fairness between communities and across disciplines, as well as the quantitative challenges of measuring social values and mores and translating those measurements into contextualized data assets for use in machine learning applications and other knowledge engineering (KE) outputs.  However, these complexities cannot be simplified by eliminating the human role in the engineered systems that are designed to serve society.  On the contrary, such challenges point to the need to produce knowledge about how KE processes can produce fair outcomes, and to advance understanding of the mechanisms that drive society’s response to AI technologies.  In unpacking these complexities, the TIDBIT team will test how various models of citizen engagement along an AI workflow might improve fairness outcomes.  We will work toward fairer ontologies by running the Fairlearn assessment tool against ontology-informed, auto-labeled data to see how different groups are affected by the biases inherent in the labeling.  We will then modify the ontologies and re-run some of the image training and/or ontology-enabled search enhancement features against the labels assessed as having a high degree of unfairness.
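
The per-group assessment described above can be illustrated with a minimal sketch.  It does not call Fairlearn itself; it computes by hand the kind of disaggregated metric such a tool reports, namely how often auto-generated labels agree with reference labels within each group, and the gap between the best- and worst-served groups.  The function names, labels, and group identifiers are hypothetical and are not taken from Project TIDBIT.

```python
from collections import defaultdict


def accuracy_by_group(reference_labels, auto_labels, groups):
    """Return {group: share of auto-labels that match the reference labels}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for reference, predicted, group in zip(reference_labels, auto_labels, groups):
        total[group] += 1
        if reference == predicted:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


def largest_gap(per_group):
    """Difference between the best- and worst-served groups."""
    values = list(per_group.values())
    return max(values) - min(values)


# Hypothetical toy data: reference labels, ontology-informed auto-labels,
# and the group each labeled item is associated with.
reference_labels = ["doctor", "doctor", "nurse", "nurse",
                    "doctor", "nurse", "doctor", "nurse"]
auto_labels = ["doctor", "doctor", "nurse", "nurse",
               "nurse", "nurse", "nurse", "nurse"]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

per_group = accuracy_by_group(reference_labels, auto_labels, groups)
gap = largest_gap(per_group)
print(per_group, gap)
```

In this toy data, a large gap would flag the labels for group B as candidates for the ontology revision and re-run step; the threshold for "a high degree of unfairness" is left open here because the source does not specify one.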

The aim of Project TIDBIT is to create a socio-technical system with a data validation component that certifies compliance with bias mitigation policies and regulations.  The need for ethical interventions in AI is well documented and is summarized in a recent NIST planning document: “While stakeholders in the development of this plan expressed broad agreement that societal and ethical considerations must factor into AI standards, it is not clear how that should be done and whether there is yet sufficient scientific and technical basis to develop those standards provisions.”  This study will require the identification of best practices and bias mitigation frameworks, leading to the creation of standards for SITL AI systems.

Fairness is a socially constructed concept, and therefore dynamic over time and variable across cultures.  Nonetheless, there is general agreement at the community level about the conditions and characteristics that are required to produce a fair outcome.  At the society level, these “fairness metrics” are institutionalized into laws, norms, and observances.  For the purpose of this study, globally recognized, published standards and best practices to mitigate bias in AI will be compiled to elicit a framework that constitutes fairness across domains and cultures.  The TIDBIT team will work with its Advisory Board to survey and assess AI standards to construct a framework against which fairness-related bias can be measured (i.e., fairness metrics in the AI context).  The accuracy of these metrics will be confirmed and refined by citizen groups in the workflow, or TIDBIT Flow.  Tightly coupled to fairness is trustworthiness.  In AI systems, trustworthiness is generally considered to be a technical description of the protocols and functions across a digital ecosystem that are required for interoperability.  However, one of the most vital functions within the trustworthy framework is accountability, an exclusively human role within this complex digital ecosystem.  This project will test several citizen engagement models (CEMs) to determine which type(s) are best suited to identify and mitigate design and data bias, validate community standards of fairness, guarantee accountability, and provide corrective outlets for both developers and end-users.

The amount of COVID-19 research data and literature produced over the last six months far exceeds the capacity of the scientific community to sort it all out, much less process that information into peer-reviewed scientific knowledge and establish consensus on best practices.  While new tools, algorithms, and platforms have been introduced since March to help researchers find information, transformative approaches to collaboration, co-creation, and co-design have not kept pace.  This proposal describes a design and proof of concept for the innovative use of an established open-by-design research platform as an open-source collaboratory.  We will use the blockchain platform ARTiFACTS to document our interdisciplinary research project in real time, while providing all data and research outputs in an open-source, findable format that allows the global scientific community to find, track, review, and even participate in our research.  This approach will demonstrate an innovation in collaborative research that can be scaled, and that will protect intellectual property and primacy claims by virtue of the immutable record of blockchain transactions.

Artifactsofresearch, Inc. (ARTiFACTS) is a blockchain-based service that documents the products, data, code, papers and pre-prints, and other outputs that are underrepresented in published research.  The platform was created by the co-founders of ORCID and Web of Science in response to calls by Congress and the science community for better data-sharing tools and policies.  The CODATA Center of Excellence in Data for Society at the University of Arizona will partner with ARTiFACTS to support the activities of co-created theory production and the outputs of co-designed research efforts.  This project, the Fluid Commons to Self Assembling Data Trusts Collaboratory (or simply, the Collaboratory), is the result of a data science workshop held in July 2020.

“Life after COVID” has become clichéd shorthand for how drastically our daily lives have changed in response to social distancing orders, and for how we must prepare for the long period to follow as scientists scramble to develop a vaccine and immunize the globe.  The path toward “herd immunity” will take years, and in the interim, billions of people who are able to work, learn, entertain, and socialize at a distance will rely on Video as a Service (VaaS) platforms, such as Zoom, WebEx, FaceTime, and similar technologies.  Fortunately, many of these services are currently free for anyone who owns a device that runs a compatible operating system.  However, these platforms are not truly available to everyone across the globe, and various gaps and restrictions will increase the disparities between wealthier citizens and those who will be hardest hit by the COVID-19 pandemic in countless other ways.

This project will develop a decision tool to help determine the best remote conferencing platform that meets the needs, resources, and limitations of communities around the world that are or will become digitally excluded.  We will survey all public and private platforms across the globe to create a reference database and increase their findability and accessibility characteristics by adding back-end ontology through an automated PROV-O markup tool.  Finally, we will design a decision tool that walks users through steps that help them find the right software or service for their needs. 
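
The filtering step at the heart of such a decision tool can be sketched as follows.  This is a minimal illustration under stated assumptions, not the project's design: the `Platform` fields, the `recommend` function, and the catalog entries are all hypothetical placeholders, and the PROV-O markup step is not covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    """One row of a hypothetical reference database of conferencing platforms."""
    name: str
    free_tier: bool
    min_bandwidth_kbps: int
    operating_systems: frozenset


def recommend(platforms, os_name, bandwidth_kbps, needs_free):
    """Return platforms that fit the user's constraints, lightest bandwidth first."""
    matches = [
        p for p in platforms
        if os_name in p.operating_systems
        and p.min_bandwidth_kbps <= bandwidth_kbps
        and (p.free_tier or not needs_free)
    ]
    return sorted(matches, key=lambda p: p.min_bandwidth_kbps)


# Placeholder catalog; the values are invented for illustration only.
CATALOG = [
    Platform("Platform A", True, 600, frozenset({"android", "ios", "windows"})),
    Platform("Platform B", False, 300, frozenset({"android", "windows"})),
    Platform("Platform C", True, 150, frozenset({"android"})),
]

# A user on a low-bandwidth Android device who needs a free service.
choices = recommend(CATALOG, os_name="android", bandwidth_kbps=400, needs_free=True)
print([p.name for p in choices])
```

A guided tool would collect the three answers (`os_name`, `bandwidth_kbps`, `needs_free`) one step at a time and could add further constraints, such as language support or regional availability, as extra fields on the record.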

This project develops a genomic data privacy protection framework that scales from the personal right of the individual to the sovereign rights of nation-states.

Data Diplomacy


The purpose of this project is to improve the practice of research through an evidence-based model for research collaboration.  The pilot will be run at a unique testbed: the newly formed School of Data Science (SDS) at the University of Virginia.  This transformative approach will provide standards and metrics to quantify the sociocultural criteria that result in effective team engagement, inclusive team building, efficient team communication, ethical and responsible research practices, and unexpected outcomes.  Our transdisciplinary pilot project explores two questions: 1) How can we scientifically assess collaborative research practices and apply those findings to a continuous improvement process? and 2) How does our framework improve the practice of research more generally?  To better understand the state of collaborative practices, including gaps and opportunities, we will employ a mixed-methods approach to 1) gather survey data from cross-functional research team members; 2) map the landscape of historic and current collaboratory practices through interview data and literature review; and 3) conduct reflexive assessments to establish baseline data for an evidence-based collaboratory model.  Our semi-structured interviews will be conducted over Zoom, recorded, transcribed, and analyzed using descriptive analysis.  We will use MailChimp to conduct surveys and publish results.  We will derive base metrics from interviews, surveys, and literature, and methodically apply positive results to the framework, which will then be reflexively tested and refined again.  We will test the reflexive methodology on the inaugural cohort of SDS faculty, staff, students, and professionals.  Our goal is to develop an evidence-based model for convergence studies and collaborative research that is robust enough to produce metrics across disciplines and valences.

Recently, data diplomacy has emerged as a distinct area of interest and opportunity, particularly with the rise of big data resources and analytics. This field emphasizes the special role of data in global scientific cooperation—both as a shared technical resource and as a driver of evidence-based policy options. As with science, data interacts with diplomacy in multiple ways. Data can be used as a tool to make diplomacy more efficient, effective, and inclusive by providing new information for decision-makers, allowing for more effective communication and engagement with the public, and providing new options for evaluation and accountability. It may also be used as a tool for diplomatic negotiations, such as those related to data privacy, data sharing, or the development of common definitions for data collection. Finally, cooperation on data-related issues, such as global data collection or sharing systems, has the potential to improve relations between countries. As in the case of science diplomacy, these should not be viewed as three separate endeavors, but instead as integrated and interdependent activities.

Disaster diplomacy is an emerging and powerful subfield of science diplomacy, with important connections to data diplomacy, as well. The scientific community largely agrees that the damage caused by disasters is worsened by the inability to bridge the gaps in evidentiary uses of data and communication in general among the different actors: natural and social scientists, engineers, emergency managers and first responders, local knowledge holders, and policymakers. Examples of disaster diplomacy on international, intranational, and local levels are plentiful throughout human history. For instance, the decades-long tension between Greece and Turkey was eased after large earthquakes struck the two countries in 1999. Generous assistance provided by the citizens and governments of both countries to each other in the immediate aftermath of the earthquakes supported rapprochement, continuing an already established route to long-lasting conflict resolution. As an academic field, however, disaster diplomacy emerged less than two decades ago. To date, disaster diplomacy has been referenced sporadically throughout the academic literature in the fields of hazards and disasters as well as policy and diplomacy. 

Effective disaster diplomacy can take many forms, given that it can originate on international, intranational, and local levels, as well as before, during, and after disasters. Thus, disaster diplomacy encompasses all disaster-related activities, including prevention, mitigation, preparedness, response, and recovery.  Interdisciplinary research and cross-border collaboration are crucial in building and fostering resilience. Continuous collaboration and the sharing of relevant data resources among natural and social scientists, engineers, and non-academic disaster experts, such as local and Indigenous knowledge holders and emergency managers, are necessary to accurately assess disaster causes and effectively address disaster impacts. Yet disciplinary silos and political barriers often hinder these necessarily collaborative efforts. Disaster diplomacy offers an approach to enhancing disaster resilience while simultaneously reducing conflicts and fostering cooperation between states where relations might otherwise be strained.

In most cases, disaster-related collaborations bring states with complex diplomatic situations together only for short periods of time (weeks to months). During disaster prevention and mitigation, the key objective is to assess and minimize or, ideally, eliminate disaster risk. This includes evaluating vulnerability and hazard drivers and their potential impacts, monitoring vulnerability and hazards, conducting community outreach and education, and developing disaster-resilient infrastructure. But as memories of collaboration quickly fade, preexisting prejudices and disputes resurface, and guidelines are needed on how best to achieve sustained, desired changes in foreign relations.

This summer marks the 50th anniversary of the Moon Landing, the crown jewel in America’s science and technology treasury, which fundamentally changed the way people across the globe thought about the Earth.  It led to a new era of data collection and computation that ushered in the digital revolution: electronic banking and FinTech, the Internet, Big Data, automated systems in industry and the military, social media, Artificial Intelligence and Machine Learning, the Internet of Things and the 4th Industrial Revolution, and blockchain tools that will forever change the way that contracts are executed and currency is valued.

Data is not the new oil; data is the new money.  Like money, data only has value when it is used, and the more it is used, the more value is derived from it.  Data is the building block of scientific advancement and technological innovation, but it is still not properly viewed as a vitally important national asset.  It can be used for multilateral peacebuilding programs that simultaneously serve to strengthen US technological and economic power.  The proposed project, Spacecraft to Statecraft: The Role of Big Data in 21st Century Cooperative Agreement Frameworks (or simply, S2S), will develop a novel framework designed to strategically apply big data assets toward sustainable multilateral agreements.  This Data Diplomacy framework will provide specific roadmaps and toolkits for Track I and Track II decision-makers, examine the conditions and causes of prior successes and failures in data diplomacy, and provide insights and recommendations for future opportunities.

This project is designed to benefit decision-makers by delivering high-impact reference tools, supported by AI technology, that will keep data up to date long after the award has expired.  S2S will also advance knowledge and understanding in the data science, international relations, and science-policy communities by organizing siloed information into a single reference tool. It will develop a new framework for cooperation across governments (i.e., Track I organizations) and civilian, NGO, academic, financial, and industrial sectors (i.e., Track II organizations).  Finally, it will extend our comprehensive study of data diplomacy with a case study that identifies and illustrates specific technical opportunities for multilateral collaboration to advance national goals within the existing framework using big data.  Project outcomes will include an organized, cohesive set of policy recommendations that describe how to meet the technical challenges involved in sharing and using shared data; new resources for decision-makers and stakeholders; and a data diplomacy framework that will allow new actors to engage in lasting cooperative partnerships.