This document provides a data quality governance framework for analytics and machine learning to enable the governing bodies of organizations to direct and oversee the implementation and operation of data quality measures, management, and related processes with adequate controls throughout the data life cycle. This document can be applied to any analytics and machine learning. This document does not define the specific management requirements or process requirements, which are specified in ISO/IEC 5259-3 and ISO/IEC 5259-4 respectively.
A holistic data quality governance framework for analytics and machine learning is needed to enable the governing body to direct and oversee the implementation and operation of data quality measures (ISO/IEC 5259-2), data quality management requirements and guidelines (ISO/IEC 5259-3), and the data quality process framework for various types of ML (ISO/IEC 5259-4), with adequate controls throughout the ML data life cycle (ISO/IEC 5259-1). The goal is to enhance trust in data for analytical and ML applications and services by mitigating data quality-related risks, enabling informed decisions, empowering effective and efficient operations, and enhancing proper data utilization across the organization.
The approach for an organization to enhance trust in data for analytics and ML should be to establish a robust and cross-cutting data quality governance framework across different levels of the organization, with clear roles and responsibilities for how data are handled and processed (see Figure-1). This document aims to provide that framework, with guiding principles applicable regardless of an organization’s size and type, from which an organization can develop its own data quality governance. An individual organization’s actual governance arrangements will differ, depending on a number of factors including the organization’s size and its industry. Both the governing body and management can use the framework to interact and to ensure that rigorous data quality governance for analytics and ML is established at all levels of the organization.
In order to safeguard high-quality data for analytics and ML, a governing body needs to have a deeper understanding of how each aspect of data quality may impact analytics and ML, which may have been insufficiently addressed in traditional data governance.
Data quality for analytics and ML
Data quality can impact the results of analytics and ML if the input data have problems with data quality characteristics such as accuracy, completeness, consistency, credibility, currentness, accessibility, compliance, confidentiality, efficiency, precision, traceability, understandability, availability, portability and recoverability (ISO/IEC 5259-2). The outcomes of analytics and ML systems may be challenged in terms of bias, trustworthiness, reliability, safety, security, fairness, transparency, privacy, and others (ISO/IEC 24368).
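As an illustration only (these are not the normative measures defined in ISO/IEC 5259-2), two of the listed characteristics, completeness and consistency, can be sketched as simple ratio checks over a record set; the field names, records and consistency rule below are invented:

```python
# Illustrative sketch, not the normative ISO/IEC 5259-2 measures:
# two simple data quality ratios computed over tabular records.

def completeness(records, fields):
    """Fraction of required field values that are present (non-None)."""
    total = len(records) * len(fields)
    if total == 0:
        return 1.0
    present = sum(1 for r in records for f in fields if r.get(f) is not None)
    return present / total

def consistency(records, rule):
    """Fraction of records satisfying a cross-field consistency rule."""
    if not records:
        return 1.0
    return sum(1 for r in records if rule(r)) / len(records)

# Invented sample data: one missing age, one missing name.
records = [
    {"age": 34, "birth_year": 1990, "name": "A"},
    {"age": None, "birth_year": 1985, "name": "B"},
    {"age": 50, "birth_year": 2001, "name": None},
]

c = completeness(records, ["age", "birth_year", "name"])  # 7 of 9 values present
# Invented rule: age and birth_year should roughly agree for a 2024 snapshot.
k = consistency(records, lambda r: r["age"] is None
                or abs((2024 - r["birth_year"]) - r["age"]) <= 1)
```

In practice an organization would select and formally define such measures per ISO/IEC 5259-2 rather than improvise them; the point here is only that each characteristic becomes a quantity that can be monitored and reported.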
An organization’s adoption of technology such as machine learning changes the firm’s operations and processes, which in turn results in changes to its organizational structure. Many new tasks specified in ISO/IEC 5259-4 are now essential to ensure trust in data for analytics and ML. The creation of these new tasks (e.g. bias checking) and processes in turn relates to management activities with proper decision rights to control them (ISO/IEC 5259-3). This in turn can lead the firm to create a new functional unit (such as a unified data organization led by a Chief Data Officer), depending on the organization’s level of reliance on data and ML. The creation of the new processes and management tasks cannot be done at the management level alone. It requires the governing body’s approval and strong support.
The proposed data quality governance framework helps the governing body understand not only the importance of data quality to the outcome of analytics and ML but also the importance of its role in establishing data quality accountabilities across different levels of the organization throughout the ML data life cycle (ISO/IEC 5259-1).
Data quality governance framework
Figure-1 shows the cross-cutting key elements of the data quality governance framework and how it relates to the 5259 series of standards.
The goal is to help the organization establish strong data quality governance by providing a governance framework with guiding principles on the governance of data quality for analytics and ML.
Figure-1: Relationship between the data quality governance framework and the rest of the 5259 projects (See Annex)
The following provides an explanation of each element of the data quality governance framework (DQGF):
- Data quality guiding principles - Data quality guiding principles express the desired behaviour of relevant individuals and groups across the organization, including the data quality management and operation layers, to produce trusted and reliable data for analytics and ML.
- The governing body should require that these data quality principles are applied to the organization’s data quality governance across management and operational processes related to the data quality for analytics and ML.
- Examples of data quality principles in relation to each element of the framework are:
[1] Governing body: The governing body should take the initiative and ownership of implementing all data quality principles within a timeframe agreed with its management.
[2] DQ Planning and Strategies: Data quality objectives and strategies should be set during data quality planning and should be aligned with the organization’s current and future intent of building trust in data for analytical and ML capabilities.
[3] DQ Architecture and IT Infrastructure: An organization should design, build and maintain adequate data architecture and IT infrastructure with roles and responsibilities established within and across organizational boundaries to ensure data quality requirements are fully supported.
[4] DQ Operation and Management: An organization should establish data quality metrics with effective controls on how data are handled and processed at each data life cycle phase.
[5] DQ Accountabilities: Roles and responsibilities should be defined to ensure adequate controls throughout the life cycle of data and on all aspects of data quality performance and conformance.
[6] DQ Risk Management: Risk management reports should include all risk areas identified in relation to data quality metrics, in sufficient detail to facilitate informed decision-making about risk by appropriate decision makers, especially the governing body and top management.
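Principles [4] and [6] can be sketched together as a minimal check: measured data quality values are compared against agreed floors at a given life-cycle phase, and any shortfall becomes a finding for the risk report. The metric names, threshold values and phase label below are assumptions for illustration, not values from the standard:

```python
# Hypothetical sketch of principles [4] and [6]: metrics with thresholds
# checked at a life-cycle phase; shortfalls become risk-report findings.
# All metric names and floor values are invented for illustration.

THRESHOLDS = {"completeness": 0.95, "accuracy": 0.99}

def check_phase(phase, measured):
    """Compare measured metric values to agreed floors; return findings."""
    findings = []
    for metric, floor in THRESHOLDS.items():
        value = measured.get(metric)
        if value is None or value < floor:
            findings.append({"phase": phase, "metric": metric,
                             "value": value, "threshold": floor})
    return findings

# Completeness falls short of its floor; accuracy passes.
report = check_phase("acquisition", {"completeness": 0.90, "accuracy": 0.995})
```

A missing measurement is treated as a finding as well, on the view that an unmeasured metric cannot evidence conformance; an organization may choose a different convention.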
The practices to implement the data quality principles in relation to each element of the framework include:
- Data quality planning and strategies - Data quality planning can be a part of the organization’s annual planning and control cycle.
- During the data quality planning process, objectives and goals of data quality are set; strategies to accomplish the objectives are formulated; new data quality initiatives are defined and budgets and other resources are allocated for the initiatives.
For example, the planning can initiate data architecture and IT infrastructure projects for the purpose of improving data quality.
- During this planning process, the organization should ensure that data quality objectives and strategies are aligned with the organization’s current and future intent of building trust in data quality for analytical and ML capabilities.
- The governing body and top management should ensure that roles and responsibilities are established to ensure DQ planning is performed adequately.
- The governing body and top management should ensure that the organization’s goals and strategies on data quality are shared across all departments and staff.
- Trusted data architecture and IT infrastructure
- The data quality governance framework requires an integrated and trusted data architecture and IT infrastructure that are designed and agreed upon within and across organizational boundaries. Through the agreed and trusted data architecture and IT infrastructure, organizational members can trust the entire process in which correct data are created, entered and stored safely and in compliance with regulations and policies, then shared and utilized for analytics and ML.
- As part of the trusted data architecture, data stewards are selected to define consistent data taxonomies across the entire organization. The data stewards also ensure that these data taxonomies are used consistently throughout the organization. The approach of using a normalized taxonomy for systems of systems (SoS) as described in ISO/IEC/IEEE 21841:2019 can be used to ensure trusted data architecture within/across organizational boundaries and thus to facilitate new communications among stakeholders of AI systems.
- The consistent data taxonomy enables the organization to integrate data sets and ensure the consistent interpretation of the data.
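A hedged sketch of how a steward-maintained taxonomy supports integration: each source's field names are mapped onto one shared vocabulary before data sets are combined. The taxonomy terms, source names and mappings below are invented for illustration only:

```python
# Illustrative only: a normalized taxonomy plus per-source field mappings,
# so records from different units can be integrated under consistent terms.
# All names below are invented.

TAXONOMY = {"customer_id", "order_date", "amount"}

SOURCE_MAPPINGS = {
    "sales_db": {"cust_no": "customer_id", "dt": "order_date",
                 "amt": "amount"},
    "web_shop": {"customerId": "customer_id", "orderDate": "order_date",
                 "total": "amount"},
}

def normalize(source, record):
    """Rename a source record's fields to the shared taxonomy terms."""
    mapping = SOURCE_MAPPINGS[source]
    out = {}
    for field, value in record.items():
        if field not in mapping:
            # An unmapped field is a taxonomy gap the steward must resolve.
            raise KeyError(f"{source}.{field} not mapped to taxonomy")
        out[mapping[field]] = value
    return out

a = normalize("sales_db", {"cust_no": 7, "dt": "2024-01-02", "amt": 19.9})
b = normalize("web_shop", {"customerId": 7, "orderDate": "2024-01-03",
                           "total": 5.0})
```

Failing loudly on unmapped fields, rather than passing them through, is one way the steward role described above enforces that the taxonomy is used consistently throughout the organization.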
- The underlying infrastructure that an organization relies on for data collection and processing can include the physical infrastructure (e.g. sensors and actuators) and the IT infrastructure for data storage and processing (e.g. cloud computing). While some data resources are created by the organization itself, other data resources can be acquired from third parties. This means that the data uploaded to the organization’s data platform can be provided by a great number of data providers. Analytics and ML can then depend on data from multiple different sources. In this case, data quality governance entails greater complexity in comparison to data quality governance at the level of a single organization. Such problems include, for example, determining the responsible party if analytics and ML produce an incorrect output due to an anomaly in data collected from multiple sensors and SoS whose ownership lies with third parties. Roles and responsibilities should be established as they relate to the ownership and quality of data.
The data owners should ensure there are adequate controls throughout the life cycle of data and all aspects of the technology infrastructure. Roles of the data owner include ensuring that data are correctly captured, entered, and aligned with the agreed data definitions; and ensuring that data quality monitoring and reporting practices are consistent with the terms of agreement between organizations.
- The governing body and top management should ensure that roles and responsibilities (e.g. data owner, Chief Data Officer, Data Supply Chain Officer and data stewards) are established with proper authorities within and across organizational boundaries. These roles ensure that there are adequate controls throughout the data life cycle and all aspects of the technology infrastructure, as they relate to input data for ML.
- Data quality accountabilities
- Accountability is defined as the state of being accountable [SOURCE: ISO/IEC TR 38500:2015]. NOTE: Accountability relates to an allocated responsibility. The responsibility may be based on regulation or agreement, or assigned through delegation.
- Roles and responsibilities and reporting procedures throughout the life cycle of the data across different layers of the organization should be defined to ensure and monitor that obligations for all aspects of data quality performance and conformance are met.
[1] For the input phase (where input data sets are used by ML systems for learning, or production data is fed to the trained ML model for processing to make predictions [Source: ISO/IEC 22989:2021]):
The governing body should ensure that a mechanism (roles, responsibilities and reporting procedures) is established within the organization to ensure conformance to data protection regulations. Specifically, assurance has to be made that the management layer (defined as an organizational layer where control and supervision are exercised within the constraints of governance) develops criteria to measure data quality for data acquisition; and that the operation layer (defined as an organizational layer where daily routine operational tasks are performed) implements the criteria, checks data against the criteria and reports any critical risk.
[2] For the evaluation phase (where the predictions made by the trained model on the data are compared to the actual labels in the data [Source: ISO/IEC 23053:2021]):
The governing body should ensure that verification, validation and reporting mechanisms are established within the organization to ensure that the risks of bias in data used for analytics and ML are minimized. Specifically, assurance has to be made that the management layer establishes a verification and validation team to ensure data quality; and that the operation layer develops models, conducts testing, and reports results and risks.
[3] For the output phase (where predictions and actions are made after applying the trained model to production data [Source: ISO/IEC 22989:2021]):
The governing body should ensure that reporting and vigilance mechanisms are established within the organization to ensure and monitor data quality. Specifically, assurance has to be made that the management layer judges incidents and reports to the governing body; and that the operation layer monitors data quality and hazardous situations and reports incidents.
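The reporting chain running through phases [1] to [3] can be sketched as a simple routing rule: the operation layer records findings per phase, the management layer reviews all of them, and only critical items are escalated to the governing body. The severity labels, finding texts and routing rule below are assumptions for illustration:

```python
# Hedged sketch of the layered reporting chain in [1]-[3]. Severity
# labels and the escalation rule are invented for illustration.

def escalate(findings):
    """Route findings: management reviews all; the governing body
    receives only those judged critical."""
    management_queue = list(findings)
    board_queue = [f for f in findings if f["severity"] == "critical"]
    return management_queue, board_queue

# Invented example findings, one per life-cycle phase.
findings = [
    {"phase": "input", "issue": "missing consent flag",
     "severity": "critical"},
    {"phase": "evaluation", "issue": "label skew in test split",
     "severity": "major"},
    {"phase": "output", "issue": "stale production data",
     "severity": "critical"},
]

mgmt, board = escalate(findings)
```

The point of the sketch is the separation of duties: the operation layer produces findings, the management layer judges severity, and the governing body sees a filtered view sufficient for oversight.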
- Data quality risk management
- The governing body and top management should ensure that the organization has appropriate data risk management capabilities, including:
[1] Ability to identify and predict possible future risks within and across organizational boundaries for data accuracy, completeness, consistency, credibility, currentness, accessibility, compliance, confidentiality, efficiency, precision, traceability, understandability, availability, portability and recoverability in relation to key services enabled by analytics and ML.
[2] Ability to analyse and evaluate the data quality risks in terms of causes, threats and consequences.
[3] Ability to implement data risk handling (avoidance, prevention, reduction or minimization) options.
[4] An agreement on a certain level of error tolerance and risk appetite for both outsourced and in-house data-related processes (ISO/IEC 5259-4).
[5] Ability to establish appropriate escalation channels to rectify poor data quality (ISO/IEC 5259-3).
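Abilities [2] to [4] can be illustrated with a simple likelihood-times-consequence scoring compared against an agreed risk appetite; the scores, risk names and the mapping of scores to handling options are all assumptions, not requirements of the standard:

```python
# Illustrative sketch of abilities [2]-[4]: score each identified data
# quality risk and choose a handling option relative to an agreed risk
# appetite. All numbers and names are invented.

RISK_APPETITE = 6  # maximum tolerable score agreed with the governing body

def score(risk):
    """Likelihood x consequence, each on an assumed 1-5 scale."""
    return risk["likelihood"] * risk["consequence"]

def handling(risk):
    """Pick a handling option by comparing the score to the appetite."""
    s = score(risk)
    if s > RISK_APPETITE:
        return "avoidance"   # above appetite: stop or redesign the process
    if s == RISK_APPETITE:
        return "reduction"   # at the boundary: mitigate to lower the score
    return "acceptance"      # within appetite: monitor only

r1 = {"name": "sensor drift", "likelihood": 3, "consequence": 4}    # 12
r2 = {"name": "late data feed", "likelihood": 2, "consequence": 3}  # 6
r3 = {"name": "label typos", "likelihood": 1, "consequence": 2}     # 2
```

Recording the score alongside the chosen option gives the governing body the traceable link between risk analysis and risk handling that the reporting bullets below call for.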
- The governing body needs to make sure that the risk management process related to each data quality metric is integrated into the data quality related management practices, structures, operations and processes of the organization.
- The governing body needs to make sure that roles and responsibilities with respect to data quality risk management are established and communicated at all levels of the organization.
- The governing body needs to make sure that the frequency of risk management report production and distribution is set to facilitate informed decision-making about risk by appropriate decision makers, especially the governing body.
A data quality governance framework with well-defined guiding principles is needed to ensure an organization’s compliance with all data quality management and process requirements specified in ISO/IEC 5259-3 and ISO/IEC 5259-4, and to ensure that there are adequate controls throughout the ML data life cycle (ISO/IEC 5259-1) and for each data quality metric (ISO/IEC 5259-2).
This proposed part in the 5259 series will assist governing bodies and top management in establishing a strong data quality governance framework in their organization to ensure trust in data for analytics and ML.