6 Sigma (6σ or 6s): See Six Sigma.
Abstract: a concise and systematic summary of the key ideas of a book, article, speech, or any other kind of relevant information. See also: knowledge compression.
Abstraction: The process of moving from the specific to the general by neglecting minor differences or stressing common elements. Also used as a synonym for summarisation.
Accessibility: capable of being reached, capable of being used or seen.
Accessibility: The characteristic of being able to access data when it is required.
Accuracy: degree of conformity of a measure to a standard or a true value. Level of precision or detail.
Activation: a term that designates activities that make information more applicable and current, and its delivery and use more interactive and faster; a process that increases the usefulness of information by making it more vivid and organising it in a way that it can be used directly without further repackaging.
Accuracy to reality: A characteristic of information quality measuring the degree to which a data value (or set of data values) correctly represents the attributes of the real-world object or event.
Accuracy to surrogate source: A measure of the degree to which data agrees with an original, acknowledged authoritative source of data about a real world object or event, such as a form, document, or unaltered electronic data received from outside the organisation. See also Accuracy.
Aggregation: The process of associating objects of different types together in a meaningful whole. Also called composition.
Algorithm: A set of statements or a formula to calculate a result or solve a problem in a defined set of steps.
Alias: A secondary and non-standard name or alternate name of an enterprise-standard business term, entity type or attribute name, used only to cross-reference an official name to a legacy or software package data name; for example, Vendor is an alias for Supplier.
ANSI: Acronym for American National Standards Institute, the U.S. body that sets standards.
Applicability: the characteristic of information to be directly useful for a given context, information that is organised for action.
Application: A collection of computer hardware, computer programs, databases, procedures, and knowledge workers that work together to perform a related group of services or business processes.
Application architecture: A graphic representation of a system showing the process, data, hardware, software, and communications components of the system across a business value chain.
Archival database: A copy of a database saved in its exact state for historical purposes, recovery, or restoration.
Artificial Intelligence (AI): The capability of a system to perform functions normally associated with human intelligence, such as reasoning, learning, and self-improvement.
Association: See Relationship.
Associative entity type: An entity type that describes the relationship of a pair of entity types that have a many-to-many relationship or cardinality. For example, COURSE COMPLETION DATE has meaning only in the context of the relationship between the STUDENT and COURSE OFFERING entity types.
Asynchronous replication: Replication in which a primary data copy is considered complete once the update transaction completes, and secondary replicated data copies are queued to be updated as soon as possible or on a predefined schedule.
Atomic value: An individual data value representing the lowest level of meaningful fact.
Attribute: An inherent property, characteristic, or fact that describes an entity or object. A fact that has the same format, interpretation, and domain for all occurrences of an entity type. An attribute is a conceptual representation of a type of fact that is implemented as a field in a record or data element in a database file.
Attributive entity type: An entity type that cannot exist on its own and contains attributes describing another entity. An attributive entity type resolves a one-to-many relationship between an entity type and a descriptive attribute that may contain multiple values. Also called characteristic or dependent entity type.
Audit trail: Data that can be used to trace activity such as database transactions.
Authentication: The process of verifying that a person requesting a resource, such as data or a transaction, has authority or permission to access that resource.
Availability: A percentage measure of the reliability of a system indicating the percentage of time the system or data is accessible or usable, compared to the amount of time the system or data should be accessible or usable.
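The availability measure above is a simple ratio. A minimal sketch, assuming time is tracked in hours; the function and parameter names are illustrative and not part of the glossary:

```python
def availability_percent(accessible_hours: float, scheduled_hours: float) -> float:
    """Percentage of the scheduled time during which the system was usable."""
    if scheduled_hours <= 0:
        raise ValueError("scheduled_hours must be positive")
    return 100.0 * accessible_hours / scheduled_hours


# Hypothetical month: 720 scheduled hours with 3.6 hours of outage,
# which works out to roughly 99.5 percent availability.
print(availability_percent(720 - 3.6, 720))
```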
Backup: To restore a database to its state at a previous point in time. Backup is achieved (1) from an archived or snapshot copy of the database at a specified time; or (2) from an archived copy of the database, by applying the logged update activity since that archived copy was made.
Believability: the quality of information and its source to evoke credibility based on the information itself or the history or reputation of the source.
Benchmarking: The process of analysing and comparing an organisation’s processes to those of other organisations to identify best practices.
Best practice: A process, standard or component that is generally recognised to produce superior results when compared with similar processes, standards or components.
Bias: (1) Statistical error resulting in the distortion of measurement data caused by conscious or unconscious prejudice or faulty measurement technique such as an incorrect calibration of measurement equipment. (2) A vested interest, or strongly held paradigm or condition that may skew the results of sampling, measuring, or reporting the findings of a quality assessment. For example, if information producers audit their own information quality, they will have a bias to overstate its quality. If data is sampled in such a way that it does not reflect the entire population sampled, the sample result will be biased.
Bias: In this context, an unconscious distortion in the interpretation of information.
Biased sampling: Sampling procedures that result in a sample that is not truly representative of the population sampled.
Bounds: See Confidence interval.
Boyce/Codd Normal Form (BCNF): (1) A relation R is in Boyce/Codd normal form (BCNF) if and only if every determinant is a candidate key. (2) A table is in BCNF if every attribute that is a unique identifier of attributes describing an entity is a candidate key of that entity.
Business application model: A graphic illustration of the conceptual application systems, both manual and automated, including their dependencies, required to perform the processes of an organisation.
Business information resource data: The set of information resource data that must be known to information producers and knowledge workers in order to understand the meaning of information, the business rules that govern its quality, and the stakeholders who create or require it.
Business information steward: A business subject-matter expert designated and accountable for overseeing some parts of data definition for a collection of data for the enterprise, such as data definition integrity, legal restriction compliance standards, information quality standards, and authorisation security.
Business intelligence (BI): The ability of an enterprise to act intelligently through the exploitation of its information resources.
Business intelligence (BI) environment: Quality information in stable, flexible databases, coupled with business-friendly software tools that provide knowledge workers timely access to, effective analysis of, and intuitive presentation of the right information, enabling them to take the right actions or make the right decisions.
Business process: A synonym for value chain. The term is used to differentiate a value chain of activities from a functional process or functional set of activities.
Business process model: A graphic and descriptive representation of business processes or value chains that cut across functions and organisations. The model may be expressed in different levels of detail, including decomposition into successive lower levels of activities.
Business process reengineering: the process of analysing, redefining, and redesigning business activities to eliminate or minimise activities that add cost and to maximise activities that add value.
Business resource category: A business classification of data about a resource the enterprise must manage across business functions and organisations, used as a basis for high-level information modelling. The internal resource categories are human resource, financial, materials and products, facilities and tangible assets, and information. External resources include business partners, such as suppliers and distributors; customers; and external environment, such as regulation and economic factors. Also called subject area.
Business rule: A statement expressing a policy or condition that governs business actions and establishes data integrity guidelines.
Business rule conformance: See Validity.
Business term: A word, phrase, or expression that has a particular meaning to the enterprise.
Business value chain: See Value chain.
Candidate key: A key that can serve to uniquely identify occurrences of an entity type. A candidate key must have two properties: (1) Each occurrence or record must have a different value of the key, so that a key value identifies only one occurrence; and (2) No attribute in the key can be eliminated without nullifying the first property.
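The two properties above (uniqueness and minimality) can be checked against sample data. The sketch below is illustrative only: the helper names and the sample rows are hypothetical, and passing the check on one data sample does not prove that a key holds under the business rules.

```python
from itertools import combinations


def is_unique(rows, attrs):
    """Property 1: no two rows share the same combination of values for attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def is_candidate_key(rows, attrs):
    """Properties 1 and 2: attrs is unique and no proper subset of it is unique."""
    attrs = tuple(attrs)
    if not is_unique(rows, attrs):
        return False
    for size in range(1, len(attrs)):
        for subset in combinations(attrs, size):
            if is_unique(rows, subset):
                return False  # an attribute could be dropped, so attrs is not minimal
    return True


# Hypothetical enrolment records.
rows = [
    {"student_id": 1, "course": "DB101", "email": "a@example.org"},
    {"student_id": 1, "course": "IQ200", "email": "a@example.org"},
    {"student_id": 2, "course": "DB101", "email": "b@example.org"},
]
print(is_candidate_key(rows, ("student_id", "course")))           # unique and minimal
print(is_candidate_key(rows, ("student_id", "course", "email")))  # unique but not minimal
```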
Cardinality: The number of occurrences that may exist between occurrences of two related entity types. The cardinalities between a pair of related entity types are one-to-one, one-to-many, or many-to-many. See Relationship.
CASE: Acronym for Computer-Aided Systems Engineering, the application of automated technologies to business and information modelling and software engineering.
Case study: an empirical inquiry that investigates a contemporary phenomenon within its real-life context; careful and systematic observation and recording of the experience of a single organisation.
CASS (Coding Accuracy Support System): A system for verifying the integrity of United States addresses against a USPS maintained database containing every mailing address in the United States. The system is concerned with just the addresses, not the people or organisations residing at these addresses.
Catalogue: The component of a Database Management System (DBMS) where physical characteristics about the database are stored, such as its physical design schema, table or file names, primary keys, foreign key relationships, and other data required for the DBMS to manage the data.
Categorisation: Here, the conscious effort to group information items together based on common features, family resemblances, rules, membership gradience, or certain group prototypes (best examples of a category).
Cause-and-effect diagram: A chart in the shape of a «fishbone» used to analyse the relationship between error cause and error effect. The diagram, invented by Kaoru Ishikawa, shows a specific effect and possible causes of error. The causes are drawn in six categories, each a bone on the fish. The categories are: (1) Human (or Manpower), (2) Methods, (3) Machines, (4) Materials, (5) Measurement, and (6) Environment. Also called a Fishbone diagram. (Q)
Central tendency: The phenomenon that data measured from a process generally aggregates around a value somewhere between the high and low values.
Champion: In Six Sigma, the executive or manager who «owns» a process to be improved, and whose role is an advocate for the improvement project, with oversight and management of critical elements, reporting project success to up-line management, and who removes barriers to enable project improvement success.
Checklist: A technique for quality improvement to identify steps to perform or items to check before work is complete.
Clarity: void of obscure language or expression, ease of understanding, interpretability.
Class word: See Domain type.
Cleansing: See Data cleansing.
Cluster: (1) A way of storing records or rows from one or more tables together physically, based on a common key or partial key value. (2) Groups of objects that have similar characteristics or behaviours that are significantly different from other objects that are discovered through data analysis or mining (Stat).
Cluster sampling: Sampling a population by taking samples from a smaller number of subgroups (such as geographic areas) of the population. The subsamples from each cluster are combined to make up the final sample. For example, in sampling sales data for a chain of stores, one may choose to take a subsample of a representative subset of stores (each a cluster) into a cluster sample rather than randomly select sales data from every store.
Code: (1) To represent data in a form that can be accepted by an application program. (2) A shorthand representation or abbreviation of a specific value of an attribute.
Commit: A DML command that signals the successful end of a transaction and confirms that the insertion, update, or deletion of one or more records in the database is complete.
Common cause: An inherent source of variation in the output of a process due to natural variation in the process. See also Special cause.
Communication: here, the interchange of messages resulting in the transferral or creation of knowledge; the creation of shared understanding through interaction among two or more agents.
Completeness: A characteristic of information quality measuring the degree to which all required data is known. (1) Fact completeness is a measure of data definition quality expressed as the percentage of the attributes that need to be known about an entity type that are defined in the model and implemented in a database. For example, «80 percent of the attributes required to be known about customers have fields in a database to store the attribute values.» (2) Value completeness is a measure of data content quality expressed as the percentage of the columns or fields of a table or file that should have values in them and in fact do. For example, «95 percent of the columns for the customer table have a value in them.» Also referred to as Coverage. (3) Occurrence completeness is a measure of the percentage of records that an information collection should hold, in order to represent all occurrences of the real-world objects it should know, that it actually holds. For example, does a Department of Corrections have a record for each Offender it is responsible to know about?
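Value completeness, as described above, can be computed as the share of required fields that actually hold a value. A minimal sketch; the function name, the sample records, and the choice to treat None and the empty string as missing are all assumptions made for illustration:

```python
def value_completeness(records, required_fields):
    """Percent of required field slots that hold a non-empty value."""
    total = len(records) * len(required_fields)
    if total == 0:
        raise ValueError("need at least one record and one required field")
    filled = sum(
        1
        for record in records
        for field in required_fields
        # Assumption: None and "" both count as missing values.
        if record.get(field) not in (None, "")
    )
    return 100.0 * filled / total


# Hypothetical customer records: 6 of the 8 required slots are filled.
customers = [
    {"name": "Ada", "phone": "555-0100"},
    {"name": "Bo", "phone": ""},
    {"name": "Cy"},
    {"name": "Di", "phone": "555-0101"},
]
print(value_completeness(customers, ("name", "phone")))  # 75.0
```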
Comprehensiveness: the quality of information to cover a topic to a degree or scope that is satisfactory to the information user.
Conceptual data model: See Data model.
Conciseness: marked by brevity of expression or statement, free from all elaboration and superfluous detail.
Concurrency: A characteristic of information quality measuring the timing of equivalence of data stored in redundant or distributed database files. The measure of data concurrency may describe the minimum, maximum, and average information float time between when data becomes available in one data source and when it becomes available in another data source. Or it may consist of the relative percent of data from a data source that is propagated to the target within a specified time frame.
Concurrency assessment: An audit of the timing of equivalence of data stored in redundant or distributed database files. See Equivalence.
Concurrency control: A DBMS mechanism of locking records, used to manage the access of multiple transactions to shared data.
Conditional relationship: An association that is optional depending on the nature of the related entities or on the rules of the business environment.
Confidence interval, or confidence interval of the mean: The upper and lower limits or values, or bounds on either side of a sample mean for which a confidence level is valid.
Confidence level: The degree of certainty, expressed as a percentage, of being sure that the value for the mean of a population is within a specific range of values around the mean of a sample. For example, a 95 percent confidence level indicates that one is 95 percent sure that the estimate of the mean is within a desired precision or range of values called a confidence interval. Stated another way, a 95 percent confidence level means that out of 100 samples from the same population, the mean of the population is expected to be contained within the confidence interval in 95 of the 100 samples.
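The relationship between a confidence level and its confidence interval can be illustrated with a short calculation. The sketch below uses the normal approximation (a z-value), which is an assumption suited to larger samples; for small samples a Student's t-value would give a somewhat wider interval. The function name and sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def confidence_interval(sample, confidence=0.95):
    """Lower and upper bounds around the sample mean (normal approximation)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    # Two-sided z-value, about 1.96 for a 95 percent confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * stdev(sample) / sqrt(n)
    centre = mean(sample)
    return centre - margin, centre + margin


# Hypothetical measurements with a sample mean of 11.
lower, upper = confidence_interval([10, 12, 11, 13, 9, 11, 12, 10])
print(lower, upper)
```

Raising the confidence level (for example to 0.99) widens the interval, since more certainty requires a broader range of values.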
Confidence limits: See Confidence interval.
Configuration management: The process of identifying and defining configurable items in an environment by controlling their release and any subsequent changes throughout the development life cycle; recording and reporting the status of those items and change requests; and verifying the completeness and correctness of configurable items.
Consensus: The agreement of a group with a judgment, decision, or data definition in which the stakeholders have participated and can say, «I can live with it.»
Consistency: A measure of information quality expressed as the degree to which a set of data is equivalent in redundant or distributed databases.
Consistency: the condition of adhering together, the ability to be asserted together without contradiction.
Constraint: A business rule that places a restriction on business actions and therefore restricts the resulting data. For example, «only wholesale customers may place wholesale orders.»
Contamination: See Information quality contamination.
Context: here, the sum of associations, ideas, assumptions, and preconceptions that influence the interpretation of information; the situation of origination or application of a piece of information.
Context: a specific situation that defines the environment in which a piece of information originates or is interpreted.
Contextualisation: the act of adding situational meta-information to information in order to make it more comprehensible and clear and easier to judge.
Contextualisation: a term that designates activities that make information clearer, allow one to see whether it is correct for a new situation, and enable a user to trace it back to its origin (in spite of system changes); a process that adds background to information about its origin and relevance.
Contextualiser: A mechanism that can be used to add context to a piece of information and thus increase its interpretability.
Control: The mechanisms used to manage processes to maintain acceptable performance.
Control chart: A graphical device for reporting process performance over time for monitoring process quality performance.
Control group: A selected set of people, objects, or processes to be observed to record behaviour or performance characteristics. Used to compare behaviour and performance to another group in which changes or improvements have been made.
Convenience: here, the ease-of-use or seamlessness by which information is acquired.
Conversion: The process of preparing, reengineering, cleansing and transforming data, and loading it into a new target data architecture.
Corporate data: See Enterprise data.
Correctness: conforming to an approved or conventional standard, conforming to or agreeing with fact, logic, or known truth.
Correlation: A predictive relationship that exists between two factors, such that when one of the factors changes, you can predict the nature of change in the other factor. For example, if information quality goes up, the costs of information scrap and rework go down.
Cost of acquisition: (1) The cost of acquiring a new customer, including identifying, marketing and presales activities to get the first sale. (2) The costs of acquiring products, such as software packages, and services. This should be weighed against the cost of ownership.
Cost of information quality assessment: The costs associated with measurement and quality conformance assurance as a component of the cost of quality information.
Cost of nonquality information: The total costs associated with failure or nonquality information and information services, including, but not limited to, reruns, rework, downstream data verification, data correction, data transformation to nonstandard definition or format, and workarounds.
Cost of ownership: The total costs of ownership of products, such as software packages, and services, including planning, acquiring, process redesign, implementation, and support required for the successful use of the product or service.
Cost of quality information: The total costs associated with providing nonquality information or information services. The cost consists of the costs of failure or nonquality information, plus the costs of assessment and conformance, plus the costs of information process improvement and data defect prevention.
Cost of retention: The cost of managing customer relationships that result in subsequent sales to existing customers.
Coverage: See Completeness.
Criteria: standards by which alternatives are judged. Attributes that describe certain (information) characteristics.
Criteria of information quality: they describe the characteristics that make information valuable to authors, administrators, or information users.
Critical information: Information that if missing or wrong can cause enterprise-threatening loss of money, life, or liability, such as failure to properly calculate pension withholding, not setting the aeroplane flaps correctly for take-off, or prescribing the wrong drug.
Cross-functional: The characteristic of data or process that is of interest to more than one business or functional area.
Currency: A characteristic of information quality measuring the degree to which data represents reality from the required point in time. For example, one information view may require data currency to be the most up-to-date point, such as stock prices for stock trades, while another may require data to be the last stock price of the day, for stock price running average.
Currency: the quality or state of information of being up-to-date or not outdated.
Customer: The person or organisation whose needs a product or service provider must meet, and whose satisfaction with product and service, including information, is the focus of quality management. A customer may be a direct, immediate Customer or the End-consumer of the product or service.
Customer life cycle: The states of existence and relative time periods of a typical customer from being a prospect to becoming an active customer, to becoming nonactive and a «former» customer.
Customer lifetime revenue: The net present value of the average customer revenue over the life of relationship with the enterprise.
Customer lifetime value (LTV): The net present value of the average profit of a typical customer over the life of relationship with the enterprise.
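Both customer lifetime revenue and customer lifetime value rest on a net present value calculation. A minimal sketch, assuming yearly amounts received at the end of each year and a constant discount rate; neither assumption comes from the glossary, and the figures are hypothetical:

```python
def net_present_value(yearly_amounts, discount_rate):
    """Discount amounts received at the end of years 1, 2, ... back to today."""
    return sum(
        amount / (1 + discount_rate) ** year
        for year, amount in enumerate(yearly_amounts, start=1)
    )


# Hypothetical typical customer: 100 profit per year for 3 years, 10 percent rate.
# Lifetime value is roughly 248.69, less than the undiscounted total of 300.
print(round(net_present_value([100, 100, 100], 0.10), 2))
```

Passing yearly revenue instead of yearly profit gives customer lifetime revenue under the same assumptions.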
Customer segment: A meaningful aggregation of customers for the purpose of marketing or determining customer lifetime value.
Customer-supplier relationship: See Information customer-supplier relationship.
CUSUM: Abbreviation for Cumulative Summation, a more sensitive method for detecting out-of-control measurements than a simple control chart. The CUSUM indicates when a process has been off aim for too long a period of time.
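The glossary does not say which CUSUM variant is meant. The sketch below shows one common form, the two-sided tabular CUSUM, in which small deviations from the target accumulate until they cross a threshold. The parameter names (`slack`, `threshold`) and the sample values are illustrative assumptions.

```python
def cusum(values, target, slack=0.0, threshold=5.0):
    """Return the indices at which the upper or lower cumulative sum exceeds threshold."""
    high = low = 0.0
    signals = []
    for i, x in enumerate(values):
        # Accumulate deviations beyond the slack; never let a sum drop below zero.
        high = max(0.0, high + (x - target - slack))
        low = max(0.0, low + (target - x - slack))
        if high > threshold or low > threshold:
            signals.append(i)
    return signals


# Hypothetical process aimed at 10.0 that drifts upward from the fourth reading.
readings = [10.1, 9.9, 10.0, 10.6, 10.5, 10.7, 10.6, 10.8]
print(cusum(readings, target=10.0, slack=0.25, threshold=1.0))  # [5, 6, 7]
```

No single reading is far from the target here; the signal comes from the process staying off aim for several readings in a row.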
Cycle time: The time required for a process (or subprocess) to execute from start to completion.
d: A symbol representing the set of deviations of a set of items from the mean of the set of items, expressed as d = x − x̄ (x minus x-bar, the mean) for each value of x.
Data: (1) Symbols, numbers or other representations of facts; (2) The raw material from which information is produced when it is put in a context that gives it meaning. See also Information.
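As a small worked example of the deviations d, the sketch below subtracts the mean from each value; the function name is illustrative.

```python
from statistics import mean


def deviations(values):
    """Deviation of each value from the mean of all values."""
    x_bar = mean(values)
    return [x - x_bar for x in values]


# The mean of 2, 4 and 9 is 5, so the deviations are -3, -1 and 4.
print(deviations([2, 4, 9]))
```

The deviations from the mean always sum to zero, which is why measures of spread square them or take absolute values before summing.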
Data: raw, unrelated numbers or entries, e.g., in a database; raw forms of transactional representations.
Data administration: See Data management.
Data administrator: One who manages or provides data administration functions.
Data analyst: One who identifies data requirements, defines data, and synthesises it into data models.
Data architect: One who is responsible for the development of data models.
Data audit: See Information quality assessment.
Data cleansing: An information scrap-and-rework process to correct data errors in a collection of data in order to bring the level of quality to an acceptable level to meet the information customers’ needs.
Data cleanup: See Data cleansing.
Data consistency assessment: The process of measuring data equivalence and information float or timeliness in an interface-based information value chain.
Data content quality: The subset of information quality referring to the quality of data values.
Data defect prevention: The process of information process improvement to eliminate or minimise the possibility of data errors from getting into an information product or database.
Data deficiency: an unconformity between the view of the real-world system that can be inferred from a representing information system and the view that can be obtained by directly observing the real-world system.
Data definition: The specification of the meaning, valid values or ranges (domain), and business integrity rules for an entity type or attribute. Data definition includes name, definition, and relationships, as well as domain value definition and business rules that govern business actions that are reflected in data. These components represent the «information product specification» components of Information Resource Data or meta data.
Data Definition Language (DDL): The language used to describe database schemas or designs.
Data definition quality: A component of information quality measuring the degree to which data definition accurately, completely, and understandably defines what the information producers and knowledge workers should know in order to perform their job processes effectively. Data definition quality is a measure of the quality of the information product specification.
Data dictionary: A repository of information (meta data) defining and describing the data resource. A repository containing meta data. An active data dictionary, such as a catalogue, is one that is capable of interacting with and controlling the environment about which it stores information or meta data. An integrated data dictionary is one that is capable of controlling the data and process environments. A passive data dictionary is one that is capable of storing meta data or data about the data resource, but is not capable of interacting with or controlling the computerised environment external to the data dictionary. See also Repository.
Data dissemination: The distribution of a copy or extract of information in any form, from electronic to paper from a database or data source to other parties. This is NOT to be confused with data or information sharing. (Q)
Data element: The smallest unit of named data that has meaning to a knowledge worker. A data element is the implementation of an attribute. Synonymous with data item and field.
Data flow diagram: A graphic representation of the «flow» of data through business functions or processes. It illustrates the processes, data stores, external entities, data flows, and their relationships.
Data independence: The property of being able to change the overall logical or physical structure of the data without changing the application program’s view of the data.
Data intermediary: See Data scribe.
Data intermediation: The design of and performance of processes in which the actual creator or originator of knowledge does not capture that knowledge electronically, but gives it in paper or other form to be entered into a database by someone else.
Data management: The management and control of data as an enterprise asset. It includes strategic information planning, establishing data-related standards, policies, and procedures, and data modelling and information architecture. Also called data administration.
Data Manipulation Language (DML): The language used to access data in one or more databases.
Data mart: A subset of enterprise data along with software to extract data from a data warehouse or operational data store, summarise and store it, and to analyse and present information to support trend analysis and tactical decisions and processes. The scope can be that of a complete data subject such as Customer or Product Sales, or of a particular business area or line of business, such as Retail Sales. A data mart architecture, whether subject or business area, must be an enterprise-consistent architecture.
Data mining: The process of analysing large volumes of data using pattern recognition or knowledge discovery techniques to identify meaningful trends, relationships and clusters represented in data in large databases.
Data model: (1) A logical map or representation of real-world objects and events that represents the inherent properties of the data independently of software, hardware, or machine performance considerations. The model shows data attributes grouped into third normal form entities, and the relationships among those entities. (DM) (2) In data mining, an expression in symbolic terms of the relationships in data, such that the model represents how changes in one attribute or set of attributes cause changes in another attribute or set of attributes, revealing useful information about the reliability of the relationships.
Data presentation quality: A component of information quality measuring the degree to which information-bearing mechanisms, such as screens, reports, and other communication media, are easy to understand, efficient to use, and minimise the possibility of mistakes in its use.
Data quality: See Information quality.
Data quality assessment: See Information quality assessment.
Data reengineering: The process of analysing, standardising, and transforming data from un-architected or non-standardised files or databases into an enterprise standardised information architecture.
Data replication: The controlled process of propagating equivalent data values from a source database to one or more duplicate copies in other databases.
Data resource management (DRM): See Information resource management.
Data scribe: A role in which individuals transcribe data in one form, such as a paper document, to another form, such as into a computer database; for example, a data entry clerk entering data from a paper order form into a database.
Data standards: The collection of standards, rules and guidelines that govern how to name data, how to define it, how to establish valid values, and how to specify business rules. (IRM)
Data store: Any place in a system where data is stored. This includes manual files, machine-readable files, data tables, and databases. A data store on a logical data flow diagram is related to one or more entities in the data model.
Data transformation: The process of defining and applying algorithms to change data from one form or domain value set to another form or domain value set in a target data architecture to improve its value and useability for the information stakeholders.
Data type: An attribute of a data element or field that specifies the DBMS type of physical values, such as numeric, alphanumeric, packed decimal, floating point, or datetime.
Data value: A specific representation of a fact for an attribute at a point in time.
Data visualisation: Graphical presentation of patterns and trends represented by data relationships.
Data warehouse: A collection of software and data organised to collect, cleanse, transform, and store data from a variety of sources, and analyse and present information to support decision-making, tactical and strategic business processes.
Data warehouse audits and controls: A collection of checks and balances to assure the extract, cleansing, transformation, summarisation, and load processes are in control and operate properly. The controls must assure the right data is extracted from the right sources, transformed, cleansed, summarised correctly, and loaded to the right target files.
Database administration: The function of managing the physical aspects of the data resource, including physical database design to implement the conceptual data model; and database integrity, performance, and security.
Database integrity: The characteristic of data in a database in which the data conforms to the physical integrity constraints, such as referential integrity and primary key uniqueness, and is able to be secured and recovered in the event of an application, software, or hardware failure. Database integrity does not imply data accuracy or other information quality characteristics not able to be provided by the DBMS functions.
Database marketing: The use of collected and managed information about one’s customers and prospects to provide better service and establish long-term relationships with them. Database marketing involves analysing and designing pertinent customer information needs, collecting, maintaining, and analysing that data to support mass customisation of marketing campaigns to decrease costs, improve response, and to build customer loyalty, reduce attrition, and increase customer satisfaction.
Database server: The distributed implementation of a set of database management functions in which one dedicated collection of database management functions, accessing one or more databases on that node, serves multiple knowledge workers or clients that provide a human-machine interface for requesting or creating data.
Data-driven development: See Value-centric development.
DDL: Acronym for Data Definition Language.
Decision Support System (DSS): Applications that use data in a free-form fashion to support managerial decisions by applying ad hoc query, summarisation, trend analysis, exception identification, and «what-if» questions.
Defect: (1) In IQ, a quality characteristic of a data element, such as completeness or accuracy, that does not meet its quality standard or meet customer expectation. A record may have as many defects for a quality characteristic as it has data elements. Compare to Defective; (2) A quality characteristic of an item or a component that does not conform to its quality standard or meet customer expectation.
Defect rate: A measure of the frequency that defects occur in a process. Also called failure rate (in manufactured products), or error rate.
Defective: (1) In IQ, a record or logical business unit of information, such as an insurance application or an order, that has at least one Defect causing it to not conform to its quality standard or meet customer expectation. The record is counted as one Defective regardless of the number of defects; (2) A unit of product or service containing at least one Defect.
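The difference between counting Defects and counting Defectives can be sketched in Python. The records, the field names, and the rule that a missing (None) value counts as one defect are assumptions made only for this illustration.

```python
# Hypothetical order records; None marks a data element that fails its
# quality standard (an assumption made only for this illustration).
records = [
    {"customer": "Jones", "order_date": "1997-06-07", "total": 120.0},
    {"customer": None, "order_date": None, "total": 75.5},
    {"customer": "Smith", "order_date": None, "total": 40.0},
]


def count_defects(record):
    """Number of data elements in one record that fail the standard."""
    return sum(1 for value in record.values() if value is None)


# Defects: every failing data element is counted.
defects = sum(count_defects(r) for r in records)

# Defectives: a record counts once, however many defects it contains.
defectives = sum(1 for r in records if count_defects(r) > 0)

print(defects, defectives)
```

Here the second record contributes two defects but only one defective, which is why the two measures diverge.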
Definition conformance: The characteristic of data, such that the data values represent a fact consistent with the agreed-upon definition of the attribute. For example, a value of «6/7/1997» actually represents the «Order Date: the date an order is placed by the customer,» and not the system date created when the order is entered into the system.
Delphi approach: An approach used to achieve consensus, that involves individual judgments made independently, group discussion of the rationales for disparate judgments, and a consensus judgment being agreed upon by the participants.
Demography: The study of human populations, especially with reference to size, density, distribution and other vital statistics.
Derived data: Data that is created or calculated from other data within the database or system.
Design: here, the rendering of content in a communication medium; design is concerned with how things ought to be in order to attain goals and to function.
Deviation (d): The difference between the value of an item in a set of items and the mean (x bar) of the set, as expressed in the formula d = x - x bar, where d = deviation, x = the value of an item in the set, and x bar is the mean or average of all items in the set.
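As a minimal sketch of the deviation formula d = x - x bar, the following Python function returns the deviation of every item in a set. The function name and the sample values are illustrative assumptions.

```python
from statistics import mean


def deviations(values):
    """Return d = x - x_bar for each item x, where x_bar is the mean."""
    x_bar = mean(values)
    return [x - x_bar for x in values]


sample = [4, 8, 6, 2]  # hypothetical measurements, mean is 5
print(deviations(sample))
```

Deviations from the mean always sum to zero, which makes a quick sanity check for any implementation.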
Devil’s advocate: A technique used in decision making in which someone plays the role of challenging the predominant position in order to expose potential flaws, influence critical thinking and prevent biased and potentially harmful decisions.
DFD: Acronym for Data Flow Diagram.
DIF: Acronym for Data Interchange Format.
Dimension: (1) See Information quality characteristic. (2) A category for summarising or viewing data (e.g., time period, product, product line, geographic area, organisation). See also Enterprise dimension.
Directory: A table, block, index, or folder containing addresses and locations or relationships of data or files and used as a way of organising files.
Discount rate: The market rate of interest representing the cost to borrow money. This rate may be applied to future income to calculate its net present value.
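Applying a discount rate to future income can be sketched as follows. The function name and the cash-flow figures are hypothetical, and the sketch assumes one cash flow per year with annual compounding.

```python
def net_present_value(cash_flows, discount_rate):
    """Discount future cash flows to today's value.

    cash_flows[t] is the income received t years from now (t = 0 is today).
    Assumes annual compounding at a constant discount_rate.
    """
    return sum(cf / (1 + discount_rate) ** t for t, cf in enumerate(cash_flows))


# Hypothetical figures: 1,000 received at the end of each of the next
# two years, discounted at 10% per year.
npv = net_present_value([0, 1000, 1000], 0.10)
print(round(npv, 2))
```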
Disinformation: see misinformation.
DMAIC: Acronym for Define-Measure-Analyse-Improve-Control, the Six Sigma method for process improvement.
DML: Acronym for Data Manipulation Language.
Document identification keys: concise alphanumeric labels that are attributed to documents according to a set of rules in order to facilitate their storage and retrieval.
Domain value redundancy: A dysfunctional characteristic of an attribute or field in which the same fact of information is represented by more than one value. For example, unit of measure code having domain values of «doz,» «dz,» and «12» may all represent the fact that the unit of measure is «one dozen.»
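One way to remove domain value redundancy is to map every redundant value to a single standard code. The sketch below uses the «doz,» «dz,» and «12» values from the example above; the canonical code «DZ» and the function name are assumptions for illustration.

```python
# Assumed mapping of redundant domain values to one standard code.
CANONICAL_UNIT = {"doz": "DZ", "dz": "DZ", "12": "DZ"}


def standardise_unit(code):
    """Return the canonical code, or None if the value is not in the domain."""
    return CANONICAL_UNIT.get(code.strip().lower())


for raw in ["doz", "DZ", "12", "150"]:
    print(raw, "->", standardise_unit(raw))
```

A value such as «150» is not a redundant spelling of «one dozen»; it returns None here and would need separate investigation.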
Domain: (1) Set or range of valid values for a given attribute or field, or the specification of business rules for determining the valid values. (2) The area or field of reference of an application or problem set.
Domain chaos: A dysfunctional characteristic of an attribute or field in which multiple types of facts are represented by the same attribute or field. For example, unit of measure code for one product has a domain value of «doz,» to represent a unit of measure of «one dozen,» while for another product unit of measure code has a value of «150,» to represent the reorder point quantity.
Domain type: A general classification that characterises the kind of values that may be values of a specific attribute, such as a number, date, currency amount, or percent. The domain type name may be used as a component of an attribute name. Also called a class word.
Drill down: The process of accessing more detailed data from summary data to identify exceptions and trends. May be multitier.
Drill through: The process of accessing the original source data from a replicated or transformed copy to verify equivalence to the record-of-origin data.
DSS: Acronym for Decision Support Systems.
Ease-of-use: the quality of an information environment to facilitate the access and manipulation of information in a way that is intuitive.
E-commerce: Acronym for electronic commerce, the conducting of business transactions over the Internet (I-Net).
EDI: Acronym for Electronic Data Interchange.
Edit and validation: The process of assuring data being created conforms to the governing business rules and is correct to the extent possible. Database integrity controls and software routines can edit and validate conformance to business rules. Information producers must validate correctness of data.
EIS: Acronym for Executive Information System.
Empty value: A data element for which no value has been captured, and for which the real-world object represented has no corresponding value. For example, there is no date value for the data element, «Last date of service» for an active Employee. Contrast with Missing value. (Stat, Q)
End-consumer: The persons or organisations whose needs a product or service provider must meet, and whose satisfaction with its products and services, including information, determines enterprise success or failure. A customer may be a direct, immediate Customer or the End-consumer of the product or service.
Enterprise data: The data of an organisation or corporation that is owned by the enterprise and managed by a business area. Characteristics of corporate data are that it is essential to run the business and/or it is shared by more than one organisational unit within the enterprise.
Entity integrity: The assurance that a primary key value will identify no more than one occurrence of an entity type, and that no attribute of the primary key may contain a null value. Based on this premise, the real-world entities are uniquely distinguishable from all other entities.
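The two conditions of entity integrity (no duplicate primary key values, no null in any key attribute) can be checked with a short Python sketch. The row layout and the function name are hypothetical; in practice a DBMS would usually enforce these constraints itself.

```python
def entity_integrity_violations(rows, key_fields):
    """Return (rows with a null key attribute, set of duplicated key values)."""
    null_rows, seen, duplicates = [], set(), set()
    for row in rows:
        key = tuple(row.get(field) for field in key_fields)
        if any(part is None for part in key):
            null_rows.append(row)   # a key attribute is null
        elif key in seen:
            duplicates.add(key)     # key identifies more than one occurrence
        else:
            seen.add(key)
    return null_rows, duplicates


rows = [
    {"order_id": 1, "line_no": 1},
    {"order_id": 1, "line_no": 2},
    {"order_id": 1, "line_no": 2},     # duplicate key
    {"order_id": 2, "line_no": None},  # null in a key attribute
]
null_rows, duplicates = entity_integrity_violations(rows, ["order_id", "line_no"])
print(null_rows, duplicates)
```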
Entity life cycle: The phases, or distinct states, through which an occurrence of an object moves over a definable period of time. The subtypes of an entity that are mutually exclusive over a given time frame. Also referred to as entity life history and state transition diagram.
Entity Relationship Diagram (ERD): See Entity relationship model.
Entity relationship model: A graphical representation illustrating the entity types and the relationships of those entity types of interest to the enterprise.
Entity subtype: A specialised subset of occurrences of a more general entity type, having one or more different attributes or relationships not inherent in the other occurrences of the generalised entity type. For example, an hourly employee will have different attributes from a salaried employee, such as hourly pay rate and monthly salary.
Entity supertype: A generalised entity in which some occurrences belong to a distinct, more specialised subtype.
Entity type: A classification of the types of real-world objects (such as person, place, thing, concept, or events of interest to the enterprise) that have common characteristics. Sometimes the term entity is used as a short name.
Entity/process matrix: A matrix that shows the relationships of the processes, identified in the business process model, with the entity types identified in the information model. The matrix illustrates which processes create, update, or reference the entity types.
Equivalence: A characteristic of information quality that measures the degree to which data stored in multiple places is conceptually equal. Equivalence indicates the data has equal values or is in essence the same. For example, a value of «F» for Gender Code for J. J. Jones in database A and a value of «1» for Sex Code for J. J. Jones in database B mean the same thing: J. J. Jones is female. The measure of equivalence is the percent of fields in records within one data collection that are semantically equivalent to their corresponding fields within another data collection or database. Also called semantic equivalence.
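The equivalence measure can be sketched by translating each database's codes to a common meaning and counting the matches. The definition above only gives «F» and «1» as meaning female; the remaining code values and the function name are assumptions for illustration.

```python
# Assumed code tables; only "F" and "1" = female come from the example.
GENDER_A = {"F": "female", "M": "male"}
SEX_B = {"1": "female", "2": "male"}


def equivalence_percent(pairs):
    """Percent of (value_in_A, value_in_B) pairs that mean the same thing."""
    if not pairs:
        return 0.0
    same = sum(
        1
        for a, b in pairs
        if GENDER_A.get(a) is not None and GENDER_A.get(a) == SEX_B.get(b)
    )
    return 100.0 * same / len(pairs)


# Corresponding fields for four hypothetical people in databases A and B.
pairs = [("F", "1"), ("M", "2"), ("F", "2"), ("M", "2")]
print(equivalence_percent(pairs))
```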
ERD: Acronym for Entity Relationship Diagram.
Error cause removal: Elimination of cause(s) of error in a way that prevents recurrence of the error.
Error event: An incident in which an error or defect occurs.
Error proofing: Building edit and validation routines in application programs and designing procedures to reduce inadvertent human error. Also called foolproofing.
Error rate: See Defect rate.
Evaluation: the activity of assessing the quality of a system or the information it contains.
Event: (1) An occurrence of something that happens that is of interest to the enterprise. (2) See also Error event.
Executive Information System (EIS): A graphical application that supports executive processes, decisions, and information requirements. Presents highly summarised data with drill-down capability, and access to key external data.
Expert: a knowledge worker who has a high degree of domain specific knowledge and a high heuristic competence in the field of expertise.
Expert system: (1) A specific class of knowledge base system in which the knowledge, or rules, are based on the skills and experience of a specific expert or group of experts in a given field. (2) A branch of artificial intelligence. An expert system attempts to represent and use knowledge in the same way a human expert does. Expert systems simulate the human trait of thinking.
Export: The function of extracting information from a repository or database and packaging it to an export/import file.
Extensibility: The ability to dynamically augment a database (or data dictionary) schema with knowledge worker-defined data types. This includes addition of new data types and class definitions for representation and manipulation of unconventional data such as text data, audio data, image data, and data associated with artificial intelligence applications.
Extranet: semi-public TCP/IP network used by several collaborating partners.
Fact: (1) Something that is known or needs to be known. (2) In data warehousing, a specific numerical sum that represents a key business performance measure.
Fact: A statement that accurately reflects a state or characteristic of reality.
Fact completeness: See Completeness.
Fact Table: The primary table in dimensional modelling that contains key business measurements. The facts are viewed by various Dimensions. See also Enterprise fact.
Failure costs: See Costs of nonquality information.
Failure mode: (1) The precipitating defect or mechanism that causes a failure. (2) The result or consequence of a failure or the manifestation of a failure. (3) The way in which a failure occurs and its impact on the normal process.
Failure mode analysis (FMA): A procedure to determine the precipitating cause or symptoms that occur just before or after a process failure. The procedure analyses failure mode data from current and previous process designs with a goal to define improvements to prevent recurrence of failure. See also Information process improvement.
Failure rate: A measure of the frequency that defective items are produced by a process, hence the frequency with which the process fails. See also Defect rate.
False Negative: (1) In quality measurement, the condition of measuring a value for accuracy (or validity) and finding it to be not accurate (or not valid) when it is accurate (or valid). (2) In record matching, the condition of failing to identify that two records represent the same real world object.
False Positive: (1) In quality measurement, the condition of measuring a value for accuracy (or validity) and finding it to be accurate (or valid) when it is not. (2) In record matching, the condition of incorrectly identifying that two records represent the same real world object, when they actually represent two unique real world objects.
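For the record-matching sense of False Negative and False Positive, a minimal sketch compares the pairs a matching routine reported against the pairs known to be true matches. The record identifiers and the function name are hypothetical.

```python
def matching_errors(predicted, actual):
    """Compare predicted matches with true matches.

    Both arguments are sets of record-id pairs judged to represent the
    same real-world object.
    """
    false_positives = predicted - actual  # matched, but really distinct objects
    false_negatives = actual - predicted  # same object, but the match was missed
    return false_positives, false_negatives


predicted = {("r1", "r2"), ("r3", "r4")}
actual = {("r1", "r2"), ("r5", "r6")}
fp, fn = matching_errors(predicted, actual)
print(fp, fn)
```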
Falsity: the characteristic of information not to correspond to facts, logic, or a given standard.
Feedback loop: A formal mechanism for communicating information about process performance and information quality to the process owner and information producers.
Field: A data element or data item in a data structure or record.
Fifth Normal Form (5NF): (1) A relation R is in fifth normal form (5NF) (also called Projection Join Normal Form (PJ/NF)) if and only if every join dependency in R is a consequence of the candidate keys of R. (2) A table is in 5NF if all elements within a concatenated key are independent of each other and cannot be derived from the remainder of the key.
File integrity: The degree to which documents in a file retain their original form and utility (i.e., no misfiled or torn documents).
Filter: See Information quality measure.
First Normal Form (1NF): (1) A relation R is in first normal form (1NF) if and only if all underlying domains contain atomic values only. (2) A table is in 1NF if it can be represented as a two-dimensional table, and for every attribute there exists one single meaningful and atomic value, never a repeating group of values.
Fishbone diagram: See Cause-and-effect diagram.
Flexibility: A characteristic of information quality measuring the degree to which the information architecture or database is able to support organisational or process reengineering changes with minimal modification of the existing objects and relationships, only adding new objects and relationships.
FMA: See Failure mode analysis.
Focus group: A facilitated group of customers that evaluates a product or service against those of competitors, in order to clearly define customer preferences and quality expectations.
Focus group: a market research technique where five to nine people discuss a topic with the help of a moderator in order to elicit common themes, problems, or opinions.
Foolproofing: Building edit and validation routines in application programs or procedures to reduce inadvertent human error.
Foreign key: A data element in one entity (or relation) that is the primary key of another entity that serves to implement a relationship between the entities.
Fourth Normal Form (4NF): (1) A relation R is in fourth normal form (4NF) if and only if, whenever there exists an MVD in R, say A ->-> B, then all attributes of R are also functionally dependent upon A. In other words, the only dependencies (FDs or MVDs) in R are of the form K -> X (i.e., a functional dependency from a candidate key K to some other attribute X). Equivalently, R is in 4NF if it is in BCNF and all MVDs in R are in fact FDs. (2) A table is in 4NF if no row of the table contains two or more independent multivalued facts about an entity.
Frameworks of information quality: they group information criteria into meaningful categories.
Frequency distribution: The relative number of occurrences of values of an attribute, including a graphic representation of that «distribution» of values.
Functional dependence: The degree to which an attribute is an inherent characteristic of an entity type. If an attribute is an inherent characteristic of an entity type, that attribute is fully functionally dependent on any candidate key of that entity type. See Normal form.
Generalisation: The process of aggregating similar types of objects together in a less-specialised type based upon common attributes and behaviours. The identification of a common supertype of two or more specialised (sub)types. See also Specialisation.
Generic information quality criteria: Attributes of information that are relevant regardless of the specific context, such as accessibility or clarity.
Gossip: unsubstantiated, low-quality information that is passed on by word of mouth.
Groupthink: a phenomenon (the term was coined by Irving Janis) in which the members of a highly cohesive group seek consensus so strongly that they lose the willingness and ability to evaluate one another’s ideas critically. It describes the negative effects that can take place in team dynamics, such as excluding information sources.
GUI: graphical user interface; the visual component of a (typically operating system) software application.
Heuristics: A method or rule of thumb for obtaining a solution through inference or trial-and-error using approximate methods while evaluating progress toward a goal.
Hidden complaint: An unhappy customer who has a complaint about a product or service, but who does NOT tell the provider organisation.
Hidden information factory: In IQ, all of the areas of the business where information scrap and rework takes place, including redundant databases and applications that move or re-enter data, as well as private, proprietary data files and spreadsheets people maintain to keep their information current, because they cannot access information in the way they need it, they do not trust it, or their «production» reports or queries do not meet their needs. In manufacturing, the hidden factory is all of the areas of the factory in which scrap and rework goes on, including replacement products and retesting or re-inspection of rejected items.
Highlighting: stressing the most essential elements of a document by emphasising sentences or items visually through colours, larger or different fonts or through flagging icons.
Highly summarised: Data that is summarised to more than two hierarchies of summarisation from the base detail data. Highly summarised data may have lightly summarised data as its source.
Holding the gain: Putting in place controls in a process that has been improved to maintain the quality level achieved by the improvement.
Homonym: A word, phrase or data value that has the same spelling, value or sound, but has a different meaning.
Hoshin planning (Hoshin Kanri): A management technique, also known as Policy Management or Policy Deployment, developed in Japan by combining Management by Objectives and the Plan-Do-Check-Act (PDCA) improvement cycle. Hoshin planning provides a planning, implementation and review process to align business strategy and daily operations through total employee participation to achieve business objectives and breakthrough improvements.
House of quality: A mapping of customer quality expectations in product or service to the quality measures of the product or service to summarise all expectations and the work to meet them. See also Quality Function Deployment.
House of Quality: a standard quality management tool that consists of a matrix which relates customer requirements to technical specifications or functionalities.
Human error: An action performed by a person that is wholly expected to have a positive or satisfactory outcome, but that does not. (Ben Marguglio). Human error is NOT a root cause of defects; rather, human error is predictable, manageable, and preventable.
Human factors: Static constraints related to human ergonomic and cognitive limitations.
Hypermedia: The convergence of hypertext and multimedia.
Hypertext: The ability to organise text data in logical chunks or documents that can be accessed randomly via links as well as sequentially.
Hypertext: this term refers to the computer-based organisation of information by way of linking related (associated) information.
Hypothetical reasoning: A problem-solving approach that explores several different alternative solutions in parallel to determine which approach or series of steps best solves a particular problem. It is useful in business planning or optimisation problems, where solutions vary according to cost or where numerous solutions may be feasible.
Identifier: One or more attributes that uniquely locate an occurrence of an entity type. Conceptually synonymous with primary key.
In control: The state of a process characterised by the absence of special causes of variation. Processes in control produce consistent results within acceptable limits of variation. See also Out of control.
Inadvertent error: Error introduced unconsciously; for example, when a data intermediary unwittingly transposes values or skips a line in data entry. See also Intentional error.
Incremental load: The propagation of changed data to a target database or data warehouse in which only the data that has been changed since the last load is loaded or updated in the target.
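An incremental load can be sketched as a filter on a change timestamp followed by an insert-or-update into the target. The «updated_at» column, the dictionary used as the target, and the function name are assumptions for illustration; real tools often detect changes through logs or change-data-capture instead.

```python
from datetime import datetime


def incremental_load(source_rows, target, last_load):
    """Copy into target only the rows changed after last_load.

    target is a dict keyed by row id; returns the number of rows loaded.
    """
    loaded = 0
    for row in source_rows:
        if row["updated_at"] > last_load:
            target[row["id"]] = row  # insert a new row or update an existing one
            loaded += 1
    return loaded


source = [
    {"id": 1, "name": "Jones", "updated_at": datetime(2024, 1, 5)},
    {"id": 2, "name": "Smith", "updated_at": datetime(2024, 2, 9)},
]
target = {1: source[0]}  # row 1 was loaded previously
count = incremental_load(source, target, last_load=datetime(2024, 2, 1))
print(count, sorted(target))
```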
Informate: A term coined by Shoshana Zuboff in her book In the Age of the Smart Machine to describe the benefit of information technology when used to capture knowledge about business events so that the knowledge can «informate» other knowledge workers to more intelligently perform their jobs.
Information: 1) Data in context, i.e., the meaning given to data or the interpretation of data based on its context; 2) the finished product as a result of processing, presentation and interpretation of data.
Information: Information can be defined as all inputs that people process to gain understanding. It is a difference (a distinction) that makes a difference, an answer to a question. A set of related data that form a message.
Information administrator: person who is responsible for maintaining (see also maintainability) information or keeping an information system running.
Information architecture: A «blueprint» of an enterprise expressed in terms of a business process model, showing what the enterprise does; an enterprise information model, showing what information resources are required; and a business information model, showing the relationships of the processes and information.
Information architecture quality: A component of information quality measuring the degree to which data models and database design are stable, flexible, and reusable, and implement principles of data structure integrity.
Information assessment: See Information quality assessment.
Information chaos: A state of the dysfunctional learning organisation in which there are unmanaged, inconsistent, and redundant databases that contain data about a single type of thing or fact. The information chaos quotient is the number of unmanaged, inconsistent, and redundant databases containing data about a single type of thing or fact.
Information chaos quotient: The count of the number of unmanaged, inconsistent, and redundant databases containing data about a single type of thing or fact.
Information consumer: person who is accessing, interpreting and using information products. See also: knowledge worker.
Information customer-supplier relationship: The information stakeholder partnerships between the information producers who create information and the knowledge workers who depend on it.
Information directory: A repository or dictionary of the information stored in a data warehouse, including technical and business meta data, that supports all warehouse customers. The technical meta data describes the transformation rules and replication schedules for source data. The business meta data supports the definition and domain specification of the data.
Information float: The length of the delay from the time a fact becomes known in an organisation to the time at which an interested knowledge worker is able to know that fact. Information float has two components: Manual float is the length of the delay from the time a fact becomes known to the time it is first captured electronically in a potentially sharable database. Electronic float is the length of time from when a fact is captured in its electronic form in a potentially sharable database to the time it is «moved» to a database that makes it accessible to an interested knowledge worker.
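The two components of information float add up to the total delay, as the following sketch shows. The three timestamps and the function name are hypothetical.

```python
from datetime import datetime


def information_float(known_at, captured_at, accessible_at):
    """Return (manual float, electronic float, total float) as timedeltas."""
    manual = captured_at - known_at          # fact known -> first captured electronically
    electronic = accessible_at - captured_at  # captured -> accessible to the knowledge worker
    return manual, electronic, manual + electronic


manual, electronic, total = information_float(
    known_at=datetime(2024, 3, 1, 9, 0),
    captured_at=datetime(2024, 3, 1, 17, 0),
    accessible_at=datetime(2024, 3, 2, 6, 0),
)
print(manual, electronic, total)
```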
Information group: A relatively small and cohesive collection of information, consisting of 20-50 attributes and entity types, grouped around a single subject or subset of a major subject. An information group will generally have one or more subject matter experts and several business roles that use the information.
Information life cycle: See Information value/cost chain.
Information Management (IM): The function of managing information as an enterprise resource, including planning, organising and staffing, leading and directing, and controlling information. Information management includes managing data as the enterprise knowledge infrastructure and information technology as the enterprise technical infrastructure, and managing applications across business value chains.
Information model: A high-level graphical representation of the information resource requirements of an organisation showing the information classes and their relationships.
Information myopia: A disease that occurs when knowledge workers can see only part of the information they need, caused by not defining data relationships correctly or not having access to data that is logically related because it exists in multiple nonintegrated databases.
Information Overload: a state in which information can no longer be internalised productively by the individual due to time constraints or the large volume of received information.
Information policy: A statement of important principles and guidelines required to effectively manage and exploit the enterprise information resources.
Information presentation quality: The characteristic in which information is presented, whether in a report or document, on a screen, in forms, orally or visually, in a manner that communicates clearly to the recipient knowledge worker, facilitating understanding and enabling the right action or decision.
Information preventive maintenance: Establishing processes to control the creation and maintenance of volatile and critical data to keep it maintained at the highest level feasible, possibly including validating volatile data on an appropriate schedule and assessment of that data before critical processes use it.
Information process improvement: The process of improving processes to eliminate data errors and defects. This is one component of data defect prevention. Information process improvement is proactive information quality.
Information producer: an author who is creating or assembling an information product or its elements.
Information producer: The role of individuals in which they originate, capture, create, or update data or knowledge as a part of their job function or as part of the process they perform. Information producers create the actual information content and are accountable for its accuracy and completeness to meet all information stakeholders’ needs. See also Data intermediary.
Information product improvement: The process of data cleansing, reengineering, and transformation required to improve existing defective data up to an acceptable level of quality. This is one component of information scrap and rework. See also Data cleansing, Data reengineering, and Data transformation. Information product improvement is reactive information quality.
Information product specifications: The set of information resource data (meta data) characteristics that define everything required for a process and its creating/updating applications to produce quality information. Information product specification characteristics include: data name, definition, domain or data value set (code values or ranges) and the business rules that identify policies and constraints on the potential values. These specifications must be understandable to the information producers who create and maintain the data and the knowledge workers who apply the data in their work.
Information quality: (1) Consistently meeting all knowledge worker and end-customer expectations in all quality characteristics of the information products and services required to accomplish the enterprise mission (internal knowledge worker) or personal objectives (end customer). (2) The degree to which information consistently meets the requirements and expectations of all knowledge workers who require it to perform their processes.
Information Quality: the fitness for use of information; information that meets the requirements of its authors, users, and administrators.
Information quality assessment: The random sampling of a data collection and measuring it against various quality characteristics, such as accuracy, completeness, validity, nonduplication or timeliness to determine its level of quality or reliability. Also called data quality assessment or data audit.
Information quality characteristic: An aspect or property of information or information service that an information customer deems important in order to be considered «quality information.» Characteristics include completeness, accuracy, timeliness, understandability, objectivity and presentation clarity, among others. Also called information quality «dimension.» See also Quality characteristic.
Information quality contamination: The creation of inaccurate derived data by combining accurate data with inaccurate data.
Information quality decay: The characteristic of data such that formerly accurate data will become not accurate over time because the characteristic about the real world object will change without a corresponding update to the data applied. For example, John Doe’s marital status value of «single» in a database is subject to information quality decay and will become inaccurate the moment he becomes married.
Information quality decay rate: The rate, usually expressed as a percent per year, at which the accuracy of a data collection will deteriorate over time if no data updates are applied. For example: (1) person age decay rate is 100% within one year, decaying at a rate of approximately 1.9% per week; (2) if 17% of a population moves annually, the annual decay rate of address is 17%.
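The arithmetic behind these decay rates can be sketched as follows. The weekly figure simply spreads the annual rate over 52 weeks. The multi-year projection assumes the same fraction of still-accurate records decays each year, which is a modelling assumption and not part of the definition.

```python
def weekly_decay_percent(annual_decay_percent):
    """Spread an annual decay rate evenly over 52 weeks."""
    return annual_decay_percent / 52


def expected_accuracy(initial_accuracy, annual_decay_rate, years):
    """Remaining accuracy, assuming the same fraction decays every year."""
    return initial_accuracy * (1 - annual_decay_rate) ** years


# Person age: 100% per year is roughly 1.9% per week.
print(round(weekly_decay_percent(100), 1))

# Address: 17% annual decay, projected over two years without updates.
print(round(expected_accuracy(1.0, 0.17, 2), 4))
```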
Information quality management: The function that leads the organisation to improve business performance and process effectiveness by implementing processes to measure, assess the costs of, improve, and control information quality, and by defining processes, guidelines, policies, and leading culture transformation and education for information quality improvement. The IQ management function does not «do» the information quality work for the enterprise, but defines processes and facilitates the enterprise to implement the values, principles and habits of continuous process improvement so that everyone in the enterprise takes responsibility for their information quality to meet their information customers’ quality expectations.
Information quality measure(s): A specific quality measure or test (set of measures or tests) to assess information quality. For example, Product Id will be tested for uniqueness, Customer records will be tested for duplicate occurrences, Customer address will be tested to assure it is the correct address, Product Unit of Measure will be tested to be a valid Unit of Measure domain code, and Order Total Price Amount will be tested to assure it has been calculated correctly. Quality measures will be assessed using business rule tests in automated quality analysis software, coded routines in internally developed quality assessment programs, or in physical quality assessment procedures. Some call information quality measures filters or metrics.
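Three of the tests named above (uniqueness, valid domain code, correctly calculated total) might be coded along the following lines. This is a sketch only: the field names, the set of unit codes, and the rounding tolerance are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical Unit of Measure domain codes.
VALID_UNITS = {"EA", "KG", "L"}


def duplicate_ids(products):
    """Return Product Ids that occur more than once (uniqueness test)."""
    counts = Counter(p["product_id"] for p in products)
    return sorted(pid for pid, n in counts.items() if n > 1)


def invalid_units(products):
    """Return Product Ids whose unit is not a valid domain code."""
    return [p["product_id"] for p in products if p["unit"] not in VALID_UNITS]


def total_is_correct(order, tolerance=0.005):
    """Check that the stored order total matches the sum of its lines."""
    expected = sum(line["qty"] * line["unit_price"] for line in order["lines"])
    return abs(expected - order["total"]) <= tolerance


products = [
    {"product_id": "P1", "unit": "EA"},
    {"product_id": "P2", "unit": "BOX"},  # not in the domain
    {"product_id": "P1", "unit": "KG"},   # duplicate id
]
order = {
    "lines": [{"qty": 2, "unit_price": 9.99}, {"qty": 1, "unit_price": 5.00}],
    "total": 24.98,
}
```

Accuracy tests, such as confirming that a customer address is the correct address, cannot be automated this way because they require comparison with the real world object or an authoritative source.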
Information Resource Management (IRM): (1) The application of generally accepted management principles to data as a strategic business asset. (2) The function of managing data as an enterprise resource. This generally includes operational data management or data administration, strategic information management, repository management, and database administration. See also Information management. (3) The organisation unit responsible for providing principles and processes for managing the information assets of the enterprise.
Information scrap and rework: The activities and costs required to cleanse or correct nonquality information, to recover from process failure caused by nonquality information, or to rework or work around problems caused by missing or nonquality information. Analogous to manufacturing scrap and rework.
Information stakeholder: Any individual who has an interest in and dependence on a set of data or information. Stakeholders may include information producers, knowledge workers, external customers, and regulatory bodies, as well as various information systems roles such as database designers, application developers, and maintenance personnel.
Information steward: A role in which an individual has accountability for the quality of some part of the information resource. See Information stewardship.
Information stewardship: Accountability for the quality of some part of the information resource for the well-being of the larger organisation. Every individual within an organisation holds one or more information stewardship roles, based on the nature of their job and its relationship to information, such as creating information, applying it, defining it, modelling it, developing a computer screen to display it or moving it from one database or file to another. See Strategic information steward, Managerial information steward, and Operational information steward.
Information stewardship agreement: A formal agreement among business managers specifying the quality standard and target date for information produced in one business area and used in one or more other business areas.
Information value: information quality (or alternatively : benefit) in relation to the acquisition and processing costs of information; potential of information to improve decisions by reducing uncertainty.
Information value: The measure of importance of information expressed in tangible metrics. Information has potential and realised value. Potential value is the future value of information that could be realised if applied to business processes where the information is not currently used. Realised value is the actual value derived from information applied by knowledge workers in the accomplishment of the business processes.
Information value/cost chain: The end-to-end set of processes and data stores, electronic and otherwise, involved in creating, updating, interfacing, and propagating data of a specific type from its origination to its ultimate data store, including independent data entry processes, if any.
Information view: A knowledge worker’s perceived relationship of the data elements needed to perform a process, showing the structure and data elements required. A process activity has one and only one information view.
Information view model: A local data model derived from an enterprise model to reflect the specific information required for one business area or function, one organisation unit, one application or system, or one business process.
Informative: imparting knowledge, instructive.
Integration: A set of activities that makes information more comprehensive, concise, convenient, and accessible; combining information sources and aggregating content to ease the cognitive load on the information consumer.
Intentional error: Error introduced consciously. For example, an information producer required to enter an unknown fact like birth date, enters his or her own or some «coded» birth date used to mean «unknown.» See also Inadvertent error.
Interactivity: being a two-way electronic communication system that involves a user’s orders or responses
Interactivity: the capacity of an information system to react to the inputs of information consumers, to generate instant, tailored responses to a user’s actions or inquiries.
Interpretation: the process of assigning meaning to a constructed representation of an object or event.
Interface program: An application that extracts data from one database, transforms it, and loads it into a non-controlled redundant database. Interface programs represent one cost of information scrap and rework in that the information in the first database is not «able» to be used from that source and must be «reworked» for another process or knowledge worker to use.
Interfaceation: The technique of supposedly «integrating» application systems by developing «interface programs» or middleware to extract data in one format from a data source and transform it to another format for a data target rather than by standardising the data definition and format.
Internal view: The physical database design or schema in the ANSI 3-schema architecture.
Intranets: internal company networks designed to provide a secure forum for sharing information, often in a web-browser type interface.
IRM: Acronym for Information Resource Management.
Ishikawa diagram: a chart that can be used to systematically gather the problem causes of quality defects. Sometimes referred to as the 5M- or 6M-chart because most causes can be related to man (e.g., human factors), machine, method, material, milieu (i.e., the work environment), or the medium (the IT-platform).
ISO: Acronym for International Organisation for Standardisation. An international body, based in Geneva and founded in 1947, that sets international standards in all engineering disciplines, including information technology. Its members are national standards bodies; for example, BSI (British Standards Institution). ISO approves standards, including OSI communications protocols and ISO 9000 standards.
ISO 9000: International standards for quality management specifying guidelines and procedures for documenting and managing business processes and providing a system for third-party certification to verify those procedures are followed in actual practice.
Judgmental quality criteria: criteria based on personal (subjective) judgment rather than on objective measures (e.g., relevance, appeal).
Just-in-time information: current information that is delivered in a timely manner (at the time of need), for example through a (profile-based) push mechanism.
Knowledge: Information context; understanding of the significance of information.
Knowledge: justified true belief, the know-what/-how/-who/-why that individuals use to solve problems, make predictions or decisions, or take actions.
Knowledge base: (1) That part of a knowledge base system in which the rules and definitions used to build the application are stored. The knowledge base may also include a fact or object storage facility. (2) A database where the codification of knowledge is kept; usually a set of rules specified in an if . . . then format.
Knowledge base system: A software system whose application-specific information is programmed in the form of rules and stored in a specific facility, known as the knowledge base. The system uses Artificial Intelligence (AI) procedures to mimic human problem-solving techniques, applying the rules stored in the knowledge base and facts supplied to the system to solve a particular business problem.
Knowledge compression: the skilful activity of extracting the main ideas and concepts from a piece of reasoned information and summarising them in a consistent and concise manner.
Knowledge error: Information quality error introduced as a result of lack of training or expertise.
Knowledge management: the conscious and systematic facilitation of knowledge creation or development, diffusion or transfer, safeguarding, and use at the individual, team- and organisational level.
Knowledge work: Knowledge work is human mental work performed to generate useful information. It involves analysing and applying specialised expertise to solve problems, to generate ideas, or to create new products and services.
Knowledge worker: highly skilled professionals who are involved in the non-routine production, interpretation, and application of complex information.
Knowledge worker: The role of individuals in which they use information in any form as part of their job function or in the course of performing a process, whether operational or strategic. Also referred to as an information consumer or customer. Accountable for work results created as a result of the use of information and for adhering to any policies governing the security, privacy, and confidentiality of the information used.
Knowledge-intensive process: We define a knowledge-intensive process as a productive series of activities that involves information transformation and requires specialised professional knowledge. They can be characterised by their often non-routine nature (unclear problem space, many decision options), the high requirements in terms of continuous learning and innovation, and the crucial importance of interpersonal communication on the one side and the documentation of information on the other.
λ: The Greek letter «lambda» used to represent the mean of a Poisson distribution.
Labelling: adding informative and concise titles to information items so that they can be more easily scanned, browsed, or checked for relevance. Labels should indicate the type of information (e.g., definition, example, rule) and its content (e.g., safety issues).
Learnability: the quality of information to be easily transformed into knowledge.
Legacy data: Data that comes from files and/or databases developed without using an enterprise data architecture approach.
Legacy system: Systems that were developed without using an enterprise data architecture approach.
Lifetime value (LTV): See Customer lifetime value.
Lightly summarised: Data that is summarised only one or two levels of hierarchy of summary from the base detailed data.
Load: To sequentially add a set of records into a database or data warehouse. See also Incremental load.
Lock: A means of serialising events or preventing access to data while an application or information producer may be updating that data.
Log: A collection of records that describe the events that occur during DBMS execution and their sequence. The information thus recorded is used for recovery in the event of a failure during DBMS execution.
Lower control limit: The lowest acceptable value or characteristic in a set of items deemed to be of acceptable quality. Together with the upper control limit, it specifies the boundaries of acceptable variability in an item to meet quality specifications.
LTV: Acronym for Customer Lifetime Value.
μ : The Greek letter «mu» used to represent the mean of a population.
Maintainability : the characteristic of an information environment to be manageable at reasonable costs in terms of content volume, frequency, quality, and infrastructure. If a system is maintainable, information can be added, deleted, or changed efficiently.
Management Principle : a general, instructive, concise, and memorable statement that suggests a way of reasoning or acting that is effective and proven to reach a certain goal within an organisational context.
Managerial information steward : The role of accountability a business manager or process owner has for the quality of data produced by his or her processes.
Managerial information stewardship : The fact that a business manager or process owner who has accountability for one or more business processes also has accountability for the integrity of the data produced by those processes.
MDDB : Acronym for Multidimensional Database.
Mean : The average of a set of values, usually calculated to one place of decimals more than the original data.
Measurement curve bundle : The collection of measurement points of a real-world attribute that represents the variation of values of that attribute in the real world.
Measurement system : A collection of processes, procedures, software, and databases used to assess and report information quality.
Median : The middle value in an ordered set of values. If the set contains an even number of values, the median is calculated by adding the middle two values and dividing by 2.
Meta data : (or metadata). See Information Resource data. A term used to mean data that describes or specifies other data. The term has not made its way into either Webster’s Unabridged Dictionary or the Oxford Dictionary. The closest term is meta language, defined as «a language used to describe other languages.» The term Information Resource data is preferred over the term meta data as a business term.
Metadata : Data which provides context or otherwise describes information in order to make it more valuable (e.g., more easily retrievable or maintainable); data about data.
Methodology : A formalised collection of tools, procedures, and techniques to solve a specific problem or perform a given function.
Metric : (1) See Measure. (2) A fact type in data warehousing, generally numeric (such as sales, budget, and inventory) that is analysed in different ways or dimensions in decision support analysis.
Mining : a detailed method used by large firms to sort and analyse information to better understand their customers, products, markets, or any other phase of their business for which data has been captured. Mining data relies on statistical analyses, such as analysis of variance or trend analysis.
Misinformation : information that is uninformative and impedes effective and adequate action because it is incorrect, distorted, buried, confusing because it lacks context, manipulated or otherwise difficult to use.
Misinterpretation : Human error resulting from poor information presentation quality.
Missing value : A data element for which no value has been captured, but for which the real-world object represented has a value. For example, there is no date value for the data element «last date of service» for a retired Employee whose last day of official employment was June 15, 2002. Contrast with Empty value.
Modal interval : The range interval used to group continuous data values in order to determine a mode.
Mode : The most frequently occurring value in a set of values.
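The mean, median, and mode defined above can be illustrated with Python's standard `statistics` module. The sample values are invented for the example; the mean is rounded to one more decimal place than the original whole-number data, as the Mean entry suggests.

```python
from statistics import mean, median, mode

# Invented sample: six whole-number measurements.
values = [4, 7, 7, 9, 12, 16]

# Mean, reported to one more decimal place than the original data.
sample_mean = round(mean(values), 1)

# Even number of values: the two middle values (7 and 9) are averaged.
sample_median = median(values)

# The most frequently occurring value.
sample_mode = mode(values)
```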
Moment of Truth : A term coined by Jan Carlzon, former head of Scandinavian Airlines, meaning any instance in which a customer can form an opinion, whether positive or negative, about the organisation.
Monte Carlo : A problem-solving technique that uses statistical methods and random sampling to solve mathematical or physical problems.
Multidimensional Database (MDDB) : A database designed around arrays of data that support many dimensions or views of data (such as product sales by time period, geographic location, and organisation) to support decision analysis.
n : Algebraic symbol representing the number of items in a set.
Negative side effect : see Side effect
Net Present Value (NPV) : The value of a sum of future money expressed in terms of its worth in today’s currency. NPV is calculated by discounting the amount by the discount rate compounded by the number of years between the present and the future date the money is anticipated.
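The discounting described above can be sketched as follows. The function names are hypothetical, and the second function, which sums the discounted values of a series of yearly cash flows, reflects common financial practice rather than anything stated in this entry.

```python
def present_value(future_amount: float, discount_rate: float, years: int) -> float:
    """Discount a single future amount back to today's currency.

    discount_rate is a fraction per year, e.g. 0.10 for 10%.
    """
    return future_amount / (1 + discount_rate) ** years


def net_present_value(cash_flows, discount_rate: float) -> float:
    """Sum the present values of yearly cash flows.

    cash_flows[0] occurs today, cash_flows[1] in one year, and so on
    (an assumed convention).
    """
    return sum(
        present_value(amount, discount_rate, year)
        for year, amount in enumerate(cash_flows)
    )


# 1,000 received in two years, discounted at 10% per year.
pv_example = present_value(1000.0, 0.10, 2)
```

For instance, 1,000 due in two years at a 10% discount rate is worth about 826.45 today.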
Nichification : a market trend that describes the strategy of incumbents or existing market players to consciously focus on specialised business models or business areas in order to distinguish themselves from competitors.
NIST : Acronym for National Institute of Standards and Technology. The U.S. government agency that maintains Federal Information Processing Standards (FIPS). NIST is responsible for administering the Baldrige Quality Award program.
Noise : (1) An uncontrollable common cause factor that causes variability in product quality. (Q) (2) A term used in data mining to refer to data with missing values (where one does exist in the real world object or event), empty values (where no value exists for the real world object or event), inaccurate values or measurement bias or data that may be inconsequential or misleading in data analysis or data mining.
Nonduplication : A characteristic of information quality measuring the degree to which there are no redundant occurrences of data, in other words, a real world object or event is represented by only one record in a database. (Q)
Non-quality costs : the costs that arise due to insufficient levels of information quality or data quality defects. Examples are rework or re-entry costs.
Nonquality data : Data that is incorrect, incomplete, or does not conform to the data definition specification or meet knowledge workers’ expectations.
Nonrepudiation : The ability to provide proof of transmission and receipt of electronic communication.
Normal form : A level of normalisation that characterises a group of attributes or data elements.
Normalisation : The process of associating attributes with the entity types for which they are inherent characteristics. The decomposition of data structures according to a set of dependency rules, designed to give simpler, more stable structures in which certain forms of redundancy are eliminated. A step-by-step process to remove anomalies in data integrity caused by add, delete, and update actions. Also called non-loss decomposition.
NPV : Acronym for Net Present Value.
Null : The absence of a data value in a data field or data element. The value may exist for the characteristic of the real world object or event and is missing or unknown, or there may be no value (called «empty») because the characteristic does not exist in the real world object or event.
Objectivity : A characteristic of information quality that measures how well information is presented to the information consumer free from bias that can cause the information consumer to take the wrong action or make the wrong decision (Q).
Objectivity : expressing or dealing with facts or conditions as perceived without distortion by personal feelings, prejudices, or interpretations.
Occurrence : A specific instance of an entity type. For example, «customer» is an entity type. «John Doe» is an occurrence of the customer entity type.
Occurrence of record : A specific record selected from a group of duplicate records as the authoritative record, and into which data from the other records may be consolidated. Related records from the other duplicate records are re-related to this occurrence of record.
OCR : Acronym for Optical Character Recognition.
ODS : Acronym for Operational Data Store.
OLAP : Acronym for On-Line Analytical Processing. Software technology that transforms data into multidimensional views and that supports multidimensional data interaction, exploration, and analysis.
Operational data : Data at a detailed level used to support daily activities of an enterprise.
Operational Data Store (ODS) : A collection of operational or base data that is extracted from operational databases and standardised, cleansed, consolidated, transformed, and loaded into an enterprise data architecture. An ODS is used to support data mining of operational data, or as the store for base data that is summarised for a data warehouse. The ODS may also be used to audit the data warehouse to assure the summarised and derived data is calculated properly. The ODS may further become the enterprise shared operational database, allowing operational systems that are being reengineered to use the ODS as their operational databases.
Operational information steward : An information producer accountable for the data created or updated as a result of the processes he or she performs.
Optical Character Recognition (OCR) : The technique by which printed, digitised, or photographed characters can be recognised and converted into ASCII or a similar format.
Optimum : As applied to a quality goal, that which meets the needs of both customer and supplier at the same time, minimising their combined costs.
Origination : the source or author of a piece of information (may include additional origination parameters, such as date, institution, contributors etc.).
Outlier : A sampled item that has a value or a frequency of its value far separated from those of the other items in the sample, indicating a possible anomaly or error, different population or a bias or error in the sampling technique.
Overload : see information overload
Overloaded data element : A data element that contains more than one type of fact, usually the result of the need to know more types of facts growing faster than the ability to make additions to the data structures. This causes process failure when downstream processes find unexpected data values.
Paradigm : An example or pattern that represents an acquired way of thinking about something that shapes thought and action in ways that are both conscious and unconscious. Paradigms are essential because they provide a culturally shared model for how to think and act, but they can present major obstacles to adopting newer, better approaches.
Paralysis by analysis : when timely decision-making fails to occur because too much low quality information (irrelevant, detailed, obsolete, or poorly organised) is readily available.
Pareto diagram : A specialised column chart in which the bars represent defect types and are ordered by frequency, percentage, or impact with the cumulative percentage plotted. This is used to identify the areas needing improvement, from greatest to least.
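The figures behind such a chart (bars ordered by frequency plus the cumulative percentage) can be derived as in the sketch below. The function name and the defect labels are assumptions for illustration; plotting is left out.

```python
from collections import Counter


def pareto_table(defects):
    """Return (defect type, count, cumulative percent) rows, most frequent first."""
    counts = Counter(defects).most_common()
    total = sum(n for _, n in counts)
    rows, running = [], 0
    for defect_type, n in counts:
        running += n
        rows.append((defect_type, n, round(100 * running / total, 1)))
    return rows


# Invented defect log: one entry per defective record.
defect_log = ["missing"] * 5 + ["invalid"] * 3 + ["duplicate"] * 2
table = pareto_table(defect_log)
```

Here the first two defect types account for 80% of all defects, which is where improvement effort would normally start.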
Pareto principle : The phenomenon that a few factors are responsible for the majority of the result.
Parsing : The electronic analysis of data to break it into meaningful patterns or attributes for the purpose of data correction, or record matching, de-duplication and consolidation.
Partnership : The relationship of business personnel and information systems personnel in the planning, requirements analysis, design, and development of applications and databases.
PDCA : Acronym for Plan-Do-Check-Act.
Perceived needs : The requirements that motivate customer action based upon their perceptions. For example, a perceived need of a car purchaser is that a convertible will enhance his or her attractiveness. See also Real needs and Stated needs.
Personal data : Data that is of interest to only one organisation component of an enterprise, (e.g., task schedule for a department project). Contrasted with Enterprise data.
Personalisation : the act of modifying content or an information system to customise it to the needs and preferences of an information consumer.
Physical database design : Mapping of the conceptual or logical database design data groupings into the physical database areas, files, records, elements, fields, and keys while adhering to the physical constraints of the hardware, DBMS software, and communications network to provide physical data integrity while meeting the performance and security constraints of the services to be performed against the database.
Plan-Do-Check-Act (PDCA) cycle : A closed-loop process for planning to solve a problem, implementing suggested improvements, analysing the results, and standardising the improvements. Also called a Shewhart cycle after its developer, W. A. Shewhart.
Poisson distribution : A distribution of items that does not have a normal curve; rather, the tail on one side of the curve is longer and less populated than the curve on the other side, as is true for a distribution of data records by frequency of error in each record.
Poka Yoke : Japanese for «mistake proofing,» a system using control methods (to halt operations) and warning methods (to call attention to defects) that assures immediate feedback and corrective action in a way that assures no defects are allowed to get through without correction.
Policy Deployment : See Hoshin planning.
Population : An entire group of objects, items or data from which to sample for measurement or study. Also called a Universe. Contrast with Sample.
Post condition : A data integrity mechanism in object orientation that specifies an assertion, condition, business rule or guaranteed result that will be true upon completion of an operation or method; else, the operation or method fails.
Potential information value : See Information value.
Pragmatic (information quality dimension) : the characteristics that make information useful or usable.
Precision : A characteristic of information quality measuring the degree to which data is known to the right level of granularity. For example, a percentage value with two decimal points (00.00%) discriminates to the closest 1/100th of a percent.
Precondition : A data integrity mechanism in object orientation that specifies an assertion, condition or business rule that must be true before invoking an operation or method; else, the operation or method cannot be performed.
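A precondition and a post condition might be enforced around an operation as in the following sketch. The operation, its name, and its business rules are hypothetical; explicit exceptions are used so that the checks cannot be switched off the way `assert` statements can.

```python
def apply_discount(order_total: float, percent: float) -> float:
    """Return the order total after a percentage discount."""
    # Preconditions: must hold before the operation is performed.
    if order_total < 0:
        raise ValueError("order_total must not be negative")
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")

    discounted = order_total * (1 - percent / 100)

    # Post condition: guaranteed result, otherwise the operation fails.
    if not 0 <= discounted <= order_total:
        raise RuntimeError("post condition violated: discounted total out of range")
    return discounted


discounted_total = apply_discount(200.0, 25)
```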
Presentation format : The specification of how an attribute value or collection of data is to be displayed.
Primary key : The attribute(s) that are used to uniquely identify a specific occurrence of an entity, relation, or file. A primary key that consists of more than one attribute is called a composite (or concatenated) primary key.
Prime word : A component of an attribute name that identifies the entity type the attribute describes.
Principles of information quality : they describe how the quality of information can be increased by focusing on crucial criteria and crucial improvement measures.
Procedural error : Error introduced as a result of failure to follow the defined process.
Process : A defined set of activities to achieve a goal or end result. An activity that computes, manipulates, transforms, and/or presents data. A process has identifiable begin and end points. See Business process.
Process : a group of sequenced tasks, which eventually lead to a value for (internal or external) customers.
Process control : The systematic evaluation of performance of a process, taking corrective action if performance is not acceptable according to defined standards.
Process management : The process of ensuring that a process is defined, controlled to consistently produce products that meet defined quality standards, improved as required to meet or exceed all customer expectations and optimised to eliminate waste and non-value-adding activities.
Process management cycle : A set of repeatable tasks for understanding customer needs, defining a process, establishing control, and improving the process.
Process management team : A team, including a process owner and staff, to carry out process ownership obligations.
Process owner : The person responsible for the process definition and/or process execution. The process owner is the managerial information steward for the data created or updated by the process, and is accountable for process performance integrity and the quality of information produced.
Product : The output or result of a process.
Product satisfaction : The measure of customer happiness with a product.
Psychographics : Measures of a population based on social, personality and lifestyle behaviours.
QFD : Acronym for Quality Function Deployment.
QLF : See Taguchi quality loss function.
Quality : Consistently meeting or exceeding customers’ expectations.
Quality : the totality of features of a product or service that fulfil stated or implied needs (ISO 8402). The correspondence to specifications, expectations or usage requirements. The absence of errors.
Quality assessment : An independent measurement of a product’s or service’s quality.
Quality characteristic : (1) An identifiable aspect or feature of a product, process or service that a customer deems important in order to be considered a «quality» product or service. (2) A distinct attribute or property of a product, process or service that can be measured for conformance to a specific requirement. See Information quality characteristic.
Quality circle : An ad hoc group formed to correct problems in or to improve a shared process. The goal is an improved work environment and productivity and quality.
Quality Function Deployment (QFD) : The involvement of customers in the design of products and services for the purpose of better understanding customer requirements, and the subsequent design of products and services that better meet their needs on initial product delivery.
Quality goal : See Quality standard.
Quality improvement : A measurable and noticeable improvement in the level of quality of a process and its resulting product.
Quality loss function (QLF) : See Taguchi quality loss function.
Quality Management : see Total Quality Management. Generally speaking, the systematic on-going effort of a company to assure that its products and services consistently meet or exceed customer expectations.
Quality measure : A metric or characteristic of information quality, such as percent of accuracy or average information float, to be assessed.
Quality standard : A mandated or required quality goal, reliability level, or quality model to be met and maintained.
r : Algebraic symbol representing the coefficient of correlation.
RAD : Acronym for Rapid Application Development. The set of tools, techniques and methods that results in at least one-order-of-magnitude acceleration in the time to develop an application with no loss in quality when using QFD techniques compared to using conventional techniques.
RADD : Acronym for Rapid Data Development. An intensive group process to rapidly develop and define sharable subject area data models involving a facilitator, knowledge workers, and information resource management personnel, using compression planning and QFD techniques.
Random number generator : A software routine that selects a number from a range of values in such a way that any number within the range has an equal likelihood of being selected. This may be used to identify which records from a database to select for assessment.
Random sampling : The sampling of a population in which every item in the population has an equal probability of being selected. This is also called statistical sampling. See also Cluster sampling, Systematic sampling, and Stratified sampling.
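Selecting records for assessment by simple random sampling could look like the sketch below, which uses the pseudo-random generator in Python's standard `random` module. The function name, the record ids, and the use of a fixed seed (so that the same sample can be drawn again for an audit) are assumptions for illustration.

```python
import random


def draw_random_sample(record_ids, sample_size, seed=None):
    """Pick sample_size distinct record ids, each equally likely to be chosen."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(list(record_ids), sample_size)


# Invented population: record ids 1 to 1000.
population = range(1, 1001)
sample = draw_random_sample(population, 10, seed=42)
```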
Rating (of information or of a source) : the (standardised) evaluation of a piece of information or its source according to a given scale by one or several reviewers or readers.
Real needs : The fundamental requirements that motivate customer decisions. For example, a real need of a car customer is the kind of transportation it provides. See also Stated needs and Perceived needs.
Realised information value : See Information value.
Reasonability tests : Edit and validation rules applied to assure that a data value is within an expected range of values or is a realistic value.
Record : A collection of related fields representing an occurrence of an entity type.
Record linkage : The process of matching data records within a database or across multiple databases to match data that represents one real world object or event. Used to identify potential duplicates for «de-duping» (eliminating duplicate occurrences) or consolidation of attributes about a single real world object.
Record of origin : The first electronic file in which an occurrence of an entity type is created.
Record of reference : The single, authoritative database file for a collection of fields for occurrences of an entity type. This file represents the most reliable source of operational data for these attributes or fields. In a fragmented data environment, a single occurrence may have different collections of fields whose record of reference is in different files.
Recovery : Restoring a database to some previous condition or state after a system, device, or program failure. See also Commit.
Recovery log : A collection of records that describe the events that occur during DBMS execution and their sequence. The information thus recorded is used for recovery in the event of a failure during DBMS execution.
Recursive relationship : A relationship or association that exists between entity occurrences of the same type. For example, an organisation can be related to another organisation as a Department manages a Unit.
Redundancy : here, the provision of information beyond necessity.
Reengineering : A method for radical transformation of business processes to achieve breakthrough improvements in performance.
Reference data : A term used to classify data that is, or should be, standardised, common to and shared by multiple application systems, such as Customer, Supplier, Product, Country, or Postal Code. Reference data tends to be data about permanent entity types and domain value sets to be stored in tables or files, as opposed to business event entity types.
Referential integrity : Integrity constraints that govern the relationship of an occurrence of one entity type or file to one or more occurrences of another entity type or file, such as the relationship of a customer to the orders that customer may place. Referential integrity defines constraints for creating, updating, or deleting occurrences of either or both files.
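The customer and order relationship mentioned above can be sketched with SQLite through Python's built-in `sqlite3` module. The table and column names are hypothetical, and other database systems declare and enforce such constraints in their own ways.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE customer_order ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(customer_id))"
)
conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
conn.execute("INSERT INTO customer_order VALUES (100, 1)")  # valid: customer 1 exists

def violates_integrity(sql):
    """Return True if the statement is rejected by an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# Creating an order for a customer that does not exist is rejected.
orphan_rejected = violates_integrity("INSERT INTO customer_order VALUES (101, 999)")
# Deleting a customer that still has orders is rejected as well.
delete_rejected = violates_integrity("DELETE FROM customer WHERE customer_id = 1")
```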
Relationship : The manner in which two entity or object types are associated with each other. Relationships may be one to one, one to many, or many to many, as determined by the meaning of the participating entities and by business rules. Synonymous with association. Relationships can express cardinality (the number of occurrences of one entity related to an occurrence of the second entity) and/or optionality (whether an occurrence of one entity is a requirement given an occurrence of the second entity).
Reliability (of an infrastructure) : here, the characteristic of an information infrastructure to store and retrieve information in an accessible, secure, maintainable, and fast manner.
Replication : See Data replication.
Repository : A database for storing information about objects of interest to the enterprise, especially those required in all phases of database and application development. A repository can contain all objects related to the building of systems including code, objects, pictures, definitions, etc. Acts as a basis for documentation and code generation specifications that will be used further in the systems development life cycle. Also referred to as design dictionary, encyclopedia, object-oriented dictionary, and knowledge base.
Reputation : here, the characteristic of a source to be consistently associated with the provision of high quality information.
Requirements : Customer expectations of a product or service. May be formal or informal, or they may be stated, required or perceived needs.
Response time : here, the delay between an initial information request and the provision of that information by the information system.
Return on Investment (ROI) : A statement of the relative profitability generated as a result of a given investment.
Reverse engineering : The process of taking a complete system or database and decomposing it to its source definitions, for the purpose of redesign.
Reviewing : here, the systematic evaluation of information such as articles, papers, project summaries, etc. by at least one independent qualified person according to specified criteria (such as relevance to target group, methodological rigour, readability, etc.).
ROI : An acronym for Return on Investment.
Role type : A classification of the different roles occurrences of an entity type may play, such as an organisation may play the role of a customer, supplier, and/or competitor.
Rollback : The process of restoring data in a database to the state at its last commit point.
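A rollback can be illustrated with Python's built-in `sqlite3` module. This is only a sketch: the `account` table and its values are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.commit()  # commit point: balance is 100

# An uncommitted change made after the commit point
conn.execute("UPDATE account SET balance = balance - 500 WHERE id = 1")
in_flight = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]

conn.rollback()  # discard everything since the last commit point
restored = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
```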
Root cause : The underlying cause of a problem or factor resulting in a problem, as opposed to its precipitating or immediate cause.
Rule : A knowledge representation formalism containing knowledge about how to address a particular business problem. Simple rules are often stated in the form «If A then B», where A is a condition (a test or comparison) and B is an action (a conclusion or invocation of another rule). An example of a rule would be «If the temperature of any closed valve is greater than or equal to 100°F, then open the valve.»
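The valve example can be written as a condition and action pair. The sketch below is one possible representation: the `Rule` class, the dictionary keys, and the sample temperatures are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    condition: Callable[[dict], bool]  # the test or comparison
    action: Callable[[dict], None]     # the conclusion or follow-up step

def open_valve(valve):
    valve["state"] = "open"

# "If the temperature of any closed valve is >= 100°F, then open the valve."
overheat_rule = Rule(
    condition=lambda v: v["state"] == "closed" and v["temp_f"] >= 100,
    action=open_valve,
)

def apply_rule(rule, facts):
    """Run the action when the condition holds; report whether the rule fired."""
    if rule.condition(facts):
        rule.action(facts)
        return True
    return False

hot_valve = {"state": "closed", "temp_f": 104}
cool_valve = {"state": "closed", "temp_f": 80}
fired_hot = apply_rule(overheat_rule, hot_valve)
fired_cool = apply_rule(overheat_rule, cool_valve)
```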
Salience (of information) : the quality of information to be interesting or intriguing.
Sample : An item or subset of items, or data about an item or a subset of items that comes from a sampling frame or a population. A sample is used for the purpose of acquiring knowledge about the entire population.
Sampling : The technique of extracting a small number of items or data about those items from a larger population of items in order to analyse and draw conclusions about the whole population. See Random sampling, Cluster sampling, Stratified sampling, and Systematic sampling.
Sampling frame : A subset of items, or data about a subset of items of a population from which a sample is to be taken.
SC21 : Acronym for ISO/IEC JTC 1 Sub-Committee for OSI data management and distributed transaction processing.
Schema : The complete description of a physical database design in terms of its tables or files, columns or fields, primary keys, relationships or structure, and integrity constraints.
Scrap and rework : The activities and costs required to correct or dispose of defective manufactured products. See Information scrap and rework.
SDLC : Acronym for Systems Development Life Cycle.
Seamless integration : True seamless integration is integration of applications through commonly defined and shared information, with managed replication of any redundant data. False «seamless» integration is use of interface programs to transform data from one application's databases to another application's databases. See «Interfaceation.»
Second Normal Form (2NF) : (1) A relation R is in second normal form (2NF) if and only if it is in 1NF and every nonkey attribute is fully functionally dependent on the primary key. (2) A table is in 2NF if each nonidentified attribute provides a fact that describes the entity identified by the entire primary key and not part of it. See Functional dependence.
Security (of information) : measures taken to guard information against unauthorised access, espionage or sabotage, crime, attack, unauthorised modification, or deletion.
Security : The prevention of unauthorised access to a database and/or its contents for updating, retrieving, or deleting the database; or the prevention of unauthorised access to applications that have authorised access to databases.
Semantic equivalence : See Equivalence.
Sensitivity analysis : a procedure to determine the sensitivity of the outcomes of an alternative to changes in its parameters; it is used to ascertain how a given model output depends upon the input parameters.
Sensor : An instrument that can measure, capture information about or receive input directly from external objects or events.
Shewhart cycle : See Plan-Do-Check-Act cycle.
Side effect : The state that occurs when a change to a process causes unanticipated conditions or results beyond the planned result, such as when an improvement to a process creates a new problem.
sigma (σ or s) : Lowercase Greek letter that stands for standard deviation. The symbol «σ » refers to the standard deviation of an entire population of items. The symbol «s» refers to the standard deviation of a sample of items.
Sigma (Σ) : Uppercase Greek letter that stands for the summation of a group of numbers.
Six Sigma (6σ ) : (1) Six standard deviations, used to describe a level of quality in which six standard deviations of the population fall within the upper and lower control limits of quality and in which the defect rate approaches zero, allowing no more than 3.4 defects per million parts. (2) A methodology of quality management originally developed by Motorola. (Q)
SME : Acronym for Subject Matter Expert.
Source information producer : The point of origination or creation of data or knowledge within the organisation.
SPC : Acronym for Statistical Process Control.
Special cause : A source of unacceptable variation or defect that comes from outside the process or system.
Specialisation : The process of aggregating subsets of objects of a type, based upon differing attributes and behaviours. The resulting subtype inherits characteristics from the more generalised type.
Spread : Describes how much variation there is in a set of items.
SQC : Acronym for Statistical Quality Control.
Stability (of information) : the quality of information or its infrastructure to remain unaltered for an extended period of time.
Stability : A characteristic of information quality measuring the degree to which information architecture or a database is able to have new applications developed to use it with minimal modification of the existing objects and relationships, only adding new objects and relationships.
Standard deviation (σ or s) : A widely used measure of variability that expresses the measure of spread in a set of items. The standard deviation is a value such that approximately 68 percent of the items in a set fall within a range of the mean plus or minus the standard deviation. For data from a large sample of a population of items, the standard deviation σ (standard deviation of a population) or s (standard deviation of a sample) is expressed as :
$$s\ (\sigma) = \sqrt{\frac{\Sigma d^{2}}{n}}$$
s (σ) = standard deviation of a sample (population)
d = the deviation of any item from the mean or average
n = the number of items in the sample
Σ = «the sum of».
Standard deviation calculation : A measure of dispersion of a frequency distribution that is the square root of the arithmetic mean of the squares of the deviation of each of the class frequencies from the arithmetic mean of the frequency distribution. Also a similar quantity found by dividing by one less than the number of squares in the sum of squares instead of taking the arithmetic mean.
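The two calculations described above (dividing by the number of items, or by one less than that number) can be sketched as follows. The data values are illustrative only.

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data
mean = sum(values) / len(values)
squared_deviations = [(x - mean) ** 2 for x in values]

# Population form: square root of the mean of the squared deviations
sigma = math.sqrt(sum(squared_deviations) / len(values))
# Sample form: divide by one less than the number of items
s = math.sqrt(sum(squared_deviations) / (len(values) - 1))
```

The standard library functions `statistics.pstdev` and `statistics.stdev` compute the same two quantities and can serve as a cross-check.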
State : A stage in a life cycle that a real-world-object may exist in at a point in time and which is reflected in a state of existence that an entity occurrence or object may be in at a point in time. A real-world object comes into a specific state of existence through some event. The state of an object is represented in a database by the values of its attributes at a point in time.
State transition diagram : A representation of the various states of an entity or object along with the triggering events. See also Entity life cycle.
Stated needs : Requirements as seen from the customers’ viewpoint, and as stated in their language. These needs may or may not be the real requirements. See also Real needs and Perceived needs.
Statistical control chart : See Control chart.
Statistical process control (SPC) : The application of statistical methods to control processes to provide acceptable quality. One component of statistical quality control.
Statistical quality control (SQC) : The application of statistics and statistical methods to assure quality. Processes and methods for measuring process performance, identifying unacceptable variance, and applying corrective actions to maintain acceptable process control. SQC consists of statistical process control and acceptance sampling.
Stored procedure : A precompiled routine of code stored as part of a database and callable by name.
Strategic information steward : The role a senior manager holds as being accountable for a major information resource or subject, authorising business information stewards and resolving business rule issues.
Stratified sampling : Sampling a population that has two or more distinct groupings, or strata, in which random samples are taken from each stratum to assure the strata are proportionately represented in the final sample.
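One way to keep the strata proportionately represented is to draw the same fraction at random from each stratum. The sketch below assumes that approach; the function name, the region labels, and the 10 percent fraction are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(items, stratum_of, fraction, seed=None):
    """Draw the same fraction at random from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[stratum_of(item)].append(item)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # keep at least one item per stratum
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical records: 80 from region "north", 20 from region "south"
records = [("north", i) for i in range(80)] + [("south", i) for i in range(20)]
picked = stratified_sample(records, stratum_of=lambda r: r[0], fraction=0.1, seed=1)
```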
Subject area : See Business resource category.
Subject database : A physical database built around a subject area.
Subject Matter Expert (SME) : A business person who has significant experience and knowledge of a given business subject or function.
Suboptimisation : The phenomenon in which the accomplishment of departmental goals diminishes the ability to accomplish the enterprise goals.
Subtype : See Entity subtype.
Supertype : See Entity supertype.
Synchronisation : The process of making data equivalent in two or more redundant databases.
Synchronous replication : Replication in which all copies of data must be updated before the update transaction is considered complete. This requires two-phase commit.
Synonym : A word, phrase, or data value that has the same or nearly the same meaning as another word, phrase or data value.
System log : Audit trail of events occurring within a system (e.g., transactions requested, started, ended, accessed, inspected, and updated).
System of record : See Record of reference. The term system of record is meaningless when defining the authoritative record in an integrated, shared data environment where data may be updated by many different application systems within a single database.
Systematic sampling : Sampling of a population using a technique such as selecting every eleventh item, to ensure an even spread of representation in the sample.
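Selecting every eleventh item can be sketched with a slice. Starting from a random offset within the first interval is an assumption added here, as are the function name and the record ids.

```python
import random

def systematic_sample(items, interval, seed=None):
    """Take every interval-th item, starting at a random offset in the first interval."""
    start = random.Random(seed).randrange(interval)
    return items[start::interval]

records = list(range(1, 111))  # hypothetical record ids 1..110
every_eleventh = systematic_sample(records, interval=11, seed=0)
```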
Systems approach : The philosophy of developing applications as vertical functional projects independent of how they fit within the larger business value chain. This approach carves out an application development project into a standalone project and does not attempt to define data to be shared across the business value chain or to meet all information stakeholder needs.
Systems Development Life Cycle (SDLC) : The methodology of processes for developing new application systems. The phases change from methodology to methodology, but generically break down into the phases of requirements definition, analysis, design, testing, implementation, and maintenance. If data definition quality is lacking, this process requires improvement.
Systems thinking : The fifth discipline of the learning organisation, this sees problems in the context of the whole. Applications developed with systems thinking see the application scope within the context of its value chain and the enterprise as a whole, defining data as a sharable and reusable resource.
Tacit knowledge : know-how that is difficult to articulate and share; intuition or skills that cannot easily be put into words.
Taguchi Quality Loss Function (QLF) : The principle, for which Dr. Genichi Taguchi won the Japanese Deming Prize in 1960, that deviations from the ideal cause different degrees of loss in quality and economic loss. Small deviations in some critical characteristics can cause significantly more economic loss than even large deviations in other characteristics. The loss can be roughly expressed as a formula :
$$L = D^{2} \times C$$
L = overall economic «Loss» caused by deviation from the target quality
D = the «Deviation» from the target quality expressed in standard deviations
C = the «Cost» of the improvement to produce it to the target quality
Some information quality problems are likewise critical and cause significantly more economic loss than others, and become the higher priority for process improvement.
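As a rough numerical illustration, the sketch below assumes the usual quadratic form of the loss function, L = D² × C, with D measured in standard deviations. The function name and the cost figures are hypothetical.

```python
def quality_loss(deviation, cost):
    """Quadratic loss L = D^2 * C (assumed form; deviation in standard deviations)."""
    return deviation ** 2 * cost

# Same characteristic: tripling the deviation multiplies the loss by nine.
small = quality_loss(deviation=1, cost=50)
large = quality_loss(deviation=3, cost=50)

# A small deviation in a critical characteristic (high cost factor) can still
# outweigh a large deviation in a minor one.
critical = quality_loss(deviation=1, cost=1000)
minor = quality_loss(deviation=3, cost=50)
```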
Teamwork : The cooperation of many within different processes or business areas to increase the quality or output of the whole.
Technical information resource data : The set of information resource data that must be known to information systems and information resource management personnel in order to develop applications and databases.
Third Normal Form (3NF) : (1) A relation R is in third normal form (3NF) if and only if it is in 2NF and every nonkey attribute is nontransitively dependent upon the primary key. (2) A table is in 3NF if each nonkey column provides a fact that is dependent only on the entire key of the table.
Third normal form : See Normal form.
Timeliness : A characteristic of information quality measuring the degree to which data is available when knowledge workers or processes require it.
Timeliness : coming early or at the right time; appropriate or adapted to the times or the occasion.
Total Data Quality Management (TDQM) cycle : the TDQM cycle encompasses four components. The definition component of the TDQM cycle identifies IQ dimensions, the measurement component produces IQ metrics, the analysis component identifies root causes for IQ problems and calculates the impacts of poor-quality information, and finally, the improvement component provides techniques for improving IQ. See Huang et al. (1999).
Total Quality Management (TQM) : Techniques, methods, and management principles that provide for continuous improvement to the processes of an enterprise.
Total Quality Management : a management concept (and associated tools) that involves the entire workforce in focusing on customer satisfaction and continuous improvement.
TPCP : Acronym for Two Phase Commit Protocol.
TQM : Acronym for Total Quality Management.
Traceability : the quality of information to be linked to its background or sources.
Tradeoff : here, a conflict between two qualities of information that tend to be mutually exclusive.
Transaction consistency : The highest isolation level that allows an application to read only committed data and guarantees that the transaction has a consistent view of the database, as though no other transactions were active. All read locks are kept until the transaction ends. Also known as serialisable.
Transformation : See Data transformation.
Trigger : A software device that monitors the values of one or more data elements to detect critical events. A trigger consists of three components : a procedure to check data whenever it changes, a set or range of criterion values or code to determine data integrity or whether a response is called for, and one or more procedures that produce the appropriate response.
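The three components can be seen in a small SQLite trigger, run here through Python's built-in `sqlite3` module. The tables, the 100-degree criterion, and the alert response are hypothetical, and trigger syntax varies between database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE valve (id INTEGER PRIMARY KEY, temp_f REAL);
CREATE TABLE alert (valve_id INTEGER, temp_f REAL);

-- Check: runs whenever temp_f changes.
-- Criterion: the new value is at or above 100.
-- Response: record an alert row.
CREATE TRIGGER valve_overheat
AFTER UPDATE OF temp_f ON valve
WHEN NEW.temp_f >= 100
BEGIN
    INSERT INTO alert VALUES (NEW.id, NEW.temp_f);
END;

INSERT INTO valve VALUES (1, 70);
UPDATE valve SET temp_f = 90 WHERE id = 1;   -- below the criterion: no alert
UPDATE valve SET temp_f = 105 WHERE id = 1;  -- meets the criterion: alert recorded
""")
alerts = conn.execute("SELECT valve_id, temp_f FROM alert").fetchall()
```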
Trusted database : Data that has been secured and protected from unauthorised access.
Two-phase commit : In multithreaded processing systems it is necessary to prevent more than one transaction from updating the same record at the same time. Where each transaction may need to update more than one record or file, the two-phase commit protocol is often used. Each transaction first checks that all the necessary records are available and contain the required data, simultaneously locking each one. Once it is confirmed that all records are ready and locked, the updates are applied and the locks freed. If any record is not available, the whole transaction is aborted and all other records are unlocked and left in their original state.
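The lock-then-apply sequence described above can be sketched in a single process with Python threads. This is a simplified illustration under that assumption, not a distributed protocol with a coordinator; the `Record` class and `update_all` function are hypothetical names.

```python
import threading

class Record:
    def __init__(self, value):
        self.value = value
        self.lock = threading.Lock()

def update_all(updates):
    """updates: list of (record, new_value). Apply all of them or none."""
    locked = []
    try:
        # Phase 1: try to lock every record without waiting.
        for record, _ in updates:
            if not record.lock.acquire(blocking=False):
                return False  # a record is unavailable: abort, nothing was changed
            locked.append(record)
        # Phase 2: every record is locked, so apply the updates.
        for record, new_value in updates:
            record.value = new_value
        return True
    finally:
        for record in locked:  # free the locks on both the success and abort paths
            record.lock.release()

a, b = Record(1), Record(2)
ok = update_all([(a, 10), (b, 20)])

b.lock.acquire()  # simulate another transaction holding record b
blocked = update_all([(a, 99), (b, 99)])
b.lock.release()
```

Because no value is written until every lock is held, an aborted transaction leaves all records in their original state.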
Two-stage sampling : Sampling a population in two steps. The first step extracts sample items from a lot of common groupings of items such as sales orders by order taker. The second stage takes a second sample from the items in the primary or first stage samples.
Uncommitted read : The lowest isolation level that allows an application to read both committed and uncommitted data. Should be used only when one does not need an exact answer, or if one is highly assured the data is not being updated by someone else. (Also known as read uncommitted, read through, or dirty read).
Undo : A state of a unit of recovery that indicates that the unit of recovery’s changes to recoverable database resources must be backed out.
Unit of recovery : A sequence of operations within a unit of work between points of consistency.
Unit of work : A self-contained set of instructions performing a logical outcome in which all changes are performed successfully or none of them is performed.
Universe : See Population.
Update : Changing the values in one or more selected occurrences, groups, or data elements stored in a database. May include the notion of adding or deleting data occurrences.
Upper control limit : The highest acceptable value or characteristic in a set of items deemed to be of acceptable quality. Together with the lower control limit, it specifies the boundaries of acceptable variability in an item to meet quality specifications.
Useability : the characteristic of an information environment to be user-friendly in all its aspects (easy to learn, use, and remember).
Usefulness : the quality of having utility and especially practical worth or applicability.
User : An unfortunate term used to refer to the role of people to information technology, computer systems, or data. The term implies dependence on something, or one who has no choice, or one who is not actively involved in the use of something. The term is inappropriate to describe the role of information producers and knowledge workers who perform the work of the enterprise, employing information technology, applications and data in the process. The role of business personnel to information technology, applications, and data, is one of information producer and knowledge worker. The relationship of business personnel to information systems personnel is not as users, but as partners. If Industrial-Age personnel were [machine] «operators» or «workers,» then Information-Age personnel are «knowledge workers.»
Utility : The usefulness of information to its intended consumers, including the public.
Validation: Evaluating and checking the accuracy, consistency, timeliness and security of information, for example by evaluating the believability or reputation of its source.
Validity: A characteristic of information quality measuring the degree to which the data conforms to defined business rules. Validity is not synonymous with accuracy, which means the values are the correct values. A value may be a valid value, but still be incorrect. For example, a customer date of first service can be a valid date (within the correct range) and yet not be an accurate date.
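The date-of-first-service example can be made concrete. In this sketch the allowed range and both dates are made-up values; the point is only that a range check can pass while the value is still wrong.

```python
from datetime import date

def is_valid_first_service(d, earliest=date(1990, 1, 1), latest=date(2030, 12, 31)):
    """Validity: the value conforms to the business rule (falls inside the allowed range)."""
    return earliest <= d <= latest

recorded = date(2015, 3, 1)  # value held in the database (hypothetical)
actual = date(2014, 3, 1)    # what really happened (hypothetical)

valid = is_valid_first_service(recorded)  # conforms to the rule
accurate = recorded == actual             # matches the real-world event
```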
Value: (1) Relative worth, utility, or importance. (2) An abstraction with a single attribute or characteristic that can be compared with other values, and may be represented by an encoding of the value.
Value chain: An end-to-end set of activities that begins with a request from a customer and ends with specific benefits for a customer, either internal or external. Also called a business process or value stream. See Information value chain and Business value chain.
Value completeness: See Completeness.
Value stream: See Value chain.
Value-centric development: A method of application development that focuses on data as an enterprise resource and automates activities as a part of an integrated business value chain. Value-centric development incorporates «systems thinking,» which sees an application as a component within its larger value chain, as opposed to a «systems approach,» which isolates the application as a part of a functional or departmental view of activity and data.
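Variance (v or σ²): The mean of the squared deviations of a set of values, expressed as :
$$v = \sigma^{2} = \frac{\Sigma d^{2}}{n}$$
where d is the deviation of each value from the mean and n is the number of values.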
View: A presentation of data from one or more tables. A view can include all or some of the columns contained in the table or tables on which it is defined. See also Information view.
Visual management: The quality management technique of providing instruction and information about a task in a clear and visible way so that personnel can maximise their productivity.
Visualisation: The use of graphic means to represent information.
Voice of the customer: Documentation of the wants and needs of a product or service, including customer verbatims (actual words used) and reworded data into specific implications for the product or service.
Voice of the engineer: Documentation of the specification required to meet a quality requirement for a product or service as made by the engineer of a product or designer of a service.
Voice of the process: Statistical data from or out of a process that indicates the process stability or capability that provides feedback to process performers as a tool for continual improvement.
Wisdom: Knowledge in context. Knowledge applied in the course of actions.
World Class: The level of process performance that is as good as, or better than, the best competitors in the performance of a process type or in the quality of a product type.
World Wide Web: the graphical user-interface of the Internet based on the HTTP and TCP/IP protocols and the HTML mark-up language.
X̄ (x bar): The algebraic symbol representing the mean, or average, of a set of values.
x : The algebraic symbol representing a set of values.
Xσn (X Sigma n): Formula to find the standard deviation(s) of the X values. Sometimes written as σn.
XML: a generic mark-up language that can be used to structure on-line documents meaningfully.
Zero defects: A state of quality characterised by defect-free products or Six-Sigma level quality. See Six Sigma.
Zero-faults or zero errors: here, the quality of information to be one hundred percent correct.