THE ELEMENTS OF A DATA DEFINITION

Guidelines are essential to ensure that each data object – entity, attribute, relationship, table, column, etc. – is documented to a particular standard. This standard must provide the minimum acceptable level of descriptive information about data that is to be made available to all the potential users of that data. These potential users are probably first and foremost the developers of future systems, but they may also be people from the business who are interested in knowing what data is available and where.

Although it is possible to suggest what descriptive information about data should be available, these guidelines probably need to be specific to the organisation. This enables the guidelines to cover those aspects of data definition that are important to the management of data within that particular organisation. It is difficult to be prescriptive as to what should be included in a data definition.

Most data definition frameworks or guidelines are based on what is loosely called a ‘data item’ – roughly equating to a field in a file record, to an attribute in a conceptual data model and to a column in an SQL schema. The elements of such a data definition framework should include as a minimum:

  • a name or label;

  • a significance statement;

  • formats;

  • valid value lists or validation criteria;

  • valid operations;

  • ownership details;

  • usage details;

  • source;

  • comments;

  • configuration information.
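As a rough illustration only, the elements listed above could be held in a single record structure. The sketch below is one possible arrangement and not a prescribed layout: the class name `DataItemDefinition`, its field names and the `missing_elements` check are all assumptions made for this example, and the choice of which elements the check treats as mandatory is equally arbitrary.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataItemDefinition:
    """One 'data item' definition. All field names are illustrative assumptions."""
    name: str                                    # primary name or label
    significance: str                            # significance statement
    data_format: str                             # e.g. "currency", "date", "string"
    valid_values: Optional[List[str]] = None     # predetermined list, if there is one
    validation_criteria: Optional[str] = None    # rule, where a fixed list does not apply
    valid_operations: List[str] = field(default_factory=list)
    definition_owner: str = ""                   # ownership of the definition
    values_owner: str = ""                       # ownership of the data values
    used_by: List[str] = field(default_factory=list)  # information systems using the data
    source: str = ""
    comments: str = ""
    aliases: List[str] = field(default_factory=list)
    version: int = 1                             # configuration information
    author: str = ""

    def missing_elements(self) -> List[str]:
        """Return the checked elements that are still empty (an assumed subset)."""
        missing = [
            label
            for label, value in (
                ("name", self.name),
                ("significance", self.significance),
                ("data_format", self.data_format),
                ("definition_owner", self.definition_owner),
                ("source", self.source),
            )
            if not value
        ]
        if not self.valid_values and not self.validation_criteria:
            missing.append("valid_values or validation_criteria")
        return missing


# A deliberately incomplete draft definition, to show the check at work.
draft = DataItemDefinition(name="Salary", significance="", data_format="currency")
print(draft.missing_elements())
```

A check of this kind is one way of enforcing a minimum acceptable level of documentation before a definition is published to potential users.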

Fundamental to this standard is that each data item should have a name that uniquely identifies the data item. Naming conventions are discussed later in this chapter. Where there is the possibility of more than one name being applicable, one name must be considered as the primary name and all the other names should be recorded as aliases or synonyms. Each data item should also have a comprehensive and accurate description of the item. It is preferable if the description is held in the form of a significance statement – a statement of why the data item is deemed to be significant to the business. If the data item has no significance to the business there is no reason for it to be defined; couching descriptions in terms of the significance of the data to the business helps develop a description that is meaningful to the business, which in turn helps system developers ensure that the data is unambiguously understood when used by application programs.

Details of the format of the data should be included in the definition. This may simply be a statement that the data is currency, a number, a date or a string of characters; but it may also include details of any restrictions that need to be applied, for example, the maximum length of a string of characters.

It is also essential to include lists of valid values or a statement of any validation criteria that may be applied to the data. For some data items, there may be a predetermined list of valid values. For ‘Gender’, the valid values may be restricted to ‘Male’ and ‘Female’, or may be ‘Male’, ‘Female’ and ‘Unknown’; for ‘Staff Category’, the values might be ‘Full-time Employee’, ‘Part-time Employee’ and ‘Contractor’. For other data items, validation criteria may be needed. For ‘Salary’, the validation criterion might be ‘greater than £0 and less than or equal to £100,000’; for ‘Surname’, the validation criterion might be ‘only include the characters A–Z, a–z, space, hyphen and apostrophe’.
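The two kinds of rule described above, a predetermined list of valid values and a validation criterion, might be expressed as follows. This is a minimal sketch: the function and constant names are assumptions, as is the decision to reject an empty surname, which the criterion quoted above does not address.

```python
import re
from decimal import Decimal

# Predetermined list of valid values, taken from the 'Staff Category' example.
STAFF_CATEGORY_VALUES = {"Full-time Employee", "Part-time Employee", "Contractor"}

# 'Surname' criterion: only A-Z, a-z, space, hyphen and apostrophe.
# Requiring at least one character is an assumption of this sketch.
SURNAME_PATTERN = re.compile(r"[A-Za-z '-]+")


def is_valid_staff_category(value: str) -> bool:
    """Valid only if the value appears in the predetermined list."""
    return value in STAFF_CATEGORY_VALUES


def is_valid_salary(amount: Decimal) -> bool:
    """'Salary' criterion: greater than 0 and less than or equal to 100,000."""
    return Decimal("0") < amount <= Decimal("100000")


def is_valid_surname(value: str) -> bool:
    """Valid only if every character is in the permitted set."""
    return SURNAME_PATTERN.fullmatch(value) is not None


print(is_valid_staff_category("Contractor"))   # a listed value
print(is_valid_salary(Decimal("100000")))      # the upper bound is inclusive
print(is_valid_surname("O'Brien"))             # apostrophe is permitted
```

Note that a criterion as strict as the surname example would reject names containing accented characters, which is the sort of consequence that recording the criterion explicitly brings to light.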

The operations that it is valid to carry out on the data should also be recorded. It is, for example, valid to multiply a length by a number to give a greater (or smaller) length. It is also valid to multiply a length by another length, to give an area. It is valid to multiply a currency amount by a number to give another currency amount, maybe as part of a currency-exchange transaction. It makes no sense, however, to multiply a currency amount by another currency amount, and so such an operation would not be valid. For data that comprises strings of characters it is important to record where it is valid to concatenate that data with other string data.
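The multiplication rules described above can be sketched with small value types that permit only the valid operations. The class names `Length`, `Area` and `Money`, and the choice of metres and `Decimal` amounts, are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal
from numbers import Real


@dataclass(frozen=True)
class Area:
    square_metres: float


@dataclass(frozen=True)
class Length:
    metres: float

    def __mul__(self, other):
        if isinstance(other, Length):
            # length x length gives an area
            return Area(self.metres * other.metres)
        if isinstance(other, Real) and not isinstance(other, bool):
            # length x number gives a greater (or smaller) length
            return Length(self.metres * other)
        return NotImplemented


@dataclass(frozen=True)
class Money:
    amount: Decimal

    def __mul__(self, other):
        if isinstance(other, Money):
            # currency amount x currency amount makes no sense
            raise TypeError("multiplying two currency amounts is not a valid operation")
        if isinstance(other, (int, Decimal)) and not isinstance(other, bool):
            # currency amount x number gives another currency amount
            return Money(self.amount * other)
        return NotImplemented


print(Length(2.0) * 3)              # a Length
print(Length(2.0) * Length(4.0))    # an Area
print(Money(Decimal("10.00")) * 2)  # a Money
```

Attempting `Money(...) * Money(...)` raises a `TypeError`, so the invalid operation is rejected at the point of use rather than producing a meaningless value.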

Ownership details should be recorded. Ownership exists at two levels – ownership of the data definition and ownership of the data values. The owner of the data definition is the person in the organisation who has the authority to say that this data should be held and that this definition is the appropriate definition for the data. It should be the responsibility of the owner of the data definition, not the data administrator, to obtain agreement for the data definition across the organisation. The owner of the data definition may also be the owner of the data values once they are recorded, but it may be more appropriate to assign this ownership role to another person (or group of people). Data ownership should not be confused with data stewardship, which is a task normally carried out by the data administrators. Data stewardship involves the maintenance of the data definition on behalf of the owner of the data definition.

The places where the data covered by this definition is used should also be recorded. This will generally take the form of a list of the information systems that use that data.

The source of the data definition should be included in the definition. The source may be the owner of the data definition, or it may be that the data administrator has developed the data definition using knowledge gained during the analysis of the data requirements. Alternatively, it could be that the definition originated in a procedural manual or some other documentation.

It may be necessary to record some extra comments to expand or clarify some other element of the definition. Only in exceptional cases should it be necessary to add extra comments; it should really be possible to include all the detail necessary under the other headings.

Finally, the data definition should include information to enable versions of the definition to be controlled. This may include details of the original author and the date of the authorship plus details of any subsequent modifications.

It is important to understand the form of the ‘data item’ to which a definition built to these standards applies. Is it the definition of a field, an attribute or a column? Is it a definition of the format, structure and values that may be applied to more than one attribute? If it is the latter, then it is not a field, attribute or column definition, but is a definition of the equivalent of a domain in the relational model of data – a definition of ‘a pool of values that an attribute may take’.
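The distinction can be made concrete with a short sketch in which a domain is defined once and then referenced by more than one attribute. The class names, the `VACANCY` entity and the attribute names are hypothetical; only the 'Staff Category' values come from the earlier example.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Domain:
    """A pool of values that an attribute may take."""
    name: str
    data_format: str
    valid_values: FrozenSet[str]

    def contains(self, value: str) -> bool:
        return value in self.valid_values


@dataclass(frozen=True)
class Attribute:
    """An attribute of a named entity, drawing its values from a domain."""
    entity: str
    name: str
    domain: Domain

    def accepts(self, value: str) -> bool:
        return self.domain.contains(value)


# One domain definition ...
staff_category = Domain(
    name="Staff Category",
    data_format="string",
    valid_values=frozenset({"Full-time Employee", "Part-time Employee", "Contractor"}),
)

# ... applied to two different (hypothetical) attributes.
employee_category = Attribute("EMPLOYEE", "staff category", staff_category)
vacancy_category = Attribute("VACANCY", "required staff category", staff_category)

print(employee_category.accepts("Contractor"))
print(vacancy_category.domain is employee_category.domain)
```

Because both attributes refer to the same domain object, a change to the format or the valid values is made in one place and applies to every attribute that uses it.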

Examples of two ‘data item’ definitions are shown in Figures 5.1 and 5.2. The definition in Figure 5.1 has validation criteria and a set of valid operations, whilst that in Figure 5.2 has a set of valid values.

The data definition elements listed above are those suggested for the definition of a ‘data item’ or domain. The concepts behind these data definition elements can be adopted to provide the conventions for the definition of other data objects, such as entities, relationships and tables. For example, the definition of an entity could include its primary name, any aliases for the primary name, the relationships the entity has with other entities (preferably expressed as sentences reading from the entity being defined), the attributes of the entity and the descriptions of any unique identifiers. An example of an entity definition constructed using these elements is in Figure 5.3.
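By way of illustration, the entity definition elements named above might be recorded as shown below. This is a sketch under stated assumptions: the `EntityDefinition` class, its field names and the `EMPLOYEE` example content are hypothetical and do not reproduce any figure in this chapter.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EntityDefinition:
    """An entity definition. All field names are illustrative assumptions."""
    primary_name: str
    aliases: List[str] = field(default_factory=list)
    # Relationships as sentences reading from the entity being defined.
    relationships: List[str] = field(default_factory=list)
    attributes: List[str] = field(default_factory=list)
    # Each unique identifier is a list of one or more attribute names.
    unique_identifiers: List[List[str]] = field(default_factory=list)

    def undeclared_identifier_attributes(self) -> List[str]:
        """Attributes named in a unique identifier but absent from the attribute list."""
        declared = set(self.attributes)
        return [
            attr
            for identifier in self.unique_identifiers
            for attr in identifier
            if attr not in declared
        ]


# Hypothetical example content.
employee = EntityDefinition(
    primary_name="EMPLOYEE",
    aliases=["STAFF MEMBER"],
    relationships=["Each EMPLOYEE may be assigned to one or more DEPARTMENTs"],
    attributes=["surname", "staff category", "salary"],
    unique_identifiers=[["employee number"]],
)

print(employee.undeclared_identifier_attributes())
```

The consistency check is optional, but it shows how a structured definition allows simple errors, such as an identifier built on an attribute that has not been listed, to be detected mechanically.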

FIGURE 5.1 A data definition with validation criteria and valid operations

FIGURE 5.2 A data definition with valid values

FIGURE 5.3 An entity definition
