Data Validation

There is another control system that Rory needs to consider: data validation. Data validation provides a system whereby a user's data entries are tested against a series of rules. Only if the rules are passed is the data entered into the database. Rails has a series of methods that make validation straightforward to set up and maintain.

As with so much of Ruby on Rails, validation methods do much of the nuts and bolts work for us. Two important processes are mostly taken care of: intercepting the user's input before it gets to the database and, if the validation tests fail, returning the data input form to the user (with a relevant warning message) so that the user can correct their input and resubmit the form. Very few additional lines of code are needed to set this up. We can then concentrate on deciding what needs validating and the logic needed for those validations.

The next step is to ask what we need to validate or consider when setting up the validation tasks.

The Minimum Required Data is Entered

The most basic level of data validation tests whether the user has actually entered some data. At its simplest, this means ensuring that at least one field has data in it. However, we also need to consider whether there is a minimal amount of data that must be entered with each input. For example, if an application is a price list of products, each product probably needs to have a price entered.
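As a rough illustration of the price list example, the following plain-Ruby sketch shows the logic of a minimum-data check. The Product structure, its fields, and the error messages are hypothetical names chosen for this example. In a Rails model, a validation helper such as validates_presence_of would normally do this work for us.

```ruby
# Plain-Ruby sketch of a "minimum required data" check.
# Product, its fields, and the messages are hypothetical.
Product = Struct.new(:name, :price) do
  # Returns a list of error messages; an empty list means the record passes.
  def validation_errors
    errors = []
    errors << "Name can't be blank" if name.to_s.strip.empty?
    errors << "Price can't be blank" if price.nil?
    errors
  end

  # A record is valid only when no rule has failed.
  def valid?
    validation_errors.empty?
  end
end

Product.new("Widget", 9.99).valid?  # passes: both fields are present
Product.new("Widget", nil).valid?   # fails: the price is missing
```

The list of messages is what would be shown to the user alongside the returned form, so that they can see which fields still need data.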

Each Record can be Uniquely Identified

Often this means having enough data to allow for the unique identification of the new data. For example, when considering a list of people, there may be a number of people with the surname Smith, and therefore it is likely that we would want to validate that both a surname and first name are entered. We may also need to ensure that a particular field entry is unique within the table or that a combination of fields is unique. For example, we may want to ensure that addresses have a combination of house number and zip code (or postal code) that is unique within the table.
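The combined-field check described above can be sketched in plain Ruby as follows. The Address structure and the unique_address? method are hypothetical, and an array stands in for the addresses table. In Rails, a helper such as validates_uniqueness_of with its :scope option is the usual way to express this kind of rule.

```ruby
# Hypothetical stand-in for a row in an addresses table.
Address = Struct.new(:house_number, :zip_code)

# Returns true when no existing address shares both the house number
# and the zip code of the candidate.
def unique_address?(existing, candidate)
  existing.none? do |address|
    address.house_number == candidate.house_number &&
      address.zip_code == candidate.zip_code
  end
end

existing = [Address.new("10", "AB1 2CD"), Address.new("12", "AB1 2CD")]

unique_address?(existing, Address.new("14", "AB1 2CD"))  # same zip, new house number
unique_address?(existing, Address.new("10", "AB1 2CD"))  # duplicate combination
```

Neither field has to be unique on its own here; only the pair is tested.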

However, it is also worth pointing out that the default behavior of Rails is to add an ID field as the primary key. By definition, each ID must be unique within each table. Therefore, all data records will be individually identifiable within the database. If validation was turned off and a user entered ten records with no data, each would have a different ID and therefore each of the ten empty entries could be identified and manipulated individually.

It is considered bad practice to have identical records within a relational database. An independent ID field makes it possible for a developer to do just that (see the empty record example in the previous paragraph). Occasionally, we may find ourselves creating tables of data with a series of entries that are identical to one another. Pragmatically, this may be the simplest way to get an application up and running. However, it is always indicative of a system that could be better designed. Instead, we should consider doing the following:

  • Test to see if the data already exists and reference the existing data rather than adding a duplicate entry. For example, if we were importing a set of orders and each order had a status, we may want to store the statuses in a separate table. If we simply store each status in the statuses table, we will have as many entries in the statuses table as there are in the orders table. Instead, we should test to see if the current status already exists and then set the order status_id to the id of the existing status; only adding a new status where the current one does not already exist.
  • Add a field that uniquely identifies the entry. For example, we may want to allow partial data to be entered while more data is gathered. A list of leads in a Customer Relationship Management system may consist of very limited information for new contacts that need to be validated manually at a later date. In this case, adding fields that identify when, how, or why the data was added (for example, a data import reference or "entered by George on 12-Jun-07") or what the next action will be (for example, "assigned to Henry to validate on 12-Jun-07") will make it easier to identify and manage each data set at a later date.
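The first suggestion in the list above, reusing an existing status rather than adding a duplicate, can be sketched in plain Ruby. The StatusTable class is a hypothetical in-memory stand-in for a statuses table, not part of Rails, and the imported status names are invented for the example.

```ruby
# Hypothetical in-memory stand-in for a statuses table (name => id).
class StatusTable
  def initialize
    @ids_by_name = {}
    @next_id = 1
  end

  # Returns the id of an existing status, adding a new row only when
  # the status is not already present.
  def find_or_create_id(name)
    @ids_by_name[name] ||= begin
      id = @next_id
      @next_id += 1
      id
    end
  end

  # Number of distinct statuses stored so far.
  def size
    @ids_by_name.size
  end
end

statuses = StatusTable.new
imported = ["Requested", "Shipped", "Requested", "Completed", "Shipped"]

# Each order stores a reference to a status instead of its own copy.
orders = imported.map { |name| { status_id: statuses.find_or_create_id(name) } }
```

With this approach, five imported orders produce three status rows rather than five.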

Note

Log information

There are two ways of entering information like "entered by George on 12-Jun-07". You can either generate the text string and then enter it into a text field, or create new fields that store the information programmatically. For example, by adding an updated_by field to store the user id for George and an updated_at field to store the date/time when the update was made.

In most circumstances you will want to use the second method. However, there are circumstances where a text entry is better. The advantage of a text field entry is that it can be read independently of any other data. Therefore, if you export the data away from the user table, the information will still be easy to read. It also stores the data as it was at the time of entry. For example, you may find that George is deleted as a user from the system at a later date. It can be easier to store the name as a string than to ensure that a system is in place to handle this situation. Therefore, text field entries can be useful for activity logs where the log entry is not to be altered once it has been created.

Identify Fields that Need to Have a Particular Format

Some data must have a particular format for it to be correct. Email addresses are a good example. Therefore, it makes sense to test the format of such data at the point of input, and validation is well suited to this task.
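A format test of this kind is usually written as a regular expression. The sketch below is a deliberately loose check and not a full email address parser. The pattern and the plausible_email? method are assumptions made for illustration. In Rails, a pattern like this would typically be passed to a helper such as validates_format_of.

```ruby
# A loose pattern: something before the @, something after it, and at
# least one dot in the domain part. It will not catch every invalid address.
EMAIL_PATTERN = /\A[^@\s]+@[^@\s]+\.[^@\s]+\z/

# Returns true when the value looks like an email address.
def plausible_email?(value)
  EMAIL_PATTERN.match?(value.to_s)
end

plausible_email?("rory@example.com")  # looks like an address
plausible_email?("rory.example.com")  # no @ sign
plausible_email?("rory@example")      # no dot in the domain part
```

A check like this catches common slips, such as a missing @ sign, without rejecting unusual but legitimate addresses.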

Note

There is a trap here: it is a mistake to think of validating data input as a way of controlling the user input. Rather, I strongly believe that we should think of it as a way of helping the user detect and correct mistakes that they have made. If we need to control input, we should use the design of the input form to provide control and not the validation task.

Consider telephone numbers; a developer may have a preferred format for telephone numbers. However, each of these three UK-style telephone numbers is a valid representation of the same number: 0111 111111, (0111) 111111, and 0111111111. So how do we handle the fact that a user may use any of the three given formats? There are four options:

  1. Provide a free text field and allow any of the formats to be used. This will overcome the problem of different formats being entered, but add complication later when we try to compare telephone numbers. For example, 0111 111111 and (0111) 111111 will appear to be different even though they represent the same telephone number.
  2. Configure validation so that it only allows through data formatted as we prefer. When users enter an incorrect format, the entry is rejected and they are prompted to re-enter the data. The result is likely to be annoyed users. Users do not like having their entries rejected, especially when their mistake was simply not to choose the format we prefer. At the very least, we should provide, on the input form, a guide to correct usage.
  3. Design the input form to closely match our desired format and provide visual clues as to the correct format to use. For example, if we want credit card entries to be entered as four sets of four characters, provide four text boxes, each four characters long, rather than a single text box in which we expect users to insert spaces in the appropriate places. Validation is then used to check that the input data is likely to be correct.
  4. Anticipate the formats that may be entered and process the data so that the format used is detected and the entry reformatted to match our desired format. Regular expressions are ideal for detecting and modifying data entries into a preferred format. Validation is then used to check that the input data is likely to be correct.

Options 1 and 2 are both unsatisfactory and likely to result in poor data in the database and a bad user experience, respectively.

Options 3 and 4 will help the user enter valid data. Option 3 makes it easy for the user to enter data in the right format, and option 4 allows users to use their preferred format without upsetting the integrity of the data. Which one we use depends on how easily we can distinguish the different formats without compromising the validation. If it is easy to distinguish between the alternative input formats, use option 4. If it is difficult to distinguish between the formats, use option 3.
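Option 4 can be sketched for the telephone example as follows. The normalize_phone and plausible_phone? methods are hypothetical. The rule that a number is a leading zero followed by nine or ten digits is an assumption made for this example and not a complete description of UK numbering.

```ruby
# Strips spaces, parentheses, and hyphens so that the different ways of
# writing a number collapse into a single digits-only form.
def normalize_phone(input)
  input.to_s.gsub(/[\s()\-]/, "")
end

# Assumed rule for this sketch: a leading zero followed by nine or ten digits.
def plausible_phone?(normalized)
  normalized.match?(/\A0\d{9,10}\z/)
end

["0111 111111", "(0111) 111111", "0111111111"].map { |entry| normalize_phone(entry) }
# => ["0111111111", "0111111111", "0111111111"]
```

The entry is reformatted first and validated second, so the user is only asked to correct input that still looks wrong after normalization.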

Dates are a good example of where we should always use option 3. Living outside of the U.S.A., I am very aware of the confusion between the dd/mm/yyyy and mm/dd/yyyy formats. Programmatically, we cannot detect the difference between the two, unless the day is greater than twelve. That is, it is impossible to tell whether a single entry of 01/07/1916 refers to the day the Battle of the Somme started (1st July 1916), or 7th January of the same year. Therefore, always avoid allowing users to enter dates in these two formats. To do this, use a named month drop-down within a date selection. In that way, the confusion is avoided.
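The ambiguity is easy to demonstrate with Ruby's standard Date class. In the sketch below, the three method names are hypothetical. date_from_parts assumes that the month name comes from a drop-down and is therefore always a valid English month name.

```ruby
require "date"

# Reads the entry as day/month/year.
def parse_day_first(entry)
  Date.strptime(entry, "%d/%m/%Y")
end

# Reads the same entry as month/day/year.
def parse_month_first(entry)
  Date.strptime(entry, "%m/%d/%Y")
end

# Builds a date from separate form fields, with the month chosen by name.
# Assumes month_name is a valid entry from the drop-down.
def date_from_parts(day, month_name, year)
  Date.new(year, Date::MONTHNAMES.index(month_name), day)
end

parse_day_first("01/07/1916").to_s    # => "1916-07-01"
parse_month_first("01/07/1916").to_s  # => "1916-01-07"
date_from_parts(1, "July", 1916).to_s # => "1916-07-01"
```

Both parses succeed without error, so no validation rule can tell which one the user intended. Collecting the day, named month, and year separately removes the guesswork.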

This brings us on to another point. If there are only a small number of options for data entry (for example, the statuses "Requested", "Processing", "Completed", and "Shipped"), do not use validation to control user input. Instead, use drop-downs or a selection list to restrict user entry to only the valid entries. Validation can then be put in place to detect errors when this system has been bypassed, but most of the time the validation will be redundant.
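The backstop check for a bypassed drop-down can be as small as the following plain-Ruby sketch. VALID_STATUSES and valid_status? are hypothetical names. In Rails, a helper such as validates_inclusion_of plays this role.

```ruby
# The same list can feed both the drop-down and the validation.
VALID_STATUSES = ["Requested", "Processing", "Completed", "Shipped"].freeze

# Returns true only for a status that appears in the list.
def valid_status?(value)
  VALID_STATUSES.include?(value)
end

valid_status?("Shipped")    # a value offered by the drop-down
valid_status?("Cancelled")  # a value that could only arrive by bypassing the form
```

Keeping the list in one place means that the form and the validation cannot drift apart.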

References to Data in Other Tables Point to Actual Data

Within the database, data in separate tables are linked via foreign keys. In Ruby on Rails, these are usually between the ID field of one table and a link field of another table. The link field has a name comprising the singular name of the first table and ending with the suffix _id. When data is entered or altered, a validation process can be used to make sure that the entries in any link fields match an existing ID field entry in the target table.

Therefore, if a people table contains a link to an addresses table, each person will have an address_id field. The validation process would take the number in the address_id field and check that there is a corresponding address record with a matching ID.
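The people and addresses example can be sketched in plain Ruby as follows. The Person structure and the address_exists? method are hypothetical, and an array of IDs stands in for the addresses table. A Rails application would look the record up in the database instead.

```ruby
# Hypothetical stand-in for a row in the people table.
Person = Struct.new(:name, :address_id)

# Returns true when the person's address_id matches the ID of an
# existing address record.
def address_exists?(address_ids, person)
  address_ids.include?(person.address_id)
end

address_ids = [1, 2, 3]  # IDs currently present in the addresses table

address_exists?(address_ids, Person.new("Smith", 2))   # points to a real address
address_exists?(address_ids, Person.new("Jones", 99))  # points to nothing
```

A nil address_id also fails this check, so whether a person may be saved without an address is a separate decision to make.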
