There is another control system that Rory needs to consider: data validation. Data validation provides a system whereby a user's data entries are tested against a series of rules. Only if the data passes those rules is it entered into the database. Rails has a series of methods that make validation straightforward to set up and maintain.
As with so much of Ruby on Rails, the validation methods do much of the nuts-and-bolts work for us. Two important processes are mostly taken care of: intercepting the user's input before it gets to the database and, if the validation tests fail, returning the data input form to the user (with a relevant warning message) so that the user can correct their input and resubmit the form. Very few additional lines of code are needed to set this up. We can then concentrate on deciding what needs validating and the logic needed for those validations.
The next step is to ask what we need to validate or consider when setting up the validation tasks.
The most basic level of data validation tests whether the user has actually entered some data. At its simplest, this means ensuring that at least one field has data in it. However, we also need to consider whether there is a minimum amount of data that needs to be entered with each input. For example, if an application is a price list of products, each product probably needs to have a price entered.
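In a Rails model this kind of rule is normally declared with a helper such as validates_presence_of. To show the underlying logic without depending on Rails, here is a minimal plain-Ruby sketch. The Product struct, its fields, and the presence_errors method are hypothetical names used only for illustration.

```ruby
# Plain-Ruby sketch of a presence check. Product and its fields are
# hypothetical; a real Rails model would declare the rule instead.
Product = Struct.new(:name, :price)

# Returns a list of error messages; an empty list means every
# required field has something in it.
def presence_errors(record, required_fields)
  required_fields.each_with_object([]) do |field, errors|
    value = record[field]
    blank = value.nil? || value.to_s.strip.empty?
    errors << "#{field} can't be blank" if blank
  end
end

complete      = Product.new("Widget", 9.99)
missing_price = Product.new("Widget", nil)

presence_errors(complete, [:name, :price])       # => []
presence_errors(missing_price, [:name, :price])  # => ["price can't be blank"]
```

Only when the list of errors is empty would the record be passed on to the database; otherwise the messages would be shown to the user alongside the form.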
Often this means having enough data to allow for the unique identification of the new data. For example, when considering a list of people, there may be a number of people with the surname Smith, and therefore it is likely that we would want to validate that both a surname and first name are entered. We may also need to ensure that a particular field entry is unique within the table or that a combination of fields is unique. For example, we may want to ensure that addresses have a combination of house number and zip code (or postal code) that is unique within the table.
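Rails offers validates_uniqueness_of, with a :scope option, for checks of this kind. The plain-Ruby sketch below shows the idea behind a combined uniqueness test. The Address struct, the address_key and duplicate_address? methods, and the decision to ignore case and spacing when comparing are all assumptions made for the example.

```ruby
# Sketch of a "unique combination of fields" check in plain Ruby.
# Address and both methods are hypothetical names.
Address = Struct.new(:house_number, :zip_code)

# Builds a comparison key, ignoring case and spaces (an assumption).
def address_key(address)
  [address.house_number.to_s.strip.downcase,
   address.zip_code.to_s.delete(" ").upcase]
end

# True if the candidate's house number and zip code pair is already taken.
def duplicate_address?(existing, candidate)
  key = address_key(candidate)
  existing.any? { |address| address_key(address) == key }
end

existing = [Address.new("12", "AB1 2CD"), Address.new("14", "AB1 2CD")]

duplicate_address?(existing, Address.new("12", "ab12cd"))   # => true
duplicate_address?(existing, Address.new("16", "AB1 2CD"))  # => false
```

Note that the same zip code may appear many times; it is only the pair of values that must be unique.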
However, it is also worth pointing out that the default behavior of Rails is to add an ID field as the primary key. By definition, each ID must be unique within each table. Therefore, all data records will be individually identifiable within the database. If validation was turned off and a user entered ten records with no data, each would have a different ID and therefore each of the ten empty entries could be identified and manipulated individually.
It is considered bad practice to have identical records within a relational database. Having an independent ID field allows a developer to do just that (see the empty record example that I have described in the previous paragraph). Occasionally, we may find ourselves creating tables of data with series of entries that are identical to one another. Pragmatically, this may be the simplest solution to get an application up and running. However, it is always indicative of a system that could be better designed. Instead we should consider doing the following:
Set status_id to the id of the existing status, adding a new status only where the current one does not already exist.

Log information
There are two ways of entering information like "entered by George on 12-Jun-07". You can either generate the text string and then enter it into a text field, or create new fields that store the information programmatically: for example, an updated_by field to store the user id for George and an updated_at field to store the date/time when the update was made.
In most circumstances you will want to use the second method. However, there are circumstances where a text entry is better. The advantage of a text field entry is that it can be read independently of any other data. Therefore, if you export the data away from the user table, the information will still be easy to read. It also stores the data as it is at the time of entry. For example, you may find that George is deleted as a user from the system at a later date. It can be easier to store the name as a string than to ensure that a system is in place to handle this situation. Therefore, text field entries can be useful for activity logs where the log entry is not to be altered once it has been created.
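The two approaches can be compared side by side in a short plain-Ruby sketch. The User struct, the user id of 7, and the method names log_text and log_fields are hypothetical; only the field names updated_by and updated_at come from the discussion above.

```ruby
# Two ways of recording who made a change and when.
# User, log_text and log_fields are hypothetical names.
User = Struct.new(:id, :name)

# Approach 1: a readable string, fixed at the time of entry.
def log_text(user, time)
  "entered by #{user.name} on #{time.strftime('%d-%b-%y')}"
end

# Approach 2: structured fields that refer back to the user record.
def log_fields(user, time)
  { :updated_by => user.id, :updated_at => time }
end

george = User.new(7, "George")
stamp  = Time.utc(2007, 6, 12, 9, 30)

log_text(george, stamp)    # => "entered by George on 12-Jun-07"
log_fields(george, stamp)  # => {:updated_by=>7, :updated_at=>2007-06-12 09:30:00 UTC}
```

The string still makes sense if George's user record is later deleted, whereas the structured version needs that record in order to show a name.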
Some data must have a particular format for it to be correct. Email addresses are a good example. Therefore, it makes sense to test the format of such data at the point of input, and validation is well suited to this.
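In Rails, format tests are usually declared with validates_format_of and a regular expression. The sketch below shows a format test in plain Ruby. The pattern is a deliberately loose assumption (something, an @ sign, something, a dot, something) rather than a full description of valid email addresses, and EMAIL_FORMAT and email_format_ok? are hypothetical names.

```ruby
# A deliberately loose email pattern: no spaces, exactly one "@",
# and at least one "." somewhere after it. Intended to catch slips
# of the keyboard, not to prove that the address exists.
EMAIL_FORMAT = /\A[^@\s]+@[^@\s]+\.[^@\s]+\z/

def email_format_ok?(value)
  !(value.to_s =~ EMAIL_FORMAT).nil?
end

email_format_ok?("george@example.com")  # => true
email_format_ok?("george@example")      # => false
email_format_ok?("george example.com")  # => false
```

A loose pattern like this is in keeping with using validation to help users spot mistakes rather than to police their input.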
There is a trap here: it is a mistake to think of validating data input as a way of controlling the user input. Rather, I strongly believe that we should think of it as a way of helping the user detect and correct mistakes that they have made. If we need to control input, we should use the design of the input form to provide control and not the validation task.
Consider telephone numbers; a developer may have a preferred format for telephone numbers. However, each of these three UK-style telephone numbers is a valid representation of the same number: 0111 111111, (0111) 111111, and 0111111111. So how do we handle the fact that a user may use any of the three given formats? There are four options:
Options 1 and 2 are both unsatisfactory and likely to result in poor data in the database and a bad user experience, respectively.
Options 3 and 4 will help the user enter valid data. Option 3 makes it easy for the user to enter data in the right format, and Option 4 allows users to use their preferred format without upsetting the integrity of the data. Which one we use depends on how easily we can distinguish the different formats without compromising the validation. If it is easy to distinguish between the alternative input formats, use option 4. If it is difficult to distinguish between the formats, use option 3.
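For the telephone example, letting users type their preferred format might look something like the following plain-Ruby sketch. It assumes, purely to match the three sample numbers above, that a valid number is a zero followed by nine more digits once spaces and brackets are removed; real numbering plans are more varied. The method name normalize_phone is hypothetical.

```ruby
# Accepts any of the three sample layouts and returns one stored form.
# Assumption: a valid number is "0" plus nine more digits, as in the
# examples in the text.
def normalize_phone(input)
  digits = input.to_s.gsub(/[\s()]/, "")  # drop spaces and brackets
  return nil unless digits =~ /\A0\d{9}\z/
  digits
end

normalize_phone("0111 111111")    # => "0111111111"
normalize_phone("(0111) 111111")  # => "0111111111"
normalize_phone("0111111111")     # => "0111111111"
normalize_phone("0111 11111X")    # => nil
```

All three layouts end up as the same stored value, and an entry that cannot be recognized returns nil so that the form can be handed back to the user.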
Dates are a good example of where we should always use option 3. Living outside of the U.S.A., I am very aware of the confusion between the dd/mm/yyyy and mm/dd/yyyy formats. Programmatically, we cannot detect the difference between the two, unless the day is greater than twelve. That is, it is impossible to tell whether a single entry of 01/07/1916 refers to the day the Battle of the Somme started (1st July 1916), or 7th January of the same year. Therefore, always avoid allowing users to enter dates in these two formats. To do this, use a named month drop-down within a date selection. In that way, the confusion is avoided.
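The ambiguity is easy to demonstrate with Ruby's standard Date class. The method possible_dates below is a hypothetical helper that lists every calendar date a nn/nn/yyyy entry could stand for.

```ruby
require "date"

# Lists every date that a "nn/nn/yyyy" entry could mean, trying the
# day-first reading and then the month-first reading.
def possible_dates(text)
  first, second, year = text.split("/").map { |part| Integer(part, 10) }
  candidates = []
  candidates << Date.new(year, second, first) if Date.valid_date?(year, second, first)
  candidates << Date.new(year, first, second) if Date.valid_date?(year, first, second)
  candidates.uniq
end

possible_dates("01/07/1916").map { |d| d.to_s }  # => ["1916-07-01", "1916-01-07"]
possible_dates("13/07/1916").map { |d| d.to_s }  # => ["1916-07-13"]
```

The first entry has two equally valid readings, while the second can only be day-first because there is no thirteenth month. A named month drop-down supplies day, month, and year as separate values, so nothing needs to be guessed.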
This brings us onto another point. If there are only a small number of options for data entry (for example, the statuses: "Requested", "Processing", "Completed", and "Shipped") do not use validation to control user input. Instead, use drop-downs or a selection list to restrict user entry to only the valid entries. Validation can then be put in place to detect errors when this system has been bypassed, but most of the time the validation will be redundant.
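The backstop check for a fixed list is short. Rails provides validates_inclusion_of for this purpose; the plain-Ruby sketch below shows the equivalent test, with STATUSES and valid_status? as hypothetical names.

```ruby
# The same list can feed both the drop-down and the backstop check.
STATUSES = ["Requested", "Processing", "Completed", "Shipped"].freeze

def valid_status?(value)
  STATUSES.include?(value)
end

valid_status?("Shipped")    # => true
valid_status?("Cancelled")  # => false
```

Because the form only ever offers the entries in STATUSES, the check should fail only when the form has been bypassed.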
Within the database, data in separate tables are linked via foreign keys. In Ruby on Rails, these are usually between the ID field of one table and a link field of another table. The link field has a name comprising the singular name of the first table and ending with the suffix _id. When data is entered or altered, a validation process can be used to make sure that the entries in any link fields match an existing ID field entry in the target table.
Therefore, if a people table contains a link to an addresses table, each person will have an address_id field. The validation process would take the number in the address_id field and check that there is a corresponding address record with a matching ID.
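As a rough illustration of that check, the plain-Ruby sketch below uses a hash keyed by ID in place of the addresses table. The Person struct, the sample addresses, and the method address_exists? are hypothetical; in a Rails application the lookup would go through the model for the addresses table instead.

```ruby
# A hash keyed by ID stands in for the addresses table.
Person = Struct.new(:name, :address_id)

ADDRESSES_BY_ID = {
  1 => "12 High Street",
  2 => "14 High Street"
}.freeze

# True if the person's address_id matches an existing address ID.
def address_exists?(person, addresses_by_id)
  addresses_by_id.key?(person.address_id)
end

address_exists?(Person.new("Smith", 2), ADDRESSES_BY_ID)   # => true
address_exists?(Person.new("Jones", 99), ADDRESSES_BY_ID)  # => false
```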