An object that wraps a row in a database table or view, encapsulates the database access, and adds domain logic on that data. | ||
--Martin Fowler, P of EA |
The Active Record pattern, identified by Martin Fowler in his seminal work, Patterns of Enterprise Architecture, maps one domain class to one database table, and one instance of that class to each row of that database. It is a simple approach that, while not perfectly applicable in all cases, provides a powerful framework for database access and object persistence in your application.
Rails’ ActiveRecord framework includes mechanisms for representing models and their relationships, CRUD (Create, Read, Update and Delete) operations, complex searches, validation, callbacks, and many more features. It relies heavily on “convention over configuration,” so it’s easiest to use when you’re creating a new database schema that can follow those conventions. However, ActiveRecord also provides configuration settings that let you adapt it to work well with legacy database schemas that don’t necessarily conform to Rails conventions.
According to Martin Fowler, delivering the keynote address at the inaugural Rails conference in 2006, Ruby on Rails has successfully taken the Active Record pattern much further than anyone imagined it could go. It shows you what you can achieve when you have a single-minded focus on a set of ideals, which in the case of Rails is simplicity.
For the sake of completeness, let’s begin with the real basics of how ActiveRecord works. In order to create a new model class, the first thing you do is to declare it as a subclass of ActiveRecord::Base
, using Ruby’s class extension syntax:
class Client < ActiveRecord::Base end
By convention, an ActiveRecord class named Client
will be mapped to the clients
table. Rails understands pluralization, as covered in the section “Pluralization” in this chapter. Also by convention, ActiveRecord will expect an id
column to use as primary key. It should be an integer and incrementing of the key should be managed automatically by the database server when creating new records. Note how the class itself makes no mention of the table name, columns, or their datatypes.
Each instance of an ActiveRecord class provides access to the data from one row of the backing database table, in an object-oriented manner. The columns of that row are represented as attributes of the object, using straightforward type conversions (i.e. ruby strings for varchars, Ruby dates for dates, and so on), and with no default data validation. Attributes are inferred from the column definition pertaining to the tables with which they’re linked. Adding, removing, and changing attributes and their types are done by changing the columns of the table in the database.
When you’re running a Rails server in development mode, changes to the database schema are reflected in the Active Record objects immediately, via the web browser. However, if you make changes to the schema while you have your Rails console running, the changes will not be reflected automatically, although it is possible to pick up changes manually by typing Dispatcher.reset_application!
at the console.
The Rails way of doing things is to use code generation to produce boilerplate code. As a result, you’ll hardly ever create a file for your model class and enter the class declaration yourself. It’s just so much easier to use the Rails model generator instead.
For instance, let’s use the model generator to create our Client class and review the resulting files that are created:
$ script/generate model client exists app/models/ exists test/unit/ exists test/fixtures/ create app/models/client.rb create test/unit/client_test.rb create test/fixtures/clients.yml exists db/migrate create db/migrate/002_create_clients.rb
The file containing our new model class is client.rb:
class Client < ActiveRecord::Base end
Nice and simple. Let’s see what else was created. There’s a unit test skeleton in client_test.rb:
require File.dirname(__FILE__) + '/../test_helper' class ClientTest < Test::Unit::TestCase fixtures :clients # Replace this with your real tests. def test_truth assert true end end
It asks us to replace the test_truth
method with real tests. Right now we’re just taking a glance at the generated code, so we’ll move on. Notice ClientTest
references a fixture file, clients.yml:
# Read about fixtures at http://ar.rubyonrails.org/classes/Fixtures.html one: id: 1 two: id: 2
Not much there at all, just some ids. Unit tests and fixtures are covered in Chapter 17, “Testing.”
Finally, there is a migration file named 002_create_clients.rb:
class CreateClients < ActiveRecord::Migration def self.up create_table :clients do |t| # t.column :name, :string end end def self.down drop_table :clients end end
Migrations is the mechanism used in Rails to create and evolve your database schema, without which you wouldn’t have ActiveRecord models. (Well, they’d be very boring.) That being the case, let’s go ahead and examine migrations in depth.
It’s a fact of life that the database schema of your application will evolve over the course of development. Tables are added, names of columns are changed, things are dropped—you get the picture. Without strict conventions and process discipline for the application developers to follow, keeping the database schema in proper lock-step with application code is traditionally a very troublesome job.
Migrations are Rails’ way of helping you to evolve the database schema of your application (also known as its DDL) without having to drop and re-create the database each time you make a change. And not having to drop and recreate the database each time a change happens means that you don’t lose your development data—which may or may not be that important, but is usually very convenient. The only changes made when you execute a migration are those necessary to move the schema from one version to another, whether that move is forward or backward in time.
Rails provides a generator for creating migrations. Here is its help text at the command line:
$ script/generate migration Usage: script/generate migration MigrationName [options] Rails Info: -v, --version Show the Rails version and quit. -h, --help Show this help message and quit. General Options: -p, --pretend Run but do not make any changes. -f, --force Overwrite files that already exist. -s, --skip Skip files that already exist. -q, --quiet Suppress normal output. -t, --backtrace Debugging: show backtrace on errors. -c, --svn Modify files with subversion. (Note: svn must be in path) Description: The migration generator creates a stub for a new database migration. The generator takes a migration name as its argument. The migration name may be given in CamelCase or under_score. The generator creates a migration class in db/migrate prefixed by its number in the queue. Example: ./script/generate migration AddSslFlag With 4 existing migrations, this will create an AddSslFlag migration in the file db/migrate/005_add_ssl_flag.rb
As you can see, all you have to do is provide a descriptive name for the migration in CamelCase, and the generator does the rest. We only invoke this generator directly when we want to change attributes of an existing table.
As we discussed earlier in the chapter, other generators, such as the model generator, also create migration scripts for you, unless you specify the --skip-migration
option.
The sequential aspect of migrations is accomplished via a simple numbering scheme baked into the name of the migration file, and automatically handled by the migration generator.
By convention, the file name begins with a three-digit version number (padded with zeros) followed by the name of the migration class, separated with underscores. (Note: The name of the file does need to match the name of the class or the migration will fail.)
The migrations generator handles checking to see what the next sequence number should be, based on the value of a special table in the database that Rails maintains. It is named schema_info
and it is very simple:
mysql> desc schema_info; +---------+---------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +---------+---------+------+-----+---------+-------+ | version | int(11) | YES | | NULL | | +---------+---------+------+-----+---------+-------+ 1 row in set (0.00 sec)
It only contains one column, and one row—the current migration version of your application.
The descriptive part of the migration name is up to you, but most Rails developers that I know try to make it match the schema operation (in simple cases) or at least allude to what’s going on inside (in more complex cases).
If you only ever write Rails programs by yourself, there are no pitfalls to the sequentially numbered naming scheme, and you should skip this little section. Problems can and do crop up when you’re working together with other programmers on the same project, especially big teams with lots of other developers. I’m not talking about problems with the migrations API of Rails itself; it’s just that the overall problem of maintaining and evolving database schemas is hard and not completely solved yet.
My old ThoughtWorks partner-in-crime Jay Fields has been leading big Rails teams and writes in his blog about the troubles faced with Rails migrations on his team:
Migrations are great, but they do come at a cost. When working with a large team (my current team size is 14 and growing) migration conflicts happen. This can be mitigated with communication, but migrations can definitely be a bottleneck. Also, the process of creating a migration can be painful on a large team. Before creating a migration you should always update from source control to ensure you get all the migrations checked in by your teammates. Then, the best case scenario is when you can create a migration that doesn’t change anything and immediately check it in. Checking in a new migration immediately helps ensure you don’t block other teammates from creating migrations; however, it’s not always as simple as adding a new table. Migrations that alter the database structure often break several tests. Obviously, you can’t check those migrations in until you fix all the breaking tests, which can take time. During this time, database changes are blocked for the entire team.[1]
Another manifestation of the same problem happens when you have development happening in multiple code branches that need to be updated, and has been described as nightmarish (see the comments to Jay’s blog entry for more on that subject).
Unfortunately there is no straightforward solution to this problem, other than designing the entire database up-front prior to implementing your application, and that would have its own big set of problems (too big to broach here). I can tell you from experience that doing enough analysis up-front to have at least a rough, but fairly complete database schema ready before diving into application code is quite useful.
It’s also useful to call out to your fellow developers whenever you’re about to embark on the creation of a new migration, so that two of you don’t inadvertently try to commit the same migration number later on, which would lead to drama.
Getting back to the Migration API itself, here is the 002_create_clients.rb file again, from earlier in the chapter, after adding four column definitions for the clients
table:
class CreateClients < ActiveRecord::Migration def self.up create_table :clients do |t| t.column :name, :string t.column :code, :string t.column :created_at, :datetime t.column :updated_at, :datetime end end def self.down drop_table :clients end end
As you can see in the example, migration directives happen within two class method definitions, self.up
and self.down
. If we go to the command line in our project folder and type rake db:migrate
, the clients
table will be created. Rails gives us informative output during the migration process so that we see what is going on:
$ rake db:migrate (in /Users/obie/prorails/time_and_expenses) == 2 CreateClients: migrating ========================================== -- create_table(:clients) -> 0.0448s == 2 CreateClients: migrated (0.0450s) =================================
Normally, only the code in the up
method is run, but if you ever need to rollback to an earlier version of the schema, the down
method specifies how to undo what happened in up
.
To execute a rollback, use the migrate task, but pass it a version number to rollback to, as in rake db:migrate VERSION=1
.
The create_table
method needs at minimum a name for the table and a block containing column definitions. Why do we specify identifiers with symbols instead of strings? Both will work, but symbols require one less keystroke.[2]
The create_table
method makes a huge, but usually true assumption that we want an autoincrementing, integer-typed, primary key. That is why you don’t see it declared in the list of columns. If that assumption happens to be wrong, it’s time to pass create_table
some options in a hash.
For example, how would you define a simple join table consisting of two foreign key columns and not needing its own primary key? Just pass the create_table
method an :id
option set to false
, as a boolean, not a symbol! It will stop the migration from autogenerating a primary key altogether:
create_table :ingredients_recipes, :id => false do |t| t.column :ingredient_id, :integer t.column :recipe_id, :integer end
If all you want to do is change the name of the primary key column from its default of ‘id’, pass the :id
option a symbol instead. For example, let’s say your corporation mandates that primary keys follow the pattern tablename_id. Then the earlier example would look as follows:
create_table :clients, :id => :clients_id do |t| t.column :name, :string t.column :code, :string t.column :created_at, :datetime t.column :updated_at, :datetime end
The :force => true
option tells the migration to go ahead and drop the table being defined if it exists. Be careful with this one, since it will produce (possibly unwanted) data loss when run in production. As far as I know, the :force
option is mostly useful for making sure that the migration puts the database in a known state, but isn’t all that useful on a daily basis.
The :options
option allows you to append custom instructions to the SQL CREATE statement and is useful for adding database-specific commands to your migration. Depending on the database you’re using, you might be able to specify things such as character set, collation, comments, min/max sizes, and many other properties using this option.
The :temporary => true
option specifies creation of a temporary table that will only exist during the current connection to the database. In other words, it only exists during the migration. In advanced scenarios, this option might be useful for migrating big sets of data from one table to another, but is not commonly used.
Columns can be added to a table using either the column
method, inside the block of a create_table
statement, or with the add_column
method. Other than taking the name of the table to add the column to as its first argument, the methods work identically.
create_table :clients do |t| t.column :name, :string end add_column :clients, :code, :string add_column :clients, :created_at, :datetime
The first (or second) parameter obviously specifies the name of the column, and the second (or third) obviously specifies its type. The SQL92 standard defines fundamental data types, but each database implementation has its own variation on the standards.
If you’re familiar with database column types, when you examine the preceding example it might strike you as a little weird that there is a database column declared as type string
, since databases don’t have string columns—they have char or varchars types.
The reason for declaring a database column as type string is that Rails migrations are meant to be database-agnostic. That’s why you could (as I’ve done on occasion) develop using Postgres as your database and deploy in production to Oracle.
A complete discussion of how to go about choosing the right data type for your application needs is outside the scope of this book. However, it is useful to have a reference of how migration’s generic types map to database-specific types. The mappings for the databases most commonly used with Rails are in Table 6.1.
Table 6.1. Column Mappings for the Databases Most Commonly Used with Rails
Migration type | MySQL | Postgres | SQLite | Oracle |
Ruby class | ||||
:binary | blob | bytea | blob | blob |
String | ||||
:boolean | tinyint(1) | boolean | boolean | number(1) |
Boolean | ||||
:date | date | date | date | date |
Date | ||||
:datetime | datetime | timestamp | datetime | date |
Time | ||||
:decimal | decimal | decimal | decimal | decimal |
BigDecimal | ||||
:float | float | float | float | number |
Float | ||||
:integer | int(11) | integer | integer | number(38) |
Fixnum | ||||
:string | varchar(255) | character varying (255) | varchar(255) | varchar2(255) |
String | ||||
:text | text | clob(32768) | text | clob |
String | ||||
:time | time | time | time | date |
Time | ||||
:timestamp | datetime | timestamp | datetime | date |
Time |
Each connection adapter class has a native_database_types
hash which establishes the mapping described in Table 6.1. If you need to look up the mappings for a database not listed in Table 6.1, you can pop open the adapter Ruby code and find the native_database_types
hash, like the following one inside the SQLServerAdapter
class within sqlserver_adapter.rb:
def native_database_types { :primary_key => "int NOT NULL IDENTITY(1, 1) PRIMARY KEY", :string => { :name => "varchar", :limit => 255 }, :text => { :name => "text" }, :integer => { :name => "int" }, :float => { :name => "float", :limit => 8 }, :decimal => { :name => "decimal" }, :datetime => { :name => "datetime" }, :timestamp => { :name => "datetime" }, :time => { :name => "datetime" }, :date => { :name => "datetime" }, :binary => { :name => "image"}, :boolean => { :name => "bit"} } end
For many column types, just specifying type is not enough information. All column declarations accept the following options:
:default => value
Sets a default to be used as the initial value of the column for new rows. You don’t ever need to explicitly set the default value to null
. Just leave off this option to get a null
default value.
:limit => size
Adds a size parameter to string, text, binary, or integer columns. Its meaning varies depending on the column type that it is applied to. Generally speaking, limits for string types refers to number of characters, whereas for other types it specifies the number of bytes used to store the value in the database.
:null => true
Makes the column required at the database level by adding a not null
constraint.
Columns declared as type :decimal
accept the following options:
:precision => number
Precision is the total number of digits in a number.
:scale => number
Scale is the number of digits to the right of the decimal point. For example, the number 123.45 has a precision of 5 and a scale of 2. Logically, the scale cannot be larger than the precision.
Decimal types pose a serious opportunity for data loss during migrations of production data between different kinds of databases. For example, the default precisions between Oracle and SQL Server can cause the migration process to truncate and change the value of your numeric data. It’s always a good idea to specify precision details for your data.
The choice of column type is not necessarily a simple choice and depends on both the database you’re using and the requirements of your application.
:binary. Depending on your particular usage scenario, storing binary data in the database can cause big performance problems. Rails isn’t choosy when it loads objects from the database, and putting large binary attributes on commonly used models will increase the load on your database server significantly.
:boolean. The way that boolean values are stored varies from database to database. Some use 1 and 0 integer values to represent true and false, respectively. Others use characters such as T and F. Rails handles the mapping between Ruby’s true
and false
very well, so you don’t need to worry about the underlying scheme yourself. Setting attributes directly to database values such as 1 or F may work correctly, but is considered an anti-pattern.
:date, :datetime and :time. Trying to store dates on databases that don’t have a native date type, such as Microsoft SQL Server, can be problematic. The Ruby class that Rails maps to datetime
columns is Time
, which has the limitation of not working with year values prior to 1970. Ruby’s DateTime
class does work with year values prior to 1970, so why doesn’t Rails just use DateTime
instead? The answer has to do with performance. Under the covers, Time
is implemented in C and is very fast, whereas DateTime
is written in pure Ruby and is comparatively slow.
To make ActiveRecord map to DateTime
instead of Time
, drop the code from Listing 6.1 into a file in your lib/ directory and require it from config/environment.rb.
Example 6.1. Map Dates to DateTime Instead of Time
require 'date' # It's necessary to do this, because Time doesn't # support dates before 1970... class ActiveRecord::ConnectionAdapters::Column def self.string_to_time(string) return string unless string.is_a?(String) time_array = ParseDate.parsedate(string)[0..5] begin Time.send(Base.default_timezone, *time_array) rescue DateTime.new(*time_array) rescue nil end end end
:decimal. Older versions of Rails (prior to 1.2) did not support the fixed-precision :decimal
type and as a result many old Rails applications incorrectly used :float
datatypes. Floating-point numbers are by nature imprecise, so it is important to choose :decimal instead of :float for most business-related applications.
:float. Don’t use floats to store currency[3] values, or more accurately, any type of data that needs fixed precision. Since floating-point numbers are pretty much approximations, any single representation of a number as a float is probably okay. However, once you start doing mathematical operations or comparisons with float values, it is ridiculously easy to introduce difficult to diagnose bugs into your application.
:integer and :string. There aren’t many gotchas that I can think of when it comes to integers and strings. They are the basic data building blocks of your application, and many Rails developers leave off the size specification, which results in the default maximum sizes of 11 digits and 255 characters, respectively.
You should keep in mind that you won’t get an error if you try to store values that exceed the maximum size defined for the database column, which again, is 255 characters by default. Your string will simply get truncated. Use validations to make sure that user-entered data does not exceed the maximum size allowed.
:text. There have been reports of text fields slowing down query performance, enough to be a consideration for applications that need to scale to high loads. If you must use a text column in a performance-critical application, put it in a separate table.
:timestamp. As of Rails 1.2, when creating new records, ActiveRecord may not work very well with default database values that are generated with functions, as in the case of timestamp columns on Postgres. The issue is that instead of leaving those columns out of the insert
statement, Rails supplies a null
value, which may cause the default value to be ignored.
If use of database-specific datatypes (such as :double
, for higher precision than :float
) is critical to your project, use the config.active_record.schema_format = :sql
setting in config/environment.rb to make Rails dump schema information in native SQL DDL format rather than its own cross-platform compatible Ruby code, via the schema.rb file.
Rails does magic with datetime columns, if they’re named a certain way. Active Record will automatically timestamp create operations if the table has columns named created_at
or created_on
. The same applies to updates when there are columns named updated_at
or updated_on
.
Note that created_at
and updated_at
should be defined as datetime
-type columns not timestamps in your migration.
Automatic timestamping can be turned off globally, by setting the following variable in config/environment.rb
ActiveRecord::Base.record_timestamps = false
Via inheritance, the preceding code turns off timestamps for all models, but you can also do it on a case-by-case basis by setting record_timestamps
to false in specific models. Timestamps are in the local timezone by default, but can use UTC by setting ActiveRecord::Base.default_timezone = :utc
.
Most of the important classes you write while coding a Rails application are configured using what I call macro-style method invocations (also known in some circles as a domain-specific language or DSL). Basically, the idea is to have a highly readable block of code at the top of your class that makes it immediately clear how it is configured.
Macro-style invocations are usually placed at the top of the file, and for good reason. Those methods declaratively tell Rails how to manage instances, perform data validation and callbacks, and relate with other models. Many of them do some amount of metaprogramming, meaning that they participate in adding behavior to your class at runtime, in the form of additional instance variables and methods.
For example, look at the Client
class with some relationships declared. Don’t worry about the meaning of those declarations just yet, because we’ll talk about them extensively in Chapter 7, “ActiveRecord Associations.” All I want to do right now is to illustrate what I’m talking about when I say macro-style:
class Client < ActiveRecord::Base has_many :billing_codes has_many :billable_weeks has_many :timesheets, :through => :billable_weeks end
As a result of those three has_many
declarations, the Client
class gains at least three new attributes, proxy objects that let you manipulate the associated collections interactively.
I still remember the first time I sat with an experienced Java programmer friend of mine to teach him some Ruby and Rails. After minutes of profound confusion, an almost visible light bulb appeared over his head as he proclaimed, “Oh! They’re methods!”
Indeed, they’re regular old method calls, in the context of the class object. We leave the parentheses off to emphasize the declarative intention. That’s a style issue, but it just doesn’t feel right to me with the parentheses in place, as in the following code snippet:
class Client < ActiveRecord::Base has_many(:billing_codes) has_many(:billable_weeks) has_many(:timesheets, :through => :billable_weeks) end
When the Ruby interpreter loads client.rb, it executes those has_many
methods, which, again, are defined as class methods of ActiveRecord’s Base
class. They are executed in the context of the Client
class, adding attributes that are subsequently available to Client
instances. It’s a programming model that is potentially strange to newcomers, but quickly becomes second-nature to the Rails programmer.
Convention over configuration is one of Rails’ guiding principles. If we follow Rails conventions, very little explicit configuration is needed, which stands in stark contrast to the reams of configuration that are required to get even a simple application running in other technologies.
It’s not that a newly bootstrapped Rails application comes with default configuration in place already, reflecting the conventions that will be used. It’s that the conventions are baked into the framework, actually hard-coded into its behavior, and you need to override the default behavior with explicit configuration when applicable.
It’s also worth mentioning that most configuration happens in close proximity to what you’re configuring. You will see associations, validations, and callback declarations at the top of most ActiveRecord models.
I suspect that the first explicit configuration (over convention) that many of us deal with in ActiveRecord is the mapping between class name and database table, since by default Rails assumes that our database name is simply the pluralized form of our class name. And since the issue of pluralization trips up so many beginning Rails developers, we’ll cover it here before going further with ActiveRecord.
Rails has a class named Inflector
whose responsibility is to transform strings (words) from singular to plural, class names to table names, modularized class names to ones without, and class names to foreign keys, etc. (Some of its operations have funny names, such as dasherize
.)
The default inflections for pluralization and singularization of uncountable words are kept in an interesting file inside your Rails installation, named inflections.rb.
Most of the time the Inflector
class does a decent job of figuring out the pluralized table name for a given class, but occasionally it won’t. This is one of the first stumbling blocks for many new Rails users, but it is not necessary to panic. With a little ad-hoc testing beforehand, it’s easy to find out how Inflector
will react to certain words. We just need to use the Rails console, which by the way is one of the best things about working in Rails.
You fire up the console from your command prompt using the executable Ruby script located at script/console in your project directory.
$ script/console >> Inflector.pluralize "project" => "projects" >> Inflector.pluralize "virus" => "viri" >> Inflector.pluralize "pensum" => "pensums"
As you can see in the example, Inflector
is pretty smart, pluralizing “virus” as “viri”; but if you know your Latin you have already noticed that the plural “pensum” should actually be “pensa”. Needless to say, the inflector does not know Latin.
However, you can teach the inflector new tricks either by adding new pattern rules, by pointing out an exception, or by declaring certain words un-pluralizable. The preferred place to do that is inside the config/environment.rb file, where a commented example is already provided.
Inflector.inflections do |inflect| inflect.plural /^(.*)um$/i, '1a' inflect.singular /^(.*)a/i, '1um' inflect.irregular 'album', 'albums' inflect.uncountable %w( valium ) end
By the way, as of Rails 1.2 the pluralizer takes already pluralized strings and...does nothing with them, which is probably best. Older versions of Rails were not as smart about that.
>> "territories".pluralize => "territories" >> "queries".pluralize => "queries"
For a long list of pluralizations correctly handled by Inflector
, take a look inside activesupport/test/inflector_test.rb. I found some of them pretty interesting, such as:
"datum" => "data", "medium" => "media", "analysis" => "analyses"
According to Michael Koziarski, one of the members of the Rails core team, you shouldn’t report issues with Inflector
: “The inflector is basically frozen, prior to 1.0 we’d add lots of new rules to fix bugs, and just end up enraging people who looked at the old output and named their tables accordingly. You can add those exceptions yourself in environment.rb.”
Now that we understand inflection, let’s get back to configuration of ActiveRecord model classes. The set_table_name
and set_primary_key
methods let you bypass Rails conventions and explicitly define the table name for the model, and the column name for its primary key.
For example purposes (only!), let’s say I had some icky naming convention that I was forced to follow for my database tables, that differed from ActiveRecord’s convention. I might have to do the following:
class Client < ActiveRecord::Base set_table_name "CLIENT" set_primary_key "CLIENT_ID" end
The set_table_name
and set_primary_key
methods let you use any table and primary names you’d like, but you’ll have to specify them explicitly in your model class. It’s only a couple of extra lines per model, but on a large application it adds unnecessary complexity, so don’t do it if you don’t absolutely have to.
When you’re not at liberty to dictate the naming guidelines for your database schema, such as when a separate DBA group controls all database schemas, then you probably don’t have a choice. But if you have flexibility, you should really just follow Rails’ conventions. They might not be what you’re used to, but following them will save you time and unnecessary headaches.
If you are working with legacy schemas, you may be tempted to automatically set_table_name
everywhere, whether you need it or not. Before you get accustomed to doing that, learn the additional options available that might just be more DRY and make your life easier.
Let’s assume you need to turn off table pluralization altogether; you would set the following attribute to false at the bottom of config/environment.rb:
ActiveRecord::Base.pluralize_table_names = false
There are various other useful attributes of ActiveRecord::Base
, provided for configuring Rails to work with legacy naming schemes.
primary_key_prefix_type
. Accessor for the prefix type that will be prepended to every primary key column name. If :table_name
is specified, the ActiveRecord will look for “tableid” instead of “id” as the primary column. If :table_name_with_underscore
is specified, ActiveRecord will look for “table_id” instead of “id”.
table_name_prefix
. Some departments prefix table names with the name of the database. Set this attribute accordingly to avoid having to include the prefix in all of your model class names.
table_name_suffix
. Similar to prefix, but adds a common ending to all table names.
underscore_table_names
. Set to false
to prevent ActiveRecord from underscoring compound table names.
The list of attributes associated with an ActiveRecord model class is not coded explicitly. At runtime, the ActiveRecord model examines the database schema directly from the server. Adding, removing, and changing attributes and their type is done by manipulating the database itself, either directly using SQL commands or GUI tools, but ideally via ActiveRecord migrations.
The practical implication of the ActiveRecord pattern is that you have to define your database table structure and make sure it actually exists in the database prior to working with your persistent models. Some people may have issues with that design philosophy, especially if they’re coming from a background in top-down design.
The Rails way is undoubtedly to have model classes that map closely to your database schema. On the other hand, remember you can have models that are simple Ruby classes and do not extend ActiveRecord::Base. Among other things, it is common to use non-ActiveRecord model classes to encapsulate data and logic for the view layer.
Migrations let you define default attribute values by passing a :default
option to the column
method, but most of the time you’ll want to set default attribute values at the model layer, not the database layer. Default values are part of your domain logic and should be kept together with the rest of the domain logic of your application, in the model layer.
A common example is the case when your model should return the string ‘n/a’ instead of a nil (or empty) string for an attribute that has not been populated yet. Seems simple enough and it’s a good place to start looking into how attributes exist at runtime.
To begin, let’s whip up a quick test case describing the desired behavior.
class SpecificationTest < Test::Unit::TestCase def test_default_string_for_tolerance_should_be_na spec = Specification.new assert_equal 'n/a', spec.tolerance end end
We run that test and it fails, as expected. ActiveRecord doesn’t provide us with any class-level methods to define default values for models declaratively. So it seems we’ll have to create an explicit attribute accessor that provides a default value.
Normally, attribute accessors are handled magically by ActiveRecord’s internals, but in this case we’re overriding the magic with an explicit getter. All we need to do is to define a method with the same name as the attribute and use Ruby’s or
operator, which will short-circuit if @tolerance
is not nil.
class Specification < ActiveRecord::Base def tolerance @tolerance or 'n/a' end end
Now we run the test and it passes. Great. Are we done? Not quite. We should test a case when the real tolerance value should be returned. I’ll add another test for a specification with a not-nil tolerance value and also go ahead and make my test method names a little more descriptive.
class SpecificationTest < Test::Unit::TestCase def test_default_string_for_tolerance_should_return_na_when_nil spec = Specification.new assert_equal 'n/a', spec.tolerance end def test_tolerance_value_should_be_returned_when_not_nil spec = Specification.new(:tolerance => '0.01mm') assert_equal '0.01mm', spec.tolerance end end
Uh-oh. The second test fails. Seems our default ‘n/a’ string is being returned no matter what. That means that @tolerance
must not get set. Should we even know that it is getting set or not? It is an implementation detail of ActiveRecord, is it not?
The fact that Rails does not use instance variables like @tolerance
to store the model attributes is in fact an implementation detail. But model instances have a couple of methods, write_attribute
and read_attribute,
conveniently provided by ActiveRecord for the purposes of overriding default accessors, which is exactly what we’re trying to do. Let’s fix our Specification class.
class Specification < ActiveRecord::Base def tolerance read_attribute(:tolerance) or 'n/a' end end
Now the test passes. How about a simple example of using write_attribute
?
class SillyFortuneCookie < ActiveRecord::Base def message=(txt) write_attribute(:message, txt + ' in bed') end end
Alternatively, both of these examples could have been written with the shorter forms of reading and writing attributes, using square brackets.
class Specification < ActiveRecord::Base def tolerance self[:tolerance] or 'n/a' end end class SillyFortuneCookie < ActiveRecord::Base def message=(txt) self[:message] = txt + ' in bed' end end
One of ActiveRecord’s coolest features (IMO) is the ability to mark a column of type “text” as being serialized. Whatever object (more accurately, graph of objects) that you assign to that attribute should be represented in the database as YAML, which is Ruby’s native serialization format.
The four standard operations of a database system combine to form a popular acronym: CRUD.
It sounds somewhat negative, because as a synonym for ‘garbage’ or ‘unwanted accumulation’ the word ‘crud’ in English has a rather bad connotation. However, in Rails circles, use of the word CRUD is benign. In fact, as we’ll see in later chapters, designing your app to function primarily as CRUD operations is considered a best practice!
The most straightforward way to create a new instance of an ActiveRecord model is by using a regular Ruby constructor, the class method new
. New objects can be instantiated as either empty (by omitting parameters) or pre-set with attributes, but not yet saved. Just pass a hash with key names matching the associated table column names. In both instances, valid attribute keys are determined by the column names of the associated table—hence you can’t have attributes that aren’t part of the table columns.
Newly constructed, unsaved ActiveRecord objects have a @new_record
attribute that can be queried using the method new_record?
:
>> c = Client.new => #<Client:0x2515584 @new_record=true, @attributes={"name"=>nil, "code"=>nil}> >> c.new_record? => true
ActiveRecord constructors take an optional block, which can be used to do additional initialization. The block is executed after any passed-in attributes are set on the instance:
>> c = Client.new do |client| ?> client.name = "Nile River Co." >> client.code = "NRC" >> end => #<Client:0x24e8764 @new_record=true, @attributes={"name"=>"Nile River Co.", "code"=>"NRC"}>
ActiveRecord has a handy-dandy create
class method that creates a new instance, persists it to the database, and returns it in one operation:
>> c = Client.create(:name => "Nile River, Co.", :code => "NRC") => #<Client:0x4229490 @new_record_before_save=true, @new_record=false, @errors=#<ActiveRecord::Errors:0x42287ac @errors={}, @base=#<Client:0x4229490 ...>>, @attributes={"name"=>"Nile River, Co.", "updated_at"=>Mon Jun 04 22:24:27 UTC 2007, "code"=>"NRC", "id"=>1, "created_at"=>Mon Jun 04 22:24:27 UTC 2007}>
The create
method doesn’t take a block. It probably should, as it feels like a natural place for a block to initialize the object before saving it, but alas it doesn’t.
Reading data from the database into ActiveRecord object instances is very easy and convenient. The primary mechanism is the find
method, which hides SQL SELECT operations from the developer.
Finding an existing object by its primary key is very simple, and is probably one of the first things we all learn about Rails when we first pick up the framework. Just invoke find
with the key of the specific instance you want to retrieve. Remember that if an instance is not found, a RecordNotFound
exception is raised.
>> first_project = Project.find(1) >> boom_client = Client.find(99) ActiveRecord::RecordNotFound: Couldn't find Client with ID=99 from /vendor/rails/activerecord/lib/active_record/base.rb:1028:in `find_one' from /vendor/rails/activerecord/lib/active_record/base.rb:1011:in `find_from_ids' from /vendor/rails/activerecord/lib/active_record/base.rb:416:in `find' from (irb):
The find
method also understands a pair of specially designated Ruby symbols, :first
and :all
:
>> all_clients = Client.find(:all) => [#<Client:0x250e004 @attributes={"name"=>"Paper Jam Printers", "code"=>"PJP", "id"=>"1"}>, #<Client:0x250de88 @attributes={"name"=>"Goodness Steaks", "code"=>"GOOD_STEAKS", "id"=>"2"}>] >> first_client = Client.find(:first) => #<Client:0x2508244 @attributes={"name"=>"Paper Jam Printers", "code"=>"PJP", "id"=>"1"}>
Somewhat surprisingly to me, there is no :last
parameter, but you can do a last query pretty easily using the :order
option:
>> all_clients = Client.find(:first, :order => 'id desc') => #<Client:0x2508244 @attributes={"name"=>"Paper Jam Printers", "code"=>"PJP", "id"=>"1"}>
By the way, it is entirely common for methods in Ruby to return different types depending on the parameters used, as illustrated in the example. Depending on how find
is invoked, you will get either a single ActiveRecord object or an array of them.
Finally, the find
method also understands arrays of keys, and throws a RecordNotFound
exception if it can’t find all of the keys specified:
>> first_couple_of_clients = Client.find(1, 2) [#<Client:0x24d667c @attributes={"name"=>"Paper Jam Printers", "code"=>"PJP", "id"=>"1"}>, #<Client:0x24d65b4 @attributes={"name"=> "Goodness Steaks", "code"=>"GOOD_STEAKS", "id"=>"2"}>] >> first_few_clients = Client.find(1, 2, 3) ActiveRecord::RecordNotFound: Couldn't find all Clients with IDs (1,2,3) from /vendor/rails/activerecord/lib/active_record/base.rb:1042:in `find_some' from /vendor/rails/activerecord/lib/active_record/base.rb:1014:in `find_from_ids' from /vendor/rails/activerecord/lib/active_record/base.rb:416:in `find' from (irb):9
After you have retrieved a model instance from the database, you can access each of its columns in several ways. The easiest (and clearest to read) is simply with dot notation:
>> first_client.name => "Paper Jam Printers" >> first_client.code => "PJP"
The private read_attribute
method of ActiveRecord, covered briefly in an earlier section, is useful to know about, and comes in handy when you want to override a default attribute accessor. To illustrate, while still in the Rails console, I’ll go ahead and reopen the Client
class on the fly and override the name
accessor to return the value from the database, but reversed:
>> class Client >> def name >> read_attribute(:name).reverse >> end >> end => nil >> first_client.name => "sretnirP maJ repaP"
Hopefully it’s not too painfully obvious for me to demonstrate why you need read_attribute
in that scenario:
>> class Client >> def name >> self.name.reverse >> end >> end => nil >> first_client.name SystemStackError: stack level too deep from (irb):21:in `name' from (irb):21:in `name' from (irb):24
As can be expected by the existence of a read_attribute
method, there is a write_attribute
method that lets you change attribute values.
project = Project.new project.write_attribute(:name, "A New Project")
Just as with attribute getter methods, you can override the setter methods and provide your own behavior:
class Project # The description for a project cannot be changed to a blank string def description=(new_value) self[:description] = new_value unless new_value.blank? end end
The preceding example illustrates a way to do basic validation, since it checks to make sure that a value is not blank before allowing assignment. However, as we’ll see later in the book, there are better ways to do this.
Yet another way to access attributes is using the [attribute_name]
operator, which lets you access the attribute as if it were a regular hash.
>> first_client['name'] => "Paper Jam Printers" >> first_client[:name] => "Paper Jam Printers"
There is also an attributes
method that returns a hash with each attribute and its corresponding value as returned by read_attribute
. If you use your own custom attribute reader and writer methods, it’s important to remember that attributes
will not use custom attribute readers when accessing its values, but attributes=
(which lets you do mass assignment) does invoke custom attribute writers.
>> first_client.attributes => {"name"=>"Paper Jam Printers", "code"=>"PJP", "id"=>1}
Being able to grab a hash of all attributes at once is useful when you want to iterate over all of them or pass them in bulk to another function. Note that the hash returned from attributes
is not a reference to an internal structure of the ActiveRecord object—it is copied, which means that changing its values will have no effect on the object it came from:
>> atts = first_client.attributes => {"name"=>"Paper Jam Printers", "code"=>"PJP", "id"=>1} >> atts["name"] = "Def Jam Printers" => "Def Jam Printers" >> first_client.attributes => {"name"=>"Paper Jam Printers", "code"=>"PJP", "id"=>1}
To make changes to an ActiveRecord object’s attributes in bulk, it is possible to pass a hash to the attributes
writer.
The ActiveRecord connection adapters fetch results as strings and Rails takes care of converting them to other datatypes if necessary, based on the type of the database column. For instance, integer types are cast to instances of Ruby’s Fixnum
class, and so on.
Even if you’re working with a new instance of an ActiveRecord object, and have passed in constructor values as strings, they will be typecast to their proper type when you try to access those values as attributes.
Sometimes you want to be able to read (or manipulate) the raw attribute data without having the column-determined typecast run its course first, and that can be done by using the <attribute>_before_type_cast
accessors that are automatically created in your model.
For example, consider the need to deal with currency strings typed in by your end users. Unless you are encapsulating currency values in a currency class (highly recommended, by the way) you need to deal with those pesky dollar signs and commas. Assuming that our Timesheet
model had a rate attribute defined as a :decimal
type, the following code would strip out the extraneous characters before typecasting for the save operation:
class Timesheet < ActiveRecord::Base before_save :fix_rate def fix_rate rate_before_type_cast.tr!('$,','') end end
The reload
method does a query to the database and resets the attributes of an ActiveRecord object. The optional options argument is passed to find when reloading so you may do, for example, record.reload(:lock => true)
to reload the same record with an exclusive row lock. (See the section “Database Locking” later in this chapter.)
Since one of the most common operations in many applications is to simply query based on one or two columns, Rails has an easy and effective way to do these queries without having to resort to the conditions parameter of find
. They work thanks to the magic of Ruby’s method_missing
callback, which is executed whenever you invoke a method that hasn’t been defined yet.
Dynamic finder methods begin with find_by_
or find_all_by_
, indicating whether you want a single value or array of results returned. The semantics are similar to calling find with the :first
versus the :all
option.
>> City.find_by_name("Hackensack") => #<City:0x3205244 @attributes={"name" => "Hackensack", "latitude" => "40.8858330000", "id" => "15942", "longitude" => "-74.0438890000", "state" => "NJ" }> >> City.find_all_by_name("Atlanta").collect(&:state) => ["GA", "MI", "TX"]
It’s also possible to use multiple attributes in the same find by separating them with “and”, so you get finders like Person.find_by_user_name_and_password
or even Payment.find_by_purchaser_and_state_and_country
.
Dynamic finders have the benefits of being shorter and easier to read and understand. Instead of writing Person.find(:first, ["user_name = ? AND password = ?", user_name, password])
, try writing Person.find_by_user_name_and_password(user_name, password)
.
>> City.find_by_name_and_state("Atlanta", "TX") => #<City:0x31faeac @attributes={ "name" => "Atlanta", "latitude" => "33.1136110000", "id" => "25269", "longitude" => "-94.1641670000", "state" => "TX"}>
You can even customize dynamic finder calls with options, just like regular finder methods! Payment.find_all_by_amount
is actually Payment.find_all_by_amount(amount, options)
. And the full interface to Person.find_by_user_name
is actually Person.find_by_user_name(user_name, options)
. So you can call Payment.find_all_by_amount(50, :order => "created_on")
.
The same dynamic finder style can be used to create the object if it doesn’t already exist. This dynamic finder is called with find_or_create_by_
and will return the object if it already exists and otherwise creates it, then returns it. Use the find_or_initialize_by_
finder if you want to return a new record without saving it first.
The find_by_sql
class method takes a SQL select query and returns an array of ActiveRecord objects based on the results. Here’s a barebones example, which you would never actually need to do in a real application:
>> Client.find_by_sql("select * from clients") => [#<Client:0x4217024 @attributes={"name"=>"Nile River, Co.", "updated_at"=>"2007-06-04 22:24:27", "code"=>"NRC", "id"=>"1", "created_at"=>"2007-06-04 22:24:27"}>, #<Client:0x4216ffc @attributes={"name"=>"Amazon, Co.", "updated_at"=>"2007-06-04 22:26:22", "code"=>"AMZ", "id"=>"2", "created_at"=>"2007-06-04 22:26:22"}>]
I can’t stress this enough: You should take care to use find_by_sql
only when you really need it! For one, it reduces database portability—when you use ActiveRecord’s normal find operations, Rails takes care of handling differences between the underlying databases for you.
Also, ActiveRecord already has a ton of built-in functionality abstracting SELECT statements—functionality that it would be very unwise to reinvent. There are lots of cases where at first glance it might seem that you might need to use find_by_sql
, but you actually don’t. A common case is when doing a LIKE query:
>> Client.find_by_sql("select * from clients where code like 'A%'") => [#<Client:0x4206b34 @attributes={"name"=>"Amazon, Inc.", ...}>]
Turns out that you can easily put that LIKE clause into a conditions option:
>> param = "A" >> Client.find(:all, :conditions => ["code like ?", "#{param}%"]) => [#<Client:0x41e3594 @attributes={"name"=>"Amazon, Inc...}>] # Right!
Under the covers, Rails sanitizes[4] your SQL code, provided that you parameterize your query. ActiveRecord executes your SQL using the connection.select_all
method, iterating over the resulting array of hashes, and invoking your ActiveRecord’s initialize
method for each row in the result set. What would the last example look like un-parameterized?
>> param = "A" >> Client.find(:all, :conditions => ["code like '#{param}%'"]) => [#<Client:0x41e3594 @attributes={"name"=>"Amazon, Inc...}>] # NOOOOO!
Notice the missing question mark as a variable placeholder. Always remember that interpolating user-supplied values into a SQL fragment of any type is very unsafe! Just imagine what would happen to your project if a malicious user called that unsafe find with a param
like this:
"Amazon'; DELETE FROM users;'
Sadly, very few people actually understand what SQL injection means. Google can be one of your best friends in this case.
By default, Rails attempts to optimize performance by turning on a simple query cache. It is a hash stored on the current thread, one for every active database connection. (Most Rails processes will have just one.)
Whenever a find
(or any other type of select operation) happens and the query cache is active, the corresponding result set is stored in a hash with the SQL that was used to query for them as the key. If the same SQL statement is used again in another operation, the cached result set is used to generate a new set of model objects instead of hitting the database again.
You can enable the query cache manually by wrapping operations in a cache
block, as in the following example:
User.cache do puts User.find(:first) puts User.find(:first) puts User.find(:first) end
Check your development.log
and you should see the following entries:
Person Load (0.000821) SELECT * FROM people LIMIT 1 CACHE (0.000000) SELECT * FROM people LIMIT 1 CACHE (0.000000) SELECT * FROM people LIMIT 1
The database was queried only once. Try a similar experiment in your own console without the cache
block, and you’ll see that three separate Person Load
events are logged.
Save and delete operations result in the cache being cleared, to prevent propagation of instances with invalid states. If you find it necessary to do so for whatever reason, call the clear_query_cache
class method to clear out the query cache manually.
The log file indicates when data is being read from the query cache instead of the database. Just look for lines starting with CACHE instead of a Model Load.
Place Load (0.000420) SELECT * FROM places WHERE (places.`id` = 15749) CACHE (0.000000) SELECT * FROM places WHERE (places.`id` = 15749) CACHE (0.000000) SELECT * FROM places WHERE (places.`id` = 15749)
For performance reasons, ActiveRecord’s query cache is turned on by default for the processing of controller actions. The module SqlCache
, defined in caching.rb of ActionController, is mixed into ActionController::Base
and wraps the perform_action
method using alias_method_chain
:
module SqlCache def self.included(base) #:nodoc: base.alias_method_chain :perform_action, :caching end def perform_action_with_caching ActiveRecord::Base.cache do perform_action_without_caching end end end
The ActiveRecord query cache was purposely kept very simple. Since it literally keys cached model instances on the SQL that was used to pull them out of the database, it can’t connect multiple find
invocations that are phrased differently but have the same semantic meaning and results.
For example, “select foo from bar where id = 1” and “select foo from bar where id = 1 limit 1” are considered different queries and will result in two distinct cache entries. The active_record_context plugin[5] by Rick Olson is an example of a query cache implementation that is a little bit smarter about identity, since it keys cached results on primary keys rather than SQL statements.
The simplest way to manipulate attribute values is simply to treat your ActiveRecord object as a plain old Ruby object, meaning via direct assignment using myprop=(some_value)
There are a number of other different ways to update ActiveRecord objects, as illustrated in this section. First, let’s look at how to use the update
class method of ActiveRecord::Base
class ProjectController < ApplicationController def update Project.update(params[:id], params[:project]) redirect_to :action=>'settings', :id => project.id end def mass_update Project.update(params[:projects].keys, params[:projects].values]) redirect_to :action=>'index' end end
The first form of update
takes a single numeric id and a hash of attribute values, while the second form takes a list of ids and a list of values and is useful in scenarios where a form submission from a web page with multiple updateable rows is being processed.
The update
class method does invoke validation first and will not save a record that fails validation. However, it returns the object whether or not the validation passes. That means that if you want to know whether or not the validation passed, you need to follow up the call to update
with a call to valid?
class ProjectController < ApplicationController def update @project = Project.update(params[:id], params[:project]) if @project.valid? # uh-oh, do we want to run validate again? redirect_to :action=>'settings', :id => project.id else render :action => 'edit' end end end
A problem is that now we are calling valid?
twice, since the update
call also called it. Perhaps a better option is to use the update_attributes
instance method:
class ProjectController < ApplicationController def update @project = Project.find(params[:id] if @project.update_attributes(params[:project]) redirect_to :action=>'settings', :id => project.id else render :action => 'edit' end end end
And of course, if you’ve done some basic Rails programming, you’ll recognize that idiom since it is used in the generated scaffolding code. The update_attributes
method takes a hash of attribute values, and returns true or false depending on whether the save was successful or not, which is dependent on validation passing.
ActiveRecord has another class method useful for updating multiple records at once: update_all
. It maps closely to the way that you would think of using a SQL update..where statement. The update_all
method takes two parameters, the set part of the SQL statement and the conditions, expressed as part of a where clause. The method returns the number of records updated[5].
I think this is one of those methods that is generally more useful in a scripting context than in a controller method, but you might feel differently. Here is a quick example of how I would reassign all the Rails projects in the system to a new project manager.
Project.update_all("manager = 'Ron Campbell'", "technology = 'Rails'")
The most basic way to update an ActiveRecord object is to manipulate its attributes directly and then call save
. It’s worth noting that save
will insert a record in the database if necessary or update an existing record with the same primary key.
project = Project.find(1) project.manager = 'Brett M.' assert_equal true, project.save
The save
method will return true if it was successful or false if it failed for any reason. There is another method, save!
, that will use exceptions instead. Which one to use depends on whether you plan to deal with errors right away or delegate the problem to another method further up the chain.
It’s mostly a matter of style, although the non-bang save and update methods that return a boolean value are often used in controller actions, as the clause for an if condition:
class StoryController < ApplicationController def points @story = Story.find(params[:id]) if @story.update_attribute(:points, params[:value]) render :text => "#{@story.name} updated" else render :text => "Error updating story points" end end end
The instance methods update_attribute
and update_attributes
take one key/value pair or hash of attributes, respectively, to be updated on your model and saved to the database in one operation.
The update_attribute
method updates a single attribute and saves the record. Updates made with this method are not subjected to validation checks! In other words, this method allows you to persist an ActiveRecord model to the database even if the full object isn’t valid. According to the Rails core team that behavior is by design. Internally, this method does exactly the same as model.attribute = some_value
and then model.save(false)
.
On the other hand, update_attributes
is subject to validation checks and is often used on update actions and passed the params hash containing updated values.
Rails provides a number of convenience update methods in the form of increment
, decrement
, and toggle
, which do exactly what their names suggest with numeric and boolean attributes. Each has a bang variant (such as toggle!
) that additionally invokes save after modifying the attribute.
Constructors and update methods that take hashes to do mass assignment of attribute values are susceptible to misuse by hackers when they are used in conjunction with parameter hashes available in a controller method.
When you have attributes in your ActiveRecord class that you want to protect from inadvertent or mass assignment, use one of the following two class methods to control access to your attributes:
The attr_accessible
method takes a list of attributes that will be accessible for mass assignment. This is the more conservative choice for mass-assignment protection.
If you’d rather start from an all-open default and restrict attributes as needed, then use attr_protected
. Attributes passed to this method will be protected from mass-assignment. Their assignment will simply be ignored. You will need to use direct assignment methods to assign values to those attributes, as illustrated in the following code example:
class Customer < ActiveRecord::Base attr_protected :credit_rating end customer = Customer.new(:name => "Abe", :credit_rating => "Excellent") customer.credit_rating # => nil customer.attributes = { "credit_rating" => "Excellent" } customer.credit_rating # => nil # and now, the allowed way to set a credit_rating customer.credit_rating = "Average" customer.credit_rating # => "Average"
Finally, if you want to remove a record from your database, you have two choices. If you already have a model instance, you can destroy it:
>> bad_timesheet = Timesheet.find(1) >> bad_timesheet.destroy => #<Timesheet:0x2481d70 @attributes={"updated_at"=>"2006-11-21 05:40:27", "id"=>"1", "user_id"=>"1", "submitted"=>nil, "created_at"=> "2006-11-21 05:40:27"}>
The destroy
method will both remove it from the database and freeze it (make it read-only) so you won’t be able to save it again:
>> bad_timesheet.save TypeError: can't modify frozen hash from activerecord/lib/active_record/base.rb:1965:in `[]='
Alternatively, you can call destroy
and delete
as class methods, passing the id(s) to delete. Both variants accept a single parameter or array of ids:
Timesheet.delete(1) Timesheet.destroy([2, 3])
The naming might seem inconsistent, but it isn’t. The delete
method uses SQL directly and does not load any instances (hence it is faster). The destroy
method does load the instance of the ActiveRecord object and then calls destroy
on it as an instance method. The semantic differences are subtle, but come into play when you have assigned before_destroy
callbacks or have dependent associations—child objects that should be deleted automatically along with their parent object.
Locking is a term for techniques that prevent concurrent users of an application from overwriting each other’s work. ActiveRecord doesn’t normally use any type of database locking when loading rows of model data from the database. If a given Rails application will only ever have one user updating data at the same time, then you don’t have to worry about locking.
When more than one user may be accessing and updating the same data simultaneously, then it is vitally important for you as the developer to think about concurrency. Ask yourself, what types of collisions or race conditions could happen if two users were to try to update a given model at the same time?
There are a number of approaches to dealing with concurrency in database-backed applications, two of which are natively supported by ActiveRecord: optimistic and pessimistic locking. Other approaches exist, such as locking entire database tables. Every approach has strengths and weaknesses, so it is likely that a given application will use a combination of approaches for maximum reliability.
Optimistic locking describes the strategy of detecting and resolving collisions if they occur, and is commonly recommended in multi-user situations where collisions should be infrequent. Database records are never actually locked in optimistic locking, making it a bit of a misnomer.
Optimistic locking is a fairly common strategy, because so many applications are designed such that a particular user will mostly be updating with data that conceptually belongs to him and not other users, making it rare that two users would compete for updating the same record. The idea behind optimistic locking is that since collisions should occur infrequently, we’ll simply deal with them only if they happen.
If you control your database schema, optimistic locking is really simple to implement. Just add an integer column named lock_version to a given table, with a default value of zero.
class AddLockVersionToTimesheets < ActiveRecord::Migration def self.up add_column :timesheets, :lock_version, :integer, :default => 0 end def self.down remove_column :timesheets, :lock_version end end
Simply adding that lock_version
column changes ActiveRecord’s behavior. Now if the same record is loaded as two different model instances and saved differently, the first instance will win the update, and the second one will cause an ActiveRecord::StaleObjectError
to be raised.
We can illustrate optimistic locking behavior with a simple unit test:
class TimesheetTest < Test::Unit::TestCase fixtures :timesheets, :users def test_optimistic_locking_behavior first_instance = Timesheet.find(1) second_instance = Timesheet.find(1) first_instance.approver = users(:approver) second_instance.approver = users(:approver2) assert first_instance.save, "First instance save succeeded" assert_raises ActiveRecord::StaleObjectError do second_instance.save end end end
The test passes, because calling save
on the second instance raises the expected ActiveRecord::StaleObjectError
exception. Note that the save
method (without the bang) returns false and does not raise exceptions if the save fails due to validation, but other problems such as locking in this case, can indeed cause it to raise exceptions.
To use a database column named something other than lock_version change the setting using set_locking_column
. To make the change globally, add the following line to environment.rb:
ActiveRecord::Base.set_locking_column 'alternate_lock_version'
Like other ActiveRecord settings, you can also change it on a per-model basis with a declaration in your model class:
class Timesheet < ActiveRecord::Base set_locking_column 'alternate_lock_version' end
Now of course, after adding optimistic locking, you don’t want to just leave it at that, or the end user who is on the losing end of the collision would simply see an application error screen. You should try to handle the StaleObjectError
as gracefully as possible.
Depending on the criticality of the data being updated, you might want to spend a lot of time crafting a user-friendly solution that somehow preserves the changes that the loser was trying to make. At minimum, if the data for the update is easily re-creatable, let the user know why their update failed with controller code that looks something like the following:
def update begin @timesheet = Timesheet.find(params[:id]) @timesheet.update_attributes(params[:timesheet]) # redirect somewhere rescue ActiveRecord::StaleObjectError flash[:error] = "Timesheet was modified while you were editing it." redirect_to :action => 'edit', :id => @timesheet end end
There are some advantages to optimistic locking. It doesn’t require any special feature in the database, and it is fairly easy to implement. As you saw in the example, very little code is required to handle the StaleObjectError
.
The disadvantages to optimistic locking are mainly that update operations are a bit slower because the lock version must be checked, and there is the potential for a bad user experience, since they don’t find out about the failure until after they’ve submitted potentially painful-to-lose data.
Pessimistic locking requires special database support (built into the major databases) and locks down specific database rows during an update operation. It prevents another user from reading data that is about to be updated, in order to prevent them from working with stale data.
Pessimistic locking is a fairly new addition to Rails, and works in conjunction with transactions as in the following example:
Timesheet.transaction do t = Timesheet.find(1, :lock=> true) t.approved = true t.save! end
It’s also possible to call lock!
on an existing model instance, which simply calls reload(:lock => true)
under the covers. You wouldn’t want to do that on an instance with attribute changes since it would cause them to be discarded by the reload.
Pessimistic locking takes place at the database level. The SELECT
statement generated by ActiveRecord will have a FOR UPDATE
(or similar) clause added to it, causing all other connections to be blocked from access to the rows returned by the select statement. The lock is released once the transaction is committed. There are theoretically situations (Rails process goes boom mid-transaction?!) where the lock would not be released until the connection is terminated or times out.
Web applications scale best with optimistic locking, which as we’ve discussed doesn’t really use any locking at all. However, you have to add application logic to handle failure cases. Pessimistic locking is a bit easier to implement, but can lead to situations where one Rails process is waiting on another to release a database lock, that is, waiting and not serving any other incoming requests. Remember that Rails processes are single-threaded.
In my opinion, pessimistic locking should not be super dangerous as it is on other platforms, since in Rails we don’t ever persist database transactions across more than a single HTTP request. In fact, I’m pretty sure it would be impossible to do that given the shared-nothing architecture.
A situation to be wary of would be one where you have many users competing for access to a particular record that takes a long time to update. For best results, keep your pessimistic-locking transactions small and make sure that they execute quickly.
In our first review of ActiveRecord’s find
method, we didn’t look at the wealth of options available in addition to finding by primary key and the :first
and :all
keyword parameters.
It’s very common to need to filter the result set of a find operation (just a SQL SELECT under the covers) by adding conditions (to the WHERE clause). ActiveRecord gives you a number of ways to do just that in the options hash optionally passed to the find
method.
Conditions are specified in the options hash as :conditions
and can be specified as a string, array, or hash representing the WHERE-part of a SQL statement. The array form should be used when the input data is coming from the outside world, a web form for instance, and should be sanitized prior to being sent to the database. Insecure, outside data is called tainted.
The simple string form can be used for statements that don’t involve tainted data. Finally, the hash form works much like the array form, except only equality is possible. If all you need is equality, versus, say LIKE criteria, I advise you to use the hash notation, since it’s safe and arguably the most readable of the bunch.
The Rails API docs examples do a pretty good job illustrating usage of the :conditions
option:
class User < ActiveRecord::Base def self.authenticate_unsafely(login, password) find(:first, :conditions => "login='#{login}' AND password='#{password}'") end def self.authenticate_safely(login, password) find(:first, :conditions => ["login= ? AND password= ?", login, password]) end def self.authenticate_safely_simply(login, password) find(:first, :conditions => {:login => login, :password => password}) end end
The authenticate_unsafely
method inserts the parameters directly into the query and is thus susceptible to SQL-injection attacks if the user_name and password parameters come directly from a HTTP request. A malicious end user could supply his own evil SQL query inside the string that was intended to just be a login or password.
The authenticate_safely
and authenticate_safely_simply
methods-both will sanitize the user_name and password before inserting them in the query, which will ensure that an attacker can’t escape the query and fake the login (or worse).
When using multiple parameters in the conditions, it can easily become hard to read exactly what the fourth or fifth question mark is supposed to represent. In those cases, you can resort to named bind variables instead. That’s done by replacing the question marks with symbols and supplying a hash with values for the matching symbol keys.
Again, the Rails API docs give us a pretty good example (modified for brevity):
Company.find(:first, [ " name = :name AND division = :div AND created_at > :date", {:name => "37signals", :div => "First", :date => '2005-01-01' } ])
During a quick discussion on IRC about this final form, Robby Russell gave me the following clever snippet:
:conditions => ['subject LIKE :foo OR body LIKE :foo', {:foo => 'woah'}]
In other words, when you’re using named placeholders (versus question mark characters) you can use the same bind variable more than once. Cool!
Simple hash conditions like this are very common and useful:
:conditions => {:login => login, :password => password})
They will only generate conditions based on equality with SQL’s AND
operator. If you want something other than AND,
you’ll have to use one of the other forms available.
It’s particularly important to take care in specifying conditions that include boolean values. Databases have various different ways of representing boolean values in columns. Some have native boolean datatypes, and others use a single character, often ‘1’ and ‘0’ or ‘T’ and ‘F’ (or even ‘Y’ and ‘N’).
Rails will transparently handle the data conversion issues for you if you use arrayed or hash conditions and use a Ruby boolean value as your parameter:
Timesheet.find(:all, :conditions => ['submitted=?', true])
The :order
option takes a fragment of SQL specifying the ordering of columns:
Timesheet.find(:all, :order => 'created_at desc')
The SQL spec defaults to ascending order if the ascending/descending option is omitted.
The value of the :order
option is not validated by Rails, which means you can pass any code that is understood by the underlying database, not just column/direction tuples. An example of why that is useful is when wanting to fetch a random record:
# MySQL Timesheet.find(:first, :order => 'RAND()') # Postgres Timesheet.find(:first, :order => 'RANDOM()') # Microsoft SQL Server Timesheet.find(:first, :order => 'NEWID()') # Oracle Timesheet.find(:first, :order => 'dbms_random.value')
Remember that ordering large datasets randomly is known to perform terribly on most databases, particularly MySQL.
The :limit
parameter takes an integer value establishing a limit on the number of rows to return from the query. The :offset
parameter specifies the offset from where the rows should be fetched in the result set and is 1-indexed. Together these options are used for paging results.
For example, a find for the second page of 10 results in a list of timesheets is:
Timesheet.find(:all, :limit => 10, :offset => 11)
Depending on the particulars of your application’s data model, it may make sense to always put some limit on the maximum amount of ActiveRecord objects fetched in any one specific query. Letting the user trigger unbounded queries pulling thousands of ActiveRecord objects into Rails at one time is a recipe for disaster.
By default, the :select
option is ‘*’ as in SELECT * FROM
, but it can be changed if, for example, you want to do a join, but not include the joined columns. Or you might want to include calculated columns in the result set:
>> b = BillableWeek.find(:first, :select => "monday_hours + tuesday_hours + wednesday_hours as three_day_total") => #<BillableWeek:0x2345fd8 @attributes={"three_day_total"=>"24"}>
When using :select
, as in the preceding example, keep in mind that columns not specified in the query, whether by * or explicitly, will not be populated in the resulting objects! So, for instance, continuing the preceding example, trying to access monday_hours
on b
has unexpected results:
>> b.monday_hours NoMethodError: undefined method `monday_hours' for #<BillableWeek:0x2336f74 @attributes={"three_day_total"=>"24"}> from activerecord/lib/active_record/base.rb:1850:in `method_missing' from (irb):38
To get the object’s normal columns plus the calculated column, add a *,
to the :select
parameter:
:select => '*, monday_hours + tuesday_hours + wednesday_hours as three_day_total'
The :from
option specifies the table name portion of the generated SQL statement. You can provide a custom value if you need to include extra tables for joins, or reference a database view.
Here’s an example of usage from an application that features tagging:
def find_tagged_with(list) find(:all, :select => "#{table_name}.*", :from => "#{table_name}, tags, taggings", :conditions => ["#{table_name}.#{primary_key}=taggings.taggable_id and taggings.taggable_type = ? and taggings.tag_id = tags.id and tags.name IN (?)", name, Tag.parse(list)]) end
If you’re wondering why table_name
is used instead of a an explicit value, it’s because this code is mixed into a target class using Ruby modules. That subject is covered in Chapter 9, “Advanced ActiveRecord
.”
An attribute name by which the result should be grouped. Uses the GROUP BY SQL-clause. Generally you’ll want to combine :group with the :select
option, since valid SQL requires that all selected columns in a grouped SELECT be either aggregate functions or columns.
>> users = Account.find(:all, :select => 'name, SUM(cash) as money', :group => 'name') => [#<User:0x26a744 @attributes={"name"=>"Joe", "money"=>"3500"}>, #<User:0xaf33aa @attributes={"name"=>"Jane", "money"=>"9245"}>]
Keep in mind that those extra columns you bring back are strings—ActiveRecord doesn’t try to typecast them. You’ll have to use to_i
and to_f
to explicitly convert to numeric types.
>> users.first.money > 1_000_000 ArgumentError: comparison of String with Fixnum failed from (irb):8:in '>'
Specifying the :lock => true
option on a find operation, within the scope of a transaction, establishes an exclusive lock on the rows selected. This option is covered in detail earlier in this chapter, in the section “Database Locking”.
The :joins
option can be useful when you’re performing GROUP BY and aggregating data from other tables, but you don’t want to load the associated objects.
Buyer.find(:all, :select => 'buyers.id, count(carts.id) as cart_count', :joins => 'left join carts on carts.buyer_id=buyers.id', :group => 'buyers.id')
However, the most common usages of the :joins
and :include
options are to allow you to eager-fetch additional objects in a single SELECT statement. We cover the subject in Chapter 7.
Specifying the :readonly => true
option marks returned objects as read-only. You can change their attributes, you just can’t save them back to the database.
>> c = Comment.find(:first, :readonly => true) => #<Comment id: 1, body: "Hey beeyotch!"> >> c.body = "Keep it clean!" => "Keep it clean!" >> c.save ActiveRecord::ReadOnlyRecord: ActiveRecord::ReadOnlyRecord from /vendor/rails/activerecord/lib/active_record/base.rb:1958
Connections are usually created through ActiveRecord::Base.establish_connection
and retrieved by ActiveRecord::Base.connection
. All classes inheriting from ActiveRecord::Base
will use this connection. What if you want some of your models to use a different connection? ActiveRecord allows you to add class-specific connections.
For example, let’s say you have a subclass of ActiveRecord::Base
named LegacyProject
with data residing in a database apart from the one used by the rest of your Rails application. Begin by adding details for the additional database under its own key in database.yml
. Then call LegacyProject.establish_connection
to make LegacyProject and all its subclasses use the alternate connection instead.
Incidentally, to make this example work, you must specify self.abstract_class = true
in the class context. Otherwise, Rails considers the subclasses of LegacyProject
to be using single-table inheritance (STI), which we discuss at length in Chapter 9.
class LegacyProject < ActiveRecord::Base establish_connection :legacy_database self.abstract_class = true ... end
The establish_connection
method takes a string (or symbol) key pointing to a configuration already defined in database.yml
. Alternatively, you can pass it a literal hash of options, although it seems kind of messy to put this sort of configuration data right into your model file instead of database.yml
class TempProject < ActiveRecord::Base establish_connection(:adapter => 'sqlite3', :database => ':memory:') ... end
Rails keeps database connections in a connection pool inside the ActiveRecord::Base
class instance. The connection pool is simply a Hash
object indexed by ActiveRecord class. During execution, when a connection is needed, the retrieve_connection
method walks up the class-hierarchy until a matching connection is found.
It is possible to use ActiveRecord’s underlying database connections directly, and sometimes it is useful to do so from custom scripts and for one-off or ad-hoc testing. Access the connection using the connection
attribute of any ActiveRecord class. If all your models use the same connection, then use the connection attribute of ActiveRecord::Base
.
The most basic operation that can be done with a connection is simply an execute
, from the DatabaseStatements
module (detailed in the following section). For example, Listing 6.2 shows a method that executes a SQL file statement by statement.
The ActiveRecord::ConnectionAdapters::DatabaseStatements
module mixes a number of useful methods into the connection object that make it possible to work with the database directly instead of using ActiveRecord models. I’ve purposely left out some of the methods of this module (such as add_limit!
and add_lock
) because they are used internally by Rails to construct SQL statements dynamically and I don’t think they’re of much use to application developers.
Begins a database transaction manually (and turns off ActiveRecord’s default autocommitting behavior).
Commits the transaction (and turns on ActiveRecord’s default autocommitting behavior again).
Executes a SQL DELETE statement provided and returns the number of rows affected.
Executes the SQL statement provided in the context of this connection. This method is abstract in the DatabaseStatements
module and is overridden by specific database adapter implementations. As such, the return type is a result set object corresponding to the adapter in use.
Executes an SQL INSERT statement and returns the last autogenerated ID from the affected table.
Used in Oracle and Postgres; updates the named sequence to the maximum value of the specified table’s column.
Rolls back the currently active transaction (and turns on auto-committing). Called automatically when a transaction block raises an exception or returns false.
Returns an array of record hashes with the column names as keys and column values as values.
ActiveRecord::Base.connection.select_all("select name from businesses order by rand() limit 5") => [{"name"=>"Hopkins Painting"}, {"name"=>"Whelan & Scherr"}, {"name"=>"American Top Security Svc"}, {"name"=>"Life Style Homes"}, {"name"=>"378 Liquor Wine & Beer"}]
Works similarly to select_all
, but returns only the first row of the result set, as a single Hash with the column names as keys and column values as values. Note that this method does not add a limit clause to your SQL statement automatically, so consider adding one to queries on large datasets.
>> ActiveRecord::Base.connection.select_one("select name from businesses order by rand() limit 1") => {"name"=>"New York New York Salon"}
Works just like select_one
, except that it returns a single value: the first column value of the first row of the result set.
>> ActiveRecord::Base.connection.select_value("select * from businesses order by rand() limit 1") => "Cimino's Pizza"
Works just like select_value
, except that it returns an array of the values of the first column in all the rows of the result set.
>> ActiveRecord::Base.connection.select_values("select * from businesses order by rand() limit 5") => ["Ottersberg Christine E Dds", "Bally Total Fitness", "Behboodikah, Mahnaz Md", "Preferred Personnel Solutions", "Thoroughbred Carpets"]
The full list of methods available on connection
, which returns an instance of the underlying database adapter, is fairly long. Most of the Rails adapter implementations define their own custom versions of these methods, which makes sense, since all databases have slight variations in how they handle SQL and very large variations in how they handle extended commands, such as for fetching metadata.
A peek at abstract_adapter.rb shows us the default method implementations:
... # Returns the human-readable name of the adapter. Use mixed case - one # can always use downcase if needed. def adapter_name 'Abstract' end # Does this adapter support migrations? Backend specific, as the # abstract adapter always returns +false+. def supports_migrations? false end # Does this adapter support using DISTINCT within COUNT? This is +true+ # for all adapters except sqlite. def supports_count_distinct? true end ...
In the following list of method descriptions and code samples, I’m accessing the connection of our sample time_and_expenses application in the Rails console, and I’ve assigned connection
to a local variable named conn
, for convenience.
Returns the human-readable name of the adapter, as in the following example:
>> conn.adapter_name => "SQLite"
Closes the active connection or closes and opens a new one in its place, respectively.
Provides access to the underlying database connection. Useful for when you need to execute a proprietary statement or you’re using features of the Ruby database driver that aren’t necessarily exposed in ActiveRecord. (In trying to come up with a code sample for this method, I was able to crash the Rails console with ease—there isn’t much in the way of error checking for exceptions that you might raise while mucking around with raw_connection
.)
Indicates whether the adapter supports using DISTINCT within COUNT in SQL statements. This is true
for all adapters except SQLite, which therefore requires a workaround when doing operations such as calculations.
Produces a list of tables in the underlying database schema. It includes tables that aren’t usually exposed as ActiveRecord models, such as schema_info
and sessions
.
>> conn.tables => ["schema_info", "users", "timesheets", "expense_reports", "billable_weeks", "clients", "billing_codes", "sessions"]
In addition to the configuration options used to instruct ActiveRecord on how to handle naming of tables and primary keys, there are a number of other settings that govern miscellaneous functions. Set them in config/environment.rb.
ActiveRecord::Base.colorize_logging
. Tells Rails whether or not to use ANSI codes to colorize the logging statements committed by the ActiveRecord connection adapter. The colors make it much easier to read the logs (except on Windows) and may complicate matters if you use software like syslog. Defaults to true
. Change to false
if you view your logs with software that doesn’t understand the ANSI color codes.
Here’s a snippet of log output with the ANSI codes visible:
^[[4;36;1mSQL (0.000000)^[[0m ^[[0;1mMysql::Error: Unknown table 'expense_reports': DROP TABLE expense_reports^[[0m ^[[4;35;1mSQL (0.003266)^[[0m ^[[0mCREATE TABLE expense_reports (`id` int(11) DEFAULT NULL auto_increment PRIMARY KEY, `user_id` int(11)) ENGINE=InnoDB^[[0m
ActiveRecord::Base.default_timezone
. Tells Rails whether to use Time.local
(using :local
) or Time.utc
(using :utc
) when pulling dates and times from the database. Defaults to :local
ActiveRecord::Base.allow_concurrency
. Determines whether or not to create a database connection for each thread, or to use a single shared connection for all threads. Defaults to false
and given the number of warnings and horror-stories[6][7] that follow any mention of this option online, it would be prudent to leave it in its default setting. Setting this option to true
is known to cause excessive use of database connections.
ActiveRecord::Base.generate_read_methods
. Determines whether to speed up access by generating optimized reader methods to avoid expensive calls to method_missing when accessing attributes by name. Defaults to true
.
ActiveRecord::Base.schema_format
. Specifies the format to use when dumping the database schema with certain default rake tasks. Use the :sql
option to have the schema dumped as potentially database-specific SQL statements. Just beware of incompatibilities if you’re trying to use the :sql
option with different databases for development and testing.
The default option is :ruby
, which dumps the schema as an ActiveRecord::Schema
file that can be loaded into any database that supports migrations.
This chapter covered the fundamentals of ActiveRecord, the framework included with Ruby on Rails for creating database-bound model classes. We’ve learned how ActiveRecord expresses the convention over configuration philosophy that is such an important part of the Rails way, and how to make settings manually, which override the conventions in place.
We’ve also looked at the methods provided by ActiveRecord::Base
, the parent class of all persistent models in Rails, which include everything you need to do basic CRUD operations: Create, Read, Update, and Delete. Finally, we reviewed how to drill through ActiveRecord to use the database connection whenever you need to do so.
In the following chapter, we continue our coverage of ActiveRecord by learning about how related model objects interact via associations.
1. | http://jayfields.blogspot.com/2006/12/rails-migrations-with-large-team-part.html |
2. | If you find it annoying that strings and symbols are used pretty much interchangeably throughout Rails, welcome to the club. |
3. | A recommended open-source Money class is available at http://dist.leetsoft.com/api/money/. |
4. | Sanitization prevents SQL injection attacks. For more information about SQL injection and Rails see http://www.rorsecurity.info/2007/05/19/sql-injection/. |
5. | http://activereload.net/2007/5/23/spend-less-time-in-the-database-and-more-time-outdoors. |
6. | Microsoft’s ADO library doesn’t support reporting back the number of affected rows, so |
7. | Read http://permalink.gmane.org/gmane.comp.lang.ruby.mongrel.general/245 for Zed Shaw’s explanation of the dangers of |
3.138.36.38