Get an Access Database of Player and Team Statistics

Get a free database of historical baseball data from the Internet (covering every major league game from 1871 through today) in Microsoft Access format.

Suppose you want to know the average ERA for your fantasy league. Or maybe you want to settle a bet with a friend about which players got on base most often during the 1970s. Or perhaps you want to show that Jim Thome was a clutch hitter for the Phillies in 2004. To do this, you’ll need to find some statistics.

Sometimes you can find the statistics you want from MLB.com, the Baseball Reference, or the Baseball Prospectus (see “Follow the Game Online” [Hack #5] for some suggestions), but other times you won’t find exactly what you want. You might be able to find this information online by spending hours tediously searching for the raw data, cutting and pasting the data into a spreadsheet, and producing the stats you want in the form you want.

I think that it’s often easier to find the stats you want if you have your own database. This hack shows you the easiest way I know to get a database of baseball players. These databases include the total statistics, by year, for each baseball player. Later in this book, I’ll show how to find records by game and play-by-play information. I’ll also show how to get data in two common database formats: Microsoft Access and MySQL, the popular (and free!) open source database.

The information in these databases is identical, so you can pick whichever format is easiest for you. (When I wrote this book, I used a MySQL database containing the Baseball DataBank information. If you want to follow along with all the hacks in this book, you’ll probably find it easiest to use the MySQL version. However, I do include some tips for using Microsoft Access. The idea of this book is to do things the easy way: if Microsoft Access is the easiest tool for you, go ahead and use it.)

A Player and Team Statistics Database for Microsoft Access

If you are using Microsoft Windows and have Microsoft Access, you will probably find this the easiest place to start. You can download the file from the Baseball Archive web site at http://www.baseball1.com. (Notice the number 1 at the end of the word baseball in the URL.)

Step 1: Download the file.

The Baseball Archive web site (http://www.baseball1.com) distributes a ready-to-use database in Microsoft Access format. You are free to use this database for noncommercial use, but the authors request a donation. (Again, notice the 1 in the domain name. If that doesn’t work, just type the words “baseball archive” in Google and click the I’m Feeling Lucky button. I can’t promise it will work by the time this book is published, but it gives me the right result now.)

Tip

Sean Lahman originally developed, and still maintains, this database. This is worth mentioning, because the database files have names like lahman51.mdb and because people sometimes refer to this as “the Lahman database.”

You can download the current version of the database from http://baseball1.com/statistics. If you are using Microsoft Access 2000 or later, download the version for Access 2000; otherwise, download the version for Access 97.

Step 2: Decompress and save the file.

The file is distributed as a single zipped file. You need to decompress this file and copy the contents to a local drive to use the database. On a Windows machine, you can use whatever utility you like (as of Windows XP, a zipped folders application is included). I decompressed this to my desktop.
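If you would rather script this step than use a zip utility, here is a minimal sketch in Python using the standard zipfile module. The archive name lahman51.zip and the destination folder are assumptions for illustration; substitute whatever your download is actually called.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir):
    """Extract every member of a zip archive and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()


if __name__ == "__main__":
    # Hypothetical file name: use the name of the archive you downloaded.
    archive_path = Path("lahman51.zip")
    if archive_path.exists():
        for name in extract_archive(archive_path, Path.home() / "Desktop" / "lahman"):
            print(name)
```

The function returns the list of extracted names, so you can confirm that the .mdb file is among them before moving on.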

Step 3: Open the database file.

You can now open the database file. Double-click the icon (for Version 5.1, it is called lahman51.mdb). You will see a screen like the one shown in Figure 2-1.

Step 4: Test the database.

As a quick test, try opening the 500 HR Club query. You should see a table showing all players with 500 or more home runs over the course of their careers. If so, everything is fine and you’re ready to start.
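To give a feel for what a query like this does, here is a rough sketch of the same idea: add up each player's home runs across all of his Batting rows and keep the players at or above 500. This is not the query that ships with the database. It uses Python's built-in sqlite3 module as a stand-in for Access, it assumes the home run column is named HR, and the rows are invented placeholders rather than real statistics.

```python
import sqlite3

# Invented rows (playerID, yearID, stint, teamID, HR); not real statistics.
SAMPLE_BATTING = [
    ("sluggA01", 1990, 1, "AAA", 300),
    ("sluggA01", 1991, 1, "AAA", 250),
    ("sluggB01", 1990, 1, "BBB", 499),
    ("sluggC01", 1990, 1, "AAA", 200),
    ("sluggC01", 1990, 2, "BBB", 300),
]


def career_hr_club(rows, threshold=500):
    """Return (playerID, career HR) for players at or above the threshold."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Batting ("
        "playerID TEXT, yearID INTEGER, stint INTEGER, teamID TEXT, HR INTEGER)"
    )
    conn.executemany("INSERT INTO Batting VALUES (?, ?, ?, ?, ?)", rows)
    # One row per player: sum HR over every season, team, and stint.
    result = conn.execute(
        "SELECT playerID, SUM(HR) AS careerHR "
        "FROM Batting "
        "GROUP BY playerID "
        "HAVING SUM(HR) >= ? "
        "ORDER BY careerHR DESC, playerID",
        (threshold,),
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for player_id, career_hr in career_hr_club(SAMPLE_BATTING):
        print(player_id, career_hr)
```

The GROUP BY matters because the Batting table holds one row per player, team, season, and stint, so a career total has to be summed over many rows.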

Figure 2-1. Lahman database

The Contents of the Database

Version 5.2 of the Baseball Archive database currently contains 21 tables and includes statistics through the 2004 season. Here is a short description of these tables:

Master

This table contains biographical information about each player, including their full name, birth date, country of origin, and batting and throwing hands. Each player has a unique ID (called playerID) that is referenced from other tables in the database.

Batting

This table contains batting statistics for each player, on each team, during each season. Rows are uniquely identified by playerID, teamID, yearID, and stint.

Fielding

This table contains fielding statistics for each player, on each team, during each season. Rows are uniquely identified by playerID, teamID, yearID, and stint.

FieldingOF

This table tells you how much time outfielders spent in each fielding position. Rows are uniquely identified by playerID, yearID, and stint.

Pitching

This table contains pitching statistics for each player, on each team, during each season. Rows are uniquely identified by playerID, teamID, yearID, and stint.

Teams

This table contains information on each team for each season, including aggregate batting statistics, pitching statistics, the team record, and post-season performance. Rows are uniquely identified by yearID and teamID.

TeamsHalf

This table shows win-loss records for each team, midway through each season.

TeamsFranchises

This table includes the full name of each team, indexed by the franchID field.

Allstar, AwardsVotes, AwardsWinners, AwardsShareManager, AwardsSharePlayers, Managers, ManagersHalf, Salaries, Transactions, HallOfFame, BattingPost, PitchingPost, and FieldingPost

These tables contain what you would expect; I don’t use any of these in this book, so I’m not going to describe them in depth.
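The keys described above suggest how the tables fit together: playerID links Batting back to Master, and a player traded in midseason has more than one stint in the same year. Here is a small sketch of that relationship, again using Python's sqlite3 module as a stand-in for Access. The column names nameFirst, nameLast, and HR are assumptions, and every row is an invented placeholder.

```python
import sqlite3

# Invented placeholder rows; not real players or statistics.
SAMPLE_MASTER = [
    ("exampa01", "Pat", "Example"),
    ("samplb01", "Sam", "Sample"),
]
SAMPLE_BATTING = [
    ("exampa01", 2004, 1, "AAA", 10),  # first stint of the season
    ("exampa01", 2004, 2, "BBB", 5),   # second stint, after changing teams
    ("samplb01", 2004, 1, "AAA", 7),
]


def season_totals(master_rows, batting_rows):
    """Return (full name, yearID, number of stints, total HR) per player-season."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Master ("
        "playerID TEXT PRIMARY KEY, nameFirst TEXT, nameLast TEXT)"
    )
    conn.execute(
        "CREATE TABLE Batting ("
        "playerID TEXT, yearID INTEGER, stint INTEGER, teamID TEXT, HR INTEGER, "
        "PRIMARY KEY (playerID, yearID, stint, teamID))"
    )
    conn.executemany("INSERT INTO Master VALUES (?, ?, ?)", master_rows)
    conn.executemany("INSERT INTO Batting VALUES (?, ?, ?, ?, ?)", batting_rows)
    # Join on playerID, then collapse the stints within each season.
    result = conn.execute(
        "SELECT m.nameFirst || ' ' || m.nameLast, b.yearID, COUNT(*), SUM(b.HR) "
        "FROM Batting AS b "
        "JOIN Master AS m ON m.playerID = b.playerID "
        "GROUP BY b.playerID, m.nameFirst, m.nameLast, b.yearID "
        "ORDER BY b.playerID, b.yearID"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in season_totals(SAMPLE_MASTER, SAMPLE_BATTING):
        print(row)
```

Grouping by playerID and yearID folds the two stints into a single season line, which is usually what you want when comparing players.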
