Archive for December, 2007

XML Box Scores Now Available

It’s been a little while, since I was waiting on the arrival of a new laptop, but my stats parser for generating NFL box scores in XML format is finally ready to be “beta-tested.” By that I mean I have XML box scores for the entire 2006 season, which I’m making available for download. Given the sheer volume of statistics, there’s no way I can check the accuracy of every file, but I have run a few tests over all of the games for the season and came up with the same season leaders in multiple categories as nfl.com lists on its web site.

Unfortunately, WordPress doesn’t allow me to upload either zip files or xml files, so I’ve uploaded them to a third-party site, mediafire.com, where they can be found at this link: http://www.mediafire.com/?23mtrjxwfmy . The file unzips to a folder containing one folder for each of the 17 weeks of the 2006 season, and each week folder contains all of the games played that week. The format is very similar to the one mentioned here, NFL Box Score XML Format, with a few additions taken from the gamebook for each game.
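For anyone who wants to poke at the files programmatically, here is a rough Python sketch of walking that folder layout. It only assumes what is described above (a top-level folder, one sub-folder per week, XML files inside each); the function name load_box_scores is my own, and it makes no assumptions about the week folder names or about the elements inside each file.

```python
from collections import defaultdict
from pathlib import Path
import xml.etree.ElementTree as ET


def load_box_scores(season_dir):
    """Parse every .xml file under season_dir, grouped by its parent folder.

    Returns a dict mapping the week folder's name (whatever the zip
    happens to call it) to a list of parsed XML root elements.
    """
    games_by_week = defaultdict(list)
    for xml_path in sorted(Path(season_dir).rglob("*.xml")):
        week = xml_path.parent.name
        games_by_week[week].append(ET.parse(xml_path).getroot())
    return dict(games_by_week)
```

Calling load_box_scores on the unzipped folder and printing the number of games per week is a quick sanity check that nothing went missing in the download.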

If you download the files and look at the xml files, please leave a comment to let me know what you think of the format, or if there are any improvements you’d like to see. I’m planning on using the 2006 season as a “test run” while I work on a gamebook parser that will be included in the 2007 season files (and later used to improve the 2006 files). Thanks, and enjoy!

NFL “Official” Analysis

No, no, I’m not making any claims to be the best or official analyzer for the NFL. Rather, I was wondering whether anyone has ever attempted some sort of analysis of the officiating crews for NFL games. With all the hoopla and conspiracy theories about whether the NFL wants the Patriots to continue their undefeated season, I thought it would be interesting to see if there is any correlation between certain officials reffing certain games and how those games turn out. Then I realized there would probably not be enough data to see how a crew performs on a team-by-team basis. There would, however, be enough data to compare crews against one another over the season in terms of how many penalties each one calls. Once I finish my gamebook parser with the play-by-play, I’ll also know how many of each type of penalty was called, so you could find the crews with the most holding calls, pass interference calls, etc.
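To make the crew-to-crew comparison concrete, here is a minimal Python sketch of the kind of tally I have in mind. It assumes the data has already been boiled down to one (crew, penalties called) pair per game; that input shape and the function names are my own invention for illustration, not anything that comes out of the parser yet.

```python
from collections import defaultdict
from statistics import mean


def penalties_per_game_by_crew(games):
    """Average penalties called per game for each crew.

    games is an iterable of (crew, penalties_called) pairs, one per game.
    """
    counts = defaultdict(list)
    for crew, penalties in games:
        counts[crew].append(penalties)
    return {crew: mean(values) for crew, values in counts.items()}


def crews_vs_league(games):
    """Each crew's per-game average minus the league-wide per-game average."""
    games = list(games)
    league_avg = mean(penalties for _, penalties in games)
    per_crew = penalties_per_game_by_crew(games)
    return {crew: avg - league_avg for crew, avg in per_crew.items()}
```

A positive number from crews_vs_league means a crew throws more flags per game than the league as a whole, and a negative number means fewer. The same tally would work per penalty type once the play-by-play data is available.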

I’m still working on gathering all the data for this (I have the parsing program working for certain sections, but now I have the tedious task of downloading all the pdfs and converting them to html for all the games) but once I have it, can anyone suggest the types of regressions I should perform to try to discover something useful? I have some background in statistics, but not as much as I’d like, though I’m definitely willing to learn. I’m planning on including the officials data in my xml file of the box score for the game, so everyone else can use that data as well.

Bills’ Playoff Forecast

So it’s time that I let you all in on the disappointing secret that I am a Buffalo Bills fan. In case you were wondering, from childhood I selected sports franchises that were doomed to let me down – the Bills, Buffalo Sabres, and Chicago Cubs – and even my college team, NC State. Now that I have divulged that embarrassing piece of information, I’ll let you in on a little pastime of mine that comes around the second week of December. I begin the long, drawn-out process of figuring out the exact scenarios necessary for my dreams of the Bills making the playoffs to come true.

Here we are again – Buffalo teetering on mediocrity at 6-6 with 4 games to go: Miami, Cleveland, New York Giants, and Philly. I consider the first two, Miami and Cleveland, to be “must-win” for all intents and purposes. Miami, because it is a conference game and an important one for the Bills to gain some momentum, and Cleveland, because the Browns are currently one game ahead of the Bills in the AFC Wild-Card race. A victory there would not only tie the two teams in the race, but also give the Bills the head-to-head tiebreaker in a two-way tie for the last wild-card spot. At 8-6, the Bills would almost control their own destiny. I’ll wait one week to start my exact scenarios, but I wanted to mention the chances that the nfl-forecast.com blog, powered by Brian Burke’s prediction model, gives the Bills. After Week 12 the Bills had just a 3% chance of obtaining one of the two AFC wild cards. Granted, this number probably went up after the Bills’ victory and a loss by the Browns, but I still think it will be a little low.

I have a lot of respect for Brian’s prediction methodology and think it’s the best model that I’ve come across, but it can’t take into account things like the return of Marshawn Lynch, or Trent Edwards taking over the starting job. For now, I’ll just hold onto the glimmer of hope my teams always give me, before I’m brutally crushed.

UPDATE!!

NFL-forecast.com has released its post-Week 13 playoff predictions and the Bills bring home a whopping 17.88% chance of making the playoffs. Now my heart will only be broken a little more than 4 out of 5 times!

Availability of Free NFL Statistics

As I work on my program that parses an NFL.com box score, I have to wonder why there is such a paucity of usable statistics, not only in the NFL but in other professional sports leagues. Sure, there are stats out there, but it seems to be an either/or choice between comprehensiveness and timeliness (e.g. nfl.com’s stats) and ease of use and downloadability (e.g. the csv files at pro-football-reference.com). Now, I can understand why a site like ESPN or SI.com wouldn’t be able to make the stats it shows available for download, since it gets them from the stats conglomerate stats.com. However, the NFL shouldn’t have any legal qualms about making its own stats available for download. So you have to wonder what is stopping the NFL from doing this.

Is it a matter of the time needed to put them in a downloadable format, or a question of technical server demands? I highly doubt it. Is it pressure from companies like Stats Inc. wanting to maintain a quasi-monopoly on the market? Probably not. The most likely reason is money. The NFL is probably trying to work out a way to make the stats available, but for a price. If the stats were easy to download for free now and the league decided at some point down the line to charge a fee for them, it would have lost some money in the meantime.

So if money is the issue, why not start making these stats available for a small fee for personal (non-commercial) use? Or attach a disclaimer that the free stats can only be used non-commercially, and sell a commercial license? On the surface there seems to be very little reason for the stats to be available for free on separate pages of the site, but with no database dump available for download.