NFL Box Score XML Format

So, the work on the scraping program that puts NFL data and statistics into a parsable, useful format is coming along well. I’ve got a sample box score in HTML format and the corresponding XML file that my program creates. I’ll post the links at the bottom; what I’m looking for right now is some input about the format. I tried to keep the XML schema as close as possible to the logical layout of the box score page.

The initial format contains a “game-metadata” section, and two “team” sections. The game-metadata section consists of the names of the two teams playing and is also a placeholder for a bunch of information that I’d like to include in the future, such as the date, day of the week, weather, surface of the playing field, whether or not it’s a dome, etc.
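To make the layout easier to picture, here is a minimal Python sketch that builds the skeleton described above. Only the “game-metadata”, “team” and “team-metadata” names come from the format; the `boxscore` root element, the `team1`/`team2`/`name` attributes and the `build_skeleton` helper are assumptions made up for illustration, so the generated file may well look different.

```python
import xml.etree.ElementTree as ET


def build_skeleton(first_team, second_team):
    """Build one game-metadata section followed by two team sections."""
    # "boxscore" is a hypothetical root name, not taken from the real file.
    root = ET.Element("boxscore")

    # game-metadata holds the two team names; date, weather, surface, etc.
    # could be added here later.
    meta = ET.SubElement(root, "game-metadata")
    meta.set("team1", first_team)   # hypothetical attribute name
    meta.set("team2", second_team)  # hypothetical attribute name

    # One team section per side, each with its own team-metadata.
    for name in (first_team, second_team):
        team = ET.SubElement(root, "team")
        ET.SubElement(team, "team-metadata", {"name": name})
    return root


root = build_skeleton("Buffalo Bills", "New York Jets")
print(ET.tostring(root, encoding="unicode"))
```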

Each team has its own section with “team-metadata”, like team name, win or loss, current record, etc. It also houses all of the team stats from the box score, with intuitive labels like passingTouchdowns or fumblesLost. The stats are also stored in a form that can easily be parsed from a String to an integer. For example, what reads in the box score as 17-26 (passing completions and attempts) is listed in the xml file as two separate fields, passingComp="17" and passingAtt="26", so that you don’t need to split the value before converting it to an int.
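Here is a rough Python sketch of why the split fields are convenient. It assumes the stats are stored as attributes on a single element, which I have called `team-stats` purely for illustration; `team_stats_as_ints` and `split_pair` are hypothetical helpers, not part of the scraper.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the element name "team-stats" is an assumption;
# the two attribute names and values are the ones quoted in the post.
SAMPLE = '<team-stats passingComp="17" passingAtt="26"/>'


def team_stats_as_ints(xml_text):
    """Convert every attribute straight to an int, with no extra parsing."""
    elem = ET.fromstring(xml_text)
    return {name: int(value) for name, value in elem.attrib.items()}


def split_pair(text):
    """What a consumer would need if the file kept the raw 'made-attempted' string."""
    made, attempted = text.split("-")
    return int(made), int(attempted)


stats = team_stats_as_ints(SAMPLE)
print(stats["passingComp"], stats["passingAtt"])
```

With separate fields, `int()` works on every value directly; with the combined box score string, every consumer would have to write something like `split_pair` first.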

Also under the “team” tag is a section for individual player stats. These are broken down into different categories: passing, rushing, receiving, fumbles, kicking, punting, kickoff returns, punt returns, and defense. Within each category is a list of all the players who recorded a stat in that category. So Trent Edwards is listed under the passing and rushing categories but not under kickoff returns or defense. The other option I was considering was listing each player for a team with all of his stats together, rather than separating them by category. If you leave input, please keep this in mind.
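For anyone weighing the two layouts, here is a small Python sketch showing that a category-first file can be regrouped into a player-first view after parsing. The `player` element, its attribute names, the player names and every number in `SAMPLE` are placeholders invented for this example, not values from the real box score.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Placeholder data: element/attribute names and all values are assumptions.
SAMPLE = """
<team>
  <passing>
    <player name="Player A" comp="10" att="15"/>
  </passing>
  <rushing>
    <player name="Player A" att="2" yards="7"/>
    <player name="Player B" att="20" yards="95"/>
  </rushing>
</team>
"""


def group_by_player(team_elem):
    """Turn category -> players into player -> category -> stats."""
    by_player = defaultdict(dict)
    for category in team_elem:
        for player in category.findall("player"):
            # Everything except the name is treated as an integer stat.
            stats = {k: int(v) for k, v in player.attrib.items() if k != "name"}
            by_player[player.get("name")][category.tag] = stats
    return dict(by_player)


players = group_by_player(ET.fromstring(SAMPLE))
print(sorted(players["Player A"]))
```

Since the regrouping takes only a few lines, the category-first layout does not lock consumers out of a per-player view; the choice mostly comes down to which shape is more convenient to read by default.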

Without further ado, here is the link to the original box score: Bills’ defense stifles Jets in victory

And here is the link for the xml file that my program generated: buf-nyj.xml

The xml file is currently hosted on my personal school web space, since WordPress has restrictions on uploading xml files. Any advice on a place to permanently store the full set of stats would be appreciated as well. Please keep in mind that this is only a first trial and sample format, and that your input is needed to get the most use out of it!


2 Responses to “NFL Box Score XML Format”

  1. Joel Marcey, December 4, 2007 at 3:10 pm

    Now that I know some of the alternatives and their cost, I think this is a wonderful idea. The format you have is a good first cut. XML is a great way to store information given its flexibility. As far as storage, I think if you have a web site, you can store the stats on your web host.

  2. XML Box Scores Now Available « 5146 Games, trackback on December 14, 2007 at 4:49 pm
