Navigation: Jump to content areas:


Pro Quality. Fan Perspective.
Login-facebook
Around SBN: Hugh Douglas Admits To Stealing From Jaguars

An introduction to SQL or the Nerdiest Fanpost you'll see on AN

 

I bet I know what you’re thinking. I love baseball stats and this new statistical revolution in my favorite sport has really enhanced my understanding. But, darn it, sites like Fangraphs simply aren’t geeky enough for me.

I mean, who hasn’t had that thought?

Fangraphs is a fantastic site. I can look at things like player A’s wRAA and see that linear weights suggest he was worth a certain number of batting runs last season. I can see what percentage of pitches Player B swung out that were outside the strike zone. I can look at hundreds of different calculations and can analyze pretty much to my heart’s desire.

But, I’m selfish. What if I want more? I want the ability to calculate things that aren’t available at Fangraphs, using tons of data. Maybe I simply want to learn how to use pitchFX, that new tool that you see everywhere with them fancy complicated graphs? Well, this fanpost is a very basic look at a tool that’s been making it’s way from StatSpeak to the Book Blog to the Hardball Times to Beyond the Boxscore.

Star-divide

This offseason I’ve probably overextended myself. I started by creating a series that I’ve kinda completely left unfinished 20% of the way through. I also started reading most of the book Mathletics by Wayne Winston. Winston’s book was interesting* and inspired me. The book presented some of the things one could do with baseball data in Excel.

*If you want to read a review of it from a sabermetrician’s perspective, go here.

Over the course of the offseason, I’ve spent many an hour developing Excel sheets analyzing our national past time. To give you a taste, I can tell you that a lineup made entirely of 9 Barry Bondses from the year 2004 would score approximately 4,074 runs in a season based on a Monte Carlo estimation. I can tell you that based on CHONE, Marcel, Oliver and ZiPS, Marcel was the best at estimating wRAA. I can tell you that Raj Davis performed in the 80th percentile* in projection systems.

*How I defined this was I ranked all players by the formula

proj wRAA – actual wRAA

in each projection system. I averaged these ranks and ranked those averages. Raj Davis outperformed 79.8% of players this way. The highest on the 2009 A’s team: Adam Kennedy, 85th percentile.

The point is I messed around with Excel for ages and wrote a ton of formulas that were pretty much useless, although quite entertaining. Well now I give you a better way to waste one’s time writing formulas: SQL!

WARNING: To follow along with the rest of this fanpost will require quite a few downloads but not a penny of money. Every download mentioned is completely free (of viruses and of the need for cash).  Some of my links may open files you may need to download.  These links can be figured out through context.  If you don't wish to actually try SQL, you can follow along once we get to the commands by looking at my tables which are exported from SQL to Excel to HTML.

SQL (pronounced sequel) is a database language. In fact, it stands for structured query language. Sound geeky enough for you?

But don’t fret yet. I had never heard of it before I decided to learn it. I’m far from a master, but I’ve been messing with it only a few days and have gotten a pretty good grasp of the basics so far. Let’s start where I started. This article by Colin Wyers is one of the best introductions to using SQL for baseball.

I’ve managed to get through the entire article except his last bit of code (the Marcel projections). I recommend opening Colin's article in a new window. I’ll give you the basics of the article as it never hurts to get a second wording of information. I figure, as a neophyte, my explanations should not be overly complex.

Onto the first step: the software. Also, as a note, I'm using PC and so is Colin in his article. Any other OS might need to use different software. In this case, the first software needed is SQL itself. Colin suggests MySQL in his article. Here is the appropriate section on what to download:

First, you need to download and run the installer for MySQL 5.4 CE Essentials. Choose a "typical" install. You'll typically have to wait a few minutes for files to copy. Then there will be some ads. Skip over them. Make sure that "Configure the MySQL server now" is checked. (I uncheck the box about registering the software with MySQL, but that's up to you.) Chose "Detailed configuration." Then you'll go through a set of pages asking you question. Here are the answers: 1.Developer Machine 2. Non-Transactional Database Only 3.Decision Support (DSS)/OLAP Continue on, using the defaults. When prompted, either add a password or uncheck the box to alter security settings. (This will leave your root password blank.) Once you're finished, you shouldn't have to start MySQL at all—it should be ever vigilant, waiting for you to command it to do your sabermetric bidding.

The next thing we need is a GUI to use MySQL. The link in Colin’s article links to a download page that only has free trials. I use the community version. You can download it here. Now that we have SQLyog and SQL, all we need is data. Again, I’ll quote Colin’s article on how to get some. In this case, we’ll work with Baseball Databank’s data:

First, download the ZIP file. Extract the .SQL file from the ZIP, and remember where you put it.

Now go into SQLyog. In SQLyog there will be three panes. The left-hand side should list your databases, the top is where you write queries, and the bottom shows the results. Let's add a database. Right-click in the pane, and select "Create Database." Call it "BDB," leaving off the punctuation. Leave the rest as the defaults, and click okay.

Now, let's populate the database. Right-click on the database, and select "Restore from SQL Dump." There should be a button with an ellipsis (...) in it. Click that, and select the file we extracted from the ZIP. Select Execute. It'll ask you if you want to execute the queries in database BDB. You do. This should take a little while.

Once that's done, go into your database and click on the folder that says Tables. If nothing is there, try hitting "F5" on the top row of your keyboard to refresh the browser. Go ahead and click on batting, and then slide on over to the right and click on "Table Data." Scroll around for a bit. If you've ever used a spreadsheet or looked at a table before, the layout should feel familiar.

Here is where you may hit your first roadblock. Unzipping files can be a bit of a pain. If you can’t unzip the files by simply double clicking, you may need to download another program. One of the better ones for this task is 7-Zip.

You may also have trouble with the tables loading. Most of the time these will import, but may not show up. Refreshing multiple times is very helpful.

So, now you should have the toolbox of tools needed to reside in the top level of geekiness, even among sabermetricians.  Now what do you do with these tools? In layman's terms, we have a database with lots of spreadsheets in it. To interact with this database, we need to talk to it. We do this task by writing queries.

To learn basic (but powerful!) queries you only need a few commands in SQL.  I'll (attempt to) teach you these queries by teaching you what each term means (although most are self-explanatory), giving you a code, and showing you the results.  You can copy-paste my code but I recommend typing it just so you get the hang of it.

The first of the commands we need is SELECT. SELECT is a very flexible command that will start almost all of your first queries. You tell the system what you want it to return. FROM is the second most basic command; it tells the computer where to search for what you’ve selected.

Open SQLyog. Browse through some of the tables you now have. Go to batting. It should have a bunch of columns named "playerid, yearid, stint…" Now we want to write a query. Go to the box in the top right where it says "1". If we simply want the database to return the entire batting table, we type:

SELECT *
     FROM BATTING;

Click "F9". This will execute the query. The asterisk tells the computer to select everything. The semicolon tells the computer that the query is finished. The query should display the entire batting table, which is too large to reproduce here.

Now, if we wanted to narrow our table and have only the specific columns playerid, yearid, and HR, we would write:

SELECT playerid, yearid, HR
 	FROM BATTING;

The result:

playerid           yearid                HR
abercda01 1871 0
addybo01 1871 0
…. …. ….

 

That's much more manageable if still pretty useless.  Next we have the commands AND and WHERE. WHERE allows us to be more specific with our requests. AND combines requests:

SELECT playerid, yearid, HR
 	FROM BATTING
        WHERE teamid= 'oak' AND yearid = 2009;

You must put the OAK in single quotes. You can copy-paste this command into SQL if you want, but typing it out will help you learn it better. This command returns a list of all the 2009 A’s and their corresponding HRs:

playerid yearid HR
cahiltr01 2009 0
outmajo01 2009 0
cunniaa01 2009 1
mazzavi01 2009 0
.....
.....
....

 

However, we don't really care that Trevor Cahill hit 0 HRs.  It would be much nicer to the eyes if we could order them. The command for this request is simply ORDER BY:

SELECT playerid, yearid, HR
 	FROM BATTING
        WHERE teamid= 'oak' AND yearid = 2009
        ORDER BY HR DESC; 

The ORDER BY command defaults to ordering ascendingly. Adding DESC makes the table prettier to us, as we normally think of the A’s HR leaders in descending order.

Here's what the query gives:

playerid          yearid                HR
custja01 2009 25
suzukku01 2009 15
hollima01 2009 11
giambja01 2009 11
kennead01 2009 11
...... ......
......

 

The next command that will take a little work to master is joins. There are multiple ways to join, but I’ll explain the way that makes the most sense to me. The point of a join is to combine certain statistics from different tables. For example, if we wanted a player’s actual name instead of his player id we would do this:

SELECT master.nameLast, master.nameFirst, batting.yearid, batting.HR
 	FROM batting, master 
        WHERE master.playerid=batting.playerid 
              AND teamid='oak' AND yearid = 2009
        ORDER BY HR DESC;

Players’ actual names are stored under the "master" table, while the "batting" table only carries their playerids. However these playerids are also stored in the master table. Using the command

WHERE master.playerid=batting.playerid

allows us to combine multiple columns within the master table as well as the batting table.

Here is the result of the final query:

nameLast nameFirst          yearid                HR
Cust Jack 2009 25
Suzuki Kurt 2009 15
Holliday Matt 2009 11
Giambi Jason 2009 11
...... ....... ....... ......

 

A couple more functions and their use:

 

AS: Allows you to use a formula and renames the column

SELECT playerid, yearid, AB, H/AB AS AVG
 	FROM BATTING 
WHERE teamid= 'oak' AND yearid = 2009 ORDER BY AVG DESC;
playerid           yearid                 AB              AVG
gonzaed01 2009 2 1
cahiltr01 2009 2 0.5
davisra01 2009 390 0.3051
sweenry01 2009 484 0.2934
kennead01 2009 529 0.2892
.......
........
.......
........

 

HAVING: Allows one to set a limitation on a group.

SELECT playerid, yearid, AB, H/AB AS AVG
 	FROM BATTING 
        WHERE teamid= 'oak' AND yearid = 2009
        HAVING AB>=50
        ORDER BY AVG DESC;
playerid           yearid
                AB              AVG
davisra01 2009 390 0.3051
sweenry01 2009 484 0.2934
kennead01 2009 529 0.2892
patteer01 2009 94 0.2872
hollima01 2009 346 0.2861
.........
.........
.......
.......

GROUP BY, SUM: When combined with a SUM command, GROUP BY combines a player’s stats when he is in your query more than once. If a player changes teams in a single year, this will combine his stats for each team and give you his year-long stats.

SELECT playerid, SUM(HR) AS HR, SUM(RBI) AS RBI
 	FROM batting
 	WHERE playerid='hollima01' AND yearid=2009
 	GROUP BY playerid;
playerid HR RBI
hollima01 24 109

 

With just these simple commands, you should now be able to run a bunch of basic queries. With these you can accomplish a ton.

For example, every Oakland A All-Star:

SELECT m.nameLast, m.nameFirst, b.yearid, b.teamid
 	FROM master m, allstar a, batting b
 	WHERE a.playerid=m.playerid AND b.playerid=m.playerid
             AND b.yearid=a.yearid
 	HAVING b.teamid='oak'
 	ORDER BY a.yearid DESC;
nameLast nameFirst yearid teamid
Bailey Andrew 2009 OAK
Duchscherer Justin 2008 OAK
Haren Danny 2007 OAK
Zito Barry 2006 OAK
Duchscherer Justin 2005 OAK
Hudson Tim 2004 OAK
Mulder Mark 2004 OAK
Foulke Keith 2003 OAK
Hernandez Ramon 2003 OAK
Mulder Mark 2003 OAK
.....
.....
......
.......

 

 

Every playoff series the A's have ever won in any city:

SELECT yearid, round, teamidloser, wins, losses, ties
 	FROM seriespost
 	WHERE teamidwinner IN('OAK','PHA','KCA');

EDIT: 'KCA' in the above command should be 'KC1'.

Result:

yearID round teamidloser wins losses
1910 WS CHN 4 1
1911 WS NY1 4 2
1913 WS NY1 4 1
1929 WS CHN 4 1
1930 WS SLN 4 2
1972 ALCS DET 3 2
1972 WS CIN 4 3
1973 ALCS BAL 3 2
1973 WS NYN 4 3
1974 ALCS BAL 3 1
1974 WS LAN 4 1
1980 ALCS NYA 3 0
1981 AWDIV KCA 3 0
1985 ALCS TOR 4 3
1985 WS SLN 4 3
1988 ALCS BOS 4 0
1989 ALCS TOR 4 1
1989 WS SFN 4 0
1990 ALCS BOS 4 0
2006 ALDS2 MIN 3 0

 

HOFers ranked by the percentage they were voted in by:

SELECT m.namefirst, m.namelast, h.ballots, h.votes, h.votes/h.ballots AS PCTG
 	FROM hofold h, MASTER m
 	WHERE m.hofid=h.hofid AND h.ballots IS NOT NULL
 	ORDER BY PCTG DESC; 

Result:

nameFirst nameLast ballots votes PCTG
Tom Seaver 430 425 0.9884
Nolan Ryan 497 491 0.9879
Cal Ripken 545 537 0.9853
Ty Cobb 226 222 0.9823
George Brett 497 488 0.9819
…………. …………. …………. …………. ………….

 

Career HR leaders as Oakland A’s:

SELECT playerid, SUM(HR) AS HR
 	FROM batting
 	WHERE teamid='oak'
 	GROUP BY playerid
 	ORDER BY HR DESC
 		LIMIT 5; 

Here's what it will look like:

playerid               
HR
mcgwima01 363
jacksre01 268
cansejo01 254
chaveer01 229
giambja01 198

 

These examples are only the tip of the iceberg. One can link SQL to R or Access, not to mention the other databases we can play around with (including Retrosheet and pitchFX). If there’s interest I’ll give tutorials on more stuff we can do with SQL including building Marcels, heat maps, and developing linear weights.

If you want to practive, here’s what should be almost a review in the form of a SQL tutorial.  If you want to get ahead, a couple of glances at connecting SQL to the pitchFX databaseand to R. Here's one that talks about installing it for Retrosheet. 

Use this thread (all one or two of you who actually want to try SQL) as a place to post queries that you can’t figure out or tips you may pick up. You can ask me questions and I’ll do my best to answer them. Remember we’re all beginners here and no question is stupid.

Poll
Will you or did you try using SQL?
Looks way too complicated for me, but I'd still like to see what it can do with another FP
11 votes
My head hurts, don't do this again
13 votes
I tried it and liked it. I'd like to see another FP
17 votes
I tried it and it was too hard; No more
1 votes

42 votes | Poll has closed

Comment 64 comments  |  10 recs  | 

Do you like this story?

Comments

Display:

This is too long to read

and I am too afraid that I will understand it….

god damn database classes…

by Zonis on Feb 21, 2010 11:30 PM PST reply actions   1 recs

This is too long to read

and I am 100% certain I will not understand it…

god damn humanities classes…

… and therein lies the tension between discouraging free rodent upgrades and relying on random walkup business. LB, you should tarp your kitchen. And move to San Jose. -mb

by Leopold Bloom on Feb 21, 2010 11:41 PM PST reply actions   1 recs

Oooooh.

SQL is one of those sabermetric tools I’ve not yet touched in the slightest (all of my PitchFX work is compiled by hand from the Gameday XMLs and worked up in Excel). I’ll keep this fanpost around to mess around with when I can find the time. I’ve had a couple ideas floating around for mini-pieces of research, but no realistic way of obtaining the massive amounts of data needed.

Great stuff, vignette.

Always the summers are slipping away.
Find me a way for making it stay.

by danmerqury on Feb 22, 2010 1:11 AM PST reply actions  

>:-(

Unless you’re talking about the android, data is not massive!

There are more things in heaven and earth, Horatio, Than are dreamt of in your philosophy.

by iglew on Feb 22, 2010 1:44 AM PST up reply actions  

Hmm

A google search for “massive * data” turns up over 342,000,000 results.

That’s some pretty widespread usage.

The M-W online dictionary gives the following definitions (among others):

-impressively large or ponderous
-large in scope or degree <the feeling of frustration, of being ineffectual, is massive — David Halberstam>
(emphasis added)

If a feeling can be massive, surely data can be, as well.

/linguist’d

m*****f***ing c***s***ing peanut butter and jelly!! f*** f*** f***!!!

by JediLeroy on Feb 22, 2010 4:33 AM PST up reply actions  

I am well aware that the entire world is wrong.

You know me well to cite my favorite dictionary, though. I am a M-W partisan.

I am gradually coming to accept “miniscule”, but that’s just spelling. I still mourn that every adjective that gains popularity as a magnifier must soon be homogenized and rendered devoid of any unique meaning, like yesterday’s hit boy band.

There are more things in heaven and earth, Horatio, Than are dreamt of in your philosophy.

by iglew on Feb 22, 2010 10:09 AM PST up reply actions  

Just be glad you're not a time traveller from the Middle Ages

Think about how pissed you’d be with what they did to language compared to then.

Leopold Bloom on why he loves Mr. Peter Gammons, his best buddy:
"Peter Gammons systematically ignored and/or ran down the A’s in the pages of Sports Illustrated and The Sporting News for a good ten year stretch in the late seventies and early to mid-eighties. Trust me, the c**ksucker hates our team."

by DMOAS on Feb 22, 2010 4:08 PM PST up reply actions  

I like to think that with database queries

You start with all the data. You’re just applying the filters to get to the data you need.

100% Athletics, 100% Baseball. 2009 Athletics, 40% Baseball.

by fruitattack on Feb 22, 2010 1:49 AM PST up reply actions  

Christ, what an SQL.

A B -3X = Swedish girls like chocolate @('.')@

by monkeyball on Feb 22, 2010 3:25 PM PST up reply actions  

This is awesome!

One thing, though. You linked to the 2008 Baseball Databank database, which only has data through 2008, which makes many of your sample queries not work (anything with “yearid = 2009”) . The newer 2009 database is here.

Thanks again for the great intro!

The artist formerly known as HigherPie.

by vegAN ryAN on Feb 22, 2010 1:27 AM PST reply actions  

Right you are

It’s now fixed.

"Loyal? I'm the most loyal player money can buy." - Don Sutton

by vignette17 on Feb 22, 2010 9:41 AM PST up reply actions  

Fluency in SQL

is a useful skill to have. Even in this down economy, I think learning SQL does more for your employability than several college courses.

You left out an option on the poll: This looks like fun, and I wish I had time to spare playing around with databases. But, you know, there are so many other things to do….

There are more things in heaven and earth, Horatio, Than are dreamt of in your philosophy.

by iglew on Feb 22, 2010 1:48 AM PST reply actions  

I second the poll option

I do database work regularly, just never queried baseball data before :(

100% Athletics, 100% Baseball. 2009 Athletics, 40% Baseball.

by fruitattack on Feb 22, 2010 1:51 AM PST up reply actions  

For my sins, I've been using MySQL for work for a number of years now

it’s super fun, and super useful. Though, I am kinda loath to start using it to analyse baseball, as I don’t especially want me work life bleeding into my non work life

I suspect that you think tilting at windmills means something other than what it does.

The ninth fastest thirty year old in San Francisco

by bobnothing on Feb 22, 2010 9:35 AM PST up reply actions  

I'd agree with this

The ability to harness large amounts of data and efficiently parse said data is useful across fields.

By the way, if you’re using linux, postgreSQL is a pretty solid free client.

by rrryanc on Feb 22, 2010 11:13 AM PST up reply actions  

It might take me a week to read, but nice work.

By the way, the playoff thingy added in a couple of series’ won by the KC Royals.

I'm here to talk about the past.

by 67MARQUEZ on Feb 22, 2010 5:32 AM PST reply actions  

You mean we didn't win the 1985 World Series?

After looking at it, apparently, KCA does not refer to the Kansas City Athletics. KC1 does. Changing KCA to KC1 changes the result to eliminate the 1980 series against NYA and the two 1985 series.

"Loyal? I'm the most loyal player money can buy." - Don Sutton

by vignette17 on Feb 22, 2010 9:33 AM PST up reply actions  

Who knows of any good Mac-based SQL solutions out there?

I have often thought that it might be easier just to do it myself than look at Fangraphs.

"Do I talk to myself? No, I just remind myself of what I'm trying to do. You know, I never answer myself so how can I be talking to myself?" - Rickey

by cuppingmaster on Feb 22, 2010 11:50 AM PST reply actions  

Go look at Beyond the Boxscore

They had a whole series called Saberizing A Mac. That may be helpful.

"Loyal? I'm the most loyal player money can buy." - Don Sutton

by vignette17 on Feb 22, 2010 2:44 PM PST via mobile up reply actions  

Thanks...

Thanks for posting this. I am actually trying to learn SQL.

by uci anteater on Feb 22, 2010 1:43 PM PST reply actions  

I'll wait for the SQL.

I like Cindi. A. She never pretends to know more than she does. B. She has unbridled enthusiasm for her "Hotties," and isn't afraid to show it. -IM4Oakgal

by Nico on Feb 22, 2010 5:38 PM PST reply actions  

MySQL - this time it's personal

I suspect that you think tilting at windmills means something other than what it does.

The ninth fastest thirty year old in San Francisco

by bobnothing on Feb 22, 2010 6:00 PM PST up reply actions  

Well played sir. We played.

Leopold Bloom on why he loves Mr. Peter Gammons, his best buddy:
"Peter Gammons systematically ignored and/or ran down the A’s in the pages of Sports Illustrated and The Sporting News for a good ten year stretch in the late seventies and early to mid-eighties. Trust me, the c**ksucker hates our team."

by DMOAS on Feb 22, 2010 6:55 PM PST up reply actions  

Poorly player sir. Poorly played.

Leopold Bloom on why he loves Mr. Peter Gammons, his best buddy:
"Peter Gammons systematically ignored and/or ran down the A’s in the pages of Sports Illustrated and The Sporting News for a good ten year stretch in the late seventies and early to mid-eighties. Trust me, the c**ksucker hates our team."

by DMOAS on Feb 22, 2010 6:56 PM PST up reply actions  

I'd like to say that's in the top ten of the nerdiest jokes I've ever told

but truthfully, it’s not really even close

I suspect that you think tilting at windmills means something other than what it does.

The ninth fastest thirty year old in San Francisco

by bobnothing on Feb 22, 2010 10:15 PM PST up reply actions  

What's the best way to create and populate a database from scratch?

Let’s say i want to make my own database, or add my own table to the BDB database? For instance, let’s say I went to every A’s game this year, and counted ground balls and fly balls of every A’s batter. I’ve got my raw data in Excell, or even tab-separated text or something.

I read about the INSERT INTO statement, but that looks pretty tedious, writing out

INSERT INTO batters (playerid, date, GB, FB)
VALUES (anderbr04, 052310, 2, 0)

or whatever. Is there a better way to get raw data into a database without creating it line by line?
This is my first go at any of this, and I’ll probably go buy a book or something, but I’m just playing around and looking to see if I can use this with other data that I have.

Stewart: "What really needs to be clear is it wouldn't have mattered if there was an earthquake or not. We were going to beat the Giants.

by Elvez on Feb 22, 2010 6:28 PM PST reply actions  

Importing it probably

I haven’t looked at the program he suggested, but you may be able to import it by saving the excel file as a csv or dbf.

Leopold Bloom on why he loves Mr. Peter Gammons, his best buddy:
"Peter Gammons systematically ignored and/or ran down the A’s in the pages of Sports Illustrated and The Sporting News for a good ten year stretch in the late seventies and early to mid-eighties. Trust me, the c**ksucker hates our team."

by DMOAS on Feb 22, 2010 6:58 PM PST up reply actions  

If you know any programming languages

You could do it that way. We populate databases using C at work. I’m sure it wouldn’t be too hard using a scripting tool like perl or python either.

by rrryanc on Feb 22, 2010 7:09 PM PST up reply actions  

I haven't really investigated but....

It looks like you may be able to load data in through the LOAD LOCAL command. This can be reached by right-clicking on a table →Import CSV using LOADLOCAL. You may have to mess with it a bit.

Also, this post looks like it addresses this issue but for a mac.

"Loyal? I'm the most loyal player money can buy." - Don Sutton

by vignette17 on Feb 22, 2010 7:46 PM PST up reply actions  

Yeah, so

You can set up Excel so you have a series of commands (exactly how you do this will depend on which version of whatnot you’re using). For your example up above, I’d create a fifth column in your excel spreadsheet that has a formula something like this:

=“Insert Into batters (playerid, date, GB, FB) Values (”&A2&", “&B2&”, “&C2&”, “&D2&”)"

and then populate that down your data series.

Next, select the E column that you’ve populated with the above formula. Here’s where versions diverge, again – you might need to paste the values into another excel column (copy, then right click, then paste special – values only). You might be able to do this directly into your notepad page / interface.

you’ll have a nice list of populated commands. Hopefully?

here’s one I did a while ago; you should end up with a list like this:

INSERT INTO S.LABELS_ARTISTS_TEMP (LIST_ORDER, LABEL_NAME, LABEL_ID, ARTIST_NAME) VALUES (1,‘Fing Records’,23750,‘email@records.com’);
INSERT INTO S.LABELS_ARTISTS_TEMP (LIST_ORDER, LABEL_NAME, LABEL_ID, ARTIST_NAME) VALUES (3,‘G K’,24344,‘greg@gk.com’);

and so on and so forth.

I suspect that you think tilting at windmills means something other than what it does.

The ninth fastest thirty year old in San Francisco

by bobnothing on Feb 22, 2010 10:34 PM PST up reply actions  

I should add, this probably isn't the easiest way of doing it, necessarily

but I’m not a great programmer, so I prefer to use Excel (which I do know from top to bottom) to do all the work, and then MySql to do analysis

I suspect that you think tilting at windmills means something other than what it does.

The ninth fastest thirty year old in San Francisco

by bobnothing on Feb 22, 2010 10:35 PM PST up reply actions  

Or as Excel nerds say,

you know it from A1 to IV65536.

There are more things in heaven and earth, Horatio, Than are dreamt of in your philosophy.

by iglew on Feb 23, 2010 1:32 AM PST up reply actions  

nice.

I shall start saying this now

I suspect that you think tilting at windmills means something other than what it does.

The ninth fastest thirty year old in San Francisco

by bobnothing on Feb 23, 2010 9:03 AM PST up reply actions  

This makes some sense to me...

so it’s refreshing compared to some of the other stuff I’ve been reading lately. Thanks!

Stewart: "What really needs to be clear is it wouldn't have mattered if there was an earthquake or not. We were going to beat the Giants.

by Elvez on Feb 23, 2010 8:08 AM PST up reply actions  

Cool; good luck with it!

I suspect that you think tilting at windmills means something other than what it does.

The ninth fastest thirty year old in San Francisco

by bobnothing on Feb 23, 2010 9:04 AM PST up reply actions  

Quick question for you though...

Since I can’t figure out how to word this for a web search…
My A column is the date. In Excel, I can format the A cells to contain dates instead of straight numbers.Then I put your formula in column C. When I reference the date cell in the formula:

VALUES (“&A2&”,“&B2&”)

Excel pulls in some other number instead of the date. For instance, cell A2 contains the date 11/01/2003, and the cell B2 contains the regular ol’ number 25, but in the C2 cell it’s shown as
37926, 25
instead of
11/1/2003,25

I can’t figure out how to get the formula to display the date when it references the A column, but a regular number when it references the B column. Does that make sense?

Stewart: "What really needs to be clear is it wouldn't have mattered if there was an earthquake or not. We were going to beat the Giants.

by Elvez on Feb 23, 2010 10:22 AM PST up reply actions  

try this:

select column C, and then right click—> format —> date and time

hopefully that should work?

I suspect that you think tilting at windmills means something other than what it does.

The ninth fastest thirty year old in San Francisco

by bobnothing on Feb 23, 2010 11:22 AM PST up reply actions  

There's a whole batch of excel functions for playing with dates

You need to use ’em.

This seems like a good start

"There's never enough time to do all the nothing you want" -Bill Watterson

by nevermoor on Feb 23, 2010 4:19 PM PST up reply actions  

Excellent

I’ve dabbled in SQL for several years (my company’s software uses SQL databases), but usually just stealing other people’s queries.

This is the easiest explanation of how to write a query I’ve seen! Hope they don’t block this site from work…

by ChuckBudd on Feb 22, 2010 11:32 PM PST reply actions  

S-Q-L
SQL (pronounced sequel)

The recommended pronunciation is actually “es-kyoo-el.” See:

http://en.wikipedia.org/wiki/SQL

and:

http://en.wikipedia.org/wiki/MySQL

by hunger on Feb 24, 2010 5:36 PM PST reply actions  

I actually pronounce it that way

I had heard someone say “sequel” and thought they were right. Overall, I don’t think it matters much. Some people call wOBA “wOH-bah” others “W-OH-BEE-EH”. I don’t think anyone really cares.

"Loyal? I'm the most loyal player money can buy." - Don Sutton

by vignette17 on Feb 24, 2010 6:18 PM PST up reply actions  

I've always used as S-Q-L when I discuss SQL statements

but My-Sequel when discussing MySQL. But yeah, po-tato pah-tato

Leopold Bloom on why he loves Mr. Peter Gammons, his best buddy:
"Peter Gammons systematically ignored and/or ran down the A’s in the pages of Sports Illustrated and The Sporting News for a good ten year stretch in the late seventies and early to mid-eighties. Trust me, the c**ksucker hates our team."

by DMOAS on Feb 24, 2010 6:21 PM PST up reply actions  

You would not hire me.

My experience is the same as Vignette’s. I always said ess-cue-ell, but when I read this fanpost I assumed I had been wrong.

There are more things in heaven and earth, Horatio, Than are dreamt of in your philosophy.

by iglew on Feb 25, 2010 10:46 AM PST up reply actions  

This is what I do, too

I suspect that you think tilting at windmills means something other than what it does.

The ninth fastest thirty year old in San Francisco

by bobnothing on Feb 25, 2010 2:10 PM PST up reply actions  

Remember the old days

when we just picked up a glove and rounded up our buddies and played a game until someone knocked out the neighbor’s window?

Got a formula for that? :)

"Tonto think Billy Beane need to make team full of squirrels and bears."

by OptimistPrime on Feb 24, 2010 11:57 PM PST reply actions  

GO REED SUM SPREDSHEETZ, NERDZ

I suspect that you think tilting at windmills means something other than what it does.

The ninth fastest thirty year old in San Francisco

by bobnothing on Feb 25, 2010 2:10 PM PST up reply actions  

But, the Web addresses your fundamental problemo
I want the ability to calculate things that aren’t available at Fangraphs, using tons of data.

It would be really cool for a community to build a RESTful API to all these crazy baseball statistics databases. I don’t follow much about baseball statistics, but I’m amazed at how many replicant data sources there are from Fangraphs, to STATS, SABR, Baseball Cube, etc.

This one seems close to providing actual HTTP endpoints. But, it would be rather easy to extend the free and open source baseball databank’s MySQL database and allow some killer real-time mashups.

"That’s Chavey; he’s a good athlete. He can play anywhere … except second base. He’s not that good." -M. Ellis

by eshock on Feb 25, 2010 12:56 AM PST reply actions  

Bandwidth and server resources

Also, letting the public play with your database directly is a terrible idea. Controlling how they do it, without losing flexibility, would be difficult.

"There's never enough time to do all the nothing you want" -Bill Watterson

by nevermoor on Feb 25, 2010 10:56 AM PST up reply actions  

Fair statement...

in 2002. with web services, your not playing with the data, you’re just accessing it through an API to create mashups, right?

"That’s Chavey; he’s a good athlete. He can play anywhere … except second base. He’s not that good." -M. Ellis

by eshock on Feb 25, 2010 9:58 PM PST up reply actions  

You start it

And let me know what the bandwidth cost is

"There's never enough time to do all the nothing you want" -Bill Watterson

by nevermoor on Feb 26, 2010 7:04 PM PST up reply actions  

not sure why you're so blue?

Bandwidth and Storage are nearly free these days…

Costs for Amazon Web Services Simple Storage.
Pricing
First 50 TB / Month of Storage Used
$0.150 per GB

All Data Transfer In
Free until June 30th, 2010*
PUT, COPY, POST, or LIST
$0.01 per 1,000 Requests

I thought it was a good idea. Sorry if you disagree.

"That’s Chavey; he’s a good athlete. He can play anywhere … except second base. He’s not that good." -M. Ellis

by eshock on Mar 1, 2010 11:51 PM PST up reply actions  

That price seems nice on the surface especially for a small company database.

But if you were to consolidate the big databases and became the source for this sort of info gathering, for each time someone calls to the DB, that’s going to add up mighty quickly. You’re talking the 100k+ worth of people or more on SBN, people coming into it from the outside because you’re a great source, unless you’re turning the place into a pay site, you may potentially wind up with a pretty big cost on your hands.

Leopold Bloom on why he loves Mr. Peter Gammons, his best buddy:
"Peter Gammons systematically ignored and/or ran down the A’s in the pages of Sports Illustrated and The Sporting News for a good ten year stretch in the late seventies and early to mid-eighties. Trust me, the c**ksucker hates our team."

by DMOAS on Mar 2, 2010 8:38 AM PST up reply actions  

Question on how to handle midseason trades.

I was playing around with the BDB, and wanted to aggregate the stats that one Matt Halladay produced over the course of a full season. His stints in Oakland ans St. Louis are split (rightfully so), but as I said, I’m searching for his totals. I think if involves the ‘join’ function, but I am stuck at the moment. Has anyone else encountered this.

By the way, fantastic fanpost. Very refreshing and useful, to say the least.

by Pucking Insane on Feb 25, 2010 11:17 PM PST reply actions  

I have your exact example in the FP actually

Here is the query again:


SELECT playerid, SUM AS HR, SUM AS RBI
FROM batting
WHERE playerid=‘hollima01’ AND yearid=2009
GROUP BY playerid;

You don’t have to use joins as the SUM and GROUP BY functions do it for you.

"Loyal? I'm the most loyal player money can buy." - Don Sutton

by vignette17 on Feb 26, 2010 12:49 AM PST up reply actions  

That's a weird one I hadn't seen before

I get the same result on Preview.

But in comments you should be using <pre> for code anyway. As far as I can tell, AN will leave anything in <pre> unscathed, though you do have to watch out for your own line-ends.

There are more things in heaven and earth, Horatio, Than are dreamt of in your philosophy.

by iglew on Feb 28, 2010 12:34 PM PST up reply actions  

I did use the tags inside a blockquote....

"Loyal? I'm the most loyal player money can buy." - Don Sutton

by vignette17 on Feb 28, 2010 1:35 PM PST up reply actions  

Hmm...SBN doesn't like < pre > in subject lines

I thought it simply ignored subject lines. It normally does. And I thought I was finally understanding SBN’s autoformatting….

"Loyal? I'm the most loyal player money can buy." - Don Sutton

by vignette17 on Feb 28, 2010 1:38 PM PST up reply actions  

Well, this is mighty strange.
SUM(HR) *bold* _italic_

If the blockquote starts the comment, stuff inside a PRE inside the blockquote is preserved. If the blockquote follows any text, then AN tries to format it.


SUM bold italic

I guess the solution is that if you really want PRE you have to do it without a blockquote. That sucks.

There are more things in heaven and earth, Horatio, Than are dreamt of in your philosophy.

by iglew on Mar 1, 2010 10:54 AM PST up reply actions  

autoformat strikes again

Leopold Bloom on why he loves Mr. Peter Gammons, his best buddy:
"Peter Gammons systematically ignored and/or ran down the A’s in the pages of Sports Illustrated and The Sporting News for a good ten year stretch in the late seventies and early to mid-eighties. Trust me, the c**ksucker hates our team."

by DMOAS on Mar 1, 2010 4:01 PM PST up reply actions  

Comments For This Post Are Closed


User Tools

Welcome to the SB Nation blog about Oakland Athletics.

Community Guidelines ANcillary Terms

FanPosts

Community blog posts and discussion.

Recommended FanPosts

Aperture_logo_small
Community Prospect List #4
Img_2672_small
Long-Term Outlook

Recent FanPosts

Fubarcloud_small
Wolf being told to spend money
Small
The wRC+ Challenge
Pumpkin_small
Maybe this is a stupid stats question
Small
A's reportedly sign Cespedes
Unknown_small
Is It Really Worth It: Three Veterans Who May Be Playing Oakland Next Year, But Shouldn't Be
Small
Manny's Contract
Small
fantasy baseball league for A's fans!
Small
NYY Proposal
Small
Roy Oswalt = opportunity

+ New FanPost All FanPosts >


Front Page Writers

Maya_papi_small Tyler Bleszinski

08-_the_author_small 67MARQUEZ

Josefav2_small danmerqury

Baseball_small baseballgirl

Poochini-butt_in_box_2_small Nico

Img_0653_small dwishinsky

Front Page Writers

Smiley_face_small gigglingone

Venasfans_small OaklandSi

60-minutes-clock_small cuppingmaster

Patpicturebucky2_small YonYonson

Img_3830_small David Fung

Moderators

Photofunia-5c770b_small coffee roaster

Denver_small Colorado Fan

Ls_logo100_small LoneStranger

Thumbs_up_small LongTimeFan

Marty_profile_in_green_small mrod

Img_1877_small Billy Frijoles

Babycomputergeek_small paris7

Img_0115_small Tutu-late