An introduction to SQL or the Nerdiest Fanpost you'll see on AN
I bet I know what you’re thinking. I love baseball stats and this new statistical revolution in my favorite sport has really enhanced my understanding. But, darn it, sites like Fangraphs simply aren’t geeky enough for me.
I mean, who hasn’t had that thought?
Fangraphs is a fantastic site. I can look at things like player A’s wRAA and see that linear weights suggest he was worth a certain number of batting runs last season. I can see what percentage of pitches Player B swung out that were outside the strike zone. I can look at hundreds of different calculations and can analyze pretty much to my heart’s desire.
But, I’m selfish. What if I want more? I want the ability to calculate things that aren’t available at Fangraphs, using tons of data. Maybe I simply want to learn how to use pitchFX, that new tool that you see everywhere with them fancy complicated graphs? Well, this fanpost is a very basic look at a tool that’s been making it’s way from StatSpeak to the Book Blog to the Hardball Times to Beyond the Boxscore.
This offseason I’ve probably overextended myself. I started by creating a series that I’ve kinda completely left unfinished 20% of the way through. I also started reading most of the book Mathletics by Wayne Winston. Winston’s book was interesting* and inspired me. The book presented some of the things one could do with baseball data in Excel.
*If you want to read a review of it from a sabermetrician’s perspective, go here.
Over the course of the offseason, I’ve spent many an hour developing Excel sheets analyzing our national past time. To give you a taste, I can tell you that a lineup made entirely of 9 Barry Bondses from the year 2004 would score approximately 4,074 runs in a season based on a Monte Carlo estimation. I can tell you that based on CHONE, Marcel, Oliver and ZiPS, Marcel was the best at estimating wRAA. I can tell you that Raj Davis performed in the 80th percentile* in projection systems.
*How I defined this was I ranked all players by the formula
proj wRAA – actual wRAA
in each projection system. I averaged these ranks and ranked those averages. Raj Davis outperformed 79.8% of players this way. The highest on the 2009 A’s team: Adam Kennedy, 85th percentile.
The point is I messed around with Excel for ages and wrote a ton of formulas that were pretty much useless, although quite entertaining. Well now I give you a better way to waste one’s time writing formulas: SQL!
WARNING: To follow along with the rest of this fanpost will require quite a few downloads but not a penny of money. Every download mentioned is completely free (of viruses and of the need for cash). Some of my links may open files you may need to download. These links can be figured out through context. If you don't wish to actually try SQL, you can follow along once we get to the commands by looking at my tables which are exported from SQL to Excel to HTML.
SQL (pronounced sequel) is a database language. In fact, it stands for structured query language. Sound geeky enough for you?
But don’t fret yet. I had never heard of it before I decided to learn it. I’m far from a master, but I’ve been messing with it only a few days and have gotten a pretty good grasp of the basics so far. Let’s start where I started. This article by Colin Wyers is one of the best introductions to using SQL for baseball.
I’ve managed to get through the entire article except his last bit of code (the Marcel projections). I recommend opening Colin's article in a new window. I’ll give you the basics of the article as it never hurts to get a second wording of information. I figure, as a neophyte, my explanations should not be overly complex.
Onto the first step: the software. Also, as a note, I'm using PC and so is Colin in his article. Any other OS might need to use different software. In this case, the first software needed is SQL itself. Colin suggests MySQL in his article. Here is the appropriate section on what to download:
First, you need to download and run the installer for MySQL 5.4 CE Essentials. Choose a "typical" install. You'll typically have to wait a few minutes for files to copy. Then there will be some ads. Skip over them. Make sure that "Configure the MySQL server now" is checked. (I uncheck the box about registering the software with MySQL, but that's up to you.) Chose "Detailed configuration." Then you'll go through a set of pages asking you question. Here are the answers: 1.Developer Machine 2. Non-Transactional Database Only 3.Decision Support (DSS)/OLAP Continue on, using the defaults. When prompted, either add a password or uncheck the box to alter security settings. (This will leave your root password blank.) Once you're finished, you shouldn't have to start MySQL at all—it should be ever vigilant, waiting for you to command it to do your sabermetric bidding.
The next thing we need is a GUI to use MySQL. The link in Colin’s article links to a download page that only has free trials. I use the community version. You can download it here. Now that we have SQLyog and SQL, all we need is data. Again, I’ll quote Colin’s article on how to get some. In this case, we’ll work with Baseball Databank’s data:
First, download the ZIP file. Extract the .SQL file from the ZIP, and remember where you put it.Now go into SQLyog. In SQLyog there will be three panes. The left-hand side should list your databases, the top is where you write queries, and the bottom shows the results. Let's add a database. Right-click in the pane, and select "Create Database." Call it "BDB," leaving off the punctuation. Leave the rest as the defaults, and click okay.
Now, let's populate the database. Right-click on the database, and select "Restore from SQL Dump." There should be a button with an ellipsis (...) in it. Click that, and select the file we extracted from the ZIP. Select Execute. It'll ask you if you want to execute the queries in database BDB. You do. This should take a little while.
Once that's done, go into your database and click on the folder that says Tables. If nothing is there, try hitting "F5" on the top row of your keyboard to refresh the browser. Go ahead and click on batting, and then slide on over to the right and click on "Table Data." Scroll around for a bit. If you've ever used a spreadsheet or looked at a table before, the layout should feel familiar.
Here is where you may hit your first roadblock. Unzipping files can be a bit of a pain. If you can’t unzip the files by simply double clicking, you may need to download another program. One of the better ones for this task is 7-Zip.
You may also have trouble with the tables loading. Most of the time these will import, but may not show up. Refreshing multiple times is very helpful.
So, now you should have the toolbox of tools needed to reside in the top level of geekiness, even among sabermetricians. Now what do you do with these tools? In layman's terms, we have a database with lots of spreadsheets in it. To interact with this database, we need to talk to it. We do this task by writing queries.
To learn basic (but powerful!) queries you only need a few commands in SQL. I'll (attempt to) teach you these queries by teaching you what each term means (although most are self-explanatory), giving you a code, and showing you the results. You can copy-paste my code but I recommend typing it just so you get the hang of it.
The first of the commands we need is SELECT. SELECT is a very flexible command that will start almost all of your first queries. You tell the system what you want it to return. FROM is the second most basic command; it tells the computer where to search for what you’ve selected.
Open SQLyog. Browse through some of the tables you now have. Go to batting. It should have a bunch of columns named "playerid, yearid, stint…" Now we want to write a query. Go to the box in the top right where it says "1". If we simply want the database to return the entire batting table, we type:
SELECT *
FROM BATTING;
Click "F9". This will execute the query. The asterisk tells the computer to select everything. The semicolon tells the computer that the query is finished. The query should display the entire batting table, which is too large to reproduce here.
Now, if we wanted to narrow our table and have only the specific columns playerid, yearid, and HR, we would write:
SELECT playerid, yearid, HR FROM BATTING;
The result:
| playerid | yearid | HR |
| abercda01 | 1871 | 0 |
| addybo01 | 1871 | 0 |
| …. | …. | …. |
That's much more manageable if still pretty useless. Next we have the commands AND and WHERE. WHERE allows us to be more specific with our requests. AND combines requests:
SELECT playerid, yearid, HR
FROM BATTING
WHERE teamid= 'oak' AND yearid = 2009;
You must put the OAK in single quotes. You can copy-paste this command into SQL if you want, but typing it out will help you learn it better. This command returns a list of all the 2009 A’s and their corresponding HRs:
| playerid | yearid | HR |
| cahiltr01 | 2009 | 0 |
| outmajo01 | 2009 | 0 |
| cunniaa01 | 2009 | 1 |
| mazzavi01 | 2009 | 0 |
| ..... |
..... |
.... |
However, we don't really care that Trevor Cahill hit 0 HRs. It would be much nicer to the eyes if we could order them. The command for this request is simply ORDER BY:
SELECT playerid, yearid, HR
FROM BATTING
WHERE teamid= 'oak' AND yearid = 2009
ORDER BY HR DESC;
The ORDER BY command defaults to ordering ascendingly. Adding DESC makes the table prettier to us, as we normally think of the A’s HR leaders in descending order.
Here's what the query gives:
| playerid | yearid | HR |
| custja01 | 2009 | 25 |
| suzukku01 | 2009 | 15 |
| hollima01 | 2009 | 11 |
| giambja01 | 2009 | 11 |
| kennead01 | 2009 | 11 |
| ...... | ...... |
...... |
The next command that will take a little work to master is joins. There are multiple ways to join, but I’ll explain the way that makes the most sense to me. The point of a join is to combine certain statistics from different tables. For example, if we wanted a player’s actual name instead of his player id we would do this:
SELECT master.nameLast, master.nameFirst, batting.yearid, batting.HR
FROM batting, master
WHERE master.playerid=batting.playerid
AND teamid='oak' AND yearid = 2009
ORDER BY HR DESC;
Players’ actual names are stored under the "master" table, while the "batting" table only carries their playerids. However these playerids are also stored in the master table. Using the command
WHERE master.playerid=batting.playerid
allows us to combine multiple columns within the master table as well as the batting table.
Here is the result of the final query:
| nameLast | nameFirst | yearid | HR |
| Cust | Jack | 2009 | 25 |
| Suzuki | Kurt | 2009 | 15 |
| Holliday | Matt | 2009 | 11 |
| Giambi | Jason | 2009 | 11 |
| ...... | ....... | ....... | ...... |
A couple more functions and their use:
AS: Allows you to use a formula and renames the column
SELECT playerid, yearid, AB, H/AB AS AVG FROM BATTING
WHERE teamid= 'oak' AND yearid = 2009 ORDER BY AVG DESC;
| playerid | yearid | AB | AVG |
| gonzaed01 | 2009 | 2 | 1 |
| cahiltr01 | 2009 | 2 | 0.5 |
| davisra01 | 2009 | 390 | 0.3051 |
| sweenry01 | 2009 | 484 | 0.2934 |
| kennead01 | 2009 | 529 | 0.2892 |
| ....... |
........ |
....... |
........ |
HAVING: Allows one to set a limitation on a group.
SELECT playerid, yearid, AB, H/AB AS AVG
FROM BATTING
WHERE teamid= 'oak' AND yearid = 2009
HAVING AB>=50
ORDER BY AVG DESC;
| playerid | yearid |
AB | AVG |
| davisra01 | 2009 | 390 | 0.3051 |
| sweenry01 | 2009 | 484 | 0.2934 |
| kennead01 | 2009 | 529 | 0.2892 |
| patteer01 | 2009 | 94 | 0.2872 |
| hollima01 | 2009 | 346 | 0.2861 |
| ......... |
......... |
....... |
....... |
GROUP BY, SUM: When combined with a SUM command, GROUP BY combines a player’s stats when he is in your query more than once. If a player changes teams in a single year, this will combine his stats for each team and give you his year-long stats.
SELECT playerid, SUM(HR) AS HR, SUM(RBI) AS RBI FROM batting WHERE playerid='hollima01' AND yearid=2009 GROUP BY playerid;
| playerid | HR | RBI |
| hollima01 | 24 | 109 |
With just these simple commands, you should now be able to run a bunch of basic queries. With these you can accomplish a ton.
For example, every Oakland A All-Star:
SELECT m.nameLast, m.nameFirst, b.yearid, b.teamid
FROM master m, allstar a, batting b
WHERE a.playerid=m.playerid AND b.playerid=m.playerid
AND b.yearid=a.yearid
HAVING b.teamid='oak'
ORDER BY a.yearid DESC;
| nameLast | nameFirst | yearid | teamid |
| Bailey | Andrew | 2009 | OAK |
| Duchscherer | Justin | 2008 | OAK |
| Haren | Danny | 2007 | OAK |
| Zito | Barry | 2006 | OAK |
| Duchscherer | Justin | 2005 | OAK |
| Hudson | Tim | 2004 | OAK |
| Mulder | Mark | 2004 | OAK |
| Foulke | Keith | 2003 | OAK |
| Hernandez | Ramon | 2003 | OAK |
| Mulder | Mark | 2003 | OAK |
| ..... |
..... |
...... |
....... |
Every playoff series the A's have ever won in any city:
SELECT yearid, round, teamidloser, wins, losses, ties
FROM seriespost
WHERE teamidwinner IN('OAK','PHA','KCA');
EDIT: 'KCA' in the above command should be 'KC1'.
Result:
| yearID | round | teamidloser | wins | losses |
| 1910 | WS | CHN | 4 | 1 |
| 1911 | WS | NY1 | 4 | 2 |
| 1913 | WS | NY1 | 4 | 1 |
| 1929 | WS | CHN | 4 | 1 |
| 1930 | WS | SLN | 4 | 2 |
| 1972 | ALCS | DET | 3 | 2 |
| 1972 | WS | CIN | 4 | 3 |
| 1973 | ALCS | BAL | 3 | 2 |
| 1973 | WS | NYN | 4 | 3 |
| 1974 | ALCS | BAL | 3 | 1 |
| 1974 | WS | LAN | 4 | 1 |
| 1981 | AWDIV | KCA | 3 | 0 |
| 1988 | ALCS | BOS | 4 | 0 |
| 1989 | ALCS | TOR | 4 | 1 |
| 1989 | WS | SFN | 4 | 0 |
| 1990 | ALCS | BOS | 4 | 0 |
| 2006 | ALDS2 | MIN | 3 | 0 |
HOFers ranked by the percentage they were voted in by:
SELECT m.namefirst, m.namelast, h.ballots, h.votes, h.votes/h.ballots AS PCTG FROM hofold h, MASTER m WHERE m.hofid=h.hofid AND h.ballots IS NOT NULL ORDER BY PCTG DESC;
Result:
| nameFirst | nameLast | ballots | votes | PCTG |
| Tom | Seaver | 430 | 425 | 0.9884 |
| Nolan | Ryan | 497 | 491 | 0.9879 |
| Cal | Ripken | 545 | 537 | 0.9853 |
| Ty | Cobb | 226 | 222 | 0.9823 |
| George | Brett | 497 | 488 | 0.9819 |
| …………. | …………. | …………. | …………. | …………. |
Career HR leaders as Oakland A’s:
SELECT playerid, SUM(HR) AS HR FROM batting WHERE teamid='oak' GROUP BY playerid ORDER BY HR DESC LIMIT 5;
Here's what it will look like:
| playerid |
HR |
| mcgwima01 | 363 |
| jacksre01 | 268 |
| cansejo01 | 254 |
| chaveer01 | 229 |
| giambja01 | 198 |
These examples are only the tip of the iceberg. One can link SQL to R or Access, not to mention the other databases we can play around with (including Retrosheet and pitchFX). If there’s interest I’ll give tutorials on more stuff we can do with SQL including building Marcels, heat maps, and developing linear weights.
If you want to practive, here’s what should be almost a review in the form of a SQL tutorial. If you want to get ahead, a couple of glances at connecting SQL to the pitchFX databaseand to R. Here's one that talks about installing it for Retrosheet.
Use this thread (all one or two of you who actually want to try SQL) as a place to post queries that you can’t figure out or tips you may pick up. You can ask me questions and I’ll do my best to answer them. Remember we’re all beginners here and no question is stupid.
64 comments
|
10 recs |
Do you like this story?
Comments
This is too long to read
and I am 100% certain I will not understand it…
god damn humanities classes…
… and therein lies the tension between discouraging free rodent upgrades and relying on random walkup business. LB, you should tarp your kitchen. And move to San Jose. -mb
by Leopold Bloom on Feb 21, 2010 11:41 PM PST reply actions 1 recs
Oooooh.
SQL is one of those sabermetric tools I’ve not yet touched in the slightest (all of my PitchFX work is compiled by hand from the Gameday XMLs and worked up in Excel). I’ll keep this fanpost around to mess around with when I can find the time. I’ve had a couple ideas floating around for mini-pieces of research, but no realistic way of obtaining the massive amounts of data needed.
Great stuff, vignette.
Always the summers are slipping away.
Find me a way for making it stay.
>:-(
Unless you’re talking about the android, data is not massive!
There are more things in heaven and earth, Horatio, Than are dreamt of in your philosophy.
Hmm
A google search for “massive * data” turns up over 342,000,000 results.
That’s some pretty widespread usage.
The M-W online dictionary gives the following definitions (among others):
-impressively large or ponderous(emphasis added)
-large in scope or degree <the feeling of frustration, of being ineffectual, is massive — David Halberstam>
If a feeling can be massive, surely data can be, as well.
/linguist’d
m*****f***ing c***s***ing peanut butter and jelly!! f*** f*** f***!!!
I am well aware that the entire world is wrong.
You know me well to cite my favorite dictionary, though. I am a M-W partisan.
I am gradually coming to accept “miniscule”, but that’s just spelling. I still mourn that every adjective that gains popularity as a magnifier must soon be homogenized and rendered devoid of any unique meaning, like yesterday’s hit boy band.
There are more things in heaven and earth, Horatio, Than are dreamt of in your philosophy.
Just be glad you're not a time traveller from the Middle Ages
Think about how pissed you’d be with what they did to language compared to then.
Leopold Bloom on why he loves Mr. Peter Gammons, his best buddy:
"Peter Gammons systematically ignored and/or ran down the A’s in the pages of Sports Illustrated and The Sporting News for a good ten year stretch in the late seventies and early to mid-eighties. Trust me, the c**ksucker hates our team."
I like to think that with database queries
You start with all the data. You’re just applying the filters to get to the data you need.
100% Athletics, 100% Baseball. 2009 Athletics, 40% Baseball.
This is awesome!
One thing, though. You linked to the 2008 Baseball Databank database, which only has data through 2008, which makes many of your sample queries not work (anything with “yearid = 2009”) . The newer 2009 database is here.
Thanks again for the great intro!
The artist formerly known as HigherPie.
Right you are
It’s now fixed.
"Loyal? I'm the most loyal player money can buy." - Don Sutton
Fluency in SQL
is a useful skill to have. Even in this down economy, I think learning SQL does more for your employability than several college courses.
You left out an option on the poll: This looks like fun, and I wish I had time to spare playing around with databases. But, you know, there are so many other things to do….
There are more things in heaven and earth, Horatio, Than are dreamt of in your philosophy.
I second the poll option
I do database work regularly, just never queried baseball data before :(
100% Athletics, 100% Baseball. 2009 Athletics, 40% Baseball.
For my sins, I've been using MySQL for work for a number of years now
it’s super fun, and super useful. Though, I am kinda loath to start using it to analyse baseball, as I don’t especially want me work life bleeding into my non work life
I suspect that you think tilting at windmills means something other than what it does.
The ninth fastest thirty year old in San Francisco
I'd agree with this
The ability to harness large amounts of data and efficiently parse said data is useful across fields.
By the way, if you’re using linux, postgreSQL is a pretty solid free client.
Most colleges offer courses in which SQL is taught in some form or another.
You are correct though, it is certainly a sought after skill.
by Pucking Insane on Feb 25, 2010 11:14 PM PST up reply actions
It might take me a week to read, but nice work.
By the way, the playoff thingy added in a couple of series’ won by the KC Royals.
I'm here to talk about the past.
You mean we didn't win the 1985 World Series?
After looking at it, apparently, KCA does not refer to the Kansas City Athletics. KC1 does. Changing KCA to KC1 changes the result to eliminate the 1980 series against NYA and the two 1985 series.
"Loyal? I'm the most loyal player money can buy." - Don Sutton
Who knows of any good Mac-based SQL solutions out there?
I have often thought that it might be easier just to do it myself than look at Fangraphs.
"Do I talk to myself? No, I just remind myself of what I'm trying to do. You know, I never answer myself so how can I be talking to myself?" - Rickey
Go look at Beyond the Boxscore
They had a whole series called Saberizing A Mac. That may be helpful.
"Loyal? I'm the most loyal player money can buy." - Don Sutton
by vignette17 on Feb 22, 2010 2:44 PM PST via mobile up reply actions
I'll wait for the SQL.
I like Cindi. A. She never pretends to know more than she does. B. She has unbridled enthusiasm for her "Hotties," and isn't afraid to show it. -IM4Oakgal
MySQL - this time it's personal
I suspect that you think tilting at windmills means something other than what it does.
The ninth fastest thirty year old in San Francisco
Well played sir. We played.
Leopold Bloom on why he loves Mr. Peter Gammons, his best buddy:
"Peter Gammons systematically ignored and/or ran down the A’s in the pages of Sports Illustrated and The Sporting News for a good ten year stretch in the late seventies and early to mid-eighties. Trust me, the c**ksucker hates our team."
Poorly player sir. Poorly played.
Leopold Bloom on why he loves Mr. Peter Gammons, his best buddy:
"Peter Gammons systematically ignored and/or ran down the A’s in the pages of Sports Illustrated and The Sporting News for a good ten year stretch in the late seventies and early to mid-eighties. Trust me, the c**ksucker hates our team."
I'd like to say that's in the top ten of the nerdiest jokes I've ever told
but truthfully, it’s not really even close
I suspect that you think tilting at windmills means something other than what it does.
The ninth fastest thirty year old in San Francisco
What's the best way to create and populate a database from scratch?
Let’s say i want to make my own database, or add my own table to the BDB database? For instance, let’s say I went to every A’s game this year, and counted ground balls and fly balls of every A’s batter. I’ve got my raw data in Excell, or even tab-separated text or something.
I read about the INSERT INTO statement, but that looks pretty tedious, writing out
INSERT INTO batters (playerid, date, GB, FB)
VALUES (anderbr04, 052310, 2, 0)
or whatever. Is there a better way to get raw data into a database without creating it line by line?
This is my first go at any of this, and I’ll probably go buy a book or something, but I’m just playing around and looking to see if I can use this with other data that I have.
Stewart: "What really needs to be clear is it wouldn't have mattered if there was an earthquake or not. We were going to beat the Giants.
Importing it probably
I haven’t looked at the program he suggested, but you may be able to import it by saving the excel file as a csv or dbf.
Leopold Bloom on why he loves Mr. Peter Gammons, his best buddy:
"Peter Gammons systematically ignored and/or ran down the A’s in the pages of Sports Illustrated and The Sporting News for a good ten year stretch in the late seventies and early to mid-eighties. Trust me, the c**ksucker hates our team."
If you know any programming languages
You could do it that way. We populate databases using C at work. I’m sure it wouldn’t be too hard using a scripting tool like perl or python either.
I haven't really investigated but....
It looks like you may be able to load data in through the LOAD LOCAL command. This can be reached by right-clicking on a table →Import CSV using LOADLOCAL. You may have to mess with it a bit.
Also, this post looks like it addresses this issue but for a mac.
"Loyal? I'm the most loyal player money can buy." - Don Sutton
Yeah, so
You can set up Excel so you have a series of commands (exactly how you do this will depend on which version of whatnot you’re using). For your example up above, I’d create a fifth column in your excel spreadsheet that has a formula something like this:
=“Insert Into batters (playerid, date, GB, FB) Values (”&A2&", “&B2&”, “&C2&”, “&D2&”)"
and then populate that down your data series.
Next, select the E column that you’ve populated with the above formula. Here’s where versions diverge, again – you might need to paste the values into another excel column (copy, then right click, then paste special – values only). You might be able to do this directly into your notepad page / interface.
you’ll have a nice list of populated commands. Hopefully?
here’s one I did a while ago; you should end up with a list like this:
INSERT INTO S.LABELS_ARTISTS_TEMP (LIST_ORDER, LABEL_NAME, LABEL_ID, ARTIST_NAME) VALUES (1,‘Fing Records’,23750,‘email@records.com’);
INSERT INTO S.LABELS_ARTISTS_TEMP (LIST_ORDER, LABEL_NAME, LABEL_ID, ARTIST_NAME) VALUES (3,‘G K’,24344,‘greg@gk.com’);
and so on and so forth.
I suspect that you think tilting at windmills means something other than what it does.
The ninth fastest thirty year old in San Francisco
I should add, this probably isn't the easiest way of doing it, necessarily
but I’m not a great programmer, so I prefer to use Excel (which I do know from top to bottom) to do all the work, and then MySql to do analysis
I suspect that you think tilting at windmills means something other than what it does.
The ninth fastest thirty year old in San Francisco
Or as Excel nerds say,
you know it from A1 to IV65536.
There are more things in heaven and earth, Horatio, Than are dreamt of in your philosophy.
nice.
I shall start saying this now
I suspect that you think tilting at windmills means something other than what it does.
The ninth fastest thirty year old in San Francisco
This makes some sense to me...
so it’s refreshing compared to some of the other stuff I’ve been reading lately. Thanks!
Stewart: "What really needs to be clear is it wouldn't have mattered if there was an earthquake or not. We were going to beat the Giants.
Cool; good luck with it!
I suspect that you think tilting at windmills means something other than what it does.
The ninth fastest thirty year old in San Francisco
Quick question for you though...
Since I can’t figure out how to word this for a web search…
My A column is the date. In Excel, I can format the A cells to contain dates instead of straight numbers.Then I put your formula in column C. When I reference the date cell in the formula:
VALUES (“&A2&”,“&B2&”)
Excel pulls in some other number instead of the date. For instance, cell A2 contains the date 11/01/2003, and the cell B2 contains the regular ol’ number 25, but in the C2 cell it’s shown as
37926, 25instead of
11/1/2003,25
I can’t figure out how to get the formula to display the date when it references the A column, but a regular number when it references the B column. Does that make sense?
Stewart: "What really needs to be clear is it wouldn't have mattered if there was an earthquake or not. We were going to beat the Giants.
try this:
select column C, and then right click—> format —> date and time
hopefully that should work?
I suspect that you think tilting at windmills means something other than what it does.
The ninth fastest thirty year old in San Francisco
There's a whole batch of excel functions for playing with dates
You need to use ’em.
"There's never enough time to do all the nothing you want" -Bill Watterson
Excellent
I’ve dabbled in SQL for several years (my company’s software uses SQL databases), but usually just stealing other people’s queries.
This is the easiest explanation of how to write a query I’ve seen! Hope they don’t block this site from work…
S-Q-L
SQL (pronounced sequel)
The recommended pronunciation is actually “es-kyoo-el.” See:
http://en.wikipedia.org/wiki/SQL
and:
I actually pronounce it that way
I had heard someone say “sequel” and thought they were right. Overall, I don’t think it matters much. Some people call wOBA “wOH-bah” others “W-OH-BEE-EH”. I don’t think anyone really cares.
"Loyal? I'm the most loyal player money can buy." - Don Sutton
I've always used as S-Q-L when I discuss SQL statements
but My-Sequel when discussing MySQL. But yeah, po-tato pah-tato
Leopold Bloom on why he loves Mr. Peter Gammons, his best buddy:
"Peter Gammons systematically ignored and/or ran down the A’s in the pages of Sports Illustrated and The Sporting News for a good ten year stretch in the late seventies and early to mid-eighties. Trust me, the c**ksucker hates our team."
You would not hire me.
My experience is the same as Vignette’s. I always said ess-cue-ell, but when I read this fanpost I assumed I had been wrong.
There are more things in heaven and earth, Horatio, Than are dreamt of in your philosophy.
This is what I do, too
I suspect that you think tilting at windmills means something other than what it does.
The ninth fastest thirty year old in San Francisco
Remember the old days
when we just picked up a glove and rounded up our buddies and played a game until someone knocked out the neighbor’s window?
Got a formula for that? :)
"Tonto think Billy Beane need to make team full of squirrels and bears."
GO REED SUM SPREDSHEETZ, NERDZ
I suspect that you think tilting at windmills means something other than what it does.
The ninth fastest thirty year old in San Francisco
But, the Web addresses your fundamental problemo
I want the ability to calculate things that aren’t available at Fangraphs, using tons of data.
It would be really cool for a community to build a RESTful API to all these crazy baseball statistics databases. I don’t follow much about baseball statistics, but I’m amazed at how many replicant data sources there are from Fangraphs, to STATS, SABR, Baseball Cube, etc.
This one seems close to providing actual HTTP endpoints. But, it would be rather easy to extend the free and open source baseball databank’s MySQL database and allow some killer real-time mashups.
"That’s Chavey; he’s a good athlete. He can play anywhere … except second base. He’s not that good." -M. Ellis
Bandwidth and server resources
Also, letting the public play with your database directly is a terrible idea. Controlling how they do it, without losing flexibility, would be difficult.
"There's never enough time to do all the nothing you want" -Bill Watterson
Fair statement...
in 2002. with web services, your not playing with the data, you’re just accessing it through an API to create mashups, right?
"That’s Chavey; he’s a good athlete. He can play anywhere … except second base. He’s not that good." -M. Ellis
You start it
And let me know what the bandwidth cost is
"There's never enough time to do all the nothing you want" -Bill Watterson
not sure why you're so blue?
Bandwidth and Storage are nearly free these days…
Costs for Amazon Web Services Simple Storage.
Pricing
First 50 TB / Month of Storage Used
$0.150 per GB
All Data Transfer In
Free until June 30th, 2010*
PUT, COPY, POST, or LIST
$0.01 per 1,000 Requests
I thought it was a good idea. Sorry if you disagree.
"That’s Chavey; he’s a good athlete. He can play anywhere … except second base. He’s not that good." -M. Ellis
That price seems nice on the surface especially for a small company database.
But if you were to consolidate the big databases and became the source for this sort of info gathering, for each time someone calls to the DB, that’s going to add up mighty quickly. You’re talking the 100k+ worth of people or more on SBN, people coming into it from the outside because you’re a great source, unless you’re turning the place into a pay site, you may potentially wind up with a pretty big cost on your hands.
Leopold Bloom on why he loves Mr. Peter Gammons, his best buddy:
"Peter Gammons systematically ignored and/or ran down the A’s in the pages of Sports Illustrated and The Sporting News for a good ten year stretch in the late seventies and early to mid-eighties. Trust me, the c**ksucker hates our team."
Question on how to handle midseason trades.
I was playing around with the BDB, and wanted to aggregate the stats that one Matt Halladay produced over the course of a full season. His stints in Oakland ans St. Louis are split (rightfully so), but as I said, I’m searching for his totals. I think if involves the ‘join’ function, but I am stuck at the moment. Has anyone else encountered this.
By the way, fantastic fanpost. Very refreshing and useful, to say the least.
by Pucking Insane on Feb 25, 2010 11:17 PM PST reply actions
I have your exact example in the FP actually
Here is the query again:
SELECT playerid, SUM AS HR, SUM AS RBI
FROM batting
WHERE playerid=‘hollima01’ AND yearid=2009
GROUP BY playerid;
You don’t have to use joins as the SUM and GROUP BY functions do it for you.
"Loyal? I'm the most loyal player money can buy." - Don Sutton
What the? That should say SUM(HR) AS HR and SUM(RBI) AS RBI, but the parens got deleted....
"Loyal? I'm the most loyal player money can buy." - Don Sutton
That's a weird one I hadn't seen before
I get the same result on Preview.
But in comments you should be using <pre> for code anyway. As far as I can tell, AN will leave anything in <pre> unscathed, though you do have to watch out for your own line-ends.
There are more things in heaven and earth, Horatio, Than are dreamt of in your philosophy.
I did use the tags inside a blockquote....
"Loyal? I'm the most loyal player money can buy." - Don Sutton
Hmm...SBN doesn't like < pre > in subject lines
I thought it simply ignored subject lines. It normally does. And I thought I was finally understanding SBN’s autoformatting….
"Loyal? I'm the most loyal player money can buy." - Don Sutton
Well, this is mighty strange.
SUM(HR) *bold* _italic_
If the blockquote starts the comment, stuff inside a PRE inside the blockquote is preserved. If the blockquote follows any text, then AN tries to format it.
SUM bold italic
I guess the solution is that if you really want PRE you have to do it without a blockquote. That sucks.
There are more things in heaven and earth, Horatio, Than are dreamt of in your philosophy.
autoformat strikes again
Leopold Bloom on why he loves Mr. Peter Gammons, his best buddy:
"Peter Gammons systematically ignored and/or ran down the A’s in the pages of Sports Illustrated and The Sporting News for a good ten year stretch in the late seventies and early to mid-eighties. Trust me, the c**ksucker hates our team."

by 

























