Baseball Analytics

These are some notes to myself on a baseball presentation I’m doing in a couple of weeks. Actually, I’ve been doing it pretty much constantly since last summer, just to different groups at work. I’m just trying to capture my thought process and the issues I’ve seen while developing a baseball analytical model that finds the factors most critical in a home team victory. As always, comments are welcome.

I had to relearn some basic statistics for a work assignment on analytics – which is statistics on steroids – so I thought, what better subject than baseball? There’s tons of data available, there are statistics galore, and there are all sorts of things you can try to predict.

There are two issues that I’ve now come across while building a lab exercise based on game results: the results tend to make baseball people say, “Well, duh!”, and, more frighteningly, not all people understand baseball. WTF?

The “Well, duh!” part is actually a good thing – it means the statistical results may actually be correct, or at least believable in the baseball universe (this is called domain knowledge when you’re doing analytics). Domain knowledge is what allows you to know that if you’re trying to predict the home team’s score, using the home team’s RBI totals is probably cheating – since the numbers are nearly the same.
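A quick way to catch this kind of cheating before a model does is to check how closely each candidate input tracks the value you’re predicting. Here’s a minimal Python sketch, with made-up numbers for six games and an arbitrary 0.95 cutoff (both are assumptions for illustration, not anything from SPSS Modeler). Runs and RBIs aren’t identical, since a run can score on an error or a wild pitch without an RBI, but they’re close enough to get flagged:

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def flag_leaky_features(features, target, threshold=0.95):
    """Names of features whose |correlation| with the target exceeds threshold."""
    return [name for name, values in features.items()
            if abs(pearson(values, target)) > threshold]


# Made-up numbers for six games (hypothetical, for illustration only).
home_runs_scored = [5, 2, 7, 3, 0, 4]
features = {
    "home_rbi": [5, 2, 6, 3, 0, 4],    # one run scored without an RBI
    "home_walks": [2, 4, 3, 1, 5, 2],
}
print(flag_leaky_features(features, home_runs_scored))  # ['home_rbi']
```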

That’s where the second problem comes in – not everyone knows what an RBI is. WTF? People know Honey Boo-Boo’s waist size and they don’t know how runs batted in are counted or what they mean?

So, now I have a presentation coming up and I expected to have to explain how the basic statistical model works, which is giving me enough heartburn. I’m using IBM SPSS Modeler (a really fun toy if it’s from work or a really sophisticated analytics platform if you’re trying to get your boss to buy it) to build a model based on MLB games from 2000-2012. (That’s a lot of games – I think there are over 24,000 records in the dataset. However, most analytics models would have more records than that.) The model looks at the factors that influence a home victory – which basically means the home team’s score is greater than the visiting team’s. (Well, duh!)

This is a major advantage of baseball – there are no ties, unless you have an idiot commissioner and the managers run out of pitchers. In the real universe, somebody is going to win. So, you can predict (or at least try to predict) victories.
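Because there are no ties, the thing being predicted is a simple yes/no label. A sketch of how that label could be derived from a game record, assuming hypothetical `home_score` and `visitor_score` fields (a real game log will have its own column layout). Any tie that slips into the data raises an error, so it gets dropped on purpose rather than silently labeled a loss:

```python
def home_victory(game):
    """Label a game 1 for a home win, 0 for a visitor win.

    The "home_score"/"visitor_score" keys are hypothetical field names.
    """
    home, visitor = game["home_score"], game["visitor_score"]
    if home == visitor:
        # A tie shouldn't happen, so flag it instead of calling it a loss.
        raise ValueError(f"tied game in data: {game}")
    return 1 if home > visitor else 0


games = [
    {"home_score": 7, "visitor_score": 6},  # walk-off win
    {"home_score": 0, "visitor_score": 1},  # 1-0 road win
]
print([home_victory(g) for g in games])  # [1, 0]
```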

The other advantage is that baseball is a logical game of progression – you’re not going to have an interception, for example, and you can’t run out the clock. Each team gets a fixed number of outs per inning, and each batter gets a limited number of balls and strikes. The total number of pitches may vary, but three strikes and you’re out (this is the origin of that phrase, in case you really don’t know baseball).

So, I will have to go over the basics of linear regression – trying to predict one value based on one or more other values, and then go over the baseball terms to explain why they are important. Oh, and explain SPSS Modeler to an audience that has never seen it.
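For the “predict one value from another” idea, here’s the simplest case: ordinary least squares with a single input, written out in plain Python. The data is invented (home hits versus home runs scored) and chosen to fall on a straight line so the answer is easy to check. Since a home victory is a yes/no outcome, a real model would more typically use logistic regression; this just shows the basic idea:

```python
def fit_line(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up data: home hits vs. home runs scored in five games.
hits = [6, 8, 10, 12, 14]
runs = [2, 3, 4, 5, 6]
a, b = fit_line(hits, runs)
print(f"runs = {a:.2f} + {b:.2f} * hits")  # runs = -1.00 + 0.50 * hits
```

In this toy data, the slope reads as “each extra hit is worth about half a run.”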

I really didn’t think I would have to cover all that.

It’s interesting – in the three or four years I’ve had AirHogs tickets, I’ve learned a lot about baseball, but I always knew the basics, so I assumed everyone did. My dad took me to one Rangers game that I can remember (David Clyde was pitching), and I actually never played – I played softball in a corporate league in my thirties (I was a pitcher), but I still knew the basics. Now, we have a generation that doesn’t necessarily know. Oops.

For the record, given the games from 2000-2012 (thank you, www.retrosheet.org), the most important factors in predicting a home victory are:

  • The number of hits by the visiting team
  • The number of hits by the home team
  • The number of visitors’ walks
  • The number of home walks
  • and some other factors (errors, home runs) which have much less impact
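To show roughly how a model can rank factors like these, here’s a from-scratch logistic regression (one common technique for yes/no outcomes) trained by gradient descent on a handful of invented games. It’s a stand-in, not the SPSS Modeler stream behind the results above, and the data is made up. Each toy game also appears with the teams swapped, which keeps the tiny dataset balanced but also erases home-field advantage, so the bias term stays near zero:

```python
import math

FEATURES = ["visitor_hits", "home_hits", "visitor_walks", "home_walks"]

# Made-up games: (visitor_hits, home_hits, visitor_walks, home_walks, home_win)
GAMES = [
    (6, 10, 2, 4, 1),
    (9, 7, 4, 2, 0),
    (5, 8, 3, 3, 1),
    (11, 6, 3, 1, 0),
    (7, 9, 1, 3, 1),
    (8, 8, 5, 2, 0),
]


def mirrored(games):
    """Add each game with the teams swapped (removes any home-field bias)."""
    out = list(games)
    for vh, hh, vw, hw, win in games:
        out.append((hh, vh, hw, vw, 1 - win))
    return out


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(rows, lr=0.01, epochs=2000):
    """Batch gradient descent on average log-loss; returns (weights, bias)."""
    w = [0.0] * len(FEATURES)
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(FEATURES)
        grad_b = 0.0
        for *x, y in rows:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # derivative of log-loss w.r.t. the linear score
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


weights, bias = fit_logistic(mirrored(GAMES))
for name, coef in zip(FEATURES, weights):
    print(f"{name:>14}: {coef:+.3f}")
```

The signs come out the way a baseball person would expect: home hits and walks push toward a home win, while visitors’ hits and walks push against it. Coefficients in raw units aren’t a clean importance ranking on their own; it’s common to standardize the inputs first or use a separate importance measure.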

The interesting part about this exercise to me has been realizing how important domain knowledge really is. If you don’t know much about baseball, you won’t look at the factors and think, “Wow, pitching is pretty important.” Now, to baseball people, that’s obvious, but to a fan who is used to someone swinging for the fences, it may not be obvious that the visitors are swinging for the fences, as well – and stopping them is an important part of the game.

If you watch the movie Moneyball, it begins with the “epic struggle” between the statistics nerd and the old-school “just have a feelin’ about him” scouts. However, I think they are basically very similar – the statistics tend to prove what old-school baseball people take as gospel (except for the “dating the hot chick” theory) – they just don’t know why they know it. Also, the statistics and analytics may prove that some of the gospel is wrong – which is the premise of Moneyball in the first place.

Now, if you don’t care about baseball, then none of this is very meaningful, because the results are just gibberish. However, these lessons apply to business as well – if you are running a business and making decisions based on hunches, analytics can show whether the hunches are correct or not. Maybe you’re right – in which case, you know your business well. If you’re not, either you’re in the wrong business, or you need to do research before making decisions, and not just guess.

In fact, from a modeling standpoint, building a model to look at baseball is not much different from building a model to check credit scores to approve credit card applications. The only things that change are the domain knowledge and the actual data.

The reason I picked baseball in the first place was that almost all of the analytical models I had seen built were for mobile phone churn (customers leaving for other carriers) or banking – what happens if you’re not in either of those industries? So, I assumed baseball was a universal subject that people would have some idea about. That may have been an incorrect assumption – but I’d rather explain baseball to a crowd of people than the mobile phone industry.

Reverse Auction

Maybe ticket prices should be set on a sliding scale, tied to the number of pitches thrown. If you have a good pitcher on a good night, you pay less, since you spend less time at the ballpark. If he gets shelled, it costs you more. Attendance would eventually be based on the starting pitcher, and there would be more incentive to have a quality pitching staff.
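If anyone wanted to try it, the pricing rule could be as simple as a base price plus or minus a fixed amount per pitch relative to a typical game. Every number here (the $14 base, a 280-pitch baseline, five cents per pitch, a $5 floor) is a made-up assumption for illustration:

```python
def ticket_price(total_pitches, base_price=14.00, baseline=280,
                 per_pitch=0.05, floor=5.00):
    """Hypothetical sliding-scale price: cheaper for quick games, pricier for long ones.

    All default values are made-up assumptions, not real team pricing.
    """
    price = base_price + per_pitch * (total_pitches - baseline)
    return round(max(price, floor), 2)


print(ticket_price(240))  # 12.0 (a crisp pitchers' duel)
print(ticket_price(330))  # 16.5 (somebody got shelled)
```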

Random (Again)

With the heat wave we’re under, the stream of consciousness is almost dry, but here are some thoughts from the past week.

First, thank you to whoever first said “The heat hasn’t been this bad in Dallas since the NBA playoffs.” It’s not baseball, but it will annoy a few people I know from Miami.

Random fact – The starting pitcher chooses his team’s uniform for the game. We were actually told this about a month ago when my wife was sewing names on jerseys and asked which color jerseys needed to be done first. She was told, “We don’t know.” Apparently, the team doesn’t know until just before game time every day. I had just been alternating caps (red and black) at the beginning of the season, but it evidently was pure chance that I was usually wearing the same as the team. In fact, I was told most of the players don’t like the red hat. Who knew?

I asked one of the pitchers I know this week and he verified it. Since he’s not an AirHog this season (yet), it’s not just a Grand Prairie tradition. I always assumed there were home and away jerseys, but apparently, there are uniforms and the pitcher picks one (except on special jersey nights.)

The reason I brought this up is that Dallas has now had 33 days of 100+ degree temperatures this summer. The first pitcher who declares “shorts and t-shirts” as the uniform will be a hero for his team.

This also means Amarillo must only have one uniform because I can’t believe every one of their pitchers would choose that god-awful yellow jersey with the road-stripe pants every freakin’ night.

Random thought – I will never talk about someone’s hitting again, since it might affect the team’s won-lost record immediately after I publish it. I assumed it would have no effect, since about three people read this, and one of them is me, but I should have known. My apologies.

Random thought – I still haven’t decided when you can start talking about a magic number. I think once it’s 10 or below you can start talking about it. Of course, to know if you can talk about it, you have to calculate it, so if you figure it out [for the American Association, it’s 101 – (first place team’s wins) – (second place team’s losses)], and it’s more than ten, just keep it to yourself.
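The 101 corresponds to a 100-game schedule plus one, which matches the general formula: games in the season + 1 - first place team’s wins - second place team’s losses. A small sketch with the season length as a parameter and the result floored at zero once the race is clinched; the example standings are invented:

```python
def magic_number(leader_wins, runner_up_losses, season_games=100):
    """Games in season + 1 - leader's wins - runner-up's losses, floored at 0.

    season_games=100 gives the 101 used above; a result of 0 means clinched.
    """
    return max(0, season_games + 1 - leader_wins - runner_up_losses)


# Invented standings for illustration.
print(magic_number(leader_wins=55, runner_up_losses=38))  # 8
```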

Random Thought – We had an old fart (I can say that – I’m one) umpire the other night and everyone seemed to agree with his calls at the plate, or at least a much higher percentage of them than normal. The differences between him and other umps? He was older. He seemed more experienced. Most importantly, he made the call. You’re out. It’s a strike. Sit down. Shut up. If all the umpires had the same confidence level when making a call, there would be fewer hated umpires in the league. Maybe.

Random Thought – It’s 112 degrees in my back yard and almost time to head to QTP. I wonder how the players would feel about naked spectators?

You’re Outta Here!

There is an art to being thrown out of a baseball game. Before I started watching a lot of games at the park, I always thought it was pretty much the same – the umpire made a call, the manager came rushing out of the dugout to argue it, he got tossed. Now, I realize the real action is after the ejection, not before. First, the ejectee has a chance to make his case to remain in the game. This is usually replaced by a few choice comments about the umpire’s eyesight, upbringing or other attributes. Then, there is the walk of shame – at QTP, it’s all the way down the third base line, into the outfield to the far corner of the field to the gate to the clubhouse.

This walk can take quite a long time. Former AirHogs manager Pete Incaviglia would take a tremendous amount of time. It was his evening constitutional. Then, Pete would get so distracted thinking about what he had done (and I’m sure feeling remorse) that he would often leave the gate open. Unfortunately, the game could not proceed until the gate was closed. The umpires would tell the nearest AirHog to close the gate, but the players work for the manager, not the umpire. Eventually, someone would close the gate. Eventually.

There are actually rules about when someone can get tossed – theoretically, you can say anything about the call (“that was a horseshit call”), but not about the umpire (“you’re a horseshit ump”). Ultimately, it’s the umpire’s decision, so like many decisions, ejections will be questioned, as in a couple of cases below.

Here are the three tosses from the past week and a half or so, two of which were in the same game –

Ricky VanAsselberg, AirHogs manager. The plate umpire had a strike zone that moved more than a popcorn kernel in hot oil. The batters were swinging defensively, and our pitcher was getting really flustered. Ricky headed to the mound to calm the pitcher down. After the usual pause, the home plate umpire waddled up to the mound to break it up. Ricky kept talking. Then, Ricky started discussing something with the umpire. Then, the hook. I’ve been meaning to ask Ricky what he said, but I’m sure it was something to the effect that if the umpire knew how to keep a strike zone consistent, Ricky wouldn’t be out there wasting time trying to calm down his pitcher. I’m sure it was reasoned and polite. Although, he did get ejected, so I’m pretty sure the term “horseshit” was used somewhere in the calm and reasoned discussion.

Antagonizing the umpire – Ricky stormed off towards home plate, and kicked dirt all over it. It was covered. This I find pretty funny, but it’s been done before. Then, he headed to the dugout to dump his equipment and put someone else in charge, since he was leaving. While this was happening, the umpire brushed off home plate.

Umpire’s Fatal Error – Home plate is between the dugout and the walk of shame. So, Ricky covered home plate in dirt. Again. His catcher was snickering as the umpire bent down to clean it off. Again. That was hilarious.

Mike Conroy, Wichita Wingnuts outfielder. Mike played with the AirHogs before, so I’ve met him a few times. He’s a very passionate guy, so you just stay out of the way, expect the usual outburst every now and then, and nobody will get hurt. He was up to bat, took a third strike, and dropped an F-bomb. It was “slightly” loud, since I could hear it from my seat by the first base dugout. Here’s the interesting part – he wasn’t actually challenging the call. He wasn’t even upset at the umpire, he was upset at himself. Then, the hook. WTF? What is he getting thrown out for?

Antagonizing the umpire – If you tell Mike Conroy “You’re outta here!”, I’m pretty sure he hears “Please tell me how you feel about my officiating, since you are now out of the game and can speak freely. Also, if you have any questions about the legitimacy of my birth or my mother’s alleged former hourly occupation, please feel free to discuss those as well.”

Umpire’s Fatal Error – After Mike discussed the call to his satisfaction and was storming towards the dugout, the umpire called for three more balls. He was signaling the bat boy, but Mike was happy to oblige. A handful of baseballs came raining out of the dugout. As Mike headed down the walk of shame, the bat boy was trying to gather them all up, and was having problems because he was laughing. If you can make someone laugh after causing them work, that’s a pretty good ejection.

Brian Rose, Wichita Bench Coach – I would have never known why Brian was thrown out, but we had lunch with him the next day, and it was being discussed, so I have the quote. Brian is the last person I would have expected to be thrown out of a ball game. He’s a bench coach – the voice of reason. Brian’s a calm guy. (He was the AirHogs’ bench coach before he moved to Wichita, so we’ve watched him on the field.) Still, a player had a called third strike (in the same game that had seen Mike tossed a few innings earlier), and the player questioned the call briefly, and then returned to the dugout. So, that was that, until the hook appeared. At first, I didn’t know who had been ejected. It turns out Brian had asked a very innocent question – “How many more of those are you going to get wrong?” Apparently, the umpire took offense. Ironically, this is probably a legitimate ejection, since Brian was questioning the umpire. (He was not alone in this, but you can’t do it out loud if you’re in uniform.)

Antagonizing the umpire – After he was ejected, I really don’t think Brian was antagonizing the umpire as much as blowing off steam after having the same crappy officiating for six days. If you are the bench coach of a team, living on the road while battling cancer (visit Brian’s page for more info), I really don’t think you need a 19-year-old working home plate, especially when he’s apologized for blowing calls before. It’s bad for your stress levels. That said, I believe Brian heard basically the same quote as Mike when he was ejected: “Say, since you’re leaving us, what do you think of my officiating? Do you have any constructive criticism for me?”

Umpire’s Fatal Error – You ejected Brian? He’s the one person keeping managers from killing you in the parking lot. That is not going to win you many karma points, dude.

Walk-Up Songs

Most ball players have a walk-up song – that song that plays as a batter approaches the plate or a pitcher approaches the mound. In fact, through the wonders of Google, I found I was not alone in considering the topic. Luckily, that article is well-organized, which makes up for this one.

Some random thoughts, then, on walk-up songs.

When you’re at the ballpark, if you have an Android or iPhone, you can get a great app called SoundHound to help you figure out what the songs actually are, assuming (like me) you’re older than the players by a generation and have no idea what that racket is these kids are listening to these days.

I think everyone should have a walk-up song, even if you’re not a ball player. Can you imagine a librarian wandering between the shelves while “Bleed It Out” blares over the speakers?

I want “Pictures of Matchstick Men” to start playing as I approach my computer in the mornings. I don’t know why that song came to mind, but the opening guitar riff would be a great walk-up. It would also scare the hell out of the dogs and the Spousal Unit, but that’s just a bonus.

Wouldn’t a walk-up song be an easy item to change if a hitter is slumping? The songs always seem constant throughout a season. Maybe it’s not your stance, maybe it’s not your swing. Maybe it’s just the wrong song. Perhaps Linkin Park would be a bit more motivating than Katy Perry, say. Of course, if you started changing walk-up songs regularly, this would require even more statistics – on-base percentage could be affected by the genre of the song, the sex of the singer and other musical variables. Eventually, there would be a statistician dedicated to choosing the right song based on the pitcher, the number of men on base, the number of outs, and so forth. In retrospect, maybe one song is enough. Work through the slump.

It would be interesting to discover what the royalty structure is when the team plays the various songs in public – I assume the park just pays ASCAP or BMI (or both) a flat fee since there is music playing almost constantly during some games, but if you weren’t happy with your salary structure, you could pick a really expensive walk-up song and then laugh inwardly every time you went up to bat.

If you’re a struggling musician, you should consider writing and recording a really loud metal or rap song called “See that Ump? Kill that Mutha.” It would probably get a lot of playtime during the spring and summer months.

When the umpires come out before the game, they really should play “Three Blind Mice”, at least until someone records “See that Ump? Kill that Mutha.”

My favorite comment about walk-up songs was the night a woman sitting behind me mentioned loudly that the opposing team’s songs all seemed to be (how to put this delicately) a bit less than manly. They were playing the usual suspects – “Sexy Lady”, “She’s A Lady”, and so forth. I then overheard her date gently explaining to her that if you’re from out of town, the press box picks your song for you – nobody actually asked for “She’s A Lady” to boom out over the speakers as he approached the plate. Perhaps somewhere there is a player so masculine that playing “I Am Woman” would be seen as ironic as he strode to the plate, but I doubt it.

If you chose “The Star-Spangled Banner” as your walk-up song, would the game start over every time you came up to bat?

Scattered Thoughts

I can’t believe hockey season went longer than basketball season – and they both go on too freakin’ long. Congratulations, Mavericks! Next year, try to close it out sooner.

On to more important sports.

Baseball can make anyone obsessive-compulsive about statistics. I was in Nashville for a customer meeting, and after my wife mentioned she was late for the AirHogs game, I thought “This is the South. There has to be a baseball game around here somewhere.” So, a couple of minutes with Google later, I found the Nashville Sounds – the Milwaukee Brewers’ Triple-A team, and they play a couple of miles from my hotel. As an added bonus, the Round Rock Express was in town, so I could see a Texas team, specifically a Texas Rangers affiliate.

Side note – parking $3, ticket $14, beer $6, beef brisket sandwich & fries $7. Total $30. I think that’s under my meal limit. 

This was a pitchers’ duel – the Express had three hits but couldn’t score any runs. The Sounds had one hit, but it was a home run, so they won 1-0.

I looked at the stats at the end of the game – Scott Feldman, the Express starting pitcher (on rehab assignment from the Rangers) went 5 innings, walked 2, struck out 5, gave up no hits. He only faced 17 batters and he only threw 73 pitches. (I was surprised he came out, actually.) Derek Hankins came on in relief and faced 7 batters. He didn’t walk anyone, struck out 4, got 2 to ground out (the six outs that made up his two innings of work), and gave up one hit – a home run. 24 pitches, 17 strikes … one over the fence. Beau Jones closed by getting three batters out – two ground-outs, one fly-out. Three up, three down. So, three pitchers, a one-hitter, a 1-0 loss.

For some reason, I’m now just obsessing about this. 4 out of 7 struck out. 57% strike-out rate. 17 strikes out of 24 pitches is 71%. 1 pitch out of 24 is 4%. Being 96% not bad is usually good, but not in baseball. Almost three-quarters of his pitches were strikes, but he still lost the game.
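Checking the arithmetic on Hankins’ line from the box score above, with each percentage rounded to the nearest whole number:

```python
def pct(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


# Derek Hankins' relief outing, using the numbers from the box score above.
batters_faced, strikeouts = 7, 4
pitches, strikes, home_run_pitches = 24, 17, 1

print(pct(strikeouts, batters_faced))              # 57
print(pct(strikes, pitches))                       # 71
print(pct(home_run_pitches, pitches))              # 4
print(pct(pitches - home_run_pitches, pitches))    # 96
```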

On the other hand, if three pitchers can limit your opponents to one hit, don’t you think somebody should score them some runs in support?

I am going to try to stop obsessing now.

The other thought wandering around my head lately has been whether a team is directed inward or outward, and whether that makes a difference to the fans. With the change in managers in Grand Prairie, the team seems much more focused on the game – not that they weren’t focused over the last few years, but it seemed like they were more accessible to the fans. Once the game started, that was it – it was heads down, back to work, but the rest of the time, they either chose to interact with the fans or were directed to do so.

It made being in the Booster Club fun, because the players were always around, and they recognized the booster club members.

This year, they’re off to a great start and they’re kicking the crap out of some of their opponents, but sometimes the fans almost seem to be an afterthought. They are circling the wagons and the team is in the center. While I do think it helps minimize the possible prima donna issues on the team, it means the team is looking inward and not outward.

I assume that a lot of the attitude trickles down from the management – do they see the team as family entertainment playing a game or as a unit that must win all the time? (A related question – is the manager supposed to be a baseball evangelist who draws fans to the park or a general waging war, assuming victory alone produces fans?)

So, a question I’ve been asking myself – Is it preferable to follow an average to above-average team that will acknowledge the fans readily and interact with them when possible or follow a championship quality team that apparently doesn’t know you’re there?

I’m too old to hang with the players or try to keep up with them, so it’s not about socializing for me. There are quite a few players (and a few alumni) who are on Facebook, so I can ask questions and get feedback. A few of the guys will always say “Hi” before the games. It’s just that I’ve sensed the overall mood has changed.

I’m not sure I prefer winners who are playing for themselves. I think I would prefer winners that were playing for the fans.

Maybe I’m thinking too much about baseball.

 

Determination

The AirHogs won last night, 7-6 over the El Paso Diablos. The victory was clinched in the bottom of the ninth, the way all home team victories should be, and it was the essence of a team that was playing together, making good decisions, waiting for their pitches and working towards a common goal. To many people not familiar with the strategies of the game, it was probably an anticlimactic ending. To any number of fans, it was boring enough to skip.

The game was tied at six. Both teams had traded the lead a couple of times. There hadn’t been any big innings on either side. The AirHogs had a good outing from their starter, Ryne Tacker, and Chris Martin had pitched in relief and sat down every batter he faced. The pitchers had done their job. Now, it was time for the bats to win the game.

The AirHogs started the inning at the second spot in their batting order – Antoin Gray. He hit the first pitch he saw into the shallow outfield for a single. The crowd exploded. One more single, and the game was over! It was the last hit of the inning for the AirHogs.

One on.

Next up was David Espinosa, one of the heroes of the All-Star game this week. He has an RBI in this game. He hit the first pitch foul. Then, he watched two balls sail by, and on the second, Gray moved to second on a wild pitch. He swung and missed, and watched the next two go by. Espinosa walked.

Two on.

People waiting for the dramatic swing to win the game are getting worried.

Greg Porter strides to the plate. He’s fourth in RBIs in the league this year. He has the pool at QTP named after him, because he was the first player to hit a home run into it. He has an RBI tonight. He’s overdue for the big hit. Two balls whistle by. An epic swing, strike one. Fouled off, strike two. He watches two more balls go by, and he’s on first. The runners advance.

Bases loaded. The AirHogs need one run to win.

Mike Hollimon walks to the plate. He has a triple and a two-run double in the game so far. One good swing, and it’s over. Surely, he will hit one out of the park. People are probably thinking about mighty Casey in “Casey at the Bat” – but forgetting that Casey struck out.

Two balls go by. 2-0 count. Then, he starts to swing. Two fouls, and it’s a 2-2 count. No margin for error. Two more fouls, to stay alive, rattle the pitcher and torture the crowd. He watches a ball sail by. 3-2. Full count. He fouls off the next pitch. People start watching Daniel Berg in the on-deck circle, just in case, even though that’s bad karma.

Last pitch of the game. Ball. He walks. The runners advance. Gray walks home. Run scores. Ball game.

Gray saw one pitch and liked it. Espinosa saw six. Porter saw six. Hollimon saw nine. (The nine pitches were the most stressful at-bat I can recall.) Twenty-two pitches, waiting for the twelve balls that would drive in a run without requiring another hit.
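Tallying the pitch-by-pitch descriptions of the inning above, with one letter per pitch (B = ball, S = swinging strike, F = foul, X = put in play; the encoding is just a convenience for this sketch):

```python
# Pitch sequences as described in the inning above, one letter per pitch.
at_bats = {
    "Gray": "X",
    "Espinosa": "FBBSBB",
    "Porter": "BBSFBB",
    "Hollimon": "BBFFFFBFB",
}

for batter, seq in at_bats.items():
    print(f"{batter:>9}: pitches={len(seq)}, balls={seq.count('B')}")

total_pitches = sum(len(seq) for seq in at_bats.values())
total_balls = sum(seq.count("B") for seq in at_bats.values())
print(total_pitches, total_balls)  # 22 12
```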

A walk-off walk.

It’s not a grand slam, it’s not even a walk-off hit. It’s just good baseball. Actually, it’s great baseball. It’s one of the best endings to a game I’ve ever seen.

Go AirHogs!

Plunk!

Entire essays have been written about batters being hit by pitches, followed by retaliation, re-retaliation and so on. I thought it was interesting that one of our pitchers showed zero innings last night – which means he hadn’t gotten anyone out. That’s all the box score will tell you. Then, my wife mentioned that Greg Porter’s wife said he was hit by a pitch (HBP) in the game last night. At that point, I got curious.

Here’s what happened:

In the second inning Grand Prairie’s Michael Hollimon was hit by a pitch and both benches got a warning for the rest of the game. Come the eighth inning that came into play as Arnoldo Ponce was hit by a pitch from Chris Martin the first pitch after Bernal’s home run. Martin and his manager Pete Incaviglia were both ejected at that point. The next inning the Diablos responded by plunking Greg Porter who turned and wrestled catcher Adam Deleo to the ground. Porter, along with Butch Henry and pitcher Christian Staehley were all ejected at that point as a total of five were ejected from the game. (Quoted from the Diablos game summary.)

Three HBP in one game. Possible (well, probable) retaliation on both sides. Five ejections. Other than adding a little drama to an otherwise pretty boring rout, does it really help or hinder the teams?

HBP made some sense to me when pitchers weren’t replaced by designated hitters, so if you plunked one of their guys, you were going to get hit. After the DH entered the picture, HBP should really mean “Hit By Proxy.”

These days, although it’s still gonna hurt, hitting someone with a pitch is really just giving a guy an intentional walk without all the outside pitches. So, you have to wonder why giving up a base would make sense on the defensive side of the ball.

Now, if one of your guys got hit and you’re retaliating, then I understand. I’m not sure I agree, mainly because it has always seemed a bit childish, since you’re putting your defense down a base just to make a point, but sometimes, you do what you have to do. If the manager tells you to make a point (not that they ever would), you make a point.

There are also unintentional HBPs – the pitcher just lets one get away or it sails a bit and the batter gets plunked. As we saw in an earlier game this year, there are some batters who will draw an HBP by simply not moving, and if the umpire doesn’t know the rule specifically states that the batter should try to get out of the way, he’ll get a free base. (We had one crew that gave one opposing batter three bases in two nights because he didn’t move. You wonder why we hate the umpires.)

When a batter gets plunked unintentionally and suddenly everyone’s getting hit in retaliation, I just fail to see how this is helpful, either tactically or strategically. I suppose if it incites a brawl, it will build some camaraderie on both sides, but you’re really just putting one of their guys on base, and in some cases, you’re also getting yourself removed from the game. (Note to Greg about last night – good move going after the catcher instead of charging the mound. Very original and probably very surprising to the catcher!)

Maybe someone who has played will comment. I know historically there were pitchers who were just protecting the plate by throwing inside and sometimes, batters who crowded the plate got hit. Establishing the zone made sense – especially if batters were usually frantically ducking out of the way and learned where to stand without getting hit. Now, I’m not sure the pitchers are that calculating.

It seems to me the best revenge for someone getting plunked is still winning the game. In that case, throwing strikes may be the best retaliation for one of your guys getting HBP.