Tuesday, December 11, 2012

Why Baltimore was correct to kick an extra point for an eight-point lead against Washington

With 4:47 remaining in their game vs Washington, the Baltimore Ravens scored a touchdown that increased their lead from 21-20 to 28-20.  You probably know what happened next: the Redskins drove down the field, overcoming the loss of RG3 along the way, with Kirk Cousins hitting Pierre Garcon for a touchdown with 32 seconds remaining and then successfully executing a QB draw for the tying two-point conversion; Washington then went on to win the game in overtime.  In the aftermath of the game, some analysts, such as Grantland's Bill Barnwell and espn.com's Gregg Easterbrook (aka TMQ), suggested that Baltimore ought to have attempted a two-point conversion after their touchdown, which if successful would have given them a nearly insurmountable nine-point lead.  Both argued that the two-point try offered great upside with little risk, pointing out that even if it failed, Washington would still almost certainly only end up tying the game with a touchdown.

But closer examination of the issue reveals that this is a rare case where Barnwell and Easterbrook (normally very sharp football thinkers) are wrong, and the conventional wisdom manifested in John Harbaugh's decision is right.  In fact, Barnwell and Easterbrook have fallen prey to the very same error that underlies many of the legitimately foolish strands of football's conventional wisdom: defining the issue in terms of vaguely-specified risk/reward criteria, instead of a rigorous analysis of the win probabilities involved.  The math involved in this particular case is actually too simple to be of much inherent interest, but it still serves as a valuable reminder of how important it is to think through these questions in quantitative terms - we see that even top-caliber analysts can be led astray when they rely entirely on their intuition. 

Here, the key point is that by kicking an extra point, Baltimore made it considerably more difficult for Washington to tie the game with a touchdown, as they now needed to convert a two-point try, rather than simply kicking an extra point.  The fact that Washington would have faced a much easier road to a tied game following a failed two-point attempt by Baltimore may not trigger any visceral perception of risk, but it still endows the extra point with a fair amount of upside relative to a failed two-point try.  In order to estimate the success probability at which Baltimore's two-point try would have been a break-even proposition, we need to weigh this upside against the greater reward offered by a successful two-point conversion, which would result in a nine-point lead.

We will use P(B) to denote the probability of Baltimore successfully executing a two-point conversion, and P(W) to refer to the probability of Washington executing a successful two-point conversion.  (For simplicity, we assume that the probability of extra point success is 100% - this variable largely cancels out on both sides of the equation, and the logic of the conclusion can easily be restated in order to allow this assumption to be relaxed.)  Now, restricting ourselves to the space of potential outcomes where Washington outscores Baltimore by exactly one touchdown over the remainder of the game (since these are the only feasible outcomes where Baltimore's post-TD decision is directly reflected in the final result), we see that Baltimore faces the following dilemma:

Option 1: Kick an extra point.  Now with probability P(W), the game goes to overtime, and with probability 1-P(W) Baltimore wins in regulation.
Option 2: Attempt a two-point conversion.  Now with probability P(B), Baltimore wins the game in regulation, and with probability 1-P(B), the game goes to overtime.

So to first approximation, Baltimore's two-point try breaks even if P(B) = 1-P(W); i.e. if their probability of success is as high as Washington's probability of failure.  While we cannot say for certain that this hurdle was not reached in the game, we note that it requires at least one team to have a two-point success probability of over 50%* - in which case they ought to be adopting the two-point conversion as their general post-touchdown strategy.  This conclusion can be summarized by pointing out that statistically, a team is better off needing to defend a two-point conversion than they are needing to make one of their own, unless one or both teams would be successful more than half the time.
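The comparison between the two options can be sketched in a few lines of Python. This is only an illustration under the assumptions stated above (extra points always succeed, Washington scores exactly one more touchdown), plus one further assumption of my own: that overtime is worth a fixed probability p_ot to Baltimore, set to 0.5 by default. The function name and the sample probabilities are hypothetical.

```python
def two_point_edge(p_b: float, p_w: float, p_ot: float = 0.5) -> float:
    """Baltimore's win-probability edge from going for two instead of kicking.

    p_b  : probability Baltimore converts a two-point try, P(B)
    p_w  : probability Washington converts a two-point try, P(W)
    p_ot : Baltimore's chance of winning in overtime (assumed, not from the text)

    Only valid within the scenario where Washington scores exactly one more
    touchdown, and assuming every extra point is good.
    """
    # Option 1: kick. Lead is 8, so Washington must convert a two-point try
    # to force overtime; otherwise Baltimore wins in regulation.
    kick = (1.0 - p_w) + p_w * p_ot
    # Option 2: go for two. Success makes the lead 9 (Baltimore wins);
    # failure leaves the lead at 7 and Washington's extra point forces overtime.
    go_for_two = p_b + (1.0 - p_b) * p_ot
    return go_for_two - kick


# Illustrative values only: both teams convert 45% of their two-point tries.
print(round(two_point_edge(0.45, 0.45), 3))  # -0.05
```

Algebraically the edge reduces to (1 - p_ot) x (P(B) + P(W) - 1), so for any overtime value below 1 the sign depends only on whether P(B) exceeds 1-P(W).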

And in fact, the two-point conversion attempt has yet another factor in its disfavor.  As indicated above, the game situations where Washington outscores Baltimore by exactly one touchdown over the remainder of the game are the only ones in which Baltimore's post-TD decision has direct bearing on the final score.  But the probability of the various rest-of-game outcomes is not an entirely exogenous factor to their decision; in particular, a successful two-point conversion, by clarifying Washington's scoring requirements for the remainder of the game, would make it slightly more likely that Washington would outscore them by one TD plus another score of some variety, thereby winning the game outright themselves.  This is still a very low-probability event - scoring twice in the final five minutes is always a tall order - but it still means that a slight downward adjustment to the equity of the two-point try is in order.  Therefore, even if the nominal break-even point of P(B) = 1-P(W) were reached, we conclude that the two-point attempt would still be an incorrect decision. 

* Here is where we can bring extra-point probability back into the mix: the actual threshold is not 50%, but P(XP)/2.  But the general consideration remains the same: if a team's two-point probability were in excess of this value, they ought to attempt a two-point conversion after virtually every touchdown.

Tuesday, February 8, 2011

This is an extract from a letter I wrote to espn.com's TMQ, in which I defended Mike Tomlin's decision to attempt a two-point conversion late in their Wild Card playoff game against the Jacksonville Jaguars following the '07 season. The Steelers had just scored a touchdown, reducing their deficit to five points, with 10:25 remaining in the game. A holding penalty was called on the two-point attempt, moving the line of scrimmage back to the 12-yard line. At this point, many commentators (including TMQ) apparently believe that Tomlin erred in repeating the two-point try, rather than kicking an extra point. An exception was footballcommentary.com, whose calculations suggest that the break-even probability of success on the two-point try was 0.17 - a relatively low threshold. Unfortunately, the process used by footballcommentary.com is rather opaque, using a dynamic programming model based on league-average probabilities of possible game events. As a result, I have found in the course of my discussions of this issue that casual fans do not hesitate to dismiss footballcommentary.com's conclusions out of hand, whenever they violate conventional intuition on the subject, as was certainly the case here.

In order to vindicate footballcommentary.com's two-point conversion chart - and also to show how the break-even point can be modified based on our knowledge of the specific teams in question - I will attempt to work through the issue in a more step-by-step manner. This will undoubtedly sacrifice some degree of accuracy in comparison with footballcommentary.com, but should serve to validate their conclusions by showing how they can be approximated in a thoroughly transparent manner. While I will refer only to the particular case faced by the Steelers in this game, the general intuition can be applied to many late-game situations where a two-point try must be considered.

Our approach will be to partition all possible outcomes of the game by conditioning over the possible future scores by our opponents after the possible two-point attempt, and then evaluating how the two-point conversion attempt holds up in comparison with an extra point in each case, with each case weighted by its estimated likelihood of occurring in the game. This would ordinarily be a very tedious (and very imprecise) process; in this case, however, we can safely assume that if the opponent scores twice more, our win probability is so small that it can safely be disregarded. (This may seem a controversial assumption, but I believe it is justified: even if the opponent scores two field goals, this requires that we score twice more as well - and their scoring possessions presumably consume a large amount of the time remaining in the game. So while not 0, the probability of winning if the opponent scores twice more following the two-point try is of an order of magnitude that its effect on the final conclusion will be negligible.) This leaves us only needing to consider three cases:

a) The opponent does not score for the rest of the game. Then the two-point conversion attempt is clearly superior to the extra point, as if successful, it allows us to tie the game with a field goal. So it is superior from a win probability standpoint by a margin of P(2) x P(FG) x 0.5.

b) The opponent scores a field goal later in the game. This is more complicated, as the extra point after this touchdown would allow us to tie the game with a subsequent TD + XP, while a two-point attempt, depending on its success or failure, would either leave us needing a TD + 2-pt in order to tie, or allow us to take the lead after a TD + XP. So the win probability value of the extra point, holding other things equal, is P(TD) x 0.5 (I'm assuming throughout this discussion that all extra points succeed, as this assumption works unequivocally against the case I am attempting to make in favor of the two-point try), and the win probability value of the 2-pt attempt is [P(2) x P(TD)] + [(1-P(2)) x P(TD) x P(2) x 0.5]. So the net value of the two-point conversion attempt rather than the extra point is the second expression minus the first.

c) The opponent scores a touchdown later in the game, and kicks an extra point (regardless of our two-point decision and its success/failure, they would have no reason to attempt a two-point conversion of their own at this point.) Then it makes virtually no difference whether an extra point or two-point was attempted after this touchdown, as one (and only one) two-point conversion will be required in order to allow a field goal to tie the game, and it can be equally well attempted after the next touchdown.

So in case (a), the two-point try is unequivocally superior, and in case (c), the two-point and XP attempts are of equal value. This leaves only case (b) in which the XP could come out ahead. However, we can easily see that if we use a conservative approximation of 0.4 for P(2) in these expressions, the net value of the two-point conversion is still positive, as it simplifies to 0.52 x P(TD) - 0.5 x P(TD). In fact, for any value of P(2) greater than 0.38, going for two strategically dominates the extra point, as it provides equal or greater win probability regardless of what the opponent does for the rest of the game. Thus, it is undoubtedly the right call in a normal situation.
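The case (b) arithmetic can be checked with a short Python sketch. It uses exactly the two expressions given for case (b), with P(TD) factored out (it multiplies both options equally, so it does not affect the sign). The function names are my own, and the break-even value comes from setting the two expressions equal, which gives the quadratic p^2 - 3p + 1 = 0.

```python
import math


def case_b_net_value(p2: float, p_td: float = 1.0) -> float:
    """Net win-probability value of the two-point try over the extra point
    in case (b), where the opponent later kicks a field goal.

    Assumes, as the text does, that every extra point succeeds and that a
    tied game is worth 0.5.
    """
    extra_point_value = p_td * 0.5
    two_point_value = p2 * p_td + (1.0 - p2) * p_td * p2 * 0.5
    return two_point_value - extra_point_value


def case_b_break_even() -> float:
    """P(2) at which case (b) breaks even: the smaller root of p^2 - 3p + 1 = 0."""
    return (3.0 - math.sqrt(5.0)) / 2.0


print(round(case_b_net_value(0.4), 3))  # 0.02, i.e. 0.52 - 0.5
print(round(case_b_break_even(), 3))    # 0.382
```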

Things become more interesting when one assumes that P(2) is significantly less than 0.38, as was presumably the case when Pittsburgh attempted their conversion from the 12-yard line. Then one needs to estimate values for P(TD) and P(FG), as well as for P(Opp. No Score) and P(Opp. FG), which give the respective weights to be assigned to cases (a) and (b). Plugging in league-average values gives a break-even point for P(2) (i.e. where the gain from case (a) exactly offsets the loss from case (b)) of 0.156 - quite close to the footballcommentary.com value of 0.17. However, one can pursue the question still further by using team-specific and game-specific values for the various probabilities. For example, using FootballOutsiders' Drive Stats, with a simple average taken between the values for Pittsburgh Offense and Jacksonville Defense, or vice versa, gives a more appropriate value for the break-even point than using league average values does. (There may well be an even more accurate method of estimating these team-specific probabilities, but this suffices as a first approximation.) By this method, since the Pittsburgh defense was generally so good at preventing the other team from scoring, the break-even point for this particular P(2) is actually reduced all the way to 0.12. Surely this threshold is low enough that Tomlin's decision was justified. [In a discussion of Bill Belichick's decision to attempt a 4th and 13 conversion rather than kick a FG in SB XLII, footballcommentary.com uses a success probability of 25% in their analysis - the probability here is lower, since a defensive penalty would not result in an outright success, but certainly cannot be pegged below the 12% threshold.]

A simplified version of the "two-point conversion equation" for this game situation (trailing by 5 following the TD, and where the amount of time remaining is such that we can safely disregard situations where the opponent scores two or more times) is the following:

P(2) x P(Opp. No Score) x P(FG) x 0.5 - P(Opp. FG) x P(TD) x [0.5 - (1.4 x P(2))] = 0.

This establishes the break-even probability for P(2). Analogous equations could be constructed for other game situations, and empirical testing of the degree of inaccuracy introduced by the assumptions could refine the conclusion still further. Regardless, the primary purpose of this discussion was to show that footballcommentary.com's chart of break-even probabilities makes good logical sense, and by obtaining a close correspondence with its result in one of the most apparently counter-intuitive cases, I think that it succeeds in this endeavor.
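Because the simplified equation is linear in P(2), it can be solved directly. The Python sketch below does so; the function name is hypothetical, and the input values in the example are placeholders chosen purely for illustration. They are not the league-average or team-specific figures used above, which are not reproduced in this post.

```python
def break_even_p2(p_opp_no_score: float, p_opp_fg: float,
                  p_fg: float, p_td: float) -> float:
    """Solve the simplified two-point conversion equation for P(2):

        P(2) x P(Opp. No Score) x P(FG) x 0.5
            - P(Opp. FG) x P(TD) x [0.5 - 1.4 x P(2)] = 0
    """
    gain_per_unit_p2 = p_opp_no_score * p_fg * 0.5  # case (a) gain per unit of P(2)
    case_b_weight = p_opp_fg * p_td                 # weight on the case (b) loss
    # gain_per_unit_p2 * p2 = case_b_weight * (0.5 - 1.4 * p2), solved for p2
    return 0.5 * case_b_weight / (gain_per_unit_p2 + 1.4 * case_b_weight)


# Placeholder inputs for illustration only (not taken from any data source).
threshold = break_even_p2(p_opp_no_score=0.5, p_opp_fg=0.2, p_fg=0.3, p_td=0.2)
print(f"break-even P(2) with placeholder inputs: {threshold:.3f}")
```

Lowering P(Opp. FG) relative to P(Opp. No Score), as one would for a strong defense, shrinks the case (b) weight and pushes the threshold down, which is the direction of the team-specific adjustment described above.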

Thursday, November 26, 2009

Thoughts on NFL Overtime

It is apparent to many observers of the NFL that the current overtime system (coin toss to determine which team kicks off, first points thereafter decide the game) is flawed. In a large number of cases, the team who receives the opening kick scores on this first possession, ending the game before the other team has had a chance at an offensive possession. Overall, teams winning the toss won 60% of all OT games between 2000 and 2007 (http://www.advancednflstats.com/2008/10/how-important-is-coin-flip-in-ot.html) and 59% of all OT games from 1994-2003 (http://www.footballcommentary.com/otauctions.htm). This means that a team will gain or lose 10% of win probability depending on a completely random event, clearly an unpalatable state of affairs. But it's worth examining the issue in more detail, as it is not as easy a case to resolve decisively as it may appear at first, and it gives rise to some interesting considerations along the way.
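A toy model suggests that a figure near 60% is about what one should expect from sudden death. Assume (purely for illustration) that the teams simply alternate possessions, that each possession independently produces a score with the same probability, and that defensive scores, field position and the clock are ignored. With a scoring rate of roughly 30% per possession, taken here as an approximate input, the receiving team's win probability is a geometric series that sums to 1 / (2 - s). The Python sketch below computes it both ways; the function names are hypothetical.

```python
def first_possession_win_prob(p_score: float) -> float:
    """Closed form for the receiving team's win probability in the toy model:
    s + (1-s)^2 s + (1-s)^4 s + ... = 1 / (2 - s)."""
    if not 0.0 < p_score <= 1.0:
        raise ValueError("p_score must be in (0, 1]")
    return 1.0 / (2.0 - p_score)


def first_possession_win_prob_by_sum(p_score: float,
                                     max_possessions: int = 200) -> float:
    """Same quantity, summed possession by possession as a sanity check."""
    total = 0.0
    nobody_scored_yet = 1.0
    for possession in range(max_possessions):
        if possession % 2 == 0:  # even-numbered possessions belong to the receiving team
            total += nobody_scored_yet * p_score
        nobody_scored_yet *= 1.0 - p_score
    return total


print(round(first_possession_win_prob(0.3), 3))  # 0.588
```

The model is far too crude to be taken as an estimate in its own right, but it shows that the observed advantage does not require anything unusual about how teams play in overtime.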

1) Proponents of the current system often attempt to entirely dismiss the primary objection - that the game can be decided without both teams having possession - by saying that the losing team in this scenario should have "played some defense." This line of reasoning is obviously inadequate. It rests on the fallacy - still all too common in sports discussions - of treating game events as arising deterministically, rather than probabilistically, from the actions of the players. The fact that a team allows a score on a given possession does not necessarily imply that their performance was in any way sub-par; it is perfectly possible to perform at an average level - or even above this - and still suffer a reversal due to the random breaks of the game. (Note that these are not random in the strict quantum mechanical sense - but are determined by margins that are too fine to be precisely correlated with physical and/or mental actions of the players.) A given level of performance implies a probabilistic distribution of game outcomes, and vice versa. In the absence of further information, the end result of one possession is much too small a sample to infer anything about how well either team played. In general, NFL teams score on around 30% of their possessions. This does not mean that teams "played some defense" 70% of the time, and failed to do so on the other 30%. A good defensive performance on a possession may reduce the average probability of a score to 25% or 20%, but it does not reduce it to anywhere near 0% - that's just not how the world works. So even if the defensive team does all that can be asked of them on the first overtime possession, they have still been exposed to a substantial risk of losing immediately, while their opponents have faced only the comparatively trivial risk of losing on a turnover returned for a score.

2) Opponents of the current system often introduce a red herring into the discussion by calling it "unfair." As those who support the system tend to gleefully point out in response, this is not correct - both teams have an equal shot at the advantage resulting from the coin toss, so no fairness criteria are violated. A more precise way to state the objection is that the result of the game is highly influenced by an exogenous factor. After all, a system that determined the result of a game that was tied at the end of regulation entirely by a coin toss (winner of the toss wins the game, no OT play at all) would be fair to both teams - but it would also be pretty stupid. One of the most important goals in constructing rules for a game should be to ensure that the greatest predictor of success is performance across a set of related skills. Some element of physical randomness is unavoidable (see #1 above) and even desirable. But introducing external random factors - such as coin tosses - into situations where they have a large impact on the outcome unnecessarily weakens the correspondence between performance and result.* If one were to graph the win probability of both teams over the course of the game, one of the largest single-event changes would result from the OT coin toss - it would rank alongside the game's biggest plays, and those in the "clutchiest" of situations, as one of the most significant events in determining a winner. When considered in these terms, the objection to the coin-toss/sudden death system should be obvious.

3) Even with the validity of the objection established, commonly-suggested alternatives often introduce flaws of their own. The most obvious possibility - stipulating an equal number of possessions for each team - risks making the game last substantially longer, exposing the players to greater risk of injury and (silly as this consideration sounds) interfering with TV scheduling. It also increases the possibility of the game ending in a tie, which many NFL fans seem to find an insufferable proposition. On the other hand, resorting to alternating possessions from the 25-yard line, as in NCAA football, offends football purists who deplore such gimmickry.

4) As a result, it may seem that we are trapped in a "de gustibus non est disputandum" situation, where every possible solution has its own strengths and weaknesses, and it is a matter of taste whether one prefers the current system or one of the alternatives. However, we could circumvent this problem if we were able to state the major objectives of an optimal overtime system, and present solutions that dominate the current system by performing equal or better in all categories. It seems to me that the main criteria are:

1) Minimize the probability of a tie
2) Minimize the amount of time it takes to reach a result
3) Minimize the effect of exogenous factors on the determination of the result
4) Maximize the similarity between OT and regulation play (i.e. avoid gimmicks)
5) Maximize "excitement" (the subjectivity of this criterion means that it can probably serve as a catch-all for all other specific criteria that one might think of)

Clearly, the current system does pretty well at 1, 2, 4 and 5. Solutions involving equal possessions or a fixed-time OT period improve on 3, but at the cost of a decline in 1 and 2. So whether one prefers these alternatives to the current system depends on how much weight one assigns to 1-3 on the list above. This makes it hard for supporters of the respective solutions to make any progress in persuading the other.

But there seem to be good candidates for solutions that dominate across all five criteria:

1) Simply extend the fourth quarter. When time runs out in the fourth quarter with the game still tied, add 15 minutes to the clock, and continue play as before, with the next points deciding the game. This still gives one team the advantage of having the first opportunity to score, but the difference here is that there is no exogenous factor - the advantage is built-in to the game state at the end of the fourth quarter, such that both teams have ample time with which to factor it into their plans earlier in the game. Not only is a coin flip not responsible for a sudden spike in win probability - there is no spike in probability at all. (To confirm this, consider that the last moments of regulation are de facto sudden death, as there will be no time left for a score in response. So if a team keeps possession across the nominal end of the fourth quarter and into overtime, their win probability will not change, except to increase as their distance to field goal range decreases.)

So this possibility obviously does at least as well as the current system in categories 1, 2, and 3. In my opinion, it gets a particular added boost from its performance in category 5. Under the current system, a team driving down the field with little time left, trailing by 3 or 7 points, will often play for a tie, taking their 50% chance of winning in overtime minus whatever chance there is that their opponent scores in regulation. Under the extended fourth quarter overtime scenario, teams in this situation would know that the opponent would in effect be guaranteed to win the overtime coin toss, giving them greater incentive to play for an outright win in normal time. Teams trailing by 3 with a minute remaining might forego a 30-yard field goal on 4th and 2, and play for a touchdown instead, and teams who score a late touchdown when trailing by 7 may choose to attempt a two-point conversion. The decisions made by coaches in such situations are often excessively conservative, failing to maximize win probability (as well as being relatively boring.) It is therefore a beneficial secondary consequence of this overtime system that it would give them a nudge in the right direction. Besides being more exciting and representing less frustratingly bad decision-making, this incentive to play for a win also feeds back into the second criterion, making it more likely that the game will end sooner, without any extra time played at all.

Several people have objected that this system would take away the two-minute drill at the end of a game, as teams now know that with the ability to continue a possession beyond the end of the quarter, they now have no reason to hurry. But this only applies to situations where the game is tied - teams trailing by anywhere from one to eight points would still have to run their two-minute offense. And when the game is tied, the two-minute offense only has to reach the opposing team's 30-yard line in order to have a decent shot at a winning field goal. Is this really the most exciting thing in the world? (Perhaps Patriots fans have reason to be a little biased on this question, recalling SB XXXVI). In any case, this consideration is also counterbalanced by the fact that it avoids a situation where a team pinned deep in their own territory late in the game decides to run out the clock and play for overtime rather than attempt to do anything with the final possession of regulation (what John Madden thought that the Patriots should have done.)

2) The key feature of the extended fourth quarter system is that both teams know which team will have the first chance to score in the sudden death period, before this period starts. This allows them to spread the win probability deficit out across many game events throughout regulation, and allows them to increase their chance of overcoming it entirely through the decisions they make. Therefore, overtime systems that make use of this principle allow us to keep the sudden death feature (which cuts down on time and on ties) without the ugly spike in win probability that occurs when one team suddenly finds out that they are the first to have their heads on the chop block (no pun.) There are other ways to accomplish this: giving the first overtime possession to the home/road team, performing the OT coin flip at half-time, and having the opening coin flip count as the OT coin flip are the most obvious. In each case, the team who will be at a sudden-death disadvantage may have slight incentive throughout the game (or the second half in the case of the half-time flip) to unbalance the score by attempting two-point conversions in situations where they otherwise would not. This could be kind of cool, though I still prefer to give the incentive to unbalance the game to the team who trails last. Retaining the coin flip, either at the outset of the game or at halftime, allows the overtime period to begin with a kickoff, which might be construed as superior performance in the fourth criterion, and takes care of the two-minute drill problem (though it allows teams in some cases to sit on the ball and force overtime, as in the current system.) Personally, the extended fourth quarter remains my favorite of these possibilities.

3) Another solution entirely is to keep the sudden death element of overtime, but to determine possession by bidding on field position. For a full discussion of this, see http://www.footballcommentary.com/otauctions.htm. It could be maintained that this is subpar according to the fourth criterion, as it involves a coaching decision the like of which never appears at any other point in a football game. But it fulfills the basic requirement of dominating across the first three categories without resorting to anything excessively foolish, and therefore would be a better choice than the current system.

Saturday, November 21, 2009

Why do people overrate the punt?

My last post tried to show that the criticism of Belichick's fourth-down decision is largely misguided - that a very realistic value of the probability of the Colts scoring from a short field causes the decision to break even, even if all other assumptions are made so as to favor the punt. If I am correct, this would fit the general trend in NFL practice and commentary: most coaches choose to punt far more often than a probabilistic analysis indicates that they ought to, and most pundits compound the situation by criticizing coaches on the rare occasions where they correctly forego a punt (but almost never vice versa.) So perhaps we should follow up by looking at the meta-question: why are people so eager to believe in the punt? After all, if people are going to go around spouting irritating nonsense about sports, we can at least use this as a natural sociological experiment to identify the sorts of conceptual errors to which human beings are particularly prone. Here is a letter I sent to Boston Metro on Monday morning following the Colts game, in which I offer a possible answer for this in the case of punting:

--------------------------------

The mad scramble by fans and pundits alike to condemn Belichick's 4th-and-2 gamble on Sunday (believe it or not, the numbers suggest that he probably made the right decision) is evidence of a more general issue worthy of discussion: the punt is currently the most overrated strategy in American sports. Contrary to what the "Trust your defense!" crowd seems to believe, punting does not send the ball across a magical Rubicon that ensures that the other team will fail to score. It adds some distance - generally around 40 yards - to how far they will have to go, making it somewhat less likely that they will succeed. This can make it a valuable tactical option in certain game scenarios. But in a great many cases, it is merely a foolish waste of an opportunity to do something productive with an offensive possession, whether scoring points, or, in this case, running out the clock without the other team having a chance to touch the ball. The fact that NFL teams punt far more often than they ought to has been mathematically demonstrated several times: the ground-breaking analysis was done by David Romer of UC Berkeley in a paper called Do Firms Maximize? Evidence From Professional Football, and espn commentator Gregg Easterbrook (aka TMQ) routinely exposes the folly of the automatic fourth-down punt in his columns.

So why is it so firmly ensconced in conventional wisdom that punting (in all but a tiny minority of fourth down situations) represents "playing the percentages," when in fact the opposite is often true? The answer seems to lie in a phenomenon known by economists as "risk aversion" - we are hard-wired to prefer the certainty of some fixed amount of most things (food, money, etc) rather than an uncertain outcome where we will either end up with a great deal, or none at all. This is true even if the weighted average of the uncertain outcome is higher than the certain one: when signing a ten-year contract, most of us would choose a guaranteed income of $50,000 rather than a possibility of either $5,000 or $100,000, to be decided by a coin flip. (Risk aversion is one of the major reasons why people purchase insurance.) This is a sensible response to uncertainty when making economic decisions: personal well-being, not money, is the ultimate currency here, and for most people, being rich is not that much better than being comfortable, while poverty can be ruinous. But risk aversion is not rational when the uncertain quantity is win probability in sports, as this probability itself is the ultimate standard by which the decision is to be evaluated. (As such, it is not subject to decreasing marginal utility - a strictly linear relationship is maintained at all values.) Here, ten times out of ten, we should prefer a decision that will leave us, on a coin flip, with either a 5% or a 100% chance of winning to one that will guarantee us a 50% chance of winning.  While the specific numbers differ widely from case to case, this general template underlies the various possibilities arising from a fourth down decision. A punt will guarantee a relatively negligible impact on win probability - we don't score, and we lower the probability that the other team scores. Nothing much has changed.
An attempted fourth-down conversion, on the other hand, will have a much greater impact on win probability in one direction or the other: either we keep the ball, with a good chance to put points on the board, or we give the other team the ball with good field position. We instinctively balk at this latter scenario, treating a coach who chooses it as though he were an investment manager exposing his client's 401k to ruin in pursuit of a big payoff. In fact, we should be celebrating coaches like Belichick who are bold enough to look past the irrationality of the conventional wisdom, and make decisions that put their team in the best position to win.
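The contrast between the salary example and the win-probability example can be made concrete with a small Python sketch. The logarithmic utility function is an assumption of mine, chosen only as a standard stand-in for decreasing marginal utility of income; the letter does not specify one. The helper names are hypothetical.

```python
import math


def expected_value(outcomes):
    """Probability-weighted average of (probability, value) pairs."""
    return sum(p * v for p, v in outcomes)


def expected_utility(outcomes, utility):
    """Probability-weighted average of utility(value)."""
    return sum(p * utility(v) for p, v in outcomes)


# Salary example from the letter: sure $50,000 vs a coin flip of $5,000 / $100,000.
salary_sure = [(1.0, 50_000)]
salary_gamble = [(0.5, 5_000), (0.5, 100_000)]

# Win-probability example: sure 50% vs a coin flip between 5% and 100%.
win_sure = [(1.0, 0.50)]
win_gamble = [(0.5, 0.05), (0.5, 1.00)]

# With money, the gamble has the higher average but the lower expected
# (log) utility, so declining it is reasonable.
print(expected_value(salary_gamble) > expected_value(salary_sure))                        # True
print(expected_utility(salary_gamble, math.log) < expected_utility(salary_sure, math.log))  # True

# Win probability is already the thing being maximized, so utility is linear
# in it and the gamble (52.5% on average) is simply better.
print(expected_value(win_gamble) > expected_value(win_sure))                              # True
```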

Comments

on anything I say here (or anything in general, for that matter) can be e-mailed to dajepson at gmail dot com.

Brief addendum on P(S)

Re-reading my previous post on the 4th-and-2 question, I realize it may seem a bit disappointing that I failed to offer a stronger conclusion on whether or not Belichick's decision was ultimately correct. I thought it important to err on the side of understatement - to elucidate the reasons why the punt tends to be overrated in such situations without using any over-optimistic estimates that might encourage a skeptic to dismiss the thought process entirely. However, I think it is possible to make a stronger attempt at showing why 0.77 is such a realistic value for P(S), without jeopardizing the tightness of the case:

1) Heading into this week, Indianapolis has had 97 offensive possessions; according to the invaluable Drive Stats page at footballoutsiders.com, these have traveled an average distance of 39.59 yards. So P(S) is essentially the probability that they could sustain a drive covering roughly 75% of their season average. If we account for the fact that New England has allowed around 28 yards per defensive possession (e.g. by taking the average of the two figures) we still have an expected value for drive distance that is greater than that required for a TD in this case. Even though this is the mean value, not the median, this already seems like very strong evidence that any estimate of P(S) must start at values considerably higher than 50%.

2) The probability of a TD is higher than that suggested by the season yards-per-drive averages of the two teams. We must give P(S) a small boost due to the fact that Indianapolis is playing at home, on their favored artificial turf, in front of a crowd who will not disrupt their snap counts. More importantly, we must give P(S) a substantial increase because the Colts are playing with all four downs in this situation - there will be no punts or field goals. Most drives in the season-long data set would have only used three downs to try to pick up a first down, with a kick of some sort occurring on fourth down if there was no success.

3) Adjustment for the precise state of the match-up between the Colts' offense and the Patriots' defense at this point in the game should further raise P(S). Restricting ourselves to the data of the final quarter does not generate a robust sample size, of course, but it is important to take into account that the match-up had clearly assumed a markedly different character late in the game. Even if we retrospectively impose justice on the proceedings by treating the bogus 39-yard PI as an incomplete pass, Manning had completed 7 of 10 fourth-quarter passes at better than 10 YPA, and three running plays had gone for 9, 11, and 4 (TD) respectively. Apart from the interception thrown by Manning midway through the quarter, the closest the Colts came to being in danger of failing to convert a first down was when they faced 3rd and 1 at the Patriots' 4-yard line. There was no indication that, barring a bad mistake by an Indianapolis player, the Patriots could do anything to generate a significant probability of preventing a first down on a given series.

For these reasons, I fail to see how anyone can confidently maintain that P(S) was below 0.77.

Why Belichick's 4th-and-2 decision was almost certainly correct

There are many decisions from last Sunday's game vs. Indianapolis for which Bill Belichick could reasonably be second-guessed. The unconventional decision to attempt to convert a 4th and 2 from his own 28-yard line, leading by six points as the two-minute warning approached, was not one of them. Note that this is not the same as saying that the decision was necessarily correct; this assessment depends on the exact values of the various probabilities involved. No one knows these for sure. However, I will show that a) even with very conservative estimates of the probabilities, the 4th down conversion comes out looking very reasonable, and b) the particular assumption made by many of Belichick's detractors - that the Colts were all but guaranteed to score following a failed 4th down play - actually confirms that his decision was correct. Paradoxical as it seems, only by assuming that the Patriots had a good chance of stopping the Colts from the short field can one maintain a case that the punt was the better choice. This will be explained in more detail below.

We start by assigning variables to the relevant probabilities. Let's call the probability of the Patriots picking up the first down P(1), the probability of a Colts TD following a failed fourth-down play P(S), and the probability of a Colts TD following a punt P(L). Then our equation for finding the break-even point between the punt and the attempt to pick up the first down is:

P(1) + [1-P(1)] x [1-P(S)] = 1-P(L)

The left side of the equation contains the probabilities of the Patriots winning the game following their fourth-down play: either by successfully picking up the first down [P(1)], or by failing to pick up the first down [1-P(1)] but preventing the Colts from scoring anyway [1-P(S)]. The break-even point occurs when this sum is equal to the probability of winning the game following a punt, which is essentially equal to the probability that the Colts fail to score from a longer field [1-P(L)].

Procedural note: we find the break-even point because it is easier to deal with equations than with inequalities. Once we have found the break-even point, it will be clear that deviations in one direction favor one decision, and vice versa.
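
As a rough illustration, the two sides of the equation can be written as a pair of small Python functions. The function names and the sample inputs below are my own placeholders, not estimates from this post; the only thing taken from the text is the form of the equation itself.

```python
def win_prob_go(p_first, p_short):
    """Left side: P(1) + [1-P(1)] x [1-P(S)]."""
    return p_first + (1 - p_first) * (1 - p_short)


def win_prob_punt(p_long):
    """Right side: 1 - P(L)."""
    return 1 - p_long


def go_minus_punt(p_first, p_short, p_long):
    """Positive favors the fourth-down attempt, negative favors the punt,
    and zero is the break-even point."""
    return win_prob_go(p_first, p_short) - win_prob_punt(p_long)


# Placeholder inputs chosen only to exercise the functions (hypothetical values).
print(go_minus_punt(p_first=0.5, p_short=0.6, p_long=0.35))
```

A positive result here means the attempt comes out ahead for those particular inputs; the real argument depends entirely on which values we are willing to defend.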

This equation contains two assumptions which need to be addressed:

1) The Patriots are guaranteed to win the game if they pick up the first down. This is not quite true, but it is very close. With one time-out, Indianapolis's best-case scenario, barring a very improbable event, is getting the ball back with about 30 seconds and no time-outs, needing to drive for a TD. Following a normal punt, the chances of this - even for Peyton Manning - are truly minuscule. Their only significant prospects come from a fumble on a Patriots' running play, a blocked punt, or a long punt return. These are all very improbable events which do little to raise their win probability - especially when multiplied by the probability of converting such an event into a TD, which is still not a guarantee. Other analyses have used 92% as the Patriots' win probability figure here; this is derived from data in all similar score/down/distance situations, regardless of time-outs - a factor which makes a world of difference here. Using the 92% estimate errs on the side of caution, which is generally laudable. But here it seems like a truly gratuitous concession to the case for punting.

2) No outcomes need to be considered other than those considered in our equation. The main omission here is the chance that the Patriots win with a field goal following a Colts touchdown. This probability adds to the expected value of the fourth-down attempt, so the omission is generous to the case for punting (almost certainly by a larger margin than the 100% win probability estimate following a successful conversion helps the case for the fourth-down attempt).

To solve the equation, we need to supply values for P(1), P(S), and P(L). Since we can never be sure of the exact values, it is important to select very conservative estimates from the point of view of our ultimate conclusion. Many statistical analyses of the decision have used fourth-down conversion data to estimate that the Patriots had a 55-60% chance of success, and used this to show that the fourth down attempt had a higher expected win probability; while this may be a good estimate, it is susceptible to the following skeptical response: "I don't believe that the probability was really this high - the Patriots offense was looking shaky, the Indy defense was fired up, the five-receiver set was a poor choice, etc. etc.....therefore I reject the conclusion reached by your analysis."

Bill Simmons (The Sports Guy) provided a perfect example of this thought process in his espn.com article on the subject. He suggests that the probability was more like 1 in 3 - the rate at which offenses have successfully made two-point conversions with passes on the road in the last three years. This is obviously a ridiculously low estimate - he would have us believe that an offense featuring Tom Brady, Wes Welker and Randy Moss has 12% less chance of picking up two yards on a single snap than the average NFL team does when they attempt a two-point conversion (a situation where the defense also has less field area to defend, further raising the offense's degree of difficulty). Nonetheless, we will use his figure as our estimate of P(1), confident that none but the wilfully obstinate can now object that we are overestimating the Patriots' chance of success. This means that our equation is now:

.33 + .67 (1-P(S)) = 1-P(L)

Now we come to the crux of the entire issue: the underlying reason why so many people have misjudged this decision so badly (and why hardly an NFL game goes by where a coach or a commentator does not badly misjudge a punting decision.) How much does the Colts' probability of scoring change depending on whether they take over following a failed fourth down or following a punt? Unfortunately, there is an enduring irrationality baked into the manner in which conventional wisdom evaluates such questions, which can loosely be characterized as follows:

"Not punting is a huge risk. The other team may end up with great field position, and will probably score. Punting gives the defense a great chance to stop them."

No element of this statement is categorically false, but every element involves a stretching of the truth to the point that the ultimate conclusion is badly distorted from what is suggested by a dispassionate account of the probabilities involved. (Why have people been so inclined to accept stretching of the truth on this particular issue? I will address this in another post.) In fact, while punting does reduce the probability that the opponent will score, it does so by a much smaller margin than is commonly believed, particularly when dealing with a powerful opposing offense such as that of the Colts.

To see why, consider a league of hypothetical football, where every first down is spotted ten yards from the previous first down - i.e. if you complete a 20 yard pass on 3rd and 2, you only get to move two yards down the field. Let's stipulate also that we know empirically in this league that first downs are equally easy to convert at any point on the field - i.e. 1st and 10 on your own 30 is no more or less difficult to convert than a 1st and 10 in the red zone. So in this league, the number of first downs you need to achieve in order to score a touchdown is precisely determined by the number of yards from the end zone at the beginning of the drive. And since the probability of a first down does not change over the course of the drive, if we have an estimate of the probability that a given offense will pick up a first down against a given defense (let's call it p) then the probability of scoring from the other team's 30 is p^3, and the probability of scoring from your own 30 is p^7. This means that the probability of scoring from two given points on a field is given by an exponential relationship determined by the ratio of the two distances to the end zone. The probability of scoring from your own 30 is equal to the probability of scoring from the other team's 30 raised to the power of 7/3.
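
The hypothetical league can be sketched in a few lines of Python. This is only an illustration of the toy model described above; the function name and the sample value p = 0.8 are assumptions of mine, not figures for any real offense.

```python
def score_prob(p_first_down, yards_to_goal):
    """Toy model: every first down gains exactly 10 yards and is converted
    with the same probability p anywhere on the field, so a drive needs
    yards_to_goal / 10 consecutive conversions."""
    first_downs_needed = yards_to_goal / 10
    return p_first_down ** first_downs_needed


p = 0.8  # hypothetical per-series conversion probability

from_opponents_30 = score_prob(p, 30)  # p^3
from_own_30 = score_prob(p, 70)        # p^7

# The long-field probability is the short-field probability raised to the
# ratio of the two distances (70 / 30 = 7/3).
print(from_opponents_30, from_own_30, from_opponents_30 ** (7 / 3))
```

The last two printed values agree (up to floating-point rounding), which is the exponential relationship in action: once the short-field probability is known, the toy model fixes the long-field one.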

What does this have to do with real football? The point is that every adjustment that we have to make in order to conform with actual football works in the direction of making a score from a longer distance MORE likely (in actual football) than predicted by the exponential model derived in hypothetical football. The primary factors are that 1) in real football, it is harder to move the ball in the red zone, as the opposing defense has less area to defend, and 2) the greater field distance affords a greater opportunity for the offense to pick up "residual yards" with each first down, thus reducing the number of first downs required for a score. (We also may need to adjust the implicit assumption in the hypothetical model that first-and-goal success probabilities conform to fractional exponents. But this adjustment - even if it works against the probability of a long-field score, which is by no means apparent - should be dwarfed in scope by the two primary factors listed above.) Therefore, the exponential approximation is a lower bound estimate for expressing the probability of a long-field score in terms of the probability of a short-field score.

In the present case, we can use this lower bound as another conservative assumption from the point of view of the case for the attempt to pick up the first down. We can also - again generously to the punting case - assume that all fourth down failures would leave the ball at the Patriots' 28-yard line, and that Chris Hanson would average a 42-yard net on a punt. This means that we now estimate that P(L) = P(S)^2.5, thereby giving us both of the Colts' scoring probabilities once we make a single judgment about how easily they would be able to move the ball against the Patriots' defense on this final drive. (This method depends on the assumption that the clock was not a significant factor, which seems to be unobjectionable. A drive requiring 70 yards in two minutes [with one time-out] for a team that is very comfortable with a no-huddle offense is hardly likely to fail because the clock hits triple zero.)

Our equation for the break-even point now becomes:

.33 + .67[1-P(S)] = 1-P(S)^2.5

All we have done to get here is to 1) implement some assumptions that clearly err on the side of the case for punting, and 2) make use of a little mathematical manipulation in order to express the equation in terms of a single variable.

So now the million-dollar question: what is P(S)? Again, no one knows for sure - we can look at all the data we want for how often teams have scored a touchdown from 28 yards out in this situation.....but none of those teams were QBed by Peyton Manning at the peak of his powers, playing against an undermanned and worn-out defense, who clearly had no adjustments left to offer against the Colts' hurry-up offense. And the main point is this: the vast majority of those who rushed to criticize Belichick's decision almost certainly did so based on the assumption that a 4th-down failure would be a terrible outcome, leaving the Colts in great position to score. In fact, as can be easily verified by plugging values into the equation, the higher the probability that the Colts would score from a short field, the WORSE the punting option becomes.

How can this possibly be the case? The answer is that if it was going to be such an easy proposition for the Colts to move the ball 30 yards down the field, it was not going to be that much harder, following a punt, for them to move 30 yards, then another 30, then another 10. Only if there was a significant probability that the defense could stop them within any given 30 yard chunk of field would the exponential relationship add enough equity to the punt to make it the superior option. This is a classic case of the punting fallacy: that an extra 40 yards of field position must somehow make an enormous difference to the opponent's prospects of scoring.
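
To make the "plug in values" exercise concrete, here is a small Python sketch that tabulates both options for a handful of P(S) values. It assumes the punt-friendly inputs used above, P(1) = .33 and P(L) = P(S)^2.5; the function names and the particular grid of P(S) values are mine.

```python
def go_for_it(p_short, p_first=0.33):
    """Win probability after the fourth-down attempt: P(1) + [1-P(1)] x [1-P(S)]."""
    return p_first + (1 - p_first) * (1 - p_short)


def punt(p_short, exponent=2.5):
    """Win probability after a punt: 1 - P(L), with P(L) = P(S)^2.5."""
    return 1 - p_short ** exponent


# Illustrative grid of short-field scoring probabilities.
for p_short in (0.5, 0.6, 0.7, 0.8, 0.9):
    edge = punt(p_short) - go_for_it(p_short)
    print(f"P(S)={p_short:.1f}  go={go_for_it(p_short):.3f}  "
          f"punt={punt(p_short):.3f}  punt edge={edge:+.3f}")
```

Across this range the punt's edge shrinks steadily as P(S) rises and turns negative somewhere between 0.7 and 0.8, which is the pattern the paragraph above describes.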

Solving the equation gives a break-even point when P(S) = 0.77. Only values lower than this favor the punt - and remember that we made a whole host of punt-friendly assumptions even to get here. I don't pretend to know for sure that P(S) was equal to or greater than 0.77 - if you believe that P(S) was below this threshold, and that the decision was suboptimal as a result, then more power to you. We should remember, however, that Bill Belichick was in a better position than anyone to evaluate how likely the Colts were to score from the 28 against his defense. I maintain that in any case, 0.77 is a realistic enough threshold that the decision becomes immune from rational second-guessing - even if we believe that P(S) is, say, 0.6, how can we be confident that our estimate is superior to his? Further, I would be willing to bet that 99.9999% of those who condemn the decision believe that P(S) was considerably higher than 0.77. If this is the case, there is no mathematical substance to their argument - they are merely falling back on the extremely foolish conventional wisdom that the punt is always the "safe" choice.
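
For anyone who wants to check the 0.77 figure, here is a minimal Python sketch under the same assumptions, P(1) = .33 and P(L) = P(S)^2.5. With those inputs the break-even condition .33 + .67[1-P(S)] = 1-P(S)^2.5 simplifies to .67 x P(S) = P(S)^2.5, i.e. P(S)^1.5 = .67. The function names and the bisection bounds are my own choices; the bisection is there only as a numerical cross-check on the algebra.

```python
def punt_edge(p_short, p_first=0.33, exponent=2.5):
    """Punt win probability minus fourth-down-attempt win probability."""
    go = p_first + (1 - p_first) * (1 - p_short)
    punt = 1 - p_short ** exponent
    return punt - go


def break_even_closed_form(p_first=0.33, exponent=2.5):
    # (1 - p_first) * s = s ** exponent  =>  s ** (exponent - 1) = 1 - p_first
    return (1 - p_first) ** (1 / (exponent - 1))


def break_even_bisect(p_first=0.33, exponent=2.5, lo=0.5, hi=0.99, tol=1e-9):
    # For the default inputs the punt edge is positive at lo and negative at hi,
    # so the root can be bracketed and halved until the interval is tiny.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if punt_edge(mid, p_first, exponent) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(break_even_closed_form(), 2))  # 0.77
print(round(break_even_bisect(), 2))       # 0.77
```

Passing a larger p_first to break_even_closed_form lowers the threshold, which shows how much the 0.77 figure owes to the deliberately pessimistic P(1) = .33.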