Saturday, November 21, 2009

Why Belichick's 4th-and-2 decision was almost certainly correct

There are many decisions from last Sunday's game vs. Indianapolis for which Bill Belichick could reasonably be second-guessed. The unconventional decision to attempt to convert a 4th and 2 from his own 28-yard line, leading by six points as the two-minute warning approached, was not one of them. Note that this is not the same as saying that the decision was necessarily correct; that assessment depends on the exact values of the various probabilities involved, and no one knows these for sure. However, I will show that a) even with very conservative estimates of the probabilities, the 4th-down attempt comes out looking very reasonable, and b) the particular assumption made by many of Belichick's detractors - that the Colts were all but guaranteed to score following a failed 4th-down play - actually confirms that his decision was correct. Paradoxical as it seems, only by assuming that the Patriots had a good chance of stopping the Colts from the short field can one maintain a case that the punt was the better choice. This will be explained in more detail below.

We start by assigning variables to the relevant probabilities. Let's call the probability of the Patriots picking up the first down P(1), the probability of a Colts TD following a failed fourth-down play P(S), and the probability of a Colts TD following a punt P(L). Then our equation for finding the break-even point between the punt and the attempt to pick up the first down is:

P(1) + [1-P(1)] x [1-P(S)] = 1-P(L)

The left side of the equation contains the probabilities of the Patriots winning the game following their fourth-down play: either by successfully picking up the first down [P(1)], or by failing to pick up the first down [1-P(1)] but preventing the Colts from scoring anyway [1-P(S)]. The break-even point occurs when this sum is equal to the probability of winning the game following a punt, which is essentially equal to the probability that the Colts fail to score from a longer field [1-P(L)].

Procedural note: we find the break-even point because it is easier to deal with equations than with inequalities. Once we have found the break-even point, it will be clear that deviations in one direction favor one decision, and deviations in the other direction favor the other.
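A minimal Python sketch of the two sides of this equation, written as win-probability functions, may help make the comparison concrete. The function names are hypothetical, and the code assumes, like the equation itself, that a successful conversion wins the game outright and that no other outcomes matter.

```python
def win_prob_go(p1: float, ps: float) -> float:
    """Patriots' win probability if they go for it on 4th down.

    p1: probability of converting (assumed here to win the game outright).
    ps: probability of a Colts TD after a failed attempt (short field).
    """
    return p1 + (1 - p1) * (1 - ps)


def win_prob_punt(pl: float) -> float:
    """Patriots' win probability after a punt.

    pl: probability of a Colts TD from the longer field.
    """
    return 1 - pl


def better_choice(p1: float, ps: float, pl: float) -> str:
    """Return 'go', 'punt', or 'even', whichever has the higher win probability."""
    go, punt = win_prob_go(p1, ps), win_prob_punt(pl)
    if abs(go - punt) < 1e-12:
        return "even"
    return "go" if go > punt else "punt"


# Illustrative inputs only, not estimates from the post.
print(better_choice(0.5, 0.6, 0.4))
```

With these made-up inputs, going for it wins 0.5 + 0.5 x 0.4 = 0.70 of the time versus 0.60 after a punt.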

This equation contains two assumptions which need to be addressed:

1) The Patriots are guaranteed to win the game if they pick up the first down. This is not quite true, but it is very close. With one time-out, Indianapolis's best-case scenario, barring a very improbable event, is getting the ball back with about 30 seconds and no time-outs, needing to drive for a TD. Following a normal punt, the chances of this - even for Peyton Manning - are truly minuscule. Their only significant prospects come from a fumble on a Patriots' running play, a blocked punt, or a long punt return. These are all very improbable events which do little to raise their win probability - especially when multiplied by the probability of converting such an event into a TD, which is still not a guarantee. Other analyses have used 92% as the Patriots' win probability figure here; this is derived from data in all similar score/down/distance situations, regardless of time-outs - a factor which makes a world of difference here. Using the 92% estimate errs on the side of caution, which is generally laudable. But here it seems like a truly gratuitous concession to the case for punting.

2) No outcomes need to be considered other than those considered in our equation. The main omission here is the chance that the Patriots win with a field goal following a Colts touchdown. This probability adds to the expected value of the fourth-down attempt, so the omission is generous to the case for punting (almost certainly by a larger margin than the 100% win probability estimate following a successful conversion helps the case for the fourth-down attempt).

To solve the equation, we need to supply values for P(1), P(S), and P(L). Since we can never be sure of the exact values, it is important to select very conservative estimates from the point of view of our ultimate conclusion. Many statistical analyses of the decision have used fourth-down conversion data to estimate that the Patriots had a 55-60% chance of success, and used this to show that the fourth down attempt had a higher expected win probability; while this may be a good estimate, it is susceptible to the following skeptical response: "I don't believe that the probability was really this high - the Patriots offense was looking shaky, the Indy defense was fired up, the five-receiver set was a poor choice, etc. etc.....therefore I reject the conclusion reached by your analysis."

Bill Simmons (The Sports Guy) provided a perfect example of this thought process in his espn.com article on the subject. He suggests that the probability was more like 1 in 3 - the rate at which offenses have successfully made two-point conversions with passes on the road in the last three years. This is obviously a ridiculously low estimate - he would have us believe that an offense featuring Tom Brady, Wes Welker and Randy Moss has 12% less chance of picking up two yards on a single snap than the average NFL team does when they attempt a two-point conversion (a situation where the defense also has less field area to defend, further raising the offense's degree of difficulty). Nonetheless, we will use his figure as our estimate of P(1), confident that none but the wilfully obstinate can now object that we are overestimating the Patriots' chance of success. This means that our equation is now:

.33 + .67 (1-P(S)) = 1-P(L)

Now we come to the crux of the entire issue: the underlying reason why so many people have misjudged this decision so badly (and why hardly an NFL game goes by where a coach or a commentator does not badly misjudge a punting decision.) How much does the Colts' probability of scoring change depending on whether they take over following a failed fourth down or following a punt? Unfortunately, there is an enduring irrationality baked into the manner in which conventional wisdom evaluates such questions, which can loosely be characterized as follows:

"Not punting is a huge risk. The other team may end up with great field position, and will probably score. Punting gives the defense a great chance to stop them."

No element of this statement is categorically false, but every element involves a stretching of the truth to the point that the ultimate conclusion is badly distorted from what is suggested by a dispassionate account of the probabilities involved. (Why have people been so inclined to accept stretching of the truth on this particular issue? I will address this in another post.) In fact, while punting does reduce the probability that the opponent will score, it does so by a much smaller margin than is commonly believed, particularly when dealing with a powerful opposing offense such as that of the Colts.

To see why, consider a league of hypothetical football, where every first down is spotted ten yards from the previous first down - i.e. if you complete a 20-yard pass on 3rd and 2, you only get to move two yards down the field. Let's stipulate also that we know empirically in this league that first downs are equally easy to convert at any point on the field - i.e. 1st and 10 on your own 30 is no more or less difficult to convert than a 1st and 10 in the red zone. So in this league, the number of first downs you need in order to score a touchdown is precisely determined by the distance to the end zone at the beginning of the drive. And since the probability of a first down does not change over the course of the drive, if we have an estimate of the probability that a given offense will pick up a first down against a given defense (let's call it p), then the probability of scoring from the other team's 30 is p^3, and the probability of scoring from your own 30 is p^7. This means that the probabilities of scoring from two given points on the field are related by an exponent equal to the ratio of the two distances to the end zone: the probability of scoring from your own 30 is equal to the probability of scoring from the other team's 30 raised to the power of 7/3.
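The hypothetical league can be sketched in a few lines of Python. The per-first-down conversion rate p = 0.8 is an arbitrary illustrative value, not an estimate for any real team, and `score_prob` is a hypothetical helper name.

```python
import math


def score_prob(p: float, yards_to_go: float, spacing: float = 10.0) -> float:
    """Probability of a TD in 'hypothetical football'.

    Every first down advances the ball exactly `spacing` yards and is
    converted independently with probability p, so a drive starting
    `yards_to_go` yards from the end zone needs yards_to_go / spacing
    first downs in a row.
    """
    return p ** (yards_to_go / spacing)


p = 0.8  # hypothetical per-first-down conversion rate
short_field = score_prob(p, 30)  # from the other team's 30: p**3
long_field = score_prob(p, 70)   # from your own 30: p**7

# The long-field probability equals the short-field one raised to 70/30 = 7/3.
assert math.isclose(long_field, short_field ** (70 / 30))
print(round(short_field, 3), round(long_field, 3))  # prints: 0.512 0.21
```

Note that the exponent depends only on the ratio of the two distances, not on p; p only sets how steeply the probability falls off.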

What does this have to do with real football? The point is that every adjustment that we have to make in order to conform with actual football works in the direction of making a score from a longer distance MORE likely (in actual football) than predicted by the exponential model derived in hypothetical football. The primary factors are that 1) in real football, it is harder to move the ball in the red zone, as the opposing defense has less area to defend, and 2) the greater field distance affords a greater opportunity for the offense to pick up "residual yards" with each first down, thus reducing the number of first downs required for a score. (We also may need to adjust the implicit assumption in the hypothetical model that first-and-goal success probabilities conform to fractional exponents. But this adjustment - even if it works against the probability of a long-field score, which is by no means apparent - should be dwarfed in scope by the two primary factors listed above.) Therefore, the exponential approximation is a lower bound estimate for expressing the probability of a long-field score in terms of the probability of a short-field score.

In the present case, we can use this lower bound as another conservative assumption from the point of view of the case for the attempt to pick up the first down. We can also - again generously to the punting case - assume that all fourth-down failures would leave the ball at the Patriots' 28-yard line, and that Chris Hanson would average a 42-yard net on a punt. The Colts would then need 28 yards to score after a failed conversion and 70 yards after a punt, and since 70/28 = 2.5, we now estimate that P(L) = P(S)^2.5. This gives us both of the Colts' scoring probabilities once we make a single judgment about how easily they would be able to move the ball against the Patriots' defense on this final drive. (This method depends on the assumption that the clock was not a significant factor, which seems unobjectionable. A drive requiring 70 yards in two minutes [with one time-out] for a team that is very comfortable with a no-huddle offense is hardly likely to fail because the clock hits triple zero.)
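Under the same hypothetical-football model, the exponent follows directly from the two assumed starting points. This sketch (the helper name is hypothetical) derives it from the 28-yard line and the assumed 42-yard net punt, then applies it to an illustrative short-field scoring probability of 0.9, which is not an estimate from this post.

```python
def exponent_for_punt(own_yard_line: int, net_punt: int) -> float:
    """Exponent k in P(L) = P(S)**k under the hypothetical-football model.

    own_yard_line: the Patriots' own yard line at the 4th-down snap (28 here).
    net_punt: assumed net punt distance in yards (42 here).
    """
    short_field = own_yard_line             # Colts need 28 yards after a failed 4th down
    long_field = own_yard_line + net_punt   # ...and 70 yards after the punt
    return long_field / short_field


k = exponent_for_punt(28, 42)
print(k)  # prints: 2.5

p_short = 0.9            # hypothetical P(S), for illustration only
p_long = p_short ** k    # P(L) under the exponential lower bound
print(round(p_long, 3))  # prints: 0.768
```

Even under this punt-friendly model, a 90% short-field scoring chance only drops to about 77% after the punt.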

Our equation for the break-even point now becomes:

.33 + .67[1-P(S)] = 1-P(S)^2.5
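Substituting P(L) = P(S)^2.5 into the right-hand side, 1-P(L), and writing x = P(S), the break-even condition simplifies to a closed form. This worked derivation is added for illustration and uses only the assumptions stated above:

```latex
\begin{aligned}
0.33 + 0.67\,(1 - x) &= 1 - x^{2.5} \\
1 - 0.67\,x &= 1 - x^{2.5} \\
x^{2.5} &= 0.67\,x \\
x^{1.5} &= 0.67 \qquad (x > 0) \\
x &= 0.67^{2/3} \approx 0.766
\end{aligned}
```

For x above this value, 0.67x < x^2.5, so the left side (going for it) is the larger win probability. More generally, with conversion probability P(1) and exponent k > 1, the break-even point is P(S) = [1-P(1)]^(1/(k-1)).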

All we have done to get here is to 1) implement some assumptions that clearly err on the side of the case for punting, and 2) make use of a little mathematical manipulation in order to express the equation in terms of a single variable.

So now the million-dollar question: what is P(S)? Again, no one knows for sure - we can look at all the data we want for how often teams have scored a touchdown from 28 yards out in this situation.....but none of those teams were QBed by Peyton Manning at the peak of his powers, playing against an undermanned and worn-out defense, who clearly had no adjustments left to offer against the Colts' hurry-up offense. And the main point is this: the vast majority of those who rushed to criticize Belichick's decision almost certainly did so based on the assumption that a 4th-down failure would be a terrible outcome, leaving the Colts in great position to score. In fact, as can be easily verified by plugging values into the equation, the higher the probability that the Colts would score from a short field, the WORSE the punting option becomes.
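One quick way to check this claim is to tabulate the difference between the two sides of the break-even equation for a range of P(S) values. This sketch assumes P(1) = 0.33 and P(L) = P(S)^2.5 as above; the function name is hypothetical.

```python
P1 = 0.33  # Simmons-style conversion estimate used in the post
K = 2.5    # exponent from P(L) = P(S)**2.5


def edge_for_going(ps: float, p1: float = P1, k: float = K) -> float:
    """Win-probability advantage of going for it over punting.

    Positive values favor going for it; negative values favor the punt.
    """
    go = p1 + (1 - p1) * (1 - ps)  # left side of the break-even equation
    punt = 1 - ps ** k             # right side, using P(L) = P(S)**k
    return go - punt


for ps in (0.5, 0.6, 0.7, 0.8, 0.9):
    print(f"P(S)={ps:.1f}  edge for going={edge_for_going(ps):+.3f}")
```

Over this range the edge for going for it rises steadily, turning positive somewhere between P(S) = 0.7 and 0.8.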

How can this possibly be the case? The answer is that if it were going to be such an easy proposition for the Colts to move the ball 30 yards down the field, it was not going to be that much harder, following a punt, for them to move 30 yards, then another 30, then another 10. Only if there was a significant probability that the defense could stop them within any given 30-yard chunk of field would the exponential relationship add enough equity to the punt to make it the superior option. This is a classic case of the punting fallacy: the belief that 40 extra yards of field position must somehow make an enormous difference to the opponent's prospects of scoring.

Solving the equation gives a break-even point when P(S) = 0.77. Only values lower than this favor the punt - and remember that we made a whole host of punt-friendly assumptions even to get here. I don't pretend to know for sure that P(S) was equal to or greater than 0.77 - if you believe that P(S) was below this threshold, and that the decision was suboptimal as a result, then more power to you. We should remember, however, that Bill Belichick was in a better position than anyone to evaluate how likely the Colts were to score from the 28 against his defense. I maintain that in any case, 0.77 is a realistic enough threshold that the decision becomes immune from rational second-guessing - even if we believe that P(S) is, say, 0.6, how can we be confident that our estimate is superior to his? Further, I would be willing to bet that 99.9999% of those who condemn the decision believe that P(S) was considerably higher than 0.77. If this is the case, there is no mathematical substance to their argument - they are merely falling back on the extremely foolish conventional wisdom that the punt is always the "safe" choice.
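The 0.77 threshold can be checked numerically. Since the go-minus-punt difference is P(S)^2.5 - 0.67 x P(S) = P(S) x [P(S)^1.5 - 0.67], its sign for P(S) > 0 depends only on the bracketed term, which rises steadily with P(S), so a simple bisection on that term finds the root. This is a sketch under the post's assumptions, with a hypothetical function name, and it cross-checks the result against the closed form [1-P(1)]^(1/(k-1)), where k is the exponent.

```python
def break_even(p1: float = 0.33, k: float = 2.5, tol: float = 1e-12) -> float:
    """Find the P(S) at which going for it and punting have equal win probability.

    go - punt = ps**k - (1 - p1) * ps = ps * (ps**(k - 1) - (1 - p1)).
    For ps > 0 the sign depends only on g(ps) = ps**(k - 1) - (1 - p1),
    which rises from -(1 - p1) at 0 to p1 at 1 (assumes 0 < p1 < 1, k > 1).
    """
    def g(ps: float) -> float:
        return ps ** (k - 1) - (1 - p1)

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid  # punt still better here, so the root lies above mid
        else:
            hi = mid
    return (lo + hi) / 2


numeric = break_even()
closed_form = (1 - 0.33) ** (1 / (2.5 - 1))
print(round(numeric, 3), round(closed_form, 3))  # prints: 0.766 0.766
```

Plugging in the 55-60% conversion estimates mentioned earlier instead of 0.33 (e.g. `break_even(p1=0.55)`) lowers the threshold to roughly 0.54-0.59, which only strengthens the case for going for it.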
