PECOTA Hitting Forecast Glossary and Reference

2004 Forecast

2004 Forecast is a representation of the hitter's expected performance in the upcoming season at various levels of probability. For example, if a hitter's 75th percentile EQA forecast is .296, this indicates that he has a 75% chance to post an EQA less than or equal to .296, and a 25% chance to post an EQA better than .296. Higher percentiles indicate more favorable outcomes.
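
As a rough illustration of how a percentile forecast can be read off a set of outcomes, the sketch below picks the value at which a given share of the outcome weight has been reached. It is a minimal sketch, not PECOTA's actual procedure: the function name, the sample EQA values, and the equal weights are all assumptions made for the example.

```python
def weighted_percentile(values, weights, pct):
    """Return the smallest value whose cumulative weight share reaches pct (0-100).

    values and weights are hypothetical comparable outcomes and their weights.
    """
    pairs = sorted(zip(values, weights))
    total = sum(weights)
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running / total >= pct / 100.0:
            return value
    return pairs[-1][0]


# Four equally weighted (made-up) EQA outcomes for one hitter.
eqa_outcomes = [0.250, 0.270, 0.296, 0.320]
equal_weights = [1, 1, 1, 1]
p75 = weighted_percentile(eqa_outcomes, equal_weights, 75)
```

With these inputs, three of the four outcomes sit at or below the returned value, which matches the reading of a 75th percentile forecast given above.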

PECOTA runs a series of regressions within the set of comparable data in order to estimate how changes in peripheral statistics are related to changes in equivalent runs. For example, if it first estimates that Pat Burrell will produce 100 equivalent runs next year, it then tries to determine what home run total, walk total, and so on are most likely to be associated with a 100 run season.

A player's 2004 numbers are adjusted to the park and league context associated with the team listed at the top of the forecast page. Park factors are based on a three-year average over the period 2001-2003, except for teams that have changed ballparks.

PECOTA forecasts playing time (plate appearances) in addition to a player's rate statistics. These forecasts are based on a player's previous record of performance and on the comparable player data; they do not incorporate any additional information about managerial decisions.

Attrition Rate

Attrition Rate is the percent chance that a hitter's plate appearances will decrease by at least 50% relative to his Baseline. Although it is generally a good indicator of the risk of injury, attrition rate will also capture seasons in which his playing time decreases due to poor performance or managerial decisions.
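
The definition can be sketched as a simple share of comparables. This is only an illustration under stated assumptions: it treats every comparable equally (the glossary does not say whether the rate is weighted), and the function name and sample plate appearance figures are hypothetical.

```python
def attrition_rate(comparables):
    """Percent of comparables whose plate appearances fell by at least 50%.

    comparables: list of (baseline_pa, actual_pa) pairs (hypothetical data).
    """
    if not comparables:
        return 0.0
    # A 50% decrease means the actual total is at most half of the baseline.
    lost_half = sum(1 for base, actual in comparables if actual <= 0.5 * base)
    return 100.0 * lost_half / len(comparables)


sample = [(600, 250), (500, 400), (400, 200), (550, 540)]
rate = attrition_rate(sample)
```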

Baseline

The Baseline forecast, although it does not appear here, is a crucial intermediate step in creating a hitter's forecast. The Baseline is developed based on the hitter's previous three seasons of performance. Both major league and (translated) minor league performances are considered.

The Baseline forecast is also significant in that it attempts to remove luck from a forecast line. For example, a player who hit .310, but with a poor batting eye and unimpressive speed indicators, is probably not really a .310 hitter. It's more likely that he's a .290 hitter who had a few balls bounce his way, and the Baseline attempts to correct for this.

Batting Average

Batting Average (BA) is one of five primary production metrics used in identifying a hitter's comparables. It is defined as H/AB.

Breakout Rate

Breakout Rate is the percent chance that a hitter's EQR/PA will improve by at least 20% relative to the weighted average of his EQR/PA in his three previous seasons of performance. High breakout rates are indicative of upside risk.

Breakout rates measure change relative to a player's previously-established level of performance. For this reason, a high Breakout score can create a falsely optimistic picture for a player who has a very poor performance record. It is far easier for a player with a baseline of 40 EQR per season to improve upon that figure by 20% than it is for a player with a baseline of 100 EQR per season; as a result, the weaker player's Breakout score is likely to be higher (see also Ugueto Effect).
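
The arithmetic behind this point is easy to check. The short sketch below uses the two baselines from the example above (40 and 100 EQR per season); the function name is made up for illustration.

```python
def runs_needed_for_breakout(baseline_eqr, threshold=0.20):
    """Extra equivalent runs needed to beat a baseline by the given fraction."""
    return baseline_eqr * threshold


weak_hitter_gain = runs_needed_for_breakout(40)     # about 8 extra runs
strong_hitter_gain = runs_needed_for_breakout(100)  # about 20 extra runs
```

The weaker hitter clears the 20% bar with roughly 8 additional runs, while the stronger hitter needs roughly 20, which is why the same percentage threshold is easier to reach from a low baseline.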

Collapse Rate

Collapse Rate is the percent chance that a hitter's EQR/PA will decrease by at least 20% relative to the weighted average of his EQR/PA in his three previous seasons of performance. High collapse rates are indicative of downside risk.
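
Breakout Rate and Collapse Rate are mirror images of each other, so one sketch can cover both. The code below assumes each comparable is summarized by a baseline EQR/PA and an outcome EQR/PA, and that all comparables count equally; the names and sample figures are hypothetical, and the glossary does not spell out the exact computation.

```python
def breakout_and_collapse_rates(comparables, threshold=0.20):
    """Return (breakout_pct, collapse_pct) for a list of comparables.

    comparables: list of (baseline_eqr_pa, outcome_eqr_pa) pairs.
    """
    count = len(comparables)
    if count == 0:
        return 0.0, 0.0
    breakouts = sum(1 for base, out in comparables if out >= base * (1 + threshold))
    collapses = sum(1 for base, out in comparables if out <= base * (1 - threshold))
    return 100.0 * breakouts / count, 100.0 * collapses / count


sample = [(0.100, 0.130), (0.120, 0.090), (0.110, 0.112), (0.150, 0.150)]
breakout_pct, collapse_pct = breakout_and_collapse_rates(sample)
```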

Comparable Players

Comparable Players are the backbone of a hitter's PECOTA. Only the twenty best comparables are listed on the player card, but as many as 100 players may be used in the generation of his forecast, provided they are sufficiently comparable.

PECOTA compares each hitter against a database of roughly 15,000 major league batter seasons since World War II. It also draws upon a database of roughly 5,000 translated minor league seasons (1998-2002) for hitters who spent most of their previous season in the minor leagues. (When minor league comparables are used, they appear in ALL CAPS.)

PECOTA considers four broad categories of attributes in determining a hitter's comparability:

  • Production metrics--in particular, batting average, isolated power, unintentional walk rate, strikeout rate, and a modified version of the Bill James speed score.

  • Usage metrics, including career length and plate appearances.

  • Phenotypic attributes, including handedness, height and weight.

  • Fielding Position. PECOTA doesn't require that a comparable hitter play the same defensive position; it is a factor that is evaluated along with many others, and assigned a relatively substantial weight. Consideration is also given to the 'similarity' between two positions; for example, a shortstop will be compared to a second baseman before he is compared to a left fielder. (See additional discussion).

In most cases, the database is large enough to provide a meaningfully large set of comparables. When it isn't, the program is designed to 'cheat' by expanding its tolerance for dissimilar players until a reasonable sample size is reached. In the case of very old or very young hitters, there may not be a significant number of hitters who played at that age, and so the results of their forecast may be less reliable.
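
One way to picture this 'cheat' is a similarity cutoff that is loosened step by step until enough comparables qualify. The sketch below is purely illustrative: the starting cutoff, step size, floor, and minimum sample size are invented parameters, and the glossary does not describe how the program actually relaxes its tolerance.

```python
def select_comparables(scores, min_count=20, start=50.0, step=10.0, floor=0.0):
    """Lower a similarity cutoff until at least min_count comparables qualify.

    Returns the qualifying scores (best first) and the cutoff that was used.
    All parameter values are assumptions made for this example.
    """
    cutoff = start
    while True:
        chosen = [s for s in scores if s >= cutoff]
        if len(chosen) >= min_count or cutoff <= floor:
            return sorted(chosen, reverse=True), cutoff
        cutoff = max(floor, cutoff - step)


# Hypothetical similarity scores; a sample of four is requested.
chosen, cutoff_used = select_comparables([72, 55, 41, 33, 12, -5], min_count=4)
```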

Comparable Year

Comparable Year represents the season analogous to 2004 for a comparable hitter. For example, if Dick Allen is listed as a comparable, and the year listed next to his name is 1974, Allen's 1974 is used as a component of the player's forecast. It also indicates that Allen's Baseline performance entering the 1974 season was similar to the Baseline performance of the player in question.

PECOTA constructs a 182-day interval on either side of a player's birthdate in order to match ages; this method is more precise than the Bill James similarity scores, which use a player's age as of July 1.
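
A minimal sketch of this kind of age matching follows. It assumes the check works by shifting the comparable's birthdate forward by the gap between his comparable year and the forecast year, then testing whether it lands within 182 days of the player's birthdate. The function names and the birthdates used are hypothetical.

```python
from datetime import date


def shift_years(birth, years):
    """Move a birthdate by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return birth.replace(year=birth.year + years)
    except ValueError:
        return birth.replace(year=birth.year + years, day=28)


def ages_match(player_birth, comp_birth, comp_year, forecast_year=2004, window_days=182):
    """True if the comparable was within window_days of the player's age."""
    aligned = shift_years(comp_birth, forecast_year - comp_year)
    return abs((aligned - player_birth).days) <= window_days


# Hypothetical player born mid-1974, checked against two 1974 comparable seasons.
player = date(1974, 6, 15)
close_match = ages_match(player, date(1944, 3, 8), 1974)
too_old = ages_match(player, date(1943, 9, 1), 1974)
```

A fixed cutoff date such as July 1 would place two players born a few days apart on either side of that date in different age groups, whereas a window centered on the birthdate does not.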

Diagnostics

Diagnostics are a series of metrics designed to estimate the probability of certain types of changes in production and playing time; see the individual entries for additional detail.

Drop Rate

Drop Rate is the percent chance that a player will not receive any major league plate appearances in a given season, based on comparables who disappear from the dataset entirely. Because of the conventions PECOTA uses in selecting comparables, the Drop Rate is always assumed to be zero for 2004, but it is an important consideration in a hitter's Five-Year Forecast.

EqBA, EqOBP, EqSLG and EqMLVR

EqBA, EqOBP, EqSLG and EqMLVR are calibrated to an ideal major league with the following characteristics:


EqBA = .270
EqOBP = .340
EqSLG = .440
EqMLVR = .000

Note that these levels of performance are slightly higher than 2002 league averages.

While a major league hitter's equivalent stats should not differ substantially from his actual numbers, a minor league hitter's equivalent stats undergo translation and may differ significantly. Equivalent stats also account for park effects.

EQA

EQA, or Equivalent Average, is a metric developed by Clay Davenport to measure a hitter's overall value on a scale similar to batting average. EQA considers batting as well as baserunning, but not the value of a position player's defense. The league average EQA is equal to .260. Additional information on the derivation of EQA can be found in the player cards glossary.

EQA Distribution

The EQA Distribution chart displays a hitter's EQA forecast at various levels of probability (see discussion). It progresses in sequential intervals of five percentage points, ranging from a hitter's 95th percentile forecast on the left, to his 5th percentile forecast on the right.

In addition to the probability distribution for a given hitter, which appears in blue, the chart also includes a normal distribution on EQA for all hitters in the league ("Norm"), and a dashed line representing the performance of a replacement level hitter ("Replace") at his position.

EQR/PA

EQR/PA, or Equivalent Runs per Plate Appearance, is used in PECOTA's internal calculations to calibrate a hitter's batting and baserunning outcomes (2B, HR, SB, etc.) with his overall offensive value. Additional information on the derivation of Equivalent Runs can be found in the player cards glossary.

Five-Year Forecast

The Five-Year Forecast presents a series of high-level measurements designed to analyze a player's value over the forthcoming five seasons. It is derived from the same set of comparables used to generate his 2004 forecast, and assumes that the player remains with the same team, in the same league, and plays the same position over the entire five-year period. The Five-Year Forecast consists of three parts:

  • The Five-Year Value forecast measures a player's wins above replacement. As time progresses, certain of the player's comparables will drop from the dataset entirely. In some cases, this is the result of a comparable player not yet having appeared in the comparable year in question; for example, a hitter with a comparable year of 2001 will not yet have completed all five seasons that would be used in the evaluation. These players are dropped from the average for the season in question without any prejudicial effect. In other cases, a hitter has completed his comparable year, but did not record any major league plate appearances as a result of injury, retirement, demotion, and so on. These players are retained in the wins above replacement calculation, but are assigned a value of zero. (These comparables also contribute to a player's Drop Rate). Because of this convenient method for handling comparables who disappear from the dataset, the Five-Year Value forecast is the best way to evaluate a player's value going forward.

    The Five-Year Value forecast also displays a player's wins above replacement in the three most recently completed seasons (2001-2003). A player is only assigned value for playing time accumulated at the major league level.

  • The Five-Year Performance forecast measures a hitter's EQA at various percentiles over the course of the next five seasons. Unlike the Value forecast, the Performance forecast has no convenient way to adjust for dropped comparables, and so it simply ignores them. For this reason, the Performance forecast may be unreliable for players whose comparables have a high attrition rate.

    Note also that the Performance forecast displays a player's EQA in the three most recently completed seasons, including performance at the minor league level.

  • The Five-Year Attrition forecast measures a player's Attrition Rate and Drop Rate over the forthcoming five seasons. These forecasts consider only players who have completed the comparable year in question.
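
The treatment of dropped comparables in the Value and Attrition forecasts can be sketched as follows. This is an illustration only: it assumes an unweighted average, and the status labels, function name, and sample values are invented for the example.

```python
def season_value_and_drop_rate(outcomes):
    """Average wins above replacement and Drop Rate for one future season.

    outcomes: list of (status, wins) pairs, where status is one of
      "played" - the comparable recorded major league plate appearances
      "no_pa"  - the season occurred, but he recorded none (counted as zero)
      "future" - the season has not occurred yet (excluded entirely)
    """
    eligible = [(status, wins) for status, wins in outcomes if status != "future"]
    if not eligible:
        return None, None
    total = sum(wins if status == "played" else 0.0 for status, wins in eligible)
    dropped = sum(1 for status, _ in eligible if status == "no_pa")
    return total / len(eligible), 100.0 * dropped / len(eligible)


sample = [("played", 4.0), ("played", 2.0), ("no_pa", 0.0),
          ("played", 2.0), ("future", 0.0)]
mean_wins, drop_rate = season_value_and_drop_rate(sample)
```

In this sample the comparable whose season has not yet occurred is ignored, while the one who recorded no plate appearances pulls the average down and accounts for the Drop Rate.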

Historical Stats

Historical Stats are the player's previous three seasons of performance as they appear in the BP book. MLVr and all columns to the left of it are raw statistics, while EqBA and all columns to the right of it are translated statistics.

Improvement Rate

Improvement Rate is the percent chance that a hitter's EQR/PA will improve at all relative to the weighted average of his EQR/PA in his three previous seasons of performance. A hitter who is expected to perform just the same as he has in the past will have an Improvement Rate of 50%.

Isolated Power

Isolated Power (ISO) is one of five primary production metrics used in identifying a hitter's comparables. PECOTA uses a slightly modified version of Isolated Power that assigns the same value to triples as to doubles (extending a double into a triple is generally an indicator of speed, rather than additional power). Thus, the formula for isolated power is as follows:

ISO = (2B + 3B + HR*3) / AB
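
For comparison, the conventional definition of isolated power (slugging average minus batting average) counts a triple as two extra bases. The sketch below computes both versions for a made-up stat line; the function names are chosen for the example.

```python
def modified_iso(doubles, triples, home_runs, at_bats):
    """PECOTA-style ISO as given above: triples count the same as doubles."""
    return (doubles + triples + 3 * home_runs) / at_bats


def conventional_iso(doubles, triples, home_runs, at_bats):
    """Conventional ISO (SLG minus BA): a triple is worth two extra bases."""
    return (doubles + 2 * triples + 3 * home_runs) / at_bats


# Hypothetical season: 30 doubles, 5 triples, 25 home runs in 500 at-bats.
pecota_style = modified_iso(30, 5, 25, 500)
conventional = conventional_iso(30, 5, 25, 500)
```

The two figures differ only by triples divided by at-bats, so the modification matters most for fast hitters who collect many triples.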

MLVr

MLVr is a rate-based version of Marginal Lineup Value (MLV), a measure of offensive production created by David Tate and further developed by Keith Woolner. MLV is an estimate of the additional number of runs a given player will contribute to a lineup that otherwise consists of average offensive performers. MLVr is approximately equal to MLV per game. The league average MLVr is zero (0.000). Additional information on MLV and MLVr can be found here.

Percentile

Percentile. See 2004 Forecast.

Player Profile

The Player Profile is a chart that evaluates a given hitter's primary production metrics (batting average, isolated power, unintentional walk rate, strikeout rate, and speed score) as a percentile compared to all major league hitters. For example, a player with an isolated power rating of 75% is superior in this category to three-quarters of all major leaguers. The player profile is based on the player's three previous seasons of performance, rather than his projection.

Position

A player's Position is a consideration in identifying his comparables, as well as in calculating his VORP. The player's primary position as used by PECOTA is listed at the top of his forecast page; however, secondary and tertiary positions are also considered based on the relative number of appearances that a player receives there. The position determination is made primarily based on the position(s) that a player appeared in the previous season (2003), with lesser consideration given to the position(s) he appeared in during seasons prior to last year (2002 and 2001). Both major league and minor league defensive appearances are considered in the determination of a player's position, but major league appearances are weighted more heavily. PECOTA considers LF, CF and RF to be separate positions.

Similarity Index

Similarity Index is a composite of the similarity scores of all of a player's comparables. Similarity index is a gauge of the player's historical uniqueness; a player with a score of 50 or higher has a very common typology, while a player with a score of 20 or lower is historically unusual. For players with a very low similarity index, PECOTA expands its tolerance for dissimilar comparables until a meaningful sample size is established (see discussion).

Similarity Score

Similarity Score is a relative measure of a player's comparability. Its scale is very different from the Bill James similarity scores; a score of 100 is assigned to a perfect comparable, while a score of 0 represents a player who is meaningfully similar. Players can and frequently do receive negative similarity scores, and they are dropped from the analysis. A score above 50 indicates that a player is substantially comparable, and scores in excess of 70 are very unusual. The comparable player observations are weighted based on their similarity score in constructing a forecast.
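
The weighting step can be illustrated with a similarity-weighted average. The sketch assumes weights are directly proportional to the similarity score, which is a simplification made for this example; the glossary says only that observations are weighted based on their score. The function name and sample figures are hypothetical.

```python
def similarity_weighted_mean(comparables):
    """Average comparable outcomes, weighting each by its similarity score.

    comparables: list of (similarity_score, outcome) pairs. Comparables
    with scores of zero or below carry no weight here and are skipped.
    """
    kept = [(score, outcome) for score, outcome in comparables if score > 0]
    total_weight = sum(score for score, _ in kept)
    if total_weight == 0:
        return None
    return sum(score * outcome for score, outcome in kept) / total_weight


# Two usable comparables and one with a negative score, which is dropped.
forecast = similarity_weighted_mean([(60, 0.300), (40, 0.250), (-10, 0.400)])
```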

Speed Score

Speed Score (SPD) is one of five primary production metrics used in identifying a hitter's comparables. It is based in principle on the Bill James speed score and includes five components:

  • Stolen base percentage.
  • Stolen base attempts as a percentage of times on first base.
  • Triples.
  • Double plays grounded into.
  • Runs scored as a percentage of times on base.

Strikeout Rate

Strikeout Rate (K) is one of five primary production metrics used in identifying a hitter's comparables. It is defined as SO/PA.

Trend

Trend identifies players who demonstrate dramatic changes from their Baseline during their comparable year. Trend is designed to correspond to a player's Breakout and Collapse scores. Players who improve their EQR/PA by at least 20% are identified by a green, upward-pointing arrow and contribute to a player's Breakout score; players whose EQR/PA decreases by at least 20% are identified by a red, downward-pointing arrow and contribute to a hitter's Collapse score.

Ugueto Effect

The Ugueto Effect is the name given to the phenomenon in which very poor players are associated with very high Breakout scores. It is far easier for a player like Luis Ugueto, who would produce about 40 EQR over a full season, to improve upon that figure by 20% than it is for Alex Rodriguez; as a result, Ugueto's Breakout score is likely to be higher. This does not mean that Ugueto is a player you'd want anywhere near your roster.

Unintentional Walk Rate

Unintentional Walk Rate (BB) is one of five primary production metrics used in identifying a hitter's comparables. It is defined as (BB-IBB)/PA.

Value Distribution

The Value Distribution chart plots a player's wins above replacement at various levels of probability (see discussion). It accounts for both the quantity and the quality of his expected performance.

VORP

VORP, created by Keith Woolner, is an estimate of a hitter's value over and above a replacement-level hitter at his position, as measured in runs. An extensive description of the derivation of VORP can be found here. Because it accounts for both quantity and quality of a player's performance, it is the single best measure for assessing his value. VORP scores do not consider the quality of a player's defense. The replacement levels for individual positions are assumed by PECOTA to be the same as they were in 2002. Specifically, they are set equal to the following:


          BA     OBP     SLG     EQA
C       .230    .292    .359    .222
1B      .228    .315    .423    .251
2B      .237    .297    .361    .225
3B      .227    .294    .387    .232
SS      .234    .290    .366    .224
LF      .237    .318    .421    .252
CF      .231    .297    .387    .233
RF      .241    .320    .429    .254

Designated hitters are assigned the same replacement level as first basemen.
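
The replacement-level table lends itself to a simple lookup. The sketch below stores the EQA column from the table above and applies the designated hitter rule; the function names are invented, and the EQA margin it returns is only a rate comparison, not a VORP figure (VORP is measured in runs and also depends on playing time).

```python
# Replacement-level EQA by position, copied from the table above.
REPLACEMENT_EQA = {
    "C": 0.222, "1B": 0.251, "2B": 0.225, "3B": 0.232,
    "SS": 0.224, "LF": 0.252, "CF": 0.233, "RF": 0.254,
}


def replacement_eqa(position):
    """Look up replacement-level EQA; designated hitters use the 1B level."""
    key = "1B" if position == "DH" else position
    return REPLACEMENT_EQA[key]


def eqa_above_replacement(eqa, position):
    """Margin between a hitter's EQA and the replacement level at his position."""
    return eqa - replacement_eqa(position)


# The same hypothetical .296 EQA is worth more at shortstop than at DH.
shortstop_margin = eqa_above_replacement(0.296, "SS")
dh_margin = eqa_above_replacement(0.296, "DH")
```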

Weighted Mean

The Weighted Mean forecast incorporates all of the player's potential outcomes into a single average (see also 2004 Forecast), with an additional adjustment for playing time. In almost all cases, poor performances are associated with a reduced number of plate appearances. For that reason, they don't hurt a player's team quite as much as good performances help it; the weighting is designed to compensate for this effect.

Wins

Wins is a conversion of a hitter's VORP to wins added over a 162-game season, based on a version of the Pythagorean formula.
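
The exact version of the Pythagorean formula used is not specified here, so the following is only a sketch of the general idea using the classic form with an exponent of 2. The run environment of 750 runs scored and allowed per team, the exponent, and the function name are all assumptions made for the example.

```python
def wins_added(vorp_runs, league_runs=750.0, games=162, exponent=2.0):
    """Convert runs above replacement into wins with a Pythagorean estimate.

    Assumes an otherwise average team that scores and allows league_runs,
    then adds the hitter's runs to the runs-scored side.
    """
    def win_pct(scored, allowed):
        return scored ** exponent / (scored ** exponent + allowed ** exponent)

    baseline = win_pct(league_runs, league_runs)  # .500 by construction
    with_player = win_pct(league_runs + vorp_runs, league_runs)
    return games * (with_player - baseline)


# A hypothetical hitter worth 30 runs above replacement.
estimate = wins_added(30.0)
```

Under these assumptions, 30 runs come out to a little over three wins, in line with the common rule of thumb that roughly ten runs equal one win.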


Copyright © 1996-2004 Prospectus Entertainment Ventures, LLC.