$Unique_ID{BAS00190}
$Pretitle{}
$Title{Statistics: Introduction}
$Subtitle{}
$Author{}
$Subject{Registers Leaders Rosters Introduction statistic Statistics
statistical stat stats Chadwick Origins Flowering Golden Computer Age
Batting Base Stealing Fielding Pitching Errors Controversies Sources Missing
Incomplete data}
$Log{}
Total Baseball: Registers, Leaders, and Rosters
Statistics: Introduction
Part Two, the statistical section of Total Baseball, presents the record of
major league contests played from 1871 through 1993--all 158,982 of them. It
details the accomplishments of the game's 2,153 teams and 14,052 players more
completely and more accurately than any other encyclopedic work; it applies to
all of baseball's glorious past the "sabermetric" stats that fans first
embraced in the 1980s; it introduces original measures of player performance.
Yet for all its innovation, Total Baseball stands squarely in the tradition of
baseball record keeping; it is--like each new spring of our national
pastime--a link in a long, long chain. As the game of one hundred and fifty
years ago lives on in the game of today, so is this volume enriched by the
labors of statisticians from Henry Chadwick to Ernie Lanigan, from S.C.
Thompson to David Neft to Bill James.
The Origins, 1845-1875
In fact, baseball and stats were a tandem from the outset of the game's
history, as the editors of this volume first discussed in their earlier Hidden
Game of Baseball (1984), from which portions of this introduction are adapted.
The first box score appeared in the New York Morning News on October 22, 1845,
just a month after Alexander Cartwright and his Knickerbocker teammates
codified the first set of rules. Why did these early players and scribes
measure individual performance rather than simply count the score? In part to
imitate the custom of cricket; yet the larger explanation is that the numbers
served to legitimize men's concern with a boys' pastime. The pioneers of
baseball reporting--William Cauldwell of the Sunday Mercury, William Porter of
Spirit of the Times, the unknown annalist at the News, and later Henry
Chadwick--may indeed have reflected that if they did not cloak the game in the
"importance" of statistics, it might not seem worthwhile for adults to read
about, let alone play. Statistics elevated baseball from other boys' field
games of the 1840s and '50s to make it somehow "serious", like business; its
essential simplicity was adorned with intricate detail that suited it
perfectly to quantification.
In the development of baseball statistics, no man is more important than
Father Chadwick. Born in England in 1824, he came to these shores at age
thirteen steeped in the tradition of cricket. In his teens he played the
English game and in his twenties he reported on it for a variety of
newspapers, including the Long Island Star and the New York Times. In the
early 1840s, before the Knickerbocker rules eliminated the practice of
retiring a base runner by throwing the ball at him rather than to the base,
Chadwick occasionally played baseball too, but he was not favorably impressed,
having received "some hard hits in the ribs." Not until 1856, by which time
he had been a cricket reporter for a decade, were Chadwick's eyes opened to
the possibilities in the American game, which had improved dramatically since
his youth. In 1868 he recalled, "On returning from the early close of a
cricket match on Fox Hill, I chanced to go through the Elysian Fields during
the progress of a contest between the noted Eagle and Gotham clubs. The game
was being sharply played on both sides, and I watched it with deeper interest
than any previous ball game between clubs that I had seen. It was not long
before I was struck with the idea that baseball was just the game for a
national sport for Americans . . . as much so as cricket in England. At the
time I refer to I had been reporting cricket for years, and, in my method of
taking notes of contests, I had a plan peculiarly my own. It was not long,
therefore, after I had become interested in baseball, before I began to invent
a method of giving detailed reports of leading contests at baseball . . ."
Thus Chadwick's cricket background was largely the impetus to his method
of scoring a baseball game, the format of his early box scores, and the
copious if primitive statistics that appeared in his year-end summaries in the
New York Clipper, Beadle's Dime Base-Ball Player, and other publications.
Actually, cricket had begun to shape baseball statistics even before
Chadwick's conversion. The first box score reported on two categories, outs
and runs: outs, or "hands out," counted both unsuccessful times at bat and
outs run into on the basepaths; "runs" were runs scored, not those driven in.
The reason for not recording hits in the early years, when coverage of
baseball matches appeared alongside that of cricket matches, was that, unlike
baseball, cricket had no such category as the successful hit which did not
produce a run. To reach "base" in cricket is to run to the opposite wicket,
which tallies a run; if you hit the ball and do not score a run, you have been
put out.
Cricket box scores were virtual play-by-plays, a fact made possible by
the lesser number of possible events. This play-by-play aspect was applied to
a baseball box score as early as 1856; interestingly, despite the abundance of
detail, hits were not accounted, nor did they appear in Chadwick's own box
scores until 1867. The batting champion as declared by Chadwick, whose
computations were immediately and universally accepted as "official," was the
man with the highest average of runs per game. An inverse though imprecise
measure of batting quality was outs per game. After 1863, when a fair ball
caught on one bounce was no longer an out, fielding leaders were those with
the greatest total of fly catches, assists, and "foul bounds" (fouls caught on
one bounce). Pitching effectiveness was based purely on control, with the
leader recognized as the one whose delivery offered the most opportunities for
outs at first base and led to the fewest passed balls.
In a sense, Chadwick's measuring of baseball as if it were cricket can be
viewed as correct in that when you strip the game to its basic elements, those
that determine victory or defeat, outs and runs are all that count in the end.
No individual statistic is meaningful to the team unless it relates directly
to the scoring of runs. Chadwick's blind spot in his early years of baseball
reporting lay in not recognizing the linear character of the game, the
sequential nature whereby a string of base hits or men reaching base on error
(there were no walks then) was necessary in most cases to produce a run. In
cricket each successful hit must produce at least one run, while in baseball,
more of a team game on offense, a successful hit may produce none.
Early player stats were of the most primitive kind, the counting kind.
They'd tell you how many runs, or outs, or fly catches had occurred--later,
how many hits or total bases. Counting is the most basic of all statistical
processes; the next step up is averaging, and Chadwick was the first to put
this into practice.
As professionalism infiltrated the game, teams began to bid for
star-caliber players. Stars were known not by their stats but by their style
until 1865, when Chadwick began to record in the Clipper a form of batting
average taken from the cricket pages--runs per game. Two years later, in his
newly founded baseball weekly, The Ball Players' Chronicle, he began to record
not only average runs and outs per game, but also home runs, total bases,
total bases per game--and hits per game. The averages were expressed not with
decimal places but in the standard cricket format of the "average and over."
Thus a batter with 23 hits in 6 games would have an average expressed not as
3.83 but as "3-5"--an average of 3 with an overage, or remainder, of 5.
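For readers who want to check the arithmetic, the "average and over" is simple integer division with a remainder; a minimal sketch in Python (the function name is ours, for illustration only):

    def average_and_over(hits, games):
        # Cricket-style presentation: whole-number average plus the remainder ("over")
        average, over = divmod(hits, games)
        return f"{average}-{over}"

    print(average_and_over(23, 6))   # "3-5", the cricket rendering of 3.83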
Another innovation was to remove from the individual accounting all bases
gained through errors. Runs scored by a team, beginning in 1867, were divided
between those scored after a man reached base on a clean hit and those arising
from a runner's having reached base on an error. This was, of course, a
precursor of today's earned run average.
In 1868, despite Chadwick's derision, the Clipper continued to award the
prize for the batting championship to the player with the greatest average of
runs per game. Actually, the old yardstick had been less preposterous a
measure of batsmanship than one might imagine today, because team defense was
so much poorer and the pitcher, with severe restrictions on his method of
delivery, was so much less important. If you reached first base, whether by a
hit or by an error, your chances of scoring were excellent; indeed, teams of
the mid-1860s registered more runs than hits! By 1876, the caliber of both
pitching and defense had improved to the extent that the ratio of runs to hits
was about 6.5 to 10; today the ratio stands at roughly 5 to 10.
By the end of the decade Chadwick was recording total bases and home
runs, but he placed little stock in either, as conscious attempts at slugging
violated his cricket-bred image of "form". Just as cricket aficionados watch
the game for the many opportunities for fine fielding it affords, so was
baseball from its inception perceived as a fielders' sport. The original
Cartwright rules of 1845, in fact, specified that a ball hit out of the
field--in fair territory or foul--was a foul ball! "Long hits are showy,"
Chadwick wrote in the Clipper in 1868, "but they do not pay in the long run.
Sharp grounders insuring the first-base certain, and sometimes the second-base
easily, are worth all the hits made for home-runs which players strive for."
Chadwick prevailed, and hits per game became the criterion for the
Clipper batting championship and remained so until 1876, when the problem with
using games as the denominator in the average at last became clear. If you
were playing for a successful team, and thus were surrounded by good batters,
or if your team played several weak rivals who committed many errors, the
number of at-bats for each individual in that lineup would increase. The more
at-bats one is granted in a game, the more hits one is likely to have. So, for
example, if Player A had 10 at-bats in a game, which was not so unusual in the
1860s, he might have 4 base hits. In a more cleanly played game, Player B
might bat only 6 times, and get 3 base hits. Yet Player A, with his 4-for-10,
would achieve an average of 4.00; the average of Player B, who went 3-for-6,
would be only 3.00. By modern standards, of course, Player A would be batting
.400 while Player B would be batting .500.
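The distortion is easy to verify; a minimal sketch in Python contrasting the old per-game figure with the modern per-at-bat figure (the function names are ours):

    def hits_per_game(hits, games):
        return hits / games        # the Clipper's criterion into the 1870s

    def batting_average(hits, at_bats):
        return hits / at_bats      # the modern criterion, official from 1876

    # Player A: 4-for-10 in one game; Player B: 3-for-6 in one game
    print(hits_per_game(4, 1), batting_average(4, 10))   # 4.0 and .400
    print(hits_per_game(3, 1), batting_average(3, 6))    # 3.0 and .500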
In short, the batting average used in the 1860s is the same as that used
today except in its denominator, with at-bats replacing games. Moreover,
Chadwick created a measure in the 1860s that divided total bases by games
played; change the denominator to at-bats and you have today's slugging
average--which, incidentally, was not accepted by the National League as an
official statistic until 1923 and by the American until 1946 (baseball was
born and bred conservative).
Chadwick's "total bases average" represents the game's first attempt at a
weighted average--an average in which the elements collected together in the
numerator or the denominator are recognized numerically as being unequal. In
this instance, a single is the unweighted unit, the double is weighted by a
factor of two, the triple by three, and the home run by four. Statistically,
this is a distinct leap forward from, first, counting, and next, averaging.
The weighted average is in fact the cornerstone of today's statistical
innovations, or "sabermetrics."
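Chadwick's total bases average and the modern slugging average differ only in the denominator; a minimal sketch of the weighting just described (the function names are ours):

    def total_bases(singles, doubles, triples, homers):
        # The weighted numerator: 1B x 1, 2B x 2, 3B x 3, HR x 4
        return singles + 2 * doubles + 3 * triples + 4 * homers

    def total_bases_average(singles, doubles, triples, homers, games):
        # Chadwick's 1860s measure, divided by games
        return total_bases(singles, doubles, triples, homers) / games

    def slugging_average(singles, doubles, triples, homers, at_bats):
        # The modern measure, divided by at-bats (official NL 1923, AL 1946)
        return total_bases(singles, doubles, triples, homers) / at_bats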
The 1870s gave rise to some new batting stats and to the first attempt to
quantify thoroughly the other principal facets of the game, pitching and
fielding. Although the Clipper recorded base hits and total bases as early as
1868, a significant wrinkle was added in 1870 when at-bats were listed as
well. This was a critical introduction because it permitted the improvement of
the batting average, first introduced in its current form by H.A. Dobson of
Washington, D.C., in the Dime Base-Ball Player of 1872, and first computed
officially--that is, for the National League--in 1876. Since then the batting
average has not changed, except for 1876, when bases on balls were figured as
outs, and 1887, when they were counted as hits. Total Baseball counts a walk
as neither an at-bat nor an out for all years since 1871.
The objections to the batting average are well known, but to date have
not dislodged it from its place as the most popular measure of hitting
ability. First of all, the batting average makes no distinction between the
single, the double, the triple, and the home run, treating all as the same
unit. This objection had been addressed in 1868 by Chadwick's total bases
average.
Second, it gives no indication of the effect of that base hit--that is,
its value to the team. This was the reason Chadwick clung to runs per game as
the best possible batting measure. Third, the batting average does not take
into account those occasions when first base is reached via a walk, hit by
pitch, or error. This last point was addressed at a surprisingly early date,
too, as for 1879 the National League adopted as an official statistic a
forerunner of the on-base percentage; it was called "reached first base,"
which included times reached by error as well as base on balls and base hits.
(Being hit by a pitch did not give the batter first base until 1884 in the
American Association, 1887 in the National League.)
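The modern descendant of "reached first base" is the on base percentage; a minimal sketch of the present-day calculation (unlike the 1879 statistic, it excludes times reached on error and is expressed as a rate rather than a season total):

    def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sac_flies):
        # Times on base divided by plate appearances (sacrifice bunts excluded)
        return (hits + walks + hit_by_pitch) / (at_bats + walks + hit_by_pitch + sac_flies)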
The Flowering, 1876-1920
Ever since the Civil War, serial guides like Beadle and DeWitt and
sporting columns like those in the Clipper had carried year-end tabulations of
batting, fielding, and pitching exploits, varying from year to year with the
brainstorms of Chadwick or other demon compilers like New York's M.J. Kelly or
Philadelphia's Al Wright. But the year 1876 was special. It was significant
not only for the founding of the National League and the official debut of the
batting average in its current form, it was also the Centennial of the United
States, which was marked by a giant exposition in Philadelphia celebrating the
mechanical marvels of the day. American ingenuity reigned, and technology was
seen as the new handmaiden of democracy. Baseball, that mirror of American
life, reflected the fervor for things scientific with an explosion of
statistics far more complex than those seen before, particularly in the
previously neglected areas of pitching and fielding. The increasingly minute
statistical examination of the game met a responsive audience, one primed to
view complexity as a measure of worth.
The crossroads year of 1876 highlights how the game had changed to that
point, as well as how it has changed since.
In that year, the number of offensive stats tabulated at season's end in
any of the publications inspired by Chadwick or Spalding was six: games,
at-bats, runs, hits, runs per game, and batting average. (And as with all the
various guides until 1941, the stats of men who played in fewer than a
specified minimum number of games were not noted.) Of these six, only runs
and runs per game were common in the 1860s, while that decade's tabulation of
total bases vanished. The number of offensive stats a hundred years later?
Twenty. (Today the number is twenty-one, with the addition of on base
percentage.)
The number of pitching categories in 1876 was eleven, and there were some
surprises, such as earned run average, hits allowed, hits per game, and
opponents' batting average. Strikeouts were not recorded, for Chadwick saw
them strictly as a sign of poor batting rather than good pitching (his view
had such an impact that pitcher strikeouts were not kept officially until
1889). The number of pitching stats today? Twenty-four.
The number of fielding categories in 1876 was six. One hundred years
later it was still six (with the exception of the catcher, who gets a seventh:
passed balls), dramatizing how the game, which originated as a showcase for
fielders, had changed. The fielding stats of 1876 lumped "battery errors" with
fielding errors, so that wild pitches and passed balls--in some years, even
walks--diminished one's fielding percentage. This practice continued until
1887, but in Total Baseball battery errors are not included in fielding stats.
Battery-mates' fielding stats were boosted by the awarding of an assist to the
pitcher on strikeouts. This practice lasted until 1889, but is not reflected
in Total Baseball.
The custom in 1876, as it is now, was to combine putouts, assists, and
errors to form a "percentage of chances accepted," or what is today known as
fielding average or fielding percentage. A "missing link" variant, devised by
Al Wright in 1875, was to form averages by dividing the putouts by the number
of games to yield a "putout average"; dividing the assists similarly to arrive
at an "assist average"; and dividing putouts plus assists by games to get
"fielding average." These averages took no account of errors. (Wright's
"fielding average" was reborn a century later as Bill James' Range Factor.)
The public's appetite for new statistics was not sated by the outburst of
1876. New measures were introduced in dizzying profusion in the remaining
years of the century. Some of these did not catch on and were soon dropped for
all time, like the ridiculous "total bases run," while others fizzled only to
reappear with new vigor in the twentieth century. These include (a) the
above-mentioned "reached first base," which resurfaced in the early 1950s in
an unofficial, improved form called on base percentage and became an official
stat more than thirty years later, and (b) an 1860s stat, earned run average,
which was periodically revived before dropping from sight in the 1880s, only
to return triumphant to the NL in 1912 and the AL in 1913. In 1913 Ban Johnson
not only proclaimed the ERA official but became so enamored with it that he
also instructed American League scorers to compile no official won-lost
records (this state of affairs lasted for seven years, 1913-1919).
Another stat that was "sent back to the minors" before its eventual
adoption as an official stat in 1920 was the run batted in. Introduced by a
Buffalo newspaper in 1879, the stat was picked up the following year by the
Chicago Tribune and even became an official NL stat for the opening months of
1891. By season's end it had faded as most NL scorers declined to account for
it in their summaries. (The American Association, however, recorded it all year
long.) Ernie Lanigan picked up the RBI baton with his reports to the New York
Press in 1907, but only about a third of his data has been found, and he did
not figure RBIs for men who played in fewer than ten games, or club totals for
traded players. For Total Baseball we have placed much reliance upon the
source material donated by Information Concepts, Inc. (ICI) to the National
Baseball Library in Cooperstown following publication of its Baseball
Encyclopedia for Macmillan in 1969. David Neft also kindly supplied us with
his unpublished RBI data for the previously missing National League seasons of
1880-1885. The John Tattersall collection of nineteenth century game accounts
and box scores was valuable as well.
Other statistics introduced officially before the turn of the century
were stolen bases (though not caught stealing); doubles, triples, and homers;
and sacrifice bunts (though an at-bat was charged from 1889 through 1893).
Pitcher strikeouts, bases on balls, and the hit-by-pitch also appeared before
1900, but hit-by-pitch stats were not kept for batters on a systematic basis
until 1917 in the NL and 1920 in the AL. Through newspaper research, we have
filled in HBP data from 1884 through 1916 in the National League, Players
League, and American Association, from 1901 through 1919 in the American
League, and the 1914-1915 Federal League.
Hit into double play--including line outs as well as groundouts--was
recorded erratically in the nineteenth century, but separate stats for
groundouts into double plays have been kept by the leagues only since 1933 in
the NL and 1939 in the AL. Batters' strikeouts were reported unofficially in
1891, but not as a league stat until 1910 in the NL and 1913 in the AL.
Innings pitched were not kept until 1908 in the AL and 1903 in the NL.
Stolen bases were awarded not only for clean steals but also for extra
bases taken through daring, from the first year in which totals were kept,
1886, until 1898 (the Macmillan Baseball Encyclopedia begins its record of
stolen bases with 1887). Because the figures reported in the guides were
grossly inflated (such as Harry Stovey's ostensible 156 steals in 1888), the
figures in Total Baseball reflect game-by-game research and refiguring.
Caught-stealing (CS) figures are available on a very sketchy basis in some of
the later years of the century, as some newspapers carried the data in the box
scores of hometown games. From 1912 on, Lanigan recorded CS in box scores of
the New York Press, but the leagues did not keep the figure officially until
1920. The AL has tabulated CS from that year to the present, excepting 1927,
which members of the Society for American Baseball Research reconstructed from
newspaper box scores. National League caught-stealing data exists for
1920-1925, and for 1951 to the present.
The new century added little in the way of new official statistics--ERA,
RBI, and slugging average are better regarded as revivals despite their
respective adoption dates of 1912, 1920, and 1923. But back in 1908 there was
a classic case of a statistic rushing in to fill a void, as Phillies' manager
Billy Murray observed that his outfielder Sherry Magee had the happy facility
of providing a long fly ball whenever presented with a situation of a man on
third and fewer than two outs. Taking up the cudgels on his player's behalf,
Murray protested to the National League office that it was unfair to charge
Magee with an unsuccessful time at bat when he was in fact succeeding, doing
precisely what the situation demanded.
Murray won his point, but baseball flip-flopped a couple of times on this
stat, in some years reverting to calling it a time at bat, in other years not
even crediting an RBI. The sacrifice-fly rule was in effect from 1908 through
1930, with a sacrifice being given for advancing any runner, not just to home,
for the final four years of this period. The rule was revived for one year in
1939. In none of these years was a distinction made between a sacrifice bunt
and a sacrifice fly. When the rule came back into force in 1954, there was a breakdown of
each.
More recent stats that have followed from this sort of perception--that
something important was occurring on the field which had no verifiable reality
because it was not yet being measured--are the save and the late, lamented
game-winning RBI, which will be discussed later.
A signal event took place in 1912: the publication by Baseball Magazine
editor John Lawres of Who's Who in Baseball, a small book that became the
first to provide career statistics and personal facts for a group of players.
Although thoroughly inadequate by today's standards--its only tabulations were
games, batting average, and fielding average (even for pitchers, who were
given no mound records!)--Who's Who was a groundbreaking work, giving rise to
a much-expanded format in 1916 and inspiring two other significant
encyclopedic works: in 1914, George Moreland's self-published opus called
Balldom (grandiosely subtitled "The Britannica of Baseball," which it surely
wasn't), and Ernest J. Lanigan's Baseball Cyclopedia, also sponsored by
Baseball Magazine, which debuted in 1922 and was updated annually through
1933.
The Golden Age, 1920-1968
There have been other new statistical tabulations in this century, but
generally of the counting sort: complete games (NL 1910, AL 1922), games
started (AL 1926, NL 1938), games finished (NL 1920, AL 1926). And there were
sacrifice bunts allowed (NL 1916, AL 1922), intentional bases on balls (only
since 1955), and, in the next period, saves (1969) and game-winning RBIs
(1980). The only new average since slugging average was adopted in 1923 has
been the on base percentage, adopted in 1984. The ICI group computed saves for
prior years. Another such stat that failed to survive, alas, was stolen bases
off pitchers, which the American League recorded only in 1920-1924; it has
been recorded on an unofficial basis in the 1980s by the Elias Sports Bureau
and Project Scoresheet. The only new fielding measure was team double plays,
added to the AL list in 1912 and the NL in 1919. Other new and more
interesting stats appeared in the 1940s and '50s but have not yet gained the
official stamp of approval, such as Ted Oliver's Weighted Rating System,
Alfred P. Berry's Average Bases Allowed (opponents' slugging average), and
Branch Rickey and Allan Roth's Isolated Power.
This period of baseball's history may have fielded its most dazzling
array of stars, but strategically and statistically it was pretty dim. There
was some excitement, however, in baseball record keeping. First came
Daguerreotypes, issued by The Sporting News in 1934, featuring the playing
records of many retired players both celebrated and obscure; most if not all
of these statistical and biographical profiles originally appeared in the
pages of TSN. Although its number of statistical categories was fewer than one
might have wished, Daguerreotypes was very useful and, through its several
editions ably edited by Paul MacFarlane, long-lived.
In 1940 came The Sporting News' Baseball Register, which supplied full
records for active players, managers, coaches, and umpires, plus a grab bag of
former stars. Since the expansion of the major leagues from sixteen teams to
twenty-six, the Register has only accommodated contemporary players and
managers, but it remains a valuable source. One year later, TSN issued a
notable edition of its Official Baseball Record Book, giving for the first
time full statistical lines for all men who played in a major league game the
previous year.
In 1944 a little-known man named Ted Oliver published in obscurity a
booklet called Kings of the Mound. It introduced a new stat called the
Weighted Rating System for pitchers, a stat that we modified and continue to
employ as Wins Above Team. Moved by the inadequacy of both the won-lost
percentage and the ERA to reflect the value of a decent pitcher laboring for a
lousy club, Oliver ingeniously subtracted the pitcher's decisions from his
team's, then took the difference between the pitcher's won-lost percentage and
his team's and multiplied that difference by the pitcher's number of
decisions. Although his concept and his math were flawed, his
principles--viewing a pitcher's record in relation to his team and weighting
the result of his calculation by the number of decisions--were of unparalleled
sophistication for the time. (Oliver's formula for his Weighted Rating System
and its modification in Total Baseball are detailed in the Glossary, as are
the calculations behind every statistic employed in this book.)
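As a rough guide to Oliver's idea (the exact formula and its Wins Above Team modification are in the Glossary, as noted), his calculation can be sketched as follows; the variable names are ours:

    def oliver_rating(p_wins, p_losses, team_wins, team_losses):
        # Remove the pitcher's own decisions from the team record, then weight
        # his edge (or deficit) in won-lost percentage by his number of decisions.
        decisions = p_wins + p_losses
        rest_wins = team_wins - p_wins
        rest_losses = team_losses - p_losses
        pitcher_pct = p_wins / decisions
        rest_pct = rest_wins / (rest_wins + rest_losses)
        return (pitcher_pct - rest_pct) * decisions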
Then in 1951 came the first true encyclopedia of baseball, the claims of
Moreland and Lanigan notwithstanding. Compiled by Hy Turkin and S.C. Thompson,
The Official Encyclopedia of Baseball was published by the A.S. Barnes
Company. Its 620 pages contained a wealth of features such as manager and
umpire rosters, historical essays, playing tips, a bibliography, and much
more. But the heart of the volume and the key to its subsequent success was a
register of nearly nine thousand men who played one or more games at the major
league level from 1871 through 1949 (the 1950 record of players appearing in
ten games or more was tacked on to the end). In this register, Turkin/
Thompson also offered birth and death data and what today seems fairly limited
statistical information but by previous standards was a veritable cornucopia:
year, club, league, position, games, and batting average or won-lost record. A
landmark volume that did much to inspire this one, The Official Encyclopedia
of Baseball lasted through ten revised editions, the last being published in
1979, ten years after the initial appearance of Macmillan's Baseball
Encyclopedia.
The genesis of the Turkin/Thompson opus was one day in September 1944
when musician Thompson invited his neighbor, New York Daily News sportswriter
Turkin, to "look over his baseball collection." What Turkin saw was a massive
treasure chest of data, collected and collated over twenty years. "Tommy"
Thompson was a baseball nut--a figure filbert, in the parlance of the
time--who researched baseball just for the love of it. He was not alone in
this pursuit, although very nearly so--other baseball archaeologists of the
time who contributed to this encyclopedia were Frank Marcellus, Tom Shea, Lee
Allen, Ralph Lin Weber, Joe Overfield, Bob McConnell, and the aforementioned
Ernie Lanigan.
The Official Encyclopedia of Baseball went a long way toward making the
study of baseball history and records a respectable pursuit, just as a century
earlier the statistical accounting of a boys' game had helped to make baseball
a sport for grown men. The researchers' ranks expanded to include such men as
Bob Davids, who in 1971, aided by other experts like Cliff Kachline, Bill
Haber, Ray Nemec, John Pardon, and Joe Simenic, would create SABR, the Society
for American Baseball Research (pronounced "saber"). Formerly the lonely
pursuit of a handful of "nuts" like S.C. Thompson, baseball research and
sabermetrics--a neologism coined in honor of SABR, signifying the statistical
analysis of the game's records--would become the pastime of thousands.
An article in Life magazine by Branch Rickey on August 2, 1954, gave
further impetus to the study of baseball statistics, but not just to set the
historical record straight. Indeed, this article may be viewed as the opening
shot of the sabermetric assault of the 1980s. In "Goodby to Some Old Baseball
Ideas," Rickey, with the aid of some new mathematical tools supplied by Dodger
statistician Allan Roth, sought to puncture some long-held conceptions about
how the game was divided among its elements (batting, baserunning, pitching,
fielding), who was best at playing it, and what caused one team to win and
another to lose. This is a pretty fair statement of what sabermetrics is
about.
Rickey attacked the batting average and proposed in its place the on base
percentage; advocated the use of Isolated Power (extra bases beyond singles,
divided by at-bats) as a better measure than slugging average; introduced a
"clutch" measure of run-scoring efficiency for teams, and a similar concept
for pitchers (earned runs divided by baserunners allowed); reaffirmed the
basic validity of the ERA; saw the strikeout as the insubstantial thing it
was--and more. But the most important thing Rickey did for baseball statistics
was to pull the discipline back from the wrong path it had taken with the introduction of
the batting average in 1876: to strip the game and its stats to their
essentials and start again, this time remembering that individual stats came
into being as an attempt to apportion the players' contributions to achieving
a team victory, for that is what the game is about.
Rickey and Roth devised a formula to measure a team's efficiency in
turning its offensive and defensive statistics into runs, and thus wins. They
realized, and had confirmed for them by mathematicians at the Massachusetts
Institute of Technology, that just as the team which scores more runs in a
game gets the win, so a team which over the course of a season scores more
runs than it allows should win more games than it loses--and by an extent
correlated to its run differential. From this startlingly simple (or rather,
seemingly simple) observation in 1954 flowed: first, the trailblazing but
little noted work of George Lindsey in the 1950s and early 1960s, when he
developed a model for run-scoring probability from the twenty-four
combinations of outs and bases occupied; the development of "percentage
baseball" stats and strategies by Earnshaw Cook in the 1960s; the play-by-play
analysis of complete seasons by the Mills brothers, Eldon and Harlan, in
1969-1970; and, over the next two decades, the statistical and historical
works of several sabermetricians, most notably Bill James.
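A back-of-the-envelope rendering of the Rickey-Roth insight, using the roughly ten-runs-per-win relationship discussed later in this Introduction (the linear form and the constant are illustrative, not the 1954 formula itself):

    def expected_wins(runs_scored, runs_allowed, games, runs_per_win=10.0):
        # A team with no run differential should play .500 ball; each ten runs
        # of differential is worth roughly one win in the standings.
        return games / 2 + (runs_scored - runs_allowed) / runs_per_win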
The Computer Age, 1969-
Despite the death of Turkin in 1957 and Thompson ten years later, their
Official Encyclopedia of Baseball remained the dominant book of baseball
statistics, although many fans were frustrated with the fragmentary records it
presented. As Frank V. Phelps wrote in the 1987 edition of The National
Pastime, "Gaps and obvious errors in official averages, the lack of many early
records, difficulty in securing the records of players who appeared in only a
few games, and frustrating discrepancies among existing guides and registers
had long since created a desire for an ultimate, complete, correct set of
major league records. But it wasn't until the mid-1960s that the development
of sophisticated computers which could absorb, retain, order, and output huge
amounts of data finally made a project feasible."
Beginning in 1967, a battalion of researchers commanded by David Neft
foraged through the official records and newspaper box scores to provide
freshly compiled figures for those who had no ERAs, RBIs, slugging averages,
saves, and all manner of wonderful things. The material which finally appeared
in the tome was entered into a data bank, and the book was the first to be typeset
entirely by computer, now a common practice. Published in 1969, The Baseball
Encyclopedia was a milestone in computer technology, but as indispensable as
the computer were the old-fashioned scrapbooks and files of Lee Allen and John
Tattersall. The result was a mammoth ledger book of the major leagues more
thorough than any that had appeared before.
The Baseball Encyclopedia researchers not only found new data to correct
old inaccuracies but also applied new yardsticks to men who had gone to their
graves never having heard of an RBI or a save. They also raised the hackles of
traditionalists with many of their findings, which prompted the formation of a
Special Baseball Records Committee. Its members ruled upon such matters as
whether, for the historical record, bases on balls should be counted as hits
(as they were in 1887), outs (as they were in 1876), or neither (as has been
the practice in all other years); or whether "sudden-death" home
runs--thirty-seven game-winning blows with men on base that they identified as
having occurred in the bottom half of the ninth or extra inning--would be
credited as homers or, in the practice before 1920, would count for only as
many bases as needed to push across the winning run. In the latter
controversy, committee members first decided to count the disputed blows as
homers, but then, when complaints arose that Babe Ruth's famous total of 714
would change to 715, they reversed themselves. They decided that the National
Association of 1871-1875 was not a major league, while the Federal League,
Union Association, and Players League were; and they ruled on several other
issues, all of which were published in the Appendix to The Baseball
Encyclopedia.
In Total Baseball, we have abided by most of the committee's
decisions--not to preserve Ruth's total, but because there were many more such
homers before 1920 than the thirty-seven the committee identified, and the
disputes surrounding some of them are now beyond settling. We have, however,
treated the National Association as a major league, as Turkin/Thompson and
all previous record books did, and in accordance with the views of most
historians. And we have differed from the committee's ruling on awarding
pitchers wins and losses in the years before 1920. Not finding any official
scoring rule or practice for that time, they chose to apply 1950 guidelines to
decisions awarded in 1876-1920. This well-intentioned decision produced
substantial alterations in the records of such hurlers as Cy Young, Christy
Mathewson, Grover Alexander, and others. In the ensuing years, the notable
research of Frank Williams (reported in "All the Record Books Are Wrong," The
National Pastime, 1982) revealed that there was indeed a pattern and a
rationale for the way decisions were awarded in those days; the data in Total
Baseball conforms with his findings.
ICI research created new stars, launching several previously
underappreciated heroes of old into the Hall of Fame. Sam Thompson, Addie
Joss, Roger Connor, Amos Rusie--their phenomenal level of play was hidden
simply because statisticians back then were not recording the particular
numbers which would show them off to best advantage. If sabermetrics consists
of finding things in the existing data that were not seen before, or
collecting that data which makes possible the application of new statistics to
old performances, the first edition of The Baseball Encyclopedia was a
monument in the course of sabermetrics.
However, its subsequent editions declined from that standard, dropping
valuable data, jimmying figures for star players in a misguided homage to
tradition, and making a shambles of individual/team balance in the totals. As
Phelps wrote of the second edition, edited by Joseph L. Reichler for the
Macmillan Company after the ICI group broke up and relinquished supervision:
"Players' batting statistics were changed without compensating for
changes in the records of other players on the same teams or in the
corresponding team and league totals. Later editions included even more
unbalanced adjustments . . .
"Quite apart from the problem of record-balancing, the numerous changes
in players' totals and averages has caused serious misapprehensions and
confusions for fans, writers, and researchers. The records of Fred Clarke and
Cy Young differ in all six editions [to 1987] even without counting Clarke's
astronomical 1899 BA [in the third edition, Clarke was credited with a batting
average of .986 that boosted his lifetime mark by 15 points]. The figures for
Burkett, Chesbro, Duffy, Hornsby, Walter Johnson, Radbourn, Speaker, and
Waddell differ in five of the six books. The same is so in four of six for at
least twenty-three other Hall of Famers, and many more less gifted players."
The seventh edition was issued in 1988 and, like the five that preceded it,
was less accurate than the classic first issue. The eighth edition, published
in 1990, corrected many of the errors in the seventh but--perhaps because of
its marketing link with Major League Baseball--retained many once-contested
errors that historians had long since expunged from the record, while changing
other statistics in a manner at variance with Major League Baseball's wishes.
For the ninth edition, MLB withdrew its product endorsement. (David Neft of
ICI, along with Baseball Encyclopedia staff alumni Dick Cohen and Jordan
Deutsch, went on to form Sport Products, Inc. Since 1974 they have issued the
excellent Sports Encyclopedia: Baseball, which has endured as the baseball
reference of choice for thousands of sophisticated fans.)
We will have more to add about accuracy and balance in the "Errors and
Controversies" section of this Introduction.
There were two other interesting developments in 1969. The first and less
celebrated was a research project launched by Eldon and Harlan Mills that,
like the ICI encyclopedia, could not have been contemplated without the
computer. The Mills brothers tracked the entire major league seasons of 1969
and 1970 on a play-by-play basis. Then they applied to that record the
probabilities of winning which derived from each possible outcome of a plate
appearance, as determined by a computer simulation incorporating nearly eight
thousand possibilities. What, for example, was the visiting team's chance of
winning the game before the first pitch was thrown? Fifty percent, if we are
pitting two theoretical teams of equal or unknown ability on a neutral site.
If the first man fails to get on base, the chances of the visiting team
winning are reduced to 49.8 percent; should he hit a double, the visiting
team's chance of victory is raised to 55.9 percent, as determined by the
probabilistic simulation. Every possible situation--combining half inning,
score, men on base, and men out--was tested by the simulator to arrive at "Win
Points."
The Millses' purpose was to determine the clutch value of, say, hitting a
homer with two men on and one man out in the bottom of the ninth, with the
team trailing by two runs, the situation Bobby Thomson faced in the climactic
National League game of 1951--oddly, the rookie year of the first modern
computer. (It gained for him 1,472 Win Points; had it come with no one on in
the eighth inning of a game in which his team led 4-0, the homer would have
been worth only 12 Win Points.) What the Mills brothers were attempting to do
was to evaluate not only the what of a performance, which traditional
statistics indicate, but the when, or clutch factor, which no statistic to
that time could provide.
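In modern terms the Millses were computing what is now called win probability added. The sketch below shows the bookkeeping only, using the three probabilities quoted above and an arbitrary scaling of probability change into "points"; the actual Mills tables and scaling came from their simulation and are not reproduced here:

    # Visiting team's chance of winning in a few states (figures quoted above);
    # the Millses derived a full table from their computer simulation.
    WIN_PROB = {
        "start of game":               0.500,
        "top 1st, one out, none on":   0.498,
        "top 1st, none out, man on 2": 0.559,
    }

    def win_points(before, after, scale=1000):
        # Change in win probability, scaled into "points" (the scale is our assumption)
        return (WIN_PROB[after] - WIN_PROB[before]) * scale

    print(win_points("start of game", "top 1st, none out, man on 2"))   # about 59 points for a leadoff double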
This project, detailed in a small book issued in 1970 called Player Win
Averages, proceeded from the same impulse that led to other measures of clutch
performance: the game winning RBI, introduced as an official major league stat
in 1980 and scrapped in 1989; the measure of batting performance in
late-inning pressure situations first published by Seymour Siwoff, Steve
Hirdt, and Peter Hirdt of the Elias Sports Bureau in 1985; and the
historically complete indexes of clutch hitting and clutch pitching developed
for this book.
The other noteworthy baseball event of 1969 (besides the centennial of
professional baseball and the miracle of the Mets) was the adoption by the
major leagues of the save, the stat associated with the most significant
strategic development since the advent of the gopher ball. Now shown in the
papers on a daily basis, saves were not officially recorded at all until 1960;
it was at the instigation of Jerry Holtzman of the Chicago Sun-Times, with the
cooperation of The Sporting News, that this statistic was finally accepted.
(Pat McDonough, a founding member of SABR, had developed a similar stat in
1924, which he called "games finished by relief hurlers"; its first
appearance in print came in the New York Telegram three years later.) The need
for the save arose because relievers operated at a disadvantage when it came
to picking up wins. The bullpen specialists were a new breed, and as their
role increased, the need arose to identify excellence, as it had long ago for
batters, starting pitchers, and fielders. The save's prime statistical
drawback is that there is no negative to counteract the positive, no stat for
saves blown (except, all too often, a victory for the "fireman"); unofficial
attempts to develop such a stat have accelerated in recent years, and now are
part of the formula for the Fireman of the Year award.
August 10, 1971, marked another milestone, the founding in Cooperstown of
SABR, the group in whose annual publications most of today's sabermetricians
cut their analytical teeth. Its statistical analysis research committee,
headed for more than a decade by Pete Palmer, has served as a sounding board
for the inventive approaches of such men as Dallas Adams, Dick Cramer, Steve
Mann, Craig Wright, and Bill James.
James published The Baseball Abstract from his home in Lawrence, Kansas,
for five years to a minute if appreciative audience (its 1977 publication
budget: $112.73). In 1982 Ballantine Books, recognizing the increasing
sophistication of baseball fans in the computer age, assumed publication of
the Abstract, and the audience for sabermetrics became sizable indeed, with
James' annuals reaching the bestseller lists and his Historical Baseball
Abstract becoming an essential book for anyone who viewed himself as a serious
fan. James has popularized a different approach to the whole question of what
baseball statistics are for--that they are not brass knuckles to beat a
barroom adversary with, but a means of achieving a better understanding of the
game and heightening one's pleasure in it.
Among the many valuable analytical tools he has developed are the Brock-2
System of projecting career totals, the Victory Important RBI, Offensive and
Defensive Winning Percentages, Secondary Average, Range Factor, and Runs
Created. The last-mentioned, perhaps because James developed it earlier in his
career, is the most widely known, and we apply it in this book to all batters,
in all fourteen variations of the formula, bringing in data for stolen bases,
caught stealing, hit-by-pitch, and grounded into double play for those years
in which it is available. (See the Glossary for the formulas.)
The 1980s also brought attention to another attempt to redefine the
measure of individual performance. In 1978 Barry Codell of Chicago distributed
a paper describing his new statistic, the Base-Out Percentage, to fellow
statisticians and figures in the sports media. At about the same time, Tom
Boswell, not a statistician by trade or inclination but rather a sportswriter
for the Washington Post, developed a stat called Total Average. Like the
Base-Out Percentage, Total Average is a gauge of offensive proficiency which
takes into account not only batting but also base-running skills. (See the
Glossary.)
Dallas Adams and Dick Cramer devoted themselves in the late 1970s to a
discussion of average batting, pitching, and fielding skill, which more than a
decade later remains a subject of intense interest and passionate
disagreement. The question, roughly put, is: How would Cy Young do against the
batters of today? Or Wade Boggs against the pitchers of the 1890s? How many
homers would Babe Ruth hit if he were active today? Or how many strikeouts
would Nolan Ryan have registered in 1880, pitching from a fifty-foot distance?
In other words, how can we adjust the statistics of players to reflect the
certainty that the average batter, pitcher, and fielder have improved over
time, thus narrowing the gap between each succeeding era's peak performance
and its average one? (For more on this philosophically and mathematically
complex subject, we refer you to The Hidden Game of Baseball.)
Adams and Cramer advanced a discussion that had begun in 1976 with the
first article on cross-era comparison, in which David Shoebotham proposed a
new statistic called the Relative Batting Average. Shoebotham recognized that
a .320 batting average in 1893, when the National League batted .280, did not
represent the same level of accomplishment as that average did in 1968 when,
for a number of reasons, the National League batted a measly .243. His
solution? To normalize the players' averages to their respective league
averages simply by dividing the player's batting average by that of his
league.
In this fashion he demonstrated, for example, that Pete Rose, who led the
NL with a .335 BA in 1968, had a Relative BA of 1.38; while Ed Delahanty, who
led the NL with a BA of .380 in 1893, had a Relative BA of only 1.36. Another
way of stating this conclusion is that Rose's .335 was 38 percent above the
average batting performance in the NL of 1968, while Delahanty exceeded his
league's norm by 36 percent. The inferences that might be drawn from this
approach are many: that batting skill has not declined since the days of Ruth,
Gehrig, Foxx, et al., but that pitching skill might have increased; that no
batting average of the years around 1930 ought to be taken without a carload
of salt; that some of the most notable batting performances of all time, as
measured by the batting average, have occurred right under our noses,
unbeknownst to us.
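Shoebotham's normalization requires nothing more than a division; a minimal sketch reproducing the comparison above (the function name is ours):

    def relative_batting_average(player_ba, league_ba):
        return player_ba / league_ba

    print(round(relative_batting_average(.335, .243), 2))   # Rose, 1968 NL: 1.38
    print(round(relative_batting_average(.380, .280), 2))   # Delahanty, 1893 NL: 1.36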
Normalizing a statistic to its league average is a valuable analytical
tool if employed logically. A Relative Batting Average, for example, tells a
good deal more, and tells it more straightforwardly, than Relative Homers or
Relative Strikeouts. The relativist approach works better with ratios such as
batting average, on base percentage, or slugging average--or for that matter
with Runs Created or Total Average--than it does for simple counter stats.
Another worthwhile adjustment to various averages is for home-park
effects. The pioneering work in this area was done by Robert Kingsley,
particularly in regard to why homers flew out of Atlanta's park despite its
"normal" dimensions, but Pete Palmer was first to measure the effects of home
parks on run totals and then to devise a park adjustment for the records of
batters and pitchers. These were discussed in depth in The Hidden Game of
Baseball, and the data base for park factor in this book has been upgraded to
include runs scored and runs allowed at home parks instead of just the latter.
In 1984 the editors of this volume introduced, in The Hidden Game, the
Linear Weights System of assessing players' contributions to their teams--at
the bat, on the basepaths, in the field, or on the mound--in terms of runs,
which are the currency of the game. Its back-to-basics foundation is the same
as that underlying the Rickey formula of 1954 and most of the new statistics
developed since then: that wins and losses are what the game is about; that
wins and losses are proportional in some way to runs scored and runs allowed;
and that runs in turn are proportional to the events which go into their
making.
In the Linear Weights System, these events are expressed not in the
familiar yet deceptive ratios--base hits to at-bats, wins to decisions,
etc.--but in runs themselves, the runs contributed (by batting or base
stealing) or saved (by pitching or fielding). Computer simulations of over
100,000 games produced the run values of, for example: a single (.47 runs),
double (.78), triple (1.09), home run (1.40), walk (.33), steal (.30), caught
stealing (-.60), out (-.25), and out made on base (-.50). Using a
straightforward additive formula, one can calculate a batter or baserunner's
contribution to his team in runs. These would be expressed in terms of runs
contributed beyond what a league-average replacement player could contribute
in his stead, and that average is defined as a baseline of zero. A team
composed entirely of average performers would finish with a record of .500, as
the league must--so each above-average player contributes positive runs toward
a win, and each subpar player contributes negative runs.
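As a sketch of the additive bookkeeping, using the run values quoted above (the published formulas also build in the league-average normalization described in the next paragraph; see the Glossary):

    # Run values quoted above; the Glossary carries the full, league-adjusted formulas.
    RUN_VALUES = {"single": .47, "double": .78, "triple": 1.09, "homer": 1.40,
                  "walk": .33, "steal": .30, "caught_stealing": -.60, "out": -.25}

    def batting_runs(events):
        # events maps event names to counts, e.g. {"single": 120, "out": 400, ...};
        # the sum is the player's contribution in runs beyond a league-average player.
        return sum(RUN_VALUES[name] * count for name, count in events.items())

Dividing the result by roughly ten, as discussed below, converts runs above average into wins above average.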
Normalizing factors (to league average) are built into the formulas for
all but base stealing, where league average is not a shaping force; these
factors enable us to compute, for example, the number of runs (Batting Runs)
that Cecil Fielder provided in 1990 beyond those an average hitter might have
produced in an equivalent number of plate appearances. And by adjusting
Fielder's Batting Runs for Detroit's homepark influences, the Linear Weights
comparison may be extended to how many runs he accounted for beyond what an
average player might have produced in the same number of at-bats had he too
played half his games in Tiger Stadium.
Furthermore, having determined the number of runs above average required
to transform a loss to a win in the final standings (generally around ten,
historically in the range of nine to eleven; for more on the theory behind
this, see the Glossary), we can convert a player's Linear Weights
record--expressed as Batting Runs, Base Stealing Runs, Pitching Runs, or
Fielding Runs--to the number of wins above average he alone contributed. What
are individual statistics for if not to achieve some understanding of this?
Last, by reviewing the win contributions of all a team's personnel, we may
establish a solid assessment of that team's strength and weaknesses--either to
predict a team's chances for success in the upcoming season or, in an
encyclopedia like Total Baseball, to analyze how and even why it failed its
reasonable statistical expectations or exceeded them.
Formulas for the Linear Weights measures for batting, baserunning,
fielding, and pitching will be found in the Glossary.
Other developments of the decade include the previously mentioned
adoption of the game-winning RBI (GWRBI) in 1980; it credited the batter who
drove in a run to give his club a lead that it never relinquished. This stat
was pilloried in the press from its introduction, with merit, until Major
League Baseball finally gave up on it before the 1989 season. In 1984 on base
percentage was made official, thirty years after its introduction to the
general baseball public by Branch Rickey and Allan Roth. Subsequent years
brought the Quality Start, which takes note of a pitcher who gives his club
six innings or more while allowing three runs or less. Under this
construction, an ERA of 4.50 in a mercifully shortened outing is held to be
commendable. The editors of this book do not regard the Quality Start as a
quality stat.
More interesting are the situational stats which are the specialty of
the Elias Sports Bureau and Project Scoresheet--performance in day games vs.
night, grass vs. artificial turf, lefty vs. righty, day game following night,
bases-loaded situations, and so on. When the data is drawn from a large enough
sample, these stats can be provocative and meaningful; too often, however,
television announcers desperate to maintain conversation flow will burden
their listeners with something like, "Over the last two seasons, he's batted
.375 against this guy" (not bothering to add that the figure represents three
hits in eight times at bat). Situational stats are the wave of the future in
baseball, but are not yet of much use for reviewing the past--Elias has kept
them systematically only since 1975.
Total Baseball
The next major event in the history of baseball record keeping is Total
Baseball. Founded upon a unique historical database that Pete Palmer has
cultivated for decades--in the tradition of baseball archivists like S.C.
Thompson, Bradshaw Swales, Leonard Gettelson, and John Tattersall--Total
Baseball is the third-generation encyclopedia of the game. Just as the advent
of the Macmillan/ICI encyclopedia supplanted Turkin/Thompson, the standard
for two decades, Total Baseball has taken advantage of new technology and new
research, notably by members of the Society for American Baseball Research, to
present more accurate data than ever before, and more of it. There are, of
course, the traditional stats one would expect in a baseball reference work;
there are many of the new, more revealing stats discussed above; there are
stats never published before and developed now for this book. And as you have
seen in Part One, there is a recognition that baseball history and knowledge
reside not only in its numbers.
But returning to the statistics and records which make up this second
part of Total Baseball, here is a brief rundown of what's coming (full
descriptions will be found in the separate introduction to each section):
- The Annual Record: Season-by-season standings and records for all teams
since 1871, plus the top five league leaders in generally forty-eight
categories per season.
- The Rosters: A completely revised manager roster, courtesy of some
splendid research into the early years by SABR's Bob Tiemann and Richard
Topp; and the application to all managers of the "actual vs. expected
win" method introduced in the editors' Baseball Annual 1990 (with Eliot
Cohen); a definitive roster of the men in blue, compiled by expert Larry
Gerlach; a roster of coaches, never before compiled; a roster of club
owners and presidents; and a roster of all the black men who played
professionally in the years of segregated ball.
- The Player, Pitcher, and Relief Pitcher Registers: The heart of this
section of Total Baseball, presenting complete seasonal and lifetime
records for every major leaguer, with twenty-three stats for players,
twenty-five for pitchers, and ten for relievers.
- All-Time Leaders: The top one hundred lifetime and single-season
performers in 219 categories, including important conventional stats not
found in other encyclopedias and dozens of the sabermetric variety.
Now that the genealogy of the more significant records and record books
has been described, it's time to say a few words about the measures you'll
find in the main statistical sections of Total Baseball: the annual record and
the player/pitcher registers. We will not attempt to define the basic counting
stats such as games, at-bats, wins, losses, and so on; if these are puzzling
to you, you have picked up the wrong book.
Batting
Let's start with the batting statistics, and the first of these to
consider will be that venerable, uncannily durable fraud, the batting average.
(It consists simply of hits divided by at-bats.) We know as well as anyone
else that this monument just won't topple; the best that can be hoped is that
in time fans and officials will recognize it as a bit of nostalgia, a
throwback to the period of its invention when power counted for naught, bases
on balls were scarce, and no one wanted to place a statistical accomplishment
in historical context because there wasn't much history yet.
Time has given the batting average a powerful hold on the American
baseball public; everyone knows that a man who hits .300 is a good hitter
while one who hits .250 is not. Everyone knows that--no matter that it is not
true. You want to trade Lenny Dykstra for Kevin Mitchell? Willie McGee for
Will Clark? Batting average treats all hits in an egalitarian fashion. A
two-out bunt single in the ninth with no one on base and your team trailing by
six runs counts the same as Bobby Thomson's "shot heard 'round the world." And
what about a walk? Say you foul off four 3-2 pitches, then watch a close one
go by to take your base. Where's your credit for a neat bit of offensive work?
Not in this stat. And a .250 batting average may have represented a distinct
accomplishment in certain years, like 1968 when the American League mean was
.230. That .250 hitter stood in the same relation to an average hitter of his
season as a .282 hitter did in the American League in 1988--or a .329 hitter
in the National League of 1930! If .329 and .282 and .250 all mean roughly
the same thing, it raises questions about the value of the measure.
And yet, the batting champion each year is declared to be the one with
the highest batting average, and this will not soon change. And the Hall of
Fame is filled with .300 hitters who couldn't carry the pine tar of many who
will stay forever on the outside looking in. Knowledgeable fans have long
realized that the ability to reach base and to produce runs are not adequately
measured by batting average, and they have looked to other measures--for
example, the other two components of the Triple Crown, home runs and RBIs.
Still more sophisticated fans have looked to the slugging average or on base
percentage, and in the 1980s to various sabermetric measures.
The slugging average does acknowledge the role of the man whose talent is
for the long ball and who may, with management's blessing, be sacrificing bat
control and thus batting average in order to let 'er rip. (Slugging average is
the number of total bases divided by at-bats.) But the slugging average has
its problems, too. It declares that a double is worth two singles, that a
triple is worth one and a half doubles, and that a home run is worth four
singles. All of these proportions are intuitively pleasing, for they relate to
the number of bases touched on each hit, but in terms of the hits' value in
generating runs, the proportions are wrong. One home run in four at-bats is
not worth as much as four singles, for instance, in part because the total run
potential for the team of four singles is greater, and in part because the man
who hit the four singles did not also make three outs; yet the man who goes
one for four at the plate, that one being a homer, has the same slugging
percentage of 1.000 as a man who singles four times in four at-bats.
Moreover, it is possible to attain a high slugging average without being
a slugger. In other words, if you have a high batting average, you must have a
decent slugging average; it's difficult to hit .350 and have a slugging
percentage of only .400. Even a bunt single boosts not only your batting
average but also your slugging average. (The attempt to counteract this
problem is a statistic called Isolated Power, which divides only extra bases
by at-bats.) Other things the slugging average does not do: indicate how many
runs the hits produced; give credit for other offensive categories, such as
walks, hit-by-pitch, or steals; or permit the comparison of sluggers from
different eras. (If Jimmie Foxx had a slugging percentage of .749 in 1932 and
Mickey Mantle had one of .705 in 1957, was Foxx 7 percent superior? The answer
is no, and the reason lies in the higher slugging average of the AL in 1932.)
Well, how about on base percentage? (To calculate this stat, divide hits,
walks, and hit-by-pitch by at-bats, walks, hit-by-pitch, and sacrifice flies.)
On base percentage has the advantage of giving credit for walks and
hit-by-pitch, but it is an unweighted average and thus makes no distinction
between those two events and, say, a grand-slam homer. A fellow like Eddie
Yost, who, in some years when he hit under .250, drew nearly a walk a game,
gets his credit with this stat as does a Gene Tenace, one of those guys whose
statistical line looks puny without his walks. Similarly, players like Mickey
Rivers or Omar Moreno, leadoff hitters with a lot of speed, no power, and no
patience, are exposed by the OBP as distinctly marginal major leaguers, even
in years when their batting averages look respectable or excellent. In short,
on base percentage does tell you more about a man's ability to get on than
does the batting average, and thus is a better indicator of run generation,
but it's not enough by itself to separate the "good" hitters from the
"average" or "poor" ones.
Not by itself, no . . . but when you add it to slugging average, you
come up with a very powerful indicator of batting ability. These two
one-legged men, when joined together, make for a very sturdy tandem, the
infirmity of the one being almost exactly compensated by the power of the
other. The virtues of on base plus slugging, a combined stat called
Production, are that it is easily computed from officially issued stats and
that it is the most accurate of all the newer stats except those denominated
directly in runs. Its weaknesses are that because it is stated as the sum of
two averages, it is--like a batting average or earned run average or any other
average--a measure of the rate of success rather than the amount, and the fan
needs considerable context to know what it means. Is a Production mark of
.750 poor, average, or outstanding? (Answer: pretty good, because the league
average figure in recent years has been in the low .700s--although in the NL
of 1930 it exceeded .800.)
This second drawback may be eliminated in the same manner for all
averages: by normalizing, or adjusting, each individual performance to the
league average in that category for the year in which it took place. If a
batter's Production was .700 in a year when the league average was .700, he
performed at a rate of 100 (his Production divided by the league's, discarding
the decimal point for ease of expression). If his Production was .800, his
league-adjusted mark would be 114. The meaningfulness of that performance
might be further refined by adjusting it once more, to take into account the
run-producing characteristics of the man's home park: a batter whose home park
was a hitters' haven like Wrigley Field might have his Production adjusted
downward, while another playing half his games in the Astrodome might have his
adjusted upward. In Total Baseball, figures adjusted for league average and
park factor are denoted by "/A" following the raw figure, and the Park Factor
(PF) is expressed with a baseline of 100--a hitter's park might have a factor
of 110, a pitcher's park 90. In this third edition of Total Baseball, we state
Production in the Player Register only in its normalized, park-adjusted form,
here termed "Production+."
RBIs? Don't they indicate run production and clutch ability? Yes and no.
The RBI does tell you something about run-producing ability, but not enough:
it's a situation-dependent statistic, inextricably tied to factors which vary
wildly for individuals on the same team or on others (including, importantly,
the position of each player in the batting order). And the RBI makes no
distinction between being hit by a pitch to drive in the twelfth run of a game
that concludes 14-3 and, again for comparison, the Thomson blast. RBIs tell
how many runs a batter pushed across the plate, all right, but they don't tell
how many fewer he might have driven in had he batted eighth rather than
fourth, or how many more he might have driven in on a team that put more men
on base. They don't even tell how many more runs a batter might have driven in
if he had delivered a higher proportion of his hits with men on base.
The American League kept RBI Opportunities--men on base presented to each
batter--as an official stat for the first three weeks of 1918, then saw how
much work was involved and ditched it. The problem remains: how to assess run
productivity for batters. Pitchers are easier. Their accomplishments are
directly measured in runs allowed. But batters, baserunners, and fielders make
their contributions in the constituent parts of runs--outs, hits, and a
variety of more or less successful other events. (Even a batter who hits a
solo homer contributes more than one run to his team, because he permits
another player to bat who otherwise would not have, and each batter has a
potential for producing further runs.)
You hear a lot in the media about the value of Runs Produced, a stat we
track in Total Baseball in the top five section of the Annual Record. Runs
Produced is simply runs scored plus runs batted in, subtracting homers because
a dinger gives a batter "double credit"--a run scored plus an RBI. The editors
view Runs Produced as an odd linkage of one opportunity-dependent stat with
another that depends upon largely the same factors, but we offer the stat for
those who like that sort of thing.
And so we come to the recently formulated game-winning RBI (GWRBI)--a
noble attempt at describing the value of a hit to the team, its "clutchness,"
but a measure which was misconceived in its presumption that a game could be
won with a hit in the first inning. A man who drives in a run in the first
inning is simply doing his job, not performing an extraordinary feat; if the
pitcher makes that run hold up by throwing a shutout, bully for him, but why
credit the hitter? Were he to drive in the lone run of the game in the seventh
inning or later, that would be different. Nonetheless, the latest formulation
of the stat gave the man who drove in that first-inning run a GWRBI even if
his team eventually won 22-0, since it gave the team a lead that was never
relinquished.
Worse, the GWRBI was situation-dependent to an even greater degree than
the RBI. You can't play for a lousy team and lead the league in GWRBIs because
there aren't enough games won to go around. And it's even harder to accumulate
GWRBIs from the eighth place in the batting order than it is to accumulate
RBIs. Last, if you put your team ahead with an RBI in the bottom of the
eighth, why should you lose your GWRBI simply because the pitcher allows the
lead to be lost? Wasn't your hit "clutch"? If the pitcher allows the score to
be tied, a teammate might pick up the GWRBI that should have been safely
tucked away for you. Nicely motivated, the GWRBI, but utterly without merit,
and so we barred it from the first edition of Total Baseball, even though at
the time we prepared that book it was still an official Major League Baseball
stat. It is no longer.
We do, however, present a measure called Clutch Hitting Index, which
addresses the problem of run-producing opportunities on a historical basis. We
offer this with several reservations, including the classic philosophical one
about whether clutch ability exists at all. Is a man who hits .280 with men on
base and .240 with the sacks clear a hero in the former situation or a bum in
the latter? The Clutch Hitting Index measures actual RBIs over expected RBIs,
which have been calculated on the basis of a man's extra-base hits and the
opportunities he could have been expected to have, based on the average RBIs
per league and where he batted in the lineup and who batted above him. This
is, by admission, a rough measure indeed, but we think it's an interesting
one. We included it in the Player Register in the first edition; since then we
have confined the stat to the Annual Record and Leaders Sections. For teams,
the measure of clutch hitting is more elegant: the ratio of its actual runs to
its runs as calculated by the Linear Weights method.
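For those who want to see the shape of the calculation, a minimal sketch in
Python of the two clutch ratios just described. How expected RBIs are
estimated--from extra-base hits, lineup slot, and league RBI rates--is
detailed in the Glossary; here they are simply taken as given, and the sample
figures are invented.

    # Clutch Hitting Index: actual RBIs over expected RBIs, scaled to 100.
    def clutch_hitting_index(actual_rbi, expected_rbi):
        return round(100 * actual_rbi / expected_rbi)

    # Team clutch: actual runs over runs predicted by Linear Weights.
    def team_clutch_index(actual_runs, linear_weights_runs):
        return round(100 * actual_runs / linear_weights_runs)

    print(clutch_hitting_index(95, 88))   # 108: more RBIs than his chances suggest
    print(team_clutch_index(740, 705))    # 105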
Previously discussed were Runs Created, Total Average, and Batting Runs.
Total Average numbers will tend to look like those of Production, which
measures largely the same things only in a different manner; for this reason
we have removed it from the Player Register in this edition while retaining it
in the Annual Record and Leaders. The numerical expression of Runs Created
exceeds that of Batting Runs because its baseline of zero defines the
worst player in the league rather than the average one.
Base Stealing
Many fans understand, as a result of sabermetric findings of the 1980s,
that a man with a lot of stolen bases is not necessarily the best baserunner,
nor even an asset to his team; he might have been caught nearly as often as he
stole and thus may have cost his team many runs on balance. The game's
encyclopedic reference works have in years past contained stolen base totals,
even if the tabulations for the early years were suspect because of unclear
standards for what differentiated a steal from clever baserunning. What they
have not offered is the flip side of the steal--the caught-stealing numbers
that make sense of the steal itself.
As mentioned above, caught stealing was recorded officially in the AL
beginning in 1920, then was dropped for 1927, was resumed in 1928, and has
been continuously in use ever since. In the NL, it was computed for 1920-1925,
then was dropped until 1951, when it resumed on a continuous basis. We have
figures kept by Ernie Lanigan for the years 1914-1916 in the AL and for
1915-1916 in the NL, and in the second edition we have added caught-stealing
data for 1927 from newspaper accounts (about 90 percent complete). In Total
Baseball we present, for those years in which the data exists, the raw CS
data, Stolen Base Averages, and Stolen Base Runs. This last is expressed in
runs, based on the computer-derived value of .30 runs for a stolen base and
-.60 runs for a thwarted steal. To make a positive contribution to his team, a
base thief must be successful in more than two-thirds of his attempts.
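A minimal sketch in Python of Stolen Base Runs as described above; the sample
base stealers are invented.

    # Stolen Base Runs: +.30 runs per steal, -.60 per caught stealing,
    # which puts the break-even success rate at two-thirds.
    def stolen_base_runs(sb, cs):
        return 0.30 * sb - 0.60 * cs

    print(round(stolen_base_runs(50, 25), 1))   #  0.0 -- exactly break-even
    print(round(stolen_base_runs(60, 15), 1))   #  9.0 -- a genuinely valuable thief
    print(round(stolen_base_runs(40, 30), 1))   # -6.0 -- many steals, negative value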
Fielding
When, back in 1954, Rickey and Roth came up with their "efficiency
formula" for run scoring and run prevention, the defensive half of the
equation was divided into five segments. The first was opponents' batting
average; the second was opponents' reaching base through bases on balls or hit
batsmen; the third was a measure of a pitcher's clutch ability; the fourth was
his strikeout capability; and the fifth was fielding, to which they assigned a
mathematical value of zero. "There is nothing on earth," Rickey declared,
"anyone can do with fielding." Besides, he added, good fielding might account
for the critical run in a ballgame only four or five times a year.
Was Rickey right? The central weakness of the fielding average has long
been known: you can't make an error on a ball you don't touch. To counter this
weakness in fielding average and to credit the plays made as well as the plays
not made, total chances per game is a more useful statistic--and when errors
are deducted from chances, you have a fielder's Range Factor. James pointed
out how absurd it had become, in a time when the best-fielding second baseman
might commit ten errors a season and the worst twenty, to focus on this
difference of ten rather than on the 250-300 in total chances which might
separate the most agile keystoner from the exemplar of Lot's wife.
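The contrast James drew can be put in a few lines of Python; the two second
basemen below are invented, with identical error totals but very different
reach.

    # Fielding average versus Range Factor, per the definitions above.
    def fielding_average(po, a, e):
        return (po + a) / (po + a + e)

    def range_factor(po, a, g):
        return (po + a) / g            # successful chances per game

    print(round(fielding_average(300, 450, 10), 3),
          round(range_factor(300, 450, 150), 2))    # 0.987 5.0
    print(round(fielding_average(260, 390, 10), 3),
          round(range_factor(260, 390, 150), 2))    # 0.985 4.33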
Another difficulty with the fielding average is that to understand what
figure represents mean performance (and thus be able to identify inferior and
superior fielders), one must adjust for position: a shortstop who fields
.980 has done quite well, but a first baseman, catcher, or outfielder with
that figure would have been below average. Thus the fan must bring to the
fielding average a great deal of background knowledge--the mean fielding
average for each position in each season. This is a demand that, on
first reflection, is not created by the batting average (all men stepping to
the plate occupy the same position--batter). On second thought, however, the
knowledgeable fan recognizes that a batting line of .267, 10 HRs, 80 RBIs will
mean different things when applied to a shortstop or to a left fielder. In
other words, just as any evaluation of fielding performance carries an
inherent positional bias, so does batting performance.
High double-play totals are believed to indicate excellence among middle
infielders, but the more double plays a club turns, as a rule, the worse the
pitching. Which teams had the most double plays in major league history? In
the 154-game season, the Philadelphia A's of 1949 and the Los Angeles Dodgers
of 1958; in the 162-game season, Toronto and Boston of 1980 and Pittsburgh of
1966. Of these, only the last-mentioned had a team ERA better than the league
average. If the pitchers are putting a lot of men on base, the team can get a
lot of double plays even without a great-fielding shortstop and second
baseman.
So what to do? How do we assess fielding excellence? The idea of
crediting stellar fielding plays individually has been proposed occasionally
ever since 1868, when Chadwick wrote: "The best player in a nine is he who
makes the most good plays in a match, not the one who commits the fewest
errors, and it is in the record of his good plays that we are to look for the
most correct data for an estimate of his skill in the position he occupies."
Father Chadwick was correct to see that fielding percentage emphasized failure
rather than success, but in truth the fielding percentage was a far better
measure of ability in the 1860s, when one play in four produced an error, than
now, when only two plays in a hundred are flubbed.
The choice in Total Baseball has been to concentrate on Total Chances but
not to disregard the error, as Range Factor does; nor to include it in Total
Chances, as David Neft would favor; nor to subtract it from Total Chances, as
Barry Codell once advocated. The error may be infrequent today but it is not
insignificant; instead, it is a peculiarly damaging event, turning an out
(with its computer-derived run value of -.25) into, in effect, a hit (with its
run value of +.50). This is a turnaround of .75 runs, or the equivalent of
three outs; an outfield error costs even more, because it so often produces
more than one base for batter and runners both. Thus the defensive stats we
favor in this book and include in the Player Register and Pitcher Register are
Linear Weights formulas, expressed in runs and computed differently for the
different positions (see the Glossary for the formulas, which have a
significant refinement in this edition). However, in all cases the elements of
the statistics are putouts, assists, double plays, and errors (and for
catchers, passed balls).
Position players are gauged by Fielding Runs, a Linear Weights measure of
the runs they saved (or allowed) through their play that an average man at
that position would not have (second basemen are compared with other second
basemen rather than, say, with left fielders--even the worst-fielding second
sacker would cost his team fewer runs at the position than the best defensive
left fielder). Pitcher Defense (like Pitcher Batting) is to be found in the
Pitcher Register. Raw fielding statistics, of questionable value in and of
themselves, have been excluded from this third edition of Total Baseball. An
innovation this time around is the placement of the fielding average in the
Player Register. It is computed for the position at which a man played the
most games.
Pitching
On to the pitching statistics you will see in the Annual Record and
Pitcher Register. First to be reviewed are wins and losses, and won-lost
percentage. Wins are a team statistic, obviously, as are losses, but we credit
a win entirely to one pitcher in each game. Why not to the shortstop? Or the
left fielder? Or some combination of the three? In a 13-11 game, several
players may have had more to do with the win than any pitcher. No matter.
We're not going to change this custom, though Ban Johnson gave it a good try
when he banished it from the American League records for seven years beginning
in 1913.
To win many games a pitcher generally must play for a team that wins many
games. Look at Red Ruffing's won-lost record with the miserable Red Sox of the
1930s, then at his mark with the Yankees. Or at Danny Jackson, first with
Kansas City, then with Cincinnati. There is an endless list of good pitchers
traded to stronger offensive clubs who "emerge" as stars.
The recognition of the weakness of this statistic came early. Originally
it was not computed by such men as Chadwick because most teams leaned heavily,
if not exclusively, on one starter, and relievers as we know them today did
not exist. As the season schedules lengthened, the need for a pitching staff
became evident, and separating out the team's record on the basis of who was
in the box seemed a good idea. However, it was not then nor is it now a good
measure of performance, for the simple reason that one may pitch poorly and
win, or pitch well and lose.
The natural corrective to this deficiency of the won-lost percentage is
the earned run average--which, strangely, preceded it, gave way to it in the
1880s, and then returned in 1912. Originally, the ERA was computed as earned
runs per game because pitchers almost invariably went nine innings. In this
century it has been calculated as earned runs times nine, divided by innings
pitched.
The purpose of the earned run average is noble: to give a pitcher credit
for doing what he can to prevent runs from scoring, aside from his own
fielding lapses and those of the men around him. It succeeds to a remarkable
extent in isolating the performance of the pitcher from his situation, but
objections to the statistic remain. Say a pitcher retires the first two men in
an inning, then has the shortstop kick a ground ball to allow the batter to
reach first base. Six runs follow before the third out is secured. How many of
these runs are earned? None.
The prime difficulty with the ERA in the early days, say 1913, when one
of every four runs scored was unearned, was that a pitcher got a lot of credit
in his ERA for playing with a bad defensive club. The errors would serve to
cover up in the ERA a good many runs which probably should not have scored.
Those runs would hurt the team, but not the pitcher's record. This situation
has been aggravated further by the use of newly computed ERAs for pitchers
prior to 1913, the first year of its official status. Example: Bobby Mathews,
sole pitcher for the New York Mutuals of 1876, allowed 7.19 runs per game, yet
his ERA was only 2.86--almost a perfect illustration of the league's 40
percent proportion of earned runs.
It is not an accident that pitchers of the dead-ball era of this century
(1900-1919) dominate the lifetime and seasonal leaders tables in ERA. Yes,
there were circumstances away from the mound that depressed batting, but the
pitchers of that period also benefited mightily in the ERA column from the
high number of errors, as compared to today. How to compare the ERA of an Ed
Walsh or Three Finger Brown with a Frank Viola or a Dwight Gooden? As with
batting stats, normalize the ERA to league average and adjust for home park
effects. A pitcher from 1908 whose Adjusted ERA was 150 can be compared to one
from 1988 with the same Adjusted ERA--each stood in the same relation to his
peers, that is, 50 percent better than average.
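A minimal sketch in Python of the ERA calculation and of the normalization
just described; park adjustment is omitted here, and the full Adjusted ERA
computation is given in the Glossary. The sample figures are invented.

    # ERA: earned runs times nine, divided by innings pitched.
    def era(earned_runs, innings):
        return 9 * earned_runs / innings

    # Adjusted ERA: the league ERA over the pitcher's, scaled so 100 is average.
    def adjusted_era(pitcher_era, league_era):
        return round(100 * league_era / pitcher_era)

    # A 1.60 ERA in a 2.40 league and a 2.50 ERA in a 3.75 league both come
    # out to 150 -- each pitcher 50 percent better than his peers.
    print(adjusted_era(1.60, 2.40), adjusted_era(2.50, 3.75))   # 150 150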
What gave rise to the ERA, and what we appreciate about it, is that like
the batting average it is an attempt at an isolating stat, a measure of
individual performance not dependent upon one's own team. Its principal
shortcoming is that it indicates only a pitcher's rate of efficiency, not his
actual benefit to the team. In a league with an ERA of 4.00, a starter who
throws 300 innings with an ERA of 3.50 must be worth more to his team than a
starter whose ERA is the same but who pitches in only half as many innings.
Through the Linear Weights figures of Pitching Runs (broken out in the
top-five section of the Annual Record as Starter Runs and Relief Runs), we can
determine the number of runs a pitcher saved his team beyond what a pitcher
performing at the league-average ERA would have allowed. A truly simple stat,
it consists of nothing more than a pitcher's normalized, or league-adjusted,
earned run average weighted by his innings pitched.
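One formulation consistent with that description, sketched in Python (any
further refinements belong to the Glossary; the figures are invented):

    # Pitching Runs: innings pitched times the gap between the league ERA and
    # the pitcher's own, divided by nine.
    def pitching_runs(innings, pitcher_era, league_era):
        return innings * (league_era - pitcher_era) / 9

    # 300 innings at 3.50 in a 4.00 league, versus 150 innings at the same ERA:
    print(round(pitching_runs(300, 3.50, 4.00), 1))   # 16.7 runs saved
    print(round(pitching_runs(150, 3.50, 4.00), 1))   #  8.3 runs saved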
Because Pitching Runs has a built-in normalizing factor, when you see it
in Total Baseball under a heading for "/A," that adjustment will be for park
factor. Pitchers' park factor is calculated differently from batters' park
factor, for a number of fairly complex reasons that technical-minded readers
might best consult in the Glossary.
While the ERA is a far more accurate reflection of a pitcher's value than
the BA is of a hitter's, it fails to a greater degree than the BA in offering
an isolated measure. For a truly unalloyed individual pitching measure, we
must look to the glamour statistic of strikeouts, the pitcher's mate to the
home run (though home runs are highly dependent upon home park, strikeouts are
to only a slight degree).
Is a strikeout artist a good pitcher? Maybe yes, maybe no; a good
analogue would be to ask whether a home run slugger is a good hitter. The two
stats run together: periods of high home run activity (as a percentage of all
hits) invariably are accompanied by high strikeout totals. Strikeout totals,
however, may soar even in the absence of overzealous swingers, say, as the
result of a rules change such as the legalization of overhand pitching in
1884, the introduction of the foul strike (NL, 1901; AL, 1903), or the
expanded strike zone in 1963.
Just as home run totals are a function of the era in which one plays, so
are strikeouts. The great nineteenth-century totals--Matches Kilroy's 513,
Toad Ramsey's 499, One Arm Daily's 483--were achieved under different rules
and fashions. No one in that era fanned batters at the rate of one per inning;
indeed, among regular pitchers (those with 154 innings pitched or more), only
Herb Score did until 1960. In the next five years the barrier was passed by
Sandy Koufax, Jim Maloney, Bob Veale, Sam McDowell, and Sonny Siebert. Walter
Johnson, Rube Waddell, and Bob Feller didn't run up numbers like that. Were
they slower, or easier to hit, than Sonny Siebert?
Even in today's game, which lends itself to the accumulation of, by
historic standards, high strikeout totals for a good many pitchers and
batters, the strikeout is, as it always has been, just another way to make an
out. Yes, it is a sure way to register an out without the risk of advancing
baserunners and so is highly useful in a situation such as when there is a man
on third with fewer than two outs; otherwise, it is a vastly overrated stat
because it has nothing to do with victory or defeat--it is mere spectacle. A
high strikeout total indicates raw talent and overpowering stuff, but the
imperative of the pitcher is simply to retire the batter, not to crush him.
Strikeouts by batters are not listed in your daily averages--fans are less
interested because it is a negative measure--yet the strikeout may be a more
significant stat for batters than it is for pitchers.
Bases on balls will drive a manager crazy and put lead in fielders' feet,
but it is possible to survive, even to excel, without first-rate
control--provided your stuff is good enough to hold down the number of hits.
Total Baseball offers two stats that are, like strikeouts, highly interesting
but ultimately of debatable value: Opponents' Batting Average and Opponents'
On Base Percentage. (The same could be said of Fewest Hits Per Game and Fewest
Walks Per Game, of course.) It is illuminating to compare one or the other
with a pitcher's ERA or Pitching Runs, but both calculations are somewhat
academic, for at the end of a game, season, or career, it doesn't matter how
many men a pitcher puts on base. Theoretically he can put three men on base
every inning, strand all twenty-seven of them, and pitch a shutout. A man
who gives up one hit over nine innings can lose 1-0; it's even possible to
allow no hits and lose. Who is the better pitcher? The man with the shutout
and twenty-seven baserunners allowed, or the man who allows one hit? No matter
how sophisticated your measurements for pitchers, the best ones are counted in
runs.
The nature of baseball at all points is one man against nine. It's the
pitcher against a series of batters. With that situation prevailing, we have
tended to examine batting with intricate, ingenious stats, while viewing
pitching through generally much weaker, though perhaps more copious,
measurements. What if the game were to be turned around so that we had a
"pitching order"--nine pitchers facing one batter? Think of that for a minute.
The nature of the statistics would change, too, so that your batting stats
would be vastly simplified. You wouldn't care about all the individual
components of the batter's performance, all combining in some obscure fashion
to reveal run production. You'd care only about runs. Yet what each of the
nine pitchers did would bear intense scrutiny, and over the course of a year
each pitcher's Opponents' Batting Average, Opponents' On Base Percentage,
Opponents' Slugging Average, and so forth, would be recorded and spun to come
up with a sense of how many runs each pitcher had saved.
A pitching stat with an interesting history is complete games. This is
your basic counter stat, but it's taken to mean more than most of those
measurements by baseball people and knowledgeable fans. When everyone was
completing 90-100 percent of his starts, the stat was without meaning and thus
was not kept. As relief pitchers crept into the game after 1905, the
percentage of completed games declined rapidly. By the 1920s it became a point
of honor to complete three quarters of one's starts; today the man who
completes half is quite likely to lead his league. So with these shifting
standards, what do CGs tell you? About pitchers, not a lot anymore; about
managers and bullpens, a great deal.
Can we say that a pitcher with 18 complete games out of 37 starts is
better than one with 12 complete games in 35 starts? Not without a lot of
supporting help, we can't, not without a store of knowledge about the
individuals, the teams, and especially the eras involved. The more uses to
which we attempt to put the stat, the weaker it becomes, the more attenuated
its force. If we declare the hurler with 18 CGs "better," how are we to
compare him with another pitcher from, say, fifty years earlier who completed
27 out of 30 starts? Or another pitcher of eighty years ago who completed all
the games he started? (Jack W. Taylor completed every one of the 187 games he
started over five years.) Or what about Will White, who in 1880 started 75
games and completed every blessed one of them? But the rules were different,
you say, or the ball was less resilient, or they pitched from a different
distance, with a different motion, or this, or that. The point is, there are
limits to what a traditional, unadjusted baseball statistic can tell you about
a player's performance in any given year, let alone compare his efforts to
those of a player from a different era.
Of shutouts there is little to say that is not perfectly obvious, except
that historical totals have been revised because (a) in 1920-1939 the American
League did not count games of less than nine innings as shutouts, and (b) in
those years and before, in both leagues, a pitcher was credited with a shutout
even if he was pulled midway, if he had pitched enough innings of a combined
whitewash. Total Baseball counts only complete-game shutouts.
Wins Above Team is, as discussed, a variation of Ted Oliver's stat made
public in 1944, which he called the Weighted Rating System. Apart from
modifying his math, we have taken Oliver's "points"--the thousands of points
his formula gave to hurlers who performed well for poor teams--and by
retaining the decimal that he would have discarded, we have come up with a
stat that is expressed quite properly in wins. In this edition, Wins Above
Team is recorded only for the top hundred lifetime and season marks.
Newly developed for the first edition of this book was a Clutch Pitching
Index that, like the measure for clutch hitting, could be applied to
historical data. The CPI is figured by taking how many earned runs the pitcher
should have allowed, based on the performance of the batters who faced him,
and how many he actually allowed (see the Glossary for the formula). The
Clutch Pitching Index consists of expected runs over actual runs, so marks
over 100 exceed league-average performance.
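Sketched in Python, with the expected earned runs taken as given (the formula
is in the Glossary) and the sample figures invented:

    # Clutch Pitching Index: expected earned runs over actual earned runs,
    # scaled so that 100 is league-average performance.
    def clutch_pitching_index(expected_er, actual_er):
        return round(100 * expected_er / actual_er)

    print(clutch_pitching_index(80, 72))   # 111: allowed fewer runs than expected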
For relief pitchers, we have previously discussed saves, and Relief Runs
(the Linear Weights category) are figured no differently than Starter Runs.
Newly developed here is Relief Ranking, which adjusts Relief Runs for the
greater situational importance of each run a bullpenner saves or yields. The
other elements of the formula are wins, losses, and saves, in a proportion
detailed in the Glossary. Games, innings, and ERA in relief are broken out in
the new Relief Pitcher Register.
Bringing It All Together
Pitcher Batting and Pitcher Defense are recorded in the Pitcher Register
as Linear Weights figures, expressed in runs. (Pitcher batting has been
removed from league stats for such computations, so that the batting records
of everyday players are compared only with those of their peers and pitchers'
batting records are compared only with those of their peers.) For this
edition we have added to Pitcher Batting Runs the seasonal totals for hits and
the pitchers' batting averages. The totals for Pitcher Batting Runs and
Fielding Runs are seldom of a great magnitude--and for AL pitchers since 1973,
the batting figure is, of course, zero--but in earlier years a pitcher's
ability to help himself and his team off the mound has occasionally counted
for a great deal in a given season; spitballer Ed Walsh in 1907, the year
before he won 40 games for the White Sox, accounted for an astounding 2.3
Fielding Wins. The hitting ability of a Wes Ferrell or Don Drysdale certainly
counted for something in their teams' prospects for victory. In Total Baseball
a pitcher's overall contribution is reflected in the Total Pitcher Index,
converted from Runs above average to Wins, based on the Runs required to
create an extra Win in that year.
For everyday position players, add Fielding Runs to Stolen Base Runs to
Batting Runs, then convert those combined Runs to Wins, and you have the best
measure of the complete ballplayer: the Total Player Rating. We believe,
however, that a positional adjustment must be made to the above combination to
reflect the greater skill required to play, for example, second base than left
field; this adjustment is based on the average batting skill required at that
position to hold a major league job. Historically, left fielders have
presented the best record in Batting Runs and middle infielders the worst. In
other words, a left fielder who accounted for 10 Fielding Runs should not be
regarded as having the same value to a team as a shortstop who also
contributed 10 Fielding Runs: Have the two men switch positions and you would
soon see who made more of a defensive contribution. And because some
positions--shortstop, catcher, second base, and third base--are harder to play
than others, we see a relative scarcity of good hitters at these positions and
an abundance at the others. Again, see the Glossary for more detail.
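A minimal sketch in Python of the Total Player Rating arithmetic described
above. The positional adjustment and the runs-per-win divisor--roughly ten
runs, though it is recalculated for each season's run environment--come from
the Glossary; the values below are invented for illustration.

    # Total Player Rating: batting, fielding, and base-stealing runs plus a
    # positional adjustment, converted from runs to wins above average.
    def total_player_rating(batting_runs, fielding_runs, stolen_base_runs,
                            positional_adjustment, runs_per_win=10.0):
        total_runs = (batting_runs + fielding_runs + stolen_base_runs
                      + positional_adjustment)
        return total_runs / runs_per_win

    # A shortstop: modest bat, fine glove, and a positional credit.
    print(round(total_player_rating(5, 12, 2, 8), 1))    # 2.7 wins
    # A left fielder with the same raw numbers but a positional debit.
    print(round(total_player_rating(5, 12, 2, -8), 1))   # 1.1 wins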
The ultimate stat brings together batters, pitchers, fielders, and
baserunners in the Total Baseball Ranking. The equivalent of a Most Valuable
Player Award and Cy Young Award wrapped into one, it reveals the best baseball
player every season and the best ever. Relief pitchers and shortstops can
compete on the same plane--wins contributed to their team through all their
accomplishments. In 1978 the MVP question in the American League was whether
to vote for Jim Rice, who had 46 homers, 139 RBIs, and 400 total bases (the
first time for an American Leaguer in forty-one years), or for Ron Guidry,
25-3 with an .893 won-lost percentage that was the all-time high for a starter
with 20 or more wins, and whose ERA of 1.74 was less than half the league
average. Why don't you flip to the page in the Annual Record for 1978 and see
for yourself who deserved the MVP Award that year?
Total Baseball also sums things up on the team level. Fielding Runs are
expressed as Wins in the team stats section of the Annual Record, as are
Batting Runs, Stolen Base Runs, and Pitching Runs. This enables one to see the
component parts of a team's predicted success or failure--that is, the wins or
losses beyond the average (a .500 season) that the players' performance could
have been expected to produce. The Differential figure (DIF) in this section
of the Annual Record states the spread between the team's actual won-lost
record and that predicted by the Linear Weights measures of batting, base
stealing, fielding, and pitching. The miracle Mets of 1969 exceeded
expectations by 13 Wins--in other words, instead of finishing 87-75 as their
players' performance would have warranted, they finished 100-62. Did the Blue
Jays also do it with mirrors in 1992? Check for yourself.
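The Differential is simple arithmetic; a two-line sketch in Python using the
1969 Mets figures from the text:

    # DIF: actual wins minus the wins predicted by the Linear Weights parts.
    predicted_wins, actual_wins = 87, 100
    print(actual_wins - predicted_wins)   # 13 -- the "miracle" in a single number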
Errors and Controversies
The data ICI reported in the first edition of The Baseball Encyclopedia
upset many people in baseball, for their numbers were different from those
traditionally accepted; however, their changes were responsible ones, the
product of new research that corrected errors of long standing, or in response
to the rulings of the Special Baseball Records Committee. For example, much of
the statistical information on Hall of Fame plaques was rendered obsolete. The
result has been that through the ensuing editions, the offending data has been
fudged to bring it into line with tradition--more on this in a moment.
Despite the uproar that greeted ICI's revised numbers, 1969 was hardly
the first time corrections had been made to official data. In 1929 Grover
Cleveland Alexander won his 373rd game, breaking Christy Mathewson's National
League record, then thought to be 372. He never won another game. A number of
years later, Joe Reichler found a game, played on May 21, 1902, in which, by
today's rules, Matty should have gotten the win. The record was changed and
the two pitchers were given a tie. The problem was that no one checked all
of Mathewson's other games to see how many times he received a win under the
old rules that wouldn't have been credited that way today. When ICI did their
research in 1968, they found Matty had only 367 wins total by today's rules,
while Alexander had 374. (Further research, notably by Frank Williams, has
restored Alexander and Mathewson to a tie at 373 wins.) The Records Committee
decided that all wins and losses should be awarded according to the present
rules, so Macmillan printed the totals as such in the Baseball Encyclopedia.
However, after the book came out, Commissioner Bowie Kuhn decided that it was
better to show stats that agreed with previously published recognized sources,
so all records--not only those of Mathewson and Alexander--were supposed to be
changed back in accordance with the scoring practices at the time. What
happened in the next edition was that some records, especially those of the
stars, were changed, while others were not; team totals and the records of
other players on the same team were not; and the data base was corrupted.
Here's another celebrated example of record-book flip-flops. When the
American League was formed in 1901, Nap Lajoie was credited with a .422
average, with 220 hits in 543 at-bats. After a number of years, someone
noticed that if you take these at-bats and hits, the average comes out only to
.405, so his average was changed. (Turkin/Thompson gave Nap a mark of .409 in
its first edition.) Later in the 1950s, John Tattersall had his doubts and
decided to go through his newspaper collection of box scores. He found 229
hits for Lajoie, not 220--the error had been in the figure for hits, not in
the figure for batting average. Thus his average was restored to .422, which
happened to be the highest in American League history. Then ICI research in
this area came up with a .426 mark (232 for 544, based on newspaper accounts),
which was published in the first edition, then trimmed back to .422 in
subsequent editions. The .426 figure is the one this book uses.
Nap seemed to be involved in a number of controversies. ICI research
found four more hits for him in 1902, raising his average from .369 to .378.
Later editions have changed Lajoie's stats back to the old values; we have
not.
In 1910 there was a very close batting race between Cobb and Lajoie. At
the end of the season, most people thought Nap had won, based on his getting
seven hits in a doubleheader on the final day of the season. There was talk
that the opposing Browns had let him get a number of bunts by playing back, so
that the hated Cobb would lose. However, the AL office went over their figures
and gave Cobb the title, .385 to .384. Nearly eighty years later, Pete Palmer
discovered a critical error: a game in which Cobb had two hits in three
at-bats had been entered twice. This was found because Sam Crawford had 14
games on his official sheet for the homestand yet the Tigers only played 13.
It turned out that Detroit played a doubleheader on September 24, but the
second game inadvertently was inserted in the official sheets as being played
on September 25. Later, this second game of the twenty-fourth, which appeared
to have been missing, was put in the scoresheets again. The League Office
discovered this mistake soon after its official announcement that Cobb had won
the batting title, because the double entry was corrected for all the other
Detroit players. However, Ban Johnson had made a big deal out of how carefully
his people had checked the figures in order to settle the controversy, so they
kept quiet about the gaffe, leaving Cobb the winner.
Appeals to Commissioner Kuhn in 1981 to set the matter straight
officially were to no avail, because that would not only have changed the
outcome of the 1910 batting race, it would also have altered Cobb's lifetime
hit total, then being pursued to massive media attention by Pete Rose. Kuhn's
statement read, in part, "The passage of 70 years, in our judgment . . .
constitutes a certain statute of limitation as to recognizing any changes in
the records with confidence of the accuracy of such changes. . . . Since a
variety of questions have been raised through the years about the accuracy of
the statistics of that period, the only way to make changes with confidence
would be for a complete and thorough review of all team and individual
statistics. That is not practical."  It may not have been practical, but
we have done it, and are continuing to do it. A notable area of change
reflected in this edition of Total Baseball is the National Association period
of 1871-1875, in which the research of Michael Stagno and a team of SABR
researchers has supplied not only new, more accurate statistics but also a
handful of new players, previously not included in any baseball encyclopedia.
In 1912, Heinie Zimmerman got credit for a Triple Crown victory, although
it wasn't called that then. Ernie Lanigan's RBI figures gave him 98, compared
to 94 for Honus Wagner. However, ICI research gave Wagner 102 and Zimmerman
99. Later editions of The Baseball Encyclopedia raised Zimmerman's total to
103--giving him back his phony Triple Crown.
The National League batting data has been pretty accurate since 1910.
That was the first year the NL kept daily game records for teams as well as
for players, compared the team totals to the sum of the players' figures, and
tried to resolve any differences. Before then, the team totals simply
were the sum for the players. The American League had team totals all the way
back to 1905, but never compared them with the sum of the players and
therefore had a great many errors. The AL, however, did introduce team
pitching first in 1930, while the NL followed in 1941. The AL never published
league totals, so the fact that the batters' hits, strikeouts, walks, and so
on did
not agree with the corresponding pitcher totals was somewhat academic.
However, the NL did publish league totals starting in 1926, and when they
first presented team pitching, the totals did not agree. In order to make this
look correct, they doctored the pitching totals to agree with the batting
stats. After a few years, this was no longer necessary, as they took the time
to resolve and correct differences. For the AL, most team totals did not agree
with the sum of the players for that team until around 1935: the at-bats,
runs, hits, and extra-base hits usually checked out, but walks and strikeouts
did not add up until the 1960s. The AL converted to computerized record
keeping in 1973 and the NL in 1981, improving accuracy.
On the whole there have been surprisingly few errors in the National
League stats. Most of the bigger ones have involved innings pitched in the
years before 1930. Because no one added up the innings and compared them to
putouts to check for discrepancies, in 1926 Wayland Dean brought up the rear
in ERA with a 6.10 mark. It turned out that his innings pitched had been added
up incorrectly, and he should have had 204, not 164. This reduced his ERA to
4.90. For a game in 1920, Jimmy Ring had his faced batsman total of 35 put in
the innings pitched column, giving him 26 extra innings pitched for the game.
It would seem that someone adding up innings pitched would question a figure
of 35 for one game, but it slipped through. Ring was also credited for nine
extra innings in 1923. But the strangest mix-up in the NL was in 1909, the
year before the team totals were kept. For some strange reason, 700 putouts
were dropped from the team totals, all the result of adding mistakes for
catchers. Pat Moran and Red Dooin each lost 200, while Peaches Graham, Bill
Bergen, and Doc Marshall lost 100 each.
The American League has had many errors of 100 or more putouts or assists
over the years due to addition mistakes, as well as quite a few blunders in
innings pitched. Ed Willett in 1910 lost 77 innings, showing only 147 instead
of 224. The correction lowered his ERA from 3.60 to 2.36. However, this was
still more than a run behind the leader, Ed Walsh. Frank Williams discovered a
dozen or more errors in entering wins and losses for pitchers in the AL every
year from 1905 through 1919. And John Tattersall, in his home run research,
found over 100 official errors, about 80 percent in the AL and most before
1920. George Sisler picked up 3 new homers, hit on April 12, 1916,
September 22, 1921, and June 29, 1929.
From 1912 to 1914, the AL statistician decided not to enter anything for
a player who had all zeroes for his line in any given game. Most of these were
relief pitchers, but they had entries on their pitching sheets and these games
were restored by the ICI researchers. There were about 600 other cases where
nonpitchers had games omitted. These are included in Total Baseball. This kind
of record keeping over the early years kept some men out of the encyclopedias
altogether, like pinch runners or defensive replacements. SABR research has
added several of these one-time ciphers to The Baseball Encyclopedia over the
years, as well as to Total Baseball.
For the American League records of 1913, the official sheets disagree
with the data published in the baseball guides for almost every player. The
only logical explanation is that the official figures weren't ready when it
came time to publish the guide, so they must have used data from another
source. Total Baseball uses the official figures, as they have daily sheets to
support the data.
An interesting quirk in the way records are kept--and another reminder,
as if one needed it, that baseball record keeping remains subject to error and
controversy--occurred as recently as 1981. The league rule was to round off
the innings pitched at the end of the season, although the weekly reports
showed thirds of innings. Baltimore's Sammy Stewart had 29 earned runs in 112
1/3 innings, while Oakland's Steve McCatty had 48 in 185 2/3 innings. This
gave Stewart the ERA title, 2.323 to 2.327. But when the innings were rounded
off, McCatty won, 2.32 to 2.33. McCatty got the title, but the next year both
leagues decided to count thirds of innings.
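The quirk is easy to reproduce in a few lines of Python, using the earned run
and inning totals given above:

    # With thirds of innings kept, Stewart edges McCatty; with innings rounded
    # to whole numbers, as the 1981 rule required, the order flips.
    def era(earned_runs, innings):
        return 9 * earned_runs / innings

    print(round(era(29, 112 + 1/3), 3), round(era(48, 185 + 2/3), 3))  # 2.323 2.327
    print(round(era(29, 112), 2), round(era(48, 186), 2))              # 2.33 2.32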
Sources
The computer has made possible the rapid analysis of mountains of raw
baseball data based upon observed games or mathematically accurate,
probabilistic computer simulations. Questions once thought to be unanswerable
are mysteries no longer. What is the worth, in terms of its run-producing
capacity, of a single, or a walk, or a homer? How valuable is a stolen base?
Who were the best clutch hitters? But as invaluable as the computer has been
in producing the statistical data for Total Baseball, the editors owe more to
the people who have contributed their time, their expertise, their love of the
game, and their passion for getting things right. These individuals are listed
here, in the Acknowledgements, or in the table at the end of the book of those
readers of the first two editions who helped us improve the accuracy of Total
Baseball this time around. A collective debt is owed to the Society for
American Baseball Research and the National Baseball Library.
The statistics were obtained primarily from the following sources:
- John Tattersall Collection of newspaper box scores and compilations for
1876-1890 NL.
- ICI computer printouts, National Baseball Library, 1891-1902 NL,
1882-1891 AA, 1884 UA, 1890 PL, 1901-1904 AL, 1914-1915 FL.
- Official league averages, 1903-date NL, 1905-date AL.
- Michael Stagno Collection of newspaper box scores and compilations for
1871-1875 NA, supplemented by research of SABR's nineteenth century
research committee, headed by Bob Tiemann, Bob Richardson,
and Fred Ivor-Campbell.
Supplemental sources were:
- For batters hit by pitch, 1884-1896 AA/NL/PL, 1909-1916 NL, 1909-1919 AL,
research from newspapers by Alex Haas, Pete Palmer, John Schwartz, Bob
Davids, John Tattersall, Lyle Spatz, Herb Goldman, Keith Carlson, and
others.
(Note: research continues for the 1897-1908 period, but the data is, at
this writing, about 90 percent complete.)
- For home runs allowed by pitchers, 1876-1950 AL/NL, the Tattersall
Collection, reviewed and corrected by Bob McConnell.
- For runs batted in, 1903-1919 NL, 1905-1919 AL, ICI research.
- For runs batted in, 1880-1885 NL, David Neft.
- For pitcher saves, 1876-1968 NL/AA/UA/PL/AL (except 1901-1919 AL).
- For stolen bases, 1886 NL, Spalding Baseball Guide.
- For wins and losses for pitchers, 1876-1900 NL/AA/PL, and for wins,
losses, games started, complete games, shutouts, saves, 1901-1919 AL, and
complete pitching data, 1892, research from newspapers and official
sheets by Frank Williams.
- For shutouts, 1876-1939, Joe Wayman.
- For biographical data, the biographical research committee of SABR,
notably Richard Topp, Bill Carle.
- For caught-stealing data, 1914-1916 AL, 1915-1916 NL, Ernie Lanigan,
courtesy of Bob Davids.
- For home/away data, 1876-1891 NL/AA/UA/PL, Bob McConnell.
- For game scores, 1876-1884 NL/AA/UA, Bob Tiemann.
- For game scores, 1885-1891 NL/AA/PL, Richard Topp.
- For runs and homers home/away, 1980s NL/AL, Bill Carr.
Missing data includes:
- Hit batters: 1897-1908, scattered data, especially for New York and
Cincinnati.
- Caught stealing: 1886-1914, 1916 (players with fewer than 20 steals),
1917-1919, 1926-1950 NL; 1886-1891 AA; 1890 PL; 1901-1913, 1916 (players
with fewer than 20 steals), 1917-1919 AL; 1914-1915 FL.
- Sacrifice hits: 1927-1930 (fly balls advancing runners to any base
counted as sacrifice hits).
- Sacrifice flies: 1908-1930, 1939.
- Runs batted in, 1882-1887, 1890 AA; 1884 UA.
- Strikeouts for batters: 1882-1888, 1890 AA; 1884 UA; 1897-1909 NL;
1901-1912 AL. (Team batting strikeouts are presented for 1897-1902
NL and 1901-1904 AL.)
Incomplete data for those years through 1902 NL and 1904 AL are available
from the ICI computer printouts at the National Baseball Library. Additional
research could turn up more data. If your research or sharp eye should detect
errors or gaps in Total Baseball, please write us in care of the publisher and
we'll be delighted to improve our data and credit your catch in the next
edition.