Experimental Design and Data Analysis for Biologists
R**N
Thorough and Modern
I appreciate the depth and consideration.
P**T
Finally something that makes sense!!
I have finally found a usable experimental design book. This book is, as its title says, for biologists, and it is understandable. It could have saved me from the frustration of 'not knowing' the proper terminology in project design; there is an easier, more concise way to say 'what I wanted to do in the field.' I am finally satisfied in my quest for an understandable design and data analysis book! I would have liked to have had the information this text has before I had to jump into writing proposals - even if the proposals were only for course work. (I have recommended this book to my professor to use in future courses and to fellow grad students.) I highly recommend this book!! A second choice would be "Vegetation Description and Data Analysis: A Practical Approach" by Kent and Coker. These books combined have made my project design understandable. (The latter is a bit harder to find - I had mine shipped over from India.) Again, I can't recommend this one enough - thank you so much, G. P. Quinn!!
A**R
This is the best stats textbook I have ever read
This is the best stats textbook I have ever read, by far! It's great for understanding the basis for each test, as well as what to look out for and when you need to use a different test. The writing style is enjoyable, and it even gets quite humorous in the final chapter.
C**D
Great reference for biologists
This book helped me to comprehend a very theoretical regression and analysis of variance class I took last fall by presenting the material with lots of practical examples. I have used it as a reference many times since. Great book.
S**A
Like new
Good book, and it was in like-new condition.
S**I
review
thank you
S**T
Uncertain Intent, Uncertain Result
This review, by Celia Lombardi and myself, was published in Ecology 83:810-812 (2003).

UNCERTAIN INTENT, UNCERTAIN RESULT

The percentage of statistical analyses in the ecological literature that are incorrect or inappropriate is high. Thesis advisors, journal editors, textbook writers, and statistics professors have not done their job well. So new books on design and analysis are always approached with great hope.

This hefty volume by two of Australia's more quantitatively sophisticated ecologists is written for biologists and environmental scientists. All examples are ecological and almost all from the published literature. Readers are presumed to have had one or two semesters of introductory statistics. It begins with a detailed, nine-page table of contents and ends with an extensive bibliography that lists, among others, many useful recent works on controversial statistical issues. There are indeed very good sections on some topics. By its end, however, it is apparent that this volume is not one that will reverse the tide of statistical malpractice.

The intent of the book is not completely clear, and its title is misleading. This is not a book one would select for a course in experimental design sensu stricto, i.e., the design of manipulative experiments. The authors state that the book's "approach is to encourage readers to understand the models underlying the most common experimental designs" (p. xv) and that its "emphasis is on manipulative experiments" (p. 157). But they also state, "Our emphasis here is on dealing with biological data - how to design sampling [our emphasis] programs that represent the best use of our resources ..." (p. xv). The book is, in fact, as much about observational studies as experimental ones. Scattered throughout it are 71 'boxes' that present worked examples from actual investigations. Of these boxes, 70 percent deal with observational studies and only 30 percent with manipulative experiments. Seven chapters (6, 13-18), constituting 35 percent of the text, deal exclusively or almost exclusively with statistical analysis of observational studies, and reference to observational studies is frequent in other chapters.

On the other hand, sampling design is formally covered in three pages of text, while about 91 pages are devoted to experimental design. Even within these 91 pages, however, 39 percent of the 'boxed' examples concern observational studies!

The blurring of the distinction between observational and experimental studies may reflect, in part, a desire to cover a very broad array of topics in a single course - and this text may have been designed for such a course. We believe that a strong, modern (post)graduate program in ecology or the environmental sciences, however, needs to offer separate courses in sampling design and experimental design. These are subject matters where both the primary literature and textbooks are riddled with error. Students cannot be expected to sort the wheat from the chaff easily on their own. Independent study does not suffice - not that we recommend blind faith in professors either.

The subject matter is apportioned roughly as follows: scientific method (8 pp.), graphical exploration of data (14 pp.), statistical analysis (379 pp.), study design (94 pp.), and presentation of results (16 pp.). The authors have a website for the book at [...].
Here they present the raw data files for all worked examples in the book, materials for instructors, an erratum sheet, and a link to the website for their own course.

The brief section on the scientific method is useful for its summary of the different positions of a number of philosophers and scientists on the topic, its citation of several key recent works not yet well known, and its recognition that both inductive and deductive approaches are critical to the advance of science. The latter view, disparaged by rigid Popperians, has been especially well articulated and defended in the excellent treatise by Ford (2000). One weakness in this section is its promulgation of the widespread confusion between research hypotheses and statistical hypotheses (null and alternative) (p. 5).

A tremendous range of topics is covered. Critical review of many of the more fundamental ones reveals problems.

For their chapters on design, the authors have adopted a confusing and superfluous terminology from the social and behavioral sciences. This perhaps reflects a negative influence of Winer et al. (1991), whom they cite favorably in many places. The term subject is at the core of this terminology. But subject has never been a statistical concept. On one page it is used to mean block, on another to mean experimental unit, on another to mean evaluation unit, and so on. An early demise of the term is to be hoped for. On p. 265 they refer to crossover designs as "subject x treatment designs" and to designs where experimental units are monitored on two or more occasions as "subject x trial designs." Thirty-two pages later there is a section titled "Crossover designs," but it does not mention that these are the same ones earlier termed "subject x treatment designs." On p. 266 they describe a simple randomized complete block design study of differences in frog abundance on burned versus unburned plots that was monitored in each of three years; later they refer to year as the treatment factor in this study and burn condition as a "within subjects" factor (p. 273).

Other confusions relating to design are present. The sampling units in observational studies they sometimes term "experimental units" (p. 265), a term best used only for manipulative experiments. They claim that "[t]he most common situation in biological experiments [with randomized complete block designs is where] ... the blocks used in the experiment are a random sample from a larger population of blocks" (p. 273), whereas in fact this is a very uncommon situation. They describe one study as having a split-plot design whereas it actually has a randomized complete block design with two levels of blocking and a single treatment factor (p. 303). They strongly recommend against unequal replication (p. 42, 69, 187) and fractional factorial designs (p. 261) on grounds of difficulties they may pose to analysis; yet strong arguments for both often can be made on the grounds of objectives, ethics, cost considerations, and logistics - and we should not let the tail wag the dog. Easy statistical analysis is a secondary objective.

Pseudoreplication (sensu Hurlbert 1984) is briefly discussed and defined as a design problem characterized by the taking of multiple samples from experimental units where there is only one experimental unit per treatment (p. 159). In fact, pseudoreplication is not a design error but rather an error of statistical analysis and interpretation. And it does not refer to absence of treatment replication.
Sacrificial pseudoreplication is one of the commonest types (Hurlbert and White 1993, Lombardi and Hurlbert 1996, Garcia-Berthou and Hurlbert 1999) and by definition is possible only where there are multiple experimental units per treatment. On three occasions, analyses constituting pseudoreplication - simple (p. 354) or temporal (pp. 224, 242) - are inadvertently presented as examples of the correct way to do things. The two cases of temporal pseudoreplication also can be viewed as examples of pseudofactorialism (Hurlbert and White 1993). In a section on "pooling" the authors argue, in effect, that sacrificial pseudoreplication is an acceptable procedure so long as the power to detect the real differences among experimental units is low (p. 260). Their motive is a desire for greater power than is conferred by the experimental design and data. They forget that this is accomplished by biasing P values, hopefully but not necessarily, in a downward direction, i.e., by exaggerating the evidence for a treatment effect and the precision of the estimate of its magnitude. An important recent reference on this topic is Jenkins (2002).

This volume perpetuates, or at least leaves weakly evaluated, many widespread but unjustifiable conventions of statistical analysis. Let's start with fundamentals. The authors acknowledge that "[t]here is no reason why all tests have to be done with a significance level fixed at 0.05" (p. 53) but do not acknowledge that there is no need in most research situations to specify any α value whatsoever. Evaluation of evidence is not a black-and-white matter. Clear interpretation and good writing will dispense with the habit and crutch of calling certain results "significant." The authors do at least advocate the reporting of exact P values, at least when these are > 0.001 (p. 34, 496).

There is confusion as to how a P value is to be defined and interpreted. The authors incorrectly state that P "is the probability of a result occurring by chance in the long run if H0 is true ..." (p. 35). Early on they seem to argue that a high P value constitutes evidence that the alternative hypothesis (HA) is "incorrect" (p. 4), but later they correctly recognize that it argues only for suspension of judgment as to the correctness of HA versus H0 (p. 35).

Their assessment of the validity of one-tailed testing is ambivalent and inconclusive (p. 37). They confuse the issue of what might be predicted with that of what would be of interest. They cite approvingly one of their own studies that justified one-tailed testing with the claim that a result in the direction opposite that predicted would have been ignored. Use of one-tailed tests indicates statistical misunderstanding at several levels. It reflects the black-and-white thinking of fixed significance levels, confusion between the testing of research hypotheses and the testing of statistical hypotheses, and failure to recognize that, applied to a given data set, a one-tailed test yielding P = 0.04 (or 0.96) and a two-tailed test yielding P = 0.08 are two exactly equivalent summaries of the same information and should lead to identical conclusions as to what the evidence shows (Kimmel 1957, Pillemer 1991, Lombardi and Hurlbert 1995 and in manuscript).
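To make that equivalence concrete, here is a minimal Python sketch with made-up data (my illustration, not the authors' study or an example from the review); it simply shows that the one-tailed P is an arithmetic transformation of the two-tailed P for the same test statistic.

```python
# Minimal sketch, hypothetical data: one- and two-tailed P values from the same t test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=0.4, scale=1.0, size=25)    # made-up sample; predicted direction is mean > 0

res = stats.ttest_1samp(x, popmean=0.0)        # scipy's one-sample t test is two-tailed by default
t, p_two = res.statistic, res.pvalue
p_one = p_two / 2 if t > 0 else 1 - p_two / 2  # one-tailed P for the predicted direction

print(f"t = {t:.3f}, two-tailed P = {p_two:.4f}, one-tailed P = {p_one:.4f}")
# Halving (or complementing) the two-tailed P is the entire difference: the two numbers
# summarize the same evidence, which is why a one-tailed P of 0.04 and a two-tailed P of 0.08
# should lead to the same conclusion about a given data set.
```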
42) and that "most common HOs in biology, and other sciences, are always false" (p.53). But then, instead of taking the next logical step and concluding that type I errors are likely to be very rare and can simply be evaluated as an unlikely possibility on a test-by-test basis, the authors revert to the antediluvian `alpha paranoia' of many of our statistical forefathers. They state, erroneously, that "[a]s the number of tests increases, so does the probability of making at least one Type I error among the collection of tests" (p. 49) and that "[t]he probability of at least one Type I error among the family of ten tests [being carried out in a hypothetical example], if each test is conducted at α equals 0.05 and the comparisons are independent of each other, is 0.40" (p. 196). Without knowing how many, if any, of the null hypotheses being tested are true, it is not possible to calculate the probability of making one or more Type I errors. For most investigations that probability is likely to be zero.The authors recognize that the many conventional procedures available "to keep the [maximum] family-wise Type I error rate [possible] at some [fixed] reasonable level" (p. 49) are problematic. They require arbitrary decisions as to what constitutes a "family" of tests and as to what the family-wise α should be set at; and if conventionally low values of α are used, power will be greatly reduced. With respect to "planned comparisons" they thus conclude: "Our broad recommendation is that the default position should be no adjustment for multiple testing [our emphasis] if the tests present clearly defined and separate hypotheses" (p. 197). To emphasize the import of this helpful recommendation, they might have pointed out how widely it is ignored by authors, reviewers, and editors.For "unplanned comparisons" (which is something of a strawman category) the authors feel that procedures entailing fixed family-wise Type I error rates may often be appropriate (p. 199ff). They present no clear rationale, however. Having co-authored a very thorough comparison of such procedures several years ago (Day and Quinn 1989), Quinn perhaps is not yet ready to face the music: there is no legitimate employment for the games of Mssrs. Bonferroni, Duncan, Dunnet, Ryan, Scheffé, Student-Neuman-Keuls, and Tukey! The attention given to those games is distracting and makes it less likely the reader will absorb the full significance of the "broad recommendation" quoted above. Good, feet-on-the ground advice on this topic may be had from Mead (1988:309-315) and Carmer and Walker (1982).Somewhat related issues are raised in the author's discussions of repeated measures designs. They recommend Greenhouse-Geiser-adjusted error degrees of freedom for testing for time and time x treatment effects in such studies (p. 282), but use unadjusted error degrees of freedom in examples they present (p. 336, 357). Nowhere do they acknowledge that separate date-by-date analyses - without correction for `multiple testing' - is a perfectly valid, simpler, more informative, and less error-prone approach to such data. Why create giant ANOVA tables for the trivial purpose of proving that, by golly, response variables and treatment effects do not remain constant over time!?In discussing meta-analyses, the authors specify that these must employ "a measure of effect size...that incorporates the variance of the effect" (p. 51). They cite favorably four books and review articles that advocate this decades-old conventional wisdom. 
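For readers who want to see where the quoted 0.40 comes from, here is the arithmetic as a short Python sketch (my illustration, not code from the book or the review); the caveat, as argued above, is that the calculation assumes every null hypothesis in the family is true.

```python
# Minimal sketch: the familiar family-wise Type I error bound and its hidden assumption.
alpha = 0.05
k = 10                                   # hypothetical family of ten independent tests

print(1 - (1 - alpha) ** k)              # ~0.401: the "0.40" quoted from p. 196, valid only
                                         # if ALL ten null hypotheses are true

k_true = 2                               # suppose only two of the ten nulls are actually true
print(1 - (1 - alpha) ** k_true)         # ~0.0975: Type I errors can occur only on true nulls,
                                         # so the realistic probability is far lower, or zero
```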
The authors recognize that the many conventional procedures available "to keep the [maximum] family-wise Type I error rate [possible] at some [fixed] reasonable level" (p. 49) are problematic. They require arbitrary decisions as to what constitutes a "family" of tests and as to what the family-wise α should be set at; and if conventionally low values of α are used, power will be greatly reduced. With respect to "planned comparisons" they thus conclude: "Our broad recommendation is that the default position should be no adjustment for multiple testing [our emphasis] if the tests present clearly defined and separate hypotheses" (p. 197). To emphasize the import of this helpful recommendation, they might have pointed out how widely it is ignored by authors, reviewers, and editors.

For "unplanned comparisons" (which is something of a strawman category) the authors feel that procedures entailing fixed family-wise Type I error rates may often be appropriate (p. 199ff). They present no clear rationale, however. Having co-authored a very thorough comparison of such procedures several years ago (Day and Quinn 1989), Quinn perhaps is not yet ready to face the music: there is no legitimate employment for the games of Messrs. Bonferroni, Duncan, Dunnett, Ryan, Scheffé, Student-Newman-Keuls, and Tukey! The attention given to those games is distracting and makes it less likely the reader will absorb the full significance of the "broad recommendation" quoted above. Good, feet-on-the-ground advice on this topic may be had from Mead (1988:309-315) and Carmer and Walker (1982).

Somewhat related issues are raised in the authors' discussions of repeated measures designs. They recommend Greenhouse-Geisser-adjusted error degrees of freedom for testing for time and time x treatment effects in such studies (p. 282), but use unadjusted error degrees of freedom in examples they present (p. 336, 357). Nowhere do they acknowledge that separate date-by-date analyses - without correction for 'multiple testing' - are a perfectly valid, simpler, more informative, and less error-prone approach to such data. Why create giant ANOVA tables for the trivial purpose of proving that, by golly, response variables and treatment effects do not remain constant over time!?

In discussing meta-analyses, the authors specify that these must employ "a measure of effect size ... that incorporates the variance of the effect" (p. 51). They cite favorably four books and review articles that advocate this decades-old conventional wisdom. Yet, the illogic and defects of such standardized indices of effect size in most situations are clear (Hurlbert 1994, Abelson 1997, Petraitis 1998). Repair of the damage that has been done to the intelligibility of meta-analyses in most disciplines by blind use of the usual standardized indices would require a book longer than that under review.

A final topic where issue may be taken with statements in this volume is that of data transformation. The authors accurately state on p. 64 that "[t]he most common transformation is the log transformation ..." But one page later they claim that "[t]he most common type of transformation useful for biological data is the power transformation, which transforms Y to Y^p, where p is greater than zero. For data with right skew, the square root transformation, where p = 0.5, is applicable particularly for data that are counts (Poisson distributed) ..." (p. 65). In fact, the square root transformation will almost always be a bad choice, as count or other types of abundance data will rarely even approximately conform to a Poisson distribution; nature is patchy. The square root transformation will often lead to problems like that evident in their Fig. 9.7 (p. 253), where the mean minus one standard error or standard deviation takes one into negative values for limpet density. Log transformation of data does not permit such illogical results. When a data set contains zeros, use of log transformation requires addition of some constant to all values in the set; the authors mention four different options for selecting that constant (p. 65) but give no advice as to which is correct.
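A small Python sketch with hypothetical counts (my example; the limpet data of Fig. 9.7 are not reproduced here) illustrates the two points above: on the square-root scale the mean minus one standard deviation can fall below zero, implying a negative density, while the log transformation avoids that artifact but forces an arbitrary choice of constant when zeros are present.

```python
# Minimal sketch, hypothetical patchy counts (not the book's limpet data).
import numpy as np

counts = np.array([0, 0, 1, 2, 3, 5, 8, 40, 55, 120])

sq = np.sqrt(counts)                     # square-root transformation (p = 0.5)
print(sq.mean() - sq.std(ddof=1))        # about -0.25: the lower "error bar" implies a
                                         # negative density, the artifact seen in Fig. 9.7

c = 1.0                                  # a constant must be added before logging zeros;
                                         # the book lists four options without recommending one
lg = np.log10(counts + c)                # log transformation
print(lg.mean() - lg.std(ddof=1))        # about 0.12: remains interpretable on the log scale
```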
Our focus on its problems notwithstanding, it is evident that a great deal of scholarship has gone into this book and that there is much good advice in it. Production of a definitive and flawless volume covering such a wide subject matter seems not yet to be feasible, however. The existing statistics books and primary literature still contain too much misinformation on too many topics. Correction of this will require a Herculean effort - many critical, narrowly focused reviews by many scholars over some period of time - before broad syntheses become a manageable task for one or two authors. The present authors, for example, repeatedly indicate their reliance on Winer et al. (1991), Sokal and Rohlf (1995), and Underwood (1997). We have referred to the confusing terminology promulgated, albeit not invented, by Winer et al. Underwood's book contains many of the same problems pointed out in this volume (Hurlbert 1997). And Sokal and Rohlf is less highly regarded by statisticians than it is by biologists; Mead (1982) complained of the "cookbook" quality of the first edition and noted that "it is referenced much more frequently than any other book by biologists whose statistics, in papers I referee, are incorrect." New construction evidently requires deeper digging and the repair or replacement of old foundations.

References

Abelson, R.P. 1997. A retrospective on the significance test ban of 1999 (If there were no significance tests they would be invented). Pages 117-141 in L.L. Harlow, S.A. Mulaik, and J.H. Steger, editors. What if there were no significance tests? Lawrence Erlbaum, Mahwah, New Jersey.

Carmer, S.G., and W.M. Walker. 1982. Baby bear's dilemma: a statistical tale. Agronomy Journal 74:122-124.

Day, R.W., and G.P. Quinn. 1989. Comparison of treatments after an analysis of variance in ecology. Ecological Monographs 59:433-463.

Ford, E.D. 2000. Scientific method for ecological research. Cambridge University Press, New York.

Garcia-Berthou, E., and S.H. Hurlbert. 1999. Pseudoreplication in hermit crab shell selection experiments: comment to Wilber. Bulletin of Marine Science 65:893-895.

Hurlbert, S.H. 1984. Pseudoreplication and the design of ecological field experiments. Ecological Monographs 54:187-211.

Hurlbert, S.H. 1994. Old shibboleths and new syntheses [review of Design and Analysis of Ecological Experiments, edited by S.M. Scheiner and J. Gurevitch]. Trends in Ecology and Evolution 9:495-496.

Hurlbert, S.H. 1997. Experiments in ecology [review of the book by the same title by A.J. Underwood]. Endeavour 21:172-173.

Hurlbert, S.H., and M.D. White. 1993. Experiments with freshwater invertebrate zooplanktivores: quality of statistical analyses. Bulletin of Marine Science 53:128-153.

Jenkins, S.H. 2002. Data pooling and type I errors: a comment on Leger & Didrichsons. Animal Behaviour 63:F9-F11.

Kimmel, H.D. 1957. Three criteria for the use of one-tailed tests. Psychological Bulletin 54:351-353.

Lombardi, C.M., and S.H. Hurlbert. 1995. Misprescription and misuse of one-tailed tests. Association for the Study of Animal Behaviour Newsletter 23:14.

Lombardi, C.M., and S.H. Hurlbert. 1996. Sunfish cognition and pseudoreplication. Animal Behaviour 52:419-422.

Mead, R.R. 1982. [Review of Biometry, 2nd edition, by R.R. Sokal and F.J. Rohlf]. Biometrics 38:863-864.

Mead, R.R. 1988. The design of experiments. Cambridge University Press, New York.

Petraitis, P.S. 1998. How can we compare the importance of ecological processes if we never ask, "Compared to what?" Pages 183-201 in W. Resetarits and J. Bernardo, editors. Experimental ecology: issues and perspectives. Oxford University Press, New York.

Pillemer, D.B. 1991. One- versus two-tailed hypothesis tests in contemporary educational research. Educational Researcher 20(9):13-17.

Sokal, R.R., and F.J. Rohlf. 1995. Biometry. 3rd edition. W.H. Freeman, New York.

Underwood, A.J. 1997. Experiments in ecology. Cambridge University Press, New York.

Winer, B.J., D.R. Brown, and K.M. Michels. 1991. Statistical principles in experimental design. 3rd edition. McGraw-Hill, New York.
F**N
Clear, practical, useful
I'm starting my PhD, and this book has been a great help!! It helps with everything from basic concepts through design and analysis. I definitely recommend it!!