# Find Open Textbooks

## Introductory Statistics: OpenStax

**Description**: Introductory Statistics follows the scope and sequence of a one-semester introduction to statistics course and is geared toward students majoring in fields other than math or engineering. This text assumes students have been exposed to intermediate algebra, and it focuses on the applications of statistical knowledge rather than the theory behind it. The foundation of this textbook is Collaborative Statistics, by Barbara Illowsky and Susan Dean, which has been widely adopted. Introductory Statistics includes innovations in art, terminology, and practical applications, all with a goal of increasing relevance and accessibility for students. We strove to make the discipline meaningful and memorable, so that students can draw a working knowledge from it that will enrich their future studies and help them make sense of the world around them. The text also includes Collaborative Exercises, integration with TI-83, 83+, and 84+ calculators, technology integration problems, and statistics labs.

**Author**: Barbara Illowsky, De Anza College; Susan Dean, De Anza College

**Original source:** cnx.org


### Open Textbooks:

- Read this book online (website)
- Student resources (website)
- Instructor resources (website)
- Instructor resource: course (website)
- Editable .html files (download, 37 MB)
- Print-ready PDF (download, 7 MB)
- Print copy (U.S.)
- Print copy (Canada)

Introductory Statistics: OpenStax by Barbara Illowsky, De Anza College, and Susan Dean, De Anza College, is licensed under a Creative Commons Attribution 4.0 International License, except where otherwise noted.

#### Reviews for 'Introductory Statistics: OpenStax'

##### Number of reviews: 3

###### Average Rating: 3.77 out of 5

**1. Reviewed by:** Shane Rollans

**Institution:** Thompson Rivers University

**Title/Position:** Senior Lecturer

**Overall Rating:** 3.7 out of 5

#### Q: The text covers all areas and ideas of the subject appropriately and provides an effective index and/or glossary

The text covers most of the areas that would normally be included in an introductory course, with a few exceptions that I will note later. The index is definitely not effective, and I feel that the glossary, while complete, needs revision.

Text:

The only major topic that is omitted is experimental design, but that is not an important omission unless the course is for science or social science students. There is no section on ethics, but very few statistics texts include such a section. Probability plots are not covered, and the chapter on regression makes no reference to residual plots, which is highly unusual.

In my opinion, the biggest thing this textbook is missing is motivation for studying statistics. Statistics plays a huge part in trying to answer many important questions, and this text gives little or no indication of this. The examples and problems generally deal with uninteresting questions, predominantly with made-up data. Even when the data is real, there is rarely any motivation given or apparent reason to analyze it. Here is an example (pages 398-399) from Chapter 9, Hypothesis Testing: Single Mean and Single Proportion, which is typical of most of the student-generated questions in the chapter.

“NOTE: The following questions were written by past students. They are excellent problems!

Exercise 9.16.18

18. "Asian Family Reunion" by Chau Nguyen

Every two years it comes around

We all get together from different towns.

In my honest opinion

It's not a typical family reunion

Not forty, or fifty, or sixty,

But how about seventy companions!

The kids would play, scream, and shout

One minute they're happy, another they'll pout.

The teenagers would look, stare, and compare

From how they look to what they wear.

The men would chat about their business

That they make more, but never less.

Money is always their subject

And there's always talk of more new projects.

The women get tired from all of the chats

They head to the kitchen to set out the mats.

Some would sit and some would stand

Eating and talking with plates in their hands.

Then come the games and the songs

And suddenly, everyone gets along!

With all that laughter, it's sad to say

That it always ends in the same old way.

They hug and kiss and say "good-bye"

And then they all begin to cry!

I say that 60 percent shed their tears

But my mom counted 35 people this year.

She said that boys and men will always have their pride,

So we won't ever see them cry.

I myself don't think she's correct,

So could you please try this problem to see if you object?”

I am not sure what hypothesis I am being asked to test here, and I would certainly disagree with it being described as an excellent problem. While many of the student-generated problems are similar to this one, there was one about the endings of Japanese girls' names (Exercise 9.16.25, page 402) that I found quite interesting.

Index:

The index clearly had little or no human input. Alongside reasonable entries, it includes a host of random words. For example, the index includes 80 references for the word “elementary” and 186 references for the word “statistics”. It also includes references for many words such as “answer”, “box”, “word”, “good” and “two” that should not be in any index.

Glossary:

I would rate the glossary as somewhat effective. The glossary is fairly complete, but I believe that many of the entries should be rewritten. It includes some minor errors, such as the definition of a geometric distribution: “The probability of exactly x failures before the first success is given by the formula: P(X = x) = p(1 - p)^(x-1).” With x counting failures, the exponent should be x rather than x - 1. In at least one case an entry is given with no definition.
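The two common parameterizations of the geometric distribution are easy to confuse. Here is a minimal sketch in Python that uses exact fractions so the arithmetic is transparent. The function names and the value p = 1/4 are illustrative choices and are not taken from the textbook:

```python
from fractions import Fraction


def pmf_failures(x: int, p: Fraction) -> Fraction:
    """P(X = x) when X counts failures before the first success (x = 0, 1, 2, ...)."""
    return p * (1 - p) ** x


def pmf_trials(x: int, p: Fraction) -> Fraction:
    """P(X = x) when X counts trials up to and including the first success (x = 1, 2, ...)."""
    return p * (1 - p) ** (x - 1)


p = Fraction(1, 4)

# Zero failures means the very first trial succeeds, so the probability must be p.
print(pmf_failures(0, p))  # 1/4
# Plugging x = 0 into the trials formula gives p / (1 - p) instead.
print(pmf_trials(0, p))  # 1/3

# Each version is a valid distribution on its own support:
# the first 50 terms of each sum to exactly 1 - (1 - p)**50.
print(sum(pmf_failures(x, p) for x in range(50)) == 1 - (1 - p) ** 50)  # True
print(sum(pmf_trials(x, p) for x in range(1, 51)) == 1 - (1 - p) ** 50)  # True
```

When x counts failures, x = 0 must have probability p because the first trial succeeds. The formula with exponent x - 1 does not give that value.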

Some of the other definitions are somewhat unclear. For example:

Mutually Exclusive

An observation cannot fall into more than one class (category). Being in more than one category prevents being in a mutually exclusive category.

Standard Normal Distribution

A continuous random variable (RV) X ~ N(0, 1). When X follows the standard normal distribution, it is often noted as Z ~ N(0, 1).

Other definitions just don’t match my preferences. For example, the definition of correlation includes the so-called computational formula, which I feel doesn’t belong in any statistics textbook. I also didn’t like the definition of “Random Variable” being given under the heading “Variable”. Doing that accentuates the confusion between a variable in algebra and a random variable in probability.
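There are two ways to write the correlation coefficient. The first is the average product of z-scores (dividing by n - 1), which ties r directly to the data. The second is the computational shortcut built from raw sums. Here is a minimal Python sketch showing that the two agree. The data values are made up purely for illustration:

```python
import math
from statistics import mean, stdev


def r_definitional(xs, ys):
    """Correlation as the sum of products of z-scores, divided by n - 1."""
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    products = (((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))
    return sum(products) / (len(xs) - 1)


def r_computational(xs, ys):
    """The 'computational' shortcut, built from raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_xx - sum_x**2) * (n * sum_yy - sum_y**2))
    return numerator / denominator


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(f"{r_definitional(xs, ys):.4f} {r_computational(xs, ys):.4f}")  # 0.7746 0.7746
```

The two functions are algebraically identical. Only the first makes visible that r measures how consistently x and y sit on the same side of their means.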

**Comprehensiveness Rating:** 3 out of 5

#### Q: Content is accurate, error-free and unbiased

The content is generally accurate and unbiased, although I am not sure what a biased statistics text would look like. There are some errors, such as the previously mentioned definition of the geometric distribution, which is not much more than a typo, and the occasional more serious error, such as the statement “True random sampling is done with replacement.” on page 20. In my opinion, virtually every graph in the chapter on graphing is done badly, but those are not really errors.

**Content Accuracy Rating:** 4 out of 5

#### Q: Content is up-to-date, but not in a way that will quickly make the text obsolete within a short period of time. The text is written and/or arranged in such a way that necessary updates will be relatively easy and straightforward to implement

This text is a mix: up to date in some ways, quite old-fashioned in others.

It makes good use of graphing calculator technology, using the calculator to calculate probabilities rather than antiquated tables, although the tables are still included if an instructor prefers to use them. It also uses the graphing calculator in all aspects of statistical analysis. If you are convinced that a graphing calculator is the best technology to use when teaching introductory statistics, this is one of the primary strengths of the text. The fact that it includes no other technology is a weakness. For example, the text gives long, detailed instructions for creating frequency tables and histograms from scratch. I do not feel that this section was done well, and even if it had been, it should have disappeared 30 years ago.

The text correctly indicates that the normal approximation to the binomial is no longer necessary with the technology that is currently available. However, it then uses the same normal approximation when doing inference with proportions. While this is still the norm for introductory classes and should probably be included, it would have been nice to include a justification for using the normal approximation after saying it isn’t necessary.
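To see how close the normal approximation is in a particular case, you can compare it with the exact binomial probability, which takes only a few lines with modern tools. Here is a minimal Python sketch. The values n = 50, p = 0.4, and k = 15 are arbitrary illustrative choices, and the 0.5 continuity correction is one common convention rather than the only one:

```python
from math import comb, sqrt
from statistics import NormalDist


def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p), summing the probability mass function."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def normal_approx_cdf(k, n, p):
    """Normal approximation to P(X <= k), with a continuity correction of 0.5."""
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    return NormalDist(mu, sigma).cdf(k + 0.5)


n, p, k = 50, 0.4, 15
print(f"exact binomial:       {binom_cdf(k, n, p):.4f}")
print(f"normal approximation: {normal_approx_cdf(k, n, p):.4f}")
```

Running the comparison for other values of n and p is one way to give students a concrete justification for when the approximation is adequate.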

One of the first sections I look at when I review a text for possible adoption is the section on comparing means using independent samples. The more modern texts use Welch’s t-test. That is the test used by this text, so for me that is a positive. However, it follows that section with a long section assuming that the variances are known. The variances are never known, so the only justification for including such a section is as a lead-in to Welch’s t-test. In that case it should be much shorter and should come first, as was done in the single-population chapter. While the text indicates “In practice, we rarely know the population standard deviation.” (I would replace rarely by never), it devotes more space to the case when the variance or variances are known than when they are unknown.
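Welch's test does not assume equal population variances. Here is a minimal Python sketch of its test statistic and the Welch-Satterthwaite degrees of freedom, using the standard library only. The two groups are invented for illustration. A p-value would additionally need a t-distribution CDF, for example SciPy's `scipy.stats.ttest_ind` with `equal_var=False`:

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's two-sample t-test."""
    # Estimated variance of each sample mean: s^2 / n.
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df


group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
t, df = welch_t(group_a, group_b)
print(f"t = {t:.3f}, df = {df:.2f}")  # t = -1.897, df = 5.88
```

Nothing in the calculation requires the population variances to be known. Each sample supplies its own estimate.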

I also check whether the text differentiates between large- and small-sample inference for means, since there is no reason to do so. This text does not differentiate, and it explains why, which is another plus.

As I have mentioned before, this text gives very few examples of what statistics is being used for. Since few of the examples or problems are topical, it will take them a long time to become dated. I would consider this to be a minus, but in the context of this question it might be considered a plus.

The textbook is written in a way that makes updates and revisions straightforward to implement, but in my opinion so many are needed before I would consider adopting this text that the job would not be easy.

**Relevance Rating:** 3 out of 5

#### Q: The text is written in lucid, accessible prose, and provides adequate context for any jargon/technical terminology used

The text is written very clearly in some places and less so in others. It gives a very clear, step-by-step set of instructions for taking a small simple random sample from an already given sampling frame. However, no mention is made of how difficult it is to create a sampling frame for a large population, or of how a large simple random sample could be taken from a sampling frame. It also gives relatively clear instructions on how to create a frequency table and histogram, including detailed instructions for calculating the number of bars of width 1 required to graph data consisting of the integers 1, 2, 3, 4, 5, and 6. (Spoiler: the answer is 6.) It does a pretty good job of relating decisions using p-values to the concept of rare events.

Other parts are less clear. My guess is that no one in a class of tourism students would get anything from the chapter on analysis of variance. It contains lots of jargon with very little context. For example, this is how the description of the F test starts out:

“To calculate the F ratio, two estimates of the variance are made.

1. Variance between samples: An estimate of σ² that is the variance of the sample means multiplied by n (when there is equal n). If the samples are different sizes, the variance between samples is weighted to account for the different sample sizes. The variance is also called variation due to treatment or explained variation.

2. Variance within samples: An estimate of σ² that is the average of the sample variances (also known as a pooled variance). When the sample sizes are different, the variance within samples is weighted. The variance is also called the variation due to error or unexplained variation.”
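The two variance estimates described in the quoted passage can be computed directly. Here is a minimal Python sketch for the equal-sample-size case. It sits alongside the usual mean-square form of the F ratio so the two can be compared. The three groups are made-up numbers:

```python
from statistics import mean, variance


def f_ratio_equal_n(groups):
    """F = (n * variance of the sample means) / (average of the sample variances)."""
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this shortcut assumes equal sample sizes")
    between = n * variance([mean(g) for g in groups])
    within = mean([variance(g) for g in groups])
    return between / within


def f_ratio_mean_squares(groups):
    """F = MS_between / MS_within; this form also handles unequal sample sizes."""
    k = len(groups)
    total_n = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (total_n - k))


groups = [[3, 4, 5], [6, 7, 8], [9, 10, 11]]
print(f_ratio_equal_n(groups), f_ratio_mean_squares(groups))  # 27.0 27.0
```

The point worth making to students is what the ratio measures. A large F means the group means are more spread out than within-group variation alone would suggest.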

While most of the text is written clearly, I feel that a general shortcoming throughout this textbook is that it does not provide sufficient context for the techniques it looks at.

**Clarity Rating:** 3 out of 5

#### Q: The text is internally consistent in terms of terminology and framework

The text is consistent in terms of terminology and framework.

**Consistency Rating:** 4 out of 5

#### Q: The text is easily and readily divisible into smaller reading sections that can be assigned at different points within the course (i.e., enormous blocks of text without subheadings should be avoided). The text should not be overly self-referential, and should be easily reorganized and realigned with various subunits of a course without presenting much disruption to the reader.

The text is easily and readily divisible into smaller reading sections; it is not overly self-referential and should be easily reorganized to the extent that any statistics text could be.

**Modularity Rating:** 5 out of 5

#### Q: The topics in the text are presented in a logical, clear fashion

The organization is similar to most old-school intro stats texts, and while it is not the same as what I use, I am sure that it conforms to the organization that many instructors use. The only really awkward place that I noticed was the introduction of box plots before measures of centre or location. This meant that the authors had to define quartiles and medians in that section and then define them again later. It would be easy to move the section on box plots after the discussion of quartiles and medians.

**Organization Rating:** 4 out of 5

#### Q: The text is free of significant interface issues, including navigation problems, distortion of images/charts, and any other display features that may distract or confuse the reader

I was working from the PDF file, so I cannot comment on these issues.

**Interface Rating:** 3 out of 5

#### Q: The text contains no grammatical errors

I did not notice any grammatical errors.

**Grammar Rating:** 4 out of 5

#### Q: The text is not culturally insensitive or offensive in any way. It should make use of examples that are inclusive of a variety of races, ethnicities, and backgrounds

The text is not culturally insensitive or offensive in any way. The names it uses in its examples are inclusive of a variety of ethnicities.

**Cultural Relevance Rating:** 4 out of 5

#### Q: Are there any other comments you would like to make about this book, for example, its appropriateness in a Canadian context or specific updates you think need to be made?

The six recommendations of the GAISE (Guidelines for the Assessment and Instruction in Statistics Education) college report prepared for the American Statistical Association are:

1. Emphasize statistical literacy and develop statistical thinking

2. Use real data

3. Stress conceptual understanding, rather than mere knowledge of procedures

4. Foster active learning in the classroom

5. Use technology for developing conceptual understanding and analyzing data

6. Use assessments to improve and evaluate student learning

This textbook does an excellent job on points 4 and 5. There are many group exercises throughout the text; collaboration is a conscious focus of the text and is its primary strength. The textbook is also based on the use of a graphing calculator. While I feel that it is a poor tool for doing statistics, it is a reasonable tool for use in an introductory statistics class, and this textbook does an excellent job of integrating it into the curriculum. This is the other strength of the textbook. However, as I mentioned earlier, I feel that ignoring other technologies is a weakness.

The book is less successful in stressing conceptual understanding rather than mere knowledge of procedures (point 3). For example, in the chapter on sampling it gives brief descriptions of different sampling methods but says nothing about the conditions under which one method is better than another. It lists possible problems in sampling but gives no context. Another example is that it lists the properties of correlation but doesn’t relate them to data, and the only formula given is the computational formula, which I feel has no pedagogical value whatsoever.

It uses some real data, but I don’t feel that it uses enough. The real data it uses does involve the students in the collection of data, making that data more relevant and fostering active learning, which is obviously a good thing. However, it does not include much data that was used to answer interesting questions.

I feel that the critical failure of this textbook is that it doesn’t do a good job of teaching statistical thinking. Far too often it emphasizes how to do questions in a textbook rather than how to do statistics. This is a consistent focus throughout the text. Here are a few examples:

These listed learning outcomes all talk about textbook questions:

“By the end of this chapter, the student should be able to:”

“Classify discrete word problems by their distributions.” (Chapter 4, page 159)

“Classify continuous word problems by their distributions.” (Chapter 7, page 281)

“Discriminate between problems applying the normal and the student-t distributions.” (Chapter 8, page 319)

As it introduces confidence intervals for proportions it does so in the context of a textbook problem:

“How do you know you are dealing with a proportion problem? First, the underlying distribution is binomial. (There is no mention of a mean or average.)” (Page 331)

In the discussion of using hypothesis testing to make decisions on page 375:

“A systematic way to make a decision of whether to reject or not reject the null hypothesis is to compare the p-value and a preset or preconceived α (also called a "significance level"). A preset α is the probability of a Type I error (rejecting the null hypothesis when the null hypothesis is true). It may or may not be given to you at the beginning of the problem.”

When working an example of a test for two means:

“Example 10.1: Independent groups

The average amount of time boys and girls ages 7 through 11 spend playing sports each day is believed to be the same. … Is there a difference in the mean amount of time boys and girls ages 7 through 11 play sports each day? Test at the 5% level of significance.”

“The words "the same" tell you Ho has an "=". Since there are no other words to indicate Ha, then assume "is different." This is a two-tailed test.”

Another example of the lack of statistical thinking is that while the textbook mentions the assumptions for the various procedures, it never indicates how to assess whether they are reasonable for a particular set of data. The only assumption checking it does is again based on textbook questions rather than data.

For example (Page 381):

“Example 9.13

Statistics students believe that the mean score on the first statistics test is 65. A statistics instructor thinks the mean score is higher than 65. He samples ten statistics students and obtains the scores 65; 65; 70; 67; 66; 63; 63; 68; 72; 71. He performs a hypothesis test using a 5% level of significance. The data are from a normal distribution.

…

Distribution for the test: If you read the problem carefully, you will notice that there is no population standard deviation given. You are only given n = 10 sample data values. Notice also that the data come from a normal distribution. This means that the distribution for the test is a student’s-t.”

Since the data are given in the question, the decision on whether to use a t-test should be based on those data, not on an assumption artificially supplied in the statement of the question.
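Both the test statistic and a data-based check of the normality assumption can be computed from the ten scores quoted in Example 9.13. Here is a minimal Python sketch (3.8 or later, for `statistics.NormalDist`). The normal scores are the coordinates of a normal probability plot and use the plotting position (i - 0.5)/n, which is one common choice among several. Roughly linear pairs are consistent with normality, although ten points give only weak evidence either way:

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

scores = [65, 65, 70, 67, 66, 63, 63, 68, 72, 71]  # data quoted in Example 9.13
mu0 = 65  # hypothesized mean under H0

n = len(scores)
xbar = mean(scores)
s = stdev(scores)
t = (xbar - mu0) / (s / sqrt(n))  # one-sample t statistic, n - 1 degrees of freedom

# Normal probability plot coordinates: the i-th smallest value is paired with
# the standard normal quantile at plotting position (i - 0.5) / n.
ordered = sorted(scores)
z = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

for q, x in zip(z, ordered):
    print(f"{q:6.2f}  {x}")
print(f"mean = {xbar}, s = {s:.3f}, t = {t:.3f} on {n - 1} df")
```

For these data the sample mean is 67 and t is about 1.98 on 9 degrees of freedom. Students can plot or simply inspect the printed pairs to decide whether a straight line is plausible, instead of being told that the data are normal.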

While this textbook does an excellent job of integrating graphing calculators and includes a large number of collaborative exercises, it does not come close to matching my needs for a textbook for an introductory statistics course. I feel that the first three recommendations of the GAISE college report are all critical, and I do not believe that this textbook adequately addresses any of the three. I personally would not consider adopting it without extensive revision.

**2. Reviewed by:** Robin Susanto

#### Q: The text covers all areas and ideas of the subject appropriately and provides an effective index and/or glossary

The text covers most of the topics I teach in an Introductory Statistics course, and covers them at the appropriate depth. Two omissions are Experimental Design and Bayes' Theorem.

I would like to see more detailed coverage in some areas, such as Sampling and Bias, the Central Limit Theorem for Proportions, and a few others. An explicit explanation of the Scales of Measurement would also be helpful in the discussion of Data and Variables. In Regression, I would like to see a discussion of why we should not make predictions outside the range of the data.

On the other hand, some areas receive more coverage than they should. In Linear Regression and Correlation, for example, I could do with a lot less manual calculation and sketching of the least-squares line.

But overall, coverage and depth are satisfactory.

I am able to find what I am looking for in the index. The glossary looks fine.

**Comprehensiveness Rating:** 4 out of 5

#### Q: Content is accurate, error-free and unbiased

The definition of Median (p. 59) is incorrect if there are repeated values in the data. I understand that, from a pedagogical point of view, it is sometimes preferable to present students, especially at the introductory level, with a 'simplified' definition they can understand intuitively rather than a technically correct one that may confuse them or discourage learning. Even so, a footnote explaining how this definition may not work in some situations is needed.
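When values repeat, a definition of the median that describes it as splitting the data into equal halves can fail when read literally, even though the median itself is well defined. That reading of the page 59 definition is an assumption here. Here is a small Python illustration with a made-up data set:

```python
from statistics import median

data = [1, 2, 2, 2, 3]
m = median(data)

# Count how the data fall relative to the median.
below = sum(x < m for x in data)
equal = sum(x == m for x in data)
above = sum(x > m for x in data)

print(m, below, equal, above)  # 2 1 3 1
```

Only one of the five values lies strictly below the median and one strictly above it, while the other three equal it. What does hold is the weaker statement that at least half the values are at or below the median and at least half are at or above it.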

I did not find any other 'errors,' although some definitions, in my opinion, could be better worded.

**Content Accuracy Rating:** 3 out of 5

#### Q: Content is up-to-date, but not in a way that will quickly make the text obsolete within a short period of time. The text is written and/or arranged in such a way that necessary updates will be relatively easy and straightforward to implement

I share the authors' philosophy of making the text contemporary without giving it too short a shelf life. Most of the examples and exercises use made-up data. One advantage of this is that, unlike real-life data examples, they are not tied to a particular time and therefore will not quickly become outdated.

Some of the real-life data examples and exercises are student-generated. While this is an excellent way to promote student involvement, I feel that better guidance is needed.

For instance, almost all of the student-generated exercises on pp. 397-404 were written in verse. Were the students instructed to do so? I appreciate originality, and writing Statistics problems in verse is original, unless everyone else is doing it too. Many of these exercises are also lacking from a technical point of view.

**Relevance Rating:** 3 out of 5

#### Q: The text is written in lucid, accessible prose, and provides adequate context for any jargon/technical terminology used

The language itself is good. It strikes the right balance of accessibility and technical accuracy. This is very important for an Introductory Statistics text, where the main challenge for the instructor is to explain complicated and subtle concepts to students with limited mathematical background, many of whom are ESL students.

But some explanations could be better worded. The definitions of types of data and types of variables are confusing. An explicit discussion of scales of measurement is needed.

**Clarity Rating:** 3 out of 5

#### Q: The text is internally consistent in terms of terminology and framework

I see no problem with respect to consistency of terminology. The group exercises are also consistent with the text's collaborative approach.

**Consistency Rating:** 4 out of 5

#### Q: The text is easily and readily divisible into smaller reading sections that can be assigned at different points within the course (i.e., enormous blocks of text without subheadings should be avoided). The text should not be overly self-referential, and should be easily reorganized and realigned with various subunits of a course without presenting much disruption to the reader.

I usually think of Correlation as an introduction to Regression, and I treat the two as related but separate topics. In the text, they are enmeshed. But other than this, I see no problem with modularity.

Although the sequence of topics I use is different from the text's (e.g., I do Correlation and Regression before Probability), I don't see any problem with this, as I can easily 'jump around' the text.

**Modularity Rating:** 3 out of 5

#### Q: The topics in the text are presented in a logical, clear fashion

I see no problem with the text in this respect.

**Organization Rating:** 4 out of 5

#### Q: The text is free of significant interface issues, including navigation problems, distortion of images/charts, and any other display features that may distract or confuse the reader

I see no problem with this item.

**Interface Rating:** 4 out of 5

#### Q: The text contains no grammatical errors

I found a couple of minor typos:

p. 18 last paragraph should read: “Any group of n individuals is equally likely to be chosen as any other group of n individuals.”

p. 532, the second sentence in the first bullet under “The assumptions underlying the test of significance are:” should read “In other words the expected value of y for each particular x value lies on a straight line in the population.”

But these are minor, and I did not notice any others.

**Grammar Rating:** 4 out of 5

#### Q: The text is not culturally insensitive or offensive in any way. It should make use of examples that are inclusive of a variety of races, ethnicities, and backgrounds

The examples in the text are inclusive of the cultures that make up the Canadian mosaic. Other than race and ethnicity, it is also important to me that a text is inclusive of people from different economic backgrounds, and this text is. In addition to business examples that refer, for example, to sales figures in the millions of dollars, there are also many examples of situations that working-class or middle-class people would find themselves in. More examples of small businesses or non-profits would be welcome.

**Cultural Relevance Rating:** 5 out of 5

#### Q: Are there any other comments you would like to make about this book, for example, its appropriateness in a Canadian context or specific updates you think need to be made?

Most, if not all, of the examples involving politics are American. It would be nice to see examples involving Canadian political institutions, geography, etc.

I think the text does an excellent job in facilitating students' participation and collaboration.

Where it falls short is in encouraging Statistical thinking. There is too much rote calculation and recipe-following (e.g., calculating the least-squares line), and not enough discussion of why one should choose one statistical procedure over another. This is a serious shortcoming in my opinion.

While I consider this text a valuable resource, I will not be adopting it for my class.

**3. Reviewed by:** Richard Lockhart

**Institution:** Simon Fraser University

**Title/Position:** Professor and Chair

**Overall Rating:** 3.9 out of 5

#### Q: The text covers all areas and ideas of the subject appropriately and provides an effective index and/or glossary

This textbook is very long and covers a certain scope of material very completely at the level it targets. The number of procedures covered from Chapter 8 through Chapter 13 is very large.

However, I believe an instructor using a textbook like this would find the comprehensiveness over the top. For instance, probability runs from page 113 to page 251 or so. There is extensive discussion of special distributions: Binomial, Geometric, Hypergeometric, Uniform, Exponential, and finally Normal. This is much more probability than we would ever do in our courses, other than our calculus-based course.

One of the things I like least about elementary statistics courses is that we continue to teach students to use tables when we never ever use them ourselves. I understand that we find it easiest to give tests where students can use tables, but we really need, as a discipline, to move beyond that. There is some focus on computing probabilities using tables, but the book does recognize that tables are no longer really useful or used. Unfortunately, the solution adopted here relies on calculators rather than computers. This makes it unsuitable for a number of our courses at SFU, where computing must be part of the syllabus.

For students in the social sciences there are some gaps. One is the language of scales of measurement (nominal, ordinal, ratio, and interval). Another is that the discussion of cross-tabulation, contingency tables, and measures of association seems to be limited to illustrating probability calculations, followed by a short section on tests of independence in Chapter 11. My own view is that the explanation of the interpretation of independence is a bit thin. I also notice that quite a number of the contingency table examples have rows, columns, or both with ordered categories. The usual Pearson chi-squared test is generally a bad idea in this context. I prefer illustrations where the suggested technique is likely to be a good technique.
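The Pearson chi-squared test is a poor fit for ordered categories because its statistic does not change when the rows (or columns) are reordered. It therefore cannot use the ordering at all. Here is a minimal Python sketch with made-up counts:

```python
from math import isclose


def pearson_chi2(table):
    """Pearson chi-squared statistic for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for row, r_tot in zip(table, row_totals):
        for observed, c_tot in zip(row, col_totals):
            expected = r_tot * c_tot / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts: rows are ordered categories (low, medium, high),
# columns are two groups.
ordered = [[20, 10], [15, 15], [10, 20]]
# The same rows with the ordering scrambled.
scrambled = [[15, 15], [20, 10], [10, 20]]

print(pearson_chi2(ordered), pearson_chi2(scrambled))
print(isclose(pearson_chi2(ordered), pearson_chi2(scrambled)))  # True
```

The steady trend across the ordered rows in the first table earns no more credit than the same counts in scrambled order.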

**Comprehensiveness Rating:** 4 out of 5

#### Q: Content is accurate, error-free and unbiased

I have only a few complaints here.

I read phrases I didn't like from time to time but I always feel that way when I read texts.

Here are some examples, though, including at least one which bothers me:

"True random sampling is done with replacement." Page 20. I would not say this to students, as if sampling without replacement were some inferior form of survey.

"When you analyze data, it is important to be aware of sampling errors and nonsampling errors. The actual process of sampling causes sampling errors. For example, the sample may not be large enough." Page 21. I really don't like tying sample size to the issue of "sampling errors".

"For example, in a college population of 10,000 people, suppose you want to randomly pick a sample of 1000 for a survey. For any particular sample of 1000, if you are sampling with replacement, the chance of picking the first person is 1000 out of 10,000 (0.1000); the chance of picking a different second person for this sample is 999 out of 10,000 (0.0999); the chance of picking the same person again is 1 out of 10,000 (very low)." Page 21.

I really don't like this one. What does it mean to say "the chance of picking the first person is 1000 out of 10,000 (0.1000)"? This chance seems to distinguish some group of 1000 people from a group of 9000 people. Who are these people? The 1000 people in the sample? What is meant by "the first person"? And why is 999 out of 10,000 the right probability of anything? It feels like the authors didn't think through what they were saying here very carefully. I hope that does not reflect a general pattern, but I confess that I have not read the whole book with the sort of attention needed to spot this sort of problem.
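There is one reading under which "1000 out of 10,000" is a meaningful probability. Under simple random sampling without replacement, each individual's chance of ending up in the sample is n/N. That is an interpretation offered here, not the textbook's own explanation. Here is a minimal Python sketch that computes this exactly and contrasts it with sampling with replacement:

```python
from fractions import Fraction
from math import comb

N, n = 10_000, 1_000

# Without replacement, every subset of size n is equally likely.
# P(a given person is included) = (samples containing that person) / (all samples).
p_without = Fraction(comb(N - 1, n - 1), comb(N, n))

# With replacement, the n draws are independent; a given person is
# included unless every one of the n draws misses them.
p_with = 1 - Fraction(N - 1, N) ** n

print(p_without)  # 1/10
print(float(p_with))  # about 0.0952, slightly below 0.1
```

In this reading, 0.1 is a property of each person in the population, not of "the first person" drawn. With replacement, the inclusion probability is a little smaller, because some draws repeat people already chosen.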

"1.8 Answers and Rounding Off" on page 26. I think this is fine but do non-science students these days really understand phrases like "carry your final answer one more decimal place"?

Is the bar graph in Example 2.4 a good idea? Indeed, is it a good idea to have age groups 13-25 (thirteen years), then 26-44 (19 years) and 45-64 (20 years)? I don't think so; a histogram here would have quite different bar widths. Even if that is the way the data came from the source, we have an obligation to try to help people understand what sort of groups they ought to make.

The three-dimensional graphs in Example 2.5 ought to be discouraged, I think.

"Sampling Distributions and Statistic of a Sampling Distribution". This is the title of subsection 2.7.2 on page 69. What is "Statistic of a Sampling Distribution"? This little subsection contains the phrase "If you let the number of samples get very large (say, 300 million or more), the relative frequency table becomes a relative frequency distribution." If you look at Table 2.6, you are entitled to ask whether it contains 1 sample or 30 samples, and then to ask what it means to "let the number of samples get very large".

On page 74 I see mu-bar in the formulas for population standard deviation.

"The statistic of a sampling distribution was discussed in Descriptive Statistics: Measuring the Center of the Data." Page 74. Really? I still attach no meaning to the first six words of that sentence.

Section 3.5 on Contingency Tables. In Chapter 3, sample data is often used to DEFINE probabilities. I feel this runs the risk of confusing sample values (the statistics in the tables in this chapter) with population values. Since we spend a lot of effort on this distinction, I wonder if it is wise to be so vague about the difference in this context.

Do others like the discussion, on page 159, of "Random Variable Notation"? Look at "If X is a random variable, then X is written in words, and x is given as a number." And earlier on the page: "A random variable describes the outcomes of a statistical experiment in words." I would find this unteachable, but others might cope.

In Section 4.5.1 on page 166 I see the phrase "The parameters are n and p". I don't see that "parameter" has been used in this sense before. I think the authors are sometimes not careful about explaining new words as they use them; they appear to forget occasionally that some of these words have multiple technical uses. In particular, n and p in a binomial model have not been connected to the population values of some numbers, which is the previous meaning assigned to "parameter".

"Often real estate prices fit a normal distribution." Page 253. Really? I doubt it profoundly.

I am not happy about the "Empirical Rule". "About 68.27% of the x values lie between -1σ and +1σ of the mean μ (within 1 standard deviation of the mean)." That is a lot of digits to pair with the phrase "empirical rule" and the word "about".
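The four-digit figure is a theoretical normal-curve probability rather than an empirical one. A minimal Python sketch (an illustration added here, not the textbook's) computes the share of an exactly normal distribution lying within k standard deviations of the mean using the error function, P(|X - μ| < kσ) = erf(k/√2). It assumes exact normality, which real data need not satisfy; the function name is hypothetical.

```python
from math import erf, sqrt

def normal_within_k_sd(k: float) -> float:
    """P(|X - mu| < k * sigma) for an exactly normal X, via the error function."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    # prints 68.27%, 95.45%, 99.73% for k = 1, 2, 3
    print(f"within {k} SD: {normal_within_k_sd(k):.2%}")
```

Rounded to "about 68%, 95% and 99.7%", these are what one would expect to see quoted as a rule of thumb for data that are only approximately normal.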

**Content Accuracy Rating:** 4 out of 5

#### Q: Content is up-to-date, but not in a way that will quickly make the text obsolete within a short period of time. The text is written and/or arranged in such a way that necessary updates will be relatively easy and straightforward to implement

The text discusses computing only in the context of a specific brand of calculator. When we teach intro stats for social science students, for instance, we introduce them to SPSS -- our client departments (sociology and anthropology, criminology, communications and other arts programs) are very anxious that we do such a thing. The calculator references will be out of date rather quickly, and I believe strongly that statistics without computing will leave students with no ability to connect our course with the statistics in their own disciplines. On the other hand, the use of calculators is substantially confined to specific sections near the ends of units; perhaps these could be replaced by computing units.

I don't think the presentation of the material could be called modern but the basic ideas underlying the Neyman-Pearson approach have not changed so this is probably ok.

**Relevance Rating:** 3 out of 5

#### Q: The text is written in lucid, accessible prose, and provides adequate context for any jargon/technical terminology used

I think this is true. Occasionally they seem to pick a piece of jargon and reuse it rather than re-explain it, but generally it is quite all right.

**Clarity Rating:** 4 out of 5

#### Q: The text is internally consistent in terms of terminology and framework

I noticed no problems here.

**Consistency Rating:** 4 out of 5

#### Q: The text is easily and readily divisible into smaller reading sections that can be assigned at different points within the course (i.e., enormous blocks of text without subheadings should be avoided). The text should not be overly self-referential, and should be easily reorganized and realigned with various subunits of a course without presenting much disruption to the reader.

I don't think it is all that modular. It feels to me that it might be hard to skip the probability sections and get on to the normal curve directly. That would be a problem for our courses -- we have thirteen weeks to complete the one course most of these students will take, and the ideas underlying hypothesis testing and confidence intervals seem to me to be more important than mastering jargon like "mutually exclusive".

**Modularity Rating:** 3 out of 5

#### Q: The topics in the text are presented in a logical, clear fashion

As Shane Rollans says, the index is computer-generated and not useful. In an online/PDF document the page references in an index ought to be active links. The actual order is very standard, and that is just fine.

**Organization Rating:** 4 out of 5

#### Q: The text is free of significant interface issues, including navigation problems, distortion of images/charts, and any other display features that may distract or confuse the reader

I guess my comment about active links belongs here. I clicked on a number of links in the text and a depressing number did not lead to the objects they should have. This will be a problem for a long time to come in on-line materials and is not limited to this text.

**Interface Rating:** 4 out of 5

#### Q: The text contains no grammatical errors

No complaints from me.

**Grammar Rating:** 4 out of 5

#### Q: The text is not culturally insensitive or offensive in any way. It should make use of examples that are inclusive of a variety of races, ethnicities, and backgrounds

No complaints from me.

**Cultural Relevance Rating:** 5 out of 5

#### Q: Are there any other comments you would like to make about this book, for example, its appropriateness in a Canadian context or specific updates you think need to be made?

In the material above I gave some commentary on the specific Review Criteria which we were given. I also want to discuss the issue in terms of who might actually use this text.

I am reviewing a textbook for an introductory Statistics course. I have in mind two potential uses of the text: use in some course in my department at SFU, and use in some other post-secondary institution in BC. I am, I think, better qualified to be firm about the value of the text in the former context than in the latter. I will start, then, with the question: is this a useful text for the Statistics and Actuarial Science Department at Simon Fraser University? I think not. Overall I think the text is reasonable and sensible and has no significant technical flaws. But the book is pointed at an audience which is comfortable with more mathematical notation than I think is wise for our non-calculus-based courses. At the same time, the mathematical level is too low for our calculus-based introductions. Thus I doubt that it will be used in any courses we offer. Here are some more details and concerns with respect to actually using the text.

We teach three non-calculus introductions (general, social science, and life science students are the three target audiences) and one calculus-based introduction. Only in the latter do I use the Greek letters which are used often in this book. I think the formulas and the algebra are not really suitable for the social science non-calculus course and probably would be problematic in our general course as well. Life science students are required to take calculus, so the notation may be okay there. In any case, I would much prefer a text which did not have so many formulas and symbols for the non-calculus introductions. Look, for instance, at the formula atop page 59, where they solve an equation to find out how many bars are needed in a histogram. I, for one, certainly avoid even the tiniest bit of algebra, since it encourages students to think that the algebra is the important part.

On collaborative activities: I guess that a lot of instructors would find many of these activities hard to do in a room with 250 students. They might be a good idea, though. I didn't get the feeling that the collaborative activities were terribly central in spite of the title of the book.

If I were using this book for a life sciences audience, I think I would be a bit disappointed by the examples. Many of them use data which is convenient to find on the web or to generate in a class or in a small group. I see the value in this but worry that the result is data which is unconnected with the life science material the students are studying elsewhere. I think there is a serious risk that students in a statistics course will fail to see the relevance of the ideas to their own science.