simpleR – Using R for Introductory Statistics
These notes are an introduction to using the statistical software package R for an introductory statistics course. They are meant to accompany an introductory statistics book such as Kitchens “Exploring Statistics”. The goals are not to show all the features of R, or to replace a standard textbook, but rather to be used with a textbook to illustrate the features of R that can be learned in a one-semester, introductory statistics course. These notes were written to take advantage of R version 1.5.0 or later. For pedagogical reasons the equals sign, =, is used as an assignment operator and not the traditional arrow combination <-.

The > is called the prompt. In what follows below it is not typed, but is used to indicate where you are to type if you follow the examples. If a command is too long to fit on a line, a + is used for the continuation prompt.
Entering data with c
The most useful R command for quickly entering small data sets is the c function. This function combines, or concatenates, terms together. As an example, suppose we have the following count of the number of typos per page of these notes:

2 3 0 3 1 0 0 1

To enter this into an R session we do so with

> typos = c(2,3,0,3,1,0,0,1)
> typos
[1] 2 3 0 3 1 0 0 1

Notice a few things

• We assigned the values to a variable called typos
• The assignment operator is a =. This is valid as of R version 1.4.0. Previously it was (and still can be) a <-.

Functions can be applied to the stored data. For example, the mean function finds the mean, or average:

> mean(typos)
[1] 1.25

As well, we could call the median, or var to find the median or sample variance. The syntax is the same – the function name followed by parentheses to contain the argument(s):

> median(typos)
[1] 1
> var(typos)
[1] 1.642857
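To see what mean and var are reporting, the same quantities can be computed by hand from the data entered with c. This is only a sketch for checking the built-in results; the names n, xbar and s2 are illustrative and do not come from the notes.

```r
# The typo counts from the example above, entered with c
typos = c(2,3,0,3,1,0,0,1)

n = length(typos)                      # number of pages (8)
xbar = sum(typos) / n                  # the mean, computed by hand
s2 = sum((typos - xbar)^2) / (n - 1)   # sample variance: divide by n - 1

xbar    # same value as mean(typos)
s2      # same value as var(typos)
```

The division by n - 1 rather than n is what makes var the sample variance.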
Data is a vector
The data is stored in R as a vector. This means simply that it keeps track of the order that the data is entered in. In particular there is a first element, a second element, up to a last element. This is a good thing for several reasons:

• Our simple data vector typos has a natural order – page 1, page 2 etc. We wouldn’t want to mix these up.
• We would like to be able to make changes to the data item by item instead of having to enter in the entire data set again.
• Vectors are also a mathematical object. There are natural extensions of mathematical concepts such as addition and multiplication that make it easy to work with data when they are vectors.

Let’s see how these apply to our typos example. First, suppose these are the typos for the first draft of section 1 of these notes. We might want to keep track of our various drafts as the typos change. This could be done by the following:

> typos.draft1 = c(2,3,0,3,1,0,0,1)
> typos.draft2 = c(0,3,0,3,1,0,0,1)

That is, the two typos on the first page were fixed. Notice the two different variable names. Unlike many other languages, the period is only used as punctuation. In the R versions these notes were written for, you can’t use an _ (underscore) to punctuate names as you might in other programming languages, so the period is quite useful.

Now, you might say, that is a lot of work to type in the data a second time. Can’t I just tell R to change the first page? The answer of course is “yes”. Here is how

> typos.draft1 = c(2,3,0,3,1,0,0,1)
> typos.draft2 = typos.draft1 # make a copy
> typos.draft2[1] = 0 # assign the first page 0 typos

Now notice a few things. First, the comment character, #, is used to make comments. Basically anything after the comment character is ignored (by R, hopefully not the reader). More importantly, the assignment to the first entry in the vector typos.draft2 is done by referencing the first entry in the vector. This is done with square brackets [].
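The item-by-item change and the vector arithmetic mentioned above can be combined in a short sketch. Subtracting one draft from the other works element by element, so the result shows how many typos were fixed on each page. The variable name fixed is illustrative and not part of the notes.

```r
typos.draft1 = c(2,3,0,3,1,0,0,1)
typos.draft2 = typos.draft1     # make a copy
typos.draft2[1] = 0             # change only the first page

# Subtraction is done element by element: typos fixed on each page
fixed = typos.draft1 - typos.draft2
fixed
sum(fixed)                      # total number of typos fixed

# The copy is independent, so the first draft is unchanged
typos.draft1[1]
```

Because the assignment only touched the copy, typos.draft1 still records the two typos on page 1.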
It is important to keep this in mind: parentheses () are for functions, and square brackets [] are for vectors (and later arrays and lists). In particular, we have the following values currently in typos.draft2

> typos.draft2
[1] 0 3 0 3 1 0 0...
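The difference between parentheses and square brackets can be seen side by side in a minimal sketch. It uses the second-draft values given earlier; the names n.pages, first.page and last.page are illustrative.

```r
typos.draft2 = c(0,3,0,3,1,0,0,1)

# Parentheses call a function on the whole vector
n.pages = length(typos.draft2)

# Square brackets pick out an entry of the vector by position
first.page = typos.draft2[1]
last.page = typos.draft2[n.pages]   # a function result used as an index

n.pages
first.page
last.page
```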