1) Descriptive Statistics
   a) Central tendency: mean, median, mode
   b) Variation: standard deviation, variance, CV = sd/mean
   c) Shape: skewness, assessed using histograms: hist()

2) Normal distribution
   a) Finding the area under the curve: the probability of some event.
      e.g. if we randomly pick a person from this group, what are the chances that this person's age > 25, or salary < 75000, or weight is between 150 and 170 lbs? ==> pnorm()
   b) The area is given and we want the cutoff point(s) for that area. ==> qnorm()

3) Confidence Intervals
   LowerLimit < mu < UpperLimit
   LowerLimit = xbar - z*sd/sqrt(n)
   UpperLimit = xbar + z*sd/sqrt(n)
   90% confidence level -> z = 1.645
   95% confidence level -> z = 1.96
   99% confidence level -> z = 2.58
   For example: qnorm(0.95) = 1.645 (a 90% confidence level leaves 5% in each tail)

4) Hypothesis testing - One sample, Two sided (mu = A or not)
   a) Construct your hypotheses
      H0: mu = A
      Ha: mu != A
   b) i) Calculate zcrit and zstat using the formulas. Sketch the distribution and look at where zstat falls relative to zcrit.
      ii) Use the t.test() function. Look at the p-value.

5) Hypothesis testing - One sample, One sided (mu > A or not)
   a) Construct your hypotheses
      Right tail: H0: mu <= A   Ha: mu > A
      Left tail:  H0: mu >= A   Ha: mu < A
   b) Same procedure as in the two-sided case, except don't divide alpha by 2.
      If the question is ">", shade the right tail. If the question is "<", shade the left tail.

6) Hypothesis testing - Two samples
   a) Paired t-test: t.test(..., paired = TRUE, ...)
   b) Equal variances t-test: t.test(..., var.equal = TRUE, ...)
   c) Separate variances t-test: t.test(..., var.equal = FALSE, ...)
   Look at the p-value. If it is less than alpha, reject H0. (The two-sided p-value reported by t.test() already accounts for both tails, so compare it with alpha, not alpha/2.)

7) z-test for proportions
   One categorical variable. Test claims about the proportion of one category.
   e.g. The proportion of voters supporting candidate A is equal to 55%.
   Majority: more than 50% of voters support candidate A.

8) Chi-Square test for independence
   Two categorical variables. Test whether these two variables are independent or not.
   Use chisq.test() and look at the p-value. If p-value < alpha, reject H0, which says the variables are independent.

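As an illustration of the chi-square test for independence in item 8, here is a minimal sketch using base R's chisq.test(). The counts, the variable names (gender, candidate) and alpha = 0.05 are made up for this example and are not taken from these notes.

```r
# Hypothetical contingency table (made-up counts):
# rows = gender, columns = preferred candidate
counts <- matrix(c(30, 20,
                   15, 35),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(gender = c("F", "M"),
                                 candidate = c("A", "B")))

alpha <- 0.05  # assumed significance level

# H0: gender and candidate preference are independent
result <- chisq.test(counts)

print(result$expected)  # expected counts under H0
print(result$p.value)   # p-value of the test

# Decision rule from item 8: reject H0 when p-value < alpha
reject_h0 <- result$p.value < alpha
print(reject_h0)
```

For a 2x2 table like this one, chisq.test() applies Yates' continuity correction by default; pass correct = FALSE to turn it off.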
9) ANOVA F test
   One numerical and one categorical variable. You are testing whether the mean of the numerical variable differs across the categories of the categorical variable.
   H0: mu1 = mu2 = mu3 = ... = mun
   Ha: At least one mean is different.
   Use the aov() function (with summary()) and look at the p-value to see if you can reject H0.

10) Linear Regression
    Simple linear regression: Y = b0 + b1*X
    Multiple linear regression: Y = b0 + b1*X1 + b2*X2 + ... + bn*Xn
    Calculate the b coefficients using the lm() function. Make sure the model is a good fit. Understand what each coefficient means.
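
The lm() step in item 10 could look like the sketch below. It uses R's built-in mtcars data set, which is an assumption for illustration and not part of these notes. The choice of mpg as the response and wt and hp as predictors is likewise arbitrary.

```r
# Built-in example data (assumption: mtcars stands in for your own data)
data(mtcars)

# Simple linear regression: mpg = b0 + b1*wt
simple_fit <- lm(mpg ~ wt, data = mtcars)

# Multiple linear regression: mpg = b0 + b1*wt + b2*hp
multi_fit <- lm(mpg ~ wt + hp, data = mtcars)

# Estimated b coefficients
print(coef(simple_fit))
print(coef(multi_fit))

# Checking the fit: R-squared and the per-coefficient p-values
multi_summary <- summary(multi_fit)
print(multi_summary$r.squared)
print(multi_summary$coefficients)
```

Each slope is read as the expected change in Y for a one-unit increase in that predictor, holding the other predictors fixed. The intercept b0 is the expected Y when all predictors are 0.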