Biol/Stat 2244A (FW20) – Lab Assignment
Objectives
Lab Assignment 4 provides an opportunity to experience the Conclude stage of the Scientific Inquiry
Framework (“PPDAC”), in addition to applying concepts from earlier stages. Specifically, this includes:
i. applying your understanding of vocabulary and concepts associated with interpreting analytical
results;
ii. reporting conclusions in appropriate scientific formats.
To achieve these objectives, you will need to draw on content from the pre-lab module that covers possible
statistical errors, power, and effect size, as well as on tools and concepts described in previous lab lessons and
course lecture material.
Background for Assignment
In the first Lab Assignment, you were introduced to a little background information that explained the
logic behind the following Research Objective for 2244 labs:
Characterize the effect of practicing yoga on the balance/stability of visually impaired people.
In Lab Assignment 1, you were tasked with Planning a sampling and study design to collect data to
address some research questions related to this Objective. In Lab Assignment 2, you were introduced to
the sampling and study design for a research study related to this objective and provided datafiles from
that study. You should re-acquaint yourself with the sampling and study design that generated the data,
and the Data Description file you were provided. In Lab Assignment 3, you conducted two detailed
statistical analyses.
Instructions for Assignment
To help you approach this Assignment in a logical/organized fashion, you are encouraged to follow these
steps (in the order presented).
1. Refresh your memory (as needed) about the data we are working with, as described in the Data
Description.
2. Read through the remainder of this Assignment file, including the “Reminders/Tips for Success”,
the Assignment Questions, and the comments related to Marking Rubrics so that you know what
you are being asked to do.
3. Open the “Answer template” file. Use this file to type/enter your answers to the Assignment
Questions; it is set up with the proper headings for this Assignment; you just need to input your
answers (use whatever space you need to do so).
4. Answer the Assignment Questions (below).
Reminders/Tips for Success
1. Make interpreting your R code easy and ensure that it is functional. The best approach is to generate
an R script file for each numbered Question that requires use of R in any way. Put everything in that
R script file that takes you from loading packages to completing the question. Annotate your more
complicated lines of code with #comments (as demonstrated in the first in-lab session and in the
example provided in the Assignment Guidelines and Format file). Then, simply copy/paste the
contents of that script file into your answer (and include the Output if applicable) when asked for R
code. We should then be able to copy/paste what you’ve included, and the code should run without
problems!
2. Your answers should be written specifically for the research study/context (in terms of variables,
sample, units, measurements, etc.) with which we are dealing. For example, it’s insufficient to talk
about the “response variable”; we should be talking about the actual name of that variable, e.g.
concentration of testosterone (using an example from our in-class case study on heel height and
male behavior). This idea of using the context or “language” of the Problem has repeatedly been
illustrated in more recent video lectures. Being specific means using the context of the research; a
sentence like, “the distribution of student heights in my sample of 2244 students needs to be
symmetric” is explicit about what needs to be symmetric AND uses the vocabulary of the study.
3. Demonstrate your understanding of course content through application, not definition. Questions
that ask you to discuss something with reference to particular course concepts (e.g. sampling
variability) require an application of those course concepts to the current scenario/situation. That
means that simply providing the definition of those concepts is not going to result in points awarded
for the Question. Your answers should demonstrate you understand that concept and why it applies
(or doesn’t!) or is useful in the particular context.
4. Use what we have been doing in lecture as cues to understand the questions. Everything you are
asked to do in this Assignment is somehow illustrated/discussed during a lecture lesson and/or a lab
lesson. Your first step in trying to understand what a question is asking is to go back to your
notes/videos.
5. Show us what you know, completely. Most inference procedures have more than one ‘condition’
that must be met for the underlying model to be valid. If we violate any of the conditions, we
shouldn’t use the procedure. In these situations, it may seem ‘redundant’ to continue to check the
other conditions of a model. Remember that these Assignments are assessing your understanding
and application of course material. Occasionally, we violate conditions for a model (it’s bound to
happen with real data!). Be sure to assess ALL conditions of a model completely, regardless of
whether one or more is violated. Show us what you know!
6. If you get stuck with R, at least tell us what you wanted to do. We recognize this is your first course
that involves using R (and for many of you, any kind of programming language). Some of these
Assignment questions will be tough; others should be quite accessible with some careful thought
and application of what you are learning. If you get stuck and run out of time to get help to “unstick”,
don’t leave an answer blank. Tell us what you were trying to do, show us the code you were trying to
use, or what functions and types of arguments you think would be relevant. That is, walk us through
your thought process. It likely won’t be worth full marks, but some part marks may be obtainable.
7. How to write ‘symbols’ in a document. For some questions, you may need to use symbols to
represent specific values. If you use a word processing software like Word (i.e. .docx) to create your
assignment file, you will likely find most of the symbols you would need can be inserted either from
the Insert/Symbol menu, or, by using the Equation Editor (also part of the Insert menu).
Alternatively, if you aren’t sure how to get mathematical symbols in your word processing software,
you can use the following “phonetic” symbols; when we see these “words”, we will interpret them as
the corresponding symbols. In all cases, feel free to use subscripts liberally to help communicate with
these symbols.
symbol    “phonetic” version
σ         sigma
µ         mu
x̄         x-bar
p̂         p-hat
ε         epsilon
In addition, you are always welcome—at the end of a given question—to provide a short commentary
justifying/explaining any choices you made for which variables, subsets, etc. or reasoning you used to
answer a question. Help us understand your thought process when working with our data.
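As an illustration of Tip 1 above, here is a minimal sketch of what a self-contained, annotated R script for one question might look like. The data frame `example_data` and its values are made up purely for illustration (they are not from the study, and this is not an answer to any Assignment Question); in your own script you would load the course data file instead.

```r
# Question X (hypothetical example, not an actual Assignment Question)

# Made-up data so the script runs on its own when copied/pasted into R;
# in a real answer, load the course data file here instead.
example_data <- data.frame(
  group = rep(c("A", "B"), each = 4),
  score = c(10.2, 11.5, 9.8, 10.9, 12.1, 13.0, 11.7, 12.6)
)

# Calculate the mean score separately for each group
group_means <- tapply(example_data$score, example_data$group, mean)

# Print the result so the output can be pasted into the answer
print(group_means)
```

Because everything from creating (or loading) the data to printing the result is in one place, anyone should be able to paste the script into R and reproduce the output.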
Assignment Questions
Question 1.
In the abstract, Jeter et al. claim the “groups were equivalent at baseline (all p > 0.05)”. By groups, they
are referring to the “AYT” and “waitlist” groups.
In the methods section, they describe the analysis that they used to make that claim: “Independent t-
tests were conducted to determine baseline differences between groups” (page 11). By “independent t-
tests” they mean the test for a difference between two means that you learned in class.
In the results section, they elaborate very little on the claim from the abstract: “There were no significant
differences at baseline between groups for all COP variables (all p > 0.05).” (page 13). By COP variables,
they mean Firm_EO, Firm_EC, Foam_EO, and Foam_EC.
No further details about this part of the analysis are presented in the paper.
a. Briefly describe (1 or 2 sentences) why it is important to investigate whether the baseline values
of the COP variables differ between the two experimental conditions.
b. List the pieces of information that you were taught to present in the conclusion step of the data
analysis protocol, that the authors have not stated in this analysis.
c. Is the authors’ claim that the groups are equivalent justified based solely on the information
presented in the paper? Briefly justify your answer.
Question 2.
In the methods section, Jeter et al. state “this study was not powered to detect between group
differences following AYT” (page 11). By this they mean that a statistical test comparing the mean change
in COP (baseline – post) in the AYT group to the mean change in COP in the waitlist group would have low
statistical power.
a. Use R to calculate the power of such an analysis to detect a small, medium, and large effect (as
defined by Cohen) based on the size of the sample in the study and alpha = 0.05 (as used in the
study). Report your R code and output.
b. Briefly describe the meaning of your findings from part a. It is not enough to say, “the power is
…”. This question asks you to describe in your own words (1 to 3 sentences) what the numbers
you calculated mean about what type of statistical test you evaluated and about its usefulness in
the situation described.
c. Do you agree with the authors’ assessment of their power? Briefly justify your response.
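The general shape of a power calculation like the one in Question 2a can be sketched as follows. This is only an illustration under stated assumptions: `n_per_group` is a placeholder value (not the study's sample size), the two groups are assumed to be the same size, and the sketch uses `power.t.test()` from base R, which may differ from the function shown in your lab lessons. The effect sizes are Cohen's conventional benchmarks for d (0.2, 0.5, 0.8).

```r
# Cohen's conventional benchmarks for the effect size d
effect_sizes <- c(small = 0.2, medium = 0.5, large = 0.8)

# PLACEHOLDER group size (an assumption for illustration only);
# replace it with the number of participants per group in the study
n_per_group <- 10

# With sd = 1, delta is expressed in standard-deviation units, i.e. it equals d
power_by_effect <- sapply(effect_sizes, function(d) {
  power.t.test(n = n_per_group, delta = d, sd = 1,
               sig.level = 0.05, type = "two.sample",
               alternative = "two.sided")$power
})

# Power for each effect size, rounded for reporting
print(round(power_by_effect, 3))
```

Setting `sd = 1` is a convenience: it lets `delta` be read directly as a standardized effect size rather than a difference in the original measurement units.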
Question 3.
In the discussion, Jeter et al. conduct a power analysis to plan for future studies. They use the effect size
for Foam_EO measured in their study, and a desired power of 0.8 and alpha of 0.05. They calculate the
sample size required for a future study which would compare the change in COP in the AYT group to the
change in COP in the waitlist group (page 14). They plan for a slightly different type of test than you have
learned (an ANCOVA).
a. What type of power analysis is this: a priori or post hoc? Write one sentence to justify your
answer.
b. Use R to calculate the effect size, d, based on the mean change in Foam_EO in the AYT group, the
mean change in Foam_EO in the waitlist group (both given in Table 4) and a standard deviation of
5.8 (report R code and output):
> lab3data$delta_FoamEO <- lab3data$Foam_EO_baseline - lab3data$Foam_EO_post
> sd(lab3data$delta_FoamEO)
[1] 5.781177
c. Use R to calculate the required sample size for a t test comparing the difference between two
means based on the effect size you calculated in part b and other parameters specified in this
question. Report your R code and output and write one sentence describing your findings.
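The two calculations in Question 3b and 3c can be sketched as below. The two mean changes are PLACEHOLDER values chosen only so the code runs (they are not the values from Table 4), and `power.t.test()` from base R is used here as one possible tool; the function taught in your lab lessons may be different. The effect size is taken as the difference between the two group means divided by the standard deviation.

```r
# PLACEHOLDER mean changes (assumptions for illustration only);
# replace them with the values reported in Table 4 of the paper
mean_change_ayt      <- 2.0
mean_change_waitlist <- 0.5

# Standard deviation given in the question
sd_change <- 5.8

# Effect size d: difference between the group means in standard-deviation units
d <- (mean_change_ayt - mean_change_waitlist) / sd_change
print(d)

# Sample size per group needed for power = 0.8 at alpha = 0.05;
# with sd = 1, delta is read as the standardized effect size d
result <- power.t.test(delta = abs(d), sd = 1, power = 0.8,
                       sig.level = 0.05, type = "two.sample",
                       alternative = "two.sided")

# Round up, since we cannot recruit a fraction of a participant
n_per_group <- ceiling(result$n)
print(n_per_group)
```

Note that the sample size returned is per group, so the total number of participants for a two-group study is twice that value.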
Question 4.
In the introduction, Jeter et al. describe two competing hypotheses related to COP.
Minimization hypothesis: “a reduction in absolute COP displacement would imply greater postural
stability. … Therefore, COP minimization as an indicator of greater stability and reduced fall risk is often
cited in the literature” (page 3). This is the hypothesis that was presented to you in the video at the start
of lab 2.
Stabilization hypothesis: “[relative to normal adults] COP can be reduced in clinical populations at risk of
falls such as Parkinson’s patients [and] lower leg amputees. … [COP] can be increased after significant
balance training such as Tai Chi and other forms of exercise. … While the mechanism for COP changes
after training is not clear, it is possible that individuals develop corrective strategies to down-weight
unreliable and up-weight reliable information.” The last sentence about down-weighting and up-
weighting information is a reference to the authors’ overall theory that balance is maintained based on
input from the somatosensory system (senses such as pressure, pain, or heat which can occur anywhere
in the body), the vestibular system (inner ear structures that are instrumental in balance in mammals)
and the visual system. Perhaps exercise training increases body awareness and thus makes people
respond more decisively (higher COP) when one of those systems that they trust to be reliable tells them
they are losing balance.
Look at the first eight rows of Table 4 in the paper. The column on the right measures effect size using a
statistic (η: “eta”) that differs from the d and h statistics that you were introduced to. However, like d
and h, higher values of η indicate a larger effect.
Do the results in Table 4 support one hypothesis (minimization or stabilization) more than the other?
Refer to specific findings from different columns to justify your answer.
Marking Rubric
Most of the questions in Assignment 4 have correct vs. incorrect answers and/or approaches.
Consequently, the marking scheme for evaluating your answers to certain questions may often have a
single ‘right’ answer/approach for which we are looking. However, how we use R to explore, summarize,
and analyze our data can, to some degree, vary in technique. That is, there may be more than one way to
ask R to complete a particular ‘task’.
So, what does this mean for a student trying to understand expectations when completing this
Assignment? In addition to the “Reminders/Tips for Success” provided on pages 2-3 of this file, consider
the following general criteria for different types of questions/marking; these criteria will likely play a
heavy role in evaluating the answers submitted for the Assignment:
Criteria for R code and output
✓ Selection of data, variables, and subsets is relevant to the question or task;
✓ Choice of R functions is relevant and appropriate (demonstrating an understanding of the analysis
being conducted) for the question, task, and/or type of data being summarized/analyzed;
✓ Reported R code for any numbered question is complete and would function (i.e. reproduce the
output included/described in the answer) if it were copied/pasted into R and run;
✓ Reported R code uses brief #comments to help interpret the purpose of more complex
commands/functions.
Criteria for ‘other’ questions (i.e. identifying, describing, explaining, discussing, etc.)
✓ Knowledge: use and application of relevant statistical vocabulary/concepts demonstrates an
accurate understanding of those concepts; that is, the vocabulary/concepts are used/applied in a
manner that is consistent with the definition/understanding. The use moves beyond simply
defining the concepts, but actually applies them to the situation. This criterion also connects to
whether an answer is consistent with any expectations/guidelines communicated in course
content (e.g. lectures).
✓ Connections/Justification: Answer demonstrates (through explanation and/or description, where
appropriate) the relevance or relationship of choices made/vocabulary used to the question(s) or
situation. That is, it’s clear WHY you have made the choices you did, and these choices make
logical sense.
✓ Completeness: Answer provides enough detail (whether in written answers or visual content) that
another, knowledgeable individual can understand and/or recognize the application of the
concepts, without ambiguity or doubt. This also refers to whether the answer has addressed all
aspects of the question.
✓ Communication: Answer uses clear and concise language, and thoughtful organization of ideas to
facilitate readability and understanding. That is, we do not have to re-read your answer multiple
times to try to understand what you are saying.