EDUC 263: Introduction to Programming and Data
Management Using R
Office Hours: Tues 2:30-4PM; and by appt
Office: Moore Hall 3038
Office Hours: Mon 4-5PM; Wed 4-5PM
Location: Moore Hall 3120 (computer lab)
This course has two foundational goals: (1) to develop core skills in “data management,” which are
important regardless of which programming language you use, and (2) to learn the fundamentals
of the R programming language.
Data management consists of acquiring, investigating, cleaning, combining, and manipulating
data. Most statistics courses teach you how to analyze data that are ready for analysis. In real
research projects, cleaning the data and creating analysis datasets is often more time-consuming
than conducting analyses. This course teaches the fundamental data management and data
manipulation skills necessary for creating analysis datasets.
The course will be taught in R, a free, open-source programming language. R has become the most
popular language for statistical analysis, surpassing SPSS, Stata, and SAS. What differentiates R
from these other languages is the thousands of open-source “libraries” created by R users. R
is one of the most popular languages for “data science,” because R libraries have been created
for web-scraping, mapping, network analysis, etc. By learning R you can be confident that you
know a programming language that can run any modeling technique you might need and has
amazing capabilities for data collection and data visualization. By learning the fundamentals of R in
this course, you will be “one step away” from web-scraping, network analysis, interactive maps,
quantitative text analysis, or whatever other data science application you are interested in.
Students will become proficient in data manipulation tasks through weekly “problem sets” that
they complete in groups of three. These problem sets will account for 90% of the course grade.
Each week, class will begin with one group leading a discussion of challenges they
encountered while completing the problem set. The rest of class time will be devoted to learning
new material. The instructor will provide students with lecture notes, as well as the data and code
used during lecture, so students can follow along by running code on their own computers.
EDUC 263: Introduction to Programming and Data Management Using R – Fall 2019
Course Learning Goals
1. Understand fundamental concepts of object-oriented programming
• What are the basic object types, and how do they apply to statistical analysis?
• What are object attributes, and how do they apply to statistical analysis?
2. Become familiar with the Base R approach and the Tidyverse approach to data
manipulation
3. Investigate data patterns
• Sort datasets in ways that generate insights about data structure
• Select specific observations and specific variables in order to identify data structure and
to examine whether variables are created correctly
• Create summary statistics of particular variables to diagnose errors in data
4. Create variables
• Create variables that require calculations across columns
• Create variables that require processing across rows
5. Combine multiple datasets
• Join (merge) datasets
• Append (stack) datasets
6. Manipulate the organizational structure of datasets
• Summarize and collapse observations by group
• Reshape and “tidy” untidy data