COGS 108: Data Science in Practice
COURSE OVERVIEW
This class is a hands-on, practical, technical, and applied data science course intended to give you experience working on data science projects. In COGS 9 (Introduction to Data Science) you (may have) learned why data and data science are important. This class goes beyond appreciation for what can be done to actually doing it. Often the best way to learn something is to do it yourself. Often, this process will involve attempting to do something, doing it wrong, learning from your mistakes, and then succeeding. That’s part of the data science process.

This course is all about the practice of data science. In focusing on the application, there is theory that won’t be discussed and mathematical proofs that won’t be done. That is by design. In particular:
1. There are entire courses dedicated to each of the topics we’ll cover. To have time to do anything, we can’t teach all the details in a single course.
2. Experts in each of these domains are out there and excited to teach you the nitty gritty about each topic.
3. My expertise is not machine learning. It’s data science, education, human genetics, and the intuition behind data analysis.
4. We’re promoting data literacy. We believe that everyone who is data literate is at an advantage as they go out into the modern world. Data literacy is not limited to those who are computational gurus or math prodigies. You do not have to be either of those to excel in this course.

In this course, you will try many methods. Every so often, you’ll even be asked to implement a technique that has not been explicitly taught. Again, this is by design. As a data scientist, you’ll regularly be asked to step outside of your comfort zone and into something new. Our goal is to get you as comfortable as possible in that space now.
We want to provide you with a technical and a data science mindset that will allow you to ask the right questions for the problem at hand and set off alarm bells when something in your dataset or analysis is “off.”
COURSE OBJECTIVES
• Formulate a plan for and complete a data science project from start (question) to finish (communication)
• Explain and carry out descriptive, exploratory, inferential, and predictive analyses in Python
• Communicate results concisely and effectively in reports and presentations
• Identify and explain how to approach an unfamiliar data science task
CLASS TECHNOLOGY
• Python (>= 3.6; Anaconda distribution)
• Jupyter Notebooks
• git and GitHub (option to use SourceTree, GitHub Desktop, or other GUI)
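One quick way to confirm that an installed interpreter satisfies the Python requirement listed above is a short version check. The following is a minimal sketch, not an official course script; the names `MIN_PYTHON` and `meets_minimum` are hypothetical and chosen only for illustration. It avoids f-strings so that it still runs on interpreters older than 3.6.

```python
import sys

# Minimum version stated in the technology list above.
# (MIN_PYTHON and meets_minimum are illustrative names, not course-provided.)
MIN_PYTHON = (3, 6)


def meets_minimum(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    required = ".".join(str(part) for part in MIN_PYTHON)
    if meets_minimum():
        print("Python {} meets the >= {} requirement.".format(current, required))
    else:
        print("Python {} is older than {}; please upgrade.".format(current, required))
```

Running the file with the interpreter you plan to use for the class (for example, from the Anaconda prompt) prints which case applies.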
Individual Final Project
Option 2 will be completed individually and has been designed to mimic the data science interview process. During data science interviews, applicants are often given a dataset, a question, and a set of tasks, and are then sent home to complete them. This is what students who choose Option 2 will be asked to do. On Monday night of Week 10, students will be given a dataset, a topic, and tasks to complete individually. Students will have until the Final Deadline to carry out the data science project on their own. For all aspects of this project, students will have full access to course materials, their own brains, and information on the Internet, but are not allowed to discuss their approach or analysis with any other humans (this includes, but is not limited to, family members, members of the class, friends, and people online).
Choosing this option
Students who choose Option 2 will have to specify this choice via Google Form by the Friday of week 3 (see Course Schedule). One form will be submitted per individual.
Project Proposal
Students who choose Option 2 will still submit a project proposal by the end of week 3 (see Course Schedule below) on GitHub. This will be completed individually on a topic of our choosing.
Final Project Survey
Every individual in the class will provide feedback about their experience completing this option. Surveys will be completed individually and are due at the same time as your Final Project (date of the final at 11:59 PM).
Final Project
The final project will be a full, detailed data science report in the form of a Jupyter notebook that carries out an analysis from start to finish. This report will answer the data science question provided to you during finals week. You will have 5 days to complete your individual final project. We do not anticipate it taking you 5 days straight; however, we know you’ll have other finals to study for and take during this time. More details will be provided in class, but generally this report will include (1) background research and ethical considerations, (2) your data science question(s) and hypothesis/hypotheses, (3) data & data wrangling, (4) a descriptive & an exploratory data analysis, (5) your full analysis, (6) your results, and (7) your conclusion(s).
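As a rough illustration of what the descriptive step (item 4 above) can look like in a notebook cell, here is a minimal sketch using only Python's standard library. It is not the required approach and not the course's own code; the function name `describe` and the sample values are made up for illustration, and a real report would work from the dataset you are given, likely with a data analysis library.

```python
import statistics


def describe(values):
    """Summarize a numeric column: count, center, spread, and range.

    `describe` is a hypothetical helper for illustration only.
    """
    values = list(values)
    if len(values) < 2:
        # statistics.stdev needs at least two data points.
        raise ValueError("need at least two values to summarize")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


if __name__ == "__main__":
    # Made-up placeholder values, not real course data.
    sample = [1, 2, 3, 4, 5]
    for name, value in describe(sample).items():
        print(name, value)
```

A summary like this is only a starting point: the exploratory analysis would go on to look at distributions, missing values, and relationships between variables before any inferential or predictive work.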