Administrative Information

  • Course day/time: Fri., 9:15-10:45 (building 22A, room 203)
  • Instructor: Uli Niemann
  • Course type: seminar
  • ECTS credits: 6
  • Audience: all FIN Master degree programs
  • Course language: English
  • Registration and application:
  • Prerequisites: see Prerequisites section
  • Technical requirements: bring your own laptop with R and RStudio installed on it
  • Grading: based on several deliverables in the context of a semester-long data science project → see Project page

News

10.10.2018
The notifications regarding admission have been sent to all students who registered via LSF.

24.09.2018
Due to the very high interest in Data Science with R, we unfortunately cannot admit all students who have registered for the seminar via LSF. We therefore ask you to submit a brief application letter (ca. 300 words) to uli.niemann[at]ovgu.de by October 8th, 2018. The letter should contain a) a statement of why the seminar would enhance your studies and b) a list of prerequisites and/or recommendations. On October 10th, 2018, we will select up to 30 students for admission to the seminar from all applications.

Course Description

Data Science with R (DataSciR) is an applied course about learning from data to perform predictions and to obtain useful insights. In the seminar, we will use the statistical programming language R.

The skills necessary to manage and analyze data will be taught and practiced on real-world applications and through a semester-long graded data science project.

Programming knowledge from other courses is helpful but not mandatory. However, you are expected to have a solid knowledge of fundamental data mining techniques, such as classification, regression and clustering.

After successful completion of this course, you will be able to proficiently perform the following tasks in R:

  • import and preprocess raw data
  • transform data for modelling
  • perform exploratory data analysis with summary statistics and visualization
  • understand, build and evaluate predictive models for classification and regression, including regression models, tree-based models, ensembles and boosted models
  • communicate and disseminate results and findings through reproducible documents, presentations, websites and interactive web applications
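
To make the tasks above more concrete, here is a minimal sketch of such a workflow in base R. It is an illustration only, not course material: it assumes the built-in mtcars data set as a stand-in for project data, uses a simple linear regression as the model, and omits the reporting step. All variable names (cars, log_hp, train_idx, rmse) are chosen for this example.

```r
# Sketch only: the built-in mtcars data set stands in for real project data.
set.seed(1)  # make the random train/test split reproducible

# 1. Import and preprocess: copy the data and recode a numeric flag as a factor
cars <- mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("automatic", "manual"))

# 2. Transform for modelling: add a log-transformed predictor
cars$log_hp <- log(cars$hp)

# 3. Explore: mean fuel efficiency per transmission type
mpg_by_am <- aggregate(mpg ~ am, data = cars, FUN = mean)
print(mpg_by_am)

# 4. Model: fit a linear regression on a random 70% training split
train_idx <- sample(nrow(cars), size = floor(0.7 * nrow(cars)))
fit <- lm(mpg ~ wt + log_hp + am, data = cars[train_idx, ])

# 5. Evaluate: root mean squared error on the held-out rows
pred <- predict(fit, newdata = cars[-train_idx, ])
rmse <- sqrt(mean((cars$mpg[-train_idx] - pred)^2))
print(rmse)
```

The split is deliberately simple; cross-validation would usually give a more stable error estimate on a data set this small.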

Tentative Syllabus

Prerequisites

There are no mandatory prerequisites for DataSciR. However, it is recommended that you have attended at least one of the following lectures (or a comparable one):

Also, you should have basic knowledge of programming and statistics. For example, you will learn the most important vector types and classes in R, but you will not learn what a vector or a class is in general. Accordingly, you should know what terms such as mean, standard deviation, probability, hypothesis test and p-value mean.
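
As a rough self-check, the following base R sketch touches the concepts named above. The data values and variable names are made up for illustration. If you can follow what each line computes, you likely have the expected background.

```r
# Vector types and classes in R (values are made up for illustration)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # numeric vector, stored as double
grp <- c("a", "b", "c")          # character vector
above_4 <- x > 4                 # logical vector from an element-wise comparison

typeof(x)        # storage type of x
class(grp)       # class of grp
class(above_4)   # class of above_4

# Basic statistical terms
x_mean <- mean(x)   # arithmetic mean, here 5
x_sd <- sd(x)       # sample standard deviation (divides by n - 1)

# One-sample t-test of the null hypothesis that the true mean equals 4
test <- t.test(x, mu = 4)
test$p.value        # p-value of the test
```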

Technical Requirements

We recommend bringing your laptop to each course meeting. Class meetings are a mix of lecture and short coding exercises. You will get the most out of the meetings if you have a laptop and can work on these exercises. Hence, you should set up your laptop by the end of the first week as described in the Software section.

Resources

Other Resources

Software

By the end of the first week, you should have installed the following software on your own laptop:

  1. R
  2. RStudio
  3. optional for Windows: Rtools

Also, please check whether you can successfully install packages. To do so, click on the Packages tab in the bottom-right pane in RStudio. Then, click on the Install button and specify an arbitrary package, e.g. dplyr. Finally, click on Install. Alternatively, you can install a package from the console with install.packages("dplyr"). If everything is set up correctly, no error messages should be displayed when you load the installed package with library(dplyr).
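
The same check can be scripted from the console. The sketch below defines a hypothetical helper, check_package (not part of the course material), which reports whether a package is installed and, if so, which version, without attaching it.

```r
# Hypothetical helper: report whether a package is installed and which version
check_package <- function(pkg) {
  # requireNamespace() returns FALSE instead of raising an error
  # when the package cannot be found
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(paste0(pkg, ": not installed"))
  }
  paste0(pkg, ": version ", as.character(utils::packageVersion(pkg)))
}

check_package("stats")   # ships with every R installation
check_package("dplyr")   # reports "not installed" until you have installed it
```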

List of packages used on slides

Execute the following code chunk to install all packages that are used on the course slides (so that you don’t need to manually install each of them). Please note that the list will be updated during the semester.

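# Install pacman only if it is missing, then let p_load() install
# (where needed) and load every package in the list
if (!requireNamespace("pacman", quietly = TRUE)) install.packages("pacman")
pacman::p_load(char = c(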
  "broom",
  "caret",
  "e1071",
  "extrafont",
  "farver",
  "fastAdaboost",
  "gapminder",
  "GGally",
  "ggmap",
  "ggrepel",
  "ggridges",
  "ggthemes",
  "here",
  "htmltab",
  "kableExtra",
  "knitr",
  "kohonen",
  "latex2exp",
  "maps",
  "openintro",
  "plotly",
  "randomForest",
  "ranger",
  "RColorBrewer",
  "rgdal",
  "rpart",
  "rpart.plot",
  "seriation",
  "skimr",
  "tidytext",
  "tidyverse",
  "tsne",
  "UsingR",
  "xaringan",
  "xgboost",
  "viridisLite",
  "wordcloud"
))
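
If some installations fail, it can help to see which packages are still missing. The following sketch uses only base R. The vector wanted is a shortened, illustrative excerpt of the list above, so replace it with the full list as needed.

```r
# Illustrative excerpt of the package list; extend as needed
wanted <- c("broom", "caret", "tidyverse")

# Names of all packages currently installed in the library paths
installed <- rownames(installed.packages())

# Packages from the list that are not installed yet
missing_pkgs <- setdiff(wanted, installed)

if (length(missing_pkgs) == 0) {
  message("All listed packages are installed.")
} else {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
}
```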