Data Curation


This course is the entry point to a series on data science skills for business analytics in the modern big data era. It introduces the concepts of data curation and management with applications. Students will explore the characteristics of data and perform data curation through hands-on experience with data extraction, data wrangling, data exploration, databases, and data science workflows built as reproducible Extract-Transform-Load (ETL) processes.
Students do not need programming experience, but programming knowledge (for example R, Matlab, or Java) is highly preferred.

General characterization





Responsible teacher

Qiwei Han


Weekly - Available soon

Total - Available soon

Teaching language



Available soon


This course does not require a textbook, because data science is a rapidly changing field and no single textbook covers all the material taught in the course. However, the following books are recommended for reference:
•    Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, 2nd Edition
•    Python Data Science Handbook: Essential Tools for Working with Data, Chapters 1-3

Teaching method

Students are required to bring their own laptops for in-class exercises and quizzes. This course adopts a learning-by-doing culture in which students implement the data curation process through programming in Python and SQL. Most of the class material will be provided as Jupyter notebooks to facilitate reproducible practices.

Evaluation method

The overall evaluation of performance consists of 3 parts:
•    Class participation through 5 quizzes (20%)
•    3 bi-weekly assignments (30%)
•    Final exam (50%)

Subject matter

This course contains 6 modules in which students learn about data curation through hands-on programming exercises. It also serves as a crash course in Python, one of the most popular programming languages in the big data era. Most lectures will be presented using Python/SQL examples.
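
As an illustration of the kind of Python/SQL example the course refers to, the following is a minimal sketch of a reproducible Extract-Transform-Load (ETL) process. It is not taken from the course material: the sample data, the function names (extract, transform, load, run_etl), and the table name "sales" are all hypothetical, and it uses only the Python standard library (csv, io, sqlite3) with an in-memory SQLite database.

```python
import csv
import io
import sqlite3

# Hypothetical raw data; in practice this would come from a file or an API.
RAW_CSV = """customer,amount
alice, 10.5
BOB,20
alice,
carol,7.25
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize names, drop rows with a missing amount, cast to float."""
    cleaned = []
    for row in rows:
        amount = (row["amount"] or "").strip()
        if not amount:
            continue  # skip records with no amount
        cleaned.append((row["customer"].strip().lower(), float(amount)))
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into a SQL table (hypothetical name: sales)."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    conn.commit()


def run_etl(text=RAW_CSV):
    """Run the three steps in order against a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(text)), conn)
    return conn


conn = run_etl()
# Query the loaded data with SQL: total amount per customer.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

Keeping each step as a separate function, and starting from a fresh database on every run, means the same input always produces the same table, which is the sense of "reproducible" assumed in this sketch.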