Data Curation for Business Analytics
Objectives
This course is the entry point to a series of courses that build data science skills for business analytics in the big data era. It introduces the concepts of data curation and data management, together with their applications. Students explore data characteristics and practice data curation through hands-on work in data extraction, data wrangling, data exploration, and databases, and learn to organize a data science workflow as a reproducible Extract-Transform-Load (ETL) process.
General characterization
Code
2659
Credits
3.5
Responsible teacher
Qiwei Han
Hours
Weekly - Available soon
Total - Available soon
Teaching language
English
Prerequisites
n/a
Bibliography
Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, 2nd Edition
Python Data Science Handbook: Essential Tools for Working with Data
Teaching method
Students are required to bring their own laptops for in-class exercises and quizzes. This course adopts a learning-by-doing approach in which students implement the data curation process by programming in Python and SQL. Most of the class material will be provided as Jupyter notebooks to support reproducible practice.
Evaluation method
Class participation through 4 quizzes (20%)
2 bi-weekly assignments (30%)
Final exam (50%)
Subject matter
This course consists of 6 modules in which students learn about data curation through hands-on programming exercises. It also serves as a crash course in Python, one of the most widely used programming languages in data science. Most of the lectures will be presented using Python/SQL examples.
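As a rough illustration of the kind of Python/SQL exercise described above, the following is a minimal sketch of a reproducible ETL process. It is not taken from the course material: the sample data, the table name orders, and the function names extract, transform, load, and run_pipeline are all assumptions made for this example, and only the Python standard library is used.

```python
import csv
import io
import sqlite3

# Hypothetical raw data; a real pipeline would read a file or an API instead.
RAW_CSV = """order_id,customer,amount
1, Alice ,10.50
2,Bob,
3,alice,4.50
"""


def extract(text):
    """Extract: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize customer names and drop rows with no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        customer = row["customer"].strip().lower()
        cleaned.append((int(row["order_id"]), customer, float(amount)))
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into a SQLite table."""
    conn.execute(
        "CREATE TABLE orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline():
    """Run all three steps, then summarize the loaded data with SQL."""
    conn = sqlite3.connect(":memory:")  # in-memory database, nothing on disk
    load(transform(extract(RAW_CSV)), conn)
    query = (
        "SELECT customer, SUM(amount) FROM orders "
        "GROUP BY customer ORDER BY customer"
    )
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(run_pipeline())
```

Keeping each step in its own function means the whole pipeline can be re-run from the raw data at any time, which is the sense of "reproducible" intended here. The same steps could equally be written with pandas inside a Jupyter notebook.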