Course Information

Class hours: TR 11:00 am - 12:15 pm
Class location: F Paul Anderson Tower, Rm. 267
Instructor: Licong Cui
Office Address: 233 James F. Hardymon Building
Office Phone: 859-257-3062
Office hours: Thursday 2:00 pm – 4:00 pm
Course website:

Course Description

There has been unprecedented growth of data in business (e.g., e-commerce), science (e.g., biomedicine), healthcare, and social networking (e.g., Twitter). These large volumes of data are often called “big data,” which is also marked by high velocity (in terms of generation rate and the need for quick analysis) and variety (in terms of forms of data). The new paradigm of big data requires new computational techniques to turn data into actionable information.

This course offers students an opportunity to learn emerging big data techniques and apply them to real-world data science challenges (e.g., processing, storing, querying, exploring, and mining big data). These techniques include Hadoop/MapReduce, a scalable, distributed, data-intensive computing framework, as well as higher-level tools built on top of the Hadoop platform, such as HBase (a non-relational, distributed database), Hive (a data warehouse infrastructure), Pig (a higher-level interface to MapReduce), and Spark (for analytics). The course will also introduce semantic-oriented techniques, such as ontologies, the Semantic Web, and common data elements, that address the complexity and variety of big data.


Algorithm design and analysis; database systems (e.g., MySQL); programming languages (e.g., Java, C, C++, Python); Linux basics

Student Learning Outcomes

After completing this course, the student will be able to:

Hadoop: The Definitive Guide: Storage and Analysis at Internet Scale (4th Edition), by Tom White

Grading Criteria

Grading scale: 90 – 100% = A; 80 – 89% = B; 70 – 79% = C; < 70% = E.

Course Policies

Academic Integrity:
Students may discuss the content covered in class, but must complete the homework assignments independently. For collaborative course projects, the written report must include a clear statement of each member’s contribution. Proper acknowledgement is required if you borrow ideas or content from other sources. More information about University policy for plagiarism and cheating can be found at

Submission Policy:
Attendance Policy:
Class attendance is necessary to complete homework and other assignments. According to University policy, students are expected to withdraw from the class if they miss more than 20% of the classes scheduled for the semester (excused or unexcused).

Excused Absences:
Senate Rules defines the following as acceptable reasons for excused absences: (a) serious illness, (b) illness or death of family member, (c) University-related trips, (d) major religious holidays, and (e) other circumstances found to fit “reasonable cause for nonattendance” by the professor. See for more information.

Verification of Absences:
Students must notify the instructor of an absence within one week after the absence. Students may be asked to verify their absences in order for them to be considered excused. Senate Rule states that faculty have the right to request “appropriate verification” when students claim an excused absence because of illness or death in the family. Appropriate notification of absences due to University-related trips or a major religious holiday is required prior to the absence.

Accommodations due to disability:
Students who have a documented disability and require academic accommodations are encouraged to contact the instructor as soon as possible. To receive accommodations in this course, students must provide the instructor with a Letter of Accommodation from the Disability Resource Center (DRC), which coordinates campus disability services. The DRC is located in Suite 407 of the Multidisciplinary Science Building, 725 Rose Street, Lexington, KY 40536-0082. Please call (859) 257-2754 to contact the DRC by phone.