- Class hours: Tuesday and Thursday, 11:00 am – 12:15 pm
- Class location: F. Paul Anderson Tower, Rm. 267
- Office address: 233 James F. Hardymon Building
- Office hours: Thursday, 2:00 pm – 4:00 pm
Data are growing at an unprecedented rate in business (e.g., e-commerce), science (e.g., biomedicine), healthcare, and social networking (e.g., Twitter). Such large datasets, commonly called “big data,” are marked not only by their volume but also by high velocity (in terms of both generation rate and the need for rapid analysis) and variety (in terms of the forms the data take). This new big data paradigm requires new computational techniques that can turn data into actionable information.
This course offers students an opportunity to learn emerging, cutting-edge big data techniques and apply them to real-world data science challenges (e.g., processing, storing, querying, exploring, and mining big data). These techniques include Hadoop/MapReduce, a scalable, distributed, data-intensive computing framework, as well as higher-level tools in the Hadoop ecosystem: HBase (a non-relational, distributed database), Hive (a data warehouse infrastructure), Pig (a higher-level interface to MapReduce), and Spark (an engine for large-scale data analytics). The course will also introduce semantic techniques, such as ontologies, the Semantic Web, and common data elements, that address the complexity and variety of big data.
Prerequisites: algorithm design and analysis; database systems (e.g., MySQL); a programming language (e.g., Java, C, C++, or Python); and Linux basics.
Student Learning Outcomes
After completing this course, the student will be able to:
- Understand the basic architecture and programming models of Hadoop for scalable big data analytics.
- Implement MapReduce algorithms for processing and managing large data sets in a parallel, distributed way.
- Gain hands-on experience with scalable data management solutions (e.g., the NoSQL database HBase, Hive, and Pig) to store, query, and explore big data.
- Apply common data mining techniques in Spark (e.g., clustering and classification algorithms, association rule learning) to analyze big data and derive knowledge from data.
- Identify data science research problems from real-world applications and apply the learned skills to solve these problems.
Textbook: Hadoop: The Definitive Guide: Storage and Analysis at Internet Scale (4th Edition), by Tom White.
Grading:
- Homework/programming assignments (40%)
- Paper presentation (20%)
  - The instructor will select research and survey papers for presentation.
  - Students are also encouraged to select papers of their own interest.
- Course project
  - Project teams: each team consists of up to 3 members.
  - Reports must include a clear statement of each team member’s contribution.
  - Deliverables: proposal, live demos, and final report.
- Attendance and participation (10%)
90 – 100% = A;
80 – 89% = B;
70 – 79% = C;
< 70% = E.
Students may discuss the content covered in class, but each student must complete the homework assignments independently. For collaborative course projects, the written reports must include a clear statement of each member’s contribution. Proper acknowledgement is required if you borrow ideas or content from other sources. More information about University policy on plagiarism and cheating can be found at http://www.uky.edu/ombud/
- Assignments will be posted and submitted electronically via Canvas: https://uk.instructure.com/courses/1892521.
- Late submissions will be penalized 10% per day for up to 3 days; submissions more than 3 days late will not be accepted.
Class attendance is necessary to complete the homework and other assignments. According to University policy, students are expected to withdraw from the class if they miss more than 20% of the classes scheduled for the semester (excused or unexcused).
Senate Rules 18.104.22.168 defines the following as acceptable reasons for excused absences: (a) serious illness, (b) illness or death of a family member, (c) University-related trips, (d) major religious holidays, and (e) other circumstances found to fit “reasonable cause for nonattendance” by the professor. See http://www.uky.edu/ombud/ for more information.
Verification of Absences:
Students must notify the instructor of an absence no later than one week after the absence. Students may be asked to verify their absences in order for them to be considered excused. Senate Rule 22.214.171.124 states that faculty have the right to request “appropriate verification” when students claim an excused absence because of illness or death in the family. For absences due to University-related trips or a major religious holiday, appropriate notification is required prior to the absence.
Accommodations due to disability:
Students who have a documented disability and require academic accommodations are encouraged to contact the instructor as soon as possible. To receive accommodations in this course, students must provide the instructor with a Letter of Accommodation from the Disability Resource Center (DRC, http://www.uky.edu/DisabilityResourceCenter/), which coordinates campus disability services. The DRC is located in Suite 407 of the Multidisciplinary Science Building, 725 Rose Street, Lexington, KY 40536-0082, and can be reached by phone at (859) 257-2754.