About ISA5810, Fall 2024

Data mining uses algorithms to uncover hidden but valuable insights in large datasets. These algorithms draw on many areas, including machine learning, artificial intelligence, pattern recognition, statistics, and database systems, which together support deeper understanding and analysis of data.

This course, ISA5810: Data Mining: Concepts, Techniques, and Applications, is designed to equip you with the foundational knowledge and hands-on experience needed to delve into the expansive world of data mining. Whether you are looking to enhance your skill set or embark on a new career path, this course will serve as a stepping stone to achieving your goals.

The curriculum encompasses a range of topics that will introduce you to the core concepts and techniques prevalent in the field of data mining. These include:

  • Association Rules: Understand the principles behind identifying rules that highlight relationships between seemingly independent data in a database.
  • Clustering: Learn about grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups.
  • Classification: Learn the procedures for assigning a new observation to one of a set of predefined classes.
  • Text Mining: Equip yourself with the skills needed to analyze and interpret large collections of text data to extract meaningful information.
  • Data Mining Applications: Explore the various practical applications of data mining across different industries and sectors.

Textbook

    Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Introduction to Data Mining, Addison Wesley

Time in 2024

  • Monday 9:00AM-10:20AM
  • Monday 10:30AM-11:40AM
  • 16-week course

Location:

People

Supporting this course

Instructor:

Yi-Shin Chen

She has taught the fundamental and advanced database courses for more than a decade. Her current research interests are social networks, data mining, emotion analysis, and web intelligence.

  • email: yishin@gmail.com
  • phone: +886-3-573-1211
  • office: Delta 607
  • office hours: By email appointment

Teaching Assistants:

Didier Salazar

Didier
  • NTHU2024DM@gmail.com

Kuan-Hao Yeh

  • NTHU2024DM@gmail.com

Po-Yung (Joe) Huang

Joe
  • NTHU2024DM@gmail.com

Gerraldo Candra

Gerraldo Silakumaro Candra
  • NTHU2024DM@gmail.com

Retnani Latifah

  • NTHU2024DM@gmail.com

Meng-Chieh Tang

Jessie
  • NTHU2024DM@gmail.com

Hao-Ze (Arthur) Wang

  • NTHU2024DM@gmail.com

Syllabus

Orientation

9/2 for 3 hours

During the orientation session, you'll have the opportunity to acquaint yourself with the course structure, meet your instructor, and connect with fellow classmates, fostering a collaborative and engaging learning environment. Additionally, we will provide a comprehensive overview of the course content, setting the stage for a productive and enlightening educational journey.

Activities

  • Reading: Syllabus
  • Join Teams
  • YouTube
  • For those unable to attend the initial session, kindly review the recordings available on NTU Cool or Teams and take the Orientation Quiz
  • The one-minute reflection summary can be found here.

Interesting Videos

Overview and Data

9/9, 9/16 for 6 hours

Understanding and preparing data is a pivotal phase of any data mining project. This session first introduces the types of attributes and the characteristics found in datasets, then takes a deep dive into the data preprocessing techniques essential for effective data analysis.

Following this, a range of similarity and distance measures will be explored, serving as vital tools for discerning patterns and trends within the data. To conclude the session, an immersion into the art of data visualization will take place, showcasing a potent tool that aids in the intuitive representation and interpretation of complex data structures.
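As a rough illustration of the similarity and distance measures mentioned above, here is a minimal Python sketch. The three measures shown (Euclidean distance, cosine similarity, Jaccard similarity) are common choices and are an assumption on our part, not the official list covered in class. The function names and sample data are hypothetical.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two numeric vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (assumes neither is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def jaccard(s, t):
    """Shared items divided by all items (assumes at least one set is non-empty)."""
    s, t = set(s), set(t)
    return len(s & t) / len(s | t)

# Hypothetical sample data, for illustration only.
p, q = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
print("Euclidean distance:", euclidean(p, q))
print("Cosine similarity:", cosine_similarity(p, q))
print("Jaccard similarity:", jaccard({"milk", "bread"}, {"milk", "eggs"}))
```

Note that q is a scaled copy of p: the two vectors are far apart in Euclidean terms yet point in the same direction, so different measures can disagree about how "similar" two records are.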

Related Videos

Activities

  • Join NTU Cool. For students from Chaoyang University, National Yang Ming Chiao Tung University, National Tsing Hua University, National Cheng Kung University, Southern Taiwan University of Science and Technology, Tatung University, and National Taiwan Normal University who enrolled in the class before September 3rd, NTU Cool has already sent out the invitation emails. If you haven't received one, please check the email address you provided through your school, as the invitation should be there.

Lab for Data Exploration and Management

9/23

This lab session focuses on using scientific computing libraries to process, transform, and manage data. Participants will also be introduced to good practices and to cutting-edge visualization tools that support effective big data analysis.

Activities

  • Class is offered on YouTube only
  • Assignment One should be submitted before Oct 27

Classification

9/30, 10/7 for 6 hours

Classification, a form of supervised learning, is a focal point of both data mining and machine learning. The objective is to assign input data to predefined classes, so that the class of a new observation can be predicted accurately.

In this session, crucial algorithms integral to classification techniques will be explored. The discussion will commence with an analysis of Decision Trees, utilizing a tree-like graph structure for strategic decision-making. This will transition into a study of Bayesian Networks, central tools for deducing probabilities and making informed predictions by analyzing the statistical relationships between different variables. Subsequently, the focus will shift to Neural Networks, potent frameworks adept at deciphering complex patterns and facilitating precise predictions. The session will conclude with an overview of Convolutional Neural Networks (CNNs), vital instruments in the realm of visual imagery analysis, notably in tasks involving image and video recognition.

This session aims to impart a comprehensive understanding of the core principles and subtleties of classification, furnishing participants with the skills vital for success in data mining projects.
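To make the Decision Tree idea above a little more concrete, the sketch below computes entropy and information gain, one common criterion for choosing which attribute to split on. This is an illustrative assumption: the course may present other criteria (for example the Gini index). The toy weather data and the function names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(rows, labels, feature_index):
    """Reduction in entropy obtained by splitting on one categorical feature."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature_index], []).append(label)
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical toy data: (outlook, temperature) -> play?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "hot"), ("rainy", "mild")]
labels = ["no", "no", "yes", "yes"]

for index, name in enumerate(["outlook", "temperature"]):
    print(name, information_gain(rows, labels, index))
```

In this toy data, outlook separates the classes perfectly while temperature tells us nothing, so a tree builder using this criterion would split on outlook first.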

Activities

Related Videos

Text Mining

10/14, 10/21 for 6 hours

Text mining operates as a method for gleaning essential insights from unstructured textual data, commonly employing Natural Language Processing (NLP) techniques such as lexical and syntactic analysis, and inference methods.

In this session, advanced computational methodologies like the Word2Vec algorithm will be discussed, highlighting its role in mapping word relationships through vector spaces. The conversation will also introduce Transformers, which enable efficient sequence processing, and Large Language Models, renowned for their expansive text generation and comprehension capabilities. A segment on ChatGPT will illustrate its significance in modern applications such as chatbots and content creation, underscoring the current innovations in the text mining domain.

Related Videos

Activities

Lab for Deep Information Retrieval and Neural Word Embeddings

10/28 for 3 hours

This lab session centers on hands-on practice, guiding participants through the use of information retrieval techniques for modeling, training, and classifying textual data. The session offers practical exposure to neural embedding models such as word2vec, doc2vec, and FastText. Participants will also work with traditional text classification approaches such as KNN, SVM, and Naive Bayes, building a comprehensive, practice-oriented understanding of the techniques used in the field.

Activities

  • Class is offered on YouTube only
  • Assignment Two should be submitted before Nov 26

DM Clustering & Project Progress Report

11/4, 11/11 for 6 hours

Cluster analysis serves as a technique to group objects such that those within the same cluster exhibit higher similarity to each other compared to those housed in separate clusters. Initially embraced within the realms of pattern recognition and signal processing, these clustering strategies have expanded their influence into many other domains. This session will present a deep dive into a range of clustering techniques, emphasizing key algorithms such as K-Means for partitioning, Hierarchical Clustering which forms a tree of clusters, Density-Based Clustering that groups together points with sufficient proximity, and aspects of Cluster Validity which assesses the quality and reliability of the clusters formed. This discussion aims to furnish attendees with a robust understanding of these pivotal clustering algorithms and their practical applications.
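As a rough sketch of the K-Means idea described above, the following plain-Python version alternates between assigning each point to its nearest centroid and recomputing the centroids. It is a simplified teaching sketch under stated assumptions (random initial centroids drawn from the data, squared Euclidean distance, a fixed iteration cap), not the implementation used in the course. The function name, parameters, and sample points are hypothetical.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Very small K-Means: returns (centroids, clusters) for tuples of numbers."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: start from k random data points
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                new_centroids.append(
                    tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
                )
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments are stable, so stop early
        centroids = new_centroids
    return centroids, clusters

# Hypothetical sample: two well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

On data that is less cleanly separated, the result depends on the initial centroids, which is one reason cluster validity measures matter.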

Related Videos

Activities

  • 11/4 Project Progress Report

Association Rules

11/18, 11/25 for 3 hours

Association rule learning identifies meaningful relationships between variables in large datasets, using interestingness measures such as support and confidence to pinpoint strong rules. This session will provide a succinct introduction to the core concepts of association rules, along with an overview of the Frequent Pattern Growth algorithm and key techniques for Pattern Evaluation. Participants will be equipped with the knowledge to apply these techniques effectively in real-world scenarios.
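As a small illustration of how rule strength can be measured, the sketch below computes two standard measures for a rule X -> Y: support (the fraction of transactions containing both X and Y) and confidence (the fraction of transactions containing X that also contain Y). It only evaluates a given rule by brute force and is not the Frequent Pattern Growth algorithm. The transactions and function names are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent); assumes the antecedent occurs at least once."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical market-basket data, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

print("support({bread, milk}):", support(transactions, {"bread", "milk"}))
print("confidence(bread -> milk):", confidence(transactions, {"bread"}, {"milk"}))
```

Here bread and milk appear together in 3 of 5 transactions, and 3 of the 4 transactions containing bread also contain milk.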

Activities

  • Classes are also offered on Teams
  • Reading: J. Park, M. Chen, and P. Yu. An effective hash-based algorithm for mining association rules. SIGMOD'95
  • Reading: J. Han, J. Pei, Y. Yin, and R. Mao. Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Data Mining and Knowledge Discovery, Springer, 2004
  • Reading: N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed itemsets for association rules. ICDT'99, 398-416, Jerusalem, Israel, Jan. 1999
  • Reading: R. Srikant and R. Agrawal. Mining quantitative association rules in large relational tables. SIGMOD'96

Related Videos

Activities

  • 11/18 Project Progress Report

Examination

12/2 for 3 hours

Time to evaluate. Unlike most examinations, this one is not meant to assess how much you remember. It is more important to know how much you understand. Hence, each student may bring one A4 page with any kind of notes into the classroom. Enjoy.

Notes

  • Students may bring one A4 page of notes with them
  • The locations will be announced by email

Student Presentation & Discussion

12/9 for 3 hours

Participants will engage in a collaborative exploration of a specified paper using the Jigsaw reading approach. Each student will be entrusted with understanding a particular section of the paper in depth, with the goal to elucidate their findings to group members. This initiative encourages not only a profound individual comprehension of the material but also fosters a synergistic learning environment, where aiding group members in grasping complex concepts becomes paramount. It’s a step towards nurturing a learning community where knowledge is mutually shared and amplified through collaborative discussion.

Activities

  • TAs will give the assignment.

Final Project Demo

12/16 for 3 hours

The final project demonstration brings together what you have learned, analyzed, and built, and stands as a testament to your grasp of data mining principles throughout this course. Through this project, participants can also gain valuable experience in collaborative teamwork.

Requirements

  • Each group should produce a 4-minute YouTube clip to show in class
  • The final project requirements will be sent by email