Data Mining

Theory & Lab

Course Description

This course gives students an appreciation of how data mining software is used to solve business decision problems. Students will learn the theoretical background of several commonly used data mining techniques, study how data mining is applied in practice, and acquire practical skills in using data mining algorithms.

Key topics include data visualization and pre-processing, data mining in practice, models and patterns, classification trees, predictive modeling, descriptive modeling, classification models, and clustering.


Course Objectives

This course aims to achieve the following objectives:

  • Understanding the theoretical foundations and methodologies of data mining.
  • Applying data mining techniques and algorithms to real-world datasets.
  • Evaluating and innovating data mining approaches for practical and research applications.

Instructor Information

Instructor: Md. Riad Hassan
Office: A-510
Email: riad@cse.green.edu.bd

Course Outcomes (COs)

Upon successful completion of this course, students will be able to:

  • CO1: Apply knowledge of the various methods that make up the knowledge discovery process to address crucial issues coherently and precisely when solving real-world cases and studies.
  • CO2: Formulate the value and applications of knowledge discovery, interpret its problems, and synthesize valid conclusions.
  • CO3: Establish a foundation in data mining and machine learning methods for engaging independently with real-world contexts of technological change.

Topic Outline

Lecture  Topic
1-3      Introduction to Data Mining, Knowledge Discovery Process, Data Types, Applications
4-7      Data Types, Statistics of Data, Similarity Measures, Data Quality, Data Cleaning, Data Integration
8-9      Data Transformation, Dimensionality Reduction (PCA)
10-12    Pattern Mining: Basic Concepts, Frequent Itemset Mining, Apriori Algorithm
13-14    Mining Frequent Patterns with Pattern Growth Approach
15       Pattern Evaluation Methods
         Midterm Exam
16-19    Classification: Basic Concepts, Decision Trees, Attribute Selection, Tree Pruning
20-21    Model Evaluation and Selection (Metrics, Cross-validation, Bootstrap)
22-23    Ensemble Methods (Bagging, Boosting, Random Forests), Class-imbalanced Data
24-25    Support Vector Machines (Linear, Nonlinear), Kernel Functions
26-27    Classification with Weak Supervision (Active Learning, Transfer Learning)
28-29    Cluster Analysis: Partitioning Methods (k-Medoids, k-Modes)
30-31    Hierarchical Clustering, Density-based Clustering (DBSCAN), Clustering Evaluation
         Final Exam

Text and Reference Materials


Assessment Methods

The final grade will be calculated based on the following components:

Assessment Method                          CO1   CO2   CO3   Total
Final Exam                                 20%   20%   -     40%
Midterm Exam                               20%   10%   -     30%
Class Tests (Best 2 out of 3)              10%   -     -     10%
K/S/A Test 1 (Attendance + Presentation)   -     10%   -     10%
K/S/A Test 2 (Complex Assignment)          -     -     10%   10%
Total                                      50%   40%   10%   100%
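
As an illustration only, the weights in the table above can be combined into a final mark as in the following Python sketch. The component names, the student's scores, and the rule of averaging the best two class tests are assumptions made for this example; the syllabus does not state how the best two tests are combined.

```python
# Component weights (percent of the final grade), taken from the table above.
WEIGHTS = {
    "final_exam": 40,
    "midterm_exam": 30,
    "class_tests": 10,
    "ksa_test_1": 10,
    "ksa_test_2": 10,
}


def class_test_score(scores):
    """Average of the best two class-test scores, each a fraction in [0, 1].

    Averaging is an assumption; the syllabus only says "Best 2 out of 3".
    """
    best_two = sorted(scores, reverse=True)[:2]
    return sum(best_two) / len(best_two)


def final_grade(fractions):
    """Weighted total out of 100; `fractions` maps component -> fraction earned."""
    return sum(WEIGHTS[name] * fractions[name] for name in WEIGHTS)


# Hypothetical student, not real data.
example = {
    "final_exam": 0.75,
    "midterm_exam": 0.80,
    "class_tests": class_test_score([0.6, 0.9, 0.7]),
    "ksa_test_1": 1.0,
    "ksa_test_2": 0.5,
}
total = final_grade(example)
print(f"Final grade: {total:.1f} / 100")
```

For this hypothetical student the components contribute 30, 24, 8, 10, and 5 marks respectively, for a total of about 77 out of 100.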

Course Policies

  • Assignments: Late submissions lose one mark for each day they are late.
  • Class Tests: Two tests will be conducted. One makeup test will also be conducted for those who miss a class test with reasonable cause.