IFN645 Large Scale Data Mining




Unit Outline: Semester 2 2024, Gardens Point, Internal

Unit code: IFN645
Credit points: 12
Pre-requisite: (IFN509 or IFQ509, and enrolment in IN20 or IN27) or (IFN509 or IFQ509, and enrolment in IV53, IV55 or IV56) or (192 cps in IV04, or enrolment in IV54).
Equivalent: INN312
Coordinator: Yue Xu | yue.xu@qut.edu.au
Disclaimer: The offer of some units is subject to viability, and information in these Unit Outlines is subject to change prior to the commencement of the teaching period.

Overview

The data that modern data scientists have access to is larger and more complex than in previous generations. Dealing with such data requires specialised algorithms and a high-performance or cloud computing environment. This unit outlines the challenges and opportunities associated with big data and introduces data mining algorithms that scale to large datasets. It expands on the material presented in earlier data mining units, and students will use their programming knowledge to implement data mining algorithms in high-performance computing environments.

Learning Outcomes

On successful completion of this unit you will be able to:

  1. Discuss the challenges and opportunities associated with big data and the inability of some existing data mining algorithms to scale to these challenging problems.
  2. Critically assess the strengths and limitations of big data algorithms, and choose appropriate algorithms for different classes of problems.
  3. Develop data analytics solutions that apply data mining algorithms tailored to big data to solve real-world problems.
  4. Work individually or in a team to implement data analytics solutions and report the findings in writing to specialist and non-specialist audiences.

Content

In this unit you will learn about the:

  • Challenges of big data (e.g. volume, velocity, veracity, variability)
  • Various types of big data (e.g. text, numeric, images, videos)
  • Data mining algorithms that scale to big data
  • Implementation of big data analytics solutions in a high-performance computing environment

Learning Approaches

This unit will be delivered through the following means:

  • Lectures (2 hours), which provide the theoretical basis of the unit
  • Practicals (2 hours), which allow you to apply theory to practical (industry data-driven) problems using available software tools

The learning process will focus on real-world scenarios. Emphasis will be placed on theoretical work, laboratory exercises and case studies. The exercises are designed to reinforce key concepts and to assist in the completion of assessments. Problem-solving assessments will be drawn from typical industry applications and real-world data sources.

Feedback on Learning and Assessment

Written feedback will be provided by teaching staff for Assessment Items 1 and 2. Informal feedback will be provided by teaching staff and peers in the weekly practicals to help you prepare for the formal assessments.

Assessment

Overview

The assessments in this unit are designed for you to demonstrate a critical understanding of the data mining concepts covered in the lectures, as well as your ability to apply these concepts, as practised in the practicals, to real-world settings. The written examination will allow you to demonstrate your understanding of the methods and challenges associated with data mining.
Assessment criteria will be made available to you at the introduction of each assessment.

Unit Grading Scheme

7-point scale

Assessment Tasks

Assessment: Problem Solving Exercises

Short answer problem solving exercises addressing key components of the unit.

This assignment is eligible for the 48-hour late submission period and assignment extensions.

Weight: 25%
Individual/Group: Individual
Due (indicative): Throughout the semester
Related Unit learning outcomes: 1, 2, 3

Assessment: Capstone Assignment

Implement a large-scale data mining project on a high-performance computing environment.

This assignment is eligible for the 48-hour late submission period and assignment extensions.

Weight: 35%
Individual/Group: Either group or individual
Due (indicative): Towards end of semester
Related Unit learning outcomes: 3, 4

Assessment: Examination (Written)

Final Examination

Weight: 40%
Individual/Group: Individual
Due (indicative): Central Examination Period
Related Unit learning outcomes: 1, 2

Academic Integrity

Students are expected to engage in learning and assessment at QUT with honesty, transparency and fairness. Maintaining academic integrity means upholding these principles and demonstrating valuable professional capabilities based on ethical foundations.

Failure to maintain academic integrity can take many forms. It includes cheating in examinations, plagiarism, self-plagiarism, collusion, and submitting an assessment item completed by another person (e.g. contract cheating). It can also include providing your assessment to another entity, such as a person or website.

You are encouraged to make use of QUT’s learning support services, resources and tools to assure the academic integrity of your assessment. This includes the use of text matching software that may be available to assist with self-assessing your academic integrity as part of the assessment submission process.

Further details of QUT’s approach to academic integrity are outlined in the Academic integrity policy and the Student Code of Conduct. Breaching QUT’s Academic integrity policy is regarded as student misconduct and can lead to the imposition of penalties ranging from a grade reduction to exclusion from QUT.

Resources

No extraordinary charges or costs are associated with the requirements for this unit.

Risk Assessment Statement

There are no unusual risks associated with this unit.