Data and Visual Analytics

Pick your own topic: You need to justify that the topic is interesting, relevant to the course, and of suitable difficulty.
Required components: (1) at least one large dataset, (2) some non-trivial analysis/algorithms/computation (e.g., clustering, classification) performed on the dataset, and (3) an interactive user interface that interacts with the algorithms (can be visual, voice-controlled, on tablet, desktop, etc.).
Harder way:
Joint projects with other courses might be negotiable. You must obtain the instructor's approval, and you need to clarify exactly what steps will be done for this course and what will be done for the other course.
Projects related to your dissertation or master's project are also possible, as long as there is no 'double-dipping', i.e., you clearly specify what the project will do in addition to what you were planning to do for your thesis anyway.
Once you have selected a topic, you should do some background reading so that you are capable of describing, in some detail, what you expect to accomplish. For example, if you decide that you want to implement some new proposal for a multidimensional file structure, you will have to carefully read the papers that propose similar structures, pinpoint their weaknesses, and explain how your approach will address these weaknesses. Once you have read up on your topic, you will be ready to write your proposal.

Your proposal should answer Heilmeier's questions (all 9 of them) whenever possible; if you think a question is not very relevant, briefly explain why. In other words, your proposal should describe what you plan to do (the problem to address), why you want to do it, how you will do it (what tools? e.g., SQLite, PostgreSQL, Hadoop, Kinect, iPad, etc.), how your approach is better than the state of the art, why it may succeed, what difference it will make if it does, how you will measure success, how long it will take, etc.

You must describe what portion of the project each team member will be doing.

Your proposal should be fewer than 1000 words (excluding references, titles, etc.), 12pt font, typed (e.g., latex/pdf/msword), and with pictures if useful. It should be self-contained. For example, don't just say: "We plan to implement Smith's Foo-Tree data structure [Smith86], and we will study its performance." Instead, you should briefly review the key ideas in the references, and describe clearly the alternatives that you will be examining.

Grading scheme & Submission instructions
60% for the survey
30% for innovation
10% for plan of activities
For every Heilmeier question that's not mentioned, deduct 5%.
You may consider organizing your proposal based on the Heilmeier questions (e.g., each section addresses one question)
Your survey should have at least 3 papers or book chapters per group member (outside of the reading list). Short papers, such as PNAS, Nature, or Science papers, count as 0.5.
Copying the abstracts of the papers is prohibited; it constitutes plagiarism.
For each paper, describe

(a) the main idea,

(b) why (or why not) it will be useful for your project, and

(c) its potential shortcomings, that you will try to improve upon.
Clear problem definition: give a precise formal problem definition, in addition to a jargon-free version (for Heilmeier question #1).
Provide a plan of activities and time estimates, per group member. List what each group member has done, and will do.
Team's contact person submits a softcopy via T-Square (i.e., that person submits for the whole team)
[-5% if not included] Distribution of team member effort. Can be as simple as "all team members contributed a similar amount of effort". If effort distribution is too uneven, I may assign higher scores to members who contributed more.
Proposal Presentation [!!! tentative !!!]
Polo is figuring out the details, since we may not have enough time for all teams to present in class, and is also figuring out how DL may participate.

3 min per team. See T-Square for your team's presentation date and time.

2.5 min maximum for presentation
0.5 min for Q&A + transition to next team
Time limit strictly enforced! You'll be booted off the podium when time is up.

Don't use too many slides; less is more! Fewer slides mean it's less likely that you will overrun. Being succinct is hard, so practice your timing and delivery!

Make sure you answer the Heilmeier questions and briefly mention your survey, expected innovation, plan of activities, etc. The presentation will be graded similarly to the proposal writeup.

Progress Report
This should be fewer than 1600 words, 12pt font, typed.

It mainly serves as a checkpoint, to detect and prevent dead-ends and other problems early on.

It should consist of the same sections as your final report (introduction, survey, etc), with a few sections "under construction", describing the work performed up to then, and the revised plans for the whole project.

Specifically, the introduction and survey sections should be in their final form; the section on the proposed method should be almost finished; the sections on the experiments and conclusions will have whatever results you have obtained, as well as placeholders for the results you plan/hope to obtain.

Grading scheme & Submission instructions
70% for proposed method (should be almost finished)
25% for the design of upcoming experiments / evaluation
5% for plan of activities (in an appendix, please show the old one and the revised one, along with the activities of each group member)
Clear list of innovations: give a list of the best 2-4 ideas that your approach exhibits.
Team's contact person submits a softcopy via T-Square (progress report only)
[-5% if not included] Distribution of team member effort. Can be as simple as "all team members contributed a similar amount of effort". If effort distribution is too uneven, I may assign higher scores to members who contributed more.
Final Poster Presentation [!!! tentative !!!]
Each project team presents one poster, on either Tuesday or Thursday of the last week of class (assigned by the instructor).

Your team needs to plan ahead, designing and printing the poster days *before* your presentation day, to avoid a last-minute rush.

The poster must be at least 20 inches (width) by 30 inches (height). Foam core poster boards will be provided to mount the poster.

Each team will have a few minutes (exact amount to be decided) to present the poster.

At least one project member, or a significant, pre-arranged subset of the team, should be present during the poster hours.

Demo: it is optional but encouraged. If you do give a demo, please bring your own laptop (and everything else necessary: ethernet cable, power adaptors, etc.).

Who will attend: We plan to open the sessions up to everybody.

Your poster should cover:

Motivation/Introduction: remind us what you're doing, why it's important and why we should care
Your approaches (algorithm and visualization): what they are, the intuition behind them, why they work, etc.
Your data: where you got it and what its characteristics are (e.g., size on disk, # of records, temporal or not, etc.)
Experiments and results: how did you evaluate your approaches? What are the results? How do your methods compare to other methods (if any)?
Conclusions (and optionally future work/discussion)
Grading scheme: 2% for each of the above points, plus 5% for presentation delivery (e.g., good slides? Did you practice?)

Final Report
It will be a detailed description of what you did, what results you obtained, and what you have learned and/or can conclude from your work.


Writeup: fewer than 2800 words, 12pt font, typed. Describe in depth the novelties of your approach and your discoveries/insights/experiments, etc.  
Software: packaging, documentation, and portability. The goal is to provide enough material, so that other people can use it and continue your work.
Grading scheme & Submission instructions
Writeup
[2%] Introduction - Motivation
[3%] Problem definition

[5%] Survey
Proposed method
[10%] Intuition - why should it be better than the state of the art?
[35%] Description of your approaches: algorithms, user interfaces, etc.
Experiments/Evaluation
[5%] Description of your testbed; list of questions your experiments are designed to answer
[25%] Details of the experiments; observations (as many as you can!)
[5%] Conclusions and discussion
[-5% if not included] Distribution of team member effort. Can be as simple as "all team members contributed a similar amount of effort". If effort distribution is too uneven, I may assign higher scores to members who contributed more.
Team's contact person submits one zip file, via T-Square, that contains the following (software + writeup softcopy) [10%]:
a concise, short README.txt file, corresponding to the "user's manual". This file should describe the package in a few paragraphs, how to install it, how to use it, and how to run a demo.
a DOC directory, with your writeup, and your presentation slides (in your favorite form: latex, pdf, powerpoint, ms-word)
make sure that your package includes only the absolutely necessary set of files!

Due Dates
As announced on the course homepage

Based on materials by Prof. Christos Faloutsos