CIS 375 BUSINESS DATA MINING
For your group project, your team has to include the following three components in your presentation:
1) The conceptual model. You will describe in detail a business task that can be addressed with the data mining concepts,
processes, and techniques that we study in this course. The task could come from your current job, from an industry that
interests you, or from Kaggle or another data science competition platform. You can research how data mining is applied to a
particular business task in a particular industry, or you can propose a new business task for which a data mining solution is not yet
known or written about. In either case, you will likely have to use your own creativity to flesh
out parts of the process where the details are not documented.
2) The hands-on work. You will use Python to mine actual data for a problem of interest which you described in the
conceptual model. These could be data from a problem from your current job, something of interest to the school, data
acquired from the web (e.g., https://www.kaggle.com/), etc. You will design the data mining task, mine the data, and
describe your results. You will also research existing solutions to the problem, if any have been proposed or documented.
Your own data and results need not be on par with actual industry results; the goal is for you to get as realistic a hands-on
experience as possible, given the constraints of what you have learned.
3) Discussion. You will discuss how mining the data solved the problem you described earlier and provide actionable insights.
In addition, you will discuss potential limitations of the current approach with respect to the data and methods employed and suggest how
future work could address them.
In presenting your project, think of yourselves as analysts employed by or retained by a company (large or small) or by a
funding source (e.g., a VC firm or incubator), who wants to understand the state of the art for using data mining for the task
in question. Review what has been done to date on your problem. Don’t worry too much about coming up with a novel idea.
It is more important to develop the idea well (within the scope of what we’ve discussed in class) and execute it using Python.
You should use the “data mining process” to structure your project. Keep in mind that it may be ineffective simply to
proceed linearly through the steps, and this may need to be reflected in your analysis. You should interact with me starting with the
preparation of your initial ideas, as a consulting group would interact with a firm or funding source in preparing a research
report. Use your imagination or prior experience, or ask me for help, to fill in any gaps between the material available and what
you would be able to find out if you could actually interact with the client firm.
Deliverable #1: On 11/22 and 11/24, you will live-present to the class the conceptual part of your group project work. Each
group will have 5 minutes. Include in your presentations your ideas about: What is your conceptual model? What is the exact
business problem? What precisely is the data mining problem?
Deliverable #2: By 11/28, your group should submit its presentation slides (one submission per group) to the Canvas group
project section. You are obliged to present the submitted version during your in-class presentation.
Deliverable #3: On 11/29 and 12/1, you will live-present to the class the final outcome of your project. Each group will have
12 minutes (hard limit). All group members should contribute to the presentation. Include in your presentations your ideas
about: Is it supervised or unsupervised? What is a data instance? What might be the target variable? What features would be
useful? Which models do you execute? How do you compare their results? Which one is the winning model? How exactly
would it add business value?
You will get the most out of the project if you interact with me during the development of your ideas. Talk to me especially
before choosing one of the business problems we cover in class (see the syllabus). And please feel free to come talk to me
about your ideas as often as you’d like.
Your presentation should include the information detailed below, in approximately the order given. Your
presentation need not have corresponding sections or bullet points, but I should be able to find the information. Be as
precise/specific as you can.
Business Understanding (take this seriously)
• Identify, define, and motivate the business problem that you are addressing.
• How (precisely) will a data mining solution address the business problem?
Data Understanding
• Identify and describe the data (and data sources) that will support data mining to address the business problem.
Include those aspects of the data that we routinely discuss in class and/or in the homework assignments.
Data Preparation
• Specify how these data are integrated to produce the format required for data mining.
(Note: data preparation can be time-consuming. Get started early. Talk to the Prof or TA if you need advice.)
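A minimal sketch of the integration step, assuming two hypothetical source tables (customers and their transactions; every column name and value here is invented for illustration). The many-rows-per-customer table is aggregated and then left-joined so that each data instance is one customer, with the target variable kept as its own column. It uses only the standard library; in practice this step is usually done with pandas `groupby` and `merge`.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical source tables; names and values are illustrative only.
customers = [
    {"customer_id": 1, "tenure_months": 24, "churned": 0},
    {"customer_id": 2, "tenure_months": 3, "churned": 1},
    {"customer_id": 3, "tenure_months": 12, "churned": 0},
]
transactions = [
    {"customer_id": 1, "amount": 40.0},
    {"customer_id": 1, "amount": 25.5},
    {"customer_id": 3, "amount": 10.0},
]


def build_instances(customers: List[Dict], transactions: List[Dict]) -> List[Dict]:
    # Aggregate the transaction table down to one summary per customer.
    totals: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for t in transactions:
        totals[t["customer_id"]] += t["amount"]
        counts[t["customer_id"]] += 1

    # Left join onto customers: customers with no transactions get zeros
    # instead of silently disappearing from the data set.
    instances = []
    for c in customers:
        cid = c["customer_id"]
        instances.append({
            "customer_id": cid,
            "tenure_months": c["tenure_months"],
            "n_transactions": counts.get(cid, 0),
            "total_spend": totals.get(cid, 0.0),
            "churned": c["churned"],  # target variable (hypothetical)
        })
    return instances


for row in build_instances(customers, transactions):
    print(row)
```

The left join is the deliberate choice here: an inner join would drop customer 2, who has no transactions and is exactly the kind of instance a churn model needs to see.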
Modeling
• Specify the type of model(s) built and/or patterns mined.
• Discuss choices for data mining algorithm: what are alternatives, and what are the pros and cons?
• Discuss why and how this model should “solve” the business problem (i.e., improve along some dimension of
interest to the firm).
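One way to compare candidate models and pick a winner is sketched below in plain Python under stated assumptions. The toy data set is synthetic (label 1 when the two features sum to more than 1). The two candidates, a majority-class baseline and a 1-nearest-neighbor classifier, stand in for whatever models your group actually runs. Each model is scored with k-fold cross-validation, so every instance is held out exactly once. The baseline shows what "no model" would achieve.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Row = Tuple[List[float], int]          # (feature vector, class label)
Model = Callable[[List[float]], int]   # a fitted model maps features to a label


def majority_baseline(train: Sequence[Row]) -> Model:
    # Always predicts the most common training label; a useful model must beat this.
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def one_nearest_neighbor(train: Sequence[Row]) -> Model:
    # Predicts the label of the closest training instance (squared Euclidean distance).
    def predict(x: List[float]) -> int:
        _, label = min(train, key=lambda r: sum((a - b) ** 2 for a, b in zip(r[0], x)))
        return label
    return predict


def k_fold_accuracy(data: Sequence[Row],
                    fit: Callable[[Sequence[Row]], Model],
                    k: int = 5) -> float:
    # Interleaved folds: every instance lands in exactly one test fold.
    folds = [list(data[i::k]) for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        model = fit(train)
        correct = sum(model(x) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / k  # mean held-out accuracy across folds


# Hypothetical toy data on an 11 x 11 grid: label 1 when the features sum to more than 1.
data = [([i / 10, j / 10], int(i + j > 10)) for i in range(11) for j in range(11)]

scores = {name: k_fold_accuracy(data, fit)
          for name, fit in [("majority baseline", majority_baseline),
                            ("1-NN", one_nearest_neighbor)]}
winner = max(scores, key=scores.get)  # highest mean held-out accuracy

for name, acc in scores.items():
    print(f"{name}: mean 5-fold accuracy = {acc:.3f}")
print("winning model:", winner)
```

In a real project you would more likely use scikit-learn (for example `cross_val_score`). You would also score with a metric tied to the business problem rather than raw accuracy, which can mislead when classes are imbalanced.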
Evaluation
• Discuss how the result of the data mining is/should be evaluated. How should a business case be developed to
project expected improvement? ROI? If this is impossible/very difficult, explain why and identify any viable
alternatives.
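A business case can often be framed as an expected-value calculation: estimate the probability of each outcome from held-out results, attach a dollar value to each outcome, and compare the model against a simple alternative such as contacting everyone. The sketch below assumes a hypothetical targeted campaign. The confusion-matrix counts, the $30 margin per responder, and the $1 contact cost are all invented for illustration and are not real results.

```python
from typing import Dict

# Hypothetical held-out confusion-matrix counts for 1,000 customers (not real results).
model_counts = {"TP": 40, "FP": 160, "FN": 20, "TN": 780}
# Alternative without a model: contact everyone, so responders are TPs, the rest FPs.
target_all_counts = {"TP": 60, "FP": 940, "FN": 0, "TN": 0}

# Hypothetical economics: $30 margin per responder, $1 to contact any customer.
MARGIN, CONTACT_COST = 30, 1
values = {"TP": MARGIN - CONTACT_COST, "FP": -CONTACT_COST, "FN": 0, "TN": 0}


def expected_value(counts: Dict[str, int], values: Dict[str, int]) -> float:
    # EV per customer = sum over outcomes of p(outcome) * value(outcome),
    # with p(outcome) estimated from the held-out counts.
    n = sum(counts.values())
    return sum(counts[k] * values[k] for k in counts) / n


def roi(counts: Dict[str, int]) -> float:
    # Net campaign profit divided by the amount spent contacting customers.
    spend = (counts["TP"] + counts["FP"]) * CONTACT_COST
    net = counts["TP"] * MARGIN - spend
    return net / spend


for name, counts in [("model", model_counts), ("target everyone", target_all_counts)]:
    print(f"{name}: EV per customer = ${expected_value(counts, values):.2f}, "
          f"ROI = {roi(counts):.2f}")
```

The comparison that matters for the business case is the difference between the two rows, not the model's numbers in isolation. Under these assumptions the model earns $0.20 more per customer than contacting everyone.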
Deployment
• Discuss how the result of the data mining will be deployed.
• Discuss any issues the firm should be aware of regarding deployment.
• Are there important ethical considerations?
• Identify the risks associated with your proposed plan and how you would mitigate them.