Some Formal Methodologies

Data mining and knowledge discovery are important terms for data analysts. This section gives some pointers to more formal approaches to that process. Data mining means discovering patterns in data: we look for relationships among the variables. We can use those patterns to verify what we already believe or to gain new knowledge.

Brief Overview

CRISP-DM

CRISP-DM stands for the Cross-Industry Standard Process for Data Mining. It is a multi-step process, usually presented as a sequence in which each step leads to the next: business understanding, data understanding, data preparation, modelling, evaluation, deployment. In practice, analysts often loop back to an earlier step when a later one raises new questions.

CRISP-DM stands out from other methodologies partially because it includes a step at the beginning that looks into the context of why the data is there and what it could be useful for. CRISP-DM emphasises that data mining techniques are in a sense strategic, in that they are often used by competing businesses to gain a competitive advantage. The business understanding step makes this explicit.

KDD

KDD (Knowledge Discovery in Databases) is one of the older processes, established in 1996. Its steps are Selection, Pre-Processing, Transformation, Data Mining, and Interpretation/Evaluation.
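To make the five KDD steps concrete, here is a minimal sketch in Python with one small function per step. The records, field names and the "bikes per 1,000 inhabitants" measure are invented for illustration; they do not come from any real dataset or from the KDD literature.

```python
from statistics import mean

# Hypothetical toy records (invented for this sketch).
RAW = [
    {"city": "A", "population": 1000, "bikes": 120, "note": "ok"},
    {"city": "B", "population": 2000, "bikes": 260, "note": "ok"},
    {"city": "C", "population": None, "bikes": 90, "note": "missing"},
    {"city": "D", "population": 4000, "bikes": 500, "note": "ok"},
]

def select(rows):
    """Selection: keep only the fields relevant to the question."""
    return [{"population": r["population"], "bikes": r["bikes"]} for r in rows]

def preprocess(rows):
    """Pre-Processing: drop records that contain missing values."""
    return [r for r in rows if None not in r.values()]

def transform(rows):
    """Transformation: derive bikes per 1,000 inhabitants."""
    return [r["bikes"] * 1000 / r["population"] for r in rows]

def mine(rates):
    """Data Mining: here just a summary of the derived rate."""
    return {"mean": mean(rates), "low": min(rates), "high": max(rates)}

def interpret(pattern):
    """Interpretation/Evaluation: turn the pattern into a readable finding."""
    return (f"About {pattern['mean']:.0f} bikes per 1,000 people "
            f"(range {pattern['low']:.0f} to {pattern['high']:.0f}).")

rates = transform(preprocess(select(RAW)))
pattern = mine(rates)
finding = interpret(pattern)
print(finding)
```

Real projects would replace the summary in `mine` with an actual pattern-finding technique, but the order of the steps stays the same.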

SEMMA

SEMMA (Sample, Explore, Modify, Model, Assess) is another five-step process. The idea is to first gain an understanding of the data by exploring a sample of it, then modify the data as needed, build a model and assess how well it works.
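The SEMMA steps can be sketched the same way. The example below is only an illustration under stated assumptions: the data is synthetic (y = 2x + 1 with one missing value), the function names are hypothetical, and a simple least-squares line stands in for whatever model a real project would use.

```python
import random

# Synthetic data (an assumption for this sketch): y = 2*x + 1, one value missing.
population = [(x, 2 * x + 1) for x in range(100)]
population[10] = (10, None)

def sample(rows, k, seed=0):
    """Sample: draw a manageable random subset (seeded for repeatability)."""
    return random.Random(seed).sample(rows, k)

def explore(rows):
    """Explore: basic counts that reveal problems such as missing values."""
    return {"rows": len(rows), "missing": sum(1 for _, y in rows if y is None)}

def modify(rows):
    """Modify: drop rows with missing values."""
    return [(x, y) for x, y in rows if y is not None]

def model(rows):
    """Model: fit y = slope * x + intercept by ordinary least squares."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def assess(fit, rows):
    """Assess: mean absolute error of the fitted line on the given rows."""
    slope, intercept = fit
    return sum(abs(slope * x + intercept - y) for x, y in rows) / len(rows)

subset = sample(population, 30)
summary = explore(subset)
clean = modify(subset)
fit = model(clean)
error = assess(fit, modify(population))
print(summary, fit, error)
```

Fitting on the sample and assessing against the full cleaned data is a design choice made here to keep the sketch short; a real assessment would normally use data held out from the modelling step.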

The Upshot

These methodologies all formalise what we've been doing intuitively: finding decent target data, tweaking it to meet our needs and then building interesting products out of it.

Further Reading

Azevedo, Ana and Santos, Manuel F. (2008) “KDD, SEMMA and CRISP-DM: A parallel overview” http://www.iadis.net/dl/final_uploads/200812P033.pdf


Source: Some formal methodologies

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution 3.0 License