Automating the Design of Data Mining Algorithms: An Evolutionary Computation Approach by Gisele L. Pappa

By Gisele L. Pappa

Data mining is a very active research area with many successful real-world applications. It consists of a set of concepts and methods used to extract interesting or useful knowledge (or patterns) from real-world datasets, providing valuable support for decision making in industry, business, government, and science. Although there are already many types of data mining algorithms available in the literature, it is still difficult for users to choose the best possible data mining algorithm for their particular data mining problem. In addition, data mining algorithms have been manually designed; therefore they incorporate human biases and preferences. This book proposes a new approach to the design of data mining algorithms. Instead of relying on the slow and ad hoc process of manual algorithm design, this book proposes systematically automating the design of data mining algorithms with an evolutionary computation approach. More precisely, we propose a genetic programming system (a type of evolutionary computation method that evolves computer programs) to automate the design of rule induction algorithms, a type of classification method that discovers a set of classification rules from data. We focus on genetic programming in this book because it is the paradigmatic type of machine learning method for automating the generation of programs and because it has the advantage of performing a global search in the space of candidate solutions (data mining algorithms in our case), but in principle other types of search methods for this task could be investigated in the future.

Best data modeling & design books

Medical Imaging and Augmented Reality Second International Workshop

This scholarly set of well-harmonized volumes provides indispensable and complete coverage of the exciting and evolving subject of medical imaging systems. Leading experts on the international scene tackle the latest cutting-edge techniques and technologies in an in-depth but eminently clear and readable approach.


Metaheuristics exhibit desirable properties like simplicity, easy parallelizability, and ready applicability to different types of optimization problems. After a comprehensive introduction to the field, the contributed chapters in this book include explanations of the main metaheuristic techniques, including simulated annealing, tabu search, evolutionary algorithms, artificial ants, and particle swarms, followed by chapters that demonstrate their applications to problems such as multiobjective optimization, logistics, vehicle routing, and air traffic management.

Extra resources for Automating the Design of Data Mining Algorithms: An Evolutionary Computation Approach

Example text

This does not mean that the algorithm would be good at class predictions. Rather, it means that the standard classification accuracy rate is too easy to maximize when the class distribution is very unbalanced, so a more challenging measure of predictive accuracy should be used in such cases. Other popular measures for evaluating predictive accuracy are sensitivity and specificity, and metrics based on Receiver Operating Characteristic (ROC) analysis. Sensitivity is the number of examples correctly classified in the positive class (TP) divided by the total number of positive examples present in the test set (TP + FN).
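The point about unbalanced class distributions can be illustrated with a small sketch. The function names and the toy dataset below are assumptions made for illustration, not taken from the book. Sensitivity follows the TP / (TP + FN) definition given in the text, and specificity uses the standard counterpart TN / (TN + FP).

```python
from typing import List, Tuple


def confusion_counts(y_true: List[str], y_pred: List[str],
                     positive: str) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) for a two-class problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def sensitivity(tp: int, fn: int) -> float:
    # True positive rate: TP / (TP + FN)
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn: int, fp: int) -> float:
    # True negative rate: TN / (TN + FP)
    return tn / (tn + fp) if (tn + fp) else 0.0


# Hypothetical unbalanced test set: 95 negative and 5 positive examples.
y_true = ["neg"] * 95 + ["pos"] * 5
# A trivial classifier that always predicts the majority class.
y_pred = ["neg"] * 100

tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive="pos")
print("accuracy   :", accuracy(tp, fp, tn, fn))
print("sensitivity:", sensitivity(tp, fn))
print("specificity:", specificity(tn, fp))
```

On this toy data the majority-class predictor reaches a high accuracy rate while its sensitivity is zero, which is why accuracy alone says little when the classes are very unbalanced.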

The main difference between PN-Rules and the traditional algorithms is that the former finds two sets of rules: a set of P-rules and a set of N-rules. [...] (i.e., examples belonging to the class predicted by the rule) in the training set.

[...] 3: LearnOneRule(R)

    bestRule = R
    candidateRules = ∅
    candidateRules = candidateRules ∪ bestRule
    while candidateRules ≠ ∅ do
        newCandidateRules = ∅
        for each candidateRule CR do
            Refine CR
            Evaluate CR
            if Refine Rule Stopping Criterion not satisfied then
                newCandidateRules = newCandidateRules ∪ CR
            if CR is better than bestRule then
                bestRule = CR
        candidateRules = Select b best rules in newCandidateRules
    return bestRule

[...] the set of examples covered by the P-rules.
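The LearnOneRule pseudocode can be sketched in Python as a beam search over rules. This is only one possible reading: the rule representation (a conjunction of attribute = value tests), the Laplace-corrected precision used to evaluate rules, the stopping criterion (stop refining rules that cover no negative examples), and the toy dataset are all assumptions introduced here, since the pseudocode leaves those components open.

```python
from typing import Dict, FrozenSet, List, Tuple

Example = Tuple[Dict[str, str], str]   # (attribute values, class label)
Rule = FrozenSet[Tuple[str, str]]      # conjunction of attribute == value tests


def covers(rule: Rule, attrs: Dict[str, str]) -> bool:
    return all(attrs.get(a) == v for a, v in rule)


def evaluate(rule: Rule, data: List[Example], target: str) -> float:
    """Laplace-corrected precision for two classes (an assumed measure)."""
    covered = [label for attrs, label in data if covers(rule, attrs)]
    positives = sum(1 for label in covered if label == target)
    return (positives + 1) / (len(covered) + 2)


def refine(rule: Rule, data: List[Example]) -> List[Rule]:
    """Specialise a rule by adding one test on an attribute it does not use."""
    used = {a for a, _ in rule}
    tests = {(a, v) for attrs, _ in data
             for a, v in attrs.items() if a not in used}
    return [rule | {t} for t in sorted(tests)]


def learn_one_rule(start: Rule, data: List[Example],
                   target: str, b: int = 2) -> Rule:
    best_rule = start
    candidate_rules = [start]
    while candidate_rules:                       # candidateRules is not empty
        new_candidates: List[Rule] = []
        for parent in candidate_rules:
            for cr in refine(parent, data):      # Refine CR
                score = evaluate(cr, data, target)          # Evaluate CR
                covered = [l for x, l in data if covers(cr, x)]
                # Assumed stopping criterion: keep refining only rules
                # that still cover at least one negative example.
                if any(l != target for l in covered):
                    new_candidates.append(cr)
                if covered and score > evaluate(best_rule, data, target):
                    best_rule = cr
        # Select the b best rules (ties broken by the rule's sorted tests).
        ranked = sorted(set(new_candidates),
                        key=lambda r: (-evaluate(r, data, target), sorted(r)))
        candidate_rules = ranked[:b]
    return best_rule


# Hypothetical toy training set.
toy_data: List[Example] = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "play"),
    ({"outlook": "rainy", "windy": "no"}, "play"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
]

print(sorted(learn_one_rule(frozenset(), toy_data, target="play")))
```

The search terminates because every refinement adds a test on an unused attribute, so the candidate set eventually becomes empty. A larger beam width b explores more alternatives per level at a higher cost.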

Examples that were not available in the original dataset and whose class is truly unknown to the user. In practice, it is the predictive accuracy on this third dataset, to be available only in the future, that will determine the extent to which the classification algorithm was successful in practice. The predictive accuracy on the test set is very interesting for academic researchers, but it is less useful to the user in practice, simply because the classes of examples in the test set are known to the user.
