Modern Data Mining Algorithms in C++ and CUDA C
- Author: Timothy Masters
- ISBN: 1484259874
- Year: 2020
- Pages: 237
- Language: English
- File size: 2.3 MB
- File format: PDF, ePub
- Category: C & C++
Book Description:
Discover a variety of data-mining algorithms that are useful for selecting small sets of important features from among unwieldy masses of candidates, or for extracting useful features from measured variables.
As a serious data miner, you will often be confronted with thousands of candidate features for your prediction or classification application, most of them of little or no value. You'll know that many of these features may be useful only in combination with certain other features while being practically worthless alone or in combination with most others. Some features may have enormous predictive power, but only within a small, specialized region of the feature space. The problems that plague modern data miners are endless. This book helps you solve these problems by presenting modern feature selection techniques along with the code to implement them. Some of these techniques are:
- Forward selection component analysis
- Local feature selection
- Linking features and a target with a hidden Markov model
- Improvements on traditional stepwise selection
- Nominal-to-ordinal conversion
All algorithms are intuitively justified and supported by the relevant equations and explanatory material. The author also presents and explains complete, highly commented source code.
The example code is in C++ and CUDA C, but Python or other code can be substituted; the algorithm is what matters, not the language used to write it.
What You Will Learn:
- Combine principal component analysis with forward and backward stepwise selection to identify a compact subset of a large collection of variables that captures the maximum possible variation within the entire set.
- Identify features that may have predictive power over only a small subset of the feature domain. Such features can be used by modern prediction models but may be missed by other feature selection methods.
- Find an underlying hidden Markov model that controls the distributions of feature variables and the target simultaneously. The memory inherent in this method is especially valuable in high-noise applications such as the prediction of financial markets.
- Improve traditional stepwise selection in three ways: examine a collection of "best-so-far" feature sets; test candidate features for inclusion with cross-validation to automatically and effectively limit model complexity; and at each step estimate the probability that the results so far could be just the product of random good luck. The probability that the improvement obtained by adding a new variable could have been just good luck is also estimated.
- Take a potentially valuable nominal variable (a category or class membership) that is unsuitable for input to a prediction model, and assign to each category a sensible numeric value that can be used as a model input.
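To make the last item concrete, here is a minimal C++ sketch of one common way to turn a nominal variable into a numeric one: replace each category with the mean of the target over the cases in that category. This is only an illustration under stated assumptions. The function name `mean_target_by_category` is hypothetical, and the book's own conversion procedure may differ from this simple mean encoding.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not taken from the book): map each category label
// to the mean target value of the cases that carry that label.
std::map<std::string, double> mean_target_by_category(
    const std::vector<std::string>& category,
    const std::vector<double>& target)
{
    if (category.size() != target.size())
        throw std::invalid_argument("category and target must have equal length");

    std::map<std::string, double> sum;         // running target sum per category
    std::map<std::string, std::size_t> count;  // number of cases per category

    for (std::size_t i = 0; i < category.size(); ++i) {
        sum[category[i]] += target[i];
        ++count[category[i]];
    }

    std::map<std::string, double> encoding;    // category -> numeric value
    for (const auto& kv : sum)
        encoding[kv.first] = kv.second / static_cast<double>(count[kv.first]);
    return encoding;
}
```

Calling this with categories `a, b, a` and targets `1, 5, 3` would map `a` to 2 and `b` to 5. In practice the mapping should be computed on training cases only, since it uses the target.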
Who This Book Is For:
Intermediate to advanced data science programmers and analysts. C++ and CUDA C experience is highly recommended. However, this book can also be used as a framework with other languages such as Python.