Data Mining

Question One

Whether Apriori mining can handle a convertible constraint depends on the type of constraint. A convertible constraint that is neither monotone nor anti-monotone cannot be pushed into Apriori-style mining algorithms (Berry & Linoff, 2004). In the level-wise framework, no direct pruning based on such a constraint can be made, so the constraint must first be converted, by imposing a suitable order on the items, before the mining system can exploit it.

Question Two

A colossal pattern is a very long pattern, one made up of a large number of items. A colossal pattern therefore has far more core patterns than a small pattern does. The relationship between a colossal pattern and its core patterns rests on robustness: when a small number of items is removed from the pattern, the resulting pattern still has a similar support set. Robustness thus gives rise to core descendants, and this characteristic links colossal patterns and core patterns (Berry & Linoff, 2004).

Question Three

Boosting is a machine learning ensemble meta-algorithm that primarily reduces bias and also reduces variance. Its main function is to combine a set of weak learners into a single strong learner. Boosting helps improve accuracy by combining decision trees. It is also used in ADTree (alternating decision tree), which produces highly accurate classifiers while keeping the generated trees small.
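The idea of turning weak learners into a strong one can be illustrated with a small AdaBoost-style sketch. The source does not name a specific boosting algorithm, so AdaBoost with one-threshold decision stumps is an assumption made here, and the ten data points and all function names are hypothetical.

```python
import math

def stump_predict(x, threshold, polarity):
    # A decision stump: one label below the threshold, the opposite label otherwise.
    return polarity if x < threshold else -polarity

def best_stump(xs, ys, weights):
    # Pick the stump with the lowest weighted error on the training data.
    best = None
    for threshold in sorted(set(xs)) + [max(xs) + 1]:
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(xs, ys, rounds):
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = best_stump(xs, ys, weights)
        err = max(err, 1e-10)  # guard against division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, threshold, polarity))
        # Raise the weight of misclassified points, lower the weight of correct ones.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    # Weighted vote of all stumps.
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical one-dimensional data that no single stump separates perfectly.
XS = list(range(10))
YS = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

def training_errors(rounds):
    ensemble = adaboost(XS, YS, rounds)
    return sum(predict(ensemble, x) != y for x, y in zip(XS, YS))
```

On this toy data a single stump misclassifies some points, while the weighted vote of a few stumps fits all of them, which is the sense in which weak learners combine into a stronger one.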

Question Four

Ensemble methods are a leading approach to improving classification accuracy. An ensemble combines a series of K learned models, Model 1, Model 2, ..., Model K, with the aim of producing a more accurate combined model. Popular methods include bagging, which averages the predictions over a collection of classifiers, and ensembles that combine a set of heterogeneous classifiers (Berry & Linoff, 2004).
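A minimal sketch of bagging follows, assuming a very simple base learner (nearest class mean) and majority voting in place of averaging, since the labels here are categories. The data, the seed and the function names are hypothetical and only serve the illustration.

```python
import random
from collections import Counter

def train_nearest_mean(sample):
    """Base learner: remember the mean feature value of each class in the sample."""
    sums, counts = {}, Counter()
    for x, label in sample:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] += 1
    means = {label: sums[label] / counts[label] for label in sums}
    # The model predicts the class whose mean is closest to x.
    return lambda x: min(means, key=lambda label: abs(x - means[label]))

def bagging(data, k, seed=0):
    """Train k models, each on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    models = []
    for _ in range(k):
        bootstrap = [rng.choice(data) for _ in data]
        models.append(train_nearest_mean(bootstrap))
    return models

def vote(models, x):
    """Combine the k predictions by majority vote."""
    ballots = Counter(model(x) for model in models)
    return ballots.most_common(1)[0][0]

# Hypothetical training data: class 0 has small values, class 1 has large values.
DATA = [(x, 0) for x in range(1, 6)] + [(x, 1) for x in range(11, 16)]
MODELS = bagging(DATA, k=11)
```

Calling vote(MODELS, 2) asks all eleven models for a prediction and returns the most common answer. An odd K avoids ties in a two-class vote.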

Question Five

Classification is the task of learning the association between the features of instances and the classes they belong to. It is a supervised learning technique. For example, an insurance company may try to assign customers to high-risk and low-risk categories (Berry & Linoff, 2004).

Clustering, on the other hand, groups items based on the similarity of data instances to each other. For example, an online movie company may recommend certain movies to a customer because other customers made similar movie choices.
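The contrast between the two tasks can be sketched in a few lines: the classifier needs labelled examples, while the clustering routine receives only raw values. Nearest-neighbour classification and a one-dimensional k-means are assumptions chosen for brevity, and the customer data is hypothetical.

```python
def classify_1nn(labelled, x):
    """Supervised: copy the label of the closest labelled example."""
    nearest = min(labelled, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

def kmeans_1d(points, centers, iterations=10):
    """Unsupervised: group points around the nearest center, then move the centers."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[idx].append(p)
        # Each center moves to the mean of its cluster; an empty cluster keeps its center.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return clusters

# Hypothetical insurance data: (claims per year, risk category).
LABELLED = [(0, "low-risk"), (1, "low-risk"), (5, "high-risk"), (7, "high-risk")]

# Hypothetical unlabelled values to be grouped by similarity alone.
POINTS = [1, 2, 3, 10, 11, 12]
```

classify_1nn(LABELLED, 6) returns a category that was defined in advance, whereas kmeans_1d(POINTS, [1, 12]) returns groups that carry no names at all.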

Question Six

Top-k ranking is done with a top-k query that combines different rankings. The query returns the K objects with the highest scores, where the score depends on an aggregate function. Ranking domains use top-k algorithms and top-k join operators. Space analysis indicates the memory required by top-k algorithms that perform sorted accesses. A multi-way top-k join operator also has an advantage over evaluation trees of binary top-k join operators.
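A top-k query over several rankings can be sketched as follows. The source does not name a particular algorithm, so this sketch assumes sum as the aggregate function and uses early stopping in the spirit of Fagin's threshold algorithm. It also assumes that every object appears in every ranking. The scores and names are hypothetical.

```python
import heapq

def top_k(rankings, k):
    """Return the k objects with the highest summed score as (object, score) pairs."""
    # One list per ranking, sorted by descending score, for sorted access.
    sorted_lists = [sorted(r.items(), key=lambda kv: kv[1], reverse=True)
                    for r in rankings]
    totals = {}
    best = []
    for depth in range(len(sorted_lists[0])):
        frontier = 0  # best total that any object not yet seen could still reach
        for ranking in sorted_lists:
            obj, score = ranking[depth]
            frontier += score
            if obj not in totals:
                # Random access: look up the object's score in every ranking.
                totals[obj] = sum(r[obj] for r in rankings)
        best = heapq.nlargest(k, totals.items(), key=lambda kv: kv[1])
        # Stop once the k-th best total is at least as good as the frontier.
        if len(best) == k and best[-1][1] >= frontier:
            break
    return best

# Hypothetical scores for four objects under two ranking criteria.
RANKINGS = [
    {"a": 9, "b": 8, "c": 3, "d": 1},
    {"a": 6, "b": 9, "c": 7, "d": 2},
]
```

With these scores, top_k(RANKINGS, 2) stops after reading two positions of each sorted list, so object d is never scored.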

Reference

Berry, M., & Linoff, G. (2004). Data mining techniques. Indianapolis: Wiley.

Data Mining

Question One

Whether Apriori mining can handle a convertible constraint depends on the type of constraint. For instance, a convertible constraint that is neither monotone nor anti-monotone cannot be pushed into an Apriori mining algorithm: within the level-wise framework, no direct pruning based on the constraint can be made. For example, if the itemset df violates constraint C, df still cannot be pruned, because a superset of df may satisfy C. The constraint can, however, be pushed into the frequent-pattern growth framework.
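The df example can be made concrete with a small sketch. The source does not say what constraint C is, so the code assumes C is "the average item value is at least 25", and the item values are hypothetical. It checks that df violates C while a superset satisfies it, and that ordering the items by descending value makes the constraint usable for prefix-based pruning.

```python
from itertools import combinations

# Hypothetical item values, chosen only for illustration.
VALUES = {"a": 40, "b": 0, "c": -20, "d": 10, "e": -30, "f": 30, "g": 20, "h": -10}

def satisfies(itemset, threshold=25):
    """Assumed constraint C: the average value of the items is at least `threshold`."""
    return sum(VALUES[item] for item in itemset) / len(itemset) >= threshold

def prefix_anti_monotone(order):
    """Check that no violating prefix gains a one-item extension that satisfies C.

    Longer extensions follow by induction, so one-item steps are enough.
    """
    for size in range(2, len(order) + 1):
        for itemset in combinations(order, size):  # keeps items in the given order
            if not satisfies(itemset[:-1]) and satisfies(itemset):
                return False
    return True

# Two candidate item orders: by descending value and by ascending value.
DESCENDING = sorted(VALUES, key=VALUES.get, reverse=True)
ASCENDING = sorted(VALUES, key=VALUES.get)
```

Under these assumptions df violates C but adf satisfies it, so Apriori cannot discard df. In the descending order, a pattern-growth miner can stop extending a prefix as soon as it violates C, which is what makes the constraint convertible.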

Question Two

Colossal patterns are very long patterns. A colossal pattern therefore has far more core patterns than a small pattern does. Colossal patterns are also more robust: when a small number of items is removed, the resulting pattern still has a similar support set. Hence, the robustness relationship between colossal patterns and core patterns gives rise to core descendants (Han & Kamber, 2006).
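The notion of a "similar support set" can be sketched with a small transaction table. The ratio used below to compare support sets is an assumption made for this illustration, and the transactions and function names are hypothetical.

```python
def support_set(pattern, transactions):
    """Ids of all transactions that contain every item of the pattern."""
    return {tid for tid, items in transactions.items() if pattern <= items}

def core_ratio(sub_pattern, pattern, transactions):
    """Size of the pattern's support set divided by that of the sub-pattern.

    A value close to 1 means the two support sets are similar.
    Assumes the sub-pattern occurs in at least one transaction.
    """
    return (len(support_set(pattern, transactions))
            / len(support_set(sub_pattern, transactions)))

# Hypothetical transactions: three contain the long pattern abcde.
TRANSACTIONS = {
    1: set("abcde"),
    2: set("abcde"),
    3: set("abcde"),
    4: set("ax"),
    5: set("by"),
}
LONG_PATTERN = set("abcde")
```

Removing one item from abcde leaves the support set unchanged here, so the ratio stays at 1. Shrinking the pattern to the single item a widens the support set and lowers the ratio.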

Question Three

Boosting is a machine learning ensemble meta-algorithm used primarily to reduce bias, and also variance, in supervised learning. It is also a family of machine learning algorithms that convert weak learners into a strong one. Boosting has helped improve the accuracy of decision tree classifiers because it combines decision trees (Han & Kamber, 2006). A related technique, ADTree (alternating decision tree), produces highly accurate classifiers while keeping the generated trees small.

Question Four

Ensemble methods improve classification accuracy by combining a series of K learned models, M1, M2, ..., MK, with the purpose of creating an improved model.

Question Five

Classification is typically a supervised learning technique in which a predefined label is assigned to an instance based on its features. An example is deciding whether a particular patient record is associated with a certain disease (Han & Kamber, 2006).

Clustering, on the other hand, is the grouping of similar or related data and the assignment of each item to a group. For example, in a hospital, patient records can be grouped according to their symptoms rather than according to what the symptoms indicate.

Question Six

Top-k ranking is done with a top-k query that combines different rankings. It returns the K objects with the highest scores, where the score depends on the aggregate function. Ranking domains use top-k algorithms and top-k join operators (Han & Kamber, 2006).

Reference

Han, J., & Kamber, M. (2006). Data mining. Amsterdam: Elsevier.