Thursday, August 16, 2018

Data Mining



  • Data mining is the process of discovering patterns in large data sets, using methods at the intersection of machine learning, statistics, and database systems.

  • Data mining is an interdisciplinary subfield of computer science whose overall goal is to extract information (with intelligent methods) from a data set and transform it into a comprehensible structure for further use.

  • Data mining is the analysis step of the "knowledge discovery in databases" (KDD) process. Aside from the raw analysis step, it also involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.

The major steps involved in a data mining process are:


  • Extract, transform and load data into a data warehouse.
  • Store and manage data in a multidimensional database.
  • Provide data access to business analysts using application software.
  • Present analyzed data in easily understandable forms, such as graphs.
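
The four steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the records in `RAW_SALES` are invented, an in-memory SQLite table stands in for a real data warehouse, a `GROUP BY` query stands in for multidimensional analysis, and the function names (`transform`, `load`, `sales_by_region`, `text_bar_chart`) are hypothetical.

```python
import sqlite3

# Hypothetical raw sales records, invented for illustration only.
RAW_SALES = [
    {"region": " north ", "product": "widget", "amount": "120.50"},
    {"region": "South", "product": "widget", "amount": "80.00"},
    {"region": "north", "product": "gadget", "amount": "42.25"},
]


def transform(record):
    """Transform step: normalize text fields and convert the amount to a number."""
    return (
        record["region"].strip().title(),
        record["product"].strip().lower(),
        float(record["amount"]),
    )


def load(rows):
    """Load step: store cleaned rows in an in-memory SQLite table
    (a stand-in for a data warehouse)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def sales_by_region(conn):
    """Access step: aggregate along one dimension (region), as an analyst might."""
    cursor = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    )
    return cursor.fetchall()


def text_bar_chart(totals, scale=10.0):
    """Presentation step: render the totals as a crude text bar chart."""
    return [
        f"{region:<6} {'#' * int(total / scale)} {total:.2f}"
        for region, total in totals
    ]


conn = load([transform(record) for record in RAW_SALES])
totals = sales_by_region(conn)
for line in text_bar_chart(totals):
    print(line)
```

In practice each step is usually handled by dedicated tooling; the sketch only shows how the steps hand data to one another.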


  • The first step in data mining is gathering the data that is relevant and critical to the business. Company data is either transactional, non-operational or metadata. Transactional data covers day-to-day operations such as sales, inventory and cost. Non-operational data is typically forecast data, while metadata describes the logical database design.



  • Patterns and relationships among data elements yield relevant information, which may increase organizational revenue. Organizations with a strong consumer focus use data mining techniques to get a clear picture of products sold, pricing, competition and customer demographics.
  • The second step in data mining is selecting a suitable algorithm, the mechanism that produces a data mining model. In general, the algorithm identifies trends in a set of data and uses them to define the parameters of the model.
  • The most popular data mining algorithms are classification algorithms and regression algorithms, both of which identify relationships among data elements.
  • Major database products such as Oracle and Microsoft SQL Server incorporate data mining algorithms, such as clustering and regression trees, to meet the demand for data mining.
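
To make the idea of an algorithm "identifying trends and defining parameters" more concrete, here is a small sketch of one regression and one classification technique in plain Python. The choice of ordinary least squares and a one-nearest-neighbor classifier is an assumption made for illustration, not something the post prescribes, and the data and the names `fit_line` and `nearest_neighbor_label` are hypothetical.

```python
def fit_line(xs, ys):
    """Regression: fit y = slope * x + intercept by ordinary least squares.
    The returned slope and intercept are the parameters of the model."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def nearest_neighbor_label(point, labeled_points):
    """Classification: return the label of the closest known example
    (one-nearest-neighbor, using squared Euclidean distance)."""
    def squared_distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    closest = min(labeled_points, key=lambda item: squared_distance(point, item[0]))
    return closest[1]


# Invented data: advertising spend versus revenue.
ad_spend = [1.0, 2.0, 3.0, 4.0]
revenue = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(ad_spend, revenue)
print("predicted revenue at spend 5:", slope * 5.0 + intercept)

# Invented data: (age, monthly spend) pairs labeled by customer segment.
customers = [
    ((25, 20), "budget"),
    ((30, 25), "budget"),
    ((45, 90), "premium"),
    ((50, 100), "premium"),
]
print("segment for (48, 95):", nearest_neighbor_label((48, 95), customers))
```

Regression predicts a numeric value from the fitted parameters, while classification assigns a new record to one of a fixed set of categories.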
