Tuesday, July 21, 2015

Big data

What is Big Data?

Big data is a collection of data sets so large and complex that they become difficult to process using on-hand database management tools.
The challenges include capture, data curation, storage, search, sharing, analysis, and visualization.

The trend toward larger data sets is driven by the additional information that can be derived from analyzing a single large set of related data, compared with analyzing separate smaller sets holding the same total amount of data. Analyzing the combined set allows correlations to be found to "spot business trends, determine quality of research, prevent diseases, link legal citations, combat crime, and determine real-time roadway traffic conditions."

Big data is the realization of greater business intelligence by storing, processing, and analyzing data that was previously ignored due to the limitations of traditional data management technologies.

Characteristics of Big Data

      Volume: Large volumes of data.

      Velocity: Quickly moving data.

      Variety: Structured, unstructured, images, etc.

      Veracity: Trust and integrity are both a challenge and a must; they matter as much for big data as for traditional relational databases.

      Value: Big data is of no use unless we can turn it into value.
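
Velocity means data keeps arriving faster than it can be stored and re-scanned in full. One common way to cope is to keep a rolling aggregate over a recent time window. The Python sketch below counts events seen in the last 60 seconds; the class name, window size, and sample timestamps are illustrative assumptions, not part of any particular tool.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events whose timestamp falls in (now - window_seconds, now].

    Hypothetical helper for illustration; assumes timestamps (in seconds)
    arrive in non-decreasing order, as in a single ordered stream.
    """

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()  # timestamps still inside the window

    def add(self, timestamp):
        self.events.append(timestamp)
        self._evict(timestamp)

    def count(self, now):
        self._evict(now)
        return len(self.events)

    def _evict(self, now):
        # Drop events at or before now - window; they have left the window.
        while self.events and self.events[0] <= now - self.window:
            self.events.popleft()


counter = SlidingWindowCounter(window_seconds=60)
for t in [0, 10, 30, 65, 70]:  # event arrival times in seconds
    counter.add(t)
print(counter.count(now=70))
```

Each event is appended once and evicted once, so the work per event is constant on average, and memory grows only with the number of events inside the window rather than with the whole stream.
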



Three Types of Data:
· Structured data: Relational data.
· Semi-structured data: XML data.
· Unstructured data: Word, PDF, text, media logs.
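
To make the three categories concrete, here is a small Python sketch that handles one tiny sample of each: rows in an in-memory SQLite table (structured), an XML snippet parsed with `xml.etree.ElementTree` (semi-structured), and a plain-text log scanned with a regular expression (unstructured). The table, tags, and log lines are made-up sample data chosen only for illustration.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET

# Structured: rows with a fixed schema, queried with SQL (in-memory SQLite).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "alice", 30.0), (2, "bob", 12.5), (3, "alice", 7.5)],
)
structured_total = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE customer = ?", ("alice",)
).fetchone()[0]

# Semi-structured: XML carries its own tags, but fields may vary per record.
xml_doc = """<orders>
  <order id="4"><customer>carol</customer><amount>20.0</amount></order>
  <order id="5"><customer>alice</customer></order>
</orders>"""
root = ET.fromstring(xml_doc)
# The second order has no <amount>, so a default must be supplied.
semi_amounts = [float(o.findtext("amount", default="0")) for o in root.iter("order")]

# Unstructured: free text; any structure must be extracted, here with a regex.
log_text = "2015-07-21 ERROR disk full\n2015-07-21 INFO job done\n2015-07-21 ERROR timeout"
error_count = len(re.findall(r"\bERROR\b", log_text))

print(structured_total, semi_amounts, error_count)
```

The SQL table enforces its columns up front, the XML tolerates missing fields at the cost of defaults in the reading code, and the log text has no structure until the code imposes one.
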

The Four Dimensions of Use in Big Data
Users want to interact with their data along four dimensions:
· Totality: Users have an increased desire to process and analyze all available data.

· Exploration: Users apply analytic approaches where the schema is defined in response to the nature of the query.

· Frequency: Users want to increase the rate of analysis in order to generate more accurate and timely business intelligence.

· Dependency: Users need to balance investment in existing technologies and skills with the adoption of new techniques.
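
The Exploration dimension is closely related to what is often called "schema on read". A minimal Python sketch, assuming the raw data is stored as JSON lines: nothing is validated when the records are written, and each query supplies its own schema (field names and types) at the moment it reads the data. The function `read_with_schema` and the sample records are hypothetical, not part of any particular tool.

```python
import json

# Raw records are stored as-is; no schema is enforced when they are written.
raw_lines = [
    '{"user": "alice", "page": "/home", "ms": 120}',
    '{"user": "bob", "page": "/cart"}',
    '{"user": "alice", "page": "/cart", "ms": 340, "device": "mobile"}',
]


def read_with_schema(lines, schema):
    """Hypothetical helper: apply a schema at query time by keeping only the
    requested fields, casting each one, and filling missing ones with None."""
    for line in lines:
        record = json.loads(line)
        yield {
            field: (cast(record[field]) if field in record else None)
            for field, cast in schema.items()
        }


# Query 1 only cares about page latency, so it defines that schema now.
latency_schema = {"page": str, "ms": int}
rows = list(read_with_schema(raw_lines, latency_schema))
slow_pages = [r["page"] for r in rows if r["ms"] is not None and r["ms"] > 200]

# Query 2 needs a different view of the same raw data.
device_schema = {"user": str, "device": str}
devices = [r for r in read_with_schema(raw_lines, device_schema) if r["device"]]

print(slow_pages, devices)
```

Both queries read the same stored lines; only the schema passed in changes, which is the trade-off against a relational table whose schema is fixed before any data is loaded.
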

Tools and Systems:

Hands-on Systems
· MySQL
· MapReduce (YARN)
· HDFS
· HBase
· DynamoDB
· Cassandra
· Memcached
· Redis
· MongoDB
· Pig
· Hive
· Impala
· Mahout
· Spark

Design Knowledge
· Bigtable
· Dynamo
· Dremel
· Spanner
· Storm
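
As a taste of the MapReduce programming model from the list above, here is the classic word-count example as a single-process Python sketch. It mimics the three phases (map, shuffle/group by key, reduce) with plain functions; it is not Hadoop's API, and in a real cluster the framework runs many map and reduce tasks in parallel over input splits stored in HDFS. The function names and input strings are illustrative assumptions.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split):
    # Emit a (word, 1) pair for every word in one input split.
    return [(word.lower(), 1) for word in split.split()]


def shuffle_phase(pairs):
    # Group all values by key, as the framework does between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Combine all the counts emitted for one word.
    return key, sum(values)


splits = ["big data is big", "data is value"]
mapped = chain.from_iterable(map_phase(s) for s in splits)
grouped = shuffle_phase(mapped)
counts = dict(reduce_phase(k, v) for k, v in grouped.items())
print(counts)
```

Because each map call sees only its own split and each reduce call sees only one key's values, both phases can be spread across machines; only the shuffle needs to move data between them.
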

