Monday, February 15, 2016

What is NoSQL?


A NoSQL database provides a mechanism for storing and retrieving data that is modeled in ways other than the tabular relations used in relational databases. The data structures used by NoSQL databases (e.g. key-value, wide-column, graph, or document) differ from those used by default in relational databases, making some operations faster in NoSQL. NoSQL databases are increasingly used in big data and real-time web applications.

NoSQL (Not Only SQL):

      Databases that “move beyond” relational data models: they may have no tables at all, and make limited or no use of SQL.

• Focus on retrieval of data and appending new data, not necessarily on tables.
• Focus on key-value data stores that can be used to locate data objects.
• Focus on supporting storage of large quantities of unstructured data.
• SQL is not used for storage or retrieval of data.
• No ACID guarantees (atomicity, consistency, isolation, durability).
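As a rough illustration of the key-value idea above, here is a minimal, in-memory sketch in Python (a conceptual toy, not a real NoSQL engine; the keys and fields are made up): records are located by key, and each record can have a different structure because no schema is enforced.

```python
# Minimal in-memory key-value store: a Python dictionary keyed by string IDs.
# Unlike a relational table, each stored value can have a different structure,
# because no schema is enforced.
store = {}

def put(key, value):
    store[key] = value

def get(key):
    return store.get(key)   # returns None if the key is absent

# No predefined schema: these two "records" have different fields.
put("user:1", {"name": "Asha", "email": "asha@example.com"})
put("user:2", {"name": "Ravi", "followers": 120})

print(get("user:1")["name"])       # Asha
print(get("user:2")["followers"])  # 120
```

Note that nothing stops the two records from having completely different fields; that flexibility is exactly what "schema-less" means here.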

      NoSQL focuses on a schema-less architecture (i.e., the data structure is not predefined).

      In contrast, traditional relational DBs require the schema to be defined before the database is built and populated:

• Data are structured.
• Limited in scope.
• Designed around ACID principles.

Sunday, January 31, 2016

Hadoop.

Hadoop Framework:

Hadoop Definition
                                    Hadoop is an open-source framework that allows you to store and process big data in a distributed environment across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
Hadoop is a distributed file system and data processing engine that is designed to handle extremely high volumes of data in any structure.

Technically, Hadoop consists of two key elements:

• The first is the Hadoop Distributed File System (HDFS), which permits the high-bandwidth, cluster-based storage essential for Big Data computing.

• The second is a data processing framework called MapReduce. Based on Google's search technology, it distributes, or "maps", large data sets across multiple servers.

• The partial results from each server are then aggregated in the so-termed "Reduce" stage.

• The focus is on supporting redundancy, distributed architectures, and parallel processing.

Hadoop Architecture:

The Hadoop framework includes the following four modules:
· Hadoop Common: These are the Java libraries and utilities required by other Hadoop modules. They provide file system and OS-level abstractions and contain the necessary Java files and scripts required to start Hadoop.
· Hadoop YARN: A framework for job scheduling and cluster resource management.
· Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
· Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
The following diagram explains the four components available in the Hadoop framework.


MapReduce:

               Hadoop MapReduce is a software framework for easily writing applications that process vast amounts of data in parallel, in a reliable manner, on large clusters (thousands of nodes) of commodity hardware.
Hadoop programs perform two different tasks:
· The Map Task: It takes input data and converts it into a set of intermediate data in which the individual elements are broken down into tuples (key/value pairs).
· The Reduce Task: It takes the output from a map task as input and combines those data tuples into a smaller set of tuples. The reduce task is always performed after the map task.
           Typically both the input and the output are stored in a file system. The framework takes care of scheduling tasks, monitoring them, and re-executing failed tasks.
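The two tasks can be sketched in plain Python (a conceptual simulation for a word count, not the Hadoop API; `map_task` and `reduce_task` are names made up for illustration): the map task turns one input line into (key, value) tuples, and the reduce task combines the values for one key into a smaller result.

```python
# Conceptual word count: the map task emits (word, 1) tuples,
# and the reduce task sums the values for one key.
def map_task(line):
    return [(word, 1) for word in line.split()]

def reduce_task(key, values):
    return (key, sum(values))

pairs = map_task("big data big ideas")
print(pairs)                       # [('big', 1), ('data', 1), ('big', 1), ('ideas', 1)]
print(reduce_task("big", [1, 1]))  # ('big', 2)
```

In real Hadoop these two functions would be methods of Mapper and Reducer classes, but the division of labour is the same.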
JobTracker & TaskTracker
·        The MapReduce framework follows a master/slave model.
·        It consists of a single master, the JobTracker, and one slave TaskTracker per cluster node.

·        The master is responsible for resource management, tracking resource consumption/availability, scheduling the job's component tasks on the slaves, monitoring them, and re-executing failed tasks.
·        The slave TaskTrackers execute the tasks as directed by the master and periodically report task-status information back to it.
·        The JobTracker is a single point of failure for the Hadoop MapReduce service: if the JobTracker goes down, all running jobs are halted.
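The master's "re-execute on failure" behaviour can be pictured with a toy retry loop (all names here, such as `run_with_retries` and `flaky_tracker`, are hypothetical and only simulate the idea; real JobTracker scheduling is far more involved):

```python
# Toy model of the master re-executing a failed task attempt.
# `run_task` stands in for handing the task to a TaskTracker; it must
# return a (succeeded, result) pair.
def run_with_retries(task, run_task, max_attempts=3):
    for attempt in range(1, max_attempts + 1):
        ok, result = run_task(task)
        if ok:
            return result                       # succeeded on this attempt
    raise RuntimeError("task %r failed after %d attempts" % (task, max_attempts))

# Simulated TaskTracker that fails on its first attempt, then succeeds.
attempts = {"count": 0}
def flaky_tracker(task):
    attempts["count"] += 1
    return (attempts["count"] >= 2, task + " done")

print(run_with_retries("map-0001", flaky_tracker))  # map-0001 done
```

The point of the sketch is only that a failed attempt is transparent to the job: the master retries it, and the overall job still completes.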

Hadoop Distributed File System (HDFS)

 

          The Hadoop Distributed File System is based on the Google File System (GFS) and provides a distributed file system designed to run reliably and fault-tolerantly on large clusters (thousands of machines) of small commodity computers.
     HDFS uses a master/slave architecture in which the master consists of a single NameNode that manages the file system metadata, and one or more slave DataNodes store the actual data.
         A file in the HDFS namespace is split into several blocks, and those blocks are stored in a set of DataNodes. The NameNode determines the mapping of blocks to DataNodes.
         The DataNodes handle read and write operations on the file system. They also take care of block creation, deletion, and replication based on instructions from the NameNode.
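The NameNode's block-to-DataNode mapping can be pictured as a simple lookup table. The sketch below is schematic only (the round-robin placement and the names `place_blocks`, `blk_1`, `dn1`, etc. are invented for illustration; real HDFS placement is rack-aware): each block of a file is assigned to several DataNodes for replication.

```python
import itertools

# Schematic NameNode metadata: which DataNodes hold each block of a file.
def place_blocks(blocks, datanodes, replication=3):
    """Assign each block to `replication` DataNodes, round-robin."""
    nodes = itertools.cycle(datanodes)
    mapping = {}
    for block in blocks:
        mapping[block] = [next(nodes) for _ in range(replication)]
    return mapping

table = place_blocks(["blk_1", "blk_2"], ["dn1", "dn2", "dn3", "dn4"])
print(table["blk_1"])  # ['dn1', 'dn2', 'dn3']
print(table["blk_2"])  # ['dn4', 'dn1', 'dn2']
```

With a replication factor of 3, losing any single DataNode still leaves two copies of every block, which is what makes the automatic re-replication described later possible.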
HDFS Architecture
           Given below is the architecture of a Hadoop File System.



          HDFS follows the master-slave architecture and it has the following elements.

Namenode:

      The namenode is software that runs on commodity hardware, typically under the GNU/Linux operating system.
    The system hosting the namenode acts as the master server and performs the following tasks:
  • Manages the file system namespace.
  • Regulates clients' access to files.
  • Executes file system operations such as renaming, closing, and opening files and directories.

Datanode:

           The datanode is software that runs on commodity hardware, typically under the GNU/Linux operating system. For every node (commodity hardware/system) in a cluster, there is a datanode. These nodes manage the data storage of their system.
  • Datanodes perform read-write operations on the file system, as per client request.
  • They also perform operations such as block creation, deletion, and replication according to the instructions of the namenode.

Block:

        User data is stored in the files of HDFS. A file in the file system is divided into one or more segments, which are stored in individual data nodes. These file segments are called blocks.
        In other words, the minimum amount of data that HDFS can read or write is called a block. The default block size is 64 MB, but it can be changed in the HDFS configuration as needed.
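For example, with the 64 MB default mentioned above, a hypothetical 200 MB file occupies four blocks: three full 64 MB blocks and one 8 MB remainder. A quick calculation:

```python
import math

BLOCK_SIZE_MB = 64   # the default mentioned above; configurable in HDFS
file_size_mb = 200   # hypothetical file size

num_blocks = math.ceil(file_size_mb / BLOCK_SIZE_MB)
last_block_mb = file_size_mb - (num_blocks - 1) * BLOCK_SIZE_MB

print(num_blocks)     # 4
print(last_block_mb)  # 8
```

Note that, unlike a disk file system, the 8 MB final block does not waste the rest of a 64 MB slot; only the actual data is stored.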

Features of  HDFS

  • It is suitable for distributed storage and processing.
  • Hadoop provides a command-line interface to interact with HDFS.
  • The built-in servers of the namenode and datanodes help users easily check the status of the cluster.
  • It provides streaming access to file system data.
  • HDFS provides file permissions and authentication.

Goals of HDFS:
· Fault detection and recovery: Since HDFS includes a large number of commodity hardware components, failure of components is frequent. HDFS therefore needs mechanisms for quick, automatic fault detection and recovery.
· Huge datasets: HDFS should scale to hundreds of nodes per cluster to manage applications with huge datasets.
· Hardware at data: A requested task can be done efficiently when the computation takes place near the data. Especially where huge datasets are involved, this reduces network traffic and increases throughput.

The Algorithm
·        Generally, the MapReduce paradigm is based on sending the computation to where the data resides.
·        A MapReduce program executes in three stages: the map stage, the shuffle stage, and the reduce stage.
o   Map stage: The map or mapper's job is to process the input data. Generally the input data is in the form of a file or directory stored in the Hadoop file system (HDFS). The input file is passed to the mapper function line by line. The mapper processes the data and creates several small chunks of data.
o   Reduce stage: This stage is the combination of the shuffle stage and the reduce stage proper. The reducer's job is to process the data that comes from the mapper. After processing, it produces a new set of output, which is stored in HDFS.
·        During a MapReduce job, Hadoop sends the Map and Reduce tasks to the appropriate servers in the cluster.
·        The framework manages all the details of data passing, such as issuing tasks, verifying task completion, and copying data between the nodes of the cluster.
·        Most of the computing takes place on nodes with the data on local disks, which reduces network traffic.
·        After completion of the given tasks, the cluster collects and reduces the data to form the final result and sends it back to the Hadoop server.
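The three stages can be strung together in one short Python simulation (conceptual only; `run_job` is an invented name and nothing here talks to a real cluster): the map stage emits intermediate pairs, the shuffle stage groups them by key, and the reduce stage produces one output per key.

```python
from collections import defaultdict

def run_job(lines, mapper, reducer):
    # Map stage: each input line produces intermediate (key, value) pairs.
    intermediate = []
    for line in lines:
        intermediate.extend(mapper(line))
    # Shuffle stage: group all intermediate values by key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    # Reduce stage: one output value per key.
    return {key: reducer(key, values) for key, values in groups.items()}

counts = run_job(["to be or not", "to be"],
                 mapper=lambda line: [(w, 1) for w in line.split()],
                 reducer=lambda key, values: sum(values))
print(counts)  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

In real Hadoop the shuffle also sorts the keys and moves data across the network between map and reduce nodes, but the grouping-by-key shown here is its essential job.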
Inputs and Outputs

         The MapReduce framework operates on <key, value> pairs; that is, the framework views the input to the job as a set of <key, value> pairs and produces a set of <key, value> pairs as the output of the job, conceivably of different types.
       The key and value classes must be serializable by the framework and hence need to implement the Writable interface. Additionally, the key classes have to implement the WritableComparable interface to facilitate sorting by the framework. The input and output types of a MapReduce job are: (Input) <k1, v1> -> map -> <k2, v2> -> reduce -> <k3, v3> (Output).
           Input             Output
Map        <k1, v1>          list(<k2, v2>)
Reduce     <k2, list(v2)>    list(<k3, v3>)

Terminology:

·   PayLoad - Applications implement the Map and Reduce functions; these form the core of the job.
·   Mapper - Maps the input key/value pairs to a set of intermediate key/value pairs.
·   NameNode - Node that manages the Hadoop Distributed File System (HDFS).
·   DataNode - Node where the data is placed in advance, before any processing takes place.
·   MasterNode - Node where the JobTracker runs and which accepts job requests from clients.
·   SlaveNode - Node where the Map and Reduce programs run.
·   JobTracker - Schedules jobs and tracks the jobs assigned to the TaskTracker.
·   TaskTracker - Tracks tasks and reports status to the JobTracker.
·   Job - An execution of a Mapper and Reducer across a dataset.
·   Task - An execution of a Mapper or a Reducer on a slice of data.
·   Task Attempt - A particular instance of an attempt to execute a task on a SlaveNode.

Cloud Computing.


    What is Cloud Computing?
    Cloud Computing is the use of hardware and software to deliver a service over a network (typically the Internet). With cloud computing, users can access files and use applications from any device that can access the Internet. An example of a cloud computing provider is Google's Gmail.
    Cloud computing services can be private, public, or hybrid.

Tuesday, September 15, 2015

Basic components of Android applications




Before getting started with Android programming, we need to know some of the basic components of Android applications. These are the building blocks of Android applications and will let us understand an Android application from scratch. Let us take a quick look at these components.

Activities
                An activity in Android represents a single screen with a user interface. An Android application typically consists of several activities. An activity interacts with the user to do one thing, and only one, such as unlocking the screen, dialing a phone number, or viewing the home screen. An application consists of multiple activities that are loosely bound together, but only one activity can be specified as the main activity, which is displayed when the application is launched. Every time a new activity starts, the previous activity is stopped, but its data is preserved.
         For instance, an email application will have one activity to display the list of new emails, another activity to compose an email, another activity for reading emails, and so on.


View
             An activity contains views and viewgroups. Views in Android are the UI components used to build the user interface; examples are buttons, labels, and text boxes. One or more views can be grouped together into a ViewGroup. The types of views in Android are:
  •         Basic Views
  •         List Views
  •         Picker Views
  •         Menus
  •         Display Views
  •         Additional Views
A ViewGroup provides a layout in which you can order the appearance of views. Android supports the following view groups:
  •         LinearLayout
  •         TableLayout
  •         AbsoluteLayout
  •         FrameLayout
  •         RelativeLayout
  •         ScrollView
Services
                Services are components that do not require a user interface. A service performs operations in the background without user interaction, but it does not start on its own: another component, such as an activity, must invoke it. To be more technical, a service does not have an independent thread by default; it uses the main thread of its hosting process. For instance, you can download something while you are using some other application; the download is a service running in the background.

Content Providers
                Android is embedded with the SQLite database, where all your data gets stored. Content providers in Android manage data that is shared by more than one application. Consider a case where your contacts are stored in a centralized repository: many applications may require access to it, or may even need to modify it. These applications query the centralized repository through a content provider, so an application with the proper permissions can read or modify the contacts. The content provider concept evolved to manage common data based on permissions, and it is a critical concept for developing in-house Android applications in a better way.

Broadcast Receivers
                A broadcast receiver is also a component, one with which you can register for system or application events. Once registered, you will be notified about those events. Broadcasts originate from the system as well as from applications. An example of a broadcast originating from the system is the low-battery notification. An application-level example: when you download an mp3 file, an mp3 player that has registered for this event gets notified, and the file is added to its play list.

Intent
                Intents are messages that allow components to request functionality from other components. When an activity requires another activity to perform some functionality, the communication between the activities occurs via an Intent.