Surprisingly, over 90% of the data in the world today was created in just the last couple of years. With the revolution in the mobile industry, social media networks, and the online sharing of digital photos and videos, the world’s data continues to grow at lightning speed. Managing data at this scale calls for an extraordinary system.
This is where Big Data and the Hadoop ecosystem come in. The Hadoop ecosystem is capable of more than just holding data. It is a combination of different technologies and platforms that work together to collect, analyze, visualize and share data. Together, they are making the big IT giants rethink the way they have handled their data so far.
We can judge how fast data is growing just by looking at some basic things that happen in our daily lives.
Every 60 seconds:
1. Over 100,000 tweets are created on Twitter
2. Nearly 1 million status updates are posted on Facebook
3. Nearly 15 million instant messages are sent
4. Nearly 1 million Google searches are conducted
5. Over 150 million emails are sent across the world
6. 2000 TB of data is created in the form of videos, movies, etc.
7. 300 new mobile web users are added
The list continues to grow with each passing day. Soon we will have so much data that we cannot hope to handle it the old-fashioned way.
Why we need Hadoop:
Hardware Limitations: One major reason Hadoop came into the picture is the hardware problem we face. No doubt today's hardware is amazingly fast: up to 256 GB of RAM is available on the market, network speeds reach 10GB/sec, and local storage for a single computer extends up to 24TB. But even these figures lag behind the speed at which data flows in, and specialized hardware that tries to keep up is expensive.
How Hadoop overcomes the Hardware Limitations:
To overcome the limitations of modern hardware we need a fast, cost-effective method of storage. Hadoop's approach is to spread the data across as many machines as possible, so that many drives can be read in parallel, which increases the maximum speed of reading data.
A simple mathematical rule applies here, for example:
1 hard drive = maximum of 100 MB/sec
1,000 hard drives x 100 MB/sec = maximum of 100 GB/sec (100,000 MB/sec)
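The same rule can be written as a small Python sketch. It is only a best-case estimate: it assumes every drive sustains the full 100 MB/sec, that the data is spread evenly, and that nothing else (network, CPU) gets in the way. The function names and the 1 TB example are made up for illustration.

```python
def aggregate_throughput_mb_per_sec(drive_count, per_drive_mb_per_sec=100):
    """Best-case combined read speed when every drive reads in parallel."""
    return drive_count * per_drive_mb_per_sec


def read_time_seconds(data_mb, drive_count, per_drive_mb_per_sec=100):
    """Best-case time to read data_mb spread evenly across drive_count drives."""
    return data_mb / aggregate_throughput_mb_per_sec(drive_count, per_drive_mb_per_sec)


one_tb_in_mb = 1_000_000  # 1 TB, using decimal units (1 TB = 1,000,000 MB)

print(aggregate_throughput_mb_per_sec(1))      # 100
print(aggregate_throughput_mb_per_sec(1_000))  # 100000
print(read_time_seconds(one_tb_in_mb, 1))      # 10000.0
print(read_time_seconds(one_tb_in_mb, 1_000))  # 10.0
```

Under these assumptions, reading 1 TB takes close to three hours on a single drive but about ten seconds when the same data is spread over 1,000 drives.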
Each machine, which we call a node, stores a part of the data. Extra copies of each part are kept on other nodes to ensure data is not lost if a node fails. If a node does fail, the data is fetched from another node that holds a copy.
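The idea of keeping extra copies can be illustrated with a toy model in Python. This is a rough sketch and not how Hadoop itself is implemented: the class name ToyCluster, the node names, the round-robin placement and the choice of three copies per part are all assumptions made for the example.

```python
class ToyCluster:
    """Toy model of replicated storage; not how Hadoop is implemented."""

    def __init__(self, node_names, replicas=3):
        self.node_names = list(node_names)
        self.replicas = replicas
        # Each node keeps its own dictionary of block_id -> data.
        self.storage = {name: {} for name in self.node_names}
        self.failed = set()

    def write(self, block_id, data):
        # Place copies on `replicas` consecutive nodes (simple round-robin placement).
        for offset in range(self.replicas):
            node = self.node_names[(block_id + offset) % len(self.node_names)]
            self.storage[node][block_id] = data

    def fail(self, node):
        # Mark a node as failed so reads skip it.
        self.failed.add(node)

    def read(self, block_id):
        # Return the block from the first healthy node that holds a copy.
        for node in self.node_names:
            if node not in self.failed and block_id in self.storage[node]:
                return node, self.storage[node][block_id]
        raise IOError(f"block {block_id} is unavailable on every healthy node")


cluster = ToyCluster(["node-a", "node-b", "node-c", "node-d"], replicas=3)
cluster.write(0, "first part of the file")
print(cluster.read(0))   # served by node-a
cluster.fail("node-a")
print(cluster.read(0))   # still readable, now served by node-b
```

The data is lost only if every node holding a copy fails at the same time, which is why more copies mean more safety at the cost of more storage.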
Hardware is not the only problem we face; nowadays the storage mechanism is a problem as well. Today, most data is stored in a structured way in relational databases. Most relational databases enforce a set of rules to ensure that data is consistent. With these rules in place, it becomes much harder to spread data out across multiple nodes to increase retrieval speed and therefore processing speed. Newer databases offer different approaches to overcome these limitations, such as directly processing unstructured data (CSV files, log files, video streams, etc.) and processing data in the form of JSON or XML.
Traditional relational databases generally perform more slowly when they have to ensure transactional consistency across one or more database servers. This is because the data must be stored on each database server before a transaction is complete, which limits scaling to the storage speed of the slowest server. While transactional consistency may be critical for some systems, when datasets reach extreme scale the traditional database approach often cannot keep up, and alternative approaches to data storage and retrieval are required.
All of this leads to the conclusion that big data architectures are required to overcome these limitations as our data grows beyond the reach of a single server or database cluster.