
Tips to leverage the cloud for big data

In today's competitive business environment, data is an asset that can be critical to success. Data can drive insights into customer behaviour, help improve quality and cost of operations, drive innovative product features, and ultimately increase the bottom line.
Image: Stuart Miles via freedigitalphotos.net

With every click, swipe, pinch, tap, like, tweet, check-in, share, and API call, we are generating data. Big Data is all about storing, processing, analysing, organising, sharing, distributing and visualising these massive amounts of data so that companies can distil knowledge from it, gain valuable business insights from that knowledge, and make better business decisions, all as quickly as possible.

Cloud computing ensures that our ability to analyse large amounts of data, and to extract business intelligence, is not limited by capacity or computing power. The cloud gives us access to virtually limitless capacity, on-demand, and businesses pay only for the resources they consume. In doing so, it lowers total cost, maximises revenue and gets data processing done faster at scale.

Elasticity, the ability to grow or shrink technology infrastructure resources on demand, is a fundamental property of cloud computing that drives cost benefits. Traditional data warehouses are tuned to answer regularly asked questions, such as generating the nightly sales report, so their capacity needs are easy to predict. Analytics to discover new trends and correlations in data, by contrast, requires an unpredictable amount of compute cycles and storage.
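
To make the cost effect of elasticity concrete, a back-of-the-envelope comparison might look like the sketch below. All figures (server counts, hours, hourly rate) are purely illustrative assumptions, not real prices.

```python
# Illustrative comparison of peak-provisioned vs. elastic capacity costs.
# Every figure below (rate, hours, server counts) is a hypothetical assumption.

HOURLY_RATE = 0.50          # assumed cost per server-hour
HOURS_PER_MONTH = 730

# On-premises style: provision for the monthly peak and run it all the time.
peak_servers = 100
always_on_cost = peak_servers * HOURS_PER_MONTH * HOURLY_RATE

# Elastic style: 10 servers for routine reporting, plus 100 servers for a
# 40-hour ad-hoc analytics burst, paid only while they run.
baseline_cost = 10 * HOURS_PER_MONTH * HOURLY_RATE
burst_cost = 100 * 40 * HOURLY_RATE
elastic_cost = baseline_cost + burst_cost

print(f"Peak-provisioned: ${always_on_cost:,.0f}/month")   # $36,500/month
print(f"Elastic:          ${elastic_cost:,.0f}/month")      # $5,650/month
```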

For example, to process Big Data in a traditional on-premises setup, businesses have to provision for the maximum capacity they might need at some point in the future. To process Big Data in the cloud, businesses can expand and contract their infrastructure resources depending on how much they need at the present moment. They no longer have to wait weeks or months to procure and set up physical servers and storage; with cloud computing, businesses can roll out hundreds or thousands of servers in hours.
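
As a minimal sketch of what rolling out servers on demand looks like in practice, the snippet below launches a batch of virtual machines through the AWS EC2 API using boto3 and terminates them when the job is done. The region, machine image ID, instance type and counts are placeholder assumptions; any cloud provider's equivalent API follows the same pattern.

```python
import boto3

# Launch a fleet of worker instances on demand; shrink back to zero when finished.
ec2 = boto3.client("ec2", region_name="eu-west-1")   # illustrative region

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # hypothetical machine image
    InstanceType="m5.xlarge",          # illustrative instance type
    MinCount=50,
    MaxCount=50,
)

instance_ids = [i["InstanceId"] for i in response["Instances"]]
print(f"Launched {len(instance_ids)} instances")

# ... run the analysis job on the fleet ...

# Terminate the instances as soon as the work is done, so billing stops.
ec2.terminate_instances(InstanceIds=instance_ids)
```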

To get the most out of your data, here are some simple ideas for using cloud services to unlock the full potential of Big Data analytics:

Enhance your data

Having good data is often better than having lots of data. Incorrect or inconsistent data can lead to skewed results. For example, when you have to analyse data from hundreds of disparate sources, inconsistency in the structure and format of the datasets often leads to biased insights, especially when the data is not transposed or transformed into a common format. To get accurate and consistent data, it's important to enhance it, which can include cleansing, validating, normalising, de-duplicating and collating the data.
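
A minimal, purely illustrative sketch of programmatic enhancement, normalising formats, validating fields and de-duplicating records from disparate sources, might look like this. The field names, date formats and validation rules are assumptions for the example, not a prescribed schema.

```python
import re
from datetime import datetime

def enhance(records):
    """Cleanse, validate, normalise and de-duplicate raw records."""
    seen = set()
    clean = []
    for rec in records:
        # Cleanse and normalise: trim whitespace and lowercase the email.
        email = rec.get("email", "").strip().lower()
        # Validate: drop records with an obviously malformed email address.
        if not re.match(r"[^@\s]+@[^@\s]+\.[^@\s]+$", email):
            continue
        # Normalise: parse dates arriving in different formats into ISO 8601.
        raw_date = rec.get("date", "")
        for fmt in ("%Y-%m-%d", "%d/%m/%Y", "%m-%d-%Y"):
            try:
                date = datetime.strptime(raw_date, fmt).date().isoformat()
                break
            except ValueError:
                date = None
        if date is None:
            continue
        # De-duplicate on the cleaned key.
        key = (email, date)
        if key in seen:
            continue
        seen.add(key)
        clean.append({"email": email, "date": date})
    return clean

sources = [
    {"email": "Jane@Example.com ", "date": "03/04/2016"},
    {"email": "jane@example.com", "date": "2016-04-03"},   # duplicate after cleaning
    {"email": "not-an-email", "date": "2016-04-03"},       # fails validation
]
print(enhance(sources))
```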

Businesses can enhance their data programmatically through scripts and programs; however, some data work, such as tagging photos, normalising catalogues or even simply checking spelling, requires human intervention to ensure accuracy. Tapping into a diverse, on-demand and scalable workforce, such as Amazon Mechanical Turk, is the key to enhancing data. Splitting large data analysis jobs into short tasks allows them to be completed quickly and lets human workers judge the quality and reliability of data, which is something computers can't easily do.
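
A small sketch of the splitting idea: break a large labelling job into short, fixed-size batches that individual workers can finish quickly. The batch size and record names are assumptions; posting the batches to a crowdsourcing service such as Amazon Mechanical Turk would be a separate step.

```python
def split_into_tasks(items, batch_size=10):
    """Split a large labelling job into short tasks of batch_size items each."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

# Hypothetical job: 2,500 product photos that need human tagging.
photo_ids = [f"photo-{n:05d}.jpg" for n in range(2500)]
tasks = split_into_tasks(photo_ids, batch_size=10)

print(f"{len(tasks)} short tasks of up to 10 photos each")
# Each task can now be posted as an individual unit of work, completed in
# parallel by many workers, and the results collated and cross-checked.
```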

Point your data source to the cloud

If your philosophy is to collect as much data as possible and measure everything, you will need massive storage capacity. Storage in the cloud is scalable, durable, reliable, highly available and most importantly, it's inexpensive. Another benefit of cloud storage is that instead of moving data in periodic batches, you can point your data source to the cloud, bring data closer to the compute resources for analysis, and reduce latency.
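
As a rough sketch of pointing the data source at the cloud, the snippet below writes each incoming event straight to an object store as it arrives (here Amazon S3 via boto3; the bucket name and key layout are assumptions) instead of staging it locally for a periodic batch upload.

```python
import json
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")
BUCKET = "example-clickstream-bucket"   # hypothetical bucket name

def store_event(event: dict) -> None:
    """Write a single event directly to cloud storage as it arrives."""
    ts = datetime.now(timezone.utc)
    # Date-partitioned key layout is an illustrative convention, not a requirement.
    key = f"clickstream/{ts:%Y/%m/%d}/{ts.timestamp():.6f}.json"
    s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(event).encode("utf-8"))

store_event({"user": "u-123", "action": "click", "page": "/pricing"})
```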

Additionally, storing data in the cloud makes it easy to share with partners and other stakeholders, because they too can access the information anytime, from anywhere, and can leverage the same pay-as-you-go, on-demand resources to process and analyse the data.

Analyse your data in parallel using an elastic supercomputer

The main challenges in effectively conducting Big Data analysis include installation and management of hardware, scaling capacity up and down elastically, and aggregating data from multiple sources. Additionally, data processing systems must allow for inexpensive Big Data experimentation as the questions you ask of your data are likely to change over time.

The open source Hadoop platform and its ecosystem of tools help solve these problems: Hadoop scales horizontally to accommodate growing data volumes and can process unstructured and structured data in the same environment. Hadoop also integrates with many technologies, such as statistical packages and a variety of programming languages, to accommodate complex data analytics.

Hadoop in the cloud removes the cost and complexity of setting up and managing an on-premises Hadoop installation. This means any developer or business has the power to do analytics without large capital expenditures. Today, it is possible to spin up a Hadoop cluster in the cloud within minutes on the latest high-performance network and computing hardware, without making a capital investment to purchase the resources upfront. Organisations can expand and shrink a running cluster on demand, meaning that if they need answers to their questions faster, they can immediately scale up the size of their cluster to crunch the data more quickly.
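
As one hedged example of requesting a managed Hadoop cluster in minutes, the call below uses the Amazon EMR API through boto3. The release label, instance types, counts and IAM role names are placeholder assumptions; increasing the instance count (or resizing the cluster later) is how you scale up to get answers faster.

```python
import boto3

emr = boto3.client("emr", region_name="eu-west-1")   # illustrative region

# Request a managed Hadoop cluster; all values below are illustrative placeholders.
response = emr.run_job_flow(
    Name="adhoc-analytics",
    ReleaseLabel="emr-5.30.0",
    Applications=[{"Name": "Hadoop"}, {"Name": "Hive"}],
    Instances={
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 20,                 # scale this up for faster answers
        "KeepJobFlowAliveWhenNoSteps": True,
        "TerminationProtected": False,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)

print("Cluster starting:", response["JobFlowId"])
```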

Access aggregated data in real time with a two-tier processing model

To make large-scale data analytics simpler, organise data processing into two tiers. First, use a batch tier to analyse massive datasets in parallel, then store the aggregated results in a NoSQL data store, the query tier. In this format, data is organised and indexed on input, so businesses can continuously query their large data sets in real time. This is especially useful when you want to visualise your Big Data.
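
A condensed sketch of the two-tier idea: a batch step pre-aggregates raw events, and the aggregates are then loaded into a key-indexed store so individual questions can be answered instantly. Here a plain dictionary stands in for the NoSQL query tier, and the event shape and aggregation are assumptions for illustration.

```python
from collections import defaultdict

# --- Batch tier: aggregate raw events in one pass (in practice, a parallel job). ---
raw_events = [
    {"page": "/home", "country": "ZA"},
    {"page": "/home", "country": "ZA"},
    {"page": "/pricing", "country": "UK"},
    {"page": "/home", "country": "UK"},
]

aggregates = defaultdict(int)
for event in raw_events:
    aggregates[(event["page"], event["country"])] += 1

# --- Query tier: load the pre-aggregated results into a key-indexed store. ---
# A dict stands in here for a NoSQL table keyed on (page, country).
query_tier = dict(aggregates)

# Real-time lookups against the query tier are now simple key reads.
print(query_tier[("/home", "ZA")])   # -> 2
```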

The cloud accelerates Big Data analytics. It gives enormous power to business divisions to work with large data sets, without limits. Since the cost of experimentation is low in the cloud, businesses can experiment with Big Data analysis often and respond to complex business questions quickly. The cloud provides instant scalability and elasticity and allows companies to focus on deriving business value from their data instead of maintaining and managing computing infrastructure. It enhances the ability and capability to ask interesting questions about data and get meaningful answers at a price point unmatched by traditional technologies.

About Attila Narin

Head of EMEA Solutions Architects, Amazon Web Services