Technology is changing how we interact with data and computation. Businesses are increasingly using big data to analyse information in minutes rather than days, empowering decision-makers with valuable real-time information. Big data refers to datasets so massive that traditional data analytics tools cannot process them within a reasonable time frame.
Startups and enterprises are also collecting more data than ever to gain business insights, improve process efficiencies and better target customers. According to the International Data Corporation, the amount of data enterprises create and store doubles every 18 months. Consequently, businesses are struggling to keep up, often swimming in more data than they can analyse and action.
Below, we predict which data trends are likely to emerge in 2017 and how they will impact business: the Internet of Things (IoT), Hadoop and Apache Spark, cloud computing, machine learning and cybersecurity.
1. Internet of Things
The Internet of Things (IoT) is the idea of having everything connected to the Internet – smartphones, vehicles, buildings, household appliances, micro-chipped animals, etc. The IoT has significant benefits for personal and professional purposes. You could connect your alarm clock to your coffee machine and toaster, which would communicate and make you breakfast as you wake up. The diagram below (Figure 1) illustrates the potential of the IoT to impact technology inside the home. On a professional level, companies could employ proximity-based advertising or track goods within a supply chain.
Figure 1 (Source: Entrepreneur.com)
Interconnectivity produces more data, which makes processes more transparent. For example, there are many areas of a distribution channel to optimise. You could tag a product and record its location on the route from the factory to the warehouse and then to stores. This data would provide valuable insight into inefficiencies in the supply chain, allowing you to draw reliable, data-driven conclusions instead of relying on guesswork and intuition.
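To make the supply chain example concrete, here is a minimal Python sketch that turns the location scans for one tagged product into the time spent on each leg of its route. The record format, location names and timestamps are hypothetical; a real deployment would read them from whatever tracking system the tags report to.

```python
from datetime import datetime

# Hypothetical scan records for one tagged product: (location, timestamp).
scans = [
    ("factory", "2017-01-02 08:00"),
    ("warehouse", "2017-01-03 14:00"),
    ("store", "2017-01-06 09:00"),
]

def leg_durations(scans):
    """Return (from, to, hours) for each consecutive pair of scans."""
    parsed = [(loc, datetime.strptime(ts, "%Y-%m-%d %H:%M")) for loc, ts in scans]
    parsed.sort(key=lambda item: item[1])  # order scans chronologically
    legs = []
    for (src, t0), (dst, t1) in zip(parsed, parsed[1:]):
        hours = (t1 - t0).total_seconds() / 3600
        legs.append((src, dst, hours))
    return legs

legs = leg_durations(scans)
for src, dst, hours in legs:
    print(f"{src} -> {dst}: {hours:.1f} h")

# The leg where the product spent longest is a first candidate for optimisation.
slowest = max(legs, key=lambda leg: leg[2])
print("Slowest leg:", slowest[0], "->", slowest[1])
```

Aggregated over thousands of products, figures like these show where goods sit idle, which is the kind of evidence that replaces guesswork.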
As firms better understand the enormous value of collecting and analysing data from consumers, more will invest in IoT technology. A direct consequence of this is that businesses will end up with huge stores of data which they hope to analyse efficiently. Infrastructure surrounding the processing of big data will then also continue to mature.
2. Hadoop and Apache Spark
There are two main challenges with big data: storage and processing. The market leader for both is Hadoop (formally Apache Hadoop), an open-source framework for storing and processing large datasets across clusters of computers. As such, it has become almost synonymous with big data. Enterprises have widely adopted Hadoop over the last few years, and many third-party applications have been written for systems running Hadoop. However, the focus for most enterprises will now shift from adopting Hadoop to putting big data to good use.
To understand why storage is a challenge, consider a data file so large that no single hard drive can hold it. The only way to store the file would be to break it up and save it across multiple hard drives. This is known as distributed storage. In practice, it lets enterprises store enormous files efficiently on the Hadoop platform. In the past, businesses had to process such data using custom software before storing it, a far more expensive option.
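The splitting step can be sketched in a few lines of Python. This is a toy model of the idea behind the Hadoop Distributed File System (HDFS), not its actual implementation: a file is cut into fixed-size blocks and each block is copied to several machines, so losing one drive does not lose the data. HDFS defaults to 128 MB blocks and three replicas; the tiny block size, hypothetical node names and round-robin placement below are simplifications for illustration.

```python
from itertools import cycle

def split_into_blocks(data, block_size):
    """Cut the file contents into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_blocks(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes using simple round-robin.

    Real HDFS placement is rack-aware; round-robin is a simplification.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    starts = cycle(range(len(nodes)))
    placement = {}
    for block_id in range(num_blocks):
        first = next(starts)
        placement[block_id] = [nodes[(first + k) % len(nodes)] for k in range(replication)]
    return placement

# Hypothetical 1,000-byte "file" split into 256-byte blocks (HDFS defaults to 128 MB).
file_contents = bytes(range(250)) * 4
blocks = split_into_blocks(file_contents, block_size=256)
nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical machines
placement = place_blocks(len(blocks), nodes)

for block_id, replicas in placement.items():
    print(f"block {block_id}: {len(blocks[block_id])} bytes on {replicas}")

# Joining the blocks in order restores the original file.
restored = b"".join(blocks)
```

Each block lives on three different machines, so any single machine can fail without making part of the file unreadable.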
Processing big data is also a challenge. Traditionally, you would have to transfer data from the database to a computer for analysis, in the same way that you transfer data from the Internet to a browser. However, moving large amounts of data over a network is extremely slow. Rather than moving the data to the analytics software, Hadoop moves the analytics software to the machines where the data is already stored, resulting in much faster processing.
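Moving the analysis to the data is the core of Hadoop's MapReduce model. The Python sketch below imitates it on a single machine: each node counts words in the block it already holds (the map step), and only the small per-node counts are combined (the reduce step), rather than shipping every raw block to one place. The node names and sample text are hypothetical, and real MapReduce jobs run across a cluster through Hadoop's own APIs.

```python
from collections import Counter

# Hypothetical blocks of a text file, each already stored on a different node.
blocks_on_nodes = {
    "node-a": "big data needs big storage",
    "node-b": "spark and hadoop process big data",
    "node-c": "data moves less when code moves",
}

def map_phase(text):
    """Runs where the block lives: count words locally."""
    return Counter(text.split())

def reduce_phase(partial_counts):
    """Combine the small per-node summaries into one result."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total

# Only these compact Counters would cross the network, not the raw blocks.
partials = [map_phase(text) for text in blocks_on_nodes.values()]
word_counts = reduce_phase(partials)
print(word_counts.most_common(3))
```

With real data the blocks are gigabytes in size while the per-node summaries stay small, which is where the speed-up comes from.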
Recently, Apache Spark has drawn attention as an alternative to Hadoop's processing engine, MapReduce. Unlike Hadoop, however, it does not provide its own storage. More enterprises are likely to adopt Apache Spark as a big data platform in 2017. Its main advantage is speed: by keeping working data in memory, Spark can run some workloads up to 100 times faster than Hadoop MapReduce. As such, many businesses prefer it for real-time analytics, although this does not make Hadoop obsolete for batch processing of stored data. Because Spark lacks storage capabilities, you can use it in conjunction with Hadoop or other big data storage platforms.
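Much of Spark's speed comes from keeping intermediate data in memory rather than re-reading it from disk between steps, which matters most for iterative and interactive work. The toy Python comparison below does not use Spark; it only counts how often a simulated dataset is loaded from storage when an iterative job re-reads it on every pass, versus when it is loaded once and cached. The `Storage` class, its `load_from_disk` method and the pass count are hypothetical stand-ins.

```python
class Storage:
    """Hypothetical stand-in for a disk or distributed file system that counts reads."""

    def __init__(self, records):
        self._records = list(records)
        self.reads = 0

    def load_from_disk(self):
        self.reads += 1
        return list(self._records)

def iterative_job(get_data, passes):
    """Each pass scans the full dataset, as iterative algorithms typically do."""
    results = []
    for threshold in range(passes):
        data = get_data()
        results.append(sum(1 for x in data if x > threshold))
    return results

# Disk-based style: every pass goes back to storage.
disk = Storage(range(10))
disk_results = iterative_job(disk.load_from_disk, passes=5)

# In-memory style: load once, then reuse the cached copy on every pass.
mem = Storage(range(10))
cached = mem.load_from_disk()
mem_results = iterative_job(lambda: cached, passes=5)

print("disk reads:", disk.reads, "| in-memory reads:", mem.reads)
```

Both approaches produce the same answers; the in-memory version simply avoids the repeated trips to storage, which dominate run time when the dataset is large.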
Hadoop and Apache Spark are an area of explosive innovation, as data experts find better and more efficient ways to speed up the storage and processing of big data (Figure 2). This area will continue to grow strongly in 2017.
Figure 2 (Source: Mapr.com)
3. Cloud Computing
A growing number of businesses are moving their systems to the cloud as the industry matures. The cloud is not just for storage; it also encompasses cloud computing, a shared pool of computing resources available on demand. For example, if a website hosted in the cloud suddenly receives a spike in traffic, the cloud can adapt by quickly provisioning more computing resources for the website. When traffic returns to normal, the cloud de-provisions the excess resources. This dynamic allocation is called elasticity, and it helps firms match costs to the capacity they actually need, especially when demand fluctuates or is unpredictable.
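At its simplest, elasticity is a scaling rule. The Python sketch below shows one such rule: provision enough instances for the current traffic, within a minimum and maximum. The per-instance capacity, the bounds and the traffic figures are hypothetical; real cloud platforms such as AWS and Azure offer their own auto-scaling services with richer policies.

```python
import math

def desired_instances(requests_per_sec, capacity_per_instance=100,
                      min_instances=2, max_instances=20):
    """Instances needed for the current load, clamped to [min_instances, max_instances].

    All default values are hypothetical, chosen for illustration.
    """
    needed = math.ceil(requests_per_sec / capacity_per_instance)
    return max(min_instances, min(max_instances, needed))

# Hypothetical traffic readings: quiet, a sudden spike, then back to normal.
traffic = [120, 150, 1800, 2600, 900, 140]
current = 2
for load in traffic:
    target = desired_instances(load)
    action = "scale out" if target > current else "scale in" if target < current else "hold"
    print(f"{load:>5} req/s -> {target:>2} instances ({action})")
    current = target
```

The minimum keeps the site responsive when traffic is low, and the maximum caps spending during extreme spikes.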
Cloud computing has many other advantages, including scalability, ubiquity, speed and cost. With the rise of the IoT, traditional data centres cannot scale quickly enough to keep up with the rapid pace of data growth. Because cloud data is stored on remote third-party infrastructure, you can access it from anywhere in the world. The ability to access data on any device facilitates unprecedented
In 2017, the cloud industry is likely to grow rapidly as firms rely more on cloud infrastructure for their storage and computing needs. We will also see smaller cloud vendors capturing a greater market share. The current market leader is Amazon Web Services (AWS), followed by Microsoft Azure in second place, but smaller players such as Google and IBM are also developing quickly.
4. Machine Learning
Computer science and artificial intelligence (AI) will play an integral role in our future. Machine learning is a form of AI in which a computer learns patterns without being explicitly programmed to recognise them. A common model for machine learning is the neural network, which is loosely modelled on the way humans learn. In our brains, networks of interconnected neurons communicate with each other, and connections strengthen with stimulation or weaken with disuse. Similarly, you can program machines with networks of artificial neurons whose connections strengthen in response to patterns in unstructured data, much faster and at a far larger scale than humans could manage. Neural networks with many layers of such neurons form the branch of machine learning called deep learning.
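To show what "connections strengthen with stimulation" means in code, here is a single artificial neuron (a perceptron) trained in plain Python. Each weight is nudged whenever the neuron's output is wrong, so connections that help produce the right answer grow stronger. This is a deliberately tiny example that learns the logical AND function, not a deep network; the learning rate and epoch count are arbitrary choices.

```python
def step(x):
    """Fire (1) if the weighted input is positive, otherwise stay silent (0)."""
    return 1 if x > 0 else 0

def train_perceptron(samples, epochs=20, lr=0.1):
    """Learn weights for a single neuron using the perceptron update rule."""
    n_inputs = len(samples[0][0])
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            output = step(sum(w * x for w, x in zip(weights, inputs)) + bias)
            error = target - output
            # Strengthen or weaken each connection in proportion to its input.
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias

def predict(weights, bias, inputs):
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

# Training data for logical AND: the neuron should fire only when both inputs are 1.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_samples)
predictions = [predict(weights, bias, x) for x, _ in and_samples]
print(predictions)
```

Deep learning stacks many layers of neurons like this one and trains them with more sophisticated methods, but the principle of adjusting connection strengths from examples is the same.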
With self-learning capabilities, a computer can recognise speech (as Siri does), distinguish faces (as Facebook does) and even unravel patterns in biological systems to help create effective drugs. In business, it can reveal customer trends, automate contract reviews and detect early signs of cybersecurity breaches. In 2017, machine learning will also grow into a supporting asset for presumptive analytics, in which human analysts form and test hypotheses about patterns in the data. With machine learning, computers themselves can identify patterns that analysts might never consider, without depending on the analysts' preconceptions.
Contrary to science fiction predictions, machine learning and artificial intelligence will not replace human involvement; rather, they will support and automate tasks faster, more cheaply and at a deeper level.
5. Cybersecurity
Cybersecurity is a pertinent issue in general and a major factor for firms considering cloud, IoT and software-as-a-service (SaaS) data solutions. The primary reasons for moving to the cloud in 2017 are projected to be business and data analytics, data storage and data management. At the same time, the two top concerns about cloud adoption are where data is stored and cloud security. Cloud providers will increasingly address these concerns. Firms are also becoming more reliant on third-party SaaS cloud applications such as Salesforce and Xero as outsourcing solutions.
The IoT has received much scrutiny regarding security. The October 2016 cyberattack on Dyn, a domain name system (DNS) provider, temporarily brought down major sites such as Twitter, the Guardian, CNN, Spotify and Netflix across Europe and America, and was reported to be the biggest attack of its kind in history. The attack was largely attributed to vulnerable IoT devices such as cameras, baby monitors and digital video recorders (DVRs). All too often, users overlook the vulnerability of small, simple devices and forget that they connect to the largest network in the world: the Internet. Cybercriminals are also expected to use mobile phones as a point of entry more frequently, while users become more reliant on, and complacent about, their mobile devices.
The big data industry is maturing as firms seek to tap into the vast potential of collectable data. Development is occurring at every point of the data funnel: from data collection (IoT) to storage (Hadoop), processing and mining (Apache Spark) and analysis (machine learning, among other techniques). Data collection will continue to grow at an explosive rate, raising concerns about cybersecurity but also providing transparency that leads to valuable business insights. Importantly, these developments are not only valuable for data scientists and analysts; they are also entering the boardroom, transforming the way businesses work and use technology into the future.
What developments in data analytics do you think will dominate 2017? Let us know your thoughts on LegalVision’s Twitter.