Seven Attributes of Big Data 2.0: My Advice for CIOs

Scott Gnau, CTO, Hortonworks

When the term Web 2.0 was coined, it was about new design patterns and business models emerging from the bubbles and shakeouts of successive computing revolutions. The term worked for everyone. It took off and became a meme that, to this day, others are trying to repeat.

At its recent I/O Conference, for example, Google spoke of AI and machine learning as a basis of Cloud 2.0. Artificial intelligence in the cloud, delivered via voice, will talk to you and predict what it thinks you want to know. This will be the next-generation experience in your home or car. Put that on screen, Jarvis?

Many think the Internet of Things 2.0 will result from machine-to-machine connectivity to create ‘an interface of things’ and spawn smart connected environments in schools, businesses and even across industries. Systems will understand each other and work ‘in the context of a larger, commonly understood purpose.’

But the persistent problem is that CIOs today often find it impossible to think of all of these clusters of technologies as one. And what do all of these technologies actually do together?

I’m not into the meme game. I believe data is the common factor in all of them and, whatever you name it, it is the tipping point that is changing everything for enterprise IT, from people and processes to platforms and architectures.

I believe there are seven key attributes of this tipping point, and CIOs should be thinking about and acting on them to lead their organizations on this journey.

1. It Starts with Access to All Data

The implication of Cloud 2.0 and IoT 2.0 is the need for real-time access to data, however it's consumed or processed. It’s a world where we need to look into and across all the data.

This makes the data scientist the super admin of the future. He or she needs access to all the data to do their job, not just what IT thinks is relevant. As in mining for gold or iron, more raw material leads to more refined product.

Start by making sure your scientists have access to as much data as possible with the fewest constraints possible. This implies new and important governance and security designs.

2. Bring the Processing to the Data

DevOps was about making development and operations work closer together while automating software delivery and infrastructure.

With zettabytes involved, data movement is still relatively expensive and difficult, and it sometimes leads to governance nightmares. It’s just more efficient to bring the processing to the data, which requires multi-tenancy and mixed workload management. We need ops and data (and development) to work together tightly to make it happen.
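To make the idea concrete, here is a minimal Python sketch of computing a global average without moving the raw readings. It is an illustration under stated assumptions, not a description of any particular platform: the `Partition` class, the node names and the sample readings are all hypothetical, and the "nodes" are simulated in a single process.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    """A slice of data that lives on one (hypothetical) storage node."""
    node: str
    readings: list[float]


def local_summary(partition: Partition) -> tuple[float, int]:
    """Runs where the data lives and returns only a small (sum, count) pair."""
    return sum(partition.readings), len(partition.readings)


def global_mean(partitions: list[Partition]) -> float:
    """Combine the per-node summaries into one overall mean."""
    total, count = 0.0, 0
    for p in partitions:
        # In a real cluster this call would execute on p.node; here it is simulated.
        s, n = local_summary(p)
        total += s
        count += n
    return total / count


# Hypothetical sample data spread over two nodes.
partitions = [
    Partition("node-a", [20.0, 22.0, 24.0]),
    Partition("node-b", [30.0, 34.0]),
]

# Values that cross the network if every reading is shipped to a central system.
values_moved_naive = sum(len(p.readings) for p in partitions)
# Values that cross the network if each node ships only its (sum, count) pair.
values_moved_local = 2 * len(partitions)

mean = global_mean(partitions)
```

The point of the sketch is that the amount of data shipped per node stays fixed at two values however many readings each node holds, whereas centralizing the data grows with every reading collected.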

Consider the rise of DataOps and its impact on your IT organization.

3. Be Connected not Converged

Big Data ‘next’ depends on a huge ecosystem to get to the level of technology investment required. That’s why we’ve already seen the explosion of new technologies and companies. For enterprises to succeed with data, apps and data need to be connected via a set of platforms within a logical framework. Converging such divergent requirements into a single proprietary product will not work, and it is not a good long-term investment for any large corporation.

4. Data Scientist 2.0

We’ve already moved beyond the time of data as ‘experiment’ by early adopters, as today’s organizations see data as a mainstream part of business transformation.

Clearly, Data Scientist 2.0 is a new character entering the play from stage right, with new working methods. To scale, CIOs will need data analysts and scientists to re-use each other's work and collaborate, while also building healthy competition that produces a lasting stream of innovative new algorithms.

This will require internal and external communities of analysts and data scientists to foster collaboration and friendly competition. Call this the new agile development applied to the world of big data.

5. The Platform, Not the Tool, Will Matter

Tools have always been tools: useful for one or two things, but you can’t build a house with just one hammer.

It’s certain that Hadoop is rapidly emerging as the open data technology on which the real-time world of Data 2.0 is being based. Tools for capturing, managing and analyzing data are proliferating at an accelerating pace. But none of these tools is enough on its own.

A successful IT strategy lies in a platform that can connect new tools to the legacy ones while handling the workload prioritization, security, governance, and operations.

6. Data Will Put IT in Fast Reverse

IT project and program management is completely different in the next generation world of data.

Traditional projects went like this:

1. Requirements
2. Data Sourcing
3. ETL
4. App Dev
5. Delivery

Big Data 2.0 (BD2.0) projects go like this:

1. Land the data raw
2. Data Science/Machine Learning
3. Data Sourcing, Requirements
4. Define Requirements
5. App Dev
6. Land more Data
7. Repeat

Your IT strategy needs to go into reverse too.
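As a rough illustration of the reversed order, the Python sketch below lands records untouched, explores them, and only then derives candidate requirements from what the data actually contains. Everything in it is hypothetical: the in-memory `raw_landing_zone` stands in for raw storage, the device records are invented, and the "appears at least twice" rule is just a placeholder for real data science work.

```python
import json
from collections import Counter

# Stands in for raw storage (hypothetical); records are kept exactly as received.
raw_landing_zone: list[str] = []


def land_raw(record: str) -> None:
    """Land the data raw: no schema, no ETL, no up-front requirements."""
    raw_landing_zone.append(record)


def profile_fields(lines: list[str]) -> Counter:
    """Explore the landed data to see which fields actually occur, and how often."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(json.loads(line).keys())
    return counts


# Invented sample records with inconsistent fields, as raw feeds often have.
for rec in [
    '{"device": "pump-1", "temp": 71.5}',
    '{"device": "pump-2", "temp": 69.0, "vibration": 0.4, "firmware": "2.1"}',
    '{"device": "pump-1", "vibration": 0.9}',
]:
    land_raw(rec)

field_counts = profile_fields(raw_landing_zone)

# Requirements come after exploration: keep fields seen in at least two records.
candidate_requirements = sorted(f for f, n in field_counts.items() if n >= 2)
```

In a traditional project the field list would have been fixed in the requirements phase, before any data arrived. Here it falls out of the exploration step, and landing more data simply means running the loop again.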

7. Legacy Doesn’t Have To Be

Existing operational systems are often the last mile to a business process or customer experience. I predict the world of big data will help these systems flourish again, with new insights and speed, rather than forcing a rip and replace.

So don’t throw them away. Work out how to connect them.
