Getting Unstructured Data Under Control

Data growth is the one exponential we can all relate to, and it will continue into the future. IDC envisions that in 2025, the amount of data generated in that year alone will be a staggering 175 zettabytes, and we will keep a lot of it with the desire of transforming data into information and then transmuting that into money. This is the promise of what we have called data processing and information technology for the past five decades, at least. And that will never change.

Unstructured data – all those images and videos, emails, texts and social media content, data from Internet of Things (IoT) devices, streaming data and the like – is a particular challenge. There’s not just a lot of it, but it is also hard to get a handle on. This data doesn’t tend to live only in corporate databases and data warehouses running on servers in central datacenters. It is often outside the firewall, in the cloud and at the edge. IDC analysts predict that by 2025, 80 percent of the data being generated will be unstructured, more than half of the data will come from IoT and 49 percent will be housed in the cloud (with the other 51 percent in datacenters).
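To put those percentages in absolute terms, here is a rough back-of-the-envelope sketch in Python. It assumes, purely for illustration, that every share applies to the same 175 zettabyte base cited above; IDC may define the bases for generated and stored data differently, so treat the results as order-of-magnitude figures only. The function and variable names are hypothetical.

```python
# Back-of-the-envelope sizing from the IDC figures quoted in the article.
# Assumption: all percentages are applied to the same 175 ZB base.
TOTAL_ZB_2025 = 175


def share_of_total(percent, total_zb=TOTAL_ZB_2025):
    """Return the given percentage of the total, in zettabytes."""
    return round(total_zb * percent / 100, 2)


unstructured_zb = share_of_total(80)  # unstructured data
cloud_zb = share_of_total(49)         # housed in the cloud
datacenter_zb = share_of_total(51)    # housed in datacenters

print(f"Unstructured: {unstructured_zb} ZB")
print(f"Cloud:        {cloud_zb} ZB")
print(f"Datacenter:   {datacenter_zb} ZB")
```

Under that assumption, roughly 140 of the 175 zettabytes would be unstructured.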

“Unstructured data has really clearly become one of – if not the most – valuable assets in many organizations,” Caitlin Gordon, vice president of product marketing for Dell EMC’s storage unit, tells The Next Platform. “It’s not as simple as it being in the four walls of the datacenter anymore. It’s now in the cloud – probably more objects than file at this point – and edge is really growing as a big place where there’s unstructured data being generated. And as always, IT budgets certainly aren’t tripling. They’re really forecasted to pretty much stay flat over that time. So how do you deal with this? What our customers have been leveraging from us and are really looking forward to continuing to work off of is, they want a single platform with massive amounts of scale so it can handle the data you have today. It can keep up with that data growth, but it can’t keep up with that data growth and add complexity that requires you to add resources to manage it. It has to be able to keep operating as simply as possible with as few resources as possible to manage it. And it will have to solve not just your datacenter challenges but be able to support core and cloud as well.”

Dell EMC has put a focus on storage in recent weeks. Early last month, the company unveiled PowerStore, taking essentially a ground-up approach to flash storage to create a platform for mid-level businesses and enterprises that can scale up (to 2.8 PB per appliance) or out (to eight active-active nodes), run modern and legacy workloads, leverage machine learning algorithms to increase efficiency and run the new PowerStore OS. Later in the month, parent company Dell Technologies teamed up with Google Cloud to bring file data into the public cloud through the introduction of OneFS for Google Cloud, which enables enterprises to more easily manage data-intensive workloads between corporate datacenters and Google’s public cloud. It uses Dell EMC Isilon filesystems on premises and the compute and analytics services in Google Cloud, which enables businesses to move workloads as large as 50 PB in a single file system between the two environments.

Dell EMC also is about two years removed from rolling out its PowerMax system, which the company upgraded a year later with NVMe throughout and storage-class memory (SCM) as persistent storage, thanks to dual-port Intel Optane SSDs.

This week, Dell EMC came around again, this time with the introduction of PowerScale, a new portfolio of storage systems that marries the OneFS storage operating system – which runs the Isilon storage systems popular with HPC and media and entertainment customers – and associated software with the company’s PowerEdge servers.

“We very quickly knew we were going to build this off of what we already had with OneFS, which has been in the portfolio with Isilon for well over a decade,” Gordon says. “We’ve been innovating and really setting the bar for unstructured storage and scale out with this platform and with the software for many, many years, whether it’s dedupe, HDFS [Hadoop Distributed File System] support, cloud integration, massive scale for some of the network automations and multicloud offerings we’ve introduced more recently. We’ve had a common tenet throughout of having a single file system being all policy-driven, API-based and within the system itself having automated data movement – discovering nodes automatically, automatically balancing data across this cluster so that not only can you support a huge amount of data, but you can do so with an architecture that’s going to seamlessly manage that for you.”

What Dell EMC has done is unhitch OneFS from its underlying, purpose-built hardware, enabling it to be paired with the PowerEdge systems. The uncoupling of OneFS from the Isilon appliances will both help enterprises embrace the new PowerScale offerings – housed in 1U server form factors – and help Dell EMC “accelerate our innovation as well by focusing our engineering resources in the unstructured storage space on the OneFS software and then focusing our hardware engineering from a server perspective on PowerEdge. We’re able to really get optimal hardware and software and combine them in an appliance that’s really going to deliver some pretty unique value to our customers,” Gordon says.

The 1U size of the two PowerScale appliances – the F200 for SAS environments and the all-NVMe F600 system – expands the use cases Dell EMC can address with the OneFS operating system, including edge environments, where much of the unstructured data increasingly is being created. Clusters can scale from 11 TB to 60 PB and millions of file operations, and nodes can be added to a PowerScale or existing Isilon cluster in as little as a minute, according to the vendor. Enterprises can get up to 85 percent storage utilization from the systems and see up to 15.8 million IOPS per cluster.

The F200 is a single-socket system that targets environments such as the edge or remote offices and holds four SSDs and up to 15.36 TB of capacity. The F600 essentially doubles up on what the F200 can offer, with two sockets, eight drives and up to 61.44 TB of capacity, and is aimed at larger spaces like datacenters.
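The per-node figures above imply a per-drive capacity for each model, which a few lines of Python make explicit. This is a minimal sketch that uses only the drive counts and maximum capacities quoted in this article; the dictionary layout and function names are hypothetical, and the implied drive sizes are simple division, not a statement of which drive options Dell EMC actually ships.

```python
# Maximum per-node figures as quoted in the article.
NODES = {
    "F200": {"sockets": 1, "drives": 4, "max_raw_tb": 15.36},
    "F600": {"sockets": 2, "drives": 8, "max_raw_tb": 61.44},
}


def per_drive_tb(model):
    """Implied capacity per drive at the maximum configuration, in TB."""
    spec = NODES[model]
    return round(spec["max_raw_tb"] / spec["drives"], 2)


def capacity_ratio(bigger, smaller):
    """How many times more maximum capacity one node has than another."""
    return round(NODES[bigger]["max_raw_tb"] / NODES[smaller]["max_raw_tb"], 2)


for model in NODES:
    print(f"{model}: {per_drive_tb(model)} TB per drive")
print(f"F600 vs F200 capacity: {capacity_ratio('F600', 'F200')}x")
```

The arithmetic shows that the F600 doubles the socket and drive counts but quadruples the maximum capacity, because the implied per-drive capacity also doubles, from 3.84 TB to 7.68 TB.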

In the latest version of the operating system – PowerScale OneFS v9.0 – Dell EMC has added features such as S3 object access, which is a step toward bringing the vendor’s Isilon lineup and its ECS lineup for object storage into greater alignment. The OS also supports protocols such as NFS, SMB and HDFS, enabling the systems to run both modern and traditional applications.

In addition, the latest OS – which also will run on Isilon appliances – supports management and container orchestration frameworks such as Ansible and Kubernetes. Data analytics is handled by Dell EMC’s new DataIQ software, which provides a single tool for finding and analyzing unstructured data scattered across a company’s environment, from the datacenter to the cloud and edge. System performance and capacity analytics are handled through Dell EMC CloudIQ.

The PowerScale systems will be able to address a broad range of use cases. Gordon notes that Isilon initially was aimed at media and entertainment companies, but that it has since grown to address businesses in almost two dozen verticals. John Shirley, vice president of unstructured storage product management, tells The Next Platform that the PowerScale systems can address file share workloads.

“But as we see more and more AI, we see people who need to go get value out of this huge amount of data,” Shirley says. “We see a lot of customers wanting to be able to have that simplicity and scale. They need the performance. Leading into PowerScale they need more insights into that data and help just wrangling the data and making some sense of it.”
