With companies forced to adapt to a remote, distributed workforce this past year, cloud adoption has accelerated at an unprecedented pace by +14% resulting in 2% or $13B above pre-pandemic forecasts for 2020 – with possibly more than $600B in on-prem to cloud migrations within the next few years. This shift to the cloud places growing importance on a new generation of data and analytics platforms to fuel innovation and deliver on enterprise digital transformation strategies. However, many organizations still struggle with the complexity, unscalable infrastructure and severe maintenance overheads of their legacy Hadoop environments and eventually sacrifice the value of their data and, in turn, risk their competitive edge. To tackle this challenge and unlock more (sometimes hidden) opportunities in their data, organizations are turning to open, simple and collaborative cloud-based data and analytics platforms like the Databricks Lakehouse Platform. In this blog, you’ll learn about the challenges driving organizations to explore modern cloud-based solutions and the role the lakehouse architecture plays in sparking the next wave of data-driven innovation.
Unfulfilled promises of Hadoop
Hadoop’s distributed file system (HDFS) was a game-changing technology when it launched and will remain an icon in the halls of data history. Because of its advent, organizations were no longer bound by the confines of relational databases, and it gave rise to modern big data storage and eventually cloud data lakes. For all its glory and fanfare leading up to 2015, Hadoop struggled to support the evolving potential of all data types – especially at enterprise scale. Ultimately, as the data landscape and accompanying business needs evolved, Hadoop struggled to continue to deliver on its promises. As a result, enterprises have begun exploring cloud-based alternatives and the rate of migration from Hadoop to the cloud is only increasing.
Teams migrate from Hadoop for a variety of reasons; it’s often a combination of “push” and “pull.” Limitations with existing Hadoop systems and high licensing and administration costs are pushing teams to explore alternatives. They’re also being pulled by the new possibilities enabled by modern cloud data architectures.
While the architecture requirements vary by organization, we see several common factors that lead customers to realize it’s time to start saying goodbye. These include:
- Wasted hardware capacity: Over-capacity is a given in on-premises implementations so that you can scale up to your peak time needs, but the result is that much of that capacity sits idle but continues to add to the operational and maintenance costs.
- Scaling costs add up fast: Decoupling storage and compute is not possible in an on-premises Hadoop environment, so costs grow with data sets. Factor that in with the rapid digitization resulting from the COVID-19 pandemic and the global growth rate. Research indicates that the total amount of data created, captured, copied, and consumed in the world forecast to increase by 152.5% from 2020 to 2024 to 149 Zettabytes. In a hyperdata growth world, runaway costs can balloon rapidly.
- DevOps burden: Based on our customers’ experience, you can assume 4 to 8 full-time employees for every 100 nodes.
- Increased power costs: Expect to pay as much as $800 per server annually based on consumption and cooling. That’s $80K per year for a 100 node Hadoop cluster!
- New and replacement hardware costs: This accounts for ~20% of TCO, which is equal to the Hadoop clusters’ administration costs.
- Software version upgrades: These upgrades are often mandated to ensure the support contract is retained, and those projects take months at a time, deliver little new functionality and take up precious bandwidth of the data teams.
The shift toward lakehouse architecture
A lakehouse architecture is the ideal data architecture for data-driven organizations. It combines the best qualities of data warehouses and data lakes to provide a single high-performance solution for all data workloads. Lakehouse architecture supports a variety of use cases, such as streaming data analytics to BI, data science and AI. Why do customers love the Databricks Lakehouse Platform?
- It’s simple. Unify your data, analytics and AI on one platform.
- It’s open. Unify your data ecosystem with open standards and formats.
- It’s collaborative. Unify your data teams to collaborate across the entire data and AI workflow.
Bye Bye Hadoop
Welcome Lakehouse