Resources

Published Papers

Published July 22, 2021: Improving Patient Insights With Textual ETL in the Lakehouse Paradigm https://databricks.com/blog/2021/07/22/improving-patient-insights-with-textual-etl-in-the-lakehouse-paradigm.html

Published May 19, 2021: Evolution to the Data Lakehouse https://databricks.com/blog/2021/05/19/evolution-to-the-data-lakehouse.html

Published Books

Building the Data Lakehouse

Published October 2021 https://technicspub.com/data-lakehouse/

Description

The data lakehouse is the next generation of the data warehouse and data lake, designed to meet today’s complex and ever-changing analytics, machine learning, and data science requirements.

Learn about the features and architecture of the data lakehouse, along with its powerful analytical infrastructure. Appreciate how the universal common connector blends structured, textual, analog, and IoT data. Maintain the lakehouse for future generations through Data Lakehouse Housekeeping and Data Future-proofing. Know how to incorporate the lakehouse into an existing data governance strategy. Incorporate data catalogs, data lineage tools, and open source software into your architecture to ensure your data scientists, analysts, and end users live happily ever after.

Data Architecture: A Primer for the Data Scientist, 2nd Edition

Published 2019

Description

Over the past 5 years, the concept of big data has matured, data science has grown exponentially, and data architecture has become a standard part of organizational decision-making. Throughout all this change, the basic principles that shape the architecture of data have remained the same. There remains a need for people to take a look at the “bigger picture” and to understand where their data fit into the grand scheme of things.

Data Architecture: A Primer for the Data Scientist, Second Edition addresses the larger architectural picture of how big data fits within the existing information infrastructure or data warehousing systems. This is an essential topic not only for data scientists, analysts, and managers but also for researchers and engineers who increasingly need to deal with large and complex sets of data. Until data are gathered and can be placed into an existing framework or architecture, they cannot be used to their full potential. Drawing upon years of practical experience and using numerous examples and case studies from across various industries, the authors seek to explain this larger picture into which big data fits, giving data scientists the necessary context for how pieces of the puzzle should fit together.

Key Features

  • New case studies include expanded coverage of textual management and analytics
  • New chapters on visualization and big data
  • Discussion of new visualizations of the end-state architecture