Until a few years ago, companies relied solely on data warehouses for data storage. With the explosion of data and the need to support ever-larger volumes, however, data lakes have become a more practical solution. Data warehouses were used to implement traditional ETL processes that require characterization, modeling, and development, creating bottlenecks that delayed companies’ access to information. A data lake answers this problem by eliminating the need for time-consuming ETL. It serves as a central repository and the single source from which all organizational data flows. It is often based on Hadoop technology, but other cloud solutions can be used as well. The traditional data warehouse continues to serve a critical role; it has simply evolved to another level in data processing.
The challenges in developing a data lake are many, so we won’t list them all here. The guiding principles are:
As a working assumption, there should be no size limitation
There are no limits on processing capabilities (this is a function of applied resources)
Preventing the lake from becoming a “swamp” requires close management of what is included: documenting the source of each data element, the frequency of its updates, and other characteristics (read more in the Data Catalog section).
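As an illustration, the kind of metadata such a catalog tracks can be sketched as a simple record. This is a minimal sketch only; the field names and values below are hypothetical and do not reflect any specific catalog product’s schema.

```python
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    # Hypothetical data-catalog record; field names are illustrative only.
    dataset: str           # logical name of the data element in the lake
    source: str            # originating system the data comes from
    update_frequency: str  # how often the lake copy is refreshed
    owner: str             # team responsible for the data
    tags: list = field(default_factory=list)  # other characteristics

# Example entry for an orders table landed from a (hypothetical) CRM export.
orders = CatalogEntry(
    dataset="sales.orders",
    source="crm_export",
    update_frequency="hourly",
    owner="data-engineering",
    tags=["raw-zone", "pii-free"],
)
```

Keeping even this minimal metadata for every dataset makes it possible to trace where data came from and how fresh it is, which is exactly what separates a managed lake from a swamp.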
WHY US ?
Vision.bi has extensive experience in creating data lakes. Even before the current information age, Vision.bi was among the first in Israel to realize the potential of Massively Parallel Processing (MPP) technologies. We were also a pioneer in implementing Hadoop-based solutions, as well as other data lake related technologies.
Want to learn more?