What is Data Ingestion?
Data ingestion is the process of moving data harvested from multiple sources to a place where it can be used immediately or stored in a database for later use. It is a sub-branch of data engineering [link to #3.10] and marks the beginning of a data pipeline. It is a complex process: data arrives from various sources in several different formats, and the extracted data must be extensively cleansed and adequately structured before further processing and analysis.
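The extract-cleanse-load steps above can be sketched in a few lines of Python. The two sources, field names, and cleansing rules below are hypothetical, chosen only to show data arriving in different formats being normalised and stored for later use:

```python
import sqlite3

# Hypothetical raw records from two sources (say, a CSV export and an API),
# arriving in different shapes -- ingestion must normalise them.
source_a = [{"Name": " Alice ", "age": "30"}, {"Name": "Bob", "age": "25"}]
source_b = [("carol", 41)]

def cleanse(record):
    """Trim whitespace, normalise casing, and coerce the age to an integer."""
    name, age = record
    return (name.strip().title(), int(age))

# Normalise both sources into one common (name, age) shape, then cleanse.
records = [(r["Name"], r["age"]) for r in source_a] + list(source_b)
cleaned = [cleanse(r) for r in records]

# Load the structured data into a database for future analysis.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO people VALUES (?, ?)", cleaned)
print(conn.execute("SELECT * FROM people ORDER BY name").fetchall())
```

A real pipeline would replace the in-memory lists with connectors to live sources and the SQLite table with a warehouse or data lake, but the extract, cleanse, and load stages remain the same.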
Points to Remember
- Data is typically ingested either in real time (stream processing) or in batches. In some cases, data is processed using a combination of both methods; this hybrid approach is called the Lambda architecture.
- In the stream-processing method, data is ingested the instant it is generated at its source, which makes it especially useful for time-sensitive data. In batch processing, by contrast, data is ingested in batches at scheduled intervals.
- Data ingestion becomes extremely challenging when handling big data [link to #1.7], which reduces the speed and efficiency of the entire process. In such cases, organisations rely on specialised data ingestion tools.
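The difference between the two ingestion modes above can be made concrete with a small sketch. The event source, batch size, and handlers below are hypothetical stand-ins: in stream processing each event is handled the moment it arrives, while in batch processing events accumulate in a buffer and are flushed on a threshold or schedule:

```python
from collections import deque

# Hypothetical events generated by a source over time.
events = [{"id": i, "value": i * 10} for i in range(7)]

# Stream processing: handle each event as soon as it is generated,
# e.g. to update a live dashboard.
stream_results = []
for event in events:
    stream_results.append(event["value"])

# Batch processing: buffer events and flush them on a schedule
# (here simulated by a fixed batch size of 3).
BATCH_SIZE = 3
buffer, batches = deque(), []
for event in events:
    buffer.append(event)
    if len(buffer) == BATCH_SIZE:
        batches.append([e["id"] for e in buffer])  # flush one batch to storage
        buffer.clear()
if buffer:  # flush any remaining partial batch
    batches.append([e["id"] for e in buffer])

print(batches)  # -> [[0, 1, 2], [3, 4, 5], [6]]
```

A Lambda architecture would run both paths side by side: the streaming loop for low-latency views and the batch loop for complete, periodically recomputed results.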