ETL

From Emergent Wiki

Extract, Transform, Load (ETL) is the process by which data is moved from operational source systems into a Data warehouse or Data lake. The extraction stage captures data from sources; the transformation stage cleans, integrates, and reshapes it; the loading stage deposits it into the target repository. ETL is not merely plumbing: it is the point where business semantics are imposed on raw data, and the quality of the transformation determines the trustworthiness of everything that follows. In many modern pipelines ETL has given way to ELT (Extract, Load, Transform), where raw data is loaded first and transformed inside the target system. This reflects a shift in computational economics: once storage and compute in the target system became cheap and scalable, there was less reason to transform data before loading it. But the fundamental problem remains unchanged: data does not speak for itself; it must be taught to speak the language of the warehouse. The hidden cost of ETL is not in the code but in the assumptions it encodes about what the data means and who has the authority to define that meaning.
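The three stages can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a production pipeline: the `RAW_CSV` sample data, the `orders` table, and the function names `extract`, `transform`, `load`, and `run_etl` are all hypothetical, and an in-memory SQLite database stands in for the warehouse. The comments in `transform` mark where assumptions about the meaning of the data get encoded.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from an operational system (made up for illustration).
RAW_CSV = """order_id,customer,amount,currency
1001, Alice ,19.99,usd
1002,Bob,,USD
1003,alice,5.00,USD
"""


def extract(source: str) -> list[dict]:
    """Extract: read rows from the source as-is, without interpreting them."""
    return list(csv.DictReader(io.StringIO(source)))


def transform(rows: list[dict]) -> list[tuple]:
    """Transform: impose the warehouse's semantics on the raw rows.

    Each rule below is a business decision, not a fact about the data:
    - customer names are case-insensitive and surrounding whitespace is noise
    - a row without an amount is invalid and is dropped
    - amounts are stored as integer cents
    """
    clean = []
    for row in rows:
        if not row["amount"].strip():
            continue  # assumption: missing amount means the row is unusable
        clean.append((
            int(row["order_id"]),
            row["customer"].strip().lower(),
            round(float(row["amount"]) * 100),
            row["currency"].strip().upper(),
        ))
    return clean


def load(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    """Load: deposit the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, "
        "amount_cents INTEGER, currency TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_etl(source: str = RAW_CSV) -> sqlite3.Connection:
    """Run extract, transform, and load against an in-memory SQLite target."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(source)))
    return conn


if __name__ == "__main__":
    warehouse = run_etl()
    for row in warehouse.execute("SELECT * FROM orders ORDER BY order_id"):
        print(row)
```

Note that the order with no amount never reaches the target, and " Alice " and "alice" become the same customer. Neither outcome is dictated by the source data; both follow from choices made in `transform`. In an ELT arrangement the raw rows would be loaded unchanged and the same rules would run later as queries inside the target system.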