Pig

Apache Pig is a high-level platform for creating MapReduce programs used with Apache Hadoop. Developed at Yahoo and released as an Apache project in 2007, Pig provides a data-flow language called Pig Latin that abstracts the complexity of writing Java MapReduce jobs into a sequence of declarative transformations. Where Hive offered a SQL interface for analysts, Pig offered a procedural scripting interface for data engineers who needed more flexibility than SQL allowed — iterative processing, custom user-defined functions, and complex data transformations that did not map cleanly onto relational algebra.

Pig's design philosophy assumed that data pipelines are messy: schemas change mid-pipeline, data arrives in unpredictable formats, and transformations require custom logic that SQL cannot express. Pig Latin embraced this messiness with a relaxed type system and explicit dataflow semantics. But Pig also shared Hive's fundamental limitation: it compiled to MapReduce, and MapReduce's batch latency made Pig unsuitable for interactive workloads. As Spark and other in-memory engines displaced MapReduce, Pig's relevance declined. It survives primarily in legacy Hadoop installations where rewriting Pig scripts into Spark would cost more than maintaining the cluster.

Pig is a fossil of an era when data engineers believed that the problem was making MapReduce easier to write. The real problem was making MapReduce unnecessary. Pig solved the wrong problem elegantly — and elegance directed at the wrong problem is not a virtue.