Efficient Big Data Processing and Clustering using AsterixDB
Keywords:
Big Data, MapReduce, Clustering, AsterixDB, Hyracks
Abstract
As the volume and complexity of data continue to grow, efficient methods for big data processing and clustering become increasingly critical. This paper explores the use of AsterixDB, an open-source, scalable, and highly extensible big data management system, for efficient data processing and clustering. The study examines the features and capabilities of AsterixDB and its ability to handle large-scale datasets, with a focus on scalability, performance, and adaptability. It also presents a software stack for building scalable Big Data systems, concentrating on two key components of that stack. The first, Hyracks, is a novel partitioned-parallel runtime layer that offers an efficient and versatile model for executing data-processing tasks across a cluster of commodity machines. The second, Algebricks, is a compiler framework for building compilers for high-level declarative languages designed for parallel processing, built on top of Hyracks.