Mamatha S R, Saheli G S, Rajesh R, Arti Arya
This paper gives an overview of how Hadoop File System manages massive data as well as handles small files. As data is exponentially pouring in from all sides in all domains, it has become a necessity to manage and analyze such huge amount of data to extract useful information. This huge amount of data is technically termed as Big Data, which in turn falls under Data Science. Currently a lot of research is going on how to handle such vast pool of data. The Apache Hadoop is a software framework that uses simple programming paradigm to process and analyze large data sets(Big Data) across clusters of computers. The Hadoop Distributed File System(HDFS) is one such technology that manages the Big Data efficiently. In this paper, an insight of ”how HDFS handles big as well as small amount of data” is presented, reviewed and analyzed. As a result, summarized limitations of existing sys- tems are described in the paper along with the future scope of it.