Various Filesystems in Hadoop

Last Updated : 12 Aug, 2025

Hadoop is an open-source framework, written primarily in Java (with some components in C and shell scripts), for storing and processing very large datasets across a cluster of machines. It is designed for batch (offline) processing and runs on commodity hardware, i.e., inexpensive machines that each contribute local storage and compute power, to provide distributed storage and processing.

While HDFS (Hadoop Distributed File System) is the most commonly associated storage layer in Hadoop, it is not the only filesystem supported. Hadoop provides a flexible architecture that allows it to interact with multiple filesystems. At its core, the Java abstract class:

org.apache.hadoop.fs.FileSystem

represents a filesystem in Hadoop. Different implementations extend this class, enabling Hadoop to work with both local and distributed storage systems.
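To make this abstraction concrete, here is a minimal plain-Java sketch of the same pattern. The names `SimpleFileSystem`, `InMemoryFileSystem`, and `FsTools.copy` are hypothetical and are not Hadoop classes; they only mimic the idea that client code is written against an abstract filesystem type, so any implementation can be plugged in.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for org.apache.hadoop.fs.FileSystem: the abstract
// base class declares the operations every filesystem must support.
abstract class SimpleFileSystem {
    abstract String getScheme();
    abstract void write(String path, byte[] data) throws IOException;
    abstract byte[] read(String path) throws IOException;
    abstract boolean exists(String path);
}

// Hypothetical in-memory implementation. A real one would talk to a local
// disk, a NameNode, an FTP server, or an object store.
class InMemoryFileSystem extends SimpleFileSystem {
    private final String scheme;
    private final Map<String, byte[]> files = new HashMap<>();

    InMemoryFileSystem(String scheme) { this.scheme = scheme; }

    @Override String getScheme() { return scheme; }

    @Override void write(String path, byte[] data) {
        files.put(path, data.clone()); // defensive copy
    }

    @Override byte[] read(String path) throws IOException {
        byte[] data = files.get(path);
        if (data == null) throw new FileNotFoundException(scheme + "://" + path);
        return data.clone();
    }

    @Override boolean exists(String path) { return files.containsKey(path); }
}

// Hypothetical helper written only against the abstract type, so it works
// between any two implementations.
final class FsTools {
    private FsTools() {}

    static void copy(SimpleFileSystem src, String srcPath,
                     SimpleFileSystem dst, String dstPath) throws IOException {
        dst.write(dstPath, src.read(srcPath));
    }
}
```

Because `copy` never depends on a concrete class, the same code moves data from a "file" filesystem to an "hdfs" one; this is the same design that lets Hadoop tools operate across different storage systems.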

Major Filesystems in Hadoop

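Below is a list of the filesystems shipped with classic Hadoop releases. Several of them (HFTP, HSFTP, KFS, and the s3 and s3n connectors) have since been removed from Hadoop; current releases use WebHDFS (webhdfs, swebhdfs) and S3A (s3a) in their place.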

| Filesystem | URI scheme | Java implementation (all under org.apache.hadoop) | Description |
|---|---|---|---|
| Local | file | fs.LocalFileSystem | A filesystem for a locally connected disk with client-side checksums. For a local filesystem without checksums, use RawLocalFileSystem. |
| HDFS | hdfs | hdfs.DistributedFileSystem | The Hadoop Distributed File System, designed to work efficiently with MapReduce. |
| HFTP | hftp | hdfs.HftpFileSystem | Provides read-only access to HDFS over HTTP. Despite its name, HFTP has no connection with FTP. It is commonly used with distcp to copy data between HDFS clusters running different versions. |
| HSFTP | hsftp | hdfs.HsftpFileSystem | Provides read-only access to HDFS over HTTPS. Like HFTP, it has no connection with FTP. |
| HAR | har | fs.HarFileSystem | A filesystem layered on top of another filesystem for archiving files. Hadoop Archives pack many small HDFS files into a few larger files, which reduces the NameNode's memory usage. |
| KFS (CloudStore) | kfs | fs.kfs.KosmosFileSystem | CloudStore (formerly the Kosmos filesystem, KFS) is a distributed filesystem written in C++, similar to HDFS and Google's GFS (Google File System). |
| FTP | ftp | fs.ftp.FTPFileSystem | A filesystem backed by an FTP server. |
| S3 (native) | s3n | fs.s3native.NativeS3FileSystem | A filesystem backed by Amazon S3. |
| S3 (block-based) | s3 | fs.s3.S3FileSystem | A filesystem backed by Amazon S3 that stores files in blocks (similar to HDFS) to work around the 5 GB object size limit S3 had at the time. |
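The "client-side checksums" of the Local filesystem work roughly like this: the file is split into fixed-size chunks and a CRC-32 is kept for each chunk, so a later read can detect which chunk was corrupted. In Hadoop the checksums are written to a hidden `.<filename>.crc` file next to the data, with the chunk size set by `file.bytes-per-checksum` (512 bytes by default). The sketch below is a simplified model under those assumptions; the class `ChunkChecksums` is hypothetical and simply returns the checksums as a list instead of writing a `.crc` file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical, simplified model of chunked client-side checksumming.
final class ChunkChecksums {
    private ChunkChecksums() {}

    // Computes one CRC-32 value per chunk of bytesPerChecksum bytes
    // (the last chunk may be shorter).
    static List<Long> compute(byte[] data, int bytesPerChecksum) {
        if (bytesPerChecksum <= 0) throw new IllegalArgumentException("bytesPerChecksum must be > 0");
        List<Long> sums = new ArrayList<>();
        for (int off = 0; off < data.length; off += bytesPerChecksum) {
            int len = Math.min(bytesPerChecksum, data.length - off);
            CRC32 crc = new CRC32();
            crc.update(data, off, len);
            sums.add(crc.getValue());
        }
        return sums;
    }

    // Returns the index of the first chunk whose checksum no longer matches,
    // or -1 if every chunk matches.
    static int firstCorruptChunk(byte[] data, List<Long> expected, int bytesPerChecksum) {
        List<Long> actual = compute(data, bytesPerChecksum);
        int n = Math.min(actual.size(), expected.size());
        for (int i = 0; i < n; i++) {
            if (!actual.get(i).equals(expected.get(i))) return i;
        }
        return actual.size() == expected.size() ? -1 : n;
    }
}
```

Per-chunk checksums mean a single flipped bit is traced to one 512-byte chunk rather than invalidating the whole file.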
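The HAR row's memory saving comes from packing: the NameNode keeps metadata for every file, so many small files are concatenated into one large part file plus an index recording where each original file starts and how long it is. The sketch below models only that idea. `SmallFileArchive` is a hypothetical class; real Hadoop Archives store their index and data in `_index`, `_masterindex`, and `part-*` files inside the `.har` directory, which this sketch does not reproduce.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the HAR idea: one part file plus an index.
final class SmallFileArchive {
    // Byte range of one archived file inside the part file.
    static final class Slot {
        final int offset;
        final int length;
        Slot(int offset, int length) { this.offset = offset; this.length = length; }
    }

    private final byte[] part;
    private final Map<String, Slot> index;

    private SmallFileArchive(byte[] part, Map<String, Slot> index) {
        this.part = part;
        this.index = index;
    }

    // Concatenates all files into one byte array and records each file's range.
    static SmallFileArchive pack(Map<String, byte[]> files) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Map<String, Slot> index = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> f : files.entrySet()) {
            byte[] data = f.getValue();
            index.put(f.getKey(), new Slot(out.size(), data.length));
            out.write(data, 0, data.length);
        }
        return new SmallFileArchive(out.toByteArray(), index);
    }

    // Reads one original file back by slicing the part file via the index.
    byte[] read(String path) {
        Slot s = index.get(path);
        if (s == null) throw new IllegalArgumentException("not in archive: " + path);
        return Arrays.copyOfRange(part, s.offset, s.offset + s.length);
    }

    int fileCount() { return index.size(); }
    int partSize() { return part.length; }
}
```

Here two small files end up as a single part file of 11 bytes, so the storage layer would track one data file instead of two while reads still return each original file intact.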
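The block-based S3 row describes a simple layout trick: a large file is cut into fixed-size blocks, each stored as its own object, so no single object has to exceed the store's size limit. The sketch below only computes such a block plan. `BlockLayout` and the `block_<fileId>_<n>` key format are hypothetical illustrations, not the naming the old s3 filesystem used, and sizes are in GiB for round numbers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of splitting one file into block-sized objects.
final class BlockLayout {
    private BlockLayout() {}

    // One stored block: the (hypothetical) object key it would live under,
    // plus its byte range within the original file.
    static final class Block {
        final String key;
        final long offset;
        final long length;

        Block(String key, long offset, long length) {
            this.key = key;
            this.offset = offset;
            this.length = length;
        }
    }

    // Splits a file of fileSize bytes into blocks of at most blockSize bytes;
    // the last block holds whatever remains.
    static List<Block> plan(String fileId, long fileSize, long blockSize) {
        if (blockSize <= 0) throw new IllegalArgumentException("blockSize must be > 0");
        List<Block> blocks = new ArrayList<>();
        long offset = 0;
        int index = 0;
        while (offset < fileSize) {
            long length = Math.min(blockSize, fileSize - offset);
            blocks.add(new Block("block_" + fileId + "_" + index, offset, length));
            offset += length;
            index++;
        }
        return blocks;
    }
}
```

For example, a 12 GiB file with 5 GiB blocks becomes three objects of 5, 5, and 2 GiB, none of which exceeds the limit.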

Choosing a Filesystem

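  • Hadoop chooses the appropriate filesystem implementation from the URI scheme of the path. A path without a scheme is resolved against the default filesystem configured by fs.defaultFS. Examples:

      hdfs://namenode:9000/path/to/file
      file:///local/path/to/file
      s3n://bucket-name/path/to/file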

  • While Hadoop can work with any filesystem implementation, distributed filesystems with data locality (like HDFS and KFS) are generally preferred for big data processing, as they minimize network overhead and improve performance.
  • HDFS remains the default choice for most Hadoop deployments due to its tight integration with the Hadoop ecosystem, replication strategy, and proven scalability.
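The scheme-based selection described in the first point can be modeled in a few lines. In Hadoop this happens inside `FileSystem.get(URI, Configuration)`, which can consult properties of the form `fs.<scheme>.impl` and falls back to `fs.defaultFS` for scheme-less paths. The sketch below is a rough model under those assumptions: `SchemeResolver` is a hypothetical class, the "configuration" is a plain map, and it returns implementation class names rather than live filesystem objects.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of mapping a path's URI scheme to an implementation.
final class SchemeResolver {
    // Plain map standing in for a Hadoop Configuration object.
    private final Map<String, String> conf = new HashMap<>();

    SchemeResolver(String defaultFs) {
        conf.put("fs.defaultFS", defaultFs);
        conf.put("fs.file.impl", "org.apache.hadoop.fs.LocalFileSystem");
        conf.put("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
        conf.put("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem");
        conf.put("fs.ftp.impl", "org.apache.hadoop.fs.ftp.FTPFileSystem");
    }

    // Returns the implementation class name that would serve the given path.
    String resolve(String path) {
        String scheme = URI.create(path).getScheme();
        if (scheme == null) {
            // Scheme-less paths are resolved against the default filesystem.
            scheme = URI.create(conf.get("fs.defaultFS")).getScheme();
        }
        String impl = conf.get("fs." + scheme + ".impl");
        if (impl == null) {
            throw new IllegalArgumentException("no implementation configured for scheme: " + scheme);
        }
        return impl;
    }
}
```

With `hdfs://namenode:9000` as the default, `/user/data/input.txt` resolves to `DistributedFileSystem`, exactly like the explicit `hdfs://` example, while `file:///...` paths go to `LocalFileSystem`.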

Advantages of Supporting Multiple Filesystems

  • Flexibility: Developers can integrate Hadoop with cloud storage (Amazon S3), legacy systems (FTP), or on-premise storage.
  • Interoperability: Tools like distcp make it easy to copy and share data across different clusters or storage systems.
  • Cost Optimization: Organizations can combine local storage, cloud services, and distributed filesystems for efficient cost management.
  • Scalability: Distributed filesystems like HDFS and KFS scale horizontally: storage capacity and throughput grow as more nodes are added.
  • Security & Efficiency: Options like HSFTP (HTTPS-based) protect data in transit, while HAR archives reduce NameNode memory usage.