The Titanic struck an iceberg on April 14, 1912, and sank early the next morning, killing more than 1,500 people aboard the 46,000-ton ship. In this project, we'll analyze the Titanic dataset using Hadoop MapReduce to find the average age of the male and female passengers who died in the disaster.
Problem Statement
Using the Titanic dataset, write a MapReduce program in Java that calculates the average age of the male and female passengers who did not survive the Titanic disaster.
Dataset Overview
You can download the Titanic dataset from this Link. It consists of 12 columns, and each row describes one passenger. The program in this article reads three of them: column 2 (Survived, where 0 means the passenger died), column 5 (gender), and column 6 (age).

Step-by-Step Implementation
Step 1: View Sample Records
Here are the first 10 records of the dataset:

This data will be processed to extract the gender and age of only those passengers who didn't survive.
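The filtering described above can be tried on a single line before any Hadoop setup. The following is a minimal plain-Java sketch, assuming the same ", " delimiter and column positions that the mapper in Step 3 uses. The class name DeceasedRecordFilter and the sample rows are hypothetical and are not taken from the dataset.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of the Hadoop job: it applies the same
// checks as the mapper to one line of text.
class DeceasedRecordFilter {

    // Returns (gender, age) when the passenger died and the age is a whole number.
    static Optional<Map.Entry<String, Integer>> extract(String line) {
        String[] fields = line.split(", ");       // assumed delimiter: comma plus space
        if (fields.length <= 6) {
            return Optional.empty();              // too few columns, same length check as the mapper
        }
        boolean died = fields[1].equals("0");     // column 2: Survived (0 = died)
        boolean wholeNumberAge = fields[5].matches("\\d+"); // column 6: Age
        if (!died || !wholeNumberAge) {
            return Optional.empty();              // survivor, or age missing / not an integer
        }
        return Optional.of(new SimpleEntry<>(fields[4], Integer.parseInt(fields[5])));
    }

    public static void main(String[] args) {
        // Made-up rows for illustration only.
        String[] samples = {
            "1, 0, 3, Passenger A, male, 22, 1, 0, T1, 7, C1, S",   // died, age present
            "2, 1, 1, Passenger B, female, 38, 1, 0, T2, 71, C2, C", // survived
            "3, 0, 3, Passenger C, female, , 0, 0, T3, 8, C3, Q"     // died, age missing
        };
        for (String row : samples) {
            System.out.println(extract(row));
        }
    }
}
```

Only the first sample row yields a (gender, age) pair. Rows for survivors and rows without a whole-number age are skipped, so they never reach the averaging step.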
Step 2: Create Eclipse Project
Create the project in Eclipse with the steps below:
Open Eclipse, then select File -> New -> Java Project. Name it Titanic_Data_Analysis, select Use an execution environment, choose JavaSE-1.8, then click Next -> Finish.

Now create a new class:
Right-click on src -> New -> Class, name it Average_age, then click Finish.

Step 3: Java Code for MapReduce
Write the code below into Average_age.java:
// Required imports for Hadoop MapReduce
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
// Main class
public class Average_age {

    // Mapper class
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private Text gender = new Text(); // To store gender (Male/Female)
        private IntWritable age = new IntWritable(); // To store age value

        // Map method: runs for each line of input
        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString(); // Convert line to String
            String[] str = line.split(", "); // Split line into fields using comma and space
            // Proceed only if sufficient columns are present
            if (str.length > 6) {
                gender.set(str[4]); // 5th column = Gender
                if (str[1].equals("0")) { // 2nd column = Survived (0 = Died)
                    if (str[5].matches("\\d+")) { // 6th column = Age (only if a whole number)
                        int i = Integer.parseInt(str[5]); // Convert age to integer
                        age.set(i); // Set age as IntWritable
                        context.write(gender, age); // Emit (Gender, Age)
                    }
                }
            }
        }
    }

    // Reducer class
    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

        // Reduce method: receives all ages grouped by gender
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0; // Total age sum
            int count = 0; // Count of records
            // Loop through all age values
            for (IntWritable val : values) {
                sum += val.get(); // Add age to sum
                count++; // Increment count
            }
            int avg = sum / count; // Integer division: the average is truncated to a whole number
            context.write(key, new IntWritable(avg)); // Emit (Gender, Average Age)
        }
    }

    // Driver/main method
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // Create configuration
        // Job.getInstance replaces the deprecated Job constructor
        Job job = Job.getInstance(conf, "Averageage_survived"); // Define job name
        job.setJarByClass(Average_age.class); // Set main class
        // Set output types for Mapper
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // Set output types for Reducer
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Set Mapper and Reducer classes
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        // Set input and output formats
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        // Set input and output file paths from command line arguments
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Optional: delete output path if it already exists
        Path out = new Path(args[1]);
        out.getFileSystem(conf).delete(out, true);
        // Run the job, wait for completion, and report success or failure via the exit code
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
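To see what the reducer's arithmetic does without running a cluster, the grouping and averaging can be imitated with plain Java collections. This is only a sketch: the class AverageAgeSketch, its method names, and the four sample pairs are hypothetical, and the TreeMap merely stands in for the framework's grouping of values by key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free imitation of the shuffle and reduce steps.
class AverageAgeSketch {

    // Groups (gender, age) pairs by gender, with keys in sorted order.
    static Map<String, List<Integer>> groupByGender(String[][] pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String[] pair : pairs) {
            grouped.computeIfAbsent(pair[0], k -> new ArrayList<>())
                   .add(Integer.parseInt(pair[1]));
        }
        return grouped;
    }

    // Same arithmetic as the reducer: integer division drops the fraction.
    static int truncatedAverage(List<Integer> ages) {
        int sum = 0;
        for (int age : ages) {
            sum += age;
        }
        return sum / ages.size();
    }

    // Alternative that keeps the fractional part.
    static double exactAverage(List<Integer> ages) {
        int sum = 0;
        for (int age : ages) {
            sum += age;
        }
        return (double) sum / ages.size();
    }

    public static void main(String[] args) {
        // Made-up mapper output, for illustration only.
        String[][] mapperOutput = {
            {"male", "22"}, {"female", "30"}, {"male", "35"}, {"female", "26"}
        };
        for (Map.Entry<String, List<Integer>> e : groupByGender(mapperOutput).entrySet()) {
            System.out.println(e.getKey() + "\t" + truncatedAverage(e.getValue())
                    + "\t" + exactAverage(e.getValue()));
        }
        // Prints:
        // female	28	28.0
        // male	28	28.5
    }
}
```

For the sample male ages 22 and 35, integer division gives 28 while the exact mean is 28.5. If fractional averages matter, one option is to have the reducer emit a DoubleWritable instead of an IntWritable, which also means changing the reducer's output value type and the job's setOutputValueClass call.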
Step 4: Add Required Hadoop JARs
Next, add the external JARs for the packages we imported. Download the Hadoop Common and Hadoop MapReduce Core JARs that match your Hadoop version.
Check your Hadoop version with the command below:
hadoop version

Now add these external JARs to the Titanic_Data_Analysis project.
Right-click on Titanic_Data_Analysis -> Build Path -> Configure Build Path, select Add External JARs, add the JARs from their download location, then click Apply and Close.

Step 5: Export the Project as JAR
Now export the project as a JAR file. Right-click on Titanic_Data_Analysis and choose Export, then go to Java -> JAR file, click Next, choose your export destination, and click Next again.
Choose Average_age as the Main Class by clicking Browse, then click Finish -> OK.

Step 6: Start Hadoop Daemons
Start the HDFS and YARN daemons:
start-dfs.sh
start-yarn.sh
Check if daemons are running:
jps

Step 7: Upload Dataset to HDFS
Use this command to upload the Titanic dataset to HDFS:
hdfs dfs -put /home/user/Documents/titanic_data.txt /
Check that the file was uploaded:
hdfs dfs -ls /

Step 8: Run the JAR File
Now run the exported .jar file on Hadoop:
hadoop jar /home/user/Documents/Average_age.jar /titanic_data.txt /Titanic_Output
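The two paths in this command reach the driver as args[0] (input) and args[1] (output). Since the driver deletes the output path before the job starts, a small guard on the arguments can prevent accidents. The sketch below is an optional addition, not part of the program above; the class DriverArgsCheck and its validate method are hypothetical names.

```java
// Hypothetical argument check that could be called at the top of the driver's main method.
class DriverArgsCheck {

    // Returns null when the arguments look usable, otherwise a message describing the problem.
    static String validate(String[] args) {
        if (args == null || args.length != 2) {
            return "Usage: hadoop jar Average_age.jar <input path> <output path>";
        }
        if (args[0].equals(args[1])) {
            // The driver deletes the output path recursively before running the job,
            // so pointing both arguments at the same path would remove the dataset.
            return "Input and output paths must be different";
        }
        return null;
    }

    public static void main(String[] args) {
        String problem = validate(args);
        if (problem != null) {
            System.err.println(problem);
            System.exit(2);
        }
        System.out.println("Input: " + args[0] + ", output: " + args[1]);
    }
}
```

Without such a check, running the JAR with fewer than two paths fails with an ArrayIndexOutOfBoundsException when the driver reads args[0] or args[1].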

Step 9: View the Output
After the MapReduce job completes, you can check the final results through the Hadoop web interface.
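Visit:
http://localhost:50070/
(50070 is the default NameNode web UI port in Hadoop 2.x; Hadoop 3.x uses 9870 by default.)
Then navigate to: Utilities -> Browse the file system -> /Titanic_Output/ -> part-r-00000.
Alternatively, print the result in the terminal: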
hdfs dfs -cat /Titanic_Output/part-r-00000

The output shows that, among the passengers in our dataset who died in the Titanic disaster, the average age is 28 for females and 30 for males. Both values are truncated to whole numbers because the reducer uses integer division.
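If the result needs to be consumed by another program, the contents of part-r-00000 are easy to parse, because TextOutputFormat writes one key and value per line separated by a tab. The following is a minimal sketch under that assumption. The class ReducerOutputParser is a hypothetical name, and the key labels in the sample string are assumed: they will match whatever gender strings the dataset actually contains.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for the text printed by: hdfs dfs -cat /Titanic_Output/part-r-00000
class ReducerOutputParser {

    // Turns "key<TAB>value" lines into a map, keeping the order of the lines.
    static Map<String, Integer> parse(String contents) {
        Map<String, Integer> averages = new LinkedHashMap<>();
        for (String line : contents.split("\n")) {
            String[] keyValue = line.split("\t", 2);
            if (keyValue.length < 2) {
                continue; // skip blank or malformed lines
            }
            averages.put(keyValue[0].trim(), Integer.parseInt(keyValue[1].trim()));
        }
        return averages;
    }

    public static void main(String[] args) {
        // Sample text using the averages reported above; the key labels are assumed.
        String sample = "female\t28\nmale\t30\n";
        System.out.println(parse(sample));
    }
}
```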