MapReduce Program - Finding the Average Age of Males and Females Who Died in the Titanic Disaster

The Titanic struck an iceberg on the night of April 14, 1912, and sank early the next morning. More than 1,500 people died when the 46,000-ton ship went down. In this project, we’ll analyze the Titanic dataset using Hadoop MapReduce to find the average age of the male and female passengers who died in the disaster.

Problem Statement

Using the Titanic dataset, write a MapReduce program in Java to calculate the average age of the male and female passengers who did not survive the Titanic disaster.

Dataset Overview

You can download the Titanic dataset from this link. The dataset has 12 columns, and each row describes one passenger. The program uses three of them: the 2nd column (Survived, where 0 means the passenger died), the 5th column (gender), and the 6th column (age).

Step-by-Step Implementation

Step 1: View Sample Records

Open the dataset and look at its first 10 records to see how each row is laid out.

The program processes this data and extracts the gender and age of only those passengers who didn’t survive.
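
As a minimal sketch of that extraction, the plain-Java class below applies the same per-record filter outside Hadoop. It assumes fields are separated by a comma and a space, with Survived in column 2, gender in column 5, and age in column 6. The class name `DeceasedRecordFilter`, the method `parse`, and the sample lines are made up for illustration and are not taken from the dataset.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.Optional;

// Local sketch of the per-record filter; no Hadoop classes are needed.
class DeceasedRecordFilter {

    // Returns (gender, age) for a passenger who died and has a whole-number age,
    // or an empty Optional when the record should be skipped.
    static Optional<Map.Entry<String, Integer>> parse(String line) {
        String[] fields = line.split(", ");      // assumed delimiter: comma + space
        if (fields.length <= 6) {
            return Optional.empty();             // too few columns
        }
        if (!fields[1].equals("0")) {
            return Optional.empty();             // passenger survived
        }
        if (!fields[5].matches("\\d+")) {
            return Optional.empty();             // age is missing or not a whole number
        }
        Map.Entry<String, Integer> entry =
                new SimpleEntry<>(fields[4], Integer.parseInt(fields[5]));
        return Optional.of(entry);
    }

    public static void main(String[] args) {
        // Hypothetical sample lines, invented for this sketch.
        String died = "1, 0, 3, Passenger A, male, 22, 1, 0, T-1, 7.25, C1, S";
        String survived = "2, 1, 1, Passenger B, female, 38, 1, 0, T-2, 71.28, C2, C";
        String fractional = "3, 0, 3, Passenger C, female, 28.5, 0, 0, T-3, 7.92, C3, S";

        System.out.println(parse(died).isPresent());       // true
        System.out.println(parse(survived).isPresent());   // false
        System.out.println(parse(fractional).isPresent()); // false
    }
}
```

Note that the whole-number check also drops records with a fractional age such as 28.5, so those passengers would not contribute to the average.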

Step 2: Create Eclipse Project

Create a project in Eclipse with the steps below:

Open Eclipse -> File -> New -> Java Project -> name it Titanic_Data_Analysis -> select Use an execution environment -> choose JavaSE-1.8 -> Next -> Finish.

Now create a new class:

Right-click on src -> New -> Class -> name it Average_age -> Finish.

Step 3: Java Code for MapReduce

Write the code below into Average_age.java:

Java

// Required imports for Hadoop MapReduce
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Main class
public class Average_age {

    // Mapper class
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {

        private Text gender = new Text();          // To store gender (Male/Female)
        private IntWritable age = new IntWritable();  // To store age value

        // Map method: runs for each line of input
        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

            String line = value.toString();        // Convert line to String
            String[] str = line.split(", ");       // Split line into fields using comma and space

            // Proceed only if sufficient columns are present
            if (str.length > 6) {
                gender.set(str[4]);                // 5th column = Gender

                if (str[1].equals("0")) {          // 2nd column = Survived (0 = Died)
                    if (str[5].matches("\\d+")) {  // 6th column = Age (whole numbers only; missing or fractional ages are skipped)
                        int i = Integer.parseInt(str[5]); // Convert age to integer
                        age.set(i);                // Set age as IntWritable
                        context.write(gender, age); // Emit (Gender, Age)
                    }
                }
            }
        }
    }

    // Reducer class
    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

        // Reduce method: receives all ages grouped by gender
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {

            int sum = 0;       // Total age sum
            int count = 0;     // Count of records

            // Loop through all age values
            for (IntWritable val : values) {
                sum += val.get();   // Add age to sum
                count++;            // Increment count
            }

            int avg = sum / count;  // Average age (integer division drops the fractional part)
            context.write(key, new IntWritable(avg)); // Emit (Gender, Average Age)
        }
    }

    // Driver/main method
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();                     // Create configuration
        Job job = Job.getInstance(conf, "Averageage_survived");      // Create the job and define its name

        job.setJarByClass(Average_age.class);                         // Set main class

        // Set output types for Mapper
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // Set output types for Reducer
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Set Mapper and Reducer classes
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);

        // Set input and output formats
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        // Set input and output file paths from command line arguments
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Delete the output path if it already exists so the job can be rerun (this removes any previous results)
        Path out = new Path(args[1]);
        out.getFileSystem(conf).delete(out, true);

        // Run the job, wait for completion, and exit with a non-zero code on failure
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
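
The reducer's arithmetic can be checked locally without Hadoop. The sketch below mirrors the sum-and-count loop and compares the truncated integer average with an exact one. The class name `AverageAgeSketch`, both method names, and the sample ages are hypothetical and chosen only for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Local sketch of the reducer's arithmetic; no Hadoop classes are needed.
class AverageAgeSketch {

    // Mirrors the reducer: integer division drops the fractional part.
    static int truncatedAverage(List<Integer> ages) {
        int sum = 0;
        for (int age : ages) {
            sum += age;
        }
        return sum / ages.size();
    }

    // Alternative that keeps the fraction by dividing as a double.
    static double exactAverage(List<Integer> ages) {
        long sum = 0;
        for (int age : ages) {
            sum += age;
        }
        return (double) sum / ages.size();
    }

    public static void main(String[] args) {
        List<Integer> ages = Arrays.asList(20, 25, 30, 40); // made-up ages, sum = 115
        System.out.println(truncatedAverage(ages)); // 28
        System.out.println(exactAverage(ages));     // 28.75
    }
}
```

Both methods assume a non-empty list, which matches the reducer: Hadoop calls reduce only for keys that have at least one value. If a fractional average is wanted from the job itself, one option would be to emit a DoubleWritable from the reducer and set the job's output value class to match.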

Step 4: Add Required Hadoop JARs

Next, add the external JARs for the packages we imported. Download the Hadoop Common and Hadoop MapReduce Core JARs that match your Hadoop version.

Check your Hadoop version with the command below:

hadoop version

Now add these external JARs to the Titanic_Data_Analysis project.

Right-click on Titanic_Data_Analysis -> Build Path -> Configure Build Path -> Add External JARs -> select the JARs from their download location -> Apply and Close.


Step 5: Export the Project as JAR

Now export the project as a JAR file. Right-click on Titanic_Data_Analysis -> Export -> Java -> JAR file -> Next, choose your export destination, and then click Next.

Click Browse to choose Average_age as the Main Class, and then click Finish -> OK.

Step 6: Start Hadoop Daemons

Start the HDFS and YARN daemons:

start-dfs.sh
start-yarn.sh

Check if daemons are running:

jps

Step 7: Upload Dataset to HDFS

Use this command to upload the Titanic dataset to the root directory of HDFS:

hdfs dfs -put /home/user/Documents/titanic_data.txt /

Check if uploaded:

hdfs dfs -ls /

Step 8: Run the JAR File

Now run the exported .jar file on Hadoop:

hadoop jar /home/user/Documents/Average_age.jar /titanic_data.txt /Titanic_Output

Step 9: View the Output

After the MapReduce job completes, you can check the final results through the Hadoop web interface.

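Visit the NameNode web UI (port 50070 is the default on Hadoop 2.x; Hadoop 3.x uses port 9870 by default):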

http://localhost:50070/

Then navigate to: Utilities -> Browse the file system -> /Titanic_Output/ -> part-r-00000.

Alternatively, print the result in the terminal:

hdfs dfs -cat /Titanic_Output/part-r-00000

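The output shows that, among the passengers in our dataset who died in the Titanic disaster, the average age of females is 28 and the average age of males is 30. Because the reducer uses integer division, both values are rounded down to whole years.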
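
If the result needs to be consumed by another program, the lines of part-r-00000 can be parsed directly. By default, TextOutputFormat writes each record as the key, a tab character, and the value. The sketch below assumes that default separator; the class name `OutputLineParser` is hypothetical, and the keys `female` and `male` in the sample string are assumptions about how gender is spelled in the dataset.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: turn the text of a part-r-00000 file into a gender -> average age map.
class OutputLineParser {

    static Map<String, Integer> parse(String contents) {
        Map<String, Integer> averages = new LinkedHashMap<>();
        for (String line : contents.split("\n")) {
            if (line.trim().isEmpty()) {
                continue;                          // skip blank lines
            }
            String[] parts = line.split("\t");     // default key/value separator is a tab
            if (parts.length != 2) {
                throw new IllegalArgumentException("Unexpected line: " + line);
            }
            averages.put(parts[0], Integer.parseInt(parts[1].trim()));
        }
        return averages;
    }

    public static void main(String[] args) {
        // Sample contents built from the averages reported above; key spelling is assumed.
        String sample = "female\t28\nmale\t30\n";
        System.out.println(parse(sample)); // {female=28, male=30}
    }
}
```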