    Exploring Patterns of Survival from the Titanic Dataset


    Introduction

    The sinking of the Titanic was a major historical event that shaped how we view human survival during disasters. More than a century later, this tragedy still offers valuable insights and lessons.

    The RMS Titanic was one of the largest and most luxurious ships of its time. It was nicknamed “The Unsinkable” by its proud makers. On April 10th, 1912, it set out on its first journey from England to New York. The Titanic carried people of all classes, the wealthy and the poor, and was commanded by Captain Edward John Smith.

    During the voyage, the Titanic received multiple warnings of ice on the Atlantic, which made it change its course twice. But on the 4th day of its voyage, April 14th, it collided with a huge iceberg, and the slow sinking of this luxurious ship began. The ship sent radio signals to nearby ships for help, but only one of them came to the rescue.

    The captain ordered the passengers to be evacuated. According to the protocol, women and children were to be evacuated first using the lifeboats available on the ship. But as we will see in our explorations, it did not quite happen that way. Other factors also played a role in determining the survival of the passengers aboard. Some groups of people were more likely to survive than others, and this is what we will explore in this article.

    The sinking of this “Unsinkable” ship caused the death of 1,502 of the 2,224 passengers and crew on board.

    The Project

    The Titanic dataset is very beginner-friendly, which is why it is widely used as a starting point in data science learning. Not only does it provide interesting patterns for data analytics, but it also combines historical context with real human decision-making under crisis conditions.

    In this article, we will do an exploratory data analysis of the Titanic dataset. We will see what the data looks like, which attributes are at play, and how these attributes affected the survival of the passengers. This is a beginner-friendly tutorial that requires a basic understanding of Python fundamentals, importing libraries, and using their functions for data analysis. By combining data storytelling and pattern recognition, it adds to previous articles and projects on the dataset with insights into how social inequality, evacuation behavior, and family structure influenced survival outcomes.

    The Dataset

    In this tutorial, we will access the Titanic dataset and use Python pandas, matplotlib, and seaborn to explore how different factors played a role in the survival of the passengers. Let us download and load the data so that it is accessible in our code.

    You can get the dataset from GitHub; the raw CSV URL is given in the loading step below.

    Loading the Dataset

    Once you have the data URL, you can load it as a pandas dataframe. We will have to install and import pandas for this. Pandas is a powerful Python library for data analysis and manipulation. If it is not already installed in your Python environment, install it from the terminal through pip as follows:

    pip install pandas

    Once the installation is complete, import the library in your Python file by aliasing it as pd:

    import pandas as pd

    Next, read the data using the pandas read_csv function. Make sure you add the URL as follows:

    url = "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"
    
    df = pd.read_csv(url)

    This will load the file as a pandas dataframe in the variable “df”. We will do the data analysis and exploration using this dataframe. Let us look at the data using the head() function, which returns the first 5 rows of the dataframe by default:

    print(df.head())
    df.head() (Image by Author)

    We can also use the pandas iloc[0] indexer to view the first row as a Series, which lists every column name/attribute next to its value for that passenger:

    print(df.iloc[0])
    df.iloc[0] (Image by Author)

    Here we can see the first 5 rows of the dataset, along with the column names. As can be seen in the images above, the dataset has the following attributes:

    1. PassengerId: the ID of the passenger, a numerical value that identifies each passenger
    2. Survived: whether the passenger survived the shipwreck (1) or not (0)
    3. Pclass: the ticket class of the passenger (1 = first, 2 = second, 3 = third)
    4. Name: the name of the passenger, including their title
    5. Sex: the gender of the passenger
    6. Age: the age of the passenger in years
    7. SibSp: the number of siblings or spouses on board
    8. Parch: the number of parents or children on board
    9. Ticket: the ticket number of the passenger
    10. Fare: the ticket price
    11. Cabin: the cabin number of the passenger
    12. Embarked: the port where the passenger embarked (C = Cherbourg, Q = Queenstown, S = Southampton)
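Since iloc[0] returns a row rather than a list of columns, a more direct way to inspect the attributes is df.columns, and isna().sum() shows how many values are missing in each column. The sketch below is only an illustration: it uses a tiny hand-made frame called sample (made-up rows that mimic a few Titanic columns, not real data) so that it runs without downloading anything.

```python
import pandas as pd

# Hypothetical two-row frame that mimics a few Titanic columns (not real data)
sample = pd.DataFrame({
    "PassengerId": [1, 2],
    "Survived": [0, 1],
    "Sex": ["male", "female"],
    "Age": [22.0, None],
})

column_names = list(sample.columns)        # column labels only
first_row = sample.iloc[0]                 # first row as a Series indexed by column name
missing_per_column = sample.isna().sum()   # count of missing values per column

print(column_names)
print(first_row)
print(missing_per_column)
```

On the real dataframe, the same three calls can be applied to df directly.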

    As can be seen above, only some of these columns are of interest to us in determining whether a person survived the Titanic or not. Attributes such as name and ticket number do not seem to influence the survival of passengers. To get a clear view of this, let us do some data analysis to find out how the different attributes relate to survival, both individually and in combination.

    Data Analysis

    Before we formally start the data analysis, let us install/import the relevant Python libraries.

    The first one is Matplotlib. This library offers visualization features for data. We will plot graphs using this library. The second one is Seaborn. Seaborn is a Python data visualization library based on matplotlib, and allows us to create visuals, plots, and figures based on the data. Let us install and import these into our Python file.

    pip install matplotlib
    pip install seaborn

    Now import these with alias names just as we did with the pandas library into the main coding file.

    import seaborn as sns
    import matplotlib.pyplot as plt

    Now, let us see how different attributes affected survival:

    Describing the Dataset

    First, let us get a general overview of the data. We will use the describe() function for this. We have also added pd.set_option so that pandas prints every column instead of truncating the output.

    pd.set_option('display.max_columns', None)
    print(df.describe())
    describe() function (Image by Author)

    As we can see in the image above, the describe() function gives a statistical summary of the numeric columns of the dataset using metrics like count, mean, standard deviation, etc. The useful information here is:

    • There are a total of 891 passenger entries (from count = 891)
    • The survival rate is 38% (from the mean of Survived = 0.38)
    • Passengers were concentrated in the lower classes (mean of Pclass = 2.3, and the median of 3 shows that at least half travelled in 3rd class)
    • Some of the passengers’ age data is missing (the count of Age is lower than the number of entries)
    • Most of the passengers were young (the mean age = 29.7)
    • The youngest passenger was 0.4 years old (less than 6 months), and the oldest was 80 years old
    • The average ticket price was around £32.20 (mean fare)
    • Ticket price varied enormously (high standard deviation for the fare = 49.69)
    • Fares ranged from 0 for some passengers to as high as £512 for others, which points to massive economic inequality
    • Age quartiles: 25% were younger than about 20, half were younger than 28, and 75% were younger than 38

    Now that we have these general insights into the data, let us dive into a more detailed analysis.

    Survival Facts

    First, let us do some general survival analysis:

    survival_counts = df['Survived'].value_counts()
    print(survival_counts)
    Survival Facts (Image by Author)
    plt.figure(figsize=(6,4))
    
    sns.countplot(
        x='Survived',
        data=df
    )
    
    plt.title("Titanic Survival Distribution")
    
    plt.show()
    Titanic Survival Distribution (Image by Author)

    We counted the values of the Survived attribute and found 549 passengers with a value of 0 (did not survive) and 342 with a value of 1 (survived). This matches the 38% survival rate we obtained earlier from the describe() function. Now, let us move on to the factors that affected survival.
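The link between these counts and the 38% figure is plain arithmetic: because Survived is coded as 0 or 1, its mean is the number of survivors divided by the total number of passengers. A minimal check using only the two counts reported above:

```python
# Counts taken from the value_counts() output described in the article
did_not_survive = 549
survived = 342

total = did_not_survive + survived
survival_rate = survived / total

print(total)                    # 891
print(round(survival_rate, 4))  # 0.3838
```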

    Survival by Gender

    Let us see how this survival rate was influenced by gender. Did one gender have an edge in survival over the other? We know the priorities were women and children, but what exactly does the data show?

    
    gender_survival = pd.crosstab(
        df['Sex'],
        df['Survived'],
        normalize='index'
    )
    
    print(gender_survival)
    
    
    plt.figure(figsize=(6,4))
    
    sns.barplot(
        x='Sex',
        y='Survived',
        data=df
    )
    
    plt.title("Survival Rate by Gender")
    
    plt.ylabel("Survival Rate")
    
    plt.show()
    
    Survival Rate by Gender (Image by Author)

    As can be seen from both the report and the plot above, the men’s survival rate was only about 19%, whereas as many as 74% of the women survived the shipwreck.
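The crosstab report and the bar plot both rest on the same quantity: the mean of the 0/1 Survived column within each group. The sketch below shows this on a small made-up frame (toy is hypothetical data, not the Titanic rows), where groupby().mean() and the row-normalized crosstab give the same rates.

```python
import pandas as pd

# Hypothetical toy data (not the real Titanic rows)
toy = pd.DataFrame({
    "Sex": ["male", "male", "male", "male", "female", "female"],
    "Survived": [0, 0, 0, 1, 1, 0],
})

# Mean of a 0/1 column per group = survival rate per group
rate_by_sex = toy.groupby("Sex")["Survived"].mean()

# Same information from a row-normalized crosstab: column 1 holds the survival share
table = pd.crosstab(toy["Sex"], toy["Survived"], normalize="index")

print(rate_by_sex)
print(table)
```

On the real data, df.groupby('Sex')['Survived'].mean() would be the equivalent one-liner.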

    Survival by Passenger Class

    Now, let us analyse how passengers from different classes survived the incident.

    class_survival = pd.crosstab(
        df['Pclass'],
        df['Survived'],
        normalize='index'
    )
    print(class_survival)
    
    plt.figure(figsize=(7,5))
    
    sns.barplot(
        x='Pclass',
        y='Survived',
        data=df
    )
    
    plt.title("Survival Rate by Passenger Class")
    
    plt.xlabel("Passenger Class")
    
    plt.ylabel("Survival Rate")
    
    plt.show()
    Survival by Passenger Class (Image by Author)

    As can be seen from the report and plot above, about 63% of passengers from the 1st class survived, 47% from the 2nd class, and only 24% from the 3rd class. We can infer from this very basic plot that first-class passengers, who paid heavily for the ship’s luxuries, had a much higher chance of survival than passengers in the other two classes.

    Survival by Age

    Let us see how passengers of different ages survived. Did children have a higher chance of survival?

    plt.figure(figsize=(10,6))
    
    sns.histplot(
        data=df,
        x='Age',
        hue='Survived',
        bins=30,
        multiple='stack',
        alpha=0.6
    )
    
    plt.title("Age Distribution by Survival")
    
    plt.show()
    Age Distribution by Survival (Image by Author)

    From this stacked histogram, we can draw several meaningful insights about how age is related to survival on the Titanic.

    • Most passengers on board were young adults between 20 and 30 years old
    • Children under 10 show a higher share of survivors: the orange (survived) part of their bars is larger than the blue (did not survive) part
    • Adult non-survivors dominate the dataset, with the non-survivor parts of the bars between 20 and 40 being the largest
    • Survival declines in the older age groups; this may be due to elderly passengers facing age-related challenges during evacuation
    • The non-survivor portions of the bars dominate most age ranges, implying that more passengers died than survived overall, which aligns with the overall survival rate of approximately 38%

    To summarize, survival on the Titanic favored children, while young adults accounted for the largest number of deaths.
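A stacked histogram shows counts, so the survival rate per age range has to be judged by eye. One way to put numbers on it is to bin the ages with pd.cut and take the mean of Survived per bin. The sketch below is an illustration on made-up rows; the bin edges and the labels child, adult and senior are assumptions chosen for the example, not values from the article.

```python
import pandas as pd

# Hypothetical toy data (not the real Titanic rows); one age is missing
toy = pd.DataFrame({
    "Age": [4, 8, 25, 30, 35, 70, None],
    "Survived": [1, 1, 0, 1, 0, 0, 1],
})

# Assumed bin edges and labels, for illustration only
bins = [0, 16, 60, 120]
labels = ["child", "adult", "senior"]

# right=False makes the bins [0, 16), [16, 60), [60, 120)
toy["AgeGroup"] = pd.cut(toy["Age"], bins=bins, labels=labels, right=False)

# Rows with a missing Age get a missing AgeGroup and drop out of the groupby
rate_by_age_group = toy.groupby("AgeGroup", observed=True)["Survived"].mean()

print(rate_by_age_group)
```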

    Children Priority

    Were the children actually prioritized? Let us answer that with some analytics:

    df['IsChild'] = df['Age'] < 16
    child_survival = pd.crosstab(
        df['IsChild'],
        df['Survived'],
        normalize='index'
    )
    
    print(child_survival)
    
    sns.barplot(
        x='IsChild',
        y='Survived',
        data=df
    )
    
    plt.title("Child vs Adult Survival")
    plt.show()
    Child vs Adult Survival (Image by Author)
    Child Priority (Image by Author)

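    As can be seen from the above, around 59% of the children (passengers under 16) survived, well above the overall survival rate of 38%, which suggests that children were in fact prioritized. Note that passengers with a missing Age evaluate to False in the comparison, so they are counted in the non-child group.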
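The comparison df['Age'] < 16 returns False wherever Age is missing, so passengers of unknown age end up in the adult group. The sketch below illustrates the effect on made-up rows (toy is hypothetical data) and shows one possible alternative, restricting the comparison to rows with a known age. Whether to exclude those rows is a judgment call; this is a sketch, not the article's method.

```python
import pandas as pd

# Hypothetical toy data (not the real Titanic rows); two ages are missing
toy = pd.DataFrame({
    "Age": [5, 10, 40, 50, None, None],
    "Survived": [1, 0, 0, 0, 1, 1],
})

# NaN < 16 evaluates to False, so unknown ages fall into the non-child group
toy["IsChild"] = toy["Age"] < 16

child_rate = toy.loc[toy["IsChild"], "Survived"].mean()
other_rate_all_rows = toy.loc[~toy["IsChild"], "Survived"].mean()

# Alternative: keep only rows where Age is known before comparing the groups
known_age = toy[toy["Age"].notna()]
other_rate_known_age = known_age.loc[~known_age["IsChild"], "Survived"].mean()

print(child_rate, other_rate_all_rows, other_rate_known_age)
```

In this toy example, the non-child rate drops from 0.5 to 0.0 once the rows with unknown age are left out, which shows how much the grouping of missing values can matter.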

    Now let us analyse how family size impacted survival.

    Family Size Analysis

    The family size attribute is derived from two attributes of the dataset: SibSp and Parch. SibSp is the number of siblings and spouses of the passenger on board, whereas Parch is the number of parents and children of the passenger on board.

    Let us see how the family size affected survival:

    df['FamilySize'] = (
        df['SibSp'] + df['Parch'] + 1
    )
    plt.figure(figsize=(10,6))
    
    sns.barplot(
        x='FamilySize',
        y='Survived',
        data=df
    )
    
    plt.title("Survival Rate by Family Size")
    plt.show()
    Survival Rate by Family Size (Image by Author)

    The plot above shows how survival probability changed with the number of family members traveling together on the Titanic. The code is simple: it adds the number of siblings/spouses and parents/children, plus one for the passenger themselves, to get the family size. The y-axis of the plot represents the survival probability, so each bar shows the share of passengers with a particular family size who survived. We can see from the bar chart above that:

    • Passengers traveling alone had lower survival, probably because they had less social support, no assistance during evacuation, or lower priority compared to families
    • Small families with family sizes of 2, 3, and 4 had the highest survival rates, which may be because their members helped each other during evacuation, stayed coordinated, and received priority in lifeboat boarding
    • Very large families with a family size greater than 6 had lower survival rates, probably due to difficulty in coordinating evacuation and families refusing to separate on lifeboats

    As we can see, survival was not linearly related to the family size, but a moderately sized family had a higher survival rate.
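To compare the three situations described above with numbers rather than bars, the family sizes can be grouped into categories. The sketch below uses made-up rows, and the cut-offs (1 = alone, 2 to 4 = small, 5 or more = large) and the column name FamilyType are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy data (not the real Titanic rows)
toy = pd.DataFrame({
    "SibSp": [0, 1, 1, 4, 0, 3],
    "Parch": [0, 0, 2, 2, 0, 2],
    "Survived": [0, 1, 1, 0, 1, 0],
})

# Same definition as in the article: relatives on board plus the passenger
toy["FamilySize"] = toy["SibSp"] + toy["Parch"] + 1

# Assumed cut-offs: (0, 1] alone, (1, 4] small, (4, 20] large
toy["FamilyType"] = pd.cut(
    toy["FamilySize"],
    bins=[0, 1, 4, 20],
    labels=["alone", "small", "large"],
)

rate_by_family_type = toy.groupby("FamilyType", observed=True)["Survived"].mean()

print(rate_by_family_type)
```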

    Survival by Fare Paid

    Lastly, let us see how the ticket price affected survival. We can analyse this using a violin plot as below:

    plt.figure(figsize=(12,6))
    
    sns.violinplot(
        data=df,
        x='Survived',
        y='Fare',
        inner='quartile'
    )
    
    plt.xticks(
        [0,1],
        ['Did Not Survive', 'Survived']
    )
    
    plt.title(
        "Ticket Fare Distribution by Survival"
    )
    
    plt.ylabel("Fare Paid")
    
    plt.show()
    Ticket Fare Distribution (Image by Author)

    The violin plot shows a clear relationship between ticket fare and survival on the Titanic. Survivors generally paid higher fares, while most non-survivors were concentrated in lower fare ranges. This suggests that first-class and wealthier passengers had a significant survival advantage, likely due to better cabin locations and easier access to lifeboats. However, the overlap between the two groups also indicates that wealth alone did not determine survival, as factors like gender, age, and evacuation timing also played important roles.
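A violin plot is read by eye, so it can help to print the median and mean fare of each group alongside it. The sketch below does this on made-up fares (toy is hypothetical data, not the Titanic rows) using groupby with agg.

```python
import pandas as pd

# Hypothetical toy data (not the real Titanic rows)
toy = pd.DataFrame({
    "Survived": [0, 0, 0, 1, 1, 1],
    "Fare": [7.0, 8.0, 30.0, 10.0, 50.0, 80.0],
})

# Median and mean fare for non-survivors (0) and survivors (1)
fare_summary = toy.groupby("Survived")["Fare"].agg(["median", "mean"])

print(fare_summary)
```

The median is the more robust of the two figures here, since a handful of very expensive tickets can pull the mean upward.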

    Concluding the Findings

    We now know that certain factors, such as being female, being a child, belonging to the first class, and having a moderate family size, played a role in a passenger’s survival. Let us combine these features to determine the survival rate.

    
    # CREATE FEATURES
    
    # Child column
    df['IsChild'] = df['Age'] < 16
    
    # Family size column
    df['FamilySize'] = (
        df['SibSp'] + df['Parch'] + 1
    )
    
    # Moderate family size
    df['ModerateFamily'] = (
        (df['FamilySize'] >= 2) &
        (df['FamilySize'] <= 4)
    )
    
    # Combine the favorable conditions:
    # (female AND 1st class AND moderate family) OR child
    combined_condition = (
        (df['Sex'] == 'female') &
        (df['Pclass'] == 1) &
        df['ModerateFamily']
    ) | df['IsChild']
    
    # Create a new category column
    df['HighSurvivalGroup'] = combined_condition
    
    
    # PLOT SURVIVAL RATE
    
    plt.figure(figsize=(8,5))
    
    sns.barplot(
        data=df,
        x='HighSurvivalGroup',
        y='Survived'
    )
    
    plt.xticks(
        [0,1],
        ['Other Passengers', 'High Survival Group']
    )
    
    plt.ylabel("Survival Rate")
    
    plt.title(
        "Survival Rate Based on Combined Passenger Factors"
    )
    
    plt.show()
    Survival Rate based on Combined Preferred Factors

    The code above flags passengers who were either children (under 16) or first-class women traveling in a family of two to four, and compares them with everyone else. As can be seen from the graph, the “High Survival Group” had a dramatically higher survival rate.
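A single combined flag hides how much each factor contributes. As a complementary view, a pivot table can show the survival rate for every combination of two factors at once. The sketch below uses made-up rows (toy is hypothetical data), and the choice of Sex and Pclass as the two factors is an assumption for illustration.

```python
import pandas as pd

# Hypothetical toy data (not the real Titanic rows)
toy = pd.DataFrame({
    "Sex": ["female", "female", "female", "male", "male", "male", "male", "female"],
    "Pclass": [1, 1, 3, 1, 3, 3, 3, 3],
    "Survived": [1, 1, 0, 0, 0, 0, 1, 1],
})

# Rows are Sex, columns are Pclass, each cell is the mean of Survived
rate_table = toy.pivot_table(
    index="Sex",
    columns="Pclass",
    values="Survived",
    aggfunc="mean",
)

print(rate_table)
```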

    Conclusion

    In this article, we analyzed the Titanic dataset using pandas, matplotlib, and seaborn. This is an easy, beginner-friendly way to learn how to interpret data, plot graphs, and gather insights from them. From the findings above, we can group certain features as being favourable to survival. Moreover, these findings can also help us build a machine learning model that predicts the survival of the Titanic passengers.
