So, you want to get into the airplane business? (Project Summary)

Image Source: AirPartner

The company we work for wants to expand its operations and is looking to acquire airplanes for both commercial and private enterprise purposes. Because the company lacks experience in this area, its Head of Aviation has tasked us with analyzing data on aircraft accidents and incidents and recommending, based on the insights gained, which airplanes would be the best to invest in.

In essence, we need to answer one crucial question: which airplanes carry the lowest risk? In other words, what are the characteristics of the lowest-risk airplanes?

Data Understanding

The dataset we will be using in our investigation is the Aviation Accident Database & Synopses, up to 2023, which can be found on kaggle.com. This database contains information on more than 88,000 aircraft accidents and incidents dating back to the 1960s, and each entry holds data across more than 30 categories.

This data is limited: it covers only accidents and incidents, and so can be misleading. For example, it would be wrong to conclude that an airplane type’s over-representation in this dataset indicates that it is unsafe or high-risk; a more plausible explanation is that the model is simply more widely used. Ideally, we would also have data on flights that ended safely, so that we could measure the rate at which an aircraft type’s journeys end in an incident rather than the raw number of incidents.
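To make the point about exposure concrete, here is a minimal sketch of the rate calculation we would run if flight counts were available. The two models and all of the numbers are hypothetical and chosen purely for illustration; none of them come from the dataset.

```python
# Hypothetical figures for illustration only; they are NOT from the dataset.
fleets = {
    "Model A": {"incidents": 500, "flights": 1_000_000},
    "Model B": {"incidents": 50, "flights": 20_000},
}


def incident_rate(incidents, flights):
    """Return the number of incidents per 100,000 flights."""
    return incidents / flights * 100_000


rates = {name: incident_rate(**counts) for name, counts in fleets.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.0f} incidents per 100,000 flights")
```

In this made-up case, Model A has ten times as many recorded incidents as Model B but a lower incident rate, which is exactly the distinction that raw incident counts cannot show.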

This dataset also covers incidents involving not just airplanes but other types of aircraft, such as helicopters. Furthermore, many of the categories are superfluous to our purposes, such as the location of the incident, the intended destination, and the weather.

The categories that are critical to our analysis, and which we will focus on, are as follows:

  • Number of Engines
  • Engine Type
  • Make and Model of Airplane
  • Injury Severity and Total Injuries
  • Aircraft Damage

There are also a number of confounding factors that the dataset does not account for but which would affect the results, namely:

  • Age of the aircraft
  • Aircraft maintenance policies of the responsible airlines
  • Experience and talent of the pilot and crew

Data Preparation

Thankfully we are able to use the Pandas library in Python to easily manipulate the dataset for our purposes. We begin by excluding all the following entries:

  • Incidents prior to 1982, due to changes in flight and aviation standards.
  • Incidents during and after the outbreak of the COVID-19 pandemic, which saw a sharp decline in air travel and would skew our findings.
  • All incidents pertaining to any aircraft except airplanes, which we are focused on.
  • Any incident involving an amateur-made airplane.
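The exclusions above can be sketched as a single Pandas filter. This is a minimal sketch rather than our exact notebook code: the column names (`Event.Date`, `Aircraft.Category`, `Amateur.Built`) are assumed to match the Kaggle export, the cutoff of 2020-01-01 for the start of the pandemic is an assumption, and the sample rows are made up for illustration.

```python
import pandas as pd


def filter_incidents(df, start="1982-01-01", end="2020-01-01"):
    """Keep non-amateur-built airplanes with event dates in [start, end).

    Column names and the default `end` cutoff are assumptions.
    """
    # Unparseable dates become NaT, which fails both comparisons below.
    dates = pd.to_datetime(df["Event.Date"], errors="coerce")
    in_window = (dates >= pd.Timestamp(start)) & (dates < pd.Timestamp(end))
    is_airplane = df["Aircraft.Category"] == "Airplane"
    not_amateur = df["Amateur.Built"] == "No"
    return df[in_window & is_airplane & not_amateur].copy()


# Made-up rows, one for each exclusion rule plus one that should survive.
sample = pd.DataFrame({
    "Event.Date": ["1979-05-25", "1995-07-14", "2021-03-02",
                   "2003-11-08", "2010-01-30"],
    "Aircraft.Category": ["Airplane", "Airplane", "Airplane",
                          "Helicopter", "Airplane"],
    "Amateur.Built": ["No", "No", "No", "No", "Yes"],
})

cleaned = filter_incidents(sample)
print(cleaned)
```

Note that a filter written this way also drops rows where the aircraft category or the amateur-built flag is missing, so it is worth checking how many rows each condition removes before relying on the result.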

Calculating Damage Severity

One of the relevant categories, Aircraft Damage, takes only one of three broad values: Minor, Substantial, or Destroyed.

In order to make better use of these entries, we used a lambda function to assign a value of “1” to any entry with Aircraft Damage listed as Substantial or Destroyed, and a value of “0” to aircraft with “Minor” damage. This way, we are able to calculate an average “Damage Severity” for each airplane type, which is the share of its recorded incidents that ended in substantial damage or destruction.
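A minimal sketch of this encoding follows. The column names `Aircraft.damage` and `Make`, the new column name `Damage.Severity`, and the sample rows are assumptions for illustration. The sketch also scores any value other than Substantial or Destroyed (including a missing value) as 0, which is a simplification worth revisiting if many entries are blank.

```python
import pandas as pd


def add_damage_severity(df, damage_col="Aircraft.damage"):
    """Return a copy of df with a binary Damage.Severity column.

    1 = Substantial or Destroyed; 0 = anything else (assumed: Minor).
    """
    out = df.copy()
    out["Damage.Severity"] = out[damage_col].apply(
        lambda damage: 1 if damage in ("Substantial", "Destroyed") else 0
    )
    return out


# Made-up rows for illustration only.
sample = pd.DataFrame({
    "Make": ["Cessna", "Cessna", "Piper", "Piper"],
    "Aircraft.damage": ["Substantial", "Minor", "Destroyed", "Substantial"],
})

scored = add_damage_severity(sample)

# The mean of a 0/1 column is the share of incidents with severe damage.
severity_by_make = scored.groupby("Make")["Damage.Severity"].mean()
print(severity_by_make)
```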

Creating New Datasets

Next, we use this cleaned dataset to create a number of new, more focused datasets for studying the relationships between the critical categories identified above.

Namely, these new datasets relate Airplane Type (Make/Model), Engine Type, and Number of Engines, on the one hand, to Total Injuries and Damage Severity, on the other.
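One way to build such summary tables is a grouped aggregation, sketched below. The helper `summarize_by`, the column names (`Engine.Type`, `Number.of.Engines`, `Damage.Severity`, `Total.Injuries`), and the sample rows are all hypothetical; in particular, `Total.Injuries` stands in for whatever injury total is derived from the raw injury columns.

```python
import pandas as pd


def summarize_by(df, keys):
    """Aggregate incident count, mean damage severity, and injuries per group."""
    return (
        df.groupby(keys)
        .agg(
            incidents=("Damage.Severity", "size"),
            damage_severity=("Damage.Severity", "mean"),
            total_injuries=("Total.Injuries", "sum"),
        )
        .reset_index()
    )


# Made-up rows for illustration only.
sample = pd.DataFrame({
    "Engine.Type": ["Reciprocating", "Reciprocating", "Turbo Fan", "Turbo Fan"],
    "Number.of.Engines": [1, 1, 2, 2],
    "Damage.Severity": [1, 1, 0, 1],
    "Total.Injuries": [2, 0, 0, 1],
})

by_engine_type = summarize_by(sample, ["Engine.Type"])
by_engine_count = summarize_by(sample, ["Number.of.Engines"])
print(by_engine_type)
print(by_engine_count)
```

Keeping the incident count next to each average makes it easier to spot groups whose rates rest on only a handful of records. Each resulting table can then be written out with `DataFrame.to_csv` for use in a visualization tool.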

Visualizations

Using Tableau Public, we are able to visualize these new datasets and show the relationships between the factors listed more clearly. The Tableau Public Dashboard can be accessed here.

As we can see from the graph on the top-left, airplanes with Turbo Fan engines have much lower rates of serious and minor injuries, as well as significantly higher rates of uninjured persons.

The graph on the top-right shows that airplanes with 2–4 engines have lower rates of damage severity, whereas single-engine airplanes have higher rates.

Finally, the bottom graph, which looks at the relationship between airplane models/number of engines and rates of severe damage to the vehicle, shows that Cessna planes with 4 engines have the lowest severe damage rates.

Summary & Recommendations

Despite the limitations of the dataset, we were able to draw several insights and put together three key recommendations for the Head of Aviation.

  1. The safest option would be to acquire Cessna airplanes with 4 Turbo Fan engines.
  2. It is best to avoid airplanes with Reciprocating engines.
  3. Ensure that the airplane, regardless of model, has at least 2 engines.