Igor's Techno Club

90th Percentile Meaning: Statistics In Simple Terms

In this article, we dive into the world of percentiles, a fundamental concept in statistics that helps us make sense of data distributions. From basic definitions to practical applications, we explore how percentiles work, their importance in various fields, and how they compare to other statistical measures like the mean. Whether you're a student, professional, or simply curious about data analysis, this guide will equip you with the knowledge to interpret and use percentiles effectively.

Meaning of the 90th percentile

In simple terms, the 90th percentile is a value in a dataset that's greater than 90% of the other values. It means that 90% of the data falls below this point, while 10% is above it. This measure is often used to understand the distribution of data and identify high-end values.

What is a percentile in general?

A percentile is simply a way to understand where a particular value stands in relation to the rest of a group of numbers. It tells you what percentage of values fall below a certain point in a dataset.

Percentiles help us understand if a score is high or low compared to others. They make it easier to see how well you did, whether it's on a test, in a game, or in any situation where you're being compared to a group.
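To make the definition concrete, here is a minimal sketch that counts what percentage of values fall below a given point. The function name percent_below and the scores are made up for illustration.

```python
def percent_below(values, x):
    """Return the percentage of values that are strictly below x."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)

# Hypothetical test scores, invented for illustration
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

# 8 of the 10 scores are below 95
print(percent_below(scores, 95))  # 80.0
```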

What is P50?

P50 is a shorthand way of referring to the 50th percentile. Here's what it means:

  1. The 50th percentile, or P50, is the median of a dataset.

  2. It's the middle value when all the numbers in a dataset are arranged in order from lowest to highest.

  3. Half of the values in the dataset are below P50, and half are above it.

  4. For example, if P50 of test scores is 75, it means:

    • 50% of students scored 75 or lower
    • 50% of students scored 75 or higher
  5. P50 is often used in statistical analysis and reporting because it gives a good indication of the "typical" or "average" value in a dataset, without being skewed by extreme values (unlike the mean).

  6. You might see similar notation for other percentiles, like P25 (25th percentile) or P90 (90th percentile).

P50 is particularly useful because it divides the dataset into two equal halves, giving a balanced view of the data's central tendency.
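As a quick illustration, Python's built-in statistics module can find P50 directly. The scores below are hypothetical and chosen so that the median is 75, as in the example above.

```python
import statistics

# Hypothetical test scores, invented for illustration
scores = [60, 70, 75, 80, 90]

p50 = statistics.median(scores)
print(p50)  # 75

# The same number of students sit at or below P50 as at or above it
at_or_below = sum(1 for s in scores if s <= p50)
at_or_above = sum(1 for s in scores if s >= p50)
print(at_or_below, at_or_above)  # 3 3
```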


What is P90?

The 90th percentile (P90) is calculated in the same way as P50, but for 90 percent instead of 50. Let's look at a different example of how it may be used.

The 90th percentile (P90) of annual salaries in a company represents the salary level at which:

  1. 90% of employees earn this amount or less.
  2. 10% of employees earn more than this amount.

For example, if the P90 of annual salaries in a company is $120,000, it means that 90% of employees earn $120,000 or less, and 10% of employees earn more than $120,000.

This information can be useful in several ways:

  1. For employees: It gives an idea of the upper range of salaries in the company.
  2. For management: It helps in setting salary caps or identifying highly paid positions.
  3. For job seekers: It provides insight into the potential earnings at the higher end of the pay scale.

P90 in this context helps to understand the upper tier of the salary distribution without being skewed by a few extremely high salaries (like those of top executives). It gives a more realistic view of what a high earner in the company might make, as opposed to looking at the absolute highest salary.
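The sketch below shows one way this could look in code, using the simple "nearest rank" method. The salary list is hypothetical and was invented so that P90 comes out at $120,000. The function name nearest_rank_percentile is also just an illustration.

```python
import math

def nearest_rank_percentile(values, p):
    """Return the smallest value with at least p percent of the data at or below it."""
    ordered = sorted(values)
    rank = max(math.ceil(len(ordered) * p / 100), 1)  # 1-based rank
    return ordered[rank - 1]

# Hypothetical annual salaries, including one very high executive salary
salaries = [45000, 52000, 58000, 61000, 67000,
            73000, 80000, 95000, 120000, 900000]

p90 = nearest_rank_percentile(salaries, 90)
print(p90)            # 120000
print(max(salaries))  # 900000
```

Notice that P90 stays at $120,000 no matter how large the single highest salary is, while the maximum jumps straight to it.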

What is the Mean?

The mean, also known as the average, is a common way to describe a set of numbers. To find the mean, you add up all the numbers and then divide by how many numbers there are.

For example, let's say we have the ages of five children: 8, 10, 7, 9, and 6.

To find the mean age, we add these numbers together and then divide by 5:

Mean = (8 + 10 + 7 + 9 + 6) ÷ 5 = 40 ÷ 5 = 8

So, the mean age of the children is 8 years.
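The same calculation can be sketched in Python, both by hand and with the built-in statistics module:

```python
import statistics

ages = [8, 10, 7, 9, 6]

# Add up all the numbers, then divide by how many there are
mean_by_hand = sum(ages) / len(ages)
print(mean_by_hand)  # 8.0

# The standard library gives the same answer
print(statistics.mean(ages))
```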

Average (mean) vs P50 (median)

Why the Difference Matters

  1. Outliers: The mean can be affected by outliers (very high or low values). A single outlier can change the mean a lot. The median (P50) is not affected by outliers. It gives a better idea of the typical value when outliers are present.

  2. Shape of Data: When data is evenly spread out (like a bell curve), the mean and median are close or the same. When data is skewed (more values on one side), the median is a better measure of the typical value.

For example, let's look at household incomes in a small town where one household earns $1,000,000. The mean income is $182,143, which is much higher than the median income of $50,000. This is because of the outlier (the $1,000,000 income). The median gives a better sense of a typical income in this town.
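To see the effect in code, here is a minimal sketch. The income list is hypothetical: it was invented so that it reproduces the mean and median quoted above, and it should not be read as the actual data behind those figures.

```python
import statistics

# Hypothetical household incomes with one extreme outlier
incomes = [30000, 40000, 45000, 50000, 50000, 60000, 1000000]

print(round(statistics.mean(incomes)))  # 182143
print(statistics.median(incomes))       # 50000

# Without the outlier, the mean moves back toward the median
without_outlier = [x for x in incomes if x < 1000000]
print(round(statistics.mean(without_outlier)))  # 45833
```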

Visualizing the Differences

Let's look at a picture to better understand the difference between mean and median.

[Figure: Income Plot]

In this income plot, the green dashed line (mean) is much higher than the red dashed line (median) because of the outlier income of $1,000,000. This outlier greatly affects the mean, making it less representative of the typical income in the dataset.

When to Use Mean and When to Use P50

Real-World Uses

  1. Income Data: Governments often report median household income instead of mean to avoid the effect of very high incomes.

  2. Test Scores: Schools might use the median score to understand student performance, especially if a few students have very low or high scores.

  3. House Prices: Real estate listings might use the median house price to give a better idea of what most houses cost in an area.

Common questions

What does it mean to be in the 90th percentile?

In the context of a test, scoring in the 90th percentile means that you scored higher than 90% of the people who took the test.

What does 90th percentile mean?

The 90th percentile means that 90% of the values in a dataset are below this point, and only 10% are above it. To put it in simpler terms: Imagine you're in a class of 100 students who all took a test. If you scored in the 90th percentile, it means:

    • You did better than 90 students
    • Only 9 students scored higher than you
    • You're among the top 10 performers in the class
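The class example can be checked with a few lines of Python. This is only a sketch: it assumes a hypothetical class where all 100 students have different scores, from 1 to 100.

```python
# Hypothetical class: 100 students with distinct scores 1..100
scores = list(range(1, 101))
my_score = 91

below = sum(1 for s in scores if s < my_score)
above = sum(1 for s in scores if s > my_score)
position_from_top = above + 1

print(below)              # 90
print(above)              # 9
print(position_from_top)  # 10
```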


What does the 25th percentile mean?

Similar to the previous example, we can explain the 25th percentile as:

Imagine you're in a class of 100 students who all took a test. If you scored in the 25th percentile, it means:

    • You did better than 25 students
    • 74 students scored higher than you

Is The 50th Percentile The Median?

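Yes. The 50th percentile (P50) and the median are the same thing: the middle value, with half of the data below it and half above it.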

How to find the median

The median is the middle value in a dataset when it is arranged in order. To find the median, you can follow these steps:

Example with an odd number of elements

Let's say you have a dataset of numbers: [12, 4, 7, 1, 9, 3, 8, 6, 5]

  1. Arrange the dataset in order from smallest to largest: [1, 3, 4, 5, 6, 7, 8, 9, 12]
  2. Find the middle value(s) of the dataset. If the dataset has an odd number of elements, the middle value is the median. If the dataset has an even number of elements, the median is the average of the two middle values.

In this example, since there are 9 elements (an odd number), the median is the 5th value: 6

Example with an even number of elements

Let's say you have a dataset of numbers: [10, 3, 6, 2, 8, 4, 5, 7]

  1. Arrange the dataset in order from smallest to largest: [2, 3, 4, 5, 6, 7, 8, 10]
  2. Find the middle value(s) of the dataset. Since there are 8 elements (an even number), the median is the average of the two middle values, the 4th and the 5th: (5 + 6) / 2 = 5.5

So, the median of this dataset is 5.5.
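The steps for both cases can be sketched as a small Python function. This is a minimal illustration rather than a replacement for statistics.median from the standard library.

```python
def median(values):
    """Return the middle value, or the average of the two middle values."""
    ordered = sorted(values)  # step 1: arrange from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # odd number of elements: the single middle value
        return ordered[mid]
    # even number of elements: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([12, 4, 7, 1, 9, 3, 8, 6, 5]))  # 6
print(median([10, 3, 6, 2, 8, 4, 5, 7]))     # 5.5
```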

How to calculate percentile in Excel

To calculate percentiles in Excel, you can use the PERCENTILE or PERCENTILE.INC function. Here's a step-by-step guide:

  1. Enter your data into a column.
  2. In an empty cell, use one of these formulas: =PERCENTILE(range, k) or =PERCENTILE.INC(range, k)

Where:

    • 'range' is the array or range of data
    • 'k' is the percentile you want (as a decimal)

For the 90th percentile, you would use 0.9 for k. Example: If your data is in cells A1:A100, the formula would be: =PERCENTILE(A1:A100, 0.9)

Excel will then calculate the 90th percentile of your data set.

Note:

    • PERCENTILE is available in older versions of Excel
    • PERCENTILE.INC is for Excel 2010 and later
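If you want to sanity-check a spreadsheet result outside Excel, the sketch below interpolates linearly between neighboring values. It assumes that this is how PERCENTILE.INC behaves when the requested percentile falls between two data points, so treat any exact match with Excel as something to verify on your own data. The function name percentile_inc is made up for this example.

```python
import math

def percentile_inc(values, k):
    """Linear-interpolation percentile for 0 <= k <= 1 (inclusive method)."""
    if not values or not 0 <= k <= 1:
        raise ValueError("need non-empty data and 0 <= k <= 1")
    ordered = sorted(values)
    pos = k * (len(ordered) - 1)  # fractional zero-based position
    lower = math.floor(pos)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = pos - lower
    # Move `fraction` of the way from the lower neighbor to the upper one
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])

data = [15, 20, 35, 40, 50]
print(percentile_inc(data, 0.5))            # 35.0
print(round(percentile_inc(data, 0.9), 6))  # 46.0
```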

How to calculate percentile in Python

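import numpy as np

def simple_percentile(data, percentile):
    """Nearest-rank percentile: always returns a value from the dataset."""
    sorted_data = sorted(data)
    n = len(sorted_data)
    # Position of the value with `percentile` percent of the data at or below it;
    # max(..., 0) keeps the index valid when percentile is 0
    index = max(int(np.ceil((n * percentile) / 100)) - 1, 0)
    return sorted_data[index]

# Example usage
data = [15, 20, 35, 40, 50]
print(f"90th percentile: {simple_percentile(data, 90)}")  # 90th percentile: 50
print(f"50th percentile: {simple_percentile(data, 50)}")  # 50th percentile: 35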
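The function above always returns one of the values from the dataset, so for this data it reports 50 as the 90th percentile. Other tools interpolate between neighboring values instead and can give a different answer. As a hedged comparison, here is the same data run through statistics.quantiles from the standard library (Python 3.8 or later), using its "inclusive" method:

```python
import statistics

data = [15, 20, 35, 40, 50]

# n=100 yields 99 cut points; index p - 1 holds the p-th percentile
cut_points = statistics.quantiles(data, n=100, method="inclusive")

print(cut_points[89])  # 46.0
print(cut_points[49])  # 35.0
```

Neither result is wrong. They follow different definitions, so it helps to say which method you used when reporting a percentile on a small dataset.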

Conclusion

Understanding the difference between P50 (median) and mean helps us make better sense of data. The mean gives an overall average, but can be affected by outliers. The median gives the middle point, which is often more typical, especially when data is skewed. Knowing when to use each measure is important in understanding and presenting data accurately.
