The Pareto distribution is a continuous, power-law probability distribution that describes phenomena where a small number of events or values account for a disproportionately large share of the total. This pattern is often observed in real-world scenarios such as wealth distribution, city population sizes, and website traffic. NumPy, the fundamental package for scientific computing in Python, provides a function for drawing random samples from a Pareto distribution, allowing us to simulate and explore these power-law phenomena.
NumPy's Pareto Distribution Function: numpy.random.pareto
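The core function for generating Pareto random variables in NumPy is numpy.random.pareto. According to the NumPy documentation, this function draws samples from the Pareto II (Lomax) distribution, which is the classical Pareto distribution shifted so that its support starts at 0 instead of 1. Adding 1 to the samples gives the classical Pareto distribution with scale 1, whose density is defined as follows:
import numpy as np

def pareto_pdf(x, a):
    """
    Calculates the probability density function (PDF) of the classical
    Pareto distribution with scale 1.

    Args:
        x: The value (or array of values) at which to evaluate the PDF.
        a: The shape parameter of the distribution (a > 0).

    Returns:
        The probability density at x, which is zero for x < 1.
    """
    x = np.asarray(x, dtype=float)
    # Clamp x so the power is never evaluated outside the support x >= 1.
    safe_x = np.maximum(x, 1.0)
    return np.where(x >= 1, a * safe_x ** -(a + 1), 0.0)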
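The relationship between NumPy's samples and the classical Pareto density can be checked empirically. The sketch below is an illustration rather than part of the original walkthrough: it assumes the newer numpy.random.default_rng generator, an arbitrary seed, and a helper named classical_cdf that is introduced here only for the comparison. It shifts the samples by 1 and compares their empirical CDF with the analytic CDF 1 - x**(-a).

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily for repeatability
a = 3.0

lomax_samples = rng.pareto(a, size=100_000)  # support starts at 0
classical_samples = lomax_samples + 1.0      # support starts at 1

def classical_cdf(x, a):
    """CDF of the classical Pareto distribution with scale 1 (valid for x >= 1)."""
    return 1.0 - np.asarray(x, dtype=float) ** (-a)

# Compare the empirical CDF with the analytic one at a few points.
points = np.array([1.1, 1.5, 2.0, 4.0])
empirical = np.array([(classical_samples <= p).mean() for p in points])
analytic = classical_cdf(points, a)
max_abs_error = np.max(np.abs(empirical - analytic))
print(max_abs_error)
```

A small maximum error suggests the shifted samples follow the classical form; the unshifted samples would not match this CDF.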
The distribution sampled by numpy.random.pareto is controlled by a single shape parameter, typically denoted 'a' (often called the "alpha" parameter). The general Pareto distribution also has a scale parameter, which does not appear in this function. The shape parameter controls the heaviness of the tail: the smaller the value of a, the heavier the tail and the more likely extreme values become. Here's a breakdown of the numpy.random.pareto function:
Syntax:
numpy.random.pareto(a, size=None)
Parameters:
a: The shape parameter of the Pareto distribution. It must be positive (a > 0) and may be a float or an array-like of floats.
size: An integer or tuple of integers giving the shape of the output array. If omitted and a is a scalar, a single Pareto random variate is returned.
Return Value:
The function returns an array of Pareto random variates with the specified shape, drawn using the given shape parameter 'a'. When size is omitted and 'a' is a scalar, it returns a single value instead of an array.
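As a quick illustration of how the size argument shapes the result, here is a minimal sketch. It uses numpy.random.default_rng, which is not used elsewhere in this article but exposes a pareto method with the same a and size parameters; the seed and the variable names are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # arbitrary seed for repeatability

single = rng.pareto(2.0)                   # size omitted: one scalar value
vector = rng.pareto(2.0, size=5)           # 1-D array with 5 values
matrix = rng.pareto(2.0, size=(3, 4))      # 2-D array with 3 rows and 4 columns
per_shape = rng.pareto([0.5, 2.0, 10.0])   # one draw per shape parameter

print(np.ndim(single), vector.shape, matrix.shape, per_shape.shape)
```

Passing an array for a, as in the last call, draws one value for each shape parameter.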
Generating Pareto Random Variables
Let's illustrate how to use numpy.random.pareto to generate samples from a Pareto distribution with a shape parameter of 2. We'll generate 1000 random variables:
import numpy as np
# Shape parameter (a)
a = 2
# Generate 1000 random Pareto variates
pareto_samples = np.random.pareto(a, size=1000)
# Print the first 10 samples
print(pareto_samples[:10])
Output:
[0.19805375 0.5353166 0.76902043 1.67280278 0.37562805 0.09220094
1.32054709 0.2806294 1.03186356 0.83267785]
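This code generates 1000 random numbers from a Pareto distribution with a shape parameter of 2. Because no seed is set, the exact values differ from run to run. Notice that the distribution is heavily skewed towards smaller values, with occasional much larger ones. Values below 1 appear because numpy.random.pareto samples from the Pareto II (Lomax) form, whose support starts at 0.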
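The skew can be quantified with a short check. The sketch below assumes the Lomax form, for which P(X > x) = (1 + x)**(-a), and compares the sample median and one tail probability with their theoretical values. The seed, sample size, and threshold are arbitrary choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # arbitrary seed for repeatability
a = 2.0
samples = rng.pareto(a, size=200_000)

# Under the Lomax form, P(X > x) = (1 + x)**(-a), so the median is 2**(1/a) - 1.
theoretical_median = 2.0 ** (1.0 / a) - 1.0
sample_median = np.median(samples)

# Tail probabilities fall off as a power of x rather than exponentially.
threshold = 9.0
theoretical_tail = (1.0 + threshold) ** (-a)
sample_tail = (samples > threshold).mean()

print(sample_median, theoretical_median)
print(sample_tail, theoretical_tail)
```

With a = 2, half of the samples fall below roughly 0.41, while a small but non-negligible fraction still exceeds 9.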
Exploring the Pareto Distribution: Practical Examples
Example 1: City Population Distribution
The Pareto distribution is often used to model city population sizes: a small number of cities have very large populations, while the vast majority have much smaller ones. Let's simulate a distribution of city populations using NumPy's pareto function. The simulated values are relative sizes rather than actual head counts:
import numpy as np
import matplotlib.pyplot as plt
# Shape parameter (a)
a = 1.5
# Generate 100 city population sizes
city_populations = np.random.pareto(a, size=100)
# Sort the populations in descending order
city_populations = np.sort(city_populations)[::-1]
# Plot the distribution
plt.plot(city_populations)
plt.xlabel('City Rank')
plt.ylabel('Population')
plt.title('Simulated City Population Distribution (Pareto)')
plt.show()
Output:
The output will show a plot representing the population distribution of the 100 simulated cities. The plot will demonstrate the classic Pareto pattern: a few very large cities and a long tail of smaller cities.
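To turn the raw samples into plausible population figures, one option is to shift them by 1 and multiply by a minimum city size. The sketch below does this with a hypothetical min_population of 10,000, a value chosen purely for illustration, and fits a straight line to the rank-size relationship on log-log axes. A power law should give a roughly linear relationship with a negative slope.

```python
import numpy as np

rng = np.random.default_rng(seed=7)  # arbitrary seed for repeatability
a = 1.5
min_population = 10_000  # hypothetical smallest city size, for illustration only

# Shift by 1 and scale so every simulated city has at least min_population people.
city_populations = (rng.pareto(a, size=100) + 1.0) * min_population
city_populations = np.sort(city_populations)[::-1]

ranks = np.arange(1, city_populations.size + 1)
# Least-squares line through log(population) against log(rank).
slope, intercept = np.polyfit(np.log(ranks), np.log(city_populations), deg=1)
print(slope)
```

With only 100 cities the fitted slope is a noisy estimate, so it is best read as a qualitative check rather than a measurement of a.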
Example 2: Wealth Distribution
The Pareto distribution has also been used to model wealth distribution. It's often said that a small percentage of the population owns a significant proportion of the wealth. Let's simulate this scenario:
import numpy as np
import matplotlib.pyplot as plt
# Shape parameter (a)
a = 1.8
# Generate 1000 wealth values
wealth_values = np.random.pareto(a, size=1000)
# Sort the wealth values in descending order
wealth_values = np.sort(wealth_values)[::-1]
# Plot the distribution
plt.plot(wealth_values)
plt.xlabel('Individual Rank')
plt.ylabel('Wealth')
plt.title('Simulated Wealth Distribution (Pareto)')
plt.show()
Output:
Similar to the city population example, the output will show a plot representing the wealth distribution, highlighting the power law phenomenon.
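The claim that a small percentage holds a large proportion of the wealth can be measured directly from the simulated values. This minimal sketch computes the share of total simulated wealth held by the richest 20% of individuals. The 20% cutoff and the seed are arbitrary choices made for this illustration, and the result will vary with both.

```python
import numpy as np

rng = np.random.default_rng(seed=3)  # arbitrary seed for repeatability
a = 1.8
wealth_values = np.sort(rng.pareto(a, size=1000))[::-1]  # richest first

top_count = wealth_values.size // 5  # richest 20% of individuals
top_share = wealth_values[:top_count].sum() / wealth_values.sum()
print(f"Top 20% hold {top_share:.1%} of simulated wealth")
```

Lowering a makes the tail heavier, which tends to push this share higher.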
Conclusion:
The Pareto distribution is a powerful tool for modeling real-world phenomena that follow a power law. NumPy's numpy.random.pareto function provides an efficient way to generate random samples from this distribution, which can then be used to simulate and explore scenarios that exhibit this important statistical pattern.