Why the Monte Carlo Method?

You may have heard other engineers talk about Monte Carlo Methods or Monte Carlo Analysis and thought what on earth is that?  At Five Flute we get a kick out of sharing what we love about engineering and Monte Carlo Analysis is one of our favorite engineering superpowers.  We like it so much that we think all engineers should understand how to use it to solve common (or uncommon!) problems in mechanical engineering.  Selfishly, that is the motivation behind this post.  

To that end, we are going to start with a beginners mind and quickly walk through the Monte Carlo Method; how it works, what it is used for, and why it is useful for engineers in particular.  To clarify our understanding we will work through a number of examples of varying difficulty, including solutions using pseudocode, google sheets and python.  Programming skills are helpful but certainly not a prerequisite for getting something out of this post.  If you have questions about how to apply the Monte Carlo Method to your particular engineering problem then please reach out to william@fiveflute.com and we’ll be happy to help.

Method Description

The fundamental concept with Monte Carlo methods is to use randomness to solve deterministic problems.  For example, if I roll three six-sided dice how often will the sum of the dice be greater than seven?  The problem itself is deterministic, meaning it has one and only one answer.  Dice behave the same way no matter where they are rolled and who rolls them.  And yet, there is an element of randomness as well.  You cannot predict what any individual die will land on before you roll it.  

It is, however, quite easy to see the result of a single dice roll and check against the condition we laid out in our problem statement (do they add up to more than seven?).  That's the beauty and power of the Monte Carlo method.  There are many problems where computing the answer directly is extremely difficult, but checking if the outcome of an “experiment” (in this case a dice roll) is trivial.  The method relies on conducting a large number of sample experiments or trials, such that counting the outcome of each trial tells us the answer to the original question (Note: we will use the terms sample and trial somewhat interchangeably throughout this post).  

Continuing the dice example, we can use a computer to simulate the rolling of the dice 1000 times.  For reference, we have created this simulation in a google sheet linked here.  For each trial we can check if the sum of the dice is greater than seven.  If so, we log a value of TRUE for that trial or FALSE if the sum of the dice is less than or equal to seven.  You end up with some number of TRUE results across the total number of trials.

Dice roll simulation results table illustration

The probability of rolling a sum greater than seven is thus equal to the number of TRUE results divided by the total number of trials.  The law of large numbers dictates that as the number of trials gets bigger and bigger, the ratio of TRUE results to total trials approaches the actual probability we are trying to solve for.

This is true for all Monte Carlo simulations you create.  The more trials you run, the higher confidence you can have that the results of your simulation are accurate. Extending this to the error in your model, we can define error convergence as the reduction in model error as a function of trial or sample count.  This can often be represented and even checked graphically.

Illustrated error convergence showing how simulation results change as a function of trial count

We will discuss error convergence in greater depth in a later example, but for now it is enough to know that it is something we should be thinking about at a minimum, and checking explicitly if we are relying on our Monte Carlo models to be very accurate (< 3% error or better).  

Our first google sheets simulation using 1000 trials shows the following results.

So you might be thinking, so what!?  I’m trying to do mechanical engineering here, not learn how to play craps better.  Hang in there! The challenge is to figure out ways to express our engineering problems in a similar fashion to the dice problem so that we can use statistical inference to solve them.  

Fundamentals - Dice and Beyond

Now that we have a simplistic view of the method we can introduce a more formal definition.  In Math, Numerics, & Programming: for Mechanical Engineers, Yang et. al. introduce the following definition.

Broadly, Monte Carlo methods are a class of computational techniques based on synthetically generating random variables to deduce the implication of the probability distribution

There is a lot to break down there, but we can take it one step at a time.

Computational techniques are methods of solving problems using computers by  step-wise, repeated, and iterative solution methods.  Like using an array formula in a spreadsheet, or running a function in a for loop in python.

Random variables are variables with numerical values based on a random process or phenomenon, like a dice roll.  There are two primary types that we care about from the standpoint of Monte Carlo methods; discrete and continuous. We’ve already seen one discrete random variable, represented physically by a single die.  When you roll the die it can land on any of its six faces having values of 1, 2, 3, 4, 5 or 6.  It is discrete because there are n possible values where n = 6.  In the case of the die, each of the six discrete values is equally likely to occur.  Rolling a two is likely to occur one out of every six rolls, just as rolling a five is also likely to occur with the same frequency.  Mathematically we can represent the die with a probability mass function which shows the probability of the discrete random variable being a particular value in the sample space.  In this case the distribution is uniform, so the probability mass function for an individual die looks like this.

Probability mass function for a single six sided die.

In reality, many processes exhibit randomness but in non discrete ways.  Take for example, the outer diameter of a machine turned shaft with a target dimension of 1.000” and a specified symmetric tolerance of ±0.002”.  There are many factors that can impact the actual diameter of the shaft when it is manufactured:  lathe accuracy, tool wear, programming errors, temperature, operator errors, etc.  One common method for modeling processes like this is to use the normal distribution function.  This is a continuous random variable with a non-uniform probability density function.  Ideally you are familiar with normal distributions from statistics, but we will do a brief review here.

The normal distribution is described by the Gaussian function (general form below).

Gaussian function, generic form.

A normally distributed random variable can be modeled using the Gaussian function where

Thus the equation takes the form below.

Normal distribution function.

Note that when

the exponent term goes to zero and

This means that the peak of normal distribution is equal to

showing the trade-off between variance, height, and width of the normal distribution function.  We can represent this trade-off graphically as well, just to help build our intuition about normal distributions.

mean and standard deviation of normal distributions visualized


Continuing with our shaft example, we know that the shaft will have a manufacturing error on the diameter.  A common practice is to assume that three standard deviations will fall within the specified tolerance range of ±0.002”, meaning 99.7% of parts will be within a diameter range from 0.998” to 1.002”.

3 sigma coverage of tolerance band 

We can represent the tolerance on the diameter using the normal distribution function and our three sigma assumption.  The probability density function becomes

(plotted below).

Now we have a general model for the manufacturing tolerance on this feature.  The keys to using this model in a Monte Carlo simulation are randomly sampling from this distribution and using enough samples such that we get accurate results.  Just as rolling a single die was simulated by randomly choosing a number between 1 and 6 (inclusive), simulating the manufacture of a shaft is accomplished by randomly sampling the probability density function of its diameter.  You can think of this as choosing any random point in the area under the curve, and then using the associated function (y) value corresponding to the x-value of the random point.  Before we finish this example, it will help to understand the general framework for basic Monte Carlo simulations by defining a few common functions and developing a reusable simulation structure using pseudocode.

Basic Monte Carlo Simulation Framework

  1. Construct PDFs.  Represent all variables as probability density functions, either discrete or continuous.
  2. Construct functions to sample from these distributions.  In google sheets you can use the NORMINV() function combined with RAND() for random sampling from normal distributions.  In python you can use the np.random.normal from the numpy package.
  3. Construct a trial check function.  This function looks at an individual simulation trial and checks the results of that trial against some deterministic outcome.  For example, is the sum of the three dice greater than seven (true or false)?  Note, this function should aggregate and store the results of your individual trials for later use.  In excel or g-sheets this usually means constructing a trial results column.  In python or MATLAB you can use a list, tuple, or array (or other data structure) to store the trial results.
  4. Construct a simulation results check function.  This function poses your engineering question in such a way that it can be answered by running some computation on the individual trial results.  This is where your creativity really comes into play, and it is also where the flexibility and power of the method comes alive.  Coming up with elegant ways to relate the aggregate trial data to engineering problems is what allows this method to work in so many domains. We will explore a non-traditional usage of this method in the Computing pi example section of this post.
  5. Construct the main simulation using the following pseudocode as a guide.

# For each trial (main simulation loop)

# sample from the relevant probability distribution functions

# check the samples using your trial check function(s)

# store the results of each trial check

# Run the simulation results check function to compute your simulation results

Optional: Wrap the entire simulation in a convergence check function that checks the simulation results as trial count is increased.  The goal here is to ensure that you are running enough trials to get a trustworthy result, and this function allows you to both understand and dial in your error in a computationally efficient way.  If it was free to run 100 billion trials on every simulation you would not need to check convergence, but in reality computation time needs to be considered for more complicated models.  It is important to know the minimum number of trials needed to construct a valid simulation for your particular problem.

Note: There are numerous other simulation frameworks that you can use for Monte Carlo simulation.  Some simple simulations benefit from running the results check function inside the main simulation loop, incrementing results variables after each trial.  The suitability of one framework over the other depends on the nature and complexity of the problem being solved and your personal preference.

Random Variables In Simulation - Shaft Diameter Problem

Now let us return to the shaft diameter example by posing the following problem. The 1.000” diameter shaft in question is used in a conveyor drive system that has already been designed.  The parts for 2000 conveyor units have been fabricated already, including 2000 shafts. Through testing, engineers have discovered that shafts with diameters below 0.999 will fail due to fretting corrosion and are unusable.  Shafts with diameters above 1.0015 are usable, but require a costly thermal press fit using special installation tooling.  Suitable replacement shafts cost $320 to manufacture.  The thermal press fit costs $20 per installation. Measuring the shafts individually will cause a delay in the project schedule. It is your job to estimate how many new shafts must be manufactured and what is the total cost of new shafts and special installation procedures?

We can tackle this problem using our Monte Carlo simulation framework.

Sample from PDF  

We will do this by sampling from our normal distribution function with mean of 1.0 and standard deviation of 0.002 / 3 (±0.002 symmetric tolerance with 3 tolerance band coverage).  In google sheets we use this function to sample from the distribution of shaft diameters.

Trial Check Function

There are three conditions we are trying to check:

  • Corrosion failure: shaft diameter < 0.999
  • Requires thermal press fit: shaft diameter > 1.0015
  • Shaft OK: 0.999 <= shaft diameter <= 1.0015

We can check these conditions with the following function in google sheets.

Results Check Function

To check our results we need to count the number of corrosion failures and thermal press fits required in our sample population.  We can do this by running the following function on the trial check array.

With the three conditions counted for our trial data set, we can get the total cost by summing the individual costs of each condition as follows.

Example Spreadsheet Simulation and Results Interpretation

We’ve created a 2000 trial simulation of the shaft diameter problem in google sheets that is available here.  From our first simulation we obtained these results.

It's important to remember a few key facts about samples and the sample population when interpreting our results.  Each trial is a random sample that tends to exhibit the same properties as the population from which it was drawn, but there is no guarantee that it will have a particular value (precisely because each trial is a proper random subset of the population).  Remember our question in this case was “how many new shafts must be manufactured and what is the total cost of new shafts and special installation procedures?”  Without measuring all the shafts we cannot know the exact number.  All we can do is make inferences about the likely population of shafts.  Our individual simulation of 2000 shafts represents only one possible future outcome and does not give us the complete picture of all possible outcomes, which will have its own probability density function which we can compute.  To gain more information about the distribution of shaft diameters we could conduct n simulations of 2000 trials and then aggregate the results into a histogram of shaft conditions (corrosion, thermal fit, or OK).    

For now just keep in mind that if you are trying to answer questions concerning the probability of certain outcomes, you may be able to get very reliable answers by sampling a very large population.  If you are trying to get exact answers about real world populations it may not be possible unless you rephrase the question.  We will explore this in a later example using python.

Monte Carlo Simulation Use Cases

There are a number of common use cases in mechanical engineering to which Monte Carlo simulation naturally applies.  A survey paper on the application of Monte Carlo methods in engineering highlights the following domains where it is commonly used.

Application to reliability, statistical tolerancing and uncertainty is fairly straightforward and is what the majority of engineers will use it for in their day to day practice.  There are, however, creative ways to construct results checking functions such that we can solve problems in a variety of domains (including numerical integration of complex functions).  To demonstrate this we will create a simulation to compute the digits of pi.  

Computing Pi

For this example we will take advantage of some interesting geometric properties of squares and inscribed circles while also using symmetry to simplify the trial sample space.  The question is, how can we use the Monte Carlo method to compute pi?  Our answer will use the area ratio method, along with simple checks on each trial.  

First we can start by looking at the relationship between the area of a unit square and a circle inscribed inside the unit square’s boundary that is tangent to the square along all four sides.  We can derive a relationship between pi and the area of the square and inscribed circle.

Relationship between unit square and inscribed circle

Next we imagine randomly selecting points anywhere inside the square.  If these points are distributed randomly and there are enough of them, we can reliably compute the ratio of the square’s area to that of the inscribed circle.  The key assumption is that the distribution of points inside the circle is uniform.  Using our previous derivation, we can relate the sample count to pi as follows.

pi expressed as a function of uniformly distributed simulation trials


Finally, we can exploit symmetry to make writing our sample PDF and trial check functions easier.  We will use only the first quadrant in our simulation.

Reducing the simulation space to the first quadrant only

Apply the Framework

Again we can apply our Monte Carlo simulation framework to solve this problem. For this example we will create solutions using both Google Sheets and python.

  1. Construct and sample from PDF.   In this case we want a random x, y coordinate inside the first quadrant unit square.  This is achieved by sampling from the uniform probability distribution between 0 and 1 inclusive for each coordinate value (x and y).  In Google sheets we can use the =RAND() function. In python we can use np.random.uniform(1, 0) from the numpy package.
  2. Construct trial check function.  We are looking for a way to check if each trial lies inside the inscribed circle.  We know that points inside the circle have a radius of less than or equal to 1.  Thus we can compute the radius from our coordinate values using radius TODO FIX FORMULA= x2+y2.  In google sheets we check this computation using the following formula.

In python we can create a function called isInsideCircle as follows:

def isInsideCircle(x, y):
    """
    This is our trial results check function
    args: x and y coordinates of trials
    returns: type <int>, 1 if point is inside circle, 0 if outside
    """
    r = np.sqrt((x * x) + (y * y))
    
    if r <= 1:
        return 1
    else:
        return 0
  1. Construct a simulation results check function.  Based on our derivation, we can compute pi by adding up the total number of samples inside the circle, multiply this by four and dividing by the total number of samples.  

The complete simulation is included on the third tab of this google sheet.  Plotting points inside and outside of the circle gives us a sense of how sparse the sample set is with only 1000 trials.

The python solution is included below.

import numpy as np

n = 100000 # number of trials

def isInsideCircle(x, y):
    """
    This is our trial results check function
    args: x and y coordinates of trials
    returns: type <int>, 1 if point is inside circle, 0 if outside
    """
    r = np.sqrt((x * x) + (y * y))
    
    if r <= 1:
        return 1
    else:
        return 0
    
results = [] # initialize a list to store our trial results

# Main simulation loop
for trial in range(n):
    results.append(isInsideCircle(np.random.uniform(0, 1), np.random.uniform(0, 1)))

# compute pi from aggregate trial results    
pi = 4 * sum(results) / len(results)
print("monte carlo method computed value of pi is: ", pi)   

    

Checking Convergence

In our google sheets example for computing pi we only ran 1000 trials.  Because of this our answer is not very accurate.  The advantage of using a programming language like python is that you can very easily parameterize the number of trials in your model and run larger simulations than is practical in excel or google sheets. Parameterizing the number of trials per simulation allows us to check for model convergence by exploring the impact of trial count on the accuracy of our simulation.  Using the pi example is a great starting place because we know the answer already and can therefore compute the actual error for any particular simulation.  By putting the simulation code that we used earlier into a loop where we gradually increase trial count it is possible to visualize convergence.

We can also compute the error % directly along with a moving average of this error to further demonstrate the convergence trend.

Examining the last 500 trials in detail shows that error rates as high as 1.5% are still common at trial counts below 10,000.  This simulation tends to converge slowly and would require an enormous amount of trials to reach accuracy in the 0.1% range.

Running a single simulation with 100,000,000 trials results in a value of 3.14185172 with an error of 0.008246%.  This simulation took several minutes to run using the python script.  However in more complicated models where there are multiple PDFs to sample and more computational overhead in each trial loop, running this many trials is often prohibitive.  It becomes critical to think about the error that you can tolerate and how that will impact the computational cost of your models before you begin constructing them.  

Simple tolerance stack example

Now that we’ve seen how to use the Monte Carlo method for a variety of problems, let’s return to the common case of tolerance analysis in mechanical engineering. Suppose we are designing a hydraulic piston and cylinder actuator that uses O-rings to seal around the piston (see assembly illustration below).

The O-ring is designed to work in a certain interference (compression) range which we will call X.  The cylinder, piston and O-ring have the following dimensions and manufacturing tolerances.

O-ring diameter = 3mm ±0.09 mm = DO

Cylinder inner diameter = 25mm ±0.1 mm = DC

Piston ring groove diameter =22.5mm ±0.03 mm = DP

In order for the piston seal to function properly, the O-ring compression must remain between 0.3mm and 0.6mm.  With the above manufacturing tolerances, what percentage of actuators will not be functional?  

Monte Carlo Solution

Similar to the shaft example, we will assign normal probability distributions to all three components of our assembly, assuming there is 3 sigma coverage in the specified tolerance band.  We will then sample from each distribution to “build an assembly” for each simulation trial.  Finally we will check each assembly against our functional O-ring specification and aggregate the results.

import numpy as np
import matplotlib.pyplot as plt

n = 1000000 # number of trials

# Component specifications
sigma = 3       # assume 99.7% of components in tolerance spec
dp = 22.5       # piston diameter
tol_dp = 0.03   # piston diameter tolerance
do = 3          # o ring diameter
tol_do = 0.09   # o ring diameter tolerance
dc = 25         # cylinder diameter
tol_dc = 0.1    # cylinder diameter tolerance


def calculateInterference(dp, do, dc):
    """
    in: diameters
    out: o ring interference 
    """
    
    interference = dp + do - dc
    return interference

def checkInterference(interference, lower, upper):
    """
    in: o ring interference
    out: Bool, T for pass, F for fail
    """
    
    if interference < lower or interference > upper:
        return False
    else:
        return True

# set interference limits
lower = 0.3 # smallest permissable interference 
upper = 0.6 # largest permissable interference
      
interference_list = []  # initialize a list to store interference values per trial 
results = []            # initialize a list to store our trial results

# Main simulation loop
for trial in range(n):
    # Sample PDF for each component
    piston_sample = np.random.normal(dp, tol_dp/sigma)
    oring_sample = np.random.normal(do, tol_do/sigma)
    cylinder_sample = np.random.normal(dc, tol_dc/sigma)
    
    # compute and store interference 
    interference = calculateInterference(piston_sample, oring_sample, cylinder_sample)
    interference_list.append(interference)
    
    # Trial check: log results of interference check
    results.append(checkInterference(interference, lower, upper))

# results check
goodAssemblyCount = sum(results)
failurePercentage =  100*((n - goodAssemblyCount) / n)

print('Percentage of failed piston assemblies: ', failurePercentage, "%")

Our one million trial simulation shows a piston assembly failure percentage of 1.4738 %.

However, we don’t know the distribution of failures or how we might modify our design to prevent these failures.  To visualize the distribution of O-ring interference we can plot a histogram of our simulation data with the following python code.

#---Plot results---
#setup histogram parameters for plotting
n_bins = 500
heights, bins, _ = plt.hist(interference_list, bins=n_bins) # get positions and heights of bars

bin_width = np.diff(bins)[0]
bin_pos = bins[:-1] + bin_width / 2

# mask for coloring failures differently
maskleft = (bin_pos <= lower)
maskright = (bin_pos >= upper)

# plot data in three steps
plt.figure(figsize=(14, 4))
plt.bar(bin_pos, heights, width=bin_width, color='#281E78', label='Good Assemblies')
plt.bar(bin_pos[maskleft], heights[maskleft], width=bin_width, color='#ea4228', label='left tail failures')
plt.bar(bin_pos[maskright], heights[maskright], width=bin_width, color='#49C6E5', label='right tail failures')

# axis labels and vertical markers of left and right tail failures
plt.xlabel("O ring interference (mm)")
plt.ylabel("# samples")
plt.axvline(lower, linestyle='--',linewidth=1, color='grey')
plt.axvline(upper, linestyle='--',linewidth=1, color='grey')

# remove upper and right spines, set ticks
ax = plt.gca()
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.xaxis.set_ticks_position('bottom')
ax.yaxis.set_ticks_position('left')

plt.legend()
plt.show()

The histogram of O-ring interference shows that all failures are happening on the right tail of the distribution, meaning there is too much interference.

Histogram of O-ring interference values in simulated assemblies

Given that there is room on the left tail of the distribution a simple fix is to adjust one of the dimensions in the tolerance stack to shift the assembly distribution left. We can simulate this by reducing the piston ring groove diameter from 22.5mm to 22.45mm, while keeping all tolerances the same.  Simulating this condition shows failure is reduced to 0.1118% by centering the O-ring interference distribution between the left and right tail cutoffs.  This is visualized below

In this case, the simulation allows us to not only estimate the probability of failures, but also it points us in the direction of an easy design fix.

Takeaways

The main thing we hope you take away from this post is that the Monte Carlo method is not difficult to implement and can be incredibly flexible and useful for a variety of common mechanical engineering problems.  Understanding the basic framework allows you to quickly create useful simulations in spreadsheets or python code in just a few minutes.  When assessing the accuracy of your simulations it is important to remember these key facts:

  • The probability density functions you use must be representative of the actual system behavior in the real world.  Check your assumptions and be conservative.
  • The samples you draw from the probability density functions must actually be random samples, otherwise your models will not represent the population accurately.
  • The law of large numbers dictates that as sample count increases, model accuracy increases and error decreases.  The rate of error decrease is influenced heavily by the amount of variance in your random variables.  Larger variance requires more samples to regress to the mean.
  • For models that require highly accurate answers, trial count should be increased until the accuracy of the simulation no longer changes, or changes an acceptably small amount.  Checking convergence does not guarantee that the model is accurate, but it does yield efficient results by allowing you to balance the cost of compute against marginal increases in model accuracy.  In some ways this is similar to the process of reducing mesh size in finite element models until the resulting stress and strain distributions begin to stabilize with each subsequent model.  

If you made it all the way to this point then congrats, I know this was a long one so thanks for sticking with it.  We hope that you now feel comfortable and confident enough to try out Monte Carlo simulation on your own engineering problems.  If you have any questions, need additional help working through the examples, or think you’ve spotted a mistake in this post please reach out to william@fiveflute.com.  Thanks!