Raw data is seldom in a form ready for use in a simulation model. Usually some analysis and conversion is needed before the data can serve as an input parameter to the simulation. Random phenomena must either be fitted to a standard theoretical distribution, such as a normal or exponential distribution (Law and Kelton, 1991), or be input as a frequency distribution.
Using a theoretical distribution requires that the data, if available, be fitted to the distribution that best describes the variable. The alternative is to summarize the data as a frequency distribution that can be used directly in the model. A frequency distribution is sometimes referred to as an empirical or user-defined distribution.
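As a rough illustration of what fitting involves, the sketch below estimates the parameters of an exponential and a normal distribution by matching sample moments. This is only one simple technique and is not necessarily the procedure the cited reference recommends; the sample values and the function names `fit_exponential` and `fit_normal` are hypothetical, and no goodness-of-fit test is performed.

```python
import statistics

# Hypothetical service-time observations in minutes (not taken from this guide).
samples = [2.1, 0.4, 3.7, 1.2, 0.9, 5.3, 2.8, 0.6, 1.9, 4.1]

def fit_exponential(data):
    """Return the sample mean, the single parameter of an exponential distribution."""
    return statistics.mean(data)

def fit_normal(data):
    """Return (mean, sample standard deviation) for a normal distribution."""
    return statistics.mean(data), statistics.stdev(data)

exp_mean = fit_exponential(samples)
norm_mean, norm_sd = fit_normal(samples)

print(f"Exponential candidate: mean = {exp_mean:.2f}")
print(f"Normal candidate: mean = {norm_mean:.2f}, standard deviation = {norm_sd:.2f}")
```

Estimating parameters is only the first half of the job. Which candidate actually "best describes the variable" still has to be judged, for example by comparing each fitted distribution against a frequency table of the data.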
Whether you fit data to a theoretical distribution or use an empirical distribution, it is often useful to organize the data into a frequency distribution table first. A frequency distribution is defined by grouping the data into intervals and stating the frequency of occurrence for each interval. To illustrate, the following frequency table lists the number and percentage of observations that fall within each range of delivery times.
Delivery Time (days) | Number of Observations | Percentage | Cumulative Percentage |
0-1 | 25 | 16.5 | 16.5 |
1-2 | 33 | 21.7 | 38.2 |
2-3 | 30 | 19.7 | 57.9 |
3-4 | 22 | 14.5 | 72.4 |
4-5 | 14 | 9.2 | 81.6 |
5-6 | 10 | 6.6 | 88.2 |
6-7 | 7 | 4.6 | 92.8 |
7-8 | 5 | 3.3 | 96.1 |
8-9 | 4 | 2.6 | 98.7 |
9-10 | 2 | 1.3 | 100.0 |
Total Number of Observations = 152
Although rules have been proposed for determining the interval (cell) size, the best approach is to define enough cells to show a gradual transition in values, yet not so many that groupings become obscured.
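The effect of cell size is easy to try out. The following sketch groups raw observations into equal-width cells so that different widths can be compared. The sample data, the function name `frequency_table`, and the choice of half-open cells (lower bound included, upper bound excluded) are assumptions made for illustration.

```python
import math

def frequency_table(samples, cell_width, lower=0.0):
    """Count observations per half-open cell [lo, lo + cell_width).

    Returns a list of (cell_start, cell_end, count) tuples, including empty cells.
    """
    if cell_width <= 0:
        raise ValueError("cell_width must be positive")
    counts = {}
    for x in samples:
        if x < lower:
            raise ValueError(f"observation {x} lies below the lower bound {lower}")
        index = math.floor((x - lower) / cell_width)
        counts[index] = counts.get(index, 0) + 1
    if not counts:
        return []
    return [
        (lower + i * cell_width, lower + (i + 1) * cell_width, counts.get(i, 0))
        for i in range(max(counts) + 1)
    ]

# Hypothetical delivery times in days (not the data behind the table above).
samples = [0.5, 1.2, 1.8, 2.4, 2.9, 3.1, 0.7, 1.5, 4.6, 2.2]

for width in (1.0, 2.5):
    print(f"cell width = {width}")
    for start, end, count in frequency_table(samples, width):
        print(f"  {start:g}-{end:g} | {count}")
```

With the wider cells the ten observations collapse into two groups and the shape of the data is lost, while very narrow cells would leave most cells with zero or one observation.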
Note that the last column of the frequency table optionally expresses the percentage for each interval as a cumulative percentage. This helps verify that the intervals together account for 100% of the observations.
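The percentage and cumulative percentage columns can be derived from the observation counts alone, and the cumulative column is also what makes sampling from an empirical distribution straightforward. The sketch below shows both steps using the counts from the table. The function name `sample_delivery_time` and the assumption that values are spread uniformly within each one-day interval are illustrative choices, not a description of how ProModel samples internally.

```python
import bisect
import random
from itertools import accumulate

# Observation counts per one-day interval, copied from the frequency table.
counts = [25, 33, 30, 22, 14, 10, 7, 5, 4, 2]
total = sum(counts)

percentages = [100.0 * c / total for c in counts]
# Accumulate the integer counts first so the final value is exactly 100.0.
cumulative = [100.0 * running / total for running in accumulate(counts)]

def sample_delivery_time(rng):
    """Draw one delivery time (days) from the empirical distribution."""
    u = rng.random() * 100.0                   # uniform in [0, 100)
    cell = bisect.bisect_right(cumulative, u)  # first interval whose cumulative % exceeds u
    return cell + rng.random()                 # assumed uniform within [cell, cell + 1)

for cell, (c, p, cp) in enumerate(zip(counts, percentages, cumulative)):
    print(f"{cell}-{cell + 1} | {c} | {p:.1f} | {cp:.1f} |")

rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [sample_delivery_time(rng) for _ in range(5)]
print("sample draws:", [round(d, 2) for d in draws])
```

Computed values can differ from the printed table in the last digit because of rounding; 25/152, for instance, is about 16.45%, which sits almost exactly on a rounding boundary.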
When gathering samples from a static population, one can apply descriptive statistics and draw reasonable inferences about the population. When gathering data from a dynamic and possibly time-varying system, however, one must be sensitive to trends, patterns, and cycles that may occur over time. The samples drawn may not actually be homogeneous and may therefore be unsuitable for simple descriptive techniques.
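One informal way to check for such effects is to split the observations, kept in the order they were collected, into consecutive periods and compare the period means. The sketch below does this with hypothetical data; the function name `period_means` and the period size are assumptions, and a large spread between periods is only a hint that the data should be examined more closely, not a formal statistical test.

```python
import statistics

def period_means(observations, period_size):
    """Mean of each consecutive block of `period_size` time-ordered observations."""
    if period_size <= 0:
        raise ValueError("period_size must be positive")
    return [
        statistics.mean(observations[i:i + period_size])
        for i in range(0, len(observations), period_size)
    ]

# Hypothetical processing times recorded in time order (for example, three shifts).
observations = [2.0, 2.2, 1.9, 2.1, 3.0, 3.2, 2.9, 3.1, 4.1, 3.9, 4.0, 4.2]

means = period_means(observations, 4)
overall = statistics.mean(observations)
spread = max(means) - min(means)

print("overall mean:", round(overall, 2))
print("period means:", [round(m, 2) for m in means])
print("spread between periods:", round(spread, 2))
```

Here the period means rise steadily, so a single distribution built from all twelve values would misrepresent every period. Modeling each period separately would be one way to handle such data.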
Distribution | Syntax | Individual Components |
Beta | B(a,b,c,d) | a=shape value 1, b=shape value 2, c=lower boundary, d=upper boundary |
Binomial | B(a,b) | a=batch size, b=probability of success |
Erlang | ER(a,b) | a=batch size, b=integer shape parameter |
Exponential | E(a) | a=mean |
Gamma | G(a,b) | a=shape value, b=scale value |
Geometric | GEO(a) | a=probability of success |
Inverse Gaussian | IG(a,b) | a=shape value, b=scale value |
Lognormal | L(a,b) | a=mean, b=standard deviation |
Normal | N(a,b) | a=mean, b=standard deviation |
Pearson5 | P5(a,b) | a=shape value, b=scale value |
Pearson6 | P6(a,b,c) | a=shape value, b=shape value, c=scale value |
Poisson | P(a) | a=quantity |
Triangular | T(a,b,c) | a=minimum, b=mode, c=maximum |
Uniform | U(a,b) | a=mean, b=half range |
Weibull | W(a,b) | a=shape value, b=scale value |
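To make the parameter conventions in the table concrete, the sketch below mirrors four of the entries using Python's standard `random` module. It is an analogy only: the wrapper functions are hypothetical, and Python's generator is not the one ProModel uses. Note in particular that U(a,b) is specified by a mean and a half range rather than by a minimum and maximum.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

def exponential(mean):
    """E(a): a = mean. expovariate expects the rate, which is 1 / mean."""
    return rng.expovariate(1.0 / mean)

def normal(mean, sd):
    """N(a,b): a = mean, b = standard deviation."""
    return rng.gauss(mean, sd)

def triangular(minimum, mode, maximum):
    """T(a,b,c): a = minimum, b = mode, c = maximum.

    random.triangular takes its arguments in the order (low, high, mode).
    """
    return rng.triangular(minimum, maximum, mode)

def uniform(mean, half_range):
    """U(a,b): a = mean, b = half range, giving values in [a - b, a + b]."""
    return rng.uniform(mean - half_range, mean + half_range)

print("E(3)     ->", round(exponential(3), 3))
print("N(10,2)  ->", round(normal(10, 2), 3))
print("T(1,2,5) ->", round(triangular(1, 2, 5), 3))
print("U(10,2)  ->", round(uniform(10, 2), 3))
```

Under this reading, U(10,2) produces values between 8 and 12.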
Any negative value returned by a distribution that is used for a time expression will be automatically converted to zero.
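The effect of this rule can be imitated with a simple clamp. The sketch below uses Python's `random` module as a stand-in for the simulator's generator; the function name `time_from_normal` and the parameter values are hypothetical, chosen so that negative draws occur often.

```python
import random

def time_from_normal(mean, sd, rng):
    """Draw from a normal distribution and convert negative results to zero."""
    return max(0.0, rng.gauss(mean, sd))

rng = random.Random(1)  # fixed seed so the run is repeatable
times = [time_from_normal(1.0, 2.0, rng) for _ in range(1000)]

zero_count = sum(1 for t in times if t == 0.0)
print("draws converted to zero:", zero_count)
print("mean of the resulting times:", round(sum(times) / len(times), 3))
```

Because negative draws are raised to zero, the average of the resulting times is higher than the mean given to the distribution. When the standard deviation is large relative to the mean, it may be worth choosing a distribution that cannot go negative instead.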
© 2015 ProModel Corporation • 556 East Technology Avenue • Orem, UT 84097 • Support: 888-776-6633 • www.promodel.com