Distributions

Data is seldom in a form ready for use in a simulation model. Usually some analysis and conversion must be performed before the data can serve as an input parameter to the simulation. Random phenomena must either be fitted to a standard theoretical distribution, such as a normal or exponential distribution (Law and Kelton, 1991), or be input as a frequency distribution.

Defining a variable with a theoretical distribution requires that the data, if available, be fitted to the distribution that best describes the variable. An alternative to a standard theoretical distribution is to summarize the data as a frequency distribution that can be used directly in the model. A frequency distribution is sometimes referred to as an empirical or user-defined distribution.
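As a rough illustration of fitting data to a theoretical distribution, the sketch below estimates the parameters of an exponential and a normal distribution from a sample and compares their log-likelihoods. The sample values, the function names, and the choice of log-likelihood as the comparison criterion are all assumptions made for this example; they are not taken from Process Simulator, and a formal goodness-of-fit test would normally be applied before settling on a distribution.

```python
import math
import statistics

# Hypothetical observed times (made-up values for illustration only).
observed_times = [0.3, 1.1, 0.2, 2.7, 0.9, 0.1, 4.2, 0.6, 1.8, 0.4]

def fit_exponential(data):
    """Maximum-likelihood estimate of the exponential mean."""
    return statistics.fmean(data)

def fit_normal(data):
    """Maximum-likelihood estimates of the normal mean and standard deviation."""
    return statistics.fmean(data), statistics.pstdev(data)

def loglik_exponential(data, mean):
    """Log-likelihood of the data under an exponential with the given mean."""
    return sum(-math.log(mean) - x / mean for x in data)

def loglik_normal(data, mu, sigma):
    """Log-likelihood of the data under a normal with the given mean and sd."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)
        for x in data
    )

exp_mean = fit_exponential(observed_times)
norm_mu, norm_sigma = fit_normal(observed_times)

ll_exp = loglik_exponential(observed_times, exp_mean)
ll_norm = loglik_normal(observed_times, norm_mu, norm_sigma)

# The candidate with the higher log-likelihood describes this sample better.
best = "exponential" if ll_exp > ll_norm else "normal"
print(f"exponential mean={exp_mean:.2f}, log-likelihood={ll_exp:.2f}")
print(f"normal mean={norm_mu:.2f}, sd={norm_sigma:.2f}, log-likelihood={ll_norm:.2f}")
print("better fit for this sample:", best)
```

The fitted values would then be entered using the corresponding distribution syntax, for example the estimated mean as the single parameter of an exponential distribution.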

Whether fitting data to a theoretical distribution or using an empirical distribution, it is often useful to organize the data into a frequency distribution table. A frequency distribution is defined by grouping the data into intervals and stating the frequency of occurrence for each interval. To illustrate, the following frequency table lists the number and percentage of observations of a delivery time that fall within each one-day interval.

Delivery Time (days)	Number of Observations	Percentage	Cumulative Percentage
0-1	25	16.5	16.5
1-2	33	21.7	38.2
2-3	30	19.7	57.9
3-4	22	14.5	72.4
4-5	14	9.2	81.6
5-6	10	6.6	88.2
6-7	7	4.6	92.8
7-8	5	3.3	96.1
8-9	4	2.6	98.7
9-10	2	1.3	100.0

Total Number of Observations = 152
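The percentage and cumulative percentage columns follow directly from the observation counts. The sketch below recomputes them from the counts in the table and also shows one way to group raw values into equal-width intervals. The function names and the small raw sample are hypothetical, and because the figures are computed rather than hand-rounded, a printed value can differ from the table in the last digit.

```python
# Observation counts per one-day interval, copied from the table above.
counts = [25, 33, 30, 22, 14, 10, 7, 5, 4, 2]

def bin_counts(values, width, n_bins):
    """Group raw values into n_bins equal-width intervals starting at zero.

    Values at or beyond the last boundary are placed in the last interval.
    """
    result = [0] * n_bins
    for v in values:
        index = min(int(v // width), n_bins - 1)
        result[index] += 1
    return result

def frequency_table(counts, width=1):
    """Return rows of (lower, upper, count, percent, cumulative percent)."""
    total = sum(counts)
    rows = []
    running = 0
    for i, n in enumerate(counts):
        running += n
        rows.append(
            (i * width, (i + 1) * width, n, 100.0 * n / total, 100.0 * running / total)
        )
    return rows

rows = frequency_table(counts)
for lower, upper, n, pct, cum in rows:
    print(f"{lower}-{upper}\t{n}\t{pct:.1f}\t{cum:.1f}")
print("Total Number of Observations =", sum(counts))

# Hypothetical raw sample, only to show the grouping step.
example_counts = bin_counts([0.4, 1.2, 1.9, 2.5, 0.7], width=1, n_bins=3)
```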

While there are rules that have been proposed for determining the interval or cell size, the best approach is to make sure that enough cells are defined to show a gradual transition in values, yet not so many cells that groupings become obscured.

Note that the last column of the frequency table optionally expresses the percentage for each interval as a cumulative percentage. This helps verify that all 100% of the observations are accounted for.
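The cumulative column is also a convenient form for drawing values from a frequency distribution. The sketch below picks an interval by comparing a uniform random number with the cumulative percentages and then interpolates linearly within that interval. The interpolation step and the function name are assumptions for this example; how Process Simulator samples a user-defined distribution internally is not described here.

```python
import bisect
import random

# Upper boundary (days) and cumulative percentage of each interval in the table.
upper_bounds = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
cumulative_pct = [16.5, 38.2, 57.9, 72.4, 81.6, 88.2, 92.8, 96.1, 98.7, 100.0]

def sample_delivery_time(rng):
    """Draw one delivery time (days) from the empirical distribution."""
    u = rng.random() * 100.0                      # uniform in [0, 100)
    i = bisect.bisect_right(cumulative_pct, u)    # first interval whose cumulative % exceeds u
    previous_pct = cumulative_pct[i - 1] if i > 0 else 0.0
    lower = upper_bounds[i - 1] if i > 0 else 0
    # Assumption: values are spread evenly within an interval.
    fraction = (u - previous_pct) / (cumulative_pct[i] - previous_pct)
    return lower + fraction * (upper_bounds[i] - lower)

rng = random.Random(1)   # fixed seed so the run is repeatable
samples = [sample_delivery_time(rng) for _ in range(10000)]
share_first_day = sum(1 for s in samples if s < 1) / len(samples)
print(f"share of samples in 0-1 days: {share_first_day:.3f}")
```

With enough samples, the share falling in each interval should approach the percentage listed for that interval in the table.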

When gathering samples from a static population, one can apply descriptive statistics and draw reasonable inferences about the population. When gathering data from a dynamic and possibly time-varying system, however, one must be sensitive to trends, patterns, and cycles that may occur over time. The samples drawn may not be homogeneous and may therefore be unsuitable for simple descriptive techniques.
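Two quick checks can hint at whether time-ordered observations are homogeneous: comparing the means of consecutive batches, and computing the correlation between each observation and the next. The sketch below shows both. The function names and the sample series are hypothetical, and these checks are only a first screen, not a substitute for a proper analysis of trends and cycles.

```python
import statistics

def batch_means(observations, n_batches):
    """Mean of each consecutive, equal-sized batch of time-ordered data."""
    size = len(observations) // n_batches
    return [
        statistics.fmean(observations[i * size:(i + 1) * size])
        for i in range(n_batches)
    ]

def lag1_autocorrelation(observations):
    """Correlation between each observation and the one that follows it."""
    mean = statistics.fmean(observations)
    numerator = sum(
        (a - mean) * (b - mean) for a, b in zip(observations, observations[1:])
    )
    denominator = sum((x - mean) ** 2 for x in observations)
    return numerator / denominator

# Hypothetical series with a steady upward trend (made-up values).
trending = [float(i) for i in range(20)]

means = batch_means(trending, 4)
r1 = lag1_autocorrelation(trending)

# Batch means that drift steadily, or a lag-1 value far from zero, suggest
# the data should not be pooled into a single distribution.
print("batch means:", means)
print(f"lag-1 autocorrelation: {r1:.2f}")
```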

Process Simulator Distributions

Distribution	Syntax	Individual Components
Beta	B(a,b,c,d)	a=shape value 1 b=shape value 2 c=lower boundary d=upper boundary
Binomial	B(a,b)	a=batch size b=probability of success
Erlang	ER(a,b)	a=mean b=integer shape parameter
Exponential	E(a)	a=mean
Gamma	G(a,b)	a=shape value b=scale value
Geometric	GEO(a)	a=probability of success
Inverse Gaussian	IG(a,b)	a=shape value b=scale value
Lognormal	L(a,b)	a=mean b=standard deviation
Normal	N(a,b)	a=mean b=standard deviation
Pearson5	P5(a,b)	a=shape value b=scale value
Pearson6	P6(a,b,c)	a=shape value 1 b=shape value 2 c=scale value
Poisson	P(a)	a=quantity
Triangular	T(a,b,c)	a=minimum b=mode c=maximum
Uniform	U(a,b)	a=mean b=half range
Weibull	W(a,b)	a=shape value b=scale value
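Some of these parameter conventions differ from those of common programming libraries. The sketch below uses Python's standard random module to mimic four of the listed distributions, only to make the meaning of the parameters concrete. The function names are hypothetical, and the code is not Process Simulator's generator and does not reproduce its random number streams.

```python
import random
import statistics

def exponential(rng, a):
    """E(a): a is the mean, so the rate passed to expovariate is 1/a."""
    return rng.expovariate(1.0 / a)

def normal(rng, a, b):
    """N(a,b): a is the mean, b is the standard deviation."""
    return rng.gauss(a, b)

def triangular(rng, a, b, c):
    """T(a,b,c): minimum, mode, maximum. Python expects (low, high, mode)."""
    return rng.triangular(a, c, b)

def uniform(rng, a, b):
    """U(a,b): a is the mean, b is the half range, so values span a-b to a+b."""
    return rng.uniform(a - b, a + b)

rng = random.Random(0)   # fixed seed so the run is repeatable

u_samples = [uniform(rng, 10, 2) for _ in range(1000)]        # like U(10,2)
t_samples = [triangular(rng, 2, 4, 9) for _ in range(1000)]   # like T(2,4,9)
e_samples = [exponential(rng, 5) for _ in range(20000)]       # like E(5)
n_samples = [normal(rng, 10, 2) for _ in range(1000)]         # like N(10,2)

print("U(10,2) range:", min(u_samples), max(u_samples))
print("T(2,4,9) range:", min(t_samples), max(t_samples))
print("E(5) sample mean:", statistics.fmean(e_samples))
```

Note in particular that U(10,2) spans 8 to 12 rather than 10 to 2, because the parameters are the mean and the half range.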

General Components

Any negative value returned by a distribution that is used for a time expression will be automatically converted to zero.
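The effect of this rule can be pictured as a simple clamp applied to each sampled value. The sketch below is an illustration only: the function name is hypothetical and Python's random module stands in for the simulator's own generator.

```python
import random

def as_time_value(sample):
    """Replace a negative sampled value with zero, as described above."""
    return max(0.0, sample)

rng = random.Random(0)   # fixed seed so the run is repeatable

# A normal distribution with mean 2 and standard deviation 3, like N(2,3),
# returns negative values fairly often; each one becomes a zero-length time.
raw = [rng.gauss(2, 3) for _ in range(1000)]
times = [as_time_value(x) for x in raw]

print("negative raw samples:", sum(1 for x in raw if x < 0))
print("zero time values:", sum(1 for t in times if t == 0.0))
```

Because the clamped values pile up at zero, the average time used in the model ends up higher than the mean entered for the distribution. Where this matters, a distribution that cannot go negative, such as the lognormal or triangular, may be a better choice.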