AN ANALYSIS OF THE STOPPAGE TIMES OF EQUIPMENT IN A FOOD INDUSTRY: A CASE STUDY

The main objective of this study is to analyze data on equipment stoppage times due to several causes in a food industry of the fruit sector located in São Paulo State, Brazil. Several factors can affect the performance of such equipment, such as the industry sector where the equipment is used, the type of equipment, the period of operation and the harvest (year). The results of this analysis could be of great interest to industry managers for better planning of machinery use, detection of risk factors in machinery use and minimization of maintenance shutdowns.


INTRODUCTION
The study of variations in the stoppage times of equipment used in an industry is of great interest to industrial engineers. The statistical modeling of these data is useful for diagnosis (performance indicators by means of queuing theory), inference or simulation, especially for decision making on investment, production planning and allocation of maintenance, among many others. The discovery of factors that act on these variations can be of great interest to engineers and industrial managers.
The main purpose of this study is to analyze a dataset of stoppage times due to equipment failures in an industry of the food sector in São Paulo State, Brazil. In a first analysis, an analysis of variance model is fitted to the data transformed to a logarithmic scale to verify possible differences in average times due to several categorical factors. In a second analysis, a multiple linear regression model is fitted to the transformed data to confirm which factors are most important for the variability of the stoppage times and to be used in forecasts. A third analysis uses reliability modeling techniques with the data in the original scale.
The study of the reliability of components and systems, and of the possible causes of large losses, is of great interest to industries seeking to minimize costs. The analysis of data on the times to failure of different equipment in the food industry can lead to better maintenance strategies for the different types of equipment and to the discovery of factors associated with better equipment performance, such as sector, season, type of equipment, equipment model, work period and temperature, among many others.
Studies on the times to failure and on the factors that can increase these times are important for industrial managers, whose decisions can mean significant gains for the industry (BRWON; KAHR; PETERSON, 1974; AL-NAJJAR, 1996).
Monitoring indicators are essential to measure and verify the actual performance of the maintenance system. The reliability indicators usually presented in the literature are: mean time to failure (MTTF) or mean time between failures (MTBF), mean time to repair (MTTR), and availability (A) (MENDES; RIBEIRO, 2015; ZADHOOSH; FATAHI, 2015). The MTTR, denoted in this study as stoppage time, includes the time required for troubleshooting failures, resolution downtime, repairs, and any tests needed to ensure that the problem has been eliminated (TSAROUHAS; ARVANITOYANNIS; AMPATZIS, 2009).
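As a small illustration of how these indicators relate to each other, the sketch below computes MTBF, MTTR and the usual steady-state availability A = MTBF / (MTBF + MTTR). The function name `availability` and the numeric values are hypothetical and are not taken from the case study.

```python
from statistics import mean


def availability(mtbf: float, mttr: float) -> float:
    """Steady-state availability A = MTBF / (MTBF + MTTR)."""
    if mtbf < 0 or mttr < 0 or mtbf + mttr == 0:
        raise ValueError("MTBF and MTTR must be non-negative and not both zero")
    return mtbf / (mtbf + mttr)


# Hypothetical log for one machine, in hours (not data from the case study).
hours_between_failures = [40.0, 55.0, 35.0, 70.0]
stoppage_hours = [2.0, 1.0, 3.0, 2.0]

mtbf = mean(hours_between_failures)  # mean time between failures
mttr = mean(stoppage_hours)          # mean time to repair (stoppage time)
a = availability(mtbf, mttr)

print(f"MTBF = {mtbf:.1f} h, MTTR = {mttr:.1f} h, A = {a:.3f}")
```

Under this definition, reducing the stoppage times (MTTR) raises availability even when the frequency of failures is unchanged.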
The identification of appropriate probability distributions for the stoppage times is of great importance for better studies and, consequently, better inferences and predictions in industrial applications (MONTGOMERY; RUNGER, 2010).
There are two approaches to identifying the distributions of failure-repair data: 1) the empirical method, which derives the empirical distributions directly from the failure data and does not require the estimation of distribution parameters; 2) the theoretical distribution method, which focuses on identifying a candidate theoretical distribution, estimating its parameters and performing a goodness-of-fit test (TSAROUHAS; ARVANITOYANNIS; AMPATZIS, 2009).
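The two approaches can be contrasted in a short sketch. It assumes Python with NumPy and SciPy rather than the software used in the study, and the sample is simulated (hypothetical), so the numbers carry no meaning for the case study. The helper `ecdf` is an illustrative name.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Simulated stoppage times (hypothetical values, not the study data).
times = stats.weibull_min.rvs(c=1.2, scale=30.0, size=200, random_state=rng)


def ecdf(sample, t):
    """Empirical method: fraction of observations less than or equal to t."""
    ordered = np.sort(sample)
    return np.searchsorted(ordered, t, side="right") / ordered.size


# Theoretical method: choose a candidate (Weibull), estimate its parameters,
# then run a goodness-of-fit test.
shape, loc, scale = stats.weibull_min.fit(times, floc=0)  # location fixed at 0
ks = stats.kstest(times, "weibull_min", args=(shape, loc, scale))

print(f"empirical P(T <= 30) = {ecdf(times, 30.0):.3f}")
print(f"fitted shape = {shape:.3f}, scale = {scale:.3f}")
# Caveat: the KS p-value is optimistic when parameters are estimated
# from the same sample that is being tested.
print(f"KS statistic = {ks.statistic:.3f}, p-value = {ks.pvalue:.3f}")
```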
Within the theoretical approach, the Weibull distribution is a popular lifetime model that has been widely used in the analysis of medical lifetime data (ACHCAR; BROOKMEYER; HUNTER, 1985) and of industrial lifetimes (BABUY; JAYABALAN, 2009; NELSON, 2004; MEEKER; ESCOBAR, 1998), given its great flexibility of fit. Reliability studies are related to quality and industrial productivity and have been the goal of many researchers and industrial engineers, since the competitiveness of an industry is associated with better reliability (NELSON, 2004; BILLINTON; ALLAN, 1983).
Based on the literature (TURRIONI; MELLO, 2012; CAUCHICK MIGUEL, 2010; YIN, 2005; BERTRAND; FRANSOO, 2002), this work can be classified methodologically as applied, objective and descriptive, with a quantitative approach. Bertrand and Fransoo (2002) define quantitative research in production engineering as research in which it is possible to model a problem whose variables have causal and quantitative relationships.
In this sense, it becomes possible to quantify the behavior of the dependent variables in a specific field, enabling the researcher to make predictions. In general, quantitative research uses mathematical modeling and statistical or computational (simulation) methods. The research techniques used in this paper are bibliographic research and intensive direct observation, according to the classification of Lakatos and Marconi (2008), or bibliographic research and case study, according to the classification of Gil (2008) and Yin (2005).
The article is organized as follows: Section 2 introduces the problem and a descriptive analysis of the data; Section 3 introduces the statistical modeling for the stoppage times; Section 4 presents a statistical analysis of the stoppage times under different statistical models; Section 5 discusses the results obtained and gives some final considerations.

PRESENTATION OF THE PROBLEM AND DESCRIPTIVE ANALYSIS OF THE DATA
The studied food industry is located in São Paulo State, in southeastern Brazil, and processes tropical fruit for the manufacture of concentrated juices.
The main raw materials of the company are mango, guava, pineapple, passion fruit and acerola. The product obtained is classified as semi-industrialized, since the processed product serves as raw material for manufacturers of ready-to-drink juices, jams, jellies and other products.
The plant that served as the basis for the study is strategically located, with logistics that facilitate agricultural supply, proximity to various suppliers, availability of labor and flow of production, and is designed for the internal market or export. It has an estimated processing capacity of 500 tons of fruit per day, available from October to March each year. The company has a staff of approximately 200 employees, of whom 150 are directly linked to the production process.
One characteristic of fruit processing industries is the seasonality in the supply of raw material, which requires careful planning of industrial operations to make the best use of the needed resources: labor, raw materials, machinery, other inputs and time.
In the constant search for competitiveness, companies seek to eliminate waste and increase productivity, using fewer resources and working more quickly, and therefore at a lower cost. For this reason, efficiency indicators of industrial processes have been implemented to facilitate the identification of weak points in the production line, thus enabling the deployment of operational and management strategies that ensure the highest possible productivity.
Another remarkable characteristic of the fruit processing industry is that production is classified as a continuous process with a physical arrangement for each product, which leads to a system with a high production volume and a low variety of products.
Therefore, high equipment availability is essential, with minimal loss of time due to machine failures, since maintenance interventions usually represent a waste that cannot be recovered, in addition to costs not foreseen in advance.
A macro overview of business processes can be seen in Figure 1, which represents the simplified flow chart of the fruit processing. (utilities). The stoppage times may be affected by factors such as type of equipment (E for electrical machine, M for mechanical machine) and work period (1 (first), 2 (second), 3 (third)). The descriptive statistics of these data are presented in Table 1. Figure 2 shows the box plots of the mean time between daily arrivals, obtained using the MINITAB® software, version 14.
From the results in Table 1 and Figure 3, it is possible to infer that there is great variability in the stoppage times due to failures, and the industry has great interest in identifying the factors that affect this variability. This is also confirmed by the box plots in Figure 2, which show that the distribution of the stoppage times in the original scale is asymmetrical.
The application of descriptive statistics to failure data is very effective for drawing conclusions about the most important failures (TSAROUHAS; ARVANITOYANNIS; AMPATZIS, 2009).
These results show that the machines (equipment) and machine types have different average stoppage times due to breakage; the same apparently holds for periods (period 1 apparently has shorter equipment stoppage times).
Figure 3 presents the histograms of all stoppage times in the original scale and in the transformed (logarithmic) scale. We observe better symmetry for the transformed data (an indication of approximate normality).
Figure 4 presents the normal probability plot of the transformed data (logarithmic scale); this graph shows an approximately linear relationship, which indicates approximate normality of the transformed data. Figure 5 presents the box plots of the logarithms of the stoppage times. To find out which factors are significant for the stoppage times, we analyze the stoppage data under different statistical models.
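The effect of the logarithmic transformation described above can be checked numerically. The following is a minimal sketch in Python with NumPy and SciPy (an assumption; the study itself used MINITAB), applied to simulated right-skewed values that stand in for the stoppage times. It compares the sample skewness before and after the transformation and computes the correlation of the normal probability plot.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Simulated right-skewed stoppage times (hypothetical, not the study data).
times = rng.lognormal(mean=3.0, sigma=1.0, size=300)
log_times = np.log(times)

# Sample skewness: near zero suggests a symmetric distribution.
skew_raw = stats.skew(times)
skew_log = stats.skew(log_times)

# Normal probability plot: ordered data against normal quantiles.
# r close to 1 indicates an approximately linear relationship.
(_, _), (slope, intercept, r) = stats.probplot(log_times, dist="norm")

print(f"skewness, original scale: {skew_raw:.2f}")
print(f"skewness, log scale:      {skew_log:.2f}")
print(f"probability plot correlation (log scale): {r:.4f}")
```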

STATISTICAL MODELING FOR THE STOPPAGE TIMES
For a first statistical analysis of the stoppage times of the food industry, with the data transformed to a logarithmic scale, we use analysis of variance (ANOVA) methods to compare the means of the different groups. In simplified terms, ANOVA is a collection of statistical models in which the observed variance in a response variable is partitioned into components attributable to different sources of variation. In this way, ANOVA provides a statistical test of whether or not the means of several groups are all equal, that is, a generalization of the t-test to more than two groups. Different approaches to analysis of variance are presented in the literature; the most common uses a linear model that relates the response to the treatments and blocks (BOX; HUNTER, 1978; MONTGOMERY, 2010). In ANOVA with a single classification, the null hypothesis establishes that all groups are simply random samples from the same population. Rejection of the null hypothesis implies that different treatments or groups have different means. ANOVA based on the normal linear model assumes independence, normality and homogeneity of the variances of the residuals.
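The partition of variance behind the one-way ANOVA F test can be sketched as follows. The example assumes Python with NumPy and SciPy, and the three groups of stoppage times are hypothetical values chosen only for illustration. The manual computation is compared with `scipy.stats.f_oneway`.

```python
import numpy as np
from scipy import stats

# Hypothetical stoppage times per work period, analyzed on the log scale.
log_times = {
    "period 1": np.log([12.0, 18.0, 9.0, 15.0, 11.0]),
    "period 2": np.log([25.0, 17.0, 30.0, 22.0, 19.0]),
    "period 3": np.log([28.0, 35.0, 21.0, 40.0, 26.0]),
}
groups = list(log_times.values())
all_obs = np.concatenate(groups)
grand_mean = all_obs.mean()

# Partition the total sum of squares into between-group and within-group parts.
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
ss_total = ((all_obs - grand_mean) ** 2).sum()

df_between = len(groups) - 1
df_within = all_obs.size - len(groups)

# F statistic: ratio of the between-group to the within-group mean square.
f_manual = (ss_between / df_between) / (ss_within / df_within)
p_manual = stats.f.sf(f_manual, df_between, df_within)

# Same test through the library routine.
f_scipy, p_scipy = stats.f_oneway(*groups)

print(f"F = {f_manual:.3f}, p-value = {p_manual:.4f}")
```

A small p-value leads to rejection of the hypothesis that all period means are equal on the log scale.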
As a second statistical analysis, also considering the logarithms of the stoppage times, we use multiple linear regression models. Regression analysis is used for prediction and forecasting in many scientific areas (DRAPER; SMITH, 1998; FOX, 1997; RAWLINGS; PANTULA; DICKEY, 1998). This approach provides concepts and methods for modeling and analyzing several variables by relating a dependent variable $Y_i$, $i = 1, \ldots, n$, where $n$ is the sample size, to one or $k$ fixed independent variables given in a vector denoted by $x_i = (x_{1i}, \ldots, x_{ki})$. In regression models, the vector of independent variables is associated with a vector of unknown regression parameters, denoted by $\beta$.
A general regression model is defined as
$$Y_i = f(x_i, \beta) + \varepsilon_i \qquad (1)$$
where $f(x_i, \beta)$ is a specified function and the error term $\varepsilon_i$ is a random variable assumed to have a specified probability distribution. This random error includes all other factors that could influence the dependent variable $Y$ and are not included in the regression model. A particular regression model is the linear model, given by
$$Y_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_k x_{ki} + \varepsilon_i \qquad (2)$$
for $i = 1, \ldots, n$, where the vector of regression parameters is $\beta = (\beta_0, \beta_1, \ldots, \beta_k)$ and the error term $\varepsilon_i$ is a random variable assumed to be normally distributed with mean zero and constant variance $\sigma^2$.
The reliability function at time $t^*$ is given by
$$R(t^*) = P(T > t^*) \qquad (4)$$
Note that (4) represents the probability that the stoppage time $T$ is greater than a fixed value $t^*$ ($t^* \geq 0$).
Assuming a Weibull distribution with probability density function (3), the reliability function (4) takes the form (5). The hazard function $h(t)$ (or instantaneous failure rate) of the Weibull distribution (NELSON, 2004; MEEKER; ESCOBAR, 1998) is obtained from the relation $h(t) = f(t)/R(t)$ as expression (6). Observe that if $\alpha = 1$ we have an exponential distribution, that is, the exponential distribution is a special case of the Weibull distribution. The hazard function $h(t)$ given by (6) is strictly increasing for $\alpha > 1$ (that is, the times to the occurrence of the events of interest are shorter, in the terminology of industrial reliability), strictly decreasing for $\alpha < 1$ (that is, the times to the occurrence of the events of interest are longer, in the terminology of industrial reliability), and constant for $\alpha = 1$. Thus, the model offers great flexibility of fit to the data.
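The expressions for the Weibull density, reliability and hazard functions are not shown in this part of the text. As an aid to the reader, they can be written as below under the common shape-scale parameterization, with shape $\alpha > 0$ and scale $\lambda > 0$. This parameterization is an assumption: it agrees with the text's description of $\lambda$ as a scale parameter and $\alpha$ as a shape parameter, but it may differ from the form the authors used.

```latex
% Assumed shape-scale parameterization (alpha = shape, lambda = scale);
% not confirmed by the source text.
\begin{align*}
f(t) &= \frac{\alpha}{\lambda}\left(\frac{t}{\lambda}\right)^{\alpha-1}
        \exp\left\{-\left(\frac{t}{\lambda}\right)^{\alpha}\right\},
        \qquad t \ge 0, \\
R(t^*) &= \int_{t^*}^{\infty} f(t)\,dt
        = \exp\left\{-\left(\frac{t^*}{\lambda}\right)^{\alpha}\right\}, \\
h(t) &= \frac{f(t)}{R(t)}
        = \frac{\alpha}{\lambda}\left(\frac{t}{\lambda}\right)^{\alpha-1}.
\end{align*}
```

In this form, $h(t)$ is proportional to $t^{\alpha-1}$, which makes the three cases explicit: increasing for $\alpha > 1$, decreasing for $\alpha < 1$, and equal to the constant $1/\lambda$ for $\alpha = 1$ (the exponential case).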
Also note that, from model (7), the scale parameter $\lambda$ defined in (3) is related to the vector of covariates through
$$\lambda_i = \exp(\beta_0 + \beta_1 x_{1i} + \cdots + \beta_k x_{ki})$$
that is, (7) defines a regression model on the scale parameter (LAWLESS, 1982), assuming the same shape parameter.

STATISTICAL ANALYSIS FOR THE STOPPAGE TIMES
Initially we consider an analysis of variance model with a classification by the qualitative variables machine, type of machine, harvest and period for the transformed data (logarithm of the stoppage times), using the MINITAB® software.
Table 2 presents these results. From Table 2, the period factor leads to some significant difference between the stoppage times (logarithmic scale), since the p-value is equal to 0.092 (significant at the 10% level). The other factors do not show significant differences between their levels.
As a second statistical analysis of the data, we consider a multiple linear regression model assuming independent errors with normal distribution $N(0, \sigma^2)$ with constant variance $\sigma^2$.
The response variable is the logarithm of the stoppage time and the explanatory variables are dummy (indicator) variables: one for the mechanical type of machine (1 = mechanical, 0 = otherwise); one for harvest 1 (1 = harvest 1, 0 = otherwise); one for each of the machines M1, M2, M3, M4, M5 and M6 (1 = that machine, 0 = otherwise); one for period 1 (1 = period 1, 0 = otherwise); and one for period 2 (1 = period 2, 0 = otherwise). These variables enter a multiple linear regression model of the form (2). Table 3 shows the estimates obtained, the standard errors of the least squares estimates (LSE), the values of the Student's t test statistics and the p-values associated with each regression parameter.
From the results of Table 3, the period factor leads to a significant difference between the stoppage times (logarithmic scale): the p-value associated with period 1, compared with the other periods, is equal to 0.024 (significant at the 5% level). That is, period 1 has shorter machine stoppage times, since the regression coefficient of the dummy variable for period 1 has a negative sign. In the same way, there is some evidence of an effect of machine M2, since the observed p-value is equal to 0.055 (close to the 5% significance level).
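The dummy-variable coding and the least squares quantities reported in a table such as Table 3 (estimate, standard error, t statistic, p-value) can be illustrated with the sketch below. It assumes Python with NumPy and SciPy, uses only the two period indicators for brevity, and the stoppage times are hypothetical values, not the study data.

```python
import numpy as np
from scipy import stats

# Hypothetical stoppage times and the work period in which each occurred.
period = np.array([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3])
times = np.array([12.0, 9.0, 15.0, 11.0, 25.0, 17.0,
                  30.0, 22.0, 28.0, 35.0, 21.0, 40.0])
y = np.log(times)  # response on the logarithmic scale

# Design matrix: intercept, dummy for period 1, dummy for period 2.
# Period 3 is the reference level (both dummies equal to 0).
X = np.column_stack([
    np.ones_like(y),
    (period == 1).astype(float),
    (period == 2).astype(float),
])

# Least squares estimates of the regression parameters.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Residual variance, standard errors, t statistics and two-sided p-values.
resid = y - X @ beta
dof = y.size - X.shape[1]
sigma2 = resid @ resid / dof
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_stat = beta / se
p_val = 2 * stats.t.sf(np.abs(t_stat), dof)

for name, b, s, t, p in zip(["intercept", "period 1", "period 2"],
                            beta, se, t_stat, p_val):
    print(f"{name:10s} estimate={b:7.3f} se={s:.3f} t={t:6.2f} p={p:.4f}")
```

With this coding, a negative coefficient for the period 1 dummy means a lower mean log stoppage time in period 1 than in the reference period.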
Figure 6 shows the residual plots of the fitted regression model. The assumptions needed for the validity of the inferences (normality of the errors and constant variance) appear to be satisfied. As a third analysis, we assume a Weibull regression model (see (7)) for the stoppage times, defined by
$$\log(t_i) = \beta_0 + \beta_1\,\mathrm{harvest1}_i + \beta_2\,\mathrm{M1}_i + \beta_3\,\mathrm{M2}_i + \beta_4\,\mathrm{M3}_i + \beta_5\,\mathrm{M4}_i + \beta_6\,\mathrm{M5}_i + \beta_7\,\mathrm{M6}_i + \beta_8\,\mathrm{period1}_i + \beta_9\,\mathrm{period2}_i + \beta_{10}\,\mathrm{type}_i + \sigma \varepsilon_i \qquad (10)$$
where $t_i$ are the stoppage times, each covariate is the corresponding 0/1 indicator (harvest 1, machines M1 to M6, periods 1 and 2, and machine type), and the errors $\varepsilon_i$ have an extreme value distribution (see (7)).
Also, note that the scale parameter $\lambda$ defined in (3) is related to the covariates of model (10) in the same way as in (7). In practice, the maximum likelihood estimators (MLE) are generally obtained by maximizing the logarithm of the likelihood function.
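Maximizing the log-likelihood of a Weibull regression model in its log-linear form can be sketched as below. This is a minimal illustration, assuming Python with NumPy and SciPy instead of MINITAB, complete (uncensored) observations, a single hypothetical period 1 indicator, and simulated data with known parameters. The function name `neg_log_lik` and all numeric values are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 400

# Simulated data (hypothetical): intercept plus one period 1 indicator.
period1 = rng.integers(0, 2, size=n).astype(float)
X = np.column_stack([np.ones(n), period1])
beta_true = np.array([3.0, -0.5])
alpha_true = 1.2  # Weibull shape; sigma = 1 / alpha in the log-linear form

# If W is standard exponential, log(W) has a standard extreme value distribution.
eps = np.log(rng.exponential(size=n))
t = np.exp(X @ beta_true + eps / alpha_true)


def neg_log_lik(params, X, t):
    """Negative log-likelihood of log(t) = X beta + sigma * eps (no censoring)."""
    beta, log_sigma = params[:-1], params[-1]
    z = (np.log(t) - X @ beta) / np.exp(log_sigma)
    # Density of t: exp(z - exp(z)) / (sigma * t)
    return -np.sum(z - np.exp(z) - log_sigma - np.log(t))


# Start from the least squares fit on the log scale and sigma = 1.
start = np.append(np.linalg.lstsq(X, np.log(t), rcond=None)[0], 0.0)
fit = minimize(neg_log_lik, start, args=(X, t), method="BFGS")

beta_hat = fit.x[:-1]
alpha_hat = 1.0 / np.exp(fit.x[-1])
print("beta estimates:", np.round(beta_hat, 3))
print("shape estimate:", round(float(alpha_hat), 3))
```

Optimizing over log(sigma) instead of sigma keeps the scale of the error term positive without explicit constraints.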
For the analysis of the stoppage times, we assume the Weibull distribution with density (3) and the regression model (10). The maximum likelihood estimates were obtained with the MINITAB software, version 14 (see Table 4). From the results of Table 4, we also observe that the period factor leads to a significant difference between the stoppage times: the p-value associated with period 1, compared with the other periods, is lower than 0.001 (significant at the 5% level), and period 2 also shows a significant difference (p-value equal to 0.011). That is, periods 1 and 2 have shorter machine stoppage times.
In the same way, there is some evidence of differences for machines M2 and M3, since the observed p-values are equal to 0.058 and 0.074, respectively (significant at the 10% level).
Observe that the confidence interval for the shape parameter α includes the value 1, an indication that the exponential distribution could also be fitted to the data. From the results of Table 4, we see that using the Weibull distribution to analyze the data set (times until failure) in the original scale leads to greater sensitivity in the detection of significant effects than using a linear regression model with normal errors for the transformed data (logarithmic scale), which amounts to assuming a log-normal distribution for the original data.

DISCUSSION OF THE RESULTS OBTAINED AND CONCLUSIONS
The statistical analysis presented for the data from the Brazilian food industry can be of great interest in identifying the causes of the great variability in the stoppage times of the different equipment used in the production line. Such equipment is often very costly, and the identification of factors that lead to an increase of these stoppage times, that is, fewer stopping occurrences in a fixed time period, can be of great industrial interest.
The use of different statistical models can lead to major gains in inference and forecasting. This was observed in the case study, which applied ANOVA (analysis of variance) and multiple regression models to the data transformed to a logarithmic scale, assuming normal errors with constant variance, and Weibull regression models to the stoppage times in the original scale.
Using these different statistical models, it was possible to detect two main factors that affect the variability of the data: work periods and machines. In a future study, other factors could also be incorporated into these models, such as season of the year, temperature and relative air humidity, among several others.
It is important to emphasize that the statistical approach taken in this article can bring benefits for the various production systems.