
The Birnbaum–Saunders (BS) distribution is a positively skewed distribution that is often used to analyze lifetime data. It has gained increasing popularity in fields such as air pollution, business, earth sciences, industry and medicine. We present some results on a reparameterized version of the BS distribution and a method for generating random numbers from it. This article examines three regression models based on the BS distribution. The first model is derived directly from the distribution. The second is derived from a reparameterization in terms of the mean. The third is derived through a logarithmic transformation of the response variable. The main purpose of this article is to compare the performance of these three BS regression models. Finally, an example with real data is presented to compare the regression models.

Introduction

Linear regression is one of the most widely used and effective statistical tools. The model is based on the method of least squares and has been in use for many years because of the connections that the standard linear regression model has with various fields of knowledge, especially statistics. However, the model suggested by Gauss is not applicable in research where the data give no indication of normality. A broader class of models allows the distribution of the response variable to belong to the exponential family. Such a model is known as a generalized linear model (GLM).

Some models are used for data that have positive support and are skewed, such as those based on the gamma or inverse Gaussian distribution. In general, working with these distributions is not easy. The Birnbaum–Saunders (BS) distribution is a two-parameter distribution obtained from a univariate transformation of the standard normal distribution. It was introduced to analyze skewed data with positive support. Its parameters are called shape and scale. This distribution is more flexible than the well-known distributions for analyzing this type of data. The BS distribution has been used to analyze the fatigue life of aluminum coupons, insurance data, the wear time of bearings and data on minerals in bones. We study regression models based on the BS distribution. Three approaches to the BS distribution, presented in [1], can be used to build flexible regression models. One of them is based on the logarithmic transformation of the response variable.

Here we examine three approaches. The first approach is based on a parameterization of the BS distribution that relates the mean of the distribution to the explanatory variables. The second approach is based on the logarithmic transformation of the response variable. The third approach directly relates the scale parameter of the BS distribution to the explanatory variables. It should be noted that the second and third approaches are equivalent; the difference lies mainly in applying a logarithmic transformation to the response variable. To compare the three regression methods, an example with material fatigue data is provided. The rest of this article is organized as follows. Section 2 presents the distribution, the relevant regression models and, in particular, an algorithm that condenses the steps needed for computational implementation.

The BS distribution is an important lifetime distribution that arose from a physical problem. It describes the total time that elapses until cumulative damage, produced by the initiation and growth of a dominant crack, exceeds a threshold and causes failure. Outside the context of reliability, the BS distribution has been applied in various other disciplines; more details about these applications and an extensive bibliographic review can be found in [2]. The BS distribution is defined in terms of the standard normal distribution through a transformation of a standard normal random variable.

As mentioned earlier, the BS distribution is a positively skewed distribution that is often used to analyze lifetime data. It is also widely used for regression analysis when auxiliary variables are involved in the life test. The generalized Birnbaum–Saunders (GBS) distributions are a newer class of positively skewed models with heavier or lighter tails than the standard BS distribution, used mainly in lifetime data analysis. However, the theoretical arguments and attractive features of the GBS model arguably allow it to be applied beyond the analysis of lifetime data. The BS distribution is unimodal, has positive skewness and positive support, and has two parameters that determine its shape and scale [3]–[5]. Interest in the BS distribution is due to its physical theoretical arguments, its attractive properties and its relationship to the normal model. Although the BS distribution originated in material fatigue, it has been applied in many areas, including agriculture, commerce, pollution, finance, engineering, forestry, the food and textile industries, informatics, medicine, insurance, mortality, microbiology, pharmacology, psychology, nutrition, queuing theory, quality control, water quality, toxicology and wind energy.

Linear regression is one of the most widely used statistical tools and originated with the method of least squares. The normal linear regression model has been connected with various fields of knowledge for many years. However, the model proposed by Gauss is not applicable in research where the data give no evidence of normality. To solve this problem, a wide class of models with a linear structure was defined in [5]. In these models the distribution of the response variable belongs to the exponential family, which gives the class useful properties. Such a model is known as a generalized linear model (GLM). Some GLMs are used to model data that have positive support and are skewed, such as those based on the gamma or inverse Gaussian distribution. In general, working with these distributions is not an easy task. The BS distribution is a two-parameter distribution obtained from a univariate transformation of the standard normal distribution. It was introduced to analyze skewed data with positive support. Its parameters are called shape and scale [7]. This distribution is more flexible than the well-known distributions for analyzing this type of data. Examples in which the BS distribution is used include the fatigue life of aluminum coupons, the analysis of insurance data, the wear time of bearings and data on minerals in bones.

In this study, we consider regression models based on the BS distribution. The three BS approaches presented in [3] can be used to build flexible regression models. The first approach is based on the logarithmic transformation of the response variable [2]. The second approach is based on a parameterization of the BS distribution that links the mean of the distribution to the explanatory variables [4]. The third approach, following [2], directly links the scale parameter of the BS distribution to the explanatory variables. It should be noted that the first and third approaches are equivalent; the difference lies mainly in applying a logarithmic transformation to the response variable. To compare the three regression methods, an example with material fatigue data is presented.

Background of Birnbaum–Saunders Distributions

Data whose response falls in the interval (0, 1), such as indices, proportions or rates, appear very frequently in different fields of knowledge, mainly the social sciences, engineering, economics and medicine. Some practical examples of these types of data are the proportion of patients who die from a certain disease or virus (SARS-CoV-2, diabetes, HIV or cancer) in a country or city; the Human Development Index or the illiteracy rate in a region or country; the proportion of deaths due to exposure to smoking or other exposure factors; the mortality rate from traffic accidents in a city; the percentage of items that do not meet the minimum requirements in an assembly line; and the portion of income that a family spends on entertainment. For the analysis of data such as those described above, statistical methodologies developed from distributions with support in the interval (0, 1) are required. In this sense, several probability distributions and regression models have been proposed; see Ferrari and Cribari-Neto [1], Kumaraswamy [2], Martínez-Flórez et al. [3], [4], Kieschnick and McCullough [5], Mazucheli et al. [6], [7].

The BS distribution depends on the standard normal model. Here we use the following expression:

$$\phi(z) = (2\pi)^{-1/2}\exp\left(-\tfrac{1}{2}z^{2}\right) \quad \text{and} \quad \Phi(z) = \int_{-\infty}^{z}\phi(u)\,du, \quad z \in \mathbb{R}.$$

As mentioned above, the BS distribution is continuous, unimodal and right-skewed, properties common to most lifetime distributions. A random variable $T$ that follows the BS distribution is denoted by $T \sim \mathrm{BS}(\alpha, \beta)$, where $\alpha > 0$ is the shape parameter and $\beta > 0$ is the scale parameter. With the notation $Z = \frac{1}{\alpha}\left[(T/\beta)^{1/2} - (\beta/T)^{1/2}\right]$, $Z$ follows the standard normal distribution. Solving for $T$ gives:

$$T = \frac{\beta}{4}\left[\alpha Z + \sqrt{(\alpha Z)^{2} + 4}\right]^{2}$$

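Here $T \sim \mathrm{BS}(\alpha, \beta)$. The cumulative distribution function (CDF) of a random variable $T$ that follows the two-parameter Birnbaum–Saunders distribution is:

$$F_T(t; \alpha, \beta) = \Phi\left[\frac{1}{\alpha}\left\{\left(\frac{t}{\beta}\right)^{1/2} - \left(\frac{\beta}{t}\right)^{1/2}\right\}\right], \quad t > 0,\ \alpha > 0,\ \beta > 0.$$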

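The density function of this distribution is:

$$f_T(t; \alpha, \beta) = \frac{1}{2\sqrt{2\pi}\,\alpha\beta}\left[\left(\frac{\beta}{t}\right)^{1/2} + \left(\frac{\beta}{t}\right)^{3/2}\right]\exp\left[-\frac{1}{2\alpha^{2}}\left(\frac{t}{\beta} + \frac{\beta}{t} - 2\right)\right], \quad t, \alpha, \beta > 0.$$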

In addition, the following properties hold: $E(T) = \frac{\beta}{2}(\alpha^{2} + 2)$, $\mathrm{Var}(T) = \frac{\beta^{2}}{4}(5\alpha^{4} + 4\alpha^{2})$, $cT \sim \mathrm{BS}(\alpha, c\beta)$ for $c > 0$, and $T^{-1} \sim \mathrm{BS}(\alpha, \beta^{-1})$. These properties are useful for statistical purposes such as generating random numbers, estimating parameters and regression modeling.
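The relation between $T$ and a standard normal variable suggests a simple way to generate random numbers from the BS distribution. The following is a minimal sketch using only the Python standard library. The function names (`bs_sample`, `bs_cdf`, `bs_mean`, `bs_var`), the seed and the parameter values are illustrative assumptions, not part of the original study. The moment formulas assume the standard BS results $E(T) = \beta(1 + \alpha^2/2)$ and $\mathrm{Var}(T) = (\alpha\beta)^2(1 + 5\alpha^2/4)$.

```python
import math
import random
from statistics import NormalDist, fmean


def bs_sample(n, alpha, beta, rng):
    """Draw n values from BS(alpha, beta) by transforming standard normals."""
    out = []
    for _ in range(n):
        az = alpha * rng.gauss(0.0, 1.0)
        # T = (beta / 4) * [alpha*Z + sqrt((alpha*Z)^2 + 4)]^2
        out.append(beta / 4.0 * (az + math.sqrt(az * az + 4.0)) ** 2)
    return out


def bs_cdf(t, alpha, beta):
    """CDF of BS(alpha, beta) evaluated at t > 0."""
    return NormalDist().cdf((math.sqrt(t / beta) - math.sqrt(beta / t)) / alpha)


def bs_mean(alpha, beta):
    return beta / 2.0 * (alpha ** 2 + 2.0)


def bs_var(alpha, beta):
    return beta ** 2 / 4.0 * (5.0 * alpha ** 4 + 4.0 * alpha ** 2)


# Illustrative (hypothetical) parameter values and seed.
rng = random.Random(2024)
alpha, beta = 0.5, 2.0
sample = bs_sample(100_000, alpha, beta, rng)

sample_mean = fmean(sample)
sample_var = sum((t - sample_mean) ** 2 for t in sample) / len(sample)
share_below_beta = sum(t <= beta for t in sample) / len(sample)

print("mean:    ", sample_mean, "theory:", bs_mean(alpha, beta))
print("variance:", sample_var, "theory:", bs_var(alpha, beta))
print("P(T <= beta):", share_below_beta, "theory:", bs_cdf(beta, alpha, beta))
```

The sample mean and variance should be close to the theoretical values, and about half of the draws should fall below $\beta$, since $\beta$ is the median of the distribution.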

One of the main motivations for studying the reparameterized BS (RBS) distribution is that it provides an easier way to obtain statistical properties that are not available under the classical parameterization. Such a reparameterization is useful because, first, moment estimators for the original parameterization of the BS distribution do not have a closed form, whereas under the reparameterization they do [9] and, second, it allows a response variable to be modeled on its original scale [8], which is not possible with the parameterizations proposed until now. The objectives of this work are: (i) to provide some results on moments of a reparameterized version of the BS distribution and a generator of random numbers; (ii) to propose estimators for this reparameterization; (iii) to study the performance of these estimators; and (iv) to apply the results to real data. The estimators studied are based on the maximum likelihood (ML), moment, modified moment (MM) and generalized method of moments (GMM) methods.

Log-BS Distribution

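A real-valued random variable $Y$ follows this distribution if its density function has the form:

$$f_Y(y; \alpha, \gamma, \sigma) = \frac{2}{\alpha\sigma\sqrt{2\pi}}\cosh\left(\frac{y - \gamma}{\sigma}\right)\exp\left[-\frac{2}{\alpha^{2}}\sinh^{2}\left(\frac{y - \gamma}{\sigma}\right)\right], \quad y \in \mathbb{R},$$

where $\cosh(x) = (e^{x} + e^{-x})/2$ and $\sinh(x) = (e^{x} - e^{-x})/2$. Here $\alpha$ is the shape parameter, $\sigma$ is the scale parameter and $\gamma$ is the location parameter. For $Y = \log(T)$ with $T \sim \mathrm{BS}(\alpha, \beta)$, one has $\sigma = 2$ and $\gamma = \log(\beta)$.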
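The link between the BS and log-BS densities can be checked numerically. The sketch below assumes the standard sinh-normal form of the log-BS density with $\sigma = 2$ and $\gamma = \log(\beta)$, and verifies the change-of-variable identity $f_Y(y) = f_T(e^{y})\,e^{y}$. The function names and the parameter values are hypothetical and chosen only for illustration.

```python
import math


def bs_pdf(t, alpha, beta):
    """Density of BS(alpha, beta) at t > 0."""
    r = beta / t
    front = (math.sqrt(r) + r ** 1.5) / (2.0 * math.sqrt(2.0 * math.pi) * alpha * beta)
    return front * math.exp(-(t / beta + beta / t - 2.0) / (2.0 * alpha ** 2))


def log_bs_pdf(y, alpha, gamma, sigma=2.0):
    """Sinh-normal density; sigma = 2 gives the log-BS case."""
    u = (y - gamma) / sigma
    front = 2.0 / (alpha * sigma * math.sqrt(2.0 * math.pi))
    return front * math.cosh(u) * math.exp(-2.0 / alpha ** 2 * math.sinh(u) ** 2)


# Illustrative (hypothetical) parameter values.
alpha, beta = 0.8, 3.0
gamma = math.log(beta)

for y in (-1.0, 0.5, 1.0, 2.5):
    t = math.exp(y)
    direct = log_bs_pdf(y, alpha, gamma)
    via_bs = bs_pdf(t, alpha, beta) * t  # Jacobian of t = exp(y) is t itself
    print(f"y={y:5.2f}  log-BS pdf={direct:.8f}  BS pdf * t={via_bs:.8f}")
```

Both columns should agree up to floating-point error, which is why modeling $\log(T)$ with the log-BS distribution is equivalent to modeling $T$ with the BS distribution.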

Regression Model

In this section, we describe the regression model. Methods for assessing the validity of a regression model include comparing predictions and model coefficients with theory, collecting new data to check model predictions, comparing results with theoretical model calculations, and data splitting or cross-validation, in which most of the data are used to estimate the model coefficients and the rest are used to measure the accuracy of the model's predictions. Below is a descriptive overview of these methods.

Multivariate regression models (MRMs) are widely used in research in engineering, finance, health, agriculture and the biological sciences, mainly for two purposes: prediction and effect estimation. When building a regression model, the following strategies have been recommended:

  1. Use an appropriate statistical method that matches the data structure.
  2. Avoid overfitting the model, or correct for it.
  3. Ensure an adequate sample size by limiting the number of variables according to the number of events.
  4. Always evaluate the performance of the final model in terms of calibration and discrimination. If resources allow, validate the prediction model on external data.
  5. Be aware of the pitfalls associated with automated variable selection methods (such as stepwise selection).

BS Regression Model

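Let $T_1, T_2, \ldots, T_n$ be independent random variables with $T_i \sim \mathrm{BS}(\alpha, \theta_i)$, and let $t = (t_1, t_2, \ldots, t_n)^{\top}$ be the corresponding observations.

In addition, $x_i = (1, x_{i1}, x_{i2}, \ldots, x_{ip})^{\top}$ contains the values of $p$ deterministic explanatory variables, and $\beta = (\beta_0, \beta_1, \beta_2, \ldots, \beta_p)^{\top}$ is the vector of regression coefficients, so that $\theta_i = \exp(x_i^{\top}\beta)$.

The BS regression model studied by Wafa et al. (2019) [9] can be written in the form:

$$T_i = \theta_i\,\phi_i = \exp(x_i^{\top}\beta)\,\phi_i, \quad i = 1, 2, \ldots, n,$$

where $\phi_i \sim \mathrm{BS}(\alpha, 1)$ is a multiplicative error term and $T_i \sim \mathrm{BS}(\alpha, \theta_i)$.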

Log-BS Regression Model

Let $T_1, T_2, \ldots, T_n$ be independent random variables with $T_i \sim \mathrm{BS}(\alpha, \theta_i)$, and let $t = (t_1, t_2, \ldots, t_n)^{\top}$ be the corresponding observations.

Furthermore, let $Y_i = \log(T_i)$, so that $Y_i \sim \text{log-BS}(\alpha, \mu_i)$, where $\mu_i = \log(\theta_i)$, and let $y = (y_1, y_2, \ldots, y_n)^{\top}$ with $y_i = \log(t_i)$. Applying the logarithm to the BS regression model gives the regression:

$$Y_i = \log(T_i) = \mu_i + \epsilon_i = x_i^{\top}\beta + \epsilon_i, \quad i = 1, 2, \ldots, n,$$

where $\mu_i = x_i^{\top}\beta$, $\epsilon_i = \log(\phi_i) \sim \text{log-BS}(\alpha, 0)$, and $\beta$ and $x_i$ are defined as in the BS regression model.
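The relation between the BS and log-BS regression models can be illustrated numerically. The sketch below simulates data from a BS regression with one hypothetical covariate and evaluates both log-likelihoods at two arbitrary parameter points. Under the assumption that the log-BS density is the sinh-normal density with $\sigma = 2$, the two log-likelihoods differ only by $\sum_i \log(t_i)$, which does not depend on the parameters. All names, coefficient values and the seed are illustrative assumptions.

```python
import math
import random


def bs_logpdf(t, alpha, theta):
    """Log-density of BS(alpha, theta) at t > 0."""
    r = theta / t
    return (math.log(math.sqrt(r) + r ** 1.5)
            - math.log(2.0 * math.sqrt(2.0 * math.pi) * alpha * theta)
            - (t / theta + theta / t - 2.0) / (2.0 * alpha ** 2))


def log_bs_logpdf(y, alpha, mu):
    """Log-density of log-BS(alpha, mu), i.e. sinh-normal with sigma = 2."""
    u = (y - mu) / 2.0
    return (math.log(math.cosh(u))
            - math.log(alpha * math.sqrt(2.0 * math.pi))
            - 2.0 / alpha ** 2 * math.sinh(u) ** 2)


def simulate(xs, alpha, b0, b1, rng):
    """Simulate T_i = exp(b0 + b1 * x_i) * phi_i with phi_i ~ BS(alpha, 1)."""
    ts = []
    for x in xs:
        az = alpha * rng.gauss(0.0, 1.0)
        phi = 0.25 * (az + math.sqrt(az * az + 4.0)) ** 2
        ts.append(math.exp(b0 + b1 * x) * phi)
    return ts


def loglik_bs(ts, xs, alpha, b0, b1):
    return sum(bs_logpdf(t, alpha, math.exp(b0 + b1 * x)) for t, x in zip(ts, xs))


def loglik_log_bs(ts, xs, alpha, b0, b1):
    return sum(log_bs_logpdf(math.log(t), alpha, b0 + b1 * x) for t, x in zip(ts, xs))


# Hypothetical design, coefficients and seed.
rng = random.Random(7)
xs = [i / 20 for i in range(40)]
ts = simulate(xs, alpha=1.0, b0=0.2, b1=-1.5, rng=rng)
sum_log_t = sum(math.log(t) for t in ts)

gaps = []
for params in [(1.0, 0.2, -1.5), (0.6, -0.3, 0.4)]:
    gap = loglik_log_bs(ts, xs, *params) - loglik_bs(ts, xs, *params)
    gaps.append(gap)
    print(params, "gap:", gap, "sum of log t:", sum_log_t)
```

Because the gap is the same constant at every parameter point, both likelihoods are maximized by the same values of $\alpha$ and $\beta$.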

A New Parameterization for the BS Distribution

The RBS distribution is obtained from the original parameterization by setting $\alpha = \sqrt{2/\delta}$ and $\beta = \delta\mu/[\delta + 1]$, such that $\delta = 2/\alpha^{2}$ and $\mu = \beta[1 + \alpha^{2}/2]$, where $\delta > 0$ and $\mu > 0$ are the shape and mean parameters, respectively. Motivations and justifications for this new parameterization are detailed in the next section. Thus, if $Y \sim \mathrm{RBS}(\mu, \delta)$, then its PDF is given by:

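$$f(y; \mu, \delta) = \frac{\exp(\delta/2)\sqrt{\delta + 1}}{4\sqrt{\pi\mu}\,y^{3/2}}\left[y + \frac{\delta\mu}{\delta + 1}\right]\exp\left(-\frac{\delta}{4}\left[\frac{\{\delta + 1\}y}{\delta\mu} + \frac{\delta\mu}{\{\delta + 1\}y}\right]\right), \quad y > 0.$$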

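From the relation between the BS and standard normal distributions, one can note that RBS and standard normal random variables are related by:

$$Y = \frac{\delta\mu}{\delta + 1}\left[\frac{Z}{\sqrt{2\delta}} + \sqrt{\left(\frac{Z}{\sqrt{2\delta}}\right)^{2} + 1}\right]^{2} \quad \text{and} \quad Z = \sqrt{\frac{\delta}{2}}\left[\sqrt{\frac{\{\delta + 1\}Y}{\delta\mu}} - \sqrt{\frac{\delta\mu}{\{\delta + 1\}Y}}\right].$$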

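Thus, from the above relations, the cumulative distribution function (CDF) and the quantile function (QF) of $Y \sim \mathrm{RBS}(\mu, \delta)$ are given by:

$$F(y; \mu, \delta) = \Phi\left(\sqrt{\frac{\delta}{2}}\left[\sqrt{\frac{\{\delta + 1\}y}{\delta\mu}} - \sqrt{\frac{\delta\mu}{\{\delta + 1\}y}}\right]\right), \quad y > 0,$$

and

$$y(q; \mu, \delta) = F^{-1}(q) = \frac{\delta\mu}{\delta + 1}\left[\frac{z(q)}{\sqrt{2\delta}} + \sqrt{\left\{\frac{z(q)}{\sqrt{2\delta}}\right\}^{2} + 1}\right]^{2}, \quad 0 < q < 1,$$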

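where $z(q)$ is the $q$th quantile of the standard normal distribution and $F^{-1}$ is the inverse CDF of $Y$. The hazard rate function of $Y$ is defined by:

$$h(y; \mu, \delta) = \frac{\dfrac{\exp(\delta/2)\sqrt{\delta + 1}}{4\sqrt{\pi\mu}\,y^{3/2}}\left[y + \dfrac{\delta\mu}{\delta + 1}\right]\exp\left(-\dfrac{\delta}{4}\left[\dfrac{\{\delta + 1\}y}{\delta\mu} + \dfrac{\delta\mu}{\{\delta + 1\}y}\right]\right)}{\Phi\left(-\sqrt{\dfrac{\delta}{2}}\left[\sqrt{\dfrac{\{\delta + 1\}y}{\delta\mu}} - \sqrt{\dfrac{\delta\mu}{\{\delta + 1\}y}}\right]\right)}, \quad y > 0.$$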
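The RBS density, CDF, quantile function and hazard rate can be evaluated with a few lines of code. The following is a minimal sketch using only the Python standard library (`statistics.NormalDist` supplies $\Phi$ and its inverse). It assumes the hazard rate is the density divided by the survival function $1 - F(y) = \Phi(-\cdot)$. The function names and parameter values are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def rbs_z(y, mu, delta):
    """Standard normal variate associated with y under RBS(mu, delta)."""
    ratio = (delta + 1.0) * y / (delta * mu)
    return math.sqrt(delta / 2.0) * (math.sqrt(ratio) - 1.0 / math.sqrt(ratio))


def rbs_pdf(y, mu, delta):
    ratio = (delta + 1.0) * y / (delta * mu)
    front = (math.exp(delta / 2.0) * math.sqrt(delta + 1.0)
             / (4.0 * math.sqrt(math.pi * mu) * y ** 1.5))
    return front * (y + delta * mu / (delta + 1.0)) * math.exp(-delta / 4.0 * (ratio + 1.0 / ratio))


def rbs_cdf(y, mu, delta):
    return STD_NORMAL.cdf(rbs_z(y, mu, delta))


def rbs_quantile(q, mu, delta):
    w = STD_NORMAL.inv_cdf(q) / math.sqrt(2.0 * delta)
    return delta * mu / (delta + 1.0) * (w + math.sqrt(w * w + 1.0)) ** 2


def rbs_hazard(y, mu, delta):
    # Survival function 1 - F(y) written as Phi(-z) for numerical stability.
    return rbs_pdf(y, mu, delta) / STD_NORMAL.cdf(-rbs_z(y, mu, delta))


# Illustrative (hypothetical) parameter values.
mu, delta = 3.0, 4.0
for q in (0.1, 0.5, 0.9):
    y_q = rbs_quantile(q, mu, delta)
    print(f"q={q:.1f}  quantile={y_q:.6f}  cdf(quantile)={rbs_cdf(y_q, mu, delta):.6f}  "
          f"hazard={rbs_hazard(y_q, mu, delta):.6f}")
```

Applying the CDF to a quantile should return the original probability, and the median equals $\delta\mu/(\delta + 1)$, which is smaller than the mean $\mu$.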

Regression Models Based on BS Distributions

Rieck and Nedelman showed that if $Y \sim \mathrm{BS}(\alpha, \beta)$, then $Z = \log(Y)$ follows a log-BS distribution with shape parameter $\alpha > 0$ and location parameter $\gamma = \log(\beta) \in \mathbb{R}$, denoted by $Z \sim \text{log-BS}(\alpha, \gamma)$. In this regression model, the original response must be transformed to a logarithmic scale. Thus, although on this scale one is modeling the mean, $\gamma = \log(\beta)$, on the natural scale one is modeling $\beta = \exp(\gamma)$, which in the BS case corresponds to the median.

Reference [8] proposed a new approach to the BS regression model. Formally, let $Y_1, \ldots, Y_n$ be independent random variables, where $Y_i \sim \mathrm{RBS}(\mu_i, \delta)$ for $i = 1, \ldots, n$, and let $y = [y_1, \ldots, y_n]^{\top}$ be the observations. Then, a statistical model with the following systematic component can be defined:

$$g(\mu_i) = \eta_i = x_i^{\top}\beta, \quad i = 1, 2, \ldots, n,$$

where $\beta = [\beta_1, \ldots, \beta_p]^{\top}$, for $p < n$, is a vector of unknown parameters to be estimated, and $x_i = [1, x_{i2}, \ldots, x_{ip}]^{\top}$ represents the values of $p$ regressors, such that $\mu_i = g^{-1}(x_i^{\top}\beta)$, with $g^{-1}(\cdot)$ being the inverse function of $g(\cdot)$. In this model, the link function $g: \mathbb{R} \to \mathbb{R}^{+}$ is strictly monotone, positive and at least twice differentiable; for example, $g(\mu) = \log(\mu)$ or $g(\mu) = \mu$.

Formally, $\mathrm{Var}[Y_i]$ is a function of $\mu_i$ and, consequently, of the regressors $x_i$. Then, because we are modeling the mean through a particular structure, we are also modeling the variance. Therefore, situations in which the variance is not constant can be analyzed with this model. The log-likelihood function of the model $g(\mu_i) = \eta_i = x_i^{\top}\beta$ for $\theta = [\beta^{\top}, \delta]^{\top}$ is $\ell(\theta; y) = \sum_{i=1}^{n}\ell_i(\mu_i, \delta; y_i)$, where

$$\ell_i(\mu_i, \delta; y_i) = \frac{\delta}{2} - \frac{1}{2}\log(16\pi) - \frac{1}{2}\log\left(\frac{\{\delta + 1\}\mu_i y_i^{3}}{\{\delta y_i + y_i + \delta\mu_i\}^{2}}\right) - \frac{\{\delta + 1\}y_i}{4\mu_i} - \frac{\mu_i\delta^{2}}{4\{\delta + 1\}y_i}.$$

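The score functions for $\beta_j$, with $j = 1, \ldots, p$, and $\delta$ are, respectively, given by:

$$U_{\beta_j}(\theta) = \frac{\partial\ell(\theta)}{\partial\beta_j} = \sum_{i=1}^{n}\left[-\frac{1}{2\mu_i} + \frac{\delta}{\delta y_i + y_i + \delta\mu_i} + \frac{\{\delta + 1\}y_i}{4\mu_i^{2}} - \frac{\delta^{2}}{4\{\delta + 1\}y_i}\right]\frac{1}{g'(\mu_i)}x_{ij} = \sum_{i=1}^{n}\left[y_i^{*} - \mu_i^{*}\right]a_i x_{ij}$$

and

$$U_{\delta}(\theta) = \frac{\partial\ell(\theta)}{\partial\delta} = \sum_{i=1}^{n}\left[\frac{\delta}{2\{\delta + 1\}} + \frac{y_i + \mu_i}{\delta y_i + y_i + \delta\mu_i} - \frac{y_i}{4\mu_i} - \frac{\delta\{\delta + 2\}\mu_i}{4\{\delta + 1\}^{2}y_i}\right] = \sum_{i=1}^{n}\left[y_i^{\dagger} - \mu_i^{\dagger}\right],$$

where $g'(\cdot)$ is the derivative of $g(\cdot)$ and $a_i = 1/g'(\mu_i)$. In matrix form, we have:

$$U_{\beta}(\theta) = X^{\top}A\left[y^{*} - \mu^{*}\right] \quad \text{and} \quad U_{\delta}(\theta) = \mathbf{1}^{\top}\left[y^{\dagger} - \mu^{\dagger}\right],$$

where

$$y_i^{*} = \frac{\delta}{\delta y_i + y_i + \delta\mu_i} + \frac{\{\delta + 1\}y_i}{4\mu_i^{2}} - \frac{\delta^{2}}{4\{\delta + 1\}y_i}, \quad \mu_i^{*} = \frac{1}{2\mu_i},$$

$$y_i^{\dagger} = \frac{y_i + \mu_i}{\delta y_i + y_i + \delta\mu_i} - \frac{y_i}{4\mu_i} - \frac{\delta\{\delta + 2\}\mu_i}{4\{\delta + 1\}^{2}y_i}, \quad \mu_i^{\dagger} = -\frac{\delta}{2[\delta + 1]},$$

$\mathbf{1} = [1, \ldots, 1]^{\top}$ is a vector of ones with $n$ elements, $X = (x_1, \ldots, x_n)^{\top}$, with $x_i$ as given in the model $g(\mu_i) = \eta_i = x_i^{\top}\beta$ for $i = 1, \ldots, n$, $[y^{*} - \mu^{*}] = [\{y_1^{*} - \mu_1^{*}\}, \ldots, \{y_n^{*} - \mu_n^{*}\}]^{\top}$, $[y^{\dagger} - \mu^{\dagger}] = [\{y_1^{\dagger} - \mu_1^{\dagger}\}, \ldots, \{y_n^{\dagger} - \mu_n^{\dagger}\}]^{\top}$, and $A = [a_i\delta_{ij}^{n}] = \mathrm{diag}(a)$ is an $n \times n$ matrix, with $a = [a_1, \ldots, a_n]^{\top}$ and $\delta_{ij}^{n}$ being the Kronecker delta, a function of the three arguments $i$, $j$ and $n$. If $i$ and $j$ are equal, the function takes the value 1; otherwise it is equal to zero. The value $n$ is the order of the square matrix. Thus, the score vector is $U(\theta) = [U_{\beta}(\theta)^{\top}, U_{\delta}(\theta)]^{\top}$.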
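The score functions can be checked against numerical derivatives of the log-likelihood. The sketch below assumes a log link, $g(\mu) = \log(\mu)$, so that $1/g'(\mu_i) = \mu_i$, and a single covariate plus an intercept. The data values, coefficients and function names are hypothetical and serve only to illustrate the formulas.

```python
import math


def rbs_loglik_i(y, mu, delta):
    """Log-density of RBS(mu, delta) at y > 0."""
    return (delta / 2.0 - 0.5 * math.log(16.0 * math.pi)
            - 0.5 * math.log((delta + 1.0) * mu * y ** 3 / (delta * y + y + delta * mu) ** 2)
            - (delta + 1.0) * y / (4.0 * mu)
            - mu * delta ** 2 / (4.0 * (delta + 1.0) * y))


def loglik(b0, b1, delta, ys, xs):
    # Log link: mu_i = exp(b0 + b1 * x_i).
    return sum(rbs_loglik_i(y, math.exp(b0 + b1 * x), delta) for y, x in zip(ys, xs))


def score(b0, b1, delta, ys, xs):
    """Analytic score vector (U_b0, U_b1, U_delta) under the log link."""
    u0 = u1 = ud = 0.0
    for y, x in zip(ys, xs):
        mu = math.exp(b0 + b1 * x)
        s = delta * y + y + delta * mu
        d_mu = (-1.0 / (2.0 * mu) + delta / s + (delta + 1.0) * y / (4.0 * mu ** 2)
                - delta ** 2 / (4.0 * (delta + 1.0) * y))
        a = mu  # 1 / g'(mu) for g = log
        u0 += d_mu * a
        u1 += d_mu * a * x
        ud += (delta / (2.0 * (delta + 1.0)) + (y + mu) / s - y / (4.0 * mu)
               - delta * (delta + 2.0) * mu / (4.0 * (delta + 1.0) ** 2 * y))
    return u0, u1, ud


# Hypothetical data and parameter values, for illustration only.
ys = [0.8, 1.9, 0.4, 3.1, 2.2]
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
b0, b1, delta = 0.3, 0.2, 2.5

h = 1e-6
numeric = (
    (loglik(b0 + h, b1, delta, ys, xs) - loglik(b0 - h, b1, delta, ys, xs)) / (2.0 * h),
    (loglik(b0, b1 + h, delta, ys, xs) - loglik(b0, b1 - h, delta, ys, xs)) / (2.0 * h),
    (loglik(b0, b1, delta + h, ys, xs) - loglik(b0, b1, delta - h, ys, xs)) / (2.0 * h),
)
analytic = score(b0, b1, delta, ys, xs)
print("analytic:", analytic)
print("numeric: ", numeric)
```

Agreement between the two vectors gives some assurance that the score expressions match the log-likelihood. In practice the maximum likelihood estimates are the values at which all three components are zero.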

RBS Regression Model

Let $T_1, T_2, \ldots, T_n$ be independent random variables with $T_i \sim \mathrm{RBS}(\eta_i, \sigma)$, $i = 1, 2, \ldots, n$, and let $t = (t_1, t_2, \ldots, t_n)^{\top}$ be the corresponding observations. The RBS regression model has a systematic component of the form:

$$h(\eta_i) = x_i^{\top}\beta, \quad i = 1, 2, \ldots, n.$$

In the above expression, $\eta_i = h^{-1}(x_i^{\top}\beta)$ and $h: \mathbb{R} \to \mathbb{R}^{+}$.

The link function $h$ is strictly monotone, positive and at least twice differentiable [6].

In this paper, $h(\eta) = \log(\eta)$ is used. It is noteworthy that the variance of $T_i$ is a function of $\eta_i$. Therefore, although the mean is being modeled, the variance is also modeled, by the simple fact that $\mathrm{Var}(T_i) = \eta_i^{2}/\varphi$, where $\varphi = (\sigma + 1)^{2}/(2\sigma + 5)$.
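The variance function can be illustrated with a short calculation. The sketch below writes the shape parameter as `delta` (denoted $\sigma$ in this section) and assumes that $\mathrm{Var}(T) = \mu^{2}(2\delta + 5)/(\delta + 1)^{2}$, which follows from substituting $\alpha = \sqrt{2/\delta}$ and $\beta = \delta\mu/(\delta + 1)$ into the classical BS variance. The regression coefficients are hypothetical.

```python
import math


def rbs_variance(mu, delta):
    """Variance of RBS(mu, delta) in terms of the mean and shape."""
    return mu ** 2 * (2.0 * delta + 5.0) / (delta + 1.0) ** 2


def classical_variance(alpha, beta):
    """Variance of BS(alpha, beta) in the classical parameterization."""
    return beta ** 2 / 4.0 * (5.0 * alpha ** 4 + 4.0 * alpha ** 2)


delta = 4.0
rows = []
for x in (0.0, 0.5, 1.0):
    mu = math.exp(0.1 + 1.2 * x)  # log link with hypothetical coefficients
    alpha = math.sqrt(2.0 / delta)
    beta = delta * mu / (delta + 1.0)
    var_rbs = rbs_variance(mu, delta)
    var_bs = classical_variance(alpha, beta)
    cv = math.sqrt(var_rbs) / mu
    rows.append((mu, var_rbs, var_bs, cv))
    print(f"x={x:.1f}  mean={mu:.4f}  var(RBS)={var_rbs:.4f}  var(BS)={var_bs:.4f}  CV={cv:.4f}")
```

The variance grows with the square of the mean, while the coefficient of variation depends only on the shape parameter and stays constant across observations.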

Example of Material Fatigue Data

The regression models presented above are applied to a real data set on the failure time of ten thousand hardened steel pieces placed at four different stress levels. These data were obtained from the laboratories of the Princeton Mobil Research and Development Company at Princeton University in the United States of America.

Table I provides descriptive statistics for the failure-time data set, including measures of central tendency, the standard deviation (SD), the coefficient of variation (CV) and the coefficient of skewness (CS).

| Minimum | Mean | Median | SD | CS | CV (%) | Maximum |
|---|---|---|---|---|---|---|
| 0.013 | 3.76 | 0.85 | 7.43 | 3.2 | 198.2 | 37.5 |

Table I. Descriptive Statistics of the Response Variable (Failure Time)


Table II shows the estimates, the corresponding standard errors and the p-values of the t-test for the parameters of the log-BS, BS and RBS regression models. In addition, the AIC and BIC values associated with these models are reported. It can be seen from Table II that the estimates of $\beta_1$ are nearly identical across the models. However, the estimates of $\beta_0$ differ: the BS and log-BS models give 0.0978 and 0.0977, respectively, while the RBS model gives 0.6961. The BS and RBS models have lower AIC and BIC values than the log-BS model. That is, the RBS and BS models are more suitable for fitting these data.

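| Model | Parameter | Estimate | Standard error | p-value | BIC | AIC |
|---|---|---|---|---|---|---|
| BS | $\alpha$ | 1.279 | 0.1438 | < 0.001 | 130.18 | 125.24 |
| | $\beta_0$ | 0.0978 | 0.1707 | 0.566 | | |
| | $\beta_1$ | 14.1163 | 1.5714 | < 0.001 | | |
| Log-BS | $\alpha$ | 1.279 | 0.1438 | < 0.001 | 134.30 | 129.11 |
| | $\beta_0$ | 0.0977 | 0.1707 | 0.566 | | |
| | $\beta_1$ | 14.1163 | 1.5714 | < 0.001 | | |
| RBS | $\alpha$ | 1.22 | 0.2748 | < 0.001 | 130.18 | 125.24 |
| | $\beta_0$ | 0.6961 | 0.1935 | < 0.001 | | |
| | $\beta_1$ | 14.1170 | 1.5718 | < 0.001 | | |

Table II. ML Estimates, Standard Errors, and P-values of Specified Models for Downtime Data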

It is noteworthy that the intercept ($\beta_0$) in the log-BS and BS models is not statistically significant at the 5% level.

This means that this parameter is not influential in these models. In addition, the BIC and AIC values for the BS and RBS models are equal, and they are lower than the values observed for the log-BS model. That is, the RBS and BS models are more suitable for fitting these data.

Conclusion

In this research, three Birnbaum–Saunders regression models have been compared. These models are based on the BS distribution, the reparameterized BS (RBS) distribution and the log-BS distribution. For the real failure-time data, the AIC and BIC values of the RBS and BS models are equal to each other and lower than the values observed for the log-BS model. That is, the RBS and BS models fit these data better. The numerical results on the performance of these regressions show that different models can be used when dealing with regression data that have positive support and follow a right-skewed distribution.

References

  1. Ferrari S, Cribari-Neto F. Beta regression for modelling rates and proportions. J Appl Stat. 2004 Aug 1;31(7):799–815.
  2. Kumaraswamy P. A generalized probability density function for double-bounded random processes. J Hydrol. 1980 Mar 1;46(1–2):79–88.
  3. Martínez-Flórez G, Tovar-Falón R. Regression models based on the unit Sinh-normal distribution. Math. 2021;9:1231.
  4. Martínez-Flórez G, Azevedo-Farias RB, Tovar-Falón R. New class of unit-power-skew-normal distribution and its associated regression model for bounded responses. Math. 2022;10:3035.
  5. Kieschnick R, McCullough BD. Regression analysis of variates observed on (0,1). Stat Model. 2003;3:193–213.
  6. Mazucheli J, Menezes AFB, Ghitany ME. The unit-Weibull distribution and associated inference. J Appl Probab Stat. 2018;13:1–22.
  7. Wafa MN, Hussani SA, Pazhman J. Evaluation of students’ mathematical ability in Afghanistan’s schools using cognitive diagnosis models. Eurasia J Math Sci Technol Educ. 2020;16(6):57–64.
  8. Wafa MN. Assessing school students’ mathematic ability using DINA and DINO models. Int J Math Trends Technol. 2019;65(12):153–65.
  9. Wafa MN, Zia Z, Frozan F. Consistency and ability of students using DINA and DINO models. Eur J Math Stat. 2023 Jul 6;4(4):7–13.

