
One missing value problem in Latin square design of any order: Exact analysis of variance

Kittiwat Sirikasemsuk & Kanogkan Leerojanaprapa
Article: 1411222 | Received 04 Sep 2017, Accepted 25 Nov 2017, Published online: 18 Dec 2017

Abstract

This research proposes a simplified exact approach, based on the general linear model, for analyzing a K × K Latin square design (LSD) with one replicate and one missing value, a case for which ready-made formulas for the component sums of squares (the sub-variance) have been lacking. Under the proposed scheme, the effect of the potential variable (the treatment factor) is determined from the regression sums of squares of the full and reduced-treatment models. The resulting expressions apply to an LSD of any order with one missing value. Moreover, the treatment, row and column sums of squares are unbiased.

Public Interest Statement

Based on Fisher’s principles, a classical experimental design (e.g. one-way ANOVA, Latin square design (LSD), two-level factorial design, fractional factorial design, and so on) is a powerful methodology for explaining causal mechanisms between independent variables and a response variable by identifying the sources of variation in the data. The LSD is of great use for analyzing one potential variable and two block variables. A single missing observation can, however, pose significant challenges to the analysis. In this research, an incomplete LSD of any order with one missing observation is analyzed with the exact approach based on the general linear model. Because ready-made formulas have been lacking, this paper proposes explicit mathematical formulas for the treatment sum of squares, which make it easy to compare mean squares by means of an F-test.

1. Introduction

In science and engineering, design of experiments (DOE) refers to the experimental situations or strategies for analyzing quantitative responses associated with experimental units. DOE is classified into various types, including the classical DOE based on Fisher’s principles, the Shainin experiment and the Taguchi experiment. Specifically, the DOE based on Fisher’s principles involves randomization, replication and blocking (Montgomery, 2008). In the design and improvement of products and production, the role of experimentation is to identify the influencing factors (determinants) of the response variable and to manipulate them so that the response closely matches the desired nominal value. In fact, Fisher’s classical DOE is a form of statistical hypothesis testing under the analysis of variance (ANOVA) (Speed, 1992). ANOVA, in turn, is defined as a collection of statistical procedures for comparing the between-group variation with the within-group variation (Montgomery, 2008).

A Latin square design (LSD) is an efficient design of experiments for three factors, in which only one factor is of primary interest (the potential variable) while the other two (the nuisance variables or factors) are blocked to restrain extraneous variability in the experimental units. The term “Latin square design” is abbreviated to “LSD” in this research. Latin letters symbolize the levels of the factor of primary interest. In the LSD, the levels of the two nuisance variables are identified with the rows and columns of a two-way table; every level of the factor of primary interest appears once in each column and once in each row; and the two-factor and three-factor interaction effects are assumed to be absent. Like the randomized complete block design (RCBD), in which the effect of a single nuisance variable is blocked, the LSD uses blocking to separate the variation due to the nuisance variables from the experimental error. Unlike in the LSD, in the Latin rectangle design (Mead, Gilmour, & Mead, 2012) the numbers of columns and rows (blocks) are not identical for the two nuisance factors, and the Latin letters in each row (or column) can be replicated. Youden (1937) proposed the Youden square design, or distinct Latin rectangle design, in which the number of blocks on one side is greater than on the other, and the number of treatments (Latin letters) equals the number of blocks on the larger side.

In a real scientific test, experimenters may face a difficult situation in which a set of experimental observations is not complete. Incomplete observations commonly arise in two situations: (1) the observations are incomplete by design, because of a limitation on the number of experimental units (i.e. material units, articles, or subjects), and (2) observations are lost by accident. In the first situation, the arrangement may be balanced or unbalanced. For instance, Youden (1937), Yates (1936), and Ai, Li, Liu, and Lin (2013) respectively proposed the Youden square design, the balanced incomplete block design (BIBD), and the balanced incomplete Latin square design (BILSD). Such a balanced arrangement simplifies the ANOVA, since simple formulas give the treatment and error sums of squares. In the second situation, which may result from poor control of some variables, the readings from the experiment are abnormal or not observed at all. Such values may then be dropped from the set of observations, leading to an unbalanced or asymmetrical arrangement. It is important to note that there is no general ready-made formula for the ANOVA of an experimental design with incomplete observations.

The work of Allan and Wishart (1930) seems to be the earliest paper specifically devoted to the analysis of the incomplete-data problem, by means of differentiation based on the overall mean. In Yates (1933) and Sirikasemsuk (2016a), the non-iterative and iterative missing plot techniques were proposed, in which differential calculus is used to determine the value of the missing observation that minimizes the error sum of squares. The estimates of the missing observations, however, introduce an upward bias into the treatment sum of squares. The bias is therefore determined and subtracted from the initial treatment sum of squares (Little & Rubin, 2002). In Coons (1957), Cochran (1957) and Wilkinson (1958), the analysis of covariance (ANCOVA) technique was proposed for solving incomplete-data experimental designs. In fact, the earliest paper referring to the ANCOVA for this purpose was Bartlett (1937).

Table 1 summarizes the existing methods for solving incomplete-data experimental problems. The single imputation methods based on mean (or mode) substitution, listwise deletion and pairwise deletion are excluded.

Table 1. Existing methods to solve the incomplete-data experimental problems

Many recent research studies have considered aspects of combinatorics, for example the studies on the construction of orthogonal Latin squares by Zhang (2013) and Donovan and Şule Yazıcı (2014), and the studies on the completability of incomplete Latin squares from partial Latin squares by Euler (2010) and Casselgren and Häggkvist (2013).

In Table 1, all the methods except the exact approach must estimate the missing observations. Missing observations should, however, not be estimated, because the estimated values are not experiment-based. It is therefore advisable to adopt the exact approach with the general linear model for solving incomplete-data experimental design problems (Montgomery, 2008; Sirikasemsuk, 2016a). Specifically, this research proposes a simplified exact approach (the general regression significance test) for the K × K LSD with one replicate and one missing observation, where K is the order of the LSD.

This research is organized as follows: Section 1 is the introduction. Section 2 details the general ANOVA table for a complete LSD of order K × K and its components. Section 3 deals with a K × K LSD with one missing observation, the parameter estimates of the full effect model and its regression sum of squares, while Section 4 covers those of the reduced-treatment effect model. Section 5 derives the simplified formulas for the sums of squares. Concluding remarks are provided in Section 6. The notation is listed in the Appendix.

2. Analysis of variance in complete K × K LSD

The full effect model of $y_{ijk}$, given the complete K × K LSD, is expressed as

$$y_{ijk} = \mu + \omega_i + \tau_j + \lambda_k + \varepsilon_{ijk} \tag{1}$$

where $\varepsilon_{ijk}$ is independently, identically and normally distributed, i.e. $\varepsilon_{ijk} \sim N(0, \sigma^2)$.

Table 2 presents an example of an LSD of order K × K, whose components are summarized in the ANOVA table shown in Table 3.

Table 2. The complete Latin square design of K × K order

Table 3. The ANOVA table for the complete K × K LSD

The sums of squares for a Latin square experiment are expressed as

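$$SS_{tr} = K\sum_{j=1}^{K} \hat{\tau}_j^{2} = K\sum_{j=1}^{K} \left(\bar{y}_{\cdot j\cdot} - \bar{y}_{\cdot\cdot\cdot}\right)^{2} \tag{2}$$

$$SS_{row} = K\sum_{i=1}^{K} \hat{\omega}_i^{2} = K\sum_{i=1}^{K} \left(\bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot}\right)^{2} \tag{3}$$

$$SS_{column} = K\sum_{k=1}^{K} \hat{\lambda}_k^{2} = K\sum_{k=1}^{K} \left(\bar{y}_{\cdot\cdot k} - \bar{y}_{\cdot\cdot\cdot}\right)^{2} \tag{4}$$

$$SS_{total} = \sum_{i=1}^{K}\sum_{k=1}^{K} \left(y_{i(j)k} - \bar{y}_{\cdot\cdot\cdot}\right)^{2} \tag{5}$$

$$SS_{E} = SS_{total} - \left(SS_{tr} + SS_{row} + SS_{column}\right) \tag{6}$$

The factor K in Equations (2)–(4) counts the K observations that share each treatment, row or column level, so that the components in Equation (6) are on the same scale as $SS_{total}$.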
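As a minimal sketch of the complete-data case, the following Python function evaluates the sums of squares of this section, with each between-level sum scaled by K (the number of observations per level) so that the components add up to the total. The function name `complete_lsd_sums_of_squares`, the cyclic layout and the 3 × 3 numbers are illustrative assumptions and are not taken from the paper.

```python
def complete_lsd_sums_of_squares(y, letters):
    """Sums of squares for a complete K x K Latin square.

    y[i][k]       : response in row i, column k
    letters[i][k] : treatment index j (0..K-1) applied in that cell
    """
    K = len(y)
    grand_mean = sum(sum(row) for row in y) / K**2

    row_means = [sum(y[i]) / K for i in range(K)]
    col_means = [sum(y[i][k] for i in range(K)) / K for k in range(K)]
    trt_totals = [0.0] * K
    for i in range(K):
        for k in range(K):
            trt_totals[letters[i][k]] += y[i][k]
    trt_means = [t / K for t in trt_totals]

    def between(means):
        # K observations share each level, hence the factor K.
        return K * sum((m - grand_mean) ** 2 for m in means)

    ss = {
        "tr": between(trt_means),
        "row": between(row_means),
        "column": between(col_means),
        "total": sum((v - grand_mean) ** 2 for row in y for v in row),
    }
    ss["E"] = ss["total"] - (ss["tr"] + ss["row"] + ss["column"])
    return ss


# Made-up 3 x 3 example: a cyclic square with purely additive effects,
# so the error sum of squares should vanish.
letters = [[(i + k) % 3 for k in range(3)] for i in range(3)]
y = [[4, 9, 14], [8, 13, 9], [12, 8, 13]]
ss = complete_lsd_sums_of_squares(y, letters)
```

Because the made-up responses were generated without an error term, the treatment, row and column components account for the whole of the total sum of squares in this example.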

3. Incomplete LSD and regression sum of squares under the full model

For the missing-data LSD, the sums of squares in Equations (2)–(4) are invalid. The general regression significance test can instead be applied to the incomplete LSD for the ANOVA. According to Montgomery (2008), the computational formulas for the sums of squares of treatments, rows, columns and errors can respectively be expressed as

$$SS_{tr} = R(\mu,\omega,\tau,\lambda) - R(\mu,\omega,\lambda) \tag{7}$$

$$SS_{row} = R(\mu,\omega,\tau,\lambda) - R(\mu,\tau,\lambda) \tag{8}$$

$$SS_{column} = R(\mu,\omega,\tau,\lambda) - R(\mu,\omega,\tau) \tag{9}$$

$$SS_{E} = \sum_{i=1}^{K}\sum_{k=1}^{K} y_{i(j)k}^{2} - R(\mu,\omega,\tau,\lambda) \tag{10}$$

where $R(\mu,\tau,\lambda)$ and $R(\mu,\omega,\tau)$ are the regression sums of squares of the reduced effect models of $y_{ijk}$ in which the row effects and the column effects, respectively, are omitted. If one observation is missing, the degrees of freedom of $SS_{total}$ and $SS_{E}$ in Table 3 become $K^{2} - 2$ and $K^{2} - 3K + 1$, respectively.
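The general regression significance test described above can be sketched numerically: fit each model by least squares on the observed cells only, take the regression sum of squares as b'X'y, and difference the full and reduced fits. The sketch below is one possible implementation under stated assumptions, not the authors' procedure: the names `regression_ss` and `exact_anova`, the full-rank dummy coding, the exact rational arithmetic and the 3 × 3 data are all illustrative choices.

```python
from fractions import Fraction


def regression_ss(obs, factors):
    """Regression sum of squares R(.) = b'X'y for intercept + chosen factors.

    obs     : (row, treatment, column, y) tuples; the missing cell is absent
    factors : positions in the tuple to include (0 row, 1 treatment, 2 column)
    """
    K = 1 + max(o[0] for o in obs)

    def design_row(o):
        # Full-rank dummy coding: the last level of each factor is the baseline.
        x = [Fraction(1)]
        for f in factors:
            x += [Fraction(int(o[f] == level)) for level in range(K - 1)]
        return x

    X = [design_row(o) for o in obs]
    y = [Fraction(o[3]) for o in obs]
    p = len(X[0])
    xty = [sum(x[a] * yi for x, yi in zip(X, y)) for a in range(p)]
    # Augmented normal equations [X'X | X'y].
    A = [[sum(x[a] * x[b] for x in X) for b in range(p)] + [xty[a]]
         for a in range(p)]
    # Gauss-Jordan elimination in exact rational arithmetic.
    for col in range(p):
        pivot_row = next(r for r in range(col, p) if A[r][col] != 0)
        A[col], A[pivot_row] = A[pivot_row], A[col]
        pivot = A[col][col]
        A[col] = [v / pivot for v in A[col]]
        for r in range(p):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    b = [A[r][p] for r in range(p)]
    return sum(bi * ti for bi, ti in zip(b, xty))


def exact_anova(obs):
    """Full-minus-reduced sums of squares and the residual sum of squares."""
    full = regression_ss(obs, (0, 1, 2))
    return {
        "tr": full - regression_ss(obs, (0, 2)),      # treatments dropped
        "row": full - regression_ss(obs, (1, 2)),     # rows dropped
        "column": full - regression_ss(obs, (0, 1)),  # columns dropped
        "E": sum(Fraction(o[3]) ** 2 for o in obs) - full,
    }


# Made-up additive 3 x 3 cyclic square with the cell (row 0, column 0) missing.
y = [[None, 9, 14], [8, 13, 9], [12, 8, 13]]
obs = [(i, (i + k) % 3, k, y[i][k])
       for i in range(3) for k in range(3) if y[i][k] is not None]
ss = exact_anova(obs)
```

No value is imputed for the missing cell at any point; it is simply left out of the design matrix, which is the sense in which the approach is exact.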

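Meanwhile, the theoretical regression sum of squares of the full effect model of $y_{ijk}$ is expressed as

$$R(\mu,\omega,\tau,\lambda) = \hat{\mu}\,y_{\cdot\cdot\cdot} + \sum_{i=1}^{K} \hat{\omega}_i\,y_{i\cdot\cdot} + \sum_{j=1}^{K} \hat{\tau}_j\,y_{\cdot j\cdot} + \sum_{k=1}^{K} \hat{\lambda}_k\,y_{\cdot\cdot k} \tag{11}$$

In Sirikasemsuk (2016b), the estimated values of all parameters in Equation (11) were derived for the incomplete LSD with one missing observation, and the regression sum of squares of the full effect model of $y_{ijk}$ was expressed as

$$R(\mu,\omega,\tau,\lambda) = \frac{\sum_{\text{All } i}^{K} y_{i\cdot\cdot}^{2} + \sum_{\text{All } j}^{K} y_{\cdot j\cdot}^{2} + \sum_{\text{All } k}^{K} y_{\cdot\cdot k}^{2}}{K} + \frac{(1-K)\left(2y_{\cdot\cdot\cdot}^{2} - y_{sum\_m}^{2}\right) + \left(y_{sum\_m} - 2y_{\cdot\cdot\cdot}\right)^{2}}{K(K-1)(K-2)} \tag{12}$$

where $y_{sum\_m} = y_{r\cdot\cdot} + y_{\cdot m\cdot} + y_{\cdot\cdot c}$.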
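A short sketch of how Equation (12) might be evaluated from the marginal totals alone is given below. It reads the equation as the sum of squared totals over K plus a correction term with denominator K(K − 1)(K − 2); the function name `r_full` and the example totals (from a made-up additive 3 × 3 square with one cell removed) are assumptions for illustration.

```python
from fractions import Fraction


def r_full(row_tot, trt_tot, col_tot, r, m, c):
    """Regression sum of squares of the full model from marginal totals.

    All totals are taken over the K*K - 1 observed values only;
    r, m, c index the row, treatment and column of the missing cell.
    Requires K >= 3 because of the (K - 2) in the denominator.
    """
    K = len(row_tot)
    G = sum(row_tot)                              # grand total
    S = row_tot[r] + trt_tot[m] + col_tot[c]      # y_sum_m
    squares = sum(t * t for t in row_tot + trt_tot + col_tot)
    correction = (1 - K) * (2 * G * G - S * S) + (S - 2 * G) ** 2
    return Fraction(squares) / K + Fraction(correction) / (K * (K - 1) * (K - 2))


# Totals of a made-up additive 3 x 3 square after removing one cell.
value = r_full([23, 30, 33], [17, 30, 39], [20, 30, 36], r=0, m=0, c=0)
```

For these totals the eight observed responses are 9, 14, 8, 13, 9, 12, 8 and 13, whose squares sum to 968. Since the made-up data are exactly additive, the full model fits them without error and the regression sum of squares equals that raw sum of squares.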

To find the treatment sum of squares (see Equation (7)), the treatment effects $\tau_j$ are dropped from Equation (1), i.e. $\tau_j = 0$ for all j. The estimates of $\mu$, $\omega_i$ and $\lambda_k$ are then denoted $\hat{\mu}^{NT}$, $\hat{\omega}_i^{NT}$ and $\hat{\lambda}_k^{NT}$ instead of $\hat{\mu}$, $\hat{\omega}_i$ and $\hat{\lambda}_k$. Because the effects of the factor of primary interest are ignored, this linear statistical model of $y_{ijk}$ is referred to as “the reduced-treatment effect model” in this research. Its regression sum of squares, $R(\mu,\omega,\lambda)$, can be expressed as Equation (13).

$$R(\mu,\omega,\lambda) = \hat{\mu}^{NT}\,y_{\cdot\cdot\cdot} + \sum_{i=1}^{K} \hat{\omega}_i^{NT}\,y_{i\cdot\cdot} + \sum_{k=1}^{K} \hat{\lambda}_k^{NT}\,y_{\cdot\cdot k} \tag{13}$$

The estimated model parameters $\hat{\mu}^{NT}$, $\hat{\omega}_i^{NT}$ and $\hat{\lambda}_k^{NT}$ are detailed in Section 4. The parameter estimates in $R(\mu,\tau,\lambda)$ and $R(\mu,\omega,\tau)$ are determined in the same way as those of $R(\mu,\omega,\lambda)$ in Section 4. The expressions of $R(\mu,\tau,\lambda)$ and $R(\mu,\omega,\tau)$, including their parameter estimates, are therefore not shown in this research.

4. Estimated values of all parameters and regression sum of squares under the reduced-treatment model

With the exact approach, the parameter estimates of the reduced-treatment effect model must be found before $R(\mu,\omega,\lambda)$ can be evaluated from Equation (13). These estimates fall into two categories. The first category comprises the estimates directly influenced by the missing value, i.e. $\hat{\mu}^{NT}$, $\hat{\omega}_r^{NT}$ and $\hat{\lambda}_c^{NT}$ (see Proposition 1). The second category consists of the remaining estimates, which are not directly affected by the missing value and are given in Equations (22)–(23).

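Proposition 1: In the K × K LSD with one missing observation, the estimates $\hat{\mu}^{NT}$, $\hat{\omega}_r^{NT}$ and $\hat{\lambda}_c^{NT}$ of the reduced-treatment effect model can be determined by

$$\hat{\mu}^{NT} = \frac{(K-2)\,y_{\cdot\cdot\cdot} + y_{r\cdot\cdot} + y_{\cdot\cdot c}}{K(K-1)^{2}} \tag{14}$$

$$\hat{\omega}_r^{NT} = (K-1)\,\hat{\mu}^{NT} + \frac{y_{r\cdot\cdot} - y_{\cdot\cdot\cdot}}{K} \tag{15}$$

and

$$\hat{\lambda}_c^{NT} = (K-1)\,\hat{\mu}^{NT} + \frac{y_{\cdot\cdot c} - y_{\cdot\cdot\cdot}}{K} \tag{16}$$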

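Proof. Based on the restrictions $\sum_{\text{All } i}^{K} \hat{\omega}_i = 0$ and $\sum_{\text{All } k}^{K} \hat{\lambda}_k = 0$, the least squares normal equations of the reduced-treatment effect model for the parameter estimates that are directly influenced by the position of the missing observation can be expressed as

$$\mu:\quad (K^{2}-1)\,\hat{\mu}^{NT} - \hat{\omega}_r^{NT} - \hat{\lambda}_c^{NT} = y_{\cdot\cdot\cdot} \tag{17}$$

$$\omega_r:\quad (K-1)\,\hat{\mu}^{NT} + (K-1)\,\hat{\omega}_r^{NT} - \hat{\lambda}_c^{NT} = y_{r\cdot\cdot} \tag{18}$$

$$\lambda_c:\quad (K-1)\,\hat{\mu}^{NT} - \hat{\omega}_r^{NT} + (K-1)\,\hat{\lambda}_c^{NT} = y_{\cdot\cdot c} \tag{19}$$

Multiplying both sides of Equation (17) by (K − 2), adding Equations (18) and (19), and rearranging eliminates $\hat{\omega}_r^{NT}$ and $\hat{\lambda}_c^{NT}$ and leaves $\left[(K-2)(K^{2}-1) + 2(K-1)\right]\hat{\mu}^{NT} = K(K-1)^{2}\,\hat{\mu}^{NT} = (K-2)\,y_{\cdot\cdot\cdot} + y_{r\cdot\cdot} + y_{\cdot\cdot c}$, which is Equation (14). The estimates $\hat{\omega}_r^{NT}$ and $\hat{\lambda}_c^{NT}$ in Equations (15) and (16) then follow by subtracting Equation (17) from Equations (18) and (19), respectively. This completes the proof.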
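The proof can be checked numerically by evaluating Equations (14)–(16) and substituting the results back into the normal equations (17)–(19). The sketch below does this in exact rational arithmetic; the function name `reduced_estimates` and the totals (grand total 86, row total 23, column total 20, K = 3, from a made-up square with one cell removed) are hypothetical.

```python
from fractions import Fraction


def reduced_estimates(G, y_r, y_c, K):
    """Estimates of mu, omega_r and lambda_c in the reduced-treatment model.

    G   : grand total of the observed values
    y_r : total of the row containing the missing cell
    y_c : total of the column containing the missing cell
    """
    mu = Fraction((K - 2) * G + y_r + y_c) / (K * (K - 1) ** 2)   # Eq. (14)
    omega_r = (K - 1) * mu + Fraction(y_r - G) / K                # Eq. (15)
    lambda_c = (K - 1) * mu + Fraction(y_c - G) / K               # Eq. (16)
    return mu, omega_r, lambda_c


K, G, y_r, y_c = 3, 86, 23, 20          # made-up totals
mu, omega_r, lambda_c = reduced_estimates(G, y_r, y_c, K)

# Left-hand sides of the normal equations (17)-(19).
lhs_mu = (K * K - 1) * mu - omega_r - lambda_c
lhs_row = (K - 1) * mu + (K - 1) * omega_r - lambda_c
lhs_col = (K - 1) * mu - omega_r + (K - 1) * lambda_c
```

With these totals the three left-hand sides reproduce the grand total, the row total and the column total, as the normal equations require.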

In the second category of the reduced-treatment effect model, in which the treatment effect is ignored, the normal equations can be expressed as

$$\omega_i:\quad K\hat{\mu}^{NT} + K\hat{\omega}_i^{NT} + \sum_{k=1}^{K} \hat{\lambda}_k^{NT} = y_{i\cdot\cdot} \tag{20}$$

$$\lambda_k:\quad K\hat{\mu}^{NT} + \sum_{i=1}^{K} \hat{\omega}_i^{NT} + K\hat{\lambda}_k^{NT} = y_{\cdot\cdot k} \tag{21}$$

where i ≠ r and k ≠ c. The remaining parameter estimates are subsequently determined as

$$\hat{\omega}_i^{NT} = \frac{y_{i\cdot\cdot}}{K} - \hat{\mu}^{NT} \tag{22}$$

$$\hat{\lambda}_k^{NT} = \frac{y_{\cdot\cdot k}}{K} - \hat{\mu}^{NT} \tag{23}$$

where i ≠ r, k ≠ c, and $\hat{\mu}^{NT}$ in Equations (22) and (23) is given by Equation (14). Equations (22) and (23) follow directly from Equations (20) and (21), because the sum-to-zero restrictions remove the summation terms.

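Proposition 2: In the K × K LSD with one missing observation, the regression sum of squares for the reduced-treatment effect model of $y_{ijk}$ can be expressed as

$$R(\mu,\omega,\lambda) = \frac{\sum_{\text{All } i}^{K} y_{i\cdot\cdot}^{2} + \sum_{\text{All } k}^{K} y_{\cdot\cdot k}^{2}}{K} + \frac{\left(y_{r\cdot\cdot} + y_{\cdot\cdot c} - y_{\cdot\cdot\cdot}\right)\left[K\left(y_{r\cdot\cdot} + y_{\cdot\cdot c}\right) + (K-2)\,y_{\cdot\cdot\cdot}\right]}{K(K-1)^{2}} \tag{24}$$

Proof. The determination of $R(\mu,\omega,\lambda)$ can be carried out in a similar fashion to that of $R(\mu,\omega,\tau,\lambda)$ in Sirikasemsuk (2016b) and is presented below.

Substituting Equations (15), (16), (22) and (23) into Equation (13), we obtain

$$R(\mu,\omega,\lambda) = \frac{\sum_{\text{All } i}^{K} y_{i\cdot\cdot}^{2} + \sum_{\text{All } k}^{K} y_{\cdot\cdot k}^{2}}{K} + K\hat{\mu}^{NT}\left(y_{r\cdot\cdot} + y_{\cdot\cdot c}\right) - \frac{y_{\cdot\cdot\cdot}}{K}\left(y_{r\cdot\cdot} + y_{\cdot\cdot c}\right) - \hat{\mu}^{NT}\,y_{\cdot\cdot\cdot} \tag{25}$$

Substituting Equation (14) into Equation (25) and simplifying algebraically yields Equation (24). This completes the proof.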
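The algebra behind Proposition 2 can be cross-checked by computing the reduced-model regression sum of squares twice: once from the individual parameter estimates (Equation (13) with Equations (14)–(16) and (22)–(23)) and once from the closed form of Equation (24). The following sketch assumes hypothetical function names and made-up totals; it is a consistency check, not part of the original derivation.

```python
from fractions import Fraction


def r_reduced_from_estimates(row_tot, col_tot, r, c):
    """Equation (13) evaluated with the estimates of Section 4."""
    K = len(row_tot)
    G = sum(row_tot)
    mu = Fraction((K - 2) * G + row_tot[r] + col_tot[c]) / (K * (K - 1) ** 2)

    def effect(total, hit):
        if hit:  # row r or column c: Equations (15)-(16)
            return (K - 1) * mu + Fraction(total - G) / K
        return Fraction(total) / K - mu  # Equations (22)-(23)

    omega = [effect(t, i == r) for i, t in enumerate(row_tot)]
    lam = [effect(t, k == c) for k, t in enumerate(col_tot)]
    return (mu * G
            + sum(w * t for w, t in zip(omega, row_tot))
            + sum(l * t for l, t in zip(lam, col_tot)))


def r_reduced_closed_form(row_tot, col_tot, r, c):
    """Equation (24): the same quantity from the marginal totals only."""
    K = len(row_tot)
    G = sum(row_tot)
    A = row_tot[r] + col_tot[c]
    squares = sum(t * t for t in row_tot + col_tot)
    return (Fraction(squares) / K
            + Fraction((A - G) * (K * A + (K - 2) * G)) / (K * (K - 1) ** 2))


# Made-up totals of a 3 x 3 square with the cell (row 0, column 0) missing.
rows, cols = [23, 30, 33], [20, 30, 36]
via_estimates = r_reduced_from_estimates(rows, cols, r=0, c=0)
via_closed_form = r_reduced_closed_form(rows, cols, r=0, c=0)
```

The two routes agree exactly for these totals, which is what the substitution step in the proof asserts in general.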

5. Sums of squares for incomplete LSD with one missing experimental data

Proposition 3: In the K × K LSD with one missing experimental data, the sums of squares for the treatments, rows, and columns can be determined as

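$$SS_{tr} = \frac{\sum_{\text{All } j}^{K} y_{\cdot j\cdot}^{2}}{K} + \frac{1}{K-1} \times \left[\frac{\left(y_{r\cdot\cdot} + y_{\cdot m\cdot} + y_{\cdot\cdot c} - y_{\cdot\cdot\cdot}\right)^{2}}{K-2} - \frac{\left(y_{r\cdot\cdot} + y_{\cdot\cdot c} - y_{\cdot\cdot\cdot}\right)^{2}}{K-1} + \frac{y_{\cdot\cdot\cdot}\left(2y_{\cdot m\cdot} - y_{\cdot\cdot\cdot}\right)}{K}\right] \tag{26}$$

$$SS_{row} = \frac{\sum_{\text{All } i}^{K} y_{i\cdot\cdot}^{2}}{K} + \frac{1}{K-1} \times \left[\frac{\left(y_{r\cdot\cdot} + y_{\cdot m\cdot} + y_{\cdot\cdot c} - y_{\cdot\cdot\cdot}\right)^{2}}{K-2} - \frac{\left(y_{\cdot m\cdot} + y_{\cdot\cdot c} - y_{\cdot\cdot\cdot}\right)^{2}}{K-1} + \frac{y_{\cdot\cdot\cdot}\left(2y_{r\cdot\cdot} - y_{\cdot\cdot\cdot}\right)}{K}\right] \tag{27}$$

$$SS_{column} = \frac{\sum_{\text{All } k}^{K} y_{\cdot\cdot k}^{2}}{K} + \frac{1}{K-1} \times \left[\frac{\left(y_{r\cdot\cdot} + y_{\cdot m\cdot} + y_{\cdot\cdot c} - y_{\cdot\cdot\cdot}\right)^{2}}{K-2} - \frac{\left(y_{r\cdot\cdot} + y_{\cdot m\cdot} - y_{\cdot\cdot\cdot}\right)^{2}}{K-1} + \frac{y_{\cdot\cdot\cdot}\left(2y_{\cdot\cdot c} - y_{\cdot\cdot\cdot}\right)}{K}\right] \tag{28}$$

Proof. Based on Equation (7), the treatment sum of squares ($SS_{tr}$) in Equation (26) is obtained by subtracting Equation (24) in Proposition 2 from Equation (12). The row and column sums of squares are determined in the same way as $SS_{tr}$. This completes the proof.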
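The three formulas of Proposition 3 share one structure: the squared totals of the factor in question over K, plus a bracketed correction in which the factor's own missing-cell total appears in the last term and the other two totals appear in the middle term. A minimal sketch exploiting that symmetry is shown below; the function name `exact_sums_of_squares`, its argument layout and the example totals are assumptions made for illustration.

```python
from fractions import Fraction


def exact_sums_of_squares(row_tot, trt_tot, col_tot, r, m, c):
    """Treatment, row and column sums of squares for one missing cell.

    Totals are over observed values only; r, m, c index the row,
    treatment and column of the missing cell. Requires K >= 3.
    """
    K = len(row_tot)
    G = Fraction(sum(row_tot))
    yr, ym, yc = Fraction(row_tot[r]), Fraction(trt_tot[m]), Fraction(col_tot[c])
    shared = (yr + ym + yc - G) ** 2 / (K - 2)   # first bracket term, common to all

    def component(totals, own, other_a, other_b):
        squares = sum(Fraction(t) ** 2 for t in totals) / K
        bracket = (shared
                   - (other_a + other_b - G) ** 2 / (K - 1)
                   + G * (2 * own - G) / K)
        return squares + bracket / (K - 1)

    return {
        "tr": component(trt_tot, ym, yr, yc),      # Equation (26)
        "row": component(row_tot, yr, ym, yc),     # Equation (27)
        "column": component(col_tot, yc, yr, ym),  # Equation (28)
    }


# Made-up totals of an additive 3 x 3 square with one cell removed.
ss = exact_sums_of_squares([23, 30, 33], [17, 30, 39], [20, 30, 36], r=0, m=0, c=0)
```

Only the marginal totals and the position of the missing cell are needed, so the formulas can be applied by hand or in a spreadsheet without fitting any model.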

An illustrative example is given by an elongation experiment (Ott & Longnecker, 2010), which was laid out in a 5 × 5 LSD as shown in Table 4. Five different versions of the stockings (treatments) were tested by each of five investigators on five separate days.

Table 4. Elongation data

The unbiased treatment, column, row and error sums of squares can be easily calculated, as presented in Table 5. By contrast, the treatment sum of squares obtained with the missing plot technique is biased and cannot be used directly in the ANOVA table, according to Ott and Longnecker (2010).
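Once the sums of squares are available, the mean squares and F ratios follow from the degrees of freedom: K − 1 for each of treatments, rows and columns, and K² − 3K + 1 for error when one observation is missing. The sketch below shows that bookkeeping only. The function name `anova_table` is hypothetical, and the input values in the usage lines are placeholders chosen for easy arithmetic; they are not the elongation results.

```python
def anova_table(ss_tr, ss_row, ss_column, ss_e, K):
    """Degrees of freedom, mean squares and F ratios for one missing value."""
    df = {"tr": K - 1, "row": K - 1, "column": K - 1, "E": K * K - 3 * K + 1}
    ss = {"tr": ss_tr, "row": ss_row, "column": ss_column, "E": ss_e}
    ms = {name: ss[name] / df[name] for name in ss}
    # Each effect is tested against the error mean square.
    f_ratio = {name: ms[name] / ms["E"] for name in ("tr", "row", "column")}
    return df, ms, f_ratio


# Placeholder sums of squares for a 5 x 5 square (not data from the paper).
df, ms, f_ratio = anova_table(ss_tr=8.0, ss_row=4.0, ss_column=12.0, ss_e=11.0, K=5)
```

Each F ratio would then be compared with the critical value of the F distribution with (K − 1, K² − 3K + 1) degrees of freedom.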

Table 5. The analysis of the variance

6. Summary

An incomplete LSD normally results in an unbalanced design, rendering the conventional sums of squares formulas invalid. Despite the prevalence of missing-value techniques, the estimate of a missing observation is not experiment-based. Meanwhile, the existing exact approach did not provide simplified, straightforward formulas for calculating the component sums of squares (the sub-variance). This research has thus proposed a simplified exact approach based on the general linear model for solving the K × K LSD with one replicate and one missing observation. Under the proposed approach, the effect of the potential variable (the factor of primary interest) is determined from the regression sums of squares of the full and reduced-treatment models. In addition to being easy to compute, the treatment, row and column sums of squares are unbiased. More importantly, the expressions apply to an LSD of any order with one missing observation.

Funding

The authors received no direct funding for this research.

Additional information

Notes on contributors

Kittiwat Sirikasemsuk

Kittiwat Sirikasemsuk is an assistant professor in Industrial Engineering, Faculty of Engineering, King Mongkut’s Institute of Technology Ladkrabang, Thailand. He received his Doctor of Philosophy (PhD) in Industrial Systems Engineering from the Asian Institute of Technology, Thailand, in 2013. He has extensive experience in lean manufacturing and various quality engineering techniques. His research activities cover a wide range of areas, including design of experiments, supply chain design, measures of the bullwhip effect in supply chains, quality engineering and lean manufacturing.

Kanogkan Leerojanaprapa

Kanogkan Leerojanaprapa is a lecturer in the Statistics department, King Mongkut’s Institute of Technology Ladkrabang, Thailand. She received her PhD in Management Science from the University of Strathclyde, UK, in 2014. Her research areas include quality control, risk analysis, supply chain management, Bayesian networks, and applications of statistical models. She has published articles in various peer-reviewed international journals and conferences.

References

  • Ai, M., Li, K., Liu, S., & Lin, D. K. (2013). Balanced incomplete Latin square designs. Journal of Statistical Planning and Inference, 143(9), 1575–1582. doi:10.1016/j.jspi.2013.05.001
  • Allan, F. E., & Wishart, J. (1930). A method of estimating the yield of a missing plot in field experimental work. The Journal of Agricultural Science, 20(3), 399–406. doi:10.1017/S0021859600006912
  • Bartlett, M. S. (1937). Some examples of statistical methods of research in agriculture and applied biology. Supplement to the Journal of the Royal Statistical Society, 4, 137–183. doi:10.2307/2983644
  • Casselgren, C. J., & Häggkvist, R. (2013). Completing partial Latin squares with one filled row, column and symbol. Discrete Mathematics, 313, 1011–1017. doi:10.1016/j.disc.2013.01.019
  • Cochran, W. G. (1957). Analysis of covariance: Its nature and uses. Biometrics, 13(3), 261–281. doi:10.2307/2527916
  • Coons, I. (1957). The analysis of covariance as a missing plot technique. Biometrics, 13(3), 387–405. doi:10.2307/2527922
  • Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society Series B (Methodological), 39, 1–38.
  • Donovan, D. M., & Şule Yazıcı, E. (2014). A polynomial embedding of pairs of orthogonal partial Latin squares. Journal of Combinatorial Theory, Series A, 126, 24–34. doi:10.1016/j.jcta.2014.04.003
  • Euler, R. (2010). On the completability of incomplete Latin squares. European Journal of Combinatorics, 31, 535–552. doi:10.1016/j.ejc.2009.03.036
  • Healy, M., & Westmacott, M. (1956). Missing values in experiments analysed on automatic computers. Applied Statistics, 5, 203–206. doi:10.2307/2985421
  • Kramer, C. Y., & Glass, S. (1960). Analysis of variance of a Latin square design with missing observations. Applied Statistics, 9, 43–50. doi:10.2307/2985758
  • Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.). Hoboken: John Wiley & Sons. doi:10.1002/9781119013563
  • Mead, R., Gilmour, S. G., & Mead, A. (2012). Statistical principles for the design of experiments: Applications to real experiments, Vol. 36. Cambridge: Cambridge University Press. doi:10.1017/CBO9781139020879
  • Montgomery, D. C. (2008). Design and analysis of experiments (7th ed.). New York, NY: John Wiley & Sons.
  • Ott, R. L., & Longnecker, M. (2010). An introduction to statistical methods and data analysis (6th ed.). Boston: Cengage Learning.
  • Rubin, D. B. (1972). A non-iterative algorithm for least squares estimation of missing values in any analysis of variance design. Applied Statistics, 136–141. doi:10.2307/2346485
  • Rubin, D. B. (1987). Multiple imputation for non-response in surveys. Hoboken: John Wiley & Sons. doi:10.1002/SERIES1345
  • Sirikasemsuk, K. (2016a). A review on incomplete Latin square design of any order. In N. Rusli, W. M. Zaimi, K. A. Khazali, M. J. Masnan, W. S. Daud, N. Abdullah, …, Y. N. Yusuf (Eds.), AIP Conference Proceedings 2016 (Vol. 1775, p. 030022). Melville: AIP Publishing.
  • Sirikasemsuk, K. (2016b). One missing value problem in Latin square design of any order: Regression sum of squares. In 2016 Joint 8th International Conference on Soft Computing and Intelligent Systems (SCIS) and 17th International Symposium on Advanced Intelligent Systems (pp. 142–147). Japan: IEEE Press.
  • Speed, T. P. (1992). Introduction to Fisher (1926) The arrangement of field experiments. In S. Kotz & N. L. Johnson (Eds.), Breakthroughs in statistics, Vol. 2 (pp. 71–81). New York, NY: Springer. doi:10.1007/978-1-4612-4380-9
  • Wilkinson, G. N. (1958). Estimation of missing values for the analysis of incomplete data. Biometrics, 14(2), 257–286. doi:10.2307/2527789
  • Yates, F. (1933). The analysis of replicated experiments when the field results are incomplete. Empire Journal of Experimental Agriculture, 2(1), 129–142.
  • Yates, F. (1936). Incomplete randomized blocks. Annals of Eugenics, 7, 121–140. doi:10.1111/j.1469-1809.1936.tb02134.x
  • Youden, W. J. (1937). Use of incomplete block replications in estimating tobacco-mosaic virus. Contributions from Boyce Thompson Institute, 9(1), 41–48.
  • Zhang, H. (2013). 25 new r-self-orthogonal Latin squares. Discrete Mathematics, 313, 1746–1753. doi:10.1016/j.disc.2013.04.021

Appendix

In this current research, the notations and their respective definitions are provided below:

  • $y_{ijk}$ = the ijkth observation taken under row i, column k and treatment j
  • $i$ = index of rows (i = 1, 2, 3, …, K)
  • $j$ = index of treatments (j = 1, 2, 3, …, K)
  • $k$ = index of columns (k = 1, 2, 3, …, K)
  • $K$ = the order of the LSD
  • $\mu$ = the common effect or the overall mean of the observations
  • $\omega_i$ = the ith row effect
  • $\tau_j$ = the jth treatment effect
  • $\lambda_k$ = the kth column effect
  • $\varepsilon_{ijk}$ = the normally distributed zero-mean random error in the ijkth observation
  • $\hat{\mu}$ = the estimate of the parameter $\mu$
  • $\hat{\omega}_i$ = the estimate of the ith row effect
  • $\hat{\tau}_j$ = the estimate of the jth treatment effect
  • $\hat{\lambda}_k$ = the estimate of the kth column effect
  • $y_{\cdot\cdot\cdot}$ = the grand total
  • $y_{\cdot\cdot k}$ = the kth column total
  • $y_{\cdot j\cdot}$ = the jth treatment total
  • $y_{i\cdot\cdot}$ = the ith row total
  • $SS_{tr}$ = the treatment sum of squares
  • $SS_{row}$ = the row sum of squares
  • $SS_{column}$ = the column sum of squares
  • $SS_{E}$ = the error sum of squares
  • $SS_{total}$ = the total sum of squares
  • $\hat{\mu}^{NT}$ = the estimate of $\mu$ for the reduced model ignoring the treatment effect
  • $\hat{\omega}_i^{NT}$ = the estimate of $\omega_i$ for the reduced model ignoring the treatment effect
  • $\hat{\lambda}_k^{NT}$ = the estimate of $\lambda_k$ for the reduced model ignoring the treatment effect
  • $r$ = index of the row in which the observation is missing
  • $m$ = index of the treatment (letter) in which the observation is missing
  • $c$ = index of the column in which the observation is missing
  • $\hat{\omega}_r$ = the parameter estimate of the rth row effect for the full effect model
  • $\hat{\tau}_m$ = the parameter estimate of the mth treatment effect for the full effect model
  • $\hat{\lambda}_c$ = the parameter estimate of the cth column effect for the full effect model
  • $\hat{\omega}_r^{NT}$ = the estimate of $\omega_r$ for the reduced model ignoring the treatment effect
  • $\hat{\lambda}_c^{NT}$ = the estimate of $\lambda_c$ for the reduced model ignoring the treatment effect
  • $R(\mu,\omega,\tau,\lambda)$ = the regression sum of squares for the full effect model of $y_{ijk}$
  • $R(\mu,\omega,\lambda)$ = the regression sum of squares for the reduced-treatment effect model of $y_{ijk}$