New two-way discrete frequency table with application to English Premier League data

Article: 2063538 | Received 01 Nov 2020, Accepted 02 Apr 2022, Published online: 24 Apr 2022

ABSTRACT

A bivariate discrete frequency table, one of the significant exploratory data analysis (EDA) tools, organizes data systematically. The existing frequency table is straightforward, but when the number of elements in the data is large, the table becomes long and complicated. In this research, we propose a new bivariate discrete frequency table that groups the elements in each variable. The table can be constructed using the R code provided with the article. We describe the table using simulations from the bivariate binomial and bivariate Poisson distributions. Real data, obtained from the English Premier League website, are also used to illustrate the new table. The findings indicate that the proposed bivariate frequency table provides a better alternative when the number of elements is substantial and that it reveals the essential data features.

1. Introduction

Data are more attractive and capture the minds of people if depicted in either tabular or graphical form. The tabular representations are precise and provide the reader with apparent features of the data; however, the graphical representations have more visual significance since they are useful in detecting patterns in a dataset (Beniger & Robyn, 1978; Davies, 1929; Gelman, 2011; Gelman et al., 2002; Kastellec & Leoni, 2007; Xu & Wang, 2020). The hidden features of raw data can only be uncovered if the data are organized in a meaningful form, such as a frequency table. A frequency table partitions raw data into classes of appropriate sizes, displaying observations and their respective number of occurrences (Kenney, 1939; Manikandan, 2011; Mohammed, Adam, Ali et al., 2020). Generally, the main reason for summarizing raw data is to explore the extra information therein. A summary also makes it easier to understand the underlying distribution and the features of the variables, and to choose the statistical tool to be used for inference.

Data obtained from measurements such as length, height, weight, or temperature assume values within an interval or range. Such measured observations are called continuous data. If $x_1, x_2, \ldots, x_n$ are continuous observations, their location can be summarized by the mean $\bar{x}_{CD}$, the median $\mathrm{Median}_{CD}$, and the mode $\mathrm{Mode}_{CD}$. Continuous data take values within a given interval and are generally measured values such as the amount of rainfall, length, or area, whereas discrete data are whole numbers (Gardiner et al., 1979). A set of discrete data is often obtained by counting or enumeration, while continuous data are usually obtained through measurement (Fisher & Marshall, 2009; Kenney, 1939). Discrete data are countable, finite observations, and the table that summarizes them is the discrete frequency table. Its elements, the distinct values in the data, are the natural classes; there are no class limits or class boundaries (Gravetter et al., 2020; Kenney, 1939). The discrete frequency table is classified into two types based on the number of variables. A table that organizes data on only a single discrete variable is known as a univariate discrete frequency table. Meanwhile, a bivariate discrete frequency table is a table that displays data on two joint discrete variables.

The existing bivariate discrete frequency table is straightforward and widely used. However, when the number of elements in the joint discrete data is large, it leads to a very long table that can be difficult to handle. In this research, we propose a new bivariate discrete frequency table for datasets with a large number of elements. The table is constructed by grouping the elements in the joint discrete data.

2. Bivariate discrete frequency table

Let $(x_1,y_1),(x_2,y_2),\ldots,(x_n,y_n)$ be $n$ pairs of discrete observations of variables X and Y. The existing frequency table is given as Table 1. The notations $m_1$ and $m_2$, respectively, denote the number of elements in the two joint discrete datasets; $x_i$, $i=1,2,\ldots,m_1$, denote the elements of variable X displayed in the columns; $y_j$, $j=1,2,\ldots,m_2$, are the elements of the second variable Y presented in the rows; and $f_{ij}$ is the joint frequency of variables X and Y in cell $ij$.

Table 1. Typical bivariate discrete frequency table
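As a small illustration of the existing table, the sketch below cross-tabulates two discrete vectors with base R. The data are hypothetical toy values chosen only for illustration, not data from the article; the layout follows the description above, with the elements of X in the columns and the elements of Y in the rows.

```r
# Hypothetical toy data: 12 paired discrete observations (not from the article).
x <- c(1, 2, 2, 3, 1, 2, 3, 3, 2, 1, 2, 3)
y <- c(0, 1, 1, 2, 0, 2, 2, 1, 1, 0, 1, 2)

# Joint frequencies f_ij: elements of Y in the rows, elements of X in the columns.
freq_tab <- table(Y = y, X = x)

# Append the row and column totals (marginal frequencies).
freq_tab_with_margins <- addmargins(freq_tab)
print(freq_tab_with_margins)
```

With only three elements per variable the table stays small; the number of rows and columns grows with $m_2$ and $m_1$, which is the situation the proposed table addresses.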

Number of classes

The number of classes for continuous frequency tables depends mainly on the size of the data, and several rules, such as those proposed by Sturges (1926), Cochran (1954), Doane (1976), Scott (1979), and Freedman and Diaconis (1981), can be used to determine it. Meanwhile, the number of classes for discrete frequency tables depends on the number of data elements. Thus, when the number of elements in a dataset is small, the frequency table will have a small number of classes no matter how large the dataset is. Conversely, when the number of elements is large, the frequency table will have a large number of classes irrespective of the data's size (Mohammed, Adam, Zulkafli et al., 2020).

3. Proposed bivariate discrete frequency table

The proposed table is constructed by grouping the elements into classes, as shown in Table 2. The simplest case is when the elements of the two variables are grouped in twos. This bi-element table can be described using three cases, illustrated by Tables 3, 4, and 5, respectively. In the first case, the number of elements in both variables is even, and the table is complete. In the second case, the number of elements is even for the first variable and odd for the second; the table is incomplete, and the last class of the second variable has a single element. In the third case, the number of elements is odd for the first variable and even for the second; the table is incomplete, and the last class of the first variable has a single element. If $m_1$ and $m_2$ are both less than 10, the existing bivariate frequency table, Table 1, can be used; modification is not necessary.

Table 2. General g-element bivariate discrete frequency table

Table 3. Bi-element bivariate discrete frequency table with both m1 and m2 even

Table 4. Bi-element bivariate discrete frequency table with m1 even and m2 odd

Table 5. Bi-element bivariate discrete frequency table with m1 odd but m2 even
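The grouping idea can be sketched in a few lines of base R. The helper name `group_elements` and the toy data are assumptions made for illustration; this is not the appendix code, only a minimal version of the same idea: sort the distinct elements, put each run of `g` neighbouring elements into one class, and cross-tabulate the class labels.

```r
# Map each observation to the class formed by g neighbouring distinct elements.
group_elements <- function(v, g) {
  elems <- sort(unique(v))                     # distinct elements, ascending
  class_id <- ceiling(seq_along(elems) / g)    # class index of each element
  labels <- unname(vapply(split(elems, class_id), paste,
                          character(1), collapse = ","))
  factor(class_id[match(v, elems)],
         levels = seq_along(labels), labels = labels)
}

# Hypothetical toy data: m1 = 6 elements (even), m2 = 5 elements (odd).
x <- c(0, 1, 2, 3, 4, 5, 1, 2, 2, 3)
y <- c(0, 1, 2, 3, 4, 1, 1, 2, 3, 3)

# Bi-element table (g = 2): Y classes in the rows, X classes in the columns.
bi_tab <- table(Y = group_elements(y, 2), X = group_elements(x, 2))
print(bi_tab)
```

Here the second variable has an odd number of elements, so its last class holds the single element 4, which corresponds to the second case described above.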

The values $c_1$ and $c_2$ are, respectively, the number of classes for variables X and Y; $m_1$ and $m_2$ are the number of elements in the two joint datasets; and $n_{1i}$, $i=1,2,\ldots,c_1$, and $n_{2j}$, $j=1,2,\ldots,c_2$, are the number of elements in each class of the two variables. Also, $x_{mo_1},x_{mo_2},\ldots,x_{mo_{c_1}}$ and $y_{mo_1},y_{mo_2},\ldots,y_{mo_{c_2}}$ are the modes of the classes of the two variables; they represent the magnitude of observations in each class. The proposed bivariate frequency table is complete if $m_1 \bmod g = 0$ and $m_2 \bmod g = 0$. This implies $n_{11}=n_{12}=\cdots=n_{1c_1}=g$ and $n_{21}=n_{22}=\cdots=n_{2c_2}=g$. However, when either $m_1 \bmod g \neq 0$ or $m_2 \bmod g \neq 0$, the table is incomplete. This condition results in either $n_{1c_1} \neq g$ or $n_{2c_2} \neq g$; that is, the last class of one or both variables has a different number of elements.
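The completeness condition is simple modular arithmetic, and a short sketch may make it concrete. The function name `class_sizes` is hypothetical; it assumes, as in the description above, that any leftover elements go into the last class.

```r
# Class sizes for one variable with m elements grouped g at a time.
class_sizes <- function(m, g) {
  stopifnot(m >= 1, g >= 1)
  full <- m %/% g                  # classes holding exactly g elements
  rest <- m %% g                   # elements left over for the last class
  sizes <- c(rep(g, full), if (rest > 0) rest)
  list(n_classes = length(sizes), sizes = sizes, complete = (rest == 0))
}

even_case <- class_sizes(12, 3)    # 12 mod 3 = 0: four classes of three elements
odd_case  <- class_sizes(14, 3)    # 14 mod 3 = 2: last class has two elements
```

The bivariate table is complete only when `complete` is `TRUE` for both variables.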

Mode of the proposed frequency table

When dealing with discrete data, the mode is the most suitable measure of location. In the proposed table, the mode (Mo) is used to represent the observations in each class. It is possible to have more than one mode in a class when two or more observations share the highest frequency. The modes $x_{mo_1},x_{mo_2},\ldots,x_{mo_{c_1}}$ and $y_{mo_1},y_{mo_2},\ldots,y_{mo_{c_2}}$ of the classes of the two variables, which represent the magnitude of observations in each class, are the elements in the classes that occurred the most. The existing class rules apply to continuous frequency tables, and there are no existing rules for the discrete case. Therefore, we derived a rule for grouping the elements into class intervals for the proposed discrete frequency table. The grouping criterion places neighboring elements, in either ascending or descending order, in the same class since they have similar characteristics. The idea behind grouping the elements is to get a manageable table, since a substantial number of elements in the data results in a very long table that cannot be easily handled. In the proposed frequency table, all the classes can have an equal number of elements, but sometimes either the first or the last class may have a different number of elements. If the number of elements in both variables is less than 10, there is no need for grouping.
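A minimal sketch of the class mode follows, assuming classes of `g` neighbouring elements in ascending order. The helper `class_modes` and the toy vector are hypothetical; ties are kept, so a class can report more than one mode, as described above.

```r
# Modes of each class when the distinct elements are grouped g at a time.
class_modes <- function(v, g) {
  freq <- table(v)                             # frequency of each element
  elems <- as.numeric(names(freq))             # distinct elements, ascending
  class_id <- ceiling(seq_along(elems) / g)    # neighbours share a class
  lapply(split(seq_along(elems), class_id), function(idx) {
    f <- as.vector(freq[idx])
    elems[idx][f == max(f)]                    # every element tied for the top frequency
  })
}

# Hypothetical toy data with 7 elements, grouped in twos.
v <- c(1, 1, 2, 3, 3, 3, 4, 5, 5, 6, 6, 7)
modes <- class_modes(v, 2)
```

In this toy vector the third class, {5, 6}, has two modes because both elements occur twice.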

Modifying the Cochran (1954) rule to use the number of elements instead of the sample size ($n$), we derived a formula for the grouping number ($g$) as

$$g = \begin{cases} 1, & \text{if } m < 10 \\ \dfrac{m}{5}, & \text{if } m \geq 10 \end{cases} \qquad (1)$$

where $m=\max(m_1,m_2)$ and $g$ is the grouping number. The proposed bivariate discrete frequency table can be constructed in R using the code given in the appendix.

To describe the proposed table, we performed simulation studies using bivariate binomial and Poisson distributions. Two joint discrete variables X and Y are said to have a bivariate binomial distribution if their joint probability mass function is given by

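$$\begin{aligned}
P(X=x,Y=y) &= f_{XY}(x,y;\pi_x,\pi_y,n) \\
&= f_X(x;\pi_x,n)\, f_Y(y;\pi_y,n) \\
&= \binom{n}{x}\pi_x^{x}(1-\pi_x)^{n-x}\binom{n}{y}\pi_y^{y}(1-\pi_y)^{n-y},
\end{aligned}$$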

where $\pi_x$ and $\pi_y$ are, respectively, the success probabilities of the first and second variables, and $n$ is the common number of trials. The moments are $E(X)=n\pi_x$, $\mathrm{Var}(X)=n\pi_x(1-\pi_x)$, $E(Y)=n\pi_y$, and $\mathrm{Var}(Y)=n\pi_y(1-\pi_y)$. Meanwhile, the bivariate Poisson distribution is given by

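$$\begin{aligned}
P(X=x,Y=y) &= f_{XY}(x,y;\mu_1,\mu_2,\mu_3) \\
&= e^{-(\mu_1+\mu_2+\mu_3)}\,\frac{\mu_1^{x}}{x!}\,\frac{\mu_2^{y}}{y!}\sum_{k=0}^{\min(x,y)}\binom{x}{k}\binom{y}{k}\,k!\left(\frac{\mu_3}{\mu_1\mu_2}\right)^{k}.
\end{aligned}$$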

The parameters of the distribution, $\mu_1$, $\mu_2$, and $\mu_3$, are positive real numbers, and $k$ is an integer between 0 and $\min(x,y)$. The mean and variance of variable X are equal, that is, $E(X)=\mathrm{Var}(X)=\mu_1+\mu_3$. Likewise, the mean and variance of variable Y are $E(Y)=\mathrm{Var}(Y)=\mu_2+\mu_3$. The covariance of X and Y is $\mathrm{Cov}(X,Y)=\mu_3$ (Karlis & Ntzoufras, 2003).
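Both distributions can be evaluated and simulated with base R, as sketched below. The function names are hypothetical. The bivariate binomial follows the product form given above. For the bivariate Poisson, the simulator uses the standard trivariate reduction $X=Z_1+Z_3$, $Y=Z_2+Z_3$ with independent $Z_i \sim \mathrm{Poisson}(\mu_i)$, which reproduces the stated means, variances, and covariance; whether the article's simulations were generated this way is an assumption.

```r
# Bivariate binomial pmf: product of two binomial pmfs with a common n.
dbivbinom <- function(x, y, n, pi_x, pi_y) {
  dbinom(x, size = n, prob = pi_x) * dbinom(y, size = n, prob = pi_y)
}

# Bivariate Poisson pmf for a single pair (x, y), term by term as in the formula.
dbivpois <- function(x, y, mu1, mu2, mu3) {
  k <- 0:min(x, y)
  exp(-(mu1 + mu2 + mu3)) * mu1^x / factorial(x) * mu2^y / factorial(y) *
    sum(choose(x, k) * choose(y, k) * factorial(k) * (mu3 / (mu1 * mu2))^k)
}

# Bivariate Poisson sampler via trivariate reduction (assumed construction).
rbivpois <- function(size, mu1, mu2, mu3) {
  z3 <- rpois(size, mu3)                       # shared component gives Cov(X, Y) = mu3
  data.frame(x = rpois(size, mu1) + z3, y = rpois(size, mu2) + z3)
}

# Check the binomial moments numerically on the full support (n = 20).
n <- 20
grid <- expand.grid(x = 0:n, y = 0:n)
joint <- dbivbinom(grid$x, grid$y, n, 0.5, 0.5)
total_prob <- sum(joint)
mean_x <- sum(grid$x * joint)
var_x <- sum((grid$x - mean_x)^2 * joint)

# The X-marginal of the bivariate Poisson should be Poisson(mu1 + mu3).
marg_x3 <- sum(sapply(0:60, function(y) dbivpois(3, y, 2.5, 3.5, 2.5)))

set.seed(1)                                    # arbitrary seed, for reproducibility only
sim <- rbivpois(1000, 2.5, 3.5, 2.5)
m1 <- length(unique(sim$x))                    # number of elements of X in this sample
m2 <- length(unique(sim$y))                    # number of elements of Y in this sample
```

The counts `m1` and `m2` are the quantities that determine how long the existing table would be for a given sample.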

4. Results and discussion

Simulation

In this study, to observe the pattern of the bivariate discrete frequency table and illustrate the proposed table, we performed simulations using bivariate binomial and Poisson distributions. The first two studies each used 100 samples from the bivariate binomial distribution with parameters $\pi_x=0.5$ and $\pi_y=0.5$ but different numbers of trials, $n=20$ and $n=50$. The third study used 100 samples of size 1000 from the bivariate Poisson distribution with parameters $\mu_1=2.5$, $\mu_2=3.5$, and $\mu_3=2.5$, and the fourth used 100 samples of size 1000 from the bivariate Poisson distribution with parameters $\mu_1=6.5$, $\mu_2=5.5$, and $\mu_3=4.5$.
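The summaries reported for each study (the range of the number of elements and the extreme joint frequencies across samples) can be collected along the following lines. This is a sketch for the first binomial setting only; the sample size of 1000 is taken from the table captions, and the seed and object names are arbitrary assumptions, so the numbers it produces need not match those reported in the article.

```r
set.seed(2022)                                 # arbitrary seed
n_rep <- 100                                   # number of simulated samples
size <- 1000                                   # observations per sample (assumed)
n <- 20                                        # binomial number of trials

# One row per sample: number of elements of X and Y, and the largest cell count.
counts <- t(replicate(n_rep, {
  x <- rbinom(size, n, 0.5)
  y <- rbinom(size, n, 0.5)
  c(m1 = length(unique(x)),
    m2 = length(unique(y)),
    max_f = max(table(x, y)))
}))

range_m1 <- range(counts[, "m1"])              # spread of the number of X elements
range_m2 <- range(counts[, "m2"])              # spread of the number of Y elements
largest_cell <- max(counts[, "max_f"])         # maximum joint frequency over all samples
```

The same loop applies to the Poisson studies once the two `rbinom` calls are replaced by draws from a bivariate Poisson sampler.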

The first study shows that the numbers of elements for variables X and Y are, respectively, within the ranges 13, …, 18 and 13, …, 17. The least joint frequency among the 100 samples is 0, while the maximum frequency is 89. One sample is used to construct the existing bivariate discrete frequency table, Table 6, and to describe the bi-element bivariate discrete frequency table, Table 7. The elements in the bivariate discrete data are grouped in twos, as suggested by Equation (1). The classes of variable X have an equal number of elements; hence, the classes are complete. Meanwhile, all the variable Y classes have the same number of elements except the last class; therefore, the classes are incomplete. The class modes, $x_{mo}$ and $y_{mo}$, represent the magnitude of observations in each class of the two variables.

Table 6. Bivariate discrete frequency table constructed using a sample of size 1000 simulated from the bivariate binomial distribution with parameters n=20, πx=0.5, πy=0.5

Table 7. Bi-element bivariate discrete frequency table constructed using a sample of size 1000 simulated from the bivariate binomial distribution with parameters n=20, πx=0.5, πy=0.5

For the second study from the bivariate binomial distribution, the pattern indicates that the numbers of elements for variables X and Y are both in the interval 19, …, 25; the minimum joint frequency is 0, and the maximum frequency is 39. Again, one sample is used to depict the existing bivariate discrete frequency table, Table 8, and to illustrate the tri-element bivariate discrete frequency table, Table 9. As suggested by Equation (1), the elements in the sample data are grouped in threes. Both variables' classes have an equal number of elements; hence, the proposed table is complete. The class modes, $x_{mo}$ and $y_{mo}$, represent the magnitude of observations in each class of the two variables. The class modes are the elements in each class that occurred the most. A class could have more than one element as a mode if two or more elements share the highest frequency.

Table 8. Bivariate discrete frequency table constructed using a sample of size 100,000 simulated from the bivariate binomial distribution with parameters n=50, πx=0.5, πy=0.5

Table 9. Tri-element bivariate discrete frequency table constructed using a sample of size 100,000 simulated from the bivariate binomial distribution with parameters n=50, πx=0.5, πy=0.5

The third study, using data from the bivariate Poisson distribution with parameters $\mu_1=2.5$, $\mu_2=3.5$, and $\mu_3=2.5$, shows that the numbers of elements for variables X and Y are, respectively, in the intervals 14, …, 19 and 11, …, 17. Meanwhile, the smallest frequency of the 100 samples is 0, whereas the highest frequency is 49. The existing and the bi-element bivariate discrete frequency tables, Tables 10 and 11, are both constructed using one of the samples. Equation (1) suggested grouping the elements in the sample data in twos. All the variable X classes have an equal number of elements except the last class; hence, the classes are incomplete. The variable Y classes all have the same number of elements; hence, the classes are complete. The class modes, $x_{mo}$ and $y_{mo}$, which represent the magnitude of observations in the variables' classes, are the elements in each class that occurred the most.

Table 10. Bivariate discrete frequency table constructed using a sample of size 1000 simulated from the bivariate Poisson distribution with parameters μ1=2.5, μ2=3.5, and μ3=2.5

Table 11. Bi-element bivariate discrete frequency table constructed using a sample of size 1000 simulated from the bivariate Poisson distribution with parameters μ1=2.5, μ2=3.5, and μ3=2.5

The fourth simulation study, using a bivariate Poisson distribution, shows that the numbers of elements for variables X and Y are, respectively, within the intervals 18, …, 29 and 18, …, 27. The least joint frequency of the 100 samples is 0, and the maximum frequency is 32. One sample is used to construct the existing bivariate discrete frequency table, Table 12, and to depict the tri-element bivariate discrete frequency table, Table 13. As suggested by Equation (1), the elements in the sample data are grouped in threes. The last class of variable X has a different number of elements; hence, the variable X classes are incomplete. All variable Y classes have an equal number of elements; therefore, the classes are complete.

Table 12. Bivariate discrete frequency table constructed using a sample of size 1000 simulated from the bivariate Poisson distribution with parameters μ1=6.5, μ2=5.5, and μ3=4.5

Table 13. Tri-element bivariate discrete frequency table constructed using a sample of size 1000 simulated from the bivariate Poisson distribution with parameters μ1=6.5, μ2=5.5, and μ3=4.5

Moreover, the proposed bivariate discrete frequency table is demonstrated for the case in which the first class has a different number of elements, using data from the bivariate Poisson distribution with parameters $\mu_1=2.5$, $\mu_2=3.5$, and $\mu_3=2.5$. Table 14 is the existing table, while Table 15 is the proposed table with elements grouped in threes and the first class of the table having a different number of elements.

Table 14. Bivariate discrete frequency table constructed using a sample of size 1000 simulated from the bivariate Poisson distribution with parameters μ1=6.5, μ2=5.5, and μ3=4.5

Table 15. Tri-element bivariate discrete frequency table, where the first class has a different number of elements, constructed using a sample of size 1000 simulated from the bivariate Poisson distribution with parameters μ1=6.5, μ2=5.5, and μ3=4.5

Application

Moreover, to illustrate the proposed table using real data, we used English Premier League team data. The data, which cover seasons 2006/2007 to 2017/2018, were obtained from the English Premier League website and deposited on the Kaggle website. The data contain 41 variables and 240 observations, but only two variables, the number of wins and the number of clean sheets, are used in this study. Table 16 presents the bivariate discrete frequency table of the number of wins and clean sheets for 12 English Premier League seasons. Meanwhile, Table 17 displays the tri-element bivariate discrete frequency table. The variables X and Y, respectively, represent the number of clean sheets and wins. Across all the seasons, Manchester City recorded the highest number of wins, 32, with 18 clean sheets, followed by Chelsea with 30 wins and 16 clean sheets. The worst-performing club across all the seasons is Derby County, with only one win and three clean sheets. Using Equation (1), the proposed bivariate frequency table, Table 17, is constructed by grouping the elements in threes (tri-element). Both variables have a different number of elements in the last class; hence, the proposed table is incomplete. The class modes, $x_{mo}$ and $y_{mo}$, represent the magnitude of observations in each class of the two variables.

Table 16. Bivariate discrete frequency table constructed using data on the number of wins and clean sheets for English Premier League clubs from season 2006/2007 to 2017/2018

Table 17. Tri-element bivariate discrete frequency table constructed using data on the number of wins and clean sheets for English Premier League clubs from season 2006/2007 to 2017/2018

In the proposed table, Table 17, the numbers of clean sheets and wins are grouped in threes. Only one club recorded wins in the interval {1, 3, 4} with clean sheets in the interval {2, 3, 4}, and two clubs had wins in the interval {1, 3, 4} with clean sheets in the interval {5, 6, 7}. The pattern continues up to the last wins class, where one club had wins in the interval {30, 32} with clean sheets in the interval {14, 15, 16}. Likewise, only one club had wins and clean sheets in the intervals {30, 32} and {17, 18, 19}, respectively. The proposed bivariate frequency table, Table 17, is more manageable than its existing counterpart, Table 16. Indeed, the existing table ceases to be practical when the numbers of elements in the two variables, $m_1$ and $m_2$, are large.

5. Conclusion

The proposed bivariate discrete frequency table is more manageable and straightforward than its existing counterpart. Indeed, the table provides a better option when the number of elements in the paired discrete data is substantial.

Public interest statement

Exploratory data analysis (EDA) plays a significant role in statistics. The existing bivariate discrete frequency table, one of the EDA tools, is straightforward, but when the number of elements in the data is large, the table can be complicated. This research proposed a new bivariate discrete frequency table that groups the elements in each variable. The new table is described using simulations from the bivariate binomial and bivariate Poisson distributions, and real data obtained from the English Premier League website. The public will find the new table helpful, as it provides a better alternative when the number of elements is substantial and reveals the essential data features.

Acknowledgements

This research is partially funded by Universiti Putra Malaysia grant GP/2018/969400. The first author is supported by Tetfund, Federal government of Nigeria scholarship grant.

Disclosure statement

No potential conflict of interest was reported by the authors.

Additional information

Funding

This work was supported by the

Notes on contributors

M. B. Mohammed

M. B. Mohammed is a Lecturer in the Department of Mathematics and Computer Science, Federal University of Kashere, Gombe State, Nigeria. His research interest is in exploratory data analysis, extreme value theory, and circular statistics, among others.

M. B. Mohammed is a lecturer at the Department of Mathematics and Statistics, Federal University of Kashere, Gombe State, Nigeria. He was born on 16 May 1982 in Gombe, Gombe State, Nigeria. He, respectively, obtained his national diploma and bachelor’s degree in statistics from Federal Polytechnic Damaturu and Modibbo Adama University of Technology Yola, Nigeria. He finished his Master of Sciences in Statistics from the University of Ilorin, Nigeria, in 2015. He obtained his Ph.D. in statistics in the area of Exploratory Data Analysis (EDA).

His research interest is in Exploratory Data Analysis, Extreme Value, Circular Statistics, and Survival Analysis.

H. S. Zulkafli

H. S. Zulkafli is also a senior lecturer in the Institute for Mathematical Research, Universiti Putra Malaysia. She has vast experience in Bayesian statistics.

N. Ali

N. Ali is a senior lecturer in the Institute for Mathematical Research, Universiti Putra Malaysia. Her research interest is in extreme value theory.

O. R. Olaniran

O. R. Olaniran is a lecturer in the statistics department, University of Ilorin, Kwara State, Nigeria. His research areas are biostatistics and data science. He has experience in Bayesian statistics, mathematical statistics and inference, statistical computing, biostatistics and survival analysis, and time series and econometrics.

References

  • Beniger, J. R., & Robyn, D. L. (1978). Quantitative graphics in statistics: A brief history. The American Statistician, 32(1), 1–11. https://www.tandfonline.com/doi/abs/10.1080/00031305.1978.10479235
  • Cochran, W. (1954). Some methods for strengthening the common chi-square test. Biometrics, 10(4), 417–451. https://doi.org/10.2307/3001616
  • Davies, G. R. (1929). The analysis of frequency distributions. Journal of the American Statistical Association, 24(168), 349–366. https://doi.org/10.1080/01621459.1929.10502532
  • Doane, D. P. (1976). Aesthetic frequency classifications. The American Statistician, 30(4), 181–183. https://www.tandfonline.com/doi/abs/10.1080/00031305.1976.10479172
  • Fisher, M. J., & Marshall, A. P. (2009). Understanding descriptive statistics. Australian Critical Care, 22(2), 93–97. https://doi.org/10.1016/j.aucc.2008.11.003
  • Freedman, D., & Diaconis, P. (1981). On the histogram as a density estimator: L2 theory. Probability Theory and Related Fields, 57(4), 453–476. https://doi.org/10.1007/BF01025868
  • Gardiner, V., Gardiner, G., & Catmog, G. G. (1979). Analysis of frequency distributions.
  • Gelman, A. (2011). Why tables are really much better than graphs. Journal of Computational and Graphical Statistics, 20(1), 3–7. https://doi.org/10.1198/jcgs.2011.09166
  • Gelman, A., Pasarica, C., & Dodhia, R. (2002). Let’s practice what we preach: Using graphs instead of tables. The American Statistician, 56(2), 121–130. https://doi.org/10.1198/000313002317572790
  • Gravetter, F. J., Wallnau, L. B., Forzano, L.-A. B., & Witnauer, J. E. (2020). Essentials of statistics for the behavioral sciences. Cengage Learning.
  • Karlis, D., & Ntzoufras, I. (2003). Analysis of sports data by using bivariate Poisson models. Journal of the Royal Statistical Society: Series D (The Statistician), 52(3), 381–393. https://doi.org/10.1111/1467-9884.00366
  • Kastellec, J. P., & Leoni, E. L. (2007). Using graphs instead of tables in political science. Perspectives on Politics, 5(4), 755–771. https://doi.org/10.1017/S1537592707072209
  • Kenney, J. F. (1939). Mathematics of statistics. D. Van Nostrand.
  • Manikandan, S. (2011). Frequency distribution. Journal of Pharmacology & Pharmacotherapeutics, 2(1), 54. https://doi.org/10.4103/0976-500X.77120
  • Mohammed, M. B., Adam, M. B., Ali, N., & Zulkafli, H. S. (2020). Improved frequency table’s measures of skewness and kurtosis with application to weather data. Communications in Statistics - Theory and Methods, 1–18. https://doi.org/10.1080/03610926.2020.1752386
  • Mohammed, M. B., Adam, M. B., Zulkafli, H. S., & Ali, N. (2020). Improved frequency table with application to environmental data. Mathematics and Statistics, 8(2), 201–210. https://doi.org/10.13189/ms.2020.080216
  • Scott, D. W. (1979). On optimal and data-based histograms. Biometrika, 66(3), 605–610. https://doi.org/10.1093/biomet/66.3.605
  • Sturges, H. A. (1926). The choice of a class interval. Journal of the American Statistical Association, 21(153), 65–66. https://doi.org/10.1080/01621459.1926.10502161
  • Xu, D., & Wang, Y. (2020). Area-proportional visualization for circular data. Journal of Computational and Graphical Statistics, 29(2), 351–357. https://doi.org/10.1080/10618600.2019.1654881

6. Appendix

Bitable <- function(data, group) {
  # data:  the bivariate discrete data (two columns)
  # group: the number of elements in each class,
  #        which can be determined using Equation (1).

  ## Creating the univariate frequency table
  Unitable <- function(data, colNum) {
    freq <- data[, colNum]
    id <- sum(grepl("\\.", freq))      # number of values containing a decimal point
    sorted_data <- sort(freq)
    uni_freq <- unique(sorted_data)
    n_freq <- length(uni_freq)
    if (id == 0) {
      freq2 <- table(freq)
      freq_discrete1 <- as.data.frame(freq2)
      col_1 <- c(1:n_freq)
      freq_discrete2 <- cbind(col_1, freq_discrete1)
      colnames(freq_discrete2)[1:3] <- c("class", "xi", "f")
      freq_discrete2
    } else {
      print("Not Discrete Data")
    }
  }

  ## Split a vector into groups; leftover elements go to an "overflow" group
  spliter <- function(x, n, force.number.of.groups = TRUE,
                      len = length(x), groups = trunc(len / n), overflow = len %% n) {
    if (force.number.of.groups) {
      f1 <- as.character(sort(rep(1:n, groups)))
      f <- as.character(c(f1, rep(n, overflow)))
    } else {
      f1 <- as.character(sort(rep(1:groups, n)))
      f <- as.character(c(f1, rep("overflow", overflow)))
    }
    g <- split(x, f)
    if (force.number.of.groups) {
      g.names <- names(g)
      g.names.ordered <- as.character(sort(as.numeric(g.names)))
    } else {
      g.names <- names(g[-length(g)])
      g.names.ordered <- as.character(sort(as.numeric(g.names)))
      g.names.ordered <- c(g.names.ordered, "overflow")
    }
    return(g[g.names.ordered])
  }

  ## Mode(s) of a vector; ties are returned when return_multiple = TRUE
  stat_mode <- function(x, return_multiple = TRUE, na.rm = FALSE) {
    if (na.rm) {
      x <- na.omit(x)
    }
    ux <- unique(x)
    freq <- tabulate(match(x, ux))
    mode_loc <- if (return_multiple) which(freq == max(freq)) else which.max(freq)
    return(ux[mode_loc])
  }

  ## Paste the elements of one class into a single comma-separated label
  split_list_into_single = function(LIST, ind) {
    xlist = LIST[[ind]]
    paste0(xlist[1:length(xlist)], ",", collapse = "")
  }

  ## Mode(s) of one class, pasted into a single comma-separated label
  get_mode = function(xx, ff, ind) {
    xm = rep((xx[[ind]]), ff[as.numeric(xx[[ind]])])
    paste0(stat_mode(xm), ",", collapse = "")
  }

  create_tab = function(data, colNum, nvar) {
    # nvar is the number of elements intended in each class
    # colNum is the column number
    Freq_table = Unitable(data, colNum)   ## generate frequency table
    x = Freq_table[, 2]                   # subset x
    freq = Freq_table[, 3]
    if (length(x) %% nvar == 0) {
      splits = spliter(x, n = length(x) / nvar, force.number.of.groups = TRUE)
    } else {
      splits = spliter(x, nvar, force.number.of.groups = FALSE)
    }
    sumf = as.numeric(lapply(splits, function(i) sum(freq[i])))
    tab = matrix(NA, nrow = length(splits), ncol = 3)
    for (i in 1:length(splits)) {
      tab[i, ] = c(split_list_into_single(splits, i), sumf[i], get_mode(splits, freq, i))
    }
    tabf = data.frame(tab)
    colnames(tabf) = c(paste0("xi", ",", "xi+1", ",", "xi+2", " … "), "freq",
                       paste0("Mode", "(", "xi", ",", "xi+1", ",", "xi+2", " … ", ")"))
    if (sumf[length(splits)] != 0) {
      tabf
    } else {
      tabf[-length(splits), ]
    }
  }

  #### Creating the bivariate frequency table

  ## Function that categorizes the elements in the data
  cut.unique = function(x, group) {
    uniquevalues = unique(x)
    sort.uni = sort(uniquevalues)
    ngroups = ceiling(length(sort.uni) / group)
    em = NULL
    if (length(sort.uni) %% group == 0) {
      for (i in 1:ngroups) {
        emf = NULL
        for (j in (group - 1):0) {
          emf = c(emf, paste0(sort.uni[i * group - j], ","))
        }
        em = c(em, paste0(emf, collapse = ""))
      }
    } else {
      ngroups2 = ceiling((length(sort.uni) - (length(sort.uni) %% group)) / group)
      em. = NULL
      for (i in seq_len(ngroups2)) {
        emf. = NULL
        for (j in (group - 1):0) {
          emf. = c(emf., paste0(sort.uni[i * group - j], ","))
        }
        em. = c(em., paste0(emf., collapse = ""))
      }
      # The source line is truncated here; the last (incomplete) class is
      # reconstructed by pasting the remaining elements in the same way.
      last = sort.uni[(length(sort.uni) - length(sort.uni) %% group + 1):length(sort.uni)]
      em. = c(em., paste0(paste0(last, ","), collapse = ""))
      em = em.
    }
    em
  }

  # Function that creates unique classes of group size g
  # install.packages("stringr")
  library(stringr)
  cutdiscrete = function(x, g) {
    # x: discrete values
    # g: intended number of elements in each class
    uniqueg = cut.unique(x, g)
    fvec = matrix(NA, nrow = length(x), ncol = length(uniqueg))
    for (i in 1:length(uniqueg)) {
      classi = as.numeric(stringr::str_extract_all(uniqueg[i], "\\d+")[[1]])
      if (length(classi) != 1) {
        begin = classi[1]
        endin = classi[length(classi)]
        fvec[, i] = ifelse((x >= begin) & (x <= endin), i, 0)
      } else {
        fvec[, i] = ifelse(x == classi, i, 0)
      }
    }
    ffvec = rowSums(fvec)
    ffvec2 = factor(ffvec, labels = uniqueg)
    ffvec2
  }

  ## A function that constructs the two-way frequency table
  bivfreqtab = function(data, group) {
    if (max(length(unique(data[[1]])), length(unique(data[[2]]))) <= 10) {
      tab = table(data[[1]], data[[2]])
    } else {
      s1 = cutdiscrete(data[[1]], group)
      s2 = cutdiscrete(data[[2]], group)
      tab = table(s1, s2)
    }
    return(tab)
  }

  # Get the modes using the create_tab function
  data1 = data.frame(data[[1]])
  data2 = data.frame(data[[2]])
  mod1 = create_tab(data1, 1, nvar = group)[, 3]   # modes of the first variable
  mod2 = create_tab(data2, 1, nvar = group)[, 3]   # modes of the second variable
  tab = bivfreqtab(data = data, group = group)
  tabc = rbind(colmode = paste0(mod2), tab, deparse.level = 0)
  tabf = data.frame(cbind(rowmode = c("", paste0(mod1)), tabc))
  colnames(tabf) = c("rowmode", colnames(tab))
  tabf
}