Research Articles

Occupational health knowledge discovery based on association rules applied to workers’ body parts protection: a case study in the automotive industry

Pages 1875-1888 | Received 25 Apr 2022, Accepted 15 Nov 2022, Published online: 08 Dec 2022

Abstract

Occupational Health Protection (OHP) is mandatory by law and can be accomplished with the participation of professionals other than occupational physicians. The data they share can generate knowledge that might influence other processes related to occupational risk prevention. In this study, we used Artificial Intelligence (AI) methods to extract patterns from records shared under these circumstances over two years in the automotive industry. Records containing OHP data related to physical working conditions were selected, and a database of 383 profiles was designed. Because the Occupational Health Protection profiles under study are associated with a reduction in functional work ability, the body parts (n = 14) in which this reduction occurred were identified. Association Rules (ARs) coupled with Natural Language Processing techniques were applied to find meaningful hidden relationships and to identify protection profiles assigned to at least two body parts simultaneously. After filtering the ARs using three metrics (support, confidence, and lift), 54 ARs were found. The number of body parts affected simultaneously was highest in Special Projects (n = 5). The results can be used to: (i) design a multi-site body parts functional work ability (loss) model; (ii) model the capacity of organizations to retain workers in their working settings; and (iii) prevent work-related musculoskeletal symptoms.

1. Introduction

Employee health protection has become a competitive advantage for industries in various fields. Although workers’ health care may be costly, it brings many benefits to both workers and industry. These include (1) creating a sense of job security and importance among workers Kuhnert et al. (1989), (2) reducing medical costs by preventing severe injuries Miller and Levy (2000), and (3) reducing absenteeism-related costs by preventing injuries. Together, these benefits reduce costs and prevent the forced absence of trained, specialized, and experienced employees, which improves industrial performance Galizzi and Boden (2003).

Various organizations establish special protocols to protect the health of their employees. In industrial settings where workers perform jobs with risk factors for work-related musculoskeletal disorders, ergonomists, besides Occupational Physicians (OPs), are also responsible for workers’ health protection. One possible contribution is sharing the results of their risk assessments, thereby supporting the decision-making process regarding the match between the job demands of each workplace and workers’ remaining work ability Santos et al. (2020). In the automotive industry, biomechanical risk factors and musculoskeletal symptoms are mainly described in terms of the upper limbs and low back. In the organization where this study took place, the team of ergonomists evaluated the biomechanical risk factors using the European Assembly Worksheet (EAWS) method Schaub et al. (2013). These ergonomics experts take part in the decision-making process because they are also responsible for assessing occupational biomechanical overload risk factors Bernardes et al. (2022); Assunção et al. (2021). This approach is often employed and validated in the automotive sector. The theoretical model that supports it goes beyond the traditional concept of limiting values, such as the NIOSH recommended weight limit for lifting Dick et al. (2016) and the limits in ISO 11226, ISO 11228-1, ISO 11228-2, and ISO 11228. Furthermore, the EAWS method produces a traffic-light score that classifies the exposure severity level of each evaluated workstation. EAWS is divided into four sections for evaluating (1) working postures and movements with low additional physical effort; (2) action forces of the whole body or the hand-finger system; (3) manual material handling (>3 kg); and (4) repetitive loads of the upper limbs.
As a result, OPs determine each worker’s authorized exposure to occupational risk factors based on the worker’s current state of health. The OP takes clinical data into consideration but shares only what is called a ‘medical restriction’, in which the OP states the exposure limits considered necessary to ensure the worker’s health protection by:

  • Establishing two levels of severity: (a) Must Not be used (MN), when the functional work ability is at its lowest level; (b) Should Not be used (SN), when the functional work ability still allows a given exposure dose.

  • Identifying the body region in which the ability changed.

  • Characterizing the functional work ability itself: (a) above the shoulder; (b) rotation/bending; (c) torsion; (d) force application; (e) manual handling of loads; (f) use of tools that transmit vibrations.

In this database, the records containing these medical restrictions for workers are called Occupational Health Protection Profiles.

The matching procedure implies that the OP assigns each worker an authorized exposure to occupational risk factors given the worker’s health status. Occupational Health Protection Profiles (OHPPs) are established by the OPs without reproducing any clinical data; instead, these profiles, written in Portuguese, report limitations regarding both physical and organizational working conditions. The OP records the worker’s Functional Work Ability (FWA) in the OHPP as one or more comments based on her/his health status, so that the technical teams can plan the continuation of the worker’s job demands accordingly. Because the decision must identify jobs compatible with workers’ health status, and only OPs are allowed to make it, the OPs share, in a previous step, the OHPPs in which they establish thresholds for occupational exposure.

In this study, we use OHPP text records collected between 2019 and 2020 in the automotive sector of the Portuguese manufacturing industry. We propose applying Natural Language Processing (NLP) Mollaei et al. (2022a) to workers’ occupational health texts written in Portuguese. After converting the text into structured data, we propose a new approach in which simultaneously affected body regions are represented by ARs discovered with pattern-mining algorithms that extract hidden rules.

We extracted hidden rules from these data. This method, called AR mining, helps us find meaningful and significant patterns in workers’ FWA and present them to technical teams. AR mining also has an advantage over methods such as ANalysis Of VAriance (ANOVA) Ferreira et al. (2019) because the data are mainly nominal. As a result, it can contribute to modeling an indicator of the capacity of organizations to retain their workforce in their working settings and to cope with work-related absenteeism. The findings could also be used to prevent work-related musculoskeletal disorders because they identify the body parts with the lowest FWA; improvements to working conditions can then be planned accordingly in production areas and in the Special Projects area.

The rest of this article is structured in the following way: Section 2 reviews the previous related work. Section 3 describes the data collection method and the proposed approach of this research in detail. Section 4 provides experiments and results, and Section 5 provides conclusions and suggestions for future work.

2. Related work

Today, many medical devices use AI technologies in their hardware and software. In addition, with the electronic registration of medical records in many medical, administrative, and industrial centers, medical data has become an important database for data science researchers. Analysis of this data can be beneficial in the prevention, diagnosis, and treatment of diseases.

In the occupational domain, Kakhki et al. (2020) used machine learning algorithms to predict the extent of body part damage in accidents at grain handling facilities, which can help reduce the incidence of such accidents. In another study, Kim et al. (2020) used NLP techniques to extract keywords from pathology reports in electronic health records. They trained the KEA Witten et al. (2005) and WINGNUS Nguyen and Luong (2010) keyword extraction models through supervised learning and then evaluated them on labeled and unlabeled data.

AR mining, as defined in Knowledge Discovery and Data Mining (KDD), finds relationships in datasets stored as itemsets. Researchers have mined association rules to obtain explainability Mollaei et al. (2022b), causality Goldberg (2019); Hela et al. (2018), relationships Rashid et al. (2014), or both Lakshmi and Vadivu (2017). By applying the Apriori method to a diabetic patient database, Stilou et al. (2001) discovered interesting ARs. Yang et al. (2013) used ARs to extract useful patterns from Traditional Korean Medicine (TKM) texts and identify the relationships between symptoms and the associated medicines. Harahap et al. (2018) first clustered patient prescriptions to find the top 10 diseases and then used ARs to identify the relationships between diseases and medicines. Chou (2020) applied AR mining to Taiwanese medical data and obtained results that could improve the treatment of sub-healthy people at high risk. Sarıyer and Öcal Taşar (2020) used AR mining to determine the relationship between patients’ diagnoses and diagnostic test requirements; the extracted rules, validated by emergency department experts and practitioners, can save time and cost for patients.

Employee health is such an important issue that managers in various industries pay close attention to it. Extracting ARs from staff health data can reveal hidden patterns that managers can use to support further industrial development. Cheng et al. (2010) used AR mining to extract cause-and-effect relationships related to construction project hazards from accident data in the Taiwanese construction industry. Huang (2013) extracted ARs between the lifestyles of factory workers in Taiwan and metabolic syndrome. In addition, Weng et al. (2016) proposed an AR-based method to analyze the characteristics and contributory factors of work zone crash casualties, using Michigan work zone crash data from 2004 to 2008.

The research above demonstrates the importance of AR mining in a variety of disciplines. AR mining is a rule-based machine learning strategy (a form of unsupervised learning Anand Hareendran and Vinod Chandra (2017)) for discovering interesting relationships between variables in real-world databases and links between frequently co-occurring items. The resulting rules take the form of if/then expressions that highlight connections within databases or other data sources. ARs are also used in other domains, such as web usage mining, intrusion detection, continuous production, and bioinformatics Miswan et al. (2021). Although previous studies that extracted ARs from health and safety data were useful to industry executives, none of them addressed the relationship between job demands and a worker’s work ability based on body parts. To fill this research gap, we present a new approach that combines NLP and AR mining, motivated by the importance of ARs described above and their potential in industrial use. AR mining is a well-known technique for revealing hidden patterns Hela et al. (2018). This property is especially useful for supporting a multi-site body part FWA loss model and for determining how varied job demands influence changes in workers’ work ability, and hence which working conditions should be improved.

3. Methodology

By law, occupational health surveillance is mandatory. At least every two years, each worker should be scheduled for a health exam (a) by the OPs (b). Whenever changes in a worker’s work ability are identified (c), an OHPP is assigned (d) to the worker so that a decision can be made regarding the tasks he/she can perform. Each OHPP is recorded in a digital file (d), represents the dimensions considered necessary to establish the desired Occupational Health Protection Profile, and comprises several data fields. NLP techniques (e) were used to read the OPs’ records of workers’ FWA status into an Excel database, to process each cleaned comment, and to extract the keywords (f). As stated earlier, association rule mining (g) was used to find relationships between different body parts (Figure 1). In this section, we describe in detail our proposed method for extracting hidden knowledge from records of workers with changes in FWA. The process has two phases:

Figure 1. Outline of the main idea of this research: using mined association rules to discover associated body parts. (a) Workers have a health appointment at least every two years. (b) OPs assess exposure to occupational risk factors. (c) When changes in a worker’s work ability are discovered, medical restrictions are defined for the worker (d) so that a decision can be made about the jobs he or she can perform; each medical restriction is stored in a digital file (d) that reflects the dimensions considered necessary to create the desired OHPP and includes various data fields. (e) The data fields are read from PDF files into an Excel database, and (f) NLP techniques are applied. (g) As previously indicated, association rule mining is used to discover the relationships between body parts; the results might be used by OPs.

  • In the first phase (Section 3.1), the required data were collected and prepared.

  • In the second phase (Section 3.2), the knowledge contained in the final data was extracted and analyzed using the Apriori method.

The results of the second phase will help explain not only how different job demands influence changes in workers’ work ability but also which working conditions should be improved.

3.1. Data collection

In the automotive industry, decreases in FWA are a significant concern because they can result in work absence. Each OHPP has 12 data fields describing organizational (n = 4), individual (n = 2), work-related (n = 4), and time (n = 2) parameters. The most relevant fields for our study are the work-related ones, which contain the changes in FWA. Their content follows a three-layer structure: the first layer states the level of protection; the second layer refers to the deviation in FWA, which is always reported for one of fourteen (14) body parts: neck, trunk, shoulder (L = left, R = right), elbow (L & R), wrist (L & R), hand fingers (L & R), knee (L & R), and foot (L & R); and the third layer contains additional information labelled as ‘comments’.

The database designed for this study uses 3 of the 12 data fields: one from the organizational cluster (area), one from the individual cluster (employee number), and one from the work-related cluster (OHPPs). To properly extract the text from this last cluster, an NLP protocol had to be designed, and it had to be sensitive to the three-layer structure of each sentence described above.

For this study, we considered OHPPs associated with FWA registered in the health system from January 2019 to December 2020. In this period, 2025 records were registered. Because each record refers to a single worker and the same worker can have more than one record in this period, a cleaning procedure was applied to remove repeated records for the same worker. After this, the number of records decreased to 391, which is the total number of workers to whom medical restrictions were assigned over the two years. The distribution of these 391 records across the areas to which workers are assigned is as follows: 51% in Assembly, 20% in Body Construction, 13% in Special Projects, 6% in Quality Assurance, 4% in Paint, and 4% in Metal Stamping. Other areas, such as Product Management & Planning, Logistics, Plant Manager, and Finance, contain less than 2% of the records, too few to extract reliable knowledge; therefore, we did not consider these areas in this research. Figure 2 shows this distribution.
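The cleaning and area-filtering steps above can be sketched as follows. This is a minimal illustration, assuming each record is a dictionary with hypothetical `employee` and `area` keys (standing in for the employee number and area fields) and that the first record seen for a worker is the one kept; the paper does not state which of a worker's records was retained.

```python
from collections import Counter


def deduplicate_by_worker(records):
    """Keep only the first record seen for each employee number (assumption)."""
    seen = set()
    unique = []
    for rec in records:
        if rec["employee"] not in seen:
            seen.add(rec["employee"])
            unique.append(rec)
    return unique


def filter_small_areas(records, min_share=0.02):
    """Drop records from areas holding less than min_share of all records."""
    counts = Counter(rec["area"] for rec in records)
    total = len(records)
    kept_areas = {area for area, c in counts.items() if c / total >= min_share}
    return [rec for rec in records if rec["area"] in kept_areas]


# Toy data (hypothetical): worker 1 appears twice; "Finance" is a very small area.
records = (
    [{"employee": 1, "area": "Assembly"}, {"employee": 1, "area": "Assembly"}]
    + [{"employee": i, "area": "Assembly"} for i in range(2, 60)]
    + [{"employee": 100, "area": "Finance"}]
)
unique = deduplicate_by_worker(records)
kept = filter_small_areas(unique)
print(len(records), len(unique), len(kept))
```

Here the repeated record for worker 1 is removed (61 to 60 records), and the single Finance record, about 1.7% of the cleaned data, falls below the 2% share and is excluded.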

Figure 2. Distribution of data records among the areas.


Because OHPPs are written by the OPs as free text, several deviations were found between different OPs. Minor deviations included non-word symbols, extra spaces, misspellings, and incorrect capitalization. Major deviations were also identified; for example, the first layer, which states the MN and/or SN criterion, was sometimes written at the end of the sentence instead of at the beginning. Hence, we used the Natural Language Toolkit (NLTK) to clean up the data. We also used NLP techniques to process each cleaned sentence and extract the keywords for the ‘body part’ and the ‘risk label’ assigned to it by the OPs. Whenever an OHPP did not explicitly describe the body part under protection but instead referred to processes, e.g. ‘manual material handling’ or ‘vibration tools’, no body parts were recorded in the database. Finally, we stored the obtained information in tabular data structures. Table 1 shows two rows of the final tabular structured data as an example.

Table 1. Two rows of final tabular structured data.
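The normalization and keyword-extraction steps described above could look roughly like the sketch below. It is not the authors' NLTK pipeline: it uses only Python's `re` and `unicodedata` modules, and the Portuguese lexicon (`BODY_PARTS`, `SIDES`) and output names such as `ShoulderR` are hypothetical stand-ins for the OPs' actual vocabulary. The MN/SN label is looked up anywhere in the sentence, which sidesteps the deviation where it appears at the end.

```python
import re
import unicodedata

# Hypothetical lexicon: normalized Portuguese keyword -> body part name.
BODY_PARTS = {
    "pescoco": "Neck",
    "tronco": "Trunk",
    "ombro": "Shoulder",
    "cotovelo": "Elbow",
    "punho": "Wrist",
    "dedos": "HandFingers",
    "joelho": "Knee",
    "pe": "Foot",
}
# Hypothetical side keywords (masculine and feminine forms).
SIDES = {"esquerdo": "L", "esquerda": "L", "direito": "R", "direita": "R"}
LABELS = {"mn", "sn"}


def normalize(text):
    """Strip accents, lower-case, replace non-word symbols, collapse spaces."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def extract(sentence):
    """Return (risk_label, [body parts]) found in one restriction sentence."""
    tokens = normalize(sentence).split()
    # The label may appear at the start or at the end of the sentence.
    label = next((t.upper() for t in tokens if t in LABELS), None)
    parts = []
    for i, tok in enumerate(tokens):
        if tok in BODY_PARTS:
            # A side keyword, if present, is expected right after the body part.
            side = SIDES.get(tokens[i + 1], "") if i + 1 < len(tokens) else ""
            parts.append(BODY_PARTS[tok] + side)
    return label, parts


print(extract("Ombro  direito, punho esquerdo: evitar força (MN)"))
```

For the sample sentence, accents and punctuation are removed, the trailing `(MN)` is still recognized as the risk label, and the result is `('MN', ['ShoulderR', 'WristL'])`.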

3.2. Proposed method

In Section 3.1, we explained how OPs write OHPPs and how each sentence is structured to address the desired exposure condition and protection level (MN or SN). In this study, we used AI methods to extract patterns in workers’ health data regarding OHPPs and proposed AR mining as a new approach. The goal is to determine (i) whether an OHPP addresses different body parts at the same time and, if so, (ii) how many body parts are involved, (iii) what relationships they have, and (iv) how they differ between working areas. If occupational health protection addresses multiple body parts at the same time and there are still jobs whose demands match those workers’ work ability, this association could be used as a criterion to evaluate an organization’s ability to retain its workforce in its working environments and, consequently, to manage work-related absenteeism. Accordingly, the more body parts reported as associated, the more resilient the organization might be.

As stated before, OPs assign MN protection categories to ensure the highest possible level of health protection, which leads to a procedure whose final result is the identification of jobs that match workers’ FWA. In terms of occupational exposure management, protections classified as MN require the absence of the corresponding risk factor. Theoretically, therefore, the dose must be zero in every risk exposure dimension: duration, frequency, and intensity. Although the automotive industry’s working systems increasingly comply with ergonomic design criteria, working conditions are still far from zero-dose exposure. Consequently, MN protections might represent the intended protection rather than the worker’s remaining FWA. Since the risk exposure dimensions are used to match job demands with FWA, it might be advantageous to review the occupational health protection assignment procedure: instead of qualitative expressions targeting the intended health protection, quantitative data related to the risk exposure dimensions should be preferred. For example, if one of the rules extracted in area X is ‘WristL, ShoulderL → ElbowL’, workers’ exposure to occupational risk factors in that area is likely to change FWA in those body parts.

Occupational technicians in this industry consider not only the FWA but also whether each body-part restriction is MN or SN. We therefore modified the final tabular dataset generated in Section 3.1 for the AR mining process by dividing each body-part column into two columns, one for the MN label and one for the SN label. For example, the ShoulderL column was converted into two columns named ShoulderLMN and ShoulderLSN. The result was a table with 28 columns (14 body parts × 2 labels) whose cells contain 0 or 1.
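A minimal sketch of this 28-column 0/1 encoding is shown below, assuming hypothetical column names built as body part + side + label (e.g. `ShoulderLMN`), following the ShoulderLMN/ShoulderLSN example above. Unknown items raise an error rather than being silently dropped.

```python
# 14 body parts: neck and trunk, plus six paired parts with left/right sides.
BODY_PARTS = ["Neck", "Trunk"] + [
    f"{part}{side}"
    for part in ("Shoulder", "Elbow", "Wrist", "HandFingers", "Knee", "Foot")
    for side in ("L", "R")
]
# Each body-part column is split into an MN column and an SN column.
COLUMNS = [f"{bp}{label}" for bp in BODY_PARTS for label in ("MN", "SN")]


def encode(restrictions):
    """Map a worker's (body part, label) pairs to a 0/1 row over the 28 columns."""
    active = {f"{bp}{label}" for bp, label in restrictions}
    unknown = active - set(COLUMNS)
    if unknown:
        raise ValueError(f"unknown items: {sorted(unknown)}")
    return [1 if col in active else 0 for col in COLUMNS]


row = encode([("ShoulderL", "MN"), ("WristL", "SN")])
print(len(COLUMNS), sum(row))
```

Each row is then one transaction for AR mining; a worker restricted on ShoulderL (MN) and WristL (SN) gets exactly two 1s out of 28 cells.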

There are many algorithms for mining ARs, such as Apriori Agrawal et al. (1994); Ogihara et al. (1997) and FP-Growth Han et al. (2000). In this study, we use the Apriori algorithm introduced by Agrawal et al. in 1994, the most classical and important induction algorithm for mining frequent itemsets and ARs Solanki and Soni (2013); Riondato and Upfal (2014). Apriori can be applied to databases, especially transactional databases containing a certain number of fields or items. It is also more commonly used than other AR algorithms, especially in industry Cheng et al. (2010); Huang (2013); Agrawal et al. (1994); Han et al. (2000); Istrat and Lalić (2017); Ranjan and Sharma (2019).

In addition, the following differences between Apriori and FP-Growth led us to select Apriori for this research: (a) Apriori is array-based, whereas FP-Growth is tree-based; (b) Apriori uses a join-and-prune technique, whereas FP-Growth builds a conditional frequent-pattern tree and a conditional pattern base from the dataset that satisfy the minimum support; (c) Apriori uses breadth-first search, whereas FP-Growth uses depth-first search; (d) candidate generation in Apriori is highly parallelizable, whereas in FP-Growth the data are interdependent, because each node requires the root; (e) Apriori scans the data multiple times to generate candidate sets, whereas FP-Growth scans the dataset only twice to build the frequent-pattern tree Agrawal et al. (1994); Han et al. (2000); Istrat and Lalić (2017); Ranjan and Sharma (2019). We therefore used Apriori to mine the ARs among the functional work ability limitations of workers in every work area of the automotive industry.
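For reference, the breadth-first join-and-prune procedure that distinguishes Apriori can be sketched in plain Python. This is a textbook-style illustration, assuming each record is a set of item names; it is not the implementation used in the study, and it favors clarity over speed (support is recounted by scanning the records for each candidate). The toy records are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of records that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}

    k = 2
    while current:  # breadth-first: one itemset size per iteration
        # Join: merge pairs of frequent (k-1)-itemsets whose union has k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: a k-itemset can only be frequent if all its (k-1)-subsets are.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count supports (real implementations count a whole level in one pass).
        current = set()
        for c in candidates:
            s = support(c)
            if s >= min_support:
                frequent[c] = s
                current.add(c)
        k += 1
    return frequent


records = [
    {"WristLMN", "ShoulderLMN", "ElbowLMN"},
    {"WristLMN", "ShoulderLMN"},
    {"ShoulderLMN", "ElbowLMN"},
    {"KneeRSN"},
]
freq = apriori(records, min_support=0.5)
for itemset, s in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With a minimum support of 0.5, the triple {WristLMN, ShoulderLMN, ElbowLMN} is never counted: it is pruned because its subset {WristLMN, ElbowLMN} is infrequent, which is the Apriori property at work.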

Details of this algorithm are as follows. Let $BP = \{bp_1, bp_2, bp_3, \dots, bp_{28}\}$ be the set of 28 items in the columns, each item $bp_i$ representing one body part combined with its risk level. Let $D = \{mr_1, mr_2, mr_3, \dots, mr_n\}$ be the dataset of a work area, where each row $mr_i$ is the medical restriction (MR) record of a worker at a medical appointment, each $mr_i \subseteq BP$, and $n$ is the number of MRs in the dataset. In a rule $X \to Y$ extracted from this dataset, $X$ is called the antecedent and $Y$ the consequent; $X$ and $Y$ are disjoint subsets of $BP$ ($X, Y \subseteq BP$, $X \cap Y = \emptyset$). Three main measures, Support, Confidence, and Lift, are used to evaluate the rules and retain the stronger ones, that is, the rules with a higher likelihood of occurring.

  • Support is the fraction of records in $D$ that contain a given subset of $BP$ ($|\cdot|$ denotes the number of members of a set): $\text{Support}(X) = \dfrac{|\{t \in D \mid X \subseteq t\}|}{|D|}$ (1)

  • Confidence is the fraction of transactions satisfying $X$ that also satisfy $Y$: $\text{Confidence}(X \to Y) = \dfrac{\text{Support}(X \cup Y)}{\text{Support}(X)}$ (2)

  • Lift indicates negative interdependence (Lift < 1), independence (Lift = 1), or positive interdependence (Lift > 1) between the antecedent and the consequent: $\text{Lift}(X \to Y) = \dfrac{\text{Support}(X \cup Y)}{\text{Support}(X) \cdot \text{Support}(Y)}$ (3)
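Equations (1) to (3) can be checked directly on a toy dataset. The records below are hypothetical and only illustrate the arithmetic; each record is the set of items (body part plus label) active for one worker. The functions assume Support(X) > 0, which holds when X is taken from a frequent itemset.

```python
def support(itemset, D):
    """Eq. (1): fraction of records in D that contain every item of itemset."""
    itemset = set(itemset)
    return sum(1 for t in D if itemset <= set(t)) / len(D)


def confidence(X, Y, D):
    """Eq. (2): Support(X ∪ Y) / Support(X)."""
    return support(set(X) | set(Y), D) / support(X, D)


def lift(X, Y, D):
    """Eq. (3): Support(X ∪ Y) / (Support(X) * Support(Y))."""
    return support(set(X) | set(Y), D) / (support(X, D) * support(Y, D))


D = [
    {"WristLMN", "ShoulderLMN", "ElbowLMN"},
    {"WristLMN", "ShoulderLMN"},
    {"ShoulderLMN", "ElbowLMN"},
    {"KneeRSN"},
]
X, Y = {"ElbowLMN"}, {"ShoulderLMN"}
print(support(X, D), confidence(X, Y, D), lift(X, Y, D))
```

For ElbowLMN → ShoulderLMN, Support(X) = 0.5 and Confidence = 1 (every record with ElbowLMN also has ShoulderLMN), while Lift = 0.5 / (0.5 × 0.75) = 4/3 > 1, i.e. positive interdependence. By contrast, {WristLMN, ShoulderLMN} → ElbowLMN has Lift = 0.25 / (0.5 × 0.5) = 1, i.e. independence.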

Different references, such as Xu (2016); Fournier-Viger et al. (2012); Hikmawati et al. (2021), use different criteria to select the Apriori parameters and their values. However, most of them state that there is no definitive rule for fixing these values; the goal is a fair balance between support and confidence that yields the most appropriate rules. For each measure, a threshold chosen with appropriate domain knowledge excludes items from the frequent itemsets. The minimum support and minimum confidence thresholds strongly affect both the top-k ARs and the quality of the rules the algorithm returns: (a) if a threshold is set too high, the algorithm returns too few results and ignores important information; (b) if it is set too low, the algorithm can produce a large number of results and become exceedingly slow. A rule $X \to Y$ is retained only if $\text{Support}(X \cup Y) \geq \Delta$ and $\text{Confidence}(X \to Y) \geq \delta$, where $\Delta$ (min_Support) and $\delta$ (min_Confidence) are the minimum support and minimum confidence, respectively. In this study, $\Delta = 1\%$ and $\delta = 10\%$ (a minimum frequency of 1% and a minimum reliability of 10%). In addition, a minimum Lift (min_Lift) greater than 1 can be imposed to keep only rules that show positive interdependence. The rule formation process is repeated until the desired number of rules, based on the real industrial environment, is found. More strong rules were obtained than are reported in this article; the important rules were selected based on minimum support and minimum confidence.
The final extracted rules are those deemed important under the defined criteria and suitable for presentation to, and acceptance by, the industry.
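The filtering criteria can be sketched as below, using Δ = 0.01, δ = 0.1, and Lift > 1 as defaults. To keep the example short and self-contained, itemset supports are obtained by enumerating the subsets of each record, a brute-force stand-in for Apriori's candidate generation that is practical only because each record contains few items; the function names and toy records are hypothetical.

```python
from collections import Counter
from itertools import combinations


def itemset_supports(D):
    """Support of every itemset occurring in D, by enumerating each record's subsets."""
    counts = Counter()
    for t in D:
        items = sorted(t)
        for k in range(1, len(items) + 1):
            for sub in combinations(items, k):
                counts[frozenset(sub)] += 1
    return {s: c / len(D) for s, c in counts.items()}


def mine_rules(D, min_support=0.01, min_confidence=0.1, min_lift=1.0):
    """Rules X -> Y with Support(X ∪ Y) >= Δ, Confidence >= δ and Lift > min_lift."""
    sup = itemset_supports(D)
    rules = []
    for itemset, s_xy in sup.items():
        if len(itemset) < 2 or s_xy < min_support:
            continue
        # Split the itemset into every non-empty antecedent/consequent pair.
        for k in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), k):
                X = frozenset(antecedent)
                Y = itemset - X
                conf = s_xy / sup[X]   # Eq. (2)
                lift = conf / sup[Y]   # Eq. (3), rearranged
                if conf >= min_confidence and lift > min_lift:
                    rules.append((X, Y, s_xy, conf, lift))
    return rules


records = [
    {"WristLMN", "ShoulderLMN", "ElbowLMN"},
    {"WristLMN", "ShoulderLMN"},
    {"ShoulderLMN", "ElbowLMN"},
    {"KneeRSN"},
]
for X, Y, s, conf, lift in mine_rules(records):
    print(sorted(X), "->", sorted(Y), f"supp={s:.2f} conf={conf:.2f} lift={lift:.2f}")
```

On these records every itemset passes Δ, so the Lift > 1 condition does most of the filtering: rules such as WristLMN → ElbowLMN, whose Lift is exactly 1, are discarded.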

4. Experiments & discussion

Previous studies have applied AI algorithms to data on multiple body parts for industry executives Kakhki et al. (2020); Davoudi Kakhki et al. (2019a, 2019b). None of them has addressed the relationship between multi-site body parts across areas of the automotive industry. The association rules could support the creation of a multi-site body part functional work ability (loss) model, the modeling of organizations’ ability to keep employees in their current working settings, and the prevention of work-related musculoskeletal symptoms. In this study, we used the Apriori algorithm to extract hidden knowledge from records in terms of body regions and automotive industry areas, and we found hidden patterns between the 14 body parts and the areas where workers work. A rule can relate a pair of body parts or several body parts split between an antecedent and a consequent. The causality and explainability of the rules were interpreted by experts (ergonomists). If occupational health protection is being provided for multiple body parts simultaneously and there are still jobs whose requirements are compatible with those workers’ work ability, this association could be used as a criterion to assess an organization’s capacity to keep its workforce in its working environments and, consequently, to manage work-related absenteeism. Accordingly, the more body parts reported as associated, the more resilient the organization might be. Causality is inherently linked to decision-making: by showing which body parts may affect others, it lets technical teams, such as occupational physicians, better anticipate outcomes and intervene. The Apriori algorithm made it possible to derive, from observational data, rules that experts can interpret causally.
While these rules have the potential to support technical teams’ decisions, it is not yet known whether the output of this algorithm actually improves decision-making. That is, neither the causal interpretation of the rules nor the utility of such output for analyzing workers’ FWA has yet been evaluated. Simply presenting more information to people may not have the intended effects, particularly when they combine it with their existing knowledge. Although this study suggests that the rules can inform the choice of interventions and the prediction of outcomes, it has not examined the structure of the complexity found in machine learning or how such information is interpreted in the context of existing knowledge. In practice, association rules mined for a given area show which body parts of employees already have physical issues, so that these body parts can be protected from developing further issues in the future (for example, through job rotation plans).

In this section, we present the results of applying the proposed approach, which finds hidden patterns in our data that alternative procedures often miss. Running the Apriori algorithm on the created dataset yielded interesting rules, which might influence several processes in the organization, as we demonstrate in the following subsections. As described in the methodology, we set min_Support = 0.01, min_Confidence = 0.1, and Lift > 1. We obtained 299 rules from the Assembly area, 166 from Body Construction, 488 from Special Projects, 63 from Quality Assurance, 25 from Paint, and 103 from Metal Stamping (Figure 3). An OHPP with MN represents the highest possible level of protection; therefore, we can infer that workers assigned to this category have their FWA at its lowest level.

Figure 3. Statistics of mined association rules from different areas with Supp = 0.01, Conf = 0.1 and Lift > 1.


By running the apriori algorithm only on MN cases, the ARs identify the body parts that are targeted simultaneously. The tables in this section list only the rules with Confidence = 1. In areas where many rules were discovered (Assembly, Body construction, and Special projects), we imposed a further restriction: if different ARs contained the same body parts in a different sequence, only one of them was kept in the table. Figure 4 shows some of the rules extracted from each work area. In the following subsections, we evaluate the ARs mined from each area separately.
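The two filtering steps described above (keeping only rules with Confidence = 1, and keeping a single AR when several contain the same body parts in a different sequence) can be sketched as follows. This is an illustration under our assumptions: two rules are treated as duplicates when the union of antecedent and consequent is the same set of body parts, the first rule encountered is the one kept (the text does not say which one was retained), and the helper name `filter_rules` and the example rules are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

# (antecedent, consequent, confidence)
Rule = Tuple[FrozenSet[str], FrozenSet[str], float]


def filter_rules(rules: List[Rule], tol: float = 1e-9) -> List[Rule]:
    """Keep rules with confidence 1 and drop rules repeating the same body parts."""
    kept: List[Rule] = []
    seen: Set[FrozenSet[str]] = set()
    for antecedent, consequent, confidence in rules:
        if confidence < 1.0 - tol:
            continue  # only rules with Confidence = 1 are listed
        body_parts = antecedent | consequent
        if body_parts in seen:
            continue  # same body parts in a different sequence
        seen.add(body_parts)
        kept.append((antecedent, consequent, confidence))
    return kept


mined = [
    (frozenset({"kneeL", "wristR"}), frozenset({"footL"}), 1.0),
    (frozenset({"footL", "wristR"}), frozenset({"kneeL"}), 1.0),  # same three parts
    (frozenset({"trunk"}), frozenset({"wristL"}), 0.5),           # confidence < 1
    (frozenset({"elbowL"}), frozenset({"shoulderR"}), 1.0),
]
for antecedent, consequent, _ in filter_rules(mined):
    print(sorted(antecedent), "->", sorted(consequent))
```

Of the four hypothetical rules, two remain: 'kneeL, wristR → footL' and 'elbowL → shoulderR'. A small tolerance is used because confidence is a ratio of floating-point supports.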

Figure 4. Scheme of several Association Rules mined from areas: (a) Assembly, (b) body construction, (c) special projects, (d) quality assurance, (e) paint, and (f) metal stamping (the numbers on each shape correspond to the row of that area's mined ARs table).

4.1. Assembly area

OHPPs were assigned to 199 of the 2074 workers in the Assembly area within a two-year period, meaning that about 10% of Assembly workers had health protection in this period. Two hundred and ninety-nine association rules were found for these 199 workers, 7% of which were assigned to MN. The MN rules are reported in Table 2 after skipping the rules that report the same body parts in a different sequence; therefore, in Assembly, 14 ARs report combinations of body parts with simultaneous assignment. Eleven (79%) of the 14 body parts were found, and the maximum number of body parts simultaneously associated was 4, in 10 rules (e.g. rule 13, ‘fingerL, wristL, elbowL → shoulderL’).

Table 2. Mined association rules from assembly area.

When analyzing the distribution of each body part reported, the highest frequency found was 6, identified in the upper body (trunk), the upper limbs (left and right wrists), and the lower limbs (left knee). The data suggest that an assembly area can contain, at the same time, both the cause of the problem and its solution. The ‘cause’ is that exposure to occupational risk factors had severe consequences for the work ability of these workers, as they were given OHPPs at the highest protection level; moreover, 11 of the 14 body parts are associated with those protections, and at least 3 body parts are simultaneously associated with each profile (Table 8 and Table 9 in the Supplementary material). The ‘solution’ is that the workers for whom these results were reported were still working in the same production area, which means that jobs matching their OHPPs were found in Assembly. It also applies because, under these circumstances, these workers are not contributing to work-related absenteeism.
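The body-part frequencies quoted in this section (e.g. a highest frequency of 6 in Assembly, or the trunk in 100% of the Quality assurance rules) can be read as the number of listed rules in which a body part appears, on either side of the arrow. Under that reading, which is our assumption, the count can be sketched with `collections.Counter`; the three rules below are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def body_part_frequency(rules: Iterable[Tuple[Set[str], Set[str]]]) -> Counter:
    """Count, for each body part, the number of rules that mention it."""
    counts: Counter = Counter()
    for antecedent, consequent in rules:
        # A body part is counted once per rule, whichever side it is on.
        counts.update(antecedent | consequent)
    return counts


rules = [
    ({"kneeL", "wristR"}, {"footL"}),
    ({"footL"}, {"kneeL", "fingerR"}),
    ({"fingerR", "wristR"}, {"footL"}),
]
freq = body_part_frequency(rules)
for part, count in freq.most_common():
    print(f"{part}: {count} of {len(rules)} rules ({100 * count / len(rules):.0f}%)")
```

Here footL appears in all three rules (100%), the same kind of pattern reported for the left foot in Body construction and for the trunk in Quality assurance.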

For example, the rule ‘kneeL, wristR → footL’ in row 4 of Table 2 shows a strong pattern: because its lift is greater than 1, it is considered a strong rule. This is evidence that exposure to occupational risk factors is likely to affect FWA in these body parts. It also means that Assembly has jobs compliant with OHPPs assigned to these same body parts.
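For reference, the three measures used to filter the rules are given below in their standard form for a rule X ⇒ Y over the set T of OHPP transactions, followed by a worked example with invented counts: 20 MN profiles, 4 of which contain both kneeL and wristR, all 4 of these also containing footL, and 5 containing footL overall. These counts are hypothetical and are not taken from Table 2.

```latex
\mathrm{supp}(X \Rightarrow Y) = \frac{\lvert\{t \in T : X \cup Y \subseteq t\}\rvert}{\lvert T\rvert},
\qquad
\mathrm{conf}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)},
\qquad
\mathrm{lift}(X \Rightarrow Y) = \frac{\mathrm{conf}(X \Rightarrow Y)}{\mathrm{supp}(Y)}

% Worked example: X = {kneeL, wristR}, Y = {footL}, |T| = 20 (hypothetical counts)
\mathrm{supp}(X \cup Y) = \tfrac{4}{20} = 0.20, \quad
\mathrm{supp}(X) = \tfrac{4}{20} = 0.20, \quad
\mathrm{conf} = \tfrac{0.20}{0.20} = 1, \quad
\mathrm{supp}(Y) = \tfrac{5}{20} = 0.25, \quad
\mathrm{lift} = \tfrac{1}{0.25} = 4 > 1
```

A lift above 1 means that footL is more frequent among profiles containing kneeL and wristR than among all profiles, which is the sense in which such a rule is called strong here.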

4.2. Body construction area

Worker health protection profiles were assigned to 78 of the 661 workers in Body construction during the two years, meaning that 12% of this area's workers were covered by health protection over this period. A total of 166 ARs were discovered among the 78 workers, with 8% (n = 14) assigned to MN and 92% (n = 152) assigned to SN. After ignoring the rules that report the same body parts in a different sequence, 10 of the 14 MN ARs remained (Table 3). As a result, 10 ARs record combinations of body parts with simultaneous assignment. Five (36%) of the 14 body parts were found, and the maximum number of body parts simultaneously associated was 4, in 5 rules (Table 8 and Table 9 in the Supplementary material). In the distribution of each body part reported, the highest frequency (10) was identified in the lower limbs (left foot), followed by the right wrist and right fingers (frequency 8 each). Compared with Assembly, Body construction had a higher percentage of workers assigned OHP (12% vs 10%), yet fewer body parts were represented in the ARs (5 vs 11), for the same maximum number (n = 4) of body parts simultaneously associated. In this production area, all ARs with MN OHP report the lower limbs. As the protection given is at the highest level possible, it must somehow be associated with exposure to working conditions.
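The area percentages compared above, and those reported for the other areas in this section, can be recomputed from the reported headcounts. The short sketch below does only that arithmetic. Special projects is listed separately because only its count of workers with OHP (51) is reported, not an area headcount. As a consistency check, the OHP workers of the six areas add up to 383, the number of profiles in the database described in the Abstract.

```python
# (workers with an OHPP assigned, all workers in the area) over the two years,
# as reported in Sections 4.1 to 4.5.
areas = {
    "Assembly": (199, 2074),
    "Body construction": (78, 661),
    "Quality assurance": (23, 165),
    "Paint": (16, 550),
    "Metal stamping": (16, 204),
}
special_projects = 51  # workers with OHP assigned to Special projects (Section 4.6)

for area, (with_ohpp, total) in areas.items():
    print(f"{area}: {with_ohpp}/{total} = {100 * with_ohpp / total:.1f}%")

total_ohp = sum(w for w, _ in areas.values()) + special_projects
print("Workers with OHP:", total_ohp)
print(f"Special projects share: {100 * special_projects / total_ohp:.1f}%")
```

The printed values (9.6%, 11.8%, 13.9%, 2.9%, 7.8%, and 13.3% for Special projects) round to the percentages given in the text.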

Table 3. Mined association rules from body construction area.

This finding is relevant as it shows a possible paradox between: (i) the ergonomic risk assessment results (no evidence of knees and feet being exposed to demanding working conditions); (ii) the workers' subjective opinion (complaints about the lower limbs reported to OPs during health appointments); and (iii) the clinical evidence, observed by the OP, that determined the protection at MN level. Rule 2 (‘footL → kneeL, fingerR’) illustrates the importance of health protection systems such as the one under study, which combine multidisciplinary (Ergonomics and OPs) and participatory (workers' opinion) approaches.

4.3. Quality assurance area

Among the 165 workers in this area, 23 (14%) were assigned an OHPP. Sixty-three rules were discovered, 10% of which were assigned to MN; these MN rules are reported in Table 4. Five (36%) of the 14 body parts were identified among OHPPs, and the maximum number of body parts reported simultaneously is 4, in a single rule. Considering the distribution of body parts at the highest protection level (MN), the trunk had the maximum frequency of 6 (100% of rules) (Tables 8–10 in the Supplementary material). Just as the left foot appears in every MN rule in Body construction, in Quality assurance the trunk is associated with all MN ARs. This is expected to result from workers' exposure to occupational risk factors. As these results also correspond to MN protection, it is likely that, by 2019–2020, Quality assurance had jobs matching OHP assigned to the trunk.

Table 4. Mined association rules from quality assurance area.

According to the ARs mined for Quality assurance, the trunk is likely to be reported for workers with an OHPP assigned. According to Rule 6 in Table 4 (‘shoulderL, elbowR, wristR → trunk’), FWA reductions among Quality assurance workers can occur simultaneously in these four body parts.

4.4. Paint area

During the two years of the current study, 550 workers were working in the Paint area. Of these, 16 (3%) were assigned OHPPs, which, as presented before, reproduce the occupational exposure allowed by the OPs. For these 16 workers, 25 ARs were gathered: 8% (n = 2) assigned to MN and 92% (n = 23) to SN. After filtering the rules to retain strong and reliable ones, two ARs remained, which are given in Table 5. The number of body parts reported in the OHPPs of Paint area workers is 3: left shoulder, left elbow, and trunk (Table 8 and Table 9 in the Supplementary material). The results suggest that the number of body parts assigned in OHPPs might be correlated with the diversity of exposure to occupational risk factors: the more diverse the working conditions involving occupational biomechanical risk factors, the higher the number of body parts reported in OHPPs.

Table 5. Mined association rules from paint area.

At most two body parts were reported simultaneously by the ARs, as shown, for example, in Rule 1 (‘elbowL → shoulderR’).

4.5. Metal stamping area

The rules reproduced in Table 6 refer to the 16 workers (8%), out of 204, from Metal stamping with an OHPP assigned. One hundred and three ARs were discovered: 95% (n = 98) as SN and 5% (n = 5) as MN. At most three body regions were reported simultaneously by the ARs, and the number of body parts reported was 3: left shoulder, left wrist, and right knee, each with the same frequency (n = 4, 80%). Given that manual material handling (MMH) is the most representative process in this production area, and that the consequences of exposure to it are mainly reported for the trunk Assunção et al. (Citation2021); Bernardes et al. (Citation2022), ARs would also be expected to report this body part. However, as mentioned in Section 3.1, no body part was assigned to OHP with reference to processes such as MMH (Table 8 and Table 9 in the Supplementary material).

Table 6. Mined association rules from metal stamping area.

In Table 6, rule 5 (‘kneeR → wristL, shoulderL’) is a strong pattern that reports body parts associated with each other and correlated with the typical exposure during MMH processes. As discussed for the Paint area, low exposure diversity is also associated with fewer body parts being identified by AR mining. Therefore, only three body parts are reported across the five MN rules in Metal stamping.

4.6. Special projects area

The label ‘Special Projects Area’ was a convenience for clustering workers assigned to jobs outside their production areas. This only takes place when the OP reports a worker as unfit for jobs in production areas. At the time of this study, 51 workers were assigned to special projects. The OHPPs of workers belonging to ‘Special projects’ are typically the most challenging in terms of matching remaining work ability to job demands. Whenever no match is found, the worker is not fit to work at the automotive factory at all and will therefore be absent from work. For the 51 workers (13% of workers with OHP), 488 ARs were discovered, with 5% (n = 24) classified as MN and 95% (n = 464) as SN. After removing the 7 rules reporting the same body parts in a different sequence, the remaining 17 MN ARs are presented in Table 7. Of the 14 body parts, 8 (57%) were identified, and the maximum number of body parts simultaneously identified is 5, in 7 rules. In the distribution of each body part reported, the highest frequencies were found for the left fingers (n = 16) and the neck (n = 16) (Tables 8–10 in the Supplementary material).
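The MN and SN shares reported for the areas with explicit rule counts can be checked with simple arithmetic; in particular, 24 MN plus 464 SN rules gives the 488 rules mined for Special projects, and removing the 7 rules that repeat body parts in a different sequence leaves the 17 MN rules listed. The sketch only reproduces arithmetic on the reported counts.

```python
# (MN rules, SN rules) as reported for each area in Sections 4.2 and 4.4 to 4.6
counts = {
    "Body construction": (14, 152),
    "Paint": (2, 23),
    "Metal stamping": (5, 98),
    "Special projects": (24, 464),
}
for area, (mn, sn) in counts.items():
    total = mn + sn
    print(f"{area}: {total} rules, MN {100 * mn / total:.0f}%, SN {100 * sn / total:.0f}%")

# Special projects: MN rules left after removing 7 repeated body-part combinations
print("Special projects MN rules listed:", counts["Special projects"][0] - 7)
```

The totals (166, 25, 103, and 488) match the rule counts reported for Figure 3, and the rounded shares match those given in the text.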

Table 7. Mined association rules from special projects area.

For instance, rule 8 (‘shoulderL, neck → wristL, fingerL, elbowR’) shows that ‘Special projects’ can cope with five body parts being simultaneously reported in OHPPs, with FWA reductions in the neck (94%), left fingers (94%), right elbow (88%), left wrist (71%), and left shoulder (59%) (more information in the Supplementary material).

5. Conclusion & future work

One of the critical issues in industry is workers' occupational health protection. Preventing physical issues in workers' body parts has many benefits for industries and their workers: by protecting their workers' health, industries avoid losing skilled labor to medical absence. This is so important that some large industries have developed systems to create workers' OHPPs. Occupational health physicians monitor workers' health status and record Medical Restrictions (MRs) in the system according to their physical issues, so that technical teams can consider them when determining workers' job demands. However, these physical issues are sometimes asymptomatic during a medical examination, or a body part may become injured shortly afterwards.

In this study, we proposed a new approach combining Natural Language Processing and AR mining, which we used to extract hidden patterns from the MRs in the OHPPs of workers in the Portuguese automotive industry. The results showed important associations between injuries to different body parts that could assist occupational physicians in identifying latent injuries and recording FWA restrictions in workers' OHPPs. Using this method can prevent the aggravation of injuries and may reduce the number of workers unable to work because of these health problems.
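As a rough illustration of how free-text MRs can be turned into the body-part items used for AR mining, the sketch below uses simple keyword matching. It is a deliberately simplified stand-in, not the NLP pipeline applied in this study: the lexicon, the item labels (six paired body parts with an L/R suffix plus trunk and neck, an assumed set of 14 items inferred from the rule notation), the clause splitting, and the English example restriction are all our assumptions. Paired body parts mentioned without a side are ignored here.

```python
import re
from typing import FrozenSet, Optional, Set

# Hypothetical lexicon: paired body parts (need a side) and unpaired ones.
PAIRED = {
    "finger": "finger", "fingers": "finger", "wrist": "wrist", "wrists": "wrist",
    "elbow": "elbow", "shoulder": "shoulder", "knee": "knee", "foot": "foot",
}
UNPAIRED = {"trunk": "trunk", "neck": "neck"}
SIDES = {"left": "L", "right": "R"}


def extract_items(restriction: str) -> FrozenSet[str]:
    """Map a free-text medical restriction to body-part items such as 'kneeL'."""
    items: Set[str] = set()
    # A side word only applies within its own clause.
    for clause in re.split(r"[;,.]", restriction.lower()):
        side: Optional[str] = None
        for token in re.findall(r"[a-z]+", clause):
            if token in SIDES:
                side = SIDES[token]
            elif token in UNPAIRED:
                items.add(UNPAIRED[token])
            elif token in PAIRED and side is not None:
                items.add(PAIRED[token] + side)
    return frozenset(items)


text = "Avoid repetitive gripping with the left fingers and wrist; no kneeling (right knee). Limit trunk flexion."
print(sorted(extract_items(text)))  # ['fingerL', 'kneeR', 'trunk', 'wristL']
```

Each restriction then becomes one transaction for the apriori step; a real pipeline would need to handle the language of the records, negation, and body parts mentioned without a side.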

Another contribution of this case study in the Portuguese automotive industry is to determine (i) whether OHPPs target distinct body parts at the same time and, if so, (ii) how many body parts are involved, (iii) how they are related, and (iv) how they differ between working areas. In other words, if OHPPs address multi-site body parts simultaneously and there are still jobs whose demands fit those workers' work ability, this association could be used to assess the organization's capacity to retain its workforce in its working settings and to cope with work-related absenteeism. By this reasoning, the more body parts listed as connected, the more resilient the organization may be.

The maximum level of health protection leads to a procedure that identifies jobs fitting workers' FWA. When assigning thresholds for exposure to occupational risk factors, MN protection represents the absence of the risk factor from the working conditions. As a result, regardless of the exposure dimension (duration, frequency, or intensity), the dose must theoretically be zero. Although automotive industry working systems increasingly comply with ergonomic design criteria, zero-dose exposure to biomechanical risk factors is far from reality Bernardes et al. (Citation2022); Kuijer et al. (Citation2012); Seidel et al. (Citation2019). Therefore, based on the rules gathered, MN protections may represent the intended protection rather than the worker's remaining functional work ability. This should be discussed internally, as it could be advantageous to change from a qualitative to a quantitative system for establishing protection levels.

Regardless of the need for improvements, the system under study presents 3 key success factors: (i) multidisciplinarity between technical experts (Ergonomics and OPs), relevant for assessing both workers' health and their work ability; (ii) a participatory protocol, which considers workers' opinion as an input for establishing occupational health protection; and (iii) explainability, accomplished by bringing in engineering approaches such as KDD as a support procedure for promoting discussion among the different stakeholders. As these results are based on the current areas of the automotive plant, they may change if areas are added, deleted, or merged in the future, which could lead to more or fewer OHPPs for workers with reduced FWA. We believe that our method will reliably extract new ARs that are not only usable by technical teams but also give them new insight for assessing the organization's capacity to retain workers with reduced FWA. Hence, this study is specific and should be considered a milestone revealing directions for further studies (such as using organizations' big data in other machine learning methods Almasi and Rouhani (Citation2021, Citation2016)).

Authors’ contributions

N.M. and H.G. conceived the presented idea. N.M. developed the theory and performed the computations. J.R. and C.B. verified the analytical methods. H.G. and C.F. supervised the findings of this work. All authors discussed the results and contributed to the final manuscript. All authors have read and agreed to the published version of the manuscript.

Supplemental material

Disclosure statement

The authors report there are no competing interests to declare.

Data availability statement

The data presented in this study are available from the corresponding author on reasonable request.

Additional information

Funding

This work was partly supported by science and technology foundation (FCT), under projects OPERATOR (Ref. 04/SI/2019) and PREVOCUPAI (DSAIPA/AI/0105/2019), and Ph.D. grants PD/BDE/142816/2018 and PD/BDE/142973/2018.

References

  • Agrawal R, Srikant R. 1994. Fast algorithms for mining association rules. Proceedings of the 20th International Conference on Very Large Data Bases (VLDB). Vol. 1215. Citeseer; p. 487–499.
  • Almasi ON, Rouhani M. 2016. Fast and de-noise support vector machine training method based on fuzzy clustering method for large real world datasets. Turk J Elec Eng & Comp Sci. 24(1):219–233.
  • Almasi ON, Rouhani M. 2021. A geometric-based data reduction approach for large low dimensional datasets: Delaunay triangulation in SVM algorithms. Mach Learn Appl. 4:100025.
  • Anand Hareendran S, Vinod Chandra S. 2017. Association rule mining in healthcare analytics. International Conference on Data Mining and Big Data. Springer; p. 31–39.
  • Assunção A, Moniz-Pereira V, Fujão C, Bernardes S, Veloso AP, Carnide F. 2021. Predictive factors of short-term related musculoskeletal pain in the automotive industry. IJERPH. 18(24):13062.
  • Bernardes S, Assunção A, Fujão C, Carnide F. 2022. Functional capacity profiles adjusted to the age and work conditions in automotive industry. Occupational and environmental safety and health. Vol. III. Springer; p. 555–567.
  • Cheng CW, Lin CC, Leu SS. 2010. Use of association rules to explore cause–effect relationships in occupational accidents in the Taiwan construction industry. Saf Sci. 48(4):436–444.
  • Chou HM. 2020. A collaborative framework with artificial intelligence for long-term care. IEEE Access. 8:43657–43664.
  • Davoudi Kakhki F, Freeman SA, Mosher GA. 2019a. Use of logistic regression to identify factors influencing the post-incident state of occupational injuries in agribusiness operations. Applied Sciences. 9(17):3449.
  • Davoudi Kakhki F, Freeman SA, Mosher GA. 2019b. Use of neural networks to identify safety prevention priorities in agro-manufacturing operations within commercial grain elevators. Applied Sciences. 9(21):4690.
  • Dick RB, Hudock SD, Lu ML, Waters TR, Putz-Anderson V. 2016. Manual materials handling. Physical and biological hazards of the workplace. Sebastopol, CA: O’Reilly; p. 33–52.
  • Ferreira F, Gago MF, Bicho E, Carvalho C, Mollaei N, Rodrigues L, Sousa N, Rodrigues PP, Ferreira C, Gama J. 2019. Gait stride-to-stride variability and foot clearance pattern analysis in idiopathic Parkinson’s disease and vascular parkinsonism. J Biomech. 92:98–104.
  • Fournier-Viger P, Wu CW, Tseng VS. 2012. Mining top-k association rules. Canadian Conference on Artificial Intelligence. Springer; p. 61–73.
  • Galizzi M, Boden LI. 2003. The return to work of injured workers: evidence from matched unemployment insurance and workers’ compensation data. Labour Economics. 10(3):311–337.
  • Goldberg LR. 2019. The book of why: the new science of cause and effect: by Judea Pearl and Dana Mackenzie, Basic Books (2018). England: Taylor and Francis Online. ISBN: 978-0465097609.
  • Han J, Pei J, Yin Y. 2000. Mining frequent patterns without candidate generation. SIGMOD Rec. 29(2):1–12.
  • Harahap M, Husein A, Aisyah S, Lubis F, Wijaya B. 2018. Mining association rule based on the diseases population for recommendation of medicine need. J Phys: Conf Ser. 1007:12017.
  • Hela S, Amel B, Badran R. 2018. Early anomaly detection in smart home: a causal association rule-based approach. Artif Intell Med. 91:57–71.
  • Hikmawati E, Maulidevi NU, Surendro K. 2021. Minimum threshold determination method based on dataset characteristics in association rule mining. J Big Data. 8(1):1–17.
  • Huang YC. 2013. The application of data mining to explore association rules between metabolic syndrome and lifestyles. Health Inf Manag. 42(3):29–36.
  • Istrat V, Lalić N. 2017. Association rules as a decision making model in the textile industry. Poland: Fibres & Textiles in Eastern Europe.
  • Kakhki FD, Freeman SA, Mosher GA. 2020. Applied machine learning in agro-manufacturing occupational incidents. Procedia Manuf. 48:24–30.
  • Kim Y, Lee JH, Choi S, Lee JM, Kim JH, Seok J, Joo HJ. 2020. Validation of deep learning natural language processing algorithm for keyword extraction from pathology reports in electronic health records. Sci Rep. 10(1):1–9.
  • Kuhnert KW, Sims RR, Lahey MA. 1989. The relationship between job security and employee health. Group Organ Stud. 14(4):399–410.
  • Kuijer P, Van der Molen HF, Frings-Dresen MH. 2012. Evidence-based exposure criteria for work-related musculoskeletal disorders as a tool to assess physical job demands. Work. 41(Supplement 1):3795–3797.
  • Lakshmi KS, Vadivu G. 2017. Extracting association rules from medical health records using multi-criteria decision analysis. Procedia Comput Sci. 115:290–295.
  • Miller TR, Levy DT. 2000. Cost-outcome analysis in injury prevention and control: eighty-four recent estimates for the United States. Medical Care. 38(6):562–582.
  • Miswan NH, Sulaiman M, Chan CS, Ng CG. 2021. Association rules mining for hospital readmission: a case study. Mathematics. 9(21):2706.
  • Mollaei N, Cepeda C, Rodrigues J, Gamboa H. 2022a. Biomedical text mining: applicability of machine learning-based natural language processing in medical database.
  • Mollaei N, Fujao C, Silva L, Rodrigues J, Cepeda C, Gamboa H. 2022b. Human-centered explainable artificial intelligence: automotive occupational health protection profiles in prevention musculoskeletal symptoms. IJERPH. 19(15):9552.
  • Nguyen TD, Luong MT. 2010. Wingnus: keyphrase extraction utilizing document logical structure. In Proceedings of the 5th International Workshop on Semantic Evaluation. p. 166–169.
  • Ogihara ZP, Zaki M, Parthasarathy S, Ogihara M, Li W. 1997. New algorithms for fast discovery of association rules. In 3rd Intl. Conf. on Knowledge Discovery and Data Mining. Citeseer.
  • Ranjan R, Sharma A. 2019. Evaluation of frequent itemset mining platforms using apriori and FP-growth algorithm. Int J Info Sys Manag Sci. 2(2):1–6.
  • Rashid MA, Hoque MT, Sattar A. 2014. Association rules mining based clinical observation. arXiv preprint arXiv:1401.2571.
  • Riondato M, Upfal E. 2014. Efficient discovery of association rules and frequent itemsets through sampling with tight performance guarantees. ACM Trans Knowl Discov Data. 8(4):1–32.
  • Santos S, Folgado D, Rodrigues J, Mollaei N, Fujao C, Gamboa H. 2020. Explaining the ergonomic assessment of human movement in industrial contexts. BIOSIGNALS. 1:79–88.
  • Sarıyer G, Öcal Taşar C. 2020. Highlighting the rules between diagnosis types and laboratory diagnostic tests for patients of an emergency department: use of association rule mining. Health Informatics J. 26(2):1177–1193.
  • Schaub K, Caragnano G, Britzke B, Bruder R. 2013. The European assembly worksheet. Theor Issues Ergon Sci. 14(6):616–639.
  • Seidel DH, Ditchen DM, Hoehne-Hückstädt UM, Rieger MA, Steinhilber B. 2019. Quantitative measures of physical risk factors associated with work-related musculoskeletal disorders of the elbow: a systematic review. IJERPH. 16(1):130.
  • Solanki S, Soni N. 2013. A survey on frequent pattern mining methods apriori, ECLAT, FP growth. Inter J Comp Tech. 10(X):86–100.
  • Stilou S, Bamidis PD, Maglaveras N, Pappas C. 2001. Mining association rules from clinical databases: an intelligent diagnostic process in healthcare. Stud Health Technol Inform. 84(2):1399–1403.
  • Weng J, Zhu JZ, Yan X, Liu Z. 2016. Investigation of work zone crash casualty patterns using association rules. Accid Anal Prev. 92:43–52.
  • Witten IH, Paynter GW, Frank E, Gutwin C, Nevill-Manning CG. 2005. Kea: practical automated keyphrase extraction. Design and Usability of Digital Libraries: case Studies in the Asia Pacific. IGI Global. 23:129–152.
  • Xu Y. 2016. Research of association rules algorithm in data mining. IJDTA. 9(6):119–130.
  • Yang DH, Kang JH, Park YB, Park YJ, Oh HS, Kim SB. 2013. Association rule mining and network analysis in oriental medicine. PLoS One. 8(3):e59241.