
Evaluation of content validity and face validity of secondary school Islamic education teacher self-assessment instrument

Article: 2308410 | Received 26 Oct 2023, Accepted 17 Jan 2024, Published online: 26 Feb 2024

Abstract

This study tests the content and face validity of the Secondary School Islamic Education Teacher Self-Assessment Instrument (SSIET-SAI) using Content Validity Ratio (CVR), Content Validity Index (CVI) and Cohen Kappa Index (CKI) analysis. Nine professional experts from universities and schools and two other field experts were involved. The validation process covered 209 items spanning four main constructs, namely Reflection of Sustainability of Knowledge, Reflection of Teaching, Humanity Introspection and Continuous Professional Development. Overall findings for content validity were N = 9 and CVI = 0.96, with 183 items agreed upon and refined through the CVR analysis; for face validity, N = 2 and CKI = 0.785. SSIET-SAI has great potential as a psychometric measuring tool of self-assessment indicators for Islamic education teachers in secondary school.

Introduction

Educational evaluation is the process of carefully assessing a variety of information that depicts and appraises characteristics of an educational process. It draws on testing, measurement tools and evaluation evidence from various sources as supplementary information. Interviews, questionnaires, observations and case studies are among the sources contributing the needed information, which is then analysed carefully. It is the most appropriate technique for producing a wise and comprehensive interpretation of the value judgement about the individual or group under study.

Questionnaires are the most commonly used data collection method in educational and evaluation research, helping to gather information on knowledge, attitudes, opinions, behaviours, facts and other matters (Bhattacharyya et al., 2017; Yu et al., 2022). Developing a valid and reliable questionnaire is essential to reduce measurement error, defined as the discrepancy between respondents' attributes and their survey responses. Over the years there has been increasing emphasis on establishing the reliability and validity of the various measures and instruments used to collect data on program effectiveness (Plecki et al., 2012), and ascertaining the validity and reliability of a measurement tool is one of the most common tasks encountered in social science research (Hair et al., 2019).

Nevertheless, most teaching evaluation questionnaires have not presented sufficient evidence of validity (Masuwai & Saad, 2016), and the field of teacher education has been criticized for its lack of attention to validity and reliability issues in evaluation research (Grossman, 2008). Bhattacharyya et al. (2017) reviewed 748 research studies and found that about a third did not report procedures for establishing validity (31%) or reliability (33%). To overcome measurement error and to quantify the measurability of questionnaires, validity and reliability testing should be carried out before the research commences.

Literature review

Reliability and validity are vital aspects of a quantitative research inquiry. A good instrument is established through the analysis of its reliability and validity. Validity is the fundamental consideration in assuring the quality of an instrument and is vital to examine (Wang et al., 2023). Validity represents the extent to which specific items on a tool accurately assess the concept being measured in the research study and ensures that the questions being asked permit valid inferences to be made (Anggraini et al., 2023). If the instrument measures what it is intended to measure, validity is established (Masuwai & Saad, 2016).

The four types of validity in educational research recommended by Bolarinwa (2015) and Zhang and Garcia (2023) are face validity, content validity, construct validity and criterion-related validity. Content validity, introduced by Lawshe (1975), demands high measurement quality (Polit et al., 2007) and is essential for a newly developed instrument (Ebel, 1967). A content validity check is one of the steps in the instrument development models of Strachota et al. (2006) and Miller, Lovler and McIntire (2013), taken initially for item construction evaluation.

Content validity refers to whether the content of the questions or items of an instrument truly and accurately represents what is being measured (Fox, 1994), and to the extent to which a measure covers all facets of a given construct (Bolarinwa, 2015). It is arguable that, in any one test, some questions in the questionnaire are not relevant to the intended subject. Content validity therefore addresses whether or not the questionnaire contains the right questions to define the construct being assessed. In other words, content validity reflects how well the measurement covers the meaning of a concept (Babbie et al., 2007).

Bhattacharyya et al. (2017) and Tobón et al. (2020) stated that the validity of a questionnaire can be established using a panel of experts who explore the theoretical construct. Experts are those who have experience in the field of education. According to Hambleton and Patsula (1999), the selection of external experts must be based on the following criteria: (1) expertise in the language, knowledge and culture of the subject; and (2) involvement of more than one expert and translator, bringing various perspectives to the referencing of the items. The expert panel is given the opportunity to provide suggestions and recommendations on the items, improving sentence structure to ensure clarity and conciseness (Lynn, 1986). Agreement that the instrument is appropriate and valid is necessary to achieve item stability. Any item identified as unsuitable is discarded. The process serves to approve the items on the test as representing each construct to be measured (Miller et al., 2013), whereas unclear items are corrected, rearranged or even eliminated.

Meanwhile, face validation of the instrument is carried out to describe the overall picture and the accuracy of each item in an item test (Bornstein et al., 1994); it is subjective in nature (Elangovan & Sundaravel, 2021). Face validity is an assessment of how clearly an instrument measures the construct it is intended to evaluate. According to Oluwatayo (2012), face validity can be achieved by having experts in the field being studied, or even psychometrically inexperienced judges, rate the degree of appropriateness of the instrument for its intended purpose. It is an important process in reporting validity procedures.

However, some past researchers do not consider face validity a valid component of content validity. Face validity is considered the most minimal and basic index of content validity (Sekaran & Bougie, 2011; Zhang & Garcia, 2023). Content validity itself requires statistical evaluation, which is more profound than the merely intuitive evaluation of face validity (Hair et al., 2013). Indeed, face validity is seen as carrying no psychometric weight in assessing validity because it is a surface-level procedure that needs to be followed by content validity through expert evaluation (Gay et al., 2012). In conclusion, face validity alone is not accepted as proof of the validity of a test (Khidhir & Rassul, 2023; Miller et al., 2013), but it is recommended that it be conducted together with content validity (Mackison et al., 2010).

Therefore, this study aims to examine the validity and reliability of the SSIET-SAI administered to Islamic education teachers in secondary school, as suggested by Feng et al. (2023), Johnson and Christensen (2012) and Miller et al. (2013), to assess whether the items (questions/variables) are constructed so as to measure the self-assessment constructs. In addition, this study aims to explore further statistical analysis in validating the SSIET-SAI. The objectives of this study are to:

  1. Test the content validity of SSIET-SAI using Content Validity Ratio and Content Validity Index.

  2. Test the face validity of SSIET-SAI using Cohen Kappa Index.

Methodology

Research design

The study uses a quantitative approach in the form of a written questionnaire survey, the Secondary School Islamic Education Teacher Self-Assessment Instrument (SSIET-SAI), to test the face and content validity of the instrument.

Survey instrument

The research instrument consists of 209 self-assessment items formed from the main aspects of the teacher indicators for IET, based on the literature, previous studies and the findings of a qualitative study previously conducted by the researcher. The Instrument Specification Table (IST) was constructed with roughly equal weighting: twenty to twenty-five percent, or forty to sixty items, per construct. Items built through conceptualization and operationalization according to the Islamic education context must be verified in terms of content, to ensure the items represent the theoretical domain of the construct, through the face and content validity procedures (Khidhir & Rassul, 2023).

The instrument, named the Secondary School Islamic Education Teacher Self-Assessment Instrument (SSIET-SAI), was developed with three main constructs of self-assessment, namely: (1) Reflection of Sustainability of Knowledge; (2) Reflection of Teaching; and (3) Humanity Introspection. In addition, one further construct was added to capture the correlation of self-assessment with Continuous Professional Development. Table 1 shows the details of SSIET-SAI.

Table 1. The constructs and number of items of SSIET-SAI.

Samples and procedures

The study involves two types of experts, namely professional experts and lay experts, as in Effendi et al. (2017). The sampling technique used is purposive sampling, specifically judgement sampling: the sample was selected on the basis of expertise in the subject being studied. This sampling is used to obtain the necessary information from particular individuals who are knowledgeable (Sekaran & Bougie, 2011). Judgement sampling is the most suitable for this study because the expert panel was chosen for a specific purpose, namely to obtain confirmation of the content of the SSIET-SAI items.

Content validation procedures

For content validity, a total of six experts were identified and invited to examine the research instrument (Lynn, 1986), as shown in Table 2. The two main guidelines used in the selection of experts are as follows:

Table 2. The detail of experts of content validation.

  1. Academic experts with 10 years' experience or more in Islamic Education.

  2. Educators with expertise and skills in domains and thematic concepts based on career experience.

Table 2 shows the number of appointed expert panels, their areas of expertise and their years of experience in the field. All the panel members are academic experts who work in the field of education as teachers, lecturers and researchers.

Face validation procedures

For face validity, a total of two experts were identified and invited to examine the research instrument (Lynn, 1986), as shown in Table 3. The two main guidelines used in the selection of experts are as follows:

Table 3. The detail of experts of face validation.

  1. Academic experts with 10 years' experience or more in Education.

  2. Educators with expertise and skills in thematic concepts, research and languages based on career experience.

Table 3 shows the number of appointed expert panels, their areas of expertise and their years of experience in the field. All the panel members are academic experts who work in the field of education as teachers, lecturers, researchers and administrators.

The appointed expert panels were given an expert appointment letter for both processes, along with a content or face validity form for evaluation. The instrument was distributed by the researcher in person or by email, together with a cover letter, a synopsis of the research and a declaration form for each appointed expert. The completed forms were returned to the researcher through the same medium or by email. The forms also include clear instructions on the evaluation process.

Evaluation instrument

Expert content validation form

This form collects expert demographic information: name, date, institution, position, field of expertise, work experience, gender, email address and signature. It is followed by the content validity section, which provides complete instructions for assessing the accuracy and comprehensiveness of the domain each item seeks to describe (Elangovan & Sundaravel, 2021; Oluwatayo, 2012). Content validity is determined from the judgement of the panel of professional and field experts, who are asked to evaluate the level of importance of each item on the instrument based on the combination of the types of challenges and the SSIET-SAI indicators. A three-point scale is used for each item:

  1. Essential (very important) – Scale 1.

  2. Useful but not essential – Scale 2.

  3. Not necessary – Scale 3.

In addition, Lynn (1986) suggested that panel experts also be required to identify areas of deficiency, provide suggestions and recommend ways to improve sentence order to ensure clarity and brevity, based on any difficulties found in interpreting the instructions for filling out the instrument.

Expert face validation form

This form collects expert demographic information: name, date, institution, position, field of expertise, experience, gender, email address and signature. The face validity section provides complete instructions for considering the appropriateness of SSIET-SAI as intended. According to Oluwatayo (2012), the evaluation criteria taken into account are:

  1. Use of correct and appropriate grammar.

  2. Adjusted use of appropriate language.

  3. Use of correct spelling.

  4. Correct sentence structure.

  5. Appropriate font size.

  6. Appropriate format.

  7. Appropriate content.

Feedback from the panel of experts is recorded as agree (Yes) or disagree (No). An agree value means that the item is organized empirically and according to the classification of thematic categories (Sangoseni et al., 2013). In addition, the expert panel was also asked to provide comments and suggestions to improve the instrument.

Data analysis

Content validity ratio

This method, introduced by Lawshe (1975), helps the researcher decide whether to retain or drop an item on the instrument through the CVR calculation. CVR quantifies how important an item is for retention in the instrument. In summary, it aims to filter the items on the instrument empirically, with a quantitative procedure, to ensure that each item correctly represents the content of the construct domain. The strengths of the CVR method are that it is transparent and directed, user-friendly and simple to compute, that tables are available for determining the critical cut-off value, and that it centres on expert agreement about which items are considered 'very important' or essential (Lindell & Brandt, 1999).

The CVR value lies in the range −1 to +1; values close to +1 indicate that experts agree the item is very important for content validity. A CVR of less than zero (CVR < 0) indicates that fewer than half of the panel experts consider the measurement item very important. A CVR equal to zero (CVR = 0) means exactly half of the expert panel considers the measurement item very important and the other half does not. CVR values above zero (CVR > 0) indicate that more than half of the panel experts believe the measurement item meets content validity. Once the CVR is obtained for each item, evaluation of the Content Validity Index (CVI) follows (Lawshe, 1975).
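For each item, Lawshe's (1975) ratio is CVR = (ne − N/2) / (N/2), where ne is the number of experts rating the item 'Essential' and N is the size of the panel. A minimal sketch of this calculation in Python (not the authors' own tooling; the rating coding and the example vector are illustrative):

    # Minimal sketch of Lawshe's (1975) CVR, not the authors' code.
    # Assumes each expert rating uses the form's scale: 1 = Essential.
    def content_validity_ratio(ratings):
        """ratings: one item's ratings across the expert panel (1 = Essential)."""
        n = len(ratings)                            # N, the panel size
        n_e = sum(1 for r in ratings if r == 1)     # experts rating the item Essential
        return (n_e - n / 2) / (n / 2)

    # Example: five of six experts rate an item Essential -> CVR = 0.66,
    # the "refine" band reported in Table 6.
    print(round(content_validity_ratio([1, 1, 1, 1, 1, 2]), 2))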

Content validity index

The Content Validity Index (CVI) of the instrument was computed after the CVR was obtained (Lawshe, 1975). The content validity of the instrument is determined using the CVI value, which reflects the level of agreement between experts (Lynn, 1986). Based on recommendations from previous studies, the minimum level of agreement among six experts was set at >0.83, at the 0.05 significance level. This means that at least five of the six experts must agree that an item is relevant for the thematic domain for it to stand; otherwise the item requires re-referencing to remove any ambiguity and to secure an accurate response.

Dichotomous values of agree and disagree were used to assess content validity (Vargas & Luis, 2008, as cited in Sangoseni et al., 2013). Favourable (F+) denoted items considered relevant, or needing only minor rewording for relevance, and succinct and concise as they stand; such items were assigned a score of +1.0. Unfavourable (F−) denoted items considered either not relevant or of undeterminable relevance given their current sentence structure; such items were given a score of 0.0 (Masuwai & Saad, 2016). Table 4 shows the interpretation of the CVI (Lynn, 1986).

Table 4. Interpretation of the Content Validity Index (CVI).

The content validity of the instrument is determined from the CVI value (Lynn, 1986), which relates to the level of agreement between the expert panels. The cumulative average across experts is the numerical value of the proportion of the instrument's items rated relevant by the content experts (Polit & Beck, 2006).

For an expert panel of six people, a CVI value above 80% (0.80) is considered a high level of agreement. Meanwhile, CVI values below 80% indicate that the items of the instrument do not sufficiently explain the thematic domain being studied, raising issues of objectivity and inconsistency (Sangoseni et al., 2013). In that situation, the instrument needs to be modified before proceeding to the next stage of determining the validity and reliability of the instrument.
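On this reading, the item-level index is simply the proportion of experts rating the item favourable, and the scale-level value reported here is the average of those proportions. A minimal sketch under that assumption, with an illustrative rating matrix rather than the study's data:

    # Minimal sketch of the item-level CVI (I-CVI) and the scale average,
    # assuming ratings dichotomized as 1 = favourable (F+) and 0 = unfavourable (F-).
    def item_cvi(ratings):
        """Proportion of experts rating the item favourable."""
        return sum(ratings) / len(ratings)

    def scale_cvi_average(matrix):
        """Average of the I-CVIs over all items."""
        return sum(item_cvi(row) for row in matrix) / len(matrix)

    ratings = [
        [1, 1, 1, 1, 1, 1],  # six of six experts -> I-CVI = 1.00
        [1, 1, 1, 1, 1, 0],  # five of six -> I-CVI = 0.83, the cut-off for six experts
        [1, 1, 0, 1, 1, 1],
    ]
    print(round(scale_cvi_average(ratings), 2))  # 0.89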

Cohen Kappa Index

For the Cohen Kappa Index (CKI), the items are evaluation indicators rated by two evaluators, and the index determines the agreement between their sets of categorical judgements. A rough comparison of the level of agreement between the dichotomous values given by two raters for a qualitative variable can be achieved with simple percentages, such as the ratio of the number of values on which the two raters agree to the total number of values; kappa then corrects this agreement for chance.

A Kappa value of +1 indicates complete agreement between the two raters, while −1 indicates complete disagreement (McHugh, 2012; Rodrigues et al., 2017). If Kappa is estimated at 0, there is no correlation between the evaluations of the two evaluators: any agreement or disagreement is attributable to chance alone. A Kappa value of 0.70 is generally considered satisfactory (Explorable Psychology Experiments, 2012). Table 5 shows the interpretation of Cohen Kappa (McHugh, 2012).

Table 5. The interpretation of Cohen Kappa.

Responses from the expert panel on the agreement scale, (Yes) or (No), were analysed using the CKI to determine the face validity of the study instrument (Cohen, 2013). According to Cohen (1960), 'the suggested procedure is to have two (or more) independent judges to categorize the sample units and to determine the level, significance and stability of sampling for their agreement' (pp. 37–38). The CKI analysis is meant to achieve consensus on the theme or unit of analysis of the construct being studied, as confirmed by two or more expert panels having the same purpose or giving the same assessment (Oluwatayo, 2012). According to Bowling (2009), the easiest way to calculate agreement between experts is to use percentages.
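Cohen's (1960) statistic is κ = (po − pe) / (1 − pe), where po is the observed proportion of agreement and pe is the agreement expected by chance from the raters' marginal proportions. A minimal sketch of the calculation (the study itself used SPSS; the Yes/No vectors below are illustrative, not the experts' actual ratings):

    # Minimal sketch of Cohen's (1960) kappa for two raters.
    from collections import Counter

    def cohen_kappa(rater1, rater2):
        n = len(rater1)
        p_o = sum(a == b for a, b in zip(rater1, rater2)) / n   # observed agreement
        m1, m2 = Counter(rater1), Counter(rater2)
        # chance agreement expected from each rater's marginal proportions
        p_e = sum((m1[c] / n) * (m2[c] / n) for c in set(rater1) | set(rater2))
        return (p_o - p_e) / (1 - p_e)

    r1 = ["Y", "Y", "Y", "N", "Y", "Y", "Y"]
    r2 = ["Y", "Y", "N", "N", "Y", "Y", "Y"]
    print(round(cohen_kappa(r1, r2), 3))  # 0.588; simple percentage agreement is 6/7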

Results

All members of the expert panels were academicians who have worked in the field of education as educators, authors or researchers (six experts for content validity and two experts for face validity). Their years in practice ranged from 13 to 34, and the mean years of experience across all the experts was 14 (n = 11).

Content validity ratio for content validity

The results show that 91 items achieved full agreement from the experts (CVR = 1.00). A further 76 items, with CVR = 0.66, needed to be refined. Meanwhile, 39 items fell below the set value (CVR = 0.33) and 5 items scored CVR = 0.00. Table 6 shows the CVR results for the instrument.

Table 6. The result of content validity ratio.

Based on the CVR results, the researcher decided to accept all the recommendations and comments from the experts. The importance of an item for measuring its construct was given full consideration, so some items were retained even though they fell below the set value. Other items were discarded for being double-barrelled, repetitious or unimportant. Table 7 shows the researcher's decisions on the items of the instrument.

Table 7. The decision of items after content validity ratio.

Content validity index for content validity

After the CVR value of each item was identified, items below the value of 0.33 were discarded. The CVI value was then calculated over all the remaining items (Lawshe, 1975). For a new instrument, the researcher must obtain a CVI value of 0.8 or above (Davis, 1992).

From the CVI analysis, two experts gave full agreement (CVI = 1.00) to all the constructed items, commenting only on the choice of words. Two other experts gave an agreement value of 0.99, and the remaining two experts gave values of 0.93 and 0.89. The average CVI value for the six experts is 0.96, which indicates a high level of agreement. Table 8 shows the CVI values obtained for this study.

Table 8. The content validity index value.

From the CVI exercise, the researcher drew firm directions for improving the instrument. Based on the suggestions and comments received from the experts, as well as discussions with supervisors, the researcher revised and refined each item.

Cohen Kappa Index for face validity

Face validity was analysed across the seven evaluation criteria given to two experts, with the Cohen Kappa calculation performed in SPSS. Table 9 shows that the two expert reviewers of the instrument agreed on 11 criteria and differed on three others. The Kappa coefficient of 0.785 represents a good level of validity, so the instrument is suitable for collecting the research data (Bernard & Ryan, 2010); the agreement between the first and second expert reviewers of the instrument is significant (p = 0.001 < alpha = 0.05). Kappa values of 0.60 to 0.79 are generally considered satisfactory (McHugh, 2012). All comments and suggestions from the experts were taken up, and the researcher made corrections and improvements to the instrument. The expert comments from the face validity forms are summarized below.

Table 9. The experts' agreement on face validity.

Expert 1 suggested improving the grammar, especially sentence structure and acronyms. Expert 2 recommended improving the format to make it user-friendly for respondents. The researcher restructured the items based on the suggestions and feedback received. Items 2, 4, 7, 13, 18, 24, 28, 34, 37, 38, 41, 50, 55, 57, 62, 64, 65, 66, 68, 69, 70, 72, 75, 78, 80, 81, 84, 85, 86, 87, 90, 91, 92, 93, 94, 95, 101, 106, 108, 109, 112, 116, 117, 167, 168, 170 and 172 were refined to correct the grammar and sentence structure used.

Discussion

This study established the face and content validity of SSIET-SAI, designed to assess the implementation of self-assessment from the point of view of Secondary School Islamic Education Teachers. For face validity, the Cohen Kappa Index corrects agreement for chance; the two expert reviewers of the instrument reached a Kappa coefficient of 0.785, which indicates a good level of validity, in line with Sangoseni et al. (2013) and Masuwai et al. (2016). Although face validity is often said to be a very casual and soft measure of validity (Engel & Schutt, 2016; Khidhir & Rassul, 2023), it is still used as a check on the conciseness of the items in an instrument (Sangoseni et al., 2013).

The Content Validity Index used in this study, by contrast, does not indicate the level of agreement as such; rather, it measures the proportion of agreement among a group of experts. This characteristic makes the CVI very robust, in that it eliminates ambivalence and allows straightforward interpretation, which helps in producing more reliable and valid evidence of content validity. The average CVI value for the six experts is 0.96, which indicates a high level of agreement, in line with Masuwai et al. (2016), Elas et al. (2019) and Yusop et al. (2023). The items on the final instrument strongly represented the thematic domains. Table 10 shows the number of items revised after the content validity analysis.

Table 10. The revision of items after content validity analysis.

Thus, face and content validity serve the same purpose in determining the validity of a questionnaire, although content validity looks more intensely at the content of the items. Both have their own importance, and neither can be ignored in a validity test for a single instrument. Face and content validity were both taken into account in this study, and both reached a good level of agreement from the experts.

Conclusion

This study succeeded in obtaining the agreement of test candidates, professional experts and field experts on all 183 items built. The procedure is important because the Test Construction Model requires such procedures for the construction of a new instrument. For face validity, the Kappa coefficient of 0.785 shows a good level of agreement between the two raters. For content validity, the average CVI value across the six experts is 0.96, indicating a high level of agreement, and the use of CVR shows clearly the strengths and weaknesses of each item through expert agreement; a strength of CVR highlighted in this study is that differences in expert agreement can be seen clearly and simply. The items on the final instrument strongly represented the thematic domains.

The researcher suggests that all 183 refined items be taken to a pilot study with IET and analysed using Exploratory Factor Analysis (EFA) in future research, as sketched below. EFA is a statistical technique used to reduce data to a smaller set of summary variables and to examine the internal structure of the instrument. The objectives of EFA are to determine the number of factors underlying the diversity of, and relationships between, the items; to identify the items that load on particular factors; and to allow the removal of items that do not fall under any of the extracted factors. Theoretically, a validated instrument can be used for actual research and taken further into any intended analysis. Practically, this study offers a worked recommendation for establishing the content and face validity of an instrument in educational research, an important consideration in assuring the quality of the instrument. This study provides a valid instrument for Islamic Education Teachers in preparation for self-assessment.
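As a pointer for that future work, a minimal sketch of such an EFA, assuming the third-party factor_analyzer package; the file name and the 0.40 loading cut-off are hypothetical illustrative choices, not the study's rule, and the four-factor setting simply mirrors the instrument's four constructs:

    # Minimal sketch of the planned EFA, assuming the factor_analyzer package
    # and a hypothetical pilot-response file (respondents x 183 items).
    import pandas as pd
    from factor_analyzer import FactorAnalyzer

    data = pd.read_csv("ssiet_sai_pilot.csv")            # hypothetical pilot responses
    fa = FactorAnalyzer(n_factors=4, rotation="promax")  # four theorised constructs
    fa.fit(data)
    loadings = pd.DataFrame(fa.loadings_, index=data.columns)
    # items with no loading of at least 0.40 on any factor are removal candidates
    weak = loadings[(loadings.abs() < 0.40).all(axis=1)].index.tolist()
    print(weak)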

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

The research was funded by Universiti Kebangsaan Malaysia, grant number GP-K019312.

Notes on contributors

Azwani Masuwai

Azwani Masuwai received her bachelor's degree in Islamic studies from Universiti Kebangsaan Malaysia (UKM) (2003) and her master's degree in education from Universiti Pendidikan Sultan Idris (2017). She has 20 years of experience as a secondary school teacher and is currently pursuing a Ph.D. in Islamic education at UKM. She has published 20 academic articles and is currently working on the development of a self-assessment inventory for teaching quality, focusing on reflection and teacher attributes for Islamic Education teachers.

Hafizhah Zulkifli

Hafizhah Zulkifli received her bachelor's and master's degrees in Islamic education from UKM in 2008 and 2011, and her Ph.D. in Islamic education from the International Islamic University Malaysia (IIUM) in 2019. She is currently a lecturer at UKM, has published more than 30 academic articles, and her research interests include philosophy for children, known in Malaysia as Hikmah (wisdom) pedagogy, Moral Education and Islamic Education.

Mohd Isa Hamzah

Mohd Isa Hamzah received his bachelor's degree in Usuluddin from Al-Azhar University (1992), and his master's degree (Master of Letters) and Ph.D. in Education from the University of Birmingham in 1995 and 2008. He has been a senior lecturer at UKM since 2000. He is an expert in art and applied arts for Education (Islamic Studies) and Curriculum, and has published more than 60 academic articles.

References

  • Anggraini, F. D. P., Aprianti, A., Muthoharoh, N. A., Permatasari, I., & Azalia, J. L. (2023). Validity and reliability questionnaire test of knowledge, attitudes, and behavior on dengue fever prevention. International Journal on Health and Medical Sciences, 1(2), 46–54.
  • Babbie, E., Halley, F., & Zaino, J. (2007). Adventures in social research: Data analysis using SPSS 14.0 and 15.0 for Windows (6th ed.). Pine Forge Press.
  • Bhattacharyya, S., Kaur, R., Kaur, S., & Amaan Ali, S. (2017). Validity and reliability of a questionnaire: A literature review. Chronicles of Dental Research, 6(2), 17–24.
  • Bernard, H. R., & Ryan, G. W. (2010). Analyzing qualitative data. Sage Publications.
  • Bornstein, R. F., Rossner, S. C., Hill, E. L., & Stepanian, M. L. (1994). Face validity and fakability of objective and projective measures of dependency. Journal of Personality Assessment, 63(2), 363–386. https://doi.org/10.1207/s15327752jpa6302_14
  • Bolarinwa, O. A. (2015). Principles and methods of validity and reliability testing of questionnaires used in social and health science researches. The Nigerian Postgraduate Medical Journal, 22(4), 195–201. https://doi.org/10.4103/1117-1936.173959
  • Bowling, A. (2009). Research methods in health. McGraw-Hill Education, United Kingdom.
  • Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46. https://doi.org/10.1177/001316446002000104
  • Cohen, J. (2013). Statistical power analysis for the behavioral sciences. Routledge.
  • Davis, L. L. (1992). Instrument review: Getting the most from a panel of experts. Applied Nursing Research, 5(4), 194–197. https://doi.org/10.1016/S0897-1897(05)80008-4
  • Ebel, R. L. (1967). Evaluating content validity. In D. Payne, & R. McMorris, (Eds.), Educational and psychological measurement: Contributions to theory and practice (pp. 85–94). Waltham Blaisdell.
  • Effendi, M., Matore, E. M., Idris, H., Rahman, N. A., & Khairani, A. Z. (2017, April 10–11). Kesahan kandungan pakar instrumen IKBAR bagi pengukuran AQ menggunakan Nisbah Kesahan Kandungan [Expert content validity of the IKBAR instrument for measuring AQ using the Content Validity Ratio]. Proceeding of International Conference on Global Education V (ICGE V), Padang, Indonesia (pp. 979–997).
  • Engel, R. J., & Schutt, R. K. (2016). The practice of research in social work. Sage Publications.
  • Elas, N., Majid, F., & Narasuman, S. (2019). Development of technological pedagogical content knowledge (TPACK) for english teachers: The validity and reliability. International Journal of Emerging Technologies in Learning (iJET), 14(20), 18–33. https://doi.org/10.3991/ijet.v14i20.11456
  • Elangovan, N., & Sundaravel, E. (2021). Method of preparing a document for survey instrument validation by experts. MethodsX, 8, 101326–101326. https://doi.org/10.1016/j.mex.2021.101326
  • Explorable Psychology Experiments. (2012). Retrieved June 1st, 2015, from https://explorable.com/social-psychology-experiments.
  • Feng, Y., Ou-Yang, Z.-Y., Lu, J.-J., Yang, Y.-F., Zhang, Q., Zhong, M.-M., Chen, N.-X., Su, X.-L., Hu, J., Ye, Q., Zhao, J., Zhao, Y.-Q., Chen, Y., Tan, L., Liu, Q., Feng, Y.-Z., & Guo, Y. (2023). Cross-cultural adaptation and psychometric properties of the Mainland Chinese version of the manchester orofacial pain disability scale (MOPDS) among college students. BMC Medical Research Methodology, 23(1), 159. https://doi.org/10.1186/s12874-023-01976-8
  • Fox, R. (1994). Validating lecturer effectiveness questionnaires in accounting. Accounting Education, 3(3), 249–258. https://doi.org/10.1080/09639289400000023
  • Gay, L. R., Mills, G. E., & Airasian, P. (2012). Educational research: Competencies for analysis and applications (10th ed.). Merrill Prentice Hall.
  • Grossman, P. (2008). On measuring mindfulness in psychosomatic and psychological research. Journal of Psychosomatic Research, 64(4), 405–408. https://doi.org/10.1016/j.jpsychores.2008.02.001
  • Hair, J. F., Celsi, M. W., Oritinau, D. J., & Bush, R. P. (2013). Essentials of marketing research (3rd ed.). McGraw Hill.
  • Hair, J. F., Jr., Gabriel, M. L. D. S., da Silva, D., & Braga Junior, S. (2019). Development and validation of attitudes measurement scales: Fundamental and practical aspects. RAUSP Management Journal, 54(4), 490–507. https://doi.org/10.1108/RAUSP-05-2019-0098
  • Hambleton, R. K., & Patsula, L. (1999). Increasing the validity of adapted tests: Myths to be avoided and guidelines for improving test adaptation practices. Journal of Applied Testing Technology, 1, 1–16.
  • Johnson, B., & Christensen, L. (2012). Educational research quantitative, qualitative, and mixed approaches (4th ed.). SAGE Publications Inc.
  • Khidhir, R. J., & Rassul, T. H. (2023). Assessing the validity of experts’ value judgment over research instruments. Zanco Journal of Human Sciences, 27(5), 324–343.
  • Kline, T. J. (2005). Psychological testing: A practical approach to design and evaluation. Sage Publications.
  • Lawshe, C. H. (1975). A quantitative approach to content validity. Personnel Psychology, 28(4), 563–575. https://doi.org/10.1111/j.1744-6570.1975.tb01393.x
  • Lindell, M. K., & Brandt, C. J. (1999). Assessing interrater agreement on the job relevance of a test: A comparison of CVI, T, rWG(J), and r*WG(J) indexes. Journal of Applied Psychology, 84(4), 640–647. https://doi.org/10.1037/0021-9010.84.4.640
  • Lynn, M. R. (1986). Determination and quantification of content validity. Nursing Research, 35(6), 382–385. https://doi.org/10.1097/00006199-198611000-00017
  • Mackison, D., Wrieden, W. L., & Anderson, A. S. (2010). Validity and reliability testing of a short questionnaire developed to assess consumers’ use, understanding and perception of food labels. European Journal of Clinical Nutrition, 64(2), 210–217. https://doi.org/10.1038/ejcn.2009.126
  • Masuwai, A., Tajudin, N. M., & Saad, N. S. (2016). Evaluating the face and content validity of a Teaching and Learning Guiding Principles Instrument (TLGPI): A perspective study of Malaysian teacher educators. Geografia, 12(3), 11–21.
  • McHugh, M. L. (2012). Interrater reliability: The kappa statistic. Biochemia Medica, 22(3), 276–282. https://doi.org/10.11613/BM.2012.031
  • Miller, L. A., Lovler, R. L., & McIntire, S. A. (2013). Foundations of psychological testing: A practical approach (4th ed.). SAGE Publications Inc.
  • Oluwatayo, J. A. (2012). Validity and reliability issues in educational research. Journal of Educational and Social Research, 2(2), 391–400.
  • Plecki, M. L., Elfers, A. M., & Nakamura, Y. (2012). Using evidence for teacher education program improvement and accountability: An illustrative case of the role of value-added measures. Journal of Teacher Education, 63(5), 318–334. https://doi.org/10.1177/0022487112447110
  • Polit, D. F., & Beck, C. T. (2006). The content validity index: Are you sure you know what’s being reported? Critique and recommendations. Research in Nursing & Health, 29(5), 489–497. https://doi.org/10.1002/nur.20147
  • Polit, D. F., Beck, C. T., & Owen, S. V. (2007). Is the CVI an acceptable indicator of content validity? Appraisal and recommendations. Research in Nursing & Health, 30(4), 459–467. https://doi.org/10.1002/nur.20199
  • Rodrigues, I. B., Adachi, J. D., Beattie, K. A., & MacDermid, J. C. (2017). Development and validation of a new tool to measure the facilitators, barriers and preferences to exercise in people with osteoporosis. BMC Musculoskeletal Disorders, 18(1), 540. https://doi.org/10.1186/s12891-017-1914-5
  • Sangoseni, O., Hellman, M., & Hill, C. (2013). Development and validation of a questionnaire to assess the effect of online learning on behaviors, attitudes, and clinical practices of physical therapists in the United States Regarding Evidence-based Clinical Practice. Internet Journal of Allied Health Sciences and Practice, 11(2), 1–13. https://doi.org/10.46743/1540-580X/2013.1439
  • Sekaran, U., & Bougie, R. (2011). Research methods for business: A skill building approach (5th ed.). John Wiley & Sons.
  • Strachota, E., Schmidt, S. W., & Conceição, S. C. (2006, October). The development and validation of a survey instrument for the evaluation of instructional aids. In Proceedings of the 2006 Midwest Research-to-Practice Conference in Adult, Continuing, Extension, and Community Education (pp. 205–210).
  • Tobón, S., Juárez-Hernández, L. G., Herrera-Meza, S. R., & Núñez, C. (2020). Assessing school principal leadership practices. Validity and reliability of a rubric. Educación XX1, 23(2), 187–210.
  • Vargas, D. D., & Luis, M. A. V. (2008). Development and validation of a scale of attitudes towards alcohol, alcoholism and alcoholics. Revista Latino-Americana de Enfermagem, 16(5), 895–902. https://doi.org/10.1590/s0104-11692008000500016
  • Wang, B., Rau, P. L. P., & Yuan, T. (2023). Measuring user competence in using artificial intelligence: Validity and reliability of artificial intelligence literacy scale. Behaviour & Information Technology, 42(9), 1324–1337. https://doi.org/10.1080/0144929X.2022.2072768
  • Yusop, S. R. M., Rasul, M. S., Yasin, R. M., & Hashim, H. U. (2023). Identifying and Validating Vocational Skills Domains and Indicators in Classroom Assessment Practices in TVET. Sustainability, 15(6), 5195. https://doi.org/10.3390/su15065195
  • Yu, S., Abbas, J., Draghici, A., Negulescu, O. H., & Ain, N. U. (2022). Social media application as a new paradigm for business communication: The role of COVID-19 knowledge, social distancing, and preventive attitudes. Frontiers in Psychology, 13, 903082. https://doi.org/10.3389/fpsyg.2022.903082
  • Zhang, Z., & Garcia, L. (2023). Examining Dimensionality and Validity of the Academic Integrity Survey Instrument. Journal of Education and Development, 7(1), 46. https://doi.org/10.20849/jed.v7i1.1326