Boateng GO, Neilands TB, Frongillo EA, Melgar-Quiñonez HR, Young SL. Best practices for developing and validating scales for health, social, and behavioral research: a primer. Frontiers in Public Health. 2018;6:149.
Willis GB, Lessler JT. Question Appraisal System QAS-99. Research Triangle Institute; 1999.
Method details
Survey instruments, or questionnaires, are the most popular data collection tool because of their many advantages: they can collect data from a large population in limited time and at low cost, are convenient for respondents, offer anonymity, avoid interviewer bias and standardize the questions. An important disadvantage, however, is poor data quality caused by incomplete and inaccurate questions, wording problems and a poor development process. These problems are critical, but they can be avoided or mitigated [14] .
To ensure the quality of the instrument, using a previously validated questionnaire is useful: it saves the time and resources spent on development and on testing reliability and validity. However, there are situations in which a new questionnaire is needed [5] . Whenever a new scale or questionnaire must be developed, following a structured method helps produce a quality instrument. There are many approaches to scale development, and all of them include stages for testing reliability and validity.
Even though there is ample literature on reliability and validity procedures, many researchers struggle to operationalize the process. Collingridge [8] wrote in the Methodspace blog of Sage Publishing that he repeatedly asked professors how to validate the questions in a survey and, unfortunately, did not get an answer. Most of the time, researchers send the fully designed questionnaire with the actual measurement scale but without providing enough information for the reviewers to give proper feedback. This paper is an effort to develop a document template that can capture the feedback of the expert reviewers of an instrument.
This paper is structured as follows: Section 1 introduces the need for a validation format in research; Section 2 discusses the fundamentals of validation and the factors involved, drawn from various literature studies; Section 3 presents the methodology used in framing the validation format; Section 4 provides the results of the study; Section 5 explains how the format can be used and how the feedback can be processed; and Section 6 concludes the paper with a note on its contribution.
A questionnaire is defined as “an instrument for the measurement of one or more constructs by means of aggregated item scores, called scales” [21] . A questionnaire can be placed on a continuum from unstructured to structured [14] . Structured questionnaires “have a similar format, are usually statements, questions, or stimulus words with structured response categories, and require a judgment or description by a respondent or rater” [21] . Research in social science with a positivist paradigm began in the 19th century. The first use of a questionnaire is attributed to the Statistical Society of London as early as 1838. Berthold Sigismund proposed the first guidelines for questionnaire development in 1856, which provided a definite plan for the questionnaire method [13] . The British Association for the Advancement of Science's 1941 acceptance of quantitative measures for sensory events [26] led to a much more pervasive use of questionnaires in research, through instruments such as the Guttman scale [15] , the Thurstone scale [27] and the Likert scale [18] .
Carpenter [6] argued that scholars do not follow best practices in the measure-development procedure. The author claims that “the defaults in the statistical programs, inadequate training and numerous evaluation points can lead to improper practices”. Many researchers have proposed techniques for scale development; we trace the prominent methods from the literature. Table 1 presents various frameworks in scale development.
Frameworks of scale development.
Author & Framework | Steps | Remarks |
---|---|---|
Churchill Paradigm for Developing Better Measures of Marketing Constructs | 8-step process: (1) specify the domain of the construct, (2) generate a sample of items, (3) collect data, (4) purify the measure, (5) collect data, (6) assess reliability, (7) assess validity and (8) develop standards. | He recommended a multi-item measure to diminish the difficulties of a single-item measure. Experts are consulted during the item development stage. A focus group of 8 to 10 participants is convened for an open discussion of the concept. When a researcher wants to include items, experienced researchers can attest to identical statements. Every statement is reviewed for preciseness of wording, double-barreled statements, positive and negative statements and socially acceptable responses, and may even be removed. |
Hinkin Three-stage scale development | The stages of scale construction are: (1) item generation; (2) scale development, comprising design of the developmental study, scale construction and reliability assessment; and (3) scale evaluation. | The study recommended the use of subject matter experts in developing the conceptual definition. |
Hinkin et al. Seven-step scale development procedure | (1) Item generation, (2) Content adequacy assessment, (3) Questionnaire administration, (4) Factor analysis, (5) Internal consistency assessment, (6) Construct validation and (7) Replication. | The authors propose ‘content adequacy assessment’ as a necessary step in scale development. They are concerned that this step is often overlooked, so researchers land in trouble after collecting large datasets. The authors argue that there are several content assessment methods and recommend using experts in the content domain for the assessment. |
Rossiter C-OAR-SE scale development | The steps of the framework are as follows: (1) Construct definition, (2) Object classification, (3) Attribute classification, (4) Rater identification, (5) Scale formation, and (6) Enumeration and reporting. | This framework has been proposed exclusively for scale development in marketing research, where the construct is defined in terms of object, attribute and rater entity (OAR). The scale relies on content validity alone, rather than on any other type of validity, and places more emphasis on reasoned argument and the agreement of experts. The author distinguishes content validity from face validity, arguing that “content validity is conducted before the scale is developed, that the items will properly represent the construct”, whereas “face validity is a post hoc claim that the items in the scale measure the construct”. The author presented a prototype of an expert judge's rating form. |
DeVellis Eight-step scale construction method | (1) Determine clearly what it is you want to measure, (2) Generate the item pool, (3) Determine the format for measurement, (4) Have the initial item pool reviewed by experts, (5) Consider the inclusion of validation items, (6) Administer items to a development sample, (7) Evaluate the items and (8) Optimize scale length. | The author proposes an exclusive step in which the generated items are validated by experts. The expert panel is required to evaluate how relevant each item is to measuring the concept, based on the working definition of the construct. The experts are also expected to assess the clarity and conciseness of the items, and can indicate any phenomenon the researcher failed to include. However, the final decision on the experts' comments rests with the researcher. |
Carpenter 10-step scale development and reporting | (1) Research the intended meaning and breadth of the theoretical concept, (2) Determine sampling procedure, (3) Examine data quality, (4) Verify the factorability of the data, (5) Conduct common factor analysis, (6) Select factor extraction method, (7) Determine the number of factors, (8) Rotate factors, (9) Evaluate items based on a priori criteria and (10) Present results. | The author claims that “Interviews, focus groups, and expert feedback are critical in the item generation and dimension identification process” and recommends that “the pool of items needs to be concise, clear, distinct, and reflect the chosen conceptual definition”. |
Reeves and Marbach-Ad [22] argued that the quantitative aspect of social science research differs from natural science in how phenomena are quantified using instruments. Bollen [4] explained that a social science instrument measures latent variables that are not directly observed but are inferred from observable behaviour. Because of this characteristic of social science measures, there is a need to ensure that what is being measured is actually the intended phenomenon.
The concepts of reliability and validity evolved as early as 1896 with Pearson. From 1900 to 1950, validity theory basically dealt with the alignment of test scores with other measures, operationally tested by correlation. The theory was refined during the 1950s to include criterion, content and construct validity. Criterion validity is the correlation of the test measure with an accurate criterion score; in 1955, it was divided into concurrent validity and predictive validity. Content validity provides “domain relevance and representativeness of the test instrument”. The concept of construct validity was introduced in 1954 and received increasing emphasis; from 1985 it took a central role as the appropriate test of validity. The new millennium saw a change in the perspectives of validity theory. Contemporary validity theory is a metamorphosis of epistemological and methodological perspectives; the argument-based approach and consequences-based validity are some of the new concepts that are evolving [24] .
The American Educational Research Association (AERA), the American Psychological Association (APA) and the National Council on Measurement in Education (NCME) jointly developed the ‘Standards for Educational and Psychological Testing’. Validity is described there as “the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests” [1] .
Based on the ‘Standards’, validity tests are classified by the type of evidence. Standards 1.11 to 1.25 describe the various kinds of evidence used to test validity [1] . Table 2 presents the different types of validity evidence and their explanations.
Types of validity.
Types of evidence | Explanation |
---|---|
Content-oriented evidence | “Validity evidence can be obtained from an analysis of the relationship between the content of a test and the construct it is intended to measure”. |
Evidence regarding cognitive processes | “Evidence concerning the fit between the construct and the detailed nature of the performance or response actually engaged in by test takers”. |
Evidence regarding internal structure | “Analyses of the internal structure of a test can indicate the degree to which the relationships among test items and test components conform to the construct on which the proposed test score interpretations are based”. |
Evidence concerning relationships with conceptually related constructs | “Evidence based on relationships with other variables provides evidence about the degree to which these relationships are consistent with the construct underlying the proposed test score interpretations”. This includes convergent and discriminant validity. |
Evidence regarding relationships with criteria | “Evidence of the relation of test scores to a relevant criterion”. This includes concurrent and predictive validity. |
Evidence based on consequences of tests | “The validation process involves gathering evidence to evaluate the soundness of these proposed interpretations for their intended use”. |
( Source: [1] )
Souza et al. [25] argued that “there is no statistical test to assess specifically the content validity; usually researchers use a qualitative approach, through the assessment of an experts committee, and then, a quantitative approach using the content validity index (CVI).”
Worthington and Whittaker [29] conducted a content analysis on new scales developed between 1995 and 2004. They specifically focused on the use of Exploratory and Confirmatory Factor Analysis (EFA & CFA) procedures in the validation of the scales. They argued that though the post-tests in the validation procedure, which are usually based on factor-analytic techniques, are more scientific and rigorous, the preliminary steps are necessary. Mistakes committed in the initial stages of scale development lead to problems in the later stages.
Messick [20] proposed six distinguishable notions of construct validity for educational and psychological measurements. Among the six, the foremost is content validity, which looks at the relevance of the content, its representativeness and its technical quality. In a similar way, Oosterveld et al. [21] developed a taxonomy of questionnaire design directed towards psychometric aspects. The taxonomy introduces the following questionnaire design methods: (1) coherent, (2) prototypical, (3) internal, (4) external, (5) construct and (6) facet design. These methods are related to six psychometric features guiding them: “face validity, process validity, homogeneity, criterion validity, construct validity and content validity”. The authors presented these methods under four stages: (1) concept review, (2) item generation, (3) scale development and (4) evaluation. After the definition of the construct in the first stage, the item pool is developed. The item production stage “comprises an item review by judges, e.g., experts, or potential respondents, and a pilot administration of the preliminary questionnaire, the results of which are subsequently used for refinement of the items”.
This paper mainly focuses on the expert validation done under the face validity and content validity stages. Martinez [19] provides a clear distinction between content validity and face validity: “Face validity requires an examination of a measure and the items of which it is composed as sufficient and suitable ‘on its face’ for capturing a concept. A measure with face validity will be visibly relevant to the concept it is intended to measure, and less so to other concepts”. Though face validity is a quick and excellent first step for assessing the appropriateness of a measure to capture the concept, it is not sufficient; it needs to be interpreted along with other forms of measurement validity.
“Content validity focuses on the degree to which a measure captures the full dimension of a particular concept. A measure exhibiting high content validity is one that encompasses the full meaning of the concept it is intended to assess” [19] . An extensive review of the literature and consultation with experts ensure the validity of the content.
From the review of various literature studies, we arrive at the details of the validation that needs to be done by experts. Domain or subject matter experts from both academia and industry, a person with expertise in the construct being developed, people familiar with the target population for whom the instrument is intended, users of the instrument, data analysts and those who make decisions based on the test scores are recommended as experts. Experts are consulted during the concept development stage and the item generation stage, and provide feedback on the content, sensitivity and standard settings [10] .
During the concept development stage, experts provide input on the definition of the constructs, relating them to the domain and checking them against related concepts. At the item generation stage, experts validate the representativeness and significance of each item to the construct, the accuracy of each item in measuring the concept, the inclusion or deletion of elements, the logical sequence of the items, and the scoring models. Experts also validate how the instrument measures the concept among different groups of respondents: an item is checked for bias towards specific groups such as gender, minority groups and linguistically different groups. Experts also provide standard scores or cutoff scores for decision making [10] .
The second set of reviewers, experts in questionnaire development, basically check the structural aspects of the instrument for common errors such as double-barreled, confusing and leading questions. This set also includes language experts, even if the questionnaire is developed in a widely used language like English. Experts in other languages are required if the instrument involves translation.
There have been many attempts to standardize questionnaire validation. Forsyth et al. [11] developed a Forms Appraisal model, an exhaustive list of problems that occur in questionnaire items, which experts found tiresome to use. Fowler and Roman [12] developed an ‘Interviewer Rating Form’, which allowed experts to comment on three qualities: (1) trouble reading the question, (2) the respondent not understanding the meaning or ideas in the question and (3) the respondent having difficulty providing an answer. The experts had to code ‘A’ for ‘no evidence of a problem’, ‘B’ for ‘possible problem’ and ‘C’ for ‘definite problem’. Willis and Lessler [28] developed a shorter coding scheme for evaluating questionnaire items, the “Question Appraisal System (QAS)”. This system evaluates each item on 26 problem areas under seven heads; the expert needs only to code ‘Yes’ or ‘No’ for each item. Akkerboom and Dehue [2] developed a systematic review of interview and self-completion questionnaires with 26 problem items categorized under eight problem areas.
Hinkin [16] recommended as a "best practice" to “clearly cite the theoretical literature on which the new measures are based and describe the manner in which the items were developed and the sample used for item development”. The author claims that “in many articles, this information was lacking, and it was not clear whether there was little justification for the items chosen or if the methodology employed was simply not adequately presented”.
Further to the qualitative analysis of the items, recent developments include quantitative assessments. “The content adequacy of a set of newly developed items is assessed by asking respondents to rate the extent to which items corresponded with construct definitions” [16] . Souza et al. [25] suggest using the Content Validity Index (CVI) for the quantitative approach. Experts evaluate every item on a four-point scale, in which “1 = non-equivalent item; 2 = the item needs to be extensively revised so equivalence can be assessed; 3 = equivalent item, needs minor adjustments; and 4 = totally equivalent item”. The CVI is calculated by dividing the number of items rated 3 or 4 by the total number of ratings. The CVI value is thus the proportion of judges who agree with an item; an index value of at least 0.80, and preferably higher than 0.90, is considered acceptable.
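The CVI arithmetic described above can be sketched in a few lines of Python; the ratings below are invented for illustration, and any spreadsheet would do the same computation.

```python
# Item-level Content Validity Index (I-CVI): the proportion of expert
# ratings of 3 ("equivalent, minor adjustments") or 4 ("totally
# equivalent") on the four-point relevance scale.

def item_cvi(ratings):
    """Return the proportion of ratings that are 3 or 4."""
    if not ratings:
        raise ValueError("at least one expert rating is required")
    return sum(1 for r in ratings if r >= 3) / len(ratings)

# Hypothetical example: six experts rate one item.
ratings = [4, 3, 4, 2, 4, 3]
cvi = item_cvi(ratings)
print(f"I-CVI = {cvi:.2f}")  # 5 of 6 ratings are 3 or 4 -> 0.83
print("acceptable" if cvi >= 0.80 else "needs revision")
```

With five of six experts agreeing, the item clears the 0.80 threshold but not the stricter 0.90 one, so a researcher might still revisit its wording.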
The problems in conducting face validity and content validity may be attributed to both the scale developer and the reviewer: scale developers do not convey their requirements to the experts properly, and experts, in turn, are not sure what the researcher expects. Therefore, a format is developed here that captures the information required for scale validation from both the researcher and the experts.
A covering letter is an important part of sending a questionnaire for review. It can help persuade a reviewer to support the research, and it should be short and simple. The covering letter first invites the expert to the review and conveys esteem for the expert. Even if the questionnaire for review is handed over personally, a covering letter serves as instructions for the review process and states the expectations from the reviewer.
Boateng et al. [3] recommended that the researcher specify the purpose of the construct or questionnaire being developed; justifying the development of a new instrument by confirming that no existing instrument serves the purpose is crucial. If similar instruments exist, the researcher should explain how the proposed one differs from them.
The covering letter can mention the maximum time required for the review and any compensation the expert will receive; this motivates the reviewer to contribute their expertise and effort. Instructions on how to complete the review process, which aspects to check, the coding systems and how to give the feedback are also provided in the covering letter. The letter ends with a thank-you note and is personally signed by the instrument developer. Further contact details can also be provided at the end.
Boateng et al. [3] proposed that articulating the domain(s) is an essential step before any validation process. They recommend that “the domain being examined should be decided upon and defined before any item activity. A well-defined domain will provide a working knowledge of the phenomenon under study, specify the boundaries of the domain, and ease the process of item generation and content validation”.
In the introduction section, the research problem being addressed, the existing theories, the proposed theory or model to be investigated, and the list of variables/concepts to be measured can be elaborated. Guion [30] argued that, for those who do not accept content validity on the basis of evaluations of the operational definition alone, five conditions offer a tentative answer: “(1) the content domain should be grounded in behavior with a commonly accepted meaning, (2) the content domain must be defined in a manner that is not open to more than one interpretation, (3) the content domain must be related to the purposes of measurement, (4) qualified judges must agree that the domain has been sufficiently sampled and (5) the response content must be dependably observed and evaluated.” Therefore, the information provided in the ‘Introduction’ section helps the expert perform content validation as a first step.
After the need for the measure or survey instrument is communicated, the domain is validated. The next step is to validate the items. Validation may be done for a scale measuring a single concept or for a questionnaire with multiple concepts; for a multi-construct instrument, validation is done construct by construct.
In an instrument with multiple constructs, the Introduction provides information at the theory level, and domain validation assesses the relevance of the theory to the problem. In the next section, the validation is done at the variable level. As in the Introduction, details about the construct are provided: the definition of the construct, the source of the definition, a description of the concept and the operational definition are shared with the experts. Experts validate the construct by relating it to the relevant domain. If the conceptualization and definition are not done properly, the items will be evaluated poorly.
New items are developed by deductive or inductive methods. In the deductive method, items are generated from existing scales and indicators through literature review. In the inductive method, items are generated through direct observation, individual interviews, focus group discussions and exploratory research. It is necessary to convey to the expert reviewer how each item was generated. Even when an item or scale is adopted unaltered, it must be validated to assess its relevance to a particular culture or region, and even in such situations the reviewer should be informed of the source of the items.
Experts review each item and the construct as a whole. For each item, the item code, the item statement, the measurement scale, the source of the item and a description of the item are provided. In stating the source of an item there are three options: when the item is adopted as-is from a previous scale, the source can be given; when the item is adapted by modifying an earlier item, the source and the original item can be given along with a description of the modification; and when the item is developed inductively, the method of generation can be mentioned. First, experts evaluate each item to assess whether it represents the domain of the construct and record their evaluation on a 4-point or 3-point scale. When multiple experts take part in the validation process, this score can also be used for quantitative evaluation. The quality parameters of the item are evaluated next; researchers may choose a questionnaire appraisal scheme from the many systems available. An open remarks column lets the experts give any feedback not covered by the format, and a comments section at the end of the construct validation section lets them report issues such as underrepresentation of the construct by the items.
In the same way, information about each of the demographic items required in the questionnaire is included in the format. Finally, space is provided for the expert to comment on the entire instrument. The template of the evaluation form is provided in the Appendix.
Since the feedback is qualitative, no mathematical or statistical approach is required to interpret the review. The researcher can retain, remove or modify the statements of the questionnaire as the experts mark them essential, not essential, or in need of modification. As we have recommended using the quality parameters of the QAS to describe problems and issues, the researcher gets a precise idea of what needs to be corrected. Remarks by the experts carry additional information, in the form of comments or suggestions, that is easy to follow when revising the items. General comments at the end of each scale or construct provide suggestions for adding further items to the construct.
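The retain/remove/modify step above can be sketched as a small routine. The majority rule and the tie-breaking choice here are illustrative assumptions, not something the format prescribes; the researcher remains free to overrule any verdict.

```python
# Map each expert's verdict on an item ("essential", "not essential",
# "modify") to an action for the researcher. The majority verdict wins;
# ties default to "modify" so the item is revised and re-reviewed
# rather than silently dropped or kept.
from collections import Counter

def item_action(verdicts):
    counts = Counter(verdicts)
    top = counts.most_common()
    if len(top) > 1 and top[0][1] == top[1][1]:
        return "modify"  # no majority: revise and re-review
    return {"essential": "retain",
            "not essential": "remove",
            "modify": "modify"}[top[0][0]]

# Hypothetical verdicts from three experts on two items.
print(item_action(["essential", "essential", "modify"]))      # retain
print(item_action(["not essential", "essential", "modify"]))  # modify (tie)
```

The open remarks and QAS codes would accompany each verdict in practice; this sketch only handles the final disposition.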
Despite the various frameworks available to researchers for developing survey instruments, instrument quality is often not at the desired level. Content validation of the measuring instrument is an essential requirement of every research project, and a rigorous expert validation process can avoid problems at later stages. However, researchers struggle to operationalize the instrument review process: they are challenged with communicating the background information and collecting the feedback. This paper is an attempt to design a standard format for expert validation of survey instruments. Through a literature review, the expectations from expert review are identified: the domain of the construct, relevance, accuracy, inclusion or deletion of items, sensitivity, bias, and structural aspects such as language issues and double-barreled, negative, confusing and leading questions need to be validated by the experts. A format is designed with a covering page containing an invitation to the experts, their role, and an introduction to the research and the instrument. Information about the scale and the list of scale items is provided on the subsequent pages, and the demographic questions are also included for validation. The expert review format provides standard communication and feedback between the researcher and the expert reviewer, which can help in developing rigorous, quality survey instruments.
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Supplementary material associated with this article can be found, in the online version, at doi: 10.1016/j.mex.2021.101326 .
Part I: The Instrument
Instrument is the general term that researchers use for a measurement device (survey, test, questionnaire, etc.). To help distinguish between instrument and instrumentation, consider that the instrument is the device and instrumentation is the course of action (the process of developing, testing, and using the device).
Instruments fall into two broad categories, researcher-completed and subject-completed, distinguished by whether researchers administer them or participants complete them. Researchers choose which type of instrument, or instruments, to use based on the research question. Examples are listed below:
Researcher-completed instruments | Subject-completed instruments |
---|---|
Rating scales | Questionnaires |
Interview schedules/guides | Self-checklists |
Tally sheets | Attitude scales |
Flowcharts | Personality inventories |
Performance checklists | Achievement/aptitude tests |
Time-and-motion logs | Projective devices |
Observation forms | Sociometric devices |
Usability refers to the ease with which an instrument can be administered, interpreted by the participant, and scored/interpreted by the researcher. Example usability problems include:
Validity and reliability concerns (discussed below) will help alleviate usability issues. For now, we can identify five usability considerations:
It is best to use an existing instrument, one that has been developed and tested numerous times, such as can be found in the Mental Measurements Yearbook . We will turn to why next.
Part II: Validity
Validity is the extent to which an instrument measures what it is supposed to measure and performs as it is designed to perform. It is rare, if not impossible, for an instrument to be 100% valid, so validity is generally measured in degrees. As a process, validation involves collecting and analyzing data to assess the accuracy of an instrument. There are numerous statistical tests and measures for assessing the validity of quantitative instruments, which generally involve pilot testing. The remainder of this discussion focuses on external validity and content validity.
External validity is the extent to which the results of a study can be generalized from a sample to a population. Establishing external validity for an instrument, then, follows directly from sampling. Recall that a sample should be an accurate representation of a population, because the total population may not be available. An instrument that is externally valid helps obtain population generalizability, or the degree to which a sample represents the population.
Content validity refers to the appropriateness of the content of an instrument. In other words, do the measures (questions, observation logs, etc.) accurately assess what you want to know? This is particularly important with achievement tests. Consider that a test developer wants to maximize the validity of a unit test for 7th grade mathematics. This would involve taking representative questions from each of the sections of the unit and evaluating them against the desired outcomes.
Part III: Reliability
Reliability can be thought of as consistency: does the instrument consistently measure what it is intended to measure? Reliability cannot be calculated exactly, only estimated; there are four general estimators that you may encounter in reading research:
Relating Reliability and Validity
Reliability is directly related to the validity of the measure. There are several important principles. First, a test can be reliable but not valid. Consider the SAT, used as a predictor of success in college: it is a reliable test (scores are consistent across administrations), though only a moderately valid indicator of success (because success also depends on factors outside the test, such as class attendance, parent-regulated study, and sleeping habits).
Second, validity is more important than reliability. Using the above example, college admissions may consider the SAT a reliable test, but not necessarily a valid measure of other quantities colleges seek, such as leadership capability, altruism, and civic involvement. The combination of these aspects, alongside the SAT, is a more valid measure of the applicant’s potential for graduation, later social involvement, and generosity (alumni giving) toward the alma mater.
Finally, the most useful instrument is both valid and reliable. Proponents of the SAT argue that it is both. It is a moderately reliable predictor of future success and a moderately valid measure of a student’s knowledge in Mathematics, Critical Reading, and Writing.
Part IV: Validity and Reliability in Qualitative Research
Thus far, we have discussed instrumentation as related to mostly quantitative measurement. Establishing validity and reliability in qualitative research can be less precise, though participant/member checks, peer evaluation (another researcher checks the researcher's inferences based on the instrument; Denzin & Lincoln, 2005), and multiple methods (keyword: triangulation) are convincingly used. Some qualitative researchers reject the concept of validity due to the constructivist viewpoint that reality is unique to the individual and cannot be generalized. These researchers argue for a different standard for judging research quality. For a more complete discussion of trustworthiness, see Lincoln and Guba's (1985) chapter.
Research Rundowns was made possible by support from the Dewar College of Education at Valdosta State University .
Published on September 2, 2022 by Kassiani Nikolopoulou . Revised on June 22, 2023.
Criterion validity (or criterion-related validity ) evaluates how accurately a test measures the outcome it was designed to measure. An outcome can be a disease, behavior, or performance. Concurrent validity measures tests and criterion variables in the present, while predictive validity measures those in the future.
To establish criterion validity, you need to compare your test results to criterion variables . Criterion variables are often referred to as a “gold standard” measurement. They comprise other tests that are widely accepted as valid measures of a construct .
For example, a researcher can compare the college entry exam scores of 100 students to their GPA after one semester in college. If the two sets of scores correlate strongly, the college entry exam has criterion validity.
When your test agrees with the criterion variable, it has high criterion validity. However, criterion variables can be difficult to find.
Criterion validity shows you how well a test correlates with an established standard of comparison called a criterion.
A measurement instrument, like a questionnaire , has criterion validity if its results converge with those of some other, accepted instrument, commonly called a “gold standard.”
A gold standard (or criterion variable) measures:
When a gold standard exists, evaluating criterion validity is a straightforward process. For example, you can compare a new questionnaire with an established one. In medical research, you can compare test scores with clinical assessments.
However, in many cases, there is no existing gold standard. If you want to measure pain, for example, there is no objective standard to do so. You must rely on what respondents tell you. In such cases, you can’t achieve criterion validity.
It’s important to keep in mind that criterion validity is only as good as the validity of the gold standard or reference measure. If the reference measure suffers from some sort of research bias , it can impact an otherwise valid measure. In other words, a valid measure tested against a biased gold standard may fail to achieve criterion validity.
Similarly, two biased measures will confirm one another. Thus, criterion validity is no guarantee that a measure is in fact valid. It’s best used in tandem with the other types of validity .
There are two types of criterion validity. Which type you use depends on the time at which the two measures (the criterion and your test) are obtained.
Concurrent validity is demonstrated when a new test correlates with another test that is already considered valid, called the criterion test. A high correlation between the new test and the criterion indicates concurrent validity.
Establishing concurrent validity is particularly important when a new measure is created that claims to be better in some way than its predecessors: more objective, faster, cheaper, etc.
Remember that this form of validity can only be used if another criterion or validated instrument already exists.
Predictive validity is demonstrated when a test can predict future performance. In other words, the test must correlate with a variable that can only be assessed at some point in the future, after the test has been administered.
For predictive criterion validity, researchers often examine how the results of a test predict a relevant future outcome. For example, the results of an IQ test can be used to predict future educational achievement. The outcome is, by design, assessed at some point in the future.
A high correlation provides evidence of predictive validity. It indicates that a test can correctly predict something that you hypothesize it should.
Criterion validity is often used when a researcher wishes to replace an established test with a different version of the same test, particularly one that is more objective, shorter, or cheaper.
For example, suppose an original test is widely accepted as a valid measure of procrastination, but it is very long and takes a lot of time to complete. As a result, many students fill it in without carefully considering their answers.
Criterion validity is assessed in two ways:
The measure to be validated, such as a test, should be correlated with a measure considered to be a well-established indication of the construct under study. This is your criterion variable.
Correlations between the scores on the test and the criterion variable are calculated using a correlation coefficient , such as Pearson’s r . A correlation coefficient expresses the strength of the relationship between two variables in a single value between −1 and +1.
Correlation coefficient values can be interpreted as follows: values close to +1 indicate a strong positive relationship, values close to −1 a strong negative relationship, and values close to 0 little or no relationship.
You can automatically calculate Pearson’s r in Excel , R , SPSS or other statistical software.
Positive correlation between a test and the criterion variable shows that the test is valid. No correlation or a negative correlation indicates that the test and criterion variable do not measure the same concept.
You give the two scales to the same sample of respondents. The extent of agreement between the results of the two scales is expressed through a correlation coefficient.
You calculate the correlation coefficient between the results of the two tests and find out that your scale correlates with the existing scale ( r = 0.80). This value shows that there is a strong positive correlation between the two scales.
Criterion validity and construct validity are both types of measurement validity . In other words, they both show you how accurately a method measures something.
While construct validity is the degree to which a test or other measurement method measures what it claims to measure, criterion validity is the degree to which a test can predictively (in the future) or concurrently (in the present) measure something.
Construct validity is often considered the overarching type of measurement validity . You need to have face validity , content validity , and criterion validity in order to achieve construct validity.
When designing or evaluating a measure, construct validity helps you ensure you’re actually measuring the construct you’re interested in. If you don’t have construct validity, you may inadvertently measure unrelated or distinct constructs and lose precision in your research.
Reliability and validity are both about how well a method measures something: reliability refers to the consistency of a measure, while validity refers to its accuracy.
If you are doing experimental research, you also have to consider the internal and external validity of your experiment.
Face validity is important because it’s a simple first step to measuring the overall validity of a test or technique. It’s a relatively intuitive, quick, and easy way to start checking whether a new measure seems useful at first glance.
Good face validity means that anyone who reviews your measure says that it seems to be measuring what it’s supposed to. With poor face validity, someone reviewing your measure may be left confused about what you’re measuring and why you’re using this method.
Nikolopoulou, K. (2023, June 22). What Is Criterion Validity? | Definition & Examples. Scribbr. Retrieved August 29, 2024, from https://www.scribbr.com/methodology/criterion-validity/
The questionnaire is one of the most widely used tools for collecting data, especially in social science research. The main objective of a questionnaire is to obtain relevant information in the most reliable and valid manner possible. The accuracy and consistency of a survey or questionnaire thus form a significant aspect of research methodology, known respectively as validity and reliability. New researchers are often confused about selecting and conducting the proper type of validity test for their research instrument (questionnaire/survey).
Validity explains how well the collected data covers the actual area of investigation [ 1 ] . Validity basically means “measure what is intended to be measured” [ 2 ] .
Face validity is a subjective judgment on the operationalization of a construct. Face validity is the degree to which a measure appears to be related to a specific construct, in the judgment of non-experts such as test takers and representatives of the legal system. That is, a test has face validity if its content simply looks relevant to the person taking the test. It evaluates the appearance of the questionnaire in terms of feasibility, readability, consistency of style and formatting, and the clarity of the language used.
In other words, face validity refers to researchers’ subjective assessments of the presentation and relevance of the measuring instrument as to whether the items in the instrument appear to be relevant, reasonable, unambiguous and clear [ 3 ] .
In order to examine face validity, a dichotomous scale can be used with the categorical options "Yes" and "No", indicating a favourable and an unfavourable item respectively. A favourable item is one that is objectively structured and can be positively classified under the thematic category. The collected data are then analysed using Cohen's Kappa Index (CKI) to determine the face validity of the instrument; a minimally acceptable Kappa of 0.60 for inter-rater agreement has been recommended [ 4 ] . Unfortunately, face validity is arguably the weakest form of validity, and many would suggest that it is not a form of validity in the strictest sense of the word.
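Cohen's kappa corrects the raw agreement rate between two raters for the agreement expected by chance. A minimal sketch, where the helper function and the Yes/No judgments are invented for illustration:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters: (p_o - p_e) / (1 - p_e)."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # p_o: observed proportion of items on which the raters agree
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # p_e: agreement expected by chance, from each rater's marginals
    ca, cb = Counter(rater_a), Counter(rater_b)
    p_e = sum(ca[c] * cb[c] for c in set(rater_a) | set(rater_b)) / n**2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical "Yes"/"No" face-validity judgments on 10 items
a = ["Y", "Y", "Y", "N", "Y", "Y", "N", "Y", "Y", "N"]
b = ["Y", "Y", "Y", "N", "Y", "N", "N", "Y", "Y", "Y"]
kappa = cohens_kappa(a, b)
print(round(kappa, 2))  # 0.52
```

Here the raters agree on 8 of 10 items (p_o = 0.80), but chance agreement is high (p_e = 0.58), so kappa is only 0.52, below the 0.60 threshold cited above; the disputed items would need revision.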
Content validity is defined as "the degree to which items in an instrument reflect the content universe to which the instrument will be generalized" (Straub et al. [ 5 ] ). In the field of IS, it is highly recommended to apply content validity whenever a new instrument is developed. In general, content validity involves evaluating a new survey instrument to ensure that it includes all the items that are essential to a particular construct domain and eliminates those that are undesirable [ 6 ] . The judgemental approach to establishing content validity involves literature reviews followed by evaluation by expert judges or panels. This approach requires the researchers to be present with the experts in order to facilitate validation; however, it is not always possible to gather many experts on a particular research topic in one location, which limits validation when experts are geographically dispersed (Choudrie and Dwivedi [ 7 ] ). In contrast, a quantitative approach allows researchers to send content validity questionnaires to experts working at different locations, so that distance is not a limitation. To apply content validity, the following steps are followed:
1. An exhaustive literature review is conducted to extract the related items.
2. A content validity survey is generated, in which each item is assessed on a three-point scale (not necessary, useful but not essential, and essential).
3. The survey is sent to experts in the same field as the research.
4. The content validity ratio (CVR) is then calculated for each item using Lawshe's (1975) method [ 8 ] .
5. Items that are not significant at the critical level of Lawshe's method are eliminated.
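Lawshe's ratio for a single item is CVR = (n_e − N/2) / (N/2), where n_e is the number of panelists rating the item "essential" and N is the panel size; it runs from −1 (no one says essential) to +1 (everyone does). A minimal sketch, with an invented helper name and invented panel numbers:

```python
def content_validity_ratio(n_essential: int, n_experts: int) -> float:
    """Lawshe's CVR = (n_e - N/2) / (N/2), ranging from -1 to +1."""
    half = n_experts / 2
    return (n_essential - half) / half

# Hypothetical panel of 10 experts; 8 rate the item "essential"
cvr = content_validity_ratio(8, 10)
print(cvr)  # 0.6
```

The result is then compared against the critical value from Lawshe's table for the given panel size (for a 10-person panel the commonly cited critical CVR is 0.62, so this hypothetical item would fall just short and be a candidate for elimination).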
If a relationship is causal, what are the particular cause-and-effect behaviours or constructs involved in the relationship? Construct validity refers to how well you translated or transformed a concept, idea, or behaviour (a construct) into a functioning and operating reality: the operationalization. Construct validity has two components: convergent and discriminant validity.
Discriminant validity is the extent to which latent variable A discriminates from other latent variables (e.g., B, C, D). It means that a latent variable is able to account for more variance in the observed variables associated with it than (a) measurement error or similar external, unmeasured influences, or (b) other constructs within the conceptual framework. If this is not the case, then the validity of the individual indicators and of the construct is questionable (Fornell and Larcker [ 9 ] ). In brief, discriminant validity (or divergent validity) tests that constructs that should have no relationship do, in fact, have no relationship.
Convergent validity, a parameter often used in sociology, psychology, and other behavioural sciences, refers to the degree to which two measures of constructs that theoretically should be related are, in fact, related. In brief, convergent validity tests that constructs that are expected to be related are, in fact, related.
With the purpose of verifying construct validity (discriminant and convergent validity), a factor analysis can be conducted utilizing principal component analysis (PCA) with the varimax rotation method (Koh and Nam [ 9 ] , Wee and Quazi [ 10 ] ). Items loading above 0.40, the minimum recommended value in research, are considered for further analysis, while items cross-loading above 0.40 should be deleted. The factor analysis results then satisfy the criteria of construct validity, including both discriminant validity (loading of at least 0.40, no cross-loading of items above 0.40) and convergent validity (eigenvalues of at least 1, loading of at least 0.40, items that load on posited constructs) (Straub et al. [ 11 ] ). There are also other methods to test convergent and discriminant validity.
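The loading checks described above can be sketched with NumPy alone. The example below extracts unrotated principal-component loadings from an item correlation matrix, retains components with eigenvalue at least 1, and reports each item's primary and largest cross-loading against the 0.40 rule. The simulated two-factor data, the sample size, and the noise levels are all invented for illustration, and the varimax rotation step used in the cited studies is omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 300

# Simulate 6 items driven by two uncorrelated latent factors:
# items 0-2 load on factor A, items 3-5 on factor B.
f_a, f_b = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack(
    [f_a + 0.5 * rng.normal(size=n) for _ in range(3)]
    + [f_b + 0.8 * rng.normal(size=n) for _ in range(3)]
)

# Principal components of the item correlation matrix
R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]      # sort eigenvalues descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

keep = eigvals >= 1.0                  # Kaiser criterion: eigenvalue >= 1
loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])

print("components retained:", keep.sum())
for i, row in enumerate(np.abs(loadings)):
    primary = row.max()                # loading on the item's main component
    cross = sorted(row)[-2]            # largest loading on any other component
    # rule of thumb: primary >= 0.40, cross-loadings < 0.40
    print(f"item {i}: primary={primary:.2f} cross={cross:.2f}")
```

With this clean simulated structure, two components pass the eigenvalue criterion and every item loads well above 0.40 on exactly one of them; with real survey data, a dedicated factor-analysis routine with rotation would normally be used instead.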
Criterion or concrete validity is the extent to which a measure is related to an outcome. It measures how well one measure predicts an outcome for another measure. A test has this type of validity if it is useful for predicting performance or behavior in another situation (past, present, or future).
Criterion validity is an alternative perspective that de-emphasizes the conceptual meaning or interpretation of test scores. Test users might simply wish to use a test to differentiate between groups of people or to make predictions about future outcomes. For example, a human resources director might need to use a test to help predict which applicants are most likely to perform well as employees. From a very practical standpoint, she focuses on the test’s ability to differentiate good employees from poor employees. If the test does this well, then the test is “valid” enough for her purposes. From the traditional three-faceted view of validity, criterion validity refers to the degree to which test scores can predict specific criterion variables. The key to validity is the empirical association between test scores and scores on the relevant criterion variable, such as “job performance.”
Messick [ 12 ] suggests that "even for purposes of applied decision making, reliance on criterion validity or content coverage is not enough. The meaning of the measure, and hence its construct validity, must always be pursued – not only to support test interpretation but also to justify test use". The types of criterion validity are concurrent validity, predictive validity, and postdictive validity.
Reliability concerns the extent to which a measurement of a phenomenon provides stable and consistent results (Carmines and Zeller [ 13 ] ). Reliability is also concerned with repeatability: for example, a scale or test is said to be reliable if repeated measurements made by it under constant conditions give the same result (Moser and Kalton [ 14 ] ).
Testing for reliability is important as it refers to the consistency across the parts of a measuring instrument (Huck [ 15 ] ). A scale is said to have high internal consistency reliability if the items of a scale “hang together” and measure the same construct (Huck [ 16 ] Robinson [ 17 ] ). The most commonly used internal consistency measure is the Cronbach Alpha coefficient. It is viewed as the most appropriate measure of reliability when making use of Likert scales (Whitley [ 18 ] , Robinson [ 19 ] ). No absolute rules exist for internal consistencies, however most agree on a minimum internal consistency coefficient of .70 (Whitley [ 20 ] , Robinson [ 21 ] ).
For an exploratory or pilot study, it is suggested that reliability should be equal to or above 0.60 (Straub et al. [ 22 ] ). Hinton et al. [ 23 ] have suggested four cut-off points for reliability: excellent reliability (0.90 and above), high reliability (0.70–0.90), moderate reliability (0.50–0.70) and low reliability (below 0.50). Although reliability is important for a study, it is not sufficient unless combined with validity: a test can be reliable without being valid, but it cannot be valid unless it is also reliable [ 25 ] .
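Cronbach's alpha, the internal-consistency measure discussed above, is simple to compute by hand: alpha = k/(k−1) × (1 − Σ item variances / variance of total scores), for k items. A minimal NumPy sketch, where the helper name and the Likert responses are invented for illustration:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha: k/(k-1) * (1 - sum(item variances) / total variance).

    items: respondents x items matrix of scores.
    """
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()      # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)        # variance of total scores
    return k / (k - 1) * (1 - item_vars / total_var)

# Hypothetical 5-point Likert responses: 5 respondents, 4 items
X = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
])
alpha = cronbach_alpha(X)
print(round(alpha, 2))
```

For these invented responses alpha comes out above 0.90, which by Hinton et al.'s cut-offs would count as excellent reliability; real item sets are rarely this internally consistent.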
Relationship between Playing Musical Instruments and Subjective Well-Being: Enjoyment of Playing Instruments Scale
Abbreviation | Definition
---|---
EPIS | Enjoyment of Playing Instruments Scale
UMI | Use of Music Inventory
MUSE | Music Use Questionnaire
MRS | Music Receptivity Scale
SWLS | The Satisfaction with Life Scale
EFT | Emotional Frequency Test
SHS | Subjective Happiness Scale
CVI | Content Validity Index
IRB | Institutional Review Board
SPSS | Statistical Package for Social Sciences
AMOS | Analysis of Moment Structure
EFA | Exploratory Factor Analysis
CFA | Confirmatory Factor Analyses
TLI | Tucker−Lewis Index
CFI | Comparative Fit Index
RMSEA | Root Mean Square Error of Approximation
SRMR | Standardized Root Mean Square Residual
CR | Composite Reliability
AVE | Average Variance Extracted
ES | Enjoyment of Singing
SRWs | Standardized Regression Weights
df | Degree of Freedom
CI | Confidence Interval
No. | Item | Factor 1 | Factor 2 | Factor 3 | h²
---|---|---|---|---|---
1. | I like learning to play challenging music. | 0.553 | | | 0.401
2. | Learning to play musical instruments together with others can make me feel close to them. | 0.407 | | | 0.371
3. | I like playing musical instruments as a group with other people. | 0.822 | | | 0.632
4. | Playing musical instruments together improves family ties. | 0.450 | | | 0.311
5. | It is fun to play musical instruments with other people. | 0.822 | | | 0.620
6. | Playing musical instruments with others makes us more united. | 0.516 | | | 0.473
7. | I am looking forward to practicing playing musical instruments with other people. | 0.820 | | | 0.653
8. | I feel good when I realize that my musical instrument playing skills have improved. | | 0.599 | | 0.445
9. | I feel good when I play an instrument in front of people and receive applause. | | 0.696 | | 0.451
10. | I feel a sense of accomplishment when I play an instrument. | | 0.419 | | 0.425
11. | My sense of accomplishment is great when I play an instrument well in front of others. | | 0.656 | | 0.454
12. | I feel a sense of accomplishment when I master playing a piece of music. | | 0.601 | | 0.420
13. | Tackling difficult music and playing it gives me a great sense of achievement. | | 0.720 | | 0.518
14. | Playing an instrument makes me feel different. | | | 0.600 | 0.405
15. | When I play an instrument, the thoughts that bothered me disappear. | | | 0.435 | 0.383
16. | Playing musical instruments can make my partner admire me even more. | | | 0.585 | 0.410
| Eigenvalues | 6.23 | 1.58 | 1.17 |
| % Variance | 38.92 | 9.85 | 7.31 | 56.08
Scale | Learning/Social Bonds | Achievement/Pride | Cognitive Refreshment and Stimulation | EPIS |
---|---|---|---|---|
Cognitive and emotional regulation | 0.69 *** | 0.61 ** | 0.67 *** | 0.71 *** |
Engaged production | 0.76 *** | 0.68 *** | 0.70 *** | 0.78 *** |
Social connection | 0.71 *** | 0.63 *** | 0.63 *** | 0.72 *** |
Physical exercise | 0.68 *** | 0.65 *** | 0.62 *** | 0.71 *** |
Dance | 0.57 *** | 0.44 *** | 0.53 *** | 0.55 *** |
MUSE | 0.80 *** | 0.71 *** | 0.74 *** | 0.81 *** |
Affect | 0.66 *** | 0.70 *** | 0.64 *** | 0.73 *** |
Attention | −0.15 * | −0.19 ** | −0.17 ** | −0.18 ** |
MRS | 0.47 *** | 0.48 *** | 0.44 *** | 0.50 *** |
Emotional use of music | 0.66 *** | 0.65 *** | 0.65 *** | 0.71 *** |
Rational/Cognitive use of music | 0.50 *** | 0.29 *** | 0.51 *** | 0.46 *** |
Background use of music | 0.57 *** | 0.45 *** | 0.56 *** | 0.57 *** |
UMI | 0.64 *** | 0.49 *** | 0.64 *** | 0.63 *** |
Skewness | −0.89 | −1.35 | −0.78 | −1.19 |
Kurtosis | 0.61 | 1.53 | 0.28 | 1.34 |
Variables | Learning/Social Bonds | Achievement/Pride | Cognitive Refreshment/Stimulation | EPIS |
---|---|---|---|---|
Life satisfaction | 0.57 *** | 0.52 *** | 0.55 *** | 0.59 *** |
Positive emotions | 0.54 *** | 0.52 *** | 0.53 *** | 0.57 *** |
Negative emotions | 0.25 *** | 0.19 *** | 0.29 *** | 0.25 *** |
Subjective well-being | 0.47 *** | 0.46 *** | 0.43 *** | 0.49 *** |
Subjective happiness | 0.38 *** | 0.39 *** | 0.34 *** | 0.40 *** |
Zhang, Q.; Park, A.; Suh, K.-H. Relationship between Playing Musical Instruments and Subjective Well-Being: Enjoyment of Playing Instruments Scale. Behav. Sci. 2024 , 14 , 744. https://doi.org/10.3390/bs14090744
IMAGES
VIDEO
COMMENTS
The 4 Types of Validity in Research | Definitions & Examples. Published on September 6, 2019 by Fiona Middleton.Revised on June 22, 2023. Validity tells you how accurately a method measures something. If a method measures what it claims to measure, and the results closely correspond to real-world values, then it can be considered valid.
The validity of the instruments was evaluated through various methods, including the examination of reports, literature reviews, consultations with the research tutor and co-tutors, and engagement ...
Validity refers to the degree to which an instrument accurately measures what it intends to measure. Three common types of validity for researchers and evaluators to consider are content, construct, and criterion validities. Content validity indicates the extent to which items adequately measure or represent the content of the property or trait ...
Ensuring validity in research involves several strategies: Clear Operational Definitions: Define variables clearly and precisely. Use of Reliable Instruments: Employ measurement tools that have been tested for reliability. Pilot Testing: Conduct preliminary studies to refine the research design and instruments.
Reliability is about the consistency of a measure, and validity is about the accuracy of a measure.opt. It's important to consider reliability and validity when you are creating your research design, planning your methods, and writing up your results, especially in quantitative research. Failing to do so can lead to several types of research ...
What is validity? 1. Validity in research refers to how accurately a study answers the study question or the strength of the study conclusions. For outcome measures such as surveys or tests, validity refers to the accuracy of measurement. Here validity refers to how well the assessment tool actually measures the underlying outcome of interest.
An essential part of validity is choosing the right research instrument or method for accurate results. Consider the thermometer that is reliable but still produces inaccurate results. You're unlikely to achieve research validity without activities like calibration, content, and construct validity.
Validity. Validity is defined as the extent to which a concept is accurately measured in a quantitative study. For example, a survey designed to explore depression but which actually measures anxiety would not be considered valid. The second measure of quality in a quantitative study is reliability, or the accuracy of an instrument.In other words, the extent to which a research instrument ...
In simple terms, validity (also called "construct validity") is all about whether a research instrument accurately measures what it's supposed to measure. For example, let's say you have a set of Likert scales that are supposed to quantify someone's level of overall job satisfaction. If this set of scales focused purely on only one ...
In this vein, there are many different types of validity and ways of thinking about it. Let's take a look at several of the more common types. Each kind is a line of evidence that can help support or refute a test's overall validity. In this post, learn about face, content, criterion, discriminant, concurrent, predictive, and construct ...
17.4.1 Validity of instruments. 17.4.1. Validity of instruments. Validity has to do with whether the instrument is measuring what it is intended to measure. Empirical evidence that PROs measure the domains of interest allows strong inferences regarding validity. To provide such evidence, investigators have borrowed validation strategies from ...
Reliability is consistency across time (test-retest reliability), across items (internal consistency), and across researchers (interrater reliability). Validity is the extent to which the scores actually represent the variable they are intended to. Validity is a judgment based on various types of evidence.
New researchers are often confused about which type of validity to select and how to conduct the corresponding test for their research instrument (questionnaire or survey); review articles therefore explore and describe the various forms of validity and reliability testing. The Rasch model is also valuable in instrument validation, because it can identify the constructs underlying valid items and provide a clear definition of measurable constructs that is consistent with theoretical expectations. More generally, an instrument is valid only to the extent that its scores permit appropriate inferences to be made about (1) a specific group of people for (2) specific purposes; an instrument that is a valid measure of third graders' math skills is probably not a valid measure of the same skills for other groups or other purposes.
Content validity evaluates how well an instrument (such as a test) covers all relevant parts of the construct it aims to measure, where a construct is a theoretical concept rather than a directly observable quantity. For example, psychological research often involves developing screening tools for clinical diagnoses; content validity asks whether the tool's items together cover every relevant facet of the diagnosis.
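Expert review of content validity is often quantified with a content validity index (CVI). The sketch below uses hypothetical ratings from five experts on a 4-point relevance scale (1 = not relevant, 4 = highly relevant); the item-level CVI (I-CVI) is the proportion of experts rating an item 3 or 4:

```python
# Hypothetical ratings: rows = items, columns = five expert raters
ratings = [
    [4, 4, 3, 4, 3],   # item 1: rated relevant by all experts
    [3, 2, 4, 3, 4],   # item 2: one expert rates it not relevant
    [2, 2, 3, 1, 2],   # item 3: mostly rated not relevant
]

def i_cvi(item_ratings):
    """Proportion of experts rating the item 3 or 4 (relevant)."""
    relevant = sum(1 for r in item_ratings if r >= 3)
    return relevant / len(item_ratings)

cvis = [i_cvi(item) for item in ratings]
# Scale-level CVI (S-CVI/Ave): mean of the item-level CVIs
s_cvi = sum(cvis) / len(cvis)
```

One common guideline is to flag items whose I-CVI falls below roughly 0.78 for revision or deletion; item 3 above would be flagged.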
Validity in research thus refers to the extent to which a study accurately measures what it intends to measure, ensuring that the results are truly representative of the phenomena under investigation. Note that while a test can be reliable without being valid, it cannot be valid without being reliable.
Validation of a survey instrument is an important activity in the research process. Face validity and content validity, though qualitative methods, are essential steps in establishing how far a survey instrument can measure what it is intended to measure. These techniques are used both in scale development and when a questionnaire is adopted or adapted for a new study.
Validity basically means "measure what is intended to be measured" (Field, 2005). In this paper, the main types of validity, namely face validity, content validity, construct validity, and criterion validity, together with reliability, are discussed. Figure 1 shows the subtypes of the various forms of validity tests explored and described in this article.
Validity is the extent to which an instrument measures what it is supposed to measure and performs as it is designed to perform. It is rare, if not impossible, for an instrument to be 100% valid, so validity is generally measured in degrees. As a process, validation involves collecting and analyzing data to assess the accuracy of an instrument.
Criterion validity (or criterion-related validity) evaluates how accurately a test measures the outcome it was designed to measure; an outcome can be a disease, a behavior, or a performance. Concurrent validity compares the test against a criterion variable measured in the present, while predictive validity compares it against a criterion observed in the future.
Criterion validity thus has several subtypes: concurrent validity, predictive validity, and postdictive validity. Reliability, in turn, concerns the extent to which a measurement of a phenomenon provides stable and consistent results (Carmines and Zeller [13]); it is also concerned with repeatability.
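Concurrent validity, for instance, is typically assessed by correlating scores from the new instrument with an established criterion measured at the same time. The sketch below is illustrative only; both score vectors and the "gold standard" instrument are hypothetical:

```python
import numpy as np

# Hypothetical concurrent-validity check: scores from a new short screener
# and from an established instrument, administered to the same 8 respondents
# at the same time.
new_scale     = np.array([12, 18, 7, 22, 15, 9, 20, 14])
gold_standard = np.array([14, 20, 9, 25, 16, 8, 23, 15])

# Pearson correlation between the two sets of scores; a strong positive
# correlation supports concurrent (criterion) validity.
r = np.corrcoef(new_scale, gold_standard)[0, 1]
```

Predictive validity is computed the same way, except that the criterion scores are collected at a later point in time than the instrument scores.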