Boateng GO, Neilands TB, Frongillo EA, Melgar-Quiñonez HR, Young SL. Best practices for developing and validating scales for health, social, and behavioral research: a primer. Frontiers in Public Health. 2018;6:149.
Willis GB, Lessler JT. Question Appraisal System QAS-99. Research Triangle Institute; 1999.
Method details
Survey instruments, or questionnaires, are the most popular data collection tool because of their many advantages: they can collect data from a large population in limited time and at low cost, are convenient for respondents, offer anonymity, avoid interviewer bias and standardize the questions. An important disadvantage, however, is poor data quality caused by incomplete and inaccurate questions, wording problems and a poor development process. These problems are critical, but they can be avoided or mitigated [14] .
To ensure the quality of the instrument, using a previously validated questionnaire is useful: it saves the time and resources spent on development and on testing reliability and validity. However, there are situations in which a new questionnaire is needed [5] . Whenever a new scale or questionnaire must be developed, following a structured method helps produce a quality instrument. There are many approaches to scale development, and all of them include stages for testing reliability and validity.
Even though there is ample literature on reliability and validity procedures, many researchers struggle to operationalize the process. Collingridge [8] wrote in the Methodspace blog of Sage Publishing that he repeatedly asked professors how to validate the questions in a survey and, unfortunately, did not get an answer. Most of the time, researchers send the fully designed questionnaire with the actual measurement scale but without providing enough information for the reviewers to give proper feedback. This paper is an effort to develop a document template that can capture the feedback of the expert reviewers of an instrument.
This paper is structured as follows: Section 1 introduces the need for a validation format in research; Section 2 discusses the fundamentals of validation and the factors involved, drawn from various literature studies; Section 3 presents the methodology used in framing the validation format; Section 4 provides the results of the study; Section 5 explains how the format can be used and how the feedback can be processed; and Section 6 concludes the paper with a note on its contribution.
A questionnaire is defined as “an instrument for the measurement of one or more constructs by means of aggregated item scores, called scales” [21] . A questionnaire can be placed on a continuum from unstructured to structured [14] . Structured questionnaires “have a similar format, are usually statements, questions, or stimulus words with structured response categories, and require a judgment or description by a respondent or rater” [21] . Research in social science with a positivist paradigm began in the 19th century. The first use of a questionnaire is attributed to the Statistical Society of London as early as 1838. Berthold Sigismund proposed the first guidelines for questionnaire development in 1856, which provided a definite plan for the questionnaire method [13] . The British Association for the Advancement of Science's 1941 acceptance of quantitative measures for sensory events [26] led to a much more pervasive use of questionnaires in research, through instruments such as the Guttman scale [15] , the Thurstone scale [27] and the Likert scale [18] .
Carpenter [6] argued that scholars do not follow best practices in the measure-development procedure. The author claims that “the defaults in the statistical programs, inadequate training and numerous evaluation points can lead to improper practices”. Many researchers have proposed techniques for scale development; we trace the prominent methods from the literature. Table 1 presents various frameworks in scale development.
Frameworks of scale development.
Author & Framework | Steps | Remarks |
---|---|---|
Churchill Paradigm for Developing Better Measures of Marketing Constructs | 8-step process: (1) specify the domain of the construct, (2) generate a sample of items, (3) collect data, (4) purify the measure, (5) collect data, (6) assess reliability, (7) assess validity and (8) develop standards. | He recommended a multi-item measure to diminish the difficulties of a single-item measure. Experts are consulted during the item development stage. A focus group of 8 to 10 participants is convened for an open discussion of the concept. When a researcher wants to include items, experienced researchers can attest to identical statements. Every statement is reviewed for preciseness of wording, double-barreled statements, positive and negative statements and socially acceptable responses, and may even be removed. |
Hinkin Three-stage scale development | The stages of scale construction are: (1) item generation; (2) scale development, comprising design of the developmental study, scale construction and reliability assessment; and (3) scale evaluation. | The study recommended the use of subject matter experts in developing the conceptual definition. |
Hinkin et al. Seven-step scale development procedure | (1) Item generation, (2) Content adequacy assessment, (3) Questionnaire administration, (4) Factor analysis, (5) Internal consistency assessment, (6) Construct validation and (7) Replication. | The authors propose ‘content adequacy assessment’ as a necessary step in scale development. They are concerned that this step is often overlooked, so researchers land in trouble after collecting large datasets. The authors argue that there are several content assessment methods and recommend using experts in the content domain for the assessment. |
Rossiter C-OAR-SE scale development | The steps of the framework are as follows: (1) Construct definition, (2) Object classification, (3) Attribute classification, (4) Rater identification, (5) Scale formation, and (6) Enumeration and reporting. | This framework has been proposed exclusively for scale development in marketing research, where the construct is defined in terms of object, attribute and rater entity (OAR). The scale relies on content validity alone, rather than on any other type of validity, and places more emphasis on reasoned argument and the agreement of experts. The author distinguishes content validity from face validity, arguing that “content validity is conducted before the scale is developed, that the items will properly represent the construct”, whereas “face validity is a post hoc claim that the items in the scale measure the construct”. The author presented a prototype of an expert judge's rating form. |
DeVellis Eight-step scale construction method | (1) Determine clearly what it is you want to measure, (2) Generate the item pool, (3) Determine the format for measurement, (4) Have the initial item pool reviewed by experts, (5) Consider the inclusion of validation items, (6) Administer items to a development sample, (7) Evaluate the items and (8) Optimize scale length. | The author proposes an exclusive step in which the generated items are validated by experts. The expert panel is required to evaluate how relevant each item is to measuring the concept, based on the working definition of the construct. The experts are also expected to assess the clarity and conciseness of the items, and can indicate any phenomenon the researcher failed to include. However, the final decision on the experts' comments rests with the researcher. |
Carpenter 10-step scale development and reporting | (1) Research the intended meaning and breadth of the theoretical concept, (2) Determine sampling procedure, (3) Examine data quality, (4) Verify the factorability of the data, (5) Conduct common factor analysis, (6) Select factor extraction method, (7) Determine the number of factors, (8) Rotate factors, (9) Evaluate items based on a priori criteria and (10) Present results. | The author claims that “Interviews, focus groups, and expert feedback are critical in the item generation and dimension identification process” and recommends that “the pool of items needs to be concise, clear, distinct, and reflect the chosen conceptual definition”. |
Reeves and Marbach-Ad [22] argued that the quantitative aspect of social science research differs from natural science in how phenomena are quantified using instruments. Bollen [4] explained that a social science instrument measures latent variables that are not directly observed but are inferred from observable behaviour. Because of this characteristic of social science measures, there is a need to ensure that what is being measured is actually the intended phenomenon.
The concepts of reliability and validity evolved as early as 1896 with Pearson. From 1900 to 1950, validity theory basically dealt with the alignment of test scores with other measures, operationally tested by correlation. The theory was refined during the 1950s to include criterion, content and construct validity. Criterion validity is the correlation of the test measure with an accurate criterion score; in 1955, it was divided into concurrent validity and predictive validity. Content validity provides “domain relevance and representativeness of the test instrument”. The concept of construct validity was introduced in 1954 and received increasing emphasis; from 1985 it took a central role as the appropriate test of validity. The new millennium saw a change in the perspectives of validity theory. Contemporary validity theory is a metamorphosis of epistemological and methodological perspectives; the argument-based approach and consequences-based validity are some of the new concepts that are evolving [24] .
The American Educational Research Association (AERA), the American Psychological Association (APA) and the National Council on Measurement in Education (NCME) jointly developed the ‘Standards for Educational and Psychological Testing’. Validity is described there as “the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests” [1] .
Based on the ‘Standards’, validity tests are classified by the type of evidence. Standards 1.11 to 1.25 describe the various kinds of evidence used to test validity [1] . Table 2 presents the different types of validity evidence and their explanations.
Types of validity.
Types of evidence | Explanation |
---|---|
Content-oriented evidence | “Validity evidence can be obtained from an analysis of the relationship between the content of a test and the construct it is intended to measure”. |
Evidence regarding cognitive processes | “Evidence concerning the fit between the construct and the detailed nature of the performance or response actually engaged in by test takers”. |
Evidence regarding internal structure | “Analyses of the internal structure of a test can indicate the degree to which the relationships among test items and test components conform to the construct on which the proposed test score interpretations are based”. |
Evidence concerning relationships with conceptually related constructs | “Evidence based on relationships with other variables provides evidence about the degree to which these relationships are consistent with the construct underlying the proposed test score interpretations”. This includes convergent and discriminant validity. |
Evidence regarding relationships with criteria | “Evidence of the relation of test scores to a relevant criterion”. This includes concurrent and predictive validity. |
Evidence based on consequences of tests | “The validation process involves gathering evidence to evaluate the soundness of these proposed interpretations for their intended use”. |
( Source: [1] )
Souza et al. [25] argued that “there is no statistical test to assess specifically the content validity; usually researchers use a qualitative approach, through the assessment of an experts committee, and then, a quantitative approach using the content validity index (CVI).”
Worthington and Whittaker [29] conducted a content analysis on new scales developed between 1995 and 2004. They specifically focused on the use of Exploratory and Confirmatory Factor Analysis (EFA & CFA) procedures in the validation of the scales. They argued that though the post-tests in the validation procedure, which are usually based on factor-analytic techniques, are more scientific and rigorous, the preliminary steps are necessary. Mistakes committed in the initial stages of scale development lead to problems in the later stages.
Messick [20] proposed six distinguishable notions of construct validity for educational and psychological measurements. Among the six, the foremost is content validity, which looks at the relevance of the content, its representativeness and its technical quality. In a similar way, Oosterveld et al. [21] developed a taxonomy of questionnaire design directed towards psychometric aspects. The taxonomy introduces the following questionnaire design methods: (1) coherent, (2) prototypical, (3) internal, (4) external, (5) construct and (6) facet design. These methods are related to six psychometric features guiding them: “face validity, process validity, homogeneity, criterion validity, construct validity and content validity”. The authors presented these methods under four stages: (1) concept review, (2) item generation, (3) scale development and (4) evaluation. After the definition of the construct in the first stage, the item pool is developed. The item production stage “comprises an item review by judges, e.g., experts, or potential respondents, and a pilot administration of the preliminary questionnaire, the results of which are subsequently used for refinement of the items”.
This paper mainly focuses on the expert validation done under the face validity and content validity stages. Martinez [19] provides a clear distinction between content validity and face validity: “Face validity requires an examination of a measure and the items of which it is composed as sufficient and suitable ‘on its face’ for capturing a concept. A measure with face validity will be visibly relevant to the concept it is intended to measure, and less so to other concepts”. Though face validity is a quick and excellent first step for assessing the appropriateness of a measure to capture the concept, it is not sufficient; it needs to be interpreted along with other forms of measurement validity.
“Content validity focuses on the degree to which a measure captures the full dimension of a particular concept. A measure exhibiting high content validity is one that encompasses the full meaning of the concept it is intended to assess” [19] . An extensive review of the literature and consultation with experts ensure the validity of the content.
From the review of various literature studies, we arrive at the details of the validation that needs to be done by experts. Domain or subject matter experts from both academia and industry, a person with expertise in the construct being developed, people familiar with the target population for whom the instrument is intended, users of the instrument, data analysts and those who make decisions based on the test scores are recommended as experts. Experts are consulted during the concept development stage and the item generation stage, and provide feedback on the content, sensitivity and standard settings [10] .
During the concept development stage, experts provide input on the definition of the constructs, relating them to the domain and checking them against related concepts. At the item generation stage, experts validate the representativeness and significance of each item to the construct, the accuracy of each item in measuring the concept, the inclusion or deletion of elements, the logical sequence of the items, and the scoring models. Experts also validate how the instrument measures the concept among different groups of respondents: an item is checked for bias towards specific groups such as gender, minority groups and linguistically different groups. Experts also provide standard scores or cutoff scores for decision making [10] .
The second set of reviewers, experts in questionnaire development, basically check the structural aspects of the instrument for common errors such as double-barreled, confusing and leading questions. This set also includes language experts, even if the questionnaire is developed in a widely used language like English. Experts in other languages are required if the instrument involves translation.
There have been many attempts to standardize questionnaire validation. Forsyth et al. [11] developed a Forms Appraisal model, an exhaustive list of problems that occur in questionnaire items, which experts found tiresome to use. Fowler and Roman [12] developed an ‘Interviewer Rating Form’, which allowed experts to comment on three qualities: (1) trouble reading the question, (2) the respondent not understanding the meaning or ideas in the question and (3) the respondent having difficulty providing an answer. The experts had to code ‘A’ for ‘no evidence of a problem’, ‘B’ for ‘possible problem’ and ‘C’ for ‘definite problem’. Willis and Lessler [28] developed a shorter coding scheme for evaluating questionnaire items, the “Question Appraisal System (QAS)”. This system evaluates each item on 26 problem areas under seven heads; the expert needs only to code ‘Yes’ or ‘No’ for each item. Akkerboom and Dehue [2] developed a systematic review of interview and self-completion questionnaires with 26 problem items categorized under eight problem areas.
Hinkin [16] recommended as a "best practice" to “clearly cite the theoretical literature on which the new measures are based and describe the manner in which the items were developed and the sample used for item development”. The author claims that “in many articles, this information was lacking, and it was not clear whether there was little justification for the items chosen or if the methodology employed was simply not adequately presented”.
Further to the qualitative analysis of the items, recent developments include quantitative assessments. “The content adequacy of a set of newly developed items is assessed by asking respondents to rate the extent to which items corresponded with construct definitions” [16] . Souza et al. [25] suggest using the Content Validity Index (CVI) for the quantitative approach. Experts evaluate every item on a four-point scale, in which “1 = non-equivalent item; 2 = the item needs to be extensively revised so equivalence can be assessed; 3 = equivalent item, needs minor adjustments; and 4 = totally equivalent item”. The CVI is calculated by dividing the number of items rated 3 or 4 by the total number of ratings. The CVI value is thus the proportion of judges who agree with an item; an index value of at least 0.80, and preferably higher than 0.90, is considered acceptable.
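The CVI arithmetic described above can be sketched in a few lines of Python; the ratings below are invented for illustration, and any spreadsheet would do the same computation.

```python
# Item-level Content Validity Index (I-CVI): the proportion of expert
# ratings of 3 ("equivalent, minor adjustments") or 4 ("totally
# equivalent") on the four-point relevance scale.

def item_cvi(ratings):
    """Return the proportion of ratings that are 3 or 4."""
    if not ratings:
        raise ValueError("at least one expert rating is required")
    return sum(1 for r in ratings if r >= 3) / len(ratings)

# Hypothetical example: six experts rate one item.
ratings = [4, 3, 4, 2, 4, 3]
cvi = item_cvi(ratings)
print(f"I-CVI = {cvi:.2f}")  # 5 of 6 ratings are 3 or 4 -> 0.83
print("acceptable" if cvi >= 0.80 else "needs revision")
```

With five of six experts agreeing, the item clears the 0.80 threshold but not the stricter 0.90 one, so a researcher might still revisit its wording.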
The problems in conducting face validity and content validity may be attributed to both the scale developer and the reviewer: scale developers do not convey their requirements to the experts properly, and experts, in turn, are not sure what the researcher expects. Therefore, a format is developed here that captures the information required for scale validation from both the researcher and the experts.
A covering letter is an important part of sending a questionnaire for review. It can help persuade a reviewer to support the research, and it should be short and simple. The covering letter first invites the expert to the review and conveys esteem for the expert. Even if the questionnaire for review is handed over personally, a covering letter serves as instructions for the review process and states the expectations from the reviewer.
Boateng et al. [3] recommended that the researcher specify the purpose of the construct or questionnaire being developed; justifying the development of a new instrument by confirming that no existing instrument serves the purpose is crucial. If similar instruments exist, the researcher should explain how the proposed one differs from them.
The covering letter can mention the maximum time required for the review and any compensation the expert will receive; this motivates the reviewer to contribute their expertise and effort. Instructions on how to complete the review process, which aspects to check, the coding systems and how to give the feedback are also provided in the covering letter. The letter ends with a thank-you note and is personally signed by the instrument developer. Further contact details can also be provided at the end.
Boateng et al. [3] proposed that articulating the domain(s) is an essential step before any validation process. They recommend that “the domain being examined should be decided upon and defined before any item activity. A well-defined domain will provide a working knowledge of the phenomenon under study, specify the boundaries of the domain, and ease the process of item generation and content validation”.
In the introduction section, the research problem being addressed, the existing theories, the proposed theory or model to be investigated, and the list of variables/concepts to be measured can be elaborated. Guion [30] argued that, for those who do not accept content validity on the basis of evaluations of the operational definition alone, five conditions offer a tentative answer: “(1) the content domain should be grounded in behavior with a commonly accepted meaning, (2) the content domain must be defined in a manner that is not open to more than one interpretation, (3) the content domain must be related to the purposes of measurement, (4) qualified judges must agree that the domain has been sufficiently sampled and (5) the response content must be dependably observed and evaluated.” Therefore, the information provided in the ‘Introduction’ section helps the expert perform content validation as a first step.
After the need for the measure or survey instrument is communicated, the domain is validated. The next step is to validate the items. Validation may be done for a scale measuring a single concept or for a questionnaire with multiple concepts; for a multi-construct instrument, validation is done construct by construct.
In an instrument with multiple constructs, the Introduction provides information at the theory level, and domain validation assesses the relevance of the theory to the problem. In the next section, the validation is done at the variable level. As in the Introduction, details about the construct are provided: the definition of the construct, the source of the definition, a description of the concept and the operational definition are shared with the experts. Experts validate the construct by relating it to the relevant domain. If the conceptualization and definition are not done properly, the items will be evaluated poorly.
New items are developed by deductive or inductive methods. In the deductive method, items are generated from existing scales and indicators through literature review. In the inductive method, items are generated through direct observation, individual interviews, focus group discussions and exploratory research. It is necessary to convey to the expert reviewer how each item was generated. Even when an item or scale is adopted unaltered, it must be validated to assess its relevance to a particular culture or region, and even in such situations the reviewer should be informed of the source of the items.
Experts review each item and the construct as a whole. For each item, the item code, the item statement, the measurement scale, the source of the item and a description of the item are provided. In stating the source of an item there are three options: when the item is adopted as-is from a previous scale, the source can be given; when the item is adapted by modifying an earlier item, the source and the original item can be given along with a description of the modification; and when the item is developed inductively, the method of generation can be mentioned. First, experts evaluate each item to assess whether it represents the domain of the construct and record their evaluation on a 4-point or 3-point scale. When multiple experts take part in the validation process, this score can also be used for quantitative evaluation. The quality parameters of the item are evaluated next; researchers may choose a questionnaire appraisal scheme from the many systems available. An open remarks column lets the experts give any feedback not covered by the format, and a comments section at the end of the construct validation section lets them report issues such as underrepresentation of the construct by the items.
In the same way, information about each of the demographic items required in the questionnaire is included in the format. Finally, space is provided for the expert to comment on the entire instrument. The template of the evaluation form is provided in the Appendix.
Since the feedback is qualitative, no mathematical or statistical approach is required to interpret the review. The researcher can retain, remove or modify the statements of the questionnaire as the experts mark them essential, not essential, or in need of modification. As we have recommended using the quality parameters of the QAS to describe problems and issues, the researcher gets a precise idea of what needs to be corrected. Remarks by the experts carry additional information, in the form of comments or suggestions, that is easy to follow when revising the items. General comments at the end of each scale or construct provide suggestions for adding further items to the construct.
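The retain/remove/modify step above can be sketched as a small routine. The majority rule and the tie-breaking choice here are illustrative assumptions, not something the format prescribes; the researcher remains free to overrule any verdict.

```python
# Map each expert's verdict on an item ("essential", "not essential",
# "modify") to an action for the researcher. The majority verdict wins;
# ties default to "modify" so the item is revised and re-reviewed
# rather than silently dropped or kept.
from collections import Counter

def item_action(verdicts):
    counts = Counter(verdicts)
    top = counts.most_common()
    if len(top) > 1 and top[0][1] == top[1][1]:
        return "modify"  # no majority: revise and re-review
    return {"essential": "retain",
            "not essential": "remove",
            "modify": "modify"}[top[0][0]]

# Hypothetical verdicts from three experts on two items.
print(item_action(["essential", "essential", "modify"]))      # retain
print(item_action(["not essential", "essential", "modify"]))  # modify (tie)
```

The open remarks and QAS codes would accompany each verdict in practice; this sketch only handles the final disposition.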
Despite the various frameworks available to researchers for developing survey instruments, instrument quality is often not at the desired level. Content validation of the measuring instrument is an essential requirement of every research project, and a rigorous expert validation process can avoid problems at later stages. However, researchers struggle to operationalize the instrument review process: they are challenged with communicating the background information and collecting the feedback. This paper is an attempt to design a standard format for expert validation of survey instruments. Through a literature review, the expectations from expert review are identified: the domain of the construct, relevance, accuracy, inclusion or deletion of items, sensitivity, bias, and structural aspects such as language issues and double-barreled, negative, confusing and leading questions need to be validated by the experts. A format is designed with a covering page containing an invitation to the experts, their role, and an introduction to the research and the instrument. Information about the scale and the list of scale items is provided on the subsequent pages, and the demographic questions are also included for validation. The expert review format provides standard communication and feedback between the researcher and the expert reviewer, which can help in developing rigorous, quality survey instruments.
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Supplementary material associated with this article can be found, in the online version, at doi: 10.1016/j.mex.2021.101326 .
Part I: The Instrument
Instrument is the general term that researchers use for a measurement device (survey, test, questionnaire, etc.). To help distinguish between instrument and instrumentation, consider that the instrument is the device and instrumentation is the course of action (the process of developing, testing, and using the device).
Instruments fall into two broad categories, researcher-completed and subject-completed, distinguished by whether researchers administer them or participants complete them. Researchers choose which type of instrument, or instruments, to use based on the research question. Examples are listed below:
Researcher-completed instruments | Subject-completed instruments |
---|---|
Rating scales | Questionnaires |
Interview schedules/guides | Self-checklists |
Tally sheets | Attitude scales |
Flowcharts | Personality inventories |
Performance checklists | Achievement/aptitude tests |
Time-and-motion logs | Projective devices |
Observation forms | Sociometric devices |
Usability refers to the ease with which an instrument can be administered, interpreted by the participant, and scored/interpreted by the researcher. Example usability problems include:
Validity and reliability concerns (discussed below) will help alleviate usability issues. For now, we can identify five usability considerations:
It is best to use an existing instrument, one that has been developed and tested numerous times, such as can be found in the Mental Measurements Yearbook . We will turn to why next.
Part II: Validity
Validity is the extent to which an instrument measures what it is supposed to measure and performs as it is designed to perform. It is rare, if not impossible, for an instrument to be 100% valid, so validity is generally measured in degrees. As a process, validation involves collecting and analyzing data to assess the accuracy of an instrument. There are numerous statistical tests and measures for assessing the validity of quantitative instruments, which generally involve pilot testing. The remainder of this discussion focuses on external validity and content validity.
External validity is the extent to which the results of a study can be generalized from a sample to a population. Establishing external validity for an instrument, then, follows directly from sampling. Recall that a sample should be an accurate representation of a population, because the total population may not be available. An instrument that is externally valid helps obtain population generalizability, or the degree to which a sample represents the population.
Content validity refers to the appropriateness of the content of an instrument. In other words, do the measures (questions, observation logs, etc.) accurately assess what you want to know? This is particularly important with achievement tests. Consider that a test developer wants to maximize the validity of a unit test for 7th grade mathematics. This would involve taking representative questions from each of the sections of the unit and evaluating them against the desired outcomes.
Part III: Reliability
Reliability can be thought of as consistency: does the instrument consistently measure what it is intended to measure? Reliability cannot be calculated exactly, only estimated; there are four general estimators that you may encounter in reading research:
Relating Reliability and Validity
Reliability is directly related to the validity of the measure. There are several important principles. First, a test can be reliable but not valid. Consider the SAT, used as a predictor of success in college: it is a reliable test (scores are consistent across administrations), though only a moderately valid indicator of success (because success also depends on factors outside the test, such as class attendance, parent-regulated study, and sleeping habits).
Second, validity is more important than reliability. Using the above example, college admissions may consider the SAT a reliable test, but not necessarily a valid measure of other quantities colleges seek, such as leadership capability, altruism, and civic involvement. The combination of these aspects, alongside the SAT, is a more valid measure of the applicant’s potential for graduation, later social involvement, and generosity (alumni giving) toward the alma mater.
Finally, the most useful instrument is both valid and reliable. Proponents of the SAT argue that it is both. It is a moderately reliable predictor of future success and a moderately valid measure of a student’s knowledge in Mathematics, Critical Reading, and Writing.
Part IV: Validity and Reliability in Qualitative Research
Thus far, we have discussed instrumentation as related to mostly quantitative measurement. Establishing validity and reliability in qualitative research can be less precise, though participant/member checks, peer evaluation (another researcher checks the researcher's inferences based on the instrument; Denzin & Lincoln, 2005), and multiple methods (keyword: triangulation) are convincingly used. Some qualitative researchers reject the concept of validity due to the constructivist viewpoint that reality is unique to the individual and cannot be generalized. These researchers argue for a different standard for judging research quality. For a more complete discussion of trustworthiness, see Lincoln and Guba's (1985) chapter.
Research Rundowns was made possible by support from the Dewar College of Education at Valdosta State University .
Published on September 2, 2022 by Kassiani Nikolopoulou . Revised on June 22, 2023.
Criterion validity (or criterion-related validity ) evaluates how accurately a test measures the outcome it was designed to measure. An outcome can be a disease, behavior, or performance. Concurrent validity measures tests and criterion variables in the present, while predictive validity measures those in the future.
To establish criterion validity, you need to compare your test results to criterion variables . Criterion variables are often referred to as a “gold standard” measurement. They comprise other tests that are widely accepted as valid measures of a construct .
For example, a researcher can compare the college entry exam scores of 100 students to their GPA after one semester in college. If the two sets of scores correlate strongly, the college entry exam has criterion validity.
When your test agrees with the criterion variable, it has high criterion validity. However, criterion variables can be difficult to find.
Criterion validity shows you how well a test correlates with an established standard of comparison called a criterion.
A measurement instrument, like a questionnaire , has criterion validity if its results converge with those of some other, accepted instrument, commonly called a “gold standard.”
A gold standard (or criterion variable) measures:
When a gold standard exists, evaluating criterion validity is a straightforward process. For example, you can compare a new questionnaire with an established one. In medical research, you can compare test scores with clinical assessments.
However, in many cases, there is no existing gold standard. If you want to measure pain, for example, there is no objective standard to do so. You must rely on what respondents tell you. In such cases, you can’t achieve criterion validity.
It’s important to keep in mind that criterion validity is only as good as the validity of the gold standard or reference measure. If the reference measure suffers from some sort of research bias , it can impact an otherwise valid measure. In other words, a valid measure tested against a biased gold standard may fail to achieve criterion validity.
Similarly, two biased measures will confirm one another. Thus, criterion validity is no guarantee that a measure is in fact valid. It’s best used in tandem with the other types of validity .
There are two types of criterion validity. Which type you use depends on the time at which the two measures (the criterion and your test) are obtained.
Concurrent validity is demonstrated when a new test correlates with another test that is already considered valid, called the criterion test. A high correlation between the new test and the criterion indicates concurrent validity.
Establishing concurrent validity is particularly important when a new measure is created that claims to be better in some way than its predecessors: more objective, faster, cheaper, etc.
Remember that this form of validity can only be used if another criterion or validated instrument already exists.
Predictive validity is demonstrated when a test can predict future performance. In other words, the test must correlate with a variable that can only be assessed at some point in the future, after the test has been administered.
For predictive criterion validity, researchers often examine how the results of a test predict a relevant future outcome. For example, the results of an IQ test can be used to predict future educational achievement. The outcome is, by design, assessed at some point in the future.
A high correlation provides evidence of predictive validity. It indicates that a test can correctly predict something that you hypothesize it should.
Criterion validity is often used when a researcher wishes to replace an established test with a different version of the same test, particularly one that is more objective, shorter, or cheaper.
For example, suppose an original test is widely accepted as a valid measure of procrastination, but it is very long and takes a lot of time to complete. As a result, many students fill it in without carefully considering their answers.
Criterion validity is assessed in two ways:
The measure to be validated, such as a test, should be correlated with a measure considered to be a well-established indication of the construct under study. This is your criterion variable.
Correlations between the scores on the test and the criterion variable are calculated using a correlation coefficient , such as Pearson’s r . A correlation coefficient expresses the strength of the relationship between two variables in a single value between −1 and +1.
Correlation coefficient values can be interpreted as follows: values close to +1 indicate a strong positive relationship, values close to −1 a strong negative relationship, and values close to 0 little or no relationship.
You can automatically calculate Pearson’s r in Excel , R , SPSS or other statistical software.
Positive correlation between a test and the criterion variable shows that the test is valid. No correlation or a negative correlation indicates that the test and criterion variable do not measure the same concept.
You give the two scales to the same sample of respondents. The extent of agreement between the results of the two scales is expressed through a correlation coefficient.
You calculate the correlation coefficient between the results of the two tests and find out that your scale correlates with the existing scale ( r = 0.80). This value shows that there is a strong positive correlation between the two scales.
Criterion validity and construct validity are both types of measurement validity . In other words, they both show you how accurately a method measures something.
While construct validity is the degree to which a test or other measurement method measures what it claims to measure, criterion validity is the degree to which a test can predictively (in the future) or concurrently (in the present) measure something.
Construct validity is often considered the overarching type of measurement validity . You need to have face validity , content validity , and criterion validity in order to achieve construct validity.
When designing or evaluating a measure, construct validity helps you ensure you’re actually measuring the construct you’re interested in. If you don’t have construct validity, you may inadvertently measure unrelated or distinct constructs and lose precision in your research.
Reliability and validity are both about how well a method measures something: reliability refers to the consistency of a measure, while validity refers to its accuracy.
If you are doing experimental research, you also have to consider the internal and external validity of your experiment.
Face validity is important because it’s a simple first step to measuring the overall validity of a test or technique. It’s a relatively intuitive, quick, and easy way to start checking whether a new measure seems useful at first glance.
Good face validity means that anyone who reviews your measure says that it seems to be measuring what it’s supposed to. With poor face validity, someone reviewing your measure may be left confused about what you’re measuring and why you’re using this method.
Nikolopoulou, K. (2023, June 22). What Is Criterion Validity? | Definition & Examples. Scribbr. Retrieved August 29, 2024, from https://www.scribbr.com/methodology/criterion-validity/
The questionnaire is one of the most widely used tools for collecting data, especially in social science research. The main objective of a questionnaire is to obtain relevant information in the most reliable and valid manner possible. The accuracy and consistency of a survey or questionnaire thus form a significant aspect of research methodology, known respectively as validity and reliability. New researchers are often confused about selecting and conducting the proper type of validity test for their research instrument (questionnaire/survey).
Validity explains how well the collected data covers the actual area of investigation [ 1 ] . Validity basically means “measure what is intended to be measured” [ 2 ] .
Face validity is a subjective judgment on the operationalization of a construct. Face validity is the degree to which a measure appears to be related to a specific construct, in the judgment of non-experts such as test takers and representatives of the legal system. That is, a test has face validity if its content simply looks relevant to the person taking the test. It evaluates the appearance of the questionnaire in terms of feasibility, readability, consistency of style and formatting, and the clarity of the language used.
In other words, face validity refers to researchers’ subjective assessments of the presentation and relevance of the measuring instrument as to whether the items in the instrument appear to be relevant, reasonable, unambiguous and clear [ 3 ] .
In order to examine face validity, a dichotomous scale can be used with the categorical options "Yes" and "No", indicating a favourable and an unfavourable item respectively. A favourable item is one that is objectively structured and can be positively classified under the thematic category. The collected data are then analysed using Cohen's Kappa Index (CKI) to determine the face validity of the instrument; a minimally acceptable Kappa of 0.60 for inter-rater agreement has been recommended [ 4 ] . Unfortunately, face validity is arguably the weakest form of validity, and many would suggest that it is not a form of validity in the strictest sense of the word.
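Cohen's kappa corrects the raw agreement rate between two raters for the agreement expected by chance. A minimal sketch, where the helper function and the Yes/No judgments are invented for illustration:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters: (p_o - p_e) / (1 - p_e)."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # p_o: observed proportion of items on which the raters agree
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # p_e: agreement expected by chance, from each rater's marginals
    ca, cb = Counter(rater_a), Counter(rater_b)
    p_e = sum(ca[c] * cb[c] for c in set(rater_a) | set(rater_b)) / n**2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical "Yes"/"No" face-validity judgments on 10 items
a = ["Y", "Y", "Y", "N", "Y", "Y", "N", "Y", "Y", "N"]
b = ["Y", "Y", "Y", "N", "Y", "N", "N", "Y", "Y", "Y"]
kappa = cohens_kappa(a, b)
print(round(kappa, 2))  # 0.52
```

Here the raters agree on 8 of 10 items (p_o = 0.80), but chance agreement is high (p_e = 0.58), so kappa is only 0.52, below the 0.60 threshold cited above; the disputed items would need revision.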
Content validity is defined as "the degree to which items in an instrument reflect the content universe to which the instrument will be generalized" (Straub et al. [ 5 ] ). In the field of IS, it is highly recommended to apply content validity whenever a new instrument is developed. In general, content validity involves evaluating a new survey instrument to ensure that it includes all the items that are essential to a particular construct domain and eliminates those that are undesirable [ 6 ] . The judgemental approach to establishing content validity involves literature reviews followed by evaluation by expert judges or panels. This approach requires the researchers to be present with the experts in order to facilitate validation; however, it is not always possible to gather many experts on a particular research topic in one location, which limits validation when experts are geographically dispersed (Choudrie and Dwivedi [ 7 ] ). In contrast, a quantitative approach allows researchers to send content validity questionnaires to experts working at different locations, so that distance is not a limitation. To apply content validity, the following steps are followed:
1. An exhaustive literature review is conducted to extract the related items.
2. A content validity survey is generated, in which each item is assessed on a three-point scale (not necessary, useful but not essential, and essential).
3. The survey is sent to experts in the same field as the research.
4. The content validity ratio (CVR) is then calculated for each item using Lawshe's (1975) method [ 8 ] .
5. Items that are not significant at the critical level of Lawshe's method are eliminated.
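Lawshe's ratio for a single item is CVR = (n_e − N/2) / (N/2), where n_e is the number of panelists rating the item "essential" and N is the panel size; it runs from −1 (no one says essential) to +1 (everyone does). A minimal sketch, with an invented helper name and invented panel numbers:

```python
def content_validity_ratio(n_essential: int, n_experts: int) -> float:
    """Lawshe's CVR = (n_e - N/2) / (N/2), ranging from -1 to +1."""
    half = n_experts / 2
    return (n_essential - half) / half

# Hypothetical panel of 10 experts; 8 rate the item "essential"
cvr = content_validity_ratio(8, 10)
print(cvr)  # 0.6
```

The result is then compared against the critical value from Lawshe's table for the given panel size (for a 10-person panel the commonly cited critical CVR is 0.62, so this hypothetical item would fall just short and be a candidate for elimination).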
If a relationship is causal, what are the particular cause-and-effect behaviours or constructs involved in the relationship? Construct validity refers to how well you translated or transformed a concept, idea, or behaviour (a construct) into a functioning and operating reality: the operationalization. Construct validity has two components: convergent and discriminant validity.
Discriminant validity is the extent to which latent variable A discriminates from other latent variables (e.g., B, C, D). It means that a latent variable is able to account for more variance in the observed variables associated with it than (a) measurement error or similar external, unmeasured influences, or (b) other constructs within the conceptual framework. If this is not the case, then the validity of the individual indicators and of the construct is questionable (Fornell and Larcker [ 9 ] ). In brief, discriminant validity (or divergent validity) tests that constructs that should have no relationship do, in fact, have no relationship.
Convergent validity, a parameter often used in sociology, psychology, and other behavioural sciences, refers to the degree to which two measures of constructs that theoretically should be related are, in fact, related. In brief, convergent validity tests that constructs that are expected to be related are, in fact, related.
With the purpose of verifying construct validity (discriminant and convergent validity), a factor analysis can be conducted utilizing principal component analysis (PCA) with the varimax rotation method (Koh and Nam [ 9 ] , Wee and Quazi [ 10 ] ). Items loading above 0.40, the minimum recommended value in research, are considered for further analysis, while items cross-loading above 0.40 should be deleted. The factor analysis results then satisfy the criteria of construct validity, including both discriminant validity (loading of at least 0.40, no cross-loading of items above 0.40) and convergent validity (eigenvalues of at least 1, loading of at least 0.40, items that load on posited constructs) (Straub et al. [ 11 ] ). There are also other methods to test convergent and discriminant validity.
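The loading checks described above can be sketched with NumPy alone. The example below extracts unrotated principal-component loadings from an item correlation matrix, retains components with eigenvalue at least 1, and reports each item's primary and largest cross-loading against the 0.40 rule. The simulated two-factor data, the sample size, and the noise levels are all invented for illustration, and the varimax rotation step used in the cited studies is omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 300

# Simulate 6 items driven by two uncorrelated latent factors:
# items 0-2 load on factor A, items 3-5 on factor B.
f_a, f_b = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack(
    [f_a + 0.5 * rng.normal(size=n) for _ in range(3)]
    + [f_b + 0.8 * rng.normal(size=n) for _ in range(3)]
)

# Principal components of the item correlation matrix
R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]      # sort eigenvalues descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

keep = eigvals >= 1.0                  # Kaiser criterion: eigenvalue >= 1
loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])

print("components retained:", keep.sum())
for i, row in enumerate(np.abs(loadings)):
    primary = row.max()                # loading on the item's main component
    cross = sorted(row)[-2]            # largest loading on any other component
    # rule of thumb: primary >= 0.40, cross-loadings < 0.40
    print(f"item {i}: primary={primary:.2f} cross={cross:.2f}")
```

With this clean simulated structure, two components pass the eigenvalue criterion and every item loads well above 0.40 on exactly one of them; with real survey data, a dedicated factor-analysis routine with rotation would normally be used instead.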
Criterion or concrete validity is the extent to which a measure is related to an outcome. It measures how well one measure predicts an outcome for another measure. A test has this type of validity if it is useful for predicting performance or behavior in another situation (past, present, or future).
Criterion validity is an alternative perspective that de-emphasizes the conceptual meaning or interpretation of test scores. Test users might simply wish to use a test to differentiate between groups of people or to make predictions about future outcomes. For example, a human resources director might need to use a test to help predict which applicants are most likely to perform well as employees. From a very practical standpoint, she focuses on the test’s ability to differentiate good employees from poor employees. If the test does this well, then the test is “valid” enough for her purposes. From the traditional three-faceted view of validity, criterion validity refers to the degree to which test scores can predict specific criterion variables. The key to validity is the empirical association between test scores and scores on the relevant criterion variable, such as “job performance.”
Messick [ 12 ] suggests that "even for purposes of applied decision making, reliance on criterion validity or content coverage is not enough. The meaning of the measure, and hence its construct validity, must always be pursued – not only to support test interpretation but also to justify test use". The types of criterion validity are concurrent validity, predictive validity, and postdictive validity.
Reliability concerns the extent to which a measurement of a phenomenon provides stable and consistent results (Carmines and Zeller [ 13 ] ). Reliability is also concerned with repeatability: for example, a scale or test is said to be reliable if repeated measurements made by it under constant conditions give the same result (Moser and Kalton [ 14 ] ).
Testing for reliability is important as it refers to the consistency across the parts of a measuring instrument (Huck [ 15 ] ). A scale is said to have high internal consistency reliability if the items of a scale “hang together” and measure the same construct (Huck [ 16 ] Robinson [ 17 ] ). The most commonly used internal consistency measure is the Cronbach Alpha coefficient. It is viewed as the most appropriate measure of reliability when making use of Likert scales (Whitley [ 18 ] , Robinson [ 19 ] ). No absolute rules exist for internal consistencies, however most agree on a minimum internal consistency coefficient of .70 (Whitley [ 20 ] , Robinson [ 21 ] ).
For an exploratory or pilot study, it is suggested that reliability should be equal to or above 0.60 (Straub et al. [ 22 ] ). Hinton et al. [ 23 ] have suggested four cut-off points for reliability: excellent reliability (0.90 and above), high reliability (0.70–0.90), moderate reliability (0.50–0.70) and low reliability (below 0.50). Although reliability is important for a study, it is not sufficient unless combined with validity: a test can be reliable without being valid, but it cannot be valid unless it is also reliable [ 25 ] .
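Cronbach's alpha, the internal-consistency measure discussed above, is simple to compute by hand: alpha = k/(k−1) × (1 − Σ item variances / variance of total scores), for k items. A minimal NumPy sketch, where the helper name and the Likert responses are invented for illustration:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha: k/(k-1) * (1 - sum(item variances) / total variance).

    items: respondents x items matrix of scores.
    """
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()      # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)        # variance of total scores
    return k / (k - 1) * (1 - item_vars / total_var)

# Hypothetical 5-point Likert responses: 5 respondents, 4 items
X = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
])
alpha = cronbach_alpha(X)
print(round(alpha, 2))
```

For these invented responses alpha comes out above 0.90, which by Hinton et al.'s cut-offs would count as excellent reliability; real item sets are rarely this internally consistent.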
Relationship between Playing Musical Instruments and Subjective Well-Being: Enjoyment of Playing Instruments Scale
Abbreviation | Definition
---|---
EPIS | Enjoyment of Playing Instruments Scale
UMI | Use of Music Inventory
MUSE | Music Use Questionnaire
MRS | Music Receptivity Scale
SWLS | The Satisfaction with Life Scale
EFT | Emotional Frequency Test
SHS | Subjective Happiness Scale
CVI | Content Validity Index
IRB | Institutional Review Board
SPSS | Statistical Package for Social Sciences
AMOS | Analysis of Moment Structure
EFA | Exploratory Factor Analysis
CFA | Confirmatory Factor Analyses
TLI | Tucker−Lewis Index
CFI | Comparative Fit Index
RMSEA | Root Mean Square Error of Approximation
SRMR | Standardized Root Mean Square Residual
CR | Composite Reliability
AVE | Average Variance Extracted
ES | Enjoyment of Singing
SRWs | Standardized Regression Weights
df | Degree of Freedom
CI | Confidence Interval
No. | Item | Factor 1 | Factor 2 | Factor 3 | h²
---|---|---|---|---|---
1. | I like learning to play challenging music. | 0.553 | | | 0.401
2. | Learning to play musical instruments together with others can make me feel close to them. | 0.407 | | | 0.371
3. | I like playing musical instruments as a group with other people. | 0.822 | | | 0.632
4. | Playing musical instruments together improves family ties. | 0.450 | | | 0.311
5. | It is fun to play musical instruments with other people. | 0.822 | | | 0.620
6. | Playing musical instruments with others makes us more united. | 0.516 | | | 0.473
7. | I am looking forward to practicing playing musical instruments with other people. | 0.820 | | | 0.653
8. | I feel good when I realize that my musical instrument playing skills have improved. | | 0.599 | | 0.445
9. | I feel good when I play an instrument in front of people and receive applause. | | 0.696 | | 0.451
10. | I feel a sense of accomplishment when I play an instrument. | | 0.419 | | 0.425
11. | My sense of accomplishment is great when I play an instrument well in front of others. | | 0.656 | | 0.454
12. | I feel a sense of accomplishment when I master playing a piece of music. | | 0.601 | | 0.420
13. | Tackling difficult music and playing it gives me a great sense of achievement. | | 0.720 | | 0.518
14. | Playing an instrument makes me feel different. | | | 0.600 | 0.405
15. | When I play an instrument, the thoughts that bothered me disappear. | | | 0.435 | 0.383
16. | Playing musical instruments can make my partner admire me even more. | | | 0.585 | 0.410
| Eigenvalues | 6.23 | 1.58 | 1.17 |
| % Variance | 38.92 | 9.85 | 7.31 | 56.08
Scale | Learning/Social Bonds | Achievement/Pride | Cognitive Refreshment and Stimulation | EPIS |
---|---|---|---|---|
Cognitive and emotional regulation | 0.69 *** | 0.61 ** | 0.67 *** | 0.71 *** |
Engaged production | 0.76 *** | 0.68 *** | 0.70 *** | 0.78 *** |
Social connection | 0.71 *** | 0.63 *** | 0.63 *** | 0.72 *** |
Physical exercise | 0.68 *** | 0.65 *** | 0.62 *** | 0.71 *** |
Dance | 0.57 *** | 0.44 *** | 0.53 *** | 0.55 *** |
MUSE | 0.80 *** | 0.71 *** | 0.74 *** | 0.81 *** |
Affect | 0.66 *** | 0.70 *** | 0.64 *** | 0.73 *** |
Attention | −0.15 * | −0.19 ** | −0.17 ** | −0.18 ** |
MRS | 0.47 *** | 0.48 *** | 0.44 *** | 0.50 *** |
Emotional use of music | 0.66 *** | 0.65 *** | 0.65 *** | 0.71 *** |
Rational/Cognitive use of music | 0.50 *** | 0.29 *** | 0.51 *** | 0.46 *** |
Background use of music | 0.57 *** | 0.45 *** | 0.56 *** | 0.57 *** |
UMI | 0.64 *** | 0.49 *** | 0.64 *** | 0.63 *** |
Skewness | −0.89 | −1.35 | −0.78 | −1.19 |
Kurtosis | 0.61 | 1.53 | 0.28 | 1.34 |
Variables | Learning/Social Bonds | Achievement/Pride | Cognitive Refreshment/Stimulation | EPIS |
---|---|---|---|---|
Life satisfaction | 0.57 *** | 0.52 *** | 0.55 *** | 0.59 *** |
Positive emotions | 0.54 *** | 0.52 *** | 0.53 *** | 0.57 *** |
Negative emotions | 0.25 *** | 0.19 *** | 0.29 *** | 0.25 *** |
Subjective well-being | 0.47 *** | 0.46 *** | 0.43 *** | 0.49 *** |
Subjective happiness | 0.38 *** | 0.39 *** | 0.34 *** | 0.40 *** |
Zhang, Q.; Park, A.; Suh, K.-H. Relationship between Playing Musical Instruments and Subjective Well-Being: Enjoyment of Playing Instruments Scale. Behav. Sci. 2024 , 14 , 744. https://doi.org/10.3390/bs14090744
IMAGES
VIDEO
COMMENTS
The 4 Types of Validity in Research | Definitions & Examples. Published on September 6, 2019 by Fiona Middleton.Revised on June 22, 2023. Validity tells you how accurately a method measures something. If a method measures what it claims to measure, and the results closely correspond to real-world values, then it can be considered valid.
The validity of the instruments was evaluated through various methods, including the examination of reports, literature reviews, consultations with the research tutor and co-tutors, and engagement ...
Validity refers to the degree to which an instrument accurately measures what it intends to measure. Three common types of validity for researchers and evaluators to consider are content, construct, and criterion validities. Content validity indicates the extent to which items adequately measure or represent the content of the property or trait ...
Ensuring validity in research involves several strategies: Clear Operational Definitions: Define variables clearly and precisely. Use of Reliable Instruments: Employ measurement tools that have been tested for reliability. Pilot Testing: Conduct preliminary studies to refine the research design and instruments.
Reliability is about the consistency of a measure, and validity is about the accuracy of a measure.opt. It's important to consider reliability and validity when you are creating your research design, planning your methods, and writing up your results, especially in quantitative research. Failing to do so can lead to several types of research ...
What is validity? 1. Validity in research refers to how accurately a study answers the study question or the strength of the study conclusions. For outcome measures such as surveys or tests, validity refers to the accuracy of measurement. Here validity refers to how well the assessment tool actually measures the underlying outcome of interest.
An essential part of validity is choosing the right research instrument or method for accurate results. Consider the thermometer that is reliable but still produces inaccurate results. You're unlikely to achieve research validity without activities like calibration, content, and construct validity.
Validity. Validity is defined as the extent to which a concept is accurately measured in a quantitative study. For example, a survey designed to explore depression but which actually measures anxiety would not be considered valid. The second measure of quality in a quantitative study is reliability, or the accuracy of an instrument.In other words, the extent to which a research instrument ...
In simple terms, validity (also called "construct validity") is all about whether a research instrument accurately measures what it's supposed to measure. For example, let's say you have a set of Likert scales that are supposed to quantify someone's level of overall job satisfaction. If this set of scales focused purely on only one ...
In this vein, there are many different types of validity and ways of thinking about it. Let's take a look at several of the more common types. Each kind is a line of evidence that can help support or refute a test's overall validity. In this post, learn about face, content, criterion, discriminant, concurrent, predictive, and construct ...
17.4.1 Validity of instruments. 17.4.1. Validity of instruments. Validity has to do with whether the instrument is measuring what it is intended to measure. Empirical evidence that PROs measure the domains of interest allows strong inferences regarding validity. To provide such evidence, investigators have borrowed validation strategies from ...
Reliability is consistency across time (test-retest reliability), across items (internal consistency), and across researchers (interrater reliability). Validity is the extent to which the scores actually represent the variable they are intended to. Validity is a judgment based on various types of evidence.
New researchers are often confused about which type of validity to select and how to conduct the corresponding test for their research instrument (questionnaire or survey); review articles therefore explore and describe the various forms of validity and reliability testing. The Rasch model is also valuable in instrument validation, because it can identify the constructs underlying valid items and provide a clear definition of measurable constructs that is consistent with theoretical expectations. More generally, an instrument is valid only to the extent that its scores permit appropriate inferences to be made about (1) a specific group of people for (2) specific purposes; an instrument that is a valid measure of third graders' math skills is probably not a valid measure of the same skills for other groups or other purposes.
Content validity evaluates how well an instrument (such as a test) covers all relevant parts of the construct it aims to measure, where a construct is a theoretical concept rather than a directly observable quantity. For example, psychological research often involves developing screening tools for clinical diagnoses; content validity asks whether the tool's items together cover every relevant facet of the diagnosis.
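Expert review of content validity is often quantified with a content validity index (CVI). The sketch below uses hypothetical ratings from five experts on a 4-point relevance scale (1 = not relevant, 4 = highly relevant); the item-level CVI (I-CVI) is the proportion of experts rating an item 3 or 4:

```python
# Hypothetical ratings: rows = items, columns = five expert raters
ratings = [
    [4, 4, 3, 4, 3],   # item 1: rated relevant by all experts
    [3, 2, 4, 3, 4],   # item 2: one expert rates it not relevant
    [2, 2, 3, 1, 2],   # item 3: mostly rated not relevant
]

def i_cvi(item_ratings):
    """Proportion of experts rating the item 3 or 4 (relevant)."""
    relevant = sum(1 for r in item_ratings if r >= 3)
    return relevant / len(item_ratings)

cvis = [i_cvi(item) for item in ratings]
# Scale-level CVI (S-CVI/Ave): mean of the item-level CVIs
s_cvi = sum(cvis) / len(cvis)
```

One common guideline is to flag items whose I-CVI falls below roughly 0.78 for revision or deletion; item 3 above would be flagged.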
Validity in research thus refers to the extent to which a study accurately measures what it intends to measure, ensuring that the results are truly representative of the phenomena under investigation. Note that while a test can be reliable without being valid, it cannot be valid without being reliable.
Validation of a survey instrument is an important activity in the research process. Face validity and content validity, though qualitative methods, are essential steps in establishing how far a survey instrument can measure what it is intended to measure. These techniques are used both in scale development and when a questionnaire is adopted or adapted for a new study.
Validity basically means "measure what is intended to be measured" (Field, 2005). In this paper, the main types of validity, namely face validity, content validity, construct validity, and criterion validity, together with reliability, are discussed. Figure 1 shows the subtypes of the various forms of validity tests explored and described in this article.
Validity is the extent to which an instrument measures what it is supposed to measure and performs as it is designed to perform. It is rare, if not impossible, for an instrument to be 100% valid, so validity is generally measured in degrees. As a process, validation involves collecting and analyzing data to assess the accuracy of an instrument.
Criterion validity (or criterion-related validity) evaluates how accurately a test measures the outcome it was designed to measure; an outcome can be a disease, a behavior, or a performance. Concurrent validity compares the test against a criterion variable measured in the present, while predictive validity compares it against a criterion observed in the future.
Criterion validity thus has several subtypes: concurrent validity, predictive validity, and postdictive validity. Reliability, in turn, concerns the extent to which a measurement of a phenomenon provides stable and consistent results (Carmines and Zeller [13]); it is also concerned with repeatability.
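Concurrent validity, for instance, is typically assessed by correlating scores from the new instrument with an established criterion measured at the same time. The sketch below is illustrative only; both score vectors and the "gold standard" instrument are hypothetical:

```python
import numpy as np

# Hypothetical concurrent-validity check: scores from a new short screener
# and from an established instrument, administered to the same 8 respondents
# at the same time.
new_scale     = np.array([12, 18, 7, 22, 15, 9, 20, 14])
gold_standard = np.array([14, 20, 9, 25, 16, 8, 23, 15])

# Pearson correlation between the two sets of scores; a strong positive
# correlation supports concurrent (criterion) validity.
r = np.corrcoef(new_scale, gold_standard)[0, 1]
```

Predictive validity is computed the same way, except that the criterion scores are collected at a later point in time than the instrument scores.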