Skip Nav

# What is Reliability?

❶Manual for the Minnesota Multiphasic Personality Inventory. External validity - the results can be generalized beyond the immediate study.

## Inter-Rater or Inter-Observer Reliability

The correlation between the two parallel forms is the estimate of reliability. One major problem with this approach is that you have to be able to generate lots of items that reflect the same construct. This is often no easy feat. Furthermore, this approach makes the assumption that the randomly divided halves are parallel or equivalent. Even by chance this will sometimes not be the case.

The parallel forms approach is very similar to the split-half reliability described below. The major difference is that parallel forms are constructed so that the two forms can be used independent of each other and considered equivalent measures.

For instance, we might be concerned about a testing threat to internal validity. If we use Form A for the pretest and Form B for the posttest, we minimize that problem. With split-half reliability we have an instrument that we wish to use as a single measurement instrument and only develop randomly split halves for purposes of estimating reliability.

In internal consistency reliability estimation we use our single measurement instrument administered to a group of people on one occasion to estimate reliability. In effect we judge the reliability of the instrument by estimating how well the items that reflect the same construct yield similar results.

We are looking at how consistent the results are for different items for the same construct within the measure. There are a wide variety of internal consistency measures that can be used. The average inter-item correlation uses all of the items on our instrument that are designed to measure the same construct. We first compute the correlation between each pair of items, as illustrated in the figure. For example, if we have six items we will have 15 different item pairings i.

The average interitem correlation is simply the average or mean of all these correlations. In the example, we find an average inter-item correlation of. This approach also uses the inter-item correlations. In addition, we compute a total score for the six items and use that as a seventh variable in the analysis. The figure shows the six item-to-total correlations at the bottom of the correlation matrix.

In split-half reliability we randomly divide all items that purport to measure the same construct into two sets. We administer the entire instrument to a sample of people and calculate the total score for each randomly divided half.

In the example it is. Imagine that we compute one split-half reliability and then randomly divide the items into another set of split halves and recompute, and keep doing this until we have computed all possible split half estimates of reliability.

Cronbach's Alpha is mathematically equivalent to the average of all possible split-half estimates, although that's not how we compute it. Notice that when I say we compute all possible split-half estimates, I don't mean that each time we go an measure a new sample!

That would take forever. Instead, we calculate all split-half estimates from the same sample. Because we measured all of our sample on each of the six items, all we have to do is have the computer analysis do the random subsets of items and compute the resulting correlations.

The figure shows several of the split-half estimates for our six item example and lists them as SH with a subscript.

A typical assessment would involve giving participants the same test on two separate occasions. If the same or similar results are obtained then external reliability is established. The disadvantages of the test-retest method are that it takes a long time for results to be obtained. The timing of the test is important; if the duration is to brief then participants may recall information from the first test which could bias the results.

Alternatively, if the duration is too long it is feasible that the participants could have changed in some important way which could also bias the results.

This refers to the degree to which different raters give consistent estimates of the same behavior. Inter-rater reliability can be used for interviews. Note, it can also be called inter-observer reliability when referring to observational research. Here researcher when observe the same behavior independently to avoided bias and compare their data. If the data is similar then it is reliable. In this scenario it would be unlikely they would record aggressive behavior the same and the data would be unreliable.

However, if they were to operationalize the behavior category of aggression this would be more objective and make it easier to identify when a specific behavior occurs. Thus researchers could simply count how many times children push each other over a certain duration of time. Manual for the beck depression inventory The Psychological Corporation. San Antonio , TX.

Manual for the Minnesota Multiphasic Personality Inventory. Saul McLeod , published The term reliability in psychological research refers to the consistency of a research study or measuring test. Criterion-Related Validity is used to predict future or current performance - it correlates test results with another criterion of interest.

If a physics program designed a measure to assess cumulative student learning throughout the major. The new measure could be correlated with a standardized measure of ability in this discipline, such as an ETS field test or the GRE subject test.

The higher the correlation between the established measure and new measure, the more faith stakeholders can have in the new assessment tool. If the measure can provide information that students are lacking knowledge in a certain area, for instance the Civil Rights Movement, then that assessment tool is providing meaningful information that can be used to improve the course or program requirements.

Sampling Validity similar to content validity ensures that the measure covers the broad range of areas within the concept under study.

Not everything can be covered, so items need to be sampled from all of the domains. When designing an assessment of learning in the theatre department, it would not be sufficient to only cover issues related to acting. Other areas of theatre such as lighting, sound, functions of stage managers should all be included.

The assessment should reflect the content area in its entirety. National Council on Measurement in Education. Standards for educational and psychological testing.

## Main Topics

Reliability refers to whether or not you get the same answer by using an instrument to measure something more than once. In simple terms, research reliability is the degree to which research method produces stable and consistent results. A specific measure is considered to be reliable if its.

### Privacy FAQs

Reliability has to do with the quality of measurement. In its everyday sense, reliability is the "consistency" or "repeatability" of your measures. Before we can define reliability precisely we have to .