**Chapter 4 Test Questions**


- In Classical Test Theory, the X represents _______ and the T represents ________.
- measurement error; observed score
- observed score; stable test-taker characteristics X
- observed score; measurement error
- stable test-taker characteristics; observed score

- _________ refers to the consistency or stability of test scores.
- Measurement error
- Reliability X
- Variance
- Validity

- Classical Test Theory focuses our attention on ________ measurement error.
- random X
- variable
- standard
- systematic

- The mean of error scores in a population is equal to ________.
- -1
- 0 X
- 1
- 10

- There is (a) _______ relationship between an individual's level on a construct and the amount of measurement error impacting their observed score.
- no X
- weak
- moderate
- strong

- ________ is/are usually considered the largest source of error in test scores.
- Administrative errors
- Clerical errors
- Content sampling X
- Time sampling

- On a test comprised of constructed response items, it is important to consider:
- administrative errors.
- clerical errors.
- inter-rater differences. X
- time sampling.

- The reliability coefficient ($r_{xx}$) equals true score variance ($s^{2}_{T}$) divided by the:
- observed score.
- measurement error.
- variance due to measurement error.
- variance of the total test. X

- What conclusion could be accurately drawn from a reliability coefficient of .80?
- 18% of test score variance is due to true score variance.
- 20% of test score variance is due to true score variance.
- 64% of test score variance is due to true score variance.
- 80% of test score variance is due to true score variance. X

- If 6% of a test score's observed variance is due to measurement error, the reliability coefficient of the test would be:
- .06.
- .36.
- .60.
- .94. X
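
The keyed answer follows from the Classical Test Theory decomposition of observed variance into true score variance plus error variance. The sketch below is illustrative only; the function name `reliability_from_variances` and the variance values (chosen so that error is 6% of the total) are assumptions, not part of the test bank.

```python
def reliability_from_variances(true_var: float, error_var: float) -> float:
    """Reliability coefficient: true score variance / total observed variance."""
    total_var = true_var + error_var  # CTT: observed variance = true + error
    if total_var <= 0:
        raise ValueError("total variance must be positive")
    return true_var / total_var


# Hypothetical variances: 6 of 100 units are error, so 94 are true score variance.
r_xx = reliability_from_variances(true_var=94.0, error_var=6.0)
print(round(r_xx, 2))  # 0.94
```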

- Test-retest reliability is primarily sensitive to measurement error due to:
- content sampling.
- content sampling and temporal instability.
- factor invariance.
- temporal instability. X

- Alternate form reliability based on simultaneous administration is primarily sensitive to measurement error due to:
- content sampling. X
- content sampling and temporal instability.
- practice effects.
- temporal instability.

- Alternate form reliability based on delayed administration is sensitive to measurement error due to:
- content sampling.
- content sampling and temporal instability. X
- practice effects.
- temporal instability.

- As a general rule, _________ tests produce more reliable scores than ______ tests.
- brief; lengthy
- intensive; extensive
- longer; shorter X
- shorter; longer

- The uncorrected split-half reliability coefficient __________ the reliability of the full test score.
- accurately reflects
- overestimates
- underestimates X
- provides an indeterminate reflection of

- Split-half reliability is primarily sensitive to measurement error due to:
- content sampling. X
- content sampling and temporal instability.
- practice effects.
- temporal instability.

- __________ is sensitive to the heterogeneity of the test content.
- Alternate-form reliability with delayed administration
- Coefficient Alpha X
- Split-half reliability
- Test-retest reliability

- _________ is applicable when test items are scored dichotomously while _________ can be used when test items produce multiple values.
- Coefficient Alpha; KR 20
- KR 20; Coefficient Alpha X
- Split-half reliability; test-retest reliability
- Test-retest reliability; split-half reliability
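
The relationship between the two coefficients can be checked numerically: for items scored 0/1, KR 20 and coefficient alpha give the same value when population variances are used throughout. This is a minimal sketch under that assumption; the function names and the five-examinee data set are hypothetical.

```python
from statistics import pvariance


def coefficient_alpha(rows):
    """Coefficient alpha; rows is a list of examinees, each a list of item scores."""
    k = len(rows[0])
    item_vars = [pvariance([row[i] for row in rows]) for i in range(k)]
    total_var = pvariance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def kr20(rows):
    """KR 20 for dichotomous (0/1) items, using p * q as each item's variance."""
    n, k = len(rows), len(rows[0])
    p = [sum(row[i] for row in rows) / n for i in range(k)]  # proportion correct
    pq_sum = sum(pi * (1 - pi) for pi in p)
    total_var = pvariance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - pq_sum / total_var)


# Hypothetical responses: 5 examinees by 4 dichotomous items.
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(data), 3), round(coefficient_alpha(data), 3))
```

Only `coefficient_alpha` remains applicable when items produce more than two score values, because `kr20` relies on the p * q shortcut for item variance.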

- On a classroom essay test, __________ is a major concern.
- inter-rater reliability X
- internal consistency reliability
- split-half reliability
- test-retest reliability

- The reliability of composite scores is generally ________ the reliability of the individual scores that contributed to the composite.
- equal to
- higher than X
- lower than

- Which of the following is a measure of inter-rater agreement that takes into consideration the degree of agreement expected by chance?
- KR 20
- Strong-Campbell Beta
- Cronbach's coefficient alpha
- Cohen's kappa X
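
The chance correction in kappa can be sketched as follows: observed agreement is compared with the agreement expected if the two raters assigned categories independently at their own base rates. The function name `cohens_kappa` and the ratings are hypothetical and serve only as an illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical ratings to the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must rate the same non-empty set of cases")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the raters' marginal proportions, summed over categories.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical pass/fail ratings of ten essays by two raters.
a = ["P", "P", "P", "P", "P", "P", "F", "F", "F", "F"]
b = ["P", "P", "P", "P", "P", "F", "P", "F", "F", "F"]
kappa = cohens_kappa(a, b)
print(round(kappa, 3))
```

Here the raters agree on 80% of cases, but 52% agreement would be expected by chance, so kappa is noticeably lower than the raw agreement rate.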

- Which of the following methods is necessary when estimating the reliability of a test score intended to predict performance at a future time?
- Alternate form reliability with simultaneous administration
- Coefficient alpha
- KR 20
- Test-retest reliability X

- Which reliability estimate would be preferred for a score derived from a test with heterogeneous content?
- Coefficient Alpha
- KR 20
- Split-Half Coefficient X

- Which method of estimating reliability would be appropriate for scores from a speed test?
- Coefficient Alpha
- Kuder Richardson 20
- Test-retest reliability X
- Split-half reliability

- Reliability coefficients based on a homogeneous sample would likely be ________ coefficients based on a heterogeneous sample.
- equal to
- larger than
- smaller than X

- As the reliability of a test score _______, the standard error of measurement _______.
- decreases; increases X
- decreases; decreases
- increases; remains the same
- decreases; remains the same
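
The inverse relationship can be seen from the usual formula for the standard error of measurement, SEM = SD × sqrt(1 − r_xx). The sketch below assumes that formula; the function name and the standard deviation of 15 are illustrative choices.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - r_xx); a smaller SEM means less measurement error."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# Hypothetical scale with SD = 15: the SEM shrinks as reliability rises.
for r in (0.70, 0.80, 0.90, 0.95):
    print(r, round(standard_error_of_measurement(15.0, r), 2))
```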

- Sally's obtained score on a statistics exam is 75. The SEM is 2. With what confidence interval would we capture her true score 68% of the time?
- 71 to 79
- 73 to 77 X
- 69 to 81
- 70 to 80
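
The interval can be worked out as the obtained score plus or minus a multiple of the SEM. This sketch assumes normally distributed measurement error, so that ±1 SEM corresponds to roughly 68% confidence and ±1.96 SEM to roughly 95%; the function name `confidence_interval` is hypothetical.

```python
def confidence_interval(observed: float, sem: float, z: float = 1.0):
    """Return (lower, upper) bounds: observed score +/- z * SEM."""
    margin = z * sem
    return observed - margin, observed + margin


# Obtained score of 75 with SEM = 2.
print(confidence_interval(75, 2))        # about 68% confidence (z = 1)
print(confidence_interval(75, 2, 1.96))  # about 95% confidence
```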

- Generalizability Theory typically uses which statistical procedure to estimate reliability?
- Analysis of Variance (ANOVA) X
- Correlation Coefficient
- Linear Regression
- Multivariate Analysis of Variance (MANOVA)

- The average of all possible split-half coefficients is known as:
- Coefficient alpha. X
- correlation coefficient.
- alternate form reliability.
- Spearman-Brown coefficient.

- A limitation of the test-retest approach to estimating reliability is the influence of:
- administration effects.
- content effects.
- practice effects. X
- temporal effects.

- The Spearman-Brown formula is used to:
- correct a split-half reliability coefficient. X
- estimate construct reliability.
- perform a curvilinear transformation of the scores.
- perform a linear transformation of the scores.
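
A brief numerical illustration of the correction: the correlation between two half-tests describes a test half as long as the full test, so the Spearman-Brown formula projects it to full length. The function name, the `length_factor` parameter, and the half-test correlation of .60 are assumptions made for the example.

```python
def spearman_brown(r_half: float, length_factor: float = 2.0) -> float:
    """Projected reliability when test length is multiplied by length_factor."""
    return (length_factor * r_half) / (1.0 + (length_factor - 1.0) * r_half)


# Hypothetical correlation of .60 between the two halves of a test.
r_full = spearman_brown(0.60)
print(round(r_full, 2))  # 0.75
```

The corrected value exceeds the half-test correlation, which is why the uncorrected split-half coefficient underestimates full-test reliability.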

- As reliability increases, confidence intervals:
- decrease. X
- do not change.

- _________ is a result of transient events in the test taker (fatigue, illness, etc.) and the testing environment (temperature, noise level, etc.).
- Administration error
- Content sampling error
- Temporal instability X
- Systematic measurement error

- The reliability of difference scores is typically _______ the reliability of the individual scores.
- equal to
- higher than
- lower than X
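
The drop in reliability can be illustrated with the standard formula for a difference score between two measures with equal variances: r_DD = [0.5(r_AA + r_BB) − r_AB] / (1 − r_AB). The sketch assumes that formula; the function name and the input values are hypothetical.

```python
def difference_score_reliability(r_aa: float, r_bb: float, r_ab: float) -> float:
    """Reliability of A - B given each score's reliability and their correlation.

    Assumes the two scores have equal variances.
    """
    if r_ab >= 1.0:
        raise ValueError("correlation between the scores must be below 1")
    return (0.5 * (r_aa + r_bb) - r_ab) / (1.0 - r_ab)


# Hypothetical tests with reliabilities .90 and .80 that correlate .50.
r_dd = difference_score_reliability(0.90, 0.80, 0.50)
print(round(r_dd, 2))  # 0.7
```

The more highly the two scores correlate, the less reliable their difference becomes.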

- The reliability index reflects the correlation between:
- true scores and observed scores. X
- true scores and measurement error.
- observed scores and measurement error.
- true scores and true scores.

- If the reliability coefficient equals .81, the reliability index equals:
- .19.
- .81.
- .90. X
- 0.
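
The keyed answer reflects that the reliability index is the square root of the reliability coefficient. A minimal check, with a hypothetical function name:

```python
import math


def reliability_index(reliability_coefficient: float) -> float:
    """Correlation between true and observed scores: sqrt of r_xx."""
    if not 0.0 <= reliability_coefficient <= 1.0:
        raise ValueError("reliability coefficient must be between 0 and 1")
    return math.sqrt(reliability_coefficient)


print(round(reliability_index(0.81), 2))  # 0.9
```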

- What happens to the size of confidence intervals as reliability coefficients increase?
- They decrease. X
- They increase.
- They remain the same.
- It is indeterminate; it depends on the construct being measured.

- The ____________ is an index of the amount of measurement error in test scores and is used in calculating confidence intervals.
- Standard Error of Estimate
- Standard Error of Measurement X
- Spearman-Brown Coefficient
- Skew Coefficient

- ______________________ is a useful index when comparing the reliability of the scores produced by different tests, but when the focus is on interpreting the test scores of individuals, the ________________________ is more practical.
- Reliability Coefficient; Standard Error of Measurement X
- Standard Error of Measurement; Reliability Coefficient
- Standard Error of Estimate; Coefficient Alpha
- Standard Error of Estimate; Reliability Coefficient

- In Item Response Theory, information on the reliability of test scores is typically reported as a:
- Test Information Function X
- Standard Error of Estimate
- Skew Coefficient
- Coefficient of Determination
- Coefficient of Non-determination

**Chapter 6 Test Questions**


- An oral examination, scored by examiners who use a manual and rubric, is an example of _________ scoring.
- objective
- subjective X
- projective
- validity

- A fill-in-the-blank question is a ___________ item.
- constructed-response X
- selected-response
- typical-response
- objective-response

- Which of the following formats is a selected-response format?
- Multiple-choice
- True-false
- Matching
- All of the above X

- How many distracters is it recommended that one provide for multiple choice items?
- 2
- 2 to 6
- 3 to 5 X
- 4

- When writing true-false items, one should include approximately _________ true and ________ false.
- 30%; 70%
- 50%; 50% X
- 70%; 30%

- When developing matching items, one should keep the lists as ___________ as possible.
- heterogeneous
- homogeneous X
- sequential
- simultaneous

- What is a strength of selected-response items?
- Selected-response items are easy and quick to write.
- Selected-response items can be used to assess all constructs.
- Selected-response items can be objectively scored. X

- ___________ require examinees to complete a process or produce a project in a real-life simulation.
- Projective tests
- Performance assessments X
- Selected-response tests
- Multi-trait/multi-method tasks

- A strength of constructed-response items is that they:
- eliminate random guessing. X
- produce highly reliable scores.
- can be quickly completed by examinees.
- eliminate feigning.

- You are creating a test designed to assess a flute player's ability. Which format would assess this domain most effectively?
- Performance assessment X
- Matching
- Selected-response
- True-false

- General guidelines for writing test items include:
- the frequent use of negative statements.
- the use of complex, compound sentences to challenge the examinees.
- the avoidance of inadvertent cues to the answers. X
- arranging items in a non-systematic manner.

- When developing maximum performance tests, it is best to arrange the items:
- from easiest to hardest. X
- from hardest to easiest.
- in the order the information was taught.

- Including more selected-response and other time-efficient items can:
- enhance the sampling of the content domain and increase reliability. X
- enhance the sampling of the content domain and decrease reliability.
- introduce construct irrelevant variance.
- decrease validity.

- In order to determine the number of items to include on a test, one should consider the:
- age of examinees.
- purpose of test.
- types of items.
- type of test.
- All of the above X

- __________ are reported as the most popular selected-response items.
- Essays
- Matching
- Multiple-choice X
- True-false

- When writing multiple-choice items, one advantage to the ______________ is that it may present the problem in a more concise manner.
- direct-question format X
- incomplete sentence format
- indirect question format

- What would be the recommended multiple-choice format for the stem: What does 1010 equal?
- Best answer
- Correct answer X
- Closed negative
- Double negative

- Which multiple-choice answer format requires the examinee to make subtle distinctions among distracters?
- Best answer X
- Correct answer
- Closed negative
- Double negative

- Which of the following is NOT a guideline for developing true-false items?
- Include more than one idea in the statement. X
- Avoid using specific determiners such as *all*, *none*, or *never*.
- Ensure that true and false statements are approximately the same length.
- Avoid using moderate determiners such as *sometimes* and *usually*.

- What is a strength of true-false items?
- They can measure very complex objectives.
- Examinees can answer many items in a short period of time. X
- They are not vulnerable to guessing.

- _________ scoring rubrics identify different aspects or dimensions, each of which is scored separately.
- Analytic X
- Holistic
- Sequential
- Simultaneous

- With a _______ rubric, a single grade is assigned based on the overall quality of the response.
- analytic
- holistic X
- reliable
- structured

- One way to increase the reliability of short-answer items is to:
- give partial credit.
- provide a word bank.
- use the incomplete sentence format with multiple blanks.
- use a scoring rubric. X

- What item format is commonly used in both maximum performance tests and typical response tests?
- Constructed-response
- Multiple-choice
- Rating scales
- True-false X

- For typical-response tests, which format provides more information per item and thus can increase the range and reliability of scores?
- Constructed-response
- Frequency ratings X
- True-false
- Matching

- Which format is the most popular when assessing attitudes?
- Constructed-response
- Forced choice
- Frequency scales
- Likert items X
- True-false

- What is a guideline for developing typical response items?
- Include more than one construct per item to increase variability.
- Include items that are worded in both positive and negative directions. X
- Include more than 5 options on rating scales in order to increase reliability.
- Include statements that most people will endorse in a specific manner.

- Examinees tend to overuse the neutral response when Likert items use ________ and may omit items when Likert items use __________.
- an odd number of options; an even number of options X
- an even number of options; an odd number of options
- homogeneous options; heterogeneous options
- heterogeneous options; homogeneous options

- Which of the following items are difficult to score in a reliable manner and subject to feigning?
- Constructed-response X
- True-false
- Selected-response
- Forced choice

- Guttman scales provide which scale of measurement?
- Nominal
- Ordinal X
- Interval
- Ratio

- Which assessment would best use a Thurstone scale?
- Constructed-response test
- Maximum performance test
- Speed test
- Power test
- Typical response test X

- According to a study by Powers and Kaufman (2002) regarding the relationship between performance on the GRE and creativity, depth, and quickness, what were the findings?
- There is substantial evidence that creative, deep-thinkers are penalized by multiple-choice items.
- There was no evidence that creative, deep-thinkers are penalized by multiple-choice items. X
- There was a significant negative correlation between GRE scores and depth.
- There was a significant negative correlation between GRE scores and creativity.

- _________ are a form of performance assessment that involves the systematic collection and evaluation of work products.
- Rubrics
- Virtual exams
- Practical assessments
- Portfolio assessments X

- Distracters are:
- rubric grading criteria.
- the incorrect responses on a multiple-choice item. X
- words inserted in an item intended to trick the examinee.
- unintentional clues to the correct answer.

