Testing, Testing, Testing

Chapter 6. Methods of Measuring Behavior


This chapter helps students understand the different methods used to measure behavior and collect data, emphasizing that the research question should determine the type of data collected. Considerable time is devoted to the types of tests and to the techniques involved in conducting an item analysis. The use of attitude scales and survey questionnaires is also addressed.


List five reasons why tests are useful.
Discuss the various types of tests and how they are used.
Conduct an item analysis identifying the discrimination and difficulty indices for each item in a test.
Explain the difference between discrimination index and difficulty index.
List the various techniques used to record behavior.
Explain the differences between Thurstone and Likert scales.
Write questions using a Thurstone scale and Likert scales.
List the factors to consider in order to make questionnaires successful.


I. Tests and Their Development
A. Why Use Tests?
B. What Tests Look Like
II. Types of Tests
A. Achievement Tests
B. Multiple-Choice Achievement Items
1) The Anatomy of a Multiple-Choice Item
2) To Use or Not to Use?
3) Item Analysis: How to Tell if Your Items Work
C. Attitude Tests
1) Thurstone Scales
2) Likert Scales
D. Personality Tests
E. Observational Techniques
1) Techniques for Recording Behavior
2) Observational Techniques? Be Careful!
F. Questionnaires
1) What Makes Questionnaires Work
a. Basic Assumptions of the Questionnaire
b. What About the Questions?
c. The Format of the Questionnaire
d. The Cover Letter



Achievement Tests
Standardized Tests
Researcher/Teacher-made Tests
Norm-referenced Tests
Criterion-referenced Tests
Item Analysis
Difficulty Index
Discrimination Index
Attitude Tests
Thurstone Scale
Method of Equal-appearing Intervals
Likert Scale
Method of Summated Ratings
Personality Tests
Projective Tests
Structured Tests
Duration Recording
Frequency Recording
Interval Recording (Time Sampling)
Continuous Recording
Cover Letter
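
The recording techniques named among the key terms above can be illustrated with a short Python sketch. It assumes a hypothetical observation log of (behavior, start second, end second) entries; the function names and sample data are invented for illustration and do not come from the text. The complete log stands in for continuous recording, and interval recording is implemented here as "did the behavior occur at any point during each interval," which is only one of several possible conventions.

```python
from typing import List, Tuple

# One observed event: (behavior, start_second, end_second). Hypothetical format.
Event = Tuple[str, int, int]


def frequency_recording(log: List[Event], behavior: str) -> int:
    """Count how many times the behavior occurred."""
    return sum(1 for name, _, _ in log if name == behavior)


def duration_recording(log: List[Event], behavior: str) -> int:
    """Total number of seconds the behavior lasted."""
    return sum(end - start for name, start, end in log if name == behavior)


def interval_recording(log: List[Event], behavior: str,
                       session_length: int, interval_length: int) -> List[bool]:
    """For each interval, note whether the behavior occurred at any point in it."""
    marks = []
    for t in range(0, session_length, interval_length):
        t_end = t + interval_length
        occurred = any(
            name == behavior and start < t_end and end > t
            for name, start, end in log
        )
        marks.append(occurred)
    return marks


# Continuous recording: the full log of everything the subject did (made-up data).
log = [
    ("talking", 0, 10),
    ("writing", 10, 25),
    ("talking", 25, 30),
    ("reading", 30, 50),
    ("talking", 50, 60),
]

talk_count = frequency_recording(log, "talking")
talk_seconds = duration_recording(log, "talking")
talk_intervals = interval_recording(log, "talking", session_length=60, interval_length=15)
print(talk_count, talk_seconds, talk_intervals)
```

The same log yields three different summaries of one behavior, which can help students see why the research question should guide the choice of recording technique.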



1. Why are tests so popular in the field of assessment and social research? What type of information do they provide? Can they be misused?
2. What are the differences between criterion-referenced and norm-referenced tests? Provide an example of when each test might be an appropriate method of assessment.
3. Referring to Figure 6.2 in the text, demonstrate the relationship between item discrimination and item difficulty. To facilitate discussion, pose questions to the class such as:
a. In order for an item to have a discrimination index of +1.0, the item needs to have a difficulty index of 50 percent. Why?
b. If an item has a difficulty index of .4 and a discrimination index of -.5, is this a good item? Why or why not?
4. Discuss the different types of recording behavior. How do they work? Provide an example of each.
5. What are the differences between a Thurstone scale and a Likert scale? Why might one be preferable to the other?
6. What are the factors that help distinguish a good questionnaire? What are the important points to remember about each of these factors?
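
For discussion questions 3a and 3b, the following worked sketch may help. It assumes equal-sized high and low scoring groups of n test takers each, a difficulty index D computed as the proportion correct across both groups, and a discrimination index d computed as the difference between the groups' proportions correct. The symbols N_h, N_l, and n are introduced here for illustration and may differ from the notation used in the text.

```latex
\begin{align*}
D &= \frac{N_h + N_l}{2n}, \qquad d = \frac{N_h - N_l}{n},
  \qquad 0 \le N_h \le n,\; 0 \le N_l \le n \\[4pt]
d = +1 &\;\Longrightarrow\; N_h - N_l = n
        \;\Longrightarrow\; N_h = n \text{ and } N_l = 0 \\[4pt]
       &\;\Longrightarrow\; D = \frac{n + 0}{2n} = 0.5
\end{align*}
```

Under these assumptions, perfect discrimination requires everyone in the high group and no one in the low group to answer correctly, which forces the difficulty index to 50 percent. For question 3b, a discrimination index of -0.5 means the low group outperformed the high group, so the item is suspect even though a difficulty index of 0.4 is otherwise reasonable.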


1. The crossword puzzle that ends Chapter 6 of this manual may be used to initiate class discussion or to review key vocabulary related to measuring behavior (see Appendix A for the answer key).
2. Table 6.1 can be used to introduce the various types of tests and how they are used to collect information. Have students work in small groups to generate lists of real-life examples for each test type. Together, as a large group, generate a class list of examples.
3. Have students work together in small groups to generate questions that might be included on a questionnaire. Have each group share its set of questions and ask the class as a whole to judge whether the questions would be considered good. Have the class explain the reasoning behind each judgment.
4. Using the results of a real test (for example, the students' first test in your class) or a hypothetical test, calculate the difficulty index and discrimination index of several questions as a group. Discuss the indices and the value of each question based on the results.
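
For the item analysis activity above, a minimal Python sketch such as the following could be used to check hand calculations. It assumes that the difficulty index is the proportion of all test takers who answer an item correctly and that the discrimination index is the proportion correct in a high-scoring group minus the proportion correct in a low-scoring group. The 27 percent group size, the function name, and the sample data are assumptions made for illustration; the text's own formulas may differ in detail.

```python
from typing import List, Tuple


def item_analysis(responses: List[List[int]],
                  group_fraction: float = 0.27) -> List[Tuple[float, float]]:
    """Return (difficulty, discrimination) for each item.

    responses[s][i] is 1 if student s answered item i correctly, else 0.
    group_fraction is the assumed share of students placed in each of the
    high-scoring and low-scoring groups.
    """
    n_students = len(responses)
    n_items = len(responses[0])

    # Rank students by total score, highest first.
    ranked = sorted(responses, key=sum, reverse=True)
    group_size = max(1, round(n_students * group_fraction))
    high = ranked[:group_size]
    low = ranked[-group_size:]

    results = []
    for i in range(n_items):
        # Difficulty: proportion of all test takers who got the item right.
        difficulty = sum(row[i] for row in responses) / n_students
        # Discrimination: high-group proportion minus low-group proportion.
        p_high = sum(row[i] for row in high) / group_size
        p_low = sum(row[i] for row in low) / group_size
        results.append((round(difficulty, 2), round(p_high - p_low, 2)))
    return results


# Hypothetical results for 10 students on a 4-item test (made-up data).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]

results = item_analysis(responses)
for number, (difficulty, discrimination) in enumerate(results, start=1):
    print(f"Item {number}: difficulty={difficulty}, discrimination={discrimination}")
```

In this made-up data set, the second item is answered correctly by half of the class, by everyone in the high group, and by no one in the low group, so it has a difficulty of 0.5 and a discrimination of 1.0.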


Part I. Multiple-Choice and Fill-in-the-Blank Questions (32 items)
1. Which of the following is NOT a purpose of a test?
a) assist in placement
b) help researchers determine the outcome of an experiment
c) assist in selection
d) collect detailed and unique information about each test taker

2. A test that measures knowledge of a specific topic is considered what type of test?
a) achievement test
b) attitude test
c) personality test
d) projective test

3. What is an advantage of using multiple-choice questions?
a) There is no chance to practice writing.
b) It is difficult to fake a correct answer.
c) They limit the type of content that can be assessed.
d) Some people do not like them.

4. A Likert scale is the most popular type of scale for which type of test?
a) projective
b) personality
c) achievement
d) attitude

5. The number of occurrences of a particular behavior is noted in which type of recording?
a) duration
b) interval
c) frequency
d) continuous

6. Which type of recording would measure the length of time a behavior occurs?
a) duration
b) interval
c) frequency
d) continuous

7. In which type of recording is a subject observed for a particular amount of time?
a) duration
b) interval
c) frequency
d) continuous

8. All the behavior of a subject is recorded in _____________________.
a) duration recording
b) interval recording
c) frequency recording
d) continuous recording

9. What is one disadvantage to using questionnaires?
a) They save time.
b) They are inexpensive.
c) The completion rate is low.
d) People may be more truthful.

10. Which of the following would be considered a poor questionnaire question?
a) Do you ever cheat on exams?
b) Do you often feel anxious about taking a test?
c) Your boss says he will give you a raise, but you don't believe him, do you?
d) How often do you use cigarettes?

11. In which of the following instances are tests used as dependent variables?
a) to assist in selection to a graduate program
b) to help in placing a child into an appropriate program based on reading level
c) to screen for a possible learning disability
d) to determine the effectiveness of a specific type of instruction

12. Which of the following difficulty levels indicates the item that was the most difficult?
a) 0.40
b) 0.30
c) 0.05
d) 0.50

13. A standardized test _________________________________________.
a) is usually commercially produced and requires a common set of administration and scoring procedures
b) compares an individual's test performance to the test performance of others
c) relies only on multiple-choice questions
d) is designed and administered by a researcher for a specific research study

14. Which type of test assumes that a person's responses reveal a (perhaps unconscious) world view which can be interpreted by an expert?
a) Likert scales
b) Thurstone scales
c) achievement tests
d) projective tests

15. Which assessment technique is used in field work?
a) Likert scaling
b) Thurstone scaling
c) observation
d) item analysis

16. Which of the following is NOT a component of a multiple-choice question?
a) the stem
b) the root
c) distracters
d) alternatives

17. What is the name for an option in a multiple-choice item that provides a plausible but wrong answer?
a) attracter
b) alternative
c) detractor
d) distracter

18. Which of the following describes the best multiple-choice item?
a) The difficulty level is .90.
b) The distracters are implausible.
c) A large proportion of the low group answered the item correctly.
d) The discrimination level is .90.

19. A discrimination index of +1.00 means that the item ____________________.
a) fails to discriminate
b) discriminates perfectly
c) is too easy
d) is too hard

20. What type of test is used to assess an individual's feelings about a particular topic?
a) aptitude test
b) preference test
c) selection test
d) attitude test

21. With no information, what is the chance that a person can guess correctly on a four-option multiple-choice question?
a) 100%
b) 75%
c) 50%
d) 25%

22. Intelligence tests usually compare the performance of a child against other children of the same age. This type of test is called ________________________.
a) an achievement test
b) a standardized test
c) a norm-referenced test
d) a criterion-referenced test

23. Tests such as the Denver Developmental Screening Test are used _____________________.
a) as dependent variables
b) for entrance to a program to identify strengths and weaknesses
c) to distinguish among people for selection purposes
d) to determine if program goals were met

24. What type of test defines a specific level of performance (or mastery) of some content?
a) standardized test
b) researcher-made test
c) norm-referenced test
d) criterion-referenced test

25. The Graduate Record Examination and the Miller Analogies Test are generally used for ______________.
a) placement
b) selection
c) evaluation of a program
d) diagnosis

26. What is the computed number for the proportion of test takers who get an item correct?
a) item analysis
b) discrimination index
c) difficulty index
d) attitude index

27. What is the computed number for how well an item distinguishes between the knowing and the unknowing?
a) item analysis
b) discrimination index
c) difficulty index
d) attitude index

28. Which one of the following is a disadvantage of multiple-choice tests?
a) They can be used to assess almost anyone.
b) They limit the kind of content assessed.
c) They are relatively easy to score.
d) Poor writers are not penalized.

29. Thurstone-like scales most closely resemble which level of measurement?
a) ordinal
b) nominal
c) interval
d) ratio

30. Which of the following is a characteristic of a poor questionnaire?
a) Questions are objective and forthright.
b) They are accompanied by a cover letter.
c) They begin with the more difficult, thought-provoking questions while the reader is still attentive.
d) There is a clear statement of transition when the topic of the questions changes.

31. Which of the following is a reason you need to be cautious when observing behavior?
a) Your very presence may affect the behavior being observed.
b) The researcher records everything that happens.
c) Your questionnaire should not be too long and tedious.
d) You may confuse the difficulty index with the discrimination index.

32. A good cover letter for a survey questionnaire has all but one of the following characteristics. Which is NOT a characteristic of a good cover letter?
a) The initial questions are relatively simple, nonthreatening, and easy to answer.
b) It is written on official letterhead.
c) It promises confidentiality.
d) It clearly states the purpose of the questionnaire and the importance of the study.

Answer Key for Multiple-Choice Items

 1. d     11. d     22. c
 2. a     12. c     23. b
 3. b     13. a     24. d
 4. d     14. d     25. b
 5. c     15. c     26. c
 6. a     16. b     27. b
 7. b     17. d     28. b
 8. d     18. d     29. c
 9. c     19. b     30. c
10. c     20. d     31. a
          21. d     32. a

Part II. Short Answer Questions (7 items)

1. What are some reasons why we use tests?

2. What are some advantages and disadvantages of using multiple-choice questions?

3. Describe the basic procedure involved in setting up to conduct an item analysis.

4. Why is it important to be unobtrusive when observing behavior?

5. What are the five basic assumptions when using a questionnaire?

6. What makes a good questionnaire?

7. Discuss the characteristics of a good cover letter and why it is helpful to have one.

Note: See Appendix A for the answer key to Puzzle 6.

