Performance of students in project-based
science classrooms on a national measure of science achievement
Schneider RM, Krajcik J, Marx RW, Soloway E
JOURNAL OF RESEARCH IN SCIENCE TEACHING
39 (5): 410-422 MAY 2002
Reform efforts in science education emphasize the importance
of supporting students' construction of knowledge through
inquiry. Project-based science (PBS) is an ambitious approach
to science instruction that addresses concerns of reformers.
A sample of 142 10th- and 11th-grade students enrolled in
a PBS program completed the 12th-grade 1996 National Assessment
of Educational Progress (NAEP) science test. Compared with
subgroups identified by NAEP that most closely matched our
student sample, White and middle class, PBS students outscored
the national sample on 44% of NAEP test items. This study
shows that students participating in a PBS curriculum were
prepared for this type of testing. Educators should be encouraged
to use inquiry-based approaches such as PBS to implement reform
in their schools.
Assessing model sensitivity of the imputation
methods used in the national assessment of educational progress
Thomas N
JOURNAL OF EDUCATIONAL AND BEHAVIORAL STATISTICS
25 (4): 351-371 WIN 2000
The National Assessment of Educational Progress (NAEP) uses
latent trait item response models to summarize the performance
of students on assessments of educational proficiency in different
subject areas such as mathematics and reading. Because of
limited examination time and concerns about student motivation,
NAEP employs sparse matrix sampling designs that assign a
small number of examination items to each sampled student
to measure broad curriculums. As a consequence, each sampled
student's latent trait is not accurately measured, and NAEP
uses multiple imputation missing data statistical methods
to account for the uncertainty about the latent traits. The
sensitivity of these model-based estimation and reporting
procedures to statistical and psychometric assumptions is
assessed. Estimation of the mean of the latent trait in different
subpopulations was very robust to the modeling assumptions.
Many of the other currently reported summaries, however, may
depend on the modeling assumptions underlying the estimation
procedures; these assumptions, motivated primarily by analytic
tractability, are unlikely to attain, raising concerns about
current reporting practices. The results indicate that more
conservative criteria should be considered when forming intervals
about estimates, and when assessing significance. A possible
expansion of the imputation model is suggested that may improve
its performance.
Benefits of opportunity to read and balanced
instruction on the NAEP
Guthrie JT, Schafer WD, Huang CW
JOURNAL OF EDUCATIONAL RESEARCH
94 (3): 145-162 JAN-FEB 2001
The National Assessment of Educational Progress (NAEP) requires
reading comprehension processes that may be increased by students'
amount of engaged reading, parental education, and gender,
along with balanced reading instruction and opportunity to
read. To examine the effects of those variables on reading
achievement and engagement, the authors analyzed the 1994
Grade 4 Maryland NAEP with hierarchical linear modeling to
construct both between-school and between-teacher models.
Amount of engaged reading significantly predicted reading
achievement on the NAEP, after parental education was statistically
controlled. Balanced reading instruction significantly predicted
reading achievement after accounting for students' engaged
reading and parental education. Findings confirmed expectations
from the proposed theoretical perspective on reading engagement.
Policy implications included an emphasis on some instructional
variables in the reading engagement model.
Synthesizing results from the trial state
assessment
Raudenbush SW, Fotiu RP, Cheong YF
JOURNAL OF EDUCATIONAL AND BEHAVIORAL STATISTICS
24 (4): 413-438 WIN 1999
Using data collected under the Trial State Assessment (TSA)
of the National Assessment of Educational Progress (NAEP),
this article describes and illustrates a two-stage statistical
model for investigating state-to-state variation in mathematics
achievement. At the first stage within each state, a two-level
hierarchical linear model is estimated via maximum likelihood.
At the second stage, results are combined across states using
Bayesian estimation implemented via the Gibbs sampler. The
results reveal considerable state-to-state heterogeneity in
mathematics proficiency, but most heterogeneity is explainable
on the basis of covariates defined on students, teachers,
and schools. The findings suggest that interest in state comparisons
might productively focus on state differences in policy-relevant
correlates of proficiency rather than on state differences
in mean proficiency. The analytical approach can be applied
in other cases where data are dense at the lower level of
a hierarchy but thin at the higher level.
Alternative displays for communicating NAEP
results: A redesign and validity study
Wainer H, Hambleton RK, Meara K
JOURNAL OF EDUCATIONAL MEASUREMENT
36 (4): 301-335 WIN 1999
Five displays, chosen from the NAEP 1994 Reading: A First
Look, were redesigned. The redesign was informed by the principles
developed and enunciated in Wainer's 1997 popular text Visual
Revelations. After the redesign was completed, a survey of
educational policymakers was done in which substantive questions
were asked about the content of the various displays. Each
redesign was paired with the original and was assigned randomly
to one of two survey forms. We found that, on average, the
redesigns yielded both more accurate and faster answers to
the questions asked. The more difficult the question the greater
the disparity between the original format and the redesigned
one.
Equity implications based on the conceptions
of science achievement in major reform documents
Lee O
REVIEW OF EDUCATIONAL RESEARCH
69 (1): 83-115 SPR 1999
The construct of science achievement (what K-12 students should
know and be able to do in science) is central to science education
reform. This paper analyzes current conceptions of science
achievement in major reform documents, and considers equity
implications for science achievement and assessment in the
context of standards-based and systemic reform. The paper
reviews documents on science content standards (NSES and Project
2061), performance standards (New Standards), and large-scale
assessment frameworks (1996 NAEP and TIMSS). Although the
documents emphasize equity as the key principle, they present
the assimilationist perspective by defining science and science
achievement in terms of the Western science tradition with
little consideration of alternative views of science and ways
of knowing from diverse backgrounds. Based on the conception
of equity in terms of social justice, the paper proposes the
cultural anthropological perspective to develop a more inclusive
and broader view of science achievement and assessment for
diverse students.
Projecting to the NAEP scale: Results from
the North Carolina End-of-Grade testing program
Williams VSL, Rosa KR, McLeod LD, Thissen D, Sanford EE
JOURNAL OF EDUCATIONAL MEASUREMENT
35 (4): 277-296 WIN 1998
Data from the North Carolina End-of-Grade test of eighth-grade
mathematics are used to estimate the achievement results on
the scale of the National Assessment of Educational Progress
(NAEP) Trial State Assessment. Linear regression models are
used to develop projection equations to predict state NAEP
results in the future, and the results of such predictions
are compared with those obtained in the 1996 administration
of NAEP. Standard errors of the parameter estimates are obtained
using a bootstrap resampling technique.
Inequality of access to educational resources:
A national report card for eighth-grade math
Raudenbush SW, Fotiu RP, Cheong YF
EDUCATIONAL EVALUATION AND POLICY ANALYSIS
20 (4): 253-267 WIN 1998
This article considers social and ethnic inequality in access
to resources for mathematics learning in eighth grade: favorable
school disciplinary climate, advanced course offerings, teacher
subject-matter preparation, and emphasis on reasoning during
classroom discourse. Data are from 41 states and territories
participating in the 1992 Trial State Assessment (TSA) of
the National Assessment of Educational Progress (NAEP). Socially
advantaged students typically had greater access to these
resources than did socially disadvantaged students. Access
also depended on student ethnicity. However, the degree of
social and ethnic inequality in access varied significantly
across states. New methods for assessing and displaying state-to-state
variation in social and ethnic inequality are illustrated.
We argue that "report cards" displaying state differences
in student proficiency are, by themselves, misleading; state
differences in access to key educational resources provide
an important supplement.
High school mathematics course-taking by
gender and ethnicity
Davenport EC, Davison ML, Kuang H, Ding S, Kim SK, Kwak N
AMERICAN EDUCATIONAL RESEARCH JOURNAL
35 (3): 497-514 FAL 1998
The 1990 NAEP transcript data were used to study the number
of Carnegie units (CUs) earned by students in seven categories
of mathematics courses plus a miscellaneous category. On average,
students earned 3.11 CUs, slightly more than the minimum of
3 suggested in A Nation at Risk (National Commission on Excellence
in Education, 1983). Fifty-four percent of the CUs were earned
in the standard high school sequence (Algebra 1 and 2 and
geometry), and 20% were earned in preformal courses (e.g.,
General Math 1 and 2). Overall, gender and ethnic differences
in the total number of mathematics CUs were small, but ethnic
differences relative to the type of math course represented
by the course categories were large. Gender differences in
mathematics course-taking are discussed in light of differences
in college attendance patterns and achievement variability.
Implications of ethnic differences for school and curriculum
reform are discussed.
Trends in gender differences in academic
achievement from 1960 to 1994: An analysis of differences
in mean, variance, and extreme scores
Nowell A, Hedges LV
SEX ROLES
39 (1-2): 21-43 JUL 1998
Gender differences in academic achievement have been studied
extensively. While it is generally agreed that females have
a slight advantage on average in verbal abilities and males
have a slight advantage on average in mathematics, it is unclear
whether these differences have changed over time. In this
paper evidence from seven surveys representative of the United
States twelfth grade student population and the National Assessment
of Educational Progress (NAEP) long term trend data is brought
to bear on the magnitude of gender differences in achievement,
the level of agreement among different indices of difference,
and the stability of these differences over time. These data
provide the unique opportunity to not only empirically estimate
mean differences, differences in variance, and differences
in extreme scores, but also to estimate change over time in
all three indices using both the same and different tests
over time. Results show that gender differences in mean and
variance are small, while differences in extreme scores are
often substantial. None of these differences have changed
significantly since 1960 with the possible exception of mean
differences in mathematics and science. Each of the datasets
reflects the racial composition of the national population
when properly weighted (i.e., White = 70%, Black = 15%, Hispanic
= 10%, Other = 5%).
Converting boundaries between National Assessment
Governing Board performance categories to points on the National
Assessment of Educational Progress score scale: The 1996 science
NAEP process
Reckase MD
APPLIED MEASUREMENT IN EDUCATION
11 (1): 9-21 1998
National Assessment Governing Board (NAGB) policy indicates
that results from the National Assessment of Educational Progress
(NAEP) should be reported according to the percentage of students
estimated to be above 3 levels of standards called achievement
levels. The standards, labeled Basic, Proficient, and Advanced,
are operationalized by 3 points on the NAEP scale. In this
article, I provide an overview of the process that was used
to identify provisional locations for the points that would
inform NAGB as they set the achievement levels for the science
NAEP. The process includes the identification of panelists
to be involved in the achievement-level setting, the training
for the panelists, and the method for converting panelists'
ratings of NAEP items to points on the NAEP score scale.
Validating inferences from National Assessment
of Educational Progress achievement-level reporting
Linn RL
APPLIED MEASUREMENT IN EDUCATION
11 (1): 23-47 1998
The validity of interpretations of National Assessment of
Educational Progress (NAEP) achievement levels is evaluated
by focusing on evidence regarding 3 types of discrepancies:
(a) discrepancies between standards implied by judgments of
different types of items (e.g., multiple choice vs. short
answer or dichotomously scored vs. extended response tasks
scored using multipoint rubrics), (b) discrepancies between
descriptions of achievement levels with their associated exemplar
items and the location of cut scores on the scale, and (c)
discrepancies between the assessments and content standards.
Large discrepancies of all 3 types raise serious questions
about some of the more expansive inferences that have been
made in reporting NAEP results in terms of achievement levels.
It is argued that the evidence reviewed provides a strong
case for making more modest inferences and interpretations
of achievement levels than have frequently been made.
Implications of market-basket reporting
for achievement-level setting
Mislevy RJ
APPLIED MEASUREMENT IN EDUCATION
11 (1): 49-63 1998
In this article, I discuss ways in which reporting National
Assessment of Educational Progress (NAEP) results in terms
of a market basket of tasks would affect achievement-level
reporting. After reviewing current NAEP reporting and achievement-level
setting procedures, 3 market-basket variations are described.
Ways in which achievement-level standards would be set, interpreted,
and validated are then discussed. The conclusions are as follows:
(a) the structure of the market-basket reporting scale can
be exploited to simplify a key step in the standard-setting
process, namely mapping item- or booklet-level judgments to
the reporting scale; (b) the more transparent meaning of market-basket
scores, in contrast to scaled scores and behavioral descriptions,
clarifies the limitations of NAEP performances as evidence
about the range of student proficiencies and accomplishments
that the public's and educators' interests may span; and (c)
market-basket reporting approaches that enable individual
students to take a full market-basket set of items simplify
data-gathering and analysis for validity studies of achievement-level
set-points and interpretations.
Setting performance standards for professional
licensure and certification
Plake BS
APPLIED MEASUREMENT IN EDUCATION
11 (1): 65-80 1998
Credentialing programs were surveyed to ascertain the procedures
that they use to set performance standards on multiple-choice
and open-ended assessments. For multiple-choice assessments,
these programs mostly employ variations on the Angoff (1971)
standard-setting method. Procedures used with open-ended questions
showed more divergence; some agencies use a question by question
approach, whereas others utilize methods that consider the
assessment results more holistically. Implications of these
standard-setting practices from credentialing agencies to
the National Assessment of Educational Progress (NAEP), including
the consequences of the assessment on the individual candidate,
the matrix sampling construction of NAEP assessments, the
multiple cutpoints of the NAEP assessment program, and the
types of validity evidence that are typically gathered to
support the validity of the performance standard, are discussed.
Generalizations of these standard-setting methods from the
field of professional licensure and certification should be
made with caution.
Influencing achievement through high school
graduation requirements
Chaney B, Burgdorf K, Atash N
EDUCATIONAL EVALUATION AND POLICY ANALYSIS
19 (3): 229-244 FAL 1997
Using data from the 1990 National Assessment of Educational
Progress (NAEP) and the 1990 High School Transcript Study,
we compare students' course-taking patterns with their NAEP
achievement scores and with schools' graduation requirements.
We find relatively few students were affected by the requirements,
either because students took more than was required or they
took courses that did not affect their achievement. Those
course sequences that were correlated with increases in students'
achievement scores suggested that students who were marginal
in their motivation and skills could benefit by taking courses
that were more demanding.
Using performance standards to link statewide
achievement results to NAEP
Waltman KK
JOURNAL OF EDUCATIONAL MEASUREMENT
34 (2): 101-121 SUM 1997
The purpose of this study was to investigate the comparability
in score meaning of the performance regions on the ITBS and
NAEP mathematics score scales that resulted from using performance
standards to establish two separate links: socially moderated
and statistically moderated. A socially moderated link was
established by using the same achievement level descriptions
in an ITBS standard-setting study that were used in a NAEP
standard-setting study. A statistically moderated link was
accomplished by using an equipercentile procedure. The primary
findings were that (a) social moderation yielded cutscores
on the ITBS scales that resulted in larger percentages of
Iowa public fourth-grade students being classified within
the basic, proficient, and advanced achievement regions than
those reported by NAEP; (b) the equipercentile link yielded
percentages on the ITBS scale that were similar to those reported
by NAEP for "type of community" subgroups; and (c) for students
taking both assessments, the corresponding achievement regions
on the NAEP and ITBS scales produced low to moderate percents
of agreement in student classification.
Course-taking, equity, and mathematics learning:
Testing the constrained curriculum hypothesis in US secondary
schools
Lee VE, Croninger RG, Smith JB
EDUCATIONAL EVALUATION AND POLICY ANALYSIS
19 (2): 99-121 SUM 1997
This study investigated how the organization of the mathematics
curriculum in U.S. high schools affects how much students
learn in that subject. The study used data on the background
and academic proficiency of 3,056 high school seniors in 123
public high schools from the 1990 National Assessment of Educational
Progress (NAEP) in mathematics. These data were linked with
information from students' high school transcripts and with
information from their high schools about courses offered
during that period. To accommodate the nested structure of
the data and research questions, we used Hierarchical Linear
Modeling (HLM) methods, including a subroutine (HLM2PV) that
simplifies the proper use of multiple plausible values estimates
for NAEP proficiency scores. Results provide support for our
hypothesis about curriculum constraint: Students learn more
in schools that offer them a narrow curriculum composed mostly
of academic courses. Difficulties in conducting school effects
studies using NAEP proficiency score outcomes, particularly
the procedures for estimating plausible values, are discussed.
Improving tabular displays, with NAEP tables
as examples and inspirations
Wainer H
JOURNAL OF EDUCATIONAL AND BEHAVIORAL STATISTICS
22 (1): 1-30 SPR 1997
The modern world is rich with data; an inability to effectively
utilize these data is a real handicap. One common mode of
data communication is the printed data table. In this article
we provide four guidelines the use of which can make tables
more effective and evocative data displays. We use the National
Assessment of Educational Progress both to provide inspiration
for the development of these guidelines and to illustrate
their operation. We also discuss a theoretical structure to
aid in the development of test items to tap students' proficiency
in extracting information from tables.
Linking statewide tests to the national
assessment of educational progress: Accuracy of combining
test results across states
Ercikan K
APPLIED MEASUREMENT IN EDUCATION
10 (2): 145-159 1997
The National Assessment of Educational Progress (NAEP) surveys
achievement at selected grades and content areas and does
not report scores at the individual, school, or district level.
There is a desire by schools and school districts to compare
results of their own assessments to the national results provided
by NAEP. The main objective of this study is to investigate
the accuracy of linking NAEP scores to statewide test results.
This study investigates whether the population invariance
condition of the function for linking two sets of scores holds,
specifically whether the function obtained for an individual
state is the same as the function obtained for other states
either individually or combined. Using an equipercentile procedure,
functions obtained separately for four states are compared
to a function obtained using data combined across the four
states. The results suggest that the link between statewide
tests and the NAEP does not provide precise information and
that the information from a linking study such as this one
should be limited to rough estimates of percentages of students
in each of the NAEP achievement levels. Two areas of concern
are identified: (a) the differences between the statewide
tests and the NAEP test and (b) the error due to results from
the two sets of tests and the error due to the linking analyses.
Some multivariate displays for NAEP results
Wainer H
PSYCHOLOGICAL METHODS
2 (1): 34-63 MAR 1997
The principal goal of graphic display is to ease access to
complex information. Simple univariate displays are easy to
understand but usually do not have the capability to transmit
accurately the often complex structure of multivariate data.
Multivariate displays were specially designed for exactly
this purpose. The National Assessment of Educational Progress
(NAEP) generates data of a multivariate richness and complexity
that defies accurate univariate transmission. The broad use
and understanding of the information NAEP provides can be
aided through the use of more suitable and evocative data
displays. In this article, we demonstrate the limitations
of univariate displays and suggest some multivariate displays
that may enable us to understand, and thence communicate,
what is contained in NAEP more fully.
Using trilinear plots for NAEP state data
Wainer H
JOURNAL OF EDUCATIONAL MEASUREMENT
33 (1): 41-55 SPR 1996
Understanding the distribution of achievement levels of students'
performance in the National Assessment of Educational Progress
(NAEP) is aided through the use of the trilinear chart. In
this article, this chart is described and its use illustrated
with data from the 1992 state NAEP mathematics assessment.
It is shown that one can see readily the trends in performance
for different demographic groups for all of the 44 participating
jurisdictions simultaneously. It is suggested that this graphical
form may be useful in other contexts, as well.
Linking statewide tests to the national
assessment of educational progress: Stability of results
Linn RL, Kiplinger VL
APPLIED MEASUREMENT IN EDUCATION
8 (2): 135-155 1995
The adequacy of linking statewide standardized test results
to the National Assessment of Educational Progress (NAEP)
by using equipercentile equating procedures was investigated.
Statewide mathematics test data for eighth-grade students
in 1990 and 1992 were obtained from four states. NAEP data
for samples from these four states were obtained from the
results of the Trial State Assessment administrations in the
same years. Equating functions for male and female students
in two states providing gender identification were similar
at the low end of the scale but diverged at the high end of
the scale. Applications of the equating functions obtained
for 1990 data to the statewide test results obtained in 1992
provided estimates that were generally similar to actual NAEP
results near the median, but not in the tails of the distribution.
These results suggest that such linking, although reasonable
for estimating average performance for the state, is not
sufficiently trustworthy to use for making comparisons based
on the tails of the distribution.
NAEP and the quality of education
Bracey GW
PHI DELTA KAPPAN
76 (1): 84-& SEP 1994
Inconsistencies in students' reasoning about probability
Konold C, Pollatsek A, Well A, Lohmeier J, Lipson A
JOURNAL FOR RESEARCH IN MATHEMATICS EDUCATION
24 (5): 392-414 NOV 1993
Subjects were asked to select from among four possible sequences
the "most likely" to result from flipping a coin five times.
Contrary to the results of Kahneman and Tversky (1972), the
majority of subjects (72%) correctly answered that the sequences
are equally likely to occur. This result suggests, as does
performance on similar NAEP items, that most secondary school
and college-age students view successive outcomes of a random
process as independent. However, in a follow-up question,
subjects were also asked to select the "least likely" result.
Only half the subjects who had answered correctly responded
again that the sequences were equally likely; the others selected
one of the sequences as least likely. This result was replicated
in a second study in which 20 subjects were interviewed as
they solved the same problems. One account of these logically
inconsistent responses is that subjects reason about the two
questions from different perspectives. When asked to select
the most likely outcome, some believe they are being asked
to predict what actually will happen, and give the answer
"equally likely" to indicate that all of the sequences are
possible. This reasoning has been described by Konold (1989)
as an "outcome approach" to uncertainty. This prediction
scheme does not fit questions worded in terms of the least
likely result, and thus some subjects select an incompatible
answer based on "representativeness" (Kahneman & Tversky,
1972). These results suggest that the percentage of secondary
school students who understand the concept of independence
is much lower than the latest NAEP results would lead us to
believe and, more generally, point to the difficulty of assessing
conceptual understanding with multiple-choice items.
Growth on NAEP scales or not
Bracey GW
PHI DELTA KAPPAN
74 (10): 807-808 JUN 1993
An examination of relationships between
the 1990 NAEP mathematics items for grade 8 and selected themes
from the NCTM standards
Silver EA, Kenney PA
JOURNAL FOR RESEARCH IN MATHEMATICS EDUCATION
24 (2): 159-167 MAR 1993
A study of student outcomes and teacher
characteristics in exemplary middle and junior-high science
programs
Brunkhorst BJ
JOURNAL OF RESEARCH IN SCIENCE TEACHING
29 (6): 571-583 AUG 1992
Recent efforts of the National Association for Research in
Science Teaching (NARST) and the National Science Teachers
Association (NSTA) have encouraged collaborative "research
partnerships" between university researchers and classroom
science teachers. This research partners study, begun in 1987,
examined student outcomes and teacher characteristics in middle/junior
high exemplary programs identified by the NSTA's Search for
Excellence in Science Education (SESE). A second year of the
study has been completed involving SESE program teachers with
similar instructional profiles. Using Iowa Test of Basic Skills
and National Assessment of Educational Progress (NAEP) items,
key teachers in those SESE programs examined their seventh-
and eighth-grade student outcomes in three domains: (a) knowledge,
(b) attitudes, and (c) applications/connections. Results were
compared with national populations. A similar study was conducted
during the second year, involving teachers from the first
year and additional teachers with instructional practice profiles
similar to those in SESE programs. Teachers were surveyed
using a questionnaire from the Report of the 1977 National
Survey of Science, Mathematics and Social Studies Education
Teachers (Weiss, 1978a) and supplemental questions (Bonnstetter,
1985). This study found that in exemplary middle/junior high
programs: (a) as a group, students achieve high scores in
science knowledge and maintain or develop positive attitudes
toward science; and (b) students need opportunities to make
connections between what they learn in science and personal
responsibility.
Overview of the National Assessment of Educational Progress
Beaton AE, Zwick R
JOURNAL OF EDUCATIONAL STATISTICS
17 (2): 95-109 SUM 1992
This chapter gives an overview of the design and the statistical
and psychometric analysis methods developed for use in the
National Assessment of Educational Progress (NAEP). For more
than 20 years, NAEP has provided information about the educational
achievements of students in American schools. In recent years,
NAEP has been gaining in prominence and has also been growing
bigger and more complex. In 1990, an assessment of individual
states was added to NAEP. Also, it is anticipated that the
legislation that prohibits NAEP from reporting district and
school results may be removed and that NAEP may return to
annual rather than biennial assessments. In addition, future
assessments will involve a larger number of innovative items,
such as questions for which students must produce their own
answers rather than selecting among specified options, tasks
in which students are asked to read aloud, and portfolios
that consist of classroom work produced over a period of time.
NAEP's never-ending growth and evolution continue to provide
new technological challenges to its statisticians and psychometricians.
Sampling and weighting in the national assessment
Rust KF, Johnson EG
JOURNAL OF EDUCATIONAL STATISTICS
17 (2): 111-129 SUM 1992
This chapter describes procedures for obtaining the National
Assessment of Educational Progress (NAEP) student samples
used in the national and state assessments and for deriving
survey weights for use in the analysis of the survey data.
Following the description of general procedures, more detailed
discussion is included about several issues that relate to
the procedures used. In some cases, these involve procedures
that NAEP is actively reviewing and investigating, with a
view toward implementing improvements in the future. In other
cases, the procedures, although well established in NAEP,
involve technical aspects with interesting features not fully
described in the available technical reports.
Scaling procedures in NAEP
Mislevy RJ, Johnson EG, Muraki E
JOURNAL OF EDUCATIONAL STATISTICS
17 (2): 131-154 SUM 1992
Scale-score reporting is a recent innovation in the National
Assessment of Educational Progress (NAEP). With scaling methods,
the performance of a sample of students in a subject area
or subarea can be summarized on a single scale even when different
students have been administered different exercises. This
article presents an overview of the scaling methodologies
employed in the analyses of NAEP surveys beginning with 1984.
The first section discusses the perspective on scaling from
which the procedures were conceived and applied. The plausible
values methodology developed for use in NAEP scale-score analyses
is then described, in the contexts of item response theory
and average response method scaling. The concluding section
lists milestones in the evolution of the plausible values
approach in NAEP and directions for further improvement.
Item response theory scale linking in NAEP
YAMAMOTO K, MAZZEO J
JOURNAL OF EDUCATIONAL STATISTICS
17 (2): 155-173 SUM 1992
In educational assessments, it is often necessary to compare
the performance of groups of individuals who have been administered
different forms of a test. If these groups are to be validly
compared, all results need to be expressed on a common scale.
When assessment results are to be reported using an item response
theory (IRT) proficiency metric, as is done for the National
Assessment of Educational Progress (NAEP), establishing a
common metric becomes synonymous with expressing IRT item
parameter estimates on a common scale. Procedures that accomplish
this are referred to here as scale linking procedures. This
chapter discusses the need for scale linking in NAEP and illustrates
the specific procedures used to carry out the linking in the
context of the major analyses conducted for the 1990 NAEP
mathematics assessment.
Population inferences and variance estimation
for NAEP data
JOHNSON EG, RUST KF
JOURNAL OF EDUCATIONAL STATISTICS
17 (2): 175-190 SUM 1992
In the National Assessment of Educational Progress (NAEP),
population inferences and variance estimation are based on
a randomization-based perspective where the link between the
observed data and the population quantities of interest is
given by the distribution of potential values of estimates
over repeated samples from the same population using the identical
sample design. Because NAEP uses a complex sample design,
many of the assumptions underlying traditional statistical
analyses are violated, and, consequently, analysis procedures
must be adjusted to appropriately handle the structure of
the sample. In this article, we discuss the use of sampling
weights in deriving population estimates and consider the
effect of nonresponse and undercoverage on those estimates.
We also discuss the estimation of sampling variability from
complex sample surveys, concentrating on the jackknife repeated
replication procedure (the variance estimation procedure used
by NAEP), and address the use of a simple approximation to sampling
variability. Finally, we discuss measures of the stability
of variance estimates.
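A minimal sketch of the paired jackknife idea named in this abstract, not
NAEP's production procedure. It assumes sampled units are grouped into pairs
of half-samples; each replicate zeroes the weight of one half in a single
pair and doubles the other. The function names, the choice of which half to
drop, and the toy data are all illustrative assumptions.

```python
def weighted_mean(values, weights):
    """Weighted mean of values using the given sampling weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def jackknife_variance(values, weights, pair_ids, halves):
    """Paired jackknife: one replicate estimate per pair of half-samples.

    halves holds 0 or 1 for each unit. In the replicate for a given pair,
    units in half 0 of that pair get weight 0 and units in half 1 get
    double weight; all other units keep their original weight.
    (Assumption: half 0 is always the one dropped; it could be chosen at random.)
    """
    full = weighted_mean(values, weights)
    variance = 0.0
    for pair in sorted(set(pair_ids)):
        rep_weights = []
        for w, p, h in zip(weights, pair_ids, halves):
            if p != pair:
                rep_weights.append(w)
            elif h == 0:
                rep_weights.append(0.0)
            else:
                rep_weights.append(2.0 * w)
        # Sum of squared deviations of replicate estimates from the full estimate.
        variance += (weighted_mean(values, rep_weights) - full) ** 2
    return full, variance


# Toy data: four units in two pairs, equal weights (hypothetical values).
estimate, jrr_variance = jackknife_variance(
    values=[1.0, 2.0, 3.0, 4.0],
    weights=[1.0, 1.0, 1.0, 1.0],
    pair_ids=[0, 0, 1, 1],
    halves=[0, 1, 0, 1],
)
```

The square root of jrr_variance would serve as the standard error of the
weighted mean under these assumptions.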
Interpreting scales through scale anchoring
BEATON AE, ALLEN NL
JOURNAL OF EDUCATIONAL STATISTICS
17 (2): 191-204 SUM 1992
The major purpose of the National Assessment of Educational
Progress (NAEP) is to provide a means to compare groups of
students both across and within assessment years. A complementary
purpose of NAEP is to provide information about what these
groups of students know and can do. This purpose has been
addressed using the scale anchoring techniques described in
this chapter. Scale anchoring involves a statistical component
that identifies items that discriminate between successive
points on the proficiency scale using specific item characteristics.
It also involves a consensus component in which identified
items are used by subject-area and educational experts to
provide an interpretation of what groups of students at or
close to the selected scale points know and can do.
Statistical and psychometric issues in the
measurement of educational achievement trends: examples from
the National Assessment of Educational Progress
ZWICK R
JOURNAL OF EDUCATIONAL STATISTICS
17 (2): 205-218 SUM 1992
Like all studies involved in the assessment of trends in educational
performance, the National Assessment of Educational Progress
(NAEP) is confronted with an array of unresolved methodological
and philosophical issues. One of the basic dilemmas faced
by NAEP is how to measure performance change while remaining
responsive to advances in curriculum and the technology of
assessment. NAEP has become much more cautious about making
seemingly insubstantial changes in the assessment because
of the so-called NAEP reading anomaly, an apparently steep
drop between 1984 and 1986 in estimated reading proficiency
that was found to have resulted in part from changes in the
order and context in which items appeared. Other issues that
NAEP must consider in reporting performance trends are the
effect of measurement scale indeterminacies and the ways in
which interpretation of trend results can depend on the statistics
that are selected for comparing proficiency distributions
over time.
Applications and extensions of NAEP concepts
and technology
ROCK DA, NELSON J
JOURNAL OF EDUCATIONAL STATISTICS
17 (2): 219-232 SUM 1992
The National Assessment of Educational Progress (NAEP) has
consistently pioneered new assessment methods in conjunction
with developing the psychometric methodologies underlying
them. Several NAEP developments (such as complex matrix item
sampling designs, the introduction of performance-based items
in large-scale assessments, vertical scaling, and an intelligent
computer system that produces unique assessment reports for
participating jurisdictions in the NAEP Trial State Assessment
program) are presented in this chapter along with a discussion
of their extensions and applications to other current and
future assessment projects.
The design of the National Assessment of
Educational Progress
JOHNSON EG
JOURNAL OF EDUCATIONAL MEASUREMENT
29 (2): 95-110 SUM 1992
The key features of the design of the National Assessment
of Educational Progress (NAEP) are discussed with particular
emphasis on the design to be used for the 1992 assessment.
An overview of the design and its philosophy is given with
a description of the multicomponent solution to the twin requirements
of reliably measuring trends in achievement while responding
to changing educational priorities and advances in measurement
technology. The student sample designs for the National Assessment
and the Trial State Assessment are described. The focused-balanced
incomplete block (focused-BIB) spiraling method of item sampling
is discussed and compared with simpler matrix sampling designs.
The impact of the NAEP design on the analysis of assessment
data is discussed.
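To illustrate what a balanced incomplete block layout means, here is a small
sketch that builds seven booklets of three item blocks each so that every
pair of blocks shares a booklet exactly once. The cyclic construction, the
base set (0, 1, 3), and the function names are assumptions chosen for
illustration; the sketch does not reproduce the focused-BIB design described
in the article.

```python
from collections import Counter
from itertools import combinations


def cyclic_bib(num_blocks=7, base=(0, 1, 3)):
    """Build one booklet per shift by adding the shift to each base index modulo num_blocks."""
    return [
        tuple(sorted((b + shift) % num_blocks for b in base))
        for shift in range(num_blocks)
    ]


def pair_counts(booklets):
    """Count how many booklets contain each unordered pair of blocks."""
    counts = Counter()
    for booklet in booklets:
        counts.update(combinations(booklet, 2))
    return counts


booklets = cyclic_bib()
counts = pair_counts(booklets)

# Count how many booklets each block appears in.
appearances = Counter(block for booklet in booklets for block in booklet)
```

Under these assumptions, "balanced" means every value in counts is the same,
and "incomplete" means no booklet contains all seven blocks. Spiraling would
then hand the booklets out to students in rotating order.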
Developing the NAEP content-area frameworks
and innovative assessment methods in the 1992 assessments
of mathematics, reading, and writing
MULLIS IVS
JOURNAL OF EDUCATIONAL MEASUREMENT
29 (2): 111-131 SUM 1992
This article provides an overview of the consensus processes
for the development of the frameworks underlying the NAEP
assessments, with emphasis on those for the 1990 and 1992
assessments of mathematics, the 1992 assessment of reading,
and the 1994 assessment of science. In addition, innovative
assessment techniques included in the 1992 assessments of
mathematics, reading, and writing are described, including
use of mathematics tools, oral interviews, and portfolio assessment.
Estimating population characteristics from
sparse-matrix samples of item responses
MISLEVY RJ, BEATON AE, KAPLAN B, SHEEHAN KM
JOURNAL OF EDUCATIONAL MEASUREMENT
29 (2): 133-161 SUM 1992
The multiple-matrix item sampling designs that provide information
about population characteristics most efficiently administer
too few responses to students to estimate their proficiencies
individually. Marginal estimation procedures, which estimate
population characteristics directly from item responses, must
be employed to realize the benefits of such a sampling design.
Numerical approximations of the appropriate marginal estimation
procedures for a broad variety of analyses can be obtained
by constructing, from the results of a comprehensive extensive
marginal solution, files of plausible values of student proficiencies.
This article develops the concepts behind plausible values
in a simplified setting, sketches their use in the National
Assessment of Educational Progress (NAEP), and illustrates
the approach with data from the Scholastic Aptitude Test (SAT).
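A hedged sketch of how a secondary analyst might use a file of plausible
values: compute the statistic once per set of plausible values, average the
results, and add the between-set spread to the sampling variance. The
combining rule shown is the standard multiple-imputation rule and is an
assumption here, since the abstract does not state it; the function name and
numbers are hypothetical.

```python
from statistics import mean, variance


def combine_plausible_values(estimates, sampling_variances):
    """Combine per-plausible-value results into one estimate and total variance.

    estimates: the statistic computed separately on each set of plausible values.
    sampling_variances: the sampling variance of the statistic for each set.
    """
    m = len(estimates)
    point = mean(estimates)
    within = mean(sampling_variances)   # average sampling variance
    between = variance(estimates)       # sample variance across sets (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return point, total


# Hypothetical results from three sets of plausible values.
point, total = combine_plausible_values([10.0, 12.0, 14.0], [1.0, 1.0, 1.0])
```

The between-set term reflects the uncertainty that arises because each
student answers too few items for an individual proficiency estimate.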
Overview of the scaling methodology used
in the national assessment
BEATON AE, JOHNSON EG
JOURNAL OF EDUCATIONAL MEASUREMENT
29 (2): 163-175 SUM 1992
The National Assessment of Educational Progress (NAEP) uses
item response theory (IRT)-based scaling methods to summarize
the information in complex data sets. Scale scores are presented
as tools for illuminating patterns in the data and for exploiting
regularities across patterns of responses to tasks requiring
similar skills. In this way, the dominant features of the
data are captured. Discussed are the necessity of global scores
or more detailed subscores, the creation of developmental
scales spanning different age levels, and the use of scale
anchoring as a way of interpreting the scales.
Issues in the design and reporting of the
National Assessment of Educational Progress
LINN RL, DUNBAR SB
JOURNAL OF EDUCATIONAL MEASUREMENT
29 (2): 177-194 SUM 1992
Several issues related to the design and reporting of NAEP
results are discussed within the context of current expectations
for NAEP and its historical origins. Procedures for establishing
the content and form of assessments, including the process
of developing frameworks, and eventually individual assessment
items are discussed. The need to maintain a comprehensive
assessment reflecting both current practice in schools and
the best thinking by subject matter experts is emphasized.
Issues in the design and the estimation of subpopulation parameters
using conditioning variables are discussed. Finally, continuing
misinterpretations of anchor item results are analyzed.
Assessments and accountability
Robert L. Linn
EDUCATIONAL RESEARCHER
29(2) 4-14, 2000
Uses of tests and assessments as key elements in five waves
of educational reform during the past 50 years are reviewed.
These waves include the role of tests in tracking and selection
emphasized in the 1950s, the use of tests for program accountability
in the 1960s, minimum competency testing programs of the 1970s,
school and district accountability of the 1980s, and the standards-based
accountability systems of the 1990s. Questions regarding the
impact, validity, and generalizability of reported gains,
and the credibility of results in high-stakes accountability
uses are discussed. Emphasis is given to three issues regarding
currently popular accountability systems. These are (a) the
role of content standards, (b) the dual goals of high performance
standards and common standards for all students, and (c) the
validity of accountability models. Some suggestions for dealing
with the most severe limitations of accountability are provided.
Accountability Systems: Implications of Requirements of the
No Child Left Behind Act of 2001
Robert L. Linn, Eva L. Baker, Damian W. Betebenner
EDUCATIONAL RESEARCHER
31(6) 3-16, 2002
The No Child Left Behind Act of 2001 substantially increases
the testing requirements for states and sets demanding accountability
standards for schools, districts, and states with measurable
adequate yearly progress (AYP) objectives for all students and
subgroups of students defined by socioeconomic background,
race–ethnicity, English language proficiency, and disability.
However, states’ content standards, the rigor of their tests,
and the stringency of their performance standards vary greatly.
Consequently, the percentage of students who score at the
proficient level or higher on the state assessments varies
radically from state to state. Some states have farther to
go than others to meet the mandated target of 100% proficient
within 12 years. These differences are illustrated and the
implications for achieving AYP targets are discussed. Also
addressed are possible uses of results from the biennial
state-level administrations of the National Assessment of
Educational Progress as a means of leveling the playing field.
Factors contributing to the volatility of gains in achievement
from year to year for individual schools are discussed.
|