Tuesday, June 30, 2020

Problems With the New PSAT, Part 2: Score Discrepancies

[Part 2: Score Discrepancies is the second installment of a three-part report on the new PSAT. See the Overview, Part 1: Percentile Inflation, and Part 3: Lowered Benchmark. The entire report can also be downloaded or distributed as a PDF.]

Part 2: Score Discrepancies

A historically narrow gap between sophomore and junior performance does not seem credible and leads to questions about how scoring, scaling, and weighting were performed and reported.

Sophomore Versus Junior Score Discrepancies Call Scoring Methodologies into Question

Percentile inflation caused by redefinition and re-norming creates unfortunate misinterpretations, but the sources of that change can be readily identified: previous percentile tables can be restated under the new definition, and the difference between Nationally Representative percentiles and User percentiles can be compared to gauge the change added there. Without further information from College Board, however, it is impossible to know the accuracy of the 11th and 10th grade percentiles. Our analysis shows significant problems in the way the numbers are being presented, problems that mask the very thing the new test was meant to reveal: college readiness and academic progress. If score results between grades are suspect, that casts doubt on the pilot studies that were performed and on how they inform the scoring of the PSAT and SAT.

Expected Versus Observed Score Differences Between Grades

Historically, juniors have outperformed sophomores on the PSAT/NMSQT by approximately 5 points per section [see table below]. Translated into SAT scores, the differences between 10th and 11th graders in 2014 were 48 points, 47 points, and 51 points in Critical Reading, Writing, and Math, respectively. On the new PSAT, however, the reported difference is only 12 points in Evidence-Based Reading and Writing (EBRW) and 19 points in Math. The average difference in 2014 is more than 3 times that seen in 2015. The 2014 grade differences were in line with those seen over the last decade, so they were not anomalous. The old and new PSAT are different tests, but student growth tends to show up similarly even on different college admission exams.

Are Low Score Discrepancies Due to Differing Testing Populations?

Not all sophomores and juniors take the PSAT. Some take the PSAT as mandatory testing; some take it in order to qualify for National Merit; some take the ACT Aspire instead. If College Board's calculation of a nationally representative sample is correct, though, this year's grade differences should be immune from differences in test-taker demographics. Previous PSATs lacked a nationally representative sample, so sophomore-to-junior comparisons may be distorted by test-taking patterns. One way of removing that potential distortion is to look at the results only for repeat testers (students who took the test in both school years). College Board has done research on the typical score change on the old PSAT by analyzing only students who took the test as sophomores and repeated it as juniors [see table below]. The average increase, expressed in SAT points, was 33 points in Critical Reading, 33 points in Writing, and 40 points in Math. Those figures are still twice what is being shown on PSAT reports as the 10th grade to 11th grade score differential.
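As a sanity check on the "more than 3 times" and "twice" comparisons above, the per-section averages can be worked out directly from the figures already quoted. The short sketch below (Python is used only to restate the arithmetic) introduces no data beyond what appears in the text.

```python
# Arithmetic check of the grade-to-grade gaps quoted above.
# All figures come from the text, expressed in SAT points.

gap_2014 = {"Critical Reading": 48, "Writing": 47, "Math": 51}   # 2014 junior minus sophomore
gain_repeat = {"Critical Reading": 33, "Writing": 33, "Math": 40}  # old-PSAT repeat-tester gains
gap_2015 = {"EBRW": 12, "Math": 19}                              # 2015 reported junior minus sophomore

def per_section_average(gaps):
    """Average gap across sections, in SAT points."""
    return sum(gaps.values()) / len(gaps)

avg_2014 = per_section_average(gap_2014)       # (48 + 47 + 51) / 3 = 48.7
avg_repeat = per_section_average(gain_repeat)  # (33 + 33 + 40) / 3 = 35.3
avg_2015 = per_section_average(gap_2015)       # (12 + 19) / 2 = 15.5

print(f"2014 gap vs. 2015 reported gap:  {avg_2014 / avg_2015:.1f}x")   # about 3.1x
print(f"Repeat-tester gain vs. 2015 gap: {avg_repeat / avg_2015:.1f}x") # about 2.3x
```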
Do Content Differences Between Old and New PSATs Provide an Explanation?

A remaining problem is that the old PSAT is not the new PSAT. Although the new and old tests cover roughly the same score range and do not have radically different means or standard deviations, we cannot be certain that year-over-year growth is identical. A third set of data is College Board's own estimates of growth. Below are the College and Career Readiness Benchmarks. College Board assumes that students improve by roughly 30 points from sophomore year PSAT to junior year PSAT and another 20 points from junior year PSAT to SAT. The PSAT figures, which themselves seem conservative, are still twice the growth shown in the 2015 student data.

Percentile Data for Sophomores and Juniors May Prove the Existence of Errors in Presentation, Computation, or Norming

The low observed score differences between 10th and 11th graders do not fit the historical pattern, do not match studies of repeat testers, and do not align with College Board's own benchmark assumptions about progress. As improbable as the small point discrepancy is, though, it seems impossible to go one step further and state that sophomores outperform juniors. Yet this is exactly what the published percentile tables show [below]. As you move up the scale, the difference between 10th and 11th graders disappears and then turns in favor of the younger students. Read literally, the score tables say that more sophomores than juniors achieved top scores on the PSAT/NMSQT. There have always been talented sophomores who score highly on the PSAT, but as a group, these students should not do better on the PSAT in 10th grade than they do in 11th. These figures are for the Nationally Representative groups, so they cannot be explained away by saying that the test-taking populations are different. There is no logical, statistical, or content explanation for how sophomores could actually perform better than juniors; in fact, we should be seeing scores 30-50 points higher per section for juniors. The most likely explanation is that the surveying and weighting methods used for the PSAT did not properly measure the class-year compositions. If we assume this to be the case, though, can we be assured that the studies did any better in measuring the intra-class composition? Will the SAT be immune from the same problems?

Can Anything Explain the Low Sophomore/Junior Score Differences and the Score Inversion?

A suspect in the mix is the PSAT 10. Although the content of the PSAT 10 is identical to that of the PSAT/NMSQT, it is positioned as a way for schools to measure how students perform near the end of the sophomore year rather than toward the outset of it. The PSAT 10 will first be offered between February 22 and March 4, 2016. It is a safe assumption that spring sophomores, adjusted for differences in the testing pool, will score higher than fall sophomores. If College Board statistically accounted for PSAT 10 takers in its figures, the scores for sophomores would be inflated. It seems academically inappropriate to lump PSAT/NMSQT and PSAT 10 scores into the same bucket. The tests are taken at different phases of a student's high school progress; in fact, one reason a PSAT 10 exists is that spring performance differs from fall performance. The only clue that College Board may have made such a combination is reproduced from its Understanding Scores 2015; highlighting has been added. It's likely that this reference is simply the result of a production error, as the document never makes it again in its 32 pages. In short, all figures likely measure October performance for sophomores and juniors.
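For concreteness, the sketch below illustrates, with entirely hypothetical numbers, the kind of distortion that folding spring PSAT 10 results into the sophomore norms would have produced had such a combination been made. Nothing in the released documents confirms that it was; every figure here is invented for the example.

```python
# Hypothetical illustration only: how mixing spring (PSAT 10) results into the
# sophomore norms would shrink the apparent sophomore-junior gap.
# None of these numbers come from College Board; all are assumptions.

fall_soph_mean = 460   # hypothetical fall sophomore section mean
spring_bonus = 25      # hypothetical additional growth by a spring test date
junior_mean = 490      # hypothetical fall junior section mean
spring_share = 0.30    # hypothetical share of sophomore records from a spring test

# Pooled sophomore mean if spring testers were folded into the same norms
pooled_soph_mean = ((1 - spring_share) * fall_soph_mean
                    + spring_share * (fall_soph_mean + spring_bonus))

print(f"Fall-only gap: {junior_mean - fall_soph_mean} points")        # 30 points
print(f"Pooled gap:    {junior_mean - pooled_soph_mean:.1f} points")  # 22.5 points
```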
This final attempt to explain the anomalous supremacy of sophomores comes up short. Even if a PSAT 10 explanation had proved successful, it would have raised more questions than it answered. Tables surrounding the PSAT are all marked as "Preliminary." College Board has made clear that final scaling for the redesigned SAT (and the PSAT is on the same scale) will not be completed until May 2016. Final concordance tables between old and new tests will replace any preliminary work. If the explanation for the statistical anomalies is that the paint is not yet dry, it raises the question of what 3 million students and their educators are to do with the scores they have been presented.

The new PSAT reports are the most detailed that have ever existed. They include total scores, section scores, test scores, cross-test scores, subscores, Nationally Representative percentiles, User percentiles, SAT score projections, sophomore and junior year benchmarks, and more. Which parts of the reports are reliable, and which parts remain under construction? Should educators simply push these reports aside and wait until next year? Should students make test-taking and college choice decisions based on these scores?

[Continue to Part 3: Lowered Benchmark]