REFERENCES

Abedi, J. (1996). The Interrater/Test Reliability System (ITRS). Multivariate Behavioral Research, 31(4), 409–417.

Abedi, J. (2006). Language issues in item development. In S. M. Downing & T. M. Haladyna (Eds.), Handbook of test development (pp. 377–398). Mahwah, NJ: Erlbaum.

Abedi, J. (2007). English language proficiency assessment and accountability under NCLB Title III: An overview. In J. Abedi (Ed.), English language proficiency assessment in the nation: Current status and future practice (pp. 3–10). Davis: University of California at Davis.

Abedi, J. (2008). Measuring students’ level of English proficiency: Educational significance and assessment requirements. Educational Assessment, 13(2), 193–214.

Abedi, J. (2010). Performance assessments for English language learners. Stanford, CA: Stanford Center for Opportunity Policy in Education.

Abedi, J., Bayley, R., Ewers, N., Herman, J., Kao, J., Leon, S., . . . Herman, J. (2010). Accessible reading assessments for students with disabilities: The role of cognitive, linguistic, and textual features. ERIC Educational Resources Information Center.

Abedi, J., & Herman, J. L. (2010). Assessing English language learners’ opportunity to learn mathematics: Issues and limitations. Teachers College Record, 112(3), 723–746.

Abedi, J., Leon, S., & Kao, J. (2008). Examining differential distracter functioning in reading assessments for students with disabilities (CSE Report No. 743). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Abedi, J., Leon, S., & Mirocha, J. (2003). Impact of students’ language background on content-based data: Analyses of extant data (CSE Report No. 603). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Abedi, J., & Lord, C. (2001). The language factor in mathematics tests. Applied Measurement in Education, 14(3), 219–234.

Abedi, J., Lord, C., Hofstetter, C., & Baker, E. (2000). Impact of accommodation strategies on English language learners’ test performance. Educational Measurement: Issues and Practice, 19(3), 16–26.

Abedi, J., Lord, C., & Plummer, J. (1997). Language background as a variable in NAEP mathematics performance (CSE Technical Report No. 429). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Achieve. (2004). Do graduation tests measure up? A closer look at state high school exit exams. Executive summary. Washington, DC: Author.

Aguirre-Munoz, Z., Boscardin, C. K., Jones, B., Park, J. E., Chinen, M., Shin, H. S., . . . Benner, A. (2006). Consequences and validity of performance assessment for English language learners: Integrating academic language and ELL instructional needs into opportunity to learn measures (CSE Report No. 678). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Allen, D. (1998). Assessing student learning: From grading to understanding. New York, NY: Teachers College Press.

Allen, M. J., & Yen, W. M. (1979). Introduction to measurement theory. Prospect Heights, IL: Waveland Press.

Allen, N. L., Johnson, E. G., Mislevy, R. J., & Thomas, N. (1994). Scaling procedures. In N. L. Allen, D. L. Kline, & C. A. Zelenak (Eds.), The NAEP 1994 technical report (pp. 247–266). Washington, DC: US Department of Education.

Alliance for Excellent Education. (2009). The high cost of high school dropouts: What the nation pays for inadequate high schools (Issue Brief). Washington, DC: Author.

Alvarado, A. (1998). Professional development is the job. American Educator, 22(4), 18–23.

American Educational Research Association (AERA), American Psychological Association (APA), & National Council on Measurement in Education (NCME). (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.

Amrein, A. L., & Berliner, D. C. (2002). High-stakes testing, uncertainty, and student learning. Education Policy Analysis Archives, 10(18). Retrieved from http://epaa.asu.edu/epaa/v10n18/

Appalachia Educational Laboratory. (1996, February). Five years of reform in rural Kentucky. Notes from the field: Educational Reform in Rural Kentucky, 5(1). Charleston, WV: Author.

Archbald, D. A., & Newmann, F. M. (1988). Beyond standardized testing: Assessing authentic academic achievement in the secondary school. Reston, VA: National Association of Secondary School Principals.

Archer, J. (2006, December 19). Wales eliminates national exams for many students. Education Week. Retrieved from http://www.edweek.org/ew/articles/2006/12/20/16wales.h26.html?qs=Wales

Aschbacher, P. (1991). Performance assessment: State activity, interest and concerns. Applied Measurement in Education, 4(4), 275–288.

Association of Test Publishers & Council of Chief State School Officers. (2010). Operational best practices. Washington, DC: Authors.

Attali, Y., & Burstein, J. (2005). Automated essay scoring with E-Rater v. 2.0 (ETS Research Report No. RR-04–45). Princeton, NJ: Educational Testing Service.

Ayala, C. C., Shavelson, R., & Ayala, M. A. (2001). On the cognitive interpretation of performance assessment scores (CSE Report No. 546). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Bachman, L. F., & Palmer, A. S. (1996). Language testing in practice: Designing and developing useful language tests. Oxford: Oxford University Press.

Badger, E., Thomas, B., & McCormack, E. (1990). Background summary: Beyond paper and pencil. Malden: Massachusetts Department of Education.

Baker, E. L. (1997). Model-based performance assessments. Theory into Practice, 36(4), 247–254.

Baker, E. L. (2007). Model-based assessments to support learning and accountability: The evolution of CRESST’s research on multiple-purpose measures. Educational Assessment, 12(3&4), 179–194.

Baker, E. L., O’Neil, H. F., & Linn, R. L. (1993). Policy and validity prospects for performance-based assessment. American Psychologist, 48(12), 1210–1218.

Baron, J. B. (1984). Writing assessment in Connecticut: A holistic eye toward identification and an analytic eye toward instruction. Educational Measurement: Issues and Practice, 3, 27–28.

Baron, J. B. (1991). Strategies for the development of effective performance exercises. Applied Measurement in Education, 4(4), 305–318.

Barrs, M., Ellis, S., Hester, H., & Thomas, A. (1989). The primary language record: Handbook for teachers. London: Centre for Language in Primary Education.

Bass, K. M., Glaser, R., & Magone, M. E. (2002). Informing the design of performance assessments using a content-process analysis of two NAEP science tasks (CSE Technical Report No. 564). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Bauman, J., Boals, T., Cranley, E., Gottlieb, M., & Kenyon D. (2007). Assessing comprehension and communication in English state to state for English language learners (ACCESS for ELLs). In J. Abedi (Ed.), English language proficiency assessment in the nation: Current status and future practice (pp. 81–91). Davis: University of California.

Baxter, G. P., & Glaser, R. (1998). Investigating the cognitive complexity of science assessments. Educational Measurement: Issues and Practice, 17(3), 37–45.

Baxter, G. P., Shavelson, R. J., Herman, S. J., Brown, K. A., & Valadez, J. R. (1993). Mathematics performance assessment: Technical quality and diverse student impact. Journal for Research in Mathematics Education, 24, 190–216.

Beaver, J., & Carter, M. A. (2001). Developmental reading assessment. Parsippany, NJ: Celebration Press.

Bejar, I. I., Williamson, D. M., & Mislevy, R. J. (2006). Human scoring. In D. M. Williamson, R. J. Mislevy, & I. I. Bejar (Eds.), Automated scoring of complex tasks in computer-based testing (pp. 49–82). Mahwah, NJ: Erlbaum.

Belfield, C. (2000). Economic principles for education: Theory and evidence. Cheltenham, UK: Edward Elgar.

Benjamin, R., Chun, M., Hardison, C., Hong, E., Jackson, C., Kugelmass, H., . . . Shavelson, R. (2009). Returning to learning in an age of assessment: Introducing the rationale of the collegiate learning assessment.

Bennett, R. E. (2006). Moving the field forward: Some thoughts on validity and automated scoring. In D. M. Williamson, R. J. Mislevy, & I. I. Bejar (Eds.), Automated scoring of complex tasks in computer-based testing (pp. 403–412). Mahwah, NJ: Erlbaum.

Bennett, R. E., & Bejar, I. (1997). Validity and automated scoring: It’s not only the scoring (ETS No. RR-97–13). Princeton, NJ: Educational Testing Service.

Bennett, R. E., & Gitomer, D. H. (2009). Transforming K–12 assessment: Integrating accountability testing, formative assessment and professional support. In C. Wyatt-Smith & J. Cumming (Eds.), Educational assessment in the 21st century (pp. 43–61). New York, NY: Springer.

Bennett, R. E., Persky, H., Weiss, A. R., & Jenkins, F. (2007). Problem solving in technology-rich environments: A report from the NAEP Technology-Based Assessment Project (NCES No. 2007–466). Washington, DC: National Center for Education Statistics, US Department of Education. Retrieved from http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2007466

Ben-Simon, A., & Bennett, R. E. (2007). Toward more substantively meaningful essay scoring. Journal of Technology, Learning and Assessment, 6(1). Retrieved from http://escholarship.bc.edu/jtla/

Black, P., & Wiliam, D. (1998). Inside the black box: Raising standards through classroom assessment. Phi Delta Kappan, 80, 139–148.

Black, P., & Wiliam, D. (2007). Large scale assessment systems: Design principles drawn from international comparisons. Measurement: Interdisciplinary Research and Perspectives, 5(1), 1–53.

Black, P., & Wiliam, D. (2010). Kappan classic: Inside the black box: Raising standards through classroom assessment. Phi Delta Kappan, 92(1), 81–90.

Blumberg, F., Epstein, M., MacDonald, W., & Mullis, I. (1986, November). A pilot study of higher-order thinking skills assessment techniques in science and mathematics. Final report—part I. Princeton, NJ: National Assessment of Educational Progress.

Blumenthal, R. (2006). Why Connecticut sued the federal government over No Child Left Behind. Harvard Educational Review, 76(4).

Bock, R. D. (1995). Open-ended exercise in large-scale educational assessment. In L. B. Resnick & J. G. Wirt (Eds.), Linking school and work: Roles for standards and assessment (pp. 305–338). San Francisco, CA: Jossey-Bass.

Booher-Jennings, J. (2005). Below the bubble: “Educational triage” and the Texas accountability system. American Educational Research Journal, 42(2), 231–268.

Borko, H., Elliott, R., & Uchiyama, K. (2002). Professional development: A key to Kentucky’s educational reform effort. Teaching and Teacher Education, 18, 969–987.

Borko, H., & Stecher, B. M. (2001, April). Looking at reform through different methodological lenses: Survey and case studies of the Washington state education reform. Paper presented as part of the symposium Testing Policy and Teaching Practice: A Multi-Method Examination of Two States, at the annual meeting of the American Educational Research Association, Seattle, WA.

Boscardin, C. K., Aguirre-Munoz, Z., Chinen, M., Leon, S., & Shin, H. S. (2004). Consequences and validity of performance assessment for English learners: Assessing opportunity to learn (OTL) in grade 6 language arts (CSE Report No. 635). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Boudett, K. P., City, E. A., & Murnane, R. J. (Eds.). (2008). Data wise. Cambridge, MA: Harvard Education Press.

Bransford, J. D., & Schwartz, D. L. (2001). Rethinking transfer: A simple proposal with multiple implications. Review of Research in Education, 24, 61–100.

Breland, H. M., Camp, R., Jones, R. J., Morris, M. M., & Rock, D. A. (1987). Assessing writing skill. New York, NY: College Entrance Examination Board.

Breland, H., Danos, D., Kahn, H., Kubota, M., & Bonner, M. (1994). Performance versus objective testing and gender: An exploratory study of an Advanced Placement History Examination. Journal of Educational Measurement, 31(4), 275–293.

Breland, H. M., & Jones, R. J. (1982). Perceptions of writing skills (College Board Report No. 82–4 and ETS Research Report No. 82–47). New York, NY: College Entrance Examination Board.

Brennan, R. L. (1996). Generalizability of performance assessments. In G. W. Phillips (Ed.), Technical issues in large-scale performance assessment (NCES 96–802). Washington, DC: National Center for Education Statistics.

Brennan, R. L. (2000). Performance assessments from the perspective of generalizability theory. Applied Psychological Measurement, 24, 339–353.

Brennan, R. L. (2001). Generalizability theory. New York, NY: Springer-Verlag.

Bridgeman, B., Trapani, C., & Attali, Y. (2009). Considering fairness and validity in evaluating automated scoring. Paper presented at the annual meeting of the National Council on Measurement in Education, San Diego, CA.

Brown, J. S., & Burton, R. R. (1978). Diagnostic models for procedural bugs in basic mathematical skills. Cognitive Science, 2, 155–192.

Buchberger, F., & Buchberger, I. (2004). Problem solving capacity of a teacher education system as a condition of success? An analysis of the “Finnish case.” In F. Buchberger & S. Berghammer (Eds.), Education policy analysis in a comparative perspective (pp. 222–237). Linz: Trauner.

Burger, S. E., & Burger, D. L. (1994). Determining the validity of performance-based assessment. Educational Measurement: Issues and Practice, 13(1), 9–15.

Burstall, C. (1986). Innovative forms of measurement: A United Kingdom perspective. Educational Measurement: Issues and Practice, 5(1), 17–22.

Burstall, C., Baron, J., & Stiggins, R. (1987). The use of performance testing in large-scale student assessment programs. Paper presented at the Education Commission of the States’ 17th Annual Assessment Conference, Denver, CO.

Burstein, J. (2003). The E-rater scoring engine: Automated essay scoring with natural language processing. In M. D. Shermis & J. C. Burstein (Eds.), Automated essay scoring (pp. 113–122). Mahwah, NJ: Erlbaum.

Bushaw, W. J., & Gallup, A. M. (2008, September). Americans speak out—Are educators and policy makers listening? The 40th annual Phi Delta Kappa/Gallup Poll of the public’s attitudes toward the public schools. Phi Delta Kappan, 90(1), 8–20.

Campbell, D. T. (1979). Assessing the impact of planned social change. Evaluation and Program Planning, 2, 67–90.

Campbell, D. T., & Stanley, J. C. (1963). Experimental designs for research on teaching. In N. L. Gage (Ed.), Handbook of research on teaching (pp. 171–246). Chicago, IL: Rand McNally.

Cassel, R. N., & Kolstad, R. (1998). The critical job-skills requirements for the 21st century: Living and working with people. Journal of Instructional Psychology, 25(3), 176–180.

Catterall, J., Mehrens, W., Flores, R. G., & Rubin, P. (1998, January). The Kentucky instructional results information system: A technical review. Frankfort: Kentucky Legislative Research Commission.

Center for Collaborative Education. (2012). Quality performance assessment: A guide for schools and districts. Boston, MA: Author.

Chan, J. K., Kennedy, K. J., Yu, F. W., & Fok, P. (2008). Assessment policy in Hong Kong: Implementation issues for new forms of assessment. Hong Kong: Hong Kong Institute of Education. Retrieved from http://www.iaea.info/papers.aspx?id=68

Chapman, C. (1991). What have we learned from writing assessment that can be applied to performance assessment? Presentation at ECS/CDE Alternative Assessment Conference, Breckenridge, CO.

Chi, M.T.H., Feltovich, P. J., & Glaser, R. (1981). Categorization and representation of physics problems by experts and novices. Cognitive Science, 5, 121–152.

Chi, M.T.H., Glaser, R., & Farr, M. (Eds.). (1988). The nature of expertise. Hillsdale, NJ: Erlbaum.

Chingos, M. (2012). Strength in numbers: State spending on K–12 assessment systems. Washington, DC: Brookings Institution.

Christie, F. (1986). Writing in schools: Generic structures as ways of meaning. In B. Couture (Ed.), Functional approaches to writing: Research perspectives (pp. 221–239). London: Frances Pinter.

Christie, F. (2002). The development of abstraction in adolescence in subject English. In M. Schleppegrell & M. C. Colombi (Eds.), Developing advanced literacy in first and second languages: Meaning with power (pp. 45–66). London: Routledge.

Chung, G.K.W.K., Delacruz, G. C., & Bewley, W. L. (2006). Performance assessment models and tools for complex tasks (CSE Report No. 682). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Cizek, G. J. (2001, Winter). More unintended consequences of high-stakes testing. Educational Measurement: Issues and Practice, 20(4), 19–28.

Clauser, B. E. (2000). Recurrent issues and recent advances in scoring performance assessments. Applied Psychological Measurement, 24(4), 310–324.

Clay, M. (2002/2006). Running records for classroom teachers. Portsmouth, NH: Heinemann.

Clyman, S. G., Melnick, D. E., & Clauser, B. E. (1995). Computer-based case simulations. In E. L. Mancall & P. G. Bashook (Eds.), Assessing clinical reasoning: The oral examination and alternative methods (pp. 139–149). Evanston, IL: American Board of Medical Specialties.

Coe, P., Leopold, G., Simon, K., Stowers, P., & Williams, J. (1994). Perceptions of school change: Interviews with Kentucky students. Charleston, WV: Appalachia Educational Laboratory.

Coelen, S., Rende, S., & Fulton, D. (2008, April). Next steps: Preparing a quality workforce. Storrs, CT: Department of Economics and Connecticut Center for Economic Analysis, University of Connecticut.

Cohen, D., Stern, V., & Balaban, N. (2008). Observing and recording the behavior of young children. New York, NY: Teachers College Press.

Cole, N. S., & Moss, P. A. (1989). Bias in test use. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 201–220). New York, NY: American Council on Education and Macmillan.

Collegiate Learning Assessment. (2009). Retrieved from http://www.collegiatelearningassessment.org/

Colorado. House. (2003). Bill 03–1108. Retrieved from http://www.state.co.us/gov_dir/leg_dir/olls/sl2003a/sl_153.htm

Common Core State Standards Initiative. (2012). In the states. Retrieved from http://www.corestandards.org/

Conley, D. T. (2005). College knowledge: What it really takes for students to succeed and what we can do to get them ready. San Francisco, CA: Jossey-Bass.

Conley, D. T. (2007). Redefining college readiness. Eugene, OR: Educational Policy Improvement Center.

Conley, D. T. (2010). College and career ready: Helping all students succeed beyond high school. San Francisco, CA: Jossey-Bass.

Conley, D. T. (2014). Getting ready for college, careers, and the Common Core: What every educator needs to know. San Francisco, CA: Jossey-Bass.

Connecticut State Board of Education. (2009). Connecticut Academic Performance Test, Third Generation Program overview. Retrieved from http://www.csde.state.ct.us/public/cedar/assessment/capt/resources/misc_capt/2009%20CAPT%20Program%20Overview.pdf

Connecticut State Department of Education. (2006). CAPT third generation handbook for reading and writing across the disciplines. Retrieved from http://www.sde.ct.gov/sde/cwp/view.asp?a=2618&q=320866

Connecticut State Department of Education. (2007a). CAPT high school science assessment handbook—third generation. Retrieved from http://www.sde.ct.gov/sde/cwp/view.asp?a=2618&q=320890

Connecticut State Department of Education. (2007b, August 28). Science curriculum-embedded tasks, CAPT: Generation III. Retrieved from http://www.sde.ct.gov/sde/cwp/view.asp?a=2618&q=320892

Connecticut State Department of Education. (2009). Student assessment. Retrieved from http://www.csde.state.ct.us/public/cedar/assessment/index.htm

Council for Aid to Education. (2013). Collegiate learning assessment. Retrieved from http://www.cae.org/content/pro_collegiate_sample_measures.htm

Council for the Curriculum, Examinations and Assessment. (2008a). Curriculum, key stage 3, post-primary assessment. Retrieved from http://www.ccea.org.uk/

Council for the Curriculum, Examinations and Assessment. (2008b). Qualifications. Retrieved from http://www.ccea.org.uk/

Council of Chief State School Officers. (2009). Statewide student assessment 2007–08 SY: Math, ELA, science. Retrieved from http://www.ccsso.org/content/pdfs/2007–08_Math-ELAR-Sci_Assessments.pdf

Creativity in Action. (1990). Skills desired by Fortune 500 companies. Buffalo, NY: Creative Education Foundation.

Crocker, L. (1997). Assessing content representativeness of performance assessment exercises. Applied Measurement in Education, 10(1), 83–95.

Cronbach, L. J. (1971). Test validation. In R. L. Thorndike (Ed.), Educational measurement (2nd ed., pp. 443–507). Washington, DC: American Council on Education.

Cronbach, L. J., Gleser, G. C., Nanda, H., & Rajaratnam, N. (1972). The dependability of behavioral measurements: Theory of generalizability of scores and profiles. New York, NY: Wiley.

Cronbach, L. J., Linn, R. L., Brennan, R. L., & Haertel, E. H. (1997). Generalizability analysis for performance assessments of student achievement or school effectiveness. Educational and Psychological Measurement, 57(3), 373–399.

Darling-Hammond, L. (1992–1993). Creating standards of practice and delivery for learner-centered schools. Stanford Law and Policy Review, 4, 37–52.

Darling-Hammond, L. (2004). Standards, accountability, and school reform. Teachers College Record, 106(6), 1047–1085.

Darling-Hammond, L. (2006). No Child Left Behind and high school reform. Harvard Educational Review, 76(4), 642–667.

Darling-Hammond, L. (2007). Race, inequality and educational accountability: The irony of No Child Left Behind. Race, Ethnicity and Education, 10(3), 245–260.

Darling-Hammond, L. (2010). Performance counts: Assessment systems that support high-quality learning. Washington, DC: Council of Chief State School Officers.

Darling-Hammond, L. (2012). The right start: Creating a strong foundation for the teaching career. Phi Delta Kappan, 94(3), 8–13. Retrieved from http://www.kappanmagazine.org/content/94/3/8.full

Darling-Hammond, L., & Adamson, F. (2010). Beyond basic skills: The role of performance assessment in achieving 21st century standards of learning. Stanford, CA: Stanford University, Stanford Center for Opportunity Policy in Education.

Darling-Hammond, L., & Adamson, F. (2013). Developing assessments of deeper learning: The costs and benefits of using tests that help students learn. Stanford, CA: Stanford University, Stanford Center for Opportunity Policy in Education.

Darling-Hammond, L., & Ancess, J. (1994). Authentic assessment and school development. New York, NY: National Center for Restructuring Education, Schools, and Teaching, Teachers College, Columbia University.

Darling-Hammond, L., Ancess, J., & Falk, B. (1995). Authentic assessment in action: Studies of school and students at work. New York, NY: Teachers College Press.

Darling-Hammond, L., & Falk, B. (1997). Using standards and assessments to support student learning. Phi Delta Kappan, 79(3), 190–199.

Darling-Hammond, L., Hightower, A. M., Husbands, J. L., LaFors, J. R., Young, V. M., & Christopher, C. (2005). Instructional leadership for systemic change: The story of San Diego’s reform. Lanham, MD: Scarecrow Education Press.

Darling-Hammond, L., Newton, S. P., & Wei, R. C. (2013). Developing and assessing beginning teacher effectiveness: The potential of performance assessment. Educational Assessment, Evaluation and Accountability, 25(1).

Darling-Hammond, L., & Pecheone, R. (2010, March). Developing an internationally comparable balanced assessment system that supports high-quality learning. Paper presented at the National Conference on Next Generation K–12 Assessment Systems, Washington, DC. Retrieved from http://www.k12center.org/rsc/pdf/Darling-HammondPechoneSystemModel.pdf

Darling-Hammond, L., & Rustique-Forrester, E. (2005). The consequences of student testing for teaching and teacher quality. In J. Herman & E. Haertel (Eds.), The uses and misuses of data in accountability testing (pp. 289–319). Malden, MA: Blackwell.

Darling-Hammond, L., & Wentworth, L. (2010). Benchmarking learning systems: Student performance assessment in international context. Stanford, CA: Stanford Center for Opportunity Policy in Education, Stanford University.

Darling-Hammond, L., & Wood, G. (2008). Assessment for the 21st century: Using performance assessments to measure student learning more effectively. Washington, DC: Forum for Education and Democracy.

Deane, P. (2006). Strategies for evidence identification through linguistic assessment of textual responses. In D. M. Williamson, R. J. Mislevy, & I. I. Bejar (Eds.), Automated scoring of complex tasks in computer-based testing (pp. 313–362). Mahwah, NJ: Erlbaum.

Deane, P., & Gurevich, O. (2008). Applying content similarity metrics to corpus data: Differences between native and non-native speaker response to a TOEFL integrated writing prompt (ETS Research Report No. RR-08–5). Princeton, NJ: ETS.

Delaware Department of Education. (2000, November). Delaware Student Testing Program special writing study report. Retrieved July 5, 2009, from http://www.doe.k12.de.us/aab/report_special_writing%20study.pdf

Delaware Department of Education. (2005). Text-based writing item sampler. Retrieved from http://www.doe.k12.de.us/AAB/files/Grade%208%20TBW%20-%20Greaseaters.pdf

DeVore, R. N. (2002). Considerations in the development of accounting simulations (Technical Report No. 13). Ewing, NJ: AICPA.

Dixon, Q. L. (2005). Bilingual education policy in Singapore: An analysis of its sociohistorical roots and current academic outcomes. International Journal of Bilingual Education and Bilingualism, 8(1), 25–47.

Doolittle, A. (1995). The cost of performance assessment in science: The SCASS perspective. Paper presented at the annual meeting of the National Council on Measurement in Education, San Francisco, CA.

Dorfman, A. (1997). Teachers’ understanding of performance assessment. Paper presented at the annual meeting of the American Educational Research Association, Chicago, IL.

Dowling, M. (n.d.). Examining the exams. Retrieved from http://www.hkeaa.edu.hk/files/pdf/markdowling_e.pdf

DuFour, R., DuFour, R., Eaker, R., & Many, T. (2006). Learning by doing: A handbook for professional learning communities at work. Bloomington, IN: Solution Tree.

Dunbar, S. B., Koretz, D. M., & Hoover, H. D. (1991). Quality control in the development and use of performance assessments. Applied Measurement in Education, 4(4), 289–304.

Duncan, T., et al. (2007). Reviewing the evidence on how teacher professional development affects student achievement. Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, US Department of Education.

Eckes, T. (2008). Rater types in writing performance assessments: A classification approach to rater variability. Language Testing, 25(2), 155–185.

Eckstein, M. A., & Noah, H. J. (1993). Secondary school examinations: International perspectives on policies and practice. New Haven, CT: Yale University Press.

Education Bureau. Quality Assurance Division. (2008). Performance indicators for Hong Kong schools, 2008 with evidence of performance. Retrieved from http://www.edb.gov.hk/FileManager/EN/Content_6456/pi2008%20eng%205_5.pdf

Educational Policy Improvement Center. (n.d.). ThinkReady: College career ready system. Retrieved from https://collegeready.epiconline.org/info/thinkready.dot

Educational Testing Service. (1987, May). Learning by doing: A manual for teaching and assessing higher-order thinking in science and mathematics (Report No. 17-HOS-80). Princeton, NJ: Author.

Educational Testing Service. (2004, July 4). Pretesting plan for SAT essay topics. [internal communication]. Princeton, NJ: Author.

Elliott, S. (2003). IntelliMetric: From here to validity. In M. D. Shermis & J. C. Burstein (Eds.), Automated essay scoring (pp. 71–86). Mahwah, NJ: Erlbaum.

Elmore, R., & Burney, D. (1999). Investing in teacher learning: Staff development and instructional improvement in Community School District #2, New York City. In L. Darling-Hammond & G. Sykes (Eds.), Teaching as the learning profession. San Francisco, CA: Jossey-Bass.

Embretson, S. E. (1985). Test design: Developments in psychology and psychometrics. Orlando, FL: Academic Press.

Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Erlbaum.

Engelhard, G. (2002). Monitoring raters in performance assessments. In G. Tindal & T. M. Haladyna (Eds.), Large-scale assessment programs for all students: Validity, technical adequacy, and implementation (pp. 261–287). Mahwah, NJ: Erlbaum.

Engelhard, G., Jr., Gordon, B., Walker, E. V., & Gabrielson, S. (1994). Writing tasks and gender: Influences on writing quality of black and white students. Journal of Educational Research, 87, 197–209.

Engeström, Y. (1999). Activity theory and individual and social transformation. In Y. Engeström, R. Miettinen, & R. Punamäki (Eds.), Perspectives on activity theory (pp. 19–38). Cambridge: Cambridge University Press.

Envision Schools. (2010). “Disaster in the Gulf” performance task. Oakland, CA: Author.

Ercikan, K. (2002). Disentangling sources of differential item functioning in multi-language assessments. International Journal of Testing, 2, 199–215.

Ericsson, K. A., & Simon, H. A. (1984). Protocol analysis: Verbal reports as data. Cambridge, MA: MIT Press.

Ericsson, K. A., & Smith, J. (1991). Prospects and limits of the empirical study of expertise: An introduction. In K. A. Ericsson & J. Smith (Eds.), Toward a general theory of expertise: Prospects and limits (pp. 1–38). Cambridge: Cambridge University Press.

European Commission. (2006/2007). The education system in Sweden. Eurybase, Information Database on Education Systems in Europe.

European Commission. (2007/2008). The education system in Finland. Eurybase, Information Database on Education Systems in Europe.

Falk, B. (2001). Professional learning through assessment. In A. Lieberman & L. Miller (Eds.), Teachers caught in the action: The work of professional development. New York, NY: Teachers College Press.

Falk, B., & Darling-Hammond, L. (1993). The primary language record at P.S. 261: How assessment transforms teaching and learning. New York, NY: National Center for Restructuring Education, Schools, and Teaching.

Falk, B., & Darling-Hammond, L. (2010). Documentation and democratic education. Theory into Practice, 49(1), 72–81.

Falk, B., MacMurdy, S., & Darling-Hammond, L. (1995). Taking a different look: How the primary language record supports teaching for diverse learners. New York, NY: National Center for Restructuring Education, Schools, and Teaching.

Falk, B., & Ort, S. (1998). Sitting down to score: Teacher learning through assessment. Phi Delta Kappan, 80(1), 59–64.

Falk, B., and Associates. (1999). The early literacy profile: An assessment instrument. New York, NY: New York State Education Department.

Fenster, M. (1996, April). An assessment of “middle” stakes educational accountability: The case of Kentucky. Paper presented at the annual meeting of the American Educational Research Association, New York, NY.

Ferrara, S. F. (1987, April). Practical considerations in equating a direct writing assessment required for high school graduation. Paper presented at the annual meeting of the American Educational Research Association, Washington, DC.

Ferrara, S. (2009, December 10–11). The Maryland school performance assessment program (MSPAP) 1991–2002: Political considerations. Presentation at the National Research Council workshop, Best Practices in State Assessment. Retrieved from http://www7.nationalacademies.org/bota/Workshop_1_Presentations.html

Fiderer, A. (1993). Teaching writing: A workshop approach. New York, NY: Scholastic Professional Books.

Fiderer, A. (1995). Practical assessments for literature-based reading classrooms. New York, NY: Scholastic.

Fiderer, A. (2009). Performance assessment for reading (performance task item, modified from Cobblestone, April–May 1987). Retrieved from http://www.teacher.scholastic.com/professional/assessment/readingassess.htm

Finnish Matriculation Examination. (2008). Retrieved from http://www.ylioppilastutkinto.fi/en/index.html

Finnish National Board of Education. (2007, November 12). Background for Finnish PISA success. Retrieved from http://www.oph.fi/english/SubPage.asp?path=447,65535,77331

Finnish National Board of Education. (2008a, April 30). Teachers. Retrieved from http://www.oph.fi/english/page.asp?path=447,4699,84383

Finnish National Board of Education. (2008b, June 10). Basic education. Retrieved from http://www.oph.fi/english/page.asp?path=447,4699,4847

Finnish National Board of Education. (n.d.). Background for Finnish PISA success. Retrieved from http://www.oph.fi/english/SubPage.asp?path=447,65535,77331

Fiore, L., & Suárez, S. C. (Eds.). (2010). Observation, documentation, and reflection to create a culture of inquiry. Theory into Practice, 49(1).

Firestone, W. A., Mayrowetz, D., & Fairman, J. (1998). Performance-based assessment and instructional change: The effects of testing in Maine and Maryland. Educational Evaluation and Policy Analysis, 20(2), 95–113.

Flexer, R. J. (1991, April). Comparisons of student mathematics performance on standardized and alternate measures in high-stakes contexts. Paper presented at the annual meeting of the American Educational Research Association, Chicago, IL.

Foster, D., Noyce, P., & Spiegel, S. (2007). When assessment guides instruction: Silicon Valley’s Mathematics Assessment Collaborative. Assessing Mathematical Proficiency, 53, 137–154.

Frederiksen, J. R., & Collins, A. (1989). A systems approach to educational testing. Educational Researcher, 18(9), 27–32.

Frederiksen, J. R., & Collins, A. (1996). Designing an assessment system for the workplace of the future. In L. B. Resnick, J. Wirt, & D. Jenkins (Eds.), Linking school and work: Roles for standards and assessment (pp. 193–221). San Francisco, CA: Jossey-Bass.

Frederiksen, J. R., & White, B. Y. (1997). Cognitive facilitation: A method for promoting reflective collaboration. In Proceedings of the Second International Conference on Computer Support for Collaborative Learning (pp. 53–62). Toronto: University of Toronto.

Frederiksen, N. (1984). The real test bias. American Psychologist, 39(3), 193–202.

Freedman, S. W., & Calfee, R. C. (1983). Holistic assessment of writing: Experimental design and cognitive theory. In P. Mosenthal, L. Tamor, & S. A. Walmsley (Eds.), Research on writing: Principles and methods (pp. 75–98). New York, NY: Longman.

Gabrielson, S., Gordon, B., & Engelhard, G. (1995). The effects of task choice on the quality of writing obtained in a statewide assessment. Applied Measurement in Education, 8(4), 273–290.

Gao, X., Shavelson, R. J., & Baxter, G. P. (1994). Generalizability of large-scale performance assessments in science: Promises and problems. Applied Measurement in Education, 7, 323–334.

Gearhart, M., Herman, J. L., Baker, E. L., & Whittaker, A. (1993). Whose work is it? A question for the validity of large-scale portfolio assessment (CSE Technical Report No. 363). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

General Accounting Office. (2003). Title I—Characteristics of tests will influence expenses: Information sharing may help states realize efficiencies. Washington, DC: Author.

Glaser, R. (1990a, October). Testing and assessment: O tempora! O mores! Paper presented at the 31st Horace Mann Lecture at the University of Pittsburgh, Pittsburgh, PA.

Glaser, R. (1990b). Toward new models for assessment. International Journal of Educational Research, 14(5), 475–483.

Glaser, R., Lesgold, A., & Lajoie, S. (1987). Toward a cognitive theory for the measurement of achievement. In R. R. Ronning, J. A. Glover, J. C. Conoley, & J. C. Witt (Eds.), The influence of cognitive psychology on testing (pp. 41–85). Hillsdale, NJ: Erlbaum.

Goldberg, G. L., & Roswell, B. S. (2000). From perception to practice: The impact of teachers’ scoring experience on performance-based instruction and classroom practice. Educational Assessment, 6(4), 257–290.

Goldberg, G. L., & Roswell, B. S. (2001). Are multiple measures meaningful? Lessons learned from a statewide performance assessment. Applied Measurement in Education, 14(2), 125–150.

Goldschmidt, P., Martinez, J. F., Niemi, D., & Baker, E. L. (2007). Relationships among measures as empirical evidence of validity: Incorporating multiple indicators of achievement and school context. Educational Assessment, 12(3&4), 239–266.

Gong, B. (2009, December 10–11). Innovative assessment in Kentucky’s KIRIS system: Political considerations. Presentation at the National Research Council Best Practices in State Assessment workshop, Washington, DC.

Gordon Commission on the Future of Assessment in Education. (2013). A public policy statement. Princeton, NJ: Educational Testing Service.

Gotwals, A. W., & Songer, N. B. (2006). Cognitive predictions: BioKIDS implementation of the PADI assessment system (PADI Technical Report No. 10). Menlo Park, CA: SRI International.

Greeno, J. G. (1989). A perspective on thinking. American Psychologist, 44, 134–141.

Gyagenda, I. S., & Engelhard, G. (2010). Rater, domain, and gender influences on the assessed quality of student writing. In M. Garner, G. Engelhard, M. Wilson, & W. Fisher (Eds.), Advances in Rasch measurement (vol. 1, pp. 398–429). Maple Grove, MN: JAM Press.

Haertel, E. H. (1999). Performance assessment and education reform. Phi Delta Kappan, 80(9), 662–667.

Haertel, E. H., & Linn, R. L. (1996). Comparability. In G. W. Phillips (Ed.), Technical issues in large-scale performance assessment (NCES 96–802). Washington, DC: US Department of Education.

Hambleton, R. K., Impara, J., Mehrens, W., & Plake, B. S. (2000). Psychometric review of the Maryland School Performance Assessment Program (MSPAP). Psychometric Review Committee.

Hambleton, R. K., Jaeger, R. M., Koretz, D., Linn, R. L., Millman, J., & Phillips, S. E. (1995). Review of the measurement quality of the Kentucky Instructional Results Information System, 1991–1994. Frankfort: Office of Educational Accountability, Kentucky General Assembly.

Hambleton, R. K., & Swaminathan, H. (1985). Item response theory. Boston, MA: Kluwer-Nijhoff.

Hamilton, L. S. (1994). An investigation of students’ affective responses to alternative assessment formats. Paper presented at the annual meeting of the National Council on Measurement in Education, New Orleans, LA.

Hamilton, L. S., & Koretz, D. M. (2002). Tests and their use in test-based accountability systems. In L. S. Hamilton, B. M. Stecher, & S. P. Klein (Eds.), Making sense of test-based accountability in education (MR-1554-EDU). Santa Monica, CA: RAND.

Hamilton, L. S., Stecher, B. M., & Klein, S. P. (2002). Making sense of test-based accountability in education. Santa Monica, CA: RAND.

Hamilton, L., Stecher, B., & Yuan, K. (2008). Standards-based reform in the United States: History, research, and future directions. Washington, DC: Center on Education Policy, RAND Corporation.

Hardy, R. A. (1995). Examining the cost of performance assessment. Applied Measurement in Education, 8(2), 121–134.

Hartman, W. T. (2002). School district budgeting. Washington, DC: Association of School Business Officials, International.

Hatch, T. (2013). Lessons from New York City’s Local Measures Project. New York, NY: National Center for Restructuring Education, Schools, and Teaching.

Heilig, J. V., & Darling-Hammond, L. (2008). Accountability Texas style: The progress and learning of urban minority students in a high-stakes testing context. Educational Evaluation and Policy Analysis, 30(2), 75–110.

Heppen, J., Jones, W., Faria, A., Sawyer, K., Lewis, S., Horwitz, A., . . . Casserly, M. (2012). Using data to improve instruction in the Great City Schools: Documenting current practice. Washington, DC: American Institutes for Research and the Council of the Great City Schools.

Herl, H. E., O’Neil, H. F., Jr., Chung, G.K.W.K., Bianchi, C., Wang, S., Mayer, R., . . . Tu, A. (1999). Final report for validation of problem solving measures (CSE Technical Report No. 5). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Herman, J. L., & Golan, S. (n.d.). Effects of standardized testing on teachers and learning—another look (CSE Technical Report No. 334). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Herman, J. L., Klein, D.C.D., Heath, T. M., & Wakai, S. T. (1991). A first look: Are claims for alternative assessment holding up (CSE Technical Report No. 391). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Herman, J. L., & Linn, R. L. (2013). On the road to assessing deeper learning: The status of Smarter Balanced and PARCC assessment consortia (CRESST Report No. 823). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Hersh, R. (2009). Teaching to a test worth teaching to: In college and high school. Retrieved December 18, 2009, from http://www.cae.org/content/pro_collegework.htm

Hiebert, E. H. (1991, April). Comparisons of student reading performance on standardized and alternative measures in high-stakes contexts. Paper presented at the annual meeting of the American Educational Research Association, Chicago, IL.

Hieronymus, A. N., & Hoover, H. D. (1987). Iowa tests of basic skills: Writing supplement teacher’s guide. Chicago, IL: Riverside.

Higgins, D., Burstein, J., & Attali, Y. (2006). Identifying off-topic student essays without topic-specific training data. Natural Language Engineering, 12(2), 145–159.

Hill, R., & Reidy, E. (1993). The cost factors: Can performance based assessment be a sound investment? Manuscript submitted for publication.

Himley, M., & Carini, P. (2000). From another angle: The Prospect Center’s descriptive review of the child. New York, NY: Teachers College Press.

HKEAA. (2007, January 28). School-based assessment: Changing the assessment culture. Retrieved from http://www.hkeaa.edu.hk/en/hkdse/School_based_Assessment/SBA/8ry

Hoff, D. (2002, April 3). Md. to phase out innovative program. Education Week. Retrieved from http://www.edweek.org/ew/articles/2002/04/03/29mspap.h21.html

Hong Kong Examinations and Assessment Authority. (2007). Introduction. In 2007 annual report. Retrieved from http://eant01.hkeaa.edu.hk/hkea/redirector.asp?p_direction=body&p_clickurl=http%3A%2F%2Fwww%2Ehkeaa%2Eedu%2Ehk%2Fen%2Fannual%5Freport%2Ehtml

Hong Kong Examinations and Assessment Authority. (2009). School-based assessment: Changing the assessment culture. Retrieved from http://www.hkeaa.edu.hk/en/hkdse/School_based_Assessment/SBA/

Hoover, H. D., & Bray, G. B. (1995). The research and development phase: Can a performance assessment be cost-effective? Paper presented at the annual meeting of the American Educational Research Association, San Francisco, CA.

Hoxby, C. (2002). The cost of accountability (NBER Working Paper No. 8855). Cambridge, MA: National Bureau of Economic Research. Retrieved from http://www.nber.org/papers/w8855

Huot, B. (1990). The literature of direct writing assessments: Major concerns and prevailing trends. Review of Educational Research, 60(2), 237–263.

Hymes, D. L. (1991). The changing face of testing and assessment (Critical Issues Report Stock No. 021–00338). Arlington, VA: American Association of School Administrators.

Illinois Department of Education. (2011). Action report 0711–4209: Purchase of NWEA/MAP assessments for grades k-8. Springfield, IL: Author. Retrieved from http://www.boarddocs.com/il/d365u/Board.nsf/files/8JHLSR57FE4F/$file/Measure%20of%20Academic%20Progress%20(MAP).pdf

International Baccalaureate Organization. (2005, November). IB Diploma Programme: English A1—higher level—paper 2. Retrieved from http://www.ibo.org/diploma/curriculum/examples/samplepapers/documents/gp1_englisha1hl2.pdf

International Baccalaureate Organization. (2006, May). IB Diploma Programme: Mathematics—standard level—paper 2. Retrieved from http://www.ibo.org/diploma/curriculum/examples/samplepapers/documents/gp5_mathssl2.pdf

International Baccalaureate Organization. (2008). Diploma program assessment: Methods. Retrieved from http://www.ibo.org/diploma/assessment/methods/

Jonsson, A., & Svingby, G. (2007). The use of scoring rubrics: Reliability, validity and educational consequences. Educational Research Review, 2(2), 130–144.

Kaftandjieva, F., & Takala, S. (2002). Relating the Finnish matriculation examination English test results to the CEF scales. Presented at the Helsinki Seminar, University of Sofia & University of Jyväskylä.

Kahl, S. (2008, June). The assessment of 21st-century skills: Something old, something new, something borrowed. Paper presented at the Council of Chief State School Officers 38th National Conference on Student Assessment, Orlando, FL.

Kahl, S., Abeles, S., & Baron, J. (1985, May). Results of the 1984–85 Connecticut assessment of educational progress in science: Implications for improving local science programs. Paper presented at the National Science Teachers Association area meeting, Hartford, CT.

Kamata, A., & Tate, R. L. (2005). The performance of a method for the long-term equating of mixed-format assessment. Journal of Educational Measurement, 42, 193–213.

Kane, M. T. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed.). Westport, CT: American Council on Education and Praeger.

Kane, M., Crooks, T., & Cohen, A. (1999). Validating measures of performance. Educational Measurement: Issues and Practice, 18(2), 5–17.

Kates, L. (2011). Toward meaningful assessment: Lessons from five first-grade classrooms (Occasional Paper No. 26). New York, NY: Bank Street College.

Kaur, B. (2005). Assessment of mathematics in Singapore schools: The present and future. Singapore: National Institute of Education.

Keiper, S., Sandene, B. A., Persky, H. R., & Kuang, M. (2009). The nation’s report card: Arts 2008 music and visual arts (NCES 2009–488). Washington, DC: National Center for Education Statistics, Institute of Education Sciences, US Department of Education.

Kentucky Department of Education. (1997). KIRIS accountability cycle 2 technical manual (Technical report). Dover, NH: Author.

Kentucky Department of Education. (2008). Fact sheet: Reconsidering myths surrounding writing instruction and assessment in Kentucky. Retrieved from http://www.education.ky.gov/kde/instructional+resources/literacy/kentucky+writing+program/fact+sheet+-+reconsidering+myths+surrounding+writing+instruction+and+assessment+in+kentucky.htm

Kentucky Department of Education. (2009). On-demand writing released prompts in grades 5, 8, and 12. Retrieved from http://www.education.ky.gov/kde/administrative+resources/testing+and+reporting+/district+support/link+to+released+items/on-demand+writing+released+prompts.htm

Khattri, N., Kane, M., & Reeve, A. (1995). How performance assessments affect teaching and learning. Educational Leadership, 53(3), 80–83.

Kim, S., Walker, M. E., & McHale, F. (2008a, May). Equating of mixed-format tests in large-scale assessments (ETS Research Report No. 08–26). Princeton, NJ: ETS.

Kim, S., Walker, M. E., & McHale, F. (2008b, October). Comparisons among designs for equating constructed-response items (ETS Research Report No. 08–53). Princeton, NJ: ETS.

Kirst, M., & Mazzeo, C. (1996, April). The rise, fall and rise of state assessment in California, 1993–1996. Paper presented at the annual meeting of the American Educational Research Association, New York, NY.

Klein, S. (2008). Characteristics of hand and machine-assigned scores to college students’ answers to open-ended tasks. In D. Nolan & T. Speed (Eds.), Probability and statistics: Essays in honor of David A. Freeman (vol. 2, pp. 76–89). Institute of Mathematical Statistics.

Klein, S., Benjamin, R., Shavelson, R., & Bolus, R. (2007). The collegiate learning assessment: Facts and fantasies. Evaluation Review, 31(5), 415–439.

Klein, S., Freedman, D., Shavelson, R., & Bolus, R. (2008). Assessing school effectiveness. Evaluation Review, 32, 511–525.

Klein, S. P., Jovanovic, J., Stecher, B. M., McCaffrey, D., Shavelson, R. J., Haertel, E., . . . Comfort, K. (1997). Gender and racial/ethnic differences on performance assessments in science. Educational Evaluation and Policy Analysis, 19(2), 83–97.

Klein, S., Liu, O. L., Sconing, J., Bolus, R., Bridgeman, B., Kugelmass, H., . . . Steedle, J. (2009). Test Validity Study (TVS) Report. Supported by the Fund for the Improvement of Postsecondary Education. Retrieved from http://www.cae.org/content/pdf/TVS_Report.pdf

Klein, S. P., McCaffrey, D., Stecher, B., & Koretz, D. (1995). The reliability of mathematics portfolio scores: Lessons from the Vermont experience. Applied Measurement in Education, 8(3), 243–260.

Klein, S. P., Stecher, B. M., Shavelson, R. J., McCaffrey, D., Ormseth, T., Bell, R. M., . . . Othman, A. R. (1998). Analytic versus holistic scoring of science performance tasks. Applied Measurement in Education, 11(2), 121–138.

Klein, S., Steedle, J., & Kugelmass, H. (2009). CLA Lumina longitudinal study summary findings. New York: Council for Aid to Education.

Kolen, M. J., & Brennan, R. L. (2004). Test equating, scaling and linking: Methods and practices (2nd ed.). New York, NY: Springer.

Koretz, D., & Barron, S. I. (1998). The validity of gains on the Kentucky Instructional Results Information System (KIRIS). Santa Monica, CA: RAND.

Koretz, D., Barron, S., Klein, S., & Mitchell, K. (1996). Perceived effects of the Maryland School Performance Assessment Program (CSE Technical Report). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Koretz, D., Barron, S., Mitchell, K., & Stecher, B. (1996). Perceived effects of the Kentucky Instructional Results Information System (KIRIS). Santa Monica, CA: RAND.

Koretz, D., Klein, S. P., McCaffrey, D. F., & Stecher, B. M. (1993). Interim report: The reliability of Vermont portfolio scores in the 1992–93 school year. Santa Monica, CA: RAND. Retrieved from http://www.rand.org/pubs/reprints/RP260

Koretz, D. M., Linn, R. L., Dunbar, S. B., & Shepard, L. A. (1991, April). The effects of high-stakes testing on achievement: Preliminary findings about generalization across tests. Paper presented at the annual meeting of the American Educational Research Association, Chicago, IL.

Koretz, D., McCaffrey, D., & Hamilton, L. (2001). Toward a framework for validating gains under high-stakes conditions (CSE Technical Report No. 551). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Koretz, D., Stecher, B., Klein, S., & McCaffrey, D. (1994). The Vermont Portfolio Assessment Program: Findings and implications. Educational Measurement: Issues and Practice, 13(3), 5–16.

Korpela, S. (2004, December). The Finnish school: A source of skills and well-being: A day at Stromberg Lower Comprehensive School. Retrieved from http://virtual.finland.fi/netcomm/news/showarticle.asp?intNWSAID=30625

Landauer, T. K., Foltz, P. W., & Laham, D. (1998). An introduction to latent semantic analysis. Discourse Processes, 25(2–3), 259–284.

Landauer, T. K., Laham, D., & Foltz, P. W. (2003). Automated scoring and annotation of essays with the Intelligent Essay Assessor. In M. D. Shermis & J. C. Burstein (Eds.), Automated essay scoring: A cross-disciplinary perspective (pp. 87–112). Mahwah, NJ: Erlbaum.

Lane, S. (1993). The conceptual framework for the development of a mathematics performance assessment instrument. Educational Measurement: Issues and Practice, 12(3), 16–23.

Lane, S. (2011). Issues in the design and scoring of performance assessments that assess complex thinking skills. In G. Schraw (Ed.), Assessment of higher order thinking skills. Charlotte, NC: Information Age Publishing.

Lane, S., Liu, M., Ankenmann, R. D., & Stone, C. A. (1996). Generalizability and validity of a mathematics performance assessment. Journal of Educational Measurement, 33(1), 71–92.

Lane, S., Parke, C. S., & Stone, C. A. (2002). The impact of a state performance-based assessment and accountability program on mathematics instruction and student learning: Evidence from survey data and school performance. Educational Assessment, 8(4), 279–315.

Lane, S., Silver, E. A., Ankenmann, R. D., Cai, J., Finseth, C., Liu, M., . . . Zhu, Y. (1995). QUASAR Cognitive Assessment Instrument (QCAI). Pittsburgh, PA: University of Pittsburgh, Learning Research and Development Center.

Lane, S., & Stone, C. A. (2006). Performance assessments. In R. L. Brennan (Ed.), Educational measurement (4th ed.). Westport, CT: American Council on Education and Praeger.

Lane, S., Stone, C. A., Ankenmann, R. D., & Liu, M. (1995). Examination of the assumptions and properties of the graded item response model: An example using a mathematics performance assessment. Applied Measurement in Education, 8, 313–340.

Lane, S., Stone, C. A., Parke, C. S., Hansen, M. A., & Cerrillo, T. L. (2000, April). Consequential evidence for MSPAP from the teacher, principal and student perspective. Paper presented at the annual meeting of the National Council on Measurement in Education, New Orleans, LA.

Lane, S., Wang, N., & Magone, M. (1996). Gender related DIF on a middle school mathematics performance assessment. Educational Measurement: Issues and Practice, 15, 21–27, 31.

Laukkanen, R. (2008). Finnish strategy for high-level education for all. In N. C. Soguel & P. Jaccard (Eds.), Governance and performance of education systems. New York, NY: Springer.

Lavonen, J. (2008). Reasons behind Finnish students’ success in the PISA Scientific Literacy Assessment. University of Helsinki, Finland. Retrieved from http://www.oph.fi/info/finlandinpisastudies/conference2008/science_results_and_reasons.pdf

Lawrenz, F., Huffman, D., & Welch, W. (2000). Considerations based on a cost analysis of alternative test formats in large scale science assessments. Journal of Research in Science Teaching, 37(6), 615–626.

Leacock, C., & Chodorow, M. (2003). C-rater: Automated scoring of short answer questions. Computers and the Humanities, 37(4), 389–405.

Leacock, C., & Chodorow, M. (2004). A pilot study of automated scoring of constructed responses. Paper presented at the 30th Annual International Association of Educational Assessment Conference, Philadelphia, PA.

Lee, V., Smith, J. B., & Croninger, R. G. (1995). Another look at high school restructuring. Issues in Restructuring Schools, 9 (Fall), 1–10.

Levin, H. M., & McEwan, P. J. (2000). Cost-effectiveness analysis (2nd ed.). Thousand Oaks, CA: Sage.

Lieberman, A., & Miller, L. (Eds.). (2001). Teachers caught in the action: The work of professional development. New York, NY: Teachers College Press.

Linn, R. L. (1993). Educational assessment: Expanded expectations and challenges. Educational Evaluation and Policy Analysis, 15, 1–16.

Linn, R. L. (2000). Assessment and accountability. Educational Researcher, 29(2), 4–16.

Linn, R. L., Baker, E. L., & Betebenner, D. W. (2002). Accountability systems: Implications of requirements of the No Child Left Behind Act of 2001. Educational Researcher, 31(6), 3–16.

Linn, R. L., Baker, E. L., & Dunbar, S. B. (1991). Complex, performance-based assessments: Expectation and validation criteria. Educational Researcher, 20(8), 15–21.

Linn, R. L., & Burton, E. (1994). Performance-based assessment: Implications of task specificity. Educational Measurement: Issues and Practice, 13(1), 5–8.

Linn, R. L., Burton, E., DeStefano, L., & Hanson, M. (1996). Generalizability of New Standards Project 1993 pilot study tasks in mathematics. Applied Measurement in Education, 9(3), 201–214.

Little, J. W. (1993). Teachers’ professional development in a climate of educational reform. New York, NY: National Center for Restructuring Education, Schools, and Teaching.

Little, J. W. (1999). Organizing schools for teacher learning. In L. Darling-Hammond & G. Sykes (Eds.), Teaching as the learning profession: Handbook of policy and practice (pp. 233–262). San Francisco, CA: Jossey-Bass.

Little, J. W., Curry, M., Gearhart, M., & Kafka, J. (2003). Looking at student work for teacher learning, teacher community, and school reform. Phi Delta Kappan, 85(5), 184–192.

Liu, O. L., Lee, H. C., Hofstetter, C., & Linn, M. C. (2008). Assessing knowledge integration in science: Constructs, measures, and evidence. Educational Assessment, 13(1), 33–55.

Lloyd-Jones, R. (1977). Primary trait scoring. In C. R. Cooper & L. Odell (Eds.), Evaluating writing: Describing, measuring, and judging (pp. 33–60). Urbana, IL: National Council of Teachers of English.

Lumley, T. (2005). Assessing second language writing: The rater’s perspective. Frankfurt: Lang.

Lyman, P., & Varian, H. R. (2003). How much information. Berkeley: School of Information Management and Systems, University of California, Berkeley. Retrieved from http://www.sims.berkeley.edu/how-much-info-2003/

Madaus, G. F., & O’Dwyer, L. M. (1999). A short history of performance assessment: Lessons learned. Phi Delta Kappan, 688–695.

Madaus, G. F., West, M. M., Harmon, M. C., Lomax, R. G., & Viator, K. A. (1992). The influence of testing on teaching mathematics and science in grades 4–12: Executive summary. Chestnut Hill, MA: Center for the Study of Testing, Evaluation, and Educational Policy, Boston College.

Martinez, M. E., & Katz, I. R. (1996). Cognitive processing requirements of constructed figural response and multiple-choice items in architecture assessment. Educational Assessment, 3(1), 83–98.

Maryland State Board of Education. (1995). Maryland school performance report: State and school systems. Baltimore, MD: Author.

Maryland State Department of Education. (1990). Technical report: Maryland Writing Test, Level II. Baltimore: Author. Retrieved from http://www.marces.org/mdarch/htm/M031987.HTM

Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149–174.

Mathematics Assessment Resource Service. (2000). Balanced assessment for the mathematics curriculum. Upper Saddle River, NJ: Dale Seymour Publications.

McBee, M. M., & Barnes, L. L. (1998). Generalizability of a performance assessment measuring achievement in eighth-grade mathematics. Applied Measurement in Education, 11(2), 179–194.

McCain, T., & Jukes, I. (2001). Windows on the future: Education in the age of technology. Thousand Oaks, CA: Corwin Press.

McDonald, J. P. (2001). Students’ work and teachers’ learning. In A. Lieberman & L. Miller (Eds.), Teachers caught in the action: Professional development that matters (pp. 209–235). New York, NY: Teachers College Press.

McDonnell, L. M. (1994). Assessment policy as persuasion and regulation. American Journal of Education, 102(4), 394–420.

McDonnell, L. M. (2004). Politics, persuasion and educational testing. Cambridge, MA: Harvard University Press.

McDonnell, L. M. (2009). Repositioning politics in education’s circle of knowledge. Educational Researcher, 38(6), 417–427.

McLaughlin, M. (2005). Listening and learning from the field: Tales of policy implementation and situated practice. In A. Lieberman (Ed.), The roots of educational change (pp. 58–72). New York, NY: Teachers College Press.

McNamara, T. F. (1996). Measuring second language performance. London: Longman.

Measured Progress. (2009). New England Common Assessment Program 2008–2009 technical report. Dover, NH: Author. Retrieved from http://www.ride.ri.gov/assessment/DOCS/NECAP/Tech_Manual/2008–09_TechReport/2008–09_NECAP_TechReport.pdf

Mehrens, W. A. (1992). Using performance assessment for accountability purposes. Educational Measurement: Issues and Practice, 11, 3–9, 20.

Meier, S. L., Rich, B. S., & Cady, J. (2006). Teachers’ use of rubrics to score non-traditional tasks: Factors related to discrepancies in scoring. Assessment in Education, 13(1), 69–95.

Meisels, S. J., Xue, Y., & Shamblott, M. (2008). Assessing language, literacy, and mathematics skills with “Work Sampling for Head Start.” Early Education and Development, 19(6), 963–981.

Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–104). New York, NY: American Council on Education and Macmillan.

Messick, S. (1994). The interplay of evidence and consequences in the validation of performance assessment. Educational Researcher, 23(2), 12–23.

Messick, S. (1996). Validity of performance assessments. In G. W. Phillips (Ed.), Technical issues in large-scale performance assessment (pp. 1–18). Washington, DC: National Center for Educational Statistics.

Miller, M. D., & Crocker, L. (1990). Validation methods for direct writing assessment. Applied Measurement in Education, 3(3), 285–296.

Miller, M. D., & Linn, R. L. (2000). Validation of performance-based assessments. Applied Psychological Measurement, 24(4), 367–378.

Mishan, E. J., & Quah, E. (2007). Cost-benefit analysis (5th ed.). New York, NY: Routledge.

Mislevy, R. J. (1993). Foundations of a new test theory. In N. Frederiksen, R. J. Mislevy, & I. Bejar (Eds.), Test theory for a new generation of tests (pp. 19–39). New York, NY: Routledge.

Mislevy, R. J. (1996). Test theory reconceived. Journal of Educational Measurement, 33(4), 379–416.

Mislevy, R. J., & Haertel, G. D. (2006). Implications of evidence-centered design for educational testing. Educational Measurement: Issues and Practice, 25(4), 6–20.

Mislevy, R. J., Steinberg, L. S., & Almond, R. G. (2002). Design and analysis in task-based language assessment (CSE Report No. 597). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Mislevy, R. J., Steinberg, L. S., & Almond, R. G. (2003). On the structure of educational assessments. Measurement: Interdisciplinary Research and Perspectives, 1(1), 3–62.

Mislevy, R. J., Steinberg, L. S., Breyer, F. J., Almond, R. G., & Johnson, L. (2002). Making sense of data from complex assessments. Applied Measurement in Education, 15(4), 363–390.

Monk, D. H. (1990). Educational finance: An economic approach. New York, NY: McGraw-Hill.

Monk, D. H. (1995). The costs of pupil performance assessment: A summary report. Journal of Education Finance, 20(4), 363–371.

Moss, P. A., Girard, B., & Haniford, L. (2006). Validity in educational assessment. Review of Research in Education, 30, 109–162.

Mullis, I.V.S. (1984). Scoring direct writing assessments: What are the alternatives? Educational Measurement: Issues and Practice, 3(1), 16–18.

Murad, L. C. (2008). Hong Kong’s education system: Challenge for the future. Retrieved from http://www.lehigh.edu/~incntr/publications/perspectives/v20/Murad.pdf

Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16, 159–176.

Murnane, R., & Levy, F. (1996). Teaching the new basic skills: Principles for educating children to thrive in a changing economy. New York, NY: Free Press.

Myford, C. M., & Mislevy, R. J. (1995). Monitoring and improving a portfolio assessment system (Center for Performance Assessment Research Report). Princeton, NJ: Educational Testing Service.

Nadeau, L., Richard, J.-F., & Godbout, P. (2008). The validity and reliability of a performance assessment procedure in ice hockey. Physical Education and Sport Pedagogy, 13(1), 65–83.

National Assessment of Educational Progress. (1987). Learning by doing: A manual for teaching and assessing higher-order thinking in science and mathematics (Report No. 17-HOS-80). Princeton, NJ: Educational Testing Service.

National Assessment of Educational Progress. (2009a). The nation’s report card. Retrieved from http://nces.ed.gov/nationsreportcard/

National Assessment of Educational Progress. (2009b). Writing framework for the 2011 National Assessment of Educational Progress. Retrieved from http://www.nagb.org/publications/frameworks.htm

National Association of State Boards of Education. (2009). Reform at a crossroads: A call for balanced systems of assessment and accountability. Arlington, VA: Author.

National Center on Education and the Economy. (2007). Tough choices, tough times: The report of the New Commission on Skills of the American Workforce. Washington, DC: Author.

National Center for Education Statistics. (1995, January). Windows into the classroom: NAEP’s 1992 writing portfolio study. Washington, DC: US Department of Education.

National Center for Education Statistics. (2005). National assessment of educational progress, 2007: Mathematics assessments. Washington, DC: US Department of Education, Institute of Education Sciences. Retrieved from http://nces.ed.gov/nationsreportcard/itmrlsx/search.aspx?subject=mathematics

National Commission on Testing and Public Policy. (1990). From gatekeeper to gateway: Transforming testing in America. Boston: Author.

National Council of Teachers of Mathematics. (2000). Principles and standards for school mathematics. Reston, VA: Author.

National Council on Education Standards and Testing. (1992, January 24). Raising standards for American education: A report to Congress, the Secretary of Education, the National Education Goals Panel, and the American people. Washington, DC: Government Printing Office.

National Research Council. (1996). National science education standards. Washington, DC: National Academy Press.

National Research Council. (2001). Knowing what students know: The science and design of educational assessment. Washington, DC: National Academies Press.

National Research Council. (2006). Systems for state science assessment. Washington, DC: National Academies Press.

National Research Council. (2008). Assessing accomplished teaching: Advanced-level certification programs. Washington, DC: National Academies Press.

Nelson, N., & Calfee, R. C. (1998). The reading-writing connection. In N. Nelson & R. C. Calfee (Eds.), The reading-writing connection. Chicago, IL: University of Chicago Press.

New England Common Assessment Program. (2009). 2009 test administrator manual—Grade 11 science. Retrieved from http://education.vermont.gov/new/pdfdoc/pgm_assessment/necap/manuals/science/admin_manual_09_grade_11.pdf

New Hampshire. Code of Administrative Rules-Education, 306.27(d) C.F.R. (2005).

New Hampshire Department of Education. (2005). New Hampshire code of administrative rules: Education, 306.27(d) C.F.R. (2005).

New Hampshire Department of Education. (2013). Enriching New Hampshire’s assessment and accountability systems through the Quality Performance Assessment Framework. Retrieved from http://www.education.nh.gov/assessment-systems/documents/executive-summary.pdf

New Jersey Department of Education. (2003). 2002–03 HSPA/SRA Mathematics, Performance Assessment Task. Trenton: New Jersey Department of Education.

New Jersey Department of Education. (2004). Mathematics: A rubric scoring handbook (PT No. 1504.30). Trenton: New Jersey Department of Education.

New Jersey Department of Education. (2005). October 2005 and March 2006 HSPA cycle I and cycle II score interpretation manual. Retrieved from http://www.state.nj.us/education/assessment/hs/sim.pdf

New Jersey Department of Education. (2008a). March 2009 high school proficiency assessment: Student preparation booklet. Retrieved from http://www.state.nj.us/counties/cumberland/0610/schools/distschools/senior/guidanceimages/HSPA%20Student%20Prep%20Booklet%20%2708.pdf

New Jersey Department of Education. (2008b). Special review assessment administration manual 2008–2009 school year. Retrieved from http://www.state.nj.us/education/assessment/hs/sra/man.pdf

New Jersey Department of Education. (2009). State board of education adopts revised high school graduation requirements and revised curriculum standards in six content areas. Retrieved from http://www.state.nj.us/education/news/2009/0617sboe.htm

New Jersey Department of Education. (2013). Alternative High School Assessment (AHSA) Administration Manual, 2013–2014 school year. Trenton, NJ: Author. Retrieved from http://www.state.nj.us/education/assessment/hs/sra/man.pdf

New York Commissioner of Education. (n.d.). Department-approved alternative examinations acceptable for meeting requirements for a local or Regents diploma. Retrieved from http://www.emsc.nysed.gov/osa/hsgen/list.pdf

New York State Education Department. (1987). History of Regents Examinations: 1865 to 1987. Retrieved from http://www.emsc.nysed.gov/osa/hsinfogen/hsinfogenarch/rehistory.htm

New York State Education Department. (1996, October 10). Report of the Technical Advisory Group for the New York State Assessment Project. Albany: Author.

New York State Education Department, Office of State Assessment. (2008). Regents Examinations, Regents competency tests, and second language proficiency examinations: School administrator’s manual. Retrieved from http://www.emsc.nysed.gov/osa/sam/secondary/sam08-pdf/nysed-sam08.pdf

New York State Education Department, Office of State Assessment. (2009, August 17). High school general information. Retrieved from http://www.emsc.nysed.gov/osa/hsgen.html

Newmann, F., Marks, H., & Gamoran, A. (1996). Authentic achievement: Restructuring schools for intellectual quality. San Francisco: Jossey-Bass.

Ng, P. T. (2008). Educational reform in Singapore: From quantity to quality. Educational Research for Policy and Practice, 7, 5–15.

Niemi, D., Baker, E. L., & Sylvester, R. M. (2007). Scaling up, scaling down: Seven years of performance assessment development in the nation’s second largest school district. Educational Assessment, 12(3&4), 195–214.

Niemi, D., Wang, J., Steinberg, D. H., Baker, E. L., & Wang, H. (2007). Instructional sensitivity of a complex language arts performance assessment. Educational Assessment, 12(3&4), 215–238.

No Child Left Behind Act of 2001, Pub. L. No. 107–110, 115 Stat. 1425 (2002).

NOCTI. (2009). Site coordinator guide for student assessment. Retrieved from http://www.nocti.org/PDFs/Coordinator_Guide_for_Student_Testing.pdf

Odden, A. R. (2009). Ten strategies for doubling student performance. Thousand Oaks, CA: Corwin Press.

Odden, A. R., & Archibald, S. J. (2009). Doubling student performance . . . and finding the resources to do it. Thousand Oaks, CA: Corwin Press.

Odden, A., Goetz, M., Archibald, S., Gross, B., Weiss, M., & Mangan, M. T. (2008). The cost of instructional improvement: Resource allocation in schools using comprehensive strategies to change classroom practice. Journal of Education Finance, 33(4), 381–405.

Odden, A. R., & Picus, L. O. (2014). School finance: A policy perspective (5th ed.). New York, NY: McGraw-Hill.

Odden, A., Picus, L. O., Archibald, S., Goetz, M., Mangan, M. T., & Aportela, A. (2007). Moving from good to great in Wisconsin: Funding schools adequately and doubling student performance. Madison: University of Wisconsin, Wisconsin Center for Education Research, Consortium for Policy Research in Education. Retrieved from http://www.wcer.wisc.edu/cpre/finance/WI%20March%201%202007%20Adequacy%20Report1.pdf

Odden, A., Picus, L. O., Archibald, S., & Smith, J. (2009). Wyoming school use of resources 2: Making more progress in identifying how schools use resources in ways that boost student performance on state tests. Retrieved from http://legisweb.state.wy.us/2008/interim/schoolfinance/Resources.pdf

Odden, A., Picus, L. O., & Goetz, M. (2006). Recalibrating the Arkansas school funding structure. North Hollywood, CA: Lawrence O. Picus and Associates.

O’Donnell, S. (2004, December). International review of curriculum and assessment frameworks: Qualifications and curriculum authority and National Foundation for Educational Research. Retrieved from http://www.inca.org.uk/pdf/comparative.pdf

Office of the Superintendent of Public Instruction. (2012). Summary of findings: 2011–12 OSPI-developed assessments social studies, the arts, health, fitness, and educational technology. Olympia, WA: OSPI. Retrieved from http://www.k12.wa.us/assessment/pubdocs/2011–12SummaryofFindings.pdf

Office of Technology Assessment. (1992). Testing in America’s schools: Asking the right questions (OTA-SET-519). Washington, DC: US Government Printing Office.

O’Reilly, T., & Sheehan, K. M. (2009, June). Cognitively-based assessment of, for, and as learning: A framework for assessing reading competency. Princeton, NJ: Educational Testing Service.

Organization for Economic Cooperation and Development. (2007). PISA 2006: Science Competencies for Tomorrow’s World. Volume 1: Analysis. Retrieved from http://www.pisa.oecd.org/dataoecd/30/17/39703267.pdf

Organization for Economic Cooperation and Development. (2012). What students know and can do: Student performance in mathematics, reading, and science. Paris: OECD.

Paek, P. L., & Foster, D. (2012). Improved mathematical teaching practices and student learning using complex performance assessment tasks. Paper presented at the annual meeting of the National Council on Measurement in Education, Vancouver, Canada.

Page, E. B. (1994). Computer grading of student prose, using modern concepts and software. Journal of Experimental Education, 62(2), 127–143.

Page, E. B. (2003). Project essay grade: PEG. In M. Shermis & J. Burstein (Eds.), Automated essay scoring: A cross-disciplinary perspective (pp. 43–54). Mahwah, NJ: Erlbaum.

Palm, T. (2008). Performance assessment and authentic assessment: A conceptual analysis of the literature. Practical Assessment, Research and Evaluation, 13(4). Retrieved from http://pareonline.net/getvn.asp?v=13&n=4

Parke, C. S., & Lane, S. (2008). Examining alignment between state performance assessments and mathematics classroom activities. Journal of Educational Research, 101(3), 132–146.

Parke, C. S., Lane, S., & Stone, C. A. (2006). Impact of a state performance assessment program in reading and writing. Educational Research and Evaluation, 12(3), 239–269.

Parker, C. E., Louie, J., & O’Dwyer, L. (2009). New measures of English language proficiency and their relationship to performance on large-scale content assessments (Issues and Answers Report, REL 2009–No. 066). Washington, DC: US Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory Northeast and Islands. Retrieved from http://ies.ed.gov/ncee/edlabs

Partnership for Assessment of Readiness for College and Careers. (n.d.). Grade 10 prose constructed response—Sample 1 from Literary Analysis Task. Retrieved from Parcconline.org

Patz, R. J. (1996). Markov chain Monte Carlo methods for item response theory models with applications for the National Assessment of Educational Progress. Unpublished manuscript, Carnegie Mellon University, Pittsburgh, PA.

Patz, R. J., Junker, B. W., Johnson, M. S., & Mariano, L. T. (2002). The hierarchical rater model for rated test items and its application to large-scale educational assessment data. Journal of Educational and Behavioral Statistics, 27(4), 341–384.

Pearson, P., Calfee, R., Walker Webb, P., & Fleischer, S. (2002). The role of performance-based assessments in large scale accountability systems: Lessons learned from the inside. Washington, DC: Council of Chief State School Officers.

Pecheone, R. L., & Chung, R. R. (2006). Evidence in teacher education: The performance assessment for California teachers (PACT). Journal of Teacher Education, 57(1), 22–36.

Pecheone, R., & Kahl, S. (2010). Through a looking glass: Lessons learned and future directions for performance assessment. Stanford, CA: Stanford Center for Opportunity Policy in Education.

Pecheone, R. L., & Kahl, S. (n.d.). Lessons from the United States for developing performance assessments. Unpublished manuscript.

Pellegrino, J., Chudowsky, N., & Glaser, R. (Eds.). (2001). Knowing what students know: The science and design of educational assessment. Washington, DC: National Academy Press.

Pellegrino, J. W., & Hilton, M. L. (Eds.). (2012). Education for life and work: Developing transferable knowledge and skills in the 21st century. Washington, DC: National Academies Press.

Peng, S. S., Wright, D., & Hill, S. T. (1995). Understanding racial-ethnic differences in secondary school science and mathematics achievement (NCES 95–710). Washington, DC: US Department of Education.

Performance Standards Consortium. (2013). Educating for the 21st century: Data report on the New York Performance Standards Consortium. New York, NY: Author. Retrieved from http://performanceassessment.org/articles/DataReport_NY_PSC.pdf

Petterson, A. (2008). The national tests and national assessment in Sweden. Stockholm, Sweden: PRIM gruppen. Retrieved from http://www.prim.su.se/artiklar/pdf/Sw_test_ICME.pdf

Pianta, R. C., La Paro, K., & Hamre, B. K. (2007). The classroom assessment scoring system—CLASS. Baltimore, MD: Brookes Publishing.

Picus, L. O. (1994). A conceptual framework for analyzing the costs of alternative assessment (Technical Report No. 384). Los Angeles: University of California, Center for Research on Student Standards, Evaluation and Testing. Retrieved from http://www.cse.ucla.edu/products/summary.asp?report=384

Picus, L. O., Odden, A., Aportela, A., Mangan, M. T., & Goetz, M. (2008). Implementing school finance adequacy: School level resource use in Wyoming following adequacy-oriented finance reform. North Hollywood, CA: Lawrence O. Picus and Associates.

Picus, L. O., & Tralli, A. (1998). Alternative assessment programs: What are the true costs? An analysis of the total costs of assessment in Kentucky and Vermont. Los Angeles: University of California, Center for Research on Student Standards, Evaluation and Testing. Retrieved from http://www.cse.ucla.edu/products/Reports/TECH441new.pdf

Picus, L. O., Tralli, A., & Tasheny, S. (1996). Estimating the costs of student assessment in North Carolina and Kentucky: A state level analysis (Technical Report No. 408). Los Angeles: University of California, Center for Research on Student Standards, Evaluation and Testing. Retrieved from http://www.cse.ucla.edu/products/summary.asp?report=408

Picus, L. O., and Associates. (2010). A strategic plan for the Little Rock schools.

Pinckney, E., & Taylor, G. (2006). Standards and assessment memorandum. Retrieved from http://education.vermont.gov/new/pdfdoc/pgm_curriculum/local_assessment/assessment_guidance_030106.pdf

Polikoff, M. S., Porter, A. C., & Smithson, J. (2011). How well aligned are state assessments of student achievement with state content standards? American Educational Research Journal, 48(4), 965–995.

Popham, W. J. (1999). Why standardized test scores don’t measure educational quality. Educational Leadership, 56(6), 8–15.

Popham, W. J. (2003). Living (or dying) with your NCLB tests. School Administrator, 60(11), 10–14.

Popham, W. J., Cruse, K. L., Rankin, S. C., Sandifer, P. D., & Williams, P. L. (1985). Measurement-driven instruction: It’s on the road. Phi Delta Kappan, 66(9), 628–634.

Powers, D. E., Burstein, J. C., Chodorow, M. S., Fowles, M. E., & Kukich, K. (2002). Stumping e-rater: Challenging the validity of automated scoring of essays. Journal of Educational Computing Research, 26, 407–425.

Quaglia Institute. (2008). My voice student report 2008. Portland, ME: Quaglia Institute for Student Aspirations.

Qualifications and Curriculum Authority. (2008a). Sweden: Assessment arrangements. Retrieved from http://www.inca.org.uk/690.html

Qualifications and Curriculum Authority. (2008b). England: Assessment arrangements. Retrieved from http://www.inca.org.uk/1315, http://education.qld.gov.au/corporate/newbasics/html/richtasks/richtasks.html

Qualifications and Curriculum Authority. (2009). Assessing pupils’ progress: Assessment at the heart of learning. Retrieved May 23, 2009, from http://www.qca.org.uk/libraryAssets/media/12707_Assessing_Pupils_Progress_leaflet_-_web.pdf

Queary, P. (2004, March 5). Senate passes WASL changes. Seattle Times.

Queensland Government. (2001). New basics: The why, what, how and when of rich tasks. Retrieved from http://education.qld.gov.au/corporate/newbasics/pdfs/richtasksbklet.pdf

Raizen, S., Baron, J. B., Champagne, A. B., Haertel, E., Mullis, I. V. S., & Oakes, J. (1989). Assessment in elementary school science education. Washington, DC: National Center for Improving Science Education.

Reckase, M. D. (1997). The past and future of multidimensional item response theory. Applied Psychological Measurement, 21, 25–36.

Resnick, L. B. (1987). Education and learning to think. Washington, DC: National Academy Press.

Resnick, L. (1995). Standards for education. In D. Ravitch (Ed.), Debating the future of American standards. Washington, DC: Brookings Institution.

Resnick, L. B., & Resnick, D. P. (1982). Assessing the thinking curriculum: New tools for educational reform. In B. R. Gifford & M. C. O’Connor (Eds.), Changing assessment: Alternative views of aptitude, achievement and instruction (pp. 37–55). Boston, MA: Kluwer.

Resnick, L. B., & Resnick, D. P. (1992). Assessing the thinking curriculum: New tools for educational reform. In B. R. Gifford & M. C. O’Connor (Eds.), Changing assessments: Alternative views of aptitude, achievement and instruction. Boston: Kluwer.

Rhode Island Department of Education. (2005). The Rhode Island high school diploma system: All kids well prepared for high-performing, bright futures. Retrieved from http://www.ride.ri.gov/HighSchoolReform/DOCS/PDFs/HIGH%20school%20reform/HSDiploma_v071405.pdf

Rhode Island Board of Regents for Elementary and Secondary Education. (2008). Regulations L-6–3.2. Retrieved from www.cps.k12.ri.us

Rhode Island Department of Education & Education Alliance at Brown University. (2005a). Required elements of an exhibition system. Retrieved from http://www.ride.ri.gov/HighSchoolReform/DSLAT/pdf/exh_040203.pdf

Rhode Island Department of Education & Education Alliance at Brown University. (2005b). Required graduation portfolio elements. Retrieved from http://www.ride.ri.gov/HighSchoolReform/DSLAT/pdf/por_040103.pdf

Rhoten, D., Carnoy, M., Chabran, M., & Elmore, R. (2003). The conditions and characteristics of assessment and accountability. In M. Carnoy, R. Elmore, & L. Siskin (Eds.), The new accountability: High schools and high stakes testing. New York, NY: Taylor & Francis.

Roid, G. H. (1994). Patterns of writing skills derived from cluster analysis of direct-writing assessments. Applied Measurement in Education, 7(2), 159–170.

Romberg, T. A., Zarinia, E. A., & Williams, S. R. (1989). The influence of mandated testing on mathematics instruction: Grade 8 teachers’ perceptions. Madison: National Center for Research in Mathematical Science Education, University of Wisconsin-Madison.

Rudner, L. M., Garcia, V., & Welch, C. (2006). An evaluation of the IntelliMetric℠ essay scoring system. Journal of Technology, Learning, and Assessment, 9(4), 1–21.

Rustique-Forrester, E. (2005). Accountability and the pressures to exclude: A cautionary tale from England. Education Policy Analysis Archives. Retrieved from http://epaa.edu/epaa/v13n26

Salahu-Din, D., Persky, H., & Miller, J. (2008). The nation’s report card: Writing 2007 (NCES 2008–468). Washington, DC: National Center for Education Statistics, Institute of Education Sciences, US Department of Education.

Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph (No. 17).

Samejima, F. (1996). The graded response model. New York, NY: Springer.

Sandene, B., Allen, N., Bennett, R. E., Braswell, J. R., Horkay, N., Kaplan, B., & Oranje, A. (2005). Online assessment in mathematics and writing: Reports from the NAEP technology-based assessment project, research and development series (NCES 2005–457). Washington, DC: US Department of Education, National Center for Education Statistics.

Schlafly, P. (2001). Dumbing down and developing diversity. Phyllis Schlafly Report, 34(8). Retrieved from http://www.eagleforum.org/psr/2005/mar05/psrmar05.html

Schleicher, A. (2009). International assessment of student learning outcomes. In L. Pinkus (Ed.), Meaningful measurement: The role of assessments in improving high school education in the twenty-first century. Washington, DC: Alliance for Excellent Education.

Schmidt, W. H., Wang, H. C., & McKnight, C. (2005). Curriculum coherence: An examination of US mathematics and science content standards from an international perspective. Journal of Curriculum Studies, 37(5), 525–559.

Seidel, S. S. (1998). Wondering to be done: The Collaborative Assessment Conference. In D. Allen (Ed.), Assessing student learning: From grading to understanding (pp. 21–39). New York, NY: Teachers College Press.

Shapley, K. S., & Bush, M. J. (1999). Developing a valid and reliable portfolio assessment in the primary grades: Building on practical experience. Applied Measurement in Education, 12(2), 111–132.

Shavelson, R. J. (2008). The collegiate learning assessment. Paper presented at the 2007 Ford Policy Forum: Forum for the Future of Higher Education, Aspen, CO. Retrieved from http://net.educause.edu/ir/library/pdf/fp085.pdf

Shavelson, R. J., Baxter, G. P., & Gao, X. (1993). Sampling variability of performance assessments. Journal of Educational Measurement, 30(3), 215–232.

Shavelson, R., Baxter, G., & Pine, J. (1991). Performance assessment in science. Applied Measurement in Education, 4(4), 347–362.

Shavelson, R. J., Baxter, G. P., & Pine, J. (1992). Performance assessments: Political rhetoric and measurement reality. Educational Researcher, 21(4), 22–27.

Shavelson, R. J., & Ruiz-Primo, M. A. (1998, November). On the assessment of science achievement conceptual underpinnings for the design of performance assessments: Report of year 2 activities (CSE Technical Report 481). Los Angeles: University of California, Center for Research on Evaluation, Standards, and Student Testing.

Shavelson, R., Ruiz-Primo, M., & Solano-Flores, G. (1998). Toward a science performance assessment technology. Evaluation and Program Planning, 21(2), 171–184.

Shavelson, R. J., Ruiz-Primo, M. A., & Wiley, E. W. (1999). Note on sources of sampling variability. Journal of Educational Measurement, 36(1), 61–71.

Shavelson, R. J., Ruiz-Primo, M. A., & Wiley, E. W. (2005). Windows into the mind. Higher Education, 49, 413–430.

Sheingold, K., Heller, J. I., & Paulukonis, S. T. (1995). Actively seeking evidence: Teacher change through assessment development (Report MS No. 94–04). Princeton, NJ: Educational Testing Service.

Sheingold, K., Heller, J. I., & Storms, B. A. (1997, April). On the mutual influence of teachers’ professional development and assessment quality in curricular reform. Paper presented at the annual meeting of the American Educational Research Association, Chicago.

Shepard, L. A. (1991). Psychometricians’ beliefs about learning. Educational Researcher, 20(7), 2–16.

Shepard, L. A. (2002). The hazards of high stakes testing. Issues in Science and Technology, 19(2), 53–58.

Shepard, L. A. (2008). Formative assessment: Caveat emptor. In C. Dwyer (Ed.), The future of assessment: Shaping teaching and learning (pp. 279–303). Mahwah, NJ: Erlbaum.

Shepard, L. A., & Dougherty, K. C. (1991). Effects of high-stakes testing on instruction. Paper presented at the annual meeting of the American Educational Research Association and the National Council on Measurement in Education, Chicago, IL.

Shepard, L. A., Flexer, R. J., Hiebert, E. H., Marion, S. F., Mayfield, V., & Weston, T. J. (1995). Effects of introducing classroom performance assessments on student learning (CSE Technical Report No. 394). Boulder: Center for Research on Evaluation, Standards, and Student Testing and University of Colorado at Boulder.

Shulte, B. (2002, February 4). MSPAP grading shocked teachers. Washington Post.

Shyer, C. (2009). August 2009 Regents Examinations and Regents competency tests. Retrieved from http://www.emsc.nysed.gov/osa/08–09memo/jun-aug-09/724/563–809.pdf

Silva, E. (2008). Measuring the skills of the 21st century. Washington, DC: Education Sector.

Simon, H. A., & Chase, W. G. (1973). Skill in chess. American Scientist, 61, 394–403.

Singapore Examinations and Assessment Board. (2006). 2006 A-level examination. Singapore: Author.

Singapore Ministry of Education. (2007). Retrieved from http://www.moe.gov.sg/corpora/mission_statement.htm

Smith, C. L., Wiser, M., Anderson, C. W., & Krajcik, J. (2006). Implications of research on children’s learning for standards and assessment: A proposed learning progression for matter and the atomic-molecular theory. Measurement: Interdisciplinary Research and Perspectives, 14(1&2), 1–98.

Snyder, T. D., & Dillow, S. A. (2010). Digest of education statistics 2009 (NCES 2010–013). Washington, DC: National Center for Education Statistics, Institute of Education Sciences, US Department of Education.

Solano-Flores, G. (2008). Who is given tests in what language by whom, when, and where? The need for probabilistic views of language in the testing of English language learners. Educational Researcher, 37(4), 189–199.

Solano-Flores, G., Jovanovic, J., Shavelson, R. J., & Bachman, M. (2001). On the development and evaluation of a shell for generating science performance assessments. International Journal of Science Education, 21(3), 293–315.

Solano-Flores, G., & Li, M. (2006). The use of generalizability (G) theory in the testing of linguistic minorities. Educational Measurement: Issues and Practice, 25(1), 13–22.

Solano-Flores, G., & Trumbull, E. (2003). Examining language in context: The need for new research and practice paradigms in the testing of English-language learners. Educational Researcher, 32(2), 3–13.

Spector, J. M. (2006). A methodology for assessing learning in complex and ill-structured task domains. Innovations in Education and Technology International, 43(2), 109–120.

Stage, E. K. (2005, Winter). Why do we need these assessments? Natural Selection: Journal of the BSCS, 11–13.

Stanford Center for Assessment, Learning and Equity. (2009). How things work. A physics performance task. Stanford, CA: SCALE.

Stecher, B. (1995). The cost of performance assessment in science: The RAND perspective. Paper presented at the annual meeting of the National Council on Measurement in Education, San Francisco, CA.

Stecher, B. (2002). Consequences of large-scale, high-stakes testing on school and classroom practices. In L. S. Hamilton, B. M. Stecher, & S. P. Klein (Eds.), Making sense of test-based accountability (MR-1554-EDU). Santa Monica, CA: RAND.

Stecher, B., Barron, S., Chun, T., & Ross, K. (2000, August). The effects of the Washington State education reform in schools and classrooms (CSE Technical Report No. 525). Los Angeles: University of California, National Center for Research on Evaluation, Standards and Student Testing.

Stecher, B. M., Barron, S., Kaganoff, T., & Goodwin, J. (1998). The effects of standards-based assessment on classroom practices: Results of the 1996–97 RAND survey of Kentucky teachers of mathematics and writing (CSE Technical Report 482). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Stecher, B. M., & Klein, S. P. (1997, Spring). The cost of science performance assessments in large-scale testing programs. Educational Evaluation and Policy Analysis, 19(1), 1–14.

Stecher, B. M., & Mitchell, K. J. (1995). Portfolio driven reform: Vermont teachers’ understanding of mathematical problem solving (CSE Technical Report No. 400). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Stein, M. K., & Lane, S. (1996). Instructional tasks and the development of student capacity to think and reason: An analysis of the relationship between teaching and learning in a reform mathematics project. Educational Research and Evaluation, 2(1), 50–80.

Stevenson, Z., Averett, C., & Vickers, D. (1990, April). The reliability of using a focused-holistic scoring approach to measure student performance on a geometry proof. Paper presented at the meeting of the American Educational Research Association, Boston, MA.

Stone, C. A., & Lane, S. (2003). Consequences of a state accountability program: Examining relationships between school performance gains and teacher, student, and school variables. Applied Measurement in Education, 16(1), 1–26.

Swedish Institute. (1984, March). Primary and secondary education in Sweden. Fact Sheets on Sweden. Stockholm, Sweden.

Swedish National Agency for Education. (2005). The Swedish school system: Compulsory school. Retrieved from http://www.skolverket.se/sb/d/354/a/959

Tate, R. L. (1999). A cautionary note on IRT-based linking of tests with polytomous items. Journal of Educational Measurement, 36, 336–346.

Tate, R. L. (2000). Performance of a proposed method for the linking of mixed format tests with constructed-response and multiple-choice items. Journal of Educational Measurement, 36, 336–346.

Tate, R. L. (2003). Equating for long-term scale maintenance of mixed format tests containing multiple choice and constructed response items. Educational and Psychological Measurement, 63(6), 893–914.

Taylor, C. S. (1998). An investigation of scoring methods for mathematics performance-based assessments. Educational Assessment, 5(3), 195–224.

Texas. (2005). Education Code, section 51.968. Retrieved from http://www.legis.state.tx.us/tlodocs/79R/billtext/html/HB00130I.htm

Thomas, W., Storms, B., Sheingold, K., Heller, J., Paulukonis, S., Nunez, A., & Wing, J. (1995). California Learning Assessment System portfolio assessment research and development project: Final report. Princeton, NJ: Center for Performance Assessment, Educational Testing Service.

Topol, B., Olson, J., & Roeber, E. (2010). The cost of new higher quality assessments: A comprehensive analysis of the potential costs for future state assessments. Stanford, CA: Stanford University, Stanford Center for Opportunity Policy in Education.

Topol, B., Olson, J., Roeber, E., & Hennon, P. (2013). Getting to higher-quality assessments: Evaluating costs, benefits, and investment strategies. Stanford, CA: Stanford University, Stanford Center for Opportunity Policy in Education.

Tucker, B. (2009). Beyond the bubble: Technology and the future of student assessment. Washington, DC: Education Sector.

Tung, R., & Stazesky, P. (2010). Including performance assessments in accountability systems: A review of scale up efforts. Boston, MA: Center for Collaborative Education.

US Congress, Office of Technology Assessment. (1992). Testing in American schools: Asking the right questions (Report No. OTA-SET-519; pp. 216, 243, 210). Washington, DC: US Government Printing Office.

US Department of Education. (1995). Section 2: Reform through Linking Title I to Challenging Academic Standards. In Mapping out the national assessment of Title I: The interim report. Retrieved from http://www.ed.gov/pubs/NatAssess/sec2.html

US Department of Education. (2005). The nation’s report card. Washington, DC: Author. Retrieved from http://nationsreportcard.gov/science_2005/s0116.asp

US Department of Education. (n.d.). Windows into the classroom: NAEP’s 1992 writing portfolio study. Washington DC: Author.

US General Accounting Office. (1993). Student testing: Current extent and expenditures, with cost estimates for a national examination (Report No. GAO/PEMD-93–8). Washington, DC: Author.

US General Accounting Office. (2003). Title I: Characteristics of tests will influence expenses; information sharing may help states realize efficiencies. Washington, DC: Author.

US Government Accountability Office. (2009). No Child Left Behind Act: Enhancements in the Department of Education’s review process could improve state academic assessments (Report No. GAO-09–911). Washington, DC: Author.

University of the State of New York State Education Department. (2009a). Information booklet for scoring Regents Examinations in Global History and Geography and United States History and Government. Retrieved from http://www.emsc.nysed.gov/osa/08–09memo/jun-aug-09/730/541hg-809.pdf

University of the State of New York State Education Department. (2009b). Information booklet for scoring the Regents Comprehensive Examination in English. Retrieved from http://www.emsc.nysed.gov/osa/08–09memo/jun-aug-09/730/541e-809.pdf

University of the State of New York State Education Department. (2009c). Regents Examination in Global History and Geography—August 2009: Chart for converting total test raw scores to final examination scores (scale scores). Retrieved from http://www.emsc.nysed.gov/osa/concht/aug09/globalcc-809.pdf

University of the State of New York State Education Department. (2009d). Regents Examination in Physical Setting/Earth Science—August 2009: Chart for converting total test raw scores to final examination scores (scale scores). Retrieved from http://www.emsc.nysed.gov/osa/concht/aug09/earthvcc-809.pdf

University of the State of New York State Education Department. (2009e). Regents Examination in United States History and Government—August 2009: Chart for converting total test raw scores to final examination scores (scale scores). Retrieved from http://www.emsc.nysed.gov/osa/concht/aug09/ushgcc-809.pdf

Vacc, N. N. (1989). Writing evaluation: Examining four teachers’ holistic and analytic scores. Elementary School Journal, 90, 87–95.

Valverde, G. A., & Schmidt, W. H. (2000). Greater expectations: Learning from other nations in the quest for “world-class standards” in US school mathematics and science. Journal of Curriculum Studies, 32(5), 651–687.

Vendlinski, T. P., Baker, E. L., & Niemi, D. (2008). Templates and objects in authoring problem-solving assessments (CRESST Technical Report No. 735). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Vermont Department of Education. (n.d.-a). Core principles of high-quality local assessment systems. Retrieved from http://education.vermont.gov/new/pdfdoc/pgm_curriculum/local_assessment/core_principles_08.pdf

Vermont Department of Education. (n.d.-b). Vermont item bank assessment: Suggestions and guidelines for use. Retrieved from http://education.vermont.gov/new/pdfdoc/pgm_curriculum/educ_item_bank_use_guidelines.pdf

Vermont State Board of Education. (2006). Manual of rules and practices: School quality standards. 2000 C.F.R. sec. 2120. Retrieved from http://education.vermont.gov/documents/educ_sbe_rules_manual_of_rules_ALL.pdf

Wang, J., Niemi, D., & Wang, H. (2007a). Impact of different performance assessment cut scores on student promotion (CSE Report No. 719). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Wang, J., Niemi, D., & Wang, H. (2007b). Predictive validity of an English language arts performance assessment (CSE Report No. 729). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Washington State Institute for Public Policy. (2006). Tenth-grade WASL strands: Student performance varies considerably over time. Olympia: Author.

Webb, N. L. (2002). Depth-of-knowledge levels for four content areas. Retrieved from http://facstaff.wcer.wisc.edu/normw/All%20content%20areas%20%20DOK%20levels%2032802.doc

Webb, N. M., Schlackman, J., & Sugrue, B. (2000). The dependability and interchangeability of assessment methods in science. Applied Measurement in Education, 13(3), 277–301.

Wei, R. C., Darling-Hammond, L., & Adamson, F. (2010). Professional development in the United States: Trends and challenges. Dallas, TX: National Staff Development Council and Stanford, CA: Stanford Center for Opportunity Policy in Education.

Wei, R. C., Darling-Hammond, L., Andree, A., Richardson, N., & Orphanos, S. (2009). Professional learning in the learning profession: A status report on teacher development in the United States and abroad. Dallas, TX: National Staff Development Council and Stanford, CA: Stanford Center for Opportunity Policy in Education.

Wei, R. C., Schultz, S. E., & Pecheone, R. (2012). Performance assessments for learning: The next generation of state assessments. Stanford, CA: Stanford Center for Assessment, Learning, and Equity.

Welch, C. J., & Harris, D. J. (1994). A technical comparison of analytic and holistic scoring methods. Paper presented at the annual meeting of the National Council on Measurement in Education, New Orleans, LA.

Welsh Assembly Government. (2008a). Primary (3–11). Retrieved September 12, 2008, from http://old.accac.org.uk/eng/content.php?cID=5

Welsh Assembly Government. (2008b). Secondary (11–16). Retrieved September 12, 2008, from http://old.accac.org.uk/eng/content.php?cID=6

Wentworth, N., Erickson, L. D., Lawrence, B., Popham, J. A., & Korth, B. (2009). A paradigm shift toward evidence-based clinical practice: Developing a performance assessment. Studies in Educational Evaluation, 35(1), 16–20.

White, K. (1999). Kentucky: To a different drum. Quality counts ’99 policy update. Education Week. Retrieved from http://rc-archive.edweek.org/sreports/qc99/states/policy/ky-up.htm

Wiggins, G. (1998). Educative assessment: Designing assessments to inform and improve student performance. San Francisco, CA: Jossey-Bass.

Williamson, D. M., Bejar, I. I., & Mislevy, R. J. (2006). Automated scoring of complex tasks in computer-based testing: An introduction. In D. M. Williamson, I. I. Bejar, & R. J. Mislevy (Eds.), Automated scoring of complex tasks in computer-based testing (pp. 1–14). Mahwah, NJ: Erlbaum.

Wilson, M. (1989). Saltus: A psychometric model of discontinuity in cognitive development. Psychological Bulletin, 105, 276–289.

Wilson, M. (Ed.). (2004). Towards coherence between classroom assessment and accountability. Chicago, IL: University of Chicago Press.

Wilson, M. (2005). Constructing measures: An item response modeling approach. Mahwah, NJ: Erlbaum.

Wilson, M., & Sloane, K. (2000). From principles to practice: An embedded assessment system. Applied Measurement in Education, 13, 181–208.

Winerip, M. (2012, April 22). Facing a robo-grader? Just keep obfuscating mellifluously. New York Times. Retrieved from http://www.nytimes.com/2012/04/23/education/robo-readers-used-to-grade-test-essays.html?pagewanted=all&_r=0

Wolf, S., Borko, H., McIver, M., & Elliott, R. (1999). “No excuses”: School reform efforts in exemplary schools of Kentucky (CSE Technical Report No. 514). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Wolfe, E. (1997). The relationship between essay reading style and scoring proficiency in a psychometric scoring system. Assessing Writing, 4(1), 83–106.

Wood, G. H., Darling-Hammond, L., Neill, M., & Roschewski, P. (2007). Refocusing accountability: Using local performance assessments to enhance teaching and learning for higher order skills. Briefing paper prepared for members of the Congress of the United States. Athens, OH: Forum for Education and Democracy.

Wylie, C., & Lyon, C. (2009, August 3). What schools and districts need to know to support teachers’ use of formative assessment. Teachers College Record. Retrieved from http://www.tcrecord.org/content.asp?contentid=15734

Yang, Y., Buckendahl, C. W., Juszkiewicz, P. J., & Bhola, D. S. (2002). A review of strategies for validating computer-automated scoring. Applied Measurement in Education, 15(4), 391–412.

Yoon, K. S., Duncan, T., Lee, S., Scarloss, B., & Shapley, K. (2007). Reviewing the evidence on how teacher professional development affects student achievement (Issues & Answers Report, REL 2007–No. 033). Retrieved from http://ies.ed.gov/ncee/edlabs/regions/southwest/pdf/REL_2007033.pdf

Yuan, K., & Le, V. (2012). Estimating the percentage of students who were tested on cognitively demanding items through the state achievement tests. Santa Monica, CA: RAND Corporation.

Zenisky, A. L., & Sireci, S. G. (2002). Technological innovations in large-scale assessments. Applied Measurement in Education, 15(4), 337–362.
