Behavioral Validity of Confidence-Based Knowledge State Reporting in Multiple Choice Examinations
DOI: https://doi.org/10.31181/sa41202672
Keywords: Multiple-choice examination, Knowledge state reporting, Metacognition, Incentive compatibility, Proper scoring rules, Calibration, Guessing behavior, Psychometrics
Abstract
Multiple-choice examinations remain among the most widely deployed instruments of academic and professional assessment, yet their fundamental architecture invites a systematic distortion: examinees who lack genuine knowledge may still obtain credit by selecting at random among the alternatives. Classical scoring mechanisms give examinees no incentive to reveal the quality of their knowledge, so the observed score conflates genuine knowledge with fortunate guessing. This article introduces and empirically investigates a structured self-reporting framework in which each examinee, for every item, declares one of three knowledge states, namely Full Knowledge (FK), Partial Knowledge (PK), or No Knowledge (NK), alongside their chosen answer. The scoring mechanism is constructed so that truthful declaration of one's epistemic state is the uniquely optimal strategy in expectation, rendering guessing strictly suboptimal. The central empirical question is whether examinees placed within this incentive-compatible framework do in fact report their knowledge states truthfully, or whether systematic behavioral deviations emerge, rooted in overconfidence, risk aversion, or strategic misrepresentation. Drawing on psychometric theory, Bayesian probability modeling, and the cognitive psychology of metacognition, this article formalizes the theoretical relationships between declared knowledge states and observed response accuracy, derives testable hypotheses, proposes an experimental design, and specifies a complete statistical inference framework for behavioral validation. The contribution is threefold: a formal probabilistic model linking epistemic state declarations to correctness probabilities, a rigorous hypothesis testing architecture, and an experimentally grounded methodology for assessing metacognitive honesty under incentive-compatible conditions.
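The incentive-compatibility claim in the abstract can be illustrated with a minimal sketch. The abstract does not state the article's actual scoring rule, so the payoff table below is purely hypothetical, as are the names PAYOFFS, expected_score, and best_declaration. Under these assumed payoffs, the expected score of each declaration is linear in the examinee's subjective probability of answering correctly, and the declaration that maximizes it changes as that probability rises.

```python
# Hypothetical payoffs, NOT taken from the article:
# declaration -> (score if the chosen answer is correct, score if it is wrong)
PAYOFFS = {
    "FK": (3.0, -3.0),  # Full Knowledge: high reward, heavy penalty
    "PK": (2.0, -1.0),  # Partial Knowledge: moderate reward, mild penalty
    "NK": (0.0, 0.0),   # No Knowledge: no reward, no penalty
}


def expected_score(declaration: str, p_correct: float) -> float:
    """Expected score of a declaration given the probability of being correct."""
    win, loss = PAYOFFS[declaration]
    return p_correct * win + (1.0 - p_correct) * loss


def best_declaration(p_correct: float) -> str:
    """Declaration with the highest expected score under the assumed payoffs."""
    return max(PAYOFFS, key=lambda d: expected_score(d, p_correct))


if __name__ == "__main__":
    # 0.25 corresponds to a blind guess on a four-option item.
    for p in (0.25, 0.50, 0.90):
        scores = {d: round(expected_score(d, p), 3) for d in PAYOFFS}
        print(p, best_declaration(p), scores)
```

With these assumed numbers, a blind guess on a four-option item (probability 0.25) is best declared as NK, because declaring PK or FK yields a negative expected score. PK becomes optimal above a probability of 1/3 and FK above 2/3. Any real instrument would use the thresholds implied by its own scoring rule.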
Data Availability Statement
Data not used.
