Behavioral Validity of Confidence-Based Knowledge State Reporting in Multiple Choice Examinations

Authors

  • Janardan Behera, Department of Statistics, Ravenshaw University, Cuttack, India. https://orcid.org/0000-0003-1277-7557
  • Prabhudarshan Sahoo, Department of Psychology, Ravenshaw University, Cuttack, India.
  • Raja Kishor Prusty, Department of Psychology, Ravenshaw University, Cuttack, India.

DOI:

https://doi.org/10.31181/sa41202672

Keywords:

Multiple-choice examination, Knowledge state reporting, Metacognition, Incentive compatibility, Proper scoring rules, Calibration, Guessing behavior, Psychometrics

Abstract

Multiple-choice examinations remain among the most widely deployed instruments of academic and professional assessment, yet their fundamental architecture invites a systematic distortion: examinees who lack genuine knowledge may still obtain credit through random selection among alternatives. Classical scoring mechanisms offer no incentive for examinees to reveal the quality of their knowledge, and consequently the observed score conflates genuine knowledge with fortunate guessing. This article introduces and empirically investigates a structured self-reporting framework in which each examinee, for every item, declares one of three knowledge states, namely Full Knowledge (FK), Partial Knowledge (PK), or No Knowledge (NK), alongside the chosen answer. The scoring mechanism is constructed such that truthful declaration of one's epistemic state is the uniquely optimal strategy in expectation, rendering guessing strictly suboptimal. The central empirical question is whether examinees, when placed within this incentive-compatible framework, do in fact report their knowledge states truthfully or whether systematic behavioral deviations, rooted in overconfidence, risk aversion, or strategic misrepresentation, emerge. Drawing on psychometric theory, Bayesian probability modeling, and the cognitive psychology of metacognition, this article formalizes the theoretical relationships between declared knowledge states and observed response accuracy, derives testable hypotheses, proposes an experimental design, and specifies a complete statistical inference framework for behavioral validation. The contribution is threefold: a formal probabilistic model linking epistemic state declarations to correctness probabilities, a rigorous hypothesis-testing architecture, and an experimentally grounded methodology for assessing metacognitive honesty under incentive-compatible conditions.
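
The abstract does not state the scoring rule itself, so the following is only an illustrative sketch of what "truthful declaration is optimal in expectation" can mean for a three-state scheme. The payoff values in PAYOFFS and the function names are hypothetical assumptions chosen for illustration; they are not taken from the article. Each declaration earns one score if the chosen answer is correct and another if it is wrong, and the examinee is assumed to maximize expected score given a subjective probability of being correct.

```python
# Hypothetical payoffs, NOT the article's scoring rule:
# declaration -> (score if answer is correct, score if answer is wrong)
PAYOFFS = {
    "FK": (1.0, -1.0),    # Full Knowledge: high reward, heavy penalty
    "PK": (0.5, -0.25),   # Partial Knowledge: moderate reward, mild penalty
    "NK": (0.0, 0.0),     # No Knowledge: no reward, no penalty
}


def expected_score(declaration, p_correct):
    """Expected score of a declaration given the probability of a correct answer."""
    gain, loss = PAYOFFS[declaration]
    return p_correct * gain + (1.0 - p_correct) * loss


def best_declaration(p_correct):
    """Declaration that maximizes expected score for a given p_correct."""
    return max(PAYOFFS, key=lambda d: expected_score(d, p_correct))


if __name__ == "__main__":
    # 0.25 corresponds to blind guessing on a four-option item.
    for p in (0.25, 0.5, 0.9):
        scores = {d: round(expected_score(d, p), 4) for d in PAYOFFS}
        print(p, best_declaration(p), scores)
```

Under these assumed payoffs, declaring NK is optimal when the probability of a correct answer is below 1/3, PK between 1/3 and 0.6, and FK above 0.6. A blind guess on a four-option item (probability 0.25) therefore has a negative expected score under either PK or FK, which is the sense in which guessing is suboptimal in this sketch.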

Published

2026-03-24

Data Availability Statement

No data were used in this study.

How to Cite

Behera, J., Sahoo, P., & Prusty, R. K. (2026). Behavioral Validity of Confidence-Based Knowledge State Reporting in Multiple Choice Examinations. Systemic Analytics, 4(1), 68–80. https://doi.org/10.31181/sa41202672