GRANTS
Innovation and Incubator Grants from the University System of Georgia
The project pioneers an approach to evaluating High-Impact Practices (HIPs) by leveraging advanced large language models (LLMs) to automate the grading of student critical thinking reflections. Building on research demonstrating LLMs' capacity to achieve human-level accuracy in grading complex, multi-step explanations (Chen & Wan, 2024), this initiative addresses scalability challenges as the college expands its Experiential Learning and Critical Thinking (EXACT) Plan, a Quality Enhancement Plan initiative embedding experiential learning and critical thinking into 50+ courses by 2028.
The project will develop an AI algorithm trained to assess student reflections using validated rubrics for integrative learning and critical thinking, mirroring the interrater reliability of faculty committees. Unlike traditional automated grading tools, this system integrates two strategies from recent LLM research:
1. Self-Consistency Voting: The AI grades each artifact five times and selects the most frequent outcome, mitigating randomness and improving accuracy (Chen & Wan, 2024).
2. Prompt Engineering: Rubric items are augmented with detailed descriptions of each skill assessed to reduce ambiguity. This approach, shown to boost human-AI agreement to 70–80%, ensures the AI interprets student skills contextually rather than relying on keyword matching. A minimal sketch of both strategies follows this list.
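As a rough illustration of both strategies, consider the following Python sketch. The function names (`build_rubric_prompt`, `grade_once`) and the random stand-in for the LLM call are illustrative assumptions, not the project's actual implementation:

```python
import random
from collections import Counter

LEVELS = ["baseline", "developing", "proficient", "capstone"]

def build_rubric_prompt(criterion: str, description: str) -> str:
    # Prompt engineering: augment the rubric item with a detailed
    # description of the skill so the model grades contextually rather
    # than by keyword matching.
    return (f"Rate the reflection on '{criterion}'. {description} "
            f"Answer with exactly one of: {', '.join(LEVELS)}.")

def grade_once(artifact_text: str, rubric_prompt: str) -> str:
    # Stand-in for a single LLM grading call; the real system would send
    # rubric_prompt plus the student reflection to the model and parse
    # the level it returns.
    return random.choice(LEVELS)

def grade_with_self_consistency(artifact_text, rubric_prompt, n_votes=5):
    # Self-consistency voting: grade the same artifact n_votes times and
    # keep the majority level, mitigating the randomness of any single
    # completion. The full vote list feeds the confidence index described
    # later in the proposal.
    votes = [grade_once(artifact_text, rubric_prompt) for _ in range(n_votes)]
    majority_level, _ = Counter(votes).most_common(1)[0]
    return majority_level, votes
```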
The project also builds on findings that LLMs excel at partial-credit grading when rubrics emphasize reasoning (Chen & Wan, 2024). In practice, the focus will be on evaluating students' self-reflections on critical thinking and problem-solving logic. To address variability in student responses, the AI will apply a multi-item rubric that dissects critical thinking into 1) effective communication, 2) evaluation & interpretation of information, 3) problem solving, and 4) alternative considerations, with each criterion defined by four levels.
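Under the same illustrative assumptions, the four-criterion rubric could be encoded as data and graded criterion by criterion. The one-line descriptions below are placeholders, not the EXACT Plan's validated rubric language:

```python
# Placeholder descriptions standing in for the validated rubric language.
CRITICAL_THINKING_RUBRIC = {
    "effective communication":
        "Clarity and organization of the written reflection.",
    "evaluation & interpretation of information":
        "How the student weighs and interprets evidence and sources.",
    "problem solving":
        "The logic of the approach taken to the problem encountered.",
    "alternative considerations":
        "Whether competing approaches or viewpoints are weighed.",
}

def grade_artifact(artifact_text: str) -> dict:
    # Each of the four criteria is graded independently at one of the
    # four levels, reusing the self-consistency loop sketched above.
    return {
        criterion: grade_with_self_consistency(
            artifact_text, build_rubric_prompt(criterion, description))[0]
        for criterion, description in CRITICAL_THINKING_RUBRIC.items()
    }
```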
Anticipated challenges include initial AI-human discrepancies, such as overly strict interpretations of rubric terms, which will be addressed through rubric refinement cycles. To build faculty trust, workshops co-led by the SST EXACT Faculty Fellow and the ITEC developer will demonstrate the AI's alignment with committee standards using side-by-side comparisons of human and AI grading. Another challenge concerns the security of sensitive student academic data when it is shared with third-party vendors; addressing it requires a cross-functional committee spanning the Office of Instructional Technology, Legal, and the EXACT Plan.
Expected outcomes include full automation of artifact assessment by Spring 2026, eliminating the current sampling limitations that restrict capacity. During Spring 2025, the faculty assessors will assess 318 discrete artifacts, distributed proportionally between two courses (ITEC 1001 and HIST 1112), over two days. The 318 artifacts represent approximately an 18% sample. This will be the first time faculty assessors are asked to evaluate this many artifacts, so it is not yet known whether this statistically necessary sample is achievable with the current number of assessors. An AI algorithm may therefore enable us to reach the statistically significant sample size more efficiently. The project will also produce a resource toolkit featuring adaptable rubrics, entropy-based auditing protocols, and feedback templates, shareable across institutions. Additionally, it will contribute research data on LLM performance in non-STEM contexts, such as humanities reflections, informing broader HIPs assessment practices.
By synthesizing advancements in prompt engineering and self-consistency voting, this project positions GGC as a leader in scalable, equitable HIPs evaluation. The AI's ability to generate actionable feedback offers a cost-effective model for institutions striving to balance rigor with resource constraints. Deliverables will be disseminated via General Space (Digital Archives) through Kaufman Library services, fostering cross-institutional adoption of AI-enhanced assessment frameworks. This digital archive is open access and will be linked from the GGC EXACT Plan website for easy public access.
The outcomes of this work will be:
• A methodology for an AI algorithm that evaluates written reflection artifacts for critical thinking skillsets
• A resource toolkit featuring adaptable rubrics, entropy-based auditing protocols, and feedback templates, shareable across institutions
• Contribution to research data on LLM performance in non-STEM contexts, such as humanities reflections, informing broader HIPs assessment practices.
References:
1. Chen, Z., & Wan, T. (2024). Using Large Language Models to Assign Partial Credits to Students' Problem-Solving Process: Grade at Human Level Accuracy with Grading Confidence Index and Personalized Student-facing Feedback. arXiv preprint arXiv:2412.06910.
Georgia Gwinnett College’s Quality Enhancement Plan, the Experiential Learning and Critical Thinking (EXACT) Plan, is committed to expanding student access to two high-impact practices, experiential learning and ePortfolio-based reflection, as means of developing critical thinking. The EXACT Plan’s critical thinking curriculum model brings together experiential learning and ePortfolios by focusing student reflection on the information students needed to interpret and understand, the problems they encountered, or the creativity they brought to bear in order to succeed in the course-embedded experiential learning activity. Through intensive training of the faculty who are developing and bringing EXACT curriculum into their classrooms, the EXACT Plan seeks to embed experiential learning and reflection throughout the Core IMPACTS framework and center these high-impact practices within GGC’s academic programs.
 
The EXACT Plan is developing an assessment model that ensures a careful, systematic assessment of student learning outcomes in classrooms where that curriculum is delivered, occurring at the end of every semester. Critical for the successful implementation of the EXACT Plan’s assessment model are interrater reliability and the ability to scale appropriately as the EXACT Plan expands to over 50 courses by Year Five of the QEP. The plan uses a fifteen-member faculty assessment committee to assess critical thinking reflection artifacts. This committee meets twice a year (May and December) to assess these artifacts. During the first session, in December 2024, interrater reliability was established.
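The proposal does not name the reliability statistic. For ordinal rubric levels such as baseline through capstone, one common choice is quadratic-weighted Cohen's kappa, sketched here with scikit-learn; the ratings in the usage example are made up for illustration:

```python
from sklearn.metrics import cohen_kappa_score

LEVEL_ORDER = {"baseline": 0, "developing": 1, "proficient": 2, "capstone": 3}

def interrater_kappa(rater_a_levels, rater_b_levels):
    # Quadratic weighting penalizes disagreements more heavily the
    # further apart the two ratings sit on the ordinal scale.
    a = [LEVEL_ORDER[level] for level in rater_a_levels]
    b = [LEVEL_ORDER[level] for level in rater_b_levels]
    return cohen_kappa_score(a, b, weights="quadratic")

# Made-up ratings; real inputs would be the committee's and the AI's
# levels for the same set of artifacts.
print(interrater_kappa(
    ["baseline", "proficient", "capstone", "developing"],
    ["developing", "proficient", "capstone", "baseline"]))
```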
The difficulty, however, is that as the EXACT Plan expands, only a small portion of the critical thinking reflection artifacts can be assessed by the assessment committee. To make the assessment plan sustainable, GGC is seeking to develop an AI algorithm to aid in assessment. If the AI algorithm proves effective, the EXACT Plan assessment protocols and procedures can grow in capacity to accommodate the increase in EXACT-developed HIPs throughout GGC’s Core IMPACTS framework and academic programs and chart a pathway for the normalization of HIPs assessment at a collegewide scale. The AI algorithm will assist faculty in designing more, and more in-depth, EXACT Plan HIPs to integrate into their courses and programs and will assist the assessment team in maintaining validation and scale of artifact assessment.
SUMMER 2025: During Summer 2025, an ITEC faculty member will be responsible for developing an AI algorithm that can mimic the assessment done by the faculty assessment committee. It will utilize two rubrics, one for integrative learning and one for critical thinking skillsets, to assess artifacts at the baseline, developing, proficient, or capstone level. The integrative learning rubric has five dimensions, and the critical thinking rubric has one or two dimensions depending on which critical thinking skillset is assessed. Interrater reliability between the Spring 2025 artifacts assessed by the faculty assessors and by the AI algorithm will be calculated to determine the effectiveness of the AI algorithm for 1000-level courses (the only data existing from the initial semesters of the EXACT Plan implementation). A confidence index based on normalized Shannon entropy will flag low-confidence gradings (e.g., divergent self-consistency outcomes) for human review, allowing instructors to audit only 10–15% of submissions while catching 40% of potential errors (Chen & Wan, 2024).
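A minimal sketch of such a confidence index, computed over the five self-consistency votes; the 0.5 audit threshold is an illustrative assumption, not a figure from Chen & Wan (2024):

```python
import math
from collections import Counter

LEVELS = ["baseline", "developing", "proficient", "capstone"]

def normalized_entropy(votes):
    # Shannon entropy of the vote distribution, divided by log(K) for
    # K possible levels so the index lies in [0, 1]: 0 means all votes
    # agree, 1 means the votes are maximally split.
    n = len(votes)
    h = -sum((c / n) * math.log(c / n) for c in Counter(votes).values())
    return h / math.log(len(LEVELS))

def flag_for_human_review(votes, threshold=0.5):
    # Illustrative cutoff: artifacts whose votes diverge are routed to
    # faculty assessors, so humans audit only the low-confidence slice.
    return normalized_entropy(votes) >= threshold

print(flag_for_human_review(["proficient"] * 5))              # False
print(flag_for_human_review(["baseline", "capstone", "developing",
                             "proficient", "baseline"]))      # True
```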
FALL 2025: The SST EXACT Faculty Fellow will continue to revise the AI algorithm in collaboration with the ITEC faculty developer based on the interrater reliability calculations done during Summer 2025. At the end of Fall 2025, faculty assessors will meet and assess artifacts obtained during Fall 2025. The AI algorithm will be rerun to determine its effectiveness for 1000-level and 2000-level courses (data from Fall 2025). The EXACT faculty statistician and the SST EXACT Faculty Fellow will conduct analysis and share summative results and recommendations with the EXACT director and the EXACT Plan Advisory Committee.
SPRING 2026: The SST EXACT Faculty Fellow will continue to revise the AI algorithm in collaboration with the ITEC faculty developer based on the interrater reliability calculations done during Fall 2025, which addressed increased complexity and scale. The EXACT faculty statistician and the SST EXACT Faculty Fellow will conduct analysis and share summative results and recommendations with the EXACT director and the EXACT Plan Advisory Committee for a final determination on moving forward with the AI algorithm. Abstracts and papers regarding this work will be made publicly accessible through the Kaufman Library General Space (Digital Archives), which is reachable directly through the Kaufman Library website as well as through a link from the EXACT Plan website.
The deliverables are:
• Determination of the interrater reliability between an AI algorithm and the faculty assessors
• A standard qualitative rubric and workflow for assessing ePortfolio reflection artifacts for critical thinking using an AI algorithm and faculty assessors 
• Resource toolkit featuring adaptable rubrics, entropy-based auditing protocols, and feedback templates
During each faculty assessment committee meeting (May and December annually), the AI algorithm will be calibrated to address the expansion and increased complexity of the critical thinking reflection artifacts as the EXACT Plan extends HIPs artifact assessment throughout GGC’s Core IMPACTS framework and academic programs. This calibration will be conducted by the SST EXACT Faculty Fellow, the Faculty Statistician, and the Assistant Provost for Academic Assessment & Accreditation throughout the lifespan of the QEP. The AI algorithm will provide a framework for the normalization of HIPs assessment at a collegewide scale even after the QEP is completed. Interrater reliability calibrations will be conducted annually, with faculty assessors serving as a college-wide service committee, and/or as new EXACT courses are added following the completion of the QEP, to ensure the continued accuracy of the AI algorithm.