I. Introduction
When a practice item is linked to multiple knowledge components, there are no widely accepted best practices on how to assign blame. In the context of mathematics education in China, the blaming problem comes into prominence as early as second grade. Despite its practical importance, this problem is under-studied in both the literature and the industry. Several solutions known to the authors are flawed as formative assessment tools, as we will discuss in the next section.
This paper proposes a new framework, Bayesian Diagnosis Tracing (BDT), to deal with the blaming problem in a logically consistent way without losing track of learning progress. The BDT model combines two well-established streams in the learning analytics literature: the research on the Bayesian Knowledge Tracing (BKT) model and the study of procedural misconceptions. In a nutshell, the binary response in the BKT model is replaced with a vector of misconception diagnoses, allowing an intelligent tutor to mimic the reasoning of a human tutor.
II. Relevant Theories of Learning
A. The Bayesian Knowledge Tracing and the Cognitive Diagnosis Model
The BKT model is the most widely accepted analytical framework for modeling the learning process. It makes two implicit assumptions. The first is that each practice item maps to ONE knowledge component. The second is that wrong responses are qualitatively the same conditional on the latent state[[1]]. When an item involves multiple knowledge components, the first assumption is clearly violated. A common workaround is to pretend that the learner practices N separate items, each mapping to one knowledge component. If the learner answers the original item correctly, they are credited with answering all N artificial items correctly. Otherwise, the learner fails all N artificial items. This method has a fatal logical inconsistency. In the case of a correct response, the components are treated as if they were connected by an “AND” gate. In the case of a wrong response, the components are treated as if they were connected by an “OR” gate. Both statements cannot be true for the same item.
The Deterministic Input Noisy “And” Gate (DINA) model avoids this logical fallacy. However, the DINA model, like the rest of the cognitive diagnosis family, cannot replace the BKT model because it has no learning process. Tracing the progress of a learner is a must-have feature of an Intelligent Tutoring System (ITS). In addition, the DINA model uses only the binary response, which means it throws away the information encoded in different responses.
To the best of our knowledge, no analytical framework can logically handle the problem of blaming multiple knowledge components in the context of dynamic mastery tracing. The ambition of this paper is to do just that.
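For readers unfamiliar with DINA, the following minimal sketch shows its item response rule for a single learner and item. The function name, argument names, and the slip and guess values are illustrative assumptions, not taken from this paper.

```python
def dina_prob_correct(alpha, q, slip, guess):
    """Probability of a correct response under the DINA rule.

    alpha: 0/1 mastery indicators, one per knowledge component
    q:     0/1 flags marking which components the item requires
    slip:  probability of a wrong answer despite mastering all required components
    guess: probability of a right answer while lacking a required component
    """
    # eta is True only if every required component is mastered (the "AND" gate)
    eta = all(a == 1 for a, needed in zip(alpha, q) if needed)
    return 1.0 - slip if eta else guess


# Hypothetical two-component item that requires both K1 and K2
print(dina_prob_correct([1, 1], [1, 1], slip=0.1, guess=0.2))  # masters both
print(dina_prob_correct([1, 0], [1, 1], slip=0.1, guess=0.2))  # lacks K2
```

Note that the rule has no time index: the mastery vector is static, which is why the model cannot trace learning on its own.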
B. Procedural Misconceptions
Inspired by the study of procedural misconceptions by VanLehn (1990), this paper proposes a semi-automatic wrong response generator based on a data-informed expert system. Such a data-informed expert system is important because the distribution of wrong responses has a long tail. Without the help of a frequency table built from a large amount of data, experts are likely to recall only a few common mistakes, which probably account for less than 20% of the cumulative distribution. Conversely, without the help of a pedagogical expert, one cannot make sense of the wrong answers.
III. Enabling Technological Advances
A. Variant, Solution Tree and Automatic Diagnoses of Procedure Misconceptions
This paper defines the solution tree as all possible paths of mathematical reasoning for solving a problem, including those of correct reasoning and those of incorrect reasoning[[2]]. A variant is defined as the class of problems that share an identical solution tree[[3]] but differ in application settings[[4]]. In essence, the distribution of numerical responses of an instance can be generated from the distribution of algebraic responses of its variant when the parameters of the instance are plugged in. The application section offers an example of using a solution tree to generate responses.
Given a wrong response to an instance of a variant, the analytical system can trace back through the solution tree and accurately blame the components that contribute to the error[[5]]. Under some assumptions, the answers in a practice sequence can be split into K answer sequences, one for each knowledge component, and the learner’s mastery of each knowledge component is then estimated by an independent HMM.
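The splitting step can be sketched as follows. This is only an illustration: the function name, the dictionary layout of a diagnosed answer, and the example labels are assumptions made here, not the authors' implementation.

```python
def split_by_component(diagnosed_answers, components):
    """Turn one sequence of diagnosis vectors into one sequence per component.

    diagnosed_answers: list of dicts mapping component name -> diagnosis label;
                       a component absent from a dict was not exercised by that item.
    """
    sequences = {kc: [] for kc in components}
    for diagnosis in diagnosed_answers:
        for kc in components:
            if kc in diagnosis:
                sequences[kc].append(diagnosis[kc])
    return sequences


# Hypothetical learner: one K1-only item followed by two items using K1 and K2
answers = [
    {"K1": "right"},
    {"K1": "right", "K2": "no multiplication"},
    {"K1": "flawed execution", "K2": "right"},
]
print(split_by_component(answers, ["K1", "K2"]))
```

Each resulting list can then be fed to its own HMM as an observation sequence.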
B. Bayesian Diagnosis Tracing Model
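Let $X_t = (X_t^1, \dots, X_t^K)$ be the state vector of the K knowledge components at time t. Let $T$ be the state transition matrix. When the response at time t is diagnosed as a vector of diagnoses $D_t = (D_t^1, \dots, D_t^K)$, the observation matrix $E$ gives $P(D_t \mid X_t)$.
The likelihood is given as
$$P(D_1, \dots, D_n) = \sum_{X_1, \dots, X_n} P(X_1) \prod_{t=2}^{n} T_{X_{t-1}, X_t} \prod_{t=1}^{n} E_{X_t, D_t}$$
The state transition is
$$P(X_{t+1} = j \mid X_t = i) = T_{ij}$$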
Suppose there are K knowledge components, each with M states. The state transition matrix T is then of size $M^K \times M^K$. If there are L possible diagnosis combinations, the observation matrix E is of size $M^K \times L$. In real applications, even when K, M, and L are of reasonable size, the model still has too many parameters to estimate. In practice, the joint likelihood has to be factored into the product of the likelihoods of the individual knowledge components. In the product form, each component can be estimated by an independent HMM.
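As a rough illustration of the parameter explosion, the sketch below counts matrix entries, assuming the joint model has M^K hidden states and that each component has its own diagnosis alphabet in the factored form. The function names and the toy sizes (K=2, M=3, five diagnoses per component, hence L=25 combinations) are illustrative assumptions.

```python
def joint_entries(K, M, L):
    """Entries in the joint transition and observation matrices."""
    states = M ** K                      # assumed size of the joint state space
    return states * states + states * L


def factored_entries(K, M, L_per_component):
    """Entries when each component has its own M-state HMM."""
    return K * (M * M + M * L_per_component)


print(joint_entries(K=2, M=3, L=25))                  # 81 + 225 = 306
print(factored_entries(K=2, M=3, L_per_component=5))  # 2 * (9 + 15) = 48
```

The gap widens quickly: the joint count grows exponentially in K, while the factored count grows linearly.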
The product form requires the following assumptions:
A1: The learning process of each component is independent.
A2: The diagnoses of the knowledge components are exchangeable[[6]].
A3: The symptom of each knowledge component is unrelated to the other components.
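Under these assumptions the joint log-likelihood is simply a sum of per-component HMM log-likelihoods. The sketch below uses the standard scaled forward algorithm; the function names, the tuple layout of a model, and all toy numbers are assumptions for illustration and are not estimates from the paper.

```python
import math


def forward_loglik(pi, T, E, obs):
    """Log-likelihood of one observation sequence under a discrete HMM.

    pi[i]: initial state probability, T[i][j]: P(next=j | current=i),
    E[i][o]: P(observation=o | state=i), obs: list of observation indices.
    """
    n = len(pi)
    alpha = [pi[i] * E[i][obs[0]] for i in range(n)]
    loglik = 0.0
    for o in obs[1:]:
        scale = sum(alpha)               # rescale to avoid numerical underflow
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
        alpha = [sum(alpha[i] * T[i][j] for i in range(n)) * E[j][o]
                 for j in range(n)]
    return loglik + math.log(sum(alpha))


def factored_loglik(models, sequences):
    """Sum of per-component log-likelihoods (the product form)."""
    return sum(forward_loglik(*models[kc], sequences[kc]) for kc in models)


# Toy two-state model shared by both components (illustrative numbers only)
toy = ([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.2, 0.8]])
models = {"K1": toy, "K2": toy}
sequences = {"K1": [0, 0], "K2": [0, 1]}
print(factored_loglik(models, sequences))
```

Because the components never interact in this form, each HMM can also be fitted separately with any standard estimation routine.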
IV. Real World Applications
This paper illustrates the Bayesian Diagnosis Tracing model, and its contrast with the classical BKT model, using data collected between October 2018 and December 2018 in a Chinese learning app. About one thousand students finished three items in a row. After answering each item, the student learned whether she got it right. In the case of a wrong answer, the correct answer was revealed on the same screen as the mark, and the solution procedure was revealed on the next screen.
Here is the translation of the three items.
Q1: The school has a pool, 16m in length and 5m in width. To set up a fence around the pool, the length of the fence is ____ m.
Q2: The width of a rectangle is 3cm. The length is twice the width. The circumference of the rectangle is ____ cm.
Q3: Boy: The length of the basketball court is 28m and the width is 15m.
Girl: I walk around the court twice.
Question: The girl has walked ____m in total.
These are instances of three variants built from two key knowledge components: calculating the circumference of a rectangle given its length and width (K1), and calculating the product given a multiplicand and a multiplier (K2). The difference between Q2 and Q3 shows that the procedures alone are not sufficient to distinguish variants.
For each component, the misconceptions are listed in the following table.
Knowledge Component | Right formula | Misconception | Symptom |
K1 | W*2+L*2, (W+L)*2 | Wrong formula | W*L, L*4, W*4, (W+L)*4 |
K1 | W*2+L*2, (W+L)*2 | Flawed execution | W+L, L+W*2[[7]], W+L*2, L*2[[8]], W*2 |
K2 | X*Y | No multiplication | Y, X |
K2 | X*Y | Mistake for addition | X+Y |
The solution tree of Q1 is just K1. Q2 is K2->K1. Q3 is K1->K2. When multiple procedures are involved, the wrong responses are generated by the convolution of the procedures. For example, for Q2, K2 generates 4 intermediate answers. Taking each as an input to K1 generates 8 answers (but only 4 category labels, including right). In total, Q2 has 32 leaf nodes. If no two leaf nodes produce the same answer, then each answer pins down a particular set of procedural misconception(s). However, when different leaf nodes produce the same answer, we rely on a pedagogical expert to decide which set of misconceptions is more likely and choose it as the diagnosis.
There are two other special types of diagnosis: give-up and calculation error. A give-up is defined as the answer 3/5/8[[9]] or a number from the question text, such as 16 in Q1 or 28 in Q3. A give-up does not necessarily mean the student cannot solve the item; more likely the student did not bother to. Nevertheless, if a wrong answer is identified as a give-up, then all procedural diagnoses are give-up. A calculation error is defined as a wrong answer that does not contain any procedural misconception, for example 33[[10]] for Q3. If a wrong answer is identified as a calculation error, then all procedural diagnoses are right. A wrong answer is labelled as no-diagnosis if it is neither diagnosed with procedural misconceptions, nor marked as give-up, nor marked as a calculation error[[11]].
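The convolution of procedures can be sketched for Q2 as below. The branch lists are transcribed from the misconception table and need not match the authors' actual tree or its leaf count; the names `K2_BRANCHES`, `K1_BRANCHES`, and `q2_leaves` are hypothetical, and treating the width as the multiplicand and 2 as the multiplier is an assumption.

```python
from collections import defaultdict

# (label, formula) pairs transcribed from the misconception table
K2_BRANCHES = [
    ("right", lambda x, y: x * y),
    ("no multiplication", lambda x, y: y),
    ("no multiplication", lambda x, y: x),
    ("mistake for addition", lambda x, y: x + y),
]
K1_BRANCHES = [
    ("right", lambda w, l: (w + l) * 2),
    ("wrong formula", lambda w, l: w * l),
    ("wrong formula", lambda w, l: l * 4),
    ("wrong formula", lambda w, l: w * 4),
    ("wrong formula", lambda w, l: (w + l) * 4),
    ("flawed execution", lambda w, l: w + l),
    ("flawed execution", lambda w, l: l + w * 2),
    ("flawed execution", lambda w, l: w + l * 2),
    ("flawed execution", lambda w, l: l * 2),
    ("flawed execution", lambda w, l: w * 2),
]


def q2_leaves(width=3, multiplier=2):
    """Map each final answer to the (K2 label, K1 label) pairs that produce it."""
    by_answer = defaultdict(set)
    for k2_label, k2 in K2_BRANCHES:
        length = k2(width, multiplier)          # K2 runs first in Q2
        for k1_label, k1 in K1_BRANCHES:
            by_answer[k1(width, length)].add((k2_label, k1_label))
    return by_answer


leaves = q2_leaves()
print(sorted(leaves[18]))   # diagnoses compatible with the answer 18
```

With width 3 and length 6, the wrong formula W*L also evaluates to 18, so the correct answer is reachable through an incorrect branch. Answers that map to more than one label pair are the cases left to the pedagogical expert.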
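These labelling rules can be expressed as a small lookup routine. The sketch below is an assumption-laden illustration: the function name, the order in which the rules are checked, and the Q1 lookup tables (built from the misconception table with L=16, W=5, plus the typing-error example 421) are choices made here, not a description of the authors' system.

```python
def classify_wrong_answer(answer, question_numbers, procedural, calc_errors,
                          giveup_answers=(3, 5, 8)):
    """Return a diagnosis label for one wrong numeric answer.

    question_numbers: numbers appearing in the question text
    procedural:       dict mapping an answer to its misconception label
    calc_errors:      answers an expert has marked as calculation errors
    """
    # Assumed precedence: give-up, then procedural, then calculation error
    if answer in giveup_answers or answer in question_numbers:
        return "give-up"
    if answer in procedural:
        return procedural[answer]
    if answer in calc_errors:
        return "calculation error"
    return "no-diagnosis"


# Hypothetical tables for Q1 (L=16, W=5, correct answer 42)
q1_procedural = {80: "wrong formula", 64: "wrong formula", 21: "flawed execution"}
q1_calc_errors = {421}
for ans in (16, 80, 421, 999):
    print(ans, classify_wrong_answer(ans, {16, 5}, q1_procedural, q1_calc_errors))
```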
Table 1 shows the composition of the wrong responses.
Table 1
Question | Procedural Misconceptions | Give-up | Calculation Error | No-diagnosis |
Q1 | 27% | 10% | 8% | 55% |
Q2 | 41% | 47% | 0% | 13% |
Q3 | 58% | 6% | 6% | 31% |
Among the identified procedural misconceptions, Table 2 shows the share of blame. Within the category of procedural misconceptions, the probability that both components are to blame is about 10%. This means the prevailing BKT practice of blaming every component is logically correct for only about 10% of these wrong responses. No matter how sophisticated the BKT model becomes, if the data are seriously compromised, the analysis is likely to be far from the truth.
Table 2
Question | K1 is to blame | K2 is to blame | Both |
Q2 | 28% | 60% | 12% |
Q3 | 6% | 87% | 7% |
In addition to better data quality, the vector of diagnoses enables the HMM models to do a better job of forming learner-type clusters. Table 3 is the observation matrix of the 3-state BKT model. It is almost impossible to interpret what each state means. Table 4 is the observation matrix of the 3-state BDT model. Here one can easily name the three clusters: those who skip, those who slip, and those who master.
Table 3
State | K1 | K2 |
State 1 | 93% | 97% |
State 2 | 10% | 13% |
State 3 | 2% | 2% |
Table 4
Component | Diagnosis | State 1 | State 2 | State 3 |
K1 | Right | 96.6% | 68.5% | 38.5% |
K1 | Wrong Formula | 0.6% | 4.3% | 6.8% |
K1 | Flawed Execution | 0.2% | 3.4% | 7.4% |
K1 | Give-up | 0.2% | 5.5% | 22.1% |
K1 | No Diagnosis | 2.3% | 18.2% | 25.1% |
K2 | Right | 93.9% | 43.2% | 33.9% |
K2 | No Multiplication | 4.4% | 36.7% | 12.3% |
K2 | Mistake for Addition | 0.6% | 1.3% | 1.7% |
K2 | Give-up | 0.0% | 0.9% | 39.5% |
K2 | No Diagnosis | 1.1% | 17.9% | 12.7% |
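To show how such an observation matrix turns a diagnosis into a belief about the latent state, the sketch below applies Bayes' rule to the K2 rows of Table 4. The uniform prior and the names `K2_EMISSION` and `posterior` are assumptions for illustration; a fitted model would use the filtered state distribution as the prior.

```python
# P(diagnosis | state) for K2, columns ordered State 1, State 2, State 3 (Table 4)
K2_EMISSION = {
    "Right": [0.939, 0.432, 0.339],
    "No Multiplication": [0.044, 0.367, 0.123],
    "Mistake for Addition": [0.006, 0.013, 0.017],
    "Give-up": [0.000, 0.009, 0.395],
    "No Diagnosis": [0.011, 0.179, 0.127],
}


def posterior(prior, diagnosis):
    """Posterior state distribution after observing one K2 diagnosis."""
    joint = [p * e for p, e in zip(prior, K2_EMISSION[diagnosis])]
    total = sum(joint)
    return [j / total for j in joint]


uniform = [1 / 3, 1 / 3, 1 / 3]          # assumed prior, not from the paper
for label in ("Right", "No Multiplication", "Give-up"):
    print(label, [round(p, 3) for p in posterior(uniform, label)])
```

Under this assumed prior, a give-up points almost entirely to State 3, whereas a binary "wrong" could not separate it from a "No Multiplication" error, which points mostly to State 2.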
Table 5 shows the transition matrices, that is, the learning process. The BDT model only allows transitions between adjacent states.
Table 5-1
K1 | State 1 | State 2 | State 3 |
State 1 | 91.8% | 8.2% | – |
State 2 | 38.6% | 38.8% | 22.6% |
State 3 | – | 52.8% | 47.2% |
Table 5-2
K2 | State 1 | State 2 | State 3 |
State 1 | 58.9% | 41.1% | – |
State 2 | 16.4% | 60.4% | 23.2% |
State 3 | – | 67% | 33% |
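A transition matrix of this kind can be read as a forecast of how a learner's state distribution evolves with practice. The sketch below propagates a distribution through the Table 5-1 matrix, assuming rows are the current state and columns the next state (each row sums to one) and treating the dashes as zero probability. The function names and the starting distribution are illustrative.

```python
# Table 5-1: rows = current state, columns = next state (assumed orientation)
T_K1 = [
    [0.918, 0.082, 0.0],
    [0.386, 0.388, 0.226],
    [0.0, 0.528, 0.472],
]


def step(dist, T):
    """One practice opportunity: new[j] = sum_i dist[i] * T[i][j]."""
    n = len(dist)
    return [sum(dist[i] * T[i][j] for i in range(n)) for j in range(n)]


def propagate(dist, T, n_steps):
    """Apply the transition matrix n_steps times."""
    for _ in range(n_steps):
        dist = step(dist, T)
    return dist


start = [0.0, 0.0, 1.0]                  # hypothetical learner starting in State 3
for n in (1, 2, 5):
    print(n, [round(p, 3) for p in propagate(start, T_K1, n)])
```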
V. Evidence of Potential Impacts
There are two potentially productive applications of the BDT model. First, it makes the latent states much easier to interpret and communicate. This is essential for human tutors when interfacing with an ITS; they often complain that a probability statement of mastery is either mysterious or meaningless. Second, the authors believe that different diagnoses should lead to different pedagogical interventions.
VI. Summary
The Bayesian Diagnosis Tracing model solves the blaming problem in the context of mastery tracing. The paper provides some evidence of its benefits and notes some of its practical limitations. This paper only scratches the surface of the work that can and needs to be done. First, how much more effective a BDT-driven intelligent tutor is compared to the current system remains to be seen. While we have strong faith in its accuracy, it is expensive and risky to build an assessment content inventory compatible with solution-tree diagnosis, along with state-based teaching and practice material. Whether it is commercially viable remains in doubt. Second, the BDT framework needs to be extended to handle multiple diagnoses. In principle, since the likelihood function can be written explicitly, the parameters can be estimated with an MCMC algorithm. However, the computational complexity on real datasets and the convergence quality of the resulting parameters need to be examined.
References
- De La Torre, Jimmy. “The generalized DINA model framework.” Psychometrika 76.2 (2011): 179-199.
- Piech, Chris, et al. “Deep knowledge tracing.” Advances in neural information processing systems. 2015.
- Corbett, Albert T., and John R. Anderson. “Knowledge tracing: Modeling the acquisition of procedural knowledge.” User modeling and user-adapted interaction 4.4 (1994): 253-278.
- VanLehn, Kurt. Mind bugs: The origins of procedural misconceptions. MIT Press, 1990.
[[1]] In the classical BKT framework, not all wrong responses are created equal. A wrong answer signals incompetence in the non-mastery state, while it signals carelessness in the mastery state. This interpretation runs into trouble when there are three latent states.
[[2]] As demonstrated in the application section, an incorrect reasoning branch may produce the correct answer if the problem is poorly designed. Therefore, a branch of the solution tree is categorized by its reasoning process rather than by the answer it produces.
[[3]] The complexity of calculation is ignored in this definition. Chinese pedagogical experts generally disagree with this simplification. However, from the perspective of diagnosis, computational error can be handled separately from the mathematical modeling. In fact, if the error is attributed to calculation, the solution tree blames none of the components.
[[4]] This definition is not very precise because a novel application context challenges a learner’s creativity. However, it is not clear where the line should be drawn.
[[5]] If the parameters are not well crafted, multiple branches produce the same answer. In that case, this paper picks the branch that a human expert favors based on experience.
[[6]] In practice, this means that the order of the knowledge components on the solution tree is irrelevant. This is not always true.
[[7]] This error can originate either from (L+W)*2 with the parentheses forgotten or from L*2+W*2 with the last 2 forgotten.
[[8]] It is debatable whether the student does not remember the circumference formula or forgot to complete the calculation. Here we chalk it up to flawed execution.
[[9]] Because the app uses a classical 4*3 number pad interface, some students type in a number and submit without thinking. Since a student faces almost no penalty for answering an item wrong, this kind of careless answering is common.
[[11]] For a few answers, the paper also treats typing errors as calculation errors. For example, 421 for Q1 is likely the result of a student accidentally typing an extra “1” after the correct answer 42. These cases are rare.