The Mutual Information (MI) score is widely used as a statistical measure of collocation in linguistic studies. The number of bits of "shared information" between two words is calculated from the observed co-occurrence (O) and the expected co-occurrence (E).
MI = log2(O/E)
The MI score is then used as a cut-off threshold for collocate selection. In practice, however, MI tends to assign inflated scores to low-frequency word pairs with E << 1, especially in data from large corpora. Thus, even a single co-occurrence of two word types can yield a fairly high association score (see Evert's extended manuscript of Corpora and collocations). To increase the influence of the observed co-occurrence frequency relative to the expected one, the numerator is multiplied by further factors of O, giving the formula log2(O^k/E) with k >= 1 (the well-known MIk family).
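The low-frequency inflation described above can be illustrated with a minimal Python sketch. The function name `mi_k` and the two (O, E) pairs are hypothetical values chosen for illustration only; they do not come from any corpus.

```python
import math

def mi_k(observed: float, expected: float, k: int = 1) -> float:
    """Return log2(O^k / E); k = 1 gives the plain MI score."""
    if observed <= 0 or expected <= 0:
        raise ValueError("observed and expected must be positive")
    return k * math.log2(observed) - math.log2(expected)

# Hypothetical word pairs (illustrative numbers, not real corpus counts):
pairs = {
    "rare pair": (1, 0.001),      # seen once, expected almost never
    "frequent pair": (100, 10.0)  # seen 100 times, expected 10 times
}

for name, (o, e) in pairs.items():
    print(f"{name}: MI = {mi_k(o, e):.3f}, MI3 = {mi_k(o, e, k=3):.3f}")
```

With these numbers, plain MI ranks the pair seen only once above the pair seen 100 times, while MI3 reverses the ranking, because log2(1) = 0 leaves the rare pair's score unchanged as k grows.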
An MI score of 2.00 was found useful for producing a collocational network (see Magnusson and Vanharanta's Visualizing Sequences of Texts Using Collocational Networks) for a 3000-word corpus. But what if we want to use MI3, for example? How do we choose a starting threshold for the MIk score that is comparable to a given MI score? Here is my solution.
The MI score is calculated by the formula
MI = log2(O/E) = log2(O) - log2(E)
while the MIk score is calculated by
MIk = log2(O^k/E) = k * log2(O) - log2(E)
Subtracting the first formula from the second leaves (k-1) * log2(O). Thus, a given MI score can be converted into MIk by
MIk = MI + (k-1) * log2(O)
Here the observed co-occurrence (O) is the minimum number of co-occurrences that you are interested in. For example, with O = 10, an MI score of 2.0 corresponds to an MI3 score of 2.0 + 2 * log2(10) = 8.644.
A simple converter takes three inputs, the MI score, the observed co-occurrence (O), and the target measure (MIk), and returns the converted value.
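The conversion can be sketched in a few lines of Python. This is a minimal stand-in under stated assumptions, not the original converter: the function names `mi_to_mik` and `mik_to_mi` are hypothetical, and the code simply applies MIk = MI + (k-1) * log2(O) and its inverse.

```python
import math

def mi_to_mik(mi: float, observed: float, k: int) -> float:
    """Convert an MI threshold into the equivalent MIk threshold for a given O."""
    if observed <= 0:
        raise ValueError("observed must be positive")
    if k < 1:
        raise ValueError("k must be >= 1")
    return mi + (k - 1) * math.log2(observed)

def mik_to_mi(mik: float, observed: float, k: int) -> float:
    """Inverse conversion: recover the MI threshold from an MIk threshold."""
    if observed <= 0:
        raise ValueError("observed must be positive")
    if k < 1:
        raise ValueError("k must be >= 1")
    return mik - (k - 1) * math.log2(observed)

# Example from the text: MI = 2.0 with O = 10, converted to MI3.
print(round(mi_to_mik(2.0, 10, 3), 3))  # 8.644
```

Note that the converted threshold depends on the chosen minimum O, so a different minimum co-occurrence count gives a different MIk starting value for the same MI score.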