
Convert MI score to MI(k): Setting the cut-off threshold of collocation

The Mutual Information (MI) score is widely used as a statistical measure of collocation in linguistic studies.  The number of bits of "shared information" between two words can be calculated from the observed co-occurrence frequency (O) and the expected co-occurrence frequency (E):
MI = log2(O/E)
The MI score is then used as a cut-off threshold for collocate selection.  In practical applications, however, MI has been found to assign inflated scores to low-frequency word pairs with E << 1, especially in data from large corpora.  Thus, even a single co-occurrence of two word types might result in a fairly high association score (see Evert's extended manuscript of Corpora and collocations).  To increase the influence of the observed co-occurrence frequency relative to the expected one, the numerator is multiplied by O, possibly several times, resulting in the formula log2(O^k/E) with k >= 1 (the well-known MIk family).
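To make the low-frequency problem concrete, here is a minimal Python sketch of both measures. The function names and the two (O, E) pairs are made up for illustration; they do not come from any real corpus.

```python
import math


def mi_score(observed: float, expected: float) -> float:
    """MI = log2(O / E)."""
    return math.log2(observed / expected)


def mik_score(observed: float, expected: float, k: int = 3) -> float:
    """MIk = log2(O**k / E), computed as k * log2(O) - log2(E)."""
    return k * math.log2(observed) - math.log2(expected)


# Hypothetical word pairs as (O, E): one seen once with a tiny expected
# frequency, one seen often with a moderate expected frequency.
pairs = {"rare": (1, 0.001), "frequent": (50, 5.0)}

for name, (o, e) in pairs.items():
    print(name, round(mi_score(o, e), 3), round(mik_score(o, e, 3), 3))
```

With these numbers the rare pair gets the higher MI score, while MI3 ranks the frequent pair higher. Note that for O = 1 the MIk score equals the MI score for every k, because log2(1) = 0.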


An MI score of 2.00 was found useful for producing a collocational network (see Magnusson and Vanharanta's Visualizing Sequences of Texts Using Collocational Networks) from a 3000-word corpus.  But what if we want to use MI3, for example?  How do we choose the starting value of the MIk score relative to the MI score?  Here is my solution.

The MI score is calculated by the formula
MI = log2(O/E) = log2(O) - log2(E)
while the MIk score is calculated by
MIk = log2(O^k/E) = k * log2(O) - log2(E)
Subtracting the first from the second gives MIk - MI = (k-1) * log2(O).  Thus, a given MI score can be converted into MIk by
MIk = MI + (k-1) * log2(O)

Here the observed co-occurrence (O) is the minimum number of co-occurrences that you are interested in.  For O=10, a threshold of MI=2.0 corresponds to an MI3 score of 2.0 + 2 * log2(10) = 8.644.
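The worked example can be checked with a few lines of Python. This is only a sketch of the conversion formula; the function name mi_to_mik is my own choice for illustration.

```python
import math


def mi_to_mik(mi: float, observed: float, k: int) -> float:
    """Convert an MI threshold to MIk: MIk = MI + (k - 1) * log2(O)."""
    return mi + (k - 1) * math.log2(observed)


# MI = 2.0 with a minimum of O = 10 co-occurrences, converted to MI3.
threshold = mi_to_mik(2.0, 10, 3)
print(round(threshold, 3))  # 8.644
```

Because the correction term depends on O, the same MI threshold maps to a different MIk threshold for each minimum co-occurrence count you choose.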

A simple converter takes an MI score, the observed co-occurrence count (O), and the target measure (MIk), and returns the converted value.
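A converter along these lines could be sketched in Python as follows. This is an assumption about how such a tool might be written, not the code behind the original page; the name convert_score and its parameters are hypothetical. It generalizes the formula slightly, using MIk - MIj = (k - j) * log2(O), so that it can also convert back from MIk to MI.

```python
import math


def convert_score(score: float, observed: float,
                  from_k: int = 1, to_k: int = 3) -> float:
    """Convert an MI(from_k) score to MI(to_k) for a given observed count O.

    Uses MI(to_k) = MI(from_k) + (to_k - from_k) * log2(O).
    from_k = 1 is the plain MI score.
    """
    if observed <= 0:
        raise ValueError("observed co-occurrence must be positive")
    if from_k < 1 or to_k < 1:
        raise ValueError("k must be >= 1")
    return score + (to_k - from_k) * math.log2(observed)


mi3 = convert_score(2.0, observed=10, from_k=1, to_k=3)
back = convert_score(mi3, observed=10, from_k=3, to_k=1)
print(round(mi3, 3), round(back, 3))
```

Converting to MI3 and back again with the same O should return the starting MI score, which is a quick sanity check on the formula.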
