
Concise 0.3.0 Released, Featuring a 'Dead-End' Collocational Network


As described in my previous article, Collocation and Interactive Collocational Network, a collocational network is a network of words that co-occur in a text in a statistically significant way. Concise 0.2.x introduced an interactive way to explore these co-occurrence relationships. Concise 0.3.x now adds a 'dead-end' collocational network.
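To make "co-occur in a statistically significant way" a little more concrete, here is a minimal sketch of one common approach: count the words that appear within a fixed window around a node word, then score each candidate with pointwise mutual information (MI). This is not necessarily what Concise does. The window size, the thresholds, and the choice of MI are all assumptions for illustration, and the function name `collocates` is hypothetical. Chinese text such as the documents below would also need to be word-segmented into `tokens` first.

```python
import math
from collections import Counter

def collocates(tokens, node, window=3, min_count=2, min_mi=1.0):
    """Score words that co-occur with `node` within +/- `window` tokens.

    Hypothetical sketch: `tokens` is assumed to be already word-segmented,
    and significance is measured with pointwise mutual information (MI).
    The window size, thresholds, and use of MI are illustrative
    assumptions, not Concise's documented settings.
    """
    n = len(tokens)
    freq = Counter(tokens)
    observed = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        # Every other token within `window` positions of this occurrence co-occurs with it.
        for j in range(max(0, i - window), min(n, i + window + 1)):
            if j != i and tokens[j] != node:
                observed[tokens[j]] += 1

    scores = {}
    span = 2 * window  # collocate slots around each occurrence of the node
    for word, count in observed.items():
        if count < min_count:
            continue  # too rare to say anything about
        # Co-occurrences we would expect if `word` were scattered at random.
        expected = freq[node] * freq[word] * span / n
        mi = math.log2(count / expected)
        if mi >= min_mi:
            scores[word] = mi
    return scores
```

A word only counts as a collocate when it shows up near the node noticeably more often than chance would predict, which is the intuition behind "statistically significant" co-occurrence. Other measures (t-score, log-likelihood) could be swapped in at the `mi` line.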

The 'dead-end' collocational network gives a complete picture around your starting 'core' word: it keeps expanding the network until there is nothing left to add. However, the word you start from is not necessarily the 'core' of the resulting network.
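The "keep expanding until nothing is left" behaviour can be sketched as a breadth-first search: start from the core word, add its collocates, then their collocates, and stop when no new words turn up. The sketch below assumes a hypothetical `collocates_of(word)` function that returns a word's significant collocates; how Concise actually computes and stores them is not described here. A small `hub` helper then picks the best-connected word, which illustrates why the starting word need not be the network's real core. The lookup table in the example is made-up toy data, not real results.

```python
from collections import Counter, deque

def dead_end_network(core, collocates_of):
    """Expand breadth-first from `core` until no new words appear.

    `collocates_of(word)` is a hypothetical callback returning the
    significant collocates of `word`. Returns (nodes, edges), with
    undirected edges stored as frozensets of two words.
    """
    nodes = {core}
    edges = set()
    queue = deque([core])
    while queue:  # the "dead end": stop once the frontier is empty
        word = queue.popleft()
        for other in collocates_of(word):
            if other == word:
                continue
            edges.add(frozenset((word, other)))
            if other not in nodes:
                nodes.add(other)
                queue.append(other)
    return nodes, edges

def hub(edges):
    """Return the word with the most connections (assumes at least one edge)."""
    degree = Counter()
    for edge in edges:
        for word in edge:
            degree[word] += 1
    return degree.most_common(1)[0][0]

if __name__ == "__main__":
    # Toy collocate table (made-up data for illustration only).
    table = {
        "farmer": ["agriculture"],
        "agriculture": ["counsel", "promote", "develop"],
        "counsel": ["agriculture"],
        "promote": ["agriculture"],
        "develop": [],
    }
    nodes, edges = dead_end_network("farmer", lambda w: table.get(w, []))
    print(sorted(nodes))
    print(hub(edges))  # prints "agriculture", not the starting word "farmer"
```

Because the vocabulary is finite and each word is queued at most once, the expansion always terminates, even when collocates point back at each other.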

Take the collocational network at the top of this post as an example. I was looking for the collocational network of 'farmer' (農民) in a set of official documents from the Council of Agriculture (農委會). The top five collocates, sorted by co-occurrence, are 'agriculture' (農業), 'counsel' (輔導), 'conduct' (辦理), 'promote' (推動), and 'develop' (發展). If the documents had been selected at random, these nodes would be expected to form the central part (the 'core') of the network. But these official documents lean heavily toward agricultural policy, which is why this dead-end network consists mostly of policy words.
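Ranking collocates "sorted by co-occurrence", as in the top-five list above, amounts to counting how often each word appears next to the node and keeping the most frequent ones. A minimal sketch, assuming a hypothetical input format of one list entry per co-occurrence event:

```python
from collections import Counter

def top_collocates(cooccurrences, k=5):
    """Return the k collocates with the highest raw co-occurrence counts.

    `cooccurrences` is a hypothetical input: an iterable with one entry
    per time a collocate appeared near the node word. Ties keep the
    order in which words were first seen (Counter.most_common behaviour).
    """
    return [word for word, _ in Counter(cooccurrences).most_common(k)]
```

Raw counts favour frequent words in general, so a significance measure is usually applied as well; the count-based ranking is simply the "sorted by co-occurrence" view.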

In Text Visualization for Competitive Intelligence, Camilla Magnusson argues that the collocational network method is useful for handling sequential texts. To test this, I split my documents into two collections by date: the first spans 1996 to 2002, and the second spans 2003 to 2009.
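Splitting the corpus into the two date ranges above is straightforward if each document carries a year. The sketch below assumes documents are `(year, text)` pairs, which is a hypothetical format; the year ranges are the ones used in this post and are treated as inclusive.

```python
def split_collections(docs, ranges=((1996, 2002), (2003, 2009))):
    """Group (year, text) pairs into one list per inclusive year range.

    Documents whose year falls outside every range are dropped.
    """
    collections = [[] for _ in ranges]
    for year, text in docs:
        for i, (start, end) in enumerate(ranges):
            if start <= year <= end:
                collections[i].append(text)
                break
    return collections
```

Each collection can then be run through the same collocational analysis separately, so the resulting networks are directly comparable.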

Figure 1: Collocational Network of Collection 1


Figure 2: Collocational Network of Collection 2

Something interesting did show up. Figure 1 remains simple compared with the network at the top of this post, but Figure 2 is much more complicated. Many topics come together in it: bird flu, foot-and-mouth disease, mudslides, and so on. Of course, these are not directly related to the word 'farmer', but Figure 2 does reveal the potential foci of the second collection.
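One rough way to list the "potential foci" of the second collection, rather than only eyeballing the two figures, is to compare the node sets of the two networks. This is a hypothetical helper, not a Concise feature; it assumes each network is available as a set of words.

```python
def compare_networks(nodes_1, nodes_2):
    """Return (gained, lost) between two collocational networks.

    gained: words that appear only in the second network.
    lost:   words that appear only in the first network.
    Both lists are sorted for stable output.
    """
    gained = sorted(set(nodes_2) - set(nodes_1))
    lost = sorted(set(nodes_1) - set(nodes_2))
    return gained, lost
```

Words in `gained` (for instance, topics like those visible in Figure 2) are candidates for what the later period focused on; they still need a human reading to decide whether they matter.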

Magnusson may be right: as a form of text visualization, the collocational network did show differences between the two collections. But does it work in the field of agriculture? Maybe! Maybe not! I have not tested it.


If you are interested, the latest version of Concise is available on SourceForge: https://sourceforge.net/projects/concise-text/files/
