
Concise 0.3.0 Release: Featuring the 'Dead-End' Collocational Network


As described in my previous article Collocation and Interactive Collocational Network, collocational networks are networks of words that co-occur in a statistically significant way in a text.  In Concise 0.2x, we introduced an interactive way to explore these co-occurrence relationships.  Now, Concise 0.3.x features a 'dead-end' collocational network.

The 'dead-end' collocational network gives a whole picture of the network around your starting 'core' word.  It keeps expanding the network until there is nothing left to add.  However, the 'core' word you start from is not necessarily the 'core' of its network.

Take the collocational network above, for instance.  I was looking for the collocational network of 'farmer' (農民) in some of the Council of Agriculture's (農委會) official documents.  The top five collocates (sorted by co-occurrence) are 'agriculture' (農業), 'counsel' (輔導), 'conduct' (辦理), 'promote' (推動), and 'develop' (發展).  These nodes would be expected to form the central part (the 'core') of the network if the documents were randomly selected.  But these official documents lean strongly toward agricultural policy, which is why this dead-end network consists mostly of policy words.
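The 'expand until nothing is left' idea can be sketched in a few lines of Python.  This is only an illustration, not how Concise itself is implemented: the function name `dead_end_network`, the `collocates_of` callback, and the toy word lists are all made up for the example, and the step that decides which collocates are statistically significant is assumed to have happened already.

```python
from collections import Counter, deque

def dead_end_network(seed, collocates_of):
    """Expand from `seed` until no unseen collocate remains.

    `collocates_of` is a hypothetical callback that returns the
    significant collocates of a word.
    """
    seen = {seed}
    edges = set()
    queue = deque([seed])
    while queue:
        word = queue.popleft()
        for other in collocates_of(word):
            if other == word:
                continue
            # Undirected edge, so (a, b) and (b, a) are stored once.
            edges.add(frozenset((word, other)))
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return seen, edges

# Toy data, invented for illustration only.
toy = {
    "farmer": ["agriculture", "counsel"],
    "agriculture": ["farmer", "policy", "develop"],
    "counsel": ["farmer", "policy"],
    "policy": ["agriculture", "counsel", "promote"],
    "develop": ["agriculture"],
    "promote": ["policy"],
}

nodes, edges = dead_end_network("farmer", lambda w: toy.get(w, []))

# Count how many links each word has in the finished network.
degree = Counter(w for edge in edges for w in edge)
for word, links in degree.most_common():
    print(word, links)
```

In this toy network the starting word 'farmer' ends up with two links, while 'agriculture' and 'policy' have three each, which is the same effect as above: the word you start from is not necessarily the most central node.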

Camilla Magnusson, in her Text Visualization for Competitive Intelligence, argues that the collocational network method is useful for handling sequential text.  To test her claim, I split my documents into two collections by time.  The first collection ranges from 1996 to 2002, and the second from 2003 to 2009.
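For readers who want to try the same kind of split on their own documents, here is a minimal sketch.  It assumes each document comes with a known year; the `split_by_period` helper, the collection names, and the placeholder documents are hypothetical, and only the two year ranges come from my actual setup.

```python
def split_by_period(docs, periods):
    """Group (year, text) pairs into named, inclusive year ranges."""
    buckets = {name: [] for name in periods}
    for year, text in docs:
        for name, (start, end) in periods.items():
            if start <= year <= end:
                buckets[name].append(text)
                break  # a document belongs to one collection at most
    return buckets

# The two ranges used in this post.
periods = {
    "collection_1": (1996, 2002),
    "collection_2": (2003, 2009),
}

# Placeholder documents, invented for illustration only.
docs = [
    (1996, "doc A"),
    (2002, "doc B"),
    (2003, "doc C"),
    (2009, "doc D"),
    (2010, "doc E"),  # outside both ranges, so it is left out
]

buckets = split_by_period(docs, periods)
```

Each resulting collection can then be loaded separately to build its own network.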

Figure 1: Collocational Network of Collection 1


Figure 2: Collocational Network of Collection 2

Something interesting did show up.  Figure 1 remains simple compared to the top figure, but Figure 2 is much more complicated.  Lots of things come together: bird flu, foot-and-mouth disease, mudslides...  Of course, they are not directly related to the word 'farmer'.  But Figure 2 does show the potential foci of the second collection.

Magnusson may be right.  The collocational network, as a text visualization, did show the differences.  But does it work in the field of agriculture?  Maybe!  Maybe not!  I have not tested it.


If you are interested, the latest Concise can be found at SourceForge: https://sourceforge.net/projects/concise-text/files/
