
Concise 0.3.0 Release: with 'Dead End' Collocational Network Feature


As suggested in my previous article, Collocation and Interactive Collocational Network, collocational networks are networks of words that co-occur in a statistically significant way in a text. In Concise 0.2.x, we introduced an interactive way to explore these co-occurrence relationships. Now, with Concise 0.3.x, a 'dead-end' collocational network is featured.

The 'dead-end' collocational network gives the whole picture around your chosen 'core' word. It keeps expanding the network outward from that word until there is nothing left to add. Note, however, that the 'core' word you start from is not necessarily the actual 'core' of its network.
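To make the idea of expanding "until nothing is left" concrete, here is a minimal sketch in Python. It is not Concise's actual implementation: the function names, the fixed word window, and the raw co-occurrence threshold (`min_count`) are all assumptions made for illustration, standing in for whatever significance measure Concise really uses.

```python
from collections import Counter, deque


def collocates(tokens, node, window=4, min_count=2):
    """Words that co-occur with `node` within `window` tokens at least
    `min_count` times. A raw count threshold is an assumption here; a real
    tool would likely use a statistical association measure instead."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] != node:
                counts[tokens[j]] += 1
    return {w for w, c in counts.items() if c >= min_count}


def dead_end_network(tokens, core, window=4, min_count=2):
    """Breadth-first expansion from `core`: keep adding the collocates of
    every newly found word until no new word appears (the 'dead end')."""
    seen = {core}
    edges = set()
    queue = deque([core])
    while queue:
        node = queue.popleft()
        for w in collocates(tokens, node, window, min_count):
            edges.add(frozenset((node, w)))
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen, edges
```

Calling `dead_end_network(tokens, "farmer")` on a tokenized corpus returns the set of words reached and the undirected links between them. Because the expansion follows links wherever they lead, the starting word can easily end up on the edge of the network rather than at its center.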

Take the collocational network shown at the top of this post, for instance. I was looking for the collocational network of 'farmer' (農民) among some of the Council of Agriculture's (農委會) official documents. The top five collocates (sorted by co-occurrence) are 'agriculture' (農業), 'counsel' (輔導), 'conduct' (辦理), 'promote' (推動), and 'develop' (發展). These nodes would be expected to form the central part (the 'core') of the network if the documents were randomly selected. But these official documents lean heavily toward agricultural policy, which is why this dead-end network consists mostly of policy words.

Camilla Magnusson, in her Text Visualization for Competitive Intelligence, argues that the collocational network method is useful for handling sequential text. To test her theory, I split my documents into two collections by date. The first collection ranges from 1996 to 2002, and the second from 2003 to 2009.
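The split itself is simple. As a rough sketch (not part of Concise; the function name `split_by_period` and the `(year, text)` input shape are assumptions for illustration), documents tagged with a year can be bucketed like this:

```python
def split_by_period(docs, periods):
    """Group (year, text) pairs into collections keyed by (start, end).

    `periods` is a list of inclusive (start_year, end_year) ranges;
    documents whose year falls outside every range are ignored.
    """
    collections = {period: [] for period in periods}
    for year, text in docs:
        for start, end in periods:
            if start <= year <= end:
                collections[(start, end)].append(text)
                break
    return collections


# Hypothetical usage with the two ranges used in this post:
# groups = split_by_period(docs, [(1996, 2002), (2003, 2009)])
```

Each resulting collection can then be loaded separately to build its own network.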

Figure 1: Collocational Network of Collection 1


Figure 2: Collocational Network of Collection 2

Something interesting did show up. Figure 1 remains simple compared to the top figure, but Figure 2 is much more complicated. Lots of things come together: bird flu, foot-and-mouth disease, mudslides... Of course, they are not directly related to the word 'farmer'. But Figure 2 does show the potential foci of the second collection.

Magnusson may be right. The collocational network, as a form of text visualization, did show the differences between the two periods. But does the method work for the field of agriculture in general? Maybe, maybe not. I have not tested it.


If you are interested, the latest Concise can be found at SourceForge: https://sourceforge.net/projects/concise-text/files/
