Friday, May 17, 2013

Finding Outliers with LIBSVM's One-Class SVM

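My current research requires identifying outliers in data samples, and since I have recently been learning about SVMs, I wanted to try LIBSVM's distribution estimation (one-class SVM) feature. Below are my notes on testing one-class SVM, along with the Matlab code.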

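To start, I build a trial dataset. It has two features, x and y. Both are Gaussian random variables with 200 values each. For each feature, the first 100 values are Gaussian random numbers with mean +2 and variance 1, and the last 100 values are Gaussian random numbers with mean -2 and variance 1.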
data_x = [ randn( 100, 1 ) + 2; randn( 100, 1 ) - 2 ] ;
data_y = [ randn( 100, 1 ) + 2; randn( 100, 1 ) - 2 ] ;
Putting these data together and normalizing them gives the instances used to train and test the model.
instance = [ data_x, data_y ] ;
[ norm_instance, min_val, max_val ] = normalize( instance ) ;
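The post does not show the `normalize` function itself. Judging only from its outputs (`min_val`, `max_val`), it presumably rescales each column to the range [0, 1]; that is an assumption, not something the post confirms. A minimal Python sketch of such column-wise min-max scaling, using plain lists and toy data:

```python
def normalize(rows):
    """Scale each column of `rows` to [0, 1]; return (scaled, min_val, max_val).

    Assumption: this mirrors what the post's Matlab `normalize` helper does.
    """
    columns = list(zip(*rows))
    min_val = [min(col) for col in columns]
    max_val = [max(col) for col in columns]
    scaled = []
    for row in rows:
        scaled_row = []
        for value, lo, hi in zip(row, min_val, max_val):
            span = hi - lo
            # A constant column has zero range; map it to 0.0 to avoid dividing by zero.
            scaled_row.append((value - lo) / span if span else 0.0)
        scaled.append(scaled_row)
    return scaled, min_val, max_val


# Toy data: three instances with two features each.
instance = [[2.0, 10.0], [4.0, 30.0], [3.0, 20.0]]
norm_instance, min_val, max_val = normalize(instance)
```

Keeping `min_val` and `max_val` matters because new samples must be scaled with the same values that were used for the training data.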
The labels corresponding to these instances are all 1.
label = ones( 200, 1 ) ;
After that, LIBSVM can be used to build the classification model.
svm_cmd = '-s 2 -t 2 -n 0.1 -g 12' ;
model = libsvmtrain64( label, norm_instance, svm_cmd ) ;
[ predicted, accuracy ] = libsvmpredict64( label, norm_instance, model ) ;
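Here the option "-s 2" selects one-class SVM and "-t 2" selects the RBF kernel, while "-n 0.1" (nu) and "-g 12" (gamma) are the parameters needed to build the model and can be tuned to the actual requirements. For example, if you expect outliers to make up no more than 10% of the data, look for an (n, g) pair that gives a classification accuracy of at least 90%. One way to search is brute force: simply try every combination. Here I start with n=0.1 and g=12. The result of the experiment is as follows: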
optimization finished, #iter = 92
obj = 38.822651, rho = 4.093761
nSV = 25, nBSV = 16
Accuracy = 90% (180/200) (classification)
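The brute-force search over (n, g) mentioned above could be organized roughly as follows. This is only a sketch in Python: `toy_inlier_rate` is a hypothetical placeholder with an arbitrary formula so that the example runs, and it would have to be replaced by a real LIBSVM train-and-predict call that returns the fraction of points classified as normal.

```python
from itertools import product


def grid_search(inlier_rate, nu_values, gamma_values, target=0.9):
    """Try every (nu, gamma) pair and keep those whose inlier rate meets the target."""
    hits = []
    for nu, gamma in product(nu_values, gamma_values):
        rate = inlier_rate(nu, gamma)
        if rate >= target:
            hits.append((nu, gamma, rate))
    return hits


def toy_inlier_rate(nu, gamma):
    """Hypothetical stand-in for 'train with -n nu -g gamma, then predict'.

    The formula is arbitrary and carries no modeling meaning; it exists only
    so the search loop has something to call.
    """
    return 1.0 - nu - 0.001 * gamma


candidates = grid_search(toy_inlier_rate, [0.05, 0.1, 0.2], [1, 12])
```

Collecting every pair that meets the target, rather than stopping at the first, leaves room to pick among them by another criterion such as the number of support vectors.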
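In the figure below, the green points are those the SVM judges to be normal, the red points are those it judges to be anomalous, and the blue region is the area that would be judged anomalous.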


That's it~ (LIBSVM is really handy!)

Monday, December 24, 2012

Presentation Content

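I have always felt that smooth talk is a bad thing. It seems that with enough of it you can turn black into white and white into black, make simple matters politically complicated to flatter those in power, and eventually lose yourself until you are neither man nor ghost (which reminds me of the evil eunuchs in period dramas). But after a few recent experiences presenting to other people, I have started to think that the right amount of it is a good thing. As long as it does not violate the engineer's principle of sticking to the facts and letting the data speak, good presentation technique can make work go more smoothly, or help you find more resources for the problem you are trying to solve. Below I list the presentation audiences and the content each presentation should have.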

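  • For colleagues:
    • Purpose: let everyone understand what problem you are solving so they can check your method for errors or omissions, and explain where else these techniques could be used in the future.
    • The presentation should cover: motivation, problem definition, principles of the method, implementation steps, performance of the method, future plans.
  • For junior managers:
    • Purpose: let them know what you are working on and whether the general direction is wrong; with luck they can offer ideas for commercial applications.
    • The presentation should cover: motivation, problem definition, performance of the method, scenario demo, future plans.
  • For senior executives:
    • Purpose: keep them happy, so that they do not interfere with too many opinions.
    • The presentation should cover: motivation, scenario demo, business benefits.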

Friday, December 21, 2012

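Plan 20121221

What I learned in the past is less and less sufficient, so I need to plan carefully what capital to build up to live on for the next ten years.

Knowledge and skills: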


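Linear algebra

Material: 周志成's linear algebra course
Time: Monday to Friday, 21:00~22:00 lectures, 23:00~24:00 thinking and exercises.
Expected to take 3 months (at least 9 weeks, at most 12 weeks) to complete
Expected result: stronger fundamentals.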

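Machine learning

Material: video lectures
Time: Saturday mornings, lectures plus thinking and exercises
Expected to take 3 months (at least 10 weeks, at most 12 weeks)
Expected result: a Swiss Army knife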

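Information theory

Material: video lectures
Time: Saturday afternoons for lectures, evenings for thinking and exercises
Expected to take 5 months (at least 16 weeks, at most 20 weeks)
Expected result: stronger fundamentals

Implementation and demos: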


JavaScript & HTML5

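Material: JavaScript 程式設計與應用 (JavaScript Programming and Applications)
Time: Monday to Friday, 18:00~19:00
Expected to take 3 months
Expected result: being able to build demos without relying on anyone else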

R

Material: R for Beginners
Time: growing day by day (what does that even mean?)
Expected result: a future replacement for Matlab

Saturday, November 17, 2012

Catching a Mouse in the Office

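Yes! Our office has a mouse, and it was even spotted scurrying around in broad daylight.
After it was spotted there was a moment of panic, and in the end nobody knew where it had gone. Later a colleague put a mousetrap cage on my desk, baited with bacon left over from his breakfast. I wrote a program that watches the cage and emails me as soon as anything unusual happens. I hope this mouse is brought to justice soon!
The mousetrap cage on my desk
The program compares how much the image changes between frames taken 0.1 seconds apart. Once the difference exceeds a preset threshold, it judges that something is wrong, captures the frame, and mails the photo to my inbox. However, suddenly switching the lights off or on is also treated as an anomaly; this could be improved by consulting a light meter on site (thanks to Y for the suggestion).
The instant the lights go off is flagged as an anomaly; everything is pitch black and nothing can be seen
After the lights are turned off at night, the webcam really cannot capture anything. I really should get an infrared camera, but night-vision cameras are all so expensive...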
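The frame comparison described above can be sketched as follows. This is not the original program: frames are represented as flattened lists of grayscale values, the threshold is a hypothetical number, and the brightness compensation is a software substitute for the light meter idea (it estimates brightness from the frame itself). It only cancels a roughly uniform brightness shift, so it would not help once the scene is completely black.

```python
def frame_difference(prev, curr):
    """Mean absolute pixel difference between two equally sized grayscale frames."""
    total = sum(abs(a - b) for a, b in zip(prev, curr))
    return total / len(curr)


def brightness_compensated_difference(prev, curr):
    """Same measure after subtracting each frame's mean brightness.

    A uniform change such as the lights dimming shifts every pixel by about
    the same amount, so removing the mean cancels most of it.
    """
    mean_prev = sum(prev) / len(prev)
    mean_curr = sum(curr) / len(curr)
    total = sum(abs((a - mean_prev) - (b - mean_curr)) for a, b in zip(prev, curr))
    return total / len(curr)


THRESHOLD = 10.0  # hypothetical value; tune it for the actual camera and scene

lights_on = [100, 120, 110, 130]  # toy frame
lights_dim = [20, 40, 30, 50]     # same scene, every pixel 80 darker
mouse = [100, 120, 10, 130]       # one pixel changed by something moving

plain_alarm = frame_difference(lights_on, lights_dim) > THRESHOLD
compensated_alarm = brightness_compensated_difference(lights_on, lights_dim) > THRESHOLD
mouse_alarm = brightness_compensated_difference(lights_on, mouse) > THRESHOLD
```

With these toy frames the plain difference raises an alarm on the lighting change, while the compensated version stays quiet for the lighting change and still reacts to the local change.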

Wednesday, November 14, 2012

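InSxgy's Survival Rate

I recently heard that InSxgy is about to ship ten thousand units, so evidently whoever is in charge thinks it still has prospects. I lack the vision and experience to say whether it will go anywhere; I just feel that the data collected through this system still has some interesting uses that have not been explored. But to keep exploring them properly requires rigorous preparation up front, otherwise users probably will not keep obediently playing guinea pig. Judging from the 500 units distributed earlier, lasting more than half a year would already count as progress.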

Tuesday, March 29, 2011

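Hymns at Taipei Mackay Hospital

Last week I had an appointment at Mackay Hospital in Taipei to see my test results, and a choir happened to be singing hymns in the hospital. I had not heard Taiwanese-language hymns sung live for a long time. Hearing them again in Taipei, the melodies were different but carried the same comfort; the God who is worthy of praise is as great as ever.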

Thursday, July 08, 2010

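From Matlab and Python to Jython

I have used Matlab for nearly fifteen years and it feels very natural to me. Unfortunately it is paid software made by a commercial company, and when giving demos or interfacing with other languages you often have to install extra programs to make it work. Python was the first replacement I considered. Like Matlab, it is interpreted, and its syntax is fairly simple. Its pitch of "once you have thought of the method, the program is written" is very attractive to lazy me.

Python has a rich set of packages: you can do math with Numpy, draw pretty plots with Matplotlib, build an executable with Py2exe to carry around, or use Portable Python to write code or give demos anywhere, which is very convenient. However, our team consistently uses Java for application development, so I had to give up the C-based Python and switch to Jython.

Perhaps because Jython has had less development time, it does not have as many extra packages as Python, but it interfaces easily with existing Java packages, which makes up for that. For plotting, I plan to use JFreeChart in place of Matplotlib. Numpy has no good substitute, though, so I may have to write my own or rely on Java packages.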

Monday, July 05, 2010

Installing Python 2.6.5 + Numpy 1.4.1 + Matplotlib 0.99.3 + Eclipse PyDev 1.5.9



Set up the environment by following these steps in order:


Install Python

Matplotlib currently only supports Python 2.6, so download Python 2.6.5.

Download: http://www.python.org/download/releases/


Install Numpy

Download: http://sourceforge.net/projects/numpy/files/

Install Scipy

Download: http://sourceforge.net/projects/scipy/files/

Install Matplotlib

Download: http://sourceforge.net/projects/matplotlib/files/matplotlib/matplotlib-0.99.3/

Install IPython

Download: http://ipython.scipy.org/moin/Download

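Install Eclipse and PyDev (editing environment)

Download Eclipse; after unzipping it, double-click the executable to launch it.

In Eclipse, go to "Help" => "Software Updates..." => "Available Software" => "Add Site..."

Add: http://pydev.org/updates

Download the PyDev plug-in, then follow the instructions to finish installing it.

Under "Window" => "Preferences" => "Pydev" => "Interpreter - Python", specify the location of the Python executable.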

Done!
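One way to confirm that the packages installed above can be imported by the configured interpreter is a short check script like the one below. It is only a sketch: the list of module names is taken from the steps in this post, and the script sticks to constructs that should also work on Python 2.6.

```python
import sys


def module_version(name):
    """Return the module's version string, 'unknown', or None if it cannot be imported."""
    try:
        module = __import__(name)
    except Exception:
        # Usually ImportError, but a broken install can fail in other ways too.
        return None
    return getattr(module, "__version__", "unknown")


for name in ("numpy", "scipy", "matplotlib", "IPython"):
    version = module_version(name)
    if version is None:
        sys.stdout.write("%-12s NOT FOUND\n" % name)
    else:
        sys.stdout.write("%-12s %s\n" % (name, version))
```

Running it from inside Eclipse as well as from the command line shows whether PyDev is pointing at the same Python installation.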

Sunday, April 25, 2010

Top 5 Web Trends of 2009: Internet of Things

Written by Richard MacManus / September 11, 2009
This week ReadWriteWeb is running a series of posts analyzing the 5 biggest Web trends of 2009. So far we've explored these trends: Structured Data, The Real-Time Web, Personalization, Mobile Web / Augmented Reality. The fifth and final part of our series is about the Internet of Things, when real world objects (such as fridges, lights and toasters) get connected to the Internet. In 2009, this trend has ramped up and is adding a significant amount of new data to the Web.
In this post we'll see how companies as big as IBM and as small as Pachube are building up this new world of Internet data and services.



What is The Internet of Things?

The Internet of Things is a network of Internet-enabled objects, together with web services that interact with these objects. Underlying the Internet of Things are technologies such as RFID (radio frequency identification), sensors, and smartphones.
The Internet fridge is probably the most oft-quoted example of what the Internet of Things will enable. Imagine a refrigerator that monitors the food inside it and notifies you when you're low on milk. It also perhaps monitors all of the best food websites, gathering recipes for your dinners and adding the ingredients automatically to your shopping list. This fridge knows what kinds of foods you like to eat, based on the ratings you have given to your dinners. Indeed the fridge helps you take care of your health, because it knows which foods are good for you.
However, we're not quite at that level of sophistication yet in the Internet of Things. As we discovered in our Internet Fridges State of the Market in July, current internet fridges are more about entertainment than utility.

IBM and The Internet of Things

One of the leading big companies in Internet of Things is IBM, which offers a range of RFID and sensor technology solutions. IBM has been busy working with various manufacturers and goods suppliers in recent months, to introduce those solutions to the world.
For example IBM announced a deal at the end of June with Danish transportation company Container Centralen. By February 2010, Container Centralen undertakes to use IBM sensor technology "to allow participants in the horticultural supply chain to track the progress of shipments as they move from growers to wholesalers and retailers across 40 countries in Europe." Specifically this refers to transportation of things like flowers and pot plants, which are very sensitive to the environment they travel in. Having sensors as part of the entire travel chain will allow participants to monitor conditions and climate during travel. Essentially it makes the travel process very transparent.

Pachube: Building a Platform for Internet-Enabled Environments

IBM is a leading bigco active in the Internet of Things. At the other end of the spectrum is a small UK startup which has impressed us a lot this year: Pachube. It was one of 5 Internet of Things services that we profiled in February and we followed up with an in-depth look at the service in May. Pachube (pronounced "PATCH-bay") lets you tag and share real time sensor data from objects, devices, buildings and environments both physical and virtual. In a blog post by Tish Shute, Pachube founder Usman Haque explained that Pachube is about "environments" more so than "sensors." In other words, Pachube aims to be responsive to and influence your environment - for example your home.

Conclusion

What's the point of all this new object data from the Internet of Things? As well as the new types of functionalities it will enable, such as health monitoring by Internet fridges, the sheer amount of new data about an object should lead to better quality goods and better decision-making by consumers. For example when you buy a loaf of bread from the grocery store, it will have its own RFID tag - which theoretically can tell you when it was produced, when it was packaged, how long it traveled to get to the store, whether the temperature during its travel was optimal, the pricing history of the product, what the precise ingredients are and associated health benefits (or dangers), and much more information.

That ends our look at the 5 biggest trends of the Web in 2009. First thing next week we will post a round-up, along with a downloadable presentation.
ReadWriteWeb's Top 5 Web Trends of 2009:
  1. Structured Data
  2. The Real-Time Web
  3. Personalization
  4. Mobile Web & Augmented Reality
  5. Internet of Things

Wednesday, April 14, 2010

King's College


The world is so big, and I am so small...
Where are my limits?

Tuesday, March 23, 2010

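Why Don't I Remember Any of This

Damn it!
I took Electric Machinery twice, so how do I not know even the most basic things? How on earth did I study back then (pounds the wall)!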

Monday, March 22, 2010

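LaTeX Test

Writing math expressions with LaTeX on Blogger.

http://watchmath.com/vlog/?p=438 (this page is now dead, so here is a note on how it is done)

  1. In [Layout], click [Add a Gadget]
  2. Choose the [HTML/JavaScript] module under [Basics]
  3. Paste the code described in that post into [Content] (the title can be left blank)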


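  • Done!

  • After that, just wrap each math expression in $ signs.


    Honestly it is quite handy, although it cannot render correctly in the editor preview; you only find out whether it is right after publishing.

    Test: $x^n+y^n=z^n$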


    Thursday, August 03, 2006

    2006/08/04 Today's Works

    [Works]
    1. Derive and verify the misclassification probability of the retransmission algorithm.
    2. Effective retransmission algorithm (derive effective cost function).
    [Note]

    These are difficult !!

    Wednesday, August 02, 2006

    2006/08/03 Today's Works

    [Works]

    1. Procedure for Registers. (not yet)
    2. Derive the effective redetection and retransmission algorithm. (50%)
    3. Revise the simulation programs for the retransmission algorithm. (AWGN channel ok!)

    Tuesday, July 18, 2006

    2006/07/19 Achievements and Future Works

    [Achievements]
    [Future Works]
    • Identify the validity of the simulation program (robust version).
    • The balanced-load retransmission mechanism is poor and weak. Improve it! (Considering the reliability of the channel and of the observations individually could solve the unbalanced-load problem beautifully.)
    • Huge mission: derive the misclassification probability of DCSD approach theoretically.

    Sunday, July 02, 2006

    2006/07/03 Achievements and Future Works

    [Achievements]

    • Simulations for evaluating the performance of deterministic local decision rules.
    • Simulations of NEW retransmission mechanism.
    [Future Works]
    • Evaluations of NEW retransmission mechanism (draft for WANS06').
    • Evaluation of Combination and Adaptive retx mechanism in math.
    • Simulating the faulty sensor condition in adaptive retransmission mechanism (draft for IJAHUC)

    Sunday, June 25, 2006

    2006/06/26 Achievements and Future Works

    [Achievements]

    • I have completed 80% of "Performances and Retransmission Times Comparisons of the Retransmission Mechanisms for Distributed Detection in Wireless Sensor Networks." (to be submitted to WANS06')

    [Future Works]
    • Simulations and evaluations of retransmission mechanism (draft for WANS06').
    • Evaluation of Combination and Adaptive retx mechanism in math.
    • Simulating the faulty sensor condition in adaptive retransmission mechanism (draft for IJAHUC)

    Saturday, June 10, 2006

    2006/06/11 Achievements and Future Works

    [Achievements of This Week]
    1. I gave a talk at AHUC2006 (TaiChung). The topic was "Adaptive Retransmission for Distributed Detection in Wireless Sensor Networks."
    2. Take a break in TaiChung ^^.

    [Future Works]
    1. Simulations: sensors that have all decision rules for one code matrix are considered, and the sensor asked to retransmit is chosen randomly with equal probability.
    2. Draft of MSN06'.
    3. Reading the paper "On the Reliability-Order-Based Decoding Algorithms for Binary Linear Block Codes."

    Wednesday, July 13, 2005

    Paper Review of PBPO

    Decentralized detection
    Tsitsiklis has shown that the computational complexity of determining the optimal strategy is exponential in the number of sensors.

    An algorithm for determining the decision thresholds in a distributed detection problem
    Tang et al. have addressed a number of issues, e.g., the effect of different starting points, variations in costs, number and quality of local detectors, etc.

    Optimization of detection networks. I. Tandem structures

    Optimization of detection networks. II. Tree structures
    Tang found that the PBPO algorithm was the most efficient but more sensitive to initial conditions when the number of sensors is large, whereas the other three algorithms were found to be more robust to initial conditions.