▎NTU Selected to Participate in the Amazon Alexa Prize
► Prof. Shou-De Lin, National Taiwan University
► Associate Prof. Yun-Nung (Vivian) Chen, National Taiwan University
The Amazon Alexa Prize gives university teams the opportunity to build technology for a real smart-speaker product used by real customers, and it focuses on enabling delightful and engaging conversations between humans and AI on research topics of interest to both industry and academia. Ten university teams (one from Asia, three from Europe, and six from the United States) have been selected for the first Alexa Prize TaskBot Challenge, the first conversational AI challenge to incorporate multimodal (voice and vision) customer experiences. The NTU team is the only one from Asia. Over the year-long challenge, the teams refine their systems based on feedback from real users, aiming to overcome the current limitations of smart speakers.
Success in the challenge requires the teams to address many difficult AI problems, from knowledge representation and inference and commonsense and causal reasoning to language understanding and generation, all of which call for the fusion of multiple AI techniques.
▲ The planned system pipeline.
▲ The NTU team, led by faculty advisor Yun-Nung (Vivian) Chen of the Department of Computer Science and Information Engineering.
▲ Illustration of a cooking task on the Amazon Echo Show.
This research is an outcome of a MOST AI project. For the project name, please refer to serial no. 21 in the List of MOST AI Projects in the Appendix.
▎Automatic Construction of a Large-Scale Chinese Knowledge Graph
► Associate Research Fellow Wei-Yun Ma, Academia Sinica
Language-related AI applications need knowledge in order to make inferences and even to explain them, so we set out to build a large-scale Chinese knowledge graph for this purpose. Taking our existing Chinese knowledge base E-HowNet, which covers about 90,000 words, as the backbone, we automatically mount millions of terms from Chinese Wikipedia onto it, yielding a Chinese knowledge graph of millions of words that contains both commonsense and domain knowledge.
We also developed a distinctive knowledge representation: the meaning of each word is represented by multiple concepts linked by relations, and each of those concepts can in turn be represented by more fundamental concepts and relations. This recursive representation makes logical inference straightforward.
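To make the recursive representation concrete, here is a minimal Python sketch over a hypothetical toy table. The entries, relation names such as `isa`, and the functions `primitives` and `is_a` are illustrative assumptions, not E-HowNet's actual format or API. Each term maps to (relation, concept) pairs, and concepts with no entry of their own are treated as primitives: decomposition follows every link down to primitives, while a simple inference follows only `isa` links transitively.

```python
from typing import Dict, List, Set, Tuple

Definition = List[Tuple[str, str]]  # list of (relation, concept) pairs

# Hypothetical toy entries in the spirit of a concept-based lexicon.
# Concepts without an entry are treated as fundamental (primitive) concepts.
TOY_KB: Dict[str, Definition] = {
    "papaya_milk": [("isa", "drink"), ("ingredient", "papaya"), ("ingredient", "milk")],
    "drink": [("isa", "food"), ("form", "liquid")],
    "milk": [("isa", "drink"), ("source", "cow")],
    "papaya": [("isa", "fruit")],
    "fruit": [("isa", "food")],
}


def primitives(term: str, kb: Dict[str, Definition]) -> Set[str]:
    """Recursively decompose `term` until only undefined (primitive) concepts remain."""
    result: Set[str] = set()
    stack, visited = [term], {term}
    while stack:
        current = stack.pop()
        for _, concept in kb.get(current, []):
            if concept in visited:
                continue
            visited.add(concept)
            if concept in kb:
                stack.append(concept)  # still decomposable: expand further
            else:
                result.add(concept)    # fundamental concept: stop here
    return result


def is_a(term: str, target: str, kb: Dict[str, Definition]) -> bool:
    """Infer hypernymy by following 'isa' links transitively."""
    stack, visited = [term], {term}
    while stack:
        current = stack.pop()
        for relation, concept in kb.get(current, []):
            if relation != "isa" or concept in visited:
                continue
            if concept == target:
                return True
            visited.add(concept)
            stack.append(concept)
    return False


if __name__ == "__main__":
    print(sorted(primitives("papaya_milk", TOY_KB)))  # ['cow', 'food', 'liquid']
    print(is_a("papaya_milk", "food", TOY_KB))        # True
```

Keeping the recursion in the data (concepts defined by other concepts) rather than in hand-written rules means that a newly mounted word inherits every inference available to the concepts it is defined with.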
References:
- Demo: https://reurl.cc/7rG781
- Download: https://reurl.cc/83E0m4
- Paper: https://reurl.cc/6DlxRV
▲ A Chinese knowledge graph of millions of words with commonsense and domain knowledge.
This research is an outcome of a MOST AI project. For the project name, please refer to serial no. 19 in the List of MOST AI Projects in the Appendix.
▎Robust Privatization with Non-Specific Tasks
► Associate Prof. I-Hsiang Wang, National Taiwan University
With the widespread use of big data analytics, data privacy has received great attention. To prevent privacy leakage during data collection, privatization is applied before the data are released. In the literature, most privatization schemes rely on knowing the task for which the released data will be used; in practice, however, this is often not the case.
In some applications, users do not know the purpose of the data collection. In this work, we develop a novel privatization scheme that is robust to non-specific tasks. It lets users protect their privacy locally, i.e., without the help of a data curator that may be untrustworthy. Part of our results has been published at ITW 2020.
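The idea of privatizing data locally, without relying on the curator and without reference to any downstream task, can be illustrated with classical randomized response. This is a textbook stand-in, not the robust privatization scheme developed in this work; the function names and the privacy parameter `epsilon` are illustrative assumptions. Each user perturbs their own bit before release, and the curator can still estimate population statistics from the noisy reports.

```python
import math
import random
from typing import List


def randomized_response(bit: int, epsilon: float, rng: random.Random) -> int:
    """Privatize one private bit on the user's side.

    The true bit is reported with probability e^eps / (1 + e^eps) and
    flipped otherwise, so the curator never sees the raw value.
    """
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < p_truth else 1 - bit


def estimate_fraction(reports: List[int], epsilon: float) -> float:
    """Curator side: unbiased estimate of the true fraction of 1s.

    E[observed] = (2p - 1) * f + (1 - p), which is solved for f.
    """
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


if __name__ == "__main__":
    rng = random.Random(0)
    true_bits = [1 if i % 10 < 3 else 0 for i in range(100_000)]  # 30% ones
    reports = [randomized_response(b, epsilon=1.0, rng=rng) for b in true_bits]
    print(f"estimated fraction: {estimate_fraction(reports, 1.0):.3f}")  # close to 0.3
```

A larger `epsilon` keeps more of the true signal but gives weaker privacy. Because this mechanism ignores how the data will be used, it offers no utility guarantee for any specific task, which is the gap that task-robust privatization aims to close.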
◀ Illustration of robust privatization with non-specific tasks.
◀ Performance comparison between robust privatization and task-specific privatization.
This research is an outcome of a MOST AI project. For the project name, please refer to serial no. 16 in the List of MOST AI Projects in the Appendix.
▎Adversarial Pixel Masking: A Defense Technique for Object Detectors
► Prof. Shan-Hung Wu, National Tsing Hua University
Deep neural networks (DNNs) are well known to be vulnerable to carefully crafted adversarial attacks. For example, a person carrying a computed adversarial patch can evade an object detector entirely. As DNNs are increasingly deployed in the real world, such security holes raise serious concerns for practical applications.
Our proposed Adversarial Pixel Masking detects the likely locations of adversarial attacks in an input image and protects the downstream object detector by removing the attack from the image before detection.
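The masking step can be sketched in NumPy as follows. This assumes a hypothetical `masking_net` that returns a per-pixel probability of belonging to an adversarial patch and a hypothetical `detector` callable; zeroing out flagged pixels (rather than, say, inpainting them) and the 0.5 threshold are also assumptions, not details of the published method.

```python
from typing import Callable

import numpy as np


def apply_pixel_mask(image: np.ndarray, mask_prob: np.ndarray,
                     threshold: float = 0.5) -> np.ndarray:
    """Remove pixels flagged as likely adversarial.

    image: H x W x C array; mask_prob: H x W array of per-pixel
    probabilities (in [0, 1]) that a pixel belongs to an adversarial patch.
    Flagged pixels are set to zero; all other pixels are left unchanged.
    """
    if image.shape[:2] != mask_prob.shape:
        raise ValueError("mask_prob must match the image's height and width")
    keep = (mask_prob <= threshold).astype(image.dtype)  # 1 = keep, 0 = remove
    return image * keep[..., np.newaxis]


def defended_detect(image: np.ndarray,
                    masking_net: Callable[[np.ndarray], np.ndarray],
                    detector: Callable[[np.ndarray], object],
                    threshold: float = 0.5):
    """Run the (hypothetical) masking network in front of the detector."""
    mask_prob = masking_net(image)
    return detector(apply_pixel_mask(image, mask_prob, threshold))
```

Because the masking network is prepended to the detector and only edits the input, the detector itself does not need to be retrained or modified in this sketch.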
▲ Figure 1: The Masking Net prepended to the object detector is the core idea of Adversarial Pixel Masking.
◀ Figure 2: The detection and protection mechanism of Adversarial Pixel Masking.
This research is an outcome of a MOST AI project. For the project name, please refer to serial no. 6 in the List of MOST AI Projects in the Appendix.
▎Automatic Data Denoising for Distantly Supervised Relation Extraction
► Associate Research Fellow Wei-Yun Ma, Academia Sinica
Building a large-scale knowledge base is crucial for information processing. Because the number of entities involved is extremely large, such a knowledge base cannot be built manually and must be constructed automatically. Distant supervision is one way to generate training data automatically, but it tends to produce false samples, especially false-negative (FN) samples.
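How false negatives arise can be shown with a toy version of the standard distant-supervision labeling rule: a sentence receives the relation stored in the knowledge base for its entity pair, and NA otherwise. The KB, the relation name `capital_of`, and the function `distant_label` are hypothetical. Because the KB is incomplete, a sentence that does express a relation ends up labeled NA.

```python
from typing import Dict, List, Tuple

# Hypothetical, deliberately incomplete KB: (head, tail) -> relation.
TOY_KB: Dict[Tuple[str, str], str] = {
    ("Paris", "France"): "capital_of",
    # ("Berlin", "Germany") is missing, as real KBs are incomplete.
}


def distant_label(samples: List[Tuple[str, str, str]],
                  kb: Dict[Tuple[str, str], str]) -> List[Tuple[str, str, str, str]]:
    """Label each (sentence, head, tail) with the KB relation of its entity
    pair, or 'NA' when the pair is absent from the KB."""
    return [(sentence, head, tail, kb.get((head, tail), "NA"))
            for sentence, head, tail in samples]


samples = [
    ("Paris is the capital of France.", "Paris", "France"),
    ("Berlin is the capital of Germany.", "Berlin", "Germany"),
]
for sentence, _, _, label in distant_label(samples, TOY_KB):
    print(label, "|", sentence)
# capital_of | Paris is the capital of France.
# NA | Berlin is the capital of Germany.   <- a false negative
```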
To address this problem, we propose H-FND, a hierarchical false-negative denoising framework that determines whether each non-relation (NA) training sentence should be kept, discarded, or revised during training. We conducted experiments on SemEval-2010 with the FN ratio set to several different levels. The results show that our approach can repair most of the FN samples and maintain stable F1 scores even when the FN ratio reaches 50%. This work was published in ACL 2021.
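H-FND learns its keep / discard / revise decisions during training. The sketch below only mirrors that hierarchical structure with a fixed confidence rule: a first level decides whether an NA label can be trusted, and a second level decides whether an untrusted sample is relabeled or dropped. The function name, the use of a relation classifier's probabilities as input, and both thresholds are assumptions for illustration, not the published model.

```python
from typing import Dict, Optional, Tuple


def triage_na_sample(probs: Dict[str, float],
                     keep_threshold: float = 0.5,
                     revise_threshold: float = 0.9) -> Tuple[str, Optional[str]]:
    """Two-level decision for one NA-labeled training sentence.

    probs: a relation classifier's probabilities over labels, including "NA".
    Level 1 decides whether the NA label is trustworthy (keep) or not;
    level 2 decides whether an untrustworthy sample is revised to the
    predicted relation or discarded. Thresholds are illustrative only.
    """
    best = max(probs, key=probs.get)
    # Level 1: keep the sample if the classifier agrees with NA or is unsure.
    if best == "NA" or probs[best] < keep_threshold:
        return ("keep", "NA")
    # Level 2: relabel when confident enough, otherwise drop the sample.
    if probs[best] >= revise_threshold:
        return ("revise", best)
    return ("discard", None)


if __name__ == "__main__":
    print(triage_na_sample({"NA": 0.8, "capital_of": 0.2}))    # ('keep', 'NA')
    print(triage_na_sample({"NA": 0.05, "capital_of": 0.95}))  # ('revise', 'capital_of')
    print(triage_na_sample({"NA": 0.3, "capital_of": 0.7}))    # ('discard', None)
```

Splitting the decision in two lets the uncertain middle ground be discarded rather than forced into a possibly wrong relation label, which is the intuition behind keeping "discard" as a separate option from "revise".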
▲ The hierarchical denoising framework.
This research is an outcome of a MOST AI project. For the project name, please refer to serial no. 19 in the List of MOST AI Projects in the Appendix.