Faculty Profile

Hai Hu (胡海), Tenure-Track Assistant Professor

Department: Department of English

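Education and Experience

Assistant Professor, Department of English, School of Foreign Languages, Shanghai Jiao Tong University (2021-present)

Ph.D. in Computational Linguistics, Indiana University Bloomington, USA (2015-2021)

M.A. in English Linguistics, Renmin University of China (2012-2015)

B.A. in English Language and Literature, Renmin University of China (2008-2012)

Exchange student, Department of Linguistics, University of Tübingen, Germany (2013-2014)

Personal homepage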

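Teaching and Research

Research interests:

Computational linguistics; natural language understanding; natural language inference; syntactic treebanks; corpus-based translation studies; digital humanities

Research projects:

  • Principal investigator, Ministry of Education Humanities and Social Sciences Youth Project (2022-present, ongoing)
  • Building a syntactic treebank of translated texts (2019-2022, ongoing): seed grant from the Renmin University of China–Indiana University Joint Funding Program
  • Original Chinese Natural Language Inference (OCNLI) corpus. link
  • Chinese Language Understanding Evaluation (CLUE) benchmark. link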

 

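Courses Taught

Shanghai Jiao Tong University: Academic English Writing; College English; English Viewing, Listening and Speaking; Language Intelligence

Indiana University: Introduction to Linguistics; Logic and Mathematics in Cognitive Science (teaching assistant)

Research areas and publications:


【Natural language understanding/natural language inference】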

 

In this line of research, I work on:

 

1) teaching computers to understand human language in the form of natural language inference (自然语言推理), employing both logic-based methods (monotonicity calculus) and deep learning methods (pre-trained language models such as BERT), in collaboration with logicians and computer scientists.

We ask questions such as:

If we know that "All students party on New Year's Eve" and that "Most students get drunk at every party", does it follow that "Most PhD students get drunk on New Year's Eve"? (The answer is given further down the page.)

In short, I use logic-based models or deep learning models (such as BERT) to teach computers to reason.

 

2) constructing benchmarks and infrastructure for evaluating NLP models, mainly in Chinese, partly with the aim of exposing the models' weaknesses on specific linguistic phenomena such as Chinese classifiers.

In other words, I build datasets to train, test, and toy with computational models!

 

  • Kalouli*, Aikaterini-Lida, Hai Hu*, Alexander F. Webb, Lawrence S. Moss, Valeria de Paiva. (accepted). Curing the SICK and other NLI maladies. Computational Linguistics (SSCI). *equal contributions.  paper. data.
  • Xu, Liang, Xiaojing Lu, Chenyang Yuan, Xuanwei Zhang, Huilin Xu, Hu Yuan, Guoao Wei, Pan Xiang, Xin Tian, Hai Hu. (2021). FewCLUE: A Chinese few-shot learning evaluation benchmark. arXiv preprint arXiv:2107.07498. paper. code.
  • Hu, Hai, He Zhou, Zuoyu Tian, Yiwen Zhang, Yina Ma, Yanting Li, Yixin Nie, Kyle Richardson. (2021). Investigating Transfer Learning in Multilingual Pre-trained Language Models through Chinese Natural Language Inference. In: Findings of ACL. paper. code.

  • Xu, Liang, Hai Hu, Xuanwei Zhang, Lu Li, Chenjie Cao, Yudong Li, Yechen Xu, Kai Sun, Dian Yu, Cong Yu, Yin Tian, Qianqian Dong, Weitang Liu, Bo Shi, Yiming Cui, Junyi Li, Jun Zeng, Rongzhao Wang, Weijian Xie, Yanting Li, Yina Patterson, Zuoyu Tian, Yiwen Zhang, He Zhou, Shaoweihua Liu, Zhe Zhao, Qipeng Zhao, Cong Yue, Xinrui Zhang, Zhengliang Yang, Kyle Richardson, and Zhenzhong Lan. (2020). CLUE: A Chinese Language Understanding Evaluation Benchmark. In Proceedings of the 28th International Conference on Computational Linguistics (COLING). pp. 4762–4772. paper. website. github page.

  • Hu, Hai, Kyle Richardson, Liang Xu, Lu Li, Sandra Kuebler, and Larry Moss. (2020). OCNLI: Original Chinese Natural Language Inference. In: Findings of the Association for Computational Linguistics: EMNLP 2020. pp. 3512–3526. paper. code and data. leaderboard.

  • Richardson, Kyle, Hai Hu, Larry Moss, and Ashish Sabharwal. (2020). Probing Natural Language Inference Models through Semantic Fragments. In: Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence. pp. 8713–8721. paper. code and data.

  • Hu, Hai, Qi Chen, Kyle Richardson, Atreyee Mukherjee, Lawrence S Moss, and Sandra Kuebler. (2020). MonaLog: a Lightweight System for Natural Language Inference Based on Monotonicity. In: Proceedings of the Society for Computation in Linguistics 2020. pp. 319–329. paper. poster. code.

  • Hu, Hai, Qi Chen and Larry Moss. (2019). Natural Language Inference with Monotonicity. In Proceedings of the 13th International Conference on Computational Semantics (IWCS 2019), pp. 8–15. Gothenburg, Sweden. paper.

  • Hu, Hai, and Lawrence S. Moss. (2018). Polarity Computations in Flexible Categorial Grammar. In Proceedings of the 7th Joint Conference on Lexical and Computational Semantics: *SEM, pp. 124–129. New Orleans, Louisiana, USA. paper. poster. code.


【Semantic change】

Here I work on detecting semantic change using word embeddings (word2vec, GloVe) in low-resource scenarios, e.g., medieval Spanish.

In short, I use word embeddings to detect which words have changed in meaning over time.

 

  • Amaral, Patrícia, Hai Hu and Sandra Kübler (accepted). “Tracing semantic change with distributional methods: The contexts of algo”. Diachronica. (SSCI).

  • Hu, Hai, Patrícia Amaral and Sandra Kübler (2022). “Word Embeddings and Semantic Shifts in Historical Spanish: Methodological Considerations”. Digital Scholarship in the Humanities. Volume 37, Issue 2, pp. 441–461. (SSCI). paper. code.


【Corpus-based translation studies/treebank construction for translated Chinese】

I am also interested in the morphological, syntactic and stylistic characteristics of translated Chinese (翻译汉语) and Europeanized Chinese (欧化汉语).

To this end, I 1) employ machine learning methods to study translations and 2) build treebanks (i.e., syntactically annotated corpora) to look into the syntactic features of translationese.

In short, I use machine learning methods and treebanks I build myself to study the features of translated texts.

 

  • Hu, Hai and Sandra Kübler. (2021). Investigating Translated Chinese and Its Variants Using Machine Learning. In Natural Language Engineering. Volume 27, Issue 3, May 2021, pp. 339–372. (SCI/SSCI/AHCI). paper. code.

  • Hu, Hai, Yanting Li, Yina Patterson, Zuoyu Tian, Yiwen Zhang, He Zhou, Sandra Kübler, and Chien-Jer Charles Lin (2020). “Building a Literary Treebank for Translation Studies in Chinese”. In: Proceedings of 19th International Workshop on Treebanks and Linguistic Theories (TLT). pp. 18-31. paper.

  • Hu, Hai, Wen Li, and Sandra Kübler. (2018). Detecting Syntactic Features of Translated Chinese. In Proceedings of the 2nd Workshop on Stylistic Variation, pp. 20–28. New Orleans, Louisiana, USA. paper. slides. video presentation.


【Others】

I'm a linguist, so I also collaborate with other linguists on very linguistic-y projects where computational modeling is sometimes used.

As a linguist, I also do some fun linguistic research, such as why people in Chengdu pronounce “吃饭” (chīfàn, "to eat a meal") as “吃fɛn”.

 

  • Lin, Chien-Jer Charles, & Hu, Hai. (2018). Linking comprehension and production: Frequency distribution of Chinese relative clauses in the Sinica Treebank. In Chu-Ren Huang, Shukai Hsieh, & Peng Jin (eds.) Text, Speech, and Language Technology Series. Springer. pp. 1-21.

  • Hu, Hai and Yiwen Zhang. (2017). Path of Vowel Raising in Chengdu Dialect of Mandarin. In Proceedings of the 29th North America Conference on Chinese Linguistics. Rutgers, NJ. pp. 481–498. paper. abstract.

 

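For the full list of publications, see: https://huhailinguist.github.io/publications/

Translations:

  • 《表象与本质——类比,思考之源和思维之火》 (Chinese translation of Surfaces and Essences: Analogy as the Fuel and Fire of Thinking), by Douglas Hofstadter (USA) and Emmanuel Sander (France); translated by Liu Jian (刘健), Hai Hu (胡海), and Chen Qi (陈祺); Zhejiang People's Publishing House; 2018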

 

(The answer to the inference question above is: no, it does not follow.)

Professional Service

Conference organization:

  • NAtural LOgic meets MAchine learning (NALOMA), a workshop at WeSSLLI 2020. webpage

Reviewing:

  • Computational linguistics conferences including ACL, EMNLP, NAACL, and CCL
  • Academic journals including Natural Language Engineering
