University of Illinois at Urbana-Champaign

Education

Dec. 2011 – Dec. 2014

University of Illinois at Urbana-Champaign | Urbana, IL
PhD in Electrical and Computer Engineering
Advisor: Prof. Mark Hasegawa-Johnson

Aug. 2009 – Dec. 2011

University of Illinois at Urbana-Champaign | Urbana, IL
Master in Electrical and Computer Engineering
Advisor: Prof. Mark Hasegawa-Johnson

Sep. 2005 – Jun. 2008

National Taiwan University | Taipei, Taiwan
Bachelor of Science in Electrical Engineering

Research Interests

I am interested in machine learning and signal processing, especially:

  1. Speech and audio processing
  2. Natural language processing

Selected Publications (Complete List)

  1. Po-Sen Huang
    Shallow and Deep Learning for Audio and Natural Language Processing
    Ph.D. dissertation, 2015 (PDF, Bibtex)
  2. Po-Sen Huang, Minje Kim, Mark Hasegawa-Johnson, Paris Smaragdis
    Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation
    IEEE/ACM Transactions on Audio, Speech, and Language Processing, Dec. 2015 (PDF, Bibtex, Codes)
  3. Po-Sen Huang, Minje Kim, Mark Hasegawa-Johnson, Paris Smaragdis
    Singing-Voice Separation from Monaural Recordings using Deep Recurrent Neural Networks
    Proc. of the International Society for Music Information Retrieval (ISMIR), 2014 (PDF, Bibtex, Codes)
  4. Po-Sen Huang, Haim Avron, Tara Sainath, Vikas Sindhwani, Bhuvana Ramabhadran
    Kernel Methods match Deep Neural Networks on TIMIT
    Proc. of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014. (PDF, Bibtex, Codes) [IBM Research Spoken Language Processing Student Grant]
  5. Po-Sen Huang, Minje Kim, Mark Hasegawa-Johnson, Paris Smaragdis
    Deep Learning for Monaural Speech Separation
    Proc. of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014. (PDF, Slides, Bibtex, Codes) [Starkey Signal Processing Research Student Grant]
  6. Po-Sen Huang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, Larry Heck
    Learning Deep Structured Semantic Models for Web Search using Clickthrough Data
    Proc. of the ACM International Conference on Information and Knowledge Management (CIKM), 2013. (PDF, Bibtex, Codes)
  7. Po-Sen Huang, Li Deng, Mark Hasegawa-Johnson, Xiaodong He
    Random Features for Kernel Deep Convex Network
    Proc. of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013. (PDF, Bibtex)
  8. Po-Sen Huang, Jianchao Yang, Mark Hasegawa-Johnson, Feng Liang, Thomas S. Huang
    Pooling Robust Shift-Invariant Sparse Representation of Acoustic Signals
    Proc. of Interspeech, 2012. (PDF, Bibtex)
  9. Po-Sen Huang, Scott Deeann Chen, Paris Smaragdis, Mark Hasegawa-Johnson
    Singing-Voice Separation From Monaural Recordings Using Robust Principal Component Analysis
    Proc. of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012 (PDF, Bibtex, Codes)

Software

Singing-Voice Separation From Monaural Recordings Using Robust Principal Component Analysis
Deep Learning for Monaural Source Separation

Honors & Awards

May 2014

IBM Research Spoken Language Processing Student Grant | ICASSP

  1. Research grant for the paper, "Kernel Methods match Deep Neural Networks on TIMIT"

May 2014

Starkey Signal Processing Research Student Grant | ICASSP

  1. Research grant for the paper, "Deep Learning for Monaural Speech Separation"

Apr. 2014

Yi-Min Wang and Pi-Yu Chung Endowed Research Award | ECE, UIUC

  1. Given to a doctoral graduate student who has demonstrated excellence in research

2012

Study Abroad Scholarship | Ministry of Education, Taiwan

  1. A two-year scholarship (32,000 USD) for Ph.D. study

2008

Graduation Representative | Department of Electrical Engineering, NTU

2008

Excellent Student in Engineering Award | Chinese Institute of Engineers

  1. Recommended by the department head and the president of National Taiwan University

2008

Dean's List | National Taiwan University

  1. Given to students who provide volunteer tutoring to other students

2008

Presidential Award | National Taiwan University

  1. Given to the top 5% of students in academic standing

Research Experience

Aug. 2009 – Jan. 2015

University of Illinois at Urbana-Champaign | Research Assistant

Project: Source Separation from Monaural Recordings

  1. Investigated the use of robust principal component analysis (RPCA) for unsupervised singing-voice separation tasks.
  2. Developed joint mask training and discriminative objectives with deep recurrent neural networks.
  3. Achieved state-of-the-art monaural source separation results.
  4. Reference: [ICASSP12a, ICASSP14a, ISMIR14, TASLP15].

Project: Opportunistic Sensing for Object and Activity Recognition

  1. Improved object classification accuracy by using POMDP and crowdsourcing models to discover and select sensing platforms.
  2. Developed multi-modal fusion techniques based on coupled-HMM and feature representation learning in audio-visual event detection tasks.
  3. Improved the robustness of recognition systems in adverse conditions.
  4. Reference: [MLSP12, ISPRS14, ICASSP11, Interspeech12]

Project: Multi-dialect Speech Recognition and Machine Translation for Qatari TV

  1. Developed a data-sharing transfer learning technique for multi-dialect Arabic ASR.
  2. Resolved data scarcity problems in dialectal Arabic ASR.
  3. Reference: [CITALA11]

May – Aug. 2014

Google | Software Engineering Intern
Mountain View, CA
Project: End-to-End Speech Recognition

Mentors: Thad Hughes and Oriol Vinyals


May – Aug. 2013

IBM T.J. Watson Research Center | Research Intern
Yorktown Heights, NY
Project: Shallow Kernel Machines vs. Deep Learning for Speech Recognition

  1. Developed an ensemble method and a scalable parallel solver to enhance the expressive power of kernel machines.
  2. Achieved competitive results compared to deep neural networks.
  3. Reference: [ICASSP14b]

Mentors: Dr. Tara N Sainath, Dr. Haim Avron, Dr. Vikas Sindhwani, and Dr. Bhuvana Ramabhadran


Jan. – May 2013

Microsoft Research | Research Intern
Redmond, WA
Project: Deep Learning for Web Search

  1. Proposed and implemented the deep structured semantic model (DSSM), including word hashing and discriminative training techniques using deep neural networks.
  2. Achieved significant NDCG gains in large-scale web search tasks.
  3. Reference: [CIKM13]

Mentors: Dr. Xiaodong He, Dr. Jianfeng Gao, Dr. Li Deng, Dr. Alex Acero, and Dr. Larry Heck


Jun. – Aug. 2012

Microsoft Research | Research Intern
Redmond, WA
Project: Deep Learning for Speech Confidence Estimation

  1. Investigated new word identity and score features with deep learning architectures.
  2. Developed a large-scale kernel deep convex network based on random Fourier features.
  3. Achieved a 48% relative performance gain for the speech confidence estimator.
  4. Reference: [ICASSP13a, ICASSP13b]

Mentors: Dr. Kshitiz Kumar, Dr. Yifan Gong, and Dr. Li Deng


Jun. – Aug. 2011

SRI International Sarnoff | Research Intern
Princeton, NJ
Project: Multimedia Event Detection

  1. Proposed a random forest based dictionary learning algorithm.
  2. Achieved state-of-the-art mean average precision (MAP) in the TRECVID 2011 multimedia event detection task.
  3. Reference: [ICASSP12b, ISM12, IJDEM12].

Mentor: Dr. Ajay Divakaran


Jun. – Aug. 2010

U.S. Army Research Lab | Research Intern
Adelphi, MD
Project: Multi-modal Sensor Fusion for Human Detection at Border Crossing

  1. Proposed and evaluated multi-sensory features with different phenomenologies.
  2. Improved the robustness of human detection systems.
  3. Reference: [HLVD11, Fusion11].

Mentor: Dr. Thyagaraju Damarla


Jul. 2007 – Jul. 2008

DSPIC Lab | Undergraduate Research Assistant
Taipei, Taiwan
Project: Block-Based Depth Stabilization for 3D Scene Reconstruction

  1. Developed a depth stabilization algorithm for depth reconstruction in 3D video sequences.
  2. Reference: [ICCE09].

Advisor: Prof. Liang-Gee Chen


Feb. 2007 – Jun. 2007

Speech Processing Laboratory | Undergraduate Research Assistant
Taipei, Taiwan
Project: Speaker Adaptation in Large Vocabulary Speech Recognition Systems

  1. Investigated speaker adaptation with MAP and MLLR criteria in a large vocabulary speech recognition system.

Advisor: Prof. Lin-shan Lee

Teaching Experience

Spring 2014

University of Illinois at Urbana-Champaign | Teaching Assistant
ECE 417: Multimedia Signal Processing (by Prof. Mark Hasegawa-Johnson)

Spring 2011

University of Illinois at Urbana-Champaign | Teaching Assistant
ECE 493: Advanced Engineering Mathematics (by Prof. Jont Allen)

Patent

A Deep Structured Semantic Model Produced Using Click-Through Data, US Patent, 2013

Talks

May 2014

Kernel Methods Match Deep Neural Networks on TIMIT, Google Brain

Nov. 2013

Learning Deep Structured Semantic Models for Web Search Using Clickthrough Data, DAIS seminar, UIUC

Aug. 2013

Shallow Kernel Machines vs. Deep Neural Networks, IBM Research

May 2013

Deep Learning for Web Search, Microsoft Research

Aug. 2012

Predicting Speech Recognition Confidence Using Deep Learning with Word Identity and Score Features, Microsoft

Jun. 2011

Improving Acoustic Event Detection Using Generalizable Visual Features and Multi-modality Modeling, SRI International Sarnoff

May 2011

Improving Acoustic Event Detection Using Generalizable Visual Features and Multi-modality Modeling, DSP seminar, UIUC

Aug. 2010

When Multi-Sensor Fusion Meets Acoustic Analysis, U.S. Army Research Lab

May 2010

Synchrony and Asynchrony Modeling for Audio-Visual Event Detection, Illinois Speech Day, TTIC

Academic Service

Reviewer for Interspeech, ICASSP, CBMI, EUSIPCO, ISMIR, Journal of the Acoustical Society of America, IEEE Signal Processing Letters, and IEEE Transactions on Multimedia

Coursework

  1. Statistical Learning
  2. Statistical Learning and Pattern Recognition
  3. Introduction to Optimization
  4. Detection and Estimation
  5. Information Theory
  6. Statistical Learning Theory
  7. Real Variable
  8. Real Analysis

Skill Sets

Experienced | C, C++
Knowledgeable | C#, HTML, CUDA, MPI, Perl, Python, shell script, MATLAB
Software | OpenCV, HTK, Sphinx, Kaldi