Representation learning

The success of data-driven solutions generally depends on the digital representation (digital twin) of the physical objects of interest. Representation learning is a family of algorithmic techniques used to generate such representations from raw data.

Team

Principal Investigator

PhD students

  • Yiang Deng
  • Samuel Jackson
  • Huw Jones
  • Yusuf Sulehman

Representation learning aims to extract refined information from raw data and to encode it in a compact representation with low memory usage. The representation must remain sufficiently accurate for further use in pattern recognition, prediction and exploration tasks.
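As a minimal illustration of this trade-off between memory usage and accuracy, the sketch below compresses synthetic data with principal component analysis (PCA) and measures how well the original data can be recovered. PCA is used here only as a familiar textbook example and is not a description of the group's own algorithms. The function names (`fit_pca`, `encode`, `decode`), the synthetic data and the choice of 8 dimensions are all assumptions made for illustration.

```python
import numpy as np

def fit_pca(X, k):
    """Return (mean, components) of a rank-k PCA; rows of X are samples."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def encode(X, mean, components):
    """Map raw samples to their k-dimensional representation."""
    return (X - mean) @ components.T

def decode(Z, mean, components):
    """Approximately reconstruct raw samples from the representation."""
    return Z @ components + mean

rng = np.random.default_rng(0)
# Hypothetical raw data: 500 samples in 64 dimensions that lie
# close to an 8-dimensional subspace, plus a little noise.
latent = rng.normal(size=(500, 8))
mixing = rng.normal(size=(8, 64))
X = latent @ mixing + 0.01 * rng.normal(size=(500, 64))

mean, components = fit_pca(X, k=8)
Z = encode(X, mean, components)
X_hat = decode(Z, mean, components)

# Ratio of values stored per sample before and after encoding.
compression = X.shape[1] / Z.shape[1]
# Relative reconstruction error in the Frobenius norm.
rel_error = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print(f"values per sample: {X.shape[1]} -> {Z.shape[1]}")
print(f"relative reconstruction error: {rel_error:.4f}")
```

The reconstruction error is small here only because the synthetic data were built to be nearly low-rank. Real data rarely have such simple structure, which is one reason non-linear and learned representations are of interest.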

We develop algorithms to compute representations for various types of data, such as:

  • locally structured data - e.g. images
  • sequential data - e.g. videos and signals
  • symbolic data - e.g. text, logic expressions
  • relation and graph data - e.g. knowledge bases, networks and ratings

Our algorithms build upon linear algebraic, geometric, probabilistic and neural / deep learning operations. We also aim to develop theoretical foundations that explain the effects and principles underlying our algorithmic approaches.

Our current representation learning projects are mainly focused on the following directions:

  1. To leverage multimodal data collected from different information sources and characterised by different feature views.
  2. To seek effective solutions to data deficiencies: large-scale, noisy data with high redundancy; missing data; small data; and inadequate labelling.
  3. To learn data representations in 2D or 3D space that facilitate data understanding through visual observation.
  4. To understand and mitigate the trade-off between model robustness and accuracy through both theoretical and empirical studies.
  5. To address, beyond general image, text and speech analytics, the challenges identified in science and engineering domains, such as multi-modality, high noise, missing information, data incompatibility, large scale and memory usage.
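To make the third direction concrete, the sketch below computes a 2D embedding with classical multidimensional scaling (MDS), which places points in the plane so that their pairwise Euclidean distances match those of the original data as closely as possible. Classical MDS is chosen only as a well-known baseline and is not presented as the group's method. The helper names (`pairwise_sq_dists`, `classical_mds`) and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def pairwise_sq_dists(X):
    """Squared Euclidean distances between all pairs of rows of X."""
    sq_norms = (X ** 2).sum(axis=1)
    return sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)

def classical_mds(X, dim=2):
    """Embed the rows of X in `dim` dimensions via classical MDS."""
    n = X.shape[0]
    D2 = pairwise_sq_dists(X)
    # Double centring turns squared distances into a Gram matrix.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D2 @ J
    # eigh returns eigenvalues in ascending order; keep the largest `dim`.
    eigvals, eigvecs = np.linalg.eigh(B)
    top = np.argsort(eigvals)[::-1][:dim]
    # Clip tiny negative eigenvalues caused by rounding before the sqrt.
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale

rng = np.random.default_rng(1)
# Hypothetical data: 100 points on a 2D plane embedded in 10 dimensions.
plane = rng.normal(size=(100, 2))
basis, _ = np.linalg.qr(rng.normal(size=(10, 2)))  # orthonormal columns
X = plane @ basis.T

Y = classical_mds(X, dim=2)
# Largest change in any squared pairwise distance after embedding.
distortion = np.abs(pairwise_sq_dists(X) - pairwise_sq_dists(Y)).max()
print(Y.shape, f"max squared-distance distortion: {distortion:.2e}")
```

Because these synthetic points already lie on a plane, the 2D embedding preserves their distances up to rounding. For data with genuinely higher-dimensional structure, some distortion is unavoidable, and deciding which structure to preserve becomes the central design choice.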