Dimensionality reduction in large datasets

Hello guys, this is the first post of this tech blog about our discoveries and tips while working on Meerkat’s products. Today we are going to talk about a problem we faced a couple of weeks ago: how to reduce high-dimensional data, and which dimensionality reduction technique to use.

First, let’s add a bit of context. We were working on facial recognition, extracting features from faces that would later be used for classification. When faced with this problem in Computer Vision, you usually have two alternatives: 1) you design your features to be very discriminant and compact, or 2) to hell with compact features, you apply dimensionality reduction afterwards to make it work. I might be exaggerating in alternative 2, but for facial recognition there is a nice paper about high-dimensional features [1] (with a very good name) which shows that very high-dimensional features can be used in practice. For that, you need to reduce them to a subspace that, while maintaining the discriminant power of the features, makes them easy to feed into a classifier.

Now we are faced with the choice of a dimensionality reduction technique. There are many methods for this, but Principal Component Analysis (PCA) is by far the most used, so let’s make some considerations about it. What PCA does is find a subspace that maximises the variance of the projected data. This is super cool, since in classification we want feature points that are far apart in the original space to remain far apart in the subspace. However, if you are in a supervised setting, it is smarter to use the labels to build the basis of the subspace, since we can then create a subspace in which the classes are distant from each other. That is what methods like Linear Discriminant Analysis (LDA) and Partial Least Squares (PLS) do: they aim to maximise the inter-class variance while minimising the intra-class variance (isn’t that clever?).
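To make the difference concrete, here is a minimal sketch (not the setup from the face-recognition work above; it assumes scikit-learn and uses its digits dataset and a k-NN classifier as stand-ins) that projects the same labeled data with unsupervised PCA and supervised LDA, then compares how a simple classifier does in the reduced 2-d space:

```python
# Illustrative sketch: PCA ignores the labels, LDA uses them.
# Dataset (digits) and classifier (k-NN) are stand-ins, not the paper's setup.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)  # 64-d features, 10 classes
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for name, reducer in [("PCA", PCA(n_components=2)),
                      ("LDA", LinearDiscriminantAnalysis(n_components=2))]:
    # Both reducers map the 64-d features down to 2-d; only LDA sees y.
    Z_tr = reducer.fit_transform(X_tr, y_tr)
    Z_te = reducer.transform(X_te)
    clf = KNeighborsClassifier().fit(Z_tr, y_tr)
    print(f"{name} (2-d): test accuracy = {clf.score(Z_te, y_te):.3f}")
```

In a toy run like this, the supervised projection usually keeps the classes better separated for the same number of dimensions, which is exactly the point of preferring LDA/PLS over PCA when labels are available.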

For instance, take a look at these plots extracted from the work of Schwartz et al. [2], which applies PLS dimensionality reduction to the problem of pedestrian detection:

[Figure: pca_vs_pls — PCA vs. PLS projections, from Schwartz et al. [2]]

There are also some important works that use LDA, PLS and derived techniques for face identification, but that will be the subject of another post. One thing that is not well addressed in the literature (to my knowledge) is what it says about your features when dimensionality reduction can be applied aggressively without any loss in the algorithm’s performance. What I mean is: if you can reduce a 100-d feature to a 2-d feature without losing much information, that might mean your original feature is not very discriminant to begin with! Why else would there be so much redundancy across the dimensions?
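One quick way to probe this (a rough sketch I am adding here, using a fabricated, deliberately redundant 100-d feature matrix rather than real face features) is to fit PCA and look at the cumulative explained variance of the first few components:

```python
# Sketch: check how much variance of a 100-d feature matrix is captured by
# just 2 principal components. The data is synthetic and intentionally
# redundant (100 dimensions driven by 2 latent factors plus a little noise).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
features = latent @ rng.normal(size=(2, 100)) + 0.01 * rng.normal(size=(500, 100))

cum = np.cumsum(PCA().fit(features).explained_variance_ratio_)
print("variance explained by the first 2 components:", round(float(cum[1]), 3))
# A value close to 1.0 means the 100 dimensions are largely redundant.
```

If your own descriptors behave like this, it is a hint that most of the dimensions are not adding discriminative information.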

References:

[1] – Chen, Dong, et al. “Blessing of dimensionality: High-dimensional feature and its efficient compression for face verification.” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.

[2] – Schwartz, William Robson, et al. “Human detection using partial least squares analysis.” IEEE 12th International Conference on Computer Vision (ICCV), 2009.

2 comments

  1. Ali Abusnina · June 30, 2015

    What exactly do you mean by:
    – inter-class variance
    – intra-class variance

    • meerkat · June 30, 2015

      Hello,

      Inter-class variance is, in a sense, the distance between the points of different classes: how far class A is from class B.

      Intra-class variance takes into account only one class: is it spread across the space, or tightly clustered?

      For instance:
      https://en.wikipedia.org/wiki/Intraclass_correlation
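
      A tiny numeric sketch of the two quantities (with illustrative synthetic data, nothing from the original discussion):

      ```python
      # Within-class (intra) scatter: spread inside each class around its own mean.
      # Between-class (inter) scatter: how far class means are from the global mean.
      import numpy as np

      rng = np.random.default_rng(0)
      A = rng.normal(loc=0.0, scale=0.5, size=(100, 2))  # class A: a tight cluster
      B = rng.normal(loc=3.0, scale=0.5, size=(100, 2))  # class B: far from class A

      global_mean = np.vstack([A, B]).mean(axis=0)
      intra = sum(((c - c.mean(axis=0)) ** 2).sum() for c in (A, B))
      inter = sum(len(c) * ((c.mean(axis=0) - global_mean) ** 2).sum() for c in (A, B))
      print(f"intra-class scatter = {intra:.1f}, inter-class scatter = {inter:.1f}")
      # LDA/PLS-style projections look for directions where inter is large and intra is small.
      ```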