Dimensions

Computers can only work with numerical data, and the same is true for machine learning models. Features need to be represented as tensors and mapped into a high-dimensional space for processing.

For example, a 28x28 grayscale image corresponds to a matrix of 28×28 = 784 values, where each value represents a pixel's intensity, from pure black (0) to pure white (255). For many models, this matrix is flattened into a single 784-dimensional vector.

```python
[[  0 128 255  87  92  14]
 [ 24  58 119 160  92  77]
 [210  45 200  56 183 134]
 ...
 [120  92 250  34 101  23]
 [255  35  76  12 141  67]
 [ 23 145 176  87  59 204]]
```
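As a minimal sketch with NumPy (the pixel values here are random placeholders, not real data), this is how a 28x28 matrix is flattened into a 784-dimensional vector:

```python
import numpy as np

# Hypothetical 28x28 grayscale image with random intensities in 0-255
image = np.random.randint(0, 256, size=(28, 28), dtype=np.uint8)

# Flatten the 28x28 matrix into a single 784-dimensional vector
flattened = image.reshape(-1)
print(flattened.shape)  # (784,)
```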

A colored image typically needs a third dimension to hold 3 values per pixel (i.e. one for each RGB channel).

```python
[[[  0 128 255] [ 24  58 119] [210  45 200] ...]
 [[160  92  77] [ 92  77 130] [ 56 183 134] ...]
 [[255  35  76] ...]
 ...]
```
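A quick way to see the extra channel dimension, again sketched with NumPy on random placeholder values:

```python
import numpy as np

# Hypothetical 28x28 RGB image: one (R, G, B) triplet per pixel
rgb_image = np.random.randint(0, 256, size=(28, 28, 3), dtype=np.uint8)
print(rgb_image.shape)  # (28, 28, 3)
print(rgb_image[0, 0])  # e.g. [  0 128 255], the R, G, B values of the top-left pixel
```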

Manifold hypothesis

The manifold hypothesis suggests that, within a high-dimensional space, data tends to lie on a lower-dimensional manifold (a curved surface) that is sufficient to capture its patterns and relationships.

For example, in a 28x28 image of a letter, not all 784 dimensions are necessary to recognize the character. Despite variations like handwriting style or case, a smaller number of dimensions is typically enough to capture the essential features and the underlying pattern.

Curse of dimensionality

The curse of dimensionality refers to the challenges that high-dimensional spaces create for data analysis, such as data sparsity, distances becoming less informative, and an increased risk of overfitting.
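One way to observe this is the concentration of distances: as the number of dimensions grows, the nearest and farthest points become almost equally far away. A small illustrative experiment (random uniform points and dimension counts chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)
for d in (2, 100, 10_000):
    points = rng.random((500, d))                      # 500 random points in d dimensions
    dists = np.linalg.norm(points[1:] - points[0], axis=1)
    print(d, round(dists.min() / dists.max(), 3))      # ratio approaches 1 as d grows
```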

Dimensionality reduction

Dimensionality reduction is the process of reducing the number of dimensions (i.e. features, variables) in a dataset while trying to preserve as much important information as possible.
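As a concrete sketch, principal component analysis (PCA) with scikit-learn on its built-in 8x8 digits dataset, reducing 64 dimensions to 10 (the component count is an arbitrary choice for illustration):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)          # shape (1797, 64): 8x8 images, flattened
pca = PCA(n_components=10)                   # keep the 10 main directions of variation
X_reduced = pca.fit_transform(X)             # shape (1797, 10)
print(X_reduced.shape)
print(pca.explained_variance_ratio_.sum())   # fraction of variance preserved
```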

Techniques