Dimensions
Computers can only work with numerical data, and the same is true for machine learning models: features must be represented as tensors and mapped into a high-dimensional space for processing.
For example, a 28x28 grayscale image is represented as a single 28x28 matrix of pixel intensities:
[[ 0 128 255 … 87 92 14]
[ 24 58 119 … 160 92 77]
[210 45 200 … 56 183 134]
…
[120 92 250 … 34 101 23]
[255 35 76 … 12 141 67]
[ 23 145 176 … 87 59 204]]
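As a quick sketch of this representation (using NumPy and random placeholder values rather than a real image), here is the matrix form and the flattening step many models expect:

```python
import numpy as np

# A 28x28 grayscale image: one intensity value (0-255) per pixel.
# Random placeholder values stand in for real pixel data.
image = np.random.randint(0, 256, size=(28, 28), dtype=np.uint8)
print(image.shape)  # (28, 28)

# Flattening turns the 2D grid into a single 784-dimensional vector,
# the input form a plain feed-forward network expects.
flat = image.reshape(-1)
print(flat.shape)   # (784,)
```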
A colored image typically needs a third dimension to store three values per pixel (i.e. one for each RGB channel):
[[[  0 128 255] [ 24  58 119] [210  45 200] …]
 [[160  92  77] [ 92  77 130] [ 56 183 134] …]
 [[255  35  76] …]
 …]
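A minimal NumPy sketch of the same idea, again with placeholder values:

```python
import numpy as np

# A 28x28 color image: three values per pixel, one per RGB channel.
color_image = np.random.randint(0, 256, size=(28, 28, 3), dtype=np.uint8)
print(color_image.shape)  # (28, 28, 3)

# Each channel is itself a full 28x28 matrix.
red, green, blue = color_image[..., 0], color_image[..., 1], color_image[..., 2]
print(red.shape)          # (28, 28)
```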
Manifold hypothesis
The manifold hypothesis suggests that, within a high-dimensional space, data tends to lie on a lower-dimensional manifold (a curved surface) that is sufficient to capture its patterns and relationships.
For example, in a 28x28 image of a letter, not all 784 dimensions are necessary to recognize the character. Despite variations such as handwriting style or letter case, a much smaller number of dimensions is typically enough to capture the essential features and recognize the underlying pattern.
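One way to make this concrete (a sketch using scikit-learn's swiss roll generator, not the letter example above) is a dataset whose points sit in 3D space but actually lie on a rolled-up 2D sheet:

```python
from sklearn.datasets import make_swiss_roll

# Points embedded in 3 ambient dimensions...
X, t = make_swiss_roll(n_samples=1000, noise=0.05)
print(X.shape)  # (1000, 3)

# ...but t parameterizes position along the roll, one of the two
# intrinsic coordinates of the underlying 2D manifold.
print(t.shape)  # (1000,)
```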
Curse of dimensionality
The curse of dimensionality refers to the challenges that high-dimensional spaces create for data analysis: data becomes sparse, distances between points lose their discriminative power, and models need far more samples to generalize, which increases the risk of overfitting.
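A small, self-contained experiment (using uniform random points, an assumption made purely for illustration) shows one symptom of this: as the number of dimensions grows, the nearest and farthest points become almost equally far away.

```python
import numpy as np

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    # 500 random points in a d-dimensional unit cube.
    points = rng.uniform(size=(500, d))
    # Distances from the first point to all the others.
    dists = np.linalg.norm(points - points[0], axis=1)[1:]
    print(f"d={d:5d}  min/max distance ratio: {dists.min() / dists.max():.3f}")
# The ratio approaches 1 in high dimensions: every point looks equally distant,
# so distance-based methods struggle to tell neighbors apart.
```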
Dimensionality reduction
Dimensionality reduction is the process of reducing the number of dimensions (i.e. features, variables) in a dataset while trying to preserve as much important information as possible.
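For instance, here is a minimal sketch using PCA from scikit-learn on its built-in 8x8 digits dataset (64 dimensions per image):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)  # X.shape == (1797, 64)

# Project the 64-dimensional images down to 10 dimensions.
pca = PCA(n_components=10).fit(X)
X_reduced = pca.transform(X)
print(X_reduced.shape)                      # (1797, 10)

# Despite the 6x reduction, these 10 components still retain
# well over half of the dataset's total variance.
print(pca.explained_variance_ratio_.sum())
```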