Big Data
AIML427
Manifold Learning/
Nonlinear Dimensionality Reduction (1)
Week 5:2
Dimensionality Reduction
[Figure: Original Dataset (N instances, #features = dimension D) → (Nonlinear) Dimensionality Reduction → “Embedding” (N instances, dimension d ≪ D)]
Week 5:3 How do we do the (NL)DR?
• We’ve discussed a lot of supervised methods
– Filter: information gain, mutual information, correlation, …
– Wrapper: Training a classifier given a feature subset
– Embedded: Decision Trees, Genetic Programming
• But what if you have no labels?
• Or what if you want a (filter) method that considers the entire
set of feature relationships?
– Pairwise redundancies feel “naïve”
– What about the topology of the data?
• Let us make an “assumption”: our data is actually
(intrinsically) lower-dimensional than the D features we have
– We say it lies on a lower-dimensional manifold (embedding)
Week 5:4 Swiss Rolls
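The Swiss roll is the classic example: a 2D sheet rolled up in 3D. A minimal sketch generating it with scikit-learn’s make_swiss_roll (the sample size and noise level are illustrative assumptions):

from sklearn.datasets import make_swiss_roll

# X: (1000, 3) points lying on a 2D sheet rolled up in 3D;
# t: each point's position along the roll (handy for colouring plots)
X, t = make_swiss_roll(n_samples=1000, noise=0.05, random_state=0)
print(X.shape)  # (1000, 3)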
Week 5:5…mathematically speaking
[Figure: a nonlinear transformation “unrolls” the 3D Swiss roll into 2D. Did we lose any structure?]
Week 5:6
Well…
• The red and blue points were close in Euclidean (straight-line) distance in 3D – but aren’t in 2D
• …but this is desirable, as there is a “gap” in the topology
• We have preserved the geodesic distance
– Geodesic distance = the shortest path along the manifold, approximated in practice as the shortest path through a neighbourhood graph (see the sketch below)
– E.g. the distance from Auckland to London flying (along the surface) vs through the Earth
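A minimal sketch of this approximation, assuming a k-nearest-neighbour graph with k = 10 (an illustrative choice) and Dijkstra shortest paths:

import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import kneighbors_graph

X, _ = make_swiss_roll(n_samples=500, random_state=0)

# Sparse graph: edges weighted by Euclidean distance to each point's
# 10 nearest neighbours
knn = kneighbors_graph(X, n_neighbors=10, mode="distance")

# All-pairs shortest paths through the graph approximate distances
# along the manifold, not through the "gap"
geodesic = shortest_path(knn, method="D", directed=False)

# For an arbitrary pair of points, the straight-line distance can be
# much shorter than the geodesic distance
print(np.linalg.norm(X[0] - X[42]), geodesic[0, 42])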
Week 5:7
Geodesic Distance
Week 5:8
Manifolds
• In mathematics, a manifold is a topological space that locally
resembles Euclidean space near each point.
– Small portions of a circle look like a line (see the sketch below)
– “Small” portions of the Earth look flat
– Topology ignores bending
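A minimal numerical sketch of “locally Euclidean”: points sampled from a small arc of a circle are almost perfectly explained by a single principal component, i.e. locally the 1D manifold looks like a line (the arc length chosen is an illustrative assumption):

import numpy as np
from sklearn.decomposition import PCA

theta = np.linspace(0.0, 0.1, 50)  # a small arc, in radians
arc = np.column_stack([np.cos(theta), np.sin(theta)])  # 2D points on the unit circle

# The first principal component captures essentially all the variance:
# the arc is (locally) a straight line
pca = PCA(n_components=2).fit(arc)
print(pca.explained_variance_ratio_)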
Week 5:9 I thought this was an AI class…
• It is!
• The key takeaway: a manifold is a d-dimensional object
“living” in a D-dimensional world.
• If we can find or approximate this manifold, we can
represent our data in d dimensions instead of D!
• We call this d-dimensional space an embedding – it is
embedded within the D-dimensional space.
Week 5:10
Manifold Learning
• We want to find a smaller embedding of our data
• Manifold learning (MaL): using machine learning to learn an
embedding
• Nonlinear dimensionality reduction (NLDR): a broader term:
any approach to reducing dimensionality through nonlinear
transformations
• In practice, the two terms are used pretty interchangeably
Week 5:11
Manifold Learning
• How do we know our data lies on a manifold?
• There are ways to estimate the intrinsic dimensionality of data (one simple estimate is sketched after this list)
• In practice, an approximation of the original topology is perfectly usable.
• As with PCA – if we retain the majority of the
variance/structure, we likely retain the important patterns
• Feature construction? Yes…and no. We’ll come back to that
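One simple (linear) estimate, as a minimal sketch: count the principal components needed to retain most of the variance. The 95% threshold is an illustrative assumption, and for curved manifolds like the Swiss roll this linear estimate can overshoot the true intrinsic dimension:

import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA

X, _ = make_swiss_roll(n_samples=1000, random_state=0)

pca = PCA().fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest d whose components retain at least 95% of the variance
d_est = int(np.searchsorted(cumulative, 0.95)) + 1
print(d_est)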
Week 5:12
“CLASSIC” MANIFOLD
LEARNING ALGORITHMS
Week 5:13
Multidimensional Scaling (MDS)
• Dates from as early as 1938 (!)
• Minimises the difference between the pairwise distances (between instances) in the high-dimensional vs the embedding space:

$$S = \sum_{k<l} \left[ d(k, l) - d^*(k, l) \right]^2$$

where $k$ and $l$ are two instances, $d(k, l)$ is the distance in the embedded space and $d^*(k, l)$ the distance in the high-dimensional space
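A minimal sketch using scikit-learn’s MDS, which minimises a stress objective of this form via the SMACOF algorithm (the sample size and d = 2 are illustrative assumptions):

from sklearn.datasets import make_swiss_roll
from sklearn.manifold import MDS

X, _ = make_swiss_roll(n_samples=300, random_state=0)

# n_components = d, the dimensionality of the embedding
mds = MDS(n_components=2, random_state=0)
X_embedded = mds.fit_transform(X)

print(X_embedded.shape)  # (300, 2)
print(mds.stress_)       # final value of the stress objective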