050326 commit (d352baa0) · Commits · Stefano Covino / TimeDomainAstrophysics

Lectures/Lecture - Singular Spectrum Analysis/Lecture-PCA.ipynb

0 → 100644

+176 −0

		%% Cell type:markdown id:91330533 tags:

		What is this?


		This Jupyter notebook is part of a collection of notebooks on various topics discussed during the Time Domain Astrophysics course delivered by Stefano Covino at the [Università dell'Insubria](https://www.uninsubria.eu/) in Como (Italy). Please direct questions and suggestions to [stefano.covino@inaf.it](mailto:stefano.covino@inaf.it).

		%% Cell type:markdown id:915ee876 tags:

		This is a `julia` notebook

		%% Cell type:code id:41d298ca-3a5c-457d-ab57-308a7de2d3c0 tags:

		``` julia
		import Pkg; Pkg.activate(".")
		```

		%% Output

		Activating project at `/mnt/chromeos/GoogleDrive/MyDrive/Teaching/Insubria/Docs_2025_26/Lectures/Lecture - Singular Spectrum Analysis`

		%% Cell type:code id:6fff2b26-f553-42dd-a7fd-292dbcdbfbed tags:

		``` julia
		Pkg.instantiate()
		```

		%% Output

		Installed FFMPEG_jll ─ v8.0.0+0
		Installed Unitful ──── v1.26.0
		Precompiling packages...
		1122.1 ms ✓ PDMats → StatsBaseExt
		1301.1 ms ✓ FillArrays → FillArraysPDMatsExt
		1649.1 ms ✓ libpng_jll
		2075.4 ms ✓ WeakRefStrings
		2587.3 ms ✓ ColorBrewer
		2704.7 ms ✓ Glib_jll
		2317.6 ms ✓ libsixel_jll
		4085.4 ms ✓ StringManipulation
		4437.4 ms ✓ ImageAxes
		5217.4 ms ✓ ComputePipeline
		3991.0 ms ✓ libwebp_jll
		3968.8 ms ✓ Cairo_jll
		4148.5 ms ✓ ImageMetadata
		2921.7 ms ✓ HarfBuzz_jll
		10086.6 ms ✓ IntervalArithmetic
		7038.1 ms ✓ Sixel
		6198.7 ms ✓ WebP
		2175.8 ms ✓ IntervalArithmetic → IntervalArithmeticIntervalSetsExt
		2708.1 ms ✓ IntervalArithmetic → IntervalArithmeticSparseArraysExt
		2645.3 ms ✓ IntervalArithmetic → IntervalArithmeticLinearAlgebraExt
		3651.0 ms ✓ libass_jll
		4919.6 ms ✓ Netpbm
		3941.6 ms ✓ Pango_jll
		13194.0 ms ✓ PNGFiles
		13787.1 ms ✓ Distributions
		3397.0 ms ✓ FFMPEG_jll
		2378.7 ms ✓ Distributions → DistributionsTestExt
		3755.0 ms ✓ Distributions → DistributionsChainRulesCoreExt
		5425.7 ms ✓ Cairo
		8423.4 ms ✓ ExactPredicates
		3322.3 ms ✓ KernelDensity
		22694.6 ms ✓ PlotUtils
		7260.2 ms ✓ DelaunayTriangulation
		34396.4 ms ✓ CSV
		37212.2 ms ✓ Unitful
		1254.2 ms ✓ Unitful → ConstructionBaseUnitfulExt
		1364.1 ms ✓ Unitful → PrintfExt
		1416.6 ms ✓ Unitful → InverseFunctionsUnitfulExt
		2894.7 ms ✓ Interpolations → InterpolationsUnitfulExt
		40614.6 ms ✓ PrettyTables
		43283.5 ms ✓ DataFrames
		156760.1 ms ✓ Makie
		52338.7 ms ✓ CairoMakie
		43 dependencies successfully precompiled in 252 seconds. 252 already precompiled.

		%% Cell type:code id:8fa2633c-c139-4196-8a1e-3273bfe990a4 tags:

		``` julia
		using CairoMakie
		```

		%% Cell type:markdown id:53194e25 tags:

		![Time Domain Astrophysics](Pics/TimeDomainBanner.jpg)

		%% Cell type:markdown id:a7b36f9a tags:

		# Principal Component Analysis (PCA)
		***

		- Principal component analysis (PCA) is a dimensionality reduction technique that transforms a data set into a set of orthogonal components, called principal components, which capture the maximum variance in the data.
		- PCA simplifies complex data sets while preserving their most important structures.

		%% Cell type:markdown id:8781bea3 tags:

		## What Are Principal Components?
		***

		- Principal components are new variables that are constructed as linear combinations or mixtures of the initial variables.
		- These combinations are done in such a way that the new variables (i.e., principal components) are uncorrelated and most of the information within the initial variables is squeezed or compressed into the first components.

		- For example, 10-dimensional data yields 10 principal components, but PCA puts as much information as possible into the first component, then as much of the remaining information as possible into the second, and so on, until something like the plot shown below is obtained:

		![PCAComps](Pics/PCAComponents.jpg)
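
The two properties described above (uncorrelated components, with the information concentrated in the first ones) can be checked numerically. The following is a minimal sketch, assuming that PCA is computed through the eigen-decomposition of the covariance matrix, which is one standard route and not necessarily the one adopted later in this course. The toy data and all variable names (`latent`, `X`, `W`, `scores`) are hypothetical.

```julia
using LinearAlgebra, Statistics, Random

Random.seed!(42)                      # arbitrary seed, for reproducibility only

# Hypothetical toy data: n observations (rows) of p = 3 variables (columns),
# two of which are strongly correlated through a common latent signal
n = 500
latent = randn(n)
X = hcat(latent .+ 0.1 .* randn(n),
         2 .* latent .+ 0.5 .* randn(n),
         randn(n))

Xc = X .- mean(X, dims=1)             # center each variable
C = cov(Xc)                           # p × p covariance matrix

# Eigen-decomposition of the symmetric covariance matrix.
# `eigen` returns the eigenvalues in ascending order, so they are re-sorted here.
F = eigen(Symmetric(C))
order = sortperm(F.values, rev=true)
variances = F.values[order]           # variance carried by each principal component
W = F.vectors[:, order]               # columns = principal directions

scores = Xc * W                       # the data expressed in the new variables

# The new variables are mutually uncorrelated: their covariance matrix is diagonal
Cs = cov(scores)
println("variances of the components: ", round.(variances, digits=3))
println("largest off-diagonal covariance: ", maximum(abs.(Cs - Diagonal(Cs))))
```

The sum of the component variances equals the total variance of the original variables (the trace of `C`): PCA redistributes the variance among the new axes without creating or destroying any.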

		%% Cell type:markdown id:c6d67b94 tags:

		- Organizing information in principal components in this way allows one to reduce dimensionality without losing much information, by discarding the components that carry little information and treating the remaining components as the new variables.

		- Principal components are often less interpretable and do not (necessarily) have any real physical meaning since they are constructed as linear combinations of the initial variables.

		- Geometrically speaking, principal components represent the directions of the data that explain a maximal amount of variance.

		> To put all this simply, just think of principal components as new axes that provide the best angle to see and evaluate the data, so that the differences between the observations are better visible.
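
As an illustration of the dimensionality reduction just described, here is a minimal sketch under stated assumptions: the data are synthetic (10 variables driven by 2 hidden signals), "information" is measured by the fraction of total variance, and the 99 % threshold is an arbitrary choice made only for this example. All names are hypothetical.

```julia
using LinearAlgebra, Statistics, Random

Random.seed!(1)                        # arbitrary seed, for reproducibility only

# Hypothetical data: 10 observed variables driven by only 2 latent signals plus small noise
n, p = 300, 10
S = randn(n, 2)                        # latent signals
A = randn(2, p)                        # arbitrary mixing matrix
X = S * A .+ 0.05 .* randn(n, p)

μ = mean(X, dims=1)
Xc = X .- μ                            # centered data

F = eigen(Symmetric(cov(Xc)))
order = sortperm(F.values, rev=true)   # largest variance first
λ = F.values[order]
W = F.vectors[:, order]

explained = cumsum(λ) ./ sum(λ)        # cumulative fraction of the total variance
k = findfirst(>=(0.99), explained)     # smallest number of components reaching the threshold

Z = Xc * W[:, 1:k]                     # reduced data: n × k instead of n × p
Xrec = Z * W[:, 1:k]' .+ μ             # back-projection onto the original variables

relerr = norm(X - Xrec) / norm(Xc)
println("components kept: ", k, " out of ", p)
println("relative reconstruction error: ", round(relerr, digits=4))
```

The squared relative reconstruction error equals the fraction of variance carried by the discarded components, so the threshold on `explained` directly controls how much is lost.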

		%% Cell type:markdown id:15b3a2a7-1cbc-4615-8c20-c8c35e5c92b7 tags:

		- For example, let us assume that the scatter plot of our data set is as shown below. The first principal component is approximately the line that matches the purple marks, because it goes through the origin and it is the line along which the projections of the points (red dots) are most spread out.

		- Or, mathematically speaking, it is the line that maximizes the variance, i.e., the average of the squared distances from the projected points (red dots) to the origin.

		![PCA1_2](Pics/PCA1_2.gif)

		- The second principal component is calculated in the same way, with the condition that it is uncorrelated with (i.e., perpendicular to) the first principal component and that it accounts for the next highest variance.

		- This continues until a total of $p$ principal components have been calculated, equal to the original number of variables.
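
The geometric picture above can be reproduced with a brute-force scan: rotate a line through the origin, compute the variance of the projected points at each angle, and keep the angle where it is largest. The sketch below assumes synthetic 2-D data and compares the scan with the leading eigenvector of the covariance matrix. The variance of the projections onto a unit vector $w$ is $w^T C w$, which is maximized by that eigenvector. All names (`projvar`, `θbest`, `w1`, `w2`) are hypothetical.

```julia
using LinearAlgebra, Statistics, Random

Random.seed!(7)                        # arbitrary seed, for reproducibility only

# Hypothetical 2-D cloud, elongated along a direction at about 30° from the x axis
n = 1000
θtrue = deg2rad(30)
R = [cos(θtrue) -sin(θtrue); sin(θtrue) cos(θtrue)]
X = (randn(n, 2) .* [3.0 0.5]) * R'
Xc = X .- mean(X, dims=1)

# Variance of the projections of the points onto the unit vector at angle θ
projvar(θ) = var(Xc * [cos(θ), sin(θ)])

# A direction and its opposite define the same line, so scanning [0, π] is enough
θs = range(0.0, stop=float(π), length=1801)
θbest = θs[argmax(projvar.(θs))]

# Compare with the eigenvectors of the covariance matrix
F = eigen(Symmetric(cov(Xc)))
w1 = F.vectors[:, end]                 # eigenvalues are ascending: the last one is the largest
w2 = F.vectors[:, 1]                   # second principal direction
θeig = mod(atan(w1[2], w1[1]), π)

println("angle from the scan:        ", round(rad2deg(θbest), digits=1), "°")
println("angle from the eigenvector: ", round(rad2deg(θeig), digits=1), "°")
println("w1 ⋅ w2 = ", round(dot(w1, w2), digits=12))
```

The scan is only meant to make the definition concrete. In more than two dimensions it quickly becomes impractical, which is why the eigenvector route is preferred.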

		%% Cell type:markdown id:6311910b-04ac-4c56-bdc5-b77191949ebb tags:

		## How Principal Component Analysis Works: 5 Steps
		***

		- Principal component analysis can be broken down into five steps. We’ll go through each step, providing logical explanations of what PCA is doing.

		### Step 1: Standardization and Centering Data
		***

		- The aim of this step is to standardize the range of the continuous initial variables so that each one of them contributes equally to the analysis.

		- More specifically, standardization prior to PCA is critical because PCA is quite sensitive to the variances of the initial variables. If there are large differences between the ranges of the initial variables, those with larger ranges will dominate over those with smaller ranges (for example, a variable that ranges between 0 and 100 will dominate over one that ranges between 0 and 1), which leads to biased results. Transforming the data to comparable scales prevents this problem.

		- Mathematically, this can be done by subtracting the mean and dividing by the standard deviation for each value of each variable. E.g., if $x$ is the considered variable, the standardized variable, $z$, turns out to be:

		$$ z = \frac{x - \langle x \rangle}{\sigma_x} $$
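
A minimal sketch of this standardization, assuming the observations are stored as rows and the variables as columns of a matrix. The numbers are made up for illustration, and the sample standard deviation (with $n - 1$ in the denominator, the default of `std`) is an assumption of this example.

```julia
using Statistics

# Hypothetical data: two variables on very different scales (rows = observations)
X = [ 12.0  0.31;
      55.0  0.72;
      98.0  0.15;
      40.0  0.90;
      73.0  0.48 ]

μ = mean(X, dims=1)                    # mean of each variable (column)
σ = std(X, dims=1)                     # sample standard deviation of each variable

Z = (X .- μ) ./ σ                      # z = (x - mean) / std, applied column by column

println("column means after standardization: ", round.(mean(Z, dims=1), digits=12))
println("column std after standardization:   ", std(Z, dims=1))
```

After this step every variable has zero mean and unit variance, so the covariance matrix of `Z` coincides with the correlation matrix of the original data, `cor(X)`.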

		%% Cell type:markdown id:0cd961f2 tags:

		### Credits
		***

		This notebook contains material obtained from https://towardsdatascience.com/a-proof-of-the-central-limit-theorem-8be40324da83.

		%% Cell type:markdown id:05e93b1d tags:

		## Course Flow
		***

		<table>
		<tr>
		<td>Previous lecture</td>
		<td>Next lecture</td>
		</tr>
		<tr>
		<td><a href="Lecture-StatisticsReminder.ipynb">Reminder of frequentist statistics</a></td>
		<td><a href="Lecture-StatisticsReminder.ipynb">Reminder of frequentist statistics</a></td>
		</tr>
		</table>


		%% Cell type:markdown id:591bd355 tags:

		Copyright

		This notebook is provided as [Open Educational Resource](https://en.wikipedia.org/wiki/Open_educational_resources). Feel free to use the notebook for your own purposes. The text is licensed under [Creative Commons Attribution 4.0](https://creativecommons.org/licenses/by/4.0/), the code of the examples, unless obtained from other properly quoted sources, under the [MIT license](https://opensource.org/licenses/MIT). Please attribute the work as follows: Stefano Covino, Time Domain Astrophysics - Lecture notes featuring computational examples, 2026.

Lectures/Lecture - Singular Spectrum Analysis/Pics/PCA1_2.gif

0 → 100644

+365 KiB


Lectures/Lecture - Singular Spectrum Analysis/Pics/PCAComponents.jpg

0 → 100644

+64 KiB


Lectures/Lecture - Spectral Analysis/Lecture-SpectralAnalysis.ipynb

+110 −48

File changed.

Preview size limit exceeded, changes collapsed.

Lectures/Lecture - Spectral Analysis/Pics/squarewave.jpg

deleted 100644 → 0

−107 KiB
