Commit d352baa0 authored by Stefano Covino

050326 commit

parent 002e6984
%% Cell type:markdown id:91330533 tags:

**What is this?**


*This jupyter notebook is part of a collection of notebooks on various topics discussed during the Time Domain Astrophysics course delivered by Stefano Covino at the [Università dell'Insubria](https://www.uninsubria.eu/) in Como (Italy). Please direct questions and suggestions to [stefano.covino@inaf.it](mailto:stefano.covino@inaf.it).*

%% Cell type:markdown id:915ee876 tags:

**This is a `julia` notebook**

%% Cell type:code id:41d298ca-3a5c-457d-ab57-308a7de2d3c0 tags:

``` julia
import Pkg; Pkg.activate(".")
```

%% Output

      Activating project at `/mnt/chromeos/GoogleDrive/MyDrive/Teaching/Insubria/Docs_2025_26/Lectures/Lecture - Singular Spectrum Analysis`

%% Cell type:code id:6fff2b26-f553-42dd-a7fd-292dbcdbfbed tags:

``` julia
Pkg.instantiate()
```

%% Output

       Installed FFMPEG_jll ─ v8.0.0+0
       Installed Unitful ──── v1.26.0
    Precompiling packages...
       1122.1 ms  ✓ PDMats → StatsBaseExt
       1301.1 ms  ✓ FillArrays → FillArraysPDMatsExt
       1649.1 ms  ✓ libpng_jll
       2075.4 ms  ✓ WeakRefStrings
       2587.3 ms  ✓ ColorBrewer
       2704.7 ms  ✓ Glib_jll
       2317.6 ms  ✓ libsixel_jll
       4085.4 ms  ✓ StringManipulation
       4437.4 ms  ✓ ImageAxes
       5217.4 ms  ✓ ComputePipeline
       3991.0 ms  ✓ libwebp_jll
       3968.8 ms  ✓ Cairo_jll
       4148.5 ms  ✓ ImageMetadata
       2921.7 ms  ✓ HarfBuzz_jll
      10086.6 ms  ✓ IntervalArithmetic
       7038.1 ms  ✓ Sixel
       6198.7 ms  ✓ WebP
       2175.8 ms  ✓ IntervalArithmetic → IntervalArithmeticIntervalSetsExt
       2708.1 ms  ✓ IntervalArithmetic → IntervalArithmeticSparseArraysExt
       2645.3 ms  ✓ IntervalArithmetic → IntervalArithmeticLinearAlgebraExt
       3651.0 ms  ✓ libass_jll
       4919.6 ms  ✓ Netpbm
       3941.6 ms  ✓ Pango_jll
      13194.0 ms  ✓ PNGFiles
      13787.1 ms  ✓ Distributions
       3397.0 ms  ✓ FFMPEG_jll
       2378.7 ms  ✓ Distributions → DistributionsTestExt
       3755.0 ms  ✓ Distributions → DistributionsChainRulesCoreExt
       5425.7 ms  ✓ Cairo
       8423.4 ms  ✓ ExactPredicates
       3322.3 ms  ✓ KernelDensity
      22694.6 ms  ✓ PlotUtils
       7260.2 ms  ✓ DelaunayTriangulation
      34396.4 ms  ✓ CSV
      37212.2 ms  ✓ Unitful
       1254.2 ms  ✓ Unitful → ConstructionBaseUnitfulExt
       1364.1 ms  ✓ Unitful → PrintfExt
       1416.6 ms  ✓ Unitful → InverseFunctionsUnitfulExt
       2894.7 ms  ✓ Interpolations → InterpolationsUnitfulExt
      40614.6 ms  ✓ PrettyTables
      43283.5 ms  ✓ DataFrames
     156760.1 ms  ✓ Makie
      52338.7 ms  ✓ CairoMakie
      43 dependencies successfully precompiled in 252 seconds. 252 already precompiled.

%% Cell type:code id:8fa2633c-c139-4196-8a1e-3273bfe990a4 tags:

``` julia
using CairoMakie
```

%% Cell type:markdown id:53194e25 tags:

![Time Domain Astrophysics](Pics/TimeDomainBanner.jpg)

%% Cell type:markdown id:a7b36f9a tags:

# Principal Component Analysis (PCA)
***

- Principal component analysis (PCA) is a dimensionality reduction technique that transforms a data set into a set of orthogonal components, called *principal components*, which capture the maximum variance in the data.
- PCA simplifies complex data sets while preserving their most important structures.

%% Cell type:markdown id:8781bea3 tags:

## What Are Principal Components?
***

- Principal components are new variables that are constructed as linear combinations or mixtures of the initial variables.
- These combinations are constructed so that the new variables (i.e., the principal components) are mutually uncorrelated and most of the information in the initial variables is compressed into the first few components.

- For example, 10-dimensional data yield 10 principal components, but PCA puts the maximum possible information (variance) into the first component, the maximum remaining information into the second, and so on, until the result looks like the plot below:

![PCAComps](Pics/PCAComponents.jpg)
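
As a minimal sketch of the two properties stated above (uncorrelated components, with variance concentrated in the first ones), the example below builds the components from the eigen-decomposition of the covariance matrix. The data set, the seed, and all variable names are made up for illustration, and only Julia standard libraries are used.

```julia
using LinearAlgebra, Statistics, Random

Random.seed!(42)                      # arbitrary seed, for reproducibility only

# Hypothetical data set: n observations (rows) of p = 3 variables,
# two of which are strongly correlated through a common latent signal
n = 500
latent = randn(n)
X = hcat(latent .+ 0.3 .* randn(n),
         2.0 .* latent .+ 0.5 .* randn(n),
         randn(n))

Xc = X .- mean(X, dims=1)             # center each column
C  = cov(Xc)                          # p × p sample covariance matrix

# Eigen-decomposition of the symmetric covariance matrix.
# Eigenvalues come out in ascending order, so we sort them in descending order.
F = eigen(Symmetric(C))
order = sortperm(F.values, rev=true)
λ = F.values[order]                   # variance carried by each component
W = F.vectors[:, order]               # columns = weights of the linear combinations

scores = Xc * W                       # principal components (one column each)
Cs = cov(scores)                      # expected to be diagonal, with λ on the diagonal

println("variances of the components: ", round.(λ, digits=3))
println("max |off-diagonal| covariance: ", maximum(abs.(Cs - Diagonal(diag(Cs)))))
```

Each column of `W` holds the weights of one linear combination of the initial variables, and the total variance is preserved: `sum(λ)` equals the trace of `C`.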

%% Cell type:markdown id:c6d67b94 tags:

- Organizing information into principal components allows one to reduce dimensionality without losing much information: the components carrying little information are discarded, and the remaining ones are used as the new variables.

- Principal components are often less interpretable than the initial variables and do not (necessarily) have any direct physical meaning, since they are constructed as linear combinations of the initial variables.

- Geometrically speaking, principal components represent the directions of the data that explain a maximal amount of variance.

> To put all this simply, just think of principal components as new axes that provide the best angle to see and evaluate the data, so that the differences between the observations are better visible.
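
The dimensionality reduction described above can be sketched as follows. This is an illustrative example under stated assumptions: the data are synthetic (five observed variables driven by two hidden signals), the mixing matrix `A` and the choice `k = 2` are arbitrary, and the decomposition is done with the standard-library `eigen` rather than with a dedicated PCA package.

```julia
using LinearAlgebra, Statistics, Random

Random.seed!(1)                       # arbitrary seed, for reproducibility only

# Hypothetical data: 5 observed variables driven by only 2 latent signals plus small noise
n, p = 400, 5
S = randn(n, 2)                       # latent signals
A = randn(2, p)                       # mixing matrix (made up)
X = S * A .+ 0.05 .* randn(n, p)

μ  = mean(X, dims=1)
Xc = X .- μ
F  = eigen(Symmetric(cov(Xc)))
order = sortperm(F.values, rev=true)
λ, W = F.values[order], F.vectors[:, order]

explained = cumsum(λ) ./ sum(λ)       # cumulative fraction of the total variance

k  = 2                                # number of components kept (a choice, not a rule)
Wk = W[:, 1:k]
Z  = Xc * Wk                          # reduced data: n × k instead of n × p
Xr = Z * Wk' .+ μ                     # back-projection to the original space

rel_err = norm(X - Xr) / norm(Xc)     # relative reconstruction error
println("cumulative explained variance: ", round.(explained, digits=4))
println("relative reconstruction error with k = $k: ", round(rel_err, digits=4))
```

The squared relative reconstruction error equals the fraction of variance carried by the discarded components, `1 - explained[k]`, which is what "discarding the components with low information" means quantitatively.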

%% Cell type:markdown id:15b3a2a7-1cbc-4615-8c20-c8c35e5c92b7 tags:

- For example, assume that the scatter plot of our (centered) data set is as shown below. The first principal component is approximately the line that matches the purple marks: it goes through the origin, and it is the line along which the projections of the points (red dots) are the most spread out.

- Mathematically speaking, it is the line that maximizes the variance of the projected points, i.e., the average of the squared distances from the projected points (red dots) to the origin.

![PCA1_2](Pics/PCA1_2.gif)

- The second principal component is calculated in the same way, with the condition that it is uncorrelated with (i.e., perpendicular to) the first principal component and that it accounts for the next highest variance.

- This continues until a total of $p$ principal components have been calculated, equal to the original number of variables.
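
The geometric picture above can be checked numerically with a brute-force search. The sketch below scans all directions through the origin, picks the one with the largest projected variance, and compares it with the leading eigenvector of the covariance matrix. The 2-D data cloud, its 30° orientation, and the helper names `R` and `projvar` are assumptions introduced only for this illustration.

```julia
using LinearAlgebra, Statistics, Random

Random.seed!(7)                       # arbitrary seed, for reproducibility only

# Hypothetical 2-D cloud, elongated roughly along the 30° direction
n = 1000
R(θ) = [cos(θ) -sin(θ); sin(θ) cos(θ)]                 # 2-D rotation matrix
X = (R(deg2rad(30)) * (Diagonal([3.0, 0.5]) * randn(2, n)))'   # n × 2
Xc = X .- mean(X, dims=1)                              # center the data

# Variance of the points projected onto the unit vector at angle θ
projvar(θ) = var(Xc * [cos(θ), sin(θ)])

θs = range(0, π, length=1801)         # 0.1° steps; directions are defined modulo 180°
θ1 = θs[argmax(projvar.(θs))]         # direction of maximum projected variance
θ2 = θ1 + π/2                         # perpendicular direction (second component)

# Compare with the leading eigenvector of the covariance matrix
F  = eigen(Symmetric(cov(Xc)))
v1 = F.vectors[:, end]                # eigenvector of the largest eigenvalue
θeig = mod(atan(v1[2], v1[1]), π)

println("brute-force angle : ", round(rad2deg(θ1), digits=1), "°")
println("eigenvector angle : ", round(rad2deg(θeig), digits=1), "°")
println("variance along PC1: ", round(projvar(θ1), digits=3),
        ", along PC2: ", round(projvar(θ2), digits=3))
```

In two dimensions the second component is fixed once the first is known, since it must be perpendicular to it. In $p$ dimensions the same search is repeated within the subspace orthogonal to the components already found.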

%% Cell type:markdown id:6311910b-04ac-4c56-bdc5-b77191949ebb tags:

## How Principal Component Analysis Works: 5 Steps
***

- Principal component analysis can be broken down into five steps. We’ll go through each step, providing logical explanations of what PCA is doing.

### Step 1: Standardization and Centering Data
***

- The aim of this step is to standardize the range of the continuous initial variables so that each one of them contributes equally to the analysis.

- More specifically, standardization prior to PCA is critical because PCA is quite sensitive to the variances of the initial variables. If the ranges of the initial variables differ widely, the variables with larger ranges will dominate over those with smaller ranges (for example, a variable that ranges between 0 and 100 will dominate over a variable that ranges between 0 and 1), which leads to biased results. Transforming the data to comparable scales prevents this problem.

- Mathematically, this is done by subtracting the mean and dividing by the standard deviation for each value of each variable. E.g., if $x$ is the variable under consideration, with mean $\langle x \rangle$ and standard deviation $\sigma_x$, the standardized variable, $z$, is:

$$ z = \frac{x - \langle x \rangle}{\sigma_x} $$
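
A minimal sketch of this step, and of why it matters, is given below. The two-column data set mirrors the 0 to 100 versus 0 to 1 example in the text; the helper names `standardize` and `pc1` are hypothetical and are not part of any package.

```julia
using LinearAlgebra, Statistics, Random

Random.seed!(3)                       # arbitrary seed, for reproducibility only

# Hypothetical data: two independent variables on very different scales
n = 1000
X = hcat(100 .* rand(n),              # ranges between 0 and 100
         rand(n))                     # ranges between 0 and 1

# z-score each column: subtract its mean, divide by its standard deviation
standardize(X) = (X .- mean(X, dims=1)) ./ std(X, dims=1)
Z = standardize(X)

# Weights of the first principal component = leading eigenvector of the covariance matrix
pc1(M) = eigen(Symmetric(cov(M))).vectors[:, end]

println("column means after standardization: ", round.(vec(mean(Z, dims=1)), digits=12))
println("column std   after standardization: ", round.(vec(std(Z, dims=1)), digits=12))
println("PC1 weights, raw data         : ", round.(pc1(X), digits=3))
println("PC1 weights, standardized data: ", round.(pc1(Z), digits=3))
```

With the raw data the first component is essentially the large-range variable alone, whereas after standardization both variables enter the first component with weights of equal magnitude.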

%% Cell type:markdown id:0cd961f2 tags:

### Credits
***

This notebook contains material adapted from https://towardsdatascience.com/a-proof-of-the-central-limit-theorem-8be40324da83.

%% Cell type:markdown id:05e93b1d tags:

## Course Flow
***

<table>
  <tr>
    <td>Previous lecture</td>
    <td>Next lecture</td>
  </tr>
  <tr>
    <td><a href="Lecture-StatisticsReminder.ipynb">Reminder of frequentist statistics</a></td>
    <td><a href="Lecture-StatisticsReminder.ipynb">Reminder of frequentist statistics</a></td>
  </tr>
 </table>


%% Cell type:markdown id:591bd355 tags:

**Copyright**

This notebook is provided as [Open Educational Resource](https://en.wikipedia.org/wiki/Open_educational_resources). Feel free to use the notebook for your own purposes. The text is licensed under [Creative Commons Attribution 4.0](https://creativecommons.org/licenses/by/4.0/), the code of the examples, unless obtained from other properly quoted sources, under the [MIT license](https://opensource.org/licenses/MIT). Please attribute the work as follows: *Stefano Covino, Time Domain Astrophysics - Lecture notes featuring computational examples, 2026*.