Unverified Commit a18a975f authored by Akke Viitanen's avatar Akke Viitanen
Browse files

rename install, remove trailing cells

parent eeba8ae6
Loading
Loading
Loading
Loading
Loading
+1 −1
Original line number Diff line number Diff line
@@ -66,7 +66,7 @@ demo
.venv/
venv/
doc/
notebooks/
#notebooks/
overleaf/
paper/
table/
+1 −1
Original line number Diff line number Diff line
@@ -54,6 +54,6 @@ Useful resources
   :hidden:

   self
   install
   installation
   userguide
   autoapi/index
+2 −2
Original line number Diff line number Diff line
Install
=======
Installation
============

Due to the modular nature, installation of AGILE is done in a few distinct
steps that are detailed here.
+0 −5
Original line number Diff line number Diff line
%% Cell type:markdown id:4ce8b6f0-0ff4-4fc8-a317-d9e303f590da tags:

# Create photometric catalogs

This notebooks demonstrates how to use the AGILE truth catalog in order to reduce the simulated images and create photometric catalogs using the LSST Science pipelines

We use imSim (https://lsstdesc.org/imSim/index.html) to simulate the images. The purpose of this notebook is not to doucment the feature of imSim, but only to demonstrate how to pass our truth catalog as input to imSim, and how to run software using the tools available in AGILE. Note that imSim only simulates raw LSSTCam data (incl. e.g. sky and instrumental noise), which are not science ready products. The raw images are processed in subsequent steps by using the LSST Science pipelines.

Below, we provide helpful links and resources to further understand the LSST Science Pipelines and the butler:

+ LSST Science Pipelines: https://pipelines.lsst.io/
+ butler: https://pipelines.lsst.io/modules/lsst.daf.butler/index.html
+ DRP: https://github.com/lsst/drp_pipe
+ DP0.2: https://dp0-2.lsst.io/data-products-dp0-2/index.html
+ DP1: https://dp1.lsst.io/

The following notebook may be accessed [here](https://www.ict.inaf.it/gitlab/akke.viitanen/lsst_inaf_agile/-/blob/a619e4244f56e731d11724f8a70dd5e7d6e1fda4/docs/notebooks/create_photometric_catalog.ipynb)

%% Cell type:code id:ae4ce30f-f883-4878-a3f8-8ed90ee112bb tags:

``` python
import os

if not os.path.exists("src/lsst_inaf_agile"):
    os.chdir("../../")
    os.getcwd()
dirname = "data/tests/test_agile"
```

%% Cell type:markdown id:073add4f-0080-4f7d-a458-209e2bfdc50a tags:

## Make a butler repository

%% Cell type:code id:03466797-0fb3-4d25-963f-c12b2468f427 tags:

``` python
repo = f"{dirname}/repo"
print(f"Name of repository is '{repo}'")
```

%% Cell type:code id:29666986-d87c-4b43-8ffe-4af60a122c74 tags:

``` python
%%file src/scripts/make_repo.sh
#!/bin/bash -ex
if [ $# -ne 1 ] ; then
	echo "Initialize a repository"
	echo "Usage: $0 [repo]"
	exit 1
fi

REPO=$1
REPO_ASSET_DIR=/beegfs/AGILE/Pleiadi_at_IRA/pleia15/lsst/repo_assets
CALIBS=${REPO_ASSET_DIR}/calib_imSim2.0_w_2023_04

butler create ${REPO}
butler register-instrument ${REPO} lsst.obs.lsst.LsstCam
butler import --export-file "${CALIBS}/export.yaml" \
       --skip-dimensions 'instrument,detector,physical_filter,band' \
       --transfer direct ${REPO} ${CALIBS}
butler write-curated-calibrations ${REPO} 'LSSTCam'
butler register-skymap --config-file ${REPO_ASSET_DIR}/DC2_skymap_config.py ${REPO}
cp ${REPO}/gen3.sqlite3 ${REPO}/gen3.sqlite3_orig
```

%% Cell type:code id:a0fc5980-1dc8-455d-a972-1a97273a2615 tags:

``` python
os.system(f"sh src/scripts/make_repo.sh {repo}")
```

%% Cell type:markdown id:2db244d6-5770-4c12-904e-7e20d1989171 tags:

## Ingest files into the repository

%% Cell type:code id:e27db403-c026-46bc-80d4-ca29dd4ea868 tags:

``` python
%%file src/scripts/ingest.sh
#!/bin/bash -x

if [ $# -lt 2 ] ; then
	echo "Usage: $0 [repo] (raws)"
	exit 1
fi

repo=$1
shift
raws="$@"

butler ingest-raws --transfer direct ${repo} ${raws}
butler define-visits --collections LSSTCam/raw/all ${repo} LSSTCam
butler collection-chain ${repo} LSSTCam/defaults 'LSSTCam/raw/all,LSSTCam/calib,u/jchiang/calib_imSim2.0_w_2023_04,refcats,skymaps'
```

%% Cell type:code id:b45f0700-af53-4181-95f2-b4bf4b3e0a4e tags:

``` python
import glob

raws = " ".join(glob.glob(f"{dirname}/imsim/**/amp*.fits.fz", recursive=True))
print(raws)
```

%% Cell type:code id:82461ca7-eb77-4453-82fd-341edf414392 tags:

``` python
os.system(f"sh src/scripts/ingest.sh {repo} {raws}")
```

%% Cell type:markdown id:d4c75520-2a98-4812-947e-d2d95264fd65 tags:

## Run a pipeline processing task

%% Cell type:code id:2546962b-8e97-4aad-a519-aae2f7092656 tags:

``` python
%%file src/scripts/pipetask.sh
#!/bin/sh -ex

###############################################################################
# Run an LSST pipeline task
###############################################################################

if [ $# -lt 2 ] ; then
	echo "Usage: $0 [repo] [task] (task) ..."
	exit 1
fi

# Handles input arguments
REPO=$1
shift
TASKS="$@"
FLAGS="-j 10 -b $REPO -i LSSTCam/defaults -o u/agile/output --register-dataset-types --clobber-outputs --skip-existing-in u/agile/output"

for TASK in $TASKS ; do
	FLAGS1=$FLAGS
	[ $TASK = "calibrate" ] || [ $TASK = "step1" ] && FLAGS1="$FLAGS -C calibrate:etc/cmodel.py"
	pipetask --log-level=DEBUG run $FLAGS1 -p "etc/my_DRP2.yaml#$TASK"
done
exit $?
```

%% Cell type:code id:eeb74500-6291-458a-ae94-24becfd97aef tags:

``` python
# run step1 from etc/my_DRP2.yaml. See the file for the definition of this step.
os.system(f"sh src/scripts/pipetask.sh {repo} step1")
```

%% Cell type:markdown id:f621dec2-9c5f-493d-8b47-4fbe41cf6a28 tags:

## Examine at the output products

%% Cell type:code id:e9eb56ff-43e8-4f86-9e9e-982e6f6bfd95 tags:

``` python
from lsst.daf.butler import Butler

butler = Butler(f"{repo}", collections="u/agile/output")
# do a butler query
```

%% Cell type:markdown id:a7b8f30f-78a2-45e0-822e-97a4d5931b6a tags:

## A note on large-scale production

Butler repositories have a limitation of working only in single-threaded applications. To circumvent this, it is suggested that any large-scale production does not call pipetask and/or butler directly, but through some driver software instead. BPS is one solution for these, but others exist, as well. Refer to: https://pipelines.lsst.io/modules/lsst.ctrl.bps/quickstart.html and references therein.

%% Cell type:code id:a1b131c0-e579-4738-8775-c4eaec84d6d9 tags:

``` python
```
+0 −5
Original line number Diff line number Diff line
%% Cell type:markdown id:70b92b7a-6039-4468-bf49-87dc87c2d965 tags:

# Create an example AGILE truth catalog

This notebooks demonstrates the basic usage of the truth catalog library, and generates an extermely small truth catalog for testing purposes.

In AGILE, each individual class of source (e.g. AGN, galaxy, star) is first contained in its own catalog. The final step in the truth catalog creation is to simply combine these individual source catalogs into one combined catalog that can be used for future image simulations.

The different classes of objects considered are:

1. Galaxies
2. AGNs
3. Stars
4. Binary stars


The following notebook may be accessed [here](https://www.ict.inaf.it/gitlab/akke.viitanen/lsst_inaf_agile/-/blob/a619e4244f56e731d11724f8a70dd5e7d6e1fda4/docs/notebooks/create_truth_catalog.ipynb)

%% Cell type:markdown id:3cbfa609-8950-474a-97f0-9fcefb71c103 tags:

## Initialize

%% Cell type:code id:9759604f-8be1-4a5d-b63a-88da53fab067 tags:

``` python
import os

if not os.path.exists("src/lsst_inaf_agile"):
    os.chdir("../../")
    os.getcwd()
dirname = "data/tests/test_agile"
```

%% Cell type:code id:a01c8b5b-8e8e-409d-94c2-185806a70646 tags:

``` python
import logging
import numpy as np
from lsst_inaf_agile.catalog_agn import CatalogAGN
from lsst_inaf_agile.catalog_combined import CatalogCombined
from lsst_inaf_agile.catalog_galaxy import CatalogGalaxy
from lsst_inaf_agile.catalog_star import CatalogStar
from lsst_inaf_agile.egg import Egg
from lsst_inaf_agile.image_simulator import ImageSimulator
from lsst_inaf_agile.merloni2014 import Merloni2014
from lsst_inaf_agile import util
```

%% Cell type:markdown id:897ccf7a-3bdc-4ae6-943d-20a7eb2b6c9e tags:

## Create the EGG galaxy catalog

%% Cell type:code id:37224123-c322-4cc8-beab-fe8d56d3455b tags:

``` python
# The following list of keyword arguments may be expanded on.
# Look into 'egg-gencat help'for all the available EGG arguments.
egg_kwargs = Egg.get_example_egg_kwargs(dirname + "/egg.fits")
egg_kwargs
```

%% Cell type:code id:e004eed5-a555-4307-80c2-ebe8460d4ae0 tags:

``` python
egg = Egg(egg_kwargs)
egg.run()
```

%% Cell type:code id:62685aff-dd9b-4054-92b6-5bf4f3e6cb6c tags:

``` python
catalog_egg = Egg.read(egg_kwargs["out"])
```

%% Cell type:code id:cb439284-3a83-45d6-8727-ee749892eb03 tags:

``` python
# Convert to the AGILE galaxy catalog class
catalog_galaxy = CatalogGalaxy(dirname, catalog_egg)
```

%% Cell type:markdown id:f0634979-f64c-40af-ac05-b8ab7095e39d tags:

## Create the AGN catalog

%% Cell type:code id:4f39b065-f97b-4aa7-9f59-8ed05794b724 tags:

``` python
# Create the AGN catalog
kwargs_catalog_agn = dict(
    dirname=dirname,
    catalog_galaxy=catalog_galaxy,
    type_plambda="zou+2024",
    save_sed=1,
    seed=20251005,
    merloni2014=Merloni2014(1, 0, 0.05, 0.95),
)
catalog_agn = CatalogAGN(**kwargs_catalog_agn)
```

%% Cell type:markdown id:1fd2389a-8048-41de-bbc1-b7c8848fb87d tags:

### Examine an example AGN

%% Cell type:code id:6a915c09-565d-4cc3-87ce-c7a1bb2ba24e tags:

``` python
# Select a type1 AGN
select = (catalog_agn["is_agn"] == 1) & (catalog_agn["is_optical_type2"] == 0)

# Select the first AGN by ID
ids = catalog_agn["ID"][select]
my_id = ids[0]

print(f"Found a type1 AGN with ID = {my_id}")
```

%% Cell type:markdown id:5ca1fcb0-67b4-4d9a-b72a-75ee712c4b11 tags:

### Examine an AGN SED

%% Cell type:code id:b00e821a-689e-47d8-a3af-841fef3b261a tags:

``` python
lam, flux = catalog_agn.get_sed(my_id)
plt.loglog(lam, flux)
```

%% Cell type:markdown id:b5c2b317-bae9-4cc7-a221-075c73ae7a77 tags:

### Estimate a single AGN light curve

%% Cell type:code id:c345c7d1-4bcb-439a-98ca-803689db5604 tags:

``` python
# Select a type1 AGN
select = (catalog_agn["is_agn"] == 1) & (catalog_agn["is_optical_type2"] == 1)

# Select the first AGN by ID
my_id = catalog_agn["ID"][select][0]

# Estimate its lightcurve.
# The lightcurve is estimated for 10-years with a cadence of one day.
# The returned flux is in the observer frame.
lc = catalog_agn.get_lightcurve(my_id, "lsst-r")
plt.plot(lc)
plt.xlabel("MJD")
plt.ylabel(r"flux $r$ [uJy]")
```

%% Cell type:markdown id:448f5d18-f961-4410-81e3-b5bbfe23c15c tags:

## Create the star and binary star catalogs

%% Cell type:code id:3b21fd72-661e-4094-b6d5-f09cef1f8403 tags:

``` python
# Create the star catalogs
catalog_star = CatalogStar(dirname, catalog_galaxy, is_binary=False)
catalog_binary = CatalogStar(dirname, catalog_galaxy, is_binary=True)
```

%% Cell type:code id:93b9dbe2-c79e-4c2d-a2f8-74e6195ddeff tags:

``` python
# NOTE: dal Tio+ simulated only 10% of the toal number of binary systems. They suggest a value of
# fbin=0.40 to account for this. To achieve this:
#   1) input star catalog is downsampled to 1 - fbin = 60% of the original size
#   2) the binary catalog is repeated 4 times
# The helper function below does these two steps automatically.
if "_once" not in globals():
    print(f"Old sizes are {len(catalog_star.stars)=}, {len(catalog_binary.stars)=}")
    catalog_star, catalog_binary = CatalogStar.get_star_binary_fbin(
        catalog_star, catalog_binary, fbin=0.40, nrepeat=4
    )
    _once = True
print(f"New sizes are {len(catalog_star.stars)=}, {len(catalog_binary.stars)=}")
```

%% Cell type:markdown id:5c6ad011-601a-433e-86e5-f67ffb50bc12 tags:

## Create the combined catalog of AGNs, galaxies, and stars

%% Cell type:code id:8901a2f1-86a1-4d43-a32e-6158f8f017a8 tags:

``` python
# NOTE: we set cache=False to prevent an existing catalog file to be loaded from the disk.
catalog_combined = CatalogCombined(
    dirname, catalog_galaxy, catalog_agn, catalog_star, catalog_binary, cache=False
)
```

%% Cell type:markdown id:d0d0a3cc-dbca-4675-b79b-665308717ba5 tags:

### Examine the truth catalog columns

Here we list all the available columns in the truth catalog. The meaning of these columns is described in the appendix of Viitanen+2026.

It is important to note that the truth catalog contains ONE row per object. That is, a single row corresponds to either an AGN, galaxy, or a star. Consequently, not all the different classes of objects have all the different columns available. For example, (host) galaxy stellar mass ('M') is only available for galaxies (and AGNs).

%% Cell type:code id:cd17c16a-2e08-48c8-b45a-46542673d5b2 tags:

``` python
catalog_combined.get_dtype()
```

%% Cell type:markdown id:64b46ad7-07a2-470a-b035-2a690d5dd0ee tags:

## Examine the output files

At this stage, the static truth catalog is completed. To find out which files have been written to disk, we check the output directory.

%% Cell type:code id:61db188b-6ca2-45f9-96e8-971daf27eb6d tags:

``` python
os.listdir(dirname)
```

%% Cell type:markdown id:a0074fd5-14b0-4a53-9c02-f9baa36cfb6f tags:

The explanation of the catalog files is as follows:

+ egg.fits -- the EGG galaxy catalog
+ agn.fits -- the AGN catalog
+ stars.fits -- the full star catalog
+ binaries.fits -- the full binary star catalog
+ catalog.fits -- the combined truth catalog

In addition to:

+ egg-seds* -- databse files needed to generate EGG SEDs (refer to EGG documentation)
+ seds/ -- AGN SEDs in EGG format (refer to EGG documentation)
+ lightcurves/ -- stored lightcurves (see the AGN catalog section above)

Note that by default, all galaxy SEDs only reside in the EGG database. These can be stored on disk on demand, but essentially doubles the disk space used which can be problematic on larger catalogs.

Also, light curves are only generated by request. In case images are simulated, then a subset of light curves will be estimated automatically for the sources within the region of interest.

%% Cell type:markdown id:01298432-fed8-4570-bf87-fbb507302845 tags:

## Example plot: (host) galaxy stellar mass versus redshift

%% Cell type:code id:73045582-05f0-4fb3-8202-e77862cfed06 tags:

``` python
plt.figure(dpi=200)
select = catalog_combined["Z"] > 0
plt.plot(catalog_combined["Z"][select], catalog_combined["M"][select], ".")
plt.xlabel(r"$z$")
plt.ylabel(r"$\log (M\,/\,M_\mathrm{star})$");
```

%% Cell type:markdown id:22e8a779-187b-4482-a750-2d72bfbd41eb tags:

## Example plot: g-band flux distribution

%% Cell type:code id:eb4c0774-7cc7-45e2-b84d-8a21826add84 tags:

``` python
# Example plot: distribution of the observed g-band flux.
plt.figure(dpi=200)
plt.hist(catalog_combined["lsst-g_total"], bins=np.logspace(-3, 3, 61))
plt.loglog()
plt.xlabel(r"lsst-g_total [uJy]")
plt.ylabel("frequency");
```

%% Cell type:markdown id:0bb771ab-a997-4f51-87aa-3fc3826c08db tags:

## A note on the occupation fraction

%% Cell type:markdown id:73415ddf-6699-4d14-8edd-f3f18d7dca7b tags:

In the simulation, it is preliminarily assumed that each galaxy has a SMBH, and
a corresponding value $M_\mathrm{BH}$. Local observations suggest that this
might not be the case, and especially at low values of Mstar, a BH might be
missing altogether. To facilitate for this, in the truth catalog a flag
``has_bh'' is provided. The formal definition of this flag is:

\begin{equation}
    \mathrm{has\_bh} = U < f_\mathrm{occ}(M_\mathrm{star}),
\end{equation}

where $U$ is a uniform random variable $U \sim \mathrm{Unif}(0, 1)$, and
$f_\mathrm{occ}$ follows the observational results of Zou+2025. This column may
be used as a weight for all calculations of statistical distributions. For
example, the galaxy BH mass function can be calculated with and without the
occupation fraction by simply weighting the mass function by unity (every
galaxy is expected to host a BH), or ``has_bh'' (galaxies are expected to host
BHs in accordance with the occupation fraction). An example is provided below.

%% Cell type:code id:385e6d41-6bca-4fea-975e-b8979609a91a tags:

``` python
# Select all galaxies
is_galaxy = catalog_combined["Z"] > 0.0

# Select central BHs
has_bh = catalog_combined["has_bh"]

# Combine the two selections
is_galaxy_and_has_bh = is_galaxy & has_bh

# Do some statistical tests
print(f"Total number of galaxies: {is_galaxy.sum()}")
print(f"Total number of galaxies with BHs: {is_galaxy_and_has_bh.sum()}")

# Plot Mstar vs. occupation fraction
plt.title("BH occupation fraction vs. stellar mass")
plt.plot(catalog_combined["M"][is_galaxy], catalog_combined["occupation_fraction"][is_galaxy], ".")
plt.xlabel(r"$\log \left( M_\mathrm{star} / M_\odot \right)$")
plt.ylabel(r"$f_\mathrm{occ}$")
```

%% Cell type:markdown id:99cf5e38-5d31-4aab-9112-b6ba6b76a0a3 tags:

For AGN, the logic remains the same. To weigh AGN by the occupation fraction,
one would use the product $\mathrm{is\_agn} \times \mathrm{has\_bh}$ instead of
simply $\mathrm{is\_agn}$. Logically, this is equivalent to the condition "has
BH AND BH is active". For $L_\mathrm{X} >
10^{42}\,\mathrm{erg}\,\mathrm{s}^{-1}$, this has a negligible effect on the
AGN population at large, as was investigated in Viitanen+2026.

%% Cell type:code id:54c2f484-11fb-46d8-a9db-a0cca9dc5664 tags:

``` python
# Select galaxies with BHs
has_bh = catalog_combined["has_bh"]

# Select AGNs
is_agn = catalog_combined["is_agn"]

# Select logLX > 42 objects for the sake of an example
is_loglx_gt_42 = catalog_combined["log_LX_2_10"] > 42

# Combine the selections
has_bh_and_is_agn = has_bh & is_agn
has_bh_and_is_agn = has_bh & is_agn & is_loglx_gt_42


print(f"Total number of AGNs: {is_agn.sum()}")
print(f"Total number of BHs with AGNs: {has_bh_and_is_agn.sum()}")
```

%% Cell type:code id:522423f3-dff7-40da-938d-3b3606160cf7 tags:

``` python
```
Loading