Merge branch 'feature_documentation' of... (adbcd57e) · Commits · Akke Esa Tapio Viitanen / lsst_inaf_agile

docs/notebooks/create_truth_catalog.ipynb

+150 −1

Original line number	Diff line number	Diff line
		%% Cell type:markdown id:70b92b7a-6039-4468-bf49-87dc87c2d965 tags:

		# Create an example AGILE truth catalog

		This notebooks demonstrates the basic usage of the truth catalog library, and generates an extermely small truth catalog for testing purposes.

		In AGILE, each individual class of source (e.g. AGN, galaxy, star) is first contained in its own catalog. The final step in the truth catalog creation is to simply combine these individual source catalogs into one combined catalog that can be used for future image simulations.

		The different classes of objects considered are:

		1. Galaxies
		2. AGNs
		3. Stars
		4. Binary stars


		The following notebook may be accessed [here](https://www.ict.inaf.it/gitlab/akke.viitanen/lsst_inaf_agile/-/blob/feature_documentation/docs/notebooks/create_photometric_catalog.ipynb)

		%% Cell type:markdown id:3cbfa609-8950-474a-97f0-9fcefb71c103 tags:

		## Initialize

		%% Cell type:code id:9759604f-8be1-4a5d-b63a-88da53fab067 tags:

		``` python
		import os

		if not os.path.exists("src/lsst_inaf_agile"):
		os.chdir("../../")
		os.getcwd()
		dirname = "data/tests/test_agile"
		```

		%% Cell type:code id:a01c8b5b-8e8e-409d-94c2-185806a70646 tags:

		``` python
		import logging
		import matplotlib.pyplot as plt
		import numpy as np
		from lsst_inaf_agile.catalog_agn import CatalogAGN
		from lsst_inaf_agile.catalog_combined import CatalogCombined
		from lsst_inaf_agile.catalog_galaxy import CatalogGalaxy
		from lsst_inaf_agile.catalog_star import CatalogStar
		from lsst_inaf_agile.egg import Egg
		from lsst_inaf_agile.image_simulator import ImageSimulator
		from lsst_inaf_agile.merloni2014 import Merloni2014
		from lsst_inaf_agile import util
		```

		%% Cell type:markdown id:897ccf7a-3bdc-4ae6-943d-20a7eb2b6c9e tags:

		## Create the EGG galaxy catalog

		%% Cell type:code id:37224123-c322-4cc8-beab-fe8d56d3455b tags:

		``` python
		# The following list of keyword arguments may be expanded on.
		# Look into 'egg-gencat help'for all the available EGG arguments.
		egg_kwargs = Egg.get_example_egg_kwargs(dirname + "/egg.fits")
		egg_kwargs
		```

		%% Cell type:code id:e004eed5-a555-4307-80c2-ebe8460d4ae0 tags:

		``` python
		egg = Egg(egg_kwargs)
		egg.run()
		```

		%% Cell type:code id:62685aff-dd9b-4054-92b6-5bf4f3e6cb6c tags:

		``` python
		catalog_egg = Egg.read(egg_kwargs["out"])
		```

		%% Cell type:code id:cb439284-3a83-45d6-8727-ee749892eb03 tags:

		``` python
		# Convert to the AGILE galaxy catalog class
		catalog_galaxy = CatalogGalaxy(dirname, catalog_egg)
		```

		%% Cell type:markdown id:f0634979-f64c-40af-ac05-b8ab7095e39d tags:

		## Create the AGN catalog

		%% Cell type:code id:4f39b065-f97b-4aa7-9f59-8ed05794b724 tags:

		``` python
		# Create the AGN catalog
		kwargs_catalog_agn = dict(
		dirname=dirname,
		catalog_galaxy=catalog_galaxy,
		type_plambda="zou+2024",
		save_sed=1,
		seed=20251005,
		merloni2014=Merloni2014(1, 0, 0.05, 0.95),
		)
		catalog_agn = CatalogAGN(**kwargs_catalog_agn)
		```

		%% Cell type:markdown id:1fd2389a-8048-41de-bbc1-b7c8848fb87d tags:

		### Examine an example AGN

		%% Cell type:code id:6a915c09-565d-4cc3-87ce-c7a1bb2ba24e tags:

		``` python
		# Select a type1 AGN
		select = (catalog_agn["is_agn"] == 1) & (catalog_agn["is_optical_type2"] == 0)

		# Select the first AGN by ID
		ids = catalog_agn["ID"][select]
		my_id = ids[0]

		print(f"Found a type1 AGN with ID = {my_id}")
		```

		%% Cell type:markdown id:5ca1fcb0-67b4-4d9a-b72a-75ee712c4b11 tags:

		### Examine an AGN SED

		%% Cell type:code id:b00e821a-689e-47d8-a3af-841fef3b261a tags:

		``` python
		lam, flux = catalog_agn.get_sed(my_id)
		plt.loglog(lam, flux)
		```

		%% Cell type:markdown id:b5c2b317-bae9-4cc7-a221-075c73ae7a77 tags:

		### Estimate a single AGN light curve

		%% Cell type:code id:c345c7d1-4bcb-439a-98ca-803689db5604 tags:

		``` python
		# Select a type1 AGN
		select = (catalog_agn["is_agn"] == 1) & (catalog_agn["is_optical_type2"] == 1)
		select = (catalog_agn["is_agn"] == 1) & (catalog_agn["is_optical_type2"] == 0)

		# Select the first AGN by ID
		my_id = catalog_agn["ID"][select][0]

		# Estimate its lightcurve.
		# The lightcurve is estimated for 10-years with a cadence of one day.
		# The returned flux is in the observer frame.
		lc = catalog_agn.get_lightcurve(my_id, "lsst-r")
		plt.plot(lc)
		plt.xlabel("MJD")
		plt.ylabel(r"flux $r$ [uJy]")
		```

		%% Cell type:markdown id:768a7990-1ab8-439d-8646-b5f885940d2d tags:

		### Estimate Damped Randow Walk parameters $\tau$ and $\mathrm{SF}_\infty$

		The following example shows how to estimate the Damped Random Walk parameters $\tau$ and $\mathrm{SF}_\infty$ for a single AGN using the LSST $r$-band. To estimate the parameters for the whole catalog, one can wrap

		%% Cell type:code id:a2f16bb7-08d4-4882-9a6b-d78a2af3a200 tags:

		``` python
		# Select a type1 AGN
		select = (catalog_agn["is_agn"] == 1) & (catalog_agn["is_optical_type2"] == 0)

		# Select the first AGN by ID
		my_id = catalog_agn["ID"][select][0]

		# LSST bandpass -- one of 'ugrizy'
		b = "r"

		tau, sf_inf = catalog_agn.get_lightcurve(my_id, f"lsst-{b}", return_tau_sf_inf=True)
		print(f"{tau=}, {sf_inf=}")
		```

		%% Cell type:markdown id:448f5d18-f961-4410-81e3-b5bbfe23c15c tags:

		## Create the star and binary star catalogs

		%% Cell type:code id:3b21fd72-661e-4094-b6d5-f09cef1f8403 tags:

		``` python
		# Create the star catalogs
		catalog_star = CatalogStar(dirname, catalog_galaxy, is_binary=False)
		catalog_binary = CatalogStar(dirname, catalog_galaxy, is_binary=True)
		```

		%% Cell type:code id:93b9dbe2-c79e-4c2d-a2f8-74e6195ddeff tags:

		``` python
		# NOTE: dal Tio+ simulated only 10% of the toal number of binary systems. They suggest a value of
		# fbin=0.40 to account for this. To achieve this:
		# 1) input star catalog is downsampled to 1 - fbin = 60% of the original size
		# 2) the binary catalog is repeated 4 times
		# The helper function below does these two steps automatically.
		if "_once" not in globals():
		print(f"Old sizes are {len(catalog_star.stars)=}, {len(catalog_binary.stars)=}")
		catalog_star, catalog_binary = CatalogStar.get_star_binary_fbin(
		catalog_star, catalog_binary, fbin=0.40, nrepeat=4
		)
		_once = True
		print(f"New sizes are {len(catalog_star.stars)=}, {len(catalog_binary.stars)=}")
		```

		%% Cell type:markdown id:5c6ad011-601a-433e-86e5-f67ffb50bc12 tags:

		## Create the combined catalog of AGNs, galaxies, and stars

		%% Cell type:code id:8901a2f1-86a1-4d43-a32e-6158f8f017a8 tags:

		``` python
		# NOTE: we set cache=False to prevent an existing catalog file to be loaded from the disk.
		catalog_combined = CatalogCombined(
		dirname, catalog_galaxy, catalog_agn, catalog_star, catalog_binary, cache=False
		)
		```

		%% Cell type:markdown id:d0d0a3cc-dbca-4675-b79b-665308717ba5 tags:

		### Examine the truth catalog columns

		Here we list all the available columns in the truth catalog. The meaning of these columns is described in the appendix of Viitanen+2026.

		It is important to note that the truth catalog contains ONE row per object. That is, a single row corresponds to either an AGN, galaxy, or a star. Consequently, not all the different classes of objects have all the different columns available. For example, (host) galaxy stellar mass ('M') is only available for galaxies (and AGNs).

		%% Cell type:code id:cd17c16a-2e08-48c8-b45a-46542673d5b2 tags:

		``` python
		catalog_combined.get_dtype()
		```

		%% Cell type:markdown id:64b46ad7-07a2-470a-b035-2a690d5dd0ee tags:

		## Examine the output files

		At this stage, the static truth catalog is completed. To find out which files have been written to disk, we check the output directory.

		%% Cell type:code id:61db188b-6ca2-45f9-96e8-971daf27eb6d tags:

		``` python
		os.listdir(dirname)
		```

		%% Cell type:markdown id:a0074fd5-14b0-4a53-9c02-f9baa36cfb6f tags:

		The explanation of the catalog files is as follows:

		+ egg.fits -- the EGG galaxy catalog
		+ agn.fits -- the AGN catalog
		+ stars.fits -- the full star catalog
		+ binaries.fits -- the full binary star catalog
		+ catalog.fits -- the combined truth catalog

		In addition to:

		+ egg-seds* -- databse files needed to generate EGG SEDs (refer to EGG documentation)
		+ seds/ -- AGN SEDs in EGG format (refer to EGG documentation)
		+ lightcurves/ -- stored lightcurves (see the AGN catalog section above)

		Note that by default, all galaxy SEDs only reside in the EGG database. These can be stored on disk on demand, but essentially doubles the disk space used which can be problematic on larger catalogs.

		Also, light curves are only generated by request. In case images are simulated, then a subset of light curves will be estimated automatically for the sources within the region of interest.

		%% Cell type:markdown id:01298432-fed8-4570-bf87-fbb507302845 tags:

		## Example plot: (host) galaxy stellar mass versus redshift

		%% Cell type:code id:73045582-05f0-4fb3-8202-e77862cfed06 tags:

		``` python
		plt.figure(dpi=200)
		select = catalog_combined["Z"] > 0
		plt.plot(catalog_combined["Z"][select], catalog_combined["M"][select], ".")
		plt.xlabel(r"$z$")
		plt.ylabel(r"$\log (M\,/\,M_\mathrm{star})$");
		```

		%% Cell type:markdown id:22e8a779-187b-4482-a750-2d72bfbd41eb tags:

		## Example plot: g-band flux distribution

		%% Cell type:code id:eb4c0774-7cc7-45e2-b84d-8a21826add84 tags:

		``` python
		# Example plot: distribution of the observed g-band flux.
		plt.figure(dpi=200)
		plt.hist(catalog_combined["lsst-g_total"], bins=np.logspace(-3, 3, 61))
		plt.loglog()
		plt.xlabel(r"lsst-g_total [uJy]")
		plt.ylabel("frequency");
		```

		%% Cell type:markdown id:0bb771ab-a997-4f51-87aa-3fc3826c08db tags:

		## A note on the occupation fraction

		%% Cell type:markdown id:73415ddf-6699-4d14-8edd-f3f18d7dca7b tags:

		In the simulation, it is preliminarily assumed that each galaxy has a SMBH, and
		a corresponding value $M_\mathrm{BH}$. Local observations suggest that this
		might not be the case, and especially at low values of Mstar, a BH might be
		missing altogether. To facilitate for this, in the truth catalog a flag
		``has_bh'' is provided. The formal definition of this flag is:

		\begin{equation}
		\mathrm{has\_bh} = U < f_\mathrm{occ}(M_\mathrm{star}),
		\end{equation}

		where $U$ is a uniform random variable $U \sim \mathrm{Unif}(0, 1)$, and
		$f_\mathrm{occ}$ follows the observational results of Zou+2025. This column may
		be used as a weight for all calculations of statistical distributions. For
		example, the galaxy BH mass function can be calculated with and without the
		occupation fraction by simply weighting the mass function by unity (every
		galaxy is expected to host a BH), or ``has_bh'' (galaxies are expected to host
		BHs in accordance with the occupation fraction). An example is provided below.

		%% Cell type:code id:385e6d41-6bca-4fea-975e-b8979609a91a tags:

		``` python
		# Select all galaxies
		is_galaxy = catalog_combined["Z"] > 0.0

		# Select central BHs
		has_bh = catalog_combined["has_bh"]

		# Combine the two selections
		is_galaxy_and_has_bh = is_galaxy & has_bh

		# Do some statistical tests
		print(f"Total number of galaxies: {is_galaxy.sum()}")
		print(f"Total number of galaxies with BHs: {is_galaxy_and_has_bh.sum()}")

		# Plot Mstar vs. occupation fraction
		plt.title("BH occupation fraction vs. stellar mass")
		plt.plot(catalog_combined["M"][is_galaxy], catalog_combined["occupation_fraction"][is_galaxy], ".")
		plt.xlabel(r"$\log \left( M_\mathrm{star} / M_\odot \right)$")
		plt.ylabel(r"$f_\mathrm{occ}$")
		```

		%% Cell type:markdown id:99cf5e38-5d31-4aab-9112-b6ba6b76a0a3 tags:

		For AGN, the logic remains the same. To weigh AGN by the occupation fraction,
		one would use the product $\mathrm{is\_agn} \times \mathrm{has\_bh}$ instead of
		simply $\mathrm{is\_agn}$. Logically, this is equivalent to the condition "has
		BH AND BH is active". For $L_\mathrm{X} >
		10^{42}\,\mathrm{erg}\,\mathrm{s}^{-1}$, this has a negligible effect on the
		AGN population at large, as was investigated in Viitanen+2026.

		%% Cell type:code id:54c2f484-11fb-46d8-a9db-a0cca9dc5664 tags:

		``` python
		# Select galaxies with BHs
		has_bh = catalog_combined["has_bh"]

		# Select AGNs
		is_agn = catalog_combined["is_agn"]

		# Select logLX > 42 objects for the sake of an example
		is_loglx_gt_42 = catalog_combined["log_LX_2_10"] > 42

		# Combine the selections
		has_bh_and_is_agn = has_bh & is_agn
		has_bh_and_is_agn = has_bh & is_agn & is_loglx_gt_42


		print(f"Total number of AGNs: {is_agn.sum()}")
		print(f"Total number of BHs with AGNs: {has_bh_and_is_agn.sum()}")
		```

		%% Cell type:markdown id:444fe8bd-87cd-4f11-a8e4-26a03ca16b49 tags:

		# Simulate example images

		This notebooks demonstrates how to use the AGILE truth catalog in order to simulate synthetic LSST images.

		We use imSim (https://lsstdesc.org/imSim/index.html) to simulate the images. The purpose of this notebook is not to doucment the feature of imSim, but only to demonstrate how to pass our truth catalog as input to imSim, and how to run software using the tools available in AGILE. Note that imSim only simulates raw LSSTCam data (incl. e.g. sky and instrumental noise), which are not science ready products. The raw images are processed in subsequent steps by using the LSST Science pipelines.

		Below, we provide helpful links and resources to further understand the LSST survey, LSSTCam, and the survey strategy.

		+ VRO survey strategy: https://survey-strategy.lsst.io/
		+ opSim: https://usdf-maf.slac.stanford.edu/
		+ imSim: https://lsstdesc.org/imSim/index.html
		+ LSSTCam: https://ctn-001.lsst.io/
		+ LSSTCam detector layout: https://lsstdesc.org/imSim/lsst-camera.html

		The following notebook may be accessed [here](https://www.ict.inaf.it/gitlab/akke.viitanen/lsst_inaf_agile/-/blob/feature_documentation/docs/notebooks/create_photometric_catalog.ipynb)

		%% Cell type:code id:9e8e138e-b30d-4fbd-a3b0-e3092e7d7a56 tags:

		``` python
		from lsst_inaf_agile.image_simulator import ImageSimulator
		```

		%% Cell type:markdown id:50e7a40f-5d71-441d-9444-7db8a26c4df2 tags:

		## Initialize the ImageSimulator

		%% Cell type:code id:9e0b483b-9173-47e1-bea9-0213ff86f055 tags:

		``` python
		# Select a baseline
		# NOTE: other baselines are available here: https://usdf-maf.slac.stanford.edu/
		filename_baseline = "data/baseline/baseline_v4.0_10yrs.db"
		image_simulator = ImageSimulator(f"{dirname}/imsim", catalog_combined, filename_baseline)
		```

		%% Cell type:markdown id:5d089fac-8721-44a7-ad79-b5dbbe87a204 tags:

		## Explore the visits database

		%% Cell type:code id:e7345293-5ef9-4399-9ace-4cf129164083 tags:

		``` python
		visits = image_simulator.get_visit(limit=10)
		visits.info()
		```

		%% Cell type:code id:1b7d38a1-bacb-42fc-9809-a47f07ce5982 tags:

		``` python
		visits[:5]
		```

		%% Cell type:code id:06511ffa-a07c-4d61-9095-16926c184ff0 tags:

		``` python
		my_observation_id = 779
		my_visit = image_simulator.get_visit(observation_id=my_observation_id)
		my_visit
		```

		%% Cell type:markdown id:f29ac803-97bf-4cf4-a085-60c465aa092b tags:

		## Write an instance catalog

		Here, we write a single instance catalog. The instance catalog represents a snapshot of the truth catalog, where the magnitudes of the different objects (e.g. AGNs or binary stars) are modified according to the light curve.

		%% Cell type:code id:5ac5999c-64ea-41a1-95ce-209398fa04b7 tags:

		``` python
		# The following command writes a single instance catalog for the observation_id
		image_simulator.write_instance_catalog(my_observation_id)
		```

		%% Cell type:code id:15383386-2a9f-4005-a123-9b8e65c8621f tags:

		``` python
		# The following command writes a single instance catalog for the observation_id
		image_simulator.write_instance_catalog(my_observation_id)
		```

		%% Cell type:code id:c0184a48-a5b3-4e82-a984-4a50190ca1f7 tags:

		``` python
		# The following command writes a single instance catalog OVERRIDING
		# the default ra, dec of the pointing -- which is useful for debugging small
		# (<10deg2) fields
		ra_dec = +150.11916667, +2.20583333
		image_simulator.write_instance_catalog(my_observation_id, ra_dec=ra_dec)
		```

		%% Cell type:markdown id:1730d588-058e-495e-897c-47f4726b2a44 tags:

		## Simulate a single visit and single detector

		%% Cell type:code id:7dc5f391-5eeb-4179-a329-66f7b93cd547 tags:

		``` python
		detector = 94
		image_simulator.simulate_image(observation_id=my_observation_id, detector=detector)
		```

		%% Cell type:markdown id:d8771f24-8612-4aac-8fd2-2b553f0bf1e3 tags:

		## Explore the output products

		%% Cell type:code id:4e7f5b47-6f60-4e9b-ab4c-b451c9946aed tags:

		``` python
		dirname_output = os.path.join(dirname, "imsim", str(observation_id), "output")
		print("Dirname_output is {dirname_output}")
		```

		%% Cell type:code id:5d9f6a62-00ce-4a44-86cd-ef94d539705a tags:

		``` python
		os.listdir(dirname_output)
		```

		%% Cell type:markdown id:13bb4972-2286-4cc9-9e48-848021542a40 tags:

		## On simulating multiple images and/or detectors

		Simulating any large dataset is a computationally expensive task. To simulate any significantly large dataset (such as the AGILE DR1), one is adviced to wrap the `image_simulator.simulate_image` into a callable script which could be executed with multiple cores and/or threads. In AGILE DR1, this orchestration was done with slurm (https://slurm.schedmd.com/documentation.html).