.. _dl2:

DL2 files generation
====================

The script `DL1_to_DL2.py` processes DL1 data and produces DL2 files.
Usage:

.. code-block::

    usage: DL1_to_DL2.py [-h] [--prod PROD] [--outdir OUTDIR] [--config CONFIG] [--config-analysis CONFIG_ANALYSIS]
                         [--verbose] [--source_name SOURCE_NAME] [--tcuname TCUNAME] [--runlist RUNLIST]
                         [--distance DISTANCE] [--ra RA] [--dec DEC] [--submit] [--dry] [--globber]

    DL1 to DL2 converter

    optional arguments:
      --config CONFIG, -c CONFIG
                            Specify a personal config file for the analysis
      --config-analysis CONFIG_ANALYSIS
                            Specify a config file which describes the analysis profile to use
      --dec DEC             Dec coordinate of the target. Add it if you want to use a custom position
      --distance DISTANCE, -dis DISTANCE
                            Max distance in degrees between the target position and the run pointing position for the
                            run selection, negative value means no selection using this parameter (default: -1).
      --dry                 Make a dry run, no true submission
      --globber, -g         If True, overwrites existing output file without asking
      --outdir OUTDIR, -o OUTDIR
                            Directory to store the output
      --prod PROD, -p PROD  Prod to use (default: v0.8.4)
      --ra RA               RA coordinate of the target. Add it if you want to use a custom position
      --runlist RUNLIST, -rl RUNLIST
                            File with a list of runs and the associated nights to be analysed
      --source_name SOURCE_NAME, -n SOURCE_NAME
                            Name of the source
      --submit              Submit the cmd to slurm on site
      --tcuname TCUNAME     Apply run selection based on TCU source name
      --verbose, -v         Increase output verbosity
      -h, --help            show this help message and exit

It makes use of a configuration file `config_dl1_to_dl2.yaml` (option ``--config``):

.. code-block:: yaml

    # Directory where job files are written
    jobmanager: ../jobmanager

    # Database file name
    db: database.csv

    # LST real data path (don't modify it)
    data_folder: /fefs/aswg/data/real

    # path to main data folder of the user
    # change it according to your working env
    base_dir: /fefs/aswg/alice.donini/Analysis/data

    # Path to personal directory where output data will be saved.
    # Uncomment and modify in case you want to use a non standard path
    #output_folder: ../DL2/Crab

    # Directory where config files are stored
    #config_folder: ./

    # Path to trained RF files
    path_models: ../models

    # Values for automatic selection of DL1 data
    dl1_data:
      DL1_dir: /fefs/aswg/data/real/DL1     # path to DL1 directory
      night: [20210911, 20210912]           # day(s) of observation (more than one is possible)
      version: v0.9.1                       # v0.7.3, v0.8.4, v0.9, v0.9.1
      cleaning: tailcut84

Edit the configuration file: adapt the paths to your working directory and modify the DL1 data information used to search for the files at the IT.

The script uses the created database (:ref:`db_generation`) to find the runs to analyze, based on the nights specified in the configuration file `config_dl1_to_dl2.yaml`.
An extra selection can be done on coordinates (in which case ``--distance``, ``--ra`` and ``--dec`` are all mandatory) or on the source name as saved in TCU (argument ``--tcuname``).

If none of these selection methods is given, all the runs available on the dates specified in the configuration file are considered for the analysis.
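
As an illustration, the coordinate-based cut can be sketched in pure Python. This is only a sketch: the function names are hypothetical and the actual script may implement the selection differently (e.g. with astropy).

.. code-block:: python

    import math

    def angular_separation(ra1, dec1, ra2, dec2):
        """Great-circle separation in degrees between two sky positions
        given in degrees (atan2 form, numerically stable for small angles)."""
        ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
        dra = ra2 - ra1
        num = math.hypot(
            math.cos(dec2) * math.sin(dra),
            math.cos(dec1) * math.sin(dec2)
            - math.sin(dec1) * math.cos(dec2) * math.cos(dra),
        )
        den = (math.sin(dec1) * math.sin(dec2)
               + math.cos(dec1) * math.cos(dec2) * math.cos(dra))
        return math.degrees(math.atan2(num, den))

    def select_runs(runs, ra, dec, max_distance):
        """Keep runs whose pointing lies within max_distance degrees of the
        target; a negative max_distance disables the cut (as with -1)."""
        if max_distance < 0:
            return list(runs)
        return [r for r in runs
                if angular_separation(ra, dec, r["ra"], r["dec"]) <= max_distance]

With ``--distance 2`` and the Crab coordinates from the examples below, only runs pointing within 2 degrees of the target would survive the cut.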

The search for DL1 files can also be done by giving a file with the list of runs and nights to be analyzed (option ``--runlist``).
No database file is needed in this case.

.. warning::

  The search of runs through the database currently has an issue with dates. The database is generated from the drive log, so all runs taken after midnight are saved under the following day. This is not the case at the IT, where runs are stored under the date on which the night started. For some runs the search could therefore fail, even though the files are there.

  Thus, if you use the database search, always add the date of the following night to the config file as well, so that runs taken after midnight are also considered.
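
As an illustration of the warning above, for a night starting on 20210911 the ``night`` field in ``config_dl1_to_dl2.yaml`` would list both dates (the dates here are illustrative):

.. code-block:: yaml

    dl1_data:
      night: [20210911, 20210912]   # starting night plus the following date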

Example of a file run list:

.. code-block::

    2909 20201117
    2911 20201117
    3089 20201206
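
Such a list can be parsed with a few lines of Python. This is only a sketch (the helper name is hypothetical, not part of `DL1_to_DL2.py`), assuming one whitespace-separated ``run night`` pair per line:

.. code-block:: python

    def parse_runlist(path):
        """Parse a run-list file with one 'run night' pair per line,
        returning a list of (run, night) integer tuples."""
        pairs = []
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith("#"):
                    continue  # skip blank lines and comments
                run, night = line.split()
                pairs.append((int(run), int(night)))
        return pairs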

The argument ``--dry`` performs a dry run.
No jobs are submitted and only the verbose output is printed.
Useful to check which runs are selected and that the generated command is correct.

Some examples of how to run the script:

.. code-block::

    python DL1_to_DL2.py -c config_dl1_to_dl2.yaml -n Crab --tcuname Crab -v --submit

    python DL1_to_DL2.py -c config_dl1_to_dl2.yaml -n Crab --distance 2 --ra 83.633080 --dec 22.014500 -v --submit

    python DL1_to_DL2.py -c config_dl1_to_dl2.yaml -n Crab --runlist $CONFIG_FOLDER/Crab_Nov2020.txt -v --submit