
VAST Pipeline Exploration

Interacting with results from the VAST Pipeline

This notebook gives an example of how to use vast-tools in a notebook environment to interact with output from the VAST Pipeline (https://github.com/askap-vast/vast-pipeline).

Documentation for the pipeline is currently being written, but there is a short Wiki page on the methods used here: https://github.com/askap-vast/vast-project/wiki/VAST-Pipeline-Methods-Description. If anything about the results is unclear, don't hesitate to contact the Pipeline development team.

vaex

For large pipeline runs vast-tools uses vaex to load the measurements and measurement_pairs dataframes in an out-of-core context. vaex behaves differently to pandas but offers many advantages when handling large dataframes. See the documentation here: https://vaex.readthedocs.io/en/latest/index.html. If you use the built-in methods of the pipeline class then vaex is handled behind the scenes, so you shouldn't have to worry about it. However, be mindful when performing manual analysis using vaex dataframes. Warnings will appear when vaex has been used. Also note that vaex is a relatively young project, so you may run into a bug or two when using it yourself.

Imports

Below are the imports required for this example. The main import required from vast-tools is the Pipeline class. Some astropy objects are also imported, along with bokeh, since some plots use the bokeh library. Note that we call the bokeh function output_notebook() so that bokeh output is shown in the notebook environment. matplotlib.pyplot is also imported in case we want to adjust any plots produced.

from vasttools.pipeline import Pipeline
from bokeh.io import output_notebook
from bokeh.plotting import show
from astropy.coordinates import Angle, SkyCoord
from astropy import units as u
import matplotlib.pyplot as plt

%matplotlib inline
output_notebook()

Loading a Pipeline run

The first step is to initialise the Pipeline. If the environment variable PIPELINE_WORKING_DIR is defined then you do not need to provide the pipeline run directory. Otherwise, use the project_dir argument in the Pipeline initialisation to set the directory. Ask your admin if you are not sure where the pipeline directory is located.

pipe = Pipeline(project_dir='/data/vast-pipeline/vast-pipeline/pipeline-runs')

If you are unsure of the run name you want to load, you can use the list_piperuns() method to list all available runs.

pipe.list_piperuns()
['10s_FRB190110_FullField_DeepSubtraction',
 '10s_FRB190110_FullField_DeepSubtraction.bak',
 'BANE_no_forced_fits',
 'GalCtr_p2',
 'VAST_2118-06A',
 'combined',
 'dwf_O12',
 'dwf_sep2021_final',
 'full_survey_tiles',
 'full_survey_tiles_test',
 'galctr_p1_test',
 'gw190814_10_epochs',
 'mc_p1_test',
 'racs-low-tiles-bane',
 'racs_comparison',
 'racs_comparison_no_force',
 'smc-low',
 'test_ui_1',
 'tiles-low-corrected',
 'tiles-mid-corrected',
 'tiles_corrected',
 'two_epoch_test',
 'ym_Test']

For this example we will load the VAST_2118-06A run. To do this we call the load_run() method and save the run to my_run:

my_run = pipe.load_run('VAST_2118-06A')

It's also possible to load multiple runs as one object using:

pipe.load_runs(['VAST_2118-06A', 'GalCtr_p2'])

If you have loaded two runs separately you can also merge one into the other by doing:

my_run = my_run.combine_with_run(my_other_run)

For the purposes of this example we will continue to use the single VAST_2118-06A run.

The pipeline object has loaded all the outputs and information of the pipeline run. You have access to the following dataframes:

  1. my_run.images contains the information on the images used in the pipeline run.
my_run.images.head()
band_id skyreg_id measurements_path polarisation name path noise_path background_path datetime jd ... beam_bmin beam_bpa rms_median rms_min rms_max centre_ra centre_dec xtr_radius frequency bandwidth
id
2091 1 1043 /data/vast-pipeline/vast-pipeline/pipeline-run... I RACS_2118-06A.EPOCH00.I.fits /data/vast-survey/pilot/EPOCH00/COMBINED/STOKE... /data/vast-survey/pilot/EPOCH00/COMBINED/STOKE... /data/vast-survey/pilot/EPOCH00/COMBINED/STOKE... 2019-04-28 23:39:28.153000+00:00 2.458602e+06 ... 0.003091 -57.846547 0.253992 0.171100 1.584303 319.653271 -6.298819 6.740103 887.0 0.0
2092 1 1044 /data/vast-pipeline/vast-pipeline/pipeline-run... I VAST_2118-06A.EPOCH01.I.fits /data/vast-survey/pilot/EPOCH01/COMBINED/STOKE... /data/vast-survey/pilot/EPOCH01/COMBINED/STOKE... /data/vast-survey/pilot/EPOCH01/COMBINED/STOKE... 2019-08-27 18:52:00.556000+00:00 2.458723e+06 ... 0.003261 -70.402943 0.267957 0.179847 1.726998 319.652258 -6.298900 6.740103 887.0 0.0
2093 1 1044 /data/vast-pipeline/vast-pipeline/pipeline-run... I VAST_2118-06A.EPOCH02.I.fits /data/vast-survey/pilot/EPOCH02/COMBINED/STOKE... /data/vast-survey/pilot/EPOCH02/COMBINED/STOKE... /data/vast-survey/pilot/EPOCH02/COMBINED/STOKE... 2019-10-30 10:11:56.913000+00:00 2.458787e+06 ... 0.003160 75.375255 0.243300 0.163784 1.672402 319.652258 -6.298900 6.740103 887.0 0.0
2094 1 1044 /data/vast-pipeline/vast-pipeline/pipeline-run... I VAST_2118-06A.EPOCH03x.I.fits /data/vast-survey/pilot/EPOCH03x/COMBINED/STOK... /data/vast-survey/pilot/EPOCH03x/COMBINED/STOK... /data/vast-survey/pilot/EPOCH03x/COMBINED/STOK... 2019-10-29 13:39:33.996000+00:00 2.458786e+06 ... 0.003052 -82.036874 0.248223 0.162059 2.286700 319.652258 -6.298900 6.740103 887.0 0.0
2095 1 1044 /data/vast-pipeline/vast-pipeline/pipeline-run... I VAST_2118-06A.EPOCH05x.I.fits /data/vast-survey/pilot/EPOCH05x/COMBINED/STOK... /data/vast-survey/pilot/EPOCH05x/COMBINED/STOK... /data/vast-survey/pilot/EPOCH05x/COMBINED/STOK... 2020-01-11 05:40:11.007000+00:00 2.458860e+06 ... 0.003008 70.737592 0.252249 0.170839 1.824399 319.652258 -6.298900 6.740103 887.0 0.0

5 rows × 29 columns

  2. my_run.sources contains the information on the unique sources found in the pipeline run.
my_run.sources.head()
wavg_ra wavg_dec avg_compactness min_snr max_snr wavg_uncertainty_ew wavg_uncertainty_ns avg_flux_int avg_flux_peak max_flux_peak ... n_neighbour_dist vs_abs_significant_max_peak m_abs_significant_max_peak vs_abs_significant_max_int m_abs_significant_max_int n_measurements n_selavy n_forced n_siblings n_relations
id
4187950 317.819257 -7.199839 1.070134 41.472119 64.067511 0.000094 0.000094 14.340365 13.399933 17.157400 ... 0.040130 16.401732 0.429676 10.949743 0.484327 9 9 0 0 0
4187951 320.123111 -7.622772 1.639199 48.247148 75.265403 0.000094 0.000094 23.811492 14.640493 16.379000 ... 0.000010 10.720544 0.273406 10.532355 0.411996 9 9 0 7 1
4187952 318.752399 -6.777790 1.142731 54.334532 71.412037 0.000094 0.000094 17.021120 14.906352 15.499172 ... 0.005963 0.000000 0.000000 0.000000 0.000000 9 9 0 1 0
4187953 320.238178 -8.554920 1.211277 47.676101 71.986364 0.000094 0.000094 17.834988 14.747300 15.837000 ... 0.010273 5.713529 0.131924 0.000000 0.000000 9 9 0 1 0
4187954 320.740207 -10.561575 1.027612 33.177156 33.177156 0.000287 0.000287 14.626000 14.233000 14.233000 ... 0.033644 0.000000 0.000000 0.000000 0.000000 1 1 0 0 0

5 rows × 31 columns

  3. my_run.measurements contains the information on all of the measurements in the pipeline run, both selavy and forced (if forced extractions were used in the pipeline run). Measurements are the datapoints in time of a source; the source column tells you which source each measurement belongs to.
my_run.measurements.head()
source island_id component_id local_rms ra ra_err dec dec_err flux_peak flux_peak_err ... ns_sys_err error_radius uncertainty_ew uncertainty_ns weight_ew weight_ns forced flux_int_isl_ratio flux_peak_isl_ratio id
0 4192967 RACS_1.05_2118-06A_island_1000 RACS_1.05_2118-06A_component_1000a 0.282 318.759576 0.000061 -7.777971 0.000036 14.270 0.304283 ... 0.000278 0.000071 0.000287 0.000287 1.217339e+07 1.217339e+07 False 1.000000 1.000000 25422028
1 4192967 SB9668_island_827 SB9668_component_827a 0.336 318.759865 0.000070 -7.778213 0.000035 16.727 0.358023 ... 0.000278 0.000078 0.000288 0.000288 1.201865e+07 1.201865e+07 False 1.000000 1.000000 25438507
2 4192967 SB10335_island_890 SB10335_component_890a 0.314 318.759986 0.000060 -7.778322 0.000036 15.369 0.339949 ... 0.000278 0.000069 0.000286 0.000286 1.219699e+07 1.219699e+07 False 1.000000 1.000000 25455320
3 4192967 SB10342_island_962 SB10342_component_962a 0.259 318.760368 0.000034 -7.777980 0.000032 13.611 0.273961 ... 0.000278 0.000047 0.000282 0.000282 1.260259e+07 1.260259e+07 False 0.709664 0.774805 25447101
4 4192967 SB11169_island_960 SB11169_component_960a 0.259 318.760094 0.000037 -7.777721 0.000032 12.352 0.269743 ... 0.000278 0.000048 0.000282 0.000282 1.257671e+07 1.257671e+07 False 0.551184 0.758536 25463894

5 rows × 42 columns
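Since each measurement carries a source column, source-level information can be attached to the measurements with a standard pandas merge. Below is a toy sketch with invented values that only mirrors the column names above; with the real run you would merge my_run.measurements against my_run.sources in the same way.

```python
import pandas as pd

# Toy stand-ins for my_run.sources and my_run.measurements (values invented)
sources = pd.DataFrame(
    {"wavg_ra": [317.82, 320.12], "wavg_dec": [-7.20, -7.62]},
    index=pd.Index([4187950, 4187951], name="id"),
)
measurements = pd.DataFrame(
    {"source": [4187950, 4187950, 4187951], "flux_peak": [14.3, 16.7, 14.6]}
)

# Attach each source's weighted-average position to its measurements
merged = measurements.merge(sources, left_on="source", right_index=True, how="left")
print(merged)
```

Remember that for large runs the measurements may be a vaex dataframe, in which case joins behave differently (see the vaex note above).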

There is also easy access to a SkyCoord object of the sources:

my_run.sources_skycoord
<SkyCoord (ICRS): (ra, dec) in deg
    [(317.81925668, -7.19983898), (320.12311114, -7.62277166),
     (318.7523995 , -6.77779   ), ..., (321.459605  , -4.071522  ),
     (322.06972   , -3.808927  ), (322.294627  , -6.708297  )]>

You also have access to:

  • my_run.relations which contains the relation information of the pipeline run, i.e. which sources are related to one another.
  • my_run.skyregions which contains the sky region information of the pipeline run.

You should rarely need to access these, but they are there in case they are required.

Looking at a specific source

We can load any source into a vast-tools source instance (like those used in other notebook examples) by using the my_run.get_source(id) function. The id is the same as that listed in the sources table. Below we load the source with id 4187950.

my_source = my_run.get_source(4187950)

All the normal functions are available:

lc = my_source.plot_lightcurve()
epoch1 = my_source.show_png_cutout(1)
all_cutouts = my_source.show_all_png_cutouts(columns=5, figsize=(20,10))
ned_results = my_source.ned_search()
ned_results
Table length=6
No.  Object Name                RA (deg)   DEC (deg)  Type
1    SDSS J211115.40-071152.9   317.81419  -7.19805   *
2    SDSS J211116.27-071152.3   317.81779  -7.19787   G
3    WISEA J211116.47-071142.1  317.81864  -7.19501   *
4    WISEA J211116.47-071205.6  317.81869  -7.20157   G
5    NVSS J211116-071159        317.81958  -7.19989   RadioS
6    WISEA J211117.59-071201.4  317.82338  -7.20043   G
(velocity, redshift, magnitude, separation and reference-count columns omitted)

Filtering the Measurements and Re-Generating the Sources

WARNING! This method can require significant memory and time to run when used on large runs. On shared instances, such as Jupyter Hub, it might require speaking to the administrator to raise the memory limit.

There may be situations where you wish to remove measurements from the analysis in bulk. For example, maybe you have identified a bad image, or you want to remove all measurements that have a sibling. The pipeline object has functions to help with this.

For this example let's pretend that the EPOCH02 image (id number 2093) is bad and we want to remove all the measurements from this image:

my_run.images.head(3)
band_id skyreg_id measurements_path polarisation name path noise_path background_path datetime jd ... beam_bmin beam_bpa rms_median rms_min rms_max centre_ra centre_dec xtr_radius frequency bandwidth
id
2091 1 1043 /data/vast-pipeline/vast-pipeline/pipeline-run... I RACS_2118-06A.EPOCH00.I.fits /data/vast-survey/pilot/EPOCH00/COMBINED/STOKE... /data/vast-survey/pilot/EPOCH00/COMBINED/STOKE... /data/vast-survey/pilot/EPOCH00/COMBINED/STOKE... 2019-04-28 23:39:28.153000+00:00 2.458602e+06 ... 0.003091 -57.846547 0.253992 0.171100 1.584303 319.653271 -6.298819 6.740103 887.0 0.0
2092 1 1044 /data/vast-pipeline/vast-pipeline/pipeline-run... I VAST_2118-06A.EPOCH01.I.fits /data/vast-survey/pilot/EPOCH01/COMBINED/STOKE... /data/vast-survey/pilot/EPOCH01/COMBINED/STOKE... /data/vast-survey/pilot/EPOCH01/COMBINED/STOKE... 2019-08-27 18:52:00.556000+00:00 2.458723e+06 ... 0.003261 -70.402943 0.267957 0.179847 1.726998 319.652258 -6.298900 6.740103 887.0 0.0
2093 1 1044 /data/vast-pipeline/vast-pipeline/pipeline-run... I VAST_2118-06A.EPOCH02.I.fits /data/vast-survey/pilot/EPOCH02/COMBINED/STOKE... /data/vast-survey/pilot/EPOCH02/COMBINED/STOKE... /data/vast-survey/pilot/EPOCH02/COMBINED/STOKE... 2019-10-30 10:11:56.913000+00:00 2.458787e+06 ... 0.003160 75.375255 0.243300 0.163784 1.672402 319.652258 -6.298900 6.740103 887.0 0.0

3 rows × 29 columns

The first step is to create a filtered measurements dataframe to feed into the function.

The method differs slightly depending on whether the measurements dataframe was loaded using pandas or vaex (a warning was shown when the run was loaded if vaex is being used).

If pandas, we can remove all the measurements associated with that image by doing:

my_run_filtered_measurements = my_run.measurements.loc[my_run.measurements.image_id != 2093].copy()

If vaex:

# my_run_filtered_measurements = my_run.measurements[my_run.measurements.image_id != 2093]
# my_run_filtered_measurements = my_run_filtered_measurements.extract()

We can then use this filtered measurements dataframe to regenerate a sources dataframe with all the metrics updated. In summary it will:

  • Recalculate all aggregate metrics on the source.
  • Drop all sources that no longer have a selavy measurement.

However, two metrics are not recalculated:

  • new & new_high_sigma

These will remain unchanged, as recalculating them is too intensive to do here.
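The heart of the recalculation can be pictured as a pandas groupby over the filtered measurements. This is only a toy sketch with invented values, not the pipeline implementation (which also rebuilds the variability metrics), but it shows the two steps listed above: recompute the aggregates, then drop sources with no remaining selavy measurement.

```python
import pandas as pd

# Toy filtered measurements (invented values); forced=True marks forced fits
meas = pd.DataFrame({
    "source": [1, 1, 1, 2, 2],
    "flux_peak": [10.0, 12.0, 14.0, 5.0, 5.5],
    "forced": [False, False, True, True, True],
})

# Recompute per-source aggregate metrics from the remaining measurements
agg = meas.groupby("source").agg(
    avg_flux_peak=("flux_peak", "mean"),
    max_flux_peak=("flux_peak", "max"),
    n_measurements=("flux_peak", "size"),
    n_selavy=("forced", lambda f: int((~f).sum())),
)

# Drop sources that no longer have any selavy (non-forced) measurement
agg = agg[agg["n_selavy"] > 0]
print(agg)
```

Here source 2 only has forced measurements left, so it is dropped.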

To get the new sources dataframe we use the recalc_sources_df method like so:

my_run_filtered_sources = my_run.recalc_sources_df(my_run_filtered_measurements)
my_run_filtered_sources.head()
# note the columns will be in a different order
avg_compactness min_snr max_snr avg_flux_int avg_flux_peak max_flux_peak max_flux_int min_flux_peak min_flux_int min_flux_peak_isl_ratio ... eta_int eta_peak new new_high_sigma vs_significant_max_int vs_significant_max_peak m_abs_significant_max_int m_abs_significant_max_peak n_relations n_neighbour_dist
source
4187950 1.073478 41.472119 64.067511 14.686661 13.688800 17.157400 18.964287 11.156000 12.330000 1.000000 ... 15.437954 44.779367 False 0.0 9.002095 15.089183 0.423993 0.423926 0 0.040124
4187951 1.678868 48.247148 75.265403 24.270804 14.566305 16.379000 27.775000 12.439438 18.286429 0.752677 ... 31.390139 30.017829 False 0.0 10.532355 10.720544 0.411996 0.273406 1 0.000012
4187952 1.148762 54.334532 71.412037 17.143260 14.937271 15.499172 17.797000 14.047000 16.479000 1.000000 ... 0.777715 4.284874 False 0.0 0.000000 0.000000 0.000000 0.000000 0 0.037722
4187953 1.208877 47.676101 71.986364 17.789361 14.741837 15.837000 18.504888 13.877000 16.805000 0.918356 ... 1.282513 6.162614 False 0.0 1.329542 5.713529 0.057134 0.131924 0 0.010270
4187954 1.027612 33.177156 33.177156 14.626000 14.233000 14.233000 14.626000 14.233000 14.626000 1.000000 ... 0.000000 0.000000 False 0.0 0.000000 0.000000 0.000000 0.000000 0 0.033644

5 rows × 31 columns

The new my_run_filtered_sources, along with my_run_filtered_measurements, can then be passed into the get_source and variability functions below to use this sources dataframe rather than the pipeline default. For example, fetching source 4187950 again:

my_source_filtered = my_run.get_source(4187950, user_sources=my_run_filtered_sources, user_measurements=my_run_filtered_measurements)
lc = my_source_filtered.plot_lightcurve()

Note that the datapoint from image number 2093 has now been removed.

Changing the Measurement Flux Values

WARNING! This method can require significant memory and time to run when used on large runs. On shared instances, such as Jupyter Hub, it might require speaking to the administrator to raise the memory limit.

There may be situations where the flux values of the measurements should be changed, for example, to account for flux corrections. In this case the measurement pairs dataframe, containing the two-epoch metrics, should also be recalculated and supplied to recalc_sources_df so that the new significant vs and m columns can be calculated. If it is not supplied, recalc_sources_df assumes that the fluxes are unchanged.

See the Two-epoch analysis section below for more details on the two-epoch metrics.

The measurement pairs dataframe can be calculated using the recalc_measurement_pairs_df method, using the same new measurements dataframe as used earlier.

# scale the flux values of the my_run_filtered_measurements by 10x
my_run_filtered_measurements['flux_int'] = my_run_filtered_measurements['flux_int'] * 10.
my_run_filtered_measurements['flux_peak'] = my_run_filtered_measurements['flux_peak'] * 10.

# Recalculate the measurement pairs dataframe
my_run_new_meas_pairs_df = my_run.recalc_measurement_pairs_df(my_run_filtered_measurements)
my_run_new_meas_pairs_df.head()
meas_id_a meas_id_b flux_int_a flux_int_err_a flux_peak_a flux_peak_err_a image_name_a flux_int_b flux_int_err_b flux_peak_b flux_peak_err_b image_name_b source_id id pair_epoch_key vs_peak vs_int m_peak m_int
0 25422034 25464033 159.22 0.428554 151.84 0.239571 RACS_2118-06A.EPOCH00.I.fits 136.74 0.456748 128.36 0.252638 VAST_2118-06A.EPOCH06x.I.fits 4187950 114880415 RACS_2118-06A.EPOCH00.I.fits_VAST_2118-06A.EPO... 67.438926 35.892189 0.167595 0.151912
1 25422034 25471792 159.22 0.428554 151.84 0.239571 RACS_2118-06A.EPOCH00.I.fits 141.94 0.389347 140.01 0.222766 VAST_2118-06A.EPOCH08.I.fits 4187950 114880416 RACS_2118-06A.EPOCH00.I.fits_VAST_2118-06A.EPO... 36.162161 29.844164 0.081069 0.114756
2 25422034 25431070 159.22 0.428554 151.84 0.239571 RACS_2118-06A.EPOCH00.I.fits 134.61 0.464165 129.61 0.260662 VAST_2118-06A.EPOCH01.I.fits 4187950 114880417 RACS_2118-06A.EPOCH00.I.fits_VAST_2118-06A.EPO... 62.790779 38.955322 0.157968 0.167512
3 25422034 25447451 159.22 0.428554 151.84 0.239571 RACS_2118-06A.EPOCH00.I.fits 123.30 0.509975 111.56 0.275243 VAST_2118-06A.EPOCH03x.I.fits 4187950 114880418 RACS_2118-06A.EPOCH00.I.fits_VAST_2118-06A.EPO... 110.385897 53.923152 0.305847 0.254283
4 25422034 25480751 159.22 0.428554 151.84 0.239571 RACS_2118-06A.EPOCH00.I.fits 144.10 0.417157 126.35 0.220451 VAST_2118-06A.EPOCH09.I.fits 4187950 114880420 RACS_2118-06A.EPOCH00.I.fits_VAST_2118-06A.EPO... 78.294450 25.281636 0.183256 0.099697

This new measurement pairs dataframe can then be supplied for use when recalculating the sources dataframe.

my_run_filtered_sources = my_run.recalc_sources_df(
    my_run_filtered_measurements, measurement_pairs_df=my_run_new_meas_pairs_df)
my_run_filtered_sources.head()
avg_compactness min_snr max_snr avg_flux_int avg_flux_peak max_flux_peak max_flux_int min_flux_peak min_flux_int min_flux_peak_isl_ratio ... eta_int eta_peak new new_high_sigma vs_significant_max_int vs_significant_max_peak m_abs_significant_max_int m_abs_significant_max_peak n_relations n_neighbour_dist
source
4187950 1.073478 41.472119 64.067511 146.866609 136.888000 171.57400 189.64287 111.56000 123.30000 1.000000 ... 1543.795405 4477.936720 False 0.0 90.020955 150.891829 0.423993 0.423926 0 0.040124
4187951 1.678868 48.247148 75.265403 242.708036 145.663048 163.79000 277.75000 124.39438 182.86429 0.752677 ... 3139.013907 3001.782925 False 0.0 105.323554 107.205442 0.411996 0.273406 1 0.000012
4187952 1.148762 54.334532 71.412037 171.432601 149.372715 154.99172 177.97000 140.47000 164.79000 1.000000 ... 77.771549 428.487362 False 0.0 19.446298 42.135532 0.076905 0.098298 0 0.037722
4187953 1.208877 47.676101 71.986364 177.893610 147.418370 158.37000 185.04888 138.77000 168.05000 0.918356 ... 128.251327 616.261359 False 0.0 23.799392 57.135285 0.096284 0.131924 0 0.010270
4187954 1.027612 33.177156 33.177156 146.260000 142.330000 142.33000 146.26000 142.33000 146.26000 1.000000 ... 0.000000 0.000000 False 0.0 0.000000 0.000000 0.000000 0.000000 0 0.033644

5 rows × 31 columns

Performing transient and variable analysis

There are a couple of built-in analyses that you can perform on the data.

  1. Two-epoch analysis (e.g. Mooley et al., 2016; https://ui.adsabs.harvard.edu/abs/2016ApJ...818..105M/abstract).
  2. eta-v sigma analysis (e.g. Rowlinson et al., 2019; https://ui.adsabs.harvard.edu/abs/2019A%26C....27..111R/abstract; though no machine learning is used here, only sigma cuts).

For both searches we are going to perform some filtering on the sources to minimise false positives. This can be done by defining a string that will be passed to the df.filter() method of a dataframe:

my_query_string = (
    "n_measurements >= 3 "
    "& n_selavy >= 2 "
    "& n_neighbour_dist > 1./60. "
    "& 0.8 < avg_compactness < 1.4 "
    "& n_relations == 0 "
    "& max_snr >= 7.0"
)

Here we are asking for sources that:

  • have 3 or more measurements,
  • are detected by selavy at least 2 times,
  • are more than 1 arcmin away from their nearest neighbour,
  • have an average compactness value (f_int / f_peak) between 0.8 and 1.4,
  • have no relations,
  • and have a maximum SNR greater than or equal to 7 (this SNR metric relates to selavy detections).

Note: This is an example of cuts you may wish to make. Consider your own science goals when making the selections.
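To see what such a query string does, you can apply it to a plain pandas dataframe with df.query(). Below is a toy sources dataframe with invented values: the first row passes every cut and the second fails the n_measurements cut.

```python
import pandas as pd

# Two toy sources (values invented)
sources = pd.DataFrame({
    "n_measurements": [9, 1],
    "n_selavy": [9, 1],
    "n_neighbour_dist": [0.040, 0.034],
    "avg_compactness": [1.07, 1.03],
    "n_relations": [0, 0],
    "max_snr": [64.1, 33.2],
})

my_query_string = (
    "n_measurements >= 3 "
    "& n_selavy >= 2 "
    "& n_neighbour_dist > 1./60. "
    "& 0.8 < avg_compactness < 1.4 "
    "& n_relations == 0 "
    "& max_snr >= 7.0"
)

# Only the first toy source survives the cuts
selected = sources.query(my_query_string)
print(selected)
```

Note that in df.query() the & operator has the same low precedence as Python's and, so the chained comparisons above behave as written.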

With this query string ready, two-epoch analysis will be the first example.

Two-epoch analysis

The pipeline pre-calculates the two-epoch metrics for us, so the first step is to load the two-epoch metrics dataframe into the run. This is a dataframe that contains all possible pairs of source measurements along with the calculated metrics. This is done by:

my_run.load_two_epoch_metrics()

The measurement_pairs_df is now available in the run.

Note By default the Vs parameters are not stored as absolute values, but they should be converted to absolute values for any manual analysis you do. All the steps below convert to absolute values automatically.
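As a quick sanity check, the pair metrics can be computed by hand. Following Mooley et al. (2016), Vs is the flux difference in units of its combined uncertainty and m is the modulation index of the pair; the values below reproduce the peak-flux metrics of the first row of the measurement pairs table.

```python
import math

# Peak fluxes and errors of the first pair in measurement_pairs_df
flux_a, err_a = 15.184, 0.239571
flux_b, err_b = 12.836, 0.252638

# Vs: the flux difference in units of its combined uncertainty
vs_peak = (flux_a - flux_b) / math.sqrt(err_a**2 + err_b**2)

# m: the modulation index of the pair
m_peak = 2 * (flux_a - flux_b) / (flux_a + flux_b)

print(vs_peak, m_peak)  # ~6.7439 and ~0.1676, matching the table
```

Note that vs_peak here is signed; as mentioned above, take the absolute value for manual analysis.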

my_run.measurement_pairs_df.head()
meas_id_a meas_id_b flux_int_a flux_int_err_a flux_peak_a flux_peak_err_a image_name_a flux_int_b flux_int_err_b flux_peak_b flux_peak_err_b image_name_b vs_peak vs_int m_peak m_int source_id id pair_epoch_key
0 25422034 25464033 15.922 0.428554 15.184 0.239571 RACS_2118-06A.EPOCH00.I.fits 13.674 0.456748 12.836 0.252638 VAST_2118-06A.EPOCH06x.I.fits 6.743893 3.589219 0.167595 0.151912 4187950 114880415 RACS_2118-06A.EPOCH00.I.fits_VAST_2118-06A.EPO...
1 25422034 25471792 15.922 0.428554 15.184 0.239571 RACS_2118-06A.EPOCH00.I.fits 14.194 0.389347 14.001 0.222766 VAST_2118-06A.EPOCH08.I.fits 3.616216 2.984416 0.081069 0.114756 4187950 114880416 RACS_2118-06A.EPOCH00.I.fits_VAST_2118-06A.EPO...
2 25422034 25431070 15.922 0.428554 15.184 0.239571 RACS_2118-06A.EPOCH00.I.fits 13.461 0.464165 12.961 0.260662 VAST_2118-06A.EPOCH01.I.fits 6.279078 3.895532 0.157968 0.167512 4187950 114880417 RACS_2118-06A.EPOCH00.I.fits_VAST_2118-06A.EPO...
3 25422034 25447451 15.922 0.428554 15.184 0.239571 RACS_2118-06A.EPOCH00.I.fits 12.330 0.509975 11.156 0.275243 VAST_2118-06A.EPOCH03x.I.fits 11.038589 5.392315 0.305847 0.254283 4187950 114880418 RACS_2118-06A.EPOCH00.I.fits_VAST_2118-06A.EPO...
4 25422034 25438985 15.922 0.428554 15.184 0.239571 RACS_2118-06A.EPOCH00.I.fits 11.570 0.415895 11.089 0.233366 VAST_2118-06A.EPOCH02.I.fits 12.244119 7.287549 0.311727 0.316601 4187950 114880419 RACS_2118-06A.EPOCH00.I.fits_VAST_2118-06A.EPO...

It also loads a pairs_df which contains information about all the unique epoch pairs possible in the pipeline run:

my_run.pairs_df.head()
image_id_a image_id_b datetime_a image_name_a datetime_b image_name_b td pair_epoch_key total_pairs
id
15 2094 2093 2019-10-29 13:39:33.996000+00:00 VAST_2118-06A.EPOCH03x.I.fits 2019-10-30 10:11:56.913000+00:00 VAST_2118-06A.EPOCH02.I.fits 0 days 20:32:22.917000 VAST_2118-06A.EPOCH03x.I.fits_VAST_2118-06A.EP... 14312
33 2097 2098 2020-01-18 05:00:37.814000+00:00 VAST_2118-06A.EPOCH08.I.fits 2020-01-19 04:56:28.318000+00:00 VAST_2118-06A.EPOCH09.I.fits 0 days 23:55:50.504000 VAST_2118-06A.EPOCH08.I.fits_VAST_2118-06A.EPO... 14305
26 2095 2096 2020-01-11 05:40:11.007000+00:00 VAST_2118-06A.EPOCH05x.I.fits 2020-01-12 05:36:03.834000+00:00 VAST_2118-06A.EPOCH06x.I.fits 0 days 23:55:52.827000 VAST_2118-06A.EPOCH05x.I.fits_VAST_2118-06A.EP... 14300
30 2096 2097 2020-01-12 05:36:03.834000+00:00 VAST_2118-06A.EPOCH06x.I.fits 2020-01-18 05:00:37.814000+00:00 VAST_2118-06A.EPOCH08.I.fits 5 days 23:24:33.980000 VAST_2118-06A.EPOCH06x.I.fits_VAST_2118-06A.EP... 14294
31 2096 2098 2020-01-12 05:36:03.834000+00:00 VAST_2118-06A.EPOCH06x.I.fits 2020-01-19 04:56:28.318000+00:00 VAST_2118-06A.EPOCH09.I.fits 6 days 23:20:24.484000 VAST_2118-06A.EPOCH06x.I.fits_VAST_2118-06A.EP... 14291

With these dataframes now loaded into the run, it is now possible to perform the analysis.

This is performed using the method run_two_epoch_analysis, which returns a dataframe of candidate sources and a dataframe of the pairs that meet the metric thresholds set when running, i.e. if a source has a pair that meets the Vs and |m| thresholds, it is a candidate. Here we use the values Vs=4.3 and m=0.26 (refer to the Mooley et al. paper).

We also pass in the query string that was set up previously, which allows the pool of sources to be pre-filtered.

two_epoch_candidates_df, two_epoch_pairs_df = my_run.run_two_epoch_analysis(4.3, 0.26, query=my_query_string)

The returned pairs dataframe contains the individual measurement pairs that meet the thresholds, while the returned sources are variable candidates according to the thresholds set:

two_epoch_candidates_df.head()
wavg_ra wavg_dec avg_compactness min_snr max_snr wavg_uncertainty_ew wavg_uncertainty_ns avg_flux_int avg_flux_peak max_flux_peak ... n_neighbour_dist vs_abs_significant_max_peak m_abs_significant_max_peak vs_abs_significant_max_int m_abs_significant_max_int n_measurements n_selavy n_forced n_siblings n_relations
id
4187950 317.819257 -7.199839 1.070134 41.472119 64.067511 0.000094 0.000094 14.340365 13.399933 17.1574 ... 0.040130 16.401732 0.429676 10.949743 0.484327 9 9 0 0 0
4187959 318.133920 -2.025453 1.024012 32.331492 58.709459 0.000094 0.000094 14.972294 14.533016 17.5110 ... 0.025218 12.625878 0.397536 15.625120 0.837996 9 9 0 0 0
4187979 318.276406 -1.946311 1.035392 27.828363 40.505338 0.000095 0.000095 10.272123 9.983591 11.3820 ... 0.040547 6.322468 0.267069 7.894679 0.569636 9 9 0 0 0
4187982 316.045403 -4.242655 1.077036 23.910299 48.110672 0.000096 0.000096 10.125076 9.469285 12.2540 ... 0.045307 14.347658 0.570320 8.813314 0.545437 9 9 0 0 0
4188019 316.611715 -9.505873 1.337252 18.500000 29.984674 0.000103 0.000103 9.034583 6.801737 7.8260 ... 0.047963 4.487187 0.283463 0.000000 0.000000 9 9 0 0 0

5 rows × 31 columns

We can see if any sources are marked as new and sort by the m_abs_significant_max_peak (largest first):

two_epoch_candidates_df[two_epoch_candidates_df.new == True].sort_values(by='m_abs_significant_max_peak', ascending=False)
wavg_ra wavg_dec avg_compactness min_snr max_snr wavg_uncertainty_ew wavg_uncertainty_ns avg_flux_int avg_flux_peak max_flux_peak ... n_neighbour_dist vs_abs_significant_max_peak m_abs_significant_max_peak vs_abs_significant_max_int m_abs_significant_max_int n_measurements n_selavy n_forced n_siblings n_relations
id
4193783 322.437561 -4.484845 1.203682 10.539906 32.070248 0.000098 0.000098 2.733421 2.278977 7.761000 ... 0.086375 5.537175 13.064529 5.537175 13.064529 9 4 5 0 0
4194566 323.073507 -6.381174 0.971096 5.322115 8.228814 0.000099 0.000099 0.873776 0.900665 1.942000 ... 0.036527 6.115568 1.946999 0.000000 0.000000 9 2 7 0 0
4189801 321.551773 -9.186431 1.048819 6.825658 10.446309 0.000105 0.000105 1.531493 1.474715 3.113000 ... 0.084676 6.834040 1.658547 4.954395 1.682520 9 4 5 0 0
4197702 321.597431 -5.577311 1.299194 5.549342 9.446948 0.000115 0.000115 2.000905 1.673795 3.026207 ... 0.069836 4.851799 1.346956 0.000000 0.000000 9 5 4 0 0
4193678 322.289415 -3.574866 1.081873 5.312236 8.004082 0.000104 0.000104 1.178886 1.136997 1.961000 ... 0.037415 4.438453 1.342524 0.000000 0.000000 9 3 6 0 0
4201597 314.961550 -6.050265 0.971288 5.964810 7.819672 0.000104 0.000104 1.041571 1.056274 1.908000 ... 0.029760 4.679312 1.283121 0.000000 0.000000 9 3 6 0 0
4197594 315.787054 -4.948162 1.050053 5.010335 9.147982 0.000132 0.000132 1.747918 1.662587 2.229000 ... 0.069194 4.654945 1.190429 0.000000 0.000000 9 8 1 0 0
4197573 324.178813 -8.034279 0.925314 8.018450 8.695122 0.000103 0.000103 1.505784 1.582006 2.351000 ... 0.028529 4.591868 1.056907 0.000000 0.000000 9 4 5 0 0
4190579 320.197278 -6.690515 1.106264 11.745283 18.438424 0.000107 0.000107 3.223989 2.921844 3.743000 ... 0.032472 7.966725 1.004880 6.691594 1.114055 9 8 1 0 0
4193615 318.398038 -6.714669 1.053874 6.687204 10.113208 0.000111 0.000111 1.467191 1.410413 2.144000 ... 0.058393 4.505128 0.914858 0.000000 0.000000 9 5 4 0 0

10 rows × 31 columns

Let's check the first one:

candidate_source = my_run.get_source(4193783)
lc = candidate_source.plot_lightcurve()
candidate_source.simbad_search()
Table length=2
MAIN_ID                RA (h:m:s)     DEC (d:m:s)
CRTS J212945.0-042906  21 29 45.0469  -04 29 06.9731
PSR J2129-04           21 29 45.29    -04 29 11.9
(coordinate uncertainty, quality and bibcode columns omitted)

It's PSR J2129-04!

Plotting two epoch metrics

You are also able to plot the metrics for a specific epoch pair, i.e. two images containing the same sources. To pick a pair you can use the pairs_df, which also contains a total_pairs column; here we sort by it to pick the pair with the most source pairs.

my_run.pairs_df.sort_values(by='total_pairs', ascending=False).head()
image_id_a image_id_b datetime_a image_name_a datetime_b image_name_b td pair_epoch_key total_pairs
id
21 2093 2095 2019-10-30 10:11:56.913000+00:00 VAST_2118-06A.EPOCH02.I.fits 2020-01-11 05:40:11.007000+00:00 VAST_2118-06A.EPOCH05x.I.fits 72 days 19:28:14.094000 VAST_2118-06A.EPOCH02.I.fits_VAST_2118-06A.EPO... 14313
15 2094 2093 2019-10-29 13:39:33.996000+00:00 VAST_2118-06A.EPOCH03x.I.fits 2019-10-30 10:11:56.913000+00:00 VAST_2118-06A.EPOCH02.I.fits 0 days 20:32:22.917000 VAST_2118-06A.EPOCH03x.I.fits_VAST_2118-06A.EP... 14312
3 2091 2095 2019-04-28 23:39:28.153000+00:00 RACS_2118-06A.EPOCH00.I.fits 2020-01-11 05:40:11.007000+00:00 VAST_2118-06A.EPOCH05x.I.fits 257 days 06:00:42.854000 RACS_2118-06A.EPOCH00.I.fits_VAST_2118-06A.EPO... 14311
16 2094 2095 2019-10-29 13:39:33.996000+00:00 VAST_2118-06A.EPOCH03x.I.fits 2020-01-11 05:40:11.007000+00:00 VAST_2118-06A.EPOCH05x.I.fits 73 days 16:00:37.011000 VAST_2118-06A.EPOCH03x.I.fits_VAST_2118-06A.EP... 14311
5 2091 2097 2019-04-28 23:39:28.153000+00:00 RACS_2118-06A.EPOCH00.I.fits 2020-01-18 05:00:37.814000+00:00 VAST_2118-06A.EPOCH08.I.fits 264 days 05:21:09.661000 RACS_2118-06A.EPOCH00.I.fits_VAST_2118-06A.EPO... 14310

We'll use the id of the top pair (in this case 21) to create the plot, and we'll also pass in the query string. By default a bokeh plot is returned, which we display using the show function imported at the beginning.

two_epoch_plot = my_run.plot_two_epoch_pairs(21, query=my_query_string)
show(two_epoch_plot)

There is also an option to plot using matplotlib.

two_epoch_plot_plt = my_run.plot_two_epoch_pairs(14, query=my_query_string, plot_type='matplotlib')

There is also a second style two epoch metrics plot available if you wish. This can be accessed by passing plot_style='b':

two_epoch_plot = my_run.plot_two_epoch_pairs(21, query=my_query_string, plot_style='b')
show(two_epoch_plot)
two_epoch_plot_plt = my_run.plot_two_epoch_pairs(21, query=my_query_string, plot_type='matplotlib', plot_style='b')

Remember that you can change aspects of the returned matplotlib figure by fetching the axes:

two_epoch_plot_plt.axes[0].set_ylim([-1, 1])
two_epoch_plot_plt.axes[0].set_xlim([0, 100])
two_epoch_plot_plt

The two-epoch plotting only supports one pair at a time, but feel free to create your own plot if you require more.
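As a starting point for a custom plot, the two-epoch metrics for each pair can simply be scattered against each other. Below is a minimal sketch using a toy dataframe in place of the real pairs data; the `vs_peak` and `m_peak` column names and the significance cut value are assumptions, so check them against your own pipeline output.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Toy stand-in for the pipeline's measurement pairs data; with a real run
# you would pull the pairs from the pipeline object instead (the 'vs_peak'
# and 'm_peak' column names are assumptions based on the pipeline metrics).
pairs = pd.DataFrame({
    "vs_peak": [0.5, -2.1, 8.3, 4.7, -1.2],
    "m_peak": [0.02, -0.15, 0.61, -0.33, 0.09],
})

fig, ax = plt.subplots()
ax.scatter(pairs["vs_peak"].abs(), pairs["m_peak"].abs())
ax.axvline(4.3, color="grey", linestyle="--")  # example |Vs| significance cut
ax.set_xlabel("|Vs| (peak flux)")
ax.set_ylabel("|m| (peak flux)")
```

From here you can overlay your own selection cuts, colour code the points, or restrict the dataframe to a single pair_epoch_key before plotting.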

The eta-v analysis

Now we'll also run the eta-v analysis, which is done by calling my_run.run_eta_v_analysis(). We will use the same query string as before (copied again here for clarity; you wouldn't need to re-define it):

my_query_string = (
    "n_measurements >= 3 "
    "& n_selavy >= 2 "
    "& n_neighbour_dist > 1./60. "
    "& 0.8 < avg_compactness < 1.4 "
    "& n_relations == 0 "
    "& max_snr > 7.0"
)

Now we run the analysis using the run_eta_v_analysis function, with 1 sigma as the threshold value to select candidates. It returns the eta and v threshold values (see the Rowlinson et al. paper), a dataframe of candidate sources, and a plot for visualisation.

eta_thresh, v_thresh, eta_v_candidates, plot = my_run.run_eta_v_analysis(1.0, 1.0, query=my_query_string)
print(eta_thresh, v_thresh)
2.9615706075262547 0.18091737933045948
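The returned thresholds can also be applied manually if you want to adjust the selection yourself. Below is a hedged sketch using a toy stand-in for the pipeline sources dataframe; the `eta_peak` and `v_peak` column names are assumptions based on the pipeline schema, so confirm them against your run before relying on this.

```python
import pandas as pd

# Toy stand-in for the pipeline sources dataframe; in a real session you
# would use the sources attribute of the pipeline run instead.
sources = pd.DataFrame({
    "eta_peak": [0.5, 3.2, 10.1, 2.0],
    "v_peak": [0.05, 0.25, 0.40, 0.10],
})

# Values as returned by run_eta_v_analysis above (rounded here).
eta_thresh, v_thresh = 2.96, 0.18

# Candidates are sources above both thresholds.
candidates = sources.query("eta_peak > @eta_thresh & v_peak > @v_thresh")
```

This reproduces the candidate cut with pandas alone, which is handy if you want to experiment with different sigma levels without re-running the full analysis.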

View the plot from the analysis. The datapoints are colour coded by the number of selavy (i.e. actual) detections.

# use the bokeh `show` function imported at the beginning.
show(plot)

For the purposes of the example, let's run the analysis again, this time fetching a matplotlib plot and the diagnostic plot.

eta_thresh, v_thresh, eta_v_candidates, plot, diag_plot = my_run.run_eta_v_analysis(1.0, 1.0, query=my_query_string, plot_type='matplotlib', diagnostic=True)
plot

Looking at the plot above, there is one source that is clearly beyond the 1 sigma threshold in both metrics. If you hover over the point in the bokeh plot you'll see it is source 4193783, PSR J2129-04.

The other candidates can be explored using the candidates dataframe returned by the analysis:

eta_v_candidates
wavg_ra wavg_dec avg_compactness min_snr max_snr wavg_uncertainty_ew wavg_uncertainty_ns avg_flux_int avg_flux_peak max_flux_peak ... n_neighbour_dist vs_abs_significant_max_peak m_abs_significant_max_peak vs_abs_significant_max_int m_abs_significant_max_int n_measurements n_selavy n_forced n_siblings n_relations
id
4187982 316.045403 -4.242655 1.077036 23.910299 48.110672 0.000096 0.000096 10.125076 9.469285 12.254 ... 0.045307 14.347658 0.570320 8.813314 0.545437 9 9 0 0 0
4188036 314.942625 -9.140506 1.063771 22.504316 41.086466 0.000096 0.000096 9.095536 8.612593 11.253 ... 0.031668 12.798074 0.575057 6.831000 0.518743 9 9 0 0 0
4188060 316.181547 -9.591056 1.064400 6.286842 25.265823 0.000123 0.000123 3.959750 3.757028 5.988 ... 0.076302 7.891876 0.859257 5.183373 0.845252 9 9 0 0 0
4188185 322.164402 -1.689381 1.144700 7.415162 11.924138 0.000124 0.000124 3.319862 2.890348 3.458 ... 0.040987 0.000000 0.000000 0.000000 0.000000 9 9 0 0 0
4188216 315.559724 -9.295193 1.176118 6.936201 12.207831 0.000127 0.000127 4.111690 3.547707 4.610 ... 0.050276 0.000000 0.000000 0.000000 0.000000 9 9 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
4202038 322.655794 -10.190375 1.172381 10.456954 52.290566 0.000101 0.000101 11.218264 9.649979 13.857 ... 0.022024 18.984270 0.680660 5.306456 0.694894 9 9 0 0 0
4202328 317.173894 -8.768331 1.089107 5.156997 9.492647 0.000146 0.000146 2.253941 2.101504 2.768 ... 0.028469 0.000000 0.000000 0.000000 0.000000 9 9 0 0 0
4202396 318.559264 -4.769185 1.092138 5.417582 9.929204 0.000116 0.000116 1.456132 1.374798 2.244 ... 0.031112 4.738900 1.151075 0.000000 0.000000 9 5 4 0 0
4202454 321.866353 -3.776694 1.189452 6.681507 11.866171 0.000131 0.000131 2.510025 2.146802 3.192 ... 0.071478 4.694476 0.809722 0.000000 0.000000 9 8 1 0 0
4202699 320.212711 -3.906477 1.154878 5.042849 9.214953 0.000126 0.000126 1.843376 1.639570 2.127 ... 0.085835 0.000000 0.000000 0.000000 0.000000 9 7 2 0 0

122 rows × 31 columns

Creating a MOC of the Pipeline Job

WARNING Beware of creating a MOC for a large pipeline run as it can take a very long time to complete. If you are using the VAST Pilot Survey data then load the MOCs directly using VASTMOCS.

There is also the ability to create a MOC file of the pipeline job. This can be useful to visualise the area covered, and to use the MOC to obtain pre-filtered surveys to crossmatch with (see the catalogue crossmatching example notebook).

The MOC is created by calling the following (it may take a few seconds, or minutes for large runs):

my_run_moc = my_run.create_moc()

We can then plot the MOC as demonstrated in the other notebook examples (and in the mocpy documentation).

from mocpy import World2ScreenMPL
from astropy.visualization.wcsaxes.frame import EllipticalFrame

fig = plt.figure(figsize=(12,6))

with World2ScreenMPL(
    fig,
    fov=320 * u.deg,
    center=SkyCoord(0,0, unit='deg', frame='icrs'),
    coordsys="icrs",
    rotation=Angle(0, u.degree),
) as wcs:
    ax = fig.add_subplot(111, projection=wcs, frame_class=EllipticalFrame)
    ax.set_title("VAST Pipeline Runs VAST_2118-06A Area")
    ax.grid(color="black", linestyle="dotted")
    my_run_moc.fill(ax=ax, wcs=wcs, alpha=0.9, fill=True, linewidth=0, color="#00bb00")
    my_run_moc.border(ax=ax, wcs=wcs, alpha=0.5, color="black")

And don't forget to save the MOC so it doesn't have to be generated again!

my_run_moc.write('my_run_moc.fits', overwrite=True)


Last update: July 18, 2023
Created: August 5, 2020