VAST Pipeline Exploration
Interacting with results from the VAST Pipeline
This notebook gives an example of how to use vast-tools in a notebook environment to interact with output from the VAST Pipeline (https://github.com/askap-vast/vast-pipeline).
Documentation for the pipeline is currently being written, but there is a short Wiki page on the methods used here: https://github.com/askap-vast/vast-project/wiki/VAST-Pipeline-Methods-Description. If anything is unclear in the results, don't hesitate to contact the Pipeline development team.
vaex
For large pipeline runs vast-tools makes use of vaex to load the measurements and measurement_pairs dataframes in an out-of-core context. vaex behaves differently to pandas but offers many advantages when handling large dataframes; see the documentation here: https://vaex.readthedocs.io/en/latest/index.html. If you use the built-in methods of the Pipeline class then vaex is taken care of behind the scenes, so users shouldn't have to worry about it. However, be mindful when performing manual analysis using vaex dataframes; warnings will appear when vaex has been used. Also note that vaex is a relatively young project, so you may run into a bug or two when using it yourself.
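If you ever need a familiar pandas object for a small, filtered subset, vaex can convert one directly; a minimal sketch, assuming a run has already been loaded as my_run (see below) and that its measurements dataframe is vaex-backed (to_pandas_df is part of the vaex API):
import vaex
# convert only after filtering so the resulting pandas dataframe stays small
if isinstance(my_run.measurements, vaex.dataframe.DataFrame):
    bright = my_run.measurements[my_run.measurements.flux_peak > 10.]
    bright_pandas = bright.to_pandas_df()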
Imports
Below are the imports required for this example. The main import required from vast-tools is the Pipeline class. Astropy objects are also imported, as well as bokeh, as some plots use the bokeh library. Note that we call the bokeh function output_notebook() so that bokeh outputs are shown in the notebook environment. matplotlib.pyplot is also imported in case we want to adjust any plots produced.
from vasttools.pipeline import Pipeline
from bokeh.io import output_notebook
from bokeh.plotting import show
from astropy.coordinates import Angle, SkyCoord
from astropy import units as u
import matplotlib.pyplot as plt
%matplotlib inline
output_notebook()
Loading a Pipeline run
The first step is to initialise the Pipeline. If the environment variable PIPELINE_WORKING_DIR is defined then you do not need to provide the pipeline run directory. Otherwise, use the project_dir argument in the Pipeline initialisation to set the directory. Ask your admin if you are not sure where the pipeline directory is located.
pipe = Pipeline(project_dir='/data/vast-pipeline/vast-pipeline/pipeline-runs')
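Alternatively, if PIPELINE_WORKING_DIR is set in the environment the directory argument can be dropped; a minimal sketch, setting the variable from within Python purely for illustration:
import os
# point vast-tools at the pipeline run directory via the environment
os.environ['PIPELINE_WORKING_DIR'] = '/data/vast-pipeline/vast-pipeline/pipeline-runs'
pipe = Pipeline()  # no project_dir needed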
If you are unsure of the run name you want to load, you can use the list_piperuns() method to list all available runs.
pipe.list_piperuns()
For this example we will load the VAST_2118-06A run. To do this we call the load_run() method and save the run to my_run:
my_run = pipe.load_run('VAST_2118-06A')
It's also possible to load multiple runs as one object using:
pipe.load_runs(['VAST_2118-06A', 'GalCtr_p2'])
If you have loaded two runs separately you can also merge one with the other by doing:
my_run = my_run.combine_with_run(my_other_run)
For the purposes of this example we will continue to use the single VAST_2118-06A run.
The pipeline object has loaded all the outputs and information of the pipeline run. You have access to the following dataframes:
my_run.images contains the information on the images used in the pipeline run.
my_run.images.head()
my_run.sources contains the information on the unique sources found in the pipeline run.
my_run.sources.head()
my_run.measurements contains the information on all of the measurements in the pipeline run, both from selavy and forced extractions (if used in the pipeline run). Measurements are the datapoints in time of a source, hence the source column tells you which source the measurement belongs to.
my_run.measurements.head()
There is also easy access to a SkyCoord object of the sources:
my_run.sources_skycoord
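This makes positional selections straightforward; a minimal sketch that keeps only sources within 1 degree of an illustrative position (the coordinate values here are arbitrary):
target = SkyCoord('21h18m00s', '-06d00m00s', frame='icrs')
# boolean mask of sources within 1 deg of the target position
nearby_mask = my_run.sources_skycoord.separation(target) < 1 * u.deg
nearby_sources = my_run.sources[nearby_mask]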
You also have access to:
- my_run.relations, which contains the relation information of the pipeline run, i.e. which sources are related to one another.
- my_run.skyregions, which contains the sky region information of the pipeline run.

You should rarely need to access these, but they are there in case they are required.
Looking at a specific source
We can load any source into a vast-tools source instance (like those used in other notebook examples) by using the my_run.get_source(id) function. The id is the same id as listed in the sources table. Below I load the source with the id = 4187950.
my_source = my_run.get_source(4187950)
All the normal functions are available:
lc = my_source.plot_lightcurve()
epoch1 = my_source.show_png_cutout(1)
all_cutouts = my_source.show_all_png_cutouts(columns=5, figsize=(20,10))
ned_results = my_source.ned_search()
ned_results
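Assuming plot_lightcurve returns a matplotlib Figure (the matplotlib outputs elsewhere in this notebook behave this way), it can be saved to disk as usual; a minimal sketch with an arbitrary filename:
# save the lightcurve figure produced above
lc.savefig('source_4187950_lightcurve.png', dpi=150, bbox_inches='tight')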
Filtering the Measurements and Re-Generating the Sources
WARNING! This method can require significant memory and time to run when used on large runs. On shared instances, such as JupyterHub, it might require speaking to the administrator to raise the memory limit.
There may be situations where you wish to remove a large number of measurements from the analysis. For example, maybe you have identified a bad image, or you want to remove all measurements that have a sibling. The pipeline has functions to help with this.
For this example let's pretend that the EPOCH02 image (id number 2093) is bad and we want to remove all of the measurements from this image:
my_run.images.head(3)
The first step is to create a filtered measurements dataframe to feed into the function.
The method differs slightly depending on whether the measurements dataframe is loaded using pandas or vaex (a warning will have appeared when you opened the run if vaex is being used).
If pandas, we can remove all the measurements associated with that image by doing:
my_run_filtered_measurements = my_run.measurements.loc[my_run.measurements.image_id != 2093].copy()
If vaex:
# my_run_filtered_measurements = my_run.measurements[my_run.measurements.image_id != 2093]
# my_run_filtered_measurements.extract()
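Either way, a quick sanity check confirms the image has been removed (shown here for the pandas case):
# no measurements from image 2093 should remain
assert 2093 not in my_run_filtered_measurements.image_id.unique()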
We can then use this filtered measurements dataframe to regenerate a sources dataframe with all the metrics updated. In summary it will:
- Recalculate all aggregate metrics for each source.
- Drop all sources that no longer have a selavy measurement.

However, two metrics are not re-calculated:
- new and new_source_high_sigma will remain unchanged, as the process to recalculate these two is too intensive to do here.

To get the new sources we use the recalc_sources_df function like so:
my_run_filtered_sources = my_run.recalc_sources_df(my_run_filtered_measurements)
my_run_filtered_sources.head()
# note the columns will be in a different order
The new my_run_filtered_sources, along with the my_run_filtered_measurements, can then be passed into the get_source and variability functions below to use this filtered data rather than the default pipeline outputs. For example, if we again fetch source number 4187950:
my_source_filtered = my_run.get_source(4187950, user_sources=my_run_filtered_sources, user_measurements=my_run_filtered_measurements)
lc = my_source_filtered.plot_lightcurve()
Note that the datapoint from image number 2093 has now been removed.
Changing the Measurement Flux Values
WARNING! This method can require significant memory and time to run when used on large runs. On shared instances, such as JupyterHub, it might require speaking to the administrator to raise the memory limit.
There may be situations where the flux values of the measurements should be changed, for example, to account for flux corrections. In this case the measurement pairs dataframe, containing the two-epoch metrics, should also be recalculated and supplied to recalc_sources_df so that the new significant vs and m columns can be calculated. If it is not supplied to recalc_sources_df, it is assumed that the fluxes are unchanged.
See the Two-epoch analysis section below for more details on the two-epoch metrics.
The measurement pairs dataframe can be recalculated using the recalc_measurement_pairs_df method, using the same new measurements dataframe as used earlier.
# scale the flux values of the my_run_filtered_measurements by 10x
my_run_filtered_measurements['flux_int'] = my_run_filtered_measurements['flux_int'] * 10.
my_run_filtered_measurements['flux_peak'] = my_run_filtered_measurements['flux_peak'] * 10.
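# note: depending on the correction applied, the flux uncertainty columns
# (e.g. flux_int_err, flux_peak_err; check your dataframe) may also need scaling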
# Recalculate the measurement pairs dataframe
my_run_new_meas_pairs_df = my_run.recalc_measurement_pairs_df(my_run_filtered_measurements)
my_run_new_meas_pairs_df.head()
This new measurement pairs dataframe can then be supplied to recalc_sources_df when recalculating the sources dataframe.
my_run_filtered_sources = my_run.recalc_sources_df(
    my_run_filtered_measurements, measurement_pairs_df=my_run_new_meas_pairs_df)
my_run_filtered_sources.head()
Performing transient and variable analysis
There are a couple of built-in analyses that you can perform on the data.
- Two-epoch analysis (e.g. Mooley et al., 2016; https://ui.adsabs.harvard.edu/abs/2016ApJ...818..105M/abstract).
- eta-v sigma analysis (e.g. Rowlinson et al., 2019; https://ui.adsabs.harvard.edu/abs/2019A%26C....27..111R/abstract; though no machine learning is used here, only sigma cuts).
For both searches we are going to perform some filtering on the sources to minimise false positives. This can be done by defining a string that will be passed to the df.query() method of the sources dataframe:
my_query_string = (
    "n_measurements >= 3 "
    "& n_selavy >= 2 "
    "& n_neighbour_dist > 1./60. "
    "& 0.8 < avg_compactness < 1.4 "
    "& n_relations == 0 "
    "& max_snr >= 7.0"
)
Here we are asking for sources that:
- have 3 or more measurements,
- are detected by selavy at least 2 times,
- are more than 1 arcmin away from their nearest neighbour,
- have an average compactness value (f_int / f_peak) between 0.8 and 1.4,
- have no relations,
- and have a maximum SNR greater than or equal to 7 (this SNR metric relates to selavy detections).

Note: This is only an example of the cuts you may wish to make. Consider your own science goals when making the selections.
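As a quick check of how strict these cuts are, the same string can be applied directly with the pandas query method; a minimal sketch, assuming the sources dataframe is pandas-backed:
n_total = len(my_run.sources)
n_filtered = len(my_run.sources.query(my_query_string))
print(f"{n_filtered} of {n_total} sources pass the cuts")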
With this query string ready, two-epoch analysis will be the first example.
Two-epoch analysis
The pipeline pre-calculates the two-epoch metrics for us, so the first step is to load the two-epoch metrics dataframe into the job. This is a dataframe that contains all possible pairs of source measurements along with the calculated metrics. This is done by:
my_run.load_two_epoch_metrics()
The measurement_pairs_df is now available in the run.
Note: By default the Vs parameters are not stored as absolute values, but they should be converted to absolute values for any manual analysis you do. All the steps below automatically convert to absolute values.
my_run.measurement_pairs_df.head()
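If you work with this dataframe manually, remember to take the absolute values yourself; a minimal sketch, assuming the peak-flux pair metric lives in a column named vs_peak (check the actual column names in your dataframe):
# absolute Vs values for manual analysis
vs_abs_peak = my_run.measurement_pairs_df['vs_peak'].abs()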
It also loads a pairs_df which contains information about all the unique epoch pairs possible in the pipeline run:
my_run.pairs_df.head()
With these dataframes now loaded into the run, it is possible to perform the analysis.
This is done using the run_two_epoch_analysis method, which returns a dataframe containing candidate sources and a dataframe of the pairs that meet the metric thresholds set when running, i.e. if a source has a pair that meets the Vs and |m| thresholds, it is a candidate. Here we are using the values Vs = 4.3 and |m| = 0.26 (refer to the Mooley et al. paper).
We also pass in the query string that was set up previously, which allows the pool of sources to be pre-filtered.
two_epoch_candidates_df, two_epoch_pairs_df = my_run.run_two_epoch_analysis(4.3, 0.26, query=my_query_string)
The returned pairs dataframe shows the measurement pairs (and their time deltas) that meet the thresholds, while the returned sources are variable candidates according to the thresholds set:
two_epoch_candidates_df.head()
We can see if any sources are marked as new and sort by m_abs_significant_max_peak (largest first):
two_epoch_candidates_df[two_epoch_candidates_df.new == True].sort_values(by='m_abs_significant_max_peak', ascending=False)
Let's check the first one:
candidate_source = my_run.get_source(4193783)
lc = candidate_source.plot_lightcurve()
candidate_source.simbad_search()
It's PSR J2129-04!
Plotting two epoch metrics
You are also able to plot the metrics for a specific epoch pair, i.e. two images containing the same sources. To pick an epoch pair you can use the pairs_df, which also contains a total_pairs column that I'll sort by here to pick the pair with the most measurement pairs.
my_run.pairs_df.sort_values(by='total_pairs', ascending=False).head()
We'll use the id of the top pair (in this case 21) to create the plot, and we'll also pass in the query string. By default it returns a bokeh plot, which we display using the show function imported at the beginning.
two_epoch_plot = my_run.plot_two_epoch_pairs(21, query=my_query_string)
show(two_epoch_plot)
There is also an option to plot using matplotlib.
two_epoch_plot_plt = my_run.plot_two_epoch_pairs(21, query=my_query_string, plot_type='matplotlib')
There is also a second style of two-epoch metrics plot available if you wish. This can be accessed by passing plot_style='b':
two_epoch_plot = my_run.plot_two_epoch_pairs(21, query=my_query_string, plot_style='b')
show(two_epoch_plot)
two_epoch_plot_plt = my_run.plot_two_epoch_pairs(21, query=my_query_string, plot_type='matplotlib', plot_style='b')
Remember that you can change aspects of the returned matplotlib figure by fetching the axes:
two_epoch_plot_plt.axes[0].set_ylim([-1, 1])
two_epoch_plot_plt.axes[0].set_xlim([0, 100])
two_epoch_plot_plt
The plotting of two-epoch metrics only supports one epoch pair at a time, but feel free to create your own plot if you require.
The eta-v analysis
Now we'll also run the eta-v analysis. This is called using the my_run.run_eta_v_analysis() method. We will use the same query string as before (copied again here for clarity, but you wouldn't need to re-define it):
my_query_string = (
    "n_measurements >= 3 "
    "& n_selavy >= 2 "
    "& n_neighbour_dist > 1./60. "
    "& 0.8 < avg_compactness < 1.4 "
    "& n_relations == 0 "
    "& max_snr >= 7.0"
)
Now we run the analysis using the run_eta_v_analysis function, where we use 1 sigma as the threshold value to select candidates. It returns the eta and v threshold values (see the Rowlinson et al. paper), a dataframe of candidate sources and a plot for visualisation.
eta_thresh, v_thresh, eta_v_candidates, plot = my_run.run_eta_v_analysis(1.0, 1.0, query=my_query_string)
print(eta_thresh, v_thresh)
View the plot from the analysis. The datapoints are colour coded to represent the number of selavy (i.e. actual) detections.
# use the bokeh `show` function imported at the beginning.
show(plot)
For the purposes of the example let's run the analysis again, this time fetching a matplotlib plot and the diagnostic plot.
eta_thresh, v_thresh, eta_v_candidates, plot, diag_plot = my_run.run_eta_v_analysis(1.0, 1.0, query=my_query_string, plot_type='matplotlib', diagnostic=True)
plot
By looking at the plot above, there is one source that is clearly beyond the 1 sigma threshold in both metrics. If you hover over the point in the bokeh plot you'll see it is source 4193783: PSR J2129-04.
The other candidates can be explored using the candidates dataframe returned by the analysis:
eta_v_candidates
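Individual candidates can then be fetched and inspected in turn; a minimal sketch, assuming the candidates dataframe is indexed by source id (check eta_v_candidates.index if unsure):
# plot lightcurves for the first few eta-v candidates
for source_id in eta_v_candidates.index[:3]:
    candidate = my_run.get_source(source_id)
    lc = candidate.plot_lightcurve()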
Creating a MOC of the Pipeline Job
WARNING! Beware of creating a MOC for a large pipeline run as it can take a very long time to complete. If you are using the VAST Pilot Survey data then load the MOCs directly from VASTMOCS.
There is also the ability to create a MOC file of the pipeline job. This can be useful to see the area and to also use the MOC to get pre-filtered surveys to crossmatch with (see the catalogue crossmatching example notebook).
The MOC is created by calling the following (it may take a few seconds, or minutes for large runs):
my_run_moc = my_run.create_moc()
We can then plot the MOC as demonstrated in the other notebook examples (and in the mocpy documentation).
from mocpy import World2ScreenMPL
from astropy.visualization.wcsaxes.frame import EllipticalFrame
fig = plt.figure(figsize=(12, 6))
with World2ScreenMPL(
    fig,
    fov=320 * u.deg,
    center=SkyCoord(0, 0, unit='deg', frame='icrs'),
    coordsys="icrs",
    rotation=Angle(0, u.degree),
) as wcs:
    ax = fig.add_subplot(111, projection=wcs, frame_class=EllipticalFrame)
    ax.set_title("VAST Pipeline Run VAST_2118-06A Area")
    ax.grid(color="black", linestyle="dotted")
    my_run_moc.fill(ax=ax, wcs=wcs, alpha=0.9, fill=True, linewidth=0, color="#00bb00")
    my_run_moc.border(ax=ax, wcs=wcs, alpha=0.5, color="black")
And don't forget to save the MOC so it doesn't have to be generated again!
my_run_moc.write('my_run_moc.fits', overwrite=True)
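In a later session the MOC can be read straight back in rather than regenerated; a minimal sketch (MOC.from_fits matches the older mocpy API used above; newer mocpy versions provide MOC.load instead):
from mocpy import MOC
# reload the previously saved MOC
my_run_moc = MOC.from_fits('my_run_moc.fits')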