Catalogue Crossmatch
Matching catalogues to the VAST Pilot Survey
This notebook gives an example of how to use vast-tools in a notebook environment to perform a crossmatch between a catalogue and the VAST Pilot Survey.
Note: The settings and filters applied in this notebook, while sensible, are somewhat generic - always consider your science goals when deciding which filters to apply!
It is highly recommended that results from the VAST Pipeline are used, and this is what is primarily covered in this example. It is possible to run a search using only vast-tools, but the results are nowhere near as rich as those from the pipeline - this is covered at the end of this document.
The VAST Pipeline
The pipeline hosted on the Nimbus server contains the pipeline run for the full pilot survey. For a complete demo of what can be done with the vast-tools Pipeline class, see the vast-pipeline-example.ipynb example notebook.
The Imports
Below are the imports required for this example. The main imports required from vast-tools are the Pipeline and VASTMOCS objects. The Query object is for the vast-tools query option shown at the end of this notebook. Astropy objects are also imported as they are critical to performing the crossmatch.
from vasttools.moc import VASTMOCS
from vasttools.pipeline import Pipeline
from vasttools.query import Query
from mocpy import World2ScreenMPL
import matplotlib.pyplot as plt
from astropy import units as u
from astropy.coordinates import Angle, SkyCoord
%matplotlib inline
Catalogue selection
For this example we will be using the ATNF Pulsar Catalogue (Manchester+, 2005), which has the Vizier ID B/psr.
Note: Of course your catalogue doesn't have to come from Vizier. If you have a csv or FITS file then simply load the data into a DataFrame, create a SkyCoord object and you'll be good to go.
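For instance, here is a minimal sketch of loading your own catalogue - the file name and column names are purely hypothetical:
import pandas as pd
# Hypothetical catalogue file with 'ra' and 'dec' columns in degrees
my_catalogue = pd.read_csv('my_catalogue.csv')
my_skycoord = SkyCoord(
    my_catalogue['ra'].values,
    my_catalogue['dec'].values,
    unit=(u.deg, u.deg),
)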
To start our search, the first question we want to answer is:
What sources from the catalogue are in the VAST Pilot Survey footprint?
This can be efficiently answered by using the query_vizier_vast_pilot() method in VASTMOCS.
First we initialise the VASTMOCS object:
mocs = VASTMOCS()
We then use the query_vizier_vast_pilot method to obtain all the sources from the PSRCAT catalogue that are contained within the footprint. It will likely take a bit of time to complete.
psrcat_vast_sources = mocs.query_vizier_vast_pilot('B/psr', max_rows=200000)
psrcat_vast_sources
We see that 347 sources are within the VAST Pilot Survey footprint.
Tip: The table returned above is an astropy.table. This can be converted to pandas by using psrcat_vast_sources = psrcat_vast_sources.to_pandas().
These can be plotted along with the VAST Pilot Survey footprint using the MOC. See the vast-mocs-example notebook for more on using MOCs and the World2ScreenMPL method.
from astropy.visualization.wcsaxes.frame import EllipticalFrame

fig = plt.figure(figsize=(16, 8))

# Load the Epoch 1 MOC file to use
epoch1_moc = mocs.load_pilot_epoch_moc('1')

with World2ScreenMPL(
    fig,
    fov=320 * u.deg,
    center=SkyCoord(0, 0, unit='deg', frame='icrs'),
    coordsys="icrs",
    rotation=Angle(0, u.degree),
) as wcs:
    ax = fig.add_subplot(111, projection=wcs, frame_class=EllipticalFrame)
    ax.set_title("PSRCAT Catalogue Sources in the VAST Pilot Survey")
    ax.grid(color="black", linestyle="dotted")
    # Plot the survey footprint from the MOC
    epoch1_moc.fill(ax=ax, wcs=wcs, alpha=0.5, fill=True, linewidth=0, color="#00bb00")
    epoch1_moc.border(ax=ax, wcs=wcs, alpha=0.5, color="black")
    # Overlay the PSRCAT source positions
    ax.scatter(
        psrcat_vast_sources['_RAJ2000'],
        psrcat_vast_sources['_DEJ2000'],
        transform=ax.get_transform('world'),
        zorder=10,
        s=3,
    )
Loading the VAST Pipeline Data
Now the results of the VAST Pipeline need to be loaded. This example will not give full details of the Pipeline class; please refer to the vast-pipeline-example.ipynb example notebook for a full example and description.
We'll be using the full VAST Pilot Survey pipeline run containing epochs 0–13 (a test version called tiles_corrected).
# Below we suppress DeprecationWarnings due to an ipykernel bug, and AstropyWarnings raised from FITS headers.
import warnings
from astropy.utils.exceptions import AstropyWarning
warnings.simplefilter('ignore', category=AstropyWarning)
warnings.filterwarnings("ignore", category=DeprecationWarning)
# define pipeline object
pipe = Pipeline()
# load the run
pipe_run = pipe.load_run('tiles_corrected')
We now have access to the unique sources found by the pipeline:
pipe_run.sources.head()
Performing the Crossmatch
The crossmatch can be performed using the astropy match_to_catalog_sky function. The first step is to create the SkyCoord objects for each catalogue, starting with the PSRCAT catalogue that was already obtained above:
# Unfortunately we cannot use guess_from_table for the Vizier results, so we construct manually
psrcat_skycoord = SkyCoord(psrcat_vast_sources['_RAJ2000'], psrcat_vast_sources['_DEJ2000'], unit=(u.deg, u.deg))
psrcat_names = psrcat_vast_sources['PSRJ'].tolist()
The pipeline run object already has the sources saved as a SkyCoord object, accessible as pipe_run.sources_skycoord:
pipe_run.sources_skycoord
Now the crossmatching can be performed. See https://docs.astropy.org/en/stable/coordinates/matchsep.html#astropy-coordinates-matching for details on the astropy functions and outputs.
idx, d2d, d3d = psrcat_skycoord.match_to_catalog_sky(pipe_run.sources_skycoord)
radius_limit = 15 * u.arcsec
(d2d <= radius_limit).sum()
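Before settling on a match radius, it can be useful to inspect the distribution of nearest-neighbour separations. A quick sketch (the binning here is purely illustrative):
import numpy as np
# Histogram of nearest-neighbour separations on a log scale
plt.hist(d2d.arcsec, bins=np.logspace(-1, 4, 50))
plt.xscale('log')
plt.xlabel('nearest-neighbour separation (arcsec)')
plt.ylabel('number of sources')
plt.show()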
From the count above we can see that 155 PSRCAT objects have a match to the pipeline sources. If you wish, you can merge the results together:
# Convert PSRCAT to pandas first
psrcat_vast_sources_pd = psrcat_vast_sources.to_pandas()
# Create a d2d mask
d2d_mask = d2d <= radius_limit
# Select the crossmatches less than 15 arcsec
psrcat_crossmatch_result_15asec = psrcat_vast_sources_pd.loc[d2d_mask].copy()
# Append the id and distance of the VAST crossmatch to the PSRCAT sources
psrcat_crossmatch_result_15asec['vast_xmatch_id'] = pipe_run.sources.iloc[idx[d2d_mask]].index.values
psrcat_crossmatch_result_15asec['vast_xmatch_d2d_asec'] = d2d[d2d_mask].arcsec
# Join the result
psrcat_crossmatch_result_15asec = psrcat_crossmatch_result_15asec.merge(pipe_run.sources, how='left', left_on='vast_xmatch_id', right_index=True, suffixes=("_psrcat", "_vast"))
psrcat_crossmatch_result_15asec
With the crossmatches in hand you can now start to do any kind of analysis you wish to perform. For example we can perform a quick check to see if the pipeline has picked out any of these sources as having significant two-epoch variability:
psrcat_crossmatch_result_15asec[psrcat_crossmatch_result_15asec['m_abs_significant_max_peak'] > 0.00]
And remember you can use the vast-tools source tools to view any source, as in the other example notebooks:
# Get the first VAST source from the table above
first_source_id = psrcat_crossmatch_result_15asec[psrcat_crossmatch_result_15asec['m_abs_significant_max_peak'] > 0.00].iloc[0].vast_xmatch_id
first_source = pipe_run.get_source(first_source_id)
first_source.plot_lightcurve(min_points=1);
first_source.show_all_png_cutouts(columns=5, figsize=(12,5), size=Angle(2. * u.arcmin));
Filtering the Pipeline Sources (Optional)
The example above used all the sources from the pipeline results, but these may need to be filtered further to improve results. Below is an example of how to filter the sources.
my_query_string = (
"n_measurements >= 3 "
"& n_selavy >= 2 "
"& n_neighbour_dist > 1./60. "
"& 0.8 < avg_compactness < 1.4 "
"& n_relations == 0 "
"& max_snr > 7.0"
)
pipe_run_filtered_sources = pipe_run.sources.query(my_query_string)
pipe_run_filtered_sources
You can either:

* apply this filter to the crossmatch results above (a sketch of this is shown after the next cell), or
* substitute pipe_run_filtered_sources into the complete crossmatch process above in place of pipe_run.sources (you need to create a new SkyCoord object first).
pipe_run_filtered_sources_skycoord = pipe_run.get_sources_skycoord(user_sources=pipe_run_filtered_sources)
pipe_run_filtered_sources_skycoord
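For the first option, one possible sketch is to keep only the crossmatch rows whose VAST match survives the quality cuts (filtered_xmatch is just an illustrative name):
# Keep only crossmatch rows whose VAST source passed the filter
filtered_xmatch = psrcat_crossmatch_result_15asec[
    psrcat_crossmatch_result_15asec['vast_xmatch_id'].isin(pipe_run_filtered_sources.index)
]
filtered_xmatch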
Finding All Crossmatches Between Sources
The crossmatch above only finds the nearest neighbour to the sources in your catalogue. Astropy also offers the functionality to find all matches between objects within a defined radius. See https://docs.astropy.org/en/stable/coordinates/matchsep.html#searching-around-coordinates for full details. This is done as shown below, using the same 15 arcsec radius:
idx_vast, idx_psrcat, d2d, d3d = psrcat_skycoord.search_around_sky(pipe_run.sources_skycoord, 15 * u.arcsec)
A merged dataframe of this crossmatch can be made as below. Note that there can be multiple matches per source, so this will generate duplicate sources within the dataframe.
# Create a subset dataframe of the PSRCAT sources with a match
psrcat_search_around_results_15asec = psrcat_vast_sources_pd.iloc[idx_psrcat].copy()
# Add the VAST d2d and match id columns
psrcat_search_around_results_15asec['vast_xmatch_d2d_asec'] = d2d.arcsec
psrcat_search_around_results_15asec['vast_xmatch_id'] = pipe_run.sources.iloc[idx_vast].index.values
# Perform the merge
psrcat_search_around_results_15asec = psrcat_search_around_results_15asec.merge(pipe_run.sources, how='left', left_on='vast_xmatch_id', right_index=True, suffixes=("_psrcat", "_vast"))
psrcat_search_around_results_15asec
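Since duplicates are expected, a quick hypothetical follow-up is to count how many VAST matches each pulsar received:
# Count matches per pulsar and list those with more than one VAST match
match_counts = psrcat_search_around_results_15asec['PSRJ'].value_counts()
match_counts[match_counts > 1]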
This is the end of the example of performing a catalogue crossmatch using the VAST Pipeline. The information below this point is about using the vast-tools query method to find sources from the pilot survey if a pipeline run is not available. A pipeline run should be used whenever possible due to the superior quality of data it generates.
Find VAST Matches Using VAST Tools
If a pipeline run isn't available you can use VAST Tools to match to the VAST data instead.
Here the same PSRCAT dataframe that was created in the pipeline section above is used.
The first step is to construct a Query to see how many sources have matches to selavy components in the VAST Pilot Survey. In the Query definition below we use the matches_only argument, which means that only those sources that have an actual match are returned. For simplicity I only search epochs 1 through 4 to ensure that the query can complete in a reasonable time, but it is possible to search the entire survey. Note that you must pre-create the output directory for the query if you intend to use it - see the sketch below.
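A minimal way to create the output directory (the name matches the output_dir used in the query below):
import os
# Create the query output directory if it does not already exist
os.makedirs('psrcat-vast-crossmatching', exist_ok=True)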
psrcat_query = Query(
coords=psrcat_skycoord,
source_names=psrcat_names,
epochs='1,2,3,4',
max_sep=1.5,
crossmatch_radius=10.0,
matches_only=True,
no_rms=True,
output_dir='psrcat-vast-crossmatching',
use_tiles=False,
)
Note: we provide the argument use_tiles=False to use the COMBINED dataset, which was the default for the pilot survey. The default for the full survey is to use the post-processed tiles data - your query settings should only differ from the default with good reason!
And run find_sources - again, a warning that this will take a little while to process.
psrcat_query.find_sources()
We can check the results attribute to see how many sources return a match.
len(psrcat_query.results)
Using the results
110 sources have returned a match in the first four epochs of the VAST survey.
We can create new skycoord and name objects ready for a new query:
# psrcat_query.results is a pandas Series indexed by source name,
# so membership checks against the index
matches_mask = [i in psrcat_query.results for i in psrcat_vast_sources['PSRJ']]
matched_names = psrcat_vast_sources['PSRJ'][matches_mask].tolist()
matched_skycoords = psrcat_skycoord[matches_mask]
Or loop through and save all the measurements for each source.
# for i in psrcat_query.results:
# i.write_measurements()
While you can explore the sources as normal, for example
my_source = psrcat_query.results['J1755-2725']
lc = my_source.plot_lightcurve();
cutout = my_source.show_png_cutout(1);
it's not recommended to produce cutouts for all sources in the notebook, as this will start to take a lot of memory and be quite slow. If you'd like to do this then please use the find_sources.py script.
VAST Tools Variability
Unlike the Pipeline, the sources returned using this method do not contain any of the calculated metrics. However, you can still perform some rudimentary variability analysis on the results if you wish.
I would recommend using the VAST Pipeline if possible for this kind of analysis, as the associations will be much better and you'll get a lot more information, but nevertheless this is an example of what you can do with the data from vast-tools.
In the code below I create a dataframe from the query results (which is a pandas series), assign it to variables_df, and define a function that returns the eta and V metrics for each source when passed through .apply(). These are then assigned to new eta and v columns in the variables_df dataframe.
import pandas as pd
def get_variable_metrics(row):
"""
Function to return the eta and v metrics using apply.
"""
return row['object'].calc_eta_and_v_metrics()
# create the variables_df dataframe, rename the column holding the objects as 'object'
variables_df = pd.DataFrame(psrcat_query.results).rename(columns={'name':'object'})
# obtain the metrics
variables_df[['eta', 'v']] = variables_df.apply(get_variable_metrics, result_type='expand', axis=1)
We can then, for example, plot the log eta distribution, making sure we choose sources that have more than 2 detections.
import numpy as np

# Select only sources that have more than 2 detections
mask = [i.detections > 2 for i in variables_df['object']]

np.log10(variables_df.eta[mask]).hist(bins=100)
plt.show()
You could then do the same for v and start to fit Gaussians to the distributions and select candidates.
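For example, a sketch of the equivalent plot for the v metric with a simple Gaussian fit overlaid (this assumes scipy is available; the variable names are illustrative only):
from scipy.stats import norm

# Drop non-finite values before fitting (v can be zero, giving -inf in log space)
log_v = np.log10(variables_df.v[mask]).replace(-np.inf, np.nan).dropna()
mu, sigma = norm.fit(log_v)

# Plot the distribution with the fitted Gaussian on top
log_v.hist(bins=100, density=True)
x = np.linspace(log_v.min(), log_v.max(), 200)
plt.plot(x, norm.pdf(x, mu, sigma))
plt.xlabel('log10(v)')
plt.show()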
Note: for large queries it is recommended to use the script version of find_sources.py to get cutouts for all results.