
Generating Arrow Files

This page describes how to generate the measurement arrow files for a pipeline run that was processed with the arrow file option in the configuration file turned off.

Arrow files can only be generated by the creator of the run or an administrator.

Two files are produced by the method:

File | Description
measurements.arrow | An Apache Arrow format file containing all the measurements associated with the pipeline run (see Arrow Files). Extra processing is performed when this file is created so that source ids are already in place for the measurements.
measurement_pairs.arrow | An Apache Arrow format file containing all the measurement pair metrics (see Arrow Files).

Arrow Files Available

Users can see if arrow files are present for the run of interest by checking the respective run detail page.

Arrow files available.
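For users with file system access, the same check can be scripted. The following is a minimal sketch, assuming the two arrow files are written directly into the pipeline run directory; that location and the helper names are illustrative assumptions, not part of the pipeline's API.

```python
from pathlib import Path

# File names as listed in the table of produced files.
ARROW_FILES = ("measurements.arrow", "measurement_pairs.arrow")


def arrow_files_present(run_dir):
    """Map each expected arrow file name to whether it exists in run_dir.

    ASSUMPTION: the arrow files sit directly inside the run directory.
    """
    run_dir = Path(run_dir)
    return {name: (run_dir / name).is_file() for name in ARROW_FILES}


def missing_arrow_files(run_dir):
    """Return the expected arrow file names that are absent from run_dir."""
    present = arrow_files_present(run_dir)
    return [name for name, found in present.items() if not found]
```

An empty list from `missing_arrow_files` would indicate that both files are already available and no generation request is needed.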

Admin Tip

Administrators can also generate the arrow files from the command line using the createmeasarrow command.
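As a rough illustration, the createmeasarrow command from this tip could be wrapped in a small script. This sketch assumes the command is invoked through a Django-style manage.py, takes the run name as a positional argument, and accepts an --overwrite flag; none of these details are stated on this page, so check the command's own help output before relying on them.

```python
import subprocess
import sys


def build_createmeasarrow_command(run_name, overwrite=False, manage_py="manage.py"):
    """Build the argument list for the createmeasarrow command.

    ASSUMPTIONS (verify against the command's help output): the command is
    run through manage.py, the run name is positional, and --overwrite exists.
    """
    command = [sys.executable, manage_py, "createmeasarrow"]
    if overwrite:
        command.append("--overwrite")
    command.append(run_name)
    return command


def generate_arrow_files(run_name, overwrite=False, manage_py="manage.py"):
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    command = build_createmeasarrow_command(run_name, overwrite, manage_py)
    return subprocess.run(command, check=True)
```

Building the argument list separately from running it makes it easy to print and inspect the exact command before anything is executed.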

Why Create Arrow Files?

For large pipeline runs (hundreds of images), reading the measurements means reading in hundreds of parquet files, which together can contain millions of rows. This can be slow with libraries such as pandas, and it also consumes a lot of system memory.

If the measurements are instead saved in the Apache Arrow format, libraries such as vaex can open the .arrow files in an out-of-core context, which greatly reduces the memory footprint and makes reading the file very fast. The two-epoch measurement pairs are also saved in the arrow format for the same reasons.

See Reading with vaex for further details on using vaex.
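A minimal sketch of opening the measurements file with vaex is given here. It assumes vaex is installed and that measurements.arrow is located in the run directory; the helper names and the directory layout are illustrative assumptions.

```python
from pathlib import Path


def require_arrow_file(run_dir, name="measurements.arrow"):
    """Return the path to an arrow file, or raise if it has not been generated.

    ASSUMPTION: the arrow files sit directly inside the run directory.
    """
    path = Path(run_dir) / name
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found; generate the arrow files first")
    return path


def open_measurements(run_dir):
    """Open measurements.arrow with vaex without loading it all into memory."""
    # vaex is a third-party package; it is imported here so that
    # require_arrow_file can still be used when vaex is not installed.
    import vaex

    return vaex.open(str(require_arrow_file(run_dir)))
```

For example, `df = open_measurements("/path/to/run")` followed by `len(df)` would report the number of measurement rows, with vaex reading from the file on demand rather than holding the whole table in memory.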

Step-by-step Guide

1. Navigate to the Run Detail Page

Navigate to the detail page of the run you wish to generate arrow files for.

Pipeline run detail page.

2. Select the Generate Arrow Files Option

Click the Generate Arrow Files option at the top-right of the page.

Generate arrow button.

This will open the generate arrow files modal.

Generate arrow modal.

3. Submit Generate Arrow Files Request

It is possible to overwrite existing arrow files by toggling the Overwrite Current Files option.

When ready, click the Generate Arrow Files button on the modal to submit the generate request. A notification will show to indicate whether the submission was successful.

Generate arrow files notification.

4. Refresh and Check the Generate Arrow Files Log File

Progress can be checked in the Generate Arrow Files Log File, which can be found on the run detail page. The log is not refreshed automatically, so the page must be refreshed manually to see new entries.

Once generation has completed, the arrow files will be available for use.

Generate arrow files log file.