VAST Pipeline Architecture¶
The pipeline is essentially a Django app, and a pipeline run is launched as a Django admin command. The main structure of the pipeline is shown in this schematic:
Design Philosophy¶
We designed the pipeline to be easy to use while remaining powerful and fast. In the back-end, all data manipulation, including the association operations, is built on the familiar pandas DataFrame structure. The Python community and the research and scientific communities (astrophysicists included) know pandas well, so they should be able to understand, maintain and develop the code base.
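To illustrate what a DataFrame-based association step can look like, here is a minimal sketch. It is a toy nearest-neighbour cross-match, not the pipeline's actual association algorithm. The column names, the `associate` helper and the 10 arcsecond radius are all assumptions made for the example, and `how="cross"` requires pandas 1.2 or later.

```python
import numpy as np
import pandas as pd

# Hypothetical catalogue of known sources (positions in degrees).
sources = pd.DataFrame({
    "source_id": [1, 2],
    "ra": [10.000, 20.000],
    "dec": [-30.000, -45.000],
})

# Hypothetical new measurements extracted from a later image.
measurements = pd.DataFrame({
    "meas_id": [101, 102, 103],
    "ra": [10.001, 20.002, 50.000],
    "dec": [-30.001, -45.001, -10.000],
})


def associate(measurements, sources, radius_deg):
    """Match each measurement to its nearest source within radius_deg."""
    # Build every measurement/source pair (fine for a toy; too costly at scale).
    pairs = measurements.merge(sources, how="cross", suffixes=("_meas", "_src"))
    # Flat-sky, small-angle separation; only adequate for tiny offsets.
    d_ra = (pairs["ra_meas"] - pairs["ra_src"]) * np.cos(np.radians(pairs["dec_src"]))
    d_dec = pairs["dec_meas"] - pairs["dec_src"]
    pairs["sep_deg"] = np.hypot(d_ra, d_dec)
    # Keep the closest candidate per measurement, then apply the radius cut.
    nearest = pairs.loc[pairs.groupby("meas_id")["sep_deg"].idxmin()]
    matched = nearest[nearest["sep_deg"] <= radius_deg]
    # Left-join back so unmatched measurements keep a missing source_id.
    return measurements.merge(
        matched[["meas_id", "source_id", "sep_deg"]], on="meas_id", how="left"
    )


result = associate(measurements, sources, radius_deg=10 / 3600)
print(result)
```

Measurements with no source inside the radius come back with a missing `source_id`, which is where a real pipeline would typically create a new source.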
In the "Big Data" world, the tools commonly adopted by industry and research are Apache Hadoop and Spark. We chose Dask instead: it is similar to Spark in some ways, but it integrates well with pandas DataFrames and its syntax closely follows that of pandas. It also scales by means of clustering and integration with HPC (High Performance Computing) stacks.
The pipeline code and the web app are integrated into one code base, which keeps things simple and makes development easier through a single central repository. The user can run the pipeline via the CLI (Command Line Interface), using Django admin commands, as well as through the web app itself. This integration avoids duplicated code, especially in the declaration of the schema in the ORM (Object Relational Mapping), and adds user and permission management for the underlying data through the built-in functionality of the Django framework.
The front-end is built in plain HTML, CSS and JavaScript using a freely available Bootstrap 4 template. The developers are aware that current best practice in web development focuses mostly on single-page applications built with frameworks such as ReactJS and AngularJS. The basic web stack (HTML + CSS + JS) was chosen so that future developers do not need to learn those frameworks, only the fundamental web programming that still lies at their core.
Technology Stack¶
Back-End¶
- Astropy 4+
- Astroquery 0.4+
- Bokeh 2+
- Dask 2+
- Django 3+
- Django Rest Framework
- Rest Framework Datatables
- Django Q
- Python Social Auth - Django
- Django Crispy Forms
- Django Tagulous
- Pandas 1+
- Python 3.7+
- Pyarrow 0.17+
- Postgres 10+
- Q3C
- Vaex 3+
Front-End¶
- Aladin Lite
- Bokeh
- Bootstrap 4
- DataTables
- D3 Celestial
- jQuery
- JS9
- ParticleJS
- PrismJS
- SB Admin 2 template
Additional¶
Created: January 19, 2021