Welcome to the XPIPE weak lensing toolset

xpipe is a Python package that automates the measurement and calibration of weak lensing shear and mass profiles in wide-area lensing surveys such as DES. Features include:

  • Automated pipeline for tangential shear and \Delta\Sigma profiles

  • Object-oriented API for lensing related measurements

  • Python wrapper for Erin Sheldon’s xshear.

Installation

The package itself can be obtained from GitHub:

git clone https://github.com/vargatn/xpipe.git

There are two modes of use, pipeline and API; see the sections below for instructions.

Requirements and Dependencies

The code is written in Python 2.7, but should also run on Python 3.5+.

Additional required packages:

  • Anaconda: numpy, scipy, pandas, astropy

  • pip: fitsio

  • manual install: kmeans_radec

In addition, xshear requires a C99-compliant compiler.

Pipeline mode

First, build the pipeline by executing

make pipeline

in the main folder of the repository. This performs the following steps:

  • Sets up the main xpipe package

  • Pulls and builds the submodule xshear. The executable is located at:

    [XPIPE_FOLDER]/submodules/xshear/bin/xshear
    
  • Writes a logfile to the user path. This is necessary for the package to find the absolute location of the config files.

After this, the package can simply be imported as

import xpipe

and there are some pre-defined scripts located at

[XPIPE_FOLDER]/bin/redpipe

A detailed description of what they do is given in the Weak lensing pipeline guide section.

When using these scripts, be sure to note the config files located within

[XPIPE_FOLDER]/settings/

namely default_params.yml and default_inputs.yml, as these define what happens when you execute them. You can define your own settings in params.yml and inputs.yml, which are automatically looked for when using the pipeline (how to specify these settings is explained in the Config files explained section).

API mode

The API mode can be accessed by installing the Python package:

python setup.py install

Then simply import it in a python session:

import xpipe

Note that this only gives access to the Python part of the code: you have to compile xshear manually and keep track of the file paths yourself.

Quickstart

DES Y3 style data

lorem ipsum

DES Y3 style data (legacy)

lorem ipsum

Weak lensing pipeline guide

This is a brief introduction to using this package in pipeline mode. Note that this tutorial describes a simple scenario; if you encounter problems or unexpected behaviour, inspect the source code or contact us directly.

All of these scripts have a set of dedicated runtime flags, e.g. to skip processing random points.

Measuring the weak lensing data vector

Some pre-defined scripts are located in bin/redpipe/:

  1. Define the parameters and inputs as described in the Config files explained section.

  2. Execute mkbins.py. Some flags are available; e.g. if you don’t have any random points, use the --norands flag to skip them.

    This script loads the input files, splits them into parameter bins and JK patches, and writes them to disk in a format understood by XSHEAR.

    The input files are written to [custom_data_path]/xshear_in/[tag]/

  3. Run XSHEAR on the created input files. Depending on the choice of source galaxy catalog, use either xshear.py for normal runs or xshear_metacal.py for METACALIBRATION.

    Note that this step might take a very long time; consider running it on a dedicated computing cluster.

    This step supports OpenMP-style parallelization, assigning the calculation of separate K-means regions to multiple cores. As a fallback, it also supports splitting the work into multiple individual tasks via the flags --nchunk (number of chunks) and --ichunk (ID of the chunk to run).

    The output files are written to [custom_data_path]/xshear_out/[tag]/

  4. Extract the lensing profile from the xshear results via postprocess.py.

    By default the extracted quantity is \Delta\Sigma, but \gamma_t can also be extracted by re-defining attributes of StackedProfileContainer.

    The results are written to [custom_data_path]/results/[tag]/

    The resulting lensing profiles are written to files ending in _profile.dat, and the corresponding Jackknife covariances to files ending in _cov.dat.

    If random points are also defined, there are three types of output files: lens, randoms and subtracted (lens minus randoms).
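For orientation, the two quantities are related through the critical surface density. The relation below uses standard weak lensing notation (D_l, D_s and D_ls are the angular diameter distances to the lens, to the source, and between the two) and is not taken from xpipe; xshear's exact conventions, such as physical versus comoving units, may differ:

```latex
\Delta\Sigma(R) = \Sigma_{\rm crit}\,\gamma_t(R),
\qquad
\Sigma_{\rm crit} = \frac{c^2}{4\pi G}\,\frac{D_s}{D_l\,D_{ls}}
```

In practice \Sigma_{\rm crit} differs from source to source, so the stacked \Delta\Sigma is a weighted average over lens-source pairs rather than a single product.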

Boost factor estimates from P(Z) decomposition

Some pre-defined scripts are located in bin/tools/photoz/, but note that some of the later steps can be controlled when run in an interactive session.

  1. Extract \sum \; w \; p(z) and \sum \; w for each K-means region via extract_full_pwsum.py.

    Note that this step might take a very long time; consider running it on a dedicated computing cluster.

    This step supports OpenMP-style parallelization, assigning the calculation of separate K-means regions to multiple cores. As a fallback, it also supports splitting the work into multiple individual tasks via the flags --nchunk (number of chunks) and --ichunk (ID of the chunk to run).

    Furthermore, the --ibin flag restricts the calculation to a single parameter bin.

    The output files are written to [custom_data_path]/xshear_out/[tag]/ as npz files.

  2. Combine the K-means regions into a PDFContainer with Jackknife regions using extract_full_PDF.py (note the distinction: a K-means region is a single patch, whereas a Jackknife region consists of all patches except one).

  3. Perform the P(z) decomposition as outlined in mkboost.py.

    In practice this step requires setting the parameter bounds for the fit, so it is best run in an interactive session. The script is only intended as an example of how the decomposition can be performed.
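The leave-one-out combination in step 2 can be sketched as follows. This is a minimal plain-Python illustration, not xpipe's PDFContainer: `jackknife_pdfs` is a hypothetical helper, and it assumes the per-region outputs of step 1 are available as lists of \sum w p(z) values (one per redshift bin) and \sum w values. Because both sums are additive, the weighted mean P(z) of a Jackknife region follows from subtracting the excluded K-means region from the totals:

```python
def jackknife_pdfs(pwsums, wsums):
    """Hypothetical sketch: leave-one-out weighted mean P(z) per Jackknife region.

    pwsums[r]: sum over sources of w * p(z) in K-means region r (one value per z bin)
    wsums[r]:  sum over sources of w in the same region
    Jackknife region k combines every K-means region except region k.
    """
    nz = len(pwsums[0])
    total_pw = [sum(pw[j] for pw in pwsums) for j in range(nz)]
    total_w = sum(wsums)
    pdfs = []
    for pw, w in zip(pwsums, wsums):
        norm = total_w - w  # total weight of all the other regions
        pdfs.append([(total_pw[j] - pw[j]) / norm for j in range(nz)])
    return pdfs


# three K-means regions, two redshift bins
pwsums = [[1.0, 3.0], [2.0, 2.0], [0.0, 4.0]]
wsums = [4.0, 4.0, 4.0]
pdfs = jackknife_pdfs(pwsums, wsums)
print(pdfs)  # [[0.25, 0.75], [0.125, 0.875], [0.375, 0.625]]
```

Working with sums rather than already-normalized PDFs keeps the combination exact, since each Jackknife estimate is just a ratio of two totals.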

Config files explained

The aim of these config files is to automate running the weak lensing measurements and post-processing. The module xpipe.paths automatically tries to read the configs upon import, but you can also specify them later.

In pipeline mode, xpipe keeps track of reading lens catalogs, splitting them up into parameter bins, measuring the lensing signal, and estimating covariances and contaminations automatically. To do this, however, the details of these tasks must first be specified in the config files below:

  • params.yml defines the measurement parameters

  • inputs.yml defines the available data files, and short aliases for them

These files do not exist yet when you first clone the repository; however, default_params.yml and default_inputs.yml are provided, and you should use them as a reference. The defaults are set up such that, once you create params.yml and inputs.yml, these are automatically looked for and read.

Config files are written in YAML and are read as dictionaries, each entry consisting of a key and a corresponding value (note that this includes nested dictionaries).

Note that in YAML one should use null instead of None.

Load order

  1. default_params.yml

  2. looks for params.yml

  3. tries to read the file given by custom_params_file in params.yml

From this point the load is recursive, i.e. parameter files are loaded as long as a valid custom_params_file is defined in the last loaded config. Each new config file only updates the settings, such that keys which are not present in later files keep their latest value.
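A minimal sketch of this load order, assuming the YAML files have already been parsed into plain dictionaries and that each file is merged with a shallow dict.update. The helper `load_params_chain` and the file params_dev.yml are hypothetical, not part of xpipe. Since the default value custom_params_file: params.yml points back at the file defining it, the sketch stops as soon as the chain refers to an already-loaded file:

```python
def load_params_chain(configs, first="default_params.yml", second="params.yml"):
    """Hypothetical sketch of the documented load order (not xpipe's code).

    configs maps a file name to its already-parsed YAML content (a dict).
    """
    params = dict(configs[first])  # 1. start from the defaults
    loaded = [first]
    fname = second                 # 2. then look for params.yml
    # 3. keep following custom_params_file until it is missing or repeats
    while fname in configs and fname not in loaded:
        cfg = configs[fname]
        params.update(cfg)         # only keys present in this file are overridden
        loaded.append(fname)
        fname = cfg.get("custom_params_file")
    return params, loaded


configs = {
    "default_params.yml": {"tag": "default", "mode": "full", "nprocess": 2,
                           "custom_params_file": "params.yml"},
    "params.yml": {"tag": "mytag", "custom_params_file": "params_dev.yml"},
    "params_dev.yml": {"mode": "dev", "custom_params_file": "params_dev.yml"},
}
params, loaded = load_params_chain(configs)
print(loaded)  # ['default_params.yml', 'params.yml', 'params_dev.yml']
print(params["tag"], params["mode"], params["nprocess"])  # mytag dev 2
```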

The parameters defined here are loaded into the dictionary xpipe.paths.params.

params.yml

Key reference

  • custom_params_file: params.yml

    If you want to use another parameter file, specify it here. It must be in the same directory.

  • custom_data_path: False

    Absolute path to the data directory of the pipeline. If False, it defaults to project_path + /data.

  • mode: full

    The pipeline supports two modes: full and dev. This is primarily used in setting up the input files for the measurement, e.g. you can define two binning schemes: a complex one for the full run, and a simpler, quicker one for dev.

  • tag: default

    Prefix for all files (with NO trailing “_”). This is also the name of the directory where input and output files are written.

  • shear_style: reduced

    Format of the source galaxy catalog. Available formats are reduced, lensfit and metacal.

  • cat_to_use: default

    Alias for the lens catalog to be used (in this case the default). Aliases are defined in inputs.yml

  • shear_to_use: default

    Alias for the source catalog to be used (in this case the default). Aliases are defined in inputs.yml

  • param_bins_full

    Parameter bins defined for mode: full, e.g.:

    param_bins_full:
        q0_edges: [0.2, 0.35, 0.5, 0.65]
        q1_edges: [5., 10., 14., 20., 30., 45., 60., 999]
    

    q0 and q1 refer to the zero-th and first quantities (in this order) you want to split your lens catalog by. For defining what these relate to see lenskey and randkey. In the above example q0 is redshift, and q1 is optical richness.

    In general you can define an arbitrary number of quantities keeping the notation that the binning edges for quantity n are written as q[n]_edges.

  • param_bins_dev

    Parameter bins defined for mode: dev, e.g.:

    param_bins_dev:
        q0_edges: [0.2, 0.35]
        q1_edges: [45, 60]
    

    q0 and q1 refer to the zero-th and first quantities (in this order) you want to split your lens catalog by. For defining what these relate to see lenskey and randkey. In the above example q0 is redshift, and q1 is optical richness.

    In general you can define an arbitrary number of quantities keeping the notation that the binning edges for quantity n are written as q[n]_edges.

  • lenskey

    Aliases for the columns of the lens data table (assuming fits-like record table):

    lenskey:
      id: MEM_MATCH_ID
      ra: RA
      dec: DEC
      z: Z_LAMBDA
      q0: Z_LAMBDA
      q1: LAMBDA_CHISQ
    

    q0 and q1 refer to the zero-th and first quantities (in this order) you want to split your lens catalog by (see param_bins_*). In general you can define an arbitrary number of quantities, keeping the notation that the alias for quantity n is written as q[n]. In the above example q0 is redshift, and q1 is optical richness.

  • randkey

    Aliases for the columns of the random points data table (assuming fits-like record table):

    randkey:
      q0: ZTRUE
      q1: AVG_LAMBDAOUT
      ra: RA
      dec: DEC
      z: ZTRUE
      w: WEIGHT
    

    q0 and q1 refer to the zero-th and first quantities (in this order) you want to split your random points catalog by. In general you can define an arbitrary number of quantities, keeping the notation that the alias for quantity n is written as q[n]. In the above example q0 is redshift, and q1 is optical richness.

    Note that for random points you have to specify the same quantities as for the lens catalog.

  • nprocess: 2

    Maximum number of processes (CPUs) to use at the same time (OpenMP-style parallelization).

  • njk_max: 100

    Maximum number of Jackknife regions to use in resampling. The actual number is min(n_lens, njk_max).

  • nrandoms

    Number of random points to use:

    nrandoms:
      full: 50000
      dev: 1000
    
  • seeds

    Random seeds for choosing the random points (random_seed) and for generating rotated shear catalogs (shear_seed_master):

    seeds:
      random_seed: 5
      shear_seed_master: 10
    
  • cosmo_params

    Cosmology parameters defined as:

    cosmo_params:
      H0: 70.
      Om0: 0.3
    
  • radial_bins

    Logarithmic (base 10) radial bins from rmin to rmax:

    radial_bins:
      nbin: 15
      rmin: 0.0323
      rmax: 30.0
      units: Mpc
    

    Available units: Mpc, comoving_mpc or arcmin

  • weight_style: "optimal"

    Source weight style in the xshear lensing measurement. Use optimal when estimating \Delta\Sigma and uniform when measuring \gamma.

  • pairlog

    Specifies the number of source-lens pairs to be saved, and the radial range they are saved for:

    pairlog:
     pairlog_rmin: 0
     pairlog_rmax: 0
     pairlog_nmax: 0
    

    Note that the pair limit applies to each call of xshear separately. That is, if you split the lenses into Jackknife regions, the limit applies to a single region.

  • lens_prefix: y1clust

    Prefix for lens files

  • rand_prefix: y1rand

    Prefix for random points files

  • subtr_prefix: y1subtr

    Prefix for the subtracted (lens minus random points) files

  • fields_to_use: ['spt', 's82']

    List of names of observational fields to use (as defined below)

  • fields

    Definition of observational field boundaries:

    fields:
      spt:
        dec_top: -30.
        dec_bottom: -60.
        ra_left: 0.
        ra_right: 360.
      s82:
        dec_top: 10.
        dec_bottom: -10.
        ra_left: 300.
        ra_right: 10.
      d04:
        dec_top: 10.
        dec_bottom: -30.
        ra_left: 10.
        ra_right: 250.
    

    These can be approximate; the only requirement is that they divide the lens dataset into the appropriate chunks.

  • pzpars

    Parameters for the boost factor extraction:

    pzpars:
      hist:
        nbin: 15
        zmin: 0.0
        zmax: 3.0
        tag: "zhist"
      full:
        tag: "zpdf"
      boost:
        rbmin: 3
        rbmax: 13
        refbin: 14
    

    There are two modes: histogram (hist), which relies on Monte Carlo samples of redshifts and is less robust, and full, which uses the full P(z) of each source galaxy.

    • tag defines the name appended to the corresponding files.

    • boost defines the radial range for the boost estimation in radial bins

  • pdf_paths: null

    Wildcard (glob-style) pattern matching the absolute paths of the BPZ output files containing the full redshift PDF (e.g. /home/data/*.h5).

    NOTE: This is only required for estimating the boost factors, and can safely be left null in a simple lensing run.

inputs.yml

This config file lists the available data products. Currently all products are listed under the local key, indicating that they are found on disk (as opposed to being downloaded from a network location).

The two major sub-headings are:

  • shearcat

    Lists the available xshear-style source catalog files located within:

    [custom_data_path]/shearcat/
    

    where [custom_data_path] is the absolute path to the data folder specified by the corresponding key in params.yml

    Each input file has its key as an alias for the file name, so any key defined here is a valid value of shear_to_use in params.yml, e.g.:

    shearcat:
      default: default.dat
      im3shape: im3shape_shear_catalog.dat
      metacal: metacal_shear_catalog.dat
    

    These input files should be written in ASCII

  • lenscat

    Lists the available lens catalog files located within:

    [custom_data_path]/lenscat/
    

    where [custom_data_path] is the absolute path to the data folder specified by the corresponding key in params.yml

    Each dataset has its key as an alias, which you can use as a valid value of cat_to_use in params.yml. In addition, each dataset is implicitly assumed to consist of a lens catalog and a corresponding catalog of random points, such that each key has two sub-keys: lens and rand. Both of these files should be in FITS format:

    lenscat:
      y1clust:
          lens: des_y1_lens_catalog.fits
          rand: des_y1_rand_catalog.fits
      svclust:
          lens: des_sv_lens_catalog.fits
          rand: des_sv_rand_catalog.fits
      testclust:
          lens: test_catalog.fits
          rand: null
    

    If there are no random points available for the dataset you are using, it is safe to leave the rand field empty (null, as for testclust above), but in this case make sure you also use the --norands flag when executing the pipeline scripts.

    In case the input catalog is defined in multiple files (for example when the parameter bins are not trivial to define), a list of filenames can be defined for lens and rand:

    lenscat:
        pre_binned_data:
            lens: [
                [des_y1_lens_catalog_bin-0-0.fits, des_y1_lens_catalog_bin-0-1.fits],
                [des_y1_lens_catalog_bin-1-0.fits, des_y1_lens_catalog_bin-1-1.fits],
            ]
    

    Note: the listed files are assumed to correspond to separate parameter selections, so this mode cannot be combined with the definition of parameter bins in params.yml.
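The following hypothetical helper, `resolve_inputs`, sketches how the aliases chosen in params.yml (shear_to_use, cat_to_use) map onto the files listed in inputs.yml. It assumes inputs.yml has already been parsed into a dict with shearcat and lenscat nested under the local key; it is an illustration, not part of xpipe:

```python
import os


def resolve_inputs(params, inputs, data_path):
    """Hypothetical sketch: map the params.yml aliases onto file paths.

    inputs is the parsed content of inputs.yml; data_path plays the role of
    [custom_data_path]. Returns (shear_file, lens_file(s), rand_file(s) or None).
    """
    local = inputs["local"]
    shear_file = os.path.join(data_path, "shearcat",
                              local["shearcat"][params["shear_to_use"]])
    entry = local["lenscat"][params["cat_to_use"]]

    def to_path(names):
        # a single file name, a (possibly nested) list of names, or null/None
        if names is None:
            return None
        if isinstance(names, list):
            return [to_path(name) for name in names]
        return os.path.join(data_path, "lenscat", names)

    return shear_file, to_path(entry["lens"]), to_path(entry.get("rand"))


inputs = {"local": {
    "shearcat": {"default": "default.dat",
                 "metacal": "metacal_shear_catalog.dat"},
    "lenscat": {"y1clust": {"lens": "des_y1_lens_catalog.fits",
                            "rand": "des_y1_rand_catalog.fits"},
                "testclust": {"lens": "test_catalog.fits", "rand": None}},
}}
params = {"shear_to_use": "metacal", "cat_to_use": "testclust"}
shear_file, lens_file, rand_file = resolve_inputs(params, inputs, "/data/xpipe")
print(shear_file)  # /data/xpipe/shearcat/metacal_shear_catalog.dat (on POSIX)
print(rand_file)   # None, so the pipeline scripts need --norands
```

The recursive `to_path` also covers the pre-binned case, where lens is a nested list of file names.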

API Reference

DES Y3 wrappers

SOMPZ

Combine shear profiles and apply calibrations

xpipe.xhandle.shearops.AutoCalibrateProfile(...)

Note: for random points, the weights must be taken from the base input dataset.

Input file manipulation

The bulk of the action is performed by the main writer class:

xpipe.xhandle.parbins.XIO(lenses[, randoms, ...])

XSHEAR style input file creator

Additional helper functions are defined below.

Specify observational fields for the lens catalog:

xpipe.xhandle.parbins.get_file_lists([...])

Return lists of input files

xpipe.xhandle.parbins.field_cut(ra, dec, borders)

Applies RA, DEC cut based on DES field boundaries

xpipe.xhandle.parbins.get_fields_auto([params])

Extracts field boundaries from project params dictionary
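A minimal sketch of an RA/DEC box cut using the fields entries from params.yml. In the example config, s82 has ra_left: 300 and ra_right: 10, which only makes sense if the box wraps around RA = 0; the hypothetical helpers `in_field` and `assign_fields` below assume that interpretation, which may not match how field_cut treats it:

```python
def in_field(ra, dec, field):
    """Hypothetical sketch of an RA/DEC box cut (not xpipe's field_cut).

    field uses the keys of the `fields` entries in params.yml. If ra_left is
    larger than ra_right, the box is assumed to wrap around RA = 0.
    """
    if not field["dec_bottom"] <= dec <= field["dec_top"]:
        return False
    left, right = field["ra_left"], field["ra_right"]
    if left <= right:
        return left <= ra <= right
    return ra >= left or ra <= right  # e.g. 300 -> 360 and 0 -> 10 for s82


def assign_fields(coords, fields, fields_to_use):
    """Name of the first field in fields_to_use containing each (ra, dec), else None."""
    names = []
    for ra, dec in coords:
        matches = [name for name in fields_to_use if in_field(ra, dec, fields[name])]
        names.append(matches[0] if matches else None)
    return names


fields = {
    "spt": {"dec_top": -30., "dec_bottom": -60., "ra_left": 0., "ra_right": 360.},
    "s82": {"dec_top": 10., "dec_bottom": -10., "ra_left": 300., "ra_right": 10.},
}
coords = [(350.0, 0.0), (5.0, 1.0), (50.0, -45.0), (150.0, 0.0)]
print(assign_fields(coords, fields, ["spt", "s82"]))  # ['s82', 's82', 'spt', None]
```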

Define K-means and Jackknife regions on the sphere:

xpipe.xhandle.parbins.assign_kmeans_labels(...)

Defines 2D patches on the sky via spherical k-means

xpipe.xhandle.parbins.assign_jk_labels(ra, ...)

Assigns a Jackknife (JK) label to the points based on the passed centers

xpipe.xhandle.parbins.extract_jk_labels(labels)

Extracts JK-labels from k-means label array
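One common way to assign Jackknife labels from a set of patch centers is to give each point the label of its nearest center on the sphere. The sketch below does this with the haversine great-circle distance; `nearest_center_labels` is a hypothetical stand-in and is not claimed to reproduce assign_jk_labels exactly:

```python
import math


def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle distance in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def nearest_center_labels(points, centers):
    """Hypothetical sketch: label each (ra, dec) point with its closest center."""
    labels = []
    for ra, dec in points:
        dists = [angular_separation(ra, dec, cra, cdec) for cra, cdec in centers]
        labels.append(dists.index(min(dists)))
    return labels


centers = [(10.0, -40.0), (60.0, -40.0), (350.0, 0.0)]
points = [(12.0, -38.0), (58.0, -45.0), (2.0, 1.0), (359.0, -5.0)]
labels = nearest_center_labels(points, centers)
print(labels)  # [0, 1, 2, 2]
```

Using a spherical distance rather than a flat RA/DEC distance keeps points near RA = 0 (such as (2.0, 1.0) above) attached to the center on the other side of the wrap.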

Load and prepare lens and random point catalogs:

xpipe.xhandle.parbins.load_lenscat([params, ...])

Loads lens catalog from fits file

xpipe.xhandle.parbins.prepare_lenses([...])

Loads lens data and defines sub-selections for different parameter bins

xpipe.xhandle.parbins.load_randcat([params, ...])

Loads random point catalog from fits file

xpipe.xhandle.parbins.prepare_random([...])

Loads random points and defines sub-selections for different parameter bins


XSHEAR wrapper

metacal file tags

In addition, the metacalibration tags are defined in xpipe.xhandle.xwrap.sheared_tags:

sheared_tags = ["_1p", "_1m", "_2p", "_2m"]

xshear config file writer

The main writer functions:

xpipe.xhandle.xwrap.write_xconf(fname[, pairs])

Writes simple XSHEAR config file based on paths.params

xpipe.xhandle.xwrap.write_custom_xconf(fname)

Writes custom XSHEAR config file

Additional helper functions:

xpipe.xhandle.xwrap.get_main_source_settings([...])

Load settings for unsheared METACAL run with pairlogging

xpipe.xhandle.xwrap.get_main_source_settings_nopairs()

Load settings for unsheared METACAL run without pair logging

xpipe.xhandle.xwrap.addlines(cfg, odict)

Appends lines to config file

xpipe.xhandle.xwrap.get_pairlog([params])

Returns the source-lens pair logging config part of an XSHEAR config file

xpipe.xhandle.xwrap.get_redges([params])

Returns the radial binning config part of an XSHEAR config file

xpipe.xhandle.xwrap.get_shear([params])

Returns the shear config part of an XSHEAR config file

xpipe.xhandle.xwrap.get_head([params])

Returns the cosmology config part of an XSHEAR config file

xpipe.xhandle.xwrap.get_metanames(fnames)

Calculates METACAL-style sheared file names

Running xshear

xpipe.xhandle.xwrap.create_infodict(flist[, ...])

Creates configuration dictionary which can be passed to multiprocessing map_async()

xpipe.xhandle.xwrap.call_xshear(infodict)

Calls xshear in a single process

xpipe.xhandle.xwrap.call_chunks(chunk)

Executes serial calculation for each chunk (simple for loop)

xpipe.xhandle.xwrap.multi_xrun(infodicts[, ...])

OpenMP style parallelization for xshear tasks
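The --nchunk/--ichunk fallback amounts to splitting the list of tasks into roughly equal chunks and running only one of them, serially. A minimal sketch, where `partition_tasks` and `run_chunk` are hypothetical helpers that mirror the described behaviour of partition and call_chunks but are not their actual code:

```python
def partition_tasks(tasks, nchunk):
    """Hypothetical sketch: split tasks into nchunk contiguous, roughly equal chunks."""
    size, extra = divmod(len(tasks), nchunk)
    chunks, start = [], 0
    for i in range(nchunk):
        stop = start + size + (1 if i < extra else 0)  # first `extra` chunks get one more
        chunks.append(tasks[start:stop])
        start = stop
    return chunks


def run_chunk(tasks, nchunk, ichunk, func):
    """Run only chunk number ichunk, serially, like a single --ichunk job."""
    return [func(task) for task in partition_tasks(tasks, nchunk)[ichunk]]


tasks = ["patch_%02d" % i for i in range(10)]  # e.g. one task per K-means region
print([len(c) for c in partition_tasks(tasks, 3)])  # [4, 3, 3]
print(run_chunk(tasks, 3, 1, str.upper))  # ['PATCH_04', 'PATCH_05', 'PATCH_06']
```

On a single machine the chunks could instead be mapped onto worker processes, for example with multiprocessing.Pool.map_async, which is the mechanism the create_infodict description refers to.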

Random rotations of the source catalog

xpipe.xhandle.xwrap.single_rotate(flist, ...)

Runs a single rotation of the source catalog with METACAL selection responses

xpipe.xhandle.xwrap.serial_rotate(flist[, ...])

Performs the random rotations serially and saves them to file

xpipe.xhandle.xwrap.chunkwise_rotate(flist)

Performs the random rotations and saves them to file

The catalog rotator object

xpipe.xhandle.xwrap.CatRotator(fname[, ...])

Loads shear catalog, and saves a randomly rotated version

Additional functions:

xpipe.xhandle.xwrap.get_rot_seeds(nrot, ...)

Randomly generates seeds for the random rotations using the master seed

xpipe.xhandle.xwrap.rot2d(e1, e2, alpha)

2D counterclockwise rotation matrix
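The idea behind the rotation helpers can be sketched in plain Python: derive reproducible per-rotation seeds from a master seed, then rotate every source's (e1, e2) by an independent uniform random angle. `rotate_ellipticity` applies a 2D counterclockwise rotation as described for rot2d; `rotation_seeds` and `rotate_catalog` are hypothetical helpers, and the seed derivation via random.Random is an assumption, not necessarily what get_rot_seeds does:

```python
import math
import random


def rotate_ellipticity(e1, e2, alpha):
    """Rotate the (e1, e2) vector counterclockwise by alpha (radians).

    Ellipticity is a spin-2 quantity, so this corresponds to rotating the
    galaxy image itself by alpha / 2.
    """
    cos_a, sin_a = math.cos(alpha), math.sin(alpha)
    return cos_a * e1 - sin_a * e2, sin_a * e1 + cos_a * e2


def rotation_seeds(nrot, master_seed):
    """Hypothetical sketch: derive one reproducible seed per rotation."""
    rng = random.Random(master_seed)
    return [rng.randrange(2 ** 31) for _ in range(nrot)]


def rotate_catalog(e1s, e2s, seed):
    """Give every source an independent, uniformly random rotation angle."""
    rng = random.Random(seed)
    return [rotate_ellipticity(e1, e2, rng.uniform(0.0, 2.0 * math.pi))
            for e1, e2 in zip(e1s, e2s)]


seeds = rotation_seeds(3, master_seed=10)  # cf. shear_seed_master: 10
rotated = rotate_catalog([0.1, -0.2], [0.05, 0.3], seeds[0])
e1r, e2r = rotate_ellipticity(0.3, 0.0, math.pi / 2)  # approximately (0.0, 0.3)
```

A random rotation removes any coherent tangential alignment while preserving each source's ellipticity modulus, which is why rotated catalogs are useful for shape-noise estimates. The sketch ignores the METACAL selection responses handled by single_rotate.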


Postprocessing XSHEAR output

High level wrapper for postprocessing single parameter bins:

xpipe.xhandle.shearops.process_profile(fnames)

Extracts StackedProfileContainer from xshear output

Which wraps the main container class, responsible for most of the postprocessing:

xpipe.xhandle.shearops.StackedProfileContainer(...)

Object Oriented interface for stacked shear and \Delta\Sigma calculated via xshear

Some other useful functions

Extract area-weighted radial bin centers for the lensing measurement:

xpipe.xhandle.shearops.redges(rmin, rmax, nbin)

Calculates nominal edges and centers for logarithmic radial bins (base-10 logarithm)
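A minimal sketch of base-10 logarithmic radial bins together with two possible definitions of the bin center: the log-space midpoint and the area-weighted mean radius of the annulus, (2/3)(r1^3 - r0^3)/(r1^2 - r0^2). `log_radial_bins` is a hypothetical helper; whether redges returns exactly these quantities is not guaranteed:

```python
import math


def log_radial_bins(rmin, rmax, nbin):
    """Hypothetical sketch: base-10 logarithmic edges plus two kinds of centers."""
    lmin, lmax = math.log10(rmin), math.log10(rmax)
    edges = [10 ** (lmin + (lmax - lmin) * i / nbin) for i in range(nbin + 1)]
    pairs = list(zip(edges[:-1], edges[1:]))
    # nominal centers: midpoints in log10(r), i.e. geometric means of the edges
    log_centers = [math.sqrt(r0 * r1) for r0, r1 in pairs]
    # area-weighted centers: mean radius over the annulus between r0 and r1
    area_centers = [2.0 / 3.0 * (r1 ** 3 - r0 ** 3) / (r1 ** 2 - r0 ** 2)
                    for r0, r1 in pairs]
    return edges, log_centers, area_centers


edges, log_centers, area_centers = log_radial_bins(1.0, 100.0, 2)
print(edges)  # [1.0, 10.0, 100.0]
```

With the default radial_bins settings the call would be log_radial_bins(0.0323, 30.0, 15). Because the annulus area grows with r, the area-weighted center always lies above the log-space midpoint, noticeably so for wide bins.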

Jackknife covariance between different parameter bins:

xpipe.xhandle.shearops.stacked_pcov(plist)

Calculates the covariance between a list of profiles
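The standard delete-one Jackknife covariance, C_ij = (N-1)/N \sum_k (x_ki - mean_i)(x_kj - mean_j), can be sketched in plain Python. Covariance between parameter bins follows from concatenating the leave-one-out vectors of several profiles patch by patch. This is an illustration of the technique, not the code of stacked_pcov, whose normalization and input format may differ:

```python
def jackknife_covariance(samples):
    """Sketch of a delete-one Jackknife covariance (not xpipe's stacked_pcov).

    samples[k] is the data vector measured with patch k left out. To get the
    covariance between several profiles, concatenate their vectors patch by
    patch first (using the same patch ordering for every profile).
    """
    n = len(samples)
    dim = len(samples[0])
    mean = [sum(s[i] for s in samples) / float(n) for i in range(dim)]
    cov = [[0.0] * dim for _ in range(dim)]
    for s in samples:
        d = [s[i] - mean[i] for i in range(dim)]
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += d[i] * d[j]
    factor = (n - 1.0) / n  # Jackknife prefactor
    return [[factor * c for c in row] for row in cov]


# two profiles with two radial bins each, three Jackknife samples
profile_a = [[1.0, 2.0], [1.2, 2.1], [0.8, 1.9]]
profile_b = [[5.0, 3.0], [5.3, 3.2], [4.7, 2.8]]
stacked = [a + b for a, b in zip(profile_a, profile_b)]
cov = jackknife_covariance(stacked)  # 4 x 4; off-diagonal 2 x 2 blocks = cross-covariance
```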

XSHEAR results I/O

The main reader function

xpipe.xhandle.ioshear.xread(xdata, **kwargs)

Reader for xshear output if style is set as both

Additional helpers for I/O:

xpipe.xhandle.ioshear.read_single_bin(fname)

Reads and interprets xshear output from a single file

xpipe.xhandle.ioshear.read_multiple_bin(fnames)

Reads and interprets xshear output from many smaller files

xpipe.xhandle.ioshear.xpatches(raw_chunks)

Processes many smaller xshear output files via xread

xpipe.xhandle.ioshear.read_raw(fname)

Reads xshear output from file

xpipe.xhandle.ioshear.read_multiple_raw(fnames)

Reads xshear output from multiple files, and concatenates them

xpipe.xhandle.ioshear.read_sheared_raw(fname)

Reads raw xshear output from metacal sheared runs

xpipe.xhandle.ioshear.read_multiple_sheared_raw(fnames)

Reads raw xshear output from metacal sheared runs (multiple files)

xpipe.xhandle.ioshear.makecat(fname, mid, ...)

Write an xshear style lens catalog to file

xpipe.xhandle.ioshear.read_lens_pos(fnames)

Reads positions based on the list of filenames passed


Cluster member contamination estimates

Tool to package information about what to do:

Calculate average photo-z P(z) PDF:

Additional useful tools:

P(z) and Boost container object

The Main Container Object is:

The JK-region collation is performed by:

Classes for the P(z) decomposition:

The JK-region collation is performed by:

Additional useful tools:


Useful tools

tools.catalogs

xpipe.tools.catalogs.to_pandas(recarr)

Converts potentially nested record array (such as a FITS Table) into Pandas DataFrame

Additional functions:

xpipe.tools.catalogs.flat_type(recarr)

Assigns the dtypes to the flattened array

xpipe.tools.catalogs.flat_copy(recarr)

Copies the record array into a new recarray which has only 1-D columns

tools.selector

Manipulate arrays and parameter distributions

xpipe.tools.selector.selector(pps, limits)

Applies selection to array based on the passed parameter limits

xpipe.tools.selector.matchdd(pars, refpars)

Matches two D-dimensional distributions by reweighting individual objects

xpipe.tools.selector.partition(lst, n)

Divides a list into N roughly equal chunks

xpipe.tools.selector.safedivide(x, y[, eps])

Calculates x / y for arrays, setting result to zero if x ~ 0 OR y ~ 0
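Plain-Python sketches of the behaviour described for selector and safedivide. The names `select` and `safe_divide`, the (lo, hi) layout of the limits, the half-open interval convention and the default eps are all assumptions for illustration; the docstrings of the real functions define their exact conventions:

```python
def select(values, limits):
    """Hypothetical sketch: indices of rows with lo <= value < hi for every parameter.

    values: list of parameter tuples, one per object
    limits: list of (lo, hi) pairs, one per parameter
    """
    return [k for k, row in enumerate(values)
            if all(lo <= v < hi for v, (lo, hi) in zip(row, limits))]


def safe_divide(x, y, eps=1e-8):
    """Elementwise x / y, returning 0.0 wherever |x| < eps or |y| < eps."""
    return [0.0 if abs(a) < eps or abs(b) < eps else a / b for a, b in zip(x, y)]


pps = [(0.25, 50.0), (0.40, 25.0), (0.30, 70.0)]  # (redshift, richness)
print(select(pps, [(0.2, 0.35), (45, 60)]))  # [0]
print(safe_divide([1.0, 0.0, 2.0], [2.0, 5.0, 0.0]))  # [0.5, 0.0, 0.0]
```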

# TODO Visualization

example function

xpipe.xhandle.shearops.olivers_mock_function(a, b, c)

This is a one line description

Contact

If you have questions, or would like to use parts of this pipeline in a publication, please contact me at

T.Varga [at] physik.lmu.de