Utilities

pytadbit.get_dependencies_version(dico=False)[source]

Check versions of TADbit and all dependencies, as well and retrieves system info. May be used to ensure reproducibility.

Returns

string with description of versions installed

pytadbit.utils.tadmaths.calinski_harabasz(scores, clusters)[source]

Implementation of the CH score [CalinskiHarabasz1974], that has shown to be one the most accurate way to compare clustering methods [MilliganCooper1985] [Tibshirani2001].

The CH score is:

\[CH(k) = \frac{B(k) / (k-1)}{W(k)/(n - k)}\]

Where \(B(k)\) and \(W(k)\) are the between and within cluster sums of squares, with \(k\) clusters, and \(n\) the total number of elements (models in this case).

Parameters
  • scores – a dict with, as keys, a tuple with a pair of models; and, as value, the distance between these models.

  • clusters – a dict with, as key, the cluster number, and as value a list of models

  • nmodels – total number of models

Returns

the CH score

pytadbit.utils.extraviews.plot_3d_optimization_result(result, axes=('scale', 'maxdist', 'upfreq', 'lowfreq'))[source]

Displays a three dimensional scatter plot representing the result of the optimization.

Parameters
  • result – 3D numpy array contating correlation values

  • axes ('scale','maxdist','upfreq','lowfreq') – tuple of axes to represent. The order will define which parameter will be placed on the w, z, y or x axe.

pytadbit.utils.extraviews.plot_2d_optimization_result(result, axes=('scale', 'kbending', 'maxdist', 'lowfreq', 'upfreq'), dcutoff=None, show_best=0, skip=None, savefig=None, clim=None, cmap='inferno')[source]

A grid of heatmaps representing the result of the optimization. In the optimization up to 5 parameters can be optimized: ‘scale’, ‘kbending’, ‘maxdist’, ‘lowfreq’, and ‘upfreq’. The maps will be divided in different pages depending on the ‘scale’ and ‘kbending’ values. In each page there will be different maps depending the ‘maxdist’ values. Each map has ‘upfreq’ values along the x-axes, and ‘lowfreq’ values along the y-axes.

Parameters
  • result – 3D numpy array contating the computed correlation values

  • axes ('scale','kbending','maxdist','lowfreq','upfreq') – tuple of axes to represent. The order is important here. It will define which parameter will be placed respectively on the v, w, z, y, or x axes.

  • show_best (0) – number of best correlation value to highlight in the heatmaps. The best correlation is highlithed by default

  • skip (None) – a dict can be passed here in order to fix a given parameter value, e.g.: {‘scale’: 0.001, ‘kbending’: 30, ‘maxdist’: 500} will represent all the correlation values at fixed ‘scale’, ‘kbending’, and ‘maxdist’ values, respectively equal to 0.001, 30, and 500.

  • dcutoff (None) – The distance cutoff (dcutoff) used to compute the contact matrix in the models.

  • savefig (None) – path to a file where to save the generated image. If None, the image will be displayed using matplotlib GUI. NOTE: the extension of the file name will automatically determine the desired format.

  • clim (None) – color scale. If None, the max and min values of the input are used.

  • cmap (inferno) – matplotlib colormap

pytadbit.utils.extraviews.compare_models(sm1, sm2, cutoff=150, models1=None, cluster1=None, models2=None, cluster2=None)[source]

Plots the difference of contact maps of two group of structural models.

Parameters
  • sm1 – a StructuralModel

  • sm2 – a StructuralModel

  • dcutoff (150) – distance threshold (nm) to determine if two particles are in contact

  • models (None) – if None (default) the contact map will be computed using all the models. A list of numbers corresponding to a given set of models can be passed

  • cluster (None) – compute the contact map only for the models in the cluster number ‘cluster’

pytadbit.utils.extraviews.color_residues(x, **kwargs)[source]

Function to color residues from blue to red.

Parameters

model – a given pytadbit.imp.impmodel.IMPmodel

Returns

a list of rgb tuples (red, green, blue), each between 0 and 1.

pytadbit.utils.extraviews.tad_coloring(x, mstart=None, mend=None, tads=None, **kwargs)[source]

Colors TADs from blue to red (first to last TAD). TAD borders are displayed in scale of grey, from light to dark grey (again first to last border)

Parameters
  • model – a given pytadbit.imp.impmodel.IMPmodel

  • tads – a dictionary of TADs, Experiments.tads can be directly passed

Returns

a list of rgb tuples (red, green, blue), each between 0 and 1.

pytadbit.utils.extraviews.tad_border_coloring(x, mstart=None, mend=None, tads=None, **kwargs)[source]

Colors TAD borders from blue to red (bad to good score). TAD are displayed in scale of grey, from light to dark grey (first to last particle in the TAD)

Parameters
  • model – a given pytadbit.imp.impmodel.IMPmodel

  • tads – a dictionary of TADs, Experiments.tads can be directly passed

Returns

a list of rgb tuples (red, green, blue), each between 0 and 1.

pytadbit.utils.extraviews.plot_3d_model(x, y, z, label=False, axe=None, thin=False, savefig=None, show_axe=False, azimuth=- 90, elevation=0.0, color='index', smooth=0.001, particle_size=50, alpha_part=0.5, lw_main=3, **kwargs)[source]

Given a 3 lists of coordinates (x, y, z) plots a three-dimentional model using matplotlib

Parameters
  • model – a pytadbit.imp.impmodel.IMPmodel

  • label (False) – show labels

  • axe (None) – a matplotlib.axes.Axes object to define the plot appearance

  • thin (False) – draw a thin black line instead of representing particles and edges

  • smooth (0.001) – connction between particles is smoothed according to the input condition

  • savefig (None) – path to a file where to save the image generated; if None, the image will be shown using matplotlib GUI (the extension of the file name will determine the desired format).

  • azimuth (-90) – angle to rotate camera along the y axis

  • elevation (0) – angle to rotate camera along the x axis

  • color ('index') –

    can be:

pytadbit.utils.extraviews.chimera_view(cmm_files, chimera_bin='chimera', chimera_cmd=None, savefig=None, center_of_mass=False, gyradius=False, align=True, grid=False, highlight='all', **kwargs)[source]

Open a list of .cmm files with Chimera (http://www.cgl.ucsf.edu/chimera) to view models.

Parameters
  • cmm_files – list of .cmm files

  • chimera_bin ('chimera') – path to chimera binary

  • chimera_cmd (None) – list of command lines for chimera

  • savefig (None) – path to a file where to save generated image

  • center_of_mass (False) – if True, draws the center of mass

  • gyradius (False) – increment the radius of the center of mass by the value of the radius of girations given.

  • align (False) – align models

  • grid (False) – tile models

  • higlight ('all') – higlights a given model, or group of models. Can be either ‘all’, or a model number

pytadbit.utils.hic_filtering.hic_filtering_for_modelling(matrx, silent=False, perc_zero=90, auto=True, min_count=None, draw_hist=False, savefig=None, diagonal=True)[source]

Call filtering function, to remove artifactual columns in a given Hi-C matrix. This function will detect columns with very low interaction counts; and columns with NaN values (in this case NaN will be replaced by zero in the original Hi-C data matrix). Filtered out columns will be stored in the dictionary Experiment._zeros.

Parameters
  • matrx – Hi-C matrix of a given experiment

  • silent (False) – does not warn for removed columns

  • savefig (None) – path to a file where to save the image generated; if None, the image will be shown using matplotlib GUI (the extension of the file name will determine the desired format).

  • diagonal (True) – remove row/columns with zero in the diagonal

  • perc_zero (90) – maximum percentage of cells with no interactions allowed.

  • min_count (None) – minimum number of reads mapped to a bin (recommended value could be 2500). If set this option overrides the perc_zero filtering… This option is slightly slower.

  • auto (True) – if False, only filters based on the given percentage zeros

Returns

the indexes of the columns not to be considered for the calculation of the z-score

pytadbit.utils.file_handling.magic_open(filename, verbose=False, cpus=None)[source]

To read uncompressed zip gzip bzip2 or tar.xx files

Parameters

filename – either a path to a file, or a file handler

Returns

opened file ready to be iterated

pytadbit.utils.file_handling.get_free_space_mb(folder, div=2)[source]

Return folder/drive free space (in bytes)

Based on stackoverflow answer: http://stackoverflow.com/questions/51658/cross-platform-space-remaining-on-volume-using-python

Parameters
  • folder – folder/drive to measure

  • div (2) – divide the size by (div*1024) to get the size in Mb (3 is Gb…)

pytadbit.utils.three_dim_stats.calc_eqv_rmsd(models, beg, end, zeros, dcutoff=200, one=False, what='score', normed=True)[source]

Calculates the RMSD, dRMSD, the number of equivalent positions and a score combining these three measures. The measure are done between a group of models in a one against all manner.

Parameters
  • beg – start particle number of the region to compare

  • end – end particle number of the region to compare

  • zeros – list of True/False representing particles to skip

  • dcutoff (200) – distance in nanometer from which it is considered that two particles are separated.

  • fact (0.75) – Factor for equivalent positions

  • one (False) – if True assumes that only two models are passed, and returns the rmsd of their comparison

  • what ('score') – values to return. Can be one of ‘score’, ‘rmsd’, ‘drmsd’ or ‘eqv’

  • normed (True) – normalize result by maximum value (only applies to rmsd and drmsd)

Returns

a score of each pairwise comparison according to:

\[score_i = eqvs_i \times \frac{dRMSD_i / max(dRMSD)} {RMSD_i / max(RMSD)}\]

where \(eqvs_i\) is the number of equivalent position for the ith pairwise model comparison.

pytadbit.utils.three_dim_stats.square_distance(part1, part2)[source]

Calculates the square distance between two particles.

Parameters
  • part1 – coordinate (dict format with x, y, z keys)

  • part2 – coordinate (dict format with x, y, z keys)

Returns

square distance between two points in space

pytadbit.utils.three_dim_stats.distance(part1, part2)[source]

Calculates the distance between two particles.

Parameters
  • part1 – coordinate in list format (x, y, z)

  • part2 – coordinate in list format (x, y, z)

Returns

distance between two points in space

pytadbit.utils.three_dim_stats.angle_between_3_points(point1, point2, point3)[source]

Calculates the angle between 3 particles

Given three particles A, B and C, the angle g (angle ACB, shown below):

      A
     /|
    /i|
  c/  |
  /   |
 /    |
B )g  |b
 \    |
  \   |
  a\  |
    \h|
     \|
      C

is given by the theorem of Al-Kashi:

\[b^2 = a^2 + c^2 - 2ac\cos(g)\]
Parameters
  • point1 – list of 3 coordinate for x, y and z

  • point2 – list of 3 coordinate for x, y and z

  • point3 – list of 3 coordinate for x, y and z

Returns

angle in radians

pytadbit.utils.three_dim_stats.dihedral(a, b, c, d, e)[source]

Calculates dihedral angle between 4 points in 3D (array with x,y,z)

pytadbit.utils.three_dim_stats.generate_circle_points(x, y, z, u, v, w, n)[source]

Returns list of 3d coordinates of points on a circle using the Rodrigues rotation formula.

see Murray, G. (2013). Rotation About an Arbitrary Axis in 3 Dimensions for details

Parameters
  • x – x coordinate of a point somewhere on the circle

  • y – y coordinate of a point somewhere on the circle

  • z – z coordinate of a point somewhere on the circle

  • a – x coordinate of the center

  • b – y coordinate of the center

  • c – z coordinate of the center

  • u – 1st element of a vector in the same plane as the circle

  • v – 2nd element of a vector in the same plane as the circle

  • w – 3rd element of a vector in the same plane as the circle

  • n – number of points in the circle

TODO: try simplification for a=b=c=0 (and do the translation in the main

function)