Chromosome class

class pytadbit.chromosome.Chromosome(name, species=None, assembly=None, experiment_resolutions=None, experiment_tads=None, experiment_hic_data=None, experiment_norm_data=None, experiment_names=None, max_tad_size=inf, chr_len=0, parser=None, centromere_search=False, silent=False, **kw_descr)

A Chromosome object designed to handle Topologically Associating Domain (TAD) predictions for a given chromosome, coming from different experiments and possibly different cell types, and to compare them.

Parameters
  • name – name of the chromosome (e.g. '19')

  • species (None) – species name

  • assembly (None) – version number of the genomic assembly used

  • experiment_resolutions (None) – list of resolutions corresponding to the list of experiments passed.

  • experiment_hic_data (None) – list() of paths to files containing the Hi-C count matrices corresponding to different experiments

  • experiment_tads (None) – list() of paths to files containing the definition of TADs corresponding to different experiments

  • experiment_names (None) – list() of the names of each experiment

  • max_tad_size (infinite) – maximum TAD size allowed. TADs longer than this value will not be considered, and the size of the corresponding chromosome will be reduced accordingly

  • chr_len (0) – size of the DNA chromosome in bp. By default it will be inferred from the distribution of TADs

  • parser (None) –

    a parser function that returns a tuple of lists representing the data matrix and the length of a row/column. With the file example.tsv:

    chrT_001       chrT_002        chrT_003        chrT_004
    chrT_001       629     164     88      105
    chrT_002       164     612     175     110
    chrT_003       88      175     437     100
    chrT_004       105     110     100     278
    

    the output of parser('example.tsv') would be: [([629, 164, 88, 105, 164, 612, 175, 110, 88, 175, 437, 100, 105, 110, 100, 278]), 4]

  • kw_descr (None) –

    any other argument passed would be stored as complementary descriptive field. For example:

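    from pytadbit.chromosome import Chromosome

    # extra keyword arguments are stored as descriptive fields
    crm = Chromosome('19', species='mus musculus',
                     subspecies='musculus musculus',
                     skin_color='black')
    print(crm)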
    
    # Chromosome 19:
    #    0  experiment loaded:
    #    0  alignment loaded:
    #    species         : mus musculus
    #    assembly version: UNKNOWN
    #    subspecies      : musculus musculus
    #    skin_color      : black
    

    Note that these fields may appear in the header of generated output files.
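The parser parameter expects a function that takes a file name and returns two things: the flattened, row-major list of counts and the row/column length. The hypothetical simple_parser below sketches that behavior for a labelled, whitespace-separated square matrix such as example.tsv. The function names are illustrative and not part of pytadbit. The sketch also assumes that the first line holds column labels and that the first field of each other line holds the row label.

```python
def parse_matrix_lines(lines):
    """Return (flat_values, size) from the lines of a labelled square matrix.

    Assumption: the first line contains column labels and every following
    non-blank line starts with a row label followed by integer counts.
    """
    rows = iter(lines)
    next(rows)  # skip the header line with column labels
    values, size = [], 0
    for line in rows:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        values.extend(int(v) for v in fields[1:])  # drop the row label
        size += 1
    if len(values) != size * size:
        raise ValueError("expected a square matrix: %d values for %d rows"
                         % (len(values), size))
    return values, size


def simple_parser(path):
    """Hypothetical parser: read a file and return (flat_values, size)."""
    with open(path) as handle:
        return parse_matrix_lines(handle)
```

Under these assumptions, it could be passed as parser=simple_parser to the Chromosome constructor or to add_experiment().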

Returns

Chromosome object

add_experiment(name, resolution=None, tad_def=None, hic_data=None, norm_data=None, replace=False, parser=None, conditions=None, **kwargs)

Add a Hi-C experiment to the Chromosome.

Parameters
  • name – name of the experiment, or an Experiment object

  • resolution – resolution of the experiment (needed if name is not an Experiment object)

  • hic_data (None) – either a file or a list of lists containing the Hi-C data

  • tad_def (None) – a file or a dict with precomputed TADs for this experiment

  • replace (False) – overwrite the experiments loaded under the same name

  • parser (None) –

    a parser function that returns a tuple of lists representing the data matrix and the length of a row/column. With a file example.tsv containing:

    chrT_001   chrT_002        chrT_003        chrT_004
    chrT_001   629     164     88      105
    chrT_002   164     612     175     110
    chrT_003   88      175     437     100
    chrT_004   105     110     100     278
    

    the output of parser('example.tsv') would be: [([629, 164, 88, 105, 164, 612, 175, 110, 88, 175, 437, 100, 105, 110, 100, 278]), 4]

align_experiments(names=None, verbose=False, randomize=False, rnd_method='interpolate', rnd_num=1000, get_score=False, **kwargs)

Align the predicted boundaries of two or more experiments. The resulting alignment will be stored in the self.experiment list.

Parameters
  • names (None) – list of names of the experiments to align. If None, align all

  • experiment1 – name of the first experiment to align

  • experiment2 – name of the second experiment to align

  • penalty (-0.1) – penalty for inserting a gap in the alignment

  • max_dist (100000) – maximum distance between two boundaries for them to be considered a match (100 kb seems reasonable for human chromosomes)

  • verbose (False) – if True, print some information about the alignments

  • randomize (False) – check the alignment quality by comparing it with randomized boundaries over chromosomes of the same size. This will return an extra value: the p-value of accepting that the observed alignment is not better than a random alignment

  • get_score (False) – if True, return the alignment object, the alignment score and the percentage of identity from each side

  • rnd_method (interpolate) – by default uses the interpolation of TAD distribution. The alternative method is ‘shuffle’, where TADs are simply shuffled

  • rnd_num (1000) – number of randomizations to do

  • method (reciprocal) – if global, Needleman-Wunsch is used to align (see pytadbit.boundary_aligner.globally.needleman_wunsch()); if reciprocal, a method based on reciprocal closest boundaries is used (see pytadbit.boundary_aligner.reciprocally.reciprocal())

Returns

An alignment object or, if randomization was invoked, an alignment object and a list of statistics: the alignment score, the p-value of the randomization test (see randomize), the proportion of borders from the first experiment found aligned in the second experiment, and the proportion of borders from the second experiment found aligned in the first experiment. The returned values can be caught like this:

ali = crm.align_experiments()

or, with randomization test:

ali, (score, pval, prop1, prop2) = crm.align_experiments(randomize=True)
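The reciprocal method and the 'shuffle' randomization can be illustrated with a simplified, self-contained sketch. This is not the code behind pytadbit.boundary_aligner.reciprocally.reciprocal(). It makes the following assumptions:

- Boundaries are plain lists of positions in bp.
- Two boundaries match when each is the other's closest and they lie within max_dist.
- The "score" is simply the number of matched pairs.
- Randomized boundaries are built by shuffling the TAD sizes.
- The p-value is an empirical estimate.

All function names are hypothetical.

```python
import random
from bisect import bisect_left


def closest(sorted_pos, x):
    """Return the position in sorted_pos closest to x (ties go to the lower one)."""
    i = bisect_left(sorted_pos, x)
    candidates = sorted_pos[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda p: abs(p - x))


def reciprocal_matches(bounds1, bounds2, max_dist=100000):
    """Pairs (a, b) of boundaries that are each other's closest and within max_dist."""
    s1, s2 = sorted(bounds1), sorted(bounds2)
    pairs = []
    for a in s1:
        b = closest(s2, a)
        if closest(s1, b) == a and abs(a - b) <= max_dist:
            pairs.append((a, b))
    return pairs


def shuffled_boundaries(bounds, rng):
    """Randomize boundaries by shuffling the sizes of the TADs they delimit."""
    s = sorted(bounds)
    sizes = [s[0]] + [b - a for a, b in zip(s, s[1:])]  # first TAD starts at 0
    rng.shuffle(sizes)
    out, pos = [], 0
    for size in sizes:
        pos += size
        out.append(pos)
    return out


def alignment_stats(bounds1, bounds2, max_dist=100000, rnd_num=1000, seed=0):
    """Return (score, empirical p-value, prop1, prop2) for two boundary lists."""
    rng = random.Random(seed)
    score = len(reciprocal_matches(bounds1, bounds2, max_dist))
    # count randomizations of bounds1 that align at least as well as observed
    as_good = sum(
        len(reciprocal_matches(shuffled_boundaries(bounds1, rng), bounds2, max_dist)) >= score
        for _ in range(rnd_num))
    pval = (as_good + 1) / (rnd_num + 1)
    return score, pval, score / len(bounds1), score / len(bounds2)
```

For example, take boundaries [100000, 250000, 400000, 600000] and [110000, 260000, 390000, 800000] with max_dist=50000. Three pairs match, so both proportions are 0.75. Shuffling TAD sizes keeps the size distribution and the position of the last boundary unchanged, so only the arrangement of TADs is randomized. The default 'interpolate' method is not reproduced here.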

find_tad(experiments, name=None, n_cpus=1, verbose=True, max_tad_size='max', heuristic=True, batch_mode=False, **kwargs)

Call the pytadbit.tadbit.tadbit() function to calculate the positions of Topologically Associating Domain boundaries.

Parameters
  • experiments – a square matrix of interaction counts of Hi-C data, or a list of such matrices for replicated experiments. The counts must be evenly sampled and not normalized. experiments can be either a list of lists, a path to a file or a file handle

  • normalized (True) – if False, a simple normalization will be computed and a simple column filtering will be applied (columns where the value on the diagonal is null are removed)

  • n_cpus (1) – the number of CPUs to allocate to TADbit. If n_cpus='max', all available CPUs will be used

  • max_tad_size (max) – an integer defining the maximum size of a TAD. The default ('max') sets it to the number of rows/columns

  • heuristic (True) – whether or not to use some heuristics

  • batch_mode (False) – if True, all the experiments will be concatenated into one for the search of TADs. The resulting TADs are stored under the name 'batch' followed by the concatenated experiment names (e.g. if experiments=['exp1', 'exp2'], the name would be 'batch_exp1_exp2').
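When normalized=False, a simple column filtering is applied. The sketch below shows one way to implement that rule (drop every column whose diagonal value is null). It assumes a symmetric list-of-lists matrix and also removes the matching row so the matrix stays square. The naming helper reproduces the batch_mode convention documented above. Both helpers are hypothetical, and the accompanying "simple normalization" is not reproduced.

```python
def filter_null_diagonal(matrix):
    """Drop every row/column i whose diagonal value matrix[i][i] is zero.

    Assumes a symmetric square matrix given as a list of lists; returns the
    filtered matrix and the sorted list of removed indices.
    """
    n = len(matrix)
    removed = [i for i in range(n) if matrix[i][i] == 0]
    kept = [i for i in range(n) if matrix[i][i] != 0]
    filtered = [[matrix[i][j] for j in kept] for i in kept]
    return filtered, removed


def batch_name(experiments):
    """Name used for batch_mode results, e.g. 'batch_exp1_exp2'."""
    return "batch_" + "_".join(experiments)
```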

get_experiment(name)

Fetch an Experiment according to its name. This can also be done directly with Chromosome.experiments[name].

Parameters

name – name of the experiment to select

Returns

pytadbit.Experiment

get_tad_hic(tad, x_name, normed=True, matrix_num=0)

Retrieve the Hi-C data matrix corresponding to a given TAD.

Parameters
  • tad – a given TAD (dict)

  • x_name – name of the experiment

  • normed (True) – if True, normalize the Hi-C data

Returns

Hi-C data matrix for the given TAD

iter_tads(x_name, normed=True)

Iterate over the TADs corresponding to a given experiment.

Parameters
  • x_name – name of the experiment

  • normed (True) – if True, normalize the returned Hi-C data

Yields

Hi-C data corresponding to each TAD
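Both get_tad_hic() and iter_tads() come down to cutting out the square block of the Hi-C matrix that a TAD spans. The minimal sketch below makes three assumptions:

- A TAD is a dict shaped like the one accepted by visualize() below.
- 'start' and 'end' are inclusive bin indices.
- The TADs of an experiment are kept in a dict keyed by TAD number.

The helper names are hypothetical, and normalization (normed=True) is omitted.

```python
def tad_submatrix(matrix, tad):
    """Square block of matrix covering bins tad['start']..tad['end'] (inclusive).

    Assumption: 'start' and 'end' are bin indices, both included.
    """
    start, end = int(tad['start']), int(tad['end'])
    return [row[start:end + 1] for row in matrix[start:end + 1]]


def iter_tad_submatrices(matrix, tads):
    """Yield (tad_key, block) for each TAD, ordered by start bin."""
    for key, tad in sorted(tads.items(), key=lambda kv: kv[1]['start']):
        yield key, tad_submatrix(matrix, tad)
```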

save_chromosome(out_f, fast=True, divide=True, force=False)

Save a Chromosome object to a file (it uses pickle.dump() from the pickle module). Once saved, the object can be loaded with load_chromosome().

Parameters
  • out_f – path to the file where to store the pickle object

  • fast (True) – if True, skip Hi-C data and weights

  • divide (True) – if True, write two pickles: one with what would result from using the fast option, and a second one with the Hi-C and weights data. The second file name is extended with '_hic' (i.e. with out_f='chromosome12.pik' we would obtain chromosome12.pik and chromosome12.pik_hic). When loading, load_chromosome() will automatically search for both files

  • force (False) – overwrite the existing file
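The divide option can be sketched with the standard pickle module. In this hypothetical example, a plain dict stands in for the Chromosome object, and the keys treated as heavy data (hic_data, wght) are assumptions. The heavy part goes to out_f + '_hic', and the loader looks for that second file automatically, mirroring the behavior described for load_chromosome().

```python
import os
import pickle


def save_split(obj, out_f, heavy_keys=("hic_data", "wght"), force=False):
    """Pickle the light part of obj (a dict) to out_f, the heavy part to out_f + '_hic'.

    heavy_keys are hypothetical names for the Hi-C data and weights entries.
    """
    hic_f = out_f + "_hic"
    if not force and (os.path.exists(out_f) or os.path.exists(hic_f)):
        raise FileExistsError("%s exists; use force=True to overwrite" % out_f)
    light = {k: v for k, v in obj.items() if k not in heavy_keys}
    heavy = {k: v for k, v in obj.items() if k in heavy_keys}
    with open(out_f, "wb") as handle:
        pickle.dump(light, handle)
    with open(hic_f, "wb") as handle:
        pickle.dump(heavy, handle)


def load_split(in_f, skip_heavy=False):
    """Load in_f and, unless skipped, merge in in_f + '_hic' when it exists."""
    with open(in_f, "rb") as handle:
        obj = pickle.load(handle)
    hic_f = in_f + "_hic"
    if not skip_heavy and os.path.exists(hic_f):
        with open(hic_f, "rb") as handle:
            obj.update(pickle.load(handle))
    return obj
```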

set_max_tad_size(value)

Change the maximum size allowed for TADs. It also applies to the computed experiments.

Parameters

value – an int value (default is 5000000)

tad_density_plot(name, axe=None, focus=None, extras=None, normalized=True, savefig=None, shape='ellipse')

Draw a summary of the TADs found in a given experiment and their density in terms of relative Hi-C interaction count.

Parameters
  • name – name of the experiment to visualize

  • focus (None) – can pass a tuple (bin_start, bin_stop) to display the alignment between these genomic bins

  • extras (None) – list of coordinates (genomic bin) where to draw a red cross

  • ymax (None) – limit the y axis up to a given value

  • colors (('grey',)) – successive colors for alignment

  • normalized (True) – if True, normalized Hi-C counts are plotted instead of raw data.

  • shape ('ellipse') – which kind of shape to use as schematic representation of TADs. Implemented: ‘ellipse’, ‘rectangle’, ‘triangle’

  • savefig (None) – path to a file where to save the image generated; if None, the image will be shown using matplotlib GUI (the extension of the file name will determine the desired format).

visualize(names=None, tad=None, focus=None, paint_tads=False, axe=None, show=True, logarithm=True, normalized=False, relative=True, decorate=True, savefig=None, clim=None, scale=(8, 6), cmap='jet')

Visualize the matrix of Hi-C interactions of a given experiment

Parameters
  • names (None) – name of the experiment to visualize, or list of experiment names. If None, all experiments will be shown

  • tad (None) –

    a given TAD in the form:

    {'start': start,
     'end'  : end,
     'brk'  : end,
     'score': score}
    

    Alternatively, a list of TADs can be passed (all the TADs between the first and the last one passed will be shown; thus, passing more than two TADs might be superfluous)

  • focus (None) – a tuple with the start and end positions of the region to visualize

  • paint_tads (False) – draw a box around the TADs defined for this experiment

  • axe (None) – a matplotlib Axes object can be passed in order to customize the figure

  • show (True) – whether or not to pop up the matplotlib image

  • logarithm (True) – show the logarithm values

  • normalized (False) – show the normalized data (weights might have been calculated previously). Note: white rows/columns may appear in the matrix displayed; these correspond to filtered rows (see pytadbit.utils.hic_filtering.hic_filtering_for_modelling())

  • relative (True) – color scale is relative to the whole matrix of data, not only to the region displayed

  • decorate (True) – draw the color bar, title and axis labels

  • savefig (None) – path to a file where to save the image generated; if None, the image will be shown using matplotlib GUI (the extension of the file name will determine the desired format).

  • clim (None) – tuple with the minimum and maximum values of the color scale, e.g. clim=(-4, 10)

  • cmap ('jet') – color map from matplotlib. Can also be a preconfigured cmap object.

Load chromosome

pytadbit.chromosome.load_chromosome(in_f, fast=2)

Load a Chromosome object from a file. A Chromosome object can be saved with the Chromosome.save_chromosome() function.

Parameters
  • in_f – path to a saved Chromosome object file

  • fast (2) – if fast=2, the Hi-C data are not loaded (in case they were saved in a separate file; see Chromosome.save_chromosome()). If fast=1, the weights are skipped to save memory. If fast=0, both the weights and the Hi-C data are loaded

Returns

a Chromosome object


ExperimentList class

class pytadbit.chromosome.ExperimentList(thing, crm)

Inherits from the Python built-in list, modified to hold TADbit pytadbit.Experiment objects.

Mainly, __getitem__, __setitem__ and append were modified so that experiments can be looked up by index or by name, and added simply with Chromosome.experiments.append(Experiment).

The whole ExperimentList object is linked to a Chromosome instance (pytadbit.Chromosome).

append(exp)

Append object to the end of the list.
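The name-or-index lookup described for ExperimentList can be sketched as a list subclass that overrides __getitem__ and __setitem__. Here, Experiment is a hypothetical stand-in that exposes only a name attribute. The sketch does not reproduce the link to a Chromosome instance or pytadbit's modified append().

```python
class Experiment(object):
    """Hypothetical stand-in for pytadbit.Experiment: only a name is needed here."""

    def __init__(self, name):
        self.name = name


class NamedList(list):
    """List of experiments that can be indexed by position or by name."""

    def _index(self, key):
        # translate an experiment name into its position; leave ints/slices as-is
        if isinstance(key, str):
            for i, exp in enumerate(self):
                if exp.name == key:
                    return i
            raise KeyError(key)
        return key

    def __getitem__(self, key):
        return super().__getitem__(self._index(key))

    def __setitem__(self, key, value):
        # replacing by name requires the name to be already present
        super().__setitem__(self._index(key), value)
```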