Chromosome class¶

class pytadbit.chromosome.Chromosome(name, species=None, assembly=None, experiment_resolutions=None, experiment_tads=None, experiment_hic_data=None, experiment_norm_data=None, experiment_names=None, max_tad_size=inf, chr_len=0, parser=None, centromere_search=False, silent=False, **kw_descr)[source]¶

A Chromosome object designed to deal with Topologically Associating Domains predictions from different experiments, in different cell types for a given chromosome of DNA, and to compare them.

Parameters

name – name of the chromosome (might be a chromosome name for example)
species (None) – species name
assembly (None) – version number of the genomic assembly used
resolutions (None) – list of resolutions corresponding to a list of experiments passed.
experiment_hic_data (None) – list() of paths to files containing the Hi-C count matrices corresponding to different experiments
experiment_tads (None) – list() of paths to files containing the definition of TADs corresponding to different experiments
experiment_names (None) – list() of the names of each experiment
max_tad_size (infinite) – maximum TAD size allowed. TADs longer than this value will not be considered, and size of the corresponding chromosome size will be reduced accordingly
chr_len (0) – size of the DNA chromosome in bp. By default it will be inferred from the distribution of TADs

parser (None) –

a parser function that returns a tuple of lists representing the data matrix and the length of a row/column. With the file example.tsv:

chrT_001       chrT_002        chrT_003        chrT_004
chrT_001       629     164     88      105
chrT_002       164     612     175     110
chrT_003       88      175     437     100
chrT_004       105     110     100     278

the output of parser(‘example.tsv’) would be be: [([629, 164, 88, 105, 164, 612, 175, 110, 88, 175, 437, 100, 105, 110, 100, 278]), 4]

kw_descr (None) –

any other argument passed would be stored as complementary descriptive field. For example:

crm  = Chromosome('19', species='mus musculus',
                  subspecies='musculus musculus',
                  skin_color='black')
print crm

# Chromosome 19:
#    0  experiment loaded:
#    0  alignment loaded:
#    species         : mus musculus
#    assembly version: UNKNOWN
#    subspecies      : musculus musculus
#    skin_color      : black

note that these fields may appear in the header of generated out files

Returns

Chromosome object

add_experiment(name, resolution=None, tad_def=None, hic_data=None, norm_data=None, replace=False, parser=None, conditions=None, **kwargs)[source]¶

Add a Hi-C experiment to Chromosome

Parameters

name – name of the experiment or of the Experiment object
resolution – resolution of the experiment (needed if name is not an Experiment object)
hic_data (None) – whether a file or a list of lists corresponding to the Hi-C data
tad_def (None) – a file or a dict with precomputed TADs for this experiment
replace (False) – overwrite the experiments loaded under the same name

parser (None) –

a parser function that returns a tuple of lists representing the data matrix and the length of a row/column. With a file example.tsv containing:

chrT_001   chrT_002        chrT_003        chrT_004
chrT_001   629     164     88      105
chrT_002   164     612     175     110
chrT_003   88      175     437     100
chrT_004   105     110     100     278

the output of parser(‘example.tsv’) would be: [([629, 164, 88, 105, 164, 612, 175, 110, 88, 175, 437, 100, 105, 110, 100, 278]), 4]

align_experiments(names=None, verbose=False, randomize=False, rnd_method='interpolate', rnd_num=1000, get_score=False, **kwargs)[source]¶

Align the predicted boundaries of two different experiments. The resulting alignment will be stored in the self.experiment list.

Parameters

names (None) – list of names of the experiments to align. If None, align all
experiment1 – name of the first experiment to align
experiment2 – name of the second experiment to align
penalty (-0.1) – penalty for inserting a gap in the alignment
max_dist (100000) – maximum distance between two boundaries allowing match (100Kb seems fair with HUMAN chromosomes)
verbose (False) – if True, print some information about the alignments
randomize (False) – check the alignment quality by comparing randomized boundaries over Chromosomes of the same size. This will return a extra value, the p-value of accepting that the observed alignment is not better than a random alignment
get_score (False) – returns alignemnt object, alignment score and percentage of identity from one side and from the other
rnd_method (interpolate) – by default uses the interpolation of TAD distribution. The alternative method is ‘shuffle’, where TADs are simply shuffled
rnd_num (1000) – number of randomizations to do
method (reciprocal) – if global, Needleman-Wunsch is used to align (see pytadbit.boundary_aligner.globally.needleman_wunsch()); if reciprocal, a method based on reciprocal closest boundaries is used (see pytadbit.boundary_aligner.reciprocally.reciprocal())

Returns

an alignment object or, if the randomizattion was invoked, an alignment object, and a list of statistics that are, the alignment score, the probability that observed alignment performs better than randoms, the proportion of borders from the first experiment found aligned in the second experiment and the proportion of borders from the second experiment found aligned in the first experiment. Returned calues can be catched like this:

ali = crm.align_experiments()

or, with randomization test:

ali, (score, pval, prop1, prop2) = crm.align_experiments(randomize=True)

find_tad(experiments, name=None, n_cpus=1, verbose=True, max_tad_size='max', heuristic=True, batch_mode=False, **kwargs)[source]¶

Call the pytadbit.tadbit.tadbit() function to calculate the position of Topologically Associated Domain boundaries

Parameters

experiment – A square matrix of interaction counts of Hi-C data or a list of such matrices for replicated experiments. The counts must be evenly sampled and not normalized. ‘experiment’ can be either a list of lists, a path to a file or a file handler
normalized (True) – if False simple normalization will be computed, as well as a simple column filtering will be applied (remove columns where value at the diagonal is null)
n_cpus (1) – The number of CPUs to allocate to TADbit. If n_cpus=’max’ the total number of CPUs will be used
max_tad_size (max) – an integer defining the maximum size of a TAD. Default (auto) defines it as the number of rows/columns
heuristic (True) – whether to use or not some heuristics
batch_mode (False) – if True, all the experiments will be concatenated into one for the search of TADs. The resulting TADs found are stored under the name ‘batch’ plus a concatenation of the experiment names passed (e.g.: if experiments=[‘exp1’, ‘exp2’], the name would be: ‘batch_exp1_exp2’).

get_experiment(name)[source]¶

Fetch an Experiment according to its name. This can also be done directly with Chromosome.experiments[name].

Parameters: name – name of the experiment to select
Returns: pytadbit.Experiment

get_tad_hic(tad, x_name, normed=True, matrix_num=0)[source]¶

Retrieve the Hi-C data matrix corresponding to a given TAD.

Parameters

tad – a given TAD (dict)
x_name – name of the experiment
normed (True) – if True, normalize the Hi-C data

Returns

Hi-C data matrix for the given TAD

iter_tads(x_name, normed=True)[source]¶

Iterate over the TADs corresponding to a given experiment.

Parameters

x_name – name of the experiment
normed (True) – normalize Hi-C data returned

Yields

Hi-C data corresponding to each TAD

save_chromosome(out_f, fast=True, divide=True, force=False)[source]¶

Save a Chromosome object to a file (it uses pickle.load() from the pickle). Once saved, the object can be loaded with load_chromosome().

Parameters

out_f – path to the file where to store the pickle object
fast (True) – if True, skip Hi-C data and weights
divide (True) – if True writes two pickles, one with what would result by using the fast option, and the second with the Hi-C and weights data. The second file name will be extended by ‘_hic’ (ie: with out_f=’chromosome12.pik’ we would obtain chromosome12.pik and chromosome12.pik_hic). When loaded load_chromosome() will automatically search for both files
force (False) – overwrite the existing file

set_max_tad_size(value)[source]¶

Change the maximum size allowed for TADs. It also applies to the computed experiments.

Parameters: value – an int value (default is 5000000)

tad_density_plot(name, axe=None, focus=None, extras=None, normalized=True, savefig=None, shape='ellipse')[source]¶

Draw an summary of the TAD found in a given experiment and their density in terms of relative Hi-C interaction count.

Parameters

name – name of the experiment to visualize
focus (None) – can pass a tuple (bin_start, bin_stop) to display the alignment between these genomic bins
extras (None) – list of coordinates (genomic bin) where to draw a red cross
ymax (None) – limit the y axis up to a given value
) (('grey',) – successive colors for alignment
normalized (True) – normalized Hi-C count are plotted instead of raw data.
shape ('ellipse') – which kind of shape to use as schematic representation of TADs. Implemented: ‘ellipse’, ‘rectangle’, ‘triangle’
savefig (None) – path to a file where to save the image generated; if None, the image will be shown using matplotlib GUI (the extension of the file name will determine the desired format).

visualize(names=None, tad=None, focus=None, paint_tads=False, axe=None, show=True, logarithm=True, normalized=False, relative=True, decorate=True, savefig=None, clim=None, scale=(8, 6), cmap='jet')[source]¶

Visualize the matrix of Hi-C interactions of a given experiment

Parameters

names (None) – name of the experiment to visualize, or list of experiment names. If None, all experiments will be shown
tad (None) –
a given TAD in the form:
```
{'start': start,
 'end'  : end,
 'brk'  : end,
 'score': score}
```
Alternatively a list of the TADs can be passed (all the TADs between the first and last one passed will be showed. Thus, passing more than two TADs might be superfluous)
focus (None) – a tuple with the start and end positions of the region to visualize
paint_tads (False) – draw a box around the TADs defined for this experiment
axe (None) – an axe object from matplotlib can be passed in order to customize the picture
show (True) – either to pop-up matplotlib image or not
logarithm (True) – show the logarithm values
normalized (True) – show the normalized data (weights might have been calculated previously). Note: white rows/columns may appear in the matrix displayed; these rows correspond to filtered rows (see pytadbit.utils.hic_filtering.hic_filtering_for_modelling() )
relative (True) – color scale is relative to the whole matrix of data, not only to the region displayed
decorate (True) – draws color bar, title and axes labels
savefig (None) – path to a file where to save the image generated; if None, the image will be shown using matplotlib GUI (the extension of the file name will determine the desired format).
clim (None) – tuple with minimum and maximum value range for color scale. I.e. clim=(-4, 10)
cmap ('jet') – color map from matplotlib. Can also be a preconfigured cmap object.

Load chromosome¶

pytadbit.chromosome.load_chromosome(in_f, fast=2)[source]¶

Load a Chromosome object from a file. A Chromosome object can be saved with the Chromosome.save_chromosome() function.

Parameters

in_f – path to a saved Chromosome object file
fast (2) – if fast=2 do not load the Hi-C data (in the case that they were saved in a separate file see Chromosome.save_chromosome()). If fast is equal to 1, the weights will be skipped from load to save memory. Finally if fast=0, both the weights and Hi-C data will be loaded

Returns

a Chromosome object

TODO: remove first try/except type error… this is loading old experiments

ExperimentList class¶

class pytadbit.chromosome.ExperimentList(thing, crm)[source]¶

Inherited from python built in list(), modified for TADbit pytadbit.Experiment.

Mainly, getitem, setitem, and append were modified in order to be able to search for experiments by index or by name, and to add experiments simply using Chromosome.experiments.append(Experiment).

The whole ExperimentList object is linked to a Chromosome instance (pytadbit.Chromosome).

append(exp)[source]¶: Append object to the end of the list.

Chromosome class¶

Load chromosome¶

ExperimentList class¶

Table of Contents

Previous topic

Next topic