TAD detection functions

pytadbit.tadbit.tadbit(x, remove=None, n_cpus=1, verbose=True, max_tad_size='max', no_heuristic=0, use_topdom=False, topdom_window=5, **kwargs)[source]

The TADbit algorithm works on raw chromosome interaction count data. Normalization is neither necessary nor recommended, since the data are assumed to be discrete counts.

TADbit is a breakpoint detection algorithm that returns the optimal segmentation of the chromosome under BIC-penalized likelihood. The model assumes that the counts follow a Poisson distribution and that their expected value decreases as a power law of the linear distance along the chromosome. The expected count at position (i,j) is corrected by the counts at the diagonal positions (i,i) and (j,j). This normalizes for differences in restriction enzyme site density and in the ‘mappability’ of the reads when a bin contains repeated regions.
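As a rough illustration of the model described above, the following Python sketch computes an expected count, the Poisson log-likelihood of one candidate segment, and a BIC score. The function names, the parameters alpha and beta, and the use of sqrt(x[i][i] * x[j][j]) as the diagonal correction are assumptions made for this example. The description above does not give the exact functional form, so this should not be read as TADbit's actual implementation.

```python
import math


def expected_count(i, j, diag, alpha, beta):
    """Expected count at (i, j), i != j, under the assumed model.

    exp(alpha) is a scale factor and beta the power-law exponent
    (both hypothetical parameter names).
    """
    distance = abs(i - j)
    # Assumed form of the correction by the diagonal counts (i,i) and (j,j).
    weight = math.sqrt(diag[i] * diag[j])
    return math.exp(alpha) * distance ** beta * weight


def poisson_loglik(matrix, start, end, alpha, beta):
    """Poisson log-likelihood of the upper-triangle counts of one segment.

    The segment spans bins start..end (inclusive). Bins with a zero on the
    diagonal must be excluded beforehand, otherwise the expected count is 0.
    """
    diag = [matrix[k][k] for k in range(len(matrix))]
    total = 0.0
    for i in range(start, end + 1):
        for j in range(i + 1, end + 1):
            mu = expected_count(i, j, diag, alpha, beta)
            k = matrix[i][j]
            # log P(k | mu) = k*log(mu) - mu - log(k!)
            total += k * math.log(mu) - mu - math.lgamma(k + 1)
    return total


def bic(loglik, n_params, n_obs):
    """Standard BIC: lower values indicate a better penalized fit."""
    return n_params * math.log(n_obs) - 2.0 * loglik
```

A segmentation would then be scored by summing the log-likelihoods of its segments and comparing BIC values across candidate numbers of breakpoints.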

Parameters
  • x – a square matrix of interaction counts from Hi-C data, or a list of such matrices for replicated experiments. The counts must be evenly sampled and not normalized. x may be a list of lists, a path to a file, or a file handle

  • norm ('visibility') – kind of normalization to use. Choose between ‘visibility’ and ‘Imakaev’

  • remove (None) – a Python list of lists of booleans in which True marks a column to remove (if None, only columns with a 0 on the diagonal are removed)

  • n_cpus (1) – the number of CPUs to allocate to TADbit. If n_cpus='max', all available CPUs are used

  • max_tad_size ('max') – an integer defining the maximum size of a TAD. The default ('auto' or 'max') sets it to the number of rows/columns

  • no_heuristic (0) – whether or not to use heuristics

  • use_topdom (False) – whether to use the TopDom algorithm to find TADs (http://www.ncbi.nlm.nih.gov/pubmed/26704975, http://zhoulab.usc.edu/TopDom/)

  • topdom_window (5) – the window size for the TopDom algorithm

  • get_weights (False) – whether to return the weights corresponding to the Hi-C counts (the weights are a normalization that depends on the counts in each column)

Returns

the list() of topologically associated domain boundaries and the corresponding list of associated log-likelihoods. If no weights are given, the calculated weights may also be returned.
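A minimal usage sketch follows, assuming TADbit is installed and importable as pytadbit. The helper names zero_diagonal_mask and run_tadbit and the toy matrix are hypothetical. The helper only illustrates the default removal rule stated for remove=None (columns with a 0 on the diagonal), and the keyword arguments passed to tadbit() are the ones listed in the signature above.

```python
def zero_diagonal_mask(matrix):
    """Return True for each column whose diagonal count is 0.

    This mirrors the rule documented for remove=None; it is an
    illustration, not code taken from TADbit.
    """
    return [matrix[i][i] == 0 for i in range(len(matrix))]


# Toy raw count matrix (list of lists); real Hi-C matrices are much larger.
counts = [
    [10, 4, 0, 1],
    [4, 12, 0, 3],
    [0, 0, 0, 0],
    [1, 3, 0, 9],
]


def run_tadbit(matrix, n_cpus=1):
    """Call tadbit() on raw counts with the default column removal."""
    # Imported here so that the sketch can be read without TADbit installed.
    from pytadbit.tadbit import tadbit
    return tadbit(matrix, remove=None, n_cpus=n_cpus, verbose=False)


# Column 2 would be dropped under the default rule.
print(zero_diagonal_mask(counts))
```

In practice run_tadbit() would be called on a full chromosome matrix; the 4x4 matrix here is only large enough to show the mask.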

pytadbit.tadbit.batch_tadbit(directory, parser=None, **kwargs)[source]

Run tadbit on directories of data files. All files in the specified directory are treated as data files. Non-data files will cause the function either to crash or to produce aberrant results.

Each file must contain the data for a single unit/chromosome. The files can be split into sub-directories corresponding to single experiments, or organized in any other way. Data files that should be treated as replicates must start with the same characters, up to the character sep. For instance, all replicates of the unit ‘chr1’ should start with ‘chr1_’, using the default value of sep.
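The replicate naming rule can be sketched as follows. The functions group_by_unit and list_data_files are hypothetical helpers written for this example, and sep='_' is inferred from the ‘chr1_’ example above rather than taken from the library.

```python
import os
from collections import defaultdict


def group_by_unit(paths, sep="_"):
    """Group file paths by the part of the file name that precedes sep."""
    groups = defaultdict(list)
    for path in paths:
        name = os.path.basename(path)
        # Everything before the first occurrence of sep names the unit.
        unit = name.split(sep, 1)[0]
        groups[unit].append(path)
    return dict(groups)


def list_data_files(directory):
    """Collect every file under directory, including sub-directories."""
    found = []
    for root, _dirs, files in os.walk(directory):
        for name in sorted(files):
            found.append(os.path.join(root, name))
    return found
```

For example, group_by_unit(list_data_files(directory)) would place expA/chr1_rep1.tsv and expB/chr1_rep2.tsv in the same ‘chr1’ group even though they sit in different sub-directories.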

The data files are read through read.delim. You can pass options to read.delim through the list read_options. For instance if the files have no header, use read_options=list(header=FALSE) and if they also have row names, read_options=list(header=FALSE, row.names=1).

Other arguments such as max_tad_size, n_cpus and verbose are passed to tadbit().

NOTE: only used externally, not from Chromosome

Parameters
  • directory – the directory containing the data files

  • kwargs – arguments passed to tadbit() function

  • parser (None) – a parser function that takes a file name as input and returns a tuple representing the data matrix. The tuple is the concatenation of column1 + column2 + column3 + …
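A custom parser could look like the sketch below. The input format (a whitespace-separated square matrix of integers, one row per line, no header) and the names flatten_columns and parse_matrix_file are assumptions for illustration. Only the column-wise concatenation of the returned tuple follows the parameter description above.

```python
def flatten_columns(rows):
    """Concatenate the columns of a square matrix into one flat tuple."""
    size = len(rows)
    if any(len(row) != size for row in rows):
        raise ValueError("the matrix must be square")
    # Outer loop over columns, inner loop over rows: column1 + column2 + ...
    return tuple(rows[i][j] for j in range(size) for i in range(size))


def parse_matrix_file(file_name):
    """Read a whitespace-separated count matrix and flatten it by columns."""
    with open(file_name) as handle:
        rows = [
            [int(value) for value in line.split()]
            for line in handle
            if line.strip()  # skip empty lines
        ]
    return flatten_columns(rows)
```

Such a function would then be passed as parser=parse_matrix_file when calling batch_tadbit().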

Returns

A list() in which each element carries the name of the unit/chromosome and holds the output of tadbit() run on the corresponding files, which are assumed to be replicates

pytadbit.tadbit.print_result_r(result, write=True)[source]

Print a table summarizing the TADs found by tadbit. This function outputs something similar to the R function.

Parameters
  • result – the dict returned by tadbit()

  • write (True) – print the table. If False, return it as a string instead

Returns

if write is False, returns a string corresponding to the table of results
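The write flag follows a common print-or-return pattern, sketched below. The function format_tad_table is a stand-in written for this example, and the dict keys 'start', 'end' and 'score' as well as the column layout are assumptions, since the structure of the result dict is not described here.

```python
def format_tad_table(result, write=True):
    """Build a tab-separated TAD table; print it or return it as a string.

    The keys 'start', 'end' and 'score' are hypothetical.
    """
    lines = ["#\tstart\tend\tscore"]
    rows = zip(result["start"], result["end"], result["score"])
    for number, (start, end, score) in enumerate(rows, 1):
        lines.append(f"{number}\t{start}\t{end}\t{score}")
    table = "\n".join(lines)
    if write:
        print(table)
        return None
    return table
```

Calling it with write=False is convenient when the table should be saved to a file or embedded in a report rather than sent to standard output.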