Parsers
- pytadbit.parsers.hic_parser.read_matrix(things, parser=None, hic=True, resolution=1, **kwargs)
Reads and checks a matrix from a file (using pytadbit.parsers.hic_parser.autoreader()) or a list.
- Parameters
things – might be either a file name, a file handler or a list of lists (all with the same length)
parser (None) –
a parser function that returns a tuple of lists representing the data matrix. With this file, example.tsv:
          chrT_001  chrT_002  chrT_003  chrT_004
chrT_001  629       164       88        105
chrT_002  86        612       175       110
chrT_003  159       216       437       105
chrT_004  100       111       146       278
the output of parser('example.tsv') might be:
([629, 86, 159, 100, 164, 612, 216, 111, 88, 175, 437, 146, 105, 110, 105, 278])
resolution (1) – resolution of the matrix
hic (True) – if False, TADbit assumes that the file contains normalized data
- Returns
the corresponding matrix concatenated into a single long list; also returns the number of rows
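The example output above lists the values of example.tsv column by column. The following is a minimal sketch of a custom parser that reproduces that list. The function name `column_major_parser` is hypothetical and not part of pytadbit, and the sketch reads from a file handler rather than a file name; whether read_matrix expects anything beyond what the parser description shows is not confirmed here.

```python
import io

# The example.tsv content from the parameter description, tab-separated.
EXAMPLE_TSV = (
    "chrT_001\tchrT_002\tchrT_003\tchrT_004\n"
    "chrT_001\t629\t164\t88\t105\n"
    "chrT_002\t86\t612\t175\t110\n"
    "chrT_003\t159\t216\t437\t105\n"
    "chrT_004\t100\t111\t146\t278\n"
)


def column_major_parser(handle):
    """Hypothetical parser: flatten a labelled square matrix column by column."""
    header = handle.readline().split()
    rows = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # The first field is the row label, the rest are the counts.
        rows.append([int(value) for value in fields[1:]])
    size = len(header)
    if len(rows) != size or any(len(row) != size for row in rows):
        raise ValueError("matrix is not square or does not match its header")
    # Walk the columns first, then the rows inside each column.
    return [rows[i][j] for j in range(size) for i in range(size)]


flat = column_major_parser(io.StringIO(EXAMPLE_TSV))
print(flat)
```

The square-matrix check mirrors the "all with the same length" requirement stated for list input.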
- pytadbit.parsers.hic_parser.load_hic_data_from_reads(fnam, resolution, **kwargs)
- Parameters
fnam – tsv file with reads1 and reads2
resolution – the resolution of the experiment (size of a bin in bases)
genome_seq – a dictionary containing the genomic sequence by chromosome
get_sections (False) – for very high resolution, when the column index does not fit in memory
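To make the role of resolution concrete, here is a rough sketch of how pairs of read positions can be assigned to bins of a genome-wide matrix. It is an illustration only, not the implementation used by load_hic_data_from_reads: the function `bin_read_pairs`, the tuple layout of the input pairs, and the use of chromosome lengths in place of genome_seq are all assumptions.

```python
from collections import Counter


def bin_read_pairs(pairs, chrom_lengths, resolution):
    """Count read pairs per (bin1, bin2) cell of a genome-wide matrix.

    pairs: iterable of (chrom1, pos1, chrom2, pos2) tuples (assumed layout).
    chrom_lengths: dict of chromosome name -> length in bases.
    resolution: size of a bin in bases.
    """
    # Each chromosome starts at an offset equal to the number of bins
    # occupied by the chromosomes that precede it.
    offsets = {}
    total_bins = 0
    for chrom, length in chrom_lengths.items():
        offsets[chrom] = total_bins
        total_bins += length // resolution + 1

    counts = Counter()
    for chrom1, pos1, chrom2, pos2 in pairs:
        bin1 = offsets[chrom1] + pos1 // resolution
        bin2 = offsets[chrom2] + pos2 // resolution
        counts[(bin1, bin2)] += 1
    return counts, total_bins


counts, total_bins = bin_read_pairs(
    [("chr1", 10, "chr1", 220), ("chr1", 15, "chr1", 205), ("chr1", 150, "chr2", 30)],
    {"chr1": 250, "chr2": 120},
    resolution=100,
)
print(total_bins, dict(counts))
```

Storing only the non-empty cells in a counter keeps memory use proportional to the number of observed interactions rather than to the square of the number of bins.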
- pytadbit.parsers.genome_parser.parse_fasta(f_names, chr_names=None, chr_filter=None, chr_regexp=None, verbose=True, save_cache=True, reload_cache=False, only_length=False)
Parse a list of fasta files, or just one fasta.
WARNING: The order is important
- Parameters
f_names – list of paths to files, or just a single path
chr_names (None) – pass a list of chromosome names, or just one. If None is passed, chromosome names will be inferred from the fasta headers
chr_filter (None) – use only the chromosomes in the input list
chr_regexp (None) – use only the chromosomes matching this regular expression
save_cache (True) – save a cached version of this file for faster loading (~4 times faster)
reload_cache (False) – reload cached genome
only_length (False) – returns a dictionary with the length of the genome, not the sequence
- Returns
a sorted dictionary with chromosome names as keys, and sequences as values (sequence in upper case)
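The shape of the returned value can be illustrated with a small stand-alone FASTA reader. This is a sketch under stated assumptions, not the code of parse_fasta: the function `read_fasta` is hypothetical, it takes a file handler instead of paths, it names each chromosome after the first word of its header, and it keeps the order in which sequences appear in the input.

```python
import io
from collections import OrderedDict


def read_fasta(handle, only_length=False):
    """Hypothetical reader: chromosome name -> upper-case sequence (or length)."""
    genome = OrderedDict()
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                genome[name] = "".join(chunks).upper()
            name = line[1:].split()[0]  # assumes a non-empty header
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        genome[name] = "".join(chunks).upper()
    if only_length:
        return OrderedDict((chrom, len(seq)) for chrom, seq in genome.items())
    return genome


FASTA = ">chr1 test chromosome\nacgt\nACGTn\n>chr2\nggcc\n"
genome = read_fasta(io.StringIO(FASTA))
lengths = read_fasta(io.StringIO(FASTA), only_length=True)
print(genome)
print(lengths)
```

Because the order of chromosomes is preserved, passing the input files in a different order changes the order of the keys, which is why the warning about order matters.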
- pytadbit.parsers.sam_parser.parse_sam(f_names1, f_names2=None, out_file1=None, out_file2=None, genome_seq=None, re_name=None, verbose=False, clean=True, mapper=None, **kwargs)
Parse sam/bam file using pysam tools.
- Keep a summary of the results in 2 tab-separated files that will contain 7
columns: read ID, chromosome, position, strand (either 0 or 1), mapped sequence length, position of the closest upstream RE site, position of the closest downstream RE site
- Parameters
f_names1 – a list of paths to sam/bam files corresponding to the mapping of read1, can also be just one file
f_names2 – a list of paths to sam/bam files corresponding to the mapping of read2, can also be just one file
out_file1 – path to the output file, in tab-separated format, containing mapped read1 information
out_file2 – path to the output file, in tab-separated format, containing mapped read2 information
genome_seq – a dictionary generated by pytadbit.parsers.genome_parser.parse_fasta(), containing the genomic sequence
re_name – name of the restriction enzyme used
mapper (None) – software used to map (supported are GEM and BOWTIE2). Guessed from file by default.
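As an illustration of the output layout, the sketch below reads a summary file with the columns listed above into named records. The names `MappedRead` and `read_summary` are hypothetical, the integer types are assumed from the column descriptions, and skipping lines that start with `#` is an assumption about possible header lines rather than documented behaviour.

```python
import io
from collections import namedtuple

# One record per line, in the column order given in the description above.
MappedRead = namedtuple(
    "MappedRead", "read_id chrom pos strand length re_up re_down"
)


def read_summary(handle):
    """Hypothetical reader for the tab-separated summary of mapped reads."""
    reads = []
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # assumed: comment/header lines start with '#'
        read_id, chrom, pos, strand, length, re_up, re_down = (
            line.rstrip("\n").split("\t")
        )
        reads.append(
            MappedRead(
                read_id, chrom, int(pos), int(strand),
                int(length), int(re_up), int(re_down),
            )
        )
    return reads


SUMMARY = (
    "# example content, invented for illustration\n"
    "read_001\tchr1\t1200\t1\t50\t1000\t1500\n"
    "read_002\tchr2\t340\t0\t48\t300\t900\n"
)
reads = read_summary(io.StringIO(SUMMARY))
print(reads[0])
```

Unpacking into exactly seven names makes the reader fail loudly on lines with a different number of fields.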
- pytadbit.parsers.tad_parser.parse_tads(handler)
Parse a tab-separated value file that contains the list of TADs of a given experiment. This file might have been generated with
pytadbit.tadbit.print_result_R()
or with the R binding for tadbit.
- Parameters
handler – path to file
bin_size (1) – resolution of the experiment
- Returns
list of TADs and list of weights, each TAD being a dict of type:
{TAD_num: {'start': start, 'end' : end, 'brk' : end, 'score': score}}
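A short sketch of how the returned dictionary of TADs might be consumed, here to compute the size of each TAD in bases. The helper `tad_sizes` is hypothetical, the example values are invented, and treating 'end' as an inclusive bin index is an assumption that should be checked against real output.

```python
def tad_sizes(tads, bin_size=1):
    """Return {TAD_num: size in bases}, assuming 'end' is an inclusive bin index."""
    sizes = {}
    for num in sorted(tads):
        tad = tads[num]
        sizes[num] = (tad["end"] - tad["start"] + 1) * bin_size
    return sizes


# Invented example following the documented structure.
example_tads = {
    1: {"start": 0, "end": 9, "brk": 9, "score": 8.0},
    2: {"start": 10, "end": 24, "brk": 24, "score": 5.0},
}
sizes = tad_sizes(example_tads, bin_size=100000)
print(sizes)
```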