Parsers

pytadbit.parsers.hic_parser.read_matrix(things, parser=None, hic=True, resolution=1, **kwargs)

Reads and checks a matrix from a file (using pytadbit.parsers.hic_parser.autoreader()) or a list.

Parameters

things – either a file name, a file handler or a list of lists (all of the same length)

parser (None) –

a parser function that returns a tuple of lists representing the data matrix. For example, given this file example.tsv:

chrT_001    chrT_002    chrT_003    chrT_004
chrT_001    629    164    88    105
chrT_002    86    612    175    110
chrT_003    159    216    437    105
chrT_004    100    111    146    278

the output of parser('example.tsv') might be: ([629, 86, 159, 100, 164, 612, 216, 111, 88, 175, 437, 146, 105, 110, 105, 278]), that is, the matrix flattened column by column

resolution (1) – resolution of the matrix
hic (True) – if False, TADbit assumes that the file contains normalized data

Returns

the corresponding matrix concatenated into a single flat list, and the number of rows
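As an illustration of the parser argument described above, here is a minimal sketch of a custom parser. The name column_major_parser is hypothetical, and the assumption that the parser receives an open file handle (rather than a path) is ours; the exact return signature TADbit expects may include more elements than shown here. The sketch only reproduces the flattening shown for example.tsv.

```python
import io

# Content of example.tsv from the text above, tab-separated.
EXAMPLE_TSV = (
    "chrT_001\tchrT_002\tchrT_003\tchrT_004\n"
    "chrT_001\t629\t164\t88\t105\n"
    "chrT_002\t86\t612\t175\t110\n"
    "chrT_003\t159\t216\t437\t105\n"
    "chrT_004\t100\t111\t146\t278\n"
)


def column_major_parser(handle):
    """Hypothetical parser: read a square matrix with a header row and a
    leading row-name column, and return it flattened column by column."""
    handle.readline()  # skip the header row of column names
    rows = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        rows.append([int(value) for value in fields[1:]])  # drop the row name
    size = len(rows)
    if any(len(row) != size for row in rows):
        raise ValueError("matrix is not square")
    # Walk columns first, then rows, to match the order shown in the text.
    flat = [rows[i][j] for j in range(size) for i in range(size)]
    return (flat,)


matrix = column_major_parser(io.StringIO(EXAMPLE_TSV))
```

Here matrix[0] starts with 629, 86, 159, 100, which is the first column of the file.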

pytadbit.parsers.hic_parser.load_hic_data_from_reads(fnam, resolution, **kwargs)

Parameters

fnam – tsv file with reads1 and reads2
resolution – the resolution of the experiment (size of a bin in bases)
genome_seq – a dictionary containing the genomic sequence by chromosome
get_sections (False) – for very high resolutions, when the column index does not fit in memory
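To make the role of resolution concrete, the following sketch bins read pairs into a genome-wide matrix of counts. It is not TADbit's implementation: the function name bin_read_pairs, the input format (pairs of (chromosome, position) tuples) and the rule of one extra bin per chromosome are all assumptions made for illustration.

```python
from collections import Counter


def bin_read_pairs(pairs, chrom_lengths, resolution):
    """Hypothetical helper: count read pairs per (bin1, bin2) cell.

    pairs         -- iterable of ((chrom1, pos1), (chrom2, pos2))
    chrom_lengths -- dict of chromosome name -> length in bases
    resolution    -- size of a bin in bases
    """
    # Offset of the first bin of each chromosome in the genome-wide index.
    sections = {}
    total_bins = 0
    for chrom, length in chrom_lengths.items():
        sections[chrom] = total_bins
        total_bins += length // resolution + 1
    counts = Counter()
    for (chrom1, pos1), (chrom2, pos2) in pairs:
        bin1 = sections[chrom1] + pos1 // resolution
        bin2 = sections[chrom2] + pos2 // resolution
        counts[(bin1, bin2)] += 1
    return counts, total_bins


# Made-up data, for illustration only.
lengths = {"chrA": 2500, "chrB": 1000}
read_pairs = [
    (("chrA", 150), ("chrA", 2100)),
    (("chrA", 999), ("chrA", 2400)),
    (("chrA", 1200), ("chrB", 10)),
]
counts, n_bins = bin_read_pairs(read_pairs, lengths, resolution=1000)
```

Storing only non-empty cells in a dictionary keeps memory use proportional to the number of observed interactions rather than to the square of the number of bins.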

pytadbit.parsers.genome_parser.parse_fasta(f_names, chr_names=None, chr_filter=None, chr_regexp=None, verbose=True, save_cache=True, reload_cache=False, only_length=False)

Parse a list of fasta files, or just one fasta.

WARNING: The order is important

Parameters

f_names – list of paths to files, or just a single path
chr_names (None) – list of chromosome names, or just one. If None is passed, chromosome names are inferred from the fasta headers
chr_filter (None) – use only chromosomes in the input list
chr_regexp (None) – use only chromosomes matching this regular expression
save_cache (True) – save a cached version of this file for faster loading (~4 times faster)
reload_cache (False) – reload cached genome
only_length (False) – return a dictionary with the length of each chromosome, not its sequence

Returns

a sorted dictionary with chromosome names as keys, and sequences as values (sequence in upper case)
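The shape of the returned value can be illustrated with a small stand-alone reader. This is a sketch, not the parse_fasta code: the function read_fasta is hypothetical, it takes an open handle instead of paths, and it ignores caching, filtering and renaming. It only shows a dictionary of chromosome names to upper-case sequences, and the only_length variant.

```python
import io


def read_fasta(handle, only_length=False):
    """Hypothetical reader: map each fasta header (first word) to its
    upper-case sequence, or to its length if only_length is True."""
    genome = {}
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                genome[name] = "".join(chunks).upper()
            words = line[1:].split()
            if not words:
                raise ValueError("empty fasta header")
            name = words[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        genome[name] = "".join(chunks).upper()  # store the last record
    if only_length:
        return {chrom: len(seq) for chrom, seq in genome.items()}
    return genome


# Made-up sequences, for illustration only.
fasta_text = ">chr1 some description\nacgt\nAC\n>chr2\nggg\n"
genome_seq = read_fasta(io.StringIO(fasta_text))
genome_len = read_fasta(io.StringIO(fasta_text), only_length=True)
```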

pytadbit.parsers.sam_parser.parse_sam(f_names1, f_names2=None, out_file1=None, out_file2=None, genome_seq=None, re_name=None, verbose=False, clean=True, mapper=None, **kwargs)

Parse sam/bam file using pysam tools.

Keeps a summary of the results in 2 tab-separated files, each containing 7 columns: read ID, chromosome, position, strand (either 0 or 1), mapped sequence length, position of the closest upstream RE site, position of the closest downstream RE site

Parameters

f_names1 – a list of paths to sam/bam files corresponding to the mapping of read1, can also be just one file
f_names2 – a list of paths to sam/bam files corresponding to the mapping of read2, can also be just one file
out_file1 – path to the output file, in tab-separated format, containing mapped read1 information
out_file2 – path to the output file, in tab-separated format, containing mapped read2 information
genome_seq – a dictionary generated by pytadbit.parsers.genome_parser.parse_fasta(), containing the genomic sequence
re_name – name of the restriction enzyme used
mapper (None) – software used to map (supported are GEM and BOWTIE2). Guessed from file by default.
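The summary files described above could be read back with a few lines of Python. The sketch below assumes the columns appear in the order listed in the description and that any line starting with "#" is a comment; both are assumptions, as are the names MappedRead and read_mapped_reads. The sample line is invented.

```python
import io
from collections import namedtuple

# Field names follow the column list in the description above.
MappedRead = namedtuple(
    "MappedRead", "read_id chrom pos strand length re_up re_down"
)


def read_mapped_reads(handle):
    """Hypothetical reader: yield one MappedRead per data line."""
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # assumed comment or blank line
        read_id, chrom, pos, strand, length, re_up, re_down = (
            line.rstrip("\n").split("\t")
        )
        yield MappedRead(
            read_id, chrom, int(pos), int(strand),
            int(length), int(re_up), int(re_down),
        )


# Invented example line, not real output.
sample = "# assumed comment line\nread_1\tchr1\t1200\t1\t50\t1000\t1500\n"
reads = list(read_mapped_reads(io.StringIO(sample)))
```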

pytadbit.parsers.tad_parser.parse_tads(handler)

Parse a tab-separated value file that contains the list of TADs of a given experiment. This file might have been generated with pytadbit.tadbit.print_result_R() or with the R binding for tadbit

Parameters

handler – path to file
bin_size (1) – resolution of the experiment

Returns

list of TADs and list of weights, each TAD being a dict of type:

{TAD_num: {'start': start,
           'end'  : end,
           'brk'  : end,
           'score': score}}
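A dictionary of this shape can be walked directly. The sketch below uses made-up TADs and a hypothetical helper, tad_sizes, and assumes that 'start' and 'end' are inclusive bin indices; that assumption is not confirmed by the text above.

```python
# Made-up TADs following the structure shown above.
tads = {
    1: {'start': 0, 'end': 24, 'brk': 24, 'score': 8},
    2: {'start': 25, 'end': 67, 'brk': 67, 'score': 5},
}


def tad_sizes(tads, bin_size=1):
    """Hypothetical helper: size of each TAD in bases, assuming
    inclusive start and end bin indices."""
    return {
        num: (tad['end'] - tad['start'] + 1) * bin_size
        for num, tad in tads.items()
    }


sizes = tad_sizes(tads, bin_size=100000)
```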
