python - Dinucleotide count skips repeats


I'm working on finding the dinucleotide count in a .txt file. The sample data set I'm using is 'ssss'. The code below is what I'm running as of now.

import os
# Read the whole file into a single string.
stseq = open(os.path.expanduser("/users/mitch_whitaker/desktop/a5 count.txt"))
lines = stseq.read()
mystr = '\t'.join([line.strip() for line in lines])
all_counts = []
# Try every two-letter combination of the bases.
for base1 in ['s', 't']:
    for base2 in ['s', 't']:
        dinucleotide = base1 + base2
        # str.count() only counts non-overlapping occurrences.
        count = lines.count(dinucleotide)
        print("count " + str(count) + " " + dinucleotide)
        all_counts.append(count)
print(all_counts)

It returned an 'ss' count of 2 when in reality it should be 3. Can someone help me figure out a solution to the skipping that occurs while counting the characters?

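Your problem here might be that overlapping substrings are not being counted. I'm assuming the substring sss should be counted as 2 instances of the dinucleotide ss? The count() method you used would return 1, because str.count() only counts non-overlapping occurrences. If that is indeed the problem, you will have to design your own count method.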
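One way such a count method might look is sketched below. It is only an illustration, not the asker's code: the name count_overlapping is made up for this example, and it assumes the file contents have already been read into a string. It moves the search start forward by one character after each match, so overlapping matches are included.

```python
def count_overlapping(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones.

    Hypothetical helper for illustration; not part of the original post.
    """
    if not pattern:
        return 0
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        # Step forward by one character only, so a match that overlaps
        # the previous one is still found.
        start = text.find(pattern, start + 1)
    return count


sample = "ssss"
print(sample.count("ss"))               # 2 (non-overlapping)
print(count_overlapping(sample, "ss"))  # 3 (overlapping)
```

In the original loop, replacing lines.count(dinucleotide) with count_overlapping(lines, dinucleotide) should then report 3 for 'ss' on the sample data.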

