def wordCount(inPath):
    inFile = open(inPath, 'r')
    lineList = inFile.readlines()
    counter = {}
    for line in range(len(lineList)):
        currentLine = lineList[line].rstrip("\n")
        for letter in range(len(currentLine)):
            if currentLine[letter] in counter:
                counter[currentLine[letter]] += 1
            else:
                counter[currentLine[letter]] = 1
    sorted(counter.keys(), key=lambda counter: counter[0])
    for letter in counter:
        print('{:3}{}'.format(letter, counter[letter]))

inPath = "file.txt"
wordCount(inPath)
This is the output:
a 1
k 1
u 1
l 2
12
h 5
T 1
r 4
c 2
d 1
s 5
i 6
o 3
f 2
H 1
A 1
e 10
n 5
x 1
t 5
This is the output I want:
12
A 1
H 1
T 1
a 1
c 2
d 1
e 10
f 2
h 5
i 6
k 1
l 2
n 5
o 3
r 4
s 5
t 5
u 1
x 1
How do I sort the "counter" alphabetically?
I've tried simply sorting by keys and by values, but neither returns them alphabetically with capitals first.
Thank you for your help!
sorted(counter.keys(), key=lambda counter: counter[0])
On its own, this line does nothing: it returns a new sorted list that is never used (unless you retrieve it afterwards via _, which only works in the interactive interpreter).
Unlike a list, which you can sort in place with its .sort() method, a dictionary's keys cannot be sorted in place. What you can do instead is iterate over a sorted version of the keys:
for letter in sorted(counter.keys()):
    print('{:3}{}'.format(letter, counter[letter]))
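The key=lambda counter: counter[0] argument is unnecessary here: your keys are single characters, so sorted() can compare them directly. Strings are ordered by code point, so the space (32) comes before the uppercase letters (65 to 90), which come before the lowercase letters (97 to 122), which is exactly the order you want.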
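A minimal sketch of that fix, run on a short hypothetical string instead of file.txt (letter_counts is an illustrative helper name, not from the question):

```python
def letter_counts(text):
    """Count each character in text, as the question's nested loop does."""
    counter = {}
    for ch in text:
        counter[ch] = counter.get(ch, 0) + 1
    return counter

counter = letter_counts("Hello There")
# sorted() returns a new list of keys; iterate over that list, not the dict itself
for letter in sorted(counter):
    print('{:3}{}'.format(letter, counter[letter]))
```

The space sorts first, then H and T, then the lowercase letters.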
Aside: your whole code could be simplified a great deal using collections.Counter to count the letters.
import collections

c = collections.Counter("This is a Sentence")
for k,v in sorted(c.items()):
    print("{} {}".format(k,v))
Result (including the space character, which sorts first):
3
S 1
T 1
a 1
c 1
e 3
h 1
i 2
n 2
s 2
t 1
This question already has answers here:
Conditional while loop to calculate cumulative sum?
Closed 2 years ago.
I have the following loop, and I would like the first column of the output to go from 1 to 100. Currently it restarts at 1 for each letter, so it only goes from 1 to 20.
I have a list of letters and a defined range of 20.
lista = ['a','b','c','d','e']
intervalo = 20
for i, r in enumerate(lista):
    s = 1
    f = 1
    while f <= intervalo*s:
        print(f, r)
        f+=1
Current output:
1 a
2 a
3 a
4 a
....
1 b
2 b
3 b
4 b
Desired output:
1 a
2 a
3 a
4 a
...
15 a
...
20 a
21 b
22 b
23 b
24 b
....
What about using a nested for loop?
lista = ['a','b','c','d','e']
interval = 20
for i, item in enumerate(lista):
    for j in range(interval):
        print(i*interval + j+1, item)
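The same numbering can also come from a single running counter instead of being computed from i and j. A sketch using itertools.count (numbered is a hypothetical helper name):

```python
import itertools

def numbered(items, interval):
    """Pair each item with `interval` consecutive numbers, continuing across items."""
    counter = itertools.count(1)  # one counter shared by the whole output, starting at 1
    return [(next(counter), item) for item in items for _ in range(interval)]

for number, letter in numbered(['a', 'b', 'c', 'd', 'e'], 20):
    print(number, letter)
```

Because the counter is never reset, 'b' starts at 21, 'c' at 41, and so on up to 100.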
You should initialize f and s once, before the loop, and increment both at the right times: f after each printed line and s after each letter:
lista = ['a','b','c','d','e']
interval = 20
s = 1
f = 1
for i, r in enumerate(lista):
    while f <= interval*s:
        print(f, r)
        f+=1
    s += 1
There are several files like this:
sample_a.txt containing:
a
b
c
sample_b.txt containing:
b
w
e
sample_c.txt containing:
a
m
n
I want to build a presence/absence matrix like this:
          a b c w e m n
sample_a  1 1 1 0 0 0 0
sample_b  0 1 0 1 1 0 0
sample_c  1 0 0 0 0 1 1
I know a crude way to solve it: build a list of all letters that occur in those files, then compare each line of each file against this 'library' and fill in the final matrix by index. But I suspect there is a smarter solution. Any ideas?
Update:
the sample files can have different lengths.
You can try:
import pandas as pd
from collections import defaultdict

dd = defaultdict(list)  # dictionary where each value per key is a list
files = ["sample_a.txt", "sample_b.txt", "sample_c.txt"]
for file in files:
    with open(file, "r") as f:
        for row in f:
            letter = row.strip()  # drop the trailing newline '\n'
            if letter:            # skip blank lines
                dd[file.split(".")[0]].append(letter)
# appending to dictionary dd:
# KEY: file.split(".")[0] is the file name without extension
# VALUE: the line's content without its newline
df = pd.DataFrame.from_dict(dd, orient='index').T.melt()  # convert dictionary to a long-format DataFrame
pd.crosstab(df.variable, df.value)  # make a crosstab, similar to pd.pivot_table
Result (the answerer's test files evidently contained a few extra letters, such as f, o and p):
value a b c e f m n o p w
variable
sample_a 1 1 1 0 0 0 0 0 0 0
sample_b 0 1 0 1 1 0 0 0 0 1
sample_c 1 0 0 0 0 1 1 1 1 0
Note that the letters (columns) come out in alphabetical order.
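If you want the columns in the order the letters first appear (a b c w e m n, as in the question) rather than alphabetically, here is a plain-Python sketch without pandas. It assumes the letters have already been read into a dict keyed by sample name, as dd is in the answer; presence_matrix is a hypothetical helper, and the sample data is the question's:

```python
def presence_matrix(samples):
    """samples maps sample name -> list of letters; returns (columns, rows of 0/1 flags)."""
    columns = []
    for letters in samples.values():
        for letter in letters:
            if letter not in columns:
                columns.append(letter)  # keep first-seen order
    rows = {}
    for name, letters in samples.items():
        present = set(letters)
        rows[name] = [1 if c in present else 0 for c in columns]
    return columns, rows

samples = {
    "sample_a": ["a", "b", "c"],
    "sample_b": ["b", "w", "e"],
    "sample_c": ["a", "m", "n"],
}
columns, rows = presence_matrix(samples)
print('        ', ' '.join(columns))
for name, flags in rows.items():
    print(name, ' '.join(map(str, flags)))
```

Samples of different lengths need no special handling, since each row is built from its own letter set.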
I cannot find a solution for this very specific problem I have.
In essence, I have two lists with two elements each: [A, B] and [1, 2]. I want to create a nested loop that iterates over and expands the second list, and adds each element of the first list after each iteration.
What I want to see in the end is this:
A B
1 A
1 B
2 A
2 B
1 1 A
1 2 A
2 1 A
2 2 A
1 1 B
1 2 B
2 1 B
2 2 B
1 1 1 A
1 1 2 A
...
My problem is that my attempt at doing this recursively splits the A and B apart so that this pattern emerges (note the different first line, too):
A
1 A
2 A
1 1 A
1 2 A
2 1 A
2 2 A
1 1 1 A
1 1 2 A
...
B
1 B
2 B
1 1 B
1 2 B
2 1 B
2 2 B
1 1 1 B
1 1 2 B
...
How do I keep A and B together?
Here is the code:
def second_list(depth):
    if depth < 1:
        yield ''
    else:
        for elements in [' 1 ', ' 2 ']:
            for other_elements in list(second_list(depth-1)):
                yield elements + other_elements

for first_list in [' A ', ' B ']:
    for i in range(0,4):
        temp=second_list(i)
        for temp_list in list(temp):
            print(temp_list + first_list)
I would try something in the following style:
l1 = ['A', 'B']
l2 = ['1', '2']

def expand(l1, l2):
    nl1 = []
    for e in l1:
        for f in l2:
            nl1.append(f+e)
            yield nl1[-1]
    yield from expand(nl1,l2)

for x in expand(l1, l2):
    print(x)
    if len(x) > 5:
        break
Note: the first line of your desired output does not seem to follow the same rule, so it is not generated here; you can add it manually if you want.
Note 2: it would be more elegant not to build a list of the newly generated elements, but then you would have to compute them twice.
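If what you mainly need is to keep A and B together at each length, one option (a sketch using itertools.product, not either poster's code; expand_by_depth is a hypothetical name) is to make the depth loop outermost, so every letter is emitted at one depth before moving on to the next. Note that at depth 1 this yields 1 A, 2 A, 1 B, 2 B, which follows the same rule as the deeper levels but not the depth-1 order listed in the question.

```python
from itertools import product

def expand_by_depth(letters, digits, max_depth):
    """Yield lines depth by depth; within one depth, all prefixes for A come before B."""
    for depth in range(1, max_depth + 1):
        for letter in letters:
            # product(digits, repeat=depth) gives ('1','1'), ('1','2'), ('2','1'), ...
            for prefix in product(digits, repeat=depth):
                yield ' '.join(prefix + (letter,))

for line in expand_by_depth(['A', 'B'], ['1', '2'], 3):
    print(line)
```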
# lines: the list of sequence strings shown below
matrix = []
for index, value in enumerate(['A','C','G','T']):
    matrix.append([])
    matrix[index].append(value + ':')
    for i in range(len(lines[0])):
        total = 0
        for sequence in lines:
            if sequence[i] == value:
                total += 1
        matrix[index].append(total)

unity = ''
for i in range(len(lines[0])):
    column = []
    for row in matrix:
        column.append(row[1:][i])
    maximum = column.index(max(column))
    unity += ['A', 'C', 'G', 'T'][maximum]

print("Unity: " + unity)
for row in matrix:
    print(' '.join(map(str, row)))
OUTPUT:
Unity: GGCTACGC
A: 1 2 0 2 3 2 0 0
C: 0 1 4 2 1 3 2 4
G: 3 3 2 0 1 2 4 1
T: 3 1 1 3 2 0 1 2
With this code I get the matrix above, but I want it laid out like this (one row per position, one column per base):
A C G T
G: 1 0 3 3
G: 2 1 3 1
C: 0 4 2 1
T: 2 2 0 3
A: 3 1 1 2
C: 2 3 2 0
G: 0 2 4 1
C: 0 4 1 2
But I don't know how. I hope someone can help me. Thanks in advance for the answers.
The sequences are:
AGCTACGT
TAGCTAGC
TAGCTACG
GCTAGCGC
TGCTAGCC
GGCTACGT
GTCACGTC
You need to transpose your matrix. I've added comments to the code below explaining what changed to produce the table.
# lines: the list of sequence strings from the question
matrix = []
for index, value in enumerate(['A','C','G','T']):
    matrix.append([])
    # Don't put colons in column headers
    matrix[index].append(value)
    for i in range(len(lines[0])):
        total = 0
        for sequence in lines:
            if sequence[i] == value:
                total += 1
        matrix[index].append(total)

unity = ''
for i in range(len(lines[0])):
    column = []
    for row in matrix:
        column.append(row[1:][i])
    maximum = column.index(max(column))
    unity += ['A', 'C', 'G', 'T'][maximum]

# Transpose matrix
matrix = list(map(list, zip(*matrix)))

# Print header with tabs to make it look pretty
print('\t' + '\t'.join(matrix[0]))

# Print rows in matrix
for row, unit in zip(matrix[1:], unity):
    print(unit + ':\t' + '\t'.join(map(str, row)))
The following will be printed:
A C G T
G: 1 0 3 3
G: 2 1 3 1
C: 0 4 2 1
T: 2 2 0 3
A: 3 1 1 2
C: 2 3 2 0
G: 0 2 4 1
C: 0 4 1 2
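The transpose line works because zip(*matrix) passes each row as a separate argument to zip, which then groups the first elements of every row, then the second elements, and so on. A tiny illustration on a hypothetical two-row matrix:

```python
matrix = [['A', 1, 2],
          ['C', 0, 4]]
# zip(*matrix) is zip(['A', 1, 2], ['C', 0, 4]): it pairs up elements by position
transposed = list(map(list, zip(*matrix)))
print(transposed)  # [['A', 'C'], [1, 0], [2, 4]]
```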
I think the best way is to convert your matrix to a pandas DataFrame and then use its transpose method:
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.transpose.html
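A sketch of that pandas approach, assuming the seven sequences from the question are stored in a list named lines. DataFrame.T transposes the table, and DataFrame.idxmax(axis=1) picks the most frequent base at each position, taking the first column on ties just as column.index(max(column)) does in the question:

```python
import pandas as pd

lines = ["AGCTACGT", "TAGCTAGC", "TAGCTACG", "GCTAGCGC",
         "TGCTAGCC", "GGCTACGT", "GTCACGTC"]
bases = ['A', 'C', 'G', 'T']

# One row per base, one column per position (the question's original layout)
counts = pd.DataFrame(
    [[sum(seq[i] == b for seq in lines) for i in range(len(lines[0]))] for b in bases],
    index=bases,
)
profile = counts.T  # transpose: one row per position, one column per base
# Label each row with its most frequent base, e.g. "G:"
profile.index = [f"{base}:" for base in profile.idxmax(axis=1)]
print(profile)
```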
I have the following data:
1 3 4 2 6 7 8 8 93 23 45 2 0 0 0 1
0 3 4 2 6 7 8 8 90 23 45 2 0 0 0 1
0 3 4 2 6 7 8 6 93 23 45 2 0 0 0 1
-1 3 4 2 6 7 8 8 21 23 45 2 0 0 0 1
-1 3 4 2 6 7 8 8 0 23 45 2 0 0 0 1
The above data is in a file. I want to count the number of 1s, 0s and -1s, but only in the first column. I am reading the file from standard input, and the only way I could think of is this:
import re
import sys

cnt = 0
cnt1 = 0
cnt2 = 0
for line in sys.stdin:
    (t1, <having 15 different variables as that many columns are in files>) = re.split(r"\s+", line.strip())
    if re.match("+1", t1):  # note: "+1" is not a valid regex (re.error: nothing to repeat)
        cnt = cnt + 1
    if re.match("-1", t1):
        cnt1 = cnt1 + 1
    if re.match("0", t1):
        cnt2 = cnt2 + 1
How can I improve this, especially the part with 15 different variables, since that is the only place where I use them?
Use collections.Counter:
from collections import Counter

with open('abc.txt') as f:
    c = Counter(int(line.split(None, 1)[0]) for line in f)
print(c)
Output:
Counter({0: 2, -1: 2, 1: 1})
Here str.split(None, 1) splits the line just once:
>>> s = "1 3 4 2 6 7 8 8 93 23 45 2 0 0 0 1"
>>> s.split(None, 1)
['1', '3 4 2 6 7 8 8 93 23 45 2 0 0 0 1']
NumPy makes it even easier:
>>> import numpy as np
>>> from collections import Counter
>>> Counter(np.loadtxt('abc.txt', usecols=(0,), dtype=int).tolist())
Counter({0: 2, -1: 2, 1: 1})
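split(None, 1) is also more forgiving than splitting on a literal space when the spacing varies. A quick comparison on a hypothetical line with leading and repeated spaces:

```python
s = "  -1   3 4"
# None splits on runs of whitespace and ignores leading whitespace
print(s.split(None, 1))  # ['-1', '3 4']
# ' ' splits on exactly one space, so the leading space produces an empty first field
print(s.split(' ', 1))   # ['', ' -1   3 4']
```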
If you only want the first column, split off only the first column, and use a dictionary to store the count for each value.
import sys

count = dict()
for line in sys.stdin:
    (t1, rest) = line.split(' ', 1)
    try:
        count[t1] += 1
    except KeyError:
        count[t1] = 1
for item in count:
    print('%s occurs %i times' % (item, count[item]))
Instead of using tuple unpacking, where you need a number of variables exactly equal to the number of parts returned by split(), you can just use the first element of those parts:
parts = re.split(r"\s+", line.strip())
t1 = parts[0]
or equivalently, simply
t1 = re.split(r"\s+", line.strip())[0]
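In Python 3 you can also keep tuple unpacking without naming every column by using a starred target, which collects the remaining fields into a list. A minimal sketch, assuming the input is any iterable of lines (count_first_column is a hypothetical name; here an io.StringIO holds the sample data from the question):

```python
import io
from collections import Counter

def count_first_column(lines):
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        t1, *rest = line.split()  # t1 gets the first field, rest collects the other 15
        counts[int(t1)] += 1
    return counts

sample = io.StringIO(
    "1 3 4 2 6 7 8 8 93 23 45 2 0 0 0 1\n"
    "0 3 4 2 6 7 8 8 90 23 45 2 0 0 0 1\n"
    "0 3 4 2 6 7 8 6 93 23 45 2 0 0 0 1\n"
    "-1 3 4 2 6 7 8 8 21 23 45 2 0 0 0 1\n"
    "-1 3 4 2 6 7 8 8 0 23 45 2 0 0 0 1\n"
)
counts = count_first_column(sample)
print(counts[1], counts[0], counts[-1])  # 1 2 2
```

To read standard input instead, call count_first_column(sys.stdin) after importing sys.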
import collections

def countFirstColumn(fileName):
    res = collections.defaultdict(int)
    with open(fileName) as f:
        for line in f:
            key = line.split(" ")[0]
            res[key] += 1
    return res
# f: an open file object (or sys.stdin)
rows = []
for line in f:
    column = line.strip().split(" ")
    rows.append(column)
Then you get a two-dimensional list (a list of rows).
To get the first column:
for row in rows:
    print(row[0])
output:
1
0
0
-1
-1
This is from a script of mine that reads an input file; I checked, and it also works with standard input as the input file:
# someInfile: an open file object, or sys.stdin for standard input
dictionary = {}
for line in someInfile:
    line = line.strip('\n')
    f = line.split()
    # Count in a single pass: sys.stdin (like any file object read to the end)
    # yields nothing if you loop over it a second time
    if f[0] not in dictionary:
        dictionary[f[0]] = 0
    dictionary[f[0]] += 1
print(dictionary)