Python: Count frequency in a list and align the result [duplicate]

This question already has answers here:
Create nice column output in python
(22 answers)
Closed 3 years ago.
I have a list of random words (strings) and I need to count the frequency of each word.
a = ['a','ccc','bb','ccc','a','ccc','bb','bb','a','bb']
I want to do this in a loop so that the output is:
1 a 3
2 bb 4
3 ccc 3
The index should be right-aligned with 4 characters on its left, the list elements left-aligned with 5 characters on their left, and the frequency right-aligned, as shown above.
I know how to count the frequency, but I don't know how to arrange the output:
total_word = {}
for word in clear_word:
    if word not in total_word:
        total_word[word] = 0
    total_word[word] += 1

There are at least two efficient ways:
from collections import Counter, defaultdict
a = ['a','ccc','bb','ccc','a','ccc','bb','bb','a','bb']
# method 1:
d = defaultdict(int)
for elem in a:
    d[elem] += 1
for ctr, k in enumerate(sorted(d), start=1):
    print(ctr, k, '\t', d[k])
# method 2:
d = Counter(a)
for ctr, k in enumerate(sorted(d), start=1):
    print(ctr, k, '\t', d[k])
Output:
1 a 3
2 bb 4
3 ccc 3
EDIT:
Here you go:
a = ['a','ccc','bb','ccc','a','ccc','bb','bb','a','bb']
unique = sorted(set(a))
for ctr, i in enumerate(unique, start=1):
    print(ctr, i, '\t', a.count(i))
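To get the exact column widths asked for in the question, the alignment can go straight into the format spec instead of relying on a tab. A minimal sketch, assuming a 4-character right-aligned index, a 5-character left-aligned word and a 3-character right-aligned count (adjust the widths as needed); `frequency_table` is a hypothetical helper name:

```python
from collections import Counter

def frequency_table(words):
    """Return one formatted line per distinct word, sorted alphabetically."""
    counts = Counter(words)
    lines = []
    for idx, word in enumerate(sorted(counts), start=1):
        # >4: index right-aligned in 4 chars; <5: word left-aligned in 5 chars;
        # >3: count right-aligned in 3 chars (all widths are assumptions).
        lines.append(f"{idx:>4} {word:<5}{counts[word]:>3}")
    return lines

a = ['a', 'ccc', 'bb', 'ccc', 'a', 'ccc', 'bb', 'bb', 'a', 'bb']
for line in frequency_table(a):
    print(line)
```

Because every field has a fixed width, each line comes out the same length regardless of the word.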

Try this, using collections.Counter
from collections import Counter
i = 1
for k, v in Counter(a).items():
    print(f"{i:<3} {k:<10} {v}")
    i += 1
Output:
1 a 3
2 ccc 3
3 bb 4

If, like me, you occasionally have to work with Python older than 2.6, you can use old-school %-formatting:
print("%3s %-10s %s" % (i, the_word, count))
Here:
%3s occupies 3 characters and right-aligns the text
%-10s occupies 10 characters and left-aligns the text (that is what the minus sign does)
This %-formatting works in every Python version.
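A quick way to see which side each %-spec pads on is to print the fields between brackets. A small sketch that uses the rows from the question as sample data (`rows` is illustrative):

```python
rows = [(1, "a", 3), (2, "bb", 4), (3, "ccc", 3)]

lines = []
for i, the_word, count in rows:
    # %3s pads on the left (right-aligned); %-10s pads on the right (left-aligned).
    lines.append("[%3s] [%-10s] [%s]" % (i, the_word, count))

for line in lines:
    print(line)
```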

If it is only about string formatting, you can build the string like this:
arr = ['a', 'bb', 'ccc']
for i in range(len(arr)):
    print('{} {:4} {}'.format(i, arr[i], i+5))
I am using this site as a resource for string formatting https://pyformat.info/#string_pad_align
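The `{:4}` in the snippet above relies on default alignment: strings are left-aligned and numbers right-aligned unless an explicit `<`, `>` or `^` is given. A short sketch of the variants (the sample values are illustrative):

```python
word, num = "bb", 5

left = '[{:<6}]'.format(word)         # explicit left alignment
right = '[{:>6}]'.format(word)        # explicit right alignment
center = '[{:^6}]'.format(word)       # centered
filled = '[{:*>6}]'.format(word)      # right-aligned with '*' as the fill character
default_str = '[{:4}]'.format(word)   # strings default to left alignment
default_num = '[{:4}]'.format(num)    # numbers default to right alignment

for s in (left, right, center, filled, default_str, default_num):
    print(s)
```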

I guess this is what you want. Try this:
clear_word = ['a','ccc','bb','ccc','a','ccc','bb','bb','a','bb','bb','bb','bb','bb','bb','bb','bb','bb','bb']
total_word = {}
for word in clear_word:
    if word not in total_word:
        total_word[word] = 0
    total_word[word] += 1
for i, k in enumerate(total_word):
    print(" {0:2} {1:3} {2:5}".format(i, k, total_word[k]))
That is the output:
[image: aligned output]

Based on your code, this loop displays your results:
for a in total_word.keys():
    print(a, '\t', total_word[a])
This is my output:
a 3
bb 4
ccc 3

You can align your columns using f-strings (see here to understand the :>{padding}):
padding = max(len(word) for word in total_word.keys()) + 1
for word, count in total_word.items():
    print(f"{word:<{padding}}{count:>3}")
If you want the index as well, add enumerate:
for idx, (word, count) in enumerate(total_word.items()):
    print(f"{idx:<3}{word:<{padding}}{count:>3}")
Putting it all together:
clear_word = ['a'] * 3 + ['ccc'] * 4 + ['bb'] * 10
total_word = {}
for word in clear_word:
    if word not in total_word:
        total_word[word] = 0
    total_word[word] += 1
padding = max(len(word) for word in total_word.keys()) + 1
for idx, (word, count) in enumerate(total_word.items()):
    print(f"{idx:<3}{word:<{padding}}{count:>3}")
Your output is:
0 a 3
1 ccc 4
2 bb 10
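If you would rather list the most frequent words first, `collections.Counter.most_common()` returns the (word, count) pairs sorted by count, and the width of the count column can be derived the same way as `padding`. A sketch along those lines (`count_width` is an illustrative name):

```python
from collections import Counter

clear_word = ['a'] * 3 + ['ccc'] * 4 + ['bb'] * 10
counts = Counter(clear_word)

# Word column: longest word plus one space; count column: widest count.
padding = max(len(word) for word in counts) + 1
count_width = len(str(max(counts.values())))

lines = [
    f"{idx:<3}{word:<{padding}}{count:>{count_width}}"
    for idx, (word, count) in enumerate(counts.most_common())
]
print("\n".join(lines))
```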

Related

How to access the elements of a list that is a pandas object

I have a list of characters (seqMut2), which is a pandas Series taken from a DataFrame. I try to iterate over it like a normal list to retrieve the positions of the elements that are not spaces, with this code:
index2 = chDeux[chDeux['allele'] == y].index.values
index3 = chTrois[chTrois['allele'] == x].index.values
list_chDeux = [chDeux.loc[index2, 'chaincode'], chDeux.loc[index2, 'allele'],chDeux.loc[index2, 'sequencegaps'], chDeux.loc[index2, 'sequencegapsalidiff']]
list_chTrois = [chTrois.loc[index3, 'chaincode'], chTrois.loc[index3, 'allele'],chTrois.loc[index3, 'sequencegaps'], chTrois.loc[index3, 'sequencegapsalidiff']]
seqG2 = list_chDeux[2].str.split(pat='')
seqG3 = list_chTrois[2].str.split(pat='')
seqMut2 = list_chDeux[3].str.split(pat='')
seqMut3 = list_chTrois[3].str.split(pat='')
for j in seqMut2:
    if j != " ":
        print(j)
        pos = seqMut2.index(j)
        print(pos)
but print(j) shows that each j is the whole list. When I try the same thing with a normal list (built manually, without a DataFrame), I get the right result:
seq = "   M M"
chars = list(seq)
for pos, j in enumerate(chars):
    if j != " ":
        print(j)
        print(pos)
Result:
j = M and pos = 3
j = M and pos = 5
You can filter for the rows where the value is not a space ' ':
import pandas as pd
df = pd.DataFrame({'a':list('my name is')})
a
0 m
1 y
2
3 n
4 a
5 m
6 e
7
8 i
9 s
# Keep only the rows whose value is not a single space
print(df[df['a'].ne(' ')])
Output:
a
0 m
1 y
3 n
4 a
5 m
6 e
8 i
9 s
Or, if the whitespace varies (e.g. one, two or three spaces), you can use the str methods of the pandas Series, which yield the same result:
print(df[df['a'].str.contains(r'\S+')])
You can use a simpler method with a list comprehension and plain Python:
result = [ch for ch in seq.split(" ") if ch != '']
If you have more special characters, you can add more filters afterwards.
UPDATE: to get the elements' positions when there are one or several spaces between your elements:
elements = [a for a in seq.split(" ") if a != '']
positions = [seq.split(" ").index(el) for el in elements]
Then you can create a dictionary with the elements and their positions:
dict_pos = {el:pos for (el,pos) in zip(elements, positions)}
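Note that `list.index` (like the `seq.split(" ").index(el)` call above) always returns the position of the first match, so repeated elements such as the two `M`s end up with the same position. Pairing each element with its position via `enumerate` avoids that. A minimal sketch on a plain string, assuming `seq` has three leading spaces so that the positions match the 3 and 5 from the question (`non_space_positions` is a hypothetical helper):

```python
def non_space_positions(seq):
    """Return (position, character) pairs for every character that is not a space."""
    return [(pos, ch) for pos, ch in enumerate(seq) if ch != " "]

seq = "   M M"  # assumed: three leading spaces, then M, a space, and M
positions = non_space_positions(seq)
print(positions)
```

For a Series whose values are single characters (like `df['a']` above), the equivalent is to keep the index of the filtered rows, e.g. `s[s != " "].index`.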

Assigning a value to specific words in a DataFrame in Python

I have a DataFrame consisting of 7989 rows × 1 column.
Each row describes the consequences of a maritime piracy attack.
I want to assign a value to each row depending on whether it contains a word from one of the lists below. The assigned value depends on which list the word belongs to.
The 6 lists:
five =['kill','execute','dead']
four =['kidnap','hostag','taken','abduct']
three =['injur','wound','assault']
two =['captur','hijack']
one =['stolen','damage','threaten','robber','destroy']
zero =['alarm','no','none']
I have tried it like this:
df['five']=df.apply(lambda x: '5' if x == 'five' else '-')
where df is my DataFrame.
Can anyone help?
You can create a dictionary for each list that maps each word to its value, merge all the dictionaries together, and then set the new columns with numpy.where:
import numpy as np
import pandas as pd
df = pd.DataFrame({'outcom':[['kill','dead'],['abduct','aaaa'],['hostag']]})
#same way add another lists
five = ['kill','execute','dead']
four = ['kidnap','hostag','taken','abduct']
three =['injur','wound','assault']
two =['captur','hijack']
one =['stolen','damage','threaten','robber','destroy']
zero =['alarm','no','none']
#same way add another dicts
d5 = dict.fromkeys(five, '5')
d4 = dict.fromkeys(four, '4')
d3 = dict.fromkeys(three, '3')
d2 = dict.fromkeys(two, '2')
d1 = dict.fromkeys(one, '1')
d0 = dict.fromkeys(zero, '0')
d = {**d5, **d4, **d3, **d2, **d1, **d0}
print (d)
for k, v in d.items():
    df[k] = np.where(df['outcom'].apply(lambda x: k in x), v, '-')
print (df)
outcom kill execute dead kidnap hostag taken abduct
0 [kill, dead] 5 - 5 - - - -
1 [abduct, aaaa] - - - - - - 4
2 [hostag] - - - - 4 - -
Edited
You can use the loc indexer (documentation) like so:
import pandas as pd
five = ["I", "like"]
df = pd.DataFrame(["I", "like", "bacon", "in", "the", "morning"], columns=["Words"])
Words
0 I
1 like
2 bacon
3 in
4 the
5 morning
df["New"] = df["Words"].copy()
df.loc[df["New"] == "I", "New"] = 5
Words New
0 I 5
1 like like
2 bacon bacon
3 in in
4 the the
5 morning morning
You can then use a for loop to do this for every word in each list.
Thank you all for the help. I think I found a way to make it work:
list_of_words = zero + one + two + three + four + five
outcome_refined = df_Stop2['outcome'].apply(lambda x: [item for item in x if item in list_of_words])
outcome_numbered = []  # Create an empty list
def max_val(values):  # Ensures that we only keep the largest value
    maximum_value = 0
    for i in values:
        if i > maximum_value:
            maximum_value = i
    return [maximum_value]
# Loop through each of the lists
for words in outcome_refined:
    tmp = []  # Create a temporary empty list
    for word in words:
        if word in zero:
            word = 0
        elif word in one:
            word = 1
        elif word in two:
            word = 2
        elif word in three:
            word = 3
        elif word in four:
            word = 4
        elif word in five:
            word = 5
        tmp.append(word)
    tmp = max_val(tmp)
    outcome_numbered.append(tmp)
df_Stop['outcome_numbered'] = outcome_numbered.copy()
df_Stop
Finally, it works.
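The if/elif chain and `max_val` can also be collapsed into a single word-to-score dictionary plus the built-in `max`. A sketch, assuming (as in the solution above) that each row's outcome is a list of already-stemmed words that are matched exactly; `SCORES` and `outcome_score` are hypothetical names:

```python
# The six keyword lists from the question.
five = ['kill', 'execute', 'dead']
four = ['kidnap', 'hostag', 'taken', 'abduct']
three = ['injur', 'wound', 'assault']
two = ['captur', 'hijack']
one = ['stolen', 'damage', 'threaten', 'robber', 'destroy']
zero = ['alarm', 'no', 'none']

# Map every keyword to the score of the list it belongs to (zero -> 0, ..., five -> 5).
SCORES = {}
for score, words in enumerate([zero, one, two, three, four, five]):
    for word in words:
        SCORES[word] = score

def outcome_score(words, default=0):
    """Highest score among the words found in SCORES, or `default` if none match."""
    return max((SCORES[w] for w in words if w in SCORES), default=default)

print(outcome_score(['kill', 'dead']))
print(outcome_score(['abduct', 'aaaa']))
print(outcome_score(['aaaa']))
```

With pandas this could be applied per row, e.g. `df_Stop2['outcome'].apply(outcome_score)`. Unlike `max_val`, it returns a plain number rather than a one-element list; rows without any matching word get `default` (0 here, to mirror `max_val`).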

Count runs of repeated digits in a number

I have the number 444333113333 and I want to count each run of repeated digits in it:
4 is 3 times
3 is 3 times
1 is 2 times
3 is 4 times
What I am trying to do is write a script that translates phone keypad taps to letters,
like in this image: https://www.dcode.fr/tools/phone-keypad/images/keypad.png
If I press the 2 key three times, the letter is 'C'.
I want to write this script in Python, but I can't get it to work.
Using regex
import re
pattern = r"(\d)\1*"
text = '444333113333'
matcher = re.compile(pattern)
tokens = [match.group() for match in matcher.finditer(text)] #['444', '333', '11', '3333']
for token in tokens:
    print(token[0] + ' is ' + str(len(token)) + ' times')
Output
4 is 3 times
3 is 3 times
1 is 2 times
3 is 4 times
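If you need the counts as data rather than printed text, the same pattern can return (digit, count) pairs directly: in `(\d)\1*`, group 1 is the digit and the whole match is the run. A small sketch (`digit_runs` is a hypothetical helper):

```python
import re

def digit_runs(text):
    """Return (digit, run_length) pairs for each run of identical digits."""
    return [(m.group(1), len(m.group(0))) for m in re.finditer(r"(\d)\1*", text)]

for digit, count in digit_runs("444333113333"):
    print(f"{digit} is {count} times")
```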
You can use itertools.groupby
num = 444333113333
numstr = str(num)
import itertools
for c, cgroup in itertools.groupby(numstr):
    print(f"{c} count = {len(list(cgroup))}")
Output:
4 count = 3
3 count = 3
1 count = 2
3 count = 4
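For the keypad translation the question is actually after, each run produced by `groupby` can be turned into a letter. This is a sketch under assumptions: it uses the standard phone layout (2 = abc, 3 = def, ..., 7 = pqrs, 9 = wxyz), which may differ from the linked image; it treats a space as a pause so that the same key can be used for two letters in a row; and it wraps around when a key is pressed more times than it has letters. Keys without letters (such as 1 and 0) raise a `KeyError` here. `KEYPAD` and `taps_to_text` are hypothetical names.

```python
from itertools import groupby

# Assumed letter layout (standard phone keypad); adjust it to match your image.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}

def taps_to_text(taps):
    """Translate runs of repeated key presses into letters.

    A space acts as a pause, so the same key can be used twice in a row.
    """
    letters = []
    for key, run in groupby(taps):
        if key == " ":
            continue  # pause between two letters on the same key
        presses = len(list(run))
        options = KEYPAD[key]
        # Pressing past the last letter wraps around to the first one.
        letters.append(options[(presses - 1) % len(options)])
    return "".join(letters)

print(taps_to_text("222"))
print(taps_to_text("4433555 555666"))
```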
Will this do the trick?
The function returns a 2D list with each digit and how many times it appears in a row. You can then loop through that list to get all of the values:
def count_digits(num):
    # Make sure num is a string, and append a sentinel space: the loop only
    # records a run when the character changes, so without the space the
    # last run of digits would never be recorded. The space itself is never counted.
    # (There is probably a cleaner way, but this works.)
    num = str(num) + " "
    quantity = []
    prev_char = num[0]
    count = 0
    for i in num:
        if i != prev_char:
            # The run of prev_char has ended: record it and start a new run.
            quantity.append([prev_char, count])
            count = 1
            prev_char = i
        else:
            count = count + 1
    return quantity
num = 444333113333
quantity = count_digits(num)
for i in quantity:
    print(str(i[0]) + " is " + str(i[1]) + " times")
Output:
4 is 3 times
3 is 3 times
1 is 2 times
3 is 4 times

How to remove the first two-digit number from a string in Python

I have 2 strings:
"SP-1-15::PROVPEC=NTK555EA,CTYPE=\"SP-2\",PEC=NTK555EA,REL= 1 ,CLEI=,SER=NNTM,MDAT=UNKNOWN,AGE=00-005-20-03,ONSC=00-005-19-50:IS-ANR,FLT"
"SP-1-16::PROVPEC=NTK555FA,CTYPE=\"SP-2 Dual CPU\",PEC=NTK555FA,REL= 1 ,CLEI=,SER=NNTM,MDAT=UNKNOWN,AGE=UNKNOWN,ONSC=UNKNOWN:IS-ANR,FLT"
I want 2 things:
If WRK is in the string, remove 15 and 16 from SP-1-15 and SP-1-16, respectively.
If WRK is not in the string, remove only the odd value, which in this case is 15.
This can be done with re.search and re.sub. To check for WRK in both strings at once, I joined them with ''.join():
s = ["SP-1-15::PROVPEC=NTK555EA,CTYPE=\"SP-2\",PEC=NTK555EA,REL= 1 ,CLEI=,SER=NNTM,MDAT=UNKNOWN,AGE=00-005-20-03,ONSC=00-005-19-50:IS-ANR,FLT", "SP-1-16::PROVPEC=NTK555FA,CTYPE=\"SP-2 Dual CPU\",PEC=NTK555FA,REL= 1 ,CLEI=,SER=NNTM,MDAT=UNKNOWN,AGE=UNKNOWN,ONSC=UNKNOWN:IS-ANR,FLT"]
import re
j = ''.join(s)
find_sp = re.compile(r'\d+::', re.I)
for idx, item in enumerate(s):
    if 'WRK' in j:
        s[idx] = re.sub(r'\d+::', '::', item)
    else:
        num = find_sp.search(item)
        x = num.group(0).strip('::')
        if int(x) % 2:
            s[idx] = re.sub(r'\d+::', '::', item)
print(s)
['SP-1-::PROVPEC=NTK555EA,CTYPE="SP-2",PEC=NTK555EA,REL= 1 ,CLEI=,SER=NNTM,MDAT=UNKNOWN,AGE=00-005-20-03,ONSC=00-005-19-50:IS-ANR,FLT', 'SP-1-16::PROVPEC=NTK555FA,CTYPE="SP-2 Dual CPU",PEC=NTK555FA,REL= 1 ,CLEI=,SER=NNTM,MDAT=UNKNOWN,AGE=UNKNOWN,ONSC=UNKNOWN:IS-ANR,FLT']
And here is the output with WRK in one of the strings:
(xenial)vash#localhost:~/python/stack_overflow$ python3.7 strings.py
['SP-1-::PROVPEC=WRKNTK555EA,CTYPE="SP-2",PEC=NTK555EA,REL= 1 ,CLEI=,SER=NNTM,MDAT=UNKNOWN,AGE=00-005-20-03,ONSC=00-005-19-50:IS-ANR,FLT', 'SP-1-::PROVPEC=NTK555FA,CTYPE="SP-2 Dual CPU",PEC=NTK555FA,REL= 1 ,CLEI=,SER=NNTM,MDAT=UNKNOWN,AGE=UNKNOWN,ONSC=UNKNOWN:IS-ANR,FLT']
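A variant of the same idea that removes only the number directly in front of `::` and checks each string for WRK with `any()` instead of joining them. A minimal sketch, using shortened versions of the two strings as sample data; `SLOT` and `strip_slots` are hypothetical names:

```python
import re

# The number right before '::' (e.g. 15 in "SP-1-15::"), captured in group 1.
SLOT = re.compile(r"(\d+)::")

def strip_slots(lines):
    """Remove the slot number from every line if any line contains WRK,
    otherwise only from lines whose slot number is odd."""
    wrk_anywhere = any("WRK" in line for line in lines)
    result = []
    for line in lines:
        m = SLOT.search(line)
        if m and (wrk_anywhere or int(m.group(1)) % 2 == 1):
            # Cut out just the digits, keeping the '::' separator.
            line = line[:m.start(1)] + line[m.end(1):]
        result.append(line)
    return result

sample = ["SP-1-15::PROVPEC=NTK555EA,FLT", "SP-1-16::PROVPEC=NTK555FA,FLT"]
print(strip_slots(sample))
print(strip_slots(["SP-1-15::PROVPEC=WRKNTK555EA,FLT", sample[1]]))
```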

Write a program that prints the number of times the string contains a substring

s = "bobobobobobsdfsdfbob"
count = 0
for x in s:
    if x == "bob":
        count += 1
print(count)
I want to count how many times "bob" occurs in the string s, but this gives me 17.
What's wrong with my code? I'm new to Python.
When you loop over the string, the loop variable holds one character at a time, so in your loop x is never equal to "bob".
If you want to count non-overlapping occurrences, you can simply use str.count:
In [52]: s.count('bob')
Out[52]: 4
For overlapping sub-strings you can use lookaround in regex:
In [57]: import re
In [59]: len(re.findall(r'(?=bob)', s))
Out[59]: 6
You can use str.count, for example:
s = "bobobobobobsdfsdfbob"
count = s.count("bob")
print(count)
I'm not giving the best solution, just trying to correct your code.
Understanding what the for-each loop (a.k.a. range-based for) does in your case:
for c in "Hello":
    print(c)
Outputs:
H
e
l
l
o
In each iteration you are comparing a single character to a three-character string, so the comparison is never true.
Try something like this.
Non-overlapping (matches do not share characters):
s = "bobobobobobsdfsdfbob"
w = "bob"
count = 0
i = 0
while i <= len(s) - len(w):
    if s[i:i+len(w)] == w:
        count += 1
        i += len(w)
    else:
        i += 1
print(count)
Output:
4
Overlapping
s = "bobobobobobsdfsdfbob"
w = "bob"
count = 0
for i in range(len(s) - len(w) + 1):
    if s[i:i+len(w)] == w:
        count += 1
print(count)
Output:
6
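Both loops can also be written with `str.find`, whose optional start argument lets you resume the search after the previous match. A sketch (`count_substring` is a hypothetical helper): stepping forward by 1 counts overlapping matches, while stepping past the whole match counts non-overlapping ones.

```python
def count_substring(s, sub, overlapping=False):
    """Count occurrences of sub in s, optionally allowing matches to overlap."""
    if not sub:
        raise ValueError("sub must not be empty")
    count = 0
    start = s.find(sub)
    while start != -1:
        count += 1
        # Resume one character later (overlaps allowed) or after the match (no overlaps).
        start = s.find(sub, start + (1 if overlapping else len(sub)))
    return count

s = "bobobobobobsdfsdfbob"
print(count_substring(s, "bob"))
print(count_substring(s, "bob", overlapping=True))
```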
