split and join strings

I get my data in a Python iterator where each line is a string of characters separated by "\t".
I can create an example like this:
iter1 = []
line = ""
for j in range(3):
    for i in range(9):
        line += "1\t"
    line += "1"
    iter1.append(line)
    line = ""
iter1 then looks like this:
['1\t1\t1\t1\t1\t1\t1\t1\t1\t1', '1\t1\t1\t1\t1\t1\t1\t1\t1\t1', '1\t1\t1\t1\t1\t1\t1\t1\t1\t1']
Now I want to join this iterator with "\n", but I also want each "\t" to become "\n", so the final result would be:
1
1
1
1
1
1
After joining the iterator lines.
How can I do it in the fastest way?

You have tab-separated values in a list and want to convert all tabs to newlines:
iter1 = ['\t'.join('1'*10) for _ in range(3)]
result = '\n'.join(iter1).replace('\t', '\n')
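As a rough check that the two steps above behave as described, here is a minimal sketch. It also shows an equivalent single pass that splits each line first. The function names `flatten_replace` and `flatten_split` are made up for illustration and are not part of the original answer.

```python
def flatten_replace(lines):
    # Join the lines with newlines, then turn every tab into a newline.
    return '\n'.join(lines).replace('\t', '\n')


def flatten_split(lines):
    # Split each line on tabs and join all the pieces with newlines.
    return '\n'.join(field for line in lines for field in line.split('\t'))


iter1 = ['\t'.join('1' * 10) for _ in range(3)]
result = flatten_replace(iter1)
print(len(result.split('\n')))  # number of output lines
```

Which variant is faster probably depends on the data, so timing both with timeit on real input is the safer way to decide.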

>>> iter1 = ['1\t1\t1\t1\t1',
...          '1\t1\t1\t1\t1',
...          '1\t1\t1\t1\t1']
>>> s = '\n'.join([char for line in iter1
...                for char in line.split('\t')])
>>> print(s)
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1

Related

how to access the elements of a list that is a pandas object

I have a list of characters (seqMut2) that is a pandas Series in a DataFrame. I try to iterate over it like a normal list to retrieve the positions of the elements that are not spaces, with this code:
index2 = chDeux[chDeux['allele'] == y].index.values
index3 = chTrois[chTrois['allele'] == x].index.values
list_chDeux = [chDeux.loc[index2, 'chaincode'], chDeux.loc[index2, 'allele'],chDeux.loc[index2, 'sequencegaps'], chDeux.loc[index2, 'sequencegapsalidiff']]
list_chTrois = [chTrois.loc[index3, 'chaincode'], chTrois.loc[index3, 'allele'],chTrois.loc[index3, 'sequencegaps'], chTrois.loc[index3, 'sequencegapsalidiff']]
seqG2 = list_chDeux[2].str.split(pat='')
seqG3 = list_chTrois[2].str.split(pat='')
seqMut2 = list_chDeux[3].str.split(pat='')
seqMut3 = list_chTrois[3].str.split(pat='')
for j in seqMut2:
    if j != " ":
        print(j)
        pos=seqMut2.index(j)
        print(pos)
But with print(j) I see that it retrieves the whole list. When I try with a normal list (built manually, without a DataFrame), I get the right result:
seq=" M M"
list=seq.tolist()
for j in list:
    if j != " ":
        print(j)
        pos=list.index(j)
        print(pos)
result: j = M and pos = 3
j = M and pos = 5

You can filter for the rows where the value is different from a space ' ':
import pandas as pd
df = pd.DataFrame({'a':list('my name is')})
   a
0  m
1  y
2
3  n
4  a
5  m
6  e
7
8  i
9  s
# Keep only the rows whose value is not a single space
print(df[df['a'].ne(' ')])
Output:
   a
0  m
1  y
3  n
4  a
5  m
6  e
8  i
9  s
Or, if the cells can contain runs of one, two, or three spaces, you can use the str methods on the pandas Series, which gives the same result here:
print(df[df['a'].str.contains(r'\S+')])

You can use a simpler method with a list comprehension and plain Python code:
result = [ch for ch in seq.split(" ") if ch != '']
If there are more special characters, you can add more filters afterwards.
UPDATE:
To get the positions of the elements when the string contains one or several of them:
elements = [a for a in seq.split(" ") if a != '']
positions = [seq.split(" ").index(el) for el in elements]
Then you can create a dictionary with the elements and their positions:
dict_pos = {el: pos for el, pos in zip(elements, positions)}
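Note that list.index returns only the first match, so repeated elements all get the same position, and the dictionary keeps a single entry per distinct element. If the goal is the character position of every non-space character, a sketch with enumerate avoids both issues. The sample string below is an assumption, chosen so that the two M characters sit at positions 3 and 5 as in the question.

```python
seq = "   M M"  # assumed sample: three spaces, "M", one space, "M"

# enumerate yields (position, character) pairs, so repeated
# characters keep their own positions.
positions = [(pos, ch) for pos, ch in enumerate(seq) if ch != " "]
print(positions)  # [(3, 'M'), (5, 'M')]
```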

Python: Count frequency in a list and align the result

I have a list of random words as strings and I need to count the frequency of each word.
a = ['a','ccc','bb','ccc','a','ccc','bb','bb','a','bb']
I want to do this in a loop, so the output will be:
1 a 3
2 bb 4
3 ccc 3
The index should be right-aligned in a field of 4 characters, the list elements left-aligned in a field of 5 characters, and the frequency right-aligned, as shown above.
I know how to count the frequency, but I don't know how to align the columns:
total_word = {}
for word in clear_word:
    if word not in total_word:
        total_word[word] = 0
    total_word[word] += 1

There are at least two efficient ways:
from collections import Counter, defaultdict
a = ['a','ccc','bb','ccc','a','ccc','bb','bb','a','bb']
# method 1:
d = defaultdict(int)
for elem in a:
    d[elem] += 1
for ctr, k in enumerate(sorted(d), start=1):
    print(ctr,k,'\t',d[k])
# method 2:
d = Counter(a)
for ctr, k in enumerate(sorted(d), start=1):
    print(ctr,k,'\t',d[k])
Output:
1 a 3
2 bb 4
3 ccc 3
EDIT:
Here you go:
a = ['a','ccc','bb','ccc','a','ccc','bb','bb','a','bb']
unique = sorted(set(a))
for ctr, i in enumerate(unique, start=1):
    print(ctr,i,'\t',a.count(i))

Try this, using collections.Counter:
>>> from collections import Counter
>>> i = 1
>>> for k, v in Counter(a).items():
...     print(f"{i:<3} {k:<10} {v}")
...     i += 1
...
Output:
1   a          3
2   ccc        3
3   bb         4

If, like me, you occasionally have to work with Python older than 2.6, you can use old-school string formatting like:
print "%3s %-10s %s" % (i, the_word, count)
Here:
%3s occupies 3 characters and gives you right-aligned text
%-10s occupies 10 characters and is left-aligned (because of the minus sign)
The % operator works in every Python version; in Python 3, print needs parentheses.
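A small sketch of how the width and the minus sign behave in %-formatting: a plain width pads on the left (right-aligned text), and a minus sign pads on the right (left-aligned text). The sample values and variable names are made up for illustration.

```python
i, the_word, count = 1, "bb", 4  # hypothetical sample values

right = "%3s" % i           # padded on the left, so right-aligned
left = "%-10s" % the_word   # padded on the right, so left-aligned
row = "%3s %-10s %s" % (i, the_word, count)

# repr() makes the padding spaces visible.
print(repr(right))
print(repr(left))
print(repr(row))
```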

If it is only about string formatting, you can build the string along these lines:
arr = ['a', 'bb', 'ccc']
for i in range(len(arr)):
    print('{} {:4} {}'.format(i, arr[i], i+5))
I use this site as a reference for string formatting: https://pyformat.info/#string_pad_align

I guess this is what you want. Try this:
clear_word = ['a','ccc','bb','ccc','a','ccc','bb','bb','a','bb','bb','bb','bb','bb','bb','bb','bb','bb','bb']
total_word = {}
for word in clear_word:
    if word not in total_word:
        total_word[word] = 0
    total_word[word] += 1
for i, k in enumerate(total_word):
    print(" {0:2} {1:3} {2:5}".format(i, k, total_word[k]))
This is the output:
[image: align output]

Based on your code, this loop displays your results:
for a in total_word.keys():
    print(a ,'\t', total_word[a])
This is my output:
a 3
bb 4
ccc 3

You can align your columns using f-strings (the part after the colon, such as :<{padding} or :>3, is a format specification that sets alignment and width):
padding = max(len(word) for word in total_word.keys()) + 1
for word, count in total_word.items():
    print(f"{word:<{padding}}{count:>3}")
If you want the index as well, add enumerate:
for idx, (word, count) in enumerate(total_word.items()):
    print(f"{idx:<3}{word:<{padding}}{count:>3}")
Putting it all together:
clear_word = ['a'] * 3 + ['ccc'] * 4 + ['bb'] * 10
total_word = {}
for word in clear_word:
    if word not in total_word:
        total_word[word] = 0
    total_word[word] += 1
padding = max(len(word) for word in total_word.keys()) + 1
for idx, (word, count) in enumerate(total_word.items()):
    print(f"{idx:<3}{word:<{padding}}{count:>3}")
Your output is:
0  a     3
1  ccc   4
2  bb   10
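To match the layout asked for in the question (index right-aligned in 4 characters, word left-aligned in 5, count right-aligned), one possible sketch combines collections.Counter with sorted keys. The width of 3 for the count column and the single space between columns are assumptions, since the question does not state them.

```python
from collections import Counter

a = ['a', 'ccc', 'bb', 'ccc', 'a', 'ccc', 'bb', 'bb', 'a', 'bb']
counts = Counter(a)

# One formatted row per distinct word, in alphabetical order.
rows = [
    f"{idx:>4} {word:<5} {counts[word]:>3}"
    for idx, word in enumerate(sorted(counts), start=1)
]
print("\n".join(rows))
```

Sorting the keys makes the row order independent of the order in which words first appear in the list.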

How to take out first 2 digit number from a string in Python

I have 2 strings:
"SP-1-15::PROVPEC=NTK555EA,CTYPE=\"SP-2\",PEC=NTK555EA,REL= 1 ,CLEI=,SER=NNTM,MDAT=UNKNOWN,AGE=00-005-20-03,ONSC=00-005-19-50:IS-ANR,FLT"
"SP-1-16::PROVPEC=NTK555FA,CTYPE=\"SP-2 Dual CPU\",PEC=NTK555FA,REL= 1 ,CLEI=,SER=NNTM,MDAT=UNKNOWN,AGE=UNKNOWN,ONSC=UNKNOWN:IS-ANR,FLT"
I want 2 things:
If WRK is in the string, remove 15 and 16 from SP-1-15 and SP-1-16 respectively.
If WRK is not in the string, remove the odd value, which in this case is 15.

This can be done with re.search and re.sub. To check whether WRK appears in either string, I used ''.join() on the list:
import re
s = ["SP-1-15::PROVPEC=NTK555EA,CTYPE=\"SP-2\",PEC=NTK555EA,REL= 1 ,CLEI=,SER=NNTM,MDAT=UNKNOWN,AGE=00-005-20-03,ONSC=00-005-19-50:IS-ANR,FLT",
     "SP-1-16::PROVPEC=NTK555FA,CTYPE=\"SP-2 Dual CPU\",PEC=NTK555FA,REL= 1 ,CLEI=,SER=NNTM,MDAT=UNKNOWN,AGE=UNKNOWN,ONSC=UNKNOWN:IS-ANR,FLT"]
j = ''.join(s)
# Digits immediately before the "::" separator, e.g. "15::"
find_sp = re.compile(r'\d+::')
for idx, item in enumerate(s):
    if 'WRK' in j:
        # WRK present somewhere: strip the number from every string
        s[idx] = re.sub(r'\d+::', '::', item)
    else:
        # WRK absent: strip the number only when it is odd
        num = find_sp.search(item)
        x = num.group(0).strip(':')
        if int(x) % 2:
            s[idx] = re.sub(r'\d+::', '::', item)
print(s)
['SP-1-::PROVPEC=NTK555EA,CTYPE="SP-2",PEC=NTK555EA,REL= 1 ,CLEI=,SER=NNTM,MDAT=UNKNOWN,AGE=00-005-20-03,ONSC=00-005-19-50:IS-ANR,FLT',
 'SP-1-16::PROVPEC=NTK555FA,CTYPE="SP-2 Dual CPU",PEC=NTK555FA,REL= 1 ,CLEI=,SER=NNTM,MDAT=UNKNOWN,AGE=UNKNOWN,ONSC=UNKNOWN:IS-ANR,FLT']
And here with WRK in one of the lines:
(xenial)vash@localhost:~/python/stack_overflow$ python3.7 strings.py
['SP-1-::PROVPEC=WRKNTK555EA,CTYPE="SP-2",PEC=NTK555EA,REL= 1 ,CLEI=,SER=NNTM,MDAT=UNKNOWN,AGE=00-005-20-03,ONSC=00-005-19-50:IS-ANR,FLT',
 'SP-1-::PROVPEC=NTK555FA,CTYPE="SP-2 Dual CPU",PEC=NTK555FA,REL= 1 ,CLEI=,SER=NNTM,MDAT=UNKNOWN,AGE=UNKNOWN,ONSC=UNKNOWN:IS-ANR,FLT']
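The same rule can also be sketched as a function that returns a new list instead of modifying the input. This is only one possible reading of the two conditions in the question. The name `strip_slot_numbers` and the shortened sample strings are hypothetical, and the sketch assumes the number to remove is the first run of digits directly followed by "::".

```python
import re

# First run of digits that is directly followed by "::", e.g. the "15" in "SP-1-15::".
TRAILING_NUM = re.compile(r'(\d+)(?=::)')


def strip_slot_numbers(lines):
    # If WRK occurs in any line, every line loses its number;
    # otherwise only lines with an odd number lose it.
    has_wrk = any('WRK' in line for line in lines)

    def fix(line):
        m = TRAILING_NUM.search(line)
        if m is None:
            return line  # no "<digits>::" found, leave the line unchanged
        if has_wrk or int(m.group(1)) % 2 == 1:
            return line[:m.start()] + line[m.end():]
        return line

    return [fix(line) for line in lines]


# Shortened, made-up sample strings in the same shape as the question's data.
print(strip_slot_numbers(["SP-1-15::PROVPEC=A", "SP-1-16::PROVPEC=B"]))
print(strip_slot_numbers(["SP-1-15::PROVPEC=WRK", "SP-1-16::PROVPEC=B"]))
```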

Write a program that prints the number of times the string contains a substring

s = "bobobobobobsdfsdfbob"
count = 0
for x in s:
    if x == "bob":
        count += 1
print count
I want to count how many times "bob" occurs in the string s, but this gives me 17.
What's wrong with my code? I'm new to Python.

When you loop over a string, the loop variable holds one character at a time, so in your loop x is never equal to "bob".
If you want to count non-overlapping occurrences, you can simply use str.count:
In [52]: s.count('bob')
Out[52]: 4
For overlapping substrings you can use a lookahead in a regex:
In [57]: import re
In [59]: len(re.findall(r'(?=bob)', s))
Out[59]: 6

You can use str.count, for example:
s = "bobobobobobsdfsdfbob"
count = s.count("bob")
print(count)

I'm not giving the best solution, just trying to correct your code.
First, look at what a for loop over a string does in your case:
for c in "Hello":
    print c
Outputs:
H
e
l
l
o
In each iteration you are comparing a single character to the whole string "bob", so the comparison is never true.
Try something like this.
Non-overlapping matches (a match may not share characters with the previous one):
s = "bobobobobobsdfsdfbob"
w = "bob"
count = 0
i = 0
while i <= len(s) - len(w):
    if s[i:i+len(w)] == w:
        count += 1
        i += len(w)
    else:
        i += 1
print (count)
Output:
4
Overlapping matches:
s = "bobobobobobsdfsdfbob"
w = "bob"
count = 0
for i in range(len(s) - len(w) + 1):
    if s[i:i+len(w)] == w:
        count += 1
print (count)
Output:
6
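Both loops can also be folded into one helper built on str.find, which restarts the search either one character after the previous match (overlapping) or just past it (non-overlapping). This is a sketch, not part of the original answer, and the name `count_occurrences` is made up.

```python
def count_occurrences(text, sub, overlapping=False):
    # An empty substring would match everywhere, so reject it.
    if not sub:
        raise ValueError("sub must be non-empty")
    # Step past the whole match, or by one character to allow overlaps.
    step = 1 if overlapping else len(sub)
    count = 0
    pos = text.find(sub)
    while pos != -1:
        count += 1
        pos = text.find(sub, pos + step)
    return count


s = "bobobobobobsdfsdfbob"
print(count_occurrences(s, "bob"))                    # 4
print(count_occurrences(s, "bob", overlapping=True))  # 6
```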

Printing out the output in separate lines in python

I am trying to print the sum of the maximum route for each test case on a separate line.
The code is here:
def triangle(rows):
    PrintingList = list()
    for rownum in range(rows):
        PrintingList.append([])
        newValues = map(int, raw_input().strip().split())
        PrintingList[rownum] += newValues
    return PrintingList
def routes(rows,current_row=0,start=0):
    for i,num in enumerate(rows[current_row]):
        if abs(i-start) > 1:
            continue
        if current_row == len(rows) - 1:
            yield [num]
        else:
            for child in routes(rows,current_row+1,i):
                yield [num] + child
testcases = int(raw_input())
output = []
for num in range(testcases):
    rows = int(raw_input())
    triangleinput = triangle(rows)
    max_route = max(routes(triangleinput),key=sum)
    output.append(sum(max_route))
print '\n'.join(output)
I tried this input:
2
3
1
2 3
4 5 6
3
1
2 3
4 5 6
When I try to print the values, I get this:
print '\n'.join(output)
TypeError: sequence item 0: expected string, int found
How do I change this? I need some guidance.

Try this:
print '\n'.join(map(str, output))
Python can only join strings together, so you should convert the ints to strings first. This is what the map(str, ...) part does.
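In Python 3 the same idea applies, with print as a function. A brief sketch, assuming output holds the two sums for the sample input (10 and 10, worked out by hand here for illustration):

```python
output = [10, 10]  # assumed sums for the two sample triangles

# str.join needs strings, so convert each int first.
joined = '\n'.join(map(str, output))
print(joined)

# Alternative: let print do the conversion and the separation.
print(*output, sep='\n')
```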

@grc is correct, but instead of creating a new string with newlines, you could simply do:
for row in output:
    print row
