I am looking to replace character pairs in a string using a dictionary in Python.
It works for single characters but not for pairs.
txt = "1122"

def processString6(txt):
    dictionary = {'11': 'a', '22': 'b'}
    transTable = txt.maketrans(dictionary)
    txt = txt.translate(transTable)
    print(txt)

processString6(txt)
Error Message:
ValueError: string keys in translate table must be of length 1
Desired output:
ab
I've also tried
s = ' 11 22 '
d = {' 11 ':'a', ' 22 ':'b'}
print( ''.join(d[c] if c in d else c for c in s))
but likewise it doesn't work
I'm looking to use a dictionary rather than .replace(), because I want to scan the string only once, whereas .replace() scans it once for each key/value pair.
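In the second attempt, the generator iterates over s one character at a time, so c is never equal to ' 11 ' and nothing is replaced. If the pairs were always separated by whitespace (an assumption: "1122" in the first attempt has no separators), a minimal sketch would split on whitespace first and look up whole tokens:

```python
# Assumes the pairs are whitespace-separated; keys no longer need the spaces.
d = {'11': 'a', '22': 'b'}
s = ' 11 22 '
# s.split() yields whole tokens ('11', '22') instead of single characters
result = ''.join(d.get(token, token) for token in s.split())
print(result)  # ab
```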
You can use this code to replace keys of any length:
import re

txt = "1122"

def processString6(txt):
    dictionary = {'11': 'a', '22': 'b'}
    # Try longer keys first; escape them so special characters match literally
    pattern = re.compile(
        '|'.join(map(re.escape, sorted(dictionary, key=len, reverse=True))))
    # Replace every match with its value in a single pass over txt
    result = pattern.sub(lambda x: dictionary[x.group()], txt)
    return result

print(processString6(txt))
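A short sketch of why the longest-first sort matters, using a hypothetical helper multi_replace and a made-up mapping where one key ('1') is a prefix of another ('11'). The empty-mapping guard is an addition: an empty pattern would match at every position and then fail the dictionary lookup with KeyError.

```python
import re


def multi_replace(text, mapping):
    # Hypothetical helper. An empty pattern would match everywhere and then
    # fail the dict lookup, so return the text unchanged for an empty mapping.
    if not mapping:
        return text
    # Longer keys first, so '11' is tried before '1' at the same position;
    # re.escape makes characters such as '.' or '+' match literally.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile('|'.join(map(re.escape, keys)))
    # A single left-to-right pass: each match is replaced by its mapped value.
    return pattern.sub(lambda m: mapping[m.group()], text)


mapping = {'1': 'x', '11': 'a'}
print(multi_replace('111', mapping))  # ax

# Without the length sort, the alternation '1|11' always takes the single '1'
# first, so the two-character key can never match.
unsorted = re.compile('|'.join(map(re.escape, mapping)))
print(unsorted.sub(lambda m: mapping[m.group()], '111'))  # xxx
```

Python's alternation picks the first alternative that matches at a given position, not the longest, which is why the order of the keys in the pattern changes the result.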
Related
Is there a simple way in Python to replace multiple characters with another?
For instance, I would like to change:
name1_22:3-3(+):Pos_bos
to
name1_22_3-3_+__Pos_bos
So basically replace all "(",")",":" with "_".
I only know how to do it with:
s = s.replace(":", "_")
s = s.replace(")", "_")
s = s.replace("(", "_")
You could use re.sub to replace multiple characters with one pattern:
import re
s = 'name1_22:3-3(+):Pos_bos '
re.sub(r'[():]', '_', s)
Output
'name1_22_3-3_+__Pos_bos '
Use a translation table. In Python 2, maketrans is defined in the string module.
>>> import string
>>> table = string.maketrans("():", "___")
In Python 3, it is a str class method.
>>> table = str.maketrans("():", "___")
In both, the table is passed as the argument to str.translate.
>>> 'name1_22:3-3(+):Pos_bos'.translate(table)
'name1_22_3-3_+__Pos_bos'
In Python 3, you can also pass a single dict mapping input characters to output characters to maketrans:
table = str.maketrans({"(": "_", ")": "_", ":": "_"})
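In Python 3, the dict passed to str.maketrans may only use single characters (or Unicode ordinals) as keys, which is exactly why the two-character keys in the first question raise ValueError. A small sketch of the dict form next to the optional third argument, whose characters are deleted rather than replaced:

```python
# Dict form: every key must be a single character.
table = str.maketrans({"(": "_", ")": "_", ":": "_"})
print('name1_22:3-3(+):Pos_bos'.translate(table))  # name1_22_3-3_+__Pos_bos

# Three-argument form: characters in the third string are deleted.
strip_table = str.maketrans("():", "___", "+")
print('name1_22:3-3(+):Pos_bos'.translate(strip_table))  # name1_22_3-3___Pos_bos
```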
Sticking to your current approach of using replace():
s = "name1_22:3-3(+):Pos_bos"
for e in ((":", "_"), ("(", "_"), (")", "_")):
    s = s.replace(*e)
print(s)
OUTPUT:
name1_22_3-3_+__Pos_bos
EDIT (for readability):
s = "name1_22:3-3(+):Pos_bos"
replaceList = [(":", "_"), ("(", "_"), (")", "_")]
for elem in replaceList:
    print(*elem)  # prints ': _', then '( _', then ') _'
    s = s.replace(*elem)
print(s)
OR
repList = [':', '(', ')']  # list of all the chars to replace
rChar = '_'  # the char to replace with
for elem in repList:
    s = s.replace(elem, rChar)
print(s)
Another possibility is a list comprehension combined with a conditional expression (the "ternary operator"):
text = 'name1_22:3-3(+):Pos_bos '
out = ''.join(['_' if i in ':)(' else i for i in text])
print(out) #name1_22_3-3_+__Pos_bos
Since this produces a list, ''.join turns the list of characters (strs of length 1) back into a single str.
I have a list of keywords, and I want to search a list of long strings for the keyword, any price in currency format, and any other number in the string that is less than 10. For example:
keywords = ['Turin', 'Milan' , 'Nevada']
strings = ['This is a sentence about Turin with 5 and $10.00 in it.', ' 2.5 Milan is a city with £1,000 in it.', 'Nevada and $1,100,000. and 10.09']
would hopefully return the following:
final_list = [('Turin', '$10.00', '5'), ('Milan', '£1,000', '2.5'), ('Nevada', '$1,100,000', '')]
I've got the following function with functioning regexes but I don't know how to combine the outputs into a list of tuples. Is there an easier way to achieve this? Should I split by word then look for matches?
import re

def find_keyword_comments(list_of_strings, keywords_a):
    list_of_tuples = []
    for string in list_of_strings:
        keywords = '|'.join(keywords_a)
        keyword_rx = re.findall(r"^\b({})\b$".format(keywords), string, re.I)
        price_rx = re.findall(r'^[\$\£\€]\s?\d{1,3}(?:[.,]\d{3})*(?:[.,]\d{1,2})?$', string)
        number_rx1 = re.findall(r'\b\d[.]\d{1,2}\b', string)
        number_rx2 = re.findall(r'\s\d\s', string)
You can use re.findall:
import re

keywords = ['Turin', 'Milan', 'Nevada']
strings = ['This is a sentence about Turin with 5 and $10.00 in it.', '2.5 Milan is a city with £1,000 in it.', 'Nevada and $1,100,000. and 10.09']
grouped_strings = [(i, [b for b in strings if i in b]) for i in keywords]
new_groups = [(a, filter(lambda x: re.findall(r'\d', x), [re.findall(r'[\$\d\.£,]+', c) for c in b][0])) for a, b in grouped_strings]
last_groups = [(a, list(filter(lambda x: re.findall(r'\d', x) and float(x) < 10 if x[0].isdigit() else True, b))) for a, b in new_groups]
print(last_groups)
Output:
[('Turin', ['5', '$10.00']), ('Milan', ['2.5', '£1,000']), ('Nevada', ['$1,100,000.'])]
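The output above is grouped into lists per keyword rather than the (keyword, price, small number) tuples the question asks for. Below is a minimal sketch that builds those tuples directly. The regexes are simplified assumptions, not a general price parser: a price is a currency symbol ($, £ or €) immediately followed by digits with optional comma thousands separators and up to two decimals; the "small number" is the first standalone number below 10, or '' if there is none; and only the first keyword and first price in each string are kept. The function extract and the pattern names are hypothetical.

```python
import re

# Simplified assumption: currency symbol, 1-3 digits, optional ",ddd" groups,
# optional ".dd" decimals.
PRICE_RX = re.compile(r'[$£€]\d{1,3}(?:,\d{3})*(?:\.\d{1,2})?')
# A standalone number (integer or decimal) that is not part of a price or a
# longer number: not preceded by a digit, currency symbol, '.' or ',', and not
# followed by a digit or ','.
NUMBER_RX = re.compile(r'(?<![\d$£€.,])\d+(?:\.\d+)?(?![\d,])')


def extract(strings, keywords):
    keyword_rx = re.compile(r'\b(?:{})\b'.format('|'.join(map(re.escape, keywords))))
    result = []
    for s in strings:
        keyword = keyword_rx.search(s)
        if keyword is None:
            continue  # skip strings that mention none of the keywords
        price = PRICE_RX.search(s)
        # First standalone number below 10, or '' if there is none
        small = next((n for n in NUMBER_RX.findall(s) if float(n) < 10), '')
        result.append((keyword.group(), price.group() if price else '', small))
    return result


keywords = ['Turin', 'Milan', 'Nevada']
strings = ['This is a sentence about Turin with 5 and $10.00 in it.',
           ' 2.5 Milan is a city with £1,000 in it.',
           'Nevada and $1,100,000. and 10.09']
print(extract(strings, keywords))
# [('Turin', '$10.00', '5'), ('Milan', '£1,000', '2.5'), ('Nevada', '$1,100,000', '')]
```

Compiling each pattern once and collecting one tuple per string avoids having to merge several separate findall results afterwards.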
I want to take a DNA sequence as a string and split it into a list of parts, where each part contains only one repeated character. The final list must keep the order of the original sequence. I'm using Python 3.4.
Example: infected = "AATTTGCCAAA"
I need to get the following output:
modified = ['AA', 'TTT', 'G', 'CC', 'AAA']
This is exactly what itertools.groupby is for:
>>> from itertools import groupby
>>> infected ="AATTTGCCAAA"
>>>
>>> [''.join(g) for _,g in groupby(infected)]
['AA', 'TTT', 'G', 'CC', 'AAA']
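If you'd rather not use itertools, a regular expression with a backreference gives the same runs: (.) captures one character and \1* matches its immediate repeats, so each match is one maximal run. A minimal sketch:

```python
import re

infected = "AATTTGCCAAA"
# (.) captures a character; \1* consumes any directly following copies of it
runs = [m.group(0) for m in re.finditer(r'(.)\1*', infected)]
print(runs)  # ['AA', 'TTT', 'G', 'CC', 'AAA']
```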
def fchar(ch, mi):
    # Collect the run of characters equal to ch, starting at index mi
    li = ""
    for c in infected[mi:]:
        if c == ch:
            li += ch
            mi = mi + 1
        else:
            break
    # If characters remain, recurse on the next run, separating runs with a space
    if mi < len(infected):
        return li + " " + fchar(infected[mi], mi)
    else:
        return li

infected = input("Enter DNA sequence\n")  # e.g. "AAATTTTTTTTGCCCCCCA"
x = fchar(infected[0], 0)
newSet = x.split(' ')
print(newSet)
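The recursive fchar makes one call per run, so with CPython's default recursion limit of 1000, a sequence with roughly a thousand or more runs raises RecursionError, and an empty input fails on infected[0]. A sketch of the same idea as a plain loop, using a hypothetical helper split_runs that takes the sequence as a parameter instead of reading the global infected:

```python
def split_runs(seq):
    # Build the list left to right: extend the last run while the character
    # repeats, otherwise start a new run. No recursion, so no depth limit.
    runs = []
    for ch in seq:
        if runs and runs[-1][-1] == ch:
            runs[-1] += ch
        else:
            runs.append(ch)
    return runs


print(split_runs("AATTTGCCAAA"))  # ['AA', 'TTT', 'G', 'CC', 'AAA']
```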
I want to replace characters in my string based on the values in my dictionary, and I want to try this with a regular expression.
d = { 't':'ch' , 'r' : 'gh'}
s = ' Text to replace '
m = re.search('#a pattern to just get each character ',s)
m.group() # this should get me 'T' 'e' 'x' 't' .....
# how can I replace each character in string s with its corresponding value in my dictionary? I looked at re.sub() but couldn't figure out how to use it here.
I want to generate this output: Texch cho gheplace
Using re.sub:
>>> d = { 't':'ch' , 'r' : 'gh'}
>>> s = ' Text to replace '
>>> import re
>>> pattern = '|'.join(map(re.escape, d))
>>> re.sub(pattern, lambda m: d[m.group()], s)
' Texch cho gheplace '
The second argument to re.sub can be a function; its return value is used as the replacement string.
If none of the dictionary's values contains a character that is also a key, it's fairly simple: you can use str.replace directly, like this:
for char in d:
    s = s.replace(char, d[char])
print(s)  # Texch cho gheplace
Even simpler, the following works even if keys appear in the dictionary's values (as long as every key is a single character):
s, d = ' Text to replace ', {'t': 'ch', 'r': 'gh'}
print("".join(d.get(char, char) for char in s))  # Texch cho gheplace
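The approaches above differ when a value contains another key. A small sketch with a hypothetical mapping where 'a' maps to 'b' and 'b' maps to 'c': the str.replace loop rewrites text it has already produced, while the per-character join and the re.sub version each make a single pass. The sequential result depends on dict iteration order, which is insertion order in Python 3.7+.

```python
import re


def replace_sequential(s, d):
    # One str.replace pass per key, in dict insertion order (Python 3.7+);
    # later passes also see text produced by earlier ones.
    for key in d:
        s = s.replace(key, d[key])
    return s


def replace_per_char(s, d):
    # Single pass; only works when every key is a single character.
    return "".join(d.get(ch, ch) for ch in s)


def replace_regex(s, d):
    # Single pass with re.sub; keys are matched literally, and multi-character
    # keys also work (sort them longest first if one is a prefix of another).
    pattern = "|".join(map(re.escape, d))
    return re.sub(pattern, lambda m: d[m.group()], s)


d = {'a': 'b', 'b': 'c'}
print(replace_sequential("ab", d))  # cc
print(replace_per_char("ab", d))    # bc
print(replace_regex("ab", d))       # bc
```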
I have a complicated string and would like to extract multiple substrings from it.
The string consists of a set of items separated by commas. Each item has an identifier (id-n) followed by a pair of words enclosed in parentheses. I want only the word inside the parentheses that has a number attached to its end (e.g. 'This-1'). The number indicates the position the word should take after extraction.
#Example of how the individual items would look like
id1(attr1, is-2) #The number 2 here indicates word 'is' should be in position 2
id2(attr2, This-1) #The number 1 here indicates word 'This' should be in position 1
id3(attr3, an-3) #The number 3 here indicates word 'an' should be in position 3
id4(attr4, example-4) #The number 4 here indicates word 'example' should be in position 4
id5(attr5, example-4) #This is a duplicate of the word 'example'
#Example of string - this is how the string with the items looks like
string = "id1(attr1, is-1), id2(attr2, This-2), id3(attr3, an-3), id4(attr4, example-4), id5(atttr5, example-4)"
#This is how the result should look after extraction
result = 'This is an example'
Is there an easier way to do this? Regex doesn't work for me.
A trivial/naive approach:
>>> from collections import defaultdict
>>> s = string  # the sample string from the question
>>> z = [x.split(',')[1].strip().strip(')') for x in s.split('),')]
>>> d = defaultdict(list)
>>> for i in z:
...     b = i.split('-')
...     d[b[1]].append(b[0])
...
>>> ' '.join(' '.join(d[t]) for t in sorted(d.keys(), key=int))
'is This an example example'
Your sample string has a duplicated position for 'example', which is why 'example' is repeated in the output.
Your sample also doesn't match your stated requirements (it puts 'is' at position 1 and 'This' at position 2), but this result follows your description: the words are arranged by their position indicators.
Now, if you want to get rid of duplicates:
>>> ' '.join(e for t in sorted(d.keys(), key=int) for e in set(d[t]))
'is This an example'
Why not regex? This works.
In [44]: s = "id1(attr1, is-2), id2(attr2, This-1), id3(attr3, an-3), id4(attr4, example-4), id5(atttr5, example-4)"
In [45]: z = [(m.group(2), m.group(1)) for m in re.finditer(r'(\w+)-(\d+)\)', s)]
In [46]: [x for y, x in sorted(set(z))]
Out[46]: ['This', 'is', 'an', 'example']
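The session above compares the captured positions as strings, which works for single digits, but a position such as '10' would sort before '2'. A minimal sketch (hypothetical helper arrange_words) that converts positions to int, lets a dict collapse duplicated positions, and joins the words into the final sentence:

```python
import re


def arrange_words(s):
    # Each "word-N)" pair becomes {N: word}; a repeated position simply
    # overwrites the same key, which removes the duplicate 'example'.
    words = {int(pos): word for word, pos in re.findall(r'(\w+)-(\d+)\)', s)}
    # Sort by the integer position so that 10 comes after 9, not after 1.
    return ' '.join(words[pos] for pos in sorted(words))


s = "id1(attr1, is-2), id2(attr2, This-1), id3(attr3, an-3), id4(attr4, example-4), id5(atttr5, example-4)"
print(arrange_words(s))  # This is an example
```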
OK, how about this:
sample = ("id1(attr1, is-2), id2(attr2, This-1), "
          "id3(attr3, an-3), id4(attr4, example-4), id5(atttr5, example-4)")

def make_cryssie_happy(s):
    words = {}  # we will use this dict later
    # we only want items like This-1, an-3, etc.
    ll = s.split(',')[1::2]
    for item in ll:
        tt = item.replace(')', '').lstrip()
        (word, pos) = tt.split('-')
        # there can only be one word at a particular position;
        # using a dict with the numbers as position keys
        # is an alternative to using sets
        words[pos] = word
    # sort the keys numerically (the dict keeps insertion order, not position
    # order) and create a list of the values in that order
    res = [words[i] for i in sorted(words, key=int)]
    # return a nice string
    return ' '.join(res)

print(make_cryssie_happy(sample))