replace substring with dict value if the substring matches dict key? - python

I have a list of alphanumeric strings:
list = ["abc 123", "456 jkl"]
I also have a dictionary that contains substrings of the strings in the list above as keys:
dict = {"abc":"xyz", "jkl":"stu"}
I want to update the list using the dictionary so that the result looks like:
result = ["xyz 123", "456 stu"]
Basically, I want to replace any component (and only that component) in the list that matches dictionary keys with dictionary values.
I tried iterating through the dictionary + the list to do so, but I am having trouble updating just the substring. I would also like to learn more efficient/pythonic way of achieving this, please.
for element in list:
    for key,value in dictionary.items():
        if key in element:
            element = value

If you are prepared to use regex:
>>> import re
>>> pattern = r'\b(?:' + '|'.join(map(re.escape, dct)) + r')\b'
>>> result = re.sub(
...     pattern,
...     lambda m: dct.get(m.group(), m.group()),
...     ','.join(lst)
... ).split(',')
>>> # or
>>> result = [re.sub(
...     pattern,
...     lambda m: dct.get(m.group(), m.group()),
...     item
... ) for item in lst]
>>> result
['xyz 123', '456 stu']
Where,
r'\b(?:' + '|'.join(map(re.escape, dct)) + r')\b' joins the keys of dct with the delimiter | to form the pattern string. The non-capturing group makes both \b word boundaries apply to every key, and re.escape protects keys that contain regex metacharacters.
lambda m: dct.get(m.group(), m.group()) creates a callable that, for each match, returns the value for the matching key from dct, or the match unchanged if the key is missing.
','.join(lst) and .split(',') do this without a loop, but only if your strings do not contain a comma; otherwise use some other delimiter.
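The whole-word regex idea can be sketched as a small reusable function. This is only an illustration: the name replace_whole_words and the extra "abcd 789" test string are assumptions, not part of the original answer.

```python
import re

def replace_whole_words(strings, mapping):
    """Replace whole-word occurrences of mapping keys in each string."""
    if not mapping:
        # An empty alternation would match the empty string, so bail out early.
        return list(strings)
    # Longest keys first so that a longer key wins over a shorter one.
    keys = sorted(mapping, key=len, reverse=True)
    # re.escape guards against keys containing regex metacharacters.
    pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, keys)) + r')\b')
    return [pattern.sub(lambda m: mapping[m.group()], s) for s in strings]

lst = ["abc 123", "456 jkl", "abcd 789"]
dct = {"abc": "xyz", "jkl": "stu"}
result = replace_whole_words(lst, dct)
print(result)  # ['xyz 123', '456 stu', 'abcd 789']
```

Compiling the pattern once and looping over the strings avoids the join/split trick, so the strings may contain any delimiter.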

Is it like this? You must create a new list (based on your code):
list = ["abc 123", "456 jkl"]
dict = {"abc":"xyz", "jkl":"stu"}
newList = []
for element in list:
    for key,value in dict.items():
        if key in element:
            element = element.replace(key, value)
    newList.append(element)  # append once per element, replaced or not
print(newList)

You can get the result with a list comprehension and the replace() method, like below:
l = ["abc 123", "456 jkl"]
d = {"abc":"xyz", "jkl":"stu"}
l = [e.replace(key, val) for e in l for key, val in d.items() if key in e]
print(l)
A simple loop will do as well.
But remember that with this solution a key that appears as part of a longer word is replaced as well, and any element that contains no key is dropped from the result. If you don't want partial-word replacement, you can split the elements of the list first and replace whole tokens, or you can use a regex.
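The split-based alternative mentioned above might look like the following sketch. The helper name replace_tokens and the extra "abcdef 789" element are made up for illustration, and the sketch assumes tokens are separated by whitespace (runs of whitespace collapse to a single space).

```python
def replace_tokens(strings, mapping):
    """Replace whitespace-separated tokens that exactly match a key."""
    return [
        " ".join(mapping.get(token, token) for token in s.split())
        for s in strings
    ]

l = ["abc 123", "456 jkl", "abcdef 789"]
d = {"abc": "xyz", "jkl": "stu"}
replaced = replace_tokens(l, d)
print(replaced)  # ['xyz 123', '456 stu', 'abcdef 789']
```

Because each token is looked up as a whole, "abcdef" is left alone and elements without any matching key are kept.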

Related

Convert the dictionary into tuple and printing it in reverse order

I am having values in dict for example
{"AR":True,"VF":False,"Siss":True}
Now I am extracting only the keys whose value is True, so I get AR and Siss. I am trying to save this output in a tuple and then print it in reverse order, like ("Siss","AR").
Below is my code snippet. When I convert a key into a tuple, the output is split into characters instead of whole strings.
for i in dic:
    if dic[i]==True:
        t = tuple(i)
        print (t)
        Reverse(t)
def Reverse(tuples):
    new_tup = tuples[::-1]
    return new_tup
How to change those characters into words/strings ?
You can do it easily by traversing the dictionary in reverse order and filtering out the values that are not True (reversed() on dictionary views requires Python 3.8 or later).
d = {'AR': True, 'VF': False, 'Siss': True}
print(tuple(k for k,v in reversed(d.items()) if v is True))
('Siss', 'AR')
Here is a simple step-wise approach that uses a list as an intermediate, fills it with the appropriate keys from your dictionary, reverses the list, then converts it into a tuple.
dic = {"AR":True,"VF":False,"Siss":True}
lst = []
for key in dic:
    if dic[key]:  # == True is redundant
        lst.append(key)
lst.reverse()
result = tuple(lst)
print(result)
#('Siss', 'AR')
A functional approach:
dictionary = { "AR": True, "VF": False, "Siss": True }
filtered = filter(lambda kv: kv[1], reversed(dictionary.items()))
just_key = map(lambda kv: kv[0], filtered)
print(list(just_key))
It works by:
reversing the key-value pairs in the dictionary,
filtering those pairs, removing every pair whose value is False,
keeping just the key of each remaining pair with map.
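For comparison, here is a minimal sketch that avoids reversed() on the dictionary itself and returns a tuple. The function name true_keys_reversed is hypothetical, and the sketch assumes Python 3.7 or later, where dicts keep insertion order. It also shows why tuple(i) in the question produced characters: calling tuple() on a string splits it.

```python
def true_keys_reversed(flags):
    """Return the keys whose value is truthy, in reverse insertion order."""
    kept = [key for key, value in flags.items() if value]
    return tuple(reversed(kept))

dic = {"AR": True, "VF": False, "Siss": True}
result = true_keys_reversed(dic)
print(result)  # ('Siss', 'AR')

# tuple() iterates over its argument, so a string becomes its characters.
chars = tuple("AR")
print(chars)  # ('A', 'R')
```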

How do I save changes made in a dictionary values to itself?

I have a dictionary where the values are a list of tuples.
dictionary = {1:[('hello, how are you'),('how is the weather'),('okay then')], 2:[('is this okay'),('maybe It is')]}
I want to make the values a single string for each key. So I wrote a function that does the job, but I do not know how to insert the result back into the original dictionary.
my function:
def list_of_tuples_to_string(dictionary):
    for tup in dictionary.values():
        k = [''.join(i) for i in tup] #joining list of tuples to make a list of strings
        l = [''.join(k)] #joining list of strings to make a string
        for j in l:
            ki = j.lower() #converting string to lower case
    return ki
output i want:
dictionary = {1:'hello, how are you how is the weather okay then', 2:'is this okay maybe it is'}
You can simply overwrite the values for each key in the dictionary:
for key, value in dictionary.items():
    dictionary[key] = ' '.join(value)
Note the space in the join statement, which joins each string in the list with a space.
It can be done even more simply than you think, using a dict comprehension:
>>> dictionary = {1:[('hello, how are you'),('how is the weather'),('okay then')],
...               2:[('is this okay'),('maybe It is')]}
>>> dictionary = {key:' '.join(val).lower() for key, val in dictionary.items()}
>>> print(dictionary)
{1: 'hello, how are you how is the weather okay then', 2: 'is this okay maybe it is'}
Now, let's go through the method:
we loop through the keys and values in the dictionary with dict.items(),
and map each key to a string built from the elements of its list.
The elements are joined together with a single space and converted to lowercase.
Try:
for i in dictionary.keys():
    dictionary[i]=' '.join(updt_key.lower() for updt_key in dictionary[i])
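The in-place update can also be wrapped in a function, which is closer to what the question attempted. This is a sketch only; the name join_values_in_place and the sep parameter are assumptions for illustration.

```python
def join_values_in_place(dictionary, sep=" "):
    """Overwrite each list value with one lowercase, sep-joined string."""
    for key, parts in dictionary.items():
        # Reassigning an existing key while iterating is safe,
        # because the set of keys does not change.
        dictionary[key] = sep.join(parts).lower()
    return dictionary

data = {1: ['hello, how are you', 'how is the weather', 'okay then'],
        2: ['is this okay', 'maybe It is']}
join_values_in_place(data)
print(data[2])  # is this okay maybe it is
```

Note that ('okay then') with a single pair of parentheses is just a string, not a tuple, so each value here is a plain list of strings.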

sorting a list by names in python

I have a list of filenames. I need to group them based on the ending names after underscore ( _ ). My list looks something like this:
[
'1_result1.txt',
'2_result2.txt',
'3_result2.txt',
'4_result3.txt',
'5_result4.txt',
'6_result1.txt',
'7_result2.txt',
'8_result3.txt',
]
My end result should be:
List1 = ['1_result1.txt', '6_result1.txt']
List2 = ['2_result2.txt', '3_result2.txt', '7_result2.txt']
List3 = ['4_result3.txt', '8_result3.txt']
List4 = ['5_result4.txt']
This will come down to making a dictionary of lists, then iterating the input and adding each item to its proper list:
output = {}
for item in inlist:
    output.setdefault(item.split("_")[1], []).append(item)
print(list(output.values()))
We use setdefault to make sure there's a list for the entry, then add our current filename to the list. output.values() will return just the lists, not the entire dictionary, which appears to be what you want.
Using defaultdict from the collections module:
from collections import defaultdict
output = defaultdict(list)
for file in data:
    output[file.split("_")[1]].append(file)
print(list(output.values()))
Using groupby from the itertools module:
from itertools import groupby
data.sort(key=lambda x: x.split('_')[1])
for key, group in groupby(data, lambda x: x.split('_')[1]):
    print(list(group))
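A self-contained sketch of the groupby approach, collecting the groups into a dictionary keyed by suffix. The helper name suffix is hypothetical, and the data list is copied from the question.

```python
from itertools import groupby

def suffix(name):
    """Return the part of the filename after the first underscore."""
    return name.split("_", 1)[1]

data = ['1_result1.txt', '2_result2.txt', '3_result2.txt', '4_result3.txt',
        '5_result4.txt', '6_result1.txt', '7_result2.txt', '8_result3.txt']

# groupby only merges adjacent items, so sort by the same key first.
ordered = sorted(data, key=suffix)
groups = {key: list(group) for key, group in groupby(ordered, key=suffix)}
print(groups['result1.txt'])  # ['1_result1.txt', '6_result1.txt']
```

Because sorted() is stable, files within each group keep their original relative order.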
Starting with Python 2.4, both list.sort() and sorted() added a key parameter to specify a function to be called on each list element prior to making comparisons.
The value of the key parameter should be a function that takes a single argument and returns a key to use for sorting purposes. This technique is fast because the key function is called exactly once for each input record.
So if l is the name of your list then you could use something like :
l.sort(key=lambda s: s.split('_')[1])
More information about key functions is available in the Python Sorting HOW TO.

Python Nested List Comprehension with If Else

I was trying to use a list comprehension to replace multiple possible string values in a list of values.
I have a list of column names which are taken from a cursor.description;
['UNIX_Time', 'col1_MCA', 'col2_MCA', 'col3_MCA', 'col1_MCB', 'col2_MCB', 'col3_MCB']
I then have header_replace;
{'MCB': 'SourceA', 'MCA': 'SourceB'}
I would like to replace the string values for header_replace.keys() found within the column names with the values.
I have had to use the following loop;
headers = []
for header in cursor.description:
    replaced = False
    for key in header_replace.keys():
        if key in header[0]:
            headers.append(str.replace(header[0], key, header_replace[key]))
            replaced = True
            break
    if not replaced:
        headers.append(header[0])
Which gives me the correct output;
['UNIX_Time', 'col1_SourceA', 'col2_SourceA', 'col3_SourceA', 'col1_SourceB', 'col2_SourceB', 'col3_SourceB']
I tried using this list comprehension;
[str.replace(i[0],k,header_replace[k]) if k in i[0] else i[0] for k in header_replace.keys() for i in cursor.description]
But it meant that items were duplicated for the unmatched keys and I would get;
['UNIX_Time', 'col1_MCA', 'col2_MCA', 'col3_MCA', 'col1_SourceA', 'col2_SourceA', 'col3_SourceA',
'UNIX_Time', 'col1_SourceB', 'col2_SourceB', 'col3_SourceB', 'col1_MCB', 'col2_MCB', 'col3_MCB']
But if instead I use;
[str.replace(i[0],k,header_replace[k]) for k in header_replace.keys() for i in cursor.description if k in i[0]]
#Bakuriu fixed syntax
I would get the correct replacement but then lose any items that didn't need a string replacement.
['col1_SourceA', 'col2_SourceA', 'col3_SourceA', 'col1_SourceB', 'col2_SourceB', 'col3_SourceB']
Is there a pythonesque way of doing this or am I over stretching list comprehensions? I certainly find them hard to read.
[str.replace(i[0],k,header_replace[k]) if k in i[0] for k in header_replace.keys() for i in cursor.description]
this is a SyntaxError, because if expressions must contain the else part. You probably meant:
[i[0].replace(k, header_replace[k]) for k in header_replace for i in cursor.description if k in i[0]]
With the if at the end. However, I must say that list comprehensions with nested loops usually aren't the way to go.
I would use the expanded for loop. In fact, I'd improve it by removing the replaced flag:
headers = []
for header in cursor.description:
    for key, repl in header_replace.items():
        if key in header[0]:
            headers.append(header[0].replace(key, repl))
            break
    else:
        headers.append(header[0])
The else of the for loop is executed when no break is triggered during the iterations.
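The for/else behaviour can be seen in isolation with a small sketch. The function first_match is a made-up example, not part of the answer's code.

```python
def first_match(text, keys):
    """Return the first key found in text, or None if no key matches."""
    for key in keys:
        if key in text:
            found = key
            break
    else:
        # Runs only when the loop finished without hitting break.
        found = None
    return found

print(first_match("col1_MCA", ["MCB", "MCA"]))   # MCA
print(first_match("UNIX_Time", ["MCB", "MCA"]))  # None
```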
I don't understand why in your code you use str.replace(string, substring, replacement) instead of string.replace(substring, replacement). Strings have instance methods, so use them as such and not as if they were static methods of the class.
If your data is exactly as you described it, you don't need nested replacements and can boil it down to this line:
>>> l = ['UNIX_Time', 'col1_MCA', 'col2_MCA', 'col3_MCA', 'col1_MCB', 'col2_MCB', 'col3_MCB']
>>> [i.replace('_MC', '_Source') for i in l]
['UNIX_Time',
 'col1_SourceA',
 'col2_SourceA',
 'col3_SourceA',
 'col1_SourceB',
 'col2_SourceB',
 'col3_SourceB']
I guess a function will be more readable:
def repl(key):
    for k, v in header_replace.items():
        if k in key:
            return key.replace(k, v)
    return key
print(list(map(repl, names)))
Another (less readable) option:
import re
rx = '|'.join(map(re.escape, header_replace))
print([re.sub(rx, lambda m: header_replace[m.group(0)], name) for name in names])

concatenate the values of dictionary into single string or sequence

I have a dictionary called self.__sequences that maps each ID to a DNA sequence ("ID: DNA sequence"). The following is part of that dictionary:
{'1111758': ('TTAGAGTTTGATCCTGGCTCAGAACGAACGCTGGCGGCAGGCCTAA\n', ''),
'1111762': ('AGAGTTTGATCCTGGCTCAGATTGA\n', ''),
'1111763': ('AGAGTTTGATCCTGGCCTT\n', '') }
I want to concatenate the values of the dictionary into one single string or sequence (no \n and no ""), that is, I want something like
"TTAGAGTTTGATCCTGGCTCAGAACGAACGCTGGCGGCAGGCCTAAAGAGTTTGATCCTGGCTCAGATTGAAGAGTTTGATCCTGGCCTT"
I wrote the following code, but it does not give what I want. I guess that is because each value has two elements (the DNA sequence and ""). I am struggling to improve my code. Can anyone help me make it work?
def sequence_statistics(self):
    total_len=self.__sequences.values()[0]
    for i in range(len(self.__sequences)):
        total_len += self.__sequences.values()[i]
    return total_len
This iterates over the sorted keys of your sequences, extracts the first element of each tuple in the dict, and strips whitespace. Mind that dicts are unordered in Python 2.7:
''.join(self.__sequences[k][0].strip() for k in sorted(self.__sequences))
>>> d = {'1111758': ('TTAGAGTTTGATCCTGGCTCAGAACGAACGCTGGCGGCAGGCCTAA\n', ''),
... '1111762': ('AGAGTTTGATCCTGGCTCAGATTGA\n', ''),
... '1111763': ('AGAGTTTGATCCTGGCCTT\n', '') }
>>>
>>> lis = []
>>> for tup in d.values():
... lis.append(tup[0].rstrip('\n'))
...
>>> ''.join(lis)
'AGAGTTTGATCCTGGCTCAGATTGAAGAGTTTGATCCTGGCCTTTTAGAGTTTGATCCTGGCTCAGAACGAACGCTGGCGGCAGGCCTAA'
>>>
This is a generator that yields the first element of each value, with the "\n" stripped off:
(value[0].strip() for value in self.__sequences.values())
Since you probably want them sorted by keys, it becomes slightly more complicated:
(value[0].strip() for key, value in sorted(self.__sequences.items()))
And to turn that into a single string joined by '' (empty strings) in between, do:
''.join(value[0].strip() for key, value in sorted(self.__sequences.items()))
Try this code instead:
return "".join(v[0].strip() for k, v in self.__sequences.items())
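Pulled out of the class, the sorted-key variant might look like this sketch. The standalone function concatenate_sequences is an assumption for illustration; inside the class the same expression would operate on self.__sequences.

```python
def concatenate_sequences(sequences):
    """Join the first element of each value, ordered by key, without newlines."""
    return "".join(value[0].strip() for _, value in sorted(sequences.items()))

sequences = {
    '1111758': ('TTAGAGTTTGATCCTGGCTCAGAACGAACGCTGGCGGCAGGCCTAA\n', ''),
    '1111762': ('AGAGTTTGATCCTGGCTCAGATTGA\n', ''),
    '1111763': ('AGAGTTTGATCCTGGCCTT\n', ''),
}
combined = concatenate_sequences(sequences)
print(combined)
print(len(combined))  # total length of all sequences combined
```

Sorting by key makes the result deterministic regardless of the dictionary's iteration order.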
