concatenate the values of dictionary into single string or sequence - python

I have a dictionary called self.__sequences that maps an ID to a DNA sequence ("ID: DNA sequence"). The following is part of that dictionary:
{'1111758': ('TTAGAGTTTGATCCTGGCTCAGAACGAACGCTGGCGGCAGGCCTAA\n', ''),
'1111762': ('AGAGTTTGATCCTGGCTCAGATTGA\n', ''),
'1111763': ('AGAGTTTGATCCTGGCCTT\n', '') }
I want to concatenate the values of the dictionary into one single string or sequence (no \n and no ""), that is, I want something like
"TTAGAGTTTGATCCTGGCTCAGAACGAACGCTGGCGGCAGGCCTAAAGAGTTTGATCCTGGCTCAGATTGAAGAGTTTGATCCTGGCCTT"
I wrote the following code, but it does not give what I want. I guess that is because each value has two elements (the DNA sequence and ""). I am struggling to improve my code. Can anyone help me make it work?
def sequence_statistics(self):
    total_len=self.__sequences.values()[0]
    for i in range(len(self.__sequences)):
        total_len += self.__sequences.values()[i]
    return total_len

This iterates over the sorted keys of your sequences, takes the first element of each tuple in the dict, and strips the surrounding whitespace. Mind that dicts are unordered in Python 2.7:
''.join(self.__sequences[k][0].strip() for k in sorted(self.__sequences))

>>> d = {'1111758': ('TTAGAGTTTGATCCTGGCTCAGAACGAACGCTGGCGGCAGGCCTAA\n', ''),
... '1111762': ('AGAGTTTGATCCTGGCTCAGATTGA\n', ''),
... '1111763': ('AGAGTTTGATCCTGGCCTT\n', '') }
>>>
>>> lis = []
>>> for tup in d.values():
...     lis.append(tup[0].rstrip('\n'))
...
>>> ''.join(lis)
'AGAGTTTGATCCTGGCTCAGATTGAAGAGTTTGATCCTGGCCTTTTAGAGTTTGATCCTGGCTCAGAACGAACGCTGGCGGCAGGCCTAA'
>>>

This is a generator that yields the first element of each value, with the "\n" stripped off:
(value[0].strip() for value in self.__sequences.values())
Since you probably want them sorted by keys, it becomes slightly more complicated:
(value[0].strip() for key, value in sorted(self.__sequences.items()))
To turn that into a single string, join the pieces with '' (the empty string) as the separator:
''.join(value[0].strip() for key, value in sorted(self.__sequences.items()))

Try this code instead:
return "".join(v[0].strip() for k, v in self.__sequences.items())
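For anyone who wants to try the approach outside the class, here is a minimal standalone sketch. The function name concatenate_sequences and the plain sequences variable are made up for illustration; in the question the data lives in self.__sequences.

```python
# Sample data copied from the question.
sequences = {
    '1111758': ('TTAGAGTTTGATCCTGGCTCAGAACGAACGCTGGCGGCAGGCCTAA\n', ''),
    '1111762': ('AGAGTTTGATCCTGGCTCAGATTGA\n', ''),
    '1111763': ('AGAGTTTGATCCTGGCCTT\n', ''),
}

def concatenate_sequences(sequences):
    """Join the first element of every value, ordered by key, without newlines."""
    # sorted() on a dict iterates over its keys; the keys are strings here,
    # so the order is lexicographic rather than numeric.
    return ''.join(sequences[key][0].strip() for key in sorted(sequences))

result = concatenate_sequences(sequences)
print(result)
```

Sorting the keys only matters if you need a reproducible order; if any order is acceptable, iterating over sequences.values() directly is enough.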

Related

replace substring with dict value if the substring matches dict key?

I have a list of alphanumeric strings:
list = ["abc 123", "456 jkl"]
I also have a dictionary that contains substrings of the strings in the list above as keys:
dict = {"abc":"xyz", "jkl":"stu"}
I want to update the list using the dictionary so that the result looks like:
result = ["xyz 123", "456 stu"]
Basically, I want to replace any component (and only that component) in the list that matches dictionary keys with dictionary values.
I tried iterating through the dictionary and the list to do so, but I am having trouble updating just the substring. I would also like to learn a more efficient/Pythonic way of achieving this, please.
for element in list:
    for key,value in dictionary.items():
        if key in element:
            element = value
If you are prepared to use regex (here lst and dct stand for your list and dictionary, renamed so they do not shadow the built-ins):
>>> import re
>>> pattern = r'\b(?:' + '|'.join(map(re.escape, dct)) + r')\b'
>>> result = re.sub(
...     pattern,
...     lambda m: dct.get(m.group(), m.group()),
...     ','.join(lst)
... ).split(',')
# or
>>> result = [re.sub(
...     pattern,
...     lambda m: dct.get(m.group(), m.group()),
...     item
... ) for item in lst]
>>> result
['xyz 123', '456 stu']
Where,
r'\b(?:' + '|'.join(map(re.escape, dct)) + r')\b' joins the escaped keys of dct with the delimiter | and wraps the whole alternation in word boundaries, so only whole words match.
lambda m: dct.get(m.group(), m.group()) creates a callable that, for each match, returns the value for the matching key from dct, or the match itself if the key is missing.
','.join(lst) and .split(',') are a way to do this without a loop, but only if your strings do not contain a comma; otherwise some other delimiter can be used.
Is it something like this? You must create a new list (based on your code):
list = ["abc 123", "456 jkl"]
dict = {"abc":"xyz", "jkl":"stu"}
newList = []
for element in list:
    for key,value in dict.items():
        if key in element:
            newElement = element.replace(key, value)
            newList.append(newElement)
print(newList)
You can get the result with a list comprehension and the replace() method, like below:
l = ["abc 123", "456 jkl"]
d = {"abc":"xyz", "jkl":"stu"}
l = [e.replace(key, val) for e in l for key, val in d.items() if key in e]
print(l)
A simple loop will do as well.
But remember that with this solution, a key that appears as part of a longer word is replaced as well. If you don't want that, you can split the elements of the list into words first, or you can use a regex.
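The split-first idea could look roughly like the sketch below. The names items, mapping and replace_whole_words are invented for this example, and the third list entry is added only to show that a key inside a longer word is left alone.

```python
# Hypothetical names: items and mapping stand in for the question's list and dict.
items = ["abc 123", "456 jkl", "abcdef 789"]
mapping = {"abc": "xyz", "jkl": "stu"}

def replace_whole_words(text, mapping):
    """Replace each whitespace-separated word that is exactly a key of mapping."""
    # dict.get(word, word) returns the replacement if the word is a key,
    # otherwise the word itself.
    return " ".join(mapping.get(word, word) for word in text.split())

result = [replace_whole_words(item, mapping) for item in items]
print(result)  # ['xyz 123', '456 stu', 'abcdef 789']
```

Note that split() followed by " ".join() collapses runs of whitespace into single spaces, so this only fits data where that is acceptable.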

How do I save changes made in a dictionary values to itself?

I have a dictionary where the values are a list of tuples.
dictionary = {1:[('hello, how are you'),('how is the weather'),('okay then')], 2:[('is this okay'),('maybe It is')]}
I want to make the values a single string for each key. So I wrote a function which does the job, but I do not know how to insert the result back into the original dictionary.
My function:
def list_of_tuples_to_string(dictionary):
    for tup in dictionary.values():
        k = [''.join(i) for i in tup] #joining list of tuples to make a list of strings
        l = [''.join(k)] #joining list of strings to make a string
        for j in l:
            ki = j.lower() #converting string to lower case
        return ki
Output I want:
dictionary = {1:'hello, how are you how is the weather okay then', 2:'is this okay maybe it is'}
You can simply overwrite the values for each key in the dictionary:
for key, value in dictionary.items():
    dictionary[key] = ' '.join(value)
Note the space in the join statement, which joins each string in the list with a space.
It can be done even more simply than you think, just by using a dict comprehension:
>>> dictionary = {1:[('hello, how are you'),('how is the weather'),('okay then')],
...               2:[('is this okay'),('maybe It is')]}
>>> dictionary = {key:' '.join(val).lower() for key, val in dictionary.items()}
>>> print(dictionary)
{1: 'hello, how are you how is the weather okay then', 2: 'is this okay maybe it is'}
Now, let's go through the method:
We loop through the keys and values in the dictionary with dict.items().
We map each key to a string built from the elements of its list.
The elements are joined together with a single space and converted to lowercase.
Try:
for i in dictionary.keys():
    dictionary[i]=' '.join(updt_key.lower() for updt_key in dictionary[i])
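To make the "save it back" step explicit, here is a small sketch of the in-place variant wrapped in a function. The name join_values_in_place is made up for illustration; the data is the sample from the question.

```python
# Sample data from the question; the parentheses do not create tuples here,
# so each value is simply a list of strings.
dictionary = {
    1: ['hello, how are you', 'how is the weather', 'okay then'],
    2: ['is this okay', 'maybe It is'],
}

def join_values_in_place(mapping):
    """Overwrite each list value with one lowercase, space-joined string."""
    for key, parts in mapping.items():
        # Reassigning an existing key does not change the dict's size,
        # so it is safe to do while iterating over items().
        mapping[key] = ' '.join(parts).lower()

join_values_in_place(dictionary)
print(dictionary)
```

The function returns nothing on purpose: it mutates the dictionary it is given, which is what the question asks for. The dict comprehension builds a new dictionary instead, so any other reference to the old one would still see the lists.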

How to print the sorted list with empty string first in python

I have a list like
list = ['stro', 'asdv', '', 'figh']
and I am using:
for ele in sorted(list):
    print(ele)
and I need the output to be:
asdv
figh
stro
empty string element from list
I need the empty string to come last and the other strings to be in sorted order,
and if I reverse the sort I would like to get the output as:
empty string element
stro
figh
asdv
You can first print the non-empty elements in sorted order and then print the empty element(s) (my_list is your list):
from itertools import chain
for element in chain(sorted(filter(bool, my_list)), filter(lambda x: not x, my_list)):
    print(element)
You need to define your own key for the comparison. You want empty strings last. We can use the fact that an empty string is falsy.
>>> bool('a')
True
>>> bool('')
False
True is bigger than False so non-empty strings would be sorted after empty strings but we need it the other way round.
>>> not 'a'
False
>>> not ''
True
As a second sort criterion we'll take the string itself. To do that we have to compare the tuple (not s, s) where s is the string.
We can feed this to sorted with the key parameter and a lambda function.
>>> data = ['stro', 'asdv', '', 'figh']
>>> print(sorted(data, key=lambda s: (not s, s)))
['asdv', 'figh', 'stro', '']
If you want it reversed add the reverse parameter.
>>> print(sorted(data, key=lambda s: (not s, s), reverse=True))
['', 'stro', 'figh', 'asdv']
Please note that I renamed your variable list to data. If you use list you overwrite the built-in list and that's a bad idea, even in an example.
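If you ever need the sort direction and the position of the empty strings to be independent of each other, one option is to partition the list instead of using a tuple key. This is a different technique from the key function above, and the name sort_strings and its parameters are invented for this sketch.

```python
data = ['stro', 'asdv', '', 'figh']

# Tuples compare element by element, and False sorts before True,
# which is why the (not s, s) key above puts empty strings last.
assert (False, 'stro') < (True, '')

def sort_strings(strings, descending=False, empty_last=True):
    """Sort the non-empty strings and put the empty ones at one end."""
    non_empty = sorted((s for s in strings if s), reverse=descending)
    empty = [s for s in strings if not s]
    return non_empty + empty if empty_last else empty + non_empty

print(sort_strings(data))
print(sort_strings(data, descending=True, empty_last=False))
print(sort_strings(data, descending=True))
```

The first two calls reproduce the two outputs asked for in the question; the third gives descending order with the empty string still at the end, which reverse=True on the tuple key cannot do.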

How to parse Python set-inside-list structure into text?

I have the following strange format I am trying to parse.
The data structure I am trying to parse is a "set" of key-value pairs in a list:
[{'key1:value1', 'key2:value2', 'key3:value3',...}]
That's the only data I have, and it needs to be processed. I don't think this can be described as a Python data structure, but I need to parse this to become a string like
'key1:value1, key2:value2, key3:value3'.
Is this doable?
EDIT: Yes, it is 'key:value', not 'key':'value'
Also, this is Python3.x
Iterating over .items() and formatting differently than previous answers.
If your data is a list of dict objects, then:
>>> data = [{'key1':'value1', 'key2':'value2', 'key3':'value3'}]
>>> ', '.join('{0}:{1}'.format(*item) for my_dict in data for item in my_dict.items())
'key2:value2, key3:value3, key1:value1'
If your data is a list of set objects, then the approach is simpler:
>>> from itertools import chain
>>> data = [{'key1:value1', 'key2:value2', 'key3:value3'}]
>>> ', '.join(chain.from_iterable(data))
'key1:value1, key2:value2, key3:value3'
UPD
NOTE: order can be changed, because set and dict objects are not ordered.
', '.join('{0}:{1}'.format(key, value) for my_dict in my_list for key, value in my_dict.items())
where my_list is the name of your list variable.
Since your structure (let's call it myStruct) is a set rather than a dict, the following code should do what you want:
result = ", ".join([x for x in myStruct[0]])
Beware, a set is not ordered, so you might end up with something like 'key2:value2, key1:value1, key3:value3'.
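If the order matters, one way around this is to sort the items before joining. A minimal sketch, assuming every element of the list is a set of 'key:value' strings as in the question (the variable names are just for illustration):

```python
data = [{'key1:value1', 'key2:value2', 'key3:value3'}]

# Flatten every set in the list, then sort so the result is deterministic.
result = ', '.join(sorted(item for group in data for item in group))
print(result)  # key1:value1, key2:value2, key3:value3
```

The sort is plain string sorting, so 'key10:...' would come before 'key2:...'. If the keys carry numbers that should sort numerically, you would need a custom key function.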
I would iterate over the items with items():
d = {'key1':'value1', 'key2':'value2', 'key3':'value3'}
for k, v in d.items():
    print(k + ':' + v + ',', end=' ')
So, you have a list whose only element is a dictionary, and you want to get all the keys-value pairs from that dictionary and put them into a string?
If so, try something like this:
d = yourList[0]
s = ""
for key in d.keys():
    s += key + ":" + d[key] + ", "
s = s[:-2] #To trim off the last comma that gets added
Try this:
print(', '.join(i for x in list_of_set for i in x))
If it's a set, there is no issue with the parsing. The code is equivalent to:
parts = []
for x in list_of_set:
    for i in x:
        parts.append(i)
print(', '.join(parts))

Python Nested List Comprehension with If Else

I was trying to use a list comprehension to replace multiple possible string values in a list of values.
I have a list of column names which are taken from a cursor.description;
['UNIX_Time', 'col1_MCA', 'col2_MCA', 'col3_MCA', 'col1_MCB', 'col2_MCB', 'col3_MCB']
I then have header_replace;
{'MCB': 'SourceA', 'MCA': 'SourceB'}
I would like to replace the string values for header_replace.keys() found within the column names with the values.
I have had to use the following loop;
headers = []
for header in cursor.description:
    replaced = False
    for key in header_replace.keys():
        if key in header[0]:
            headers.append(str.replace(header[0], key, header_replace[key]))
            replaced = True
            break
    if not replaced:
        headers.append(header[0])
Which gives me the correct output;
['UNIX_Time', 'col1_SourceA', 'col2_SourceA', 'col3_SourceA', 'col1_SourceB', 'col2_SourceB', 'col3_SourceB']
I tried using this list comprehension;
[str.replace(i[0],k,header_replace[k]) if k in i[0] else i[0] for k in header_replace.keys() for i in cursor.description]
But it meant that items were duplicated for the unmatched keys and I would get;
['UNIX_Time', 'col1_MCA', 'col2_MCA', 'col3_MCA', 'col1_SourceA', 'col2_SourceA', 'col3_SourceA',
'UNIX_Time', 'col1_SourceB', 'col2_SourceB', 'col3_SourceB', 'col1_MCB', 'col2_MCB', 'col3_MCB']
But if instead I use;
[str.replace(i[0],k,header_replace[k]) for k in header_replace.keys() for i in cursor.description if k in i[0]]
#Bakuriu fixed syntax
I would get the correct replacement but then lose any items that didn't need a string replacement.
['col1_SourceA', 'col2_SourceA', 'col3_SourceA', 'col1_SourceB', 'col2_SourceB', 'col3_SourceB']
Is there a pythonesque way of doing this or am I over stretching list comprehensions? I certainly find them hard to read.
[str.replace(i[0],k,header_replace[k]) if k in i[0] for k in header_replace.keys() for i in cursor.description]
This is a SyntaxError, because a conditional expression must contain the else part. You probably meant:
[i[0].replace(k, header_replace[k]) for k in header_replace for i in cursor.description if k in i[0]]
with the if at the end. However, I must say that list comprehensions with nested loops usually aren't the way to go.
I would use the expanded for loop. In fact, I'd improve it by removing the replaced flag:
headers = []
for header in cursor.description:
    for key, repl in header_replace.items():
        if key in header[0]:
            headers.append(header[0].replace(key, repl))
            break
    else:
        headers.append(header[0])
The else of the for loop is executed when no break is triggered during the iterations.
I don't understand why in your code you use str.replace(string, substring, replacement) instead of string.replace(substring, replacement). Strings have instance methods, so use them as such and not as if they were static methods of the class.
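Here is a self-contained version of the for/else loop that can be run without a database. The description list is a stand-in for cursor.description (only the first field of each entry, the column name, is kept), and the function name replace_headers is made up; header_replace is the mapping exactly as given in the question.

```python
def replace_headers(names, replacements):
    """Apply the first matching replacement to each name; keep others unchanged."""
    headers = []
    for name in names:
        for key, repl in replacements.items():
            if key in name:
                headers.append(name.replace(key, repl))
                break  # stop at the first matching key
        else:
            # Runs only when the inner loop finished without break,
            # i.e. no key matched this name.
            headers.append(name)
    return headers

# Stand-in for cursor.description: tuples whose first item is the column name.
description = [('UNIX_Time',), ('col1_MCA',), ('col1_MCB',)]
header_replace = {'MCB': 'SourceA', 'MCA': 'SourceB'}

headers = replace_headers([col[0] for col in description], header_replace)
print(headers)  # ['UNIX_Time', 'col1_SourceB', 'col1_SourceA']
```

Each input name produces exactly one output entry, which is what the nested list comprehensions in the question fail to guarantee.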
If your data is exactly as you described it, you don't need nested replacements and can boil it down to this line:
>>> l = ['UNIX_Time', 'col1_MCA', 'col2_MCA', 'col3_MCA', 'col1_MCB', 'col2_MCB', 'col3_MCB']
>>> [i.replace('_MC', '_Source') for i in l]
['UNIX_Time',
 'col1_SourceA',
 'col2_SourceA',
 'col3_SourceA',
 'col1_SourceB',
 'col2_SourceB',
 'col3_SourceB']
I guess a function will be more readable (names is the list of column names):
def repl(key):
    for k, v in header_replace.items():
        if k in key:
            return key.replace(k, v)
    return key
print(list(map(repl, names)))
Another (less readable) option:
import re
rx = '|'.join(map(re.escape, header_replace))
print([re.sub(rx, lambda m: header_replace[m.group(0)], name) for name in names])
