I was trying to use a list comprehension to replace multiple possible string values in a list of values.
I have a list of column names which are taken from a cursor.description;
['UNIX_Time', 'col1_MCA', 'col2_MCA', 'col3_MCA', 'col1_MCB', 'col2_MCB', 'col3_MCB']
I then have header_replace;
{'MCB': 'SourceA', 'MCA': 'SourceB'}
I would like to replace the string values for header_replace.keys() found within the column names with the values.
I have had to use the following loop;
headers = []
for header in cursor.description:
    replaced = False
    for key in header_replace.keys():
        if key in header[0]:
            headers.append(str.replace(header[0], key, header_replace[key]))
            replaced = True
            break
    if not replaced:
        headers.append(header[0])
Which gives me the correct output;
['UNIX_Time', 'col1_SourceA', 'col2_SourceA', 'col3_SourceA', 'col1_SourceB', 'col2_SourceB', 'col3_SourceB']
I tried using this list comprehension;
[str.replace(i[0],k,header_replace[k]) if k in i[0] else i[0] for k in header_replace.keys() for i in cursor.description]
But it meant that items were duplicated for the unmatched keys and I would get;
['UNIX_Time', 'col1_MCA', 'col2_MCA', 'col3_MCA', 'col1_SourceA', 'col2_SourceA', 'col3_SourceA',
'UNIX_Time', 'col1_SourceB', 'col2_SourceB', 'col3_SourceB', 'col1_MCB', 'col2_MCB', 'col3_MCB']
But if instead I use;
[str.replace(i[0],k,header_replace[k]) for k in header_replace.keys() for i in cursor.description if k in i[0]]
#Bakuriu fixed syntax
I would get the correct replacement but then lose any items that didn't need a string replacement.
['col1_SourceA', 'col2_SourceA', 'col3_SourceA', 'col1_SourceB', 'col2_SourceB', 'col3_SourceB']
Is there a pythonesque way of doing this or am I over stretching list comprehensions? I certainly find them hard to read.
[str.replace(i[0],k,header_replace[k]) if k in i[0] for k in header_replace.keys() for i in cursor.description]
This is a SyntaxError, because conditional (if) expressions must contain the else part. You probably meant:
[i[0].replace(k, header_replace[k]) for k in header_replace for i in cursor.description if k in i[0]]
With the if at the end. However, I must say that list comprehensions with nested loops aren't usually the way to go.
I would use the expanded for loop. In fact I'd improve it removing the replaced flag:
headers = []
for header in cursor.description:
    for key, repl in header_replace.items():
        if key in header[0]:
            headers.append(header[0].replace(key, repl))
            break
    else:
        headers.append(header[0])
The else of the for loop is executed when no break is triggered during the iterations.
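As a self-contained illustration of the for/else pattern, here is a sketch in Python 3 that uses the column names from the question in place of cursor.description. The function name replace_first_match is made up for this example, and the behaviour (only the first matching key is applied) mirrors the loop above.

```python
def replace_first_match(name, replacements):
    """Return name with the first matching key replaced, or name unchanged."""
    for key, repl in replacements.items():
        if key in name:
            result = name.replace(key, repl)
            break
    else:
        # Runs only when the loop finished without hitting break.
        result = name
    return result


columns = ['UNIX_Time', 'col1_MCA', 'col2_MCA', 'col3_MCA',
           'col1_MCB', 'col2_MCB', 'col3_MCB']
header_replace = {'MCB': 'SourceA', 'MCA': 'SourceB'}

headers = [replace_first_match(c, header_replace) for c in columns]
print(headers)
```

Wrapping the loop in a function also makes a single, flat list comprehension possible, which may be easier to read than the nested version.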
I don't understand why your code uses str.replace(string, substring, replacement) instead of string.replace(substring, replacement). Strings have instance methods, so you should use them as such and not as if they were static methods of the class.
If your data is exactly as you described it, you don't need nested replacements and can boil it down to this line:
l = ['UNIX_Time', 'col1_MCA', 'col2_MCA', 'col3_MCA', 'col1_MCB', 'col2_MCB', 'col3_MCB']
[i.replace('_MC', '_Source') for i in l]
>>> ['UNIX_Time',
>>> 'col1_SourceA',
>>> 'col2_SourceA',
>>> 'col3_SourceA',
>>> 'col1_SourceB',
>>> 'col2_SourceB',
>>> 'col3_SourceB']
I guess a function will be more readable:
def repl(key):
    for k, v in header_replace.items():
        if k in key:
            return key.replace(k, v)
    return key
print map(repl, names)
Another (less readable) option:
import re
rx = '|'.join(header_replace)
print [re.sub(rx, lambda m: header_replace[m.group(0)], name) for name in names]
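One caveat with the regex variant: if a key ever contains a regex metacharacter (such as "." or "+"), joining the raw keys with "|" will build the wrong pattern. A hedged Python 3 sketch that escapes the keys first is shown below; the names list and the helper replace_keys are assumptions made for the example.

```python
import re

header_replace = {'MCB': 'SourceA', 'MCA': 'SourceB'}
names = ['UNIX_Time', 'col1_MCA', 'col2_MCB']

# re.escape guards against keys that contain regex metacharacters.
pattern = re.compile('|'.join(re.escape(key) for key in header_replace))


def replace_keys(name):
    # The callback looks up whichever key actually matched.
    return pattern.sub(lambda m: header_replace[m.group(0)], name)


renamed = [replace_keys(name) for name in names]
print(renamed)
```

Note that, unlike the loop with break, this replaces every occurrence of every key in a name, which may or may not be what you want.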
Related
I have a list:
foolist = ['123-asd', '234-asd', '345-asd']
I want to find the index of the string that contains '234'. At the moment I have this:
boollist = ['234' in i for i in foolist]
which would produce a list of True and Falses for each index for foolist.
Then to find the index, [print(i) for i,j in enumerate(boollist) if j==True]
This seems a bit unnecessary. Does anyone have a more elegant method of finding the index of a string in a list based on a part of that string?
You can use next with a generator expression and split by a specific character to reflect the structure of your strings:
foolist = ['123-asd','234-asd','345-asd']
res = next(i for i, j in enumerate(foolist) if '123' in j.split('-')[0]) # 0
If there can be multiple matches, you can use a list comprehension:
res = [i for i, j in enumerate(foolist) if '123' in j.split('-')[0]] # [0]
For equality, you should use j.split('-')[0] == '123' as your condition instead.
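Keep in mind that next raises StopIteration when nothing matches. A small sketch (Python 3) of the equality variant with a default value follows; the function name index_of_prefix and its sep parameter are invented for the example.

```python
foolist = ['123-asd', '234-asd', '345-asd']


def index_of_prefix(items, prefix, sep='-'):
    """Return the index of the first item whose part before sep equals prefix."""
    return next(
        (i for i, item in enumerate(items) if item.split(sep)[0] == prefix),
        None,  # default returned instead of raising StopIteration
    )


print(index_of_prefix(foolist, '234'))
print(index_of_prefix(foolist, '999'))
```

Returning None for "not found" is just one possible choice; raising a ValueError, as list.index does, would be equally reasonable.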
Note on print within a comprehension
It's good practice to avoid print in a list comprehension, as it will return None. You will notice that, while your terminal will print the indices it finds, you are also building a list of None items with length equal to the number of matches. For all intents, that list has no use.
If you are only interested in printing items, you can use a for loop with a generator expression:
for idx in (i for i, j in enumerate(foolist) if '123' in j.split('-')[0]):
    print(idx)
I have a dictionary called self.__sequences that maps IDs to DNA sequences ("ID: DNA sequence"). The following is part of that dictionary:
{'1111758': ('TTAGAGTTTGATCCTGGCTCAGAACGAACGCTGGCGGCAGGCCTAA\n', ''),
'1111762': ('AGAGTTTGATCCTGGCTCAGATTGA\n', ''),
'1111763': ('AGAGTTTGATCCTGGCCTT\n', '') }
I want to concatenate the values of the dictionary into one single string or sequence (no \n and no ""), that is, I want something like
"TTAGAGTTTGATCCTGGCTCAGAACGAACGCTGGCGGCAGGCCTAAAGAGTTTGATCCTGGCTCAGATTGAAGAGTTTGATCCTGGCCTT"
I wrote the following code, but it does not give what I want. I guess it is because each value has two elements (the DNA sequence and ""). I am struggling to improve my code. Can anyone help me make it work?
def sequence_statistics(self):
    total_len=self.__sequences.values()[0]
    for i in range(len(self.__sequences)):
        total_len += self.__sequences.values()[i]
    return total_len
This will iterate over the sorted keys of your sequences, extract the first element of each tuple in the dict and strip the whitespace. Mind that dicts are unordered in Python 2.7:
''.join(self.__sequences[k][0].strip() for k in sorted(self.__sequences))
>>> d = {'1111758': ('TTAGAGTTTGATCCTGGCTCAGAACGAACGCTGGCGGCAGGCCTAA\n', ''),
... '1111762': ('AGAGTTTGATCCTGGCTCAGATTGA\n', ''),
... '1111763': ('AGAGTTTGATCCTGGCCTT\n', '') }
>>>
>>> lis = []
>>> for tup in d.values():
...     lis.append(tup[0].rstrip('\n'))
...
>>> ''.join(lis)
'AGAGTTTGATCCTGGCTCAGATTGAAGAGTTTGATCCTGGCCTTTTAGAGTTTGATCCTGGCTCAGAACGAACGCTGGCGGCAGGCCTAA'
>>>
This is a generator that yields the first element of each value, with the "\n" stripped off:
(value[0].strip() for value in self.__sequences.values())
Since you probably want them sorted by keys, it becomes slightly more complicated:
(value[0].strip() for key, value in sorted(self.__sequences.items()))
And to turn that into a single string joined by '' (empty strings) in between, do:
''.join(value[0].strip() for key, value in sorted(self.__sequences.items()))
Try this code instead:
return "".join(v[0].strip() for k, v in self.__sequences.items())
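Putting the pieces together, a standalone sketch (Python 3) that can be run outside the class might look like this. The plain dict sequences stands in for self.__sequences, and the function name concatenate_sequences is made up; sorting by key is an assumption about the order you want.

```python
sequences = {
    '1111758': ('TTAGAGTTTGATCCTGGCTCAGAACGAACGCTGGCGGCAGGCCTAA\n', ''),
    '1111762': ('AGAGTTTGATCCTGGCTCAGATTGA\n', ''),
    '1111763': ('AGAGTTTGATCCTGGCCTT\n', ''),
}


def concatenate_sequences(seqs):
    # Take the first tuple element of each value, drop the trailing newline,
    # and join everything in key order.
    return ''.join(seqs[key][0].strip() for key in sorted(seqs))


combined = concatenate_sequences(sequences)
print(combined)
print(len(combined))
```

If you only need the total length, len(combined) gives it directly without a separate accumulation loop.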
I have a dict like this:
d = {'a':'b+c', 'b':'f+g', 'f':'y+u'}
I want to recursively replace the letters in the values that are also keys, so I end up with:
d = {'a':'y+u+g+c', 'b':'y+u+g', 'f':'y+u'}
I tried using this code:
def getval(key,d):
    if d.has_key(key):
        temp=re.findall('\w+',d[key])
        for i in range(len(temp)):
            if d.has_key(temp[i]):
                getval(temp[i],d)
            else:
                continue
for k,v in d.iteritems():
    temp=re.findall('\w+',d[k])
    for i in range(len(temp)):
        if d.has_key(temp[i]):
            getval(temp[i],d)
But it doesn't work. How can I do it? My real dictionary is much larger, but definitely doesn't contain any cycles.
I am actually not sure recursion is the most appropriate method here. Here is a solution that makes replacements in a loop until none of the replacements change the current value:
import re
def make_replacements(d):
    r = d.copy()
    regex = dict((k, re.compile(r'\b' + re.escape(k) + r'\b')) for k in r)
    for k in r:
        done = False
        while not done:
            done = True
            for k2 in r:
                n = regex[k2].sub(r[k2], r[k])
                if n != r[k]:
                    r[k] = n
                    done = False
    return r
print make_replacements({'a': 'b+c', 'b': 'f+g', 'f': 'y+u'})
# {'a': 'y+u+g+c', 'b': 'y+u+g', 'f': 'y+u'}
Note that this doesn't detect any loops in the input, so if you give it something like {'a':'b+c','b':'c+a','c':'a+b'} it will enter an infinite loop (although it sounds like this should never happen from your comment).
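If you would rather fail loudly than hang on cyclic input, one option is a recursive expansion that remembers which keys are currently being expanded. The following is only a sketch in Python 3, not a drop-in replacement for the code above: the names expand_all and expand are invented, it assumes keys are whole "words" matched by \w+, and very deep nesting could hit Python's recursion limit.

```python
import re


def expand_all(d):
    """Expand every value of d, substituting keys that appear as whole words."""
    resolved = {}

    def expand(key, in_progress):
        if key in resolved:
            return resolved[key]          # already expanded once, reuse it
        if key in in_progress:
            raise ValueError('cycle detected at key %r' % key)
        in_progress.add(key)
        # Replace each word that is itself a key with its expanded value.
        result = re.sub(
            r'\w+',
            lambda m: expand(m.group(0), in_progress) if m.group(0) in d else m.group(0),
            d[key],
        )
        in_progress.discard(key)
        resolved[key] = result
        return result

    for key in d:
        expand(key, set())
    return resolved


print(expand_all({'a': 'b+c', 'b': 'f+g', 'f': 'y+u'}))
```

Because each key is expanded at most once and then cached in resolved, the result does not depend on the order of the items in the dict.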
The problem with iterative methods like that is that their runtime is highly sensitive to the depth of the nesting and the order of the items in the dict. This recursive version runs in linear time with the total number of "segments" in the resulting dict, where a segment is each piece of the expression that came from one of the original values.
It also isn't dependent on what symbols are used, so long as the strings used as keys aren't used for anything else.
import re
# this function both returns and mutates
# so that each list only has to be flattened once
def flatten(lst):
    new_lst = []
    for i, item in enumerate(lst):
        if isinstance(item, list):
            new_lst.extend(flatten(item))
        else:
            new_lst.append(item)
    lst[:] = new_lst
    return lst
def flatten_symbols(d):
    # split the values using the keys as delimiters
    delims = re.compile('({})'.format('|'.join(d)))
    d = dict((key, delims.split(value)) for key, value in d.iteritems())
    # turn the value lists into recursive lists
    # replacing each occurrence of a key with the corresponding value
    for key, value in d.iteritems():
        for i, item in enumerate(value):
            if item in d:
                d[key][i] = d[item]
    # flatten the recursive lists
    return dict((key, ''.join(flatten(value))) for key, value in d.iteritems())
d={'s1':{'a':'b+c','b':'f+g', 'f': 'd+e', 'e': 'h+i'},'s2':{'a':'b+c','b':'f+g'}}
new_d = dict((key, flatten_symbols(subdict)) for key, subdict in d.iteritems())
print new_d
You need to put that code in a function. The line of your comment should then call that function on whatever you want to replace, put it into the string, and assign the result into the dict.
This is an iterative procedure with a fuse that blows after too many iterations, to guard against mutual infinite substitution. It:
Uses a regex to split the string on multiple delimiters.
Normalizes the string to remove whitespace.
Escapes the tokens, so you do not need to escape the delimiters.
Try the following implementation.
>>> import re, string
>>> def replace(d,delims,limit=5):
        #Remove any whitespace characters
        d=dict((k,v.translate(None,string.whitespace)) for k,v in d.iteritems())
        #Escape the regex tokens
        delims=re.escape(delims)
        for i in range(limit): #Loop limit, to prevent an infinite loop
            changed=False
            for k,v in d.iteritems():
                #It's best to use a regex if multiple tokens are involved
                r="+".join(d.get(e,e) for e in re.split(delims,v))
                if r!=v:
                    changed=True
                    d[k]=r
            #Break if nothing changed in this iteration
            if not changed:
                break
        return d
>>> replace(d,"+")
{'a': 'y+u+g+c', 'b': 'y+u+g', 'f': 'y+u'}
I want to check if a value is in a list, no matter what the case of the letters is, and I need to do it efficiently.
This is what I have:
if val in list:
But I want it to ignore case
check = "asdf"
checkLower = check.lower()
print any(checkLower == val.lower() for val in ["qwert", "AsDf"])
# prints True
Using the any() function. This method is nice because you aren't recreating the list to have lowercase, it is iterating over the list, so once it finds a true value, it stops iterating and returns.
Demo : http://codepad.org/dH5DSGLP
If you know that your values are all of type str or unicode, you can try this:
if val.lower() in map(str.lower, list):
...Or:
if val.lower() in map(unicode.lower, list):
If you really have just a list of the values, the best you can do is something like
if val.lower() in [x.lower() for x in list]: ...
but it would probably be better to maintain, say, a set or dict whose keys are lowercase versions of the values in the list; that way you won't need to keep iterating over (potentially) the whole list.
Incidentally, using list as a variable name is poor style, because list is also the name of one of Python's built-in types. You're liable to find yourself trying to call the list builtin function (which turns things into lists) and getting confused because your list variable isn't callable. Or, conversely, trying to use your list variable somewhere where it happens to be out of scope and getting confused because you can't index into the list builtin.
You can lower the values and check them:
>>> val
'CaSe'
>>> l
['caSe', 'bar']
>>> val in l
False
>>> val.lower() in (i.lower() for i in l)
True
items = ['asdf', 'Asdf', 'asdF', 'asjdflk', 'asjdklflf']
itemset = set(i.lower() for i in items)
val = 'ASDF'
if val.lower() in itemset: # O(1)
    print('wherever you go, there you are')
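If you are on Python 3 and the strings may contain non-ASCII text, str.casefold() is a more aggressive alternative to lower() for caseless matching. A minimal sketch, with a made-up helper name contains_ignoring_case and an extra sample item added for illustration:

```python
items = ['asdf', 'Asdf', 'asdF', 'Straße']

# Build the lookup set once; membership tests are then constant time on average.
folded = {item.casefold() for item in items}


def contains_ignoring_case(value):
    return value.casefold() in folded


print(contains_ignoring_case('ASDF'))
print(contains_ignoring_case('STRASSE'))
print(contains_ignoring_case('qwert'))
```

For plain ASCII data lower() and casefold() behave the same, so this only matters if your input can contain characters such as "ß".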
Howdy, codeboys and codegirls!
I have come across a simple problem with a seemingly easy solution. But being a Python neophyte, I feel that there is a better approach somewhere.
Say you have a list of mixed strings. There are two basic types of strings in the sack - ones with "=" in them (a=potato) and ones without (Lady Jane). What you need is to sort them into two lists.
The obvious approach is to:
for arg in arguments:
    if '=' in arg:
        equal.append(arg)
    else:
        plain.append(arg)
Is there any other, more elegant way into it? Something like:
equal = [arg for arg in arguments if '=' in arg]
but to sort into multiple lists?
And what if you have more than one type of data?
Try
for arg in arguments:
    lst = equal if '=' in arg else plain
    lst.append(arg)
or (holy ugly)
for arg in arguments:
    (equal if '=' in arg else plain).append(arg)
A third option: Create a class which offers append() and which sorts into several lists.
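A rough sketch of what such a class could look like (Python 3). The class name SortingAppender, the classify callback and the bucket names are all invented for the example:

```python
class SortingAppender:
    """Routes each appended item into one of several lists."""

    def __init__(self, classify):
        self.classify = classify   # function mapping an item to a bucket name
        self.buckets = {}          # bucket name -> list of items

    def append(self, item):
        # Create the bucket on first use, then add the item to it.
        self.buckets.setdefault(self.classify(item), []).append(item)


sorter = SortingAppender(lambda arg: 'equal' if '=' in arg else 'plain')
for arg in ['a=potato', 'Lady Jane', 'b=carrot']:
    sorter.append(arg)

print(sorter.buckets)
```

Whether this is worth it over the plain loop depends on how many places need to do the sorting; for a single call site the loop is probably simpler.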
You can use itertools.groupby() for this:
import itertools
f = lambda x: '=' in x
groups = itertools.groupby(sorted(data, key=f), key=f)
for k, g in groups:
    print k, list(g)
I would just go for two list comprehensions. While that does incur some overhead (two loops over the list), it is more Pythonic to use a list comprehension than a for loop. It's also (in my mind) much more readable than using all sorts of really cool tricks that fewer people know about.
def which_list(s):
    if "=" in s:
        return 1
    return 0
lists = [[], []]
for arg in arguments:
    lists[which_list(arg)].append(arg)
plain, equal = lists
If you have more types of data, add an if clause to which_list, and initialize lists to more empty lists.
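For more than two kinds of data, a variation on the same idea is to return a label instead of an index and collect into a defaultdict, so the buckets do not need to be initialized up front. This is a sketch in Python 3; the "+" category and the label names are assumptions added for illustration.

```python
from collections import defaultdict


def which_kind(s):
    # Order matters: the first matching test wins.
    if '=' in s:
        return 'equal'
    if '+' in s:
        return 'plus'
    return 'plain'


arguments = ['a=potato', 'Lady Jane', 'a+b', 'x=1+2']
groups = defaultdict(list)
for arg in arguments:
    groups[which_kind(arg)].append(arg)

print(dict(groups))
```

Note that 'x=1+2' lands in the 'equal' bucket because the '=' test comes first; reorder the tests if you need a different priority.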
I would go for Edan's approach, e.g.
equal = [arg for arg in arguments if '=' in arg]
plain = [arg for arg in arguments if '=' not in arg]
I read somewhere here that you might be interested in a solution that will work for more than two identifiers (equals sign and space). The following solution just requires you to update the uniques set with anything you would like to match. The results are placed in a dictionary of lists with the identifier as the key.
uniques = set('= ')
matches = dict((key, []) for key in uniques)
for arg in args:
    key = set(arg) & uniques
    try:
        matches[key.pop()].append(arg)
    except KeyError:
        # code to handle where arg does not contain = or ' '.
        pass
Now the above code assumes that you will only have a single match for your identifier in your arg, i.e. that you don't have an arg that looks like this: 'John= equalspace'.
You will also have to think about how you would like to treat cases that don't match anything in the set (a KeyError occurs).
Another approach is to use the filter function, although it's not the most efficient solution.
Example:
>>> l = ['a=s','aa','bb','=', 'a+b']
>>> l2 = filter(lambda s: '=' in s, l)
>>> l3 = filter(lambda s: '+' in s, l)
>>> l2
['a=s', '=']
>>> l3
['a+b']
I put this together, and then see that Ned Batchelder was already on this same tack. I chose to package the splitting method instead of the list chooser, though, and to just use the implicit 0/1 values for False and True.
def split_on_condition(source, condition):
    ret = [],[]
    for s in source:
        ret[condition(s)].append(s)
    return ret
src = "z=1;q=2;lady jane;y=a;lucy in the sky".split(';')
plain,equal = split_on_condition(src, lambda s:'=' in s)
Your approach is the best one. For sorting just into two lists it can't get clearer than that. If you want it to be a one-liner, encapsulate it in a function:
def classify(arguments):
    equal, plain = [], []
    for arg in arguments:
        if '=' in arg:
            equal.append(arg)
        else:
            plain.append(arg)
    return equal, plain
equal, plain = classify(lst)