I tend to use list comprehension a lot in Python because I think it is a clean way to generate lists, but often I find myself coming back a week later and thinking to myself "What the hell did I do this for?!" and it's a 70+ character nested conditional list comprehension statement. I am wondering if it gets to a certain point if I should break it out into if/elif/else, and the performance impact, if any of doing so.
My current circumstance:
Returned structure from call is a list of tuples. I need to cast it to a list, some values need to be cleaned up, and I need to strip the last element from the list.
e.g.
[(val1, ' ', 'ChangeMe', 'RemoveMe'),
(val1, ' ', 'ChangeMe', 'RemoveMe'),
(val1, ' ', 'ChangeMe', 'RemoveMe')]
So in this case, I want to remove RemoveMe, replace all ' ' with '' and replace ChangeMe with val2. I know it is a lot of changes, but the data I am returned is terrible sometimes and I have no control over what is coming to me as a response.
I currently have something like:
response = cursor.fetchall()
response = [['' if item == ' ' else item if item != 'ChangeMe' else 'val2' for item in row][:-1] for row in response]
Is a nested multi-conditional comprehension statement frowned upon? I know stylistically Python prefers to be very readable, but also compact and not as verbose.
Any tips or info would be greatly appreciated. Thanks all!
Python favors one-liners only when they make the code more readable, not when they complicate it.
In this case you have two nested list comprehensions, two chained conditional expressions, and a list slice, all on a single line of more than 100 characters. It is anything but readable.
Sometimes it's better to use a classic for loop.
result = []
for val, space, item, remove in response:
    result.append([val, '', 'val2'])
You may then realise that it can be written as a much more comprehensible list comprehension (assuming your filter condition is simple):
result = [[val, '', 'val2'] for val, *_ in response]
Remember: code is written once, but it is read many times.
This is one quick way you could do a list-comprehension making use of a dictionary for mapping items:
response = [('val1', ' ', 'ChangeMe', 'RemoveMe'), ('val1', ' ', 'ChangeMe', 'RemoveMe'), ('val1', ' ', 'ChangeMe', 'RemoveMe')]
map_dict = {' ': '', 'ChangeMe': 'val2', 'val1': 'val1'}
response = [tuple(map_dict[x] for x in tupl if x != 'RemoveMe') for tupl in response]
# [('val1', '', 'val2'), ('val1', '', 'val2'), ('val1', '', 'val2')]
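Note that the mapping above raises a KeyError for any value that is missing from map_dict, which is why 'val1' had to be listed explicitly. A minimal sketch of a variant, assuming the unwanted element is always the last one in each row and that unmapped values should pass through unchanged (the names REPLACEMENTS and clean_row are made up for illustration):

```python
# Hypothetical names; assumes the element to drop is always the last one.
REPLACEMENTS = {' ': '', 'ChangeMe': 'val2'}

def clean_row(row):
    """Drop the last element and replace known placeholder values."""
    # dict.get(item, item) returns the item itself when no replacement exists.
    return [REPLACEMENTS.get(item, item) for item in row[:-1]]

response = [('val1', ' ', 'ChangeMe', 'RemoveMe'),
            ('val3', 'keep', 'ChangeMe', 'RemoveMe')]
cleaned = [clean_row(row) for row in response]
print(cleaned)
# [['val1', '', 'val2'], ['val3', 'keep', 'val2']]
```

Giving the per-row logic a name keeps the outer comprehension short, and new replacements only need a new dictionary entry.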
I have a dataframe which contains nested lists. Some of those lists are empty, or contain only whitespace, e.g.:
df=pd.DataFrame({'example':[[[' ', ' '],['Peter'],[' ', ' '],['bla','blaaa']]]})
For my further operations they are not allowed to be empty and cannot be deleted. Is there a way to fill them with e.g. 'some_string'?
I thought of something similar to
df.example = [[[a.replace(' ','some_string')if all a in i =='\s'for a in i]for i in x] for x in df.example], but this yields an invalid syntax error; furthermore, it wouldn't just fill the list, it would replace each whitespace string in the list.
Since I am still learning Python, my idea of a solution might be too complicated or completely wrong.
i.e. the solution should look like:
example
0 [[some_string], [Peter], [some_string], [bla, blaaa]]
Using apply
Ex:
df=pd.DataFrame({'example':[[[' ', ' '],['Peter'],[' ', ' '],['bla','blaaa']]]})
df["example"] = df["example"].apply(lambda x: [i if "".join(i).strip() else ['some_string'] for i in x])
print(df)
Output:
example
0 [[some_string], [Peter], [some_string], [bla, ...
Note: This will be slow if your data is very large, because apply iterates over the rows in Python.
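If the lambda gets hard to read, the same check can be pulled out into a named function. This is only a sketch in plain Python, so it runs without pandas; the name fill_blank and its filler parameter are made up for illustration, and it assumes every inner element is a string:

```python
def fill_blank(sublists, filler='some_string'):
    """Replace sublists that are empty or whitespace-only with [filler]."""
    # any() is False for an empty sublist and for one holding only blanks.
    return [sub if any(s.strip() for s in sub) else [filler]
            for sub in sublists]

row = [[' ', ' '], ['Peter'], [], ['bla', 'blaaa']]
filled = fill_blank(row)
print(filled)
# [['some_string'], ['Peter'], ['some_string'], ['bla', 'blaaa']]
```

A function like this could then be passed to apply in place of the lambda, e.g. df["example"].apply(fill_blank).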
Say you have the following code:
bicycles = ['Trek','Cannondale','Redline','Specialized']
print(bicycles[0],bicycles[1],bicycles[2],bicycles[3])
This would print out:
Trek Cannondale Redline Specialized
I have two questions. First, Is there a way to make the print string more organized so that you don't have to type out bicycles multiple times? I know that if you were to just do:
print(bicycles)
It would print the brackets also, which I'm trying to avoid.
Second question: how would I insert commas between the items when the list is printed?
This is how I would like the outcome:
Trek, Cannondale, Redline, Specialized.
I know that I could just do
print("Trek, Cannondale, Redline, Specialized.")
But using a list, is there any way to make it more organized? Or would printing the sentence out be the smartest way of doing it?
Use the .join() method:
The join() method returns a string in which the string elements of the sequence have been joined by the separator str.
Syntax: str.join(sequence)
bicycles = ['Trek','Cannondale','Redline','Specialized']
print(' '.join(bicycles))
Output:
Trek Cannondale Redline Specialized
Example: change the separator to ', ':
print(', '.join(bicycles))
Output:
Trek, Cannondale, Redline, Specialized
In Python 3 you can also use unpacking:
We can use * to unpack the list so that all of its elements are passed as separate arguments.
bicycles = ['Trek','Cannondale','Redline','Specialized']
print(*bicycles)
Output:
Trek Cannondale Redline Specialized
NOTE:
print() uses ' ' as the default separator; you can specify another one with sep, e.g.:
print(*bicycles, sep=', ')
Output:
Trek, Cannondale, Redline, Specialized
It will also work if the elements in the list are different types (without having to explicitly cast to string)
eg, if bicycles was ['test', 1, 'two', 3.5, 'something else']
bicycles = ['test', 1, 'two', 3.5, 'something else']
print(*bicycles, sep=', ')
output:
test, 1, two, 3.5, something else
You can use join:
' '.join(bicycles)
', '.join(bicycles)
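One caveat: join() only accepts strings, so a list with mixed types has to be converted first. A small sketch, reusing the mixed list from the earlier answer and adding the trailing period the question asked for:

```python
items = ['test', 1, 'two', 3.5, 'something else']

# map(str, ...) converts every element to a string before joining.
line = ', '.join(map(str, items)) + '.'
print(line)
# test, 1, two, 3.5, something else.
```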
What is an efficient python algorithm to remove all mirrored text duplicates in a list where the items are in the format as below?
ExList = [' dutch italian english', ' italian english dutch', ' dutch italian german', ' dutch german italian' ]
Required result: [' dutch english italian ', 'dutch german italian' ]
This solution uses the set data structure and focuses on producing compact code, mostly with list/set comprehensions and generator expressions. If this is a homework task for a beginner course and you just copy the result, it will be very obvious that you did not write the code yourself. Try to follow the thought process and reproduce the results yourself.
1) split each element at " " (space)
for item in ExList:
    splitted = item.split(" ")
2) Remove the elements that are now empty because of superfluous spaces in the input. This can be done in one line together with the step above (empty strings are "falsy") using a list comprehension:
for item in ExList:
    splitted = [lang for lang in item.split(" ") if lang]
3) Put the result in a set, which by definition disregards order and ignores duplicates. For this step we primarily need the property of unordered identity, meaning set([1, 2]) == set([2, 1]). This can be combined with the line above using a generator comprehension:
for item in ExList:
    itemSet = set(lang for lang in item.split(" ") if lang)
Now, within that loop, put all those sets of languages into another set. This time, because all the item sets with the same items in any order are considered equal, the outer set will automatically disregard any duplicates. To be able to put the item set into another set, it needs to be immutable (because mutability might cause a change in identity), which is called a frozenset in python. The code looks like this:
ExList = [' dutch italian english', ' italian english dutch', ' dutch italian german', ' dutch german italian' ]
result = set()
for item in ExList:
    result.add(frozenset(lang for lang in item.split(" ") if lang))
Or, as a set comprehension on one line:
result = {frozenset(lang for lang in item.split(" ") if lang) for item in ExList}
The result is as follows:
>>> print(result)
{frozenset({'italian', 'dutch', 'german'}), frozenset({'italian', 'dutch', 'english'})}
You can turn that back into lists if the printed set output looks confusing to you (the order of the elements may differ between runs):
>>> print([list(itemSet) for itemSet in result])
[['italian', 'dutch', 'german'], ['italian', 'dutch', 'english']]
This may work for you:
def unique_list(items):
    # A sorted tuple of words is identical for all orderings of the same words.
    x = set(tuple(sorted(s.split())) for s in items)
    return [" ".join(s) for s in x]
print(unique_list(ExList))
This might not be the most efficient solution, but hope it will be of some help.
Using the property that keys of dictionary are unique.
m_dict = {}
for a in ExList:
    b = a.split()
    b.sort()
    m_dict[' '.join(b)] = None
print(list(m_dict.keys()))
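The set-based answers above do not guarantee any particular output order. If the order of first appearance matters, a minimal sketch along the lines of the dictionary approach could look like this; it assumes Python 3.7+, where dicts preserve insertion order, and the names ex_list and dedupe_mirrored are made up for illustration:

```python
ex_list = [' dutch italian english', ' italian english dutch',
           ' dutch italian german', ' dutch german italian']

def dedupe_mirrored(items):
    """Keep one entry per set of words, in order of first appearance."""
    seen = {}
    for item in items:
        # Sorting the words gives the same key for every ordering of them.
        key = tuple(sorted(item.split()))
        seen.setdefault(key, ' '.join(key))
    return list(seen.values())

print(dedupe_mirrored(ex_list))
# ['dutch english italian', 'dutch german italian']
```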
This is my code:
my_dict = {'Anthony Hopkins': ' Hannibal, The Edge, Meet Joe Black, Proof', 'Julia Roberts': ' Pretty Woman, Oceans Eleven, Runaway Bride', 'Salma Hayek': ' Desperado, Wild Wild West', 'Gwyneth Paltrow': ' Shakespeare in Love, Bounce, Proof', 'Meg Ryan': ' You have got mail, Sleepless in Seattle', 'Russell Crowe': ' Gladiator, A Beautiful Mind, Cinderella Man, American Gangster' .....}
dictrev={}
for i in mydict:
    for j in mydict[i] :
        if j not in dictrev:
            dictrev.setdefault(j, []).append(i)
print (dictrev)
The problem is that when I debug, I see that the program reads the values one character at a time (at this line: for j in mydict[i] :), but I need the first value (there are multiple values).
Any suggestions as to what the problem is?
Thank you very much for your help
Could you please format your code like this:
do whatever
You do that by typing enter two times, then for each line of code indenting four spaces. To type normally after that, start a new line and do not type the four spaces at the start of it.
If I understand what you are asking, you want to swap the key and value of the dictionary, and you are getting an error while doing so. I cannot read your unformatted code (no offense), so I will provide a dictionary swapping technique that works for me.
my_dict = {1: "bob", 2: "bill", 3: "rob"}
new_dict = {}
for key in my_dict:
    new_key = my_dict[key]
    new_value = key
    new_dict.update({new_key:new_value})
print(new_dict)
This code works by having the original dictionary, my_dict and the uncompleted reversed dictionary, new_dict. It iterates through my_dict, which only provides the key, and using that key, it finds the value. The value that we want to be a key is assigned to new_key and the key that we want to be a value is assigned to new_value. It then updates the reversed dictionary with the new key/value. The final line prints the new, reversed dictionary. If you want to set my_dict to the reversed dict, use my_dict = new_dict. I hope this answers your question.
As has been pointed out in the comments, the values in your dict are strings, thus iterating over them will produce single characters. Split them into the desired tokens and it will work:
dictrev={} # movie: actors-list (I assume)
for k in mydict:
    for v in mydict[k].split(', '):  # iterate through the comma-separated titles
        dictrev.setdefault(v, []).append(k)
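Because the values in the question start with a space, the first title of each actor would keep that leading space as part of its key. A self-contained sketch that strips every title, using a shortened copy of the question's data (the name movie_to_actors is made up for illustration):

```python
my_dict = {
    'Anthony Hopkins': ' Hannibal, The Edge, Meet Joe Black, Proof',
    'Gwyneth Paltrow': ' Shakespeare in Love, Bounce, Proof',
}

movie_to_actors = {}
for actor, titles in my_dict.items():
    for title in titles.split(','):
        # strip() removes the space left around each title after splitting.
        movie_to_actors.setdefault(title.strip(), []).append(actor)

print(movie_to_actors['Proof'])
# ['Anthony Hopkins', 'Gwyneth Paltrow']
```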
If what you want is to reverse the order of your dictionary values (which are separated by commas), the following may be the solution that you're looking for:
my_dict = {
'Anthony Hopkins': ' Hannibal, The Edge, Meet Joe Black, Proof',
'Julia Roberts' : ' Pretty Woman, Oceans Eleven, Runaway Bride'
}
res_dict = {}
for item in my_dict:
    res_dict[item] = ', '.join(reversed(my_dict[item].strip().split(', '))).strip()
strip() used to remove spaces at the beginning / end of each value
split() used to split values (using the ', ' separator, so the titles keep no leading space)
reversed() used to reverse the resulted list
join() used to form the final value for each key of res_dict
Output:
>>> res_dict
{'Anthony Hopkins': 'Proof, Meet Joe Black, The Edge, Hannibal', 'Julia Roberts': 'Runaway Bride, Oceans Eleven, Pretty Woman'}
This code is based on an elegant answer I received to this question and scaled up to accept nested lists of up to 5 elements. The overall goal is to merge nested lists that have repeating value in index position 1.
The exception pass suppresses the IndexError when a nested list in marker_array has 4 elements. But the code fails to include the last list after the 4 element list in the final output. My understanding was that the purpose of defaultdict was to avoid IndexErrors in the first place.
# Nested list can have 4 or 5 elements per list. Sorted by [1]
marker_array = [
['hard','00:01','soft','tall','round'],
['heavy','00:01','light','skinny','bouncy'],
['rock','00:01','feather','tree','ball'],
['fast','00:35','pidgeon','random'],
['turtle','00:40','wet','flat','tail']]
from collections import defaultdict
d1= defaultdict(list)
d2= defaultdict(list)
d3= defaultdict(list)
d4= defaultdict(list)
# Suppress IndexError due to 4 element list.
# Add + ' ' because ' '.join(d2[x])... create spaces between words.
try:
    for pxa in marker_array:
        d1[pxa[1]].extend(pxa[:1])
        d2[pxa[1]].extend(pxa[2] + ' ')
        d3[pxa[1]].extend(pxa[3] + ' ')
        d4[pxa[1]].extend(pxa[4] + ' ')
except IndexError:
    pass
# Combine all the pieces.
res = [[' '.join(d1[x]),
        x,
        ''.join(d2[x]),
        ''.join(d3[x]),
        ''.join(d4[x])]
       for x in sorted(d1)]
# Remove empty elements.
for p in res:
    if not p[-1]:
        p.pop()
print(res)
The output is almost what I need:
[['hard heavy rock', '00:01', 'soft light feather ', 'tall skinny tree ', 'round bouncy ball '], ['fast', '00:35', 'pidgeon ', 'random ']]
This scaled up version has certainly lost some of the original elegance due to my skill level. Any general pointers on improving this code are much appreciated, but my two main questions in order of importance are:
How can I make sure that the ['turtle','00:40','wet','flat','tail'] nested list is not ignored?
What can I do to avoid trailing white space as in 'soft light feather '?
The problem is the placement of your try block. The IndexError isn't being caused by the defaultdict, it is because you're trying to access pxa[4] in the 4th row of marker_array, which doesn't exist.
Move your try / except inside the for loop, like this:
for pxa in marker_array:
    try:
        d1[pxa[1]].extend(pxa[:1])
        d2[pxa[1]].extend(pxa[2] + ' ')
        d3[pxa[1]].extend(pxa[3] + ' ')
        d4[pxa[1]].extend(pxa[4] + ' ')
    except IndexError:
        pass
The output will now also include the rows that come after the 4-element row.
To answer your second question, you can remove the whitespace by calling strip() or rstrip() on the result of each ''.join() call (e.g. ''.join(d2[x]).strip()).
Because your try statement starts outside the for loop, an exception in the for loop causes the program to go to the except block and not return to the loop afterwards. Instead, put the try before the main block inside the loop:
for pxa in marker_array:
    try:
        d1[pxa[1]].extend(pxa[:1])
        d2[pxa[1]].extend(pxa[2] + ' ')
        d3[pxa[1]].extend(pxa[3] + ' ')
        d4[pxa[1]].extend(pxa[4] + ' ')
    except IndexError:
        pass
Technically it's best practice to include as little code as possible inside the try block, so if you're sure that lists will never have fewer than 4 items, you can move the start of the try block down to the line immediately before you extend d4.
If I understand your code correctly, you're getting the trailing white space because you're adding a space after pxa[4]. Of course, removing the space in d4[pxa[1]].extend(pxa[4] + ' ') such that it's d4[pxa[1]].extend(pxa[4]) won't solve your problem for the shorter lists. Instead, you can not add a space after pxa[3] and instead add one before pxa[4], like this:
d3[pxa[1]].extend(pxa[3])
d4[pxa[1]].extend(' ' + pxa[4])
I think that should fix it.
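As an alternative to the answers above, both problems can be avoided by collecting whole words in lists and joining them with ' ' at the end, which needs neither try/except nor manual spaces. This is only a sketch of a different technique, not the original code; the names merged, columns and values are made up for illustration, and it assumes every row has at least two elements:

```python
from collections import defaultdict

marker_array = [
    ['hard', '00:01', 'soft', 'tall', 'round'],
    ['heavy', '00:01', 'light', 'skinny', 'bouncy'],
    ['rock', '00:01', 'feather', 'tree', 'ball'],
    ['fast', '00:35', 'pidgeon', 'random'],
    ['turtle', '00:40', 'wet', 'flat', 'tail']]

# key (index 1) -> list of columns, each column being a list of words
merged = defaultdict(list)
for row in marker_array:
    values = row[:1] + row[2:]   # every element except the key
    columns = merged[row[1]]
    for i, word in enumerate(values):
        if i == len(columns):    # first time this column position is seen
            columns.append([])
        columns[i].append(word)

# ' '.join() puts spaces only between words, so nothing trails.
res = [[' '.join(cols[0]), key] + [' '.join(c) for c in cols[1:]]
       for key, cols in sorted(merged.items())]
print(res)
```

Rows of any length are handled the same way, so the 4-element row no longer needs special treatment.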