In the beginning I have a master list I intend to put everything in:
master_list = []
I have data stored in nested lists like this:
multi_list = [[1,2,3,4,5],[6,7,8,9,10]]
The end result needs to have this data converted to a list of dicts like this:
master_list
>> [{'x1':1,'x2':2,'y1':3,'y2':4,'id':5},{'x1':6,'x2':7,'y1':8,'y2':9,'id':10}]
So that's my end goal. My approach to reach this goal was as follows:
multi_list = [[1,2,3,4,5],[6,7,8,9,10]]
master_list = []
iterating_dict = {}
for n in multi_list:
    for idx,i in enumerate(['x1','x2','y1','y2','id']):
        iterating_dict[i] = n[idx]
    master_list.append(iterating_dict)
master_list
>>[{'x1':6,'x2':7,'y1':8,'y2':9,'id':10},{'x1':6,'x2':7,'y1':8,'y2':9,'id':10}]
What ends up happening is the second item in multi_list is stored twice. I want it to store converted dicts for all items in multi_list. What I believe this means is the append is not in the right place of the loop. However, when I put it further inside the loop, it appends nothing to master_list. I can't put append further outside without it going out of scope.
What are some conventional approaches to this kind of difficulty in python?
You need to create a new iterating_dict on each iteration of the outer for loop. Otherwise every entry in master_list refers to the same dict object, and each pass through the loop overwrites its values.
Something like this should work:
for n in multi_list:
    iterating_dict = {}
    for idx,i in enumerate(['x1','x2','y1','y2','id']):
        iterating_dict[i] = n[idx]
    master_list.append(iterating_dict)
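To see why the placement of the dict matters, it may help to check object identity. The following is only an illustrative sketch; the names keys, shared, aliased, fresh and current are not from the question.

```python
multi_list = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
keys = ['x1', 'x2', 'y1', 'y2', 'id']

# One dict created outside the loop: every append stores the same object.
shared = {}
aliased = []
for row in multi_list:
    for idx, key in enumerate(keys):
        shared[key] = row[idx]
    aliased.append(shared)

# A new dict per row: each append stores a distinct object.
fresh = []
for row in multi_list:
    current = {}
    for idx, key in enumerate(keys):
        current[key] = row[idx]
    fresh.append(current)

print(aliased[0] is aliased[1])  # True: both entries are the same dict
print(fresh[0] is fresh[1])      # False: two separate dicts
```

Appending a dict stores a reference to it, not a copy, which is why the last row's values show up in every entry of the first list.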
master_list = [{key: l[i] for i, key in enumerate(['x1','x2','y1','y2','id'])} for l in multi_list]
Try the snippet above; it builds each dict with a dict comprehension. Hope this helps.
A concise way to do this in Python is to use zip and a list comprehension.
multi_list = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
dict_keys = ['x1','x2','y1','y2','id']
master_list = [dict(zip(dict_keys, sublist)) for sublist in multi_list]
zip() pairs up the elements of two sequences, and dict() turns those pairs into key-value entries. That is what happens to each sublist in your multi_list together with ['x1','x2','y1','y2','id'].
So here we build one dictionary from dict_keys and sublist, for every sublist in multi_list.
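As a small illustration of the two steps, the sketch below shows what zip and dict produce for a single sublist. The names pairs, record and short are only for this example.

```python
dict_keys = ['x1', 'x2', 'y1', 'y2', 'id']
sublist = [1, 2, 3, 4, 5]

# zip yields (key, value) tuples; list() is used here only to display them.
pairs = list(zip(dict_keys, sublist))
print(pairs)    # [('x1', 1), ('x2', 2), ('y1', 3), ('y2', 4), ('id', 5)]

# dict accepts any iterable of (key, value) pairs.
record = dict(pairs)
print(record)   # {'x1': 1, 'x2': 2, 'y1': 3, 'y2': 4, 'id': 5}

# zip stops at the shorter input, so a short sublist silently drops keys.
short = dict(zip(dict_keys, [1, 2]))
print(short)    # {'x1': 1, 'x2': 2}
```

The last case is worth keeping in mind if the sublists are not guaranteed to have five elements.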
I'm trying to simply append to a list of lists but cannot find a clean example of how to do that. I've looked at dozens of examples but they are all for appending to a one-dimensional list or extending lists.
Sample code:
testList = []
print(testList)
testList.append(3000)
print(testList)
testList[3000].append(1)
testList[3000].append(2)
print(testList)
Expected result:
testList[3000][1, 2]
Actual result:
[]
[3000]
Traceback (most recent call last):
File ".\test.py", line 5, in <module>
testList[3000].append(1)
IndexError: list index out of range
The first problem I see is that when you call testList.append() you are including the [3000]. That is problematic because with a list, that syntax means you're looking for the element at index 3000 within testList. All you need to do is call testList.append(<thing_to_append>) to append an item to testList.
The other problem is that you're expecting [1, 2] to be a separate list, but instead you're appending them each to testList.
If you want testList to be composed of multiple lists, a good starting point would be to instantiate them individually, and to subsequently append to testList. This should help you conceptualize the nested structure you're looking for.
For example:
testList = []
three_k = [3000]
one_two = [1, 2]
testList.append(three_k)
testList.append(one_two)
print(testList)
From there, you can use the indexes of the nested lists to append to them. So if [3000] is the list at index 0 (zero), you can append to it by doing: testList[0].append(<new_append_thing>).
First, thank you to everyone for the quick responses. Several of you got me thinking in the right direction but my original question wasn't complete (apologies) so I'm adding more context here that's hopefully helpful for someone else in the future.
wp-overwatch.com jogged my memory and I realized that after working with only dictionaries in my application for days, I was treating the "3000" like a dictionary key instead of the list index. ("3000" is an example of an ID number that I have to use to track one of the lists of numbers.)
I couldn't use a dictionary, however, because I need to add new entries, remove the first entry, and calculate average for the numbers I'm working with. The answer was to create a dictionary of lists.
Example test code I used:
testDict = {}
blah10 = 10
blah20 = 20
blah30 = 30
blah40 = 40
exampleId = 3000
if exampleId == 3000:
    testDict[3000] = []
    testDict[3000].append(blah10)
    testDict[3000].append(blah20)
    print(testDict)
    testDict[3000].pop(0) # Remove first entry
    print(testDict)
    testDict[3000].append(blah30) # Add new number to the list
    average = sum(testDict[3000]) / len(testDict[3000])
    print(average)
if exampleId == 3001:
    testDict[3001] = [] # Create the list first, otherwise append raises KeyError
    testDict[3001].append(blah30)
    testDict[3001].append(blah40)
Result:
{3000: [10, 20]}
{3000: [20]}
25.0
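One way to avoid creating each list by hand is collections.defaultdict from the standard library. This is a sketch of an alternative, not what the poster used; the name readings is hypothetical.

```python
from collections import defaultdict

readings = defaultdict(list)   # hypothetical name; missing keys start as []

readings[3000].append(10)
readings[3000].append(20)
readings[3001].append(30)      # no KeyError: the list is created on first access

readings[3000].pop(0)          # remove the first entry
readings[3000].append(30)      # add a new number to the list
average = sum(readings[3000]) / len(readings[3000])

print(dict(readings))          # {3000: [20, 30], 3001: [30]}
print(average)                 # 25.0
```

If entries are removed from the front very often, collections.deque with popleft() may be worth a look, since list.pop(0) has to shift every remaining element.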
testList[3000].append(1) tells Python to grab the item at index 3000 in the list and call append on it. Since the list doesn't have that many items, you get that error.
If you want to look up an item by a value such as 3000 instead of by its position in the list, then what you want is not a list but a dictionary.
Using a dictionary you can do:
>>> testList = {} # use curly brackets for a dictionary
>>> print(testList)
{}
>>> testList[3000] = [] # create a new item with the lookup key of 3000 and set the corresponding value to an empty list
>>> print(testList)
{3000: []}
>>> testList[3000].append(1)
>>> testList[3000].append(2)
>>> print(testList)
{3000: [1, 2]}
>>>
This happens because the Python interpreter looks for index 3000 in the list and tries to append the given number to the element stored there, but the list has no such index, so it raises an error.
To fix it:
testList = []
print(testList)
testList.append(3000)
print(testList)
testList.append([1])
testList[1].append(2)
print(testList)
Using the index of the nested list, you can append values to it as shown above.
list.append adds an object to the end of a list. So doing,
listA = []
listA.append(1)
Now listA contains only the object 1, i.e. [1].
You can construct a bigger list by doing the following:
listA = [1]*3000
which gives you a list containing 1 repeated 3000 times: [1,1,1,1,1,...].
If you want to construct a C-like two-dimensional array, you should do the following:
listA = [[1]*3000 for _ in range(3000)]
Now you have a list of lists like [[1,1,1,1,....], [1,1,1,....], .... ]
PS Be careful with [1]*3000. It works in this case because integers are immutable, but if the repeated element is a mutable object such as a list, it will have side effects. For example,
a = [1]
listA = [a]*3000
gives you 3000 references to the same inner list, so a change made through one slot (for example listA[0].append(2)) shows up in all of them.
The safest approach is a list comprehension that creates a new object each time:
listA = [[1] for _ in range(3000)]
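The sharing effect only appears when the repeated element is mutable. A minimal sketch with small sizes (the names rows_shared, rows_separate and numbers are made up for this example):

```python
# Multiplying a list that holds a list copies the reference, not the inner list.
rows_shared = [[0] * 3] * 2
rows_shared[0][0] = 99
print(rows_shared)     # [[99, 0, 0], [99, 0, 0]]

# A comprehension evaluates [0] * 3 once per row, giving independent lists.
rows_separate = [[0] * 3 for _ in range(2)]
rows_separate[0][0] = 99
print(rows_separate)   # [[99, 0, 0], [0, 0, 0]]

# With immutable elements such as ints, assignment replaces a single slot.
numbers = [1] * 3
numbers[0] = 99
print(numbers)         # [99, 1, 1]
```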
In order to update any value, just do the following,
listA[656] = 999
Lastly, in order to extend the above list just type
listA.append(999)
and then at the index 3000 (starting from zero) you will find 999
I have the following code:
s = (f'{item["Num"]}')
my_list = []
my_list.append(s)
print(my_list)
As you can see, I want this to form a list that I can then store under my_list. The output from my code looks like this (a sample from around 2000 different values):
['01849']
['01852']
['01866']
['01883']
etc...
This is not what I had in mind; I want it to look like this:
['01849', '01852', '01866', '01883']
Does anyone have suggestions on what I'm doing wrong when I create the list? Thanks
You can fix your problem and represent this compactly with a list comprehension. Assuming your collection is called items, it can be represented as such, without the loop:
my_list = [f'{item["Num"]}' for item in items]
You should initialize the list once, before the loop, and then use a for loop to populate it. Assuming the collection you are iterating over is called items:
my_list = []
for item in items:
    s = f'{item["Num"]}'
    my_list.append(s)
print(my_list)
Even better, you can also use a list comprehension for this:
my_list = [f'{item["Num"]}' for item in items]
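A runnable sketch of the fix, assuming the values come from a list of dicts with a "Num" key. The items list below is made-up sample data, since the question does not show where item comes from.

```python
# Hypothetical sample data standing in for the question's real source.
items = [{"Num": "01849"}, {"Num": "01852"}, {"Num": "01866"}, {"Num": "01883"}]

my_list = []                  # created once, before the loop
for item in items:
    my_list.append(f'{item["Num"]}')

print(my_list)                # ['01849', '01852', '01866', '01883']

# The same result as a list comprehension.
my_list_short = [f'{item["Num"]}' for item in items]
print(my_list_short == my_list)   # True
```

The original output had one single-element list per line most likely because my_list = [] and print(my_list) ran on every pass of the loop.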
I'm trying to take a list of 4 million entries and, rather than iterate over them all, reduce the list inside the for loop that is iterating over it.
The reduction criterion is found inside the loop. Some later my_huge_list elements contain a combination of 2 consecutive elements that allows them to be discarded immediately.
Here I'm going to remove sublists with 1,2 and A,B in them from my_huge_list.
Please note I don't know in advance that 1,2 and A,B are illegal until I go into my for loop.
output_list = []
my_huge_list = [[0,1,2,3,4],[0,1,3,4],[0,1,2,3,4],[0,1,3,4],[0,1,2,4],[0,1,2,3,4],[A,B],[0,1,3,A,B],[0,1,2,3,4],[0,1,3,4],[0,1,2,3,4],[0,1,3,4],[0,1,2,4]...] #to 4m assorted entries
for sublist in my_huge_list[:]:
    pair = None
    for item_index in sublist[:-1]: #Edit for Barmar. each item in sublist is actually an object with attributes about allowed neighbors.
        if sublist[item_index +1] in sublist[item_index].attributes['excludes_neighbors_list']:
            pair = [sublist[item_index],sublist[item_index +1]] #TODO build a list of pairs
    if pair != None: #Don't want pair in any item of output_list
        my_huge_list = [x for x in my_huge_list if not ','.join(pair) in str(x)] #This list comprehension sole function to reduce my_huge_list from 4m item list to 1.7m items
        #if '1, 2' in str(sublist): #Don't want 1,2 in any item of output_list
            #my_huge_list = [x for x in my_huge_list if not '1, 2' in str(x)] #This list comprehension sole function to reduce my_huge_list
        #elif 'A, B' in str(sublist): #Don't want A,B in any item of output_list
            #my_huge_list = [x for x in my_huge_list if not 'A, B' in str(x)] #This list comprehension sole function to reduce my_huge_list from 1.7m item list to 1.1m items
    else:
        output_list.append(sublist)
my_huge_list
>>>[[0,1,3,4],[0,1,3,4],[0,1,3,4],[0,1,3,4]...]
So the 'for loop' unfortunately does not seem to get any faster because my_huge_list is still iterated over all 4m entries, even though it was quickly reduced by the list comprehension.
[The my_huge_list does not need to be processed in any order and does not need to be retained after this loop.]
[I have considered making the for loop into a sub-function and using map and also the shallow copy but can't figure this architecture out.]
[I'm sure by testing that the removal of list elements by list comprehension is quicker than brute-forcing all 4m sublists.]
Thanks!
Here's my dig on it:
my_huge_list = [[0,1,2,3,4],[0,1,3,4],[0,1,2,3,4],[0,1,3,4],[0,1,2,4],[0,1,2,3,4],['A','B'],[0,1,3,'A','B'],[0,'A','B'],[0,1,2,3,4],[0,1,3,4],[0,1,2,3,4],[0,1,3,4],[0,1,2,4]] #to 4m assorted entries
# ... do whatever and return unwanted list... #
# ... if needed, convert the returned items into lists before putting into unwanted ... #
unwanted = [[1,2], ['A','B']]
index = 0
while index < len(my_huge_list):
    sublist = my_huge_list[index]
    advance = True # renamed from next to avoid shadowing the built-in
    for u in unwanted:
        if u in [sublist[j:j+len(u)] for j in range(len(sublist)-len(u)+1)] or u == sublist:
            my_huge_list.pop(index)
            advance = False
            break # stop at the first match so only one element is popped
    index += advance
print(my_huge_list)
# [[0, 1, 3, 4], [0, 1, 3, 4], [0, 1, 3, 4], [0, 1, 3, 4]]
It's not elegant, but it gets the job done. A big caveat is that modifying a list while iterating over it is bad karma (pros will probably shake their heads at me), but with 4 million entries you can understand that I'm trying to save memory by modifying the list in place.
This also scales: if unwanted holds several sequences of different lengths, each one is still caught in your huge list. A single-element entry should match the element type used in my_huge_list, e.g. if my_huge_list contains a [1], unwanted should contain [1] as well. If the element is a string instead of a list, you'll need that string in unwanted. A bare int or float in unwanted will break the current code, because it calls len() on each entry, but you can add extra handling before you iterate through unwanted.
You're iterating over your huge list twice, once in the main for loop, and then each time you find an invalid element you iterate over it again in the list comprehensions to remove all of those invalid elements.
It would be better to simply filter those elements out of the list once with a list comprehension.
def sublist_contains(l, pair):
    # True if pair[0] is immediately followed by pair[1] somewhere in l
    for i in range(len(l)-1):
        if l[i] == pair[0] and l[i+1] == pair[1]:
            return True
    return False
output_list = [sublist for sublist in my_huge_list if not (sublist_contains(sublist, ['A', 'B']) or sublist_contains(sublist, [1, 2]))]
My sublist_contains() function assumes it's always just two elements in a row you have to test for. You can replace this with a more general function if necessary. See elegant find sub-list in list
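A more general check might look like the sketch below, which tests for a run of any length and filters the list in a single pass. The helper name contains_run and the shortened sample data are assumptions for illustration.

```python
def contains_run(seq, run):
    """Return True if run appears as consecutive elements of seq."""
    n = len(run)
    return any(seq[i:i + n] == run for i in range(len(seq) - n + 1))

# Shortened sample data in the shape used in the question.
my_huge_list = [[0, 1, 2, 3, 4], [0, 1, 3, 4], ['A', 'B'], [0, 1, 3, 'A', 'B'], [0, 1, 2, 4]]
unwanted = [[1, 2], ['A', 'B']]

# One pass over the data: keep a sublist only if no unwanted run occurs in it.
output_list = [s for s in my_huge_list
               if not any(contains_run(s, u) for u in unwanted)]
print(output_list)   # [[0, 1, 3, 4]]
```

This assumes the unwanted runs are known before filtering starts. If they are discovered while scanning, as in the question, a first pass to collect them followed by this filter would still touch each sublist only twice.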
I recently looked for a way to flatten a nested python list, like this: [[1,2,3],[4,5,6]], into this: [1,2,3,4,5,6].
Stackoverflow was helpful as ever and I found a post with this ingenious list comprehension:
l = [[1,2,3],[4,5,6]]
flattened_l = [item for sublist in l for item in sublist]
I thought I understood how list comprehensions work, but apparently I haven't got the faintest idea. What puzzles me most is that besides the comprehension above, this also runs (although it doesn't give the same result):
exactly_the_same_as_l = [item for item in sublist for sublist in l]
Can someone explain how Python interprets these things? Based on the second comprehension, I would expect that Python interprets it back to front, but apparently that is not always the case. If it were, the first comprehension should throw an error, because 'sublist' does not exist. My mind is completely warped, help!
Let's take a look at your list comprehension then, but first let's start with a list comprehension at its simplest.
l = [1,2,3,4,5]
print([x for x in l]) # prints [1, 2, 3, 4, 5]
You can think of this as a for loop structured like so:
for x in l:
    print(x)
Now let's look at another one:
l = [1,2,3,4,5]
a = [x for x in l if x % 2 == 0]
print(a) # prints [2, 4]
That is exactly the same as this:
a = []
l = [1,2,3,4,5]
for x in l:
    if x % 2 == 0:
        a.append(x)
print(a) # prints [2, 4]
Now let's take a look at the examples you provided.
l = [[1,2,3],[4,5,6]]
flattened_l = [item for sublist in l for item in sublist]
print(flattened_l) # prints [1, 2, 3, 4, 5, 6]
To read a list comprehension, start at the leftmost for clause and work your way to the right; each later clause is nested inside the one before it. The expression at the front (item, in this case) is what gets added to the result. It produces this equivalent:
l = [[1,2,3],[4,5,6]]
flattened_l = []
for sublist in l:
    for item in sublist:
        flattened_l.append(item)
Now for the last one
exactly_the_same_as_l = [item for item in sublist for sublist in l]
Using the same knowledge we can create a for loop and see how it would behave:
for item in sublist:
    for sublist in l:
        exactly_the_same_as_l.append(item)
The only reason the above works is that when flattened_l was created, the loop variable sublist was left behind in the enclosing scope (Python 2 list comprehensions leak their loop variables), so it still held the last sublist of l. That scoping quirk is why no error was raised. If you ran it without building flattened_l first, you would get a NameError.
The for loops are evaluated from left to right. Any list comprehension can be re-written as a for loop, as follows:
l = [[1,2,3],[4,5,6]]
flattened_l = []
for sublist in l:
    for item in sublist:
        flattened_l.append(item)
The above is the correct code for flattening a list, whether you choose to write it concisely as a list comprehension, or in this extended version.
The second list comprehension you wrote will raise a NameError, as 'sublist' has not yet been defined. You can see this by writing the list comprehension as a for loop:
l = [[1,2,3],[4,5,6]]
flattened_l = []
for item in sublist:
    for sublist in l:
        flattened_l.append(item)
The only reason you didn't see the error when you ran your code was because you had previously defined sublist when implementing your first list comprehension.
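The left-to-right reading can be checked directly. The sketch below assumes Python 3, where comprehension variables do not leak, and uses the made-up names nested, inner, row and cell so that nothing defined earlier can mask the error.

```python
nested = [[1, 2, 3], [4, 5, 6]]

# Clauses read left to right, outermost loop first.
flat = [item for inner in nested for item in inner]

# The same thing written as explicit loops.
flat_loops = []
for inner in nested:
    for item in inner:
        flat_loops.append(item)

print(flat == flat_loops)   # True

def reversed_clauses():
    # The first clause iterates over row before the second clause defines it.
    try:
        return [cell for cell in row for row in nested]
    except NameError as exc:
        return type(exc).__name__

print(reversed_clauses())   # reports the name-lookup error instead of a list
```

With the clauses swapped, the first for needs a value that only the second for would provide, so the lookup fails unless a variable of that name already exists in an enclosing scope.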
For more information, you may want to check out Guido's tutorial on list comprehensions.
For the lazy dev that wants a quick answer:
>>> a = [[1,2], [3,4]]
>>> [i for g in a for i in g]
[1, 2, 3, 4]
While this approach definitely works for flattening lists, I wouldn't recommend it unless your sublists are known to be very small (1 or 2 elements each).
I've done a bit of profiling with timeit and found that this takes roughly 2-3 times longer than using a single loop and calling extend…
def flatten(l):
    flattened = []
    for sublist in l:
        flattened.extend(sublist)
    return flattened
While it's not as pretty, the speedup is significant. I suppose this works so well because extend can more efficiently copy the whole sublist at once instead of copying each element, one at a time. I would recommend using extend if you know your sublists are medium-to-large in size. The larger the sublist, the bigger the speedup.
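Anyone who wants to check the comparison on their own machine could start from a harness like this. It is a rough sketch: the data shape (1,000 sublists of 100 integers) is an arbitrary assumption, and the timings will vary with hardware and Python version.

```python
import timeit

def flatten_comprehension(nested):
    return [item for sublist in nested for item in sublist]

def flatten_extend(nested):
    flattened = []
    for sublist in nested:
        flattened.extend(sublist)
    return flattened

# Hypothetical test data: 1,000 sublists of 100 integers each.
data = [list(range(100)) for _ in range(1000)]

# Both approaches must agree before their speed is worth comparing.
assert flatten_comprehension(data) == flatten_extend(data)

for func in (flatten_comprehension, flatten_extend):
    seconds = timeit.timeit(lambda: func(data), number=20)
    print(f"{func.__name__}: {seconds:.4f} s for 20 runs")
```

Varying the sublist length in data is the quickest way to see whether the gap grows with sublist size as described above.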
One final caveat: obviously, this only holds true if you need to eagerly form this flattened list. Perhaps you'll be sorting it later, for example. If you're ultimately going to just loop through the list as-is, this will not be any better than using the nested loops approach outlined by others. But for that use case, you want to return a generator instead of a list for the added benefit of laziness…
def flatten(l):
    return (item for sublist in l for item in sublist) # note the parens
Note, of course, that this sort of comprehension will only "flatten" a list of lists (or a list of other iterables). Also, if you pass it a list of strings you'll "flatten" it into a list of characters.
To generalize this in a meaningful way you first want to be able to cleanly distinguish between strings (or bytearrays) and other types of sequences (or other Iterables). So let's start with a simple function:
import collections.abc
def non_str_seq(p):
    '''p is putatively an iterable and not a str, bytes, or bytearray'''
    # collections.Iterable was removed in Python 3.10; the ABC lives in collections.abc
    return isinstance(p, collections.abc.Iterable) and not isinstance(p, (str, bytes, bytearray))
Using that we can then build a recursive function to flatten any nested sequence:
def flatten(s):
    '''Recursively flatten any sequence of objects
    '''
    results = list()
    if non_str_seq(s):
        for each in s:
            results.extend(flatten(each))
    else:
        results.append(s)
    return results
There are probably more elegant ways to do this. But this works for all the Python built-in types that I know of. Simple objects (numbers, strings, instances of None, True, False) are all returned wrapped in a list. Dictionaries are returned as lists of keys (in the dictionary's iteration order, which is insertion order on Python 3.7+).
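A few sample calls may make the behaviour described above concrete. This sketch restates the two functions so that it runs on its own; it imports Iterable from collections.abc, which is where current Python versions provide it, and it also treats bytes like str, which is an assumption on my part.

```python
from collections.abc import Iterable

def non_str_seq(p):
    # Iterable, but not one of the string-like types.
    return isinstance(p, Iterable) and not isinstance(p, (str, bytes, bytearray))

def flatten(s):
    results = []
    if non_str_seq(s):
        for each in s:
            results.extend(flatten(each))
    else:
        results.append(s)
    return results

print(flatten([1, [2, [3, 'abc']], (4, 5)]))   # [1, 2, 3, 'abc', 4, 5]
print(flatten('abc'))                           # ['abc']
print(flatten(None))                            # [None]
print(flatten({'a': 1, 'b': 2}))                # ['a', 'b']
```

Because the function recurses once per nesting level, very deeply nested input could hit Python's recursion limit.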
I would like to extend a list while looping over it:
for idx in xrange(len(a_list)):
    item = a_list[idx]
    a_list.extend(fun(item))
(fun is a function that returns a list.)
Question:
Is this already the best way to do it, or is something nicer and more compact possible?
Remarks:
from matplotlib.cbook import flatten
a_list.extend(flatten(fun(item) for item in a_list))
should work but I do not want my code to depend on matplotlib.
for item in a_list:
    a_list.extend(fun(item))
would be nice enough for my taste but seems to cause an infinite loop.
Context:
I have a large number of nodes (in a dict) and some of them are special because they are on the boundary.
'a_list' contains the keys of these special/boundary nodes. Sometimes nodes are added and then every new node that is on the boundary needs to be added to 'a_list'. The new boundary nodes can be determined by the old boundary nodes (expressed here by 'fun') and every boundary node can add several new nodes.
Have you tried list comprehensions? This works by creating a separate list in memory and then adding it to your original list once the comprehension is complete. Basically it's the same as your second example, but instead of importing a flattening function, it flattens through stacked list comprehensions. [edit Matthias: changed + to +=]
a_list += [x for lst in [fun(item) for item in a_list] for x in lst]
EDIT: To explain what's going on.
So the first thing that will happen is this part in the middle of the above code:
[fun(item) for item in a_list]
This will apply fun to every item in a_list and add it to a new list. Problem is, because fun(item) returns a list, now we have a list of lists. So we run a second (stacked) list comprehension to loop through all the lists in our new list that we just created in the original comprehension:
for lst in [fun(item) for item in a_list]
This will allow us to loop through all the lists in order. So then:
[x for lst in [fun(item) for item in a_list] for x in lst]
This means take every x (that is, every item) in every lst (all the lists we created in our original comprehension) and add it to a new list.
Hope this is clearer. If not, I'm always willing to elaborate further.
Using itertools, it can be written as:
import itertools
a_list += itertools.chain(*map(fun, a_list)) # map replaces itertools.imap on Python 3; the * unpacking finishes the map before a_list grows
or, if you're aiming for code golf:
a_list += sum(map(fun, a_list), [])
Alternatively, just write it out:
new_elements = list(map(fun, a_list)) # evaluate eagerly; a lazy map (Python 3) or itertools.imap (Python 2) would also visit the items appended below
for ne in new_elements:
    a_list.extend(ne)
As you want to extend the list, but loop only over the original list, you can loop over a copy instead of the original:
for item in a_list[:]:
    a_list.extend(fun(item))
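A small runnable sketch of the copy approach. The fun below is a made-up stand-in for the question's function, which is not shown; it returns one new item per input.

```python
def fun(item):
    # Hypothetical stand-in for the question's fun: returns a list of new items.
    return [item * 10]

a_list = [1, 2, 3]
for item in a_list[:]:        # a_list[:] is a snapshot taken before the loop starts
    a_list.extend(fun(item))

print(a_list)   # [1, 2, 3, 10, 20, 30]
```

Only the three original items are visited, because the loop runs over the snapshot while the appended items land in a_list itself.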
Using generator
original_list = [1, 2]
original_list.extend((x for x in original_list[:]))
# [1, 2, 1, 2]