Sorry if I get terminology wrong - I've only just started learning Python, and I'm receiving instruction from friends instead of being on an actual course.
I want to search a list containing lots of arrays containing multiple elements, and find arrays with some elements matching, but some different.
In less confusing terms e.g. I have a list of arrays that each contain 2 elements (I think this is called a 2D array?) so:
list = [[1, 2], [2, 2], [3, 5], [4, 1], [5, 2], ...]
In my specific example, the first elements in each sub array just ascend linearly, but the second elements are almost random. I want to find or sort the arrays only by the second number. I could just remove the first number from each array:
list = [2, 2, 5, 1, 2 ...]
And then use something like "if list[x] == 1" to find '1' etc.
(side note: I'm not sure how to find ALL the values if one value is repeated - I can't remember quite what I wrote but it would only ever find the first instance where the value matched, so e.g. it would detect the first '2' but not the second or third)
But I want to keep the first values in each array. My friend told me that you could use a dictionary with values and keys, which would work for my example, but I want to know what the more general method would be.
So in my example, I hoped that if I wrote this:
if list[[?, x]] == [?, 1]
Then it would find the array where the second value of the array was 1, (i.e. [4, 1] in my example) and not care about the first value. Obviously it didn't work because '?' isn't Python syntax as far as I'm aware, but hopefully you can see what I'm trying to do?
So for a more general case, if I had a list of 5 dimensional arrays and I wanted to find the second and fourth values of each array, I would write:
if list[[?, x, ?, y, ?]] == [?, a, ?, b, ?]
And it would match any array where the value of the second element was 'a', and the value of the fourth was 'b'.
e.g. [3, a, 4, b, 7], [20, a, 1, b, 9], ['cat', a, 'dog', b, 'fish'] etc. would all be possible results found by the command.
So I want to know if there's any similar way to my method of using a question mark (but that actually works) to denote that an element in an array can have any value.
To sort on the second element for a list containing lists (or tuples):
from operator import itemgetter
mylist = [[1, 2], [2, 2], [3, 5], [4, 1], [5, 2]]
sortedlist = sorted(mylist, key=itemgetter(1))
See the Python sorting howto.
Use sorted if you want to keep the original list unaffected:
lst = [[1, 2], [2, 2], [3, 5], [4, 1], [5, 2]]
In [103]: sorted(lst, key=lambda x: x[1])
Out[103]: [[4, 1], [1, 2], [2, 2], [5, 2], [3, 5]]
Otherwise, use list.sort to sort the existing list in place:
In [106]: lst.sort(key=lambda x: x[1])
In [107]: lst
Out[107]: [[4, 1], [1, 2], [2, 2], [5, 2], [3, 5]]
or use operator.itemgetter
from operator import itemgetter
In [108]: sorted(lst, key=itemgetter(1))
Out[108]: [[4, 1], [1, 2], [2, 2], [5, 2], [3, 5]]
You could use a list comprehension to build a list of all the desired items:
In [16]: seq = [[1, 2], [2, 2], [3, 5], [4, 1], [5, 2]]
To find all items where the second element is 1:
In [17]: [pair for pair in seq if pair[1] == 1]
Out[17]: [[4, 1]]
This finds all items where the second element is 2:
In [18]: [pair for pair in seq if pair[1] == 2]
Out[18]: [[1, 2], [2, 2], [5, 2]]
Instead of
if list[[?, x, ?, y, ?]] == [?, a, ?, b, ?]
you could use
[item for item in seq if item[1] == 'a' and item[3] == 'b']
Note, however, that each time you use a list comprehension, Python has to loop
through all the elements of seq. If you are doing this search multiple times,
you might be better off building a dict:
import collections
seq = [[1, 2], [2, 2], [3, 5], [4, 1], [5, 2]]
dct = collections.defaultdict(list)
for item in seq:
    key = item[1]
    dct[key].append(item)
And then you could access the items like this:
In [22]: dct[1]
Out[22]: [[4, 1]]
In [23]: dct[2]
Out[23]: [[1, 2], [2, 2], [5, 2]]
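For the more general case in the question (matching on the second and fourth elements at once), the same dict idea can be sketched with a tuple as the key. The names rows, index and hits are made up for illustration, and the strings "a" and "b" stand in for whatever values you are looking for:

```python
from collections import defaultdict

# Hypothetical sample data: five-element rows, as in the question.
rows = [[3, "a", 4, "b", 7], [20, "a", 1, "b", 9], [5, "c", 6, "b", 0]]

# Group rows by the pair (second element, fourth element).
index = defaultdict(list)
for row in rows:
    index[(row[1], row[3])].append(row)

# Look up every row whose second element is "a" and fourth is "b".
# .get avoids creating an empty entry when the key is missing.
hits = index.get(("a", "b"), [])
```

This only pays off if you search many times, since building the index still loops over every row once.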
The list comprehension
[pair for pair in seq if pair[1] == 1]
is roughly equivalent to
result = []
for pair in seq:
    if pair[1] == 1:
        result.append(pair)
in the sense that result would then equal the list comprehension.
The list comprehension is just a syntactically prettier way to express the same
thing.
The list comprehension above has three parts:
[expression for-loop conditional]
The expression is pair, the for-loop is for pair in seq, and the conditional is if pair[1] == 1.
Most, but not all list comprehensions share this syntax. The full list comprehension grammar is given here.
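If you really want something close to the '?' notation from the question, one possible sketch is a small helper that compares an item against a pattern and skips the wildcard positions. ANY and matches are hypothetical names invented here, not anything built into Python:

```python
# Hypothetical wildcard marker: any unique object works as a sentinel.
ANY = object()

def matches(item, pattern):
    """Return True if item equals pattern at every non-wildcard position."""
    return len(item) == len(pattern) and all(
        p is ANY or p == value for value, p in zip(item, pattern)
    )

seq = [[1, 2], [2, 2], [3, 5], [4, 1], [5, 2]]
found = [item for item in seq if matches(item, [ANY, 1])]

rows = [[3, "a", 4, "b", 7], [20, "a", 1, "b", 9], [1, "x", 2, "b", 3]]
wide = [row for row in rows if matches(row, [ANY, "a", ANY, "b", ANY])]
```

Here found holds the pairs whose second element is 1, and wide holds the rows whose second and fourth elements are "a" and "b". It is still a full loop over the list, just with the condition factored out.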
Related
Say I have a 2D list(of floats only)-
a=[[1,2],[1,3],[2,5],[4,3],[3,4],[4,9]]
and I want to remove those 1D lists which have the common 1st element and occur later. For example- the first list [1,2] contains 1 as the first element, so delete the next list with 1st element 1, which in this case is the 2nd list [1,3], then we take [2,5],[4,3],[3,4] as usual but since we took [4,3] already we won't take the [4,9] which has the common first element 4.
So the final output should be-
[[1,2],[2,5],[4,3],[3,4]]
How can this be done in Python? I can think of some nested for loops and a bunch of if else statements which would be clearly inefficient and I hope there is a trick with set/map/zip functions which is a bit more Pythonic.
Use set() to filter out the duplicates:
a = [[1, 2], [1, 3], [2, 5], [4, 3], [3, 4], [4, 9]]
out, seen = [], set()
for item in a:
    if item[0] not in seen:
        seen.add(item[0])
        out.append(item)
print(out)
Prints:
[[1, 2], [2, 5], [4, 3], [3, 4]]
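A possible variation on the same idea, assuming Python 3.7+ (where plain dicts keep insertion order): let a dict keep the first sublist seen for each first element. The name first_by_key is made up for this sketch:

```python
a = [[1, 2], [1, 3], [2, 5], [4, 3], [3, 4], [4, 9]]

# setdefault only stores a value if the key is not present yet,
# so later sublists with the same first element are ignored.
first_by_key = {}
for item in a:
    first_by_key.setdefault(item[0], item)

out = list(first_by_key.values())
print(out)
```

The loop with a seen set above is arguably easier to read; this version is just more compact.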
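a = [[1, 2], [1, 3], [2, 5], [4, 3], [3, 4], [4, 9]]
k = []
# Iterate over a copy (a[:]) so that removing items from a does not skip elements.
[a.remove(i) if i[0] in k else k.append(i[0]) for i in a[:]]
print(a)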
Given the following arrays (arr, indices), I need to sort arr by the element at index i[0], in ascending order if i[1] equals 0 and in descending order if i[1] equals 1, where i refers to each element of the indices array.
Constraints
1<= len(indices) <=10
1<= len(arr) <=10^4
Example
arr=[[1,2,3],[3,2,1],[4,2,1],[6,4,3]]
indices=[[2,0],[0,1]]
required output
[[4,2,1],[3,2,1],[6,4,3],[1,2,3]]
Explanation
first arr gets sorted with respect to 2nd index as (indices[0][0]=2) in ascending order as (indices[0][1]=0)
[[3,2,1],[ 4,2,1],[1,2,3],[6,4,3]]
then it gets sorted with 0th index as (indices[1][0]=0) in descending order as (indices[1][1]=1)
[[4,2,1],[3,2,1],[6,4,3],[1,2,3]]
Note
arr and indices need to be taken as input, so it is not possible for me to write arr.sort(key=lambda x: (x[2],-x[0]))
My Approach
I have tried the following but it is not giving the correct output
arr.sort(key=lambda x:next(x[i[0]] if i[1]==0 else -x[i[0]] for i in indices))
My output
[[3,2,1],[4,2,1],[1,2,3],[6,4,3]]
Expected output
[[4,2,1],[3,2,1],[6,4,3],[1,2,3]]
This one requires a very complex key. It looks to me like you have many different layers of sorting, here, and earlier elements of indices take precedence over later elements, when sort order would be affected.
I think what the sorting key needs to do is return a tuple/iterable, where the first element to sort by is whatever the first element of indices says to do, and the second element to sort by (in case of a tie in the first) is whatever the second element of indices says to do, and so on.
In which case you'd want something like this (a nested comprehension inside the key lambda, to generate that tuple (or, list, in this case)):
arr=[[1,2,3],[3,2,1],[4,2,1],[6,4,3]]
indices=[[2,0],[0,1]]
out = sorted(arr, key=lambda a: [
    (-1 if d else 1) * a[i]
    for (i, d) in indices
])
# [[4, 2, 1], [3, 2, 1], [6, 4, 3], [1, 2, 3]]
For sorting numbers only, you can use the quick hack of multiplying by -1 to sort descending instead of ascending, which is what I did here.
You could rely on the stability of Python's sort, sorting by the least significant key first:
from operator import itemgetter
for i, r in reversed(indices):
    arr.sort(key=itemgetter(i), reverse=r)
This doesn't use the negation trick, so it also works for data other than numbers.
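As a rough sketch, the stable-sort approach can be wrapped in a function that leaves the input untouched. multi_sort is a hypothetical name, and the string rows are made-up data to show that it does not depend on negating numbers:

```python
from operator import itemgetter

def multi_sort(rows, indices):
    """Return a new list sorted by each (index, descending) pair in indices.

    Earlier pairs take precedence, so the passes run in reverse order
    and rely on the sort being stable.
    """
    result = list(rows)
    for i, descending in reversed(indices):
        result.sort(key=itemgetter(i), reverse=bool(descending))
    return result

arr = [[1, 2, 3], [3, 2, 1], [4, 2, 1], [6, 4, 3]]
indices = [[2, 0], [0, 1]]
numbers_sorted = multi_sort(arr, indices)

# Made-up non-numeric data: second column ascending, then first descending.
words = [["b", "x"], ["a", "y"], ["c", "x"]]
words_sorted = multi_sort(words, [[1, 0], [0, 1]])
```

With the example from the question, numbers_sorted comes out as [[4, 2, 1], [3, 2, 1], [6, 4, 3], [1, 2, 3]].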
Check this out:
>>> a
[[6, 4, 3], [4, 2, 1], [3, 2, 1], [1, 2, 3]]
>>> i
[[2, 0], [0, 1]]
>>> for j in enumerate(i):
...     a.sort(key=lambda x:x[j[1][0]],reverse=False if j[1][1]==0 else True)
...     print(a)
...
[[3, 2, 1], [4, 2, 1], [1, 2, 3], [6, 4, 3]]
[[6, 4, 3], [4, 2, 1], [3, 2, 1], [1, 2, 3]]
Is this what you want?
I think there is a small mistake in your second example. It should be
[[6, 4, 3], [4, 2, 1], [3, 2, 1], [1, 2, 3]]
rather than
[[4,2,1],[3,2,1],[6,4,3],[1,2,3]]
this is under
then it gets sorted with 0th index as (indices[1][0]=0) in descending order as (indices[1][1]=1)
Here are the values printed from a record. I need to sum the first numbers, grouped by the second number.
If the second numbers are the same, the first numbers need to be added together.
record = [[2, 3], [3, 3], [5, 4], [1, 4]]
Expected output = [[5, 3], [6, 4]]
You should first sort, then itertools.groupby the second value.
import itertools
import operator
records = [[2, 3], [3, 3], [5, 4], [1, 4]]
records.sort(key=operator.itemgetter(1))
groups = itertools.groupby(records, key=operator.itemgetter(1))
# groups is now an iterator that yields (key, group) pairs:
# (3, [[2, 3], [3, 3]])
# (4, [[5, 4], [1, 4]])
Then produce a list from the results:
result = [[sum(r[0] for r in group), key] for key, group in groups]
You should use a Counter. It will be easy to use and you won't need to sort anything:
from collections import Counter
record = [[2, 3], [3, 3], [5, 4], [1, 4]]
sums = Counter()
for (value, index) in record:
    sums[index] += value
sums
# Counter({3: 5, 4: 6})
It shouldn't be too hard to convert the Counter values to the desired output.
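For instance, one way to do that conversion (a sketch; totals is a made-up name) is a list comprehension over the Counter's items, which come back in insertion order on Python 3.7+:

```python
from collections import Counter

record = [[2, 3], [3, 3], [5, 4], [1, 4]]

sums = Counter()
for value, index in record:
    sums[index] += value

# Turn {second_number: total} back into [total, second_number] pairs.
totals = [[total, index] for index, total in sums.items()]
print(totals)
```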
Algorithm:
1. Sort List according to the second element in the inner list.
2. Merge consecutive elements having the same value of the second element.
record = [[1, 5], [2, 3], [2, 5], [3, 3], [5, 4], [1, 4]]
record.sort(key=lambda x: x[1])  # sort record by the second value of each inner list
newRecord = [record[0].copy()]  # start newRecord with a copy of the first pair
for pair in record[1:]:
    if newRecord[-1][1] == pair[1]:
        # same second element: add the first element to the running total
        newRecord[-1][0] += pair[0]
    else:
        # new second element: start a new group (copy so record is not modified)
        newRecord.append(pair.copy())
print(newRecord)
Hope this helps.
I've seen some questions here very related but their answer doesn't work for me. I have a list of lists where some sublists are repeated but their elements may be disordered. For example
g = [[1, 2, 3], [3, 2, 1], [1, 3, 2], [9, 0, 1], [4, 3, 2]]
The output should be, naturally according to my question:
g = [[1,2,3],[9,0,1],[4,3,2]]
I've tried using set, but it only removes lists that are exactly equal (I thought it should work because sets are by definition unordered). Other questions I visited only have examples with lists that are exactly duplicated, like this one: Python : How to remove duplicate lists in a list of list?. For now, the order of the output (for the list and the sublists) is not a problem.
(ab)using side-effects version of a list comp:
seen = set()
[x for x in g if frozenset(x) not in seen and not seen.add(frozenset(x))]
Out[4]: [[1, 2, 3], [9, 0, 1], [4, 3, 2]]
For those (unlike myself) who don't like using side-effects in this manner:
res = []
seen = set()
for x in g:
    x_set = frozenset(x)
    if x_set not in seen:
        res.append(x)
        seen.add(x_set)
The reason that you add frozensets to the set is that you can only add hashable objects to a set, and neither lists nor vanilla sets are hashable.
If you don't care about the order for lists and sublists (and all items in sublists are unique):
result = set(map(frozenset, g))
If a sublist may have duplicates, e.g. [1, 2, 1, 3], then you could use tuple(sorted(sublist)) instead of frozenset(sublist), because frozenset removes duplicates from a sublist.
If you want to preserve the order of sublists:
def del_dups(seq, key=frozenset):
    seen = {}
    pos = 0
    for item in seq:
        if key(item) not in seen:
            seen[key(item)] = True
            seq[pos] = item
            pos += 1
    del seq[pos:]
Example:
del_dups(g, key=lambda x: tuple(sorted(x)))
See In Python, what is the fastest algorithm for removing duplicates from a list so that all elements are unique while preserving order?
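To make the difference between the two keys concrete, here is a small sketch that returns a new list instead of modifying the input. unique_by is a hypothetical helper, and the sample list is made up so that two sublists contain repeated values:

```python
def unique_by(seq, key):
    """Return the items of seq whose key has not been seen before, in order."""
    seen = set()
    out = []
    for item in seq:
        k = key(item)
        if k not in seen:
            seen.add(k)
            out.append(item)
    return out

g = [[1, 2, 3], [3, 2, 1], [1, 1, 2], [2, 2, 1]]

# frozenset ignores repeats, so [1, 1, 2] and [2, 2, 1] both become {1, 2}.
by_set = unique_by(g, frozenset)

# tuple(sorted(...)) keeps repeats, so those two sublists stay distinct.
by_sorted = unique_by(g, lambda x: tuple(sorted(x)))
```

Which key is right depends on whether [1, 1, 2] and [2, 2, 1] should count as the same sublist in your data.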
What about using frozenset, as mentioned by roippi, this way:
>>> g = [list(x) for x in set(frozenset(i) for i in [set(i) for i in g])]
[[0, 9, 1], [1, 2, 3], [2, 3, 4]]
I would convert each element in the list to a frozenset (which is hashable), then create a set out of it to remove duplicates:
>>> g = [[1, 2, 3], [3, 2, 1], [1, 3, 2], [9, 0, 1], [4, 3, 2]]
>>> set(map(frozenset, g))
set([frozenset([0, 9, 1]), frozenset([1, 2, 3]), frozenset([2, 3, 4])])
If you need to convert the elements back to lists:
>>> list(map(list, set(map(frozenset, g))))
[[0, 9, 1], [1, 2, 3], [2, 3, 4]]
To get right down to it, I'm trying to iterate through a list of coordinate pairs in Python and delete all cases where one of the coordinates is negative. For example:
in the array:
map = [[-1, 2], [5, -3], [2, 3], [1, -1], [7, 1]]
I want to remove all the pairs in which either coordinate is < 0, leaving:
map = [[2, 3], [7, 1]]
My problem is that python lists cannot have any gaps, so if I loop like this:
i = 0
for pair in map:
    for coord in pair:
        if coord < 0:
            del map[i]
    i += 1
All the indices shift when the element is deleted, messing up the iteration and causing all sorts of problems. I've tried storing the indices of the bad elements in another list and then looping through and deleting those elements, but I have the same problem: once one is gone, the whole list shifts and indices are no longer accurate.
Is there something I'm missing?
Thanks.
If the list is not large, then the easiest way is to create a new list:
In [7]: old_map = [[-1, 2], [5, -3], [2, 3], [1, -1], [7, 1]]
In [8]: new_map=[[x,y] for x,y in old_map if not (x<0 or y<0)]
In [9]: new_map
Out[9]: [[2, 3], [7, 1]]
You can follow this up with old_map = new_map if you want to discard the other pairs.
If the list is so large creating a new list of comparable size is a problem, then you can delete elements from a list in-place -- the trick is to delete them from the tail-end first:
the_map = [[-1, 2], [5, -3], [2, 3], [1, -1], [7, 1]]
for i in range(len(the_map)-1, -1, -1):
    pair = the_map[i]
    for coord in pair:
        if coord < 0:
            del the_map[i]
            break  # stop here so a pair with two negatives is not deleted twice
print(the_map)
yields
[[2, 3], [7, 1]]
PS. map is such a useful built-in Python function. It is best not to name a variable map since this overrides the built-in.
You can use list comprehension for this:
>>> mymap = [[-1, 2], [5, -3], [2, 3], [1, -1], [7, 1]]
>>> mymap = [m for m in mymap if m[0] >= 0 and m[1] >= 0]
>>> mymap
[[2, 3], [7, 1]]
If you do not have any other references to the map list, a list comprehension works best:
map = [[a,b] for (a,b) in map if a >= 0 and b >= 0]
If you do have other references and need to actually remove elements from the list referenced by map, you have to iterate over a copy of map:
for coord in map[:]:
    if coord[0] < 0 or coord[1] < 0:
        map.remove(coord)
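Another option when other references exist, sketched here with the made-up names coords and alias, is slice assignment: it replaces the contents of the existing list object, so every reference to that list sees the filtered result.

```python
coords = [[-1, 2], [5, -3], [2, 3], [1, -1], [7, 1]]
alias = coords  # a second reference to the same list object

# Assigning to coords[:] changes the list in place rather than
# rebinding the name coords to a new list.
coords[:] = [pair for pair in coords if pair[0] >= 0 and pair[1] >= 0]

print(alias)
```

The comprehension still builds a temporary list, so this does not save memory; it only keeps the identity of the original list.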
Personally, I prefer in-place modification:
li = [[-1, 2], [5, -3], [2, 3], [1, -1], [7, 1]]
print(li, '\n')
N = len(li)
for i, (a, b) in enumerate(li[::-1], start=1):
    if a < 0 or b < 0:
        del li[N-i]
print(li)
->
[[-1, 2], [5, -3], [2, 3], [1, -1], [7, 1]]
[[2, 3], [7, 1]]
If you wish to do this in place, without creating a new list, simply use a for loop with index running from len(map)-1 down to 0.
for index in range(len(map)-1, -1, -1):
    if hasNegativeCoord(map[index]):
        del map[index]
Not very Pythonic, I admit.
If the list is small enough, it's more efficient to make a copy containing just the elements you need, as detailed in the other answers.
However, if the list is too large, or for some other reason you need to remove the elements from the list object in place, I've found the following little helper function quite useful:
def filter_in_place(func, target, invert=False):
    """Remove all elements of target where func(elem) is false."""
    pos = len(target)-1
    while pos >= 0:
        if (not func(target[pos])) ^ invert:
            del target[pos]
        pos -= 1
In your example, this could be applied as follows:
>>> data = [[-1, 2], [5, -3], [2, 3], [1, -1], [7, 1]]
>>> def is_good(elem):
...     return elem[0] >= 0 and elem[1] >= 0
...
>>> filter_in_place(is_good, data)
>>> data
[[2, 3], [7, 1]]
(This is just a list-oriented version of filter_in_place; one which supports all base Python datatypes is a bit more complex.)
In Python 2, itertools.ifilter()/ifilterfalse() exist to do exactly this: filter an iterable by a predicate (not in-place, obviously). In Python 3, the built-in filter() is already lazy and ifilterfalse() was renamed itertools.filterfalse().
Better still, avoid creating and allocating the entire filtered list object if at all possible, and just iterate over it:
l = [(4,-5), (-8,2), (-2,-3), (4,7)]
# Option 1: create a new filtered list
l_filtered = list(filter(lambda p: p[0] >= 0 and p[1] >= 0, l))
# Option 2: iterate lazily without building the list
for p in filter(lambda p: p[0] >= 0 and p[1] >= 0, l):
    ...  # subsequent code on your filtered items
You probably want del pair instead.