Get only unique elements from two lists - python

If I have two lists (possibly of different lengths):
x = [1,2,3,4]
f = [1,11,22,33,44,3,4]
and I want:
result = [11,22,33,44]
I'm doing:
for element in f:
    if element in x:
        f.remove(element)
but I'm getting:
result = [11,22,33,44,4]

UPDATE:
Thanks to @Ahito:
In : list(set(x).symmetric_difference(set(f)))
Out: [33, 2, 22, 11, 44]
This article has a neat diagram that explains what the symmetric difference does.
OLD answer:
Using this piece of Python's documentation on sets:
>>> # Demonstrate set operations on unique letters from two words
...
>>> a = set('abracadabra')
>>> b = set('alacazam')
>>> a # unique letters in a
{'a', 'r', 'b', 'c', 'd'}
>>> a - b # letters in a but not in b
{'r', 'd', 'b'}
>>> a | b # letters in a or b or both
{'a', 'c', 'r', 'd', 'b', 'm', 'z', 'l'}
>>> a & b # letters in both a and b
{'a', 'c'}
>>> a ^ b # letters in a or b but not both
{'r', 'd', 'b', 'm', 'z', 'l'}
I came up with this piece of code to obtain unique elements from two lists:
(set(x) | set(f)) - (set(x) & set(f))
or slightly modified to return list:
list((set(x) | set(f)) - (set(x) & set(f))) #if you need a list
Here:
| returns the elements in x, f, or both
& returns the elements in both x and f
- subtracts the result of & from the result of |, leaving the elements that appear in exactly one of the lists
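As a quick check of the explanation above, here is a minimal sketch showing that the union-minus-intersection expression gives the same set as the ^ operator (symmetric difference). The names union_minus_common and symmetric are only illustrative.

```python
x = [1, 2, 3, 4]
f = [1, 11, 22, 33, 44, 3, 4]

# Elements in either list, minus the elements common to both.
union_minus_common = (set(x) | set(f)) - (set(x) & set(f))

# The built-in symmetric difference operator.
symmetric = set(x) ^ set(f)

# Both expressions keep the elements found in exactly one of the lists.
print(union_minus_common == symmetric)  # True
print(sorted(symmetric))                # [2, 11, 22, 33, 44]
```

Sets are unordered, so sorted() is used here only to get a stable display order.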

If you want the unique elements from both lists, this should work:
x = [1,2,3,4]
f = [1,11,22,33,44,3,4]
res = list(set(x+f))
print(res)
# res = [1, 2, 3, 4, 33, 11, 44, 22]

Based on the clarification of this question in a new (closed) question:
If you want all items from the second list that do not appear in the first list you can write:
x = [1,2,3,4]
f = [1,11,22,33,44,3,4]
result = set(f) - set(x) # correct elements, but not yet in sorted order
print(sorted(result)) # sort and print
# Output: [11, 22, 33, 44]
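For what it's worth, the loop in the question misbehaves because f.remove() shifts the remaining items left while the for loop keeps advancing, so the item right after each removal is never tested (that is why the trailing 4 survives). A minimal sketch that builds a new list instead and keeps the original order of f; the names exclude and result are only illustrative:

```python
x = [1, 2, 3, 4]
f = [1, 11, 22, 33, 44, 3, 4]

# Set membership tests are O(1) on average, unlike "in" on a list.
exclude = set(x)

# Build a new list rather than removing from f while iterating over it.
result = [element for element in f if element not in exclude]

print(result)  # [11, 22, 33, 44]
```

Unlike the set-based version, this keeps duplicates in f and does not need a sort to restore the order.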

x = [1, 2, 3, 4]
f = [1, 11, 22, 33, 44, 3, 4]
list(set(x) ^ set(f))
[33, 2, 22, 11, 44]

If you want only the elements of the first list that do not appear in the second, you can use set difference:
a = [1,2,3,4,5]
b = [2,4,1]
list(set(a) - set(b))
Output: [3, 5]

Input :
x = [1,2,3,4]
f = [1,11,22,33,44,3,4]
Code:
l = list(set(x).symmetric_difference(set(f)))
print(l)
Output :
[2, 22, 33, 11, 44]

Your method won't get the unique element "2". What about:
list(set(x).symmetric_difference(f))

A simplified version, in support of @iopheam's answer.
Use set subtraction.
# original list values
x = [1,2,3,4]
f = [1,11,22,33,44,3,4]
# converted to sets
y = set(x) # {1, 2, 3, 4}
z = set(f) # {1, 33, 3, 4, 11, 44, 22}
# set difference: elements of f that are not in x
result = z - y # {33, 11, 44, 22}
# printed with sorted() to display the result in the order the OP asked for
print(f"Result of f - x: {sorted(result)}")
# Result of f - x: [11, 22, 33, 44]

v_child_value = [{'a':1}, {'b':2}, {'v':22}, {'bb':23}]
shop_by_cat_sub_cats = [{'a':1}, {'b':2}, {'bbb':222}, {'bb':23}]
unique_sub_cats = []
for ind in shop_by_cat_sub_cats:
    if ind not in v_child_value:
        unique_sub_cats.append(ind)
# unique_sub_cats is now [{'bbb': 222}]

Python code to build a duplicate-free list of the elements that appear in both lists, keeping the order of a:
a=[1,1,2,3,5,1,8,13,6,21,34,55,89,1,2,3]
b=[1,2,3,4,5,6,7,8,9,10,11,12,2,3,4]
# dict.fromkeys drops duplicates while preserving first-seen order
m=list(dict.fromkeys(value for value in a if value in b))
print(m)
# [1, 2, 3, 5, 8, 6]

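# wrapped in a function so that "return" is valid; the function name is only illustrative
def unique_elements(x, f):
    L = []
    for i in x:
        if i not in f:
            L.append(i)
    for i in f:
        if i not in x:
            L.append(i)
    return L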

Related

How to get the closest indexes within a list

If there are two lists:
One being the items:
items = ['A', 'A', 'A', 'B', 'B', 'C', 'C']
The other being their indexes:
index = [0, 15, 20, 2, 16, 7, 17]
i.e. the first 'A' is at index 0, the second 'A' is at index 15, and so on.
How can I get the closest indexes for the unique items A, B, and C,
i.e. get 15, 16, 17?
You can achieve this with a simple script. Taking those two lists as input (called list_letter and list_number below), look up each letter's first position in the letter list and take the corresponding entry of the number list:
list_of_repeated=[]
list_of_closest=[]
for letter in list_letter:
    if letter in list_of_repeated:
        continue
    else:
        list_of_repeated.append(letter)
        list_of_closest.append(list_number[list_letter.index(letter)])
What you are trying to do is minimize the sum of differences between indices.
You can find the minimal combination like this:
import numpy as np
from itertools import product
items = ['A', 'A', 'A', 'B', 'B', 'C', 'C']
index = [0, 15, 20, 2, 16, 7, 17]
cost = np.inf
# one list of (item, index) pairs per unique item
groups = [[pair for pair in zip(items, index) if pair[0] == i] for i in set(items)]
for combination in product(*groups):
    diff = sum(abs(np.ediff1d([i[1] for i in combination])))
    if diff < cost:
        cost = diff
        idx = combination
print(idx)
This brute-forces the solution. There may be more elegant or faster ways to do it, but this is what came to mind on the fly.
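A NumPy-free variant of the same brute-force idea, as a rough sketch. It assumes "closest" means picking one index per item so that the spread (largest minus smallest chosen index) is as small as possible, which is my reading of the question rather than something it states. The names groups and best are only illustrative.

```python
from itertools import product

items = ['A', 'A', 'A', 'B', 'B', 'C', 'C']
index = [0, 15, 20, 2, 16, 7, 17]

# Group the candidate indexes by item, keeping first-seen order of the items.
groups = {}
for item, idx in zip(items, index):
    groups.setdefault(item, []).append(idx)

# Try every way of picking one index per item and keep the tightest pick.
best = min(product(*groups.values()), key=lambda combo: max(combo) - min(combo))

print(dict(zip(groups, best)))  # {'A': 15, 'B': 16, 'C': 17}
```

This still enumerates every combination, so it is only practical for small inputs.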

Sum values in nested lists python

I have a list:
l = [['a', 10, 30], ['b', 34, 89], ['c', 40, 60],['d',30,20]]
where the first item in each sublist is the name and the other two numbers are marks (sub1 and sub2).
The nested lists can be dynamic, i.e. the number of nested lists can change according to a function.
What I am looking for is:
the average of sub1, i.e. (10+34+40+30)/4, and
similarly of sub2, (30+89+60+20)/4,
and also the average marks of a: (10+30)/2,
the average marks of b: (34+89)/2, and so on.
I tried:
c = 0
for i in range(0,len(list_marks1)):
    c += list_marks1[i][1]
sub_1avg = float(c)/len(list_marks1)
d = 0
for i in range(0,len(list_marks1)):
    d += list_marks1[i][2]
sub_2avg = float(d)/len(list_marks1)
but this is not correct.
Is there any optimal way to do this? Since the number of subjects in my nested lists can also change.
You could just use sum and a generator expression:
>>> l= [['a', 10, 30], ['b', 34, 89], ['c', 40, 60],['d',30,20]]
>>> length = float(len(l)) # in Python 3 you don't need the "float"
>>> sum(subl[1] for subl in l) / length
28.5
>>> sum(subl[2] for subl in l) / length
49.75
Or even do that inside a list comprehension:
>>> [sum(subl[subj] for subl in l) / length for subj in range(1, 3)]
[28.5, 49.75]
Similarly for the average of one sublist:
>>> length = float(len(l[0])) - 1
>>> [sum(subl[1:]) / length for subl in l]
[20.0, 61.5, 50.0, 25.0]
When you have python 3.4 or newer you can replace the sum / length with statistics.mean:
>>> from statistics import mean
>>> [mean(subl[subj] for subl in l) for subj in range(1, 3)]
[28.5, 49.75]
>>> [mean(subl[1:]) for subl in l]
[20, 61.5, 50, 25]
You asked about the best way, so I should probably mention that there are packages dedicated to tabular data. For example, if you have pandas, it's even easier using DataFrame and mean:
>>> import pandas as pd
>>> df = pd.DataFrame(l, columns=['name', 'x', 'y'])
>>> df[['x', 'y']].mean(axis=0)
x 28.50
y 49.75
dtype: float64
>>> df[['x', 'y']].mean(axis=1)
0 20.0
1 61.5
2 50.0
3 25.0
dtype: float64
A functional approach:
l= [['a', 10, 30], ['b', 34, 89], ['c', 40, 60],['d',30,20]]
list(map(lambda x: sum(x)/float(len(x)), list(zip(*l))[1:]))
[28.5, 49.75]
This way will work for any sublist length
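Since the question mentions that the number of subjects can change, here is a small Python 3 sketch of the same transpose idea that computes both kinds of averages without hard-coding column positions. The helper name average and the variable names are only illustrative.

```python
l = [['a', 10, 30], ['b', 34, 89], ['c', 40, 60], ['d', 30, 20]]

def average(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)

# zip(*l) transposes the rows into columns; the first column holds the names.
names, *subject_columns = zip(*l)

# One average per subject column, one average per student row.
subject_averages = [average(column) for column in subject_columns]
student_averages = {row[0]: average(row[1:]) for row in l}

print(subject_averages)  # [28.5, 49.75]
print(student_averages)  # {'a': 20.0, 'b': 61.5, 'c': 50.0, 'd': 25.0}
```

Adding a third subject to every sublist needs no code change, as long as all sublists have the same length.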
l= [['a', 10, 30], ['b', 34, 89], ['c', 40, 60],['d',30,20]]
sub1_avg = sum(n for _,n, _ in l)/float(len(l))
sub2_avg = sum(n for _,_, n in l)/float(len(l))
# use true division (/), not floor division (//), so 61.5 is not truncated to 61.0
student_avgs = [{x[0]: sum(x[1:])/float(len(x)-1)} for x in l]
print("Sub1 avg - {}\nSub2 avg - {}\nstudent avg - {}".format(sub1_avg, sub2_avg, student_avgs))
Sample output:
Sub1 avg - 28.5
Sub2 avg - 49.75
student avg - [{'a': 20.0}, {'b': 61.5}, {'c': 50.0}, {'d': 25.0}]

Python Subtract Arrays Based on Same Time

Is there a way to subtract two arrays while making sure I only subtract elements that have the same timestamp (year, day, hour, and/or minute)?
list1 = [[10, '2013-06-18'],[20, '2013-06-19'], [50, '2013-06-23'], [15, '2013-06-30']]
list2 = [[5, '2013-06-18'], [5, '2013-06-23'], [20, '2013-06-25'], [20, '2013-06-30']]
Looking for:
list1-list2 = [[5, '2013-06-18'], [45, '2013-06-23'], [10, '2013-06-30']]
How about using a defaultdict of lists?
import itertools
from functools import reduce  # reduce is no longer a built-in in Python 3
from operator import sub
from collections import defaultdict
def subtract_lists(l1, l2):
    data = defaultdict(list)
    for sublist in itertools.chain(l1, l2):
        value, date = sublist
        data[date].append(value)
    return [(reduce(sub, v), k) for k, v in data.items() if len(v) > 1]
list1 = [[10, '2013-06-18'],[20, '2013-06-19'], [50, '2013-06-23'], [15, '2013-06-30']]
list2 = [[5, '2013-06-18'], [5, '2013-06-23'], [20, '2013-06-25'], [20, '2013-06-30']]
>>> subtract_lists(list1, list2)
[(-5, '2013-06-30'), (45, '2013-06-23'), (5, '2013-06-18')]
>>> # if you want them sorted by date...
>>> sorted(subtract_lists(list1, list2), key=lambda t: t[1])
[(5, '2013-06-18'), (45, '2013-06-23'), (-5, '2013-06-30')]
Note that the difference for date 2013-06-30 is -5, not +5.
This works by using the date as a dictionary key for a list of all values recorded for that date. Only the lists holding more than one value are then kept, and the values in each of them are reduced by subtraction. If you want the resulting list sorted, you can do so using sorted() with the date item as the key. You could move that operation into the subtract_lists() function if you always want that behavior.
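If the output should follow the order of list1 and always mean "list1 minus list2", a plain dictionary lookup is another option. This is only a sketch and assumes each date appears at most once per list; values_by_date and differences are illustrative names.

```python
list1 = [[10, '2013-06-18'], [20, '2013-06-19'], [50, '2013-06-23'], [15, '2013-06-30']]
list2 = [[5, '2013-06-18'], [5, '2013-06-23'], [20, '2013-06-25'], [20, '2013-06-30']]

# Map each date in list2 to its value for quick lookup.
values_by_date = {date: value for value, date in list2}

# Walk list1 in order and subtract only where list2 has the same date.
differences = [
    [value - values_by_date[date], date]
    for value, date in list1
    if date in values_by_date
]

print(differences)
# [[5, '2013-06-18'], [45, '2013-06-23'], [-5, '2013-06-30']]
```

Dates that occur in only one of the lists are skipped, matching the behavior described above.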
I think this code does what you want:
list1 = [[10, '2013-06-18'],[20, '2013-06-19'], [50, '2013-06-23'], [15, '2013-06-30']]
list2 = [[5, '2013-06-18'], [5, '2013-06-23'], [20, '2013-06-25'], [20, '2013-06-30']]
list1=dict([[i[1],i[0]] for i in list1])
list2=dict([[i[1],i[0]] for i in list2])
def minus(a,b):
    return { k: a.get(k, 0) - b.get(k, 0) for k in set(a) & set(b) }
minus(list1,list2)
# returns the below, which is now a dictionary (key order may vary)
{'2013-06-18': 5, '2013-06-23': 45, '2013-06-30': -5}
# you can convert it back into your format like this
data = [[value,key] for key, value in minus(list1,list2).items()]
But you seem to have an error in your output data. If you want to include data when it's in either list, define minus like this instead:
def minus(a,b):
    return { k: a.get(k, 0) - b.get(k, 0) for k in set(a) | set(b) }
See this answer, on Merge and sum of two dictionaries, for more info.

Python sort array by another positions array

Assume I have two arrays, the first one containing int data, the second one containing positions
a = [11, 22, 44, 55]
b = [0, 1, 10, 11]
That is, I want a[i] to be moved to position b[i] for all i. If no value is specified for a position, insert -1 there,
i.e.
sorted_a = [11, 22,-1,-1,-1,-1,-1,-1,-1,-1, 44, 55]
            ^   ^                           ^   ^
            0   1                           10  11
Another example:
a = [int1, int2, int3]
b = [5, 3, 1]
sorted_a = [-1, int3, -1, int2, -1, int1]
Here's what I've tried:
def sort_array_by_second(a, b):
    sorted = []
    for e1 in a:
        sorted.appendAt(b[e1])
    return sorted
Which I've obviously messed up.
Something like this:
res = [-1]*(max(b)+1) # create a list of required size with only -1's
for i, v in zip(b, a):
    res[i] = v
The idea behind the algorithm:
Create the resulting list with a size capable of holding up to the largest index in b
Populate this list with -1
Iterate through b elements
Set elements in res[b[i]] with its proper value a[i]
This will leave the resulting list with -1 in every position other than the indexes contained in b, which will have their corresponding value of a.
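The steps above can be wrapped in a small function and checked against the second example from the question. This is a minimal sketch; the function name place_at_positions and the fill parameter are my own, and it assumes the positions are non-negative and unique.

```python
def place_at_positions(values, positions, fill=-1):
    """Return a list in which values[i] sits at index positions[i]."""
    if len(values) != len(positions):
        raise ValueError("values and positions must have the same length")
    if not positions:
        return []
    # Size the result to hold the largest position, pre-filled with the filler.
    result = [fill] * (max(positions) + 1)
    for position, value in zip(positions, values):
        result[position] = value
    return result

print(place_at_positions(['int1', 'int2', 'int3'], [5, 3, 1]))
# [-1, 'int3', -1, 'int2', -1, 'int1']
```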
I would use a custom key function as an argument to sort. This will sort the values according to the corresponding value in the other list:
to_be_sorted = ['int1', 'int2', 'int3', 'int4', 'int5']
sort_keys = [4, 5, 1, 2, 3]
sort_key_dict = dict(zip(to_be_sorted, sort_keys))
to_be_sorted.sort(key = lambda x: sort_key_dict[x])
This has the benefit of not counting on the values in sort_keys to be valid integer indexes, which is not a very stable thing to bank on.
>>> a = ["int1", "int2", "int3", "int4", "int5"]
>>> b = [4, 5, 1, 2, 3]
>>> sorted(a, key=lambda x, it=iter(sorted(b)): b.index(next(it)))
['int4', 'int5', 'int1', 'int2', 'int3']
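Paulo Bu's answer is the most Pythonic way. If you want to stick with a function like yours, this will do the trick (it assumes b holds 1-based positions into a):
def sort_array_by_second(a, b):
    result = [] # renamed from "sorted" to avoid shadowing the built-in
    for n in b:
        result.append(a[n-1])
    return result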
Sorts A by the values of B:
A = ['int1', 'int2', 'int3', 'int4', 'int5']
B = [4, 5, 1, 2, 3]
from operator import itemgetter
C = [a for a, b in sorted(zip(A, B), key = itemgetter(1))]
print(C)
Output
['int3', 'int4', 'int5', 'int1', 'int2']
a = [11, 22, 44, 55] # values
b = [0, 1, 10, 11] # indexes to sort by
sorted_a = [-1] * (max(b) + 1)
for index, value in zip(b, a):
    sorted_a[index] = value
print(sorted_a)
# -> [11, 22, -1, -1, -1, -1, -1, -1, -1, -1, 44, 55]

Assignment to discontinuous slices in python

In Matlab I can do this:
s1 = 'abcdef'
s2 = 'uvwxyz'
s1(1:2:end) = s2(1:2:end)
s1 is now 'ubwdyf'
This is just an example of the general:
A(I) = B
where A and B are vectors, I is a vector of indices, and B is the same length as I. (I'm ignoring matrices for the moment.)
What would be the pythonic equivalent of the general case in Python? Preferably it should also run on jython/ironpython (no numpy)
Edit: I used strings as a simple example but solutions with lists (as already posted, wow) are what I was looking for. Thanks.
>>> s1 = list('abcdef')
>>> s2 = list('uvwxyz')
>>> s1[0::2] = s2[0::2]
>>> s1
['u', 'b', 'w', 'd', 'y', 'f']
>>> ''.join(s1)
'ubwdyf'
The main differences are:
Strings are immutable in Python. You can use lists of characters instead though.
Indexing is 0-based in Python.
The slicing syntax is [start : stop : step] where all parameters are optional.
Strings are immutable in Python, so I will use lists in my examples.
You can assign to slices like this:
a = list(range(5))  # in Python 3, range() must be converted to a list before slice assignment
b = list(range(5, 7))
a[1::2] = b
print(a)
which will print
[0, 5, 2, 6, 4]
This will only work for slices with a constant increment. For the more general A[I] = B, you need to use a for loop:
for i, b in zip(I, B):
    A[i] = b
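The general loop can be packaged as a small helper that needs no NumPy, so it should also be usable from Jython or IronPython. This is just a sketch; the name assign_at is made up for illustration, and indices are 0-based as usual in Python.

```python
def assign_at(target, indices, values):
    """Pure-Python stand-in for Matlab's A(I) = B (0-based indices)."""
    indices = list(indices)
    values = list(values)
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    # Modify the target list in place, one position at a time.
    for i, value in zip(indices, values):
        target[i] = value

s1 = list('abcdef')
s2 = list('uvwxyz')
assign_at(s1, range(0, len(s1), 2), s2[0::2])
print(''.join(s1))  # ubwdyf

a = list(range(10))
assign_at(a, [1, 6, 4], [0, 0, 0])
print(a)  # [0, 0, 2, 3, 0, 5, 0, 7, 8, 9]
```

The length check mirrors the requirement in the question that B is the same length as I.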
NumPy arrays can be indexed with an arbitrary list, much as in Matlab:
>>> x = numpy.array(range(10)) * 2 + 5
>>> x
array([ 5, 7, 9, 11, 13, 15, 17, 19, 21, 23])
>>> x[[1,6,4]]
array([ 7, 17, 13])
and assignment:
>>> x[[1,6,4]] = [0, 0, 0]
>>> x
array([ 5, 0, 9, 11, 0, 15, 0, 19, 21, 23])
Unfortunately, I don't think it is possible to get this without numpy, so you'd just need to loop for those.
