Python Subtract Arrays Based on Same Time - python

Is there a way I can subtract two arrays while making sure I only subtract elements that have the same day, hour, year, and/or minute values?
list1 = [[10, '2013-06-18'],[20, '2013-06-19'], [50, '2013-06-23'], [15, '2013-06-30']]
list2 = [[5, '2013-06-18'], [5, '2013-06-23'], [20, '2013-06-25'], [20, '2013-06-30']]
Looking for:
list1-list2 = [[5, '2013-06-18'], [45, '2013-06-23'], [10, '2013-06-30']]

How about using a defaultdict of lists?
import itertools
from functools import reduce  # reduce is only a builtin in Python 2
from operator import sub
from collections import defaultdict
def subtract_lists(l1, l2):
    data = defaultdict(list)
    for sublist in itertools.chain(l1, l2):
        value, date = sublist
        data[date].append(value)
    return [(reduce(sub, v), k) for k, v in data.items() if len(v) > 1]
list1 = [[10, '2013-06-18'],[20, '2013-06-19'], [50, '2013-06-23'], [15, '2013-06-30']]
list2 = [[5, '2013-06-18'], [5, '2013-06-23'], [20, '2013-06-25'], [20, '2013-06-30']]
>>> subtract_lists(list1, list2)
[(-5, '2013-06-30'), (45, '2013-06-23'), (5, '2013-06-18')]
>>> # dict iteration order differs between Python versions; to sort by date explicitly...
>>> sorted(subtract_lists(list1, list2), key=lambda t: t[1])
[(5, '2013-06-18'), (45, '2013-06-23'), (-5, '2013-06-30')]
Note that the difference for date 2013-06-30 is -5 (15 - 20), not the positive value given in the question.
This works by using the date as a dictionary key for a list of all values on that date. The dates whose lists contain more than one value (i.e. dates present in both inputs) are selected, and the values in each of those lists are reduced by subtraction. If you want the resulting list sorted, use sorted() with the date item as the key. You could move that operation into the subtract_lists() function if you always want that behavior.
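A minimal sketch of moving the sort into the function, as suggested above. The name subtract_lists_sorted() is hypothetical, and sorting on the date string only works because 'YYYY-MM-DD' strings sort lexicographically in chronological order:

```python
import itertools
from collections import defaultdict
from functools import reduce
from operator import sub


def subtract_lists_sorted(l1, l2):
    """Hypothetical variant of subtract_lists() that always returns date-sorted results."""
    data = defaultdict(list)
    # Group every value under its date string (values from l1 come first).
    for value, date in itertools.chain(l1, l2):
        data[date].append(value)
    # Keep dates seen more than once, subtract left to right, then sort by date.
    return sorted(
        ((reduce(sub, values), date) for date, values in data.items() if len(values) > 1),
        key=lambda t: t[1],
    )


list1 = [[10, '2013-06-18'], [20, '2013-06-19'], [50, '2013-06-23'], [15, '2013-06-30']]
list2 = [[5, '2013-06-18'], [5, '2013-06-23'], [20, '2013-06-25'], [20, '2013-06-30']]
print(subtract_lists_sorted(list1, list2))
# [(5, '2013-06-18'), (45, '2013-06-23'), (-5, '2013-06-30')]
```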

I think this code does what you want:
list1 = [[10, '2013-06-18'],[20, '2013-06-19'], [50, '2013-06-23'], [15, '2013-06-30']]
list2 = [[5, '2013-06-18'], [5, '2013-06-23'], [20, '2013-06-25'], [20, '2013-06-30']]
list1=dict([[i[1],i[0]] for i in list1])
list2=dict([[i[1],i[0]] for i in list2])
def minus(a,b):
    return { k: a.get(k, 0) - b.get(k, 0) for k in set(a) & set(b) }
minus(list1,list2)
# returns the below, which is now a dictionary (key order may vary)
{'2013-06-18': 5, '2013-06-23': 45, '2013-06-30': -5}
# you can convert it back into your format like this
data = [[value,key] for key, value in minus(list1,list2).items()]
But you seem to have an error in your expected output. If you want to include dates that appear in either list, define minus like this instead:
def minus(a,b):
    return { k: a.get(k, 0) - b.get(k, 0) for k in set(a) | set(b) }
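A short sketch of the union variant end to end, assuming you want the result back in the original [value, date] format sorted by date. Note that a date present in only one list is treated as 0 in the other:

```python
def minus(a, b):
    # Union of keys: a date missing from one side counts as 0 there.
    return {k: a.get(k, 0) - b.get(k, 0) for k in set(a) | set(b)}


list1 = [[10, '2013-06-18'], [20, '2013-06-19'], [50, '2013-06-23'], [15, '2013-06-30']]
list2 = [[5, '2013-06-18'], [5, '2013-06-23'], [20, '2013-06-25'], [20, '2013-06-30']]

# Build {date: value} dictionaries from the [value, date] pairs.
d1 = {date: value for value, date in list1}
d2 = {date: value for value, date in list2}

# Convert back to [value, date] pairs, sorted by the ISO date string.
result = [[value, date] for date, value in sorted(minus(d1, d2).items())]
print(result)
# [[5, '2013-06-18'], [20, '2013-06-19'], [45, '2013-06-23'], [-20, '2013-06-25'], [-5, '2013-06-30']]
```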
See this answer, on Merge and sum of two dictionaries, for more info.

Related

How do you find most duplicates in a 2d list?

I have a 2D list, and I would like to return the most frequently duplicated rows using a list comprehension. For example, given the list below
a = [[10, 15, 17,],[20,21,27],[10,15,17],[21,27,28],[21,27,28],[5,10,15],[15,17,20]]
I would like my result to be
b = [[10,15,17],[21,27,28]]
The common solution for counting repetitions is collections.Counter:
from collections import Counter
a = [[10, 15, 17], [20, 21, 27], [10, 15, 17], [21, 27, 28], [21, 27, 28], [5, 10, 15], [15, 17, 20]]
# count duplicates
counts = Counter(map(tuple, a))
# find the maximum count (the values of counts are the duplicate count)
maximum_count = max(counts.values())
# filter and convert back to list
result = [list(e) for e, count in counts.items() if count == maximum_count]
print(result)
Output
[[10, 15, 17], [21, 27, 28]]
In your case in particular, since the elements of the list are themselves lists, you need to convert them to tuples (to make them hashable); then filter the counts and keep the elements with the maximum count.
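To see why the tuple conversion is needed, here is a small check on a shortened version of the list (a sketch): Counter stores its keys in a dict, so passing rows that are lists raises TypeError, while tuples work.

```python
from collections import Counter

a = [[10, 15, 17], [20, 21, 27], [10, 15, 17]]

# Counter hashes each element; lists are mutable and therefore unhashable.
try:
    Counter(a)
    lists_failed = False
except TypeError as exc:
    lists_failed = True
    print("TypeError:", exc)

# Tuples are hashable, so counting works once each row is converted.
counts = Counter(map(tuple, a))
print(counts[(10, 15, 17)])  # 2
```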
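A one-liner, split across lines here. Note that it keeps every row that appears more than once, which equals the most-duplicated rows in this example only because both repeated rows occur exactly twice: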
[ a[k]
for k in range(len(a))
if a.count( a[k] ) > 1
and k == a.index( a[k] ) ]
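If you need strictly the most-duplicated rows rather than every repeated row, here is a sketch of the same index trick combined with the maximum count (still O(n²) because of list.count(), which is fine for small lists):

```python
a = [[10, 15, 17], [20, 21, 27], [10, 15, 17], [21, 27, 28], [21, 27, 28], [5, 10, 15], [15, 17, 20]]

# Highest number of times any row occurs.
best = max(a.count(row) for row in a)

# Keep rows with that count, only at their first occurrence to avoid repeats.
result = [row for k, row in enumerate(a) if a.count(row) == best and k == a.index(row)]
print(result)  # [[10, 15, 17], [21, 27, 28]]
```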
The simplest way to do this is to find the count of each element and store the maximum count. Then output all elements that have the maximum count, removing duplicates.
The below code will work for you:
a = [[10, 15, 17,],[20,21,27],[10,15,17],[21,27,28],[21,27,28],[5,10,15],[15,17,20]]
check=0
for i in a:
    if a.count(i) > check:
        check=a.count(i) #Check to see maximum count
b=[]
for i in a:
    if a.count(i) == check: #Choosing elements with maximum count
        if i not in b: #Eliminating duplicates
            b.append(i)
print(b)
Output:
[[10, 15, 17], [21, 27, 28]]

Sum columns of parts of 2D array Python

How do you sum the last two columns for rows whose first two columns match?
input:
M = [[1,1,3,5],
[1,1,4,6],
[1,2,3,7],
[1,2,6,6],
[2,1,0,8],
[2,1,3,5],
[2,2,9,6],
[2,2,3,4]]
output:
M = [[1,1,7,11],
[1,2,9,13],
[2,1,3,13],
[2,2,12,10]]
And can you do it with a for loop?
Assuming that matching rows always come in adjacent pairs, you could iterate over M[:-1] and check each row's first two values against those of the next row:
M =[[1,1,3,5],[1,1,4,6],[1,2,3,7],[1,2,6,6],[2,1,0,8],[2,1,3,5],[2,2,9,6],[2,2,3,4]]
t=[]
for i,m in enumerate(M[:-1]):
    if m[0] == M[i+1][0] and m[1]==M[i+1][1]:
        t.append([m[0],m[1],m[2]+M[i+1][2],m[3]+M[i+1][3]])
print(t)
#[[1, 1, 7, 11], [1, 2, 9, 13], [2, 1, 3, 13], [2, 2, 12, 10]]
If the order might be scrambled, I'd use two for loops. The inner one checks m against every row after it (it doesn't need to check earlier rows, since those were already compared against it). Like the first version, this assumes each key appears in exactly two rows.
t = []
for i,m in enumerate(M[:-1]):
    for x,n in enumerate(M[i+1:]):
        if m[0] == n[0] and m[1]==n[1]:
            t.append([m[0],m[1],m[2]+n[2],m[3]+n[3]])
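If a key can appear in any number of rows and in any order, a single for loop that accumulates totals in a dictionary keyed on the first two columns avoids the pairing assumption. This is a sketch, not taken from the answers here; the names totals and result are illustrative:

```python
M = [[1, 1, 3, 5], [1, 1, 4, 6], [1, 2, 3, 7], [1, 2, 6, 6],
     [2, 1, 0, 8], [2, 1, 3, 5], [2, 2, 9, 6], [2, 2, 3, 4]]

totals = {}
for c0, c1, c2, c3 in M:
    key = (c0, c1)
    # Running sums of the last two columns for this key, starting at 0.
    s2, s3 = totals.get(key, (0, 0))
    totals[key] = (s2 + c2, s3 + c3)

# Rebuild rows, ordered by the (column 0, column 1) key.
result = [[k0, k1, s2, s3] for (k0, k1), (s2, s3) in sorted(totals.items())]
print(result)  # [[1, 1, 7, 11], [1, 2, 9, 13], [2, 1, 3, 13], [2, 2, 12, 10]]
```

Unlike the pairwise loops, a key that appears in only one row is kept, with its own values as the totals.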
We can find the unique (column 0, column 1) pairs and then, for each pair, sum the last two columns over the rows that match it.
Not sure what the fastest solution is, but this is one option:
M =[[1,1,3,5],[1,1,4,6],[1,2,3,7],[1,2,6,6],[2,1,0,8],[2,1,3,5],[2,2,9,6],[2,2,3,4]]
ans = []
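for vals in sorted(set((x[0], x[1]) for x in M)):  # sorted() fixes the output order; a bare set has none
    ans.append([vals[0], vals[1], sum(res[2] for res in M if (res[0], res[1]) == vals), sum(res[3] for res in M if (res[0], res[1]) == vals)])
print(ans)  # [[1, 1, 7, 11], [1, 2, 9, 13], [2, 1, 3, 13], [2, 2, 12, 10]]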
A solution with list comprehension and itertools' groupby:
from itertools import groupby
M = [
[1,1,3,5],
[1,1,4,6],
[1,2,3,7],
[1,2,6,6],
[2,1,0,8],
[2,1,3,5],
[2,2,9,6],
[2,2,3,4],
]
print([
[
key[0],
key[1],
sum(x[2] for x in group),
sum(x[3] for x in group),
]
for key, group in [
(key, list(group))
for key, group in groupby(sorted(M), lambda x: (x[0], x[1]))
]
])
Result:
[[1, 1, 7, 11], [1, 2, 9, 13], [2, 1, 3, 13], [2, 2, 12, 10]]
With reduce it can be simplified to:
from itertools import groupby
from functools import reduce
M = [
[1,1,3,5],
[1,1,4,6],
[1,2,3,7],
[1,2,6,6],
[2,1,0,8],
[2,1,3,5],
[2,2,9,6],
[2,2,3,4],
]
print([
reduce(
lambda x, y: [y[0], y[1], y[2] + x[2], y[3] + x[3]],
group,
(0, 0, 0, 0),
)
for _, group in groupby(sorted(M), lambda x: (x[0], x[1]))
])

Get only unique elements from two lists

If I have two lists (possibly of different lengths):
x = [1,2,3,4]
f = [1,11,22,33,44,3,4]
result = [11,22,33,44]
I'm doing:
for element in f:
    if element in x:
        f.remove(element)
I'm getting
result = [11,22,33,44,4]
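Why the 4 survives: removing items from f while iterating over it shifts the later items one position left, so the loop skips the element right after each removal (after 3 is removed, 4 moves into the slot the loop has already visited). A sketch that builds a new list instead, keeping f's order and using a set for fast lookups:

```python
x = [1, 2, 3, 4]
f = [1, 11, 22, 33, 44, 3, 4]

# Build a new list rather than mutating f while looping over it.
x_set = set(x)
result = [element for element in f if element not in x_set]
print(result)  # [11, 22, 33, 44]
```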
UPDATE:
Thanks to @Ahito:
In : list(set(x).symmetric_difference(set(f)))
Out: [33, 2, 22, 11, 44]
This article has a neat diagram that explains what the symmetric difference does.
OLD answer:
Using this piece of Python's documentation on sets:
>>> # Demonstrate set operations on unique letters from two words
...
>>> a = set('abracadabra')
>>> b = set('alacazam')
>>> a # unique letters in a
{'a', 'r', 'b', 'c', 'd'}
>>> a - b # letters in a but not in b
{'r', 'd', 'b'}
>>> a | b # letters in a or b or both
{'a', 'c', 'r', 'd', 'b', 'm', 'z', 'l'}
>>> a & b # letters in both a and b
{'a', 'c'}
>>> a ^ b # letters in a or b but not both
{'r', 'd', 'b', 'm', 'z', 'l'}
I came up with this piece of code to obtain unique elements from two lists:
(set(x) | set(f)) - (set(x) & set(f))
or slightly modified to return list:
list((set(x) | set(f)) - (set(x) & set(f))) #if you need a list
Here:
| operator returns elements in x, f or both
& operator returns elements in both x and f
- operator subtracts the result of & from the result of | and gives us the elements present in only one of the lists
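A quick check (a sketch using the same x and f) that this union-minus-intersection expression gives the same set as the ^ (symmetric difference) operator from the documentation excerpt:

```python
x = [1, 2, 3, 4]
f = [1, 11, 22, 33, 44, 3, 4]

manual = (set(x) | set(f)) - (set(x) & set(f))
builtin = set(x) ^ set(f)

# Both are the elements that appear in exactly one of the lists.
print(manual == builtin)  # True
print(sorted(manual))     # [2, 11, 22, 33, 44]
```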
If you want the unique elements from both lists, this should work:
x = [1,2,3,4]
f = [1,11,22,33,44,3,4]
res = list(set(x+f))
print(res)
# res = [1, 2, 3, 4, 33, 11, 44, 22]
Based on the clarification of this question in a new (closed) question:
If you want all items from the second list that do not appear in the first list you can write:
x = [1,2,3,4]
f = [1,11,22,33,44,3,4]
result = set(f) - set(x) # correct elements, but not yet in sorted order
print(sorted(result)) # sort and print
# Output: [11, 22, 33, 44]
x = [1, 2, 3, 4]
f = [1, 11, 22, 33, 44, 3, 4]
list(set(x) ^ set(f))
[33, 2, 22, 11, 44]
If you only want the elements of the first list that are not in the second, use set difference:
a=[1,2,3,4,5]
b= [2,4,1]
list(set(a) - set(b))
Output: [3, 5]
Input :
x = [1,2,3,4]
f = [1,11,22,33,44,3,4]
Code:
l = list(set(x).symmetric_difference(set(f)))
print(l)
Output :
[2, 22, 33, 11, 44]
Your method won't get the unique element 2. What about:
list(set(x).symmetric_difference(f))
Simplified version, in support of @iopheam's answer.
Use Set Subtraction.
# original list values
x = [1,2,3,4]
f = [1,11,22,33,44,3,4]
# updated to set's
y = set(x) # {1, 2, 3, 4}
z = set(f) # {1, 33, 3, 4, 11, 44, 22}
# parsed to the result variable
result = z - y # {33, 11, 44, 22}
# printed using the sorted() function to display as requested/stated by the op.
print(f"Result of f - x: {sorted(result)}")
# Result of f - x: [11, 22, 33, 44]
If the items are unhashable (such as dicts), sets won't work, but a membership loop does:
v_child_value = [{'a':1}, {'b':2}, {'v':22}, {'bb':23}]
shop_by_cat_sub_cats = [{'a':1}, {'b':2}, {'bbb':222}, {'bb':23}]
unique_sub_cats = []
for ind in shop_by_cat_sub_cats:
    if ind not in v_child_value:
        unique_sub_cats.append(ind)
print(unique_sub_cats)
# [{'bbb': 222}]
Python code to build a de-duplicated list of the elements common to both lists, in first-seen order (falsy values such as 0 are skipped):
a=[1,1,2,3,5,1,8,13,6,21,34,55,89,1,2,3]
b=[1,2,3,4,5,6,7,8,9,10,11,12,2,3,4]
m=list(dict.fromkeys([v for v in a if v in b and v]))
print(m)
# [1, 2, 3, 5, 8, 6]
def unique_elements(x, f):  # wrapped in a function so that return is valid
    L = []
    for i in x:
        if i not in f:
            L.append(i)
    for i in f:
        if i not in x:
            L.append(i)
    return L
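The loop above does a linear list search for every element, so it is O(len(x) * len(f)). A sketch of the same order-preserving symmetric difference with set lookups instead (unique_to_either is a hypothetical name):

```python
def unique_to_either(x, f):
    # Sets make each membership test O(1) on average.
    xs, fs = set(x), set(f)
    # Items only in x (in x's order), then items only in f (in f's order).
    return [i for i in x if i not in fs] + [i for i in f if i not in xs]


x = [1, 2, 3, 4]
f = [1, 11, 22, 33, 44, 3, 4]
print(unique_to_either(x, f))  # [2, 11, 22, 33, 44]
```

Unlike set(x) ^ set(f), this keeps the original order (and any repeats within a single list).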

Write a column next to a list in CSV - python

I have two lists, when I print them separately using:
writer.writerows(list_)
I get
list_1:
0 -0.00042 0.004813 0.010428 0.051006
1 0.000053 0.004531 0.010447 0.051962
2 0.000589 0.004518 0.009801 0.052226
3 0.000083 0.004581 0.010362 0.052288
4 -0.000192 0.003726 0.011258 0.051094
5 0.000281 0.004078 0.01008 0.052156
list_2:
-0.000419554 -0.000366128 0.000223134 0.000306416 0.000114709
I've spent a whole day trying to add list_2 as another column to list_1 and write the result to a CSV file. I thought it would be straightforward, but I'm stuck. I'd appreciate any help.
A general solution which combines columns from two lists of lists (or other iterables):
import csv
import sys
def combine_columns(iterable1, iterable2):
    # zip() pairs rows lazily in Python 3 (itertools.izip did this in Python 2)
    for x, y in zip(iterable1, iterable2):
        yield list(x) + list(y)
list1 = [[11, 12, 13], [21, 22, 23]]
list2 = [[14, 15], [24, 25]]
writer = csv.writer(sys.stdout)
writer.writerows(combine_columns(list1, list2))
Output:
11,12,13,14,15
21,22,23,24,25
Here is an example. Not knowing what your data looks like, I have to guess:
list_1 = [
[1,2,3],
[4,5,6],
[7,8,9],
]
list_2 = [10,20,30]
list_1 = [row + [col] for row, col in zip(list_1, list_2)]
for row in list_1:
    print(row)
Output:
[1, 2, 3, 10]
[4, 5, 6, 20]
[7, 8, 9, 30]
Now that list_1 has list_2 as a new column, you can use the csv module to write it out.
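A minimal sketch of that last step, writing to an in-memory buffer so the result is easy to inspect. For a real file you would use open('combined.csv', 'w', newline='') instead (the filename is only an example):

```python
import csv
import io

list_1 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
list_2 = [10, 20, 30]

# Append each value of list_2 as an extra column of the matching row.
rows = [row + [col] for row, col in zip(list_1, list_2)]

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerows(rows)
print(buffer.getvalue())
```

zip() stops at the shorter input, so if list_2 has fewer values than list_1 has rows (as in the question's sample), the extra rows are silently dropped; itertools.zip_longest(list_1, list_2, fillvalue='') would keep them with an empty last column.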

python array intersection efficiently

I don't know how to compute the intersection of these two arrays:
a = [[125, 1], [193, 1], [288, 23]]
b = [[108, 1], [288, 1], [193, 11]]
result = [[288,24], [193, 12]]
The intersection is by the first element, and the second elements are summed. Any ideas how to do this efficiently?
OK, I made a mistake by not explaining what I mean by efficient, sorry. Consider the following naive implementation:
a = [[125, 1], [193, 1], [288, 23]]
b = [[108, 1], [288, 1], [193, 11]]
result = {}
for i, j in a:
    for k, l in b:
        if i == k:
            result[i] = j + l
print(result)
So I was trying to find a more efficient, more Pythonic solution to my problem. That's why I needed your help.
Try these test cases (my code is included):
Test Case
Running time: 28.6980509758
This data seems like it would be better stored as a dictionary
da = dict(a)
db = dict(b)
once you have it like that you can just:
result = [[k, da[k] + db[k]] for k in set(da.keys()).intersection(db.keys())]
Or, in Python 2.7, you can use viewkeys() instead of a set:
result = [[k, da[k] + db[k]] for k in da.viewkeys() & db]
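In Python 3, viewkeys() no longer exists; dict.keys() itself returns a set-like view, so the same idea can be written as below (sorted() is added only to make the output order deterministic):

```python
a = [[125, 1], [193, 1], [288, 23]]
b = [[108, 1], [288, 1], [193, 11]]

da = dict(a)
db = dict(b)

# keys() views support set operations such as & (intersection).
result = [[k, da[k] + db[k]] for k in sorted(da.keys() & db.keys())]
print(result)  # [[193, 12], [288, 24]]
```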
result = []
ms, mb = (dict(a),dict(b)) if len(a)<len(b) else (dict(b),dict(a))
for k in ms:
    if k in mb:
        result.append([k,ms[k]+mb[k]])
Use a counter:
from collections import Counter
c_sum = Counter()
c_len = Counter()
for elt in a:
    c_sum[elt[0]] += elt[1]
    c_len[elt[0]] += 1
for elt in b:
    c_sum[elt[0]] += elt[1]
    c_len[elt[0]] += 1
print([[k, c_sum[k]] for k, v in c_len.items() if v > 1])
Here you go
a = [[125, 1], [193, 1], [288, 23]]
b = [[108, 1], [288, 1], [193, 11]]
inter = []
for e in a:
    for e2 in b:
        if e[0] == e2[0]:
            inter.append([e[0], e[1]+e2[1]])
print(inter)
Outputs
[[193, 12], [288, 24]]
This solution works if you also want duplicate items within the lists to be counted.
from collections import defaultdict
a = [[125, 1], [193, 1], [288, 23]]
b = [[108, 1], [288, 1], [193, 11]]
d = defaultdict(int)
for value, num in a+b:
    d[value] += num
# Caution: this keeps keys whose summed value exceeds 1, which matches the intersection here
# only because the keys found in a single list (108 and 125) each have a value of 1
result = filter(lambda x:x[1]>1, d.items())
result = list(map(list, result)) # If it's important that the result be a list of lists rather than a list of tuples
print(result)
# [[193, 12], [288, 24]]
To begin with, these are Python lists rather than arrays (true arrays come from the array module or NumPy). It's just a matter of naming, but it can be confusing. The one-liner for this is:
[ [ae[0],ae[1]+be[1]] for be in b for ae in a if be[0] == ae[0] ]
PS: Since you say "intersection", I assume the lists are set-like ("bags", really) and that, as bags, they are properly normalized (i.e. they don't have repeated elements/keys).
Here is how I would approach it, assuming uniqueness on a and b:
k = {} # store totals
its = {} # store intersections
for i in a + b:
    if i[0] in k:
        its[i[0]] = True
        k[i[0]] += i[1]
    else:
        k[i[0]] = i[1]
# then loop through intersections for results
result = [[i, k[i]] for i in its]
I got:
from collections import defaultdict
d = defaultdict(list)
for series in a, b:
    for key, value in series:
        d[key].append(value)
result2 = [(key, sum(values)) for key, values in d.items() if len(values) > 1]  # d.iteritems() in Python 2
which runs in O(len(a)+len(b)), or about 0.02 seconds on my laptop vs 18.79 for yours. I also confirmed that it returned the same results as result.items() from your algorithm.
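A sketch for reproducing that kind of comparison on your own machine: it builds random inputs with unique keys, checks that the naive nested loop and the defaultdict version agree, and times both with timeit. The sizes and seed are arbitrary assumptions, and timings depend on your hardware, so none are shown here:

```python
import random
import timeit
from collections import defaultdict


def naive(a, b):
    # O(len(a) * len(b)): compare every pair of rows.
    result = {}
    for i, j in a:
        for k, l in b:
            if i == k:
                result[i] = j + l
    return result


def grouped(a, b):
    # O(len(a) + len(b)): group values by key, keep keys seen more than once.
    d = defaultdict(list)
    for series in a, b:
        for key, value in series:
            d[key].append(value)
    return {key: sum(values) for key, values in d.items() if len(values) > 1}


random.seed(0)
keys_a = random.sample(range(10000), 1000)  # keys are unique within each list
keys_b = random.sample(range(10000), 1000)
a = [[k, random.randint(1, 100)] for k in keys_a]
b = [[k, random.randint(1, 100)] for k in keys_b]

assert naive(a, b) == grouped(a, b)
print("naive:  ", timeit.timeit(lambda: naive(a, b), number=1))
print("grouped:", timeit.timeit(lambda: grouped(a, b), number=1))
```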
This solution might not be the fastest, but it's probably the simplest implementation, so I decided to post it, for completeness.
from collections import Counter
aa = Counter(dict(a))
bb = Counter(dict(b))
cc = aa + bb
cc
=> Counter({288: 24, 193: 12, 108: 1, 125: 1})
list(cc.items())
=> [(288, 24), (193, 12), (108, 1), (125, 1)]
If you must only include the common keys:
[ (k, cc[k]) for k in set(aa).intersection(bb) ]
=> [(288, 24), (193, 12)]
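One caveat with Counter addition: + drops any key whose total is zero or negative, so this shortcut is only safe when the second elements are positive. A sketch with made-up negative values showing the difference, and Counter.update(), which adds counts without discarding anything:

```python
from collections import Counter

a = [[1, 5], [2, -3]]  # hypothetical data with a negative value
b = [[2, 1]]

added = Counter(dict(a)) + Counter(dict(b))
print(added)  # Counter({1: 5}): key 2 (total -2) was dropped

merged = Counter(dict(a))
merged.update(dict(b))  # update() adds counts and keeps non-positive totals
print(merged)  # Counter({1: 5, 2: -2})
```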
NumPy's searchsorted(), argsort(), and intersect1d() are possible alternatives and can be quite fast. Note that searchsorted() finds only one position per key, so this version assumes the first elements are unique within each array.
>>> import numpy as np
>>> a=np.array([[125, 1], [193, 1], [288, 23]])
>>> b=np.array([[108, 1], [288, 1], [193, 11]])
>>> aT=a.T
>>> bT=b.T
>>> ia=np.argsort(aT[0])
>>> ib=np.argsort(bT[0])
>>> aS=aT[0][ia]
>>> bS=bT[0][ib]
>>> i=np.intersect1d(aT[0], bT[0])
>>> # map positions in the sorted key arrays back to columns of the original arrays
>>> cT=np.hstack((aT[:,ia[np.searchsorted(aS, i)]], bT[:,ib[np.searchsorted(bS, i)]]))
>>> [[item,np.sum(cT[1,np.argwhere(cT[0]==item).flatten()])] for item in i]
[[193, 12], [288, 24]] # not quite happy about this; can someone come up with a vectorized way of doing it?
