Reshaping vector with indices in Python

I am having trouble filtering a list in Python. I have a vector A in which a few of the elements are -9999999. I want to find those elements, remove them, and remove the corresponding elements in B.
I have tried to collect the indices of the values that are not -9999999 like this:
i = [i for i in range(len(press)) if press[i] != -9999999]
But I get an error when I try to use this index list to reshape press and my other vector:
TypeError: list indices must be integers, not list
The vectors have a length of about 26000.
Basically, given the vectors A and B below, I want to remove the -9999999 elements from A and the corresponding elements 65 and 32 from B.
A = [33,55,-9999999,44,78,22,-9999999,10,34]
B = [22,33,65,87,43,87,32,77,99]

Since you mentioned vectors, I think you're looking for a NumPy-based solution:
>>> import numpy as np
>>> a = np.array(A)
>>> b = np.array(B)
>>> b[a!=-9999999]
array([22, 33, 87, 43, 87, 77, 99])
Pure Python solution using itertools.compress:
>>> from itertools import compress
>>> list(compress(B, (x != -9999999 for x in A)))
[22, 33, 87, 43, 87, 77, 99]
Timing comparisons:
>>> A = [33,55,-9999999,44,78,22,-9999999,10,34]*10000
>>> B = [22,33,65,87,43,87,32,77,99]*10000
>>> a = np.array(A)
>>> b = np.array(B)
>>> %timeit b[a!=-9999999]
100 loops, best of 3: 2.78 ms per loop
>>> %timeit list(compress(B, (x != -9999999 for x in A)))
10 loops, best of 3: 22.3 ms per loop

A = [33,55,-9999999,44,78,22,-9999999,10,34]
B = [22,33,65,87,43,87,32,77,99]
A1, B1 = (list(x) for x in zip(*((a, b) for a, b in zip(A, B) if a != -9999999)))
print(A1)
print(B1)
This yields:
[33, 55, 44, 78, 22, 10, 34]
[22, 33, 87, 43, 87, 77, 99]

c = [j for i, j in zip(A, B) if i != -9999999]
zip pairs up the elements of the two lists, yielding tuples (x, y). The list comprehension then keeps only the elements of B whose partner in A is not -9999999.
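The TypeError in the question comes from indexing a plain Python list with another list, which lists do not support. A minimal sketch of how the index list from the question could still be used, assuming A and B have the same length (the names MISSING, keep, A_clean and B_clean are illustrative and not from the original post):

```python
A = [33, 55, -9999999, 44, 78, 22, -9999999, 10, 34]
B = [22, 33, 65, 87, 43, 87, 32, 77, 99]
MISSING = -9999999  # sentinel value used in the question

# Positions of the valid entries, built as in the question's comprehension.
keep = [k for k in range(len(A)) if A[k] != MISSING]

# A list must be indexed one integer at a time; A[keep] would raise TypeError.
A_clean = [A[k] for k in keep]
B_clean = [B[k] for k in keep]

print(A_clean)  # [33, 55, 44, 78, 22, 10, 34]
print(B_clean)  # [22, 33, 87, 43, 87, 77, 99]
```

Building the index list once and reusing it is convenient when more than two parallel lists need the same filtering.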

Related

how to calculate Manhattan distance (or L1/ cityblock) for two 2D array?

For 1D vector/array it's easier. For example:
array1 = [1, 2, 3]
array2 = [1, 1, 1]
The Manhattan distance will be (0+1+2), which is 3:
import numpy as np
def cityblock_distance(A, B):
    result = np.sum([abs(a - b) for (a, b) in zip(A, B)])
    return result
The output for these two points will be: 3
But what about a 2D array/vector? For example, what is the Manhattan (or L1, or cityblock) distance between two 2D vectors like these:
arr1 = [[29, 30, 36, 30, 18],[37, 37, 49, 54, 23]]
arr2 = [[31, 33, 37, 34, 22],[37, 38, 50, 58, 26]]
If I use the code above, it gives 3 for the 1D vectors. For the 2D vectors, the output is 2281. As I understand it, the Manhattan distance should be computed like this:
differences for the first rows of the two arrays: 2, 3, 1, 4, 4, which sum to 14
differences for the second rows of the two arrays: 0, 1, 1, 4, 3, which sum to 9
The total is 23, so the Manhattan distance between the two 2D arrays should be 23.
Is my calculation wrong, or is there a problem with my understanding of L1 distance?
If arr1 and arr2 are numpy arrays you can use:
# skip if already numpy arrays:
arr1 = np.array(arr1)
arr2 = np.array(arr2)
x = np.sum(np.abs(arr1 - arr2))
print(x)
Prints:
23
You can modify your code to calculate the desired result by comparing each list in the 2d array in turn (using your code for the 1D case) and then summing the result:
def cityblock_distance(A, B):
    result = np.sum([np.sum([abs(a - b) for (a, b) in zip(C, D)]) for C, D in zip(A, B)])
    return result
cityblock_distance(arr1,arr2)
Output:
23
Your code as written won't work, even after fixing the indentation:
In [177]:
     ...: def cityblock_distance(A, B):
     ...:     result = np.sum([abs(a - b) for (a, b) in zip(A, B)])
     ...:     return result
     ...:
In [178]: arr1=[[29, 30, 36, 30, 18],[37, 37, 49, 54, 23]]; arr2=[[31, 33, 37, 34, 22],[37, 38, 50, 58, 26]]
In [179]: cityblock_distance(arr1, arr2)
------------------------------------------------------------------------
TypeError                              Traceback (most recent call last)
<ipython-input-179-d1b44e49141d> in <module>
----> 1 cityblock_distance(arr1, arr2)
<ipython-input-177-907001cbfc9b> in cityblock_distance(A, B)
      1 def cityblock_distance(A, B):
----> 2     result = np.sum([abs(a - b) for (a, b) in zip(A, B)])
      3     return result
<ipython-input-177-907001cbfc9b> in <listcomp>(.0)
      1 def cityblock_distance(A, B):
----> 2     result = np.sum([abs(a - b) for (a, b) in zip(A, B)])
      3     return result
TypeError: unsupported operand type(s) for -: 'list' and 'list'
If you convert to arrays you'll get the L1 norm you wanted:
In [180]: cityblock_distance(np.array(arr1), np.array(arr2))
Out[180]: 23
but, because by default numpy.sum sums all the elements in the array, you can omit the list comprehension altogether:
In [181]: np.sum(abs(np.array(arr1)-np.array(arr2)))
Out[181]: 23
Wikipedia calls your norm the "entry-wise 1,1-norm". You can compute the matrix 1-norm ("maximum absolute column sum") with numpy.linalg.norm:
In [193]: np.linalg.norm(np.array(arr1)-np.array(arr2), 1)
Out[193]: 8.0
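To make the difference between these quantities concrete, here is a small sketch using the arrays from the question. NumPy documents ord=1 for 2-D input as max(sum(abs(x), axis=0)), so the value 8.0 is the largest column sum of the absolute differences. The variable names per_row, entrywise and max_col_sum are illustrative only.

```python
import numpy as np

arr1 = np.array([[29, 30, 36, 30, 18], [37, 37, 49, 54, 23]])
arr2 = np.array([[31, 33, 37, 34, 22], [37, 38, 50, 58, 26]])
diff = np.abs(arr1 - arr2)  # element-wise absolute differences

per_row = diff.sum(axis=1)            # L1 distance of each pair of rows: 14 and 9
entrywise = diff.sum()                # entry-wise L1 distance over the whole array: 23
max_col_sum = diff.sum(axis=0).max()  # largest column sum: 8

print(per_row.tolist(), entrywise, max_col_sum)
```

The entry-wise sum (23) is the value the question asks for; the matrix 1-norm answers a different question and will generally be smaller.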

How to apply the same function to multiple lists

I have several lists and want to apply the same function to each of them. I know how to apply it to one list at a time, but not to all of them at once. Then I want to add the results element by element to get a total.
a = ['a','b','c','d','e']
b = ['b', np.nan,'c','e','a']
c = ['a','b','c','d','e']
I know I could do the following to get the output, but I wanted to do it in separate steps:
a = [1 if 'a' in a else 99 for x in a]
b = [1 if 'a' in b else 99 for x in b]
c = [1 if 'a' in c else 99 for x in c]
I first want the outputs below:
a = [1, 99, 99, 99, 99]
b = [99, 99, 99, 99, 1]
c = [99, 99, 99, 99, 1]
Then add the elements position by position into one final list:
sum = [199, 297, 297, 297, 101]
I'm not sure whether I understood your question correctly. As fgblomqvist commented, I replaced 1 if 'a' in a by 1 if x == 'a' in the list comprehension. Then I basically reproduced your second step with a for-loop and after that I used zip to iterate over the list values of all lists synchronously in order to calculate the sum.
a = ['a','b','c','d','e']
b = ['b','a','c','e','a']
c = ['a','b','c','d','e']
# add the lists to a list.
lists = [a,b,c]
outcomes = []
for l in lists:
    outcome = [1 if x == 'a' else 99 for x in l]
    outcomes.append(outcome)
    print(f'one of the outcomes: {outcome}')
results = []
# iterate over all list values synchronously and calculate the sum
for outs in zip(*outcomes):
    results.append(sum(outs))
    print(f'sum of {outs} is {sum(outs)}')
print(f'final result:{results}')
This is the output:
one of the outcomes: [1, 99, 99, 99, 99]
one of the outcomes: [99, 1, 99, 99, 1]
one of the outcomes: [1, 99, 99, 99, 99]
sum of (1, 99, 1) is 101
sum of (99, 1, 99) is 199
sum of (99, 99, 99) is 297
sum of (99, 99, 99) is 297
sum of (99, 1, 99) is 199
final result:[101, 199, 297, 297, 199]
edit: To avoid looping twice you could join the loops together like so:
lists = [a,b,c]
sums = []
for values in zip(*lists):
    the_sum = 0
    for val in values:
        the_sum += 1 if val == 'a' else 99
    sums.append(the_sum)
print(f'sums: {sums}')
Keep in mind that you can replace 1 if val == 'a' else 99 with some_func(val).
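As a rough sketch of that substitution, the per-element rule can be moved into its own function and the two loops collapsed into one comprehension. The function name score is hypothetical and simply stands in for some_func; the lists are the ones used in this answer.

```python
def score(val):
    """Hypothetical per-element function; stands in for some_func."""
    return 1 if val == 'a' else 99

a = ['a', 'b', 'c', 'd', 'e']
b = ['b', 'a', 'c', 'e', 'a']
c = ['a', 'b', 'c', 'd', 'e']

# Apply score to every element, then add the results position by position.
sums = [sum(score(val) for val in values) for values in zip(a, b, c)]
print(sums)  # [101, 199, 297, 297, 199]
```

Swapping in a different rule then only requires changing the body of score.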
pandas makes this quite easy
(although I'm sure it's almost as easy with just NumPy):
import pandas
df = pandas.DataFrame({'a':a,'b':b,'c':c})
mask = df == 'a'
df[mask] = 1
df[~mask] = 99
df.sum(axis=1)
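For comparison, a NumPy-only version might look roughly like the sketch below. It assumes all lists have the same length and contain only strings (the NaN-free lists from the earlier answer are used here), and the names stacked, scores and totals are illustrative.

```python
import numpy as np

a = ['a', 'b', 'c', 'd', 'e']
b = ['b', 'a', 'c', 'e', 'a']
c = ['a', 'b', 'c', 'd', 'e']

stacked = np.array([a, b, c])             # shape (3, 5), one row per list
scores = np.where(stacked == 'a', 1, 99)  # 1 where the entry is 'a', else 99
totals = scores.sum(axis=0)               # add the rows position by position

print(totals.tolist())  # [101, 199, 297, 297, 199]
```

np.where plays the role of the two masked assignments in the pandas code, and axis=0 sums across the lists rather than along them.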
import numpy as np
a = ['a','b','c','d','e']
b = ['b', np.nan,'c','e','a']
c = ['a','b','c','d','e']
dct = {"a":1}
sum_var = [np.nansum([dct.get(aa,99), dct.get(bb,99), dct.get(cc,99)]) for (aa, bb, cc) in zip(a,b,c)]
Explanation:
You can use a list comprehension (as you did in your example) with a small modification. Instead of iterating through a single list, iterate through all the lists at once. You can achieve this with the built-in function zip(), which essentially zips the lists together.
Since all 3 lists have the same length, you can iterate through them and apply additional transformations to each element, as shown in the example.
The additional function in this example is the dict.get() method, which returns the value for each key, here 1 for "a". Any key that is not in the dictionary returns 99. In the same way, you can use custom functions.

Random shuffle multiple lists python

I have a set of lists in Python and I want to shuffle them all with the same permutation, so that elements at the same position move together, like this:
a = [11, 22, 33, 44] b = [66, 77, 88, 99]
*do some shuffling like [1, 3, 0, 2]*
a = [22, 44, 11, 33] b = [77, 99, 66, 88]
Is this possible?
Here's a solution that uses list comprehensions:
>>> a = [11, 22, 33, 44]
>>> b = [66, 77, 88, 99]
>>> p = [1, 3, 0, 2]
>>>
>>> [a[i] for i in p]
[22, 44, 11, 33]
>>>
>>> [b[i] for i in p]
[77, 99, 66, 88]
>>>
You can use zip in concert with random.shuffle:
import random
a = [1, 2, 3, 4]  # list 1
b = ['a', 'b', 'c', 'd']  # list 2
c = list(zip(a, b))  # zip them together (list() is needed in Python 3)
random.shuffle(c)  # shuffle in place
c = list(zip(*c))  # 'unzip' them
a = c[0]
b = c[1]
print(a)  # e.g. (3, 4, 2, 1)
print(b)  # e.g. ('c', 'd', 'b', 'a')
If you want to retain a,b as lists, then just use a=list(c[0]). If you don't want them to overwrite the original a/b then rename like a1=c[0].
Expanding upon Tom's answer, you can make the p list easily and randomize it like this:
import random
p = [x for x in range(len(a))]
random.shuffle(p)
This works for any size lists, but I'm assuming from your example that they're all equal in size.
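Putting the random index list and the list comprehensions together, a helper along these lines could shuffle any number of equal-length lists with one shared permutation. This is only a sketch: the function name shuffle_together and its seed parameter are hypothetical, not part of the random module.

```python
import random

def shuffle_together(*lists, seed=None):
    """Return shuffled copies of the lists, all using one shared permutation."""
    if not lists:
        return []
    if len({len(lst) for lst in lists}) > 1:
        raise ValueError("all lists must have the same length")
    rng = random.Random(seed)        # seed is optional, for reproducible runs
    p = list(range(len(lists[0])))   # index list 0..n-1
    rng.shuffle(p)                   # one random permutation for every list
    return [[lst[i] for i in p] for lst in lists]

a = [11, 22, 33, 44]
b = [66, 77, 88, 99]
a2, b2 = shuffle_together(a, b, seed=0)
# Pairs stay aligned: every (a2[k], b2[k]) was a pair in the original lists.
print(a2, b2)
```

The originals are left untouched because the helper builds new lists instead of shuffling in place.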
a = [11, 22, 33, 44]
order = [1, 0, 3, 2]  # the desired order
new_a = [a[k] for k in order]  # pick the elements in that order
You just give the order and then use a list comprehension to build the new list.

Inverse cumsum for numpy

A is a (d, e) NumPy array. I compute a (d, e) NumPy array B whose entry B[i,j] is computed as follows:
b = 0
for k in range(i+1, d):
    for l in range(j+1, e):
        b = b + A[k, l]
B[i, j] = b
In other words, B[i,j] is the sum of A[k,l] taken over all indices k>i, l>j; this is sort of the opposite of the usual cumsum applied to both axes. I am wondering if there is a more elegant and faster way to do this (e.g. using np.cumsum)?
Assuming you're trying to do this:
import numpy as np
A = np.arange(15).reshape((5, -1))
def cumsum2_reverse(arr):
    out = np.empty_like(arr)
    d, e = arr.shape
    for i in range(d):  # range replaces Python 2's xrange
        for j in range(e):
            b = 0
            for k in range(i + 1, d):
                for l in range(j + 1, e):
                    b += arr[k, l]
            out[i, j] = b
    return out
Then if you do,
In [1]: A_revsum = cumsum2_reverse(A)
In [2]: A_revsum
Out[2]:
array([[72, 38,  0],
       [63, 33,  0],
       [48, 25,  0],
       [27, 14,  0],
       [ 0,  0,  0]])
You could use np.cumsum on the reverse-ordered arrays to compute the sum. For example, at first you might try something similar to what @Jaime suggested:
In [3]: np.cumsum(np.cumsum(A[::-1, ::-1], 0), 1)[::-1, ::-1]
Out[3]:
array([[105,  75,  40],
       [102,  72,  38],
       [ 90,  63,  33],
       [ 69,  48,  25],
       [ 39,  27,  14]])
Recall that np.cumsum starts with the value in the first row and column (here the last ones, since the array is reversed), so to ensure zeros there, you could shift the output of this operation by one along each axis. This might look like:
def cumsum2_reverse_alt(arr):
    out = np.zeros_like(arr)
    out[:-1, :-1] = np.cumsum(np.cumsum(arr[:0:-1, :0:-1], 0), 1)[::-1, ::-1]
    return out
This gives the same values as above.
In [4]: (cumsum2_reverse(A) == cumsum2_reverse_alt(A)).all()
Out[4]: True
Note that the version that uses np.cumsum is much faster for large arrays. For example:
In [5]: A=np.arange(3000).reshape((50, -1))
In [6]: %timeit cumsum2_reverse(A)
1 loops, best of 3: 453 ms per loop
In [7]: %timeit cumsum2_reverse_alt(A)
10000 loops, best of 3: 24.7 us per loop
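The shift can also be written in two explicit steps, which may make the reasoning easier to follow: first compute the inclusive reversed cumulative sum (indices k >= i and l >= j), then read it one position down and to the right to get the strict sum (k > i, l > j). The sketch below assumes a 2-D array; the name reverse_strict_cumsum2 is illustrative.

```python
import numpy as np

def reverse_strict_cumsum2(arr):
    """Sum of arr[k, l] over k > i and l > j, for every (i, j)."""
    # Inclusive version: full[i, j] sums arr[k, l] over k >= i and l >= j.
    full = np.cumsum(np.cumsum(arr[::-1, ::-1], axis=0), axis=1)[::-1, ::-1]
    out = np.zeros_like(arr)
    # The strict sum at (i, j) equals the inclusive sum at (i + 1, j + 1);
    # the last row and last column have nothing beyond them and stay zero.
    out[:-1, :-1] = full[1:, 1:]
    return out

A = np.arange(15).reshape((5, -1))
B = reverse_strict_cumsum2(A)
print(B)
```

For the 5x3 example array this should reproduce the loop-based result shown earlier, with 72 in the top-left corner and zeros in the last row and column.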

How to multiply each object of several lists in Python

I have three lists and I want to multiply their elements in the order they appear in the lists and then return the results in a new list:
a = [1,5,4,3]
b = [20, 44, 40, 100]
c = [222, 432, 670, 190]
The new list should have the results of these calculations:
new_list = [(1*20*222),(5*44*432), (4*40*670), (3*100*190)]
new_list = [x * y * z for x, y, z in zip(a, b, c)]
Alternatively, which is especially useful if you have more than three lists:
import operator
from functools import reduce  # reduce is no longer a builtin in Python 3
new_list = [reduce(operator.mul, lst, 1) for lst in zip(a, b, c)]
[x * y * z for x, y, z in zip(a, b, c)]
iterates over the "zipped" lists and multiplies the corresponding elements.
In [1]: a = [1,5,4,3]
In [2]: b = [20, 44, 40, 100]
In [3]: c = [222, 432, 670, 190]
In [5]: [(x*y*z) for x,y,z in zip(a,b,c)]
Out[5]: [4440, 95040, 107200, 57000]
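For an arbitrary number of lists, one more option is math.prod, sketched below. This assumes Python 3.8 or newer, where math.prod was added to the standard library; the name lists is illustrative.

```python
import math

a = [1, 5, 4, 3]
b = [20, 44, 40, 100]
c = [222, 432, 670, 190]

lists = [a, b, c]  # any number of equal-length lists
# zip(*lists) yields one tuple per position; math.prod multiplies its items.
new_list = [math.prod(values) for values in zip(*lists)]
print(new_list)  # [4440, 95040, 107200, 57000]
```

Note that zip stops at the shortest input, so lists of unequal length would be silently truncated.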
