Getting a flat list out of a nested list in Python

I am trying to get flat lists out of a nested list (a list of dicts that hold lists):
list_of_data = [{'id': 99,
                 'rocketship': {'price': [10, 10, 10, 10, 10],
                                'ytd': [1, 1, 1.05, 1.1, 1.18]}},
                {'id': 898,
                 'rocketship': {'price': [10, 10, 10, 10, 10],
                                'ytd': [1, 1, 1.05, 1.1, 1.18]}},
                {'id': 903,
                 'rocketship': {'price': [20, 20, 20, 10, 10],
                                'ytd': [1, 1, 1.05, 1.1, 1.18]}},
                {'id': 999,
                 'rocketship': {'price': [20, 20, 20, 10, 10],
                                'ytd': [1, 3, 4.05, 1.1, 1.18]}},
                ]
price, ytd = map(list, zip(*((list_of_data[i]['rocketship']['price'], list_of_data[i]['rocketship']['ytd']) for i in range(0, len(list_of_data)))))
My expected output is:
price = [10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 20, 20, 20, 10, 10, 20, 20, 20, 10, 10]
ytd = [1, 1, 1.05, 1.1, 1.18, 1, 1, 1.05, 1.1, 1.18, 1, 1, 1.05, 1.1, 1.18, 1, 3, 4.05, 1.1, 1.18]
But I am getting this:
price
Out[19]:
[[10, 10, 10, 10, 10],
[10, 10, 10, 10, 10],
[20, 20, 20, 10, 10],
[20, 20, 20, 10, 10]]

Try this.
Update (thanks @shawncaza): performance test for 100,000 loops:
shawncaza's answer: 0.10945558547973633 seconds
my answer with the get method: 0.1443953514099121 seconds
my answer with the square-bracket method: 0.10936307907104492 seconds
list_of_data = [{'id': 99,
                 'rocketship': {'price': [10, 10, 10, 10, 10],
                                'ytd': [1, 1, 1.05, 1.1, 1.18]}},
                {'id': 898,
                 'rocketship': {'price': [10, 10, 10, 10, 10],
                                'ytd': [1, 1, 1.05, 1.1, 1.18]}},
                {'id': 903,
                 'rocketship': {'price': [20, 20, 20, 10, 10],
                                'ytd': [1, 1, 1.05, 1.1, 1.18]}},
                {'id': 999,
                 'rocketship': {'price': [20, 20, 20, 10, 10],
                                'ytd': [1, 3, 4.05, 1.1, 1.18]}},
                ]
price = []
ytd = []
for i in list_of_data:
    price.extend(i['rocketship']['price'])
    ytd.extend(i['rocketship']['ytd'])
print(price)
print(ytd)
# [10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 20, 20, 20, 10, 10, 20, 20, 20, 10, 10]
# [1, 1, 1.05, 1.1, 1.18, 1, 1, 1.05, 1.1, 1.18, 1, 1, 1.05, 1.1, 1.18, 1, 3, 4.05, 1.1, 1.18]
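If more than two fields are needed, the same extend loop generalizes to any list of keys in a single pass over the data. A minimal sketch; `collect_fields` and its `parent` parameter are hypothetical names introduced here, not part of the answer above:

```python
def collect_fields(records, fields, parent="rocketship"):
    # Hypothetical helper: one pass over the records, one flat list per field.
    out = {field: [] for field in fields}
    for record in records:
        nested = record[parent]
        for field in fields:
            # extend appends in place, so out[field] is never copied
            out[field].extend(nested[field])
    return out


list_of_data = [
    {'id': 99,  'rocketship': {'price': [10, 10, 10, 10, 10], 'ytd': [1, 1, 1.05, 1.1, 1.18]}},
    {'id': 898, 'rocketship': {'price': [10, 10, 10, 10, 10], 'ytd': [1, 1, 1.05, 1.1, 1.18]}},
    {'id': 903, 'rocketship': {'price': [20, 20, 20, 10, 10], 'ytd': [1, 1, 1.05, 1.1, 1.18]}},
    {'id': 999, 'rocketship': {'price': [20, 20, 20, 10, 10], 'ytd': [1, 3, 4.05, 1.1, 1.18]}},
]

fields = collect_fields(list_of_data, ("price", "ytd"))
price, ytd = fields["price"], fields["ytd"]
print(price)
print(ytd)
```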

Using list comprehensions (the parentheses keep both comprehensions in one tuple, so the unpacking works):
price, ytd = ([i for item in list_of_data for i in item["rocketship"]["price"]],
              [i for item in list_of_data for i in item["rocketship"]["ytd"]])
Output
price: [10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 20, 20, 20, 10, 10, 20, 20, 20, 10, 10]
ytd: [1, 1, 1.05, 1.1, 1.18, 1, 1, 1.05, 1.1, 1.18, 1, 1, 1.05, 1.1, 1.18, 1, 3, 4.05, 1.1, 1.18]

I traded a bit of readability for performance here:
import functools
tuples = ((item['rocketship']['price'], item['rocketship']['ytd']) for item in list_of_data)
price, ytd = functools.reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), tuples, ([], []))
I tried to keep everything in a single loop and to use a generator to save memory. But if the data is big, the resulting price and ytd lists will be big too; hopefully you have already thought about that.
Update:
Thanks to @j1-lee's performance test, I rewrote the code as follows:
import functools

def extend_list(a, b):
    a.extend(b)
    return a

tuples = ((item['rocketship']['price'], item['rocketship']['ytd'])
          for item in list_of_data)
price, ytd = map(
    list,
    functools.reduce(
        lambda a, b: (extend_list(a[0], b[0]), extend_list(a[1], b[1])),
        tuples,
        ([], [])
    )
)
This reduced the execution time from 45.556 s to 0.096 s. My best guess is that the + operator creates a new list from the two old lists, which requires copying both of them into the new one, so it goes like this:
list(4) + list(4) = list(8) # 8 copies
list(8) + list(4) = list(12) # 12 copies
list(12) + list(4) = list(16) # 16 copies
...
Using .extend() only copies the additional list into the existing one, so it should be faster:
list(4).extend(list(4)) = list(8) # 4 copies
list(8).extend(list(4)) = list(12) # 4 copies
list(12).extend(list(4)) = list(16) # 4 copies
...
It would be good if someone could point to specific documentation on this, though.
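The copy counting above can be checked with a little arithmetic. A sketch that counts element copies when `n_chunks` lists of `chunk_len` items are accumulated starting from an empty list (so the first step differs slightly from the `list(4) + list(4)` line above); both function names are hypothetical:

```python
def copies_with_plus(n_chunks, chunk_len):
    # acc = acc + chunk allocates a new list and copies every element of
    # both operands, so step i copies i * chunk_len elements.
    total, acc_len = 0, 0
    for _ in range(n_chunks):
        acc_len += chunk_len
        total += acc_len
    return total


def copies_with_extend(n_chunks, chunk_len):
    # acc.extend(chunk) appends in place: only chunk's elements are copied
    # (occasional resizes of acc are amortized by CPython's over-allocation).
    return n_chunks * chunk_len


for n in (4, 100, 10_000):
    print(n, copies_with_plus(n, 5), copies_with_extend(n, 5))
```

With +, the total is chunk_len * n * (n + 1) / 2, which is quadratic in the number of chunks; with extend it grows linearly, which is consistent with the large speedup reported above.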

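Perform a list comprehension and flatten the result with sum(). Note that sum(..., []) concatenates with +, so on long inputs it has the same quadratic copying cost as the + version of reduce above: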
ytd = sum([d['rocketship']['ytd'] for d in list_of_data], [])
price = sum([d['rocketship']['price'] for d in list_of_data], [])

Instead of passing the list function to map, you could pass itertools.chain.from_iterable to merge all the individual lists. Then call list() on each resulting iterator to turn it into a list:
import itertools
price_gen, ytd_gen = map(itertools.chain.from_iterable, zip(*((i['rocketship']['price'], i['rocketship']['ytd']) for i in list_of_data)))
price = list(price_gen)
ytd = list(ytd_gen)
However, creating separate generators for each dataset actually seems to be much faster: about 7x faster in my test.
import itertools
price_gen = itertools.chain.from_iterable(d['rocketship']['price'] for d in list_of_data)
ytd_gen = itertools.chain.from_iterable(d['rocketship']['ytd'] for d in list_of_data)
price = list(price_gen)
ytd = list(ytd_gen)
Maybe it's the zip that slows things down?
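As for why the question's one-liner returns a list of lists: zip(*pairs) only regroups the lists (all price lists together, all ytd lists together), and map(list, ...) turns each group into a list of those lists; nothing concatenates them. A small sketch on a two-record subset of the data showing the regrouping and the flattening step that chain.from_iterable adds:

```python
import itertools

list_of_data = [
    {'id': 99,  'rocketship': {'price': [10, 10, 10, 10, 10], 'ytd': [1, 1, 1.05, 1.1, 1.18]}},
    {'id': 903, 'rocketship': {'price': [20, 20, 20, 10, 10], 'ytd': [1, 1, 1.05, 1.1, 1.18]}},
]

pairs = [(d['rocketship']['price'], d['rocketship']['ytd']) for d in list_of_data]

# zip(*pairs) transposes the pairs: the first tuple holds every price list,
# the second every ytd list. The lists are regrouped, not concatenated.
price_lists, ytd_lists = zip(*pairs)
print(list(price_lists))  # still a list of lists, as in the question

# chain.from_iterable walks each inner list in turn and yields its items,
# which is what produces one flat sequence.
price = list(itertools.chain.from_iterable(price_lists))
print(price)
```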
cProfile comparison using the small original dataset, looping the task 99,999 times with the different solutions presented in this post:
   ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
    99999    0.132    0.000    1.344    0.000  (opt_khanh)
    99999    0.469    0.000    0.714    0.000  (opt_shawn)
    99999    0.142    0.000    0.535    0.000  (opt_Jaeyoon)
    99999    0.267    0.000    0.413    0.000  (opt_ramesh)
    99999    0.076    0.000    0.399    0.000  (opt_abdo)
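These figures come from one machine and one tiny dataset. A minimal timeit sketch for rerunning a comparison on your own data; the four function names are hypothetical labels for approaches from this thread (extend loop, chain.from_iterable, list comprehension, sum), not the exact functions profiled above:

```python
import itertools
import timeit

list_of_data = [
    {'id': 99,  'rocketship': {'price': [10, 10, 10, 10, 10], 'ytd': [1, 1, 1.05, 1.1, 1.18]}},
    {'id': 898, 'rocketship': {'price': [10, 10, 10, 10, 10], 'ytd': [1, 1, 1.05, 1.1, 1.18]}},
    {'id': 903, 'rocketship': {'price': [20, 20, 20, 10, 10], 'ytd': [1, 1, 1.05, 1.1, 1.18]}},
    {'id': 999, 'rocketship': {'price': [20, 20, 20, 10, 10], 'ytd': [1, 3, 4.05, 1.1, 1.18]}},
]


def with_extend(data):
    # In-place extend loop
    price, ytd = [], []
    for d in data:
        price.extend(d['rocketship']['price'])
        ytd.extend(d['rocketship']['ytd'])
    return price, ytd


def with_chain(data):
    # One chain.from_iterable per field
    flat = itertools.chain.from_iterable
    return (list(flat(d['rocketship']['price'] for d in data)),
            list(flat(d['rocketship']['ytd'] for d in data)))


def with_comprehension(data):
    # Nested list comprehensions
    return ([x for d in data for x in d['rocketship']['price']],
            [x for d in data for x in d['rocketship']['ytd']])


def with_sum(data):
    # sum(..., []) concatenates with +, copying the running list each step
    return (sum((d['rocketship']['price'] for d in data), []),
            sum((d['rocketship']['ytd'] for d in data), []))


candidates = [with_extend, with_chain, with_comprehension, with_sum]

# Check that every candidate returns the same result before timing anything.
reference = with_extend(list_of_data)
assert all(f(list_of_data) == reference for f in candidates)

for f in candidates:
    seconds = timeit.timeit(lambda: f(list_of_data), number=10_000)
    print(f"{f.__name__:20s} {seconds:.3f} s")
```

To see how the approaches scale, time them on something like `list_of_data * 1000` with a much smaller `number`; the sum version should fall behind there, since it copies the accumulated list on every step.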

I tried a double comprehension. I'm not sure it's a good idea, since it may hurt readability.
price = [
item
for sublist in [rocket["rocketship"]["price"] for rocket in list_of_data]
for item in sublist
]
ytd = [
item
for sublist in [rocket["rocketship"]["ytd"] for rocket in list_of_data]
for item in sublist
]
print(price)
print(ytd)

Related

How to create a nested list conditioned on a parameter in python

I have generated a day-wise nested list and want to calculate total duration between login and logout sessions and store that value individually in a duration nested list, organized by the day in which the login happened.
My Python script is:
import datetime
import itertools
Logintime = [
datetime.datetime(2021,1,1,8,10,10),
datetime.datetime(2021,1,1,10,25,19),
datetime.datetime(2021,1,2,8,15,10),
datetime.datetime(2021,1,2,9,35,10)
]
Logouttime = [
datetime.datetime(2021,1,1,10,10,11),
datetime.datetime(2021,1,1,17,0,10),
datetime.datetime(2021,1,2,9,30,10),
datetime.datetime(2021,1,2,17,30,12)
]
Logintimedaywise = [list(group) for k, group in itertools.groupby(Logintime,
                    key=datetime.datetime.toordinal)]
Logouttimedaywise = [list(group) for j, group in itertools.groupby(Logouttime,
                     key=datetime.datetime.toordinal)]
print(Logintimedaywise)
print(Logouttimedaywise)
# calculate total duration
temp = []
l = []
for p,q in zip(Logintimedaywise,Logouttimedaywise):
    for a,b in zip(p, q):
        tdelta = (b-a)
        diff = int(tdelta.total_seconds()) / 3600
        if diff not in temp:
            temp.append(diff)
l.append(temp)
print(l)
This script generates the following output (the durations in the variable l come out as one flat list inside a single-element list):
[[datetime.datetime(2021, 1, 1, 8, 10, 10), datetime.datetime(2021, 1, 1, 10, 25, 19)], [datetime.datetime(2021, 1, 2, 8, 15, 10), datetime.datetime(2021, 1, 2, 9, 35, 10)]]
[[datetime.datetime(2021, 1, 1, 10, 10, 11), datetime.datetime(2021, 1, 1, 17, 0, 10)], [datetime.datetime(2021, 1, 2, 9, 30, 10), datetime.datetime(2021, 1, 2, 17, 30, 12)]]
[[2.000277777777778, 6.5808333333333335, 1.25, 7.917222222222223]]
But my desired output format is the following nested list of durations (each item in the list should be the list of durations for a given login day):
[[2.000277777777778, 6.5808333333333335] , [1.25, 7.917222222222223]]
Can anyone help me store the total durations as a nested list grouped by login day?
Thanks in advance.
Try changing this piece of code:
# calculate total duration
temp = []
l = []
for p,q in zip(Logintimedaywise,Logouttimedaywise):
    for a,b in zip(p, q):
        tdelta = (b-a)
        diff = int(tdelta.total_seconds()) / 3600
        if diff not in temp:
            temp.append(diff)
l.append(temp)
print(l)
To:
# calculate total duration
l = []
for p,q in zip(Logintimedaywise,Logouttimedaywise):
    l.append([])
    for a,b in zip(p, q):
        tdelta = (b-a)
        diff = int(tdelta.total_seconds()) / 3600
        if diff not in l[-1]:
            l[-1].append(diff)
print(l)
Then the output would be:
[[datetime.datetime(2021, 1, 1, 8, 10, 10), datetime.datetime(2021, 1, 1, 10, 25, 19)], [datetime.datetime(2021, 1, 2, 8, 15, 10), datetime.datetime(2021, 1, 2, 9, 35, 10)]]
[[datetime.datetime(2021, 1, 1, 10, 10, 11), datetime.datetime(2021, 1, 1, 17, 0, 10)], [datetime.datetime(2021, 1, 2, 9, 30, 10), datetime.datetime(2021, 1, 2, 17, 30, 12)]]
[[2.000277777777778, 6.5808333333333335], [1.25, 7.917222222222223]]
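This appends a new, empty sublist on every iteration of the outer loop (once per day) and then fills that sublist, l[-1], with the day's durations.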
Your solution and the answer by @U11-Forward will break if the login and logout of the same session happen on different days, since the inner lists in Logintimedaywise and Logouttimedaywise will then have different numbers of elements.
To avoid that, a simpler solution is to first calculate the duration for every login/logout pair and then build the nested lists based only on the login day (or the logout day, if you prefer), like this:
import datetime
import itertools
import numpy
# define the login and logout times
Logintime = [datetime.datetime(2021,1,1,8,10,10),datetime.datetime(2021,1,1,10,25,19),datetime.datetime(2021,1,2,8,15,10),datetime.datetime(2021,1,2,9,35,10)]
Logouttime = [datetime.datetime(2021,1,1,10,10,11),datetime.datetime(2021,1,1,17,0,10), datetime.datetime(2021,1,2,9,30,10),datetime.datetime(2021,1,2,17,30,12) ]
# calculate the duration and the unique days in the set
duration = [ int((logout - login).total_seconds())/3600 for login,logout in zip(Logintime,Logouttime) ]
login_days = numpy.unique([login.day for login in Logintime])
# create the nested list of durations
# each inner list corresponds to a unique login day
Logintimedaywise = [[ login for login in Logintime if login.day == day ] for day in login_days ]
Logouttimedaywise = [[ logout for login,logout in zip(Logintime,Logouttime) if login.day == day ] for day in login_days ]
duration_daywise = [[ d for d,login in zip(duration,Logintime) if login.day == day ] for day in login_days ]
# check
print(Logintimedaywise)
print(Logouttimedaywise)
print(duration_daywise)
Outputs
[[datetime.datetime(2021, 1, 1, 8, 10, 10), datetime.datetime(2021, 1, 1, 10, 25, 19)], [datetime.datetime(2021, 1, 2, 8, 15, 10), datetime.datetime(2021, 1, 2, 9, 35, 10)]]
[[datetime.datetime(2021, 1, 1, 10, 10, 11), datetime.datetime(2021, 1, 1, 17, 0, 10)], [datetime.datetime(2021, 1, 2, 9, 30, 10), datetime.datetime(2021, 1, 2, 17, 30, 12)]]
[[2.000277777777778, 6.5808333333333335], [1.25, 7.917222222222223]]
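The numpy.unique approach above keys on login.day, the day of the month, so sessions on, say, 1 January and 1 February would end up in the same inner list. A sketch of the same pair-first idea using only the standard library and keyed on the full login date; the variable names are illustrative, and unlike the question's loop it keeps repeated durations instead of dropping them:

```python
import datetime
import itertools

login_times = [
    datetime.datetime(2021, 1, 1, 8, 10, 10),
    datetime.datetime(2021, 1, 1, 10, 25, 19),
    datetime.datetime(2021, 1, 2, 8, 15, 10),
    datetime.datetime(2021, 1, 2, 9, 35, 10),
]
logout_times = [
    datetime.datetime(2021, 1, 1, 10, 10, 11),
    datetime.datetime(2021, 1, 1, 17, 0, 10),
    datetime.datetime(2021, 1, 2, 9, 30, 10),
    datetime.datetime(2021, 1, 2, 17, 30, 12),
]

# Pair each login with its own logout first, so a session that ends after
# midnight still belongs to the day it started on.
sessions = sorted(zip(login_times, logout_times))

# groupby needs its input sorted by the key; sorting by login time does that.
# The key is the full calendar date, not login.day, so the same day number
# in two different months is not merged.
duration_daywise = [
    [int((logout - login).total_seconds()) / 3600 for login, logout in day_sessions]
    for _, day_sessions in itertools.groupby(sessions, key=lambda s: s[0].date())
]
print(duration_daywise)
```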

Selecting a random sample from a very large generator

I am trying to test some strategies for a game, which can be defined by 10 non-negative integers that add up to 100. There are 109 choose 9, or roughly 4 × 10^12, of these, so comparing them all is not practical. I would like to take a random sample of about 1,000,000 of them.
I have tried the methods from the answers to this question, and this one, but all still seem far too slow to work. The quickest method seems like it will take about 180 hours on my machine.
This is how I've tried to make the generator (adapted from a previous SE answer). For some reason, changing prob does not seem to impact the run time of turning it into a list.
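import random
from itertools import permutations

def tuples_sum_sample(nbval, total, prob, order=True):
    """
    Generate the tuples L of nbval non-negative integers
    such that sum(L) == total, keeping each one with probability prob.
    The tuples may be ordered (decreasing order) or not.
    """
    # Since Python 3.7 (PEP 479), `raise StopIteration` inside a generator
    # becomes a RuntimeError, so the base cases use `return` instead.
    if nbval == 0 and total == 0:
        yield tuple()
        return
    if nbval == 1:
        yield (total,)
        return
    if total == 0:
        yield (0,) * nbval
        return
    for start in range(total, 0, -1):
        # The original called the unsampled helper tuples_sum here; this assumes it
        # is this same generator without sampling, i.e. prob=1.0.
        # prob only filters what is yielded; the full enumeration still runs,
        # which is why changing prob barely changes the run time.
        for qu in tuples_sum_sample(nbval - 1, total - start, 1.0):
            if qu[0] <= start:
                sol = (start,) + qu
                if order:
                    if random.random() < prob:
                        yield sol
                else:
                    l = set()
                    for p in permutations(sol, len(sol)):
                        if p not in l:
                            l.add(p)
                            if random.random() < prob:
                                yield p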
Rejection sampling seems like it would take about 3 million years, so this is out as well.
import random

randsample = []
while len(randsample) < 1000000:
    x = tuple(random.randint(0, 100) for _ in range(10))
    if sum(x) == 100:
        randsample.append(x)
randsample
Can anyone think of another way to do this?
Thanks
A couple of frame-challenging questions:
Is there any reason you must generate the entire population, then sample that population?
Why do you need to check if your numbers sum to 100?
You can generate a set of numbers that sum to a value. Check out the first answer here:
Random numbers that add to 100: Matlab
Then generate the number of such sets you desire (1,000,000 in this case).
import numpy as np

def set_sum(number=10, total=100):
    # number - 1 random cut points in [0, total), plus the two ends 0 and total
    initial = np.random.random(number-1) * total
    sort_list = np.append(initial, [0, total]).astype(int)
    sort_list.sort()
    # the gaps between consecutive sorted cut points are the parts; they sum to total
    set_ = np.diff(sort_list)
    return set_

if __name__ == '__main__':
    n = 1000000
    sample = [set_sum() for i in range(n)]
Numpy to the rescue!
Specifically, you need a multinomial distribution:
import numpy as np
desired_sum = 100
n = 10
np.random.multinomial(desired_sum, np.ones(n)/n, size=1000000)
It outputs a matrix with a million rows of 10 random integers in a few seconds. Each row sums up to 100.
Here's a smaller example:
np.random.multinomial(desired_sum, np.ones(n)/n, size=10)
which outputs:
array([[ 8, 7, 12, 11, 11, 9, 9, 10, 11, 12],
[ 7, 11, 8, 9, 9, 10, 11, 14, 11, 10],
[ 6, 10, 11, 13, 8, 10, 14, 12, 9, 7],
[ 6, 11, 6, 7, 8, 10, 8, 18, 13, 13],
[ 7, 7, 13, 11, 9, 12, 13, 8, 8, 12],
[10, 11, 13, 9, 6, 11, 7, 5, 14, 14],
[12, 5, 9, 9, 10, 8, 8, 16, 9, 14],
[14, 8, 14, 9, 11, 6, 10, 9, 11, 8],
[12, 10, 12, 9, 12, 10, 7, 10, 8, 10],
[10, 7, 10, 19, 8, 5, 11, 8, 8, 14]])
The sums appear to be correct:
sum(np.random.multinomial(desired_sum, np.ones(n)/n, size=10).T)
# array([100, 100, 100, 100, 100, 100, 100, 100, 100, 100])
Python only
You could also start with a list of 10 zeroes, iterate 100 times and increment a random cell each time:
import random
desired_sum = 100
n = 10
row = [0] * n
for _ in range(desired_sum):
    row[random.randrange(n)] += 1
row
# [16, 7, 9, 7, 10, 11, 4, 19, 4, 13]
sum(row)
# 100
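Both the multinomial call and the increment loop draw from the same multinomial distribution, which is not uniform over the 109 choose 9 tuples: (10, 10, ..., 10) is far more likely than (100, 0, ..., 0). The cut-point version is not uniform either (its last entry can never be 0, because astype(int) keeps the cut points at 99 or below). If the strategy space should be sampled uniformly, stars and bars gives an exact sampler. A pure-Python sketch; `random_composition` is a hypothetical name:

```python
import random


def random_composition(n_parts=10, total=100, rng=random):
    """Uniformly sample n_parts non-negative integers that sum to total.

    Stars and bars: lay out total + n_parts - 1 slots and pick n_parts - 1
    of them as bars; the numbers of stars between the bars form the tuple.
    Each set of bar positions is exactly one tuple, so a uniform choice of
    bar positions gives a uniform tuple.
    """
    slots = total + n_parts - 1
    bars = sorted(rng.sample(range(slots), n_parts - 1))
    parts = []
    prev = -1  # virtual bar just before the first slot
    for bar in bars:
        parts.append(bar - prev - 1)  # stars between consecutive bars
        prev = bar
    parts.append(slots - prev - 1)  # stars after the last bar
    return tuple(parts)


rng = random.Random(0)  # seeded only to make runs reproducible
sample = [random_composition(rng=rng) for _ in range(1000)]
print(sample[0], sum(sample[0]))
```

For the 1,000,000 tuples in the question, raise the range; being pure Python it is slower than the single vectorized multinomial call, but every tuple is equally likely. Draws are with replacement, though with roughly 4 × 10^12 possible tuples duplicates should be rare.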

Merging rows in numpy to form new array

This is a sample of what I am trying to accomplish. I am very new to Python and have searched for hours to find out what I am doing wrong, but I haven't been able to find the issue. I may still be searching for the wrong phrases; if so, could you please point me in the right direction?
I want to combine n arrays into one array: the first row of x should be the first row of combined, the first row of y the second row, the first row of z the third row, the second row of x the fourth row, and so on.
So it would look something like this:
x = [x1 x2 x3]
    [x4 x5 x6]
    [x7 x8 x9]
y = [y1 y2 y3]
    [y4 y5 y6]
    [y7 y8 y9]
z = [z1 z2 z3]
    [z4 z5 z6]
    [z7 z8 z9]
combined = [x1 x2 x3]
           [y1 y2 y3]
           [z1 z2 z3]
           [x4 x5 x6]
           [...]
           [z7 z8 z9]
The best I can come up with is this:
import numpy as np
x = np.random.rand(6,3)
y = np.random.rand(6,3)
z = np.random.rand(6,3)
combined = np.zeros((9,3))
for rows in range(len(x)):
    combined[0::3] = x[rows,:]
    combined[1::3] = y[rows,:]
    combined[2::3] = z[rows,:]
print(combined)
All this does is write the last row of each input array to every third row of the output array, instead of what I wanted. I am not sure this is even the best way to do it. Any advice would help.
I just figured out that this works, but if someone knows a higher-performance method, please let me know:
import numpy as np
x = np.random.rand(6,3)
y = np.random.rand(6,3)
z = np.random.rand(6,3)
combined = np.zeros((18,3))
for rows in range(6):
    combined[rows*3,:] = x[rows,:]
    combined[rows*3+1,:] = y[rows,:]
    combined[rows*3+2,:] = z[rows,:]
print(combined)
You can do this using a list comprehension and zip:
combined = np.array([row for row_group in zip(x, y, z) for row in row_group])
Using vectorised operations only:
A = np.vstack((x, y, z))
idx = np.arange(A.shape[0]).reshape(-1, x.shape[0]).T.flatten()
A = A[idx]
Here's a demo:
import numpy as np
x, y, z = np.random.rand(3,3), np.random.rand(3,3), np.random.rand(3,3)
print(x, y, z)
[[ 0.88259564 0.17609363 0.01067734]
[ 0.50299357 0.35075811 0.47230915]
[ 0.751129 0.81839586 0.80554345]]
[[ 0.09469396 0.33848691 0.51550685]
[ 0.38233976 0.05280427 0.37778962]
[ 0.7169351 0.17752571 0.49581777]]
[[ 0.06056544 0.70273453 0.60681583]
[ 0.57830566 0.71375038 0.14446909]
[ 0.23799775 0.03571076 0.26917939]]
A = np.vstack((x, y, z))
idx = np.arange(A.shape[0]).reshape(-1, x.shape[0]).T.flatten()
print(idx) # [0 3 6 1 4 7 2 5 8]
A = A[idx]
print(A)
[[ 0.88259564 0.17609363 0.01067734]
[ 0.09469396 0.33848691 0.51550685]
[ 0.06056544 0.70273453 0.60681583]
[ 0.50299357 0.35075811 0.47230915]
[ 0.38233976 0.05280427 0.37778962]
[ 0.57830566 0.71375038 0.14446909]
[ 0.751129 0.81839586 0.80554345]
[ 0.7169351 0.17752571 0.49581777]
[ 0.23799775 0.03571076 0.26917939]]
I have changed your code a little bit to get the desired output
import numpy as np
x = np.random.rand(6,3)
y = np.random.rand(6,3)
z = np.random.rand(6,3)
combined = np.zeros((18,3))
combined[0::3] = x
combined[1::3] = y
combined[2::3] = z
print(combined)
You had the shape of the combined matrix wrong and there is no real need for the for loop.
This might not be the most Pythonic way to do it, but you could loop over the blocks of three output rows (in Python 3, len(combined) // 3 is needed because / returns a float):
for block in range(len(combined) // 3):
    combined[block*3 + 0] = x[block, :]
    combined[block*3 + 1] = y[block, :]
    combined[block*3 + 2] = z[block, :]
A simple numpy solution is to stack the arrays on a new middle axis, and reshape the result to 2d:
In [5]: x = np.arange(9).reshape(3,3)
In [6]: y = np.arange(9).reshape(3,3)+10
In [7]: z = np.arange(9).reshape(3,3)+100
In [8]: np.stack((x,y,z),axis=1).reshape(-1,3)
Out[8]:
array([[ 0, 1, 2],
[ 10, 11, 12],
[100, 101, 102],
[ 3, 4, 5],
[ 13, 14, 15],
[103, 104, 105],
[ 6, 7, 8],
[ 16, 17, 18],
[106, 107, 108]])
It may be easier to see what's happening if we give each dimension a different size, e.g. two 3x4 arrays:
In [9]: x = np.arange(12).reshape(3,4)
In [10]: y = np.arange(12).reshape(3,4)+10
np.array combines them on a new 1st axis, making a 2x3x4 array. To get the interleaving you want, we can transpose the first 2 dimensions, producing a 3x2x4. Then reshape to a 6x4.
In [13]: np.array((x,y))
Out[13]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[10, 11, 12, 13],
[14, 15, 16, 17],
[18, 19, 20, 21]]])
In [14]: np.array((x,y)).transpose(1,0,2)
Out[14]:
array([[[ 0, 1, 2, 3],
[10, 11, 12, 13]],
[[ 4, 5, 6, 7],
[14, 15, 16, 17]],
[[ 8, 9, 10, 11],
[18, 19, 20, 21]]])
In [15]: np.array((x,y)).transpose(1,0,2).reshape(-1,4)
Out[15]:
array([[ 0, 1, 2, 3],
[10, 11, 12, 13],
[ 4, 5, 6, 7],
[14, 15, 16, 17],
[ 8, 9, 10, 11],
[18, 19, 20, 21]])
np.vstack produces a 6x4, but with the wrong order. We can't transpose that directly.
np.stack with default axis behaves just like np.array. But with axis=1, it creates a 3x2x4, which we can reshape:
In [16]: np.stack((x,y), 1)
Out[16]:
array([[[ 0, 1, 2, 3],
[10, 11, 12, 13]],
[[ 4, 5, 6, 7],
[14, 15, 16, 17]],
[[ 8, 9, 10, 11],
[18, 19, 20, 21]]])
The list zip in the accepted answer is a list version of transpose, creating a list of 3 2-element tuples.
In [17]: list(zip(x,y))
Out[17]:
[(array([0, 1, 2, 3]), array([10, 11, 12, 13])),
(array([4, 5, 6, 7]), array([14, 15, 16, 17])),
(array([ 8, 9, 10, 11]), array([18, 19, 20, 21]))]
np.array(list(zip(x,y))) produces the same thing as the stack, a 3x2x4 array.
As for speed, I suspect the allocate and assign (as in Ash's answer) is fastest:
In [27]: z = np.zeros((6,4),int)
...: for i, arr in enumerate((x,y)):
...:     z[i::2,:] = arr
...:
In [28]: z
Out[28]:
array([[ 0, 1, 2, 3],
[10, 11, 12, 13],
[ 4, 5, 6, 7],
[14, 15, 16, 17],
[ 8, 9, 10, 11],
[18, 19, 20, 21]])
For serious timings, use much larger examples than this.
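Both ideas above, stacking on a new middle axis and allocate-and-assign, extend to any number of equally shaped 2-D arrays. A sketch wrapping each in a small function (the names `interleave_rows` and `interleave_rows_assign` are hypothetical) and checking that they agree:

```python
import numpy as np


def interleave_rows(*arrays):
    """Interleave rows: a[0], b[0], c[0], a[1], b[1], ...

    All arrays must share the same 2-D shape. np.stack(..., axis=1) builds a
    (rows, len(arrays), cols) array; reshaping it to 2-D reads it row by row,
    which is exactly the interleaved order.
    """
    stacked = np.stack(arrays, axis=1)
    return stacked.reshape(-1, arrays[0].shape[1])


def interleave_rows_assign(*arrays):
    # Allocate-and-assign variant: write input i into rows i, i + k, i + 2k, ...
    k = len(arrays)
    rows, cols = arrays[0].shape
    out = np.empty((rows * k, cols), dtype=np.result_type(*arrays))
    for i, arr in enumerate(arrays):
        out[i::k] = arr
    return out


x = np.arange(9).reshape(3, 3)
y = x + 10
z = x + 100
print(interleave_rows(x, y, z))
```

Following the advice above, compare the two on realistically sized arrays before picking one.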

Python VTK: Coordinates directly to PolyData

I want to convert all coordinate combinations of x, y and z in a specific range (for now with step 1) directly to vtk.vtkPolyData or vtk.vtkPoints. My first approach was to use itertools.product, but I thought this would have a very bad runtime. So I came up with another approach using VTK, which I need anyway for the next parts of my program.
First approach, with itertools.product:
import numpy as np
import itertools
import vtk
x1 = range(10, 311)  # 10, 11, 12, ..., 310
y1 = range(10, 311)  # 10, 11, 12, ..., 310
z1 = range(0, 66)    # 0, 1, 2, ..., 65
points1 = vtk.vtkPoints()
for coords in itertools.product(x1, y1, z1):
    points1.InsertNextPoint(coords)
boxPolyData1 = vtk.vtkPolyData()
boxPolyData1.SetPoints(points1)
My approach with vtk so far:
import numpy as np
import vtk
from vtk.util import numpy_support
coords = np.mgrid[10:310, 10:310, 0:65]
vtk_data_array = numpy_support.numpy_to_vtk(num_array=coords.ravel(),deep=True,array_type=vtk.VTK_FLOAT)
points = vtk.vtkPoints()
points.SetData(vtk_data_array)
But this just crashes Python. Does anyone have an idea?
Best regards!
Stack those coords in columns with np.column_stack or np.c_ and then feed those as input to num_array, like so -
x,y,z = np.mgrid[10:310, 10:310, 0:65]
out_data = np.column_stack((x.ravel(), y.ravel(), z.ravel()))
vtk_data_array = numpy_support.numpy_to_vtk(num_array=out_data,\
deep=True,array_type=vtk.VTK_FLOAT)
Alternatively, to get out_data directly -
out_data = np.mgrid[10:310, 10:310, 0:65].reshape(3,-1).T
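In the question, coords.ravel() flattens the (3, nx, ny, nz) output of np.mgrid into all x values, then all y values, then all z values, whereas a vtkPoints array holds one (x, y, z) triple per point. A NumPy-only sketch on a tiny grid (no VTK needed) that shows that layout and checks that the column_stack and reshape(3, -1).T routes agree:

```python
import numpy as np

grid = np.mgrid[0:2, 10:12, 20:22]  # shape (3, 2, 2, 2): x block, y block, z block

flat = grid.ravel()                 # what the question passed to numpy_to_vtk
print(flat[:8])                     # the first 8 values are all x coordinates

# Both routes give one (x, y, z) row per point, shape (8, 3).
points_a = np.column_stack([g.ravel() for g in grid])
points_b = grid.reshape(3, -1).T
print(points_a[:3])
```

points_b is a transposed view; if the VTK side has trouble with its memory layout, wrapping it in np.ascontiguousarray(...) before calling numpy_to_vtk is a cheap precaution.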
Another approach using initialization to replace the 3D array created by np.mgrid would be like so -
def create_mgrid_array(d00,d01,d10,d11,d20,d21,dtype=int):
    df0 = d01-d00
    df1 = d11-d10
    df2 = d21-d20
    a = np.zeros((df0,df1,df2,3),dtype=dtype)
    X,Y,Z = np.ogrid[d00:d01,d10:d11,d20:d21]
    a[:,:,:,2] = Z
    a[:,:,:,1] = Y
    a[:,:,:,0] = X
    a.shape = (-1,3)
    return a
Sample run to showcase usage of create_mgrid_array -
In [151]: create_mgrid_array(3,6,10,14,20,22,dtype=int)
Out[151]:
array([[ 3, 10, 20],
[ 3, 10, 21],
[ 3, 11, 20],
[ 3, 11, 21],
[ 3, 12, 20],
[ 3, 12, 21],
[ 3, 13, 20],
[ 3, 13, 21],
[ 4, 10, 20],
[ 4, 10, 21],
[ 4, 11, 20],
[ 4, 11, 21],
[ 4, 12, 20],
[ 4, 12, 21],
[ 4, 13, 20],
[ 4, 13, 21],
[ 5, 10, 20],
[ 5, 10, 21],
[ 5, 11, 20],
[ 5, 11, 21],
[ 5, 12, 20],
[ 5, 12, 21],
[ 5, 13, 20],
[ 5, 13, 21]])
Runtime test
Approaches -
import itertools
import vtk
from vtk.util import numpy_support

def loopy_app():
    x1 = range(10,311)
    y1 = range(10,311)
    z1 = range(0,66)
    points1 = vtk.vtkPoints()
    for coords in itertools.product(x1,y1,z1):
        points1.InsertNextPoint(coords)
    return points1

def vectorized_app():
    out_data = create_mgrid_array(10,311,10,311,0,66,dtype=float)
    vtk_data_array = numpy_support.numpy_to_vtk(num_array=out_data,\
                        deep=True,array_type=vtk.VTK_FLOAT)
    points2 = vtk.vtkPoints()
    points2.SetData(vtk_data_array)
    return points2
Timings and verification -
In [155]: # Verify outputs with loopy and vectorized approaches
...: out1 = vtk_to_numpy(loopy_app().GetData())
...: out2 = vtk_to_numpy(vectorized_app().GetData())
...: print(np.allclose(out1, out2))
...:
True
In [156]: %timeit loopy_app()
1 loops, best of 3: 923 ms per loop
In [157]: %timeit vectorized_app()
10 loops, best of 3: 67.3 ms per loop
In [158]: 923/67.3
Out[158]: 13.714710252600298
13x+ speedup there with the proposed vectorized one over the loopy one!

How to design an agg function for pandas groupby

My DataFrame looks like this:
user,rating, f1,f2,f3,f4
20, 3, 0.1, 0, 3, 5
20, 4, 0.2, 3, 5, 2
18, 4, 0.6, 8, 7, 2
18, 1, 0.7, 9, 2, 7
I want to compute a profile for each user. For instance, for user 20 it should be
3*[0.1,0,3,5]+4*[0.2,3,5,2],
which is a sum of f1 to f4 weighted by rating.
How should I write an agg function to complete this task?
df.groupby('user').agg(....)
You can try this:
df.groupby('user').apply(lambda x : sum(x['rating'] * (x['f1']+x['f2']+x['f3']+x['f4'])))
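The lambda above returns one number per user: the ratings weight f1+f2+f3+f4, so the four features are summed together (65.1 for user 20). If the profile should stay a four-element vector, as in 3*[0.1,0,3,5]+4*[0.2,3,5,2] = [1.1, 12, 29, 23], one sketch is to scale the feature columns by rating first and then sum per user. The DataFrame is rebuilt here from the sample data; the column list `features` is an assumption based on the question:

```python
import pandas as pd

df = pd.DataFrame({
    "user":   [20, 20, 18, 18],
    "rating": [3, 4, 4, 1],
    "f1":     [0.1, 0.2, 0.6, 0.7],
    "f2":     [0, 3, 8, 9],
    "f3":     [3, 5, 7, 2],
    "f4":     [5, 2, 2, 7],
})

features = ["f1", "f2", "f3", "f4"]

# Multiply every feature in a row by that row's rating (axis=0 aligns on the
# index), then add up the weighted rows of each user.
profile = df[features].mul(df["rating"], axis=0).groupby(df["user"]).sum()
print(profile)
```

The result has one row per user and one column per feature, so it can be used directly as a profile vector.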
