I have the following dataframe, plus a separate list of index positions derived from a condition. I want to create new dataframes based on those index positions and check a condition on each of them.
import pandas as pd
import numpy as np

df = pd.DataFrame()
df['index'] = [0, 28, 35, 49, 85, 105, 208, 386, 419, 512, 816, 888, 914, 989]
df['diff_in_min'] = [5, 35, 42, 46, 345, 85, 96, 107, 119, 325, 8, 56, 55, 216]
df['val_1'] = [5, 25, 2, 4, 2, 5, 69, 6, 8, 7, 55, 85, 8, 67]
df['val_2'] = [8, 89, 8, 5, 7, 57, 8, 57, 4, 8, 74, 65, 55, 74]

re_ind = list(np.where(df['diff_in_min'] >= 300))
# re_ind = [np.array([85, 512], dtype='int64')]
I want to create new dataframes based on the positions in re_ind, e.g.:
first_df = df[0:85]
another_df = df[85:512]
last_df = df[512:]
Then, for each dataframe, I want to check one condition:
count = 0

temp_df = df[:re_ind[0]]
if temp_df['diff_in_min'].sum() > 500:
    count += 1

temp_df = df[re_ind[0]:re_ind[1]]
if temp_df['diff_in_min'].sum() > 500:
    count += 1

temp_df = df[re_ind[1]:]
if temp_df['diff_in_min'].sum() > 500:
    count += 1
How can I do this with a for loop, creating each new dataframe from the existing one?
From the sample data, create group labels by taking the cumulative sum of df['diff_in_min'] >= 300, aggregate the sum of each group, compare it against the second condition, and count the Trues with sum:
s = (df['diff_in_min'] >= 300).cumsum()
out = (df['diff_in_min'].groupby(s).sum() > 500).sum()
print (out)
2
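For the sample data, the intermediate group labels and per-group sums look like this (a quick sketch to make the grouping explicit):
s = (df['diff_in_min'] >= 300).cumsum()
# s -> [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2]

group_sums = df['diff_in_min'].groupby(s).sum()
# group 0: 128, group 1: 752, group 2: 660 -> two groups exceed 500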
jezrael's answer is much better and more succinct. However, in keeping with your style of programming, here is another way you could tackle it:
import pandas as pd
df = pd.DataFrame()
df['index'] = [ 0, 28, 35, 49, 85, 105, 208, 386, 419, 512, 816, 888, 914, 989]
df['diff_in_min'] = [ 5, 35, 42, 46, 345, 85, 96, 107, 119, 325, 8, 56, 55, 216]
df['val_1'] = [5, 25, 2, 4, 2, 5, 69, 6, 8, 7, 55, 85, 8, 67]
df['val_2'] = [8, 89, 8, 5, 7, 57, 8, 57, 4, 8, 74, 65, 55, 74]
df_list = []
df_list.append(df[df['index']<85])
df_list.append(df[(df['index']>=85) & (df['index'] <512)])
df_list.append(df[df['index']>=512])
count = 0
for temp_df in df_list:
    if temp_df['diff_in_min'].sum() > 500:
        count += 1
print(f"Count = {count}")
OUTPUT:
Count = 2
Which is exactly what jezrael got, and why my vote goes to them.
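If you would rather derive the split points from the condition instead of hard-coding 85 and 512, here is a small sketch that loops over positional slices with iloc (the boundary rows 4 and 9 come from the condition itself):

import numpy as np

# positional indices of the rows where the condition holds (4 and 9 here)
pos = np.flatnonzero(df['diff_in_min'] >= 300)

# slice boundaries: start of frame, each condition row, end of frame
bounds = [0, *pos, len(df)]

count = 0
for start, end in zip(bounds[:-1], bounds[1:]):
    temp_df = df.iloc[start:end]   # df.iloc[0:4], df.iloc[4:9], df.iloc[9:14]
    if temp_df['diff_in_min'].sum() > 500:
        count += 1

print(count)  # 2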
I have a dataset where each time point is represented by a set of sparse x and y values. For data storage purposes, if y = 0, that data point is not recorded.
Imagine data point t0:
#Real data
#t0
x0 = [200, 201, 202, 203, 204, 205, 206, 207, ...]
y0 = [5, 10, 0, 7, 0, 0, 15, 20, ...]
#Data stored
#t0
x0 = [200, 201, 203, 206, 207, ...]
y0 = [5, 10, 7, 15, 20, ...]
Now, imagine I have data point t1:
#Data stored
#t1
x1 = [201, 204, 206, 207, ...]
y1 = [10, 15, 3, 20, ...]
Is there a simple and efficient way to rebuild the full dataset for a custom number of data points? Let's say I want a data structure that represents all data contained in t0 + t1:
#t0+t1
M = [[200, 201, 203, 204, 206, 207, ...], # this contains all xs recorded for both t0 and t1
[5, 10, 7, 0, 15, 20, ... ], # y values from t0. Missing values are filled with 0
[0, 10, 0, 15, 3, 20, ...] # y values from t1. Missing values are filled with 0
]
Any help would be really appreciated!
It looks like np.searchsorted is what you are looking for:
import numpy as np

m0 = np.unique(x0 + x1)  # assuming x0 and x1 are lists
M = np.zeros((3, len(m0)), dtype=int)
M[0] = m0
M[1, np.searchsorted(m0, x0)] = y0
M[2, np.searchsorted(m0, x1)] = y1
>>> M
array([[200, 201, 203, 204, 206, 207],
[ 5, 10, 7, 0, 15, 20],
[ 0, 10, 0, 15, 3, 20]])
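If you have more than two time points, the same idea extends with a loop. A sketch, where series is a hypothetical list of (x, y) pairs, one per time point:

import numpy as np

series = [(x0, y0), (x1, y1)]  # hypothetical: one (x, y) pair per time point

m0 = np.unique(np.concatenate([x for x, _ in series]))
M = np.zeros((len(series) + 1, len(m0)), dtype=int)
M[0] = m0
for row, (x, y) in enumerate(series, start=1):
    M[row, np.searchsorted(m0, x)] = y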
I have 3 vectors:
u = np.array([0, 100, 200, 300]) #hundreds
v = np.array([0, 10, 20]) #tens
w = np.array([0, 1]) #units
Then I used np.meshgrid to compute the sums u[i] + v[j] + w[k]:
x, y, z = np.meshgrid(u, v, w)
func1 = x + y + z
So, when (i,j,k)=(3,2,1), func1[i, j, k] should return 321, but I only get 321 if I put func1[2, 3, 1].
Why is it asking me for vector v before u? Should I use numpy.ix_ instead?
From the meshgrid docs:
Notes
-----
This function supports both indexing conventions through the indexing
keyword argument. Giving the string 'ij' returns a meshgrid with
matrix indexing, while 'xy' returns a meshgrid with Cartesian indexing.
In the 2-D case with inputs of length M and N, the outputs are of shape
(N, M) for 'xy' indexing and (M, N) for 'ij' indexing. In the 3-D case
with inputs of length M, N and P, outputs are of shape (N, M, P) for
'xy' indexing and (M, N, P) for 'ij' indexing.
In [109]: U,V,W = np.meshgrid(u,v,w, sparse=True)
In [110]: U
Out[110]:
array([[[ 0], # (1,4,1)
[100],
[200],
[300]]])
In [111]: U+V+W
Out[111]:
array([[[ 0, 1],
[100, 101],
[200, 201],
[300, 301]],
[[ 10, 11],
[110, 111],
[210, 211],
[310, 311]],
[[ 20, 21],
[120, 121],
[220, 221],
[320, 321]]])
The result is a (3, 4, 2) array; this is the Cartesian ('xy') case described in the notes.
With the documented indexing change:
In [113]: U,V,W = np.meshgrid(u,v,w, indexing='ij',sparse=True)
In [114]: U.shape
Out[114]: (4, 1, 1)
In [115]: (U+V+W).shape
Out[115]: (4, 3, 2)
Which matches the ix_ that you wanted:
In [116]: U,V,W = np.ix_(u,v,w)
In [117]: (U+V+W).shape
Out[117]: (4, 3, 2)
You are welcome to use either. Or even np.ogrid as mentioned in the docs.
Or even the home-brewed broadcasting:
In [118]: (u[:,None,None]+v[:,None]+w).shape
Out[118]: (4, 3, 2)
Maybe the 2d layouts clarify the two conventions:
In [119]: Out[111][:,:,0]
Out[119]:
array([[ 0, 100, 200, 300], # u going across, x-axis
[ 10, 110, 210, 310],
[ 20, 120, 220, 320]])
In [120]: (u[:,None,None]+v[:,None]+w)[:,:,0]
Out[120]:
array([[ 0, 10, 20], # u going down - rows
[100, 110, 120],
[200, 210, 220],
[300, 310, 320]])
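Tying this back to the question: with indexing='ij' the axes follow the argument order (u, v, w), so the indexing you expected works directly. A quick check with the question's arrays:

import numpy as np

u = np.array([0, 100, 200, 300])  # hundreds
v = np.array([0, 10, 20])         # tens
w = np.array([0, 1])              # units

x, y, z = np.meshgrid(u, v, w, indexing='ij')
func1 = x + y + z
print(func1[3, 2, 1])  # 321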
For your indexing method, you need axis 0 to run along the hundreds, axis 1 along the tens, and axis 2 along the units.
You can just transpose to swap the axes to suit your indexing method -
import numpy as np

u = np.array([0, 100, 200, 300]) #hundreds
v = np.array([0, 10, 20, 30]) #tens
w = np.array([0, 1, 2, 3]) #units

x, y, z = np.meshgrid(w, v, u)
func1 = x + y + z
func1 = func1.transpose(2, 0, 1)
func1
# axis 2 (columns) runs along the units
#------------------>
array([[[  0,   1,   2,   3],
        [ 10,  11,  12,  13],   #
        [ 20,  21,  22,  23],   # axis 1 (rows) runs along the tens
        [ 30,  31,  32,  33]],  #

       [[100, 101, 102, 103],   #
        [110, 111, 112, 113],   # axis 0 (blocks) runs along the hundreds
        [120, 121, 122, 123],   #
        [130, 131, 132, 133]],

       [[200, 201, 202, 203],
        [210, 211, 212, 213],
        [220, 221, 222, 223],
        [230, 231, 232, 233]],

       [[300, 301, 302, 303],
        [310, 311, 312, 313],
        [320, 321, 322, 323],
        [330, 331, 332, 333]]])
Testing this by indexing -
>> func1[2,3,1]
231
>> func1[3,2,1]
321
I have a nested dictionary that looks like this:
dictionary = {time: {pixel: intensity_array}}
len(time) = 65
number of pixels = 6 per time point
number of intensity values = 6 per pixel
To be clear: each time value maps to the pixel values [1, 2, 3, 4, 5, 6], and each of those pixel values maps to an array of 6 intensity values.
Example:
dictionary = {time1: {1: array([i11, i12, i13, i14, i15, i16]),
                      2: array([i21, i22, i23, i24, i25, i26]),
                      3: array([i31, i32, i33, i34, i35, i36]),
                      4: array([i41, i42, i43, i44, i45, i46]),
                      5: array([i51, i52, i53, i54, i55, i56]),
                      6: array([i61, i62, i63, i64, i65, i66])}}
My question is: how do I make a 3D plot of these values, with time on the z axis and the intensity and pixel values (both of length 6) on the y and x axes respectively?
The following is what I have tried so far, without success:
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

x = []
y = []
z = []
for i in dictionary:
    z1 = i
    z.append(z1)
    x1 = dictionary[i].keys()
    x.append(x1)
    y1 = dictionary[i].values()
    y.append(y1)

fig = plt.figure()
ax = Axes3D(fig)
ax.plot(x, y, zs=0, zdir='z', label='zs=0, zdir=z')
Your x ends up as a list of dict_keys objects and your y as a list of dict_values objects, rather than flat lists of numbers. The mistake is easier to see if you rewrite the loop as nested for loops.
Corrected code:
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

x, y, z = [], [], []
for tim, pixels in dictionary.items():
    for pixel, intensities in pixels.items():
        for intensity in intensities:
            x.append(intensity)
            y.append(pixel)
            z.append(tim)

fig = plt.figure()
ax = Axes3D(fig)
ax.plot(x, y, z, zdir='z')
plt.show()
Example use:
Using the simple dataset:
{1: {1: array([11, 12, 13, 14, 15, 16]), 2: array([21, 22, 23, 24, 25, 26]),
3: array([31, 32, 33, 34, 35, 36]), 4: array([41, 42, 43, 44, 45]),
5: array([51, 52, 53, 54, 55, 56]), 6: array([61, 62, 63, 64, 65, 66])},
2: {1: array([71, 72, 73, 74, 75, 76]), 2: array([21, 22, 23, 24, 25, 26]),
3: array([31, 32, 33, 34, 35, 36]), 4: array([41, 42, 43, 44, 45]),
5: array([51, 52, 53, 54, 55, 56]), 6: array([61, 62, 63, 64, 65, 66])}}
That produces a 3D line plot of the intensity, pixel, and time values.
I need a faster/optimised version of my current code:
import numpy as np
a = np.array((1, 2, 3))
b = np.array((10, 20, 30, 40, 50, 60, 70, 80))
print([i*b for i in a])
Is there any faster way to do this using numpy functions (maybe without reshaping and blowing up the whole thing)?
Looks like the outer product.
>>> np.outer(a, b)
array([[ 10, 20, 30, 40, 50, 60, 70, 80],
[ 20, 40, 60, 80, 100, 120, 140, 160],
[ 30, 60, 90, 120, 150, 180, 210, 240]])
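For reference, the same result can be obtained with plain broadcasting; this is just a sketch of the equivalent expression, with np.outer being the more direct spelling:

import numpy as np

a = np.array((1, 2, 3))
b = np.array((10, 20, 30, 40, 50, 60, 70, 80))

# add a trailing axis to a so the shapes broadcast: (3, 1) * (8,) -> (3, 8)
result = a[:, None] * b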