I want to calculate the variance of the arrival delays between signals. Each time a signal arrives, a timestamp is registered in the 'time' field of the Logs table of my SQLite database. So I solve the problem the following way:
cursor.execute('SELECT time FROM Logs')
rows = cursor.fetchall()
x = numpy.array(rows[:-1])
y = numpy.array(rows[1:])
z = y - x
print "Var = ", z.var()
That gives me the correct value. But... the solution uses two NumPy arrays (z stores the delay between each signal and the previous one; to be sure, len(z) = len(rows) - 1). I wonder if there is an elegant "numpy" way to do this with only one array, and without iterating over all rows.
I think you're looking for the np.diff function.
import numpy as np
# example data
rows = np.r_[:10]
z = rows[1:] - rows[:-1]
print(z)
#[1 1 1 1 1 1 1 1 1]
z = np.diff(rows)
print(z)
#[1 1 1 1 1 1 1 1 1]
Before starting to explain my problem, sorry for my grammar; my English is not very good. I'm a Python learner. Today I was working on a project, but I ran into trouble: I'm trying to make a loop.
coordinates = [[1,2],[2,3],[3,5],[5,6],[7,7],[1,2]]
Here is my list. I'm trying to create a loop that subtracts every first value from the next first value, and every second value from the next second value, then prints the results. To explain it more simply: given [[x,y],[x1,y1],[x2,y2]], I need to subtract x1-x and y1-y and print the result, then x2-x1 and y2-y1, and so on, so the console output should look like this:
1,1
1,2
2,1...
The method I've tried:
while True:
    for x, y in coordinates:
        x = x - y
        print(x)
This did not work, because it subtracts each pair's y value from its own x value instead of subtracting across pairs. I know it's wrong.
I've researched on the internet but did not understand this subject very well.
I'm looking for help. Thanks, everyone.
A simple and naive implementation
def pr(arr):
    # walk the list pairwise: compare each point with the one before it
    i = 1
    while i < len(arr):
        (x, y) = arr[i]
        (a, b) = arr[i - 1]
        print(x - a, y - b)
        i += 1

if __name__ == '__main__':
    coordinates = [[1,2],[2,3],[3,5],[5,6],[7,7],[1,2]]
    pr(coordinates)
Output:
1 1
1 2
2 1
2 1
-6 -5
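The same pairwise walk can also be written without index bookkeeping, by zipping the list against itself shifted by one; a sketch of the same idea:

coordinates = [[1,2],[2,3],[3,5],[5,6],[7,7],[1,2]]
for (x1, y1), (x2, y2) in zip(coordinates, coordinates[1:]):
    print(x2 - x1, y2 - y1)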
This is fairly similar to your original code:
coordinates = [[1,2],[2,3],[3,5],[5,6],[7,7],[1,2]]
x_prev = None
for x, y in coordinates:
    if x_prev is not None:
        print('{}, {}'.format(x - x_prev, y - y_prev))
    x_prev, y_prev = x, y
If you want to generalize a bit, to coordinate tuples of any length, you could do this:
coordinates = [[1,2],[2,3],[3,5],[5,6],[7,7],[1,2]]
prev = None
for c in coordinates:
    if prev is not None:
        print(', '.join(str(c2 - c1) for c1, c2 in zip(prev, c)))
    prev = c
You need to iterate over the list with the range function so that you can access the current element and the next one together; then you can do the subtraction inside the loop.
coordinates = [[1,2],[2,3],[3,5],[5,6],[7,7],[1,2]]
for i in range(len(coordinates) - 1):
    print(coordinates[i+1][0] - coordinates[i][0], coordinates[i+1][1] - coordinates[i][1])
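If NumPy is available, np.diff handles the pairwise case directly; a minimal sketch:

import numpy as np

coordinates = [[1,2],[2,3],[3,5],[5,6],[7,7],[1,2]]
for dx, dy in np.diff(coordinates, axis=0):   # row-wise differences
    print('{},{}'.format(dx, dy))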
If I have a dataframe with a column x, I want to make a new column x_new, but I want the first row of this new column set to a specific number (let's say -2).
Then, from the 2nd row on, each value should be computed from the previous row via the cx function.
data = {'x': [1, 2, 3, 4, 5]}
df = pd.DataFrame(data)

def cx(x):
    if df.loc[1, 'x_new'] == 0:
        df.loc[1, 'x_new'] = -2
    else:
        x_new = -10 * x + 2
        return x_new

df['x_new'] = cx(df['x'])
The final dataframe (shown as a screenshot in the original post). I am not sure how to do this.
Thank you for your help.
This is what I have so far:
data = {'depth': [1, 2, 3, 4, 5]}
df = pd.DataFrame(data)
df

# calculate equation
def depth_cal(d):
    z = -3 * d + 1  # d must be the previous row
    return z

depth_cal = depth_cal(df['depth'])  # how to set d as the previous row?
print(depth_cal)

depth_new = []
for row in df['depth']:
    if row == 1:
        depth_new.append('-5.63')
    else:
        depth_new.append(depth_cal)  # does not put the list in a column
df['Depth_correct'] = depth_new
The correct output (shown as a screenshot in the original post). There are still two problems with this:
1. It does not put the depth_cal list properly into the column.
2. In the depth_cal function, I want d to be the previous row.
Thank you.
I would do this by just using a loop to generate your new data; it might not be ideal if the frame is particularly huge, but it's a quick operation. Let me know how you get on with this:
data = {'depth': [1, 2, 3, 4, 5]}
df = pd.DataFrame(data)

res = list(data['depth'])          # copy, so the source dict is not mutated
res[0] = -5.63                     # seed value for the first row
for i in range(1, len(res)):
    res[i] = -3 * res[i - 1] + 1   # each row computed from the previous one
df['new_depth'] = res
print(df)
To get
depth new_depth
0 1 -5.63
1 2 17.89
2 3 -52.67
3 4 159.01
4 5 -476.03
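Since the recurrence only ever looks at the previous computed value (the original depth values are never used past the seed), the column can also be built from scratch; a minimal sketch under the same assumptions:

new_depth = [-5.63]                           # seed for the first row
for _ in range(len(df) - 1):
    new_depth.append(-3 * new_depth[-1] + 1)  # each value from the one before it
df['new_depth'] = new_depth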
I have a temporal vector as in the following image:
Numpy vector:
https://drive.google.com/file/d/0B4Jac-wNMDxHS3BnUzBoUkdmOGs/view?usp=sharing
I would like to know an efficient way to split the vector in NumPy and extract the 5 chunks of the signal that drop significantly in amplitude.
I could separate them by taking 2.302 as the cut-off amplitude: a chunk starts at the index where the signal drops below this value and ends at the index where it goes back above it.
Is there an efficient way to do this in NumPy?
So far I've programmed a solution in pure Python with lists:
import numpy as np
import matplotlib.pyplot as plt

vec = np.load('vector_numpy.npy')
# plt.plot(vec)
# plt.show()
print(vec.shape)

temporal_vec = []
flag = 0
flag_start = 0
flag_end = 0
all_vectors = []
all_index = []
count = -1
for element in vec:
    count = count + 1
    # print(element)
    if element < 2.302:
        if flag_start == 0:
            all_index.append(count)      # start index of a chunk
            flag_start = 1
        temporal_vec.append(element)
        flag = 1
    if flag == 1:
        if element >= 2.302:
            if flag_start == 1:
                all_index.append(count)  # end index of the chunk
                flag_start = 0
            all_vectors.append(temporal_vec)
            temporal_vec = []
            flag = 0

print(all_vectors)
for element in all_vectors:
    print(len(element))
    plt.plot(element)
    plt.show()

print(all_index)
Any fancier way in Numpy or better/shorter python code?
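One way this could look in NumPy, under the same assumption that 2.302 works as the cut-off: build a boolean mask, find the indices where it flips, and let np.split cut the vector there. A minimal sketch:

import numpy as np

vec = np.load('vector_numpy.npy')
below = vec < 2.302                                          # True inside each drop
edges = np.flatnonzero(np.diff(below.astype(np.int8))) + 1   # indices where the mask flips
chunks = [c for c in np.split(vec, edges) if c[0] < 2.302]   # keep only the low chunks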
I have 2 time series of binary "signals"; let's call them "entry" and "stay".
entry==1 means add 1 to the current state (for at most some maximum amount of time), and stay==0 means reset the current state to 0.
entry: 0, 1, 1, 0, 1, 0
stay: 1, 1, 1, 1, 0, 1
My code now calculates a combined current state:
state: 0, 1, 2, 2, 0, 1
Currently I use the following code; unfortunately it's quite slow (depending on the max time). (state/stay/entry are pandas time series.)
state = copy.deepcopy(entry)
state[stay == 0] = 0
# first iteration
state[(entry.shift(1) == 1) & (stay == 1)] += 1
# 2nd iteration up to max time
for lag in range(2, max_time + 1):
    # note: pd.rolling_mean(stay, lag) is stay.rolling(lag).mean() in modern pandas
    state[(entry.shift(lag) == 1) & (pd.rolling_mean(stay, lag) == 1)] += 1
Any idea how to vectorize this code for better performance? Many thanks!
I finally found a solution, using some NumPy functions:
def calc_state_series(entry, stay, max_time=5):
    reduce = (copy.deepcopy(entry) * 0).fillna(0)  # just for initialization
    reduce[(entry.shift(max_time) == 1) & (pd.rolling_mean(stay, max_time) == 1)] -= 1
    entry = (entry + stay.shift(1)).fillna(0)  # reduce state after max_time
    x = entry.values
    x = np.concatenate(([0], x))   # prepend a sentinel so index 0 means "no reset yet"
    y = stay.values
    y = np.concatenate(([0], y))
    nans = y == 0                  # positions where the state resets
    x = np.array(x)
    x[nans] = 0
    reset_idx = np.zeros(len(x), dtype=int)
    reset_idx[nans] = np.arange(len(x))[nans]
    reset_idx = np.maximum.accumulate(reset_idx)  # carry the latest reset index forward
    cumsum = np.cumsum(x)
    cumsum = cumsum - cumsum[reset_idx]           # restart the running sum at each reset
    return pd.Series(cumsum[1:], index=entry.index)
I managed to avoid the loop, and this solution is (depending on max_time) up to 100x faster for me, but there is probably still potential for further optimization.
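The heart of this is the "cumulative sum with resets" idiom: np.maximum.accumulate carries the index of the most recent reset forward, and subtracting the cumulative sum at that index restarts the count. A stripped-down sketch of just that idiom, with made-up data:

import numpy as np

x = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0])                # values to accumulate
reset = np.array([False, False, True, False, False, True])  # restart the sum here

x = np.concatenate(([0.0], x))                # sentinel so index 0 means "no reset yet"
r = np.concatenate(([False], reset))
x[r] = 0                                      # a reset row contributes nothing
reset_idx = np.zeros(len(x), dtype=int)
reset_idx[r] = np.flatnonzero(r)              # remember where each reset happened
reset_idx = np.maximum.accumulate(reset_idx)  # carry the latest reset index forward
csum = np.cumsum(x)
print((csum - csum[reset_idx])[1:])           # [3. 4. 0. 1. 6. 0.]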
DataFrame.resample() works only with time-series data, and I cannot find a way of getting every nth row from non-time-series data. What is the best method?
I'd use iloc, which takes a row/column slice, both based on integer position and following normal python syntax. If you want every 5th row:
df.iloc[::5, :]
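For example, on a small throwaway frame:

import pandas as pd

df = pd.DataFrame({'a': range(10)})
print(df.iloc[::5, :])
#    a
# 0  0
# 5  5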
Though #chrisb's accepted answer does answer the question, I would like to add to it the following.
A simple method I use to select every nth row, or to drop every nth row, is the following:
df1 = df[df.index % 3 != 0]  # Excludes every 3rd row starting from 0
df2 = df[df.index % 3 == 0]  # Selects every 3rd row starting from 0
This arithmetic-based sampling enables even more complex row selections, as sketched below.
This assumes, of course, that you have an index column of ordered, consecutive integers starting at 0.
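For instance, one pattern that no single stride can express:

df3 = df[df.index % 5 < 2]  # keeps rows 0, 1, 5, 6, 10, 11, ... (2 out of every 5)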
There is an even simpler solution to the accepted answer that involves directly invoking df.__getitem__.
df = pd.DataFrame('x', index=range(5), columns=list('abc'))
df
a b c
0 x x x
1 x x x
2 x x x
3 x x x
4 x x x
For example, to get every 2nd row, you can do
df[::2]
a b c
0 x x x
2 x x x
4 x x x
There's also GroupBy.first / GroupBy.head; you group on the index:
df.index // 2
# Int64Index([0, 0, 1, 1, 2], dtype='int64')
df.groupby(df.index // 2).first()
# Alternatively,
# df.groupby(df.index // 2).head(1)
a b c
0 x x x
1 x x x
2 x x x
The index is floor-divided by the stride (2, in this case). If the index is non-numeric, instead do
# df.groupby(np.arange(len(df)) // 2).first()
df.groupby(pd.RangeIndex(len(df)) // 2).first()
a b c
0 x x x
1 x x x
2 x x x
Adding reset_index() to metastableB's answer means you only need to assume that the rows are ordered and consecutive.
df1 = df[df.reset_index().index % 3 != 0] # Excludes every 3rd row starting from 0
df2 = df[df.reset_index().index % 3 == 0] # Selects every 3rd row starting from 0
df.reset_index().index will create an index that starts at 0 and increments by 1, allowing you to use the modulo easily.
I had a similar requirement, but I wanted the nth item in a particular group. This is how I solved it.
groups = data.groupby(['group_key'])
selection = groups['index_col'].apply(lambda x: x % 3 == 0)
subset = data[selection]
Using the index was not viable for me (possibly the multi-gig .csv was too large, or I missed some technique that would allow me to reindex without crashing), so instead I walk through the file one row at a time and add every nth row to a new dataframe.
import pandas as pd
from csv import DictReader

def make_downsampled_df(filename, interval):
    with open(filename, 'r') as read_obj:
        csv_dict_reader = DictReader(read_obj)
        column_names = csv_dict_reader.fieldnames
        df = pd.DataFrame(columns=column_names)
        for index, row in enumerate(csv_dict_reader):
            if index % interval == 0:
                print(str(row))
                # note: DataFrame.append was removed in pandas 2.0; see the variant below
                df = df.append(row, ignore_index=True)
    return df
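Since DataFrame.append was removed in pandas 2.0, here is a sketch of the same row walk that collects the matching rows in a list and builds the frame once at the end:

import pandas as pd
from csv import DictReader

def make_downsampled_df(filename, interval):
    with open(filename, 'r') as read_obj:
        # keep every interval-th row from the CSV
        rows = [row for i, row in enumerate(DictReader(read_obj)) if i % interval == 0]
    return pd.DataFrame(rows)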
df.drop(labels=df[df.index % 3 != 0].index, axis=0) # every 3rd row (mod 3)