How to sum previous and next row values in pandas - python

After trying for several hours, I am still not able to perform the following task.
I would like to sum, for each of my center points, the previous and next row values, as shown in the image below.
Can you please provide me with an example of how that can be done?
Thank you in advance for your time!

You can also use df.rolling(...).sum() with center=True (since by default the labels are set to the right edge of the window) and then take every third slice of the result. Additionally, you can set the minimum number of observations, min_periods, equal to 1, which basically says that no output value is set until at least min_periods non-null values have been encountered.
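For example, with sample data like the frame used in the convolution answer below (this construction is an assumption, since the question's image is not reproduced here):
import numpy as np
import pandas as pd

# Ten sample values 10..100, indexed 1..10, so that rows 1, 4, 7 and 10
# play the role of the "center points"
df = pd.DataFrame(dict(A=np.arange(10, 101, 10)), index=np.arange(1, 11))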
df.A.rolling(window=3, min_periods=1, center=True).sum().iloc[::3].astype(int)
1 30
4 120
7 210
10 190
Name: A, dtype: int32

This will get it done:
df = pd.DataFrame(dict(A=np.arange(10, 101, 10)), np.arange(1, 11))
# np.convolve with [1, 1, 1] places the sum of three consecutive values,
# centered on A[k-1], at position k; [1::3] keeps every third centered sum
pd.Series(np.convolve(df.A.values, [1, 1, 1])[1::3], df.index[0::3])
1 30
4 120
7 210
10 190
dtype: int64

Related

How to get the sum of the last 15 values from a pandas dataframe

So I'm slicing my time series data, but for some of the columns, I need to be able to get the sum of the elements that were sliced. For example, if you had
s = pd.Series([10, 30, 21, 18])
s = s[::-2]
I need to get the sum of a range of elements, so in this situation I would need
3 39
1 40
as the output. I've seen things like .cumsum(), but I can't find anything to sum a range of elements.
I don't quite understand what the first column represents, but the second column seems to be the sum result.
If you have the correct slice, it's easy to get the sum with sum(), like this:
import numpy as np
import pandas as pd
data = np.arange(0, 10).reshape(-1, 2)
pd.DataFrame(data).iloc[2:].sum(axis=1)
The output is:
2 9
3 13
4 17
dtype: int64
The answer based only on your title would be df[-15:].sum(), but it seems you're looking to perform a calculation per group of the slicing.
To address this problem, pandas provides the window utilities. So, you can simply do:
s = pd.Series([10, 30, 21, 18])
s.rolling(2).sum()[::-2].astype(int)
which returns:
3 39
1 40
dtype: int64
Also, it scales: you can replace 2 with any other window size, and the .rolling method also works on DataFrame objects.
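Going back to the title's case of summing the last 15 values per slice, the same pattern applies. A minimal sketch, where the frame and the block size are assumptions:
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': np.arange(60)})

# One sum per non-overlapping block of 15 rows: take the rolling sum that
# ends at the last row of each block, stepping backwards 15 rows at a time
block_sums = df['x'].rolling(15).sum()[::-15][::-1]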

How to re-scale a range of ratio values to start from 1 rather than 0, without losing statistical significance

I'm working on a forecast formula for subscriptions.
First, I broke down subscriptions by week.
Second, I grouped subscriptions by week.
Third, I found the ratio by week.
The roadblock that I'm facing is that the ratio range goes from 0.56 to 5.54, but it needs to satisfy ratio >= 1, because I multiply the actual subscriptions by the ratio:
df = pd.DataFrame({"Weeks": [1, 2, 3, 4, 5, 6, 7, 8],
                   "Subscription": [203, 150, 120, 80, 15, 13, 5, 1]})
df["ratio"] = (df.Subscription * 100) / df.Subscription.sum()
# So, for example:
Actual_value = 100
# If the actual value is multiplied by a number smaller than 1, such as 0.56,
# the forecasted value will be smaller than the actual value, but it should be
# equal to or bigger than the actual value.
How can I normalize this ratio so that its range begins at 1 (up to whatever), without losing statistical significance?
There is one way to get your output:
(df.Subscription-df.Subscription.min())/np.ptp(df.Subscription)
0 1.000000
1 0.737624
2 0.589109
3 0.391089
4 0.069307
5 0.059406
6 0.019802
7 0.000000
Name: Subscription, dtype: float64
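Note that the min-max scaling above maps into [0, 1], while the question asks for ratio >= 1. A minimal sketch of two options that keep the values at or above 1 (the new column names are mine; which option is statistically appropriate depends on how the ratio is used downstream):
import numpy as np
import pandas as pd

df = pd.DataFrame({"Weeks": [1, 2, 3, 4, 5, 6, 7, 8],
                   "Subscription": [203, 150, 120, 80, 15, 13, 5, 1]})
df["ratio"] = (df.Subscription * 100) / df.Subscription.sum()

# Option 1: divide by the minimum, so the smallest ratio becomes exactly 1
# and the relative proportions between weeks are unchanged
df["ratio_div_min"] = df.ratio / df.ratio.min()

# Option 2: shift the min-max scaled values up by 1, mapping into [1, 2]
df["ratio_shifted"] = 1 + (df.ratio - df.ratio.min()) / np.ptp(df.ratio)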

Scaling numbers within a dataframe column to the same proportion

I have a series of numbers of two different magnitudes in a dataframe column. They are
0 154480.429000
1 154.480844
2 154480.433000
3 154.480844
4 154480.433000
......
As we can see above, the column mixes two orders of magnitude. I am not sure how to set up a condition to scale a small number like 154.480844 so that it has the same order of magnitude as a large one like 154480.433000 in the DataFrame.
How can this be done efficiently with pandas?
Use np.log10 to determine the scaling factor required. Something like this, where ser is the Series holding your column:
v = np.log10(ser).astype(int)         # integer order of magnitude of each value
ser * 10 ** (v.max() - v).values      # scale everything up to the largest magnitude
0 154480.429
1 154480.844
2 154480.433
3 154480.844
4 154480.433
Name: 1, dtype: float64
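For completeness, a runnable sketch with the sample values copied from the question (the name ser is an assumption):
import numpy as np
import pandas as pd

ser = pd.Series([154480.429, 154.480844, 154480.433, 154.480844, 154480.433])
v = np.log10(ser).astype(int)               # 5 for the large values, 2 for the small ones
print(ser * 10 ** (v.max() - v).values)     # the small values are multiplied by 10**3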

Pandas DataFrame with Function: Columns Varying

Given the following DataFrame:
import pandas as pd
import numpy as np
d = pd.DataFrame({'Label': ['a', 'a', 'b', 'b'], 'Count1': [10, 20, 30, 40],
                  'Count2': [20, 45, 10, 35], 'Count3': [40, 30, np.nan, 22],
                  'Nobs1': [30, 30, 70, 70], 'Nobs2': [65, 65, 45, 45],
                  'Nobs3': [70, 70, 22, 32]})
d
Label Count1 Count2 Count3 Nobs1 Nobs2 Nobs3
0 a 10 20 40.0 30 65 70
1 a 20 45 30.0 30 65 70
2 b 30 10 NaN 70 45 22
3 b 40 35 22.0 70 45 32
I would like to apply the z test for proportions on each combination of column groups (1 and 2, 1 and 3, 2 and 3) per row. By column group, I mean, for example, "Count1" and "Nobs1".
For example, one such test would be:
from statsmodels.stats.proportion import proportions_ztest
count = np.array([10, 20])  # from the first row of Count1 and Count2, respectively
nobs = np.array([30, 65])   # from the first row of Nobs1 and Nobs2, respectively
pv = proportions_ztest(count=count, nobs=nobs, value=0, alternative='two-sided')[1]  # returns just the p-value, which is of interest
pv
0.80265091465415639
I would want the result (pv) to go into a new column (first row) called "p_1_2" or something logical that corresponds to its respective columns.
In summary, here are the challenges I'm facing:
How to apply this per row.
...for each paired combination, mentioned above.
...where the column names and number of pairs of "Count" and "Nobs" columns may vary (assuming that there will always be a "Nobs" column for each "Count" column).
Related to 3: For example, I might have a column called "18-24" and another called "18-24_Nobs".
Thanks in advance!
To address 1) and 2) for one test (additional tests can be coded similarly, or within an additional loop):
for i, row in d.iterrows():
    d.loc[i, 'test'] = proportions_ztest(count=row['Count1':'Count2'].values,
                                         nobs=row['Nobs1':'Nobs2'].values,
                                         value=0, alternative='two-sided')[1]
For 3), it should be possible to handle these cases with pure Python inside the loop.
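Building on that loop, here is a sketch that covers all three challenges by pairing each count column with its observations column and iterating over every pair. The suffix-matching helper and the p_1_2-style column names are assumptions about your naming scheme, and rows with NaN counts should propagate NaN p-values rather than valid ones:
from itertools import combinations
from statsmodels.stats.proportion import proportions_ztest

# Map each count column to its observations column; this covers both the
# 'CountN'/'NobsN' naming above and an '18-24'/'18-24_Nobs' style scheme
def nobs_col(count_col):
    if count_col.startswith('Count'):
        return 'Nobs' + count_col[len('Count'):]
    return count_col + '_Nobs'

count_cols = [c for c in d.columns if c.startswith('Count')]

for c1, c2 in combinations(count_cols, 2):
    new_col = 'p_{}_{}'.format(c1[-1], c2[-1])   # e.g. 'p_1_2'
    for i, row in d.iterrows():
        d.loc[i, new_col] = proportions_ztest(
            count=row[[c1, c2]].astype(float).values,
            nobs=row[[nobs_col(c1), nobs_col(c2)]].astype(float).values,
            value=0, alternative='two-sided')[1]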

pandas histogram/barplot on categorical index and axis

I have this series:
data:
0 17
1 25
2 10
3 60
4 0
5 20
6 300
7 50
8 10
9 80
10 100
11 65
12 125
13 50
14 100
15 150
Name: 1, dtype: int64
I wanted to plot a histogram with variable bin sizes, so I made this:
filter_values = [0,25,50,60,75,100,150,200,250,300,350]
out = pd.cut(data, bins=filter_values)
counts = pd.value_counts(out)
print(counts)
My problem is that when I use counts.plot(kind="hist"), I don't get the right labels on the x axis. I only get them by using a bar graph instead, counts.plot(kind="bar"), but then I can't get the right order.
I tried to use xticks=counts.index.values[0], but it raises an error, and xticks=filter_values gives an odd figure shape, as the numbers go far beyond what the plot understands the bins to be.
I also tried counts.hist(), data.hist(), and counts.plot.hist(), without success.
I don't know how to plot the categorical data from counts correctly (its index is a pandas CategoricalIndex), so I don't know which approach to take: whether there is a way to plot variable bins directly with data.hist(), data.plot(kind="hist") or data.plot.hist(), or whether I am right to build counts, and in that case how to represent it correctly (with good labels on the x axis and in the right order, not the descending one of the bar graph).
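For what it's worth, a minimal sketch of one way to get ordered labels, continuing from the counts built above: value_counts sorts by frequency, but the CategoricalIndex remembers the bin order, so sorting by index before the bar plot restores it:
import matplotlib.pyplot as plt

counts = counts.sort_index()   # the CategoricalIndex sorts in bin order, not by count
counts.plot(kind='bar')        # the bar plot then keeps that order and its labels
plt.show()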
