Get value from pandas segment and subtract in place - Python

I have a table with values similar to:
val1  val2  val3  segVal
   0  12.3  88.2
  20     0     0
  50  14.5  88.7
  70     0     0
  85     0     0
  90  18.2  88.9
For segVal, I need to use the differences in the val1 column between rows where val2 is known. My first segment runs from 0 to 50, so I subtract 0 from 50 and apply that difference (50) to every segVal row in the segment. The next known row is at 90, so I subtract 50 from 90 and apply that (40).
So my output table would be:
val1  val2  val3  segVal
   0  12.3  88.2      50
  20     0     0      50
  50  14.5  88.7      50
  70     0     0      40
  85     0     0      40
  90  18.2  88.9      40
My current, somewhat working method is:
df1 = df[df.val2 != 0].copy()
df1['segVal'] = df1['val1'].diff(-1) * -1  # diff(-1) is current minus next, so flip the sign
So I'm creating an additional df, calculating the values this way, and then merging the values back into the original df.
It seems there has to be a better way to do this. My method works, but creating additional dfs doesn't seem very efficient.

Here's one way:
df['segVal'] = df.where(df.val2.ne(0)).val1.dropna().diff().reindex(df.index).bfill()
   val1  val2  val3  segVal
0     0  12.3  88.2    50.0
1    20   0.0   0.0    50.0
2    50  14.5  88.7    50.0
3    70   0.0   0.0    40.0
4    85   0.0   0.0    40.0
5    90  18.2  88.9    40.0
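If the chained one-liner is hard to read, the same logic can be split into named steps. A minimal sketch, assuming the sample frame above:
import pandas as pd
df = pd.DataFrame({'val1': [0, 20, 50, 70, 85, 90],
                   'val2': [12.3, 0, 14.5, 0, 0, 18.2],
                   'val3': [88.2, 0, 88.7, 0, 0, 88.9]})
anchors = df.loc[df['val2'].ne(0), 'val1']    # val1 where val2 is known: 0, 50, 90
seg = anchors.diff()                          # per-segment differences: NaN, 50, 40
df['segVal'] = seg.reindex(df.index).bfill()  # spread each difference over its segment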

Pandas read csv, always get 1 column

Edit: Note that I have already searched for this problem, but nothing has worked for me.
First line of data, 109 fields on one line:
15/12/2022,13:53:27,Off,0,0.00,19.9,22.6,19.6,1,Normal,Operator,Not Fitted,14,83:04:21,34:23:28,28:04:51,0,0,0,3025,0,3551,3535,3446,240,0,239,0,0,Not Fitted,125.11:37:20,44.23:11:47,0,0,0,0,0,0,0,21,2,0,0,21.8,0.0,0.0,23.2,21,26,34,1,66,133,8,60,5,74.16:01:01,23.02:02:40,0,0,0,0,0,0,0,25,2.8,0,0,21.4,0.0,0.0,22.2,21,24.1,32,2,64,133,8,28,1,122.22:39:33,43.18:38:50,0,0,0,0,0,0,0,23,1.6,0,0,21.4,0.0,0.0,22.5,21.2,24.1,32,2,64,133,8,28,1,No Alarms
So in this case it's comma-delimited. But when I try
df = pd.read_csv(path, sep=',', error_bad_lines=False, engine='python')
or even different combinations, I always get one column out.
16:02:29 On 4554 0.00 23.5 36.8 21.1 1 Normal Operator Not Fitted 14 83:06:30 35:01:19 28:06:27 0 0 0 3025 0 3502 3413 2911 245 0 1579 0 0 Not Fitted 125.13:45:20 45.01:01:51 98 4025 98.3 96 2627 0 0 12 4.4 0 0 27 0.0 0.0 39.1 24.4 39.6 51 0 67 133 9 124 5 74.18:09:01 23.03:52:44 98 4018 98.1 100 2746 0 0 17 5.5 0 0 25.1 0.0 0.0 32.3 23.6 34.6 51 0 67 133 9 124 5 123.00:47:33 43.20:28:54 97 4003 97.8 101 2767 0 0 16 4.6 0 0 25.4 0.0 0.0 32.2 23.9 34.1 51 0 67 133 9 124 5 No Alarms Present
[3944 rows x 1 columns]
It's meant to have 70+ columns, but whatever I do I get the same result.
I am trying to use Pandas to incorporate it into another program which uses it as well.
The library is up to date, as is Python.
Any help is appreciated.
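One frequent cause of everything landing in a single column is a non-default file encoding, e.g. UTF-16, whose embedded null bytes hide the commas from the parser. A diagnostic sketch, assuming a placeholder path data.csv:
import pandas as pd
path = 'data.csv'  # placeholder; substitute the real file
with open(path, 'rb') as f:
    print(f.read(64))  # a UTF-16 BOM (b'\xff\xfe' or b'\xfe\xff') means a utf-8 read sees one opaque field
df = pd.read_csv(path, sep=',', encoding='utf-16')  # only if the BOM check above suggests UTF-16
print(df.shape)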

Pandas find and interpolate missing value

This question is pretty much a follow-up to Pandas pivot or reshape dataframe with NaN.
When decoding videos, some frames go missing, and that data needs to be interpolated.
Current df:
frame  pvol   vvol  area  label
    0   NaN  109.8   120      v
    2   NaN  160.4   140      v
    0  23.1    NaN   110      p
    1  24.3    NaN   110      p
    2  25.6    NaN   112      p
Expected df:
frame  pvol   vvol  p_area  v_area
    0  23.1  109.8     110     110
    1  24.3  135.1     110     111   # interpolated for label v
    2  25.6  160.4     112     120
I know I can do df.interpolate() once the current df is reshaped to only p frames; the reshaping is the issue.
Note: label p >= label v, meaning label p will always have all the frames, but v can have missing frames.
You can reshape and dropna as in the previous question, except that now you need to specify that you want to drop only empty columns, then interpolate:
out = (df.pivot(index='frame', columns='label')
         .dropna(axis=1, how='all')  # only drop empty columns
         .interpolate()
      )
out.columns = [f'{y}_{x}' for x, y in out.columns]
Output:
       p_pvol  v_vvol  p_area  v_area
frame
0        23.1   109.8   110.0   120.0
1        24.3   135.1   110.0   130.0
2        25.6   160.4   112.0   140.0
Changing the dropna arguments also removes the issue:
s = df.set_index(['frame', 'label']).unstack().dropna(thresh=1, axis=1)
s.columns = s.columns.map('_'.join)
s = s.interpolate()
Out[279]:
       pvol_p  vvol_v  area_p  area_v
frame
0        23.1   109.8   110.0   120.0
1        24.3   135.1   110.0   130.0
2        25.6   160.4   112.0   140.0
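To try either approach, the sample frame can be rebuilt from the question's table. A sketch, with NaN placement as shown above:
import numpy as np
import pandas as pd
df = pd.DataFrame({'frame': [0, 2, 0, 1, 2],
                   'pvol': [np.nan, np.nan, 23.1, 24.3, 25.6],
                   'vvol': [109.8, 160.4, np.nan, np.nan, np.nan],
                   'area': [120, 140, 110, 110, 112],
                   'label': ['v', 'v', 'p', 'p', 'p']})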

Concatenate multiple unequal dataframes on condition

I have 7 dataframes (df_1, df_2, df_3, ..., df_7), all with the same columns but different lengths, which sometimes have the same values.
I'd like to concatenate all 7 dataframes under the conditions that:
if df_n.iloc[row_i] != df_n+1.iloc[row_i] and df_n.iloc[row_i][0] < df_n+1.iloc[row_i][0]:
    pd.concat([df_n.iloc[row_i], df_n+1.iloc[row_i], df_n+2.iloc[row_i],
               ..., df_n+6.iloc[row_i]])
Where df_n.iloc[row_i] is the ith row of the nth dataframe and df_n.iloc[row_i][0] is the first column of the ith row.
For example, if we only had 2 dataframes with len(df_1) < len(df_2) and used the conditions above, the input would be:
df_1                          df_2
index      0     1   2        index     0     1   2
0      12.12  11.0  31        0      12.2  12.6  30
1      12.3   12.1  33        1      12.3  12.1  33
2      10      9.1  33        2      13    12.1  23
3      16     12.1  33        3      13.1  12.1  27
                              4      14.4  13.1  27
                              5      15.2  13.2  28
And the output would be:
conditions -> pd.concat([df_1, df_2]):
index      0     1    2     0     1   2
0      12.12  11.0   31  12.2  12.6  30
2      10      9.1   33  13    12.1  23
4      NaN    NaN   NaN  14.4  13.1  27
5      NaN    NaN   NaN  15.2  13.2  28
Is there an easy way to do this?
IIUC, concat first, then groupby over the columns to get the differences, and then implement your condition:
s = pd.concat([df1, df2], axis=1)
s1 = s.groupby(level=0, axis=1).apply(lambda x: x.iloc[:, 0] - x.iloc[:, 1])
yourdf = s[(s1.ne(0).any(axis=1) & s1.iloc[:, 0].lt(0)) | s1.iloc[:, 0].isnull()]
Out[487]:
           0     1     2     0     1   2
index
0      12.12  11.0  31.0  12.2  12.6  30
2      10.00   9.1  33.0  13.0  12.1  23
4        NaN   NaN   NaN  14.4  13.1  27
5        NaN   NaN   NaN  15.2  13.2  28
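The same condition can also be written without the axis-1 groupby, since subtracting the frames aligns them on index and columns automatically. A sketch, assuming the two sample frames from the question:
import pandas as pd
df1 = pd.DataFrame([[12.12, 11.0, 31], [12.3, 12.1, 33],
                    [10, 9.1, 33], [16, 12.1, 33]]).rename_axis('index')
df2 = pd.DataFrame([[12.2, 12.6, 30], [12.3, 12.1, 33], [13, 12.1, 23],
                    [13.1, 12.1, 27], [14.4, 13.1, 27], [15.2, 13.2, 28]]).rename_axis('index')
diff = df1 - df2  # NaN wherever df1 has no matching row
keep = (diff.ne(0).any(axis=1) & diff[0].lt(0)) | diff[0].isna()
yourdf = pd.concat([df1, df2], axis=1)[keep]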

Merging two dataframes together with similar column values [duplicate]

This question already has answers here:
Combine two pandas Data Frames (join on a common column) (4 answers)
Closed 4 years ago.
I have two dfs; one is longer than the other, but they both have one column that contains the same values.
Here is my first df called weather:
        DATE   AWND  PRCP  SNOW  WT01  WT02  TAVG
0 2017-01-01   5.59  0.00   0.0   NaN   NaN    46
1 2017-01-02   9.17  0.21   0.0   1.0   NaN    40
2 2017-01-03  10.74  0.58   0.0   1.0   NaN    42
3 2017-01-04   8.05  0.00   0.0   1.0   NaN    47
4 2017-01-05   7.83  0.00   0.0   NaN   NaN    34
Here is my 2nd df called bike:
        DATE  LENGTH  ID  AMOUNT
0 2017-01-01       3   1       5
1 2017-01-01       6   2      10
2 2017-01-02       9   3     100
3 2017-01-02      12   4     250
4 2017-01-03      15   5      45
So I want to copy the matching rows over from the weather df, based on the shared DATE column:
        DATE  LENGTH  ID  AMOUNT   AWND  SNOW  TAVG
0 2017-01-01       3   1       5   5.59     0    46
1 2017-01-01       6   2      10   5.59     0    46
2 2017-01-02       9   3     100   9.17     0    40
3 2017-01-02      12   4     250   9.17     0    40
4 2017-01-03      15   5      45  10.74     0    42
Please help! Maybe some type of join can be used.
Use merge
In [93]: bike.merge(weather[['DATE', 'AWND', 'SNOW', 'TAVG']], on='DATE')
Out[93]:
        DATE  LENGTH  ID  AMOUNT   AWND  SNOW  TAVG
0 2017-01-01       3   1       5   5.59   0.0    46
1 2017-01-01       6   2      10   5.59   0.0    46
2 2017-01-02       9   3     100   9.17   0.0    40
3 2017-01-02      12   4     250   9.17   0.0    40
4 2017-01-03      15   5      45  10.74   0.0    42
Or just use the same indexes and simple slicing (with the question's weather and bike names):
bike = bike.set_index('DATE')
bike[['SNOW', 'TAVG']] = weather.set_index('DATE')[['SNOW', 'TAVG']]
If you check the pandas docs, they explain all the different types of "merges" (joins) that you can do between two dataframes.
The common syntax for a merge looks like: pd.merge(weather, bike, on='DATE')
You can also make the merge fancier by adding any of the arguments listed below (e.g. specifying whether you want an inner vs. right join).
Here are the arguments the function takes based on the current pandas docs:
pandas.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)
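For instance, to keep every bike row even when a date has no weather record, a left join could be used (a small sketch on the frames above):
merged = pd.merge(bike, weather[['DATE', 'AWND', 'SNOW', 'TAVG']], on='DATE', how='left')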
Hope it helps!

Shifting down rows of specific columns from a specific index in Python

I am scraping multiple tables from multiple pages of a website. The issue is that a row is missing from the initial table. Basically, this is how the dataframes look:
             mar2018  feb2018  jan2018  dec2017  nov2017
balls faced      345      561      295        0      645
runs scored      156      281      183        0      389
strike rate     52.3     42.6     61.1        0     52.2
dot balls        223      387      173        0      476
fours              8       12       19        0       22
doubles           20       38       16        0       36
notout             2        0        0        0        4

             oct2017  sep2017  aug2017
balls faced      200       58        0
runs scored       50       20        0
strike rate       25       34        0
dot balls        125       34        0
sixes              2        0        0
fours              4        2        0
doubles            2        0        0
notout             4        2        0
The column 'sixes' is missing on the first page but present on subsequent pages. So I am trying to move the rows from 'fours' through 'notout' down one position and leave NaNs in that row for the first 5 columns (mar2018 through nov2017).
I tried the following code, but it isn't working; it moves the values horizontally rather than vertically downward.
df.iloc[4][0:6] = df.iloc[4][0:6].shift(1)
and also
df2 = pd.DataFrame(index = 4)
df = pd.concat([df.iloc[:], df2, df.iloc[4:]]).reset_index(drop=True)
did not work.
df['mar2018'] = df['mar2018'].shift(1)
But this moves all the values of that column down by 1 row.
So, I was wondering if it is possible to shift down rows of specific columns from a specific index?
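For reference, the two scraped tables can be rebuilt as separate frames. A sketch with the values from the question:
import pandas as pd
df1 = pd.DataFrame({'mar2018': [345, 156, 52.3, 223, 8, 20, 2],
                    'feb2018': [561, 281, 42.6, 387, 12, 38, 0],
                    'jan2018': [295, 183, 61.1, 173, 19, 16, 0],
                    'dec2017': [0, 0, 0, 0, 0, 0, 0],
                    'nov2017': [645, 389, 52.2, 476, 22, 36, 4]},
                   index=['balls faced', 'runs scored', 'strike rate',
                          'dot balls', 'fours', 'doubles', 'notout'])
df2 = pd.DataFrame({'oct2017': [200, 50, 25, 125, 2, 4, 2, 4],
                    'sep2017': [58, 20, 34, 34, 0, 2, 0, 2],
                    'aug2017': [0, 0, 0, 0, 0, 0, 0, 0]},
                   index=['balls faced', 'runs scored', 'strike rate',
                          'dot balls', 'sixes', 'fours', 'doubles', 'notout'])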
I think you need to reindex by the union (numpy.union1d) of all index values:
import numpy as np

idx = np.union1d(df1.index, df2.index)
df1 = df1.reindex(idx)
df2 = df2.reindex(idx)
print(df1)
             mar2018  feb2018  jan2018  dec2017  nov2017
balls faced    345.0    561.0    295.0      0.0    645.0
dot balls      223.0    387.0    173.0      0.0    476.0
doubles         20.0     38.0     16.0      0.0     36.0
fours            8.0     12.0     19.0      0.0     22.0
notout           2.0      0.0      0.0      0.0      4.0
runs scored    156.0    281.0    183.0      0.0    389.0
sixes            NaN      NaN      NaN      NaN      NaN
strike rate     52.3     42.6     61.1      0.0     52.2
print(df2)
             oct2017  sep2017  aug2017
balls faced      200       58        0
dot balls        125       34        0
doubles            2        0        0
fours              4        2        0
notout             4        2        0
runs scored       50       20        0
sixes              2        0        0
strike rate       25       34        0
If you have multiple DataFrames in a list, a list comprehension works:
from functools import reduce
dfs = [df1, df2]
idx = reduce(np.union1d, [x.index for x in dfs])
dfs1 = [df.reindex(idx) for df in dfs]
print(dfs1)
[             mar2018  feb2018  jan2018  dec2017  nov2017
 balls faced    345.0    561.0    295.0      0.0    645.0
 dot balls      223.0    387.0    173.0      0.0    476.0
 doubles         20.0     38.0     16.0      0.0     36.0
 fours            8.0     12.0     19.0      0.0     22.0
 notout           2.0      0.0      0.0      0.0      4.0
 runs scored    156.0    281.0    183.0      0.0    389.0
 sixes            NaN      NaN      NaN      NaN      NaN
 strike rate     52.3     42.6     61.1      0.0     52.2,
              oct2017  sep2017  aug2017
 balls faced      200       58        0
 dot balls        125       34        0
 doubles            2        0        0
 fours              4        2        0
 notout             4        2        0
 runs scored       50       20        0
 sixes              2        0        0
 strike rate       25       34        0]
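Once reindexed, the frames share row labels, so they can be placed side by side to recover one table across all pages. A small follow-up sketch building on dfs1 above:
combined = pd.concat(dfs1, axis=1)
print(combined.shape)  # (8, 8): 8 stat rows, 8 month columns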
