I have a pandas DataFrame with the structure below:
Index Name Value Other
1 NaN 10 5
2 A 20 2
3 30 3
4 100 12
5 NaN 40 10
6 C 10 1
7 40 10
8 40 10
9 40 10
10 NaN 40 10
11 D 10 1
12 NaN 40 10
...
I need to copy the value from the Name column down into the rows below it until a NaN or another value is reached. How do I approach copying the name A into rows 3 and 4, then C (row 6) into rows 7, 8, 9, and so on, until a NaN or some other name appears?
After running the code, the DataFrame should look like this:
Index Name Value Other
1 NaN 10 5
2 A 20 2
3 A 30 3
4 A 100 12
5 NaN 40 10
6 C 10 1
7 C 40 10
8 C 40 10
9 C 40 10
10 NaN 40 10
11 D 10 1
12 NaN 40 10
Just use replace():
df = df.replace('nan', float('NaN'), regex=True)
# converts the string 'nan' to an actual NaN
df['Name'] = df['Name'].replace('', method='ffill')
# forward-fills the '' values; real NaN entries are left untouched
output:
Index Name Value Other
1 NaN 10 5
2 A 20 2
3 A 30 3
4 A 100 12
5 NaN 40 10
6 C 10 1
7 C 40 10
8 C 40 10
9 C 40 10
10 NaN 40 10
11 D 10 1
12 NaN 40 10
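Note that the replace(..., method='ffill') form is deprecated in recent pandas. Below is a self-contained sketch of a non-deprecated equivalent; the frame is my reconstruction of the example, so the literal values are assumptions:

```python
import pandas as pd
import numpy as np

# Reconstruction of the example: Name mixes real names, empty strings
# (to be filled) and genuine NaN (to be left alone).
df = pd.DataFrame({
    'Name': [np.nan, 'A', '', '', np.nan, 'C', '', '', '', np.nan, 'D', np.nan],
    'Value': [10, 20, 30, 100, 40, 10, 40, 40, 40, 40, 10, 40],
    'Other': [5, 2, 3, 12, 10, 1, 10, 10, 10, 10, 1, 10],
})

# Fill only the positions that held '' from a forward-filled copy:
# where(~mask) turns '' into NaN, ffill() fills them, and the .loc
# assignment writes back only at the '' positions, so original NaN
# entries are never overwritten.
mask = df['Name'].eq('')
df.loc[mask, 'Name'] = df['Name'].where(~mask).ffill()[mask]
```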
I have a Dataframe like the following:
a b a1 b1
0 1 6 10 20
1 2 7 11 21
2 3 8 12 22
3 4 9 13 23
4 5 2 14 24
where a1 and b1 are dynamically created from a and b. Can we create percentage columns dynamically as well?
The one thing that is constant is that the created columns will have 1 suffixed after the name.
Expected output:
a b a1 b1 a% b%
0 0 6 10 20 0 30
1 2 7 11 21 29 33
2 3 8 12 22 38 36
3 4 9 13 23 44 39
4 5 2 14 24 250 8
Create a new DataFrame by dividing the paired columns, rename the result by DataFrame.add_suffix, and last append to the original by DataFrame.join:
cols = ['a','b']
new = [f'{x}1' for x in cols]
df = df.join(df[cols].div(df[new].to_numpy()).mul(100).add_suffix('%'))
print (df)
a b a1 b1 a% b%
0 1 6 10 20 10.000000 30.000000
1 2 7 11 21 18.181818 33.333333
2 3 8 12 22 25.000000 36.363636
3 4 9 13 23 30.769231 39.130435
4 5 2 14 24 35.714286 8.333333
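For completeness, a self-contained sketch of the same approach (the frame rebuilt from the example); .to_numpy() is what keeps pandas from trying to align the mismatched labels 'a' and 'a1' during the division:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5],
                   'b': [6, 7, 8, 9, 2],
                   'a1': [10, 11, 12, 13, 14],
                   'b1': [20, 21, 22, 23, 24]})

cols = ['a', 'b']
derived = [f'{x}1' for x in cols]  # the dynamically created columns

# Divide positionally via NumPy so label alignment is bypassed,
# scale to percent, suffix the new columns with '%', and join back.
pct = df[cols].div(df[derived].to_numpy()).mul(100).add_suffix('%')
df = df.join(pct)
```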
I have a csv file, and I'm trying to convert a column with cumulative values to individual values. I can form most of the column with
df['delta'] = df['expenditure'].diff()
So for each person (A, B, ...) I want the change in expenditure since they last attended, which gives me
person days expenditure delta
A 1 10
A 2 24 14
A 10 45 21
B 2 0 -45
B 7 2 2
B 8 10 8
C 5 50 40
C 6 78 28
C 7 90 12
and what I want is
person days expenditure delta
A 1 10 ---> 10
A 2 24 14
A 10 45 21
B 2 0 ---> 0
B 7 2 2
B 8 10 8
C 5 50 ---> 50
C 6 78 28
C 7 90 12
so for each person, I want their lowest day's expenditure value put in delta.
Additionally, how would I go about dividing delta by the change in days? That is, if I wanted
person days expenditure delta
A 1 10 10
A 2 24 14
A 10 45 21/8
B 2 0 0
B 7 2 2/5
B 8 10 8
So 21/8 is the (change in expenditure)/(change in days) for A
Use DataFrameGroupBy.diff, then replace each group's first missing value with the original value via Series.fillna:
df['delta'] = df.groupby('person')['expenditure'].diff().fillna(df['expenditure'])
print (df)
person days expenditure delta
0 A 1 10 10.0
1 A 2 24 14.0
2 A 10 45 21.0
3 B 2 0 0.0
4 B 7 2 2.0
5 B 8 10 8.0
6 C 5 50 50.0
7 C 6 78 28.0
8 C 7 90 12.0
And for the second, it is possible to process both columns and then divide with DataFrame.eval:
df['delta'] = (df.groupby('person')[['expenditure', 'days']].diff()
                 .fillna(df[['expenditure', 'days']])
                 .eval('expenditure / days'))
This works the same as:
df['delta'] = (df.groupby('person')['expenditure'].diff().fillna(df['expenditure'])
                 .div(df.groupby('person')['days'].diff().fillna(df['days'])))
print (df)
person days expenditure delta
0 A 1 10 10.000
1 A 2 24 14.000
2 A 10 45 2.625
3 B 2 0 0.000
4 B 7 2 0.400
5 B 8 10 8.000
6 C 5 50 10.000
7 C 6 78 28.000
8 C 7 90 12.000
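Putting both steps together in a runnable sketch (the frame rebuilt from the example; rate is my name for the per-day version, not from the original):

```python
import pandas as pd

df = pd.DataFrame({'person': list('AAABBBCCC'),
                   'days': [1, 2, 10, 2, 7, 8, 5, 6, 7],
                   'expenditure': [10, 24, 45, 0, 2, 10, 50, 78, 90]})

# Per-person change since the previous visit; each person's first row
# has no previous visit, so fall back to the raw expenditure there.
df['delta'] = (df.groupby('person')['expenditure'].diff()
                 .fillna(df['expenditure']))

# Rate version: change in expenditure divided by change in days,
# computed on both columns at once and divided via eval.
df['rate'] = (df.groupby('person')[['expenditure', 'days']].diff()
                .fillna(df[['expenditure', 'days']])
                .eval('expenditure / days'))
```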
Here is my dataframe:
id_1 id_2 cost id_3 other
0 1 a 30 10 a
1 1 a 30 20 f
2 1 a 30 30 h
3 1 b 60 40 b
4 1 b 60 50 m
5 2 a 10 60 u
6 2 a 10 70 l
7 2 b 8 80 u
8 3 c 15 90 y
9 3 c 15 100 l
10 4 d 8 110 m
11 5 e 5 120 v
I want a groupby(['id_1', 'id_2']) that divides the cost value, which is the same in every line of a group, evenly between those lines (for example, dividing 30/3 = 10 between the three a rows).
I would expect something like this:
id_1 id_2 cost id_3 other
0 1 a 10 10 a
1 1 a 10 20 f
2 1 a 10 30 h
3 1 b 30 40 b
4 1 b 30 50 m
5 2 a 5 60 u
6 2 a 5 70 l
7 2 b 8 80 u
8 3 c 7.5 90 y
9 3 c 7.5 100 l
10 4 d 8 110 m
11 5 e 5 120 v
It is a similar question to this link, but now I want more flexibility in manipulating the data inside a group of rows.
How can I proceed?
Thanks!
Let us do it with transform:
df['cost'] /= df.groupby(['id_1', 'id_2'])['cost'].transform('count')
df
id_1 id_2 cost id_3 other
0 1 a 10.0 10 a
1 1 a 10.0 20 f
2 1 a 10.0 30 h
3 1 b 30.0 40 b
4 1 b 30.0 50 m
5 2 a 5.0 60 u
6 2 a 5.0 70 l
7 2 b 8.0 80 u
8 3 c 7.5 90 y
9 3 c 7.5 100 l
10 4 d 8.0 110 m
11 5 e 5.0 120 v
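As a runnable sketch (frame rebuilt from the example, id_3 and other omitted as they play no role): transform('count') returns a Series the same length as df holding each row's group size, so the division spreads the repeated cost evenly across the group:

```python
import pandas as pd

df = pd.DataFrame({'id_1': [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5],
                   'id_2': list('aaabbaabccde'),
                   'cost': [30, 30, 30, 60, 60, 10, 10, 8, 15, 15, 8, 5]})

# Each row's cost divided by the number of rows in its (id_1, id_2) group.
df['cost'] = df['cost'] / df.groupby(['id_1', 'id_2'])['cost'].transform('count')
```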
The df looks like below:
A B C
1 8 23
2 8 22
3 8 45
4 9 45
5 6 12
6 8 10
7 11 12
8 9 67
I want to create a new df containing every row where B is 8, together with the row immediately following each occurrence of 8.
New df:
A B C
1 8 23
2 8 22
3 8 45
4 9 45
6 8 10
7 11 12
Use boolean indexing, comparing against the shifted values as well and combining with | (bitwise OR):
df = df[df.B.shift().eq(8) | df.B.eq(8)]
print (df)
A B C
0 1 8 23
1 2 8 22
2 3 8 45
3 4 9 45
5 6 8 10
6 7 11 12
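A self-contained sketch of the same mask (frame rebuilt from the example): shift() moves each B value down one row, so shift().eq(8) is true exactly on the rows that follow an 8:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7, 8],
                   'B': [8, 8, 8, 9, 6, 8, 11, 9],
                   'C': [23, 22, 45, 45, 12, 10, 12, 67]})

# Keep a row if it holds an 8 in B, or if the row directly above it did.
mask = df['B'].eq(8) | df['B'].shift().eq(8)
out = df[mask]
```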
I have a pandas Dataframe like this:
id alt amount
6 b 30
6 a 30
3 d 56
3 a 40
1 c 35
1 b 10
1 a 20
which I would like to be turned into this:
id alt amount
6 d 56
6 c 35
6 b 30
6 a 30
5 d 56
5 c 35
5 b 26
5 a 33.33
4 d 56
4 c 35
4 b 22
4 a 36.66
3 d 56
3 c 35
3 b 18
3 a 40
2 c 35
2 b 14
2 a 30
1 c 35
1 b 10
1 a 20
For each missing id number N that is less than the maximum id number, all alt values for the largest id number smaller than N should be duplicated, with the id number set to N for these duplicated rows. If the alt value is repeated for a larger id number, then the duplicated amount entries should change by the amount difference divided by the difference between id values (the number of steps). If the alt value is not repeated, then the amount can simply be copied over for each additional id value.
For instance, a appears with id numbers 1, 3 and 6 and amounts 20, 40, 30 respectively. We need to add an instance of a with an id of 2. The amount in this will be 30, since it takes 2 steps to go from 1 to 3 and we are increasing by 20. Going from 3 to 6 there are 3 steps and we are decreasing by 10. -10/3 = -3.33, so we subtract 3.33 for each new instance of a.
I thought of doing some combination of duplicating, sorting, and forward filling? I'm unsure of the logic here though.
You can do this with pivot + reindex, then interpolate:
yourdf = (df.pivot(index='id', columns='alt', values='amount')
            .reindex(range(df.id.min(), df.id.max() + 1))
            .interpolate(method='index')
            .stack()
            .reset_index())
yourdf
Out[51]:
id alt 0
0 1 a 20.000000
1 1 b 10.000000
2 1 c 35.000000
3 2 a 30.000000
4 2 b 14.000000
5 2 c 35.000000
6 3 a 40.000000
7 3 b 18.000000
8 3 c 35.000000
9 3 d 56.000000
10 4 a 36.666667
11 4 b 22.000000
12 4 c 35.000000
13 4 d 56.000000
14 5 a 33.333333
15 5 b 26.000000
16 5 c 35.000000
17 5 d 56.000000
18 6 a 30.000000
19 6 b 30.000000
20 6 c 35.000000
21 6 d 56.000000
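A runnable sketch of the pipeline (frame rebuilt from the example; the rename is only cosmetic, giving the stacked column a name instead of 0). interpolate(method='index') interpolates linearly in id-space, forward-fills values past the last observation, and leaves leading NaNs unset, which stack() then drops:

```python
import pandas as pd

df = pd.DataFrame({'id': [6, 6, 3, 3, 1, 1, 1],
                   'alt': ['b', 'a', 'd', 'a', 'c', 'b', 'a'],
                   'amount': [30, 30, 56, 40, 35, 10, 20]})

out = (df.pivot(index='id', columns='alt', values='amount')   # one row per id
         .reindex(range(df['id'].min(), df['id'].max() + 1))  # insert missing ids
         .interpolate(method='index')                         # linear in id-space
         .stack()                                             # back to long form
         .rename('amount')
         .reset_index())
```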