I have the following dataframe:
I want to verify whether the value of a cell is 0 for any date. If it is, I want to replace the value of the cell by multiplying the value in the previous cell by the proper multiplier.
For example, Day 14 = 0, I want to multiply Day 7 by Mul 14 and store the new value in Day 14. And so on with the whole dataframe.
I have tried this code but it is not working:
if df['day 30'] == 0.00:
df['day 30'] = df['day 14']*df['Mul 30']
And this is my expected output:
Thanks!
Here is a solution with a small example:
import pandas as pd
import numpy as np
df = pd.DataFrame([[0.8, 0.9, 0.7, 2, 6], [0.6, 0, 0, 2, 3], [0.2, 0, 0, 4, 2]],
                  columns=["Day 7", "Day 14", "Day 30", "Mul 14", "Mul 30"])
print(df)
df["Day 14"] = np.where(df["Day 14"] == 0, df["Day 7"] * df["Mul 14"], df["Day 14"])
df["Day 30"] = np.where(df["Day 30"] == 0, df["Day 14"] * df["Mul 30"], df["Day 30"])
print(df)
If you want, you can iterate over the day values (e.g. [7, 14, 30]) instead of writing individual lines.
Result of above code:
Day 7 Day 14 Day 30 Mul 14 Mul 30
0 0.8 0.9 0.7 2 6
1 0.6 0.0 0.0 2 3
2 0.2 0.0 0.0 4 2
Day 7 Day 14 Day 30 Mul 14 Mul 30
0 0.8 0.9 0.7 2 6
1 0.6 1.2 3.6 2 3
2 0.2 0.8 1.6 4 2
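As the note about iterating suggests, the same fill can be written as a loop over consecutive day columns so it scales when more columns are added. A sketch, assuming the same `Day N` / `Mul N` naming as above:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame(
    [[0.8, 0.9, 0.7, 2, 6], [0.6, 0, 0, 2, 3], [0.2, 0, 0, 4, 2]],
    columns=["Day 7", "Day 14", "Day 30", "Mul 14", "Mul 30"])

# Walk the day columns in order; each zero is filled from the
# previous (already filled) day column times its multiplier.
days = [7, 14, 30]  # extend this list if more day columns exist
for prev, cur in zip(days, days[1:]):
    df[f"Day {cur}"] = np.where(df[f"Day {cur}"] == 0,
                                df[f"Day {prev}"] * df[f"Mul {cur}"],
                                df[f"Day {cur}"])
print(df)
```

Because each column is filled before the next one is read, a zero in both Day 14 and Day 30 cascades correctly (0.6 becomes 1.2, which then becomes 3.6).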
I have a sensor. For some reason, the sensor likes to record data like this:
>df
obs count
-0.3 3
0.9 2
1.4 5
i.e. it first records observations and makes a count table out of them. What I would like to do is convert this df into a series of raw observations. For example, I would like to end up with: [-0.3,-0.3,-0.3,0.9,0.9,1.4,1.4 ....]
A similar question was asked for Excel.
If your dataframe structure is like this one (or similar):
obs count
0 -0.3 3
1 0.9 2
2 1.4 5
This is an option, using numpy.repeat:
import numpy as np
import pandas as pd
# Repeat each observation by its count, then rebuild a dataframe.
times = df['count']
df2 = pd.DataFrame({'obs': np.repeat(df['obs'].values, times)})
print(df2)
obs
0 -0.3
1 -0.3
2 -0.3
3 0.9
4 0.9
5 1.4
6 1.4
7 1.4
8 1.4
9 1.4
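A pandas-only alternative (a sketch of the same idea without calling np.repeat directly) is to repeat the index itself:

```python
import pandas as pd

df = pd.DataFrame({"obs": [-0.3, 0.9, 1.4], "count": [3, 2, 5]})

# Repeat each row's label by that row's count, select obs, renumber.
df2 = df.loc[df.index.repeat(df["count"]), ["obs"]].reset_index(drop=True)
print(df2)
```

This keeps everything inside pandas, which can be convenient if you want to carry along more columns than just obs.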
I have a dataframe with multiple NaN values. I want to fill each with a random number between 0 and 1. I tried fillna, but that fills every cell with the same single value.
We could use iterrows, but it consumes a lot of resources. Is there any other way to do it, and if yes, then how? The following is an example of my dataframe.
> df
a b c d
0 1 10 na na
1 2 20 40 30
2 24 na na na
expected output
> df
a b c d
0 1 10 0.7 0.9
1 2 20 40 30
2 24 0.9 0.34 0.532
basically replacing each na with anything between (0, 1)
You can create your own formula along with a random number:
In the solution below, I multiply column a by a random number and take only the fractional part, since you want numbers between 0 and 1.
import pandas as pd
import numpy as np
import random
df = pd.DataFrame({'a': [1, 2, 24], 'b': [10, 20, np.nan],
                   'c': [np.nan, 40, np.nan], 'd': [np.nan, 30, np.nan]})
for c in df.columns:
    df[c] = np.where(df[c].isnull(), (df['a'] * random.random()) % 1, df[c])
print(df)
Output:
a b c d
0 1.0 10.000000 0.526793 0.678061
1 2.0 20.000000 40.000000 30.000000
2 24.0 0.865441 0.643032 0.273461
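If truly independent uniform draws per cell are wanted (the loop above reuses one random number per column, scaled by column a), a vectorized sketch with NumPy's uniform sampler and DataFrame.where:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'a': [1, 2, 24], 'b': [10, 20, np.nan],
                   'c': [np.nan, 40, np.nan], 'd': [np.nan, 30, np.nan]})

# Draw one independent uniform(0, 1) number for every cell, then
# keep the original value wherever it is not NaN.
rand = pd.DataFrame(np.random.uniform(0, 1, df.shape),
                    index=df.index, columns=df.columns)
df = df.where(df.notna(), rand)
print(df)
```

No Python-level loop is involved, so this stays fast on large frames.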
Good Morning, (bad beginner)
I have the following pandas dataframe:
My goal is: the first time a new ID appears, the VALUE column should be 1000 * DELTA of that row. For all consecutive rows of that ID, the VALUE is the VALUE of the row above * the DELTA of the current row.
I tried by getting all unique ID values:
a=stocks2.ID.unique()
a.tolist()
It works; unfortunately, I do not really know how to iterate in the way I described. Any kind of help or tip would be greatly appreciated!
A way to do it would be as follows. Example dataframe:
df = pd.DataFrame({'ID':[1,1,5,3,3], 'delta':[0.3,0.5,0.2,2,4]}).assign(value=[2,5,4,2,3])
print(df)
ID delta value
0 1 0.3 2
1 1 0.5 5
2 5 0.2 4
3 3 2.0 2
4 3 4.0 3
Fill value from the row above as:
df['value'] = df.shift(1).delta * df.shift(1).value
Groupby to get the indices where the first ID appears:
w = df.groupby('ID', as_index=False).nth(0).index.values
And compute the values for value using the indices in w:
df.loc[w,'value'] = df.loc[w,'delta'] * 1000
Which gives for this example:
ID delta value
0 1 0.3 300.0
1 1 0.5 0.6
2 5 0.2 200.0
3 3 2.0 2000.0
4 3 4.0 4.0
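Note that the recurrence the question describes (each row multiplying the value just computed for the row above, not the pre-existing value column) collapses to a running product of delta within each ID. Under that reading, a groupby cumprod sketch:

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 5, 3, 3],
                   'delta': [0.3, 0.5, 0.2, 2, 4]})

# value_1 = 1000 * delta_1 and value_n = value_{n-1} * delta_n
# is just 1000 times the cumulative product of delta per ID.
df['value'] = 1000 * df.groupby('ID')['delta'].cumprod()
print(df)
```

For the example frame this yields 300.0, 150.0, 200.0, 2000.0, 8000.0, each row carrying forward the freshly computed value of the previous row in its group.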
Having the following Data Frame:
name value count total_count
0 A 0 1 20
1 A 1 2 20
2 A 2 2 20
3 A 3 2 20
4 A 4 3 20
5 A 5 3 20
6 A 6 2 20
7 A 7 2 20
8 A 8 2 20
9 A 9 1 20
----------------------------------
10 B 0 10 75
11 B 5 30 75
12 B 6 20 75
13 B 8 10 75
14 B 9 5 75
I would like to pivot the data, grouping each row by the name value, then create columns based on the value & count columns aggregated into bins.
Explanation: I have 10 possible values, range 0-9; not all the values are present in each group. In the above example group B is missing values 1,2,3,4,7. I would like to create a histogram with 5 bins, ignore missing values, and calculate the percentage of count for each bin. So the result will look like so:
name 0-1 2-3 4-5 6-7 8-9
0 A 0.150000 0.2 0.3 0.2 0.150000
1 B 0.133333 0.0 0.4 0.4 0.066667
For example for bin 0-1 of group A the calculation is the sum of count for the values 0,1 (1+2) divided by the total_count of group A
name 0-1
0 A (1+2)/20 = 0.15
I was looking into hist method and this StackOverflow question, but still struggling with figuring out what is the right approach.
Use pd.cut to bin your feature, then use a df.groupby().count() and the .unstack() method to get the dataframe you are looking for. During the group by you can use any aggregation function (.sum(), .count(), etc) to get the results you are looking for. The code below works if you are looking for an example.
import pandas as pd
import numpy as np
df = pd.DataFrame(
    data={'name': ['Group A', 'Group B'] * 5,
          'number': np.arange(0, 10),
          'value': np.arange(30, 40)})
df['number_bin'] = pd.cut(df['number'], bins=np.arange(0,10))
# Option 1: Sums
df.groupby(['number_bin','name'])['value'].sum().unstack(0)
# Option 2: Counts
df.groupby(['number_bin','name'])['value'].count().unstack(0)
The null values in the original data will not affect the result.
To get the exact result you could try this.
bins=range(10)
res = df.groupby('name')['count'].sum()
intervals = pd.cut(df.value, bins=bins, include_lowest=True)
df1 = (df.groupby([intervals,"name"])['count'].sum()/res).unstack(0)
df1.columns = df1.columns.astype(str) # convert the cols to string
df1.columns = ['a','b','c','d','e','f','g','h','i'] # rename the cols
cols = ['a',"b","d","f","h"]
df1 = df1.add(df1.iloc[:,1:].shift(-1, axis=1), fill_value=0)[cols]
print(df1)
You can manually rename the cols later.
# Output:
a b d f h
name
A 0.150000 0.2 0.3 0.200000 0.15
B 0.133333 NaN 0.4 0.266667 0.20
You can replace the NaN values using df1.fillna(0).
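A variant sketch that avoids the column pairing and renaming: hand pd.cut the five bin edges and labels up front (assuming the values stay in the 0-9 range):

```python
import pandas as pd

df = pd.DataFrame({
    'name':  ['A'] * 10 + ['B'] * 5,
    'value': list(range(10)) + [0, 5, 6, 8, 9],
    'count': [1, 2, 2, 2, 3, 3, 2, 2, 2, 1, 10, 30, 20, 10, 5],
})

# Bin edges fall between the integer values, giving exactly 5 bins.
bins = [-0.5, 1.5, 3.5, 5.5, 7.5, 9.5]
labels = ['0-1', '2-3', '4-5', '6-7', '8-9']
intervals = pd.cut(df['value'], bins=bins, labels=labels)

totals = df.groupby('name')['count'].sum()
out = (df.groupby(['name', intervals], observed=False)['count'].sum()
         .unstack(1)            # bins become columns
         .fillna(0)             # empty bins count as zero
         .div(totals, axis=0))  # fraction of each group's total count
print(out)
```

Since the labels are set when the bins are created, the result already has the 0-1 ... 8-9 column names, with missing bins shown as 0.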
I have a pandas data frame df1
Time sat1 sat2 sat3 sat4 val1 val2 val3 val4
10 2 4 2 4 0.1 -1.0 1 2.0
20 3 1 1 3 1.6 0 2.1 -0.7
30 12 8 8 16 0.5 1.1 0.6 2.0
40 2 1 2 12 1.0 1.2 0.4 3.7
I want to compare sat1, sat2 with sat3 and sat4 at each time instant.
If there is a match between these columns, I want to get the number of matched
elements and subtract the value columns of the matched elements.
Expected Output:
match_count Reslt_1 Reslt_2
2 val1-val3 val2-val4
2 val1-val4 val2-val3
1 Nan val2-val3
1 val1-val3 Nan ( w.r.t match found in sat1 or sat2)
These are sample data and the number of columns may increase. The data in sat1, sat2 toggle into sat3 & sat4, which is why the subtraction happens accordingly.
How can I obtain the above expected output using pandas? I obtained the above dataframe
using the pandas concat function.