This question already has answers here:
Conditional Replace Pandas
(7 answers)
Closed 2 years ago.
I want to set all values below 100 to 0 in the column ODOMETER_FW. I have the DataFrame below:
When I use pandas:
stupid_values = fuel_GB['ODOMETER_FW'].replace(fuel_GB['ODOMETER_FW']<100,0)
fuel_GB['ODOMETER_FW'] = stupid_values
fuel_GB.head(13)
As you can see, the result is wrong, and I do not know why.
Use a lambda function to convert values less than 100 to 0:
df['ODOMETER_FW'] = df['ODOMETER_FW'].apply(lambda x: 0 if x < 100 else x)
print(df)
ODOMETER_FW
0 11833
1 0
2 9080
3 8878
4 0
5 14578
6 14351
7 0
8 13456
9 0
10 0
11 0
12 0
Apply the modification only to the relevant rows:
fuel_GB.loc[fuel_GB['ODOMETER_FW'] < 100, 'ODOMETER_FW'] = 0
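Use this pandas code (note that it sets every column of the matching rows to 0, so it only gives the intended result when ODOMETER_FW is the sole column):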
fuel_GB[fuel_GB['ODOMETER_FW'] < 100] = 0
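For comparison, here is a minimal, self-contained sketch of conditional replacement on made-up data. Only the column name ODOMETER_FW comes from the question; the OTHER column and all values are assumptions for illustration. It also shows what happens when the column label is left off the assignment.

```python
import pandas as pd

# Hypothetical sample data; only the column name ODOMETER_FW comes from the question.
fuel_GB = pd.DataFrame({
    "ODOMETER_FW": [11833, 5, 9080, 8878, 99, 100],
    "OTHER": [1, 2, 3, 4, 5, 6],
})

# Option 1: boolean mask with .loc, modifying only the target column.
by_loc = fuel_GB.copy()
by_loc.loc[by_loc["ODOMETER_FW"] < 100, "ODOMETER_FW"] = 0

# Option 2: Series.mask replaces values where the condition is True.
by_mask = fuel_GB.copy()
by_mask["ODOMETER_FW"] = by_mask["ODOMETER_FW"].mask(by_mask["ODOMETER_FW"] < 100, 0)

# Without a column label, the assignment zeroes every column of the matching rows.
whole_row = fuel_GB.copy()
whole_row[whole_row["ODOMETER_FW"] < 100] = 0

print(by_loc)
print(whole_row)
```

The first two options leave OTHER untouched, while the last one overwrites it in the matching rows.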
Related
I have a sales data, the data has columns including
sales_2000, sales_2001, sales_2002...sales_2020
I am trying to extract rows that have following features:
first 8 years have zero value
On the 9th year, it has value larger than 0.
Any suggestions on how to code this using pandas?
I have simplified this problem for my answer because I didn't want to do that much typing. In the future, please provide actual sample data and any code you have already tried. That being said, here is how you could find rows where the first two years equal 0 and the third is greater than 0, using boolean masks:
In:
import pandas as pd
df = pd.DataFrame(dict(
sales_2000 = [1,0,10,0,5,0],
sales_2001 = [2,0,8,1,0,0],
sales_2002 = [1,2,3,0,0,4],
))
print(f'Original DataFrame:\n{df}')
df_extracted = df[
    (df['sales_2000'] == 0)
    & (df['sales_2001'] == 0)
    & (df['sales_2002'] > 0)
]
print(f'\nExtracted DataFrame:\n{df_extracted}')
Out:
Original DataFrame:
sales_2000 sales_2001 sales_2002
0 1 2 1
1 0 0 2
2 10 8 3
3 0 1 0
4 5 0 0
5 0 0 4
Extracted DataFrame:
sales_2000 sales_2001 sales_2002
1 0 0 2
5 0 0 4
The key here is to wrap each condition in parentheses, (condition), and to combine the conditions with the & operator. The and keyword will not work here, because it tries to reduce each Series to a single True/False value, which pandas refuses to do.
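The same idea scales to the original problem (eight zero years followed by a positive ninth year) without writing out nine conditions by hand. The following is a sketch only: the data is invented, and it assumes the columns follow the sales_YYYY naming from the question, starting at sales_2000.

```python
import pandas as pd

# Hypothetical data: columns sales_2000 ... sales_2009, three made-up rows.
years = list(range(2000, 2010))
rows = [
    [0, 0, 0, 0, 0, 0, 0, 0, 7, 3],   # eight zeros, then a positive value
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 5],   # nine zeros
    [0, 0, 0, 2, 0, 0, 0, 0, 4, 1],   # non-zero inside the first eight years
]
df = pd.DataFrame(rows, columns=[f"sales_{y}" for y in years])

first_eight = [f"sales_{y}" for y in years[:8]]   # sales_2000 .. sales_2007
ninth = f"sales_{years[8]}"                        # sales_2008

# All of the first eight columns are zero AND the ninth is positive.
mask = df[first_eight].eq(0).all(axis=1) & df[ninth].gt(0)
df_extracted = df[mask]
print(df_extracted)
```

Only the first row satisfies both conditions, so it is the only one kept.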
I am a beginner and this is my first project. I searched for an answer, but it still isn't clear.
I have imported a worksheet from Excel using pandas.
Rabbit Class:
Num Behavior Speaking Listening
0 1 3 1 1
1 2 1 1 1
2 3 3 1 1
3 4 1 1 1
4 5 3 2 2
5 6 3 2 3
6 7 3 3 1
7 8 3 3 3
8 9 2 3 2
What I want to do is write if statements. For example, if a student's behavior is 1, I want to print one string; otherwise, I want to print a different string. How can I reference a particular cell of the worksheet to set up such a check? I tried val = df.at(1, "Behavior"), but that clearly isn't working.
Here is the code I have so far..
import os
import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile
path = r"C:\Users\USER\Desktop\Python\rabbit_class.xls"
df = pd.read_excel(path)  # assumed: the step that loads the worksheet is missing from the post
print("Rabbit Class:")
print(df)
You can also do:
dff = df.loc[df['Behavior'] == 1]
if not dff.empty:
    pass  # do something with the matching rows
What you want is to find rows where df.Behavior is equal to 1. Use any of the following three methods.
# Method-1
df[df["Behavior"]==1]
# Method-2
df.loc[df["Behavior"]==1]
# Method-3
df.query("Behavior==1")
Output:
Num Behavior Speaking Listening LastColumn
0 0 1 3 1 1
Note: Dummy Data
Your sample data does not have a column header (the last one). So I named it LastColumn and read-in the data as a dataframe.
# Imports needed by the snippet below
import re
import numpy as np
import pandas as pd

# Dummy Data
s = """
Num Behavior Speaking Listening LastColumn
0 1 3 1 1
1 2 1 1 1
2 3 3 1 1
3 4 1 1 1
4 5 3 2 2
5 6 3 2 3
6 7 3 3 1
7 8 3 3 3
8 9 2 3 2
"""
# Make Dataframe
ss = re.sub(r'\s+', ',', s)  # raw string avoids an invalid-escape warning
ss = ss[1:-1]
sa = np.array(ss.split(',')).reshape(-1,5)
df = pd.DataFrame(dict((k,v) for k,v in zip(sa[0,:], sa[1:,].T)))
df = df.astype(int)
df
Hopefully the example below helps:
import pandas as pd
df = pd.read_excel(r"D:\test_stackoverflow.xlsx")
print(df.columns)
def _filter(col, filter_):
    return df[df[col] == filter_]
print(_filter('Behavior', 1))
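To read a single cell, as the question attempts, note that .at is an indexer and takes square brackets rather than parentheses. The sketch below rebuilds the posted sample data by hand, treating the unlabeled first column as the index; the printed messages are placeholders.

```python
import pandas as pd

# Sample data as posted in the question (the unnamed first column is the index).
df = pd.DataFrame({
    "Num": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "Behavior": [3, 1, 3, 1, 3, 3, 3, 3, 2],
    "Speaking": [1, 1, 1, 1, 2, 2, 3, 3, 3],
    "Listening": [1, 1, 1, 1, 2, 3, 1, 3, 2],
})

# .at is an indexer, so it takes square brackets: df.at[row_label, column_label]
val = df.at[1, "Behavior"]

# Placeholder messages; substitute whatever strings are needed.
if val == 1:
    message = "behavior is 1"
else:
    message = "behavior is not 1"
print(val, message)
```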
Thank you all for your answers. I finally figured out what I was trying to do using the following code:
i = 0
for i in df.index:
    student_number = df["Student Number"][i]
    print(student_number)
    student_name = student_list[int(student_number) - 1]
    behavior = df["Behavior"][i]
    if behavior == 1:
        print("%s's behavior is good" % student_name)
    elif behavior == 2:
        print("%s's behavior is average." % student_name)
    else:
        print("%s's behavior is poor" % student_name)
    speaking = df["Speaking"][i]
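As a possible alternative to the explicit if/elif chain, the codes can be translated with Series.map and a dictionary. This is a sketch under assumptions: student_list is not shown in the post, so the names here are invented, and the data frame is a three-row stand-in.

```python
import pandas as pd

# Hypothetical data; student_list in the original code is not shown, so names are made up.
df = pd.DataFrame({
    "Student Number": [1, 2, 3],
    "Behavior": [1, 2, 3],
})
student_list = ["Ann", "Ben", "Cal"]

labels = {1: "good", 2: "average"}
# Codes missing from the dictionary become NaN, then fall back to "poor".
df["Behavior Label"] = df["Behavior"].map(labels).fillna("poor")

messages = [
    f"{student_list[int(num) - 1]}'s behavior is {label}"
    for num, label in zip(df["Student Number"], df["Behavior Label"])
]
print("\n".join(messages))
```

The mapping lives in one dictionary, so adding a new code means adding one entry instead of another branch.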
This question already has answers here:
Merge dataframes in a dictionary
(2 answers)
Closed 4 years ago.
I have a dictionary with 120 keys, each of which contains one data frame. How can I programmatically concatenate (using pd.concat) these into a single, large data frame?
Simply concatenate the values of your dictionary using pd.concat(d.values()):
>>> d = {1:pd.DataFrame(np.random.random((5,5))),2:pd.DataFrame(np.random.random((5,5)))}
>>> d
{1: 0 1 2 3 4
0 0.319556 0.540776 0.988554 0.775070 0.535067
1 0.383192 0.379474 0.204998 0.948605 0.785190
2 0.006732 0.362755 0.537260 0.854110 0.409386
3 0.795973 0.073652 0.796565 0.168206 0.814202
4 0.531702 0.524501 0.002366 0.631852 0.024509,
2: 0 1 2 3 4
0 0.369098 0.125491 0.832362 0.183199 0.729110
1 0.069843 0.337424 0.476837 0.078589 0.489447
2 0.504904 0.456996 0.239802 0.025953 0.609697
3 0.262001 0.646389 0.992928 0.124552 0.878561
4 0.707881 0.520429 0.609257 0.018488 0.167053}
>>> new_df = pd.concat(d.values())
>>> new_df
0 1 2 3 4
0 0.319556 0.540776 0.988554 0.775070 0.535067
1 0.383192 0.379474 0.204998 0.948605 0.785190
2 0.006732 0.362755 0.537260 0.854110 0.409386
3 0.795973 0.073652 0.796565 0.168206 0.814202
4 0.531702 0.524501 0.002366 0.631852 0.024509
0 0.369098 0.125491 0.832362 0.183199 0.729110
1 0.069843 0.337424 0.476837 0.078589 0.489447
2 0.504904 0.456996 0.239802 0.025953 0.609697
3 0.262001 0.646389 0.992928 0.124552 0.878561
4 0.707881 0.520429 0.609257 0.018488 0.167053
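Two variations may be useful here, shown as a sketch on two tiny made-up frames: passing the dictionary itself to pd.concat keeps each key as an outer index level, and ignore_index=True renumbers the rows instead of repeating 0..4 for every frame.

```python
import numpy as np
import pandas as pd

# Two small made-up frames standing in for the 120 in the question.
d = {
    "a": pd.DataFrame(np.zeros((2, 2))),
    "b": pd.DataFrame(np.ones((2, 2))),
}

# Passing the dict itself keeps each key as the outer level of a MultiIndex.
keyed = pd.concat(d)

# ignore_index=True discards the original row labels and numbers rows 0..n-1.
flat = pd.concat(d.values(), ignore_index=True)

print(keyed)
print(flat)
```

With the keyed version, keyed.loc["b"] recovers the rows that came from the frame stored under "b".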
This question already has answers here:
How to turn a float number like 293.4662543 into 293.47 in python?
(8 answers)
Closed 7 years ago.
I'm trying to convert polar coordinates into x-y coordinates:
df['x']=df.apply(lambda x: x['speed'] * math.cos(math.radians(x['degree'])),axis=1)
df['y']=df.apply(lambda x: x['speed'] * math.sin(math.radians(x['degree'])),axis=1)
df.head()
This produces
The problem is that the values in x have too many decimal places. How can I make them shorter?
In "How to turn a float number like 293.4662543 into 293.47 in python?" I found that I can do "%.2f" % 1.2399, but is this a good approach?
Actually, np.round works well in this case
> from pandas import DataFrame
> import numpy as np
> a = DataFrame(np.random.normal(size=10).reshape((5,2)))
0 1
0 -1.444689 -0.991011
1 1.054962 -0.288084
2 -0.700032 -0.604181
3 0.693142 2.281788
4 -1.647281 -1.309406
> np.round(a,2)
0 1
0 -1.44 -0.99
1 1.05 -0.29
2 -0.70 -0.60
3 0.69 2.28
4 -1.65 -1.31
You can also round an individual column by overwriting it with the rounded values:
> a[1] = np.round(a[1],3)
> a
0 1
0 0.028320 -1.104
1 -0.121453 -0.179
2 -1.906779 -0.347
3 0.234835 -0.522
4 -0.309782 0.129
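Applied to the conversion in the question, a sketch might look like the following. The column names speed and degree come from the question, while the values are made up. It uses NumPy functions on whole columns instead of a row-wise apply, and rounds with the Series.round method.

```python
import numpy as np
import pandas as pd

# Made-up values; the column names speed and degree come from the question.
df = pd.DataFrame({"speed": [1.0, 2.0, 3.0], "degree": [0.0, 90.0, 180.0]})

# NumPy functions operate on whole columns, so no row-wise apply is needed.
radians = np.radians(df["degree"])
df["x"] = (df["speed"] * np.cos(radians)).round(2)
df["y"] = (df["speed"] * np.sin(radians)).round(2)
print(df)
```

Rounding changes the stored values. If only the printed output should be shorter while full precision is kept, pandas also has a display.precision option, set with pd.set_option("display.precision", 2).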
This question already has answers here:
Redefining the Index in a Pandas DataFrame object
(3 answers)
Closed 4 years ago.
I have a data frame called followers_df as below:
followers_df
0
0 oasikhia
0 LEANEnergyUS
0 _johannesngwako
0 jamesbreenre
0 CaitlinFecteau
0 mantequillaFACE
0 apowersb
0 ecoprinter
0 tsdesigns
0 GreenBizDoc
0 JimHarris
0 Jmarti11Julia
0 JAslat63
0 prAna
0 GrantLundberg
0 Jitasa_Is
0 ChoosePAWind
0 cleanpowerperks
0 WoWEorg
0 Laura_Chuck
I want to change this data frame into something like this:
followers_df
0
0 oasikhia
1 LEANEnergyUS
2 _johannesngwako
3 jamesbreenre
4 CaitlinFecteau
5 mantequillaFACE
6 apowersb
7 ecoprinter
8 tsdesigns
9 GreenBizDoc
10 JimHarris
11 Jmarti11Julia
12 JAslat63
13 prAna
14 GrantLundberg
15 Jitasa_Is
16 ChoosePAWind
17 cleanpowerperks
18 WoWEorg
19 Laura_Chuck
How can I do this? I tried:
index = pandas.Index(range(20))
followers_df = pandas.DataFrame(followers_df, index=index)
but it's giving me the following error:
ValueError: Shape of passed values is (1, 39), indices imply (1, 20)
thanks,
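You can assign a new index directly:
followers_df.index = range(20)
Or reset the index and drop the old one. reset_index returns a new DataFrame, so assign the result back:
followers_df = followers_df.reset_index(drop=True)
Note that followers_df.reindex(index=range(0,20)) does not help here: reindex aligns on the existing labels, and it raises an error when the index contains duplicates, as it does in this frame.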
When you are not sure of the number of rows, then you can do it this way:
followers_df.index = range(len(followers_df))
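These options can be checked on a short stand-in for followers_df. The sketch below uses three of the posted names with the same all-zero index, and records whether reindex fails on the duplicate labels.

```python
import pandas as pd

# A short stand-in for followers_df, with the same all-zero index as in the question.
original = pd.DataFrame(
    ["oasikhia", "LEANEnergyUS", "_johannesngwako"], index=[0, 0, 0]
)

# Option 1: overwrite the index with a range of the right length.
by_assignment = original.copy()
by_assignment.index = range(len(by_assignment))

# Option 2: drop=True discards the old index instead of inserting it as a column.
by_reset = original.reset_index(drop=True)

# reindex aligns on existing labels, so duplicate labels make it fail.
try:
    original.reindex(index=range(3))
    reindex_failed = False
except ValueError:
    reindex_failed = True

print(by_assignment)
print(by_reset)
print(reindex_failed)
```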