Duplicate of: How to replace NaN values by Zeroes in a column of a Pandas Dataframe?
I have a column in my dataset which has the values single and married; some of the cells are empty. I want to convert single to 0 and married to 1, and convert the column from string to int.
df.X4[df.X4 == 'single'] = 1
df.X4[df.X4 == 'married'] = 2
df['X4'] = df['X4'].astype(str).astype(int)
The cells that have no value give this error:
ValueError: invalid literal for int() with base 10: 'nan'
I have tried fillna like this: df.X4.fillna(0), but it still gives the same error.
Let us try pd.to_numeric with errors='coerce', which turns anything non-numeric into NaN so it can be filled (using .loc to avoid chained assignment):
import pandas as pd

df.loc[df['X4'] == 'single', 'X4'] = 1
df.loc[df['X4'] == 'married', 'X4'] = 2
df['X4'] = pd.to_numeric(df['X4'], errors='coerce').fillna(0).astype(int)
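As an alternative (a sketch, assuming the same df and pandas imported as pd), Series.map does the replacement and the NaN handling in one chain; unmapped or empty cells become NaN and are then filled with 0:
df['X4'] = df['X4'].map({'single': 1, 'married': 2}).fillna(0).astype(int)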
Duplicate of: Convert categorical data in pandas dataframe; Convert categorical variables from String to int representation
I have a column with the following values:
City
A
B
C
I want to create a heatmap, but I can't because this column is not an integer, so I will remake it as follows:
city_new
1
2
3
I have tried this case statement, but it does not work:
df['city_new'] = np.where(df['City']='A', 1,
np.where(df['City']='B', 2,
np.where(df['City']='C', 3)))
You can use pandas.factorize, so that you don't have to write the conditions yourself (e.g., if you have 1000 different cities); it assigns integer codes in order of first appearance:
df["new_city"] = pd.factorize(df["City"])[0] + 1
Output:
City new_city
0 A 1
1 B 2
2 C 3
You could use replace to map A, B, C to 1, 2, 3, as in the code below:
df['city_new'] = df['City'].replace(['A','B','C'], [1,2,3])
Your code was incorrect for two reasons:
You used = (assignment) instead of == to check for the string.
You need to state the equivalent of an 'else' clause for when none of the logic statements is true; np.where requires that third argument, and it is the value 4 in the code below.
Your code should look like this:
df['City_New'] = np.where(df['City'] == 'A', 1,
                 np.where(df['City'] == 'B', 2,
                 np.where(df['City'] == 'C', 3, 4)))
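Not in the original answer, but as a sketch of the same case logic without nesting, numpy's np.select takes parallel lists of conditions and choices:
import numpy as np

conditions = [df['City'] == 'A', df['City'] == 'B', df['City'] == 'C']
choices = [1, 2, 3]
df['City_New'] = np.select(conditions, choices, default=4)  # default plays the 'else' role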
Duplicate of: What is the difference between using loc and using just square brackets to filter for columns in Pandas/Python?
Both the following lines seem to give the same output:
df1 = df[df['MRP'] > 1500]
df1 = df.loc[df['MRP'] > 1500]
Is .loc optional when filtering a DataFrame?
From the pandas.DataFrame.loc documentation:
Access a group of rows and columns by label(s) or a boolean array.
.loc[] is primarily label based, but may also be used with a boolean array.
When you are using a boolean array to filter data, .loc is optional: in your example, df['MRP'] > 1500 gives a boolean Series, so .loc is not necessary in that case.
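For example, with a small hypothetical DataFrame consistent with the outputs below (the original post does not show how df was built):
import pandas as pd

df = pd.DataFrame({'MRP': [18, 12, 10, 19, 5, 7, 18],
                   'cat': ['A', 'B', 'C', 'D', 'E', 'F', 'C']})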
df[df['MRP']>15]
MRP cat
0 18 A
3 19 D
6 18 C
But if you want to access some other column where this boolean Series is True, then you may use .loc:
df.loc[df['MRP']>15, 'cat']
0 A
3 D
6 C
Or, if you want to change the values where the condition is True:
df.loc[df['MRP']>15, 'cat'] = 'found'
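With that hypothetical df, the assignment would leave:
print(df)
#    MRP    cat
# 0   18  found
# 1   12      B
# 2   10      C
# 3   19  found
# 4    5      E
# 5    7      F
# 6   18  found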
Duplicate of: pandas select from Dataframe using startswith
In table A, there are columns 1 and 2.
Column 1 is unique IDs like 'A12324', and column 2 is blank for now.
I want to fill column 2 with Yes if the ID starts with A, and No otherwise.
Is anyone familiar with how I could use something like a LEFT function for this?
I tried the following, but the error said that left is not defined:
TableA.loc[TableA['col1'] == left('A',1), 'hasAnA'] = 'Yes'
You can use the pd.Series.str.startswith() method:
>>> frame = pd.DataFrame({'colA': ['A12342', 'B123123231'], 'colB': False})
>>> condition = frame['colA'].str.startswith('A')
>>> frame.loc[condition, 'colB'] = 'Yes'
>>> frame
colA colB
0 A12342 Yes
1 B123123231 False
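If you want an explicit 'No' instead of leaving False (as the question asks), a sketch using numpy.where with the question's own column names:
import numpy as np

TableA['hasAnA'] = np.where(TableA['col1'].str.startswith('A'), 'Yes', 'No')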
Duplicate of: What does 'index 0 is out of bounds for axis 0 with size 0' mean?
I'm trying to have my function go through the spreadsheet, inspect every third column, and find the locations (row numbers) of some values, but it only partially works. The code runs and returns some values, but I get this error message in between and it stops.
I'm using start=mm[0] to grab the first value from the array and end=mm[-1] to grab the last value.
def get_voltageStatus(r, t):
    for i in range(1, len(data[0]), 3):
        m = np.where((data[1:, i] >= r) & (data[1:, i] <= t))
        mm_raws = []
        mm = m[0]
        start = mm[0]
        end = mm[-1]
        print(data[0, i])
        duration(start, end)
Error is:
start=mm[0]
IndexError: index 0 is out of bounds for axis 0 with size 0
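The cause: when no element satisfies the condition, np.where returns empty index arrays, so mm[0] has nothing to index. A minimal reproduction with hypothetical data:
import numpy as np

mm = np.where(np.array([1, 2, 3]) > 10)[0]  # no matches -> empty index array
print(mm.size)  # 0
mm[0]           # IndexError: index 0 is out of bounds for axis 0 with size 0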
This fixed the problem:
def get_voltageStatus(r, t):
    all = []
    for i in range(1, len(data[0]), 3):
        m = np.where((data[1:, i] >= r) & (data[1:, i] <= t))
        print(i)
        mm_raws = []
        mm = m[0]
        if mm.size:  # mm.any() would wrongly skip the case where the only match is row index 0
            start = mm[0]
            end = mm[-1]
            print(data[0, i])
            temp = duration(start, end)
            all.append([data[0, i], temp])
Duplicate of: Convert pandas.Series from dtype object to float, and errors to nans
Data from JSON is in df, and I am trying to output it to a CSV.
I am trying to multiply a DataFrame column by a fixed value and am having issues with how the data is displayed.
I have used the following, but the data is still not displayed how I want:
df_entry['Hours'] = df_entry['Hours'].multiply(2)
df_entry['Hours'] = df_entry['Hours'] * 2
Input
ID, name,hrs
100,AB,37.5
Expected
ID, name,hrs
100,AB,75.0
What I am getting
ID, name,hrs
100,AB,37.537.5
That happens because the column's dtype is object (the values are strings), so * 2 repeats each string ('37.5' becomes '37.537.5') instead of doubling the number. You need to convert it to float before multiplying:
df_entry['Hours'] = df_entry['Hours'].astype(float) * 2
You can use the apply function. Note that the original float(int(x)) would fail on a string like '37.5', since int() cannot parse a decimal string; convert with float directly:
df_entry['Hours'] = df_entry['Hours'].apply(lambda x: float(x) * 2)
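A more defensive sketch (in the spirit of converting errors to NaN rather than raising) uses pd.to_numeric, so unparseable values become NaN instead of crashing:
df_entry['Hours'] = pd.to_numeric(df_entry['Hours'], errors='coerce') * 2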