I have a dataframe with 2 columns. I want to convert the COUNT column to int, but it keeps giving me a ValueError: Unable to parse string "0.58%" at position 0
METRIC COUNT
Scans 125487
No Reads 2541
Diverts 54710
No Code% 0.58%
No Read% 1.25%
df['COUNT'] = df['COUNT'].apply(pd.to_numeric)
How can I remove the % before conversion?
You can use str.strip:
pd.to_numeric(df.col1.str.strip('%'))
0 1
1 2
2 3
Name: col1, dtype: int64
Try this. I'm assuming that 0.58% is read in as a string, so str.replace can replace '%' with nothing, after which the column can be converted to a number:
import pandas as pd
df = pd.DataFrame({'col1':['1','2','3%']})
df.col1.str.replace('%','').astype(float)
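Applied to the data from the question, a minimal sketch might look like the following. It assumes the COUNT column was read in as strings, and the frame is rebuilt by hand here. Note that the result is float rather than int, because the percentage rows hold fractional values.

```python
import pandas as pd

# Rebuild the frame from the question; COUNT is assumed to hold strings.
df = pd.DataFrame({
    "METRIC": ["Scans", "No Reads", "Diverts", "No Code%", "No Read%"],
    "COUNT": ["125487", "2541", "54710", "0.58%", "1.25%"],
})

# Remove a trailing '%' and parse what is left as a number.
df["COUNT"] = pd.to_numeric(df["COUNT"].str.rstrip("%"))

print(df["COUNT"].dtype)
```

If the percentages should be kept apart from the raw counts, one option is to filter on the METRIC names before converting.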
This is quite an easy task, but I am stuck. I have a dataframe with a column of type string, so it contains characters:
Category
AB00
CD01
EF02
GH03
RF04
Now I want to treat part of these values as numeric, filter on them, and create a subset dataframe. However, I do not want to change the dataframe in any way. I tried:
df_subset=df[df['Category'].str[2:4]<=3]
Of course this does not work, as the left-hand side is a string and cannot be evaluated as numeric and compared to 3.
I tried
df_subset=df[int(df['Category'].str[2:4])<=3]
but I am not sure about this, I think it is wrong or not the way it should be done.
Add type conversion to your expression:
df[df['Category'].str[2:].astype(int) <= 3]
Category
0 AB00
1 CD01
2 EF02
3 GH03
As you have leading zeros, you can directly use string comparison:
df_subset = df.loc[df['Category'].str[2:4] <= '03']
Output:
Category
0 AB00
1 CD01
2 EF02
3 GH03
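Both answers leave df itself unchanged, because the conversion only happens inside the mask. Here is a small sketch that rebuilds the frame from the question and checks this. The string comparison relies on the suffix always being two zero-padded digits, which is an assumption about the data.

```python
import pandas as pd

df = pd.DataFrame({"Category": ["AB00", "CD01", "EF02", "GH03", "RF04"]})

# Numeric comparison: the cast happens on a temporary Series only.
by_number = df[df["Category"].str[2:].astype(int) <= 3]

# String comparison: valid only while the suffix is zero-padded to a fixed width.
by_string = df[df["Category"].str[2:4] <= "03"]

print(by_number["Category"].tolist())
print(df["Category"].tolist())  # the original column still holds the full strings
```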
Example:
The df['column'] has a bunch of values similar to: F/4500/O or G/2/P
The number of digits ranges from 1 to 4, as in the examples given above.
How can I transform that column to keep only the number (for example 4500) as an integer?
I tried the split method but I can't get it right.
Thank you!
You could extract the value and convert to_numeric:
df['number'] = pd.to_numeric(df['column'].str.extract(r'/(\d+)/', expand=False))
Example:
column number
0 F/4500/O 4500
1 G/2/P 2
How about:
df['column'].map(lambda x: int(x.split('/')[1]))
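The two answers behave differently when a row does not match the pattern: str.extract yields NaN, whereas the split/int version raises. A hedged sketch that keeps an integer dtype even with missing values uses the nullable Int64 dtype. The third row is made up to show the non-matching case.

```python
import pandas as pd

df = pd.DataFrame({"column": ["F/4500/O", "G/2/P", "no digits here"]})

# Capture the digits between the two slashes; non-matching rows become NaN.
digits = df["column"].str.extract(r"/(\d+)/", expand=False)

# Nullable integer dtype keeps whole numbers and marks gaps as <NA>.
df["number"] = pd.to_numeric(digits).astype("Int64")

print(df)
```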
I would like to distinguish a blank string of some length from a regular string such as G1234567. The blank strings in my dataset currently have length 8, but I cannot guarantee that all future blank strings will still have length 8.
This is what the column looks like when I print it out:
0
1
2
3
4
9461 G6000000
9462 G6000001
9463 G6000002
9464 G6000003
9465 G6000004
Name: Sub_ID, Length: 9466, dtype: object
If I apply pd.isnull() to the entire column, I get a mask populated with all False. Is there any way for me to distinguish a blank string of some length from a string that is actually populated with something?
Thank you so much for your help!
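The following creates a mask for all the cells in your DataFrame (df) that are blank strings, that is, strings that contain only whitespace. It assumes every cell is a string (in pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map):
df.applymap(lambda cell: cell.isspace())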
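For a single column, a Series-level version avoids calling a function on every cell and does not depend on the length of the blank string. This is a sketch with made-up rows, assuming the Sub_ID column holds only strings. Unlike isspace(), it also treats a zero-length string as blank.

```python
import pandas as pd

# Made-up sample: blanks of different lengths, a zero-length string, real IDs.
sub_id = pd.Series(["        ", "   ", "", "G6000000", "G6000001"], name="Sub_ID")

# Strip surrounding whitespace, then test for the empty string.
is_blank = sub_id.str.strip().eq("")

# Keep only the rows that actually contain something.
populated = sub_id[~is_blank]

print(populated)
```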
I have a pandas Series words which look like:
0 a
1 calculated
2 titration
3 curve
4 for
5 oxalic
6 acid
7 be
8 show
9 at
Name: word, dtype: object
I also have a Series occurances which looks like:
a 278
show 179
curve 2
Name: index, dtype: object
I want to filter words using occurances so that a word is filtered out if it is not in occurances or its value is less than 100.
In the given example I would like to get:
0 a
8 show
Name: word, dtype: object
isin only checks existence, and when I tried to use apply/map or the [] operator I got an error:
Series objects are mutable and cannot be hashed
I can also work with a solution on DataFrames.
I think you would first need to filter your occurances Series down to the words you want, and then pass its index to .isin():
output = words[words.isin(occurances[occurances >= 100].index)]
Try this:
words[words.apply(lambda x: x in occurances.index and occurances[x] >= 100)]
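The isin method works, but it returns a boolean mask that you then have to use to index words. Note that this example builds occurances with the counts as the index and the words as the values: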
>> # reproduce the example
>> import pandas as pd
>> words = pd.Series(['a','calculated','titration','curve','for','oxalic','acid','be','show','at'])
>> occurances = pd.Series(['a','show','curve'], index= [278, 179, 2])
>> # apply the filter
>> words[words.isin(occurances[occurances.index > 100])]
0 a
8 show
dtype: object
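Another option, assuming occurances is indexed by word with numeric counts as in the question, is to look each word up with map and compare the result. Words missing from occurances become NaN, and NaN >= 100 evaluates to False, so they drop out without a separate membership test.

```python
import pandas as pd

words = pd.Series(
    ["a", "calculated", "titration", "curve", "for",
     "oxalic", "acid", "be", "show", "at"],
    name="word",
)
# Assumed layout: word as index, count as value.
occurances = pd.Series([278, 179, 2], index=["a", "show", "curve"])

# Look up each word's count; unknown words map to NaN.
counts = words.map(occurances)

# Keep words whose count is at least 100.
result = words[counts >= 100]

print(result)
```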
I have a Column with data like 3.4500,00 EUR.
Now I want to compare this with another column having float numbers like 4000.00.
How do I take this string, remove the EUR and replace comma with decimal and then convert into float to compare?
You can use regular expressions to make your conditions general that would work in all cases:
# Make example dataframe for showing answer
df = pd.DataFrame({'Value':['3.4500,00 EUR', '88.782,21 DOLLAR']})
Value
0 3.4500,00 EUR
1 88.782,21 DOLLAR
Use str.replace with a regular expression:
df['Value'].str.replace(r'[A-Za-z.]', '', regex=True).str.replace(',', '.', regex=False).astype(float)
0 34500.00
1 88782.21
Name: Value, dtype: float64
Explanation:
str.replace(r'[A-Za-z.]', '', regex=True) removes all alphabetic characters and dots.
str.replace(',', '.', regex=False) replaces the comma with a dot.
astype(float) converts it from object (string) type to float.
Here is my solution:
mock data:
amount amount2
0 3.4500,00EUR 4000
1 3.600,00EUR 500
Use apply() to clean the strings, then convert the data type to float:
data['amount'] = data['amount'].apply(lambda x: x.replace('EUR', '')).apply(lambda x: x.replace('.', '')).apply(lambda x: x.replace(',', '.')).astype('float')
result:
amount amount2
0 34500.0 4000
1 3600.0 500
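To finish with the comparison the question asks for, the cleaned values can be compared directly with the float column. The sketch below assumes every value uses '.' as the thousands separator and ',' as the decimal separator, is non-negative, and is followed by a currency code. The helper name parse_european and the new column names are made up for illustration.

```python
import pandas as pd

def parse_european(values: pd.Series) -> pd.Series:
    """Hypothetical helper: '3.600,00 EUR' -> 3600.0 (non-negative values only)."""
    cleaned = (
        values.str.replace(r"[^\d,]", "", regex=True)  # keep digits and the decimal comma
              .str.replace(",", ".", regex=False)      # decimal comma -> decimal point
    )
    return cleaned.astype(float)

data = pd.DataFrame({
    "amount": ["3.4500,00 EUR", "3.600,00 EUR"],
    "amount2": [4000.00, 500.00],
})

data["amount_num"] = parse_european(data["amount"])
data["amount_gt_amount2"] = data["amount_num"] > data["amount2"]

print(data)
```

Writing the parsed values to a new column leaves the original strings available for checking.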