I need to convert a currency column in my DataFrame to float values so I can compute some stats.
Here's how the column looks:
10.785,177
10.783,554
10.781,931
10.782,094
10.780,843
656,530
The result should be:
10785.177
10783.554
10781.931
10782.094
10780.843
656.530
I was trying to do something with regex but I don't know a lot about it. Any help is appreciated!
You can use apply() on the column like this:
df['col'].apply(lambda x: x.replace(".", "").replace(",",".")).astype(float)
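If some rows might not follow the format, here is a minimal sketch that keeps the same two replacements but converts with pd.to_numeric(errors='coerce'), so unparseable values become NaN instead of raising. The 'n/a' row and the 'value' column name are hypothetical, added only to show the NaN case.

```python
import pandas as pd

# Values from the question plus one hypothetical malformed row ('n/a')
df = pd.DataFrame({'col': ['10.785,177', '10.783,554', '656,530', 'n/a']})

# Same replacements as the apply() answer: drop the '.' thousands separators
# first, then turn the ',' decimal separator into '.'
cleaned = df['col'].apply(lambda x: x.replace('.', '').replace(',', '.'))

# errors='coerce' turns anything that is still not a number into NaN
df['value'] = pd.to_numeric(cleaned, errors='coerce')

print(df)
print(df[df['value'].isna()])  # rows that need a closer look
```

The order of the two replacements matters: replacing ',' first would create a second dot that the next step then deletes.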
You need to remove the thousands separators (.), replace the decimal separator (,) with a dot, and then you can use pd.to_numeric:
>>> df['col'].str.replace('.', '', regex=False).str.replace(',', '.', regex=False)\
... .transform(pd.to_numeric)
0 10785.177
1 10783.554
2 10781.931
3 10782.094
4 10780.843
5 656.530
Name: col, dtype: float64
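If the column comes from a CSV file, pandas can also parse this format at load time through the thousands and decimal parameters of read_csv. This is a sketch assuming a hypothetical file whose field separator is ';' (it cannot be ',' here, because ',' is the decimal separator); the inline StringIO stands in for the real file.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file
raw = io.StringIO(
    "col\n"
    "10.785,177\n"
    "10.783,554\n"
    "656,530\n"
)

# thousands='.' and decimal=',' tell the parser how to read European-style numbers
df = pd.read_csv(raw, sep=';', thousands='.', decimal=',')
print(df['col'].dtype)  # float64
print(df)
```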
Related
This should be an easy task, but I am stuck. I have a DataFrame with a string column:
Category
AB00
CD01
EF02
GH03
RF04
Now I want to treat the digits in these values as numbers, filter on them, and create a subset DataFrame, without changing the original DataFrame in any way. I tried:
df_subset=df[df['Category'].str[2:4]<=3]
Of course this does not work: the sliced values are still strings, so they cannot be compared to the number 3.
I tried
df_subset=df[int(df['Category'].str[2:4])<=3]
but I am not sure about it; I think it is wrong, or at least not the way it should be done.
Add type conversion to your expression:
df[df['Category'].str[2:].astype(int) <= 3]
Category
0 AB00
1 CD01
2 EF02
3 GH03
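astype(int) raises if any suffix is not made of digits. Here is a sketch that uses pd.to_numeric(errors='coerce') instead, so such rows are simply left out of the subset, and that also shows the original DataFrame is not modified. The 'XYZZ' row is hypothetical.

```python
import pandas as pd

# Data from the question plus one hypothetical malformed value
df = pd.DataFrame({'Category': ['AB00', 'CD01', 'EF02', 'GH03', 'RF04', 'XYZZ']})

# Convert the two-character suffix to numbers; non-numeric suffixes become NaN
suffix = pd.to_numeric(df['Category'].str[2:4], errors='coerce')

# NaN <= 3 evaluates to False, so malformed rows are excluded
df_subset = df.loc[suffix <= 3]

print(df_subset)
print(df.columns.tolist())  # ['Category']: no helper column was added to df
```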
As you have leading zeros, you can directly use string comparison:
df_subset = df.loc[df['Category'].str[2:4] <= '03']
Output:
Category
0 AB00
1 CD01
2 EF02
3 GH03
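String comparison only agrees with numeric comparison when every suffix has the same width. This sketch uses hypothetical values without zero-padding to show where the two approaches diverge.

```python
import pandas as pd

# Hypothetical data where the suffixes have different widths
df = pd.DataFrame({'Category': ['AB3', 'CD10', 'EF02']})

suffix = df['Category'].str[2:]

# Lexicographic comparison: '10' <= '3' is True because '1' < '3'
by_string = df.loc[suffix <= '3']

# Numeric comparison gives the intended result
by_number = df.loc[suffix.astype(int) <= 3]

print(by_string['Category'].tolist())  # ['AB3', 'CD10', 'EF02']
print(by_number['Category'].tolist())  # ['AB3', 'EF02']
```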
Example:
The df['column'] has a bunch of values similar to F/4500/O or G/2/P.
The number of digits ranges from 1 to 4, as in the examples above.
How can I transform that column to keep only the number (e.g. 4500) as an integer?
I tried the split method but I can't get it right.
Thank you!
You could extract the value and convert to_numeric:
df['number'] = pd.to_numeric(df['column'].str.extract(r'/(\d+)/', expand=False))
Example:
column number
0 F/4500/O 4500
1 G/2/P 2
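Rows that do not match the pattern come back as NaN, which forces a float64 column. A sketch, assuming pandas' nullable 'Int64' dtype is acceptable, that keeps the numbers as integers anyway; the 'unknown' row is hypothetical.

```python
import pandas as pd

# Data from the question plus a hypothetical row without a number
df = pd.DataFrame({'column': ['F/4500/O', 'G/2/P', 'unknown']})

# Non-matching rows give NaN, so to_numeric returns float64 here
extracted = pd.to_numeric(df['column'].str.extract(r'/(\d+)/', expand=False))

# Cast to the nullable integer dtype: whole numbers stay integers, NaN becomes <NA>
df['number'] = extracted.astype('Int64')
print(df)
```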
How about:
df['column'].map(lambda x: int(x.split('/')[1]))
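The same split can be done with the vectorized .str accessor instead of a Python lambda. Like the map version, this sketch assumes every value contains at least two '/'-separated fields.

```python
import pandas as pd

df = pd.DataFrame({'column': ['F/4500/O', 'G/2/P']})

# str.split returns a list per row; .str[1] picks the middle field of each list
df['number'] = df['column'].str.split('/').str[1].astype(int)
print(df)
```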
I have a DataFrame column of strings that are either a date (e.g. "12-10-2020") or a string starting with 4 digits (e.g. "4030 - random name"). I would like to write an if statement that captures the strings starting with 4 digits, similar to this code:
string[0].isdigit()
but instead of isdigit, it should be something like:
is string which starts with 4 digits
I hope my question is clear; let me know if it isn't. I am working in pandas, by the way.
Use str.contains:
df[df["col"].str.contains(r'^[0-9]{4}')]
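If the column can contain missing values, str.contains returns NaN for them, and a mask containing NaN cannot be used for boolean indexing. A sketch with a hypothetical NaN row, using na=False so missing values count as "no match":

```python
import numpy as np
import pandas as pd

# Hypothetical column with a missing value
df = pd.DataFrame({'col': ['4030 - random name', '12-10-2020', np.nan]})

# na=False keeps the mask purely boolean
mask = df['col'].str.contains(r'^\d{4}', na=False)
print(df[mask])
```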
You can use str.match, which is anchored to the start of the string by default:
Example:
df = pd.DataFrame({'col': ['4030 - random name', 'other', '07-02-2022']})
df[df['col'].str.match(r'\d{4}')]
output:
col
0 4030 - random name
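For the if statement on a single string from the question, two plain-Python checks work. Both helper names are hypothetical: the first extends the string[0].isdigit() idea to four characters (the length check stops short strings like '12' from passing), and the second uses re.match, which is anchored at the start like str.match.

```python
import re


def starts_with_4_digits(s):
    # Hypothetical helper: first four characters must all be digits
    return len(s) >= 4 and s[:4].isdigit()


def starts_with_4_digits_re(s):
    # Hypothetical helper: re.match only matches at the beginning of s
    return re.match(r'\d{4}', s) is not None


for value in ['4030 - random name', '12-10-2020']:
    if starts_with_4_digits(value):
        print(value, '-> starts with 4 digits')
```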
I have a DataFrame with 2 columns. I want to convert the COUNT column to int, but it keeps giving me a ValueError: Unable to parse string "0.58%" at position 0
METRIC COUNT
Scans 125487
No Reads 2541
Diverts 54710
No Code% 0.58%
No Read% 1.25%
df['COUNT'] = df['COUNT'].apply(pd.to_numeric)
How can I remove the % before the conversion?
You can use str.strip:
pd.to_numeric(df.col1.str.strip('%'))
0 1
1 2
2 3
Name: col1, dtype: int64
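Applied to the table from the question, the result cannot be int because 0.58 is not a whole number, so the column ends up as float64. This sketch assumes COUNT was read as strings and also records which rows were percentages before the '%' is stripped; the IS_PCT column name is hypothetical.

```python
import pandas as pd

# The METRIC/COUNT table from the question, with COUNT stored as strings
df = pd.DataFrame({
    'METRIC': ['Scans', 'No Reads', 'Diverts', 'No Code%', 'No Read%'],
    'COUNT': ['125487', '2541', '54710', '0.58%', '1.25%'],
})

# Remember which rows were percentages before stripping the sign
df['IS_PCT'] = df['COUNT'].str.endswith('%')

# Strip a trailing '%' and convert; 0.58 forces a float64 column
df['COUNT'] = pd.to_numeric(df['COUNT'].str.rstrip('%'))
print(df)
```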
Try this. I'm assuming that 0.58% is read in as a string, so str.replace can replace '%' with nothing, after which the value can be converted to a number:
import pandas as pd
df = pd.DataFrame({'col1':['1','2','3%']})
df.col1.str.replace('%','').astype(float)
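This relies on every value being a string. In an object column that mixes real numbers with strings (a hypothetical situation here), .str methods return NaN for the non-string elements. A sketch showing the problem and a fix that casts to str first:

```python
import pandas as pd

# Hypothetical object column mixing Python ints and strings
s = pd.Series([1, 2, '3%'])

# .str methods give NaN for the two int elements, so those values are lost
stripped = s.str.replace('%', '', regex=False)
print(stripped)

# Casting everything to str first keeps every row
fixed = s.astype(str).str.replace('%', '', regex=False).astype(float)
print(fixed.tolist())  # [1.0, 2.0, 3.0]
```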
I have a column with data like 3.4500,00 EUR.
Now I want to compare it with another column containing floats like 4000.00.
How do I take this string, remove the EUR and replace comma with decimal and then convert into float to compare?
You can use regular expressions to make the cleanup general, so it works for any currency suffix:
# Make example dataframe for showing answer
df = pd.DataFrame({'Value':['3.4500,00 EUR', '88.782,21 DOLLAR']})
Value
0 3.4500,00 EUR
1 88.782,21 DOLLAR
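Use str.replace with a regular expression (since pandas 2.0, str.replace treats the pattern as a literal string unless you pass regex=True):
df['Value'].str.replace(r'[A-Za-z.\s]', '', regex=True).str.replace(',', '.', regex=False).astype(float)
0 34500.00
1 88782.21
Name: Value, dtype: float64
Explanation:
str.replace(r'[A-Za-z.\s]', '', regex=True) removes all alphabetic characters, dots (the thousands separators) and whitespace.
str.replace(',', '.', regex=False) replaces the decimal comma with a dot.
astype(float) converts the column from object (string) to float.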
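Stripping every letter also throws away the currency, which matters if EUR and DOLLAR amounts are later compared against the same column. A sketch, assuming each value is a number followed by an alphabetic currency code, that splits them with str.extract; the 'amount' and 'currency' column names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Value': ['3.4500,00 EUR', '88.782,21 DOLLAR']})

# Capture the numeric part and the trailing currency code separately
parts = df['Value'].str.extract(r'^\s*([\d.,]+)\s*([A-Za-z]+)\s*$')
parts.columns = ['amount', 'currency']

# European format: drop '.' thousands separators, then turn ',' into '.'
parts['amount'] = (
    parts['amount']
    .str.replace('.', '', regex=False)
    .str.replace(',', '.', regex=False)
    .astype(float)
)
print(parts)
```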
Here is my solution:
mock data:
amount amount2
0 3.4500,00EUR 4000
1 3.600,00EUR 500
Use apply() to strip the currency and the separators, then convert the data type to float:
data['amount'] = data['amount'].apply(lambda x: x.replace('EUR', '')).apply(lambda x: x.replace('.', '')).apply(lambda x: x.replace(',', '.')).astype('float')
result:
amount amount2
0 34500.0 4000
1 3600.0 500
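The three apply() calls can also be written with the vectorized .str accessor, followed by the comparison the question asked for. This sketch reuses the mock data above; the amount_gt_amount2 column name is hypothetical.

```python
import pandas as pd

# Mock data from the answer above
data = pd.DataFrame({'amount': ['3.4500,00EUR', '3.600,00EUR'], 'amount2': [4000, 500]})

# Same three replacements as the apply() chain, as literal (non-regex) replacements
data['amount'] = (
    data['amount']
    .str.replace('EUR', '', regex=False)
    .str.replace('.', '', regex=False)
    .str.replace(',', '.', regex=False)
    .astype(float)
)

# Row-wise comparison with the other numeric column
data['amount_gt_amount2'] = data['amount'] > data['amount2']
print(data)
```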