Convert string to Float with dot and also comma - python

This is the column of a dataframe that I have (values are str):
Values
7257.5679
6942.0949714286
5780.0125476250005
This is how I want the record to go to the database:
Values
7.257,56
6.942,09
5.780,01
How can I do this? Thanks in advance!

df["Values"] = df["Values"].apply(lambda x: "{:,.2f}".format(float(x)))
Output:
Values
0 7,257.57
1 6,942.09
2 5,780.01
To get values in the format 7.257,57, you can chain two replace calls:
df["Values"] = df["Values"].apply(lambda x: "{:,.2f}".format(float(x)).replace(".", ",").replace(",", ".", 1))
This only works for values below one million, because just the first comma is turned back into a dot. A more robust and concise option is str.translate, which swaps both characters in a single pass:
trans_column = str.maketrans(",.", ".,")
df["Values"] = df["Values"].apply(lambda x: "{:,.2f}".format(float(x)).translate(trans_column))
Output:
Values
0 7.257,57
1 6.942,09
2 5.780,01
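As a self-contained sketch of the translate approach, the snippet below rebuilds the column from the question and writes the result to a separate column. The names `swap` and `Formatted`, and the extra value above one million, are assumptions added for illustration and are not part of the original post.

```python
import pandas as pd

# Sample data from the question, plus one larger value (added here as an
# assumption) to show that several thousands separators are handled.
df = pd.DataFrame(
    {"Values": ["7257.5679", "6942.0949714286", "5780.0125476250005", "1234567.891"]}
)

# Translation table that swaps "," and "." in a single pass.
swap = str.maketrans(",.", ".,")

# Format with US-style separators first, then swap the two characters.
df["Formatted"] = df["Values"].map(lambda s: f"{float(s):,.2f}".translate(swap))

print(df["Formatted"].tolist())
```

Writing to a new column keeps the original strings intact, so the snippet can be re-run without float() failing on already formatted values.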

Related

Filter on a pandas string column as numeric without creating a new column

This should be quite an easy task, but I am stuck. I have a dataframe with a column of type string:
Category
AB00
CD01
EF02
GH03
RF04
Now I want to treat the trailing digits as numeric, filter on them, and create a subset dataframe. However, I do not want to change the original dataframe in any way. I tried:
df_subset=df[df['Category'].str[2:4]<=3]
Of course this does not work, as the left-hand side is a string and cannot be evaluated as numeric and compared to 3.
I tried
df_subset=df[int(df['Category'].str[2:4])<=3]
but I am not sure about this, I think it is wrong or not the way it should be done.
Add type conversion to your expression:
df[df['Category'].str[2:].astype(int) <= 3]
Category
0 AB00
1 CD01
2 EF02
3 GH03
Because the numeric part is zero-padded to a fixed width, you can also compare the strings directly:
df_subset = df.loc[df['Category'].str[2:4] <= '03']
Output:
Category
0 AB00
1 CD01
2 EF02
3 GH03
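Both answers can be checked side by side with a small sketch like the one below. The variable names `numeric_mask` and `string_mask` are assumptions for illustration; the data is the column from the question.

```python
import pandas as pd

df = pd.DataFrame({"Category": ["AB00", "CD01", "EF02", "GH03", "RF04"]})

# Numeric comparison: slice the digits, convert on the fly, compare.
# The conversion only exists inside the mask, so df itself is unchanged.
numeric_mask = df["Category"].str[2:4].astype(int) <= 3

# String comparison: valid here only because the digits are zero-padded
# to the same width, so lexicographic order matches numeric order.
string_mask = df["Category"].str[2:4] <= "03"

df_subset = df[numeric_mask]
print(df_subset)
```

The numeric version is the safer default if the digit part could ever vary in width.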

Python - How to split a Pandas value and only get the value between the slashes

Example:
The df['column'] has a bunch of values similar to: F/4500/O or G/2/P
The number of digits ranges from 1 to 4, as in the examples given above.
How can I transform that column to only keep the number between the slashes (e.g. 4500) as an integer?
I tried the split method but I can't get it right.
Thank you!
You could extract the digits and convert them with pd.to_numeric:
df['number'] = pd.to_numeric(df['column'].str.extract(r'/(\d+)/', expand=False))
Example:
column number
0 F/4500/O 4500
1 G/2/P 2
How about:
df['column'].map(lambda x: int(x.split('/')[1]))
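A runnable sketch comparing the two suggestions on the example values from the question; the column names `number` and `number_split` are assumptions chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({"column": ["F/4500/O", "G/2/P"]})

# Regex approach: capture the run of digits between two slashes.
# Rows without a match would become NaN instead of raising.
extracted = df["column"].str.extract(r"/(\d+)/", expand=False)
df["number"] = pd.to_numeric(extracted)

# Split approach: take the middle piece and convert it.
# This raises if a value does not follow the X/digits/Y pattern.
df["number_split"] = df["column"].map(lambda x: int(x.split("/")[1]))

print(df)
```

The regex version tolerates malformed rows, while the split version fails fast on them; which one fits depends on how clean the column is.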

Count the number of elements in a list where the list contains the empty string

I'm having difficulties counting the number of elements in a list within a DataFrame's column. My problem comes from the fact that, after importing my input csv file, the rows that are supposed to contain an empty list [] are actually parsed as lists containing the empty string [""]. Here's a reproducible example to make things clearer:
import pandas as pd
df = pd.DataFrame({"ID": [1, 2, 3], "NETWORK": [[""], ["OPE", "GSR", "REP"], ["MER"]]})
print(df)
ID NETWORK
0 1 []
1 2 [OPE, GSR, REP]
2 3 [MER]
Even though one might think that the list for the row where ID = 1 is empty, it's not. It actually contains the empty string [""] which took me a long time to figure out.
So whatever standard method I use to calculate the number of elements in each list, I get a wrong value of 1 for the rows that are supposed to be empty:
df["COUNT"] = df["NETWORK"].str.len()
print(df)
ID NETWORK COUNT
0 1 [] 1
1 2 [OPE, GSR, REP] 3
2 3 [MER] 1
I searched and tried a lot of things before posting here but I couldn't find a solution to what seems to be a very simple problem. I should also note that I'm looking for a solution that doesn't require me to modify my original input file nor modify the way I'm importing it.
You just need to write a custom apply function that ignores the ''
df['COUNT'] = df['NETWORK'].apply(lambda x: sum(1 for w in x if w!=''))
Another way:
df['NETWORK'].apply(lambda x: len([y for y in x if y]))
Using apply is probably more straightforward. Alternatively, explode the lists, filter out the empty strings, then group by the index and count:
_s = df['NETWORK'].explode()
_s = _s[_s != '']
df['count'] = _s.groupby(level=0).count()
This yields:
NETWORK count
ID
1 [] NaN
2 [OPE, GSR, REP] 3.0
3 [MER] 1.0
Fill NA with zeroes if needed.
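One way to do that fill, sketched on the reproducible example from the question: reindexing the grouped counts against the original index restores the rows that were filtered away. The intermediate names `exploded`, `non_empty` and `counts` are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"ID": [1, 2, 3], "NETWORK": [[""], ["OPE", "GSR", "REP"], ["MER"]]}
)

# One row per list element; the original row label is repeated.
exploded = df["NETWORK"].explode()

# Drop the empty strings, then count what is left per original row.
non_empty = exploded[exploded != ""]
counts = non_empty.groupby(level=0).count()

# Rows that had only "" vanished from counts; bring them back as 0.
df["COUNT"] = counts.reindex(df.index, fill_value=0).astype(int)

print(df)
```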
df["COUNT"] = df["NETWORK"].apply(lambda x: len(x))
Use a lambda function on each row that returns the length of the list. Note that this still counts [""] as one element, so it does not handle the empty-string case from the question.

str.findall to return the value, if one element is found

I want to find a number in a string and return it as int. There are only two cases: either the number doesn't appear or it appears exactly once. Currently I do this with the following code:
df['rating'] = pd.to_numeric(df['rating'].astype(str).str.findall('(\d+)').apply(lambda x: x[0] if len(x) > 0 else np.nan), errors='coerce')
df['rating'] = pd.Series(df['rating'], dtype=pd.Int32Dtype())
But I'm pretty sure the code is not optimal and I can do this shorter. How do I do this?
Use Series.str.extract to get the first match, or a missing value if there is none, then convert to float and finally to nullable integers:
df = pd.DataFrame({'rating':['asas','ds5dd87','sd223d']})
df['rating'] = df['rating'].astype(str).str.extract(r'(\d+)', expand=False).astype(float).astype('Int64')
The solution with Series.str.findall is similar; use str[0] to get the first value:
df['rating'] = df['rating'].astype(str).str.findall(r'(\d+)').str[0].astype(float).astype('Int64')
print(df)
rating
0 NaN
1 5
2 223
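The extract-based chain can be tried end to end with the sketch below. It splits the chain into two steps for readability; the intermediate name `first_number` is an assumption for illustration.

```python
import pandas as pd

df = pd.DataFrame({"rating": ["asas", "ds5dd87", "sd223d"]})

# First run of digits per row; rows without digits become missing.
first_number = df["rating"].astype(str).str.extract(r"(\d+)", expand=False)

# Go through float so the missing value survives, then switch to the
# nullable integer dtype so the numbers are stored as integers.
df["rating"] = first_number.astype(float).astype("Int64")

print(df)
print(df["rating"].dtype)
```

The detour through float is there because the extracted values are strings mixed with a missing value, which cannot be cast to a plain integer dtype directly.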

Convert categorical column into specific integers

I have a bunch of dataframes with one categorical column defining Sex (M/F). I want to assign integer 1 to Male and 2 to Female. I have the following code that cat codes them to 0 and 1 instead
df4["Sex"] = df4["Sex"].astype('category')
df4.dtypes
df4["Sex_cat"] = df4["Sex"].cat.codes
df4.head()
But I need specifically for M to be 1 and F to be 2. Is there a simple way to assign specific integers to categories?
IIUC:
df4['Sex'] = df4['Sex'].map({'M':1,'F':2})
After that, print(df4) shows the desired result.
If you need to impose a specific ordering, you can use pd.Categorical:
c = pd.Categorical(df["Sex"], categories=['M','F'], ordered=True)
This ensures "M" is given the smallest value, "F" the next, and so on. You can then just access codes and add 1.
df['Sex_cat'] = c.codes + 1
It is better to use pd.Categorical than astype('category') if you want finer control over what categories are assigned what codes.
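A small sketch that runs both suggestions on made-up data, to show they agree. The sample values and the column names `Sex_map` and `Sex_cat` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data; the question does not show the actual frame.
df4 = pd.DataFrame({"Sex": ["M", "F", "F", "M"]})

# Explicit mapping: values missing from the dict would become NaN.
df4["Sex_map"] = df4["Sex"].map({"M": 1, "F": 2})

# Ordered categories: codes follow the order given in `categories`,
# starting at 0, so adding 1 yields M -> 1 and F -> 2.
cat = pd.Categorical(df4["Sex"], categories=["M", "F"], ordered=True)
df4["Sex_cat"] = cat.codes + 1

print(df4)
```

With the Categorical route, a value outside the listed categories gets code -1, so after adding 1 it shows up as 0 rather than as a missing value.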
You can also use a lambda with apply (note that every value other than 'M', including missing values, becomes 2):
df4['Sex'] = df4['Sex'].apply(lambda x: 1 if x == 'M' else 2)
