How to retain 2 decimals without rounding in python/pandas? - python

How can I retain only 2 decimals for each value in a pandas Series? (I'm working with latitudes and longitudes.) The dtype is float64.
series = [-74.002568, -74.003085, -74.003546]
I tried using the round function but, as the name suggests, it rounds. I looked into trunc(), but this can only remove all decimals. Then I figured, why not try running a for loop? I tried the following:
for i in series:
    i = "{0:.2f}".format(i)
I was able to run the code without any errors but it didn't modify the data in any way.
Expected output would be the following:
[-74.00, -74.00, -74.00]
Does anyone know how to achieve this? Thanks!

series = [-74.002568, -74.003085, -74.003546]
["%0.2f" % (x,) for x in series]
['-74.00', '-74.00', '-74.00']
This converts your data to strings (object dtype), so it is for display purposes only. Note that this formatting rounds to two decimals rather than truncating. If you want to use the values in calculations, you can cast them back to float, but then only one decimal digit will be visible:
[float('{0:.2f}'.format(x)) for x in series]
[-74.0, -74.0, -74.0]
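Since the question asks for truncation rather than rounding, here is a minimal sketch of one way to do that on a pandas Series. It assumes NumPy is available, and the fourth value is a hypothetical one added only to make the difference from rounding visible. Because floats are stored in binary, a value that prints as x.yz can sit slightly below it, so the truncated result can occasionally land one hundredth lower than expected.

```python
import numpy as np
import pandas as pd

# First three values are from the question; the last one is a made-up value
# chosen so that rounding and truncating give different results
s = pd.Series([-74.002568, -74.003085, -74.003546, -74.009999])

# Shift two decimal places left, drop the fraction (toward zero), shift back
truncated = np.trunc(s * 100) / 100

# For comparison: round() moves the last value to -74.01
rounded = s.round(2)

print(truncated.tolist())
print(rounded.tolist())
```

The result stays float64, so it can still be used in calculations; the number of decimals shown when printing is a separate display question.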

Here is one way to do it:
import pandas as pd
# you indicated it's a series but defined only a list;
# assuming you meant a pandas.Series
series = [-74.002568, -74.003085, -74.003546]
s = pd.Series(series)
# use a regex extract to keep everything up to the first two decimal places
out = s.astype(str).str.extract(r"(.*\..{2})")[0]
out
0 -74.00
1 -74.00
2 -74.00
Name: 0, dtype: object

Change the display options. This shouldn't change your underlying data.
pd.options.display.float_format = "{:,.2f}".format
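To check that only the display changes, a small sketch along these lines may help. It uses pd.option_context so the format applies only inside the with block instead of globally; the variable names are just for illustration.

```python
import pandas as pd

s = pd.Series([-74.002568, -74.003085, -74.003546])

# Apply the format only inside this block instead of changing the global option
with pd.option_context("display.float_format", "{:,.2f}".format):
    shown = repr(s)

print(shown)       # values are displayed with two decimals
print(s.iloc[0])   # the stored value still has all its digits
```

Keep in mind that this display format rounds, so it does not give truncated values either.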

Related

Remove "?" from pandas column

I have a pandas dataset whose columns have dtype object. The columns, however, contain numerical float values along with '?', and I'm trying to convert them to float. I want to replace these '?' across the entire column with NaN (not 0) and then convert the column to float64.
The output of value_counts() on the Voltage column looks like this:
? 3771
240.67 363
240.48 356
240.74 356
240.62 356
...
227.61 1
227.01 1
226.36 1
227.28 1
227.02 1
Name: Voltage, Length: 2276, dtype: int64
What is the best way to do that when the entire dataset has "?" mixed in with numbers and I want to convert all columns at once?
I tried something like this, but it's not working. I want to do this operation for all the columns. Thanks
df['Voltage'] = df['Voltage'].apply(lambda x: float(x.split()[0].replace('?', '')))
One more question: how can I count the "?" values in all the columns? I tried something like this. Thanks
list = []
for i in df.columns:
    if '?' in df[i]:
        continue
    series = df[i].value_counts()['?']
    list.append(series)
So, from your value_counts output, it is clear that you have some values that are floats stored as strings, and some values that contain ? (apparently, values that ARE ?).
So, the one thing NOT to do is use apply or applymap.
Those are just one step below for loops and iterrows in the hierarchy of what not to do.
The only cases where you should use apply are those where you would otherwise have to iterate over rows with for. And those cases almost never happen (in my real life, I've used apply only once. And that was when I was a beginner, and I am pretty sure that if I were to review that code now, I would find another way).
In your case
df.Voltage = df.Voltage.where(~df.Voltage.str.contains(r'\?')).astype(float)
should do what you want.
df.Voltage.str.contains(r'\?') is a True/False series saying whether each row contains a '?'. So ~df.Voltage.str.contains(r'\?') is the opposite (True if the row does not contain a '?'). So df.Voltage.where(~df.Voltage.str.contains(r'\?')) is a series where values for which the condition is True are left as is, and the others are replaced by the 2nd argument or, if there is no 2nd argument (which is our case), by NaN. So exactly what you want. Adding .astype(float) converts everything to float, since that should now be possible (every row contains either a string representing a float, such as 230.18, or a NaN, so all are convertible to float).
An alternative, closer to what you were trying, that is, replacing the ? in place first, would be
df.loc[df.Voltage=='?', 'Voltage']=None
# And then, df.Voltage.astype(float) converts to float, with NaN where you put None
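For the "all columns at once" part and the counting question, a sketch like the following might be a starting point. The frame and the Current column are hypothetical stand-ins for the real data, and it swaps in pd.to_numeric with errors="coerce" instead of the where/contains approach above. The apply here runs once per column, not once per row or element.

```python
import pandas as pd

# Hypothetical frame: numbers stored as strings, with '?' marking missing values
df = pd.DataFrame({
    "Voltage": ["240.67", "?", "240.48"],
    "Current": ["1.5", "2.0", "?"],
})

# Count the '?' entries in every column before converting
question_marks = (df == "?").sum()

# Convert each column; anything that is not a number (here '?') becomes NaN
converted = df.apply(pd.to_numeric, errors="coerce")

print(question_marks)
print(converted.dtypes)
```

Note that errors="coerce" turns every unparseable value into NaN, not only '?', so it is worth checking the counts first if other junk values might be present.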

Multiply pd DataFrame column with 7-digit scalar

I am trying to modify a pandas dataframe column this way:
Temporary=DF.loc[start:end].copy()
SLICE=Temporary.unstack("time").copy()
SLICE["Var"]["Jan"] = 2678400*SLICE["Var"]["Jan"]
However, this does not work. The resulting column SLICE["Var"]["Jan"] is still the same as before the multiplication.
If I multiply with 2 orders of magnitude less, the multiplication works. Also a subsequent multiplication with 100 to receive the same value that was intended in the first place, works.
SLICE["Var"]["Jan"] = 26784*SLICE["Var"]["Jan"]
SLICE["Var"]["Jan"] = 100*SLICE["Var"]["Jan"]
It seems like the scalar is too large for the multiplication. Is this a Python thing or a pandas thing? How can I make sure that the multiplication with the 7-digit number works directly?
I am using Python 3.8, the precision of numbers in the dataframe is float32, they are in a range between 5.0xE-5 and -5.0xE-5 with some numbers having a smaller absolute value than 1xE-11.
EDIT: It might have to do with the 2-level column indexing. When I delete the first level, the calculation works:
Temporary=DF.loc[start:end].copy()
SLICE=Temporary.unstack("time").copy()
SLICE=SLICE.droplevel(0, axis=1)
SLICE["Jan"] = 2678400*SLICE["Jan"]
Your first method uses chained indexing (SLICE["Var"]["Jan"] = ...), which can raise a SettingWithCopyWarning: the assignment may be made on a temporary copy rather than on the actual dataframe. You can use .loc instead:
SLICE.loc[:,('Var', 'Jan')] = SLICE.loc[:,('Var', 'Jan')]*2678400
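A self-contained sketch of the .loc approach, using a small made-up frame with two-level columns in place of the real SLICE (the values and the Feb column are assumptions for illustration):

```python
import pandas as pd

# Hypothetical stand-in for SLICE: two-level columns ("Var", month)
columns = pd.MultiIndex.from_product([["Var"], ["Jan", "Feb"]])
SLICE = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]], columns=columns)

# A single .loc call with a tuple key addresses the column directly,
# so the assignment is written into SLICE itself
SLICE.loc[:, ("Var", "Jan")] = SLICE.loc[:, ("Var", "Jan")] * 2678400

print(SLICE)
```

The size of the scalar is not the issue here; what matters is that the column is selected and assigned in one indexing step.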

Not getting stats analysis of binary column pandas

I have a dataframe, 11 columns 18k rows. The last column is either a 1 or 0, but when I use .describe() all I get is
count 19020
unique 2
top 1
freq 12332
Name: Class, dtype: int64
as opposed to an actual statistical analysis with mean, std, etc.
Is there a way to do this?
If your numeric (0, 1) column is not being picked up automatically by .describe(), it might be because it's not actually encoded as an int dtype. You can see this in the documentation of the .describe() method, which tells you that the default include parameter is only for numeric types:
None (default) : The result will include all numeric columns.
My suggestion would be the following:
df.dtypes # check datatypes
df['num'] = df['num'].astype(int) # if it's not integer, cast it as such
df.describe(include=['object', 'int64']) # explicitly state the data types you'd like to describe
That is, first check the datatypes (I'm assuming the column is called num and the dataframe df, but feel free to substitute with the right ones). If this indicator/(0,1) column is indeed not encoded as int/integer type, then cast it as such by using .astype(int). Then, you can freely use df.describe() and perhaps even specify columns of which data types you want to include in the description output, for more fine-grained control.
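To illustrate the difference, here is a minimal sketch assuming the 0/1 column was read in as strings (the Class name is taken from the output in the question, the values are made up). It uses pd.to_numeric for the conversion, which is one alternative to astype(int).

```python
import pandas as pd

# Hypothetical 0/1 column stored as strings, so it is not a numeric dtype
df = pd.DataFrame({"Class": ["1", "0", "1", "1"]})
before = df["Class"].describe()   # count, unique, top, freq

# After converting to a numeric dtype, describe() reports numeric statistics
df["Class"] = pd.to_numeric(df["Class"])
after = df["Class"].describe()    # count, mean, std, min, quartiles, max

print(before)
print(after)
```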
You could use
# percentile list
perc = [.20, .40, .60, .80]
# list of dtypes to include
include = ['object', 'float', 'int']
data.describe(percentiles=perc, include=include)
where data is your dataframe (important point).
Since you are new to Stack Overflow, I suggest that you include some actual code (i.e. something showing how, and on what, you are using your methods). You'll get better answers.

convert float64 to int (excel to pandas)

I have imported an Excel file into pandas, but when I display customer numbers they are shown in float64 format, i.e.
7.500505e+09 , 7.503004e+09
How do I convert the column containing these numbers to int?
int(yourVariable) will cast a float64 value to an integer.
Is this what you are looking for?
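For a whole column rather than a single value, a sketch using astype might look like this. The column name and values are hypothetical; it assumes every customer number is a whole number and that none are missing.

```python
import pandas as pd

# Hypothetical column name and values; Excel imports often arrive as float64
df = pd.DataFrame({"customer": [7500505123.0, 7503004456.0]})

# Whole-number floats of this size convert exactly; int64 has room for 10-digit IDs
df["customer"] = df["customer"].astype("int64")

print(df)
print(df["customer"].dtype)
```

If the column contains missing values, astype("int64") raises an error; the nullable "Int64" dtype is one option in that case.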
You can use the pandas DataFrame style.format function to apply formatting to a column, as in https://towardsdatascience.com/style-pandas-dataframe-like-a-master-6b02bf6468b0. If you want to round the numbers to, e.g., 2 decimal places, follow the accepted answer in Python float to Decimal conversion. Conversion to an int reduces accuracy a bit too much, I think.
For the concept of binary floats in general (which is what float64 is), see Python - round a float to 2 digits
Please use pd.set_option to control the precision that is displayed
>>> import pandas as pd
>>> a = pd.DataFrame({'sam':[7.500505e+09]})
>>> pd.set_option('float_format', '{:f}'.format)
>>> a
sam
0 7500505000.000000
>>>

Can't Do Math on Column If Some Rows Are of Type String

Here is a sample of my df:
      units     price
0  143280.0    0.8567
1    4654.0   464.912
2  512210.0       607
3   Unknown         0
4   Unknown         0
I have the following code:
myDf.loc[(myDf["units"].str.isnumeric()) & (myDf["price"].str.isnumeric()), 'newValue'] = (
    myDf["price"].astype(float).fillna(0.0) *
    myDf["units"].astype(float).fillna(0.0) /
    1000)
As you can see, I'm trying to only do math to create the 'newValue' column for rows where the two source columns are both numeric. However, I get the following error:
ValueError: could not convert string to float: 'Unknown'
So it seems that even though I'm attempting to perform math only on the rows that don't have text, Pandas does not like that any of the rows have text.
Note that I need to maintain the instances of "Unknown" exactly as they are and so filling those with zero is not a good option.
This has me pretty stumped. I could not find any solutions by searching Google.
Would appreciate any help/solutions.
You can use the same condition you use on the left side of the = on the right side as follows (I set the condition in a variable is_num for readability):
is_num = (myDf["units"].astype(str).str.replace('.', '', regex=False).str.isnumeric()) & (myDf["price"].astype(str).str.replace('.', '', regex=False).str.isnumeric())
myDf.loc[is_num, 'newValue'] = (
    myDf.loc[is_num, "price"].astype(float).fillna(0.0) *
    myDf.loc[is_num, "units"].astype(float).fillna(0.0) / 1000)
Also, you need to check against your real dataframe, but judging from this example, you can:
Remove the fillna(0.0), since there are no NaNs
Remove the checks on 'price' (in your example, price is always numeric, so the check is not necessary)
Remove the astype(float) cast for price, since it's already numeric.
That would lead to the following somewhat more concise code:
is_num = myDf["units"].astype(str).str.replace('.', '', regex=False).str.isnumeric()
myDf.loc[is_num, 'newValue'] = (
    myDf.loc[is_num, "price"] *
    myDf.loc[is_num, "units"].astype(float) / 1000)
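As a possible alternative to the isnumeric check, here is a sketch using pd.to_numeric with errors="coerce". It assumes, as in the sample, that units is stored as strings and price is already numeric; the frame below is rebuilt from the sample in the question.

```python
import pandas as pd

# Frame rebuilt from the sample in the question (units as strings, price numeric)
myDf = pd.DataFrame({
    "units": ["143280.0", "4654.0", "512210.0", "Unknown", "Unknown"],
    "price": [0.8567, 464.912, 607.0, 0.0, 0.0],
})

# Numeric view of units; 'Unknown' becomes NaN here, the original column is untouched
units = pd.to_numeric(myDf["units"], errors="coerce")

# NaN propagates, so rows with 'Unknown' units get NaN in newValue
myDf["newValue"] = myDf["price"] * units / 1000

print(myDf)
```

The "Unknown" entries in units stay exactly as they are, and this also handles negative numbers, which the isnumeric check would reject.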
