I have longitude and latitude in two dataframes that are close together. If I run an exact similarity check such as
test_similar = test1_latlon.loc[~test1_latlon['cr'].isin(test2_latlon['cr'])]
I get a lot of failures because many of the numbers differ at the 5th decimal place. I want to truncate after the 3rd decimal. I've seen people format the values so they show up truncated, but I want to change the actual value. Using round() rounds the data and I get even more errors, so is there a way to just drop everything after 3 decimal places?
You may want to use numpy.trunc:
import numpy as np
import pandas as pd
df = pd.DataFrame([[1.2366, 1.2310], [1, 1]])
df1 = np.trunc(1000 * df) / 1000
print(df1, type(df1))
# 0 1
# 0 1.236 1.231
# 1 1.000 1.000 <class 'pandas.core.frame.DataFrame'>
Note that df1 is still a DataFrame, not a numpy array.
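Applied to the original problem, you could truncate both frames before the isin comparison. A minimal sketch, using a hypothetical 'lat' column in place of your 'cr' column:

```python
import numpy as np
import pandas as pd

# Stand-ins for test1_latlon / test2_latlon with a hypothetical 'lat' column
test1 = pd.DataFrame({'lat': [40.12345, 40.12945]})
test2 = pd.DataFrame({'lat': [40.12341]})

# Truncate both sides to 3 decimals, then compare
t1 = np.trunc(test1 * 1000) / 1000
t2 = np.trunc(test2 * 1000) / 1000

# Rows of t1 whose truncated lat is not present in t2
missing = t1.loc[~t1['lat'].isin(t2['lat'])]
print(missing)  # only the 40.129 row survives
```

Because both sides are truncated to the same integer-over-1000 values, equal coordinates compare exactly even though they started out differing at the 5th decimal.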
As suggested here you can do:
x = 1.123456
float('%.3f' % x)
If you want more decimal places, just change the 3 to any number you need.
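One caveat for the original question: `%.3f` rounds rather than truncates, so the two approaches can give different results. A quick comparison:

```python
x = 1.9999

rounded = float('%.3f' % x)        # rounds up to 2.0
truncated = int(x * 1000) / 1000   # drops digits: 1.999

print(rounded, truncated)
```

If the goal is to make nearby coordinates compare equal, truncation is the safer choice, since rounding can push two close values to different sides of a boundary.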
import math
value1 = 1.1236
value2 = 1.1266
value1 = math.trunc(1000 * value1) / 1000
value2 = math.trunc(1000 * value2) / 1000
print(value1)  # 1.123
print(value2)  # 1.126
A simple data frame. I want the percentage of rows in the column "Tested" with the value 22, over the total number of rows.
i.e.
there are 5 rows with 22 in the column "Tested"
the data frame has 15 rows in total
so the percentage is 5/15 = 0.33
I tried below, but it gives zero.
How can I correct it? Thank you.
import pandas as pd
data = {'Unit_Weight': [335,335,119,119,52,452,19,19,19,165,165,165,724,724,16],
'Tested' : [22,12,14,16,18,20,22,24,26,28,22,22,48,50,22]}
df = pd.DataFrame(data)
num_row = df.shape[0]
suspect_row = df[df["Tested"] == 22].shape[0]
suspect_over_total = suspect_row/num_row
print num_row # 15
print suspect_row # 5
print float(suspect_over_total) # 0.0
suspect_over_total = suspect_row/num_row is an int/int operation; in Python 2, dividing two ints performs integer (floor) division, so instead of 0.3333333 Python gives you an int result, 0 in this case.
As bubble said, you should convert one of the operands to a float:
suspect_over_total = float(suspect_row)/num_row # 0.33333333333
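For reference, the behaviour differs between Python versions: in Python 3, / between ints is always true division and // is integer division. A quick illustration:

```python
suspect_row, num_row = 5, 15

print(suspect_row / num_row)         # Python 3: 0.333...
print(suspect_row // num_row)        # integer division: 0 (what Python 2's / does on ints)
print(float(suspect_row) / num_row)  # works the same in both versions
```

In Python 2 you can also put `from __future__ import division` at the top of the file to get the Python 3 behaviour.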
I have a huge database of meshed flow distribution along a room. The problem is that the meshes are too small, so some of them are useless and make computation hard for me. On my y dimension the per-mesh length is 0.00032, and my y dimension goes from 0 to 0.45. As you can understand, there is a lot of useless data.
I want to make the per-mesh length 0.00128 instead by deleting rows whose value is not divisible by 0.00128. How do I do that?
trainProcessed = trainProcessed[trainProcessed[:,4]%0.00128==0]
I have tried this line of code (trainProcessed is my data as a numpy array), but it goes 0 -> 0.00128 -> 0.00256 -> 0.00512. There are rows with the value 0.00384, which is also divisible by 0.00128. By the way, the array shape is (888300, 8).
Example Data :
X: [0,0,0,0,0.00031999,0.00031999,0.00063999,0.00064,0.00096,0.00096,0.000128,0.000128]
Example Output:
X: [0,0,0,0,0.000128,0.000128]
For this case I'll use Decimal for the modulo:
import pandas as pd
from decimal import Decimal
df = pd.DataFrame({'values': [0.00128, 0.00384, 0.367, 0.128, 0.34]})
print(df)
# convert float to str, then Decimal, and apply the modulo
# keep only rows that are divisible by 0.00128
mask = df.apply(lambda r: Decimal(str(r['values'])) % Decimal('0.00128') == Decimal('0'), axis=1)
# if the data are smaller you could multiply by a power of 10 before the modulo
# mask = df.apply(lambda r: Decimal(str(r['values'] * 1000)) % Decimal('0.00128') == Decimal('0'), axis=1)
df = df[mask].reset_index(drop=True)
# df = df[~mask].reset_index(drop=True) keeps the complement (not divisible)
print(df)
initial output:
values
0 0.00128
1 0.00384
2 0.36700
3 0.12800
4 0.34000
final output
values
0 0.00128
1 0.00384
2 0.12800
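An alternative that avoids Decimal: if, as the question's 0.00032 spacing suggests, the coordinates are multiples of a small step such as 1e-5 up to floating-point noise (an assumption, not stated in the question), you can scale to integers with rounding and take the modulo there. A sketch under that assumption:

```python
import numpy as np

x = np.array([0.0, 0.00031999, 0.00128, 0.00384, 0.367, 0.128])

# Scale so the step 0.00128 becomes the integer 128; np.rint absorbs float noise
scaled = np.rint(x * 1e5).astype(int)
mask = scaled % 128 == 0

print(x[mask])  # keeps 0.0, 0.00128, 0.00384 and 0.128
```

This is also fully vectorized, which matters for an (888300, 8) array where a row-by-row apply with Decimal would be slow.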
I know how to calculate percent change from absolute using a pandas dataframe, using the following:
df_pctChange = df_absolute.pct_change()
But I can't seem to figure out how to calculate the inverse: using the initial row of df_absolute as the starting point, how do I calculate the absolute number from the percent change located in df_pctChange?
As an example, let's say that the initial row for the two columns in df_absolute is 548625 and 525980, and df_pctChange is the following:
NaN NaN
-0.004522 -0.000812
-0.009018 0.001385
-0.009292 -0.002438
How can I produce the content of df_absolute? It should look as follows:
548625 525980
546144 525553
541219 526281
536190 524998
You should be able to use the formula:
(1 + r).cumprod()
to get a cumulative growth factor.
Example:
>>> data
0 1
0 548625 525980
1 546144 525553
2 541219 526281
3 536190 524998
>>> pctchg = data.pct_change()
>>> init = data.iloc[0] # may want to use `data.iloc[0].copy()`
>>> res = (1 + pctchg).cumprod() * init
>>> res.iloc[0] = init
>>> res
0 1
0 548625.0 525980.0
1 546144.0 525553.0
2 541219.0 526281.0
3 536190.0 524998.0
To confirm you worked backwards into the correct absolute figures:
>>> np.allclose(data, res)
True
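Starting only from the question's data (the first absolute row plus df_pctChange), the same idea reconstructs the frame. A self-contained sketch:

```python
import numpy as np
import pandas as pd

# Percent changes and initial absolute row, as given in the question
pctchg = pd.DataFrame([[np.nan, np.nan],
                       [-0.004522, -0.000812],
                       [-0.009018, 0.001385],
                       [-0.009292, -0.002438]])
init = pd.Series([548625, 525980])

# fillna(0) turns the NaN first row into a growth factor of exactly 1,
# so the cumulative product starts from the initial values themselves
res = ((1 + pctchg.fillna(0)).cumprod() * init).round()
print(res)
```

The fillna(0) trick replaces the separate `res.iloc[0] = init` assignment above, since a 0% change in the first row leaves the starting values untouched.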
I have a data frame column.
P08107 3.658940e-11
P62979 4.817399e-05
P16401 7.784275e-05
Q96B49 7.784275e-05
Q15637 2.099078e-04
P31689 1.274387e-03
P62258 1.662718e-03
P07437 3.029516e-03
O00410 3.029516e-03
P23381 3.029516e-03
P27348 5.733834e-03
P29590 9.559550e-03
P25685 9.957186e-03
P09429 1.181282e-02
P62937 1.260040e-02
P11021 1.396807e-02
P31946 1.409311e-02
P19338 1.503901e-02
Q14974 2.213431e-02
P11142 2.402201e-02
I want to keep one decimal and remove the extra digits, so that it looks like
3.7e-11
instead of
3.658940e-11
and so on for all the others.
I know how to slice a string but it doesn't seem to work here.
If you have a pandas dataframe you could set the display options.
import pandas as pd
import numpy as np
pd.options.display.float_format = '{:.2f}'.format
pd.DataFrame(dict(randomvalues=np.random.random_sample((5,))))
Returns:
randomvalues
0 0.02
1 0.66
2 0.24
3 0.87
4 0.63
You could use str.format:
>>> '{:.2g}'.format(3.658940e-11)
'3.7e-11'
String slicing will not work here, because it does not round the values:
>>> s = '3.658940e-11'
>>> s[:3] + 'e' + s.split('e')[1]
'3.6e-11'
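To change the whole column rather than one value, the same formatting can be mapped over the series. A small sketch with a few of the question's values (note the result is a column of strings, not floats):

```python
import pandas as pd

s = pd.Series([3.658940e-11, 4.817399e-05, 2.099078e-04],
              index=['P08107', 'P62979', 'Q15637'])

# '{:.1e}' keeps one digit after the decimal point in scientific notation
formatted = s.map('{:.1e}'.format)
print(formatted)  # P08107 -> 3.7e-11, P62979 -> 4.8e-05, ...
```

If you only need the values to display this way while keeping them numeric, the pd.options.display.float_format approach above is the better fit.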
I have the following data frame (consisting of both negative and positive numbers):
df.head()
Out[39]:
Prices
0 -445.0
1 -2058.0
2 -954.0
3 -520.0
4 -730.0
I am trying to change the 'Prices' column to display as currency when I export it to an Excel spreadsheet. The following command I use works well:
df['Prices'] = df['Prices'].map("${:,.0f}".format)
df.head()
Out[42]:
Prices
0 $-445
1 $-2,058
2 $-954
3 $-520
4 $-730
Now my question here is what would I do if I wanted the output to have the negative signs BEFORE the dollar sign. In the output above, the dollar signs are before the negative signs. I am looking for something like this:
-$445
-$2,058
-$954
-$520
-$730
Please note there are also positive numbers as well.
You can use np.where to test whether the values are negative and, if so, prepend the negative sign in front of the dollar sign, casting the series to string using astype:
In [153]:
df['Prices'] = np.where( df['Prices'] < 0, '-$' + df['Prices'].astype(str).str[1:], '$' + df['Prices'].astype(str))
df['Prices']
Out[153]:
0 -$445.0
1 -$2058.0
2 -$954.0
3 -$520.0
4 -$730.0
Name: Prices, dtype: object
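Another option is a small helper mapped over the column, which formats the absolute value (so you keep the thousands separators and whole-dollar rounding of the original "${:,.0f}" format) and puts the sign in front. A sketch:

```python
import pandas as pd

def as_currency(x):
    # format the absolute value, then place the sign before the dollar sign
    return '-${:,.0f}'.format(-x) if x < 0 else '${:,.0f}'.format(x)

df = pd.DataFrame({'Prices': [-445.0, -2058.0, 954.0]})
out = df['Prices'].map(as_currency)
print(out.tolist())  # ['-$445', '-$2,058', '$954']
```

Unlike the astype(str) approach above, this avoids the trailing ".0" in the output and keeps the comma grouping for large numbers.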
You can use the locale module and the _override_localeconv dict. It's not well documented, but it's a trick I found in another answer that has helped me before.
import pandas as pd
import locale
locale.setlocale( locale.LC_ALL, 'English_United States.1252')
# Made an assumption with that locale. Adjust as appropriate.
locale._override_localeconv = {'n_sign_posn':1}
# Load dataframe into df
df['Prices'] = df['Prices'].map(locale.currency)
This creates a dataframe that looks like this:
Prices
0 -$445.00
1 -$2058.00
2 -$954.00
3 -$520.00
4 -$730.00