I have a problem with astype: when I use it, I lose decimals that are really important, because the values are longitude and latitude coordinates.
df[["Latitud","Longitud"]] = df[["Latitud","Longitud"]].astype(float)
Here is what I need:
df[["Latitud", "Longitud"]]
Latitud Longitud
0 -34.807023 -56.0336021
1 -34.8879924 -56.1846677
2 -34.8895332 -56.1560728
3 -34.8860972 -56.1635684
4 -34.7242753 -56.2012194
393 -34.8575722 -56.0534571
394 -34.7448815 -56.2132383
395 -34.8539222 -56.2320066
396 -34.8513169 -56.1721213
397 -34.8220428 -55.9906951
And here is what astype gives me:
df[["Latitud", "Longitud"]]
Latitud Longitud
0 -35 -56
1 -35 -56
2 -35 -56
3 -35 -56
4 -35 -56
393 -35 -56
394 -35 -56
395 -35 -56
396 -35 -56
397 -35 -56
I tried the following, with no luck:
df[["Latitud","Longitud"]] = pd.to_numeric(df[["Latitud","Longitud"]],errors='coerce')
pd.options.display.float_format = '{:.08f}'.format
How can I keep my decimals?
Well, the thing that solved the problem was using:
pd.options.display.float_format = '{:,.8f}'.format
It turned out that astype was not the problem: the decimals were still there, and only the display was rounding them. I hope this helps someone else with decimals!
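To see that the conversion itself keeps the decimals, here is a minimal sketch with two made-up rows taken from the table above. The string-typed starting frame is an assumption about how the data was loaded, and `pd.option_context` is used instead of the global option only so the setting stays local to the example.

```python
import pandas as pd

# Assumption: the coordinates arrive as strings, e.g. from a text file.
df = pd.DataFrame({
    "Latitud": ["-34.807023", "-34.8879924"],
    "Longitud": ["-56.0336021", "-56.1846677"],
})

# The conversion from the question: the full values survive.
df[["Latitud", "Longitud"]] = df[["Latitud", "Longitud"]].astype(float)

# Only the rendering changes with the display option.
with pd.option_context("display.float_format", "{:.8f}".format):
    shown = df.to_string()

print(shown)
```

If the printed frame shows rounded numbers but `df["Latitud"].iloc[0]` still holds the full value, the issue is the display format rather than the data.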
I have a table:
-60 -40 -20 0 20 40 60
100 520.0 440.0 380.0 320.0 280.0 240.0 210.0
110 600.0 500.0 430.0 370.0 320.0 280.0 250.0
I add the column to the dataframe like so:
wind_comp = -35
if int(wind_comp) not in df.columns:
new_col = df.columns.to_list()
new_col.append(int(wind_comp))
new_col.sort()
df = df.reindex(columns=new_col)
Which returns this:
-60 -40 -35 -20 0 20 40 60
100 520 440 NaN 380 320 280 240 210
110 600 500 NaN 430 370 320 280 250
I interpolate using pandas interpolate() method like this:
df.interpolate(axis=1).interpolate('linear')
If I add a new column of, say, -35, it just finds the midpoint of the -40 and -20 columns and doesn't get any more accurate. So it returns this:
-60 -40 -35 -20 0 20 40 60
100 520.0 440.0 410.0 380.0 320.0 280.0 240.0 210.0
110 600.0 500.0 465.0 430.0 370.0 320.0 280.0 250.0
Obviously this row would be correct if I had added a column of -30, but I didn't. I need more accuracy: I want to be able to enter -13, for example, and get back the exact interpolated number.
How can I do this? Am I doing something wrong in my code, or am I missing something? Please help.
EDIT:
It seems that pandas.interpolate() only takes the midpoint of the two numbers the new value is placed between and doesn't take the headers into account.
I can't find anything that really applies to working with a table using scipy but maybe I'm interpreting it wrong. Is it possible to use that or something different?
Here's an example of interp1d with your values. Now, I'm glossing over a huge number of details here, like how to get values from your DataFrame into a list like this. In many cases, it is easier to do manipulation like this with lists before it becomes a DataFrame.
import scipy.interpolate
x = [ -60, -40, -20, 0 , 20, 40, 60]
y1 = [ 520.0, 440.0, 380.0, 320.0, 280.0, 240.0, 210.0]
y2 = [ 600.0, 500.0, 430.0, 370.0, 320.0, 280.0, 250.0]
f1 = scipy.interpolate.interp1d(x,y1)
f2 = scipy.interpolate.interp1d(x,y2)
print(-35, f1(-35))
print(-35, f2(-35))
Output:
-35 425.0
-35 482.5
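If the values are already in a DataFrame, a similar result can be sketched without leaving pandas. This is a minimal sketch that swaps in `np.interp` (NumPy's linear interpolation) for `interp1d`, assuming the column labels are sorted integers; the frame below is rebuilt by hand from the question's table.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[520.0, 440.0, 380.0, 320.0, 280.0, 240.0, 210.0],
     [600.0, 500.0, 430.0, 370.0, 320.0, 280.0, 250.0]],
    index=[100, 110],
    columns=[-60, -40, -20, 0, 20, 40, 60],
)

wind_comp = -35
x = df.columns.to_numpy(dtype=float)  # the headers act as x coordinates

# Interpolate every row at wind_comp, using the headers as positions.
interpolated = df.apply(lambda row: np.interp(wind_comp, x, row.to_numpy()), axis=1)
print(interpolated)
```

Because the header values are passed in as x coordinates, -35 is weighted a quarter of the way from -40 to -20 rather than halfway.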
I have a table:
-60 -40 -20 0 20 40 60
100 520 440 380 320 280 240 210
110 600 500 430 370 320 280 250
120 670 570 490 420 370 330 290
130 740 630 550 480 420 370 330
140 810 690 600 530 470 410 370
The headers along the top are a wind vector and the first col on the left is a distance. The actual data in the 'body' of the table is just a fuel additive.
I am very new to Pandas and Numpy, so please excuse the simplicity of the question. What I would like to know is: how can I look up one number in the table using the headers? I have seen that it's possible using indexes, but I don't want to use that method if I don't have to.
For example:
I have a wind unit of -60 and a distance of 120, so I need to retrieve the number 670. How can I use Numpy or Pandas to do this?
Also, if I have a wind unit of say -50 and a distance of 125, is it then possible to interpolate these in a simple way?
EDIT:
Here is what I've tried so far:
import pandas as pd
df = pd.read_table('fuel_adjustment.txt', delim_whitespace=True, header=0,index_col=0)
print(df.loc[120, -60])
But I get the error:
line 3083, in get_loc raise KeyError(key) from err
KeyError: -60
You can select any cell from existing indices using:
df.loc[120,-60]
The index and column labels need to be integers, however. If they are not, you can fix that using:
df.index = df.index.map(int)
df.columns = df.columns.map(int)
For interpolation, you need to add the empty new rows/columns using reindex, then apply interpolate on each dimension.
(df.reindex(index=sorted(df.index.to_list()+[125]),
columns=sorted(df.columns.to_list()+[-50]))
.interpolate(axis=1, method='index')
.interpolate(method='index')
)
Output:
-60 -50 -40 -20 0 20 40 60
100 520.0 480.0 440.0 380.0 320.0 280.0 240.0 210.0
110 600.0 550.0 500.0 430.0 370.0 320.0 280.0 250.0
120 670.0 620.0 570.0 490.0 420.0 370.0 330.0 290.0
125 705.0 652.5 600.0 520.0 450.0 395.0 350.0 310.0
130 740.0 685.0 630.0 550.0 480.0 420.0 370.0 330.0
140 810.0 750.0 690.0 600.0 530.0 470.0 410.0 370.0
You can simply use df.loc for that purpose:
df.loc[120,-60]
You need to check the data type of the index and the columns. That is probably the reason why df.loc[120,-60] failed.
Try:
df.loc[120, "-60"]
To validate the data type, you may call:
>>> df.index
Int64Index([100, 110, 120, 130, 140], dtype='int64')
>>> df.columns
Index(['-60', '-40', '-20', '0', '20', '40', '60'], dtype='object')
If you want to turn the column headers into int64, convert them to numeric:
df.columns = pd.to_numeric(df.columns)
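A small sketch of the failure and the fix, using a hand-built frame whose column labels are strings (an assumption that mirrors the `dtype='object'` columns shown above):

```python
import pandas as pd

df = pd.DataFrame(
    [[670, 570, 490], [740, 630, 550]],
    index=[120, 130],
    columns=["-60", "-40", "-20"],  # string labels, as parsed from a text header
)

# With string labels, the integer key -60 is not found.
try:
    df.loc[120, -60]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# After converting the labels, the same lookup works.
df.columns = pd.to_numeric(df.columns)
value = df.loc[120, -60]
print(lookup_failed, value)
```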
For interpolation, I think the only way would be to create the nonexistent index and column first; then you can get the value. However, the df will grow rapidly if you query it frequently.
1. Add the nonexistent index and column.
2. Interpolate row-wise and column-wise.
3. Get your value.
new_index = df.index.to_list()
new_index.append(125)
new_index.sort()
new_col = df.columns.to_list()
new_col.append(-50)
new_col.sort()
df = df.reindex(index=new_index, columns=new_col)
df = df.interpolate(axis=1, method='index').interpolate(method='index')
print(df.loc[125, -50])
Another way is to write a function that fetches the neighbouring numbers and returns the interpolated result:
1. Find the upper and lower indexes and columns of your target.
2. Fetch the four numbers.
3. Sequentially interpolate along the index and the columns.
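The steps above can be sketched as a small function. This is a minimal sketch, assuming the index and columns are sorted numeric labels; `lookup` is a hypothetical helper name, not a pandas function, and the table is the one from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[520, 440, 380, 320, 280, 240, 210],
     [600, 500, 430, 370, 320, 280, 250],
     [670, 570, 490, 420, 370, 330, 290],
     [740, 630, 550, 480, 420, 370, 330],
     [810, 690, 600, 530, 470, 410, 370]],
    index=[100, 110, 120, 130, 140],
    columns=[-60, -40, -20, 0, 20, 40, 60],
)

def lookup(table, dist, wind):
    """Bilinear lookup; leaves the table itself unchanged (hypothetical helper)."""
    rows = table.index.to_numpy(dtype=float)
    cols = table.columns.to_numpy(dtype=float)
    if not (rows[0] <= dist <= rows[-1] and cols[0] <= wind <= cols[-1]):
        raise ValueError("target lies outside the table")
    # Position of the upper neighbour, clipped so a lower neighbour always exists.
    i = int(np.clip(np.searchsorted(rows, dist), 1, len(rows) - 1))
    j = int(np.clip(np.searchsorted(cols, wind), 1, len(cols) - 1))
    # The four surrounding table values.
    q = table.to_numpy(dtype=float)[i - 1:i + 1, j - 1:j + 1]
    # Interpolate along the columns first, then along the index.
    t = (wind - cols[j - 1]) / (cols[j] - cols[j - 1])
    low = q[0, 0] + t * (q[0, 1] - q[0, 0])
    high = q[1, 0] + t * (q[1, 1] - q[1, 0])
    s = (dist - rows[i - 1]) / (rows[i] - rows[i - 1])
    return low + s * (high - low)

print(lookup(df, 125, -50))
print(lookup(df, 120, -60))
```

Since nothing is added to the frame, repeated queries do not make it grow.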
I read the following txt file with 'pd.read_csv(filename, sep = ',')'
read pass 1000K.
-128,-50,-48,-47,-41,-45,-41,-41,-39,-37
-127,-49,-46,-46,-40,-44,-40,-40,-38,-36
-126,-48,-44,-45,-39,-43,-39,-39,-37,-35
-125,-47,-42,-44,-38,-42,-38,-38,-36,-34
Then I convert it to CSV using
df.to_csv(filename, index=None)
I get the following:
read pass 1000K.
-37
-36
-35
-34
Only one column is preserved, since the default is sep=','.
Does anyone know how to keep the first row separated by ' ' and the other rows separated by ','?
That way I can get all the data into cells:
read|pass|1000K.
-128|-50|-48|-47|-41|-45|-41|-41|-39|-37
-127|-49|-46|-46|-40|-44|-40|-40|-38|-36
-126|-48|-44|-45|-39|-43|-39|-39|-37|-35
-125|-47|-42|-44|-38|-42|-38|-38|-36|-34
I tried the following, and it is working fine.
hello.txt
read pass 1000K.
-128,-50,-48,-47,-41,-45,-41,-41,-39,-37
-127,-49,-46,-46,-40,-44,-40,-40,-38,-36
-126,-48,-44,-45,-39,-43,-39,-39,-37,-35
-125,-47,-42,-44,-38,-42,-38,-38,-36,-34
In [1]: import pandas as pd
In [2]: df = pd.read_csv('hello.txt')
In [3]: df
Out[3]:
read pass 1000K.
-128 -50 -48 -47 -41 -45 -41 -41 -39 -37
-127 -49 -46 -46 -40 -44 -40 -40 -38 -36
-126 -48 -44 -45 -39 -43 -39 -39 -37 -35
-125 -47 -42 -44 -38 -42 -38 -38 -36 -34
In [4]: df.to_csv("test3.csv")
Now if I check test3.csv, it has all columns preserved.
,,,,,,,,,read pass 1000K.
-128,-50,-48,-47,-41,-45,-41,-41,-39,-37
-127,-49,-46,-46,-40,-44,-40,-40,-38,-36
-126,-48,-44,-45,-39,-43,-39,-39,-37,-35
-125,-47,-42,-44,-38,-42,-38,-38,-36,-34
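If the goal is to have the words of the first line in separate cells as well, one option is to handle that line separately. The following is a minimal sketch, assuming the file always has exactly one space-separated title line followed by comma-separated rows; the `raw` string stands in for the file contents, and `title_cells`, `data` and `csv_text` are names made up for the example.

```python
import io
import pandas as pd

# Stand-in for the contents of the text file.
raw = """read pass 1000K.
-128,-50,-48,-47,-41,-45,-41,-41,-39,-37
-127,-49,-46,-46,-40,-44,-40,-40,-38,-36
-126,-48,-44,-45,-39,-43,-39,-39,-37,-35
-125,-47,-42,-44,-38,-42,-38,-38,-36,-34
"""

# Split off the title line and break it on whitespace.
first_line, rest = raw.split("\n", 1)
title_cells = first_line.split()

# Parse the remaining rows as ordinary comma-separated data without a header.
data = pd.read_csv(io.StringIO(rest), header=None)

# Write the title cells as their own comma-separated row, then the data.
out = io.StringIO()
out.write(",".join(title_cells) + "\n")
data.to_csv(out, header=False, index=False)
csv_text = out.getvalue()
print(csv_text)
```

With a real file, the same idea works by reading the first line with `open()` and passing `skiprows=1` to `pd.read_csv`.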
My dataframe has a column called dir. It has several values, and I want to know how many of the values have a count that passes a certain point. For example:
df['dir'].value_counts().sort_index()
It returns a Series
0 855
20 881
40 2786
70 3777
90 3964
100 4
110 2115
130 3040
140 1
160 1697
180 1734
190 3
200 618
210 3
220 1451
250 895
270 2167
280 1
290 1643
300 1
310 1894
330 1
340 965
350 1
Name: dir, dtype: int64
Here, I want to know the number of values whose count passes 500. In this case, it's all except 100, 140, 190, 210, 280, 300, 330, 350.
How can I do that?
I can get away with df['dir'].value_counts()[df['dir'].value_counts() > 500]
(df['dir'].value_counts() > 500).sum()
This gets the value counts and compares them with 500, which returns a Series of boolean values. The parentheses let you call a method on the result of the whole comparison. .sum() counts the True values as 1 and the False values as 0.
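A small sketch of both variants on made-up data, with a threshold of 1 instead of 500 so the result is easy to check by eye:

```python
import pandas as pd

# Made-up data: 0 appears three times, 20 twice, 40 and 100 once each.
s = pd.Series([0, 0, 0, 20, 20, 40, 100], name="dir")
counts = s.value_counts().sort_index()

threshold = 1
# How many distinct values have a count above the threshold.
n_above = int((counts > threshold).sum())
# Which values those are.
values_above = counts[counts > threshold].index.tolist()

print(n_above, values_above)
```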
There is probably a really simple answer to this and I'm only asking as a last resort as I usually get my answers by searching but I can't figure this out or find an answer. Basically I'm plotting some wind barbs in Python but they are pointing in the wrong direction and I don't know why.
Data is imported from a file and put into lists. I found in another Stack Overflow post how to set the U, V for barbs using np.sin and np.cos, which results in the correct wind speed, but the direction is wrong. I'm basically plotting a very simple tephigram or Skew-T.
# Program to read in radiosonde data from a file named "data.dat"
# Import numpy since we are going to use numpy arrays and the loadtxt
# function.
import numpy as np
import matplotlib.pyplot as plt
# Open the file for reading and store the file handle as "f"
# The filename is 'data.dat'
f=open('data.dat')
# Read the data from the file handle f. np.loadtxt() is useful for reading
# simply-formatted text files.
datain=np.loadtxt(f)
# Close the file.
f.close()
# We can copy the different columns into
# pressure, temperature and dewpoint temperature
# Note that the colon means consider all elements in that dimension.
# and remember indices start from zero
p=datain[:,0]
temp=datain[:,1]
temp_dew=datain[:,2]
wind_dir=datain[:,3]
wind_spd=datain[:,4]
print('Pressure/hPa: ', p)
print('Temperature/C: ', temp)
print('Dewpoint temperature: ', temp_dew)
print('Wind Direction/Deg: ', wind_dir)
print('Wind Speed/kts: ', wind_spd)
# for the barb vectors. This is the bit I think is causing the problem
u=wind_spd*np.sin(wind_dir)
v=wind_spd*np.cos(wind_dir)
#change units
#p=p/10
#temp=temp/10
#temp_dew=temp_dew/10
#plot graphs
fig1=plt.figure()
x1=temp
x2=temp_dew
y1=p
y2=p
x=np.linspace(50,50,len(y1))
#print x
plt.plot(x1,y1,'r',label='Temp')
plt.plot(x2,y2,'g',label='Dew Point Temp')
plt.legend(loc=3,fontsize='x-small')
plt.gca().invert_yaxis()
#fig2=plt.figure()
plt.barbs(x,y1,u,v)
plt.yticks(y1)
plt.grid(axis='y')
plt.show()
The barbs should mostly point in the same direction, as you can see from the directions in degrees in the data.
Any help is appreciated. Thank you.
Here is the data that is used:
996 25.2 24.9 290 12
963.2 24.5 22.6 315 42
930.4 23.8 20.1 325 43
929 23.8 20 325 43
925 23.4 19.6 325 43
900 22 17 325 43
898.6 21.9 17 325 43
867.6 20.1 16.5 320 41
850 19 16.2 320 44
807.9 16.8 14 320 43
779.4 15.2 12.4 320 44
752 13.7 10.9 325 43
725.5 12.2 9.3 320 44
700 10.6 7.8 325 45
649.7 7 4.9 315 44
603.2 3.4 1.9 325 49
563 0 -0.8 325 50
559.6 -0.2 -1 325 50
500 -3.5 -4.9 335 52
499.3 -3.5 -5 330 54
491 -4.1 -5.5 332 52
480.3 -5 -6.4 335 50
427.2 -9.7 -11 330 45
413 -11.1 -12.3 335 43
400 -12.7 -14.4 340 42
363.9 -16.9 -19.2 350 37
300 -26.3 -30.2 325 40
250 -36.7 -41.2 330 35
200 -49.9 0 335 0
150 -66.6 0 0 10
100 -83.5 0 0 30
Liam
# for the barb vectors. This is the bit I think is causing the problem
u=wind_spd*np.sin(wind_dir)
v=wind_spd*np.cos(wind_dir)
np.sin and np.cos expect angles in radians, but your directions are in degrees. Instead try:
u=wind_spd*np.sin((np.pi/180)*wind_dir)
v=wind_spd*np.cos((np.pi/180)*wind_dir)
(http://tornado.sfsu.edu/geosciences/classes/m430/Wind/WindDirection.html)
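The same conversion can be sketched with `np.deg2rad`. One further point to check is the sign: in the meteorological convention, a wind direction is the direction the wind blows from, so the components of the motion are the negatives of the expressions above. Whether that applies to this data file is an assumption; `wind_components` and its `meteorological` flag are hypothetical names for the example.

```python
import numpy as np

def wind_components(speed, direction_deg, meteorological=True):
    """Return (u, v) from speed and direction in degrees (hypothetical helper)."""
    theta = np.deg2rad(direction_deg)  # degrees -> radians
    u = speed * np.sin(theta)
    v = speed * np.cos(theta)
    if meteorological:
        # Direction is where the wind comes from, so flip both signs.
        u, v = -u, -v
    return u, v

# First three soundings from the data above.
wind_dir = np.array([290.0, 315.0, 325.0])
wind_spd = np.array([12.0, 42.0, 43.0])
u, v = wind_components(wind_spd, wind_dir)
print(u, v)
```

For example, a 10 kt wind from 270 degrees (a westerly) comes out as u of about +10 and v of about 0 with the flag on, and u of about -10 with it off.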