Pandas error with basemap/proj for map plotting - python

I ran the Python code below, which is the "Plotting Maps: Visualizing Haiti Earthquake Crisis Data" example from the book Python for Data Analysis, pages 242-246.
The code is supposed to create a map plot of Haiti, but I got the error below:
Traceback (most recent call last):
File "Haiti.py", line 74, in <module>
x, y = m(cat_data.LONGITUDE, cat_data.LATITUDE)
File "/usr/local/lib/python2.7/site-packages/mpl_toolkits/basemap/__init__.py", line 1148, in __call__
xout,yout = self.projtran(x,y,inverse=inverse)
File "/usr/local/lib/python2.7/site-packages/mpl_toolkits/basemap/proj.py", line 286, in __call__
outx,outy = self._proj4(x, y, inverse=inverse)
File "/usr/local/lib/python2.7/site-packages/mpl_toolkits/basemap/pyproj.py", line 388, in __call__
_proj.Proj._fwd(self, inx, iny, radians=radians, errcheck=errcheck)
File "_proj.pyx", line 122, in _proj.Proj._fwd (src/_proj.c:1571)
RuntimeError
I checked whether the mpl_toolkits.basemap and proj modules were installed correctly on my machine. Basemap was installed from source as instructed and proj was installed with Homebrew, and they both look fine to me.
If you have basemap and proj installed, does this code run successfully for you? If not, do you think it's a module installation issue, a problem with the code itself, or something else?
Haiti.csv file can be downloaded from https://github.com/pydata/pydata-book/raw/master/ch08/Haiti.csv
import pandas as pd
import numpy as np
from pandas import DataFrame

data = pd.read_csv('Haiti.csv')
data = data[(data.LATITUDE > 18) & (data.LATITUDE < 20) &
            (data.LONGITUDE > -75) & (data.LONGITUDE < -70)
            & data.CATEGORY.notnull()]

def to_cat_list(catstr):
    stripped = (x.strip() for x in catstr.split(','))
    return [x for x in stripped if x]

def get_all_categories(cat_series):
    cat_sets = (set(to_cat_list(x)) for x in cat_series)
    return sorted(set.union(*cat_sets))

def get_english(cat):
    code, names = cat.split('.')
    if '|' in names:
        names = names.split(' | ')[1]
    return code, names.strip()

all_cats = get_all_categories(data.CATEGORY)
english_mapping = dict(get_english(x) for x in all_cats)

def get_code(seq):
    return [x.split('.')[0] for x in seq if x]

all_codes = get_code(all_cats)
code_index = pd.Index(np.unique(all_codes))
dummy_frame = DataFrame(np.zeros((len(data), len(code_index))),
                        index=data.index, columns=code_index)

for row, cat in zip(data.index, data.CATEGORY):
    codes = get_code(to_cat_list(cat))
    dummy_frame.ix[row, codes] = 1

data = data.join(dummy_frame.add_prefix('category_'))

from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt

def basic_haiti_map(ax=None, lllat=17.25, urlat=20.25, lllon=-75, urlon=-71):
    # create polar stereographic Basemap instance.
    m = Basemap(ax=ax, projection='stere',
                lon_0=(urlon + lllon) / 2,
                lat_0=(urlat + lllat) / 2,
                llcrnrlat=lllat, urcrnrlat=urlat,
                llcrnrlon=lllon, urcrnrlon=urlon,
                resolution='f')
    # draw coastlines, state and country boundaries, edge of map.
    m.drawcoastlines()
    m.drawstates()
    m.drawcountries()
    return m

fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(12, 10))
fig.subplots_adjust(hspace=0.05, wspace=0.05)
to_plot = ['2a', '1', '3c', '7a']
lllat = 17.25; urlat = 20.25; lllon = -75; urlon = -71
for code, ax in zip(to_plot, axes.flat):
    m = basic_haiti_map(ax, lllat=lllat, urlat=urlat,
                        lllon=lllon, urlon=urlon)
    cat_data = data[data['category_%s' % code] == 1]
    # compute map proj coordinates.
    print cat_data.LONGITUDE, cat_data.LATITUDE
    x, y = m(cat_data.LONGITUDE, cat_data.LATITUDE)
    m.plot(x, y, 'k.', alpha=0.5)
    ax.set_title('%s: %s' % (code, english_mapping[code]))

This was resolved by changing m(cat_data.LONGITUDE, cat_data.LATITUDE) to m(cat_data.LONGITUDE.values, cat_data.LATITUDE.values), thanks to Alex Messina's finding.
After a little further study, I found that since pandas v0.13.0 (released on 31 Dec 2013), a Series column of a DataFrame (now derived from NDFrame) must be passed as its underlying ndarray, via .values, to Cython-based functions such as basemap/proj, as described below.
Quote from the pandas GitHub commit log:

.. warning::

   In 0.13.0, since ``Series`` has internally been refactored to no longer sub-class ``ndarray``
   but instead subclass ``NDFrame``, you can **not pass** a ``Series`` directly as a ``ndarray`` typed parameter
   to a cython function. Instead pass the actual ``ndarray`` using the ``.values`` attribute of the Series.

   Prior to 0.13.0:

   .. code-block:: python

      apply_integrate_f(df['a'], df['b'], df['N'])

   Use ``.values`` to get the underlying ``ndarray``:

   .. code-block:: python

      apply_integrate_f(df['a'].values, df['b'].values, df['N'].values)
You can find the corrected version of the example code here.
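For reference, a minimal sketch of how the fix looks inside the plotting loop above (it reuses the m and cat_data defined there); only the two .values calls differ from the book's code:

    # pass plain ndarrays rather than Series to the Basemap projection call
    x, y = m(cat_data.LONGITUDE.values, cat_data.LATITUDE.values)
    m.plot(x, y, 'k.', alpha=0.5)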

Related

Run traceback errors and missing documentation

I'm a beginner in Python. I found some code that I wanted to test, since nothing seems to work for me:
import numpy as np
import laspy
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
# reading las file and copy points
input_las = laspy.read("topography.las")
point_records = input_las.points.copy()
# getting scaling and offset parameters
las_scaleX = input_las.header.scale[0]
las_offsetX = input_las.header.offset[0]
las_scaleY = input_las.header.scale[1]
las_offsetY = input_las.header.offset[1]
las_scaleZ = input_las.header.scale[2]
las_offsetZ = input_las.header.offset[2]
# calculating coordinates
p_X = np.array((point_records['point']['X'] * las_scaleX) + las_offsetX)
p_Y = np.array((point_records['point']['Y'] * las_scaleY) + las_offsetY)
p_Z = np.array((point_records['point']['Z'] * las_scaleZ) + las_offsetZ)
# plotting points
fig = plt.figure()
ax = Axes3D(fig)
ax.scatter(p_X, p_Y, p_Z, "marker=o")
plt.show()
For the most part my IDE is not throwing any errors, but it says it is missing documentation for .copy, .points and so on.
Also, when I run the code I get:
Traceback (most recent call last):
line 19, in <module>
p_X = np.array((point_records['point']['X'] * las_scaleX) + las_offsetX)
and:
line 185, in __getitem__
return self.array[item]
ValueError: no field of name point
what am I doing wrong?
code I was trying to adapt: https://gis.stackexchange.com/questions/277317/visualizing-las-with-matplotlib
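Assuming a laspy 2.x release (which is what provides laspy.read), the error suggests the structured point array has no 'point' field; its fields are the dimension names themselves, and the scaled coordinates are also exposed directly as las.x, las.y and las.z. A minimal sketch under that assumption (reusing topography.las from the question):

import numpy as np
import laspy
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the 3d projection on older matplotlib

las = laspy.read("topography.las")

# laspy 2.x applies the header scale and offset for the lowercase accessors,
# so no manual arithmetic with the header is needed
x = np.asarray(las.x)
y = np.asarray(las.y)
z = np.asarray(las.z)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(x, y, z, marker="o")
plt.show()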

NameError: global name 'balanceAr' is not defined

I get the following error (last line is important) for the code below:
Warning (from warnings module):
File "C:/[file_location]/itteration 4.py", line 12
avgNug = reduce(lambda x, y: x + y, eachPix[:3])/len(eachPix[:3])
RuntimeWarning: overflow encountered in ubyte_scalars
Traceback (most recent call last):
File "C:/[file_location]/itteration 4.py", line 45, in
threshold(iar4)
File "C:/[file_location]/itteration 4.py", line 13, in threshold
balanceAr.append(avgNum)
NameError: global name 'balanceAr' is not defined
I've tried writing "global" before it and defining it outside the function it is in, with multiple syntaxes for the "global" declaration.
The code is taken from the sentdex video https://www.youtube.com/watch?v=nych18rsXKU where this code works.
I'm using the same Python version as him, and I'm assuming the same libraries, since this is the fourth program from the playlist, and the previous 3 worked fine.
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
import time

def threshold(imageArray):
    balaceAr = []
    newAr = imageArray
    for eachRow in imageArray:
        for eachPix in eachRow:
            avgNug = reduce(lambda x, y: x + y, eachPix[:3])/len(eachPix[:3])
            balanceAr.append(avgNum)
    balance = reduce(lambda x, y: x + y, balanceAr)/len(balanceAr)
    for eachRow in newAr:
        for eachPix in eachRow:
            if reduce(lambda x, y: x + y, eachPix[:3])/len(eachPix[:3]) > balance:
                #eachPix 0,1,2,3 = 255
            else:
                #eachPix 0,1,2 = 0
                eachPix[3] = 255
    return newAr

'''in the original code this part is not commented, and there's also a i, i2 and i3
i4 = Image.open('images/sentdex.png')
iar4 = np.array(i4)'''

threshold(iar4)

'''same explanation as previous comment, only coordinates in 2nd () are 0,0;4,0;0,3
fig = plt.figure()
ax4 = plt.subplot2grid((8,6), (4,3), rowspan=4, colspan=3)
ax4.imshow(iar4)
'''
plt.show()

#P.S. I had to write " " on all lines that didn't have it for stackoverflow
# to interpret it as code, even if it was in the "code" section
You have a typo in the declaration balaceAr = [].
You need to change it to balanceAr = [] so that it matches the name used later in the function.
Below your function definition:
balaceAr = [] # <===== Typo
Check for typos before posting next time.
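Putting the two answers together, a corrected sketch of the start of the function might look like the following. Note that the posted code also appends avgNum, which is never defined; presumably it should be the avgNug computed on the previous line, but that second rename is an assumption, not something the answers state:

def threshold(imageArray):
    balanceAr = []  # was misspelled as balaceAr
    newAr = imageArray
    for eachRow in imageArray:
        for eachPix in eachRow:
            avgNug = reduce(lambda x, y: x + y, eachPix[:3]) / len(eachPix[:3])
            balanceAr.append(avgNug)  # was avgNum, which is never defined
    balance = reduce(lambda x, y: x + y, balanceAr) / len(balanceAr)
    # ... rest of the function unchanged ...
    return newAr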

What is the right version of matplotlib for sympy 1.0?

I tried to use the plotting module of SymPy (1.0) in PyCharm, but I encounter errors like the one below. I guess it is caused by a version incompatibility between matplotlib (2.0.2) and SymPy (1.0). Does anyone have a clue? Thanks in advance~
Traceback (most recent call last):
File "/home/leizh/PycharmProjects/Learn_python/Smoothness_Bilinear_Quadrilateral_Elmt.py", line 49, in <module>
plot_parametric(cos(u),sin(u),(u,-5,5))
File "/home/leizh/.local/lib/python3.5/site-packages/sympy/plotting/plot.py", line 1415, in plot_parametric
plots.show()
File "/home/leizh/.local/lib/python3.5/site-packages/sympy/plotting/plot.py", line 184, in show
self._backend = self.backend(self)
File "/home/leizh/.local/lib/python3.5/site-packages/sympy/plotting/plot.py", line 1056, in __new__
return MatplotlibBackend(parent)
File "/home/leizh/.local/lib/python3.5/site-packages/sympy/plotting/plot.py", line 868, in __init__
self.cm = self.matplotlib.cm
AttributeError: 'NoneType' object has no attribute 'cm'
The code is meant to calculate a mapping for a bilinear quadrilateral element.
from sympy import *
from sympy.plotting import *

xi = Symbol("xi")
eta = Symbol("eta")

#Shape functions in reference element
def Ni(xi,eta,i):
    references_vertices = {1:[-1,-1],2:[1,-1],3:[1,1],4:[-1,1]}
    xiv = references_vertices[i][0]
    etav = references_vertices[i][1]
    return Rational(1,4)*(1+xiv*xi)*(1+etav*eta)

#Give a specific element in physical space with an angle >= 180 degree
physical_vertices = {1:[-1,-1],2:[1,-1],3:[1,1],4:[0,0]}

#Interpolation for (x,y) in terms of (xi,eta)
def mapping(xi,eta,vertices):
    x = 0
    y = 0
    for i in vertices:
        xv = vertices[i][0]
        yv = vertices[i][1]
        x += Ni(xi,eta,i)*xv
        y += Ni(xi,eta,i)*yv
    return [x,y]

#mapping (xi, eta) -> (x, y)
xy = mapping(xi,eta,physical_vertices)
print("x and y")
print(factor(xy[0]))
print(factor(xy[1]))

#Jacobian
jac = []
jac.append([xy[0].diff(xi),xy[0].diff(eta)])
jac.append([xy[1].diff(xi),xy[1].diff(eta)])
print("Jacobian Matrix")
print(factor(jac))

#The determinant of Jacobian
det_jac = jac[0][0]*jac[1][1]-jac[0][1]*jac[1][0]
print(factor(det_jac))

#Plot
plot3d_parametric_surface(xy[0], xy[1], det_jac,(xi,-1,1),(eta,-1,1))
det_jac.subs([(xi,1),(eta,-1)])

#test
u = symbols('u')
plot(u**2,(u,-1,1))
plot_parametric(cos(u),sin(u),(u,-5,5))
I have been able to reproduce your problem with matplotlib 2.0.2, sympy 1.0 and python 3.4.6. However using matplotlib 2.0.2, sympy 1.0 and python 3.5.3 works just fine. Note that I am using different computers, but fresh virtual environments every time. So there should be no other issues here. I suggest upgrading to python 3.5.x.
In the future please provide a "minimal" working example which reproduces your error, for example:
import sympy as sym
u = sym.symbols('u')
sym.plotting.plot(sym.sin(u), (u,-5,5))
EDIT: There was a difference between the 2 computers: one used the qt4agg backend (did not work), the other used tkagg (does work). So there seems to be a problem regarding which backend you use with sympy and matplotlib.
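If the backend is indeed the culprit, one way to test that (a sketch, assuming TkAgg is available on your system) is to force the backend before any plotting module is loaded:

import matplotlib
matplotlib.use('TkAgg')  # select the backend before pyplot / sympy plotting is imported

from sympy import symbols, sin
from sympy.plotting import plot

u = symbols('u')
plot(sin(u), (u, -5, 5))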

TypeError: 'numpy.ndarray' object is not callable in scipy.optimize leastsq

I'm new to Python and, for work reasons, I'm trying to write Python code that reads three files containing float (x, y) data (say x1,y1; x2,y2; x3,y3) and combines two of the arrays (y1 and y2) in a linear combination that approaches the third (y3) as closely as possible. Moreover, x1 and x2 are identical, whereas x3 is different, so I interpolate x3 and y3 onto x1. I'm working in IDLE on Mac OS X, with Python 2.7.
Here's my code:
import numpy as np
import matplotlib.pyplot as plt
import Tkinter as tk
import tkFileDialog
from scipy.optimize import leastsq
root1 = tk.Tk()
root1.geometry() #window centered on desktop?
root1.withdraw() #the main app window doesn't remain in the background
filename1 = tkFileDialog.askopenfilename(parent=root1, title="Ouvrir le spectre n° 1",
filetypes=[('dat files', '.dat'), ('all files', '.*')],
initialdir="/Users//Science/Manips/2011_10_05_Nb_W/")
filename2 = tkFileDialog.askopenfilename(parent=root1,title="Ouvrir le spectre n° 2",
filetypes=[('dat files', '.dat'), ('all files', '.*')],
initialdir="/Users/Science/Manips/2011_10_05_Nb_W/")
filenameexp = tkFileDialog.askopenfilename(parent=root1, title="Ouvrir le spectre exp",
filetypes=[('txt files', '.txt'), ('all files', '.*')],
initialdir="/Users/Science/Manips/2011_10_05_Nb_W/spectres_exp")
print 'Fichiers choisis = '
print filename1
print filename2
print filenameexp
energy1, spectrum1 = np.loadtxt(filename1, delimiter=' ', usecols=(0, 1),
unpack=True, skiprows=0)
energy2, spectrum2 = np.loadtxt(filename2, delimiter=' ', usecols=(0, 1),
unpack=True, skiprows=0)
energyexp, spectrumexp = np.loadtxt(filenameexp, delimiter='\t', usecols=(0, 1),
unpack=True, skiprows=0)
#Interpolating experimental energy grid on theoretical energy grid
sp_exp_int = np.interp(energy1, energyexp, spectrumexp)
#guess contains the first guess of the parameters
guess=[1.0,1.0]
spec_theo = guess[0] * spectrum1 + guess[1] * spectrum2
# ErrorFunc is the difference between the "fit" and the y experimental data
ErrorFunc = spec_theo - sp_exp_int
# leastsq finds the set of parameters in the tuple tpl that minimizes
# ErrorFunc=yfit-yExperimental
tplFinal, success = leastsq(ErrorFunc, guess[:], args=(energy1, sp_exp_int))
print "best choice = ", tplFinal
fig, ax1 = plt.subplots()
theory = ax1.plot(energy1, spec_theo, 'b-', label='Theory')
ax1.set_xlabel('Energy (eV)')
# Make the y-axis label and match the line color.
ax1.set_ylabel('Theory', color='b')
ax2 = ax1.twinx()
experiment = ax2.plot(energy1, sp_exp_int, 'r-', label='Experiment')
ax2.set_ylabel('Experiment', color='r', rotation=-90, labelpad=15)
#one legend for all axes
lns = theory + experiment
labs = [l.get_label() for l in lns]
ax1.legend(lns, labs, loc=0)
plt.show()
When I try to run the code I get:
Traceback (most recent call last):
File "/Users/Science/Manips/2011_05_Nb_W/Mars2016/comblin_leastsquares.py", line 79, in <module>
tplFinal, success = leastsq(ErrorFunc, guess[:], args=(energy1, sp_exp_int))
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site- packages/scipy/optimize/minpack.py", line 377, in leastsq
shape, dtype = _check_func('leastsq', 'func', func, x0, args, n)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/optimize/minpack.py", line 26, in _check_func
res = atleast_1d(thefunc(*((x0[:numinputs],) + args)))
TypeError: 'numpy.ndarray' object is not callable
I understand that something is wrong with my leastsq usage, but I really can't figure out what it could be, my knowledge of Python is clearly insufficient.
Can someone help me ?
The error clearly states what's wrong: you are passing an array instead of a function/callable. In fact the leastsq documentation states that the first argument should be a callable.
You are passing ErrorFunc as the first argument, but this is neither a function nor a callable; it's an array. (It may represent a function, but it isn't in the form required by leastsq.)
So you have to follow the description for the argument:
should take at least one (possibly length N vector) argument and
returns M floating point numbers. It must not return NaNs or fitting
might fail.
So replace ErrorFunc with a callable that, given the input, returns the error as floats. Basically you should have:
def error_func(input):
    return input - data
Where data is your experimental data and input is the value of the fit that scipy is currently evaluating. It needs a callable because it performs multiple iterations, and at each iteration it has to compute the error in order to fit the data.
Obviously, change error_func to match what you are doing; this is only meant to give an idea of what leastsq expects.
Just in case it could help other people, I've made the code work with:
<same code as in my question>
#guess contains the first guess of the parameters
guess=[1.0,1.0]
# error_func is the difference between the "fit" and the y experimental data
#error_func = spec_theo - sp_exp_int
def error_func(coeff, sp1, sp2, y):
    return (coeff[0] * sp1 + coeff[1] * sp2) - y
# leastsq finds the set of parameters in the tuple tpl that minimizes
# error_func=yfit-yExperimental
guess_fin, success = leastsq(error_func, guess[:],
                             args=(spectrum1, spectrum2, sp_exp_int))
print 'best choice = ', guess_fin
print 'success = ', success
spec_theo = guess_fin[0] * spectrum1 + guess_fin[1] * spectrum2
fig, ax1 = plt.subplots()
...
Thanks Bakuriu!

python ignore empty files

We prepared the following Python script (Python 2.7) to make histograms.
histogram.py
#!/usr/bin/env python
import sys
import numpy as np
import matplotlib as mpl
import matplotlib.mlab as mlab
mpl.use('Agg')
import matplotlib.pyplot as plt

sys.argv[1] # Define input name
sys.argv[2] # Define output name
sys.argv[3] # Define title

# Open the file name called "input_file"
input_file = sys.argv[1]
inp = open(input_file, "r")
lines = inp.readlines()
if len(lines) >= 20:
    x = []
    #numpoints = []
    for line in lines:
        # if int(line) > -10000: # Activate this line if you would like to filter any data (filter out values smaller than -10000 here)
        x.append(float(line))
    # the histogram of the data
    n, bins, patches = plt.hist(x, 50, normed=False, facecolor='gray')
    plt.xlabel('Differences')
    numpoints = len(lines)
    plt.ylabel('Frequency ( n =' + str(numpoints) + ' ) ')
    title = sys.argv[3]
    plt.title(title)
    plt.grid(True)
    save_file = sys.argv[2]
    plt.savefig(save_file + ".png")
    plt.clf()
inp.close()
Example input:
1
2
3
The script is run as follows:
python histogram.py input ${output_file_name}.png ${title_name}
We added the line "if len(lines) >= 20:" so that if there are fewer than 20 data points, we don't make a plot.
However, if the file is empty, this Python script freezes.
We added a bash line to remove any empty files before running "python histogram.py input ${output_file_name}.png ${title_name}":
find . -size 0 -delete
For some reason, this line always works in small-scale tests but not in real production runs inside several loops. So we would love to make "histogram.py" itself ignore any empty files if possible.
Searching only turns up this link, which doesn't seem very helpful :(
Ignoring empty files from coverage report
Could anyone kindly offer some comments? Thanks!
Check whether input_file is empty with os.path.getsize(input_file) > 0 (see os.path.getsize).
You will need the full path, which I presume you have, and it will raise an error if the file does not exist or is inaccessible, so you may want to handle those cases.
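For example, a minimal sketch of that check with the failure cases handled (the try/except and the helper name has_data are assumptions added here, not part of the answer above):

import os

def has_data(path):
    # True only if the file exists, is accessible and is non-empty
    try:
        return os.path.getsize(path) > 0
    except OSError:
        # missing or inaccessible file
        return False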
This code works, ignoring empty files:
#!/usr/bin/env python
import sys
import numpy as np
import matplotlib as mpl
import matplotlib.mlab as mlab
import os
mpl.use('Agg')
import matplotlib.pyplot as plt

sys.argv[1] # Define input name
sys.argv[2] # Define output name
sys.argv[3] # Define title

input_file = sys.argv[1]
# Open the file name called "input_file"
if os.path.getsize(input_file) > 0:
    inp = open(input_file, "r")
    lines = inp.readlines()
    if len(lines) >= 20:
        x = []
        #numpoints = []
        for line in lines:
            # if int(line) > -10000: # Activate this line if you would like to filter any data (filter out values smaller than -10000 here)
            x.append(float(line))
        # the histogram of the data
        n, bins, patches = plt.hist(x, 50, normed=False, facecolor='gray')
        plt.xlabel('Differences')
        numpoints = len(lines)
        plt.ylabel('Frequency ( n =' + str(numpoints) + ' ) ')
        title = sys.argv[3]
        plt.title(title)
        plt.grid(True)
        save_file = sys.argv[2]
        plt.savefig(save_file + ".png")
        plt.clf()
    inp.close()
else:
    print "Empty file"
~$ python test.py empty.txt foo bar
Empty file
Check if the file exists and is not empty beforehand:
import os

def non_empty_file(filepath):
    # True if the path points to an existing, non-empty file
    return os.path.isfile(filepath) and os.path.getsize(filepath) > 0
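A hypothetical way to wire this helper into histogram.py (reusing input_file and the sys import from the script above) is to bail out before reading the file:

# sys is already imported at the top of histogram.py
input_file = sys.argv[1]
if not non_empty_file(input_file):
    sys.exit("Skipping empty or missing file: " + input_file)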
