Plotting data from csv using matplotlib.pyplot

I am following a tutorial on YouTube in which some standard text files are plotted using matplotlib.pyplot. I can do that easily enough, but I am now trying to do the same thing with some CSV files of real data.
The code I am using is:
import matplotlib.pyplot as plt
import csv
#import numpy as np
with open(r"Example RFI regression axis\Delta RFI.csv") as x, open(r"Example RFI regression axis\strikerate.csv") as y:
    readx = csv.reader(x)
    ready = csv.reader(y)
    plt.plot(readx,ready)
plt.title ('Test graph')
plt.xlabel('x axis')
plt.ylabel('y axis')
plt.show()
The traceback I receive is long
Traceback (most recent call last):
File "C:\V4 code snippets\matplotlib_test.py", line 11, in <module>
plt.plot(readx,ready)
File "C:\Python27\lib\site-packages\matplotlib\pyplot.py", line 2832, in plot
ret = ax.plot(*args, **kwargs)
File "C:\Python27\lib\site-packages\matplotlib\axes.py", line 3997, in plot
self.add_line(line)
File "C:\Python27\lib\site-packages\matplotlib\axes.py", line 1507, in add_line
self._update_line_limits(line)
File "C:\Python27\lib\site-packages\matplotlib\axes.py", line 1516, in _update_line_limits
path = line.get_path()
File "C:\Python27\lib\site-packages\matplotlib\lines.py", line 677, in get_path
self.recache()
File "C:\Python27\lib\site-packages\matplotlib\lines.py", line 401, in recache
x = np.asarray(xconv, np.float_)
File "C:\Python27\lib\site-packages\numpy\core\numeric.py", line 320, in asarray
return array(a, dtype, copy=False, order=order)
TypeError: float() argument must be a string or a number
Please advise what I need to do, I realise this is probably very easy to most seasoned coders. Kind regards SMNALLY

csv.reader() returns a reader object, not numbers: iterating over it (or calling its .next() method in Python 2) yields each row as a list of strings. Without converting those strings to float or int, you won't be able to plt.plot() them.
To save the trouble of converting, I suggest using genfromtxt() from numpy. (http://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html)
For example, there are two files:
data1.csv:
data1
2
3
4
3
6
6
4
and data2.csv:
data2
92
73
64
53
16
26
74
Both of them have one line of header. We can do:
import numpy as np
import matplotlib.pyplot as plt
data1=np.genfromtxt('data1.csv', skip_header=1) #suppose it is in the current working directory
data2=np.genfromtxt('data2.csv', skip_header=1)
plt.plot(data1, data2,'o-')
plt.show()
and the result is a plot of data2 against data1, drawn as circular markers joined by lines.
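If you would rather keep csv.reader(), the string-to-number conversion has to be done by hand. The following is only a minimal sketch, assuming Python 3 and single-column files with one header row like data1.csv and data2.csv above; the helper name read_column and the temporary files are made up for illustration.

```python
import csv
import os
import tempfile


def read_column(path, has_header=True):
    """Read the first column of a CSV file as a list of floats."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        if has_header:
            next(reader, None)  # skip the header row
        # Each row is a list of strings, so convert explicitly.
        return [float(row[0]) for row in reader if row]


# Throwaway files shaped like data1.csv and data2.csv (illustration only).
tmpdir = tempfile.mkdtemp()
path1 = os.path.join(tmpdir, "data1.csv")
path2 = os.path.join(tmpdir, "data2.csv")
with open(path1, "w", newline="") as f:
    f.write("data1\n2\n3\n4\n3\n6\n6\n4\n")
with open(path2, "w", newline="") as f:
    f.write("data2\n92\n73\n64\n53\n16\n26\n74\n")

xs = read_column(path1)
ys = read_column(path2)
print(xs)
print(ys)
```

The two lists can then be passed to plt.plot(xs, ys, 'o-') in the same way as the arrays returned by genfromtxt().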

Related

2D array error in python using scikitlearn package

I ran the following code in PyCharm, but I keep getting the error shown below:
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn import linear_model
import matplotlib.pyplot as plt
# Data: https://docs.google.com/spreadsheets/d/1wxaadKAHTZtECv6gW6Mpreq3tFb2PWgVOhqANbWlIAk/edit?usp=sharing
df=pd.read_csv(r"C:\Users\gmcks\Downloads\Data samples\homeprices.csv")
df
x=df[["area"]]
y=df.price
reg=linear_model.LinearRegression()
reg.fit(x,y)  # displays LinearRegression()
m=reg.coef_
c=reg.intercept_
print(m,c)
reg.predict(2000)
ERROR :
Traceback (most recent call last):
File "C:\Users\gmcks\PycharmProjects\using jupyter.py\venv\lib\site-packages\IPython\core\interactiveshell.py", line 3319, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-30-b5b06b1b028e>", line 1, in <module>
reg.predict(2000)
File "C:\Users\gmcks\PycharmProjects\using jupyter.py\venv\lib\site-packages\sklearn\linear_model\_base.py", line 236, in predict
return self._decision_function(X)
File "C:\Users\gmcks\PycharmProjects\using jupyter.py\venv\lib\site-packages\sklearn\linear_model\_base.py", line 218, in _decision_function
X = check_array(X, accept_sparse=['csr', 'csc', 'coo'])
File "C:\Users\gmcks\PycharmProjects\using jupyter.py\venv\lib\site-packages\sklearn\utils\validation.py", line 72, in inner_f
return f(**kwargs)
File "C:\Users\gmcks\PycharmProjects\using jupyter.py\venv\lib\site-packages\sklearn\utils\validation.py", line 616, in check_array
"if it contains a single sample.".format(array))
ValueError: Expected 2D array, got scalar array instead:
array=2000.
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
Why do I have to reshape my data again when I have already selected it as df[["area"]]? That selection has shape (5,1), so a 2D array is already created.

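The df[["area"]] selection only shapes the training data. The argument passed to predict() must be 2D as well, with one row per sample and one column per feature, just like the predictor used in fit():
from sklearn import linear_model
import numpy as np
import pandas as pd
np.random.seed(111)
df = pd.DataFrame({'x' : np.random.uniform(0,1,100),
                   'y' : np.random.uniform(0,1,100)})
reg=linear_model.LinearRegression()
reg.fit(df[["x"]],df['y'])
A single sample therefore has to be wrapped in a nested list:
reg.predict([[2000]])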
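To see why the scalar is rejected, it may help to look at the shapes involved. This is a rough sketch using NumPy only; the coefficient and intercept values are invented stand-ins for reg.coef_ and reg.intercept_, not results from the asker's data.

```python
import numpy as np

# Hypothetical fitted parameters standing in for reg.coef_ and reg.intercept_.
coef = np.array([135.0])
intercept = 180000.0

scalar = np.asarray(2000)            # what reg.predict(2000) receives
one_sample = np.asarray([[2000]])    # one row (sample), one column (feature)
many = np.array([2000, 3000, 4000]).reshape(-1, 1)  # three samples, one feature

print(scalar.ndim, scalar.shape)     # a 0-dimensional array with empty shape
print(one_sample.shape)
print(many.shape)

# A linear model predicts X @ coef + intercept, which needs a 2D X
# laid out as (n_samples, n_features).
pred_one = one_sample @ coef + intercept
pred_many = many @ coef + intercept
print(pred_one)
print(pred_many)
```

reshape(-1, 1) is the option the error message suggests for data with a single feature; reshape(1, -1) is the counterpart for a single sample with several features.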

Scatterplot of pandas DataFrame ends in KeyError: 0

After I updated pandas (0.23.4) and matplotlib (3.0.1), I get a strange error when trying to do something like the following:
import pandas as pd
import matplotlib.pyplot as plt
clrdict = {1: "#a6cee3", 2: "#1f78b4", 3: "#b2df8a", 4: "#33a02c"}
df_full = pd.DataFrame({'x':[20,30,30,40],
                        'y':[25,20,30,25],
                        's':[100,200,300,400],
                        'l':[1,2,3,4]})
df_full['c'] = df_full['l'].replace(clrdict)
df_part = df_full[(df_full.x == 30)]
fig = plt.figure()
plt.scatter(x=df_full['x'],
            y=df_full['y'],
            s=df_full['s'],
            c=df_full['c'])
plt.show()
fig = plt.figure()
plt.scatter(x=df_part['x'],
            y=df_part['y'],
            s=df_part['s'],
            c=df_part['c'])
plt.show()
The scatterplot of the original DataFrame (df_full) is shown without problems, but the plot of the partial DataFrame (df_part) raises the following error:
Traceback (most recent call last):
File "G:\data\project\test.py", line 27, in <module>
c=df_part['c'])
File "C:\Program Files\Python37\lib\site-packages\matplotlib\pyplot.py", line 2864, in scatter
is not None else {}), **kwargs)
File "C:\Program Files\Python37\lib\site-packages\matplotlib\__init__.py", line 1805, in inner
return func(ax, *args, **kwargs)
File "C:\Program Files\Python37\lib\site-packages\matplotlib\axes\_axes.py", line 4195, in scatter
isinstance(c[0], str))):
File "C:\Program Files\Python37\lib\site-packages\pandas\core\series.py", line 767, in __getitem__
result = self.index.get_value(self, key)
File "C:\Program Files\Python37\lib\site-packages\pandas\core\indexes\base.py", line 3118, in get_value
tz=getattr(series.dtype, 'tz', None))
File "pandas\_libs\index.pyx", line 106, in pandas._libs.index.IndexEngine.get_value
File "pandas\_libs\index.pyx", line 114, in pandas._libs.index.IndexEngine.get_value
File "pandas\_libs\index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 958, in pandas._libs.hashtable.Int64HashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 964, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 0
This is due to the color option c=df_part['c']. When you leave it out, the problem doesn't occur. This didn't happen before the updates, so you may not be able to reproduce it with older versions of matplotlib or pandas (I have no idea which one causes it).
In my project the df_part = df_full[(df_full.x == i)] line is used within the update function of a matplotlib.animation.FuncAnimation. The result is an animation over the values of x (which are timestamps in my project), so I need a way to select part of the DataFrame.

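This is a bug which got fixed by https://github.com/matplotlib/matplotlib/pull/12673.
The fix should hopefully be available in the next bugfix release, 3.0.2, which should be out within the next few days.
As the traceback shows, scatter evaluates c[0], which on a pandas Series is a lookup by index label; df_part only keeps the labels 1 and 2, hence KeyError: 0. In the meantime, you can pass the NumPy array behind the Series instead, i.e. series.values (here c=df_part['c'].values).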
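The label-versus-position difference behind this workaround can be reproduced without matplotlib. Here is a small sketch, assuming a default integer index like the one in the question; reset_index(drop=True) is shown as an additional option that is not part of the original answer.

```python
import pandas as pd

colors = pd.Series(["#a6cee3", "#1f78b4", "#b2df8a", "#33a02c"])
x = pd.Series([20, 30, 30, 40])

part = colors[x == 30]          # boolean filter keeps the original labels
print(list(part.index))

try:
    part[0]                     # label lookup: there is no label 0 any more
except KeyError as err:
    print("KeyError:", err)

as_array = part.values          # plain NumPy array, indexed by position
print(as_array[0])

renumbered = part.reset_index(drop=True)   # alternative: relabel from 0
print(renumbered[0])
```

Either as_array or renumbered can be indexed with 0, which is what the failing isinstance(c[0], str) check in the traceback needs.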

Visualizing graph in Python using NetworkX

I'm trying to visualize a graph in NetworkX. I need to color the graph so that the center node is dark and nodes further away from it are progressively lighter, but when I run the code I get this error:
ValueError: Cannot convert argument type <class 'numpy.ndarray'> to rgba array
It is raised on the line:
nx.draw_networkx_nodes(G,pos,nodelist=p.keys(),node_size=90,
                       node_color=p.values(),cmap=plt.cm.Reds_r)
I think the problem is in:
node_color=p.values()
The code is:
import numpy
import pandas
import networkx as nx
import unicodecsv as csv
import community
import matplotlib.pyplot as plt
# Generate the Graph
G=nx.davis_southern_women_graph()
# Create a Spring Layout
pos=nx.spring_layout(G)
# Find the center Node
dmin=1
ncenter=0
for n in pos:
    x,y=pos[n]
    d=(x-0.5)**2+(y-0.5)**2
    if d<dmin:
        ncenter=n
        dmin=d
""" returns a dictionary of nodes and their distance to the node
supplied as an argument. We will then use these distances
to determine colors"""
p=nx.single_source_shortest_path_length(G,ncenter)
plt.figure(figsize=(8,8))
nx.draw_networkx_edges(G,pos,nodelist=[ncenter],alpha=0.4)
nx.draw_networkx_nodes(G,pos,nodelist=p.keys(),node_size=90,
                       node_color=p.values(),cmap=plt.cm.Reds_r)
plt.show()
Full Traceback
Traceback (most recent call last):
File "<ipython-input-4-da1414ba5e14>", line 1, in <module>
runfile('C:/Users/Desktop/Marvel/finding_key_players.py', wdir='C:/Users/Desktop/Marvel')
File "C:\Users\Anaconda33\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 685, in runfile
execfile(filename, namespace)
File "C:\Users\Anaconda33\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 85, in execfile
exec(compile(open(filename, 'rb').read(), filename, 'exec'), namespace)
File "C:/Users/Desktop/Marvel/finding_key_players.py", line 70, in <module>
cmap=plt.cm.Reds_r)
File "C:\Users\Anaconda33\lib\site-packages\networkx\drawing\nx_pylab.py", line 399, in draw_networkx_nodes
label=label)
File "C:\Users\Anaconda33\lib\site-packages\matplotlib\axes\_axes.py", line 3606, in scatter
colors = mcolors.colorConverter.to_rgba_array(c, alpha)
File "C:\Users\Anaconda33\lib\site-packages\matplotlib\colors.py", line 391, in to_rgba_array
if alpha > 1 or alpha < 0:
ValueError: Cannot convert argument type <class 'numpy.ndarray'> to rgba array

The error is in the call that draws the nodes.
In Python 3, p.keys() and p.values() return dictionary views rather than lists. They must be converted to lists before being passed as nodelist and node_color, otherwise the call fails.
So the correct line is:
nx.draw_networkx_nodes(G,pos,nodelist=list(p.keys()),node_size=80,node_color=list(p.values()), cmap=plt.cm.Reds_r)
plt.axis('off')
plt.show()
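The difference between a dictionary view and a list can be checked in plain Python 3. In this sketch the dictionary p is a made-up stand-in for the result of nx.single_source_shortest_path_length(G, ncenter).

```python
# Hypothetical node -> distance mapping (illustration only).
p = {"center": 0, "a": 1, "b": 1, "c": 2}

print(type(p.keys()).__name__, type(p.values()).__name__)

try:
    p.values()[0]               # views do not support indexing
except TypeError as err:
    print("TypeError:", err)

nodelist = list(p.keys())       # real lists, in matching order
node_color = list(p.values())
print(nodelist)
print(node_color)
```

Because both lists are built from the same unmodified dictionary, the i-th color belongs to the i-th node.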

Having Issues with an AssertionError when trying to use the psd() command in matplotlib

I'm trying to write a short script that takes a .csv file with some distance data and outputs a PSD (power spectral density) plot for it. The code is here:
import math
import matplotlib.pyplot as plt
name = raw_input('File:')
data = open(name + '.csv', 'r')
distances = []
for row in data:
    distances.append(row.replace("\n",""))
for i in range(len(distances)):
    distances[i] = float(distances[i])
Pxx, freqs = plt.psd(distances, NFFT=16,Fs=2,detrend='detrend_mean',window='window_none',noverlap=128,sides='onesided',scale_by_freq=True)
plot(Pxx,freqs)
plt.savefig(name + 'psd.png', bbox_inches = 'tight')
As you can see, it's pretty simple. The CSV file contains just one column of numbers, so distances is a vector.
The error I'm getting is as follows:
Traceback (most recent call last):
File "C:psdplot.py", line 15, in <module>
Pxx, freqs = plt.psd(distances, NFFT=16,Fs=2,detrend='detrend_mean',window='window_none',noverlap=128,sides='onesided',scale_by_freq=True)
File "C:\Python27\lib\site-packages\matplotlib\pyplot.py", line 3029, in psd
sides=sides, scale_by_freq=scale_by_freq, **kwargs)
File "C:\Python27\lib\site-packages\matplotlib\axes.py", line 8696, in psd
sides, scale_by_freq)
File "C:\Python27\lib\site-packages\matplotlib\mlab.py", line 389, in psd
scale_by_freq)
File "C:\Python27\lib\site-packages\matplotlib\mlab.py", line 423, in csd
noverlap, pad_to, sides, scale_by_freq)
File "C:\Python27\lib\site-packages\matplotlib\mlab.py", line 251, in _spectral_helper
assert(len(window) == NFFT)
AssertionError
Could someone direct me on how to fix this? I'm sure it's rather obvious, but I haven't been able to find anything on fixing it in this particular context.
Thanks in advance!
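One possible reading of the traceback: the failing check is assert(len(window) == NFFT), and the call passes window='window_none', a string of 11 characters, together with NFFT=16. The sketch below is a simplified stand-in for that check, not matplotlib's actual code; check_window and the local window_none are hypothetical names used only to illustrate the difference between passing a string and passing a callable or a sequence of NFFT weights.

```python
def check_window(window, nfft):
    """Simplified stand-in for the length check seen in the traceback."""
    if callable(window):
        # A callable window is applied to a segment of nfft samples.
        values = window([1.0] * nfft)
    else:
        values = window
    return len(values) == nfft


def window_none(segment):
    """Hypothetical no-op window: returns the segment unchanged."""
    return segment


NFFT = 16
print(check_window('window_none', NFFT))   # the string has 11 characters
print(check_window(window_none, NFFT))     # callable applied to 16 samples
print(check_window([1.0] * NFFT, NFFT))    # explicit sequence of 16 weights
```

If this reading is right, passing the function object matplotlib.mlab.window_none (without quotes) instead of its name as a string would be worth trying. It may also be worth checking noverlap=128, which is larger than NFFT=16.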

date2num , ValueError: ordinal must be >= 1

I'm using the candlestick function from matplotlib.finance, which requires the time to be passed in float days format, so I convert it with date2num first.
This is my code:
import csv
import sys
import math
import numpy as np
import datetime
from optparse import OptionParser
import matplotlib.pyplot as plt
import matplotlib.cbook as cbook
import matplotlib.mlab as mlab
import matplotlib.dates as mdates
from matplotlib.finance import candlestick
from matplotlib.dates import date2num
datafile = 'historical_data/AUD_Q10_1D_500.csv'
print 'loading', datafile
r = mlab.csv2rec(datafile, delimiter=';')
quotes = [date2num(r['date']),r['open'],r['close'],r['max'],r['min']]
candlestick(ax, quotes, width=0.6)
plt.show()
(Here is the CSV file: http://db.tt/MIOqFA0)
This is what the documentation says:
candlestick(ax, quotes, width=0.20000000000000001, colorup='k', colordown='r', alpha=1.0)
quotes is a list of (time, open, close, high, low, ...) tuples. As long as the first 5 elements of the tuples are these values, the tuple can be as long as you want (eg it may store volume).
time must be in float days format - see date2num
Here is the full error log:
Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/matplotlib/backends/backend_qt4agg.py", line 83, in paintEvent
FigureCanvasAgg.draw(self)
File "/usr/lib/python2.6/site-packages/matplotlib/backends/backend_agg.py", line 394, in draw
self.figure.draw(self.renderer)
File "/usr/lib/python2.6/site-packages/matplotlib/artist.py", line 55, in draw_wrapper
draw(artist, renderer, *args, **kwargs)
File "/usr/lib/python2.6/site-packages/matplotlib/figure.py", line 798, in draw
func(*args)
File "/usr/lib/python2.6/site-packages/matplotlib/artist.py", line 55, in draw_wrapper
draw(artist, renderer, *args, **kwargs)
File "/usr/lib/python2.6/site-packages/matplotlib/axes.py", line 1946, in draw
a.draw(renderer)
File "/usr/lib/python2.6/site-packages/matplotlib/artist.py", line 55, in draw_wrapper
draw(artist, renderer, *args, **kwargs)
File "/usr/lib/python2.6/site-packages/matplotlib/axis.py", line 971, in draw
tick_tups = [ t for t in self.iter_ticks()]
File "/usr/lib/python2.6/site-packages/matplotlib/axis.py", line 904, in iter_ticks
majorLocs = self.major.locator()
File "/usr/lib/python2.6/site-packages/matplotlib/dates.py", line 743, in __call__
self.refresh()
File "/usr/lib/python2.6/site-packages/matplotlib/dates.py", line 752, in refresh
dmin, dmax = self.viewlim_to_dt()
File "/usr/lib/python2.6/site-packages/matplotlib/dates.py", line 524, in viewlim_to_dt
return num2date(vmin, self.tz), num2date(vmax, self.tz)
File "/usr/lib/python2.6/site-packages/matplotlib/dates.py", line 289, in num2date
if not cbook.iterable(x): return _from_ordinalf(x, tz)
File "/usr/lib/python2.6/site-packages/matplotlib/dates.py", line 203, in _from_ordinalf
dt = datetime.datetime.fromordinal(ix)
ValueError: ordinal must be >= 1
If I run a quick check:
for x in r['date']:
    print str(x) + "is :" + str(date2num(x))
it outputs something like:
2010-06-12is :733935.0
2010-07-12is :733965.0
2010-08-12is :733996.0
which sounds OK to me :)

Read the docstring a bit more carefully :)
quotes is a list of (time, open, close, high, low, ...) tuples.
It expects each item of quotes to be a sequence of (time, open, close, high, low).
You're passing in 5 long arrays, whereas it expects one long sequence of 5-item tuples.
You just need to zip your input:
import matplotlib.pyplot as plt
import matplotlib.mlab as mlab
from matplotlib.finance import candlestick
from matplotlib.dates import date2num
datafile = 'Downloads/AUD_Q10_1D_500.csv'
r = mlab.csv2rec(datafile, delimiter=';')
quotes = zip(date2num(r['date']),r['open'],r['close'],r['max'],r['min'])
fig, ax = plt.subplots()
candlestick(ax, quotes, width=0.6)
plt.show()
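What zip does to the layout can be seen with plain lists. This is a minimal sketch in Python 3 with invented price values; only the three time values are taken from the question's output. Note that in Python 3 zip() returns an iterator, so it is wrapped in list() here.

```python
# Made-up columns, in the order the docstring asks for.
times = [733935.0, 733965.0, 733996.0]
opens = [1.0, 1.1, 1.2]
closes = [1.05, 1.15, 1.1]
highs = [1.1, 1.2, 1.25]
lows = [0.95, 1.05, 1.0]

columns = [times, opens, closes, highs, lows]   # 5 long sequences
rows = list(zip(*columns))                      # one long sequence of 5-tuples

print(len(columns), len(columns[0]))
print(len(rows), len(rows[0]))
print(rows[0])
```

Each element of rows is now one (time, open, close, high, low) tuple, which is the layout the docstring describes.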

Seems like you're passing it a float. And in the error message you provide (full message next time please!) it appears that matplotlib is simply delegating the conversion to datetime.datetime.fromordinal.
I don't have a Python 3 installation to test this with, but when I tried to convert a float to a datetime object using datetime.datetime.fromordinal in 2.6, I got a deprecation warning. Then I tried it on ideone and got this:
Traceback (most recent call last):
File "prog.py", line 2, in <module>
print(datetime.datetime.fromordinal(5.5))
TypeError: integer argument expected, got float
So perhaps it's choking on the float.

I think your problem is here:
r = mlab.csv2rec(datafile, delimiter=';')
You need to skip the first line of the csv, which means you need:
r = mlab.csv2rec(datafile, delimiter=';', skiprows=1)
Technically this is incorrect: Ubuntu has an older version of the library, and the OP's version has the two lines below, but it was my original answer.
I would make sure you're using the most recent version of matplotlib.
So that I could reproduce this issue, I downloaded and installed the latest version and I noticed that the line number of the offending piece of code had been changed to 179. I also noticed that the value is cast to int immediately before fromordinal is called (this gives a lot of credence to senderle's answer).
(line 178-179 of most recent matplotlib in Ubuntu repository)
ix = int(x)
dt = datetime.datetime.fromordinal(ix)
If upgrading is not an option, then you should cast to an int first.
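The behaviour of datetime.datetime.fromordinal discussed in these answers can be checked with the standard library alone. A short sketch, assuming Python 3; the value 733935.0 is the date2num output quoted in the question.

```python
import datetime

# Casting the float day number to int first gives a valid ordinal.
converted = datetime.datetime.fromordinal(int(733935.0))
print(converted)

# An ordinal below 1 reproduces the ValueError from the traceback,
# and an uncast float is rejected with a TypeError.
errors = []
for bad in (0, 5.5):
    try:
        datetime.datetime.fromordinal(bad)
    except (ValueError, TypeError) as err:
        errors.append(type(err).__name__)
        print(type(err).__name__, err)
```

The first call returns 2010-06-12, matching the date printed next to 733935.0 in the question.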
