I have a simple figure that I plotted with matplotlib's plot function.
For example:
Is there a way to extract the data points from the figure and paste them into an Excel sheet (as you can in MATLAB)?
Assume that many figures were created and that I didn't know which data or figure I needed until I saw the results.
plt.plot returns a list of Line2D objects, so to extract the data points you can assign that return value to a variable and read the data back from it:
graph = plt.plot(your_data)
data_points = graph[0].get_data()
Example that extracts the data-points of a line plot and saves them to a csv file:
import csv
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-1, 1, 5)
y = 2*x + 1
xy = plt.plot(x, y)        # list of Line2D objects
data = xy[0].get_data()    # (x values, y values) of the first line
print(data)
plt.show()

# newline='' keeps the csv module from writing blank rows on Windows
with open('data.csv', 'w', newline='') as myfile:
    writer = csv.writer(myfile)
    writer.writerow(['x', 'y'])
    for i in range(len(data[0])):
        writer.writerow([data[0][i], data[1][i]])
Output:
(array([-1. , -0.5, 0. , 0.5, 1. ]), array([-1., 0., 1., 2., 3.]))
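If the figures were already created and no variable was kept, the lines can still be read back from the figure objects, as long as the figures are still open. A minimal sketch, assuming that is the situation described in the question; the helper name dump_open_figures, the file naming scheme and the temporary output directory are made up for illustration:

```python
import csv
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, only so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def dump_open_figures(out_dir):
    """Write every line of every open figure to its own CSV file (hypothetical helper)."""
    written = []
    for num in plt.get_fignums():                # numbers of all figures that are still open
        fig = plt.figure(num)                    # returns the existing figure with that number
        for a, ax in enumerate(fig.get_axes()):
            for l, line in enumerate(ax.get_lines()):   # Line2D objects created by plot()
                name = os.path.join(out_dir, f"figure{num}_axes{a}_line{l}.csv")
                with open(name, "w", newline="") as f:
                    writer = csv.writer(f)
                    writer.writerow(["x", "y"])
                    # get_xydata() returns an (N, 2) array of the plotted points
                    writer.writerows(line.get_xydata().tolist())
                written.append(name)
    return written


# Two figures created without keeping any reference to the plotted lines
x = np.linspace(-1, 1, 5)
plt.figure()
plt.plot(x, 2 * x + 1)
plt.figure()
plt.plot(x, x ** 2)

files = dump_open_figures(tempfile.mkdtemp())
print(files)
```

Each resulting CSV file can be opened directly in Excel. Note that this only covers lines drawn with plot(); scatter plots, bars and images are stored in other artist types.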
This seems like a very simple thing, but I can't get it to work. I have a pandas dataframe like this http://prntscr.com/ko8lyd and I want to plot one column on the x-axis and another column on the y-axis. Here is what I tried:
import matplotlib.pyplot as plt
x = ATR_7
y = Vysledek
plt.scatter(x,y)
plt.show()
This is the error I am getting:
<ipython-input-116-5ead5868ec87> in <module>()
1 import matplotlib.pyplot as plt
----> 2 x = ATR_7
3 y = Vysledek
4 plt.scatter(x,y)
5 plt.show()
Where am I going wrong?
You just need:
df.plot.scatter('ATR_7','Vysledek')
Here df is the name of your dataframe. There's no need to call matplotlib directly, since pandas uses it under the hood.
You are trying to use undefined variables. ATR_7 is the name of a column inside your dataframe; it is not defined as a variable anywhere else.
Try something like:
df.plot.scatter(x='ATR_7', y='Vysledek')
assuming your dataframe is named df.
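If you want to use matplotlib, you need to take the x and y values from the dataframe and pass them to plt.scatter. Converting the columns to lists works, although plt.scatter also accepts the columns themselves: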
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import style
style.use('ggplot')
%matplotlib inline
x = list(df['ATR_7']) # set x axis by creating a list
y = list(df['Vysledek']) # set y axis by creating a list
plt.scatter(x,y)
It seems there were two issues in your code. First, the column names were not in quotes, so Python treats them as variable names rather than strings (column names are strings). Second, the easiest way to plot dataframe columns is to use the pandas plotting functions. You are trying to draw a scatter plot with matplotlib, which takes arrays as input, not bare column names.
First, let's load modules and create the data
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
d = {'ATR_7': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
     'Vysledek': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
Then, you can either use pandas plotting as in
x = 'ATR_7'
y = 'Vysledek'
df.plot.scatter(x,y)
Or plain-old matplotlib plotting as in
x = df['ATR_7']
y = df['Vysledek']
plt.scatter(x,y)
Scatter does not know which data to use. You need to provide it with the data.
x = "ATR_7"
y = "Vysledek"
plt.scatter(x,y, data=df)
This assumes that df is your dataframe and that it has columns named "ATR_7" and "Vysledek".
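The three call styles suggested in these answers should all draw the same points. A small sketch to check that, using a made-up dataframe (the values are illustrative only; the column names are the ones from the question):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, only so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative data; the real dataframe from the question is not available
df = pd.DataFrame({"ATR_7": [1.0, 2.0, 3.0], "Vysledek": [4.0, 5.0, 6.0]})

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(9, 3))

df.plot.scatter(x="ATR_7", y="Vysledek", ax=ax1)   # pandas plotting, column names as strings
ax2.scatter(df["ATR_7"], df["Vysledek"])           # matplotlib, columns passed as data
ax3.scatter("ATR_7", "Vysledek", data=df)          # matplotlib, names looked up in data=

# Read the plotted (x, y) pairs back from each axes and compare them
points = [ax.collections[0].get_offsets().tolist() for ax in (ax1, ax2, ax3)]
print(points[0] == points[1] == points[2])
```

What does not work is passing the bare names ATR_7 and Vysledek without quotes, because Python then looks for variables with those names.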
I created a csv-file (with pandas and the help of a friend) like the one in the picture.
Now I need to plot this file.
The first column is the time and should be used for x data. The rest is y data.
For the legend I just want the first row to be used for the labels, like "T_HS_Netz_03" for the second column.
Could not figure out how to do this.
My first attempt:
csv_data = pd.read_csv('file', header=[0, 1], delimiter=';')
ax = csv_data.plot(legend=True)
plt.legend(bbox_to_anchor=(0., 1.0, 1.0, 0.), loc=3, ncol=2, mode="expand")
plt.show()
But this includes the second row in the labels too, and the x ticks do not match the data (0.9 - 3.2).
Second attempt:
csv_data = pd.read_csv('file', header=[0, 1], delimiter=';')
x =csv_data.iloc[1:, [0]]
y = csv_data.iloc[1:, 1:]
plt.legend()
plt.plot(x, y)
This does not show any labels
The resulting plot should be something like
Thanks
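You can also open your CSV file with numpy, for example, and then plot the columns:
import numpy as np
import matplotlib.pyplot as plt
# delimiter=";" and the two skipped header rows match the file described in the question;
# adjust the dtype list to the columns your file really has
data = np.genfromtxt("file", delimiter=";", skip_header=2, dtype=[("time", "f4"), ("column1", "f8"), ("column2", "f8")])
# label= is what plt.legend() uses for the legend entries
plt.plot(data['time'], data['column1'], label='column1')
plt.plot(data['time'], data['column2'], label='column2')
plt.legend(loc='upper right')
plt.xlabel('time')
plt.ylabel('data')
plt.show()
This should give you the expected result ;)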
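Staying with pandas is also possible. The sketch below assumes that the second header row only holds units and can be skipped, and that the first column is the time; the in-memory sample, the column T_example_02 and all values are invented for illustration, only T_HS_Netz_03 and the ';' delimiter come from the question:

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, only so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for the real file: first row = column names, second row = units (assumed layout)
sample = io.StringIO(
    "time;T_HS_Netz_03;T_example_02\n"
    "s;degC;degC\n"
    "0.9;20.0;30.0\n"
    "2.0;21.5;29.0\n"
    "3.2;23.0;28.5\n"
)

# header=0 takes the labels from the first row only, skiprows=[1] drops the second row,
# and index_col=0 makes the time column the x data
csv_data = pd.read_csv(sample, sep=";", header=0, skiprows=[1], index_col=0)

ax = csv_data.plot(legend=True)
ax.set_xlabel(csv_data.index.name)
ax.legend(loc="upper right", ncol=2)

# The legend entries are the column names from the first row
labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(labels)
```

Because the time column becomes the index, DataFrame.plot uses it for the x-axis, so the ticks follow the data range instead of the row numbers.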
I am trying to plot a CCDF using numpy. The input is a CSV file with #keywords in col[0] and frequencies in col[1].
Input
#Car,45
#photo,4
#movie,6
#life,1
The input has more than 10K rows and two columns. col[0] is not used at all; only the frequency from col[1] is used to plot the CCDF. The data has no empty rows in between, and there is no blank row at the end of the file.
Code:
import numpy as np
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.pyplot as plt
from pylab import*
import math
from matplotlib.ticker import LogLocator
data = np.genfromtxt('input.csv', delimiter=",")
d0=data[:,1]
X0 = np.sort(d0)
cdf0 = np.arange(len(X0))/float(len(X0))
#cumulative = np.cumsum(data)
ccdf0 = 1 - cdf0
plt.plot(X0,ccdf0, color='b', marker='.', label='Frequency')
plt.legend(loc='upper right')
plt.xlabel('Freq (x)')
plt.ylabel('ccdf(x)')
plt.gca().set_xscale("log")
#plt.gca().set_yscale("log")
plt.show()
Error
Traceback (most recent call last):
File "00_plot_ccdf.py", line 17, in <module>
d0=data[:,1]
IndexError: too many indices for array
Thanks in Advance
genfromtxt by default treats lines starting with # as comments, so actually your data is empty:
In [1]: genfromtxt('test.csv', delimiter=',')
/usr/lib/python3/dist-packages/numpy/lib/npyio.py:1385: UserWarning: genfromtxt: Empty input file: "test.csv"
warnings.warn('genfromtxt: Empty input file: "%s"' % fname)
Out[1]: array([], dtype=float64)
data is a 1-dimensional empty array, so indexing it with [:,1] uses too many indices.
To disable this, pass comments=None to genfromtxt:
In [20]: genfromtxt('test.csv', delimiter=',', comments=None)
Out[20]:
array([[ nan,  45.],
       [ nan,   4.],
       [ nan,   6.],
       [ nan,   1.]])
Since you only need the second column, you can also restrict the result to that column directly:
In [21]: genfromtxt('test.csv', delimiter=',', comments=None, usecols=(1,))
Out[21]: array([ 45., 4., 6., 1.])
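Putting this together with the plotting code from the question gives roughly the following. This is only a sketch: it reads the four sample rows from an in-memory string instead of input.csv, and it keeps the CCDF estimate used in the question unchanged:

```python
import io

import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, only so the sketch runs without a display
import matplotlib.pyplot as plt

# The sample rows from the question, standing in for input.csv
sample = io.StringIO("#Car,45\n#photo,4\n#movie,6\n#life,1\n")

# comments=None stops '#' from starting a comment; usecols=(1,) reads only the frequencies
freq = np.genfromtxt(sample, delimiter=",", comments=None, usecols=(1,))

x = np.sort(freq)
ccdf = 1.0 - np.arange(len(x)) / len(x)   # same estimate as in the question

plt.plot(x, ccdf, color="b", marker=".", label="Frequency")
plt.legend(loc="upper right")
plt.xlabel("Freq (x)")
plt.ylabel("ccdf(x)")
plt.gca().set_xscale("log")

print(x)
print(ccdf)
```

With the real file, replace sample by the path 'input.csv'.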
The code below seems to run without issues until I try to plot the data: the plot that is shown is blank.
import numpy as np
import matplotlib.pyplot as plt
data = np.genfromtxt('/home/oem/Documents/620157.csv', delimiter=',', skip_header=01, skip_footer=01, names=['x', 'y'])
plt.plot(data,'o-')
plt.show()
I'm not sure what your data looks like, but I believe you need to do something like this:
data = np.genfromtxt('/home/oem/Documents/620157.csv',
                     delimiter=',',
                     skip_header=1,
                     skip_footer=1)
name, x, y, a, b = zip(*data)
plt.plot(x, y, 'o-')
As per your comment, the data is currently an array containing tuples of the station name and the x and y data. Using zip with the * symbol assigns them back to individual variables which can then be used for plotting.
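If the names=['x', 'y'] argument from the question is kept, genfromtxt returns a structured array whose columns are accessed by field name, and those fields can be passed to plt.plot instead of the whole array. A minimal sketch, assuming a two-column file with one header line and one footer line; the in-memory sample and its values are invented for illustration:

```python
import io

import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, only so the sketch runs without a display
import matplotlib.pyplot as plt

# Stand-in for the real file: one header line, three data rows, one footer line
sample = io.StringIO("x,y\n1,2\n2,4\n3,6\nend,end\n")

data = np.genfromtxt(sample, delimiter=",", skip_header=1, skip_footer=1,
                     names=["x", "y"])

print(data.dtype.names)   # field names of the structured array

# Plot one field against the other instead of passing the whole array
plt.plot(data["x"], data["y"], "o-")
```

For a file with more columns, such as the one with a station name described above, usecols can be combined with names to pick out just the numeric columns.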
I'm a beginner at Python and matplotlib, but trying to learn! I would like to use matplotlib to plot some simple data from a CSV file containing dates with a frequency. The x-axis should show the dates and the y-axis the frequency. Example data from the CSV:
2011/12/15,5
2011/12/11,4
2011/12/19,2
I checked out "matplotlib.sf.net/examples", but it appears that all the test data there is downloaded via HTTP GET. I would really appreciate it if someone could guide me with some example code showing how to read in the data (presumably using the CSV reader) and display it in a chart.
Thank you!!
Maybe you are looking for something like:
import csv
import datetime as dt
import matplotlib.pyplot as plt
arch = 'C:\\Python26\\programas\\test.csv'
data = csv.reader(open(arch))
data = [(dt.datetime.strptime(item, "%Y/%m/%d"), float(value)) for item, value in data]
data.sort()
[x, y] = zip(*data)
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(x, y)
ax.grid(True)
fig.autofmt_xdate()
plt.show()
I've tried to keep my code as simple as possible and this is by no means elegant, but here you go:
import csv
import matplotlib.pyplot as plt

### Making test CSV file ###
data = [['2011/12/15,5'],['2011/12/11,4'],['2011/12/19,2'],['2011/12/16,3'],['2011/12/20,8'],['2011/12/14,4'],['2011/12/10,10'],['2011/12/9,7']]
# Text mode with newline='' is the Python 3 way; on Python 2 use 'wb' and 'rb' instead
with open('test.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for i in data:
        writer.writerow(i)

### Extract data from CSV ###
with open('test.csv', 'r', newline='') as n:
    reader = csv.reader(n)
    dates = []
    freq = []
    for row in reader:
        values = row[0].split(',')
        dates.append(values[0])
        freq.append(float(values[1]))  # convert to a number so the y-axis is numeric

### Do plot ###
false_x = list(range(len(dates)))
plt.plot(false_x, freq, 'o-')
plt.xticks(range(len(dates)), dates, rotation=45)
# plt.axis([xmin, xmax, ymin, ymax]) - sets axes limits on graph
plt.axis([-1, 8, 0, 11])
plt.show()
This makes: