I am using Python 2.7 and need to draw a time series with the matplotlib library. My y-axis data is numeric and works fine.
The problem is my x-axis data, which is not numeric, and matplotlib does not cooperate in this case. It does not draw the time series, even though non-numeric x values should not affect the correctness of the plot: the x-axis data is already arranged in a given order, and its order does not change anything logically.
For example let's say the x data is ["i","like","python"] and the y axis data is [1,2,3].
I did not include my code because I have confirmed that it is fine: it works if I change the data to all-numeric values.
Please explain how I can use matplotlib to draw the time series without having to convert the x values to numbers.
I've based my matplotlib code on the following answers: How to plot Time Series using matplotlib Python, Time Series Plot Python.
Matplotlib requires some way of positioning those labels. See the following example:
import matplotlib.pyplot as plt
x = ["i","like","python"]
y = [1,2,3]
plt.plot(y,y) # y,y because both are numeric (you could instead create xt = [1,2,3])
plt.xticks(y,x) # same here: the first argument gives the positions, the second the labels
plt.show()
The result is a line plot whose x ticks carry the labels i, like and python.
Notice how I've put the labels there but had to somehow say where they are supposed to be.
I also think you should post part of your code so that it's easier for other people to suggest improvements.
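As a minimal sketch of the same idea with the positions made explicit (the names `positions` and `tick_texts` are illustrative and not from the question): the strings are never plotted themselves, they are only attached as labels to integer positions. The `Agg` backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt

x_labels = ["i", "like", "python"]
y = [1, 2, 3]

# One integer position per label, in the order the labels were given
positions = list(range(len(x_labels)))

fig, ax = plt.subplots()
ax.plot(positions, y)          # plot against the numeric positions
ax.set_xticks(positions)       # put a tick at every position
ax.set_xticklabels(x_labels)   # and show the original strings there

fig.canvas.draw()
tick_texts = [t.get_text() for t in ax.get_xticklabels()]
```

As far as I know, newer matplotlib releases (2.1 and later) also accept a list of strings directly as the x argument and treat it as categorical data, which does this positioning for you.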
I'm new to Python, so I hope you'll forgive my silly questions. I have read a dataset from Excel with pandas. The dataset consists of 3 functions (U22, U35, U55) that share the same index (called y/75). [image]
Now I would like to "turn" the graph so that the index "y/75" goes on the y-axis instead of the x-axis, keeping all the functions in the same graph. The result I want to obtain looks like the following picture: [image]
The code I've used is:
import pandas as pd
import matplotlib.pyplot as plt
var = pd.read_excel('path.xlsx', sheet_name='SummarySheet', index_col=0)
norm_vel=var[['U22',"U35","U55"]]
norm_vel.plot(figsize=(10,10), grid=True)
With this code, however, I couldn't find a way to swap the axes. I then tried a different approach that does turn the graph, but I could only plot the functions one at a time rather than all in the same graph:
var = pd.read_excel('path.xlsx', sheet_name='SummarySheet', index_col=False)
norm_vel2=var[['y/75','U22',"U35","U55"]]
norm_vel2.plot( x='U22', y='y/75', figsize=(10,10), grid=True )
plt.title("Velocity profiles")
plt.xlabel("Normalized velocity")
plt.ylabel("y/75")
obtaining this: [image]
I am not very familiar with DataFrame plotting. To be honest, I've been watching this question expecting that someone would give an obvious answer. But since no one has (an hour-old question is already late for obvious answers), I can at least tell you how I would do it, without the plot method of the DataFrame:
plt.figure(figsize=(10,10))
plt.grid(True)
plt.plot(var[['U22',"U35","U55"]], var['y/75'])
plt.title("Velocity profiles")
plt.xlabel("Normalized velocity")
plt.ylabel("y/75")
When you are used to matplotlib, in which you can pass multiple series as both x and y, instinct says that the pandas plotting wrappers (which are just convenience functions that call matplotlib with the correct parameters) should make it possible to just call
var.plot(x=['U22', 'U35', 'U55'], y='y/75')
Since after all,
var.plot(x='y/75', y=['U22', 'U35', 'U55'])
works as expected (3 lines: U22 vs y/75, U35 vs y/75, U55 vs y/75). So the first one should also have worked (3 lines: y/75 vs U22, y/75 vs U35, y/75 vs U55). But it doesn't. That is probably why the pandas documentation itself says that these matplotlib wrappers are still a work in progress.
So all you have to do is call the matplotlib function yourself. After all, pandas is not doing much more than that when you call its .plot method anyway.
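A minimal sketch of that direct matplotlib call, drawing one line per column so that it does not depend on how a 2-D x argument is handled. The DataFrame contents below are made-up stand-in values, since the Excel sheet from the question is not available; only the column names come from the question.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the data read from 'path.xlsx'
heights = np.linspace(0.0, 1.0, 5)
var = pd.DataFrame({
    "y/75": heights,
    "U22": heights ** 0.5,
    "U35": heights ** 0.4,
    "U55": heights ** 0.3,
})

fig, ax = plt.subplots(figsize=(10, 10))
for column in ["U22", "U35", "U55"]:
    # velocity on the x axis, y/75 on the y axis, one line per function
    ax.plot(var[column], var["y/75"], label=column)

ax.grid(True)
ax.set_title("Velocity profiles")
ax.set_xlabel("Normalized velocity")
ax.set_ylabel("y/75")
ax.legend()
```

The loop also gives each line its own legend entry, which the single plt.plot call does not do by default.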
I am using bqplot to create a live line graph in a Jupyter notebook served with Voila.
from bqplot import pyplot as plt2
import datetime
x_values = [] #array of datetimes
y_values = [] #array of 10+ digit numbers
plt2.show()
def functionThatIsCalledRepeatedly(x_val, y_val):
    x_values.append(x_val)
    y_values.append(y_val)
    plt2.plot(x_values, y_values)
Part of the Resulting Plot
My question is: how do I remove the scientific notation from the y-axis? It seems like a simple task, but I have tried a lot of things.
I tried using the axes.tick_format property of the graph, but I think that only works if you have axes objects, which I cannot have because they require the mandatory Scale property. I cannot use that because the graph is live and the x and y scales need to be generated and recalibrated while it runs.
I tried changing y_values.append(y_val) to y_values.append("{:.2f}".format(y_val)), but that converts the value to a string, and bqplot doesn't process it as a number, so negative numbers sometimes end up above 0.
I tried converting to a numpy array and then calling np.set_printoptions(suppress=True), which (obviously) didn't work.
Basically, I have tried a lot of things, and I think it comes down to some bqplot property that may or may not exist. I have been stuck for a while. Thank you!
You can provide axes options with the tick format you want to the plot method:
plt2.plot(x_values, y_values, axes_options={
    'y': dict(tick_format='0.2f')  # dict keys must be strings; y=... is not valid inside braces
})
You can see examples of this axes_options (using a scatter plot, but that should work the same) in this notebook: https://github.com/bqplot/bqplot/blob/master/examples/Marks/Pyplot/Scatter.ipynb
Is there a way to obtain specific coordinates from a plot even when they weren't used in the plotting process?
For example: can I extract the value at x=0.5 from the plot below? (This is just an easy example; I want to use it for more complicated ones too.)
import matplotlib.pyplot as plt
x=[0,1]
y=[1,2]
plt.plot(x,y)
Theoretically you can do something like this, but since the equation itself isn't something you can extract, you're limited in precision.
import seaborn as sns
x=[0,1]
y=[1,2]
p = sns.regplot(x=x,y=y, ci=None)
line = dict(zip(p.get_lines()[0].get_xdata().round(1),p.get_lines()[0].get_ydata().round(1)))
print(line[0.5])
Output
1.5
No. This is not a limitation of Python. If your graph conforms precisely to a function, as in your example, you can obviously use the function to interpolate. If it does not, it wouldn't make sense to read intermediate data from your plot. For instance, you shouldn't try to read the y value for x = 4 from the graph below. In these cases, you need to resort to linear or non-linear curve fitting as suggested by the other responders.
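For the straight-line example from the question, a minimal sketch of the linear-interpolation route, assuming the data behind the plot really is piecewise linear between the plotted points. It reads the x and y data back from the Line2D object and passes them to `np.interp`, which expects the x values to be increasing.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

x = [0, 1]
y = [1, 2]
(line,) = plt.plot(x, y)

# Linear interpolation between the plotted points, evaluated at x = 0.5
value = np.interp(0.5, line.get_xdata(), line.get_ydata())
print(value)  # 1.5
```

This only interpolates between the points that were plotted; it says nothing about data that does not follow straight segments between them.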
I am coming from a C# programming background to learn Python, and I am trying to wrap my head around how things work.
There seems to be a lot of "magic" in getting the result you want in Python.
For example, take the following code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
returns = pd.DataFrame(np.random.normal(1.0, 0.03, (100, 10)))
prices = returns.cumprod()
prices.plot()
plt.title('Randomly-generated Prices')
plt.xlabel('Time')
plt.ylabel('Price')
plt.legend(loc=0);
This produces a nifty line plot, very nice.
So we pass the output of numpy.random.normal(...) into pandas.DataFrame(...) to get, according to the docstring, the following assigned to the variable returns:
Two-dimensional size-mutable, potentially heterogeneous tabular data
structure with labeled axes (rows and columns)
Then somehow that data structure contains a method returns.cumprod() which is called to get, according to the docstring, the following assigned to the variable prices:
Return cumulative cumprod over requested axis.
The next four lines of code, (how good would it be if notebook had line numbers), are calling methods of the matplotlib.pyplot lib.
So my question is:
As these are completely separate objects in separate namespaces, how does matplotlib.pyplot know anything about what pandas.DataFrame(...).cumprod(...).plot() just did?
In C#, one object in one namespace knows nothing about another object in a separate namespace unless it has been explicitly told, via inputs to its methods.
So I am just trying to get my head around how this all works, so that I can build things with a good design instead of just copying and pasting code I find on Google that looks like it produces the output I want.
For instance, how does scope work with the above code? Let's say I do two plots: how does it know which one to assign the title to?
Thanks for your time/patience,
Kind Regards.
When you run, for example, plt.legend(), it uses the function plt.gca() (get current axes), and since the pandas plot was the last axes plotted, it knows where to put the legend. If you do, for example:
pd.DataFrame({'x': [1,2], 'y': [2,3]}).plot.line()
pd.DataFrame({'x': [1,2], 'y': [2,43]}).plot.line()
plt.legend([1])
It will put the legend on the second axes, since that was the current axes.
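A minimal sketch of that "current axes" behaviour, reusing the two small DataFrames from above. It relies on the pandas plot call returning the matplotlib Axes it drew on, so the same object can be addressed either implicitly through pyplot or explicitly by name (the titles "first" and "second" are made up for illustration).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Each call draws on a new figure and returns its Axes object
ax1 = pd.DataFrame({"x": [1, 2], "y": [2, 3]}).plot.line()
ax2 = pd.DataFrame({"x": [1, 2], "y": [2, 43]}).plot.line()

# pyplot functions act on whatever plt.gca() returns, here the latest axes
current_is_second = plt.gca() is ax2
plt.title("second")

# Keeping the returned Axes lets you address an earlier plot explicitly
ax1.set_title("first")
```

Holding on to the returned Axes objects is closer to the explicit style a C# programmer might expect, because nothing depends on the hidden "current" state.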
Consider the following code
import matplotlib.pyplot as plt
import numpy as np
time=np.arange(-100,100,1)
val =np.sin(time/10.)
time=-1.0*time
plt.figure()
plt.plot(time,val)
plt.xlim([70,-70])
plt.savefig('test.pdf')
When I open the PDF in Inkscape, I can select (with F2) the entire data; it is just invisible outside of the specified xlim interval.
The problem seems to be the line
time=-1.0*time
If I omit this line, everything works perfectly. I have no idea why this is. I often need such transformations because I deal with paleo-climate data, which are sometimes given in years B.C. and sometimes in years A.D.
The problem I see with this behavior is that someone could in principle get the data outside the range which I want to show.
Does anyone have a clue how to solve this problem (other than slicing the arrays before plotting)?
I use matplotlib 1.1.1rc2.
You can mask your array when plotting according to the limits you choose. Yes, this also requires changes to the code, but maybe not as extensive as you might fear. Here's an updated version of your example:
import matplotlib.pyplot as plt
import numpy as np
time=np.arange(-100,100,1)
val =np.sin(time/10.)
time=-1.0*time
plt.figure()
# store the x-limits in variables for easy multi-use
XMIN = -70.0
XMAX = 70.0
plt.plot(np.ma.masked_outside(time,XMIN,XMAX),val)
plt.xlim([XMIN,XMAX])
plt.savefig('test.pdf')
The key change is using np.ma.masked_outside for your x-axis value (note: the order of XMIN and XMAX in the mask-command is not important).
That way, you don't have to change the array time if you wanted to use other parts of it later on.
When I checked with inkscape, no data outside of the plot was highlighted.
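To see what the mask does without opening the PDF, here is a small sketch that applies np.ma.masked_outside to the same time array and counts what survives (the names `masked_time` and `kept` are illustrative). Values inside the closed interval are kept, everything else is masked, and as far as I know matplotlib skips masked points when drawing a line, which matches the Inkscape check above.

```python
import numpy as np

time = -1.0 * np.arange(-100, 100, 1)   # runs from 100.0 down to -99.0

XMIN = -70.0
XMAX = 70.0
masked_time = np.ma.masked_outside(time, XMIN, XMAX)

# compressed() returns only the unmasked values as a plain array
kept = masked_time.compressed()
print(masked_time.count(), kept.min(), kept.max())  # 141 -70.0 70.0
```

The original array `time` is left untouched, so other parts of it remain available later on.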