I'm new to Python, so I hope you'll forgive my silly questions. I have read a dataset from Excel with pandas. The dataset consists of 3 functions (U22, U35, U55) that share the same index (called y/75).
Now I would like to "turn" the graph so that the index "y/75" goes on the y-axis instead of the x-axis, keeping all the functions in the same graph.
The code I've used is
import pandas as pd
import matplotlib.pyplot as plt

var = pd.read_excel('path.xlsx','SummarySheet', index_col=0)
norm_vel=var[['U22',"U35","U55"]]
norm_vel.plot(figsize=(10,10), grid=True)
But with this code I couldn't find a way to swap the axes. Then I tried a different approach: I managed to turn the graph, but I could only plot the functions one at a time rather than all in the same graph.
var = pd.read_excel('path.xlsx','SummarySheet', index_col=False)
norm_vel2=var[['y/75','U22',"U35","U55"]]
norm_vel2.plot(x='U22', y='y/75', figsize=(10,10), grid=True)
plt.title("Velocity profiles")
plt.xlabel("Normalized velocity")
plt.ylabel("y/75")
This gives the turned graph, but only for U22.
I am not very familiar with DataFrame plotting, and to be honest, I've been watching this question expecting that someone would give an obvious answer. But since no one has (for a question that is an hour old, it is already late for obvious answers), I can at least tell you how I would do it without the plot method of the DataFrame:
plt.figure(figsize=(10,10))
plt.grid(True)
plt.plot(var[['U22',"U35","U55"]], var['y/75'])
plt.title("Velocity profiles")
plt.xlabel("Normalized velocity")
plt.ylabel("y/75")
When you are used to matplotlib, where you can pass multiple series as both x and y, instinct says that the pandas plotting wrappers (which are just convenience functions that call matplotlib with the right parameters) should make it possible to just call
var.plot(x=['U22', 'U35', 'U55'], y='y/75')
Since after all,
var.plot(x='y/75', y=['U22', 'U35', 'U55'])
works as expected (3 lines: U22 vs y/75, U35 vs y/75, U55 vs y/75). So the first one should also have worked (3 lines: y/75 vs U22, y/75 vs U35, y/75 vs U55). But it doesn't. That is probably why the pandas documentation itself says that these matplotlib wrappers are still a work in progress.
So all you have to do is call the matplotlib function yourself. After all, pandas is not doing much more than that when you call the .plot method anyway.
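For a version that can be run without the Excel file, here is a minimal sketch of the same idea. The data are made up (simple power-law profiles), and the loop with one `label` per column is my own addition so that the legend shows which curve is which; it is not something the original code did.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to get an interactive window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the Excel sheet (made-up values, not the asker's data).
y = np.linspace(0.0, 1.0, 50)
var = pd.DataFrame({
    "y/75": y,
    "U22": y ** (1 / 7),
    "U35": y ** (1 / 5),
    "U55": y ** (1 / 3),
})

fig, ax = plt.subplots(figsize=(10, 10))
for column in ["U22", "U35", "U55"]:
    # Function values go on the x-axis, the shared coordinate on the y-axis.
    ax.plot(var[column], var["y/75"], label=column)

ax.grid(True)
ax.set_title("Velocity profiles")
ax.set_xlabel("Normalized velocity")
ax.set_ylabel("y/75")
ax.legend()
```

With an interactive backend, finish with `plt.show()` to display the figure.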
Related
When I build a scatterplot of this data, you can see that the one large value (462) completely swamps the plot, so that some of the other points can't even be seen.
Does anyone know of a specific way to normalize this data so that the small dots can still be seen, while maintaining a link between the size of the dot and the size of the value? I'm wondering whether either of these would make sense:
(1) Set a minimum value for the size a dot can be
(2) Do some normalization of the data somehow, but I guess the large data point will always be 462 compared to some of the other points with a value of 1.
I'm just wondering how other people get around this, so that they don't miss points on the plot that are actually there. Or is the most obvious answer simply not to scale the points by size, and to add a label with the value to each point instead?
You can clip() (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.clip.html) the values used for the size parameter.
Full solution below:
import pandas as pd
import numpy as np
import plotly.express as px
df = pd.DataFrame(
{"Class": np.linspace(-8, 4, 25), "Values": np.random.randint(1, 40, 25)}
).assign(Class=lambda d: "class_" + d["Class"].astype(str))
df.iloc[7, 1] = 462
px.scatter(df, x="Class", y="Values", size=df["Values"].clip(0, 50))
This isn't really a Python question so much as a question about plotting style. There are several ways to solve the issue in your case:
Split the data into equally sized categories and assign a color to each. Your legend would then look something like this:
0 - 1: color 1
2 - 20: color 2
...
The way to implement this is to split your data into the sets you want and plot a separate scatter plot for each set, each with a new color.
The second option, which is frequently used, is to use the log of the value for the bubble size. You would just have to point that out quite clearly in your legend.
The third option is to limit the marker size to an arbitrary value. I personally am not a big fan of this method, since it changes the information shown to a degree that the other alternatives don't, but if you add a data callout, it would still be legitimate.
These options should be fairly easy to implement in code. If you are having difficulties, feel free to post runnable sample code and we could implement an example as well.
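As an illustration of the second option, here is a rough sketch using matplotlib with randomly generated values and one outlier of 462. The scale factor of 40 and the use of `log1p` (so that a value of 1 still gets a visible dot) are arbitrary choices of mine, not part of the answer above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to get an interactive window
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
values = rng.integers(1, 40, size=25)  # made-up values between 1 and 39
values[7] = 462                        # the single outlier from the question

x = np.arange(len(values))
# Marker area grows with log(1 + value) instead of with the value itself.
sizes = 40 * np.log1p(values)

fig, ax = plt.subplots()
ax.scatter(x, values, s=sizes, alpha=0.6)
ax.set_xlabel("Class")
ax.set_ylabel("Values")
ax.set_title("Marker area proportional to log(1 + value)")
```

The outlier still gets the largest dot, but it is now less than ten times the area of the smallest one rather than several hundred times.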
I have a data set called "df" consisting of 5000 rows and 32 columns. I want to plot 16 graphs by using a for loop. There are two problems that I cannot overcome:
The plot does not show when using this code:
proef_numbers = [1,2,3,4,5]
def plot_results(df, proef_numbers, title):
    for proef in proef_numbers:
        for test in range(1,2,3,4,5):
            S_data = df[f"S_{proef}_{test}"][1:DATA_END_VALUES[proef-1][test-1]]
            F_data = df[f"F_{proef}_{test}"][1:DATA_END_VALUES[proef-1][test-1]]-F0
            plt.plot(S_data, F_data, label=f"Proef {proef} test {test}" )
            plt.xlabel('Time [s]')
            plt.ylabel('Force [N]')
            plt.title(f"Proef {proef}, test {test}")
            plt.legend()
            plt.show()
After this I tried something else and restructured my data set and I wanted to use the following for loop:
for i in range(1,17):
    plt.plot(df[i],df[i+16])
    plt.show()
Then I get the error:
KeyError: 1
For some reason, I cannot even print(df[1]) anymore. It will give me "KeyError: 1" also. As you have probably guessed by now I am very new to python.
There are a couple of issues with the code that could be causing problems.
First, the range function does not work the way you use it in the top code block. It is defined as range(start, stop, step), where the start number is included, the stop number is not, and the step is 1 by default. As written, the top code should not even run, because range accepts at most three arguments. To make it easier to follow, you could replace range(1,5) (range(1,2,3,4,5) in the code above) with [1,2,3,4], since a for statement can iterate over a list just as it can over a range object.
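A quick way to check this behaviour yourself is the following small sketch (the variable names are only for illustration):

```python
# range(start, stop, step): start is included, stop is excluded, step defaults to 1.
tests = list(range(1, 5))
print(tests)

# range() takes at most three arguments, so the call from the question fails.
error_message = None
try:
    range(1, 2, 3, 4, 5)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

The first print shows the numbers 1 to 4, and the second shows the TypeError message that Python raises for the five-argument call.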
Also, how are you calling the function? The code example you gave does not include the call, and if you never call the function, its body never executes. If you don't want to use a function, that is okay; the code then becomes the version below. The function just makes the code more flexible if you want to make different variations of plots.
proef_numbers = [1,2,3,4]
for proef in proef_numbers:
    for test in range(1,5):
        S_data = df[f"S_{proef}_{test}"][1:DATA_END_VALUES[proef-1][test-1]]
        F_data = df[f"F_{proef}_{test}"][1:DATA_END_VALUES[proef-1][test-1]]-F0
        plt.plot(S_data, F_data, label=f"Proef {proef} test {test}" )
        plt.xlabel('Time [s]')
        plt.ylabel('Force [N]')
        plt.title(f"Proef {proef}, test {test}")
        plt.legend()
        plt.show()
I tested it with dummy data from your other question, and it seems to work.
For your other question, it seems that you want to index columns by number, right? You can use .iloc on your pandas DataFrame to locate by position (instead of by column name). So you would change the second block of code to this:
for i in range(1,17):
    plt.plot(df.iloc[:,i],df.iloc[:,i+16])
    plt.show()
Here, df.iloc[:,i] means that you are looking at all the rows (used by itself, : means all of the elements) and at the ith column. Keep in mind that Python is zero-indexed, so the first column is 0. With 32 columns the valid positions are 0 to 31, so you should change range(1,17) to range(0,16), or simply range(16) since range defaults to a start value of 0; otherwise i+16 reaches 32 on the last pass, which is out of bounds.
I would highly recommend against locating by index though. If you have good column names, you should use those instead since it is more robust. When you select a column by name, you get exactly what you want. When you select by index, there could be a small chance of error if your columns get shuffled for some strange reason.
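To make the difference between the two kinds of lookup concrete, here is a small sketch with a made-up two-column frame. The column names follow the S_/F_ pattern from the question, but the values are invented.

```python
import pandas as pd

# Hypothetical frame with string column names, standing in for the asker's data.
df = pd.DataFrame({"S_1_1": [0.0, 0.1, 0.2], "F_1_1": [5.0, 5.5, 6.1]})

# df[1] looks for a column *labelled* 1; there is none, so pandas raises KeyError.
try:
    df[1]
    lookup_failed = False
except KeyError:
    lookup_failed = True

by_position = df.iloc[:, 1]  # all rows, second column (positions start at 0)
by_name = df["F_1_1"]        # the same column, selected by its label

print(lookup_failed, by_position.equals(by_name))
```

This is most likely what happened with `KeyError: 1` in the question: the columns have names, so the integer 1 is not one of the labels.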
If you want to see multiple plots at the same time, like a grid of plots, I suggest looking at using subplots:
https://matplotlib.org/stable/gallery/subplots_axes_and_figures/subplots_demo.html
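For the 16 graphs from the question, a grid could look roughly like the sketch below. It assumes, as the question's second loop does, that columns 0 to 15 hold the x data and columns 16 to 31 the matching y data; the data themselves are random placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to get an interactive window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Placeholder data: 32 columns, assumed layout x0..x15 followed by y0..y15.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 32)).cumsum(axis=0))

fig, axes = plt.subplots(4, 4, figsize=(12, 12))
for i, ax in enumerate(axes.flat):
    # One small plot per pair of columns.
    ax.plot(df.iloc[:, i], df.iloc[:, i + 16])
    ax.set_title(f"Graph {i + 1}")
fig.tight_layout()
```

A single `plt.show()` at the end then displays all 16 panels in one window.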
For indexing your DataFrame by label you should use the .loc method. Have a look at:
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html
Since you are new to python, I would suggest learning using NumPy arrays. You can convert your dataframe directly to a NumPy array then plot slices of it.
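A minimal sketch of that conversion, using an invented four-column frame (two x series followed by two y series):

```python
import numpy as np
import pandas as pd

# Hypothetical frame; the names and values are made up for illustration.
df = pd.DataFrame({
    "S_1": [0.0, 1.0, 2.0],
    "S_2": [0.0, 2.0, 4.0],
    "F_1": [5.0, 6.0, 7.0],
    "F_2": [1.0, 1.5, 2.5],
})

data = df.to_numpy()     # plain 2-D array with shape (rows, columns)
x_block = data[:, :2]    # all rows, first two columns
y_block = data[:, 2:]    # all rows, remaining columns
first_rows = data[:2, 0] # first two rows of the first column

print(data.shape, x_block.shape, y_block.shape, first_rows)
```

Slices such as `x_block[:, 0]` and `y_block[:, 0]` can be passed straight to `plt.plot`. Note that the column names are lost in the array, so you have to keep track of the positions yourself.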
I am coming from a background of C# programming to learn python, and trying to wrap my head around how things work.
There seems to be a lot of "magic" in getting the result you want in python.
For example, take the following code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
returns = pd.DataFrame(np.random.normal(1.0, 0.03, (100, 10)))
prices = returns.cumprod()
prices.plot()
plt.title('Randomly-generated Prices')
plt.xlabel('Time')
plt.ylabel('Price')
plt.legend(loc=0);
This produces a nifty line plot, very nice.
So we pass the output of numpy.random.normal(...) into pandas.DataFrame(...) to get, according to the docstring, the following assigned to the variable returns:
Two-dimensional size-mutable, potentially heterogeneous tabular data
structure with labeled axes (rows and columns)
Then somehow that data structure contains a method returns.cumprod() which is called to get, according to the docstring, the following assigned to the variable prices:
Return cumulative cumprod over requested axis.
The next four lines of code, (how good would it be if notebook had line numbers), are calling methods of the matplotlib.pyplot lib.
So my question is:
As these are completely separate objects in separate namespaces, how does matplotlib.pyplot know anything about what pandas.DataFrame(...).cumprod(...).plot() just did?
In C#, one object in one namespace knows nothing about another object in a separate namespace unless it has been told explicitly, via inputs to its methods.
So I'm just trying to get my head around how this all works, so that I can build things with a good design instead of copying and pasting stuff I find on Google that looks like it gives the output I want.
For example, how does scope work with the above code? Let's say I do two plots: how does it know which one to assign the title to?
Thanks for your time/patience,
Kind Regards.
When you run, for example, plt.legend(), it uses the function plt.gca() (get current axes), and since the pandas plot was the last axes plotted, it knows where to put the legend. If you do, for example:
pd.DataFrame({'x': [1,2], 'y': [2,3]}).plot.line()
pd.DataFrame({'x': [1,2], 'y': [2,43]}).plot.line()
plt.legend([1])
the legend is drawn on the second axes, since that is the current one.
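If you would rather be explicit, which may feel more natural coming from C#, you can keep the Axes object that the pandas plot call returns and call methods on it directly. A small sketch (the titles are just placeholders):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to get an interactive window
import matplotlib.pyplot as plt
import pandas as pd

# Each pandas plot call returns the matplotlib Axes it drew on.
ax1 = pd.DataFrame({"x": [1, 2], "y": [2, 3]}).plot.line()
ax2 = pd.DataFrame({"x": [1, 2], "y": [2, 43]}).plot.line()

# pyplot-level functions act on the current axes, here the most recent plot.
plt.title("second plot")

# Methods on a specific Axes object do not depend on which axes is current.
ax1.set_title("first plot")

print(plt.gca() is ax2)
```

So the "magic" is a piece of module-level state inside pyplot that remembers the current figure and axes; holding on to `ax1` and `ax2` lets you avoid relying on it.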
I am using Python 2.7 and need to draw a time series using matplotlib library. My y axis data is numeric and everything is ok with it.
The problem is my x axis data, which is not numeric, and matplotlib does not cooperate in this case. It does not draw the time series, even though this should not affect the correctness of the plot: the x axis data is arranged in a given order anyway, and its order does not affect anything logically.
For example let's say the x data is ["i","like","python"] and the y axis data is [1,2,3].
I did not add my code because I've found that the code is ok, it works if I change the data to all numeric data.
Please explain how I can use matplotlib to draw the time series without having to convert the x values to numbers.
I've based my matplotlib code on following answers: How to plot Time Series using matplotlib Python, Time Series Plot Python.
Matplotlib requires some way of positioning those labels. See the following example:
import matplotlib.pyplot as plt
x = ["i","like","python"]
y = [1,2,3]
plt.plot(y,y) # y,y because both are numeric (you could instead create xt = [1,2,3])
plt.xticks(y,x) # same here; the second argument is the list of labels
plt.show()
Notice how I've put the labels there, but had to say where they are supposed to be.
I also think you should post part of your code, so that it's easier for other people to make suggestions.
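A slightly more general sketch of the same idea uses the position of each label in the list as its x value, so that the y data does not have to double as the x positions. The name `positions` is my own; newer matplotlib releases can also plot string x values directly as categories, but this version does not rely on that.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to get an interactive window
import matplotlib.pyplot as plt

x = ["i", "like", "python"]
y = [1, 2, 3]

# Use 0, 1, 2, ... as numeric x positions, one per label.
positions = list(range(len(x)))

fig, ax = plt.subplots()
ax.plot(positions, y)
ax.set_xticks(positions)  # where the ticks go
ax.set_xticklabels(x)     # what text is shown at each tick
```

The order of the list determines the order along the axis, which matches the requirement in the question.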
basically I want to graph two functions
g1 = x*cos(x*pi)
g2 = 1 - 0.6x^2
and then plot their intersections. I already have a module that takes inputs close to the intersections of the two curves and then converges to those points (there are four of them),
but I want to graph these two functions and their intersections using matplotlib and have no clue how. I've only graphed basic functions. Any help is greatly appreciated.
Assuming you can get as far as plotting one function, with x and g1 as numpy arrays,
pylab.plot(x,g1)
just call plot again (and again) to draw any number of separate curves:
pylab.plot(x,g2)
finally display or save to a file:
pylab.show()
To indicate a special point such as an intersection, just pass in scalars for x and y and ask for a marker such as 'x' or 'o' or whatever else you like.
pylab.plot(x_intersect, y_intersect, 'x', color="#80C0FF")
Alternatively, I often mark a special place along x with a vertical segment by plotting a quick little two-point data set:
pylab.plot( [x_special, x_special], [0.5, 1.9], '-b' )
I may hardcode the y values to look good on a plot for my current project, but obviously this is not reusable for other projects. Note that plot() can take ordinary python lists; no need to convert to numpy arrays.
If you can't get as far as plotting one function (just g1), then you need a basic tutorial on matplotlib, which wouldn't make a good answer here, but please go visit http://matplotlib.org/ and google "matplotlib tutorial" or "matplotlib introduction".
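Putting the pieces together for the two functions from the question, a rough end-to-end sketch might look like this. The plotting range of -3 to 3 is my assumption, the `bisect` helper is only a hypothetical stand-in for the asker's own intersection module, and I use `matplotlib.pyplot` rather than `pylab`; the plotting calls are the same.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to get an interactive window
import matplotlib.pyplot as plt
import numpy as np


def g1(x):
    return x * np.cos(np.pi * x)


def g2(x):
    return 1 - 0.6 * x**2


def bisect(f, lo, hi, tol=1e-10):
    """Hypothetical stand-in for the asker's root finder: plain bisection."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_lo < 0) == (f_mid < 0):
            lo, f_lo = mid, f_mid  # root lies in the upper half
        else:
            hi = mid               # root lies in the lower half
    return 0.5 * (lo + hi)


x = np.linspace(-3, 3, 601)  # assumed range, chosen to cover the crossings
diff = g1(x) - g2(x)

# Neighbouring samples where the difference changes sign bracket an intersection.
brackets = np.nonzero(np.sign(diff[:-1]) != np.sign(diff[1:]))[0]
roots = [bisect(lambda t: g1(t) - g2(t), x[i], x[i + 1]) for i in brackets]

fig, ax = plt.subplots()
ax.plot(x, g1(x), label="g1 = x*cos(x*pi)")
ax.plot(x, g2(x), label="g2 = 1 - 0.6x^2")
ax.plot(roots, [g2(r) for r in roots], "o", color="#80C0FF", label="intersections")
ax.legend()
```

If you already have the intersection points from your own module, you can skip the bracketing and bisection and pass your x and y values to the last `ax.plot` call instead.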