I have a data set called "df" consisting of 5000 rows and 32 columns. I want to plot 16 graphs by using a for loop. There are two problems that I cannot overcome:
The plot does not show when using this code:
proef_numbers = [1,2,3,4,5]
def plot_results(df, proef_numbers, title):
    for proef in proef_numbers:
        for test in range(1,2,3,4,5):
            S_data = df[f"S_{proef}_{test}"][1:DATA_END_VALUES[proef-1][test-1]]
            F_data = df[f"F_{proef}_{test}"][1:DATA_END_VALUES[proef-1][test-1]]-F0
            plt.plot(S_data, F_data, label=f"Proef {proef} test {test}" )
            plt.xlabel('Time [s]')
            plt.ylabel('Force [N]')
            plt.title(f"Proef {proef}, test {test}")
            plt.legend()
            plt.show()
After this I tried something else and restructured my data set and I wanted to use the following for loop:
for i in range(1,17):
    plt.plot(df[i],df[i+16])
    plt.show()
Then I get the error:
KeyError: 1
For some reason, I cannot even print(df[1]) anymore. It will give me "KeyError: 1" also. As you have probably guessed by now I am very new to python.
There are a couple of things in the code that could be causing trouble.
First, range does not work the way you use it in the top code block. Its signature is range(start, stop, step): start is included, stop is excluded, and step defaults to 1. range(1,2,3,4,5) passes five arguments, so it raises a TypeError as soon as the function is called. Use range(1,5) instead, or, if it is easier to read, the list [1,2,3,4], since a for statement can iterate over a list just like it can over a range object.
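As a quick illustration of the range behaviour described above, here is a minimal sketch that only uses built-ins (the name range_error is just for the demonstration):

```python
# range(start, stop) includes start and excludes stop.
print(list(range(1, 5)))      # [1, 2, 3, 4]

# An optional third argument sets the step.
print(list(range(1, 10, 3)))  # [1, 4, 7]

# range accepts at most three arguments, so five arguments fail.
range_error = None
try:
    range(1, 2, 3, 4, 5)
except TypeError as err:
    range_error = err
    print("TypeError:", err)
```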
Also, how are you calling the function? The code example you gave does not include a call to it, and if you never call the function, its body never runs. If you don't want to use a function, that is fine; the code then looks like the version below. The function just makes the code more flexible if you want to make different variations of plots.
proef_numbers = [1,2,3,4]
for proef in proef_numbers:
    for test in range(1,5):
        S_data = df[f"S_{proef}_{test}"][1:DATA_END_VALUES[proef-1][test-1]]
        F_data = df[f"F_{proef}_{test}"][1:DATA_END_VALUES[proef-1][test-1]]-F0
        plt.plot(S_data, F_data, label=f"Proef {proef} test {test}" )
        plt.xlabel('Time [s]')
        plt.ylabel('Force [N]')
        plt.title(f"Proef {proef}, test {test}")
        plt.legend()
        plt.show()
I tested it with dummy data from your other question, and it seems to work.
For your other question, it seems that you want to try to index columns by number, right? As this question shows, you can use .iloc for your pandas dataframe to locate by index (instead of column name). So you will change the second block of code to this:
for i in range(1,17):
    plt.plot(df.iloc[:,i],df.iloc[:,i+16])
    plt.show()
Here df.iloc[:,i] means that you are looking at all the rows (used by itself, : means all of the elements) and at the column in position i. Keep in mind that Python is zero-indexed, so the first column is 0. If your frame has exactly 32 columns, range(1,17) would also make i+16 reach 32, which is out of bounds, so you will want to change range(1,17) to range(0,16), or simply range(16), since range defaults to a start value of 0.
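To make the difference between label lookup and position lookup concrete, here is a small sketch on a dummy frame. The frame df_demo and its column names col_0 ... col_31 are made up for the example; your real column names will differ.

```python
import numpy as np
import pandas as pd

# Dummy frame: 5 rows, 32 columns with hypothetical names col_0 ... col_31.
df_demo = pd.DataFrame(
    np.arange(5 * 32).reshape(5, 32),
    columns=[f"col_{k}" for k in range(32)],
)

# Label-based lookup fails because no column is literally named 1.
try:
    df_demo[1]
except KeyError as err:
    print("KeyError:", err)

# Position-based lookup works; position 0 is the first column.
first = df_demo.iloc[:, 0]
paired = df_demo.iloc[:, 0 + 16]
print(first.name, paired.name)  # col_0 col_16
```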
I would highly recommend against locating by index though. If you have good column names, you should use those instead since it is more robust. When you select a column by name, you get exactly what you want. When you select by index, there could be a small chance of error if your columns get shuffled for some strange reason.
If you want to see multiple plots at the same time, like a grid of plots, I suggest looking at subplots:
https://matplotlib.org/stable/gallery/subplots_axes_and_figures/subplots_demo.html
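A minimal sketch of such a grid, assuming you want 16 panels in a 4 x 4 layout. The sine curves are placeholders for your own data, and the Agg backend is only selected so the snippet runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 1, 50)

# plt.subplots returns the figure and a 2D array of axes.
fig, axes = plt.subplots(4, 4, figsize=(12, 12), sharex=True)

# flatten() turns the 4 x 4 array into 16 axes that are easy to loop over.
for k, ax in enumerate(axes.flatten()):
    ax.plot(x, np.sin((k + 1) * x))  # placeholder curve
    ax.set_title(f"Plot {k + 1}")

fig.tight_layout()  # call plt.show() afterwards in an interactive session
```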
For indexing your dataframe by label you should use the .loc method. Have a look at:
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html
Since you are new to Python, I would suggest learning to use NumPy arrays. You can convert your dataframe directly to a NumPy array and then plot slices of it.
I'm new to Python, so I hope you'll forgive my silly questions. I have read a dataset from Excel with pandas. The dataset is composed of 3 functions (U22, U35, U55) that share the same index (called y/75).
Now I would like to "turn" the graph so that the index "y/75" goes on the y-axis instead of the x-axis, keeping all the functions in the same graph.
the code I've used is
var = pd.read_excel('path.xlsx','SummarySheet', index_col=0)
norm_vel=var[['U22',"U35","U55"]]
norm_vel.plot(figsize=(10,10), grid='true')
But with this code I couldn't find a way to swap the axes. Then I tried a different approach: I managed to turn the graph, but I could only add the functions one at a time rather than all in the same graph.
var = pd.read_excel('path.xlsx','SummarySheet', index_col=False)
norm_vel2=var[['y/75','U22',"U35","U55"]]
norm_vel2.plot( x='U22', y='y/75', figsize=(10,10), grid='true' )
plt.title("Velocity profiles")
plt.xlabel("Normalized velocity")
plt.ylabel("y/75")
I am not very familiar with the dataframe plot method. To be honest, I've been watching this question, expecting that someone would give an obvious answer. But since no one has (for a question that is an hour old, it is already late for obvious answers), I can at least tell you how I would do it without the plot method of the dataframe:
plt.figure(figsize=(10,10))
plt.grid(True)
plt.plot(var[['U22',"U35","U55"]], var['y/75'])
plt.title("Velocity profiles")
plt.xlabel("Normalized velocity")
plt.ylabel("y/75")
When you are used to matplotlib, where you can pass multiple series for both x and y, instinct says that the pandas plotting wrappers (which are just convenience functions that call matplotlib with the right parameters) should let you simply call
var.plot(x=['U22', 'U35', 'U55'], y='y/75')
Since after all,
var.plot(x='y/75', y=['U22', 'U35', 'U55'])
works as expected (3 lines: U22 vs y/75, U35 vs y/75, U55 vs y/75). So the first one should also have worked (3 lines: y/75 vs U22, y/75 vs U35, y/75 vs U55). But it doesn't. That is probably why the pandas documentation itself says that these matplotlib connections are still a work in progress.
So all you have to do is call the matplotlib function yourself. After all, pandas is not doing much more than that when you call its .plot method.
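If you also want a legend entry per curve, one option is to loop over the columns and call matplotlib once per column. This is only a sketch: var_demo and its values are made up to stand in for the Excel sheet, and the Agg backend is selected just so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Dummy stand-in for the data read from Excel (values are invented).
y = np.linspace(0, 1, 20)
var_demo = pd.DataFrame({
    "y/75": y,
    "U22": y ** 0.5,
    "U35": y ** 0.4,
    "U55": y ** 0.3,
})

fig, ax = plt.subplots(figsize=(10, 10))
# One call per column: velocity on x, y/75 on y, column name as the label.
for col in ["U22", "U35", "U55"]:
    ax.plot(var_demo[col], var_demo["y/75"], label=col)

ax.set_title("Velocity profiles")
ax.set_xlabel("Normalized velocity")
ax.set_ylabel("y/75")
ax.grid(True)
ax.legend()
```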
I'm not great at matplotlib, but I need to use it for some work I am doing. I have a set of 9 columns of data, with around 100k lines. I want to produce a scatter plot, and I don't care about the rows; they're meaningless for my purposes. What I need is for the values to be plotted as a scatter against the column they are in, regardless of which row they are a part of.
This is all loaded in from a text file in a simple 2D array using numpy.loadtxt. It's just a set of numbers, so any substitution of random numbers should work. I'm just not sure how to manipulate it in a way that the scatter command will like. I often get it giving me errors like I'm giving it too few arguments, or it just iterates over the array (or arrays if I separate them), in ways I do not anticipate.
My first thought is that I could somehow break it down into a set of series by column, but I don't think the scatter command will take that. Any help would be very much appreciated.
The scatter function takes two sequences that have the same length. To access a single column n of your numpy array, just use data[:, n]. Since you want to plot every column against its column number, you also need to create a list with the same length as data that contains only the column number. To create the plot you want, just do the following:
for i in range(9):
    plt.scatter([i + 1] * len(data), data[:, i])
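With around 100k rows you may prefer a single scatter call instead of nine. Here is a sketch of that variant; it is an alternative to the loop above rather than a replacement, and the random data only stands in for what numpy.loadtxt returns:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 9))  # stand-in for the numpy.loadtxt result

n_rows, n_cols = data.shape
# x[i, j] == j + 1, so every value is paired with its 1-based column number.
x = np.tile(np.arange(1, n_cols + 1), (n_rows, 1))

fig, ax = plt.subplots()
# ravel() flattens both arrays in the same order, keeping the pairs aligned.
ax.scatter(x.ravel(), data.ravel(), s=2)
ax.set_xlabel("Column")
```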
I'm using iPython notebook's %matplotlib inline and I'm having trouble formatting my plot.
As you can see, my first and last data point aren't showing up the way the other data points are showing up. I'd like to have the error bars visible and have the graph be "zoomed out" a bit.
df.plot(yerr=df['std dev'],color='b', ecolor='r')
plt.title('SpO2 Mean with Std Dev')
plt.xlabel('Time (s)')
plt.ylabel('SpO2')
I assume I have to use
matplotlib.pyplot.xlim()
but I'm not sure how to use it properly if my x-axis is a DataFrame index composed of strings:
index = ['-3:0','0:3','3:6','6:9','9:12','12:15','15:18','18:21','21:24']
Any ideas? Thanks!
You can see the usage of xlim here. In this case, if you ran plt.xlim() you would get (0.0, 8.0). Because your index uses text rather than numbers, the values for xlim are simply the positions of the entries in your index. So you just need to widen the limits by however many steps left and right you want the graph to extend. For example:
plt.xlim(-1,len(df))
Hope that helps.
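For reference, here is a self-contained sketch of the same idea in plain matplotlib. It places the points at positions 0 to 8 by hand, which mirrors the positions described above for a text index; the mean and standard deviation values are invented.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

index = ['-3:0', '0:3', '3:6', '6:9', '9:12', '12:15', '15:18', '18:21', '21:24']
means = [97, 96, 95, 96, 97, 98, 97, 96, 95]  # made-up values
stds = [1.0] * len(index)                     # made-up values

positions = list(range(len(index)))           # 0, 1, ..., 8

fig, ax = plt.subplots()
ax.errorbar(positions, means, yerr=stds, color='b', ecolor='r')
ax.set_xticks(positions)
ax.set_xticklabels(index)

# One extra step on each side so the first and last error bars are visible.
ax.set_xlim(-1, len(index))
```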
In the graphic below, I want to put in a legend for the calendar plot. The calendar plot was made using ax.plot(...,label='a') and drawing rectangles in a 52x7 grid (52 weeks, 7 days per week).
The legend is currently made using:
plt.gca().legend(loc="upper right")
How do I correct this legend to something more like a colorbar? Also, the colorbar should be placed at the bottom of the plot.
EDIT:
Uploaded code and data for reproducing this here:
https://www.dropbox.com/sh/8xgyxybev3441go/AACKDiNFBqpsP1ZttsZLqIC4a?dl=0
Aside - existing bugs
The code you put on the dropbox doesn't work "out of the box". In particular - you're trying to divide a datetime.timedelta by a numpy.timedelta64 in two places and that fails.
You do your own normalisation and colour mapping (calling into color_list based on an int() conversion of your normalised value). You subtract 1 from this and you don't need to - you already floor the value by using int(). The result of doing this is that you can get an index of -1 which means your very smallest values are incorrectly mapped to the colour for the maximum value. This is most obvious if you plot column 'BIOM'.
I've hacked this by adding a tiny value (0.00001) to the total range of the values that you divide by. It's a hack - I'm not sure that this method of mapping is at all the best use of matplotlib, but that's a different question entirely.
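To show the off-by-one problem and the workaround in isolation, here is a small sketch. The function color_index and its parameters are hypothetical and are not taken from your script; they only reproduce the mapping described above.

```python
def color_index(value, min_val, max_val, n_colors, eps=1e-5):
    """Map a value in [min_val, max_val] to an index in 0 .. n_colors - 1."""
    # eps widens the range slightly so value == max_val stays below n_colors.
    normalised = (value - min_val) / (max_val - min_val + eps)
    # int() already floors a non-negative number, so no "- 1" is needed.
    return int(normalised * n_colors)

print(color_index(0.0, 0.0, 10.0, 9))   # 0
print(color_index(10.0, 0.0, 10.0, 9))  # 8

# With the extra "- 1", the smallest value wraps around to the last colour.
buggy_index = int(0.0 * 9) - 1
print(buggy_index)                      # -1
```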
Solution adapting your code
With those bugs fixed, and after adding one more subplot below all the existing ones (i.e. replacing 3 with 4 in all your calls to subplot2grid()), you can do the following:
Replace your
plt.gca().legend(loc="upper right")
with
# plot an overall colorbar type legend
# Grab the new axes object to plot the colorbar on
ax_colorbar = plt.subplot2grid((4,num_yrs), (3,0),rowspan=1,colspan=num_yrs)
mappableObject = matplotlib.cm.ScalarMappable(cmap = palettable.colorbrewer.sequential.BuPu_9.mpl_colormap)
mappableObject.set_array(numpy.array(df[col_name]))
col_bar = fig.colorbar(mappableObject, cax = ax_colorbar, orientation = 'horizontal', boundaries = numpy.arange(min_val,max_val,(max_val-min_val)/10))
# You can change the boundaries kwarg to either make the scale look less boxy (increase 10)
# or to get different values on the tick marks, or even omit it altogether to let
# matplotlib choose the boundaries itself
col_bar.set_label(col_name)
ax_colorbar.set_title(col_name + ' color mapping')
I tested this with two of your columns ('NMN' and 'BIOM') and on Python 2.7 (I assume you're using Python 2.x given the print statement syntax)
The finalised code that works directly with your data file is in a gist here
How does it work?
It creates a ScalarMappable object that matplotlib can use to map values to colors. It sets the array that this map is based on to all the values in the column you are dealing with. It then uses Figure.colorbar() to add the colorbar, passing in the mappable object so that the labels are correct. I've added boundaries so that the minimum value is shown explicitly; you can omit that if you want matplotlib to sort that out for itself.
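Stripped of everything specific to your script, the mechanism looks roughly like this. It is a sketch under assumptions: the values are invented, matplotlib's built-in "BuPu" colormap stands in for the palettable one, and the two-row layout is only there to give the colorbar its own axes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.cm
import matplotlib.colors
import matplotlib.pyplot as plt
import numpy as np

values = np.array([3.0, 7.5, 12.0, 20.0])  # made-up column values

# A tall axes for the plot and a thin one below it for the colorbar.
fig, (ax_main, ax_colorbar) = plt.subplots(
    2, 1, gridspec_kw={"height_ratios": [10, 1]}
)

# The norm maps data values to 0..1; the colormap maps 0..1 to colours.
norm = matplotlib.colors.Normalize(vmin=values.min(), vmax=values.max())
mappable = matplotlib.cm.ScalarMappable(norm=norm, cmap="BuPu")
mappable.set_array(values)

col_bar = fig.colorbar(mappable, cax=ax_colorbar, orientation="horizontal")
col_bar.set_label("value")
```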
P.S. I've set the colormap to palettable.colorbrewer.sequential.BuPu_9.mpl_colormap, matching your get_colors() function, which gets these colours as a 9-member list. I strongly recommend importing the colormap you want to use under a readable name, to make the use of mpl_colors and mpl_colormap easier to understand, e.g.
from palettable.colorbrewer.sequential import BuPu_9 as color_scale
Then access it as
color_scale.mpl_colormap
That way, you can keep your code DRY and change the colors with only one change.
Layout (in response to comments)
The colorbar may be a little big (certainly tall) to look ideal. There are a few ways to shrink it. I'll point you to two:
The "right" way to do it is probably to use a Gridspec
You could use your existing approach, but increase the number of rows and have the colorbar still in one row, while the other elements span more rows than they do currently.
I've implemented that with 9 rows, an extra column (so that the month labels don't get lost) and the colorbar on the bottom row, spanning two fewer columns than the main figure. I've also used tight_layout with w_pad=0.0 to avoid label clashes. You can play with this to get your exact preferred size. New code here.
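For the GridSpec route, a minimal sketch could look like the following. It is not the layout from the linked code; the three plot rows and the ratio values are assumptions chosen only to show how height_ratios keeps the colorbar row thin.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

fig = plt.figure(figsize=(10, 6))

# Three tall rows for the plots and one thin row for the colorbar.
gs = GridSpec(4, 1, figure=fig, height_ratios=[4, 4, 4, 0.5])

plot_axes = [fig.add_subplot(gs[row, 0]) for row in range(3)]
ax_colorbar = fig.add_subplot(gs[3, 0])  # pass this as cax to fig.colorbar
```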
There are functions to do this in matplotlib.colorbar. With some specific code from your example, I could give you a better answer, but you'll use something like:
myColorbar = matplotlib.colorbar.ColorbarBase(myAxes, cmap=myColorMap,
                                              norm=myNorm,
                                              orientation='vertical')
I am using a kinect to get position data, using the module pykinect.
The problem is that it returns position data at each time step, so the output looks like this:
output x_data:
...
(0.04)
(0.06)
(0.069)
(0.072)
(0.08)
(0.074)
(0.071)
So when I call x_data, it only returns the last value (in this case 0.071), so x_data is not a list or a tuple.
I need the position values as a list, so I can use them later.
Does anyone know how to take all the values of the output and save them into a single list? Because the value of x_data is changing due to time passing, the question is how can I save the values of the output in a list.
I don't know when you (can) call/read x_data, but if it is a loop or after a time interval you can use the following code:
data_x = [] # create empty list data_x (not to confuse with x_data)
# some other code
# run repeatedly:
data_x.append(x_data()) # I don't know if x_data is a function or a variable
# if it's a variable remove the inner parenthesis
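Here is a runnable version of that pattern. The function read_x_position is a hypothetical stand-in for however you obtain the current x value from pykinect; it just returns random numbers here.

```python
import random

def read_x_position():
    """Hypothetical stand-in for reading the current x value from the sensor."""
    return round(random.uniform(0.0, 0.1), 3)

data_x = []          # create the list once, before the loop
for _ in range(7):   # in practice: once per frame or time step
    data_x.append(read_x_position())

print(data_x)        # all seven values, not only the last one
```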
If you are processing real-time position data, you will probably end up with a lot of values, and doing numerical work on a plain Python list can become slow. You should consider using a pre-allocated numpy array instead. Without more details it is hard to know the best option for your application, however.
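A rough sketch of the pre-allocated approach, assuming you can put an upper bound on the number of samples. MAX_SAMPLES and the sample values are made up for the example.

```python
import numpy as np

MAX_SAMPLES = 10_000               # assumed upper bound on the sample count
positions = np.empty(MAX_SAMPLES)  # allocated once, up front
count = 0                          # number of slots filled so far

# In practice each sample would come from the sensor at every time step.
for sample in (0.04, 0.06, 0.069, 0.072):
    if count < MAX_SAMPLES:        # ignore samples once the buffer is full
        positions[count] = sample
        count += 1

recorded = positions[:count]       # view of the filled part only
print(recorded)
```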