Plot rolling mean together with data - python
I have a DataFrame that looks something like this:
####delays:
Worst case Avg case
2014-10-27 2.861433 0.953108
2014-10-28 2.899174 0.981917
2014-10-29 3.080738 1.030154
2014-10-30 2.298898 0.711107
2014-10-31 2.856278 0.998959
2014-11-01 3.118587 1.147104
...
I would like to plot the data of this DataFrame together with its rolling mean. The data itself should be a dotted line and the rolling mean a solid line. The worst case column should be in red, while the average case column should be in blue.
I've tried the following code:
import pandas as pd
import matplotlib.pyplot as plt
rolling = pd.rolling_mean(delays, 7)
delays.plot(x_compat=True, style='r--')
rolling.plot(style='r')
plt.title('Delays per day on entire network')
plt.xlabel('Date')
plt.ylabel('Minutes')
plt.show()
Unfortunately, this gives me 2 different plots. One with the data and one with the rolling mean. Also, the worst case column and average case column are both in red.
How can I get this to work?
You need to tell pandas which axes to plot on. By default, each call to plot() creates a new figure.
Just replace these 2 lines:
delays.plot(x_compat=True, style='r--')
rolling.plot(style='r')
with:
ax_delays = delays.plot(x_compat=True, style='--', color=["r","b"])
rolling.plot(color=["r","b"], ax=ax_delays, legend=0)
The second line now tells pandas to plot on ax_delays and not to show the legend again.
To get 2 different colors for the 2 lines, pass one color per column through the color argument (see above).
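For reference, here is a minimal end-to-end sketch of the same idea on a current pandas, where pd.rolling_mean() no longer exists and DataFrame.rolling(...).mean() is used instead. The delays frame below is made-up stand-in data (an assumption, not the asker's real data), and style=":" is used because the question asked for a dotted rather than a dashed line.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in for the `delays` frame from the question (hypothetical values).
idx = pd.date_range("2014-10-27", periods=30, freq="D")
rng = np.random.default_rng(0)
delays = pd.DataFrame(
    {
        "Worst case": 3 + 0.3 * rng.standard_normal(30),
        "Avg case": 1 + 0.1 * rng.standard_normal(30),
    },
    index=idx,
)

# DataFrame.rolling(...).mean() replaces the removed pd.rolling_mean().
rolling = delays.rolling(window=7).mean()

colors = ["r", "b"]  # one colour per column: worst case red, average case blue

# Raw data as dotted lines; plot() returns the Axes it drew on.
ax = delays.plot(x_compat=True, style=":", color=colors)
# Rolling mean as solid lines on the same Axes, without a second set of legend entries.
rolling.plot(ax=ax, x_compat=True, color=colors, legend=False)

ax.set_title("Delays per day on entire network")
ax.set_xlabel("Date")
ax.set_ylabel("Minutes")
```

Call plt.show() afterwards to display the figure. The first 6 rows of rolling are NaN because a 7-day window is not yet full there, so the solid lines start a week after the dotted ones.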
Related
How to iterate distance calculation for different vehicles from coordinates
I am new to coding and need help developing a Time Space Diagram (TSD) from a CSV file that I got as the result of a VISSIM simulation. A general TSD looks like this: [image: TSD], and my CSV looks like this: [image: CSV].
I want to use the following columns:
"VEHICLE:SIMSEC", the simulation time, as the X axis of the TSD;
"NO", the vehicle number (there are 185 different vehicles and I want to plot all 185 of them), with one line per vehicle on the TSD;
"COORDFRONTX" and "COORDFRONTY", the x and y coordinates in the simulation, as the positions that make up the Y axis of the TSD.
I have tried the following code but did not get the result I want.
import pandas as pd
import matplotlib.pyplot as mp
# take data
data = pd.read_csv(r"C:\Users\hk385\Desktop\VISSIM_DATA_CSV.csv")
df = pd.DataFrame(data, columns=["VEHICLE:SIMSEC", "NO", "DISTTRAVTOT"])
# plot the dataframe
df.plot(x="NO", y=["DISTTRAVTOT"], kind="scatter")
# print bar graph
mp.show()
The plot came out uninterpretable because there were too many dots. The diagram looks like this: [image: Time Space Diagram]. Would you be able to help me or guide me to get a TSD from the CSV I have?
As suggested by mitoRibo, the top 20 rows of the CSV are the following:
VEHICLE:SIMSEC,NO,LANE\LINK\NO,LANE\INDEX,POS,POSLAT,COORDFRONTX,COORDFRONTY,COORDREARX,COORDREARY,DISTTRAVTOT
5.9,1,1,1,2.51,0.5,-1.259,-3.518,-4.85,-1.319,8.42
6.0,1,1,1,10.94,0.5,0.932,-4.86,-2.659,-2.661,16.86
6.1,1,1,1,19.37,0.5,3.125,-6.203,-0.466,-4.004,25.29
6.2,1,1,1,27.82,0.5,5.319,-7.547,1.728,-5.348,33.73
6.3,1,1,1,36.26,0.5,7.515,-8.892,3.924,-6.693,42.18
6.4,1,1,1,44.72,0.5,9.713,-10.238,6.122,-8.039,50.64
6.5,1,1,1,53.18,0.5,11.912,-11.585,8.321,-9.386,59.1
6.6,1,1,1,61.65,0.5,14.112,-12.933,10.521,-10.734,67.56
6.7,1,1,1,70.12,0.5,16.314,-14.282,12.724,-12.082,76.04
6.8,1,1,1,78.6,0.5,18.518,-15.632,14.927,-13.432,84.51
6.9,1,1,1,87.08,0.5,20.723,-16.982,17.132,-14.783,93.0
7.0,1,1,1,95.57,0.5,22.93,-18.334,19.339,-16.135,101.49
7.1,1,1,1,104.07,0.5,25.138,-19.687,21.547,-17.487,109.99
7.2,1,1,1,112.57,0.5,27.348,-21.04,23.757,-18.841,118.49
7.3,1,1,1,121.08,0.5,29.56,-22.395,25.969,-20.195,127.0
7.4,1,1,1,129.59,0.5,31.773,-23.75,28.182,-21.551,135.51
7.5,1,1,1,138.11,0.5,33.987,-25.107,30.396,-22.907,144.03
7.6,1,1,1,146.64,0.5,36.203,-26.464,32.612,-24.264,152.56
7.7,1,1,1,155.17,0.5,38.421,-27.822,34.83,-25.623,161.09
Thank you.
You can use groupby to iterate through the different vehicles, adding each one to your plot. I changed your example data so that there are 2 different vehicles.
import pandas as pd
import io
import matplotlib.pyplot as plt
df = pd.read_csv(io.StringIO("""
VEHICLE:SIMSEC,NO,LANE_LINK_NO,LANE_INDEX,POS,POSLAT,COORDFRONTX,COORDFRONTY,COORDREARX,COORDREARY,DISTTRAVTOT
5.9,1,1,1,2.51,0.5,-1.259,-3.518,-4.85,-1.319,0
6.0,1,1,1,10.94,0.5,0.932,-4.86,-2.659,-2.661,16.86
6.1,1,1,1,19.37,0.5,3.125,-6.203,-0.466,-4.004,25.29
6.2,1,1,1,27.82,0.5,5.319,-7.547,1.728,-5.348,33.73
6.3,1,1,1,36.26,0.5,7.515,-8.892,3.924,-6.693,42.18
6.4,1,1,1,44.72,0.5,9.713,-10.238,6.122,-8.039,50.64
6.5,1,1,1,53.18,0.5,11.912,-11.585,8.321,-9.386,59.1
6.6,1,1,1,61.65,0.5,14.112,-12.933,10.521,-10.734,67.56
6.7,1,1,1,70.12,0.5,16.314,-14.282,12.724,-12.082,76.04
6.8,1,1,1,78.6,0.5,18.518,-15.632,14.927,-13.432,84.51
6.9,1,1,1,87.08,0.5,20.723,-16.982,17.132,-14.783,90
6.0,2,1,1,95.57,0.5,22.93,-18.334,19.339,-16.135,0
6.1,2,1,1,104.07,0.5,25.138,-19.687,21.547,-17.487,30
6.2,2,1,1,112.57,0.5,27.348,-21.04,23.757,-18.841,40
6.3,2,1,1,121.08,0.5,29.56,-22.395,25.969,-20.195,50
6.4,2,1,1,129.59,0.5,31.773,-23.75,28.182,-21.551,60
6.5,2,1,1,138.11,0.5,33.987,-25.107,30.396,-22.907,70
6.6,2,1,1,146.64,0.5,36.203,-26.464,32.612,-24.264,80
6.7,2,1,1,155.17,0.5,38.421,-27.822,34.83,-25.623,90
"""), sep=',')
fig = plt.figure()
# Iterate through each vehicle, adding it to the plot
for vehicle_no, vehicle_df in df.groupby('NO'):
    plt.plot(vehicle_df['VEHICLE:SIMSEC'], vehicle_df['DISTTRAVTOT'], label=vehicle_no)
plt.legend()  # comment this out if you don't want a legend
plt.show()
plt.close()
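If the position should come from COORDFRONTX and COORDFRONTY rather than from DISTTRAVTOT, one possible approach is to accumulate the straight-line step lengths per vehicle. The sketch below is an assumption about what "distance from coordinates" means here, the sample values are made up, and the STEP and DIST_FROM_COORDS column names are hypothetical helpers that are not part of the VISSIM output.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Tiny made-up sample in the CSV's column layout (hypothetical values).
df = pd.DataFrame({
    "VEHICLE:SIMSEC": [0.0, 0.1, 0.2, 0.1, 0.2, 0.3],
    "NO":             [1, 1, 1, 2, 2, 2],
    "COORDFRONTX":    [0.0, 3.0, 6.0, 0.0, 0.0, 0.0],
    "COORDFRONTY":    [0.0, 4.0, 8.0, 0.0, 2.0, 4.0],
})

# Make sure each vehicle's samples are in time order before differencing.
df = df.sort_values(["NO", "VEHICLE:SIMSEC"])

# Coordinate change between consecutive samples of the same vehicle;
# the first sample of each vehicle has no predecessor, so its step is 0.
dx = df.groupby("NO")["COORDFRONTX"].diff().fillna(0.0)
dy = df.groupby("NO")["COORDFRONTY"].diff().fillna(0.0)
df["STEP"] = np.hypot(dx, dy)

# Running total of the step lengths, restarted for every vehicle.
df["DIST_FROM_COORDS"] = df.groupby("NO")["STEP"].cumsum()

# One line per vehicle: time on the x axis, travelled distance on the y axis.
fig, ax = plt.subplots()
for vehicle_no, vehicle_df in df.groupby("NO"):
    ax.plot(vehicle_df["VEHICLE:SIMSEC"], vehicle_df["DIST_FROM_COORDS"],
            label=vehicle_no)
ax.set_xlabel("Simulation time (s)")
ax.set_ylabel("Distance travelled")
```

With 185 vehicles a legend would probably be too crowded to read, so it is left out here; call plt.show() to display the figure.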
If you don't mind, could you please try this:
mp.scatter(x="NO", y="DISTTRAVTOT", data=df)
If it still does not work, please attach your data so that I can test it on my side.
Set confidence intervals for error bars plot in matplotlib
I have this dataset:
mydf = pd.DataFrame({'Feature':['Pysch','Physio'],'log_or':[0.3126,0.2022], 'se':[0.0712,0.0568], 'conf_low':[0.1729,0.0907], 'conf_high':[0.4522, 0.3136]})
mydf = mydf.sort_values(by='log_or')
mydf
  Feature  log_or      se  conf_low  conf_high
1  Physio  0.2022  0.0568    0.0907     0.3136
0   Pysch  0.3126  0.0712    0.1729     0.4522
I want to create an error bar plot using my calculated confidence intervals in conf_low and conf_high. I tried this at first, but I can see that the error bars don't cover my calculated confidence intervals:
plt.errorbar(mydf['log_or'], mydf['Feature'], xerr=mydf['se'], marker='s', mfc='Tomato')
plt.show()
You can see that, for example, for the Physio variable the error bar goes from approximately 0.14 to 0.26 in the image, but my tabulated confidence intervals go from 0.091 to 0.316. So I tried to set up my custom intervals with this:
lowr = mydf['conf_low'].to_numpy()
uppr = mydf['conf_high'].to_numpy()
intervals = [lowr, uppr]
plt.errorbar(mydf['log_or'], mydf['Feature'], xerr=intervals, marker='s', mfc='Tomato')
plt.show()
Now the interval for my Physio variable goes from approximately 0.1 to 0.5, which is wrong. What am I doing wrong? How can I use my custom intervals in this plot?
I think you are misunderstanding what the values passed to xerr are meant to represent. Have a look at the plt.errorbar documentation (under xerr, yerr).
From your first attempt, xerr=mydf['se'] will be used as follows:
    shape(N,): Symmetric +/-values for each data point.
From your second attempt, xerr=intervals will be used as follows:
    shape(2, N): Separate - and + values for each bar. First row contains the lower errors, the second row contains the upper errors.
So the values you pass here are used as the length of the error bar on each side of the data point. However, your values in mydf.conf_low and mydf.conf_high do not represent lengths, they are simply x-values. As you mention for Physio: my tabulated confidence intervals go from 0.091 to 0.316.
The solution, then, is to calculate the length on both sides and pass those values to xerr. Like so:
import pandas as pd
import matplotlib.pyplot as plt
mydf = pd.DataFrame({'Feature':['Pysch','Physio'],'log_or':[0.3126,0.2022], 'se':[0.0712,0.0568], 'conf_low':[0.1729,0.0907], 'conf_high':[0.4522, 0.3136]})
mydf = mydf.sort_values(by='log_or')
mydf
plt.errorbar(mydf['log_or'], mydf['Feature'], xerr=((mydf.log_or - mydf.conf_low),(mydf.conf_high-mydf.log_or)), marker='s', mfc='Tomato')
plt.show()
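To make the arithmetic concrete, here is a small sketch that works out the bar lengths for the data in the question. The variable names lower, upper and xerr are only illustrative, and fmt="s" (markers without a connecting line) is a stylistic choice that differs slightly from the call above.

```python
import pandas as pd
import matplotlib.pyplot as plt

mydf = pd.DataFrame({
    "Feature": ["Pysch", "Physio"],
    "log_or": [0.3126, 0.2022],
    "se": [0.0712, 0.0568],
    "conf_low": [0.1729, 0.0907],
    "conf_high": [0.4522, 0.3136],
}).sort_values(by="log_or")

# Distance from the point estimate down to the lower bound, and up to the upper bound.
# For Physio: 0.2022 - 0.0907 = 0.1115 and 0.3136 - 0.2022 = 0.1114.
lower = mydf["log_or"] - mydf["conf_low"]
upper = mydf["conf_high"] - mydf["log_or"]

# Passing the bounds themselves, as in the second attempt, would instead draw
# Physio from 0.2022 - 0.0907 = 0.1115 to 0.2022 + 0.3136 = 0.5158.

# shape (2, N): first row holds the lower lengths, second row the upper lengths.
xerr = [lower.to_numpy(), upper.to_numpy()]

fig, ax = plt.subplots()
ax.errorbar(mydf["log_or"], mydf["Feature"], xerr=xerr, fmt="s", mfc="tomato")
ax.set_xlabel("log odds ratio")
```

Call plt.show() to display the figure. Each bar now ends at log_or - lower and log_or + upper, which are the tabulated conf_low and conf_high values.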
I want to detect ranges with the same numerical boundaries of a dataset using matplotlib or pandas in python 3.7
I have a ton of ranges, all consisting of numbers. Each range has a minimum and a maximum which cannot be exceeded. If you have two ranges and the maximum of one lies above the minimum of the other, there is a small area that covers both of them, and you can write one range that includes the others. I want to see if some ranges overlap or if I can find some ranges that cover most of the others. The goal would be to see if I can simplify them by using one smaller range that fits inside the other. For example, 7.8 - 9.6 and 7.9 - 9.6 can be covered with one range.
My attempt to visualize them is in the code below, but when I use my entire dataset of hundreds of ranges, the graph is no longer useful. I know that I can detect recurrent ranges using Python, but I don't want to know how often a range occurs. I want to know how many ranges lie within the same numerical boundaries, and to see if a couple of ranges can cover all of them. Ultimately, my goal is to have the master ranges sorted into categories, meaning that range 1 covers 50 other ranges, range 2 covers 25 ranges, and so on. My current program shows where the ranges overlap, but I also want that as printed output with the exact digits. It would be nice if you could share some ideas for solving this problem, or suggest tools available in Python 3.7.
import matplotlib.pyplot as plt
intervals = [[3.6,4.5], [3.6,4.5], [7.8,9.6], [7.9,9.6], [7.8,9.6], [3.4,4.1], [2.8,3.4], [8.25,9.83], [3.62,3.96], [8.25,9.83], [0.62,0.68], [2.15,2.49], [0.8,1.0], [0.8,1.0], [3.1,3.9], [6.7,8.3], [1,1.5], [1,1.2], [1.5,1.8], [1.8,2.5], [3,4.0], [6.5,8.0], [1.129,1.35], [2.82,3.38], [1.69,3.38], [3.38,6.21], [2.25,2.82], [5.649,6.214], [1.920,6.214] ]
for int in intervals:
    plt.plot(int,[0,0], 'b', alpha = 0.2, linewidth = 100)
plt.show()
Here is an idea: make a pandas data frame from the array, then subtract column 1 from column 2 (column 1 is the lower bound, column 2 is the upper bound) to get the width of each range. After that, create a histogram of the range widths and their frequencies.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
intervals = [[3.6,4.5], [3.6,4.5], [7.8,9.6], [7.9,9.6], [7.8,9.6], [3.4,4.1], [2.8,3.4], [8.25,9.83], [3.62,3.96], [8.25,9.83], [0.62,0.68], [2.15,2.49], [0.8,1.0], [0.8,1.0], [3.1,3.9], [6.7,8.3], [1,1.5], [1,1.2], [1.5,1.8], [1.8,2.5], [3,4.0], [6.5,8.0], [1.129,1.35], [2.82,3.38], [1.69,3.38], [3.38,6.21], [2.25,2.82], [5.649,6.214], [1.920,6.214]]
intervals_ar = np.array(intervals)
df = pd.DataFrame({'Column1': intervals_ar[:, 0], 'Column2': intervals_ar[:, 1]})
df['Ranges'] = df['Column2'] - df['Column1']
print(df)
frecuency_range = df['Ranges'].value_counts().sort_index()
print(frecuency_range)
df.Ranges.value_counts().sort_index().plot(kind = 'hist', bins = 5)
plt.title("Histogram Frequency vs Range (column 2 - column 1)")
plt.show()
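The histogram above groups ranges by width, not by position. A different technique that may be closer to the "master ranges" described in the question is the classic sort-and-sweep interval merge, sketched below in plain Python. The function name merge_intervals is hypothetical, only a handful of the question's intervals are used, and treating ranges that merely touch (for example [1, 1.5] and [1.5, 1.8]) as overlapping is an assumption you may want to change.

```python
def merge_intervals(intervals):
    """Merge overlapping [low, high] intervals.

    Returns one (low, high, count) tuple per merged group, where count is
    the number of original intervals that fell into that group.
    """
    merged = []
    # Sorting by lower bound means any interval that overlaps the current
    # group must start at or before the group's current upper bound.
    for low, high in sorted(intervals):
        if merged and low <= merged[-1][1]:
            # Overlaps (or touches) the current group: extend it and count it.
            merged[-1][1] = max(merged[-1][1], high)
            merged[-1][2] += 1
        else:
            # Gap before this interval: start a new group.
            merged.append([low, high, 1])
    return [tuple(group) for group in merged]


# A handful of the intervals from the question.
sample = [[7.8, 9.6], [7.9, 9.6], [3.6, 4.5], [0.8, 1.0], [8.25, 9.83], [3.4, 4.1]]

# Master ranges, largest group first.
master_ranges = sorted(merge_intervals(sample), key=lambda g: g[2], reverse=True)
for low, high, count in master_ranges:
    print(f"{low} - {high} covers {count} range(s)")
```

Note that merging is transitive: a long chain of pairwise-overlapping ranges collapses into one wide master range, so on dense data the groups can become very broad. If that is too coarse, the overlap test could require a minimum shared length instead of a simple `<=` comparison.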
How to plot errorbar in line chart from dataframes
I have two dataframes, df_avg and df_sem, which contain mean values and standard errors of the means, respectively. For example:
          KPCmb1   KPCmb1IA     KPCmb2     KPCmb3     KPCmb4     KPCmb5     KPCmb6
temp
19.99  15.185905  24.954296  22.610052  29.249107  26.151815  34.374257  36.589218
20.08  15.198452  24.998227  22.615342  29.229325  26.187794  34.343738  36.596730
20.23  15.208917  25.055061  22.647499  29.234424  26.193382  34.363549  36.580033
20.47  15.244485  25.092773  22.691421  29.206816  26.202425  34.337385  36.640839
20.62  15.270921  25.145798  22.720752  29.217821  26.235101  34.364162  36.600030
and
         KPCmb1  KPCmb1IA    KPCmb2    KPCmb3    KPCmb4    KPCmb5    KPCmb6
temp
19.99  0.342735  0.983424  0.131502  0.893494  1.223318  0.536450  0.988185
20.08  0.347366  0.983732  0.136239  0.898661  1.230763  0.534779  0.993970
20.23  0.348641  0.981614  0.134729  0.898790  1.227567  0.529240  1.005609
20.47  0.350937  0.993973  0.138411  0.881142  1.237749  0.526841  0.991591
20.62  0.345863  0.983064  0.132934  0.883863  1.234746  0.533048  0.987520
I want to plot a line chart using temp as the x-axis and the dataframe columns as the y-values. I also want to use the df_sem dataframe to provide error bars for each line (note that the column names are the same in the two dataframes). I can achieve this with df_avg.plot(yerr=df_sem), but this does not let me change many aspects of the plot, like DPI, labels, and so on. So I've tried to make the plot using the following code as an alternative:
plt.figure()
x = df_avg.index
y = df_avg
plt.errorbar(x,y,yerr=df_sem)
plt.show()
But this gives me the error:
ValueError: shape mismatch: objects cannot be broadcast to a single shape
How do I make the same chart with matplotlib plotting that I can make with pandas plotting? Thanks!
You can just use a simple for loop, since plt.errorbar expects one series of y-values per call:
for col in df_avg.columns:
    plt.errorbar(df_avg.index, df_avg[col], yerr=df_sem[col], label=col)
plt.legend()
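Since the question was about controlling DPI and labels, here is a sketch of the same loop using an explicit figure and axes. The frames below hold only the first three rows of two columns from the question; the figure size, DPI value, capsize and label texts are illustrative assumptions.

```python
import pandas as pd
import matplotlib.pyplot as plt

# First three rows of two columns from the question's df_avg and df_sem.
temps = pd.Index([19.99, 20.08, 20.23], name="temp")
df_avg = pd.DataFrame(
    {"KPCmb1": [15.185905, 15.198452, 15.208917],
     "KPCmb2": [22.610052, 22.615342, 22.647499]},
    index=temps,
)
df_sem = pd.DataFrame(
    {"KPCmb1": [0.342735, 0.347366, 0.348641],
     "KPCmb2": [0.131502, 0.136239, 0.134729]},
    index=temps,
)

# Creating the figure explicitly gives control over size and DPI.
fig, ax = plt.subplots(figsize=(6, 4), dpi=150)

# errorbar() takes one 1-D series of y-values per call, so draw column by column.
for col in df_avg.columns:
    ax.errorbar(df_avg.index, df_avg[col], yerr=df_sem[col], label=col, capsize=3)

ax.set_xlabel("temp")
ax.set_ylabel("mean +/- SEM")
ax.legend()
```

Call plt.show() or fig.savefig(...) afterwards. Because df_avg and df_sem share column names, df_sem[col] always picks the standard errors that belong to the line being drawn.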
Is there a way for iPython to generate these kinds of charts given a dataframe?
[image: This picture] Please ignore the background image. The foreground chart is what I am interested in showing using pandas or numpy or scipy (or anything in iPython).
I have a dataframe where each row represents temperatures for a single day. This is an example of some rows:
            100   200   300   400   500   600  ......  2300
10/3/2013  53*C  57*C  48*C  49*C  54*C  54*C          55*C
10/4/2013  45*C  47*C  48*C  49*C  50*C  52*C          57*C
Is there a way to get a chart that represents the changes from hour to hour, using the first column as a 'zero'?
Something quick and dirty that might get you most of the way there, assuming your data frame is named df:
import matplotlib.pyplot as plt
plt.imshow(df.T.diff().fillna(0.0).T.drop(0, axis=1).values)
Since I can't easily construct a sample version with your exact column labels, some additional tinkering might be needed to get rid of any index columns that are included in the diff and moved by the transposition. But this worked to make a simple heat-map-ish plot for me on a random data example. You can then create a matplotlib figure or axis object and specify whatever you want for the x- and y-axis labels.
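As a rough illustration of that idea on data shaped like the question's, the sketch below first strips the "*C" suffix so the cells become numbers, then computes the changes in two ways, because "using the first column as a zero" could mean either. The frame holds only the first four hourly columns from the question, and the names temps, hourly_change and from_first are hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt

# First four hourly columns of the two example rows from the question.
df = pd.DataFrame(
    {"100": ["53*C", "45*C"], "200": ["57*C", "47*C"],
     "300": ["48*C", "48*C"], "400": ["49*C", "49*C"]},
    index=["10/3/2013", "10/4/2013"],
)

# Strip the "*C" suffix so every cell becomes a float.
temps = df.apply(lambda col: col.str.replace("*C", "", regex=False).astype(float))

# Reading 1: change from the previous hour; the first column has no
# predecessor, so it becomes 0.
hourly_change = temps.diff(axis=1).fillna(0.0)

# Reading 2: offset from the first reading of the day, so the first column is 0.
from_first = temps.sub(temps.iloc[:, 0], axis=0)

# Heat map of the hour-to-hour changes, one row per day.
fig, ax = plt.subplots()
im = ax.imshow(hourly_change.to_numpy(), aspect="auto", cmap="coolwarm")
ax.set_xticks(range(len(hourly_change.columns)))
ax.set_xticklabels(hourly_change.columns)
ax.set_yticks(range(len(hourly_change.index)))
ax.set_yticklabels(hourly_change.index)
fig.colorbar(im, ax=ax, label="change")
```

Call plt.show() to display the figure; swapping hourly_change for from_first in the imshow call gives the other reading.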
You could just plot the lines one at a time, one per row, each with an offset:
import numpy as np
import matplotlib.pyplot as plt
nrows, ncols = 12, 30
# make up some fake data:
d = np.random.rand(nrows, ncols)
d *= np.sin(2*np.pi*np.arange(ncols)*4/ncols)
d *= np.exp(-0.5*(np.arange(nrows)-nrows/2)**2/(nrows/4)**2)[:,None]
# this is all you need, if you already have the data:
for i, r in enumerate(d):
    plt.fill_between(np.arange(ncols), r+(nrows-i)/2., lw=2, facecolor='white')
You could do it all at once if you don't need the fill color to block the previous line:
d += np.arange(nrows)[:, None]
plt.plot(d.T)