How to plot a graph of pandas dataframe rows using matplotlib? - python
A small snippet of my dataframe is given below.
UserID Recommendations
0 A001 [(B000OR5928, 5.671419620513916), (B000A1HU1G, 5.435572624206543), (B0039HBNMA, 5.4260640144348145), (B000EEGAJW, 5.502416133880615), (B001L8KE06, 5.508320331573486), (B0002ZO60I, 5.640686511993408), (B0002D0096, 5.543562412261963), (B0013PU75Y, 5.452023506164551), (B005M0TKL8, 5.481754302978516), (B001PGXHYO, 5.5017194747924805)]
1 A002 [(B000EEGAJW, 4.382242679595947), (B004ZKIHVU, 4.182255268096924), (B000CBE3GE, 4.242227077484131), (B000CCJP4I, 4.354374408721924), (B000VBC5CY, 4.342846393585205), (B0002KZHQA, 4.127199649810791), (B0026RB0G8, 4.246310234069824), (B0002D0CQC, 4.275753021240234), (B0002M6CVC, 4.679849624633789), (B0002D0KOG, 4.138158321380615)]
The dataframe contains two columns, UserID and Recommendations. The Recommendations column holds a list of the productIDs recommended to that user, each paired with its rating.
What I want is this: when I click on user A001, a graph should be displayed. The y-axis of the graph will show the productIDs recommended to A001 and the x-axis will show the rating of each product. This should work for each UserID.
I know how to plot a graph with single values using matplotlib, but here each cell holds a list of values. How can I go about it?
You can try this code to solve your problem:
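import matplotlib.pyplot as plt
import numpy as np

for i in df.UserID:
    ratings = []
    productsIDs = []
    # Collect the (productID, rating) pairs recommended to this user
    for points in df.Recommendations[np.where(df.UserID == i)[0]]:
        for point in points:
            ratings.append(point[1])
            productsIDs.append(point[0])
    # Ratings on the x-axis, product IDs on the y-axis, one figure per user
    plt.plot(ratings, productsIDs)
    plt.show()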
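A variation on the loop above, offered as a sketch rather than the answer's own method: it assumes each Recommendations cell is a list of (productID, rating) tuples as shown in the question, and draws a horizontal bar chart so the product IDs sit on the y-axis. The helper name plot_user and the shortened sample data are illustrative only.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Small stand-in for the dataframe in the question (ratings shortened).
df = pd.DataFrame({
    "UserID": ["A001", "A002"],
    "Recommendations": [
        [("B000OR5928", 5.67), ("B000A1HU1G", 5.44), ("B0039HBNMA", 5.43)],
        [("B000EEGAJW", 4.38), ("B004ZKIHVU", 4.18), ("B000CBE3GE", 4.24)],
    ],
})


def plot_user(df, user_id):
    """Draw one user's recommendations: ratings on x, product IDs on y."""
    # Take the single list of (productID, rating) tuples for this user.
    recs = df.loc[df["UserID"] == user_id, "Recommendations"].iloc[0]
    product_ids, ratings = zip(*recs)  # split the tuples into two sequences
    fig, ax = plt.subplots()
    ax.barh(product_ids, ratings)
    ax.set_xlabel("Rating")
    ax.set_ylabel("ProductID")
    ax.set_title(f"Recommendations for {user_id}")
    return ax


ax = plot_user(df, "A001")
```

Calling plot_user once per UserID (followed by plt.show()) gives one chart per user; wiring it to a click would depend on the GUI or notebook widget in use, which the question does not specify.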
Related
How to iterate distance calculation for different vehicles from coordinates
I am new to coding and need help developing a Time Space Diagram (TSD) from a CSV file that I got as the result of a VISSIM simulation. A general TSD looks like this: TSD, and my CSV looks like this: CSV.
I want to use "VEHICLE:SIMSEC", the simulation time, as the x-axis of the TSD. "NO" is the vehicle number (there are 185 different vehicles and I want to plot all 185 of them), with each vehicle drawn as its own line on the TSD. "COORDFRONTX" and "COORDFRONTY" are the x and y coordinates in the simulation, and the resulting position should be the y-axis of the TSD.
I have tried the following code but did not get the result I want.
import pandas as pd
import matplotlib.pyplot as mp
# take data
data = pd.read_csv(r"C:\Users\hk385\Desktop\VISSIM_DATA_CSV.csv")
df = pd.DataFrame(data, columns=["VEHICLE:SIMSEC", "NO", "DISTTRAVTOT"])
# plot the dataframe
df.plot(x="NO", y=["DISTTRAVTOT"], kind="scatter")
# print bar graph
mp.show()
The plot came out uninterpretable because there were too many dots. The diagram looks like this: Time Space Diagram.
Would you be able to help me or guide me to get a TSD from the CSV I have?
As suggested by mitoRibo, the top 20 rows of the CSV are the following:
VEHICLE:SIMSEC,NO,LANE\LINK\NO,LANE\INDEX,POS,POSLAT,COORDFRONTX,COORDFRONTY,COORDREARX,COORDREARY,DISTTRAVTOT
5.9,1,1,1,2.51,0.5,-1.259,-3.518,-4.85,-1.319,8.42
6.0,1,1,1,10.94,0.5,0.932,-4.86,-2.659,-2.661,16.86
6.1,1,1,1,19.37,0.5,3.125,-6.203,-0.466,-4.004,25.29
6.2,1,1,1,27.82,0.5,5.319,-7.547,1.728,-5.348,33.73
6.3,1,1,1,36.26,0.5,7.515,-8.892,3.924,-6.693,42.18
6.4,1,1,1,44.72,0.5,9.713,-10.238,6.122,-8.039,50.64
6.5,1,1,1,53.18,0.5,11.912,-11.585,8.321,-9.386,59.1
6.6,1,1,1,61.65,0.5,14.112,-12.933,10.521,-10.734,67.56
6.7,1,1,1,70.12,0.5,16.314,-14.282,12.724,-12.082,76.04
6.8,1,1,1,78.6,0.5,18.518,-15.632,14.927,-13.432,84.51
6.9,1,1,1,87.08,0.5,20.723,-16.982,17.132,-14.783,93.0
7.0,1,1,1,95.57,0.5,22.93,-18.334,19.339,-16.135,101.49
7.1,1,1,1,104.07,0.5,25.138,-19.687,21.547,-17.487,109.99
7.2,1,1,1,112.57,0.5,27.348,-21.04,23.757,-18.841,118.49
7.3,1,1,1,121.08,0.5,29.56,-22.395,25.969,-20.195,127.0
7.4,1,1,1,129.59,0.5,31.773,-23.75,28.182,-21.551,135.51
7.5,1,1,1,138.11,0.5,33.987,-25.107,30.396,-22.907,144.03
7.6,1,1,1,146.64,0.5,36.203,-26.464,32.612,-24.264,152.56
7.7,1,1,1,155.17,0.5,38.421,-27.822,34.83,-25.623,161.09
Thank you.
You can groupby and iterate through the different vehicles, adding each one to your plot. I changed your example data so that there are 2 different vehicles.
import pandas as pd
import io
import matplotlib.pyplot as plt

df = pd.read_csv(io.StringIO("""
VEHICLE:SIMSEC,NO,LANE_LINK_NO,LANE_INDEX,POS,POSLAT,COORDFRONTX,COORDFRONTY,COORDREARX,COORDREARY,DISTTRAVTOT
5.9,1,1,1,2.51,0.5,-1.259,-3.518,-4.85,-1.319,0
6.0,1,1,1,10.94,0.5,0.932,-4.86,-2.659,-2.661,16.86
6.1,1,1,1,19.37,0.5,3.125,-6.203,-0.466,-4.004,25.29
6.2,1,1,1,27.82,0.5,5.319,-7.547,1.728,-5.348,33.73
6.3,1,1,1,36.26,0.5,7.515,-8.892,3.924,-6.693,42.18
6.4,1,1,1,44.72,0.5,9.713,-10.238,6.122,-8.039,50.64
6.5,1,1,1,53.18,0.5,11.912,-11.585,8.321,-9.386,59.1
6.6,1,1,1,61.65,0.5,14.112,-12.933,10.521,-10.734,67.56
6.7,1,1,1,70.12,0.5,16.314,-14.282,12.724,-12.082,76.04
6.8,1,1,1,78.6,0.5,18.518,-15.632,14.927,-13.432,84.51
6.9,1,1,1,87.08,0.5,20.723,-16.982,17.132,-14.783,90
6.0,2,1,1,95.57,0.5,22.93,-18.334,19.339,-16.135,0
6.1,2,1,1,104.07,0.5,25.138,-19.687,21.547,-17.487,30
6.2,2,1,1,112.57,0.5,27.348,-21.04,23.757,-18.841,40
6.3,2,1,1,121.08,0.5,29.56,-22.395,25.969,-20.195,50
6.4,2,1,1,129.59,0.5,31.773,-23.75,28.182,-21.551,60
6.5,2,1,1,138.11,0.5,33.987,-25.107,30.396,-22.907,70
6.6,2,1,1,146.64,0.5,36.203,-26.464,32.612,-24.264,80
6.7,2,1,1,155.17,0.5,38.421,-27.822,34.83,-25.623,90
"""),sep=',')

fig = plt.figure()
#Iterate through each vehicle, adding it to the plot
for vehicle_no,vehicle_df in df.groupby('NO'):
    plt.plot(vehicle_df['VEHICLE:SIMSEC'],vehicle_df['DISTTRAVTOT'], label=vehicle_no)
plt.legend() #comment this out if you don't want a legend
plt.show()
plt.close()
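As an alternative to the explicit loop, the same one-line-per-vehicle plot can be sketched by pivoting the table so that each vehicle becomes a column. This is only an illustration with made-up values; it reuses the column names from the question and assumes each (time, vehicle) pair appears once.

```python
import pandas as pd

# Made-up trajectories for two vehicles, using the question's column names.
df = pd.DataFrame({
    "VEHICLE:SIMSEC": [5.9, 6.0, 6.1, 6.0, 6.1, 6.2],
    "NO": [1, 1, 1, 2, 2, 2],
    "DISTTRAVTOT": [0.0, 16.86, 25.29, 0.0, 30.0, 40.0],
})

# One column per vehicle, indexed by simulation time; times a vehicle
# was not present show up as NaN and are simply not drawn.
wide = df.pivot(index="VEHICLE:SIMSEC", columns="NO", values="DISTTRAVTOT")

# With 185 vehicles a legend would be unreadable, so it is switched off.
ax = wide.plot(legend=False)
ax.set_xlabel("Simulation time (s)")
ax.set_ylabel("Distance travelled")
```

The pivot fails with a ValueError if a vehicle has two rows for the same time step, in which case the groupby loop is the safer choice.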
If you don't mind, could you please try this:
mp.scatter(x="NO", y=["DISTTRAVTOT"])
If it still does not work, please attach your data so that I can test it on my side.
How to read data from a file with a nonstandard separator and plot the values
I've got a .txt file containing numbers like the following:
No. --- Amount -------- Location
1 ----- 23.5 -------- -0.0039
3 ----- 2.093 -------- 0.992
7 ----- 1.211 -------- 0.3929
5 ----- 0.898 -------- -1.8933
and so on
I've got approx. 700 numbers. Now I want to plot and visualize those numbers. To be exact: I want to show the "Amount" on the x-axis and the "Location" on the y-axis. The graph should resemble a sine curve.
Moreover, I want to choose certain numbers, for example only No. 1 and No. 2. That means I need to read in the Amount and the Location of Number 1 and Number 2.
I have never worked with Python or Matplotlib before, so I hope that somebody can help me out or give me some hints. So far, I've got the following code:
import matplotlib.pyplot as plt
import numpy as np
import io
numbers_file = open('numbers_file.txt').read().replace(',',' ')
numbers_data = np.loadtxt(io.StringIO(numbers_file),skiprows=1)
x = np.linspace(0, 2*np.pi, 20)
y = np.sin(x)
beta = x
plt.figure(figsize=(7,3))
plt.title('Companies amount and location')
plt.plot(x,y, label=r'sin( $\beta$ )')
plt.xlabel(r'$\beta$')
plt.ylabel('Location')
plt.legend()
plt.grid()
plt.show()
Beyond that, I want to use a specific text file entry as a start point and another as an end point for my data graph. Some entries (accessed via 'No.' in the text file) should be visualized as the maximum and the minimum.
I would be very thankful for any help I can get, because I am kind of lost on how to solve this.
The easiest way to load the file, considering the unconventional separator, is to create a pandas.DataFrame with pd.read_csv, which accepts a regular expression for the separator. Don't specify sep and engine if the separator is a comma.
The data can then be plotted with pandas.DataFrame.plot, which uses matplotlib as the default backend.
There are many ways to select values based on another column, which are covered by other answers:
extract column value based on another column pandas dataframe
How do I select rows from a DataFrame based on column values?
Pandas select rows and columns based on boolean condition
selected = df[df['No.'].isin([1, 7])]
Tested in python 3.9.7, pandas 1.3.4, matplotlib 3.4.3
import pandas as pd
# read the data in from the file
df = pd.read_csv('numbers_file.txt', sep=' -* ', engine='python')
# display(df)
   No.  Amount  Location
0    1  23.500   -0.0039
1    3   2.093    0.9920
2    7   1.211    0.3929
3    5   0.898   -1.8933
# plot
ax = df.plot(x='Amount', y='Location')
ax = df.plot(kind='scatter', x='Amount', y='Location')
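The read-then-select steps can be tried without a file on disk. The sketch below feeds the sample rows from the question through io.StringIO; the regular expression \s+-+\s+ is a slightly stricter variant of the separator, chosen here as an assumption so that the minus sign of a negative Location is never treated as part of a separator.

```python
import io
import pandas as pd

# The sample rows from the question, inlined instead of read from disk.
raw = "\n".join([
    "No. --- Amount -------- Location",
    "1 ----- 23.5 -------- -0.0039",
    "3 ----- 2.093 -------- 0.992",
    "7 ----- 1.211 -------- 0.3929",
    "5 ----- 0.898 -------- -1.8933",
])

# A regex separator requires the slower pure-Python parsing engine.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+-+\s+", engine="python")

# Keep only the entries numbered 1 and 7, ordered by Amount for plotting.
selected = df[df["No."].isin([1, 7])].sort_values("Amount")
```

From here, selected.plot(x='Amount', y='Location') draws only the chosen entries.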
I have converted a continuous feature to categorical. I am getting NaN in Pandas
I have converted a continuous dataset to categorical. After the conversion I get NaN wherever the value of the continuous data is 0.0. Below is my code:
import pandas as pd
import matplotlib as plt
df = pd.read_csv('NSL-KDD/KDDTrain+.txt',header=None)
data = df[33]
bins = [0.000,0.05,0.10,0.15,0.20,0.25,0.30,0.35,0.40,0.45,0.50,0.55,0.60,0.65,0.70,0.75,0.80,0.85,0.90,0.95,1.00]
category = pd.cut(data,bins)
category = category.to_frame()
print (category)
How do I convert the values so that I don't get NaN? I have attached two screenshots to show how the actual data looks and how the converted data looks.
This is the main dataset.
This is what it becomes after using bins and pandas.cut().
How can those "0.00" values be binned like the other values in the dataset?
When using pd.cut, you can specify the parameter include_lowest=True. This makes the first interval left-inclusive, so it will include the value 0, since your first interval starts at 0. In your case, you can adjust your code to:
import pandas as pd
import matplotlib as plt
df = pd.read_csv('NSL-KDD/KDDTrain+.txt',header=None)
data = df[33]
bins = [0.000,0.05,0.10,0.15,0.20,0.25,0.30,0.35,0.40,0.45,0.50,0.55,0.60,0.65,0.70,0.75,0.80,0.85,0.90,0.95,1.00]
category = pd.cut(data,bins,include_lowest=True)
category = category.to_frame()
print (category)
Documentation Reference for pd.cut
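The effect is easy to see on a handful of values. The following is a small sketch with made-up data and a shortened bin list, not the NSL-KDD column from the question:

```python
import pandas as pd

# Made-up values that include the lowest bin edge, 0.0.
data = pd.Series([0.0, 0.03, 0.05, 0.5, 1.0])
bins = [0.0, 0.05, 0.10, 0.5, 1.0]

# By default every interval is open on the left, so 0.0 falls in no bin.
default_cut = pd.cut(data, bins)

# include_lowest=True widens the first interval so that it also holds 0.0.
inclusive_cut = pd.cut(data, bins, include_lowest=True)

print(default_cut.isna().sum())    # count of unbinned values, default
print(inclusive_cut.isna().sum())  # count of unbinned values, inclusive
```

With include_lowest=True, 0.0 and 0.03 end up in the same first category, which pandas labels with a left edge slightly below zero.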
Choosing the correct values in excel in Python
General Overview: I am creating a graph of a large data set; however, I have created a sample text document so that it is easier to work through the problems. The data is from an Excel document that will be saved as a CSV.
Problem: I am able to compile the data and it will graph (see below). However, the way I pull the data will not work for all of the different Excel sheets I am going to pull from.
More detail of the problem: The y-values (labeled 'Value' and 'Value1') are being pulled from the Excel sheet using the numbers 26 and 31 (see picture and code). This is a problem because the values 26 and 31 will not be the same for each graph. Let's take a look so that this makes more sense.
Here is my code:
import pandas as pd
import matplotlib.pyplot as plt
pd.read_csv('CSV_GM_NB_Test.csv').T.to_csv('GM_NB_Transpose_Test.csv', header=False)
df = pd.read_csv('GM_NB_Transpose_Test.csv', skiprows = 2)
DID = df['SN']
Value = df['26']
Value1 = df['31']
x= (DID[16:25])
y= (Value[16:25])
y1= (Value1[16:25])
"""
print(x,y)
print(x,y1)
"""
plt.plot(x.astype(int), y.astype(int))
plt.plot(x.astype(int), y1.astype(int))
plt.show()
Output:
Data Set:
Below in the comments you will find the 0bin link to my data set; this is because I do not have enough reputation to post two links.
As you can see from the data set:
X - DID = Blue
Y - Value = Green
Y - Value1 = Grey
Troublesome values = Red
The problem, again, is that the data for the y-values are pulled from rows 10 and 11 using the values 26 and 31 under SN.
Let me know if more information is needed. Thank you.
Not sure why you are creating the transposed CSV version. It is also possible to work directly from your original data. For example:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv('CSV_GM_NB_Test.csv', skiprows=8)
data = df.ix[:,19:].T
data.columns = df['SN']
data.plot()
plt.show()
You can use the pandas.DataFrame.ix indexer to get a sliced version of your data using integer positions. The [:,19:] says to give you columns 19 onwards. The final .T transposes it. You can then apply the values of the SN column as column headings, using .columns to specify the names.
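Note that the .ix indexer was removed in pandas 1.0, so on current versions the positional slice has to be written with .iloc instead. A minimal sketch on a made-up table (the column layout and the offset 2 are assumptions standing in for the real file and its offset of 19):

```python
import pandas as pd

# Made-up table: an 'SN' label column, one extra column, then the readings.
df = pd.DataFrame({
    "SN": [26, 31],
    "meta": ["a", "b"],
    "1": [10, 20],
    "2": [11, 19],
    "3": [13, 18],
})

# Columns from position 2 onwards, transposed so each SN becomes a column.
data = df.iloc[:, 2:].T

# Label the columns by SN value instead of by row position.
data.columns = df["SN"]

ax = data.plot()  # one line per SN value
```

Because the columns are named from whatever the SN column contains, nothing in the code depends on the specific values 26 and 31.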
Plot rolling mean together with data
I have a DataFrame that looks something like this:
####delays:
            Worst case  Avg case
2014-10-27    2.861433  0.953108
2014-10-28    2.899174  0.981917
2014-10-29    3.080738  1.030154
2014-10-30    2.298898  0.711107
2014-10-31    2.856278  0.998959
2014-11-01    3.118587  1.147104
...
I would like to plot the data of this DataFrame together with the rolling mean of the data. The data itself should be a dotted line and the rolling mean a solid line. The worst case column should be in red, while the average case column should be in blue.
I've tried the following code:
import pandas as pd
import matplotlib.pyplot as plt
rolling = pd.rolling_mean(delays, 7)
delays.plot(x_compat=True, style='r--')
rolling.plot(style='r')
plt.title('Delays per day on entire network')
plt.xlabel('Date')
plt.ylabel('Minutes')
plt.show()
Unfortunately, this gives me 2 different plots, one with the data and one with the rolling mean. Also, the worst case column and the average case column are both in red.
How can I get this to work?
You need to tell pandas where you want to plot. By default pandas creates a new figure.
Just replace these 2 lines:
delays.plot(x_compat=True, style='r--')
rolling.plot(style='r')
with:
ax_delays = delays.plot(x_compat=True, style='--', color=["r","b"])
rolling.plot(color=["r","b"], ax=ax_delays, legend=0)
In the 2nd line you now tell pandas to plot on ax_delays and not to show the legend again.
To get 2 different colors for the 2 lines, pass one color per column with the color argument (see above).
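For reference, here is a self-contained sketch of the whole plot on made-up data. It assumes a recent pandas, where the top-level pd.rolling_mean function no longer exists and the rolling mean is written as delays.rolling(7).mean(); the values themselves are invented.

```python
import numpy as np
import pandas as pd

# Made-up delays with the column names from the question.
idx = pd.date_range("2014-10-27", periods=14, freq="D")
delays = pd.DataFrame(
    {
        "Worst case": np.linspace(2.0, 3.3, 14),
        "Avg case": np.linspace(0.7, 1.2, 14),
    },
    index=idx,
)

# 7-day rolling mean; the first six rows are NaN until the window fills.
rolling = delays.rolling(7).mean()

# Raw data as dashed lines, then the rolling mean as solid lines on the
# same axes, reusing red for the worst case and blue for the average case.
ax = delays.plot(x_compat=True, style="--", color=["r", "b"])
rolling.plot(ax=ax, color=["r", "b"], legend=False)
ax.set_title("Delays per day on entire network")
ax.set_xlabel("Date")
ax.set_ylabel("Minutes")
```

Setting the title and labels on the returned axes object, rather than through plt, keeps everything attached to the one figure that both plot calls share.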