I have two questions.
First:
Is there a way to draw an unstacked bar plot of this data frame? I am getting "Empty 'DataFrame': no numeric data to plot".
df=pd.DataFrame({'midterm':['A','B','B','D'],'Final':['C','A','D','B']}, index=['math', 'sport', 'History', 'Physics'])
Second question:
I plot the data of the dataframe manually like this:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df=pd.DataFrame({'midterm':['A','B','B'],'Final':['C','A','D']}, index=['math', 'sport', 'History'])
fig, ax = plt.subplots()
index = np.asarray([1,10])
width=0.5
plt.bar(index, df.iloc[0,:], width,label='Math')
plt.bar(index+width, df.iloc[1,:], width, label='Sport')
plt.bar(index+2*width, df.iloc[2,:], width, label='History')
xticks=['midterm','final']
plt.xticks(index, xticks)
plt.legend()
plt.show()
What the code produces is here
This has two problems:
- first, A, B, C, D are not ordered
- second, the y axis starts at (0, 0), which makes the bar for grade C in this graph not visible at all
What I aim to do is here
I will answer your second question. You should restrict your question to a single query. I would map the grades onto integers and then later set the y-ticks. To do so, I define a mapping dictionary and a function mapping which takes the alphabetical grades and converts them into integer values for bar plot height.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df=pd.DataFrame({'midterm':['A','B','B'],'Final':['C','A','D']}, index=['math', 'sport', 'History'])
map_dict = {'A': 1, 'B': 2, 'C': 3, 'D':4}
def mapping(keys):
    # Convert a sequence of letter grades into integer bar heights
    values = [map_dict[i] for i in keys]
    return values
fig, ax = plt.subplots()
index = np.asarray([1,10])
width=0.5
plt.bar(index, mapping(df.iloc[0,:]), width,label='Math')
plt.bar(index+width, mapping(df.iloc[1,:]), width, label='Sport')
plt.bar(index-width, mapping(df.iloc[2,:]), width, label='History')
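xticks=['midterm','final']
plt.xticks(index, xticks)  # label the two bar groups
ax.set_yticks(range(1, 5))
ax.set_yticklabels(list(map_dict.keys()))  # show letter grades instead of integers
plt.legend()
plt.show()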
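For the first question, here is a minimal sketch under the assumption that the same grade-to-integer mapping is acceptable there too. It converts every column to numbers and lets pandas draw the grouped (unstacked) bars. The name `numeric` is only illustrative, and `map_dict` is reused from the code above.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'midterm': ['A', 'B', 'B', 'D'], 'Final': ['C', 'A', 'D', 'B']},
                  index=['math', 'sport', 'History', 'Physics'])

# Same mapping as in the answer above (assumption: A -> 1 ... D -> 4)
map_dict = {'A': 1, 'B': 2, 'C': 3, 'D': 4}

# Map each column's letter grades to integers so pandas has numeric data to plot
numeric = df.apply(lambda column: column.map(map_dict))

# One group of bars per subject, one bar per column (unstacked by default)
ax = numeric.plot.bar(rot=0)
ax.set_yticks(list(map_dict.values()))
ax.set_yticklabels(list(map_dict.keys()))
```

Call plt.show() afterwards to display the figure. Since every grade maps to at least 1, no bar has zero height.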
Related
I'd like to use a color map to show how often each point occurs, e.g. (1,2) occurs 3 times, while still keeping my x-axis (i.e. df['A']).
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'A': [1,1,1,1,2,2,3,4,6,7,7],
                   'B': [2,2,2,3,3,4,5,6,7,8,8]})
plt.figure()
plt.scatter(df['A'], df['B'])
plt.show()
Here is my current plot
I'd like to keep the same axis I have, while adding the colormap. Hope I was being clear.
You can calculate the frequency of a certain value using the collections package.
freq_dic = collections.Counter(df["B"])
You then need to add this new list to your dataframe and add two new options to the scatter plot. The colormap legend is displayed with plt.colorbar. This code is far from perfect, so any further improvements are very welcome.
import pandas as pd
import matplotlib.pyplot as plt
import collections
df = pd.DataFrame({'A': [1,1,1,1,2,2,3,4,6,7,7],
                   'B': [2,2,2,3,3,4,5,6,7,8,8]})
freq_dic = collections.Counter(df["B"])
# Store the frequency of each row's B value in a new 'freq' column
for index, entry in enumerate(df["B"]):
    df.at[index, 'freq'] = freq_dic[entry]
plt.figure()
plt.scatter(df['A'], df['B'],
            c=df['freq'],
            cmap='viridis')
plt.colorbar()
plt.show()
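One possible improvement: the Counter above only counts values of column B, so a point like (1,3) gets the frequency of B == 3 rather than of the pair itself. A minimal sketch that counts each (A, B) pair instead, assuming that is what "frequency of a point" means in the question:

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'A': [1, 1, 1, 1, 2, 2, 3, 4, 6, 7, 7],
                   'B': [2, 2, 2, 3, 3, 4, 5, 6, 7, 8, 8]})

# Count how often each (A, B) pair occurs and broadcast that count to every row
df['freq'] = df.groupby(['A', 'B'])['A'].transform('size')

fig, ax = plt.subplots()
points = ax.scatter(df['A'], df['B'], c=df['freq'], cmap='viridis')
# The colorbar label text is only illustrative
fig.colorbar(points, ax=ax, label='occurrences of (A, B)')
```

The axes stay the same as in the original scatter plot; only the color changes. Call plt.show() to display it.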
Given the dataframe df, I could use some help creating two different scatter plots: one for the x, y coordinates with the id "aa" and one for the x, y coordinates with the id "bb", each using the c value for the color map. The actual data has over 1000 unique ids.
import numpy as np
import matplotlib.pyplot as plt
import pyodbc
import pandas as pd
#need to add the
data = {'x':[2,4,6, 8,10, 12], 'y':[2,4,6, 8,10, 12], 'c': [.2,.5,.5,.7,.8,.9], 'id':['aa','aa','aa','bb','bb','bb']}
df = pd.DataFrame(data)
print (df)
for d in df.groupby(df['id']):
    plt.scatter(d[1][['x']],d[1][['y']], c=d[1][['c']], s=10, alpha=0.3, cmap='viridis')
    clb = plt.colorbar();
    plt.show()
Returns this error: ValueError: RGBA values should be within 0-1 range
Try this:
df = pd.DataFrame(data)
for d in df.groupby(df['id']):
    plt.plot(d[1][['x','y']])
    plt.show()
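If you want to keep the scatter plot and the color map, here is a hedged sketch. The error most likely comes from the double brackets, which pass one-column DataFrames rather than 1-D Series, so matplotlib appears to treat c as color values. The dictionary `figures` and the fixed color range 0 to 1 are assumptions for illustration, not part of the question.

```python
import pandas as pd
import matplotlib.pyplot as plt

data = {'x': [2, 4, 6, 8, 10, 12], 'y': [2, 4, 6, 8, 10, 12],
        'c': [.2, .5, .5, .7, .8, .9], 'id': ['aa', 'aa', 'aa', 'bb', 'bb', 'bb']}
df = pd.DataFrame(data)

figures = {}  # hypothetical container: one figure per id
for group_id, group in df.groupby('id'):
    fig, ax = plt.subplots()
    # Single brackets give 1-D Series, so c is mapped through the colormap
    points = ax.scatter(group['x'], group['y'], c=group['c'],
                        s=10, cmap='viridis', vmin=0, vmax=1)
    fig.colorbar(points, ax=ax)
    ax.set_title(group_id)
    figures[group_id] = fig
```

With over 1000 ids this opens one figure per id, so saving each with fig.savefig and closing it with plt.close(fig) inside the loop is probably more practical than calling plt.show().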
I would like to plot certain slices of my Pandas DataFrame for each row (based on row indexes) with different colors.
My data look like the following:
I already tried with the help of this tutorial to find a way but I couldn't - probably due to a lack of skills.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv(r"D:\SOF10.csv", header=None)  # raw string keeps the backslash literal
df.head()
#Slice interested data
C = df.iloc[:, 2::3]
#Plot Temp base on row index colorfully
C.apply(lambda x: plt.scatter(x.index, x, c='g'))
plt.show()
Following is my expected plot:
I was also wondering if I could display the mean of each row of the sliced data (each row contains 480 values) somewhere in the plot or in the legend beside the plot. Is it feasible (like the following picture) to calculate the mean and display it in the legend, or in a small font next to its own data in the graph?
Data sample: data
This gives the plot without legend
C = df.iloc[:,2::3].stack().reset_index()
C.columns = ['level_0', 'level_1', 'Temperature']
fig, ax = plt.subplots(1,1)
C.plot('level_0', 'Temperature',
       ax=ax, kind='scatter',
       c='level_0', colormap='tab20',
       colorbar=False, legend=True)
ax.set_xlabel('Cycles')
plt.show()
Edit to reflect modified question:
stack() transforms your (sliced) dataframe into a series with a two-level (row, col) index.
reset_index() turns that two-level index into the columns level_0 (row) and level_1 (col).
set_xlabel sets the label of the x-axis to what you want.
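To see what these two calls do, here is a tiny sketch on made-up numbers (the frame `small` is hypothetical and stands in for the sliced data):

```python
import pandas as pd

# Hypothetical 2x2 frame standing in for df.iloc[:, 2::3]
small = pd.DataFrame([[10, 11], [20, 21]])

# stack() yields one value per (row, col); reset_index() turns that index into columns
long_form = small.stack().reset_index()
long_form.columns = ['level_0', 'level_1', 'Temperature']
print(long_form)
```

Each original cell becomes one row, so level_0 can serve both as the x value and as the color of the scatter plot.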
Edit 2: The following produces scatter with legend:
CC = df.iloc[:,2::3]
fig, ax = plt.subplots(1,1, figsize=(16,9))
labels = CC.mean(axis=1)
for i in CC.index:
    # One scatter call per row; the row mean becomes its legend label
    ax.scatter([i]*len(CC.columns[1:]), CC.iloc[i,1:], label=labels[i])
ax.legend()
ax.set_xlabel('Cycles')
ax.set_ylabel('Temperature')
plt.show()
This may be an approximate answer. scatter(c=..., cmap=...) can be used for the desired coloring.
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import itertools
df = pd.DataFrame({'a':[34,22,1,34]})
fig, subplot_axes = plt.subplots(1, 1, figsize=(20, 10)) # width, height
colors = ['red','green','blue','purple']
cmap=matplotlib.colors.ListedColormap(colors)
for col in df.columns:
    subplot_axes.scatter(df.index, df[col].values, c=df.index, cmap=cmap, alpha=.9)
I'm trying to plot a graph grouped by column values using a for loop without knowing the number of unique values in that column.
You can see sample code below (without a for loop) and the desired output.
I would like that each plot will have different color and marker (as seen below).
This is the code:
import pandas as pd
import matplotlib.pyplot as plt
from numpy import random
df = pd.DataFrame(data = random.randn(5,4), index = ['A','B','C','D','E'],
                  columns = ['W','X','Y','Z'])
df['W'] = ['10/01/2018 12:00:00','10/03/2018 13:00:00',
           '10/03/2018 12:30:00','10/04/2018 12:05:00',
           '10/08/2018 12:00:15']
df['W']=pd.to_datetime(df['W'])
df['Entity'] = ['C201','C201','C201','C202','C202']
print(df.head())
fig, ax = plt.subplots()
df[df['Entity']=="C201"].plot(x="W",y="Y",label='C201',ax=ax,marker='x')
df[df['Entity']=="C202"].plot(x="W",y="Y",label='C202',ax=ax, marker='o')
This is the output:
You can first find out the unique values of your df['Entity'] and then loop over them. To generate new markers automatically for each Entity, you can define an order of some markers (let's say 5 in the answer below) which will repeat via marker=next(marker).
Complete minimal answer
import itertools
import numpy as np
import pandas as pd
from numpy import random
import matplotlib.pyplot as plt
marker = itertools.cycle(('+', 'o', '*', '^', 's'))
df = pd.DataFrame(data = random.randn(5,4), index = ['A','B','C','D','E'],
                  columns = ['W','X','Y','Z'])
df['W'] = ['10/01/2018 12:00:00','10/03/2018 13:00:00',
           '10/03/2018 12:30:00','10/04/2018 12:05:00',
           '10/08/2018 12:00:15']
df['W']=pd.to_datetime(df['W'])
df['Entity'] = ['C201','C201','C201','C202','C202']
fig, ax = plt.subplots()
# One line per unique Entity, each with the next marker from the cycle
for idy in np.unique(df['Entity'].values):
    df[df['Entity']==idy].plot(x="W",y="Y", label=idy, ax=ax, marker=next(marker))
plt.legend()
plt.show()
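A variation on the same idea, as a sketch only: groupby hands you each entity's rows directly, and zip pairs every group with the next marker. The Y values below are made up so the example does not depend on random numbers, and the rows are sorted by W so each line runs left to right.

```python
import itertools
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'W': pd.to_datetime(['10/01/2018 12:00:00', '10/03/2018 13:00:00',
                         '10/03/2018 12:30:00', '10/04/2018 12:05:00',
                         '10/08/2018 12:00:15']),
    'Y': [0.1, 0.4, -0.3, 0.8, 0.2],  # made-up values for illustration
    'Entity': ['C201', 'C201', 'C201', 'C202', 'C202'],
})

markers = itertools.cycle(('+', 'o', '*', '^', 's'))

fig, ax = plt.subplots()
# zip stops when the groups run out, so the endless marker cycle is safe here
for (entity, group), marker in zip(df.groupby('Entity'), markers):
    group = group.sort_values('W')
    ax.plot(group['W'], group['Y'], marker=marker, label=entity)
ax.legend()
```

Matplotlib picks a new line color for every ax.plot call, so only the markers need to be cycled by hand. Call plt.show() to display the result.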
Is it possible to use a column in a dataframe to scale the marker size in matplotlib? I keep getting an error about using a series when I do the following.
import pandas as pd
import matplotlib.pyplot as plt
my_dict = {'Vx': [16,25,85,45], 'r': [1315,5135,8444,1542], 'ms': [10,50,100, 25]}
df= pd.DataFrame(my_dict)
fig, ax = plt.subplots(1, 1, figsize=(20, 10))
ax.plot(df.Vx, df.r, '.', markersize= df.ms)
When I run it, I get:
ValueError: setting an array element with a sequence.
I'm guessing it does not like the fact that I'm feeding a series to the marker size, but there must be a way to make it work...
Use scatter instead of plot. Scatter lets you specify the size s as well as the color c per point, using a sequence such as a list, a tuple or a Series.
import pandas as pd
import matplotlib.pyplot as plt
my_dict = {'Vx': [16,25,85,45], 'r': [1315,5135,8444,1542], 'ms': [10,50,100, 25]}
df= pd.DataFrame(my_dict)
fig, ax = plt.subplots(1, 1, figsize=(20, 10))
ax.scatter(df.Vx, df.r, s= df.ms)
plt.show()
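One thing to keep in mind: scatter's s is an area in points squared, while markersize in plot is a width in points. If the ms column was meant as a markersize, squaring it should give roughly comparable marker sizes. A small sketch under that assumption (the name `areas` is only illustrative):

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'Vx': [16, 25, 85, 45],
                   'r': [1315, 5135, 8444, 1542],
                   'ms': [10, 50, 100, 25]})

# Assumption: 'ms' holds markersize-style widths in points, so square them for `s`
areas = df['ms'] ** 2

fig, ax = plt.subplots()
points = ax.scatter(df['Vx'], df['r'], s=areas)
```

If ms already holds areas, pass it to s unchanged as in the code above.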
It is better to use the built-in scatter plot function in pandas, where you can pass a whole series object as the size parameter to vary the bubble size:
df.plot.scatter(x='Vx', y='r', s=df['ms'], c='g') # use df['ms']*5 to make the bubbles more prominent
Or, if you want to go via the matplotlib route, you need to pass a scalar value present in the series object each time to the markersize arg.
fig, ax = plt.subplots()
for idx, row in df.iterrows():
    # markersize must be a scalar, so plot one point per row
    ax.plot(row['Vx'], row['r'], '.', markersize=row['ms'])
plt.show()