Given the dataframe df, I could use some help creating two separate scatter plots: one for the rows with the id "aa" and one for the rows with the id "bb". Each plot should use the x, y coordinates, with the c value driving the color map. The actual data has over 1000 unique ids.
import numpy as np
import matplotlib.pyplot as plt
import pyodbc
import pandas as pd
#need to add the
data = {'x':[2,4,6, 8,10, 12], 'y':[2,4,6, 8,10, 12], 'c': [.2,.5,.5,.7,.8,.9], 'id':['aa','aa','aa','bb','bb','bb']}
df = pd.DataFrame(data)
print (df)
for d in df.groupby(df['id']):
    plt.scatter(d[1][['x']],d[1][['y']], c=d[1][['c']], s=10, alpha=0.3, cmap='viridis')
    clb = plt.colorbar()
plt.show()
Returns this error: ValueError: RGBA values should be within 0-1 range
Try this:
df = pd.DataFrame(data)
for d in df.groupby(df['id']):
    plt.plot(d[1][['x','y']])
plt.show()
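If you want an actual scatter plot with a color bar per id, here is a minimal sketch. It assumes the error comes from passing 2-D selections such as `d[1][['c']]`, so it selects 1-D columns with single brackets instead. The helper name `scatter_per_id` and the shared `vmin`/`vmax` (so colors are comparable across ids) are my own choices, not something from the question.

```python
import matplotlib.pyplot as plt
import pandas as pd

data = {'x': [2, 4, 6, 8, 10, 12], 'y': [2, 4, 6, 8, 10, 12],
        'c': [.2, .5, .5, .7, .8, .9],
        'id': ['aa', 'aa', 'aa', 'bb', 'bb', 'bb']}
df = pd.DataFrame(data)


def scatter_per_id(df):
    """Hypothetical helper: draw one scatter figure per id, return {id: Figure}."""
    figures = {}
    for group_id, group in df.groupby('id'):
        fig, ax = plt.subplots()
        # Single brackets give 1-D Series, which scatter maps through the colormap.
        points = ax.scatter(group['x'], group['y'], c=group['c'],
                            s=10, alpha=0.3, cmap='viridis',
                            vmin=df['c'].min(), vmax=df['c'].max())
        fig.colorbar(points, ax=ax, label='c')
        ax.set_title(f'id = {group_id}')
        figures[group_id] = fig
    return figures


figures = scatter_per_id(df)
```

Call `plt.show()` afterwards to display the figures. With over 1000 ids you would probably rather save each figure with `fig.savefig(...)` and close it with `plt.close(fig)` inside the loop than keep them all open.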
I have a DataFrame full of unique values. I also have a dictionary containing values in the DataFrame and their unique RGB tuple associated with each value. I am trying to plot said values based on their location within the DataFrame and assign each location to its specific color.
You can use map() to create a column that holds the color assigned to each value, and then pass that column as the color argument when plotting:
colors_for_values = {1:'#FF0000', 2:'#00FF00', 3:'#0000FF'}
df['Column_with_Color'] = df['Column_with_Value'].map(colors_for_values)
df.plot.scatter(..., y='Column_with_Value', c='Column_with_Color')
Minimal working code
import pandas as pd
import matplotlib.pyplot as plt
data = {
    'A': [1,2,3,1,2,3],
    'B': [4,5,6,4,5,6],
    'C': [7,8,9,7,8,9]
}
df = pd.DataFrame(data)
#colors = {1:'r', 2:'g', 3:'b'}
#colors = {1:'red', 2:'green', 3:'blue'}
colors_for_values = {1:'#FF0000', 2:'#00FF00', 3:'#0000FF'}
df['Color'] = df['A'].map(colors_for_values)
df['Index'] = df.index
print(df)
#df.plot.scatter(x='Index', y='A', c=['r', 'g', 'b'])
df.plot.scatter(x='Index', y='A', c='Color')
plt.show()
EDIT:
Instead of your own colors, you can also use predefined colormaps:
import pandas as pd
import matplotlib.pyplot as plt
import random
random.seed(0) # to get always the same values
data = {
    'A': [random.randrange(100) for _ in range(100)],
}
df = pd.DataFrame(data)
df['Index'] = df.index
print(df)
df.plot.scatter(x='Index', y='A', c='A', cmap='hsv', title="colormap: 'hsv'")
df.plot.scatter(x='Index', y='A', c='A', cmap='plasma', title="colormap: 'plasma'")
plt.show()
You could also try to build your own colormap from your values, but I have never tried this myself.
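For what it's worth, one way to build such a colormap is `matplotlib.colors.ListedColormap`. The sketch below is only one possible approach: it reuses the `colors_for_values` dictionary from above, and the intermediate `codes` series (the position of each value in the sorted list of keys) is my own addition.

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.colors import ListedColormap

colors_for_values = {1: '#FF0000', 2: '#00FF00', 3: '#0000FF'}
df = pd.DataFrame({'A': [1, 2, 3, 1, 2, 3]})
df['Index'] = df.index

# One colormap entry per known value, in sorted order of the values.
values = sorted(colors_for_values)
cmap = ListedColormap([colors_for_values[v] for v in values])

# Translate each value into its position in `values`, i.e. its colormap index.
codes = df['A'].map({v: i for i, v in enumerate(values)})

fig, ax = plt.subplots()
# vmin/vmax are placed half a step outside the codes so each code
# falls in the middle of its own color band.
points = ax.scatter(df['Index'], df['A'], c=codes, cmap=cmap,
                    vmin=-0.5, vmax=len(values) - 0.5)
cbar = fig.colorbar(points, ax=ax, ticks=range(len(values)))
cbar.ax.set_yticklabels([str(v) for v in values])
```

Compared with the `map()` approach, this keeps a color bar that doubles as a legend for the discrete values.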
I have a dataframe like this with many more variants and values in each x and y list:
                   x         y
variant
*BCDS%q3rn  [45, 59]  [18, 14]
F^W#Bfr18   [82, 76]   [12, 3]
How can I iterate through each variant (each row has a unique string) and plot the x and y values in a scatterplot? This would result in ~40 plots, which is what I want so I can draw a relationship for each variant. Please advise. Thank you!
You can walk through the columns of a pandas DataFrame and plot them, either with the built-in plot function (which uses Matplotlib under the hood) or by calling Matplotlib directly:
import pandas as pd
import matplotlib.pyplot as plt
# create random test data
import numpy as np
df = pd.DataFrame(np.random.randint(0,10,size=(10, 4)), columns=['Col 1','Col 2','Col 4','Col 5'])
fig, axs = plt.subplots(1,2)
for col in df:
    # pandas plotting
    df[col].plot(ax=axs[0])
    # matplotlib plotting
    axs[1].plot(df[col])
axs[0].set_title('pandas plotting')
axs[1].set_title('matplotlib plotting')
plt.show()
You can convert each column into a list and iterate through them to plot. Below I have created line plots but you could easily convert the code to create scatter plots.
variants = df.index.values.tolist()
x_data = df["x"].to_numpy().tolist()
y_data = df["y"].to_numpy().tolist()
for idx in range(len(variants)):
    plt.plot(x_data[idx], y_data[idx], label=variants[idx])
plt.ylabel('Y')
plt.xlabel('X')
plt.legend()
plt.show()
For your sample data, the plot would look like this:
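Since the question asks for one scatter plot per variant, here is a rough sketch of that variation. It rebuilds the two sample rows from the question and assumes every cell holds a list of equal length for x and y. The function name `scatter_per_variant` is made up for the example.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Two sample rows copied from the question; the real frame has ~40 variants.
df = pd.DataFrame(
    {'x': [[45, 59], [82, 76]], 'y': [[18, 14], [12, 3]]},
    index=pd.Index(['*BCDS%q3rn', 'F^W#Bfr18'], name='variant'),
)


def scatter_per_variant(df):
    """Hypothetical helper: one scatter figure per row, return {variant: Figure}."""
    figures = {}
    for variant, row in df.iterrows():
        fig, ax = plt.subplots()
        # Each cell is a list, so it can be passed to scatter directly.
        ax.scatter(row['x'], row['y'])
        ax.set_xlabel('X')
        ax.set_ylabel('Y')
        ax.set_title(str(variant))
        figures[variant] = fig
    return figures


figures = scatter_per_variant(df)
```

Finish with `plt.show()` to see all figures, or save each one inside the loop if 40 windows are too many.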
I'd like to use a color map to show how often each point occurs (for example, (1,2) occurs 3 times) while still keeping my x-axis (i.e. df['A']).
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'A': [1,1,1,1,2,2,3,4,6,7,7],
                   'B': [2,2,2,3,3,4,5,6,7,8,8]})
plt.figure()
plt.scatter(df['A'], df['B'])
plt.show()
Here is my current plot
I'd like to keep the same axis I have, while adding the colormap. Hope I was being clear.
You can calculate the frequency of a certain value using the collections package.
freq_dic = collections.Counter(df["B"])
You then need to add these frequencies to your dataframe as a new column and pass two extra options (c and cmap) to the scatter plot. The colormap legend is displayed with plt.colorbar. This code is far from perfect, so any further improvements are very welcome.
import pandas as pd
import matplotlib.pyplot as plt
import collections
df = pd.DataFrame({'A': [1,1,1,1,2,2,3,4,6,7,7],
                   'B': [2,2,2,3,3,4,5,6,7,8,8]})
freq_dic = collections.Counter(df["B"])
for index, entry in enumerate(df["B"]):
    df.at[index, 'freq'] = (freq_dic[entry])
plt.figure()
plt.scatter(df['A'], df['B'],
c=df['freq'],
cmap='viridis')
plt.colorbar()
plt.show()
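One possible improvement: the code above counts how often each value of B occurs, which is not quite the frequency of a point. For example, B = 3 appears in both (1,3) and (2,3). If the goal is the frequency of each (A, B) pair, a sketch using a pandas groupby could look like this (the column name `freq` is kept from above):

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 1, 2, 2, 3, 4, 6, 7, 7],
                   'B': [2, 2, 2, 3, 3, 4, 5, 6, 7, 8, 8]})

# Size of each (A, B) group, broadcast back to every row of that group.
df['freq'] = df.groupby(['A', 'B'])['A'].transform('size')

fig, ax = plt.subplots()
points = ax.scatter(df['A'], df['B'], c=df['freq'], cmap='viridis')
fig.colorbar(points, ax=ax, label='frequency')
```

The axes stay exactly as in the original plot; only the color of each marker changes. Add `plt.show()` to display it.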
I have 2 questions
First
Is there a way to draw an unstacked bar plot of this data frame? I am getting "Empty 'DataFrame': no numeric data to plot".
df=pd.DataFrame({'midterm':['A','B','B','D'],'Final':['C','A','D','B']}, index=['math', 'sport', 'History', 'Physics'])
Second question:
I plot the data of the dataframe manually like this:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df=pd.DataFrame({'midterm':['A','B','B'],'Final':['C','A','D']}, index=['math', 'sport', 'History'])
fig, ax = plt.subplots()
index = np.asarray([1,10])
width=0.5
plt.bar(index, df.iloc[0,:], width,label='Math')
plt.bar(index+width, df.iloc[1,:], width, label='Sport')
plt.bar(index+2*width, df.iloc[2,:], width, label='History')
xticks=['midterm','final']
plt.xticks(index + width, xticks)
plt.legend()
plt.show()
What the code produces is here
This has two problems:
- First, A, B, C, D are not ordered.
- Second, the y axis starts at point 0,0, which makes the bar for grade C in this graph not visible at all.
What I aim to do is here
I will answer your second question. You should restrict your question to a single query. I would map the grades onto integers and then later set the y-ticks. To do so, I define a mapping dictionary and a function mapping which takes the alphabetical grades and converts them into integer values for bar plot height.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df=pd.DataFrame({'midterm':['A','B','B'],'Final':['C','A','D']}, index=['math', 'sport', 'History'])
map_dict = {'A': 1, 'B': 2, 'C': 3, 'D':4}
def mapping(keys):
    values = [map_dict[i] for i in keys]
    return values
fig, ax = plt.subplots()
index = np.asarray([1,10])
width=0.5
plt.bar(index, mapping(df.iloc[0,:]), width,label='Math')
plt.bar(index+width, mapping(df.iloc[1,:]), width, label='Sport')
plt.bar(index-width, mapping(df.iloc[2,:]), width, label='History')
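xticks=['midterm','final']
ax.set_xticks(index)  # the first bar of each group sits in the middle
ax.set_xticklabels(xticks)
ax.set_yticks(range(1, 5))
ax.set_yticklabels(list(map_dict.keys()))
plt.legend()
plt.show()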
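The same mapping idea might also cover the first question. The "no numeric data to plot" message appears because pandas cannot draw bars from letter grades, so a sketch like the following converts the grades first and then relabels the y ticks. It reuses `map_dict` from above; the `numeric` frame is my own intermediate name.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'midterm': ['A', 'B', 'B', 'D'], 'Final': ['C', 'A', 'D', 'B']},
                  index=['math', 'sport', 'History', 'Physics'])
map_dict = {'A': 1, 'B': 2, 'C': 3, 'D': 4}

# Convert every grade to its integer height, column by column.
numeric = df.apply(lambda col: col.map(map_dict))

fig, ax = plt.subplots()
numeric.plot.bar(ax=ax)  # grouped (unstacked) bars, one group per subject
ax.set_yticks(list(map_dict.values()))
ax.set_yticklabels(list(map_dict.keys()))
```

Here the subjects end up on the x axis. Plotting `numeric.T` instead would group the bars by midterm and Final, as in the manual version.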
I'm trying to plot a graph grouped by column values using a for loop without knowing the number of unique values in that column.
You can see sample code below (without a for loop) and the desired output.
I would like each plot to have a different color and marker (as seen below).
This is the code:
import pandas as pd
import matplotlib.pyplot as plt
from numpy import random
df = pd.DataFrame(data = random.randn(5,4), index = ['A','B','C','D','E'],
                  columns = ['W','X','Y','Z'])
df['W'] = ['10/01/2018 12:00:00','10/03/2018 13:00:00',
           '10/03/2018 12:30:00','10/04/2018 12:05:00',
           '10/08/2018 12:00:15']
df['W']=pd.to_datetime(df['W'])
df['Entity'] = ['C201','C201','C201','C202','C202']
print(df.head())
fig, ax = plt.subplots()
df[df['Entity']=="C201"].plot(x="W",y="Y",label='C201',ax=ax,marker='x')
df[df['Entity']=="C202"].plot(x="W",y="Y",label='C202',ax=ax, marker='o')
This is the output:
You can first find the unique values of df['Entity'] and then loop over them. To pick a new marker automatically for each Entity, you can define a cycle of markers (five in the answer below) that repeats once exhausted, and pull the next one with marker=next(marker).
Complete minimal answer
import itertools
import numpy as np
import pandas as pd
from numpy import random
import matplotlib.pyplot as plt
marker = itertools.cycle(('+', 'o', '*', '^', 's'))
df = pd.DataFrame(data = random.randn(5,4), index = ['A','B','C','D','E'],
                  columns = ['W','X','Y','Z'])
df['W'] = ['10/01/2018 12:00:00','10/03/2018 13:00:00',
           '10/03/2018 12:30:00','10/04/2018 12:05:00',
           '10/08/2018 12:00:15']
df['W']=pd.to_datetime(df['W'])
df['Entity'] = ['C201','C201','C201','C202','C202']
fig, ax = plt.subplots()
for idy in np.unique(df['Entity'].values):
    df[df['Entity']==idy].plot(x="W",y="Y", label=idy, ax=ax, marker=next(marker))
plt.legend()
plt.show()
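As a variation, the loop can also be driven by `groupby`, which hands you each entity together with its rows, so no boolean filtering is needed. This is only a sketch: the seeded generator and the sorting by `W` (so that each line runs left to right in time) are my own additions, and colors are left to Matplotlib's default color cycle.

```python
import itertools

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only to make the sketch reproducible
df = pd.DataFrame({
    'W': pd.to_datetime(['10/01/2018 12:00:00', '10/03/2018 13:00:00',
                         '10/03/2018 12:30:00', '10/04/2018 12:05:00',
                         '10/08/2018 12:00:15']),
    'Y': rng.standard_normal(5),
    'Entity': ['C201', 'C201', 'C201', 'C202', 'C202'],
})

markers = itertools.cycle(('+', 'o', '*', '^', 's'))

fig, ax = plt.subplots()
for entity, group in df.groupby('Entity'):
    group = group.sort_values('W')  # keep each line in chronological order
    # A new color per call comes from the default color cycle.
    ax.plot(group['W'], group['Y'], marker=next(markers), label=entity)
ax.legend()
```

This works for any number of entities; once there are more than five, the markers start repeating while the colors continue through the default cycle. Call `plt.show()` to display the result.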