I have sample data in a dataframe, as below:
Header=['Date','EmpCount','DeptCount']
2009-01-01,100,200
print(df)
Date EmpCount DeptCount
0 2009-01-01 100 200
Can we generate a scatter plot (or a line chart, etc.) with only this one record?
I tried multiple approaches, but I keep getting:
TypeError: no numeric data to plot
What I want is:
X axis: the dates
Y axis: two dots, one for EmpCount and the other for DeptCount
Try this:
import pandas as pd
import matplotlib.pyplot as plt
header=['Date','EmpCount','DeptCount']
df = pd.DataFrame([['2009-01-01',100,200]],columns=header)
b=df.set_index('Date')
ax = plt.plot(b, linewidth=3, markersize=10, marker='.')
What are you using to plot the scatter plot?
Here's how to do it with pyplot.
import pandas as pd
import matplotlib.pyplot as plt
header=['Date','EmpCount','DeptCount']
df = pd.DataFrame([['2009-01-01',100,200]],columns=header)
plt.scatter(*df.iloc[0][1:])
plt.show()
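iloc[0] gets the first row, [1:] takes every column except the first, and the * operator unpacks the two remaining values as the x and y arguments. Note that this draws a single point at (100, 200), not two dots over the date.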
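If the goal is the layout described in the question (dates on the x axis, one dot per count), a minimal sketch could look like the following. It assumes that converting the Date column with pd.to_datetime is acceptable and that one scatter call per column is good enough for a single record; neither choice comes from the answers above.

```python
import pandas as pd
import matplotlib.pyplot as plt

header = ['Date', 'EmpCount', 'DeptCount']
df = pd.DataFrame([['2009-01-01', 100, 200]], columns=header)
df['Date'] = pd.to_datetime(df['Date'])  # assumption: real dates are wanted on the x axis

fig, ax = plt.subplots()
for col in ['EmpCount', 'DeptCount']:
    # one scatter call per count column, both drawn at the same date
    ax.scatter(df['Date'], df[col], label=col)
ax.set_xlabel('Date')
ax.legend()
```

Call plt.show() afterwards to display the figure. The same loop works unchanged when the dataframe has more than one row.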
I have a dataframe with MultiIndex columns, the two levels being Sample and Lithology:
Sample 20EC-P 20EC-8 20EC-10-1 ... 20EC-43 20EC-45 20EC-54
Lithology Pd Di-Grd Gb ... Hbl Plag Pd Di-Grd Gb
Rb 7.401575 39.055118 6.456693 ... 0.629921 56.535433 11.653543
Ba 24.610102 43.067678 10.716841 ... 1.073115 58.520532 56.946630
Th 3.176471 19.647059 3.647059 ... 0.823529 29.647059 5.294118
I am trying to put it into a seaborn lineplot as such.
spider = sns.lineplot(data = data, hue = data.columns.get_level_values("Lithology"),
style = data.columns.get_level_values("Sample"),
dashes = False, palette = "deep")
The lineplot comes out as: (image not included)
I have two issues. First, I want to format hues by lithology and style by sample. Outside of the lineplot function, I can successfully access sample and lithology using data.columns.get_level_values, but in the lineplot they don't seem to do anything and I haven't figured out another way to access these values. Also, the lineplot reorganizes the x-axis by alphabetical order. I want to force it to keep the same order as the dataframe, but I don't see any way to do this in the documentation.
To use hue= and style=, seaborn prefers its dataframes in long form. pd.melt() stacks all the value columns into a single value column and adds one new column per column level (here Sample and Lithology) holding the old column names. The index also needs to be converted to a regular column (with .reset_index()).
Most seaborn functions use order= to set an order on the x-values, but lineplot does not have it; there, the way to fix the order is to make the x column categorical with an explicit category order.
from matplotlib import pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
column_tuples = [('20EC-P', 'Pd'), ('20EC-8', 'Di-Grd'), ('20EC-10-1', 'Gb'),
                 ('20EC-43', 'Hbl Plag Pd'), ('20EC-45', 'Di-Grd'), ('20EC-54', 'Gb')]
col_index = pd.MultiIndex.from_tuples(column_tuples, names=["Sample", "Lithology"])
data = pd.DataFrame(np.random.uniform(0, 50, size=(3, len(col_index))), columns=col_index, index=['Rb', 'Ba', 'Th'])
data_long = data.melt(ignore_index=False).reset_index()
data_long['index'] = pd.Categorical(data_long['index'], data.index) # make categorical, use order of the original dataframe
ax = sns.lineplot(data=data_long, x='index', y='value',
                  hue="Lithology", style="Sample", dashes=False, markers=True, palette="deep")
ax.set_xlabel('')
ax.legend(loc='upper left', bbox_to_anchor=(1.01, 1.02))
plt.tight_layout() # fit legend and labels into the figure
plt.show()
The long dataframe looks like:
index Sample Lithology value
0 Rb 20EC-P Pd 6.135005
1 Ba 20EC-P Pd 6.924961
2 Th 20EC-P Pd 44.270570
...
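To see why the categorical conversion preserves the dataframe's order, here is a small sketch on made-up values (the list named values is hypothetical, not taken from the data above). The category codes follow the order passed in, not alphabetical order, and that code order is what then determines the positions on the x axis.

```python
import pandas as pd

original_order = ['Rb', 'Ba', 'Th']   # order of the wide dataframe's index
values = ['Th', 'Rb', 'Ba', 'Rb']     # made-up long-form column, for illustration only

cat = pd.Categorical(values, categories=original_order)
print(list(cat.categories))  # ['Rb', 'Ba', 'Th']
print(cat.codes.tolist())    # [2, 0, 1, 0]
```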
I have a list of case and control samples along with the information about what characteristics are present or absent in each of them. A dataframe including the information can be generated by Pandas:
import pandas as pd
df={'Patient':[True,True,False],'Control':[False,True,False]} # Presence/absence data for three genes for each sample
df=pd.DataFrame(df)
df=df.transpose()
df.columns=['GeneA','GeneB','GeneC']
I need to visualize this data as a dot plot/scatter plot in which both the x and y axes are categorical and presence/absence is encoded by different marker shapes. Something like the following:
Patient| x x -
Control| - x -
__________________
GeneA GeneB GeneC
I am new to Matplotlib/seaborn and I can plot simple line plots and scatter plots. But searching online I could not find any instructions or plot similar to what I need here.
A quick way would be:
import pandas as pd
import matplotlib.pyplot as plt
df={'Patient':[1,1,0],'Control':[0,1,0]} # Presence/absence data for three genes for each sample
df=pd.DataFrame(df)
df=df.transpose()
df.columns=['GeneA','GeneB','GeneC']
heatmap = plt.imshow(df)
plt.xticks(range(len(df.columns.values)), df.columns.values)
plt.yticks(range(len(df.index)), df.index)
cbar = plt.colorbar(mappable=heatmap, ticks=[0, 1], orientation='vertical')
# vertically oriented colorbar
cbar.ax.set_yticklabels(['Absent', 'Present'])
Thanks to @DEEPAK SURANA for adding labels to the colorbar.
I searched the pyplot documentation and could not find a scatter or dot plot exactly like you described. Here is my take on creating a plot that illustrates what you want. The True records are blue and the False records are red.
# creating dataframe and extra column because index is not numeric
import pandas as pd
df={'Patient':[True,True,False],
'Control':[False,True,False]}
df=pd.DataFrame(df)
df=df.transpose()
df.columns=['GeneA','GeneB','GeneC']
df['level'] = [i for i in range(0, len(df))]
print(df)
# plotting the data
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(10,6))
for idx, gene in enumerate(df.columns[:-1]):
    # one colour per sample: blue if the gene is present, red if absent
    cList = ['blue' if x else 'red' for x in df[gene]]
    for inr_idx, lv in enumerate(df['level']):
        ax.scatter(x=idx, y=lv, c=cList[inr_idx], s=20)
fig.tight_layout()
plt.yticks([i for i in range(len(df.index))], list(df.index))
plt.xticks([i for i in range(len(df.columns)-1)], list(df.columns[:-1]))
plt.show()
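Since the question asks for presence/absence to be encoded by shape rather than colour, a variation on the scatter approach is sketched below. The marker choices ('x' and '_'), the legend placement and the inverted y axis (so that Patient ends up on top, as in the question's drawing) are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'Patient': [True, True, False],
                   'Control': [False, True, False]}).transpose()
df.columns = ['GeneA', 'GeneB', 'GeneC']

values = df.to_numpy(dtype=bool)        # rows = samples, columns = genes
rows, cols = np.indices(values.shape)   # integer grid positions for every cell

fig, ax = plt.subplots()
# one scatter call per state, each with its own marker shape
ax.scatter(cols[values], rows[values], marker='x', label='Present')
ax.scatter(cols[~values], rows[~values], marker='_', label='Absent')

# label the integer positions with the categorical names
ax.set_xticks(range(len(df.columns)))
ax.set_xticklabels(df.columns)
ax.set_yticks(range(len(df.index)))
ax.set_yticklabels(df.index)
ax.margins(0.3)
ax.invert_yaxis()                       # first row (Patient) at the top
ax.legend(loc='upper left', bbox_to_anchor=(1.01, 1))
fig.tight_layout()
```

Call plt.show() to display the result.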
Something like this might work:
import pandas as pd
import numpy as np
from matplotlib.ticker import FixedLocator
df={'Patient':[1,1,0],'Control':[0,1,0]} # Presence/absence data for three genes for each sample
df=pd.DataFrame(df)
df=df.transpose()
df.columns=['GeneA','GeneB','GeneC']
plot = df.T.plot()
loc = FixedLocator([0,1,2])
plot.xaxis.set_major_locator(loc)
plot.xaxis.set_ticklabels(df.columns)
See https://matplotlib.org/examples/pylab_examples/major_minor_demo1.html
and https://matplotlib.org/api/ticker_api.html
I think you have to convert the boolean values to zeros and ones to make it work, with something like df.astype(int).
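As a quick illustration of that conversion, the sketch below starts from the boolean dataframe in the question and casts it to integers; df_int is a name introduced here for the example.

```python
import pandas as pd

df = pd.DataFrame({'Patient': [True, True, False],
                   'Control': [False, True, False]}).transpose()
df.columns = ['GeneA', 'GeneB', 'GeneC']

# True becomes 1 and False becomes 0, so the numeric plotting code above can use it
df_int = df.astype(int)
print(df_int)
```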
I have two dataFrames that I would like to plot in a single graph. Here's some basic code:
#!/usr/bin/python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
scenarios = ['scen-1', 'scen-2']
for index, item in enumerate(scenarios):
    df = pd.DataFrame({'A' : np.random.randn(4)})
    print(df)
df.plot()
plt.ylabel('y-label')
plt.xlabel('x-label')
plt.title('Title')
plt.show()
However, this only plots the last dataFrame. If I use pd.concat() it plots one line with the combined values.
How can I plot two lines, one for the first dataFrame and one for the second one?
You need to put your plot in the for loop.
If you want them on a single plot, use plot's ax keyword argument so that both are drawn on the same axes. Here I have created fresh axes using subplots, but this could be axes that are already populated:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
scenarios = ['scen-1', 'scen-2']
fig, ax = plt.subplots()
for index, item in enumerate(scenarios):
    df = pd.DataFrame({'A' : np.random.randn(4)})
    print(df)
    df.plot(ax=ax)  # inside the loop, so every dataframe is drawn on the shared axes
plt.ylabel('y-label')
plt.xlabel('x-label')
plt.title('Title')
plt.show()
The plot function is only called once, and as you say this is with the last value of df. Put df.plot() inside the loop.
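With both dataframes using the column name 'A', the two legend entries are indistinguishable. One possible refinement, sketched here, is to name each column after its scenario so that the legend tells the lines apart. The seeded generator is only there to make the sketch repeatable and is not part of the original code.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

scenarios = ['scen-1', 'scen-2']
rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable

fig, ax = plt.subplots()
for item in scenarios:
    # the column name becomes the legend label of the line
    df = pd.DataFrame({item: rng.standard_normal(4)})
    df.plot(ax=ax)

ax.set_xlabel('x-label')
ax.set_ylabel('y-label')
ax.set_title('Title')
```

Call plt.show() once, after the loop, to display both lines together.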
I am trying to make a line plot in which every one of the elements from the index appears as an xtick.
import pandas as pd
ind = ['16-12', '17-01', '17-02', '17-03', '17-04',
'17-05','17-06', '17-07', '17-08', '17-09', '17-10', '17-11']
data = [1,3,5,2,3,6,4,7,8,5,3,8]
df = pd.DataFrame(data,index=ind)
df.plot(kind='line',x_compat=True)
However, the resulting plot skips every second element of the index.
My call to plot includes the x_compat=True parameter, which the pandas documentation suggests should stop the automatic tick configuration, but it seems to have no effect.
You need to set a tick locator on the x axis of the axes and then pass those axes when plotting.
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
ind = ['16-12', '17-01', '17-02', '17-03', '17-04',
'17-05','17-06', '17-07', '17-08', '17-09', '17-10', '17-11']
data = [1,3,5,2,3,6,4,7,8,5,3,8]
df = pd.DataFrame(data,index=ind)
ax2 = plt.axes()
ax2.xaxis.set_major_locator(ticker.MultipleLocator(1))
df.plot(kind='line', ax=ax2)
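An alternative sketch, not from the answer above, sets the tick positions and labels explicitly after plotting. It relies on pandas drawing a string index at integer positions 0, 1, 2, ..., which is how it handles non-numeric, non-date indexes in line plots.

```python
import pandas as pd
import matplotlib.pyplot as plt

ind = ['16-12', '17-01', '17-02', '17-03', '17-04',
       '17-05', '17-06', '17-07', '17-08', '17-09', '17-10', '17-11']
data = [1, 3, 5, 2, 3, 6, 4, 7, 8, 5, 3, 8]
df = pd.DataFrame(data, index=ind)

fig, ax = plt.subplots()
df.plot(kind='line', ax=ax, legend=False)

# one tick per row, labelled with the corresponding index entry
ax.set_xticks(range(len(df.index)))
ax.set_xticklabels(list(df.index))
```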
I am plotting one column of a pandas dataframe as a line plot, using plot():
df.iloc[:,1].plot()
and get the desired result:
Now I want to plot another column of the same dataframe as a bar chart using
ax=df.iloc[:,3].plot(kind='bar',width=1)
with the result:
And finally I want to combine both by
spy_price_data.iloc[:,1].plot(ax=ax)
which doesn't produce any plot.
Why are the x-ticks of the bar plot so different from the x-ticks of the line plot? How can I combine both plots in one plot?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Some sample data:
df = pd.DataFrame(np.random.randn(5,2))
print(df)
0 1
0 0.008177 -0.121644
1 0.643535 -0.070786
2 -0.104024 0.872997
3 -0.033835 0.067264
4 -0.576762 0.571293
Then we create an axes object (ax). Notice that we pass ax to both plots:
_, ax = plt.subplots()
df[0].plot(ax=ax)
df[1].plot(kind='bar', ax=ax)
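The example above works because its index is 0..4, which happens to match where pandas puts the bars: a bar plot always places them at integer positions 0, 1, 2, ..., whatever the index holds. If the dataframe in the question has a DatetimeIndex (an assumption, since the question does not show it), the line is drawn at date values instead and the two plots no longer line up. One way around this, sketched below with hypothetical price and volume columns, is to draw the line with use_index=False so that it uses the same integer positions as the bars.

```python
import pandas as pd
import matplotlib.pyplot as plt

# hypothetical data with a DatetimeIndex, for illustration only
idx = pd.date_range('2020-01-01', periods=5, freq='D')
df = pd.DataFrame({'price': [10.0, 12.0, 11.5, 13.0, 12.5],
                   'volume': [8, 15, 11, 5, 12]}, index=idx)

fig, ax = plt.subplots()
# use_index=False plots the line at positions 0..4 instead of at the dates
df['price'].plot(ax=ax, use_index=False, color='black')
# the bars are placed at positions 0..4 as well and supply the tick labels
df['volume'].plot(kind='bar', ax=ax, width=1, alpha=0.4)
```

Call plt.show() to display the combined plot. If the two columns have very different magnitudes, a second y axis (for example via ax.twinx()) may be needed.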