I am extremely new to coding, so I appreciate any help I can get.
I have a large data file that I want to create multiple plots for where the first column is the x axis for all of them. The code would ideally then iterate through all the columns with each respectively being the new y axis. I included my code for the individual plots, but want to create a loop to do it for all the columns.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
X = df[:,0]
col_1= df[:,1]
plt.plot(X,col_1)
plt.show()
col_2= df[:,2]
plt.plot(X,col_2)
plt.show()
Pandas will iterate over all the columns for you. Simply place the x column in the index and then call plot on your dataframe. Pandas uses the index as the x-axis, so there is no need to call matplotlib directly. Here is some fake data with a plot:
df = pd.DataFrame(np.random.rand(10,5), columns=['x', 'y1', 'y2', 'y3', 'y4'])
df = df.sort_values('x')
x y1 y2 y3 y4
9 0.262202 0.417279 0.075722 0.547804 0.599150
5 0.314894 0.611873 0.880390 0.282140 0.513770
8 0.406541 0.933734 0.879495 0.500626 0.527526
2 0.407636 0.550611 0.646449 0.635693 0.807088
1 0.437580 0.194937 0.501611 0.949575 0.409130
4 0.497347 0.443345 0.658259 0.457635 0.851847
3 0.500726 0.569175 0.304910 0.151071 0.678991
6 0.547433 0.512125 0.539995 0.701858 0.358552
0 0.783461 0.649381 0.320577 0.107062 0.840443
7 0.793702 0.951807 0.938635 0.526010 0.098321
df.set_index('x').plot(subplots=True)
You could loop through each column, plotting it on its own subplot, like so:
import matplotlib.pyplot as plt
fig, ax = plt.subplots(df.shape[1]-1, sharex=True)
for i in range(df.shape[1]-1):
    # df[:,0] only works on a NumPy array; a DataFrame needs .iloc for positional access
    ax[i].plot(df.iloc[:,0], df.iloc[:,i+1])
plt.show()
Edit:
I just realized your example was displaying one plot at a time. You could accomplish that like this:
import matplotlib.pyplot as plt
for i in range(df.shape[1]-1):
    # .iloc selects by position on a DataFrame; on a NumPy array use df[:,0] instead
    plt.plot(df.iloc[:,0], df.iloc[:,i+1])
    plt.show()
    plt.close()
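If the data is in a pandas DataFrame, the same one-figure-per-column loop can also be written with column labels instead of positions. This is only a minimal sketch, assuming the first column holds the x values; the helper name `plot_each_column` and the sample data are made up for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def plot_each_column(df):
    """Draw one figure per y column, using the first column as x (hypothetical helper)."""
    x_name = df.columns[0]
    figures = []
    for y_name in df.columns[1:]:
        fig, ax = plt.subplots()
        ax.plot(df[x_name], df[y_name])
        ax.set_xlabel(x_name)
        ax.set_ylabel(y_name)
        figures.append(fig)
    return figures


# Made-up sample data: one x column followed by three y columns
sample = pd.DataFrame(np.random.rand(10, 4), columns=['x', 'y1', 'y2', 'y3']).sort_values('x')
figures = plot_each_column(sample)
```

Calling `plt.show()` afterwards displays all the figures at once; returning the figures also makes it possible to save each one with `fig.savefig(...)`.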
I have sample data in dataframe as below
Header=['Date','EmpCount','DeptCount']
2009-01-01,100,200
print(df)
Date EmpCount DeptCount
0 2009-01-01 100 200
Can we generate a scatter plot (or a line chart, etc.) with only this one record?
I tried multiple approaches, but I keep getting
TypeError: no numeric data to plot
On the X axis: dates
On the Y axis: two dots, one for EmpCount and the other for DeptCount
Starting from @the-cauchy-criterion, try this:
import pandas as pd
import matplotlib.pyplot as plt
header=['Date','EmpCount','DeptCount']
df = pd.DataFrame([['2009-01-01',100,200]],columns=header)
b=df.set_index('Date')
# plt.plot returns a list of Line2D objects (one per column), not an Axes
lines = plt.plot(b, linewidth=3, markersize=10, marker='.')
plt.show()
What are you using to plot the scatter plot?
Here's how to do it with pyplot.
import pandas as pd
import matplotlib.pyplot as plt
header=['Date','EmpCount','DeptCount']
df = pd.DataFrame([['2009-01-01',100,200]],columns=header)
plt.scatter(*df.iloc[0][1:])
plt.show()
iloc[0] gets the first row, [1:] takes all the columns except the first, and the * operator unpacks them as arguments, so the call is equivalent to plt.scatter(100, 200).
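If the goal is what the question describes (the date on the x axis, with one dot for each count above it), one possible sketch is to call scatter once per count column. It relies on matplotlib treating the date string as a categorical x value; converting the column with `pd.to_datetime` first would be an alternative. The column names come from the question, and the rest is an assumption about the desired layout.

```python
import pandas as pd
import matplotlib.pyplot as plt

header = ['Date', 'EmpCount', 'DeptCount']
df = pd.DataFrame([['2009-01-01', 100, 200]], columns=header)

fig, ax = plt.subplots()
# One scatter call per count column, so each gets its own colour and legend entry
for column in ['EmpCount', 'DeptCount']:
    ax.scatter(df['Date'], df[column], label=column)
ax.set_xlabel('Date')
ax.legend()
```

Call `plt.show()` to display the figure.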
I have two DataFrames like the one below. They are exactly the same in terms of column names and structure; the difference is that one holds predictions and the other holds observed data. I need to plot a figure with subplots where each subplot title is a column name, the X-axis is the index values, and the Y-axis is the values in the table. I need two lines drawn in each subplot: one from the prediction DataFrame and one from the observed DataFrame. Below is what I have been trying, but it's not working.
08FB006 08FC001 08FC005 08GD004 08GD005
----------------------------------------------------------------
0 1.005910075 0.988765247 0.00500000 0.984376392 5.099999889
1 1.052696367 1.075232414 0.00535313 1.076066586 5.292135227
2 1.101749034 1.169026145 0.005731682 1.176162168 5.491832766
3 1.153183046 1.270744221 0.006137526 1.285419625 5.699405829
4 1.207119522 1.381030962 0.006572672 1.404662066 5.915181534
5 1.263686077 1.500580632 0.007039282 1.534784937 6.139501445
6 1.323017192 1.630141078 0.007539681 1.676762214 6.372722261
7 1.38525461 1.770517606 0.008076372 1.831653101 6.615216537
8 1.450547748 1.922577115 0.008652045 2.000609283 6.867373442
9 1.519054146 2.087252499 0.009269598 2.184882781 7.129599561
10 1.590939931 2.265547339 0.009932148 2.385834436 7.402319731
Sample code (I have 81 stations/columns):
import matplotlib.pyplot as plt
%matplotlib inline
fig, axes = plt.subplots(nrows=81, ncols=1)
newdf1.plot(ax=axes[0,0])
newdf1_pred.plot(ax=axes[0,1])
I set up a dummy dataframe to help visualize what I did.
import pandas as pd
import numpy as np
obs = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
pred = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
Plotting two dataframes on the same plot is no different than plotting two columns. You just have to include them on the same axis. Here is how I did it:
import matplotlib.pyplot as plt
%matplotlib inline
fig, axes = plt.subplots(figsize=(5,7), nrows=4, ncols=1)
for col in range(len(obs.columns)): # loop over the number of columns to plot
    # .iloc[:, col] selects a column; .iloc[col] would select a row instead
    obs.iloc[:, col].plot(ax=axes[col], label='observed') # plot observed data
    pred.iloc[:, col].plot(ax=axes[col], label='predicted') # plot predicted data
    axes[col].legend(loc='upper right') # legends to know which is which
    axes[col].set_title(obs.columns[col]) # titles to know which column/variable
plt.tight_layout() # just to make it easier to read the plots
Here was my output:
I hope this helps.
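With 81 stations, a single column of subplots gets very tall, so a grid may be easier to read. The following is a rough sketch, not a definitive solution: the helper `plot_obs_vs_pred`, its `ncols` parameter, and the random data are all made up for illustration, and it assumes both DataFrames share the same column names.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def plot_obs_vs_pred(obs, pred, ncols=3):
    """One subplot per shared column, with observed and predicted lines (hypothetical helper)."""
    n = len(obs.columns)
    nrows = -(-n // ncols)  # ceiling division
    fig, axes = plt.subplots(nrows=nrows, ncols=ncols, squeeze=False,
                             figsize=(4 * ncols, 2.5 * nrows))
    flat = axes.flatten()
    for ax, col in zip(flat, obs.columns):
        ax.plot(obs.index, obs[col], label='observed')
        ax.plot(pred.index, pred[col], label='predicted')
        ax.set_title(col)
        ax.legend(loc='upper right')
    # Hide any grid cells left over when n is not a multiple of ncols
    for ax in flat[n:]:
        ax.set_visible(False)
    fig.tight_layout()
    return fig, axes


# Made-up data using the station names from the question
stations = ['08FB006', '08FC001', '08FC005', '08GD004', '08GD005']
obs = pd.DataFrame(np.random.rand(11, 5), columns=stations)
pred = obs + np.random.normal(scale=0.05, size=obs.shape)
fig, axes = plot_obs_vs_pred(obs, pred, ncols=3)
```

For 81 columns, `ncols=9` would give a 9 x 9 grid; call `plt.show()` to display it.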
I have a list of case and control samples along with the information about what characteristics are present or absent in each of them. A dataframe including the information can be generated by Pandas:
import pandas as pd
df={'Patient':[True,True,False],'Control':[False,True,False]} # Presence/absence data for three genes for each sample
df=pd.DataFrame(df)
df=df.transpose()
df.columns=['GeneA','GeneB','GeneC']
I need to visualize this data as a dot plot/scatter plot in which both the x and y axes are categorical and presence/absence is coded by different shapes. Something like the following:
Patient| x x -
Control| - x -
__________________
GeneA GeneB GeneC
I am new to Matplotlib/seaborn and I can plot simple line plots and scatter plots. But searching online I could not find any instructions or plot similar to what I need here.
A quick way would be:
import pandas as pd
import matplotlib.pyplot as plt
df={'Patient':[1,1,0],'Control':[0,1,0]} # Presence/absence data for three genes for each sample
df=pd.DataFrame(df)
df=df.transpose()
df.columns=['GeneA','GeneB','GeneC']
heatmap = plt.imshow(df)
plt.xticks(range(len(df.columns.values)), df.columns.values)
plt.yticks(range(len(df.index)), df.index)
cbar = plt.colorbar(mappable=heatmap, ticks=[0, 1], orientation='vertical')
# vertically oriented colorbar
cbar.ax.set_yticklabels(['Absent', 'Present'])
Thanks to @DEEPAK SURANA for adding labels to the colorbar.
I searched the pyplot documentation and could not find a scatter or dot plot exactly like you described. Here is my take on creating a plot that illustrates what you want. The True records are blue and the False records are red.
# creating dataframe and extra column because index is not numeric
import pandas as pd
df={'Patient':[True,True,False],
'Control':[False,True,False]}
df=pd.DataFrame(df)
df=df.transpose()
df.columns=['GeneA','GeneB','GeneC']
df['level'] = [i for i in range(0, len(df))]
print(df)
# plotting the data
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(10,6))
for idx, gene in enumerate(df.columns[:-1]):
    # blue marks presence (True), red marks absence (False)
    cList = ['blue' if present else 'red' for present in df[gene]]
    for inr_idx, lv in enumerate(df['level']):
        ax.scatter(x=idx, y=lv, c=cList[inr_idx], s=20)
fig.tight_layout()
plt.yticks([i for i in range(len(df.index))], list(df.index))
plt.xticks([i for i in range(len(df.columns)-1)], list(df.columns[:-1]))
plt.show()
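Since the question asks for presence/absence to be coded by shape rather than colour, the same loop can pick a marker per cell instead. This is a sketch under assumptions: the marker choice ('x' for present, '_' for absent) simply mimics the ASCII drawing in the question, and the y axis is reversed so that the first row (Patient) appears on top.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'Patient': [True, True, False],
                   'Control': [False, True, False]}).transpose()
df.columns = ['GeneA', 'GeneB', 'GeneC']

markers = {True: 'x', False: '_'}  # assumed marker choice, not from the original answers

fig, ax = plt.subplots()
for row, sample in enumerate(df.index):
    for col, gene in enumerate(df.columns):
        present = bool(df.loc[sample, gene])
        ax.scatter(col, row, marker=markers[present], color='black', s=100)

# Label both categorical axes with the gene and sample names
ax.set_xticks(range(len(df.columns)))
ax.set_xticklabels(df.columns)
ax.set_yticks(range(len(df.index)))
ax.set_yticklabels(df.index)
ax.set_xlim(-0.5, len(df.columns) - 0.5)
ax.set_ylim(len(df.index) - 0.5, -0.5)  # reversed limits put the first row on top
```

Call `plt.show()` to display the figure.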
Something like this might work
import pandas as pd
import numpy as np
from matplotlib.ticker import FixedLocator
df={'Patient':[1,1,0],'Control':[0,1,0]} # Presence/absence data for three genes for each sample
df=pd.DataFrame(df)
df=df.transpose()
df.columns=['GeneA','GeneB','GeneC']
plot = df.T.plot()
loc = FixedLocator([0,1,2])
plot.xaxis.set_major_locator(loc)
plot.xaxis.set_ticklabels(df.columns)
Look at https://matplotlib.org/examples/pylab_examples/major_minor_demo1.html
and https://matplotlib.org/api/ticker_api.html
I think you have to convert the boolean values to zeros and ones to make it work, with something like df.astype(int).
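As a small illustration of that conversion, here is the boolean frame from the question turned into zeros and ones. The variable name `numeric` is made up for this sketch.

```python
import pandas as pd

df = pd.DataFrame({'Patient': [True, True, False],
                   'Control': [False, True, False]}).transpose()
df.columns = ['GeneA', 'GeneB', 'GeneC']

# True becomes 1 and False becomes 0, which numeric plotting functions can handle
numeric = df.astype(int)
print(numeric)
```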
I have two dataFrames that I would like to plot into a single graph. Here's a basic code:
#!/usr/bin/python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
scenarios = ['scen-1', 'scen-2']
for index, item in enumerate(scenarios):
    df = pd.DataFrame({'A' : np.random.randn(4)})
    print(df)
df.plot()
plt.ylabel('y-label')
plt.xlabel('x-label')
plt.title('Title')
plt.show()
However, this only plots the last dataFrame. If I use pd.concat() it plots one line with the combined values.
How can I plot two lines, one for the first dataFrame and one for the second one?
You need to put your plot call inside the for loop.
If you want them on a single plot, use plot's ax kwarg to draw them on the same axis. Here I have created a fresh axis using subplots, but this could be an already populated axis:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
scenarios = ['scen-1', 'scen-2']
fig, ax = plt.subplots()
for index, item in enumerate(scenarios):
    df = pd.DataFrame({'A' : np.random.randn(4)})
    print(df)
    df.plot(ax=ax)
plt.ylabel('y-label')
plt.xlabel('x-label')
plt.title('Title')
plt.show()
The plot function is only called once, and as you say this is with the last value of df. Put df.plot() inside the loop.
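When both DataFrames use the same column name, the two lines end up with the same legend label. One way around this, sketched here as an assumption rather than part of the original answers, is to name each column after its scenario so that pandas labels the lines differently.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

scenarios = ['scen-1', 'scen-2']
fig, ax = plt.subplots()
for item in scenarios:
    # Name the column after the scenario so each line gets its own label
    df = pd.DataFrame({item: np.random.randn(4)})
    df.plot(ax=ax)  # every call draws on the same axes
ax.set_xlabel('x-label')
ax.set_ylabel('y-label')
ax.set_title('Title')
```

Call `plt.show()` once after the loop to display the combined figure.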
Objective: To generate 100 barplots using a for loop, and display the output as a subplot image
Data format: Datafile with 101 columns. The last column is the X variable; the remaining 100 columns are the Y variables, against which x is plotted.
Desired output: Barplots in 5 x 20 subplot array, as in this example image:
Current approach: I've been using PairGrid in seaborn, which generates an n x 1 array,
where input == dataframe; input3 == list from which column headers are called:
for i in input3:
    plt.figure(i)
    g = sns.PairGrid(input,
                     x_vars=["key_variable"],
                     y_vars=i,
                     aspect=.75, size=3.5)
    g.map(sns.barplot, palette="pastel")
Does anyone have any ideas how to solve this?
To give an example of how to plot 100 dataframe columns over a grid of 5 x 20 subplots (5 rows, 20 columns):
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
data = np.random.rand(3,101)
data[:,0] = np.arange(2,7,2)
df = pd.DataFrame(data)
fig, axes = plt.subplots(nrows=5, ncols=20, figsize=(21,9), sharex=True, sharey=True)
for i, ax in enumerate(axes.flatten()):
    ax.bar(df.iloc[:,0], df.iloc[:,i+1])
    ax.set_xticks(df.iloc[:,0])
plt.show()
You can try using matplotlib's subplots to create the plot grid and pass each axis to the barplot. You could index the axes with a nested loop...
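The nested-loop indexing could look roughly like the sketch below. It uses matplotlib's own `ax.bar` with made-up data so that it runs without seaborn; with seaborn, the same loop could pass `ax=axes[r, c]` to `sns.barplot` instead. The column name `key_variable` comes from the question, while the category values 'a', 'b', 'c' and the random y values are assumptions.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up data: 100 y columns followed by the x column, as described in the question
n_rows, n_cols = 5, 20
df = pd.DataFrame(np.random.rand(3, n_rows * n_cols))
df['key_variable'] = ['a', 'b', 'c']

fig, axes = plt.subplots(nrows=n_rows, ncols=n_cols, figsize=(21, 9),
                         sharex=True, sharey=True)
for r in range(n_rows):
    for c in range(n_cols):
        y_col = df.columns[r * n_cols + c]  # walk through the y columns row by row
        axes[r, c].bar(df['key_variable'], df[y_col])
        axes[r, c].set_title(str(y_col), fontsize=6)
```

Call `plt.show()` to display the 5 x 20 grid.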