How to create a scatter plot for each dataframe column - python

I am trying to write some code to create an animation of scatter plot data through time. To do this I have a dataset with multiple columns, where each column represents a numbered timestep.
I would like the code to cycle through each timestep column for the y axis and use a constant x axis, so that a separate scatter plot is generated for each timestep. I tried to do this by coding a for loop that specifies an incrementing column number for the y axis.
My current code generates three out of seven scatter plots in my sample data but returns the following error:
IndexError: index 9 is out of bounds for axis 0 with size 9
I have tried other similar solutions on Stack Overflow, but they did not fix my problem.
The data is here if anyone wants to use what I am using: https://www.dropbox.com/s/7vwa0lud44td2ak/test_splot_anim_noTS.csv?dl=0 (data file)
Any help or advice would be much appreciated.
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
data = pd.read_csv("test_splot_anim_noTS.csv")
for n in range(6, 13):
    data.plot(kind='scatter', x='metres', y=n)
    plt.ylim(-4, 4)
    plt.savefig('n.jpeg')

data = pd.read_csv("test_splot_anim_noTS.csv")
for column in data.columns[1:]:
    data.plot(kind='scatter', x='metres', y=column)
    plt.ylim(-4, 4)
    plt.savefig('{}.jpeg'.format(column))
I may have done it!
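For anyone wondering why the original loop stopped after three plots: a likely explanation, assuming the CSV has nine columns with string labels, is that pandas treats an integer y as a column position, so positions 6, 7 and 8 work and position 9 is out of range. Note also that 'n.jpeg' is a fixed string, so every figure would be written to the same file. The sketch below iterates over the column labels instead. The synthetic data, the timestep_ file prefix, the temporary output folder and the PNG format are all assumptions made for illustration, not part of the original dataset.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the CSV: one x column plus seven timestep columns.
rng = np.random.default_rng(0)
data = pd.DataFrame({"metres": np.arange(10)})
for step in range(6, 13):
    data[str(step)] = rng.uniform(-3, 3, size=len(data))

out_dir = tempfile.mkdtemp()  # hypothetical output folder
saved = []
for column in data.columns.drop("metres"):
    # Refer to the column by label, not by an integer position.
    ax = data.plot(kind="scatter", x="metres", y=column)
    ax.set_ylim(-4, 4)
    path = os.path.join(out_dir, f"timestep_{column}.png")
    ax.figure.savefig(path)
    plt.close(ax.figure)  # release the figure before the next iteration
    saved.append(path)
```

Closing each figure inside the loop keeps memory use flat when there are many timesteps.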

pandas.DataFrame.plot, single line plot:
data=pd.read_csv("test_splot_anim_noTS.csv")
data.set_index('metres', drop=True, inplace=True)
data.plot()
With matplotlib, single plot with all columns:
import matplotlib.pyplot as plt
plt.plot(data)
plt.show()
Separate scatter plots, files saved:
for col in data.columns:
    plt.scatter(data.index, data[col])
    plt.ylim(-4, 4)
    plt.savefig(f'{col}.jpeg')
    plt.show()
With Seaborn:
import seaborn as sns
for col in data.columns:
    # Recent seaborn releases require x and y as keyword arguments
    sns.scatterplot(x=data.index, y=data[col])
    plt.ylim(-4, 4)
    plt.savefig(f'{col}.jpeg')
    plt.show()
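Since the stated goal was an animation rather than a folder of images, here is a minimal sketch using matplotlib.animation.FuncAnimation. It assumes the frame has 'metres' as its index and one column per timestep, as in the snippets above. The synthetic values, the update function name and the commented-out GIF filename are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.animation import FuncAnimation

# Synthetic stand-in for the CSV, indexed by 'metres' with seven timestep columns.
rng = np.random.default_rng(0)
data = pd.DataFrame(
    rng.uniform(-3, 3, size=(10, 7)),
    columns=[str(n) for n in range(6, 13)],
    index=pd.Index(np.arange(10), name="metres"),
)

fig, ax = plt.subplots()
ax.set_xlim(data.index.min() - 1, data.index.max() + 1)
ax.set_ylim(-4, 4)
ax.set_xlabel("metres")
points = ax.scatter(data.index, data.iloc[:, 0])

def update(frame):
    """Move the existing scatter points to the values of one timestep column."""
    col = data.columns[frame]
    points.set_offsets(np.column_stack([data.index, data[col]]))
    ax.set_title(f"timestep {col}")
    return (points,)

anim = FuncAnimation(fig, update, frames=len(data.columns), interval=500)
# anim.save("timesteps.gif", writer="pillow")  # uncomment to write a GIF
```

Updating one scatter artist in place avoids creating a new figure per timestep.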

Related

How do I make subplots with this method

I'm trying to make a subplot of histograms for each of the features in the dataset.
The following code is what I have already tried. Consider the train dataset, which has 9 columns that I want plotted in a 3x3 grid of subplots.
import matplotlib.pyplot as plt
fig, ax = plt.subplots(nrows=3, ncols=3)
i = 0
for row in ax:
    for col in row:
        train.iloc[:, i].hist()
        i = i + 1
I'm getting all histograms in the last subplot.
Here is my suggestion:
import random
import matplotlib.pyplot as plt
for i in range(1, 10):
    # Cut your figure into 3 rows and 3 columns
    # and create the plot in the i-th subplot.
    plt.subplot(3, 3, i)
    # Placeholder data: 100 random integers between 0 and 9
    plt.hist([random.randrange(0, 10) for _ in range(100)])
plt.show()
You can find more ideas at this amazing website: The Python Graph Gallery
pandas.DataFrame.hist can take an ax parameter which is the Matplotlib axes to use.
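To make the ax suggestion concrete, here is a minimal sketch that pairs each column with one axes from the 3x3 grid and passes it to Series.hist. The train frame here is synthetic and the feature_ column names are made up for illustration. Substitute your own DataFrame.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the train dataset: 9 numeric columns.
rng = np.random.default_rng(0)
train = pd.DataFrame(
    rng.normal(size=(200, 9)),
    columns=[f"feature_{i}" for i in range(9)],
)

fig, axes = plt.subplots(nrows=3, ncols=3, figsize=(9, 9))
# axes.flat walks the 3x3 grid row by row, one axes per column.
for ax, name in zip(axes.flat, train.columns):
    train[name].hist(ax=ax)  # draw on this specific axes, not the current one
    ax.set_title(name)
fig.tight_layout()
```

Without ax=..., each hist call draws on the current axes, which is the last one created. That would explain why every histogram landed in the final subplot.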

Display all x values of a graph

I know it has already been asked, but I could not solve my problem.
I have three pandas columns: one with dates, and the other two with values.
I can get my graph with the two curves depending on date.
However, I cannot display all dates in the x axis. Can you help me?
import pandas as pd
import matplotlib.pyplot as plt
# mau_file is the pandas dataframe with three columns.
plt.figure()
mau_file.plot(x='month_date', y=['mau', 'nb_migs'], figsize=(10,5), grid=True)
plt.set_xticklabels(mau_file['month_date'])
plt.legend(loc='best')
plt.show()
Usually, plt.xticks() is used to display x axis values.
I am not sure it is fully compatible with a pandas structure, so you may need to store your data in a plain list or a NumPy array.
Documentation of plt.xticks()
EDIT: It is possible to choose the orientation of the labels.
For example, plt.xticks(x, labels, rotation='vertical') will give you vertical labels.
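A minimal sketch of the plt.xticks approach, assuming monthly data: plot against integer positions and label every position with its formatted date. The mau_file values below are synthetic and the "%Y-%m" label format is an assumption. Plotting by position spaces the points evenly, which is only appropriate when the dates are regularly spaced.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for mau_file: one date column and two value columns.
rng = np.random.default_rng(0)
mau_file = pd.DataFrame({
    "month_date": pd.date_range("2020-01-01", periods=12, freq="MS"),
    "mau": rng.integers(100, 200, size=12),
    "nb_migs": rng.integers(10, 50, size=12),
})

positions = np.arange(len(mau_file))
labels = mau_file["month_date"].dt.strftime("%Y-%m").tolist()

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(positions, mau_file["mau"], label="mau")
ax.plot(positions, mau_file["nb_migs"], label="nb_migs")
# One tick per row, each labelled with its date, rotated to avoid overlap.
plt.xticks(positions, labels, rotation="vertical")
ax.grid(True)
ax.legend(loc="best")
fig.tight_layout()
```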

Pandas: Number of passed axes does not match the number of columns to plot

I want to plot three columns of a pandas data frame to a 2x2 subplot layout:
rand=np.random.random((12,6))
df=pd.DataFrame(columns=['a','b','c','d','e','f'],data=rand)
ax=df.loc[:,'a':'c'].plot(subplots=True,layout=(2,2))
This gives me the desired result (with an empty plot in the lower right corner). However, if I try to add another column to all of these subplots using
df.loc[:,'d':'f'].plot(subplots=True,ax=ax)
the following error occurs:
ValueError: The number of passed axes must be 3, the same as the output plot
Is there any solution to this problem without having to rearrange the layout of the subplots?
In case of
ax=df.plot(subplots=True,layout=(2,2))
ax is a 2x2 array with 4 elements. Those are the 4 axes, where the last one is invisible if df only contains 3 columns. When plotting again 3 columns you need to supply 3 axes, not all 4. An option is to index the ax array to the number of columns to use.
df2.plot(subplots=True,ax=ax.flatten()[:3])
Complete example:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
rand=np.random.random((12,6))
df=pd.DataFrame(columns=['a','b','c','d','e','f'],data=rand)
ax=df.loc[:,'a':'c'].plot(subplots=True,layout=(2,2))
df.loc[:,'d':'f'].plot(subplots=True,ax=ax.flatten()[:3])
plt.show()

Create a checkerboard plot with unbalanced rows and columns

I have a dataset similar to this format X = [[1,4,5], [34,70,1,5], [43,89,4,11], [22,76,4]] where the lengths of the inner lists are not equal.
I want to create a checkerboard plot of 4 rows and 4 columns where the color of each unit box corresponds to the value of the number. In this dataset some small boxes will be missing (e.g. 4th column, first row).
How would I plot this in python using matplotlib?
Thanks
You can use the seaborn library or matplotlib to generate a heatmap. First, convert the data to a pandas DataFrame to handle the missing values.
import pandas as pd
df = pd.DataFrame([[1,4,5],[34,70,1,5], [43,89,4,11],[22,76,4]])
%matplotlib inline
from matplotlib import pyplot as plt
import seaborn as sns
sns.heatmap(df)
plt.show()
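For a matplotlib-only variant, here is a minimal sketch: building the DataFrame pads the shorter rows with NaN, and masking those cells leaves them blank in the grid. The white cell borders and the colorbar label are styling assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

X = [[1, 4, 5], [34, 70, 1, 5], [43, 89, 4, 11], [22, 76, 4]]
df = pd.DataFrame(X)  # shorter rows are padded with NaN

# Mask the NaN cells so they are left undrawn.
values = np.ma.masked_invalid(df.to_numpy(dtype=float))

fig, ax = plt.subplots()
mesh = ax.pcolormesh(values, edgecolors="white", linewidth=1)
ax.invert_yaxis()        # put the first row at the top, like a table
ax.set_aspect("equal")   # square cells for a checkerboard look
fig.colorbar(mesh, ax=ax, label="value")
```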

matplotlib.pyplot issue with subplot, np.ones and np.arange?

I am plotting several data types which share the x axis, so I am using the matplotlib.pyplot subplots command.
The shared x axis is time (in years AD). The last subplot I have is the number of independent observations as a function of the time. I have the following code
import numpy as np
import matplotlib.pyplot as plt
#
# There's a bunch of data analysis here
#
f, ax = plt.subplots(4, sharex=True)
# Here I plot the first 3 subplots with no issue
x = np.arange(900, 2000, 1)  # make x array in steps of 1
ax[3].plot(x[0:28], np.ones(len(x[0:28])), 'k')  # one observation from 900-927 AD
ax[3].plot(x[28:62], 2*np.ones(len(x[28:62])), 'k')  # two observations from 928-961 AD
Now when I run this code, the subplot I get only shows the second ax[3] plot and not the first. How can I fix this? Thanks
OK, I think I found an answer. The first plot was being drawn, but I couldn't see it against the axes, so I changed the y limits:
ax[3].set_ylim([0, 7])
That seemed to work. Is there a way to connect these horizontal lines, perhaps with dashed lines?
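One possible way to join the levels is sketched below: draw each period as a solid horizontal segment, then connect the end of one segment to the start of the next with a dashed line. The segments list is a hypothetical encoding of the two periods in the question, and a single axes stands in for ax[3].

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# (start year, end year, number of observations), taken from the question's comments
segments = [(900, 927, 1), (928, 961, 2)]

fig, ax = plt.subplots()
for start, end, n_obs in segments:
    ax.hlines(n_obs, start, end, colors="k")  # solid horizontal level

# Dashed connector from the end of each segment to the start of the next.
for (_, end, n0), (start, _, n1) in zip(segments, segments[1:]):
    ax.plot([end, start], [n0, n1], "k--")

ax.set_ylim(0, 7)
ax.set_xlabel("Year (AD)")
ax.set_ylabel("Independent observations")
```

If a single stepped line is acceptable, ax.step with where="post" is another option, though it draws the whole line in one style.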
