I know it has already been asked, but I could not solve my problem.
I have three pandas column, One with dates, and other with values.
I can get my graph with the two curves depending on date.
However, I cannot display all dates in the x axis. Can you help me?
import pandas as pd
import matplotlib.pyplot as plt
# mau_file is the pandas dataframe with three columns.
plt.figure()
mau_file.plot(x='month_date', y=['mau', 'nb_migs'], figsize=(10,5), grid=True)
plt.set_xticklabels(mau_file['month_date'])
plt.legend(loc='best')
plt.show()
Usually, plt.xticks() is used to display x axis values.
As I'm not sure it is 100% compatible with a pandas structure, you may need to store your data in a classical table or a numpy array.
Documentation of plt.xticks()
EDIT : It is possible to chose the orientation of the labels.
For exemple plt.xticks(x, labels, rotation='vertical') will give you vertical labels.
Related
I am new to data visualization, so please bear with me.
I am trying to create a data plot that describes various different attributes on a data set on blockbuster movies. The x-axis will be year of the movie and the y-axis will be worldwide gross. Now, some movies have made upwards of a billion in this category, and it seems that my y axis is overwhelmed as it completely blocks out the numbers and becomes illegible. Here is what I have thus far:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv('blockbusters.csv')
fig, ax = plt.subplots()
ax.set_title('Top Grossing Films')
ax.set_xlabel('Year')
ax.set_ylabel('Worldwide Grossing')
x = df['year'] #xaxis
y = df['worldwide_gross'] #yaxis
plt.show()
Any tips on how to scale this down? Ideally it could be presented on a scale of 10. Thanks in advance!
You could try logarithmic scaling:
ax.set_yscale('log')
You might want to manually set the ticks on the y-axis using
ax.set_yticks([list of values for which you want to have a tick])
ax.set_yticklabels([list of labels you want on each tick]) # optional
Another way to approach this might be to rank the movies (which gross is the highest, second highest, ...), i.e. on the y axis you would plot
df['worldwide_gross'].rank()
Edit: as you indicate, one might also check the dtypes to make sure the data is numerical. If not, use .astype(int) or .astype(float) to convert it.
I am trying to write some code in order to create an animation of scatter plot data through tine. In order to do this I have a dataset with multiple columns where each column represents a numbered timestep.
I would like the code to cycle through each timestep column for the y axis and use a constant x axis, so that a separate scatter plot is generated for each timestep. I tried to do this by coding a for loop that specifies an incrementing column number for the y axis.
My current code generates three out of seven scatter plots in my sample data but returns the following error:
IndexError: index 9 is out of bounds for axis 0 with size 9
I have tried other similar solutions on stack overflow but that didn't correct my problem.
The data is here if anyone wants to use what I am using: https://www.dropbox.com/s/7vwa0lud44td2ak/test_splot_anim_noTS.csv?dl=0data file
Any help or advice would be much appreciated.
import numpy as np
import pandas a pd
import matplotlib as mpl
import matplotlib.pyplot as plt
data=pd.read_csv("test_splot_anim_noTS.csv")
for n in range (6, 13):
data.plot(kind='scatter', x='metres', y=n)
plt.ylim(-4,4)
plt.savefig('n.jpeg')
data=pd.read_csv("test_splot_anim_noTS.csv")
for column in data.columns[1:]:
data.plot(kind='scatter', x='metres',y=column)
plt.ylim(-4,4)
plt.savefig('{}.jpeg'.format(column))
I may have done it!
panda.DataFrame.plot, single line plot
data=pd.read_csv("test_splot_anim_noTS.csv")
data.set_index('metres', drop=True, inplace=True)
data.plot()
With matplotlib, single plot with all columns:
import matplotlib.pyplot as plt
plt.plot(data)
plt.show()
Separate scatter plots, files saved:
for col in data.columns:
plt.scatter(data.index, data[col])
plt.ylim(-4, 4)
plt.savefig(f'{col}.jpeg')
plt.show()
With Seaborn:
for col in data.columns:
sns.scatterplot(data.index, data[col])
plt.ylim(-4,4)
plt.savefig(f'{col}.jpeg')
plt.show()
When I plot single plots with panda dataframes I have an x-axis.
However, when I make a subplot and try to make a shared x-axis the way I would when using numpy arrays without pandas, there are no numbers labels
I only want the numbers and label to appear on the last plot as they share the same x-axis.
The data loaded and the plot produced can be found here:
https://drive.google.com/open?id=1hTmTSkIcYl-usv_CCxLl8U6bAoO6tMRh
This is for combining and plotting the data logged from two different logging devices which represent the same time period.
import pandas as pd
import matplotlib.pyplot as plt
df1=pd.read_csv('data1.csv', sep=',',header=0)
df1.columns.values
cols1 = list(df1.columns.values)
df2=pd.read_csv('data2.dat', sep='\t',header=18)
df2.columns.values
cols2 = list(df2.columns.values)
start =10000
stop = 30000
fig, axes = plt.subplots(nrows=5, ncols=1, sharex=True, figsize=(10, 10))
df1.iloc[start:stop].plot(x=cols1[0], y=cols1[1], ax=axes[0])
df1.iloc[start:stop].plot(x=cols1[0], y=cols1[2], ax=axes[0])
df1.iloc[start:stop].plot(x=cols1[0], y=cols1[3], ax=axes[2])
df1.iloc[start:stop].plot(x=cols1[0], y=cols1[4], ax=axes[2])
df2.iloc[start:stop].plot(x=cols2[0], y=cols2[3], ax=axes[3])
ax3.set_xlabel("Time [s]")
plt.show()
I expect there to be numbers and a label on the x-axis but instead, it only gives the pandas label "#timestamp"
UPDATE: I have found something that hints at the problem. I think the problem is due to the two files not having identical time spacing, the first column of each file is time, they are roughly 1 sample per second but not exactly. If I remove the x=cols[x] parts it then shows numbers on the x-axis but then there is a shift in time between the two plots as they are not plotting against time but rather against the index in the dataframe.
I am currently trying to interpolate the data so that they have the same x-axis but I would not have expected that to be necessary.
I have a data frame as follow:
and I am trying to plot a histogram from it such that the letters {A,B,C,D} are in the x axis and y axis shows the numbers. I have tried the following:
df.plot(kind='hist')
for which I get the address instead of the plot, i.e:
<matplotlib.axes._subplots.AxesSubplot at 0x11217d5f8>
I was wondering how can I show the plot?
IIUC, I think you need to transpose the dataframe to get index ['A','B','C','D']as x-axis and then plot. Also use plt.show() to display the histogram. The latest version of pandas will display directly the plot with axes object displaying. But, for the older versions need to explicitly write the plt.show() code to display.
import matplotlib.pyplot as plt
df.T.plot(kind='hist')
plt.show()
I'm plotting a scatter plot using a pandas dataframe. This works correctly, but I wanted to use seaborn themes and specials functions. When I plot the same data points calling seaborn, the y-axis remains almost invisible. X-axis values ranges from 5000-15000, while y-axis values are in [-6:6]*10^-7.
If I multiply the y-axis values by 10^6, they display correctly, but the actual values when plotted using seaborn remains invisible/indistinguishable in a seaborn generated plot.
How can I seaborn so that the y-axis values scale automatically in the resultant plot?
Also some rows even contain NaN, not in this case, how to disregard that while plotting, short of manually weeding out rows containing NaN.
Below is the code I've used to plot.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.read_csv("datascale.csv")
subdf = df.loc[(df.types == "easy") & (df.weight > 1300), ]
subdf = subdf.iloc[1:61, ]
subdf.drop(subdf.index[[25]], inplace=True) #row containing NaN
subdf.plot(x='length', y='speed', style='s') #scales y-axis correctly
sns.lmplot("length", "speed", data=subdf, fit_reg=True, lowess=True) #doesn't scale y-axis properly
# multiplying by 10^6 displays the plot correctly, in matplotlib
plt.scatter(subdf['length'], 10**6*subdf['speed'])
Strange that seaborn does not scale the axis correctly. Nonetheless, you can correct this behaviour. First, get a reference to the axis object of the plot:
lm = sns.lmplot("length", "speed", data=subdf, fit_reg=True)
After that you can manually set the y-axis limits:
lm.axes[0,0].set_ylim(min(subdf.speed), max(subdf.speed))
The result should look something like this:
Example Jupyter notebook here.
Seaborn and matplotlib should just ignore NaN values when plotting. You should be able to leave them as is.
As for the y scaling: there might be a bug in seaborn.
The most basic workaround is still to scale the data before plotting.
Scale to microspeed in the dataframe before plotting and plot microspeed instead.
subdf['microspeed']=subdf['speed']*10**6
Or transform to log y before plotting, i.e.
import math
df = pd.DataFrame({'speed':[1, 100, 10**-6]})
df['logspeed'] = df['speed'].map(lambda x: math.log(x,10))
then plot logspeed instead of speed.
Another approach would be to use seaborn regplot instead.
Matplot lib correctly scales and plots for me as follows:
plt.plot(subdf['length'], subdf['speed'], 'o')