I want to change the x-axis to years. The years are saved in the variable years.
I want to make plot of my data that looks like this:
It should look like this image
However, I am not able to create an x-axis that shows the years. My plot looks like the following image:
This is an example of produced image by my code
My code looks as follows:
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv("data1.csv")
demand = data["demand"]
years = data["year"]
plt.plot( demand, color='black')
plt.xlabel("Year")
plt.ylabel("Demand (GW)")
plt.show()
I am thankful for any advice.
The plot method in your example does not know the scaling of your data. When it is called with only the demand values, it uses the position of each value (0, 1, 2, ...) as the x coordinate, so the values are treated as being one unit apart from each other. If you want your x-axis to represent years, you have to tell matplotlib how many values of demand make up "one year". If your data is monthly demand, that is 12 values per year. For example:
import numpy as np
import matplotlib.pyplot as plt
# setup a figure
fig, (ax1, ax2) = plt.subplots(2)
# generate some random data
data = np.random.rand(100)
# plot undesired way
ax1.plot(data)
# change the tick positions and labels ...
ax2.plot(data)
# ... to one label every 12th value
xticks = np.arange(0,100,12)
# ... start counting in the year 2000
xlabels = range(2000, 2000+len(xticks))
ax2.set_xticks(xticks)
ax2.set_xticklabels(xlabels)
plt.show()
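Since the question already has a year column, another option is to use it for the x coordinates directly instead of relabelling ticks. The following is only a sketch under assumptions: the DataFrame is a made-up stand-in for data1.csv, and it assumes 12 monthly rows per year, which the question does not confirm.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove it to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for data1.csv: 12 monthly demand values per year
year_column = np.repeat(np.arange(2000, 2005), 12)
data = pd.DataFrame({"year": year_column,
                     "demand": np.random.rand(len(year_column))})

# Position of each row within its year (0..11), used to spread the
# months between one year tick and the next
month_in_year = data.groupby("year").cumcount()
x = data["year"] + month_in_year / 12

fig, ax = plt.subplots()
ax.plot(x, data["demand"], color="black")
# One tick per distinct year
ax.set_xticks(sorted(data["year"].unique()))
ax.set_xlabel("Year")
ax.set_ylabel("Demand (GW)")
```

If the data has only one row per year, plt.plot(years, demand) is enough and the fractional offset is not needed.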
I'm struggling to wrap my head around matplotlib with dataframes today. I see lots of solutions but I'm struggling to relate them to my needs. I think I may need to start over. Let's see what you think.
I have a dataframe (ephem) with 4 columns - Time, Date, Altitude & Azimuth.
I produce a scatter for alt & az using:
chart = plt.scatter(ephem.Azimuth, ephem.Altitude, marker='x', color='black', s=8)
What's the most efficient way to set the values in the Time column as the labels/ticks on the x axis?
So:
the scale/gridlines etc all remain the same
the chart still plots alt and az
the y axis ticks/labels remain as is
only the x axis ticks/labels are changed to the Time column.
Thanks
This isn't by any means the cleanest piece of code but the following works for me:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.scatter(ephem.Azimuth, ephem.Altitude, marker='x', color='black', s=8)
labels = list(ephem.Time)
ax.set_xticklabels(labels)
plt.show()
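This explicitly sets the x tick labels from the Time column of your dataframe. Note that set_xticklabels only relabels the existing tick positions in order, so the labels are not tied to the data points unless the tick positions are set as well.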
In other words, you want to change the x-axis tick labels using a list of values.
labels = ephem.Time.tolist()
# make your plot and before calling plt.show()
# insert the following two lines
ax = plt.gca()
ax.set_xticklabels(labels = labels)
plt.show()
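One way to keep each label attached to its data point is to set the tick positions and the labels together. This is a sketch, not a drop-in solution: the ephem values below are invented, and it assumes Azimuth changes steadily with Time so that labelling azimuth positions with times makes sense.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove it to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical ephemeris data; the real frame also has a Date column
ephem = pd.DataFrame({
    "Time": ["20:00", "21:00", "22:00", "23:00", "00:00"],
    "Azimuth": [90.0, 110.0, 135.0, 160.0, 185.0],
    "Altitude": [10.0, 25.0, 38.0, 45.0, 47.0],
})

fig, ax = plt.subplots()
ax.scatter(ephem.Azimuth, ephem.Altitude, marker="x", color="black", s=8)

# Put a tick at every 2nd azimuth value and label it with the matching time
step = 2
ax.set_xticks(ephem.Azimuth.iloc[::step].to_numpy())
ax.set_xticklabels(ephem.Time.iloc[::step].tolist())
```

The y axis is left untouched, and step controls how many of the Time values are shown.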
I am new to data visualization, so please bear with me.
I am trying to create a data plot that describes various different attributes on a data set on blockbuster movies. The x-axis will be year of the movie and the y-axis will be worldwide gross. Now, some movies have made upwards of a billion in this category, and it seems that my y axis is overwhelmed as it completely blocks out the numbers and becomes illegible. Here is what I have thus far:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv('blockbusters.csv')
fig, ax = plt.subplots()
ax.set_title('Top Grossing Films')
ax.set_xlabel('Year')
ax.set_ylabel('Worldwide Grossing')
x = df['year'] #xaxis
y = df['worldwide_gross'] #yaxis
plt.show()
Any tips on how to scale this down? Ideally it could be presented on a scale of 10. Thanks in advance!
You could try logarithmic scaling:
ax.set_yscale('log')
You might want to manually set the ticks on the y-axis using
ax.set_yticks([list of values for which you want to have a tick])
ax.set_yticklabels([list of labels you want on each tick]) # optional
Another way to approach this might be to rank the movies (which gross is the highest, second highest, ...), i.e. on the y axis you would plot
df['worldwide_gross'].rank()
Edit: as you indicate, one might also check the dtypes to make sure the data is numerical. If not, use .astype(int) or .astype(float) to convert it.
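To illustrate the dtype point together with more readable tick labels, here is a small sketch. The film rows are made up, and the assumption that worldwide_gross is stored as text like "$2,068,223,624" is only a guess at what the csv contains.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove it to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import FuncFormatter

# Hypothetical rows standing in for blockbusters.csv
df = pd.DataFrame({
    "year": [2015, 2016, 2017],
    "worldwide_gross": ["$2,068,223,624", "$1,153,304,495", "$1,332,539,889"],
})

# Strip currency symbols and thousands separators, then convert to numbers
gross = pd.to_numeric(df["worldwide_gross"].str.replace(r"[$,]", "", regex=True))

fig, ax = plt.subplots()
ax.scatter(df["year"], gross)
ax.set_title("Top Grossing Films")
ax.set_xlabel("Year")
ax.set_ylabel("Worldwide Grossing")
# Show the y ticks in billions instead of raw values
ax.yaxis.set_major_formatter(FuncFormatter(lambda value, pos: f"{value / 1e9:.1f}B"))
```

If the column is already numeric, the to_numeric step can be skipped and only the formatter is needed.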
I have a csv with the following columns: recorded, humidity and temperature. I want to display the recorded values (date and time) on the x axis and the humidity on the y axis. How can I properly display the dates (it is quite a big csv)? My current plot shows a black bar instead of readable dates. My date format is like this: 2019-09-12T07:26:55, so each entry in the csv holds both the date and the time.
I have displayed the plot using this code:
from matplotlib import pyplot as plt
import pandas as pd
data = pd.read_csv('home_data.csv')
plt.plot(data.recorded, data.humidity)
plt.xlabel('date')
plt.ylabel('humidity')
plt.title('Visualizing date and humidity')
plt.show()
This is a print screen of the plot:
https://snipboard.io/d4hfS7.jpg
Actually, the plot is displaying every date in your dataset. There are so many of them that they look like a black blob. You can downsample the xticks to improve readability. Do something like this:
fig, ax = plt.subplots()
ax.plot(data.recorded, data.humidity)
# some axes labelling
# Reduce now the number of the ticks printed in the figure
ax.set_xticks(ax.get_xticks()[::4])
ax.tick_params(axis='x', labelrotation=45)
In the line ax.set_xticks(ax.get_xticks()[::4]) you set the ticks of the x-axis by slicing the current tick positions so that only 1 date in every 4 is kept. This reduces the number of dates printed. You can increase the step as much as you want.
To improve readability further, you can rotate the tick labels, as in the line
ax.tick_params(axis='x', labelrotation=45).
Hope this helps.
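An alternative worth trying is to parse the recorded column into real datetimes, so matplotlib can choose a sensible number of date ticks by itself. This is a sketch with generated data in place of home_data.csv; the 5 minute spacing and the humidity values are assumptions made for the example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove it to use plt.show()
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for home_data.csv, with timestamps stored as strings
stamps = pd.date_range("2019-09-12T07:26:55", periods=500, freq="5min")
data = pd.DataFrame({
    "recorded": stamps.strftime("%Y-%m-%dT%H:%M:%S"),
    "humidity": 50 + 10 * np.sin(np.linspace(0, 6, 500)),
})

# Convert the strings to datetimes; strings are plotted as categories,
# which is what produces one tick per row
data["recorded"] = pd.to_datetime(data["recorded"])

fig, ax = plt.subplots()
ax.plot(data["recorded"], data["humidity"])
locator = mdates.AutoDateLocator()
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))
fig.autofmt_xdate()  # rotate and right-align the date labels
ax.set_xlabel("date")
ax.set_ylabel("humidity")
```

With a real file, pd.read_csv('home_data.csv', parse_dates=['recorded']) should do the conversion while loading.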
When I plot single plots with pandas dataframes I get a labelled x-axis.
However, when I make subplots and try to share the x-axis, the way I would when using numpy arrays without pandas, there are no tick numbers or labels.
I only want the numbers and the label to appear on the last plot, as they all share the same x-axis.
The data loaded and the plot produced can be found here:
https://drive.google.com/open?id=1hTmTSkIcYl-usv_CCxLl8U6bAoO6tMRh
This is for combining and plotting the data logged from two different logging devices which represent the same time period.
import pandas as pd
import matplotlib.pyplot as plt
df1=pd.read_csv('data1.csv', sep=',',header=0)
df1.columns.values
cols1 = list(df1.columns.values)
df2=pd.read_csv('data2.dat', sep='\t',header=18)
df2.columns.values
cols2 = list(df2.columns.values)
start =10000
stop = 30000
fig, axes = plt.subplots(nrows=5, ncols=1, sharex=True, figsize=(10, 10))
df1.iloc[start:stop].plot(x=cols1[0], y=cols1[1], ax=axes[0])
df1.iloc[start:stop].plot(x=cols1[0], y=cols1[2], ax=axes[0])
df1.iloc[start:stop].plot(x=cols1[0], y=cols1[3], ax=axes[2])
df1.iloc[start:stop].plot(x=cols1[0], y=cols1[4], ax=axes[2])
df2.iloc[start:stop].plot(x=cols2[0], y=cols2[3], ax=axes[3])
axes[-1].set_xlabel("Time [s]")
plt.show()
I expect there to be numbers and a label on the x-axis but instead, it only gives the pandas label "#timestamp"
UPDATE: I have found something that hints at the problem. I think the problem is due to the two files not having identical time spacing, the first column of each file is time, they are roughly 1 sample per second but not exactly. If I remove the x=cols[x] parts it then shows numbers on the x-axis but then there is a shift in time between the two plots as they are not plotting against time but rather against the index in the dataframe.
I am currently trying to interpolate the data so that they have the same x-axis but I would not have expected that to be necessary.
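For reference, the interpolation step mentioned above could look roughly like this. It is a sketch with invented numbers: t1, t2 and y2 are hypothetical time and value arrays from the two loggers, not columns from the actual files.

```python
import numpy as np

# Hypothetical timestamps (seconds) from two loggers with different spacing
t1 = np.array([0.0, 1.0, 2.0, 3.0])
t2 = np.array([0.5, 1.5, 2.5])
y2 = np.array([5.0, 15.0, 25.0])

# Resample logger 2 onto logger 1's time base with linear interpolation.
# Outside the range of t2, np.interp repeats the first or last sample.
y2_on_t1 = np.interp(t1, t2, y2)
```

After this, both series can be plotted against t1, so the shared x-axis refers to the same time values in every subplot.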
I have the following code to generate scatter plots
import matplotlib.pyplot as plt
line = plt.figure()
plt.plot(xvalue, yvalue)
plt.grid(True)
plt.savefig("test.png")
plt.show()
and here is the screenshot of the plot:
I am wondering if I could change the x-axis labels into strings. I have stored all the labels in
xlabel = ['2015/4/1', '2015/4/11', '2015/4/12', '2015/4/18', '2015/4/19'...]
Is there any function in matplotlib so that I could set the x-axis labels to the values in "xlabel"?
Many thanks!
Also, my labels overlap. Is there anything I could do to fix this problem? Thanks!
Here is my answer. Your target was to plot the datetimes as the xticklabels.
I always do something like this:
import datetime
import numpy as np
import matplotlib.pyplot as plt
## For example, I have 9 daily values from 2013-04-01 to 2013-04-09
start = datetime.datetime.strptime("01-04-2013", "%d-%m-%Y")
end = datetime.datetime.strptime("10-04-2013", "%d-%m-%Y")
# range() excludes the end date, so this yields 9 dates
date = [start + datetime.timedelta(days=x) for x in range(0, (end-start).days)]
plt.figure(figsize=(12,4))
## y is the data I want to plot (one value per date)
ind = np.arange(len(y))
ax=plt.subplot()
plt.plot(ind,y,lw = 3)
# build "YYYY-MM-DD" labels from the datetimes
k = []
for i in range(0,len(ind),1):
    k.append(str(date[i])[0:10])
plt.xticks(ind,k,rotation=65)
Update
To solve the overlap problem, I recommend the code below:
for label in ax.xaxis.get_ticklabels()[::2]:
    label.set_visible(False)
For daily value in a month, you can get a figure like this:
Do:
plt.xticks(xs, labels)
Where xs is a list of the positions for the ticks, and labels is the list of labels.
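Applied to the question, that could look like the sketch below. The label strings are the first five from the question; yvalue is made up, and using 0, 1, 2, ... as tick positions is an assumption about how the points are spaced.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove it to use plt.show()
import matplotlib.pyplot as plt

labels = ['2015/4/1', '2015/4/11', '2015/4/12', '2015/4/18', '2015/4/19']
yvalue = [3, 5, 2, 6, 4]       # hypothetical data, one value per label
xs = list(range(len(labels)))  # tick positions: 0, 1, 2, ...

fig, ax = plt.subplots()
ax.plot(xs, yvalue)
ax.grid(True)
ax.set_xticks(xs)
# Rotating and right-aligning the labels keeps neighbours from overlapping
ax.set_xticklabels(labels, rotation=45, ha="right")
fig.tight_layout()
```

The same thing can be written with the pyplot interface as plt.xticks(xs, labels, rotation=45).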