Replace x-ticks with evenly spaced timestamps - python

I am trying to insert timestamps on the x-axis for a scatter plot instead of total seconds. Below is what I have tried thus far but I'm getting an error with this line;
loc, labels = ax.set_xticks(x)
AttributeError: 'NoneType' object has no attribute 'update'
Example:
import pandas as pd
import matplotlib.pyplot as plt
d = ({
'A' : ['08:00:00','08:10:00','08:12:00','08:26:00','08:29:00','08:31:00','10:10:00','10:25:00','10:29:00','10:31:00'],
'B' : ['1','1','1','2','2','2','7','7','7','7'],
'C' : ['X','Y','Z','X','Y','Z','A','X','Y','Z'],
})
df = pd.DataFrame(data=d)
fig,ax = plt.subplots()
x = df['A']
y = df['B']
x_numbers = (pd.to_timedelta(df['A']).dt.total_seconds())
ax.scatter(x_numbers, y)
loc, labels = ax.set_xticks(x)
newlabels = [str(pd.Timedelta(str(i)+ ' seconds')).split()[2] for i in loc]
ax.set_xticks(loc, newlabels)
Note
I need to use ax instead of plt as this plot is called as a subplot. If I use plot, the axis will be assigned to the last subplot instead of the designated one.

I would suggest using datetimes directly rather than manipulating the tick labels. A matplotlib.dates.MinuteLocator then places the ticks at evenly spaced positions (here, on the hour and the half hour).
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
d = ({
'A' : ['08:00:00','08:10:00','08:12:00','08:26:00','08:29:00','08:31:00',
'10:10:00','10:25:00','10:29:00','10:31:00'],
'B' : ['1','1','1','2','2','2','7','7','7','7'],
'C' : ['X','Y','Z','X','Y','Z','A','X','Y','Z'],
})
df = pd.DataFrame(data=d)
df['A'] = pd.to_datetime(df['A'])
fig,ax = plt.subplots()
ax.scatter(df["A"].values, df["B"].values)
ax.set_xlim(df["A"].min(), df["A"].max())
ax.xaxis.set_major_locator(mdates.MinuteLocator((0,30)))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
plt.show()
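If the x-values have to stay as total seconds, a tick formatter can render them as clock times while a locator keeps the ticks evenly spaced. This is only a minimal sketch on a trimmed copy of the data above: to_seconds and seconds_to_label are hypothetical helper names, and the 1800-second (30 minute) spacing is an assumption, not something from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib import ticker

# Trimmed copy of the question's data (assumption: numeric y-values)
times = ['08:00:00', '08:10:00', '08:12:00', '08:26:00', '10:31:00']
values = [1, 1, 1, 2, 7]

def to_seconds(hms):
    """Hypothetical helper: 'HH:MM:SS' -> total seconds."""
    h, m, s = (int(part) for part in hms.split(':'))
    return h * 3600 + m * 60 + s

def seconds_to_label(sec, pos=None):
    """Hypothetical helper: total seconds -> 'HH:MM' (signature expected by FuncFormatter)."""
    sec = int(round(sec))
    return '{:02d}:{:02d}'.format(sec // 3600, sec % 3600 // 60)

x_seconds = [to_seconds(t) for t in times]

fig, ax = plt.subplots()
ax.scatter(x_seconds, values)
# One tick every 1800 s, independent of where the data points fall
ax.xaxis.set_major_locator(ticker.MultipleLocator(1800))
# Show each tick position as a clock time instead of raw seconds
ax.xaxis.set_major_formatter(ticker.FuncFormatter(seconds_to_label))
fig.canvas.draw()  # forces tick labels to be computed
tick_labels = [t.get_text() for t in ax.get_xticklabels()]
```

Because everything goes through ax, this also works when the plot is one of several subplots.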

I am taking a guess, but if you want to replace the x-axis labels give this a try.
import pandas as pd
import matplotlib.pyplot as plt
d = ({
'A' : ['08:00:00','08:10:00','08:12:00','08:26:00','08:29:00','08:31:00','10:10:00','10:25:00','10:29:00','10:31:00'],
'B' : ['1','1','1','2','2','2','7','7','7','7'],
'C' : ['X','Y','Z','X','Y','Z','A','X','Y','Z'],
})
df = pd.DataFrame(data=d)
x = df['A']
y = df['B']
x_numbers = (pd.to_timedelta(df['A']).dt.total_seconds())
fig,ax = plt.subplots(figsize=(10,7))
ax.scatter(x_numbers, y)
xLabel = [str(int(num)) + ' seconds' for num in x_numbers]
# fix the tick positions first so each label lines up with its data point
ax.set_xticks(x_numbers)
ax.set_xticklabels(xLabel)
plt.tight_layout()
plt.show()

Something like this will work:
Edit: Changes made to make sure axis and subplot is used
import pandas as pd
import matplotlib.pyplot as plt
d = ({
'A' : ['08:00:00','08:10:00','08:12:00','08:26:00','08:29:00','08:31:00','10:10:00','10:25:00','10:29:00','10:31:00'],
'B' : ['1','1','1','2','2','2','7','7','7','7'],
'C' : ['X','Y','Z','X','Y','Z','A','X','Y','Z'],
})
df = pd.DataFrame(data=d)
fig,ax = plt.subplots()
x = df['A']
y = df['B']
x_numbers = (pd.to_timedelta(df['A']).dt.total_seconds())
ax.scatter(x_numbers, y)
plt.sca(ax) # make ax the current axes so the plt calls below act on it
plt.xticks(x_numbers, [str(a) for a in x]) # one tick per data point, labelled with its timestamp
plt.show()

Related

Scatterplot error : "x and y must be the same size" but they have the same size

I would like to make a scatterplot from the dataframe "df_death_mois1", but it doesn't work. The error message is: "x and y must be the same size". Can you help me?
import pandas as pd
import matplotlib.pyplot as plt
members = pd.read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-09-22/members.csv")
expeditions = pd.read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-09-22/expeditions.csv")
expeditions['highpoint_date'] = pd.to_datetime(expeditions['highpoint_date'])
lesmois = expeditions['highpoint_date'].dt.month
expeditions["mois"] = lesmois
expeditions
df_members_mois = pd.merge(members,expeditions[['expedition_id','mois']], on='expedition_id', how='inner')
df_death_mois = df_members_mois[df_members_mois["death_cause"]=="Avalanche"]
df_death_mois
df_death_mois1 = df_death_mois.groupby("mois")['death_cause'].count()
df_death_mois1 = df_death_mois1.to_frame()
df_death_mois1
plt.scatter(x="mois", y = "death_cause", data = df_death_mois1)
plt.title('scatterplot')
plt.xlabel('x')
plt.ylabel('y')
plt.show()
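After the groupby, "mois" is the index of df_death_mois1 rather than a column, so the data= lookup cannot find it. Call reset_index and then plot.scatter: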
>>> df_death_mois1.reset_index().plot.scatter(x="mois", y="death_cause")
With matplotlib.pyplot you can use:
>>> plt.scatter(x=df_death_mois1.index, y=df_death_mois1["death_cause"])
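To see why reset_index helps, here is a small sketch on made-up data. The toy frame is hypothetical and only stands in for the merged expeditions table; the point is where "mois" ends up after the groupby.

```python
import pandas as pd

# Hypothetical toy data standing in for the filtered members/expeditions merge
toy = pd.DataFrame({
    'mois': [4, 4, 5, 10, 10, 10],
    'death_cause': ['Avalanche'] * 6,
})

counts = toy.groupby('mois')['death_cause'].count().to_frame()
# 'mois' is now the index, so it is not available as a column name
print(list(counts.columns))

flat = counts.reset_index()
# after reset_index both names are ordinary columns and can be passed as x= and y=
print(list(flat.columns))
```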

Alter xticks matplotlib

I have a scatter plot that has time on the x-axis
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import ticker
d = ({
'A' : ['08:00:00','08:10:00','08:12:00','08:26:00','08:29:00','08:31:00','10:10:00','10:25:00','10:29:00','10:31:00'],
'B' : ['1','1','1','2','2','2','7','7','7','7'],
'C' : ['X','Y','Z','X','Y','Z','A','X','Y','Z'],
})
df = pd.DataFrame(data=d)
fig,ax = plt.subplots()
x = df['A']
y = df['B']
x_numbers = (pd.to_timedelta(df['A']).dt.total_seconds())
plt.scatter(x_numbers, y)
plt.show()
Output 1:
I wanted to swap total seconds for actual timestamps so I included:
plt.xticks(x_numbers, x)
This results in the x-ticks overlapping each other.
If I use:
plt.locator_params(axis='x', nbins=10)
The result is the same as above. If I change nbins to something smaller, the ticks don't overlap, but they no longer line up with their respective scatter points, so the points don't sit at the correct timestamp.
If I use:
M = 10
xticks = ticker.MaxNLocator(M)
ax.xaxis.set_major_locator(xticks)
The ticks don't overlap but they don't align with their respective scatter points.
Is it possible to pick the number of x-ticks while keeping them aligned to their respective data points?
E.g. for the figure below, can I just use n ticks instead of all of them?
Output 2:
Let's use some xticklabel manipulation:
d = ({
'A' : ['08:00:00','08:10:00','08:12:00','08:26:00','08:29:00','08:31:00','10:10:00','10:25:00','10:29:00','10:31:00'],
'B' : ['1','1','1','2','2','2','7','7','7','7'],
'C' : ['X','Y','Z','X','Y','Z','A','X','Y','Z'],
})
df = pd.DataFrame(data=d)
fig,ax = plt.subplots()
x = df['A']
y = df['B']
x_numbers = (pd.to_timedelta(df['A']).dt.total_seconds())
plt.scatter(x_numbers, y)
loc, labels = plt.xticks()
newlabels = [str(pd.Timedelta(str(i)+ ' seconds')).split()[2] for i in loc]
plt.xticks(loc, newlabels)
plt.show()
Output:
Firstly, the time interval is not consistent.
Secondly, it's a high-frequency series.
In the general case, you don't need an xtick for every entry. In those scenarios you can use something like plt.plot_date(x, y) along with tick locators and formatters such as DayLocator() and DateFormatter('%Y-%m-%d').
For this specific case, though, where the data is at minute level and a few points are very close together, a hack is to play with the numeric Series used for the x-axis, x_numbers. To increase the gap between points I tried cumsum(), and to reduce the overlap I rotated the xticks. Note that cumsum() changes the x positions, so the spacing no longer reflects elapsed time.
fig, ax = plt.subplots(figsize=(10,6))
x = df['A']
y = df['B']
x_numbers = (pd.to_timedelta(df['A']).dt.total_seconds()).cumsum()
plt.scatter(x_numbers, y)
plt.xticks(x_numbers, x, rotation=50)
plt.show()
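Another way to answer the "n ticks instead of all of them" part, without changing the x positions, is to put ticks on every n-th data point only. This is a sketch, not the accepted approach: the step of 3 is an arbitrary assumption, and the y-values are written as numbers here for simplicity.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'A': ['08:00:00', '08:10:00', '08:12:00', '08:26:00', '08:29:00',
          '08:31:00', '10:10:00', '10:25:00', '10:29:00', '10:31:00'],
    'B': [1, 1, 1, 2, 2, 2, 7, 7, 7, 7],
})
x_numbers = pd.to_timedelta(df['A']).dt.total_seconds()

step = 3  # assumption: keep every 3rd point as a tick
tick_positions = x_numbers.iloc[::step].tolist()
tick_text = df['A'].iloc[::step].tolist()

fig, ax = plt.subplots(figsize=(10, 4))
ax.scatter(x_numbers, df['B'])
# positions and labels come from the same rows, so each tick sits on a data point
ax.set_xticks(tick_positions)
ax.set_xticklabels(tick_text, rotation=45)
fig.tight_layout()
```

Labels can still collide when two of the selected points are close together, so the step may need tuning per dataset.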

Value Error when altering x-ticks matplotlib

I am trying to alter the x-ticks on the plot below. When I run the code below I'm getting an error:
ValueError: unit abbreviation w/o a number
I can't seem to find anything on this except it's related to pd.to_timedelta. However, I can't find any solutions on this.
I've upgraded all relevant packages, including matplotlib.
import pandas as pd
import matplotlib.pyplot as plt
d = ({
'A' : ['08:00:00','08:10:00','08:12:00','08:26:00','08:29:00','08:31:00','10:10:00','10:25:00','10:29:00','10:31:00'],
'B' : ['1','1','1','2','2','2','7','7','7','7'],
'C' : ['X','Y','Z','X','Y','Z','A','X','Y','Z'],
})
df = pd.DataFrame(data=d)
fig,ax = plt.subplots()
x = df['A']
y = df['B']
x_numbers = (pd.to_timedelta(df['A']).dt.total_seconds())
plt.scatter(x_numbers, y)
xaxis = ax.get_xaxis()
ax.set_xticklabels([str(pd.Timedelta(i.get_text()+' seconds')).split()[2] for i in xaxis.get_majorticklabels()], rotation=45)
plt.show()
Any suggestions? Has anyone come across this?
Based on this SO question and answer, one solution is to trigger axis tick positioning with a call to fig.canvas.draw() after the scatter, but before getting the labels:
[...]
plt.scatter(x_numbers, y)
# draw canvas to trigger tick positioning
fig.canvas.draw()
xaxis = ax.get_xaxis()
[...]
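As far as I can tell, the error appears because the tick labels are still empty strings when they are read, so pd.Timedelta receives ' seconds' with no number in front. A small sketch of that failure and of a defensive conversion follows; label_to_clock is a hypothetical helper name, not part of pandas or matplotlib.

```python
import pandas as pd

def label_to_clock(text):
    """Hypothetical helper: tick label such as '28800' -> '08:00:00'; empty labels stay empty."""
    if not text.strip():
        return ''
    # str(Timedelta) looks like '0 days 08:00:00'; keep only the clock part
    return str(pd.Timedelta(float(text), unit='s')).split()[2]

# Reproduce the failure from the question with an empty label
try:
    pd.Timedelta('' + ' seconds')
    empty_label_fails = False
except ValueError:
    empty_label_fails = True

print(label_to_clock('28800'))
```

Guarding against empty labels avoids the exception, but the labels still need to be populated (for example by drawing the canvas first) before there is anything useful to convert.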
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
d = ({
'A' : ['08:00:00','08:10:00','08:12:00','08:26:00','08:29:00','08:31:00','10:10:00','10:25:00','10:29:00','10:31:00'],
'B' : ['1','1','1','2','2','2','7','7','7','7'],
'C' : ['X','Y','Z','X','Y','Z','A','X','Y','Z'],
})
df = pd.DataFrame(data=d)
x = df['A']
y = df['B']
x_numbers = (pd.to_timedelta(df['A']).dt.total_seconds())
fig, axes = plt.subplots(figsize=(10, 4))
axes.scatter(x_numbers, y)
axes.set_xticks(x_numbers)
# Series.get_values() was removed in pandas 1.0; iterate over the column directly
axes.set_xticklabels([i + ' seconds' for i in df['A']], rotation=90)
plt.tight_layout()

Seaborn swarmplot: Get point coordinates [duplicate]

I have the following data:
import pandas as pd
import numpy as np
# Generate dummy data.
a = np.random.random(75)
b = np.random.random(75) - 0.6
c = np.random.random(75) + 0.75
# Collate into a DataFrame
df = pd.DataFrame({'a': a, 'b': b, 'c': c})
df.columns = [list(['WT', 'MUT', 'WTxMUT']), list(['Parent', 'Parent', 'Offspring'])]
df.columns.names = ['Genotype', 'Status']
df_melt = pd.melt(df)
and I plot it in seaborn using this code:
import seaborn as sb
sb.swarmplot(data = df_melt, x = "Status", y = "value", hue = "Genotype")
How do I get the x-span of each group? What is the range of the horizontal span of the swarmplot for the Parent group, for instance?
You can get the information from the collections which are created by swarmplot.
swarmplot actually returns the matplotlib Axes instance, and from there we can find the PathCollections that it creates. To get the positions, we can use .get_offsets().
Here is your example, modified to find and print the swarm limits, and then use them to plot a box around the swarms.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sb
from matplotlib.patches import Rectangle
# Generate dummy data.
a = np.random.random(75)
b = np.random.random(75) - 0.6
c = np.random.random(75) + 0.75
# Collate into a DataFrame
df = pd.DataFrame({'a': a, 'b': b, 'c': c})
df.columns = [list(['WT', 'MUT', 'WTxMUT']), list(['Parent', 'Parent', 'Offspring'])]
df.columns.names = ['Genotype', 'Status']
df_melt = pd.melt(df)
ax = sb.swarmplot(data = df_melt, x = "Status", y = "value", hue = "Genotype")
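def getdatalim(coll):
    # the offsets are the (x, y) positions of the points in this collection
    x, y = np.array(coll.get_offsets()).T
    try:
        print('xmin={}, xmax={}, ymin={}, ymax={}'.format(
            x.min(), x.max(), y.min(), y.max()))
        # np.ptp(x) is max - min; the ndarray.ptp method was removed in NumPy 2.0
        rect = Rectangle((x.min(), y.min()), np.ptp(x), np.ptp(y), edgecolor='k', facecolor='None', lw=3)
        ax.add_patch(rect)
    except ValueError:
        # an empty collection has no min/max; skip it
        pass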
getdatalim(ax.collections[0]) # "Parent"
getdatalim(ax.collections[1]) # "Offspring"
plt.show()
which prints:
xmin=-0.107313729132, xmax=0.10661092707, ymin=-0.598534246847, ymax=0.980441247759
xmin=0.942829146473, xmax=1.06105941656, ymin=0.761277608688, ymax=1.74729717464
[Figure: the swarmplot with a black rectangle drawn around each swarm]
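The same get_offsets() idea can be checked on a plain matplotlib scatter, which also produces a PathCollection. This is only a sketch with made-up points, not seaborn's output; it assumes the collection of interest is ax.collections[0].

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
# Made-up points standing in for one swarm
ax.scatter([0.9, 1.0, 1.1], [2.0, 3.0, 5.0])

coll = ax.collections[0]              # the PathCollection created by scatter
x, y = np.asarray(coll.get_offsets()).T
x_span = (x.min(), x.max())           # horizontal extent of the points
y_span = (y.min(), y.max())           # vertical extent of the points
print(x_span, y_span)
```

With seaborn, which collection belongs to which group depends on how the plot was built, so it is worth printing the spans for each entry in ax.collections before relying on an index.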
