I am working my way through: https://medium.com/analytics-vidhya/exploratory-data-analysis-of-the-hotel-booking-demand-with-python-200925230106
In a bunch of the visualization outputs the sort order is off. As I am working my way through each question, I have successfully fixed the sort order of every output -- until now.
For question #6, part two (Let’s see the stay duration trend for each hotel type.) I am getting
the exact same output as is shown in the article. However, the x-axis is incorrectly sorted, and I am trying to fix it as I have all previous outputs.
Here is my code for question #6, including the first part where I fixed the sort order:
# 6. How long do people stay in the hotel?
df_not_canceled2 = df_not_canceled.copy()
total_nights = df_not_canceled2['stays_in_weekend_nights'] + df_not_canceled2['stays_in_week_nights']
x5, y5, z5 = get_count(total_nights, limit=10)
x5 = x5[[0, 2, 1, 3, 5, 6, 4, 8, 7, 9]]
y5 = y5[[0, 2, 1, 3, 5, 6, 4, 8, 7, 9]]
z5 = z5[[0, 2, 1, 3, 5, 6, 4, 8, 7, 9]]
plot(x5, y5, x_label='Number of Nights', y_label='Booking Percentage (%)', title='Night Stay Duration (Top 10)', figsize =(10, 5))
plt.show()
# The stay duration trend for each hotel type.
df_not_canceled2.loc[:, 'total_nights'] = df_not_canceled2['stays_in_weekend_nights'] + df_not_canceled2['stays_in_week_nights']
df_not_canceled2 = df_not_canceled2.sort_values(by=['total_nights']).reset_index(drop=True)
fig1, ax = plt.subplots(figsize=(12, 6))
ax.set_xlabel('No of Nights')
ax.set_ylabel('No of Nights')
ax.set_title('Hotel wise night stay duration (Top 10)')
sns.countplot(x='total_nights', hue='hotel', data=df_not_canceled2,
order=df_not_canceled2['total_nights'].value_counts().iloc[:10].index, ax=ax)
plt.show()
First I tried sorting the df by 'total_nights'. The output did not change. Then I sorted and reset the index (this is the current state of my code). Still no change.
This is what I get (exactly the same as the article):
Notice the sort order of the x-axis (total_nights). I want 1, 2, 3, 4, 5, etc., not 1, 3, 2, 4, 7, etc.
Just figured it out. Had to remove .value_counts from the order parameter.
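For anyone landing here later: sns.countplot draws the x-axis in whatever order is passed to order, and value_counts() returns values sorted by frequency, not by value. Removing .value_counts() leaves .iloc[:10].index, which after reset_index(drop=True) is simply 0 through 9, so the plot shows stays of 0 to 9 nights rather than the ten most frequent durations. If the intent is still "top 10 by frequency, shown in ascending order", a minimal sketch might look like the following. The helper name top_n_sorted and the sample data are made up for illustration.

```python
import pandas as pd


def top_n_sorted(values, n=10):
    """Return the n most frequent values, sorted ascending for use as an axis order."""
    # value_counts() sorts by frequency (descending), so take the first n labels...
    top = values.value_counts().iloc[:n].index
    # ...and then sort those labels by value instead of by frequency.
    return sorted(top.tolist())


# Hypothetical stand-in for df_not_canceled2['total_nights']
total_nights = pd.Series([1, 1, 1, 3, 3, 3, 3, 2, 2, 7, 4, 4, 4, 4, 4])

order = top_n_sorted(total_nights, n=3)
print(order)  # [1, 3, 4]
```

The result could then be passed as order=top_n_sorted(df_not_canceled2['total_nights']) in the sns.countplot call.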
Good morning,
I am using pygmt on Python 3.6 with Spyder.
I am trying to fill several polygons with colors taken from a color palette.
I used makecpt to define the color palette.
The variable I want to represent is Mog.
My code is :
pygmt.makecpt(cmap="bilbao", series=[-5, 50, 5])
for i, long in enumerate(longitude_polyT):
    fig.plot(x=longitude_polyT[i], y=latitude_polyT[i], frame="a", pen="black", color=Mog[i], cmap=True)
But it doesn't fill my polygons.
Does anybody have an idea about it?
Thanks a lot!
Here is my best guess at what you want:
import pygmt
fig = pygmt.Figure()
a = fig.basemap(region=[0, 6, 0, 2], projection="X12c/4c", frame=True, )
pol = ["n0.9c", "c0.9c", "d0.9c"]
Mog = [
    pygmt.makecpt(cmap="bilbao", series=[-5, 50, 5]),
    pygmt.makecpt(cmap="bilbao", series=[-5, 15, 5]),
    pygmt.makecpt(cmap="bilbao", series=[-8000, 4000])
]
longitude_polyT = [1, 3, 5]
latitude_polyT = [1, 1, 1]
for i, long in enumerate(longitude_polyT):
    fig.plot(x=long, y=latitude_polyT[i], style=pol[i], frame=a,
             pen="black",
             color=Mog[i], cmap=True)
fig.show()
Couldn't get it to show different colours :(
I have a 2-dimensional xarray dataset that I want to interpolate on the lon and lat coordinates so that I get a higher resolution, but with values that correspond exactly to the original values at each coordinate.
I thought the excellent interp method would be able to do this, but following the example I see some discrepancy between the original and interpolated values. I am increasing the longitude and latitude resolution by a factor of 4, and would therefore expect every air value that occurs once in the original dataset to occur 16 times in the interpolated dataset, but this is not the case.
Does anyone know why the original and interpolated datasets do not align, and how I could solve it?
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

ds = xr.tutorial.open_dataset("air_temperature").isel(time=0)
fig, axes = plt.subplots(ncols=2, figsize=(10, 4))
ds_sel=ds.sel(lon=slice(250,260),lat=slice(40,30))
ds.air.plot(ax=axes[0],xlim=(250,260),ylim=(30,40))
axes[0].set_title("Raw data")
# Interpolated data
new_lon = np.linspace(ds.lon[0], ds.lon[-1], ds.dims["lon"] * 4)
new_lat = np.linspace(ds.lat[0], ds.lat[-1], ds.dims["lat"] * 4)
dsi = ds.interp(lat=new_lat, lon=new_lon,method="nearest")
dsi_sel=dsi.sel(lon=slice(250,260),lat=slice(40,30))
dsi.air.plot(ax=axes[1],xlim=(250,260),ylim=(30,40))
axes[1].set_title("Interpolated data")
Showing the unique values with
unique, counts = np.unique(ds_sel.air.values, return_counts=True)
print("original values",dict(zip(unique, counts)))
unique, counts = np.unique(dsi_sel.air.values, return_counts=True)
print("interpolated values",dict(zip(unique, counts)))
I get
original values {262.1: 1, 263.1: 1, 263.9: 1, 264.4: 1, 265.19998: 1, 266.6: 1, 266.79: 1, 266.9: 2, 268.29: 1, 269.79: 1, 270.4: 1, 273.0: 1, 273.6: 1, 275.19998: 1, 276.29: 1, 278.0: 1, 278.5: 1, 278.6: 1, 281.5: 1, 282.1: 1, 282.29: 1, 284.6: 1, 286.79: 1, 288.0: 1}
interpolated values {262.1: 4, 263.1: 8, 263.9: 8, 264.4: 8, 265.19998: 4, 266.6: 16, 266.79: 16, 266.9: 24, 268.29: 8, 269.79: 20, 270.4: 10, 273.0: 20, 273.6: 16, 275.19998: 8, 276.29: 20, 278.0: 16, 278.5: 10, 278.6: 8, 281.5: 4, 282.1: 16, 282.29: 8, 284.6: 8, 286.79: 8, 288.0: 4}
I think you're conceptually running up against a fencepost error (see the section on this page: https://en.wikipedia.org/wiki/Off-by-one_error)
You should interpret the xarray coordinates as "midpoints", not as the cell boundaries.
Your new_lon isn't nicely divided into 1/2, 1/4, 1/8, etc.:
print(new_lon)
[200. 200.61611374 201.23222749 201.84834123 202.46445498
203.08056872 203.69668246 204.31279621 204.92890995]
And it doesn't include all the original coordinates.
Taking the "off-by-ones" into account:
new_lon = np.linspace(ds.lon[0], ds.lon[-1], (ds.dims["lon"] - 1) * 4 + 1)
new_lat = np.linspace(ds.lat[0], ds.lat[-1], (ds.dims["lat"] - 1) * 4 + 1)
print(new_lon)
[200. 200.625 201.25 201.875 202.5 203.125 203.75 204.375 205. ]
You can then, for example, inspect part of the first row of the original and of the interpolated data:
selection = ds["air"][0, :3]
selection_i = dsi["air"][0, :9]
print(selection["lon"])
print(selection.values)
print(selection_i["lon"])
print(selection_i.values)
This looks good to me:
[200. 202.5 205. ]
[241.2 242.5 243.5]
[200. 200.625 201.25 201.875 202.5 203.125 203.75 204.375 205. ]
[241.2 241.2 241.2 242.5 242.5 242.5 242.5 243.5 243.5]
Of course, when doing nearest interpolation, you might end up with ties:
0.5 is equally far from 0.0 as it is from 1.0, so the method inadvertently has to be biased either "up" or "down" to return a single nearest value.
Also note that the .plot() command, which draws a Matplotlib QuadMesh, has to infer boundaries from midpoints somehow. This can sometimes lead to boundaries being drawn slightly differently from what you might have in mind (especially if the coordinate is unevenly spaced).
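To see the fencepost arithmetic in isolation, here is a small dependency-free sketch. The linspace helper below is a hypothetical stand-in for np.linspace, and the three longitudes are taken from the printed output above; it assumes the original coordinate is evenly spaced.

```python
def linspace(start, stop, num):
    """Return num evenly spaced values from start to stop, both included."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


lon = [200.0, 202.5, 205.0]  # three original midpoints, two gaps between them
factor = 4

# Multiplying the number of points: 12 points over 2 gaps, spacing 5/11.
naive = linspace(lon[0], lon[-1], len(lon) * factor)

# Multiplying the number of gaps: 2 * 4 gaps, hence 9 points, spacing 0.625.
new_lon = linspace(lon[0], lon[-1], (len(lon) - 1) * factor + 1)

print(202.5 in naive)                   # False
print(all(c in new_lon for c in lon))   # True
print(new_lon)
# [200.0, 200.625, 201.25, 201.875, 202.5, 203.125, 203.75, 204.375, 205.0]
```

Only the second grid contains every original coordinate, which is what nearest interpolation needs in order to reproduce the original values at those points.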
I'm working on a small project on the vehicle routing problem, where a set of vehicles delivers goods from a depot to a set of customers.
The solution would be something like:
Sub-route 1: Depot Customer4 Customer7
Sub-route 2: Depot Customer1 Customer5 Customer3
Sub-route 3: Depot Customer2 Customer6
where the depot always has x-y coordinates (0,0), so x_all and y_all would be something like
x_all = [0,x4,x7,0,x1,x5,x3,0,...]
y_all = [0,y4,y7,0,y1,y5,y3,0,...]
plt.plot(x_all, y_all)
How could I plot a graph that has different colors for different routes? In other words, the colors would change when (x,y) = (0,0).
Thanks
You could do something like this:
# Find indices where routes get back to the depot
depot_stops = [i for i in range(len(x_all)) if x_all[i] == y_all[i] == 0]
# Split route into sub-routes
sub_routes = [(x_all[i:j+1], y_all[i:j+1]) for i, j in zip(depot_stops[:-1], depot_stops[1:])]
for xs, ys in sub_routes:
    plt.plot(xs, ys)
    # (Consecutive calls will automatically use different colours)
plt.show()
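One thing to watch with the zip approach: if the flat lists do not end with a final return to the depot, the last sub-route is silently dropped. A minimal sketch of a splitter that keeps a trailing open route is below. The name split_routes and the sample coordinates are made up for illustration, and it assumes no customer sits exactly at the depot position.

```python
def split_routes(x_all, y_all, depot=(0, 0)):
    """Split flat coordinate lists into sub-routes, starting a new one at each depot visit."""
    routes = []
    current = []
    for point in zip(x_all, y_all):
        if point == depot:
            if len(current) > 1:
                # Close the previous sub-route with its return leg to the depot.
                current.append(depot)
                routes.append(current)
            # Every depot visit starts a fresh sub-route.
            current = [depot]
        else:
            current.append(point)
    # Keep a final sub-route even if the data does not end back at the depot.
    if len(current) > 1:
        routes.append(current)
    return routes


# Hypothetical coordinates for three sub-routes; the last one has no return leg.
x_all = [0, 4, 7, 0, 1, 5, 3, 0, 2, 6]
y_all = [0, 1, 2, 0, 3, 4, 5, 0, 6, 7]

routes = split_routes(x_all, y_all)
for route in routes:
    xs, ys = zip(*route)  # unzip back into x and y sequences
    print(xs, ys)
```

Each xs, ys pair can then go to its own plt.plot(xs, ys) call, so Matplotlib picks a new colour per sub-route.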
There are a few ways you could do this, but I would suggest using a nested list (one inner list per route):
x = [[0, 4, 7],
[0, 5, 3],
[0, 2, 1]]
y = [[0, 4, 7],
[0, 1, 5],
[0, 2, 1]]
for i in range(len(x)):
    plt.plot(x[i], y[i])
plt.show()
Matplotlib will take care of the coloring for you.
This is an advisable way of managing your data, since you can index each route independently and the routes do not all need to be the same length. With flat lists, if one route had 4 stops and you needed that set of stops, you would have to know where in the x and y arrays that route starts and ends. Instead, I can just index the second route (index 1) of x and y:
x[1]
>> [0, 5, 3]
y[1]
>> [0, 1, 5]
Let's call the function I'm looking for "magic_combine". It should combine consecutive dimensions of the tensor I give to it. More specifically, I want it to do the following:
a = torch.zeros(1, 2, 3, 4, 5, 6)
b = a.magic_combine(2, 5) # combine dimension 2, 3, 4
print(b.size()) # should be (1, 2, 60, 6)
I know that torch.view() can do something similar, but I'm wondering whether there is a more elegant way to achieve this?
a = torch.zeros(1, 2, 3, 4, 5, 6)
b = a.view(*a.shape[:2], -1, *a.shape[5:])
Seems to me a bit simpler than the current accepted answer and doesn't go through a list constructor (3 times).
There is a variant of flatten that takes start_dim and end_dim parameters. You can call it in the same way as your magic_combine (except that end_dim is inclusive).
a = torch.zeros(1, 2, 3, 4, 5, 6)
b = a.flatten(2, 4) # combine dimension 2, 3, 4
print(b.size()) # should be (1, 2, 60, 6)
https://pytorch.org/docs/stable/generated/torch.flatten.html
There is also a corresponding unflatten, in which you can specify a dimension to unflatten and a shape to unflatten it to.
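As a rough sketch of the round trip (assuming a PyTorch version recent enough to provide Tensor.unflatten), flattening dimensions 2 to 4 and then splitting the merged dimension again could look like this:

```python
import torch

a = torch.zeros(1, 2, 3, 4, 5, 6)

# Merge dimensions 2, 3 and 4 (end_dim is inclusive): 3 * 4 * 5 = 60.
b = a.flatten(2, 4)

# Split dimension 2 back into its original three sizes.
c = b.unflatten(2, (3, 4, 5))

print(b.shape)  # torch.Size([1, 2, 60, 6])
print(c.shape)  # torch.Size([1, 2, 3, 4, 5, 6])
```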
I am not sure what you have in mind with "a more elegant way", but Tensor.view() has the advantage that it does not re-allocate data for the view (the original tensor and the view share the same data), which makes this operation quite lightweight.
As mentioned by @UmangGupta, it is however rather straightforward to wrap this function to achieve what you want, e.g.:
import torch
def magic_combine(x, dim_begin, dim_end):
    combined_shape = list(x.shape[:dim_begin]) + [-1] + list(x.shape[dim_end:])
    return x.view(combined_shape)
a = torch.zeros(1, 2, 3, 4, 5, 6)
b = magic_combine(a, 2, 5) # combine dimension 2, 3, 4
print(b.size())
# torch.Size([1, 2, 60, 6])
This is also possible with einops (available on GitHub).
> pip install einops
import torch
from einops import rearrange
a = torch.zeros(1, 2, 3, 4, 5, 6)
b = rearrange(a, 'd0 d1 d2 d3 d4 d5 -> d0 d1 (d2 d3 d4) d5')
I've created a Matplotlib graph in my Tkinter GUI, which is part of a bigger GUI class. I'm having issues with adding a title.
Question: Does anyone know how I can give my subplots a title?
self.f = plt.Figure(figsize=(4,5), dpi=90)
self.a = self.f.add_subplot(211)
self.a.plot([1, 2, 3, 4, 5, 6, 7, 8], [5, 6, 1, 7, 4, 2, 5, 0])
self.a.plt.ylabel('some numbers')
self.b = self.f.add_subplot(212)
self.b.plot([1, 2, 3, 4, 5, 6, 7, 8], [1, 3, 6, 1, 0, 2, 1, 0])
self.canvas = FigureCanvasTkAgg(self.f, master=self.frame1)
self.canvas.get_tk_widget().grid(row=8, column=0, columnspan=2)
Simply adding the following code doesn't work.
self.a.plt.title('some numbers')
This is an example from matplotlib
Matplotlib has standards I haven't quite fully comprehended yet, but it seems that which method you use to do something like set a title depends on if you're using a fig, plot, or axis...in this case, the answer is:
self.a.set_title('title goes here for your subplot')
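For what it's worth, the same applies to the y label in the question: an Axes object has no plt attribute, so self.a.plt.ylabel(...) fails for the same reason. A minimal sketch outside of Tkinter, with made-up variable names and titles, assuming the figure is created with matplotlib.figure.Figure as in the question:

```python
from matplotlib.figure import Figure

fig = Figure(figsize=(4, 5), dpi=90)

# Each add_subplot call returns an Axes; titles and labels are set on the Axes itself.
ax_top = fig.add_subplot(211)
ax_top.plot([1, 2, 3, 4, 5, 6, 7, 8], [5, 6, 1, 7, 4, 2, 5, 0])
ax_top.set_title('Top subplot')
ax_top.set_ylabel('some numbers')

ax_bottom = fig.add_subplot(212)
ax_bottom.plot([1, 2, 3, 4, 5, 6, 7, 8], [1, 3, 6, 1, 0, 2, 1, 0])
ax_bottom.set_title('Bottom subplot')

# Leave some vertical room so the lower title does not overlap the upper plot.
fig.subplots_adjust(hspace=0.5)

print(ax_top.get_title(), '/', ax_bottom.get_title())
```

The figure can then be handed to FigureCanvasTkAgg exactly as in the question.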