remove overlay text from pandas boxplot - python

I am trying to remove the overlay text on my boxplot I created using pandas. The code to generate it is as follows (minus a few other modifications):
ax = df.boxplot(column='min2',by=df['geomfull'],ax=axes,grid=False,vert=False, sym='',return_type='dict')
I just want to remove the "boxplot grouped by 0..." etc. and I can't work out what object it is in the plot. I thought it was an overflowing title but I can't find where the text is coming from! Thanks in advance.
EDIT: I found a workaround, which is to construct a new pandas DataFrame containing only the groups I want to box (removing all other variables).
data = {}
maps = ['BA4','BA5','BB4','CA4','CA5','EA4','EA5','EB4','EC4','EX4','EX5']
for mapi in maps:
    mask = (df['geomfull'] == mapi)
    arr = np.array(df['min2'][mask])
    data[mapi] = arr
dfsub = pd.DataFrame(data)
Then I can use the df.plot routines as per examples....
bp = dfsub.plot(kind='box',ax=ax, vert=False,return_type='dict',sym='',grid=False)
This produces the same plot without the overlay.
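For reference, the "Boxplot grouped by ..." text that pandas adds when `by` is used is the figure-level suptitle, not an axes title, so it can be cleared without rebuilding the frame. The sketch below uses made-up stand-in data under the question's column names (`geomfull`, `min2`); the real data is not available here.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up stand-in data that reuses the question's column names.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "geomfull": np.repeat(["BA4", "BA5", "BB4"], 20),
    "min2": rng.normal(size=60),
})

fig, ax = plt.subplots()
df.boxplot(column="min2", by="geomfull", ax=ax, grid=False, sym="")

# pandas writes "Boxplot grouped by geomfull" as the figure suptitle
# and the column name as the axes title; clear both.
fig.suptitle("")
ax.set_title("")
```

If only the grouped-by line should go, the `ax.set_title("")` call can be dropped so the column name stays as the axes title.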

Related

How to show a Holoviews Heatmap

I have a function that creates a HoloViews heatmap. If I save the heatmap using hv.save(heatmap, 'heatmap.html') it works great! I just cannot figure out how to show the plot without having to save it. The same script generates two density plots with Plotly, and calling .show() on those pops each plot up in my browser.
I am NOT using jupyter notebook and have been starting the bokeh server from a DOS prompt. I am working inside PyCharm Community with Python 3.10. Though if I could do it all from inside the script that would be easier.
def gen_heat_map(df: pandas.DataFrame, freq: float) -> holoviews.HeatMap:
    """
    Uses a single frequency upon which to build the heatmap.
    :param df: pandas.Dataframe containing data read from a JSON file
    :param freq: The frequency to build the heatmap out of
    :return: Holoviews Heat Map
    """
    # Select a single frequency upon which to build the heatmap
    single_frq = df[df.centerFrequency == freq].reset_index(drop=True)
    # create a second dataframe from each transmission
    sec_df = pd.DataFrame()
    for index, row in single_frq.iterrows():
        sec_df = sec_df.append(make_by_second(row), ignore_index=True)
    min_df = sec_df.set_index('time').resample('1min').mean().reset_index().replace(np.nan, -160)
    with pd.option_context('display.max_columns', None):
        print(min_df)
    min_df["Minute"] = min_df["time"].dt.strftime("%M")
    min_df["Hour"] = min_df['time'].dt.strftime("%H")
    heatmap = hv.HeatMap(min_df, ['Minute', 'Hour'], ['power', 'time'])
    heatmap.opts(radial=True,
                 width=750,
                 height=750,
                 tools=['hover'],
                 colorbar=True,
                 cmap='bokeh',
                 start_angle=-np.pi * 7 / 24,
                 title='Frequency Power Level Radial Heat Map'
                 )
    return heatmap

heatmap = gen_heat_map(df, 929612500.0)
The function gen_heat_map takes a large pandas DataFrame of data read from a JSON file plus a single frequency and generates the heat map. Displaying the resulting heat map is the issue. I can do so through HoloViz's Panel toolkit, but I would like to find a simpler solution.
Suggestions?
Thanks,
Doug

Colour bars based on values in pandas dataframe when using plotnine

I am trying to build a waterfall chart using plotnine. I would like to colour the starting and ending bars as grey (ideally I want to specify hexadecimal colours), increases as green and decreases as red.
Below is some sample data and my current plot. I am trying to set fill to the pandas column colour, but the bars are all black. I have also tried putting fill in the geom_segment, but this does not work either.
df = pd.DataFrame({})
df['label'] = ('A','B','C','D','E')
df['percentile'] = (10,)*5
df['value'] = (100,80,90,110,110)
df['yStart'] = (0,100,80,90,0)
df['barLabel'] = ('100','-20','+10','+20','110')
df['labelPosition'] = ('105','75','95','115','115')
df['colour'] = ('grey','red','green','green','grey')
p = (ggplot(df, aes(x=np.arange(0,5,1), xend=np.arange(0,5,1), y='yStart',yend='value',fill='colour'))
    + theme_light(6)
    + geom_segment(size=10)
    + ylab('value')
    + scale_y_continuous(breaks=np.arange(0,141,20), limits=[0,140], expand=(0,0))
)
EDIT
Based on teunbrand's comment of changing fill to color, I have the following. How do I specify the actual colour, preferably in hexadecimal format?
Just to close this question off, credit goes to teunbrand in the comments for the solution.
geom_segment() has a colour aesthetic but not a fill aesthetic. Replace fill='colour' with colour='colour'.
Plotnine will use default colours for the bars. Use scale_color_identity() if the contents of the DataFrame column are literal colours, or scale_colour_manual() to manually specify a tuple or list of colours. Both forms accept hexadecimal colours.

How to plot data on a basemap using matplotlib basemap

Two sections of my code are giving me trouble. I am trying to get the basemap created in this first section:
#Basemap
epsg = 6060; width = 2000.e3; height = 2000.e3 #epsg 3413. 6062
m=Basemap(epsg=epsg,resolution='l',width=width,height=height) #lat_ts=(90.+35.)/2.
m.drawcoastlines(color='white')
m.drawmapboundary(fill_color='#99ffff')
m.fillcontinents(color='#cc9966',lake_color='#99ffff')
m.drawparallels(np.arange(10,70,20),labels=[1,1,0,0])
m.drawmeridians(np.arange(-100,0,20),labels=[0,0,0,1])
plt.title('ICESAT2 Tracks in Greenland')
plt.figure(figsize=(20,10))
My next section is meant to plot the data it's getting from a file, drawing these tracks on top of the Basemap. Instead, it creates an entirely new plot. I have tried changing the second plt.scatter call to go through the Basemap object, such as m.scatter, m.plt, etc., but that only returns “RuntimeError: Can not put single artist in more than one figure”.
Any ideas on how to get this next section of code onto the basemap? Here is the next section, focus on the end to see where it is plotting.
icesat2_data[track] = dict() # creates a sub-dictionary, track
icesat2_data[track][year+month+day] = dict() # and one layer more for the date under the whole icesat2_data dictionary
icesat2_data[track][year+month+day] = dict.fromkeys(lasers)
for laser in lasers: # for loop, access all the gt1l, 2l, 3l
    if laser in f:
        lat = f[laser]["land_ice_segments"]["latitude"][:] # data for a particular laser's latitude.
        lon = f[laser]["land_ice_segments"]["longitude"][:] #data for a lasers longitude
        height = f[laser]["land_ice_segments"]["h_li"][:] # data for a lasers height
        quality = f[laser]["land_ice_segments"]["atl06_quality_summary"][:].astype('int')
        # Quality filter
        idx1 = quality == 0 # data dictionary to see what quality summary is
        #print('idx1', idx1)
        # Spatial filter
        idx2 = np.logical_and( np.logical_and(lat>=lat_min, lat<=lat_max), np.logical_and(lon>=lon_min, lon<=lon_max) )
        idx = np.where( np.logical_and(idx1, idx2) ) # combines index 1 and 2 from data quality filter. make sure not empty. if empty all data failed test (low quality or outside box)
        icesat2_data[track][year+month+day][laser] = dict.fromkeys(['lat','lon','height']) #store data, creates empty dictionary of lists lat, lon, hi, those strings are the keys to the dict.
        icesat2_data[track][year+month+day][laser]['lat'] = lat[idx] # grabbing only latitudes using that index of points with good data quality and within bounding box
        icesat2_data[track][year+month+day][laser]['lon'] = lon[idx]
        icesat2_data[track][year+month+day][laser]['height'] = height[idx]
        if lat[idx].any() == True and lon[idx].any() == True:
            x, y = transformer.transform(icesat2_data[track][year+month+day][laser]['lon'], \
                                         icesat2_data[track][year+month+day][laser]['lat'])
            plt.scatter(x, y, marker='o', color='#000000')
Currently, they output separately, like this:
Not sure if you're still working on this, but here's a quick example I put together that you might be able to work with (obviously I don't have the data you're working with). A couple things that might not be self-explanatory - I used m() to transform the coordinates to map coordinates. This is Basemap's built-in transformation method so you don't have to use PyProj. Also, setting a zorder in the scatter function ensures that your points are plotted above the countries layer and don't get hidden underneath.
#Basemap
epsg = 6060; width = 2000.e3; height = 2000.e3 #epsg 3413. 6062
plt.figure(figsize=(20,10))
m=Basemap(epsg=epsg,resolution='l',width=width,height=height) #lat_ts=(90.+35.)/2.
m.drawcoastlines(color='white')
m.drawmapboundary(fill_color='#99ffff')
m.fillcontinents(color='#cc9966',lake_color='#99ffff')
m.drawparallels(np.arange(10,70,20),labels=[1,1,0,0])
m.drawmeridians(np.arange(-100,0,20),labels=[0,0,0,1])
plt.title('ICESAT2 Tracks in Greenland')
for coord in [[68,-39],[70,-39]]:
    lat = coord[0]
    lon = coord[1]
    x, y = m(lon,lat)
    m.scatter(x,y,color='red',s=100,zorder=10)
plt.show()
I think you might need:
plt.figure(figsize=(20,10))
before creating the basemap, not after. As it stands it's creating a map and then creating a new figure after that which is why you're getting two figures.
Then your plotting line should be m.scatter() as you mentioned you tried before.
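The ordering effect can be reproduced with plain matplotlib, without Basemap or the ICESat-2 data. This is only a sketch of the mechanism: a simple line stands in for the map layers, and the variable names are made up.

```python
import matplotlib.pyplot as plt

plt.close("all")

# Figure created AFTER the first drawing call (the order in the question).
plt.plot([0, 1], [0, 1])           # implicitly creates figure 1 (stand-in for the map)
plt.figure(figsize=(20, 10))       # creates figure 2 and makes it current
plt.scatter([0.5], [0.5])          # lands on figure 2, not on the "map"
wrong_order_count = len(plt.get_fignums())

plt.close("all")

# Figure created BEFORE any drawing call (the order suggested above).
plt.figure(figsize=(20, 10))
plt.plot([0, 1], [0, 1])
plt.scatter([0.5], [0.5])          # same figure as the line
right_order_count = len(plt.get_fignums())

print(wrong_order_count, right_order_count)
```

The first ordering leaves two open figures and the second leaves one, which matches the behaviour described in the question.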

Python - Plot a graph with times on x-axis

I have the following dataframe in pandas:
dfClicks = pd.DataFrame({'clicks': [700,800,550],'date_of_click': ['10/25/1995 03:30','10/25/1995 04:30','10/25/1995 05:30']})
dfClicks['date_of_click'] = pd.to_datetime(dfClicks['date_of_click'])
dfClicks.set_index('date_of_click')
dfClicks.clicks = pd.to_numeric(dfClicks.clicks)
Could you please advise how I can plot the above such that the x-axis shows the date/time and the y axis the number of clicks? I will also need to plot another data frame which includes predicted clicks on the same graph, just to compare. The test could be a replica of above, with minor changes:
dfClicks2 = pd.DataFrame({'clicks': [750,850,500],'date_of_click': ['10/25/1995 03:30','10/25/1995 04:30','10/25/1995 05:30']})
dfClicks2['date_of_click'] = pd.to_datetime(dfClicks2['date_of_click'])
dfClicks2.set_index('date_of_click')
dfClicks2.clicks = pd.to_numeric(dfClicks2.clicks)
Convert the clicks column to numeric, and then:
ax = dfClicks.plot()
dfClicks2.plot(ax=ax)
ax.legend(["Clicks","Clicks2"])
Output:
UPDATE:
There is an error in how you set the index, change
dfClicks.set_index('date_of_click')
with:
dfClicks = dfClicks.set_index('date_of_click')
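Putting the pieces together, a complete version might look like the sketch below. It reuses the sample values from the question; building both frames from a shared list of timestamps is just a convenience for the example.

```python
import pandas as pd

times = pd.to_datetime(['10/25/1995 03:30', '10/25/1995 04:30', '10/25/1995 05:30'])

# set_index returns a new frame, so the result has to be kept.
dfClicks = pd.DataFrame({'clicks': [700, 800, 550],
                         'date_of_click': times}).set_index('date_of_click')
dfClicks2 = pd.DataFrame({'clicks': [750, 850, 500],
                          'date_of_click': times}).set_index('date_of_click')

ax = dfClicks.plot()        # x-axis comes from the datetime index
dfClicks2.plot(ax=ax)       # draw the second frame on the same axes
ax.legend(["Clicks", "Clicks2"])
ax.set_ylabel("clicks")
```

Because both frames share the same axes object, the two lines end up on one graph with the date/time on the x-axis.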

Producing a plot using a for loop

I want to select from a dataframe based on a name. Then I want to plot the data on a single graph using a for loop.
df = pd.read_csv ('Kd.csv')
watertype = ['Evian','Volvic','Buxton']
for type in watertype:
    sdf = df[(df['Water']==type)]
    Na = sdf.iloc[:,13]
    Kd = sdf.iloc[:,2]
    plt.plot(Na,Kd,'o')
    plt.show()
Multiple graphs produced instead of overlaying them on a single graph.
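One likely cause is plt.show() being called inside the loop: each call finishes the current figure, so the next plt.plot starts a new one. A sketch of the usual fix is to create one axes up front, plot every group onto it, and call plt.show() once after the loop. Since Kd.csv is not available, the frame below is made-up stand-in data, and the column names Na and Kd are assumptions replacing the positional iloc lookups.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for Kd.csv; 'Na' and 'Kd' are assumed column names.
df = pd.DataFrame({
    "Water": ["Evian", "Evian", "Volvic", "Volvic", "Buxton", "Buxton"],
    "Na": [6.5, 6.8, 11.6, 12.0, 24.0, 25.1],
    "Kd": [1.2, 1.4, 2.1, 2.0, 3.3, 3.1],
})
watertypes = ["Evian", "Volvic", "Buxton"]

fig, ax = plt.subplots()            # one figure, one axes for every group
for water in watertypes:
    sdf = df[df["Water"] == water]
    ax.plot(sdf["Na"], sdf["Kd"], "o", label=water)

ax.set_xlabel("Na")
ax.set_ylabel("Kd")
ax.legend()
plt.show()                          # called once, after the loop
```

Renaming the loop variable from type to water also avoids shadowing Python's built-in type.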
