I have the following dataframe in pandas:
dfClicks = pd.DataFrame({'clicks': [700,800,550],'date_of_click': ['10/25/1995 03:30','10/25/1995 04:30','10/25/1995 05:30']})
dfClicks['date_of_click'] = pd.to_datetime(dfClicks['date_of_click'])
dfClicks.set_index('date_of_click')
dfClicks.clicks = pd.to_numeric(dfClicks.clicks)
Could you please advise how I can plot the above such that the x-axis shows the date/time and the y axis the number of clicks? I will also need to plot another data frame which includes predicted clicks on the same graph, just to compare. The test could be a replica of above, with minor changes:
dfClicks2 = pd.DataFrame({'clicks': [750,850,500],'date_of_click': ['10/25/1995 03:30','10/25/1995 04:30','10/25/1995 05:30']})
dfClicks2['date_of_click'] = pd.to_datetime(dfClicks2['date_of_click'])
dfClicks2.set_index('date_of_click')
dfClicks2.clicks = pd.to_numeric(dfClicks2.clicks)
Convert the clicks column to numeric, and then:
ax = dfClicks.plot()
dfClicks2.plot(ax=ax)
ax.legend(["Clicks","Clicks2"])
Output:
UPDATE:
There is an error in how you set the index: set_index returns a new DataFrame rather than modifying the original. Replace
dfClicks.set_index('date_of_click')
with:
dfClicks = dfClicks.set_index('date_of_click')
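A minimal sketch of the corrected flow, under a few assumptions: the helper names build_clicks and plot_both are hypothetical, the date format string is inferred from the sample values, and matplotlib is assumed to be installed for the plotting step.

```python
import pandas as pd


def build_clicks(values):
    """Build a clicks frame indexed by timestamp (hypothetical helper)."""
    frame = pd.DataFrame({
        "clicks": values,
        "date_of_click": ["10/25/1995 03:30", "10/25/1995 04:30", "10/25/1995 05:30"],
    })
    # The format string is an assumption based on the sample values.
    frame["date_of_click"] = pd.to_datetime(frame["date_of_click"], format="%m/%d/%Y %H:%M")
    frame["clicks"] = pd.to_numeric(frame["clicks"])
    # set_index returns a new DataFrame, so the result has to be kept.
    return frame.set_index("date_of_click")


def plot_both(actual, predicted):
    """Draw both frames on the same axes (needs matplotlib when called)."""
    ax = actual.plot()
    predicted.plot(ax=ax)
    ax.legend(["Clicks", "Clicks2"])
    return ax


dfClicks = build_clicks([700, 800, 550])
dfClicks2 = build_clicks([750, 850, 500])
```

Calling plot_both(dfClicks, dfClicks2) should then put the timestamps on the x-axis and both click series on the y-axis.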
I have found this code, which creates a graph in Python that updates in real time and does everything I need, but I would like the values to be updated instead of the x-axis moving. I have been searching for examples, but I only find static versions where the x values stay the same.
What I have right now:
What I want:
Here is the code:
# Create Plot Widget
self.scrolling_timestamp_plot_widget = pg.PlotWidget(axisItems={'bottom': TimeAxisItem(orientation='bottom')})
# Enable/disable plot squeeze (Fixed axis movement)
self.scrolling_timestamp_plot_widget.plotItem.setMouseEnabled(x=False, y=False)
self.scrolling_timestamp_plot_widget.setTitle('Signal 1 ')
self.scrolling_timestamp_plot_widget.setLabel('left', 'Value')
self.scrolling_timestamp_plot_widget.setLabel('bottom', 'Time (s)')
self.scrolling_timestamp_plot = self.scrolling_timestamp_plot_widget.plot()
self.scrolling_timestamp_plot.setPen("r")
def plot_updater(self):
    self.data_point = float(self.current_position_value)
    self.data.append({'x': self.timestamp.elapsed(), 'y': self.data_point })
    print("List Values:",self.data)
    self.scrolling_timestamp_plot.setData(x=[item['x'] for item in self.data], y=[item['y'] for item in self.data])
I am using Altair to create a graph, but for some reason it seems to be generating a tick for each of the points, creating a graph like this: Altair Graph
If I filter the dataframe, it produces odd axis values: Altair graph
Is there a way to reduce the number of ticks? I tried tickCount in the y axis parameter, but it didn't work, since it seems to require integers. I also tried setting the axis value parameter to a list [0,0.2,0.4,0.6,0.8,1], and that didn't work either. Here is my code (sorry it's so lengthy!). Thank you in advance!
a = alt.Chart(df_filtered).mark_point().encode(x =alt.X('Process_Time_(mins)', axis = alt.Axis(title='Process Time (mins)')),
y = alt.Y('Heavy_Phase_%SS',axis=alt.Axis(title='Heavy Phase %SS', tickCount = 10),sort = 'descending'),
color = alt.Color('DSP_Lot', legend = alt.Legend(title = 'DSP_Lot')),
shape = alt.Shape('Strain', scale = alt.Scale(range = ["circle", "square", "cross", "diamond", "triangle-up", "triangle-down", "triangle-right", "triangle-left"])),
tooltip = [alt.Tooltip('DSP_Lot',title = 'Lot'), alt.Tooltip('Heavy_Phase_%SS', title = 'Heavy Phase %SS'),
alt.Tooltip('Process_Time_(mins)', title = 'Process Time (mins)'), alt.Tooltip('Purpose', title = 'Purpose'), alt.Tooltip('Strain', title = 'Strain'),
alt.Tooltip('Trial', title = 'Trial')]).properties(width = 1000, height = 500)
It's hard to tell without a reproducible example but I suspect the issue is that your y axis is defaulting to a nominal encoding type, in which case you get one tick mark per unique value. If you specify a quantitative type in the Y encoding, it may improve things:
y = alt.Y('Heavy_Phase_%SS:Q', ...)
The reason it defaults to nominal is probably because the associated column in the pandas dataframe has a string type rather than a numerical type.
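If the column really does hold strings, converting it once in pandas is another option. The sketch below uses a made-up stand-in for df_filtered (the values are hypothetical; only the column names come from the question) and shows the dtype changing from a string type to a numeric one.

```python
import pandas as pd

# Hypothetical stand-in for df_filtered: the measurements arrived as strings.
df_filtered = pd.DataFrame({
    "Process_Time_(mins)": [10, 20, 30],
    "Heavy_Phase_%SS": ["0.2", "0.45", "0.8"],
})

# A string dtype here is what would make the axis default to nominal.
dtype_before = df_filtered["Heavy_Phase_%SS"].dtype

# Convert once up front; errors="coerce" turns unparseable entries into NaN.
df_filtered["Heavy_Phase_%SS"] = pd.to_numeric(
    df_filtered["Heavy_Phase_%SS"], errors="coerce"
)
dtype_after = df_filtered["Heavy_Phase_%SS"].dtype
```

With a numeric column, the encoding type should be inferred as quantitative even without the explicit :Q suffix, though spelling it out does no harm.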
I am making an XY-scatter chart, where both axes show aggregated data.
For both variables I want to have an interval selection in two small charts below where I can brush along the x-axis to set a range.
The selection should then be used to filter what is taken into account for each aggregation operation individually.
Using the cars data set as an example, let's say I want to look at Horsepower over Displacement, but not for every car: instead I aggregate (sum) by Origin. Additionally, I create two plots of overall mean HP and displacement over time, to which I add interval selections so that I can set two distinct time ranges.
Here is an example of what it should look like, although the selection functionality is not yet as intended.
And below is the code to produce it. Note that I left some commented sections in there which show what I already tried but did not work. The idea for the transform_calculate came from this GitHub issue, but I don't know how I could use the extracted boundary values to change what is included in the aggregations of the x and y channels. The double transform_window did not get me anywhere either. Could a transform_bin be useful here? How?
Basically, what I want is: when brush1 reaches, for example, from 1972 to 1975, and brush2 from 1976 to 1979, I want the scatter chart to plot the summed HP of each country in the years 1972, 1973 and 1974 against each country's summed displacement from 1976, 1977 and 1978 (for my case I don't need the exact date format; the Year might as well be integers here).
import altair as alt
from vega_datasets import data
cars = data.cars.url
brush1 = alt.selection(type="interval", encodings=['x'])
brush2 = alt.selection(type="interval", encodings=['x'])
scatter = alt.Chart(cars).mark_point().encode(
x = 'HP_sum:Q',
y = 'Dis_sum:Q',
tooltip = 'Origin:N'
).transform_filter( # Ok, I can filter the whole data set, but that always acts on both variables (HP and displacement) together... -> not what I want.
brush1 | brush2
).transform_aggregate(
Dis_sum = 'sum(Displacement)',
HP_sum = 'sum(Horsepower)',
groupby = ['Origin']
# ).transform_calculate( # Can I extract the selection boundaries like that? And if yes: how can I use these extracts to calculate the aggregations of HP and displacement?
# b1_lower='(isDefined(brush1.x) ? (brush1.x[0]) : 1)',
# b1_upper='(isDefined(brush1.x) ? (brush1.x[1]) : 1)',
# b2_lower='(isDefined(brush2.x) ? (brush2.x[0]) : 1)',
# b2_upper='(isDefined(brush2.x) ? (brush2.x[1]) : 1)',
# ).transform_window( # Maybe instead of calculate I can use two window transforms...??
# conc_sum = 'sum(conc)',
# frame = [brush1.x[0],brush1.x[1]], # This will not work, as it sets the frame relative (backward and forward) to each datum (i.e. sliding window), I need it to correspond to the entire data set
# groupby=['sample']
# ).transform_window(
# freq_sum = 'sum(freq)',
# frame = [brush2.x[0],brush2.x[1]], # ...same problem here
# groupby=['sample']
)
range_sel1 = alt.Chart(cars).mark_line().encode(
x = 'Year:T',
y = 'mean(Horsepower):Q'
).add_selection(
brush1
).properties(
height = 100
)
range_sel2 = alt.Chart(cars).mark_line().encode(
x = 'Year:T',
y = 'mean(Displacement):Q'
).add_selection(
brush2
).properties(
height = 100
)
scatter & range_sel1 & range_sel2
Interval selections cannot be used for aggregate charts yet in Vega-Lite. The error behavior has been updated in a recent PR to Vega-Lite to show a helpful message.
Not sure if I understand your requirements correctly, does this look close to what you want? (Just added param selections on top of your vertically concatenated graphs)
Vega Editor
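As a non-interactive fallback, the aggregation described in the question can also be computed outside the chart. The pandas sketch below is only an illustration: the miniature cars table is made up, sum_in_range is a hypothetical helper, and the year ranges are hard-coded where the two brushes would otherwise supply them. Ranges are treated as half-open, so 1972 to 1975 covers 1972, 1973 and 1974.

```python
import pandas as pd

# Hypothetical miniature of the cars data, with Year as plain integers.
cars = pd.DataFrame({
    "Origin": ["USA", "USA", "Europe", "Europe", "USA", "Europe"],
    "Year": [1972, 1974, 1973, 1976, 1977, 1978],
    "Horsepower": [150, 100, 90, 80, 120, 70],
    "Displacement": [300, 250, 120, 110, 260, 100],
})


def sum_in_range(frame, column, lower, upper):
    """Sum `column` per Origin over the half-open year range [lower, upper)."""
    in_range = frame[(frame["Year"] >= lower) & (frame["Year"] < upper)]
    return in_range.groupby("Origin")[column].sum()


# Each variable gets its own year range, mirroring brush1 and brush2.
hp_sum = sum_in_range(cars, "Horsepower", 1972, 1975).rename("HP_sum")
dis_sum = sum_in_range(cars, "Displacement", 1976, 1979).rename("Dis_sum")

# One row per Origin, usable as the x and y fields of the scatter chart.
scatter_data = pd.concat([hp_sum, dis_sum], axis=1).reset_index()
```

The resulting scatter_data frame can be passed to alt.Chart in place of the raw data, at the cost of losing the brushing interaction.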
I am trying to replicate a chart like the following using a pandas dataframe and Bokeh vbar:
Objective
So far, I've managed to place the labels at their corresponding heights, but now I can't find a way to access the numeric value where each category (2016, 2017, 2018) is located on the x axis. This is my result:
My nested categorical stacked bars chart
This is my code. It's messy, but it's what I've managed so far. So is there a way to access the numeric value on the x axis of the bars?
def make_nested_stacked_bars(source,measurement,dimension_attr):
    #dimension_attr is a list that contains the names of columns in source that will be used as categories
    #measurement contains the name of the column with numeric data.
    data = source.copy()
    #Creates list of values of highest index
    list_attr = source[dimension_attr[0]].unique()
    list_stackers = list(source[dimension_attr[-1]].unique())
    list_stackers.sort()
    #trims labels that are too wide to fit in graph
    for column in data.columns:
        if data[column].dtype.name == 'object':
            data[column] = np.where(data[column].apply(len) > 30, data[column].str[:30]+'...', data[column])
    #Creates a list of dataframes, each grouping a specific value
    list_groups = []
    for item in list_attr:
        list_groups.append(data[data[dimension_attr[0]] == item])
    #Groups data by dimension attrs, aggregates measurement to count
    #Drops highest index from dimension attr
    dropped_attr = dimension_attr[0]
    dimension_attr.remove(dropped_attr)
    #Creates groupby by the last 2 parameters, and aggregates to count
    #Calculates percentage
    for index,value in enumerate(list_groups):
        list_groups[index] = list_groups[index].groupby(by=dimension_attr).agg({measurement: ['count']})
        list_groups[index] = list_groups[index].groupby(level=0).apply(lambda x: round(100 * x / float(x.sum()),1))
        # Resets indexes
        list_groups[index] = list_groups[index].reset_index()
        list_groups[index] = list_groups[index].pivot(index=dimension_attr[0], columns=dimension_attr[1])
        list_groups[index].index = [(x,list_attr[index]) for x in list_groups[index].index]
        # Drops dimension attr as top level column
        list_groups[index].columns = list_groups[index].columns.droplevel(0)
        list_groups[index].columns = list_groups[index].columns.droplevel(0)
    df = pd.concat(list_groups)
    # Get the number of colors needed for the plot.
    colors = brewer["Spectral"][len(list_stackers)]
    colors.reverse()
    p = figure(plot_width=800, plot_height=500, x_range=FactorRange(*df.index))
    renderers = p.vbar_stack(list_stackers, x='index', width=0.3, fill_color=colors, legend=[get_item_value(x)for x in list_stackers], line_color=None, source=df, name=list_stackers,)
    # Adds a different hovertool to a stacked bar
    #empty dictionary with initial values set to zero
    list_previous_y = {}
    for item in df.index:
        list_previous_y[item] = 0
    #loops through bar graphs
    for r in renderers:
        stack = r.name
        hover = HoverTool(tooltips=[
            ("%s" % stack, "#%s" % stack),
        ], renderers=[r])
        #Initial value for placing label in x_axis
        previous_x = 0.5
        #Loops through dataset rows
        for index, row in df.iterrows():
            #adds value of df column to list
            list_previous_y[index] = list_previous_y[index] + df[stack][index]
            ## adds label if value is not nan and at least 10
            if not math.isnan(df[stack][index]) and df[stack][index]>=10:
                p.add_layout(Label(x=previous_x, y=list_previous_y[index] -df[stack][index]/2,
                                   text='% '+str(df[stack][index]), render_mode='css',
                                   border_line_color='black', border_line_alpha=1.0,
                                   background_fill_color='white', background_fill_alpha=1.0))
            # increases position in x_axis
            #this should be done by adding the value of next bar in x_axis
            previous_x = previous_x + 0.8
        p.add_tools(hover)
    p.add_tools(hover)
    p.legend.location = "top_left"
    p.x_range.range_padding = 0.2
    p.xgrid.grid_line_color = None
    return p
Or is there an easier way to get all this done?
Thank you for your time!
UPDATE:
Added an additional image of a three level nested chart where the label placement in x_axis should be accomplished too
Three level nested chart
I can't find a way to access the numeric value where the category (2016,2017,2018) is located in the x axis.
There is not any way to access this information on the Python side in standalone Bokeh output. The coordinates are only computed inside the browser on the JavaScript side. i.e. only after your Python code has finished running and is out of the picture entirely. Even in a Bokeh server app context there is not any direct way, as there are not any synchronized properties that record the values.
As of Bokeh 1.3.4, support for placing labels with categorical coordinates is a known open issue.
In the meantime, the only workarounds I can suggest are:
Use the text glyph method with coordinates in a ColumnDataSource, instead of Label. That should work to position with actual categorical coordinates. (LabelSet might also work, though I have not tried.) You can see an example of text with categorical coordinates here:
https://github.com/bokeh/bokeh/blob/master/examples/plotting/file/periodic.py
Use numerical coordinates to position the Label. But you will have to experiment/best guess to find numerical coordinates that work for you. A rule of thumb is that categories have a width of 1.0 in synthetic (numeric) coordinate space.
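The rule of thumb in the second workaround can be turned into a first guess for label positions. The sketch below is pure Python and does not read anything back from Bokeh: category_centers is a hypothetical helper, and it assumes a flat (non-nested) categorical axis whose categories are each 1.0 wide and laid out left to right from 0. Nested ranges add padding between groups, so the guessed values would still need checking against the rendered plot.

```python
def category_centers(factors, start=0.0, padding=0.0):
    """Guess the numeric centre of each category on a flat categorical axis.

    Assumes every category is 1.0 wide, placed left to right from `start`,
    with `padding` between neighbours. This is an estimate based on the rule
    of thumb above, not a value reported by Bokeh.
    """
    return {
        factor: start + 0.5 + i * (1.0 + padding)
        for i, factor in enumerate(factors)
    }


centers = category_centers(["2016", "2017", "2018"])
```

Under these assumptions the centres come out as 0.5, 1.5 and 2.5, which could be passed as the x argument of a Label as a starting point for experimentation.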
My solution was as follows.
I created a copy of the dataframe used for making the chart. This dataframe (labeling_data) contains the y-axis coordinates, calculated so that each label is positioned at the middle of the corresponding stacked bar.
Then I added additional columns to be used as the actual labels, where the values to be displayed were concatenated with the percentage symbol.
labeling_data = df.copy()
#Cumulative sum of columns
labeling_data = labeling_data.cumsum(axis=1)
#New names for columns
y_position = []
for item in labeling_data.columns:
    y_position.append(item+'_offset')
labeling_data.columns = y_position
#Copies original columns
for item in df:
    #Adding original columns
    labeling_data[item] = df[item]
    #Modifying offset columns to place label in the middle of the bar
    labeling_data[item+'_offset'] = labeling_data[item+'_offset']-labeling_data[item]/2
    #Concatenating values with percentage symbol if at least 10
    labeling_data[item+'_label'] = np.where(df[item] >=10 , '% '+df[item].astype(str), "")
Finally, by looping through the renderers of the plot, a LabelSet was added to each stack group using labeling_data as the ColumnDataSource. By doing this, the index of the dataframe can be used to set the x coordinate of the label, and the corresponding columns supply the y coordinate and text parameters.
info = ColumnDataSource(labeling_data)
#loops through bar graphs
for r in renderers:
    stack = r.name
    #Loops through dataset rows
    for index, row in df.iterrows():
        #Creates Labelset and uses index, y_offset and label columns
        #as x, y and text parameters
        labels = LabelSet(x='index', y=stack+'_offset', text=stack+'_label', level='overlay',
                          x_offset=-25, y_offset=-5, source=info)
    p.add_layout(labels)
Final result:
Nested categorical stacked bar chart with labels
I am using python's version of plotly to build time series plots of tweets. But I only want to include tweets in the most recent five days. So I have this code which works as far I can tell (it's a simplified version and not reproducible because I am very sure my dataframe is formatted correctly and pretty sure that the bug lies somewhere in the code below):
# Set range to use to limit to recent dates
min_day = tweet_dataframe['day'].max() - timedelta(days = 5)
reduced_df = tweet_dataframe.loc[tweet_dataframe['date'] > min_day]
# Plot time series
time_series = go.Scatter(
    x = reduced_df['date'],
    y = reduced_df['vader_polarity'],
    name = topic,
    mode = 'markers',
    hoverinfo = 'x+text',
    text = reduced_df['custom_text'],
)
fig.append_trace(time_series)
offline_plot.plot(fig, filename = path, auto_open = True)
This generates an interactive time series that displays the date and some custom text. After manually checking the hover info, it looks like the data points match what I would expect from the dataframe.
However, using the approach below, without defining a reduced_df, a few of the data points display the wrong hover information or are plotted in the wrong date bin. When I do not include the > min_day bit, the plots are fine.
time_series = go.Scatter(
    x = tweet_dataframe['date'].loc[tweet_dataframe['date'] > min_day],
    y = tweet_dataframe['vader_polarity'].loc[tweet_dataframe['day'] > min_day],
    name = topic,
    mode = 'markers',
    hoverinfo = 'x+text',
    text = tweet_dataframe['custom_text']
)
Has anyone had a similar problem with plotting time series in plotly, or is there an obvious error in my plotly/pandas logic?
I found my bug. All I needed was to apply the same date filter to the text argument, like so:
time_series = go.Scatter(
    x = tweet_dataframe['date'].loc[tweet_dataframe['date'] > min_day],
    y = tweet_dataframe['vader_polarity'].loc[tweet_dataframe['day'] > min_day],
    name = topic,
    mode = 'markers',
    hoverinfo = 'x+text',
    text = tweet_dataframe['custom_text'].loc[tweet_dataframe['day'] > min_day]
)
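One way to make this kind of mismatch harder to reintroduce is to build the boolean mask once and reuse it for every column. The sketch below uses a made-up stand-in for tweet_dataframe (the rows are hypothetical, and a single date column is assumed instead of separate date and day columns); the three filtered series are what would be passed to go.Scatter as x, y and text.

```python
from datetime import timedelta

import pandas as pd

# Hypothetical stand-in for tweet_dataframe.
tweet_dataframe = pd.DataFrame({
    "date": pd.to_datetime(["2019-01-01", "2019-01-04", "2019-01-08", "2019-01-09"]),
    "vader_polarity": [0.1, -0.3, 0.5, 0.2],
    "custom_text": ["a", "b", "c", "d"],
})

min_day = tweet_dataframe["date"].max() - timedelta(days=5)

# Build the mask once so x, y and text are always filtered identically.
recent = tweet_dataframe["date"] > min_day
x = tweet_dataframe.loc[recent, "date"]
y = tweet_dataframe.loc[recent, "vader_polarity"]
text = tweet_dataframe.loc[recent, "custom_text"]
```

This is equivalent in spirit to the reduced_df approach from the question, since all three arguments then share the same rows in the same order.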