I have a DataFrame containing data about different projects. I tried to create a bar chart showing the number of users for each project, with the bars colored according to whether the project was audited before or after 2014.
My problem is that I would like all of the bars ranked from the largest to the smallest, rather than having one group on one side and the other group on the other side. It is hard to explain in words, but the following pictures should make it clearer.
I tried the following:
applications = applications.sort_values(by='NombreUtilisateur2017', ascending=False)
fig = px.bar(applications, x='AppCode', y='NombreUtilisateur2017', color='test_avant_2014')
fig.show()
Here is my output:
current output
But, I would like my graph to look like this:
expected output
Related
I am writing a program to plot a bar graph for a probability data set. The data set is not stored, and I would prefer not to store it. I need to plot a bar for every possible outcome, and I want the bars to be dynamic: rather than counting the occurrences of each item in a stored data set, I want the bars to grow as the data is generated.
I was trying to use Python lists, so a bar would look something like 36[****************]. But I can't think of a way to update them dynamically. I am left with two options: create 60-120 bars up front (which seems wasteful), or store the data (which adds work, execution time, and load). I can't think of anything else, so any suggestions would be welcome.
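One possible approach is to keep only a running count per outcome and discard each sample as soon as it has been counted. The sketch below assumes a hypothetical two-dice experiment as the data source; `roll_two_dice`, `simulate`, and `render` are illustrative names, not part of the question.

```python
import random
from collections import Counter


def roll_two_dice(rng):
    """Hypothetical data source: the sum of two six-sided dice."""
    return rng.randint(1, 6) + rng.randint(1, 6)


def render(counts):
    """Return one text bar per outcome seen so far, e.g. '  7[*****]'."""
    return "\n".join(
        f"{outcome:>3}[{'*' * counts[outcome]}]" for outcome in sorted(counts)
    )


def simulate(n_trials, seed=0):
    """Update the counts as each sample arrives; samples are never stored."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_trials):
        counts[roll_two_dice(rng)] += 1  # the sample itself is discarded here
    return counts


counts = simulate(50)
print(render(counts))
```

A bar is created the first time its outcome appears, so there is no need to set up every possible bar in advance. Calling `render` inside the loop would redraw the bars as the data comes in.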
I am trying to get a bar plot of feature importance for an XGBoost classifier. I expected it to work, but it didn't, even after many attempts. Can you check the code below and tell me what is wrong with it?
feat_import=clf.feature_importances_
feat_names=X.columns
sorted_idx=clf.feature_importances_.argsort()[-20:]
plt.barh(feat_names[sorted_idx], clf.feature_importances_[sorted_idx])
It picks out the most important features, but it plots them unsorted.
When I use plain numbers instead of column names, I get a sorted bar graph.
plt.barh(range(20),feat_import[sorted_idx])
I couldn't figure out the problem here.
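One way to keep names and values aligned is to pair them in a pandas Series, sort that, and plot it on a fresh Axes. This is only a sketch under assumptions: the importances and feature names below are randomly generated stand-ins for `clf.feature_importances_` and `X.columns`, and it does not diagnose why the original call came out unsorted.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Stand-ins for clf.feature_importances_ and X.columns (made-up values).
rng = np.random.default_rng(0)
feat_import = rng.random(30)
feat_names = pd.Index([f"feature_{i}" for i in range(30)])

# Keep the 20 largest importances, then sort ascending so that the
# largest bar ends up at the top of the horizontal bar chart.
top20 = pd.Series(feat_import, index=feat_names).nlargest(20).sort_values()

fig, ax = plt.subplots()
ax.barh(top20.index.to_list(), top20.to_numpy())
ax.set_xlabel("importance")
```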
I'm working with a dataset regarding the survivors on the Titanic, where I'm trying to show the relationship between Age of passengers and the fare they paid.
This is what the data is currently formatted as:
From here, it was fairly easy to make a simple scatterplot, like so:
However, I am curious whether there is a way to color the points differently based on the sex column of the dataset. Most examples I have seen online focus on how to change the color for two separate data sets. I initially tried to use an if statement to change the color depending on sex, but that didn't work the way I hoped it would.
Perhaps much easier with seaborn:
import seaborn as sns
data = sns.load_dataset('titanic')
sns.scatterplot(data=data, x='age', y='fare', hue='sex')
One potential solution I came up with after pondering a bit looks like this:
The problem with this solution is that you have to add more variables, which isn't ideal, and the results overlap each other a bit, making it harder to see the trends in the data.
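Without seaborn, a similar result can be sketched in plain matplotlib by grouping on the sex column and drawing one scatter per group. The rows below are made up for illustration, and the lowercase column names `age`, `fare`, and `sex` are an assumption borrowed from the seaborn dataset.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows; column names assumed to match seaborn's titanic dataset.
df = pd.DataFrame({
    "age": [22, 38, 26, 35, 54, 2],
    "fare": [7.25, 71.28, 7.92, 53.10, 51.86, 21.08],
    "sex": ["male", "female", "female", "female", "male", "male"],
})

colors = {"male": "tab:blue", "female": "tab:orange"}

fig, ax = plt.subplots()
for sex, group in df.groupby("sex"):
    # One scatter call per group gives each sex its own color and legend entry.
    ax.scatter(group["age"], group["fare"], color=colors[sex], label=sex, alpha=0.6)

ax.set_xlabel("age")
ax.set_ylabel("fare")
ax.legend(title="sex")
```

Setting `alpha` below 1 makes overlapping points easier to tell apart.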
First of all, sorry for my bad English, as it is not my first language.
I have recently started learning python and I am trying to develop a "simple" program, but I have run into a problem.
I am using xlwings to modify and interact with Excel. What I want to achieve (or to know whether it's possible) is:
I have Excel look up data and plot a graph. However, this graph sometimes has, for example, 20 values on the X-axis and in other cases only 10, which leaves 10 empty #NA slots. I therefore want to adjust the graph to show only 10 values by changing the range that the graph is built from.
The function get_prod_hours() looks up how many values I want on the X-axis:
def get_prod_hours():
    """From the input gets the production hours to adapt the graphs"""
    dt = wb.sheets['Calculatrice']
    return dt.range('E24').value
Based on the value returned by the function, I must modify (reduce) the range of values on the graph.
Solutions such as creating the graphs from scratch are not acceptable, because the Excel file is a "standard" at my company and I would like to modify only the range of the graph.
I hope for something like:
Column A in Excel holds the values 1, 2, 3, 4, 5, and get_prod_hours() returns 5, so my graph has only 5 points rather than, for example, 6 of which one is #NA.
Thank you very much, and sorry for the wall of text.
The xlwings API doesn't offer a lot of options for charts (see https://docs.xlwings.org/en/stable/api.html?highlight=charts#xlwings.main.Charts).
Try to find the chart in wb.sheets[0].charts.
The range can then be modified with
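chart = wb.sheets[0].charts[0]  # assumes the chart is the first one on the first sheet
rng = wb.sheets[0].range((1, 1), (int(get_prod_hours()), 1))  # A1 down to the requested row
chart.set_source_data(rng)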
But from looking at the API and knowing how many options Excel charts have, the API feels too thin.
If this doesn't work, an option is to add a VBA macro which modifies the chart and call that. See How do I call an Excel macro from Python using xlwings?
Note from maintainers: this question is about the obsolete bokeh.charts API removed several years ago. For an example of timeseries charts in modern Bokeh, see here:
https://docs.bokeh.org/en/latest/docs/gallery/range_tool.html
I'm trying to create a timeseries graph with bokeh. This is my first time using bokeh, and my first time dealing with pandas as well. Our customers receive reviews on their products. I'm trying to create a graph which shows how their average review rating has changed over time.
Our database contains the dates of each review. We also have the average review value for that date. I need to plot a line with the x axis being dates and the y axis being the review value range (1 through 10).
When I accepted this project I thought it would be easy. How wrong I was. I found a timeseries example that looks good. Unfortunately, the example completely glosses over the most difficult part of creating a solution: it does not show how to create an appropriate data structure from your source data. The example retrieves pre-built data structures from the Yahoo API. I've tried examining these structures, but they don't look straightforward to me.
I found a page explaining pandas structs. It is a little difficult for me to understand. Particularly confusing to me is how to represent points in the graph without necessarily labeling those points. For example the y axis should display whole numbers, but data points need not intersect with the whole number value. The page I found is linked below:
http://pandas.pydata.org/pandas-docs/stable/dsintro.html
Does anyone know of a working example for the timeseries chart type which exemplifies how to build the necessary data structure?
UPDATE:
Thanks to the answer below I toyed around with just passing lists into lines. It didn't occur to me that I could do this, but it works very well. For example:
from datetime import datetime
# dates assumed to be month/day/year
date = [datetime(2011, 1, 11), datetime(2011, 1, 12), datetime(2011, 1, 13), datetime(2014, 4, 5)]
rating = [4, 4, 5, 2]
line(
    date,  # x coordinates
    rating,  # y coordinates
    color='#A6CEE3',  # set a color for the line
    x_axis_type="datetime",  # NOTE: only needed on first
    tools="pan,wheel_zoom,box_zoom,reset,previewsave"  # NOTE: only needed on first
)
You don't have to use Pandas; you simply need to supply a sequence of x-values and a sequence of y-values. These can be plain Python lists of numbers, NumPy arrays, or Pandas Series. Here is another time series example that uses just NumPy arrays:
http://docs.bokeh.org/en/latest/docs/gallery/color_scatter.html
EDIT: link updated
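For readers on a current Bokeh release, the same idea (plain lists passed straight to a line glyph) can be sketched with the bokeh.plotting interface. This is a minimal sketch, not the original poster's code: the dates and ratings repeat the sample values from the question's update, read as month/day/year.

```python
from datetime import datetime

from bokeh.plotting import figure, show

# Sample values from the question's update, read as month/day/year.
dates = [
    datetime(2011, 1, 11),
    datetime(2011, 1, 12),
    datetime(2011, 1, 13),
    datetime(2014, 4, 5),
]
ratings = [4, 4, 5, 2]

# A datetime x-axis formats the tick labels as dates; the y-range covers
# the 1 through 10 rating scale described in the question.
p = figure(x_axis_type="datetime", y_range=(1, 10), title="Average review rating")
p.line(dates, ratings, color="#A6CEE3", line_width=2)

# show(p)  # uncomment to write the HTML output and open it in a browser
```

No pandas structure is needed here; the two lists only have to be the same length.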