I am getting data from an SQL database and converting it into a pandas dataframe. When I try to display my chart in Streamlit, the order of the values is upside down.
dashboard_chart1 = st.line_chart(df, x="time", width=300, height=500)
I was trying to find something in the official streamlit docs, but there is no argument for the order.
Yes, I found a solution!
I was getting the data from the database with the pandas function pd.read_sql(). All columns in the resulting dataframe had the object dtype. I used df['column_name'] = df['column_name'].astype(float) to convert them to floats. Now my data is shown correctly.
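A minimal sketch of that conversion, assuming a small made-up dataframe in place of the real pd.read_sql() result (the column names "time" and "value" are placeholders, not taken from the actual table):

```python
import pandas as pd

# Hypothetical stand-in for a pd.read_sql() result whose columns arrived as text.
df = pd.DataFrame({
    "time": ["2023-01-01", "2023-01-02", "2023-01-03"],
    "value": ["9.5", "10.25", "100.0"],
})

# Convert the measurement column to floats, as described in the answer.
df["value"] = df["value"].astype(float)

# Optionally parse the time column too, so the x-axis is treated as dates.
df["time"] = pd.to_datetime(df["time"])

print(df.dtypes)
```

Checking df.dtypes right after loading is a quick way to spot columns that were read as text.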
I have been facing one issue while I am trying to plot a bar graph using the matplotlib library.
Please find the sample data below
Sample Data Image
count_movies_year = n_db.groupby('release_year').agg({'title':'count'}).rename(columns={'title':'no_of_titles'})
count_movies_year.reset_index()
I wrote the code above to group the data and rename the resulting column. Next, I wanted to plot a bar graph of the result with matplotlib, so I wrote the code below:
plt.bar(count_movies_year['release_year'],count_movies_year['no_of_titles'])
plt.xlabel('release_year')
plt.ylabel('no_of_titles')
plt.show()
However, when I run this I get a KeyError for 'release_year'. What is wrong here? I am new to Python and matplotlib, so I would appreciate it if someone could explain where things are going wrong so that I can avoid the mistake next time.
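After a groupby, "release_year" no longer exists as a column in your DataFrame, because it has become the index. Also, reset_index() returns a new DataFrame rather than modifying the original, so calling it without assigning the result has no effect.
You have several options:
use reset_index as you did, but assign the result back to your variable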
count_movies_year = count_movies_year.reset_index()
or use the inplace parameter
count_movies_year.reset_index(inplace=True)
use the .index directly in your plot
plt.bar(count_movies_year.index, count_movies_year['no_of_titles'])
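The options above can be checked on a small made-up dataframe (the contents of n_db here are invented for illustration; only the column names come from the question). The as_index=False variant at the end is an additional alternative, not something the answer proposed:

```python
import pandas as pd

# Made-up stand-in for n_db with only the two columns used in the question.
n_db = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "release_year": [2019, 2020, 2020, 2021],
})

grouped = (
    n_db.groupby("release_year")
    .agg({"title": "count"})
    .rename(columns={"title": "no_of_titles"})
)

# After the groupby, release_year is the index, not a column.
print("release_year" in grouped.columns)

# Assigning the result of reset_index() turns the index back into a column.
flat = grouped.reset_index()
print(list(flat.columns))

# Alternative: keep the grouping key as a column from the start.
direct = n_db.groupby("release_year", as_index=False).agg(
    no_of_titles=("title", "count")
)
print(list(direct.columns))
```

Either flat or direct can then be passed to plt.bar() using the column names from the question.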
I was trying to use pandas to get data from a Wikipedia article about the largest U.S. bank failures, but for some reason the table was incomplete. I used this:
df = pd.read_html('https://en.wikipedia.org/wiki/List_of_largest_U.S._bank_failures')
type(df)
len(df)
df[1]
PS: I am using Hydrogen to run Jupyter in Atom. This was the output:
Please explain what happened. I am new to Data Science and Pandas
You are getting the whole table. By default, pandas displays only a limited number of rows, hence the ... signs in the middle. If you want to display all rows, you can change the pandas display default as follows:
# show at most 100 rows
pd.options.display.max_rows = 100
Note that this is a display setting only, the DataFrame contains all table data already.
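As a rough illustration of the difference between the data and its display, the sketch below uses a made-up 80-row dataframe and pd.option_context, which applies a setting temporarily instead of changing it globally:

```python
import pandas as pd

# Made-up dataframe with 80 rows, standing in for the scraped table.
df = pd.DataFrame({"n": range(80)})

# With a low row limit, the printed form is shortened.
with pd.option_context("display.max_rows", 10):
    truncated = repr(df)

# With a limit above the row count, every row is printed.
with pd.option_context("display.max_rows", 100):
    full = repr(df)

# The dataframe itself holds all 80 rows in both cases.
print(len(df))
print(len(truncated.splitlines()), len(full.splitlines()))
```

Using option_context keeps the change local to the with block, which can be convenient in a notebook where a global setting would affect every later cell.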
I have a Pandas DataFrame that has sports records in it. All of them look like this: "1-2-0", "17-12-1", etc., for wins, losses and ties. When I export this the records come up in different date formats within Excel. Some will come up as "12-May", others as "9/5/2001", and others will come up as I want them to.
The DataFrame that I want to export is named 'x' and this is the command I'm currently using. I tried it without the date_format part and it gave the same response in Excel.
x.to_csv(r'C:\Users\B\Desktop\nba.csv', date_format = '%s')
I also tried to_excel but kept getting errors while exporting. Any ideas? I suspect I am using date_format incorrectly, but I don't know how to export the text as-is instead of having it automatically converted to a date.
Thanks!
I don't think it's a Python issue; Excel is auto-detecting dates in your data.
Still, see below for how to convert your scores to strings.
Try this:
import pandas as pd
df = pd.DataFrame({"lakers" : ["10-0-1"],"celtics" : ["11-1-3"]})
print(df.head())
Here is the dataframe with made-up data.
lakers celtics
0 10-0-1 11-1-3
Convert the dataframe to strings:
df = df.astype(str)
and save the csv:
df.to_csv('nba.csv')
Opening the file in LibreOffice gives me two columns with the (made-up) scores.
This may be an issue with how Excel is used rather than with pandas. As noted above, I believe Excel is incorrectly auto-detecting date formatting. You can change any column in Excel to many different formats: select your columns of data, right-click, select the format option, and change it to something else, such as 'General'.
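One way to confirm that pandas writes the records unchanged is to inspect the CSV text directly. This is only a sketch using the made-up data from above and an in-memory buffer instead of a file:

```python
import io

import pandas as pd

df = pd.DataFrame({"lakers": ["10-0-1"], "celtics": ["11-1-3"]})

# Write the CSV into memory so the raw text can be inspected.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back as strings returns the records exactly as written.
round_trip = pd.read_csv(io.StringIO(csv_text), dtype=str)
print(round_trip)
```

If the raw text contains the records as expected, any date-like values seen later come from the spreadsheet program's import step.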
I have a prebuilt and populated powerpoint presentation where I am modifying the data for charts and tables. I would like to retain all formatting (and much of the text), but replace the data in a line chart within a slide.
I have a function that will replace the data using a pandas data frame that works with bar charts.
def replaceCategoryChart(df, chart, skipLastCol=0):
    """
    Replaces Category chartdata for a simple series chart. e.g. Nonfarm Employment
    Parameters:
    df: dataframe containing new data. column 0 is the categories
    chart: the powerpoint shape chart.
    skipLastCol: 0=don't skip last column, 1+ will skip that many columns from the end
    Returns: replaced chart(?)
    """
    cols1 = list(df)
    #print(cols1)
    #create chart data object
    chart_data = CategoryChartData()
    #create categories
    chart_data.categories = df[cols1[0]]
    # Loop over all series
    for col in cols1[1:-skipLastCol]:
        chart_data.add_series(col, df[col])
    #replace chart data
    chart.replace_data(chart_data)
...
S0_L= pd.read_excel(EXCEL_BOOK, sheet_name="S0_L", usecols="A:F")
S0_L_chart = prs.slides[0].shapes[3].chart
print(S0_L)
replaceCategoryChart(S0_L, S0_L_chart)
...
The Python file runs successfully; however, when I open the PowerPoint file I get the error
Powerpoint found a problem with content in Name.pptx.
Powerpoint can attempt to repair the presentation.
If you trust the source of this presentation, click Repair.
After clicking repair, the slide I attempted to modify is replaced by a blank layout.
Because this function works for bar charts, I think there is a mistake in the way I am understanding how to use replace_data() for a line chart.
Thank you for your help!
If your "line chart" is an "XY Scatter" chart, you'll need a different chart-data object, the XyChartData object and then to populate its XySeries objects: https://python-pptx.readthedocs.io/en/latest/api/chart-data.html#pptx.chart.data.XyChartData
I would recommend starting by getting it working using literal values, e.g. "South" and 1.05, and then proceed to supply the values from Pandas dataframes. That way you're sure the python-pptx part of your code is properly structured and you'll know where to go looking for any problems that arise.
As scanny mentioned, replace_data() does work for category line charts.
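The repair error was (probably) caused by adding the series data incorrectly: there was a bad loop. With skipLastCol=0, the slice cols1[1:-skipLastCol] is cols1[1:0], which is empty, so no series were added to the chart data. The corrected loop is below.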
# Loop over all series
for col in cols1[1:len(cols1)-skipLastCol]:
    print('Type of column is ' + str(type(col)))
    chart_data.add_series(col, df[col])
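The slicing behaviour behind that fix can be seen without python-pptx. In this sketch the column names and the helper series_columns are made up for illustration:

```python
# Made-up column names; the first one plays the role of the categories column.
cols1 = ["Month", "Series A", "Series B", "Series C"]


def series_columns(cols, skip_last=0):
    """Return the series columns, optionally dropping some from the end."""
    return cols[1:len(cols) - skip_last]


# -0 is just 0, so the original slice stops before it starts.
print(cols1[1:-0])                         # []

# Computing the end index explicitly works for 0 and for positive values.
print(series_columns(cols1))               # ['Series A', 'Series B', 'Series C']
print(series_columns(cols1, skip_last=1))  # ['Series A', 'Series B']
```

An equivalent guard would be to use cols[1:] when nothing is skipped and cols[1:-skip_last] otherwise.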
Whenever I try to plot data using the plotly python library (in this case from Modeanalytics dataframe), it ends up connecting out-of-order data points together and causing a mess as follows:
If I sort my data in the SQL query that generates the dataframe, then the plot looks great!
However, I want to actually sort the data in python and not in SQL.
I attempted to take the out-of-order dataframe and do this:
df.sort_values(by=['time'])
but it still resulted in the messy plot.
How can I sort my data frame in python such that it is plotted correctly?
By default, sort_values() returns a new dataframe without modifying the original.
You can either set the inplace flag to True or assign the output back to the original dataframe.
Try:
df = df.sort_values(by=['time'])
Or
df.sort_values(by=['time'], inplace=True)
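A short check of this behaviour on made-up data (the "value" column is a placeholder; only "time" comes from the question):

```python
import pandas as pd

df = pd.DataFrame({"time": [3, 1, 2], "value": [30, 10, 20]})

# The sorted copy is discarded, so df keeps its original order.
df.sort_values(by=["time"])
print(df["time"].tolist())  # [3, 1, 2]

# Assigning the result back keeps the sorted order.
df = df.sort_values(by=["time"]).reset_index(drop=True)
print(df["time"].tolist())  # [1, 2, 3]
```

The reset_index(drop=True) call is optional; it only renumbers the rows after sorting.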