Trying to create a stacked barchart from dataframe data - python

I am trying to create a stacked bar-chart showing total marriages by months for each year between 2008 and 2015.
import pandas as pd
import numpy as np
import io
import requests
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
url = "https://data.code4sa.org/api/views/r4bb-fvka/rows.csv"
file=requests.get(url).content
c=pd.read_csv(io.StringIO(file.decode('utf-8')))
Here I am adding the total number of marriages for each year, then grouping by both MarriageYear and MarriageMonth to get the total number of marriages for each month:
c['Total'] = c['MarriageYear']
months = c.groupby(['MarriageYear','MarriageMonth'])['Total'].count()
I think the index should be both MarriageYear and MarriageMonth, since I want the total number of marriages for each month in every year?
months.set_index(['MarriageYear','MarriageMonth'])\
.reindex(months.set_index('MarriageMonth').sum().sort_values().index, axis=1)\
.T.plot(kind='bar', stacked=True,
colormap=ListedColormap(sns.color_palette("GnBu", 10)),
figsize=(24,28))
If you post a potential solution or point out what I should look at again, please explain where and why I went wrong and how I should be approaching this.

Try this:
c.groupby(['MarriageYear', 'MarriageMonth']).size() \
.unstack().plot.bar(stacked=True, colormap='GnBu', figsize=(12, 14))
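To explain the one-liner above: in the question, months is already a Series indexed by (MarriageYear, MarriageMonth), and a Series has no set_index method, so the chained call fails before any plotting happens; sns is also never imported. The answer counts the rows per (year, month) with size(), then unstack() moves the months into columns so that each year becomes one bar. A minimal sketch on made-up data (the values, and the use of integer months, are assumptions and not the real dataset):

```python
import pandas as pd

# Synthetic stand-in for the marriages data: one row per marriage record.
# The column names match the question; the values are made up.
records = pd.DataFrame({
    "MarriageYear":  [2008, 2008, 2008, 2009, 2009, 2009, 2009],
    "MarriageMonth": [1,    1,    2,    1,    2,    2,    2],
})

# size() counts the rows in each (year, month) group and returns a Series
# with a two-level MultiIndex.
counts = records.groupby(["MarriageYear", "MarriageMonth"]).size()

# unstack() pivots the inner level (MarriageMonth) into columns, giving one
# row per year and one column per month: the shape a stacked bar plot needs.
table = counts.unstack(fill_value=0)
print(table)

# Each row becomes one bar; each column becomes one coloured segment.
ax = table.plot.bar(stacked=True, colormap="GnBu")
```

Outside a notebook, import matplotlib.pyplot as plt and call plt.show() afterwards to display the figure.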

Related

Is it possible to add another x axis to a plotly chart?

I have a plotly chart that looks like this:
Is there a way to make a second x axis that only has the years? What I mean is that I want two x axes: a 'sub-axis' that has the months (Sep, Nov, Jan, ...), and another one that has the years (2021, 2022, 2023).
It is possible to handle this by passing a list of two arrays as x, which gives a multi-category axis, but the axis is then categorical rather than a date axis. If the inner labels were month names only, daily data would collapse onto them: a year of daily data has 365 points, but a month-only axis has just 12 positions. The closest way to meet the request is to label each point with the month name and day, grouped under its year.
import plotly.express as px
import plotly.graph_objects as go
import pandas as pd
import calendar
df = px.data.stocks()
df['date'] = pd.to_datetime(df['date'])
df['year'] = df['date'].dt.year
multi_index = [df['year'].values,df['date'].dt.strftime('%b-%d').values]
fig = go.Figure()
fig.add_scatter(x=multi_index, y=df['GOOG'])
fig.show()

Group years by decade in seaborn barplot

If I have a DataFrame with a column 'Year' and another column 'Average temperature' and I want to represent them in a barplot to see if the global average temperature has risen over the last decades, how do you convert years to decades?
For example, between 1980 and 1989 I need it to be represented in x axis as 1980. For 1990 and 1999 as 1990, and so on.
Note that:
x axis = Year
y axis = Average temperature
Many thanks
You can do this by converting each year to the starting year of its decade, then taking the average temperature over the years in that decade.
Code:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Sample data. Replace it with your data.
df = pd.DataFrame([[2011,20],[2012,10],[2013,10],[2014,10],[2015,10],[2016,10],[2017,10],[2018,10],[2019,10],[2020,10],[2021,10],[2022,15]], columns=['year','temp'])
# Round each year down to the start of its decade, e.g. 2017 -> 2010
df['year'] = df['year'] - df['year'] % 10
# Average the temperature within each decade
df_decade = df.groupby('year').mean().reset_index()
ax = sns.barplot(x="year", y="temp", data=df_decade)
plt.show()

How to create a State vs. Death Bar Graph

File: https://docs.google.com/spreadsheets/d/1JNrPnC2YRg78ceblt1eeBN_Iz6rG2psE/edit?usp=sharing&ouid=105308566456636539364&rtpof=true&sd=true
I am looking to create a bar graph comparing states by COVID-19 deaths (data is attached). I have already filtered out the states and dates I want using the following code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import datetime
datetime.datetime.strptime
df = pd.read_excel("Project.xlsx")
start = datetime.date(2020,10,31).strftime('%Y%m%d')
end = datetime.date(2020,12,1).strftime('%Y%m%d')
dfnew=df.query(f"{start} < date < {end}")
dfnew = dfnew.fillna('0')
dflatest = dfnew[(dfnew['state']=='GA')|(dfnew['state']=='IL')|(dfnew['state']=='CA')|(dfnew['state']=='NY')|
(dfnew['state']=='NC')|(dfnew['state']=='MI')|(dfnew['state']=='OH')|(dfnew['state']=='FL')|
(dfnew['state']=='PA')|(dfnew['state']=='TX')]
dflatest
However, I am looking to get the average deaths (adding up deaths per day) in the month of November by state, and to create a bar graph with X: state and Y: average deaths in November. I am not sure how to write this code, and any help would be appreciated.
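One possible way to approach this, as a rough sketch only: group the filtered rows by state and take the mean of the deaths column. The spreadsheet's columns are not shown in the post, so the column name death, the integer yyyymmdd dates, and the reading of that column as a daily count are all assumptions here. Note also that fillna('0') inserts the string '0', which can turn a numeric column into text; the sketch fills with the number 0 instead.

```python
import pandas as pd

# Made-up stand-in for Project.xlsx. The column name "death" and the
# integer yyyymmdd dates are assumptions; adjust them to the real sheet.
df = pd.DataFrame({
    "date":  [20201101, 20201102, 20201101, 20201102, 20201015, 20201101],
    "state": ["GA",     "GA",     "NY",     "NY",     "GA",     "WY"],
    "death": [10.0,     20.0,     40.0,     None,     99.0,     5.0],
})

states = ["GA", "IL", "CA", "NY", "NC", "MI", "OH", "FL", "PA", "TX"]

# Keep November 2020 rows for the chosen states; isin() replaces the long
# chain of == comparisons joined with |.
in_november = df["date"].between(20201101, 20201130)
november = df[in_november & df["state"].isin(states)]

# Fill gaps with the number 0 (not the string '0') so the column stays
# numeric, then average the daily values per state.
avg_deaths = november.fillna({"death": 0}).groupby("state")["death"].mean()
print(avg_deaths)

# One bar per state, with the state codes on the x axis.
ax = avg_deaths.plot.bar()
ax.set_xlabel("State")
ax.set_ylabel("Average deaths in November")
```

If the real column holds cumulative totals rather than daily counts, the mean would need to be taken over day-to-day differences instead.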

How to omit only weekends from my data frame?

I am working on a project to create an algorithmic trader. However, I want to remove the weekends from my data frame, as they ruin the data. I have tried some things I found on Stack Overflow, but I get an error that the type is Timestamp, so I can't use that technique. The date also isn't a column in the data frame. I'm new to Python, so I'm not very sure, but I think it's the index, since when I look at .index it shows me the date and time. I'm sorry if these are stupid questions, but I am new to Python and pandas.
Here is my code:
#import all the libraries
import nsetools as ns
import pandas as pd
import numpy
import matplotlib.pyplot as plt
from datetime import datetime
import yfinance as yf
plt.style.use('fivethirtyeight')
a = input("Enter the ticker name you wish to apply strategy to")
ticker = yf.Ticker(a)
hist = ticker.history(period="1mo", interval="15m")
print(hist)
plt.figure(figsize=(12.5, 4.5))
plt.plot(hist['Close'], label=a)
plt.title('close price history')
plt.xlabel("13 Nov 2020 to 13 Dec 2020")
plt.ylabel("Close price")
plt.legend(loc='upper left')
plt.show()
EDIT: On the suggestion of a user, I tried to modify my code to this:
plt.style.use('fivethirtyeight')
a = input("Enter the ticker name you wish to apply strategy to")
ticker = yf.Ticker(a)
hist = ticker.history(period="1mo", interval="15m")
refinedlist = hist[hist.index.dayofweek<5]
print (refinedlist)
And graphed that, but the graph still includes the weekends on the x axis.
To begin with, there is no stock market data for weekends and public holidays, because the market is closed. And because your data is sampled by time (15-minute intervals), there is also no data from the time the market closes to the time it opens the next day.
For example, I graphed the first 50 results. (The x-axis doesn't seem to be correct.)
plt.plot(hist['Close'][:50], label=a)
As one example, if you reindex the data to include weekends and public holidays, and draw the graph with missing values for the times when the market is not open, you get the following.
new_idx = pd.date_range(hist.index[0], hist.index[-1], freq='15min')
hist = hist.reindex(new_idx, fill_value=np.nan)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime
import yfinance as yf
# plt.style.use('fivethirtyeight')
# a = input("Enter the ticker name you wish to apply strategy to")
a = 'AAPL'
ticker = yf.Ticker(a)
hist = ticker.history(period="1mo", interval="15m")
new_idx = pd.date_range(hist.index[0], hist.index[-1], freq='15min')
hist = hist.reindex(new_idx, fill_value=np.nan)
plt.figure(figsize=(12.5, 4.5))
plt.plot(hist['Close'], label=a)
plt.title('close price history')
plt.xlabel("13 Nov 2020 to 13 Dec 2020")
plt.ylabel("Close price")
plt.legend(loc='upper left')
plt.show()
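If the goal is a line with no stretches across closed hours, a different technique from the reindexing above is to plot against row positions instead of timestamps and use the timestamps only as tick labels. Filtering with dayofweek < 5 removes rows, but a datetime x axis is still spaced by real time, so the gap stays visible. A rough sketch on synthetic data (the prices and the date range are made up so that it runs without yfinance):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic 15-minute "prices" from Friday 13:00 to Monday 12:00, standing
# in for the yfinance history so the sketch runs offline.
idx = pd.date_range("2020-11-13 13:00", "2020-11-16 12:00", freq="15min")
hist = pd.DataFrame({"Close": np.linspace(100, 110, len(idx))}, index=idx)

# Drop Saturday (5) and Sunday (6) rows, as in the question's edit.
weekdays = hist[hist.index.dayofweek < 5]

# Plot against 0..n-1 instead of the timestamps, so matplotlib has no
# calendar gap to stretch the line across.
positions = np.arange(len(weekdays))
fig, ax = plt.subplots(figsize=(12.5, 4.5))
ax.plot(positions, weekdays["Close"].to_numpy())

# Label a handful of evenly spaced ticks with the real timestamps.
ticks = positions[:: max(1, len(positions) // 6)]
ax.set_xticks(ticks)
ax.set_xticklabels(weekdays.index[ticks].strftime("%d %b %H:%M"), rotation=30)
ax.set_ylabel("Close price")
```

The trade-off is that the x axis is no longer proportional to time: Friday's last point sits directly next to Monday's first one.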

How do I create a Line graph with my Data?

I have a CSV file which contains two columns. First column contains a date in the format 01/01/1969 and second column has an average house price for that month. The data I have ranges from 01/04/1969 to the same date in 2019 for a total of 613 entries in the dataframe. I want to create a line graph which represents the average house price per year. So far I have this.
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('ScottishAveragePrices.csv')
df.groupby(['Date']).mean().sort_values('AveragePrice')
The output is :
             AveragePrice
Date
01/04/1968    2844.980688
01/05/1968    2844.980688
01/06/1968    2844.980688
01/10/1968    2921.049691
01/11/1968    2921.049691
...                   ...
01/04/2019  150825.247700
01/09/2018  151465.715100
01/10/2018  151499.207500
01/07/2018  151874.694900
01/08/2018  152279.438800

[613 rows x 1 columns]
I'm just not sure how to transfer this data into a line graph. Sorry if the formatting of this post is wrong, I'm very new to the forum.
Thanks
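Assign the grouped result to a new DataFrame and then plot it with matplotlib. Parse the dates first and sort by date rather than by price, so the line runs in chronological order:
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')  # dates are day-first, e.g. 01/04/1969
df_2 = df.groupby(['Date']).mean().sort_index()
df_2.plot(y="AveragePrice")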
If you are working in a Jupyter notebook, make sure you also have the matplotlib magic command in your code so the plot renders inline:
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('ScottishAveragePrices.csv')
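# Parse the day-first dates so they sort chronologically rather than as strings
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
# groupby moves Date into the index, so plot against df.index, not df['Date']
df = df.groupby(['Date']).mean().sort_index()
plt.plot(df.index, df['AveragePrice'])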
plt.show()
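Since the question asks for the average price per year, one possible approach is to parse the dates, group by the year part, and plot the yearly means. A minimal sketch on made-up rows (the Date and AveragePrice column names come from the question's output; the values are invented):

```python
import io
import pandas as pd

# Made-up rows in the question's layout: day-first dates, one price per month.
csv_text = """Date,AveragePrice
01/11/1968,2900.0
01/12/1968,3100.0
01/01/1969,3200.0
01/02/1969,3400.0
01/03/1969,3600.0
"""
df = pd.read_csv(io.StringIO(csv_text))

# Parse the strings as real dates; the format is day/month/year.
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")

# Group by the year of each date and average the monthly prices in it.
yearly = df.groupby(df["Date"].dt.year)["AveragePrice"].mean()
print(yearly)

# One point per year, joined by a line.
ax = yearly.plot(kind="line", marker="o")
ax.set_xlabel("Year")
ax.set_ylabel("Average price")
```

With the real file, replace the io.StringIO(csv_text) argument with 'ScottishAveragePrices.csv' and call plt.show() (from matplotlib.pyplot) when running outside a notebook.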
