How to create a State vs. Death Bar Graph - python

File: https://docs.google.com/spreadsheets/d/1JNrPnC2YRg78ceblt1eeBN_Iz6rG2psE/edit?usp=sharing&ouid=105308566456636539364&rtpof=true&sd=true
I am looking to create a bar graph comparing state vs. death COVID-19 data (the data is attached). I have already filtered out the states and dates I want using the following code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import datetime
df = pd.read_excel("Project.xlsx")
start = datetime.date(2020,10,31).strftime('%Y%m%d')
end = datetime.date(2020,12,1).strftime('%Y%m%d')
dfnew=df.query(f"{start} < date < {end}")
dfnew = dfnew.fillna(0)  # numeric 0 (not the string '0') so numeric columns stay numeric
states = ['GA', 'IL', 'CA', 'NY', 'NC', 'MI', 'OH', 'FL', 'PA', 'TX']
dflatest = dfnew[dfnew['state'].isin(states)]
dflatest
However, I am looking to get the average deaths (add up deaths per day) in the month of November by state, and create a bar graph with X: State and Y: Average deaths in the month of November. I am not sure how to write this code, and any help would be appreciated.
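One possible approach, as a minimal sketch: group the filtered rows by state, take the mean of the daily death column, and plot the resulting Series as bars. The column name deathIncrease (new deaths per day) is an assumption about the spreadsheet, and the small sample frame below is made-up data standing in for dflatest.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in for dflatest; 'deathIncrease' is an assumed column name
sample = pd.DataFrame({
    "date": [20201101, 20201102, 20201101, 20201102],
    "state": ["GA", "GA", "NY", "NY"],
    "deathIncrease": [10, 20, 30, 50],
})

# Mean of the daily values per state: one number per state
avg_deaths = sample.groupby("state")["deathIncrease"].mean()

# Bar graph with the states on the x axis and the averages on the y axis
ax = avg_deaths.plot(kind="bar")
ax.set_xlabel("State")
ax.set_ylabel("Average daily deaths (November 2020)")
ax.figure.tight_layout()
# Call plt.show() or ax.figure.savefig(...) to view the result
```

With the real data, replace sample with dflatest and the column name with whichever column holds the per-day deaths.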

Related

How to omit only weekends from my data frame?

I am working on a project to create an algorithmic trader. However, I want to remove the weekends from my data frame because they ruin the data. I have tried some things I found on Stack Overflow, but I get an error that the type is Timestamp, so I can't use that technique. The date also isn't a column in the data frame. I'm new to Python so I'm not very sure, but I think it's the index, since .index shows me the date and time. I'm sorry if these are stupid questions, but I am new to Python and pandas.
Here is my code:
#import all the libraries
import nsetools as ns
import pandas as pd
import numpy
import matplotlib.pyplot as plt
from datetime import datetime
import yfinance as yf
plt.style.use('fivethirtyeight')
a = input("Enter the ticker name you wish to apply strategy to")
ticker = yf.Ticker(a)
hist = ticker.history(period="1mo", interval="15m")
print(hist)
plt.figure(figsize=(12.5, 4.5))
plt.plot(hist['Close'], label=a)
plt.title('close price history')
plt.xlabel("13 Nov 2020 to 13 Dec 2020")
plt.ylabel("Close price")
plt.legend(loc='upper left')
plt.show()
EDIT: On the suggestion of a user, I tried to modify my code to this
plt.style.use('fivethirtyeight')
a = input("Enter the ticker name you wish to apply strategy to")
ticker = yf.Ticker(a)
hist = ticker.history(period="1mo", interval="15m")
refinedlist = hist[hist.index.dayofweek<5]
print (refinedlist)
And graphed that, but the graph still includes the weekends on the x axis.
In the first place, there is no stock market data for weekends and public holidays, because the market is closed on those days. In addition, your data is sampled in 15-minute intervals, so there is also no data between the time the market closes and the time it opens the next day.
For example, I graphed the first 50 results. (The x-axis doesn't seem to be correct.)
plt.plot(hist['Close'][:50], label=a)
As one example, if you include weekends and public holidays and draw the graph with missing values for the times when the market is not open, you get the following.
new_idx = pd.date_range(hist.index[0], hist.index[-1], freq='15min')
hist = hist.reindex(new_idx, fill_value=np.nan)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime
import yfinance as yf
# plt.style.use('fivethirtyeight')
# a = input("Enter the ticker name you wish to apply strategy to")
a = 'AAPL'
ticker = yf.Ticker(a)
hist = ticker.history(period="1mo", interval="15m")
new_idx = pd.date_range(hist.index[0], hist.index[-1], freq='15min')
hist = hist.reindex(new_idx, fill_value=np.nan)
plt.figure(figsize=(12.5, 4.5))
plt.plot(hist['Close'], label=a)
plt.title('close price history')
plt.xlabel("13 Nov 2020 to 13 Dec 2020")
plt.ylabel("Close price")
plt.legend(loc='upper left')
plt.show()
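If the goal is a graph with no gaps at all, one common workaround is to plot against the row position instead of the timestamp and use the dates only as tick labels. The sketch below uses synthetic prices rather than yfinance so that it runs offline; the 09:30 to 16:00 session and the value of bars_per_day are assumptions that apply to this synthetic data only.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic 15-minute closes standing in for ticker.history(...)["Close"]
full = pd.date_range("2020-11-13 09:30", "2020-11-20 16:00", freq="15min")
prices = pd.Series(np.linspace(100.0, 110.0, len(full)), index=full)

# Keep weekdays and (assumed) trading hours only
close = prices[prices.index.dayofweek < 5].between_time("09:30", "16:00")

# Plot against 0, 1, 2, ... so closed periods take up no space on the x axis
positions = np.arange(len(close))
fig, ax = plt.subplots(figsize=(12.5, 4.5))
ax.plot(positions, close.to_numpy(), label="synthetic")

# One tick per trading day, labelled with the date of that day's first bar
bars_per_day = 27  # 09:30 to 16:00 inclusive in 15-minute steps (synthetic data)
tick_pos = positions[::bars_per_day]
ax.set_xticks(tick_pos)
ax.set_xticklabels(close.index[tick_pos].strftime("%d %b"))
ax.set_ylabel("Close price")
ax.legend(loc="upper left")
# Call plt.show() to display the figure
```

The trade-off is that the x axis is no longer a true time axis, so the spacing between points says nothing about elapsed time.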

Error ValueError: day is out of range for month

I was looking at a sample table of student information and wanted to see what days were the most popular for students to enroll on a course. The script worked fine the first day I ran it and I left it. A few days later I returned to take another look at it but started getting the ValueError message.
Why did it stop working? No new information was added to the dataset since it worked the first time. The code now fails at:
df["EnrolmentDate"] = pd.to_datetime(df.EnrolmentDate)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
colors = sns.cubehelix_palette(28, rot=-0.4)
df = pd.read_csv("data.csv")
#print(df.dtypes)
# To change the format of the data type from object to datetime. Has to be run at start of script or the format returns to object.
#df['Enrolment Date'] = pd.to_datetime(df['Enrolment Date'])
print(df.EnrolmentDate.str.slice(0, 10))
df["EnrolmentDate"] = pd.to_datetime(df.EnrolmentDate)
print(df.head())
print(df.dtypes)
#Tells us what day of the week the enrolment date was. Can also use .dayofyear. Google Pandas API Reference, search for .dt., datetime properties
print(df.EnrolmentDate.dt.day_name())  # .dt.weekday_name was removed in pandas 1.0
#Shows the latest or greatest enrolment date
print(df.EnrolmentDate.max())
print(df.EnrolmentDate.min())
print(df.EnrolmentDate.max()-df.EnrolmentDate.min())
df["EnrolmentDay"] = df.EnrolmentDate.dt.day_name()  # .dt.weekday_name was removed in pandas 1.0
print(df.head())
print(df.EnrolmentDay.value_counts())
print(df.EnrolmentDay.value_counts().plot())
#print(df.Day.value_counts().sort_index())
#df.EnrolmentDay.value_counts().sort_index().plot()
# naming the x axis
plt.xlabel('Day')
# naming the y axis
plt.ylabel('No. of Enrolments')
plt.show()
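Since the data is not shown, the cause can only be guessed. One way to narrow it down is to parse with an explicit format and errors="coerce", then look at the rows that could not be parsed. The sketch below assumes day-first strings such as 13/01/2019 and uses made-up values; adjust the format string to whatever data.csv actually contains.

```python
import pandas as pd

# Made-up enrolment dates; the second one is not a real calendar date
raw = pd.Series(["13/01/2019", "31/02/2019", "05/03/2019"])

# An explicit format avoids guessing between day-first and month-first;
# errors="coerce" turns unparseable values into NaT instead of raising
parsed = pd.to_datetime(raw, format="%d/%m/%Y", errors="coerce")

# The original strings that failed to parse
bad_rows = raw[parsed.isna()]
print(bad_rows)
```

If bad_rows is empty with the right format string, the original error probably came from the parser guessing the day and month order rather than from the data itself.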

How do I create a Line graph with my Data?

I have a CSV file which contains two columns. The first column contains a date in the format 01/01/1969 and the second column has the average house price for that month. The data I have ranges from 01/04/1968 to the same date in 2019, for a total of 613 entries in the dataframe. I want to create a line graph which represents the average house price per year. So far I have this.
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('ScottishAveragePrices.csv')
df.groupby(['Date']).mean().sort_values('AveragePrice')
The output is :
AveragePrice
Date
01/04/1968 2844.980688
01/05/1968 2844.980688
01/06/1968 2844.980688
01/10/1968 2921.049691
01/11/1968 2921.049691
...
01/04/2019 150825.247700
01/09/2018 151465.715100
01/10/2018 151499.207500
01/07/2018 151874.694900
01/08/2018 152279.438800
[613 rows x 1 columns]
I'm just not sure how to transfer this data into a line graph. Sorry if the formatting of this post is wrong, I'm very new to the forum.
Thanks
Name the df and then plot it with matplotlib:
df_2 = df.groupby(['Date']).mean().sort_values('AveragePrice')
df_2.plot(y="AveragePrice")
If you are working in a Jupyter notebook, make sure you also have the matplotlib magic function in your code:
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('ScottishAveragePrices.csv')
df = df.groupby(['Date']).mean().sort_values('AveragePrice')
plt.plot(df.index, df['AveragePrice'])  # after groupby, 'Date' is the index, not a column
plt.show()
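Neither snippet above averages per year, which is what the question asks for. A minimal sketch of that step, assuming the dates are day-first (dd/mm/yyyy, which the sample output suggests) and using a few made-up rows instead of ScottishAveragePrices.csv:

```python
import io
import pandas as pd
import matplotlib.pyplot as plt

# Made-up rows in the same layout as the CSV described in the question
csv_text = """Date,AveragePrice
01/04/1968,2844.98
01/05/1968,2844.98
01/01/1969,3000.00
01/02/1969,3100.00
"""
df = pd.read_csv(io.StringIO(csv_text))

# Parse the strings as real dates (assumed day-first format)
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")

# Average of the monthly prices within each calendar year
yearly = df.groupby(df["Date"].dt.year)["AveragePrice"].mean()

ax = yearly.plot(kind="line")
ax.set_xlabel("Year")
ax.set_ylabel("Average price")
# Call plt.show() to display the figure
```

Parsing the dates first matters: grouping on the raw strings gives one group per month, and sorting by price scrambles the time order of the line.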

How to make overlay plots of a variable, but every plot that I want to make has a different length of data

I want to overlay 30 plots, each showing the temperature of one day, to compare how the temperature develops and how much it differs from one day to another. The problem is that when I separate the data into the 30 days in pandas, every day's data set has a different length. For example, the first day has 54977 temperature values, the second day has 54988, and the third differs as well. In short, I want to overlay 30 plots so that the x axis of the resulting graph uses the time ticks of the first day and the other 29 plots just match those ticks, trimming the data so that all plots start at one point and finish at another. It doesn't matter if some hours or data get lost; I just want to make something like the last image.
The code so far is this. I'm not very good at Python, so don't judge my long code.
import pandas as pd
from datetime import date
import datetime as dt
import calendar
import numpy as np
import pylab as plt
import matplotlib.ticker as ticker
import seaborn as sns
datos = pd.read_csv("Jun2018T.txt", sep = ',', names=('Fecha', 'Hora', 'RADNETA', 'RADCORENT', 'RADCORSAL', 'RADINFENT', 'RADINFSAL', 'TEMP'))
datos['Hora'] = datos['Hora'].str[:9]
datos['Hora']
Dia01Jun2018 = datos[datos['Fecha'] == "2018-06-01"]
tiempo01=Dia01Jun2018['Hora']
temp01=Dia01Jun2018['TEMP']
imagen = plt.figure(figsize=(25,10))
plt.plot(tiempo01,temp01)
plt.xticks(np.arange(0, 54977, 7000)) #the number 54977 is the last data that the first day has, the second day has a different length an so on with the rest of the days
plt.xlabel("Tiempo (H:M:S)(Formato 24 Horas)")
plt.ylabel("Temperatura (K)")
plt.title("Día 01 Jun 2018")
plt.show()
imagen.savefig('D1JUN2018')
The code above is repeated for every day. A loop would probably be quicker, but I don't handle Python very well.
The result of this code is the following graph:
(image not available)
The graph that I want is this:
(image not available)
My data looks like this:
(image not available)
and these are the column formats:
(image not available)
If I understood your question right and you want to plot all days in a single plot, you have to generate one figure, call plt.plot() for every day, and only then call plt.show() to display the image with all the plots. Try something like the code shown below.
(As I don't know your data, I don't know if this code will work, but the concept should be clear at least.)
import pandas as pd
from datetime import date
import datetime as dt
import calendar
import numpy as np
import pylab as plt
import matplotlib.ticker as ticker
import seaborn as sns
datos = pd.read_csv("Jun2018T.txt", sep = ',', names=('Fecha', 'Hora', 'RADNETA', 'RADCORENT', 'RADCORSAL', 'RADINFENT', 'RADINFSAL', 'TEMP'))
datos['Hora'] = datos['Hora'].str[:9]
# one figure for all days
imagen = plt.figure(figsize=(25,10))
for day in range(1,31):
    # select the rows of one day, e.g. "2018-06-01"
    dia = datos[datos['Fecha'] == "2018-06-"+(f"{day:02d}")]
    # the column is named 'Hora' (column names are case-sensitive)
    tiempo= pd.to_datetime(dia['Hora'], format='%H:%M:%S').dt.time
    temp= dia['TEMP']
    plt.plot(tiempo, temp)
#plt.xticks(np.arange(0, 54977, 7000))
plt.xlabel("Tiempo (H:M:S)(Formato 24 Horas)")
plt.ylabel("Temperatura (K)")
plt.title("Jun 2018")
plt.show()
imagen.savefig('JUN2018')
For the second part of your question:
As your data is stored with a timestamp, you can convert it to pandas time objects. When you use them for plotting, the x-axis should no longer have an offset. I've modified the tiempo = ... assignment in the code above accordingly.
The x-ticks should now automatically be in time mode.
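An alternative sketch of the same idea, in case plotting time objects directly gives trouble: convert each timestamp to hours since midnight, so every day shares one numeric x axis regardless of how many samples it has. The data below is synthetic (two days with 50 and 60 samples), and make_day is a hypothetical helper used only to build it; the column names follow the question.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def make_day(fecha, n):
    """Build one synthetic day with n samples (hypothetical helper)."""
    secs = np.linspace(0, 86399, n).astype(int)
    hora = pd.to_datetime(secs, unit="s").strftime("%H:%M:%S").tolist()
    temp = 290 + 5 * np.sin(secs / 86400 * 2 * np.pi)
    return pd.DataFrame({"Fecha": fecha, "Hora": hora, "TEMP": temp})

# Two days with different numbers of samples, like the real data
datos = pd.concat(
    [make_day("2018-06-01", 50), make_day("2018-06-02", 60)],
    ignore_index=True,
)

# Hours since midnight: a common x coordinate for every day
datos["hours"] = pd.to_timedelta(datos["Hora"]).dt.total_seconds() / 3600

fig, ax = plt.subplots(figsize=(25, 10))
for fecha, dia in datos.groupby("Fecha"):
    ax.plot(dia["hours"], dia["TEMP"], label=fecha)
ax.set_xlim(0, 24)
ax.set_xlabel("Tiempo (horas desde medianoche)")
ax.set_ylabel("Temperatura (K)")
ax.legend()
# Call plt.show() to display the figure
```

Because each point is placed by its own time of day, the differing lengths no longer matter and nothing has to be trimmed.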

Trying to create a stacked barchart from dataframe data

I am trying to create a stacked bar-chart showing total marriages by months for each year between 2008 and 2015.
import pandas as pd
import numpy as np
import io
import requests
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
url = "https://data.code4sa.org/api/views/r4bb-fvka/rows.csv"
file=requests.get(url).content
c=pd.read_csv(io.StringIO(file.decode('utf-8')))
Here I am adding the total number of marriages for each year then grouping by both Marriage Year and month to have the total number of marriages for each month
c['Total'] = c['MarriageYear']
months = c.groupby(['MarriageYear','MarriageMonth'])['Total'].count()
I think the index should be both Marriage Year and Marriage Month since I want the total of marriages for each month in every year???
months.set_index(['MarriageYear','MarriageMonth'])\
.reindex(months.set_index('MarriageMonth').sum().sort_values().index, axis=1)\
.T.plot(kind='bar', stacked=True,
colormap=ListedColormap(sns.color_palette("GnBu", 10)),
figsize=(24,28))
If you do post any potential solutions or what I should look at again, please explain why/where I went wrong and how I should be approaching this
Try this:
c.groupby(['MarriageYear', 'MarriageMonth']).size() \
.unstack().plot.bar(stacked=True, colormap='GnBu', figsize=(12, 14))
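As for where the original attempt went wrong: groupby(...)['Total'].count() returns a Series whose index is already (MarriageYear, MarriageMonth), so there are no columns left to pass to set_index, and a Series has no set_index method anyway. What the plot needs is a table with one row per year and one column per month, which is what unstack produces. A small sketch with made-up rows (the real data comes from the URL in the question):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up rows: one row per marriage, as in the real dataset
c = pd.DataFrame({
    "MarriageYear": [2008, 2008, 2008, 2009, 2009],
    "MarriageMonth": [1, 1, 2, 1, 2],
})

# Number of rows per (year, month): a Series with a two-level index
counts = c.groupby(["MarriageYear", "MarriageMonth"]).size()

# Move the month level into the columns: rows are years, columns are months
table = counts.unstack(fill_value=0)
print(table)

# Each row becomes one bar, each column one stacked segment
ax = table.plot.bar(stacked=True, colormap="GnBu")
ax.set_ylabel("Marriages")
# Call plt.show() to display the figure
```

If the months should be on the x axis with the years stacked, use counts.unstack(level=0) instead.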
