Plot bar graph using multiple groupby count in pandas - python

I am trying to plot a bar graph using pandas. DateTime is the index column, which I derive from a timestamp. Here is the table structure:
So far I have written this:
import sqlite3
from pylab import *
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import datetime as dt
conn = sqlite3.connect('DEMO2.sqlite')
df = pd.read_sql("SELECT * FROM Data", conn)
df['DateTime'] = df['DATE'].apply(lambda x: dt.date.fromtimestamp(x))
df1 = df.set_index('DateTime', drop=False)
grouped= df1['DateTime'].groupby(lambda x: x.month)
#df1.groupby([df1.index.month, 'DateTime']).count()
grouped.count()
I want output like this:
June has 4 entries in total and one entry starts with u, so X is 4 and Y is 1. Same for July.
I also want to plot a bar graph of the X and Y entries from this output, that is, a month vs. values bar graph.

I would create the DataFrame with a dict:
result = pd.DataFrame({'X': g.count(),
'Y': g.apply(lambda x: x.str.startswith('u').sum())})
Now you can use the plot method to plot months vs values as a bar graph.
result.plot(kind='bar')
Note: you can create grouped more efficiently:
grouped = df1['DateTime'].groupby(df1['DateTime'].dt.to_period('M'))
grouped = df1['DateTime'].groupby(df1['DateTime'].dt.month) # if you want Jan-2015 == Jan-2014
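A minimal end-to-end sketch of the approach above, under a few assumptions: the table from the question is not shown, so the data below is invented, and the text column is given the hypothetical name NAME. Note that the .dt accessor only works on a datetime64 column, so the sketch converts the timestamps with pd.to_datetime rather than building dt.date objects as the question does.

```python
import pandas as pd

# Hypothetical data standing in for the question's table: Unix timestamps
# (seconds) plus an invented text column called NAME.
df = pd.DataFrame({
    "DATE": [1433116800, 1433721600, 1434326400, 1434931200,   # June 2015
             1435708800, 1436313600, 1436918400, 1437523200],  # July 2015
    "NAME": ["uA", "b", "c", "d", "uE", "f", "g", "h"],
})

# Convert to datetime64 so that the .dt accessor is available.
df["DateTime"] = pd.to_datetime(df["DATE"], unit="s")

# Group the text column by calendar month (year and month together).
grouped = df["NAME"].groupby(df["DateTime"].dt.to_period("M"))

result = pd.DataFrame({
    "X": grouped.count(),                                       # all entries
    "Y": grouped.apply(lambda s: s.str.startswith("u").sum()),  # entries starting with "u"
})

# One group of bars per month, one bar each for X and Y.
ax = result.plot(kind="bar", rot=0)
```

In a script, a call to matplotlib.pyplot.show() afterwards displays the figure. Grouping by dt.to_period("M") keeps June 2014 and June 2015 apart; grouping by dt.month would merge them.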

Related

Plotly graph : show number of occurrences in bar-chart

I am trying to plot a bar chart from a given dataframe.
x-axis = dates
y-axis = number of occurrences for each month
The result should be a bar chart. Each x is an occurrence.
x
xx
x
2020-1
2020-2
2020-3
2020-4
2020-5
I tried the following but don't get the desired result shown above.
import datetime as dt
import pandas as pd
import numpy as np
import plotly.offline as pyo
import plotly.graph_objs as go
# initialize list of lists
data = [['a', '2022-01-05'], ['a', '2022-02-14'], ['a', '2022-02-15'],['a', '2022-05-14']]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Name', 'Date'])
# print dataframe.
df['Date']=pd.to_datetime(df['Date'])
# plot dataframe
trace1=go.Bar(
#
x = df.Date.dt.month,
y = df.Name.groupby(df.Date.dt.month).count()
)
data=[trace1]
fig=go.Figure(data=data)
pyo.plot(fig)
Remove the last line and write instead:
fig.show()
Edit:
It's unclear to me whether you have 1-dimensional or 2-dimensional data here. Supposing you have 1d data, that is, just a bunch of dates that you want to aggregate in a bar chart, simply do this:
import pandas as pd
import plotly.express as px
# initialize list of dates
data = ['2022-01-05', '2022-02-14', '2022-02-15', '2022-05-14']
# Create the pandas DataFrame
df = pd.DataFrame(data)
# plot dataframe
fig = px.bar(df)
If, instead, you have 2d data then what you want is a scatter plot, not a bar chart.
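For the bar chart in the question, it may help to compute the monthly counts first. In the question's trace, x holds one month number per row (four values) while y holds one count per month (three values), so the two do not line up. The sketch below is one possible way to build matching x and y lists with pandas only; using resample so that empty months show up as zero is my own choice, not something the question requires.

```python
import pandas as pd

# Same sample data as in the question.
data = [["a", "2022-01-05"], ["a", "2022-02-14"],
        ["a", "2022-02-15"], ["a", "2022-05-14"]]
df = pd.DataFrame(data, columns=["Name", "Date"])
df["Date"] = pd.to_datetime(df["Date"])

# One row per calendar month ("MS" = month start), including months
# that have no occurrences at all.
counts = df.set_index("Date").resample("MS")["Name"].count()

# Matching x and y lists, one entry per month.
x = counts.index.strftime("%Y-%m").tolist()
y = counts.tolist()
```

These lists could then be passed as go.Bar(x=x, y=y) in place of the original arguments.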

how to change xy axis with matplotlib in python

import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
corona_data = pd.read_csv("서울시 코로나19 확진자 현황 csv.csv", encoding="cp949")
confirmed_dates = corona_data["확진일"]
confirmed_date = [datetime.strptime(date, "%Y-%m-%d") for date in confirmed_dates]
corona_data["확진일"]= confirmed_date
plt.rc('font', family='Malgun Gothic')
corona_data["확진일"].plot(title="확진일 별 확진자 추이")
plt.show()
This plot shows plain numbers on the x-axis and dates on the y-axis, but I want dates on the x-axis and numbers on the y-axis. How can I solve it?
If your data is in a dataframe, I recommend using Seaborn to visualize it. It has a great API that allows you to plot elements of your dataframe by referencing column names. Here is a toy example:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
# Load data
df = pd.read_csv(...)
# Plot scatter plot
sns.scatterplot(x='col_1', y='col_2', data=df)
plt.show()
Check out the Seaborn documentation for more.
The problem seems to be that your dataframe contains only one data series, the dates. You could add a column that contains the row numbers and then choose what goes on the x and y axes by passing the column names to the plot function:
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
corona_data = pd.read_csv("서울시 코로나19 확진자 현황 csv.csv", encoding="cp949")
# "확진일" is the confirmation-date column
confirmed_dates = corona_data["확진일"]
confirmed_date = [datetime.strptime(date, "%Y-%m-%d") for date in confirmed_dates]
corona_data["확진일"]= confirmed_date
# now add the numbers to the dataset
corona_data["numbers"]=list(range(len(confirmed_dates)))
plt.rc('font', family='Malgun Gothic')
# and tell the plot function that you want "확진일" as x and "numbers" as y axis
corona_data.plot("확진일","numbers",title="확진일 별 확진자 추이")
plt.show()
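If the goal is the number of confirmed cases per date (my reading of the plot title, which is an assumption), counting the dates first may be closer to what is wanted. A Series plots its index on the x-axis and its values on the y-axis, which is why the question's plot showed row numbers on x and dates on y. The sketch below uses invented data and a hypothetical column name confirmed_date in place of the CSV.

```python
import pandas as pd

# Hypothetical stand-in for the CSV: one row per confirmed case.
corona_data = pd.DataFrame({
    "confirmed_date": ["2020-03-01", "2020-03-01", "2020-03-02",
                       "2020-03-04", "2020-03-04", "2020-03-04"],
})
corona_data["confirmed_date"] = pd.to_datetime(corona_data["confirmed_date"])

# Count the cases on each date and order the result chronologically.
# The dates become the index, the counts become the values.
cases_per_day = corona_data["confirmed_date"].value_counts().sort_index()

# Index (dates) goes on the x-axis, values (counts) on the y-axis.
ax = cases_per_day.plot(title="Confirmed cases per date")
```

Calling plt.show() from matplotlib.pyplot afterwards displays the figure in a script.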

Create a graph of a pivot_table in Python

I created a pivot table and I want to create a bar graph from it. This is my pivot_table:
I don't know how to extract the values of the column 1970 and use them to make a bar graph.
Thanks!!
Just convert the dataframe column names to str; then you can select the data for the year 1970 with df['1970'] and use the pandas built-in plot.bar method to make a bar plot. Try this:
import pandas as pd
import matplotlib.pyplot as plt
#converting column names to string
df.columns = df.columns.astype(str)
#plotting a bar plot
df['1970'].plot.bar()
plt.show()
Example based on @AlanDyke's DataFrame:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame([[1970,'a',1],
[1970,'b',2],
[1971,'a',2],
[1971,'b',3]],
columns=['year','location', 'value'])
df = pd.pivot_table(df, values='value', index='location', columns='year')
df.columns = df.columns.astype(str)
df['1970'].plot.bar()
plt.show()
You can use plt.bar and slice the dataframe:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame([[1970,'a',1],
[1970,'b',2],
[1971,'a',2],
[1971,'b',3]],
columns=['year','location', 'value'])
df = pd.pivot_table(df, values='value', index='location', columns='year')
# the locations (the index) go on the x-axis, the 1970 values set the bar heights
plt.bar(list(df.index), height=df[1970])
plt.show()

how to convert string to float while plot the date with Python matplotlib

I want to plot the close price (y-axis) against the date (x-axis) with Python, but the error says that I need to convert the date from string to float.
Here is the code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as dates
import datetime
from pandas import DataFrame, Series
df = pd.read_csv('C:/Users/Vicky/Desktop/pythontest/T1706dailyrecord.csv')
df.columns = [1,2,3,4,5]
print(df)
plt.plot(df[1], df[3])
I think you need the parse_dates parameter to convert the column to datetime in read_csv:
df = pd.read_csv('C:/Users/Vicky/Desktop/pythontest/T1706dailyrecord.csv', parse_dates=[0])
Or:
df=pd.read_csv('C:/Users/Vicky/Desktop/pythontest/T1706dailyrecord.csv',parse_dates=['Date'])
Also, df.columns = [1,2,3,4,5] is not necessary; to select columns, use df['Date'] and df['Close']:
plt.plot(df['Date'], df['Close'])
It is also possible to use DataFrame.plot:
df.plot(x='Date', y='Close')
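To see what parse_dates changes, here is a small self-contained sketch. The CSV contents are invented (the real file is not shown in the question), and the column names Date and Close are assumed, as in the answer above.

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for T1706dailyrecord.csv.
csv_text = """Date,Open,High,Low,Close
2017-01-03,10.0,10.5,9.8,10.2
2017-01-04,10.2,10.9,10.1,10.7
2017-01-05,10.7,10.8,10.3,10.4
"""

# Without parse_dates the Date column is read as plain text.
plain = pd.read_csv(io.StringIO(csv_text))

# With parse_dates the Date column is converted to datetime64 on load.
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

print(plain["Date"].dtype)
print(parsed["Date"].dtype)
```

With the parsed frame, plt.plot(parsed['Date'], parsed['Close']) receives real dates on the x-axis instead of strings.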

Sorting and conditional color formatting in matplotlib

To skip the context and get straight to the question, go down to "desired changes".
I wrote the helper function below to:
Fetch data
Calculate the YTD return
Plot the results in a bar plot
Here is the function:
def ytd_perf(symb, col_names, source = 'yahoo'):
    import datetime as datetime
    from datetime import date
    import pandas as pd
    import pandas_datareader.data as web
    import matplotlib.pyplot as plt
    import seaborn as sns
    %pylab inline
    #establish start and end dates
    start = date(date.today().year, 1, 1)
    end = datetime.date.today()
    #fetch data
    df = web.DataReader(symb, source, start = start, end = end)['Adj Close']
    #make sure column orders don't change (reindex replaces the removed reindex_axis)
    df = df.reindex(columns = symb)
    #rename the columns
    df.columns = col_names
    #calc returns from the first element (.iloc replaces the removed .ix)
    df = (df / df.iloc[0]) - 1
    #Plot the most recent line of data -- this represents the YTD return
    ax = df.iloc[-1].plot(kind = 'bar', title = ('YTD Performance as of '+ str(end)),figsize=(12,9))
    vals = ax.get_yticks()
    ax.set_yticklabels(['{:3.1f}%'.format(x*100) for x in vals])
So, when I run:
tickers = ['SPY', 'TLT']
names = ['Stocks', 'Bonds']
ytd_perf(tickers, names)
I get the following output:
Two desired changes that I can't quite get to work:
I would like to change the color of the bar such that if the value < 0, it is red.
Sort the bars from highest to lowest (which is already the case in this chart because there are only two series, but it doesn't work with many series).
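One possible way to get both changes, sketched on invented numbers rather than on the downloaded data: sort the Series of returns before plotting, then build one color per bar from the sign of each value. The series ytd below is a hypothetical stand-in for the last row of df inside ytd_perf, and blue for non-negative bars is an arbitrary choice.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical YTD returns standing in for the last row of df in ytd_perf.
ytd = pd.Series({"Stocks": 0.031, "Bonds": -0.012, "Gold": 0.054, "Oil": -0.087})

# Desired change 2: sort from highest to lowest before plotting.
ytd_sorted = ytd.sort_values(ascending=False)

# Desired change 1: one color per bar, red when the value is below zero.
colors = ["red" if value < 0 else "blue" for value in ytd_sorted]

fig, ax = plt.subplots(figsize=(12, 9))
ax.bar(ytd_sorted.index, ytd_sorted.values, color=colors)
ax.set_title("YTD Performance")
```

Inside ytd_perf, the same two steps could be applied to the last row of df before the existing plotting lines.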
