Wrong y-axis values when two datasets are plotted in Matplotlib

I am trying to plot two datasets (HIBOR and US_yield). I don't know why the values of HIBOR in the plot are wrong. However, the values are correct when I plot it alone.
import requests
import pandas as pd
import xmltodict
from datetime import datetime
import json
import matplotlib.pyplot as plt
year=2023
us_url=f'https://home.treasury.gov/resource-center/data-chart-center/interest-rates/pages/xml?data=daily_treasury_yield_curve&field_tdr_date_value={year}'
us_data=requests.get(us_url).content
hk_url=f'https://api.hkma.gov.hk/public/market-data-and-statistics/daily-monetary-statistics/daily-figures-interbank-liquidity'
hk_data=requests.get(hk_url).content
# Parse US data
dict_data_us=xmltodict.parse(us_data)
dict_us=dict()
for key in dict_data_us['feed']['entry'][0]['content']['m:properties'].keys():
    dict_us[key.replace('d:','')]=[i['content']['m:properties'][key]['#text'] for i in dict_data_us['feed']['entry']]
df_us=pd.DataFrame(dict_us)
df_us['Date']=[datetime.strptime(i, '%Y-%m-%dT%H:%M:%S') for i in df_us['NEW_DATE']]
df_us.set_index('Date', inplace=True)
# Parse HK data (HIBOR)
dict_data_hk=json.loads(hk_data)
dict_hk=dict()
for key in dict_data_hk['result']['records'][0].keys():
    dict_hk[key]=[i[key] for i in dict_data_hk['result']['records']]
df_hk=pd.DataFrame(dict_hk)
df_hk['Date']=[datetime.strptime(i, '%Y-%m-%d') for i in df_hk['end_of_date']]
df_hk.set_index('Date', inplace=True)
df_hk.sort_index(inplace=True)
plt.plot(df_hk['hibor_fixing_1m'][-40:], label='HIBOR_1M')
plt.plot(df_us['BC_1MONTH'][-40:], label='US_Yield_1M')
plt.legend()
plt.show()

The reason is that the y-axis values are stored as strings, so Matplotlib treats them as categories and places them in order of appearance instead of on a numeric scale. Convert them with astype(float) before plotting:
plt.plot(df_us['BC_1MONTH'][-40:].astype(float), label='US_Yield_1M')
plt.plot(df_hk['hibor_fixing_1m'][-40:].astype(float), label='HIBOR_1M')
With the values converted, both series are drawn on a common numeric y-axis.
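A minimal sketch of the same fix on made-up data (the column names and values below are illustrative, not the real HIBOR or Treasury figures). It uses pd.to_numeric with errors="coerce", an alternative to astype(float) that turns unparseable entries into NaN instead of raising:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Illustrative data only: both columns arrive as strings, as they typically
# do after parsing XML or JSON responses.
idx = pd.date_range("2023-01-02", periods=4, freq="D")
df = pd.DataFrame(
    {
        "hibor_1m": ["3.1", "3.25", "2.9", "3.0"],
        "us_1m": ["4.2", "4.1", "4.3", "4.25"],
    },
    index=idx,
)

# Convert every column to a numeric dtype; values that cannot be parsed become NaN.
numeric = df.apply(pd.to_numeric, errors="coerce")

fig, ax = plt.subplots()
ax.plot(numeric["hibor_1m"], label="HIBOR_1M")
ax.plot(numeric["us_1m"], label="US_Yield_1M")
ax.legend()
# plt.show()  # uncomment when running interactively
```

Checking df.dtypes before plotting is a quick way to spot this problem: string columns show up as object rather than float64.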

Related

matplotlib how do I reduce the amount of space between bars in a stacked bar chart when x-axis are dates 1-week apart?

import pandas as pd
from datetime import datetime
import matplotlib.pyplot as plt
import numpy as np
x=pd.date_range(end=datetime.today(),periods=150,freq='W').to_pydatetime().tolist()
x_1 = np.random.rand(150)
x_2 = np.random.rand(150)/2
fig = plt.figure(figsize=(10,6),dpi=100)
ax=fig.add_subplot(111)
ax.bar(x,x_1,label='x_1')
ax.bar(x,x_2,label='x_2',bottom=x_1)
plt.legend()
plt.show()
The above code produces this stacked bar chart.
[Image: stacked_chart1]
Because the x-axis values are dates one week apart, the gaps between the bars are very large.
I would like to change the chart so that the bars sit next to each other with no space, like the chart produced by this code:
x=np.arange(150)
x_1 = np.random.rand(150)
x_2 = np.random.rand(150)/2
fig = plt.figure(figsize=(10,6),dpi=100)
ax=fig.add_subplot(111)
ax.bar(x,x_1,label='x_1')
ax.bar(x,x_2,label='x_2',bottom=x_1)
plt.legend()
plt.show()
[Image: stacked_chart2]
Instead of numbers on the x-axis, I would still like to keep the dates from chart 1. Is there a way to do that? Thanks!

The reason for the difference is that matplotlib will try to simplify the x-axis when you pass a datetime, because usually you cannot fit every date in the x-ticks. It doesn't try this for int or string types, which is why your second sample looks normal.
However, I'm unable to figure out why the spacing is so odd in this particular example. I looked at this post to no avail.
In any case, there are other plotting modules that tend to handle dates a little more elegantly.
import pandas as pd
from datetime import datetime
import plotly.express as px
import numpy as np
x=pd.date_range(end=datetime.today(),periods=150,freq='W').tolist()
x_1 = np.random.rand(150)
x_2 = np.random.rand(150)/2
df = pd.DataFrame({
    'date':x,
    'x_1':x_1,
    'x_2':x_2}).melt(id_vars='date')
px.bar(df, x='date', y='value',color='variable')
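One likely cause of the gaps in the Matplotlib version, offered as an assumption rather than something the answer above confirms: ax.bar uses a default width of 0.8 in x-axis data units, and on a date axis one unit is one day, so bars placed 7 days apart cover only 0.8 days of each 7-day slot. A sketch that keeps the dates and passes a width of 7 days (the fixed end date and random seed are only there to make the example reproducible):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = pd.date_range(end="2023-12-31", periods=150, freq="W").to_pydatetime().tolist()
x_1 = rng.random(150)
x_2 = rng.random(150) / 2

fig, ax = plt.subplots(figsize=(10, 6), dpi=100)
# On a date axis the bar width is measured in days, so 7 fills a weekly slot.
ax.bar(x, x_1, width=7, label="x_1")
ax.bar(x, x_2, width=7, bottom=x_1, label="x_2")
ax.legend()
# plt.show()  # uncomment when running interactively
```

A slightly smaller width, such as 6, leaves a thin gap between neighbouring bars if touching bars are hard to tell apart.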

how to change xy axis with matplot in python

import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
corona_data = pd.read_csv("서울시 코로나19 확진자 현황 csv.csv", encoding="cp949")
confirmed_dates = corona_data["확진일"]
confirmed_date = [datetime.strptime(date, "%Y-%m-%d") for date in confirmed_dates]
corona_data["확진일"]= confirmed_date
plt.rc('font', family='Malgun Gothic')
corona_data["확진일"].plot(title="확진일 별 확진자 추이")
plt.show()
This plot shows plain numbers on the x-axis and dates on the y-axis, but I want dates on the x-axis and numbers on the y-axis. How can I do that?

If your data is in a dataframe, I recommend using Seaborn to visualize it. It has a great API that lets you plot elements of your dataframe by referencing column names. Here is a toy example:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
# Load data
df = pd.read_csv(...)
# Draw a scatter plot
sns.scatterplot(x='col_1', y='col_2', data=df)
plt.show()
Check out the Seaborn documentation for more.

The problem seems to be that your dataframe contains only one data series, the dates. You could add a column that contains the row numbers and then choose what goes on the x and y axes by passing the column names to the plot function:
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
corona_data = pd.read_csv("서울시 코로나19 확진자 현황 csv.csv", encoding="cp949")
confirmed_dates = corona_data["확진일"]  # "확진일" is the confirmation-date column
confirmed_date = [datetime.strptime(date, "%Y-%m-%d") for date in confirmed_dates]
corona_data["확진일"]= confirmed_date
# now add the row numbers to the dataset
corona_data["numbers"]=list(range(len(confirmed_dates)))
plt.rc('font', family='Malgun Gothic')
# and tell the plot function that you want "확진일" as the x axis and "numbers" as the y axis
corona_data.plot("확진일","numbers",title="확진일 별 확진자 추이")
plt.show()
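If the goal is the number of confirmed cases per date rather than a running row number, which is an assumption about what the question is after, the dates can be counted with value_counts. The sketch below uses a few made-up dates in place of the CSV, and a hypothetical English column name confirmed_date standing in for "확진일":

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up sample standing in for the CSV column of confirmation dates.
corona_data = pd.DataFrame(
    {
        "confirmed_date": [
            "2020-03-01", "2020-03-01", "2020-03-02",
            "2020-03-04", "2020-03-04", "2020-03-04",
        ]
    }
)
corona_data["confirmed_date"] = pd.to_datetime(corona_data["confirmed_date"])

# Count rows per date and order chronologically: the dates become the index
# (x axis) and the counts become the values (y axis).
cases_per_day = corona_data["confirmed_date"].value_counts().sort_index()

ax = cases_per_day.plot(title="Confirmed cases per day")
ax.set_xlabel("date")
ax.set_ylabel("cases")
# plt.show()  # uncomment when running interactively
```

Dates with no cases are simply absent from the result; reindexing against a full date range would be needed to show them as zeros.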

Plot Correlation Table imported from excel with Python

I am trying to plot a correlation matrix (already calculated) in Python. The table looks like this:
And I would like it to look like this:
I am using the following code:
import seaborn as sn
import matplotlib.pyplot as plt
import pandas as pd
data =pd.read_excel('/Desktop/wetchimp_global/corr/correlation_matrix.xlsx')
df = pd.DataFrame(data)
print (df)
corrMatrix = data.corr()
print (corrMatrix)
sn.heatmap(corrMatrix, annot=True)
plt.show()
Note that the matrix is already computed and I don't want to calculate the correlation again, but I have not managed to do that. Any suggestions?

You are recalculating the correlation with the following line:
corrMatrix = data.corr()
You then go on to utilize this recalculated variable in the heatmap here:
sn.heatmap(corrMatrix, annot=True)
plt.show()
To resolve this, instead of passing in corrMatrix, which is the recalculated value, pass the raw Excel data, either data or df (df holds the same data). All the code you should need is:
import seaborn as sn
import matplotlib.pyplot as plt
import pandas as pd
data =pd.read_excel('/Desktop/wetchimp_global/corr/correlation_matrix.xlsx')
sn.heatmap(data, annot=True)
plt.show()
Note that this assumes your data is ready for the heatmap, as you suggest. Since we do not have access to your data, we cannot confirm that.

I deleted the first column (the names) and added the names back as tick labels, so the code is as below:
import seaborn as sn
import matplotlib.pyplot as plt
import pandas as pd
data =pd.read_excel('/Users/yousefalbuhaisi/Desktop/wetchimp_global/corr/correlation_matrix.xlsx')
fig, ax = plt.subplots(dpi=150)
y_axis_labels = ['CLC','GIEMS','GLWD','LPX_BERN','LPJ_WSL','LPJ_WHyME','SDGVM','DLEM','ORCHIDEE','CLM4ME']
sn.heatmap(data,yticklabels=y_axis_labels, annot=True)
plt.show()
and the results are:
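An alternative to deleting the name column and re-adding the labels by hand is to load that column as the index, so the heatmap picks up both sets of labels by itself. The sketch below assumes the first column of the file holds the row names, which the question does not show. It reads a small made-up table from a string with read_csv so that it runs without the spreadsheet; pd.read_excel accepts the same index_col=0 argument.

```python
import io

import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt

# Made-up, already-computed correlation table; the first column holds the names.
raw = io.StringIO(
    "name,CLC,GIEMS,GLWD\n"
    "CLC,1.0,0.6,0.4\n"
    "GIEMS,0.6,1.0,0.5\n"
    "GLWD,0.4,0.5,1.0\n"
)
corr = pd.read_csv(raw, index_col=0)  # the names become the row labels

fig, ax = plt.subplots(dpi=150)
# No .corr() call: the stored values are plotted exactly as they are.
sn.heatmap(corr, annot=True, vmin=-1, vmax=1, ax=ax)
# plt.show()  # uncomment when running interactively
```

Fixing vmin and vmax to -1 and 1 keeps the colour scale comparable between different correlation tables; leaving them out lets Seaborn scale to the data.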

Reading data from csv and create a graph

I have a csv file with data in the following format -
Issue_Type DateTime
Issue1 03/07/2011 11:20:44
Issue2 01/05/2011 12:30:34
Issue3 01/01/2011 09:44:21
... ...
I'm able to read this csv file, but what I'm unable to do is plot a graph, or rather a trend, based on the data.
For instance, I'm trying to plot a graph with the X-axis as DateTime (month only) and the Y-axis as the number of issues. The trend would be a line graph with 3 lines, one per issue category, showing the pattern for each month.
I don't have any plotting code yet and hence can't share any; so far I'm only reading the csv file, and I'm not sure how to proceed to plot a graph.
PS: I'm not bent on using Python. Since I've parsed csv with Python before, I thought of using it, but if there is an easier approach in some other language, I would be open to exploring that as well.

A way to do this is to use dataframes with pandas.
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
df = pd.read_csv(r"D:\Programmes Python\Data\Data_csv.txt",sep=";") #Reads the csv (raw string so the backslashes are kept literally)
df.index = pd.to_datetime(df["DateTime"]) #Set the index of the dataframe to the DateTime column
del df["DateTime"] #The DateTime column is now useless
fig, ax = plt.subplots()
ax.plot(df.index,df["Issue_Type"])
ax.xaxis.set_major_formatter(mdates.DateFormatter('%m')) #This will only show the month number on the graph
This assumes that Issue1/2/3 are integers; I assumed so because I didn't really understand what they were supposed to be.
Edit: This should do the trick then. It's not pretty and can probably be optimised, but it works:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
df = pd.read_csv(r"D:\Programmes Python\Data\Data_csv.txt",sep=";")
df.index = pd.to_datetime(df["DateTime"])
del df["DateTime"]
# Extract the number after the "Issue" prefix, e.g. "Issue2" -> 2
issue_numbers=[]
for issue in df["Issue_Type"]:
    issue_numbers.append(int(issue[5:]))
df["Issue_number"]=issue_numbers
fig, ax = plt.subplots()
ax.plot(df.index,df["Issue_number"])
ax.xaxis.set_major_formatter(mdates.DateFormatter('%m'))
plt.show()

The first thing you need to do is to parse the datetime fields as dates/times. Try using dateutil.parser for that.
Next, you will need to count the number of issues of each type in each month. The naive way to do that would be to maintain a list of monthly counters for each issue type, iterate through each row, see which month and which issue type it is, and then increment the appropriate counter.
When you have such a frequency count of issues, sorted by issue types, you can simply plot them against dates like this:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import datetime as dt
# starting_year, ending_year and issues (one list of monthly counts per
# issue type, same length as dates) come from the counting step above.
dates = []
for year in range(starting_year, ending_year):
    for month in range(1, 13):  # 13 so that December is included
        dates.append(dt.datetime(year=year, month=month, day=1))
date_format = mdates.DateFormatter('%b') # Format dates to only show month names
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(dates, issues[0]) # To plot just issues of type 1
ax.plot(dates, issues[1]) # To plot just issues of type 2
ax.plot(dates, issues[2]) # To plot just issues of type 3
ax.xaxis.set_major_formatter(date_format) # Format X tick labels
plt.show()
plt.close()
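The counting step described above can also be done with pandas instead of hand-written counters. This is a sketch under assumptions: the rows below are made up, the column names follow the question, and the dates are read as day/month/year, which the question does not state.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up rows in the same layout as the question's CSV.
df = pd.DataFrame(
    {
        "Issue_Type": ["Issue1", "Issue2", "Issue1", "Issue3", "Issue1", "Issue2"],
        "DateTime": [
            "03/07/2011 11:20:44", "01/05/2011 12:30:34",
            "15/07/2011 08:00:00", "01/01/2011 09:44:21",
            "20/05/2011 10:10:10", "02/05/2011 16:45:00",
        ],
    }
)
# Assumption: day/month/year order.
df["DateTime"] = pd.to_datetime(df["DateTime"], format="%d/%m/%Y %H:%M:%S")

# Map every timestamp to the first day of its month.
month = df["DateTime"].dt.to_period("M").dt.to_timestamp()

# Rows: month, columns: issue type, cells: number of issues.
counts = pd.crosstab(month, df["Issue_Type"])

ax = counts.plot(marker="o")  # one line per issue type
ax.set_xlabel("month")
ax.set_ylabel("number of issues")
# plt.show()  # uncomment when running interactively
```

With a real file, the DataFrame construction would be replaced by pd.read_csv on that file, with whatever separator it uses.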

Honestly, I would just use R. Check this link out on downloading and setting up R and RStudio.
data <- read.csv(file="c:/yourdatafile.csv", header=TRUE, sep=",")
attach(data)
data$Month <- format(as.Date(data$DateTime), "%m")
plot(DateTime, Issue_Type)

python plot values against date

I have a dataframe object:
import pandas as pd
import matplotlib.pyplot as plt
data=pd.DataFrame({'date':['2013-03-04','2013-03-05','2013-03-06','2013-03-07'],'value':[1,1.1,1.2,1.3]})
and I would like to plot the value column against the date column. I've tried:
plt.plot(pd.to_datetime(data['date']),data['value'])
The x-axis does not show the date labels I expected. Could anyone help? Thanks!

You can plot it directly like this:
data.plot(x='date', y='value')
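If the x-axis should carry real dates rather than strings, one option is to convert the column first and set an explicit tick format. A sketch using the question's data; the "%Y-%m-%d" label format and the one-tick-per-day locator are just one possible choice:

```python
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

data = pd.DataFrame(
    {
        "date": ["2013-03-04", "2013-03-05", "2013-03-06", "2013-03-07"],
        "value": [1, 1.1, 1.2, 1.3],
    }
)
data["date"] = pd.to_datetime(data["date"])  # strings -> datetime64

fig, ax = plt.subplots()
ax.plot(data["date"], data["value"], marker="o")
ax.xaxis.set_major_locator(mdates.DayLocator())                  # one tick per day
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))   # label as YYYY-MM-DD
fig.autofmt_xdate()  # rotate the labels so they do not overlap
# plt.show()  # uncomment when running interactively
```

For longer date ranges a daily locator produces too many ticks; mdates.AutoDateLocator or a monthly locator would be a better fit there.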
