How do I create a line graph with my data? - python

I have a CSV file with two columns. The first column contains a date in the format 01/01/1969 and the second column has the average house price for that month. My data ranges from 01/04/1969 to the same date in 2019, for a total of 613 entries in the DataFrame. I want to create a line graph that shows the average house price per year. So far I have this:
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('ScottishAveragePrices.csv')
df.groupby(['Date']).mean().sort_values('AveragePrice')
The output is:
AveragePrice
Date
01/04/1968 2844.980688
01/05/1968 2844.980688
01/06/1968 2844.980688
01/10/1968 2921.049691
01/11/1968 2921.049691
...
01/04/2019 150825.247700
01/09/2018 151465.715100
01/10/2018 151499.207500
01/07/2018 151874.694900
01/08/2018 152279.438800
[613 rows x 1 columns]
I'm just not sure how to turn this data into a line graph. Sorry if the formatting of this post is wrong; I'm very new to the forum.
Thanks

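Assign the grouped result to a new DataFrame and plot it; pandas draws the chart with matplotlib. Parse the dates first and sort by date rather than by price, otherwise the line is not drawn in chronological order:
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
df_2 = df.groupby('Date').mean().sort_index()
df_2.plot(y="AveragePrice")
plt.show()
If you are working in a Jupyter notebook, also add the matplotlib magic command so the plot is displayed inline:
%matplotlib inline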

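import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('ScottishAveragePrices.csv')
# Parse the dd/mm/yyyy strings so the x axis is in date order
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
# After groupby, 'Date' is the index, so sort by it instead of by price
df = df.groupby('Date').mean().sort_index()
plt.plot(df.index, df['AveragePrice'])
plt.show()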
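The question asks for the average price per year, whereas the code above plots one point per month. A minimal sketch of the yearly version, assuming the CSV has the `Date` and `AveragePrice` columns shown in the question with day-first dates. The in-memory sample reuses rows from the question's output so the snippet runs on its own, and `yearly_average` is a hypothetical helper name.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# A few rows copied from the question's output; for the real data pass
# 'ScottishAveragePrices.csv' instead of this in-memory sample.
SAMPLE = io.StringIO(
    "Date,AveragePrice\n"
    "01/04/1968,2844.980688\n"
    "01/05/1968,2844.980688\n"
    "01/06/1968,2844.980688\n"
    "01/10/1968,2921.049691\n"
    "01/11/1968,2921.049691\n"
    "01/07/2018,151874.694900\n"
    "01/08/2018,152279.438800\n"
    "01/09/2018,151465.715100\n"
    "01/10/2018,151499.207500\n"
    "01/04/2019,150825.247700\n"
)


def yearly_average(csv_source):
    """Return the mean AveragePrice for each calendar year, in year order."""
    df = pd.read_csv(csv_source)
    # The dates are day-first (dd/mm/yyyy), so parse them explicitly
    df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
    # Group the monthly rows by calendar year and average them
    return df.groupby(df['Date'].dt.year)['AveragePrice'].mean()


yearly = yearly_average(SAMPLE)
ax = yearly.plot(marker='o')
ax.set_xlabel('Year')
ax.set_ylabel('Average price')
plt.show()
```

Averaging by calendar year is a design choice here: since the data runs from April to April, the first and last calendar years are only partly covered, so their points rest on fewer months.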

Related

Plotting a CSV-file with time using matplotlib

I have recently started a project where I need to evaluate and plot data using Python. The CSV files I have to plot are structured like this:
date,ch1,ch2,ch3,date2
11:56:20.149766,0.909257531,0.909420371,1.140183687, 13:56:20.149980
11:56:20.154008,0.895447016,0.895601869,1.122751355, 13:56:20.154197
11:56:20.157245,0.881764293,0.881911397,1.105638862, 13:56:20.157404
11:56:20.160590,-0.009178977,-0.000108901,-1.486875653, 13:56:20.160750
11:56:20.190473,-1.473576546,-1.477073431,-1.846657276, 13:56:20.190605
11:56:20.193810,-1.460405469,-1.463766813,-1.8300246, 13:56:20.193933
11:56:20.197139,-1.447362065,-1.450844049,-1.813711882, 13:56:20.197262
11:56:20.200480,-1.434574604,-1.437921286,-1.797878742, 13:56:20.200604
11:56:20.203803,-1.422042727,-1.425382376,-1.782045603, 13:56:20.203926
11:56:20.207136,-1.40951097,-1.412971258,-1.7663728, 13:56:20.207258
11:56:20.210472,-0.436505407,-0.438260257,-0.54675138, 13:56:20.210595
11:56:20.213804,0.953246772,0.953690529,1.19551909, 13:56:20.213921
11:56:20.217136,0.93815738,0.938464701,1.176487565, 13:56:20.217252
11:56:20.220472,0.923707485,0.924006522,1.158255577, 13:56:20.220590
11:56:20.223807,0.909385324,0.909676254,1.140343547, 13:56:20.223922
11:56:20.227132,0.895447016,0.895729899,1.122911215, 13:56:20.227248
11:56:20.230466,0.881892085,0.882039428,1.105798721, 13:56:20.230582
I can already read the file and print it using pandas:
df = pd.read_csv (r'F:\Schule\HTL\Diplomarbeit\aw_python\datei_meas.csv')
print (df)
But now I want to plot the file using matplotlib. The first column, date, should be on the x axis, and columns 2, 3 and 4 should be the y values of separate graphs.
I hope someone can help me with my problem.
Kind regards
Matthias
Edit:
This is what I have tried in order to convert the date column into a format matplotlib can plot:
import matplotlib.pyplot as plt
import numpy as np
import mplcursors
import pandas as pd
import matplotlib.dates as mdates
df = pd.read_csv (r'F:\Schule\HTL\Diplomarbeit\aw_python\datei_meas.csv')
print (df)
x_list = df.date
y = df.ch1
x = mdates.date2num(x_list)
plt.scatter(x,y)
plt.show()
And this is the error message I get:
d = d.astype('datetime64[us]')
ValueError: Error parsing datetime string " 11:56:20.149766" at position 3
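Going by the traceback, `mdates.date2num` was handed plain strings such as `' 11:56:20.149766'` (note the leading space), and NumPy's datetime parser, which expects a date first, could not read a bare time of day. A minimal sketch, assuming the file looks like the sample in the question: read it with `skipinitialspace=True`, convert the `date` column with `pd.to_datetime` and an explicit time format, then draw `ch1`, `ch2` and `ch3` as separate lines on one axes. Because the strings carry no date, pandas fills in 1900-01-01; only the time part matters for the axis.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# First rows of the file from the question; for the real file use
# pd.read_csv(r'F:\Schule\HTL\Diplomarbeit\aw_python\datei_meas.csv', skipinitialspace=True)
SAMPLE = io.StringIO(
    "date,ch1,ch2,ch3,date2\n"
    "11:56:20.149766,0.909257531,0.909420371,1.140183687, 13:56:20.149980\n"
    "11:56:20.154008,0.895447016,0.895601869,1.122751355, 13:56:20.154197\n"
    "11:56:20.157245,0.881764293,0.881911397,1.105638862, 13:56:20.157404\n"
    "11:56:20.160590,-0.009178977,-0.000108901,-1.486875653, 13:56:20.160750\n"
)

# skipinitialspace strips the blank after each comma (visible in the date2 column)
df = pd.read_csv(SAMPLE, skipinitialspace=True)

# Parse the time-of-day strings; the date part defaults to 1900-01-01
df['date'] = pd.to_datetime(df['date'], format='%H:%M:%S.%f')

fig, ax = plt.subplots()
for channel in ['ch1', 'ch2', 'ch3']:
    # One line per channel, all sharing the same time axis
    ax.plot(df['date'], df[channel], label=channel)
ax.legend()
# Rotate the tick labels so the timestamps do not overlap
fig.autofmt_xdate()
plt.show()
```

If "different graphs" means separate panels rather than separate lines, `plt.subplots(3, 1, sharex=True)` gives one axes per channel with a shared time axis.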

Parsing a CSV file and plotting with Python

I'm new to Python development and I have to implement a project on data analysis. I have a data.txt file which has the following values:
ID,name,date,confirmedInfections
DE2,BAYERN,2020-02-24,19
.
.
DE2,BAYERN,2020-02-25,19
DE1,BADEN-WÜRTTEMBERG,2020-02-24,1
.
.
DE1,BADEN-WÜRTTEMBERG,2020-02-26,7
.
. (lots of other names and data)
What am I trying to do?
As you can see in the file above, each name represents a German state with COVID infections. For each state, I need to save a separate DataFrame and plot a time series graph with the date on the x axis and confirmedInfections on the y axis.
Because the data file I was given is large and has four columns, I think I'm making a mistake when parsing the file and selecting the correct values. Here is my code:
import pandas as pd
from matplotlib import pyplot
# Getting the data for Bayern
data = pd.read_csv("data.txt", index_col="name")
first = data.loc["BAYERN"]
print(first)
# Plotting the timeseries
series = pd.read_csv('data.txt', header=0, index_col=0, parse_dates=True)
series.plot()
pyplot.show()
In the resulting plot, the x axis shows all the different IDs included in data.txt instead of the dates. How can I exclude the ID and plot the stats of each state?
Thanks for your time.
You need to parse the date column after reading the CSV:
import pandas as pd
import matplotlib.pyplot as plt
# The file already has a header row; usecols limits which columns are read
data = pd.read_csv('data.txt', usecols=['name', 'date', 'confirmedInfections'])
# Dates look like 2020-02-24, so parse them with the matching format
data['date'] = pd.to_datetime(data['date'], format='%Y-%m-%d')
# Select one state so its rows form a single time series
bayern = data[data['name'] == 'BAYERN']
x = bayern['date']
y = bayern['confirmedInfections']
# Plot using pyplot
plt.plot(x, y)
# Display the chart
plt.show()
I haven't tested this particular code.
I hope this will work for you
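The question also asks for a separate DataFrame per state. A minimal sketch, assuming `data.txt` has the header `ID,name,date,confirmedInfections` shown in the question: `groupby('name')` splits the rows, each piece is indexed by date, and each becomes its own line. The inline sample reuses the rows quoted in the question, and the dictionary name `per_state` is illustrative.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Rows taken from the question; use pd.read_csv('data.txt', ...) for the real file
SAMPLE = io.StringIO(
    "ID,name,date,confirmedInfections\n"
    "DE2,BAYERN,2020-02-24,19\n"
    "DE2,BAYERN,2020-02-25,19\n"
    "DE1,BADEN-WÜRTTEMBERG,2020-02-24,1\n"
    "DE1,BADEN-WÜRTTEMBERG,2020-02-26,7\n"
)

# Parse the ISO dates while reading and skip the ID column
data = pd.read_csv(SAMPLE, usecols=['name', 'date', 'confirmedInfections'],
                   parse_dates=['date'])

# One DataFrame per state, indexed and sorted by date
per_state = {
    name: group.set_index('date').sort_index()[['confirmedInfections']]
    for name, group in data.groupby('name')
}

fig, ax = plt.subplots()
for name, frame in per_state.items():
    # Each state becomes its own line with the date on the x axis
    ax.plot(frame.index, frame['confirmedInfections'], label=name)
ax.legend()
fig.autofmt_xdate()
plt.show()
```

To get one figure per state instead of one shared chart, call `frame.plot()` inside the loop; each call opens a new figure.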

How can I visualise categorical feature vs date column

In my dataset I have a categorical column named 'Type' containing values such as INVOICE, IPC and IP, and a 'Date' column containing dates (e.g. 2014-02-01).
How can I plot these two?
On the x axis I want the date.
On the y axis I want a line for a type (e.g. INVOICE) showing its trend.
I'm not sure exactly what you mean by plotting and showing the trend. One way is to count the occurrences, as @QuangHoang suggested, and plot the counts as a heatmap, something like below. If you meant something different, please expand on your question.
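import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# Random sample data: 20 rows drawn from 5 quarterly dates and 3 types
dates = pd.date_range(start='1/1/2018', periods=5, freq='3MS')[np.random.randint(0, 5, 20)]
types = np.random.choice(['INVOICE', 'IPC', 'IP'], 20)
df = pd.DataFrame({'dates': dates, 'type': types})
# Count each type per date; year-first strings keep the columns in date order
tab = pd.crosstab(df['type'], df['dates'].dt.strftime('%Y-%m-%d'))
n = np.unique(tab.values)
cmap = sns.color_palette("BuGn_r", len(n))
sns.heatmap(tab, cmap=cmap)
plt.show()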
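Since the question asks for a line per type rather than a heatmap, the same counts can be drawn as trend lines. A minimal sketch, assuming the columns are called `Type` and `Date` as in the question; the six rows below are made-up illustration data. `pd.crosstab` with the dates as rows counts how often each type occurs on each date, and `DataFrame.plot` then draws one line per column.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample using the question's column names ('Type' and 'Date')
df = pd.DataFrame({
    'Type': ['INVOICE', 'IPC', 'INVOICE', 'IP', 'INVOICE', 'IPC'],
    'Date': ['2014-02-01', '2014-02-01', '2014-03-01',
             '2014-03-01', '2014-04-01', '2014-04-01'],
})
df['Date'] = pd.to_datetime(df['Date'])

# Rows: dates, columns: types, cells: number of rows with that type on that date
counts = pd.crosstab(df['Date'], df['Type'])

# DataFrame.plot draws one line per column, i.e. one trend line per type
ax = counts.plot(marker='o')
ax.set_ylabel('Count')
plt.show()
```

With many distinct dates, counting per month instead (for example by passing `df['Date'].dt.to_period('M')` as the first argument to `crosstab`) usually gives a smoother trend.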

How to filter certain values when reading csv

So I have a text document with about 11 columns and I need to display specific columns (5 and 6) on a chart. I don't know how to only read those columns. Currently, every single column of data shows when I run the code. Here's what I have so far:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv(r'C:\Users\jager\Desktop\dataf.txt',sep='\t', index_col=0)
df.plot()
plt.show()
If the columns have names, you can do:
col5 = df['column name']
to get the data in that column.
And since you're importing matplotlib, I assume you want to plot with it. For a line plot, all you need to do is:
plt.plot(df['column 5 name'], df['column 6 name'])
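To read only those two columns in the first place, `pd.read_csv` accepts a `usecols` argument. A minimal sketch, assuming the file is tab-separated with a header row and that "columns 5 and 6" counts from 1; the generated 11-column sample with names `c1` to `c11` is hypothetical. Positions in `usecols` are 0-based, hence `[4, 5]`.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical tab-separated file with 11 columns named c1 ... c11
header = '\t'.join(f'c{i}' for i in range(1, 12))
rows = ['\t'.join(str(r * 100 + c) for c in range(1, 12)) for r in range(3)]
SAMPLE = io.StringIO('\n'.join([header] + rows) + '\n')

# usecols takes 0-based positions, so the 5th and 6th columns are 4 and 5
df = pd.read_csv(SAMPLE, sep='\t', usecols=[4, 5])

# Line plot of the 6th column against the 5th
plt.plot(df.iloc[:, 0], df.iloc[:, 1])
plt.show()
```

For the real file, pass the path `r'C:\Users\jager\Desktop\dataf.txt'` instead of `SAMPLE`. If the header names are known, `usecols` also accepts a list of names.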

python plot values against date

I have a dataframe object:
import pandas as pd
import matplotlib.pyplot as plt
data=pd.DataFrame({'date':['2013-03-04','2013-03-05','2013-03-06','2013-03-07'],'value':[1,1.1,1.2,1.3]})
and I would like to plot value column against date column, I've tried:
plt.plot(pd.to_datetime(data['date']),data['value'])
The x axis does not show the date labels I expected. Could anyone help? Thanks!
You can just plot it like this:
data.plot(x='date', y='value')
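If you want to keep `plt.plot` with parsed dates, you can set the tick locator and formatter explicitly; left to itself, matplotlib chooses tick positions automatically, and over a span of only a few days those ticks may not line up with the data points. A minimal sketch using `matplotlib.dates.DayLocator` and `DateFormatter` with the DataFrame from the question:

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

data = pd.DataFrame({'date': ['2013-03-04', '2013-03-05', '2013-03-06', '2013-03-07'],
                     'value': [1, 1.1, 1.2, 1.3]})
data['date'] = pd.to_datetime(data['date'])

fig, ax = plt.subplots()
ax.plot(data['date'], data['value'])
# One tick per day, labelled as year-month-day
ax.xaxis.set_major_locator(mdates.DayLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
# Rotate the labels so they do not overlap
fig.autofmt_xdate()
plt.show()
```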
