Plotting 3 different columns from a CSV file in Python

My goal is to use the sorted data to plot a "Month vs Mean Temp" graph for each year in the same window.
I've sorted the first two columns, which hold the year and the month respectively, and saved the sorted data into a file called NewFile, but I can't get to a solution. I first used csv.reader and am now using numpy.
Code:
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('ggplot')
csv1 = open('Data_5.1.csv')
data = np.array(list(csv.reader(csv1,delimiter=',').astype("string")
year = data[:,0]
mounth = data[:,1]
temp= data[:,3]
fig, ax = plt.subplots(figsize=(10,10))
ax.plot(year, mounth, label='mounth/year')
ax.plot(year, temp, label='year/temp')
plt.legend()
But it just throws an error saying:
File "<ipython-input-282-282e91df631f>", line 9
year = data[:,0]
^
SyntaxError: invalid syntax
I will put two links to the files, the Data_5.1 and the NewFile respectively
Data_5.1
NewFile

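1 - You didn't close the brackets on the data = np.array(...) line, so Python reports the SyntaxError on the following line (year = data[:,0]).
2 - astype("string") is not needed on that line; csv.reader already yields strings.
I fixed your code (and added the missing import csv), but you will have to complete the subplotting. Good luck!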
import numpy as np
import matplotlib.pyplot as plt
import csv
plt.style.use('ggplot')
csv1 = open('Data_5.1.csv')
data = np.array(list(csv.reader(csv1,delimiter=',')))
year = data[:,0]
mounth = data[:,1]
temp= data[:,3]
fig, ax = plt.subplots(2,2) # This creates a 2x2 grid (4 subplots) in one window
ax[0,0].plot(year, mounth, label='mounth/year') # This plots in the top-left subplot (row 0, column 0)
ax[0,1].plot(year, temp, label='year/temp') # This plots in the top-right subplot (row 0, column 1)
'''
For you to continue.
'''
ax[0,0].legend() # plt.legend() would target the last subplot, which has no labelled lines
ax[0,1].legend()
plt.show()
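One way to finish the job is to group the rows by year first and then draw one line per year on the same axes. The following is only a sketch under assumptions: the sample rows, the column names (year, month, temp) and the helper names group_by_year and plot_by_year are made up for illustration, and the real Data_5.1.csv may be laid out differently.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in a year,month,other,temp layout; the real
# Data_5.1.csv may use different headers or month spellings.
SAMPLE = """year,month,other,temp
2003,Jan.,x,17.7
2003,Feb.,x,18.1
2004,Jan.,x,19.5
2004,Feb.,x,20.2
"""

def group_by_year(rows):
    """Return {year: ([months], [temps])}, keeping the file's row order."""
    series = defaultdict(lambda: ([], []))
    for row in rows:
        months, temps = series[row["year"]]
        months.append(row["month"])
        temps.append(float(row["temp"]))
    return dict(series)

def plot_by_year(series):
    """Draw one 'month vs temp' line per year on a single set of axes."""
    # Imported here so the grouping step can be used without matplotlib.
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots(figsize=(10, 10))
    for year, (months, temps) in sorted(series.items()):
        ax.plot(months, temps, marker="o", label=year)
    ax.set_xlabel("Month")
    ax.set_ylabel("Mean temp")
    ax.legend(title="Year")
    plt.show()

# Group the sample rows; swap io.StringIO(SAMPLE) for an open file in real use.
series = group_by_year(csv.DictReader(io.StringIO(SAMPLE)))
```

Calling plot_by_year(series) then opens a single window with one line per year. Grouping before plotting keeps the legend to one entry per year instead of one per row.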

Your data is in a CSV file, and it's non-homogeneous in type. Pandas is really the more appropriate tool for this.
I had to adapt your CSV slightly due to encoding errors; here is what it ended up looking like:
year,Month,Other Month,temperature_C
2003,Jan.,Some val,17.7
2004,Jan.,Some val,19.5
2005,Jan.,Some val,17.3
2006,Jan.,Some val,17.8
...
Here is a general sketch of what the code you shared could look like after the refactoring:
import matplotlib.pyplot as plt
import pandas as pd
plt.style.use('ggplot')
# csv1 = open('Data_5.1.csv')
# data = np.array(list(csv.reader(csv1,delimiter=',').astype("string")
df_1 = pd.read_csv('../resources/Data_5.1.csv', header=0, names=['year', 'month', 'some_col', 'temp'],
                   dtype={'some_col': str, 'temp': float, 'month': str, 'year': str})
year = df_1['year']
month = df_1['month']
temp = df_1['temp']
fig, ax = plt.subplots(figsize=(10, 10))
ax.plot(year, month, label='month/year')
ax.plot(year, temp, label='year/temp')
plt.show()
Let me know if you have any questions :)

Related

Plotting dates only when frequency changes

I have been trying to plot date against frequency.
This is what my data set looks like:
2017-07-04,13
2018-04-11,13
2017-08-17,13
2017-08-30,13
2018-04-26,12
2018-01-03,12
2017-07-05,11
2017-06-21,11
This is the code I have tried:
with open('test.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerows(temp)
### Extract data from CSV ###
with open('test.csv', 'r') as n:
    reader = csv.reader(n)
    dates = []
    freq = []
    for row in reader:
        dates.append(row[0])
        freq.append(row[1])
fig = plt.figure()
graph = fig.add_subplot(111)
# Plot the data as a red line with round markers
graph.plot(dates, freq, 'r-o')
graph.set_xticks(dates)
graph.set_xticklabels(
[dates]
)
plt.show()
This is the result I got:
The xlabels are very cluttered. I want the dates in the labels to be displayed only when there is a change of value.
I don't know how to do that.
Help is appreciated.
Thanks!
Firstly, I would strongly encourage you to use the pandas library and its DataFrame object to handle your data. It has some very useful functions, such as read_csv, which will save you some work.
To have matplotlib space the xticks more sensibly, you'll want to convert your dates to datetime objects (instead of storing your dates as strings).
Here I'll read your data in with pandas, parse the dates and order by date:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# Read data
df = pd.read_csv('/path/to/test.csv', names=['date', 'freq'], parse_dates=['date'])
# Sort by date
df.sort_values(by='date', inplace=True)
You can then go ahead and plot the data (you'll need the latest version of pandas to automatically handle the dates):
fig, ax = plt.subplots(1, 1)
# Plot date against frequency
ax.plot(df['date'], df['freq'], 'r-o')
# Rotate the tick labels
ax.tick_params(axis='x', rotation=45)
fig.tight_layout()
If you only wanted to display dates when the frequency changes, the following would work:
ax.set_xticks(df.loc[df['freq'].diff() != 0, 'date'])
though I wouldn't recommend it (the unequal spacing looks messy).
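To make the "only when the value changes" rule concrete, here is a small sketch in plain Python that picks the tick dates from the sample rows in the question. The helper name change_points is hypothetical, and keeping the first date as a starting tick is my own choice.

```python
from datetime import date

def change_points(dates, freqs):
    """Return the dates whose frequency differs from the previous row.

    The first date is always kept so the axis has a starting tick.
    """
    ticks = []
    previous = object()  # sentinel that compares unequal to any real value
    for d, f in zip(dates, freqs):
        if f != previous:
            ticks.append(d)
        previous = f
    return ticks

# The eight sample rows from the question, sorted by date before comparing.
rows = sorted(
    (date.fromisoformat(d), f)
    for d, f in [
        ("2017-07-04", 13), ("2018-04-11", 13), ("2017-08-17", 13),
        ("2017-08-30", 13), ("2018-04-26", 12), ("2018-01-03", 12),
        ("2017-07-05", 11), ("2017-06-21", 11),
    ]
)
dates = [d for d, _ in rows]
freqs = [f for _, f in rows]
ticks = change_points(dates, freqs)
```

The resulting list can be passed to ax.set_xticks(ticks). With this sample only 2017-08-30 is dropped, because it repeats the value of the row before it.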

Plotting Time and float value using python matplotlib from File

I have a text file with a time and a float value on each line. I have heard that it is possible to plot these two columns using matplotlib. I searched similar threads but could not make it work. My code and data are:
import math
import datetime
import matplotlib
import matplotlib.pyplot as plt
import csv
with open('MaxMin.txt','r') as f_input:
    csv_input = csv.reader(f_input, delimiter=' ', skipinitialspace=True)
    x = []
    y = []
    for cols in csv_input:
        x = matplotlib.dates.date2num(cols[0])
        y = [float(cols[1])]
# naming the x axis
plt.xlabel('Real-Time')
# naming the y axis
plt.ylabel('Acceleration (m/s2)')
# giving a title to my graph
plt.title('Accelerometer reading graph!')
# plotting the points
plt.plot(x, y)
# beautify the x-labels
plt.gcf().autofmt_xdate()
# function to show the plot
plt.show()
And part of the Data in MaxMin.txt
23:28:30.137 10.7695982757
23:28:30.161 10.4071263594
23:28:30.187 9.23969855461
23:28:30.212 9.21066485657
23:28:30.238 9.25117645762
23:28:30.262 9.59227680741
23:28:30.287 9.9773536301
23:28:30.312 10.0128275058
23:28:30.337 9.73353441664
23:28:30.361 9.75064993988
23:28:30.387 9.717339267
23:28:30.412 9.72736788911
23:28:30.440 9.62451269364
I am a beginner in Python, using Python 2.7.15 on Windows 10 Pro (64-bit). I have already installed numpy, scipy and scikit-learn. Please help.
Final output graph from the complete data set. Thanks @ImportanceOfBeingErnest
You could use pandas to achieve this, first store your file in a .csv format:
import math
import datetime
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd #### import this library
df = pd.read_csv("path_to_file.csv", delimiter=' ', header=None, encoding='latin-1')  # header=None: the file has no header row
x = df.iloc[:,0]
y = df.iloc[:,1]
# naming the x axis
plt.xlabel('Real-Time')
# naming the y axis
plt.ylabel('Acceleration (m/s2)')
# giving a title to my graph
plt.title('Accelerometer reading graph!')
# plotting the points
plt.plot(x, y)
# beautify the x-labels
plt.gcf().autofmt_xdate()
# function to show the plot
plt.show()
If the first column is not already in a datetime format, you can convert it with df.iloc[:,0] = pd.to_datetime(df.iloc[:,0])
and then take, for example, the hour:
df.iloc[:,0] = df.iloc[:,0].map(lambda x: x.hour)
The output after running the code was like:
The error you made in the original attempt is actually pretty minor. Instead of appending the values from the loop you redefined them.
Also you would need to use datestr2num instead of date2num, because the string read in is not yet a date.
import matplotlib
import matplotlib.pyplot as plt
import csv
with open('MaxMin.txt','r') as f_input:
    csv_input = csv.reader(f_input, delimiter=' ', skipinitialspace=True)
    x = []
    y = []
    for cols in csv_input:
        x.append(matplotlib.dates.datestr2num(cols[0]))
        y.append(float(cols[1]))
# naming the x axis
plt.xlabel('Real-Time')
# naming the y axis
plt.ylabel('Acceleration (m/s2)')
# giving a title to my graph
plt.title('Accelerometer reading graph!')
# plotting the points
plt.plot_date(x, y)
# beautify the x-labels
plt.gcf().autofmt_xdate()
# function to show the plot
plt.show()
My recommendation for making this easier would be to use numpy and convert the input to datetime.
from datetime import datetime
import numpy as np
import matplotlib.pyplot as plt
x,y= np.loadtxt('MaxMin.txt', dtype=str, unpack=True)
x = np.array([datetime.strptime(i, "%H:%M:%S.%f") for i in x])
y = y.astype(float)
plt.plot(x,y)
plt.gcf().autofmt_xdate()
plt.show()
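If numpy is not at hand, the same parsing step can be sketched with the standard library alone. This is an illustration, not the only way to do it: the helper name read_series is hypothetical, and the three sample lines are copied from the MaxMin.txt excerpt in the question.

```python
import io
from datetime import datetime

# First three lines of the MaxMin.txt excerpt shown in the question.
SAMPLE = """23:28:30.137 10.7695982757
23:28:30.161 10.4071263594
23:28:30.187 9.23969855461
"""

def read_series(lines):
    """Parse 'HH:MM:SS.fff value' lines into two parallel lists."""
    times, values = [], []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        # Append to the lists; reassigning x and y inside the loop would
        # keep only the last row, which was the bug in the first attempt.
        times.append(datetime.strptime(parts[0], "%H:%M:%S.%f"))
        values.append(float(parts[1]))
    return times, values

# Swap io.StringIO(SAMPLE) for open('MaxMin.txt') in real use.
times, values = read_series(io.StringIO(SAMPLE))
```

The two lists can then go straight into plt.plot(times, values), since matplotlib accepts datetime objects on the x axis.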
Concerning the ticking of the axes: In order to have ticks every half a second you can use a MicrosecondLocator with an interval of 500000.
import matplotlib.dates
# ...
loc = matplotlib.dates.MicrosecondLocator(500000)
plt.gca().xaxis.set_major_locator(loc)
plt.gca().xaxis.set_major_formatter(matplotlib.dates.AutoDateFormatter(loc))

Display csv with candlestick_ohlc

I am taking my first steps with pandas.
After a few successful steps I got stuck on the following task: displaying data as OHLC bars.
I downloaded data for Apple stock from Google Finance and stored it in a *.csv file.
After a lot of searching I wrote the following code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import datetime as dt
from matplotlib.finance import candlestick_ohlc
#read stored data
#First two lines of csv:
#Date,Open,High,Low,Close
#2010-01-04,30.49,30.64,30.34,30.57
data = pd.read_csv("AAPL.csv")
#graph settings
fig, ax = plt.subplots()
ax.xaxis_date()
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
plt.xlabel("Date")
plt.ylabel("Price")
plt.title("AAPL")
#convert date to float format
data['Date2'] = data['Date'].map(lambda d: mdates.date2num(dt.datetime.strptime(d, "%Y-%m-%d")))
candlestick_ohlc(ax, (data['Date2'], data['Open'], data['High'], data['Low'], data['Close']))
plt.show()
But it displays empty graph.
What is wrong with this code?
Thanks.
You need to change the last line: candlestick_ohlc expects a sequence of (date, open, high, low, close) tuples, one per day, not a tuple of whole columns. The following code:
start = dt.datetime(2015, 7, 1)
data = pd.io.data.DataReader('AAPL', 'yahoo', start)
data = data.reset_index()
data['Date2'] = data['Date'].apply(lambda d: mdates.date2num(d.to_pydatetime()))
tuples = [tuple(x) for x in data[['Date2','Open','High','Low','Close']].values]
fig, ax = plt.subplots()
ax.xaxis_date()
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
plt.xticks(rotation=45)
plt.xlabel("Date")
plt.ylabel("Price")
plt.title("AAPL")
candlestick_ohlc(ax, tuples, width=.6, colorup='g', alpha =.4);
Produces the below plot:
which you can further tinker with.
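The difference between "a tuple of columns" and "one tuple per day" is easy to see in plain Python. The sketch below is only illustrative: the second day's prices are made up, and date.toordinal() stands in for mdates.date2num so that it runs without matplotlib.

```python
from datetime import date

# First row is from the question's CSV; the second row is hypothetical.
dates = [date(2010, 1, 4), date(2010, 1, 5)]
opens = [30.49, 30.66]
highs = [30.64, 30.80]
lows = [30.34, 30.46]
closes = [30.57, 30.63]

# Stand-in for mdates.date2num: any float-like day number works for the demo.
date_nums = [d.toordinal() for d in dates]

# What the question passed: five items, each one a whole column.
columns = (date_nums, opens, highs, lows, closes)

# What the plotting function iterates over: one record per day.
rows = list(zip(*columns))
```

Here rows holds one (date, open, high, low, close) tuple per trading day, which is the same shape as the tuples list built in the answer above.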

Using pandas/matplotlib/python, I cannot visualize my csv file as clusters

My csv file is,
https://github.com/camenergydatalab/EnergyDataSimulationChallenge/blob/master/challenge2/data/total_watt.csv
I want to visualize this csv file as clusters.
My ideal result would be the following image.(Higher points (red zone) would be higher energy consumption and lower points (blue zone) would be lower energy consumption.)
I want to set x-axis as dates (e.g. 2011-04-18), y-axis as time (e.g. 13:22:00), and z-axis as energy consumption (e.g. 925.840613752523).
I successfully visualized the csv data file as values per 30mins with the following program.
from matplotlib import style
from matplotlib import pylab as plt
import numpy as np
style.use('ggplot')
filename='total_watt.csv'
date=[]
number=[]
import csv
with open(filename, 'rb') as csvfile:
    csvreader = csv.reader(csvfile, delimiter=',', quotechar='|')
    for row in csvreader:
        if len(row) ==2 :
            date.append(row[0])
            number.append(row[1])
number=np.array(number)
import datetime
for ii in range(len(date)):
    date[ii]=datetime.datetime.strptime(date[ii], '%Y-%m-%d %H:%M:%S')
plt.plot(date,number)
plt.title('Example')
plt.ylabel('Y axis')
plt.xlabel('X axis')
plt.show()
I also succeeded to visualize the csv data file as values per day with the following program.
from matplotlib import style
from matplotlib import pylab as plt
import numpy as np
import pandas as pd
style.use('ggplot')
filename='total_watt.csv'
date=[]
number=[]
import csv
with open(filename, 'rb') as csvfile:
    df = pd.read_csv('total_watt.csv', parse_dates=[0], index_col=[0])
    df = df.resample('1D', how='sum')
import datetime
for ii in range(len(date)):
    date[ii]=datetime.datetime.strptime(date[ii], '%Y-%m-%d %H:%M:%S')
plt.plot(date,number)
plt.title('Example')
plt.ylabel('Y axis')
plt.xlabel('X axis')
df.plot()
plt.show()
Although I could visualize the CSV file as values per 30 minutes and per day, I have no idea how to visualize the CSV data as clusters in 3D.
How can I program it?
Your main issue is probably just reshaping your data so that you have date along one dimension and time along the other. Once you do that you can use whatever plotting you like best (here I've used matplotlib's mplot3d, but it has some quirks).
What follows takes your data and reshapes it appropriately so you can then plot a surface that I believe is what you are looking for. The key is using the pivot method, which restructures your data by date and time.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import axes3d
fname = 'total_watt.csv'
# Read in the data, but I skipped setting the index and made sure no data
# is lost to a nonexistent header
df = pd.read_csv(fname, parse_dates=[0], header=None, names=['datetime', 'watt'])
# We want to separate the date from the time, so create two new columns
df['date'] = [x.date() for x in df['datetime']]
df['time'] = [x.time() for x in df['datetime']]
# Now we want to reshape the data so we have dates and times making the result 2D
pv = df.pivot(index='time', columns='date', values='watt')
# Not every date has every time, so fill in the subsequent NaNs or there will be holes
# in the surface
pv = pv.fillna(0.0)
# Now, we need to construct some arrays that matplotlib will like for X and Y values
xx, yy = np.mgrid[0:len(pv),0:len(pv.columns)]
# We can now plot the values directly in matplotlib using mplot3d
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(xx, yy, pv.values, cmap='jet', rstride=1, cstride=1)
ax.grid(False)
# Now we have to adjust the ticks and ticklabels - so turn the values into strings
dates = [x.strftime('%Y-%m-%d') for x in pv.columns]
times = [str(x) for x in pv.index]
# Setting a tick every fifth element seemed about right
ax.set_xticks(xx[::5,0])
ax.set_xticklabels(times[::5])
ax.set_yticks(yy[0,::5])
ax.set_yticklabels(dates[::5])
plt.show()
This gives me (using your data) the following graph:
Note that I've assumed when plotting and making the ticks that your dates and times are linear (which they are in this case). If you have data with uneven samples, you'll have to do some interpolation before plotting.
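For anyone curious what the pivot step does to the data, here is a rough standard-library equivalent. It is a sketch under assumptions: the helper name pivot_records is hypothetical, and only the first reading matches the example values quoted in the question; the other two are invented.

```python
from datetime import datetime

# "timestamp, watt" pairs; only the first row matches the question's example.
records = [
    ("2011-04-18 13:22:00", 925.840613752523),
    ("2011-04-18 13:52:00", 483.3),
    ("2011-04-19 13:22:00", 610.0),
]

def pivot_records(records, fill=0.0):
    """Return (times, dates, grid) with grid[i][j] = watt at times[i] on dates[j]."""
    cells = {}
    for stamp, watt in records:
        dt = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        cells[(dt.time(), dt.date())] = watt
    times = sorted({t for t, _ in cells})
    dates = sorted({d for _, d in cells})
    # Missing (time, date) combinations get the fill value, like fillna(0.0).
    grid = [[cells.get((t, d), fill) for d in dates] for t in times]
    return times, dates, grid

times, dates, grid = pivot_records(records)
```

Each row of grid is one time of day and each column one date, which is the 2D layout plot_surface needs. The 13:52 reading is missing on the second date, so that cell is filled with 0.0.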

Python Matplotlib Plotting CSV data, formatting date X label

My data looks as follows:
2012021305, 65217
2012021306, 82418
2012021307, 71316
2012021308, 66833
2012021309, 69406
2012021310, 76422
2012021311, 94188
2012021312, 111817
2012021313, 127002
2012021314, 141099
2012021315, 147830
2012021316, 136330
2012021317, 122252
2012021318, 118619
2012021319, 115763
2012021320, 121393
2012021321, 130022
2012021322, 137658
2012021323, 139363
The first column is the date in YYYYMMDDHH format. I'm trying to graph the data using the csv2rec function. I can get the data to graph, but the x axis and labels are not showing up the way that I expect them to.
import matplotlib
matplotlib.use('Agg')
from matplotlib.mlab import csv2rec
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from pylab import *
output_image_name='plot1.png'
input_filename="data.log"
input = open(input_filename, 'r')
input.close()
data = csv2rec(input_filename, names=['time', 'count'])
rcParams['figure.figsize'] = 10, 5
rcParams['font.size'] = 8
fig = plt.figure()
plt.plot(data['time'], data['count'])
ax = fig.add_subplot(111)
ax.plot(data['time'], data['count'])
hours = mdates.HourLocator()
fmt = mdates.DateFormatter('%Y%M%D%H')
ax.xaxis.set_major_locator(hours)
ax.xaxis.set_major_formatter(fmt)
ax.grid()
plt.ylabel("Count")
plt.title("Count Log Per Hour")
fig.autofmt_xdate(bottom=0.2, rotation=90, ha='left')
plt.savefig(output_image_name)
I assume this has something to do with the date format. Any suggestions?
You need to convert the x-values to datetime objects.
Something like:
from datetime import datetime
time_vec = [datetime.strptime(str(x),'%Y%m%d%H') for x in data['time']]
plot(time_vec,data['count'])
Currently, you are telling python to format integers (2012021305) as a date, which it does not know how to do, so it returns an empty string (although I suspect that you are getting errors raised someplace).
You should also check your format string: in DateFormatter('%Y%M%D%H'), %M means minutes and %D is not the day-of-month directive; month and day are %m and %d.
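As a quick check of the format codes, the conversion can be tried on a few values from the question outside of any plotting code. The helper name parse_hour_stamp and the label format are my own choices for illustration.

```python
from datetime import datetime

# A few values from the first column of the question's data.
raw = [2012021305, 2012021306, 2012021323]

def parse_hour_stamp(value):
    """Turn an integer like 2012021305 into a datetime (YYYYMMDDHH)."""
    # %m is the month and %d the day of month; %M would mean minutes.
    return datetime.strptime(str(value), "%Y%m%d%H")

time_vec = [parse_hour_stamp(v) for v in raw]

# Example tick labels built with the same lowercase month/day codes.
labels = [t.strftime("%Y-%m-%d %H:00") for t in time_vec]
```

The values in time_vec can be passed to ax.plot in place of data['time'], and the same lowercase %m and %d codes apply to the DateFormatter string.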
