I have the following code, which takes a list of files. Using their data, it plots a pie chart and extracts the mean value and percentiles for each file.
A file, however, might contain data recorded over several days. (The left column of the file holds the date and the right column holds the recorded values.) Now I need to do the same thing as before, but instead of plotting the pie chart and computing the mean value for each whole file, I need to do it for each date recorded in the file.
import dateutil.parser
import glob
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
selection = input('Press all ')
counter=0
files1 = glob.glob(r'C:\Users\And\Documents\testing *.csv')
d2={}
for sfile in files1:
    if selection == 'all':
        x=[]
        y=[]
        z=[]
        xtime=0
        ytime=0
        ztime=0
        data_file = np.genfromtxt(sfile, delimiter=',', usecols=range(2),unpack=True,skip_header=10,dtype='U')
        tdelta=dateutil.parser.parse(data_file[0][1][11:])-dateutil.parser.parse(data_file[0][0][11:])
        tseconds=tdelta.total_seconds()
        for i in data_file[1]:
            if i != 0:
                if float(i) >= 55:
                    x.append(float(i))
                    xtime+=tseconds
                elif float(i)>40 and float(i)<55:
                    y.append(float(i))
                    ytime+=tseconds
                else:
                    z.append(float(i))
                    ztime+=tseconds
        labels = ["upper", "middle", "lower"]
        sizes=[xtime,ytime,ztime]
        legends=[xtime,ytime,ztime]
        colors=["blue","orange","yellow"]
        plt.pie(sizes, explode=(0.1,0,0), labels=labels, colors=colors, autopct='%1.1f%%', shadow=False, startangle=140)
        plt.legend(legends, loc='best')
        plt.axis('equal')
        plt.show()
        plt.savefig("test{filename}.png".format(filename=counter))
        plt.clf()
        xarray=np.asarray(x)
        yarray=np.asarray(y)
        zarray=np.asarray(z)
        totalarray=np.append(zarray,np.append(xarray,yarray))
        counter+=1
        EQ=np.mean(totalarray)
        P15, P50, P85 = np.percentile(totalarray, 15), np.percentile(totalarray, 50), np.percentile(totalarray, 85)
        d2[sfile[36:]]=[f'{P15:.2f}',f'{P50:.2f}',f'{P85:.2f}', f'{EQ:.2f}']
table1 = pd.DataFrame(d2,index=['P15', 'P50', 'P85', 'EQ'])
table= table1.T
The image shows a portion of the data in the csv file
I am having trouble writing code that creates one pie chart for each date contained in a file, instead of a single pie chart for the whole file. At the end I would like to have a table with the mean value for each date. Any help on how to modify the code to do this would be appreciated.
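One way to approach this is to let pandas split the data by calendar date and compute the statistics per group. The sketch below is not a drop-in replacement for the script above: the column names `timestamp` and `value`, the function name `summarize_by_date`, and the fixed `step_seconds` sampling interval are assumptions made for illustration. The thresholds (55 and 40) mirror the ones in the question.

```python
import pandas as pd


def summarize_by_date(df, step_seconds):
    """Return one row per calendar date: P15, P50, P85, EQ and seconds per band.

    Assumes df has a datetime column "timestamp" and a numeric column "value"
    (hypothetical names), sampled every step_seconds seconds.
    """
    rows = {}
    # Group the values by the calendar date of their timestamp
    for day, values in df.groupby(df["timestamp"].dt.date)["value"]:
        rows[str(day)] = {
            "P15": values.quantile(0.15),
            "P50": values.quantile(0.50),
            "P85": values.quantile(0.85),
            "EQ": values.mean(),
            # Time spent in each band = number of samples * sampling interval
            "upper_s": (values >= 55).sum() * step_seconds,
            "middle_s": ((values > 40) & (values < 55)).sum() * step_seconds,
            "lower_s": (values <= 40).sum() * step_seconds,
        }
    return pd.DataFrame.from_dict(rows, orient="index")


# Small made-up sample: two dates, one reading every 10 seconds
sample = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2020-01-01 00:00:00", "2020-01-01 00:00:10",
        "2020-01-01 00:00:20", "2020-01-01 00:00:30",
        "2020-01-02 00:00:00", "2020-01-02 00:00:10",
    ]),
    "value": [30.0, 50.0, 60.0, 60.0, 45.0, 45.0],
})
table = summarize_by_date(sample, step_seconds=10)
print(table)
```

Each row's `upper_s`, `middle_s` and `lower_s` values could then be passed as the sizes to `plt.pie` inside a loop over `table.index`, saving one figure per date.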
Related
My goal is to use the sorted data to plot a "Month vs Mean Temp" graph for each year in the same window.
I've sorted the first two columns, which hold the year and the month respectively, and saved the sorted data into a file called NewFile, but I can't seem to get to a solution. I used csv reader before and am now using numpy.
Code:
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('ggplot')
csv1 = open('Data_5.1.csv')
data = np.array(list(csv.reader(csv1,delimiter=',').astype("string")
year = data[:,0]
mounth = data[:,1]
temp= data[:,3]
fig, ax = plt.subplots(figsize=(10,10))
ax.plot(year, mounth, label='mounth/year')
ax.plot(year, temp, label='year/temp')
plt.legend()
But it just throws an error saying:
File "<ipython-input-282-282e91df631f>", line 9
year = data[:,0]
^
SyntaxError: invalid syntax
I will put two links to the files, the Data_5.1 and the NewFile respectively
Data_5.1
NewFile
1 - You didn't close brackets in line 6, hence you are getting the error in line 8.
2 - astype("string") is not needed in line 6.
I fixed your code, but you will have to complete the subplotting. Good luck!
import numpy as np
import matplotlib.pyplot as plt
import csv
plt.style.use('ggplot')
csv1 = open('Data_5.1.csv')
data = np.array(list(csv.reader(csv1,delimiter=',')))
year = data[:,0]
mounth = data[:,1]
temp= data[:,3]
fig, ax = plt.subplots(2,2) # This creates a 2x2 grid (4 subplots) in one window
ax[0,0].plot(year, mounth, label='mounth/year') # This will plot in the 0,0 subplot
ax[0,1].plot(year, temp, label='year/temp') # This will plot in the 0,1 subplot
# For you to continue.
ax[0,0].legend() # plt.legend() would only act on the last subplot, so call it per axes
ax[0,1].legend()
plt.show()
Your data is in a CSV file, and it's non-homogenous in type. Pandas is really the more appropriate tool for this.
I had to adapt your CSV slightly due to encoding errors, here is what it ended up looking like:
year,Month,Other Month,temperature_C
2003,Jan.,Some val,17.7
2004,Jan.,Some val,19.5
2005,Jan.,Some val,17.3
2006,Jan.,Some val,17.8
...
Here is a general sketch of what the code you shared could look like after the refactoring:
import matplotlib.pyplot as plt
import pandas as pd
plt.style.use('ggplot')
# csv1 = open('Data_5.1.csv')
# data = np.array(list(csv.reader(csv1,delimiter=',').astype("string")
df_1 = pd.read_csv('../resources/Data_5.1.csv', header=0, names=['year', 'month', 'some_col', 'temp'],
                   dtype={'some_col': str, 'temp': float, 'month': str, 'year': str})
year = df_1['year']
month = df_1['month']
temp = df_1['temp']
fig, ax = plt.subplots(figsize=(10, 10))
ax.plot(year, month, label='month/year')
ax.plot(year, temp, label='year/temp')
plt.show()
Let me know if you have any questions :)
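For the stated goal, one line per year on a shared "Month vs Mean Temp" plot, a possible sketch is to reshape the data so that each year becomes a column. The column names follow the `names=` list above; the helper name `monthly_means_by_year` and the sample values are made up for illustration.

```python
import pandas as pd


def monthly_means_by_year(df):
    """Return mean temperature with months as rows and years as columns."""
    means = df.groupby(["month", "year"])["temp"].mean().unstack("year")
    # groupby sorts the month labels alphabetically; restore first-appearance order
    return means.reindex(df["month"].unique())


# Made-up sample in the same shape as the CSV
sample = pd.DataFrame({
    "year": ["2003", "2003", "2004", "2004", "2003"],
    "month": ["Jan.", "Feb.", "Jan.", "Feb.", "Jan."],
    "temp": [10.0, 20.0, 30.0, 40.0, 20.0],
})
means = monthly_means_by_year(sample)
print(means)
```

Calling `means.plot()` would then draw one line per year on a single set of axes, with the months along the x axis.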
I have been trying to plot date against frequency.
This is what my data set looks like:
2017-07-04,13
2018-04-11,13
2017-08-17,13
2017-08-30,13
2018-04-26,12
2018-01-03,12
2017-07-05,11
2017-06-21,11
This is the code I have tried:
with open('test.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerows(temp)
### Extract data from CSV ###
with open('test.csv', 'r') as n:
    reader = csv.reader(n)
    dates = []
    freq = []
    for row in reader:
        dates.append(row[0])
        freq.append(row[1])
fig = plt.figure()
graph = fig.add_subplot(111)
# Plot the data as a red line with round markers
graph.plot(dates, freq, 'r-o')
graph.set_xticks(dates)
graph.set_xticklabels(
    [dates]
)
plt.show()
This is the result I got:
The xlabels are very cluttered. I want the dates in the labels to be displayed only when there is a change of value.
I don't know how to do that.
Help is appreciated.
Thanks!
Firstly, I would strongly encourage you to use the pandas library and its DataFrame object to handle your data. It has some very useful functions, such as read_csv, which will save you some work.
To have matplotlib space the xticks more sensibly, you'll want to convert your dates to datetime objects (instead of storing your dates as strings).
Here I'll read your data in with pandas, parse the dates and order by date:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# Read data
df = pd.read_csv('/path/to/test.csv', names=['date', 'freq'], parse_dates=['date'])
# Sort by date
df.sort_values(by='date', inplace=True)
You can then go ahead and plot the data (you'll need the latest version of pandas to automatically handle the dates):
fig, ax = plt.subplots(1, 1)
# Plot date against frequency
ax.plot(df['date'], df['freq'], 'r-o')
# Rotate the tick labels
ax.tick_params(axis='x', rotation=45)
fig.tight_layout()
If you only wanted to display dates when the frequency changes, the following would work:
ax.set_xticks(df.loc[df['freq'].diff() != 0, 'date'])
though I wouldn't recommend it (the unequal spacing looks messy).
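To see which dates such a selection keeps, the sketch below compares each frequency with the previous one, using the sample data from the question. The helper name `change_dates` is made up here.

```python
import pandas as pd


def change_dates(df):
    """Return the dates at which freq differs from the previous row.

    The first row is always kept, because its difference is NaN and NaN != 0.
    """
    changed = df["freq"].diff() != 0
    return df.loc[changed, "date"]


# Sample rows from the question, sorted by date before comparing neighbours
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2017-07-04", "2018-04-11", "2017-08-17", "2017-08-30",
        "2018-04-26", "2018-01-03", "2017-07-05", "2017-06-21",
    ]),
    "freq": [13, 13, 13, 13, 12, 12, 11, 11],
}).sort_values("date")

ticks = change_dates(df)
labels = ticks.dt.strftime("%Y-%m-%d").tolist()
print(labels)
```

With this sample, only 2017-08-30 is dropped, since its frequency equals that of the preceding date.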
I have a csv file with two columns: one called 'Date' and another called 'Rainfall' that holds the rainfall amount in inches. I am not sure how to go about this, and so far my approach has not been working. I also need to skip the first 5 lines before I reach the 'Date' and 'Rainfall' columns.
Here is the code I have so far:
import matplotlib.pyplot as plt
import csv
x = []
y = []
with open('1541553208_et.csv','r') as csvfile:
    plots = csv.reader(csvfile, delimiter=',')
    for row in plots:
        for i in row:
            x.append(row[0])
            y.append(row[1])
plt.plot(x,y, label='Loaded from file!')
plt.xlabel('Dates')
plt.ylabel('Evaporation (inches)')
plt.title('Eden_7')
plt.legend()
plt.show()
When I run the code I get the following incorrect results:
I want each month's rainfall data to be clustered into one group.
Here is an example of what I am going for:
I am trying to get the same effect as in the example. How could this be done?
Thank you
You may have a simpler time using the pandas library instead of the csv library.
For instance, pandas allows you to store the csv file directly into a data structure called a dataframe. This will allow you to group on dates or rainfall and plot the data.
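import pandas as pd
import matplotlib.pyplot as plt
# rain will be a DataFrame instance; skiprows=5 assumes the header row comes right after the 5 lines to skip
rain = pd.read_csv('1541553208_et.csv', skiprows=5)
# total rainfall per date (column names taken from the question)
totals = rain.groupby('Date')['Rainfall'].sum()
totals.plot(kind='bar')
plt.show()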
Play around with it, pandas is very powerful.
You can find the pandas documentation here: https://pandas.pydata.org/pandas-docs/stable/
While this may not be an immediate solution, it may help in the long run.
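To cluster the rainfall by month, as the question asks, one option is to group on the month of each date. This is a sketch under assumptions: the `Date` and `Rainfall` column names come from the question, the dates are assumed to be in a format `pd.to_datetime` can parse, and the helper name `monthly_totals` and the sample rows are invented.

```python
import io

import pandas as pd


def monthly_totals(rain):
    """Sum the Rainfall column per calendar month (one row per year-month)."""
    months = pd.to_datetime(rain["Date"]).dt.to_period("M")
    return rain.groupby(months)["Rainfall"].sum()


# Made-up stand-in for the real file: 5 lines to skip, then the header row
text = "\n".join(["skip"] * 5 + [
    "Date,Rainfall",
    "2018-01-05,0.10",
    "2018-01-20,0.30",
    "2018-02-03,0.25",
])
rain = pd.read_csv(io.StringIO(text), skiprows=5)
totals = monthly_totals(rain)
print(totals)
```

`totals.plot(kind='bar')` followed by `plt.show()` would then draw one bar per month.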
As mentioned above, using the pandas library will be easier. If you want to stay with the csv library, can you try running this:
import matplotlib.pyplot as plt
import csv
x = []
y = []
with open('1541553208_et.csv') as f:
    csv_f = csv.reader(f, delimiter=',')
    # a csv reader can only be iterated once, so fill both lists in a single loop
    for row in csv_f:
        x.append(row[0])
        y.append(row[1])
plt.plot(x,y, label='Loaded from file!')
plt.xlabel('Dates')
plt.ylabel('Evaporation (inches)')
plt.title('Eden_7')
plt.legend()
plt.show()
My current code takes a list from a csv file and lists the header for the user to pick from so it can plot.
import pandas as pd
df = pd.read_csv('log40a.csv')  # DataFrame.from_csv is no longer available in current pandas
from collections import OrderedDict
headings = OrderedDict(enumerate(df,1))
for num, heading in headings.items():
    print("{}) {}".format(num, heading))
print ('Select X-Axis')
xaxis = int(input())
print ('Select Y-Axis')
yaxis = int(input())
df.plot(x= headings[xaxis], y= headings[yaxis])
My first question. How do I add a secondary Y axis. I know with matplotlib I first create a figure and then plot the first yaxis with the xaxis and then do the same thing to the 2nd yaxis. However, I am not sure how it is done in pandas. Is it similar?
I tried using matplotlib to do it but it gave me an error:
fig1 = plt.figure(figsize= (10,10))
ax = fig1.add_subplot(211)
ax.plot(headings[xaxis], headings[yaxis], label='Alt(m)', color = 'r')
ax.plot(headings[xaxis], headings[yaxis1], label='AS_Cmd', color = 'blue')
Error:
ValueError: Unrecognized character a in format string
You need to pass a list with the names of the columns that you want plotted on the y axis.
An example, if you separate the y columns with a ',':
df.plot(x=headings[xaxis], y=[headings[int(n)] for n in yaxis.split(",")], figsize=(15, 10))
To run it you will need to change your input method, so that yaxis stays a string such as "2,3" rather than being converted to a single int.
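A sketch of that input handling is shown below. The helper name `pick_columns` and the sample column names are invented for illustration.

```python
import pandas as pd


def pick_columns(selection, headings):
    """Turn a comma-separated string of menu numbers into column names."""
    return [headings[int(part)] for part in selection.split(",")]


# Made-up frame standing in for log40a.csv
df = pd.DataFrame({
    "time": [0, 1, 2],
    "alt": [10.0, 12.0, 15.0],
    "cmd": [0.1, 0.3, 0.2],
})
headings = dict(enumerate(df, 1))  # {1: "time", 2: "alt", 3: "cmd"}

ycols = pick_columns("2, 3", headings)  # the string is what input() might return
print(ycols)
```

As for the secondary y axis, `DataFrame.plot` accepts a `secondary_y` keyword: with matplotlib installed, `df.plot(x=headings[1], y=ycols, secondary_y=[ycols[-1]])` would draw the last selected column against a second y axis on the right.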
When plotting with matplotlib, I have to edit my script each time my file has a different number of columns and rows. Below I have posted the script, which handles 5 columns. But if I have a file with 7 columns and I want to plot the 1st column against the 7th, then I have to edit my code again, for example: c0[7],c7=float(elements[7]), C7.append(c7), etc. Is there a way to automate it, so that I won't have to keep changing my code each time I have a different number of rows and columns? Thank you.
As input parameters, I would like to give the data file and the columns to plot (for example, the 1st column against the 6th). The script should handle the number of columns by itself.
import matplotlib.pyplot as plt
import numpy as np
import matplotlib
infile = open("ewcd.txt","r")
data = infile.readlines()
C0=[]
C1=[]
C2=[]
C3=[]
C4=[]
for line in data:
    elements = line.split()
    try:
        c0=float(elements[0])
        c1 = float(elements[1])
        c2=float(elements[2])
        c3=float(elements[3])
        c4=float(elements[4])
        C0.append(c0)
        C1.append(c1)
        C2.append(c2)
        C3.append(c3)
        C4.append(c4)
    except IndexError:
        pass
fig, ax = plt.subplots()
plt.yscale('log')
plt.tick_params(axis='both', which='major', labelsize=13)
plt.plot(C0,C1,'b-')
plt.plot(C0,C2,'g-')