Calculating and plotting a grow rate in years from a dictionary - python

I am trying to plot a graph from a CSV file with the following Python code:
import csv
import matplotlib.pyplot as plt
def population_dict(filename):
    """
    Reads the population from a CSV file, containing
    years in column 2 and population / 1000 in column 3.
    #param filename: the filename to read the data from
    #return dictionary containing year -> population
    """
    dictionary = {}
    with open(filename, 'r') as f:
        reader = csv.reader(f)
        f.next()
        for row in reader:
            dictionary[row[2]] = row[3]
    return dictionary
dict_for_plot = population_dict('population.csv')
def plot_dict(dict_for_plot):
    x_list = []
    y_list = []
    for data in dict_for_plot:
        x = data
        y = dict_for_plot[data]
        x_list.append(x)
        y_list.append(y)
    plt.plot(x_list, y_list, 'ro')
    plt.ylabel('population')
    plt.xlabel('year')
    plt.show()
plot_dict(dict_for_plot)
def grow_rate(data_dict):
    # fill lists
    growth_rates = []
    x_list = []
    y_list = []
    for data in data_dict:
        x = data
        y = data_dict[data]
        x_list.append(x)
        y_list.append(y)
    # calc grow_rate
    for i in range(0, len(y_list)-1):
        var = float(y_list[i+1]) - float(y_list[i])
        var = var/y_list[i]
        print var
        growth_rates.append(var)
    # growth_rate_dict = dict(zip(years, growth_rates))
grow_rate(dict_for_plot)
However, I'm getting a rather weird error when executing this code:
Traceback (most recent call last):
File "/home/jharvard/Desktop/pyplot.py", line 71, in <module>
grow_rate(dict_for_plot)
File "/home/jharvard/Desktop/pyplot.py", line 64, in grow_rate
var = var/y_list[i]
TypeError: unsupported operand type(s) for /: 'float' and 'str'
I've been trying different methods to cast the y_list variable, for example casting to an int.
How can I solve this problem so that I can get the growth rate in percent over the years and plot it?

Since CSV files are text files, every value you read from one is a string, so you need to convert the values to numbers before doing arithmetic on them. The TypeError itself is easy to fix. Just use
var/float(y_list[i])
That gets rid of the TypeError, but there is a subtler bug that can produce incorrect results under some circumstances: in Python 2, which your code uses, dictionaries are not ordered, so the x and y values do not come out in any particular order. The indentation of your program appears to be a bit off on my computer, so I am unable to follow it exactly, but the gist appears to be that you read x and y values from a file and then compute the sequence
var[i] = (y[i+1] - y[i]) / y[i]
Unfortunately, your y_list[i] may not be in the same sequence as in the CSV file, because it is populated from a dictionary.
In the section where you did:
for row in reader:
dictionary[row[2]] = row[3]
it is better to preserve the file order by doing
    x, y = zip(*[(float(row[2]), float(row[3])) for row in reader])
    x, y = map(numpy.array, [x, y])
    return x, y
or something like this (with import numpy at the top of the file).
Then, Numpy arrays have methods for handling your problem much more efficiently. You can then simply do:
growth_rates = numpy.diff(y) / y[:-1]
Hope this helps. Let me know if you have any questions.
Finally, if you do go the Numpy route, I would highly recommend its own csv reader. Check it out here: http://docs.scipy.org/doc/numpy/user/basics.io.genfromtxt.html
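If you would rather stay with the standard library, the same idea can be sketched in plain Python 3. This is only an illustration under assumptions: the column positions (row[2] for the year, row[3] for the population) and the header row come from the question, the years are assumed to be whole numbers, and the function names and sample rows are made up.

```python
import csv
import io


def read_population(f):
    """Return (year, population) pairs sorted by year.

    Assumes, as in the question, a header row followed by rows with the
    year in row[2] and the population in row[3].
    """
    reader = csv.reader(f)
    next(reader)  # skip the header row
    pairs = [(int(row[2]), float(row[3])) for row in reader]
    return sorted(pairs)


def growth_rates(pairs):
    """Return (year, rate) pairs with rate = (y[i+1] - y[i]) / y[i]."""
    return [
        (next_year, (next_pop - pop) / pop)
        for (_, pop), (next_year, next_pop) in zip(pairs, pairs[1:])
    ]


# Made-up sample rows, deliberately out of order, only to show the call pattern.
sample = io.StringIO(
    "id,name,year,population\n"
    "1,X,2001,110\n"
    "0,X,2000,100\n"
    "2,X,2002,121\n"
)
rates = growth_rates(read_population(sample))
```

Sorting the pairs by year makes the result independent of both dictionary ordering and row order in the file. Multiply each rate by 100 for a percentage, and note that a population of 0 would raise ZeroDivisionError.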

Related

How to fix: "Int object not iterable" when assigning variables to two lists?

I tried making a question on this earlier and did a horrible job of explaining what I wanted. Hopefully the information I provide in this one is more helpful.
The program I am trying to make will read input from a file in the following form (there will be multiple varying test cases):
7 10
4 8
The program will assign a variable to the top-right integer (in this case, 10) and the bottom-left integer (4). The program will then compute the difference of the two variables. Here is the code I have so far -
with open('C:\\Users\\ayush\\Desktop\\USACO\\paint\\paint_test.in', 'r') as fn:
    matrix = fn.readlines()
input_array = []
for line in matrix:
    input_array.append(line.strip())
for p,q in enumerate(input_array):
    for x,y in enumerate(p):
        pass
    for a,b in enumerate(q):
        pass
print(y - a)
When I, however, run this code I get the following error:
Traceback (most recent call last):
File "C:\Users\ayush\Desktop\USACO\paint\paint.py", line 16, in <module>
for x,y in enumerate(p):
TypeError: 'int' object is not iterable
[Finished in 0.571s]
I'm not sure as to what the problem is, and why my lists cannot be iterated.
I hope I did a better job explaining my goal this time. Please let me know if there are any additional details I could try to provide. I would really appreciate some help - I've been stuck on this for the longest time.
Thanks!
The error comes from enumerate(input_array), which yields (index, line) pairs, so p is an integer index and enumerate(p) fails. Were you going for something along the lines of the following?
with open('C:\\Users\\ayush\\Desktop\\USACO\\paint\\paint_test.in', 'r') as fn:
    matrix = fn.readlines()
input_array = []
for line in matrix:
    input_array.append(line.strip())
top_line, bottom_line = input_array # previously p, q
top_left, top_right = top_line.split() # previously x, y
bottom_left, bottom_right = bottom_line.split() # previously a, b
print(int(top_right) - int(bottom_left)) # you would have run into issues subtracting strings without the int() calls
If so, that should work, but you can avoid all the unpacking if you just use [0] and [-1] indexes to get the first and last items (this has the advantage of working on a matrix of any size):
with open('C:\\Users\\ayush\\Desktop\\USACO\\paint\\paint_test.in', 'r') as fn:
    lines = fn.read().splitlines()
matrix = [
    [
        int(item)
        for item in line.split()
    ]
    for line in lines
]
top_right = matrix[0][-1]
bottom_left = matrix[-1][0]
print(top_right - bottom_left)
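To see where the original 'int' object is not iterable error comes from, it may help to look at what enumerate actually hands to the loop. A minimal sketch, with the two lines from the question hard-coded instead of read from the file:

```python
lines = ["7 10", "4 8"]  # the two lines from the question, already stripped

# enumerate() pairs each item with its position, so the first name in
# the loop target is an integer index, not a line of text.
pairs = list(enumerate(lines))

for index, line in pairs:
    print(index, line.split())
```

In the question's loop, p plays the role of index here, which is why enumerate(p) cannot iterate over it.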

Can a high CPU load (from other applications) affect python performance/accuracy?

I'm working on a code to read and display the results of a Finite Element Analysis (FEA) calculation. The results are stored in several (relatively big) text files that contain a list of nodes (ID number, location in space) and lists for the physical fields of relevance (ID of node, value of the field on that point).
However, I have noticed that when an FEA case is running in the background and I run my code at the same time, it returns errors. They are not always the same error and do not always occur at the same iteration. They appear seemingly at random, without any modification to the code or to the input files, just by hitting the RUN button seconds apart.
Examples of the errors that I'm getting are:
keys[key] = np.round(np.asarray(keys[key]),7)
TypeError: can't multiply sequence by non-int of type 'float'
#-------------------------------------------------------------------------
triang = tri.Triangulation(x, y)
ValueError: x and y arrays must have a length of at least 3
#-------------------------------------------------------------------------
line = [float(n) for n in line]
ValueError: could not convert string to float: '0.1225471E'
In case you are curious, this is my code (keep in mind that it is not finished yet and that I'm a mechanical engineer, not a programmer). Any feedback on how to make it better is also appreciated:
import matplotlib.pyplot as plt
import matplotlib.tri as tri
import numpy as np
import os
triangle_max_radius = 0.003
respath = 'C:/path'
fields = ['TEMPERATURE']
# Plot figure definition --------------------------------------------------------------------------------------
fig, ax1 = plt.subplots()
fig.subplots_adjust(left=0, right=1, bottom=0.04, top=0.99)
ax1.set_aspect('equal')
# -------------------------------------------------------------------------------------------------------------
# Read outputfiles --------------------------------------------------------------------------------------------
resfiles = [f for f in os.listdir(respath) if (os.path.isfile(os.path.join(respath,f)) and f[:3]=='csv')]
resfiles = [[f,int(f[4:])] for f in resfiles]
resfiles = sorted(resfiles,key=lambda x: (x[1]))
resfiles = [os.path.join(respath,f[:][0]).replace("\\","/") for f in resfiles]
# -------------------------------------------------------------------------------------------------------------
# Read data inside outputfile ---------------------------------------------------------------------------------
for result_file in resfiles:
    keys = {}
    keywords = []
    with open(result_file, 'r') as res:
        for line in res:
            if line[0:2] == '##':
                if len(line) >= 5:
                    line = line[:3] + line[7:]
            line = line.replace(';',' ')
            line = line.split()
            if line:
                if line[0] == '##':
                    if len(line) >= 3:
                        keywords.append(line[1])
                        keys[line[1]] = []
                elif line[0] in keywords:
                    curr_key = line[0]
                else:
                    line = [float(n) for n in line]
                    keys[curr_key].append(line)
    for key in keys:
        keys[key] = np.round(np.asarray(keys[key]),7)
    for item in fields:
        gob_temp = np.empty((0,4))
        for node in keys[item]:
            temp_coords, = np.where(node[0] == keys['COORDINATES'][:,0])
            gob_temp_coords = [node[0], keys['COORDINATES'][temp_coords,1], keys['COORDINATES'][temp_coords,2], node[1]]
            gob_temp = np.append(gob_temp,[gob_temp_coords],axis=0)
        x = gob_temp[:,1]
        y = gob_temp[:,2]
        z = gob_temp[:,3]
        triang = tri.Triangulation(x, y)
        triangles = triang.triangles
        xtri = x[triangles] - np.roll(x[triangles], 1, axis=1)
        ytri = y[triangles] - np.roll(y[triangles], 1, axis=1)
        maxi = np.max(np.sqrt(xtri**2 + ytri**2), axis=1)
        triang.set_mask(maxi > triangle_max_radius)
        ax1.tricontourf(triang,z,100,cmap='plasma')
        ax1.triplot(triang,color="black",lw=0.2)
plt.show()
So back to the question: is it possible for the accuracy/performance of python to be affected by CPU load or any other 'external' factors? Or is that not possible, and is there definitely something wrong with my code (which works well in other circumstances, by the way)?
No, other processes only affect how often your process gets time slots to execute -- i.e., from a user's perspective, how quickly it completes its job.
If you're having errors under load, this means there are errors in your program's logic -- most probably, race conditions. They basically boil down to making assumptions about your environment that are no longer true when there's other activity in it. E.g.:
Your program is multithreaded, and the logic makes assumptions about which order threads are executed in. (This includes assumptions about how long some task would take to complete.)
Your program is using shared resources (files, streams etc) that other processes are also using at the same time. (E.g. some other program is in the process of (over)writing a file while you're trying to read it. Or, if you're reading from a stream, not all data are available yet.)
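The truncated value in the last error message ('0.1225471E') would be consistent with the second case: a result file being read while the solver is still writing it. If that is what happens, one possible mitigation is to wait until a file has stopped changing before parsing it. The helper below is a hypothetical sketch, not part of any library, and its name and default values are arbitrary:

```python
import os
import time


def wait_until_stable(path, interval=0.5, timeout=30.0):
    """Return True once the file's size and mtime stop changing.

    Hypothetical helper: it only lowers the chance of reading a file
    that another process is still writing; it is not a lock.
    """
    deadline = time.monotonic() + timeout
    previous = None
    while time.monotonic() < deadline:
        stat = os.stat(path)
        current = (stat.st_size, stat.st_mtime)
        if current == previous:
            return True  # unchanged between two consecutive checks
        previous = current
        time.sleep(interval)
    return False
```

In the question's loop this could be called as wait_until_stable(result_file) before open(result_file, 'r'), skipping the file when it returns False. A solver that pauses for longer than interval between writes would still fool it, so a marker file or lock written by the solver, if available, would be more reliable.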

Using split in python on excel to get two parameter

I'm getting really confused with all the information on here using 'split' in python. Basically I want to write a code which opens a spreadsheet (with two columns in it) and the function I write will use the first column as x's and the second column as y's and then it will plot it in the x-y plane.
I thought I would use line.splitlines to cut each line in excel into (x,y) but I keep getting
'ValueError: need more than 1 value to unpack'
I don't know what this means?
Below is what I've written so far, (xdir is an initial condition for a different part of my question):
def plotMo(filename, xdir):
    infile = open(filename)
    data = []
    for line in infile:
        x,y = line.splitlines()
        x = float(x)
        y = float(y)
        data.append([x,y])
    infile.close()
    return data
plt.plot(x,y)
For example with
0 0.049976
0.01 0.049902
0.02 0.04978
0.03 0.049609
0.04 0.04939
0.05 0.049123
0.06 0.048807
I would want the first point in my plane to be (0, 0.049976) and the second point to be (0.01, 0.049902).
x,y = line.splitlines() tries to split the current line into several lines.
Since splitlines returns only 1 element, there's an error because python cannot find a value to assign to y.
What you want is x,y = line.split() which will split the line according to 1 or more spaces (like awk would do) if no parameter is specified.
However, it depends on the format: if there are blank lines you'll get the "unpack" problem at some point, so to be safe and skip malformed lines, write:
items = line.split()
if len(items)==2: x,y = items
To sum it up, a more pythonic, shorter & safer way of writing your routine would be:
def plotMo(filename):
    with open(filename) as infile:
        data = []
        for line in infile:
            items = line.split()
            if len(items)==2:
                data.append([float(e) for e in items])
    return data
(maybe it could be condensed more, but that's good for starters)
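Since the goal is to plot, it may be more convenient to collect the two columns separately rather than as [x, y] pairs. A small sketch along the same lines; parse_xy is a made-up name, and the sample rows are the first few from the question plus a blank line:

```python
def parse_xy(lines):
    """Parse whitespace-separated "x y" lines into two lists of floats.

    Lines that do not hold exactly two fields (blank lines, for
    instance) are skipped.
    """
    xs, ys = [], []
    for line in lines:
        items = line.split()
        if len(items) == 2:
            xs.append(float(items[0]))
            ys.append(float(items[1]))
    return xs, ys


# First rows of the sample data from the question, plus a blank line.
sample = ["0 0.049976\n", "0.01 0.049902\n", "\n", "0.02 0.04978\n"]
x, y = parse_xy(sample)
```

An open file can be passed in place of the list, after which plt.plot(x, y) should draw the points. A two-word header line would still raise ValueError in float(), so this assumes the file holds only numbers.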

Extract multiple arrays from .DAT file with undefined size

I have a device that stores three data sets in a .DAT file. They always have the same headings and number of columns, but the number of rows varies.
They are (n x 4), (m x 4), (L x 3).
I need to extract the three data sets into separate arrays for plotting.
I have been trying to use numpy.genfromtxt and numpy.loadtxt, but the only way I can get them to work for this format is to manually define the row which each data set starts.
As I will regularly need to deal with this format I have been trying to automate it.
If someone could suggest a method which might work I would greatly appreciate it. I have attached an example file.
example file
Just a quick and dirty solution. At your file size, you might run into performance issues. If you know m, n and L, initialize the output vectors with the respective length.
Here is the strategy: load the whole file into a variable and read it line by line. As soon as you discover a keyword, raise a flag that you are in the specific block. For the lines that follow, read the values into the correct variables.
isblock1 = isblock2 = isblock3 = False
fout = [] # construct also all the other variables that you want to collect.
with open(filename, 'r') as infile: # filename: path to your .DAT file
    lines = infile.readlines() # read all the lines
for line in lines:
    if not line.strip():
        continue # skip blank lines
    # check for a keyword first, so a header line is never parsed as data
    if 'Frequency' in line:
        isblock1 = True
        isblock2 = isblock3 = False
        continue
    if 'Phasor' in line:
        isblock2 = True
        isblock1 = isblock3 = False
        continue
    if 'Voltage' in line:
        isblock3 = True
        isblock1 = isblock2 = False
        continue
    if isblock1:
        (f, psd, ipj, itj) = line.split()
        fout.append(f) # do this also with the other variables
    elif isblock2:
        (t1, p1, p2, p12) = line.split()
    elif isblock3:
        (t2, v1, v2) = line.split()
Hope that helps.
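A slightly more general variant of the same flag idea keeps one list of rows per keyword in a dictionary. This is only a sketch: the keywords are the ones used above, the function name is made up, and the sample lines are invented because the attached example file is not reproduced here.

```python
def split_blocks(lines, keywords):
    """Group numeric rows under the most recent header keyword seen.

    Returns a dict mapping each keyword to a list of rows (lists of
    floats). Rows before any header, blank lines, and lines that are
    not purely numeric are skipped.
    """
    blocks = {keyword: [] for keyword in keywords}
    current = None
    for line in lines:
        header = next((k for k in keywords if k in line), None)
        if header is not None:
            current = header
            continue
        if current is None:
            continue
        try:
            row = [float(item) for item in line.split()]
        except ValueError:
            continue  # e.g. a units line under the header
        if row:
            blocks[current].append(row)
    return blocks


# Invented sample in the spirit of the question; the real layout may differ.
sample = [
    "Frequency PSD Ipj Itj",
    "1.0 0.5 0.1 0.2",
    "2.0 0.6 0.1 0.3",
    "",
    "Phasor data",
    "0.0 1.0 2.0 3.0",
    "Voltage data",
    "0.0 5.0 5.1",
]
blocks = split_blocks(sample, ["Frequency", "Phasor", "Voltage"])
```

Each blocks[keyword] can then be passed to numpy.array to get the (n x 4), (m x 4) and (L x 3) arrays. If one header line happens to contain more than one keyword, the first keyword in the list wins, so the keyword list may need adjusting for the real file.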

Read file elements into 3 different arrays

I have a file that is space delimited with values for x, y, z. I need to visualise the data, so I guess I need to read the file into 3 separate arrays (X, Y, Z) and then plot them. How do I read the file into 3 separate arrays? I have this so far, which removes the whitespace element at the end of every line.
def fread(f=None):
    """Reads in test and training CSVs."""
    X = []
    Y = []
    Z = []
    if (f==None):
        print("No file given to read, exiting...")
        sys.exit(1)
    read = csv.reader(open(f,'r'),delimiter = ' ')
    for line in read:
        line = line[:-1]
I tried to add something like:
for x,y,z in line:
    X.append(x)
    Y.append(y)
    Z.append(z)
But I get an error like "ValueError: too many values to unpack"
I have done lots of googling but nothing seems to address having to read in a file into a separate array every element.
I should add my data isn't sorted nicely into rows/columns it just looks like this
"107745590026 2 0.02934046648 0.01023879368 3.331810236 2 0.02727724425 0.07867902517 3.319272757 2 0.01784882881"......
Thanks!
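EDIT: If your data isn't actually separated into 3-element lines (and is instead one long space-separated list of values), you could use python list slicing with stride to make this easier. A csv reader object cannot be sliced directly, so flatten it into a list first:
values = [v for line in read for v in line if v]
X = values[::3]
Y = values[1::3]
Z = values[2::3]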
The error happens because for x,y,z in line iterates over the individual elements of line, and each element is a string that python then tries to unpack into three names; any value longer than three characters gives "too many values to unpack". To unpack the line itself, assign it directly instead of looping over it. That still fails if a line contains more than three space-separated values, and it's unclear from your question exactly what you'd want to do in these cases. If you're using python 3, you could put the first element of a line into X, the second into Y, and all the rest of that line into Z with the following:
x, y, *z = line
X.append(x)
Y.append(y)
for elem in z:
    Z.append(elem)
If you're not using python 3, you can perform the same basic logic in a slightly more verbose way:
for i, elem in enumerate(line):
    if i == 0:
        X.append(elem)
    elif i == 1:
        Y.append(elem)
    else:
        Z.append(elem)
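For the stride slicing mentioned in the EDIT, here is a small self-contained illustration. It assumes the values really do cycle x, y, z, x, y, z, which the sample in the question does not make clear, and it uses made-up numbers:

```python
# Made-up values standing in for one long space-separated line of data.
text = "1 2 3 4 5 6 7 8 9"

values = [float(v) for v in text.split()]  # split() also drops stray whitespace

# Every third value, starting at offsets 0, 1 and 2.
X = values[0::3]
Y = values[1::3]
Z = values[2::3]
```

If the number of values is not a multiple of three, the three lists end up with different lengths, which is worth checking before plotting.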
