Linear Interpolation in Python [closed]

I'm pretty new to Python and I'm trying to write a program that reads data from a .txt file, asks the user for input, and does a 4-point linear interpolation.
The .txt file has temperature and pressures in a table with this format:
T P1 P2 P3 P4
80,100,150,200
75, 400,405,415,430
100, 450,456,467,483
150, 500,507,519,536
200, 550,558,571,589
And here's the code:
# User input
temp = input("Enter temperature value in degrees Celsius [Range: 75-200]: ")
pressure = input("Enter pressure value in bars [Range: 80-589]: ")
temp = float(temp)
pressure = float(pressure)
# Opens the file and reads its lines
filename = open('xxxxxxxxxxxxx.txt', 'r').readlines()
# Removes \n from each line
for i in list(range((len(filename)-1))):
    filename[i] = filename[i][:-1]
# Splits each string on commas
for i in list(range(len(filename))):
    filename[i] = filename[i].split(',')
# Converts string numbers into decimal numbers
for i in [2, 3, 4, 5, 6]:
    filename[i][0] = float(filename[i][0])
    filename[i][1] = float(filename[i][1])
I'm not sure where to go from here. If the user input was, say, T=100 and P=200, how would I locate the data points from the file that are directly before and after those numbers?
Obviously, I don't know much about what I'm doing, but I would appreciate any help.
ETA: Actual table values.
Also, I was not clear about the actual problem statement: given a temperature and a pressure, the program should perform a linear interpolation to find U (internal energy). The T values are the first column, the P values the first row, and the rest are U values.
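For reference, a "4-point" (bilinear) interpolation is just linear interpolation applied twice: once along P at each of the two bracketing temperatures, then once along T between those two results. A minimal sketch using the bracketing values from the table above for T=87.5, P=90 (the lerp helper is hypothetical, not part of the original program):

def lerp(x, x0, x1, y0, y1):
    # Linearly interpolate y at x, given the points (x0, y0) and (x1, y1).
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

u_t75 = lerp(90, 80, 100, 400, 405)     # along P at T=75  -> 402.5
u_t100 = lerp(90, 80, 100, 450, 456)    # along P at T=100 -> 453.0
u = lerp(87.5, 75, 100, u_t75, u_t100)  # along T          -> 427.75
print(u)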

There are two separate questions here: how to read data into Python / NumPy, and how to do 2-D interpolation. For reading data, I'd suggest numpy loadtxt, and for interpolation, scipy RectBivariateSpline. (They both have more options than you need.)
from io import StringIO

import numpy as np
from scipy.interpolate import RectBivariateSpline

np.set_printoptions(precision=1, threshold=100, edgeitems=10, suppress=True)

# a file inline, for testing --
myfile = StringIO("""\
# T P1 P2 P3 P4
0, 80,100,150,200
75, 400,405,415,430
100, 450,456,467,483
150, 500,507,519,536
200, 550,558,571,589
""")

# file -> numpy array --
# (all rows must have the same number of columns)
TPU = np.loadtxt(myfile, delimiter=",")
P = TPU[0, 1:]   # top row
T = TPU[1:, 0]   # left column
U = TPU[1:, 1:]  # 4 x 4, 400 .. 589
print("T:", T)
print("P:", P)
print("U:", U)

interpolator = RectBivariateSpline(T, P, U, kx=1, ky=1)  # kx = ky = 1: bilinear; 3: cubic spline

# try some t, p --
for t, p in (
        (75, 80),
        (75, 200),
        (87.5, 90),
        (200, 80),
        (200, 90),
        ):
    u = interpolator(t, p)[0, 0]  # the interpolator returns a 1 x 1 array
    print("t %5.1f p %5.1f -> u %5.1f" % (t, p, u))
By the way, for interactive Python, IPython makes it easy to try single lines, look at variables, and so on.

Assuming you have a sorted list of numbers, x1, x2, x3... xn, you could use the bisect module for fast location of the interval you want (O(log n)).
from bisect import bisect_left

#    0  1  2  3  4   5    6    7
x = [1, 2, 4, 8, 16, 100, 200, 300]

def find_interval(x, y):
    # x must be a sorted list.
    index = bisect_left(x, y)
    if index >= len(x):
        # Larger than the largest element in x (-1 indexes the last element)
        l, r = -1, None
    elif y == x[index]:
        # Exactly equal to something in x
        l, r = index, index
    elif index == 0:
        # Smaller than the smallest element in x
        l, r = None, 0
    else:
        # In between two elements of x
        l, r = index - 1, index
    print(x[l] if l is not None else "To left of all elements")
    print(x[r] if r is not None else "To right of all elements")
    return (l, r)
>>> x
[1, 2, 4, 8, 16, 100, 200, 300]
>>> find_interval(x,0)
To left of all elements
1
>>> find_interval(x,1000)
300
To right of all elements
>>> find_interval(x,100)
100
100
>>> find_interval(x,12)
8
16
>>>
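Once you have the bracketing indices, the interpolation itself is one line of arithmetic. A sketch combining find_interval with 1-D linear interpolation (T and U here are stand-ins for the temperature column and one pressure column of the question's table):

T = [75, 100, 150, 200]
U = [400, 450, 500, 550]   # U values at the first pressure column

def interp1d(xs, ys, x):
    l, r = find_interval(xs, x)        # also prints; harmless for a sketch
    if l is None or r is None:
        raise ValueError("x is outside the table")
    if l == r:                         # exact hit, no interpolation needed
        return ys[l]
    t = (x - xs[l]) / (xs[r] - xs[l])  # fractional position inside the interval
    return ys[l] + t * (ys[r] - ys[l])

print(interp1d(T, U, 87.5))   # 425.0

Doing this twice, once along each axis, gives the same bilinear result as RectBivariateSpline above.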

Using .readlines() will shoot you in the foot as soon as the file gets big. Can you formulate what you need to do in terms of

for line in open(...):
    # parse line

and parse the file just once, without loading it fully into memory? Better still, use the with idiom when working with files:

with open(...) as file:
    for line in file:
        # parse line

This saves you a bit of headache when something goes wrong while working with the file.
You don't need to strip newlines if you will end up using float() to make a float out of a string: float('1.2 \t\n') is perfectly valid.
for i in list(range(len(filename))):

This is bad style. The Python idiom for iterating through a list is

for element in list:

If you need an index into the list, then you should use

for i, element in enumerate(list):

Your approach is sort of "manual" and it works, but creating a list out of a list (which is what wrapping range(...) in list() does in Python 2.x) is completely unnecessary. A better "manual" alternative to your code would be

for i in xrange(len(filename)):

but it is still much less readable than the idioms above.
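Putting those idioms together, the whole parsing step collapses to a few lines. A sketch, assuming the comma-separated layout from the question with any header line removed or starting with '#' ('data.txt' is a stand-in for the real file name):

rows = []
with open('data.txt') as file:
    for line in file:
        line = line.strip()
        if not line or line.startswith('#'):   # skip blanks and header lines
            continue
        rows.append([float(v) for v in line.split(',')])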
Now that I'm done bashing on your code, the main question is: what [the hell] do you actually need done? Can you give us the exact, word-for-word specification of the problem you are trying to solve?
Have you looked at http://en.wikipedia.org/wiki/Linear_interpolation?
What is the significance of the input data from the terminal in your case?
Why, and for what, do you need the data from the file that is directly before and after the user's input values?
Is the temperature/pressure data somehow sorted?
What do the lines in the file represent (e.g. are they time-based or location-based or something else)?
What do the four different pressures represent?

Related

How to append numpy x and y values into a text file

I have some code which creates an x variable (frames) and a y variable (pixel intensity) in an infinite loop until the program ends. I would like to append these values to a .txt file on every loop so that I can later work with the data. The data comes out as numpy arrays.
Say, for example, after 5 loops (5 frames) I get these values:
1 2 3 4 5 (x values)
0 0 8 0 0 (y values)
I would like it to append these into a file every loop so I get after closing the program this:
1, 0
2, 0
3, 8
4, 0
5, 0
What would be the fastest way to implement this?
So far I have tried np.savetxt('data.txt', x), but this only saves the last value in the loop and doesn't add the data each loop. Is there a way to change this, or another function I could use that appends the data to the text file?
First I will zip the values into (x, y) coordinate form and put them into a list, so it is easier to append them to a text file. In your program you won't need to do this, since you will already have generated x and y within the loop:

x = [1, 2, 3, 4, 5]  # x values
y = [0, 0, 8, 0, 0]  # y values
coordinate = list(zip(x, y))
print(coordinate)

So I used the zip function to store the sample results as (x_n, y_n) pairs in a list for later. Within the loop itself you can use:

for element in coordinate:  # you wouldn't need to write this loop, since you are already in one
    file1 = open("file.txt", "a")
    file1.write(f"{element} \n")
    file1.close()
Output (contents of file.txt after the loop):

(1, 0)
(2, 0)
(3, 8)
(4, 0)
(5, 0)
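A small variant (my sketch, not part of the original answer): opening the file once outside the loop avoids reopening it on every iteration, and writing the pair's fields yourself produces exactly the "1, 0" format asked for:

with open("file.txt", "a") as file1:   # open once, in append mode
    for element in coordinate:
        file1.write(f"{element[0]}, {element[1]}\n")   # writes lines like "1, 0"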
You can do something like this -- it is not complete, because it will just append to the old file. The other issue is that it will not actually write the file until you close it. If you really need the file saved at each step of the loop, then another solution is required.
import numpy as np

variable_to_ID_file = 3.
file_name_str = 'Foo_Var{0:.0f}.txt'.format(variable_to_ID_file)

# Need code here to delete the old file if it is there,
# with proper error checking, or write a blank file, then open it in append mode.
f_id = open(file_name_str, 'a')
for ii in range(4):
    # Pick the delimiter you desire. I prefer tab '\t'.
    np.savetxt(f_id, np.column_stack((ii, 3*ii/4)), delimiter=', ', newline='\n')
f_id.close()
If you do not need to write the file at each step in the loop, I recommend this option. It requires the NumPy arrays to be the same size.

import numpy as np

array1 = np.arange(1, 5, 1)
array2 = np.zeros(array1.size)

variable_to_ID_file = 3.
file_name_str = 'Foo_Var{0:.0f}.txt'.format(variable_to_ID_file)

# Pick the delimiter you desire. I prefer tab '\t'.
np.savetxt(file_name_str, np.column_stack((array1, array2)), delimiter=', ')
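Note that np.savetxt writes floats in scientific notation ('%.18e') by default, so the file will contain values like 1.000000000000000000e+00. Pass a format string if you want compact output:

np.savetxt(file_name_str, np.column_stack((array1, array2)), delimiter=', ', fmt='%g')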

How to efficiently find existing key-values of a 2-dimensional dictionary in Python which are between 4 values?

I have a little problem in Python. I have a 2-dimensional dictionary; let's call it d[x][y], where x and y are integers. I try to select only the key-value pairs whose keys lie between 4 bounds. The function should look like this:
def search(topleft_x, topleft_y, bottomright_x, bottomright_y):
For example, search(20, 40, 200000000, 300000000) should return the dictionary items that match:
20 < x < 20000000000
AND 40 < y < 30000000000
Most of the key-value pairs in this huge matrix are not set (see the picture below; this is why I can't just iterate). The function should return a smaller dictionary; in the example shown in the picture, it would be a new dictionary with the 3 green-circled values. Is there any simple solution for this?
I currently use 2 for loops. In this example they look like this:

def search():
    for x in range(20, 2000000000):
        for y in range(40, 3000000000):
            try:
                pass  # do something with d[x][y]
            except KeyError:
                pass  # the item just doesn't exist

Of course this is highly inefficient. So my question is: how do I speed up this simple thing in Python? In C# I used LINQ for stuff like this... what should I use in Python?
Thanks for the help!
(Example picture omitted; it shows a mostly empty grid with the 3 matching values circled in green.)
You don't go over random number ranges and ask 4 million times for forgiveness; you use two range objects to specify your "filters" and go only over the keys that actually exist in the dictionary, checking whether they fall into those ranges:

# Get fancy on filtering if you like; I used explicit conditions and continues for clarity.
def search(d: dict, r1: range, r2: range) -> dict:
    d2 = {}
    for x in d:              # only use existing keys in d - not the 20k that might be in range
        if x not in r1:      # skip it - not in range r1
            continue
        d2[x] = {}
        for y in d[x]:       # only use existing keys in d[x]
            if y not in r2:  # skip it - not in range r2
                continue
            d2[x][y] = "found: " + d[x][y][:]  # take it, it's in both ranges
    return d2

d = {}
d[20] = {99: "20", 999: "200", 9999: "2000", 99999: "20000"}
d[9999] = {70: "70", 700: "700", 7000: "7000", 70000: "70000"}

print(search(d, range(10, 30), range(40, 9000)))
Output:
{20: {99: 'found: 20', 999: 'found: 200'}}
It might also be useful to take a look at modules providing sparse matrices, such as scipy.sparse.
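If the outer keys can additionally be kept in a sorted list, the standard-library bisect module narrows the x-range in O(log n) instead of scanning every key. A sketch under that assumption (search_sorted and xs are illustrative names, not from the answer above):

from bisect import bisect_left, bisect_right

def search_sorted(d, xs, x_lo, x_hi, y_lo, y_hi):
    # xs is a sorted list of the outer keys of d, maintained on insert.
    result = {}
    for x in xs[bisect_left(xs, x_lo):bisect_right(xs, x_hi)]:  # only keys in [x_lo, x_hi]
        row = {y: v for y, v in d[x].items() if y_lo <= y <= y_hi}
        if row:
            result[x] = row
    return result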

Python: slow for loop performance on reading, extracting and writing from a list of thousands of files

I am extracting 150 different cell values from 350,000 (20 kB) ASCII raster files. My current code is fine for processing the 150 cell values from hundreds of the ASCII files, but it is very slow when running on the full data set.
I am still learning Python, so are there any obvious inefficiencies, or suggestions to improve the code below?
I have tried closing the 'dat' file in the 2nd function (dat = None); no improvement.
First: I have a function which returns the row and column locations from a cartesian grid.
def world2Pixel(gt, x, y):
    ulX = gt[0]
    ulY = gt[3]
    xDist = gt[1]
    yDist = gt[5]
    rtnX = gt[2]
    rtnY = gt[4]
    pixel = int((x - ulX) / xDist)
    line = int((ulY - y) / xDist)
    return (pixel, line)
Second: a function to which I pass lists of 150 'id', 'x' and 'y' values in a for loop. The first function is called within it and used to extract the cell value, which is appended to a new list. I also have a list of files 'asc_list' and corresponding times in 'date_list'. Please ignore count / enumerate as I use them later, unless they are impeding efficiency.
def asc2series(id, x, y):
    #count = 1
    ls_id = []
    ls_p = []
    ls_d = []
    for n, (asc, date) in enumerate(zip(asc, date_list)):
        dat = gdal.Open(asc_list)
        gt = dat.GetGeoTransform()
        pixel, line = world2Pixel(gt, east, nort)
        band = dat.GetRasterBand(1)
        #dat = None
        value = band.ReadAsArray(pixel, line, 1, 1)[0, 0]
        ls_id.append(id)
        ls_p.append(value)
        ls_d.append(date)
Many thanks
In world2Pixel you are setting rtnX and rtnY, which you don't use.
You probably meant gdal.Open(asc) -- not asc_list.
You could move gt = dat.GetGeoTransform() out of the loop. (Rereading made me realize you can't really.)
You could cache calls to world2Pixel.
You're opening the dat file for each pixel -- you should probably turn the logic around, open each file only once, and look up all the pixels mapped to that file (a sketch follows below).
Benchmark, check the links in this podcast to see how: http://talkpython.fm/episodes/show/28/making-python-fast-profiling-python-code
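A sketch of that turned-around logic, under the question's assumptions (asc_list, date_list and world2Pixel as defined above; ids, easts and norts standing in for the lists of 150 point ids and coordinates):

from osgeo import gdal

def asc2series(ids, easts, norts):
    rows = []
    for asc, date in zip(asc_list, date_list):
        dat = gdal.Open(asc)                       # open each file exactly once
        gt = dat.GetGeoTransform()
        data = dat.GetRasterBand(1).ReadAsArray()  # one read for the whole 20 kB raster
        for id_, east, nort in zip(ids, easts, norts):
            pixel, line = world2Pixel(gt, east, nort)
            rows.append((id_, data[line, pixel], date))
        dat = None                                 # let GDAL close the file
    return rows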

Free up numbers from string

I have a very annoying output format from a program for my x,y,r values, namely:
circle(201.5508,387.68505,2.298685) # text={1}
circle(226.21442,367.48613,1.457215) # text={2}
circle(269.8067,347.73605,1.303065) # text={3}
circle(343.29599,287.43024,6.5938) # text={4}
Is there a way to get the 3 numbers out into an array without doing manual labor?
So I want the above input to become
201.5508,387.68505,2.298685
226.21442,367.48613,1.457215
269.8067,347.73605,1.303065
343.29599,287.43024,6.5938
If you mean that the circle(...) construct is the output you want to parse, try something like this:

import re

a = """circle(201.5508,387.68505,2.298685) # text={1}
circle(226.21442,367.48613,1.457215) # text={2}
circle(269.8067,347.73605,1.303065) # text={3}
circle(343.29599,287.43024,6.5938) # text={4}"""

for line in a.split("\n"):
    # \d+\.\d+ matches decimal numbers only, so the integer in text={N} is not picked up
    print([float(x) for x in re.findall(r"\d+\.\d+", line)])
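A regex-free alternative, under the same assumption about the input format: take the text between the parentheses and split it on commas.

for line in a.split("\n"):
    inner = line[line.index("(") + 1 : line.index(")")]
    print([float(x) for x in inner.split(",")])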
Otherwise, you might mean that you want to call circle with numbers taken from an array containing 3 numbers, which you can do as:
arr = [343.29599,287.43024,6.5938]
circle(*arr)
A bit unorthodox, but as the format of your file is valid Python code and there are probably no security risks regarding untrusted code, why not simply define a circle function which appends each circle to a list, and then execute the file:

circles = []

def circle(x, y, r):
    circles.append((x, y, r))

exec(open('circles.txt').read())   # Python 3 replacement for execfile('circles.txt')

circles is now a list containing triplets of x, y and r:

[(201.5508, 387.68505, 2.298685),
 (226.21442, 367.48613, 1.457215),
 (269.8067, 347.73605, 1.303065),
 (343.29599, 287.43024, 6.5938)]

Can vtk InsertValue take float arguments?

I have a question about InsertValue
If I understand correctly, it only takes integer arguments. I was wondering if there is a way to have it take float values? Or maybe some other function that does the job of InsertValue but takes float values? I know there is InsertNextValue, but I am not sure it will be efficient in my case, since my array is very big (~100,000 by 120).
Below is my code. For now I am casting the fl values to integers to make it work, but ideally it would be great if I didn't have to do that.
Thanks in advance :)
import vtk
import math
from vtk import vtkStructuredGrid, vtkPoints, vtkFloatArray, vtkXMLStructuredGridWriter
import scipy.io
import numpy
import os

# loading the matlab files
mats = scipy.io.loadmat('/home/lusine/data/3DDA/donut_for_vtk/20130228_050000_3D_E=1.mat')
# x,y,z coordinates, fl flux values
xx = mats['xvect']
yy = mats['yvect']
zz = mats['zvect']
fl = mats['fluxmesh3d']  # 3d matrix
nx = xx.shape[1]
ny = yy.shape[1]
nz = zz.shape[1]
fl = numpy.nan_to_num(fl)
inx = numpy.nonzero(fl)
l = len(inx[1])

grid = vtk.vtkStructuredGrid()
grid.SetDimensions(nx, ny, nz)  # sets the dimensions of the grid

pts = vtk.vtkPoints()  # represents 3D points; the data model is an array of vx-vy-vz triplets accessible by (point or cell) id
pts.SetNumberOfPoints(nx*ny*nz)  # specify the number of points for this object to hold

p = 0
for i in range(l):
    pts.InsertPoint(p, xx[0][inx[0][i]], yy[0][inx[1][i]], zz[0][inx[2][i]])
    p = p + 1

grid.SetPoints(pts)

cdata = vtk.vtkFloatArray()
cdata.SetNumberOfComponents(1)
cdata.SetNumberOfTuples((nx-1)*(ny-1)*(nz-1))
cdata.SetName('cellData')
p = 0
for i in range(l-1):
    cdata.InsertValue(p, inx[0][i]+inx[1][i]+inx[2][i])
    p = p + 1
grid.GetCellData().SetScalars(cdata)

pdata = vtk.vtkFloatArray()
pdata.SetNumberOfComponents(1)
# Set the number of tuples (a component group) in the array
pdata.SetNumberOfTuples(nx*ny*nz)
# Set the array name
pdata.SetName('pointData')
for i in range(l):
    pdata.InsertValue(int(fl[inx[0][i]][inx[1][i]][inx[2][i]]), inx[0][i]+inx[1][i]+inx[2][i])
grid.GetPointData().SetScalars(pdata)

writer = vtk.vtkXMLStructuredGridWriter()
writer.SetFileName('new_grid.vts')
#writer.SetInput(grid)
writer.SetInputData(grid)
writer.Update()
print('end')
The first argument of InsertValue must be an integer because it's the index where the value is going to be inserted. If instead of a vtkFloatArray pdata you had a numpy array called p, this would be the equivalent of your instruction:
pdata.InsertValue(a, b) becomes p[a] = b
p[0.1] wouldn't make sense: a must be an integer!
But I am a bit lost on the data. What do you mean that your array is ~100,000 by 120? Do you have 100,000 points, and each point has a vector of 120 components? In that case, your pdata should have 120 components, and for each point point_index you call
pdata.SetTuple(point_index, [v0, v1, ..., v119])
or
pdata.SetComponent(point_index, 0, v0)
...
pdata.SetComponent(point_index, 119, v119)
If not, are you sure that you have to index pdata by fl values? You have to be sure that fl is an int, that 0 <= fl < ntuples, and that you are not going to have holes. Check whether you can do the same thing you do for cdata. (By the way, in your code p is always equal to i; you can just use i.)
It's also possible to copy a numpy array directly to VTK (see http://vtk.1045678.n5.nabble.com/vtk-to-numpy-how-to-get-a-vtk-array-tp1244891p1244895.html), but you have to be very careful with the structure of your data; a sketch follows below.
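A minimal sketch of that numpy route, assuming the whole fl array (as floats) should become the point scalars and that its size matches the grid's number of points:

from vtk.util import numpy_support

pdata = numpy_support.numpy_to_vtk(fl.ravel(), deep=True)   # deep=True copies the data into VTK
pdata.SetName('pointData')
grid.GetPointData().SetScalars(pdata)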
