Mathematical equations with imported Excel coordinates - Python

I have 10 or more x and y coordinates, and also an equation for them, and I can't figure out how to make those calculations, especially the part with the equations. First of all, let's say there are 5 coordinates, and I need to apply this equation (the main one; two similar equations serve as a control check):

2P = Σ xᵢ(yᵢ₊₁ − yᵢ₋₁)

How could I make it read those coordinates and calculate according to the equation? I tried:
import openpyxl

book = openpyxl.load_workbook('coordinates.xlsx')
sheet = book.active
for row_i in range(1, sheet.max_row + 1):
    x_value = sheet.cell(row=row_i, column=1).value
    y_value = sheet.cell(row=row_i, column=2).value
    print(x_value, y_value)
I am stuck at making the calculations and managing the whole process after reading in the values. Moreover, it needs to accept however many coordinates there are and compute the area of the plot.

The top equation seems to suggest that you would do:
# Read all the coordinates first so the indices can wrap around at the ends
# (openpyxl rows start at 1, and the polygon is cyclic).
xs, ys = [], []
for row_i in range(1, sheet.max_row + 1):
    xs.append(float(sheet.cell(row=row_i, column=1).value))
    ys.append(float(sheet.cell(row=row_i, column=2).value))

n = len(xs)
two_p = 0.0
for i in range(n):
    two_p += xs[i] * (ys[(i + 1) % n] - ys[(i - 1) % n])
print(two_p)
Is this what you had in mind? You would do something similar for the control equations.
EDIT: Added float conversion in the code above in case the data coming from Excel is stored as text.
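For the control check, a common companion to this area formula swaps the roles of x and y. A minimal sketch, assuming the control equation is the standard 2P = Σ yᵢ(xᵢ₋₁ − xᵢ₊₁) (the question's original control equations are not shown here), reusing xs, ys, and n from above:

# Assumed control formula: 2P = sum of y_i * (x_{i-1} - x_{i+1}), indices cyclic.
two_p_control = 0.0
for i in range(n):
    two_p_control += ys[i] * (xs[(i - 1) % n] - xs[(i + 1) % n])
print(two_p_control)  # should agree with two_p from the main formula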

Related

Can I vectorize two for loops that look for the closest zip code based on lat/lon?

Question moved to Code Review: https://codereview.stackexchange.com/questions/257465/can-i-optimize-two-for-loops-that-look-for-the-closest-zip-code-based-on-lat-lon
I am new to Python, and I had the task of finding the US zip code based on latitude and longitude. After messing with arcgis I realized that it was giving me empty values for certain locations. I ended up coding something that accomplishes my task by taking a dataset containing all US zip codes and using Euclidean distance to determine the closest one based on lat/lon. However, this takes approximately 1.3 seconds per record on average, which for my nearly one million records will take a while, as I need a zip code for each entry. I read that vectorization is a way to speed up tasks in Python, but I cannot find a way to apply it to my code. Here is my code; any feedback would be appreciated:
import numpy as np

for j in range(len(myFile)):
    point1 = np.array((myFile["Latitude"][j], myFile["Longitude"][j]))  # the reference point
    resultZip = str(usZips["Zip"][0])
    dist = np.linalg.norm(point1 - np.array((float(usZips["Latitude"][0]),
                                             float(usZips["Longitude"][0]))))
    for i in range(len(usZips)):
        lat = float(usZips["Latitude"][i])
        lon = float(usZips["Longitude"][i])
        point2 = np.array((lat, lon))  # the comparison point from the dataset
        temp = np.linalg.norm(point1 - point2)
        if temp <= dist:  # if this Euclidean distance beats the best so far:
            dist = temp  # keep the new best distance and...
            resultZip = str(usZips["Zip"][i])  # save the zip at the same index
I am aware Google also has a reverse geocoder API but it has a request limit per day.
The file called myFile is a CSV file with the attributes userId, latitude, longitude, and timestamp, with about a million entries. The file usZips is a public dataset with information about the city, lat, lon, zip, and timezone, with about 43k zip records across the US.
I don't know what your myFile and usZips look like (I cannot verify the code), so try something like this as a starting point for vectorization:
your_needed_dist = 10  # for example
lat = float(usZips["Latitude"][0])
lon = float(usZips["Longitude"][0])
lat0 = np.array(myFile["Latitude"])
lon0 = np.array(myFile["Longitude"])
# Euclidean distance from this zip to every record, in one vectorized step
# (note the '+' between the squared terms)
dist = np.sqrt((lat - lat0)**2 + (lon - lon0)**2)
condition = dist <= your_needed_dist
# get the index (or indices) of the records that satisfy dist <= your_needed_dist
np.argwhere(condition)
# or select those records directly:
# myFile[condition]
Also check the definition of distance in my code (decide whether it is what you need); plain Euclidean distance on latitude/longitude degrees is only a rough approximation.
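If the goal is the nearest zip for every record at once, a spatial index avoids the double loop entirely. A minimal sketch, assuming myFile and usZips are pandas DataFrames with the column names used above, keeping the same Euclidean approximation, and using scipy (an extra dependency):

import numpy as np
from scipy.spatial import cKDTree

# Build the tree once over the ~43k zip locations
zip_coords = usZips[["Latitude", "Longitude"]].astype(float).to_numpy()
tree = cKDTree(zip_coords)

# One vectorized query returns the nearest-zip index for every record
query = myFile[["Latitude", "Longitude"]].astype(float).to_numpy()
_, nearest_idx = tree.query(query)
myFile["Zip"] = usZips["Zip"].to_numpy()[nearest_idx].astype(str)

This should take seconds for all records rather than the ~1.3 s per record of the double loop, since each lookup is logarithmic in the number of zips instead of a full scan.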

Python, how to perform an iterate formula?

I'm trying to port from Excel a formula that (as usual) iterates two other values.
Let me try to explain better.
I have the following three variables:
iva_versamenti_totale,
utilizzato,
riporto
iva_versamenti_totale has a length of 13 and is given by the following formula:
iva_versamenti_totale = {'Saldo IVA': [sum(t) for t in zip(*iva_versamenti.values())]}
utilizzato and riporto are obtained simultaneously, in an iterative manner. I have tried the following code, but it does not work:
utilizzato = dict()
riporto = dict()
for index, xi in enumerate(iva_versamenti_totale['Saldo IVA']):
    if xi > 0:
        riporto[index] = riporto[index] + xi
    else:
        riporto[index] = riporto[index-1] - utilizzato[index]
for index, xi in enumerate(iva_versamenti_totale['Saldo IVA']):
    if xi > 0:
        utilizzato[index] == 0
    elif riporto[index-1] >= xi:
        utilizzato[index] = -xi
    else:
        utilizzato[index] = riporto[index-1]
Python gives me KeyError: 0.
EDIT
Here is my Excel file:
https://drive.google.com/open?id=1SRp9uscUgYsV88991yTnZ8X4c8NElhuD
Inputs are in grey; the variables are in yellow.
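The KeyError: 0 happens because riporto[index] and riporto[index-1] are read before anything has been stored in them; and since utilizzato and riporto depend on each other at the same index, the two loops have to be merged into a single pass with a seeded starting value. A minimal sketch of one way to make the iteration run, assuming the carry-over before the first period is zero and keeping the question's conditions as written (the exact business logic depends on the spreadsheet):

riporto = {-1: 0}  # assumption: no carry-over before the first period
utilizzato = {}
for index, xi in enumerate(iva_versamenti_totale['Saldo IVA']):
    # compute utilizzato first, since riporto at the same index depends on it
    if xi > 0:
        utilizzato[index] = 0  # '=' assigns; the original '==' only compares
    elif riporto[index - 1] >= xi:
        utilizzato[index] = -xi
    else:
        utilizzato[index] = riporto[index - 1]
    if xi > 0:
        riporto[index] = riporto[index - 1] + xi  # assumed: previous carry plus xi
    else:
        riporto[index] = riporto[index - 1] - utilizzato[index]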

DataFrame how to perform calculations (please see the attached photo)

As you can see, the calculations under column D follow a specific pattern, i.e. prior value * (1 + the rate%/365), so in cell D2 you have 100*(1 + 8%/365), and D3 will be 100.021918*(1 + 8.06%/365).
Is there an easy way to do that in Python? I don't want to use Excel for that purpose, and I have daily data going back 30 years.
import pandas as pd

cell_d = [100]
rates = [0.08, 0.0806, 0.0812, 0.0813, 0.08]
for i, rate in enumerate(rates):
    cell_d.append(cell_d[i] * (1 + rate/365))
pd.DataFrame({'rates': rates, 'cell_d': cell_d[1:]})
Probably should rename cell_d to something more meaningful.
I don't know of a vectorized "DataFrame friendly" way off the top of my head, but you can simply iterate over the rows by index with a for loop:
for i in range(1, num_rows):
    df.loc[i, "value"] = df.loc[i-1, "value"] * (1 + df.loc[i, "rate"]/365)

Combining multiple functions through loops and parameters

UPDATE: My question has been fully answered; I have applied jarmod's answer to my program, and although the code looks neater, it has not affected the speed (the time until my graph appears; I plot this data using matplotlib). I am a little confused about why my program runs slowly and how I can increase the speed (it takes about 30 seconds, and I know this portion of the code is slowing it down). I have shown my real code in the second block below. Also, the speed is strongly determined by the Range I set; with a short range it is quite fast.
I have sample code here that shows my calculation needed to conduct forecasting and extract values. I use for loops to run through a specific range of CSV files that I labeled 1-100. I return numbers for each month (1-12) to get the average forecast accuracy over a given number of months.
My full code includes 12 functions for a full-year forecast, but I feel the code is inefficient because the functions are very similar except for one number, and reading the CSV files so many times slows the program.
Is there a way I can combine these functions, perhaps with another parameter, to make it run? My biggest concern was that it would be hard to return separate numbers and categorize them. In other words, I would ideally like to have only one function for all 12 monthly accuracy predictions; the way I can see to do that would be to add another parameter and another loop, but I have no idea how to go about that or whether it is possible. Essentially, I would like to store all the values of onemonthaccuracy (which goes to the file before the current file and compares the predicted value for the date associated with the current file), then all the values of twomonthaccuracy, and so on, so I can later use these variables for graphing and other purposes.
import pandas as pd
import matplotlib.pyplot as plt

def onemonthaccuracy(basefilenumber):
    basefileread = pd.read_csv(str(basefilenumber)+'.csv', encoding='latin-1')
    basefilevalue = basefileread.loc[basefileread['Customer'].str.contains('Customer A', na=False), 'Jun-16\nQty']
    onemonthread = pd.read_csv(str(basefilenumber-1)+'.csv', encoding='latin-1')
    onemonthvalue = onemonthread.loc[onemonthread['Customer'].str.contains('Customer A', na=False), 'Jun-16\nQty']
    return int(onemonthvalue)/int(basefilevalue)

def twomonthaccuracy(basefilenumber):
    basefileread = pd.read_csv(str(basefilenumber)+'.csv', encoding='latin-1')
    basefilevalue = basefileread.loc[basefileread['Customer'].str.contains('Customer A', na=False), 'Jun-16\nQty']
    twomonthread = pd.read_csv(str(basefilenumber-2)+'.csv', encoding='latin-1')
    twomonthvalue = twomonthread.loc[twomonthread['Customer'].str.contains('Customer A', na=False), 'Jun-16\nQty']
    return int(twomonthvalue)/int(basefilevalue)

onetotal = 0
twototal = 0
onetotallist = []
twototallist = []
for basefilenumber in range(24, 36):
    one = onemonthaccuracy(basefilenumber)  # compute once, use twice
    two = twomonthaccuracy(basefilenumber)
    onetotal += one
    twototal += two
    onetotallist.append(one)
    twototallist.append(two)
onetotalpermonth = onetotal/12
twototalpermonth = twototal/12

x = [1, 2]
y = [onetotalpermonth, twototalpermonth]
z = [1, 2]
w = [onetotallist, twototallist]
for ze, we in zip(z, w):
    plt.scatter([ze] * len(we), we, marker='D', s=5)
plt.scatter(x, y)
plt.show()
This is the real block of code I am using in my program; perhaps something I am unaware of is slowing it down?
# other parts of code
# StartRange = yearvalue + Value
# EndRange = endValue + endyearvalue
# Range = EndRange - StartRange
# Department
# more code....

def nmonthaccuracy(basefilenumber, n):
    basefileread = pd.read_csv(str(basefilenumber)+'.csv', encoding='Latin-1')
    baseheader = getfileheader(basefilenumber)
    basefilevalue = basefileread.loc[basefileread['Customer'].str.contains(Department, na=False), baseheader]
    nmonthread = pd.read_csv(str(basefilenumber-n)+'.csv', encoding='Latin-1')
    nmonthvalue = nmonthread.loc[nmonthread['Customer'].str.contains(Department, na=False), baseheader]
    return (1 - (int(basefilevalue)/int(nmonthvalue)) + 1) if int(nmonthvalue) > int(basefilevalue) else int(nmonthvalue)/int(basefilevalue)

N = 13
total = [0] * N
total_by_month_list = [[] for _ in range(N)]
for basefilenumber in range(int(StartRange), int(EndRange)):
    for n in range(N):
        total[n] += nmonthaccuracy(basefilenumber, n)
        total_by_month_list[n].append(nmonthaccuracy(basefilenumber, n))
onetotal = total[1] / Range
twototal = total[2] / Range
threetotal = total[3] / Range
fourtotal = total[4] / Range
fivetotal = total[5] / Range  # ... all the way to 12
onetotallist = total_by_month_list[1]
twototallist = total_by_month_list[2]
threetotallist = total_by_month_list[3]
fourtotallist = total_by_month_list[4]
fivetotallist = total_by_month_list[5]  # ... all the way to 12
# a lot more code after this
Something like this:
def nmonthaccuracy(basefilenumber, n):
    basefileread = pd.read_csv(str(basefilenumber)+'.csv', encoding='Latin-1')
    basefilevalue = basefileread.loc[basefileread['Customer'].str.contains('Lam DepT', na=False), 'Jun-16\nQty']
    nmonthread = pd.read_csv(str(basefilenumber-n)+'.csv', encoding='Latin-1')
    nmonthvalue = nmonthread.loc[nmonthread['Customer'].str.contains('Lam DepT', na=False), 'Jun-16\nQty']
    return int(nmonthvalue)/int(basefilevalue)

N = 2
total_by_month = [0] * N
total_aggregate = 0
for basefilenumber in range(20, 30):
    for n in range(N):
        a = nmonthaccuracy(basefilenumber, n)
        total_by_month[n] += a
        total_aggregate += a
In case you are wondering what the following code does:
N = 2
total_by_month = [0] * N
It sets N to the number of months desired (2 here, but you could make it 12 or another value) and then creates a total_by_month list that can store N results, one per month, initialized to N zeroes so that each monthly total starts at zero.
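On the speed question in the update: the nested loop re-reads the same CSV files from disk over and over (each numbered file is opened many times per pass, and the real code even calls nmonthaccuracy twice per iteration), and pd.read_csv is almost certainly the bottleneck. A minimal sketch of caching the reads, assuming the same numbered-file scheme as above:

from functools import lru_cache
import pandas as pd

@lru_cache(maxsize=None)
def read_month_file(filenumber):
    # Each numbered CSV is read from disk only once; repeat calls hit the cache.
    return pd.read_csv(str(filenumber) + '.csv', encoding='Latin-1')

Using read_month_file in place of the direct pd.read_csv calls inside nmonthaccuracy, and storing each nmonthaccuracy result in a variable instead of computing it twice per loop iteration, should cut the runtime substantially.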

Why is this code producing a different value in a VM?

Basically I have these tif images that I need to recurse through and read pixel data from, to determine whether each pixel in the image is melting ice or not. This is determined via the threshold value that's set in the script. It is configured to display both the year's total melt value and each month's. It works fine on my own machine, but I need to run this remotely on a Linux VM. It works, but it produces a total number that is exactly 71146 greater than what it should be and what it had been producing.
This is the snippet that does most of the processing and is ultimately causing my problems I believe.
for file in os.listdir(current):
    if os.path.exists(file):
        if file.endswith(".tif"):
            fname = os.path.splitext(file)[0]
            day = fname[4:7]
            im = Image.open(file)
            for x in range(0, 60):
                for y in range(0, 109):
                    p = round(im.getpixel((x, y)), 4)
                    if p >= threshold:
                        combined = str(x) + "-" + str(y)
                        if combined not in coords:
                            melt += 1
                            coords.append(combined)
            totalmelt.append(melt)
And then totalmelt is summed to get the yearly value:
total = sum(totalmelt)
The threshold value has been set previously as follows:
threshold = float(-0.0158)
I feel like I'm missing something obvious. It's been a while since I played with Python...I'm coming over from C++ right now. Thanks for any solutions you might offer!
You need to reset melt to 0 before your inner loops; otherwise it keeps accumulating across files, so each appended value is a running total rather than a per-image count. That would also explain the difference on the VM: os.listdir returns files in arbitrary, platform-dependent order, and with melt (and coords) carrying over from file to file, the final sum depends on that order.
melt = 0
for x in range(0, 60):
    for y in range(0, 109):
        ...
        melt += 1
totalmelt.append(melt)
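For reference, here is how the fix slots into the original loop (a sketch; imports and the threshold/coords/totalmelt setup as in the question):

for file in os.listdir(current):
    if os.path.exists(file) and file.endswith(".tif"):
        im = Image.open(file)
        melt = 0  # reset per image so each appended value is a per-image count
        for x in range(0, 60):
            for y in range(0, 109):
                p = round(im.getpixel((x, y)), 4)
                if p >= threshold:
                    combined = str(x) + "-" + str(y)
                    if combined not in coords:
                        melt += 1
                        coords.append(combined)
        totalmelt.append(melt)
# note: coords still persists across files; reset it inside the loop too
# if each image should be counted independently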
