Why is this code producing a different value in a VM? - python

Basically, I have these TIFF images that I need to loop through, reading the pixel data to determine whether each pixel shows melting ice. This is decided by the threshold value set in the script. The script can display both the year's total melt value and the value for each month. It works fine on my own machine, but I need to run it remotely on a Linux VM. It runs there, but it produces a total that is exactly 71146 greater than what it should be and what it had been producing.
This is the snippet that does most of the processing and, I believe, is ultimately causing my problems:
for file in os.listdir(current):
    if os.path.exists(file):
        if file.endswith(".tif"):
            fname = os.path.splitext(file)[0]
            day = fname[4:7]
            im = Image.open(file)
            for x in range(0,60):
                for y in range(0,109):
                    p = round(im.getpixel((x,y)), 4)
                    if p >= threshold:
                        combined = str(x) + "-" + str(y)
                        if combined not in coords:
                            melt += 1
                            coords.append( combined )
            totalmelt.append( melt )
And then totalmelt is summed to get the yearly value:
total = sum(totalmelt)
The threshold value has been set previously as follows:
threshold = float(-0.0158)
I feel like I'm missing something obvious. It's been a while since I played with Python; I'm coming over from C++ right now. Thanks for any solutions you might offer!

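You need to reset melt to 0 before your inner loops. Because it is never reset, each value appended to totalmelt is a running total of every file processed so far, so sum(totalmelt) counts earlier files several times. How large that overcount is depends on the order in which os.listdir returns the files, and that order is arbitrary (it can differ between operating systems and file systems), which is the likely reason the VM gives a different total: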
melt = 0
for x in range(0,60):
    for y in range(0,109):
        ...
        melt += 1
totalmelt.append(melt)
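A simplified illustration of why the total depends on file order: the sketch below uses hypothetical per-file counts of newly melted pixels and treats them as fixed when the order changes (the real loop does not guarantee that, because coords is shared across files). Summing running totals, which is what happens when melt is never reset, gives a different result for the same files in a different order; summing per-file counts does not. The function names are illustrative only.

```python
from itertools import accumulate


def total_with_running_melt(new_pixels_per_file):
    """Mimics the question's loop: melt is never reset, so totalmelt holds running totals."""
    totalmelt = list(accumulate(new_pixels_per_file))
    return sum(totalmelt)


def total_with_reset_melt(new_pixels_per_file):
    """Mimics the fixed loop: melt = 0 for each file, so totalmelt holds per-file counts."""
    totalmelt = list(new_pixels_per_file)
    return sum(totalmelt)


counts = [5, 3, 2]  # hypothetical new melt pixels found in three files
print(total_with_running_melt(counts))                  # 23
print(total_with_running_melt(list(reversed(counts))))  # 17
print(total_with_reset_melt(counts), total_with_reset_melt(list(reversed(counts))))  # 10 10
```

In the real script, iterating over sorted(os.listdir(current)) makes the order deterministic, and opening os.path.join(current, file) instead of the bare file name stops the loop from depending on the current working directory (os.path.exists(file) only finds the file when the script is run from inside current).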

Related

How do I get this barssince to work in python?

I'm trying to convert the "Top Bottom" indicator by ceyhun from TradingView to Python.
I am stuck converting the barssince lines. One of them looks like this:
per = input(14, title="Bottom Period")
loc = low < lowest(low[1], per) and low <= lowest(low[per], per)
bottom = barssince(loc)
So far I have this in Python:
bottomPeriod = 14
data['counter'] = data.index.where(data.loc[data.index[-1], "low"] < min(data["low"][(-bottomPeriod-1):-1]) and data.loc[data.index[-1], "low"] <= min(data["low"][(-bottomPeriod*2-1):(-bottomPeriod-1)]))
data['counter'].fillna(method="ffill",inplace=True)
data['Rows_since_condition'] = data.index-data['counter']
data.drop(['counter'], axis=1,inplace=True)
I can't get the where() call to work. I'm about to start iterating over the dataset, but it is a big one and needs to run fast. Any help is much appreciated.
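A vectorized sketch of Pine's barssince in pandas, assuming the rows are in chronological order and the lows are a float column such as data['low']. The helper names barssince and bottom_bars are hypothetical. lowest(low[1], per) is translated as low.shift(1).rolling(per).min() and lowest(low[per], per) as low.shift(per).rolling(per).min(). barssince is then the row position minus the position of the last row where the condition held, which is NaN before the first hit (Pine returns na there).

```python
import numpy as np
import pandas as pd


def barssince(cond: pd.Series) -> pd.Series:
    """Rows since cond was last True (0 on a True row, NaN before the first True)."""
    pos = pd.Series(np.arange(len(cond)), index=cond.index, dtype="float64")
    last_true = pos.where(cond).ffill()  # position of the most recent True row
    return pos - last_true


def bottom_bars(low: pd.Series, per: int = 14) -> pd.Series:
    lowest_prev = low.shift(1).rolling(per).min()     # lowest(low[1], per)
    lowest_older = low.shift(per).rolling(per).min()  # lowest(low[per], per)
    # Comparisons against NaN (not enough history yet) evaluate to False
    loc = (low < lowest_prev) & (low <= lowest_older)
    return barssince(loc)
```

Usage would be something like data['bottom'] = bottom_bars(data['low'], 14); every step is a whole-column operation, so there is no Python-level loop over the rows.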

CSV: How to find the name of a value from a list

I have code that reads a CSV file with three columns: Zone, Number, and ARPU. I am trying to write a recommendation system that finds the best match for each ARPU value from the list provided in the code (creating the column "Suggested Plan"). It also finds the next greater value (column "Potential updated plan") and the next lower value (column "Potential downgrade plan"):
tp_usp15 = 1500
tp_usp23 = 2300
tp_usp27 = 2700
list_usp = [tp_usp15,tp_usp23, tp_usp27]
tp_bsnspls_s = 600
tp_bsnspls_steel = 1300
tp_bsnspls_chrome = 1800
list_bsnspls = [tp_bsnspls_s,tp_bsnspls_steel,tp_bsnspls_chrome]
tp_bsnsrshn10 = 1000
tp_bsnsrshn15 = 1500
tp_bsnsrshn20 = 2000
list_bsnsrshn = [tp_bsnsrshn10,tp_bsnsrshn15,tp_bsnsrshn20]
#Common list#
common_list = list_usp + list_bsnspls + list_bsnsrshn
import pandas as pd
def get_plans(p):
    best = min(common_list, key=lambda x : abs(x - p['ARPU']))
    best_index = common_list.index(best) # get location of best in common_list
    if best_index < len(common_list) - 1:
        next_greater = common_list[best_index + 1]
    else:
        next_greater = best # already highest
    if best_index > 0:
        next_lower = common_list[best_index - 1]
    else:
        next_lower = best # already lowest
    return best, next_greater, next_lower
common_list = list_usp + list_bsnspls + list_bsnsrshn
common_list = sorted(common_list) # ensure it is sorted
df = pd.read_csv('root/test.csv')
df[['Suggested plan', 'Potential updated plan', 'Potential downgraded plan']] = df.apply(get_plans, axis=1, result_type="expand")
df.to_csv('Recommendation System.csv')
It creates three additional columns and performs the corresponding task (best match or closest value, next greater value, and next smaller value). The code works, but as you can see, each numeric value corresponds to a named plan (tp_usp15, tp_bsnspls_s, and so on).
How can I change the code to add a column with the plan name next to each of the new numeric columns?
For example, right now the code produces these columns:
Zone, Number, ARPU, Suggested plan, Potential Updated Plan, and Potential downgrade plan
!BUT! I need to create:
Zone, Number, ARPU, Suggested plan (numeric), Suggested plan (name), Potential Updated Plan (numeric), Potential Updated Plan (name), Potential downgrade plan (numeric), Potential downgrade plan (name)
where the (name) columns show the name corresponding to the value in the matching (numeric) column. Thanks in advance, guys!
Photo examples:
Here is the starting CSV file.
Then, after executing the code, I have this:
And I want to create additional columns with the corresponding names of the values. The example columns are in yellow.
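One way to get the names as well, sketched under the assumption that the variable names (tp_usp15 and so on) are the names you want to show: keep the plans as (name, price) pairs instead of bare numbers, so the name travels with the price. Pairs are used rather than a price-to-name dictionary because two plans (tp_usp15 and tp_bsnsrshn15) both cost 1500, and a dictionary keyed by price would silently drop one of them. get_plans_with_names, add_recommendations and the exact column labels are illustrative choices, not part of the original code.

```python
import pandas as pd

# Plan names and prices copied from the question's variables.
PLANS = {
    "tp_usp15": 1500, "tp_usp23": 2300, "tp_usp27": 2700,
    "tp_bsnspls_s": 600, "tp_bsnspls_steel": 1300, "tp_bsnspls_chrome": 1800,
    "tp_bsnsrshn10": 1000, "tp_bsnsrshn15": 1500, "tp_bsnsrshn20": 2000,
}
# (name, price) pairs sorted by price; sorted() is stable, so ties keep dictionary order.
SORTED_PLANS = sorted(PLANS.items(), key=lambda item: item[1])

COLUMNS = [
    "Suggested plan (numeric)", "Suggested plan (name)",
    "Potential updated plan (numeric)", "Potential updated plan (name)",
    "Potential downgrade plan (numeric)", "Potential downgrade plan (name)",
]


def get_plans_with_names(row):
    arpu = row["ARPU"]
    # Index of the plan whose price is closest to ARPU (the first one wins on ties).
    best_i = min(range(len(SORTED_PLANS)), key=lambda i: abs(SORTED_PLANS[i][1] - arpu))
    up_i = min(best_i + 1, len(SORTED_PLANS) - 1)  # stay on best if already highest
    down_i = max(best_i - 1, 0)                     # stay on best if already lowest
    result = []
    for i in (best_i, up_i, down_i):
        name, price = SORTED_PLANS[i]
        result += [price, name]
    return result


def add_recommendations(df):
    expanded = df.apply(get_plans_with_names, axis=1, result_type="expand")
    expanded.columns = COLUMNS
    return df.join(expanded)
```

Usage would follow the original script: df = add_recommendations(pd.read_csv('root/test.csv')) and then df.to_csv('Recommendation System.csv').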

Efficient way to loop through GroupBy DataFrame

Since my last post lacked information, here is an example of my DataFrame (the important columns):
deviceID: unique ID of the vehicle. Vehicles send data every X minutes.
mileage: the distance moved since the last message (in km)
position_timestamp_measure: Unix timestamp of the time the record was created.
deviceID  mileage  position_timestamp_measure
54672     10       1600696079
43423     20       1600696079
42342     3        1600701501
54672     3        1600702102
43423     2        1600702701
My goal is to validate the mileage against the vehicle's maximum speed (80 km/h) by calculating the vehicle's speed from the timestamps and the mileage. The result should then be written to the original dataset.
What I've done so far is the following:
df_ori['dataIndex'] = df_ori.index
df = df_ori.groupby('device_id')
#create new col and set all values to false
df_ori['valid'] = 0
for group_name, group in df:
    #sort group by time
    group = group.sort_values(by='position_timestamp_measure')
    group = group.reset_index()
    #since I can't validate the first point in the group, I set it to valid
    df_ori.loc[df_ori.index == group.dataIndex.values[0], 'validPosition'] = 1
    #iterate through each data in the group
    for i in range(1, len(group)):
        timeGoneSec = abs(group.position_timestamp_measure.values[i]-group.position_timestamp_measure.values[i-1])
        timeHours = (timeGoneSec/60)/60
        #calculate speed
        if((group.mileage.values[i]/timeHours)<maxSpeedKMH):
            df_ori.loc[dataset.index == group.dataIndex.values[i], 'validPosition'] = 1
dataset.validPosition.value_counts()
It definitely works the way I want it to, but its performance is very poor: the DataFrame contains nearly 700k rows (already cleaned). I am still a beginner and can't figure out a better solution. I would really appreciate any help.
If I understand correctly, no for loops are needed here. Here is what I've transformed your code into:
# create new col and set all values to 0 (invalid)
df_ori['valid'] = 0
df_ori = df_ori.sort_values(['position_timestamp_measure'])
# Subtract the preceding timestamp of the same device from the current one
df_ori['timeGoneSec'] = \
    df_ori.groupby('device_id')['position_timestamp_measure'].diff()
# The operation above produces NaN for the first row of each group;
# mark those rows as valid, as in the original code
df_ori.loc[df_ori['timeGoneSec'].isna(), 'valid'] = 1
df_ori['timeHours'] = df_ori['timeGoneSec'] / 3600 # 60*60 = 3600
df_ori['flag'] = (df_ori['mileage'] / df_ori['timeHours']) < maxSpeedKMH
df_ori.loc[df_ori['flag'], 'valid'] = 1
# Remove helper columns
df_ori = df_ori.drop(columns=['flag', 'timeHours', 'timeGoneSec'])
The basic idea is to use vectorized operations as much as possible and to avoid for loops, especially row-by-row iteration, which can be extremely slow.
Since I don't have the full context of your code, please double-check the logic and make sure it works as desired.
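The same vectorized idea, wrapped in a function and checked against the sample rows from the question plus one hypothetical extra message (device 54672, 50 km in 600 s, i.e. 300 km/h) that should be flagged invalid. It assumes the column names used in the code (device_id, mileage, position_timestamp_measure) and the 80 km/h limit; flag_valid_positions is a hypothetical name. It works on a sorted copy and returns it in the original row order, so the valid column lines up with df_ori.

```python
import pandas as pd

MAX_SPEED_KMH = 80  # maximum vehicle speed from the question


def flag_valid_positions(df, max_speed_kmh=MAX_SPEED_KMH):
    # Sort by device, then time, so diff() compares consecutive messages of one vehicle
    out = df.sort_values(["device_id", "position_timestamp_measure"])
    seconds = out.groupby("device_id")["position_timestamp_measure"].diff()
    speed_kmh = out["mileage"] / (seconds / 3600)
    # The first message of each device has no predecessor (NaN), so it counts as valid,
    # as in the original loop; a zero time gap gives inf/NaN speed and is flagged invalid
    valid = seconds.isna() | (speed_kmh < max_speed_kmh)
    out["valid"] = valid.astype(int)
    return out.sort_index()
```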

RuntimeWarning: invalid value encountered in double_scalars- Y[i]=sum(V[:,i])/float(a)

So my friend and I are working on a project (I am kind of a beginner).
I am using IDLE on a Mac, and she is using Spyder on Windows 7.
For some reason, this code works on her computer but not on mine. I am not sure why I keep getting the same runtime warning.
Maybe the file is too large? I'm not sure. I read other answers here about the same problem, but none of the suggestions worked.
I feel it might be something obvious that I am overlooking.
Thank you so much!
dates = [line.split(",")[3] for line in lines[1:]] #first I extract the dates.
dates = [[int(i) for i in date.split("/")] for date in dates] #then I split the month, day and year.
X = np.arange(1,13,dtype="float") #the months from 1 to 12.
V = np.zeros(48).reshape(4,12)
for date in dates :
    V[date[2]-2014,date[0]-1] += 1
Y = np.zeros(12) #the array holding the average of violations for every month.
a = 0
for i in range(12):
    #a = 0 #I added this variable because some months' data are missing, so I only divide by the number of years where there is a record.
    for j in range(4):
        if V[j,i] != 0.0 :
            a = a + 1
    Y[i]=sum(V[:,i])/float(a)
def f(x): #this is the fitted function.
    return np.polyfit(X,Y,1)[0]*x+np.polyfit(X,Y,1)[1]
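NumPy emits the "invalid value" RuntimeWarning when it computes a float 0.0/0.0, so the warning means some month has no records at all (sum(V[:,i]) is 0) while a is still 0. That depends only on the data being read, so it is worth checking that both machines read the same file. Note also that with a = 0 outside the loop, as in the code, a keeps growing across months, so later months are divided by a running count of years. Below is a vectorized sketch that skips the empty months instead, assuming the intent in the commented-out line (divide each month only by the number of years that have records for it); monthly_average and fit_trend are hypothetical names. It also fits the line once instead of calling np.polyfit twice inside f(x).

```python
import numpy as np


def monthly_average(V):
    """Average count per month over the years that actually have records.

    V has shape (n_years, 12). Months with no records in any year become NaN
    instead of triggering a 0/0 division.
    """
    totals = V.sum(axis=0)
    years_with_data = np.count_nonzero(V, axis=0)
    Y = np.full(V.shape[1], np.nan)
    # Only divide where at least one year has data; other entries stay NaN
    np.divide(totals, years_with_data, out=Y, where=years_with_data > 0)
    return Y


def fit_trend(V):
    X = np.arange(1, V.shape[1] + 1, dtype=float)  # months 1..12
    Y = monthly_average(V)
    mask = ~np.isnan(Y)  # polyfit cannot handle NaN, so drop empty months
    slope, intercept = np.polyfit(X[mask], Y[mask], 1)
    return slope, intercept
```

The fitted function then becomes slope * x + intercept with the two numbers returned by fit_trend(V).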

Python image file manipulation

Python beginner here. I am trying to make use of some data stored in a dictionary.
I have some .npy files in a folder. I intend to build a dictionary that holds the following for each map: the map itself (read with np.load), the year, month, and day of the current map (as integers), the fractional time in years (assuming every month has 30 days; this does not affect my later calculations), the total number of pixels, and the number of pixels above a certain value. At the end I expect to get a dictionary like:
{'map0': ['array (from np.load)', 'year', 'month', 'day', 'fractional_time', 'pixels'],
 'map1': ...}
What I have managed so far is the following:
import glob
file_list = glob.glob('*.npy')
def only_numbers(seq): #for getting rid of any '.npy' or any other string
    seq_type= type(seq)
    return seq_type().join(filter(seq_type.isdigit, seq))
maps = {}
for i in range(0, len(file_list)-1):
    maps[i] = np.load(file_list[i])
    numbers[i]=list(only_numbers(file_list[i]))
I have no idea how to make one dictionary hold several values for each map inside the for loop. I can only manage to create a separate dictionary or list (e.g. numbers) for each task. For the numbers dictionary, I have no idea how to turn the date in the format YYYYMMDD into the integers I am looking for.
For the pixels, I managed to get it for a single map, using:
data = np.load('20100620.npy')
print('Total pixel count: ', data.size)
c = (data > 50).astype(int)
print('Pixel >50%: ',np.count_nonzero(c))
Any hints? So far, image processing seems to be quite a challenge.
Edit: I managed to split the dates and make them integers using:
date=list(numbers.values())
year=int(date[i][0:4])
month=int(date[i][4:6])
day=int(date[i][6:8])
print (year, month, day)
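As an alternative to slicing the digit string, datetime.strptime can parse the YYYYMMDD part of the file name directly and rejects malformed names with a ValueError. This assumes the files are named exactly YYYYMMDD.npy, as in the later code; date_from_filename is a hypothetical helper.

```python
import os
from datetime import datetime


def date_from_filename(path):
    # "data/20100620.npy" -> "20100620"
    stem = os.path.splitext(os.path.basename(path))[0]
    d = datetime.strptime(stem, "%Y%m%d")  # raises ValueError for malformed names
    return d.year, d.month, d.day
```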
If anyone is interested, I managed to do something else. I dropped the idea of a dictionary containing everything, as I needed to manipulate the data more easily afterwards. I did the following:
import glob
import numpy as np
import matplotlib.pyplot as plt
value = 50 # hypothetical threshold; 50 is the value used in the single-map example above
file_list = glob.glob('data/...') # files named YYYYMMDD.npy
file_list.sort()
def only_numbers(seq): # remove all characters and symbols from the file name, keeping only the digits
    seq_type = type(seq)
    return seq_type().join(filter(seq_type.isdigit, seq))
numbers = {}
time = []
np_above_value = []
for i in range(len(file_list)): # len(file_list), not len(file_list) - 1, so the last file is not skipped
    maps = np.load(file_list[i])
    maps[np.isnan(maps)] = 0 # had some NaNs and was getting some errors
    numbers[i] = only_numbers(file_list[i]) # dictionary of the file names reduced to the dates, using the function defined earlier
    date = list(numbers.values()) # the file names (digits only) as a list
    year = int(date[i][0:4]) # first 4 characters (YYYY) as an integer
    month = int(date[i][4:6]) # next 2 characters (MM)
    day = int(date[i][6:8]) # next 2 characters (DD)
    time.append(year + ((month - 1) * 30 + day) / 360) # fractional time
    print('Total pixel count for map '+ str(i) +':', maps.size) # total number of pixels for the current map
    c = (maps > value).astype(int)
    np_above_value.append (np.count_nonzero(c)) # list of the number of pixels with a value bigger than value
    print('Pixels with concentration >value% for map '+ str(i) +':', np.count_nonzero(c)) # number of pixels above value for the current map
plt.plot(time, np_above_value) # pixels with concentration above value as a function of time
I know it might be very clumsy. It's my second week of Python, so please overlook that. It does the trick :)
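For the dictionary layout the question originally asked for (one entry per map holding the array, the date parts, the fractional time and the pixel counts), a nested dictionary keyed by the file's date string is one option. This is a sketch: load_maps and the key names are hypothetical, it reuses the 30-day-month fractional time and the NaN-to-0 replacement from the code above, and the default value=50 mirrors the single-map example.

```python
import glob
import os

import numpy as np


def load_maps(folder, value=50):
    maps = {}
    for path in sorted(glob.glob(os.path.join(folder, "*.npy"))):
        stem = os.path.splitext(os.path.basename(path))[0]  # e.g. "20100620"
        year, month, day = int(stem[0:4]), int(stem[4:6]), int(stem[6:8])
        data = np.nan_to_num(np.load(path), nan=0.0)  # NaN -> 0, as in the code above
        maps[stem] = {
            "array": data,
            "year": year,
            "month": month,
            "day": day,
            "fractional_time": year + ((month - 1) * 30 + day) / 360,
            "pixels": data.size,
            "pixels_above": int(np.count_nonzero(data > value)),
        }
    return maps
```

Each map is then reachable by its date, e.g. maps["20100620"]["pixels_above"], and lists for plotting can still be built with a comprehension over maps.values().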
