Pandas dataframe not updating all columns - Python

I am running the following test code to map violations to nearby BuildingIDs by "NearVicinity" and "MidVicinity". The results come out unexpectedly and I am not sure what I am missing in my code.
The results seem to have correctly updated the 'TicketIssuedDT', 'NearVicinity', and 'MidVicinity' columns; however, the 'BuildingID' and 'X' columns only map correctly to result['X'][0] and result['BuildingID'][0]. All remaining 999 rows have 0 for 'BuildingID' and 'X'.
result = pd.DataFrame(np.zeros((1000, 5)),
                      columns=['BuildingID', 'TicketIssuedDT', 'NearVicinity', 'MidVicinity', 'X'])
z = 0
for i in range(0, 10):
#for i, j in dataframe2.iterrows():
    dataframe2Lat = dataframe2['Latitude'][i]
    dataframe2Long = dataframe2['Longitude'][i]
    for x in range(0, 11102):
    #for x, y in dataframe1.iterrows():
        dist = (math.fabs(dataframe2Long - dataframe1['Longitude'][x]) +
                math.fabs(dataframe2Lat - dataframe1['Latitude'][x]))
        if dist < .02:
            result['X'][z] = x
            result['BuildingID'][z] = dataframe1['BuildingID'][x]
            result['TicketIssuedDT'][z] = dataframe2['TicketIssuedDT'][i]
            result['MidVicinity'][z] = 1
            if dist < .007:
                result['NearVicinity'][z] = 1
            else:
                result['NearVicinity'][z] = 0
            z += 1
    print(i)
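For what it's worth, here is a minimal, self-contained sketch of the same matching loop written with .loc indexing (row label first, then column), on the assumption that the partial updates come from chained assignment (result['X'][z] = ...) writing into a temporary copy rather than into result. The small dataframe1/dataframe2 stand-ins below are made up purely for illustration:

import math
import numpy as np
import pandas as pd

# made-up stand-ins with the same columns as in the question
dataframe1 = pd.DataFrame({'BuildingID': [101, 102],
                           'Latitude': [40.700, 40.800],
                           'Longitude': [-73.900, -73.990]})
dataframe2 = pd.DataFrame({'TicketIssuedDT': ['2019-01-01', '2019-01-02'],
                           'Latitude': [40.701, 40.750],
                           'Longitude': [-73.901, -73.950]})

result = pd.DataFrame(np.zeros((1000, 5)),
                      columns=['BuildingID', 'TicketIssuedDT', 'NearVicinity', 'MidVicinity', 'X'])
result['TicketIssuedDT'] = result['TicketIssuedDT'].astype(object)  # so a date string fits

z = 0
for i in range(len(dataframe2)):
    lat2 = dataframe2['Latitude'][i]
    long2 = dataframe2['Longitude'][i]
    for x in range(len(dataframe1)):
        dist = (math.fabs(long2 - dataframe1['Longitude'][x]) +
                math.fabs(lat2 - dataframe1['Latitude'][x]))
        if dist < .02:
            # .loc[row, column] assigns into result itself, never into a copy
            result.loc[z, 'X'] = x
            result.loc[z, 'BuildingID'] = dataframe1['BuildingID'][x]
            result.loc[z, 'TicketIssuedDT'] = dataframe2['TicketIssuedDT'][i]
            result.loc[z, 'MidVicinity'] = 1
            result.loc[z, 'NearVicinity'] = 1 if dist < .007 else 0
            z += 1

print(result.head())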

Related

List is being sorted, but when asked to pass the items to another list it just randomly transfers them

In my code, when I transfer the items of the sorted_resistances list (which contains several items read from a file named file.txt) to the blocks_A[x] list, they are not transferred in order, which should be ascending (file.txt only contains numbers). The rest of the program should be fairly easy to understand, but if not, the objective is to pass 12 elements, in order, from the sorted_resistances list to each of the 29 sublists of the blocks_A list.
blocks_B = []
y = 0
while y < 29:
    y = y + 1
    block_y = []
    blocks_B.append(block_y)
blocks_A = []
y = 0
while y < 29:
    y = y + 1
    block_y = []
    blocks_A.append(block_y)
with open("file.txt") as file_in:
    list_of_resistances = []
    for line in file_in:
        list_of_resistances.append(int(line))
sorted_resistances = sorted(list_of_resistances)
x = 0
while len(sorted_resistances) > 0:
    for y in sorted_resistances:
        blocks_A[x].append(y)
        blocks_A[x].sort()
        sorted_resistances.remove(y)
        if len(blocks_A[x]) == 12:
            x = x + 1
print(blocks_A)
y = 0
z = -1
while y < len(list_of_resistances):
    y = y + 1
    z = z + 1
    list_of_resistances[z] = y
print(blocks_B)
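The skipping you see is consistent with removing items from sorted_resistances while a for loop is iterating over it, which makes the iterator jump over every second element. As a hedged sketch of the stated goal (12 values per block, in ascending order), one way that avoids mutating the list during iteration is to slice it into chunks; this assumes file.txt holds one integer per line:

with open("file.txt") as file_in:
    list_of_resistances = [int(line) for line in file_in]

sorted_resistances = sorted(list_of_resistances)

# consecutive chunks of 12 sorted values, one chunk per block
blocks_A = [sorted_resistances[i:i + 12]
            for i in range(0, len(sorted_resistances), 12)]
print(blocks_A)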

How do I loop over values within a for loop in Python?

I have a df
1 1 2 2
2 2 1 1
I have written a function which:
takes the df in a for loop,
adds row(s) with a default value
replaces the values with another value in randomly selected cols
writes to csv
This is my code:
def add_x(df, max):
    gt_w_x = df.copy()
    counter = 0
    for i in range(1, max):
        if len(gt_w_x) != max:
            counter += 1
            # add new row with default value
            gt_w_x.loc[-1, :] = 1
            # reset index
            gt_w_x = gt_w_x.reset_index(drop=True)
            # how to loop over these values for x ??
            x = 1
            #x = 2
            # assign value 'X' to x randomly selected cols on last row
            gt_w_x.iloc[-1:, random.sample(list(range(gt_w_x.shape[1])), x)] = 'X'
            x = str(x)
            n = str(counter)
            # write to file
            df_path = 'test/' + x + '_' + n + '.csv'
            gt_w_x.to_csv(df_path)
max = 4
add_x(df, max)
The output on my system is
test/1_1.csv
test/1_2.csv
cat test/1_1.csv
0,1.0,1.0,2.0,2.0
1,2.0,2.0,1.0,1.0
2,1.0,X,1.0,1.0
cat test/1_2.csv
0,1.0,1.0,2.0,2.0
1,2.0,2.0,1.0,1.0
2,1.0,X,1.0,1.0
3,1.0,X,1.0,1.0
How do I loop over values for x?
The desired output for x = 1 and x = 2 is
test/1_1.csv
test/1_2.csv
test/2_1.csv
test/2_2.csv
Currently, I run the function by commenting out different values for x which is suboptimal.
You can use a nested for loop. It works just like the one you have at the beginning of the function:
def add_x(df, max):
    for x in range(1, 3):
        gt_w_x = df.copy()
        counter = 0
        for i in range(1, max):
            if len(gt_w_x) != max:
                counter += 1
                # add new row with default value
                gt_w_x.loc[-1, :] = 1
                # reset index
                gt_w_x = gt_w_x.reset_index(drop=True)
                # assign value 'X' to x randomly selected cols on last row
                gt_w_x.iloc[-1:, random.sample(list(range(gt_w_x.shape[1])), x)] = 'X'
                n = str(counter)
                # write to file
                df_path = 'test/' + str(x) + '_' + n + '.csv'
                gt_w_x.to_csv(df_path)
max = 4
add_x(df, max)
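If you would rather not nest the loops, an equivalent arrangement (just a sketch of the same idea) is to make x a parameter of the function and loop over its values at the call site:

import random
import pandas as pd

df = pd.DataFrame([[1, 1, 2, 2], [2, 2, 1, 1]])

def add_x(df, max, x):
    gt_w_x = df.copy()
    counter = 0
    for i in range(1, max):
        if len(gt_w_x) != max:
            counter += 1
            gt_w_x.loc[-1, :] = 1                   # add new row with default value
            gt_w_x = gt_w_x.reset_index(drop=True)  # reset index
            # assign 'X' to x randomly selected cols on the last row
            gt_w_x.iloc[-1:, random.sample(list(range(gt_w_x.shape[1])), x)] = 'X'
            gt_w_x.to_csv('test/' + str(x) + '_' + str(counter) + '.csv')

for x in (1, 2):
    add_x(df, 4, x)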

Making permanent change in a dataframe using python pandas

I would like to convert my dataframe from one format of values (X:XX:XX:XX) to another (X.X seconds).
Here is what my dataframe looks like:
Start End
0 0:00:00:00
1 0:00:00:00 0:07:37:80
2 0:08:08:56 0:08:10:08
3 0:08:13:40
4 0:08:14:00 0:08:14:84
And I would like to transform it into seconds, something like this:
Start End
0 0.0
1 0.0 457.80
2 488.56 490.80
3 493.40
4 494.0 494.84
To do that I did:
i = 0
j = 0
while j < 10:
    while i < 10:
        if data.iloc[i, j] != "":
            Value = (int(data.iloc[i, j][0]) * 3600) + (int(data.iloc[i, j][2:4]) * 60) + int(data.iloc[i, j][5:7]) + (int(data.iloc[i, j][8:10]) / 100)
            NewValue = data.iloc[:, j].replace([data.iloc[i, j]], Value)
            i += 1
        else:
            NewValue = data.iloc[:, j].replace([data.iloc[i, j]], "")
            i += 1
        data.update(NewValue)
    i = 0
    j += 1
But I failed to replace the new values in my old dataframe in a permanent way; when I do
print(data)
I still get my old dataframe in the wrong format.
Could someone help me? I have tried so hard!
Thank you so much!
You are using pandas.DataFrame.update, which requires a pandas dataframe as an argument. See the Example part of the update function documentation to really understand what update does: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.update.html
If I may suggest a more idiomatic solution: you can directly map a function over all values of a pandas Series.
def parse_timestring(s):
    if s == "":
        return s
    else:
        # weird to use centiseconds and not milliseconds
        # l is a list with [hour, minute, second, cs]
        l = [int(nbr) for nbr in s.split(":")]
        return sum([a*b for a, b in zip(l, (3600, 60, 1, 0.01))])

df["Start"] = df["Start"].map(parse_timestring)
You can remove the if ... else ... from parse_timestring if you first replace all empty strings with NaN values in your dataframe with df = df.replace("", numpy.nan), and then use df["Start"] = df["Start"].map(parse_timestring, na_action='ignore');
see https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.map.html
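A small self-contained sketch of that NaN-based variant (the two-row frame below is just a stand-in for the real data):

import numpy as np
import pandas as pd

df = pd.DataFrame({"Start": ["0:00:00:00", "0:08:08:56"],
                   "End": ["", "0:08:10:08"]})

def parse_timestring(s):
    # s is "hours:minutes:seconds:centiseconds"
    parts = [int(nbr) for nbr in s.split(":")]
    return sum(a * b for a, b in zip(parts, (3600, 60, 1, 0.01)))

df = df.replace("", np.nan)
# na_action='ignore' leaves the NaN cells untouched instead of passing them to the parser
df["Start"] = df["Start"].map(parse_timestring, na_action='ignore')
df["End"] = df["End"].map(parse_timestring, na_action='ignore')
print(df)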
The datetime library is made to deal with such data. You should also use the apply function of pandas to avoid iterating over the dataframe like that.
You could proceed as follows:
from datetime import timedelta

def to_seconds(date):
    if date == "":
        return date
    # the format is hours:minutes:seconds:centiseconds
    h, m, s, cs = (int(part) for part in date.split(':'))
    return timedelta(hours=h, minutes=m, seconds=s, milliseconds=10 * cs).total_seconds()

data['Start'] = data['Start'].apply(to_seconds)
data['End'] = data['End'].apply(to_seconds)
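As a quick sanity check of the parser above against the output expected in the question (0:07:37:80 should come out as 457.80 seconds):

# 0:07:37:80  ->  7*60 + 37 + 0.80 = 457.8 seconds
print(to_seconds("0:07:37:80"))   # prints 457.8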
Thank you so much for your help.
Your method worked. I also found a method using loops.
To summarize, my general problem was that I had an ugly csv file that I wanted to transform into a csv usable for doing statistics, and I wanted to use Python to do that.
My csv file was like:
MiceID = 1 Beginning End Type of behavior
0 0:00:00:00 Video start
1 0:00:01:36 grooming type 1
2 0:00:03:18 grooming type 2
3 0:00:06:73 0:00:08:16 grooming type 1
So in my ugly csv file I was writing only the moment of the beginning of a behavior type, without the end, when the different types of behaviors directly followed each other, and I was writing the moment of the end of the behavior when the mouse stopped grooming altogether; that allowed me to separate sequences of grooming. But this type of csv was not usable for easily doing statistics.
So I wanted to 1) transform all my values into seconds to have a correct format, 2) fill the gaps in the End column (a gap has to be filled with the following Beginning value, as the end of a specific behavior in a sequence is the beginning of the following one), 3) create columns corresponding to the duration of each behavior, and finally 4) fill these new columns with the durations.
My question was about the first step, but here is the code for each step separately:
step 1: transform the values into the right format
import pandas as pd
import numpy as np

data = pd.read_csv("D:/Python/TestPythonTraitementDonnéesExcel/RawDataBatch2et3.csv", engine="python")
data.replace(np.nan, "", inplace=True)
i = 0
j = 0
while j < len(data.columns):
    while i < len(data.index):
        if ":" in data.iloc[i, j]:
            Value = str((int(data.iloc[i, j][0]) * 3600) + (int(data.iloc[i, j][2:4]) * 60) + int(data.iloc[i, j][5:7]) + (int(data.iloc[i, j][8:10]) / 100))
            data = data.replace([data.iloc[i, j]], Value)
            data.update(data)
            i += 1
        else:
            i += 1
    i = 0
    j += 1
print(data)
step 2: fill the gaps
i = 0
j = 2
while j < len(data.columns):
    while i < len(data.index) - 1:
        if data.iloc[i, j] == "":
            data.iloc[i, j] = data.iloc[i + 1, j - 1]
            data.update(data)
            i += 1
        elif np.all(data.iloc[i:len(data.index), j] == ""):
            break
        else:
            i += 1
    i = 0
    j += 4
print(data)
step 3: create a new duration column for each mouse:
j = 1
k = 0
while k < len(data.columns) - 1:
    k = (j * 4) + (j - 1)
    data.insert(k, "Duree{}".format(k), "")
    data.update(data)
    j += 1
print(data)
step 4: fill the duration columns
j = 4
i = 0
while j < len(data.columns):
    while i < len(data.index):
        if data.iloc[i, j - 2] != "":
            data.iloc[i, j] = str(float(data.iloc[i, j - 2]) - float(data.iloc[i, j - 3]))
            data.update(data)
            i += 1
        else:
            break
    i = 0
    j += 5
print(data)
And of course, export my new usable dataframe
data.to_csv(r"D:/Python/TestPythonTraitementDonnéesExcel/FichierPropre.csv", index = False, header = True)
Here are the transformations (the original post linked screenshots of the dataframe before step 1 and after each of steps 1 to 4).

How can I measure the elapsed time that my algorithm takes to run? [duplicate]

This question already has answers here:
How can I time a code segment for testing performance with Python's timeit? (9 answers)
I'm working on my algorithm below (an A* algorithm) and I want to calculate the total time that it takes to run. I have read that I should import timeit for this (import timeit), but I'm stuck on how to apply it. Could I get a recommendation or a fix for my problem, please?
This is my code in Python:
import random
import math

grid = [[random.randint(0, 1) for i in range(100)] for j in range(100)]
# clear starting and end point of potential obstacles
grid[2][2] = 0
grid[95][95] = 0
init = [5, 5]
goal = [len(grid)-10, len(grid[0])-12]
heuristic = [[0 for row in range(len(grid[0]))] for col in range(len(grid))]
for i in range(len(grid)):
    for j in range(len(grid[0])):
        heuristic[i][j] = abs(i - goal[0]) + abs(j - goal[1])
delta = [[-1,  0],  #up
         [ 0, -1],  #left
         [ 1,  0],  #down
         [ 0,  1]]  #right
delta_name = ['^', '<', 'V', '>'] #The name of above actions
cost = 1 #Each step costs you one
drone_height = 60

def search():
    #open list elements are of the type [g,x,y]
    closed = [[0 for row in range(len(grid[0]))] for col in range(len(grid))]
    action = [[-1 for row in range(len(grid[0]))] for col in range(len(grid))]
    #We initialize the starting location as checked
    closed[init[0]][init[1]] = 1
    expand = [[-1 for row in range(len(grid[0]))] for col in range(len(grid))]
    # we assigned the cordinates and g value
    x = init[0]
    y = init[1]
    g = 0
    h = heuristic[x][y]
    f = g + h
    #our open list will contain our initial value
    open = [[f, g, h, x, y]]
    found = False  #flag that is set when search complete
    resign = False #Flag set if we can't find expand
    count = 0
    #print('initial open list:')
    #for i in range(len(open)):
    #    print('    ', open[i])
    #print('----')
    while found is False and resign is False:
        #Check if we still have elements in the open list
        if len(open) == 0: #If our open list is empty, there is nothing to expand.
            resign = True
            print('Fail')
            print('############# Search terminated without success')
            print()
        else:
            #if there are still elements in our list
            #remove node from list
            open.sort()
            open.reverse() #reverse the list
            next = open.pop()
            #print('list item')
            #print('next')
            x = next[3]
            y = next[4]
            g = next[1]
            expand[x][y] = count
            count += 1
            #Check if we are done
            if x == goal[0] and y == goal[1]:
                found = True
                print(next) #The three elements above this "if".
                print('############## Search is success')
                print()
            else:
                #expand winning element and add to new open list
                for i in range(len(delta)):
                    x2 = x + delta[i][0]
                    y2 = y + delta[i][1]
                    #if x2 and y2 falls into the grid
                    if x2 >= 0 and x2 < len(grid) and y2 >= 0 and y2 <= len(grid[0])-1:
                        #if x2 and y2 not checked yet and there is not obstacles
                        if closed[x2][y2] == 0 and grid[x2][y2] == 0:
                            g2 = g + cost #we increment the cost
                            h2 = heuristic[x2][y2]
                            f2 = g2 + h2
                            open.append([f2, g2, h2, x2, y2])
                            #print('append list item')
                            #print([g2,x2,y2])
                            #Then we check them to never expand again
                            closed[x2][y2] = 1
                            action[x2][y2] = i
    for i in range(len(expand)):
        print(expand[i])
    print()
    policy = [[' ' for row in range(len(grid[0]))] for col in range(len(grid))]
    x = goal[0]
    y = goal[1]
    policy[x][y] = '*'
    while x != init[0] or y != init[1]:
        x2 = x - delta[action[x][y]][0]
        y2 = y - delta[action[x][y]][1]
        policy[x2][y2] = delta_name[action[x][y]]
        x = x2
        y = y2
    for i in range(len(policy)):
        print(policy[i])

search()
Just import time and do a basic calculation like:
...
import time
t1 = time.time()
#your code
t2 = time.time()
runtime = t2 - t1 #Here time is in seconds
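Applied to the code in the question, that might look like this (a sketch timing just the search() call):

import time

t1 = time.time()
search()
t2 = time.time()
print('search() took', t2 - t1, 'seconds')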
The simplest way to use the timeit module is as follows:
def search(...):
    . . .

timeit.Timer(function).timeit(number=NUMBER)
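For example, a single timed run of the question's search() function could look like this (assuming search is defined or imported in the same module):

import timeit

# time one call of search(); the result is the elapsed time in seconds
elapsed = timeit.timeit(search, number=1)
print(elapsed)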
Alternatively, you could use the time terminal command.
This allows you to measure the execution time of any command you type.
You would simply do
time [your command here]
If your python file was called program.py, then you would just type
time python program.py

Order data depending on multiple range criteria

I am trying to order data using multiple ranges. Let's suppose I have some data in tt array:
n= 50
b = 20
r = 3
tt = np.array([[[3]*r]*b]*n)
and other values in a list:
z = (np.arange(0,5,0.1)).tolist()
Now I need to sort the data from tt depending on ranges of z: the first range is between 0 and 1, the next between 1 and 2, the next between 2 and 3, and so on.
My attempt so far has been to create an array with the length of each range and use those lengths to cut data from tt. It looks something like this:
za = []
za2 = []
za3 = []
za4 = []
za5 = []
za6 = []
za7 = []
for y in range(50):
    if 0 <= int(z[y]) < 1:
        za.append(z[y])
        zi = array([int(len(za))])
    if 1 <= int(z[y]) < 2:
        za2.append(z[y])
        zi2 = array([int(len(za2))])
    if 2 <= int(z[y]) < 3:
        za3.append(z[y])
        zi3 = array([int(len(za3))])
    if 3 <= int(z[y]) < 4:
        za4.append(z[y])
        zi4 = array([int(len(za4))])
    if 4 <= int(z[y]) < 5:
        za5.append(z[y])
        zi5 = array([int(len(za5))])
    if 5 <= int(z[y]) < 6:
        za6.append(z[y])
        zi6 = array([int(len(za6))])
    if 6 <= int(z[y]) < 7:
        za7.append(z[y])
        zi7 = array([int(len(za7))])
till = np.concatenate((np.array(zi), np.array(zi2), np.array(zi3), np.array(zi4), np.array(zi5), np.array(zi6), np.array(zi7)))
ttn = []
for p in range(50):
    #if hour_lenght[p] != []
    tt_h = np.empty(shape(tt[0:till[p], :, :]))
    tt_h[:] = np.nan
    tt_h = tt_h[np.newaxis, :, :, :]
    tt_h[np.newaxis, :, :, :] = tt[0:till[p], :, :]
    ttn.append(tt_h)
As you can guess, I get an error "name 'zi6' is not defined", since there is no data in that range. But at least it does the job for the parts that do exist :D. However, if I include an else statement after the if and do something like:
for y in range(50):
    if 0 <= int(z[y]) < 1:
        za.append(z[y])
        zi = np.array([int(len(za))])
    else:
        zi = np.array([np.nan])
then my initial zi from the first part gets overwritten with nan.
I should also point out that the ultimate goal is to load multiple files that have a shape similar to tt (the two last dimensions are always the same, while the first one changes), e.g.:
tt.shape
(50, 20, 3)
and some other tt2 is having shape:
tt2.shape
(55, 20, 3)
with z2 having values between 5 and 9:
z2 = (np.arange(5,9,0.1)).tolist()
So in the end I should end up with an array ttn where
ttn[0] is filled with values from tt in the range between 0 and 1,
ttn[1] is filled with values from tt between 1 and 2, and so on.
I very much appreciate suggestions and possible solutions on this issue.
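One possible direction (a sketch, not a tested solution for the full multi-file case): compute an integer bin index for every entry of z with numpy.digitize and then select the matching slices of tt with a boolean mask, which avoids the hand-written za/zi variables and the undefined-name problem for empty ranges:

import numpy as np

n, b, r = 50, 20, 3
tt = np.array([[[3] * r] * b] * n)
z = np.arange(0, 5, 0.1)

# integer bin edges 0, 1, 2, ...; digitize returns the bin index of each z value
edges = np.arange(0, np.ceil(z.max()) + 1)
bins = np.digitize(z, edges) - 1          # 0 for [0, 1), 1 for [1, 2), ...

# ttn[k] holds the slices of tt whose z value falls into bin k (empty bins give empty arrays)
ttn = [tt[bins == k] for k in range(len(edges) - 1)]
for k, chunk in enumerate(ttn):
    print(k, chunk.shape)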
