Count iterations of pandas dataframe apply function - python

I'm looking for a way to count how many times my database function runs.
My code looks like this
df = pd.read_csv("nodos_results3.csv")
df['URL_dashboard'] = df.apply(create_url, axis = 1)
df.to_csv('nodos_results4.csv', index = False)
I want to count how many times the function "create_url" runs. If I was in C++, for example, I would simply have the function take in another input
void create_url(database i_DB, int &i_count)
{
    // stuff I want done to database
    i_count++;
}
but I'm not sure how to do something equivalent using pandas dataframe apply.
Update: For anyone who might find this while googling in the future - I personally didn't solve this issue. I was moved to another project and didn't continue working on this. Sorry I can't be more help.

apply executes the function once for each row, so the function should run df.shape[0] times. Correction: as @juanpa.arrivillaga points out, apply can execute the function twice on the first row (older pandas versions do this), in which case the correct answer is df.shape[0]+1.
Alternatively, create a global variable (say, create_url_counter=0) and increment it in the function body:
def create_url(...):
    global create_url_counter
    ...
    create_url_counter += 1
Be aware, though, that having global variables is in general a bad idea.
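One way to avoid the global is to wrap the function so that the wrapper keeps the count itself. The following is a minimal sketch, not the original poster's code: the count_calls decorator, the body of create_url, and the "id" field are all hypothetical, and plain dicts stand in for DataFrame rows so the example runs without pandas.

```python
import functools


def count_calls(func):
    """Wrap func so that every call increments wrapper.calls."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


@count_calls
def create_url(row):
    # Hypothetical body: build a URL from the row's "id" field.
    return "https://example.com/dashboard/" + str(row["id"])


# Plain dicts stand in for DataFrame rows (an assumption for illustration).
rows = [{"id": 1}, {"id": 2}, {"id": 3}]
urls = [create_url(row) for row in rows]
print(create_url.calls)
```

With pandas, the same wrapped function could be passed to df.apply(create_url, axis=1) and create_url.calls read afterwards; the count then reflects however many times pandas actually called it.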

Related

Running the same code but with two different datasets (inputs)

I have code in JupyterLab that consists of several functions spread across several cells. The first function generates a dataset that is used by all the functions after it.
What I have to do is run the same code twice, but with one of the functions modified. So it would look like this:
data_generating_function() # this function should only be run once so it generates the same dataset for both trials
function_1() # this is the function that is to be modified once, so there are two versions of this function
function_2() # this function and all functions below it stay the same but should be run twice
function_3()
function_4()
function_5()
So I would run data_generating_function() once and generate the dataset. Then I would run one version of function_1() and all the functions below it, then run the other version of function_1() and all the functions below it again.
What would be a good way to implement this? I could obviously duplicate the code and just change some function names, or I could put it all into a single cell and create a for loop. However, is there a better way that would ideally preserve multiple cells too?
Thank you
Simply iterate over your two choices for the first function:
data_generating_function()
for func1 in (function1a, function1b):
    func1()
    function_2()
    function_3()
    function_4()
    function_5()
Apologies if I misunderstood the question, but could you not do the following:
Cell 1:
# define all functions
Cell 2:
dataset = data_generating_function()
Cell 3:
# Run version 1 of function 1 on dataset
result_1_1 = function_1_v1(dataset)
result_2_1 = function_2(result_1_1)
result_3_1 = function_3(result_2_1)
function_4(result_3_1)
Cell 4:
# Run version 2 of function 1 on dataset
result_1_2 = function_1_v2(dataset)
result_2_2 = function_2(result_1_2)
result_3_2 = function_3(result_2_2)
function_4(result_3_2)
This solution assumes that:
you define functions with return values
that passing around the results is not "expensive"
You can also persist the results in a file if the latter is not the case.
To reduce code duplication in function_1, you can add a parameter that switches between the two versions.
You should try to avoid modifying or directly iterating over functions whenever possible. The best thing to do in this case would be to add a boolean parameter to function1 specifying which version of the function you want to run. It would look something like this:
def function1(isFirstTime):
    if isFirstTime:
        # do stuff the first time
        pass
    else:
        # do stuff the second time
        pass
You could then iterate over the functions:
data_generating_function()
for b in (True, False):
    function1(b)
    function2()
    function3()
    # ...
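The approaches above can be combined: generate the dataset once, then hand the chosen version of the first step to a helper that runs the shared steps. This is only a sketch under assumptions: run_pipeline, the two function_1 variants, and the toy function bodies are hypothetical, and every function is assumed to return its result.

```python
def data_generating_function():
    # Hypothetical dataset: a small list of numbers.
    return [1, 2, 3]


def function_1_v1(data):
    # Hypothetical first variant of the step that changes between runs.
    return [x + 1 for x in data]


def function_1_v2(data):
    # Hypothetical second variant.
    return [x * 10 for x in data]


def function_2(data):
    # Stands in for the unchanged downstream steps.
    return sum(data)


def run_pipeline(dataset, first_step):
    """Apply the chosen first step, then the shared downstream steps."""
    intermediate = first_step(dataset)
    return function_2(intermediate)


dataset = data_generating_function()  # generated once, reused for both runs
results = {step.__name__: run_pipeline(dataset, step)
           for step in (function_1_v1, function_1_v2)}
print(results)
```

In a notebook, run_pipeline could live in one cell and each call in its own cell, which keeps the multi-cell layout without duplicating the downstream code.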

Possible and/or wise to assign a method to a variable?

I'm repeatedly trying to get similar data from time series dataframes. For example, the monthly (annualized) standard deviation of the series:
any_df.resample('M').std().mean()*12**(1/2)
It would save typing and would probably limit errors if these methods could be assigned to a variable, so they can be re-used - I guess this would look something like
my_stdev = .resample('M').std().mean()*12**(1/2)
result = any_df.my_stdev()
Is this possible and if so is it sensible?
Thanks in advance!
Why not just make your own function?
def my_stdev(df):
    return df.resample('M').std().mean()*12**(1/2)
result = my_stdev(any_df)

Unexpected Behavior with Pandas Datetime/ dt.date

I found a bug in my code: the date column of my dataframe contained only the date, without the hours and minutes. I traced the cause to running these two functions consecutively rather than one at a time. If I run the functions one by one, there is no problem. If I run them both, my results are unexpected.
I need to run these functions consecutively, but they are not dependent on one another. I'm new to Python, so I thought this might be due to the inputs being overwritten or something (not that that would have happened in Java, as far as I know). So, I changed the functions to be as follows:
import pandas as pd

def func1(dataset):
    originalData = dataset
    # only look at one day at a time - remove extra unnecessary info
    originalData['Date'] = pd.to_datetime(originalData['Date'])
    print(dataset, 'test1')
    originalData['Date'] = originalData['Date'].dt.date
    print(dataset, 'test2')
    # other stuff

def func2(dataset):
    originalData2 = dataset
    # look at entire datetime
    originalData2['Date'] = pd.to_datetime(originalData2['Date'])
    print(originalData2)
    # other stuff
Run like this, I lose the time in the second function.
csv = pd.read_csv(csvFileName)
func1(csv)
func2(csv)
Run like this, func2 results in my desired output:
csv = pd.read_csv(csvFileName)
func2(csv)
The weird thing is that if I run func1, test1 prints the date with the time, while test2 prints only the date. The dataset is being changed even though the changes are applied to originalData. Am I misunderstanding something? Thanks in advance.
If you don't want to change the caller's data, copy it at the top of the function: originalData = dataset.copy(). The plain assignment originalData = dataset does not copy anything; it only gives the same DataFrame a second name, so edits made through either name show up in both. copy() makes a deep copy by default, so you only edit the data inside the function and leave the underlying object untouched.
Odd behavior, yes.
You may also run into this when taking slices of dataframes and doing transformations on them.
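The effect is not specific to pandas; it comes from how Python assignment works. The sketch below reproduces it with a plain dict standing in for the DataFrame (an assumption made so the example runs without pandas). The function names and the sample dates are hypothetical, and copy.deepcopy plays the role that dataset.copy() plays for a DataFrame.

```python
import copy


def strip_time_in_place(dataset):
    original_data = dataset  # same object, just a second name
    original_data["Date"] = [d.split(" ")[0] for d in original_data["Date"]]


def strip_time_on_copy(dataset):
    original_data = copy.deepcopy(dataset)  # independent object
    original_data["Date"] = [d.split(" ")[0] for d in original_data["Date"]]
    return original_data


# Hypothetical sample data: date strings that include a time.
table = {"Date": ["2020-01-01 10:30", "2020-01-02 11:45"]}

trimmed = strip_time_on_copy(table)
unchanged_after_copy = table["Date"][0]  # caller's data still has the time

strip_time_in_place(table)
changed_after_alias = table["Date"][0]  # caller's data has lost the time

print(unchanged_after_copy)
print(changed_after_alias)
```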

How can I update different arguments to the same function?

I have a function R(t) defined as:
def R(t):
    if t > 0:
        return "R"
    else:
        return 0
Now, I want to have several instances of this function working at once. For example, whenever a given condition is met, I would like to generate an R(2) and then, for every iteration of a for loop, I would like to subtract 1 from the argument. My problem comes when the condition is met several times in distinct iterations of the loop, so while there might be an R(2) just appearing, there will be an R(1) turning into an R(0).
I am just learning how to use Python, but I don't think the code needed to accomplish what I want is very hard. I believe that if I define R(t, j) and use j as an indexing parameter, it might be easier to code.
A recursive function that does what you want is:
def R(t):
    if t > 0:
        print("R")
        return R(t-1)
    else:
        print(t)
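If several countdowns need to be alive at the same time, one possible reading of the question is to keep a list with one remaining-time value per active R and age them all on every loop iteration. This is a sketch of that interpretation, not the asker's code: simulate, triggers, and start are hypothetical names, and the booleans in triggers stand in for "the condition was met on this iteration".

```python
def simulate(triggers, start=2):
    """Return, per step, the countdown values that are still active.

    triggers: one boolean per loop iteration (hypothetical condition results).
    start: initial argument for each new countdown, e.g. 2 for R(2).
    """
    active = []   # one remaining-time value per live "R"
    history = []
    for triggered in triggers:
        # Age every existing countdown by one step and drop the expired ones.
        active = [t - 1 for t in active if t - 1 > 0]
        if triggered:
            active.append(start)  # a fresh R(start) appears
        history.append(list(active))
    return history


print(simulate([True, True, False, False]))
```

On the second step the list holds both a 1 (the older countdown) and a 2 (the new one), which is the overlapping situation described in the question.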

numpy.delete() requires me to return a new matrix (name) and that can't work with my code

I have a matrix that is built at the start of the class and then used within a couple of functions. In the last function I want to delete a row from the matrix without using a new variable name for it; otherwise I would have to change every single function call (it loops until a winner is found). It works roughly like this:
matrix defined
...a lot of functions that use it then winner is called
def winner():
    if hasWon():
        .....
    else:
        fun1()
def fun1():
    .....
    Iterativefun()
def Iterativefun():
    .....
    matrix = numpy.delete(matrix, obj, axis)
    winner()
Is there a way to delete a row? I thought of changing every single number in that row from 1-10 to a 0 which I'm not using so it would be ignored. Any help will be appreciated
I'm not sure what version of Numpy you have (or exactly what you're trying to do for that matter), but if you have Numpy 1.7, numpy.squeeze looks like it might be what you're looking for... http://docs.scipy.org/doc/numpy/reference/generated/numpy.squeeze.html#numpy.squeeze
If changing a row to all zeros is acceptable, you can just use:
matrix[row] = 0
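Another option, assuming the matrix really is stored on a class, is to keep it as an attribute and rebind that attribute to the new matrix, so no other function needs a new name. The sketch below is hypothetical (the Game class and its methods are invented for illustration) and uses a plain list of lists so it runs without NumPy; with NumPy the rebinding line would be self.matrix = numpy.delete(self.matrix, row, axis=0).

```python
class Game:
    def __init__(self, matrix):
        self.matrix = matrix  # shared by every method through self

    def remove_row(self, row):
        # Build a new matrix without the row and rebind the same attribute,
        # as one would with the array returned by numpy.delete.
        self.matrix = [r for i, r in enumerate(self.matrix) if i != row]

    def row_count(self):
        return len(self.matrix)


game = Game([[1, 2], [3, 4], [5, 6]])
game.remove_row(1)
print(game.matrix)
```

Every method that reads self.matrix afterwards sees the matrix with the row removed.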
