Pure Python function to calculate difference between elements

Pure Python function to calculate difference between elements - python

I had this pandas series s:
3/28/22 127943
3/29/22 127970
3/30/22 127997
3/31/22 128019
4/1/22 128052
4/2/22 128059
4/3/22 128065
4/4/22 128086
4/5/22 128106
4/6/22 128144
I tried to write a lambda function to calculate the difference between elements (instead of using pandas built-in function diff) but it resulted in an error
s.apply(lambda i, j: j - i for i, j in zip(s[:-1], s[1:])
File ~\Anaconda3\lib\site-packages\pandas\core\apply.py:412, in Apply.agg_list_like(self)
410 # if we are empty
411 if not len(results):
--> 412 raise ValueError("no results")
414 if len(failed_names) > 0:
415 warnings.warn(
416 depr_nuisance_columns_msg.format(failed_names),
417 FutureWarning,
418 stacklevel=find_stack_level(),
419 )
ValueError: no results
What did the error mean? Is there a way to fix it, or a better method to achieve a similar result of applying built-in function diff?
Can I extend the fixed/new function to dataframe instead of series?
Thanks

Related

0 is not in range in pandas

So I am trying to load my data from jupyter into mysql workbench using the one "multiple row" insert statement. I will achieve that with a for loop and i am receiving some error messages.
First, a little background:
So I had my csv file which contains data set for preprocessing and I split into 2 here:
Before_handwashing=copy_monthly_df.iloc[:76]
After_handwashig=copy_monthly_df.iloc[76:]
I have successfully structured and loaded the first data set Before_handwashing into mysql work bench using this for loop below.
for x in range(Before_handwashing.shape[0]):
insert_query+='('
for y in range(Before_handwashing.shape[1]):
insert_query+= str(Before_handwashing[Before_handwashing.columns.values[y]][x])+', '
insert_query=insert_query[:-2]+'), '
Now I want to structure and load my second part of the dataset which is After_handwashig into mysql workbench using a similar code structure here.
for x in range(After_handwashig.shape[0]):
insert_query+='('
for y in range(After_handwashig.shape[1]):
insert_query+=str(After_handwashig[After_handwashig.columns.values[y]][x])+', '
insert_query=insert_query[:-2]+'), '
And I am recieving the following error messages
error message: ValueError
Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\range.py in get_loc(self, key, method, tolerance)
384 try:
--> 385 return self._range.index(new_key)
386 except ValueError as err:
ValueError: 0 is not in range
The above exception was the direct cause of the following exception:
KeyError
Traceback (most recent call last) ~\AppData\Local\Temp/ipykernel_4316/2677185951.py in <module>
2 insert_query+='('
3 for y in range(After_handwashig.shape[1]):
----> 4 insert_query+=str(After_handwashig[After_handwashig.columns.values[y]][x])+', '
5 insert_query=insert_query[:-2]+'), '
~\anaconda3\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
940
941 elif key_is_scalar:
--> 942 return self._get_value(key)
943
944 if is_hashable(key):
~\anaconda3\lib\site-packages\pandas\core\series.py in
_get_value(self, label, takeable)
1049
1050 # Similar to Index.get_value, but we do not fall back to positional
-> 1051 loc = self.index.get_loc(label)
1052 return self.index._get_values_for_loc(self, loc, label)
1053
~\anaconda3\lib\site-packages\pandas\core\indexes\range.py in
get_loc(self, key, method, tolerance)
385 return self._range.index(new_key)
386 except ValueError as err:
--> 387 raise KeyError(key) from err
388 raise KeyError(key)
389 return super().get_loc(key, method=method, tolerance=tolerance)
KeyError: 0
Can someone help me out in answering this problem?

OK, it took me a moment to find this. Consider these statements:
Before_handwashing=copy_monthly_df.iloc[:76]
After_handwashig=copy_monthly_df.iloc[76:]
When these are done, Before contains lines with indexes 0 to 75. After contains lines starting with index 76. There is no line with index 0, so your attempt to access it causes a key error.
There are two solutions. One is to use iloc to reference lines by ordinal instead of by index:
insert_query+=str(After_handwashig[After_handwashig.columns.values[y]]./iloc(x))+', '
The other is to reset the indexes to start from 0:
After_handwashing.reset_index(drop=True, inplace=True)
There's really no point in splitting the dataframe like that. Just have your first loop do range(76): and the second do range(76,copy_monthly_dy.shape[1]):.

Integer variable for range "not found on sheet"

I continue to have issues with my code (I am self-taught so I'm far from an expert in Python or XLwings). My method is as follows:
I copy a 4 column set of data (again of varying lengths) to a 2d array.
The data is then compared against a newly found set of a data and the matches are removed from the copied array.
The contents of the columns that were assigned to the original 2d array are then cleared of their contents.
The same array (with the matches deleted), is replaced at the same starting cell as the previous array.
In short, I copy the columns, remove the matches, clear the original cells, and then paste the new contents at the same starting cell as the original columns. (Since the length of the column is not fixed, I assign this cell row to a variable "sht2Row".)
However, about 10% of the time, I get the following error:
104 sht2.range((5,3),(sht2Row,10)).clear_contents()
105 time.sleep(0.5)
--> 106 sht2.range((5,12),(sht2Row,16)).clear_contents()
107 time.sleep(0.5)
108 sht2.range("C5").value=BoughtData
~\anaconda3\lib\site-packages\xlwings\main.py in range(self, cell1, cell2)
862 raise ValueError("Second range is not on this sheet")
863 cell2 = cell2.impl
--> 864 return Range(impl=self.impl.range(cell1, cell2))
865
866 #property
~\anaconda3\lib\site-packages\xlwings\_xlwindows.py in range(self, arg1, arg2)
635 xl2 = self.xl.Range(arg2)
636
--> 637 return Range(xl=self.xl.Range(xl1, xl2))
638
639 #property
~\anaconda3\lib\site-packages\xlwings\_xlwindows.py in __call__(self, *args, **kwargs)
64 while True:
65 try:
---> 66 v = self.__method(*args, **kwargs)
67 if isinstance(v, (CDispatch, CoClassBaseClass, DispatchBaseClass)):
68 return COMRetryObjectWrapper(v)
~\AppData\Local\Temp\gen_py\3.8\00020813-0000-0000-C000-000000000046x0x1x9.py in Range(self, Cell1, Cell2)
47370 # The method Range is actually a property, but must be used as a method to correctly pass the arguments
47371 def Range(self, Cell1=defaultNamedNotOptArg, Cell2=defaultNamedOptArg):
> 47372 ret = self._oleobj_.InvokeTypes(197, LCID, 2, (9, 0), ((12, 1), (12, 17)),Cell1 , Cell2)
47374 if ret is not None:
com_error: (-2147352567, 'Exception occurred.', (0, None, None, None,
0, -2146827284), None)
Does anyone know why this would occur? In the past, I have occasionally received this error because I was working on code that talks to these sells but most of the time lately, I won't even be at the computer but will still receive this error. I don't understand this. If it is some issue with sht2Row then why isn't it occurring the first time I use the variable? Regardless, how is it possible to get a "Second range is not on this sheet" error when sht2Row is just an integer? I am really at a loss as to why this would only occasionally occur and why it would say that the range is not on the sheet...
This is my first time using this site. I normally try to figure my errors out on my own but this one has me stumped...

TypingError: Failed in nopython mode pipeline (step: nopython frontend)

I am trying to write my first function using numba jit, I have a pandas dataframe that I need to iterate through and find the root mean square for each 350 points, since the for loop of python is quite slow I decided to try numba jit, the code is:
#jit(nopython=True)
def find_rms(data, length):
res = []
for i in range(length, len(data)):
interval = np.array(data[i-length:i])
interval =np.power(interval, 2)
sum = interval.sum()
resI = sum/length
resI = np.sqrt(res)
res.appennd(resI)
return res
mydf = np.array(df.iloc[:]['c0'], dtype=np.float64)
df.iloc[350:]['rms'] = find_rms(mydf, 350)
I read somewhere thad I need to specify datatypes, therefore I wrote "dtype = np.float64" but I still get the error as:
---------------------------------------------------------------------------
TypingError Traceback (most recent call last)
<ipython-input-39-4d388f72efdc> in <module>
----> 1 df.iloc[350:]['rms'] = find_rms(mydf, 350.0)
c:\users\1\appdata\local\programs\python\python35\lib\site-packages\numba\dispatcher.py in _compile_for_args(self, *args, **kws)
346 e.patch_message(msg)
347
--> 348 error_rewrite(e, 'typing')
349 except errors.UnsupportedError as e:
350 # Something unsupported is present in the user code, add help info
c:\users\1\appdata\local\programs\python\python35\lib\site-packages\numba\dispatcher.py in error_rewrite(e, issue_type)
313 raise e
314 else:
--> 315 reraise(type(e), e, None)
316
317 argtypes = []
c:\users\1\appdata\local\programs\python\python35\lib\site-packages\numba\six.py in reraise(tp, value, tb)
656 value = tp()
657 if value.__traceback__ is not tb:
--> 658 raise value.with_traceback(tb)
659 raise value
660
TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Invalid use of Function(<built-in function array>) with argument(s) of type(s): (array(float64, 1d, C))
* parameterized
In definition 0:
TypingError: array(float64, 1d, C) not allowed in a homogeneous sequence
raised from c:\users\1\appdata\local\programs\python\python35\lib\site-packages\numba\typing\npydecl.py:463
In definition 1:
TypingError: array(float64, 1d, C) not allowed in a homogeneous sequence
raised from c:\users\1\appdata\local\programs\python\python35\lib\site-packages\numba\typing\npydecl.py:463
This error is usually caused by passing an argument of a type that is unsupported by the named function.
[1] During: resolving callee type: Function(<built-in function array>)
[2] During: typing of call at <ipython-input-34-edd252715b2d> (5)
File "<ipython-input-34-edd252715b2d>", line 5:
def find_rms(data, length):
<source elided>
for i in range(length, len(data)):
interval = np.array(data[i-length:i])
^
This is not usually a problem with Numba itself but instead often caused by
the use of unsupported features or an issue in resolving types.
To see Python/NumPy features supported by the latest release of Numba visit:
http://numba.pydata.org/numba-doc/dev/reference/pysupported.html
and
http://numba.pydata.org/numba-doc/dev/reference/numpysupported.html
For more information about typing errors and how to debug them visit:
http://numba.pydata.org/numba-doc/latest/user/troubleshoot.html#my-code-doesn-t-compile
If you think your code should work with Numba, please report the error message
and traceback, along with a minimal reproducer at:
https://github.com/numba/numba/issues/new
Does anybody know what the problem is?

You had a typo in append and I think you also made a mistake with what the square root is to be taken of (I believe resI not res).
Other than that, the only problem was the initialization of interval. Numba doesn't want you to pass a numpy array to a numpy array. It doesn't help with anything to wrap the np.array around the slice of the array, python simply doesn't care if you do that and treats the code like you didn't but Numba in nopython mode does care and throws an error. Leaving that part out solved the problem.
#jit(nopython=True)
def find_rms(data, length):
res = []
for i in range(length, len(data)):
interval = data[i-length:i]
interval = np.power(interval, 2)
sum = interval.sum()
resI = sum/length
resI = np.sqrt(resI)
res.append(resI)
return res
mydf = np.array(df.iloc[:]['c0'], dtype=np.float64)
target = find_rms(mydf, 350)

Trouble minimizing a value in python

I'm trying to minimize a value dependant of a function (and therefore optimize the arguments of the function) so the latter matches some experimental data.
Problem is that I don't actually know if I'm coding what I want correctly, or even if I'm using the correct function, because my program gives me an error.
import scipy.optimize as op
prac3 = pd.read_excel('Buena.xlsx', sheetname='nl1')
print(prac3.columns)
tmed = 176
te = np.array(prac3['tempo'])
t = te[0:249]
K = np.array(prac3['cond'])
Kexp = K[0:249]
Kinf = 47.8
K0 = 3.02
DK = Kinf - K0
def f(Kinf,DK,k,t):
return (Kinf-DK*np.exp(-k*t))
def err(Kexp,Kcal):
return ((Kcal-Kexp)**2)
Kcal = np.array(f(Kinf,DK,k,t))
print(Kcal)
dif = np.array(err(Kexp,Kcal))
sumd = sum(dif)
print(sumd)
op.minimize(f, (Kinf,DK,k,t))
The error the program gives me reads as it follows:
ValueError Traceback (most recent call last)
<ipython-input-91-fd51b4735eed> in <module>()
48 print(sumd)
49
---> 50 op.minimize(f, (Kinf,DK,k,t))
51
52
~/anaconda3_501/lib/python3.6/site-packages/scipy/optimize/_minimize.py in minimize(fun, x0, args, method, jac, hess, hessp, bounds, constraints, tol, callback, options)
352
353 """
--> 354 x0 = np.asarray(x0)
355 if x0.dtype.kind in np.typecodes["AllInteger"]:
356 x0 = np.asarray(x0, dtype=float)
~/anaconda3_501/lib/python3.6/site-packages/numpy/core/numeric.py in asarray(a, dtype, order)
529
530 """
--> 531 return array(a, dtype, copy=False, order=order)
532
533
ValueError: setting an array element with a sequence.

The exception says that you're passing an array to something that expects a callable. Without seeing your traceback or knowing more of what you're trying to do, I can only guess where this is happening, but my guess is here:
op.minimize(f(Kinf,DK,k,t),sumd)
From the docs, the first parameter is a callable (function). But you're passing whatever f(Kinf,DK,k,t) returns as the first argument. And, looking at your f function, it looks like it's returning an array, not a function.
My first guess is that you want to minimize f over the args (Kinf, DK, k, t)? if so, you pass f as the function, and the tuple (Kinf, DK, k, t) as the args, like this:
op.minimize(f, sumd, (Kinf,DK,k,t))

using pool.map to apply function to list of strings in parallel?

I have a large list of http user agent strings (taken from a pandas dataframe) that I am trying to parse using the python implementation of ua-parser. I can parse the list fine when only using a single thread, but based on some preliminary speed testing, it'd take me well over 10 hours to run the whole dataset.
I am trying to use pool.map() to decrease processing time but can't quite seem to figure out how to get it to work. I've read about a dozen 'tutorials' that I found online and have searched SO (likely a duplicate of some sort, as there are a lot of similar questions), but none of the dozens of attempts have worked for one reason or another. I'm assuming/hoping it's an easy fix.
Here is what I have so far:
from ua_parser import user_agent_parser
http_str = df['user_agents'].tolist()
def uaparse(http_str):
for i, item in enumerate(http_str):
return user_agent_parser.Parse(http_str[i])
pool = mp.Pool(processes=10)
parsed = pool.map(uaparse, range(0,len(http_str))
Right now I'm seeing the following error message:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-25-701fbf58d263> in <module>()
7
8 pool = mp.Pool(processes=10)
----> 9 results = pool.map(uaparse, range(0,len(http_str)))
/home/ubuntu/anaconda/lib/python2.7/multiprocessing/pool.pyc in map(self, func, iterable, chunksize)
249 '''
250 assert self._state == RUN
--> 251 return self.map_async(func, iterable, chunksize).get()
252
253 def imap(self, func, iterable, chunksize=1):
/home/ubuntu/anaconda/lib/python2.7/multiprocessing/pool.pyc in get(self, timeout)
565 return self._value
566 else:
--> 567 raise self._value
568
569 def _set(self, i, obj):
TypeError: 'int' object is not iterable
Thanks in advance for any assistance/direction you can provide.

It seems like all you need is:
http_str = df['user_agents'].tolist()
pool = mp.Pool(processes=10)
parsed = pool.map(user_agent_parser.Parse, http_str)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Pure Python function to calculate difference between elements - python

Related

0 is not in range in pandas

Integer variable for range "not found on sheet"

TypingError: Failed in nopython mode pipeline (step: nopython frontend)

Trouble minimizing a value in python

using pool.map to apply function to list of strings in parallel?

Categories

Resources