How to make timeit recognize defined inputs

I have a df defined that I am successfully running operations on. I want to time the difference between iterative for loops and vectorized operations. I have read various examples of how to use timeit, but when I try them I am getting the errors below. What am I doing wrong?
Imports:
import h5py
import pandas as pd
import timeit
This loop works:
for u in df['owner'].unique():
    print(u, ': ', len(df[(df['owner'] == u)]), sep='')
But when I try to time it like so ...:
s = """\
for u in df['owner'].unique():
    print(u, ': ', len(df[(df['owner'] == u)]), sep='')"""
time_iter_1_1_1 = timeit.timeit(s)
... it produces this error:
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-34-7526e96d565c> in <module>()
3 # print(u, ': ', len(df[(df['owner'] == u)]), sep = '')""")
4
----> 5 time_iter_1_1_1 = timeit.timeit(s)
~\Anaconda2\envs\py36\lib\timeit.py in timeit(stmt, setup, timer, number, globals)
231 number=default_number, globals=None):
232 """Convenience function to create Timer object and call timeit method."""
--> 233 return Timer(stmt, setup, timer, globals).timeit(number)
234
235 def repeat(stmt="pass", setup="pass", timer=default_timer,
~\Anaconda2\envs\py36\lib\timeit.py in timeit(self, number)
176 gc.disable()
177 try:
--> 178 timing = self.inner(it, self.timer)
179 finally:
180 if gcold:
~\Anaconda2\envs\py36\lib\timeit.py in inner(_it, _timer)
NameError: name 'df' is not defined
And when I try this ...:
time_iter_1_1_1 = timeit.timeit(
    """for u in df['owner'].unique():
    print(u, ': ', len(df[(df['owner'] == u)]), sep='')""")
... I get this error:
ERROR:root:An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line string', (1, 57))
...
NameError: name 'df' is not defined
The df is defined and working. How can I fix this?

timeit runs the statement in its own namespace, so names defined in your session are not visible to it by default. There are two options. Either pass a globals argument that lets timeit resolve the name:
df = pd.DataFrame(...)
timeit.timeit(statement, globals={'df': df})  # or globals=globals()
Or pass a setup string that creates df for you:
timeit.timeit(statement, setup='import pandas as pd; df = pd.DataFrame(...)')
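Both options can be sketched side by side. This is a minimal illustration, not the asker's code: it assumes a small list named owners as a stand-in for df['owner'] so that it runs without pandas, and it drops the print call so the timing is not dominated by console output. It also passes number explicitly, because timeit executes the statement 1,000,000 times by default.

```python
import timeit

# Stand-in data (an assumption for this sketch); in the question this
# would be the 'owner' column of the DataFrame df.
owners = ["ann", "bob", "ann", "cid", "bob", "ann"]

# The statement to time: count the rows per owner with an explicit loop.
stmt = """\
counts = {}
for u in set(owners):
    counts[u] = sum(1 for o in owners if o == u)
"""

# Option 1: hand timeit the names it needs through `globals`.
elapsed_globals = timeit.timeit(stmt, globals={"owners": owners}, number=1000)

# Option 2: build the data inside the `setup` string, which runs once
# before the timed statement and is not included in the measurement.
setup = 'owners = ["ann", "bob", "ann", "cid", "bob", "ann"]'
elapsed_setup = timeit.timeit(stmt, setup=setup, number=1000)

print("globals:", elapsed_globals)
print("setup:  ", elapsed_setup)
```

Passing globals=globals() instead of a hand-built dict exposes everything defined in the current session, which is convenient in a notebook.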

Related

Pytest `pytest.raises(ValueError)` does not seem to detect a `ValueError`

EDIT: The issue was that every time I imported the function, it did not pick up my updates. To fix this I needed to do:
import sys, importlib
importlib.reload(sys.modules['foo'])
from foo import bar
After that it started working.
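The reload steps from the EDIT can be sketched end to end. This is a minimal illustration under stated assumptions: foo_demo is a hypothetical throwaway module written to a temporary directory (standing in for foo), and bytecode caching is switched off so the rewritten source is always recompiled. It shows that a name bound with from ... import keeps pointing at the old function until it is imported again after the reload.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Assumption for this sketch: skip .pyc files so edits are always recompiled.
sys.dont_write_bytecode = True

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical module standing in for `foo` from the question.
    module_path = Path(tmp) / "foo_demo.py"
    module_path.write_text("def bar():\n    return 'old'\n")
    sys.path.insert(0, tmp)
    try:
        from foo_demo import bar
        first = bar()                      # result of the original code

        # Simulate editing the source file after it was imported.
        module_path.write_text("def bar():\n    return 'new version'\n")
        importlib.invalidate_caches()
        importlib.reload(sys.modules["foo_demo"])

        stale = bar()                      # `bar` is still the old function object
        from foo_demo import bar           # re-bind the name after the reload
        fresh = bar()                      # now the updated function
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("foo_demo", None)

print(first, stale, fresh)
```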
I am trying to write a test using Pytest to detect a ValueError if a json file passed into a function is invalid. However, when I follow the example, the test doesn't detect that the ValueError was raised.
This is the function I want to test
import pytest
import json
def read_file(input_file):
    try:
        with open(input_file, "r", encoding='utf-8') as reader:
            pre_input_data = json.load(reader)
    except ValueError:
        raise ValueError
And this is my test function
def test_read_file():
    with pytest.raises(ValueError):
        read_file("invalidJsonFile.json")
If I just run the original function, it raises the ValueError
read_file("invalidJsonFile.json")
Invalid json file: Expecting value: line 1 column 1 (char 0)
However, when I run the test, it says it did not get a ValueError
test_read_file()
Invalid json file: Expecting value: line 1 column 1 (char 0)
---------------------------------------------------------------------------
Failed Traceback (most recent call last)
<ipython-input-47-c42b81670a67> in <module>()
----> 1 test_read_file()
2 frames
<ipython-input-46-178e6c645f01> in test_read_file()
1 def test_read_file():
2 with pytest.raises(Exception):
----> 3 read_file("invalidJsonFile.json")
/usr/local/lib/python3.6/dist-packages/_pytest/python_api.py in __exit__(self, *tp)
727 __tracebackhide__ = True
728 if tp[0] is None:
--> 729 fail(self.message)
730 self.excinfo.__init__(tp)
731 suppress_exception = issubclass(self.excinfo.type, self.expected_exception)
/usr/local/lib/python3.6/dist-packages/_pytest/outcomes.py in fail(msg, pytrace)
115 """
116 __tracebackhide__ = True
--> 117 raise Failed(msg=msg, pytrace=pytrace)
118
119
Failed: DID NOT RAISE <class 'Exception'>
Are you sure you're running the same code you posted here? In the stack trace it looks like you're reading a different file, which could be valid, and then no exception is raised:
----> 3 read_file("sampleData.csv")
Also, you do not need to catch ValueError just to re-raise ValueError. With pytest.raises(ValueError), pytest checks whether the raised exception is an instance of ValueError.
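The second point can be checked without pytest. The sketch below is a simplified variant of read_file with the try/except removed (it also returns the parsed data, which the original does not); the helper raises_value_error and the temporary file names are made up for the illustration. Because json.JSONDecodeError is a subclass of ValueError, the error raised by json.load is already something pytest.raises(ValueError) would accept, while a file that parses cleanly raises nothing.

```python
import json
import os
import tempfile


def read_file(input_file):
    # No try/except needed: json.load raises json.JSONDecodeError,
    # which is a subclass of ValueError.
    with open(input_file, "r", encoding="utf-8") as reader:
        return json.load(reader)


def raises_value_error(path):
    """Hypothetical helper: True if read_file(path) raises ValueError."""
    try:
        read_file(path)
    except ValueError:
        return True
    return False


with tempfile.TemporaryDirectory() as tmp:
    bad_path = os.path.join(tmp, "invalid.json")
    good_path = os.path.join(tmp, "valid.json")
    with open(bad_path, "w", encoding="utf-8") as f:
        f.write("this is not json")
    with open(good_path, "w", encoding="utf-8") as f:
        f.write('{"a": 1}')

    bad_raises = raises_value_error(bad_path)    # invalid JSON -> ValueError
    good_raises = raises_value_error(good_path)  # valid JSON -> no exception

print(bad_raises, good_raises)
```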

.csv works fine, .tsv gives 'TypeError: expected string or buffer'

I'm working on a Python script to parse user agent strings and reduce them down to just the 'family' (e.g., Chrome, Firefox, Safari).
I've got a script that works completely fine when run against .csv files, but when I run it against .tsv files it gives me the following error:
TypeError: expected string or buffer
Has anyone else run across this problem? Sample code is below.
import pandas as pd
import numpy as np
import glob as glob
from ua_parser import user_agent_parser as uaparser
# THIS WORKS FINE:
def parse_uagent():
    ua_list = []
    uadf = pd.DataFrame()
    for datafile in glob.glob("*.csv"):
        df = pd.read_csv(datafile, sep=',')
        df = df[['user_agent','date_time','user_name']]
        ua = df[df.columns[0]].values
        for line in ua:
            uagent = uaparser.ParseUserAgent(line)
            ua_list.append(uagent)
    uadf = uadf.append(ua_list)
    print uadf
# THIS GIVES AN ERROR:
def parse_uagent():
    ua_list = []
    uadf = pd.DataFrame()
    for datafile in glob.glob("*.tsv"):
        df = pd.read_csv(datafile, sep='\t')
        df = df[['user_agent','date_time','user_name']]
        ua = df[df.columns[0]].values
        for line in ua:
            uagent = uaparser.ParseUserAgent(line)
            ua_list.append(uagent)
    uadf = uadf.append(ua_list)
    print uadf
Traceback:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-92-14c05dc8ee13> in <module>()
29
30
---> 31 parse_uagent()
32
<ipython-input-92-14c05dc8ee13> in parse_uagent()
19 ua = df[df.columns[0]].values
20 for line in ua:
---> 21 uagent = uaparser.ParseUserAgent(line)
22 ua_list.append(uagent)
23 uadf = uadf.append(ua_list)
/anaconda2/lib/python2.7/site-packages/ua_parser/user_agent_parser.pyc in ParseUserAgent(user_agent_string, **jsParseBits)
247 else:
248 for uaParser in USER_AGENT_PARSERS:
--> 249 family, v1, v2, v3 = uaParser.Parse(user_agent_string)
250 if family:
251 break
/anaconda2/lib/python2.7/site-packages/ua_parser/user_agent_parser.pyc in Parse(self, user_agent_string)
49 def Parse(self, user_agent_string):
50 family, v1, v2, v3 = None, None, None, None
---> 51 match = self.user_agent_re.search(user_agent_string)
52 if match:
53 if self.family_replacement:
TypeError: expected string or buffer
I figured out the issue: ua-parser was failing when it came across empty cells. Removing all rows with NaN before parsing fixed the error.
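The cause can be reproduced without pandas or ua-parser. pandas represents an empty cell as NaN, which is a float, and a regular expression search on a float raises TypeError. The sketch below is only an illustration: the pattern, the sample rows, and the is_missing helper are assumptions standing in for the parser's internals and the real data. In the original script the equivalent filter would be something like df.dropna(subset=['user_agent']) before the loop.

```python
import math
import re

# Hypothetical pattern standing in for one of ua-parser's regexes.
user_agent_re = re.compile(r"Firefox|Chrome|Safari")

# Sample column values (assumed); the NaN plays the role of an empty cell.
rows = [
    "Mozilla/5.0 (X11; Linux x86_64) Firefox/60.0",
    float("nan"),
    "Mozilla/5.0 (Windows NT 10.0) Chrome/70.0",
]


def is_missing(value):
    """True for the float NaN that pandas uses for empty cells."""
    return isinstance(value, float) and math.isnan(value)


# Searching a NaN reproduces the TypeError from the question.
nan_error = None
try:
    user_agent_re.search(rows[1])
except TypeError as exc:
    nan_error = exc

# Dropping the missing values first lets every remaining row parse.
clean_rows = [r for r in rows if not is_missing(r)]
families = [user_agent_re.search(r).group(0) for r in clean_rows]
print(families)
```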

locals() and globals() in stack trace on exception (Python)

While stack traces are useful in Python, most often the data at the root of the problem are missing - is there a way of making sure that at least locals() (and possibly globals()) are added to printed stacktrace?
You can install your own exception hook and output what you need from there:
import sys, traceback
def excepthook(type, value, tb):
    traceback.print_exception(type, value, tb)
    while tb.tb_next:
        tb = tb.tb_next
    print >>sys.stderr, 'Locals:', tb.tb_frame.f_locals
    print >>sys.stderr, 'Globals:', tb.tb_frame.f_globals
sys.excepthook = excepthook
def x():
    y()
def y():
    foo = 1
    bar = 0
    foo/bar
x()
To print vars from each frame in a traceback, change the above loop to
while tb:
    print >>sys.stderr, 'Locals:', tb.tb_frame.f_locals
    print >>sys.stderr, 'Globals:', tb.tb_frame.f_globals
    tb = tb.tb_next
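The snippets above use Python 2 print syntax. A Python 3 sketch of the same idea is shown below, under a few assumptions of its own: the helper name frame_locals is made up for this example, only locals are printed (globals are left out to keep the output short), and the hook is exercised through a caught exception so the example can run to completion.

```python
import sys
import traceback


def frame_locals(tb):
    """Return a list of (function name, locals) for each frame in a traceback."""
    frames = []
    while tb is not None:
        frames.append((tb.tb_frame.f_code.co_name, dict(tb.tb_frame.f_locals)))
        tb = tb.tb_next
    return frames


def excepthook(exc_type, value, tb):
    # Print the normal traceback first, then the locals of every frame.
    traceback.print_exception(exc_type, value, tb)
    for name, local_vars in frame_locals(tb):
        print(f"Locals in {name}: {local_vars!r}", file=sys.stderr)


# Install the hook; it only runs for exceptions nobody catches.
sys.excepthook = excepthook


def x():
    y()


def y():
    foo = 1
    bar = 0
    foo / bar


# Catch the error here so the collected data can be inspected directly.
try:
    x()
except ZeroDivisionError:
    captured = frame_locals(sys.exc_info()[2])

print(captured[-1])
```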
This opens a Pandora's box. Values can be very large in printed form; printing all locals in a stack trace can easily lead to new problems just from the volume of error output. That's why Python does not do this by default.
In small examples, though, i.e. if you know that your values aren't too large to be printed properly, you can step along the traceback yourself:
import sys
import traceback
def c():
    clocal = 1001
    raise Exception("foo")
def b():
    blocal = 23
    c()
def a():
    alocal = 42
    b()
try:
    a()
except Exception:
    frame = sys.exc_info()[2]
    formattedTb = traceback.format_tb(frame)
    frame = frame.tb_next
    while frame:
        print formattedTb.pop(0), '\t', frame.tb_frame.f_locals
        frame = frame.tb_next
The output will be something like this:
File "/home/alfe/tmp/stacktracelocals.py", line 19, in <module>
a()
{'alocal': 42}
File "/home/alfe/tmp/stacktracelocals.py", line 16, in a
b()
{'blocal': 23}
File "/home/alfe/tmp/stacktracelocals.py", line 12, in b
c()
{'clocal': 1001}
And you can, of course, install your own except hook as thg435 suggested in his answer.
If you didn't know about this already, use the pdb post-mortem feature:
x = 3.0
y = 0.0
print x/y
def div(a, b):
    return a / b
print div(x,y)
---------------------------------------------------------------------------
ZeroDivisionError Traceback (most recent call last)
<ipython-input-3-d03977de5fc3> in div(a, b)
1 def div(a, b):
----> 2 return a / b
ZeroDivisionError: float division
import pdb
pdb.pm()
> <ipython-input-3-148da0dcdc9e>(2)div()
0 return a/b
ipdb> l
1 def div(a,b):
----> 2 return a/b
ipdb> a
3.0
ipdb> b
0.0
etc.
There are cases where you really need the prints, of course. Even then, in my opinion, you're better off instrumenting the code (via try/except) to print extra information around the specific exception you are debugging than enabling this for everything.
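Targeted instrumentation of that kind might look like the following Python 3 sketch. It reuses the div function from the session above; the wrapper name div_with_context and the message format are assumptions for the illustration. The wrapper prints the operands only when the specific exception occurs and then re-raises it, so the normal traceback is preserved.

```python
import sys


def div(a, b):
    return a / b


def div_with_context(a, b):
    """Call div, reporting the operands if it fails with ZeroDivisionError."""
    try:
        return div(a, b)
    except ZeroDivisionError:
        # Print only what is needed to diagnose this one failure.
        print(f"div failed with a={a!r}, b={b!r}", file=sys.stderr)
        raise  # re-raise so callers still see the original exception


ok_result = div_with_context(6.0, 3.0)

failed = False
try:
    div_with_context(3.0, 0.0)
except ZeroDivisionError:
    failed = True

print(ok_result, failed)
```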
Try the traceback-with-variables package.
Usage:
from traceback_with_variables import traceback_with_variables
def main():
    ...
    with traceback_with_variables():
        ...your code...
Exceptions then look like this:
Traceback with variables (most recent call last):
File "./temp.py", line 7, in main
return get_avg_ratio([h1, w1], [h2, w2])
sizes_str = '300 200 300 0'
h1 = 300
w1 = 200
h2 = 300
w2 = 0
File "./temp.py", line 10, in get_avg_ratio
return mean([get_ratio(h, w) for h, w in [size1, size2]])
size1 = [300, 200]
size2 = [300, 0]
File "./temp.py", line 10, in <listcomp>
return mean([get_ratio(h, w) for h, w in [size1, size2]])
.0 = <tuple_iterator object at 0x7ff61e35b820>
h = 300
w = 0
File "./temp.py", line 13, in get_ratio
return height / width
height = 300
width = 0
builtins.ZeroDivisionError: division by zero
Installation:
pip install traceback-with-variables

Can't execute function specified through %paste%

Is it possible to run a function that is specified by using the magic %paste% function in IPython?
In [1]: %paste%
def add_to_index(index,keyword,url):
    for e in index:
        if e[0] == keyword:
            if url not in e[1]:
                e[1].append(url)
            return
    index.append([keyword,[url]])
## -- End pasted text --
Block assigned to '%'
In [2]: %whos
Variable Type Data/Info
-----------------------------
% SList ['def add_to_index(index,<...>append([keyword,[url]])']
In [3]: add_to_index
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-3-e3075a18cb0c> in <module>()
----> 1 add_to_index
NameError: name 'add_to_index' is not defined
In [4]: add_to_index(index, 'test', 'http://test.com')
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-4-580237464b17> in <module>()
----> 1 add_to_index(index, 'test', 'http://test.com')
NameError: name 'add_to_index' is not defined
In [5]:
The paste magic is %paste (no trailing %):
In [3]: %paste
def add_to_index(index,keyword,url):
    for e in index:
        if e[0] == keyword:
            if url not in e[1]:
                e[1].append(url)
            return
    index.append([keyword,[url]])
## -- End pasted text --
In [4]: add_to_index
Out[4]: <function __main__.add_to_index>
What happens in your case is that you are using the optional argument for %paste:
In [5]: %paste?
Type: Magic function
...(text omitted)
You can also pass a variable name as an argument, e.g. '%paste foo'.
This assigns the pasted block to variable 'foo' as string, without
dedenting or executing it (preceding >>> and + is still stripped)
When you do that, the pasted code is not executed; it is just assigned to the variable you gave as an argument (% in your case).

NodeBox error for a verb in python

I downloaded the package from http://nodebox.net/code/index.php/Linguistics#verb_conjugation
I get an error when I try to get the tense of a verb.
import en
print en.is_verb('use')
#prints TRUE
print en.verb.tense('use')
KeyError Traceback (most recent call last)
/home/cse/version2_tense.py in <module>()
----> 1
2
3
4
5
/home/cse/en/__init__.pyc in tense(self, word)
124
125 def tense(self, word):
--> 126 return verb_lib.verb_tense(word)
127
128 def is_tense(self, word, tense, negated=False):
/home/cse/en/verb/__init__.pyc in verb_tense(v)
175
176 infinitive = verb_infinitive(v)
--> 177 a = verb_tenses[infinitive]
178 for tense in verb_tenses_keys:
179 if a[verb_tenses_keys[tense]] == v:
KeyError: ''
You are getting this error because there is a mistake in the ~/Library/Application Support/NodeBox/en/verb/verb.txt file that is used to build the dictionary.
"use" is the infinitive form; however, "used" is entered as the infinitive.
At line 5857:
used,,,uses,,using,,,,,used,used,,,,,,,,,,,,
should be:
use,,,uses,,using,,,,,used,used,,,,,,,,,,,,
After editing and saving the file:
import en
print en.is_verb("use")
print en.verb.infinitive('use')
print en.verb.tense('use')
gives:
True
use
infinitive
extra:
import en
print 'use %s' % en.verb.tense("use")
print 'uses %s' % en.verb.tense("uses")
print 'using %s' % en.verb.tense('using')
print 'used %s' % en.verb.tense('used')
use infinitive
uses 3rd singular present
using present participle
used past
