I downloaded the NodeBox Linguistics package from http://nodebox.net/code/index.php/Linguistics#verb_conjugation
I get an error even when I simply try to get the tense of a verb:
import en
print en.is_verb('use')
# prints True
print en.verb.tense('use')
KeyError Traceback (most recent call last)
/home/cse/version2_tense.py in <module>()
----> 1
2
3
4
5
/home/cse/en/__init__.pyc in tense(self, word)
124
125 def tense(self, word):
--> 126 return verb_lib.verb_tense(word)
127
128 def is_tense(self, word, tense, negated=False):
/home/cse/en/verb/__init__.pyc in verb_tense(v)
175
176 infinitive = verb_infinitive(v)
--> 177 a = verb_tenses[infinitive]
178 for tense in verb_tenses_keys:
179 if a[verb_tenses_keys[tense]] == v:
KeyError: ''
You get this error because of a mistake in the ~/Library/Application Support/NodeBox/en/verb/verb.txt file that the library uses to build its verb dictionary.
"use" is the infinitive form, but "used" is entered as the infinitive. As the traceback shows, the infinitive lookup for "use" therefore comes back as an empty string, and verb_tenses[''] raises the KeyError.
Line 5857 reads:
used,,,uses,,using,,,,,used,used,,,,,,,,,,,,
It should be:
use,,,uses,,using,,,,,used,used,,,,,,,,,,,,
After editing and saving the file:
import en
print en.is_verb("use")
print en.verb.infinitive('use')
print en.verb.tense('use')
gives:
True
use
infinitive
extra:
import en
print 'use %s' % en.verb.tense("use")
print 'uses %s' % en.verb.tense("uses")
print 'using %s' % en.verb.tense('using')
print 'used %s' % en.verb.tense('used')
use infinitive
uses 3rd singular present
using present participle
used past
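To make the failure mode concrete, here is a minimal sketch of how a single wrong first column can produce KeyError: ''. It assumes the library builds two lookup tables from verb.txt (infinitive to forms, and form to infinitive); the helper names build_tables and infinitive_of are hypothetical and are not the library's own code.

```python
# Hypothetical re-creation of the lookup tables; not the NodeBox source.
def build_tables(lines):
    tenses = {}  # infinitive -> list of all forms on that row
    lemmas = {}  # any non-empty form -> infinitive (first column)
    for line in lines:
        forms = line.strip().split(',')
        infinitive = forms[0]
        tenses[infinitive] = forms
        for form in forms:
            if form:
                lemmas[form] = infinitive
    return tenses, lemmas


def infinitive_of(word, lemmas):
    # Assumed behaviour: unknown words map to an empty string.
    return lemmas.get(word, '')


bad_row = ["used,,,uses,,using,,,,,used,used,,,,,,,,,,,,"]
good_row = ["use,,,uses,,using,,,,,used,used,,,,,,,,,,,,"]

bad_tenses, bad_lemmas = build_tables(bad_row)
good_tenses, good_lemmas = build_tables(good_row)

# With the bad row, "use" appears in no column, so the lookup yields ''.
print(repr(infinitive_of('use', bad_lemmas)))
# '' is not a key of the tenses table, hence KeyError: '' on access.
print('' in bad_tenses)
# With the corrected row, the lookup succeeds.
print(infinitive_of('use', good_lemmas))
```

With the bad row the word "use" never appears in any column, so any code that indexes the tenses table with the returned empty string fails in the same way as the traceback above.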
I want to build a dictionary from a Wikipedia dump, but I get this error and I don't know exactly what it means. This is the code:
! wget https://dumps.wikimedia.org/idwiki/latest/idwiki-latest-pages-articles.xml.bz2
-----------------------------------------------------------------------------------------
from gensim.corpora import WikiCorpus
wiki = WikiCorpus("idwiki-latest-pages-articles.xml.bz2", lemmatize=False, dictionary={})
with open("wiki-id-formatted.txt", 'w', encoding="utf8") as output:
    counter = 0
    for text in wiki.get_texts():
        output.write(' '.join(text)+"\n")
        counter = counter + 1
        if counter > 200000:
            break
And this is the error:
NotImplementedError Traceback (most recent call last)
<ipython-input-38-1b4f97b88e9f> in <module>()
1 # create txt file for spell check dictionary
----> 2 wiki = WikiCorpus("idwiki-latest-pages-articles.xml.bz2", lemmatize=False, dictionary={})
3
4 with open("wiki-id-formatted.txt", 'w', encoding="utf8") as output:
5 counter = 0
/usr/local/lib/python3.7/dist-packages/gensim/corpora/wikicorpus.py in __init__(self, fname, processes, lemmatize, dictionary, metadata, filter_namespaces, tokenizer_func, article_min_tokens, token_min_len, token_max_len, lower, filter_articles)
618 if lemmatize is not None:
619 raise NotImplementedError(
--> 620 'The lemmatize parameter is no longer supported. '
621 'If you need to lemmatize, use e.g. <https://github.com/clips/pattern>. '
622 'Perform lemmatization as part of your tokenization function and '
NotImplementedError: The lemmatize parameter is no longer supported. If you need to lemmatize, use e.g. <https://github.com/clips/pattern>. Perform lemmatization as part of your tokenization function and pass it as the tokenizer_func parameter to this initializer.
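Going by the error message, the lemmatize argument has to be dropped, and any custom token handling goes through tokenizer_func instead. The sketch below is one possible shape, not a tested fix for this exact setup: simple_tokenizer and build_corpus are hypothetical names, and the tokenizer assumes gensim calls it as tokenizer_func(text, token_min_len, token_max_len, lower). The gensim import is kept inside the function so that the tokenizer can be tried on its own.

```python
import re


def simple_tokenizer(text, token_min_len=2, token_max_len=15, lower=True):
    """Hypothetical tokenizer: split on word characters and filter by length."""
    if lower:
        text = text.lower()
    return [
        tok for tok in re.findall(r"\w+", text)
        if token_min_len <= len(tok) <= token_max_len
    ]


def build_corpus(dump_path):
    # Imported lazily; requires gensim to be installed.
    from gensim.corpora import WikiCorpus
    # Note: no lemmatize argument at all, which is what the error asks for.
    return WikiCorpus(dump_path, dictionary={}, tokenizer_func=simple_tokenizer)


print(simple_tokenizer("Saya suka Wikipedia bahasa Indonesia"))
```

If no custom tokenization is needed, simply removing lemmatize=False from the original call and leaving tokenizer_func at its default should be enough.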
I have a df defined that I am successfully running operations on. I want to time the difference between iterative for loops and vectorized operations. I have read various examples of how to use timeit, but when I try them I am getting the errors below. What am I doing wrong?
Imports:
import h5py
import pandas as pd
import timeit
This loop works:
for u in df['owner'].unique():
    print(u, ': ', len(df[(df['owner'] == u)]), sep = '')
But when I try to time it like so ...:
s = """\
for u in df['owner'].unique():
    print(u, ': ', len(df[(df['owner'] == u)]), sep = '')"""
time_iter_1_1_1 = timeit.timeit(s)
... it produces this error:
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-34-7526e96d565c> in <module>()
3 # print(u, ': ', len(df[(df['owner'] == u)]), sep = '')""")
4
----> 5 time_iter_1_1_1 = timeit.timeit(s)
~\Anaconda2\envs\py36\lib\timeit.py in timeit(stmt, setup, timer, number, globals)
231 number=default_number, globals=None):
232 """Convenience function to create Timer object and call timeit method."""
--> 233 return Timer(stmt, setup, timer, globals).timeit(number)
234
235 def repeat(stmt="pass", setup="pass", timer=default_timer,
~\Anaconda2\envs\py36\lib\timeit.py in timeit(self, number)
176 gc.disable()
177 try:
--> 178 timing = self.inner(it, self.timer)
179 finally:
180 if gcold:
~\Anaconda2\envs\py36\lib\timeit.py in inner(_it, _timer)
NameError: name 'df' is not defined
And when I try this ...:
time_iter_1_1_1 = timeit.timeit(
    """for u in df['owner'].unique():
        print(u, ': ', len(df[(df['owner'] == u)]), sep = '')""")
... I get this error:
ERROR:root:An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line string', (1, 57))
...
NameError: name 'df' is not defined
The df is defined and working. How can I fix this?
timeit runs the statement in its own namespace, so it cannot see the df defined in your session. There are two options.
Either pass a globals argument that lets timeit resolve the name:
df = pd.DataFrame(...)
timeit.timeit(statement, globals={'df': df}) # or globals=globals()
Or pass a setup string that creates df for you:
timeit.timeit(statement, setup='import pandas as pd; df = pd.DataFrame(...)')
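Both options can be tried without pandas. The following is a minimal sketch in which a plain list named owners stands in for the DataFrame column (an assumption made only to keep the example self-contained). It also passes number explicitly, because timeit.timeit repeats the statement 1,000,000 times by default, which is rarely what you want for a loop over a DataFrame.

```python
import timeit

# Stand-in for df['owner']; a plain list keeps the example self-contained.
owners = ['ann', 'bob', 'ann', 'cy', 'bob', 'ann']

stmt = """\
counts = {}
for u in set(owners):
    counts[u] = owners.count(u)
"""

# Option 1: hand the name to timeit through globals.
elapsed_globals = timeit.timeit(stmt, globals={'owners': owners}, number=1000)

# Option 2: build the object inside the setup string instead.
elapsed_setup = timeit.timeit(
    stmt,
    setup="owners = ['ann', 'bob', 'ann', 'cy', 'bob', 'ann']",
    number=1000,
)

print(elapsed_globals, elapsed_setup)
```

The globals route times the statement against the object you already have, while the setup route rebuilds it from scratch, so the first is usually the more convenient choice for an existing DataFrame.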
I am working in PySpark and have the following code, which processes tweets and builds an RDD of (user_id, text) pairs. The code is below:
"""
# Construct an RDD of (user_id, text) here.
"""
import json

def safe_parse(raw_json):
    try:
        json_object = json.loads(raw_json)
        if 'created_at' in json_object:
            return json_object
        else:
            return
    except ValueError as error:
        return

def get_usr_txt(line):
    tmp = safe_parse(line)
    return (tmp.get('user').get('id_str'), tmp.get('text'))

usr_txt = text_file.map(lambda line: get_usr_txt(line))
print(usr_txt.take(5))
and the output looks okay (as shown below)
[('470520068', "I'm voting 4 #BernieSanders bc he doesn't ride a CAPITALIST PIG adorned w/ #GoldmanSachs $. SYSTEM RIGGED CLASS WAR "), ('2176120173', "RT #TrumpNewMedia: .#realDonaldTrump #America get out & #VoteTrump if you don't #VoteTrump NOTHING will change it's that simple!\n#Trump htt…"), ('145087572', 'RT #Libertea2012: RT TODAY: #Colorado’s leading progressive voices to endorse #BernieSanders! #Denver 11AM - 1PM in MST CO State Capitol…'), ('23047147', '[VID] Liberal Tears Pour After Bernie Supporter Had To Deal With Trump Fans '), ('526506000', 'RT #justinamash: .#tedcruz is the only remaining candidate I trust to take on what he correctly calls the Washington Cartel. ')]
However, as soon as I do
print (usr_txt.count())
I get an error like below
Py4JJavaError Traceback (most recent call last)
<ipython-input-60-9dacaf2d41b5> in <module>()
8 usr_txt = text_file.map(lambda line: get_usr_txt(line))
9 #print (usr_txt.take(5))
---> 10 print (usr_txt.count())
11
/usr/local/spark/python/pyspark/rdd.py in count(self)
1054 3
1055 """
-> 1056 return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
1057
1058 def stats(self):
What am I missing? Is the RDD not created properly, or is something else wrong? How do I fix it?
safe_parse returns None when the parsed JSON line has no created_at element or when parsing fails. get_usr_txt then calls tmp.get('user') on that None in (tmp.get('user').get('id_str'),tmp.get('text')), which raises an AttributeError inside the Spark task. That is what surfaces as the Py4JJavaError.
The solution is to check for None in get_usr_txt:
def get_usr_txt(line):
    tmp = safe_parse(line)
    if tmp is not None:
        return (tmp.get('user').get('id_str'), tmp.get('text'))
Now the question is why print(usr_txt.take(5)) showed a result while print(usr_txt.count()) raised the error.
That is because usr_txt.take(5) only evaluates as many elements as it needs, here the first five, none of which came back as None from safe_parse. count() has to process every line in the RDD, including the ones that fail to parse.
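The guard can be checked without a Spark cluster. Below is a minimal sketch that runs the same two functions over a plain Python list; the three sample lines are made up for illustration, and the extra "or {}" guard for a missing user field is an addition that the original answer does not include.

```python
import json


def safe_parse(raw_json):
    """Return the parsed tweet, or None if it is invalid or not a tweet."""
    try:
        obj = json.loads(raw_json)
    except ValueError:
        return None
    if isinstance(obj, dict) and 'created_at' in obj:
        return obj
    return None


def get_usr_txt(line):
    tmp = safe_parse(line)
    if tmp is None:
        return None
    user = tmp.get('user') or {}  # extra guard: 'user' may be absent
    return (user.get('id_str'), tmp.get('text'))


# Made-up sample input: one tweet, one non-tweet record, one broken line.
lines = [
    '{"created_at": "x", "user": {"id_str": "1"}, "text": "hello"}',
    '{"delete": {"status": {}}}',
    'not json',
]

# Drop the None results so only valid (user_id, text) pairs remain.
pairs = [p for p in map(get_usr_txt, lines) if p is not None]
print(pairs)  # [('1', 'hello')]
```

In PySpark the equivalent of the last step would presumably be text_file.map(get_usr_txt).filter(lambda p: p is not None), so that count() reports only the valid pairs rather than also counting the None placeholders.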
In Django shell:
from django.test import SimpleTestCase
c = SimpleTestCase()
haystack = '<html><b>contribution</b></html>'
c.assertInHTML('<b>contribution</b>', haystack)
c.assertInHTML('contribution', haystack)
I don't understand why the first assertion passes, but the second one doesn't:
AssertionError Traceback (most recent call last)
<ipython-input-15-20da22474686> in <module>()
5
6 c.assertInHTML('<b>contribution</b>', haystack)
----> 7 c.assertInHTML('contribution', haystack)
c:\...\lib\site-packages\django\test\testcases.py in assertInHTML(self, needle, haystack, count, msg_prefix)
680 else:
681 self.assertTrue(real_count != 0,
--> 682 msg_prefix + "Couldn't find '%s' in response" % needle)
683
684 def assertJSONEqual(self, raw, expected_data, msg=None):
C:\...\Programs\Python\Python35-32\lib\unittest\case.py in assertTrue(self, expr, msg)
675 if not expr:
676 msg = self._formatMessage(msg, "%s is not true" % safe_repr(expr))
--> 677 raise self.failureException(msg)
678
679 def _formatMessage(self, msg, standardMsg):
AssertionError: False is not true : Couldn't find 'contribution' in response
The Django docs just say "The passed-in arguments must be valid HTML." I don't think that is the problem, because the call to assert_and_parse_html on the first line doesn't raise:
def assertInHTML(self, needle, haystack, count=None, msg_prefix=''):
    needle = assert_and_parse_html(self, needle, None,
        'First argument is not valid HTML:')
    haystack = assert_and_parse_html(self, haystack, None,
        'Second argument is not valid HTML:')
    real_count = haystack.count(needle)
    if count is not None:
        self.assertEqual(real_count, count,
            msg_prefix + "Found %d instances of '%s' in response"
            " (expected %d)" % (real_count, needle, count))
    else:
        self.assertTrue(real_count != 0,
            msg_prefix + "Couldn't find '%s' in response" % needle)
I'm using Python 3.5.1 and Django 1.8.8.
This is a bug in Django.
assertInHTML(needle, haystack) behaves as follows:
assertInHTML('<p>a</p>', '<div><p>a</p><p>b</p></div>') passes: clearly correct
assertInHTML('<p>a</p><p>b</p>', '<p>a</p><p>b</p>') passes: possibly correct
assertInHTML('<p>a</p><p>b</p>', '<div><p>a</p><p>b</p></div>') fails with an assertion error.
The problem occurs when the needle does not have a single root element that wraps everything else. A bare text needle such as 'contribution' has no root element at all, which appears to be why the second assertion in the question fails.
The proposed fix (which has been languishing for some time!) is to raise an exception if you try to do this, i.e. the needle must have an HTML tag that wraps everything inside it.
I'm trying to create a ProgrammableFilter in Paraview using Python. The filter should take the currently selected points and count them (the real filter will be more elaborate, but this is enough to explain my problem).
In my code I'm not using any variable called 'inputs', but when I execute it I get this output (note there is an error at the end, and the code seems to be executed twice):
Generated random int: 13 using time 1419991906.3
13 Execution start
13 Selection is active
Generated random int: 59 using time 1419991906.34
59 Execution start
59 No selection is active
59 Execution end
13 Extr_Sel_raw was not None
13 Number of cells: 44
13 Execution end
Traceback (most recent call last):
File "<string>", line 22, in <module>
NameError: name 'inputs' is not defined
My pipeline has 2 steps: the first is a "Sphere source" and the second is the ProgrammableFilter with the following code:
import paraview
import paraview.simple
import paraview.servermanager
import random
import time
a = time.time()
random.seed(a)
#time.sleep(1)
tmp_id = random.randint(1,100)
print "\nGenerated random int: %s using time %s" % (tmp_id, a)
print "%s Execution start" % (tmp_id)
proxy = paraview.simple.GetActiveSource()
active_selection = proxy.GetSelectionInput(proxy.Port)
if active_selection is None:
    print "%s No selection is active" % (tmp_id)
else:
    print "%s Selection is active" % (tmp_id)
    Extr_Sel = paraview.simple.ExtractSelection(Selection=active_selection)
    Extr_Sel_raw = paraview.servermanager.Fetch(Extr_Sel)
    if Extr_Sel_raw is None:
        print "%s Extr_Sel_raw was None" % (tmp_id)
    else:
        print "%s Extr_Sel_raw was not None" % (tmp_id)
        print "%s Number of cells: %s" % (tmp_id, Extr_Sel_raw.GetNumberOfCells())
pdi = self.GetPolyDataInput()
pdo = self.GetPolyDataOutput()
pdo.SetPoints(pdi.GetPoints())
print "%s Execution end\n" % (tmp_id)
Do you know what can be causing my problem?
After some work I found a way to access the selected points in Paraview without triggering the error mentioned above.
Here is the code:
import paraview
import paraview.simple
proxy = paraview.simple.GetActiveSource()
if proxy is None:
    print "Proxy is None"
    return
active_selection = proxy.GetSelectionInput(proxy.Port)
if active_selection is None:
    print "No selection is active"
    return
print "Selected points: %s" % (active_selection.IDs)
print "Amount of points: %s" % (len(active_selection.IDs) / 2)
And this is the output if I select 6 points in a Sphere Source:
Selected points: [0, 14, 0, 15, 0, 16, 0, 20, 0, 21, 0, 22]
Amount of points: 6
You can see that each selected point generates 2 IDs: the first is the "Process ID" and the second is the actual ID of the point.
The reason for the original error remains unclear to me.
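Since the IDs list interleaves the two values, it can be split into pairs with plain slicing. This is a small sketch in Python 3 syntax that uses the output above as a literal list; inside the filter the list would come from active_selection.IDs.

```python
# Flat list as printed above: process ID, point ID, process ID, point ID, ...
ids = [0, 14, 0, 15, 0, 16, 0, 20, 0, 21, 0, 22]

# Pair every even-indexed entry (process ID) with the entry that follows it.
pairs = list(zip(ids[0::2], ids[1::2]))

# Keep only the actual point IDs.
point_ids = [point_id for _process_id, point_id in pairs]

print(pairs)
print(point_ids)       # [14, 15, 16, 20, 21, 22]
print(len(point_ids))  # 6
```

Counting the pairs this way avoids the manual division by 2 and keeps the process ID available if the data is distributed.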