If I declare the "line" variable at the beginning of the script, the following code works as expected. Something like:
s = "Jul 15 12:12:51 whitelist logger: 1|999999999999|id:d9faff7c-4016-4343-b494-37028763bb66 submit date:1307130919 done date:1307130919 stat:DELIVRD err:0|L_VB3_NM_K_P|1373687445|vivnel2|L_VB3_GH_K_P|promo_camp1-bd153424349bc647|1"
When I open a file and loop through its lines instead, the groups() call fails with: AttributeError: 'NoneType' object has no attribute 'groups'
# cat mylast.py
import re
f = open('customer.csv')
for line in f:
    logger_re = re.compile(
        "logger: ([^ ]+)\
        submit date:(\d+)\
        done date:(\d+)\
        stat:(.+)\
        err:(.+)$")
    myvalues = logger_re.search(line).groups()
    print myvalues
f.close()
Exception:
# python mylast.py
Traceback (most recent call last):
File "mylast.py", line 13, in ?
myvalues = logger_re.search(line).groups()
AttributeError: 'NoneType' object has no attribute 'groups'
Your regular expression is not matching your actual file contents.
As such, logger_re.search(line) returns None.
The problem here is that you indented your regular expression but did not compensate for the extra whitespace:
logger_re = re.compile(
    "logger: ([^ ]+)\
    submit date:(\d+)\
    done date:(\d+)\
    stat:(.+)\
    err:(.+)$")
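The whitespace at the start of each continuation line matters: the trailing backslash only continues the string literal, so those leading spaces become part of the pattern. Use separate strings instead (Python joins adjacent string literals at compile time):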
logger_re = re.compile(
    r"logger: ([^ ]+) "
    r"submit date:(\d+) "
    r"done date:(\d+) "
    r"stat:(.+) "
    r"err:(.+)$")
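Your search will return None if no match is found. Check that the result of logger_re.search(line) is not None before calling groups() on it; checking myvalues afterwards is too late, because the exception is raised by the groups() call itself.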
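Putting both fixes together, here is a minimal Python 3 sketch: the pattern is built from adjacent literals, and each search() result is checked before groups() is called. The list below stands in for the lines of customer.csv (the first entry is the sample line from the question, the second is a hypothetical non-matching line); with a real file you would pass the open file object to parse_lines instead. BROKEN_PREFIX is included only to show the backslash-continuation problem.

```python
import re

# Adjacent literals are joined at compile time, so the indentation between
# them is only source layout and never reaches the pattern.
LOGGER_RE = re.compile(
    r"logger: ([^ ]+) "
    r"submit date:(\d+) "
    r"done date:(\d+) "
    r"stat:(.+) "
    r"err:(.+)$"
)

# For comparison: a backslash-continued literal keeps the next line's
# leading spaces, so this pattern expects exactly four spaces before "submit".
BROKEN_PREFIX = "logger: ([^ ]+)\
    submit date:"


def parse_lines(lines):
    """Return the captured groups of every matching line; skip the rest."""
    results = []
    for line in lines:
        match = LOGGER_RE.search(line)
        if match is None:  # search() returns None when the line does not match
            continue
        results.append(match.groups())
    return results


sample = [
    "Jul 15 12:12:51 whitelist logger: 1|999999999999|id:d9faff7c-4016-4343-b494-37028763bb66 "
    "submit date:1307130919 done date:1307130919 stat:DELIVRD "
    "err:0|L_VB3_NM_K_P|1373687445|vivnel2|L_VB3_GH_K_P|promo_camp1-bd153424349bc647|1\n",
    "a header line that does not match\n",
]

print(re.search(BROKEN_PREFIX, sample[0]))  # None
for groups in parse_lines(sample):
    print(groups)
```

For a real file, something like `with open('customer.csv') as f: rows = parse_lines(f)` would do; lines that do not match are simply skipped instead of raising AttributeError.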
I'm trying to follow the pytube example for downloading a video from YouTube:
from pytube import YouTube
video = YouTube('https://www.youtube.com/watch?v=BATOxzbVNno')
video.streams.all()
and immediately get this error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-4-2556eb2eb903> in <module>()
1 from pytube import YouTube
2 video = YouTube('https://www.youtube.com/watch?v=BATOxzbVNno')
----> 3 video.streams.all()
5 frames
/usr/local/lib/python3.7/dist-packages/pytube/cipher.py in get_throttling_function_code(js)
301 # Extract the code within curly braces for the function itself, and merge any split lines
302 code_lines_list = find_object_from_startpoint(js, match.span()[1]).split('\n')
--> 303 joined_lines = "".join(code_lines_list)
304
305 # Prepend function definition (e.g. `Dea=function(a)`)
AttributeError: 'NoneType' object has no attribute 'span'
Please help me. It worked fine just yesterday! Thanks a lot!
I just ran into that error myself; it seems to recur frequently, despite the temporary fixes it gets.
I found a fix on GitHub: NoneType object has no attribute 'span'
Replace the function get_throttling_function_name in pytube/cipher.py with:
def get_throttling_function_name(js: str) -> str:
    """Extract the name of the function that computes the throttling parameter.
    :param str js:
        The contents of the base.js asset file.
    :rtype: str
    :returns:
        The name of the function used to compute the throttling parameter.
    """
    function_patterns = [
        # https://github.com/ytdl-org/youtube-dl/issues/29326#issuecomment-865985377
        # a.C&&(b=a.get("n"))&&(b=Dea(b),a.set("n",b))}};
        # In above case, `Dea` is the relevant function name
        r'a\.[A-Z]&&\(b=a\.get\("n"\)\)&&\(b=([^(]+)\(b\)',
    ]
    # re, logger and RegexMatchError come from the surrounding module (pytube/cipher.py)
    logger.debug('Finding throttling function name')
    for pattern in function_patterns:
        regex = re.compile(pattern)
        function_match = regex.search(js)
        if function_match:
            logger.debug("finished regex search, matched: %s", pattern)
            function_name = function_match.group(1)
            # Test each bracket separately: "'[' or ']' in name" is always truthy
            is_Array = '[' in function_name or ']' in function_name
            if is_Array:
                # An indexed name such as "Xyz[0]" points into "var Xyz=[...];"
                index = int(re.findall(r'\d+', function_name)[0])
                name = function_name.split('[')[0]
                pattern = r"var %s=\[(.*?)\];" % name
                regex = re.compile(pattern)
                return regex.search(js).group(1).split(',')[index]
            else:
                return function_name
    raise RegexMatchError(
        caller="get_throttling_function_name", pattern="multiple"
    )
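The bracket check is an easy place to slip: `'[' or ']' in function_name` parses as `'[' or (']' in function_name)`, and a non-empty string literal is always truthy, so that expression is True for every name (and a plain name like `Dea` would then crash in the int(...) lookup). Below is a standalone sketch of the check and the array lookup it guards. It mirrors the logic above but is not pytube's API; the names `Xyz` and `Foo` and the `js` snippets are made up for illustration, not real base.js content.

```python
import re


def looks_like_array_ref(name: str) -> bool:
    # Spell out both membership tests: "'[' or ']' in name" would parse as
    # "'[' or (']' in name)" and always be truthy.
    return "[" in name or "]" in name


def resolve_function_name(name: str, js: str) -> str:
    """Resolve an indexed name such as 'Xyz[1]' against 'var Xyz=[...];' in js."""
    if not looks_like_array_ref(name):
        return name
    index = int(re.findall(r"\d+", name)[0])
    array_name = name.split("[")[0]
    match = re.search(r"var %s=\[(.*?)\];" % re.escape(array_name), js)
    if match is None:  # guard against another 'NoneType' AttributeError
        raise ValueError("array %r not found in js" % array_name)
    return match.group(1).split(",")[index]


naive = True if "[" or "]" in "Dea" else False
print(naive)                         # True, although "Dea" has no brackets
print(looks_like_array_ref("Dea"))   # False
print(resolve_function_name("Xyz[1]", "var Xyz=[Dea,Foo];"))  # Foo
```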
I want to extract FTP account names from the vsftpd log with a regex.
All of our accounts are named "user" plus a number, such as user01, user02, user03.
Tue Sep 12 18:11:20 2017 1 ::ffff:172.18.1.168 3620 /ftptest.py a _ i r user01 ftp 0 * c
Tue Sep 12 18:12:51 2017 1 ::ffff:172.18.1.168 4211 /ftptest.py a _ i r user02 ftp 0 * c
Tue Sep 12 18:16:43 2017 1 ::ffff:172.18.1.168 4322 /ftptest.py a _ i r user03 ftp 0 * c
My code is as below:
#!/usr/bin/python
import re
with open("/var/log/xferlog") as ftplog:
    for line in ftplog:
        line = line.strip("\n")
        pattern = re.compile(r'user[\d]+')
        match = pattern.search(line)
        print match.group()
The script does print the user accounts, but it then fails with AttributeError: 'NoneType' object has no attribute 'group'
The result:
user01
user02
user03
Traceback (most recent call last):
File "test8.py", line 10, in <module>
print match.group()
AttributeError: 'NoneType' object has no attribute 'group'
Can anyone give me some advice?
pattern.search(line) returns None if the line does not match.
So your code must check for that:
#!/usr/bin/python
import re
with open("/var/log/xferlog") as ftplog:
    for line in ftplog:
        line = line.strip("\n")
        pattern = re.compile(r'user[\d]+')
        match = pattern.search(line)
        if match:
            print match.group()
Regards Youenn.
Use an if statement to handle the case where the pattern does not match:
...
if match:
    print match.group()  # or whatever you need
Note, however, that this silently skips every line that does not match. If you want to track those lines (for debugging, say), add:
else:
    print line
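Both suggestions can be combined in a small Python 3 sketch: the pattern is compiled once outside the loop, matching lines yield an account name, and non-matching lines are collected instead of crashing the script. The list stands in for the lines of /var/log/xferlog: the three entries are the sample lines from the question, and the blank last entry is a hypothetical non-matching line. extract_users is an illustrative name, not an existing API.

```python
import re

USER_RE = re.compile(r"user\d+")  # same as user[\d]+, compiled once outside the loop


def extract_users(lines):
    """Return (users, unmatched): account names found, and lines without one."""
    users, unmatched = [], []
    for line in lines:
        line = line.rstrip("\n")
        match = USER_RE.search(line)
        if match:                      # search() returned a Match object
            users.append(match.group())
        else:                          # search() returned None
            unmatched.append(line)
    return users, unmatched


log = [
    "Tue Sep 12 18:11:20 2017 1 ::ffff:172.18.1.168 3620 /ftptest.py a _ i r user01 ftp 0 * c\n",
    "Tue Sep 12 18:12:51 2017 1 ::ffff:172.18.1.168 4211 /ftptest.py a _ i r user02 ftp 0 * c\n",
    "Tue Sep 12 18:16:43 2017 1 ::ffff:172.18.1.168 4322 /ftptest.py a _ i r user03 ftp 0 * c\n",
    "\n",  # hypothetical blank line, which contains no account name
]
users, unmatched = extract_users(log)
print(users)      # ['user01', 'user02', 'user03']
print(unmatched)  # ['']
```

With the real log, `with open("/var/log/xferlog") as ftplog: users, unmatched = extract_users(ftplog)` would work the same way, and unmatched shows exactly which lines caused the original AttributeError.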
I'm not able to get user01, user02, user03 to print with your sample data and code, but it looks like your regex isn't matching the values correctly. To help you troubleshoot, I'd recommend using the Python debugger to step through your code:
#!/usr/bin/python
import re
with open("sample") as ftplog:
    for line in ftplog:
        line = line.strip("\n")
        pattern = re.compile(r'sparq[\d]+')
        match = pattern.search(line)
        if match is None:
            import pdb; pdb.set_trace()
        print match.group()
As part of our schoolwork, we have been given this code:
>>> IN = re.compile(r'.*\bin\b(?!\b.+ing)')
>>> for doc in nltk.corpus.ieer.parsed_docs('NYT_19980315'):
... for rel in nltk.sem.extract_rels('ORG', 'LOC', doc,
... corpus='ieer', pattern = IN):
... print(nltk.sem.rtuple(rel))
We are asked to try it out on some sentences of our own and look at the output, so I decided to define a function:
def extract(sentence):
    import re
    import nltk
    IN = re.compile(r'.*\bin\b(?!\b.+ing)')
    for rel in nltk.sem.extract_rels('ORG', 'LOC', sentence, corpus='ieer', pattern = IN):
        print(nltk.sem.rtuple(rel))
When I try to run this code:
>>> from extract import extract
>>> extract("The Whitehouse in Washington")
I get the following error:
Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
extract("The Whitehouse in Washington")
File "C:/Python34/My Scripts\extract.py", line 6, in extract
for rel in nltk.sem.extract_rels('ORG', 'LOC', sentence, corpus='ieer', pattern = IN):
File "C:\Python34\lib\site-packages\nltk\sem\relextract.py", line 216, in extract_rels
pairs = tree2semi_rel(doc.text) + tree2semi_rel(doc.headline)
AttributeError: 'str' object has no attribute 'text'
Can anyone help me understand where I am going wrong in my function?
The correct output for the test sentence should be:
[ORG: 'Whitehouse'] 'in' [LOC: 'Washington']
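If you look at the definition of extract_rels, it expects a parsed document as its third argument, but you are passing a plain string. With corpus='ieer', that document must have text and headline attributes (see the last line of your traceback). To work around this, you can do the following: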
import re
import nltk

sentence = "The Whitehouse in Washington"
# pos_tag expects a list of tokens, so tokenize each sentence first
tokens = [nltk.word_tokenize(sentence)]
tagged_sentences = [nltk.pos_tag(token) for token in tokens]
class doc():
    pass
IN = re.compile(r'.*\bin\b(?!\b.+ing)')
doc.headline = ["test headline for sentence"]
for i, sent in enumerate(tagged_sentences):
    doc.text = nltk.ne_chunk(sent)
    for rel in nltk.sem.relextract.extract_rels('ORG', 'LOC', doc, corpus='ieer', pattern=IN):
        print(nltk.sem.rtuple(rel))  # adjust the output format as needed
Try it out!
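Rather than setting attributes on a class, you could hand extract_rels a fresh object per sentence. This is a minimal sketch, assuming (from the traceback above) that corpus='ieer' only reads doc.text and doc.headline; make_doc is a hypothetical helper, and running extract() requires NLTK plus its tokenizer, tagger and named-entity chunker data. No particular output is claimed here: which relations come out depends on how the chunker labels the entities.

```python
import re
from types import SimpleNamespace

IN = re.compile(r".*\bin\b(?!\b.+ing)")


def make_doc(chunked_sentence, headline=()):
    # extract_rels(..., corpus='ieer') reads doc.text and doc.headline,
    # so any object exposing those two attributes will do.
    return SimpleNamespace(text=chunked_sentence, headline=list(headline))


def extract(sentence):
    import nltk  # imported lazily; needs the tokenizer, tagger and NE-chunker data

    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    doc = make_doc(nltk.ne_chunk(tagged))
    for rel in nltk.sem.extract_rels("ORG", "LOC", doc, corpus="ieer", pattern=IN):
        print(nltk.sem.rtuple(rel))
```

Usage would be `extract("The Whitehouse in Washington")`. A new object per call avoids the shared state that mutating `doc.text` on a class introduces.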
I'm trying to count how many times each top-level domain occurs in a file containing 800K+ top-level domain strings that I scraped from URLs. In the code below, when I used "if mtlds in ntld:" the results appeared to be correct, but on inspection the counts for "co" and "com", and for "ca" and "cat", are incorrect. If I use == or "is" instead, I don't get any matches at all, just an error:
Traceback (most recent call last):
File "checktlds4malware.py", line 111, in
mtlds_line = mtlds.readline()
AttributeError: 'str' object has no attribute 'readline'
tld_file = open(sys.argv[1],'r')
tld_line = tld_file.readline()
while tld_line:
    #print(tld_line)
    tld_line = tld_line.strip()
    columns = tld_line.split()
    ntld = columns[0] # get the ICANN TLD
    ntld = ntld.lower()
    mtlds = open ('malwaretlds.txt', 'r')
    mtlds_line = mtlds.readline()
    while mtlds_line:
        print(mtlds_line)
        mtlds_line = mtlds_line.strip()
        columns = mtlds_line.split()
        mtlds = columns[0]
        mtlds = mtlds.lower()
        #raw_input()
        # I don't get the error when using "in" not ==
        # but the comparison is not correct.
        if mtlds_line == ntld:
            m_count += 1
            print 'ntld and mtld match: Malware domain count for ', ntld, m_count
        mtlds_line = mtlds.readline()
    print 'Final malware domain count for ', ntld, m_count
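This is because inside your inner while loop you rebind mtlds to a string (mtlds = columns[0]). The next call to mtlds.readline() is therefore made on a str, which has no readline() method, hence the error. A while loop does not create a new scope in Python: mtlds simply refers to the file object only until that first reassignment. Use a different name for the parsed TLD, and compare that name, rather than the whole mtlds_line, against ntld. As for the wrong counts with in: between two strings, in is a substring test, so "co" in "com" is True; use == for an exact comparison ("is" compares object identity, not equality).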
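Here is a minimal Python 3 sketch of an exact-match count, assuming (as in the question) that the TLD is the first whitespace-separated column of each line. The malware list is read once into a set instead of being reopened for every scraped line, and set membership is an exact comparison, so "co" no longer matches "com". The in-memory lists stand in for the two files, and count_malware_tlds is a hypothetical name.

```python
from collections import Counter


def count_malware_tlds(scraped_lines, malware_lines):
    """Count scraped TLDs that exactly (case-insensitively) match the malware list."""
    # Build the lookup set once; membership in a set is an exact comparison,
    # unlike the substring test "co" in "com".
    malware = {line.split()[0].lower() for line in malware_lines if line.split()}
    counts = Counter()
    for line in scraped_lines:
        columns = line.split()
        if not columns:           # skip blank lines
            continue
        tld = columns[0].lower()  # first column, as in the original code
        if tld in malware:
            counts[tld] += 1
    return counts


scraped = ["com\n", "CO\n", "cat\n", "ca\n", "com\n", "\n"]
malware_list = ["co\n", "cat\n"]
counts = count_malware_tlds(scraped, malware_list)
print(counts)  # 'co' and 'cat' once each; 'com' and 'ca' are not counted
```

With real files, passing `open(sys.argv[1])` and `open('malwaretlds.txt')` as the two arguments should work the same way, since both are iterated line by line.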
When I run the program below (with a small modification), I get an error.
import sys,re
match=re.compile(r'aa[0-9]+AB')
while 1 :
    line=eval(raw_input('Enter the string to search' 'or' "press 'q' to Quit"))
    if line == 'q':
        print "you are quit from the program"
        break
    if match.search(line):
        print 'Matched:',line
        print pat
        print 'found',match.group()
        print type(pat)
    else:
        print "no match"
        print type(pat)
Input:
'aa12AB'
O/P:
>>> Matched: aa12AB
<_sre.SRE_Pattern object at 0x02793720>
found
Traceback (most recent call last):
File "C:\Python27\lib\site-packages\Pythonwin\pywin\framework\scriptutils.py", line 325, in RunScript
exec codeObject in __main__.__dict__
File "C:\Users\thangaraj\Desktop\python program\UK Training program\New to add labtop\regular exp\Script1.py", line 11, in <module>
print 'found',match.group()
AttributeError: '_sre.SRE_Pattern' object has no attribute 'group'
>>>
search() returns a match object, and group() is a method of that object, not of the compiled pattern. Assign the result to a variable:
m = match.search(line)
and then:
m.group()
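Why are you using eval? raw_input() already returns a string, and eval() on user input will evaluate whatever expression is typed. Keep calling match.search(line), but, as @Birei wrote, call group() on the value that search() returns, not on the pattern. You should probably also rename the variable match, because by convention a "match" is the return value of search(), not the compiled pattern.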
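Here is a minimal Python 3 sketch of the corrected logic, keeping the compiled pattern and the match object under separate names. The list passed to run() stands in for the values input() would return (in Python 3, input() already gives a string, so no eval() is needed); find and run are hypothetical helper names.

```python
import re

PATTERN = re.compile(r"aa[0-9]+AB")  # compiled pattern: has search(), not group()


def find(line):
    """Return the matched text, or None when the line does not match."""
    m = PATTERN.search(line)          # m is a Match object or None
    return m.group() if m else None   # group() belongs to the Match object


def run(lines):
    # Stand-in for the interactive loop; each item plays the role of input().
    for line in lines:
        if line == "q":
            print("you are quit from the program")
            break
        found = find(line)
        if found is None:
            print("no match")
        else:
            print("Matched:", line)
            print("found", found)


run(["xxaa12ABzz", "hello", "q", "never reached"])
```

For the interactive version, replace the list with `iter(lambda: input("Enter the string to search or press 'q' to Quit: "), None)` or a plain `while True:` loop around `input()`.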