regular expressions emoticons - python

I have data split into fileids. I am trying to go through the data per fileid and search for emoticons :( and :) as defined by the regex. If an emoticon is found I need to retain the information a) the emoticon was found b) in this fileid. When I run this piece of script and print the emoticon dictionary I get 0 as a value. How is this possible? I am a beginner.
emoticon = 0
for fileid in corpus.fileids():
    m = re.search('^(:\(|:\))+$', fileid)
    if m is not None:
        emoticon += 1

It looks to me like your regex is working, and that m should indeed not be None.
>>> re.search('^(:\(|:\))+$', ':)').group()
':)'
>>> re.search('^(:\(|:\))+$', ':):(').group()
':):('
>>> re.search('^(:\(|:\))+$', ':)?:(').group()
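Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
However, two things look questionable to me. First, because the pattern is anchored with ^ and $, it only matches strings that consist entirely of emoticons. Second, is fileid really what you want to search? It looks like the identifier of a file rather than the file's text.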
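A minimal sketch of the second point, assuming the goal is to record which files contain an emoticon. The dictionary documents below is a hypothetical stand-in for the corpus; with an NLTK-style reader the text would presumably come from something like corpus.raw(fileid), but that is an assumption about the asker's setup. The pattern drops the ^ and $ anchors so that emoticons are found anywhere in the text:

```python
import re

# Hypothetical stand-in for the corpus: maps each fileid to the text of that file.
documents = {
    "a.txt": "great day :) really",
    "b.txt": "no smileys here",
    "c.txt": "bad :( then good :)",
}

# Same two emoticons as in the question, but without ^ and $.
EMOTICON = re.compile(r":\(|:\)")

emoticons_by_file = {}
for fileid, text in documents.items():
    found = EMOTICON.findall(text)  # list of every emoticon in this file's text
    if found:
        emoticons_by_file[fileid] = found

print(emoticons_by_file)
```

Keeping a dictionary keyed by fileid retains both pieces of information asked for: that an emoticon was found, and in which file.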

Related

Tuple index out of range error with .format(list)

I have a strange problem that I don't understand. I have a format string with a lot of fields, and I want to supply the content for the fields using a list. The simple demo below shows the issue:
>>> formatstr = "Hello {}, you are my {} friend since {}"
>>> list = ["John", "best", 2020]
>>> print formatstr.format(list)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: tuple index out of range
>>>
The format string has 3 fields and the list also has 3 elements, so I don't understand the error message.
The same happens even when I address the indexes explicitly in the format string:
>>> formatstr = "Hello {0:}, you are my {1:} friend since {2:}"
>>> print formatstr.format(list)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: tuple index out of range
Can you please help me? I think I am stuck somewhere in my thinking.
Thanks.
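A likely explanation, offered as a sketch rather than a definitive answer: format(list) passes the whole list as one positional argument, so only field 0 can be filled and fields 1 and 2 have nothing to refer to. Unpacking the list with * passes each element as its own argument. The name values is used here only to avoid shadowing the built-in list:

```python
formatstr = "Hello {}, you are my {} friend since {}"
values = ["John", "best", 2020]  # renamed from "list" to avoid shadowing the built-in

# The * unpacks the list so that each element fills one field.
message = formatstr.format(*values)
print(message)
```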

How to get all elements of list in regex in Python

I want to print elements of a list via a regex. This is my code:
myresult_tv = ['Extinct A Horizon Guide to Dinosaurs WEB h264-WEBTUBE', 'High Noon 2019 04 05 720p HDTV DD5 1 MPEG2-NTb', 'Wyatt Cenacs Problem Areas S02E01 1080p WEBRip x264-eSc', 'Bondi Vet S05E15 720p WEB x264-GIMINI', 'If Loving You Is Wrong S04E03 Randals Stage HDTV x264-CRiMSON', 'Wyatt Cenacs Problem Areas S02E01 WEBRip x264-eSc', 'Bondi Vet S05E15 1080p WEB x264-GIMINI']
li = []
for a in myresult_tv:
    w = re.match(".*\d ", a)
    c = w.group()
    li.append(c)
print(li)
and the result is:
Traceback (most recent call last):
  File "azazzazazaaz.py", line 31, in <module>
    c = w.group()
AttributeError: 'NoneType' object has no attribute 'group'
***Repl Closed***
You're not checking if the regex matched the element of the list. You should be doing something like this:
match = re.search(pattern, string)
if match:
    process(match)
Since I don't know what your expected output is, I used the same regex as yours. Try this code:
li = []
for a in myresult_tv:
    try:  # try/except in case the regex doesn't match some list elements
        w = re.search(r"(.*\d )", a)  # search instead of match
        c = w.group()
        li.append(c)
    except AttributeError:  # w is None when there is no match
        pass
print(li)
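As an illustration of the check described above, here is a small sketch on two of the titles from the question. It keeps the question's pattern unchanged and collects the titles that do not match instead of silently skipping them; the names titles, prefixes and unmatched are made up for this example:

```python
import re

titles = [
    "Extinct A Horizon Guide to Dinosaurs WEB h264-WEBTUBE",
    "Bondi Vet S05E15 720p WEB x264-GIMINI",
]

# Same pattern as in the question: anything up to the last digit followed by a space.
pattern = re.compile(r".*\d ")

prefixes = []
unmatched = []
for title in titles:
    match = pattern.match(title)
    if match:  # match is None when the pattern does not apply
        prefixes.append(match.group())
    else:
        unmatched.append(title)

print(prefixes)
print(unmatched)
```

The first title contains no digit that is directly followed by a space, which is why re.match returns None for it and the original loop fails on its first iteration.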

Regex error in python

I have a string that looks like a path from which I am trying to extract 022014-101 with a regular expression I got from here.
str1 = "Test 123 <C:\User\Test\xyz\022014-101\more\stuff\022014\1> Text"
Actually I am retrieving the string from a text file, so I don't have to escape it, but for testing purposes I used this string instead:
str1 = "<C:\\User\\Test\\xyz\\022014-101\\more\\stuff\\022014\\1>"
Here is the code I tried to match the first occurring 022014-101:
import re
p = re.compile('(?<=\\)[\d]{6}[^\\]*')
m = p.match(str1)
print m.group(0) #Line 6
It gave me this error:
Traceback (most recent call last):
  File "test12.py", line 6, in <module>
    print m.group(0)
AttributeError: 'NoneType' object has no attribute 'group'
How can I get the desired output 022014-101?
EDIT:
That did it:
import re
m = re.search(r'(?<=\\)[\d]{6}[^\\]*', str1)
print m.group(0)
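The working version differs from the first attempt in two ways: the pattern is a raw string, and it uses search instead of match. The following sketch isolates the second difference, assuming the same input as in the question (written here as a raw string so the backslashes stay literal). re.match only tries the pattern at the very start of the string, whereas re.search scans for the first position where it fits:

```python
import re

# Raw string literal, so every backslash in the path is kept as-is.
str1 = r"Test 123 <C:\User\Test\xyz\022014-101\more\stuff\022014\1> Text"

# Six digits preceded by a backslash, then everything up to the next backslash.
pattern = re.compile(r"(?<=\\)\d{6}[^\\]*")

at_start = pattern.match(str1)   # only tries position 0, where the string starts with "Test"
anywhere = pattern.search(str1)  # scans the whole string

print(at_start)
print(anywhere.group(0))
```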

How to handle AttributeError in python?

I am working through some example code which I've found on What's the most efficient way to find one of several substrings in Python?. I've changed the code to:
import re
to_find = re.compile("hello|there")
search_str = "blah fish cat dog haha"
match_obj = to_find.search(search_str)
#the_index = match_obj.start()
which_word_matched = ""
which_word_matched = match_obj.group()
Since there is now no match, I get:
Traceback (most recent call last):
  File "<console>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
What is the standard way in Python to handle the scenario of no match, so as to avoid the error?
match_obj = to_find.search(search_str)
if match_obj:
    # do things with match_obj
    which_word_matched = match_obj.group()
Other handling goes in an else block if you need to do something even when there's no match.
Your match_obj is None because the regular expression did not match. Test for it explicitly:
which_word_matched = match_obj.group() if match_obj else ''
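If the same lookup is needed in several places, the None check can be wrapped once in a small helper. This is only a sketch; find_word is a hypothetical name, not something from the linked question:

```python
import re

to_find = re.compile("hello|there")


def find_word(text):
    """Return (word, start_index) for the first match, or None if nothing matched."""
    match_obj = to_find.search(text)
    if match_obj is None:
        return None
    return match_obj.group(), match_obj.start()


print(find_word("blah fish cat dog haha"))
print(find_word("well hello there"))
```

Callers then test the helper's return value for None in one place instead of guarding every call to .group().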

The right and elegant way to split and join a string in Python

I have the following values:
>>> poly
'C:\\04-las_clip_inside_area\\16x16grids_1pp_fsa.shp'
>>> record
1373155
and I wish to create:
'C:\\04-las_clip_inside_area\\16x16grids_1pp_fsa_1373155.txt'
I wish to split in order to get the part "C:\04-las_clip_inside_area\16x16grids_1pp_fsa".
I have tried this two-line solution:
>>> mylist = [poly.split(".")[0], "_", record, ".txt"]
>>> mylist
['C:\\04-las_clip_inside_area\\16x16grids_1pp_fsa', '_', 1373155, '.txt']
From here, after reading the example in "Python join: why is it string.join(list) instead of list.join(string)?", I tried this way to join, but I get this error message:
>>> mylist.join("")
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
AttributeError: 'list' object has no attribute 'join'
Also if I use:
>>> "".join(mylist)
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
TypeError: sequence item 2: expected string, int found
As explained in "Python join: why is it string.join(list) instead of list.join(string)?", join is a method of the separator string, not of the list. So it is
"".join(mylist)
instead of
mylist.join("")
That is your first error.
To solve your int/string problem, convert the int to a string:
mylist = [poly.split(".")[0], "_", str(record), ".txt"]
or write directly:
"{}_{}.txt".format(poly.split(".")[0], record)
>>> from os import path
>>>
>>> path.splitext(poly)
('C:\\04-las_clip_inside_area\\16x16grids_1pp_fsa', '.shp')
>>>
>>> filename, ext = path.splitext(poly)
>>> "{0}_{1}.txt".format(filename, record)
'C:\\04-las_clip_inside_area\\16x16grids_1pp_fsa_1373155.txt'
>>> poly = 'C:\\04-las_clip_inside_area\\16x16grids_1pp_fsa.shp'
>>> record = 1373155
>>> "{}_{}.txt".format(poly.rpartition('.')[0], record)
'C:\\04-las_clip_inside_area\\16x16grids_1pp_fsa_1373155.txt'
or if you insist on using join()
>>> "".join([poly.rpartition('.')[0], "_", str(record), ".txt"])
'C:\\04-las_clip_inside_area\\16x16grids_1pp_fsa_1373155.txt'
It's important to use rpartition() (or rsplit()), as otherwise it won't work properly if the path has any other '.' characters in it.
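To make that point concrete, here is a short sketch with a hypothetical path that has an extra dot in a folder name (the folder name is invented for illustration and is not from the question). split(".")[0] cuts at the first dot, while rpartition('.') and os.path.splitext both cut at the last one:

```python
from os import path

# Hypothetical path: note the extra "." inside the folder name.
poly = "C:\\04-las_clip.v2\\16x16grids_1pp_fsa.shp"
record = 1373155

naive = poly.split(".")[0]            # cuts at the FIRST dot
by_rpartition = poly.rpartition(".")[0]  # cuts at the LAST dot
stem, ext = path.splitext(poly)       # also cuts at the last dot, keeps the extension

result = "{}_{}.txt".format(stem, record)
print(naive)
print(result)
```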
You need to convert record into a string:
mylist = [poly.split(".")[0], "_", str(record), ".txt"]
