regular expressions emoticons

I have data split into fileids. I am trying to go through the data per fileid and search for the emoticons :( and :) as defined by the regex. If an emoticon is found, I need to retain the information that a) the emoticon was found and b) it was found in this fileid. When I run this piece of script and print emoticon, I get 0. How is this possible? I am a beginner.
emoticon = 0
for fileid in corpus.fileids():
    m = re.search('^(:\(|:\))+$', fileid)
    if m is not None:
        emoticon += 1

It looks to me like your regex is working: for a string made up only of emoticons, m is indeed not None.
>>> re.search('^(:\(|:\))+$', ':)').group()
':)'
>>> re.search('^(:\(|:\))+$', ':):(').group()
':):('
>>> re.search('^(:\(|:\))+$', ':)?:(').group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
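However, a few things are questionable to me:
- Because of the ^ and $ anchors, this will only match strings that are 100% emoticons.
- Is fileid really what you're searching? corpus.fileids() presumably gives you the file identifiers (names), not the text inside the files.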
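
A minimal sketch of what the loop probably intended, assuming corpus is an NLTK-style reader whose raw(fileid) returns a file's text. To keep it self-contained, a plain dict of fileid to text (hypothetical sample data) stands in for the corpus. The regex has no ^/$ anchors, so it finds emoticons anywhere in the text, and the result records both that an emoticon was found and in which fileid:

```python
import re

# No ^...$ anchors: find ":(" or ":)" anywhere in the text.
EMOTICON_RE = re.compile(r":\(|:\)")


def emoticons_per_file(texts):
    """Return {fileid: emoticon count} for every file containing at least one.

    `texts` maps fileid -> file contents. With an NLTK corpus reader you could
    build it as {f: corpus.raw(f) for f in corpus.fileids()} (assumption).
    """
    found = {}
    for fileid, text in texts.items():
        hits = EMOTICON_RE.findall(text)  # every emoticon in this file
        if hits:
            found[fileid] = len(hits)
    return found


# Hypothetical sample data standing in for the corpus.
sample = {
    "a.txt": "great day :) see you :)",
    "b.txt": "no emoticons here",
    "c.txt": "sad :(",
}
print(emoticons_per_file(sample))  # {'a.txt': 2, 'c.txt': 1}
```

Using a dict instead of a single counter keeps the per-fileid information the question asks for.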

Related

Tuple index out of range error with .format(list)

I have a strange problem I don't get. I have a format string with a lot of fields, and I want to supply the content for the fields using a list. The following simple demo shows the issue:
>>> formatstr = "Hello {}, you are my {} friend since {}"
>>> list = ["John", "best", 2020]
>>> print formatstr.format(list)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: tuple index out of range
>>>
The format string has 3 fields and the list also has 3 elements, so I don't understand the error message.
Even when I address the indexes explicitly within the format string, I get the same error:
>>> formatstr = "Hello {0:}, you are my {1:} friend since {2:}"
>>> print formatstr.format(list)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: tuple index out of range
>>>
Can you please help me? I think I'm stuck somewhere in my thinking.
Thanks.
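
No answer is included here, but the traceback is consistent with str.format() receiving the whole list as a single positional argument: the first {} gets the list itself and the second {} has no argument left, hence the IndexError. A minimal Python 3 sketch of two ways around it (unpacking with *, or indexing into the single argument); the variable is renamed to items so the built-in list is not shadowed:

```python
formatstr = "Hello {}, you are my {} friend since {}"
items = ["John", "best", 2020]  # renamed from `list` to avoid shadowing the built-in

try:
    formatstr.format(items)  # one argument (the whole list) for three fields
except IndexError as exc:
    print("IndexError:", exc)

# * unpacks the list into three separate positional arguments.
print(formatstr.format(*items))

# Or keep a single argument and index into it from the format string.
print("Hello {0[0]}, you are my {0[1]} friend since {0[2]}".format(items))
```

Unpacking is usually the simpler choice; the {0[i]} form is handy when the same sequence feeds many fields.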

How to get all elements of list in regex in Python

I want to extract part of each list element with a regex and print the results. This is my code:
myresult_tv = [ 'Extinct A Horizon Guide to Dinosaurs WEB h264-WEBTUBE', 'High Noon 2019 04 05 720p HDTV DD5 1 MPEG2-NTb', 'Wyatt Cenacs Problem Areas S02E01 1080p WEBRip x264-eSc', 'Bondi Vet S05E15 720p WEB x264-GIMINI', 'If Loving You Is Wrong S04E03 Randals Stage HDTV x264-CRiMSON', 'Wyatt Cenacs Problem Areas S02E01 WEBRip x264-eSc', 'Bondi Vet S05E15 1080p WEB x264-GIMINI']
li = []
for a in myresult_tv:
    w = re.match(".*\d ", a)
    c = w.group()
    li.append(c)
print(li)
and the result is:
Traceback (most recent call last):
  File "azazzazazaaz.py", line 31, in <module>
    c =w.group()
AttributeError: 'NoneType' object has no attribute 'group'

You're not checking if the regex matched the element of the list. You should be doing something like this:
match = re.search(pattern, string)
if match:
    process(match)

Since I don't understand what your expected output is, I used the same regex as yours. Try this code:
li = []
for a in myresult_tv:
    try:  # in case the regex doesn't match some list elements
        w = re.search(r"(.*\d )", a)  # I use search instead of match
        c = w.group()
        li.append(c)
    except AttributeError:  # w is None when there is no match
        pass
print(li)
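
Because the pattern begins with .*, re.match() and re.search() return the same result here; what actually prevents the crash is skipping elements where the result is None (the first title contains no digit followed by a space). A minimal sketch that checks the result explicitly instead of relying on try/except, reusing the question's list and regex:

```python
import re

myresult_tv = [
    'Extinct A Horizon Guide to Dinosaurs WEB h264-WEBTUBE',
    'High Noon 2019 04 05 720p HDTV DD5 1 MPEG2-NTb',
    'Wyatt Cenacs Problem Areas S02E01 1080p WEBRip x264-eSc',
    'Bondi Vet S05E15 720p WEB x264-GIMINI',
    'If Loving You Is Wrong S04E03 Randals Stage HDTV x264-CRiMSON',
    'Wyatt Cenacs Problem Areas S02E01 WEBRip x264-eSc',
    'Bondi Vet S05E15 1080p WEB x264-GIMINI',
]

# Greedy ".*" runs up to the LAST digit that is followed by a space.
pattern = re.compile(r".*\d ")

li = []
for a in myresult_tv:
    m = pattern.match(a)
    if m:  # None when the element contains no "digit + space"
        li.append(m.group())
print(li)
```

Each kept string still ends with the matched space; call .strip() on it if that is unwanted.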

Regex error in python

I have a string that looks like a path from which I am trying to extract 020414_001 with a regular expression I got from here.
str1 = "Test 123 <C:\User\Test\xyz\022014-101\more\stuff\022014\1> Text"
Actually, I am retrieving the string from a text file, so I don't have to escape it, but for testing purposes I used this string instead:
str1 = "<C:\\User\\Test\\xyz\\022014-101\\more\\stuff\\022014\\1>"
Here is the code I tried, to match the first occurrence of 022014-101:
import re
p = re.compile('(?<=\\)[\d]{6}[^\\]*')
m = p.match(str1)
print m.group(0) #Line 6
It gave me this error:
Traceback (most recent call last):
  File "test12.py", line 6, in <module>
    print m.group(0)
AttributeError: 'NoneType' object has no attribute 'group'
How can I get the desired output 020414_001?
EDIT:
That did it:
import re
m = re.search(r'(?<=\\)[\d]{6}[^\\]*', str1)
print m.group(0)
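
Two things changed in the working version: the pattern became a raw string, so \\ reaches the regex engine as an escaped backslash, and match() became search(). match() only tries position 0, where no backslash can precede the lookbehind, so it returns None here. A Python 3 sketch, with the test string written as a raw literal (an assumption; in the question it is read from a file). It returns the 022014-101 segment that the code targets:

```python
import re

# Raw literal so the backslashes stay literal (the question reads the text from a file).
str1 = r"Test 123 <C:\User\Test\xyz\022014-101\more\stuff\022014\1> Text"

# (?<=\\)  preceded by a backslash (lookbehind, not part of the match)
# \d{6}    six digits (same as [\d]{6})
# [^\\]*   then everything up to the next backslash
pattern = re.compile(r"(?<=\\)\d{6}[^\\]*")

print(pattern.match(str1))            # None: match() only tries position 0
print(pattern.search(str1).group(0))  # 022014-101
```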

How to handle AttributeError in python?

I am working through some example code I found in What's the most efficient way to find one of several substrings in Python? and changed it to:
import re
to_find = re.compile("hello|there")
search_str = "blah fish cat dog haha"
match_obj = to_find.search(search_str)
#the_index = match_obj.start()
which_word_matched = ""
which_word_matched = match_obj.group()
Since there is now no match, I get:
Traceback (most recent call last):
  File "<console>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
What is the standard way in Python to handle the no-match case, so as to avoid this error?

match_obj = to_find.search(search_str)
if match_obj:
    # do things with match_obj
    which_word_matched = match_obj.group()
If you need to do something when there's no match, put it in an else block.

Your match_obj is None because the regular expression did not match. Test for it explicitly:
which_word_matched = match_obj.group() if match_obj else ''
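
Combining the two answers, a minimal sketch that wraps the check in a small helper (the name which_word is hypothetical). search() returns None on failure and a Match object is always truthy, so a plain if is enough:

```python
import re

to_find = re.compile("hello|there")


def which_word(text):
    """Return the first "hello"/"there" found in text, or '' if neither occurs."""
    match_obj = to_find.search(text)
    if match_obj:  # None when nothing matched
        return match_obj.group()
    return ""  # no-match case handled explicitly, so no AttributeError


print(repr(which_word("blah fish cat dog haha")))  # ''
print(repr(which_word("oh hello there")))          # 'hello'
```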

The right and elegant way to split and join a string in Python

I have the following values:
>>> poly
'C:\\04-las_clip_inside_area\\16x16grids_1pp_fsa.shp'
>>> record
1373155
and I wish to create:
'C:\\04-las_clip_inside_area\\16x16grids_1pp_fsa_1373155.txt'
I wish to split poly in order to get the part "C:\04-las_clip_inside_area\16x16grids_1pp_fsa".
I tried this two-line solution:
mylist = [poly.split(".")[0], "_", record, ".txt"]
>>> mylist
['C:\\04-las_clip_inside_area\\16x16grids_1pp_fsa', '_', 1373155, '.txt']
From there, after reading the example in Python join: why is it string.join(list) instead of list.join(string)?, I found this way to join the parts, but I get this error message:
>>> mylist.join("")
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
AttributeError: 'list' object has no attribute 'join'
Also if I use:
>>> "".join(mylist)
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
TypeError: sequence item 2: expected string, int found

As explained in Python join: why is it string.join(list) instead of list.join(string)?, it is
"".join(mylist)
instead of
mylist.join("")
There's your error.
To solve your int/string problem, convert the int to a string:
mylist = [poly.split(".")[0], "_", str(record), ".txt"]
or write directly:
"{}_{}.txt".format(poly.split(".")[0], record)

>>> from os import path
>>>
>>> path.splitext(poly)
('C:\\04-las_clip_inside_area\\16x16grids_1pp_fsa', '.shp')
>>>
>>> filename, ext = path.splitext(poly)
>>> "{0}_{1}.txt".format(filename, record)
'C:\\04-las_clip_inside_area\\16x16grids_1pp_fsa_1373155.txt'

>>> poly = 'C:\\04-las_clip_inside_area\\16x16grids_1pp_fsa.shp'
>>> record = 1373155
>>> "{}_{}.txt".format(poly.rpartition('.')[0], record)
'C:\\04-las_clip_inside_area\\16x16grids_1pp_fsa_1373155.txt'
or if you insist on using join()
>>> "".join([poly.rpartition('.')[0], "_", str(record), ".txt"])
'C:\\04-las_clip_inside_area\\16x16grids_1pp_fsa_1373155.txt'
It's important to use rpartition() (or rsplit('.', 1)), otherwise it won't work properly if the path contains any other '.' characters.
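
To see why, here is a short sketch with a hypothetical path whose directory name contains an extra '.'. split('.')[0] cuts at the first dot, while rpartition('.') and rsplit('.', 1) cut at the last one. rsplit needs the maxsplit argument of 1: without it, rsplit('.')[0] returns the same thing as split('.')[0].

```python
# Hypothetical path: the directory "data.v2" contains an extra '.'.
poly = 'C:\\data.v2\\16x16grids_1pp_fsa.shp'
record = 1373155

wrong = poly.split('.')[0]       # cut at the FIRST dot: C:\data
right = poly.rpartition('.')[0]  # cut at the LAST dot
same = poly.rsplit('.', 1)[0]    # maxsplit=1: also cuts at the last dot

print(wrong)
print("{}_{}.txt".format(right, record))
```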

You need to convert record into a string.
mylist = [poly.split(".")[0], "_", str(record), ".txt"]
