Python - U.S. ZipCode Matching - python

I'm working with regular expressions and I'm brand new to Python. I can't get the program to read from a file and go through the match properly. I'm getting a traceback that looks like this:
Traceback (most recent call last):
File "C:\Users\Systematic\workspace\Project8\src\zipcode.py", line 18, in <module>
m = re.match(info, pattern)
File "C:\Python34\lib\re.py", line 160, in match
return _compile(pattern, flags).match(string)
File "C:\Python34\lib\re.py", line 282, in _compile
p, loc = _cache[type(pattern), pattern, flags]
TypeError: unhashable type: 'list'
zipin.txt:
3285
32816
32816-2362
32765-a234
32765-23
99999-9999
zipcode.py:
from pip._vendor.distlib.compat import raw_input
import re
userinput = raw_input('Please enter the name of the file containing the input zipcodes: ')
myfile = open(userinput)
info = myfile.readlines()
pattern = '^[0-9]{5}(?:-[0-9]{4})?$'
m = re.match(info, pattern)
if m is not None:
    print("Match found - valid U.S. zipcode: " , info, "\n")
else: print("Error - no match - invalid U.S. zipcode: ", info, "\n")
myfile.close()

The problem is that readlines() returns a list, and the re functions operate on strings. The arguments are also swapped: re.match takes the pattern first and the string second. Here is one way it could work:
import re
zip_re = re.compile('^[0-9]{5}(?:-[0-9]{4})?$')
m = None  # stays None if the file is empty or nothing matches
with open('zipin.txt', 'r') as zipfile:
    for l in zipfile:
        m = zip_re.match(l.strip())
        if m:
            print(l.strip())
            break
if m is None:
    print("Error - no match")
The code now operates in a loop over the file lines, and attempts to match the re on a stripped version of each line.
Edit:
It's actually possible to write this in a much shorter, albeit less clear way:
next((l for l in open('zipin.txt', 'r') if zip_re.match(l.strip())), None)
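If the goal is to report every zip code in the file rather than stop at the first match, a sketch like the following checks each line. It assumes Python 3.4+ for re.fullmatch, which makes the ^ and $ anchors unnecessary; the function name check_zipcodes is hypothetical:

```python
import re

# Five digits, optionally followed by a hyphen and four digits (ZIP+4)
ZIP_RE = re.compile(r'[0-9]{5}(?:-[0-9]{4})?')


def check_zipcodes(lines):
    """Return a (zipcode, is_valid) pair for every non-blank line."""
    results = []
    for line in lines:
        zipcode = line.strip()
        if not zipcode:
            continue  # skip empty lines
        # fullmatch only succeeds if the whole string matches the pattern
        results.append((zipcode, ZIP_RE.fullmatch(zipcode) is not None))
    return results
```

For example, passing the opened zipin.txt to check_zipcodes would mark 32816, 32816-2362 and 99999-9999 as valid and the other three lines as invalid; printing the "Match found" or "Error - no match" message is then a loop over the returned pairs.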

Related

Searching a document for specific strings, then print out a part of that string

So in my program I'm attempting to search a document called console.log that has lines like this:
65536:KONSOLL:1622118174:NSN:ActivationUnits::HandleActivationUnitsMessages. There is no handler for FNo: 34
65536:KONSOLL:1622177574:NSN:ActivationUnits::HandleActivationUnitsMessages. There is no handler for FNo: 34
65536:KONSOLL:1622190642:NSN:From AutroSafe: 28 5 2021, 08:30:42; 05.046; Service: "Self Verify" mislykket; ; ; ProcessMsg; F:2177 L:655; 53298;1;13056;;
65536:KONSOLL:1622204573:NSN:ActivationUnits::HandleActivationUnitsMessages. There is no handler for FNo: 34
In my input I always specify "Self Verify", as that is what I'm looking for. I want the detector number (05.046) in the output, but I get an error.
This is my code:
import os
import re
pattern = input("What are you searching for? -->")
detectorPattern = re.compile(r'\d\d.\d\d\d')
directory = os.listdir()
for x in range(len(directory)):
    with open(directory[x], 'r') as reader:
        print("Opening " + directory[1])
        # Read and print the entire file line by line
        for line in reader:
            findLine = re.search(pattern, line)
            if findLine is not None:
                mo = detectorPattern.search(findLine)
                print(mo.group())
What I'm trying to do is go through one line at a time, and if I find "Self Verify", search that line for the detector number specified by detectorPattern and print it.
This is the error i get:
Traceback (most recent call last):
File "C:\Users\haral\Desktop\PP\SVFinder.py", line 14, in <module>
mo = detectorPattern.search(findLine)
TypeError: expected string or bytes-like object
Change:
mo = detectorPattern.search(findLine)
To:
mo = detectorPattern.search(findLine.string)
This will print:
162219
when executing the line:
print(mo.group())
I suggest passing line directly to detectorPattern.search.
Also note that if x is not None: can be replaced by if x: here, since match objects are always truthy.
This example should work as is:
import re
pattern = "Self Verify"
detectorPattern = re.compile(r'\d\d.\d\d\d')
reader = ['65536:KONSOLL:1622118174:NSN:ActivationUnits::HandleActivationUnitsMessages. There is no handler for FNo: 34',
          '65536:KONSOLL:1622177574:NSN:ActivationUnits::HandleActivationUnitsMessages. There is no handler for FNo: 34',
          '65536:KONSOLL:1622190642:NSN:From AutroSafe: 28 5 2021, 08:30:42; 05.046; Service: "Self Verify" mislykket; ; ; ProcessMsg; F:2177 L:655; 53298;1;13056;; ',
          '65536:KONSOLL:1622204573:NSN:ActivationUnits::HandleActivationUnitsMessages. There is no handler for FNo: 34']
for line in reader:
    findLine = re.search(pattern, line)
    if findLine:
        detector = detectorPattern.search(line).group()
        print(detector)
Output:
162219
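The 162219 output comes from the unescaped dot in r'\d\d.\d\d\d': . matches any character, including a digit, so part of the timestamp 1622190642 qualifies before the detector number is reached. A short sketch comparing the two patterns on the line from the question; find_detector is a hypothetical helper, and it checks the keyword with a plain substring test rather than re.search, so characters in the keyword are not treated as regex syntax:

```python
import re

LINE = ('65536:KONSOLL:1622190642:NSN:From AutroSafe: 28 5 2021, 08:30:42; '
        '05.046; Service: "Self Verify" mislykket; ; ; ProcessMsg; F:2177 L:655; 53298;1;13056;;')

# Unescaped '.' matches any character, so digits of the timestamp qualify
loose = re.compile(r'\d\d.\d\d\d')
# Escaped '\.' matches only a literal dot
strict = re.compile(r'\d\d\.\d\d\d')


def find_detector(text, keyword, pattern=strict):
    """Return the first detector number on a line containing keyword, else None."""
    if keyword not in text:
        return None
    match = pattern.search(text)
    return match.group() if match else None


print(loose.search(LINE).group())         # 162219
print(strict.search(LINE).group())        # 05.046
print(find_detector(LINE, 'Self Verify'))  # 05.046
```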
I want the detectornumber (05.046) on the output
This seems to do the job. Search the matching line itself instead of findLine (which is a match object, not a string), and escape the dot in detectorPattern so that it only matches a literal '.':
import os
import re
pattern = input("What are you searching for? -->")
# \. matches a literal dot; a bare . would also match a digit (giving 162219)
detectorPattern = re.compile(r'\d\d\.\d\d\d')
directory = os.listdir()
for x in range(len(directory)):
    with open(directory[x], 'r') as reader:
        print("Opening " + directory[x])
        # Read the file line by line
        for line in reader:
            findLine = re.search(pattern, line)
            if findLine is not None:
                mo = detectorPattern.search(line)
                if mo:
                    print(mo.group())

Issue with list indexing while converting letters (devnagari to english)

I am currently trying to map Devanagari script to English letters, but once in a while I run into the error "list index out of range". I don't want to miss any entries, which is why I do not want to use error handling unless it is necessary. Could you please look at my script and help me find out why this error occurs?
I have located the word in my file that causes the error, but if I include a couple of sentences above and below that word, the error does not happen. I think the error happens at a specific length of string.
clean=[]
dafuq=[]
clean_list = []
replacements = {'अ':'A','आ':'AA', 'इ':'I', 'ई':'II', 'उ':'U','ऊ':'UU', 'ए':'E', 'ऐ':'AI',
                'ओ':'O','औ':'OU', 'क':'KA', 'ख':'KHA', 'ग':'GA', 'घ':'GHA', 'ङ':'NGA',
                'च':'CA','छ':'CHHA', 'ज':'JA', 'झ':'JHA','ञ':'NIA', 'ट':'TA', 'ठ':'THA',
                'ड':'DHA','ढ':'DHHA', 'ण':'NAE', 'त':'TA', 'थ':'THA','द':'DA', 'ध':'DHA',
                'न':'NA','प':'PA', 'फ':'FA', 'ब':'B', 'भ':'BHA', 'म':'MA','य':'YA', 'र':'RA',
                'ल':'L','व':'WA', 'स':'SA', 'ष':'SHHA', 'श':'SHA', 'ह':'HA', '्':'A',
                'ऋ':'RI', 'ॠ':'RI','ऌ':'LI','ॐ':'OMS', 'ः':' ', 'ँ':'U',
                'ं':'M', 'ृ':'RI', 'ा':'AA', 'ी':'II', 'ि':'I', 'े':'E', 'ै':'AI',
                'ो':'O','ौ':'OU','ु' :'U','ू':'UU' }
import unicodedata
from functools import reduce
def reducer(r, v):
    # Combining marks (vowel signs etc.) are appended to the previous character
    if unicodedata.category(v) in ('Mc', 'Mn'):
        r[-1] = r[-1] + v
    else:
        r.append(v)
    return r
with open('words_original.txt', mode='r', encoding="utf-8") as f:
    with open('alphabeths.txt', mode='w+', encoding='utf-8') as d:
        with open('only_words.txt', mode='w+', encoding="utf-8") as e:
            chunk_size = 4096
            f_chunk = f.read(chunk_size)
            while len(f_chunk) > 0:
                for word in f_chunk.split():
                    for char in ['।', ',', '’', '‘', '?','#','1','2','3','4','0','5','6','7','8','9',
                                 '१','२','३','४','५','.', '६','७','८','९','०', '5','6','7','8','9','0','\ufeff']:
                        if char in word:
                            word = word.replace(char, '')
                    if word.strip():
                        clean_list.append(word)
                f_chunk = f.read(chunk_size)
            for clean_word in clean_list:
                test_word = reduce(reducer, clean_word, [])
                final_word = ''.join(test_word)
                dafuq.append(final_word)
                print(final_word)
This is the file I am testing it on
words_original.txt
Stack trace:
Traceback (most recent call last):
File "C:\Users\KUSHAL\Desktop\EARTHQUAKE_PYTHON\test.py", line 82, in <module>
test_word= reduce(reducer,clean_word,[])
File "C:\Users\KUSHAL\Desktop\EARTHQUAKE_PYTHON\test.py", line 27, in reducer
r[-1] = r[-1] + v
IndexError: list index out of range
The problem lay with some Unicode characters; it worked after removing them.
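In the question's reducer, reduce starts with r = [], so if the first character of clean_word is a combining mark (Unicode category Mn or Mc, such as the vowel sign ि), r[-1] is evaluated on an empty list and raises IndexError. One possible way for a word to start with a mark, which would fit the observation that the error depends on the surrounding text, is that reading in 4096-character chunks splits a word at a chunk boundary; that is a guess, not something the question confirms. The sketch below (split_clusters is a hypothetical name) keeps the question's reducer logic but starts a new cluster when there is nothing to attach the mark to, so no characters are dropped:

```python
import unicodedata
from functools import reduce


def reducer(clusters, ch):
    """Attach combining marks (Mn, Mc) to the previous cluster.

    If there is no previous cluster yet (the word starts with a mark),
    start a new one instead of indexing clusters[-1] on an empty list.
    """
    if unicodedata.category(ch) in ('Mc', 'Mn') and clusters:
        clusters[-1] += ch
    else:
        clusters.append(ch)
    return clusters


def split_clusters(word):
    """Group a word into base characters plus their combining marks."""
    return reduce(reducer, word, [])


print(split_clusters('किताब'))  # ['कि', 'ता', 'ब']
```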

reading last line of txt file in python and change it into variable to make calculation

td = 'date of transaction.txt'
tf = 'current balance.txt'
tr = 'transaction record.txt'
for line in open(tf):pass
for line2 in open(td):pass
for line3 in open(tr):pass
print line2,line,line3
"""
so call recall back last record
"""
rd=raw_input('\ndate of transaction: ')
print "Choose a type of transaction to proceed... \n\tA.Withdrawal \n\tB.Deposit \n\tC.Cancel & exit"
slc=raw_input('type of transaction: ')
i=1
while (i>0):
    if slc=="A" or slc=="B" or slc=="C":
        i=0
    else:
        i=i+1
        slc=raw_input('invalid selection, please choose again...(A/B/C): ')
if slc=="A":
    rr=input('\namount of transaction: ')
    Line_len = 10 # or however long a line is, since in my example they all looked the same
    SEEK_END = 2
    file = open(tf, "r")
    file.seek(-Line_len, SEEK_END)
    a = int(str(file.read(Line_len)).split(" ")[0].strip())
    rf=a-rr
    f1=open(tf, 'a+')
    f1.write('\n'+rf)
    f1.close()
    d1=open(td, 'a+')
    d1.write('\n'+rd)
    d1.close()
    r1=open(tr, 'a+')
    r1.write('\n-'+rr)
    r1.close()
else:
    print 'later'
Above is my code. It reads the last line from a text file, gets new data from the user, and writes it back to the text file on a new line.
My text file (current balance.txt) should look like this:
2894.00
2694.00
But when I try to use the last line, 2694.00, in the calculation (rf=a-rr), it fails with this error:
Traceback (most recent call last):
File "C:\Python27\acc.py", line 27, in <module>
file.seek(-Line_len, SEEK_END)
IOError: [Errno 22] Invalid argument
If I use this code instead:
for line in open(tf):
    pass
a = line
rf=a-rr
it returns this error:
Traceback (most recent call last):
File "C:\Python27\acc.py", line 27, in <module>
rf=a-rr
TypeError: unsupported operand type(s) for -: 'str' and 'int'
I seriously have no idea why... please help me.
To get the last line of the file, you can simply do:
with open('my_file.txt') as file:
    last_line = file.readlines()[-1]
# last_line is a string holding the last line; to convert it to a float, you can do
number = float(last_line.strip())
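The line read from the file is a string. In Python 2, input() evaluates what you type, so rr is already an int, which is why the error mentions 'str' and 'int'. Convert the line instead:
rf=float(a)-rr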
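A minimal sketch of the read-last-line-then-append flow, written for Python 3 (it should also run on 2.7). The helper names last_balance and withdraw are hypothetical, and it assumes the balance file holds one number per line. It converts the text to a float before subtracting and formats the result back to a string before writing, since file.write on a text file only accepts strings:

```python
def last_balance(lines):
    """Return the last non-blank line of an iterable of lines as a float."""
    values = [line.strip() for line in lines if line.strip()]
    return float(values[-1])


def withdraw(path, amount):
    """Subtract amount from the last balance in path and append the new balance."""
    with open(path) as f:
        balance = last_balance(f) - amount
    with open(path, 'a') as f:
        # Convert back to text with two decimals before writing
        f.write('\n%.2f' % balance)
    return balance
```

With the example file (2894.00 and 2694.00 on separate lines, no trailing newline), withdraw('current balance.txt', 200) would return 2494.0 and append 2494.00 as a new line.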

Python Search & Replace With Regex

I am trying to replace every occurrence of a Regex expression in a file using Python with this code:
import re
def cleanString(string):
    string = string.replace(" ", "_")
    string = string.replace('_"', "")
    string = string.replace('"', '')
    return string
test = open('test.t.txt', "w+")
test = re.sub(r':([\"])(?:(?=(\\?))\2.)*?\1', cleanString(r':([\"])(?:(?=(\\?))\2.)*?\1'), test)
However, when I run the script I am getting the following error:
Traceback (most recent call last):
File "C:/Python27/test.py", line 10, in <module>
test = re.sub(r':([\"])(?:(?=(\\?))\2.)*?\1', cleanString(r':([\"])(?:(?=(\\?))\2.)*?\1'), test)
File "C:\Python27\lib\re.py", line 155, in sub
return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or buffer
I think it is reading the file incorrectly but I'm not sure what the actual issue is here
Your cleanString function is not returning anything. Ergo the "NoneType" error.
You probably want to do something like:
def cleanString(string):
    string = string.replace(" ", "_")
    string = string.replace('_"', "")
    string = string.replace('"', '')
    return string
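Separately, re.sub expects the text to search as its third argument, so the file object returned by open cannot be passed directly; the file has to be read into a string first. Opening it with "w+" also truncates it, so its contents are gone before anything is read. If the replacement should depend on each match, re.sub accepts a function, which is called with each match object. A minimal sketch under those assumptions (clean_match, clean_text and clean_file are hypothetical names; the pattern and the cleaning steps are taken from the question):

```python
import re

# Pattern from the question: a colon followed by a double-quoted string
QUOTED = re.compile(r':([\"])(?:(?=(\\?))\2.)*?\1')


def clean_string(s):
    s = s.replace(" ", "_")
    s = s.replace('_"', "")
    s = s.replace('"', '')
    return s


def clean_match(m):
    # re.sub calls this once per match; return the replacement text
    return clean_string(m.group(0))


def clean_text(text):
    return QUOTED.sub(clean_match, text)


def clean_file(path):
    # Read first, then reopen for writing, so the contents are not truncated away
    with open(path) as f:
        text = f.read()
    with open(path, 'w') as f:
        f.write(clean_text(text))


print(clean_text('key:"hello world" other'))  # key:hello_world other
```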

Using regular expressions in python raw input

I am trying to create a script that lets a user enter a number of regular expressions, which are run over an input file to retrieve matches. I am currently using ahocorasick but run into issues when I try to enter regex patterns.
I enter a regex into the second raw_input (colour_regex) but receive this error below:
Traceback (most recent call last):
File "PLA_Enrichment_options.py", line 189, in <module>
main()
File "PLA_Enrichment_options.py", line 41, in main
tree.add(regex)
File "build/bdist.linux-x86_64/egg/ahocorasick/__init__.py", line 29, in add
TypeError: argument 1 must be string or read-only buffer, not _sre.SRE_Pattern
file_name = raw_input("What is the filename you wish to enhance? ")
enhanced_name = file_name.replace(".csv", "")
# User regexed input
tree = ahocorasick.KeywordTree()
print ("What regex would you like to use for colour? (Enter 'exit' to move on) ")
colour_regex = raw_input()
regex = re.compile(colour_regex)
while colour_regex != "exit":
    tree.add(regex)
tree.make()
print 'Finding colour matches...'
output = open(enhanced_name + '-colour.csv', 'w')
file = open(feed_name, 'r')
for line in iter(file):
    id, title, desc, link, image = line.strip('\n').split('\t')
    offerString = '|'.join([title.lower(), desc.lower(), link.lower()])
    keywords = set()
    for match in tree.findall_long(offerString): # find colours
        indices = list(match)
        keyword = offerString[indices[0]:indices[1]]
        if re.search(r'(?<![âêîôûäëïöüàèìòùáéíóú])\b%s\b(?![âêîôûäëïöüàèìòùáéíóú])' %(keyword), offerString):
            keywords.add(keyword)
    if keywords:
        output.write('\t'.join([id, '|'.join(keywords), desc, link, image])+'\n')
    else:
        output.write('\t'.join([id, title, desc, link, image])+'\n')
file.close()
output.close()
Any help/guidance to the right direction would be great.
Thanks
tree = ahocorasick.KeywordTree()
regex = re.compile(colour_regex)
tree.add(regex)
You have passed the wrong type to ahocorasick.KeywordTree.add()
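regex is a compiled regular expression object (type _sre.SRE_Pattern), but add() expects a string. If you pass the original string instead, you will not get this error. Note that Aho-Corasick matches literal keywords, so the string will be searched for as plain text, not interpreted as a regex.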
tree.add(colour_regex)
Also, this will cause an infinite loop. I think you want if instead of while, or put colour_regex = raw_input() inside the loop.
while colour_regex != "exit":
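If the goal is to use real regular expressions rather than literal keywords, one option is to skip ahocorasick and run the compiled patterns with re directly. A sketch under that assumption (collect_patterns and find_keywords are hypothetical helpers); it also stops reading at "exit", which avoids the infinite loop mentioned above:

```python
import re


def collect_patterns(responses):
    """Compile each response until the sentinel 'exit' is seen."""
    patterns = []
    for response in responses:
        if response == "exit":
            break
        patterns.append(re.compile(response))
    return patterns


def find_keywords(patterns, text):
    """Return the set of substrings of text matched by any of the patterns."""
    found = set()
    for pattern in patterns:
        for m in pattern.finditer(text):
            found.add(m.group(0))
    return found
```

Interactively, collect_patterns(iter(input, "exit")) would keep prompting until the user types exit (use raw_input instead of input on Python 2); find_keywords can then be applied to each offerString.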
