python regex error: NameError: name 're' is not defined

I have some code that reads an info file, extracts information using Python's re module, and writes it to a new file. When I test this portion of the code on its own in a separate script, it works perfectly. However, when I add it to the rest of my code, I get this error:
NameError: name 're' is not defined
Below is my entire code. The regex portion is obvious (all the re.search commands):
import glob
import subprocess
import os
import datetime
import matplotlib.pyplot as plt
import csv
import re
import ntpath
x = open('data.txt', 'w')
m = open('graphing_data.txt', 'w')
ckopuspath= '/Volumes/DAVIS/sfit-ckopus/ckopus'
command_init = 'sfit4Layer0.py -bv5 -fh'
subprocess.call(command_init.split(), shell=False)
with open('/Volumes/DAVIS/calpy_em27_neu/spectra_out_demo/info.txt', 'rt') as infofile: # the info.txt file created by CALPY
    for count, line in enumerate(infofile):
        with open('\\_spec_final.t15', 'w') as t:
            lat = re.search('Latitude of location:\s*([^;]+)', line, re.IGNORECASE).group(0)
            lat = lat.split()
            lat = lat[3]
            lat = float(lat)
            lon = re.search('Longitude of location:\s*([^;]+)', line, re.IGNORECASE).group(0)
            lon = lon.split()
            lon = lon[3]
            lon = float(lon)
            date = re.search('Time of measurement \(UTC\): ([^;]+)', line).group(0)
            date = date.split()
            yeardate = date[4]
            yeardate = yeardate.split('-')
            year = int(yeardate[0])
            month = int(yeardate[1])
            day = int(yeardate[2])
            time = date[5]
            time = time.split(':')
            hour = int(time[0])
            minute = int(time[1])
            second = float(time[2])
            dur = re.search('Duration of measurement \[s\]: ([^;]+)', line).group(0)
            dur = dur.split()
            dur = float(dur[4])
            numpoints = re.search('Number of values of one scan:\s*([^;]+)', line, re.IGNORECASE).group(0)
            numpoints = numpoints.split()
            numpoints = float(numpoints[6])
            fov = re.search('semi FOV \[rad\] :\s*([^;]+)', line, re.IGNORECASE).group(0)
            fov = fov.split()
            fov = fov[3]
            fov = float(fov[1:])
            sza = re.search('sun Azimuth \[deg\]:\s*([^;]+)', line, re.IGNORECASE).group(0)
            sza = sza.split()
            sza = float(sza[3])
            snr = 0.0000
            roe = 6396.2
            res = 0.5000
            lowwav = re.search('first wavenumber:\s*([^;]+)', line, re.IGNORECASE).group(0)
            lowwav = lowwav.split()
            lowwav = float(lowwav[2])
            highwav = re.search('last wavenumber:\s*([^;]+)', line, re.IGNORECASE).group(0)
            highwav = highwav.split()
            highwav = float(highwav[2])
            spacebw = (highwav - lowwav)/ numpoints
            d = datetime.datetime(year, month, day, hour, minute, second)
            t.write('{:>12.5f}{:>12.5f}{:>12.5f}{:>12.5f}{:>8.1f}'.format(sza,roe,lat,lon,snr)) # line 1
            t.write("\n")
            t.write('{:>10d}{:>5d}{:>5d}{:>5d}{:>5d}{:>5d}'.format(year,month,day,hour,minute,second)) # line 2
            t.write("\n")
            t.write( ('{:%Y/%m/%d %H:%M:%S}'.format(d)) + "UT Solar Azimuth:" + ('{:>6.3f}'.format(sza)) + " Resolution:" + ('{:>6.4f}'.format(res)) + " Duration:" + ('{:>6.2f}'.format(dur))) # line 3
            t.write("\n")
            t.write('{:>21.13f}{:>26.13f}{:>24.17e}{:>12f}'.format(lowwav,highwav,spacebw,numpoints)) # line 4
            t.write("\n")
            calpy_path = '/Volumes/DAVIS/calpy_em27_neu/spectra_out_demo/140803/*' # the CALPY output files!
            files1 = glob.glob(calpy_path)
            with open(files1[count], 'r') as g:
                for line in g:
                    wave_no, intensity = [float(item) for item in line.split()]
                    if lowwav <= wave_no <= highwav:
                        t.write(str(intensity) + '\n')
##########################
subprocess.call(['sfit4Layer0.py', '-bv5', '-fs'],shell=False) #I think this writes the summary file
# this retrieves info from summary and outputs it into data.txt (for readability)
# and graphing_data.txt (for graphing)
road = '/Volumes/DAVIS/calpy_em27_neu/spectra_out_demo/sfit4_trial' # path to summary file that is produced - not sure where this is usually*
for infile in glob.glob(os.path.join(road, 'summary*')):
    lines = open(infile, 'r').readlines()
    #extract info from summary
    x.write('{0} {1} {2} {3} {4}'.format(fitrms, chi2, dofsall, dofstrg, iter))
    x.write('\n')
x.close()
m.close()
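A NameError for 're' means the name was not bound in the module where the failing line actually ran, so `import re` has to execute in that same file before the first `re.search` call. Separately, the extraction itself can be made less fragile by reading the capture group directly instead of splitting `group(0)` and indexing by position. The following is only a sketch under assumptions: `find_float` is a hypothetical helper, and the sample line is invented because the real layout of info.txt is not shown in the question.

```python
import re


def find_float(label, text):
    """Return the first number that follows 'label:' in text (capture group 1)."""
    match = re.search(re.escape(label) + r":\s*([^;]+)", text, re.IGNORECASE)
    if match is None:
        # re.search returns None when nothing matches; fail with a clear message
        raise ValueError(f"{label!r} not found in line")
    return float(match.group(1).split()[0])


# Hypothetical sample line; the real info.txt format is an assumption here.
sample = "Latitude of location: 49.10; Longitude of location: 8.44;"
lat = find_float("Latitude of location", sample)
lon = find_float("Longitude of location", sample)
print(lat, lon)
```

Checking for `None` before calling `.group()` also turns a missing field into a readable error rather than an AttributeError.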

Related

Python: Dictionary with values higher than the average value

I am trying to write a program that reads two text files (box_a and box_b). These files contain the license plate number and the time each car passes two different speed cameras. The format in the files is like this: 6TKJ777, 2018-02-09 09:13:22. I would like the program to calculate the average speed (see avg_speed in the code below) between these cameras, based on the passing times at box_a and box_b and on the distance in the variable distance below. Cars with an avg_speed above the speed limit (the variable speed_limit below) should be added to a dictionary with the registration number as key and a tuple of avg_speed and the time the car passes box_a as value. This dictionary should only contain cars that have broken the speed limit. I seem to have got stuck. The code below probably has several issues, but the latest error is "name 'license_' is not defined". Any ideas?
from datetime import datetime
date_format = ' %Y-%m-%d %H:%M:%S'
def file_to_dictionary(file):
    filename = file
    filename = open(file, 'r')
    readings = []
    for line in filename:
        line = line.strip('\n')
        reg = line.split(',')
        readings.append(reg)
    filename.close()
    dictionary = dict(readings)
    for key in dictionary:
        print(key, ' : ', dictionary[key])
    return dictionary
def list_speeders():
    filename_a = "box_a.txt"
    filename_b = "box_b.txt"
    speed_limit = 60
    distance = 5
    mydict= {license_:(avg_speed,b_time)}
    dict_a = file_to_dictionary(filename_a)
    dict_b = file_to_dictionary(filename_b)
    a_time = dict_a[license_]
    b_time = dict_b[license_]
    avg_speed=round(distance/(((datetime.strptime(b_time, date_format) - datetime.strptime(a_time, date_format)).total_seconds())/3600),3)
    for line in dict_a:
        if avg_speed > speed_limit:
            mydict[license_]=avg_speed
    print(mydict)
list_speeders()

The code is less cumbersome if you convert the date and time to a timestamp when you first build the dictionaries. Then it's simple:
from datetime import datetime
date_format = '%Y-%m-%d %H:%M:%S'
speed_limit = 60
distance = 5
def to_dictionary(filenames):
    alldicts = tuple()
    for filename in filenames:
        d = {}
        with open(filename, encoding='utf-8') as infile:
            for line in infile:
                reg, dt = line.split(',')
                d[reg] = datetime.strptime(dt.strip(), date_format).timestamp()
        alldicts += (d, )
    return alldicts
box_a, box_b = to_dictionary(('box_a.txt', 'box_b.txt'))
speeders = {}
for k, va in box_a.items():
    if (vb := box_b.get(k)):
        if (average_speed := distance / abs(va-vb) * 3600) > speed_limit:
            speeders[k] = average_speed
print(speeders)
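The question also asks for each entry to hold a tuple of the average speed and the time the car passed box_a. A minimal sketch of that variant follows, using in-memory dictionaries instead of files so it runs on its own. The function name `find_speeders` and the second plate `ABC123` are made up for illustration, and the distance is assumed to be in kilometres with the limit in km/h.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def find_speeders(box_a, box_b, distance, speed_limit):
    """Map plate -> (average speed, time at box_a) for cars over the limit."""
    speeders = {}
    for plate, a_time in box_a.items():
        b_time = box_b.get(plate)
        if b_time is None:
            continue  # car never reached the second camera
        seconds = (datetime.strptime(b_time, DATE_FORMAT)
                   - datetime.strptime(a_time, DATE_FORMAT)).total_seconds()
        if seconds <= 0:
            continue  # skip identical or out-of-order timestamps
        avg_speed = round(distance * 3600 / seconds, 3)
        if avg_speed > speed_limit:
            speeders[plate] = (avg_speed, a_time)
    return speeders


# Illustrative readings: the first plate is from the question, the second is invented.
box_a = {"6TKJ777": "2018-02-09 09:13:22", "ABC123": "2018-02-09 09:00:00"}
box_b = {"6TKJ777": "2018-02-09 09:16:22", "ABC123": "2018-02-09 09:10:00"}
result = find_speeders(box_a, box_b, distance=5, speed_limit=60)
print(result)
```

Looping over the plates in box_a is what gives `license_` a value in the original code; there the name is used before anything is ever assigned to it.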

Get a result of a log file parsing speed every 10 seconds in python

I have Python code that parses a 1 TB log file, but the result is only shown after parsing has finished, which takes about 12 hours. How can I parse the log file and report the parsing speed every 10 seconds while it runs?
This is my code:
import re
import timeit
log_file = '/Users/kiya/Desktop/mysql/ipscan/ip.txt'
output_file ='/Users/kiya/Desktop/mysql/ipscan/k2u.csv'
name_to_check = 'MBX_AUTHENTICATION_FAILED'
class Log_minning:
    def __init__(self):
        self.counter = 0
    def get_userdata(self):
        user_att = []
        list_usr = []
        counterr = 0
        with open(log_file, encoding='utf-8') as infile:
            for line in infile:
                if name_to_check in line:
                    username = re.search(r'(?<=userName=)(.*)(?=,)', line)
                    username = username.group()
                    date = re.search(r"([12]\d{3}(0[1-9]|1[0-2])+"
                                     "(0[1-9]|[12]\d|3[01]))", line)
                    date = date.group()
                    time = re.search(r"(\d{9}\+\d{4})", line)
                    time = time.group()
                    ip = re.search(
                        r'(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.)'
                        '{3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])',
                        line
                    )
                    ip = ip.group()
                    user_att.append(username)
                    user_att.append(date)
                    user_att.append(time)
                    user_att.append(ip)
                    list_usr.append(user_att)
                    counterr = counterr + 1
                    self.counter = counterr
        return list_usr
if __name__ == "__main__":
    lm = Log_minning()
    the_time = timeit.Timer(lm.get_userdata).repeat(1, 1000)
    sing_time = min(the_time)/1000
    speed = 600 / sing_time * lm.counter
    # for line in lm.get_userdata():
    # print(line)
    print(
        "Processing " + str(lm.counter) + " in " + str(the_time) +
        "\nThe speed aproximately " + str(speed) + " data in 10 sec"
    )
This is the output for a full scan (time in seconds):
Processing 117 in [6.646515152002394]
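One way to get periodic feedback is to check a clock inside the read loop and print a status line whenever the interval has elapsed, rather than timing the whole run with timeit. The sketch below only illustrates that idea. `scan_with_progress` and its parameters are hypothetical names, and the `clock` and `report` arguments exist only so the function can be exercised without waiting.

```python
import time


def scan_with_progress(lines, needle, interval=10.0,
                       clock=time.monotonic, report=print):
    """Count lines containing needle, reporting throughput every `interval` seconds."""
    start = last = clock()
    total = matched = 0
    for line in lines:
        total += 1
        if needle in line:
            matched += 1
        now = clock()
        if now - last >= interval:
            elapsed = now - start
            report(f"{total} lines, {matched} matches, "
                   f"{total / elapsed:.0f} lines/s")
            last = now
    return total, matched


# Tiny in-memory demo; with a real log, pass the open file object as `lines`.
demo = ["x MBX_AUTHENTICATION_FAILED userName=a,", "unrelated line"]
print(scan_with_progress(demo, "MBX_AUTHENTICATION_FAILED"))
```

Reading the clock on every line adds some overhead; for a very large file it may be worth checking only every few thousand lines.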

While conversion, invalid literal for float

I'm trying to train my own data on the Yolo network, but before that I have to convert the bounding boxes co-ordinates to the form it wants.
The file contents are like this:
0
53 19 163 116
and I'm trying to convert it to the form the network works with, using the following code:
import os
from os import walk, getcwd
from PIL import Image
classes = ["stopsign"]
def convert(size, box):
    dw = 1./size[0]
    dh = 1./size[1]
    x = (box[0] + box[1])/2.0
    y = (box[2] + box[3])/2.0
    w = box[1] - box[0]
    h = box[3] - box[2]
    x = x*dw
    w = w*dw
    y = y*dh
    h = h*dh
    return (x,y,w,h)
"""-------------------------------------------------------------------
"""
""" Configure Paths"""
mypath = "/home/decentmakeover2/BBox-Label-Tool/Labels/002/"
outpath = "/home/decentmakeover2/output/"
cls = "stopsign"
if cls not in classes:
    exit(0)
cls_id = classes.index(cls)
wd = getcwd()
list_file = open('%s/%s_list.txt'%(wd, cls), 'w')
""" Get input text file list """
txt_name_list = []
for (dirpath, dirnames, filenames) in walk(mypath):
    txt_name_list.extend(filenames)
    break
print(txt_name_list)
""" Process """
for txt_name in txt_name_list:
    #txt_file = open("Labels/stop_sign/001.txt", "r")
    """ Open input text files """
    txt_path = mypath + txt_name
    print("Input:" + txt_path)
    txt_file = open(txt_path, "r")
    lines = txt_file.read().split('\r\n') #for ubuntu, use "\r\n" instead of "\n"
    """ Open output text files """
    txt_outpath = outpath + txt_name
    print("Output:" + txt_outpath)
    txt_outfile = open(txt_outpath, "w")
    """ Convert the data to YOLO format """
    ct = 0
    for line in lines:
        #print('lenth of line is: ')
        #print(len(line))
        #print('\n')
        if(len(line) >= 2):
            ct = ct + 1
            print(line + "\n")
            elems = line.split(' ')
            print(elems)
            xmin = elems[0]
            xmax = elems[2]
            ymin = elems[1]
            ymax = elems[3]
            #
            img_path = str('%s/images/%s/%s.JPEG'%(wd, cls,
                os.path.splitext(txt_name)[0]))
            #t = magic.from_file(img_path)
            #wh= re.search('(\d+) x (\d+)', t).groups()
            im=Image.open(img_path)
            w= int(im.size[0])
            h= int(im.size[1])
            #w = int(xmax) - int(xmin)
            #h = int(ymax) - int(ymin)
            # print(xmin)
            print(w, h)
            b = (float(xmin), float(xmax), float(ymin), float(ymax))
            bb = convert((w,h), b)
            print(bb)
            txt_outfile.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')
    """ Save those images with bb into list"""
    if(ct != 0):
        list_file.write('%s/images/%s/%s.JPEG\n'%(wd, cls,
            os.path.splitext(txt_name)[0]))
list_file.close()
and I get the following error. First it prints all the file names and the contents of the data, then:
['0\n53', '19', '163', '116\n']
(262, 192)
Traceback (most recent call last):
File "text.py", line 84, in <module>
b = (float(xmin), float(xmax), float(ymin), float(ymax))
ValueError: invalid literal for float(): 0
53
I'm not really sure what to do here.
Any suggestions?

As the error message shows, your first element is '0\n53', whereas it should be '0' followed by '53', so it cannot be parsed as a float. Splitting on '\n' instead of '\r\n' should work.
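As a sketch of that fix, `str.splitlines()` handles both '\n' and '\r\n' line endings, so the same code works regardless of where the label file was written. The helper name `parse_boxes` is hypothetical; it keeps only lines with four fields, in the same xmin, ymin, xmax, ymax order the question's code reads them.

```python
def parse_boxes(text):
    """Return (xmin, ymin, xmax, ymax) float tuples from label-file text."""
    boxes = []
    for line in text.splitlines():  # copes with both '\n' and '\r\n'
        fields = line.split()
        if len(fields) != 4:
            continue  # skip the leading single-number line and blank lines
        xmin, ymin, xmax, ymax = (float(value) for value in fields)
        boxes.append((xmin, ymin, xmax, ymax))
    return boxes


# File contents as shown in the question
boxes = parse_boxes("0\n53 19 163 116\n")
print(boxes)
```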

entire program not executing

I have a program that interfaces with another program to process data. The program I have written has three main parts (separated by lines of ### below). When I run the entire program, it does not work: specifically, the second subprocess call does not execute and so does not produce the summary file that the third part of the program processes. However, when I break the parts into three separate programs and run them in the terminal, everything is fine. Any tips or suggestions on how to get this to run as one program?
#### part 1:
import glob
import subprocess
import os
import datetime
import matplotlib.pyplot as plt
import csv
import re
import ntpath
x = open('data.txt', 'w')
m = open('graphing_data.txt', 'w')
ckopuspath= '/Volumes/DAVIS/sfit-ckopus/ckopus'
command_init = 'sfit4Layer0.py -bv5 -fh'
subprocess.call(command_init.split(), shell=False)
for line in open('sfit4.ctl', 'r'): # write in later specific directory line
    p = line.strip()
    if p.startswith('band.1.nu_start'):
        a,b = p.split('=')
        b = float(b)
        b = "{0:.3f}".format(b)
        lowwav = b
        lowwav = float(lowwav)
    if p.startswith('band.1.nu_stop'):
        a,b = p.split('=')
        b = float(b)
        b = "{0:.3f}".format(b)
        highwav = b
        highwav = float(highwav)
with open('/Volumes/DAVIS/calpy_em27_neu/spectra_out_demo/info.txt', 'rt') as infofile: # the info.txt file created by CALPY
    for count, line in enumerate(infofile):
        with open('t15asc.4', 'w') as t:
            lat = re.search('Latitude of location:\s*([^;]+)', line, re.IGNORECASE).group(0)
            lat = lat.split()
            lat = lat[3]
            lat = float(lat)
            lon = re.search('Longitude of location:\s*([^;]+)', line, re.IGNORECASE).group(0)
            lon = lon.split()
            lon = lon[3]
            lon = float(lon)
            date = re.search('Time of measurement \(UTC\): ([^;]+)', line).group(0)
            date = date.split()
            yeardate = date[4]
            yeardate = yeardate.split('-')
            year = int(yeardate[0])
            month = int(yeardate[1])
            day = int(yeardate[2])
            time = date[5]
            time = time.split(':')
            hour = int(time[0])
            minute = int(time[1])
            second = time[2]
            second = second.split('.')
            second = int(second[0])
            dur = re.search('Duration of measurement \[s\]: ([^;]+)', line).group(0)
            dur = dur.split()
            dur = float(dur[4])
            numpoints = re.search('Number of values of one scan:\s*([^;]+)', line, re.IGNORECASE).group(0)
            numpoints = numpoints.split()
            numpoints = float(numpoints[6])
            fov = re.search('semi FOV \[rad\] :\s*([^;]+)', line, re.IGNORECASE).group(0)
            fov = fov.split()
            fov = fov[3]
            fov = float(fov[1:])
            sza = re.search('sun Azimuth \[deg\]:\s*([^;]+)', line, re.IGNORECASE).group(0)
            sza = sza.split()
            sza = float(sza[3])
            snr = 0.0000
            roe = 6396.2
            res = 0.5000
            calpy_path = '/Volumes/DAVIS/calpy_em27_neu/spectra_out_demo/140803/*' # the CALPY output files!
            files1 = glob.glob(calpy_path)
            with open(files1[count], 'r') as g:
                ints = []
                ints_count = 0
                for line in g:
                    wave_no, intensity = [float(item) for item in line.split()]
                    if lowwav <= wave_no <= highwav:
                        ints.append(str(intensity))
                        ints_count = ints_count + 1
            spacebw = (highwav - lowwav)/ ints_count
            d = datetime.datetime(year, month, day, hour, minute, second)
            t.write('{:>12.5f}{:>12.5f}{:>12.5f}{:>12.5f}{:>12.5f}'.format(sza,roe,lat,lon,snr)) # line 1
            t.write("\n")
            t.write('{:>10d}{:>5d}{:>5d}{:>5d}{:>5d}{:>5d}'.format(year,month,day,hour,minute,second)) # line 2
            t.write("\n")
            t.write( ('{:%Y/%m/%d %H:%M:%S}'.format(d)) + "UT Solar Azimuth:" + ('{:>6.3f}'.format(sza)) + " Resolution:" + ('{:>6.4f}'.format(res)) + " Duration:" + ('{:>6.2f}'.format(dur))) # line 3
            t.write("\n")
            t.write('{:>22.12f}{:>22.12f}{:>22.12e}{:>15d}'.format(lowwav,highwav,spacebw,ints_count)) # line 4
            t.write("\n")
            t.write('\n'.join(map(str, ints)) + '\n')
######################################## part 2:
command_two = 'sfit4Layer0.py -bv5 -fs'
subprocess.call(command_two.split(), shell=False) #writes the summary file
######################################## part 3
infile = '/Volumes/DAVIS/calpy_em27_neu/spectra_out_demo/summary'
lines = open(infile, 'r').readlines()
line_number = 0
line4= lines[line_number + 3]
variables = line4.split(" ")
sample = variables[1]
time = variables[2]
Z = variables[3]
A = variables[4]
D = variables[5]
R = variables[6]
P = variables[7]
V = variables[8]
E = variables[9]
x.write('{0} {1} {2} {3} '.format(sample, time, Z, A))
time = time[:-2]
numofgas = lines[line_number + 5]
numofgas = int(numofgas.strip())
x.write(str(numofgas))
count2 = 0
while count2 < numofgas:
    gasinfo = lines[line_number + 7 + count2]
    data = gasinfo.split()
    gasnum = data[0]
    gasname = data[1]
    IFPRF = data[2]
    apprcol = data[3]
    retcol = data[4]
    x.write(' {0} {1}'.format(gasname, retcol))
    m.write('{0},{1}\n'.format(time, retcol))
    count2 = count2 + 1
line17 = lines[line_number + 10 + count2]
print(numofgas)
params = line17.split()
bandstart = params[1]
bandstop = params[2]
space = params[3]
nptsb = params[4]
pmax = params[5]
fovdia = params[6]
meansnr = params[7]
x.write(' {0} {1} {2} {3} '.format(bandstart, bandstop, nptsb, meansnr))
line21 = lines[line_number + 14 + count2]
info = line21.split()
fitrms = info[0]
chi2 = info[1]
dofsall = info[2]
dofstrg = info[3]
iter = info[4]
maxxiter = info[5]
x.write('{0} {1} {2} {3} {4}'.format(fitrms, chi2, dofsall, dofstrg, iter))
x.write('\n')
x.close()
m.close()
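The listing alone does not show why the second call fails, but `subprocess.call` returns the exit status and the script above ignores it, so a failure in that step passes silently. A small wrapper that checks the status and surfaces stderr can help narrow the problem down. This is a sketch only: `run_step` is a hypothetical helper, and the Python one-liner stands in for the real `sfit4Layer0.py` command so the example runs anywhere.

```python
import subprocess
import sys


def run_step(args):
    """Run one external step and fail loudly if it exits with a non-zero status."""
    result = subprocess.run(args, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"{args[0]} exited with {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout


# Stand-in command (an assumption for this sketch); in the script above the
# argument list would be ['sfit4Layer0.py', '-bv5', '-fs'].
out = run_step([sys.executable, "-c", "print('summary written')"])
print(out.strip())
```

It is also worth confirming that every input file the external program reads has been closed before the call, since buffered writes only reach disk when a file is flushed or its `with` block exits.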

Parsing data from a file

I have been provided with a file containing data on recorded sightings of species, laid out in the format:
"Species", "\t", "Latitude", "\t", "Longitude"
I need to define a function that loads the data from the file into a list, splitting every line into three components: species name, latitude, and longitude.
This is what I have, but it is not working:
def LineToList(FileName):
    FileIn = open(FileName, "r")
    DataList = []
    for Line in FileIn:
        Line = Line.rstrip()
        DataList.append(Line)
        EntryList = []
        for Entry in Line:
            Entry = Line.split("\t")
            EntryList.append(Entry)
    FileIn.close()
    return DataList
LineToList("Mammal.txt")
print(DataList[1])
I need the data on each line to be separated so that I can use it later to work out which sightings were within a certain distance of a given location.
Sample Data:
Myotis nattereri 54.07663633 -1.006446707
Myotis nattereri 54.25637837 -1.002130504
Myotis nattereri 54.25637837 -1.002130504
I am trying to print one line of the data set to test whether it is splitting correctly, but nothing shows in the shell.
Update:
This is the code I am working with now:
def LineToList(FileName):
    FileIn = open(FileName, "r")
    DataList = []
    for Line in FileIn:
        Line = Line.rstrip()
        DataList.append(Line)
        EntryList = []
        for Entry in Line:
            Entry = Line.split("\t")
            EntryList.append(Entry)
            return EntryList
    FileIn.close()
    return DataList
def CalculateDistance(Lat1, Lon1, Lat2, Lon2):
    Lat1 = float(Lat1)
    Lon1 = float(Lon1)
    Lat2 = float(Lat2)
    Lon2 = float(Lon2)
    nDLat = (Lat1 - Lat2) * 0.017453293
    nDLon = (Lon1 - Lon2) * 0.017453293
    Lat1 = Lat1 * 0.017453293
    Lat2 = Lat2 * 0.017453293
    nA = (math.sin(nDLat/2) ** 2) + math.cos(Lat1) * math.cos(Lat2) * (math.sin(nDLon/2) ** 2 )
    nC = 2 * math.atan2(math.sqrt(nA),math.sqrt( 1 - nA ))
    nD = 6372.797 * nC
    return nD
DataList = LineToList("Mammal.txt")
for Line in DataList:
    LocationCount = 0
    CalculateDistance(Entry[1], Entry[2], 54.988056, -1.619444)
    if CalculateDistance <= 10:
        LocationCount += 1
print("Number Recordings within Location Range:", LocationCount)
Running the program raises this error:
CalculateDistance(Entry[1], Entry[2], 54.988056, -1.619444)
NameError: name 'Entry' is not defined

I saw "Biological Sciences" in your profile, and because of that I would recommend taking a closer look at the pandas module.
It can be very easy:
import pandas as pd
df = pd.read_csv('Mammal.txt', sep='\t',
                 names=['species', 'latitude', 'longitude'],
                 header=None)
print(df)
Output:
            species   latitude  longitude
0  Myotis nattereri  54.076636  -1.006447
1  Myotis nattereri  54.256378  -1.002131
2  Myotis nattereri  54.256378  -1.002131

Your DataList variable is local to the LineToList function; you have to assign to another variable at file scope:
DataList = LineToList("Mammal.txt")
print(DataList[1])

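I think you have a regular tab-delimited file that csv.reader can easily parse for you.
import csv
# newline='' is what the csv module documentation recommends for files passed to csv.reader
with open('Mammal.txt', newline='') as f:
    DataList = [row for row in csv.reader(f, dialect='excel-tab')]
for data in DataList:
    print(data)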
This results in
['Myotis nattereri', '54.07663633', '-1.006446707']
['Myotis nattereri', '54.25637837', '-1.002130504']
['Myotis nattereri', '54.25637837', '-1.002130504']
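Putting the pieces together, the parsed rows can be fed straight into the distance check that the question's update attempts. The following is a rough sketch rather than the asker's final code: the function names are illustrative, the Earth radius and reference coordinates are taken from the question, and the third sample row is a made-up record placed near the reference point so that the count is non-zero.

```python
import csv
import io
import math

EARTH_RADIUS_KM = 6372.797  # same radius as in the question's code


def load_sightings(handle):
    """Read tab-separated rows into (species, latitude, longitude) tuples."""
    return [(species, float(lat), float(lon))
            for species, lat, lon in csv.reader(handle, delimiter="\t")]


def distance_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return EARTH_RADIUS_KM * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))


def count_within(sightings, lat, lon, radius_km):
    """Count sightings no farther than radius_km from (lat, lon)."""
    return sum(1 for _, s_lat, s_lon in sightings
               if distance_km(s_lat, s_lon, lat, lon) <= radius_km)


sample = io.StringIO(
    "Myotis nattereri\t54.07663633\t-1.006446707\n"
    "Myotis nattereri\t54.25637837\t-1.002130504\n"
    "Sciurus vulgaris\t54.99\t-1.62\n"  # hypothetical record near the reference point
)
sightings = load_sightings(sample)
nearby = count_within(sightings, 54.988056, -1.619444, 10)
print(nearby)
```

Unpacking each row inside the loop is what avoids the NameError in the update, where `Entry` is only ever assigned inside `LineToList`, and the counter starts once before the loop instead of being reset on every row.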
