I have created a script which fetches geocodes (latitudes and longitudes) for a sample of addresses.
However, I also want it to return the accuracy/precision of the geocodes that are mined, for example: 100% accurate, 90%, etc.
# Set the input and output files
input_file_path = "geocodi_4k-5250.csv"
output_file_path = "output" # appends "####.csv" to the file name when it writes the file.
# Set the name of the column indexes here so that pandas can read the CSV file
address_column_name = "ADDRESS"
state_column_name = "STATE"
zip_column_name = "ZIP_CODE" # Leave blank("") if you do not have zip codes
# Where the program starts processing the addresses in the input file
# This is useful in case the computer crashes so you can resume the program where it left off or so you can run multiple
# instances of the program starting at different spots in the input file
start_index = 0
# How often the program prints the status of the running program
status_rate = 100
# How often the program saves a backup file
write_data_rate = 1000
# How many times the program tries to geocode an address before it gives up
attempts_to_geocode = 3
# Time it delays each time it does not find an address
# Note that this is added to itself each time it fails so it should not be set to a large number
wait_time = 3
# ----------------------------- Processing the input file -----------------------------#
df = pd.read_csv(input_file_path, low_memory=False)
# df = pd.read_excel(input_file_path)
# Raise errors if the provided column names could not be found in the input file
if address_column_name not in df.columns:
raise ValueError("Can't find the address column in the input file.")
if state_column_name not in df.columns:
raise ValueError("Can't find the state column in the input file.")
# Zip code is not needed but helps provide more accurate locations
if (zip_column_name):
if zip_column_name not in df.columns:
raise ValueError("Can't find the zip code column in the input file.")
addresses = (df[address_column_name] + ', ' + df[zip_column_name].astype(str) + ', ' + df[state_column_name]).tolist()
else:
addresses = (df[address_column_name] + ', ' + df[state_column_name]).tolist()
# ----------------------------- Function Definitions -----------------------------#
# Creates request sessions for geocoding
class GeoSessions:
def __init__(self):
self.Arcgis = requests.Session()
self.Komoot = requests.Session()
# Returns a fresh set of sessions, one per geocoding source
def create_sessions():
return GeoSessions()
# Main geocoding function that uses the geocoder package to convert addresses into lat/longs
def geocode_address(address, s):
g = geocoder.arcgis(address, session=s.Arcgis)
if (g.ok == False):
g = geocoder.komoot(address, session=s.Komoot)
return g
def try_address(address, s, attempts_remaining, wait_time):
    g = geocode_address(address, s)
    if not g.ok and attempts_remaining > 0:
        time.sleep(wait_time)
        s = create_sessions()  # it is unlikely an address truly cannot be found, so create new sessions and wait
        g = try_address(address, s, attempts_remaining - 1, wait_time + wait_time)  # keep the retry's result
    return g
# Function used to write data to the output file
def write_data(data, index):
    file_name = output_file_path + str(index) + ".csv"
    print("Created the file: " + file_name)
    done = pd.DataFrame(data)
    done.columns = ['Address', 'Lat', 'Long', 'Provider']
    done.to_csv(file_name, sep=',', encoding='utf8')  # file_name already ends in ".csv"
# Variables used in the main for loop that do not need to be modified by the user
s = create_sessions()
results = []
failed = 0
total_failed = 0
progress = len(addresses) - start_index
# ----------------------------- Main Loop -----------------------------#
for i, address in enumerate(addresses[start_index:]):
    # Print the status of how many addresses have been processed so far and how many of them failed.
if ((start_index + i) % status_rate == 0):
total_failed += failed
print(
"Completed {} of {}. Failed {} for this section and {} in total.".format(i + start_index, progress, failed,
total_failed))
failed = 0
# Try geocoding the addresses
try:
g = try_address(address, s, attempts_to_geocode, wait_time)
if (g.ok == False):
results.append([address, "was", "not", "geocoded"])
print("Gave up on address: " + address)
failed += 1
else:
results.append([address, g.latlng[0], g.latlng[1], g.provider])
# If we failed with an error like a timeout we will try the address again after we wait 5 secs
except Exception as e:
print("Failed with error {} on address {}. Will try again.".format(e, address))
try:
time.sleep(5)
s = create_sessions()
g = geocode_address(address, s)
if (g.ok == False):
                print("Did not find it.")
results.append([address, "was", "not", "geocoded"])
failed += 1
else:
print("Successfully found it.")
results.append([address, g.latlng[0], g.latlng[1], g.provider])
except Exception as e:
print("Failed with error {} on address {} again.".format(e, address))
failed += 1
results.append([address, e, e, "ERROR"])
# Writing what has been processed so far to an output file
if (i%write_data_rate == 0 and i != 0):
write_data(results, i + start_index)
# print(i, g.latlng, g.provider)
# Finished
write_data(results, i + start_index + 1)
print("Finished! :)")
My input file looks like:
ADDRESS STATE ZIP_CODE
21236 Birchwood Loop AK 99567
1731 Bragaw St AK 99508
300 E Fireweed Ln AK 99503
Output is:
Address Lat Long
21236 Birchwood Loop, 99567, AK 61.40886875 -149.4865564
1731 Bragaw St, 99508, AK 61.20489474 -149.808293
300 E Fireweed Ln, 99503, AK 61.1980295 -149.8783492
I want it to also show how accurate the geocodes are. How do I do that?
I don't have a clear answer on this. However, some solutions come to mind:
Test your geocoding against a standard list (a list with addresses with their correct geocodes). This will tell you how your geocoding performs generally.
Try multiple geocoding services. If the geocodes match across services, there is a good chance that the geocode is correct. One thing to be careful about is that the matches may not be exactly identical, so you may have to calculate the distance between the points.
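For the distance check, a small haversine helper is enough. Here is a minimal sketch (the second coordinate pair is a made-up example, and the 100 m agreement threshold is an arbitrary choice):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))  # 6371 km is the mean Earth radius

# If two providers return points within ~100 m of each other,
# you could treat the geocode as high-confidence.
agree = haversine_km(61.40886875, -149.4865564, 61.4089, -149.4866) < 0.1
```

You could run each address through two providers and record the distance between the results as a rough confidence column.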
Did you find any other solution to this problem? I'm working on something similar and would like to hear if you have other ideas.
I'm trying to build some code to restart my app. Here is the problem: when I restart the app from source, the restart works fine, but when I run the compiled version, the restart doesn't work properly. The following error appears:
usage: MyApp [-h] [-c CONFIG] [-C CACHE] [-l LOG] [--devices] [--default-config]
MyApp: error: unrecognized arguments: /home/myapp/MyApp
But I'm not passing any args. The restart code is passing the path of the program as an argument, which it shouldn't.
If I start the app normally, it starts fine. It's only on restart that there's a problem.
I'm using Debian 11, Python 3.9.2, and PyInstaller 5.7.0.
Here is my restart code.
class RestartCommand(Command):
    @property
def help(self) -> str:
return "Restarts the app"
def __call__(self, arg: str, user: User) -> Optional[str]:
self._myapp.close()
try:
if 'frozen' in sys.builtin_module_names:
executable = sys.executable
if sys.platform == "win32":
subprocess.run([executable] + sys.argv[1:])
else:
subprocess.run([executable, "-m", "__main__"] + sys.argv[1:])
else:
args = sys.argv
if sys.platform == "win32":
subprocess.run([sys.executable] + args)
else:
args.insert(0, sys.executable)
os.execv(sys.executable, args)
except Exception as e:
print("Error while restarting: ", e)
I tried removing the argv like this:
if 'frozen' in sys.builtin_module_names:
executable = sys.executable
if sys.platform == "win32":
subprocess.run([executable])
else:
subprocess.run([executable, "-m", "__main__"])
But the same error still appears.
I tried reading the documentation but found nothing.
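For what it's worth, PyInstaller documents a `sys.frozen` attribute for detecting a bundled app; `'frozen' in sys.builtin_module_names` is a different check, and if it is never true in your build, the else branch runs and prepends `sys.executable` to an argv whose first element is already the executable path. A sketch of argv handling under that assumption (`restart_command` is a hypothetical helper name, not part of your app):

```python
import os
import sys

def restart_command():
    """Build the argv list for re-executing the current program."""
    if getattr(sys, 'frozen', False):
        # In a PyInstaller bundle, sys.argv[0] is the executable path itself,
        # so only the real arguments are forwarded; re-adding the path is what
        # would produce "unrecognized arguments: /home/myapp/MyApp".
        return [sys.executable] + sys.argv[1:]
    # From source, the interpreter needs the script path as its first argument.
    return [sys.executable] + sys.argv

def restart():
    os.execv(sys.executable, restart_command())
```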
This question is related to Python Flask and Turbo-Flask.
I've recently faced a strange error with it, and I would like to ask for some help if that's okay.
I am pushing an update with the code below, inside a loop. It works wonderfully most of the time, but randomly (after a few cycles) I get the error below.
# content push
with app.app_context():
turbo.push(turbo.replace(render_template('base.html',
now= datetime.utcnow().strftime("%d-%b-%y %H:%M:%S"),
timestamp_inject = timestamp,
in_k_inject= in_k,
out_k_inject= out_k,
input_error_inject= input_error),
'update_utilpoll'))
This is the error message I get.
Exception in thread Thread-7:
Traceback (most recent call last):
File "C:\Users\jsb\Anaconda3\lib\threading.py", line 973, in _bootstrap_inner
self.run()
File "C:\Users\jsb\Anaconda3\lib\threading.py", line 910, in run
self._target(*self._args, **self._kwargs)
File "C:\Users\jsb\my_bot\FLASK\Flask-chart\app.py", line 179, in util_pulling
utilpoll (connection, device)
File "C:\Users\jsb\my_bot\FLASK\Flask-chart\app.py", line 152, in utilpoll
turbo.push(turbo.replace(render_template('base.html',
File "C:\Users\jsb\Anaconda3\lib\site-packages\turbo_flask\turbo.py", line 197, in push
for recipient in to:
RuntimeError: dictionary changed size during iteration
The variable that I am sending for rendering is a list.
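This is not a turbo-flask API guarantee, but assuming the RuntimeError is a race between the push loop and clients connecting or disconnecting (the internal recipient dict changing mid-iteration), one pragmatic workaround is a retry wrapper (`safe_push` is a name I made up):

```python
import time

def safe_push(turbo, stream, retries=3):
    """Retry a turbo.push that races with clients joining or leaving.

    The RuntimeError appears when turbo_flask iterates its internal client
    dict while the websocket thread mutates it; retrying after a short,
    growing pause is a pragmatic workaround, not an official fix.
    """
    for attempt in range(retries):
        try:
            turbo.push(stream)
            return True
        except RuntimeError:
            time.sleep(0.1 * (attempt + 1))
    return False
```

The push inside `app.app_context()` would then go through `safe_push(turbo, turbo.replace(...))` instead of calling `turbo.push` directly.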
import commands
import tempfile
cluster = commands.getoutput('lsid | grep "My cluster name" |awk \'{print $5}\'')
path = '/nxdi_env/lsf/conf/lsbatch/' + cluster + '/configdir/lsb.users'
bl = commands.getoutput('sed -n \'/# BL Fairshare group/,/(it_normal it_priority)/p\' %s | grep -v ^# | sed \'/^$/d\'' %path)
grp = ''
group = ''
txt = ''
normal_detail = ''
priority_detail = ''
subgrp_detail = ''
def print_group(g,sub):
global normal_detail
global priority_detail
subgrp = (group + '_' + sub).lower()
subgrp1 = subgrp + ','
if sub == 'normal':
subgrp_share = txt[txt.find(subgrp1)+len(subgrp1):txt.find("]")]
subgrp_detail = normal_detail
else:
subgrp_share = txt[txt.find(subgrp1)+len(subgrp1):txt.find("])")]
subgrp_detail = priority_detail
subgrp_detail = subgrp_detail.strip().split('(all) ')
    subgrp_detail = subgrp_detail[1].replace('\n', '').replace('([', '').replace('])', '').split('][')
print(' |- %s \t %-5s' %(subgrp,subgrp_share))
for i in subgrp_detail:
user, slot = i.split(',')
print(' | | - %-13s%-5s ' %(user,slot))
with tempfile.NamedTemporaryFile(delete=True) as f:
f.write(bl)
f.seek(0)
for line in f:
grp = line.split()
if '_' in grp[0]:
subgrp = grp[0]
group = subgrp.split('_')
group = group[0].upper()
if 'normal' in grp[0]:
normal_detail = line
else:
priority_detail = line
else:
print(group)
txt = line
print_group(group, 'normal')
print_group(group, 'priority')
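As an aside, the `commands` module used above exists only in Python 2; if this script ever moves to Python 3, `subprocess.getoutput` is the drop-in replacement (sketch; `lsid` is the LSF command from the original script and is assumed to be on PATH):

```python
import subprocess

# Python 3 equivalent of commands.getoutput from the script above
cluster = subprocess.getoutput('lsid | grep "My cluster name" | awk \'{print $5}\'')
```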
I have a class with a few methods that I am writing unit test cases for. For a minimal reproducible example, I am attaching 3 of the methods from that class.
The class whose methods I am testing:
class WebViewLincSession(object):
def renew_session_id(self, request):
session = request.getSession()
new_session_key = self.get_token()
while new_session_key in session.guard.sessions: # just in case the key is already used
new_session_key = self.get_token()
session.guard.sessions.pop(session.uid) # remove the current session
session.uid = new_session_key # update the key
session.guard.sessions[new_session_key] = session # add session back with the new key
request.addCookie(session.guard.cookieKey, new_session_key, path='/', secure=True, httpOnly=True) # send updated cookie value
def set_nonce(self, request):
'''
create a nonce value and send it as cookie
'''
if self._nonce_key is None:
if self._NONCE_FOR_TEST:
self._nonce_key = 'ecnon_for_test'
else:
self._nonce_key = 'ecnon_' + self.get_token()
new_nonce_value = self.get_token()
while new_nonce_value in self._nonce: # just in case the value is already used
new_nonce_value = self.get_token()
now = time()
stay_alive = now + self._STAY_ALIVE
# reset timeout value for all existing nonces
for key in self._nonce.keys():
if self._nonce[key] > stay_alive:
self._nonce[key] = stay_alive
self._nonce[new_nonce_value] = now + self._NONCE_TIMEOUT
request.addCookie(self._nonce_key, new_nonce_value, path='/', secure=True, httpOnly=True) # send updated cookie value
return new_nonce_value
def get_valid_nonce(self):
now = time()
return [nonce for nonce in self._nonce.keys() if self._nonce[nonce] > now]
My test class looks like the following:
from __future__ import (division, absolute_import, with_statement)
from time import sleep
from mock import patch, MagicMock, mock, Mock
from requests.sessions import Session
from twisted.trial.unittest import TestCase
from viewlinc.webserver.web_viewlinc_session import WebViewLincSession
class MockGuard(object):
'''Mock guard object for testing'''
def __init__(self, *ags, **kwargs):
''' class constructor
'''
super(MockGuard, self).__init__(*ags, **kwargs)
self.cookieKey = 'test_cookie_key'
self.sessions = {'_test_session_': {}}
class MockSession(object):
'''Mock session object for testing'''
def __init__(self, *ags, **kwargs):
''' class constructor
'''
super(MockSession, self).__init__(*ags, **kwargs)
self.guard = MockGuard()
self.uid = '_test_session_'
class MockRequest(object):
'''Mock Request object for testing'''
def __init__(self, *ags, **kwargs):
''' class constructor
'''
super(MockRequest, self).__init__(*ags, **kwargs)
self.session = MockSession()
self.cookies = {}
def getSession(self):
''' returns session object
'''
return self.session
def addCookie(self, key, value, path='/', secure=True, httpOnly=True, expires=None):
''' add/replace cookie
'''
self.cookies[key] = {
'value': value,
'path': path,
'secure': secure,
'httpOnly': httpOnly,
'expires': expires
}
def getCookie(self, key):
''' retrieve a cookie
'''
cookie = self.cookies.get(key, {'value': None})
return cookie['value']
class WebViewLincSessionTests(TestCase):
'''Test WebViewLincSession methods'''
def __init__(self, *ags, **kwargs):
''' class constructor
'''
super(WebViewLincSessionTests, self).__init__(*ags, **kwargs)
self.request = MockRequest()
self.web_session = WebViewLincSession()
def test_02_renew_session_id(self):
'''Test renew_session_id
'''
self.web_session.renew_session_id(self.request)
session = self.request.session
return self.assertTrue(session.uid != '_test_session_' and session.uid in session.guard.sessions, 'renew_session_id failed')
def test_03_set_nonce(self):
'''Test set_nonce
'''
self.web_session.set_nonce(self.request)
return self.assertTrue(len(self.request.cookies) > 0, 'set_nonce failed.')
def test_04_get_valid_nonce(self):
'''Test get_valid_nonce
'''
# use a clean session
web_session = WebViewLincSession()
web_session.set_nonce(self.request)
web_session.set_nonce(self.request)
valid_nonce = web_session.get_valid_nonce()
self.assertTrue(len(valid_nonce) == 2, 'Expecting 2 valid nonces.')
sleep(16)
valid_nonce = web_session.get_valid_nonce()
return self.assertTrue(len(valid_nonce) == 1, 'Expecting 1 valid nonce.')
What I want:
I would like to use mock/patch in my test class wherever possible. That probably means MockGuard, MockSession, and MockRequest would be replaced with instances of Mock. I would like to see how this can be refined to use mock/patch from the unittest package in Python.
OK, trying to give you an idea. In your tests you created a fake addCookie method, but you only use it to check how addCookie has been called. So, for example, tests 3 and 4 could be rewritten as:
def test_03_set_nonce(self):
request = mock.Mock()
self.web_session.set_nonce(request)
# we only need to know that it was called once
request.addCookie.assert_called_once()
def test_04_get_valid_nonce(self):
request = mock.Mock()
web_session = WebViewLincSession()
web_session.set_nonce(request)
web_session.set_nonce(request)
    # check that addCookie has been called twice
self.assertEqual(2, request.addCookie.call_count)
valid_nonce = web_session.get_valid_nonce()
... # the rest is not dependent on mocks
In other tests, you may also have to check the arguments used in the calls. You always have to define what you actually want to test, and then set up your mocks so that only that functionality is exercised.
Note also that in some cases it may make sense to use extra mock classes as you have done; there is nothing wrong with that if it works best for you.
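Test 2 can follow the same pattern, but renew_session_id reads session.uid and session.guard.sessions, so the Mock needs a little configuration first. A sketch (the attribute values just mirror your hand-written fakes):

```python
from unittest import mock

# Plain attribute assignment on a Mock replaces the hand-written
# MockGuard/MockSession/MockRequest classes.
request = mock.Mock()
session = mock.Mock()
session.uid = '_test_session_'
session.guard.cookieKey = 'test_cookie_key'
session.guard.sessions = {'_test_session_': session}
request.getSession.return_value = session

# In test_02 you would then call
#     self.web_session.renew_session_id(request)
# and assert, for example:
#     self.assertNotEqual('_test_session_', session.uid)
#     self.assertIn(session.uid, session.guard.sessions)
#     request.addCookie.assert_called_once()
```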
I am trying to make it so that I can connect to this server from other networks around the world. As you can see in the screenshot provided, I have port forwarding set up to forward all requests on external port 6000 to my static IP address, which is stored in the host variable.
#This is my Server
import socket
import threading
host = '192.168.1.135'
port = 6000
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind((host, port))
s.listen()
def send():
mess = input("--> ")
con.send(bytes(mess, 'utf-8'))
def receive():
msg = con.recv(1024)
print(msg.decode('utf-8'))
while True:
try:
con, address = s.accept()
print(f'Connection to {address} made successfully!')
receive()
send()
except:
pass
This is my client file below. I took my public IP address out of the host variable for obvious reasons... When I try connecting while on the same network the server is running on, it joins fine, and in the server file where it prints the IP address, it shows my network's gateway address as the connected IP. But when trying to connect from outside the network, it doesn't work!
import socket
host = 'public ip'; port = 6000
def send():
mess = input('--> ')
s.send(bytes(mess,"utf-8"))
def receive():
msg = s.recv(1024)
print(msg.decode('utf-8'))
while True:
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
s.connect((host, port))
send()
receive()
except:
pass
This link shows a picture of my network settings:
https://i.stack.imgur.com/TmNYc.png
I am pretty new to the whole "port-forwarding" idea, but I cannot seem to find the issue. After over a week of trying hundreds of solutions, nothing seems to work... If you need any more information to help solve this, just ask. Thank you!
(I can't add a comment yet, so answering here.)
First, try setting the server bind address to '0.0.0.0'. Also, use port 6000 on both the server and the client; don't use a different port on one side or forward a different one.
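A minimal sketch of that suggestion for the server side (same port as in the question):

```python
import socket

# Bind to 0.0.0.0 so the server listens on every interface, including the
# one the router forwards external port 6000 to; binding to a single LAN
# address only accepts connections addressed to that interface.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.bind(('0.0.0.0', 6000))
s.listen()
```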