Python: Expanding Complicated Tree DataStructure

I am exploring a data structure that expands into sub-elements and eventually resolves to a final element, but I only want to store the top two levels.
Example: let's say I start with New York, which breaks into the counties Bronx, Kings, New York, Queens, and Richmond, and these finally resolve to USA.
I am not sure this is a good example, so here is a clearer explanation of the problem.
A (expands to) B,C,D -> B (expands to) K,L,M -> K resolves to Z
I initially wrote it as a series of for loops and then switched to recursion, but in the recursive version I am losing some of the expanded elements, so I don't drill down into each one. I have included both the recursive and the non-recursive version. I am looking for advice on building this data structure and on the best way to do it.
I run a database query for every element in the expanded version, which returns a list of items, and continue until it resolves to a single element. Without recursion I don't lose anything while drilling down to the final element that the others resolve to, but with recursion the result is not the same. I am also new to Python, so hopefully this is not a bad question to ask on a site like this.
returnCategoryQuery is a method that returns list of items by calling the database query.
Without recursion
#Dictionary to save initial category with the rest of cl_to
baseCategoryTree = {};
#categoryResults = [];
# query get all the categories a category is linked to
categoryQuery = "select cl_to from categorylinks cl left join page p on cl.cl_from = p.page_id where p.page_namespace=14 and p.page_title ='";
cursor = db.cursor(cursors.SSDictCursor);
for key, value in idTitleDictionary.iteritems():
    for startCategory in value[0]:
        #print startCategory + "End of Query";
        categoryResults = [];
        try:
            categoryRow = "";
            baseCategoryTree[startCategory] = [];
            print categoryQuery + startCategory + "'";
            cursor.execute(categoryQuery + startCategory + "'");
            done = False;
            while not done:
                categoryRow = cursor.fetchone();
                if not categoryRow:
                    done = True;
                    continue;
                categoryResults.append(categoryRow['cl_to']);
            for subCategoryResult in categoryResults:
                print startCategory.encode('ascii') + " - " + subCategoryResult;
                for item in returnCategoryQuery(categoryQuery + subCategoryResult + "'"):
                    print startCategory.encode('ascii') + " - " + subCategoryResult + " - " + item;
                    for subItem in returnCategoryQuery(categoryQuery + item + "'"):
                        print startCategory.encode('ascii') + " - " + subCategoryResult + " - " + item + " - " + subItem;
                        for subOfSubItem in returnCategoryQuery(categoryQuery + subItem + "'"):
                            print startCategory.encode('ascii') + " - " + subCategoryResult + " - " + item + " - " + subItem + " - " + subOfSubItem;
                            for sub_1_subOfSubItem in returnCategoryQuery(categoryQuery + subOfSubItem + "'"):
                                print startCategory.encode('ascii') + " - " + subCategoryResult + " - " + item + " - " + subItem + " - " + subOfSubItem + " - " + sub_1_subOfSubItem;
                                for sub_2_subOfSubItem in returnCategoryQuery(categoryQuery + sub_1_subOfSubItem + "'"):
                                    print startCategory.encode('ascii') + " - " + subCategoryResult + " - " + item + " - " + subItem + " - " + subOfSubItem + " - " + sub_1_subOfSubItem + " - " + sub_2_subOfSubItem;
        except Exception, e:
            traceback.print_exc();
With Recursion
def crawlSubCategory(subCategoryList):
    level = 1;
    expandedList = [];
    for eachCategory in subCategoryList:
        level = level + 1
        print "Level " + str(level) + " " + eachCategory;
        #crawlSubCategory(returnCategoryQuery(categoryQuery + eachCategory + "'"));
        for subOfEachCategory in returnCategoryQuery(categoryQuery + eachCategory + "'"):
            level = level + 1
            print "Level " + str(level) + " " + subOfEachCategory;
            expandedList.append(crawlSubCategory(returnCategoryQuery(categoryQuery + subOfEachCategory + "'")));
    return expandedList;
#Dictionary to save initial category with the rest of cl_to
baseCategoryTree = {};
#categoryResults = [];
# query get all the categories a category is linked to
categoryQuery = "select cl_to from categorylinks cl left join page p on cl.cl_from = p.page_id where p.page_namespace=14 and p.page_title ='";
cursor = db.cursor(cursors.SSDictCursor);
for key, value in idTitleDictionary.iteritems():
    for startCategory in value[0]:
        #print startCategory + "End of Query";
        categoryResults = [];
        try:
            categoryRow = "";
            baseCategoryTree[startCategory] = [];
            print categoryQuery + startCategory + "'";
            cursor.execute(categoryQuery + startCategory + "'");
            done = False;
            while not done:
                categoryRow = cursor.fetchone();
                if not categoryRow:
                    done = True;
                    continue;
                categoryResults.append(categoryRow['cl_to']);
            #crawlSubCategory(categoryResults);
        except Exception, e:
            traceback.print_exc();
        #baseCategoryTree[startCategory].append(categoryResults);
        baseCategoryTree[startCategory].append(crawlSubCategory(categoryResults));

Are you trying to look up "Queens" and learn that it is in the USA? Have you tried encoding your tree in XML, using lxml.etree to find an element, and then using getpath to return the path in XPath format?
This would mean adding a fourth level at the top of your tree, namely World. You would then search for Queens and learn that the path to Queens is World/USA/NewYork/Queens. The answer to your question would always be the second item in the XPath.
Of course you could always just build a tree from the XML and use a tree search algorithm.
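As a rough illustration of the plain tree approach, here is a minimal Python 3 sketch of a recursive expansion. It is not the asker's code: the LINKS dictionary and fetch_links are hypothetical stand-ins for returnCategoryQuery, and the names expand and final_elements are made up for this example. One likely reason the recursive version in the question loses elements is that it appends only the results of the recursive calls to expandedList and never the category names themselves. The sketch avoids that by having each call handle exactly one level and store every name as a dictionary key.

```python
# Hypothetical in-memory stand-in for the database: name -> names it expands to.
LINKS = {
    "A": ["B", "C", "D"],
    "B": ["K", "L", "M"],
    "C": ["Z"],
    "D": ["Z"],
    "K": ["Z"],
    "L": ["Z"],
    "M": ["Z"],
}


def fetch_links(name):
    """Stand-in for returnCategoryQuery: return the items `name` expands to."""
    return LINKS.get(name, [])


def expand(name, fetch, path=()):
    """Return a nested dict {child: subtree} of everything `name` expands to."""
    path = path + (name,)
    tree = {}
    for child in fetch(name):
        if child in path:
            continue  # already on the current path: skip to avoid infinite recursion
        tree[child] = expand(child, fetch, path)
    return tree


def final_elements(tree):
    """Collect the names whose subtree is empty, i.e. that were not expanded further."""
    found = set()
    for name, subtree in tree.items():
        if subtree:
            found |= final_elements(subtree)
        else:
            found.add(name)
    return found


tree = expand("A", fetch_links)
# Keep only the top two levels below the start element.
top_two = {child: sorted(subtree) for child, subtree in tree.items()}
finals = final_elements(tree)
print(top_two)
print(finals)
```

Because every level is handled by the same function, the depth does not need to be known in advance, which is what the six nested for loops in the non-recursive version hard-code.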

Related

Syntax error in postgresql query coded in python

I am executing PostgreSQL 13 queries from Python 3.9 using the psycopg2 library. I am also working with the PostGIS extension for PostgreSQL.
Please look for the comment that points to the line causing the syntax error. I am having trouble understanding what the syntax error is and how to debug it, since I need to execute PostgreSQL queries from Python, so any tips will be greatly appreciated.
def corefunc(rf, openConnection):
    pcur = openConnection.cursor(name="pcur" + rf)
    rcur = openConnection.cursor(name="rcur" + rf)
    acur = openConnection.cursor()
    rcur.execute("SELECT geom FROM " + rf)
    for number in range (1, 5):
        acur.execute("DROP TABLE IF EXISTS " + "pf"+ rf)
        acur.execute("CREATE TABLE " + "pf" + rf + " (index integer, sums integer)")
        pcur.execute("SELECT geom FROM " + "pf" + str(number))
        row = 1
        for each in rcur.fetchall():
            if number == 1: acur.execute("INSERT INTO " + "pf" + rf + " (index, sums) VALUES (" + str(row) + ",0)")
            for eachone in pcur.fetchall():
                #-------------------------------------------- the statement below gives the syntax error
                acur.execute("UPDATE TABLE " + "pf" + rf + " SET sums = sums + "\
                    +"ST_Contains(" + " ' " + each[0] + " ' " + ", " + " ' " + eachone[0] + " ' " + ")::int WHERE index = " + str(row))
            row = row + 1
def parallelJoin (pointsTable, rectsTable, outputTable, outputPath, openConnection):
    #Implement ParallelJoin Here.
    cursor = openConnection.cursor()
    cursor.execute("SELECT COUNT(*) FROM " + pointsTable)
    size_data = (cursor.fetchall())[0][0]
    for number in range(1, 5):
        cursor.execute("DROP TABLE IF EXISTS pf" + str(number))
        cursor.execute("CREATE TABLE pf" + str(number) + " AS SELECT * FROM "
            + pointsTable + " LIMIT " + str(size_data/4)
            + " OFFSET " + str(((number-1)*size_data)/4))
    cursor.execute("SELECT COUNT(*) FROM " + rectsTable)
    size_rects = (cursor.fetchall())[0][0]
    for number in range(1, 5):
        cursor.execute("DROP TABLE IF EXISTS rf" + str(number))
        cursor.execute("CREATE TABLE rf" + str(number) + " AS SELECT * FROM "
            + pointsTable + " LIMIT " + str(size_rects/4)
            + " OFFSET " + str(((number - 1) * size_rects)/4))
    threads = dict()
    for number in range(0, 4):
        threads[number] = threading.Thread(target=corefunc, args=("rf" + str(number + 1), openConnection))
        threads[number].start()
        break
    while threads[0].is_alive() or threads[1].is_alive()\
            or threads[2].is_alive() or threads[3].is_alive(): pass
    # more shit to do

Never mind, I checked the syntax elsewhere and found the problem. To update a table, the query begins with "UPDATE ...", not "UPDATE TABLE ...".
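To illustrate the fix, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, an assumption made only so the example runs without a database server, and the table pf_demo and column idx are hypothetical names. In both databases UPDATE is followed directly by the table name. The sketch also passes values as query parameters instead of concatenating them into the SQL string. With psycopg2 the placeholder would be %s rather than ?.

```python
import sqlite3

# SQLite is only a stand-in here; the table and column names are made up for the demo.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE pf_demo (idx INTEGER, sums INTEGER)")
cur.execute("INSERT INTO pf_demo (idx, sums) VALUES (?, ?)", (1, 0))

# "UPDATE TABLE ..." is not valid SQL, so the database rejects it.
try:
    cur.execute("UPDATE TABLE pf_demo SET sums = sums + 1 WHERE idx = 1")
    bad_statement_rejected = False
except sqlite3.OperationalError as err:
    bad_statement_rejected = True
    print("rejected:", err)

# Correct form: UPDATE <table> SET ... WHERE ..., with the values passed as parameters.
cur.execute("UPDATE pf_demo SET sums = sums + ? WHERE idx = ?", (1, 1))
conn.commit()

total = cur.execute("SELECT sums FROM pf_demo WHERE idx = ?", (1,)).fetchone()[0]
print("sums is now", total)
conn.close()
```

Printing the full query string just before executing it is a simple way to spot this kind of error, because the statement can then be pasted into a SQL console and checked on its own.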

How to divide numbers from a text file?

This is my file text:
Covid-19 Data
Country / Number of infections / Number of Death
USA 124.356 2.236
Netherlands 10.866 771
Georgia 90 NA
Germany 58.247 455
I created a function to calculate the ratio of deaths compared to the infections, however it does not work, because some of the values aren't floats.
f=open("myfile.txt","w+")
x="USA" + " " + " " + "124.356" + " " + " " + "2.236"
y="Netherlands" + " " + " " + "10.866" + " " + " " + "771"
z="Georgia" + " " + " " + "90" + " " + " " + "NA"
w="Germany" + " " + " " + "58.247" + " " + " " + "455"
f.write("Covid-19 Data" + "\n" + "Country" + " " + "/" + " " + "Number of infections" + " " + "/" + " " + "Number of Death" + "\n")
f.write(x + "\n")
f.write(y + "\n")
f.write(z + "\n")
f.write(w)
f.close()
with open("myfile.txt", "r") as file:
    try:
        for i in file:
            t = i.split()
            result=float(t[-1])/float(t[-2])
            print(results)
    except:
        print("fail")
file.close()
Does anyone have an idea how to solve this problem?

You can do the following:
with open("myfile.txt", "r") as file:
    for i in file:
        t = i.split()
        try:
            result = float(t[-1]) / float(t[-2])
            print(result)
        except ValueError:
            pass
Because you don't know in advance whether the values you are trying to divide are numeric, wrapping the operation in a try/except block should solve your problem.
If you want something a bit cleaner, you can do the following:
def is_float(value):
    try:
        float(value)
    except ValueError:
        return False
    return True
with open("myfile.txt", "r") as file:
    for i in file:
        t = i.split()
        if is_float(t[-1]) and is_float(t[-2]):
            result = float(t[-1]) / float(t[-2])
            print(result)
The idea is the same, however.

I used the same file that you attached in your example and wrote the following code; hopefully it helps:
with open("test.txt","r") as reader:
    lines = reader.readlines()
    for line in lines[2:]:
        line = line.replace(".","") # Remove points to have the full value
        country, number_infections, number_deaths = line.strip().split()
        try:
            number_infections = float(number_infections)
            number_deaths = float(number_deaths)
        except Exception as e:
            print(f"[WARNING] Could not convert Number of Infections {number_infections} or Number of Deaths {number_deaths} to float for Country: {country}\n")
            continue
        ratio = number_deaths/number_infections
        print(f"Country: {country} D/I ratio: {ratio}")
As you can see, I skipped the headers of your file using lines[2:], which means processing starts from row 3 of your file. I also added try/except logic to skip values that cannot be converted to float. Hope this helps!
Edit
I just noticed that the file uses "." instead of "," as the thousands separator, so the period is removed with line.replace(".", "").
The results of this execution are:
Country: USA D/I ratio: 0.017980636237897647
Country: Netherlands D/I ratio: 0.07095527332965212
[WARNING] Could not convert Number of Infections 90.0 or Number of Deaths NA to float for Country: Georgia
Country: Germany D/I ratio: 0.007811561110443456

Fixed the following:
The first two lines in your text file are headers. These need to be skipped.
'NA' can't be converted to a float, so it is treated as 0.
If there were a 0 in your data, your program would crash. Now it won't.
f=open("myfile.txt","w+")
x="USA" + " " + " " + "124.356" + " " + " " + "2.236"
y="Netherlands" + " " + " " + "10.866" + " " + " " + "771"
z="Georgia" + " " + " " + "90" + " " + " " + "NA"
w="Germany" + " " + " " + "58.247" + " " + " " + "455"
f.write("Covid-19 Data" + "\n" + "Country" + " " + "/" + " " + "Number of infections" + " " + "/" + " " + "Number of Death" + "\n")
f.write(x + "\n")
f.write(y + "\n")
f.write(z + "\n")
f.write(w)
f.close()
with open("myfile.txt", "r") as file:
    #Skipping headers
    next(file)
    next(file)
    try:
        for i in file:
            t = i.split()
            #Make sure your code keeps working when one of the numbers is zero
            x = 0
            y = 0
            #There are some NA's in your file. Strings not representing
            #a number can't be converted to float
            if t[1] != "NA":
                x = t[1]
            if t[2] != "NA":
                y = t[2]
            if x == 0 or y == 0:
                result = 0
            else:
                result=float(x)/float(y)
            print(t[0] + ": " + str(result))
    except:
        print("fail")
file.close()
Output:
USA: 55.615384615384606
Netherlands: 0.014093385214007782
Georgia: 0
Germany: 0.12801538461538461

The header line in your file is Covid-19 Data. This is the first line, so when you call t=i.split() you get a list t containing ['Covid-19', 'Data'].
You cannot convert these to floats because they contain letters. Instead, you should read the first two header lines before the loop and do nothing with them. However, you will then run into the same issue with Georgia, because "NA" also cannot be converted to a float.
A few other points: it's not good practice to use a catch-all exception handler. Also, you don't need to close the file explicitly if you open it using a with statement.

Why are Lists causing problems

So I am working on a certain code to modify a text file. When I use this function individually, it works perfectly
TextRotation.rotTextC("cv.txt")
But when I use it in batch as a list like this
def files_LTXT(pathF):
    return glob.glob(pathF + "*" + ".txt")
for i in range (len(listFileTXT)):
    TextRotation.rotTextC(listFileTXT[i])
it gives the following error:
File "C:\Users\Administrator\PycharmProjects\openCV\TextRotation.py", line 9, in rotLineC
0
valueObj = int(lineStr[c1])
0.472917 0.713281 0.845833 0.376563
IndexError: string index out of range
Function rotLineC is as follows:
def rotLineC(lineStr, c1):
    if len(lineStr) > 2:
        valueObj = int(lineStr[c1])
        print(valueObj)
        valueXC = float(lineStr[(c1+2):(c1+10)])
        valueYC = float(lineStr[(c1+11):(c1+19)])
        valueW = float(lineStr[(c1+20):(c1+28)])
        valueH = float(lineStr[(c1+29):(c1+37)])
        # print(valueXC)
        # print(valueYC)
        # print(valueW)
        # print(valueH)
        nValueXC = round(1 - valueYC, 6)
        nValueYC = round(valueXC, 6)
        nValueW = round(valueH, 6)
        nValueH = round(valueW, 6)
        rotString = str(int(valueObj)) + " " + str(nValueXC) + " " + \
            str(nValueYC) + " " + str(nValueW) + " " + str(nValueH)
        print(str(nValueXC) + " " + str(nValueYC) + " " + str(nValueW) + " " + str(nValueH))
        print(rotString)
        return rotString
This function works fine!
for i in range (len(listFileJPG)):
    ImageRotation.rotImage(listFileJPG[i])

Remember to include the / at the end of the path! (I am assuming a UNIX environment here.)
If the path is 'dev/my_pat', for example, your function will fail. The path must end with a /. You can add this check to your function:
...
if pathF[-1] != '/':
    return glob.glob(pathF + "/*.txt")
...
Also, do not iterate using indices; iterate over the list directly, which is the Pythonic way:
for file in files_LTXT(my_path):
    TextRotation.rotTextC(file)
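A minimal sketch of the same idea that does not depend on whether the path ends with a separator: os.path.join inserts the separator only when it is missing, and it works on both UNIX and Windows. The function name list_txt_files and the temporary demo directory are made up for this example.

```python
import glob
import os
import tempfile


def list_txt_files(folder):
    """Return the sorted paths of the .txt files directly inside `folder`."""
    return sorted(glob.glob(os.path.join(folder, "*.txt")))


# Build a throwaway directory with two .txt files and one other file.
demo_dir = tempfile.mkdtemp()
for name in ("a.txt", "b.txt", "c.jpg"):
    with open(os.path.join(demo_dir, name), "w") as handle:
        handle.write("0 0.472917 0.713281 0.845833 0.376563\n")

without_separator = list_txt_files(demo_dir)
with_separator = list_txt_files(demo_dir + os.sep)
names = [os.path.basename(path) for path in without_separator]

# Iterate over the list directly instead of using indices.
for path in without_separator:
    print(os.path.basename(path))
```

If glob returns an empty list, the loop body never runs, so printing the list once is a quick way to check that the pattern matches the files you expect.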

Python script write to file stopping after certain point

I'm trying to analyze a sqlite3 file and print the results to a text file. If I test the code with print, it all works fine. When I write it to a file, it cuts off at the same point every time.
import sqlite3
import datetime
import time
conn = sqlite3.connect("History.sqlite")
curs = conn.cursor()
results = curs.execute("SELECT visits.id, visits.visit_time, urls.url, urls.visit_count \
FROM visits INNER JOIN urls ON urls.id = visits.url \
ORDER BY visits.id;")
exportfile = open('chrome_report.txt', 'w')
for row in results:
    timestamp = row[1]
    epoch_start = datetime.datetime(1601,1,1)
    delta = datetime.timedelta(microseconds=int(timestamp))
    fulltime = epoch_start + delta
    string = str(fulltime)
    timeprint = string[:19]
    exportfile.write("ID: " + str(row[0]) + "\t")
    exportfile.write("visit time: " + str(timeprint) + "\t")
    exportfile.write("Url: " + str(row[2]) + "\t")
    exportfile.write("Visit count: " + str(row[3]))
    exportfile.write("\n")
    print "ID: " + str(row[0]) + "\t"
    print "visit time: " + str(timeprint) + "\t"
    print "Url: " + str(row[2]) + "\t"
    print "Visit count: " + str(row[3])
    print "\n"
conn.close()
So printing gives the proper result, but the export to the file stops in the middle of a URL.

OK, I would start by replacing the for loop with the one below:
with open('chrome_report.txt', 'w') as exportfile:
    for row in results:
        try:
            timestamp = row[1]
            epoch_start = datetime.datetime(1601,1,1)
            delta = datetime.timedelta(microseconds=int(timestamp))
            fulltime = epoch_start + delta
            string = str(fulltime)
            timeprint = string[:19]
            exportfile.write("ID: " + str(row[0]) + "\t")
            exportfile.write("visit time: " + str(timeprint) + "\t")
            exportfile.write("Url: " + str(row[2]) + "\t")
            exportfile.write("Visit count: " + str(row[3]))
            exportfile.write("\n")
            print "ID: " + str(row[0]) + "\t"
            print "visit time: " + str(timeprint) + "\t"
            print "Url: " + str(row[2]) + "\t"
            print "Visit count: " + str(row[3])
            print "\n"
        except Exception as err:
            print(err)
By using the "with" statement (context manager) we eliminate the need to close the file. By using the try/except we capture the error and print it. This will show you where your code is failing and why.
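For reference, here is a self-contained Python 3 sketch of the same pattern. One plausible reason for the truncated output is that the script in the question never closes exportfile, so buffered data may not be flushed to disk, and the with block addresses that. The in-memory database, the two sample rows, and the names format_row and export_rows are assumptions made only so the example can run on its own.

```python
import datetime
import os
import sqlite3
import tempfile

# The question measures visit_time in microseconds since 1601-01-01.
EPOCH_START = datetime.datetime(1601, 1, 1)


def format_row(row):
    """Turn one (id, visit_time, url, visit_count) row into a report line."""
    visit_id, timestamp, url, visit_count = row
    visited = EPOCH_START + datetime.timedelta(microseconds=int(timestamp))
    return "ID: {}\tvisit time: {}\tUrl: {}\tVisit count: {}\n".format(
        visit_id, visited.isoformat(sep=" ", timespec="seconds"), url, visit_count
    )


def export_rows(rows, path):
    """Write all rows; the with block flushes and closes the file on exit."""
    with open(path, "w", encoding="utf-8") as exportfile:
        for row in rows:
            exportfile.write(format_row(row))


# Hypothetical stand-in for History.sqlite with the tables used in the query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE urls (id INTEGER, url TEXT, visit_count INTEGER)")
conn.execute("CREATE TABLE visits (id INTEGER, visit_time INTEGER, url INTEGER)")
conn.execute("INSERT INTO urls VALUES (1, 'http://example.com/', 3)")
conn.executemany("INSERT INTO visits VALUES (?, ?, 1)", [(1, 0), (2, 1000000)])
rows = conn.execute(
    "SELECT visits.id, visits.visit_time, urls.url, urls.visit_count "
    "FROM visits INNER JOIN urls ON urls.id = visits.url "
    "ORDER BY visits.id"
).fetchall()
conn.close()

report_path = os.path.join(tempfile.mkdtemp(), "chrome_report.txt")
export_rows(rows, report_path)

with open(report_path, encoding="utf-8") as report:
    report_lines = report.read().splitlines()
print(len(report_lines), "lines written")
```

Reading the file back and comparing the line count with the number of rows is a quick check that nothing was cut off.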

Problems with incrementing location of widget creation when function is called

I am trying to make this function create a label and 2 buttons, and each time the function is called, the 3 widgets should be created on the next row (directly under the previous 3 widgets).
However, I am not sure why the items keep being created on the same row (effectively overlapping the existing ones when the function is called) even though the counter is incremented.
def fetch_quick(self, entries):
    for entry in entries:
        text = entry[1].get()
        print(text)
        exec("app._framea" + str(self._qqq+7) + "= tk.Frame(app._master, bg='white')")
        exec("app._framea" + str(self._qqq+7) + ".grid(row=" + str(self._qqq+6) + ")")
        exec("self.queue_entry_quick" + str(self._qqq) + " = Label(app._framea" + str(self._qqq+7) + ", text='1 '+text +' 0 a few seconds ago')")
        exec("self.queue_entry_quick" + str(self._qqq) + ".grid(row=" + str(self._qqq) + ")")
        exec("self._Button" + str(self._qqq) + " = Button(app._framea" + str(self._qqq+7) + ", text = self._qqq, width = 2, command=app._framea" + str(self._qqq+7) + ".destroy, bg='red')")
        exec("self._Buttonb" + str(self._qqq) + " = Button(app._framea" + str(self._qqq+7) + ", text = self._qqq, width = 2, command=app._framea" + str(self._qqq+7) + ".destroy, bg='green')")
        exec("self._Button" + str(self._qqq) + ".grid(row=" + str(self._qqq) + ", column=1)")
        exec("self._Button" + str(self._qqq) + ".bind('<Button-1>',self.call)")
        exec("self._Buttonb" + str(self._qqq) + ".grid(row=" + str(self._qqq) + ", column=2)")
        exec("self._Buttonb" + str(self._qqq) + ".bind('<Button-1>',self.call)")
        abcd.append(text)
        self._qqq += 1
        print(self._qqq)
I think the issue might be that the grid location of all the widgets is pre-set to row 0, so it is not updated from self._qqq each time the function is called. If this is the case, I am still unsure what to do about it.
