iterate over tables in a word document using python docx - python

I have been trying to add the text of cell 0 from every row of every table in a Word document to my_list. I managed to do it for one specific table (index 96), but I can't work out how to pull the data from all tables. This is what worked for table 96:
tables = list(d.tables)
tbl = d.tables[96]
my_list = []
for rw in tbl.rows:
    my_list.append(rw.cells[0].text)
print(my_list)
I tried a lot of different ways to iterate over all the tables and add to my_list. One attempt is below, but it raises the error
'Table' object has no attribute 'cells'
my_list = []
tbl = list(d.tables)
for val in tbl:
    for rw in tbl:
        my_list.append(rw.cells[0].text)
print(my_list)
Any help is greatly appreciated.

This is not a complete runnable example, so I cannot verify it, but this part is the problem:
for val in tbl:
    for rw in tbl:
You are iterating over tbl twice: the outer loop only runs the inner loop len(tbl) times, and each rw is a Table rather than a row, which is why it has no cells attribute. The second line should probably be
    for rw in val.rows:
as in your working single-table version, but I am not 100% sure of that because there is not enough information.
(Also, consider renaming both tbl and val to something resembling what they contain: list_of_tables and a_table would be far better. Also, use row instead of rw, as Python does not impose any unreasonable limit on variable name length.)
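Putting the suggestions above together, a minimal sketch of the nested loop might look like the following. The function name first_cell_texts and the stand-in objects are made up for illustration; the stand-ins only mimic the tables, rows, cells and text attributes that python-docx exposes, so the example runs without a .docx file.

```python
from types import SimpleNamespace


def first_cell_texts(document):
    """Collect the text of cell 0 from every row of every table."""
    texts = []
    for table in document.tables:      # outer loop: one table at a time
        for row in table.rows:         # inner loop: rows of THAT table
            texts.append(row.cells[0].text)
    return texts


def make_table(rows):
    """Build a stand-in table (hypothetical helper, not part of python-docx)."""
    return SimpleNamespace(
        rows=[
            SimpleNamespace(cells=[SimpleNamespace(text=text) for text in row])
            for row in rows
        ]
    )


# Stand-in for a loaded document with two tables.
fake_document = SimpleNamespace(
    tables=[
        make_table([["Name", "Age"], ["Ann", "31"]]),
        make_table([["City", "Zip"]]),
    ]
)

print(first_cell_texts(fake_document))
```

With python-docx installed, the same function should work on a real document object, for example the d from the question.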

Related

Removing duplicate records based on hierarchy using arcpy and dictionaries

I'm attempting to flag duplicate records and delete them using a data dictionary with an arcpy update cursor, and I'm running into dictionary issues.
Essentially, my code iterates through the attribute table and adds a dictionary entry of FACE_ID:CHNG_TYPE for each new FACE_ID. If it encounters a FACE_ID that's already in the dictionary, it compares the CHNG_TYPE of the duplicate FACE_IDs to see which should be deleted (I've left the weighted comparison out as it isn't the issue).
To compare them, the cursor pulls the first change (change_a) CHNG_TYPE directly from the cursor row it's in. It also pulls the FACE_ID so that it can query the dictionary to get the CHNG_TYPE for the other FACE_ID.
When I print the dictionary, it looks like what I would expect. However, change_b = dict[row[0]] evaluates to the same value every time, and I'm not sure why.
When I create the dictionary using this code but leave out the elif statement, I can pull the change_b value accurately with dict[FACE_ID].
Code below, and any help is appreciated!
with arcpy.da.UpdateCursor(fc, ['FACE_ID', 'CHNG_TYPE', 'RELATE']) as cursor:
    dict = {}
    for row in cursor:
        if row[0] in dict:
            change_a = row[1]
            change_b = dict[row[0]]
            print(change_a + ' ' + change_b)
        elif row[0] not in dict:
            dict[row[0]] = row[1]
To give an example, this statement creates the dictionary and returns the expected value:
with arcpy.da.UpdateCursor(fc, ['FACE_ID', 'CHNG_TYPE', 'RELATE']) as cursor:
    dict = {}
    for row in cursor:
        if row[0] not in dict:
            dict[row[0]] = row[1]
dict[123456]
Have you considered using the Delete Identical function or the Find Identical function available in ArcGIS Pro?
arcpy.management.DeleteIdentical(in_dataset, fields, {xy_tolerance}, {z_tolerance})
arcpy.management.FindIdentical(in_dataset, out_dataset, fields, {xy_tolerance}, {z_tolerance}, {output_record_option})
It could be a faster and simpler solution than a hand-written cursor loop.
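To check the dictionary logic in isolation, the bookkeeping from the question can be run on plain tuples instead of an arcpy cursor. This is only a sketch: the function name find_duplicate_pairs and the sample CHNG_TYPE values are made up, and the weighted comparison is left out, as it is in the question.

```python
def find_duplicate_pairs(rows):
    """Return (face_id, first_change, later_change) for every repeated FACE_ID."""
    first_seen = {}   # FACE_ID -> CHNG_TYPE of its first occurrence
    pairs = []
    for face_id, chng_type in rows:
        if face_id in first_seen:
            # change_b in the question: the value stored for the first occurrence
            pairs.append((face_id, first_seen[face_id], chng_type))
        else:
            first_seen[face_id] = chng_type
    return pairs


# Hypothetical sample rows: (FACE_ID, CHNG_TYPE)
sample_rows = [(1, "ADD"), (2, "DELETE"), (1, "MODIFY"), (2, "ADD")]
print(find_duplicate_pairs(sample_rows))
```

Avoiding dict as a variable name also keeps the built-in dict type usable in the rest of the script.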

How do you iterate over a set or a list in Flask and PyMongo?

I have produced a set of matching IDs from a database collection that looks like this:
{ObjectId('5feafffbb4cf9e627842b1d9'), ObjectId('5feaffcfb4cf9e627842b1d8'), ObjectId('5feb247f1bb7a1297060342e')}
Each ObjectId represents an ID on a collection in the DB.
I got that list by doing this: (which incidentally I also think I am doing wrong, but I don't yet know another way)
# Find all question IDs
question_list = list(mongo.db.questions.find())
all_questions = []
for x in question_list:
    all_questions.append(x["_id"])
# Find all con IDs that match the question IDs
con_id = list(mongo.db.cons.find())
con_id_match = []
for y in con_id:
    con_id_match.append(y["question_id"])
matches = set(con_id_match).intersection(all_questions)
print("matches", matches)
print("all_questions", all_questions)
print("con_id_match", con_id_match)
That brings up all the IDs associated with a match, such as the three at the top of this post. I show what each print statement outputs at the bottom of this post.
Now I want to get each ObjectId separately as a variable so I can search for it in the collection:
mongo.db.cons.find_one({"con": matches})
where matches (it will probably need to be a new variable) is each matching ObjectId in turn.
So, how do I separate the ObjectIds in matches so that I get one at a time while iterating? I tried a for loop, but it threw an error, and I guess I am writing it wrong for a set. Thanks for the help.
Print Statements:
**matches** {ObjectId('5feafffbb4cf9e627842b1d9'), ObjectId('5feaffcfb4cf9e627842b1d8'), ObjectId('5feb247f1bb7a1297060342e')}
**all_questions** [ObjectId('5feafb52ae1b389f59423a91'), ObjectId('5feafb64ae1b389f59423a92'), ObjectId('5feaffcfb4cf9e627842b1d8'), ObjectId('5feafffbb4cf9e627842b1d9'), ObjectId('5feb247f1bb7a1297060342e'), ObjectId('6009b6e42b74a187c02ba9d7'), ObjectId('6010822e08050e32c64f2975'), ObjectId('601d125b3c4d9705f3a9720d')]
**con_id_match** [ObjectId('5feb247f1bb7a1297060342e'), ObjectId('5feafffbb4cf9e627842b1d9'), ObjectId('5feaffcfb4cf9e627842b1d8')]
Usually you can just use the find method, which yields documents one by one, and filter the documents in Python while iterating, like this:
# fetch only ids
question_ids = {question['_id'] for question in mongo.db.questions.find({}, {'_id': 1})}
matches = []
for con in mongo.db.cons.find():
    con_id = con['question_id']
    if con_id in question_ids:
        matches.append(con_id)
        # you can process matched and loaded con here
print(matches)
If you have a huge amount of data, take a look at the aggregation framework.
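As for the loop itself, a set is iterated with the same for syntax as a list; only the order is unspecified. The sketch below uses stand-in data so it runs without a database: plain strings play the role of the ObjectId values, a list of dicts plays the role of the cons collection, and find_one is a hypothetical helper. With PyMongo, the lookup inside the loop would presumably be mongo.db.cons.find_one({"question_id": question_id}).

```python
# Stand-in data: strings instead of ObjectId values, a list instead of a collection.
matches = {"id-1", "id-2"}
cons = [
    {"question_id": "id-1", "con": "too slow"},
    {"question_id": "id-2", "con": "too costly"},
    {"question_id": "id-3", "con": "unrelated"},
]


def find_one(collection, question_id):
    """Return the first document with the given question_id, or None."""
    for document in collection:
        if document["question_id"] == question_id:
            return document
    return None


# Each pass of the loop sees exactly one id from the set.
found = [find_one(cons, question_id) for question_id in matches]

# Sort only to make the printed order predictable.
for document in sorted(found, key=lambda d: d["question_id"]):
    print(document["question_id"], document["con"])
```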

Append class records to a list

I load rows of data into a class, one row at a time, in a loop. I'd like to append each record to a list.
import pandas as pd
class Biz:
    def __init__(self, dba_name, legal_name):
        self.dba_name = dba_name
        self.legal_name = legal_name
MasterList = []
# raw string, so the backslash in the Windows path is not read as an escape
File_From_Excel = pd.read_excel(r'C:\file.xlsx')
for index, row in File_From_Excel.iterrows():
    record = Biz(row['Field 1'], row['Field 2'])
    MasterList.append(record)
print(MasterList)
When I run code like this, I do not get an error, but I get output like this printed:
"[<__main__.Biz object at 0x0C11BFB0>, <__main__.Biz object at 0x00BDED50>]"
I'm a newbie and I haven't figured out how to overcome this one. Thank you!
You are printing a list of class instances, so the output shows each object's default representation (its class name and memory address). What you probably want instead is the attributes of these instances. It can be achieved as follows (for illustrative purposes):
# this goes at the end of your code, outside of the loop!
print([[record.dba_name, record.legal_name] for record in MasterList])
This approach is by no means optimal: it builds a second list in memory, which can become a problem if there are a lot of elements in MasterList. In that case you would want to use either a generator or a class iterator.
Edit: Come to think of it, there is no need for a generator here since a simple for loop can iterate over the list:
for record in MasterList:
    print([record.dba_name, record.legal_name], end=' ')
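Another option, not mentioned in the answer above, is to give the class a __repr__ method so that print(MasterList) is readable as it stands. This is a minimal sketch; the sample names are made up and the list is built by hand rather than read from Excel.

```python
class Biz:
    def __init__(self, dba_name, legal_name):
        self.dba_name = dba_name
        self.legal_name = legal_name

    def __repr__(self):
        # Used by print() for the instance itself and for instances inside a list.
        return f"Biz(dba_name={self.dba_name!r}, legal_name={self.legal_name!r})"


# Hypothetical records standing in for the rows read from the spreadsheet.
master_list = [Biz("Acme", "Acme Corp"), Biz("QuickFix", "QF Repairs Inc")]
print(master_list)
```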

2 questions on Python : created table & fail to find duplicates in rows

I have a data set in a CSV file in this format:
1st question: I am trying to find duplicate rows in the table I created in Python below. I tried using the set function on the rows, but the output says there are no duplicates, even though there is a duplicate row in the data set.
2nd question: Is it possible to reference this table? I realized it only becomes a table when I print it, and I would like to use it in the next step for calculations.
COL_1_WIDTH = 10
COL_2_WIDTH = 35
for row in data:
    IC1 = len(str(row[0]))
    IC2 = len(str(row[1]))
    print(str(row[0]) + str((COL_1_WIDTH-IC1) * ' ') +\
          str(row[1]) + str((COL_2_WIDTH-IC2) * ' ') +\
          str(row[2]))
for row in data:
    if len(set(row)) != len(row):
        print('duplicates: ', row)
    else:
        print('no duplicates:', row)
P.S. I am only permitted to use built-in functions and numpy.
Grateful for any ideas. Thank you!
You don't really explain what kind of object 'data' is, so I assumed it was a list of strings.
Here's how I created mine from a csv file:
with open('/home/sebastien/Documents/answerSO.csv') as file:
    data = file.read()  # a string
data = data.split('\n')  # a list of strings
data.pop()  # to delete the last element, an empty string
(note that using the csv module may be a better idea)
Now, to look for duplicates, I used the method explained here:
How do I find the duplicates in a list and create another list with them?
seen = set()
uniq = []
for row in data:
    if row not in seen:
        uniq.append(row)
        seen.add(row)
    else:
        print("found a duplicate:", row)
And about referencing it, well, it's in 'data'
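One point worth spelling out: len(set(row)) != len(row) in the question checks for repeated values inside a single row, not for repeated rows. If the rows are lists rather than strings (an assumption, since the question does not show how data is built), they have to be converted to tuples before they can go into a set. A small sketch with made-up sample rows:

```python
# Hypothetical rows; the third one repeats the first.
data = [
    ["A1", "apple", "3"],
    ["B2", "banana", "5"],
    ["A1", "apple", "3"],
]

seen = set()
duplicates = []
for row in data:
    key = tuple(row)          # lists are not hashable, tuples are
    if key in seen:
        duplicates.append(row)
    else:
        seen.add(key)

print(duplicates)
```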

Unreachable code block (in python) with html templater

I'm having trouble creating a fiddly html table in python 3.4. The templater is html 1.16. Here's a simplified version of the problem: I would like to traverse a list. For each list item, I would like to write the data to a html table. The table should be two columns wide.
from html import HTML
#create html object
h = HTML()
comments = ["blah1",
            "blah2",
            "blah3"
            ]
#create table object
c_table = h.table.tbody
for i, comment in enumerate(comments):
    #create row if we are at an odd index
    if i % 2 != 0:
        row = c_table.tr
        row.td(comment)
    else:
        #it is intended to add another <td> to the current row here
        #but because the row was declared in the if block, it is out of scope
        row.td(comment)
#write the html output now
print(h)
The difficulty is with the templater, specifically: accessing the row object for the second cell of the row without causing the </tr> closing tag. I have to create new cells through the row object, otherwise if I call c_table.tr.td it closes the row with </tr> and starts a new one.
Can anyone clever think of any code trickery that achieves what I'm trying to do in these circumstances?
Your comment is simply incorrect. Python does not have block scope, and the row that is defined in the if block is accessible in the else.
In fact, you can take the td out of the if block, and remove the else altogether.
The row object is not available in your else branch on the first pass: i starts at 0, so the else clause runs before any row has been created. Creating the row outside both clauses would not help you achieve your goal either.
Try dividing the list into "chunks": a list of lists with 2 objects each.
from html import HTML
h = HTML()
comments = ["blah1",
            "blah2",
            "blah3",
            "blah4",
            "blah5"
            ]
fixed_list = []
# range, not xrange, because the question targets Python 3
for i in range(0, len(comments), 2):
    fixed_list.append(comments[i:i+2])
Now fixed_list looks like this:
[["blah1", "blah2"], ["blah3", "blah4"], .....]
And now you can easily iterate over that list and create a row for each sub-list:
#create table object
body = h.body
tb = body.table
for comments_list in fixed_list:
    row = tb.tr
    for comment in comments_list:
        row.td(comment)
print(h)
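The chunking idea can also be tried without the templater. The sketch below builds the markup by hand, using only html.escape from the standard library (not the third-party html package from the question); chunk_pairs and render_table are hypothetical helper names.

```python
from html import escape  # standard-library module, not the "html" templater


def chunk_pairs(items, size=2):
    """Split items into consecutive sub-lists of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def render_table(items):
    """Render items as a two-column HTML table, one chunk per row."""
    rows = []
    for pair in chunk_pairs(items):
        cells = "".join(f"<td>{escape(item)}</td>" for item in pair)
        rows.append(f"<tr>{cells}</tr>")
    return "<table><tbody>" + "".join(rows) + "</tbody></table>"


comments = ["blah1", "blah2", "blah3"]
print(render_table(comments))
```

An odd number of items simply leaves the last row with a single cell.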
