use fitz merge span texts and coordinates into row


I'm trying to merge all the spans of a line into a single row using PyMuPDF:
import fitz  # PyMuPDF

with fitz.open("0003001v1.pdf") as doc:
    page = doc[0]
    # Renamed from `dict` so the builtin is not shadowed
    page_dict = page.get_text("dict")
    if "blocks" in page_dict:
        blocks = page_dict["blocks"]
        fixed_blocks = page_dict["blocks"]
        for block in blocks:
            print("--------------------------".strip())
            print("block: ", str(block["bbox"]).replace("(", "[").replace(")", "]"))
            print("")
            if "lines" in block.keys():
                lines = block["lines"]
                for line in lines:
                    if "spans" in line.keys():
                        spans = line["spans"]
                        for span in spans:
                            fixed_line_bbox = []
                            fixed_line_text = []
                            line_text = span["text"]
                            line_bbox = span["bbox"]
                            line_bbox_x_0 = line_bbox[0]
                            line_bbox_y_0 = line_bbox[1]
                            line_bbox_x_1 = line_bbox[2]
                            line_bbox_y_1 = line_bbox[3]
                            # One output row per span: bbox, a tab, then the span text
                            print("row:" + str(line_bbox).replace("(", "[").replace(")", "]") + "\t" + line_text)
The output is:
block: [71.99899291992188, 630.993408203125, 502.38116455078125, 700.308837890625]
row:[71.99905395507812, 630.993408203125, 502.36865234375, 642.9486083984375] and look for the explicit form of the function Φ from the experimental data on the
row:[71.99905395507812, 645.2735595703125, 107.62599182128906, 657.228759765625] system
row:[107.62599182128906, 645.2735595703125, 119.32400512695312, 657.228759765625] S
row:[120.1189956665039, 645.2735595703125, 502.3509826660156, 657.228759765625] . However, the function Φ may depend on time, it means that there are
row:[71.99899291992188, 659.673583984375, 344.1631774902344, 671.6287841796875] some hidden parameters, which control the system
row:[344.1631774902344, 659.673583984375, 356.683837890625, 671.6287841796875] S
row:[356.683837890625, 659.673583984375, 502.38116455078125, 671.6287841796875] and its evolution is of the
row:[71.99899291992188, 673.95361328125, 96.2470474243164, 685.9088134765625] form
row:[257.9989929199219, 688.3536376953125, 261.3225402832031, 700.308837890625] ˙
row:[254.6388397216797, 688.1612548828125, 262.4575500488281, 700.116455078125] ϕ
row:[262.4575500488281, 688.3536376953125, 291.689697265625, 700.308837890625] = Φ(
row:[291.71893310546875, 688.1612548828125, 311.758056640625, 700.116455078125] ϕ, u
row:[311.75872802734375, 688.3536376953125, 316.4093017578125, 700.308837890625] )
row:[316.4388122558594, 688.1612548828125, 319.7623596191406, 700.116455078125] ,
How can I merge the text and coordinates of all spans that belong to a single line, so that I get one bounding box and one text string per line?
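One possible approach, as a minimal sketch: take the union of the span bounding boxes of each line and concatenate the span texts. The helper names merge_line and merge_page_lines are made up for this illustration; they work on the plain dictionary returned by page.get_text("dict"), so the sketch can be tried without a PDF. The sample line below reuses three spans from the output above.

```python
def merge_line(line):
    """Merge all spans of one text line into a single (bbox, text) pair.

    The merged bbox is the union of the span bboxes (min of the top-left
    corners, max of the bottom-right corners).
    """
    spans = line.get("spans", [])
    if not spans:
        return None
    x0 = min(span["bbox"][0] for span in spans)
    y0 = min(span["bbox"][1] for span in spans)
    x1 = max(span["bbox"][2] for span in spans)
    y1 = max(span["bbox"][3] for span in spans)
    text = "".join(span["text"] for span in spans)
    return (x0, y0, x1, y1), text


def merge_page_lines(page_dict):
    """Return one (bbox, text) row per text line of a get_text("dict") result."""
    rows = []
    for block in page_dict.get("blocks", []):
        # Blocks without a "lines" key (for example images) are skipped
        for line in block.get("lines", []):
            merged = merge_line(line)
            if merged is not None:
                rows.append(merged)
    return rows


# Hypothetical sample built from three spans shown in the question's output
sample_page = {
    "blocks": [{
        "lines": [{
            "spans": [
                {"bbox": (71.99905395507812, 645.2735595703125,
                          107.62599182128906, 657.228759765625),
                 "text": "system "},
                {"bbox": (107.62599182128906, 645.2735595703125,
                          119.32400512695312, 657.228759765625),
                 "text": "S"},
                {"bbox": (120.1189956665039, 645.2735595703125,
                          502.3509826660156, 657.228759765625),
                 "text": ". However, the function"},
            ]
        }]
    }]
}

rows = merge_page_lines(sample_page)
for bbox, text in rows:
    print("row:" + str(list(bbox)) + "\t" + text)
```

With a real document the call would be merge_page_lines(page.get_text("dict")). Note that spans sitting on slightly different baselines, such as the pieces of the formula at the end of the output, may be reported as separate lines, in which case they would need an extra grouping step by vertical overlap.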

Related

How to fetch Total No of pages count in Python-docx?

I am learning python-docx and I am using this solution, given on Stack Overflow by Utkarsh Dalal, to add page numbers:
from docx import Document
from docx.enum.text import WD_PARAGRAPH_ALIGNMENT
from docx.oxml import OxmlElement, ns

def create_element(name):
    return OxmlElement(name)

def create_attribute(element, name, value):
    element.set(ns.qn(name), value)

def add_page_number(paragraph):
    paragraph.alignment = WD_PARAGRAPH_ALIGNMENT.CENTER
    # Literal text "Page "
    page_run = paragraph.add_run()
    t1 = create_element('w:t')
    create_attribute(t1, 'xml:space', 'preserve')
    t1.text = 'Page '
    page_run._r.append(t1)
    # PAGE field: current page number
    page_num_run = paragraph.add_run()
    fldChar1 = create_element('w:fldChar')
    create_attribute(fldChar1, 'w:fldCharType', 'begin')
    instrText = create_element('w:instrText')
    create_attribute(instrText, 'xml:space', 'preserve')
    instrText.text = "PAGE"
    fldChar2 = create_element('w:fldChar')
    create_attribute(fldChar2, 'w:fldCharType', 'end')
    page_num_run._r.append(fldChar1)
    page_num_run._r.append(instrText)
    page_num_run._r.append(fldChar2)
    # Literal text " of "
    of_run = paragraph.add_run()
    t2 = create_element('w:t')
    create_attribute(t2, 'xml:space', 'preserve')
    t2.text = ' of '
    of_run._r.append(t2)
    # NUMPAGES field: total number of pages
    fldChar3 = create_element('w:fldChar')
    create_attribute(fldChar3, 'w:fldCharType', 'begin')
    instrText2 = create_element('w:instrText')
    create_attribute(instrText2, 'xml:space', 'preserve')
    instrText2.text = "NUMPAGES"
    fldChar4 = create_element('w:fldChar')
    create_attribute(fldChar4, 'w:fldCharType', 'end')
    num_pages_run = paragraph.add_run()
    num_pages_run._r.append(fldChar3)
    num_pages_run._r.append(instrText2)
    num_pages_run._r.append(fldChar4)
    for pg in num_pages_run.element:
        print(pg.text)

doc = Document()
add_page_number(doc.sections[0].footer.paragraphs[0])
doc.save("your_doc.docx")
I get the page number in the format "Page x of y".
But when I try to read the value of y, the total number of pages, I cannot. I tried reading the text attribute of the children of num_pages_run as pg.text, but all I get is NUMPAGES instead of the number of pages.
I am looking for this feature because I would like to do some actions whenever a new page is added to the document.
Is there a way to get total pages from python-docx or any other alternative?
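A note on why pg.text prints NUMPAGES: that string is the field instruction stored in the XML, and the number shown in the footer is worked out by the word processor when it lays out the document. One possible alternative, sketched below under assumptions, is to read the page count that Word records in docProps/app.xml when it saves a file. The helper name stored_page_count is made up for this illustration, and the stored value may be missing or stale, especially for a file that has only ever been written by a script.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the extended-properties part (docProps/app.xml)
APP_NS = "{http://schemas.openxmlformats.org/officeDocument/2006/extended-properties}"


def stored_page_count(docx_file):
    """Return the page count recorded in docProps/app.xml, or None.

    This is only the number saved by the last application that paginated
    the document; it is not recomputed here.
    """
    with zipfile.ZipFile(docx_file) as archive:
        try:
            app_xml = archive.read("docProps/app.xml")
        except KeyError:
            return None  # the package has no extended-properties part
    pages = ET.fromstring(app_xml).find(APP_NS + "Pages")
    if pages is None or not (pages.text or "").strip().isdigit():
        return None
    return int(pages.text)


# Hypothetical in-memory package, standing in for a real .docx file
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "docProps/app.xml",
        '<Properties xmlns="http://schemas.openxmlformats.org/officeDocument/'
        '2006/extended-properties"><Pages>3</Pages></Properties>',
    )
buffer.seek(0)
print(stored_page_count(buffer))
```

For a file on disk the call would be stored_page_count("your_doc.docx"). To react whenever a new page is added, the document would have to be paginated by a word processor first, since nothing in this sketch performs layout.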

Depth first search algorithm skipping spaces in maze?

After concluding the first lecture of Harvard's AI course on edX, I have decided to implement the concepts taught, first being the depth-first search algorithm.
The objective of this program is to input a maze in text file mazefile and find a path from S to G using the depth-first search algorithm.
The project currently consists of 4 files, (1) the code with the class methods to operate or use the (2) text file which contains the maze, another text file (3) that contains the result file (where the AI has explored) and the main python script (4). Here they are, feel free to copy and paste these into a folder and to see how they run.
processText.py (file 1)
#code to process the mazefile file.
class importMaze:
    def __init__(self,maze):
        self.fileLines = []
        self.fileName = maze
        self.switch = False
        self.toBeReturned = []
    def processThis(self):
        f = open(self.fileName,"r")
        for x in f:
            self.fileLines.append(x[:-1])
        f.close()
        for i in self.fileLines:
            if self.switch == True:
                if str(i) == "END":
                    self.switch = False
                else:
                    self.toBeReturned.append(i)
            else:
                if str(i) == "START":
                    self.switch = True
        return self.toBeReturned
class mazePointer:
    def __init__(self,mazearray):
        self.Sample = mazearray
        self.initialPosition = []
        for y in range(0, len(self.Sample)):
            for x in range(0,len(self.Sample[y])):
                if str(self.Sample[y][x]) == "S":
                    self.initialPosition = [x,y]
        self.currentPosition = self.initialPosition
    def whatIs(self,xcoordinate,ycoordinate):
        return (self.Sample[ycoordinate])[xcoordinate]
    def nearbyFreeSpaces(self,search):
        self.freeSpaces = []
        if self.whatIs(self.currentPosition[0]-1,self.currentPosition[1]) == search:
            self.freeSpaces.append([self.currentPosition[0]-1,self.currentPosition[1]])
        if self.whatIs(self.currentPosition[0]+1,self.currentPosition[1]) == search:
            self.freeSpaces.append([self.currentPosition[0]+1,self.currentPosition[1]])
        if self.whatIs(self.currentPosition[0],self.currentPosition[1]-1) == search:
            self.freeSpaces.append([self.currentPosition[0],self.currentPosition[1]-1])
        if self.whatIs(self.currentPosition[1],self.currentPosition[1]+1) == search:
            self.freeSpaces.append([self.currentPosition[1],self.currentPosition[1]+1])
        return self.freeSpaces
    def moveTo(self,position):
        self.currentPosition = position
TestingTrack.py (the main file)
from processText import importMaze, mazePointer
testObject = importMaze("mazefile")
environment = testObject.processThis()
finger = mazePointer(environment)
frontier = []
explored = []
result = ""
def Search():
    global result
    if len(finger.nearbyFreeSpaces("G")) == 1: #If the goal is bordering this space
        result = finger.nearbyFreeSpaces("G")[0]
        explored.append(finger.currentPosition)
    else:
        newPlaces = finger.nearbyFreeSpaces("F") #finds the free spaces bordering
        for i in newPlaces:
            if i in explored: #Skips the ones already visited
                pass
            else:
                frontier.append(i)
while result == "":
    explored.append(finger.currentPosition)
    Search()
    finger.moveTo(frontier[-1])
    frontier.pop(-1)
exploredArray = []
for y in range(len(environment)): #Recreates the maze, fills in 'E' in where the AI has visited.
    holder = ""
    for x in range(len(environment[y])):
        if [x,y] in explored:
            holder+= "E"
        else:
            holder+= str(environment[y][x])
    exploredArray.append(holder)
def createResult(mazeList,title,append): #Creating the file
    file = open("resultfile",append)
    string = title + " \n F - Free \n O - Occupied \n S - Starting point \n G - Goal \n E - Explored/Visited \n (Abdulaziz Albastaki 2020) \n \n (top left coordinate - 0,0) \n "
    for i in exploredArray:
        string+= "\n" + str(i)
    string+= "\n \n Original problem \n"
    for i in environment:
        string+= "\n" +str(i)
    file.write(string)
    file.close()
def tracingPath():
    initialExplored = explored
    proceed = True
    newExplored = []
    for i in explored:
        finger.moveTo() #incomplete
print(explored)
createResult(exploredArray,"DEPTH FIRST SEARCH", "w")
mazefile (the program will read this file to get the maze)
F - Free
O - Occupied
S - Starting point
G - Goal
(Abdulaziz Albastaki 2020)
START
OOOOOOOOOOOOOOOO
OFFFFFFFFFFFFFGO
OFOOOOOOOOOOOOFO
OFOOOOOOOOOOOOFO
OFOOOOOOOOOOOOFO
OFOOOOOOOOOOOOFO
OSFFFFFFFFFFFFFO
OOOOOOOOOOOOOOOO
END
Made by Abdulaziz Albastaki in October 2020
You can change the maze and its size however it must
-Respect the key above
-Have ONE Starting point and goal
-The maze must be in between 'START' and 'END'
-The maze MUST be surrounded by occupied space
SAMPLE PROBLEMS:
OOOOOOOOOOOOOOOO
OFFFFFFFFFFFFFGO
OFOOOOOOOOOOOOFO
OFOOOOOOOOOOOOFO
OFOOOOOOOOOOOOFO
OFOOOOOOOOOOOOFO
OSFFFFFFFFFFFFFO
OOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOO
OFOFFFFFOOOFFFOOO
OFFFOOOFOOFFOOOFO
OFOOOOOFOOFOOOOFO
OSFGFFFFFFFFFFFFO
OOOOOOOOOOOOOOOOO
There is also a resultfile, however if you would just create an empty textfile with that name (no extension), the program will fill it in with results.
The problem is with the resultfile, here it is:
DEPTH FIRST SEARCH
F - Free
O - Occupied
S - Starting point
G - Goal
E - Explored/Visited
(Abdulaziz Albastaki 2020)
(top left coordinate - 0,0)
OOOOOOOOOOOOOOOO
OFFFFFFFFFFFFFGO
OFOOOOOOOOOOOOEO
OFOOOOOOOOOOOOEO
OFOOOOOOOOOOOOEO
OEOOOOOOOOOOOOEO
OEFFFEEEEEEEEEEO
OOOOOOOOOOOOOOOO
Original problem
OOOOOOOOOOOOOOOO
OFFFFFFFFFFFFFGO
OFOOOOOOOOOOOOFO
OFOOOOOOOOOOOOFO
OFOOOOOOOOOOOOFO
OFOOOOOOOOOOOOFO
OSFFFFFFFFFFFFFO
OOOOOOOOOOOOOOOO
The AI skipped a few spaces to get to the goal, why is it doing so?
Feel free to ask me for any clarifications.

There are the following issues:
The last if block in nearbyFreeSpaces uses the wrong index (the y coordinate is passed where the x coordinate belongs):
    if self.whatIs(self.currentPosition[1],self.currentPosition[1]+1) == search:
        self.freeSpaces.append([self.currentPosition[1],self.currentPosition[1]+1])
should be:
    if self.whatIs(self.currentPosition[0],self.currentPosition[1]+1) == search:
        self.freeSpaces.append([self.currentPosition[0],self.currentPosition[1]+1])
The final position is not correctly added to the path. The last line of this block:
    if len(finger.nearbyFreeSpaces("G")) == 1: #If the goal is bordering this space
        result = finger.nearbyFreeSpaces("G")[0]
        explored.append(finger.currentPosition)
...should be:
        explored.append(result)
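For comparison, here is a compact, self-contained sketch of depth-first search on the maze from the question. It is not a patch of the original classes: the names neighbours, find and depth_first_search are made up for this illustration, positions are (x, y) tuples, and the maze is embedded as a list of strings instead of being read from mazefile. Generating the four neighbours from a table of offsets avoids the kind of index slip described above.

```python
MAZE = [
    "OOOOOOOOOOOOOOOO",
    "OFFFFFFFFFFFFFGO",
    "OFOOOOOOOOOOOOFO",
    "OFOOOOOOOOOOOOFO",
    "OFOOOOOOOOOOOOFO",
    "OFOOOOOOOOOOOOFO",
    "OSFFFFFFFFFFFFFO",
    "OOOOOOOOOOOOOOOO",
]


def neighbours(maze, x, y):
    """Yield the orthogonal neighbours of (x, y) that are not occupied."""
    for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nx, ny = x + dx, y + dy
        inside = 0 <= ny < len(maze) and 0 <= nx < len(maze[ny])
        if inside and maze[ny][nx] != "O":
            yield nx, ny


def find(maze, symbol):
    """Return the (x, y) position of the first occurrence of symbol."""
    for y, row in enumerate(maze):
        x = row.find(symbol)
        if x != -1:
            return x, y
    raise ValueError("symbol %r not found in maze" % symbol)


def depth_first_search(maze):
    """Return (path, explored); path is None when the goal is unreachable."""
    start, goal = find(maze, "S"), find(maze, "G")
    frontier = [start]        # used as a stack: last in, first out
    parent = {start: None}    # also records which cells were already discovered
    explored = []
    while frontier:
        current = frontier.pop()
        explored.append(current)
        if current == goal:
            # Walk the parent links back to the start to rebuild the path
            path = []
            while current is not None:
                path.append(current)
                current = parent[current]
            return path[::-1], explored
        for nxt in neighbours(maze, *current):
            if nxt not in parent:
                parent[nxt] = current
                frontier.append(nxt)
    return None, explored


path, explored = depth_first_search(MAZE)
print(path)
```

Keeping a parent link for every discovered cell also gives the route itself, which is what the unfinished tracingPath function in the question appears to be aiming for.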

finding the same words in two files and leaving out not repeated ones in python

I have to write a program that correlates smoking with lung cancer risk. For that I have data in two files.
My code pairs the values that happen to be on the same line of each file (e.g. America,23.3 with Spain,77.9 and Italy,24.2 with Russia,60.8).
How can I modify my code so that it pairs the numbers of the same country and leaves out the countries that occur in only one file? (It shouldn't use Germany, France, China, Korea because they are in only one file.)
Thank you so much for your help in advance :)
smoking file:
Country, Percent Cigarette Smokers Data
America,23.3
Italy,24.2
Russia,23.7
France,14.9
England,17.9
Spain,17
Germany,21.7
second file:
Cases Lung Cancer per 100000
Spain,77.9
Russia,60.8
Korea,61.3
America,73.3
China,66.8
Vietnam,64.5
Italy,43.9
and my code:
import math

def readFiles(smoking_datafile, cancer_datafile):
    '''
    Reads the data from the provided file objects smoking_datafile
    and cancer_datafile. Returns a list of the data read from each
    in a tuple of the form (smoking_datafile, cancer_datafile).
    '''
    # init
    smoking_data = []
    cancer_data = []
    empty_str = ''
    # read past file headers
    smoking_datafile.readline()
    cancer_datafile.readline()
    # read data files
    eof = False
    while not eof:
        # read line of data from each file
        s_line = smoking_datafile.readline()
        c_line = cancer_datafile.readline()
        # check if at end-of-file of both files
        if s_line == empty_str and c_line == empty_str:
            eof = True
        # check if end of smoking data file only
        elif s_line == empty_str:
            raise OSError('Unexpected end-of-file for smoking data file')
        # check if at end of cancer data file only
        elif c_line == empty_str:
            raise OSError('Unexpected end-of-file for cancer data file')
        # append line of data to each list
        else:
            smoking_data.append(s_line.strip().split(','))
            cancer_data.append(c_line.strip().split(','))
    # return list of data from each file
    return (smoking_data, cancer_data)

def calculateCorrelation(smoking_data, cancer_data):
    '''
    Calculates and returns the correlation value for the data
    provided in lists smoking_data and cancer_data
    '''
    # init
    sum_smoking_vals = sum_cancer_vals = 0
    sum_smoking_sqrd = sum_cancer_sqrd = 0
    sum_products = 0
    # calculate intermediate correlation values
    num_values = len(smoking_data)
    for k in range(0,num_values):
        sum_smoking_vals = sum_smoking_vals + float(smoking_data[k][1])
        sum_cancer_vals = sum_cancer_vals + float(cancer_data[k][1])
        sum_smoking_sqrd = sum_smoking_sqrd + \
            float(smoking_data[k][1]) ** 2
        sum_cancer_sqrd = sum_cancer_sqrd + \
            float(cancer_data[k][1]) ** 2
        sum_products = sum_products + float(smoking_data[k][1]) * \
            float(cancer_data[k][1])
    # calculate and display correlation value
    numer = (num_values * sum_products) - \
        (sum_smoking_vals * sum_cancer_vals)
    denom = math.sqrt(abs( \
        ((num_values * sum_smoking_sqrd) - (sum_smoking_vals ** 2)) * \
        ((num_values * sum_cancer_sqrd) - (sum_cancer_vals ** 2)) \
        ))
    return numer / denom

Let's just focus on getting the data into a format that is easy to work with. The code below will get you a dictionary of the form ...
smokers_cancer_data = {
    'America': {
        'smokers': 23.3,
        'cancer': 73.3
    },
    'Italy': {
        'smokers': 24.2,
        'cancer': 43.9
    },
    ...
}
Once you have this, you can get any values you need and perform your calculations. See the code below.
def read_data(filename: str) -> dict:
    with open(filename, 'r') as file:
        next(file)  # Skip the header
        data = dict()
        for line in file:
            cleaned_line = line.rstrip()
            # Skip blank lines
            if cleaned_line:
                data_item = cleaned_line.split(',')
                # Country name -> numeric value
                data[data_item[0]] = float(data_item[1])
    return data

# Load data into python dictionaries
smokers_data = read_data('smokersData.txt')
cancer_data = read_data('lungCancerData.txt')

# Build one dictionary that is easy to work with,
# keeping only the countries present in both files
smokers_cancer_data = dict()
for (key, value) in smokers_data.items():
    if key in cancer_data:
        smokers_cancer_data[key] = {
            'smokers': smokers_data[key],
            'cancer' : cancer_data[key]
        }
print(smokers_cancer_data)
For example, if you want to calculate the sum of the smoker and cancer values:
smokers_total = 0
cancer_total = 0
for (key, value) in smokers_cancer_data.items():
    smokers_total += value['smokers']
    cancer_total += value['cancer']
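To connect this back to the correlation in the question, here is a minimal sketch that keeps only the countries present in both data sets and then computes a Pearson correlation coefficient on the paired values. The two dictionaries are typed in from the files shown in the question instead of being read from disk, and the helper name pearson is made up for this illustration.

```python
import math

# Values copied from the two files in the question
smokers = {"America": 23.3, "Italy": 24.2, "Russia": 23.7, "France": 14.9,
           "England": 17.9, "Spain": 17.0, "Germany": 21.7}
cancer = {"Spain": 77.9, "Russia": 60.8, "Korea": 61.3, "America": 73.3,
          "China": 66.8, "Vietnam": 64.5, "Italy": 43.9}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Set intersection of the keys: countries that occur in both files
common = sorted(smokers.keys() & cancer.keys())
xs = [smokers[country] for country in common]
ys = [cancer[country] for country in common]

print(common)
print(pearson(xs, ys))
```

Because the pairing is done by country name, the order of the lines in the two files no longer matters.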

This will build a list of all the countries that have data in both files, along with the data:
l3 = []
with open('smoking.txt','r') as f1, open('cancer.txt','r') as f2:
    l1, l2 = f1.readlines(), f2.readlines()
    for s1 in l1:
        for s2 in l2:
            # Same country name in both files
            if s1.split(',')[0] == s2.split(',')[0]:
                cty = s1.split(',')[0]
                smk = s1.split(',')[1].strip()
                cnr = s2.split(',')[1].strip()
                l3.append(f"{cty}: smoking: {smk}, cancer: {cnr}")
print(l3)
Output:
['America: smoking: 23.3, cancer: 73.3', 'Italy: smoking: 24.2, cancer: 43.9', 'Russia: smoking: 23.7, cancer: 60.8', 'Spain: smoking: 17, cancer: 77.9']

Parse xml w/ xsd to CSV with Python?

I am trying to parse a very large XML file which I downloaded from OSHA's website and convert it into a CSV so I can use it in a SQLite database along with some other spreadsheets. I would just use an online converter, but the OSHA file is apparently too big for all of them.
I wrote a script in Python which looks like this:
import csv
import xml.etree.cElementTree as ET
tree = ET.parse('data.xml')
root = tree.getroot()
xml_data_to_csv =open('Out.csv', 'w')
list_head=[]
Csv_writer=csv.writer(xml_data_to_csv)
count=0
for element in root.findall('data'):
    List_nodes =[]
    if count== 0:
        inspection_number = element.find('inspection_number').tag
        list_head.append(inspection_number)
        establishment_name = element.find('establishment_name').tag
        list_head.append(establishment_name)
        city = element.find('city')
        list_head.append(city)
        state = element.find('state')
        list_head.append(state)
        zip_code = element.find('zip_code')
        list_head.append(zip_code)
        sic_code = element.find('sic_code')
        list_head.append(sic_code)
        naics_code = element.find('naics_code')
        list_head.append(naics_code)
        sampling_number = element.find('sampling_number')
        list_head.append(sampling_number)
        office_id = element.find('office_id')
        list_head.append(office_id)
        date_sampled = element.find('date_sampled')
        list_head.append(date_sampled)
        date_reported = element.find('date_reported')
        list_head.append(date_reported)
        eight_hour_twa_calc = element.find('eight_hour_twa_calc')
        list_head.append(eight_hour_twa_calc)
        instrument_type = element.find('instrument_type')
        list_head.append(instrument_type)
        lab_number = element.find('lab_number')
        list_head.append(lab_number)
        field_number = element.find('field_number')
        list_head.append(field_number)
        sample_type = element.find('sample_type')
        list_head.append(sample_type)
        blank_used = element.find('blank_used')
        list_head.append(blank_used)
        time_sampled = element.find('time_sampled')
        list_head.append(time_sampled)
        air_volume_sampled = element.find('air_volume_sampled')
        list_head.append(air_volume_sampled)
        sample_weight = element.find('sample_weight')
        list_head.append(sample_weight)
        imis_substance_code = element.find('imis_substance_code')
        list_head.append(imis_substance_code)
        substance = element.find('substance')
        list_head.append(substance)
        sample_result = element.find('sample_result')
        list_head.append(sample_result)
        unit_of_measurement = element.find('unit_of_measurement')
        list_head.append(unit_of_measurement)
        qualifier = element.find('qualifier')
        list_head.append(qualifier)
        Csv_writer.writerow(list_head)
        count = +1
    inspection_number = element.find('inspection_number').text
    List_nodes.append(inspection_number)
    establishment_name = element.find('establishment_name').text
    List_nodes.append(establishment_name)
    city = element.find('city').text
    List_nodes.append(city)
    state = element.find('state').text
    List_nodes.append(state)
    zip_code = element.find('zip_code').text
    List_nodes.append(zip_code)
    sic_code = element.find('sic_code').text
    List_nodes.append(sic_code)
    naics_code = element.find('naics_code').text
    List_nodes.append(naics_code)
    sampling_number = element.find('sampling_number').text
    List_nodes.append(sampling_number)
    office_id = element.find('office_id').text
    List_nodes.append(office_id)
    date_sampled = element.find('date_sampled').text
    List_nodes.append(date_sampled)
    date_reported = element.find('date_reported').text
    List_nodes.append(date_reported)
    eight_hour_twa_calc = element.find('eight_hour_twa_calc').text
    List_nodes.append(eight_hour_twa_calc)
    instrument_type = element.find('instrument_type').text
    List_nodes.append(instrument_type)
    lab_number = element.find('lab_number').text
    List_nodes.append(lab_number)
    field_number = element.find('field_number').text
    List_nodes.append(field_number)
    sample_type = element.find('sample_type').text
    List_nodes.append(sample_type)
    blank_used = element.find('blank_used').text
    List_nodes.append()
    time_sampled = element.find('time_sampled').text
    List_nodes.append(time_sampled)
    air_volume_sampled = element.find('air_volume_sampled').text
    List_nodes.append(air_volume_sampled)
    sample_weight = element.find('sample_weight').text
    List_nodes.append(sample_weight)
    imis_substance_code = element.find('imis_substance_code').text
    List_nodes.append(imis_substance_code)
    substance = element.find('substance').text
    List_nodes.append(substance)
    sample_result = element.find('sample_result').text
    List_nodes.append(sample_result)
    unit_of_measurement = element.find('unit_of_measurement').text
    List_nodes.append(unit_of_measurement)
    qualifier= element.find('qualifier').text
    List_nodes.append(qualifier)
    Csv_writer.writerow(List_nodes)
xml_data_to_csv.close()
But when I run the code I get a CSV with nothing in it. I suspect this may have something to do with the XSD file associated with the XML, but I'm not totally sure.
Does anyone know what the issue is here?

The code below is a 'compact' version of your code.
It assumes that the XML is structured like the sample in the script variable xml, where each record is a DATA_RECORD element. (Based on https://www.osha.gov/opengov/sample_data_2011.zip)
The main difference between this sample code and yours is that I define the fields that I want to collect once (see FIELDS) and I use this definition across the script.
import xml.etree.ElementTree as ET
FIELDS = ['lab_number', 'instrument_type'] # TODO add more fields
xml = '''<main xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="health_sample_data.xsd">
<DATA_RECORD>
<inspection_number>316180165</inspection_number>
<establishment_name>PROFESSIONAL ENGINEERING SERVICES, LLC.</establishment_name>
<city>EUFAULA</city>
<state>AL</state>
<zip_code>36027</zip_code>
<sic_code>1799</sic_code>
<naics_code>238990</naics_code>
<sampling_number>434866166</sampling_number>
<office_id>418600</office_id>
<date_sampled>2011-12-30</date_sampled>
<date_reported>2011-12-30</date_reported>
<eight_hour_twa_calc>N</eight_hour_twa_calc>
<instrument_type>TBD</instrument_type>
<lab_number>L13645</lab_number>
<field_number>S1</field_number>
<sample_type>B</sample_type>
<blank_used>N</blank_used>
<time_sampled></time_sampled>
<air_volume_sampled></air_volume_sampled>
<sample_weight></sample_weight>
<imis_substance_code>S777</imis_substance_code>
<substance>Soil</substance>
<sample_result>0</sample_result>
<unit_of_measurement>AAAAA</unit_of_measurement>
<qualifier></qualifier>
</DATA_RECORD>
<DATA_RECORD>
<inspection_number>315516757</inspection_number>
<establishment_name>MARGUERITE CONCRETE CO.</establishment_name>
<city>WORCESTER</city>
<state>MA</state>
<zip_code>1608</zip_code>
<sic_code>1771</sic_code>
<naics_code>238110</naics_code>
<sampling_number>423259902</sampling_number>
<office_id>112600</office_id>
<date_sampled>2011-12-30</date_sampled>
<date_reported>2011-12-30</date_reported>
<eight_hour_twa_calc>N</eight_hour_twa_calc>
<instrument_type>GRAV</instrument_type>
<lab_number>L13355</lab_number>
<field_number>9831B</field_number>
<sample_type>P</sample_type>
<blank_used>N</blank_used>
<time_sampled>184</time_sampled>
<air_volume_sampled>340.4</air_volume_sampled>
<sample_weight>.06</sample_weight>
<imis_substance_code>9135</imis_substance_code>
<substance>Particulates not otherwise regulated (Total Dust)</substance>
<sample_result>0.176</sample_result>
<unit_of_measurement>M</unit_of_measurement>
<qualifier></qualifier>
</DATA_RECORD></main>'''
root = ET.fromstring(xml)
records = root.findall('.//DATA_RECORD')
with open('out.csv', 'w') as out:
    out.write(','.join(FIELDS) + '\n')
    for record in records:
        # Empty elements have .text == None, so fall back to an empty string
        values = [record.find(f).text or '' for f in FIELDS]
        out.write(','.join(values) + '\n')
out.csv
lab_number,instrument_type
L13645,TBD
L13355,GRAV
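Since the question mentions a very large file, a streaming variant may be worth considering. The sketch below is one way to do it, assuming the same DATA_RECORD layout as the sample above: it reads the XML incrementally with ET.iterparse and writes rows with csv.writer, which also takes care of quoting values that contain commas (such as the establishment name in the first record). The function name xml_to_csv and the shortened in-memory sample are made up for this illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

FIELDS = ["lab_number", "instrument_type", "time_sampled"]  # extend as needed


def xml_to_csv(source, out, fields=FIELDS, record_tag="DATA_RECORD"):
    """Stream records from source (path or binary file) into CSV file out.

    Returns the number of records written.
    """
    writer = csv.writer(out)
    writer.writerow(fields)
    count = 0
    # "end" events fire once an element, including its children, is complete
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != record_tag:
            continue
        # findtext returns None for a missing child; use "" in that case
        writer.writerow([(elem.findtext(name) or "") for name in fields])
        elem.clear()  # drop the children so finished records stay small
        count += 1
    return count


# Shortened, hypothetical sample in the same shape as the data above
sample = b"""<main>
<DATA_RECORD>
<instrument_type>TBD</instrument_type>
<lab_number>L13645</lab_number>
<time_sampled></time_sampled>
</DATA_RECORD>
<DATA_RECORD>
<instrument_type>GRAV</instrument_type>
<lab_number>L13355</lab_number>
<time_sampled>184</time_sampled>
</DATA_RECORD>
</main>"""

out = io.StringIO()
written = xml_to_csv(io.BytesIO(sample), out)
print(out.getvalue())
```

With real files, the output would typically be opened as open('out.csv', 'w', newline='') and the XML passed by file name, for example xml_to_csv('data.xml', out_file).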

calculating the area of an irregular shape from coordinates in a csv file using python

I am using Python to import a CSV file with coordinates in it, passing it to a list, and using the data to calculate the area of each irregular figure. The data within the CSV file looks like this:
ID Name DE1 DN1 DE2 DN2 DE3 DN3
88637 Zack Fay -0.026841782 -0.071375637 0.160878583 -0.231788845 0.191811833 0.396593863
88687 Victory Greenfelder 0.219394372 -0.081932907 0.053054879 -0.048356016
88737 Lynnette Gorczany 0.043632299 0.118916157 0.005488698 -0.268612073
88787 Odelia Tremblay PhD 0.083147337 0.152277791 -0.039216388 0.469656787 -0.21725977 0.073797219
The code I am using is below. However, it raises an IndexError because the rows don't have data in all columns. Is there a way to write the code so that it only uses the columns with data in them?
import csv
import math
def main():
    try:
        # ask user to open a file with coordinates for 4 points
        my_file = raw_input('Enter the Irregular Differences file name and location: ')
        file_list = []
        with open(my_file, 'r') as my_csv_file:
            reader = csv.reader(my_csv_file)
            print 'my_csv_file: ', (my_csv_file)
            reader.next()
            for row in reader:
                print row
                file_list.append(row)
        all = calculate(file_list)
        save_write_file(all)
    except IOError:
        print 'File reading error, Goodbye!'
    except IndexError:
        print 'Index Error, Check Data'
# now do your calculations on the 'data' in the file.
def calculate(my_file):
    return_list = []
    for row in my_file:
        de1 = float(row[2])
        dn1 = float(row[3])
        de2 = float(row[4])
        dn2 = float(row[5])
        de3 = float(row[6])
        dn3 = float(row[7])
        de4 = float(row[8])
        dn4 = float(row[9])
        de5 = float(row[10])
        dn5 = float(row[11])
        de6 = float(row[12])
        dn6 = float(row[13])
        de7 = float(row[14])
        dn7 = float(row[15])
        de8 = float(row[16])
        dn8 = float(row[17])
        de9 = float(row[18])
        dn9 = float(row[19])
        area_squared = abs((dn1 * de2) - (dn2 * de1)) + ((de3 * dn4) - (dn3 * de4)) + ((de5 * dn6) - (de6 * dn5)) + ((de7 * dn8) - (dn7 * de8)) + ((dn9 * de1) - (de9 * dn1))
        area = area_squared / 2
        row.append(area)
        return_list.append(row)
    return return_list
def save_write_file(all):
    with open('output_task4B.csv', 'w') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(["ID", "Name", "de1", "dn1", "de2", "dn2", "de3", "dn3", "de4", "dn4", "de5", "dn5", "de6", "dn6", "de7", "dn7", "de8", "dn8", "de9", "dn9", "Area"])
        writer.writerows(all)
if __name__ == '__main__':
    main()
Any suggestions?

Your problem appears to be in the calculate function.
You are trying to access various indexes of row without first confirming they exist. One naive approach might be to consider the values to be zero if they are not present, except that:
    + ((dn9 * de1) - (de9 * dn1))
is an attempt to wrap around, and this might invalidate your math since they would go to zero.
A better approach is probably to use a slice of the row, and use the sequence-iterating approach instead of trying to require a certain number of points. This lets your code fit the data.
# Skip id and name, drop empty cells, and convert the CSV strings to floats
coords = [float(v) for v in row[2:] if v.strip()]
assert len(coords) % 2 == 0, "Coordinates must come in pairs!"
# Start from the last point so the polygon wraps around
prev_de = coords[-2]
prev_dn = coords[-1]
area_squared = 0.0
for de, dn in zip(coords[:-1:2], coords[1::2]):
    area_squared += (de * prev_dn) - (dn * prev_de)
    prev_de, prev_dn = de, dn
area = abs(area_squared) / 2
The next problem will be dealing with variable length output. I'd suggest putting the area before the coordinates. That way you know it's always column 3 (or whatever).
