Read dbf with arcpy in PyCharm?

I have exported an ArcGIS Desktop 10.7 table into a dbf file.
Now I want to do some GIS calculations in standalone Python.
I have therefore started a PyCharm project referencing the ArcGIS Python interpreter, and so I am able to import arcpy into my main.py.
The problem is: I don't want to pip install other modules, but I don't know how to correctly read the dbf table with arcpy.
# encoding=utf-8
import arcpy

path = r"D:\test.dbf"
sc = arcpy.SearchCursor(path)       # Does not work: IOError exception str() failed
tv = arcpy.mapping.TableView(path)  # Does not work either: StandaloneObject invalid data source or table
The dbf file itself is fine; it can be read into ArcGIS.
Can someone please give me an idea how to read the file with arcpy in a standalone script?

Using pandas
The Python installation that ships with ArcMap comes with some modules. You can load the data into a pandas.DataFrame and work with that format. pandas is well documented and there are many previously asked questions about it all over the web. It also makes groupby and other table manipulations super easy.
import types

import arcpy
import pandas as pd


def read_arcpy_table(self, table, fields='*', null_value=None):
    """
    Transform a table from ArcMap into a pandas.DataFrame object.

    table      : path to the table
    fields     : fields to load - '*' loads all fields
    null_value : value used to replace null values
    """
    # 'self' will be the pandas module once the function is bound below.
    fields_type = {f.name: f.type for f in arcpy.ListFields(table)}
    if fields == '*':
        fields = fields_type.keys()
    fields = [f.name for f in arcpy.ListFields(table) if f.name in fields]
    # Remove the Geometry field if the input is a FeatureClass, to avoid a bug
    fields = [f for f in fields if f in fields_type and fields_type[f] != 'Geometry']

    # Transform into a pandas.DataFrame
    np_array = arcpy.da.FeatureClassToNumPyArray(in_table=table,
                                                 field_names=fields,
                                                 skip_nulls=False,
                                                 null_value=null_value)
    df = self.DataFrame(np_array)
    return df

# Add the function to the loaded pandas module
pd.read_arcpy_table = types.MethodType(read_arcpy_table, pd)

df = pd.read_arcpy_table(table='path_to_your_table')
# Do whatever calculations need to be done
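For instance, the groupby mentioned above then becomes a one-liner. A minimal sketch, where LANDUSE and AREA are hypothetical field names standing in for your own:
# LANDUSE and AREA are hypothetical field names for illustration.
summary = df.groupby('LANDUSE')['AREA'].sum()
print(summary)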
Using cursor
You can also use arcpy cursors and dicts for simple calculations.
There is a simple example on this page showing how to use cursors correctly:
https://desktop.arcgis.com/fr/arcmap/10.3/analyze/arcpy-data-access/searchcursor-class.htm
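For example, a minimal sketch using the newer arcpy.da.SearchCursor; the path and the AREA field name are assumptions for illustration:
import arcpy

# Hypothetical dbf path and field name; adjust to your data.
total = 0
with arcpy.da.SearchCursor(r"D:\test.dbf", ["AREA"]) as cursor:
    for row in cursor:
        total += row[0]
print(total)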

My bad,
after reading the Using cursor approach, I figured out that the
sc = arcpy.SearchCursor(path)  # Does not work: IOError exception str() failed
approach was correct all along, but at around 3 AM I was a little exhausted and missed the typo in the path that caused the error. Nevertheless, a more descriptive error message, e.g. IOError could not open file rather than IOError exception str() failed, would have caught my mistake as an ArcGIS newbie.. :/

Related

TFX. Properties for CsvCoder in CsvExampleGen: 'Columns do not match specified csv headers'

I am working with TensorFlow Extended and am stuck loading a .csv file.
The file uses ; as a separator and can't be read by the default TFX generator CsvExampleGen(). It throws the following error: ValueError: Columns do not match specified csv headers
I found that this problem is related to inner dependencies such as tft.coders.CsvCoder(), which requires non-default parameters to parse the .csv file.
The question is the following:
How do I pass parameters to tft.coders.CsvCoder() from tfx.components.CsvExampleGen?
from tfx.components import CsvExampleGen
from tfx.utils.dsl_utils import external_input
data_path = './data'
intro_component = CsvExampleGen(input=external_input(data_path))
...
From the comments:
The current solution is to transform the data file with pandas:
df = pd.read_csv(_file_path, sep=';')
df.to_csv(_file_path)
(paraphrased from Oleks).
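One caveat worth adding (my note, not from the original comments): pandas writes the row index as an extra leading column by default, which can itself cause a header mismatch. Passing index=False avoids that:
df = pd.read_csv(_file_path, sep=';')
df.to_csv(_file_path, index=False)  # don't write the index as an extra column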

Python 3 openpyxl UserWarning: Data Validation extension not supported

So this is my first time attempting to read from an Excel file, and I'm trying to do so with the openpyxl module. My aim is to collate a dictionary with a nested list as its value. However, I get this warning when I try to run it:
UserWarning: Data Validation extension is not supported and will be removed
warn(msg)
I don't know where I'm going wrong. Any help would be much appreciated. Thanks
import openpyxl

try:
    wb = openpyxl.load_workbook("Grantfundme Master London.xlsx")
except FileNotFoundError:
    print("File could not be found.")

sheet = wb["FUNDS"]
database = {}
for i in range(250):  # this is the number of keys I want in my dictionary, so loop through rows
    charity = sheet.cell(row=i + 1, column=1).value
    area_of_work = []
    org = []
    funding = sheet.cell(row=i + 1, column=14).value
    for x in range(8, 13):  # this loops through columns with info I need
        if sheet.cell(row=i + 1, column=x).value != "":
            area_of_work.append(sheet.cell(row=i + 1, column=x).value)
    for y in range(3, 6):  # another column loop
        if sheet.cell(row=i + 1, column=y).value != "":
            org.append(sheet.cell(row=i + 1, column=y).value)
    database[charity] = [area_of_work, org, funding]

try:
    f = open("database.txt", "w")
    f.close()
except IOError:
    print("Ooops. It hasn't written to the file")
For those asking, here is a screenshot of the exception: [screenshot]
Excel has a feature called Data Validation (in the Data Tools section of the Data tab in my version) where you can pick from a list of rules to limit the type of data that can be entered in a cell. This is sometimes used to create dropdown lists in Excel. This warning is telling you that this feature is not supported by openpyxl, and those rules will not be enforced. If you want the warning to go away, you can click on the Data Validation icon in Excel, then click the Clear All button to remove all data validation rules and save your workbook.
Sometimes simply clearing the Data Validation rules in the Workbook is not a viable solution - perhaps other users rely on the rules, or maybe they are locked for editing, etc.
The warning can be ignored using a simple filter, and the workbook can remain untouched:
import warnings
warnings.simplefilter(action='ignore', category=UserWarning)
In practice, this might look like:
import warnings

import pandas as pd

def load_data(path: str):
    """Load data from an Excel file."""
    warnings.simplefilter(action='ignore', category=UserWarning)
    return pd.read_excel(path)
Note: just remember to reset the warnings filter afterwards, else all other UserWarnings will be ignored as well.
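If you only want to silence the warning for the duration of the load, warnings.catch_warnings gives a scoped alternative (a sketch along the lines of the function above):
import warnings

import pandas as pd

def load_data(path: str):
    """Load data from an Excel file, suppressing UserWarnings only inside this call."""
    with warnings.catch_warnings():
        # The filter change is undone automatically when the block exits.
        warnings.simplefilter(action='ignore', category=UserWarning)
        return pd.read_excel(path)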
Thanks for the screenshot! Without seeing the actual Excel workbook, it's hard to say exactly what it is complaining about.
Notice that the screenshot references line 322 of the reader's worksheet module. It is telling you that the Data Validation extension to the OOXML standard is not supported by the openpyxl library. It appears to have found parts of the Data Validation extension in your workbook, and those will be lost when the workbook is parsed with openpyxl.

Incorporating attribute table into python

I am working in Python with ArcMap and have a question. Is there a way to import the data from an attribute table into Python, and if so, how do you select which attributes to print?
Thank you
A clean-looking way of presenting an attribute table in Python is through a pandas DataFrame, especially if you are working with shapefiles. The input must be a dbf file.
# Pandas Option, python 2.7
import arcpy, os, pandas
inTable = r'.dbf'
os.chdir(os.path.dirname(inTable))
outTable = 'table.xls'
arcpy.TableToExcel_conversion(inTable, outTable)
df = pandas.read_excel(outTable)
df.head() # Shows first five records of attribute table
Using SearchCursors and UpdateCursors is also an option if you want to update fields and work directly with the shapefile.
# SearchCursor option, python 2.7
import arcpy, os
shapefile = r'.shp'
cursor = arcpy.da.SearchCursor(shapefile, '*')
attributes = []
for row in cursor:
    attributes.append(row)
If you want to find a specific record by a certain field value, for example ID 50:
for record in attributes:
    if record[0] == 50:
        print record
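Alternatively, the filter can be pushed into the cursor itself with a where clause. A sketch, assuming the shapefile has a numeric field named ID:
# "ID" is a hypothetical field name; substitute your own.
with arcpy.da.SearchCursor(shapefile, '*', where_clause='"ID" = 50') as cursor:
    for row in cursor:
        print row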

Pyshp shapefile reader not working

import shapefile
r = shapefile.Reader(r"C:\Users\Me\Desktop\py\mis.dbf")
That is as far as I get; it must be something simple I don't know about. I have already spent an embarrassing amount of time on this little thing. Could one of you more knowledgeable ones tell me what I missed?
It looks like you're good to go, unless you're getting an error that you didn't mention.
First of all, you're looking at the dbf file, which contains the shapefile attributes (similar to a spreadsheet). But that doesn't matter, because the Reader ignores extensions and will also try to find the .shp and .shx files, which contain the geometry and the geometry record index.
If you're just interested in the attributes, try the following after your example above:
# Print the dbf field names
print [f[0] for f in r.fields]
# Print the first record:
print r.record(0)
# Loop through all the records using an iterator:
for rec in r.iterRecords(): print rec
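If it helps, the field names and records can also be zipped into dict-like rows (a sketch; r.fields[0] is the DeletionFlag entry pyshp prepends, hence the slice):
# Pair each record's values with the dbf field names.
field_names = [f[0] for f in r.fields[1:]]  # skip the DeletionFlag entry
for rec in r.iterRecords():
    print dict(zip(field_names, rec))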

Reading DBF files with pyodbc

In a project, I need to extract data from a Visual FoxPro database, which is stored in dbf files, and I have a data directory with 539 files to take into account; each file represents a database table. I've been doing some testing and my code goes like this:
import pyodbc
connection = pyodbc.connect("Driver={Microsoft Visual FoxPro Driver};SourceType=DBF;SourceDB=P:\\Data;Exclusive=No;Collate=Machine;NULL=No;DELETED=Yes")
tables = connection.cursor().tables()
for _ in tables:
    print _
This prints only 15 tables, with no obvious pattern, and always the same 15 tables. I thought this was because the rest of the tables were empty, but I checked and some of the tables (dbf files) on the list are empty too. Then I thought it was a permission issue, but all the files have the same permission structure, so I don't know what's happening here.
Any light??
EDIT:
It is not truncating the output; the tables it lists are not the first 15 or anything like that.
I DID IT!!!!
There were several problems with what I was doing, so here is what I did to solve it (after implementing it the first time with Ethan Furman's solution).
The first thing was a driver problem: it turns out that the Windows DBF drivers are 32-bit programs and I was running a 64-bit operating system. I had installed Python-amd64, and that was the first problem, so I installed a 32-bit Python.
The second issue was a library/file issue: according to this, dbf files in VFP > 7 are different, so my pyodbc library won't read them correctly. I tried some OLE-DB libraries with no success and decided to do it from scratch.
Googling for a while took me to this post, which finally shed some light on it.
Basically, what I did was the following:
import win32com.client

conn = win32com.client.Dispatch('ADODB.Connection')
db = 'C:\\Profit\\profit_a\\ARMM'
dsn = 'Provider=VFPOLEDB.1;Data Source=%s' % db
conn.Open(dsn)

cmd = win32com.client.Dispatch('ADODB.Command')
cmd.ActiveConnection = conn
cmd.CommandText = "Select * from factura, reng_fac where factura.fact_num = reng_fac.fact_num AND factura.fact_num = 6099;"

rs, total = cmd.Execute()  # This returns a tuple: (<RecordSet>, number_of_records)
while total:
    for x in xrange(rs.Fields.Count):
        print '%s --> %s' % (rs.Fields.item(x).Name, rs.Fields.item(x).Value)
    rs.MoveNext()  # the original post had an extra indent here
    total = total - 1
And it gave me 20 records, which I checked with DBFCommander and which were OK.
First, you need to install the pywin32 extensions (32-bit) and the Visual FoxPro OLE-DB Provider (only available in 32-bit), in my case for VFP 9.0.
It's also good to read the ADO documentation at the w3c website.
This worked for me. Thank you very much to those who replied.
I would use my own dbf package and the code would go something like this:
import dbf
from glob import glob

for dbf_file in glob(r'p:\data\*.dbf'):
    with dbf.Table(dbf_file) as table:
        for record in table:
            do_something_with(record)
A table is list-like, and iterating through it returns records. A record is list-, dict-, and obj-like, and iterating through it returns its values; besides iteration, individual fields can be accessed by offset (record[0] for the first field), by field name using dict-like access (record['some_field']), or by field name using obj.attr-like access (record.some_field).
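Concretely, the three access styles look like this (a sketch reusing the loop above; some_field is a stand-in for one of your field names):
for record in table:
    print record[0]             # by offset
    print record['some_field']  # dict-like, by field name
    print record.some_field     # obj-like, by attribute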
If you just wanted to dump the contents of each dbf file into a csv file you could do:
for dbf_file in glob(r'p:\data\*.dbf'):
    with dbf.Table(dbf_file) as table:
        dbf.export(table, dbf_file)
I know this doesn't directly answer your question, but it might still help. I've had lots of issues using ODBC with VFP databases, and I've found it's often much easier to treat the VFP tables as free tables when possible.
Using Yusdi Santoso's dbf.py and glob, here's some code to open each table in a directory and run through each record.
import glob
import os

import dbf

os.chdir("P:\\data")
for file in glob.glob("*.dbf"):
    table = dbf.readDbf(file)
    for row in table:
        pass  # do stuff with row
