Incorporating attribute table into python - python

I am working in python with ArcMap & had a question. Is there a way to import the data from an attribute table into python, and if you can how do you select which attributes to print?
Thank You

A clean looking way of presenting an attribute table in python can be executed through a pandas dataframe...especially if you are working with shapefiles. The input must be a dbf file.
# Pandas Option, python 2.7
import arcpy, os, pandas
inTable = r'.dbf'
os.chdir(os.path.dirname(inTable))
outTable = 'table.xls'
arcpy.TableToExcel_conversion(inTable, outTable)
df = pandas.read_excel(outTable)
df.head() # Shows first five records of attribute table
Using SearchCursors and UpdateCursors is also an option if you want to update fields and work directly with the shapefile.
# SearchCursor option, python 2.7
import arcpy, os
shapefile = r'.shp'
cursor = arcpy.da.SearchCursor(shapefile, '*')
attributes = []
for row in cursor:
attributes.append(row)
If you want to find a specific record by a certain field name...for example ID 50...
for record in attributes:
if record[0] == 50:
print record

Related

Read dbf with arcpy in PyCharm?

I have exported an ArcGIS Desktop 10.7 table into a dbf file.
Now I want to do some GIS calculation in standalone Python.
Therefore I have started a PyCharm project referencing the ArcGIS Python interpreter and hence am able to import arcpy into my main.py.
Problem is: I don't want to pip install other modules, but I don't know how to correctly read the dbf table with arcpy.
#encoding=utf-8
import arcpy
path=r"D:\test.dbf"
sc=arcpy.SearchCursor(path) # Does not work: IOError exception str() failed
tv=arcpy.mapping.TableView(path) # Does not work either: StandaloneObject invalid data source or table
The dbf file is correct, it can be read into ArcGIS.
Can someone please give me an idea, how to read the file standalone with arcpy?
Using pandas
Python from ArcMap comes with some modules. You can load the data into a pandas.DataFrame and work with this format. Pandas is well-documented and there is a lot of already asked question about it all over the web. It's also super easy to do groupby or table manipulations.
import pandas as pd
import arcpy
def read_arcpy_table(self, table, fields='*', null_value=None):
"""
Transform a table from ArcMap into a pandas.DataFrame object
table : Path the table
fields : Fields to load - '*' loads all fields
null_value : choose a value to replace null values
"""
fields_type = {f.name: f.type for f in arcpy.ListFields(table)}
if fields == '*':
fields = fields_type.keys()
fields = [f.name for f in arcpy.ListFields(table) if f.name in fields]
fields = [f for f in fields if f in fields_type and fields_type[f] != 'Geometry'] # Remove Geometry field if FeatureClass to avoid bug
# Transform in pd.Dataframe
np_array = arcpy.da.FeatureClassToNumPyArray(in_table=table,
field_names=fields,
skip_nulls=False,
null_value=null_value)
df = self.DataFrame(np_array)
return df
# Add the function into the loaded pandas module
pd.read_arcpy_table = types.MethodType(read_arcpy_table, pd)
df = pd.read_arcpy_table(table='path_to_your_table')
# Do whatever calculations need to be done
Using cursor
You can also use arcpy cursors and dict for simple calculation.
There are simple example on this page on how to use correctly cursors :
https://desktop.arcgis.com/fr/arcmap/10.3/analyze/arcpy-data-access/searchcursor-class.htm
My bad,
after reading the Using cursor approach, I figured out that using the
sc=arcpy.SearchCursor(path) # Does not work: IOError exception str() failed
approach was correct, but at the time around 3 AM, I was a little bit exhausted and missed the typo in the path that caused the error. Nevertheless, a more descriptive error message e.g. IOError could not open file rather than IOError exception str() failed would have solved my mistake as acrGIS newbie.. : /

Python/xlwings set excel table style to "none"

I have exported a dataframe to Excel using xlwings and am trying to format it as a table. I want to apply the "None" style but can't figure out how to specify the "None."
This line works:
table = sheet.tables.add(source=sheet["A1"].expand(), name = 'TableName', table_style_name = "TableStyleLight1")
But instead of "TableStyleLight1" I want "None". I have tried "", '', 0, "None" and none of them work.
You can use .api to access the native object, and clear the formatting after creating the table:
table = sheet.tables.add(source=ws.range('I20:N40'), name='UnformattedTable')
sheet.api.ListObjects('UnformattedTable').TableStyle = "" # Windows specific
But as mentioned in the comments, it doesn't seem to be supported using tables.add directly.
xlwings has a missing feature page with an example on using .api. Note the above example is for windows, the api for mac is slightly different.

how to load csv data without header into cassandra column family using python script

I have created a column family in local cassandra as below with cqlsh.
CREATE TABLE sample.stackoverflow_question12 (
id1 int,
class1 int,
name1 text,
PRIMARY KEY (id1)
)
I have a sample csv file with name "data.csv" and the data in the file is as below.
id1 | name1 |class1
1 | hello | 10
2 | world | 20
Used the below python code to connect db and load data from csv by using Anaconda (After installation of Cassandra driver using pip in anaconda)
#Connecting to local Cassandra server
from Cassandra.Cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
auth_provider = PlainTextAuthProvider(username='cassandra', password='cassandra')
cluster = Cluster(["127.0.0.1"],auth_provider = auth_provider,protocol_version=4)
session = cluster.connect()
session.set_keyspace('sample')
cluster.connect()
#File loading
prepared = session.prepare(' Insert into stackoverflow_question12 (id1,class1,name1)VALUES (?, ?, ?)')
with open('D:/Cassandra/NoSQL/data.csv', 'r') as fares:
for fare in fares:
columns=fare.split(",")
id1=columns[0]
class1=columns[1]
name1=columns[2]
session.execute(prepared, [id1,class1,name1])
#closing the file
fares.close()
when I executed the above code getting below error.
Received an argument of invalid type for column "id1". Expected: <class 'cassandra.cqltypes.Int32Type'>, Got: <class 'str'>; (required argument is not an integer)
When I changed data types to text and ran the above code then it loads data with header fields too.
Can anyone help me to make changes in my code to load data without header content? or your successful code also fine if any.
The reason to make column names as id1 and class1 is id and class are keywords and throwing error in the code when used within "fares" loop.
But in real world column names would be seen as class and id. How to run code when these type of columns came into picture?
The another question I got in mind is Cassandra will store primary key first then remaining keys in ascending order. Can we load csv columns which are not indexed same as Cassandra columns storage?
Based on this, I need to build another solution.
You need to use types accordingly to your schema - for integer columns you need to use int(columns...) because split generates strings. If you want to skip header, then you can do something like this:
cnt = 0
with open('D:/Cassandra/NoSQL/data.csv', 'r') as fares:
if cnt = 0:
continue
for fare in fares:
...
Although it's better to use Python's built-in CSV reader that could be customized to skip header automatically...
P.S. If you just want to load data from CSV, I recommend to use external tools, like DSBulk that are flexible and heavily optimized for that task. See following blog posts for examples:
https://www.datastax.com/blog/2019/03/datastax-bulk-loader-introduction-and-loading
https://www.datastax.com/blog/2019/04/datastax-bulk-loader-more-loading
https://www.datastax.com/blog/2019/04/datastax-bulk-loader-common-settings
https://www.datastax.com/blog/2019/06/datastax-bulk-loader-unloading
https://www.datastax.com/blog/2019/07/datastax-bulk-loader-counting
https://www.datastax.com/blog/2019/12/datastax-bulk-loader-examples-loading-other-locations

Adding column to left hand side of table - Python docx

I have searched high and low (on various forums) and simply can't find the answer. I have a table in a docx file and would like to use the docx Python module to modify it.
I need to add a column to the left side of the table. According to the documentation, using the add_column() function adds a column to the right side of the table.
I have also tried changing the directionality of the table to a RTL table with the following code:
import docx
from docx.enum.table import WD_TABLE_DIRECTION
file = test.docx
doc = docx.Document(file)
tbls = doc.tables #this gives me 3 tables in a list of table objects
test = tbls[1]
test.table_direction = WD_TABLE_DIRECTION.RTL
test.add_column(1)
doc.save(file)
Upon opening the resulting file, I found that the code still adds a column only to the left side.
Does someone know how to add a column to the right side of a table?
Many thanks in advance!
You can try LTR, and also use Inches to define the column width so that the added column can be displayed correctly.
import docx
from docx.enum.table import WD_TABLE_DIRECTION
from docx.shared import Cm, Inches
file = 'test.docx'
doc = docx.Document(file)
tbls = doc.tables
test = tbls[1]
test.table_direction = WD_TABLE_DIRECTION.LTR
test.add_column(Inches(1.0))
doc.save(file)

Extract and sort data from .mdb file using mdbtools in Python

I'm quite new to Python, so any help will be appreciated. I am trying to extract and sort data from 2000 .mdb files using mdbtools on Linux. So far I was able to just take the .mdb file and dump all the tables into .csv. It creates huge mess since there are lots of files that need to be processed.
What I need is to extract particular sorted data from particular table. Like for example, I need the table called "Voltage". The table consists of numerous cycles and each cycle has several rows also. The cycles usually go in chronological order, but in some cases time stamp get recorded with delay. Like cycle's one first row can have later time than cycles 1 first row. I need to extract the latest row of the cycle based on time for the first or last five cycles. For example, in table below, I will need the second row.
Cycle# Time Data
1 100.59 34
1 101.34 54
1 98.78 45
2
2
2 ...........
Here is the script I use. I am using the command python extract.py table_files.mdb. But I would like the script to just be invoked with ./extract.py. The path to filenames should be in the script itself.
import sys, subprocess, os
DATABASE = sys.argv[1]
subprocess.call(["mdb-schema", DATABASE, "mysql"])
# Get the list of table names with "mdb-tables"
table_names = subprocess.Popen(["mdb-tables", "-1", DATABASE],
stdout=subprocess.PIPE).communicate()[0]
tables = table_names.splitlines()
print "BEGIN;" # start a transaction, speeds things up when importing
sys.stdout.flush()
# Dump each table as a CSV file using "mdb-export",
# converting " " in table names to "_" for the CSV filenames.
for table in tables:
if table != '':
filename = table.replace(" ","_") + ".csv"
file = open(filename, 'w')
print("Dumping " + table)
contents = subprocess.Popen(["mdb-export", DATABASE, table],
stdout=subprocess.PIPE).communicate()[0]
file.write(contents)
file.close()
Personally, I wouldn't spend a whole lot of time fussing around trying to get mdbtools, unixODBC and pyodbc to work together. As Pedro suggested in his comment, if you can get mdb-export to dump the tables to CSV files then you'll probably save a fair bit of time by just importing those CSV files into SQLite or MySQL, i.e., something that will be more robust than using mdbtools on the Linux platform.
A few suggestions:
Given the sheer number of .mdb files (and hence .csv files) involved, you'll probably want to import the CSV data into one big table with an additional column to indicate the source filename. That will be much easier to manage than ~2000 separate tables.
When creating your target table in the new database you'll probably want to use a decimal (as opposed to float) data type for the [Time] column.
At the same time, rename the [Cycle#] column to just [Cycle]. "Funny characters" in column names can be a real nuisance.
Finally, to select the "last" reading (largest [Time] value) for a given [SourceFile] and [Cycle] you can use a query something like this:
SELECT
v1.SourceFile,
v1.Cycle,
v1.Time,
v1.Data
FROM
Voltage v1
INNER JOIN
(
SELECT
SourceFile,
Cycle,
MAX([Time]) AS MaxTime
FROM Voltage
GROUP BY SourceFile, Cycle
) v2
ON v1.SourceFile=v2.SourceFile
AND v1.Cycle=v2.Cycle
AND v1.Time=v2.MaxTime
To bring it directly to Pandas in python3 I wrote this little snippet
import sys, subprocess, os
from io import StringIO
import pandas as pd
VERBOSE = True
def mdb_to_pandas(database_path):
subprocess.call(["mdb-schema", database_path, "mysql"])
# Get the list of table names with "mdb-tables"
table_names = subprocess.Popen(["mdb-tables", "-1", database_path],
stdout=subprocess.PIPE).communicate()[0]
tables = table_names.splitlines()
sys.stdout.flush()
# Dump each table as a stringio using "mdb-export",
out_tables = {}
for rtable in tables:
table = rtable.decode()
if VERBOSE: print('running table:',table)
if table != '':
if VERBOSE: print("Dumping " + table)
contents = subprocess.Popen(["mdb-export", database_path, table],
stdout=subprocess.PIPE).communicate()[0]
temp_io = StringIO(contents.decode())
print(table, temp_io)
out_tables[table] = pd.read_csv(temp_io)
return out_tables
There's an alternative to mdbtools for Python: JayDeBeApi with the UcanAccess driver. It uses a Python -> Java bridge which slows things down, but I've been using it with considerable success and comes with decent error handling.
It takes some practice setting it up, but if you have a lot of databases to wrangle, it's well worth it.

Categories

Resources