Input FITS table to astroquery.xmatch - python

I need to use xmatch from the astroquery package to cross match a large local catalogue with 2MASS.
I load my local FITS table with astropy as usual:
from astropy.io import fits
hdu = fits.open(root+'mycat.fits')
Then I try to use XMatch with that table (the table is hdu[2]), following the syntax described in the astroquery docs:
from astroquery.xmatch import XMatch
table = XMatch.query(cat1=hdu[2],
                     cat2='vizier:II/246/out',
                     max_distance=1 * u.arcsec,
                     colRA1='RA', colDec1='Dec')
But get the following error:
AttributeError: 'BinTableHDU' object has no attribute 'read'
The examples in the astroquery docs only show how to pass a local CSV file. But my catalogue has about 7 million entries, so it is not convenient to pass it as an ASCII CSV file.
How should I pass my FITS table as input? Thanks!

While xmatch can accept a file object as input, that file object has to be a Vizier-style .csv table. You need to convert your FITS table to an astropy table first, e.g.
from astropy.table import Table
myTable = Table(data=hdu[2].data)
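You can then pass that Table straight to XMatch. A minimal sketch, assuming the RA/Dec columns in your FITS table really are named 'RA' and 'Dec' as in your call:

from astropy.table import Table
from astropy import units as u
from astroquery.xmatch import XMatch

myTable = Table(data=hdu[2].data)    # hdu as opened in your snippet

result = XMatch.query(cat1=myTable,
                      cat2='vizier:II/246/out',
                      max_distance=1 * u.arcsec,
                      colRA1='RA', colDec1='Dec')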

Related

can't read .dbt file: Unknown dbf type

I get the following error when I try to read a .dbt file with this library.
import dbf
table = dbf.Table(filename='test.dbt')
table.open(dbf.READ_ONLY)
>>dbf.DbfError: Unknown dbf type: 16 (10)
You can find the example data here.
The .dbt is the memo file, and cannot be opened separately from the .dbf file. What you need to do is
table = dbf.Table(filename='test.dbf')
and the fields stored in the .dbt file will automatically be available.
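A minimal sketch of reading the records once the table is open (untested; assumes the dbf package from PyPI, as used in the question):

import dbf

# Opening the .dbf picks up the companion .dbt memo file automatically
table = dbf.Table(filename='test.dbf')
table.open(dbf.READ_ONLY)

for record in table:
    print(record)   # field values, including those stored in the .dbt

table.close()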

Read dbf with arcpy in PyCharm?

I have exported an ArcGIS Desktop 10.7 table into a dbf file.
Now I want to do some GIS calculation in standalone Python.
Therefore I have started a PyCharm project referencing the ArcGIS Python interpreter and hence am able to import arcpy into my main.py.
Problem is: I don't want to pip install other modules, but I don't know how to correctly read the dbf table with arcpy.
#encoding=utf-8
import arcpy
path=r"D:\test.dbf"
sc=arcpy.SearchCursor(path) # Does not work: IOError exception str() failed
tv=arcpy.mapping.TableView(path) # Does not work either: StandaloneObject invalid data source or table
The dbf file is correct, it can be read into ArcGIS.
Can someone please give me an idea, how to read the file standalone with arcpy?
Using pandas
Python from ArcMap comes with some modules. You can load the data into a pandas.DataFrame and work with it in this format. Pandas is well-documented and there are plenty of already-answered questions about it all over the web. It's also super easy to do groupby or other table manipulations.
import types

import pandas as pd
import arcpy

def read_arcpy_table(self, table, fields='*', null_value=None):
    """
    Transform a table from ArcMap into a pandas.DataFrame object.
    table      : path to the table
    fields     : fields to load - '*' loads all fields
    null_value : value used to replace null values
    """
    fields_type = {f.name: f.type for f in arcpy.ListFields(table)}
    if fields == '*':
        fields = fields_type.keys()
    fields = [f.name for f in arcpy.ListFields(table) if f.name in fields]
    # Remove the Geometry field if the input is a FeatureClass, to avoid a bug
    fields = [f for f in fields if f in fields_type and fields_type[f] != 'Geometry']
    # Transform into a pandas.DataFrame
    np_array = arcpy.da.FeatureClassToNumPyArray(in_table=table,
                                                 field_names=fields,
                                                 skip_nulls=False,
                                                 null_value=null_value)
    df = self.DataFrame(np_array)
    return df

# Add the function to the loaded pandas module
pd.read_arcpy_table = types.MethodType(read_arcpy_table, pd)

df = pd.read_arcpy_table(table='path_to_your_table')
# Do whatever calculations need to be done
Using cursor
You can also use arcpy cursors and dicts for simple calculations.
There are simple examples on this page on how to use cursors correctly:
https://desktop.arcgis.com/fr/arcmap/10.3/analyze/arcpy-data-access/searchcursor-class.htm
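A minimal sketch of a cursor-based calculation, assuming the .dbf contains a numeric field; the field name VALUE below is purely illustrative:

import arcpy

path = r"D:\test.dbf"

# Sum a numeric column with a search cursor
total = 0
with arcpy.da.SearchCursor(path, ['VALUE']) as cursor:
    for row in cursor:
        if row[0] is not None:
            total += row[0]
print(total)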
My bad,
after reading the "Using cursor" approach, I figured out that the
sc=arcpy.SearchCursor(path) # Does not work: IOError exception str() failed
approach was correct, but at around 3 AM I was a little exhausted and missed the typo in the path that caused the error. Nevertheless, a more descriptive error message, e.g. IOError could not open file rather than IOError exception str() failed, would have caught my mistake as an ArcGIS newbie.. : /

how to load csv data without header into cassandra column family using python script

I have created a column family in local Cassandra with cqlsh as below.
CREATE TABLE sample.stackoverflow_question12 (
    id1 int,
    class1 int,
    name1 text,
    PRIMARY KEY (id1)
)
I have a sample csv file with name "data.csv" and the data in the file is as below.
id1 | name1 |class1
1 | hello | 10
2 | world | 20
I used the Python code below to connect to the DB and load data from the CSV using Anaconda (after installing the Cassandra driver with pip in Anaconda).
# Connecting to the local Cassandra server
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

auth_provider = PlainTextAuthProvider(username='cassandra', password='cassandra')
cluster = Cluster(["127.0.0.1"], auth_provider=auth_provider, protocol_version=4)
session = cluster.connect()
session.set_keyspace('sample')

# File loading
prepared = session.prepare('INSERT INTO stackoverflow_question12 (id1, class1, name1) VALUES (?, ?, ?)')
with open('D:/Cassandra/NoSQL/data.csv', 'r') as fares:
    for fare in fares:
        columns = fare.split(",")
        id1 = columns[0]
        class1 = columns[1]
        name1 = columns[2]
        session.execute(prepared, [id1, class1, name1])
# The with statement closes the file automatically
When I executed the above code I got the error below.
Received an argument of invalid type for column "id1". Expected: <class 'cassandra.cqltypes.Int32Type'>, Got: <class 'str'>; (required argument is not an integer)
When I changed the data types to text and ran the above code, it loaded the data, but with the header fields too.
Can anyone help me change my code so it loads the data without the header row? A working example of your own would also be fine.
The reason I named the columns id1 and class1 is that id and class are keywords and throw errors in the code when used inside the "fares" loop.
But in the real world the column names would simply be class and id. How can the code run when such column names come into the picture?
Another question I have in mind: Cassandra stores the primary key first and the remaining columns in ascending order. Can we load CSV columns whose order does not match Cassandra's column storage order?
Based on this, I need to build another solution.
You need to use types according to your schema - for integer columns you need to use int(columns[...]) because split generates strings. If you want to skip the header, you can do something like this:
cnt = 0
with open('D:/Cassandra/NoSQL/data.csv', 'r') as fares:
    for fare in fares:
        cnt += 1
        if cnt == 1:   # skip the header line
            continue
        ...
Although it would be better to use Python's built-in CSV reader, which can be customized to skip the header automatically...
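A minimal sketch of that (untested; reuses the session and prepared statement from the question, and assumes the file is comma-delimited in the order id1, name1, class1 shown in the sample):

import csv

with open('D:/Cassandra/NoSQL/data.csv', 'r') as fares:
    reader = csv.reader(fares)
    next(reader)   # skip the header row
    for id1, name1, class1 in reader:
        # cast to match the CQL schema: id1 and class1 are int, name1 is text
        session.execute(prepared, [int(id1), int(class1), name1])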
P.S. If you just want to load data from CSV, I recommend using external tools like DSBulk, which are flexible and heavily optimized for that task. See the following blog posts for examples:
https://www.datastax.com/blog/2019/03/datastax-bulk-loader-introduction-and-loading
https://www.datastax.com/blog/2019/04/datastax-bulk-loader-more-loading
https://www.datastax.com/blog/2019/04/datastax-bulk-loader-common-settings
https://www.datastax.com/blog/2019/06/datastax-bulk-loader-unloading
https://www.datastax.com/blog/2019/07/datastax-bulk-loader-counting
https://www.datastax.com/blog/2019/12/datastax-bulk-loader-examples-loading-other-locations
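For this particular table a DSBulk call would look roughly like the line below (a sketch only; check the linked posts for the exact options):

dsbulk load -url D:/Cassandra/NoSQL/data.csv -k sample -t stackoverflow_question12 -header true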

Pyarrow.lib.Schema vs. pyarrow.parquet.Schema

When I try to load across a many-partitioned parquet file, some of the schemas get inferred incorrectly because of missing data, which fills those columns in with nulls. I would think specifying the schema in pyarrow.parquet.ParquetDataset would fix this, but I don't know how to construct a schema of the correct pyarrow.parquet.Schema type. Some example code:
import pyarrow as pa
import pyarrow.parquet as pq
test_schema = pa.schema([pa.field('field1', pa.string()), pa.field('field2', pa.float64())])
paths = ['test_root/partition1/file1.parquet', 'test_root/partition2/file2.parquet']
dataset = pq.ParquetDataset(paths, schema=test_schema)
And the error:
AttributeError: 'pyarrow.lib.Schema' object has no attribute 'to_arrow_schema'
But I can't find any documentation on how to construct a pyarrow.parquet.Schema schema as in the docs (https://arrow.apache.org/docs/python/generated/pyarrow.parquet.ParquetDataset.html) and have only made a pyarrow.lib.Schema which gives the above error.
There is not an API to construct a Parquet schema in Python yet. You can use one that you read from a particular file, though (see pq.ParquetFile(...).schema).
Could you open an issue on the ARROW JIRA project to request the feature to construct Parquet schemas in Python?
https://issues.apache.org/jira
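A minimal sketch of that workaround of reusing a schema read from one of the files (untested; paths are the ones from the question):

import pyarrow.parquet as pq

paths = ['test_root/partition1/file1.parquet', 'test_root/partition2/file2.parquet']

# Borrow the Parquet-level schema from an existing file instead of building it by hand
parquet_schema = pq.ParquetFile(paths[0]).schema
dataset = pq.ParquetDataset(paths, schema=parquet_schema)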
So thank you (whoever you are) if you opened that ticket and fixed this in the ARROW JIRA.
I was able to merge the schemas of the files in the dataset and then read the dataset:
import os
import pyarrow as pa
import pyarrow.parquet as pq

merged_schema = pa.schema([])
for filename in os.listdir(dataset_folder):
    schema_ = pq.read_table(os.path.join(dataset_folder, filename)).schema
    merged_schema = pa.unify_schemas([schema_, merged_schema])
Read dataset:
dset = pq.ParquetDataset(
    'my_dataset_folder',
    schema=merged_schema,
    use_legacy_dataset=False
).read()

Incorporating attribute table into python

I am working in Python with ArcMap and had a question. Is there a way to import the data from an attribute table into Python, and if so, how do you select which attributes to print?
Thank You
A clean-looking way of presenting an attribute table in Python is through a pandas DataFrame, especially if you are working with shapefiles. The input must be a dbf file.
# Pandas Option, python 2.7
import arcpy, os, pandas
inTable = r'.dbf'
os.chdir(os.path.dirname(inTable))
outTable = 'table.xls'
arcpy.TableToExcel_conversion(inTable, outTable)
df = pandas.read_excel(outTable)
df.head() # Shows first five records of attribute table
Using SearchCursors and UpdateCursors is also an option if you want to update fields and work directly with the shapefile.
# SearchCursor option, python 2.7
import arcpy, os
shapefile = r'.shp'
cursor = arcpy.da.SearchCursor(shapefile, '*')
attributes = []
for row in cursor:
    attributes.append(row)
If you want to find a specific record by a certain field value, for example ID 50:
for record in attributes:
    if record[0] == 50:
        print record
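The answer above also mentions UpdateCursors for editing fields; a minimal sketch of that, assuming the shapefile has a text field named STATUS (the field name is purely illustrative):

# UpdateCursor option, python 2.7
import arcpy
shapefile = r'.shp'
with arcpy.da.UpdateCursor(shapefile, ['STATUS']) as cursor:
    for row in cursor:
        row[0] = 'checked'      # modify the field value
        cursor.updateRow(row)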
