I need to create a lookup table to store tabular data and retrieve the records based on multiple field values.
I found an example (post #15418386) that does almost what I need; however, it always returns the same record regardless of the argument passed.
I listed the code at the bottom of this post, in case the link does not work.
Using the debugger in my IDE (I'm using PyCharm), I have verified that the file is read correctly and that the data table is populated properly.
The test data included in the code is:
name,age,weight,height
Bob Barker,25,175,6ft 2in
Ted Kingston,28,163,5ft 10in
Mary Manson,27,140,5ft 6in
Sue Sommers,27,132,5ft 8in
Alice Toklas,24,124,5ft 6in
The function always returns the last record. I believe the problem is in these lines of code, but I don't understand how they work:
matches = [self.records[index]
for index in self.lookup_tables[field].get(value, []) ]
return matches if matches else None
I would like to understand how the code is supposed to work so I can extend it to search on multiple parameters.
original code:
from collections import defaultdict, namedtuple
import csv
class DataBase(object):
def __init__(self, csv_filename, recordname):
        # read data from csv format file into a list of named tuples
with open(csv_filename, 'rb') as inputfile:
csv_reader = csv.reader(inputfile, delimiter=',')
self.fields = csv_reader.next() # read header row
self.Record = namedtuple(recordname, self.fields)
self.records = [self.Record(*row) for row in csv_reader]
self.valid_fieldnames = set(self.fields)
# create an empty table of lookup tables for each field name that maps
# each unique field value to a list of record-list indices of the ones
# that contain it.
self.lookup_tables = defaultdict(lambda: defaultdict(list))
def retrieve(self, **kwargs):
"""Fetch a list of records with a field name with the value supplied
as a keyword arg ( or return None if there aren't any)."""
if len(kwargs) != 1:
raise ValueError(
'Exactly one fieldname/keyword argument required for function '
'(%s specified)' % ', '.join([repr(k) for k in kwargs.keys()])
)
field, value = kwargs.items()[0] # get only keyword arg and value
if field not in self.valid_fieldnames:
raise ValueError('keyword arg "%s" isn\'t a valid field name' % field)
if field not in self.lookup_tables: # must create field look up table
for index, record in enumerate(self.records):
value = getattr(record, field)
self.lookup_tables[field][value].append(index)
matches = [self.records[index]
for index in self.lookup_tables[field].get(value, []) ]
return matches if matches else None
if __name__ == '__main__':
empdb = DataBase('employee.csv', 'Person')
print "retrieve(name='Ted Kingston'):", empdb.retrieve(name='Ted Kingston')
print "retrieve(age='27'):", empdb.retrieve(age='27')
print "retrieve(weight='150'):", empdb.retrieve(weight='150')
The variable value is overwritten in the following if .. for .. block:
field, value = kwargs.items()[0] # <--- `value` defined
...
if field not in self.lookup_tables:
for index, record in enumerate(self.records):
value = getattr(record, field) # <--- `value` overwritten
self.lookup_tables[field][value].append(index)
So value ends up referring to the value from the last record. You need to use another name to prevent the overwrite:
if field not in self.lookup_tables:
for index, record in enumerate(self.records):
v = getattr(record, field)
self.lookup_tables[field][v].append(index)
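To answer the "how it works" part: lookup_tables[field] maps each distinct value of that field to a list of indices into self.records, so retrieve only has to look the value up and pull the matching records back out. Once the shadowing bug is fixed, extending retrieve to several search parameters mostly means intersecting the index sets produced by each field. Here is a minimal sketch of such a version, assuming the class layout from the question (untested against your data):
def retrieve(self, **kwargs):
    """Fetch records matching every fieldname=value keyword arg
    (or return None if there aren't any)."""
    if not kwargs:
        raise ValueError('At least one fieldname/keyword argument required')
    indices = None  # indices of records matching all fields seen so far
    for field, wanted in kwargs.items():
        if field not in self.valid_fieldnames:
            raise ValueError('keyword arg "%s" isn\'t a valid field name' % field)
        if field not in self.lookup_tables:  # build this field's table lazily
            for index, record in enumerate(self.records):
                self.lookup_tables[field][getattr(record, field)].append(index)
        found = set(self.lookup_tables[field].get(wanted, []))
        indices = found if indices is None else indices & found
        if not indices:  # no record can match any more
            return None
    return [self.records[index] for index in sorted(indices)]
With the test data above, retrieve(age='27', height='5ft 8in') returns only Sue Sommers's record, while retrieve(age='27') still returns both Mary Manson and Sue Sommers.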
I'm trying to insert rows into a table after changing its schema in Cassandra with the CQLEngine python library. Before the change, the model looked like:
class MetricsByDevice(Model):
device = columns.Text(primary_key=True, partition_key=True)
datetime = columns.DateTime(primary_key=True, clustering_order="DESC")
load_power = columns.Double()
inverter_power = columns.Double()
I've changed the schema to this, adding four columns (DSO, node, park and commercializer):
class MetricsByDevice(Model):
device = columns.Text(primary_key=True, partition_key=True)
datetime = columns.DateTime(primary_key=True, clustering_order="DESC")
DSO = columns.Text(index=True, default='DSO_1'),
node = columns.Text(index=True, default='Node_1'),
park = columns.Integer(index=True, default=6),
commercializer = columns.Text(index=True, default='Commercializer_1'),
load_power = columns.Double()
inverter_power = columns.Double()
Then, I've synced the table with a script containing the line
sync_table(MetricsByDate)
I've checked the database and the four columns have been created. The existing rows have these fields set to NULL (as expected).
Then I modified the script in charge of batch-inserting rows so that it includes the values for the new fields. It looks like:
batch = BatchQuery()
for idx, message in enumerate(consumer):
data = message.value
ts_to_insert = dateutil.parser.parse(data['timestamp'])
filters = get_filters(message.partition_key)
MetricsByDate.batch(batch).create(
device=device,
date=str(ts_to_insert.date()),
time=str(ts_to_insert.time()),
created_at=now,
DSO=str(filters['DSO']),
node=str(filters['node']),
park=int(filters['park']),
commercializer=str(filters['commercializer']),
load_power=data['loadPower'],
inverter_power=data['inverterPower'],
)
if idx % 100 == 0: # Insert every 100 messages
batch.execute()
# Reset batch
batch = BatchQuery()
I've already checked that the values for the new fields aren't None and have the correct types. Nevertheless, each row is inserted correctly except for the new fields, which end up NULL in Cassandra.
The batch insertion does not return any errors. I don't know if I'm missing something, or if I need to do an extra step to update the schema. I've been looking in the docs, but I can't find anything that helps.
Is there anything I'm doing wrong?
EDIT
Following Alex Ott's suggestion, I inserted the rows one by one, changing the code to:
for idx, message in enumerate(consumer):
data = message.value
ts_to_insert = dateutil.parser.parse(data['timestamp'])
filters = get_filters(message.partition_key)
metrics_by_date = MetricsByDate(
device=device,
date=str(ts_to_insert.date()),
time=str(ts_to_insert.time()),
created_at=now,
DSO=str(filters['DSO']),
node=str(filters['node']),
park=int(filters['park']),
commercializer=str(filters['commercializer']),
load_power=data['loadPower'],
inverter_power=data['inverterPower'],
)
metrics_by_date.save()
If before executing the line metrics_by_date.save() I add these print statements:
print(metrics_by_date.DSO)
print(metrics_by_date.park)
print(metrics_by_date.load_power)
print(metrics_by_date.device)
print(metrics_by_date.date)
The output is:
(<cassandra.cqlengine.columns.Text object at 0x7ff0b492a670>,)
(<cassandra.cqlengine.columns.Integer object at 0x7ff0b492d190>,)
256.99
SQ3-3.2.3.1-70-17444
2020-04-22
For the new fields I'm getting a cassandra column object, but for the others I get their values. That may be a clue, because it continues to insert NULL in the new columns.
Finally I got it.
It was something silly: in the model definition, for reasons unknown, I had added commas to separate the fields instead of line breaks...
So correcting the model definition to:
class MetricsByDevice(Model):
device = columns.Text(primary_key=True, partition_key=True)
datetime = columns.DateTime(primary_key=True, clustering_order="DESC")
DSO = columns.Text(index=True, default='DSO_1')
node = columns.Text(index=True, default='Node_1')
park = columns.Integer(index=True, default=6)
commercializer = columns.Text(index=True, default='Commercializer_1')
load_power = columns.Double()
inverter_power = columns.Double()
It works!!
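For anyone who hits the same thing: the trailing comma is what matters here. In Python, DSO = columns.Text(...), binds DSO to a one-element tuple containing the column object rather than to the column itself, so (as far as I can tell) cqlengine's model machinery does not recognize the attribute as a column, which is consistent with the tuples shown in the debug prints above. A plain-Python illustration:
class Demo(object):
    broken = 'DSO_1',  # trailing comma: this is the tuple ('DSO_1',)
    ok = 'Node_1'      # no comma: this is the string 'Node_1'
print(type(Demo.broken))  # -> tuple
print(type(Demo.ok))      # -> str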
I am attempting to query all rows for a column called show_id, and then compare each potential item to be added to the DB against those results. The simplest way I can think of is to check whether each show is in the results and, if so, skip it. However, the results from the snippet below are returned as objects, so this check fails.
Is there a better way to create the query to achieve this?
shows_inDB = Show.query.filter(Show.show_id).all()
print(shows_inDB)
Results:
<app.models.user.Show object at 0x10c2c5fd0>,
<app.models.user.Show object at 0x10c2da080>,
<app.models.user.Show object at 0x10c2da0f0>
Code for the entire function:
def save_changes_show(show_details):
"""
Save the changes to the database
"""
try:
shows_inDB = Show.query.filter(Show.show_id).all()
print(shows_inDB)
for show in show_details:
            #Check the show isn't already in the DB
if show['id'] in shows_inDB:
print(str(show['id']) + ' Already Present')
else:
#Add show to DB
tv_show = Show(
show_id = show['id'],
seriesName = str(show['seriesName']).encode(),
aliases = str(show['aliases']).encode(),
banner = str(show['banner']).encode(),
seriesId = str(show['seriesId']).encode(),
status = str(show['status']).encode(),
firstAired = str(show['firstAired']).encode(),
network = str(show['network']).encode(),
networkId = str(show['networkId']).encode(),
runtime = str(show['runtime']).encode(),
genre = str(show['genre']).encode(),
overview = str(show['overview']).encode(),
lastUpdated = str(show['lastUpdated']).encode(),
airsDayOfWeek = str(show['airsDayOfWeek']).encode(),
airsTime = str(show['airsTime']).encode(),
rating = str(show['rating']).encode(),
imdbId = str(show['imdbId']).encode(),
zap2itId = str(show['zap2itId']).encode(),
added = str(show['added']).encode(),
addedBy = str(show['addedBy']).encode(),
siteRating = str(show['siteRating']).encode(),
siteRatingCount = str(show['siteRatingCount']).encode(),
slug = str(show['slug']).encode()
)
db.session.add(tv_show)
db.session.commit()
except Exception:
print(traceback.print_exc())
I have decided to use the method above and extract the data I wanted into a list, comparing each show to the list.
show_compare = []
shows_inDB = Show.query.filter().all()
for item in shows_inDB:
show_compare.append(item.show_id)
for show in show_details:
    #Check the show isn't already in the DB
if show['id'] in show_compare:
print(str(show['id']) + ' Already Present')
else:
#Add show to DB
For querying a specific column value, have a look at this question: Flask SQLAlchemy query, specify column names. This is the example code given in the top answer there:
result = SomeModel.query.with_entities(SomeModel.col1, SomeModel.col2)
The crux of your problem is that you want to create a new Show instance if that show doesn't already exist in the database.
Querying the database for all shows and looping through the result for each potential new show might become very inefficient if you end up with a lot of shows in the database, and finding an object by identity is what an RDBMS does best!
This function will check to see if an object exists, and create it if not. Inspired by this answer:
def add_if_not_exists(model, **kwargs):
if not model.query.filter_by(**kwargs).first():
instance = model(**kwargs)
db.session.add(instance)
So your example would look like:
def add_if_not_exists(model, **kwargs):
if not model.query.filter_by(**kwargs).first():
instance = model(**kwargs)
db.session.add(instance)
for show in show_details:
    add_if_not_exists(Show, show_id=show['id'])
If you really want to query all shows upfront, you could put the ids into a set instead of a list, which will speed up your inclusion test.
E.g:
show_compare = {item.show_id for item in Show.query.all()}
for show in show_details:
# ... same as your code
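Combining the two ideas, you could also let the database return just the ids and build the set directly; with_entities yields one-element result rows, so pull the value out by attribute. A sketch based on the Show model from the question:
existing_ids = {row.show_id for row in Show.query.with_entities(Show.show_id)}
for show in show_details:
    if show['id'] in existing_ids:
        print(str(show['id']) + ' Already Present')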
The code runs until it inserts a new row, at which point I get:
'attempting to insert [3 item result] into these columns [5 items]'. I have tried to discover where my code is losing results, but cannot. Any suggestions would be great.
Additional information: the feature class I am inserting into has five fields, and they are the same as the source fields. Execution reaches my length != check and my error message prints. Please assist if anyone is able.
# coding: utf8
import arcpy
import os, sys
from arcpy import env
arcpy.env.workspace = r"E:\Roseville\ScriptDevel.gdb"
arcpy.env.overwriteOutput = True  # a Python bool, not the string "TRUE"
fc_buffers = "Parcels" # my indv. parcel buffers
fc_Landuse = "Geology" # my land use layer
outputLayer = "IntersectResult" # output layer
outputFields = [f.name for f in arcpy.ListFields(outputLayer) if f.type not in ['OBJECTID', "Geometry"]] + ['SHAPE@']
landUseFields = [f.name for f in arcpy.ListFields(fc_Landuse) if f.type not in ['PTYPE']]
parcelBufferFields = [f.name for f in arcpy.ListFields(fc_buffers) if f.type not in ['APN']]
intersectionFeatureLayer = arcpy.MakeFeatureLayer_management(fc_Landuse, 'intersectionFeatureLayer').getOutput(0)
selectedBuffer = arcpy.MakeFeatureLayer_management(fc_buffers, 'selectedBuffer').getOutput(0)
def orderFields(luFields, pbFields):
ordered = []
for field in outputFields:
# append the matching field
if field in landUseFields:
ordered.append(luFields[landUseFields.index(field)])
if field in parcelBufferFields:
ordered.append(pbFields[parcelBufferFields.index(field)])
return ordered
print parcelBufferFields
with arcpy.da.SearchCursor(fc_buffers, ["OBJECTID", 'SHAPE@'] + parcelBufferFields) as sc, arcpy.da.InsertCursor(outputLayer, outputFields) as ic:
for row in sc:
oid = row[0]
shape = row[1]
print (oid)
print "Got this far"
selectedBuffer.setSelectionSet('NEW', [oid])
arcpy.SelectLayerByLocation_management(intersectionFeatureLayer,"intersect", selectedBuffer)
        with arcpy.da.SearchCursor(intersectionFeatureLayer, ['SHAPE@'] + landUseFields) as intersectionCursor:
for record in intersectionCursor:
recordShape = record[0]
print "list made"
outputShape = shape.intersect(recordShape, 4)
newRow = orderFields(row[2:], record[1:]) + [outputShape]
if len(newRow) != len(outputFields):
print 'there is a problem. the number of columns in the record you are attempting to insert into', outputLayer, 'does not match the number of destination columns'
print '\tattempting to insert:', newRow
print '\tinto these columns:', outputFields
continue
# insert into the outputFeatureClass
ic.insertRow(newRow)
Your with statement where you define the cursors creates an insert cursor with 5 fields, but the row you are trying to feed it has only 3 fields. You need to make sure the row matches the insert cursor's field list. I suspect the problem is actually in the orderFields method, or in what you pass to it.
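One way to make the mismatch visible, and usually to fix it, is to build the row by field name instead of by position. Note that your definition takes (luFields, pbFields) but the call passes (row[2:], record[1:]), i.e. parcel values first, so the arguments are swapped relative to the signature. A sketch that sidesteps the ordering problem, reusing the field lists from your script (untested, so treat it as a starting point):
def orderFields(pbValues, luValues):
    # pair each source field name with its value
    lookup = dict(zip(parcelBufferFields, pbValues))
    lookup.update(zip(landUseFields, luValues))
    # emit values in the insert cursor's order; any output field with no
    # source value becomes None, so the row length always matches
    return [lookup.get(field) for field in outputFields if field != 'SHAPE@']
With that, newRow = orderFields(row[2:], record[1:]) + [outputShape] can never come up short, and any None entries point straight at the field names that are not being matched.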
I have a yaml file of the form below:
Solution:
- number of solutions: 1
number of solutions displayed: 1
- Gap: None
Status: optimal
Message: bonmin\x3a Optimal
Objective:
objective:
Value: 0.010981105395
Variable:
battery_E[b1,1,1]:
Value: 0.25
battery_E[b1,1,2]:
Value: 0.259912707017
battery_E[b1,2,1]:
Value: 0.120758408109
battery_E[b2,1,1]:
Value: 0.0899999972181
battery_E[b2,2,3]:
Value: 0.198967393893
windfarm_L[w1,2,3]:
Value: 1
windfarm_L[w1,3,1]:
Value: 1
windfarm_L[w1,3,2]:
Value: 1
Using Python 2.7, I would like to import all battery_E values from this YAML file. I know I can iterate over the keys of the battery_E dictionary to retrieve them one by one (I am already doing that with PyYAML), but I would like to avoid iterating and do it in one go!
It's not possible "in one go" - there will still be some kind of iteration either way, and that's completely OK.
However, if the memory is a concern, you can load only values of the keys of interest during YAML loading:
from __future__ import print_function
import yaml
KEY = 'battery_E'
class Loader(yaml.SafeLoader):
def __init__(self, stream):
super(Loader, self).__init__(stream)
self.values = []
def compose_mapping_node(self, anchor):
start_event = self.get_event()
tag = start_event.tag
if tag is None or tag == '!':
tag = self.resolve(yaml.MappingNode, None, start_event.implicit)
node = yaml.MappingNode(tag, [],
start_event.start_mark, None,
flow_style=start_event.flow_style)
if anchor is not None:
self.anchors[anchor] = node
while not self.check_event(yaml.MappingEndEvent):
item_key = self.compose_node(node, None)
item_value = self.compose_node(node, item_key)
if (isinstance(item_key, yaml.ScalarNode)
and item_key.value.startswith(KEY)
and item_key.value[len(KEY)] == '['):
self.values.append(self.construct_object(item_value, deep=True))
else:
node.value.append((item_key, item_value))
end_event = self.get_event()
node.end_mark = end_event.end_mark
return node
with open('test.yaml') as f:
loader = Loader(f)
try:
loader.get_single_data()
finally:
loader.dispose()
print(loader.values)
Note however, that this code does not assume anything about the position of battery_E keys in the tree inside the YAML file - it will just load all of their values.
There is no need to retrieve each entry separately using PyYAML: you can load the data once and then select the key-value pairs with a Python dict comprehension, in the following two lines:
data = yaml.safe_load(open('input.yaml'))
kv = {k:v['Value'] for k, v in data['Solution'][1]['Variable'].items() if k.startswith('battery_E')}
after that kv contains:
{'battery_E[b2,2,3]': 0.198967393893, 'battery_E[b1,1,1]': 0.25, 'battery_E[b1,1,2]': 0.259912707017, 'battery_E[b2,1,1]': 0.0899999972181, 'battery_E[b1,2,1]': 0.120758408109}
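If you then want the bracketed indices as structured keys rather than raw strings, one more comprehension will split them out. A sketch, assuming every key has the battery_E[<name>,<int>,<int>] shape shown in the file:
import re
pattern = re.compile(r'battery_E\[(\w+),(\d+),(\d+)\]')
indexed = {pattern.match(k).groups(): v for k, v in kv.items()}
# e.g. indexed[('b1', '1', '2')] == 0.259912707017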
I am currently trying to populate 2 fields. They both already exist in a table that I want to fill with data from existing feature classes. The idea is to copy all data from the desired feature classes that match a particular project #; the rows that match the project # are copied over to a blank template with the matching fields. So far all is good, except that I need to push the OBJECTID and the name of the source feature class into 2 fields within the table.
def featureClassName(table_path):
arcpy.AddMessage("Calculating Feature Class Name...")
print "Calculating Feature Class Name..."
featureClass = "FeatureClass"
SDE_ID = "SDE_ID"
fc_desc = arcpy.Describe(table_path)
lists = arcpy.ListFields(table_path)
print lists
with arcpy.da.SearchCursor(table_path, featureClass = "\"NAME\"" + " Is NULL") as cursor:
for row in cursor:
print row
if row.FEATURECLASS = str.replace(row.FEATURECLASS, "*", fc):
cursor.updateRow(row)
print row
del cursor, row
else:
        pass
The code above is my attempt, one of many, to populate the field with the name of the feature class.
I have attempted to do the same with the OID.
for fc in fcs:
print fc
if fc:
print "Making Layer..."
lyr = arcpy.MakeFeatureLayer_management (fc, r"in_memory\temp", whereClause)
fcCount = int(arcpy.GetCount_management(lyr).getOutput(0))
print fcCount
if fcCount > 0:
tbl = arcpy.CopyRows_management(lyr, r"in_memory\temp2")
arcpy.AddMessage("Checking for Feature Class Name...")
arcpy.AddMessage("Appending...")
print "Appending..."
arcpy.Append_management(tbl, table_path, "NO_TEST")
print "Checking for Feature Class Name..."
featureClassName(table_path)
del fc, tbl, lyr, fcCount
arcpy.Delete_management(r"in_memory\temp")
arcpy.Delete_management(r"in_memory\temp2")
else:
arcpy.AddMessage("Pass... " + fc)
print ("Pass... " + fc)
del fc, lyr, fcCount
arcpy.Delete_management(r"in_memory\temp")
        pass
This code is the main loop over the feature classes within the dataset, where I create a new layer/table to use for copying the data into the table. The feature class name and OID fields have no data to push, so that's where I am stuck.
Thanks Everybody
You have a number of things wrong. First, you are not setting up the cursor correctly: it has to be an UpdateCursor if you are going to update, but you called a SearchCursor (and called it incorrectly, by the way). Second, you used = (assignment) instead of == (equality comparison) in the line "if row.FEATURECLASS ...". Then, two lines below that, your indentation is messed up on several lines. And it's not clear at all that your function knows the value of fc; pass that as an arg to be sure. A bunch of other problems exist, but let's just give you an example that will work, and you can study it:
def featureClassName(table_path, fc):
'''Will update the FEATURECLASS FIELD in table_path rows with
value of fc (string) where FEATURECLASS field is currently null '''
arcpy.AddMessage("Calculating Feature Class Name...")
print "Calculating Feature Class Name..."
#delimit field correctly for the query expression
    df = arcpy.AddFieldDelimiters(table_path, 'FEATURECLASS')
ex = df + " is NULL"
flds = ['FEATURECLASS']
#in case we don't get rows, del will bomb below unless we put in a ref
#to row
row = None
#do the work
with arcpy.da.UpdateCursor(table_path, flds, ex) as cursor:
for row in cursor:
row[0] = fc #or basename, don't know which you want
cursor.updateRow(row)
del cursor, row
Notice we are now passing the name of the fc as an arg, so you will have to deal with that in the rest of your code. Also, it's best to use AddFieldDelimiters, since different fc's require different delimiters, and the docs are not clear at all on this (sometimes they are just wrong).
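In your main loop that means the call site changes as well, e.g. (still Python 2, matching the rest of your script):
print "Checking for Feature Class Name..."
featureClassName(table_path, fc)  # pass the current fc so the function knows it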
good luck, Mike