I am looking to use PowerShell to output some JSON that looks like this, for use with a Python script:
{
  "run_date": "2020-08-27",
  "total_queries": 4,
  "number_results": 3,
  "number_warnings": 1,
  "number_errors": 5,
  "build_url": "https://some-url.com",
  "queries": {
    "query_a": {
      "database_a": "102 rows",
      "database_b": "Error: See pipeline logs for details"
    },
    "query_b": "No results",
    "query_c": {
      "database_a": "Warning: Number of results exceeded maximum threshold - 6509 rows",
      "database_c": "Error: See pipeline logs for details",
      "database_d": "Error: See pipeline logs for details"
    }
  }
}
I am using a foreach loop within PowerShell to run each of these queries sequentially, depending on which databases they need to be run on.
I know in Python I can create a template of the JSON like so:
options = {
    'run_date': os.environ['SYSTEM_PIPELINESTARTTIME'].split()[0],
    'total_queries': 0,
    'number_results': 0,
    'number_warnings': 0,
    'number_errors': 0,
    'build_url': 'https://some-url.com',
    'queries': {}
}
and then use something like:
options['queries'][filename][database] = '{} rows'.format(len(data))
to add data into the Python dictionaries.
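For completeness, a minimal sketch of that pattern (assuming each nested level is created on first use with setdefault; filename, database and data are placeholder loop variables):

options = {'queries': {}}

# Placeholder loop variables standing in for the real query results.
filename, database, data = 'query_a', 'database_a', range(102)

# Create the per-query dict on first use, then add the per-database entry.
options['queries'].setdefault(filename, {})[database] = '{} rows'.format(len(data))

print(options)  # {'queries': {'query_a': {'database_a': '102 rows'}}}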
I've tried using nested PSCustomObjects, but I end up with a conflict when different queries are being run on the same database, so it's trying to add a value to the PSCustomObject with the same key. I would like to know if there is a nice 'native' way to do this in PowerShell like there is in Python.
Turns out I was just being a bit of an idiot and not remembering how to work with PowerShell objects.
Ended up first adding all the query names into the parent object like so:
foreach ($name in $getqueries) {
    $notificationObj.queries | Add-Member -NotePropertyName $name.BaseName -NotePropertyValue ([PSCustomObject]@{})
}
Then adding in info about the queries themselves within the loop:
$notificationObj.queries.$queryName | Add-Member -NotePropertyName $database -NotePropertyValue "$($dataTable.Rows.Count) Rows"
If the required end result is a JSON file, there is actually no need to work with complex (and rather fat) [PSCustomObject] types. Instead you might just use a [HashTable] (or an ordered dictionary, by simply prefixing the hash table literal: [Ordered]@{...}).
To convert hash tables from your JSON file, use the ConvertFrom-Json -AsHashtable parameter (introduced in PowerShell 6.0).
To build a template (or just understand the PowerShell format), you might want to use this ConvertTo-Expression cmdlet:
$Json | ConvertFrom-Json -AsHashTable | ConvertTo-Expression
@{
    'number_errors' = 5
    'number_warnings' = 1
    'queries' = @{
        'query_b' = 'No results'
        'query_a' = @{
            'database_a' = '102 rows'
            'database_b' = 'Error: See pipeline logs for details'
        }
        'query_c' = @{
            'database_a' = 'Warning: Number of results exceeded maximum threshold - 6509 rows'
            'database_d' = 'Error: See pipeline logs for details'
            'database_c' = 'Error: See pipeline logs for details'
        }
    }
    'build_url' = 'https://some-url.com'
    'run_date' = '2020-08-27'
    'number_results' = 3
    'total_queries' = 4
}
Meaning you can assign this template to $Options as follows:
$Options = @{
    'number_errors' = 5
    'number_warnings' = 1
    'queries' = @{ ...
And easily change your properties in your nested objects, like:
$Options.Queries.query_c.database_d = 'Changed'
Or add a new property to a nested object:
$Options.Queries.query_a.database_c = 'Added'
Which results in the following (note that ConvertTo-Json only expands two levels deep by default, so pass a larger -Depth to serialize the nested hash tables fully):
$Options | ConvertTo-Json -Depth 3
{
  "run_date": "2020-08-27",
  "queries": {
    "query_a": {
      "database_c": "Added",
      "database_b": "Error: See pipeline logs for details",
      "database_a": "102 rows"
    },
    "query_b": "No results",
    "query_c": {
      "database_c": "Error: See pipeline logs for details",
      "database_d": "Changed",
      "database_a": "Warning: Number of results exceeded maximum threshold - 6509 rows"
    }
  },
  "number_results": 3,
  "build_url": "https://some-url.com",
  "total_queries": 4,
  "number_errors": 5,
  "number_warnings": 1
}
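And since the stated end goal is consuming this from Python, the JSON written by the PowerShell step (for example via $Options | ConvertTo-Json -Depth 3 | Set-Content results.json) comes back as plain dictionaries on the other side. A minimal sketch, with results.json as a hypothetical file name:

import json

with open('results.json') as fh:
    options = json.load(fh)

print(options['queries']['query_c']['database_d'])  # -> Changed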
I have the following complex data that I would like to parse in PySpark:
records = '[{"segmentMembership":{"ups":{"FF6KCPTR6AQ0836R":{"lastQualificationTime":"2021-01-16 22:05:11.074357","status":"exited"},"QMS3YRT06JDEUM8O":{"lastQualificationTime":"2021-01-16 22:05:11.074357","status":"realized"},"8XH45RT87N6ZV4KQ":{"lastQualificationTime":"2021-01-16 22:05:11.074357","status":"exited"}}},"_aepgdcdevenablement2":{"emailId":{"address":"stuff@someemail.com"},"person":{"name":{"firstName":"Name2"}},"identities":{"customerid":"PH25PEUWOTA7QF93"}}},{"segmentMembership":{"ups":{"FF6KCPTR6AQ0836R":{"lastQualificationTime":"2021-01-16 22:05:11.074457","status":"realized"},"D45TOO8ZUH0B7GY7":{"lastQualificationTime":"2021-01-16 22:05:11.074457","status":"realized"},"QMS3YRT06JDEUM8O":{"lastQualificationTime":"2021-01-16 22:05:11.074457","status":"existing"}}},"_aepgdcdevenablement2":{"emailId":{"address":"stuff4@someemail.com"},"person":{"name":{"firstName":"TestName"}},"identities":{"customerid":"9LAIHVG91GCREE3Z"}}}]'
df = spark.read.json(sc.parallelize([records]))
df.show()
df.printSchema()
The problem I am having is with the segmentMembership object. The JSON object looks like this:
"segmentMembership": {
"ups": {
"FF6KCPTR6AQ0836R": {
"lastQualificationTime": "2021-01-16 22:05:11.074357",
"status": "exited"
},
"QMS3YRT06JDEUM8O": {
"lastQualificationTime": "2021-01-16 22:05:11.074357",
"status": "realized"
},
"8XH45RT87N6ZV4KQ": {
"lastQualificationTime": "2021-01-16 22:05:11.074357",
"status": "exited"
}
}
}
The annoying thing with this is that the key values ("FF6KCPTR6AQ0836R", "QMS3YRT06JDEUM8O", "8XH45RT87N6ZV4KQ") end up being defined as columns in PySpark.
In the end, if the status of the segment is "exited", I was hoping to get the results as follows.
+--------------------+----------------+---------+------------------+
|address             |customerid      |firstName|segment_id        |
+--------------------+----------------+---------+------------------+
|stuff@someemail.com |PH25PEUWOTA7QF93|Name2    |[8XH45RT87N6ZV4KQ]|
|stuff4@someemail.com|9LAIHVG91GCREE3Z|TestName |[8XH45RT87N6ZV4KQ]|
+--------------------+----------------+---------+------------------+
After loading the data into a DataFrame (above), I tried the following:
dfx = df.select("_aepgdcdevenablement2.emailId.address", "_aepgdcdevenablement2.identities.customerid", "_aepgdcdevenablement2.person.name.firstName", "segmentMembership.ups")
dfx.show(truncate=False)

seg_list = array(*[lit(k) for k in ["8XH45RT87N6ZV4KQ", "QMS3YRT06JDEUM8O"]])
print(seg_list)

# if v["status"] in ['existing', 'realized']
def confusing_compare(ups, seg_list):
    seg_id_filtered_d = dict((k, ups[k]) for k in seg_list if k in ups)
    # This is the line I am having a problem with.
    # seg_id_status_filtered_d = {key for key, value in seg_id_filtered_d.items() if v["status"] in ['existing', 'realized']}
    return list(seg_id_filtered_d)

final_conf_dx_pred = udf(confusing_compare, ArrayType(StringType()))

result_df = dfx.withColumn("segment_id", final_conf_dx_pred(dfx.ups, seg_list)).select("address", "customerid", "firstName", "segment_id")
result_df.show(truncate=False)
I am not able to check the status field within the value field of the dict.
You can actually do that without using a UDF. Here I'm using all the segment names present in the schema and filtering out those with status = 'exited'. You can adapt it depending on which segments and statuses you want.
First, using the schema fields, get the list of all segment names like this:
segment_names = df.select("segmentMembership.ups.*").schema.fieldNames()
Then, by looping through the list created above and using the when function, you can create a column that holds either the segment name or null, depending on status:
from pyspark.sql.functions import array, col, expr, lit, when

active_segments = [
    when(col(f"segmentMembership.ups.{c}.status") != lit("exited"), lit(c))
    for c in segment_names
]
Finally, add a new column segments of array type and use the filter function to remove null elements from the array (which correspond to status 'exited'):
dfx = df.withColumn("segments", array(*active_segments)) \
    .withColumn("segments", expr("filter(segments, x -> x is not null)")) \
    .select(
        col("_aepgdcdevenablement2.emailId.address"),
        col("_aepgdcdevenablement2.identities.customerid"),
        col("_aepgdcdevenablement2.person.name.firstName"),
        col("segments").alias("segment_id")
    )

dfx.show(truncate=False)
#+--------------------+----------------+---------+------------------------------------------------------+
#|address             |customerid      |firstName|segment_id                                            |
#+--------------------+----------------+---------+------------------------------------------------------+
#|stuff@someemail.com |PH25PEUWOTA7QF93|Name2    |[QMS3YRT06JDEUM8O]                                    |
#|stuff4@someemail.com|9LAIHVG91GCREE3Z|TestName |[D45TOO8ZUH0B7GY7, FF6KCPTR6AQ0836R, QMS3YRT06JDEUM8O]|
#+--------------------+----------------+---------+------------------------------------------------------+
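If you instead want to keep only the segments whose status is 'exited' (as in the expected output shown in the question), flipping the condition should be enough; an untested sketch along the same lines:

exited_segments = [
    when(col(f"segmentMembership.ups.{c}.status") == lit("exited"), lit(c))
    for c in segment_names
]

dfx_exited = df.withColumn("segment_id", array(*exited_segments)) \
    .withColumn("segment_id", expr("filter(segment_id, x -> x is not null)")) \
    .select(
        col("_aepgdcdevenablement2.emailId.address"),
        col("_aepgdcdevenablement2.identities.customerid"),
        col("_aepgdcdevenablement2.person.name.firstName"),
        col("segment_id")
    )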
I am relatively new to MongoDB in Python, so kindly help.
I have created a collection called waste:
class Waste(Document):
    meta = {'collection': 'Waste'}
    item_id = IntField(required=True)
    date_time_record = DateTimeField(default=datetime.utcnow)
    waste_id = IntField(unique=True, required=True)
    weight = FloatField(required=True)
I want to do a range query for a given start and end date:
I have tried the following query:
start = datetime(start_year, start_month, start_day)
end = datetime(end_year, end_month, end_day)
kwargs['date_time_record'] = {'$lte': end, '$gte': start}
reports = Waste.objects(**kwargs).get()
But I keep getting the error: DoesNotExist: Waste matching query does not exist.
The date values being sent are:
{
"start_year": 2020,
"start_month" : 5,
"start_day" : 10,
"end_year": 2020,
"end_month" : 5,
"end_day" : 20
}
When I try to get the first object from the collection, the output in JSON is:
{"_id": {"$oid": "5ebbcf126fdbb9db9f74d24a"}, "item_id": 96387295, "date_time_record": {"$date": 1589366546870}, "waste_id": 24764942, "weight": 32546.0}
A $date field is added and I am unable to decipher the numbers in the date field. But when I look at the data using MongoDB Compass it looks just fine.
There exists a record in the given date range, so I am unable to understand where I am going wrong.
I got this working by using Q; the query I used is:
reports = Waste.objects((Q(date_time_record__gte=start) & Q(date_time_record__lte=end)))
The response is:
[{"_id": {"$oid": "5ebbcf126fdbb9db9f74d24a"}, "item_id": 96387295, "date_time_record": {"$date": 1589366546870}, "waste_id": 24764942, "weight": 32546.0}]
I am trying to build a list/dict that will be converted to JSON later on. I am trying to write the code that builds and populates the multiple levels of the JSON format I ultimately need. I am having an issue wrapping my head around this. Thank you for the help.
What I ultimately need -> Populate this list/dict:
dataset_permission_json = []
with this format:
{
  "projects": [
    {
      "project": "test-project-1",
      "datasets": [
        {
          "dataset": "testing1",
          "permissions": [
            {
              "role": "READER",
              "google_group": "testing1@test.com"
            }
          ]
        },
        {
          "dataset": "testing2",
          "permissions": [
            {
              "role": "OWNER",
              "google_group": "testing2@test.com"
            }
          ]
        },
        {
          "dataset": "testing3",
          "permissions": [
            {
              "role": "READER",
              "google_group": "testing3@test.com"
            }
          ]
        },
        {
          "dataset": "testing4",
          "permissions": [
            {
              "role": "WRITER",
              "google_group": "testing4@test.com"
            }
          ]
        }
      ]
    }
  ]
}
I have multiple for loops that successfully print out the information I am pulling from an external API, but I need to be able to enter that data into the list/dict. The dynamic values I am trying to input are:
'project' i.e. test-project-1
'dataset' i.e. testing1
'role' i.e. READER
'google_group' i.e. testing1#test.com
I have tried things like:
dataset_permission_json.update({'project': project})
but cannot figure out how not to overwrite the data during the multiple for loops.
for project in projects:
    print(project)  ## Need to add this variable to 'projects'
    for bq_group in bq_groups:
        delegated_credentials = credentials.create_delegated(bq_group)
        http_auth = delegated_credentials.authorize(Http())
        list_datasets_in_project = bigquery_service.datasets().list(projectId=project).execute()
        datasets = list_datasets_in_project.get('datasets', [])
        for dataset in datasets:
            print(dataset['datasetReference']['datasetId'])  ## Add the dataset to 'datasets' under the project
            get_dataset_permissions_result = bigquery_service.datasets().get(projectId=project, datasetId=dataset['datasetReference']['datasetId']).execute()
            dataset_permissions = get_dataset_permissions_result.get('access', [])
            ### ADD THE NEXT LEVEL 'permissions' level here?
            for dataset_permission in dataset_permissions:
                if 'groupByEmail' in dataset_permission:
                    if bq_group in dataset_permission['groupByEmail']:
                        print(dataset['datasetReference']['datasetId'], dataset_permission['groupByEmail'])  ## Add to each dataset
I appreciate the help.
EDIT: Updated Progress
OK, I have created the nested structure that I was looking for with the help of Stack Overflow.
Things are great except for the last part. I am trying to append the role & group to each 'permissions' nest, but after everything runs the data is only appended to the last 'permissions' nest in the JSON structure. It seems like it is overwriting itself during the for loop. Thoughts?
Updated for loop:
for project in projects:
    for bq_group in bq_groups:
        delegated_credentials = credentials.create_delegated(bq_group)
        http_auth = delegated_credentials.authorize(Http())
        list_datasets_in_project = bigquery_service.datasets().list(projectId=project).execute()
        datasets = list_datasets_in_project.get('datasets', [])
        for dataset in datasets:
            get_dataset_permissions_result = bigquery_service.datasets().get(projectId=project, datasetId=dataset['datasetReference']['datasetId']).execute()
            dataset_permissions = get_dataset_permissions_result.get('access', [])
            for dataset_permission in dataset_permissions:
                if 'groupByEmail' in dataset_permission:
                    if bq_group in dataset_permission['groupByEmail']:
                        dataset_permission_json['projects'][project]['datasets'][dataset['datasetReference']['datasetId']]['permissions']
                        permission = {'group': dataset_permission['groupByEmail'], 'role': dataset_permission['role']}
                        dataset_permission_json['permissions'] = permission
UPDATE: Solved.
dataset_permission_json['projects'][project]['datasets'][dataset['datasetReference']['datasetId']]['permissions']
permission = {'group': dataset_permission['groupByEmail'],'role': dataset_permission['role']}
dataset_permission_json['projects'][project]['datasets'][dataset['datasetReference']['datasetId']]['permissions'] = permission
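For reference, this kind of nesting can also be built up without pre-creating every level by leaning on dict.setdefault. A rough sketch of the idea (project_id, dataset_id, group and role are placeholders for the values pulled from the BigQuery API above):

dataset_permission_json = {'projects': {}}

# Placeholder values standing in for what the API calls return.
project_id, dataset_id = 'test-project-1', 'testing1'
group, role = 'testing1@test.com', 'READER'

# setdefault creates the nested container the first time a key is seen,
# and returns the existing one afterwards, so nothing gets overwritten.
project_entry = dataset_permission_json['projects'].setdefault(project_id, {'datasets': {}})
dataset_entry = project_entry['datasets'].setdefault(dataset_id, {'permissions': []})
dataset_entry['permissions'].append({'google_group': group, 'role': role})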
I'm using IbPy to read current orders. The response messages which come back to be processed with EWrapper methods have some attributes which appear to be of the wrong type.
To start, here is my handler for Order-related messages. It is intended to catch all messages due to having called reqAllOpenOrders().
from ib.opt import ibConnection, message
from ib.ext.Contract import Contract
from ib.ext.Order import Order
from ib.ext.OrderState import OrderState

_order_resp = dict(openOrderEnd=False, openOrder=[], openStatus=[])

def order_handler(msg):
    """ Update our global Order data response dict
    """
    global _order_resp
    if msg.typeName in ['openStatus', 'openOrder']:
        d = dict()
        for i in msg.items():
            if isinstance(i[1], (Contract, Order, OrderState)):
                d[i[0]] = i[1].__dict__
            else:
                d[i[0]] = i[1]
        _order_resp[msg.typeName].append(d.copy())
    elif msg.typeName == 'openOrderEnd':
        _order_resp['openOrderEnd'] = True
    log.info('ORDER: {})'.format(msg))
In the above code, I'm loading all the objects and their attributes to a dict which is then appended to lists within _order_resp.
The log output lines show healthy interaction with IB:
25-Jan-16 14:57:04 INFO ORDER: <openOrder orderId=1, contract=<ib.ext.Contract.Contract object at 0x102a98150>, order=<ib.ext.Order.Order object at 0x102a98210>, orderState=<ib.ext.OrderState.OrderState object at 0x102a98350>>)
25-Jan-16 14:57:04 INFO ORDER: <orderStatus orderId=1, status=PreSubmitted, filled=0, remaining=100, avgFillPrice=0.0, permId=1114012437, parentId=0, lastFillPrice=0.0, clientId=0, whyHeld=None>)
25-Jan-16 14:57:04 INFO ORDER: <openOrderEnd>)
But when looking at the data put into the _order_resp dict, it looks like some numbers are off:
{
    "contract": {
        "m_comboLegsDescrip": null,
        "m_conId": 265598,
        "m_currency": "USD",
        "m_exchange": "SMART",
        ...
    },
    "order": {
        "m_account": "DU12345",
        "m_action": "SELL",
        "m_activeStartTime": "",
        "m_activeStopTime": "",
        "m_algoStrategy": null,
        "m_allOrNone": false,
        "m_auctionStrategy": 0,
        "m_auxPrice": 0.0,
        "m_basisPoints": 9223372036854775807,
        "m_basisPointsType": 9223372036854775807,
        ...
    },
    "orderId": 1,
    "orderState": {
        "m_commission": 9223372036854775807,
        "m_commissionCurrency": null,
        "m_equityWithLoan": "1.7976931348623157E308",
        "m_initMargin": "1.7976931348623157E308",
        "m_maintMargin": "1.7976931348623157E308",
        "m_maxCommission": 9223372036854775807,
        "m_minCommission": 9223372036854775807,
        ...
    }
}
],
"openOrderEnd": true,
In the source code, we see that m_maxCommission is a float(), yet the value looks like an int, and is much larger than most commissions people like paying.
Some other keys like m_equityWithLoan have string type values, but the source code says that's correct.
How do I fix the case where I'm getting large ints instead of floats? Is it possible to read the value from memory and reinterpret it as a float? Is this an Interactive Brokers API problem?
I am trying to run the following query:
data = {
    'user_id': 1,
    'text': 'Lorem ipsum',
    '$inc': {'count': 1},
    '$set': {'updated': datetime.now()},
}
self.db.collection('collection').update({'user_id':1}, data, upsert=True)
but the two '$' queries cause it to fail. Is it possible to do this within one statement?
First of all, when you ask a question like this it's very helpful to add information on why it's failing (e.g. copy the error).
Your query fails because you're mixing $ operators with document overrides. You should use the $set operator for the user_id and text fields as well (although the user_id part in your update is irrelevant in this example).
So the query becomes the following (shown here in mongo shell syntax; convert it to the equivalent PyMongo call):
db.test.update({user_id:1},
{$set:{text:"Lorem ipsum", updated:new Date()}, $inc:{count:1}},
true,
false)
I've removed the user_id in the update because that isn't necessary. If the document exists this value will already be 1. If it doesn't exist the upsert will copy the query part of your update into the new document.
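For reference, a rough PyMongo equivalent might look like this (a sketch assuming PyMongo 3+, where update_one replaces the deprecated update method, and a collection named test):

from datetime import datetime

db.test.update_one(
    {'user_id': 1},
    {
        '$set': {'text': 'Lorem ipsum', 'updated': datetime.now()},
        '$inc': {'count': 1},
    },
    upsert=True,
)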
If you're trying to do the following:
If the doc doesn't exist, insert a new doc.
If it exists, then only increment one field.
Then you can use a combination of $setOnInsert and $inc. In the songs example below: if the song exists, $setOnInsert does nothing and $inc increments "listened"; if the song doesn't exist, $setOnInsert creates a new doc with the fields "songId" and "songName", and $inc creates "listened" and sets it to 1.
let songsSchema = new mongoose.Schema({
    songId: String,
    songName: String,
    listened: Number
})

let Song = mongoose.model('Song', songsSchema);

let saveSong = (song) => {
    return Song.updateOne(
        {songId: song.songId},
        {
            $inc: {listened: 1},
            $setOnInsert: {
                songId: song.songId,
                songName: song.songName,
            }
        },
        {upsert: true}
    )
    .then((savedSong) => {
        return savedSong;
    })
    .catch((err) => {
        console.log('ERROR SAVING SONG IN DB', err);
    })
}
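Since the original question uses PyMongo rather than Mongoose, the same $setOnInsert/$inc pattern there might look like this (a sketch; songs is a hypothetical collection and song a plain dict):

def save_song(db, song):
    # If the song exists, only $inc runs; on insert, $setOnInsert fills in
    # the static fields and $inc creates 'listened' with the value 1.
    return db.songs.update_one(
        {'songId': song['songId']},
        {
            '$inc': {'listened': 1},
            '$setOnInsert': {
                'songId': song['songId'],
                'songName': song['songName'],
            },
        },
        upsert=True,
    )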