How to parse VirtualMachinePaged object using Azure SDK for Python? - python

I am trying to get the list of VMs in a resource group using the Azure SDK for Python. I configured my Visual Studio Code with all the required Azure tools. I created a function and used the code below to get the list of VMs.
import os
import random
import string
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient
from azure.mgmt.network import NetworkManagementClient
from azure.mgmt.resource import ResourceManagementClient

def main():
    SUBSCRIPTION_ID = os.environ.get("SUBSCRIPTION_ID", None)
    GROUP_NAME = "testgroupx"
    VIRTUAL_MACHINE_NAME = "virtualmachinex"
    SUBNET_NAME = "subnetx"
    INTERFACE_NAME = "interfacex"
    NETWORK_NAME = "networknamex"
    VIRTUAL_MACHINE_EXTENSION_NAME = "virtualmachineextensionx"

    resource_client = ResourceManagementClient(
        credential=DefaultAzureCredential(),
        subscription_id=SUBSCRIPTION_ID
    )
    network_client = NetworkManagementClient(
        credential=DefaultAzureCredential(),
        subscription_id=SUBSCRIPTION_ID
    )
    compute_client = ComputeManagementClient(
        credential=DefaultAzureCredential(),
        subscription_id=SUBSCRIPTION_ID
    )

    vm = compute_client.virtual_machines.list(
        'RGName'
    )
    print("Get virtual machine:\n{}".format(vm))
When I check the logs, I see the following as the print output:
<azure.mgmt.compute.v2019_12_01.models._paged_models.VirtualMachinePaged object at 0x0000024584F92EC8>
I am really trying to get the actual objects, but I am not sure how to parse this. Any ideas?

Since it returns a collection, you need to iterate over it with a for loop. You can do something like this:
for vm in compute_client.virtual_machines.list('RGName'):
    print("\tVM: {}".format(vm.name))

VirtualMachinePaged contains a collection of objects of type VirtualMachine. You can see the source code of that class here: https://github.com/Azure/azure-sdk-for-python/blob/master/sdk/compute/azure-mgmt-compute/azure/mgmt/compute/v2019_12_01/models/_models.py.
From that link, here is the list of attributes:
{
'id': {'key': 'id', 'type': 'str'},
'name': {'key': 'name', 'type': 'str'},
'type': {'key': 'type', 'type': 'str'},
'location': {'key': 'location', 'type': 'str'},
'tags': {'key': 'tags', 'type': '{str}'},
'plan': {'key': 'plan', 'type': 'Plan'},
'hardware_profile': {'key': 'properties.hardwareProfile', 'type': 'HardwareProfile'},
'storage_profile': {'key': 'properties.storageProfile', 'type': 'StorageProfile'},
'additional_capabilities': {'key': 'properties.additionalCapabilities', 'type': 'AdditionalCapabilities'},
'os_profile': {'key': 'properties.osProfile', 'type': 'OSProfile'},
'network_profile': {'key': 'properties.networkProfile', 'type': 'NetworkProfile'},
'diagnostics_profile': {'key': 'properties.diagnosticsProfile', 'type': 'DiagnosticsProfile'},
'availability_set': {'key': 'properties.availabilitySet', 'type': 'SubResource'},
'virtual_machine_scale_set': {'key': 'properties.virtualMachineScaleSet', 'type': 'SubResource'},
'proximity_placement_group': {'key': 'properties.proximityPlacementGroup', 'type': 'SubResource'},
'priority': {'key': 'properties.priority', 'type': 'str'},
'eviction_policy': {'key': 'properties.evictionPolicy', 'type': 'str'},
'billing_profile': {'key': 'properties.billingProfile', 'type': 'BillingProfile'},
'host': {'key': 'properties.host', 'type': 'SubResource'},
'provisioning_state': {'key': 'properties.provisioningState', 'type': 'str'},
'instance_view': {'key': 'properties.instanceView', 'type': 'VirtualMachineInstanceView'},
'license_type': {'key': 'properties.licenseType', 'type': 'str'},
'vm_id': {'key': 'properties.vmId', 'type': 'str'},
'resources': {'key': 'resources', 'type': '[VirtualMachineExtension]'},
'identity': {'key': 'identity', 'type': 'VirtualMachineIdentity'},
'zones': {'key': 'zones', 'type': '[str]'},
}
For Python 3, the code can be found here: https://github.com/Azure/azure-sdk-for-python/blob/master/sdk/compute/azure-mgmt-compute/azure/mgmt/compute/v2019_12_01/models/_models_py3.py.
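For example, a minimal sketch that iterates the paged result and reads a few of these attributes (assuming the compute_client from the question and your own resource group name in place of 'RGName'):
for vm in compute_client.virtual_machines.list('RGName'):
    # Each item is a VirtualMachine model exposing the attributes listed above.
    print(vm.name, vm.location, vm.vm_id)
    print(vm.hardware_profile.vm_size)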

Related

Changing schema of avro file when writing to it in append mode

I'm looking for a way to modify the schema of an Avro file in Python. Take the following example: using the fastavro package, first write out some initial records with the corresponding schema:
from fastavro import writer, parse_schema

schema = {
    'name': 'test',
    'type': 'record',
    'fields': [
        {'name': 'id', 'type': 'int'},
        {'name': 'val', 'type': 'long'},
    ],
}
records = [
    {u'id': 1, u'val': 0.2},
    {u'id': 2, u'val': 3.1},
]
with open('test.avro', 'wb') as f:
    writer(f, parse_schema(schema), records)
Uhoh, I've got some more records, but they contain None values. I'd like to append these records to the avro file, and modify my schema accordingly:
more_records = [
    {u'id': 3, u'val': 1.5},
    {u'id': 2, u'val': None},
]
schema['fields'][1]['type'] = ['long', 'null']
with open('test.avro', 'a+b') as f:
    writer(f, parse_schema(schema), more_records)
Instead of overwriting the schema, this results in an error:
ValueError: Provided schema {'type': 'record', 'name': 'test', 'fields': [{'name': 'id', 'type': 'int'}, {'name': 'val', 'type': ['long', 'null']}], '__fastavro_parsed': True, '__named_schemas': {'test': {'type': 'record', 'name': 'test', 'fields': [{'name': 'id', 'type': 'int'}, {'name': 'val', 'type': ['long', 'null']}]}}} does not match file writer_schema {'type': 'record', 'name': 'test', 'fields': [{'name': 'id', 'type': 'int'}, {'name': 'val', 'type': 'long'}], '__fastavro_parsed': True, '__named_schemas': {'test': {'type': 'record', 'name': 'test', 'fields': [{'name': 'id', 'type': 'int'}, {'name': 'val', 'type': 'long'}]}}}
Is there a workaround for this? The fastavro docs for this suggest it's not possible, but I'm hoping someone knows of a way!
Cheers
The append API in fastavro does not currently support this. You could open an issue in that repository and discuss if something like this makes sense.
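One possible workaround, a minimal sketch assuming the file fits in memory, is to read the existing records back with fastavro's reader and rewrite the whole file under the widened schema:
from fastavro import reader, writer, parse_schema

# Read back everything written so far.
with open('test.avro', 'rb') as f:
    existing = list(reader(f))

# Rewrite the file from scratch with the new (nullable) schema.
schema['fields'][1]['type'] = ['long', 'null']
with open('test.avro', 'wb') as f:
    writer(f, parse_schema(schema), existing + more_records)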

How do I get my Jupyter Notebook to stop repeating the code into the output?

How do I fix my Jupyter Notebook so it stops running the code so many times? I should only be getting three outputs, but instead I am getting more than that and I don't know how to fix it. It is also returning data from a previous run, and I am not sure why.
The AnimalShelter.py code
import pymongo
from pymongo import MongoClient
from bson.objectid import ObjectId

class AnimalShelter(object):
    """ CRUD operations for Animal collection in MongoDB """

    def __init__(self, username, password):
        # Initializing the MongoClient. This helps to access the MongoDB databases and collections.
        self.client = MongoClient('mongodb://%s:%s@localhost:45344' % (username, password))
        # where xxxx is your unique port number
        self.database = self.client['AAC']

    # Complete this create method to implement the C in CRUD.
    def create(self, data):
        if data is not None:
            insert = self.database.animals.insert(data)  # data should be a dictionary
        else:
            raise Exception("Nothing to save, because data parameter is empty")

    # Create method to implement the R in CRUD.
    def read(self, searchData):
        if searchData:
            data = self.database.animals.find(searchData, {"_id": False})
        else:
            data = self.database.animals.find({}, {"_id": False})
        return data

    # Create method to implement U in CRUD.
    def update(self, searchData, updateData):
        if searchData is not None:
            result = self.database.animals.update_many(searchData, {"$set": updateData})
        else:
            return "{}"
        return result.raw_result

    # Create method to implement D in CRUD.
    def delete(self, deleteData):
        if deleteData is not None:
            result = self.database.animals.delete_many(deleteData)
        else:
            return "{}"
        return result.raw_result
My ipynb code:
from AnimalShelter import AnimalShelter

a = AnimalShelter("aacuser", "French")
animal_data = [
    {
        "name": "Hades",
        "type": "dog"
    },
    {
        "name": "Fable",
        "type": "cat"
    },
    {
        "name": "Buddy",
        "type": "dog"
    }
]
for i in animal_data:
    a.create(i)

dogs = a.read({"type": "dog"})
for dog in dogs:
    print(dog)

cats = a.read({"type": "cat"})
for cat in cats:
    print(cat)
The output:
{'name': 'bruno', 'type': 'dog'}
{'name': 'sticky', 'type': 'dog'}
{'name': 'bruno', 'type': 'dog'}
{'name': 'sticky', 'type': 'dog'}
{'name': 'bruno', 'type': 'dog'}
{'name': 'sticky', 'type': 'dog'}
{'name': 'bruno', 'type': 'dog'}
{'name': 'sticky', 'type': 'dog'}
{'name': 'bruno', 'type': 'dog'}
{'name': 'sticky', 'type': 'dog'}
{'name': 'bruno', 'type': 'dog'}
{'name': 'sticky', 'type': 'dog'}
{'name': 'bruno', 'type': 'dog'}
{'name': 'sticky', 'type': 'dog'}
{'name': 'bruno', 'type': 'dog'}
{'name': 'sticky', 'type': 'dog'}
{'name': 'bruno', 'type': 'dog'}
{'name': 'sticky', 'type': 'dog'}
{'name': 'bruno', 'type': 'dog'}
{'name': 'sticky', 'type': 'dog'}
{'name': 'bruno', 'type': 'dog'}
{'name': 'sticky', 'type': 'dog'}
{'name': 'Hades', 'type': 'dog'}
{'name': 'Buddy', 'type': 'dog'}
{'name': 'Hades', 'type': 'dog'}
{'name': 'Buddy', 'type': 'dog'}
{'name': 'Hades', 'type': 'dog'}
{'name': 'Buddy', 'type': 'dog'}
{'name': 'missy', 'type': 'cat'}
{'name': 'missy', 'type': 'cat'}
{'name': 'missy', 'type': 'cat'}
{'name': 'missy', 'type': 'cat'}
{'name': 'missy', 'type': 'cat'}
{'name': 'missy', 'type': 'cat'}
{'name': 'missy', 'type': 'cat'}
{'name': 'missy', 'type': 'cat'}
{'name': 'missy', 'type': 'cat'}
{'name': 'missy', 'type': 'cat'}
{'name': 'missy', 'type': 'cat'}
{'name': 'Fable', 'type': 'cat'}
{'name': 'Fable', 'type': 'cat'}
{'name': 'Fable', 'type': 'cat'}
I have tried restarting everything, made a new notebook to work out of, cleared the outputs, reset the kernel, and ran the program again, and each time it adds more listings instead of showing just the three I expect. Is this a bug or did I do something wrong?
MongoDB is a persistent database, so each create() adds more data into the database.
You already have a delete method so you could add something like:
a = AnimalShelter("aacuser","French")
a.delete({"type":"dog"})
a.delete({"type":"cat"})
near the start of your code to delete any existing data before you start.
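Alternatively, a minimal sketch that clears the whole collection in one call (assuming direct access to the underlying pymongo collection through the class's database attribute is acceptable):
a = AnimalShelter("aacuser", "French")
# Remove every document so each notebook run starts from an empty collection.
a.database.animals.delete_many({})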

Web scraping table from UniProt database

I have a list of UniProt IDs and would like to use BeautifulSoup to scrape a table containing the structure information. The URL I am using is as follows: https://www.uniprot.org/uniprot/P03496, with accession "P03496".
A snippet of the html code is as follows.
<div class="main-aside">
<div class="content entry_view_content up_entry swissprot">
<div class="section" id="structure">
<protvista-uniprot-structure accession="P03468">
<div class="protvista-uniprot-structure">
<div class="class=" protvista-uniprot-structure__table">
<protvista-datatable class="feature">
<table>...</table>
</protvista-datatable>
</div>
</div>
</protvista-uniprot-structure>
</div>
</div>
</div>
The information I require is contained between the <table>...</table> tags.
I tried
from bs4 import BeautifulSoup
import requests
url='https://www.uniprot.org/uniprot/P03468'
r=requests.get(url)
url=r.content
soup = BeautifulSoup(url,'html.parser')
soup.find("protvista-datatable", {"class": "feature"})
print(soup)
The content is provided dynamically and is not contained in your soup if you take a deeper look. You do not need BeautifulSoup to get the data your table is based on; simply use their API / REST interface to get structured data as JSON:
import requests
url='https://rest.uniprot.org/uniprot/P03468'
## fetch the json response
data = requests.get(url).json()
## pick needed data e.g.
data['uniProtKBCrossReferences']
Output
[{'database': 'EMBL',
'id': 'J02146',
'properties': [{'key': 'ProteinId', 'value': 'AAA43412.1'},
{'key': 'Status', 'value': '-'},
{'key': 'MoleculeType', 'value': 'Genomic_RNA'}]},
{'database': 'EMBL',
'id': 'AF389120',
'properties': [{'key': 'ProteinId', 'value': 'AAM75160.1'},
{'key': 'Status', 'value': '-'},
{'key': 'MoleculeType', 'value': 'Genomic_RNA'}]},
{'database': 'EMBL',
'id': 'EF467823',
'properties': [{'key': 'ProteinId', 'value': 'ABO21711.1'},
{'key': 'Status', 'value': '-'},
{'key': 'MoleculeType', 'value': 'Genomic_RNA'}]},
{'database': 'EMBL',
'id': 'CY009446',
'properties': [{'key': 'ProteinId', 'value': 'ABD77678.1'},
{'key': 'Status', 'value': '-'},
{'key': 'MoleculeType', 'value': 'Genomic_RNA'}]},
{'database': 'EMBL',
'id': 'K01031',
'properties': [{'key': 'ProteinId', 'value': 'AAA43415.1'},
{'key': 'Status', 'value': '-'},
{'key': 'MoleculeType', 'value': 'Genomic_RNA'}]},
{'database': 'RefSeq',
'id': 'NP_040981.1',
'properties': [{'key': 'NucleotideSequenceId', 'value': 'NC_002018.1'}]},
{'database': 'PDB',
'id': '6WZY',
'properties': [{'key': 'Method', 'value': 'X-ray'},
{'key': 'Resolution', 'value': '1.50 A'},
{'key': 'Chains', 'value': 'C=181-190'}]},...]
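Since the question is about the structure table, a minimal sketch that keeps only the PDB cross-references from that list (assuming the data dictionary from the snippet above):
# Filter the cross-references down to PDB structure entries.
pdb_entries = [ref for ref in data['uniProtKBCrossReferences'] if ref['database'] == 'PDB']
for entry in pdb_entries:
    props = {p['key']: p['value'] for p in entry['properties']}
    print(entry['id'], props.get('Method'), props.get('Resolution'), props.get('Chains'))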
There's a Python package, Unipressed, by Michael Milton (#multimeric), that allows you to programmatically query UniProt's new REST API.
Example:
from unipressed import UniprotkbClient
UniprotkbClient.fetch_one("P03468")["uniProtKBCrossReferences"]
Output: the same cross-reference list shown above.
See more examples of using Unipressed to access Uniprot's new REST API here in my reply to Biostar's post 'Accessing UNIPROT using REST API'. See using Unipressed for ID mapping here and here.

Convert nested dictionary within JSON from a string

I have JSON data that I loaded, and it appears to have a somewhat messy data structure: the nested dictionaries are wrapped in single quotes and recognized as strings rather than dictionaries I can loop through. What is the best way to drop the single quotes from the key-value property ('value')?
Provided below is an example of the structure:
for val in json_data:
    print(val)
{'id': 'status6',
'title': 'Estimation',
'text': '> 2 days',
'type': 'color',
'value': '{"index":14,"post_id":null,"changed_at":"2020-06-12T09:04:58.659Z"}',
'name': 'Internal: online course'},
{'id': 'date',
'title': 'Deadline',
'text': '2020-06-26',
'type': 'date',
'value': '{"date":"2020-06-26","changed_at":"2020-06-12T11:33:37.195Z"}',
'name': 'Internal: online course'},
{'id': 'tags',
'title': 'Tags',
'text': 'Internal',
'type': 'tag',
'value': '{"tag_ids":[3223513]}',
'name': 'Internal: online course'},
If I add a nested loop targeting ['value'], it loops by character rather than by key-value pair in the dictionary.
Use json.loads to convert the string to a dict:
import json

json_data = [
    {'id': 'status6',
     'title': 'Estimation',
     'text': '> 2 days',
     'type': 'color',
     'value': '{"index":14,"post_id":null,"changed_at":"2020-06-12T09:04:58.659Z"}',
     'name': 'Internal: online course'},
    {'id': 'date',
     'title': 'Deadline',
     'text': '2020-06-26',
     'type': 'date',
     'value': '{"date":"2020-06-26","changed_at":"2020-06-12T11:33:37.195Z"}',
     'name': 'Internal: online course'},
    {'id': 'tags',
     'title': 'Tags',
     'text': 'Internal',
     'type': 'tag',
     'value': '{"tag_ids":[3223513]}',
     'name': 'Internal: online course'}
]

# the result is a Python dictionary:
for val in json_data:
    print(json.loads(val['value']))
This should work!
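If you want to keep working with the outer dictionaries afterwards, a minimal sketch that replaces each 'value' string with its parsed dictionary in place (a hypothetical follow-up using the same json_data):
for val in json_data:
    val['value'] = json.loads(val['value'])

# Nested keys are now directly accessible, e.g.:
print(json_data[0]['value']['changed_at'])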

airflow GCS to BQ operator fails when JSON is the source format

I have a GoogleCloudStorageToBigQueryOperator operator running on Airflow in a DAG. It works perfectly when loading CSV files. I am now trying to ingest a JSON file, and I'm receiving errors such as:
skipLeadingRows is not a valid src_fmt_configs for type NEWLINE_DELIMITED_JSON
The weird thing is that I'm not passing skipLeadingRows in my call, as shown below:
load_Users_to_GBQ = GoogleCloudStorageToBigQueryOperator(
    task_id='Table1_GCS_to_GBQ',
    bucket='bucket1',
    source_objects=['table*.json'],
    source_format='NEWLINE_DELIMITED_JSON',
    destination_project_dataset_table='DB.table1',
    autodetect=False,
    schema_fields=[
        {'name': 'fieldid', 'type': 'integer', 'mode': 'NULLABLE'},
        {'name': 'filed2', 'type': 'integer', 'mode': 'NULLABLE'},
        {'name': 'field3', 'type': 'string', 'mode': 'NULLABLE'},
        {'name': 'field4', 'type': 'string', 'mode': 'NULLABLE'},
        {'name': 'field5', 'type': 'string', 'mode': 'NULLABLE'}
    ],
    write_disposition='WRITE_TRUNCATE',
    google_cloud_storage_conn_id='Conn1',
    bigquery_conn_id='Conn1',
    dag=dag)
what am I missing?
Thanks
This has been fixed in this pull request for Airflow version >= 1.10.7.
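If upgrading is not an option, one possible workaround, a minimal sketch (not the Airflow fix itself; it assumes the google-cloud-bigquery client library, and the bucket URI and table id below are placeholders), is to run the load job directly from a PythonOperator callable:
from google.cloud import bigquery

def load_json_to_bq():
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        write_disposition='WRITE_TRUNCATE',
        schema=[
            bigquery.SchemaField('fieldid', 'INTEGER'),
            bigquery.SchemaField('field3', 'STRING'),
        ],
    )
    # Replace the gs:// URI and table id with your own bucket and dataset.
    load_job = client.load_table_from_uri(
        'gs://bucket1/table1.json', 'project.DB.table1', job_config=job_config)
    load_job.result()  # wait for the load job to finish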
