OrientDB: list databases with Python

I am using Python to get a list of databases that are more than 30 days old. So far I have been able to get the list of databases, and this is my code:
import pyorient

def list_orient_databases(name):
    # Use a breakpoint in the code line below to debug your script.
    print(f'{name}')
    client = pyorient.OrientDB("10.121.3.55", 2525)
    session_id = client.connect("admin", "admin")
    db_names = client.db_list().__getattr__('databases')
    db_count = 0
    for db_name in db_names:
        print(db_name)
How can I adjust the code to get the list of databases that are 30 days old or older? Thanks for the help.

If you are able to somehow pull a creation-date value for each database, you could use something like this with the timedelta object:
from datetime import datetime, timedelta

d = datetime.today() - timedelta(days=30)
# 'X' would be whatever the database creation-date parameter is called
for db_name in db_names:
    if db_name['X'] <= d:
        print(db_name)
You'll have to adjust this as needed, but it gives you the general idea.
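If the creation date comes back as a string rather than a datetime object, it has to be parsed before the comparison. A minimal sketch, assuming an ISO-like date format (the format and example value below are assumptions, not a confirmed OrientDB field):
from datetime import datetime, timedelta

cutoff = datetime.today() - timedelta(days=30)
# Hypothetical string value; replace with whatever the real field contains.
created = datetime.strptime("2021-01-05 17:00:30", "%Y-%m-%d %H:%M:%S")
if created <= cutoff:
    print("older than 30 days")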

Related

How to get the latest database collection from MongoDB with MongoEngine

I am very new to MongoDB. I create a database within a loop: every 2 hours, I get data from some sources, create a data collection with MongoEngine, and name each collection based on the creation time (for example 05_01_2021_17_00_30).
Now, in another Python script, I want to get the latest database. How can I call the latest database collection without knowing its name?
I saw some guidelines on Stack Overflow, but the code is old and no longer works. Thanks, guys.
I came up with this answer:
In mongo_setup.py: when I want to create a database, it is named after the time of creation and the name is saved in a text file.
import mongoengine
import datetime

def global_init():
    nownow = datetime.datetime.now()
    Update_file_name = str(nownow.strftime("%d_%m_%Y_%H_%M_%S"))
    # To hand the name of the latest database over to Django, export it to a
    # text file; from there, Django will know which database is the latest.
    Updated_txt = open('.\\Latest database to read for Django.txt', 'w+')
    Updated_txt.write(Update_file_name)
    Updated_txt.close()
    mongoengine.register_connection(alias='core', name=Update_file_name)
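For context, a Document bound to the 'core' alias registered above would look something like this (the class name and field are illustrative, not taken from the original post):
import mongoengine

class Reading(mongoengine.Document):
    value = mongoengine.FloatField()
    meta = {'db_alias': 'core'}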
In Django views.py: we open the text file and read the latest database's name:
from pymongo import MongoClient

database_name_text_file = 'directory of the text file...'
db_name_file = open(database_name_text_file, 'r')
db_name = db_name_file.read()

# MongoDB database
myclient = MongoClient(port=27017)
mydatabase = myclient[db_name]
classagg = mydatabase['aggregation__class']
database_text = classagg.find()
for i in database_text:
    ....
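An alternative sketch, not part of the original answer: because every database is named with the "%d_%m_%Y_%H_%M_%S" pattern used in global_init, the latest one can be picked by parsing the names directly instead of going through a text file (assuming all non-system databases follow that pattern):
from datetime import datetime
from pymongo import MongoClient

client = MongoClient(port=27017)
names = [n for n in client.list_database_names()
         if n not in ('admin', 'config', 'local')]
latest = max(names, key=lambda n: datetime.strptime(n, "%d_%m_%Y_%H_%M_%S"))
mydatabase = client[latest]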

Reading a MySQL query with Python where output is empty

I'm trying to connect MySQL with Python in order to automate some reports. For now, I'm just testing the connection. It seems to be working, but here comes the problem: the output from my Python code is different from the one that I get in MySQL.
Here I attach the query used and the output that I can find in MySQL:
The testing query for the Python connection:
SELECT accountID
FROM Account
WHERE accountID in ('340','339','343');
The output from MySQL (using DBeaver). For this test, the column chosen contains integers:
accountID
1 339
2 340
3 343
Here I attach the actual output from my Python code:
today:
20200811
Will return true if the connection works:
True
Empty DataFrame
Columns: [accountID]
Index: []
In order to help you understand the problem, please find my Python code below:
import pandas as pd
import json
import pymysql
import paramiko
from datetime import date, time

tiempo_inicial = time()
today = date.today()
today = today.strftime("%Y%m%d")
print('today:')
print(today)

#from paramiko import SSHClient
from sshtunnel import SSHTunnelForwarder

**(part that contains all the connection information, due to data protection this part can't be shared)**

print('will return true if connection works:')
print(conn.open)

query = '''SELECT accountId
FROM Account
WHERE accountID in ('340','339','343');'''
data = pd.read_sql_query(query, conn)
print(data)
conn.close()
From my point of view this output doesn't make sense, as the connection is working and the query was previously tested in MySQL with a positive output. I tried with other columns that contain names or dates, and the result doesn't change.
Any idea why I'm getting this "Empty DataFrame" output?
Thanks
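For reference, a minimal sketch of what the redacted SSH-tunnel-plus-pymysql connection block typically looks like; every host, user, key path and database name below is a placeholder, not the asker's real configuration:
server = SSHTunnelForwarder(
    ('ssh.example.com', 22),
    ssh_username='ssh_user',
    ssh_pkey='/path/to/private_key',
    remote_bind_address=('127.0.0.1', 3306),
)
server.start()
# Connect to MySQL through the local end of the tunnel.
conn = pymysql.connect(
    host='127.0.0.1',
    port=server.local_bind_port,
    user='db_user',
    password='db_password',
    database='my_database',
)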

Get the last modified date of tables using the BigQuery tables GET API

I am trying to get the list of tables and their last_modified_date using the BigQuery REST API.
In the BigQuery API explorer I get all the fields correctly, but when I use the API from Python code it returns 'None' for the modified date.
This is the code written for this in Python:
from google.cloud import bigquery

client = bigquery.Client(project='temp')
datasets = list(client.list_datasets())
for dataset in datasets:
    print dataset.dataset_id
for dataset in datasets:
    for table in dataset.list_tables():
        print table.table_id
        print table.created
        print table.modified
In this code I get the created date correctly, but the modified date is 'None' for all the tables.
Not quite sure which version of the API you are using, but I suspect the latest versions do not have the method dataset.list_tables().
Still, this is one way of getting the last modified field; see if this works for you (or gives you some idea of how to get this data):
from google.cloud import bigquery

client = bigquery.Client.from_service_account_json('/key.json')
dataset_list = list(client.list_datasets())
for dataset_item in dataset_list:
    dataset = client.get_dataset(dataset_item.reference)
    tables_list = list(client.list_tables(dataset))
    for table_item in tables_list:
        table = client.get_table(table_item.reference)
        print "Table {} last modified: {}".format(
            table.table_id, table.modified)
If you want to get the last modified time from only one table:
from google.cloud import bigquery

def get_last_bq_update(project, dataset, table_name):
    client = bigquery.Client.from_service_account_json('/key.json')
    table_id = f"{project}.{dataset}.{table_name}"
    table = client.get_table(table_id)
    print(table.modified)
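Since the question title mentions the tables GET API, here is a hedged sketch of calling that REST endpoint directly; the project, dataset and table names are placeholders, and lastModifiedTime comes back in the table resource as epoch milliseconds:
import datetime

import google.auth
from google.auth.transport.requests import AuthorizedSession

credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/bigquery.readonly"])
session = AuthorizedSession(credentials)
url = ("https://bigquery.googleapis.com/bigquery/v2/"
       "projects/my-project/datasets/my_dataset/tables/my_table")
table = session.get(url).json()
modified = datetime.datetime.fromtimestamp(
    int(table["lastModifiedTime"]) / 1000.0, tz=datetime.timezone.utc)
print(modified)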

Accessing timestamps using Python in InfluxDB

I'm connecting to an InfluxDB database using Python. Using the built-in DataFrame tools I am successfully accessing data and am able to do everything I'd like, except that I can't access the timestamp values. For example:
import sys
from influxdb import DataFrameClient

reload(sys)
sys.setdefaultencoding('utf-8')

user = 'reader'
password = 'oddstringtoconfusebadguys'
dbname = 'autoweights'
host = '55.777.244.112'
protocol = 'line'
port = 8086

client = DataFrameClient(host, port=8086, username=user, password=password,
                         database=dbname, verify_ssl=False, ssl=True)
results = client.query("select * from measurementname")
df = results['measurementname']
for index, row in df.iterrows():
    print row
The results look like this:
Name: 2017-11-14 22:11:23.534395882+00:00, dtype: object
host C4:27:EB:D7:D9:70
value 327
I can easily access row['host'] and row['value']. The date/time stamp is obviously important, but try as I might I can't find a way to get its value.
You can get the timestamp using the index, not the row parameter:
for index, row in df.iterrows():
    print(index)
    print(row)
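If you would rather have the timestamps as an ordinary column, a small sketch (assuming df is the DataFrame returned above and its index is unnamed, so reset_index() exposes it as a column called 'index'):
# Move the DatetimeIndex into a regular 'time' column.
df = df.reset_index().rename(columns={'index': 'time'})
for _, row in df.iterrows():
    print(row['time'], row['host'], row['value'])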
You can also use the Pinform library, an ORM for InfluxDB, to easily get the timestamp, fields and tags.

How can Python observe changes to MongoDB's oplog?

I have multiple Python scripts writing to MongoDB using PyMongo. How can another Python script observe changes to a Mongo query and perform some function when a change occurs? MongoDB is set up with the oplog enabled.
I wrote an incremental backup tool for MongoDB some time ago, in Python. The tool monitors data changes by tailing the oplog. Here is the relevant part of the code.
Updated answer, MongoDB 3.6+
As datdinhquoc cleverly points out in the comments below, for MongoDB 3.6 and up there are Change Streams.
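A minimal Change Streams sketch, not part of the original answer (requires PyMongo 3.6+ and a MongoDB 3.6+ replica set; the database and collection names are placeholders):
from pymongo import MongoClient

client = MongoClient()
# watch() returns a change stream cursor that blocks until a change arrives.
with client.mydb.mycollection.watch() as stream:
    for change in stream:
        print(change)  # insert/update/delete events as dicts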
Updated answer, pymongo 3
from time import sleep
from pymongo import MongoClient, ASCENDING
from pymongo.cursor import CursorType
from pymongo.errors import AutoReconnect

# Time to wait for data or connection.
_SLEEP = 1.0

if __name__ == '__main__':
    oplog = MongoClient().local.oplog.rs
    stamp = oplog.find().sort('$natural', ASCENDING).limit(-1).next()['ts']

    while True:
        kw = {}
        kw['filter'] = {'ts': {'$gt': stamp}}
        kw['cursor_type'] = CursorType.TAILABLE_AWAIT
        kw['oplog_replay'] = True

        cursor = oplog.find(**kw)
        try:
            while cursor.alive:
                for doc in cursor:
                    stamp = doc['ts']
                    print(doc)  # Do something with doc.
                sleep(_SLEEP)
        except AutoReconnect:
            sleep(_SLEEP)
Also see http://api.mongodb.com/python/current/examples/tailable.html.
Original answer, pymongo 2
from time import sleep
from pymongo import MongoClient
from pymongo.cursor import _QUERY_OPTIONS
from pymongo.errors import AutoReconnect
from bson.timestamp import Timestamp

# Tailable cursor options.
_TAIL_OPTS = {'tailable': True, 'await_data': True}
# Time to wait for data or connection.
_SLEEP = 10

if __name__ == '__main__':
    db = MongoClient().local

    while True:
        query = {'ts': {'$gt': Timestamp(some_timestamp, 0)}}  # Replace with your query.
        cursor = db.oplog.rs.find(query, **_TAIL_OPTS)
        cursor.add_option(_QUERY_OPTIONS['oplog_replay'])
        try:
            while cursor.alive:
                try:
                    doc = next(cursor)
                    # Do something with doc.
                except (AutoReconnect, StopIteration):
                    sleep(_SLEEP)
        finally:
            cursor.close()
I ran into this issue today and haven't found an updated answer anywhere.
The Cursor class has changed as of v3.0 and no longer accepts the tailable and await_data arguments. This example will tail the oplog and print the oplog record when it finds a record newer than the last one it found.
# Adapted from the example here: https://jira.mongodb.org/browse/PYTHON-735
# to work with pymongo 3.0
import pymongo
from pymongo.cursor import CursorType

c = pymongo.MongoClient()

# Use this for master/slave.
oplog = c.local.oplog['$main']
# Uncomment this (and comment the line above) for replica sets.
#oplog = c.local.oplog.rs

first = next(oplog.find().sort('$natural', pymongo.DESCENDING).limit(-1))
ts = first['ts']

while True:
    cursor = oplog.find({'ts': {'$gt': ts}},
                        cursor_type=CursorType.TAILABLE_AWAIT,
                        oplog_replay=True)
    while cursor.alive:
        for doc in cursor:
            ts = doc['ts']
            print doc
            # Work with doc here
Query the oplog with a tailable cursor.
It is actually funny, because oplog-monitoring is exactly what the tailable-cursor feature was added for originally. I find it extremely useful for other things as well (e.g. implementing a mongodb-based pubsub, see this post for example), but that was the original purpose.
I had the same issue. I put together this rescommunes/oplog.py. Check comments and see __main__ for an example of how you could use it with your script.
