Get latest free index of DB? - python

How can I access the last DB index?
For example, in my DB I create records with automatically generated names, like news-post1, news-post2, etc. To create the name for a new record, I need to access the latest DB index.
In my case, I need to build image names like the ones above. I already know how to access the file extension, but not the DB index:
from os import path

def generate_image_name(obj, file_data):
    img_extension = path.splitext(file_data)[1]
    img_name = "news-img" + <?db.Index?> + img_extension  # <?db.Index?> is the missing piece

I found exactly what I was searching for.
For example, at the moment I have 2 news records.
>>> News.query.all()[-1]
<News 2>
>>> News.query.all()[-1].id
2
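A minimal sketch tying the two together, assuming a Flask-SQLAlchemy model named News with an integer primary key "id" (note that counting on the highest existing id plus one can collide if records are ever deleted):
from os import path

def generate_image_name(obj, file_data):
    img_extension = path.splitext(file_data)[1]
    last_news = News.query.order_by(News.id.desc()).first()  # newest record, or None if the table is empty
    next_index = (last_news.id + 1) if last_news else 1
    return "news-img" + str(next_index) + img_extension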

Related

Azure Table Storage sync between 2 different storages

I have a list of storage accounts and I would like to copy the exact table content from source_table to destination_table exactly as it is. That means if I add an entry to source_table it should be copied to destination_table, and likewise if I delete an entry from the source I want it deleted from the destination.
So far I have in place this code:
# imports assume the azure-cosmosdb-table SDK; adjust the paths to the Table SDK version you use
from azure.cosmosdb.table.tableservice import TableService
from azure.cosmosdb.table.models import ListGenerator

table_service_out = TableService(account_name="sourcestorageaccount",
                                 account_key="source key")
table_service_in = TableService(account_name="destination storage",
                                account_key="destinationKey")
query_size = 1000

# save data to storage2 and check if there is data left in the current table; if yes, recurse
def queryAndSaveAllDataBySize(source_table_name, target_table_name, resp_data: ListGenerator,
                              table_out: TableService, table_in: TableService, query_size: int):
    for item in resp_data:
        tb_name = source_table_name
        del item.etag
        del item.Timestamp
        print("INSERT data:" + str(item) + " into TABLE:" + tb_name)
        table_in.insert_or_replace_entity(target_table_name, item)
    if resp_data.next_marker:
        data = table_out.query_entities(table_name=source_table_name, num_results=query_size,
                                        marker=resp_data.next_marker)
        queryAndSaveAllDataBySize(source_table_name, target_table_name, data, table_out, table_in, query_size)

tbs_out = table_service_out.list_tables()
print(tbs_out)
for tb in tbs_out:
    table = tb.name
    # create a table with the same name in storage2
    table_service_in.create_table(table_name=table, fail_on_exist=False)
    # first query
    data = table_service_out.query_entities(tb.name, num_results=query_size)
    queryAndSaveAllDataBySize(tb.name, table, data, table_service_out, table_service_in, query_size)
As you can see, this block of code runs just fine: it loops over the source storage account's tables and creates the same tables and their content in the destination storage account. But I am missing the part where I check whether a record has been deleted from the source storage and remove the same record from the destination table.
I hope my question/issue is clear enough; if not, please just ask me for more information.
Thank you so much for any help you can provide.
UPDATE:
The more I think about this, the messier the logic gets.
One solution I thought about and tried is to keep 2 lists storing every single table entity:
Source_table_entries
Destination_table_entries
Once I have populated the lists on each run, I can compare the keys; if a key is present in Destination_table_entries but not in the source list, it gets marked for deletion (a rough sketch of this is shown below).
But this logic only works flawlessly as long as the tables are small; unfortunately some tables contain hundreds of thousands of entities (and I have hundreds of storage accounts), which sooner or later will become a mess to manage.
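A rough sketch of that comparison idea, reusing the TableService objects from above (for very large tables you would page through results with num_results/next_marker, as in the sync code):
def delete_removed_entities(source_table_name, target_table_name,
                            table_out: TableService, table_in: TableService):
    # collect (PartitionKey, RowKey) pairs on both sides
    source_keys = {(e.PartitionKey, e.RowKey)
                   for e in table_out.query_entities(source_table_name)}
    dest_keys = {(e.PartitionKey, e.RowKey)
                 for e in table_in.query_entities(target_table_name)}
    # anything still in the destination but gone from the source gets deleted
    for partition_key, row_key in dest_keys - source_keys:
        print("DELETE from TABLE:" + target_table_name, partition_key, row_key)
        table_in.delete_entity(target_table_name, partition_key, row_key)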
So another solution I thought about is to keep the same code I have above, just create a new table every week, and delete the oldest one (from the destination storage). For example:
Table week 1
Table week 2
Table week 3 (this will be deleted)
I read that I could potentially add metadata to the table for the date and leverage that to decide which table should be deleted based on date/time, but I cannot find anything in the documentation (a rough sketch of a name-based rotation follows).
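A hypothetical sketch of that rotation, reusing the table_service_in object above. Azure Table storage has no per-table metadata, so this encodes the ISO year/week into the table name instead (the base name and naming scheme are illustrative, not from the original):
from datetime import datetime, timedelta

def weekly_table_name(base_name, when=None):
    when = when or datetime.utcnow()
    iso_year, iso_week, _ = when.isocalendar()
    return "{}{}w{:02d}".format(base_name, iso_year, iso_week)    # e.g. mytable2023w07

def rotate_weekly_tables(table_in: TableService, base_name, keep_weeks=2):
    now = datetime.utcnow()
    keep = {weekly_table_name(base_name, now - timedelta(weeks=i)) for i in range(keep_weeks)}
    # make sure this week's table exists
    table_in.create_table(table_name=weekly_table_name(base_name, now), fail_on_exist=False)
    # drop destination tables older than keep_weeks
    for tb in table_in.list_tables():
        if tb.name.startswith(base_name) and tb.name not in keep:
            table_in.delete_table(tb.name)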
Can anyone please point me to the best approach for this? Thank you so much, I am losing my mind over this last bit.

Adding, removing, and comparing values to column with Python SQLAlchemy

SQLAlchemy community, a noob in databases and specifically SQLAlchemy is seeking your help here. As one would expect, my database consists of rows and columns. Each row is the information about one unique person. Each person has multiple columns (date of birth, name, last name, previous log-in dates, etc.). For one of these columns (previous log-in dates), I would like to store multiple values inside a single cell. In other words, I would like to store the last, let's say, ten log-in dates and be able to manipulate these dates the same way one would manipulate a list in Python. I would like to be able to append new log-in dates to this cell, remove items from the cell, and access a specific index of the cell. Basically my cell would look something like this
{"04042020","04052020","04072020"}
And my database would look like this
Name | Last Name | Last Log in dates
------------------------------------------------------------
Edgar | Allen | {"04042020","04052020","04072020"}
Dimitri | Albertini | {"12042019","10112019","01072020"}
I know that SQLAlchemy has a way of incorporating ARRAYs with
from sqlalchemy.dialects.postgresql import ARRAY
After some effort, I was merely able to create an ARRAY and could NOT figure out a way to manipulate (append, remove, access) it. Here is a simple prototype that creates a table with just one column, where the element in the first row equals {1,2,3}.
from sqlalchemy import create_engine
from sqlalchemy import Table, Column, Integer, String, MetaData, ForeignKey
from sqlalchemy.dialects.postgresql import ARRAY

engine = create_engine('postgresql://rouzbeh:tiger@localhost/newtable')
metadata = MetaData()
newtable = Table("newtable", metadata,
                 Column("data", ARRAY(Integer))
                 )
metadata.create_all(engine)
connection = engine.connect()
connection.execute(newtable.insert(), data=[1, 2, 3])
Output in Postgres (Postico) looks like the following screenshot. Again, to reiterate, I would like to be able to access the elements ({1,2,3}) and manipulate them by removing or adding elements, e.g. ({1,2,3,4}) or ({1,2}).
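For reference, a minimal sketch (not from the original post) of what appending, removing, and indexing could look like with the ARRAY column defined above, assuming SQLAlchemy 1.4+ and PostgreSQL's array_append/array_remove functions:
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, func, select
from sqlalchemy.dialects.postgresql import ARRAY

engine = create_engine('postgresql://rouzbeh:tiger@localhost/newtable')
metadata = MetaData()
newtable = Table("newtable", metadata, Column("data", ARRAY(Integer)))

with engine.begin() as connection:                       # commits automatically
    # append 4 to every row's array -> {1,2,3,4}
    connection.execute(newtable.update().values(data=func.array_append(newtable.c.data, 4)))
    # remove every occurrence of 3 -> {1,2,4}
    connection.execute(newtable.update().values(data=func.array_remove(newtable.c.data, 3)))
    # read the whole array and its first element (PostgreSQL arrays are 1-indexed)
    for row in connection.execute(select(newtable.c.data, newtable.c.data[1])):
        print(row)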
For a similar use case I am storing such data as JSON. Check out this SO post. The idea is to dump JSON as TEXT (or a JSON column, if your database supports it).
For more complex structures I use Marshmallow. It's especially useful with nested structures.
However, this approach is controversial. See this SO post and this Quora discussion. If you can live with the downsides, it is an easy way to store data.
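A minimal sketch of that idea, assuming SQLAlchemy 1.4+ and a hypothetical people table (the table and column names are illustrative, not from the original):
import json
from sqlalchemy import create_engine, MetaData, Table, Column, String, Text

engine = create_engine('postgresql://rouzbeh:tiger@localhost/newtable')
metadata = MetaData()
people = Table("people", metadata,
               Column("name", String),
               Column("login_dates", Text))       # JSON-encoded list stored as text
metadata.create_all(engine)

with engine.begin() as connection:
    connection.execute(people.insert(),
                       {"name": "Edgar", "login_dates": json.dumps(["04042020", "04052020"])})
    # read, manipulate as a normal Python list, write back
    row = connection.execute(people.select().where(people.c.name == "Edgar")).first()
    dates = json.loads(row.login_dates)
    dates.append("04072020")
    connection.execute(people.update()
                       .where(people.c.name == "Edgar")
                       .values(login_dates=json.dumps(dates)))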

How many Tables can you put in a Tableau Data Extract (.tde file)?

I'm using the Python SDK to create a TDE file. I want to add multiple tables to the TDE file. So I tried doing that but I got a duplicate name error:
dataextract.Exceptions.TableauException: TableauException (303):
duplicate table name
No problemo, I changed the name so that it counts up with each table I create:
tde_table = tde_file.addTable('Extract'+str(i), table_definition)
but then I get a new and exciting error:
dataextract.Exceptions.TableauException: TableauException (303): table
name must be "Extract"
Perhaps Extracts created through the SDK cannot have more than one table per extract? If every table in an extract needs to be named the same thing, but they can't have duplicate names... I'm confused. Can someone help clarify this for me?
Here's all the relevant code I think, but I don't know if it'll be much help:
...
for i, df in enumerate(dataframes):
    table_return_list = _form_table_definition(df, data_types, read_out)
    table_definition = table_return_list[0]
    header_type_map = table_return_list[1]
    # use the table definition to create the table and row
    tde_table = tde_file.addTable('Extract'+str(i), table_definition)
    tde_row = tde.Row(table_definition)
...
It seems that it's currently impossible to add more than one table to a data extract through the Python SDK, as far as I know.
http://onlinehelp.tableau.com/current/api/sdk/en-us/SDK/Python/html/classtableausdk_1_1_extract_1_1_extract.html#a70b49a6eca6f1724bd89a928c73ecc8c
From their SDK documentation:
def tableausdk.Extract.Extract.addTable(self, name, tableDefinition)
    Adds a table to the extract.
    Parameters:
        self   The object pointer.
        name   The name of the table to add.
    Currently, this method can only add a table named "Extract".
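Given that constraint, a hedged sketch of a common workaround: write one .tde file per table, each containing a single table named "Extract" (dataframes, _form_table_definition, data_types, and read_out are the names from the question's code):
import dataextract as tde

for i, df in enumerate(dataframes):
    table_definition = _form_table_definition(df, data_types, read_out)[0]
    tde_file = tde.Extract('extract_{}.tde'.format(i))   # one extract file per table
    tde_table = tde_file.addTable('Extract', table_definition)
    # ... populate rows with tde.Row(table_definition) as in the original loop ...
    tde_file.close()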

How to select all data in PyMongo?

I want to select all data, or select with a condition, from the collection random, but I can't find any guide for doing this with MongoDB in Python.
And I can't display all the data that was selected.
Here is my code:
from pymongo import MongoClient

def mongoSelectStatement(result_queue):
    client = MongoClient('mongodb://localhost:27017')
    db = client.random
    cursor = db.random.find({"gia_tri": "0.5748676522161966"})
    # cursor = db.random.find()
    inserted_documents_count = cursor.count()
    for document in cursor:
        result_queue.put(document)
There is quite comprehensive documentation for MongoDB. For Python (PyMongo), here is the URL: https://api.mongodb.org/python/current/
Note: Consider the version you are running, since the latest version has new features and functions.
To verify which PyMongo version you are using, execute the following:
import pymongo
pymongo.version
Now, regarding the select query you asked about: as far as I can tell, the code you presented is fine. Here is the select structure in MongoDB.
First off, it is called find().
In PyMongo, if you want to select specific rows (not really rows; in MongoDB they are called documents, but I say rows to keep it easy if you are comparing MongoDB to SQL), that is, specific documents from the table (called a collection in MongoDB), use the following structure. I will use random as the collection name and assume the random collection has the following attributes: age:10, type:ninja, class:black, level:1903.
db.random.find({ "age":"10" }) This will return all documents that have age 10 in them.
You can add more conditions simply by separating them with commas:
db.random.find({ "age":"10", "type":"ninja" }) This will select all data with age 10 and type ninja.
If you want to get all data, just leave the filter empty:
db.random.find({})
The previous examples display everything (age, type, class, level and _id). If you want to display specific attributes, say only the age, you have to add another argument to find() called a projection, e.g. (1 means show, 0 means do not show):
{'age':1}
Note that this returns age as well as _id; _id is always returned by default. You have to explicitly tell it not to return it, as in:
db.random.find({ "age":"10", "type":"ninja" }, {"age":1, "_id":0} )
I hope that gets you started.
Take a look at the documentation; it is very thorough.
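Putting it together, a minimal runnable sketch against the collection from the question (note that Cursor.count() is deprecated in newer PyMongo; count_documents() is the replacement):
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
collection = client.random.random          # database "random", collection "random"

# all documents
for document in collection.find({}):
    print(document)

# documents matching a condition, showing only gia_tri and hiding _id
cursor = collection.find({"gia_tri": "0.5748676522161966"},
                         {"gia_tri": 1, "_id": 0})
print(collection.count_documents({"gia_tri": "0.5748676522161966"}))
for document in cursor:
    print(document)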

Automatically merging arbitrary Django models

I have two Django-ORM managed databases that I'd like to merge. Both have a very similar schema, and both have the standard auth_users table, along with a few other shared tables that reference each other as well as auth_users, which I'd like to merge into a single database automatically.
Understandably, this could be very non-trivial depending upon the foreign-key relationships, and what constitutes a "unique" record in each table.
Does anyone know if there exists a tool to do this merge operation?
If nothing like this currently exists, I was considering writing my own management command, based on the standard loaddata command. Essentially, you'd use the standard dumpdata command to export tables from a source database, and then use a modified version of loaddata to "merge" them into the destination database.
For example, if I have databases A and B, and I want to merge database B into database A, then I'd want to follow a procedure according to the pseudo-code:
merge_database_dst = A
merge_database_src = B
for table in sorted(merge_database_dst.get_redundant_tables(merge_database_src),
                    key=acyclic_dependency):
    key = table.get_unique_column_key()
    src_id_to_dst_id = {}
    for record_src in merge_database_src.table.objects.all():
        src_key_value = record_src.get_key_value(key)
        try:
            record_dst = merge_database_dst.table.objects.get(key)
            dst_key_value = record_dst.get_key_value(key)
        except merge_database_dst.table.DoesNotExist:
            record_dst = merge_database_dst.table(**[(k, convert_fk(v)) for k, v in record_src._meta.fields])
            record_dst.save()
            dst_key_value = record_dst.get_key_value(key)
        src_id_to_dst_id[(table, record_src.id)] = record_dst.id
The convert_fk() function would use the src_id_to_dst_id index to convert foreign key references in the source table to the equivalent IDs in the destination table.
To summarize, the algorithm would iterate over the table to be merged in the order of dependency, with parents iterated over first. So if we wanted to merge tables auth_users and mycustomprofile, which is dependent on auth_users, we'd iterate ['auth_users','mycustomprofile'].
Each merged table would need some sort of indicator documenting the combination of columns that denotes a universally unique record (i.e. the "key"). For auth_users, that might be the "username" and/or "email" column.
If the value of the key in database B already exists in A, then the record is not imported from B, but the ID of the existing record in A is recorded.
If the value of the key in database B does not exist in A, then the record is imported from B, and the ID of the new record is recorded.
Using the previously recorded IDs, a mapping is created that explains how foreign-key references to that specific record in B map to the merged or pre-existing record in A. When future records are merged into A, this mapping would be used to convert the foreign keys (a rough sketch of this mapping is below).
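A minimal, hypothetical sketch of that mapping (the names src_id_to_dst_id and convert_fk mirror the pseudo-code above; this is not an existing tool):
# (table_name, id in database B) -> id of the merged or pre-existing record in A
src_id_to_dst_id = {}

def convert_fk(table_name, src_fk_id):
    """Translate a foreign-key value from database B into the matching ID in A."""
    try:
        return src_id_to_dst_id[(table_name, src_fk_id)]
    except KeyError:
        # the referenced parent row has not been merged yet (or is missing from
        # the dump); this is exactly what a "dry run" pass should detect
        raise ValueError("no mapping for %s id=%r" % (table_name, src_fk_id))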
I could still envision some cases where an imported record references a table not included in the dumpdata, which might cause the entire import to fail, therefore some sort of "dryrun" option would be needed to simulate the import to ensure all FK references can be translated.
Does this seem like a practical approach? Is there a better way?
EDIT: This isn't exactly what I'm looking for, but I thought others might find it interesting. The Turbion project has a mechanism for copying changes between equivalent records in different Django models within the same database. It works by defining a translation layer (i.e. merging.ModelLayer) between two Django models, so, say, if you update the "www" field in user bob@bob.com's profile, it'll automatically update the "url" field in user bob@bob.com's otherprofile.
The functionality I'm looking for is a bit different, in that I want to merge an entire (or partial) database snapshot at infrequent intervals, sort of the way the loaddata management command does.
Wow. This is going to be a complex job regardless. That said:
If I understand the needs of your project correctly, this is something that can be done using a data migration in South. Even so, I'd be lying if I said it was going to be easy.
My recommendation is -- and this is mostly a parrot of an assumption in your question, but I want to make it clear -- that you have one "master" table that is the base, and which has records from the other table added to it. So, table A keeps all of its existing records, and only gets additions from B. B feeds additions into A, and once done, B is deleted.
I'm hesitant to write you sample code because your actual job will be so much more complex than this, but I will anyway to try and point you in the right direction. Consider something like...
import datetime
from south.db import db
from south.v2 import DataMigration
from django.db import models

class Migration(DataMigration):

    def forwards(self, orm):
        for b in orm.B.objects.all():
            # sanity check: does this item get copied into A at all?
            if orm.A.objects.filter(username=b.username):
                continue
            # make an A record with the properties of my B record
            a = orm.A(
                first_name=b.first_name,
                last_name=b.last_name,
                email_address=b.email_address,
                [...]
            )
            # save the new A record, and delete the B record
            a.save()
            b.delete()

    def backwards(self, orm):
        # backwards method, if you write one
        pass
This would end up migrating all of the Bs not in A to A, and leave you a table of Bs that are expected duplicates, which you could then check by some other means before deleting.
Like I said, this sample isn't meant to be complete. If you decide to go this route, spend time in the South documentation, and particularly make sure you look at data migrations.
That's my 2¢. Hope it helps.
