Create SQL database from dict with different features in python

I have the following dict:
base = {}
base['id1'] = {'apple':2, 'banana':4,'coconut':1}
base['id2'] = {'apple':4, 'pear':8}
base['id3'] = {'banana':1, 'tomato':2}
....
base['idN'] = {'pineapple':1}
I want to create a SQL database to store it. I normally use sqlite, but here the number of variables (features in the dict) is not the same for all ids, and I do not know all of them in advance, so I cannot use the standard procedure of one column per variable.
Does someone know an easy way to do it?

The ids will get duplicated if you store this in plain SQL tables. I would suggest using Postgres, as it has a JSON field type: you can put the data there for each key. This assumes you are not tied to a particular SQL database.
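The same idea can be tried without leaving sqlite. This is a minimal sketch, assuming it is acceptable to store each id's features as a JSON string in a TEXT column; the table and column names (items, features) are made up for illustration.

```python
import json
import sqlite3

base = {
    "id1": {"apple": 2, "banana": 4, "coconut": 1},
    "id2": {"apple": 4, "pear": 8},
    "id3": {"banana": 1, "tomato": 2},
}

conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
# One row per id; the varying features live in a single JSON-encoded column.
conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, features TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO items (id, features) VALUES (?, ?)",
    ((key, json.dumps(features)) for key, features in base.items()),
)
conn.commit()

# Read everything back and decode the JSON into dicts again.
restored = {
    key: json.loads(features)
    for key, features in conn.execute("SELECT id, features FROM items")
}
```

The trade-off is that individual features are not ordinary columns, so filtering on them in SQL is less direct than with a fixed schema.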

Related

Refactorable database queries

Say I have category="foo" and a NoSQL query={"category": category}. Whenever I refactor the variable name category, I need to manually change it inside the query if I want the query to follow the rename.
In Python 3.8+ I'm able to get the variable name as a string via the variable itself.
Now I could use query={f"{category=}".split("=")[0]: category}. Now refactoring changes the query too. This applies to any database queries or statements (SQL etc.).
Would this be bad practice? Not just concerning Python but any language where this is possible.

Would this be bad practice?
Yes, the names of local variables do not need to correlate with the fields in data stores.
You should be able to retrieve a record and filter on its fields with any Python variable, no matter its name or whether it's nested in a larger data structure.
In pseudocode:
connection = datastore.connect(...)
# passing a string directly
connection.fetch({"category": "fruit"})
# passing a string variable
category_to_fetch = "vegetable"
connection.fetch({"category": category_to_fetch})
# something more exotic like a previous list of records
r = [("fish",)]
connection.fetch({"category": r[0][0]})
# or even a premade filter dictionary
filter = {"category": "meat"}
connection.fetch(filter)
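A runnable version of the same point, as a sketch using the standard sqlite3 module in place of the unnamed data store. The products table and the fetch_names helper are hypothetical; the column name is written once in the query and the Python variable carrying the value can be renamed freely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("apple", "fruit"), ("carrot", "vegetable"), ("pear", "fruit")],
)


def fetch_names(connection, wanted):
    """Return product names in one category; the column name is fixed here."""
    rows = connection.execute(
        "SELECT name FROM products WHERE category = ? ORDER BY name", (wanted,)
    )
    return [name for (name,) in rows]


# The variable name has no connection to the column name "category".
category_to_fetch = "fruit"
fruit_names = fetch_names(conn, category_to_fetch)
```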

How to update multiple records using peewee

I'm using Peewee with a Postgres database. I want to know how to update multiple records in a table at once.
We can perform this update in SQL using these commands, and I'm looking for a Peewee equivalent approach.

Yes, you can use the insert_many() function:
Insert multiple rows at once. The rows parameter must be an iterable
that yields dictionaries. As with insert(), fields that are not
specified in the dictionary will use their default value, if one
exists.
Example:
usernames = ['charlie', 'huey', 'peewee', 'mickey']
row_dicts = ({'username': username} for username in usernames)
# Insert 4 new rows.
User.insert_many(row_dicts).execute()
More details at: http://docs.peewee-orm.com/en/latest/peewee/api.html#Model.insert_many

ORMs usually do not support bulk updates, so you have to use custom SQL; you can see samples in this link (db.execute_sql).
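For reference, this is roughly what the custom SQL for a bulk update looks like. It is a sketch with the standard sqlite3 module standing in for Postgres, not Peewee's API; the users table and its columns are assumptions, and a Postgres driver would typically use a different placeholder style than "?".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, active INTEGER)"
)
conn.executemany(
    "INSERT INTO users (username, active) VALUES (?, 1)",
    [("charlie",), ("huey",), ("peewee",), ("mickey",)],
)

# One statement updates every row matching the condition.
cursor = conn.execute(
    "UPDATE users SET active = 0 WHERE username IN (?, ?)", ("huey", "mickey")
)
updated_count = cursor.rowcount

# Different new values per row: one parameterised statement, many parameter sets.
conn.executemany(
    "UPDATE users SET username = ? WHERE username = ?",
    [("Charlie", "charlie"), ("Peewee", "peewee")],
)
conn.commit()

rows = conn.execute("SELECT username, active FROM users ORDER BY id").fetchall()
```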

How to efficiently fetch objects after created using bulk_create function of Django ORM?

I have to insert multiple objects in a table. There are two ways to do that:
1) Insert each one using save(). But in this case there will be n SQL DB queries for n objects.
2) Insert all of them together using bulk_create(). In this case there will be one SQL DB query for n objects.
Clearly, the second option is better and hence I am using that. Now the problem with bulk_create is that it does not return the ids of the inserted objects, hence they cannot be used further to create objects of other models which have a foreign key to the created objects.
To overcome this, we need to fetch the objects created by bulk_create.
Now the question is "assuming as in my situation, there is no way to uniquely identify the created objects, how do we fetch them?"
Currently I am maintaining a time_stamp to fetch them, something like below:
import datetime
my_objects = []
# Timestamp to be used for fetching created objects
time_stamp = datetime.datetime.now()
# Creating list of instantiated objects
for obj_data in obj_data_list:
    my_objects.append(MyModel(**obj_data))
# Bulk inserting the instantiated objects to DB
MyModel.objects.bulk_create(my_objects)
# Using timestamp to fetch the created objects
MyModel.objects.filter(created_at__gte=time_stamp)
Now this works well, but it will fail in one case.
If at the time of bulk-creating these objects, some more objects are created from somewhere else, then those objects will also be fetched in my query, which is not desired.
Can someone come up with a better solution?

As bulk_create will not create the primary keys, you'll have to supply the keys yourself.
This process is simple if you are not using the default generated primary key, which is an AutoField.
If you are sticking with the default, you'll need to wrap your code into an atomic transaction and supply the primary key yourself. This way you'll know what records are inserted.
from django.db import transaction
inserted_ids = []
with transaction.atomic():
    my_objects = []
    # Highest primary key currently in the table
    max_id = int(MyModel.objects.latest('pk').pk)
    id_count = max_id
    for obj_data in obj_data_list:
        id_count += 1
        obj_data['id'] = id_count
        inserted_ids.append(obj_data['id'])
        my_objects.append(MyModel(**obj_data))
    MyModel.objects.bulk_create(my_objects)
# inserted_ids now holds the new keys, max_id + 1 through id_count

As you already know:
If the model’s primary key is an AutoField it does not retrieve and
set the primary key attribute, as save() does.
The way you are doing it is how people usually do it.
Another solution, which is better in some cases:
# Evaluate the ids now; a lazy queryset would be re-run as a subquery after the insert
my_ids = list(MyModel.objects.values_list('id', flat=True))
objs = MyModel.objects.bulk_create(my_objects)
new_objs = MyModel.objects.exclude(id__in=my_ids).values_list('id', flat=True)
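A different technique from the two answers above, sketched here only as an idea: tag every row of one bulk insert with a unique batch marker and fetch by that marker, which avoids picking up rows created elsewhere at the same time. The sketch uses the standard sqlite3 module rather than Django, and the batch_id column is an assumption; in Django it would mean adding such a field to the model.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_model (id INTEGER PRIMARY KEY, name TEXT, batch_id TEXT)"
)
# A row inserted from somewhere else that must not be picked up.
conn.execute("INSERT INTO my_model (name, batch_id) VALUES ('other', 'other-batch')")

obj_data_list = [{"name": "a"}, {"name": "b"}, {"name": "c"}]
batch_id = uuid.uuid4().hex  # unique marker for this bulk insert only

conn.executemany(
    "INSERT INTO my_model (name, batch_id) VALUES (?, ?)",
    [(obj["name"], batch_id) for obj in obj_data_list],
)
conn.commit()

# Only rows carrying this batch marker come back, with their generated ids.
created = conn.execute(
    "SELECT id, name FROM my_model WHERE batch_id = ? ORDER BY id", (batch_id,)
).fetchall()
```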

Is there a way to register queries in SQLite?

Instead of writing...
SELECT {long ass list of crap}
FROM long_table_name
WHERE {annoyingly complex criteria} = 1
...every time it's required, is there a way to register this query? A sort of CREATE_QUERY command, if you will?
Thanks in advance.

Use a view.
CREATE VIEW view_name AS
SELECT {long ass list of crap}
FROM long_table_name
WHERE {annoyingly complex criteria} = 1;
Afterwards, you can simply write SELECT * FROM view_name
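A small runnable illustration of the view approach, using Python's standard sqlite3 module. The columns and the view name in_stock_items are invented for the example; only long_table_name comes from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE long_table_name (name TEXT, price REAL, in_stock INTEGER)")
conn.executemany(
    "INSERT INTO long_table_name VALUES (?, ?, ?)",
    [("apple", 0.5, 1), ("pear", 0.7, 0), ("plum", 0.3, 1)],
)

# Register the query once as a view; it is stored in the database schema.
conn.execute(
    """
    CREATE VIEW in_stock_items AS
    SELECT name, price
    FROM long_table_name
    WHERE in_stock = 1
    """
)

# From now on the view can be queried like a table.
rows = conn.execute("SELECT * FROM in_stock_items ORDER BY name").fetchall()
```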

You can't create stored procedures in SQLite. Instead, you can do one of these:
Create a table or a separate database where you store your queries and retrieve them using a unique key or name
Store them in a file and read them from there
Hard-code them in your application
Create a class that runs the query and call it when you need it, with input parameters if necessary

Python insert variable in loop into SQLite database using SQLAlchemy

I am using SQLAlchemy with declarative base and Python 2.6.7 to insert data in a loop into an SQLite database.
As brief background, I have implemented a dictionary approach to creating a set of variables in a loop. What I am trying to do is scrape some data from a website, and have between 1 and 12 pieces of data in the following element:
overall_star_ratings = doc.findall("//div[@id='maincontent2']/div/table/tr[2]//td/img")
count_stars = len(overall_star_ratings)
In an empty SQLite database I have variables "t1_star,"..."t12_star," and I want to iterate over the list of values in "overall_star_ratings" and assign the values to the database variables, which varies depending on the page. I'm using SQLAlchemy, so (in highly inefficient language) what I'm looking to do is assign the values and insert into the DB as follows (I'm looping through 'rows' in the code, such that the 'row' command inserts the value for *t1_star* into the database column 't1_star', etc.):
if count==2:
    row.t1_star = overall_star_ratings[1].get('alt')
    row.t2_star = overall_star_ratings[2].get('alt')
elif count==1:
    row.t1_star = overall_star_ratings[1].get('alt')
This works but is highly inefficient, so I implemented a "dictionary" approach to creating the variables, as I've seen in some "variable variables" questions on Stack Overflow. So, here is what I've tried:
d = {}
for x in range(1, count_stars+1):
    count = x-1
    d["t{0}_star".format(x)] = overall_star_ratings[count].get('alt')
This works for creating the 't1_star', 't2_star' keys for the dictionary as well as the values. The problem comes when I try to insert the data into the database. I have tried adding the following to the above loop:
key = "t{0}_star".format(x)
value = d["t{0}_star".format(x)]
row.key = value
I've also tried adding the following after the above loop is completed:
for key, value in d.items():
    row.key = value
The problem is that it is not inserting anything. It appears that the problem is in the row.key part of the script, not in the value, but I am not certain of that. From all that I can see, the keys are the same strings as I'm seeing when I do it the "inefficient" way (i.e., t1_star, etc.), so I'm not sure why this isn't working.
Any suggestions would be greatly appreciated!
Thanks,
Greg

Python attribute access doesn't work like that. row.key looks up the attribute with the literal name "key", not the value that's in the variable key.
You probably need to use setattr:
setattr(row, key, value)
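A self-contained sketch of the difference, with a plain class standing in for the mapped SQLAlchemy row and a hard-coded list standing in for the scraped 'alt' values (both are placeholders, not the asker's real objects):

```python
class Row:
    """Stand-in for a mapped SQLAlchemy row object (hypothetical)."""


ratings = ["4 stars", "3 stars", "5 stars"]  # stand-in for the scraped 'alt' values

row = Row()
for position, rating in enumerate(ratings, start=1):
    # Builds "t1_star", "t2_star", ... and sets the attribute with that name.
    setattr(row, "t{0}_star".format(position), rating)
```

With setattr the attribute name comes from the string, so no attribute called key is ever created, and the intermediate dictionary is not needed either.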
