I have a table Users in my database:

id | name  | last_name | status
---|-------|-----------|---------
1  | John  | Black     | active
2  | Drake | Bell      | disabled
3  | Pep   | Guardiola | active
4  | Steve | Salt      | active
users_data = []
I would like to get all id and status values from this table and write them into the empty structure above.
What kind of query should I use: filter, get, or something else?
And what if I wanted to get one column, not two?
If you want to access the values of specific columns for all instances of a table:

id_status_list = Users.objects.values_list('id', 'status')

You can find more info in the official documentation.
Note that Django provides an ORM to ease database queries (see this page for more info on making queries):

To fetch all column values of all User instances in your Users table:

users_list = Users.objects.all()

To fetch all column values of specific Users in the table:

active_users_list = Users.objects.filter(status="active")

To fetch all column values of a specific User in the table:

user_33 = Users.objects.get(pk=33)
Use the .values() method:
>>> Users.objects.values('id', 'status')
<QuerySet [{'id': 1, 'status': 'active'}, {'id': 2, 'status': 'disabled'}, {'id': 3, 'status': 'active'}, {'id': 4, 'status': 'active'}]>

The result is a QuerySet, which mostly behaves like a list; you can then do list(Users.objects.values('id', 'status')) to get an actual list object.
users_data = list(Users.objects.values('id', 'status'))
yourmodelname.objects.values('id', 'status')

This code shows your table restricted to two columns, id and status.

users_data = list(yourmodelname.objects.values('id', 'status'))

With this code you get the result as a list of dictionaries.
Suppose your model name is User. For the first part of the question, use this code:

User.objects.values('id', 'status')       # to get dictionaries
User.objects.values_list('id', 'status')  # to get tuples of values

And for the second part of the question ('And what if I would like to get one column, not two?') you can use these:

User.objects.values('id')           # to get dictionaries
User.objects.values_list('id')      # to get tuples of values
User.objects.values('status')       # to get dictionaries
User.objects.values_list('status')  # to get tuples of values
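Neither spelling above mentions it, but for a single column Django also offers values_list('status', flat=True), which returns bare values instead of 1-tuples. As a runnable illustration of the three result shapes, here is the equivalent raw SQL against an in-memory SQLite copy of the Users table from the question (Django itself is not needed for the sketch):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, last_name TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [(1, "John", "Black", "active"),
     (2, "Drake", "Bell", "disabled"),
     (3, "Pep", "Guardiola", "active"),
     (4, "Steve", "Salt", "active")],
)

# Roughly what .values('id', 'status') yields: dictionaries per row
rows = conn.execute("SELECT id, status FROM users ORDER BY id").fetchall()
as_dicts = [{"id": r[0], "status": r[1]} for r in rows]

# Roughly what .values_list('status') yields: 1-tuples per row
as_tuples = [(r[0],) for r in conn.execute("SELECT status FROM users ORDER BY id")]

# Roughly what .values_list('status', flat=True) yields: bare values
flat = [r[0] for r in conn.execute("SELECT status FROM users ORDER BY id")]

print(as_dicts[0])   # {'id': 1, 'status': 'active'}
print(as_tuples[0])  # ('active',)
print(flat)          # ['active', 'disabled', 'active', 'active']
```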
I'll demonstrate by using an example. This is the model (the primary key is implicit):
class Item(models.Model):
    sku = models.CharField(null=False)
    description = models.CharField(null=True)
I have a list of skus, I need to get the latest descriptions for all skus in the filter list that are written in the table for the model Item. Latest item == greatest id.
I need a way to annotate the latest description per sku:

Item.objects.values("sku").filter(sku__in=list_of_skus).annotate(latest_descr=Latest('description')).order_by("-id")

but this won't work for various reasons (quite apart from the missing aggregate function Latest). Neither will:

Item.objects.values("sku").filter(sku__in=list_of_skus).annotate(latest_descr=Latest('description')).latest("-id")

Or this:

Item.objects.values("sku").filter(sku__in=list_of_skus).annotate(latest_descr=Latest('description')).order_by("-id").reverse()[0]
I used postgres ArrayAgg aggregate function to aggregate the latest description like so:
from django.contrib.postgres.aggregates import ArrayAgg

class ArrayAggLatest(ArrayAgg):
    template = "(%(function)s(%(expressions)s ORDER BY id DESC))[1]"

Item.objects.filter(sku__in=skus).values("sku").annotate(descr=ArrayAggLatest("description"))
The aggregate function collects all descriptions ordered by descending id of the original table and takes the first element (PostgreSQL arrays are 1-indexed; index 0 would yield NULL).
The answers from @M.J.GH.PY and @dekomote are not correct.
If you have a model:
class Item(models.Model):
    sku = models.CharField(null=False)
    description = models.CharField(null=True)
this model already orders rows by id by default, so you don't need to annotate anything. You can:
get the last object:
Item.objects.filter(sku__in=list_of_skus).last()
get the last value of description:
Item.objects.filter(sku__in=list_of_skus).values_list('description', flat=True).last()
Both variants give you None if the queryset is empty.
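Whichever ORM spelling you pick, the underlying logic is "for each sku, keep the description of the row with the greatest id". A minimal plain-Python sketch of that grouping (the sample rows are invented for illustration):

```python
# Each row is (id, sku, description); ids and values invented for illustration.
rows = [
    (1, "A", "old A"),
    (2, "B", "old B"),
    (3, "A", "new A"),
    (4, "B", "new B"),
]

latest = {}  # sku -> (id, description), keeping the greatest id seen so far
for row_id, sku, descr in rows:
    if sku not in latest or row_id > latest[sku][0]:
        latest[sku] = (row_id, descr)

latest_descr = {sku: descr for sku, (_, descr) in latest.items()}
print(latest_descr)  # {'A': 'new A', 'B': 'new B'}
```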
Man, was I having trouble with how to word the title.
Summary: For a database project at uni, we have to import 1 million rows of data into a database, where each row represents an article scraped from the internet. One of the columns is the author of the article. As many articles were written by the same author, I wanted to create a table separate from the articles that links each unique author to a primary key, and then replace the author string in the article table with that author's key. How is this done most efficiently, and is it possible to do it in a way that ensures a deterministic output, so that a specific author string would ALWAYS map to a certain pkey, no matter the order the article rows "come in" when this method creates that table?

What I've done: In Python, using pandas, I went through all 1 million article rows and built a unique list of all the authors I found. Then I created a dictionary based on this list (sorted). I used this dictionary to replace the author string in the articles table with a key corresponding to a specific author, and then used the dict to create my authors table. However, as I see it, if a row with a previously unseen author were inserted into my data, it could upset the alphabetical order my method adds the authors to the dict in, making it not-so-deterministic. So, what do people normally do in these instances? Can SQL on the 1M articles directly make a new authors table with unique authors and keys, and replace the author string in the articles table? Could it be an idea to use hashing with a specific hash key to ensure a certain string always maps to a certain key?
Show some code:
def get_authors_dict():
    authors_lists = []
    df = pd.read_csv("1mio-raw.csv", usecols=['authors'], low_memory=True)
    unique_authors_list = df['authors'].unique()
    num_of_authors = len(unique_authors_list)
    authors_dict = {}
    i = 0
    prog = 0
    for author in unique_authors_list:
        try:
            authors_dict[author]
            i += 1
        except KeyError:
            authors_dict[author] = i
            i += 1
        print(prog / num_of_authors * 100, "%")
        prog += 1
    return authors_dict

authors_dict = get_authors_dict()

col1_author_id = list(authors_dict.values())
col2_author_name = list(authors_dict.keys())
data_dict = {'col1': col1_author_id,
             'col2': col2_author_name}

df = pd.DataFrame(data=data_dict, columns=['col1', 'col2'])
df.to_csv('author.csv', index=False, header=False, sep="~")

f = open('author.csv', encoding="utf8")
conn = psycopg2.connect(--------)
cur = conn.cursor()
cur.copy_from(f, 'author', sep='~')
conn.commit()
cur.close()

# Processing all the 1mio rows again in a separate file
# and making changes to the dataframe using the dict:
sample_data['authors'] = sample_data['authors'].map(authors_dict)
So if I understand you correctly, you want to create a SQL table which connects authors to articles. Your problem is that you do not know what primary key you should use in such a table, since an author might have written more than one article.
In that case, instead of trying to do something clever, I would just use a composite primary key for your table. This means you define the author column together with the title/publishing date/identifier of the article as the primary key for the table. Thus, each row of your table has a unique identifier (as long as no author has written two identical articles). This is independent of your Python code, as it needs to be defined in the database.
This question might help you to define a composite primary key.
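As for the question of whether SQL can build the authors table and rewrite the articles table directly: yes. Here is a minimal SQLite sketch (table and column names are assumptions, not the asker's actual schema); the ORDER BY when inserting makes the assigned keys deterministic, answering the reproducibility concern:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, author TEXT);
INSERT INTO articles (title, author) VALUES
    ('t1', 'Bob'), ('t2', 'Alice'), ('t3', 'Bob'), ('t4', 'Carol');

-- Build the authors table from the distinct author strings.
-- ORDER BY author makes the key assignment deterministic (alphabetical),
-- regardless of the order the article rows arrived in.
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
INSERT INTO authors (name) SELECT DISTINCT author FROM articles ORDER BY author;

-- Replace the author string with the author's key.
ALTER TABLE articles ADD COLUMN author_id INTEGER REFERENCES authors(id);
UPDATE articles SET author_id = (SELECT id FROM authors WHERE name = articles.author);
""")

print(conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall())
# [(1, 'Alice'), (2, 'Bob'), (3, 'Carol')]
print(conn.execute("SELECT title, author_id FROM articles ORDER BY id").fetchall())
# [('t1', 2), ('t2', 1), ('t3', 2), ('t4', 3)]
```

Dropping the now-redundant author string column afterwards is a separate step (SQLite 3.35+ supports ALTER TABLE ... DROP COLUMN).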
I have the following model of a blog post:
title = db.Column(db.String())
content = db.Column(db.String())
tags = db.Column(ARRAY(db.String))
Tags field can be an empty list.
Now I want to select all distinct tags from the database entries with max performance - excluding empty arrays.
So, say I have 3 records with the following values of the tags field:
['database', 'server', 'connection']
[]
['connection', 'security']
The result would be ['database', 'server', 'connection', 'security']
The actual order is not important.
The distinct() method still works fine with array columns once the array is unnested:

from sqlalchemy import func
unique_vals = session.query(func.unnest(BlogPost.tags)).distinct().all()

(Note that you query through the session here; BlogPost.query(...) does not accept entities as arguments.)

https://docs.sqlalchemy.org/en/13/orm/query.html?highlight=distinct#sqlalchemy.orm.query.Query.distinct
This would be identical to running a query in postgres:
SELECT DISTINCT unnest(tags) FROM blog_posts
If you can process the results afterwards (usually you can) and don't want to use a nested query for this sort of thing, I usually resort to something like:

func.array_agg(func.array_to_string(BlogPost.tags, "||")).label("tag_lists")

and then split on the join string (||) afterwards.
I want to insert data from a dictionary into a SQLite table using SQLAlchemy. The keys in the dictionary and the column names are the same, and I want to insert each value into the column with the matching name. This is my code:
# This is the class I create the table from with SQLAlchemy, and I want to
# insert my data into it.
# I didn't write the __init__ for simplicity.
class Sizecurve(Base):
    __tablename__ = 'sizecurve'

    XS = Column(String(5))
    S = Column(String(5))
    M = Column(String(5))
    L = Column(String(5))
    XL = Column(String(5))
    XXL = Column(String(5))
o = Mapping()  # This creates an object which is actually a dictionary

for eachitem in myitems:
    # Here I populate the dictionary with keys from another list
    # This gives me a dictionary looking like this: o = {'S': None, 'M': None, 'L': None}
    o[eachitem] = None

for eachsize in mysizes:
    # Here I assign a value to each key of the dictionary, if one exists, otherwise None
    # product_row is a class and size and stock are its attributes
    if product_row.size in o:
        o[product_row.size] = product_row.stock

# I put the final object into a list
simplelist.append(o)
Now I want to put the values from the dictionaries saved in simplelist into the right columns in the sizecurve table, but I am stuck; I don't know how to do that. For example, given an object like this:
o= {'S':4, 'M':2, 'L':1}
And I want to see for the row for column S value 4, column M value 2 etc.
Yes, it's possible (though aren't you missing primary keys/foreign keys on this table?).
session.add(Sizecurve(**o))
session.commit()
That should insert the row.
http://docs.sqlalchemy.org/en/latest/core/tutorial.html#executing-multiple-statements
EDIT: On second read it seems like you are trying to insert all those values into one column? If so, I would make use of pickle.
https://docs.python.org/3.5/library/pickle.html
If performance is an issue (pickle is pretty fast, but if you're doing 10000 reads per second it'll be the bottleneck), you should either redesign the table or use a database like PostgreSQL that supports JSON objects.
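A minimal sketch of the pickle route with plain sqlite3 (the single-column schema here is invented for illustration): the whole size dictionary is serialized into one BLOB column and restored on read.

```python
import pickle
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sizecurve (id INTEGER PRIMARY KEY, sizes BLOB)")

# Serialize the whole dict into a single column value.
o = {'S': 4, 'M': 2, 'L': 1}
conn.execute("INSERT INTO sizecurve (sizes) VALUES (?)", (pickle.dumps(o),))

# Read it back and deserialize.
blob = conn.execute("SELECT sizes FROM sizecurve WHERE id = 1").fetchone()[0]
restored = pickle.loads(blob)
print(restored)  # {'S': 4, 'M': 2, 'L': 1}
```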
I have found this answer to a similar question, though it is about reading the data from a JSON file; so now I am working on understanding the code and also changing my data to JSON so that I can insert it in the right place.
Convert JSON to SQLite in Python - How to map json keys to database columns properly?
I am looking to select all values from one column which are distinct using Peewee.
For example if i had the table
Organization | Year
-------------|-----
company_1    | 2000
company_1    | 2001
company_2    | 2000
....
To just return unique values in the organization column [i.e.company_1 and company_2]
I had assumed this was possible using the distinct option as documented http://docs.peewee-orm.com/en/latest/peewee/api.html#SelectQuery.distinct
My current code:
organizations_returned = organization_db.select().distinct(organization_db.organization_column).execute()
for item in organizations_returned:
    print(item.organization_column)
Does not result in distinct rows returned (it results in e.g. company_1 twice).
The other option i tried:
organization_db.select().distinct([organization_db.organization_column]).execute()
included [ ] within the distinct option, which, although appearing more consistent with the documentation, resulted in the error peewee.OperationalError: near "ON": syntax error.
Am I correct in assuming that it is possible to return unique values directly from Peewee, and if so, what am I doing wrong?
Model structure:
cd_sql = SqliteDatabase(sql_location, threadlocals=True, pragmas=(("synchronous", "off"),))

class BaseModel(Model):
    class Meta:
        database = cd_sql

class organization_db(BaseModel):
    organization_column = CharField()
    year_column = CharField()
So what coleifer was getting at is that SQLite doesn't support DISTINCT ON (which is what distinct() with arguments generates). That's not a big issue, though; I think you can accomplish what you want like so:

organization_db.select(organization_db.organization_column).distinct()
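Under the hood, that peewee call issues a plain single-column SELECT DISTINCT, which SQLite supports fine (unlike DISTINCT ON). The raw-SQL equivalent, using the table from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE organization_db (organization_column TEXT, year_column TEXT)")
conn.executemany(
    "INSERT INTO organization_db VALUES (?, ?)",
    [("company_1", "2000"), ("company_1", "2001"), ("company_2", "2000")],
)

# DISTINCT applied to the single selected column: each company appears once.
orgs = [row[0] for row in conn.execute(
    "SELECT DISTINCT organization_column FROM organization_db")]
print(sorted(orgs))  # ['company_1', 'company_2']
```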