I have a large database table with more than a million records, and Django is taking a really long time to retrieve the data. When I had fewer records the data was retrieved quickly.
I am using the get() method to retrieve the data from the database. I did try the filter() method, but it gave me the entire table rather than filtering on the given condition.
Currently I retrieve the data using the code shown below:
context['variables'] = variable.objects.get(id=self.kwargs['pk'])
I know why it is slow: it's trying to go through all the records and find the ones whose id matches. But I was wondering if there is a way I could restrict the search to the last 100 records, or if there is something I am not doing correctly with the filter() function. Any help would be appreciated.
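For reference, filter() only narrows the queryset when it is given lookup conditions; with no arguments it returns every row. A minimal sketch of a keyword-filtered lookup, assuming the same variable model and URL pk as in the snippet above:

# filter() returns a queryset restricted to the matching rows;
# called with no arguments it would return the whole table.
context['variables'] = variable.objects.filter(id=self.kwargs['pk'])

# .first() narrows that to a single object (or None), similar to get().
first_match = variable.objects.filter(id=self.kwargs['pk']).first()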
I have a table to which I wrote 1.6 million records; each record has two columns: an ID and a JSON string.
I want to select all of those records and write the JSON from each row out as a file. However, the query result is too large, and I get the associated 403 error:
"403 Response too large to return. Consider specifying a destination table in your job configuration."
I've been looking at the documentation below and understand that it recommends specifying a destination table for the results and viewing them there. BUT all I want to do is SELECT * from the table, so that would effectively just be copying it over, and I feel like I would run into the same issue when querying that result table.
https://cloud.google.com/bigquery/docs/reference/standard-sql/introduction
https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationQuery.FIELDS.allow_large_results
What is the best practice here? Pagination? Table sampling? list_rows?
I'm using the Python client library, as stated in the question title. My current code is just this:
query = f'SELECT * FROM `{project}.{dataset}.{table}`'
return client.query(query)
I should also mention that the IDs are not sequential; they're just alphanumeric.
The best practice, and the most efficient way, is to export your data and then download it instead of querying the whole table (SELECT *).
From there, you can extract the data you need from the exported files (e.g. CSV, JSON, etc.) with Python code, without having to wait for a SELECT * query to finish.
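As a rough illustration of that approach with the Python client library, an extract job can write the table to Cloud Storage as newline-delimited JSON, and you then download the exported files; the project, dataset, table, and bucket names below are placeholders:

from google.cloud import bigquery

client = bigquery.Client()

# Placeholders; substitute your own identifiers.
source = f'{project}.{dataset}.{table}'
destination_uri = 'gs://your-bucket/export/rows-*.json'  # wildcard lets BigQuery shard large exports

job_config = bigquery.ExtractJobConfig(
    destination_format=bigquery.DestinationFormat.NEWLINE_DELIMITED_JSON
)

# extract_table is asynchronous; result() blocks until the export finishes.
extract_job = client.extract_table(source, destination_uri, job_config=job_config)
extract_job.result()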
Is there any solution for retrieving Salesforce data from Python in chunks of more than 2000 records each? I have used the REST API to retrieve data and checked nextRecordsUrl for the next chunk. But with a million records this solution takes a long time. I tried to find a Salesforce parameter to increase the number of records per chunk (>2000 records) but haven't found one yet.
Another idea is that if we knew how many nextRecordsUrl pages there are, we could use multi-threading in Python to retrieve the data. But it seems we have to fetch each chunk just to learn the next nextRecordsUrl.
If you have other ideas, please suggest them. Currently I can't use filter conditions in the query to limit the data.
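For context, a bare-bones sketch of the nextRecordsUrl loop described above, using requests (the instance URL, token, and SOQL are placeholders); each page is capped at the 2000-record batch limit:

import requests

instance_url = 'https://yourInstance.my.salesforce.com'  # placeholder
headers = {'Authorization': 'Bearer <access_token>'}      # placeholder

records = []
url = f'{instance_url}/services/data/v57.0/query'
params = {'q': 'SELECT Id, Name FROM Account'}

while url:
    response = requests.get(url, headers=headers, params=params).json()
    records.extend(response['records'])
    # The next nextRecordsUrl is only known after fetching the current page,
    # which is why this loop is hard to parallelise.
    next_url = response.get('nextRecordsUrl')
    url = f'{instance_url}{next_url}' if next_url else None
    params = None  # only the first request carries the SOQL query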
You could look into using the Bulk API query; it lets you return data in 10K chunks. But it comes with a bit of a shift in thinking. Your normal API is synchronous (give me the next chunk, wait, give me the next chunk, wait). With the Bulk API you submit the job and then ask from time to time, "is it done yet?"
There's even a feature called "PK chunking" (splitting the results by primary key).
Consider going through the trailhead: https://trailhead.salesforce.com/content/learn/modules/large-data-volumes
And maybe play with Salesforce's Data Loader. Query your data the normal way and measure the time, then do it again with the Bulk API option selected. That should give you an idea of where the bottleneck is and whether the big rewrite would gain you anything.
https://developer.salesforce.com/docs/atlas.en-us.230.0.api_asynch.meta/api_asynch/asynch_api_bulk_query_intro.htm
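As a hedged sketch of what the asynchronous flow looks like with a Bulk API 2.0 query over plain REST (endpoints per the docs linked above; the instance URL, token, and SOQL are placeholders):

import time
import requests

instance_url = 'https://yourInstance.my.salesforce.com'  # placeholder
headers = {'Authorization': 'Bearer <access_token>', 'Content-Type': 'application/json'}

# 1. Submit the query job; it runs asynchronously on Salesforce's side.
job = requests.post(
    f'{instance_url}/services/data/v57.0/jobs/query',
    headers=headers,
    json={'operation': 'query', 'query': 'SELECT Id, Name FROM Account'},
).json()

# 2. Poll until the job reaches a terminal state.
while True:
    state = requests.get(
        f'{instance_url}/services/data/v57.0/jobs/query/{job["id"]}',
        headers=headers,
    ).json()['state']
    if state in ('JobComplete', 'Failed', 'Aborted'):
        break
    time.sleep(5)

# 3. Download the results as CSV; the Sforce-Locator response header
#    points to further pages if the result set is large.
results = requests.get(
    f'{instance_url}/services/data/v57.0/jobs/query/{job["id"]}/results',
    headers=headers,
)
print(results.text[:200])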
I am trying to get the number of items in a table from DynamoDB.
Code
def urlfn():
    if request.method == 'GET':
        print("GET REq processing")
        return render_template('index.html', count=table.item_count)
But I am not getting the real count. I found that there is a 6-hour delay in updating the count. Is there any way to get the real count of items in a table?
Assuming that table in your code above is a service resource that is already defined, you can use:
len(table.scan()['Items'])
This will give you an up-to-date count of the items in your table. BUT it reads every single item in your table (and a single scan call returns at most 1 MB of data, so for significantly large tables you have to follow LastEvaluatedKey and paginate, which can take a long time). AND it uses read capacity on your table to do so. So, for most practical purposes it really isn't a very good way to do it.
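If you do go the scan route, here is a sketch of a full count that follows the pagination, assuming table is a boto3 Table resource; Select='COUNT' avoids transferring the items themselves but the scan still consumes read capacity across the whole table:

def count_items(table):
    total = 0
    kwargs = {'Select': 'COUNT'}
    while True:
        response = table.scan(**kwargs)
        total += response['Count']
        # Scan returns at most 1 MB per call; keep paginating until
        # there is no LastEvaluatedKey left.
        last_key = response.get('LastEvaluatedKey')
        if last_key is None:
            return total
        kwargs['ExclusiveStartKey'] = last_key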
Depending on your use case here there are a few other options:
add a meta item that is updated every time a new document is added to the table. This is just a document with whatever hash key / sort key combination you want, with a "value" attribute that you add 1 to every time you add a new item to the database (see the sketch after this list).
you forget about using DynamoDB. Sorry if that sounds harsh, but DynamoDB is a NoSQL database, and attempting to use it in the same manner as a traditional relational database system is folly. The number of 'rows' is not something DynamoDB is designed to report, because that's not its use case. There are no rows in DynamoDB; there are documents, and those documents are partitioned, and you access small chunks of them at a time, meaning that the back-end architecture does not lend itself to knowing what the entire system holds at any given time (hence the 6-hour delay).
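For the first option, a minimal sketch of the counter pattern (the table name, key, and counter attribute below are hypothetical; ADD creates the attribute on first use and increments it atomically afterwards):

import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('my-table')  # hypothetical table name

# Bump the counter every time a new document is written.
table.update_item(
    Key={'pk': 'META'},  # hypothetical key for the metadata item
    UpdateExpression='ADD item_count :one',
    ExpressionAttributeValues={':one': 1},
)

# Reading the current count is then a single GetItem instead of a scan.
count = table.get_item(Key={'pk': 'META'})['Item']['item_count']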
I am new to DynamoDB and want to compare the values of a Python list with the attribute values in a DynamoDB table.
I am able to compare a single value by using a query with an index key:
response = dynamotable.query(
    IndexName='Classicmovies',
    KeyConditionExpression=Key('DDT').eq('BBB-rrr-jjj-mq'))
but I want to compare against the entire list, which would have to go in .eq, something like this:
movies = ['ddd-dddss-gdgdg', 'kkdf-dfdfd-www', 'dfw-gddf-gssg']
I have searched a lot and am not able to figure out the right way to do this.
Hard to say what you are trying to do. A Query will only retrieve a bunch of records belonging to a single item collection. Maybe what you need is a Scan, but please avoid heavy use of Scans unless it's for maintenance purposes.
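If every value in the list targets the same index key, one straightforward pattern is to run one Query per value and merge the results, since KeyConditionExpression does not accept a list of values; a minimal sketch, assuming dynamotable, the Classicmovies index, and the DDT key from the question:

from boto3.dynamodb.conditions import Key

movies = ['ddd-dddss-gdgdg', 'kkdf-dfdfd-www', 'dfw-gddf-gssg']

items = []
for movie_id in movies:
    # One Query per list value; each one hits the index directly.
    response = dynamotable.query(
        IndexName='Classicmovies',
        KeyConditionExpression=Key('DDT').eq(movie_id),
    )
    items.extend(response['Items'])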
I am trying to insert about 1 million records into PostgreSQL. Since I create the table dynamically, I don't have any model associated with it, so I can't use Django's bulk_create.
Is there any method of inserting the data in an efficient manner?
I tried using single INSERT statements, but this is very time-consuming and too slow.
Your problem is not really about Django. You would be better off moving the data (not necessary, but it can help) to the server where you want to insert it and writing a simple Python program or something similar to do the insert.
Avoid inserting data of this size through an HTTP server.
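As a rough illustration of that kind of standalone loader, psycopg2's execute_values batches many rows into each INSERT (PostgreSQL's COPY is faster still); the connection string and table/column names below are placeholders:

import psycopg2
from psycopg2.extras import execute_values

# Placeholder DSN; use your own connection settings.
conn = psycopg2.connect('dbname=mydb user=myuser password=secret host=localhost')

# Example data standing in for the rows of your dynamically created table.
rows = [(i, f'value-{i}') for i in range(1_000_000)]

with conn, conn.cursor() as cur:
    # Batched multi-row INSERTs: far fewer round trips than one INSERT per row.
    execute_values(
        cur,
        'INSERT INTO my_dynamic_table (id, payload) VALUES %s',
        rows,
        page_size=10_000,
    )

conn.close()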