_id   | name
------+--------
ew293 | item_1
13fse | item_2
dsv82 | item_3
Let's assume this is part of the database, and I want to fetch a limited amount of data.
Ex:
db.collection.find({}, {name:1,_id:0}).limit(40)
Every time I access it, I want the next set of 40 entries.
Is there any such command -- like the one below?
db.collection.find({}, {name:1,_id:0}).limit(40).next()
I want the next set of data only if it exists.
I access this in Python, so I need Python code for this.
Use the skip() method. This is supported in PyMongo (Python Mongo Driver) as well.
To get first 40 entries:
db.collection.find({}, {name:1,_id:0}).limit(40)
To get the next 40 entries:
db.collection.find({}, {name:1,_id:0}).skip(40).limit(40)
To get another 40 entries:
db.collection.find({}, {name:1,_id:0}).skip(80).limit(40)
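In PyMongo the same pagination looks roughly like this minimal sketch (the database name mydb, the collection name collection, and the connection URI are placeholders; it stops as soon as an empty batch comes back):

from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017')
collection = client['mydb']['collection']

def get_page(page_number, page_size=40):
    # page_number starts at 0; skip past the pages already seen
    cursor = (collection
              .find({}, {'name': 1, '_id': 0})
              .skip(page_number * page_size)
              .limit(page_size))
    return list(cursor)

page = 0
while True:
    batch = get_page(page)
    if not batch:          # no more data
        break
    # process batch here ...
    page += 1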
I am using multiprocessing.Pool in my current program because I want to speed up fetching data from an on-premises data center and dumping it into a database on a different server. The current rate is too slow for MBs worth of data; this seems to work best for my current requirement:
def fetch_data():
    # select data from on_prem_db (id, name, ..., data)
    # using Pool and starmap, run dump_data in 5 parallel workers
    dump_data()

def dump_data():
    # insert entry into table_f1
    # insert entry into table_g1
    pass
Now I am running into an issue where multiple workers sometimes fetch already-processed rows, which leads to unique key violations.
E.g.: the first worker fetches [10, 20, 40, 50, 70]
the second worker fetches [30, 40, 60, 70, 80]
The rows with ids 40 and 70 are duplicated. I am supposed to see 10 entries in my db, but I see only 8, and 2 of them raise unique key violations.
How can I make sure that different workers fetch different rows from my source db (the on-prem db), so that my program doesn't try to insert already-inserted rows?
Example of my select query:
fetch_data_list_of_ids = [list of ids of processed data]
data_list = list(itertools.islice(on_prem_db.get_data(table_name), 5))
Is there a way I can keep a list and append the row ids of already-processed data in fetch_data()?
Then, every time data_list runs a new query to fetch data, the next thing I would do is check whether the newly fetched data has ids in the fetch_data_list_of_ids list?
Or is there any other way to make sure duplicate entries are not being processed?
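Something like this minimal sketch is what I have in mind (on_prem_db.get_data and the row objects are placeholders from my pseudocode; the filtering would have to happen in the parent process before rows are handed to the Pool, since worker processes don't share memory):

import itertools

processed_ids = set()  # ids of rows already dumped, kept in the parent process

def fetch_unprocessed_batch(table_name, batch_size=5):
    # pull a batch, then drop any row whose id has already been processed
    batch = list(itertools.islice(on_prem_db.get_data(table_name), batch_size))
    fresh = [row for row in batch if row.id not in processed_ids]
    processed_ids.update(row.id for row in fresh)
    return fresh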
I want to store data in a Django table, as if I am storing items into categories, and access them in a FIFO manner.
Here I should be able to store a data item under a category.
Category:
ID: 1,
Name: firstname
Percentage: 40
I have to store them in a different category: Pass if the Percentage is > 40 and Fail if it is < 40.
Table Fail:
ID 1 --> Inserted first
ID 2 --> Inserted second
ID 3 --> Inserted third
Table Pass:
ID 4 --> Inserted first
ID 5 --> Inserted second
ID 6 --> Inserted third
For some reason, I have to rank them using the First In First Out(FIFO) method.
What is the best way to do that?
I may not have understood your question very well, as I'm missing your data models and the actual function you want to use for this purpose, but the way you are doing it in your pseudocode is correct.
If you did not define a custom primary key field with a different structure, whatever you insert first will get the lower auto-incremented ID, as that is the default behavior for tables in Django. After the insertion, you just need to query over them and select the desired order with the order_by function.
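A minimal sketch of what I mean, assuming hypothetical Pass and Fail models that use Django's default auto-incrementing primary key:

from django.db import models

class Fail(models.Model):
    name = models.CharField(max_length=100)
    percentage = models.IntegerField()

class Pass(models.Model):
    name = models.CharField(max_length=100)
    percentage = models.IntegerField()

# FIFO: the row inserted first has the smallest auto-incremented id
fail_fifo = Fail.objects.order_by('id')
pass_fifo = Pass.objects.order_by('id')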
I am currently working on a Python project with TensorFlow and I need to preprocess my data.
The data I want to use is stored in an sqlite3 database with the columns:
timestamp|dev|event
10:00 |01 | on
11:00 |02 | off
11:15 |01 | off
11:30 |02 | on
And I would like to export the data into a (.csv) file that looks like this:
Timestamp|01 |02 |...
10:00 |on |0 |...
11:00 |on |off|...
11:15 |off|off|...
11:30 |off|on |...
It should always have the latest information for every device associated with the current timestamp; with every new timestamp the old values should stay, and if there is an update, only those value(s) should be updated.
The number of Devices does not change and I can find that number with
SELECT COUNT(DISTINCT dev) FROM table01;
Currently that number is 38 different devices and a total of 10000 entries.
Is there a way to do this computation with sqlite3, or do I have to write a program in Python to process the data? I am new to both topics.
~Fabian
You can do it in SQLite, something along these lines:
select
    timestamp,
    group_concat(case when dev = '01' then event else '' end, '') as D01,
    group_concat(case when dev = '02' then event else '' end, '') as D02
from
    table01
group by
    timestamp;
Basically you are pivoting the table.
The challenge is that the pivot needs to be somewhat dynamic, i.e. the list of devices is not fixed. You need to query the list of devices first and then build this query, i.e. the case ... when ... else part, from that list.
Also, you generally need to group by the timestamp, as the statuses of different devices will be in different rows for a single timestamp.
And if {timestamp, device} is not unique, you need to make it unique first.
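Since the device list isn't fixed, you would build that query from Python. A minimal sketch with the standard sqlite3 and csv modules (the database file name events.db and the output file name pivot.csv are assumptions):

import csv
import sqlite3

conn = sqlite3.connect('events.db')
cur = conn.cursor()

# get the list of devices to pivot on
devices = [row[0] for row in cur.execute('SELECT DISTINCT dev FROM table01 ORDER BY dev')]

# build one CASE expression per device
cases = ', '.join(
    "group_concat(CASE WHEN dev = ? THEN event ELSE '' END, '') AS D{}".format(d)
    for d in devices
)
query = 'SELECT timestamp, {} FROM table01 GROUP BY timestamp ORDER BY timestamp'.format(cases)

# write the pivoted rows to a |-separated csv file
with open('pivot.csv', 'w', newline='') as f:
    writer = csv.writer(f, delimiter='|')
    writer.writerow(['Timestamp'] + devices)
    for row in cur.execute(query, devices):
        writer.writerow(row)

Note that this only pivots the rows; carrying the last known state of each device forward to later timestamps would still need a small extra pass in Python.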
So I am using an Aurora MySQL DB, and my AWS Lambda instance needs to do the following.
Assume a table with two columns, ID and Translated ID.
I have access to a Lambda function which takes an ID as input and outputs the Translated ID. It can also take a list of IDs and give back a list of translated IDs.
The problem is that right now I am doing it row by row, with the workflow being:
1. get top 100 Rows from table, where translated ID is null,
2. for each row, retrieve the ID, use the API to get the translated ID.
3. Update the row with the translated id.
4. rinse and repeat for all 100 rows.
The problem is that, due to the latency of the API call in between, the row-by-row operation is causing the Lambda function to time out. Is there any way to do a batch operation while still aligning the translated IDs vertically with the corresponding IDs? Something like:
Get the top 100 IDs from the table where translated ID is null.
Use the API to take the list of all 100 IDs and get back a corresponding list of 100 translated IDs.
Then (preferably in one single update command) update all 100 rows with their corresponding translated-id column.
4 queries:
(0). Ensure the environment is clean (you can omit this one if you never reuse a database connection).
DROP TEMPORARY TABLE IF EXISTS my_updates;
(1). Create a temp table to hold the new values.
CREATE TEMPORARY TABLE my_updates (
id INT NOT NULL,
translated_id INT NOT NULL,
PRIMARY KEY(id)
);
(2). Insert all the new values in a bulk insert.
INSERT INTO my_updates (id, translated_id)
VALUES (?,?), (?,?), (?,?), ...
Repeat (?,?) × 100 and pass an array of 200 elements to this query. Some MySQL libraries have shortcuts for multi-row inserts; with others you need to build the row parameter placeholder sets yourself.
(3). You now have all 100 new tuples on the server, so you can ask it to do an UPDATE ... JOIN:
UPDATE base_table b
JOIN my_updates m ON m.id = b.id
SET b.translated_id = m.translated_id;
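Put together in Python, the whole flow would look roughly like this minimal sketch (PyMySQL is an assumption here; any DB-API driver works the same way, and its executemany is the kind of multi-row-insert shortcut mentioned above; the connection details and the (id, translated_id) pairs are placeholders):

import pymysql

conn = pymysql.connect(host='...', user='...', password='...', database='...')
pairs = [(10, 1010), (20, 1020)]  # hypothetical (id, translated_id) tuples from the API

with conn.cursor() as cur:
    cur.execute("DROP TEMPORARY TABLE IF EXISTS my_updates")
    cur.execute("""
        CREATE TEMPORARY TABLE my_updates (
            id INT NOT NULL,
            translated_id INT NOT NULL,
            PRIMARY KEY (id)
        )
    """)
    # PyMySQL's executemany rewrites this into a multi-row VALUES insert
    cur.executemany(
        "INSERT INTO my_updates (id, translated_id) VALUES (%s, %s)", pairs)
    cur.execute("""
        UPDATE base_table b
        JOIN my_updates m ON m.id = b.id
        SET b.translated_id = m.translated_id
    """)
conn.commit()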
You could also do this in one query, though it is a little more convoluted:
UPDATE base_table
SET translated_id = CASE id
WHEN #i1 THEN #ti1
WHEN #i2 THEN #ti2
...
WHEN #i100 THEN #ti100
ELSE translated_id END
WHERE id IN (#i1,#i2,...#i100);
I've used #value here as a placeholder to explain what goes where, since this form is less intuitive than the example above, but this query should actually be done with ? placeholders as well. The argument passed would be an array of 300 members: 100 sets of (id, translated_id) values, and then all of the (id) values again for the WHERE clause. The ELSE is a safety precaution... it should never actually be reached, but no data will be overwritten if it is.
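For completeness, a minimal sketch of building that single-query form in Python (pairs and cursor are assumptions, and the driver is taken to accept qmark-style ? placeholders):

pairs = [(10, 1010), (20, 1020)]  # hypothetical (id, translated_id) tuples

when_clauses = ' '.join('WHEN ? THEN ?' for _ in pairs)
in_list = ', '.join('?' for _ in pairs)
sql = ('UPDATE base_table '
       'SET translated_id = CASE id {} ELSE translated_id END '
       'WHERE id IN ({})'.format(when_clauses, in_list))

# 2 params per WHEN/THEN pair, then the ids again for the IN list
params = [v for pair in pairs for v in pair] + [i for i, _ in pairs]
cursor.execute(sql, params)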
I have some predefined columns in a Cassandra table/column family. I also get another JSON/dictionary object with key:value pairs which are not predefined and hence were not included in the CREATE TABLE statement (I am using the cql3 library in Python). I want to add this data to my Cassandra table, with each unique key in the JSON object becoming a new column. If a key that has already been made a column is present in the JSON, Cassandra should just reject that query without throwing an exception. It throws an exception if I try to simply alter the table to add a pre-existing column. Is there an "ALTER TABLE ... ADD ... IF NOT EXISTS" type of query, or should I handle it through exception handling?
Currently there isn't a mechanism for doing this. The IF NOT EXISTS functionality is currently only for CREATE and DROP modification statements (take a look at Conditional schema modifications).
I would remodel the table so that what you call the key becomes a value in a column (as in a column of a table), and use a collection to store its various values, ending up with something like:
 key | date_added                           | values
-----+--------------------------------------+--------------------
   1 | e64d8be0-4f2a-11e4-b409-138283d4b034 | ['item1', 'item2']
   7 | e64d8be0-4f2a-11e4-b409-138283d4b034 | ['item1', 'item5']
   1 | e64d8be0-4f2a-11e4-b409-138283d4b034 | ['item4']
On a side note, if you ran an IF NOT EXISTS alter statement every time you carried out a query, you would incur a lot of overhead verifying that the column doesn't already exist.
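A minimal sketch of that remodelling from Python, using the DataStax cassandra-driver (the driver choice, the keyspace name mykeyspace, and the table name dynamic_data are assumptions; the values column is renamed to vals here just to avoid any clash with the VALUES keyword):

from uuid import uuid1
from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect('mykeyspace')

session.execute("""
    CREATE TABLE IF NOT EXISTS dynamic_data (
        key int,
        date_added timeuuid,
        vals list<text>,
        PRIMARY KEY (key, date_added)
    )
""")

# each key from the incoming JSON becomes a row, not a new column
incoming = {1: ['item1', 'item2'], 7: ['item1', 'item5']}
for key, items in incoming.items():
    session.execute(
        "INSERT INTO dynamic_data (key, date_added, vals) VALUES (%s, %s, %s)",
        (key, uuid1(), items),
    )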