Update logic for data integration - Python

I really need help finding the logic for my update script :(
I have a SQL Server database with customer data that includes a ModifiedDate column, and I am moving this data to Odoo through the External API with exactly the same columns.
My script will run every day, take the new data from the SQL Server database and add it to Odoo; it also has to update those customers whose data has changed.
For creation, I take max(CreateDate) and use it in the query to fetch customers created after it.
But I am not able to find a good solution for the update. Any help?
PS: as the data set is pretty big, I need to find the fastest solution :( Thanks!

I have found a solution.
I will do a one-time full fetch of the data. After that, each daily run takes the date of the previous run as a parameter and uses it in the query:
where ModifiedDate > last_run_date
If anybody has a better idea, please share :) Thanks :)
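A minimal sketch of that watermark approach, assuming pyodbc on the SQL Server side and Odoo's XML-RPC External API on the other; the connection details, the Customers table layout, and the sql_id external-key field on res.partner are all hypothetical:

```python
import pyodbc
import xmlrpc.client

# Placeholders: swap in your real connection details.
SQL_CONN = "DRIVER={ODBC Driver 17 for SQL Server};SERVER=srv;DATABASE=db;UID=u;PWD=p"
ODOO_URL, ODOO_DB, ODOO_USER, ODOO_PWD = "https://odoo.example.com", "db", "user", "pwd"

def fetch_changed_customers(last_run):
    """Return rows created or modified since the last run (the watermark)."""
    with pyodbc.connect(SQL_CONN) as conn:
        cur = conn.cursor()
        cur.execute(
            "SELECT CustomerID, Name, Email FROM Customers "
            "WHERE ModifiedDate > ?", last_run)
        return cur.fetchall()

def sync(last_run):
    common = xmlrpc.client.ServerProxy(f"{ODOO_URL}/xmlrpc/2/common")
    uid = common.authenticate(ODOO_DB, ODOO_USER, ODOO_PWD, {})
    models = xmlrpc.client.ServerProxy(f"{ODOO_URL}/xmlrpc/2/object")

    for cust_id, name, email in fetch_changed_customers(last_run):
        # 'sql_id' is a hypothetical custom field on res.partner that stores
        # the SQL Server key, so each row can be matched to its Odoo record.
        found = models.execute_kw(ODOO_DB, uid, ODOO_PWD, 'res.partner', 'search',
                                  [[('sql_id', '=', cust_id)]])
        vals = {'name': name, 'email': email, 'sql_id': cust_id}
        if found:
            models.execute_kw(ODOO_DB, uid, ODOO_PWD, 'res.partner', 'write',
                              [found, vals])
        else:
            models.execute_kw(ODOO_DB, uid, ODOO_PWD, 'res.partner', 'create',
                              [vals])
```

If ModifiedDate is also populated on insert, one watermark query covers both new and changed customers, so the separate max(CreateDate) pass for creation becomes unnecessary. Persist the run timestamp somewhere durable (a file or a table) before each execution and pass the previous value in as last_run.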

Related

Any idea how to create a price table associated with dates in Django (Python)?

I would like to create a price table keyed by date. I tried to google this for Python and Django, but I still have no idea how to do it. I don't want to create a one-to-one relationship object like an option; I would like the database to associate dates with prices. Sorry, it may be a simple question.
Would it be a solution to create the database using PostgreSQL and read it from Django? Or is there any resource/reference that can get me pointed in the right direction on this problem?
Thanks so much
Well, there is more to it than assigning a price to a date. You will need one or more tables that hold the establishment (hotel) data. These would include the room information, as all rooms will not have the same price. The price will also probably change over time, so you will need to track that. Then there is the reservation information to track. This is just some of the basics; it is not a simple task by any means. I would try a simpler project first to learn Django and how to get data in and out of it. A rough sketch of the date/price core is below.
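For the date-to-price part specifically, a minimal sketch of what the Django models could look like; the model and field names here are made up for illustration:

```python
from django.db import models

class Room(models.Model):
    # Hypothetical room table; a real schema would also link to a Hotel model.
    number = models.CharField(max_length=10)
    room_type = models.CharField(max_length=50)

class RoomPrice(models.Model):
    # One price per room per calendar date, so prices can vary over time.
    room = models.ForeignKey(Room, on_delete=models.CASCADE, related_name='prices')
    date = models.DateField()
    price = models.DecimalField(max_digits=8, decimal_places=2)

    class Meta:
        unique_together = ('room', 'date')  # at most one price per room per day
```

Looking up the price for a given day is then a single query, e.g. RoomPrice.objects.get(room=room, date=checkin_date).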

How can I select distinct names with their corresponding earliest detected time in the Django ORM?

I'm really having a difficult time querying my model in the Django ORM. I need to create a query that selects distinct names together with the earliest time each one was detected. Any help or advice would be very much appreciated.
Note that the real table has hundreds of entries; this is only a simple representation.
PS: I'm new here and don't know how to show the image properly. Sorry.
MyModel.objects.order_by('name', 'timestamp').distinct('name')
I think this would work. Note that distinct() with field arguments is PostgreSQL-only, and the queryset must be ordered by the distinct field first, otherwise PostgreSQL rejects the query with "SELECT DISTINCT ON expressions must match initial ORDER BY expressions".
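If the database is not PostgreSQL, a backend-agnostic alternative is to aggregate instead of using DISTINCT ON. A minimal sketch, assuming the model has name and timestamp fields:

```python
from django.db.models import Min

# Earliest detection time per distinct name; works on any database backend.
earliest = (MyModel.objects
            .values('name')                        # GROUP BY name
            .annotate(first_seen=Min('timestamp'))
            .order_by('name'))
# -> [{'name': 'sensor-a', 'first_seen': datetime(...)}, ...]
```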

Checking whether a date is in a date range in Python, but over 100k records

We are trying to overhaul a scheduling application at the moment. It is written in Python/Django and uses DRF to power a React frontend.
I just have a quick question; apologies if this has been answered already.
I have seen Dietrich Epp's answer to this problem on this thread.
If I have to check whether a time falls between two datetime objects across 100k records, what is the fastest way to achieve this?
I have considered indexing all of the datetimes in Haystack so that Elasticsearch can handle the searching, but I do not want to overcomplicate things if it can be solved simply.
Thanks all!
In my opinion, there is no need for Haystack; you don't need to overcomplicate things. For a database like PostgreSQL or MySQL, 100k records is not that many.
A query like that should resolve in the database in less than 0.1 seconds, and faster with an index. You should only try to improve the performance further if the query is going to be executed more than once per second. So:
- delegate the work to the database (see the sketch below),
- create an index on the date column in the database,
- do not do the filtering in Python,
- and make sure you are not issuing 100k separate queries (a common issue in Django).
Use this approach: https://stackoverflow.com/a/4668718/912450
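A minimal sketch of delegating the range check to the database with the Django ORM; the Booking model and its field names are hypothetical:

```python
from django.db import models

class Booking(models.Model):
    # Indexed datetimes so the range predicate can use a B-tree index.
    start = models.DateTimeField(db_index=True)
    end = models.DateTimeField(db_index=True)

def bookings_containing(when):
    """All bookings whose interval contains `when`, resolved as one
    indexed SQL query instead of a Python loop over 100k rows."""
    return Booking.objects.filter(start__lte=when, end__gte=when)
```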

Counting Records in Azure Table Storage (Year: 2017)

We have a table in Azure Table Storage that is storing a LOT of data (IoT stuff). We are attempting a simple migration away from Azure Table Storage to our own data services.
I'm hoping to get a rough idea of exactly how much data we are migrating.
E.g.: 2,000,000 records for IoT device #1234.
The problem I am facing is getting a count of all the records present in the table under some constraints (e.g. count all records pertaining to IoT device #1234).
I did a fair amount of research and found posts saying that this count feature is not implemented in ATS. These posts, however, were from circa 2010 to 2014.
I assumed (hoped) that this feature has been implemented by now, since it is 2017, and I am trying to find docs for it.
I'm using Python to interact with our ATS.
Could someone please post a link to the docs that show how I can get the count of records using Python (or even HTTP/REST)?
Or if someone knows for sure that this feature is still unavailable, that would help me move on and figure out another way to go about things!
Thanks in advance!
Returning the number of entities in a table is for sure not available in the Azure Table Storage SDK and service. You could run a table-scan query to return all entities from your table, but if you have millions of entities the query will probably time out; it is also going to have a pretty big performance impact on your table. Alternatively, you could make segmented queries in a loop until you reach the end of the table.
> Or if someone knows for sure that this feature is still unavailable, that would help me move on as well and figure another way to go about things!
This feature is still not available or in other words as of today there's no API which will give you a count of total number of rows in a table. You would have to write your own code to do so.
> Could someone please post the link to the docs here that show how I can get the count of records using python (or even HTTP / rest etc)?
For this you would need to list all entities in the table. Since you're only interested in the count, you can reduce the size of the response data by making use of Query Projection and fetching just one or two attributes of the entities (maybe PartitionKey and RowKey). Please see my answer here for more details: Count rows within partition in Azure table storage.
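A minimal sketch of that projection-plus-loop counting approach, assuming the azure-storage Python SDK of that era (azure.storage.table.TableService); the account credentials, table name, and partition key are placeholders:

```python
from azure.storage.table import TableService

table_service = TableService(account_name='myaccount', account_key='mykey')

# Project only PartitionKey so each response segment stays small. The
# generator returned by query_entities follows continuation tokens as it
# is iterated, so this walks every segment to the end of the partition.
count = 0
entities = table_service.query_entities(
    'iotreadings',                        # hypothetical table name
    filter="PartitionKey eq '1234'",      # all records for IoT device #1234
    select='PartitionKey')
for _ in entities:
    count += 1

print(count)
```

This still reads every matching entity once, so it is a migration-time tool rather than something to run continuously; keeping a running counter elsewhere is the usual workaround when the count is needed often.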

Materialize data from cache table to production table [PostgreSQL]

I am trying to find the best solution (performance / simple code) for the following situation:
Consider a database system with two tables, A (production table) and A' (cache table):
Future rows are first added to the A' table so as not to disturb the production one.
When a timer fires (at midnight, for example), rows from A' are incorporated into A.
Duplicates, nonexistent rows, etc. have to be handled.
I've been reading a bit about materialized views, triggers, etc. The problem is that I should not introduce too much noise into the production table, because it is the reference table for a server (a PowerDNS server, in fact).
So, what do you make of it? Am I better off using triggers, materialized views, or doing it programmatically outside of the database? (I'm using Python, BTW.)
Thanks in advance for helping me.
The "best" solution according to the criteria you've laid out so far would just be to insert into the production table.
...unless there's actually something extremely relevant you're not telling us
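If the nightly A' to A merge is kept anyway, a minimal sketch of the deduplicating step, assuming PostgreSQL 9.5+ (for ON CONFLICT), psycopg2, and a made-up two-column schema shared by both tables:

```python
import psycopg2

# Hypothetical schema: both tables have a unique `name` column (e.g. a DNS
# record name) plus a `content` column; adapt to the real PowerDNS schema.
MERGE_SQL = """
    INSERT INTO a (name, content)
    SELECT name, content FROM a_cache
    ON CONFLICT (name)                -- duplicate rows update instead of failing
    DO UPDATE SET content = EXCLUDED.content;
"""

def merge_cache():
    conn = psycopg2.connect("dbname=pdns user=pdns")
    try:
        with conn:                               # one transaction for both steps
            with conn.cursor() as cur:
                cur.execute(MERGE_SQL)           # single set-based merge
                cur.execute("TRUNCATE a_cache")  # empty the cache afterwards
    finally:
        conn.close()
```

One set-based INSERT ... ON CONFLICT touches the production table only briefly once a day, which fits the requirement of keeping noise away from the table PowerDNS reads from.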
