Having learned SQL before learning any Python, I have a fairly lengthy program/query that I wrote in SQL Server and that heavily transforms and calculates the data (basically it takes forecast, inventory, bill of materials, and efficiency data and automatically generates a production plan; while I am sure there are things I could optimize, the query/program itself is around 3,000 lines).
While I have figured out how to update the data in SQL Server using a combination of pandas, pyodbc, and fast_to_sql, I have not been able to find a simple method for running a SQL Server script through Python.
I am sure that I could achieve the same thing by having the data manipulation happen in Python rather than SQL Server, but it would be fairly time intensive to translate everything.
If there is anything I can do to clarify, please let me know. For reference, I am using Microsoft SQL Server 2017 and Python 3.8.3.
Try to combine all of your MSSQL scripts into stored procedures and then call them from Python.
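For example, once the long script lives in a stored procedure, calling it from Python is a short pyodbc call. Here is a minimal sketch; the procedure name dbo.GenerateProductionPlan, the server, and the database are placeholders I made up, and it assumes Windows authentication:

import pyodbc

# Minimal sketch: run a stored procedure that wraps the long T-SQL script.
# dbo.GenerateProductionPlan and the connection details are placeholders.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=your_server;DATABASE=your_database;"
    "Trusted_Connection=yes;",
    autocommit=True,  # let the procedure manage its own transactions
)
cursor = conn.cursor()
cursor.execute("EXEC dbo.GenerateProductionPlan;")
conn.close()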
I have a question and hope someone can point me in the right direction. Basically, every week I have to run a query (in SSMS) to get a table containing some information (date, clientnumber, clientID, orderid, etc.), and then I copy all the information in that table and paste it into a folder as a CSV file. It takes me about 15 minutes to do all this, but I keep thinking: can I automate it, and if so, how, and can I also schedule it so it runs by itself every week? I believe we live in a technological era and this should be possible without human input, so I hope I can find someone here willing to show me how to do it using Python.
Many thanks for considering my request.
This should be pretty simple to automate:
Use a database adapter that can work with your database; for MSSQL, the one delivered by pyodbc will be fine,
Within the script, connect to the database, perform the query, and parse the output,
Save the parsed output to a .csv file (you can use Python's csv module),
Run the script as a periodic task using cron or schtasks, depending on whether you work on Linux or Windows. A minimal end-to-end sketch follows.
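Putting those steps together, a rough end-to-end sketch might look like the one below; the server, database, query, column names, and output path are placeholders, and it assumes Windows authentication:

import csv
import pyodbc

QUERY = "SELECT OrderDate, ClientNumber, ClientID, OrderID FROM dbo.Orders;"  # placeholder query

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=your_server;DATABASE=your_database;Trusted_Connection=yes;"
)
cursor = conn.cursor()
cursor.execute(QUERY)

with open(r"C:\reports\weekly_orders.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow([column[0] for column in cursor.description])  # header row
    writer.writerows(cursor.fetchall())                            # data rows

conn.close()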
Please note that your question is too broad, and shows no research effort.
You will find that Python can do the tasks you desire.
There are many different ways to interact with SQL servers, depending on your implementation. I suggest you learn Python+SQL using the built-in sqlite3 library. You will want to save your query as a string and pass it into an SQL connection manager of your choice; this depends on your server setup, and there are many different SQL packages for Python.
You can use pandas for parsing the data and saving it to a .csv file (the method is literally called to_csv).
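For illustration, the pandas route can be as short as the sketch below; the connection string and query are placeholders, and it assumes a SQL Server source reached through pyodbc:

import pandas as pd
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=your_server;DATABASE=your_database;Trusted_Connection=yes;"
)
df = pd.read_sql("SELECT * FROM dbo.WeeklyReport;", conn)  # query result as a DataFrame
df.to_csv("weekly_report.csv", index=False)                # write it out as a CSV file
conn.close()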
Python does have many libraries for scheduling tasks, but I suggest you hold off for a while. Develop your code so that it can be run manually, which will still be much faster/easier than doing the work by hand. Once you know your code works, you can easily add a scheduler. The downside is that your program will always need to be running, and you will need to keep checking that it is still running. Personally, I would keep it restricted to manually running the script; you could compile it to an .exe and bind it to a hotkey if you need the accessibility.
I have a question on the general strategy of how to integrate data into an MSSQL database.
Currently, I use python for my whole ETL process. I use it to clean, transform, and integrate the data in an MSSQL database. My data is small so I think this process works fine for now.
However, I think it is a little awkward for my code to constantly read data from and write data to the database. I think this strategy will become an issue once I'm dealing with large amounts of data, and the constant reads/writes seem very inefficient. However, I don't know enough to know whether this is a real problem or not.
I want to know if this is a feasible approach or should I switch entirely to SSIS to handle it. SSIS to me is clunky and I'd prefer not to re-write my entire code. Any input on the general ETL architecture would be very helpful.
Is this practice alright? Maybe?
There are too many factors to give a definitive answer. Conceptually, what you're doing - extract data from the source, transform it, load it to the destination (ETL) - is all that SSIS does. It can likely do some things more efficiently than Python - at least I've had a devil of a time getting a bulk load to work with memory-mapped data; dump to disk and bulk insert that via Python - no problem. But if the existing process works, then let it go until it doesn't work.
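As a concrete illustration of the dump-to-disk-then-bulk-insert route, here is a rough sketch; the staging table, file path, and connection details are placeholders, and the CSV file has to sit somewhere the SQL Server service account can read:

import pandas as pd
import pyodbc

# Stand-in for the transformed data produced by the Python ETL step.
df = pd.DataFrame({"client_id": [1, 2, 3], "amount": [10.5, 20.0, 7.25]})
df.to_csv(r"C:\staging\load.csv", index=False, header=False)

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=your_server;DATABASE=your_database;Trusted_Connection=yes;",
    autocommit=True,
)
# BULK INSERT reads the file server-side, which is usually much faster than
# pushing rows one at a time from Python.
conn.cursor().execute(
    "BULK INSERT dbo.StagingTable "
    r"FROM 'C:\staging\load.csv' "
    "WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\\n');"
)
conn.close()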
If your team knows Python, introducing SSIS just to do ETL is likely going to be a bigger maintenance cost than scaling up your existing approach. On the other hand, if it's standard-ish Python plus libraries and you're on SQL Server 2017+, you might be able to execute your scripts from within the database itself via sp_execute_external_script.
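For example, a call like the one below runs a trivial pass-through script inside the engine. This is only a sketch: it assumes SQL Server 2017+ with Machine Learning Services installed and the 'external scripts enabled' option turned on, and the connection details and inner query are placeholders:

import pyodbc

TSQL = """
EXEC sp_execute_external_script
    @language = N'Python',
    @script = N'OutputDataSet = InputDataSet',
    @input_data_1 = N'SELECT TOP (5) name FROM sys.tables';
"""

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=your_server;DATABASE=your_database;Trusted_Connection=yes;"
)
cursor = conn.cursor()
cursor.execute(TSQL)      # the embedded Python runs inside the database engine
print(cursor.fetchall())  # the pass-through result set
conn.close()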
If the ETL process runs on the same box as the database, then ensure you have sufficient resources to support both processes at their maximum observed levels of activity. If the ETL runs elsewhere, then you'll want to ensure you have fast, full duplex connectivity between the database server and the processing box.
Stand up a load testing environment that parallels production's resources. Dummy up a 10x increase in source data and observe how the ETL fares. 100x, 1000x. At some point, you'll identify what development sins you committed that do not scale and then you're poised to ask a really good, detailed question describing the current architecture, the specific code that does not perform well under load and how one can reproduce this load.
The above design considerations will hold true for Python, SSIS or any other ETL solution - prepackaged or bespoke.
How are you today?
I'm a newbie in Python. I'm working with SQL Server 2014 and Python 3.7. My issue is this: when any change occurs in a table in the database, I want to receive a message (or event, or something like that) on my server (a Web API, if you like that name).
I don't know how to do that with Python.
I do have some prior experience with this: I worked with C# and SQL Server, and in that case I used the "SQL Dependency" mechanism in C# to solve it. It worked really well!
Is there something like that in Python? Many thanks for any ideas!
Thank you so much.
I do not know many things about SQL, but I guess there are tools in SQL to detect those changes. You could then create an everlasting loop in a separate thread (using the threading package) to capture that change. (Remember to use time.sleep() to block the thread so that it doesn't occupy the CPU for too long.) Once you capture the change, you can call the function you want. (You could even design a simple event engine to do that.) I am a newbie in computer science, and I hope my answer is correct and helpful. :)
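For what it's worth, a rough polling sketch along those lines is below; it assumes the watched table has a rowversion column (called RowVersion here, a placeholder), and the table and connection details are placeholders too:

import threading
import time
import pyodbc

def handle_change():
    # Replace with a call into your Web API / event handler.
    print("Table changed!")

def watch_table(poll_seconds=10):
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=your_server;DATABASE=your_database;Trusted_Connection=yes;"
    )
    cursor = conn.cursor()
    last_seen = None
    while True:
        cursor.execute("SELECT MAX(RowVersion) FROM dbo.WatchedTable;")
        current = cursor.fetchone()[0]
        if last_seen is not None and current != last_seen:
            handle_change()
        last_seen = current
        time.sleep(poll_seconds)  # block the thread so it does not hog the CPU

watcher = threading.Thread(target=watch_table, daemon=True)
watcher.start()
watcher.join()  # in a real Web API the main thread would serve requests instead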
I'm connecting to MySQL with the MySQLdb module. I don't want to use Python's time functions: I want to know how long the query ran within MySQL, i.e. the number I see after I've run a query within MySQL directly.
I do see a thread where this is addressed as something one could eventually dig down to, but I was hoping that since MySQL reports that number, the Python connection would have picked it up somewhere.
Maybe this will help:
SET profiling = 1;
Run your query;
SHOW PROFILES;
See here: http://dev.mysql.com/doc/refman/5.7/en/show-profile.html
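If you want to read that number back from Python rather than from the mysql client, you can issue the same commands through MySQLdb; here is a small sketch (the connection details and the query are placeholders):

import MySQLdb

conn = MySQLdb.connect(host="localhost", user="user", passwd="password", db="mydb")
cursor = conn.cursor()

cursor.execute("SET profiling = 1;")
cursor.execute("SELECT COUNT(*) FROM my_table;")  # the query you want timed
cursor.fetchall()

cursor.execute("SHOW PROFILES;")
for query_id, duration, query_text in cursor.fetchall():
    print(query_id, duration, query_text)  # duration is MySQL's own measurement

conn.close()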
Because the above commands will be removed in a future version, the Performance Schema can be used instead: http://dev.mysql.com/doc/refman/5.7/en/performance-schema.html and http://dev.mysql.com/doc/refman/5.7/en/performance-schema-query-profiling.html.
The links above give more details on query profiling using the Performance Schema.
I am a newbie to the Titan graph database.
I am trying to process local data and insert it into Titan.
I am looking for a programming or scripting language that can process the local data quickly and update/insert into Titan.
Bulbs is a Python interface that uses the REST API to update Titan, but I sometimes see the program hang.
Can I use a shell script to process the file and call a Gremlin script to update Titan?
Thanks a lot for advice.
If the graph schema is not too complex and the data in a single file, the easiest way is to simply use a Gremlin script. Check out this simple recipe to load an edge list:
http://gremlindocs.com/#recipes/reading-from-a-file
If you have a large amount of data, consider using the BatchGraph wrapper for easier programming, auto-commit and better performance:
https://github.com/tinkerpop/blueprints/wiki/Batch-Implementation
Once you have your script, you can run it in the Gremlin REPL or execute it from a shell script with gremlin.sh:
https://github.com/tinkerpop/gremlin/wiki/Using-Gremlin-through-Groovy#gremlin-and-groovy-shell
Note that your question is about Titan, but I've responded generically with Blueprints in mind (so you will see TinkerGraph examples in many of these links); since Titan is Blueprints-compatible, the code should work just as well for Titan.
I know this is an old question, but gremlin-migrate is an npm package that runs Gremlin scripts in the order in which they were intended. I'd use it not so much for one-off data loads, but more for continuously ensuring your DB schema etc. is correct and up to date. It's good to include in your CI/CD pipeline :-).
Disclosure: I'm the author of the tool, which I created after reading this and not finding a gremlin based migration tool in npm.