Python writing data on SQL from .csv files

Python writing data on SQL from .csv files - python

I want to automatize the ETL process from .csv files into SQL Server.
First of all I have a issue with .csv files who have this structure (see the next line). As you can see I need to delete the first line and the last 4.
https://ibb.co/Z6rrbPY
I try using pandas and csv services from python, but I don't find the solution. I'm stuck in this part and is the beginning of what I'm trying to do
Let me know what can I do.
Thanks a lot.

Related

How to use Python to automate the movement of data between two Excel workbooks with specific parameters

Thanks for taking the time to read my question.
I am working on a personal project to learn python scripting for excel, and I want to learn how to move data from one workbook to another.
In this example, I am emulating a company employee ledger that has name, position, address, and more (The organizations is by row so every employee takes up one row). But the project is to have a selected number of people be transferred to a new ledger (another excel file). So I have a list of emails in a .txt file (it could even be another excel file but I thought .txt would be easier), and I would want the script to run through the .txt file, get the emails, and look for any rows that have a matching email address(all emails are in cell 'B'). And if any are found, then copy that entire row to the new excel file.
I tried a lot of ways to make this work, but I could not figure it out. I am really new to python so I am not even sure if this is possible. Would really appreciate some help!

You have essentially two packages that will allow manipulation of Excel files. For reading in data and performing analysis the standard package for use is pandas. You can save the files as .xlsx however you are only really working with base table data and not the file itself (IE, you are extracing data FROM the file, not working WITH the file)
However what you need is really to perform manipulation on Excel files directly which is better done with openpyxl
You can also read files (such as your text file) using with open function that is native to Python and is not a third party import like pandas or openpyxl.
Part of learning to program includes learning how to use documentation.
As such, here is the documentation you require with sufficient examples to learn openpyxl: https://openpyxl.readthedocs.io/en/stable/
And you can learn about pandas here: https://pandas.pydata.org/docs/user_guide/index.html
And you can learn about python with open here: https://docs.python.org/3/tutorial/inputoutput.html
Hope this helps.
EDIT: It's possible I or another person can give you a specific example using your data / code etc, but you would have to provide it fully. Since you're learning, I suggest using the documentation or youtube.

What is the correct way to upload files to a SQL Server inside my web application?

I am developing a web application in which users can upload excel files. I know I can use the OPENROWSET function to read data from excel into a SQL Server but I am refraining from doing so because this function requires a file path.
It seems kind of indirect as I am uploading a file to a directory and then telling SQL Server go look in that directory for the file instead of just giving SQL Server the file.
The other option would be to read the Excel file into a pandas dataframe and then use the to_sql function but pandas read_excel function is quite slow and the other method I am sure would be faster.
Which of these two methods is "correct" when handling file uploads from a web application?
If the first method is not frowned upon or "incorrect", then I am almost certain it is faster and will use that. I just want an experienced developers thoughts or opinions. The webapp's backend is Python and flask.

If I am understanding your question correctly, you are trying to load the contents of an xls(s) file into a SQLServer database. This is actually not trivial to do, as depending on what is in the Excel file you might want to have one table, or more probably multiple tables based on the data. So I would step back for a bit and ask three questions:
What is the data I need to save and how should that data be structured in my SQL tables. Forget about excel at this point -- maybe just examine the first row of data and see how you need to save it.
How do I get the file into my web application? For example, when the user uploads a file you would want to use a POST form and send the file data to your server and your server to save that file (for example, either on S3, or in a /tmp folder, or into memory for temporary processing).
Now that you know what your input is (the xls(x) file and its location) and how you need to save your data (the sql schema), now it's time to decide what the best tool for the job is. Pandas is probably not going to be a good tool, unless you literally just want to load the file and dump it as-is with minimal (if any) changes to a single table. At this point I would suggest using something like xlrd if only xls files, or openpyxl for xls and xlsx files. This way you can shape your data any way you want. For example, if the user enters in malformed dates; empty cells (should they default to something?); mismatched types, etc.
In other words, the task you're describing is not trivial at all. It will take quite a bit of planning and designing, and then quite a good deal of python code once you have your design decided. Feel free to ask more questions here for more specific questions if you need to (for example, how to capture the POST data in a file update or whatever you need help with).

How to upload multiple excel documents into one dataset using python?

I an new to code and I would like to know whether it is possible to upload multiple excel documents into one dataset using python? If so, what is the code for this? All of the code I have seen is used for uploading one single excel document. Moreover, do I have to convert the data into CSV form first or I can use code to convert it into CSV after uploading it?
I am using jupyter notebook in anaconda to run my python code.
Your assistance is greatly appreciated.

By uploading, do you mean reading a file? If so, just create a list or dictionary, open the files and write them 1 by 1 into your list / dictionary. Also, it would be really helpful creating CSV files first. If you want to do it manually you can easily by saving the file as CSV in Excel.

Take data from an xls sheet and add them into python commands

I've been asked to create a Python script to automate a server deployment for 80 retail stores.
As part of this script, I have a secondary script that I call to change multiple values in 9 XML files, however, the values are unique for each store, so this script needs to be changed each time, but after I am gone, this is going to be done by semi / non-technical people, so we don't want them to change the Python scripts directly for fear of breaking them.
This in mind, I would like to have these people input the store details into an XLS sheet, and a python file read this sheet and put the data it finds into the existing python script with the data to be changed.
The file will be 2 columns, with the required data in the 2nd one.
I'm sorry if this is a long explanation, but that is the gist of it. I'm using python 2.6. Does anyone have a clue about how I can do this? Or which language might be better for this. I also know Bash and Javascript.
Thanks in advance

Depending on the complexity and the volume of your data
for small Openpyxl,
for large pandas

xlsx file extension not valid after saving with openpyxl and keep_vba=true. Which is the best way?

In the environment, we have an excel file, which includes rawdata in one sheet and pivot table and charts in another sheet.
I need to append rows every day to raw data automatically using a python job.
I am not sure, but there may be some VB Script running on the front end which will refresh the pivot tables.
I used openpyxl and by following its online documentation, I was able to append rows and save the workbook. I used keep_vba=true while loading the workbook to keep the VBA modules inside to enable pivoting. But after saving the workbook, the xlsx is not being opened anymore using MS office and saying the format or the extension is not valid. I can see the data using python but with office, its not working anymore. If I don't use keep_vba=true, then pivoting is not working, only the previous values are present (ofcourse as I understood, as VBA script is needed for pivoting).
Could you explain me what's happening? I am new to python and don't know its concepts much.
How can I fix this in openpyxl or is there any better alternative other than openpyxl. Data connections in MS office is not an option for me.
As I understood, xlsx may need special modules to save the VB script to save in the same way as it may be saved using MS office. If it is, then what is the purpose of keep_vba=true ?
I would be grateful if you could explain in more detail. I would love to know.
As I have very short time to complete this task, I am looking for a quick answer here, instead of going through all the concepts.
Thankyou!

You have to save the files with the extension ".xlsm" rather than ".xlsx". The .xlsx format exists specifically to provide the user with assurance that there is no VBA code within the file. This is an Excel standard and not a problem with openpyxl. With that said, I haven't worked with openpyxl, so I'm not sure what you need to do to be sure your files are properly converted to .xlsm.
Edit: Sorry, misread your question first time around. Easiest step would be to set keep_vba=False. That might resolve your issue right there, since you're telling openpyxl to look for VBA code that can't possibly exist in an xlsx file. Hard to say more than that until you post the relevant section of your code.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.