I am new to coding and would like to know whether it is possible to upload multiple Excel documents into one dataset using Python. If so, what is the code for this? All of the code I have seen uploads a single Excel document. Also, do I have to convert the data to CSV first, or can I use code to convert it to CSV after uploading it?
I am using Jupyter Notebook in Anaconda to run my Python code.
Your assistance is greatly appreciated.
By uploading, do you mean reading a file? If so, create a list or dictionary, then open the files and add their contents to it one by one. It also helps to convert the files to CSV first. To do that manually, save each file as CSV from Excel.
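As a minimal sketch of the "read the files one by one into a list" idea, assuming pandas is installed (plus openpyxl for .xlsx files) and that every workbook has the same columns on its first sheet. The function name combine_files, the folder layout and the source_file column are made up for illustration.

```python
from pathlib import Path

import pandas as pd


def combine_files(folder, pattern="*.xlsx", reader=pd.read_excel):
    """Read every file in `folder` matching `pattern` into one DataFrame."""
    frames = []
    for path in sorted(Path(folder).glob(pattern)):
        frame = reader(path)              # first sheet only, by default
        frame["source_file"] = path.name  # remember where each row came from
        frames.append(frame)
    if not frames:
        raise FileNotFoundError(f"no files matching {pattern} in {folder}")
    # Stack the per-file tables on top of each other with a fresh index.
    return pd.concat(frames, ignore_index=True)
```

With this approach there is no need to convert to CSV beforehand: something like combine_files("data").to_csv("combined.csv", index=False) would write the combined result out as CSV afterwards, where "data" is a placeholder folder name.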
I'm working on a project that needs to update a CSV file with user info periodically. The CSV is stored in an S3 bucket, so I'm assuming I would use boto3 to do this. However, I'm not exactly sure how to go about it. Would I need to download the CSV from S3 and then append to it, or is there a way to do it directly? Any code samples would be appreciated.
DynamoDB would suit this well, as long as you can define a hash key. Your current approach would require the following steps:
Download the CSV.
Append the new values to the CSV file.
Upload the CSV.
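The three steps above can be sketched roughly as follows. This is only an illustration: it assumes the file is UTF-8 and small enough to hold in memory, and the function name append_rows_to_s3_csv is made up. The s3_client argument is expected to be a boto3 S3 client, of which only get_object and put_object are used.

```python
import csv
import io


def append_rows_to_s3_csv(s3_client, bucket, key, rows):
    """Download a CSV object, append `rows` to it, and upload it again."""
    # 1. Download the current contents of the object.
    body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
    text = body.decode("utf-8")

    # 2. Append the new rows in memory.
    buffer = io.StringIO()
    buffer.write(text)
    if text and not text.endswith("\n"):
        buffer.write("\n")  # make sure new rows start on their own line
    csv.writer(buffer, lineterminator="\n").writerows(rows)

    # 3. Upload the result, replacing the old object.
    s3_client.put_object(
        Bucket=bucket, Key=key, Body=buffer.getvalue().encode("utf-8")
    )
```

A call might look like append_rows_to_s3_csv(boto3.client("s3"), "my-bucket", "users.csv", [["bob", "bob@example.com"]]), where the bucket and key names are placeholders.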
A big issue here (depending on how the updates are scheduled) is that the CSV file may be modified by more than one process between download and upload. In that case the last upload overwrites the others, which leads to data loss.
Using something like DynamoDB, you could have a table and just use the put_item API call to add new values as you see fit. Then, whenever you wish, you could write a Python script to scan for all the values and write them to a CSV file however you like!
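A rough sketch of that idea, assuming `table` is a boto3 DynamoDB Table resource (for example boto3.resource("dynamodb").Table("users"), where the table name is a placeholder). The helper names add_user, scan_all and export_to_csv are invented for this example.

```python
import csv


def add_user(table, user):
    """Store one user record; `user` must include the table's hash key."""
    table.put_item(Item=user)


def scan_all(table):
    """Return every item in the table, following scan pagination."""
    response = table.scan()
    items = list(response.get("Items", []))
    # A scan returns results in pages; keep going while there is a next page.
    while "LastEvaluatedKey" in response:
        response = table.scan(ExclusiveStartKey=response["LastEvaluatedKey"])
        items.extend(response.get("Items", []))
    return items


def export_to_csv(table, path, fieldnames):
    """Write all items to a CSV file with the given column order."""
    with open(path, "w", newline="") as handle:
        # Attributes not listed in `fieldnames` are skipped; missing ones
        # are written as empty cells.
        writer = csv.DictWriter(
            handle, fieldnames=fieldnames, extrasaction="ignore"
        )
        writer.writeheader()
        writer.writerows(scan_all(table))
```

Because each put_item call writes a single item, concurrent writers do not overwrite each other's rows the way concurrent whole-file uploads can.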
Thanks for taking the time to read my question.
I am working on a personal project to learn Python scripting for Excel, and I want to learn how to move data from one workbook to another.
In this example, I am emulating a company employee ledger that has name, position, address, and more (the data is organized by row, so every employee takes up one row). The goal is to transfer a selected number of people to a new ledger (another Excel file). I have a list of emails in a .txt file (it could even be another Excel file, but I thought .txt would be easier), and I want the script to run through the .txt file, get the emails, and look for any rows that have a matching email address (all emails are in column B). If any are found, the script should copy that entire row to the new Excel file.
I tried a lot of ways to make this work, but I could not figure it out. I am really new to Python, so I am not even sure if this is possible. I would really appreciate some help!
You have essentially two packages that allow manipulation of Excel files. For reading in data and performing analysis, the standard package is pandas. You can save the results as .xlsx, but you are only really working with the underlying table data and not the file itself (i.e., you are extracting data FROM the file, not working WITH the file).
However, what you really need is to manipulate Excel files directly, which is better done with openpyxl.
You can also read files (such as your text file) using Python's built-in open function in a with statement, which needs no third-party import like pandas or openpyxl.
Part of learning to program includes learning how to use documentation.
As such, here is the documentation you require with sufficient examples to learn openpyxl: https://openpyxl.readthedocs.io/en/stable/
And you can learn about pandas here: https://pandas.pydata.org/docs/user_guide/index.html
And you can learn about Python's with open here: https://docs.python.org/3/tutorial/inputoutput.html
Hope this helps.
EDIT: It's possible that I or another person can give you a specific example using your data, code, etc., but you would have to provide it in full. Since you're learning, I suggest using the documentation or YouTube.
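In the meantime, here is a generic sketch of how the pieces could fit together, without knowing the real files. It assumes one email per line in the text file, emails in column B, and a single header row that should be carried over. All function names and defaults are made up for illustration, and openpyxl must be installed to call transfer.

```python
def load_emails(txt_path):
    """Read one email per line into a set, ignoring blank lines and case."""
    with open(txt_path, encoding="utf-8") as handle:
        return {line.strip().lower() for line in handle if line.strip()}


def copy_matching_rows(source_ws, target_ws, emails, email_column=2,
                       header_rows=1):
    """Append rows whose email cell is in `emails` to `target_ws`."""
    copied = 0
    rows = source_ws.iter_rows(values_only=True)
    for index, row in enumerate(rows, start=1):
        if index <= header_rows:
            target_ws.append(row)  # assumption: keep the header row(s)
            continue
        cell = row[email_column - 1] if len(row) >= email_column else None
        if isinstance(cell, str) and cell.strip().lower() in emails:
            target_ws.append(row)
            copied += 1
    return copied


def transfer(ledger_path, emails_path, output_path):
    """Copy matching rows from the ledger's active sheet to a new workbook."""
    # Imported here so the helpers above can be used without openpyxl.
    from openpyxl import Workbook, load_workbook

    source_ws = load_workbook(ledger_path, read_only=True).active
    target_wb = Workbook()
    emails = load_emails(emails_path)
    copied = copy_matching_rows(source_ws, target_wb.active, emails)
    target_wb.save(output_path)
    return copied
```

A call such as transfer("ledger.xlsx", "emails.txt", "selected.xlsx") would then write the selected employees to a new file; the three file names are placeholders. Note that only cell values are copied, not formatting.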
I want to automate the ETL process from .csv files into SQL Server.
First of all, I have an issue with .csv files that have this structure (see the next line). As you can see, I need to delete the first line and the last 4.
https://ibb.co/Z6rrbPY
I tried using pandas and the csv module from Python, but I couldn't find a solution. I'm stuck on this part, and it is only the beginning of what I'm trying to do.
Let me know what I can do.
Thanks a lot.
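One possible approach to the trimming step, as a minimal sketch using only the standard library. It assumes the file fits in memory and that no quoted field contains a line break. The function name read_trimmed_csv and its defaults are made up, and loading the rows into SQL Server is not covered here.

```python
import csv


def read_trimmed_csv(path, skip_head=1, skip_tail=4, encoding="utf-8"):
    """Parse a CSV file, ignoring its first and last few physical lines."""
    with open(path, newline="", encoding=encoding) as handle:
        lines = handle.read().splitlines()
    # Keep everything between the leading and trailing junk lines.
    body = lines[skip_head:len(lines) - skip_tail]
    return list(csv.reader(body))
```

With pandas, the same effect should be available through pandas.read_csv(path, skiprows=1, skipfooter=4, engine="python"), since skipfooter is only supported by the Python parsing engine.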
I'm working with fairly large data and I need to write it to an xlsx file. Sometimes these files can be as large as 15GB. I have Python code that receives data as dataframes and writes it to Excel continuously, so I need to append to an existing Excel file and an existing sheet. I was using 'openpyxl'.
There are two problems that I faced while working with that library.
Firstly, to append to an existing Excel file it needs to load the whole workbook, which is impossible for me because of the data size. I must use as little RAM as possible.
Secondly, this library is only useful for writing to different sheets. When I try to write data to the same sheet, even if I give the 'startrow' for the saving process, it deletes the old data and writes the new data starting from that row.
I already tried the solution available here to address my problem but it doesn't fit my requirements.
Do you have any idea how I can do this?
I'm currently making a program in Python that creates data, which then gets stored in a text file. The data is laid out in columns, and when I change the file extension to csv, it opens in LibreOffice Calc (the Raspberry Pi's equivalent of Excel), which is exactly how I wanted the data to be formatted.
But I want to take it one step further and convert my CSV data into a PDF. I've looked on the web, but the results explain how to convert a PDF into a CSV, which isn't what I want. I also saw something called pyPDF, but I'm not sure whether that would be of any use.
This is the string of data that is generated in a loop that runs 10 times:
resultStr = 'Test,{},InNum,{},stats,{},Duration(ms),{} \n'.format("OFF",inPin, result, round(duration*1000))
Once the loop finishes, a text file is opened and 'resultStr' is written to it.
Thanks everyone for your help,
~Neamus
Using ReportLab, you can programmatically generate PDF documents from your data. There are plenty of examples available that demonstrate the framework and how to use it. In your case, you should simply append each of your CSV result strings to your document's story in a loop.