Trouble reading a table with pandas (Python)

I'm really new to programming, and I've been trying to reproduce the pandas.read_table code from the book Python for Data Analysis (the chapter on the MovieLens 1M data set, around p. 23). Below are a link to the data file and screenshots of the Jupyter notebook where I typed the code. As you'll see there, the data values aren't being read properly, and I can't figure out why. Your help will be much appreciated!
Trouble screen
Database file

If you are reading data from a .csv file, use pd.read_csv.
If you want to use pd.read_table, you have to set the delimiter to a comma with the argument sep=','. What is happening is that pd.read_table is trying to split each line at every ::, but it looks like your data is separated by commas instead.
More information here:
http://pandas.pydata.org/pandas-docs/stable/io.html
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_table.html
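As a minimal sketch of the two options above: the sample rows and the column names below are made up for illustration (the real file is not reproduced here), and io.StringIO stands in for a file path.

```python
import io

import pandas as pd

# Hypothetical comma-separated sample standing in for the linked data file.
sample = "1,F,1,10,48067\n2,M,56,16,70072\n"
# Column names are assumptions chosen for illustration.
names = ["user_id", "gender", "age", "occupation", "zip"]

# Option 1: read_csv splits on commas by default.
df_csv = pd.read_csv(io.StringIO(sample), header=None, names=names)

# Option 2: read_table defaults to tabs, so the comma must be given explicitly.
df_table = pd.read_table(io.StringIO(sample), sep=",", header=None, names=names)

print(df_csv)
```

Both calls should produce the same five-column frame. If a file really does use :: as its separator, passing sep='::' together with engine='python' is the usual route, since multi-character separators are handled by the Python parsing engine.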

Related

How to use Python to automate the movement of data between two Excel workbooks with specific parameters

Thanks for taking the time to read my question.
I am working on a personal project to learn python scripting for excel, and I want to learn how to move data from one workbook to another.
In this example, I am emulating a company employee ledger that has name, position, address, and more (the ledger is organized by row, so every employee takes up one row). The project is to transfer a selected number of people to a new ledger (another Excel file). I have a list of emails in a .txt file (it could even be another Excel file, but I thought .txt would be easier), and I want the script to run through the .txt file, get the emails, and look for any rows that have a matching email address (all emails are in column B). If any are found, it should copy that entire row to the new Excel file.
I tried a lot of ways to make this work, but I could not figure it out. I am really new to python so I am not even sure if this is possible. Would really appreciate some help!
There are essentially two packages that let you manipulate Excel files. For reading in data and performing analysis, the standard package is pandas. You can save the results as .xlsx, but you are only really working with the tabular data and not the file itself (i.e., you are extracting data FROM the file, not working WITH the file).
What you need, however, is to manipulate Excel files directly, which is better done with openpyxl.
You can also read files (such as your text file) using the with open construct, which is built into Python and is not a third-party import like pandas or openpyxl.
Part of learning to program includes learning how to use documentation.
As such, here is the documentation you require with sufficient examples to learn openpyxl: https://openpyxl.readthedocs.io/en/stable/
And you can learn about pandas here: https://pandas.pydata.org/docs/user_guide/index.html
And you can learn about python with open here: https://docs.python.org/3/tutorial/inputoutput.html
Hope this helps.
EDIT: It's possible I or another person can give you a specific example using your data / code etc, but you would have to provide it fully. Since you're learning, I suggest using the documentation or youtube.
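For orientation, here is a rough sketch of how the pieces could fit together with openpyxl. The sample names, emails, and column layout are invented for illustration, and the workbooks are built in memory so the snippet runs on its own; with real files you would use openpyxl.load_workbook on the ledger and wb.save on the new workbook (the file names in the comments are hypothetical).

```python
from openpyxl import Workbook


def read_emails(lines):
    """Return the non-empty lines as a set of lower-cased emails."""
    return {line.strip().lower() for line in lines if line.strip()}


def copy_matching_rows(source_ws, target_ws, emails, email_col=2):
    """Append each source row whose email cell (column B by default) matches."""
    copied = 0
    for row in source_ws.iter_rows(values_only=True):
        cell = row[email_col - 1]
        if isinstance(cell, str) and cell.strip().lower() in emails:
            target_ws.append(row)
            copied += 1
    return copied


# Stand-in for the ledger; with a real file: load_workbook("ledger.xlsx").active
source_wb = Workbook()
source_ws = source_wb.active
header = ("Name", "Email", "Position")
source_ws.append(header)
source_ws.append(("Ann", "ann@example.com", "Engineer"))
source_ws.append(("Bob", "bob@example.com", "Manager"))
source_ws.append(("Cy", "cy@example.com", "Analyst"))

# Stand-in for the text file; with a real file:
#     with open("emails.txt") as f:
#         emails = read_emails(f)
emails = read_emails(["Ann@Example.com\n", "\n", "cy@example.com\n"])

target_wb = Workbook()
target_ws = target_wb.active
target_ws.append(header)
copied = copy_matching_rows(source_ws, target_ws, emails)
# target_wb.save("new_ledger.xlsx") would write the result to disk.
```

Matching is done case-insensitively here, which is a design choice rather than a requirement; drop the lower() calls if the comparison should be exact.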

How to Convert a CSV file into a PDF

I'm currently writing a program in Python that creates data and stores it in a text file. The data is laid out in columns, and when I change the file extension to .csv it opens in LibreOffice Calc (the Raspberry Pi's equivalent of Excel), which is exactly how I wanted the data to be formatted.
But I want to take it one step further and convert my CSV data into a PDF. I've looked on the web, but the results explain how to convert a PDF into a CSV, which isn't what I want. I also saw something called pyPDF, but I'm not sure whether it would be of any use.
This is the string of data that is built on each of the 10 loop iterations:
resultStr = 'Test,{},InNum,{},stats,{},Duration(ms),{} \n'.format("OFF",inPin, result, round(duration*1000))
Once the loop finishes, a text file is opened and resultStr is written to it.
Thanks everyone for your help,
~Neamus
Using ReportLab, you can programmatically generate PDF documents from your data. There are plenty of examples available that demonstrate the framework and how to use it. In your case, you can simply append to the document's story (the list of elements ReportLab lays out) in a loop, once for each of your CSV result strings.

Any way to save format when importing an excel file in Python?

I'm doing some work on the data in an excel sheet using python pandas. When I write and save the data it seems that pandas only saves and cares about the raw data on the import. Meaning a lot of stuff I really want to keep such as cell colouring, font size, borders, etc get lost. Does anyone know of a way to make pandas save such things?
From what I've read so far, it doesn't appear to be possible. The best solution I've found so far is to use xlsxwriter to format the file in my code before exporting. This seems like a very tedious task that will involve a lot of testing to figure out how to achieve the various formats and aesthetic changes I need. I haven't found anything on this, but is xlsxwriter in any way able to preserve the sheet's formatting on import?
Alternatively, what would you suggest I do to solve the problem that I have described?
Separate data from formatting. Have a sheet that contains only the data – that's the one you will be reading/writing to – and another that has formatting and reads the data from the first sheet.
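One way this layout could look, sketched with openpyxl rather than pandas (that swap is an assumption of this example, as are the sheet names Data and Report and the helper write_data): the formatted sheet only holds formulas and styles, and the code only ever writes to the data sheet.

```python
from openpyxl import Workbook
from openpyxl.styles import Font

wb = Workbook()
data_ws = wb.active
data_ws.title = "Data"              # raw values only; the code writes here
report_ws = wb.create_sheet("Report")  # formatting lives here

# The formatted cell pulls its value from the data sheet via a formula.
report_ws["A1"] = "=Data!B1"
report_ws["A1"].font = Font(bold=True, size=14)


def write_data(ws, rows):
    """Overwrite cells of the data sheet, starting at A1, with plain values."""
    for r, row in enumerate(rows, start=1):
        for c, value in enumerate(row, start=1):
            ws.cell(row=r, column=c, value=value)


write_data(data_ws, [["total", 42]])
# wb.save("report.xlsx") would write both sheets; Excel evaluates the formula on open.
```

Because nothing ever writes to the Report sheet, its fonts, colours, and borders are not touched when the data changes.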

How can I adapt my code to make it compatible to Microsoft Excel?

Problem
I was trying to implement a web API (based on Flask) that queries the database under some specific conditions, reconstructs the data, and finally exports the result to a .csv file.
Since the amount of data is really huge, I cannot construct the whole dataset and generate the .csv file all at once (e.g. create a DataFrame using pandas and finally call df.to_csv()), because that would make the query slow and the HTTP connection might time out.
So I created a generator that queries the database 500 records at a time and yields the results one by one, like:
def __generator(q):
    [...] # some code here
    while True:
        records = q[offset:offset+limit] # q means a sqlalchemy query object
        if not records: # stop once a page comes back empty
            break
        [...] # omit some reconstruct code
        for record in records:
            yield record
        offset += limit # advance to the next page
and finally construct a Response object, and send .csv to client side:
return Response(__generator(q), mimetype='text/csv') # Flask
The generator works well and all data are encoded as 'utf-8', but when I try to open the .csv file in Microsoft Excel, the text appears as garbled characters.
Measures Already Tried
adding a BOM header to the exported file: doesn't work;
using other encodings such as 'gb18030' and 'cp936': most of the garbled characters disappear, but some remain, and parts of the table structure become weird.
My Question Is
How can I make my code compatible with Microsoft Excel? That means at least two conditions should be satisfied:
no garbled characters, everything displayed correctly;
a well-structured table;
I would really appreciate your answer!
How are you importing the csv file to excel? Have you tried importing the csv as a text file?
If you import each column as text, Excel won't modify columns that it would otherwise interpret as other types, such as dates. Your code may be correct, and Excel may just be modifying the data when it parses the file as CSV; by importing as text, it won't modify anything.
I would recommend you look into xlutils. It's been around for quite some time, and our company has used it both for reading configuration files to run automated tests and for generating reports of test results.
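Separately from how Excel imports the file, one possible cause of a broken table structure is fields that contain commas, quotes, or newlines and are written without quoting. The sketch below is an assumption about what might help, not a confirmed fix for the question: it yields CSV text chunk by chunk through the standard csv module, so it could feed a streaming response. The function name csv_chunks and the sample rows are made up.

```python
import csv
import io
import itertools


def csv_chunks(header, rows):
    """Yield CSV text one row at a time, quoting fields where needed."""
    buf = io.StringIO()
    writer = csv.writer(buf)  # quotes commas, quotes and newlines inside fields
    for row in itertools.chain([header], rows):
        writer.writerow(row)
        yield buf.getvalue()
        # Reset the buffer so each chunk contains exactly one row.
        buf.seek(0)
        buf.truncate(0)


# rows can be any iterable, including a generator that pages through a query.
sample_rows = (r for r in [["a,b", 'he said "hi"'], ["plain", "text"]])
text = "".join(csv_chunks(["x", "y"], sample_rows))
parsed = list(csv.reader(io.StringIO(text)))
```

In a Flask view, the generator returned by csv_chunks could be passed to Response in place of a generator that yields raw records.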

Any way to create an excel sheet with data from a url with python

Sorry if the title is confusing. Basically what I am trying to do is create an excel sheet with data that is in a url that I have.
The url is a search API for twitter that retrieves the past 100 tweets with a given keyword of my choice. I am trying to create an excel sheet that stores each tweet in its own row. Essentially it will only be 1 column but will be 100 rows.
I have looked online but haven't really seen a way to do exactly what I need so if anyone knows a tutorial i should look at or could show me how to get started that would be great.
Thanks!
There will probably not be a tutorial on exactly how to do this. You need to put a few different concepts together:
Get the data from the url. This can be as simple as urllib.request.urlopen (urllib.urlopen in Python 2)
Turn that data (a string) into a usable format. Twitter will probably return JSON. Turn that into a python dict
Open a file for writing
Loop through the twitter data and write to the output file
You only need to create a .csv file. It will work great with excel. For one column file you just need to write the header then write each line of data. Python provides everything you need to create well formed csv files in the csv module in the standard library
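The steps above can be sketched roughly as follows. The JSON layout (a "results" list whose items have a "text" key) is an assumption made for illustration and may not match what the API actually returns; the payload is hard-coded and written to an in-memory buffer so the snippet runs without a network connection.

```python
import csv
import io
import json


def tweets_to_csv(payload, out):
    """Write one tweet per row to out; return the number of tweets written."""
    data = json.loads(payload)      # JSON string -> Python dict
    writer = csv.writer(out)
    writer.writerow(["tweet"])      # header for the single column
    count = 0
    for item in data["results"]:    # assumed structure, see note above
        writer.writerow([item["text"]])
        count += 1
    return count


# With a real URL, the payload could come from:
#     urllib.request.urlopen(url).read().decode("utf-8")
payload = '{"results": [{"text": "first tweet"}, {"text": "second, with a comma"}]}'

# With a real file: open("tweets.csv", "w", newline="", encoding="utf-8")
out = io.StringIO()
written = tweets_to_csv(payload, out)
csv_text = out.getvalue()
```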
