I would like to load a large number of binary values from a file (a .phys file) into Python and then export these values to Excel for graphing purposes. Excel only supports ~32,000 rows at a time, but I have up to 3 million values in some cases. I am able to open the data set in Python using
f = open(r"c:\DR005289_F00001.PHYS", "rb")  # raw string so the backslash is not treated as an escape
How do I then export this file to Excel in a format which Excel can support? For example, how could I break up the data into columns? I don't care how many values are in each column, it can be an arbitrary break depending on what Excel can support.
This has served me well. Use xlwt to put all the data into the file.
I would create a list of lists to break the data into columns. Write each list (pick a length, 10k?) to the Excel file.
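A minimal sketch of the list-of-lists idea, using only the standard library. The binary layout of the .phys file is not given in the question, so the decoder assumes a plain run of little-endian 32-bit floats; the helper names (read_values, to_columns, write_columns_csv) and the 10,000-row column length are illustrative choices. The columns are written as CSV here; the same lists could be written cell by cell to an xlwt worksheet instead.

```python
import csv
import struct
from itertools import zip_longest


def read_values(path, fmt="<f"):
    """Decode a binary file as a flat list of numbers.

    ASSUMPTION: the file is a plain run of little-endian 32-bit floats.
    Change fmt to match the real layout of the .phys file.
    """
    size = struct.calcsize(fmt)
    with open(path, "rb") as f:
        data = f.read()
    usable = len(data) - len(data) % size  # ignore a trailing partial value
    return [item[0] for item in struct.iter_unpack(fmt, data[:usable])]


def to_columns(values, rows_per_column=10_000):
    """Split a flat list into consecutive columns of at most rows_per_column items."""
    return [values[i:i + rows_per_column]
            for i in range(0, len(values), rows_per_column)]


def write_columns_csv(columns, out_path):
    """Write the columns side by side; shorter columns are padded with blanks."""
    with open(out_path, "w", newline="") as f:
        csv.writer(f).writerows(zip_longest(*columns, fillvalue=""))
```

One thing to watch when choosing the column length: the .xls format that xlwt writes allows at most 65,536 rows and 256 columns per sheet, so 3 million values at 10,000 per column (300 columns) would not fit on one sheet, while 65,536 per column needs only 46 columns.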
From Python I want to export a dataframe to CSV format.
The dataframe contains two columns like this
So when I write this:
df['NAME'] = df['NAME'].astype(str) # or .astype('string')
df.to_csv('output.csv',index=False,sep=';')
The CSV output, opened in Excel, shows this:
Excel reads the value "MAY8218" as a date, "may-18", while I want it to be read as "MAY8218".
I've tried many ways but none of them is working. I don't want an alternative like putting quotation marks to the left and the right of the value.
Thanks.
If you want to export the dataframe to use it in Excel, just export it as xlsx. It works for me and keeps the value as a string in its original format.
df.to_excel('output.xlsx',index=False)
The CSV format is a text format. The file contains no hint for the type of the field. The problem is that Excel has the worst possible support for CSV files: it assumes that CSV files always use its own conventions when you try to read one. In short, one Excel implementation can only read correctly what it has written...
That means that you cannot prevent Excel from interpreting the CSV data the way it wants, at least when you open a CSV file. Fortunately you have other options:
Import the CSV file instead of opening it. This time you have options to configure the way the file should be processed.
Use LibreOffice Calc for processing CSV files. LibreOffice is a little behind Microsoft Office on most points except for CSV file handling, where it has excellent support.
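The point that a CSV file carries no type information can be seen with a small round trip through Python's csv module. This is only an illustrative sketch: the NAME column and the value "MAY8218" come from the question, while the VALUE column and the number 42 are made up for the example.

```python
import csv
import io

# NAME is the column from the question; VALUE is a hypothetical second column.
rows = [["NAME", "VALUE"], ["MAY8218", 42]]

# Serialize with the same ';' separator used in the question.
buffer = io.StringIO()
csv.writer(buffer, delimiter=";").writerows(rows)
text = buffer.getvalue()

# The text holds only characters and separators: nothing marks "MAY8218"
# as a string or 42 as a number, so every field comes back as a str.
read_back = list(csv.reader(io.StringIO(text), delimiter=";"))
```

Any program that reads the file, Excel included, has to guess the types from the characters alone, which is where the date conversion comes from.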
I was wondering why I get funny behaviour using a csv file that has been "changed" in excel.
I have a csv file of around 211,029 rows and pass this csv file into pandas using a Jupyter-notebook
The simplest example I can give of a change is clicking the filter icon in Excel, saving the file, unclicking the filter icon and saving again (making no actual changes to the data).
When I pass my csv file through pandas, after a few filter operations, some rows go missing.
This is in comparison to that of doing absolutely nothing with the csv file. Leaving the csv file completely alone gives me the correct number of rows I need after filtering compared to "making changes" to the csv file.
Why is this? Is it because of the number of rows in a csv file? Are we supposed to leave csv files untouched if we are planning to filter through pandas anyways?
(As a side note I'm using Excel on a MacBook.)
Excel does not leave any file "untouched". It applies formatting to every file it opens (e.g. float values like "5.06" may be interpreted as a date and changed to "05 Jun", depending on the regional settings). Depending on the expected datatype, these rows might be displayed wrongly or go missing in your notebook.
Better use sed or awk to manipulate csv files (or a text editor for smaller files).
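To see what Excel actually changed, one option is to keep a copy of the original file and compare it with the re-saved one. The sketch below is a hypothetical helper, not part of any library; it assumes both files are UTF-8 and comma-separated, which may not hold for a file saved by Excel on a Mac.

```python
import csv
from collections import Counter


def load_rows(path):
    """Read all rows as tuples of strings."""
    # ASSUMPTION: UTF-8 text; utf-8-sig also drops a byte-order mark if present.
    with open(path, newline="", encoding="utf-8-sig") as f:
        return [tuple(row) for row in csv.reader(f)]


def diff_rows(original_path, resaved_path):
    """Return (rows only in the original, rows only in the re-saved file).

    Rows are compared as multisets, so duplicates are counted.
    """
    before = Counter(load_rows(original_path))
    after = Counter(load_rows(resaved_path))
    return list((before - after).elements()), list((after - before).elements())
```

If the two lists are non-empty, the differing rows show which values were rewritten, which in turn explains why a later filter in pandas no longer matches them.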
So I have an excel sheet that contains in this order:
Sample_name | column data | column data2 | column data ... n
I also have a .txt file that contains
Sample_name
What I want to do is filter the Excel file for only the sample names contained in the .txt file. My current idea is to go through each column (Excel sheet) and see if it matches any name in the .txt file; if it does, then grab the whole column. However, this seems like an inefficient way to do it. I also need to do this using Python. I was hoping someone could give me an idea on how to approach this better. Thank you very much.
Excel Power Query should do the trick:
Load .txt file as a table (list)
Load sheet with the data columns as another table
Merge (e.g. Left join) first table with second table
Optional: adjust/select the columns to be included or excluded in the resulting table
In Python with Pandas’ data frames the same can be accomplished (joining 2 data frames)
P.S. Pandas supports loading CSV files and txt files (as a variant of CSV) into a data frame
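The join described above amounts to keeping only the rows whose sample name appears in the list. A minimal sketch of that idea with the standard library, assuming the sheet has first been exported to a CSV file with a Sample_name header; the function names and file layout are illustrative.

```python
import csv


def load_names(txt_path):
    """Read one sample name per line into a set, skipping blank lines."""
    with open(txt_path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def filter_rows(csv_path, names, key="Sample_name"):
    """Keep only the rows whose key column is one of the wanted names."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return [row for row in csv.DictReader(f) if row[key] in names]
```

Because the names are held in a set, each row costs a single lookup rather than a scan of the whole list. If the .txt file starts with a Sample_name header line, that line simply ends up in the set and matches nothing. With pandas the same filter can be written as df[df["Sample_name"].isin(names)] after loading the sheet with pd.read_excel.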
Is there a way to create a second sheet in the same CSV file using Python code?
Yes, there is:
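import pandas as pd
from openpyxl import load_workbook
df = pd.read_excel("C:\\DWDM\\Status.xlsx")  # read your original file
workbook = load_workbook(filename="C:\\DWDM\\Status.xlsx")
ws2 = workbook.create_sheet("Summary", 0)  # add a new sheet named "Summary" as the first sheet of the same workbook
workbook.save("C:\\DWDM\\Status.xlsx")  # save, otherwise the new sheet is never written to disk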
You can check the result with "workbook.sheetnames".
You can do this by using multiple CSV files - one CSV file per sheet.
A comma-separated values file is a plain text format. It can only represent flat data, such as a single table (or "sheet").
When storing multiple sheets, you should use separate CSV files. You can write each one separately and import/parse them individually into their destination.
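A minimal sketch of the one-file-per-sheet approach with the standard csv module. The write_sheets helper and the idea of naming each file after its sheet are illustrative assumptions, not an established convention.

```python
import csv
from pathlib import Path


def write_sheets(sheets, out_dir):
    """Write each named table to its own CSV file and return the paths.

    sheets maps a sheet name to a list of rows (each row a list of cells).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, rows in sheets.items():
        path = out_dir / f"{name}.csv"  # one file per "sheet"
        with open(path, "w", newline="") as f:
            csv.writer(f).writerows(rows)
        paths.append(path)
    return paths
```

For example, write_sheets({"Status": status_rows, "Summary": summary_rows}, "out") would produce out/Status.csv and out/Summary.csv, where status_rows and summary_rows stand for whatever row lists you have.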
I used this method (an xls to csv converter) to convert Excel files (xls and xlsx) to CSV files.
But that example uses the csv writer's writerow() method, and I cannot find any method on the csv writer object for writing cell by cell.
So how can I write to csv cell by cell?
There isn't any built-in support for writing cell by cell (or column by column). Why do you want to do this anyway? If you are trying to convert an Excel file to CSV, it is simple enough to do it row by row. If for whatever reason you really just want to write a few specific cells, in a nonsequential manner, you have to manage those writes yourself, perhaps in a 2-dimensional array (a list of lists would work fine) and then write that data structure out row by row to the CSV.
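The list-of-lists approach mentioned above could look roughly like this. It is a sketch: make_grid and write_grid are hypothetical helper names, and the grid size and cell values are arbitrary.

```python
import csv


def make_grid(n_rows, n_cols, fill=""):
    """Create an n_rows x n_cols grid with an independent list for each row."""
    return [[fill] * n_cols for _ in range(n_rows)]


def write_grid(grid, path):
    """Write the whole grid out row by row."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(grid)


# Fill individual cells in any order, addressed as grid[row][column].
grid = make_grid(3, 3)
grid[0][0] = "name"
grid[2][1] = 42
grid[1][2] = "x"
```

Calling write_grid(grid, "cells.csv") then writes the three rows in order, with empty fields for the cells that were never set.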