I used this xls-to-csv converter method to convert Excel files (xls and xlsx) to csv files.
But in that example, it uses the csv writer's writerow() method, and I cannot find any method on the csv writer object for writing cell by cell.
So how can I write to csv cell by cell?
There isn't any built-in support for writing cell by cell (or column by column). Why do you want to do this anyway? If you are trying to convert an Excel file to CSV, it is simple enough to do it row by row. If for whatever reason you really just want to write a few specific cells in a nonsequential manner, you have to manage those writes yourself, perhaps in a two-dimensional structure (a list of lists would work fine), and then write that data structure out row by row to the CSV.
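A minimal sketch of that idea with the standard csv module (the grid size, file name, and cell values here are placeholders):
import csv

# Hold a grid of cells in memory; write individual cells in any order,
# then flush the whole structure to the CSV row by row.
rows = [["" for _ in range(3)] for _ in range(2)]  # 2x3 grid of empty cells
rows[0][0] = "a"
rows[1][2] = "b"

with open("out.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)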
I was wondering why I get funny behaviour using a csv file that has been "changed" in Excel.
I have a csv file of around 211,029 rows and pass it into pandas in a Jupyter notebook.
The simplest example I can give of a change is clicking the filter icon in Excel, saving the file, unclicking the filter icon, and saving again (making no physical changes to the data).
When I pass my csv file through pandas, after a few filter operations, some rows go missing.
This is in comparison to doing absolutely nothing with the csv file: leaving it completely alone gives me the correct number of rows after filtering, while the "changed" file does not.
Why is this? Is it because of the number of rows in a csv file? Are we supposed to leave csv files untouched if we are planning to filter through pandas anyways?
(As a side note I'm using Excel on a MacBook.)
Excel does not leave any file "untouched". It applies formatting to every file it opens (e.g. float values like "5.06" may be interpreted as a date and changed to "05 Jun"). Depending on the expected datatype, these rows might be displayed wrongly or go missing in your notebook.
It is better to use sed or awk to manipulate csv files (or a text editor for smaller files).
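A quick way to check what Excel actually rewrote, sketched with pandas (the file names are placeholders; compare() assumes both frames have the same shape and labels):
import pandas as pd

# Load every column of both files as raw strings so pandas does not
# apply any type conversion of its own, then diff them cell by cell.
original = pd.read_csv("original.csv", dtype=str)
resaved = pd.read_csv("resaved_in_excel.csv", dtype=str)
print(original.compare(resaved))  # prints only the cells that differ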
I'm running a python script to automate some of my day-to-day tasks at work. One task I'm trying to do is simply add a row to an existing ods sheet that I usually open via LibreOffice.
This file has multiple sheets and depending on what my script is doing, it will add data to different sheets.
The thing is, I'm having trouble finding a simple and easy way to just add some data to the first unpopulated row of the sheet.
Reading about odslib3, pyexcel and other packages, it seems that to write a row I need to specify the exact row number and column to write to, and opening the ods file just to see which cell is free and then telling the python script seems unproductive.
Is there a way to easily add a row of data to an ods sheet without specifying the row number and column?
If I understand the question, I believe that using a .remove() and an .append() will do the trick. It will create and populate data on the last row (I can't say it's the most efficient, though).
For example:
from pyexcel_ods3 import get_data, save_data

data = get_data("info.ods")
print(data["Sheet1"])
# [['first_row', 'first_row'], []]

if [] in data["Sheet1"]:
    data["Sheet1"].remove([])  # remove the unpopulated row
data["Sheet1"].append(["second_row", "second_row"])  # add the new row

print(data["Sheet1"])
# [['first_row', 'first_row'], ['second_row', 'second_row']]

save_data("info.ods", data)  # write the change back to the file
[screenshot of the Excel data]
This is the data I have in an Excel file. There are 10 sheets containing different data, and I want to sort the data in each sheet by the 'BA_Rank' column in descending order.
After sorting the data, I have to write the sorted data to an Excel file.
(for example, the data which was present in sheet1 of the unsorted file should be written to sheet1 of the sorted file, and so on...)
If I remove the heading from the first row, I can use the pandas sort_values() function to sort the data in the first sheet and save it to another file,
like this:
import pandas as pd
import xlrd

doc = xlrd.open_workbook('without_sort.xlsx')
xl = pd.read_excel('without_sort.xlsx')
length = doc.nsheets
# print(length)
# for i in range(0, length):
#     sheet = xl.parse(i)
result = xl.sort_values('BA_Rank', ascending=False)
result.to_excel('SortedData.xlsx')
print(result)
So is there any way I can sort the data without removing the header from the first row?
And how can I iterate over the sheets so as to sort the data present in each of them?
(Note: All the sheets contain the same columns and I need to sort every sheet using 'BA_Rank' in descending order.)
First input: you don't need to call xlrd when using pandas; it's done under the hood.
Secondly, the read_excel method is REALLY smart. You can (imo should) define the sheet you're pulling data from. You can also set lines to skip, say where the header line is, or ignore it (and then set column names manually). Check the docs, it's quite extensive.
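For instance, a hedged illustration of those options (the sheet choice and the manual column list are placeholders):
import pandas as pd

# Pull a specific sheet and tell pandas where the header line is.
df = pd.read_excel('without_sort.xlsx',
                   sheet_name=0,  # sheet name or index
                   header=0,      # row holding the column names
                   skiprows=0)    # lines to skip before the header

# Or ignore the header row entirely and set the column names manually.
df = pd.read_excel('without_sort.xlsx', header=None, skiprows=1,
                   names=['col_1', 'BA_Rank'])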
If the "10 sheets" figure is merely anecdotal, you could use something like xlrd to extract the workbook's sheet count and work by index (or extract the sheet names directly).
The sorting looks right to me.
Finally, if you want to save it all in the same workbook, I would use openpyxl or some similar library (there are many others, like pyexcelerate for large files).
This procedure pretty much always looks like:
Create/Open destination file (often it's the same method)
Write down data, sheet by sheet
Close/Save file
If the data is to be written all on the same sheet, pd.concat([all_dataframes]).to_excel("path_to_store") should get it done.
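Otherwise, keeping each sheet separate, a sketch of the whole procedure (assuming every sheet has a header row containing 'BA_Rank'):
import pandas as pd

# sheet_name=None reads every sheet into a dict keyed by sheet name.
sheets = pd.read_excel('without_sort.xlsx', sheet_name=None)

# Create the destination file, write sheet by sheet, save on exit.
with pd.ExcelWriter('SortedData.xlsx') as writer:
    for name, df in sheets.items():
        df.sort_values('BA_Rank', ascending=False).to_excel(
            writer, sheet_name=name, index=False)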
So here is my situation. Using Python, I want to copy specific columns from an Excel spreadsheet into specific columns of a csv worksheet.
The pre-filled column header names are named differently in each spreadsheet, and I need to use a sublist as a parameter.
For example, in the first sublist, a data column in Excel needs to be copied from/to:
spreadsheet => csv
"scan_date" => "date_of_scan"
Two sublists as parameters: one of names copied from excel, one of names of where to paste into csv.
Not sure if a dictionary sublist would be better than two individual sublists?
Also, the csv column header names are in the second row (not the first row as in the Excel file), which has complicated things such as building the data frames.
So, ideally, I would like to have the sublists converted to arrays; then:
the script iterates the spreadsheet's columns to find "scan_date",
copies the data,
iterates to find "date_of_scan" in the csv,
pastes the data,
moves on to the second item in the sublists, and repeats.
I've tried pandas and openpyxl and just can't seem to figure out the approach/syntax of how to do it.
Any help would be greatly appreciated.
Thank you.
Clarification edit:
The csv file has some preexisting data in it. Also, I cannot move the headers to different columns. So, if "date_of_scan" is in column "RF", then it must stay in column "RF". I was able to copy, say, 5 columns of data from Excel into a temp spreadsheet and then concatenate them into the csv, but it always moved the pasted columns to the beginning of the csv document (columns A, B, C, D, E).
It is hard to know the answer without seeing your specific dataset, but it seems to me that a simpler approach might be to simply make your Excel sheet a df, drop everything except the columns you want in the csv, rename them, and then write a csv with pandas. Here's some pseudo-code.
import pandas as pd

df = pd.read_excel('your_file_name.xlsx')
drop_cols = []  # list of columns to get rid of
df = df.drop(drop_cols, axis='columns')  # drop() returns a copy, so reassign
# map your new columns however you want; in this example a, b, c are the
# old columns and x, y, z are the new ones
col_dict = {'a': 'x', 'b': 'y', 'c': 'z'}
# this line will actually rename your columns with the dictionary
df = df.rename(columns=col_dict)
df.to_csv('new_file_name.csv')  # write new file
And this will actually run in Python, but I created the df from dummy data instead of an Excel file:
import pandas as pd

# with dummy data: a single row with columns a, b, c
df = pd.DataFrame([0, 1, 2], index=['a', 'b', 'c']).T
col_dict = {'a': 'x', 'b': 'y', 'c': 'z'}
df = df.rename(columns=col_dict)
df.to_csv('new_file_name.csv')  # write new file
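And for the clarified requirement that each csv column must stay where it is, a sketch that overwrites values in place (the file names, the header-row position, and the column mapping are all assumptions):
import pandas as pd

# The csv's headers are assumed to sit on its second row (header=1);
# note the row above them is not preserved by this round trip.
src = pd.read_excel('source.xlsx')
dest = pd.read_csv('target.csv', header=1)

mapping = {'scan_date': 'date_of_scan'}  # excel name -> csv name
for excel_col, csv_col in mapping.items():
    # Overwrite the values only, so the csv's column order is untouched
    # (assumes both files have the same number of data rows).
    dest[csv_col] = src[excel_col].values

dest.to_csv('target.csv', index=False)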
I would like to load a large number of binary values from a file (a .phys file) into Python and then export these values to Excel for graphing purposes. Excel only supports ~32,000 rows at a time, but I have up to 3 million values in some cases. I am able to load the data set into Python using
f = open(r"c:\DR005289_F00001.PHYS", "rb")  # raw string, so the backslash isn't treated as an escape
How do I then export this file to Excel in a format which Excel can support? For example, how could I break up the data into columns? I don't care how many values are in each column, it can be an arbitrary break depending on what Excel can support.
This approach has served me well: use xlwt to put all the data into the file.
I would create a list of lists to break the data into columns, then write each list (pick a length, 10k?) to the Excel file.
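A sketch of that approach (it assumes the .phys file is a flat sequence of 4-byte little-endian floats; adjust the struct format to the real record layout):
import struct
import xlwt

with open(r"c:\DR005289_F00001.PHYS", "rb") as f:
    raw = f.read()
values = struct.unpack("<%df" % (len(raw) // 4), raw)

# Break the flat sequence into columns of an arbitrary length.
chunk = 10000
columns = [values[i:i + chunk] for i in range(0, len(values), chunk)]

wb = xlwt.Workbook()
ws = wb.add_sheet("data")
for col, column in enumerate(columns):
    for row, value in enumerate(column):
        ws.write(row, col, value)
wb.save("DR005289_F00001.xls")
# Note: an .xls sheet caps out at 65,536 rows and 256 columns, so the
# largest datasets would need to be split across several sheets.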