split excel sheet for every n rows using python

I have an Excel file with more than 1 million rows. Now I need to split it every n rows and save each chunk to a new file. I am very new to Python. Any help is much appreciated and needed.

As suggested by OhAuth, you can save the Excel document to a CSV file. That would be a good start for processing your data.
For the processing itself you can use Python's csv library, which requires no installation since it ships with Python.
If you want something more "powerful" you might want to look into pandas. However, that requires installing the module.
If you do not want to use Python's csv module or pandas because you do not want to read the docs, you could also do something like this:
f = open("myCSVfile", "r")
for row in f:
singleRow = row.split(",") #replace the "," with the delimiter you chose to seperate your columns
print singleRow
> [value1, value2, value3, ...] #it returns a list and list comprehension is well documented and easy to understand, thus, further processing wont be difficult
However, I strongly recommend looking into the modules since they handle CSV data better and more efficiently, and in the long run will save you time and trouble.
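For the actual splitting, here is a minimal sketch using the csv module, assuming the workbook has already been exported to "data.csv" and that you want 100000 rows per output file (the file names and the chunk size are only illustrative):

import csv

n = 100000  # rows per output file

with open("data.csv", newline="") as source:
    reader = csv.reader(source)
    header = next(reader)              # keep the header so every chunk gets a copy
    out, writer = None, None
    for i, row in enumerate(reader):
        if i % n == 0:                 # start a new output file every n rows
            if out:
                out.close()
            out = open(f"data_part{i // n + 1}.csv", "w", newline="")
            writer = csv.writer(out)
            writer.writerow(header)
        writer.writerow(row)
    if out:
        out.close()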

Related

Adding a new row of data to an existing ods sheet

I'm running a python script to automate some of my day-to-day tasks at work. One task I'm trying to do is simply add a row to an existing ods sheet that I usually open via LibreOffice.
This file has multiple sheets and depending on what my script is doing, it will add data to different sheets.
The thing is, I'm having trouble finding a simple and easy way to just add some data to the first unpopulated row of the sheet.
Reading about odslib3, pyexcel and other packages, it seems that to write a row I need to explicitly specify the row number and column to write the data to, and opening the ods file just to see which cell to write and then telling the Python script seems unproductive.
Is there a way to easily add a row of data to an ods sheet without specifying the row number and column?
If I understand the question, I believe that using .remove() and .append() will do the trick. It will create and populate data on the last row (I can't say it's the most efficient, though).
For example:
from pyexcel_ods3 import get_data, save_data

data = get_data("info.ods")
print(data["Sheet1"])
# [['first_row', 'first_row'], []]

if [] in data["Sheet1"]:
    data["Sheet1"].remove([])  # remove the unpopulated row
data["Sheet1"].append(["second_row", "second_row"])  # add the new row

print(data["Sheet1"])
# [['first_row', 'first_row'], ['second_row', 'second_row']]

save_data("info.ods", data)  # write the change back to the file

Handle csv table in Python without pandas, csv or any other modules or libraries

Using Python, I need to read a csv table.
I need to assign that csv table to a variable so I can handle it later to filter, remove duplicates, etc.
I have a restriction: I cannot use any module or library, only native Python functions.
Searching the internet, I only find examples that use the pandas or csv modules to work with the csv table.
Is it possible to do some of these tasks in Python without using modules or libraries? (this is a school activity)
Obviously it depends on what you are looking for, but it can easily be done with open(), rstrip() and split():
csvtable = open('path/file_name.csv')
separator = ','
ntable = []
for row in csvtable:
    if separator in row:
        row = row.rstrip()          # drop the trailing newline
        row = row.split(separator)  # split the line into a list of fields
        ntable.append(row)
csvtable.close()
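Since the question also mentions filtering and removing duplicates, here is a minimal sketch of those steps with native Python only, continuing from the ntable built above; the column index used in the filter is just an illustrative assumption:

# remove duplicate rows while preserving order (tuples are hashable, inner lists are not)
seen = set()
unique = []
for row in ntable:
    key = tuple(row)
    if key not in seen:
        seen.add(key)
        unique.append(row)

# filter: keep only rows whose first column is non-empty (index 0 chosen for illustration)
filtered = [row for row in unique if row and row[0] != '']

print(len(ntable), len(unique), len(filtered))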

how to retrieve all lines with errors in pandas

For example, I can use
pd.read_csv('file.csv')
to load a csv file.
By default, it fails when there are any parsing errors. I understand that one can use error_bad_lines=False to skip the rows with errors.
But my question is:
How do I get all the lines where errors occur? This way, I can potentially solve the problem not only for this particular file.csv but also for other related files in a batch: file1.csv, file2.csv, file3.csv ...
One easy way would be to prepend a row index number to each row. This can easily be done with Awk or Python before loading the data. You could even do it in memory using StringIO or your own custom file-like object in Python, which would "magically" prepend the row numbers.
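A minimal sketch of the in-memory variant, assuming the bad lines are rows with too many fields (the usual cause) and a recent pandas where error_bad_lines=False has been replaced by on_bad_lines="skip"; "file.csv" is the file from the question:

import io
import pandas as pd

with open("file.csv") as f:
    lines = f.readlines()

# prepend the physical line number to every row so it survives the parse
numbered = io.StringIO("".join(f"{i},{line}" for i, line in enumerate(lines)))

df = pd.read_csv(numbered, header=None, on_bad_lines="skip")

kept = set(df[0])                                  # column 0 holds the prepended line numbers
bad = [i for i in range(len(lines)) if i not in kept]
print("lines with parse errors:", bad)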

csv module in python troubles

I've read countless threads on here but I'm still unable to figure out exactly how to do this. I'm using the csv module in Python to write data to a csv file. My difficulty is that I've stored the header fields in a list (called header) and it contains a variable number of columns. I need to reference each column name so I can write it to my file, which would be easy, except that there may be a variable number of columns and I can't figure out how to create a variable number of lists to write from (of course I'm using zip(*header, list1, list2, list3, ...) to write to the csv file, but how do I generate list(i) so that header[i] populates the i-th list?). I'm sorry for the lack of code, I just can't figure out how to even begin ...
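One common way to deal with a variable number of columns is to keep one list per header name in a dictionary and zip the lists back into rows when writing. A minimal sketch, where the header names, the sample values and the output file name are all illustrative assumptions:

import csv

header = ["colA", "colB", "colC"]            # however many column names you happen to have
columns = {name: [] for name in header}      # one list per header, created dynamically

# ... populate the lists, e.g. columns["colA"].append(value) ...
columns["colA"].extend([1, 2, 3])
columns["colB"].extend(["x", "y", "z"])
columns["colC"].extend([0.1, 0.2, 0.3])

with open("out.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(header)
    # zip(*...) turns the per-column lists back into rows, regardless of how many columns there are
    writer.writerows(zip(*(columns[name] for name in header)))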

handling a huge file with python and pytables

Simple problem, but maybe a tricky answer:
The problem is how to handle a huge .txt file with PyTables.
I have a big .txt file, with MILLIONS of lines, short lines, for example:
line 1 23458739
line 2 47395736
...........
...........
The content of this .txt must be saved into a PyTables table; OK, that's easy. Nothing else needs to be done with the info in the txt file, just copy it into PyTables. Now we have a table with, for example, 10 columns and millions of rows.
The problem comes up when, from the content of the txt file, 10 columns x millions of lines are generated directly in the PyTables table BUT, depending on the data on each line of the .txt file, new columns must be created in the table. So how to handle this efficiently?
Solution 1: first copy the whole text file, line by line, into the PyTables table (millions of rows), and then iterate over each row of the table (millions again) and, depending on the values, generate the new columns needed.
Solution 2: read the .txt file line by line, do whatever is needed, calculate the new values, and only then send all the info to the PyTables table.
Solution 3: ... any other more efficient and faster solution?
I think the basic problem here is one of conceptual model. PyTables' Tables only handle regular (or structured) data. However, the data that you have is irregular or unstructured, in that the structure is determined as you read the data. Said another way, PyTables needs the column description to be known completely by the time create_table() is called. There is no way around this.
Since in your problem statement any line may add a new column, you have no choice but to do this in two full passes through the data: (1) read through the data and determine the columns, and (2) write the data to the table. In pseudocode:
import tables as tb
cols = {}
# pass 1: discover the columns
d = open('data.txt')
for line in d:
    for col in line.split():             # however you extract the column names from a line
        if col not in cols:
            cols[col] = tb.Float64Col()  # use whichever Col type matches your data
# pass 2: write the table
d.seek(0)
f = tb.open_file(...)
t = f.create_table(..., description=cols)
for line in d:
    row = line_to_row(line)              # convert one text line into a table row
    t.append(row)
d.close()
f.close()
Obviously, if you knew the table structure ahead of time you could skip the first loop and this would be much faster.
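For comparison, a minimal single-pass sketch for the case where the structure is known ahead of time; the column names line_no and value, and the file names, are illustrative assumptions rather than anything from the question:

import tables as tb

class Record(tb.IsDescription):
    line_no = tb.Int64Col()   # hypothetical column: position of the line in the file
    value = tb.Int64Col()     # hypothetical column: the number at the end of each line

with tb.open_file("data.h5", mode="w") as f:
    t = f.create_table("/", "data", Record)
    row = t.row
    with open("data.txt") as d:
        for i, line in enumerate(d):
            if not line.strip():
                continue              # skip blank lines
            row["line_no"] = i
            row["value"] = int(line.split()[-1])
            row.append()              # buffers the row; PyTables flushes in batches
    t.flush()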
