csv module in python troubles - python

I've read countless threads on here, but I still can't figure out how to do this. I'm using the csv module in Python to write data to a CSV file. I've stored the header names in a list (called header), and it contains a variable number of columns. I need to reference each column name so I can write it to my file. That would be easy if the number of columns were fixed, but since it varies, I can't figure out how to create a variable number of lists to write to. I'm using zip(*header, list1, list2, list3, ...) to write to the CSV file, but how do I generate list(i) so that header[i] populates the ith list? I'm sorry for the lack of code; I just can't figure out how to even begin ...
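One way to handle a variable number of columns is to keep one list per header name in a dict rather than in separately named variables, then zip those lists back into rows when writing. This is only a minimal sketch: header plays the role of the list from the question, while the sample column names, the records and the write_columns helper are made up for illustration.

```python
import csv
import io


def write_columns(out_file, header, columns):
    """Write a header row, then one row per index across all column lists."""
    writer = csv.writer(out_file)
    writer.writerow(header)
    # zip(*...) turns the per-column lists into per-row tuples
    writer.writerows(zip(*(columns[name] for name in header)))


# Hypothetical data: the header may hold any number of column names
header = ["name", "age", "city"]
columns = {name: [] for name in header}  # one empty list per column

# Fill the i-th list from the i-th field of each record
records = [("Ann", 31, "Oslo"), ("Bob", 45, "Rome")]
for record in records:
    for name, value in zip(header, record):
        columns[name].append(value)

buffer = io.StringIO()  # stands in for open("out.csv", "w", newline="")
write_columns(buffer, header, columns)
print(buffer.getvalue())
```

Because the lists live in a dict keyed by header name, adding or removing a column only changes header; the writing code stays the same.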

Related

Adding a new row of data to an existing ods sheet

I'm running a python script to automate some of my day-to-day tasks at work. One task I'm trying to do is simply add a row to an existing ods sheet that I usually open via LibreOffice.
This file has multiple sheets and depending on what my script is doing, it will add data to different sheets.
The thing is, I'm having trouble finding a simple and easy way to just add some data to the first unpopulated row of the sheet.
Reading about odslib3, pyexcel and other packages, it seems that to write a row I need to specify the row number and column to write to, and opening the ods file just to see which cell to use and then telling the Python script seems unproductive.
Is there a way to easily add a row of data to an ods sheet without specifying the row number and column?
If I understand the question, I believe using .remove() and .append() will do the trick. It will create and populate data on the last row (can't say it's the most efficient, though).
Example:
from pyexcel_ods3 import get_data, save_data

data = get_data("info.ods")
print(data["Sheet1"])
# [['first_row', 'first_row'], []]
if [] in data["Sheet1"]:
    data["Sheet1"].remove([])  # remove the unpopulated row
data["Sheet1"].append(["second_row", "second_row"])  # add the new row
print(data["Sheet1"])
# [['first_row', 'first_row'], ['second_row', 'second_row']]
save_data("info.ods", data)  # write the updated sheets back to the file

Output to CSV changing datatype

So I have a csv file with a column called reference_id. The values in reference_id are 15 characters long, so something like '162473985649957'. When I open the CSV file, Excel has changed the datatype to General and the numbers look something like '1.62474E+14'. To fix this in Excel, I change the column type to Number and remove the decimals, and it displays the correct value. I should add that it only does this with the CSV file; if I output to xlsx, it works fine. The problem is that the file has to be csv.
Is there a way to fix this using Python? I'm trying to automate a process. I have tried using the following to convert it to a string. It works in the sense that it converts the column to a string, but it still shows up incorrectly in the csv file.
df['reference_id'] = df['reference_id'].astype(str)
df.to_csv(r'Prev Day Branch Transaction Mems.csv')
Thanks
"When I open the CSV file, excel has changed the data"
This is an Excel problem. You can't fix how Excel decides to interpret your CSV. (You can work around some issues by using the text import format, but that's cumbersome.)
Either use XLS/XLSX files when working with Excel, or use e.g. Gnumeric or something else that doesn't wantonly mangle your data.
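To confirm that the file itself is intact and only Excel's display is off, the CSV can be read back as plain text. A minimal sketch using only the standard library csv module; the in-memory buffer, the amount column and the sample row are assumptions standing in for the real file.

```python
import csv
import io

# Hypothetical sample row matching the question's reference_id column
rows = [{"reference_id": "162473985649957", "amount": "10.50"}]

buffer = io.StringIO()  # stands in for the real CSV file on disk
writer = csv.DictWriter(buffer, fieldnames=["reference_id", "amount"])
writer.writeheader()
writer.writerows(rows)

# Read the text back: every value comes out as the exact string written
buffer.seek(0)
read_back = list(csv.DictReader(buffer))
print(read_back[0]["reference_id"])
```

If all 15 digits come back here, the data on disk is fine and the scientific notation only appears when Excel opens the file.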

Pandas txt to csv output only displays the first two lines of values, how do I get the full data to show?

My issue is as follows.
I've gathered some contact data from SurveyMonkey using the SM API, and I've converted that data into a txt file. When opening the txt file, I see the full data from the survey that I'm trying to convert into csv, however when I use the following code:
df = pd.read_csv("my_file.txt",sep =",", encoding = "iso-8859-10")
df.to_csv('my_file.csv')
It creates a csv file with only two lines of values (and cuts off in the middle of the second line). Similarly, if I try to organize the data within a pandas dataframe, it only registers the first two lines, meaning most of my txt file is not being read.
As I've never run into this problem before and I've been able to convert into CSV without issues, I'm wondering if anyone here has ideas as to what might be causing this issue to occur and how I could go about solving it?
All help is much appreciated.
Edit:
I was able to get the data to display properly in csv when I converted it directly from json instead of converting it to a txt file first. I was not, however, able to figure out what went wrong in the conversion from txt to csv, as I tried multiple different encodings but got the same result.
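The route that worked in the edit above, going from JSON straight to CSV, can be sketched with the standard library alone. The field names and sample records below are invented; the actual shape of the SurveyMonkey response is not shown in the question, so treat the structure as an assumption.

```python
import csv
import io
import json

# Hypothetical payload: a JSON array of flat contact records
raw_json = '''[
  {"first_name": "Ann", "email": "ann@example.com", "note": "line one\\nline two"},
  {"first_name": "Bob", "email": "bob@example.com", "note": "has, a comma"}
]'''

contacts = json.loads(raw_json)

buffer = io.StringIO()  # stands in for open("my_file.csv", "w", newline="")
writer = csv.DictWriter(buffer, fieldnames=["first_name", "email", "note"])
writer.writeheader()
writer.writerows(contacts)  # the writer quotes commas and newlines in values

# Reading it back gives one row per contact, despite the embedded newline
buffer.seek(0)
rows = list(csv.DictReader(buffer))
print(len(rows))
```

Letting csv.DictWriter do the quoting means values containing commas or line breaks stay in a single field, which a hand-built txt file does not guarantee.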

Need help using ascii to write .dat files

Currently, I'm trying to use astropy.io.ascii in Python (Anaconda) to write a .dat file that includes data I've already read in (using ascii) from a different .dat file. I defined a specific table in the pre-existing file as Data. The problem with Data is that I need to multiply the first column by a factor of 101325 to change its units, and I need the fourth of the four columns to disappear entirely. So I defined the first column as Pressure_pa and converted its units, then defined the other two columns as Altitude_km and Temperature_K. Is there any way I can use ascii's write function to write a .dat file containing the three columns I defined, and how would I go about it? Below is the code that has brought me to the point of having defined these three columns of data:
from astropy.io import ascii
Data=ascii.read('output_couple_121_100.dat',guess=False,header_start=384,data_start=385,data_end=485,delimiter=' ')
Pressure_pa=Data['P(atm)'][:]*101325
Altitude_km=Data['Alt(km)'][:]
Temperature_K=Data['T'][:]
Now I thought I might be able to use ascii.write() to write Pressure_pa, Altitude_km and Temperature_K into the same .dat file. Is there any way to do this?
So I think I figured it out! I'll post a more generic version that others can adapt:
from astropy.io import ascii
Data=ascii.read('filename.dat',guess=False,header_start=1,data_start=2,data_end=10,delimiter=' ')
#above: defining Data as a certain section of a .dat file beginning at line 2 through 10 with headers in line 1
ascii.write(Data,'new_desired_file_name.dat',names=['col1','col2','col3','col4'],exclude_names=['col3'],delimiter=' ')
#above: telling ascii to take Data and create a .dat file from it; when defining the names, give a name for every column in Data, then use exclude_names to leave out the columns you don't want written

split excel sheet for every nrows using python

I have an Excel file with more than 1 million rows. I need to split it every n rows and save each part in a new file. I am very new to Python. Any help is much appreciated and needed.
As suggested by OhAuth, you can save the Excel document as a csv file. That would be a good starting point for processing your data.
To process your data, you can use the Python csv library. It requires no installation since it ships with Python.
If you want something more "powerful", you might want to look into Pandas. However, that requires installing the module.
If you do not want to use either Python's csv module or the pandas module because you do not want to read the docs, you could also do something like this:
with open("myCSVfile", "r") as f:
    for row in f:
        # replace "," with the delimiter you chose to separate your columns
        singleRow = row.rstrip("\n").split(",")
        print(singleRow)
# prints one list per row, e.g. [value1, value2, value3, ...]; lists are well documented and easy to work with, so further processing won't be difficult
However, I strongly recommend looking into the modules, since they handle csv data better and more efficiently, and in the long run they save you time and trouble.
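For the actual splitting step, here is a minimal sketch using the csv module, assuming the Excel sheet has already been saved as a CSV file with a header row. The split_csv helper, the part_N.csv naming scheme and the sample data are all made up for illustration.

```python
import csv
import tempfile
from itertools import islice
from pathlib import Path


def split_csv(source_path, rows_per_file, out_dir):
    """Split a CSV into files of at most rows_per_file data rows each.

    The header row is repeated at the top of every output file.
    Returns the list of paths written.
    """
    written = []
    with open(source_path, newline="") as source:
        reader = csv.reader(source)
        header = next(reader)
        part = 1
        while True:
            # islice reads only the next chunk, so the whole file is never in memory
            chunk = list(islice(reader, rows_per_file))
            if not chunk:
                break
            out_path = Path(out_dir) / f"part_{part}.csv"  # hypothetical naming scheme
            with open(out_path, "w", newline="") as target:
                writer = csv.writer(target)
                writer.writerow(header)
                writer.writerows(chunk)
            written.append(out_path)
            part += 1
    return written


# Small demonstration: 5 data rows split into files of 2 rows each
work_dir = tempfile.mkdtemp()
sample = Path(work_dir) / "sample.csv"
sample.write_text("id,value\n1,a\n2,b\n3,c\n4,d\n5,e\n")
parts = split_csv(sample, 2, work_dir)
print([p.name for p in parts])
```

Reading in chunks keeps memory use flat regardless of file size, which matters with a million rows; the last file simply holds whatever rows are left over.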
