writing a csv file by column in pandas throws error - python

I am reading and writing CSV files using pandas.
I am reading a CSV file column by column and writing the columns to a separate CSV file. Reading works fine, but writing throws an error:
import pandas
f1 = open('artist_links','a')
data_df = pandas.read_csv('upc1.upcs_result.csv')
#data_wr = pandas.to_csv('test.csv')
df = data_df['one']
dd = data_df['two']
header = ["df", "dd"]
df.to_csv("test.csv",columns = header)
Output:
Traceback (most recent call last):
File "merge.py", line 9, in <module>
df.to_csv("test.csv",columns = header)
TypeError: to_csv() got an unexpected keyword argument 'columns'
But there actually is a columns argument, here in the pandas library documentation.
How could I make this program work (writing column by column)?

Changes in v0.16.0 (http://pandas.pydata.org/pandas-docs/dev/whatsnew.html):
cols as the keyword for the CSV and Excel writers was replaced with columns.
Your installed pandas predates that change, so either use cols instead or upgrade pandas.
Instead of:
df.to_csv("test.csv", columns=header)
Use:
df.to_csv("test.csv", cols=header)
Edit: Either way, you should upgrade. Sincerely. If the error is about a keyword argument, and you are basing your call on documentation for the most recent version while running software released over 1.5 years ago with substantial changes made since then, you should upgrade.
EDIT2: If you're desperate to make life difficult for yourself by continuing to use outdated functions while trying to use new features, there are workarounds. This is not recommended, since some differences can be a lot more subtle and will throw exceptions when you least expect it.
You could... do...
lst = []
for column in header:
    s = df[column]          # assumes df is a DataFrame containing these columns
    lst.append(s.tolist())  # export each column to a nested list
# zip the resulting lists into row tuples
zipped = zip(*lst)
# export them as CSV text (values converted to str so join works)
print('\n'.join([','.join(map(str, i)) for i in zipped]))
EDIT3: Much simpler, but you could also do:
df2 = df[header]  # returns a copy with just those columns
df2.to_csv("test.csv")
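On a current pandas install, the whole task reduces to selecting the wanted columns from the frame you read and writing them in one call; a sketch using the question's own column names:
import pandas as pd

data_df = pd.read_csv('upc1.upcs_result.csv')
# write both columns to one file, without the row index
data_df[['one', 'two']].to_csv('test.csv', index=False)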

Related

pandas OSError: [Errno 22] Invalid argument in read_excel [duplicate]

This question already has answers here:
pandas.read_excel parameter "sheet_name" not working
(6 answers)
Closed 1 year ago.
I am trying to read an Excel file using pandas with the code below:
path = "QVI_transaction_data.xlsx"
I also tried "./QVI_transaction_data.xlsx" instead; the name is copy-pasted from os.listdir(), so there are no transcription problems.
pd.read_excel(path, sheet_name = "in")
but it didn't work; it outputs this error:
OSError: [Errno 22] Invalid argument
I also tried it without the sheet_name argument. Other posts say there is a problem with the filename, but I have worked with pandas before and I don't think there is anything wrong with the name. Does anyone know what is wrong here?
This is how the file looks: [screenshot omitted]
One possible approach is to convert the Excel (.xlsx) file to a .csv file, which can be done in Excel via File > Export (save as a CSV file), and then load it like this:
import pandas as pd
data = pd.read_csv("File Name...")
print(data)
Or, if you want to load the Excel file directly, this can be done:
import pandas as pds
file = 'path_of_excel_file'
newData = pds.read_excel(file)
newData
I have tried both ways, with a CSV file as well as the Excel file itself.
Since we don't have your data to reproduce the problem, there may be different situations with different solutions.
I'm listing a few situations which may point you in the right direction...
Situation 1:
If you are using an old pandas version, try the following: the keyword is sheetname in older versions and sheet_name in newer ones.
import pandas as pd
df = pd.read_excel(file_with_data, sheetname=sheet_with_data)
OR
You can use pd.ExcelFile instead:
xls = pd.ExcelFile('path_to_file.xls')
df1 = pd.read_excel(xls, 'in')
OR
xl = pd.ExcelFile(path)
# xl = pd.ExcelFile("Full_Path_of_file")
xl.sheet_names
# [u'in', u'in1', u'in2']
df = xl.parse("in")
df.head()
OR
df = pd.read_excel(open('your_xls_xlsx_filename','rb'), sheet_name='Sheet 1')
# or using sheet index starting 0
df = pd.read_excel(open('your_xls_xlsx_filename','rb'), sheet_name=1)
Note: choose the sheet-name argument carefully according to your pandas version:
For older pandas versions: use sheetname
For newer pandas versions: use sheet_name
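A sketch that works on either pandas version by falling back when the new keyword is rejected (assuming only that the keyword was renamed, around pandas 0.21):
import pandas as pd

def read_sheet(path, name):
    # sheet_name replaced sheetname; try the new keyword first
    try:
        return pd.read_excel(path, sheet_name=name)
    except TypeError:
        return pd.read_excel(path, sheetname=name)

df = read_sheet("QVI_transaction_data.xlsx", "in")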
Situation 2:
On Windows, copying the file path from the right-click Properties > Security tab causes this problem: the copied string carries an invisible Unicode character in front of the path, so copying and pasting the file path produces this error even though the path looks fine. It has nothing to do with backslashes versus forward slashes in the path, and nothing to do with what the path contains. There are two solutions:
1. Enter the path manually.
2. Open the folder in Explorer and copy the path from the address bar.
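To check whether a copied path carries such a hidden character, a quick sketch: print the repr() of the string, which makes invisible characters visible, and strip them if found.
path = "QVI_transaction_data.xlsx"  # paste the suspect path here
print(repr(path))  # a hidden prefix shows up as e.g. '\u202a...'
clean_path = path.lstrip('\u202a')  # strip the left-to-right embedding mark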
General reading conventions:
# sheet_name=None returns all sheets, as a dictionary of DataFrames
sheet = pd.read_excel('example.xls', sheet_name=None)
# a single index returns one sheet; a list of indices returns a dictionary
sheet = pd.read_excel('example.xls', sheet_name=0)
sheet = pd.read_excel('example.xls', sheet_name=[0, 1])
# sheets can also be selected by name, or by a mix of names and positions
sheet = pd.read_excel('example.xls', sheet_name='Sheet0')
sheet = pd.read_excel('example.xls', sheet_name=['Sheet0', 'Sheet1'])
sheet = pd.read_excel('example.xls', sheet_name=[0, 1, 'Sheet3'])

How can I fix this issue writing column headers in a pre-existing Excel sheet containing a data table?

I'm new to pandas and to writing Excel files from code, so please forgive me if there's an obvious answer to this. I have a single Excel sheet with a few columns of data, with empty cells where the column headers should be. I want to use this program to write in the headers, but I'm having trouble saving and closing the file. Here is the code:
import pandas as pd
from openpyxl import load_workbook
headers = []
# code not shown, but just prompts user for column headers and saves in list 'headers'
#open .xlsx file
book = load_workbook(r'path.xlsx')
writer = pd.ExcelWriter(r'path.xlsx', engine='xlsxwriter')
#write column headers from list into empty cells
writer.columns = headers[:]
#save and close
writer.save('path.xlsx')
writer.close()
book.close()
Here is the error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-3-84b69076ad08> in <module>
30 # workbook.write(cell, h)
31
---> 32 writer.save('path.xlsx')
33 writer.close()
34 book.close()
TypeError: save() takes 1 positional argument but 2 were given
Based on what I've read, this seems to be a fairly generic error that doesn't indicate what the actual problem is. I tried a few variations of save(), including passing the whole raw path as an argument and passing no arguments at all. Moreover, when I temporarily remove the save line just to see whether the close lines work, they don't. So perhaps there is something more fundamentally wrong with my code than just these few lines?
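As an aside: the TypeError itself says that writer.save() takes no path argument (the output file is already given to the ExcelWriter constructor), and assigning writer.columns = headers does not write any cells. A minimal sketch of writing headers into row 1 of an existing sheet with openpyxl alone, using hypothetical header names:
from openpyxl import load_workbook

headers = ["col_a", "col_b", "col_c"]  # hypothetical; prompt the user as before

book = load_workbook("path.xlsx")
ws = book.active  # the sheet holding the data
for col, h in enumerate(headers, start=1):
    ws.cell(row=1, column=col, value=h)  # write each header into row 1
book.save("path.xlsx")  # Workbook.save() does take a path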

Read Specific Columns from a pickle files

So I know that in pandas you can specify which columns to pull from a CSV file when generating a DataFrame:
df = pd.read_csv('data.csv', usecols=['a','b','c'])
How do you do this with a pickled file?
df = pd.read_pickle('data.pkl', usecols=['a','b','c'])
gives TypeError: read_pickle() got an unexpected keyword argument 'usecols'
I can't find the correct argument in the documentation.
Since pickle files contain complete Python objects, I doubt you can select columns while loading them; at least pandas doesn't seem to support that directly. But you can load the file completely first and then filter for your columns, like so:
df = pd.read_pickle('data.pkl')
df = df.filter(['a', 'b', 'c'])
Documentation
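Equivalently, plain column indexing after loading works; the difference is that filter() silently drops labels that don't exist, while indexing raises a KeyError. A sketch:
import pandas as pd

df = pd.read_pickle('data.pkl')
df = df[['a', 'b', 'c']]  # KeyError if any column is missing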

Problems with creating a CSV file using Excel

I have some data in an Excel file that I would like to analyze using Python. I started by creating a CSV file using this guide.
Thus I have created a CSV (comma-delimited) file filled with the following data: [screenshot omitted]
I wrote a few lines of code in Python using Spyder:
import pandas
colnames = ['GDP', 'Unemployment', 'CPI', 'HousePricing']
data = pandas.read_csv('Dane_2.csv', names = colnames)
GDP = data.GDP.tolist()
print(GDP)
The output is nothing like what I expected: [output omitted]
It can easily be seen that the output differs a lot from the figures in the GDP column. I would appreciate any tips or hints that help deal with my problem.
It seems the GDP column contains the decimal values from the first column of the .csv file together with the first digits of the second column. Either something is wrong with the .csv you created or, more probably, you need to specify the separator in the pandas.read_csv call. Also, add header=None to make sure you don't lose the first line of the file (i.e. it would otherwise be replaced by colnames).
Try this:
import pandas
colnames = ['GDP', 'Unemployment', 'CPI', 'HousePricing']
data = pandas.read_csv('Dane_2.csv', names = colnames, header=None, sep=';')
GDP = data.GDP.tolist()
print(GDP)
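If you are not sure which delimiter Excel used (it depends on the system's regional settings), pandas can sniff it: passing sep=None with the python engine auto-detects the separator. A sketch:
import pandas

colnames = ['GDP', 'Unemployment', 'CPI', 'HousePricing']
# sep=None lets the python engine auto-detect the delimiter
data = pandas.read_csv('Dane_2.csv', names=colnames, header=None,
                       sep=None, engine='python')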

Invoke pandas Series apply function with read_fwf

New to pandas here.
I have a pandas DataFrame called all_invoice with only one column called 'whole_line'.
Each row in all_invoice is a fixed width string. I need a new DataFrame from all_invoice using read_fwf.
I have a working solution that looks like this:
from io import StringIO
import pandas as pd

invoice = pd.DataFrame()
for i, r in all_invoice['whole_line'].iteritems():
    temp_df = pd.read_fwf(StringIO(r), colspecs=in_specs,
                          names=in_cols, converters=in_convert)
    invoice = invoice.append(temp_df, ignore_index=True)
in_specs, in_cols, and in_convert have been defined earlier in my script.
So this solution works but is very slow. For 18K rows with 85 columns, it takes about 6 minutes for this part of the code to execute. I'm hoping for a more elegant solution that doesn't involve iterating over the rows in the DataFrame or Series and that will use the apply function to call read_fwf to make this go faster. So I tried:
invoice = all_invoice['whole_line'].apply(pd.read_fwf, colspecs=in_specs,names=in_cols, converters=in_convert)
The tail end of my traceback looks like:
OSError: [Errno 36] File name too long:
Following that colon is the string that is passed to the read_fwf method. I suspect that this is happening because read_fwf needs a file path or buffer. In my working (but slow) code, I'm able to call StringIO() on the string to make it a buffer but I cannot do that with the apply function. Any help with getting the apply working or another way to make use of the read_fwf on the entire series/df at once to avoid iterating over the rows is appreciated. Thanks.
Have you tried just doing:
invoice = pd.read_fwf(filename, colspecs=in_specs,
                      names=in_cols, converters=in_convert)
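If the original fixed-width file is not available and you must start from the DataFrame, a sketch that avoids the per-row loop by joining every line into one in-memory buffer and calling read_fwf once (reusing the question's in_specs, in_cols, and in_convert):
from io import StringIO
import pandas as pd

# parse all fixed-width lines in a single read_fwf call
buf = StringIO('\n'.join(all_invoice['whole_line']))
invoice = pd.read_fwf(buf, colspecs=in_specs,
                      names=in_cols, converters=in_convert)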
