Here the the code to process and save csv file, and raw input csv file and output csv file, using pandas on Python 2.7 and wondering why there is an additional column at the beginning when saving the file? Thanks.
c_a,c_b,c_c,c_d
hello,python,pandas,0.0
hi,java,pandas,1.0
ho,c++,numpy,0.0
sample = pd.read_csv('123.csv', header=None, skiprows=1,
dtype={0:str, 1:str, 2:str, 3:float})
sample.columns = pd.Index(data=['c_a', 'c_b', 'c_c', 'c_d'])
sample['c_d'] = sample['c_d'].astype('int64')
sample.to_csv('saved.csv')
Here is the saved file, there is an additional column at the beginning, whose values are 0, 1, 2.
cat saved.csv
,c_a,c_b,c_c,c_d
0,hello,python,pandas,0
1,hi,java,pandas,1
2,ho,c++,numpy,0
The additional column corresponds to the index of the dataframe and is aggregated once you read the CSV file. You can use this index to slice, select or sort your DF in an effective manner.
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Index.html
http://pandas.pydata.org/pandas-docs/stable/indexing.html
If you want to avoid this index, you can set the index flag to False when you save your dataframe with the function pd.to_csv. Also, you are removing the header and aggregating it later, but you can use the header of the CSV to avoid this step.
sample = pd.read_csv('123.csv', dtype={0:str, 1:str, 2:str, 3:float})
sample.to_csv('output.csv', index= False)
Hope it helps :)
Related
I've extracted information from 142 different files, which is stored in CSV-file with one column, which contains both number and text. I want to copy row 11-145, transform it, and paste it into another file (xlsx or csv doesn't matter). Then, I want to skip the next 10 rows, and copy row 156-290, transform and paste it etc etc. I have tried the following code:
import numpy as np
overview = np.zeros((145, 135))
for i in original:
original[i+11:i+145, 1] = overview[1, i+1:i+135]
print(overview)
The original file is the imported file, for which I used pd.read_csv.
pd.read_csv is a function that returns a dataframe.
To select specific rows from a dataframe you can use this function :
df.loc[start:stop:step]
so it would look something like this :
df = pd.read_csv(your_file)
new_df = df.loc[11:140]
#transform it as you please
#convert it to excel or csv
new_df .to_excel("new_file.xlsx") or new_df .to_csv("new_file.csv")
I am trying to save a csv to a folder after making some edits to the file.
Every time I use pd.to_csv('C:/Path of file.csv') the csv file has a separate column of indexes. I want to avoid printing the index to csv.
I tried:
pd.read_csv('C:/Path to file to edit.csv', index_col = False)
And to save the file...
pd.to_csv('C:/Path to save edited file.csv', index_col = False)
However, I still got the unwanted index column. How can I avoid this when I save my files?
Use index=False.
df.to_csv('your.csv', index=False)
There are two ways to handle the situation where we do not want the index to be stored in csv file.
As others have stated you can use index=False while saving your
dataframe to csv file.
df.to_csv('file_name.csv',index=False)
Or you can save your dataframe as it is with an index, and while reading you just drop the column unnamed 0 containing your previous index.Simple!
df.to_csv(' file_name.csv ')
df_new = pd.read_csv('file_name.csv').drop(['unnamed 0'],axis=1)
If you want no index, read file using:
import pandas as pd
df = pd.read_csv('file.csv', index_col=0)
save it using
df.to_csv('file.csv', index=False)
As others have stated, if you don't want to save the index column in the first place, you can use df.to_csv('processed.csv', index=False)
However, since the data you will usually use, have some sort of index themselves, let's say a 'timestamp' column, I would keep the index and load the data using it.
So, to save the indexed data, first set their index and then save the DataFrame:
df.set_index('timestamp')
df.to_csv('processed.csv')
Afterwards, you can either read the data with the index:
pd.read_csv('processed.csv', index_col='timestamp')
or read the data, and then set the index:
pd.read_csv('filename.csv')
pd.set_index('column_name')
Another solution if you want to keep this column as index.
pd.read_csv('filename.csv', index_col='Unnamed: 0')
If you want a good format next statement is the best:
dataframe_prediction.to_csv('filename.csv', sep=',', encoding='utf-8', index=False)
In this case you have got a csv file with ',' as separate between columns and utf-8 format.
In addition, numerical index won't appear.
I am trying to save a csv to a folder after making some edits to the file.
Every time I use pd.to_csv('C:/Path of file.csv') the csv file has a separate column of indexes. I want to avoid printing the index to csv.
I tried:
pd.read_csv('C:/Path to file to edit.csv', index_col = False)
And to save the file...
pd.to_csv('C:/Path to save edited file.csv', index_col = False)
However, I still got the unwanted index column. How can I avoid this when I save my files?
Use index=False.
df.to_csv('your.csv', index=False)
There are two ways to handle the situation where we do not want the index to be stored in csv file.
As others have stated you can use index=False while saving your
dataframe to csv file.
df.to_csv('file_name.csv',index=False)
Or you can save your dataframe as it is with an index, and while reading you just drop the column unnamed 0 containing your previous index.Simple!
df.to_csv(' file_name.csv ')
df_new = pd.read_csv('file_name.csv').drop(['unnamed 0'],axis=1)
If you want no index, read file using:
import pandas as pd
df = pd.read_csv('file.csv', index_col=0)
save it using
df.to_csv('file.csv', index=False)
As others have stated, if you don't want to save the index column in the first place, you can use df.to_csv('processed.csv', index=False)
However, since the data you will usually use, have some sort of index themselves, let's say a 'timestamp' column, I would keep the index and load the data using it.
So, to save the indexed data, first set their index and then save the DataFrame:
df.set_index('timestamp')
df.to_csv('processed.csv')
Afterwards, you can either read the data with the index:
pd.read_csv('processed.csv', index_col='timestamp')
or read the data, and then set the index:
pd.read_csv('filename.csv')
pd.set_index('column_name')
Another solution if you want to keep this column as index.
pd.read_csv('filename.csv', index_col='Unnamed: 0')
If you want a good format next statement is the best:
dataframe_prediction.to_csv('filename.csv', sep=',', encoding='utf-8', index=False)
In this case you have got a csv file with ',' as separate between columns and utf-8 format.
In addition, numerical index won't appear.
Im trying to add a new column to a pandas dataframe. Also, I try to give a name to index to be printed out in Excel when I export the data
import pandas as pd
import csv
#read csv file
file='RALS-04.csv'
df=pd.read_csv(file)
#select the columns that I want
column1=df.iloc[:,0]
column2=df.iloc[:,2]
column3=df.iloc[:,3]
column1.index.name="items"
column2.index.name="march2012"
column3.index.name="march2011"
df=pd.concat([column1, column2, column3], axis=1)
#create a new column with 'RALS' as a defaut value
df['comps']='RALS'
#writing them back to a new CSV file
with open('test.csv','a') as f:
df.to_csv(f, index=False, header=True)
The output is the 'RALS' that I added to the dataframe goes to Row 2000 while the data stops at row 15. How to constrain the RALS so that it doesnt go beyond the length of the data being exported? I would also prefer a more elegant, automated way rather than specifying at which row should the default value stops at.
The second question is, the labels that I have assigned to the columns using columns.index.name, does not appear in the output. Instead it is replaced by a 0 and a 1. Please advise solutions.
Thanks so much for inputs
I am trying to save a csv to a folder after making some edits to the file.
Every time I use pd.to_csv('C:/Path of file.csv') the csv file has a separate column of indexes. I want to avoid printing the index to csv.
I tried:
pd.read_csv('C:/Path to file to edit.csv', index_col = False)
And to save the file...
pd.to_csv('C:/Path to save edited file.csv', index_col = False)
However, I still got the unwanted index column. How can I avoid this when I save my files?
Use index=False.
df.to_csv('your.csv', index=False)
There are two ways to handle the situation where we do not want the index to be stored in csv file.
As others have stated you can use index=False while saving your
dataframe to csv file.
df.to_csv('file_name.csv',index=False)
Or you can save your dataframe as it is with an index, and while reading you just drop the column unnamed 0 containing your previous index.Simple!
df.to_csv(' file_name.csv ')
df_new = pd.read_csv('file_name.csv').drop(['unnamed 0'],axis=1)
If you want no index, read file using:
import pandas as pd
df = pd.read_csv('file.csv', index_col=0)
save it using
df.to_csv('file.csv', index=False)
As others have stated, if you don't want to save the index column in the first place, you can use df.to_csv('processed.csv', index=False)
However, since the data you will usually use, have some sort of index themselves, let's say a 'timestamp' column, I would keep the index and load the data using it.
So, to save the indexed data, first set their index and then save the DataFrame:
df.set_index('timestamp')
df.to_csv('processed.csv')
Afterwards, you can either read the data with the index:
pd.read_csv('processed.csv', index_col='timestamp')
or read the data, and then set the index:
pd.read_csv('filename.csv')
pd.set_index('column_name')
Another solution if you want to keep this column as index.
pd.read_csv('filename.csv', index_col='Unnamed: 0')
If you want a good format next statement is the best:
dataframe_prediction.to_csv('filename.csv', sep=',', encoding='utf-8', index=False)
In this case you have got a csv file with ',' as separate between columns and utf-8 format.
In addition, numerical index won't appear.