I'm trying to write a pandas DataFrame to a CSV file:
import pandas as pd
import os

case_array = [['2017042724', '05/18/2017'], ['2017042723', '05/18/2017'], ['2017042722', '05/18/2017'], ['2017042721', '05/18/2017']]
filename = 'case_array.csv'
path = r"C:\shares\folder"  # raw string so the backslashes are not treated as escape sequences
fullpath = os.path.join(path, filename)

for case_row in case_array:
    df = pd.DataFrame(case_row)
    try:
        with open(fullpath, 'w+') as f:
            df.to_csv(f, header=False)
        print('Success')
    except:
        print('Unable to Write CSV')
    try:
        df = pd.read_csv(fullpath)
        print(df)
    except:
        print('Unable to Read CSV')
but it's inserting each row as a column, adding an extra leading column (even though header was set to False), and overwriting the previous insertion:
0 2017042721
1 05/18/2017
If I insert the entire array, it inserts rows without the header row (this is the result I want). The issue is that in the script I'm writing, I need to insert one row at a time.
How do I get the pandas DataFrame to insert a row instead of a column?
Edit 1
The desired output looks like this:
0 1
2017042721 05/18/2017
2017042723 05/18/2017
You do not have to loop over the array to do this. You can build a DataFrame from the whole array and write it to a CSV using to_csv():
case_array = [['2017042724', '05/18/2017'], ['2017042723', '05/18/2017'], ['2017042722', '05/18/2017'], ['2017042721', '05/18/2017']]
df = pd.DataFrame(case_array)
df.to_csv(fullpath, header=False)
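If you also want to leave out the index column, to_csv takes an index keyword as well (a small addition of mine, not part of the original answer):

df.to_csv(fullpath, header=False, index=False)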
EDIT
If you must iterate over the array, you can use the code below:
for case_row in case_array:
    df = pd.DataFrame(case_row).T
    try:
        with open(fullpath, 'a') as f:
            df.to_csv(f, header=False, index=False)
        print('Success')
    except:
        print('Unable to Write CSV')
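A hedged variation on that loop (my own tweak, not part of the original answer): since the file is opened in append mode on every pass, rows keep accumulating across runs, and you may want the header written only when the file is new or empty:

import os
import pandas as pd

for case_row in case_array:
    df = pd.DataFrame(case_row).T
    # write the header row only if the file does not exist yet or is empty
    first_write = not os.path.exists(fullpath) or os.path.getsize(fullpath) == 0
    df.to_csv(fullpath, mode='a', header=first_write, index=False)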
Related
In my Python script, I'm trying to read in CSV files and, if a file has a column "PROD_NAME", find a value within that column and replace it with another value. Currently, whenever I run the script, everything goes through the "try" clause and acts like it is working, but when I look into the file itself the values remain unchanged. Nothing hits the "except" clause, and the command prompt prints "Replaced" for each file it supposedly changed. Any help would be appreciated. Thanks!
def worker():
    filenames = glob.glob(dest_dir + '\\*.csv')
    for filename in filenames:  # loop over the files
        my_file = Path(os.path.join(dest_dir, filename))
        try:
            with open(filename) as f:
                # read data; first pass gets the list of columns
                df1 = pd.read_csv(filename, skiprows=1, encoding='ISO-8859-1')
                dtypes = {}
                # print(filename, df1)
                for col in df1.columns:  # make all columns text, to avoid formatting errors
                    dtypes[col] = 'str'
                df1 = pd.read_csv(filename, dtype=dtypes, skiprows=1, encoding='ISO-8859-1')
                if 'PROD_NAME' in df1.columns:
                    df1 = df1.replace("NA_NRF", "FA_GUAR")
                    print("Replaced" + filename)
        except:
            if 'PROD_NAME' in df1.columns:
                print(filename)

worker()
Original DF:
!4 PROD_NAME ENTRY_YEAR
* NA_NRF 2014
The NA_NRF value is supposed to change to FA_GUAR.
This should do the job:
with open(filename) as f:
    df_before = pd.read_csv(f, sep=';')
    for i in df_before.columns.values:
        if i == "PROD_NAME":
            df_after = df_before.replace("NA_NRF", "FA_GUAR")
            df_after.to_csv(filename, index=False, sep=';')
        else:
            print("nothing to change")
When I added sep=';' it stopped giving me headaches about quotes...
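If you want the replacement restricted to the PROD_NAME column rather than applied across the whole frame (which is what DataFrame.replace does above), a minimal sketch of that variant, assuming the same filename and separator:

import pandas as pd

df = pd.read_csv(filename, sep=';')
if 'PROD_NAME' in df.columns:
    # limit the substitution to the PROD_NAME column only
    df['PROD_NAME'] = df['PROD_NAME'].replace('NA_NRF', 'FA_GUAR')
    df.to_csv(filename, index=False, sep=';')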
I want to know if it is possible to use the pandas to_csv() function to add a dataframe to an existing csv file. The csv file has the same structure as the loaded data.
You can specify a Python write mode in the pandas to_csv function. For append, it is 'a'.
In your case:
df.to_csv('my_csv.csv', mode='a', header=False)
The default mode is 'w'.
If the file might not exist initially, you can make sure the header is written on the first write using this variation:
import os  # needed for the os.path.exists check below

output_path = 'my_csv.csv'
df.to_csv(output_path, mode='a', header=not os.path.exists(output_path))
You can append to a csv by opening the file in append mode:
with open('my_csv.csv', 'a') as f:
df.to_csv(f, header=False)
If this was your csv, foo.csv:
,A,B,C
0,1,2,3
1,4,5,6
If you read that and then append, for example, df + 6:
In [1]: df = pd.read_csv('foo.csv', index_col=0)
In [2]: df
Out[2]:
A B C
0 1 2 3
1 4 5 6
In [3]: df + 6
Out[3]:
A B C
0 7 8 9
1 10 11 12
In [4]: with open('foo.csv', 'a') as f:
   ...:     (df + 6).to_csv(f, header=False)
foo.csv becomes:
,A,B,C
0,1,2,3
1,4,5,6
0,7,8,9
1,10,11,12
with open(filename, 'a') as f:
df.to_csv(f, header=f.tell()==0)
Create the file if it doesn't exist, otherwise append to it.
Add the header if the file is being created, otherwise skip it (with mode 'a' the file position starts at the end, so f.tell() is 0 only for a new or empty file).
A little helper function I use with some header checking safeguards to handle it all:
def appendDFToCSV_void(df, csvFilePath, sep=","):
    import os
    if not os.path.isfile(csvFilePath):
        df.to_csv(csvFilePath, mode='a', index=False, sep=sep)
    elif len(df.columns) != len(pd.read_csv(csvFilePath, nrows=1, sep=sep).columns):
        raise Exception("Columns do not match!! Dataframe has " + str(len(df.columns)) + " columns. CSV file has " + str(len(pd.read_csv(csvFilePath, nrows=1, sep=sep).columns)) + " columns.")
    elif not (df.columns == pd.read_csv(csvFilePath, nrows=1, sep=sep).columns).all():
        raise Exception("Columns and column order of dataframe and csv file do not match!!")
    else:
        df.to_csv(csvFilePath, mode='a', index=False, sep=sep, header=False)
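A short usage sketch of the helper (my own illustration, assuming pandas is imported as pd and a throwaway file name):

import pandas as pd

df = pd.DataFrame({'A': [1], 'B': [2]})
appendDFToCSV_void(df, 'log.csv')  # first call creates log.csv with a header
appendDFToCSV_void(df, 'log.csv')  # later calls append rows without the header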
I initially started with pyspark DataFrames and got type conversion errors (when converting to pandas DataFrames and then appending to CSV), given the schema/column types in my pyspark DataFrames.
I solved the problem by forcing all columns in each DataFrame to be of type string and then appending to the CSV as follows:
with open('testAppend.csv', 'a') as f:
    df2.toPandas().astype(str).to_csv(f, header=False)
This is how I did it in 2021
Let us say I have a csv sales.csv which has the following data in it:
sales.csv:
Order Name,Price,Qty
oil,200,2
butter,180,10
and to add more rows I can load them in a data frame and append it to the csv like this:
import pandas
data = [
    ['matchstick', '60', '11'],
    ['cookies', '10', '120']
]
dataframe = pandas.DataFrame(data)
dataframe.to_csv("sales.csv", index=False, mode='a', header=False)
and the output will be:
Order Name,Price,Qty
oil,200,2
butter,180,10
matchstick,60,11
cookies,10,120
A bit late to the party but you can also use a context manager, if you're opening and closing your file multiple times, or logging data, statistics, etc.
from contextlib import contextmanager
import pandas as pd
@contextmanager
def open_file(path, mode):
    file_to = open(path, mode)
    yield file_to
    file_to.close()

# later
saved_df = pd.DataFrame(data)
with open_file('yourcsv.csv', 'a') as outfile:
    saved_df.to_csv(outfile, header=False)
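For the stated use case of opening the file once and writing to it several times, a brief sketch (the chunks list here is hypothetical, not from the original answer):

chunks = [pd.DataFrame(data), pd.DataFrame(data)]  # hypothetical list of DataFrames to append

with open_file('yourcsv.csv', 'a') as outfile:
    for chunk in chunks:
        # every write goes through the same open handle
        chunk.to_csv(outfile, header=False)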
I've been trying to make a CSV from a big list of other CSVs, and here's the deal: I want to get the names of these CSV files and put them in the CSV that I want to create, and I also need the row count from each of the CSV files whose names I'm collecting. Here's what I've tried so far:
def getRegisters(file):
    results = pd.read_csv(file, header=None, error_bad_lines=False, sep='\t', low_memory=False)
    print(len(results))
    return len(results)

path = "C:/Users/gdldieca/Desktop/TESTSFORPANW/New folder"
dirs = os.listdir(path)

with open("C:/Users/gdldieca/Desktop/TESTSFORPANW/New folder/FilesNames.csv", 'w', newline='') as f:
    writer = csv.writer(f, delimiter='\t')
    writer.writerow(("File", "Rows"))
    for names in dirs:
        sfile = getRegisters("C:/Users/gdldieca/Desktop/TESTSFORPANW/New folder/" + str(names))
        writer.writerow((names, sfile))
However, I can't seem to get the file's row count even though pandas actually returns it. I'm getting this error:
_csv.Error: iterable expected, not int
The final result would be something like this written into the CSV
File1 90
File2 10
If you are using pandas, I think you can also use it to make a CSV file with all the values you need. Here is an alternative:
import os
import pandas as pd
directory = 'D:\\MY\\PATH\\ALLCSVFILE\\'

# create a list to collect one row per file
rows_list = []
for filename in os.listdir(directory):
    if filename.endswith(".csv"):
        file = os.path.join(directory, filename)
        df = pd.read_csv(file)
        # count rows
        rowcount = len(df.index)
        new_row = {'namefile': filename, 'count': rowcount}
        rows_list.append(new_row)

# pass the list to a dataframe
df1 = pd.DataFrame(rows_list)
print(df1)
df1.to_csv('test.csv', sep=',')
The result is a DataFrame with namefile and count columns (one row per CSV file), which is also written to test.csv.
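A small optional tweak (my addition, not part of the original answer): pass index=False if you don't want the automatic integer index written as an extra first column of test.csv:

df1.to_csv('test.csv', sep=',', index=False)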
I'm trying to import a batch of CSV's into PostgreSQL and constantly run into an issue with missing data:
psycopg2.DataError: missing data for column "column_name" CONTEXT:
COPY table_name, line where ever in the CSV that data wasn't
recorded, and here are data values up to the missing column.
At times there is no way to get the complete set of data written to a row, and I have to deal with the files as they are. I am trying to figure out a way to remove a row if data wasn't recorded for one of its columns. Here's what I have:
file_list = glob.glob(path)
for f in file_list:
    filename = os.path.basename(f)  # get the file name
    arc_csv = arc_path + filename  # path for the revised copy of the CSV
    with open(f, 'r') as inp, open(arc_csv, 'wb') as out:
        writer = csv.writer(out)
        for line in csv.reader(inp):
            if "" not in line:  # if the row doesn't have any empty fields
                writer.writerow(line)
    cursor.execute("COPY table_name FROM %s WITH CSV HEADER DELIMITER ','", (arc_csv,))
You could use pandas to remove rows with missing values:
import glob, os, pandas

file_list = glob.glob(path)
for f in file_list:
    filename = os.path.basename(f)
    arc_csv = arc_path + filename
    data = pandas.read_csv(f, index_col=0)
    ind = data.apply(lambda x: not pandas.isnull(x.values).any(), axis=1)
    # ^ provides an index of all rows with no missing data
    data[ind].to_csv(arc_csv)  # writes the revised data to csv
However, this could get slow if you're working with large datasets.
EDIT - added index_col=0 as an argument to pandas.read_csv() to prevent the added index column issue. This uses the first column in the csv as an existing index. Replace 0 with another column's number if you have reason not to use the first column as index.
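A roughly equivalent alternative (my own sketch, not from the original answer): dropna() is the built-in pandas way to drop rows that contain any missing values, so the apply/isnull filter above could be replaced with:

data = pandas.read_csv(f, index_col=0)
# drop every row with a missing value in any column, then write the revised copy
data.dropna(how='any').to_csv(arc_csv)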
Unfortunately, you cannot parameterize table or column names. Use string formatting, but make sure to validate/escape the value properly (note that the COPY file path has to appear as a quoted string literal):
cursor.execute("COPY table_name FROM '{file_path}' WITH CSV HEADER DELIMITER ','".format(file_path=arc_csv))