saving csv files to new directory - python

I am trying to use this code to write my edited csv files to a new directory. Does anyone know how I specify the directory?
I have tried this but it doesn't seem to be working.
dir = r'C:/Users/PycharmProjects/pythonProject1' # raw string for windows.
csv_files = [f for f in Path(dir).glob('*.csv')] # finds all csvs in your folder.
cols = ['Temperature']
for csv in csv_files: #iterate list
    df = pd.read_csv(csv) #read csv
    df[cols].to_csv('C:/Users/Desktop', csv.name, index=False)
    print(f'{csv.name} saved.')

I think your only problem is the way you're calling to_csv(), passing a directory and a filename. I tried that and got this error:
IsADirectoryError: [Errno 21] Is a directory: '/Users/zyoung/Desktop/processed'
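because to_csv() expects a path to a file as its first argument, not a directory path followed by a file name. (The second positional argument of to_csv() is sep, the field delimiter, so csv.name is never used as a file name.)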
You need to join the output directory and the CSV's file name, and pass that, like:
out_dir = PurePath(base_dir, r"processed")
# ...
# ...
csv_out = PurePath(out_dir, csv_in.name)
df[cols].to_csv(csv_out, index=False)
I'm writing to the subdirectory processed in my current directory ("."), and using PurePath to join the path components with the correct separator.
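Why join only the file name: when a later argument to PurePath is an absolute path, everything before it is discarded. With base_dir = "." this never bites, because Path(".").glob("*.csv") yields relative paths such as data.csv, but with an absolute base_dir the "output" path would be the input file itself. A small demonstration, assuming made-up paths and using PurePosixPath so it behaves the same on any OS:

```python
from pathlib import PurePosixPath

# PurePosixPath keeps the example OS-independent; the paths are made up.
out_dir = PurePosixPath("processed")

# A relative input path is appended to out_dir, so this happens to work:
relative_in = PurePosixPath("data.csv")
print(PurePosixPath(out_dir, relative_in))        # processed/data.csv

# An absolute input path replaces everything before it, so the output
# would point back at the input file and overwrite it:
absolute_in = PurePosixPath("/home/user/data/data.csv")
print(PurePosixPath(out_dir, absolute_in))        # /home/user/data/data.csv

# Joining only the file name always lands inside out_dir:
print(PurePosixPath(out_dir, absolute_in.name))   # processed/data.csv
```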
Here's the complete program I wrote for myself to test this:
import os
from pathlib import Path, PurePath
import pandas as pd
base_dir = r"."
out_dir = PurePath(base_dir, r"processed")
csv_files = [x for x in Path(base_dir).glob("*.csv")]
if not os.path.exists(out_dir):
    os.mkdir(out_dir)
cols = ["Temperature"]
for csv_in in csv_files:
    df = pd.read_csv(csv_in)
    # Join only the file name; an absolute csv_in would otherwise replace out_dir
    csv_out = PurePath(out_dir, csv_in.name)
    df[cols].to_csv(csv_out, index=False)
    print(f"Saved {csv_out.name}")
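The same program can be wrapped in a reusable function that uses only pathlib, so the input and output folders (such as the question's C:/Users/PycharmProjects/pythonProject1 and C:/Users/Desktop) become arguments. A minimal sketch; save_selected_columns is a hypothetical name, and Path.mkdir(parents=True, exist_ok=True) replaces the exists/mkdir pair:

```python
from pathlib import Path

import pandas as pd


def save_selected_columns(in_dir, out_dir, cols):
    """Copy `cols` from every *.csv in in_dir into a same-named file in out_dir."""
    in_dir, out_dir = Path(in_dir), Path(out_dir)
    # parents=True creates missing parent folders; exist_ok=True makes reruns safe.
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for csv_in in sorted(in_dir.glob("*.csv")):  # glob() does not descend into subfolders
        df = pd.read_csv(csv_in)
        csv_out = out_dir / csv_in.name  # join only the file name
        df[cols].to_csv(csv_out, index=False)
        written.append(csv_out)
    return written
```

For example, save_selected_columns(r"C:/Users/PycharmProjects/pythonProject1", r"C:/Users/Desktop", ["Temperature"]) would write one file per input CSV to the Desktop.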

Related

Concatenating CSVs after reading in python

I am new to Python and am trying to read all the files in a directory that end with the extension ".txt" and write them back out into a single CSV with headers. So far I have been able to do that, except that it iterates through my list twice instead of once, and I can't figure out where it reads the files a second time.
I am using this code:
import pandas as pd
import os, shutil
# Get current directory path and list of file names
path = os.getcwd()
file_names = os.listdir(path)
col_names = [A, B, C,...]
merge_list = []
#reads and concatenates files into single CSV
def read_and_concat_files(file_names):
    read_file = pd.read_csv(file_name, delimiter='|', names=col_names)
    merge_list.append(read_file)
    merge_file = pd.concat(merge_list)
    merge_file.to_csv('Combined_EDDs.csv', index=False)
# Use 'for loop' to iterate through each file_name in the list of file_names, calling function on each item in list
for root, dirs, files in os.walk(path, topdown=True):
    for file_name in file_names:
        if os.path.exists(file_name):
            if file_name.endswith(".txt"): #reads only files with .txt extension
                read_and_concat_files(file_name)
                print(f'\nFormatting file {file_name}...\n')
And I am expecting a CSV with about 165 lines and instead I end up with 330. I suspected it was in the loop somewhere but everything I have tried hasn't helped. TIA
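One likely cause, judging from the code: os.walk(path) yields one (root, dirs, files) tuple per directory, the top folder plus every subfolder, but the inner loop ignores that tuple and walks the same top-level file_names list on every pass. If the folder contains one subfolder, every .txt file is read twice. A minimal sketch that reads each top-level .txt file once and concatenates once at the end; combine_txt_files is a hypothetical name, the '|' delimiter is from the question, and the column names stay a parameter because the question elides them:

```python
import os

import pandas as pd


def combine_txt_files(directory, out_csv, names=None):
    """Read every top-level *.txt file in `directory` once and write one CSV."""
    frames = []
    for file_name in sorted(os.listdir(directory)):
        full_path = os.path.join(directory, file_name)
        # Skip subfolders and anything that is not a .txt file.
        if os.path.isfile(full_path) and file_name.endswith(".txt"):
            print(f"Formatting file {file_name}...")
            frames.append(pd.read_csv(full_path, delimiter="|", names=names))
    # Concatenate and write once, after the loop has finished
    # (pd.concat raises ValueError if no .txt files were found).
    combined = pd.concat(frames, ignore_index=True)
    combined.to_csv(out_csv, index=False)
    return combined
```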

How to recursively generate and save csv files to a specific directory?

I am working on code that takes as input a zip file containing Excel files, extracts them into a folder, converts them into dataframes and loads all these dataframes into a list. I would like to create a new folder, convert those dataframes into CSV files and save them in that folder. The goal is to be able to download the folder of CSV files as a zip file.
The main problem for me is making sure that every CSV file has the name of the Excel file it originated from.
I'm adding my code: the first block is the first part of the code, and the second block is the part where I have a problem.
Running this last part of the code, I get this error:
"XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\xef\xbb\xbf;data'"
%%capture
import os
import numpy as np
import pandas as pd
import glob
import os.path
!pip install xlrd==1.2.0
from google.colab import files
uploaded = files.upload()
%%capture
zipname = list(uploaded.keys())[0]
destination_path = 'files'
infolder = os.path.join('/content/', destination_path)
!unzip -o $zipname -d $destination_path
# Load an excel file, return events dataframe + file header dataframe
def load_xlsx(fullpath):
    return events, meta
tasks = [os.path.join(dp, fname) for dp, dn, filenames in os.walk(infolder) for fname in filenames if fname.lower().endswith('.xls')]
dfs = []
metas = []
for fname in tasks:
    df, meta = load_xlsx(fname)
    dfs.append(df)
    metas.append(meta)
newpath = 'csv2021'
if not os.path.exists(newpath):
    os.makedirs(newpath)
filepath = os.path.join('/content/files/', newpath)
for fname in tasks:
    filename = load_xlsx(fname)
    my_csv = filename.to_csv(os.path.join(filepath, filename), encoding="utf-8-sig" , sep = ';')
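A minimal sketch of the naming step, assuming dfs[i] was loaded from tasks[i] as in the first loop: take each Excel file's base name with os.path.splitext and give it a .csv extension, and write into a folder outside the one os.walk scans. As a hedged guess about the error: b'\xef\xbb\xbf' is the UTF-8 byte-order mark that encoding="utf-8-sig" writes, so xlrd may be opening a semicolon-separated text file that carries an .xls name, for example output from an earlier run saved under /content/files. save_frames_as_csv is a hypothetical helper:

```python
import os

import pandas as pd


def save_frames_as_csv(frames, source_paths, out_dir):
    """Write frames[i] to out_dir/<base name of source_paths[i]>.csv."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for df, source in zip(frames, source_paths):
        # '/content/files/sub/report.xls' -> 'report'
        stem = os.path.splitext(os.path.basename(source))[0]
        csv_path = os.path.join(out_dir, stem + ".csv")
        # Same options as the question: BOM-prefixed UTF-8, ';' separator.
        df.to_csv(csv_path, encoding="utf-8-sig", sep=";")
        written.append(csv_path)
    return written
```

In the notebook this would be called as save_frames_as_csv(dfs, tasks, "/content/csv2021"). Two Excel files with the same name in different subfolders would map to the same CSV name, so the later one would overwrite the earlier.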

How to automatically name Excel files just generated from csv files with Python

I need to convert CSV files into Excel files automatically, but I am failing to name the Excel files after the corresponding CSV files.
I saved the CSV files as 'Trials_1', 'Trials_2', 'Trials_3', but with the code I wrote, Python gives me an error and asks for a CSV file named 'Trials_4'. If I then rename the CSV file 'Trials_1' to 'Trials_4', the program works and generates an Excel file named 'Trials_1'.
How can I correct my code?
import csv
import openpyxl as xl
import os, os.path
directory=r'C:\\Users\\PycharmProjects\\input\\'
folder=r'C:\\Users\\PycharmProjects\\output\\'
for csv_file in os.listdir(directory):
    def csv_to_excel(csv_file, excel_file):
        csv_data=[]
        with open(os.path.join(directory, csv_file)) as file_obj:
            reader=csv.reader(file_obj)
            for row in reader:
                csv_data.append(row)
        workbook= xl.Workbook()
        sheet=workbook.active
        for row in csv_data:
            sheet.append(row)
        workbook.save(os.path.join(folder,excel_file))
    if __name__=="__main__":
        m = sum(1 for f in os.listdir(directory) if os.path.isfile(os.path.join(directory, f)))
        new_name = "{}Trial_{}.csv".format(directory, m + 1)
        k = sum(1 for file in os.listdir(folder) if os.path.isfile(os.path.join(folder, file)))
        new_name_e = "{}Trial_{}.xlsx".format(folder, k + 1)
        csv_to_excel(new_name,new_name_e)
Thanks.
Hi Annachiara, welcome to Stack Overflow.
I would modify the "csv_to_excel" function to use only pandas.
Before that, you should install 'xlsxwriter' with:
pip install XlsxWriter
Then the function would look like this:
import pandas as pd
def csv_to_excel(csv_file, excel_file, csv_sep=';'):
    # read the csv file with pandas
    df = pd.read_csv(csv_file, sep=csv_sep)
    # create the excel file; the with block saves and closes it on exit
    # (ExcelWriter.save() was removed in pandas 2.0)
    with pd.ExcelWriter(excel_file, engine='xlsxwriter') as writer:
        # copy the csv content (df) into the excel file
        df.to_excel(writer, index=False)
    # print what you converted for reference
    print(f'csv file {csv_file} saved as excel in {excel_file}')
Just make sure the CSV is read correctly: I added only the separator parameter, but you might need other read_csv parameters as well (such as parse_dates).
Then you can convert the list of CSV files with a for loop (I used extra steps to make it clearer):
import os
dir_in=r'C:\\Users\\PycharmProjects\\input\\'
dir_out=r'C:\\Users\\PycharmProjects\\output\\'
csvs_to_convert=os.listdir(dir_in)
for csv_file_in in csvs_to_convert:
    # remove extension from csv files
    file_name_no_extension=os.path.splitext(csv_file_in)[0]
    # add excel extension .xlsx
    excel_name_out=file_name_no_extension+'.xlsx'
    # write names with their directories
    complete_excel_name_out=os.path.join(dir_out,excel_name_out)
    complete_csv_name_in=os.path.join(dir_in,csv_file_in)
    # convert csv file to excel file
    csv_to_excel(complete_csv_name_in,complete_excel_name_out,csv_sep=';')
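The renaming step in the loop above can be checked on its own, without creating any workbooks. A minimal pathlib sketch, where excel_path_for is a hypothetical helper; Path.with_suffix swaps only the last extension, so Trials_1.csv becomes Trials_1.xlsx:

```python
from pathlib import Path


def excel_path_for(csv_path, out_dir):
    """Map e.g. input/Trials_1.csv to <out_dir>/Trials_1.xlsx."""
    return Path(out_dir) / Path(csv_path).with_suffix(".xlsx").name
```

Looping over sorted(Path(dir_in).glob("*.csv")) instead of os.listdir(dir_in) would also skip any non-CSV files in the input folder, which os.listdir returns as well.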
Each CSV as a separate Excel file
import glob
import pandas as pd
import os
csv_files = glob.glob('*.csv')
for filename in csv_files:
    # output workbook name: same base name, .xlsx extension
    excel_name = os.path.split(filename)[-1].replace('.csv', '.xlsx')
    df = pd.read_csv(filename)
    df.to_excel(excel_name, index=False)
All CSVs in the same Excel file, each in a different sheet
import glob
import pandas as pd
import os
csv_files = glob.glob('*.csv')
# Create the excel file; the with block saves it when the loop is done
with pd.ExcelWriter('all_csv.xlsx') as writer:
    for filename in csv_files:
        sheet_name = os.path.split(filename)[-1].replace('.csv', '')
        df = pd.read_csv(filename)
        # Append each csv as a sheet
        df.to_excel(writer, sheet_name=sheet_name, index=False)
Assuming you would like to keep the same structure as your code, I just fixed some technical issues to make it work (please change the folder paths to your own):
import csv
import openpyxl as xl
import glob, os, os.path
directory= 'input'
folder= '../output' # Since 'input' will be the cwd, step back one directory to reach 'output'
# Using your function, just passing different arguments for convenience.
def csv_to_excel(f_path, f_name):
    csv_data=[]
    with open(f_path, 'r', newline='') as file_obj:
        reader=csv.reader(file_obj)
        for row in reader:
            csv_data.append(row)
    workbook= xl.Workbook()
    sheet=workbook.active
    for row in csv_data:
        sheet.append(row)
    workbook.save(os.path.join(folder, f_name + ".xlsx"))
def main():
    os.chdir(directory) # Make the input directory the cwd
    # Search for all files with a csv extension and pass each one to your function
    for file in glob.glob("*.csv"):
        f_path = os.path.join(os.getcwd(), file) # Absolute path to the file
        f_name = os.path.splitext(file)[0] # File name without extension
        csv_to_excel(f_path, f_name)
if __name__=="__main__":
    main()
P.S.:
Avoid defining the function inside the loop; a function only needs to be defined once.
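The os.chdir call and the '\\' string concatenation can both be avoided by building each path with os.path.join, which uses the right separator on every OS and leaves the working directory alone. A sketch of just the file-discovery part; csv_files_with_names is a hypothetical helper, and each pair it returns could be passed to the csv_to_excel(f_path, f_name) function from this answer:

```python
import glob
import os


def csv_files_with_names(directory):
    """Return (absolute path, name without extension) for each *.csv in directory."""
    pairs = []
    for f_path in sorted(glob.glob(os.path.join(directory, "*.csv"))):
        f_name = os.path.splitext(os.path.basename(f_path))[0]
        pairs.append((os.path.abspath(f_path), f_name))
    return pairs
```

Without the chdir, a relative output folder such as '../output' is resolved against wherever the script is started, so an absolute folder path is the safer choice there.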

Open csv using Python using relative filepath

os.chdir(r"C:\Downloads")
I'm getting stuck with reading in files in Python.
Why does specifying the relative file path not work when reading the file?
files = os.listdir(r"csvfilestoimport")
files
['file1.csv', 'file2.csv']
df1 = pd.concat([pd.read_csv(f) for f in files])
FileNotFoundError: [Errno 2] File file.csv does not exist:'file1.csv'
os was my choice before I learned about pathlib.
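from pathlib import Path
import pandas as pd
path = Path(r"C:\Downloads")
df = pd.concat([pd.read_csv(f) for f in path.rglob("*.csv")])
With pathlib you don't have to join the directory and file name manually. Note that rglob() also searches subfolders; use path.glob("*.csv") for the top-level folder only.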
Try creating a new file with a name you are sure does not already exist anywhere on your computer, and check that it is created in the folder you expect. Then try to read it.
OK, now with your example. Note that the names returned by
files = os.listdir(r"csvfilestoimport")
['file1.csv', 'file2.csv']
are relative to the csvfilestoimport folder; the files you actually want to read are
['csvfilestoimport\file1.csv', 'csvfilestoimport\file2.csv']
So you need to add the folder to each name, for example with os.path.join:
df1 = pd.concat([pd.read_csv(os.path.join("csvfilestoimport", f)) for f in files])
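A minimal sketch of that fix as a function that works regardless of the current directory: join the folder to every name os.listdir returns and keep only .csv files. read_csvs_in is a hypothetical helper name:

```python
import os

import pandas as pd


def read_csvs_in(folder):
    """Concatenate every .csv file directly inside `folder`."""
    paths = [
        os.path.join(folder, name)  # os.listdir gives bare names such as 'file1.csv'
        for name in sorted(os.listdir(folder))
        if name.lower().endswith(".csv")
    ]
    return pd.concat([pd.read_csv(p) for p in paths], ignore_index=True)
```

With the working directory set to C:\Downloads, df1 = read_csvs_in("csvfilestoimport") would read both files.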
See this example:
import glob
import os
import pandas as pd
root_path = r"C:\Downloads"
filelist = glob.glob(os.path.join(root_path, "*.csv"))
df1 = pd.concat([pd.read_csv(f) for f in filelist])
In os.chdir(), try giving the full path of the Downloads folder, "C:\Users\xxxx\Downloads", and try again:
os.chdir(r'C:\Users\xxxxx\Downloads')

Opening multiple CSV files

I am trying to open multiple Excel files. My program throws a "FileNotFoundError", even though the file is present in the directory.
Here is the code:
import os
import pandas as pd
path = "C:\\GPA Calculations for CSM\\twentyfourteen"
files = os.listdir(path)
print (files)
df = pd.DataFrame()
for f in files:
    df = pd.read_excel(f,'Internal', skiprows = 7)
    print ("file name is " + f)
    print (df.loc[0][1])
    print (df.loc[1][1])
    print (df.loc[2][1])
The program raises the error on df = pd.read_excel(f,'Internal', skiprows = 7).
I opened the same file on another program (which opens single file) and that worked fine. Any suggestions or advice would be highly appreciated.
os.listdir lists the file names relative to the directory (path) you pass as its argument, so you need to join the path and file name to get the absolute path of each file. In your loop:
for filename in files:
    abspath = os.path.join(path, filename)
    df = pd.read_excel(abspath, 'Internal', skiprows = 7)
    print ("file name is " + filename)
    # ... rest of the loop unchanged
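The same idea with pathlib: iterating a directory yields paths that already include it, so no join is needed. A sketch that also skips anything other than .xls/.xlsx files, which os.listdir would otherwise hand to read_excel; excel_files_in is a hypothetical helper:

```python
from pathlib import Path


def excel_files_in(folder):
    """Return full paths of the .xls/.xlsx files directly inside `folder`."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in {".xls", ".xlsx"}
    )
```

The loop would then become for abspath in excel_files_in(path): df = pd.read_excel(abspath, 'Internal', skiprows = 7), with the prints unchanged.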
