I haven't been able to find any information on this topic here and would really appreciate your help. I'm pretty new to Python, but here's what I have.
I have multiple files in a folder and want to read them, transpose them, and then write them into a new folder. I think I have everything working, but I can't figure out how to write the results out.
Here is my code:
path = 'C:\Users\Christopher\Documents\Clemson\Pleurodires\stability data\Es03\fixed\processed'
filenames = glob.glob(path + "/*.csv")
for filename in filenames:
    dfs = (pd.read_csv(filename))
    df = dfs.transpose()
    df.to_csv('transposed\' + 'Tr_' + filename)
The last line should (I hope) put all the new files in a folder called 'transposed', adding Tr_ in front of the name that was loaded initially (i.e. if the file name was 'hello', it would now be 'Tr_hello' inside the folder transposed).
When I run the code above, it appears to work, but the files don't exist anywhere on my computer. I've tried a variety of ways to get df.to_csv to work, and this is the closest I've gotten.
Edit
Thanks for everyone's help, I ended up combining a mix of Nanashi's and EdChun's code to get this, which works: (the final files are in the correct folder, and are called Tr_filename)
path = r'C:\Users\Christopher\Documents\Clemson\Pleurodires\stability data\Es03\fixed\processed'
filenames = glob.glob(path + "/*.csv")
for filename in filenames:
    short = os.path.split(filename)
    newfilename = 'Tr_%s' % short[-1]
    #print newfilename
    dfs = (pd.read_csv(filename))
    df = dfs.transpose()
    df.to_csv(os.path.join('transposed', newfilename))
A few things:
filenames = glob.glob(path + "/*.csv") -- this mixes a forward slash into a path that otherwise uses backslashes. Python on Windows generally accepts forward slashes, so it may not be the cause here, but os.path.join(path, "*.csv") keeps the separators consistent.
Try printing out filename. It will give you the whole path as well. At the df.to_csv line, you're actually writing to 'transposed\' + 'Tr_' + the full path of the input file, not just its name. You have to isolate the specific file name (using split or the os module may work).
I'm using Ubuntu, so this might not apply exactly, but here's how I'd do it.
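To make the second point concrete, here is a minimal sketch of isolating the file name before building the output path. The path C:\data\processed\hello.csv is a made-up example, and ntpath (the Windows flavour of os.path in the standard library) is used only so the snippet behaves the same on any operating system; on Windows itself, plain os.path should do the same thing.

```python
import ntpath  # Windows-style os.path; importable on any OS

# Hypothetical full path, standing in for one item returned by glob.glob.
filename = r'C:\data\processed\hello.csv'

# Prefixing the full path does not give a usable output name.
wrong = 'transposed\\' + 'Tr_' + filename

# Keep only the last component, then build the output path from it.
short = ntpath.basename(filename)
right = ntpath.join('transposed', 'Tr_' + short)

print(wrong)
print(right)
```

The first print shows the drive letter and folders embedded in the middle of the output name; the second shows only the folder and the prefixed file name.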
import pandas as pd
from glob import glob
path = "/home/nanashi/Documents/Python 2.7/Scrapers/Scrapy/itbooks"
filenames = glob(path + "/*.csv")
for filename in filenames:
    # Keep only the last path component (the bare file name).
    specname = filename.split("/")[-1]
    print(filename)
    print(specname)
    dfs = pd.read_csv(filename)
    df = dfs.transpose()
    df.to_csv("transposed/%s" % specname)
Result:
/home/nanashi/Documents/Python 2.7/Scrapers/Scrapy/itbooks/realestateau.csv
realestateau.csv
/home/nanashi/Documents/Python 2.7/Scrapers/Scrapy/itbooks/itbooks.csv
itbooks.csv
[Finished in 0.6s]
Let us know if this helps.
Your code seems to have multiple errors. Try the following:
import glob
import os
import pandas as pd
path = r'C:\Users\Christopher\Documents\Clemson\Pleurodires\stability data\Es03\fixed\processed'
filenames = glob.glob(path + "/*.csv")
for filename in filenames:
    dfs = (pd.read_csv(filename))
    df = dfs.transpose()
    # filename is a full path, so keep only its last component for the output name
    df.to_csv(os.path.join('transposed', 'Tr_' + os.path.basename(filename)))
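One caveat worth knowing when passing glob results to os.path.join: if a later argument is an absolute path, the earlier arguments are discarded. The sketch below is only an illustration; it uses ntpath so it runs on any OS, and the path is a made-up example.

```python
import ntpath  # Windows-style os.path; importable on any OS

filename = r'C:\data\processed\hello.csv'  # hypothetical glob result

# The absolute second argument wins; 'transposed' is dropped entirely.
joined_full = ntpath.join('transposed', filename)

# Joining with the bare file name keeps the target folder.
joined_name = ntpath.join('transposed', 'Tr_' + ntpath.basename(filename))

print(joined_full)
print(joined_name)
```

In the first case the result points back at the input file, which is why the bare file name should be extracted before joining.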
Related
I'm trying to put 2 csv files into a list in order to view them.
My path is correct. My files are in the path.
I get the error above, as if my editor doesn't recognize my C: drive. No matter how I set my path or where I put my csv files, Python doesn't recognize C: for some reason.
Any advice for a beginner who would like to get past dealing with this type of minutiae?
ls = []
path = r'C:\fMRI'
print(os.listdir(path))
li_mapper = map(lambda filename: pd.read_csv(filename, index_col=None, header=0), path)
ls = list(li_mapper)
print(ls)
Strings are iterable, and 'C' is not a CSV file that can be read. In other words, map(..., path) iterates over the characters of the string: ['C', ':', '\\', ...]
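A quick way to see this for yourself, as a small sketch that swaps pd.read_csv for a function that just returns its argument:

```python
path = r'C:\fMRI'

# map() walks the string one character at a time, so the function
# receives 'C', then ':', and so on, never a file name.
items = list(map(lambda item: item, path))
print(items)
```

Each element of the printed list is a single character, which is what read_csv was being asked to open.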
If you want to make a list of dataframes, then you want this.
path = r'C:\fMRI'
dfs = [pd.read_csv(filename, index_col=None, header=0) for filename in os.listdir(path)]
But you should also filter that listdir result down to actual files that end in .csv.
Also, the filename passed to pd.read_csv is only the file's name, not its absolute path. You will need to prepend the path variable to it.
If you want this to work from anywhere, you need to join the filename back with the path. I'd also argue this is getting a little long for a list comprehension.
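import os
import pandas as pd
path = r'C:\fMRI'
dfs = []
for filename in os.listdir(path):
    # Skip anything that is not a CSV file.
    if not filename.endswith('.csv'):
        continue
    # os.listdir returns bare names, so rebuild the full path.
    fpath = os.path.join(path, filename)
    dfs.append(pd.read_csv(fpath, index_col=None, header=0))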
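If you want to check the filter-and-join step on its own before involving pandas, something like the following sketch may help. The folder and file names are throwaway examples created in a temporary directory, not your real data.

```python
import os
import tempfile

# Build a throwaway folder with two CSV files and one text file.
folder = tempfile.mkdtemp()
for name in ('a.csv', 'b.csv', 'notes.txt'):
    with open(os.path.join(folder, name), 'w') as fh:
        fh.write('x,y\n1,2\n')

# Keep only the .csv entries and turn each bare name into a full path.
csv_paths = sorted(
    os.path.join(folder, name)
    for name in os.listdir(folder)
    if name.endswith('.csv')
)
print(csv_paths)
```

Every entry in csv_paths can be opened from any working directory, which is what read_csv needs.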
This is my first post on StackOverflow, so please be nice. I'm a super beginner to Python.
I want to read multiple files from a folder, split the text, and save the output as a new file. I have this part of the code working, but it only handles one file at a time. I have tried googling but can't figure out how to run this code on multiple text files in a folder and save each result as "output" + a number, one per file in the folder. Is this doable?
with open("file_path") as fReader:
    corpus = fReader.read()
    loc = corpus.find("\n\n")
    print(corpus[:loc], file=open("output.txt","a"))
Possibly work with a list, like:
from pathlib import Path
source_dir = Path("./")  # path to the directory
# Collect the regular files up front so the new output files are not picked up.
files = [x for x in source_dir.iterdir() if x.is_file()]
for i, file in enumerate(files):
    outfile = "output_" + str(i) + file.suffix
    with open(file) as fReader, open(outfile, "w") as fOut:
        corpus = fReader.read()
        loc = corpus.find("\n\n")
        fOut.write(corpus[:loc])
(Sorry for the multiple edits.)
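One edge case to be aware of with corpus.find("\n\n"): when a file has no blank line, find returns -1, and slicing with -1 drops the last character. A small sketch of the difference, using made-up strings in place of file contents and str.partition as one possible alternative:

```python
with_gap = "first paragraph\n\nsecond paragraph"
no_gap = "only one paragraph"

# find() returns -1 when there is no blank line, so the slice
# no_gap[:-1] silently drops the final character.
loc = no_gap.find("\n\n")
sliced = no_gap[:loc]

# partition() returns the whole string when the separator is absent.
first_a = with_gap.partition("\n\n")[0]
first_b = no_gap.partition("\n\n")[0]
print(repr(sliced), repr(first_a), repr(first_b))
```

Whether that matters depends on whether every input file is guaranteed to contain a blank line.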
Welcome to the site. Yes, what you are asking is completely doable, and you are on the right track. You will need to do a little research and practice with the os module, which is highly useful when working with files. The two functions you will want to look into are:
os.path.join()
os.listdir()
I would suggest you put two folders next to your Python file, one called data and the other called output to catch the results. Start by seeing whether you can get the code to list all the files in your data directory, and keep building on that loop. Something like this should list all the files:
# folder file lister/test writer
import os
source_folder_name = 'data' # the folder to be read that is in the SAME directory as this file
output_folder_name = 'output' # will be used later...
files = os.listdir(source_folder_name)
# get this working first
for f in files:
    print(f)
# make output folder names and just write a 1-liner into each file...
for f in files:
    output_filename = f.split('.')[0] # the part before the period
    output_filename += '_output.csv'
    output_path = os.path.join(output_folder_name, output_filename)
    with open(output_path, 'w') as writer:
        writer.write('some data')
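A side note on f.split('.')[0]: it keeps only the text before the first period, so a name with more than one period loses part of its stem. The sketch below compares it with os.path.splitext, which splits at the last period; the file name report.v2.csv is a made-up example.

```python
import os

name = 'report.v2.csv'  # hypothetical file name with two periods

# split('.') cuts at the first period; splitext cuts at the last one.
stem_split = name.split('.')[0]
stem_splitext = os.path.splitext(name)[0]

out_dir = 'output'
out_path = os.path.join(out_dir, stem_splitext + '_output.csv')
print(stem_split, stem_splitext, out_path)
```

For names with a single period both approaches give the same result, so this only matters if your data files contain extra periods.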
I suspect I am making a very silly error here, but the vast majority of what I've found online covers reading multiple files into a single dataframe or writing results into a single file, which is not my goal.
Aim: read hundreds of CSV files one by one, filter each one, and write the result to a file named after the original (e.g. "Processed_<original_file>.csv"), then move on to the next file in the loop, read and filter that, write its results to a new output file, and so on.
Problem: either only a single result file is produced (from the last file read in the loop) or, if I use the code below, which I put together after reading various SO pages, I get an invalid argument error.
Error: OSError: [Error 22] invalid argument: 'c:/users/my Directory/sourceFiles\Processed_c:/users/my Directory/sourceFiles\files1.csv'
I know I'm getting my loop and renaming wrong at the moment, but I can't figure out how to do this without loading ALL my CSVs into a single dataframe using list and concat and writing everything into a single result file (which is not my aim). I want to write each result to its own file, which shares the name of the original file.
Ideally, given the size and number of files involved (700+, each 400 MB), I'd rather use pandas, as that seems to be more efficient from what I've learnt so far.
import pandas as pd
import glob
import os
path = "c:/users/my Directory/"
csvFiles = glob.glob( path + "/sourceFiles/files*")
for files in csvFiles:
    df = pd.read_csv(files, index_col=None, encoding='Latin-1', engine = 'python',
                     error_bad_lines = False)
    df_f = df[df.iloc[:, 2] == "Office"]
    filepath = os.path.join(path,'Processed_'+str(files)+'.csv')
    df_f.to_csv(filepath)
The error message is nice because it shows you exactly what is wrong: the file name for the output is invalid because the full source path (c:/users/...) is appended to the output directory, so the path appears twice.
Try os.path.basename() to strip the path and os.path.splitext() to strip the file extension:
fileout = path + '\\' + 'Processed_' + os.path.splitext(os.path.basename(files))[0] + '.csv'
And most importantly, test it with a couple of print statements to see whether your inputs and outputs are what you expect. Just comment out the analysis lines.
import pandas as pd
import glob
import os
path = "c:/users/my Directory/"
csvFiles = glob.glob( path + "/sourceFiles/files*")
for files in csvFiles:
    print(files)
    #df = pd.read_csv(files, index_col=None, encoding='Latin-1', engine = 'python',
    #                 error_bad_lines = False)
    #df_f = df[df.iloc[:, 2] == "Office"]
    filepath = path + '\\' + 'Processed_' + os.path.splitext(os.path.basename(files))[0] + '.csv'
    print(filepath)
    #df_f.to_csv(filepath)
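If you prefer pathlib, the same renaming can be sketched as below. This is an alternative, not what the code above does; PureWindowsPath is used only so the example behaves identically on any OS, and the paths are taken from the error message in the question.

```python
from pathlib import PureWindowsPath

src = PureWindowsPath(r'c:/users/my Directory/sourceFiles\files1.csv')
out_dir = PureWindowsPath('c:/users/my Directory')

# .stem is the file name without its directory or final extension.
out_path = out_dir / ('Processed_' + src.stem + '.csv')
print(out_path)
```

On a real Windows machine you would normally use pathlib.Path instead, which also lets you pass the result straight to to_csv.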
I am trying to split the file name and file extension for every entry at a specific path. The code mostly works, but os.path.splitext() does not always split the extension correctly: for some entries it splits in the middle of the name, which leaves incorrect items in my list.
This is the output when I run the code:
['', '.pdf', '.txt', '.docx', '.jpg', '.4000 + KeyGen', '.lnk', '.3', '.9', '.ini', '.png', '.url', '.exe', '.JPG', '.PNG', '.38)', '.Trainer', '.1 + Crack', '.ME]', '.pptx']
Notice there is an empty string and entries such as .ME] and .9, although I don't have any files at that path with extensions like these.
What I can do is either clean up my list with more code or adjust something in the for loop to suit my needs.
Note that I am new to programming and trying to learn as much as I can. I have been doing this for a week or so. Thanks in advance for helping.
import os, shutil
path = "C:\\Users\\VERX\\Desktop"
directories = []
file_names = []
# Split file name from file extension and append each one of them to its list.
for entry in os.listdir(path):
    file_name, file_type = os.path.splitext(entry)
    if file_type not in directories:
        directories.append(file_type)
    file_names.append(entry)
print(directories)
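A possible explanation, offered as an assumption since the Desktop contents are not shown: os.listdir returns folders as well as files, and os.path.splitext simply splits at the last period of whatever name it is given, so a folder named something like "Tool v1.9" yields ".9" and a name without a period yields an empty string. The sketch below reproduces that in a temporary folder with made-up names and then filters with os.path.isfile.

```python
import os
import tempfile

# Throwaway folder: a file, a file with no extension, and a dotted folder name.
root = tempfile.mkdtemp()
open(os.path.join(root, 'notes.txt'), 'w').close()
open(os.path.join(root, 'README'), 'w').close()
os.mkdir(os.path.join(root, 'Tool v1.9'))

all_exts = set()
file_exts = set()
for entry in os.listdir(root):
    ext = os.path.splitext(entry)[1]
    all_exts.add(ext)
    # Only count real files, and ignore names without an extension.
    if os.path.isfile(os.path.join(root, entry)) and ext:
        file_exts.add(ext)

print(sorted(all_exts))
print(sorted(file_exts))
```

Files whose own names contain extra periods would still need separate handling, for example by checking the suffix against a list of extensions you expect.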
I've found several related posts on this, but when I try to use the suggested code I keep getting "The system cannot find the file specified". I imagine it's some kind of path problem. There are several folders within the "Cust" folder, each of those folders has several files, and some have a "." in the file name that I need to remove. Any idea what I have wrong here?
customer_folders_path = r"C:\Users\All\Documents\Cust"
for directname, directnames, files in os.walk(customer_folders_path):
    for file in files:
        filename_split = os.path.splitext(file)
        filename_zero = filename_split[0]
        if "." in filename_zero:
            os.rename(filename_zero, filename_zero.replace(".", ""))
When you use os.walk and then iterate through the files, remember that you are only iterating through file names - not the full path (which is what is needed by os.rename in order to function properly). You can adjust by adding the full path to the file itself, which in your case would be represented by joining directname and filename_zero together using os.path.join:
os.rename(os.path.join(directname, filename_zero),
          os.path.join(directname, filename_zero.replace(".", "")))
Also, not sure if you use it elsewhere, but you could remove your filename_split variable and define filename_zero as filename_zero = os.path.splitext(file)[0], which will do the same thing. You may also want to change customer_folders_path = r"C:\Users\All\Documents\Cust" to customer_folders_path = "C:/Users/All/Documents/Cust"; both forms are interpreted correctly by Python on Windows, and forward slashes avoid any backslash-escaping issues.
EDIT: As intelligently pointed out by #bozdoz, when you split off the suffix, you lose the 'original' file name and therefore the file can't be found. Here is an example that should work in your situation:
import os
customer_folders_path = "C:/Users/All/Documents/Cust"
for directname, directnames, files in os.walk(customer_folders_path):
    for f in files:
        # Split the file into the filename and the extension, saving
        # as separate variables
        filename, ext = os.path.splitext(f)
        if "." in filename:
            # If a '.' is in the name, rename, appending the suffix
            # to the new file
            new_name = filename.replace(".", "")
            os.rename(
                os.path.join(directname, f),
                os.path.join(directname, new_name + ext))
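If you would like to try this safely before pointing it at real customer folders, here is a self-contained sketch of the same idea run against a temporary directory. The folder name sub and the file name a.b.txt are made up for the test.

```python
import os
import tempfile

# Throwaway tree: <root>/sub/a.b.txt
root = tempfile.mkdtemp()
sub = os.path.join(root, 'sub')
os.mkdir(sub)
open(os.path.join(sub, 'a.b.txt'), 'w').close()

for dirpath, dirnames, files in os.walk(root):
    for f in files:
        stem, ext = os.path.splitext(f)
        if '.' in stem:
            # Both arguments to os.rename carry the directory.
            os.rename(os.path.join(dirpath, f),
                      os.path.join(dirpath, stem.replace('.', '') + ext))

print(os.listdir(sub))
```

The file keeps its extension and only the extra period in the stem is removed.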
You need to use the original filename as the first parameter to os.rename and handle the case where the filename didn't have a period in the first place. How about:
customer_folders_path = r"C:\Users\All\Documents\Cust"
for directname, directnames, files in os.walk(customer_folders_path):
    for fn in files:
        if '.' in fn:
            fullname = os.path.join(directname, fn)
            os.rename(fullname, os.path.splitext(fullname)[0])