So I have managed to concatenate every single .txt file of one directory into one file with this code:
import os
import glob

folder_path = "/Users/EnronSpam/enron1/ham"
for filename in glob.glob(os.path.join(folder_path, '*.txt')):
    with open(filename, 'r', encoding="latin-1") as f:
        text = f.read()
    with open('new.txt', 'a') as a:
        a.write(text)
but in my 'EnronSpam' folder there are actually multiple directories (enron1-6), each of which has a ham directory. How can I go through each directory and add every file of each ham directory into one file?
If you just want to collect all the txt files from the enron[1-6]/ham folders, try this:
glob.glob("/Users/EnronSpam/enron[1-6]/ham/*.txt")
It will pick up all txt files from the enron[1-6] folders' ham subfolders.
Also a slightly reworked snippet of the original code looks like this:
import glob

glob_path = "/Users/EnronSpam/enron[1-6]/ham/*.txt"
with open("new.txt", "w") as a:
    for filename in glob.glob(glob_path):
        with open(filename, "r", encoding="latin-1") as f:
            a.write(f.read())
Instead of repeatedly opening and appending to the output file, it makes more sense to open it once at the beginning and write the contents of the ham txt files as you go.
So, given that the count and the names of the directories are known, you can just put the full paths in a list and run the original loop for each element:
import os
import glob

folder_list = ["/Users/EnronSpam/enron1/ham", "/Users/EnronSpam/enron2/ham", "/Users/EnronSpam/enron3/ham"]
for folder in folder_list:
    for filename in glob.glob(os.path.join(folder, '*.txt')):
        with open(filename, 'r', encoding="latin-1") as f:
            text = f.read()
        with open('new.txt', 'a') as a:
            a.write(text)
I have a folder with .txt files. The names of the files are:
my_file1.txt
my_file2.txt
my_file3.txt
my_file4.txt
Only the number at the end is different.
import pickle

my_list = []
with open("/Users/users_a/Desktop/website-basic/sub_domain/sub_domain01.txt", "rb") as f1, \
        open("/Users/users_a/Desktop/website-basic/sub_domain/sub_domain02.txt", "rb") as f2, \
        open("/Users/users_a/Desktop/website-basic/sub_domain/sub_domain03.txt", "rb") as f3:
    my_list.append(pickle.load(f1))
    my_list.append(pickle.load(f2))
    my_list.append(pickle.load(f3))

print(my_list)
This way I load each file and put it into the my_list variable so I can work with the list. As the number of files grows, the code becomes too long and cumbersome.
Is there an easier, more Pythonic way to load only the desired txt files?
You can use os.listdir():
import os
import pickle

my_list = []
path = "/Users/users_a/Desktop/website-basic/sub_domain"
for file in os.listdir(path):
    if file.endswith(".txt"):
        # pickled data must be opened in binary mode
        with open(f"{path}/{file}", "rb") as f:
            my_list.append(pickle.load(f))
Here file is the filename of each entry in path.
I suggest using os.path.join() instead of hard-coding the file paths, as sketched below.
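A minimal sketch of the same loop with os.path.join(), assuming the same path and pickled .txt files as above:

import os
import pickle

my_list = []
path = "/Users/users_a/Desktop/website-basic/sub_domain"
for file in os.listdir(path):
    if file.endswith(".txt"):
        # os.path.join builds the path portably instead of f"{path}/{file}"
        with open(os.path.join(path, file), "rb") as f:
            my_list.append(pickle.load(f))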
If your folder only contains the files you want to load you can just use:
for file in os.listdir(path):
    with open(f"{path}/{file}", "rb") as f:
        my_list.append(pickle.load(f))
Edit for my_file[number].txt
If you only want files in the form of my_file[number].txt, use:
import os
import re
import pickle

my_list = []
path = "/Users/users_a/Desktop/website-basic/sub_domain"
for file in os.listdir(path):
    # the dot is escaped so it matches a literal "." rather than any character
    if re.match(r"my_file\d+\.txt", file):
        with open(f"{path}/{file}", "rb") as f:
            my_list.append(pickle.load(f))
Online regex demo https://regex101.com/r/XJb2DF/1
I would like to read all the contents from all the text files in a directory. I have 4 text files in the "path" directory, and my code is:
for filename in os.listdir(path):
    filepath = os.path.join(path, filename)
    with open(filepath, mode='r') as f:
        content = f.read()
        thelist = content.splitlines()
        f.close()
    print(filepath)
    print(content)
    print()
When I run the code, I can only read the contents of one text file.
I would be thankful for any advice or suggestions, or for pointers to other informative questions about this on Stack Overflow.
If you need to filter the file names by suffix, i.e. by file extension, you can use either the string method endswith or the glob module of the standard library: https://docs.python.org/3/library/glob.html
Here is an example that saves each file's content as a string in a list.
import os

path = '.'  # or your path
files_content = []
for filename in filter(lambda p: p.endswith("txt"), os.listdir(path)):
    filepath = os.path.join(path, filename)
    with open(filepath, mode='r') as f:
        files_content += [f.read()]
Here is an example with the glob approach:
import glob

for filename in glob.glob('*txt'):
    print(filename)
This should list your files, and you can read them one by one. All the lines of the files are stored in the all_lines list. If you wish to store the raw content as well, you can keep appending it too:
from pathlib import Path
from os import listdir
from os.path import isfile, join

path = "path_to_dir"
only_files = [f for f in listdir(path) if isfile(join(path, f))]
all_lines = []
for file_name in only_files:
    file_path = Path(path) / file_name
    with open(file_path, 'r') as f:
        file_content = f.read()
        all_lines.append(file_content.splitlines())
        print(file_content)
# use all_lines
Note: when using with, you do not need to call close() explicitly.
Reference: How do I list all files of a directory?
Basically, if you want to read all the files, you need to save their contents somehow. In your example, you are overwriting thelist with content.splitlines(), which discards everything already in it.
Instead, you should define thelist outside of the loop and call thelist.append(content.splitlines()) each time, which adds the file's lines to the list on each iteration; a minimal sketch follows.
Then you can iterate over thelist later and get the data out.
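For illustration, a minimal sketch of that fix, assuming the same path variable and that the directory contains only text files, as in the question:

import os

path = "."  # your directory
thelist = []  # defined once, outside the loop
for filename in os.listdir(path):
    filepath = os.path.join(path, filename)
    with open(filepath, mode='r') as f:
        content = f.read()
    thelist.append(content.splitlines())  # append instead of overwriting
    print(filepath)
    print(content)
    print()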
I have the Python code below in which I am attempting to access a folder called downloaded that contains multiple JSON object files.
Within each JSON there is a value keyword which I need to extract and add to the list named keywordList.
I've attempted this by adding the filenames to fileList (which works OK), but I cannot seem to loop through fileList and extract the keyword from each file.
Any help much appreciated, thanks!
import os

os.chdir('/Users/Me/Api/downloaded')
fileList = []
keywordList = []
for filenames in os.walk('/Users/Me/Api/downloaded'):
    fileList.append(filenames)
    for file in filenames:
        with open(file, encoding='utf-8', mode='r') as currentFile:
            keywordList.append(currentFile['keyword'])
print(keywordList)
Your question mentioned JSON, so I have addressed that. Let me know if this helps.
import json
import os
import glob
from pprint import pprint

keywordList = []
path = '/Users/Me/Api/downloaded'
for filename in glob.glob(os.path.join(path, '*.json')):  # only process .json files in the folder
    with open(filename, encoding='utf-8', mode='r') as currentFile:
        data = currentFile.read().replace('\n', '')
        keyword = json.loads(data)["keytolookup"]
        if keyword not in keywordList:
            keywordList.append(keyword)
pprint(keywordList)
EDIT note: updated the answer, changing the for loop from the original response's for filename in os.listdir(path).
OP mentioned the glob version worked better; I had given that as an alternative too.
You are adding the filenames to the fileList array, but in the second for loop you are iterating over filenames instead of fileList.
import os

os.chdir('/Users/Me/Api/downloaded')
fileList = []
keywordList = []
for filenames in os.walk('/Users/Me/Api/downloaded'):
    fileList.append(filenames)
for file in fileList:
    with open(file, encoding='utf-8', mode='r') as currentFile:
        keywordList.append(currentFile['keyword'])
Shouldn't the line for file in filenames: be for file in fileList:?
Also, I think this is the correct way to use os.walk():
import os

fileList = []
keywordList = []
for root, dirs, files in os.walk('/Users/Me/Api/downloaded', topdown=False):
    for name in files:
        fileList.append(os.path.join(root, name))
for file in fileList:
    with open(file, encoding='utf-8', mode='r') as currentFile:
        keywordList.append(currentFile['keyword'])
print(keywordList)
open() returns a file handle to the open file; it does not give you the contents directly. You still need to loop over the contents of the file. By default, iterating a file handle splits the contents at line ends (\n). After that, you have to match the keyword against each line.
Replace the second for loop with:
for file in filenames:
    with open(file, encoding='utf-8', mode='r') as currentFile:
        for line in currentFile:
            if 'keyword' in line:
                keywordList.append('keyword')
Also, have a look at the Python JSON module. Recursive iteration over JSON/dicts is answered here; a sketch of one such recursive lookup is below.
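For illustration, a minimal sketch of such a recursive lookup; find_key is a hypothetical helper, not part of the json module:

import json

def find_key(obj, key):
    # recursively search a parsed JSON structure for the first value under key
    if isinstance(obj, dict):
        if key in obj:
            return obj[key]
        for value in obj.values():
            found = find_key(value, key)
            if found is not None:
                return found
    elif isinstance(obj, list):
        for item in obj:
            found = find_key(item, key)
            if found is not None:
                return found
    return None

data = json.loads('{"meta": {"keyword": "example"}}')
print(find_key(data, 'keyword'))  # -> example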
You are using currentFile as if it were a json object, but it is only a file handle. I have added the missing step: parsing the file into a json object.
import os
import json

os.chdir('/Users/Me/Api/downloaded')
fileList = []
keywordList = []
for filenames in os.walk('/Users/Me/Api/downloaded'):
    fileList.append(filenames)
    for file in filenames:
        with open(file, encoding='utf-8', mode='r') as currentFile:
            data = json.load(currentFile)  # parses the file into a json object
            keywordList.append(data['keyword'])
print(keywordList)
I want to find a string, e.g. "Version1", in the files of a folder which contains multiple ".c" and ".h" files, and replace it with "Version2.2.1" using a Python script.
Does anyone know how this can be done?
Here's a solution using os, glob and ntpath. The results are saved in a directory called "output". You need to put this script in the directory where you have the .c and .h files and run it.
It creates a separate directory called output and puts the edited files there:
import glob
import ntpath
import os

output_dir = "output"
if not os.path.exists(output_dir):
    os.makedirs(output_dir)

for f in glob.glob("*.[ch]"):
    with open(f, 'r') as inputfile:
        with open('%s/%s' % (output_dir, ntpath.basename(f)), 'w') as outputfile:
            for line in inputfile:
                outputfile.write(line.replace('Version1', 'Version2.2.1'))
Replace strings in place:
IMPORTANT! Please make sure to back up your files before running this:
import glob

for f in glob.glob("*.[ch]"):
    with open(f, "r") as inputfile:
        newText = inputfile.read().replace('Version1', 'Version2.2.1')
    with open(f, "w") as outputfile:
        outputfile.write(newText)
I want to read 50 CSV files and write the results to a single CSV file. My current code reads only a single CSV file, 1.csv, and writes the output to out.csv. How can I tweak this code? Please help.
import csv

f1 = open("1.csv", "rb")
reader = csv.reader(f1)
header = reader.next()

f2 = open("out.csv", "wb")
writer = csv.writer(f2)
writer.writerow(header)

for row in reader:
    if row[8] == 'READ' and row[10] != '0000':
        writer.writerow(row)

f1.close()
f2.close()
Try using glob to loop through the files, read each one, and append the matching lines to a list; a sketch of writing that list to the new file follows the snippet below.
To search each file for lines with an identifier, use re:
import glob
import re

out = []
for fil in glob.glob("path/to/files/*.csv"):
    for line in open(fil, 'r'):
        if re.search('READ', line):
            out.append(line)
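To finish the job described above, a minimal sketch that writes the collected lines to a single output file; the out.csv name is an assumption taken from the question, and out holds complete lines that still end in \n, as collected by the loop:

# write the collected lines to one output file
with open("out.csv", "w") as f_out:
    f_out.writelines(out)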
You're going to need to look into something like os in Python and walk through your directories of CSV files, working each one individually. Instead of writing out to a new file each time a file is processed, you append to the single output file that you are writing to.
This example will walk through all directories under the root, finding all files in all directories. This could be overkill if you have all your files in one directory; a sketch of the append approach follows the listing below.
import os

# traverse the root directory, listing directories as dirs and files as files
for root, dirs, files in os.walk("."):
    path = root.split(os.sep)
    print((len(path) - 1) * '---', os.path.basename(root))
    for file in files:
        print(len(path) * '---', file)
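For illustration, a minimal Python 3 sketch of that append approach; the out.csv name, the .csv extension filter, and the row[8]/row[10] conditions are assumptions carried over from the question's code:

import csv
import os

header_written = False
with open("out.csv", "w", newline="") as f_out:
    writer = csv.writer(f_out)
    # walk the whole tree and process every CSV found
    for root, dirs, files in os.walk("."):
        for name in files:
            # skip non-CSV files and the output file itself
            if not name.endswith(".csv") or name == "out.csv":
                continue
            with open(os.path.join(root, name), newline="") as f_in:
                reader = csv.reader(f_in)
                header = next(reader)  # assumes each file has a header row
                if not header_written:  # write the header only once
                    writer.writerow(header)
                    header_written = True
                for row in reader:
                    if row[8] == 'READ' and row[10] != '0000':
                        writer.writerow(row)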