Evening folks.
I'm trying to iterate through the contents of a folder, and use the contents of each file in the folder.
More specifically, I have a folder of JSON files that contain CIDRs. I need to iterate through the files, read each one, compare its CIDRs to an input IP address, then move on to the next file if the IP isn't found in that file's CIDR list.
I've been able to load a single file, parse the JSON, and compare an IP to the CIDRs using the "ipaddress" and "json" modules built into Python. But when I try to iterate through the individual files, I get a "file not found" error.
The real catch is that i'm trying to do this entirely with standard library Python modules, which is throwing me for a loop.
Here's what I've done so far.
This function can read the JSON file if one is loaded specifically:
import json

with open('example.json', 'r') as example_file:
    example_data = json.load(example_file)

print(json.dumps(example_data, indent=4))
print(type(example_data))
print(example_data.keys())
print(example_data['JsonKey'])

individual_item = example_data['JsonKey']
print(individual_item)
And this one will read the file and compare the CIDRs to an input IP address:
import json
from ipaddress import ip_network, ip_address

with open('Example.json', 'r') as example_file:
    example_data = json.load(example_file)

example = example_data['JsonKey']

ip = input("Please provide valid IP: ")

def in_example(ip, cidr):
    return ip_address(ip) in ip_network(cidr)

for data in example:
    if in_example(ip, data):
        print("The IP is in the list as", data)

print("Have a nice day.")
Both of these work. But when I try to iterate through the files with the method below, I get a "file not found" error:
import json
import os

working_directory = '/Desktop/ExampleFolder'

for subdir, dirs, files in os.walk(working_directory):
    for file in files:
        if file.endswith('.json'):
            with open(file, 'r') as example_file:
                example_data = json.load(example_file)
            print(json.dumps(example_data, indent=4))
        else:
            print('Well shit, something broke')

print(type(example_data))
print(example_data.keys())
print(example_data['JsonKey'])

cidrs = example_data['JsonKey']
print(cidrs)
Which prints out:
Traceback (most recent call last):
  File "Desktop/jsonread.py", line 19, in <module>
    with open(file, 'r') as example_file:
FileNotFoundError: [Errno 2] No such file or directory: 'first_file.json'
Would love some feedback and guidance.
Related
I modified the code based on the comments from the experts in this thread. Now the script reads and writes all the individual files: it iterates through them, highlights the search terms, and writes the output. The current issue is that, after highlighting the last instance of the search term, the script removes everything that follows that last instance in the output of each file.
Here is the modified code:
import os
import sys
import re

source = raw_input("Enter the source files path:")
listfiles = os.listdir(source)

for f in listfiles:
    filepath = source + '\\' + f
    infile = open(filepath, 'r+')
    source_content = infile.read()
    color = ('red')
    regex = re.compile(r"(\b be \b)|(\b by \b)|(\b user \b)|(\bmay\b)|(\bmight\b)|(\bwill\b)|(\b's\b)|(\bdon't\b)|(\bdoesn't\b)|(\bwon't\b)|(\bsupport\b)|(\bcan't\b)|(\bkill\b)|(\betc\b)|(\b NA \b)|(\bfollow\b)|(\bhang\b)|(\bbelow\b)", re.I)
    i = 0; output = ""
    for m in regex.finditer(source_content):
        output += "".join([source_content[i:m.start()],
                           "<strong><span style='color:%s'>" % color[0:],
                           source_content[m.start():m.end()],
                           "</span></strong>"])
        i = m.end()
    outfile = open(filepath, 'w+')
    outfile.seek(0)
    outfile.write(output)
    print "\nProcess Completed!\n"
    infile.close()
    outfile.close()

raw_input()
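For what it's worth, a likely cause of that truncation is that output only ever receives text up to the end of the last match; whatever follows the final match is never appended before the file is rewritten. Here is a minimal, self-contained sketch of that kind of fix (the sample text and the two-word regex are made up for illustration):

import re

# Hypothetical sample; in the real script source_content comes from the file read above.
source_content = "Users may ignore this. Users will not. Trailing text that must survive."
regex = re.compile(r"(\bmay\b)|(\bwill\b)", re.I)

i = 0
output = ""
for m in regex.finditer(source_content):
    output += "".join([source_content[i:m.start()],
                       "<strong><span style='color:red'>",
                       source_content[m.start():m.end()],
                       "</span></strong>"])
    i = m.end()

# Without this line, everything after the last match is dropped,
# which matches the truncation described above.
output += source_content[i:]
print(output)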
The error message tells you what the error is:
No such file or directory: 'sample1.html'
Make sure the file exists, or use a try statement to give it a default behavior.
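For example, a sketch of that kind of guard (the filename is just a placeholder; catching IOError also covers FileNotFoundError on Python 3):

try:
    with open('sample1.html', 'r') as infile:
        source_content = infile.read()
except IOError:
    # Fall back to a default instead of crashing when the file is missing.
    source_content = ""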
The reason you get that error is that the Python script doesn't know where the files you want to open are located.
You have to provide the file path to open(), as I have done below. I simply concatenated the source path + '\\' + filename and saved the result in a variable named filepath. Now just pass this variable to open().
import os
import sys

source = raw_input("Enter the source files path:")
listfiles = os.listdir(source)

for f in listfiles:
    filepath = source + '\\' + f   # This is the file path
    infile = open(filepath, 'r')
Also, there are a couple of other problems with your code: if you want to open the file for both reading and writing, you have to use r+ mode. Moreover, on Windows, if you open a file in r+ mode you may have to call file.seek() before file.write() to avoid another issue. You can read the reason for using file.seek() here.
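The same root cause applies to the os.walk example in the first question above: os.walk yields bare filenames, so the path has to be rebuilt from the directory being walked. A minimal sketch, keeping the variable names from that example:

import json
import os

working_directory = '/Desktop/ExampleFolder'   # same placeholder path as above

for subdir, dirs, files in os.walk(working_directory):
    for file in files:
        if file.endswith('.json'):
            # Join the directory currently being walked with the bare filename,
            # so open() receives a path that actually exists.
            full_path = os.path.join(subdir, file)
            with open(full_path, 'r') as example_file:
                example_data = json.load(example_file)
            print(json.dumps(example_data, indent=4))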
I'm trying to inspect my appengine backup files to work out when a data corruption occurred. I used gsutil to locate and download the file:
gsutil ls -l gs://my_backup/ > my_backup.txt
gsutil cp gs://my_backup/LongAlphaString.Mymodel.backup_info file://1.backup_info
I then created a small Python program that attempts to read the file and parse it using the appengine libraries.
#!/usr/bin/python

APPENGINE_PATH = '/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/'
ADDITIONAL_LIBS = [
    'lib/yaml/lib'
]

import sys
sys.path.append(APPENGINE_PATH)
for l in ADDITIONAL_LIBS:
    sys.path.append(APPENGINE_PATH + l)

import logging
from google.appengine.api.files import records
import cStringIO

def parse_backup_info_file(content):
    """Returns entities iterator from a backup_info file content."""
    reader = records.RecordsReader(cStringIO.StringIO(content))
    version = reader.read()
    if version != '1':
        raise IOError('Unsupported version')
    return (datastore.Entity.FromPb(record) for record in reader)

INPUT_FILE_NAME = '1.backup_info'
f = open(INPUT_FILE_NAME, 'rb')
f.seek(0)
content = f.read()
records = parse_backup_info_file(content)
for r in records:
    logging.info(r)
f.close()
The code for parse_backup_info_file was copied from backup_handler.py.
When I run the program, I get the following output:
./view_record.py
Traceback (most recent call last):
  File "./view_record.py", line 30, in <module>
    records = parse_backup_info_file(content)
  File "./view_record.py", line 19, in parse_backup_info_file
    version = reader.read()
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/api/files/records.py", line 335, in read
    (chunk, record_type) = self.__try_read_record()
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/api/files/records.py", line 307, in __try_read_record
    (length, len(data)))
EOFError: Not enough data read. Expected: 24898 but got 2112
I've tried with half a dozen different backup_info files, and they all show the same error (with different numbers).
I had noticed that they all seemed to have the same expected length, but that was only because I was reviewing different versions of the same model when I made that observation; it isn't true when I view the backup files of other modules.
EOFError: Not enough data read. Expected: 24932 but got 911
EOFError: Not enough data read. Expected: 25409 but got 2220
Is there anything obviously wrong with my approach?
I guess the other option is that the appengine backup utility is not creating valid backup files.
Anything else you can suggest would be very welcome.
Thanks in Advance
There are multiple metadata files created when an AppEngine Datastore backup is run:
LongAlphaString.backup_info is created once. This contains metadata about all of the entity types and backup files that were created in datastore backup.
LongAlphaString.[EntityType].backup_info is created once per entity type. This contains metadata about the specific backup files created for [EntityType], along with schema information for the [EntityType].
Your code works for interrogating the file contents of LongAlphaString.backup_info, however it seems that you are trying to interrogate the file contents of LongAlphaString.[EntityType].backup_info. Here's a script that will print the contents in a human-readable format for each file type:
import cStringIO
import os
import sys

sys.path.append('/usr/local/google_appengine')

from google.appengine.api import datastore
from google.appengine.api.files import records
from google.appengine.ext.datastore_admin import backup_pb2

ALL_BACKUP_INFO = 'long_string.backup_info'
ENTITY_KINDS = ['long_string.entity_kind.backup_info']

def parse_backup_info_file(content):
    """Returns entities iterator from a backup_info file content."""
    reader = records.RecordsReader(cStringIO.StringIO(content))
    version = reader.read()
    if version != '1':
        raise IOError('Unsupported version')
    return (datastore.Entity.FromPb(record) for record in reader)

print "*****" + ALL_BACKUP_INFO + "*****"
with open(ALL_BACKUP_INFO, 'r') as myfile:
    parsed = parse_backup_info_file(myfile.read())
    for record in parsed:
        print record

for entity_kind in ENTITY_KINDS:
    print os.linesep + "*****" + entity_kind + "*****"
    with open(entity_kind, 'r') as myfile:
        backup = backup_pb2.Backup()
        backup.ParseFromString(myfile.read())
        print backup
I recently recovered a ton of pictures from a friend's dead hard drive, and I decided to write a program in Python to:
Go through all the files
Check their MD5Sum
Check to see if the MD5Sum exists in a text file
If it does, let me know with "DUPLICATE HAS BEEN FOUND"
If it doesn't, add the MD5Sum to the text file.
The ultimate goal being to delete all duplicates. However, when I run this code, I get the following:
Traceback (most recent call last):
  File "C:\Users\godofgrunts\Documents\hasher.py", line 16, in <module>
    for line in myfile:
io.UnsupportedOperation: not readable
Am I doing this completely wrong or am I just misunderstanding something?
import hashlib
import os
import re

rootDir = 'H:\\recovered'
hasher = hashlib.md5()

with open('md5sums.txt', 'w') as myfile:
    for dirName, subdirList, fileList in os.walk(rootDir):
        for fname in fileList:
            with open((os.path.join(dirName, fname)), 'rb') as pic:
                buf = pic.read()
                hasher.update(buf)
            md5 = str(hasher.hexdigest())
            for line in myfile:
                if re.search("\b{0}\b".format(md5), line):
                    print("DUPLICATE HAS BEEN FOUND")
                else:
                    myfile.write(md5 + '\n')
You have opened your file in writing mode ('w') in your with statement. To open it in both writing and reading mode, do:
with open('md5sums.txt', 'w+') as myfile:
The correct mode is "r+", not "w+".
http://docs.python.org/3.3/tutorial/inputoutput.html#reading-and-writing-files
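As a side note, re-scanning the sums file inside the loop is awkward in any mode; one common alternative (a sketch of a different structure, not the code above) is to load the known hashes into a set up front and append new ones as they are found. The sketch also creates a fresh md5 object per file, since reusing one hasher accumulates every file read so far:

import hashlib
import os

rootDir = 'H:\\recovered'
sums_path = 'md5sums.txt'

# Load any hashes recorded on a previous run (empty set if the file doesn't exist yet).
known = set()
if os.path.exists(sums_path):
    with open(sums_path, 'r') as f:
        known = set(line.strip() for line in f)

with open(sums_path, 'a') as myfile:
    for dirName, subdirList, fileList in os.walk(rootDir):
        for fname in fileList:
            hasher = hashlib.md5()          # new hasher for each file
            with open(os.path.join(dirName, fname), 'rb') as pic:
                hasher.update(pic.read())
            md5 = hasher.hexdigest()
            if md5 in known:
                print("DUPLICATE HAS BEEN FOUND:", fname)
            else:
                known.add(md5)
                myfile.write(md5 + '\n')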
I am trying to send mails via Python using smtplib. My main concern is to get the contents of a csv/excel file and paste the data as-is (tabular format) into the body of the email being sent out. I have the following snippet ready, which searches for the file and prints the contents to the shell. How would I get the same output into a mail body?
from os import listdir
import csv
import os

# Search for a csv in the specified folder
directory = "folder_path"

def find_csv_filenames(path_to_dir, suffix="Data.csv"):
    filenames = listdir(path_to_dir)
    return [filename for filename in filenames if filename.endswith(suffix)]

filenames = find_csv_filenames(directory)
for name in filenames:
    datafile = name
    print(name)
    path = directory + '//' + datafile
    # Read the selected csv
    with open(path, 'r') as csvfile:
        spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
        for row in spamreader:
            print(', '.join(row))
TIA for your help.
Create a StringIO instance, say csvText, and instead of print use:
csvText.write(", ".join(row)+"\n")
The final newline is necessary because, unlike with print, it is not added automatically. Finally (i.e. after the loop), calling csvText.getvalue() will return what you want to mail.
I would also suggest not gluing the file path together yourself but calling os.path.join() instead.
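For the mailing part, here is a minimal sketch, assuming Python 3, the standard-library email and smtplib modules, and a reachable SMTP server; the addresses, server name, and file path are placeholders rather than values from the question:

import csv
import io
import smtplib
from email.message import EmailMessage

path = "folder_path/report_Data.csv"   # placeholder path

# Accumulate the CSV rows into an in-memory text buffer instead of printing them.
csvText = io.StringIO()
with open(path, "r", newline="") as csvfile:
    spamreader = csv.reader(csvfile, delimiter=" ", quotechar="|")
    for row in spamreader:
        csvText.write(", ".join(row) + "\n")

# Use the accumulated text as the plain-text body of the message.
msg = EmailMessage()
msg["Subject"] = "CSV report"
msg["From"] = "sender@example.com"
msg["To"] = "recipient@example.com"
msg.set_content(csvText.getvalue())

with smtplib.SMTP("smtp.example.com") as server:
    server.send_message(msg)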
Just wrote my first Python program! I get zip files as attachments in mail, which are saved in a local folder. The program checks if there is any new file, and if there is one it extracts the zip and, based on the filename, extracts its contents to a different folder. When I run my code I get the following error:
Traceback (most recent call last):
  File "C:/Zip/zipauto.py", line 28, in <module>
    for file in new_files:
TypeError: 'NoneType' object is not iterable
Can anyone please tell me where I am going wrong?
Thanks a lot for your time,
Navin
Here is my code:
import zipfile
import os

ROOT_DIR = 'C://Zip//Zipped//'
destinationPath1 = "C://Zip//Extracted1//"
destinationPath2 = "C://Zip//Extracted2//"

def check_for_new_files(path=ROOT_DIR):
    new_files = []
    for file in os.listdir(path):
        print "New file found ... ", file

def process_file(file):
    sourceZip = zipfile.ZipFile(file, 'r')
    for filename in sourceZip.namelist():
        if filename.startswith("xx") and filename.endswith(".csv"):
            sourceZip.extract(filename, destinationPath1)
        elif filename.startswith("yy") and filename.endswith(".csv"):
            sourceZip.extract(filename, destinationPath2)
    sourceZip.close()

if __name__ == "__main__":
    while True:
        new_files = check_for_new_files(ROOT_DIR)
        for file in new_files:  # fails here
            print "Unzipping files ... ", file
            process_file(ROOT_DIR + "/" + file)
check_for_new_files has no return statement, and therefore implicitly returns None. Therefore,
new_files=check_for_new_files(ROOT_DIR)
sets new_files to None, and you cannot iterate over None.
Return the read files in check_for_new_files:
def check_for_new_files(path=ROOT_DIR):
    new_files = os.listdir(path)
    for file in new_files:
        print "New file found ... ", file
    return new_files
Here is the answer to your NEXT 2 questions:
(1) while True: your code will loop forever.
(2) Your function check_for_new_files doesn't check for new files, it checks for any files. You need to either move each incoming file to an archive directory after it's been processed, or use some kind of timestamp mechanism; a sketch of one option follows below.
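A sketch of one way to address both points; the in-memory seen set and the one-minute polling interval are assumptions for illustration (moving processed files to an archive folder would work just as well):

import os
import time

ROOT_DIR = 'C://Zip//Zipped//'

def check_for_new_files(path, already_seen):
    # Return only the files that have not been handled on a previous pass.
    current = set(os.listdir(path))
    new_files = current - already_seen
    for file in new_files:
        print("New file found ... " + file)
    return new_files

if __name__ == "__main__":
    seen = set()
    while True:
        new_files = check_for_new_files(ROOT_DIR, seen)
        for file in new_files:
            print("Unzipping files ... " + file)
            # process_file(os.path.join(ROOT_DIR, file))  # same helper as in the question
        seen.update(new_files)
        time.sleep(60)   # poll once a minute instead of spinning flat out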
Example: student_grade = dict(zip(names, grades)). Make sure names and grades are lists and that both have at least one item to iterate over. This has helped me.
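For instance, a small illustrative sketch (the names and grades are made up):

names = ["Alice", "Bob", "Cara"]
grades = [90, 85, 72]

# zip() pairs the two lists element by element; dict() turns the pairs into a mapping.
student_grade = dict(zip(names, grades))
print(student_grade)   # {'Alice': 90, 'Bob': 85, 'Cara': 72}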