File searching in python - python

I want a function that takes a string argument (data) and breaks that string into words (word). Afterwards, it should take all the files in a directory, get each file name and check if all the words are present in the file name or not.
If present then print the name of the file and print "do you want to open it " if yes then print "opened" and break all the loops. If no then it should continue searching.
At the end, it should print whether the file is present or not in the directory.
Here's the code I wrote.
def file_search(data):
    data = data.split()
    for root, dirs, files in os.walk("/media/", topdown=False):
        word_match = True
        opened = False
        if not opened:
            for name in files:
                for word in data:
                    if word not in name:
                        word_match = False
                if word_match:
                    print "file found:" + name + "where path is" + root
                    print "do you want to open it "
                    answer = raw_input()
                    if answer == "yes" :
                        opened = True
                        print "file opened"
                        break

Somehow I fixed it.
import os
def file_search2(name, name_words):
    check = True
    for word in name_words:
        if word not in name:
            check = False
            break
    return check
def file_search(filename):
    file_found = False
    file_opened = False
    words = filename.split()
    for root, dirs, files in os.walk('/media/', topdown=False):
        for name in files:
            if file_search2(name, words) and file_opened == False:
                file_found = True
                print "file found :" + name
                print "Do you want to open the file:"
                answer = raw_input()
                if "yes" in answer:
                    file_opened = True
                    print "file opened successfully"
    # Report "not found" only when no file name matched at all
    if file_found == False:
        print "file not found"
file_search("file name with space")
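For comparison, here is a minimal Python 3 sketch of the same idea. The word check collapses into all(), and the directory walk becomes a generator so the caller decides when to stop. The names name_matches and find_files are illustrative and not from the post above, and the "do you want to open it" prompt is left out.

```python
import os


def name_matches(name, words):
    """Return True when every word occurs somewhere in the file name."""
    return all(word in name for word in words)


def find_files(query, top):
    """Yield the full path of each file under `top` whose name contains every word of `query`."""
    words = query.split()
    for root, dirs, files in os.walk(top):
        for name in files:
            if name_matches(name, words):
                yield os.path.join(root, name)
```

Calling next(find_files("file name", "/media/"), None) gives the first match, or None when nothing matched, which takes the place of the file_found flag.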

Related

Opening all text files & getting a string in python [duplicate]

I want to check if a string is in a text file. If it is, do X. If it's not, do Y. However, this code always returns True for some reason. Can anyone see what is wrong?
def check():
    datafile = file('example.txt')
    found = False
    for line in datafile:
        if blabla in line:
            found = True
            break
check()
if True:
    print "true"
else:
    print "false"
The reason why you always got True has already been given, so I'll just offer another suggestion:
If your file is not too large, you can read it into a string, and just use that (easier and often faster than reading and checking line per line):
with open('example.txt') as f:
    if 'blabla' in f.read():
        print("true")
Another trick: you can alleviate the possible memory problems by using mmap.mmap() to create a "string-like" object that uses the underlying file (instead of reading the whole file in memory):
import mmap
with open('example.txt') as f:
    s = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    if s.find('blabla') != -1:
        print('true')
NOTE: in python 3, mmaps behave like bytearray objects rather than strings, so the subsequence you look for with find() has to be a bytes object rather than a string as well, eg. s.find(b'blabla'):
#!/usr/bin/env python3
import mmap
with open('example.txt', 'rb', 0) as file, \
     mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ) as s:
    if s.find(b'blabla') != -1:
        print('true')
You could also use regular expressions on mmap e.g., case-insensitive search: if re.search(br'(?i)blabla', s):
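As a rough Python 3 sketch of the regular-expression variant, the search can be wrapped in a function. The name contains_ignore_case is made up for this example, and the needle is assumed to be a bytes object because the file is mapped in binary mode.

```python
import mmap
import os
import re


def contains_ignore_case(path, needle):
    """Return True if the bytes `needle` occur in the file, ignoring ASCII case."""
    # mmap cannot map an empty file, so handle that case first.
    if os.path.getsize(path) == 0:
        return False
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as s:
        # re.escape keeps characters such as "." in the needle literal.
        return re.search(re.escape(needle), s, re.IGNORECASE) is not None
```

With this, contains_ignore_case('example.txt', b'blabla') would also match "BlaBla" in the file.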
As Jeffrey said, you are not checking the value of check(). In addition, your check() function is not returning anything. Note the difference:
def check():
    with open('example.txt') as f:
        datafile = f.readlines()
    found = False # This isn't really necessary
    for line in datafile:
        if blabla in line:
            # found = True # Not necessary
            return True
    return False # Because you finished the search without finding
Then you can test the output of check():
if check():
    print('True')
else:
    print('False')
Here's another way to answer your question: the find() method returns the index at which the text first occurs, or -1 if the text is not in the file.
open('file', 'r').read().find('')
Pass the word you want to find to find(); 'file' stands for your file name.
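A small sketch of how the return value of find() behaves, using an in-memory string instead of a file. The helper name position_of is only for illustration.

```python
def position_of(text, word):
    """Return the index of the first occurrence of `word` in `text`, or -1."""
    return text.find(word)


sample = "first line\nblabla appears here\n"
print(position_of(sample, "blabla"))   # 11
print(position_of(sample, "missing"))  # -1
print(position_of(sample, "first"))    # 0
```

Because a match at the very start gives 0, which is falsy, the result should be compared with -1 rather than used directly in an if statement.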
if True:
    print "true"
This always happens because True is always True.
You want something like this:
if check():
    print "true"
else:
    print "false"
Good luck!
I made a little function for this purpose. It searches for a word in the input file and then adds it to the output file.
def searcher(outf, inf, string):
    # Open both files in the with statement so that both are closed afterwards
    with open(outf, 'a') as f1, open(inf) as f2:
        if string in f2.read():
            f1.write(string)
outf is the output file
inf is the input file
string is of course, the desired string that you wish to find and add to outf.
Your check function should return the found boolean and use that to determine what to print.
def check():
    datafile = file('example.txt')
    found = False
    for line in datafile:
        if blabla in line:
            found = True
            break
    return found
found = check()
if found:
    print "true"
else:
    print "false"
The second block could also be condensed to:
if check():
    print "true"
else:
    print "false"
Two problems:
Your function does not return anything; a function that does not explicitly return anything returns None (which is falsy).
True is always True - you are not checking the result of your function.
def check(fname, txt):
    with open(fname) as dataf:
        return any(txt in line for line in dataf)
if check('example.txt', 'blabla'):
    print "true"
else:
    print "false"
How to search for text in files and return the path of each file in which it is found:
import os
import re
class Searcher:
    def __init__(self, path, query):
        self.path = path
        if self.path[-1] != '/':
            self.path += '/'
        self.path = self.path.replace('/', '\\')
        self.query = query
        self.searched = {}
    def find(self):
        for root, dirs, files in os.walk( self.path ):
            for file in files:
                if re.match(r'.*?\.txt$', file) is not None:
                    if root[-1] != '\\':
                        root += '\\'
                    f = open(root + file, 'rt')
                    txt = f.read()
                    f.close()
                    count = len( re.findall( self.query, txt ) )
                    if count > 0:
                        self.searched[root + file] = count
    def getResults(self):
        return self.searched
In the main script:
# -*- coding: UTF-8 -*-
import sys
from search import Searcher
path = 'c:\\temp\\'
search = 'search string'
if __name__ == '__main__':
    if len(sys.argv) == 3:
        # create the searcher object and pass it the arguments
        Search = Searcher(sys.argv[1], sys.argv[2])
    else:
        Search = Searcher(path, search)
    # start the search
    Search.find()
    # get the results
    results = Search.getResults()
    # print the results
    print 'Found ', len(results), ' files:'
    for file, count in results.items():
        print 'File: ', file, ' Found entries:' , count
If the user wants to search for a word in a given text file:
x = input("Enter the search string: ")
# Reading only, so the default mode is enough; with closes the file afterwards
with open('logfile.txt') as fopen:
    fread = fopen.readlines()
for line in fread:
    if x in line:
        print(line)
def check():
    # found is set inside the function so that it is always defined when returned
    found = False
    datafile = file('example.txt')
    for line in datafile:
        if blabla in line:
            found = True
            break
    return found
if check():
    print "true"
else:
    print "false"
def check():
    # found is set inside the function so that it is always defined when returned
    found = False
    datafile = file('example.txt')
    for line in datafile:
        if "blabla" in line:
            found = True
            break
    return found
if check():
    print "found"
else:
    print "not found"
Here's another. word_find() takes an absolute file path and a search string, reads the file with readlines(), and wraps the result in enumerate() to number the lines starting from 1. It prints every line that contains the string together with its line number, and reports once at the end if nothing matched. Cheers.
def word_find(file, word):
    found = False
    with open(file, 'r') as target_file:
        for num, line in enumerate(target_file.readlines(), 1):
            if str(word) in line:
                found = True
                print(f'<Line {num}> {line}')
    # Report a miss once, after the whole file has been read
    if not found:
        print(f'> {word} not found.')
if __name__ == '__main__':
    file_to_process = '/path/to/file'
    string_to_find = input()
    word_find(file_to_process, string_to_find)
"found" needs to be created as global variable in the function as "if else" statement is out of the function. You also don't need to use "break" to break the loop code.
The following should work to find out if the text file has desired string.
with open('text_text.txt') as f:
    datafile = f.readlines()
def check():
    global found
    found = False
    for line in datafile:
        if 'the' in line:
            found = True
check()
if found == True:
    print("True")
else:
    print("False")

odd while loop behavior (python)

I have this code:
import os
class CleanUp:
    def __init__(self,directory):
        self.directory = directory
    def del_items(self,*file_extensions):
        """deletes specified file extensions in specified directory"""
        removed_files = [file for file in os.listdir(self.directory) for ext in file_extensions if ext in file]
        for index ,file in enumerate(removed_files):
            print(str(index + 1) + ": " + file + "\n")
        confirm_delete = input("are you sure you want to delete all {0} files? y|n ".format(len(removed_files)))
        while confirm_delete.lower() not in ("y","n"):  # <--------- this while loop
            confirm_delete = input("are you sure you want to delete all {0} files? y|n ".format(len(removed_files)))
        if confirm_delete.lower() == "y":
            for file in removed_files:
                try:
                    os.remove(os.path.join(self.directory,file))
                except:
                    pass
            print("successfully deleted {0} files".format(len(removed_files)))
        else:
            print("deletion cancelled goodbye")
            pass
directory = input("please enter a directory ")
while not os.path.exists(directory):
    print("{0} is not a valid directory \n".format(directory))
    directory = input("please enter a directory ")
file_extensions = input("please put in file extensions of files that need deleting. separate them by one space ")
file_extensions = file_extensions.split()
desktop = CleanUp(directory)
deleted_files = desktop.del_items(*file_extensions)
This line works
while confirm_delete.lower() not in ("y","n"):
however, when I try to do
while confirm_delete.lower() != "y" or confirm_delete.lower() != "n":
the while loop never passes.
I'm sure it has something to do with the or but
why doesn't it work when done like that?
Because that condition will always be true; there is no string value which is both "y" and "n" at the same time. Use and instead.
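To see the difference, here is a short sketch that evaluates both conditions for a few sample answers. The function names are invented for the demonstration.

```python
def keeps_looping_or(answer):
    """The condition from the question: true for every possible answer."""
    return answer != "y" or answer != "n"


def keeps_looping_and(answer):
    """The corrected condition: true only for answers other than y and n."""
    return answer != "y" and answer != "n"


for answer in ("y", "n", "maybe"):
    # The last column is the original, working test for comparison.
    print(answer, keeps_looping_or(answer), keeps_looping_and(answer),
          answer not in ("y", "n"))
```

The or version is True for all three answers, so the loop can never end, while the and version agrees with not in ("y", "n") on every row.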

How to write "No file ends with" a user-defined extension

I was wondering if there was a way to tell the user that no file in a directory they specified has the file extension they are looking for. The only way I could think of uses an if/else, but would be tripped up if any other file extension exists in the directory. I was able to find something but it was bash: Listing files in a directory that do not end with vXXX and not exactly what I was looking for.
Here is an example of a directory:
out-30000000.txt.processed
out-31000000.txt.processed
out-32000000.txt.processed
out-33000000.txt.processed
out-34000000.txt.processed
nope.csv
If I use the following code:
def folder_location():
    location = raw_input("What is the folder containing the data you like processed located? ")
    #location = "C:/Code/Samples/Dates/2015-06-07/Large-Scale Data Parsing/Data Files"
    if os.path.exists(location) == True: #Tests to see if user entered a valid path
        print "You entered:",location
        if raw_input("Is this correct? Use 'Y' or 'N' to answer. ") == "Y":
            print ""
            file_extension(location)
        else:
            folder_location()
    else:
        print "I'm sorry, but the file location you have entered does not exist. Please try again."
        folder_location()
def file_extension(location):
    file_extension = raw_input("What is the file type (.txt for example)? ")
    print "You entered:", file_extension
    if raw_input("Is this correct? Use 'Y' or 'N' to answer. ") == "Y":
        print ""
        each_file(location, file_extension)
    else:
        file_extension(location)
def each_file(location, file_extension):
    try:
        column = (raw_input("Please enter column name you want to analyze: ")) #Using smcn
        print "You entered:",column
        if raw_input("Is this correct? Use 'Y' and 'N' to answer. ") == "Y":
            print ""
            sort_by(location,file_extension,column)
        else:
            each_file(location,file_extension)
    except TypeError:
        print "That is not a valid column name. Please try again."
        each_file(location,file_extension)
def sort_by(location, file_extension, column):
    content = os.listdir(location)
    for item in content:
        if item.endswith(file_extension):
            data = csv.reader(open(os.path.join(location,item)),delimiter=',')
            col_position = get_columnposition(data.next(),column)
            to_count = sorted(data, key=operator.itemgetter(col_position))
            count_date(to_count, location)
        else:
            print "No file in this directory ends with " + file_extension
I get:
No file in this directory ends with .processed
and then the rest of my output (code not posted here).
Is there a way for me to say (I'm going to put it in a code block just to show how it works in my mind):
def file_extension(location):
    file_extension = raw_input("What is the file type (.txt for example)? ")
    print "You entered:", file_extension
    if raw_input("Is this correct? Use 'Y' or 'N' to answer. ") == "Y":
        print ""
        each_file(location, file_extension)
    else:
        file_extension(location)
def each_file(location, file_extension):
    try:
        column = (raw_input("Please enter column name you want to analyze: ")) #Using smcn
        print "You entered:",column
        if raw_input("Is this correct? Use 'Y' and 'N' to answer. ") == "Y":
            print ""
            sort_by(location,file_extension,column)
        else:
            each_file(location,file_extension)
    except TypeError:
        print "That is not a valid column name. Please try again."
        each_file(location,file_extension)
def sort_by(location, file_extension, column):
    content = os.listdir(location)
    for item in content:
        if item.endswith(file_extension):
            data = csv.reader(open(os.path.join(location,item)),delimiter=',')
            col_position = get_columnposition(data.next(),column)
            to_count = sorted(data, key=operator.itemgetter(col_position))
            count_date(to_count, location)
    if no item.endswith(file_extension):
        print "No file in this directory ends with " + file_extension
Any help would be greatly appreciated. If it would help, I could edit in the rest of my code I have at the moment.
Thanks!
Your logic should be the following:
Ask for the directory.
Ask for the extension.
Check if any file ends with that extension.
If there is at least one file, then ask for the column.
To make all this easier, use csv and glob:
import glob
import csv
import os
directory = input('Please enter the directory: ')
extension = input('Please enter the extension (.txt, .csv): ')
# Prefix the extension with * so the pattern matches any file name ending in it
files = glob.glob(os.path.join(directory, '*' + extension))
if not files:
    print('Sorry, no files match your extension {} in the directory {}'.
          format(extension, directory))
else:
    for file_name in files:
        # input() returns a string, so convert it before using it as an index
        col = int(input('Enter the column number for {}: '.format(file_name)))
        with open(file_name, 'r') as thefile:
            reader = csv.reader(thefile, delimiter=',')
            for row in reader:
                try:
                    # do_something is a placeholder for your own processing
                    do_something(row[col])
                except IndexError:
                    print('Column {} does not exist'.format(col))
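If you would rather keep os.listdir and endswith from the question, the key change is to collect the matches first and print the message once, after the loop, instead of once per non-matching file. A minimal sketch, with files_with_extension and report as made-up helper names:

```python
import os


def files_with_extension(directory, extension):
    """Return the sorted names of regular files in `directory` ending with `extension`."""
    return sorted(
        name for name in os.listdir(directory)
        if name.endswith(extension)
        and os.path.isfile(os.path.join(directory, name))
    )


def report(directory, extension):
    """Build the message once, after every name has been checked."""
    matches = files_with_extension(directory, extension)
    if not matches:
        return "No file in this directory ends with " + extension
    return "Found {} file(s) ending with {}".format(len(matches), extension)
```

For the example directory in the question, report(location, ".processed") would count five files, and other files such as nope.csv no longer trigger the "No file" message.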

Hidden Dictionaries Python

I am making a simple Text Based File System. Anyhow I am having trouble when printing out all three parts to my File System. In the File System there are three parts, the Name, Date, and Text. The Name is the file's name, the Date is the date the file was written on, and the Text is the file's contents. Now when I am appending the Name, Date, and Text to the files dictionary I can not get the Text to print out. Below is the code I am using to append the three variables to the dictionary.
files[filename] = {filedate:filetext}
Then I am using the following code to print out each of the values. (Only the Name and Date will print out)
for filename in files:
    print "--------------------------------------------"
    print "File Name: " + str(filename)
    for filedate in files[filename]:
        print "File Date: " + str(filedate)
        for filetext in files.values():
            print "File Contents: " + str(filetext)
I am not sure why it won't work correctly. Below is my full code so far.
import datetime
import time
files = {}
# g = open('files.txt', 'r')
# g.read(str(files))
# g.close()
def startup():
    print "\n ------------------- "
    print " FILE SYSTEM MANAGER "
    print " ------------------- "
    print "\n What would you like to do with your files?"
    print " To make a new file type in: NEW"
    print " To edit a current file type in: EDIT"
    print " To delete a current file type in: DELETE"
    print " To view all current files type in: ALL"
    print " To search a specific file type in: SEARCH"
    chooser = raw_input("\n Please enter NEW, EDIT, DELETE, ALL, or SEARCH: ")
    if chooser.lower() == "new":
        newfile()
    elif chooser.lower() == "edit":
        editfiles()
    elif chooser.lower() == "delete":
        deletefiles()
    elif chooser.lower() == "all":
        allfiles()
    elif chooser.lower() == "search":
        searchfiles()
    else:
        startup()
#-- New File -------------------------------
def newfile():
    filename = ""
    filetext = ""
    while filename == "":
        print "--------------------------------------------"
        filename = raw_input("\n Please input your new files name: ")
    while filetext == "":
        filetext = raw_input("\n Please input the text for your new file: ")
    filedate = datetime.date.today()
    files[filename] = {filedate:filetext}
    # f = open ('files.txt', 'w')
    # f.write(str(files))
    # f.close()
    print "\n File Added"
    print "\n--------------------------------------------"
    print "\n ------------------- "
    print " FILE SYSTEM MANAGER "
    print " ------------------- "
    print "\n What would you like to do with your files?"
    print " To make a new file type in: NEW"
    print " To edit a current file type in: EDIT"
    print " To delete a current file type in: DELETE"
    print " To view all current files type in: ALL"
    print " To search a specific file type in: SEARCH"
    chooser = raw_input("\n Please enter NEW, EDIT, DELETE, ALL, or SEARCH: ")
    if chooser.lower() == "new":
        newfile()
    elif chooser.lower() == "edit":
        editfiles()
    elif chooser.lower() == "delete":
        deletefiles()
    elif chooser.lower() == "all":
        allfiles()
    elif chooser.lower() == "search":
        searchfiles()
    else:
        startup()
def editfiles():
    pass
def deletefiles():
    pass
def allfiles():
    for filename in files:
        print "--------------------------------------------"
        print "File Name: " + str(filename)
        for filedate in files[filename]:
            print "File Date: " + str(filedate)
            for filetext in files.values():
                print "File Contents: " + str(filetext)
def searchfiles():
    pass
startup()
P.S. If you're feeling extra nice, I am trying to get the file writing to work correctly. The parts where I have tried to write it to a file are commented out. I am not exactly sure how to write to a file, but I gave it a shot. I am writing to the files.txt file, and I want it to save the file dictionary, and open it every time the program is closed.
Besides that the structure you use is VERY weird, you should replace
for filetext in files.values():
with
for filetext in files[filename].values():
I would rather use namedtuple to represent file record though.
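A sketch of what the namedtuple approach could look like, written for Python 3. The record name FileRecord, its field names, and the sample entry are assumptions made for the example.

```python
import datetime
from collections import namedtuple

# One record per file: the date and the text are fields instead of a nested dict.
FileRecord = namedtuple("FileRecord", ["date", "text"])

files = {}
files["notes"] = FileRecord(date=datetime.date(2015, 6, 7), text="hello world")


def describe(files):
    """Return the printable lines for every stored file."""
    lines = []
    for filename, record in files.items():
        lines.append("File Name: " + filename)
        lines.append("File Date: " + str(record.date))
        lines.append("File Contents: " + record.text)
    return lines


print("\n".join(describe(files)))
```

Each file then needs only one loop, because the date and the text are read as record.date and record.text rather than by iterating over an inner dictionary.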
Part of your problem is that files.values() enumerates the outer dict values (where you were placing dicts representing individual files), not the text for a given file. I am perplexed because it should have printed the entire dict - I can't explain why it didn't print anything. Nominally, you could fix it with for filetext in files[filename].values(), but you can also get key, value in one fell swoop and reduce the number of lookups:
for filename, content in files.iteritems():
    print "--------------------------------------------"
    print "File Name: " + str(filename)
    for filedate, text in content.iteritems():
        print "File Date: " + str(filedate)
        print "File Contents: " + str(text)
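On the P.S. about saving the dictionary to files.txt: one possible approach, sketched here for Python 3, is the json module. This assumes each entry is stored as {"date": "2015-06-07", "text": "..."} with the date kept as an ISO string, since a datetime.date object cannot be used directly in JSON. The function names save_files and load_files are invented for the example.

```python
import json
import os


def save_files(files, path):
    """Write the dictionary to `path` as JSON."""
    with open(path, "w") as f:
        json.dump(files, f)


def load_files(path):
    """Return the saved dictionary, or an empty one if nothing was saved yet."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)
```

The program would call load_files('files.txt') once at startup and save_files(files, 'files.txt') after each change.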

Python Nested Dictionaries

I have this file system that I am trying to get to work. My problem so far is when printing out all of the files. I can get the name to print out, but then I do not know how to access the date and text.
My Full Code
import datetime
import time
files = {}
# g = open('files.txt', 'r')
# g.read(str(files))
# g.close()
def startup():
    print "\n ------------------- "
    print " FILE SYSTEM MANAGER "
    print " ------------------- "
    print "\n What would you like to do with your files?"
    print " To make a new file type in: NEW"
    print " To edit a current file type in: EDIT"
    print " To view all current files type in: ALL"
    print " To search a specific file type in: SEARCH"
    chooser = raw_input("\n Please enter NEW, EDIT, ALL, or SEARCH: ")
    if chooser.lower() == "new":
        newfile()
    elif chooser.lower() == "edit":
        editfiles()
    elif chooser.lower() == "all":
        allfiles()
    elif chooser.lower() == "search":
        searchfiles()
    else:
        startup()
#-- New File -------------------------------
def newfile():
    filename = ""
    filetext = ""
    while filename == "":
        print "--------------------------------------------"
        filename = raw_input("\n Please input your new files name: ")
    while filetext == "":
        filetext = raw_input("\n Please input the text for your new file: ")
    filedate = datetime.date.today()
    files[filename] = {filedate:filetext}
    # f = open ('files.txt', 'w')
    # f.write(str(files))
    # f.close()
    print "\n File Added"
    print "\n--------------------------------------------"
    print "\n ------------------- "
    print " FILE SYSTEM MANAGER "
    print " ------------------- "
    print "\n What would you like to do with your files?"
    print " To make a new file type in: NEW"
    print " To edit a current file type in: EDIT"
    print " To view all current files type in: ALL"
    print " To search a specific file type in: SEARCH"
    chooser = raw_input("\n Please enter NEW, EDIT, ALL, or SEARCH: ")
    if chooser.lower() == "new":
        newfile()
    elif chooser.lower() == "edit":
        editfiles()
    elif chooser.lower() == "all":
        allfiles()
    elif chooser.lower() == "search":
        searchfiles()
    else:
        startup()
def editfiles():
    pass
def allfiles():
    for i in files:
        print "--------------------------------------------"
        print "File Name: " + str((i))
        for i[filedate] in files:
            print "File Date: " + (i[filedate])
def searchfiles():
    pass
startup()
It works correctly and prints the name of each file with this:
for i in files:
    print "--------------------------------------------"
    print "File Name: " + str((i))
then after that I can't seem to access the date and text.
I am saving the dictionaries to the dictionary file like this:
files[filename] = {filedate:filetext}
The code I am using to try to get the filedate is this:
for i in files:
    print "--------------------------------------------"
    print "File Name: " + str((i))
    for i[filedate] in files:
        print "File Date: " + (i[filedate])
and the error it gives me is >> NameError: global name 'filedate' is not defined
EDIT
how would I also add the filetext to the for loop for it to print?
THANK YOU
First off, you are iterating through the dictionary, and by default only the keys are returned, so when you do
for i in files:
only the keys (the names of the files) are stored in i, so i[filedate] would not work even if filedate were defined. You need to use dict.items() for both loops, which returns each key and value as a pair. Correcting your code, it becomes this:
def allfiles():
    for filename, filevalue in files.items():
        print "--------------------------------------------"
        print "File Name: " + filename
        for filedate, filetext in filevalue.items():
            # filedate is a datetime.date, so convert it before concatenating
            print "File Date: " + str(filedate)
            print "File Contents: " + filetext
for a_date in files[i]:
    print "File Date: " + str(a_date)
I think this would work fine.
It becomes much clearer if you change your variable names:
def allfiles():
    for fileName in files:
        print "--------------------------------------------"
        print "File Name: " + fileName
        for a_date in files[fileName]:
            print "File Date: " + str(a_date)
filedate is only defined in the function newfile(). If you want to be able to use it in the function allfiles() then you either need to re-declare it there or make the variable global.
