Python search string value from file - python

I'm trying to create a python script to look up for a specific string in a txt file
For example I have the text file dbname.txt includes the following :
Level1="50,90,40,60"
Level2="20,10,30,80"
I will need the script to search for the user input in the file and print the output that equals that value Like :
Please enter the quantity : 50
The level is : Level1
I am stuck in the search portion from the file ?
any advise ?
Thanks in advance

In these sorts of limited cases, I would recommend regular expressions.
import re
import os
You need a file to get the info out of, make a directory for it, if it's not there, and then write the file:
os.mkdir = '/tmp'
filepath = '/tmp/foo.txt'
with open(filepath, 'w') as file:
file.write('Level1="50,90,40,60"\n'
'Level2="20,10,30,80"')
Then read the info and parse it:
with open(filepath) as file:
txt = file.read()
We'll use a regular expression with two capturing groups, the first for the Level, the second for the numbers:
mapping = re.findall(r'(Level\d+)="(.*)"', txt)
This will give us a list of tuple pairs. Semantically I'd consider them keys and values. Then get your user input and search your data:
user_input = raw_input('Please enter the quantity: ')
I typed 50, and then:
for key, value in mapping:
if user_input in value:
print('The level is {0}'.format(key))
which prints:
The level is Level1

Use the mmap module, the most efficient way to do this hands down. mmap doesn't read the entire file into memory (it pages it on demand) and supports both find() and rfind()
with open("hello.txt", "r+b") as f:
# memory-map the file, size 0 means whole file
mm = mmap.mmap(f.fileno(), 0)
position = mm.find('blah')

Related

Input string and Search it inside a file Python

I am a newbie I am trying to implement a code where if I type text it will look inside the file and will say if there's something matched or not if doesn't matched anything it will display no record however this below code is not giving the right output any idea thank you very much in advance
input = raw_input("Input Text you want to search: ")
with open('try.txt') as f:
found = False
if input in f:
print "true"
found = True
if not found:
print('no record!')
In order to match a string against the text from the file, you need to read the file:
with open('try.txt') as f:
data = f.read()
Then to check if a string is found in the file, check like this:
if input_ in data:
pass
Also, two tips:
1) Indent your code correctly. Use four spaces for every level of indentation.
2) Don't use reserved keywords to name your variables. Instead of input, use input_ or something else.
You are not actually reading the file, try something like file_content = f.read() and then do a if input in file_content.
This should print "true" if found or "no record!" for not found
I've not included your boolean "found" variable because it's not used.
First the file data is read into "data" variable as a string then we perform a check with in operator
input = raw_input("Input Text you want to search: ")
with open('try.txt', 'r') as myfile:
data=myfile.read()
if input in data:
print "true"
else:
print('no record!')

Pulling file names from a list of full paths?

I am trying to pull out file names from a specifically formatted document, and put them into a list. The document contains a large amount of information, but the lines I am concerned about look like the following with "File Name: " always at the start of the line:
File Name: C:\windows\system32\cmd.exe
I tried the following:
xmlfile = open('my_file.xml', 'r')
filetext = xmlfile.read()
file_list = []
file_list.append(re.findall(r'\bFile Name:\s+.*\\.*(?=\n)', filetext))
This makes file_list look like:
[['File Name: c:\\windows\\system32\\file1.exe',
'File Name: c:\\windows\\system32\\file2.exe',
'File Name: c:\\windows\\system32\\file3.exe']]
I'm looking for my output to simply be:
(file1.exe, file2.exe, file3.exe)
I also tried using ntpath.basename on my above output, but it looks like it wants a string as input and not a list.
I'm very new to Python and scripting in general, so any suggestions would be appreciated.
You can get the expected output with following regular expression:
file_list = re.findall(r'\bFile Name:\s+.*\\([^\\]*)(?=\n)', filetext)
([^\\]*) will capture everything except a slash after final path separator until \n is encountered, see online example. Since findall already returns a list there's no need to append the return value to existing list.
You can do it in a more declarative style. It ensures less bugs, high memory efficiency.
import os.path
pat = re.compile(r'\bFile Name:\s+.*\\.*(?=\n)')
with open('my_file.xml') as f:
ms = (pat.match(line) for line in f)
ns = (os.path.basename(m) for m in ms)
# the iterator ns emits names such as 'foo.txt'
for n in ns:
# do something
If you change the regex slightly, i.e the grouping you don't even need os.path.
I would change this up a bit to make it a bit clearer to read and separate the process a bit - clearly it can be done in one step, but I think your code is going to be tough to manage later
import re
import os
with open('my_file.xml', 'r') as xmlfile:
filetext = xmlfile.read() # this way the file handle goes away - you left the file open
file_list = []
my_pattern = re.compile(r'\bFile Name:\s+.*\\.*(?=\n)')
for filename in my_pattern.findall(filetext):
cleaned_name = filename.split(os.sep)[-1]
file_list.append(cleaned_name)
You're on the right track. The reason basename wasn't working was because re.findall() returns a list which was being put into yet another list. Here's a fix for that which iterates through that list returned and creates another with just the base file names in:
import re
import os
with open('my_file.xml', 'rU') as xmlfile:
file_text = xmlfile.read()
file_list = [os.path.basename(fn)
for fn in re.findall(r'\bFile Name:\s+.*\\.*(?=\n)', file_text)]

Extracting data from a text file with Python

So I have a large text file. It contains a bunch of information in the following format:
|NAME|NUMBER(1)|AST|TYPE(0)|TYPE|NUMBER(2)||NUMBER(3)|NUMBER(4)|DESCRIPTION|
Sorry for the vagueness. All the information is formatted like the above and between each descriptor is the separator '|'. I want to be able to search the file for the 'NAME' and the print each descriptor in it's own tag such as this example:
Name
Number(1):
AST:
TYPE(0):
etc....
In case I'm still confusing, I want to be able to search the name and then print out the information that follows each being separated by a '|'.
Can anyone help?
EDIT
Here is an example of a part of the text file:
|Trevor Jones|70|AST|White|Earth|3||500|1500|Old Man Living in a retirement home|
This is the code I have so far:
with open('LARGE.TXT') as fd:
name='Trevor Jones'
input=[x.split('|') for x in fd.readlines()]
to_search={x[0]:x for x in input}
print('\n'.join(to_search[name]))
First you need to break the file up somehow. I think that a dictionary is the best option here. Then you can get what you need.
d = {}
# Where `fl` is our file object
for L in fl:
# Skip the first pipe
detached = L[1:].split('|')
# May wish to process here
d[detached[0]] = detached[1:]
# Can do whatever with this information now
print d.get('string_to_search')
Something like
#Opens the file in a 'safe' manner
with open('large_text_file') as fd:
#This reads in the file and splits it into tokens,
#the strip removes the extra pipes
input = [x.strip('|').split('|') for x in fd.readlines()]
#This makes it into a searchable dictionary
to_search = {x[0]:x for x in input}
and then search with
to_search[NAME]
Depending on the format you want the answers in use
print ' '.join(to_search[NAME])
or
print '\n'.join(to_search[NAME])
A word of warning, this solution assumes that the names are unique, if they aren't a more complex solution may be required.

Rename Files Based on File Content

Using Python, I'm trying to rename a series of .txt files in a directory according to a specific phrase in each given text file. Put differently and more specifically, I have a few hundred text files with arbitrary names but within each file is a unique phrase (something like No. 85-2156). I would like to replace the arbitrary file name with that given phrase for every text file. The phrase is not always on the same line (though it doesn't deviate that much) but it always is in the same format and with the No. prefix.
I've looked at the os module and I understand how
os.listdir
os.path.join
os.rename
could be useful but I don't understand how to combine those functions with intratext manipulation functions like linecache or general line reading functions.
I've thought through many ways of accomplishing this task but it seems like easiest and most efficient way would be to create a loop that finds the unique phrase in a file, assigns it to a variable and use that variable to rename the file before moving to the next file.
This seems like it should be easy, so much so that I feel silly writing this question. I've spent the last few hours looking reading documentation and parsing through StackOverflow but it doesn't seem like anyone has quite had this issue before -- or at least they haven't asked about their problem.
Can anyone point me in the right direction?
EDIT 1: When I create the regex pattern using this website, it creates bulky but seemingly workable code:
import re
txt='No. 09-1159'
re1='(No)' # Word 1
re2='(\\.)' # Any Single Character 1
re3='( )' # White Space 1
re4='(\\d)' # Any Single Digit 1
re5='(\\d)' # Any Single Digit 2
re6='(-)' # Any Single Character 2
re7='(\\d)' # Any Single Digit 3
re8='(\\d)' # Any Single Digit 4
re9='(\\d)' # Any Single Digit 5
re10='(\\d)' # Any Single Digit 6
rg = re.compile(re1+re2+re3+re4+re5+re6+re7+re8+re9+re10,re.IGNORECASE|re.DOTALL)
m = rg.search(txt)
name = m.group(0)
print name
When I manipulate that to fit the glob.glob structure, and make it like this:
import glob
import os
import re
re1='(No)' # Word 1
re2='(\\.)' # Any Single Character 1
re3='( )' # White Space 1
re4='(\\d)' # Any Single Digit 1
re5='(\\d)' # Any Single Digit 2
re6='(-)' # Any Single Character 2
re7='(\\d)' # Any Single Digit 3
re8='(\\d)' # Any Single Digit 4
re9='(\\d)' # Any Single Digit 5
re10='(\\d)' # Any Single Digit 6
rg = re.compile(re1+re2+re3+re4+re5+re6+re7+re8+re9+re10,re.IGNORECASE|re.DOTALL)
for fname in glob.glob("\file\structure\here\*.txt"):
with open(fname) as f:
contents = f.read()
tname = rg.search(contents)
print tname
Then this prints out the byte location of the the pattern -- signifying that the regex pattern is correct. However, when I add in the nname = tname.group(0) line after the original tname = rg.search(contents) and change around the print function to reflect the change, it gives me the following error: AttributeError: 'NoneType' object has no attribute 'group'. When I tried copying and pasting #joaquin's code line for line, it came up with the same error. I was going to post this as a comment to the #spatz answer but I wanted to include so much code that this seemed to be a better way to express the `new' problem. Thank you all for the help so far.
Edit 2: This is for the #joaquin answer below:
import glob
import os
import re
for fname in glob.glob("/directory/structure/here/*.txt"):
with open(fname) as f:
contents = f.read()
tname = re.search('No\. (\d\d\-\d\d\d\d)', contents)
nname = tname.group(1)
print nname
Last Edit: I got it to work using mostly the code as written. What was happening is that there were some files that didn't have that regex expression so I assumed Python would skip them. Silly me. So I spent three days learning to write two lines of code (I know the lesson is more than that). I also used the error catching method recommended here. I wish I could check all of you as the answer, but I bothered #Joaquin the most so I gave it to him. This was a great learning experience. Thank you all for being so generous with your time. The final code is below.
import os
import re
pat3 = "No\. (\d\d-\d\d)"
ext = '.txt'
mydir = '/directory/files/here'
for arch in os.listdir(mydir):
archpath = os.path.join(mydir, arch)
with open(archpath) as f:
txt = f.read()
s = re.search(pat3, txt)
if s is None:
continue
name = s.group(1)
newpath = os.path.join(mydir, name)
if not os.path.exists(newpath):
os.rename(archpath, newpath + ext)
else:
print '{} already exists, passing'.format(newpath)
Instead of providing you with some code which you will simply copy-paste without understanding, I'd like to walk you through the solution so that you will be able to write it yourself, and more importantly gain enough knowledge to be able to do it alone next time.
The code which does what you need is made up of three main parts:
Getting a list of all filenames you need to iterate
For each file, extract the information you need to generate a new name for the file
Rename the file from its old name to the new one you just generated
Getting a list of filenames
This is best achieved with the glob module. This module allows you to specify shell-like wildcards and it will expand them. This means that in order to get a list of .txt file in a given directory, you will need to call the function glob.iglob("/path/to/directory/*.txt") and iterate over its result (for filename in ...:).
Generate new name
Once we have our filename, we need to open() it, read its contents using read() and store it in a variable where we can search for what we need. That would look something like this:
with open(filename) as f:
contents = f.read()
Now that we have the contents, we need to look for the unique phrase. This can be done using regular expressions. Store the new filename you want in a variable, say newfilename.
Rename
Now that we have both the old and the new filenames, we need to simply rename the file, and that is done using os.rename(filename, newfilename).
If you want to move the files to a different directory, use os.rename(filename, os.path.join("/path/to/new/dir", newfilename). Note that we need os.path.join here to construct the new path for the file using a directory path and newfilename.
There is no checking or protection for failures (check is archpath is a file, if newpath already exists, if the search is succesful, etc...), but this should work:
import os
import re
pat = "No\. (\d\d\-\d\d\d\d)"
mydir = 'mydir'
for arch in os.listdir(mydir):
archpath = os.path.join(mydir, arch)
with open(archpath) as f:
txt = f.read()
s = re.search(pat, txt)
name = s.group(1)
newpath = os.path.join(mydir, name)
os.rename(archpath, newpath)
Edit: I tested the regex to show how it works:
>>> import re
>>> pat = "No\. (\d\d\-\d\d\d\d)"
>>> txt='nothing here or whatever No. 09-1159 you want, does not matter'
>>> s = re.search(pat, txt)
>>> s.group(1)
'09-1159'
>>>
The regex is very simple:
\. -> a dot
\d -> a decimal digit
\- -> a dash
So, it says: search for the string "No. " followed by 2+4 decimal digits separated by a dash.
The parentheses are to create a group that I can recover with s.group(1) and that contains the code number.
And that is what you get, before and after:
Text of files one.txt, two.txt and three.txt is always the same, only the number changes:
this is the first
file with a number
nothing here or whatever No. 09-1159 you want, does not matter
the number is
Create a backup of your files, then try something like this:
import glob
import os
def your_function_to_dig_out_filename(lines):
import re
# i'll let you attempt this yourself
for fn in glob.glob('/path/to/your/dir/*.txt'):
with open(fn) as f:
spam = f.readlines()
new_fn = your_function_to_dig_out_filename(spam)
if not os.path.exists(new_fn):
os.rename(fn, new_fn)
else:
print '{} already exists, passing'.format(new_fn)

f.read coming up empty

I'm doing all this in the interpreter..
loc1 = '/council/council1'
file1 = open(loc1, 'r')
at this point i can do file1.read() and it prints the file's contents as a string to standard output
but if i add this..
string1 = file1.read()
string 1 comes back empty.. i have no idea what i could be doing wrong. this seems like the most basic thing!
if I go on to type file1.read() again, the output to standard output is just an empty string. so, somehow i am losing my file when i try to create a string with file1.read()
You can only read a file once. After that, the current read-position is at the end of the file.
If you add file1.seek(0) before you re-read it, you should be able to read the contents again. A better approach, however, is to read into a string the first time and then keep it in memory:
loc1 = '/council/council1'
file1 = open(loc1, 'r')
string1 = file1.read()
print string1
You do not lose it, you just move offset pointer to the end of file and try to read some more data. Since it is the end of the file, no more data is available and you get empty string. Try reopening file or seeking to zero position:
f.read()
f.seek(0)
f.read()
Using with is the best syntax to use because it closes the connection to the file after using it(since python 2.5):
with open('/council/council1', 'r') as input_file:
text = input_file.read()
print(text)
To quote the official documentation on read():
To read a file’s contents, call f.read(size)
When size is omitted or negative, the entire contents of the file will
be read and returned;
And the most relevant part:
If the end of the file has been reached, f.read() will return an empty
string ('').
Which means that if you use read() twice consecutively, it is expected that the second time you'll get an empty string. Either store it the first time or use f.seek(0) to go back to the start. Together, they provide a lower level API to give you greater control.
Besides using a context manager to automatically open and close the file, there's another way to read a whole text file, using pathlib, example below:
#!/usr/bin/env python3
from pathlib import Path
txt_file = Path("myfile.txt")
try:
content = txt_file.read_text()
except FileNotFoundError:
print("Could not find file")
else:
print(f"The content is: {content}")
print(f"I can also read again: {txt_file.read_text()}")
As you can see, you can call read_text() several times and you'll get the full content, no surprises. Of course you wouldn't want to do that in production code since read_text() opens and closes the file each time, it's still best to store it. I could recommend pathlib highly when dealing with files and file paths.
It's outside the scope, but it may be worth noting a difference when reading line by line. Unlike the file object obtained by open(), PosixPath returned by Path() is not iterable. The equivalent of:
with open('file.txt') as f:
for line in f:
print(line)
Would be something like:
for line in Path('file.txt').read_text().split('\n'):
print(line)
One advantage of the first approach, with open, is that the entire file is not read into memory at once.
make sure your location is correct. Do you actually have a directory called /council under your root directory (/) ?. also use, os.path.join() to create your path
loc1 = os.path.join("/path","dir1","dir2")

Categories

Resources