Extract a specific portion of a text file in Python 3.x

How do I make the if statement read from a specific location in a text file, stop at a specific point, and then print that part? For example, printing one patient's data rather than the whole list. I'm a beginner programmer, thank you.
ID = input("please enter a reference id to search for the patient : ")
info = open("data.txt", 'r')
if ID in info:
    # This should return only one patient's information not all the text file
    pass
else:
    print("not in file")
info.close()

We would need to know the specific details of how the file is formatted to give an exact answer, but here is one way that may be helpful.
Firstly, your info is right now just a file object (an io.TextIOWrapper), as print(type(info)) will show. Testing ID in info on a file object compares ID with each complete line, trailing newline included, so it is only True when a whole line equals exactly what the user typed. You need info = open('data.txt', 'r').read() to get the text as one string, or info = open('data.txt', 'r').readlines() to get a list of lines, if the format is just plain text.
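To see why if ID in info: does not find a patient while info is still the file object, here is a minimal check. It uses io.StringIO with two made-up lines in place of data.txt: in on a file-like object compares the search string with each complete line, while in on the string returned by .read() does a substring search.

```python
import io

data = "Charlie\nJudith\n"

# io.StringIO stands in for open("data.txt", 'r'); both yield lines ending in "\n"
line_match = "Charlie" in io.StringIO(data)         # compares "Charlie" to each whole line
text_match = "Charlie" in io.StringIO(data).read()  # substring search in the full text

print(line_match)  # False
print(text_match)  # True
```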
Assuming the data looks something like this:
Patient: Charlie
Age: 99
Description: blah blah blah
Patient: Judith
Age: 100
Description: blah blah blahs
You can do the following:
First, find and store the index of the ID you are looking for. Secondly, find and store the index of some string that denotes a new ID. In this case, that's the word 'Patient'. Lastly, return the string between those two indices.
Example:
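ID = input("please enter a reference id to search for the patient: ")
with open("data.txt", 'r') as data_file:
    info = data_file.read()
if ID in info:
    # find() returns the beginning index of a string (-1 if it is not found)
    f = info.find(ID)
    goods = info[f:]
    l = goods.find('Patient')
    # l is -1 when the match is the last patient in the file; keep everything then
    if l != -1:
        goods = goods[:l]
    print(goods)
else:
    print("not in file")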
Something along those lines should do the trick. There are probably better ways, depending on the structure of the file. Things can go wrong if the user input is not specific enough (a short ID can match inside another patient's data), or if the word 'Patient' appears inside a description, but the idea remains the same. You should also add some error handling for the input. I hope that helps! Good luck with your project.
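If every record really does start with a line of the form Patient: <name> (an assumption about data.txt, based on the sample above), a somewhat sturdier sketch splits the text into records first and then looks the name up exactly, so a short input cannot match part of another record. The helper name load_patients is made up for this example, and duplicate names would overwrite each other.

```python
def load_patients(text):
    """Split text into records keyed by the name after 'Patient:'.

    Assumes every record starts with a line of the form 'Patient: <name>'.
    """
    records = {}
    current_name = None
    for line in text.splitlines():
        if line.startswith("Patient:"):
            # A new record begins; every line up to the next 'Patient:' line belongs to it
            current_name = line.split(":", 1)[1].strip()
            records[current_name] = [line]
        elif current_name is not None:
            records[current_name].append(line)
    # Join each record's lines back into one block of text
    return {name: "\n".join(lines) for name, lines in records.items()}


sample = """Patient: Charlie
Age: 99
Description: blah blah blah
Patient: Judith
Age: 100
Description: blah blah blahs"""

patients = load_patients(sample)
print(patients.get("Judith", "not in file"))
```

With the real file, you would read data.txt with open(...).read(), pass the text to load_patients, and call patients.get(ID, "not in file") with the user's input.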

Related

How to read a specific line of a text file using Python

I'm looking to have my code read one text file, store the line number of the user's input as num, and then use num to read the same line in another file.
Currently, the code for the first step (reading the first text file) works and has been tested, but the second part doesn't display anything when executed. I have changed multiple things but am still stuck. Help would be much appreciated.
Here is my code:
print("Check Stock")
ca = input("Check all barcodes?")
if ca == "y":
    for x in range(0,5):
        with open ("stockbarcodes.txt") as f:
            linesa = f.readlines()
            print(linesa[x])
        with open ("stockname.txt") as f:
            linesb = f.readlines()
            print(linesb[x])
        print(" ")
else:
    bc = input("Scan barcode: ")
    f1 = open ("stockname.txt")
    for num, line in enumerate(f1, 1):
        if bc in line:
            linesba = f1.readlines()
            print(linesba[num])
As user Ikriemer points out, it seems that you want to retrieve the stock name based on the barcode. For that kind of task you would be better off creating a normalized database, which describes entities, properties and relationships. As you can see here, there are a lot of things to take into account.
The code below was tested on macOS, but considering the OP's comment (they seem to be using Windows), it is OK if the dtype is not specified.
Since setting up a database may not be as quick as you would like, you also have two simpler options.
First option
As I cannot check the content of your example files, the strategy in your code makes me believe that you are assuming both files are ordered, so that the first line of the barcode file corresponds to the first item in the stock name file. Given that, you can look up the index of an element (barcode) in an array-like data structure, and retrieve the element of another array (name) stored at the same position. Code below:
import numpy as np

print("Check Stock")
ca = input("Check all barcodes? (y/n): ")
if ca == "y":
    for x in range(0, 5):
        with open("stockbarcodes.txt") as f:
            linesa = f.readlines()
            print(linesa[x], end="")  # each line already ends with "\n"
        with open("stockname.txt") as f:
            linesb = f.readlines()
            print(linesb[x], end="")
        print(" ")
else:
    codes = np.genfromtxt("stockbarcodes.txt").tolist()
    # np.str is deprecated/removed in recent NumPy; the built-in str works as the dtype
    names = np.genfromtxt("stockname.txt", dtype=str).tolist()
    bc = input("Scan barcode: ")
    try:
        # list.index raises ValueError when the barcode is missing (so does int() on bad input)
        index = codes.index(int(bc))
        print(names[index])
    except ValueError:
        print("Bar code {} not found".format(bc))
Second option
This option could be considered a workaround for a database-like file. You need to store your data in a way that lets you search the values associated with a specific entry. This kind of task can be done with a dictionary. Just replace the else clause with this:
else:
    codes = np.genfromtxt("stockbarcodes.txt").tolist()
    names = np.genfromtxt("stockname.txt", dtype=str).tolist()
    # Map each barcode to the name on the same line (assumes both files are aligned)
    table = {k: v for k, v in zip(codes, names)}
    bc = input("Scan barcode: ")
    try:
        print(table[int(bc)])
    except (KeyError, ValueError):
        print("Bar code {} not found".format(bc))
Again, in the dictionary comprehension we are assuming both files are ordered. I strongly suggest you validate this assumption, to guarantee that the first bar code corresponds to the first stock item, the second to the second, and so on. Only after that, you may want to store the dictionary as a file, so you can load it and query it as you please. Check this answer for that purpose.
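If you do not need NumPy at all, here is a plain-Python sketch of the same dictionary idea. It makes the same assumption that line i of stockbarcodes.txt matches line i of stockname.txt, keeps barcodes as strings (genfromtxt's default float dtype drops leading zeros such as 0123), and keeps whole lines as names, whereas genfromtxt's default whitespace delimiter may split names that contain spaces. The function names build_stock_table and load_stock_table are hypothetical.

```python
def build_stock_table(code_lines, name_lines):
    """Map each barcode to the stock name on the same line.

    Assumes line i of the barcode file corresponds to line i of the name file.
    Barcodes stay strings, so leading zeros survive and no float conversion happens.
    """
    codes = [line.strip() for line in code_lines]
    names = [line.strip() for line in name_lines]
    if len(codes) != len(names):
        # A cheap check of the "both files are aligned" assumption
        raise ValueError("barcode and name files have different line counts")
    return dict(zip(codes, names))


def load_stock_table(barcode_path="stockbarcodes.txt", name_path="stockname.txt"):
    """Open both files and build the lookup table."""
    with open(barcode_path) as codes_file, open(name_path) as names_file:
        return build_stock_table(codes_file, names_file)
```

Usage would be roughly table = load_stock_table(), then bc = input("Scan barcode: ").strip() and print(table.get(bc, "Bar code {} not found".format(bc))).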

String Cutting with multiple lines

So, I'm new to Python, apart from some experience with Tkinter (some GUI experiments).
I read an .mbox file and copy the text/plain part into a string. This text contains a registration form. For example, Stefan, who lives on Maple Street in London and works for the company "MultiVendor XXVideos", has registered for a subscription with an email.
Name_OF_Person: Stefan
Adress_HOME: London, Maple
Street
45
Company_NAME: MultiVendor
XXVideos
I would like to take this data and put it in a .csv row with the columns
"Name", "Adress", "Company",...
Now I have tried to cut and slice everything. For debugging I use "print" (IDE = KATE/KDE + terminal... :-D).
The problem is that the data contains multiple lines after the keywords, but I only get the first line.
How would you improve my code?
import mailbox
import csv
import email
from time import sleep
import string

fieldnames = ["ID","Subject","Name", "Adress", "Company"]
searchKeys = [ 'Name_OF_Person','Adress_HOME','Company_NAME']
mbox_file = "REG.mbox"
export_file_name = "test.csv"

if __name__ == "__main__":
    with open(export_file_name,"w") as csvfile:
        writer = csv.DictWriter(csvfile, dialect='excel',fieldnames=fieldnames)
        writer.writeheader()
        for message in mailbox.mbox(mbox_file):
            if message.is_multipart():
                content = '\n'.join(part.get_payload() for part in message.get_payload())
                content = content.split('<')[0] # only want text/plain.. Ill split #right before HTML starts
                #print content
            else:
                content = message.get_payload()
            idea = message['message-id']
            sub = message['subject']
            fr = message['from']
            date = message['date']
            writer.writerow ('ID':idea,......) # CSV writing will work fine
            for line in content.splitlines():
                line = line.strip()
                for pose in searchKeys:
                    if pose in line:
                        tmp = line.split(pose)
                        pmt = tmp[1].split(":")[1]
                        if next in line !=:
                            print pose +"\t"+pmt
                            sleep(1)
        csvfile.closed
OUTPUT:
OFFICIAL_POSTAL_ADDRESS =20
Here, the following lines are missing.
From the file:
OFFICIAL_POSTAL_ADDRESS: =20
London, testarossa street 41
EDIT2:
@Yaniv
Thank you, I am still trying to understand every step, but I just wanted to leave a comment. I like the idea of working with the list/matrix/vector "key_value_pairs".
There are about 20 keywords in the emails. Additionally, my values are sometimes broken across lines by "=".
I was thinking something like:
Search text for Keyword A,
if true:
    search text from Keyword A until keyword B
    if true:
        copy text after A until B
Name_OF_=
Person: Stefan
Adress_
=HOME: London, Maple
Street
45
Company_NAME: MultiVendor
XXVideos
Maybe the HTML from EMAIL.mbox is easier to process?
<tr><td bgcolor=3D"#eeeeee"><font face=3D"Verdana" size=3D"1">
<strong>NAM=
E_REGISTERING_PERSON</strong></font></td><td bgcolor=3D"#eeeeee"><font
fac=e=3D"Verdana" size=3D"1">Stefan </font></td></tr>
But the "=" characters are still there.
Should I replace ["="," = "] with ""?
I would go for a "routine" parsing loop over the input lines, maintaining current_key and current_value variables, since the value for a certain key in your data might be "annoying" and spread across multiple lines.
I've demonstrated such a parsing approach in the code below, with some assumptions regarding your problem. For example, if an input line starts with whitespace (or contains no colon at all), I assumed it must be such an "annoying" value spread across multiple lines. Such lines are concatenated into a single value using a configurable string (the parameter join_lines_using_this). Another assumption is that you might want to strip whitespace from both keys and values.
Feel free to adapt the code to fit your assumptions on the input, and raise Exceptions whenever they don't hold!
# Note the usage of .strip() in some places, to strip away whitespaces. I assumed you might want that.
def parse_funky_text(text, join_lines_using_this=" "):
    key_value_pairs = []
    current_key, current_value = None, ""
    for line in text.splitlines():
        line_split = line.split(':')
        if line.startswith(" ") or len(line_split) == 1:
            # Continuation line: it belongs to the value of the last key we saw
            if current_key is None:
                raise ValueError("Failed to parse this line, not sure which key it belongs to: %s" % line)
            current_value += join_lines_using_this + line.strip()
        else:
            if current_key is not None:
                key_value_pairs.append((current_key, current_value))
                current_key, current_value = None, ""
            current_key = line_split[0].strip()
            # We've just found a new key, so here you might want to perform additional checks,
            # e.g. if current_key not in searchKeys: raise ValueError("Encountered a weird key?! %s in line: %s" % (current_key, line))
            current_value = ':'.join(line_split[1:]).strip()
    # Don't forget the last parsed key, value
    if current_key is not None:
        key_value_pairs.append((current_key, current_value))
    return key_value_pairs
Example usage:
text = """Name_OF_Person: Stefan
Adress_HOME: London, Maple
Street
45
Company_NAME: MultiVendor
XXVideos"""
parse_funky_text(text)
Will output:
[('Name_OF_Person', 'Stefan'), ('Adress_HOME', 'London, Maple Street 45'), ('Company_NAME', 'MultiVendor XXVideos')]
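To get from those pairs to the .csv row the question asks for, one option is to map each keyword to a column and hand the result to csv.DictWriter. This is a sketch: the KEY_TO_COLUMN mapping and the pairs_to_row helper are hypothetical, the pairs are the example output above, and io.StringIO stands in for test.csv. Keywords without a column are skipped, and columns without a value are written empty (DictWriter's default restval is "").

```python
import csv
import io

# Hypothetical mapping from the form's keywords to the question's CSV columns
KEY_TO_COLUMN = {
    "Name_OF_Person": "Name",
    "Adress_HOME": "Adress",
    "Company_NAME": "Company",
}
fieldnames = ["ID", "Subject", "Name", "Adress", "Company"]


def pairs_to_row(key_value_pairs, message_id="", subject=""):
    """Turn parse_funky_text()-style (key, value) pairs into one CSV row dict."""
    row = {"ID": message_id, "Subject": subject}
    for key, value in key_value_pairs:
        column = KEY_TO_COLUMN.get(key)
        if column is not None:  # ignore keywords we have no column for
            row[column] = value
    return row


pairs = [('Name_OF_Person', 'Stefan'),
         ('Adress_HOME', 'London, Maple Street 45'),
         ('Company_NAME', 'MultiVendor XXVideos')]

out = io.StringIO()  # stands in for the open test.csv file
writer = csv.DictWriter(out, dialect='excel', fieldnames=fieldnames)
writer.writeheader()
writer.writerow(pairs_to_row(pairs, message_id="<example-id>"))
print(out.getvalue())
```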
You indicate in the comments that your input strings from the content should be relatively consistent. If that is the case, and you want to be able to split that string across multiple lines, the easiest thing to do would be to replace \n with spaces and then just parse the single string.
I've intentionally constrained my answer to using just string methods rather than inventing a huge function to do this. Reasons: 1) your process is already complex enough, and 2) your question really boils down to how to process the string data across multiple lines. If that is the case, and the pattern is consistent, this will get this one-off job done.
content = content.replace('\n', ' ')
Then you can split on each of the boundaries in your consistently structured headers.
content = content.split("Name_OF_Person:")[1]  # take second element of the list
person = content.split("Adress_HOME:")[0].strip()  # take content before "Adress_HOME:"
content = content.split("Adress_HOME:")[1]  # take second element of the list
address = content.split("Company_NAME:")[0].strip()  # take content before "Company_NAME:"
company = content.split("Company_NAME:")[1].strip()  # take the remainder, which is the company
Normally, I would suggest regex (https://docs.python.org/3.4/library/re.html). Long term, if you need to do this sort of thing again, regex is going to pay dividends on time spent munging data. To make a regex "cut" across multiple lines, you would use the re.DOTALL flag, which lets . match newlines (re.MULTILINE only changes how ^ and $ match). So it might end up looking something like re.search('Name_OF_Person:(.*)Adress_HOME:', html_reg_form, re.DOTALL)
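On the stray "=" characters from EDIT2: "=" at the end of a line, "=20" and "=3D" look like quoted-printable encoding (a soft line break, an encoded space and an encoded "=", respectively), so decoding tends to be safer than replacing "=" by hand. Below is a sketch that decodes with the standard quopri module and then cuts fields out with re.DOTALL. The raw sample and the field_between helper are made up for illustration. For messages read through mailbox, message.get_payload(decode=True) on a non-multipart message (or on each part) should already undo the Content-Transfer-Encoding and return bytes.

```python
import quopri
import re

# Hypothetical raw text/plain payload, quoted-printable encoded like the question's sample:
# "=\n" is a soft line break and "=20" is an encoded space.
raw = (b"Name_OF_=\nPerson: Stefan\n"
       b"Adress_HOME: London, Maple=20\nStreet\n45\n"
       b"Company_NAME: MultiVendor\nXXVideos\n")

# Undo the quoted-printable encoding (returns bytes), then turn it into str
text = quopri.decodestring(raw).decode("utf-8")


def field_between(text, start_key, end_key=None):
    """Return the text after start_key up to end_key (or the end), whitespace collapsed."""
    end = re.escape(end_key) if end_key else r"\Z"
    # re.DOTALL lets .*? run across line breaks
    match = re.search(re.escape(start_key) + r"\s*(.*?)\s*" + end, text, re.DOTALL)
    if match is None:
        return None
    return " ".join(match.group(1).split())


print(field_between(text, "Name_OF_Person:", "Adress_HOME:"))
print(field_between(text, "Adress_HOME:", "Company_NAME:"))
print(field_between(text, "Company_NAME:"))
```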

Picking a file and reading the words from it in Python

I need help with this; I'm a total beginner at Python. My assignment is to create a program that has the user pick a category, then scrambles words from a file that are in that category. I just want to figure out why this first part isn't working; the first part is the first of four different methods that run depending on which category the user picks.
print ("Instructions: Enter your chosen category, animals, places, names or colors.")
viewYourFile = input("Enter your category")
category = 'animals'
if category == 'animals':
    animals = open('animals.txt')
    next = animals.read(1)
    while next != "":
        animal1 = animals.read(1)
        animal2 = animals.read(2)
        animal3 = animals.read(3)
        animal4 = animals.read(4)
        animal5 = animals.read(5)
        animalList = ['animal1', 'animal2', 'animal3', 'animal4', 'animal5']
        chosenAnimal = random.choice(animalList)
        animalLetters = list(chosenAnimal)
        random.shuffle(animalLetters)
        scrambledAnimal = ' '.join(animalLetters)
        print(scrambledAnimal)
        print("Enter the correct spelling of the word")
The first problem is that you're reading only 1-5 letters from the file.
Please read the documentation (https://docs.python.org/2/tutorial/inputoutput.html) on how the read function works. The number in parentheses is how many characters you want to read (bytes, if the file is opened in binary mode).
You may want a simpler solution, such as reading the entire file and splitting it into words. This would look something like:
file_contents = animals.read()
animalList = file_contents.split()
If split is new to you, look up that method as well (https://docs.python.org/2/library/string.html).
The next problem is that you've set your animal list to literal strings, rather than the input values you read. I think you want the line to read:
animalList = [animal1, animal2, animal3, animal4, animal5]
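Putting both fixes together, a minimal sketch might look like this. It also imports random, which the question's code uses without importing. The helper names are hypothetical, the words are split on any whitespace, and the letters are joined without spaces (the question used ' '.join, which puts a space between letters).

```python
import random


def load_words(path):
    """Read a whole word-list file and split it into words on any whitespace."""
    with open(path) as f:
        return f.read().split()


def scramble(word, rng=random):
    """Return the letters of word in a random order."""
    letters = list(word)
    rng.shuffle(letters)
    return "".join(letters)


def pick_and_scramble(words, rng=random):
    """Choose one word at random and return (original, scrambled)."""
    chosen = rng.choice(words)
    return chosen, scramble(chosen, rng)
```

In the question's flow this would be roughly animalList = load_words('animals.txt'), then word, scrambled = pick_and_scramble(animalList), print(scrambled), and finally compare the user's answer with word.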

Python row.get(regex)

I have been searching for this for a few hours now and I can't seem to find anything about it.
I am parsing a .csv file and I need to pull email addresses out of it. The headers indicate whether a column holds an email, but the files can have different formats, so I want to handle this with a regex; I just don't know how.
A part of my code is below:
input_file = csv.DictReader(open("contacts.csv"))
for row in input_file:
    if row.get('E-mail 1 - Value'):
        print row.get('E-mail 1 - Value')
    elif row.get('E-mail 2 - Value'):
        print row.get('E-mail 2 - Value')
I would like to be able to do something like this:
input_file = csv.DictReader(open("contacts.csv"))
for row in input_file:
    if row.get('E-mail*'):
        print row.get('E-mail*')
where it would grab any column whose header starts with "E-mail", but I can't figure out how. I tried to use re.search, but it wasn't what I needed, since I don't know how to give it the input string. Thanks in advance for any help!
William
Here is an example that prints all the contents of the columns that start with "E-mail" for every row:
import csv

with open("contacts.csv", newline="") as f:
    input_file = csv.DictReader(f)
    for row in input_file:
        for column in row.keys():
            if column.startswith("E-mail"):
                print(row[column])
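Since the question asked about a regex specifically, here is a variant of the same loop that matches headers with a compiled pattern's fullmatch() instead of startswith, and skips empty cells. The header pattern assumes the "E-mail N - Value" naming seen in the question, emails_in_row is a hypothetical helper, and the sample CSV is made up.

```python
import csv
import io
import re

# Assumed header pattern: "E-mail 1 - Value", "E-mail 2 - Value", ...
EMAIL_HEADER = re.compile(r"E-mail \d+ - Value")


def emails_in_row(row, header_pattern=EMAIL_HEADER):
    """Return the non-empty values of every column whose header fully matches header_pattern."""
    return [value for column, value in row.items()
            if header_pattern.fullmatch(column) and value]


# Made-up CSV standing in for contacts.csv
SAMPLE_CSV = (
    "Name,E-mail 1 - Value,E-mail 2 - Value\n"
    "Ann,ann@example.com,\n"
    "Bob,,bob@example.org\n"
)

for row in csv.DictReader(io.StringIO(SAMPLE_CSV)):
    print(emails_in_row(row))
```

For the real file, open contacts.csv with newline="" and pass each row from csv.DictReader to emails_in_row.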

Using variables in a reg-ex

So I matched (with the help of kind contributors on Stack Overflow) the item number in:
User Number 1 will probably like movie ID: RecommendedItem[item:557, value:7.32173]the most!
Now I'm trying to extract the corresponding name from another text file using the item number. Its contents look like:
557::Voyage to the Bottom of the Sea (1961)::Adventure|Sci-Fi
For some reason I'm just coming up with 'None' on terminal. No matches found.
myfile = open('result.txt', 'r')
myfile2 = open('movies.txt', 'r')
content = myfile2.read()
for line in myfile:
    m = re.search(r'(?<=RecommendedItem\[item:)(\d+)',line)
    n = re.search(r'(?<=^'+m.group(0)+'\:\:)(\w+)',content)
    print n
I'm not sure if I can use a variable in a look behind assertion..
Really appreciate all the help I'm getting here!
EDIT: Turns out the only problem was the unneeded caret symbol in the second regular-expression.
Here, once you've found the number, you use an 'old-style' string format (you could equally use .format if you so desired) to put it into the regular expression. I thought it'd be nice to access the values via a dictionary, hence the named groups; you could do it without them, though. To get a list of genres, just .split("|") the string under suggestionDict["Genres"].
import re
num = 557
suggestion="557::Voyage to the Bottom of the Sea (1961)::Adventure|Sci-Fi"
# .+ lets the title and genres contain any characters (e.g. "Sci-Fi"); the \s\((\d+)\):: part pins down the year
suggestionDict = re.search(r'%d::(?P<Title>.+)\s\((?P<Date>\d+)\)::(?P<Genres>.+)' % num, suggestion).groupdict()
#printing to show if it works/doesn't
print('\n'.join(["%s:%s" % (k,d) for k,d in suggestionDict.items()]))
#clearer example of how to use
print("\nCLEAR EXAMPLE:")
print(suggestionDict["Title"])
Producing
Title:Voyage to the Bottom of the Sea
Date:1961
Genres:Adventure|Sci-Fi
CLEAR EXAMPLE:
Voyage to the Bottom of the Sea
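The EDIT above says the caret was the culprit. The caret can also stay if the search runs with re.MULTILINE, where ^ matches at the start of every line of movies.txt rather than only at the start of the whole string; anchoring this way also stops 557 from matching inside a longer ID such as 1557. A sketch, where title_for_item is a hypothetical helper, re.escape guards the inserted number, and the first line of movies_text is a placeholder rather than real data.

```python
import re


def title_for_item(item_id, movies_text):
    """Return the title field of the movies.txt line starting with '<item_id>::', or None.

    re.MULTILINE makes ^ match at the start of every line instead of only at the
    start of the whole string, so the caret can stay in the pattern.
    """
    pattern = r"^" + re.escape(str(item_id)) + r"::(?P<Title>.+?)::"
    match = re.search(pattern, movies_text, re.MULTILINE)
    return match.group("Title") if match else None


result_line = ("User Number 1 will probably like movie ID: "
               "RecommendedItem[item:557, value:7.32173]the most!")
item = re.search(r"RecommendedItem\[item:(\d+)", result_line).group(1)

# The 557 line is from the question; the first line is a made-up placeholder
movies_text = ("1::Placeholder Title (2000)::Drama\n"
               "557::Voyage to the Bottom of the Sea (1961)::Adventure|Sci-Fi\n")
print(item, title_for_item(item, movies_text))
```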
