Python 3: Pulling specific data from documents

I am new to Python and am using it for my internship. My goal is to pull specific data from about 100 .ls documents (all in the same folder), write it to a .txt file, and from there import it into Excel. My problem is that I can read all the files but cannot figure out how to pull the specific lines from each file into a list. From the list I want to write them to a .txt file and then import that into Excel.
Is there any way to make readlines() capture only certain lines?

It's hard to know exactly what you want without an example or sample code/content. What you might do is create a list and append the desired line to it.
result_list = []  # Create an empty list
with open("myfile.txt", "r") as f:
    lines = f.readlines()  # Read the lines of the file
    for line in lines:  # Loop through the lines
        if "desired_string" in line:
            result_list.append(line)  # If the line contains the string, add it
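
To extend the same idea to a whole folder, here is a minimal sketch. It assumes the files match *.ls, that a line is "wanted" when it contains a fixed keyword, and that a tab-separated .txt file is acceptable for the Excel import. The function names collect_matching_lines and write_tab_separated are made up for this illustration.

```python
from pathlib import Path


def collect_matching_lines(folder, keyword, pattern="*.ls"):
    """Return (file name, line) pairs for every line containing keyword."""
    matches = []
    for path in sorted(Path(folder).glob(pattern)):
        # errors="replace" keeps the loop going if a file contains odd bytes
        with open(path, "r", errors="replace") as f:
            for line in f:  # iterate lazily instead of calling readlines()
                if keyword in line:
                    matches.append((path.name, line.rstrip("\n")))
    return matches


def write_tab_separated(matches, out_path):
    """Write one 'file name<TAB>line' row per match."""
    with open(out_path, "w") as out:
        for name, line in matches:
            out.write(f"{name}\t{line}\n")
```

You would call collect_matching_lines("path/to/folder", "desired_string") and pass the result to write_tab_separated. Keeping the file name next to each line makes it easier to trace a row back to its source once it is in Excel.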

Related

Delete last line from multiple docx files in python

I want to delete text from multiple docx files using Python.
Let's say the contents of a file are:
This is line 1
This is line 2
This is line 3
This is line 4
I want to delete only the very last line, i.e. This is line 4.
I've tried several approaches but keep getting errors.
Try 1:
with open(r"FILE_PATH.docx", 'r+', errors='ignore') as fp:
    # read and store all lines into a list
    lines = fp.readlines()
    # move the file pointer to the beginning of the file
    fp.seek(0)
    # truncate the file
    fp.truncate()
    # write back every line except the last one
    # lines[:-1] runs from line 0 to the second-to-last line
    fp.writelines(lines[:-1])
The code above runs without errors, but some data in the docx file is lost.

You will not get the correct lines from a docx using that method, because a docx is not a plain text file. (Your current method would work on a txt file.)
Run this to see what you are actually removing:
with open(r"FILE_PATH.docx", 'r+', errors='ignore') as fp:
    # read and store all lines into a list
    lines = fp.readlines()
    print(lines[-1])  # or print(lines) to see all the lines
You are not removing This is line 4; you are removing part of the docx file's internal data.
Although there are ways to read a docx without additional libraries, using something like docx2txt or textract might be easier.
Other questions on Stack Overflow address how to read and modify a docx; take a look and you should find a way to adapt your code if a docx is still what you want to work with.
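
To illustrate why line-based reading fails: a .docx is a ZIP archive, and the body text lives in an XML member named word/document.xml. The sketch below reads paragraph text using only the standard library. The function name docx_paragraphs is made up for this example, and it only handles reading; it does not cover headers, footnotes, or writing a modified file back, for which a dedicated library is likely the safer route.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(path):
    """Return the text of each paragraph in a .docx file as a list of strings."""
    with zipfile.ZipFile(path) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's visible text may be split across several <w:t> runs
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        paragraphs.append(text)
    return paragraphs
```

With the list in hand, docx_paragraphs("FILE_PATH.docx")[-1] shows the real last paragraph, as opposed to the last "line" of raw archive bytes.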

Grab Specific elements from a List after reading a file

I am using Python and I have a text file with results from a previous complex code. It wrote to a file called 'results' structured by:
xml file name.xml
['chebi:28726', 'chebi:27466', 'chebi:27721', 'chebi:15532', 'chebi:15346']
xml file name.xml
['chebi:27868', 'chebi:27668', 'chebi:15471', 'chebi:15521', 'chebi:15346']
xml file name.xml
['chebi:28528', 'chebi:28325', 'chebi:10723', 'chebi:28493', 'chebi:15346']
etc...
my current code is:
file = open("results.txt", "r")
data = file.readlines()
for a in data:
    print(a)
The problem is I want to grab the specific elements within that list, for example chebi:28528, and convert them from their current compounds into a different format. I wrote the code for this conversion already, but am having trouble with the step before the actual conversion of the compounds.
The problem is that I need to be able to loop through the file and select each element from that list but I am unable to do so.
If I do
for a in data:
    for b in a:
it selects each individual character and not the entire word (chebi:28528).
Is there a way I can loop through the text file and grab just the specific ChEBI compounds so that I can then convert them into the format I need? Python is treating the entire list of compounds as one element (a string), so indexing into it returns a character rather than a compound.

Assuming your file looks like the above, you have lists stored in raw text format. You can loop over the individual words by first converting each line to a Python list using ast or something similar.
You had the right idea, but each line is a string, so your inner loop goes through its characters. How about this?
import ast

with open('results.txt', 'r') as f:
    data = f.readlines()

for line in data:
    if '[' not in line:
        continue  # skip the file-name lines
    ls = ast.literal_eval(line)  # turn the text "[...]" into a real list
    for word in ls:
        if 'chebi' in word:
            process_me(word)  # process_me stands in for your conversion function
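
If you also need to know which XML file each compound came from, a variation on the same approach can pair every file-name line with the list that follows it. This is only a sketch that assumes the file strictly alternates between a name line and a list line, as in the sample; parse_results is a made-up name.

```python
import ast


def parse_results(lines):
    """Pair each file-name line with the list literal on the line after it."""
    records = []
    current_name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith("["):
            # literal_eval only parses Python literals; it does not run code
            records.append((current_name, ast.literal_eval(line)))
        else:
            current_name = line
    return records
```

Each record is a (file name, list of compounds) tuple, so a nested loop over the records gives you whole compound strings rather than characters.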

Python reading nothing from file [duplicate]

I am a Python beginner. I am trying to figure out why the second 'for' loop doesn't work in the following script: I only get output from the first 'for' loop and nothing from the second. My script and the CSV data are pasted below.
It would be helpful if you could tell me why this happens and how to make the second 'for' loop work as well.
My SCRIPT:
import csv
file = "data.csv"
fh = open(file, 'rb')
read = csv.DictReader(fh)
for e in read:
    print(e['a'])
for e in read:
    print(e['b'])
"data.csv":
a,b,c
tree,bough,trunk
animal,leg,trunk
fish,fin,body

The csv reader is an iterator over the file. Once you go through it once, you read to the end of the file, so there is no more to read. If you need to go through it again, you can seek to the beginning of the file:
fh.seek(0)
This will reset the file to the beginning so you can read it again. Depending on the code, it may also be necessary to skip the field name header:
next(fh)
This is necessary for your code, since the DictReader consumed that line the first time around to determine the field names, and it's not going to do that again. It may not be necessary for other uses of csv.
If the file isn't too big and you need to do several things with the data, you could also just read the whole thing into a list:
data = list(read)
Then you can do what you want with data.
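
Both approaches can be sketched as small functions. This assumes Python 3, where the file should be opened in text mode (for example open("data.csv", newline="")) and not in 'rb' mode; the function names are made up for this illustration, and the column names a and b come from the sample data.

```python
import csv


def columns_via_list(fh):
    """Read all rows once, then loop over the stored list as often as needed."""
    rows = list(csv.DictReader(fh))
    return [r["a"] for r in rows], [r["b"] for r in rows]


def columns_via_seek(fh):
    """Loop over the same reader twice, rewinding the file in between."""
    reader = csv.DictReader(fh)
    first = [r["a"] for r in reader]
    fh.seek(0)  # rewind the underlying file
    next(fh)    # skip the header line; the reader already knows the field names
    second = [r["b"] for r in reader]
    return first, second
```

The list version is usually simpler to reason about; the seek version avoids holding every row in memory, which only matters for large files.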

I wrote a small function that takes the path of a CSV file, reads it, and returns a list of dicts in one go, which you can then loop over as many times as you like:
import csv

def read_csv_data(path):
    """
    Read the CSV file at the given path and return a list of dicts,
    one per row, mapping column names to values.
    """
    data_lines = []
    with open(path, newline="") as fh:
        data = csv.reader(fh)
        # Read the column names from the first line of the file
        fields = next(data)
        for row in data:
            data_lines.append(dict(zip(fields, row)))
    return data_lines

Writing results into a .txt file

I wrote some code that takes two .txt files, compares them, and exports the results to another .txt file. Below is my code (sorry about the mess).
Any ideas? Or am I just an imbecile?
Using python 3.5.2:
# Barcodes Search (V3actual)
# Import the text files, putting them into sets
with open('Barcodes1000', 'r') as f:
    barcodes = {line.strip() for line in f}
with open('EANstaging1000', 'r') as f:
    EAN_staging = {line.strip() for line in f}
##diff = barcodes ^ EAN_staging
##print (diff)
in_barcodes_but_not_in_EAN_staging = barcodes.difference(EAN_staging)
print(in_barcodes_but_not_in_EAN_staging)
# Exporting in_barcodes_but_not_in_EAN_staging to a .txt file
with open("BarcodesSearch29_06_16", "wt") as BarcodesSearch29_06_16:  # Create .txt file
    BarcodesSearch29_06_16.write(in_barcodes_but_not_in_EAN_staging)  # Write results to the .txt file

From the comments on your question, it sounds like you want to save your collection of strings to a file. write() expects a single string as input, while writelines() accepts any iterable of strings, such as your set. Note that writelines() does not add line endings, so append a newline to each item yourself:
with open("BarcodesSearch29_06_16", "wt") as BarcodesSearch29_06_16:
    BarcodesSearch29_06_16.writelines(code + "\n" for code in in_barcodes_but_not_in_EAN_staging)
That iterates through in_barcodes_but_not_in_EAN_staging and writes each element on its own line in the file BarcodesSearch29_06_16.
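
Putting the whole task together, a sketch might look like the following. The function name write_missing_barcodes is made up, blank lines are skipped as an assumption about the input, and the output is sorted only because a set has no fixed order. Be aware that writelines() does not insert line separators, so the newline is added explicitly.

```python
def write_missing_barcodes(barcodes_path, staging_path, out_path):
    """Write barcodes that are absent from the staging file, one per line."""
    with open(barcodes_path) as f:
        barcodes = {line.strip() for line in f if line.strip()}
    with open(staging_path) as f:
        staging = {line.strip() for line in f if line.strip()}
    missing = sorted(barcodes - staging)  # sorted for a stable, readable order
    with open(out_path, "w") as out:
        # writelines() adds no separators, so each item gets its own "\n"
        out.writelines(code + "\n" for code in missing)
    return missing
```

For the files in the question, the call would be write_missing_barcodes('Barcodes1000', 'EANstaging1000', 'BarcodesSearch29_06_16').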

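Try BarcodesSearch29_06_16.write(str(in_barcodes_but_not_in_EAN_staging)). This writes the set's printed representation as a single string. Because the file is opened in a with block, it is closed automatically when the block ends, so no explicit BarcodesSearch29_06_16.close() call is needed.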

How to import string extracted from a file in python?

I have an issue importing data from a text file using Python.
The data in my file looks like this:
{1:F05ABCDRPRAXXX0000000000}{2:I1230AGRIXXPRXXXXN}{4:
:20:1234567980
:25:AB123465789013246578900000000000
:28c:110/1123156
-}
From the data above, I want to fetch everything after {4: line by line, so the first line would be :20:1234567980, and so on.
I want to split the data using a regular expression. If anyone knows how to write a regular expression for this, please share it in an answer.
Thank you

If you want to get the lines in a file, use
lines = []
with open("yourfile.txt") as f:
    for line in f:
        lines.append(line)
lines.pop(0)  # remove the first line (which ends with "{4:")
# do what you want with the list of lines
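
Since the question asks for a regular expression, here is one possible sketch. It assumes each field sits on its own line in the form :tag:value, that the block ends with -}, and that no value itself contains -}. The names BLOCK4, FIELD, and parse_block4 are made up for this example.

```python
import re

# Everything between "{4:" and the closing "-}" (DOTALL lets "." span lines)
BLOCK4 = re.compile(r"\{4:\s*(.*?)-\}", re.DOTALL)
# One field per line: ":tag:value" (MULTILINE makes ^ and $ match at each line)
FIELD = re.compile(r"^:([0-9A-Za-z]+):(.*)$", re.MULTILINE)


def parse_block4(message):
    """Return (tag, value) pairs from block 4 of the message text."""
    block = BLOCK4.search(message)
    if block is None:
        return []
    return [(tag, value.strip()) for tag, value in FIELD.findall(block.group(1))]
```

Called on the sample data, this yields pairs such as ('20', '1234567980'), which you can then loop over or turn into a dict if the tags are unique.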
