pickle is not working properly - python

import nltk
import pickle
input_file=open('file.txt', 'r')
input_datafile=open('newskills1.txt', 'r')
string=input_file.read()
fp=(input_datafile.read().splitlines())
def extract_skills(string):
    skills=pickle.load(fp)
    skill_set=[]
    for skill in skills:
        skill= ''+skill+''
        if skill.lower() in string:
            skill_set.append(skill)
    return skill_set
if __name__ == '__main__':
    skills= extract_skills(string)
    print(skills)
I want to print the skills from the file, but pickle is not working here.
It shows the error:
_pickle.UnpicklingError: the STRING opcode argument must be quoted

The file containing the pickled data must be written and read in binary mode ('wb' and 'rb'). See the pickle documentation for examples.
Your extraction function should look like this:
def extract_skills(path):
    with open(path, 'rb') as inputFile:
        skills = pickle.load(inputFile)
    return skills
Of course, you will also need to dump your data to a file opened in binary mode:
def save_skills(path, skills):
    with open(path, 'wb') as outputFile:
        pickle.dump(skills, outputFile)  # the object comes first, then the file
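Putting the two functions together, a minimal round trip looks like the sketch below. The temporary path and the sample skill names are made up for illustration; the point is that the same object comes back only when both sides use binary mode and pickle.dump() receives the object before the file.

```python
import os
import pickle
import tempfile


def save_skills(path, skills):
    # pickle produces bytes, so the file is opened for binary writing ('wb').
    with open(path, 'wb') as output_file:
        pickle.dump(skills, output_file)  # object first, then the file


def extract_skills(path):
    # Read the same bytes back ('rb') and rebuild the original list.
    with open(path, 'rb') as input_file:
        return pickle.load(input_file)


if __name__ == '__main__':
    # Hypothetical sample data written to a throwaway temporary directory.
    path = os.path.join(tempfile.mkdtemp(), 'skills.pickle')
    save_skills(path, ['Python', 'SQL', 'Machine Learning'])
    print(extract_skills(path))  # ['Python', 'SQL', 'Machine Learning']
```

Keep in mind that pickle.load() can only read data that pickle.dump() produced. If newskills1.txt is an ordinary text file with one skill per line (as the .txt name suggests), pickle is not needed at all: read it with open('newskills1.txt').read().splitlines(), as the question already does for fp.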
Additionally, the logic of your script seems a bit flawed.
The code under if __name__ == '__main__' runs only when the script is executed as the main module, so everything outside that block should be static, i.e. imports and definitions.
In other words, the script should not do anything unless it is run as the main module.
Here is a cleaner version.
import pickle
def extract_skills(path):
    ...
def save_skills(path, skills):
    ...
if __name__ == '__main__':
    inputPath = "skills_input.pickle"
    outputPath = "skills_output.pickle"
    skills = extract_skills(inputPath)
    # Modify skills
    save_skills(outputPath, skills)
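The skill matching from the question fits the same structure: keep it in a function that receives the text and the skill list as arguments, and do all file reading in the main block. A minimal sketch with a made-up sentence and skill list; find_skills is a hypothetical name. Note that it lower-cases both the skill and the text, whereas the question lower-cases only the skill, so a capitalized "Python" in file.txt would never match there.

```python
def find_skills(text, skills):
    """Return the skills that occur in text, compared case-insensitively."""
    lowered = text.lower()
    # Lower-case both sides so "Python" in the text matches "python" in the list.
    return [skill for skill in skills if skill.lower() in lowered]


if __name__ == '__main__':
    # In the real script, text would come from file.txt and skills from the skill file.
    text = "Experienced in Python and SQL."
    print(find_skills(text, ['python', 'SQL', 'Java']))  # ['python', 'SQL']
```

This is still plain substring matching, so a skill such as "Java" would also match inside "JavaScript"; matching whole words would need tokenizing or a regular expression.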

Related

Applying python script to file

This is a simple question, but I can't find any information on how to do it. I have a Python script that is designed to take in a text file of a specific format and perform functions on it. How do I pipe a test file into the script so that it is read by input()? More specifically, the script is derived from skeleton code I was given that looks like this:
def main():
    N = int(input())
    lst = [[int(i) for i in input().split()] for _ in range(N)]
    intervals = solve(N, lst)
    print_solution(intervals)
if __name__ == '__main__':
    main()
I just need to understand how to feed one of my test files to this script from the terminal (and see the print_solution output).
Use the fileinput module
input.txt
...input.txt contents
script.py
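#!/usr/bin/python3
import fileinput
def main():
    # Iterates over the files named on the command line, or stdin if none are given.
    for line in fileinput.input():
        print(line, end='')  # each line already ends with '\n'
if __name__ == '__main__':
    main()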
Pipe and redirect examples:
$ cat input.txt | ./script.py
...input.txt contents
$ ./script.py < input.txt
...input.txt contents
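The same redirection works with the skeleton from the question without any changes, because input() reads from sys.stdin and the shell connects sys.stdin to the file. As a sketch, the parsing part of the skeleton can be checked without a terminal by swapping sys.stdin for an in-memory io.StringIO. The numbers are made-up sample input, and solve/print_solution are left out because the question does not show them.

```python
import io
import sys


def read_case():
    # Same parsing as the skeleton: N on the first line, then N rows of ints.
    n = int(input())
    rows = [[int(i) for i in input().split()] for _ in range(n)]
    return n, rows


if __name__ == '__main__':
    # Stand-in for `python script.py < test.txt`: input() reads sys.stdin.
    sys.stdin = io.StringIO("2\n1 3\n2 5\n")
    print(read_case())  # (2, [[1, 3], [2, 5]])
```

From the terminal, python script.py < test.txt does the same thing with a real file.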
You can read an absolute or relative path with input() and then open that path with open():
filename = input('Please input absolute filename: ')
with open(filename, 'r') as file:
    contents = file.read()  # Do your stuff
Please let me know if I misunderstood your question.
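If you go the open() route, the reading logic from the skeleton has to use the file object instead of input(). A minimal sketch, assuming the same format as the skeleton (N on the first line, then N lines of integers); load_case is a hypothetical name:

```python
def load_case(path):
    """Parse N and N rows of integers from the file at path."""
    with open(path) as f:
        n = int(f.readline())
        # readline() returns one line at a time, mirroring successive input() calls.
        rows = [[int(i) for i in f.readline().split()] for _ in range(n)]
    return n, rows
```

It can then be combined with the prompt above as load_case(input('Please input absolute filename: ')), or called with a path directly.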
You can either:
A) Use sys.stdin (import sys at the top, of course)
or
B) Use the ArgumentParser (from argparse import ArgumentParser) and pass the file as an argument.
Assuming A, it would look something like this:
python script.py < file.extension
Then the script would look like:
import sys
fData = []
for line in sys.stdin.readlines():
    fData.append(line)
# manipulate fData
There are a number of ways to achieve what you want. This is what I came up with off the top of my head. It may not be the best or most efficient way, but it should work. I do a lot of file I/O with Python at work, and this is one of the ways I've done it in the past.
Note: if you want to write the manipulated lines back to the same file, use the argparse library to pass the file name, since a redirected stdin gives the script no path to write back to.
Edit:
from argparse import ArgumentParser
def parseInput():
    parser = ArgumentParser(description = "Takes input file to read")
    parser.add_argument('-f', type = str, default = None, required = True, help = "File to perform I/O on")
    args = parser.parse_args()
    return args
def main():
    args = parseInput()
    fData = []
    # read the lines ('r' mode)
    with open(args.f, 'r') as f:
        for line in f.readlines():
            fData.append(line)
    # Perform data manipulations
    # write them back ('w' truncates the file first)
    with open(args.f, 'w') as f:
        for line in fData:
            f.write(line)
if __name__ == "__main__":
    main()
Then on the command line it would look like:
python yourScript.py -f fileToInput.extension
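One advantage of this approach is that it can be tested without a shell: ArgumentParser.parse_args() also accepts an explicit list of strings instead of reading sys.argv. A minimal sketch of the same read-modify-write flow; parse_input and rewrite_file are hypothetical names, and str.upper stands in for your own data manipulation.

```python
from argparse import ArgumentParser


def parse_input(argv=None):
    parser = ArgumentParser(description="Takes input file to read")
    # required=True makes a default value unnecessary.
    parser.add_argument('-f', required=True, help="File to perform I/O on")
    # argv=None means "use sys.argv[1:]"; passing a list is handy for testing.
    return parser.parse_args(argv)


def rewrite_file(path, transform):
    """Apply transform to every line of path and write the result back."""
    with open(path) as f:
        lines = f.readlines()
    # The file is closed after reading, then reopened with 'w' (which truncates it).
    with open(path, 'w') as f:
        f.writelines(transform(line) for line in lines)


args = parse_input(['-f', 'fileToInput.extension'])
print(args.f)  # fileToInput.extension
```

In the script itself, main() would call rewrite_file(parse_input().f, your_function) inside the if __name__ == "__main__": block.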

How to download PDF files from a list of URLs in Python?

I have a big list of links to PDF files (500+) that I need to download, and I was trying to write a program to download them all because I don't want to do it manually.
This is what I have, and when I try to run it, the console just opens up and closes.
import wget
def main():
    f = open("list.txt", "r")
    f1 = f.readlines()
    for x in f1:
        wget.download(x, 'C:/Users/ALEXJ/OneDrive/Desktop/Books')
        print("Downloaded" + x)
The problem is that you are defining the function main() but you are not calling it anywhere else.
Here is a complete example to achieve what you want:
import wget
def main():
    books_folder = 'C:/Users/ALEXJ/OneDrive/Desktop/Books'
    books_list = 'list.txt'
    with open(books_list) as books:
        for book in books:
            # strip() removes the trailing newline so the URL is valid
            wget.download(book.strip(), books_folder)
            print('Downloaded', book.strip())
if __name__ == '__main__':
    main()
Make sure you add the function call at the end of your script. It is good practice to put the code you want to execute under if __name__ == '__main__':. This is not mandatory, but it means that if you import this file into another script, your code will not run without your knowledge:
if __name__ == '__main__':
    main()
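With 500+ links, it helps if blank lines in list.txt are skipped and one bad URL does not stop the whole run. The sketch below swaps wget for the standard library's urllib.request.urlretrieve so it needs no third-party package; it assumes every URL ends in the PDF's file name, and read_urls, target_path and download_all are hypothetical helper names.

```python
import os
import urllib.request
from urllib.parse import urlparse


def read_urls(list_path):
    """Return the non-blank lines of list_path, stripped of whitespace and newlines."""
    with open(list_path) as f:
        return [line.strip() for line in f if line.strip()]


def target_path(url, folder):
    # Use the last path segment of the URL as the file name: .../book.pdf -> book.pdf
    return os.path.join(folder, os.path.basename(urlparse(url).path))


def download_all(list_path, folder):
    for url in read_urls(list_path):
        try:
            urllib.request.urlretrieve(url, target_path(url, folder))
            print('Downloaded', url)
        except OSError as exc:
            # URLError and HTTPError are OSError subclasses; report and keep going.
            print('Failed', url, exc)
```

Call download_all('list.txt', 'C:/Users/ALEXJ/OneDrive/Desktop/Books') from the if __name__ == '__main__': block; the folder must already exist.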

Using pytest to ensure a file is created and written to

I'm using pytest and want to test that a function writes some content to a file. So I have writer.py which includes:
MY_DIR = '/my/path/'
def my_function():
    with open('{}myfile.txt'.format(MY_DIR), 'w+') as file:
        file.write('Hello')
        file.close()
I want to test /my/path/myfile.txt is created and has the correct content:
import writer
class TestFile(object):
    def setup_method(self, tmpdir):
        self.orig_my_dir = writer.MY_DIR
        writer.MY_DIR = tmpdir
    def teardown_method(self):
        writer.MY_DIR = self.orig_my_dir
    def test_my_function(self):
        writer.my_function()
        # Test the file is created and contains 'Hello'
But I'm stuck on how to do this. Everything I try, such as:
import os
assert os.path.isfile('{}myfile.txt'.format(writer.MYDIR))
Generates errors which lead me to suspect I'm not understanding or using tmpdir correctly.
How should I test this? (If the rest of how I'm using pytest is also awful, feel free to tell me that too!)
I got a test to work by changing the function under test so that it accepts the path to write to, which makes it easier to test. So writer.py is:
MY_DIR = '/my/path/'
def my_function(my_path):
    # This currently assumes the directory for the file exists.
    with open(my_path, 'w+') as file:
        file.write('Hello')
if __name__ == '__main__':
    # Guarded so that importing writer in the test does not write to MY_DIR.
    my_function(my_path='{}myfile.txt'.format(MY_DIR))
And the test:
import writer
class TestFile(object):
    def test_my_function(self, tmpdir):
        # tmpdir is a fresh, existing temporary directory; write directly into it
        test_path = tmpdir.join('testfile.txt')
        writer.my_function(my_path=test_path)
        assert test_path.read() == 'Hello'
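Newer pytest versions also provide the tmp_path fixture, which is a pathlib.Path rather than a py.path object. A minimal sketch combining it with the path-parameter design above; this version of my_function additionally creates missing parent directories, which removes the "assumes the directory exists" caveat. The function is repeated inline only to keep the example self-contained; in practice it would stay in writer.py.

```python
from pathlib import Path


def my_function(my_path):
    """Write 'Hello' to my_path, creating parent directories if needed."""
    path = Path(my_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text('Hello')


def test_my_function(tmp_path):
    # tmp_path is a unique, empty directory that pytest creates for each test.
    target = tmp_path / 'a' / 'path' / 'testfile.txt'
    my_function(target)
    assert target.is_file()
    assert target.read_text() == 'Hello'
```

Running pytest on this file injects tmp_path automatically; no setup or teardown methods are needed.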

pass the contents of a file as a parameter in python

I am trying to download files based on their IDs. How can I download the files if I have their IDs stored in a text file? Here's what I've done so far:
import urllib2
#code to read a file comes here
uniprot_url = "http://www.uniprot.org/uniprot/" # constant Uniprot Namespace
def get_fasta(id):
    url_with_id = "%s%s%s" %(uniprot_url, id, ".fasta")
    file_from_uniprot = urllib2.urlopen(url_with_id)
    data = file_from_uniprot.read()
    get_only_sequence = data.replace('\n', '').split('SV=')[1]
    length_of_sequence = len(get_only_sequence[1:len(get_only_sequence)])
    file_output_name = "%s%s%s%s" %(id,"_", length_of_sequence, ".fasta")
    with open(file_output_name, "wb") as fasta_file:
        fasta_file.write(data)
    print "completed"
def main():
    # or read from a text file
    input_file = open("positive_copy.txt").readlines()
    get_fasta(input_file)
if __name__ == '__main__':
    main()
.readlines() returns a list of all lines in the file, so get_fasta() receives the whole list instead of a single ID.
According to the official documentation, you can also simplify it:
For reading lines from a file, you can loop over the file object. This is memory efficient, fast, and leads to simple code.
So your code could be rewritten like this:
with open("positive_copy.txt") as f:
    for id in f:
        get_fasta(id.strip())
You can read more about the with statement in PEP 343.
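The question's code is Python 2 (urllib2, the print statement). A sketch of a Python 3 version follows, using urllib.request and reading the IDs one per line as suggested above. It also swaps the split('SV=') trick for a length count that skips the FASTA header line (lines starting with '>'), so it does not rely on the SV= version field being a single digit. The base URL is copied from the question and may no longer be current; sequence_length is a hypothetical helper name.

```python
import urllib.request

UNIPROT_URL = "http://www.uniprot.org/uniprot/"  # base URL taken from the question


def sequence_length(fasta_text):
    """Count the residues in a FASTA record, skipping '>' header lines."""
    return sum(len(line.strip()) for line in fasta_text.splitlines()
               if not line.startswith('>'))


def get_fasta(uniprot_id):
    # urllib2 is Python 2 only; urllib.request is the Python 3 equivalent.
    with urllib.request.urlopen(f"{UNIPROT_URL}{uniprot_id}.fasta") as response:
        data = response.read().decode('utf-8')
    out_name = f"{uniprot_id}_{sequence_length(data)}.fasta"
    with open(out_name, "w") as fasta_file:
        fasta_file.write(data)
    print("completed", out_name)


def main():
    with open("positive_copy.txt") as f:
        for line in f:
            if line.strip():  # skip blank lines
                get_fasta(line.strip())
```

As in the original script, call main() under if __name__ == '__main__':.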

user created log files

I am getting a TypeError: object of type 'file' has no len()
I have traced the issue down to the path set up at execution.
What am I missing to correct this error in the "savePath" declaration, or in its use within "temp = os.path.join(savePath, files)"?
def printTime(time):
    savePath = "C:\Users\Nicholas\Documents"
    files = open("LogInLog.txt", "a")
    temp = os.path.join(savePath, files)
    files.write("A LogIn occured.")
    files.write(time)
    print files.read
    files.close
main()
The whole program is below for reference:
from time import strftime
import os.path
def main():
    getTime()
def getTime():
    time = strftime("%Y-%m-%d %I:%M:%S")
    printTime(time)
def printTime(time):
    savePath = "C:\Users\Nicholas\Documents"
    files = open("LogInLog.txt", "a")
    temp = os.path.join(savePath, files)
    files.write("A LogIn occured.")
    files.write(time)
    print files.read
    files.close
main()
Here's a working version:
from time import strftime
import os.path
def main():
    getTime()
def getTime():
    time = strftime("%Y-%m-%d %I:%M:%S")
    printTime(time)
def printTime(time):
    savePath = r"C:\Users\Nicholas\Documents"  # raw string: "\U" is an escape in Python 3
    logFile = "LogInLog.txt"
    files = open(os.path.join(savePath, logFile), "a+")
    openPosition = files.tell()  # remember where this write starts
    files.write("A LogIn occurred.")
    files.write(time)
    files.seek(openPosition)  # go back to read what was just written
    print(files.read())
    files.close()
if __name__ == '__main__':
    main()
There were a few problems with the code snippet posted in the question:
Two import statements were concatenated together. Each should be on a separate line.
os.path.join() expects path strings, not an open file object; passing the file object is what raises the TypeError.
The read() and close() methods were missing parentheses, so they were referenced but never called.
If the intent is to read back what was written in append mode, record the file position with tell() before writing and seek() back to it afterwards.
While it's legal to call main() without any conditional check, it's usually best to make sure the module is being run as a script rather than imported.
Under Python 3, the Windows path also needs to be a raw string (r"C:\Users\...") or use doubled backslashes, because \U starts a Unicode escape in a normal string literal.
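For comparison, the same logging can be written with with statements, so the file is always closed and no manual tell()/seek() bookkeeping is needed. A minimal sketch; log_login is a hypothetical name, and returning the whole log file (rather than only the new line) is an assumption about what printing the log should show.

```python
from datetime import datetime
from pathlib import Path


def log_login(log_path, when=None):
    """Append one timestamped login line to log_path and return the whole log."""
    when = when or datetime.now()
    stamp = when.strftime("%Y-%m-%d %I:%M:%S")
    # 'a' creates the file if needed and always writes at the end.
    with open(log_path, "a") as log:
        log.write(f"A LogIn occurred. {stamp}\n")
    # Reading through a fresh handle avoids juggling the append-mode file position.
    return Path(log_path).read_text()
```

For example, print(log_login(r"C:\Users\Nicholas\Documents\LogInLog.txt")) appends a line and shows the full log.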
