I am trying to download files based on their IDs. How can I download the files if I have their IDs stored in a text file? Here's what I've done so far:
import urllib2

# code to read a file comes here

uniprot_url = "http://www.uniprot.org/uniprot/"  # constant Uniprot Namespace

def get_fasta(id):
    url_with_id = "%s%s%s" % (uniprot_url, id, ".fasta")
    file_from_uniprot = urllib2.urlopen(url_with_id)
    data = file_from_uniprot.read()
    get_only_sequence = data.replace('\n', '').split('SV=')[1]
    length_of_sequence = len(get_only_sequence[1:len(get_only_sequence)])
    file_output_name = "%s%s%s%s" % (id, "_", length_of_sequence, ".fasta")
    with open(file_output_name, "wb") as fasta_file:
        fasta_file.write(data)
    print "completed"

def main():
    # or read from a text file
    input_file = open("positive_copy.txt").readlines()
    get_fasta(input_file)

if __name__ == '__main__':
    main()
.readlines() returns a list of the lines in a file.
According to the official documentation, you can also amend it:
For reading lines from a file, you can loop over the file object. This is memory efficient, fast, and leads to simple code.
So I guess your code may be rewritten this way:
with open("positive_copy.txt") as f:
    for id in f:
        get_fasta(id.strip())
You can read more about the with keyword on the PEP 343 page.
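Putting it together, a minimal sketch of how main() might look with this pattern (it assumes get_fasta() from your question is already defined, and keeps the Python 2 style of the rest of your script; the blank-line check is just an extra precaution I added):

def main():
    # Loop over the IDs in the text file and fetch one FASTA record per ID.
    with open("positive_copy.txt") as f:
        for id in f:
            id = id.strip()
            if id:  # skip empty lines
                get_fasta(id)

if __name__ == '__main__':
    main()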
I've written a script in python which is able to fetch the titles of different posts from a webpage and write them to a csv file. As the site updates its content very frequently, I'd like to append the new results first in that csv file, where a list of old titles is already available.
I've tried with:
import csv
import time
import requests
from bs4 import BeautifulSoup

url = "https://stackoverflow.com/questions/tagged/python"

def get_information(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'lxml')
    for title in soup.select(".summary .question-hyperlink"):
        yield title.text

if __name__ == '__main__':
    while True:
        with open("output.csv", "a", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(['posts'])
            for items in get_information(url):
                writer.writerow([items])
                print(items)
        time.sleep(300)
When run twice, the above script appends the new results after the old results.
Old data are like:
A
F
G
T
New data are W,Q,U.
The csv file should look like this when I rerun the script:
W
Q
U
A
F
G
T
How can I append the new results first in an existing csv file that already contains old data?
Inserting data anywhere in a file except at the end requires rewriting the whole thing. To do this without reading its entire contents into memory first, you could create a temporary csv file with the new data in it, append the data from the existing file to that, delete the old file and rename the new one.
Here's an example of what I mean (using a dummy get_information() function to simplify testing).
import csv
import os
from tempfile import NamedTemporaryFile

url = 'https://stackoverflow.com/questions/tagged/python'
csv_filepath = 'updated.csv'

# For testing, create an existing file.
if not os.path.exists(csv_filepath):
    with open(csv_filepath, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerows([item] for item in 'AFGT')

# Dummy for testing.
def get_information(url):
    for item in 'WQU':
        yield item

if __name__ == '__main__':
    folder = os.path.abspath(os.path.dirname(csv_filepath))  # Get dir of existing file.

    with NamedTemporaryFile(mode='w', newline='', suffix='.csv',
                            dir=folder, delete=False) as newf:
        temp_filename = newf.name  # Save filename.
        # Put new data into the temporary file.
        writer = csv.writer(newf)
        for item in get_information(url):
            writer.writerow([item])
            print([item])
        # Append contents of existing file to new one.
        with open(csv_filepath, 'r', newline='') as oldf:
            reader = csv.reader(oldf)
            for row in reader:
                writer.writerow(row)
                print(row)

    os.remove(csv_filepath)  # Delete old file.
    os.rename(temp_filename, csv_filepath)  # Rename temporary file.
Since you intend to change the position of every element of the table, you need to read the table into memory and rewrite the entire file, starting with the new elements.
You may find it easier to (1) write the new element to a new file, (2) open the old file and append its contents to the new file, and (3) move the new file to the original (old) file name.
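As a rough sketch of those three steps without the csv module (prepend_rows and the file names are just illustrative; rows are written as plain lines here, and shutil.copyfileobj does the bulk append):

import os
import shutil

def prepend_rows(csv_filepath, new_rows):
    # (1) Write the new rows to a temporary file.
    temp_filepath = csv_filepath + '.tmp'
    with open(temp_filepath, 'w', newline='') as newf:
        for row in new_rows:
            newf.write(row + '\n')
        # (2) Append the contents of the existing file.
        with open(csv_filepath, 'r', newline='') as oldf:
            shutil.copyfileobj(oldf, newf)
    # (3) Replace the old file with the new one.
    os.replace(temp_filepath, csv_filepath)

prepend_rows('output.csv', ['W', 'Q', 'U'])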
I'm fairly new to python and I'm currently stuck when trying to improve my script. I have a script that performs a lot of operations using selenium to automate a manual task. The script opens two pages, searches for an email, fetches data from that page and sends it to another tab. I need help to feed the script a text file containing a list of email addresses, one line at a time, and to use each line to search the webpage. What I need is the following:
1. Open the file "test.txt".
2. Read the first line in the text file and store this value for use in another function.
3. Perform a function which uses the line from the text file as its input value.
4. Add "Completed" behind the first line in the text file before moving to the next.
5. Move to and read the next line in the text file, store it as a variable and repeat from step 3.
I'm not sure how I can do this.
Here is a snippet of my code at the time:
def fetchEmail():
    fileName = input("Filename: ")
    fileNameExt = fileName + ".txt"  # to make sure a .txt extension is used
    line = f.readline()
    for line in f:
        print(line)  # <-- How can I store the value here for use later?
        break

def performSearch():
    emailSearch = driver.find_element_by_id('quicksearchinput')
    emailSearch.send_keys(fetchEmail, Keys.RETURN)  # <--- This is where I want to be able to paste the current line every time the function is called.
    return main
I would appreciate any help with how I can solve this.
It's a bit tricky to diagnose your particular issue, since you don't actually provide real code. However, probably one of the following will help you:
Return the list of all lines from fetchEmail, then search for all of them in send_keys:
def fetchEmail():
    fileName = input("Filename: ")
    fileNameExt = fileName + ".txt"
    with open(fileNameExt) as f:
        return f.read().splitlines()

def performSearch():
    emailSearch = driver.find_element_by_id('quicksearchinput')
    emailSearch.send_keys(fetchEmail(), Keys.RETURN)
    # ...
Yield them one at a time, and look for them individually:
def fetchEmail():
    fileName = input("Filename: ")
    fileNameExt = fileName + ".txt"
    with open(fileNameExt) as f:
        for line in f:
            yield line.strip()

def performSearch():
    emailSearch = driver.find_element_by_id('quicksearchinput')
    for email in fetchEmail():
        emailSearch.send_keys(email, Keys.RETURN)
        # ...
I don't recommend using globals; there should be a better way to share information between functions (such as having both of these in a class instance, as in the sketch further below, or having one function call the other as I show above). But here is an example of how you could save the value when the first function gets called and retrieve the result in the second function at an arbitrary later time:
emails = []

def fetchEmail():
    global emails
    fileName = input("Filename: ")
    fileNameExt = fileName + ".txt"
    with open(fileNameExt) as f:
        emails = f.read().splitlines()

def performSearch():
    emailSearch = driver.find_element_by_id('quicksearchinput')
    emailSearch.send_keys(emails, Keys.RETURN)
    # ...
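For comparison, a rough sketch of the class-based alternative mentioned above (EmailSearcher is just an illustrative name; driver and Keys are assumed to already exist in your selenium setup, as in the snippets above):

class EmailSearcher(object):
    """Holds the email list so both steps can share it without globals."""

    def __init__(self, filename):
        # Read all email addresses up front, one per line.
        with open(filename + ".txt") as f:
            self.emails = f.read().splitlines()

    def perform_search(self):
        # Search for each stored email address in turn.
        email_search = driver.find_element_by_id('quicksearchinput')
        for email in self.emails:
            email_search.send_keys(email, Keys.RETURN)

searcher = EmailSearcher(input("Filename: "))
searcher.perform_search()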
I have several files and I would like to read those files, filter some keywords and write them into different files. I use Process() and it turns out that the readwritevalue function takes more time to process.
Do I need to separate the read and write into two functions? How can I read multiple files at once and write the keywords from different files to different csv files?
Thank you very much.
def readwritevalue():
    for file in gettxtpath():  ## gettxtpath will return a list of files
        file1 = file + ".csv"
        ## Identify some variables
        ## Read the file
        with open(file) as fp:
            for line in fp:
                # Process the data
                data1 = xxx
                data2 = xxx
                ....
        ## Write it to different files
        with open(file1, "w") as fp1:
            print(data1, file=fp1)
            w = csv.writer(fp1)
            w.writerow(data2)
            ...

if __name__ == '__main__':
    p = Process(target=readwritevalue)
    t1 = time.time()
    p.start()
    p.join()
I want to edit my question. I have more functions that modify the csv generated by the readwritevalue() function.
So, if Pool.map() is fine, will it be OK to change all the remaining functions like this? However, it seems that it did not save much time.
def getFormated(file):  ## Merge each csv with a well-defined formatted csv and generate a final report by writing all the csv to one output csv
    csvMerge('Format.csv', file, file1)
    getResult()

if __name__ == "__main__":
    pool = Pool(2)
    pool.map(readwritevalue, [file for file in gettxtpath()])
    pool.map(getFormated, [file for file in getcsvName()])
    pool.map(Otherfunction, file_list)
    t1 = time.time()
    pool.close()
    pool.join()
You can extract the body of the for loop into its own function, create a multiprocessing.Pool object, then call pool.map() like so (I’ve used more descriptive names):
import csv
import multiprocessing

def read_and_write_single_file(stem):
    data = None
    with open(stem, "r") as f:
        # populate data somehow
        ...
    csv_file = stem + ".csv"
    with open(csv_file, "w", encoding="utf-8") as f:
        w = csv.writer(f)
        for row in data:
            w.writerow(row)

if __name__ == "__main__":
    pool = multiprocessing.Pool()
    result = pool.map(read_and_write_single_file, get_list_of_files())
See the linked documentation for how to control the number of workers, tasks per worker, etc.
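For instance, a sketch of how those knobs look, reusing read_and_write_single_file() and get_list_of_files() from the snippet above (the worker count, maxtasksperchild and chunksize values here are arbitrary examples, not recommendations):

import multiprocessing

if __name__ == "__main__":
    # Limit the pool to 4 worker processes and recycle each worker after 10 tasks.
    pool = multiprocessing.Pool(processes=4, maxtasksperchild=10)
    # Hand the workers the files in batches of 5.
    result = pool.map(read_and_write_single_file, get_list_of_files(), chunksize=5)
    pool.close()
    pool.join()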
I may have found an answer myself. Not so sure if it is indeed a good answer, but the time is 6 times shorter than before.
import time
import multiprocessing as mp
from multiprocessing import Pool

def readwritevalue(file):
    with open(file, 'r', encoding='UTF-8') as fp:
        ## dataprocess
        ...
    file1 = file + ".csv"
    with open(file1, "w") as fp2:
        ## write data
        ...

if __name__ == "__main__":
    pool = Pool(processes=int(mp.cpu_count() * 0.7))
    pool.map(readwritevalue, [file for file in gettxtpath()])
    t1 = time.time()
    pool.close()
    pool.join()
I used this tweepy-based code to pull the tweets of a given user by user_id. I then saved a list of all tweets of a given user (alltweets) to a json file as follows. Note that without "repr", I wasn't able to dump the alltweets list into the json file. The code worked as expected.
with open(os.path.join(output_file_path, '%s_tweets.json' % user_id), 'a') as f:
    json.dump(repr(alltweets), f)
However, I have a side problem with retrieving the tweets after saving them to the json file. I need to access the text in each tweet, but I'm not sure how to deal with the "Status" wrapper that tweepy uses (see the attached sample of the json file content).
I tried to iterate over the lines in the file as follows, but the file is being seen as a single line.
with open(fname, 'r') as f:
    for line in f:
        tweet = json.loads(line)
I also tried to iterate over the statuses after reading the json file into a string, as follows, but the iteration takes place over the individual characters of the string instead.
with open(fname, 'r') as f:
    x = f.read()
    for status in x:
        """code"""
Maybe not the prettiest solution, but you could just declare Status as a dict and then eval the list (the whole content of the file).
Status = dict

f = open(fname, 'r')
data = eval(f.read())
f.close()

for status in data:
    """ do your stuff"""
I would like to be able to use a list in a file to 'upload' code to the program.
NotePad file:
savelist = ["Example"]
namelist = ["Example2"]
Python Code:
with open("E:/battle_log.txt", 'rb') as f:
gamesave = savelist[(name)](f)
name1 = namelist [(name)](f)
print ("Welcome back "+name1+"! I bet you missed this adventure!")
f.close()
print savelist
print namelist
I would like this to be the output:
Example
Example2
It looks like you're trying to serialize a program state, then re-load it later! You should consider using a database instead, or even simply pickle:
import pickle

savelist = ["Example"]
namelist = ["Example2"]
obj_to_pickle = (savelist, namelist)

# save data
with open("path/to/savefile.pkl", 'wb') as p:
    pickle.dump(obj_to_pickle, p)

# load data
with open('path/to/savefile.pkl', 'rb') as p:
    obj_from_pickle = pickle.load(p)
savelist, namelist = obj_from_pickle
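After loading, the two lists are ordinary Python lists again, so the output you asked for falls out directly:

print(savelist[0])   # Example
print(namelist[0])   # Example2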
There are several options:
Save your notepad file with the .py extension and import it. As long as it contains valid python code, everything will be accessible
Load the text as a string and execute it (e.g., via eval())
Store the information in an easy to read configuration file (e.g., YAML) and parse it when you need it (see the sketch after this list)
Precompute the data and store it in a pickle file
The first two are risky if you don't have control over who provides the file, since someone could insert malicious code into the input.
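For the configuration-file option, here is a minimal sketch using JSON from the standard library instead of YAML (the savefile.json name and the keys are made up for illustration):

import json

# Write the current state to a small, human-readable config file.
state = {"savelist": ["Example"], "namelist": ["Example2"]}
with open("savefile.json", "w") as f:
    json.dump(state, f, indent=2)

# Later, parse it back when you need it.
with open("savefile.json") as f:
    state = json.load(f)
print(state["savelist"][0])   # Example
print(state["namelist"][0])   # Example2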
You could simply import it, as long as the file is in the same folder as the one your program is in (and is saved as a .py file, e.g. example.py). Kinda like this:
import example
or:
from example import *
Then access it through one of two ways. The first one:
print example.savelist[0]
print example.namelist[0]
The second way:
print savelist[0]
print namelist[0]