This is a small widget I am designing that 'browses' while circumventing proxy settings. I have been told on Code Review that tempfile would be beneficial here, but I am struggling to fit it into my program's current logic. Here is the code:
import urllib.request
import webbrowser
import os
import tempfile
location = os.path.dirname(os.path.abspath(__file__))
proxy_handler = urllib.request.ProxyHandler(proxies=None)
opener = urllib.request.build_opener(proxy_handler)
def navigate(query):
    response = opener.open(query)
    html = response.read()
    return html

def parse(data):
    start = str(data)[2:-1]
    lines = start.split('\\n')
    return lines

while True:
    url = input("Path: ")
    raw_data = navigate(url)
    content = parse(raw_data)
    with open('cache.html', 'w') as f:
        f.writelines(content)
    webbrowser.open_new_tab(os.path.join(location, 'cache.html'))
Hopefully someone who has worked with these modules before can help me. The reason I want to use tempfile is that my program gets raw HTML, parses it, and stores it in a file. This file is overwritten every time a new input comes in, and would ideally be deleted when the program stops running. The file also doesn't have to exist when the program initializes, so tempfile seems like a natural fit from that angle too.
Since you are passing the name of the file to webbrowser.open_new_tab(), you should use a NamedTemporaryFile:
cache = tempfile.NamedTemporaryFile()
...
cache.seek(0)
cache.writelines(bytes(line, 'UTF-8') for line in content)
cache.seek(0)
webbrowser.open_new_tab('file://' + cache.name)
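To tie that into the loop from the question, here is a minimal sketch of one way it could fit together (untested; it writes the raw response bytes instead of going through parse(), and a truncate() call is added so a shorter page fully replaces a longer one):

import urllib.request
import webbrowser
import tempfile

proxy_handler = urllib.request.ProxyHandler(proxies=None)
opener = urllib.request.build_opener(proxy_handler)

# suffix='.html' hints to the browser that this is an HTML document;
# the file is deleted automatically when it is closed
cache = tempfile.NamedTemporaryFile(suffix='.html')

while True:
    url = input("Path: ")
    html = opener.open(url).read()      # bytes, written as-is instead of via parse()
    cache.seek(0)
    cache.truncate()                    # drop any leftover bytes from the previous page
    cache.write(html)
    cache.flush()                       # make sure the browser sees the new contents
    webbrowser.open_new_tab('file://' + cache.name)

One caveat: on Windows, a NamedTemporaryFile that is still open generally cannot be reopened by name by another program (the browser here), so delete=False plus an explicit os.remove() on exit may be needed on that platform.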
Epic saves its shortcuts in a .url file format. I wish to use the askopenfilename() function from tkinter.filedialog to select the shortcut and then, with the code below, open the game's directory. But whenever I try this with that function I get a "catastrophic error". Is there any way to select it with tkinter.filedialog, or are there other functions I could use to achieve the same result?
import os
import tkinter.filedialog as fd
target_url = input()  # I would like fd.askopenfilename() here, but as described above it fails; for now this is just an input()
data = open(target_url, mode='r')
data_read = data.read()
data_index = data_read.index("IconFile=")
exe = data_read[data_index:].replace('IconFile=', '')
replaced = os.path.basename(exe)
open_thing_2 = exe.replace(replaced, '')
os.startfile(open_thing_2)
For easier debugging, this is an example of the contents of such a .url file.
[{Number_Array}]
Prop3=19,0
[InternetShortcut]
IDList=
IconIndex=0
WorkingDirectory=C:\Program Files (x86)\Epic Games
URL=com.epicgames.launcher://apps/Jaguar?action=launch&silent=true
IconFile=C:\Program Files\Epic Games\Game_Name\Game_Name.exe
Thank you for your help.
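For reference, a minimal sketch of the usual askopenfilename() call pattern, with a hidden root window and a filetypes filter for .url shortcuts (the labels and title are illustrative, and I can't confirm this avoids the "catastrophic error" described above):

import tkinter as tk
import tkinter.filedialog as fd

root = tk.Tk()
root.withdraw()                              # hide the empty root window the dialog needs
target_url = fd.askopenfilename(
    title="Select an Epic Games shortcut",
    filetypes=[("Internet shortcuts", "*.url"), ("All files", "*.*")],
)
root.destroy()
print(target_url)                            # the chosen path, or an empty selection if cancelled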
I'm attempting a simple scraper to dump my saved Reddit posts to a txt file and struggling to get the script to do what I want it to do.
Here's some context. The script below dumps all of my saved post IDs into a text file, each one in its own line.
import praw
import os
import sys
reddit = praw.Reddit(client_id='MY_CLIENT_id',
client_secret='TOP_SECRET',
user_agent='AGENT_HERE',
username='USERNAME',
password='PASSWORD')
#open text file
sys.stdout = open('test.txt', 'w')
# get user saved item ids
for item in reddit.user.me().saved(limit=None):
    print(item.id)
# print to file
sys.stdout.close()
This gives me a list of post IDs that looks something like this:
lkj34f
ou456d
ho34oo
5j0vr4
I can then use the snippet below with each of those IDs to get the actual content I want:
submission = reddit.submission(id="dg23y6")
print(submission.title)
print(submission.url)
My first question is: is there a way to open the output file, read each of the lines there, and pass it as the id for the submission variable?
I'm sure there's an easier way to do this, of course; I have seen several existing scripts like this that dump all the content into a nicely formatted HTML file, but I'm not quite there yet, so I'm trying to solve this challenge with my somewhat limited skill set. I think the most obvious solution would be to use print(actual.command.I.am.missing) in place of print(item.id), but I have no idea how to find it.
Thanks in advance!
Both answers that have been submitted so far have the right idea, but make a mistake in how they use PRAW. They ignore the fact that your saved items are both comments and posts. Then, they both have a line like
submission = reddit.submission(id=item.id)
This creates a PRAW Submission object by using the ID of a pre-existing object, which is either a Submission or a Comment object. In the case that it's a Submission, the new Submission object is identical to the one it's created from, so it's redundant. In the case that it's a Comment, the behavior is incorrect because you're treating a comment ID as if it's a submission ID.
It's not clear exactly what you want to happen with comments, so I'll do it two ways. First, here's how to do it if you want to ignore saved comments (much like the existing answers, but with a type check added and the redundant line removed):
import praw
import os
import sys
reddit = praw.Reddit(client_id='MY_CLIENT_id',
client_secret='TOP_SECRET',
user_agent='AGENT_HERE',
username='USERNAME',
password='PASSWORD')
with open('test.txt', 'w') as f:
    for item in reddit.user.me().saved(limit=None):
        if isinstance(item, praw.models.Submission):
            f.write(item.id + '\n')
            f.write(item.title + '\n')
            if item.is_self:
                f.write(item.selftext + '\n')
            else:  # link post
                f.write(item.url + '\n')
And here's how to do it where you also save comments:
import praw
import os
import sys
reddit = praw.Reddit(client_id='MY_CLIENT_id',
client_secret='TOP_SECRET',
user_agent='AGENT_HERE',
username='USERNAME',
password='PASSWORD')
with open('test.txt', 'w') as f:
    for item in reddit.user.me().saved(limit=None):
        if isinstance(item, praw.models.Submission):
            f.write(item.id + '\n')
            f.write(item.title + '\n')
            if item.is_self:
                f.write(item.selftext + '\n')
            else:  # link post
                f.write(item.url + '\n')
        else:  # comment
            f.write(item.id + '\n')
            f.write(item.body + '\n')
Instead of reopening the file, just write what you want to it while it's open:
import praw
import os
import sys
reddit = praw.Reddit(client_id='MY_CLIENT_id',
client_secret='TOP_SECRET',
user_agent='AGENT_HERE',
username='USERNAME',
password='PASSWORD')
out_filename = 'test.txt'
with open(out_filename, 'w') as out_file:
    for item in reddit.user.me().saved(limit=None):
        out_file.write(item.id + '\n')
        submission = reddit.submission(id=item.id)
        out_file.write(submission.title + '\n')
        out_file.write(submission.url + '\n')
        # or combine title and url on the same line like this:
        # out_file.write(submission.title + ': ' + submission.url + '\n')
In general, it's not good form to reassign sys.stdout. You can instead use print(..., file=...).
I imagine you might be looking for something like
import praw
import os
import sys
reddit = praw.Reddit(...)
with open("test.txt", "w") as f:
for item in reddit.user.me().saved(limit=None):
print(item.id) # printed to the console
item = reddit.submission(id=item.id)
print(item.title, file=f) # written to the file
print(item.url, file=f) # written to the file
print('----', file=f) # A separator, written to the file
I have a database of files. I'm writing a program that asks the user to input a file name, uses that input to find the file, downloads it, makes a folder locally, and saves the file there. Which module in Python should be used?
It can be as small as this:
import requests
my_filename = input('Please enter a filename:')
my_url = 'http://www.somedomain/'
r = requests.get(my_url + my_filename, allow_redirects=True)
with open(my_filename, 'wb') as fh:
    fh.write(r.content)
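If the files can be large, requests can also stream the body to disk in chunks instead of holding it all in memory; a minimal sketch along the same lines (same hypothetical URL as above):

r = requests.get(my_url + my_filename, stream=True, allow_redirects=True)
r.raise_for_status()                          # fail loudly on 404s and other error statuses
with open(my_filename, 'wb') as fh:
    for chunk in r.iter_content(chunk_size=8192):
        fh.write(chunk)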
Well, do you have the database online?
If so, I would suggest the requests module; it's very Pythonic and fast.
Another great module based on requests is robobrowser.
You may also need Beautiful Soup to parse the HTML or XML data.
I would avoid using selenium because it's designed for web-testing, it needs a browser and its webdriver and it's pretty slow. It doesn't fit your needs at all.
Finally, to interact with the database I'd use sqlite3.
Here's a sample:
from requests import Session
import requests  # for requests.exceptions.ConnectionError
import os

filename = input()

with Session() as session:
    url = f'http://www.domain.example/{filename}'
    try:
        response = session.get(url)
    except requests.exceptions.ConnectionError:
        print('File not existing')
        raise  # stop here; there is no response to save
    download_path = f'C:\\Users\\{os.getlogin()}\\Downloads\\your application'
    os.makedirs(download_path, exist_ok=True)
    with open(os.path.join(download_path, filename), mode='wb') as dbfile:
        dbfile.write(response.content)
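One note on the error handling above: a file that is missing on the server usually comes back as an HTTP 404 response rather than a ConnectionError, so it is worth checking the status code as well, for example:

response = session.get(url)
response.raise_for_status()   # raises requests.exceptions.HTTPError for 404 and other error statuses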
However, you should read how to ask a good question.
I have the following view code that attempts to "stream" a zipfile to the client for download:
import os
import zipfile
import tempfile
from pyramid.response import FileIter
def zipper(request):
    _temp_path = request.registry.settings['_temp']
    tmpfile = tempfile.NamedTemporaryFile('w', dir=_temp_path, delete=True)
    tmpfile_path = tmpfile.name
    ## creating zipfile and adding files
    z = zipfile.ZipFile(tmpfile_path, "w")
    z.write('somefile1.txt')
    z.write('somefile2.txt')
    z.close()
    ## renaming the zipfile
    new_zip_path = _temp_path + '/somefilegroup.zip'
    os.rename(tmpfile_path, new_zip_path)
    ## re-opening the zipfile with new name
    z = zipfile.ZipFile(new_zip_path, 'r')
    response = FileIter(z.fp)
    return response
However, this is the Response I get in the browser:
Could not convert return value of the view callable function newsite.static.zipper into a response object. The value returned was .
I suppose I am not using FileIter correctly.
UPDATE:
Since updating with Michael Merickel's suggestions, the FileIter function is working correctly. However, still lingering is a MIME type error that appears on the client (browser):
Resource interpreted as Document but transferred with MIME type application/zip: "http://newsite.local:6543/zipper?data=%7B%22ids%22%3A%5B6%2C7%5D%7D"
To better illustrate the issue, I have included a tiny .py and .pt file on Github: https://github.com/thapar/zipper-fix
FileIter is not a response object, just like your error message says. It is an iterable that can be used for the response body, that's it. Also the ZipFile can accept a file object, which is more useful here than a file path. Let's try writing into the tmpfile, then rewinding that file pointer back to the start, and using it to write out without doing any fancy renaming.
import os
import zipfile
import tempfile
from pyramid.response import FileIter
def zipper(request):
    _temp_path = request.registry.settings['_temp']
    fp = tempfile.NamedTemporaryFile('w+b', dir=_temp_path, delete=True)
    ## creating zipfile and adding files
    z = zipfile.ZipFile(fp, "w")
    z.write('somefile1.txt')
    z.write('somefile2.txt')
    z.close()
    # rewind fp back to start of the file
    fp.seek(0)
    response = request.response
    response.content_type = 'application/zip'
    response.app_iter = FileIter(fp)
    return response
I changed the mode on NamedTemporaryFile to 'w+b' as per the docs to allow the file to be written to and read from.
The current Pyramid version has two convenience classes for this use case: FileResponse and FileIter. The snippet below will serve a static file. I ran this code; the downloaded file is named "download", like the view name. To change the file name and more, set the Content-Disposition header or have a look at the arguments of pyramid.response.Response.
from pyramid.response import FileResponse
from pyramid.view import view_config

@view_config(name="download")
def zipper(request):
    path = 'path_to_file'
    return FileResponse(path, request)  # passing request is required
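To change the downloaded file's name as mentioned above, one option is to set the Content-Disposition header on the response; a small sketch (the filename is just an example):

response = FileResponse(path, request, content_type='application/zip')
response.content_disposition = 'attachment; filename="somefilegroup.zip"'
return response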
docs:
http://docs.pylonsproject.org/projects/pyramid/en/latest/api/response.html#
hint: extract the Zip logic from the view if possible
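Regarding that hint, here is a minimal sketch of one way to pull the zip-building out of the view, reusing the imports and temp-file approach from the earlier answer above (the build_zip name is just illustrative):

def build_zip(fp, paths):
    # write the given files into the open file object as a zip archive,
    # then rewind so it can be streamed from the start
    z = zipfile.ZipFile(fp, "w")
    for path in paths:
        z.write(path)
    z.close()
    fp.seek(0)
    return fp

def zipper(request):
    _temp_path = request.registry.settings['_temp']
    fp = build_zip(tempfile.NamedTemporaryFile('w+b', dir=_temp_path),
                   ['somefile1.txt', 'somefile2.txt'])
    response = request.response
    response.content_type = 'application/zip'
    response.app_iter = FileIter(fp)
    return response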
New to Python & BeautifulSoup. I have a Python program that opens a file called "example.html", runs a BeautifulSoup action on it, then runs a Bleach action on it, then saves the result as file "example-cleaned.html". So far it is working for all contents of "example.html".
I need to modify it so that it opens each file in folder "/posts/", runs the program on it, then saves it out as "/posts-cleaned/X-cleaned.html" where X is the original filename.
Here's my code, minimised:
from bs4 import BeautifulSoup
import bleach
import re
text = BeautifulSoup(open("posts/example.html"))
text.encode("utf-8")
tag_black_list = ['iframe', 'script']
tag_white_list = ['p','div']
attr_white_list = {'*': ['title']}
# Step one, with BeautifulSoup: Remove tags in tag_black_list, destroy contents.
[s.decompose() for s in text(tag_black_list)]
pretty = (text.prettify())
# Step two, with Bleach: Remove tags and attributes not in whitelists, leave tag contents.
cleaned = bleach.clean(pretty, strip="TRUE", attributes=attr_white_list, tags=tag_white_list)
fout = open("posts/example-cleaned.html", "w")
fout.write(cleaned.encode("utf-8"))
fout.close()
print "Done"
Assistance & pointers to existing solutions gladly received!
You can use os.listdir() to get a list of all files in a directory. If you want to recurse all the way down the directory tree, you'll need os.walk().
I would move all this code that handles a single file into a function, and then write a second function to handle the whole directory. Something like this:
import os  # needed for os.chdir() and os.listdir()

def clean_dir(directory):
    os.chdir(directory)
    for filename in os.listdir(directory):
        clean_file(filename)

def clean_file(filename):
    tag_black_list = ['iframe', 'script']
    tag_white_list = ['p','div']
    attr_white_list = {'*': ['title']}
    with open(filename, 'r') as fhandle:
        text = BeautifulSoup(fhandle)
    text.encode("utf-8")
    # Step one, with BeautifulSoup: Remove tags in tag_black_list, destroy contents.
    [s.decompose() for s in text(tag_black_list)]
    pretty = (text.prettify())
    # Step two, with Bleach: Remove tags and attributes not in whitelists, leave tag contents.
    cleaned = bleach.clean(pretty, strip="TRUE", attributes=attr_white_list, tags=tag_white_list)
    # this appends -cleaned to the file;
    # relies on the file having a '.'
    dot_pos = filename.rfind('.')
    cleaned_filename = '{0}-cleaned{1}'.format(filename[:dot_pos], filename[dot_pos:])
    with open(cleaned_filename, 'w') as fout:
        fout.write(cleaned.encode("utf-8"))
    print "Done"
Then you just call clean_dir('/posts') or what not.
I'm appending "-cleaned" to the files, but I think I like your idea of using a whole new directory better. That way you won't have to handle conflicts if -cleaned already exists for some file, etc.
I'm also using the with statement to open files here as it closes them and handles exceptions automatically.
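If the posts directory ever gains subfolders, the os.walk() approach mentioned above would look roughly like this, reusing clean_file() from the code above (the clean_tree name is just illustrative):

def clean_tree(directory):
    # walk the whole directory tree and clean every file found
    for dirpath, dirnames, filenames in os.walk(directory):
        for filename in filenames:
            clean_file(os.path.join(dirpath, filename))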
Answer to my own question, for others who might find the Python docs for os.listdir a bit unhelpful:
from bs4 import BeautifulSoup
import bleach
import re
import os, os.path
tag_black_list = ['iframe', 'script']
tag_white_list = ['p','div']
attr_white_list = {'*': ['title']}
postlist = os.listdir("posts/")
for post in postlist:
    # HERE: you need to specify the directory again; the value of "post" is just the filename:
    text = BeautifulSoup(open("posts/"+post))
    text.encode("utf-8")
    # Step one, with BeautifulSoup: Remove tags in tag_black_list, destroy contents.
    [s.decompose() for s in text(tag_black_list)]
    pretty = (text.prettify())
    # Step two, with Bleach: Remove tags and attributes not in whitelists, leave tag contents.
    cleaned = bleach.clean(pretty, strip="TRUE", attributes=attr_white_list, tags=tag_white_list)
    fout = open("posts-cleaned/"+post, "w")
    fout.write(cleaned.encode("utf-8"))
    fout.close()
I cheated and made a separate folder called "posts-cleaned/" because saving files there was easier than splitting the filename, adding "cleaned", and re-joining it, although if anyone wants to show me a good way to do that, that would be even better.
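For the filename-splitting part, os.path.splitext handles the splitting and re-joining; a small sketch of building an "X-cleaned.html" name inside the loop (the output directory name is just an example):

import os, os.path

name, ext = os.path.splitext(post)            # e.g. ("example", ".html")
cleaned_name = name + "-cleaned" + ext        # "example-cleaned.html"
out_dir = "posts-cleaned"
if not os.path.exists(out_dir):               # create the output folder if it's missing
    os.makedirs(out_dir)
fout = open(os.path.join(out_dir, cleaned_name), "w")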