I am trying to POST all files in a folder on my local drive to a certain web URL by using Requests and Glob. Every time I POST a new file to the URL, I want to add to a dictionary a new "key-value" item that is "name of the file (key), output from server after POSTing the file (value)":
import requests, glob, unicodedata
outputs = {}
text_files = glob.iglob("/Users/ME/Documents/folder/folder/*.csv")
url = 'http://myWebsite.com/extension/extension/extension'
for data in text_files:
    file2 = {'file': open(data)}
    r = requests.post(url, files=file2)
    outputs[file2] = r.text
This gives me the error:
Traceback (most recent call last):
  File "/Users/ME/Documents/folder/folder/myProgram.py", line 15, in <module>
    outputs[file2] = r.text
TypeError: unhashable type: 'dict'
This is because (I think) file2 is of type dict. Is there any way to cast/alter file2 after I POST it so that it is just a string of the file name?
You are trying to use the files dictionary as the key, not the file name. Use data as the key instead:
for data in text_files:
    file2 = {'file': open(data)}
    r = requests.post(url, files=file2)
    outputs[data] = r.text
Better yet, use a more meaningful name, and use with so the open file object is closed again for you:
for filename in text_files:
    with open(filename) as fileobj:
        files = {'file': fileobj}
        r = requests.post(url, files=files)
        outputs[filename] = r.text
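For file uploads, requests generally expects files opened in binary mode. A minimal sketch of the same loop with 'rb' (same URL and glob pattern as in the question):

import glob

import requests

outputs = {}
url = 'http://myWebsite.com/extension/extension/extension'

for filename in glob.iglob("/Users/ME/Documents/folder/folder/*.csv"):
    # open in binary mode so requests uploads the raw bytes unchanged
    with open(filename, 'rb') as fileobj:
        r = requests.post(url, files={'file': fileobj})
    outputs[filename] = r.text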
Related
files = {'file': ('output.mp3', open('output.mp3','rb'), 'audio/mpeg')}
I am using this for a POST request, but afterwards, when I try to delete the file with os.remove, I get "it's used by a process". How do I close the file afterwards?
You can use with ...:
import os
import requests
# open the file and send it
with open("output.mp3", "rb") as my_file:
files = {"file": ("output.mp3", my_file, "audio/mpeg")}
r = requests.post(url, files=files)
# ...
# file is closed now, remove it
os.remove("output.mp3")
import glob
import os
import requests
import shutil
class file_service:
    def file():
        dir_name = '/Users/TEST/Downloads/TU'
        os.chdir(dir_name)
        pattern = 'TU_*.csv'
        for x in glob.glob(pattern):
            file_name = os.path.join(dir_name, x)
            print(file_name)
            from datetime import date
            dir_name_backup = '/Users/Zill/Downloads/backup'
            today = date.today()
            backup_file_name = f'Backup_TU_{today.year}{today.month:02}{today.day:02}.csv'
            backup_file_name_directory = os.path.join(dir_name_backup, backup_file_name)
            print(backup_file_name_directory)
            newPath = shutil.copy(file_name, backup_file_name_directory)
            url = "google.com"
            payload = {'name': 'file'}
            files = [
                ('file', open(file_name, 'rb'))
            ]
            headers = {
                'X-API-TOKEN': '12312'
            }
            response = requests.request("POST", url, headers=headers, data=payload, files=files)
            print(response.text.encode('utf8'))
            files.close()
            os.remove(file_name)

    file()
To provide overall context: I am retrieving a file from my OS and, using the POST method, posting the content of the file into an application. It's working as expected so far; the details are getting pushed into the application. As the next step I am trying to remove the file from my existing directory using os.remove(), but I am getting a Win32 error because the file, opened in read-only mode for the POST call, is never closed. I am trying to close it but I am unable to do so.
Can anyone please help me out with it.
Thanks!
I'm not sure I understand your code correctly. Could you try replacing
files.close()
with
for _, file in files:
    file.close()
and check if it works?
Explanation:
In
files = [('file', open(file_name,'rb'))]
you create a list containing exactly one tuple that has the string 'file' as first element and a file object as second element:
[('file', file_object)]
The loop takes the tuple from the list, ignores its first element (_), takes its second element, the file object, and uses its close method to close it.
I've just now realised the list contains only one tuple. So there's no need for a loop:
files[0][1].close()
should do it.
The best way would be to use with (the file gets automatically closed once you leave the with block):
payload = {'name': 'file'}
with open(file_name, 'rb') as file:
    files = [('file', file)]
    headers = {'X-API-TOKEN': '12312'}
    response = requests.request("POST", url, headers=headers, data=payload, files=files)
    print(response.text.encode('utf8'))
os.remove(file_name)
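Note that os.remove(file_name) is deliberately placed outside the with block: by the time it runs, the file handle has already been released, so the Windows "used by a process" error no longer occurs.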
What I am trying to do is loop through a list of URLs to download a series of .pdfs and save them to a .zip. At the moment I am just testing the code with a single URL. The error I am getting is:
Traceback (most recent call last):
  File "I:\test_pdf_download_zip.py", line 36, in <module>
    zip_file(zipfile_name, url)
  File "I:\test_pdf_download_zip.py", line 30, in zip_file
    myzip.write(dowload_pdf(url))
TypeError: expected a string or other character buffer object
Would someone know how to pass the downloaded .pdf to the .zip correctly (avoiding the error above) so I can append it, or whether this is possible at all?
import os
import zipfile
import requests
output = r"I:"
# File name of the zipfile
zipfile_name = os.path.join(output, "test.zip")
# Random test pdf
url = r"http://www.pdf995.com/samples/pdf.pdf"
def create_zipfile(zipfile_name):
    zipfile.ZipFile(zipfile_name, "w")

def dowload_pdf(url):
    response = requests.get(url, stream=True)
    with open('test.pdf', 'wb') as f:
        f.write(response.content)

def zip_file(zip_name, url):
    with open(zip_name, 'a') as myzip:
        myzip.write(dowload_pdf(url))

if __name__ == "__main__":
    create_zipfile(zipfile_name)
    zip_file(zipfile_name, url)
    print("Done")
Your dowload_pdf() function saves a file but doesn't return anything. You need to modify it so it actually returns the file path to myzip.write(). You also don't want to hardcode test.pdf; pass a unique path to your download function so you don't end up with multiple test.pdf entries in your archive.
def dowload_pdf(url, path):
    response = requests.get(url, stream=True)
    with open(path, 'wb') as f:
        f.write(response.content)
    return path
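Note that zip_file() also needs fixing: opening the archive with a plain open(zip_name, 'a') and writing to it will not produce a valid zip. A sketch of the calling side using zipfile.ZipFile instead (the per-file naming scheme is an assumption for illustration):

import zipfile

def zip_files(zip_name, urls):
    with zipfile.ZipFile(zip_name, "w") as myzip:
        for i, url in enumerate(urls):
            # hypothetical unique path per download
            pdf_path = dowload_pdf(url, "file-%d.pdf" % i)
            myzip.write(pdf_path)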
I have a python script that fetches a webpage and mirrors it. It works fine for one specific page, but I can't get it to work for more than one. I assumed I could put multiple URLs into a list and then feed that to the function, but I get this error:
Traceback (most recent call last):
  File "autowget.py", line 46, in <module>
    getUrl()
  File "autowget.py", line 43, in getUrl
    response = urllib.request.urlopen(url)
  File "/usr/lib/python3.2/urllib/request.py", line 139, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.2/urllib/request.py", line 361, in open
    req.timeout = timeout
AttributeError: 'tuple' object has no attribute 'timeout'
Here's the offending code:
url = ['https://www.example.org/', 'https://www.foo.com/', 'http://bar.com']
def getUrl(*url):
    response = urllib.request.urlopen(url)
    with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file:
        shutil.copyfileobj(response, out_file)

getUrl()
I've exhausted Google trying to find how to open a list with urlopen(). I found one way that sort of works. It takes a .txt document and goes through it line-by-line, feeding each line as a URL, but I'm writing this using Python 3 and for whatever reason twillcommandloop won't import. Plus, that method is unwieldy and requires (supposedly) unnecessary work.
Anyway, any help would be greatly appreciated.
In your code there are two errors:
You define getUrl with a variable-length argument list (*url), so inside the function url is a tuple (the tuple in your error);
You then pass that tuple to urlopen as if it were a single URL, instead of iterating over it.
You can try this code instead (using urllib.request, since urllib2 does not exist in Python 3):
import shutil
import urllib.request

urls = ['https://www.example.org/', 'https://www.foo.com/', 'http://bar.com']

def getUrl(urls):
    for url in urls:
        # Derive a file name from the URL string
        file_name = url.replace('https://', '').replace('.', '_').replace('/', '_')
        response = urllib.request.urlopen(url)
        with open(file_name, 'wb') as out_file:
            shutil.copyfileobj(response, out_file)

getUrl(urls)
urlopen() does not support a tuple:
urllib.request.urlopen(url[, data][, timeout])
Open the URL url, which can be either a string or a Request object.
And your call is incorrect. It should be:
getUrl(url[0], url[1], url[2])
And inside the function, use a loop like for u in url to traverse all the URLs.
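A minimal sketch of that approach, keeping the *url signature (the file-name derivation here is an assumption for illustration):

import shutil
import urllib.request

def getUrl(*urls):
    for u in urls:
        # hypothetical file name derived from the URL
        file_name = u.split('//')[-1].strip('/').replace('/', '_') + '.html'
        with urllib.request.urlopen(u) as response, open(file_name, 'wb') as out_file:
            shutil.copyfileobj(response, out_file)

getUrl(url[0], url[1], url[2])  # or, equivalently: getUrl(*url)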
You should just iterate over your URLs using a for loop:
import shutil
import urllib.request
urls = ['https://www.example.org/', 'https://www.foo.com/']
file_name = 'foo.txt'
def fetch_urls(urls):
    for i, url in enumerate(urls):
        file_name = "page-%s.html" % i
        response = urllib.request.urlopen(url)
        with open(file_name, 'wb') as out_file:
            shutil.copyfileobj(response, out_file)

fetch_urls(urls)
I assume you want the content saved to separate files, so I used enumerate here to create a unique file name, but you can obviously use anything from hash() or the uuid module to creating slugs.
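For example, a uuid-based name (a sketch, not part of the original answer) would be:

import uuid

file_name = "page-%s.html" % uuid.uuid4().hex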
Please could someone convert the following from python2 to python3;
import requests
url = "http://duckduckgo.com/html"
payload = {'q':'python'}
r = requests.post(url, payload)
with open("requests_results.html", "w") as f:
f.write(r.content)
and I get:
Traceback (most recent call last):
  File "C:\temp\Python\testFile.py", line 1, in <module>
    import requests
ImportError: No module named 'requests'
I have also tried:
import urllib.request
url = "http://duckduckgo.com/html"
payload = {'q':'python'}
r = urllib.request.post(url, payload)
with open("requests_results.html", "w") as f:
f.write(r.content)
but I get
Traceback (most recent call last):
  File "C:\temp\Python\testFile.py", line 5, in <module>
    r = urllib.request.post(url, payload)
AttributeError: 'module' object has no attribute 'post'
In Python 3.2, r.content is a bytestring, not a str, and write() on a file opened in text mode does not like it. You might want to use r.text instead:
with open("requests_results.html", "w") as f:
f.write(r.text)
You can see it in the requests documentation at http://docs.python-requests.org/en/latest/api.html#main-interface:
class requests.Response
content - Content of the response, in bytes.
text - Content of the response, in unicode. If Response.encoding is None and the chardet module is available, the encoding will be guessed.
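Alternatively, if you want to keep the raw bytes of r.content, open the output file in binary mode instead (a minimal variant, not from the original answer):

with open("requests_results.html", "wb") as f:
    f.write(r.content)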
Edit:
I posted before seeing the edited question. Yeah, like Martijn Pieters said, you need to install the requests module for python3 in order to be able to import it.
I think the problem here is that the requests package is not installed. Or, if you have installed it, it is installed in your Python 2.x directory and not in Python 3, which is why you're not able to use the requests module. Try making Python 3 your default copy and then install requests.
Also try visiting this article by Michael Foord, which walks you through using all the features of urllib2:
import urllib.parse
import urllib.request

url = "https://duckduckgo.com/html"
values = {'q': 'python'}

# URL-encode the form values and encode them to bytes;
# passing data to Request makes this a POST request
data = urllib.parse.urlencode(values).encode("utf-8")
req = urllib.request.Request(url, data)
response = urllib.request.urlopen(req)
the_page = response.read()
print(the_page)
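Note that response.read() returns bytes; to get text, decode it first, for example (assuming the page is UTF-8 encoded):

print(the_page.decode("utf-8"))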