POST XML file with requests - python

I'm getting:
<error>You have an error in your XML syntax...
when I run this python script I just wrote (I'm a newbie)
import requests
xml = """xxx.xml"""
headers = {'Content-Type':'text/xml'}
r = requests.post('https://example.com/serverxml.asp', data=xml)
print (r.content);
Here is the content of the xxx.xml
<xml>
<API>4.0</API>
<action>login</action>
<password>xxxx</password>
<license_number>xxxxx</license_number>
<username>xxx#xyz.com</username>
<training>1</training>
</xml>
I know that the xml is valid because I use the same xml for a perl script and the contents are being printed back.
Any help will be greatly appreciated as I am very new to Python.

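You want to send the XML data from a file with requests.post, but this function will not open a file for you. The string you pass as data is sent as the request body as-is, so the server receives the literal text xxx.xml rather than the file's contents, which is why it complains about the XML syntax. Open the file before you call requests.post and pass the resulting file object, not the file name.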
Try this:
import requests
# Set the name of the XML file.
xml_file = "xxx.xml"
headers = {'Content-Type': 'text/xml'}
# Open the XML file in binary mode so the bytes are sent unchanged.
with open(xml_file, 'rb') as xml:
    # Give the object representing the XML file to requests.post.
    r = requests.post('https://example.com/serverxml.asp', data=xml, headers=headers)
print(r.content)
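If you would rather read the file into memory first, a minimal sketch along these lines should behave the same way. The helper names read_xml_payload and post_xml are made up for illustration, and the import of requests sits inside post_xml only so that the file-reading helper has no third-party dependency.

```python
from pathlib import Path


def read_xml_payload(path):
    """Return the raw bytes of the XML file at `path`."""
    return Path(path).read_bytes()


def post_xml(url, path):
    """POST the contents of the file (not its name) as the request body."""
    import requests  # third-party; imported lazily (an illustrative choice)

    headers = {"Content-Type": "text/xml"}
    return requests.post(url, data=read_xml_payload(path), headers=headers)
```

With the placeholder URL from the question, the call would be r = post_xml('https://example.com/serverxml.asp', 'xxx.xml'), followed by print(r.content).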

Related

Search for a word in webpage and save to TXT in Python

I am trying to load links from a .txt file, search each page for a specific word, and, if the word exists on that webpage, save the link to another .txt file. But I am getting this error: No scheme supplied. Perhaps you meant http://<_io.TextIOWrapper name='import.txt' mode='r' encoding='cp1250'>?
Note: the links have HTTPS://
The code:
import requests
list_of_pages = open('import.txt', 'r+')
save = open('output.txt', 'a+')
word = "Word"
save.truncate(0)
for page_link in list_of_pages:
    res = requests.get(list_of_pages)
    if word in res.text:
        response = requests.request("POST", url)
        save.write(str(response) + "\n")
Can anyone explain why? Thank you in advance!
Try putting http:// in front of the links.
When you call res = requests.get(list_of_pages), you are asking requests to open an HTTP connection to list_of_pages. But requests.get takes a URL string as its parameter (e.g. http://localhost:8080/static/image01.jpg), and list_of_pages is not a string: it is an already opened file object. That is why the error message shows <_io.TextIOWrapper ...> where the URL should be. The URL you want is in the loop variable page_link, which holds one line of the file per iteration, including its trailing newline.
If all you want is the contents of the already opened file, you don't need an HTTP request at all. Drop the requests.get() call and read list_of_pages like a normal, local file.
Or, if you would like to go the other way, don't open this text file in list_of_pages; make it a string holding the URL of that file.
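Putting that together, a rough sketch of the loop might look like the following. The function names are hypothetical, and fetch_text is an assumed parameter: any callable that takes a URL string and returns the page text, so the logic can be tried without touching the network.

```python
def read_links(path):
    """Return the non-empty lines of a text file, stripped of whitespace."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def links_containing(links, word, fetch_text):
    """Return the links whose page text contains `word`.

    `fetch_text` is any callable that maps a URL string to the page text.
    """
    return [link for link in links if word in fetch_text(link)]


def save_links(path, links):
    """Write one link per line, replacing any previous content."""
    with open(path, "w") as f:
        f.writelines(link + "\n" for link in links)
```

With requests installed, fetch_text could be lambda url: requests.get(url).text, and the whole job becomes save_links('output.txt', links_containing(read_links('import.txt'), 'Word', fetch_text)). Note that each link is passed as a stripped string, never the file object.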

How to store XML received from POST XML request in its correct format? Python requests library

I'm sending an XML file to a website with Python requests library and received back a bunch of XML code (in format of bytes) like below:
b'<?xml version="1.0" encoding="UTF-8"?>\n<GetCategorySpecificsResponse xmlns="urn:ebay:apis:eBLBaseComponents"><Timestamp>2022-03-15T09:54:41.461Z</Timestamp><Ack>Success</Ack><Version>1219</Version><Build>E1219_CORE_APICATALOG_19146446_R1</Build><Recommendations><CategoryID>19006</CategoryID><NameRecommendation>.....
However, how can I get the xml above in its correct format and with all the correct indentations? I want to store the string above in another file, but with the current string, it's just a long line going forever toward the right side that is not really useful to me...
Below is my code (with the r.content as the xml above):
import requests
xml_file = XML_FILE
headers = {'Content-Type':'text/xml'}
with open(XML_FILE) as xml:
    r = requests.post(WEBSITE_URL, data=xml, headers=headers)
print(r.content)
new_file = open(ANOTHER_FILE)
new_file.write(str(r.content))
new_file.close()
Example of the xml I want to store in new_file:
<?xml version="1.0" encoding="UTF-8"?>
<GetCategorySpecificsResponse
  xmlns="urn:ebay:apis:eBLBaseComponents">
  <Timestamp>2022-03-15T08:30:01.877Z</Timestamp>
  <Ack>Success</Ack>
  <Version>1219</Version>
  <Build>E1219_CORE_APICATALOG_19146446_R1</Build>
  <Recommendations>
    <CategoryID>19006</CategoryID>
    .....
</GetCategorySpecificsResponse>
Thank you!
One way to do it is to pass the response through a parser and save the result to a file. For example, something like this should work (note the "xml" parser, which needs lxml installed; the plain "lxml" parser would treat the response as HTML):
from bs4 import BeautifulSoup as bs
soup = bs(r.text, "xml")
with open("file.xml", "w", encoding='utf-8') as file:
    file.write(soup.prettify())
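If you would rather avoid the extra dependency, the standard library's xml.dom.minidom can do a similar job. This is only a sketch: pretty_xml is a made-up helper name, and the two-space indent is an arbitrary choice.

```python
from xml.dom import minidom


def pretty_xml(raw):
    """Return an indented copy of the XML document in `raw` (bytes or str)."""
    return minidom.parseString(raw).toprettyxml(indent="  ")
```

You would then write pretty_xml(r.content) to a file opened with mode "w" and encoding "utf-8". This works best on single-line responses like the one in the question; input that already contains line breaks between elements may come out with extra blank lines.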

Extract file from gzip folder

I am trying to extract the XML file from the gzip that comes out of clicking the button "SEC Investment Adviser Report" at the website here (FYI, this links to the SEC website). Below is my (minimal) code. I continue to get "embedded null character" or "embedded null byte", depending on whether I feed gzip.open() .text or .content from my request. Can anyone help me get this file loaded so I can access the XML?
import requests
import gzip
file = gzip.open(requests.get(r'https://www.adviserinfo.sec.gov/IAPD/Content/BulkFeed/CompilationDownload.aspx?FeedPK=39545&FeedType=IA_FIRM_SEC').text,'rt')
gzip.open takes a filename or a file object, not the compressed data itself. You could use gzip.decompress.
The archive from your question looks malformed. Specifically, it has HTML appended for some reason.
The following works by only using the content before the beginning of the HTML:
import requests
import gzip
request = requests.get(r'https://www.adviserinfo.sec.gov/IAPD/Content/BulkFeed/CompilationDownload.aspx?FeedPK=39545&FeedType=IA_FIRM_SEC')
xml = gzip.decompress(request.content[:request.content.find(b"\r\n\r\n<!DOCTYPE html>") - 1])
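For reference, here is a small sketch of the two ways to handle gzip data that is already in memory, assuming the bytes are a clean gzip stream (unlike the malformed download above). The function names and the UTF-8 default are assumptions for illustration.

```python
import gzip
import io


def gunzip_bytes(data, encoding="utf-8"):
    """Decompress gzip data held in memory and decode it to text."""
    return gzip.decompress(data).decode(encoding)


def open_gzip_bytes(data, encoding="utf-8"):
    """Return a text-mode file object over in-memory gzip data."""
    # gzip.open accepts a file object, so wrap the bytes in io.BytesIO.
    return gzip.open(io.BytesIO(data), "rt", encoding=encoding)
```

The first form gives you the whole document as a string; the second gives you something you can hand to a parser that expects a file object.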

OSError: [Errno 36] File name too long:

I need to convert a web page to XML (using Python 3.4.3). If I write the contents of the URL to a file then I can read and parse it perfectly but if I try to read directly from the web page I get the following error in my terminal:
  File "./AnimeXML.py", line 22, in <module>
    xml = ElementTree.parse (xmlData)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/xml/etree/ElementTree.py", line 1187, in parse
    tree.parse(source, parser)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/xml/etree/ElementTree.py", line 587, in parse
    source = open(source, "rb")
OSError: [Errno 36] File name too long:
My python code:
# AnimeXML.py
#! /usr/bin/Python
# Import xml parser.
import xml.etree.ElementTree as ElementTree
# XML to parse.
sampleUrl = "http://cdn.animenewsnetwork.com/encyclopedia/api.xml?anime=16989"
# Read the xml as a file.
content = urlopen (sampleUrl)
# XML content is stored here to start working on it.
xmlData = content.readall().decode('utf-8')
# Close the file.
content.close()
# Start parsing XML.
xml = ElementTree.parse (xmlData)
# Get root of the XML file.
root = xml.getroot()
for info in root.iter("info"):
    print (info.attrib)
Is there any way I can fix my code so that I can read the web page directly into python without getting this error?
As explained in the Parsing XML section of the ElementTree docs:
We can import this data by reading from a file:
import xml.etree.ElementTree as ET
tree = ET.parse('country_data.xml')
root = tree.getroot()
Or directly from a string:
root = ET.fromstring(country_data_as_string)
You're passing the whole XML contents as a giant pathname. Your XML file is probably bigger than 2K, or whatever the maximum pathname size is for your platform, hence the error. If it weren't, you'd just get a different error about there being no directory named [everything up to the first / in your XML file].
Just use fromstring instead of parse.
Or, notice that parse can take a file object, not just a filename. And the thing returned by urlopen is a file object.
Also notice the very next line in that section:
fromstring() parses XML from a string directly into an Element, which is the root element of the parsed tree. Other parsing functions may create an ElementTree.
So, you don't want that root = tree.getroot() either.
So:
# ...
content.close()
root = ElementTree.fromstring(xmlData)
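To make the difference concrete, here is a self-contained sketch of both options. The XML in xml_data is a made-up sample, not the real API response, and io.BytesIO stands in for the file object that urlopen would return.

```python
import io
import xml.etree.ElementTree as ElementTree

# Made-up sample document, standing in for the downloaded XML.
xml_data = '<ann><anime id="16989"><info type="Main title">Example</info></anime></ann>'

# fromstring() parses text and returns the root Element directly.
root_from_text = ElementTree.fromstring(xml_data)

# parse() accepts a filename or a file object and returns an ElementTree,
# so getroot() is still needed here.
tree = ElementTree.parse(io.BytesIO(xml_data.encode("utf-8")))
root_from_file = tree.getroot()

for info in root_from_text.iter("info"):
    print(info.attrib)
```

Either root can be iterated the same way; the only difference is whether you hand ElementTree the text itself or something it can read from.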

XML file as input

I have the following line of code: xml = BytesIO("<A><B>some text</B></A>") for the file named test.xml.
But I would like to have something like xml = "/home/user1/test.xml"
How can I use the file location instead of having to put the file content?
Exactly like you have. lxml.etree.parse() accepts a string filename and will read the file for you.
The following code reads the contents of the file into a bytes object and passes it to the BytesIO constructor:
xml = BytesIO(open("/home/user1/test.xml", "rb").read())
xml = open('/home/user1/test.xml', 'rb').read()
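As a quick check that the path and the BytesIO routes give the same result, here is a sketch using the standard library's xml.etree.ElementTree, whose parse() also accepts either a filename or a file object. The temporary file is only there to make the example runnable; it stands in for /home/user1/test.xml.

```python
import os
import tempfile
from io import BytesIO
import xml.etree.ElementTree as ET

with tempfile.TemporaryDirectory() as tmp:
    # Temporary stand-in for the test.xml file from the question.
    path = os.path.join(tmp, "test.xml")
    with open(path, "wb") as f:
        f.write(b"<A><B>some text</B></A>")

    # Option 1: give parse() the file location and let it read the file.
    root_from_path = ET.parse(path).getroot()

    # Option 2: read the bytes yourself and wrap them in BytesIO.
    with open(path, "rb") as f:
        root_from_bytes = ET.parse(BytesIO(f.read())).getroot()
```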
