Is anyone able to read the articles within tickets of an OTRS system via PyOTRS? I am able to connect and get tickets fine; I just cannot find out how to get the content of the tickets. I have been up and down the PyOTRS documentation, but I am stuck. Does anyone have anything they can share with regards to reading articles?
The prerequisites for PyOTRS are listed here: https://pypi.org/project/PyOTRS/.
Once these are complete, the following steps can be taken to retrieve OTRS ticket data:
1. A connection is initiated by creating a client.
2. An OTRS ticket search is conducted.
3. OTRS ticket data, including dynamic fields and articles, is retrieved from the get_ticket.to_dct() response.
from pyotrs import Article, Client, Ticket, DynamicField, Attachment

# Initializing (config is assumed to be loaded elsewhere, e.g. from a JSON/YAML file)
URL = config["url"]
USERNAME = config["username"]
PASSWORD = config["password"]
TICKET_LINK = config["ticketLink"]

# Create session
def createSession():
    client = Client(URL, USERNAME, PASSWORD, https_verify=False)
    client.session_create()
    return client

# Retrieve tickets based on condition
def ticketData(client):
    # Specify ticket search condition
    data = client.ticket_search(Queues=['queue1Name', 'queue2Name'], States=['open'])
    print("Number of tickets retrieved: " + str(len(data)))

    # Iterating over all search results
    if data and data[0] != '':
        for ticket_id in data:
            # Get ticket details, including articles and attachments
            get_ticket = client.ticket_get_by_id(ticket_id, articles=1, attachments=1)
            print(get_ticket)
            q1 = ("Ticket id: " + str(get_ticket.field_get("TicketID")) +
                  "\nTicket number: " + str(get_ticket.field_get("TicketNumber")) +
                  "\nTicket Creation Time: " + str(get_ticket.field_get("Created")) +
                  "\n\nTicket title: " + get_ticket.field_get("Title"))
            print(q1)

            # Based on the to_dct() response we can access dynamic field (list) and article values
            print(get_ticket.to_dct())

            # Accessing dynamic field values (indices depend on the fields configured on your instance)
            dynamicField3 = get_ticket.to_dct()["Ticket"]["DynamicField"][3]["Value"]
            dynamicField12 = get_ticket.to_dct()["Ticket"]["DynamicField"][12]["Value"]

            # Accessing articles
            article = get_ticket.to_dct()["Ticket"]["Article"]
            print(len(article))

            # Iterating through all articles of the ticket (in cases where tickets have multiple articles)
            for a1 in range(len(article)):
                # Article subject
                q2 = "Article " + str(a1 + 1) + ": " + article[a1]["Subject"] + "\n"
                print(q2)

                # Article body
                q3 = "Body " + str(a1 + 1) + ": " + article[a1]["Body"] + "\n"
                print(q3)

            # Ticket link for reference
            q4 = "Ticket link: " + TICKET_LINK + str(ticket_id) + "\n"
            print(q4, end="\n\n")

def main():
    client = createSession()
    ticketData(client)
    print("done")

main()
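One caveat: indexing the DynamicField list by position (the [3] and [12] accesses above) depends on the order in which your OTRS instance returns the fields. If you prefer to look a field up by its name, a minimal sketch along these lines should work, assuming the to_dct() layout shown above where each dynamic field entry carries "Name" and "Value" keys (the field name "CustomerCategory" is only a placeholder):

def dynamic_field_value(ticket_dct, field_name):
    # Return the value of the dynamic field with the given name, or None if absent
    for df in ticket_dct["Ticket"].get("DynamicField", []):
        if df.get("Name") == field_name:
            return df.get("Value")
    return None

# Example usage (hypothetical field name):
# category = dynamic_field_value(get_ticket.to_dct(), "CustomerCategory")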
Related
I am currently working on a project with the goal of determining the popularity of various topics on gis.stackexchange. I am using Python to interface with the Stack Exchange API. My issue is that I am having trouble configuring the API query to match what a basic search using the search bar would return (showing posts containing the term (x)). I am currently using the /search/advanced... q="term" method; however, I am getting empty results for search terms that might have around 100-200 posts. I have read a lot of the API documentation, but can't seem to configure the API query to match what a site search would yield.
Edit: For example, if I search "Bayesian", I get 42 results on gis.stackexchange, but when I set q=Bayesian in the API request I get an empty return.
I have included my program below if it helps. Thanks!
#Interfacing_with_SO_API
import requests as rq
import json
import time

keywordinput = input('Enter your search term. If two words, separate by - : ')
baseurl = ('https://api.stackexchange.com/2.3/search/advanced?page=')
endurl = ('&pagesize=100&order=desc&sort=votes&q=' + keywordinput + '&site=gis.stackexchange&filter=!-nt6H9O0imT9xRAnV1gwrp1ZOq7FBaU7CRaGpVkODaQgDIfSY8tJXb')
urltot = ('https://api.stackexchange.com/2.3/search/advanced?page=1&pagesize=100&order=desc&sort=votes&q=' + keywordinput + '&site=gis.stackexchange&filter=!-nt6H9O0imT9xRAnV1gwrp1ZOq7FBaU7CRaGpVkODaQgDIfSY8tJXb')
response = rq.get(urltot)
page = range(1, 400)

if response.status_code == 400:
    print('Initial Response Code 400: Stopping')
    exit()
elif response.status_code == 200:
    print('Initial Response Code 200: Continuing')

datarr = []
for n in page:
    response = rq.get(baseurl + str(n) + endurl)
    print(baseurl + str(n) + endurl)
    time.sleep(2)
    if response.status_code == 400 or response.json()['has_more'] == False or n > 400:
        print('No more pages')
        break
    elif response.json()['has_more'] == True:
        for data in response.json()['items']:
            if data['view_count'] >= 0:
                datarr.append(data)
                print(data['view_count'])
                print(data['answer_count'])
                print(data['score'])

#convert datarr to csv and save to file
with open(input('Search Term Name (filename): ') + '.csv', 'w') as f:
    for data in datarr:
        f.write(str(data['view_count']) + ',' + str(data['answer_count']) + ',' + str(data['score']) + '\n')
exit()
If you look at the results for searching bayesian on the GIS StackExchange site, you'll get 42 results because the StackExchange site search returns both questions and answers that contain the term.
However, the standard /search and /search/advanced API endpoints only search questions, per the docs:
Searches a site for any questions which fit the given criteria
Instead, what you want to use is the /search/excerpts endpoint, which will return both questions and answers.
Quick demo in the shell to show that it returns the same number of items:
curl -s --compressed "https://api.stackexchange.com/2.3/search/excerpts?page=1&pagesize=100&site=gis&q=bayesian" | jq '.["items"] | length'
42
And a minimal Python program to do the same:
#!/usr/bin/env python3
# file: test_so_search.py
import requests

if __name__ == "__main__":
    api_url = "https://api.stackexchange.com/2.3/search/excerpts"
    search_term = "bayesian"
    qs = {
        "page": 1,
        "pagesize": 100,
        "order": "desc",
        "sort": "votes",
        "site": "gis",
        "q": search_term
    }
    rsp = requests.get(api_url, qs)
    data = rsp.json()
    print(f"Got {len(data['items'])} results for '{search_term}'")
And output:
> python test_so_search.py
Got 42 results for 'bayesian'
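If you need more than the first page (as in your original loop), the same endpoint can be paged by following the has_more flag in the response. A minimal sketch, assuming the default unauthenticated quota and honoring the optional backoff field the API may return:

import time
import requests

def search_excerpts_all(site, term, pagesize=100):
    # Collect all /search/excerpts items for a term by following has_more
    items, page = [], 1
    while True:
        rsp = requests.get(
            "https://api.stackexchange.com/2.3/search/excerpts",
            {"site": site, "q": term, "page": page, "pagesize": pagesize,
             "order": "desc", "sort": "votes"},
        )
        data = rsp.json()
        items.extend(data.get("items", []))
        if not data.get("has_more"):
            return items
        # The API can ask clients to wait before the next request
        time.sleep(data.get("backoff", 0) + 1)
        page += 1

# results = search_excerpts_all("gis", "bayesian")
# print(len(results))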
In the Google Cloud Console, under Security Command Center > Findings, you can click on an item to view its details. There is a section that lists "Attributes" and "Source Properties", and I would like to get some of these values. The code below is taken from this page (https://cloud.google.com/security-command-center/docs/how-to-api-list-findings) and modified to get what I need:
from google.cloud import securitycenter

client = securitycenter.SecurityCenterClient()
organization_id = "<my organization id>"
org_name = "organizations/{org_id}".format(org_id=organization_id)
all_sources = "{org_name}/sources/-".format(org_name=org_name)

finding_result_iterator = client.list_findings(request={"parent": all_sources, "filter": 'severity="HIGH"'})
for i, finding_result in enumerate(finding_result_iterator):
    sourceId = finding_result.finding.resource_name
    title = finding_result.finding.category
    alertTime = finding_result.finding.event_time
    serviceName = finding_result.resource.type_
    description = ""
    additionalInfo = ""
I would like to get the "explanation" and "recommendation" values from Source Properties, but I don't know where to get them. The reference page shows the output for each finding_result in the loop, and the Console displays these properties, but I don't know how to retrieve them and I've been searching the web for an answer. I'm hoping someone here has it.
So, I was being a bit impatient with my question, both here and with Google Support. When I tightened up the filters for my call, I found records that do indeed have the two values I was looking for. For those who are interested, I've included some junky test code below.
from google.cloud import securitycenter

client = securitycenter.SecurityCenterClient()
organization_id = "<my org id>"
org_name = "organizations/{org_id}".format(org_id=organization_id)
all_sources = "{org_name}/sources/-".format(org_name=org_name)

finding_result_iterator = client.list_findings(request={"parent": all_sources, "filter": 'severity="HIGH" AND state="ACTIVE" AND category!="Persistence: IAM Anomalous Grant" AND category!="MFA_NOT_ENFORCED"'})

for i, finding_result in enumerate(finding_result_iterator):
    sourceId = finding_result.finding.resource_name
    projectId = finding_result.resource.project_display_name
    title = finding_result.finding.category
    alertTime = finding_result.finding.event_time
    serviceName = finding_result.resource.type_
    description = ""
    additionalInfo = ""
    externalUri = ""
    if hasattr(finding_result.finding, "external_uri"):
        externalUri = finding_result.finding.external_uri
    sourceProps = finding_result.finding.source_properties
    for item in sourceProps:
        if item == "Explanation":
            description = str(sourceProps[item])
        if item == "Recommendation":
            additionalInfo = str(sourceProps[item])
    print("TITLE: " + title)
    print(" PROJECT ID: " + projectId)
    print(" DESCRIPTION: " + description)
    print(" SOURCE ID: " + sourceId)
    print(" ALERT TIME: {}".format(alertTime))
    print(" SERVICE NAME: " + serviceName)
    print(" ADDITIONAL INFO: Recommendation: " + additionalInfo)
    if len(externalUri) > 0:
        print(", External URI: " + externalUri)
    if i < 1:
        break
So while the question was a bit of a waste, the code might help someone else trying to work with the Security Command Center API.
I read a lot of posts on the topic, and also tried some of this article's advice, but I am still blocked.
https://www.scraperapi.com/blog/5-tips-for-web-scraping
1. IP Rotation: done. I'm using a VPN and often changing IP (but not DURING the script, obviously).
2. Set a Real User-Agent: implemented fake-useragent with no luck.
3. Set other request headers: tried with SeleniumWire, but how do I use it at the same time as 2? (see the sketch just after this list)
4. Set random intervals in between your requests: done, but at the moment I cannot even access the starting home page!
5. Set a referer: same as 3.
6. Use a headless browser: no clue.
7. Avoid honeypot traps: same as 4.
10: irrelevant.
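For point 3, here is what I understand combining 2. and 3. would look like with Selenium Wire's request interceptor (just a sketch, assuming selenium-wire 4+ and chromedriver on the PATH; the extra Referer header is only an illustration):

from fake_useragent import UserAgent
from seleniumwire import webdriver  # selenium-wire wraps the regular webdriver

ua = UserAgent()
driver = webdriver.Chrome(executable_path="chromedriver")

def interceptor(request):
    # Replace the default User-Agent and add a referer on every outgoing request
    del request.headers['User-Agent']
    request.headers['User-Agent'] = ua.random
    request.headers['Referer'] = 'https://www.google.com/'

driver.request_interceptor = interceptor
driver.get("https://www.winamax.fr/paris-sportifs/")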
The website I want to scrape: https://www.winamax.fr/paris-sportifs/
Without Selenium: it goes smoothly to a page with some games and their odds, and I can navigate from here
With Selenium: the page shows a "Winamax est actuellement en maintenance" ("Winamax is currently under maintenance") message, with no games and no odds.
Try to execute this piece of code and you might get blocked quite quickly:
from selenium import webdriver
import time
from time import sleep
import json

driver = webdriver.Chrome(executable_path="chromedriver")
driver.get("https://www.winamax.fr/paris-sportifs/")  # I'm even blocked here now !!!

toto = driver.page_source.splitlines()
titi = {}
matchez = []
matchez_detail = []
resultat_1 = {}
resultat_2 = {}
taratata = 1
comptine = 1

for tut in toto:
    if tut[0:53] == "<script type=\"text/javascript\">var PRELOADED_STATE = ":
        titi = json.loads(tut[53:tut.find(";var BETTING_CONFIGURATION = ")])

for p_id in titi.items():
    if p_id[0] == "sports":
        for fufu in p_id:
            if isinstance(fufu, dict):
                for tyty in fufu.items():
                    resultat_1[tyty[0]] = tyty[1]["categories"]

for p_id in titi.items():
    if p_id[0] == "categories":
        for fufu in p_id:
            if isinstance(fufu, dict):
                for tyty in fufu.items():
                    resultat_2[tyty[0]] = tyty[1]["tournaments"]

for p_id in resultat_1.items():
    for tgtg in p_id[1]:
        for p_id2 in resultat_2.items():
            if str(tgtg) == p_id2[0]:
                for p_id3 in p_id2[1]:
                    matchez.append("https://www.winamax.fr/paris-sportifs/sports/" + str(p_id[0]) + "/" + str(tgtg) + "/" + str(p_id3))

for alisson in matchez:
    print("compet " + str(taratata) + "/" + str(len(matchez)) + " : " + alisson)
    taratata = taratata + 1
    driver.get(alisson)
    sleep(1)
    elements = driver.find_elements_by_xpath("//*[@id='app-inner']/div/div[1]/span/div/div[2]/div/section/div/div/div[1]/div/div/div/div/a")
    for elm in elements:
        matchez_detail.append(elm.get_attribute("href"))

for mat in matchez_detail:
    print("match " + str(comptine) + "/" + str(len(matchez_detail)) + " : " + mat)
    comptine = comptine + 1
    driver.get(mat)
    sleep(1)
    elements = driver.find_elements_by_xpath("//*[@id='app-inner']//button/div/span")
    for elm in elements:
        elm.click()
        sleep(1)  # and after my specific code to scrape what I want
I recommend using requests; I don't see a reason to use Selenium since you said requests works, and requests can work with pretty much any site as long as you are using appropriate headers. You can see the headers needed by looking at the developer console in Chrome or Firefox and checking the request headers.
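For example, a minimal sketch of what that could look like (the header values below are placeholders; copy the real ones from your browser's network tab):

import requests

# Placeholder headers: copy the actual values from the request your browser sends
# (DevTools > Network > select the page request > Request Headers)
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "fr-FR,fr;q=0.9,en;q=0.8",
    "Referer": "https://www.winamax.fr/",
}

response = requests.get("https://www.winamax.fr/paris-sportifs/", headers=headers)
print(response.status_code)
# The odds data lives in the PRELOADED_STATE JavaScript variable in the page source,
# so the same string parsing as in your script can be applied to response.text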
I want to scrape 70 characters from this HTML code:
<p>2) Proof of payment emailed to satrader03<strong>#gmail.com</strong> direct from online banking 3) Selfie of you holding your ID 4) Selfie of you holding your bank card from which payment will be made OR 5) Skype or what's app Video call while logged onto online banking displaying account name which should match personal verified name Strictly no 3rd party payments</p>
I want to know how to scrape a specific number of characters with Selenium, for example 30 characters or some other amount.
Here is my code:
description = driver.find_elements_by_css_selector("p")
items = len(title)
with open('btc_gmail.csv', 'a', encoding="utf-8") as s:
    for i in range(items):
        s.write(str(title[i].text) + ',' + link[i].text + ',' + description[i].text + '\n')
How do I scrape 30 characters, or 70, or some other number?
Edit (full code):
from random import randrange
from selenium import webdriver
import time

driver = webdriver.Firefox()
r = randrange(3, 7)

# url_pattren (the list of page offsets for Bing pagination) is defined elsewhere
for url_p in url_pattren:
    time.sleep(3)
    url1 = 'https://www.bing.com/search?q=site%3alocalbitcoins.com+%27%40gmail.com%27&qs=n&sp=-1&pq=site%3alocalbitcoins+%27%40gmail.com%27&sc=1-31&sk=&cvid=9547A785CF084BAE94D3F00168283D1D&first=' + str(url_p) + '&FORM=PERE3'
    driver.get(url1)
    time.sleep(r)
    title = driver.find_elements_by_tag_name('h2')
    link = driver.find_elements_by_css_selector("cite")
    description = driver.find_elements_by_css_selector("p")
    items = len(title)
    with open('btc_gmail.csv', 'a', encoding="utf-8") as s:
        for i in range(items):
            s.write(str(title[i].text) + ',' + link[i].text + ',' + description[i].text[30:70] + '\n')
Any Solution?
You can get the text of the tag and then slice the string:
>>> description = driver.find_elements_by_css_selector("p")[0].text
>>> print(description[30:70]) # printed from 30th to 70th symbol
'satrader03<strong>#gmail.com</strong>'
I am using Python to scrape US postal code population data from http://www.city-data.com, through this directory: http://www.city-data.com/zipDir.html. The specific pages I am trying to scrape are individual postal code pages with URLs like this: http://www.city-data.com/zips/01001.html. All of the individual zip code pages I need to access have this same URL format, so my script simply does the following for each postal_code in range:
1. Creates the URL for a given postal code.
2. Tries to get a response from the URL.
3. If (2) succeeds, checks the HTTP status code returned by that URL.
4. If the status code is 200, retrieves the HTML and scrapes the data into a list.
5. If the status code is not 200, passes and counts the error (not a valid postal code/URL).
6. If there is no response from the URL because of an error, passes that postal code and counts the error.
7. At the end of the script, prints the counter variables and a timestamp.
The problem is that I run the script and it works fine for ~500 postal codes, then suddenly stops working and returns repeated timeout errors. My suspicion is that the site's server is limiting the page views coming from my IP address, preventing me from completing the amount of scraping that I need to do (all 100,000 potential postal codes).
My question is as follows: Is there a way to confuse the site's server, for example using a proxy of some kind, so that it will not limit my page views and I can scrape all of the data I need?
Thanks for the help! Here is the code:
##POSTAL CODE POPULATION SCRAPER##
import requests
import re
import datetime

def zip_population_scrape():
    """
    This script will scrape population data for postal codes in range
    from city-data.com.
    """
    postal_code_data = [['zip', 'population']]  # list for storing scraped data

    # Counters for keeping track:
    total_scraped = 0
    total_invalid = 0
    errors = 0

    for postal_code in range(1001, 5000):
        # This if statement is necessary because the postal code can't start
        # with 0 in order for the for statement to iterate successfully
        if postal_code < 10000:
            postal_code_string = str(0) + str(postal_code)
        else:
            postal_code_string = str(postal_code)

        # all postal code URLs have the same format on this site
        url = 'http://www.city-data.com/zips/' + postal_code_string + '.html'

        # try to get current URL
        try:
            response = requests.get(url, timeout=5)
            http = response.status_code

            # print current for logging purposes
            print url + " - HTTP: " + str(http)

            # if valid webpage:
            if http == 200:
                # save html as text
                html = response.text

                # extra print statement for status updates
                print "HTML ready"

                # try to find two substrings in HTML text
                # add the substring in between them to list w/ postal code
                try:
                    found = re.search('population in 2011:</b> (.*)<br>', html).group(1)

                    # add to # scraped counter
                    total_scraped += 1
                    postal_code_data.append([postal_code_string, found])

                    # print statement for logging
                    print postal_code_string + ": " + str(found) + ". Data scrape successful. " + str(total_scraped) + " total zips scraped."

                # if substrings not found, try searching for others
                # and doing the same as above
                except AttributeError:
                    found = re.search('population in 2010:</b> (.*)<br>', html).group(1)
                    total_scraped += 1
                    postal_code_data.append([postal_code_string, found])
                    print postal_code_string + ": " + str(found) + ". Data scrape successful. " + str(total_scraped) + " total zips scraped."

            # if http == 404, zip is not valid. Add to counter and print log
            elif http == 404:
                total_invalid += 1
                print postal_code_string + ": Not a valid zip code. " + str(total_invalid) + " total invalid zips."

            # other http codes: add to error counter and print log
            else:
                errors += 1
                print postal_code_string + ": HTTP Code Error. " + str(errors) + " total errors."

        # if get url fails by connection error, add to error count & pass
        except requests.exceptions.ConnectionError:
            errors += 1
            print postal_code_string + ": Connection Error. " + str(errors) + " total errors."
            pass

        # if get url fails by timeout error, add to error count & pass
        except requests.exceptions.Timeout:
            errors += 1
            print postal_code_string + ": Timeout Error. " + str(errors) + " total errors."
            pass

    # print final log/counter data, along with timestamp finished
    now = datetime.datetime.now()
    print now.strftime("%Y-%m-%d %H:%M")
    print str(total_scraped) + " total zips scraped."
    print str(total_invalid) + " total unavailable zips."
    print str(errors) + " total errors."