I am trying to recover the Spotify IDs of some tracks for which I have the title and artist, and while working on the task I came across a strange situation.
the track is the following:
"artist" : "Noisettes",
"track" : "Don't Upset The Rhythm (Go Baby Go)"
The track exists and appears to be written the same way (no extra spaces or strange characters). I manually found the Spotify ID of the item ("6Pfp47eUtnj2D1LMMtmDne"), but when I perform the search specifying this query parameter
q=artist%3ANoisettes+track%3ADon%27t+Upset+The+Rhythm+%28Go+Baby+Go%29&type=track
via the Search for Item endpoint
https://developer.spotify.com/documentation/web-api/reference/#/operations/search
it returns a response with 0 items, meaning it didn't match any item.
Do you have an idea why this happens?
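For reference, here is a quick way to reproduce the encoded query from Python, plus a fallback that strips punctuation (the apostrophe and parentheses) from the track filter, which is worth trying since field-filtered search can be picky about such characters. This is a sketch; the fallback regex is my own, not something from the question:

```python
from urllib.parse import quote
import re

artist = "Noisettes"
track = "Don't Upset The Rhythm (Go Baby Go)"

# The exact query from the question, URL-encoded
# (quote() encodes spaces as %20 rather than '+').
q = quote("artist:{} track:{}".format(artist, track))

# A fallback worth trying: drop punctuation from the track filter.
simplified = re.sub(r"[^\w\s]", "", track)
q_fallback = quote("artist:{} track:{}".format(artist, simplified))
```

If the simplified query returns the track, the punctuation in the field filter is the likely culprit.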
So I'm trying to build a tool to transfer tickets that I sell. A sale comes into my POS, and I do an API call for the section, row, and seat numbers ordered (as well as other information, obviously). Using the section, row, and seat number, I want to plug those values into a contains(text) statement in order to find and select the right tickets on the host site.
Here is a sample of how the tickets are laid out:
And here is a screenshot (sorry if this is inconvenient) of the DOM related to one of the rows above:
Given this, how should I structure my contains(text) statement so that it is able to find and select the correct seats? I am very new/inexperienced with automation. I messed around with it a few months ago with some success and have managed to get a tool that gets me right up to selecting the seats but the "div" path confuses me when it comes to searching for text that is tied to other text.
I tried the following structure:
for i in range(int(lowseat), int(highseat) + 1):
    web.find_element_by_xpath('//*[contains(text(), "' + section + '")]/following-sibling::*[contains(text(), "' + row + '")]/following-sibling::*[contains(text(), "' + str(i) + '")]').click()
to no avail. Can someone explain how to structure these statements correctly so that they search for section, row, and seat number?
Thanks!
Also, if needed, here is a screenshot with more context of the button (in case it's needed). The button is highlighted in sky blue:
You can't use text() for that, because the text is in nested elements. You probably want to map all these into dicts and select with filter.
Update
Here's an idea for a lazy way to do this (untested):
button = driver.execute_script('''
    return [...document.querySelectorAll('button')].find(b => {
        return b.innerText.match(/Section 107\b.*Row P.*Seat 10\b/)
    })
''')
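If the section, row, and seat come from variables, the pattern can be built safely with `re.escape` before it is interpolated into the script. This is a sketch; the function name and variable values are placeholders:

```python
import re

def seat_pattern(section, row, seat):
    """Build a regex matching a button's innerText for one exact seat."""
    return r'Section {}\b.*Row {}\b.*Seat {}\b'.format(
        re.escape(str(section)), re.escape(str(row)), re.escape(str(seat)))

pattern = seat_pattern(107, 'P', 10)
# The pattern could then be interpolated into the executed script, e.g.:
# driver.execute_script(
#     'return [...document.querySelectorAll("button")]'
#     '.find(b => b.innerText.match(/%s/))' % pattern)
```

The trailing `\b` word boundaries keep "Seat 10" from also matching "Seat 100".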
I'm looking to create a program (right now just trying to see how far I can take it) that is able to retrieve NYC 311 Department of Buildings complaints. I did find the API documentation online, and I was able to search per complaint number, but as far as my program is concerned that defeats the purpose: I want to be able to search by address to see if there is an active complaint, since someone wouldn't know their complaint number if they haven't been notified of it. Here is an example of a search with the complaint number, which works:
comNum = "4830407"
response = requests.get('https://data.cityofnewyork.us/resource/eabe-havv.json?complaint_number=%s' %(comNum))
OK, so in the API documentation there are options for zip_code=, house_number=, and house_street=.
When I attempt to add these to the url like in this example:
responseAddress = requests.get('https://data.cityofnewyork.us/resource/eabe-havv.json?zip_code=11103&house_number=123&house_street=50street')
nothing returns. If I eliminate, let's say, zip and house_number, it prints back an empty string, like so: []
I want to be able to have this program searchable by address, but I can't seem to get the URL to function the way I'm trying to. You can't possibly search an address by only zip or only house number.
If you look at the raw data (no parameters), looking specifically at zipcodes, there are spaces in it. You'll need to url encode those spaces.
This returns []: https://data.cityofnewyork.us/resource/eabe-havv.json?zip_code=11103
This does not. https://data.cityofnewyork.us/resource/eabe-havv.json?zip_code=11103%20%20%20%20
Looks like house numbers are always 12 characters long, so you could do something like this to get a left-justified, space-padded string:
>>> "{:<12d}".format(113)
'113         '
Related: How to urlencode a querystring in Python?
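Putting the two points together, the standard library can pad the value and encode the spaces in one go. This is a sketch; the padded width of 9 is an assumption based on the four %20s in the working URL above:

```python
from urllib.parse import urlencode, quote

# The dataset appears to store zip codes with trailing spaces,
# so pad the value before encoding it into the query string.
params = {"zip_code": "{:<9s}".format("11103")}

# quote_via=quote makes spaces %20 instead of '+',
# matching the working URL above.
query = urlencode(params, quote_via=quote)
url = "https://data.cityofnewyork.us/resource/eabe-havv.json?" + query
```

Alternatively, passing the padded values via the `params` argument of `requests.get` handles the encoding the same way.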
I'm new to Python and Scrapy and thought I'd try out a simple review site to scrape. While most of the site structure is straightforward, I'm having trouble extracting the content of the reviews. This portion is visually laid out in sets of 3 (the text to the right of the 良 (good), 悪 (bad), and 感 (impressions) fields), but I'm having trouble pulling this content and associating it with a reviewer or section of a review due to the use of generic divs, \n, and other formatting.
Any help would be appreciated.
Here's the site, and the code I've tried for grabbing the reviews, with some results.
http://www.psmk2.net/ps2/soft_06/rpg/p3_log1.html
(1):
response.xpath('//tr//td[@valign="top"]//text()').getall()
This returns the entire set of reviews, but it contains newline markup and, more of a problem, it renders each line as a separate entry. Due to this, I can't figure out where the good, bad, and impression portions end, nor can I easily parse each separate review as entry length varies.
['\n弱点をついた時のメリット、つかれたときのデメリットがはっきりしてて良い', '\nコミュをあげるのが楽しい',
'\n仲間が多くて誰を連れてくか迷う', '\n難易度はやさしめなので遊びやすい', '\nタルタロスしかダンジョンが無くて飽きる。'........and so forth
(2) As an alternative, I tried:
response.xpath('//tr//td[@valign="top"]')[0].get()
Which actually comes close to what I'd like, save for the markup. Here it seems that it returns the entire field of a review section. Every third element should be the "good" points of each separate review (I've replaced the <> with () to show the raw return).
(td valign="top")\n精一杯考えました(br)\n(br)\n戦闘が面白いですね\n主人公だけですが・・・・(br)\n従来のプレスターンバトルの進化なので(br)\n(br)\n以上です(/td)
(3) Figuring I might be able to get just the text, I then tried:
response.xpath('//tr//td[@valign="top"]//text()')[0].get()
But that only provides each line at a time, with the \n at the front. As with (1), a line by line rendering makes it difficult to attribute reviews to reviewers and the appropriate section in their review.
From these (2) seems the closest to what I want, and I was hoping I could get some direction in how to grab each section for each review without the markup. I was thinking that since these sections come in sets of 3, if these could be put in a list that would make pulling them easier in the future (i.e. all "good" reviews follow 0, 0+3; all "bad" ones 1, 1+3 ... etc.)...but first I need to actually get the elements.
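The every-third-element idea can be sketched in plain Python with list slicing. The data below is a hypothetical stand-in for the flat list the XPath would return, in repeating (good, bad, impressions) order:

```python
# Hypothetical flat list of review fields in (good, bad, impressions) order.
fields = ['good A', 'bad A', 'impr A', 'good B', 'bad B', 'impr B']

goods = fields[0::3]        # every 3rd item starting at index 0
bads = fields[1::3]
impressions = fields[2::3]

# Or regroup into one dict per review:
reviews = [dict(zip(('good', 'bad', 'impressions'), fields[i:i + 3]))
           for i in range(0, len(fields), 3)]
```

This only works once each review reliably contributes exactly three items, which is why extracting per review node (as in the answer below the code) is more robust.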
I've thought about, and tried, iterating over each line (something like):
for i in range(len(response.xpath('//tr//td[@valign="top"]').getall())):
    yield {'review': response.xpath('//tr//td[@valign="top"]')[i].get()}
to pull these out, but I'm a bit lost on how to implement something like this. Not sure where it should go. I've briefly looked at Item Loader, but as I'm new to this, I'm still trying to figure it out.
Here's the block where the review code is.
def parse(self, response):
    for table in response.xpath('body'):
        yield {
            # code for other elements in review
            'date': response.xpath('//td//div[@align="left"]//text()').getall(),
            'name': response.xpath('//td//div[@align="right"]//text()').getall(),
            # this includes the above elements, and is regular enough that I can systematically extract what I want
            'categories': response.xpath('//tr//td[@class="koumoku"]//text()').getall(),
            'scores': response.xpath('//tr//td[@class="tokuten_k"]//text()').getall(),
            'play_time': response.xpath('//td[@align="right"]//span[@id="setumei"]//text()').getall(),
            # reviews code here
        }
Pretty simple task using a part of the text as an anchor (I used string() to get the text content of a whole td):
for review_node in response.xpath('//table[@width="645"]'):
    good = review_node.xpath('string(.//td[b[starts-with(., "良")]]/following-sibling::td[1])').get()
    bad = review_node.xpath('string(.//td[b[starts-with(., "悪")]]/following-sibling::td[1])').get()
    ...............
I have built a web crawler for a forum game in which players use specific keywords in [b] bold [/b] tags to issue their commands. The bot's job is to traverse the thread and keep a record of all players' commands; however, I'm running into a problem: if player A quotes a post from player B, the bot reads player B's command in the quote and updates the table for player A.
I have found the specific class name of the quote box, but I cannot figure out how to remove the class from the entire post body.
I tried converting the post to text using get_attribute('innerHTML') and successfully removed the quote using regex; however, the code I wrote to extract the bold tags (find_element_by_tag_name) becomes invalid on a plain string.
I have two questions for the geniuses that post here:
Is there a way I can delete a specific element from the post body? I searched throughout Google and could not find a working solution.
Otherwise, is there a way I can convert the HTML I get from get_attribute('innerHTML') back to an element?
def ScrapPosts(driver):
    posts = driver.find_elements_by_class_name("postdetails")
    print("Total number of posts on this page:", len(posts))
    for post in posts:
        # print("username:", post.find_element_by_tag_name("strong").text)
        username = post.find_element_by_tag_name("strong").text.upper()
        # remove the quote boxes before sending to check command?
        post_txt = post.find_element_by_class_name("content")
        CheckCommand(post_txt, username)
Selenium doesn't have a built-in method for deleting elements. However, you can execute some JavaScript code that removes the quote box elements. See the related question at: https://stackoverflow.com/a/22519967/7880461
This code will delete all elements with the class name quoteBox which I think would work for you if you just change the class name.
driver.execute_script('''
    var element = document.getElementsByClassName("quoteBox"), index;
    for (index = element.length - 1; index >= 0; index--) {
        element[index].parentNode.removeChild(element[index]);
    }
''')
Same answer: there is no built-in way of doing that, but you can use JavaScript. This approach would probably be a lot more complicated than the first one.
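For the second question, if the goal is to work on the innerHTML string itself rather than in the browser, a quote box can be stripped while re-serializing using only the standard library. This is a sketch: `quoteBox` is the class name from the answer above, the sample markup is invented, and it only handles well-formed nesting:

```python
from html.parser import HTMLParser

class QuoteStripper(HTMLParser):
    """Re-emit HTML, skipping any element whose class contains 'quoteBox'."""
    def __init__(self):
        super().__init__()
        self.out = []
        self.skip_depth = 0  # > 0 while inside a quote box

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get('class') or '').split()
        if self.skip_depth or 'quoteBox' in classes:
            self.skip_depth += 1   # entering (or nested inside) a quote box
            return
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_depth:
            self.skip_depth -= 1
            return
        self.out.append('</%s>' % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

raw = '<p>keep</p><div class="quoteBox"><b>quoted cmd</b></div><b>real cmd</b>'
stripper = QuoteStripper()
stripper.feed(raw)
clean = ''.join(stripper.out)  # quote box and its contents are gone
```

The cleaned string could then be re-parsed (or searched with regex) without any risk of picking up commands inside quotes.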
I'm using App Engine, Python, v1.9.23.290
Currently I'm doing Alpha testing before opening up the app to the public.
I'm finding that some items "randomly" disappear from the search index.
I'm looking at one particular item where a user entered the item a week ago.
The search index was updated.
The item showed up as expected in searches.
The NDB entity was not "touched"/modified since last week.
This morning it is not in the index.
I don't have a code sample to share, because there is no "error".
Is this a common problem with a common solution?
To clarify:
When a user creates/edits an NDB entity, I update the item index thusly:
doc = search.Document(doc_id=str(this_item.key.id()), fields=fields)
search_index = search.Index(name="ItemIndex")
try:
    search_index.put(doc)
except search.Error:
    logging.exception('Put failed on search index ItemIndex')
All is fine. But the item 'disappeared' from the index.
With only a dozen items in the index I've had this happen a couple of times in the last week.
If it never happens for anybody else, I guess that is a good sign. I just have to find where the error is in my code.
If somebody else has had this problem, any indication as to the problem would be a great help.