from pywinauto import application
app=application.Application()
3. app.connect(title_re = "/Zero Hedge$/")
4. app.connect(title_re = "/| Zero Hedge/")
I want to get a window of Chrome like this: http://www.zerohedge.com/news/2016-02-18/markets-ignore-fundamentals-and-chase-headlines-because-they-are-dying
As you can see when you visit the website, the page title either contains "| Zero Hedge" at the end or just "Zero Hedge".
However, a WindowNotFoundError is still raised, whether I use line 3 or line 4. Why doesn't the regular expression work?
Thank you for all your help!
Try this to get a list of all of your windows/titles:
from pywinauto import findwindows
print(findwindows.find_windows(title_re=".*"))
That should give you a list of all windows; you can then build your regex based on the results.
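For what it's worth, title_re is an ordinary Python regular expression, so the JavaScript-style /.../ delimiters in lines 3 and 4 are treated as literal slashes and will never match. A minimal sketch of the kind of pattern that should work (the exact title text is an assumption; check it against the list printed above, since Chrome may append its own suffix to the page title):

from pywinauto import application

app = application.Application()
# No slash delimiters, just a plain Python regex; ".*" allows whatever text
# precedes and follows "Zero Hedge" in the window title.
app.connect(title_re=".*Zero Hedge.*")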
I'm trying to build a script that can click the Facebook group "Join" button when certain conditions are met.
The script is already able to navigate to the "https://www.facebook.com/search/groups/?q=nature_lover" path using Selenium.
Image: https://i.stack.imgur.com/3QJhy.png
After navigating to that path, I used this code to handle each group component's data.
all_group_elements = self.driver.find_elements(By.CSS_SELECTOR, "div[role=article]")
for group_element in all_group_elements:
    group_name = str(group_element.text.split('\n')[0])
    group_button = str(group_element.text.split('\n')[-1])
    if group_button == "Join":
        group_button_target = f"Join Group {group_name}"
    if group_button == "Follow Group":
        group_button_target = f"Follow Group {group_name}"
    # I used this code to target and click the "join" button.
    self.driver.find_element(By.CSS_SELECTOR, f"div[aria-label={group_button_target}]").click()
I'm also using "WebDriverWait" in the script. What is the issue here?
Your issue is with f"div[aria-label={group_button_target}]"
That translates to something like "div[aria-label=Join Group NAME]"
That's a problem, because the value of the attribute contains spaces and you need quotes around the value if there are spaces.
Eg:
Bad: 'TAG[ATTRIBUTE=SOME VALUE]'
Good: 'TAG[ATTRIBUTE="SOME VALUE"]'
Those quotes are important if the value contains spaces. You may want to change that line to:
self.driver.find_element(By.CSS_SELECTOR, f'div[aria-label="{group_button_target}"]').click()
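Since the question mentions WebDriverWait, here is a hedged sketch of the same click with an explicit wait, as a drop-in for the click line inside the loop (the 10-second timeout is an arbitrary choice of mine):

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait until the (properly quoted) button is clickable, then click it.
selector = f'div[aria-label="{group_button_target}"]'
button = WebDriverWait(self.driver, 10).until(
    EC.element_to_be_clickable((By.CSS_SELECTOR, selector))
)
button.click()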
I need to grab a url from a text file.
The URL is stored in a string like so: 'URL=http://example.net'.
Is there any way I could grab everything after the = char up until the . in '.net'?
Could I use the re module?
text = """A key feature of effective analytics infrastructure in healthcare is a metadata-driven architecture. In this article, three best practice scenarios are discussed: https://www.healthcatalyst.com/clinical-applications-of-machine-learning-in-healthcare Automating ETL processes so data analysts have more time to listen and help end users , https://www.google.com/, https://www.facebook.com/, https://twitter.com
code below catches all urls in text and returns urls in list."""
urls = re.findall('(?:(?:https?|ftp):\/\/)?[\w/\-?=%.]+\.[\w/\-?=%.]+', text)
print(urls)
output:
[
'https://www.healthcatalyst.com/clinical-applications-of-machine-learning-in-healthcare',
'https://www.google.com/',
'https://www.facebook.com/',
'https://twitter.com'
]
I don't have much information, but I'll try to help with what I've got. I'm assuming that URL= is part of the string; in that case you can do this:
re.findall(r'URL=(.*?)\.', STRINGNAMEHERE)
Let me go into more detail about (.*?): the dot means any character (except the newline character), the star means zero or more occurrences, and the ? right after the * makes the match non-greedy, so it grabs as little text as possible. (On its own, ? means 0 or 1 repetitions of the preceding RE; the docs give the example "ab? will match either 'a' or 'ab'". Placed after *, though, it switches the star into non-greedy mode.) The brackets place it all into a group. All of this together basically means it will find everything in between URL= and the first literal . (which is why the dot at the end is escaped as \.).
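As a quick check of the idea above on the exact string from the question:

import re

file_line = 'URL=http://example.net'
print(re.findall(r'URL=(.*?)\.', file_line))   # ['http://example']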
You don't need regexes (the re module) for such a simple task.
If the string you have is of the form:
'URL=http://example.net'
Then you can solve this using basic Python in numerous ways, one of them being:
file_line = 'URL=http://example.net'
start_position = file_line.find('=') + 1 # this gives you the first position after =
end_position = file_line.find('.')
# this extracts from the start_position up to but not including end_position
url = file_line[start_position:end_position]
Of course, this is just going to extract one URL. Assuming that you're working with a large text from which you want to extract all URLs, you'll want to put this logic into a function so that you can reuse it and build around it (achieve iteration via while or for loops and, depending on how you're iterating, keep track of the position of the last extracted URL, and so on).
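For instance, a minimal sketch of that idea (the function name, the generator approach, and the sample lines are my own choices, not part of the question):

def extract_between(text, start_marker='=', end_marker='.'):
    """Yield every substring found between start_marker and the next end_marker."""
    position = 0
    while True:
        start = text.find(start_marker, position)
        if start == -1:
            break                      # no more start markers left
        start += len(start_marker)
        end = text.find(end_marker, start)
        if end == -1:
            break                      # start marker without a closing marker
        yield text[start:end]
        position = end + len(end_marker)

# Hypothetical usage on lines of the kind described in the question:
for file_line in ['URL=http://example.net', 'URL=http://another.org']:
    print(list(extract_between(file_line)))   # ['http://example'], then ['http://another']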
Word of advice
This question has been answered quite a lot on this forum, by very skilled people, in numerous ways, for instance: here, here, here and here, to a level of detail that would amaze you. And these are not all, I just picked the first few that popped up in my search results.
Given that (at the time of posting this question) you're a new contributor to this site, my friendly advice would be to invest some effort into finding such answers. It's a crucial skill that you can't do without in the world of programming.
Remember that whatever problem you are encountering, there is a very high chance that somebody on this forum has already encountered it and received an answer; you just need to find it.
Please try this. It worked for me.
import re
s='url=http://example.net'
print(re.findall(r"=(.*)\.",s)[0])
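A side note of my own, not part of the answer: the greedy (.*) here runs up to the last dot, while the non-greedy (.*?) from the earlier answer stops at the first one. Both give the same result on the question's string, but they differ as soon as the URL contains more than one dot (the string below is a made-up example):

import re

s = 'URL=http://www.example.net'
print(re.findall(r"=(.*)\.", s))    # ['http://www.example']  (greedy: up to the last dot)
print(re.findall(r"=(.*?)\.", s))   # ['http://www']          (non-greedy: up to the first dot)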
I'm very new to this area so I'm sure it's just something obvious. I'm trying to change a python script so that it finds a node in a different way but I get an "invalid predicate" error.
import xml.etree.ElementTree as ET
tree = ET.parse("/tmp/failing.xml")
doc = tree.getroot()
thingy = doc.find(".//File/Diag[#id='53']")
print(thingy.attrib)
thingy = doc.find(".//File/Diag[BaseName = 'HTTPHeaders']")
print(thingy.attrib)
That should find the same node twice but the second find gets the error. Here is an extract of the XML:
<Diag id="53">
<Formatted>xyz</Formatted>
<BaseName>HTTPHeaders</BaseName>
<Column>17</Column>
I hope I've not cut it down too much. Basically, finding it with "@id" works, but I want to search on that BaseName tag instead.
Actually, I want to search on a combination of tags so I have a more complicated expression lined up but I can't get the simple one to work!
The code in the question works when using Python 3.7. If the spaces before and after the equals sign in the predicate are removed, it also works with earlier Python versions.
thingy = doc.find(".//File/Diag[BaseName='HTTPHeaders']")
See https://bugs.python.org/issue31648.
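Since the question mentions wanting to search on a combination of tags: ElementTree lets you chain predicates, so something along these lines should work once the simple form does (the Column value is only taken from the XML extract above, and note there are no spaces around the equals signs):

import xml.etree.ElementTree as ET

tree = ET.parse("/tmp/failing.xml")
doc = tree.getroot()

# Both child-element conditions must hold for the same Diag node.
thingy = doc.find(".//File/Diag[BaseName='HTTPHeaders'][Column='17']")
print(thingy.attrib)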
I'm trying to make a stand alone application using Python and Tkinter.
My task is to get all similar-looking product IDs from an Excel sheet using Python. I got similar-looking products for a particular company, XYZ.
The code goes like this:
IDs = df1['A'].str.extract(r'\b(\d{8}s\d{2})\b', expand=False).dropna().tolist()
This helps extract all items that have "8 digits followed by s followed by 2 more digits", like 01234567s12 or 98765432s23.
But I want to do the opposite, that is, input the product ID and get its regex.
The product ID can be anything, say ABC123456 or C234-D456.
So is there code which can help me get the regex?
What you could do is generate the regex according to pattern recognition:
6 numbers, 2 letters, 2 symbols, 4 numbers would be:
\d{6} .{2} \S{2} \d{4}
I don't know if this is good practice, but at least you will have a regex that gets generated.
The regex:
https://regex101.com/r/HPPAAm/1
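To illustrate, a generated pattern of that sort can be sanity-checked against the sample IDs from the question with re.fullmatch (the two patterns below are my own guesses at how those IDs would be classified, not output from any tool):

import re

# "3 letters then 6 digits" for ABC123456,
# "letter, 3 digits, symbol, letter, 3 digits" for C234-D456.
print(bool(re.fullmatch(r'[A-Za-z]{3}\d{6}', 'ABC123456')))              # True
print(bool(re.fullmatch(r'[A-Za-z]\d{3}\S[A-Za-z]\d{3}', 'C234-D456')))  # True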
I used the re module to do this.
import re

text = "12345678S00"
y = ""
for i in range(0, len(text)):
    r = re.match('[a-zA-Z]', text[i])
    if r is not None:
        y += 's'
    r = re.match('[0-9]', text[i])
    if r is not None:
        y += r'\d'
    r = re.match('[.,_=&*()%^#$!#-]', text[i])
    if r is not None:
        y += r'\S'
print(y)   # output: \d\d\d\d\d\d\d\ds\d\d
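Wrapped into a function, the same character-by-character classification can be reused on the other ID formats mentioned in the question (the function name is mine, and, like the answer above, it maps every letter to a literal s rather than to a letter class):

import re

def id_to_pattern(product_id):
    # Build a crude pattern string by classifying each character.
    out = ""
    for ch in product_id:
        if re.match('[a-zA-Z]', ch):
            out += 's'
        elif re.match('[0-9]', ch):
            out += r'\d'
        elif re.match('[.,_=&*()%^#$!#-]', ch):
            out += r'\S'
    return out

print(id_to_pattern("ABC123456"))    # sss\d\d\d\d\d\d
print(id_to_pattern("C234-D456"))    # s\d\d\d\Ss\d\d\d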
I've been using Python for web scraping. Everything worked like an oiled gear until I used it to get the description of a product, which is actually a laaaarge description.
So, it's not working at all... as if my regex were incorrect. Sadly, I can't tell you which website I'm scraping in order to show you the real example, but I know that the regex is actually OK... it's something like this:
descriptionRegex = 'id="this_id">(.*)</div>\s*<div\ id="another_id"'
for found in re.findall(descriptionRegex, response):
    print(found)
The deal is that the (.*) part is like 25000+ characters.
Is there a limit on the number of characters re.findall() can match? Is there any way I can achieve this?
You need to specify re.DOTALL in your call to .findall(); by default, . does not match newline characters, and a 25000+ character description almost certainly spans several lines.
If you run this program, it will behave as you request:
import re

response = '''id="this_id">
blah
</div> <div id="another_id"'''

descriptionRegex = r'id="this_id">(.*)</div>\s*<div\ id="another_id"'
for found in re.findall(descriptionRegex, response, re.DOTALL):
    print(found)