Python finding variable characters in a string with RE? [duplicate] - python

This question already has answers here:
How to grab number after word in python
(4 answers)
What special characters must be escaped in regular expressions?
(13 answers)
Closed 2 years ago.
Hey, I need to search a page source for variable-length data that is shown in the console.
The data will be shown like this:
"data":[13,17]
The number of units inside the table varies a lot. I have tried several regular expressions, but the closest I have come to a result only works with a fixed number of units.
self.driver.get("website.com")
apidata = self.driver.page_source
print(apidata)
datasetbasic = re.search('"data":[[0-99,0-99]+', apidata)
print(datasetbasic)
Instead of having it as a fixed amount, how do I capture anything that is inside the data table?
Before you ask, I cannot use xpath or any other selenium calls to capture this data directly from the webpage (I think), because the element is from a graph, where the data is only visible in the actual console.
Any help is appreciated
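One possible approach, sketched here under assumptions: escape the literal square brackets and capture everything between them, then split the captured text into numbers. The `apidata` string below is a hypothetical stand-in for `self.driver.page_source`, and the sketch assumes the list holds only integers.

```python
import re

# Hypothetical stand-in for self.driver.page_source.
apidata = '<script>var chart = {"data":[13,17,4,25]};</script>'

# \[ and \] match literal brackets; the group captures any run of
# digits, commas and whitespace between them, however long it is.
match = re.search(r'"data":\[([\d,\s]*)\]', apidata)

# Convert the captured text to a list of ints (empty if nothing matched).
values = [int(v) for v in match.group(1).split(",") if v.strip()] if match else []
print(values)  # [13, 17, 4, 25]
```

In the original attempt, the unescaped `[` opens a character class, so `[[0-99,0-99]+` matches a run of brackets, digits and commas rather than a bracketed list.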

Related

Is it possible to filter certain text only cells? [duplicate]

This question already has answers here:
REGEXP_LIKE to match a specific word in a comma separated string
(1 answer)
Regex to match text between commas
(8 answers)
Prevent replacing an already replaced string in comma-separated string
(5 answers)
Closed 7 months ago.
I have around 10 000 data like this:
Row
Effect
1
3_prime_UTR_variant,intron_variant
2
missense_variant,missense_variant,missense_variant,missense_variant,missense_variant,missense_variant,missense_variant,missense_variant,missense_variant,missense_variant,missense_variant,missense_variant,missense_variant,missense_variant,missense_variant,missense_variant,missense_variant,missense_variant,missense_variant,missense_variant,missense_variant,missense_variant,missense_variant,missense_variant,missense_variant,missense_variant,missense_variant,missense_variant,missense_variant,missense_variant,missense_variant,missense_variant,missense_variant,missense_variant,missense_variant,missense_variant,missense_variant,missense_variant,missense_variant,missense_variant,non_coding_transcript_exon_variant,non_coding_transcript_exon_variant,non_coding_transcript_exon_variant,non_coding_transcript_exon_variant,non_coding_transcript_exon_variant,non_coding_transcript_exon_variant,non_coding_transcript_exon_variant,non_coding_transcript_exon_variant,non_coding_transcript_exon_variant,non_coding_transcript_exon_variant,non_coding_transcript_exon_variant
3
intron_variant,intron_variant,intron_variant,intron_variant,intron_variant,intron_variant,intron_variant,intron_variant,intron_variant,intron_variant,intron_variant,intron_variant,intron_variant,intron_variant,intron_variant,intron_variant,intron_variant,intron_variant,intron_variant,intron_variant,intron_variant,intron_variant,intron_variant,intron_variant,intron_variant,intron_variant,intron_variant,intron_variant,intron_variant,intron_variant,intron_variant,intron_variant,intron_variant,intron_variant,intron_variant,intron_variant,intron_variant,intron_variant,intron_variant,intron_variant,intron_variant,intron_variant,intron_variant,intron_variant,intron_variant,intron_variant,intron_variant,intron_variant,intron_variant,intron_variant,intron_variant
I want to use a regex pattern to filter out only the cells that contain nothing but intron_variant. I tried this pattern in Python:
pattern = r'^(intron_variant)|(,intron_variant)|(,intron_variant)$'
But it still picked every cell containing the text intron_variant. Is it even possible to pick only the cells that consist entirely of intron_variant?
The regex \b option unfortunately did not work. I still get the Effect of row 1 in the result, even though I only need the Effect of row 3.
If you're using pandas (which I assume you are), you can use the following:
df[df["Effect"] == 'intron_variant']
This filters the dataframe in a similar fashion to SQL (where Effect = 'intron_variant'). It returns a dataframe, so you can save the result as a new dataframe as well. Note that an equality test only keeps cells whose entire content is exactly that string.
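If the goal is to keep only cells in which every comma-separated entry is intron_variant (such as row 3 in the question), one option is to match the whole cell against a repeated pattern. The following is a minimal sketch using only the standard library; the sample cells are hypothetical and shortened from the question's table.

```python
import re

# Matches a cell made up solely of one or more comma-separated
# "intron_variant" entries.
ONLY_INTRON = re.compile(r"intron_variant(?:,intron_variant)*")

# Hypothetical sample cells, shortened from the question's table.
effects = [
    "3_prime_UTR_variant,intron_variant",
    "missense_variant,non_coding_transcript_exon_variant",
    "intron_variant,intron_variant,intron_variant",
]

# fullmatch succeeds only if the entire string fits the pattern.
intron_only = [cell for cell in effects if ONLY_INTRON.fullmatch(cell)]
print(intron_only)  # ['intron_variant,intron_variant,intron_variant']
```

With pandas 1.1 or later, the same test should be expressible as `df[df["Effect"].str.fullmatch(r"intron_variant(?:,intron_variant)*")]`, assuming the column holds plain strings.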

How to format a string in Python source code for improved readability [duplicate]

This question already has answers here:
How do I split the definition of a long string over multiple lines?
(30 answers)
How can I split up a long f-string in Python?
(2 answers)
Closed 10 months ago.
I'm building a rather long file path, like so:
file_path = f"{ENV_VAR}/my_dir/{foo['a']}/{foo['b']}/{bar.date()}/{foo['c']}.json"
This is a simplified example. The actual path is much longer.
To make this line shorter and more readable in code, I have tried the following:
file_path = f"{ENV_VAR}/my_dir\
/{foo['a']}\
/{foo['b']}\
/{bar.date()}\
/{foo['c']}.json"
This works but also affects the actual string in my program.
More specifically, the linebreaks are added to the string value itself, which is undesirable in this case. I only want to change the formatting of the source code.
Is it possible to format the string without affecting the actual value in my program?
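One common way to do this is to rely on Python's implicit concatenation of adjacent string literals inside parentheses. A minimal sketch, with hypothetical values standing in for `ENV_VAR`, `foo` and `bar`:

```python
from datetime import datetime

# Hypothetical stand-ins for the names used in the question.
ENV_VAR = "/base"
foo = {"a": "x", "b": "y", "c": "z"}
bar = datetime(2024, 1, 2)

# Adjacent (f-)string literals inside parentheses are joined at compile
# time, so the source can span several lines while the value stays the same.
file_path = (
    f"{ENV_VAR}/my_dir"
    f"/{foo['a']}"
    f"/{foo['b']}"
    f"/{bar.date()}"
    f"/{foo['c']}.json"
)
print(file_path)  # /base/my_dir/x/y/2024-01-02/z.json
```

Each piece needs its own `f` prefix, because the prefix applies to a single literal and not to the whole parenthesized group.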

What does '\' mean when declaring a variable? [duplicate]

This question already has answers here:
What does a backslash by itself ('\') mean in Python? [duplicate]
(5 answers)
What is the purpose of a backslash at the end of a line?
(2 answers)
Closed 1 year ago.
While I was searching code from the internet about YouTube data analysis, I found code like this:
df_rgb2['total_sign_comment_ratio'] = df_rgb2['total_number_of_sign'] / df_rgb2['comment_count']
total_sign_comment_ratio_max = df_rgb2['total_sign_comment_ratio'].replace([np.inf, -np.inf], 0).max()
df_rgb2['total_sign_comment_ratio'] = \
    df_rgb2['total_sign_comment_ratio'].replace([np.inf, -np.inf], total_sign_comment_ratio_max*1.5)
and I was wondering why the analyst used the expression:
df_rgb2['total_sign_comment_ratio'] = \
because the result is the same whether I use that code or not.
I tried to find the meaning of '\', but all I found was how to use '\' when printing output.
\ is usually used to continue a statement onto the next line. If you were to just press Enter and continue writing the statement on the line below, for example while assigning a variable, it would be a syntax error.
You use this when you need to tidy up code or when your working window is too small for some reason.
See: https://developer.rhino3d.com/guides/rhinopython/python-statements/
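As a small illustration of the point above (the variable names are made up for this sketch), a trailing backslash joins the next physical line to the current statement, and wrapping the expression in parentheses has the same effect:

```python
# The backslash tells Python that the statement continues on the next line.
total = 1 + 2 + \
    3 + 4

# Parentheses allow the same split without a backslash.
total_paren = (
    1 + 2
    + 3 + 4
)

print(total, total_paren)  # 10 10
```

Either form evaluates to the same value as the single-line version, which is why removing the backslash and joining the two lines does not change the result.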

Numeric pattern search in regular expression using Python [duplicate]

This question already has answers here:
How to use regex to find all overlapping matches
(5 answers)
Closed 2 years ago.
I have the text below:
my_text = "My telephone number is 408-555-1234"
on which I am searching for the pattern
re.findall(r'\d{3}-\d{1,}',my_text)
My intention was to search for a three-digit number followed by - and then another run of one or more digits. Hence I was expecting the result to be ['408-555','555-1234'].
However, the result I am getting is only ['408-555'].
Could anyone explain what is wrong with my understanding here, and suggest a pattern that would serve my purpose?
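re.findall only returns non-overlapping matches, so the second 555 is already consumed by the first match. You can wrap the pattern in a lookahead with a capturing group: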
re.findall(r'(?=(\d{3}-\d+))', my_text)
output:
['408-555', '555-1234']
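To see why the lookahead helps, the two patterns can be run side by side. A lookahead matches at a position without consuming any characters, so the scan can start a new match inside text that an earlier match already covered:

```python
import re

my_text = "My telephone number is 408-555-1234"

# Plain pattern: "408-555" is consumed, so "555-1234" can no longer match.
plain = re.findall(r"\d{3}-\d+", my_text)

# Lookahead with a capturing group: nothing is consumed, so every start
# position is tried and overlapping matches are returned.
overlapping = re.findall(r"(?=(\d{3}-\d+))", my_text)

print(plain)        # ['408-555']
print(overlapping)  # ['408-555', '555-1234']
```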

Regex - so close, yet so far away [duplicate]

This question already has answers here:
Regular expression to find URLs within a string
(35 answers)
What is the best regular expression to check if a string is a valid URL?
(62 answers)
Closed 4 years ago.
Here is my current regex: (?:ht|f)tps?:[\S]*\/?(?:\w+)
I need to refine it such that it pulls the following link correctly from the quoted text below: http://www.purdue.edu/transcom/index.php
Any thoughts on how I can improve my current regex? Thanks in advance!
Additional information about the experimental protocol and results is
provided in the companion files and the TransCom project web site
(http://www.purdue.edu/transcom/index.php).The results of the Level 1
experiments presented here are grouped into two broad categories
I have not tested your regex thoroughly, and it is not clear enough why your current regex is failing.
But to catch a URL in general, I would repeat a group made of the characters allowed in a URL minus the slash (like [a-zA-Z0-9.]) followed by a slash,
something like
r'(?:ht|f)tps?://(?:[_html_authorized_chars]+/?)*'
where [_html_authorized_chars] stands for that character class, and possibly a positive lookahead assertion if the URL is always inside quotes or parentheses...
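A concrete sketch of that idea follows. The character class `[\w.-]` is an assumption chosen for this example and is deliberately narrower than the full set of characters a URL may contain. For comparison, it also runs the regex from the question, whose `[\S]*` keeps consuming every non-space character and therefore swallows the trailing `).The`.

```python
import re

text = (
    "provided in the companion files and the TransCom project web site "
    "(http://www.purdue.edu/transcom/index.php).The results of the Level 1 "
    "experiments presented here are grouped into two broad categories"
)

# Regex from the question: [\S]* runs on past the closing parenthesis.
original = re.findall(r"(?:ht|f)tps?:[\S]*\/?(?:\w+)", text)

# Sketch: scheme, "://", a host, then zero or more "/segment" parts.
# The class [\w.-] is an assumption and excludes ")" and other punctuation.
urls = re.findall(r"(?:ht|f)tps?://[\w.-]+(?:/[\w.-]+)*", text)

print(original)  # ['http://www.purdue.edu/transcom/index.php).The']
print(urls)      # ['http://www.purdue.edu/transcom/index.php']
```

This sketch does not handle query strings or fragments, and it would include a sentence-ending period that directly follows a URL.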
Url Similar Splitter
matches url similars and splits it into its address and parameters
by deme72
([--:\w?#%&+~#=]*\.[a-z]{2,4}\/{0,2})((?:[?&](?:\w+)=(?:\w+))+|[--:\w?#%&+~#=]+)?
Source: regexr.com community
