Differentiating between double and single characters? [duplicate] - python

This question already has an answer here:
Dynamic variable name in python
(1 answer)
Closed 6 years ago.
I have to use an unusual name for a dictionary. It looks like this:
val.display_counter1.rjj = {}
That is not a valid Python identifier, so I've decided to use this format instead:
val_display__counter1_rjj = {}
Later in the code I need to match up that dict variable name with the original name. So I'm trying to find a way to replace those single underscores with dots and the double underscores with a single underscore. I'm sure that there is a regex solution, but regex isn't my strong suit.
Is there a way to selectively replace like this?
Edit:
There is some confusion about my question, so allow me to clarify. The original name:
val.display_counter1.rjj
This is NOT a variable in itself but merely an item name from the 3D software package Modo. There are many items that share this format. What I am trying to do is create a class of dicts that will store information about these items. I want to name the dicts for the items and be able to match them in program.
To make this match, I need to convert my dict name back to its original form:
val_display__counter1_rjj --> val.display_counter1.rjj
All I need to know is how to make the Regex match ONLY the single underscore and discard the matches that are surrounded by other underscores.
Also, I'm not sure why this is marked as a duplicate. My question doesn't involve dynamic variables.
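Here is a minimal sketch of the regex the question asks for. It assumes the original item names never put an underscore directly next to a dot, because the encoding would be ambiguous in that case. The lookbehind (?<!_) and the lookahead (?!_) make _ match only when it is not part of a run of underscores. The function name restore_item_name is hypothetical.

```python
import re


def restore_item_name(dict_name: str) -> str:
    """Map a mangled dict name back to the original item name.

    Assumes the forward encoding was: '.' -> '_' and '_' -> '__'.
    """
    # A lone underscore (neither preceded nor followed by '_') was a dot
    dotted = re.sub(r'(?<!_)_(?!_)', '.', dict_name)
    # The remaining double underscores were single underscores
    return dotted.replace('__', '_')


print(restore_item_name('val_display__counter1_rjj'))
```

For 'val_display__counter1_rjj' the first step gives 'val.display__counter1.rjj', and the second step gives 'val.display_counter1.rjj'.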

I'm new to Python, but I hope this works. It turns every underscore into a dot, then turns each resulting double dot back into a single underscore:
import re
val_display__counter1_rij = {}
# list of names to match
l = ['val_display__counter1_rij', 'val_display___counter1_rij', 'val.display__counter1_rij']
for x in l:
    if "." not in x:
        # replace every underscore with a dot
        article = re.sub(r'_', '.', x)
        # a double underscore has become "..", so turn it back into a single underscore
        if ".." in article:
            article = article.replace("..", "_")
        if article == 'val.display_counter1.rij':
            print(article)
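The dicts only need to be matched back to Modo item names. Another option, sketched here and different from what the question asked, skips the renaming and keys one ordinary dict by the original item name. Any string, dots included, is a valid dict key. The names item_info and 'visible' are hypothetical.

```python
# Hypothetical store: one inner dict per Modo item, keyed by the item's original name.
# Any string is a valid dict key, so the dots need no encoding.
item_info = {}

# Plays the role of "val_display__counter1_rjj = {}"
item_info['val.display_counter1.rjj'] = {}
# 'visible' is a made-up example field
item_info['val.display_counter1.rjj']['visible'] = True

# A name as it might come back from Modo can be looked up directly
item_name = 'val.display_counter1.rjj'
if item_name in item_info:
    print(item_info[item_name])
```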

Related

Convert list of string to dict - Remove extra comma [duplicate]

This question already has answers here:
Convert a String representation of a Dictionary to a dictionary
(11 answers)
Closed 1 year ago.
I am trying to create a dictionary from a list of strings. My attempt to convert this list of strings to a list of dictionaries is below:
author_dict = [[dict(map(str.strip, s.split(':')) for s in author_transform.split(','))] for author_transform in list_of_strings]
Everything was working fine until I encountered this piece of string:
[[country:United States,affiliation:University of Maryland, Baltimore County,name:tim oates,id:2217452330,gridid:grid.266673.0,affiliationid:79272384,order:2],........,[]]
This string has an extra comma (,) inside the intended value of the affiliation key, so my list is split at the wrong place. Is there a way (or an idea) I can use to avoid this kind of situation?
If it is not possible, do you have any suggestions on how I can skip this kind of list?
I would solve this by using a regular expression for splitting. This way you can split only on those commas that are followed by a colon without another comma in between.
In your code, replace
author_transform.split(',')
with
re.split(',(?=[^,]+:)', author_transform)
(And don’t forget to import re, of course.)
So, the whole code snippet becomes this:
author_dict = [
    [
        dict(map(str.strip, s.split(':'))
             for s in re.split(',(?=[^,]+:)', author_transform))
    ]
    for author_transform in list_of_strings
]
I took the liberty of reformatting the code, so the structure of the list comprehensions becomes clear.
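This is a self-contained check of the lookahead split on the record from the question. It assumes each element of list_of_strings holds one record without the outer brackets. Using s.split(':', 1) splits only on the first colon, so a value that itself contains a colon stays intact. The lookahead still splits wrongly if a value contains a comma followed later by a colon with no comma in between.

```python
import re

# One record from the question, without the surrounding brackets (assumed format)
record = ("country:United States,affiliation:University of Maryland, Baltimore County,"
          "name:tim oates,id:2217452330,gridid:grid.266673.0,affiliationid:79272384,order:2")

# Split only on commas that are followed by "<no commas>:", i.e. the start of the next key
pairs = re.split(r',(?=[^,]+:)', record)
# maxsplit=1 keeps any later colons inside the value
author = dict(map(str.strip, p.split(':', 1)) for p in pairs)

print(author['affiliation'])
```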

Wildcard in python dictionary

I am trying to create a Python dictionary that maps codes such as 'WHM1', 'WHM2', 'WHM3', 'HISPM1', 'HISPM2', 'HISPM3' (and other variants) to a specific string, for example White or Hispanic. I want to use it to create a new column. Regex seems like the right path, but I am missing something here, and I don't want to hard-code every code in the dictionary.
I have tried several iterations of regex and regexdict:
d = regexdict({'W*':'White', 'H*':'Hispanic'})
eeoc_nac2_All_unpivot_df['Race'] = eeoc_nac2_All_unpivot_df['EEOC_Code'].map(d)
A new column will be created with 'White' or 'Hispanic' for each row based on what is in an existing column called 'EEOC_Code'.
Your regular expressions are wrong - you appear to be using glob syntax instead of proper regular expressions.
In regex, x* means "zero or more of x" and so both your regexes will trivially match the empty string. You apparently mean
d = regexdict({'^W':'White', '^H':'Hispanic'})
instead, where the regex anchor ^ matches beginning of string.
There are several third-party packages named regexdict, so you should probably say which one you use. I can't tell whether the ^ is necessary here, or whether the regexes need to match the input completely. I have assumed a substring match is sufficient, as is usually the case in regex, but this sort of detail may well differ between implementations.
I'm not sure I have completely understood your problem. However, if all your labels have the structure WHM... and HISP..., then you can simply check the first character:
eeoc_nac2_All_unpivot_df['Race'] = [
    "White" if race.startswith('W') else "Hispanic"
    for race in eeoc_nac2_All_unpivot_df['EEOC_Code']
]
Note: this assumes every value in eeoc_nac2_All_unpivot_df['EEOC_Code'] is a string. It labels anything that does not start with 'W' as "Hispanic".
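Here is a sketch of the anchored-regex idea that needs no third-party regexdict package. It assumes the prefixes 'W' and 'H' are enough to tell the codes apart. RACE_PATTERNS and race_for are hypothetical names. The patterns are tried in order and the first match wins, so an unknown code returns None instead of a wrong label.

```python
import re
from typing import Optional

# Ordered (pattern, label) pairs; '^' anchors each match at the start of the code.
# The prefixes are assumptions based on example codes like 'WHM1' and 'HISPM1'.
RACE_PATTERNS = [
    (re.compile(r'^W'), 'White'),
    (re.compile(r'^H'), 'Hispanic'),
]


def race_for(code: str) -> Optional[str]:
    """Return the label of the first matching pattern, or None if nothing matches."""
    for pattern, label in RACE_PATTERNS:
        if pattern.search(code):
            return label
    return None


codes = ['WHM1', 'WHM2', 'HISPM1', 'HISPM3', 'XYZ']
print([race_for(c) for c in codes])
```

With pandas, you can pass the function to Series.map, which applies it element-wise. For example: eeoc_nac2_All_unpivot_df['Race'] = eeoc_nac2_All_unpivot_df['EEOC_Code'].map(race_for).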

Splitting quotes [duplicate]

This question already has answers here:
RegEx: Grabbing values between quotation marks
(20 answers)
Closed 6 years ago.
Does anyone have any advice for removing the separator from a split quote in a piece of text? I am using Python and am still a beginner.
Take this example: "Well," he said, "I suppose I could take a break." Here, the "he said," part is the separator and needs to be removed. The quote should then be treated as one string within quotation marks, such as "Well, I suppose I could take a break." I haven't been able to find similar code yet and was hoping someone could point me in the right direction.
Thanks!
To get only the content inside the double quotes in your string, you can use the re module:
import re
my_string = '"Well," he said, "I suppose I could take a break."'
quoted_string = re.findall(r'\".*?\"', my_string)
# 'quoted_string' is -> ['"Well,"', '"I suppose I could take a break."']
new_string = ' '.join(quoted_string).replace('"', '')
# 'new_string' is -> 'Well, I suppose I could take a break.'
You can write the same thing as a one-liner:
' '.join(re.findall(r'\".*?\"', my_string)).replace('"', '')
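This variant uses a capturing group, so findall returns only the text between each pair of quotes and no replace() call is needed afterwards. It assumes straight, balanced double quotes. Curly quotes such as “ and ” would need a different pattern. join_split_quote is a hypothetical name.

```python
import re


def join_split_quote(text: str) -> str:
    """Join the pieces of a quotation that is interrupted by a speech tag."""
    # The capturing group makes findall return only the text between each pair of quotes
    pieces = re.findall(r'"(.*?)"', text)
    # Join with single spaces, trimming stray whitespace from each piece
    return ' '.join(piece.strip() for piece in pieces)


print(join_split_quote('"Well," he said, "I suppose I could take a break."'))
```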

Python: how to emphasize a specific sequence of characters within a string when it is printed? [duplicate]

This question already has answers here:
How can I print bold text in Python?
(17 answers)
Closed 7 years ago.
I have a program that prints strings consisting of a random sequence of lowercase letters and spaces.
Is there a way to emphasize a certain target word within that string when I print it, to make the target easier to spot in the output?
For example:
Current code:
>>> print(mystring)
mxxzjvgjaspammttunthcrurny dvszqwkurxcxyfepftwyrxqh
Desired behaviour:
>>> some_function_that_prints_with_emphasis(mystring,'spam')
mxxzjvgjaspammttunthcrurny dvszqwkurxcxyfepftwyrxqh (with "spam" emphasized)
Other acceptable forms of emphasis would be:
making the characters bold
changing colour
capitalizing
separating with extra characters
any other idea that is easily implemented in Python 3
I'm leaving the requirements deliberately vague because as a beginner I'm not aware of all that Python can do so there might be a simpler way to do this that I've overlooked.
Basically, everything you're looking for can be done with the curses module. It is designed for advanced control over the terminal, such as changing colors and making text bold. You need to use the various has_* functions (such as has_colors()) to determine the terminal's capabilities and choose your preferred emphasis style. After that, the curses docs page and the tutorial it links to should give you all the information you need.
For simpler usage, you can just print out the raw terminal escape codes to add and remove color (you just have to split the line up yourself or use re to perform replacements to add the codes). For example, to highlight 'spam' in a line as blue:
myline = "abc123spamscsdfwerf"
print(myline.replace('spam', '\033[94mspam\033[0m'))
For ease of use, you can use ansicolors to avoid having to manually deal with color escapes and the like.
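Here is a sketch of the re-based replacement mentioned above, using the ANSI bold code (SGR 1) instead of blue. It assumes a terminal that understands ANSI escape sequences. emphasize, BOLD and RESET are hypothetical names. re.escape keeps characters such as '.' in the target literal, and you could pass a flag like re.IGNORECASE to compile() to catch 'Spam' as well.

```python
import re

BOLD = '\033[1m'   # ANSI SGR 1: bold
RESET = '\033[0m'  # ANSI SGR 0: reset all attributes


def emphasize(text: str, target: str) -> str:
    """Wrap every occurrence of target in ANSI bold/reset codes."""
    # re.escape makes the target match literally, even if it contains regex metacharacters
    pattern = re.compile(re.escape(target))
    return pattern.sub(lambda m: BOLD + m.group(0) + RESET, text)


print(emphasize("mxxzjvgjaspammttunthcrurny dvszqwkurxcxyfepftwyrxqh", "spam"))
```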
You can replace the target with its upper-case version:
def emphasis(string, target):
    return string.replace(target, target.upper())
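You could just find the substring you're looking for and put a marker under it, like this:
needle = "spam"
haystack = "mxxzjvgjaspammttunthcrurny dvszqwkurxcxyfepftwyrxqh"
pos = haystack.find(needle)  # index of the first match, or -1 if there is none
print(haystack)
if pos != -1:
    # pad with pos spaces so the carets line up under the match
    print(" " * pos + "^" * len(needle))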

Using regex to remove substrings from list items in python

I'm sure this must be a duplicate question, but I can't find an answer anywhere. I have a list with multiple strings, as below:
['>ctg7180000016561_3757\nAAAAATTTAGTTAAAACTATAACATTAGCTTGTCAAGCTAAAATTACTATGTAAGTAGTAATTTTTA\n', '>ctg7180000016561_3824\nATCCCTCAAATAGCACCCATTAACTGATTATCCTTATTCTTAATATTCACCACCTCTCTCCTAATATTTAGAGCTTCTAACTATTTCTTTATCATGTACCCCCCCAAAAAATCTGTTTTTTATAAAAAAACTAGTATAAATAACTGATCATGATAACTAACCTCTTTTCGTCTTTCGACCCCTCTACTAACTTAAATACTAACTTTAACTGAGTTAGGACTATCCTCGGGGTGGCTGTAATCCCGAGGATATTTTGGATTATCCCCTCGCGTTTCTCCCTGCTTTGAATAAAACTTATCAGTACTCTTCACAAAGAATTCAAAGTCCTTGTTAACAACAAAAAATCCCAAGGCAGAACCCTAATCCTGATTTCCTTATTTTCTATTATTTTATTTAATAACTTCATAGGACTATTCCCATATATTTTCACATCCACAAGTCACATAGTATTAACCCTGTCCCTGGCTCTCCCCATATGACTAAGATTTATATTGTATGGGTGGGTAAATAATACAACCCACATGCTAGCCCATCTAGTACCCCAAGGAACCCCTGCCGTTCTAATACCATTTATGGTGTGTATTGAAACAATCAGAAATGTTATCCGACCCGGCACCCTGGCAATCCGGCTATCCGCAAATATAATTGCAGGACACCTACTAATAACCCTTCTAGGTAACACGGGAAAC\n', '>ctg7180000016561_4513\nT\n']
And all I want to do is remove the numbers after the underscore, so in this example the output would be:
['>ctg7180000016561\nAAAAATTTAGTTAAAACTATAACATTAGCTTGTCAAGCTAAAATTACTATGTAAGTAGTAATTTTTA\n', '>ctg7180000016561\nATCCCTCAAATAGCACCCATTAACTGATTATCCTTATTCTTAATATTCACCACCTCTCTCCTAATATTTAGAGCTTCTAACTATTTCTTTATCATGTACCCCCCCAAAAAATCTGTTTTTTATAAAAAAACTAGTATAAATAACTGATCATGATAACTAACCTCTTTTCGTCTTTCGACCCCTCTACTAACTTAAATACTAACTTTAACTGAGTTAGGACTATCCTCGGGGTGGCTGTAATCCCGAGGATATTTTGGATTATCCCCTCGCGTTTCTCCCTGCTTTGAATAAAACTTATCAGTACTCTTCACAAAGAATTCAAAGTCCTTGTTAACAACAAAAAATCCCAAGGCAGAACCCTAATCCTGATTTCCTTATTTTCTATTATTTTATTTAATAACTTCATAGGACTATTCCCATATATTTTCACATCCACAAGTCACATAGTATTAACCCTGTCCCTGGCTCTCCCCATATGACTAAGATTTATATTGTATGGGTGGGTAAATAATACAACCCACATGCTAGCCCATCTAGTACCCCAAGGAACCCCTGCCGTTCTAATACCATTTATGGTGTGTATTGAAACAATCAGAAATGTTATCCGACCCGGCACCCTGGCAATCCGGCTATCCGCAAATATAATTGCAGGACACCTACTAATAACCCTTCTAGGTAACACGGGAAAC\n', '>ctg7180000016561\nT\n']
I am using regex and I have a perfect match, but I can't work out how to actually remove the substrings. My code so far is:
pattern = re.compile('_[0-9]*')
for x in SequenceList:
    re.sub(pattern, '', x)
I'm aware that this is just changing the variable x, but even when I just print x within the for loop the pattern isn't removed. How do I actually remove the pattern and alter the list?
Thank you and sorry if this is already answered somewhere!
Strings are immutable, so re.sub returns a new string rather than changing x in place, and your loop discards that result. You can use a list comprehension to build a new list with the replaced strings, like this:
import re
pattern = re.compile(r"_\d+")
SequenceList = [pattern.sub("", item) for item in SequenceList]
print(SequenceList)
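If the existing list object itself must be altered (for example, because other code holds a reference to it), a small variant assigns to the slice SequenceList[:]. The sample uses shortened copies of the question's records. count=1 limits the substitution to the first match in each string, which in this data is the header suffix.

```python
import re

# Shortened records in the same shape as the question's list
SequenceList = ['>ctg7180000016561_3757\nAAAAATTTAG\n', '>ctg7180000016561_4513\nT\n']
alias = SequenceList  # another reference to the same list object

pattern = re.compile(r'_\d+')
# count=1: replace only the first match in each string (the header suffix)
SequenceList[:] = [pattern.sub('', item, count=1) for item in SequenceList]

print(SequenceList)
print(alias is SequenceList)  # True: the list was changed in place
```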
