Changing part of a string to subscript - Python

I have a list of strings, and I want to put '_Octa', '_Tet' and so on in subscript:
original = ['VO₆_Octa', 'FeO₄_Tet', 'FeO₆_Oct', 'BaO₉_Tsf', 'PrO₆_Oct', 'CaO₆_Oct',
'HgO₂_Lin', 'CrO₆_Oct', 'AgO₄_Tet', 'EuO₉_Tsf']
What I want is posted in the screenshot
I have hundreds of such strings in a list. For numbers, I have found many answers and I was able to apply them in my case as well. Is there a better way to do it for strings like these? Any help or pointers to similar problems would be great.

Use these!
SUB = str.maketrans("0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_", "₀₁₂₃₄₅₆₇₈₉ₐᵦ𝒸𝒹ₑfgₕᵢⱼₖₗₘₙₒₚqᵣₛₜᵤᵥwₓyz🇦🇧🇨🇩🇪🇫🇬🇭🇮🇯🇰🇱🇲🇳🇴🇵🇶🇷🇸🇹🇺🇻🇼🇽🇾🇿₋")
SUP = str.maketrans("0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_", "⁰¹²³⁴⁵⁶⁷⁸⁹ᵃᵇᶜᵈᵉᶠᵍʰⁱʲᵏˡᵐⁿᵒᵖᵠʳˢᵗᵘᵛʷˣʸᶻᵃᵇᶜᵈᵉᶠᵍʰⁱʲᵏˡᵐⁿᵒᵖᵠʳˢᵗᵘᵛʷˣʸᶻ‾")
Here's the code:
original = ['VO₆_Octa', 'FeO₄_Tet', 'FeO₆_Oct', 'BaO₉_Tsf', 'PrO₆_Oct', 'CaO₆_Oct',
            'HgO₂_Lin', 'CrO₆_Oct', 'AgO₄_Tet', 'EuO₉_Tsf']
SUB = str.maketrans("0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_", "₀₁₂₃₄₅₆₇₈₉ₐᵦ𝒸𝒹ₑfgₕᵢⱼₖₗₘₙₒₚqᵣₛₜᵤᵥwₓyz🇦🇧🇨🇩🇪🇫🇬🇭🇮🇯🇰🇱🇲🇳🇴🇵🇶🇷🇸🇹🇺🇻🇼🇽🇾🇿₋")
SUP = str.maketrans("0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_", "⁰¹²³⁴⁵⁶⁷⁸⁹ᵃᵇᶜᵈᵉᶠᵍʰⁱʲᵏˡᵐⁿᵒᵖᵠʳˢᵗᵘᵛʷˣʸᶻᵃᵇᶜᵈᵉᶠᵍʰⁱʲᵏˡᵐⁿᵒᵖᵠʳˢᵗᵘᵛʷˣʸᶻ‾")
new = []
for item in original:
    x = item.split('_')
    new.append(x[0] + "₋" + x[1].translate(SUB))
print(new)
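As you might have noticed, some letters don't actually convert to proper subscripts.
This is because Unicode doesn't contain a complete subscript (or superscript) alphabet: there are no subscript forms for uppercase letters, or for lowercase b, c, d, f, g, q, w, y and z.
I've used various online converters and could only get the conversions you see above (i.e. excluding lowercase b, c, d, f, g, q, w, y, z). Where no subscript exists, the table either leaves the letter unchanged or substitutes a look-alike: ᵦ is a Greek subscript beta, 𝒸 and 𝒹 are mathematical script letters, and the uppercase letters are mapped to regional indicator symbols, which are not subscripts at all.
However, in my opinion, the better way to do this would be to format the string in a markup language (HTML, LaTeX, etc.).
In HTML, you can simply wrap the text in <sub></sub> tags (or <sup></sup> for superscripts).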
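Following the markup suggestion, here is a minimal sketch that emits HTML instead of look-alike characters. It assumes the part to subscript is everything after the first underscore; `to_html` is a hypothetical helper name, and `str.partition` is used so that an item without an underscore passes through unchanged instead of raising IndexError the way `x[1]` would.

```python
import html

original = ['VO₆_Octa', 'FeO₄_Tet', 'FeO₆_Oct', 'HgO₂_Lin']

def to_html(item):
    """Wrap the text after the first '_' in <sub> tags (hypothetical helper).
    Items without an underscore are returned unchanged, apart from HTML escaping."""
    base, sep, suffix = item.partition('_')
    if not sep:
        return html.escape(base)
    return f"{html.escape(base)}<sub>{html.escape(suffix)}</sub>"

print([to_html(s) for s in original])
# ['VO₆<sub>Octa</sub>', 'FeO₄<sub>Tet</sub>', 'FeO₆<sub>Oct</sub>', 'HgO₂<sub>Lin</sub>']
print(to_html('H₂O'))  # H₂O
```

The browser then renders the suffix as a real subscript in any font, so no letter is missing.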

Related

I'm not sure if I have correctly converted a string to an integer?

I apologize if this post isn't formatted correctly; I've tried to make sure that it is, but this is my first time posting, so I imagine I may have gotten something wrong.
I'm a complete beginner learning Python 3 and am going through a Udemy course where we are currently learning about joins and splits.
A challenge was set to convert a string of numbers to integers. The code I have written returns the values as individual numbers; however, I am not sure whether these values were converted to integers or are still strings. The instructor of the course did not use the method that I have below, and his method returned a list of integers.
I used the isdigit() method to see whether or not 'digits' returned True, which it did. However, I feel as though I've gone wrong somewhere.
I imagine this is an extremely basic question, but any sort of clarity would be greatly appreciated!
numbers = "9,223,372,036,854,775,807"
for digits in numbers.split(","):
    print(int(digits))
Here's an alternative to Bryan Deng's answer:
num = '123,456,789'
num = num.replace(',', '') # replaces all commas with a '' empty string
# This way we don't have to convert to a list and then back to a string
print(num)
# 123456789
Yes, you are outputting numbers.
If you want to output a list of numbers, you can achieve it in two ways.
Create an empty list and append each number to it:
numbers = "9,223,372,036,854,775,807"
ans = []
for digits in numbers.split(","):
    ans.append(int(digits))
print(ans)
Use a list comprehension:
numbers = "9,223,372,036,854,775,807"
print([int(element) for element in numbers.split(",")])
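To settle the doubt in the question directly, a small sketch: `type()` and `isinstance()` report what each value actually is, whereas `isdigit()` is a `str` method that only tells you whether a string consists of digits (so a True result means you were still looking at a string). The name `values` is illustrative.

```python
numbers = "9,223,372,036,854,775,807"
values = [int(part) for part in numbers.split(",")]

# type() / isinstance() show what each converted value really is.
print(type(values[0]))                          # <class 'int'>
print(all(isinstance(v, int) for v in values))  # True

# isdigit() exists only on strings: it says whether a string *could* be converted.
print("036".isdigit())                # True
print(hasattr(values[3], "isdigit"))  # False: int objects have no isdigit()
print(values)                         # [9, 223, 372, 36, 854, 775, 807]
```

Note that the leading zero in "036" disappears once it is an int.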
I'm assuming you're asking to convert numbers from a string to an int using splits and joins. Here's one way to do it:
numbers = "123,456,789"
parts = numbers.split(",")  # split returns a new list -> ["123", "456", "789"]; numbers itself is unchanged
joined = "".join(parts)  # joins the pieces back into one string -> "123456789"
print(int(joined))
# >>> 123456789

Read certain part of a line/string

I'm trying to figure out how to read a certain part of a string using python, but I can't seem to figure it out, and nobody has the solution I'm looking for.
I have multiple lines formatted similarly to this:
1235:9875:0.1234
It's separated with colons, but the length of the line varies, so reading a fixed number of characters won't work.
Anyone have any idea how to do this? I really need to know this and I hope that this can help other people in the future.
Getting the values into a list as strings:
test_str = "1235:9875:0.1234"
number_str_arr = test_str.split(":") # ['1235', '9875', '0.1234']
Saving them as floats instead of strings (maybe what you want?)
number_arr = [float(num) for num in number_str_arr] # [1235.0, 9875.0, 0.1234]
How to access certain values:
first_num = number_arr[0] # 1235.0
last_num = number_arr[-1] # 0.1234
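Since the question mentions multiple lines, here is a minimal sketch applying the same split to every line. It uses `io.StringIO` as a stand-in for an open file; with a real file you would iterate over `open("data.txt")` instead, where "data.txt" is a hypothetical name. Blank lines are skipped.

```python
import io

# Stand-in for a file; a real file object iterates over lines the same way.
sample = io.StringIO("1235:9875:0.1234\n12:7:0.5\n\n")

rows = []
for line in sample:
    line = line.strip()  # drop the trailing newline and surrounding spaces
    if not line:         # skip blank lines
        continue
    rows.append([float(field) for field in line.split(":")])

print(rows)  # [[1235.0, 9875.0, 0.1234], [12.0, 7.0, 0.5]]
```

Each row can then be indexed as before, e.g. `rows[0][-1]` for the last value on the first line.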

Any way to split a Python string without generating new strings?

The input is a string containing a huge number of characters, and I hope to split this string into a list of strings with a special delimiter.
But I guess that simply using split would generate new strings rather than split the original input string itself, and in that case it would consume a lot of memory (it's guaranteed that the original string will not be used any longer).
So is there a convenient way to do this destructive split?
Here is the case:
input_string = 'data1 data2 <...> dataN'
output_list = ['data1', 'data2', <...> 'dataN']
What I hope is that data1 in output_list and data1 (and all the others) in input_string share the same memory area.
BTW, each input string is 10MB-20MB, and there are lots of them (about 100), so I guess memory consumption should be taken into consideration here?
In Python, strings are immutable. This means that any operation that changes the string will create a new string. If you are worried about memory (although this shouldn't be much of an issue unless you are dealing with gigantic strings), you can always overwrite the old string with the new, modified string, replacing it.
The situation you are describing is a little different though, because the input to split is a string and the output is a list of strings. They are different types. In this case, I would just create a new variable containing the output of split and then set the old string (that was the input to the split function) to None, since you guarantee it will not be used again.
Code:
split_str = input_string.split(delim)
input_string = None
The only alternative would be to access the substrings using slicing instead of split. You can use str.find to find the position of each delimiter. However, this would be slow and fiddly. If you can use split and let the original string drop out of scope, that is worth the effort.
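A minimal sketch of the str.find approach, assuming a single fixed delimiter; `iter_split` is a hypothetical name. In CPython each slice is still a new string object, so this does not share memory with the original; what it avoids is holding every piece in a list at the same time.

```python
def iter_split(s, delim):
    """Lazily yield the pieces of s separated by delim.
    Produces the same pieces as s.split(delim), one at a time."""
    if not delim:
        raise ValueError("empty separator")
    start = 0
    while True:
        end = s.find(delim, start)
        if end == -1:
            yield s[start:]  # last piece (may be empty, like str.split)
            return
        yield s[start:end]
        start = end + len(delim)

print(list(iter_split("data1 data2 data3", " ")))  # ['data1', 'data2', 'data3']
```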
You say that this string is input, so you might like to consider reading a smaller number of characters so you are dealing with more manageable chunks. Do you really need all the data in memory at the same time?
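If the data comes from a file or stream, a minimal sketch of the chunked approach could look like this. `iter_tokens` and the 1M-character default chunk size are assumptions for illustration; the stream here is an `io.StringIO`, but any text file object works. A token that straddles two chunks is carried over to the next read, and, unlike `str.split`, a trailing empty token is dropped.

```python
import io

def iter_tokens(stream, delim=" ", chunk_size=1 << 20):
    """Yield delim-separated tokens from a text stream, reading
    chunk_size characters at a time instead of the whole input."""
    if not delim:
        raise ValueError("empty separator")
    leftover = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parts = (leftover + chunk).split(delim)
        # The last piece may be cut off mid-token; keep it for the next round.
        leftover = parts.pop()
        yield from parts
    if leftover:
        yield leftover

print(list(iter_tokens(io.StringIO("data1 data2 data3"), " ", chunk_size=4)))
# ['data1', 'data2', 'data3']
```

Only one chunk plus the tokens currently being processed are in memory at once.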
Perhaps the Pythonic way would be to use iterators? That way, the new substrings will be in memory only one at a time. Based on "Splitting a string into an iterator":
import re
string_long = "my_string " * 100000000  # roughly 1 GB of text
# strings_split = string_long.split()  # the resulting list takes too much memory
strings_reiter = re.finditer(r"\S+", string_long)  # lazy: yields one match at a time
for match in strings_reiter:
    print(match.group())
This works fine without leading to memory problems.
If you're talking about strings that are SO huge that you can't stand to put them in memory, then maybe running through the string once (O(n); probably improvable using str.find, but I'm not sure) and then storing a generator over slice objects would be more memory-efficient?
long_string = "abc,def,ghi,jkl,mno,pqr"  # ad nauseam
splitters = [',']  # add whatever you want to split by
marks = [i for i, ch in enumerate(long_string) if ch in splitters]
slices = []
start = 0
for end in marks:
    slices.append(slice(start, end))
    start = end + 1
slices.append(slice(start, None))  # the piece after the last delimiter
split_string = (long_string[slice_] for slice_ in slices)

Does python have a better way to split a string than converting to a list?

I am testing a set of web APIs using Python, a language I am still in the process of learning. I take in a string, the name of a dealer, and chop off the end after a random number of characters. I then add a character (a wildcard) to the end of the string. That modified string is passed to an API that searches for the name of a dealer and can include wildcards. I have the code below, but it seems long. Is there a cleaner-looking or more Pythonic way of approaching this problem? Potentially a way to do this without converting from a string to a list and back to a string?
split_name = list(name) #turns name string into list
rand = random.randint(6,(len(split_name)-1)) #generates random number
split_name[rand:len(split_name)] = [] #breaks off end part of name list
srch_name = ''.join(split_name) #stringifies list
#Send request
rqst = requests.get(name_srch %(key, (srch_name + '*'))) #this adds * and sends the request
Name is earlier defined in the script to be some string, such as "Dave and Bills equipment sales and service, INC"
I should note I am using python 2.7
Yes, use slicing to keep a random number of characters from the start of the string; there's no need to split it into a list first:
rand = random.randint(6, len(name) - 1)
search_name = name[:rand] + '*'
rqst = requests.get(name_srch % (key, search_name))
Strings are sequences too and support slicing directly, without needing to turn them into a list first. You can omit either endpoint of a slice: name[:rand] starts from the beginning of the string, just as name[rand:] would run to the end.
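The same idea wrapped in a small, self-contained function, as a sketch; `make_wildcard_query` and its `min_len` parameter are hypothetical names. It avoids f-strings and annotations so it also runs on Python 2.7, and it guards against names too short for `random.randint(6, len(name) - 1)`, which raises ValueError when the upper bound is below 6.

```python
import random

def make_wildcard_query(name, min_len=6):
    """Keep a random-length prefix of name (at least min_len characters,
    always dropping at least one) and append the '*' wildcard."""
    if len(name) <= min_len:
        raise ValueError("name is too short to truncate")
    cut = random.randint(min_len, len(name) - 1)  # randint includes both ends
    return name[:cut] + '*'

name = "Dave and Bills equipment sales and service, INC"
print(make_wildcard_query(name))  # e.g. a prefix such as 'Dave and Bil*'
```

The request itself stays as in the answer: `requests.get(name_srch % (key, make_wildcard_query(name)))`.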

Breaking 1 String into 2 Strings based on special characters using python

I am working with Python and I am new to it. I am looking for a way to take a string and split it into two smaller strings. An example of the string is below:
wholeString = '102..109'
And what I am trying to get is:
a = '102'
b = '109'
The information will always be separated by two periods as shown above, but the part before and after them can be anywhere from 1 to 10 characters long. I am writing a loop that counts the characters before and after the periods and then makes a slice based on those counts, but I was wondering if someone knew a more elegant way.
Thanks!
Try this:
a, b = wholeString.split('..')
It'll put each value into the corresponding variables.
Look at the str.split method.
split_up = [s.strip() for s in wholeString.split("..")]
This code will also strip off leading and trailing whitespace so you are just left with the values you are looking for. split_up will be a list of these values.
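If some inputs might not contain '..' exactly once, `a, b = wholeString.split('..')` raises ValueError during unpacking. One alternative, sketched here, is `str.partition`, which splits at the first '..' only and always returns a 3-tuple; the error message is an illustrative choice.

```python
wholeString = '102..109'

# partition() always returns (before, separator, after);
# the separator is '' when '..' is not found.
a, sep, b = wholeString.partition('..')
if not sep:
    raise ValueError("expected 'start..end', got %r" % wholeString)

print(a, b)            # 102 109
print(int(a), int(b))  # 102 109, as integers if you need numbers
```

With an input such as '102..109..110', partition gives a = '102' and b = '109..110', whereas the split-and-unpack version would raise.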
