Breaking 1 String into 2 Strings based on special characters using python - python

I am working with Python and I am new to it. I am looking for a way to take a string and split it into two smaller strings. An example of the string is below:
wholeString = '102..109'
And what I am trying to get is:
a = '102'
b = '109'
The information will always be separated by two periods as shown above, but the number of characters before and after can range anywhere from 1 to 10. I am writing a loop that counts the characters before and after the periods and then makes a slice based on those counts, but I was wondering if there is a more elegant way that someone knows about.
Thanks!

Try this:
a, b = wholeString.split('..')
It assigns each value to the corresponding variable.

Look at the str.split method.

split_up = [s.strip() for s in wholeString.split("..")]
This code will also strip off leading and trailing whitespace so you are just left with the values you are looking for. split_up will be a list of these values.
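A minimal sketch that combines the answers above: split on the two periods and strip whitespace. It uses `str.partition` instead of `split`, which is not mentioned in the answers, so that a missing separator can be reported explicitly. The helper name `split_range` is made up for illustration.

```python
def split_range(whole_string, sep=".."):
    """Split 'left..right' into two stripped strings (hypothetical helper)."""
    # partition() always returns a 3-tuple: (before, separator, after).
    # The middle element is empty when the separator is not present.
    left, found, right = whole_string.partition(sep)
    if not found:
        raise ValueError(f"separator {sep!r} not found in {whole_string!r}")
    return left.strip(), right.strip()


a, b = split_range("102..109")
print(a, b)  # 102 109
```

Plain `a, b = wholeString.split('..')` is shorter and works just as well when the input is known to contain exactly one `..`.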

Related

Trying to sort two combined strings alphabetically without duplicates

Challenge: Take 2 strings s1 and s2 including only letters from a to z. Return a new sorted string, the longest possible, containing distinct letters - each taken only once - coming from s1 or s2.
# Examples
a = "xyaabbbccccdefww"
b = "xxxxyyyyabklmopq"
assert longest(a, b) == "abcdefklmopqwxy"
a = "abcdefghijklmnopqrstuvwxyz"
assert longest(a, a) == "abcdefghijklmnopqrstuvwxyz"
So I am just starting to learn, but so far I have this:
def longest(a1, a2):
    for letter in max(a1, a2):
        return ''.join(sorted(a1+a2))
which returns all the letters but I am trying to filter out the duplicates.
This is my first time on stack overflow so please forgive anything I did wrong. I am trying to figure all this out.
I also do not know how to indent in the code section if anyone could help with that.
You have two options here. The first is the answer you asked for, and the second is an alternative method.
To filter out duplicates, start with an empty string and go through the returned string. For each character, skip it if it is already in the new string; otherwise add it.
out = ""
for i in returned_string:
    if i not in out:
        out += i
return out
This would be embedded inside a function.
The second option is to use Python's sets. For this purpose you can think of them as lists with no duplicate elements. You could simplify your function to
def longest(a: str, b: str):
    return "".join(sorted(set(a).union(set(b))))
This makes a set from all the characters in a, and another from all the characters in b. The union combines them into a single set in which each letter appears once. Because sets are unordered, sorted() puts the letters in alphabetical order before they are joined into the final string. Hope this helps.
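The first option above is only a fragment, so here is a sketch of how it might look as a complete function. It keeps the asker's `sorted(a1 + a2)` step and the name `returned_string` from the answer. The parameter names `s1` and `s2` come from the challenge text.

```python
def longest(s1, s2):
    # Sort all letters from both inputs, duplicates still included.
    returned_string = "".join(sorted(s1 + s2))
    out = ""
    for ch in returned_string:
        # Keep a letter only the first time it is seen.
        if ch not in out:
            out += ch
    return out


a = "xyaabbbccccdefww"
b = "xxxxyyyyabklmopq"
print(longest(a, b))  # abcdefklmopqwxy
```

Because the input is sorted before the loop runs, the output stays in alphabetical order without a second sort.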

How to Filter Rows in a DataFrame Based on a Specific Number of Characters and Numbers

New Python user here, so please pardon my ignorance if my approach seems completely off.
I am having trouble filtering the rows of a column based on their character/number format.
Here's an example of the DataFrame and Series
df = {'a':[1,2,4,5,6], 'b':[7, 8, 9, 10], 'target':['ABC1234', 'ABC123', '123ABC', '7KZA23']}
The column I am looking to filter is the "target" column based on their character/number combos and I am essentially trying to make a dict like below
{'ABC1234': counts_of_format,
 'ABC123': counts_of_format,
 '123ABC': counts_of_format,
 'any_other_format': counts_of_format}
Here's my progress so far:
col = df['target'].astype('string')
abc1234_pat = '^[A-Z]{3}[0-9]{4}'
matches = re.findall(abc1234_pat, col)
I keep getting this error:
TypeError: expected string or bytes-like object
I've double-checked the dtype and it comes back as string. I've researched the TypeError and the only solution I can find is converting it to a string.
Any insight or suggestion on what I might be doing wrong, or if this is simply the wrong approach to this problem, will be greatly appreciated!
Thanks in advance!
I am trying to create a dict that returns how many times the different character/number combos occur. For example, how many times 3 characters followed by 4 numbers occurs, and so on.
(Your problem would have been earlier and easier understood had you stated this in the question post itself rather than in a comment.)
By characters, you mean letters; by numbers, you mean digits.
abc1234_pat = '^[A-Z]{3}[0-9]{4}'
Since you want to count occurrences of all character/number combos, using one concrete pattern will not get you very far. I suggest transforming the targets to a canonical form that serves as the key of your desired dict, e.g. substituting every letter with C and every digit with N (using your terms).
Of the many ways to tackle this, one is using str.translate together with a class which does the said transformation.
class classify():
    def __getitem__(self, key):
        # ord(None) raises TypeError for anything that is neither a letter nor a digit
        return ord('C' if chr(key).isalpha() else 'N' if chr(key).isdigit() else None)
occ = df.target.str.translate(classify()).value_counts()  # .to_dict()
Note that this will purposely raise an exception if target contains non-alphanumeric characters.
You can convert the resulting Series to a dict with .to_dict() if you like.
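For reference, the TypeError in the question comes from passing a whole Series to `re.findall`, which expects a single string. The canonical-form idea can also be sketched without pandas. This assumes the targets are available as a plain list, and uses a made-up helper name `letter_digit_format` together with `collections.Counter` in place of `value_counts()`.

```python
from collections import Counter


def letter_digit_format(target):
    """Map each letter to 'C' and each digit to 'N' (hypothetical helper)."""
    shape = []
    for ch in target:
        if ch.isalpha():
            shape.append("C")
        elif ch.isdigit():
            shape.append("N")
        else:
            # Mirror the answer's behavior: fail loudly on anything else.
            raise ValueError(f"unexpected character {ch!r} in {target!r}")
    return "".join(shape)


targets = ["ABC1234", "ABC123", "123ABC", "7KZA23"]
counts = Counter(letter_digit_format(t) for t in targets)
print(dict(counts))
# {'CCCNNNN': 1, 'CCCNNN': 1, 'NNNCCC': 1, 'NCCCNN': 1}
```

Every distinct letter/digit layout gets its own key, so there is no need to write one regular expression per format.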

How to get the first n characters of a string AND the last one?

I have this code, which does something with the first 10 characters of a string:
f_binary = f.encode(encoding='utf_8')[0:10]
but I want to do it with the 19th character as well. I tried it like this:
f_binary = f.encode(encoding='utf_8')[0:10],[19]
and this:
f_binary = f.encode(encoding='utf_8')[0:10,19]
but it doesn't work.
Python's list comprehension doesn't help me either because it doesn't show how to deal with a larger and a small part of a list or string at the same time.
Just use
f_binary = (f[0:10]+f[-1]).encode(encoding='utf_8')
to encode the first 10 and the last character of string f
Work on the string first and select the characters using
first_n_chars_and_last = (f[0:n], f[-1])
and only THEN turn the result into a bytes object.
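A small sketch of the "slice first, encode afterwards" advice from both answers. The function name `first_n_and_last` and the sample string are assumptions for illustration. It uses `text[-1:]` rather than `text[-1]` so that an empty string does not raise an IndexError.

```python
def first_n_and_last(text, n=10):
    """Return the first n characters plus the last one, UTF-8 encoded."""
    # Slice the str, not the bytes: a non-ASCII character can occupy
    # several bytes, so slicing after encoding could cut one in half.
    return (text[:n] + text[-1:]).encode("utf-8")


f = "abcdefghijklmnopqrstuvwxyz"
f_binary = first_n_and_last(f)
print(f_binary)  # b'abcdefghijz'
```

To pick a fixed position instead of the last character, `text[19:20]` could replace `text[-1:]` in the same expression.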

Read certain part of a line/string

I'm trying to figure out how to read a certain part of a string using python, but I can't seem to figure it out, and nobody has the solution I'm looking for.
I have multiple lines formatted similarly to this:
1235:9875:0.1234
It's separated with colons, but the length of each line varies, so reading a fixed number of characters won't work.
Anyone have any idea how to do this? I really need to know this and I hope that this can help other people in the future.
Getting the values into a list as strings:
test_str = "1235:9875:0.1234"
number_str_arr = test_str.split(":") # ['1235', '9875', '0.1234']
Saving them as floats instead of strings (maybe what you want?)
number_arr = [float(num) for num in number_str_arr] # [1235.0, 9875.0, 0.1234]
How to access certain values:
first_num = number_arr[0] # 1235.0
last_num = number_arr[-1] # 0.1234
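Since the question mentions multiple lines, the same steps might be wrapped in a small helper and applied to each line. The name `parse_line` and the second sample line are made up for illustration. `strip()` is added on the assumption that lines read from a file end with a newline.

```python
def parse_line(line, sep=":"):
    """Split one 'a:b:c' line into floats (hypothetical helper)."""
    # strip() removes the trailing newline before splitting on the separator.
    return [float(field) for field in line.strip().split(sep)]


lines = ["1235:9875:0.1234", "1:22:333.5\n"]
rows = [parse_line(line) for line in lines]
print(rows)  # [[1235.0, 9875.0, 0.1234], [1.0, 22.0, 333.5]]
```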

Extract a specific number from a string

I have this string 553943040 21% 50.83MB/s 0:00:39
The length of the numbers can vary
The percent can contain one or two numbers
The spaces between the start of the string and the first number may vary
I need to extract the first number, in this case 553943040
I was thinking that the method could be to:
1) Replace the percent with a separator. something like:
string=string.replace("..%","|") # where each "." represents any character, even a space.
2) Get the first part of the new string by cutting everything after the separator.
string=string.split("|")
string=string[0]
3) Remove the spaces.
string=string.strip()
I know that steps 2 and 3 work, but I'm stuck on the first. Also, if there is a better method of getting it, it would be great to know!
Too much work.
>>> '553943040 21% 50.83MB/s 0:00:39'.split()[0]
'553943040'
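The one-liner above also covers the varying leading spaces mentioned in the question, because `split()` with no arguments ignores leading and trailing whitespace. A short sketch follows. The extra spaces in the sample line and the conversion to `int` are assumptions about how the value is used.

```python
line = "   553943040  21%  50.83MB/s    0:00:39"

# split() with no arguments splits on any run of whitespace and drops
# leading/trailing whitespace, so the first field is the number.
first_field = line.split()[0]
first_number = int(first_field)
print(first_number)  # 553943040
```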
