I have a string, and I would like to split the numbers in it into pairs, with each pair in parentheses, in a single output list. How do I do that?
Here is what I have tried so far:
ip='MDSYS.SDO_GEOMETRY(2003, NULL, NULL, MDSYS.SDO_ELEM_INFO_ARRAY(1, 1003, 1), MDSYS.SDO_ORDINATE_ARRAY(22027, 22943.23, 22026, 22939, 22025, 22936, 22025.09, 22932, 22027, 22929, 22030, 22926)'
split_string_1 = "MDSYS.SDO_ORDINATE_ARRAY("
split_string_2 = ")"
data = list(map(int, ip.split(split_string_1)[1].split(split_string_2)[0].split(", ")))
result = list(zip(data[:-1], data[1:]))
I get an error saying ValueError: invalid literal for int() with base 10: '22943.23' How do I solve this?
Desired output:
[(22027, 22943.23), (22026, 22939), (22025, 22936), (22025.09, 22932), (22027, 22929), (22030, 22926)]
You can use rpartition twice with your splitting delimiters:
>>> out = ip.rpartition(split_string_1)[-1].rpartition(split_string_2)[0]
>>> out
'22027, 22943.23, 22026, 22939, 22025, 22936, 22025.09, 22932, 22027, 22929, 22030, 22926'
Then split on ", " and map to floats; finally, pair up the elements with zip (even-indexed and odd-indexed elements in parallel) to form the output:
>>> out = list(map(float, out.split(", ")))
>>> out = list(zip(out[::2], out[1::2]))
>>> out
[(22027.0, 22943.23),
(22026.0, 22939.0),
(22025.0, 22936.0),
(22025.09, 22932.0),
(22027.0, 22929.0),
(22030.0, 22926.0)]
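Both approaches above return floats throughout, while the desired output keeps whole numbers as int. A minimal sketch, assuming every coordinate is either an integer or a decimal literal, tries int first and falls back to float; the helper names to_number and parse_ordinates are hypothetical:

```python
def to_number(token):
    """Parse one coordinate, keeping whole numbers as int (hypothetical helper)."""
    token = token.strip()
    try:
        return int(token)
    except ValueError:
        # Decimal literals such as "22943.23" are not valid int() input
        return float(token)


def parse_ordinates(geom):
    """Pair up the values inside the last MDSYS.SDO_ORDINATE_ARRAY(...) (hypothetical helper)."""
    inner = geom.rpartition("MDSYS.SDO_ORDINATE_ARRAY(")[-1].rpartition(")")[0]
    values = [to_number(tok) for tok in inner.split(",")]
    # Even-indexed values are x, odd-indexed values are y
    return list(zip(values[::2], values[1::2]))


ip = ('MDSYS.SDO_GEOMETRY(2003, NULL, NULL, MDSYS.SDO_ELEM_INFO_ARRAY(1, 1003, 1), '
      'MDSYS.SDO_ORDINATE_ARRAY(22027, 22943.23, 22026, 22939, 22025, 22936, '
      '22025.09, 22932, 22027, 22929, 22030, 22926)')
print(parse_ordinates(ip))
```

Trying int first means a value like 22027 stays an int, while 22943.23 falls through to float, which reproduces the mixed types in the desired output.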
Use the regex pattern r'\([0-9., ]+\)', which matches every parenthesized group containing only integers/floats, and take the last match. Convert that string to a tuple with ast.literal_eval(), and finally get the list of pairs with list(zip(out[::2], out[1::2])).
import ast
import re
out = re.findall(r'\([0-9., ]+\)', ip)[-1]  # last purely numeric group: the ordinate array
out = ast.literal_eval(out)
out = list(zip(out[::2], out[1::2]))
print(out)
[(22027, 22943.23),
(22026, 22939),
(22025, 22936),
(22025.09, 22932),
(22027, 22929),
(22030, 22926)]
You almost have it: just replace int with float, because your data contains floating-point numbers, and zip the even- and odd-indexed elements so that the pairs don't overlap:
data = list(map(float, ip.split(split_string_1)[1].split(split_string_2)[0].split(", ")))
result = list(zip(data[::2], data[1::2]))
print(result)
>> [(22027.0, 22943.23), (22026.0, 22939.0), (22025.0, 22936.0), (22025.09, 22932.0), (22027.0, 22929.0), (22030.0, 22926.0)]
A comment on your splitting:
If you're guaranteed to have a string of that form, you can simply do the following:
1) Remove the trailing ")" character: ip = ip[:-1]
2) Split on "(" and take the last part: ip = ip.split("(")[-1]
3) Split the result on commas: ip = ip.split(",") (float() ignores the leading spaces this leaves)
ip = ip[:-1].split("(")[-1].split(",")
data = list(map(float, ip))
result = list(zip(data[::2], data[1::2]))
I suggest defining a function that does the grouping; here it is a generator:
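def each_slice(iterable, n=2):
    # Guard against n <= 0, which would otherwise loop forever
    if n < 1:
        n = 1
    i, size = 0, len(iterable)
    # Yield only full slices; leftover elements at the end are dropped
    while i < size - n + 1:
        yield iterable[i:i+n]
        i += n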
Once you have your list (leaving aside the conversion from string to number):
lst = ['22027', '22943.23', '22026', '22939', '22025', '22936', '22025.09', '22932', '22027', '22929', '22030', '22926']
You can then just call each_slice(lst):
print(list(each_slice(lst)))
#=> [['22027', '22943.23'], ['22026', '22939'], ['22025', '22936'], ['22025.09', '22932'], ['22027', '22929'], ['22030', '22926']]
Note that this implementation drops the remaining elements; for example, when grouping by five:
print(list(each_slice(lst, n=5)))
#=> [['22027', '22943.23', '22026', '22939', '22025'], ['22936', '22025.09', '22932', '22027', '22929']]
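A shorter alternative with the same cut-off behaviour passes one shared iterator to zip n times, so each output tuple consumes n consecutive items. The name chunks is hypothetical; unlike each_slice, it returns tuples and accepts any iterable, not just sequences:

```python
def chunks(iterable, n=2):
    """Group items into n-tuples, dropping an incomplete trailing group (hypothetical helper)."""
    it = iter(iterable)
    # [it] * n holds n references to ONE iterator, so zip pulls n consecutive items per tuple
    return list(zip(*[it] * n))


lst = ['22027', '22943.23', '22026', '22939', '22025', '22936',
       '22025.09', '22932', '22027', '22929', '22030', '22926']
print(chunks(lst))
print(chunks(lst, n=5))
```

zip stops as soon as the shared iterator runs out partway through a tuple, which is what discards the incomplete last group.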
Related
Given the following problem,
Input:
lis = ['0-10,000, 10,001-11,000, 11,001-12,000']
Output:
['0-10,000','10,001-11,000', '11,001-12,000']
Write a function that leaves a single range as it is, but splits the string into separate ranges if it contains multiple ranges.
Can anybody help me with this problem? I can't even think of an approach.
First join the list elements into a single string, then split that string on ", ".
lis = ['0-10,000, 10,001-11,000, 11,001-12,000']
print(''.join(lis).split(", "))
I have tried this:
lis = ['0-10,000, 10,001-11,000, 11,001-12,000','0-10,001, 10,001-11,000, 11,001-12,000','0-10,011, 10,001-11,000, 11,001-12,000']
def data_clean(x):
    v = []
    for i in range(len(x)):
        v.append(x[i].split(", "))
    return v
Here is the output of data_clean(lis):
[['0-10,000', '10,001-11,000', '11,001-12,000'],
['0-10,001', '10,001-11,000', '11,001-12,000'],
['0-10,011', '10,001-11,000', '11,001-12,000']]
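If the goal is a single flat list rather than one sub-list per input string, a minimal sketch (the name split_ranges is hypothetical) extends one result list instead of appending. A string holding a single range such as "0-10,000" contains a comma but no ", " (comma plus space), so it comes back as one element, which covers the single-range case from the question:

```python
def split_ranges(items):
    """Split every ", "-separated string of ranges and flatten the pieces (hypothetical helper)."""
    result = []
    for item in items:
        # "0-10,000" has no ", " separator, so a lone range stays intact
        result.extend(item.split(", "))
    return result


print(split_ranges(['0-10,000, 10,001-11,000, 11,001-12,000']))
print(split_ranges(['0-10,000']))
```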
Given a string like the one below,
"[xyx],[abc].[cfd],[abc].[dgr],[abc]"
how can I print it as shown below?
1.[xyx]
2.[cfd]
3.[dgr]
The original string will always maintain the above-mentioned format.
I did not realize you had both periods and commas; that makes it a bit trickier. You have to split on the periods too.
I would use something like this:
list_to_parse = "[xyx],[abc].[cfd],[abc].[dgr],[abc]"
count = 0
for i in list_to_parse.split('.'):
    # Each chunk looks like "[xyx],[abc]"; keep only the part before the comma
    j = i.split(',')[0]
    if j:
        count += 1
        print(str(count) + "." + j)
Another option is to split on the left bracket and re-add it when printing, stripping trailing commas and periods. This is probably also a tiny bit faster, as there is no loop inside a loop:
list_to_parse = "[xyx],[abc].[cfd],[abc].[dgr],[abc]"
# split('[') gives ['', 'xyx],', 'abc].', 'cfd],', ...]; [1::2] skips the empty first piece and every [abc]
for index, i in enumerate(list_to_parse.split('[')[1::2], 1):
    print(str(index) + ".[" + i.rstrip(',.'))
Also, the argument to rstrip is a set of characters to remove, not a specific pattern, so you can add any characters you want removed from the right; it keeps removing characters until it hits one that is not in the set. There are also lstrip() and strip().
String manipulation can always get tricky, so pay attention: splitting on '[' produces an empty first element, which is why the slice starts at index 1. Always practice and check what your data needs :D
You can use the split() method:
a = "[xyx],[abc].[cfd],[abc].[dgr],[abc]"
desired_strings = [i.split(',')[0] for i in a.split('.')]
for i, string in enumerate(desired_strings):
    print(f"{i+1}.{string}")
This is just a fun way to solve it:
lst = "[xyx],[abc].[cfd],[abc].[dgr],[abc]"
count = 1
var = 1
for char in range(0, len(lst), 6):
    if var % 2:
        print(f"{count}.{lst[char:char + 5]}")
        count += 1
    var += 1
output:
1.[xyx]
2.[cfd]
3.[dgr]
Explanation: "[" appears at indexes 0, 6, 12, etc. var is used to skip every other bracketed item, and count is the counter.
We can condense the above code by using a list comprehension and slicing instead of those flag variables, which is more Pythonic:
lst = "[xyx],[abc].[cfd],[abc].[dgr],[abc]"
lst = [lst[i:i+5] for i in range(0, len(lst), 6)][::2]
res = (f"{i}.{item}" for i, item in enumerate(lst, 1))
print("\n".join(res))
You can use a regex:
import re
pattern=r"(\[[a-zA-Z]*\])\,\[[a-zA-Z]*\]\.?"
results=re.findall(pattern, '[xyx],[abc].[cfd],[abc].[dgr],[abc]')
print(results)
Using re.findall:
import re
s = "[xyx],[abc].[cfd],[abc].[dgr],[abc]"
print('\n'.join(f'{i+1}.{x}' for i,x in
enumerate(re.findall(r'(\[[^]]+\])(?=,)', s))))
Output:
1.[xyx]
2.[cfd]
3.[dgr]
I have a string where I want to output random ints of differing sizes using Python's built-in format function.
For example: "{one_digit}:{two_digit}:{one_digit}"
Yields: "3:27:9"
I'm trying:
import random
"{one_digit}:{two_digit}:{one_digit}".format(one_digit=random.randint(1,9),two_digit=random.randint(10,99))
but both {one_digit} fields always get the same value:
"{one_digit}:{two_digit}:{one_digit}".format(one_digit=random.randint(1,9),two_digit=random.randint(10,99))
>>>'4:22:4'
"{one_digit}:{two_digit}:{one_digit}".format(one_digit=random.randint(1,9),two_digit=random.randint(10,99))
>>>'7:48:7'
"{one_digit}:{two_digit}:{one_digit}".format(one_digit=random.randint(1,9),two_digit=random.randint(10,99))
>>>'2:28:2'
"{one_digit}:{two_digit}:{one_digit}".format(one_digit=random.randint(1,9),two_digit=random.randint(10,99))
>>>'1:12:1'
This is expected, since the arguments are evaluated beforehand, but I'd like every field to be random. I tried using a lambda function, but only got this:
"test{number}:{number}".format(number=lambda x: random.randint(1,10))
But that only yields
"test{number}:{number}".format(number=lambda x: random.randint(1,10))
>>>'test<function <lambda> at 0x10aa14e18>:<function <lambda> at 0x10aa14e18>'
First off: str.format is the wrong tool for the job, because it doesn't allow you to generate a different value for each replacement.
The correct solution is therefore to implement your own replacement function. We'll replace the {one_digit} and {two_digit} format specifiers with something more suitable: {1} and {2}, respectively.
format_string = "{1}:{2}:{1}"
Now we can use regex to substitute all of these markers with random numbers. Regex is handy because re.sub accepts a replacement function, which we can use to generate a new random number every time:
import random
import re
def repl(match):
    # The number inside the braces is how many digits to generate
    num_digits = int(match.group(1))
    lower_bound = 10 ** (num_digits - 1)
    upper_bound = 10 * lower_bound - 1
    random_number = random.randint(lower_bound, upper_bound)
    return str(random_number)
# re.sub calls repl once per match, so every field gets a fresh number
result = re.sub(r'{(\d+)}', repl, format_string)
print(result)  # e.g. 5:56:1
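The same re.sub idea also works if you prefer to keep the descriptive field names from the question. In this sketch, the DIGITS mapping and the fill_random name are hypothetical, and an unknown field name raises KeyError:

```python
import random
import re

# Hypothetical mapping from the question's field names to digit counts
DIGITS = {"one_digit": 1, "two_digit": 2}


def fill_random(template):
    """Replace every {name} field with a fresh random number of DIGITS[name] digits."""
    def repl(match):
        num_digits = DIGITS[match.group(1)]
        lower = 10 ** (num_digits - 1)
        upper = 10 * lower - 1
        # re.sub calls repl once per match, so repeated names get independent values
        return str(random.randint(lower, upper))

    return re.sub(r"\{(\w+)\}", repl, template)


print(fill_random("{one_digit}:{two_digit}:{one_digit}"))
```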
How about this?
import random
r = [1,2,3,4,5]
','.join(map(str,(random.randint(-10**i,10**i) for i in r)))
The two arguments (-10**i and 10**i) are the lower and upper bounds, and the length of r determines how many numbers are generated.
Example output: '-8,45,-328,7634,51218'
Explanation:
It seems you are looking to join random numbers with ",". This can be done with ','.join(list_of_strings); for example, ','.join(['1','2']) returns '1,2'.
What about this?
'%s:%s:%s' % (random.randint(1,9),random.randint(10,99),random.randint(1,9))
EDIT: to meet the requirements:
import random
a = [1, 2, 2, 1, 3, 4, 5, 9, 0]  # our definition of the pattern: each j gives a number with j + 1 digits
b = ''
for j in a:
    x = random.randint(10**j, 10**(j+1) - 1)
    b = b + '%s:' % x
print(b)
sample:
print (b)
31:107:715:76:2602:99021:357311:7593756971:1:
I'm trying to delete a certain number of zeros from the right. For example:
"10101000000"
I want to remove 4 zeros and get:
"1010100"
I tried string.rstrip("0") and string.strip("0"), but these remove all of the zeros from the right. How can I do that?
The question is not a duplicate because I can't use imports.
You can use a regex
>>> import re
>>> mystr = "10101000000"
>>> numzeros = 4
>>> mystr = re.sub("0{{{}}}$".format(numzeros), "", mystr)
>>> mystr
'1010100'
This will leave the string as is if it doesn't end in four zeros.
You could also check and then slice:
if mystr.endswith("0" * numzeros):
    mystr = mystr[:-numzeros]
For a known number of zeros you can use slicing:
s = "10101000000"
zeros = 4
if s.endswith("0" * zeros):
    s = s[:-zeros]
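One edge case with the slicing approach: if the count can be 0, "0" * 0 is the empty string, every string ends with it, and s[:-0] is the same as s[:0], i.e. ''. A sketch that guards against this, still without imports (remove_trailing is a hypothetical name):

```python
def remove_trailing(s, char="0", count=4):
    """Remove exactly `count` trailing copies of `char`, else return s unchanged (hypothetical helper)."""
    if count > 0 and s.endswith(char * count):
        # Slice to len(s) - count rather than -count: s[:-0] would give '' instead of s
        return s[:len(s) - count]
    return s


print(remove_trailing("10101000000"))
print(remove_trailing("10101000000", count=0))
print(remove_trailing("10100"))
```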
rstrip deletes all characters from the end that are in the passed set of characters. To delete exactly four trailing zeros, you can do this instead:
s = s[:-4] if s[-4:] == "0"*4 else s
Here's my solution:
number = "10101000000"
def my_rstrip(number, char, count=4):
    # Remove at most `count` trailing occurrences of char
    for x in range(count):
        if number.endswith(char):
            number = number[0:-1]
        else:
            break
    return number
print(my_rstrip(number, '0', 4))
>>> s[:-4]+s[-4:].replace('0000','')
If the value is an int, don't forget to convert it to str first:
import re
a = 10101000000
re.sub("0000$","", str(a))
You can slice off the last 4 characters of the string this way (note that this does not check that they are zeros):
string[:-4]
Quick question. I'm trying to find or write an encoder in Python to shorten a string of numbers by using upper and lower case letters. The numeric strings look something like this:
20120425161608678259146181504021022591461815040210220120425161608667
The length is always the same.
My initial thought was to write some simple encoder to utilize upper and lower case letters and numbers to shorten this string into something that looks more like this:
a26Dkd38JK
That was completely arbitrary, just trying to be as clear as possible.
I'm certain that there is a really slick way to do this, probably already built in. Maybe this is an embarrassing question to even be asking.
Also, I need to be able to take the shortened string and convert it back to the longer numeric value.
Should I write something and post the code, or is this a one line built in function of Python that I should already know about?
Thanks!
This is a pretty good compression:
import base64
def num_to_alpha(num):
    # Minimal big-endian byte string for num (at least one byte, so 0 also works)
    num_bytes = num.to_bytes((num.bit_length() + 7) // 8 or 1, "big")
    return base64.b64encode(num_bytes).decode("ascii")
It first turns the integer into a bytestring and then base64 encodes it. Here's the decoder:
def alpha_to_num(alpha):
    num_bytes = base64.b64decode(alpha)
    return int.from_bytes(num_bytes, "big")
Example:
>>> num_to_alpha(20120425161608678259146181504021022591461815040210220120425161608667)
'vw4LUVm4Ea3fMnoTkHzNOlP6Z7eUAkHNdZjN2w=='
>>> alpha_to_num('vw4LUVm4Ea3fMnoTkHzNOlP6Z7eUAkHNdZjN2w==')
20120425161608678259146181504021022591461815040210220120425161608667
Here are two custom functions (not based on base64) that produce shorter output:
chrs = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = len(chrs)
def int_to_cust(i):
    result = ''
    while i:
        result = chrs[i % l] + result
        i = i // l
    if not result:
        result = chrs[0]
    return result
def cust_to_int(s):
    result = 0
    for char in s:
        result = result * l + chrs.find(char)
    return result
And the results are:
>>> int_to_cust(20120425161608678259146181504021022591461815040210220120425161608667)
'9F9mFGkji7k6QFRACqLwuonnoj9SqPrs3G3fRx'
>>> cust_to_int('9F9mFGkji7k6QFRACqLwuonnoj9SqPrs3G3fRx')
20120425161608678259146181504021022591461815040210220120425161608667
You can shorten the generated string further by adding more characters to the chrs variable, since that increases the base.
Or wrap it in a class:
VALID_CHRS = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
BASE = len(VALID_CHRS)
# Map each character to its digit value
MAP_CHRS = {char: value for value, char in enumerate(VALID_CHRS)}
class TinyNum:
    """Compact number representation in alphanumeric characters."""
    def __init__(self, n):
        result = ''
        while n:
            result = VALID_CHRS[n % BASE] + result
            n //= BASE
        if not result:
            result = VALID_CHRS[0]
        self.num = result
    def to_int(self):
        """Return the number as an int."""
        result = 0
        for char in self.num:
            result = result * BASE + MAP_CHRS[char]
        return result
Sample usage:
>>> n = 4590823745
>>> tn = TinyNum(n)
>>> print(n)
4590823745
>>> print(tn.num)
50GCYh
>>> print(tn.to_int())
4590823745
(Based on Tadeck's answer.)
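One caveat, assuming some of the fixed-length inputs might begin with 0 (the example in the question does not): converting through int drops leading zeros, so the decoder needs to know the original width. A sketch using the same 62-character alphabet as the answers above, with hypothetical names encode_digits and decode_digits:

```python
import string

# Same 62-character alphabet as above: digits, lowercase, uppercase
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)


def encode_digits(digits):
    """Encode a string of decimal digits as base-62 text (hypothetical helper)."""
    n = int(digits)
    out = ""
    while n:
        n, rem = divmod(n, BASE)
        out = ALPHABET[rem] + out
    return out or ALPHABET[0]


def decode_digits(text, width):
    """Decode base-62 text to a decimal string padded to `width` digits (hypothetical helper)."""
    n = 0
    for ch in text:
        # index() raises ValueError for characters outside the alphabet
        n = n * BASE + ALPHABET.index(ch)
    # zfill restores any leading zeros lost in the int round trip
    return str(n).zfill(width)


s = "20120425161608678259146181504021022591461815040210220120425161608667"
print(encode_digits(s))
```

Because the question says the length is always the same, the width can be a fixed constant rather than something stored alongside the encoded text.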
>>> s="20120425161608678259146181504021022591461815040210220120425161608667"
>>> import base64, zlib
>>> base64.b64encode(zlib.compress(s))
'eJxly8ENACAMA7GVclGblv0X4434WrKFVW5CtJl1HyosrZKRf3hL5gLVZA2b'
>>> zlib.decompress(base64.b64decode(_))
'20120425161608678259146181504021022591461815040210220120425161608667'
So zlib isn't very smart at compressing strings of digits: the result is only 8 characters shorter than the 68-digit input :(