Using Python to break a continuous string into components?

This is similar to what I want to do: breaking a 32-bit number into individual fields
This is my typical "string" 00000000110000000000011000000000
I need to break it up into four equal parts:
00000000
11000000
00000110
00000000
I need to append the list to a new text file with the original string as a header.
I know how to split the string if there were separators such as spaces but my string is continuous.
These could be thought of as 32bit and 8bit binary numbers but they are just text in a text file (for now)!
I am brand new to programming in Python, so please give patient, detailed answers, no generalizations.
Do not assume I know anything.
Thank you,
Ralph

This should do what you want. See list comprehensions for more details. (range works in both Python 2 and 3; the Python 2-only xrange behaves the same here.)
>>> s = "00000000110000000000011000000000"
>>> [s[i:i+8] for i in range(0, len(s), 8)]
['00000000', '11000000', '00000110', '00000000']
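Since the asker wanted the steps spelled out, here is the same slicing idea wrapped in a small function. This is only a sketch: the name split_fixed and its parameters are made up for illustration and are not part of the answer above.

```python
def split_fixed(text, width):
    """Return consecutive pieces of `text`, each `width` characters long."""
    # range(0, len(text), width) yields 0, 8, 16, ... so every slice starts
    # exactly where the previous one ended.
    return [text[i:i + width] for i in range(0, len(text), width)]


parts = split_fixed("00000000110000000000011000000000", 8)
print(parts)
```

If the length is not a multiple of the width, the last piece is simply shorter, because slicing past the end of a string is not an error.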

+1 for Robert's answer. As for 'I need to append the list to a new text file with the original string as a header':
s = "00000000110000000000011000000000"
s += '\n' + '\n'.join(s[i:i+8] for i in range(0, len(s), 8))
will give
'00000000110000000000011000000000\n00000000\n11000000\n00000110\n00000000'
thus putting each 'byte' on a separate line as I understood from your question...
Edit: some notes to help you understand:
A list [] contains your data, in this case strings, between its brackets. The first item in a list is retrieved as in:
mylist[0]
In Python, a string is itself also an object, with specific methods that you can call. So '\n' (representing a newline) is an object of type str, and you can call its method join() with your list as the argument:
'\n'.join(mylist)
The elements in the list are then 'joined' together with the string '\n' in between each element. The result is no longer a list, but a string. Two strings can be added together, thus
s += '\n' + '\n'.join(mylist)
adds to s (which was already a string), the right part which is itself a 'sum' of strings.
(I hope that clears some things up?)
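The question also asks for the result to be appended to a text file. A minimal sketch of that step follows, assuming the header line and the four pieces should each go on their own line. The function name append_bytes and the file name bytes.txt are hypothetical, and a temporary directory is used only so the example can run anywhere.

```python
import os
import tempfile


def append_bytes(path, bits, width=8):
    """Append `bits` as a header line, then each `width`-character piece on its own line."""
    pieces = [bits[i:i + width] for i in range(0, len(bits), width)]
    # Mode "a" appends to the file and creates it if it does not exist yet.
    with open(path, "a") as out:
        out.write(bits + "\n")
        out.write("\n".join(pieces) + "\n")


# Hypothetical output location, used only for this demonstration.
demo_path = os.path.join(tempfile.mkdtemp(), "bytes.txt")
append_bytes(demo_path, "00000000110000000000011000000000")

with open(demo_path) as f:
    print(f.read())
```

Calling append_bytes again with another string adds a new header and its pieces below the existing ones.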

For reference, here are a few alternatives for splitting strings into equal-length parts:
>>> import re
>>> re.findall(r'.{1,8}', s, re.S)
['00000000', '11000000', '00000110', '00000000']
>>> list(map(''.join, zip(*[iter(s)]*8)))
['00000000', '11000000', '00000110', '00000000']
The zip method for splitting a sequence into n-length groups is described in the documentation for zip(), but it only works for strings whose length is evenly divisible by n (which won't be an issue for this particular question). If the string length is not evenly divisible by n, you could use itertools.izip_longest(*[iter(s)]*8, fillvalue='') (named itertools.zip_longest in Python 3). The list() call is only needed on Python 3, where map returns an iterator.
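For a string whose length is not a multiple of n, the zip_longest variant could look like the sketch below on Python 3. The helper name grouper is my own choice for illustration.

```python
from itertools import zip_longest


def grouper(text, n):
    """Split `text` into n-character groups; the last group may be shorter."""
    # [iter(text)] * n repeats the *same* iterator n times, so zip_longest
    # pulls n consecutive characters into each tuple. Missing characters in
    # the final tuple are filled with '' and vanish when joined.
    groups = zip_longest(*[iter(text)] * n, fillvalue="")
    return ["".join(group) for group in groups]


print(grouper("0000000011000000000001", 8))
```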

Strings, lists, and tuples can be sliced using the indexing operator [].
Using the : operator inside the indexing operator, you can extract the individual fields.
Try something like:
x = "00000000110000000000011000000000"
part1, part2, part3, part4 = x[:8], x[8:16], x[16:24], x[24:]

You need a substring:
x = "01234567"
x0 = x[0:2]
x1 = x[2:4]
x2 = x[4:6]
x3 = x[6:8]
So, x0 will hold '01', x1 will hold '23', etc.

Related

How to split a string and assign it to an array, adding some parts, in Python

I have a string that contains parts separated by commas. I need to add a suffix to each part and assign everything to an array of variables.
The string looks like:
chp_algos = 'AES256_SSE','AES128_CBC','AES64_CBC','AES33_CBC'
I want to put them in an array that looks like:
arr = [
[AES128_CBC],
[AES128_CBC_fer],
[AES128_SSE],
[AES128_SSE_fer],
[AES64_CBC],
[AES64_CBC_fer],
[AES33_CBC],
[AES33_CBC_fer]
]
I also want to map the following final result to the database:
f = 'AES128_CBC_fer AES128_SSE_fer AES64_CBC_fer AES33_CBC_fer'

As written in the question, chp_algos is a tuple, not a string, so it is already "split".
I'd recommend not using a list of lists. Just create a list of strings.
from itertools import chain
arr = list(chain.from_iterable([x, x + '_fer'] for x in chp_algos))
Output
['AES256_SSE',
'AES256_SSE_fer',
'AES128_CBC',
'AES128_CBC_fer',
'AES64_CBC',
'AES64_CBC_fer',
'AES33_CBC',
'AES33_CBC_fer']
With that, you can filter for the suffixed names (you could also skip arr entirely and build the result by concatenating '_fer' to the original string values):
f = ' '.join(x for x in arr if x.endswith('_fer'))

You can do this by sorting chp_algos and then using f-strings in a generator expression:
>>> ' '.join(f'{i}_fer' for i in sorted(chp_algos))
'AES128_CBC_fer AES256_SSE_fer AES33_CBC_fer AES64_CBC_fer'

Could you clarify your string chp_algos? As written, it is not a string in Python; it is parsed as a tuple of strings.
Anyway, what you can do in your case, assuming that chp_algos is a string of the form chp_algos= "'AES256_SSE','AES128_CBC','AES64_CBC','AES33_CBC'", then you can split the string into a list of strings via chp_algos.split(",").
The argument of split() is the delimiter which should be used to split the string.
Now you have something like array = ["'AES256_SSE'", "'AES128_CBC'", "'AES64_CBC'", "'AES33_CBC'"].
To get the array that you want you can just do a simple loop through your array:
arr = []
for element in array:
    arr.append([element])
    arr.append([element + "_fer"])
Now you might have some issues with the quotes (depending on what your data looks like). You can remove them by slicing element. To do this, just replace element in the code above with element[1:-1]. This removes the first and the last character of the string.
To get the f string in the very end, you can just loop through arr[1::2] which returns every 2nd element of arr starting at the second one (index 1).
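Putting the steps of this answer together, a sketch might look like this. It assumes chp_algos really is one string with quoted names, as supposed above. The variable name names is mine, and strip("'") is swapped in for the index-based quote removal.

```python
chp_algos = "'AES256_SSE','AES128_CBC','AES64_CBC','AES33_CBC'"

# Split on commas, then drop the surrounding single quotes from each piece.
names = [element.strip("'") for element in chp_algos.split(",")]

arr = []
for name in names:
    arr.append([name])
    arr.append([name + "_fer"])

# arr[1::2] is every second entry starting at index 1, i.e. the "_fer" ones.
f = " ".join(item[0] for item in arr[1::2])
print(f)
```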

Say we have a string:
s = 'AES256_SSE,AES128_CBC,AES64_CBC,AES33_CBC'
In order to replace commas with a space and append a suffix to each part:
' '.join(f'{p}_fer' for p in s.split(','))
As for an array:
def g(s):
    for part in s.split(','):
        yield part
        yield f'{part}_fer'
arr = [*g(s)]

How to partial split and take the first portion of string in Python?

I have a scenario where I want to split a string partially and pick up one portion of it.
Say the string could be like aloha_maui_d0_b0 or new_york_d9_b10. Note: after d comes a number, which can have any number of digits.
I want to strip everything before _d*, i.e. I want only _d0_b0 or _d9_b10.
I tried the code below, but obviously it removes the split term as well.
print(("aloha_maui_d0_b0").split("_d"))
#Output is : ['aloha_maui', '0_b0']
#But Wanted : _d0_b0
Is there any other way to get the partial portion? Do I need to use a regular expression?

How about just
stArr = "aloha_maui_d0_b0".split("_d")
st2 = '_d' + stArr[1]
This should do the trick if the string always has a '_d' in it

You can use index() to split in 2 parts:
s = 'aloha_maui_d0_b0'
idx = s.index('_d')
l = [s[:idx], s[idx:]]
# l = ['aloha_maui', '_d0_b0']
Edit: You can also use this if you have multiple _d in your string:
s = 'aloha_maui_d0_b0_d1_b1_d2_b2'
idxs = [n for n in range(len(s)) if n == 0 or s.find('_d', n) == n]
parts = [s[i:j] for i,j in zip(idxs, idxs[1:]+[None])]
# parts = ['aloha_maui', '_d0_b0', '_d1_b1', '_d2_b2']

I have two suggestions.
partition()
Use the method partition() to get a tuple containing the delimiter as one of the elements and use the + operator to get the String you want:
teste1 = 'aloha_maui_d0_b0'
partitiontest = teste1.partition('_d')
print(partitiontest)
print(partitiontest[1] + partitiontest[2])
Output:
('aloha_maui', '_d', '0_b0')
_d0_b0
The partition() method returns a tuple whose first element is what comes before the delimiter, whose second is the delimiter itself, and whose third is what comes after the delimiter.
The method acts only on the first occurrence of the delimiter it finds in the string, so you can't use it to split into more than 3 parts without extra work. For that, my second suggestion would be better.
replace()
Use the method replace() to insert an extra character (or characters) right before your delimiter (_d) and use these as the delimiter on the split() method.
teste2 = 'new_york_d9_b10'
replacetest = teste2.replace('_d', '|_d')
print(replacetest)
splitlist = replacetest.split('|')
print(splitlist)
Output:
new_york|_d9_b10
['new_york', '_d9_b10']
Since it replaces every occurrence of _d in the string with |_d, it can be used to split into more than two parts.
Problem?
One situation to be careful about is unwanted splits caused by _d appearing in more places than anticipated.
Following the apparent logic of your examples with city names and numbers, you might have something like this:
teste3 = 'rio_de_janeiro_d3_b32'
replacetest = teste3.replace('_d', '|_d')
print(replacetest)
splitlist = replacetest.split('|')
print(splitlist)
Output:
rio|_de_janeiro|_d3_b32
['rio', '_de_janeiro', '_d3_b32']
Assuming you always have the numerical on the end of the String and _d won't happen inside the numerical, rpartition() could be a solution:
rpartitiontest = teste3.rpartition('_d')
print(rpartitiontest)
print(rpartitiontest[1] + rpartitiontest[2])
Output:
('rio_de_janeiro', '_d', '3_b32')
_d3_b32
Since rpartition() starts the search on the String's end and only takes the first match to separate the terms into a tuple, you won't have to worry about the first term (city's name?) causing unexpected splits.
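The rpartition() idea can be wrapped in a small helper that also copes with strings that have no _d at all. This is a sketch: the name tail_from and the choice to return an empty string when the delimiter is missing are my own assumptions.

```python
def tail_from(text, delimiter="_d"):
    """Return the part of `text` starting at the last `delimiter`, or '' if absent."""
    head, sep, tail = text.rpartition(delimiter)
    # When the delimiter is missing, rpartition returns ('', '', text),
    # so sep is empty and there is nothing to keep.
    return sep + tail if sep else ""


print(tail_from("rio_de_janeiro_d3_b32"))
print(tail_from("aloha_maui_d0_b0"))
```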

Use the regex split's ability to keep delimiters:
import re
patre = re.compile(r"(_d\d)")
# note the surrounding parentheses: they are what makes split() keep the delimiter
for line in """aloha_maui_d0_b0 new_york_d9_b10""".split():
    parts = patre.split(line)
    print("\n", line)
    print(parts)
    p1, p2 = parts[0], "".join(parts[1:])
    print(p1, p2)
output:
aloha_maui_d0_b0
['aloha_maui', '_d0', '_b0']
aloha_maui _d0_b0
new_york_d9_b10
['new_york', '_d9', '_b10']
new_york _d9_b10
credit due: https://stackoverflow.com/a/15668433
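If only the trailing portion is needed, a search can replace the split. The sketch below is a different technique from the one above: it assumes the wanted part always starts at _d followed by at least one digit, and the names suffix_pattern and suffix are hypothetical.

```python
import re

# "_d", then one or more digits, then everything up to the end of the line.
suffix_pattern = re.compile(r"_d\d+.*")


def suffix(text):
    """Return the portion of `text` from the first '_d<digits>' on, or None."""
    match = suffix_pattern.search(text)
    return match.group() if match else None


for line in ["aloha_maui_d0_b0", "new_york_d9_b10", "rio_de_janeiro_d3_b32"]:
    print(line, "->", suffix(line))
```

Because the pattern requires a digit right after _d, the _de in rio_de_janeiro is skipped.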

Find Certain String Indices

I have this string and I need to get a specific number out of it.
E.G. encrypted = "10134585588147, 3847183463814, 18517461398"
How would I pull out only the second integer out of the string?

You are looking for the "split" method. Turn a string into a list by specifying a smaller part of the string on which to split.
>>> encrypted = '10134585588147, 3847183463814, 18517461398'
>>> encrypted_list = encrypted.split(', ')
>>> encrypted_list
['10134585588147', '3847183463814', '18517461398']
>>> encrypted_list[1]
'3847183463814'
>>> encrypted_list[-1]
'18517461398'
Then you can just access the indices as normal. Note that lists can be indexed forwards or backwards. By providing a negative index, we count from the right rather than the left, selecting the last index (without any idea how big the list is). Note this will produce IndexError if the list is empty, though. If you use Jon's method (below), there will always be at least one index in the list unless the string you start with is itself empty.
Edited to add:
What Jon is pointing out in the comment is that if you are not sure if the string will be well-formatted (e.g., always separated by exactly one comma followed by exactly one space), then you can replace all the commas with spaces (encrypted.replace(',', ' ')), then call split without arguments, which will split on any number of whitespace characters. As usual, you can chain these together:
encrypted.replace(',', ' ').split()
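If the second value is needed as an actual number rather than as text, the pieces can be passed through int(). A short sketch, where the names numbers and second are just for illustration:

```python
encrypted = "10134585588147, 3847183463814, 18517461398"

# Replace commas with spaces, split on whitespace, and convert each piece.
numbers = [int(token) for token in encrypted.replace(",", " ").split()]

second = numbers[1]  # index 1 is the second item, since indexing starts at 0
print(second)
```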

Python - Parse strings with variable repeating substring

I am trying to do something which I thought would be simple (and probably is), however I am hitting a wall. I have a string that contains document numbers. In most cases the format is ######-#-### however in some cases, where the single digit should be, there are multiple single digits separated by a comma (i.e. ######-#,#,#-###). The number of single digits separated by a comma is variable. Below is an example:
For the string below:
('030421-1,2-001 & 030421-1-002,030421-1,2,3-002, 030421-1-003')
I need to return:
['030421-1-001', '030421-2-001', '030421-1-002', '030421-1-002', '030421-2-002', '030421-3-002', '030421-1-003']
I have only gotten as far as returning the strings that match the ######-#-### pattern:
import re
p = re.compile(r'\d{6}-\d{1}-\d{3}')
m = p.findall('030421-1,2-001 & 030421-1-002,030421-1,2,3-002, 030421-1-003')
print(m)
Thanks in advance for any help!
Matt

Perhaps something like this:
>>> import re
>>> s = '030421-1,2-001 & 030421-1-002,030421-1,2,3-002, 030421-1-003'
>>> it = re.finditer(r'(\b\d{6}-)(\d(?:,\d)*)(-\d{3})\b', s)
>>> for m in it:
...     a, b, c = m.groups()
...     for x in b.split(','):
...         print(a + x + c)
...
030421-1-001
030421-2-001
030421-1-002
030421-1-002
030421-2-002
030421-3-002
030421-1-003
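Or using a list comprehension (re-create the iterator first, because the loop above has already exhausted it):
>>> it = re.finditer(r'(\b\d{6}-)(\d(?:,\d)*)(-\d{3})\b', s)
>>> [a+x+c for a, b, c in (m.groups() for m in it) for x in b.split(',')]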
['030421-1-001', '030421-2-001', '030421-1-002', '030421-1-002', '030421-2-002', '030421-3-002', '030421-1-003']

Use '\d{6}-\d(,\d)*-\d{3}'.
* means "as many as you want (0 included)".
It is applied to the previous element, here '(,\d)'.
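One caveat when trying this pattern with findall: if a pattern contains a capturing group, findall returns the group contents rather than the whole match. The sketch below therefore swaps in a non-capturing group, (?:,\d), which is my adjustment and not part of the answer above.

```python
import re

s = '030421-1,2-001 & 030421-1-002,030421-1,2,3-002, 030421-1-003'

# (?:...) groups without capturing, so findall returns each full match.
pattern = re.compile(r'\d{6}-\d(?:,\d)*-\d{3}')
matches = pattern.findall(s)
print(matches)
```

This finds every document number, including the ones with several comma-separated digits, but it does not expand them yet.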

I wouldn't use a single regular expression to try to parse this. Since it is essentially a list of strings, you might find it easier to replace the "&" with a comma globally in the string and then use split() to put the elements into a list.
Looping over the list lets you write a single function to parse and fix each item; you can then push the result onto a new list and display your string.
string = string.replace('&', ',')
initialList = string.split(',')
newList = []
for item in initialList:
    newItem = myfunction(item)  # myfunction: your own parse-and-fix routine
    newList.append(newItem)
newstring = ','.join(newList)

(\d{6}-)((?:\d,?)+)(-\d{3})
We use three capturing groups. The first and last parts are matched the easy way. The center part is a digit, optionally followed by a ',', repeated one or more times. A repeated capturing group only keeps its last repetition, so the inner group is made non-capturing with ?: and the outer group captures the whole run. What we're left with is the following result:
>>> p = re.compile(r'(\d{6}-)((?:\d,?)+)(-\d{3})')
>>> m = p.findall('030421-1,2-001 & 030421-1-002,030421-1,2,3-002, 030421-1-003')
>>> m
[('030421-', '1,2', '-001'), ('030421-', '1', '-002'), ('030421-', '1,2,3', '-002'), ('030421-', '1', '-003')]
You'll have to manually process the 2nd term to split them up and join them, but a list comprehension should be able to do that.
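The list comprehension mentioned here could look like the following sketch. The variable names prefix, digits, suffix and expanded are chosen for illustration.

```python
import re

p = re.compile(r'(\d{6}-)((?:\d,?)+)(-\d{3})')
s = '030421-1,2-001 & 030421-1-002,030421-1,2,3-002, 030421-1-003'

# For every (prefix, digits, suffix) tuple, emit one document number per digit.
expanded = [
    prefix + digit + suffix
    for prefix, digits, suffix in p.findall(s)
    for digit in digits.split(',')
]
print(expanded)
```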

Replacing recurring characters in strings in Python 3.1

Is it possible to replace a single character inside a string that occurs many times?
Input:
Sentence=("This is an Example. Thxs code is not what I'm having problems with.") #Example input
Sentence=("This is an Example. This code is not what I'm having problems with.") #Desired output
Replace the 'x' in "Thxs" with an i, without replacing the x in "Example".

You can do it by including some context:
s = s.replace("Thxs", "This")
Alternatively you can keep a list of words that you don't wish to replace:
import re
whitelist = ['example', 'explanation']
def replace_except_whitelist(m):
    s = m.group()
    if s in whitelist:
        return s
    return s.replace('x', 'i')
s = 'Thxs example'
result = re.sub(r"\w+", replace_except_whitelist, s)
print(result)
Output:
This example

Sure, but you essentially have to build up a new string out of the parts you want:
>>> s = "This is an Example. Thxs code is not what I'm having problems with."
>>> s[22]
'x'
>>> s[:22] + "i" + s[23:]
"This is an Example. This code is not what I'm having problems with."
For information about the notation used here, see a good primer on Python slice notation.

If you know which occurrence of x you want to replace (the first, the second, the third, or the last), you can combine str.find (or str.rfind if you wish to start from the end of the string) with slicing and str.replace. Call find with the character you wish to replace as many times as needed to reach the position just before the occurrence you want (for the specific sentence you suggest, just once), then slice the string in two and replace only one occurrence in the second slice.
An example is worth a thousand words, or so they say. In the following, I assume you want to substitute the (n+1)th occurrence of the character.
>>> s = "This is an Example. Thxs code is not what I'm having problems with."
>>> n = 1
>>> pos = 0
>>> for i in range(n):
...     pos = s.find('x', pos) + 1
...
>>> s[:pos] + s[pos:].replace('x', 'i', 1)
"This is an Example. This code is not what I'm having problems with."
Note that you need to add an offset to pos, otherwise you will replace the occurrence of x you have just found.
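The same idea can be packaged as a function. This is a sketch with a slightly different convention from the session above: the hypothetical replace_nth counts occurrences from 1 and returns the text unchanged when there are fewer than n occurrences.

```python
def replace_nth(text, old, new, n):
    """Replace the n-th occurrence (1-based) of `old` in `text` with `new`."""
    pos = -1
    for _ in range(n):
        # Search again, starting just after the previous hit.
        pos = text.find(old, pos + 1)
        if pos == -1:
            return text  # fewer than n occurrences: leave the text unchanged
    return text[:pos] + new + text[pos + len(old):]


s = "This is an Example. Thxs code is not what I'm having problems with."
print(replace_nth(s, "x", "i", 2))
```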
