Slicing a string into a list [duplicate] - python

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
How do you split a list into evenly sized chunks in Python?
What is the most “pythonic” way to iterate over a list in chunks?
Say I have a string
s = '1234567890ABCDEF'
How can I slice (or maybe split is the correct term?) this string into a list consisting of strings containing 2 characters each?
desired_result = ['12', '34', '56', '78', '90', 'AB', 'CD', 'EF']
Not sure if this is relevant, but I'm parsing a string of hex characters and the final result I need is a list of bytes, created from the list above (for instance, by using int(desired_result[i], 16))

>>> bytes.fromhex('1234567890ABCDEF')  # Python 3
b'\x124Vx\x90\xab\xcd\xef'

You could use binascii (the output shown is from Python 2; in Python 3 unhexlify returns a bytes object):
>>> from binascii import unhexlify
>>> unhexlify(s)
'\x124Vx\x90\xab\xcd\xef'
Then:
>>> list(_)
['\x12', '4', 'V', 'x', '\x90', '\xab', '\xcd', '\xef']

>>> s = '1234567890ABCDEF'
>>> iter_s = iter(s)
>>> [a + next(iter_s) for a in iter_s]
['12', '34', '56', '78', '90', 'AB', 'CD', 'EF']
>>>

>>> s = '1234567890ABCDEF'
>>> [char0+char1 for (char0, char1) in zip(s[::2], s[1::2])]
['12', '34', '56', '78', '90', 'AB', 'CD', 'EF']
But, as others have noted, there are more direct solutions to the more general problem of converting hexadecimal numbers to bytes.
Also note that Robert King's solution is generally more efficient, as it has an essentially zero memory footprint (at the cost of less legible code).
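For reference, the slicing and conversion steps discussed above can be combined into one small sketch. It assumes the input has an even number of hex digits; the names pairs and values are illustrative only and do not come from any of the answers.

```python
s = '1234567890ABCDEF'

# Slice the string into two-character chunks.
pairs = [s[i:i + 2] for i in range(0, len(s), 2)]

# Convert each chunk to an integer, as described in the question.
values = [int(pair, 16) for pair in pairs]

# In Python 3, bytes.fromhex produces the same values in a single step.
assert values == list(bytes.fromhex(s))

print(pairs)
print(values)
```

If only the byte values are needed, bytes.fromhex is probably the simpler choice; the explicit slicing is useful mainly when the intermediate list of strings is wanted as well.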

Related

Regex [] vs () in Python with respect to re.split() [duplicate]

This question already has answers here:
Using alternation or character class for single character matching?
(3 answers)
Closed 2 years ago.
What is the difference between [,.] and (,|.) when used as a pattern in re.split(pattern, string)? Can someone please explain with respect to this example in Python:
import re
regex_pattern1 = r"[,\.]"
regex_pattern2 = r"(,|\.)"
print(re.split(regex_pattern1, '100,000.00')) #['100', '000', '00']
print(re.split(regex_pattern2, '100,000.00')) #['100', ',', '000', '.', '00']
[,\.] is equivalent to ,|\. as a standalone pattern.
(,|\.) is equivalent to ([,\.]).
() creates a capture, and re.split returns captured text as well as the text separated by the pattern.
>>> import re
>>> re.split(r'([,\.])', '100,000.00')
['100', ',', '000', '.', '00']
>>> re.split(r'(,|\.)', '100,000.00')
['100', ',', '000', '.', '00']
>>> re.split(r',|\.', '100,000.00')
['100', '000', '00']
>>> re.split(r'(?:,|\.)', '100,000.00')
['100', '000', '00']
>>> re.split(r'[,\.]', '100,000.00')
['100', '000', '00']
You might sometimes need (?:,|\.) to limit what is considered the operands of | when you embed it in a larger pattern, though.
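To illustrate that last point, here is a minimal sketch of how the scope of | changes once the alternation sits inside a larger pattern. The sample text and the names loose and tight are made up for this example.

```python
import re

text = 'a,b a.b a, .b'

# Without grouping, | splits the whole pattern: it matches "a," OR ".b".
loose = re.findall(r'a,|\.b', text)

# The non-capturing group limits the alternation to the separator only,
# so the pattern means "a", then "," or ".", then "b".
tight = re.findall(r'a(?:,|\.)b', text)

print(loose)  # ['a,', '.b', 'a,', '.b']
print(tight)  # ['a,b', 'a.b']
```

A character class, as in a[,.]b, would behave like the grouped version here, since it is a single element of the pattern.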

How to properly split this list of strings?

I have a list of strings such as this :
['z+2-44', '4+55+z+88']
How can I split these strings in the list so that the result would be something like
[['z','+','2','-','44'],['4','+','55','+','z','+','88']]
I have tried using the split method already however that splits the 44 into 4 and 4, and am not sure what else to try.
You can use regex:
import re
lst = ['z+2-44', '4+55+z+88']
[re.findall(r'\w+|\W+', s) for s in lst]
# [['z', '+', '2', '-', '44'], ['4', '+', '55', '+', 'z', '+', '88']]
\w+|\W+ matches a pattern that consists either of word characters (alphanumeric values in your case) or non word characters (+- signs in your case).
This also works, using itertools.groupby:
import itertools
z = ['z+2-44', '4+55+z+88']
print([["".join(x) for k,x in itertools.groupby(i,str.isalnum)] for i in z])
output:
[['z', '+', '2', '-', '44'], ['4', '+', '55', '+', 'z', '+', '88']]
It just groups the chars if they're alphanumerical (or not), just join them back in a list comprehension.
EDIT: the general case of a calculator with parenthesis has been asked as a follow-up question here. If z is as follows:
z = ['z+2-44', '4+55+((z+88))']
then with the previous grouping we get:
[['z', '+', '2', '-', '44'], ['4', '+', '55', '+((', 'z', '+', '88', '))']]
This is not easy to parse in terms of tokens. So a change would be to join the characters only if they are alphanumeric, leave them as a list if not, and flatten at the end using chain.from_iterable:
print([list(itertools.chain.from_iterable(["".join(x)] if k else x for k,x in itertools.groupby(i,str.isalnum))) for i in z])
which yields:
[['z', '+', '2', '-', '44'], ['4', '+', '55', '+', '(', '(', 'z', '+', '88', ')', ')']]
Note that the alternate regex answer can also be adapted like this: [re.findall(r'\w+|\W', s) for s in lst] (note the lack of + after \W).
Also, "".join(list(x)) is slightly faster than "".join(x), but I'll leave it to you to add that, to avoid making an already complex expression harder to read.
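The adapted regex mentioned above can be wrapped in a small helper, sketched here. The function name tokenize_expr is hypothetical, and the pattern is a slight variant of the one in the answer: it uses [^\w\s] instead of \W so that whitespace between tokens is dropped, which is an assumption about what the caller wants.

```python
import re

def tokenize_expr(expr):
    """Split expr into runs of word characters and single symbol characters.

    Whitespace is skipped rather than returned as a token.
    """
    return re.findall(r'\w+|[^\w\s]', expr)

print(tokenize_expr('z+2-44'))
print(tokenize_expr('4+55+((z+88))'))
print(tokenize_expr('4 + 55'))
```

Because each symbol is matched on its own, consecutive parentheses come out as separate tokens, which is what a calculator-style parser would normally expect.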
Alternative solution using re.split function:
import re
l = ['z+2-44', '4+55+z+88']
print([list(filter(None, re.split(r'(\w+)', i))) for i in l])
The output:
[['z', '+', '2', '-', '44'], ['4', '+', '55', '+', 'z', '+', '88']]
You could use only the built-in str.replace() and str.split() methods within a list comprehension:
In [34]: lst = ['z+2-44', '4+55+z+88']
In [35]: [s.replace('+', ' + ').replace('-', ' - ').split() for s in lst]
Out[35]: [['z', '+', '2', '-', '44'], ['4', '+', '55', '+', 'z', '+', '88']]
But note that this is not an efficient approach for longer strings. In that case the best way to go is using regex.
Another Pythonic option is the tokenize module:
In [56]: from io import StringIO
In [57]: import tokenize
In [59]: [[t.string for t in tokenize.generate_tokens(StringIO(i).readline) if t.string] for i in lst]
Out[59]: [['z', '+', '2', '-', '44'], ['4', '+', '55', '+', 'z', '+', '88']]
The tokenize module provides a lexical scanner for Python source code, implemented in Python. The scanner in this module returns comments as tokens as well, making it useful for implementing “pretty-printers,” including colorizers for on-screen displays.
If you want to stick with split (hence avoiding regex), you can provide it with an optional character to split on:
>>> testing = 'z+2-44'
>>> testing.split('+')
['z', '2-44']
>>> testing.split('-')
['z+2', '44']
So, you could whip something up by chaining the split commands.
However, using regular expressions would probably be more readable:
>>> import re
>>> re.split(r'\+|-', testing)
['z', '2', '44']
This just says "split the string at any + or - character" (the backslash is an escape character, needed because + has a special meaning in a regex; - does not need escaping outside a character class).
Lastly, in this particular case, I imagine the goal is something along the lines of "split at every non-alpha numeric character", in which case regex can still save the day:
>>> re.split('[^a-zA-Z0-9]', testing)
['z', '2', '44']
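For comparison, the chained-split idea mentioned earlier in this answer might look like the following sketch. It hard-codes the two delimiters + and -, and the name parts is illustrative.

```python
testing = 'z+2-44'

# Split on '+' first, then split every resulting chunk on '-'.
parts = [piece for chunk in testing.split('+') for piece in chunk.split('-')]

print(parts)  # ['z', '2', '44']
```

Like the re.split calls above, this discards the operators themselves, so it only suits cases where the delimiters are not needed afterwards.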
It is of course worth noting that there are a million other solutions, as discussed in some other SO discussions.
Python: Split string with multiple delimiters
Split Strings with Multiple Delimiters?
My answers here are targeted towards simple, readable code and not performance, in honor of Donald Knuth

How to use a dictionary to print its partner value

Not sure if the title is specific enough.
words = ['sense', 'The', 'makes', 'sentence', 'perfect', 'sense', 'now']
numbers = ['1', '2', '3', '4', '5', '6']
dictionary = dict(zip(numbers, words))
print(dictionary)
correctorder = ['2', '4', '7', '3', '5', '6']
I'm simply trying to figure out how exactly I can print specific values from the dictionary using the correctorder array so that the sentence makes sense.
You can just iterate over correctorder and get the corresponding dict value, then join the result together.
' '.join(dictionary[ele] for ele in correctorder)
This is assuming that you fix numbers to include '7' at the end.
>>> ' '.join(dictionary[ele] for ele in correctorder)
'The sentence now makes perfect sense'
What you want is this.
for i in correctorder:
    print(dictionary[i], end=" ")
Short and simple. As Mitch said, fix the 7 though.
You could use operator.itemgetter to avoid an explicit loop:
>>> from operator import itemgetter
>>> print(itemgetter(*correctorder)(dictionary))
To concatenate this simply use str.join:
>>> ' '.join(itemgetter(*correctorder)(dictionary))
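Putting the answers together, a complete runnable version could look like this sketch. It assumes that numbers is extended with '7', as the answers point out; the names joined and via_itemgetter are illustrative.

```python
from operator import itemgetter

words = ['sense', 'The', 'makes', 'sentence', 'perfect', 'sense', 'now']
# '7' is added here so that every key in correctorder exists in the dictionary.
numbers = ['1', '2', '3', '4', '5', '6', '7']
dictionary = dict(zip(numbers, words))
correctorder = ['2', '4', '7', '3', '5', '6']

# Look up each key in order and join the words with spaces.
joined = ' '.join(dictionary[key] for key in correctorder)

# itemgetter fetches all the values in one call and returns them as a tuple.
via_itemgetter = ' '.join(itemgetter(*correctorder)(dictionary))

print(joined)
```

Without the extra '7', zip stops at the shorter list, the word 'now' never enters the dictionary, and both lookups raise KeyError.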

trying to prepend zeroes in python but getting strange results

I am trying to take a list of strings, and prepend an amount of zeroes to the front so that they are all the same length. I have this:
def parity(binlist):
    print(binlist)
    for item in binlist:
        if len(item)==0:
            b='000'
        elif len(item)==1:
            b='00{}'.format(item)
        elif len(item)==2:
            b='0{}'.format(item)
        binlist.remove(item)
        binlist.append(b)
        return binlist
This is binlist:
['1', '10', '11', '11']
and I want to get this after running it:
['001', '010', '011', '011']
but I get this:
['10', '11', '11', '001']
which really confuses me.
Thanks for any help at all.
Try this:
>>> n = "7"
>>> print(n.zfill(3))
007
This way you will always get a string of at least 3 characters (exactly 3 if the number is less than 1000).
http://www.tutorialspoint.com/python/string_zfill.htm
The native string formatting operations allow you to do this without all the trouble you're putting in. Here's an example.
x = ['1', '10', '11', '11']
print(["{:>03s}".format(t) for t in x])
['001', '010', '011', '011']
This is caused because you are deleting the elements in the list while iterating through the list using a for loop. Doing so does not iterate over the full list. You can use a while loop to solve this problem.
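The skipping behaviour described above can be reproduced in isolation, and the while-loop suggestion can be sketched as an index-based update. The helper name pad_in_place and its width parameter are hypothetical, and zfill is borrowed from the other answers.

```python
# Removing items from a list while a for loop walks it makes the loop skip elements.
items = ['a', 'b', 'c', 'd']
visited = []
for item in items:
    visited.append(item)
    items.remove(item)
# visited is now ['a', 'c']: 'b' and 'd' were never seen by the loop.

def pad_in_place(binlist, width=3):
    """Zero-pad every string in binlist, updating the list by index."""
    i = 0
    while i < len(binlist):
        binlist[i] = binlist[i].zfill(width)
        i += 1
    return binlist

padded = pad_in_place(['1', '10', '11', '11'])
print(visited)
print(padded)
```

Assigning by index keeps the list the same length throughout, so no element is skipped and the original order is preserved.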
You can do this in a one-liner using zfill:
>>> list(map(lambda binlist_item: binlist_item.zfill(3), ['1', '10', '11', '11']))
['001', '010', '011', '011']
Fill with zeros for each item in the list
binlist = [i.zfill(3) for i in binlist]

How to split a string in Python by 2 or 3, etc [duplicate]

This question already has answers here:
Split string every nth character?
(19 answers)
How to iterate over a list in chunks
(39 answers)
Closed 9 years ago.
Does anyone know if it's possible in Python to split a string, not necessarily by a space or commas, but just by every other entry in the string? Or every 3rd or 4th, etc.?
For example, if I had "12345678" as my string, is there a way to split it into "12", "34", "56", "78"?
You can use list comprehension:
>>> x = "123456789"
>>> [x[i : i + 2] for i in range(0, len(x), 2)]
['12', '34', '56', '78', '9']
You can use a list comprehension. Iterate over your string and grab every two characters using slicing and the extra options in the range function.
s = "12345678"
print([s[i:i+2] for i in range(0, len(s), 2)]) # >>> ['12', '34', '56', '78']
What you want is the itertools grouper() recipe, which takes any arbitrary iterable and gives you groups of n items from that iterable:
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)
(Note that in 2.x, this is slightly different as zip_longest() is called izip_longest()!)
E.g:
>>> list(grouper("12345678", 2))
[('1', '2'), ('3', '4'), ('5', '6'), ('7', '8')]
You can then rejoin the strings with a simple list comprehension:
>>> ["".join(group) for group in grouper("12345678", 2)]
['12', '34', '56', '78']
If you might have less than a complete set of values, just use fillvalue="":
>>> ["".join(group) for group in grouper("123456789", 2, fillvalue="")]
['12', '34', '56', '78', '9']
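For plain strings, the slicing approach from the first answers generalizes to any chunk size. The following is a minimal sketch; the function name chunk_string and the choice to raise ValueError for sizes below 1 are assumptions, not part of any answer above.

```python
def chunk_string(text, size):
    """Return consecutive slices of text, each at most size characters long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [text[i:i + size] for i in range(0, len(text), size)]

print(chunk_string("12345678", 2))
print(chunk_string("12345678", 3))
```

Unlike grouper, this only works on sequences that support slicing, but it needs no fill value: a short final chunk is simply returned as is.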
