how to reduce memory usage in kaggle for python code

import itertools
deck = ['AD', '2D', '3D', '4D', '5D', '6D', '7D', '8D', '9D', '10D', 'JD', 'QD', 'KD',
'AC', '2C', '3C', '4C', '5C', '6C', '7C', '8C', '9C', '10C', 'JC', 'QC', 'KC',
'AH', '2H', '3H', '4H', '5H', '6H', '7H', '8H', '9H', '10H', 'JH', 'QH', 'KH',
'AS', '2S', '3S', '4S', '5S', '6S', '7S', '8S', '9S', '10S', 'JS', 'QS', 'KS']
combinations = list(itertools.combinations(deck, 9))
I am trying to generate all of these combinations and then write them to a CSV file, but Kaggle gives me this error message:
Your notebook tried to allocate more memory than is available. It has restarted.

Don't build a list; write the combinations to the file one line at a time.
itertools.combinations creates an iterator that lets you go over each value without building a list that holds every value in memory.
You can use the csv module to write each combination as a line.
To avoid an empty line between rows, don't forget to pass newline='' to open: https://stackoverflow.com/a/3348664/6251742.
import csv
import itertools
deck = ['AD', '2D', '3D', '4D', '5D', '6D', '7D', '8D', '9D', '10D', 'JD', 'QD', 'KD',
'AC', '2C', '3C', '4C', '5C', '6C', '7C', '8C', '9C', '10C', 'JC', 'QC', 'KC',
'AH', '2H', '3H', '4H', '5H', '6H', '7H', '8H', '9H', '10H', 'JH', 'QH', 'KH',
'AS', '2S', '3S', '4S', '5S', '6S', '7S', '8S', '9S', '10S', 'JS', 'QS', 'KS']
combinations = itertools.combinations(deck, 9)
with open('combinations.csv', 'w', newline='') as file:
    writer = csv.writer(file, delimiter=',')
    for combination in combinations:
        writer.writerow(combination)
Result after some time:
AD,2D,3D,4D,5D,6D,7D,8D,9D
AD,2D,3D,4D,5D,6D,7D,8D,10D
AD,2D,3D,4D,5D,6D,7D,8D,JD
AD,2D,3D,4D,5D,6D,7D,8D,QD
AD,2D,3D,4D,5D,6D,7D,8D,KD
... # 3679075395 more lines, 98.3 GB
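Before generating anything, it can help to check how large the output will be. A minimal sketch (Python 3.8+, since it relies on math.comb) that counts the 9-card combinations without generating them, then writes only a small preview using itertools.islice; the file name preview.csv and the preview size of 5 are illustrative assumptions.

```python
import csv
import itertools
import math

# Same 52-card deck as above: every rank for each suit, suit by suit.
ranks = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
suits = ['D', 'C', 'H', 'S']
deck = [rank + suit for suit in suits for rank in ranks]

# Number of 9-card combinations, computed without generating any of them.
total = math.comb(len(deck), 9)
print(total)

# Write only the first few combinations to check the format (hypothetical preview file).
with open('preview.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    for combination in itertools.islice(itertools.combinations(deck, 9), 5):
        writer.writerow(combination)
```

The count is 3679075400, which matches the five lines shown plus the 3679075395 remaining ones; holding that many tuples in a list is what exhausts the notebook's memory.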

Related

Slicing Tuples in a List of Lists in Python

I have tuples in a list of lists and would like to extract only some elements in the tuple. Sample of the input data is below.
# input
[[('ab', 0.026412873688749918), ('dc', 0.016451082731822664), ('on', 0.014278088125928066),
('qc', 0.009752817881775656), ('mn', 0.008332886637563352), ('nt', 0.008250535392602258),
('nsw', 0.006874273287824427), ('bar', 0.005878684829852004), ('tor', 0.005741627328513831),
('wds', 0.004119216502907735)],
[('nb', 0.03053649661493629), ('ns', 0.01925207174326825), ('ham', 0.016207228280183325),
('bra', 0.013390785663058102), ('nia', 0.00878166482558038), ('knxr', 0.004648856466085521),
('nwm', 0.004463444159552605), ('md', 0.004377821331080258), ('ut', 0.004165890522922745),
('va', 0.0037484060754341083)]]
What I am trying to do is get the first items in the tuples.
# output
[['ab', 'dc', 'on', 'qc', 'mn', 'nt', 'nsw', 'bar', 'tor', 'wds'],
['nb', 'ns', 'ham', 'bra', 'nia', 'knxr', 'nwm', 'md', 'ut', 'va']]

input = [
[('ab', 0.026412873688749918), ('dc', 0.016451082731822664), ('on', 0.014278088125928066),
('qc', 0.009752817881775656), ('mn', 0.008332886637563352), ('nt', 0.008250535392602258),
('nsw', 0.006874273287824427), ('bar', 0.005878684829852004), ('tor', 0.005741627328513831),
('wds', 0.004119216502907735)],
[('nb', 0.03053649661493629), ('ns', 0.01925207174326825), ('ham', 0.016207228280183325),
('bra', 0.013390785663058102), ('nia', 0.00878166482558038), ('knxr', 0.004648856466085521),
('nwm', 0.004463444159552605), ('md', 0.004377821331080258), ('ut', 0.004165890522922745),
('va', 0.0037484060754341083)]
]
As illustrated in the comments you can use list comprehensions to achieve this:
[[idx for idx, val in x] for x in input]
# Result
[['ab', 'dc', 'on', 'qc', 'mn', 'nt', 'nsw', 'bar', 'tor', 'wds'],
['nb', 'ns', 'ham', 'bra', 'nia', 'knxr', 'nwm', 'md', 'ut', 'va']]
A more complex way to achieve this would be to use zip() to separate the first elements from the second elements of the tuples as shown below:
[('ab', 'dc', 'on', 'qc', 'mn', 'nt', 'nsw', 'bar', 'tor', 'wds'),
(0.026412873688749918,0.016451082731822664,0.014278088125928066,0.009752817881775656,0.008332886637563352,0.008250535392602258,0.006874273287824427,0.005878684829852004,0.005741627328513831,0.004119216502907735)]
This approach can be done using:
[list(list(zip(*x))[0]) for x in input]
# Result
[['ab', 'dc', 'on', 'qc', 'mn', 'nt', 'nsw', 'bar', 'tor', 'wds'],
['nb', 'ns', 'ham', 'bra', 'nia', 'knxr', 'nwm', 'md', 'ut', 'va']]

You can use a loop or a list comprehension to do this.
The input data is a list of lists of tuples. Access the first element of each tuple with tuple[0] and append it to a new list, like this:
input_data = [
[('ab', 0.026412873688749918), ('dc', 0.016451082731822664), ('on', 0.014278088125928066),
('qc', 0.009752817881775656), ('mn', 0.008332886637563352), ('nt', 0.008250535392602258),
('nsw', 0.006874273287824427), ('bar', 0.005878684829852004), ('tor', 0.005741627328513831),
('wds', 0.004119216502907735)],
[('nb', 0.03053649661493629), ('ns', 0.01925207174326825), ('ham', 0.016207228280183325),
('bra', 0.013390785663058102), ('nia', 0.00878166482558038), ('knxr', 0.004648856466085521),
('nwm', 0.004463444159552605), ('md', 0.004377821331080258), ('ut', 0.004165890522922745),
('va', 0.0037484060754341083)]
]
data_list = []
for x in input_data:
    d_list = []
    for y in x:
        d_list.append(y[0])
    data_list.append(d_list)
# Result...
[['ab', 'dc', 'on', 'qc', 'mn', 'nt', 'nsw', 'bar', 'tor', 'wds'],
['nb', 'ns', 'ham', 'bra', 'nia', 'knxr', 'nwm', 'md', 'ut', 'va']]
Using a list comprehension:
This is a shorthand for the for loop above, without the append() calls and the initial empty lists.
data_list = [ [y[0] for y in x] for x in input_data ]
# Result...
[['ab', 'dc', 'on', 'qc', 'mn', 'nt', 'nsw', 'bar', 'tor', 'wds'],
['nb', 'ns', 'ham', 'bra', 'nia', 'knxr', 'nwm', 'md', 'ut', 'va']]
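A further option, not taken from the answers above, is operator.itemgetter(0) combined with map. A minimal sketch on the first three tuples of each inner list from the question; the variable name data is an assumption, chosen to avoid shadowing the built-in input.

```python
from operator import itemgetter

# First three (label, score) tuples of each inner list from the question.
data = [
    [('ab', 0.026412873688749918), ('dc', 0.016451082731822664), ('on', 0.014278088125928066)],
    [('nb', 0.03053649661493629), ('ns', 0.01925207174326825), ('ham', 0.016207228280183325)],
]

first_item = itemgetter(0)  # callable that returns element 0 of its argument
labels = [list(map(first_item, row)) for row in data]
print(labels)  # [['ab', 'dc', 'on'], ['nb', 'ns', 'ham']]
```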

randomize with a condition

So in python, I have a list like so
['1a', '1b', '2a', '2b', '3a', '3b', '4a', '4b', '5a', '5b', '6a', '6b']
I was wondering if it is possible for me to randomize it so that there's no way for the values that share the same number (i.e. 1a and 1b) to be beside each other after I randomize them.
So for example the final list would come out like something like this:
['1a', '3b', '4b', '2a', '3a', '6a', '5a', '1b', '5b', '4a', '6b', '2b']
Thank you.

I don't know if this is the best approach, but it does what you want:
from random import shuffle
unsorted_ls = ['1a', '1b', '2a', '2b', '3a', '3b', '4a', '4b', '5a', '5b', '6a', '6b']
while True:
    shuffle(unsorted_ls)
    checker = False
    for i in range(1, len(unsorted_ls)):
        if unsorted_ls[i - 1][0] == unsorted_ls[i][0]:
            checker = True
            break
    if not checker:
        break
Each run of this code gives a different result.

One way to go about this is to construct a table-like dictionary that maps each item to the candidates that do not share its number:
{'1a': ['2a', '2b', '3a', '3b', '4a', '4b', '5a', '5b', '6a', '6b'],
'1b': ['2a', '2b', '3a', '3b', '4a', '4b', '5a', '5b', '6a', '6b'],
'2a': ['1a', '1b', '3a', '3b', '4a', '4b', '5a', '5b', '6a', '6b'],
'2b': ['1a', '1b', '3a', '3b', '4a', '4b', '5a', '5b', '6a', '6b'],
'3a': ['1a', '1b', '2a', '2b', '4a', '4b', '5a', '5b', '6a', '6b'],
'3b': ['1a', '1b', '2a', '2b', '4a', '4b', '5a', '5b', '6a', '6b'],
'4a': ['1a', '1b', '2a', '2b', '3a', '3b', '5a', '5b', '6a', '6b'],
'4b': ['1a', '1b', '2a', '2b', '3a', '3b', '5a', '5b', '6a', '6b'],
'5a': ['1a', '1b', '2a', '2b', '3a', '3b', '4a', '4b', '6a', '6b'],
'5b': ['1a', '1b', '2a', '2b', '3a', '3b', '4a', '4b', '6a', '6b'],
'6a': ['1a', '1b', '2a', '2b', '3a', '3b', '4a', '4b', '5a', '5b'],
'6b': ['1a', '1b', '2a', '2b', '3a', '3b', '4a', '4b', '5a', '5b']}
Call this object y. You can construct it like so:
x = ['1a', '1b', '2a', '2b', '3a', '3b', '4a', '4b', '5a', '5b', '6a', '6b']
y = {}
for a in x:
    y[a] = [i for i in x if not i.startswith(a[0])]
You can then pick from the values of each element based on the last-seen element:
import random
len_new = 10 # Desired length of new list
new = []
last_val = random.choice(list(y)) # Initial pick
for _ in range(len_new):
    last_val = random.choice(y[last_val])
    new.append(last_val)
Result:
>>> print(new)
['6b', '3b', '2a', '3a', '4a', '2a', '6b', '5a', '4b', '1b']
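Downsides:
Memory inefficiency. For a very large x, the y "table" grows quickly (roughly quadratically in the length of x). For small inputs such as yours, this is not an issue. You could cut down on this by not constructing the full y up front and instead building only the row you need at each iteration.
Sampling with replacement. Each pick is independent, so values can repeat while others are missing ('6b' and '2a' each appear twice in the result above); this does not produce a rearrangement of the original list.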
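Building only the needed row at each step might look like the sketch below. It keeps the same sampling-with-replacement behaviour as the y table (so values can repeat), and the helper name candidates is hypothetical.

```python
import random

x = ['1a', '1b', '2a', '2b', '3a', '3b', '4a', '4b', '5a', '5b', '6a', '6b']

def candidates(last, items):
    # Build only the row needed now: items whose leading character differs from `last`.
    return [i for i in items if not i.startswith(last[0])]

len_new = 10  # desired length of the new list
new = []
last_val = random.choice(x)  # initial pick
for _ in range(len_new):
    last_val = random.choice(candidates(last_val, x))
    new.append(last_val)
print(new)
```

Only one row of at most len(x) items exists at any time, at the cost of rescanning x on every pick.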

Try this:
from random import shuffle
def randomize(list_):
    shuffle(list_)
    for n, n_plus_1 in zip(list_, list_[1:]):
        if n[0] == n_plus_1[0]:
            return randomize(list_)
    return list_
list_ = ['1a', '1b', '2a', '2b', '3a', '3b', '4a', '4b', '5a', '5b', '6a', '6b']
print(randomize(list_))
output:
['2b', '1a', '4b', '3b', '5a', '3a', '6a', '2a', '6b', '4a', '1b', '5b']
The output will obviously be different every time, but adjacent items will never have the same leading number.

If a high percentage of the items share the same prefix, a backtracking approach is preferable. Note that a recursive implementation is not recommended here, as it would quickly hit the recursion depth limit for lists with a few hundred items.
from random import choice
def shuffle(A):
    result = []
    remaining = A.copy()
    while len(result)<len(A):
        eligible = [v for v in remaining if not result or v[0]!=result[-1][0]]
        if eligible: # pick from eligibles
            selected = choice(eligible)
            remaining.remove(selected)
            result.append(selected)
        else:
            remaining.append(result.pop(-1)) # backtrack
    return result
A = ['1a', '1b', '2a', '2b', '3a', '3b', '4a', '4b', '5a', '5b', '6a', '6b']
print(shuffle(A))
# ['5b', '6b', '3b', '6a', '2b', '1a', '2a', '4b', '3a', '4a', '5a', '1b']
On the other hand, if there are few common prefixes or the list is small, then a simpler trial-and-error approach on the whole list may be sufficient:
from random import sample
def shuffle(A):
    while True:
        result = sample(A,len(A))
        if all(a[0]!=b[0] for a,b in zip(result,result[1:])):
            return result
print(shuffle(A))
# ['6b', '2b', '6a', '3a', '4b', '1b', '5a', '1a', '2a', '3b', '4a', '5b']
In both cases, if no permutation meets the condition (e.g. A = ['1a','1b']), the functions never return, so you may want to add a validation for that:
from collections import Counter
def shuffle(A):
    maxFreq = max(Counter(a[0] for a in A).values())
    if maxFreq*2-1>len(A): return
    ...
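Putting the validation together with the trial-and-error version gives something like the following minimal sketch. The name safe_shuffle is hypothetical and, like the answer, it treats the first character as the prefix; it returns None when no valid arrangement exists.

```python
from collections import Counter
from random import sample

def safe_shuffle(A):
    if not A:
        return []
    # A valid arrangement exists exactly when the most common prefix
    # fits into every other slot: 2 * maxFreq - 1 <= len(A).
    maxFreq = max(Counter(a[0] for a in A).values())
    if maxFreq * 2 - 1 > len(A):
        return None
    # Trial and error: resample until no two neighbours share a prefix.
    while True:
        result = sample(A, len(A))
        if all(a[0] != b[0] for a, b in zip(result, result[1:])):
            return result

print(safe_shuffle(['1a', '1b']))  # None
print(safe_shuffle(['1a', '1b', '2a', '2b', '3a', '3b']))
```

The check guarantees termination, but when the counts are close to the limit a valid sample can be rare, so the backtracking version may then be the better fit.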

python file I/O with binary data

I'm extracting JPEG bytes from MP3 data; specifically, the album art.
I considered using a library called mutagen, but I'd like to work with the raw bits for practice.
import os
import sys
import re
f = open(sys.argv[1], "rb")
#sys.argv[1] gets mp3 file name ex) test1.mp3
saver = ""
for value in f:
    for i in value:
        hexval = hex(ord(i))[2:]
        if (ord(i) == 0):
            saver += "00" #to match with hex form
        else:
            saver += hexval
header = "ffd8"
tail = "ffd9"
This part of the code reads the MP3 as raw bytes, converts it to hex,
and looks for the embedded JPEG, which starts with "ffd8" and ends with "ffd9".
frontmatch = re.search(header,saver)
endmatch = re.search(tail, saver)
startIndex = frontmatch.start()
endIndex = endmatch.end()
jpgcontents = saver[startIndex:endIndex]
scale = 16 # equals to hexadecimal
numbits = len(jpgcontents) * 4 #log2(scale)
bitcontents = bin(int(jpgcontents, scale))[2:].zfill(numbits)
Here I take the hex digits between the header and the tail and convert them
to binary form, which is supposed to be the JPEG part of the MP3 file.
txtfile = open(sys.argv[1] + "_tr.jpg", "w")
txtfile.write(bitcontents)
Then I wrote the binary string to a new file with a .jpg extension.
(Sorry for the misleading name txtfile.)
But this code gives the following error:
Error interpreting JPEG image file
(Not a JPEG file: starts with 0x31 0x31)
I'm not sure whether the bits I extracted are wrong, whether the step that writes
the file is wrong, or whether there is some other problem in the code.
I'm working on Linux with Python 2.6. Is there anything wrong with
just writing a str of binary data as a JPG?

You are creating a string of ASCII zeroes and ones, i.e. \x30 and \x31, but the JPEG file needs to be proper binary data. So where your file should have a single byte of (for example) \xd8 you instead have these eight bytes: 11011000, or \x31\x31\x30\x31\x31\x30\x30\x30.
You don't need to do all that messy conversion stuff. You can just search directly for the desired byte patterns, writing them using \x hex escape sequences. And you don't even need regex: the simple string .index or .find methods can do this easily and quickly.
import sys
fname = sys.argv[1]
with open(fname, 'rb') as f:
    data = f.read()
header = "\xff\xd8"
tail = "\xff\xd9"
try:
    start = data.index(header)
    end = data.index(tail, start) + 2
except ValueError:
    print "Can't find JPEG data!"
    sys.exit(1)
print 'Start: %d End: %d Size: %d' % (start, end, end - start)
with open(fname + "_tr.jpg", 'wb') as f:
    f.write(data[start:end])
(Tested on Python 2.6.6)
However, extracting embedded JPEG data like this isn't foolproof, since it's possible that those header and tail byte sequences exist in the MP3 sound data.
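For Python 3, where a file opened in 'rb' mode yields bytes, the same search might look like this sketch. The function name extract_jpeg is hypothetical, and the caveat above about false matches applies equally.

```python
import sys

SOI = b'\xff\xd8'  # JPEG start-of-image marker
EOI = b'\xff\xd9'  # JPEG end-of-image marker

def extract_jpeg(data):
    """Return the bytes from the first SOI marker through the next EOI marker, or None."""
    start = data.find(SOI)
    if start == -1:
        return None
    end = data.find(EOI, start)
    if end == -1:
        return None
    return data[start:end + len(EOI)]

# Only touch files when a file name is given on the command line.
if __name__ == '__main__' and len(sys.argv) > 1:
    fname = sys.argv[1]
    with open(fname, 'rb') as f:
        jpeg = extract_jpeg(f.read())
    if jpeg is None:
        sys.exit("Can't find JPEG data!")
    with open(fname + '_tr.jpg', 'wb') as f:
        f.write(jpeg)
```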
FWIW, a simpler way to translate binary data to hex strings and back is to use hexlify and unhexlify from the binascii module.
Here are some examples of doing these transformations, both with and without the binascii functions.
from binascii import hexlify, unhexlify
#Create a string of all possible byte values
allbytes = ''.join([chr(i) for i in xrange(256)])
print 'allbytes'
print repr(allbytes)
print '\nhex list'
print [hex(ord(v))[2:].zfill(2) for v in allbytes]
hexstr = hexlify(allbytes)
print '\nhex string'
print hexstr
newbytes = ''.join([chr(int(hexstr[i:i+2], 16)) for i in xrange(0, len(hexstr), 2)])
print '\nNew bytes'
print repr(newbytes)
print '\nUsing unhexlify'
print repr(unhexlify(hexstr))
output
allbytes
'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
hex list
['00', '01', '02', '03', '04', '05', '06', '07', '08', '09', '0a', '0b', '0c', '0d', '0e', '0f', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '1a', '1b', '1c', '1d', '1e', '1f', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '2a', '2b', '2c', '2d', '2e', '2f', '30', '31', '32', '33', '34', '35', '36', '37', '38', '39', '3a', '3b', '3c', '3d', '3e', '3f', '40', '41', '42', '43', '44', '45', '46', '47', '48', '49', '4a', '4b', '4c', '4d', '4e', '4f', '50', '51', '52', '53', '54', '55', '56', '57', '58', '59', '5a', '5b', '5c', '5d', '5e', '5f', '60', '61', '62', '63', '64', '65', '66', '67', '68', '69', '6a', '6b', '6c', '6d', '6e', '6f', '70', '71', '72', '73', '74', '75', '76', '77', '78', '79', '7a', '7b', '7c', '7d', '7e', '7f', '80', '81', '82', '83', '84', '85', '86', '87', '88', '89', '8a', '8b', '8c', '8d', '8e', '8f', '90', '91', '92', '93', '94', '95', '96', '97', '98', '99', '9a', '9b', '9c', '9d', '9e', '9f', 'a0', 'a1', 'a2', 'a3', 'a4', 'a5', 'a6', 'a7', 'a8', 'a9', 'aa', 'ab', 'ac', 'ad', 'ae', 'af', 'b0', 'b1', 'b2', 'b3', 'b4', 'b5', 'b6', 'b7', 'b8', 'b9', 'ba', 'bb', 'bc', 'bd', 'be', 'bf', 'c0', 'c1', 'c2', 'c3', 'c4', 'c5', 'c6', 'c7', 'c8', 'c9', 'ca', 'cb', 'cc', 'cd', 'ce', 'cf', 'd0', 'd1', 'd2', 'd3', 'd4', 'd5', 'd6', 'd7', 'd8', 'd9', 'da', 'db', 'dc', 'dd', 'de', 'df', 'e0', 'e1', 'e2', 'e3', 'e4', 'e5', 'e6', 'e7', 'e8', 'e9', 'ea', 'eb', 'ec', 'ed', 'ee', 'ef', 'f0', 'f1', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7', 'f8', 'f9', 'fa', 'fb', 'fc', 'fd', 'fe', 'ff']
hex string
000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f404142434445464748494a4b4c4d4e4f505152535455565758595a5b5c5d5e5f606162636465666768696a6b6c6d6e6f707172737475767778797a7b7c7d7e7f808182838485868788898a8b8c8d8e8f909192939495969798999a9b9c9d9e9fa0a1a2a3a4a5a6a7a8a9aaabacadaeafb0b1b2b3b4b5b6b7b8b9babbbcbdbebfc0c1c2c3c4c5c6c7c8c9cacbcccdcecfd0d1d2d3d4d5d6d7d8d9dadbdcdddedfe0e1e2e3e4e5e6e7e8e9eaebecedeeeff0f1f2f3f4f5f6f7f8f9fafbfcfdfeff
New bytes
'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
Using unhexlify
'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
Note that this code needs some modifications to run on Python 3 (apart from converting the print statements to print function calls) because plain Python 3 strings are Unicode strings, not byte strings.
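In Python 3 the same round trip works on bytes objects; bytes.hex() (Python 3.5+) and bytes.fromhex() are built-in alternatives to hexlify and unhexlify. A minimal sketch:

```python
from binascii import hexlify, unhexlify

# All 256 byte values as a bytes object (no chr() needed in Python 3).
allbytes = bytes(range(256))

hexstr = hexlify(allbytes)  # b'000102...ff' (bytes, not str)
assert hexstr.decode('ascii') == allbytes.hex()

# Manual conversion back, two hex digits at a time; int() accepts bytes with a base.
newbytes = bytes(int(hexstr[i:i + 2], 16) for i in range(0, len(hexstr), 2))

print(newbytes == allbytes)                       # True
print(unhexlify(hexstr) == allbytes)              # True
print(bytes.fromhex(allbytes.hex()) == allbytes)  # True
```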

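You need to open the output file in binary mode.
Try:
txtfile = open(sys.argv[1] + "_tr.jpg", "wb")
Note that this alone is not enough: the data written must also be raw bytes, not a string of "0" and "1" characters.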

Oops, your code isn't doing what you expect: bin generates a string containing the value in binary form. Let's look at what you have:
saver is a string of hexadecimal characters in textual form, something like "313233414243" for an input of "123ABC". (Also note that hex(ord(i))[2:] gives a single digit for bytes 0x01 to 0x0f, so '%02x' % ord(i) is needed to keep every byte two digits wide.)
jpgcontents has the same format and starts with "ffd8" and ends with "ffd9".
You then apply the magic formula bin(int(jpgcontents, scale))[2:].zfill(numbits), which:
converts the hex string to a long integer,
converts the long integer to a binary representation string (for example, hex "ff" becomes the integer 255 and then the string "0b11111111"),
removes the leading "0b" and pads the start with zeros if needed.
bitcontents is therefore a string starting with "11111111....". Rename your file with a .txt extension and open it in a text editor: you will see a large file containing only the ASCII characters 0 and 1.
As the header is "ffd8", the file starts with ten "1" characters. That explains the error "starts with 0x31 0x31": 0x31 is the ASCII code of "1".
What you need is to convert the hex string jpgcontents into a string of raw bytes (Python 2):
fileimage = ''.join([chr(int(jpgcontents[i:i+2], 16)) for i in range(0, len(jpgcontents), 2)])
You can then safely write the fileimage buffer to a binary file:
with open(sys.argv[1] + "_tr.jpg", "wb") as f:
    f.write(fileimage)

The easiest method is to use the binascii module: https://docs.python.org/2/library/binascii.html.
import binascii
# hex codes as ASCII strings in a list
code = ['00', '01', '02', '03', '04', '05', '06', '07', '08', '09']
bfile = open('bfile.bin', 'wb')
for c in code:
    # convert the hex text to a raw byte and write it to the file
    bfile.write(binascii.unhexlify(c))
bfile.close()

make from a list the dictionary lists by key

I have a list:
['8C', '2C', 'QC', '5C', '7C', '3C', '6D', 'TD', 'TH', 'AS',
'QS', 'TS', 'JS', 'KS']
I need to get a dictionary something like this: (sorting is not important)
{'C': ['QC', '8C', '7C', '5C', '3C', '2C'],
'S': ['AS', 'KS', 'QS', 'JS', 'TS']
}
My code:
def parse_flush(cards):
    cards = sort_by_color(cards)
    flush_dic = {}
    print str(cards)
    count = 0
    pos = 0
    last_index = 0
    for color in colors:
        for i, card in enumerate(cards):
            if card[1] == color:
                count += 1
                last_index = i+1
                if count == 1:
                    pos = i
        if count >= 5:
            flush_dic[color] = sort_high_to_low(cards[pos:last_index])
        count = 0
    return flush_dic
It works, but I don't like its length. Is it possible to make it shorter using Python tricks?

You can use a simple collections.defaultdict to get the result you want:
from collections import defaultdict
data = ['8C', '2C', 'QC', '5C', '7C', '3C', '6D', 'TD', 'TH', 'AS', 'QS',
        'TS', 'JS', 'KS']
result = defaultdict(list)
for item in data:
    result[item[1]].append(item)
print dict(result)
Output
{'S': ['AS', 'QS', 'TS', 'JS', 'KS'],
'H': ['TH'],
'C': ['8C', '2C', 'QC', '5C', '7C', '3C'],
'D': ['6D', 'TD']}
You can solve this, using itertools.groupby as well
data = ['8C', '2C', 'QC', '5C', '7C', '3C', '6D', 'TD', 'TH', 'AS', 'QS',
'TS', 'JS', 'KS']
from itertools import groupby
from operator import itemgetter
keyFn = itemgetter(1)
print {k:list(grp) for k, grp in groupby(sorted(data, key = keyFn), keyFn)}
Explanation
sorted returns a sorted list of the items, using keyFn as the sort key.
groupby takes the sorted list and groups consecutive items by keyFn; here keyFn returns the second character of each item (the suit), which gives the same grouping as the output above.

Use a very simple for loop:
>>> l = ['8C', '2C', 'QC', '5C', '7C', '3C', '6D', 'TD', 'TH', 'AS',
... 'QS', 'TS', 'JS', 'KS']
>>> my_dict = {}
>>> for x in l:
...     my_dict.setdefault(x[-1],[]).append(x)
...
>>> my_dict
{'S': ['AS', 'QS', 'TS', 'JS', 'KS'], 'H': ['TH'], 'C': ['8C', '2C', 'QC', '5C', '7C', '3C'], 'D': ['6D', 'TD']}

data = ['8C', '2C', 'QC', '5C', '7C', '3C', '6D', 'TD', 'TH', 'AS', 'QS',
'TS', 'JS', 'KS']
dic = {}
for i in data:
    try:
        dic[i[1]].append(i)
    except KeyError:
        dic[i[1]] = []
        dic[i[1]].append(i)
print dic
Output
{'S': ['AS', 'QS', 'TS', 'JS', 'KS'],
'H': ['TH'],
'C': ['8C', '2C', 'QC', '5C', '7C', '3C'],
'D': ['6D', 'TD']}
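None of the answers above drops the suits with fewer than five cards, which the desired output in the question does. A sketch of that filter on top of defaultdict (Python 3); the function name group_flushes, the rank order string RANKS, the threshold of 5 and the high-to-low sort are assumptions based on the example output.

```python
from collections import defaultdict

RANKS = '23456789TJQKA'  # assumed rank order, lowest to highest

def group_flushes(cards, min_count=5):
    by_suit = defaultdict(list)
    for card in cards:
        by_suit[card[1]].append(card)  # second character is the suit
    # Keep only suits with enough cards, sorted from high rank to low.
    return {
        suit: sorted(group, key=lambda c: RANKS.index(c[0]), reverse=True)
        for suit, group in by_suit.items()
        if len(group) >= min_count
    }

data = ['8C', '2C', 'QC', '5C', '7C', '3C', '6D', 'TD', 'TH', 'AS',
        'QS', 'TS', 'JS', 'KS']
print(group_flushes(data))
# {'C': ['QC', '8C', '7C', '5C', '3C', '2C'], 'S': ['AS', 'KS', 'QS', 'JS', 'TS']}
```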

How to combine two lists into pairs and then make those pairs one element in list?

first=[1,2,3,4,5]
second=['a','b','c','d','e']
final=['1a','2a','3a','1b','2b','3b','1c','2c','3c']
I want to combine the two lists in Python, but I don't care about the order within each pair: I want '1a', but not also 'a1'.

>>> import itertools
>>> first=[1,2,3,4,5]
>>> second=['a','b','c','d','e']
>>> final = [''.join(str(i) for i in s) for s in itertools.product(first, second)]
>>> final
['1a', '1b', '1c', '1d', '1e', '2a', '2b', '2c', '2d', '2e', '3a', '3b', '3c', '3d', '3e', '4a', '4b', '4c', '4d', '4e', '5a', '5b', '5c', '5d', '5e']

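A simple list comprehension will work if you want to pair the lists element by element:
print([str(first[i])+second[i] for i in range(len(first))])
Note that this gives ['1a', '2b', '3c', '4d', '5e'], not every combination.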

final = list()
for i in first:
    for j in second:
        final.append(str(i)+j)

If you've only got two sequences to "multiply" like this, and your iteration is dead-simple, a nested loop in a comprehension is perfectly readable:
['{}{}'.format(a, b) for a in first for b in second]
If you have a longer, or dynamic, list, you want itertools.product, as in inspectorG4dget's answer.
If you have anything more complicated than just iterating over the product, you probably want explicit loop statements rather than a comprehension (or maybe factor part of it out into a generator function and use that with the nested comp or product call).

One way without using itertools, map or zip is:
first = [1, 2, 3, 4, 5]
second = ['a', 'b', 'c', 'd', 'e']
print [str(i) + j for i in first for j in second]
Output:
['1a', '1b', '1c', '1d', '1e', '2a', '2b', '2c', '2d', '2e', '3a', '3b', '3c', '3d', '3e', '4a', '4b', '4c', '4d', '4e', '5a', '5b', '5c', '5d', '5e']
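The sample final in the question lists the numbers varying fastest ('1a', '2a', '3a', '1b', ...) and uses only the first three items of each list. Assuming that ordering is wanted, putting the letters in the outer position of itertools.product produces it; the [:3] slices are only there to reproduce that sample (Python 3.6+ for f-strings).

```python
import itertools

first = [1, 2, 3, 4, 5]
second = ['a', 'b', 'c', 'd', 'e']

# Letters in the outer position, so the number changes fastest.
final = [f'{n}{c}' for c, n in itertools.product(second[:3], first[:3])]
print(final)  # ['1a', '2a', '3a', '1b', '2b', '3b', '1c', '2c', '3c']
```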
