I have a dataframe of 51 rows and 464 columns; the columns contain 1's and 0's. I want an encoded hex value as you see in the attached picture.
I was trying to use numpy to do the hex conversion, but it would fail:
df = pd.DataFrame(np.random.randint(0,2,size=(51, 464)))
# converting to numpy for easier shifting
a = df.values
b = a.dot(2**np.arange(a.shape[1])[::-1])
I want every 4 columns grouped to produce one hexadecimal digit, and if the column count is not a multiple of 4 (for example 463 instead of 464), then the trailing hexadecimal digit should be padded with as many zeroes as needed to make a full 4-bit group.
This code only works up to a length of 64 bits and then fails.
I was following this example: binary 0|1 to hex string.
Any suggestions on how to do this?
Doesn't this do what you want?
df.apply(lambda row: hex(int(''.join(map(str, row)), base=2)), axis=1)
Convert every number in the row to a string
Join them to create one big number in a string
Convert it to an integer with base 2 (since the row is in binary format)
Convert it to hex
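For example, a minimal sketch of those four steps on a tiny frame (the 2x4 shape is just for illustration):
import pandas as pd

df = pd.DataFrame([[1, 0, 1, 0],
                   [1, 1, 1, 1]])
# '1010' -> 10 -> '0xa'; '1111' -> 15 -> '0xf'
print(df.apply(lambda row: hex(int(''.join(map(str, row)), base=2)), axis=1))
# 0    0xa
# 1    0xf
# dtype: object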
Edit: To convert every 4-bit piece in the same manner:
def hexize(row):
    hexes = '0x'
    row = ''.join(map(str, row))
    for i in range(0, len(row), 4):
        value = row[i:i+4]
        value = value.ljust(4, '0')  # right fill with 0
        value = hex(int(value, base=2))
        hexes += value[2:]
    return hexes

df.apply(hexize, axis=1)
hexize('011101100') # returns '0x760'
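If row-wise apply is too slow for your real data, the same 4-bit grouping can be vectorized in NumPy; a minimal sketch that mirrors hexize's right-padding (the 51x463 shape is just the example from the question):
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 2, size=(51, 463)))
a = df.values
pad = -a.shape[1] % 4                                # zeros needed to reach a multiple of 4
a = np.pad(a, ((0, 0), (0, pad)), mode='constant')   # right-pad columns with zeros
nibbles = a.reshape(a.shape[0], -1, 4)               # group every 4 columns
digits = nibbles.dot([8, 4, 2, 1])                   # each 4-bit group -> a value 0..15
hexes = ['0x' + ''.join('0123456789abcdef'[d] for d in row) for row in digits]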
Given input data:
ECID,T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15,T16,T17,T18,T19,T20,T21,T22,T23,T24,T25,T26,T27,T28,T29,T30,T31,T32,T33,T34,T35,T36,T37,T38,T39,T40,T41,T42,T43,T44,T45,T46,T47,T48,T49,T50,T51
ABC123,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
XYZ345,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
DEF789,1,0,1,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
434thECID,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
This adds an "Encoded" column similar to what was asked. The first row example in the original question seems to have the wrong number of Fs:
import pandas as pd

def encode(row):
    s = ''.join(str(x) for x in row[1:])  # create the binary string
    s += '0' * (4 - len(row[1:]) % 4)     # make the length a multiple of 4 by adding zeros
    i = int(s, 2)                         # convert to integer, base 2
    h = hex(i).rstrip('0')                # strip trailing zeros
    return h if h != '0x' else '0x0'      # handle the special case of '0x0' stripping to '0x'

df = pd.read_csv('input.csv')
df['Encoded'] = df.apply(encode, axis=1)
print(df)
Output:
ECID T1 T2 T3 T4 T5 ... T47 T48 T49 T50 T51 Encoded
0 ABC123 1 1 1 1 1 ... 1 1 1 1 1 0xffffffffffffe
1 XYZ345 1 0 0 0 0 ... 0 0 0 0 0 0x8
2 DEF789 1 0 1 0 1 ... 0 0 0 0 0 0xaa
3 434thECID 0 0 0 0 0 ... 0 0 0 0 0 0x0
[4 rows x 53 columns]
I have a binary file in which the data is organised in 16 bit integer blocks like so:
bit 15: digital bit 1
bit 14: digital bit 2
bits 13 to 0: 14 bit signed integer
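For a single 16-bit word w, that layout works out to, for example (a plain-Python sketch, assuming two's complement for the 14-bit field):
w = 0xbffc                                  # example word: bits 10 + 11111111111100
db1 = w >> 15                               # digital bit 1 -> 1
db2 = (w >> 14) & 1                         # digital bit 2 -> 0
v = w & 0x3fff                              # low 14 bits
value = v - 0x4000 if v & 0x2000 else v     # sign-extend two's complement -> -4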
The only way I found to extract the data from the file into 3 arrays is:
data = np.fromfile("test1.bin", dtype=np.uint16)
digbit1 = data >= 2**15
data = np.array([x - 2**15 if x >= 2**15 else x for x in data], dtype=np.uint16)
digbit2 = data >= 2**14
data = np.array([x-2**14 if x >= 2**14 else x for x in data])
data = np.array([x-2**14 if x >= 2**13 else x for x in data], dtype=np.int16)
Now I know that I could do the same with a for loop over the original data and fill out 3 separate arrays, but that would still be ugly. What I would like to know is how to do this more efficiently, in the style of dtype=[('db', [('1', bit), ('2', bit)]), ('temp', 14bit-signed-int)], so that it would be easy to access like data['db']['1'] = array of ones and zeros.
Here's a way that is more efficient than your code because Numpy does the looping at compiled speed, which is much faster than using Python loops. And we can use bitwise arithmetic instead of those if tests.
You didn't supply any sample data, so I wrote some plain Python 3 code to create some fake data. I save that data to file in big-endian format, but that's easy enough to change if your data is actually stored in little-endian. I don't use numpy.fromfile to read that data because it's faster to read the file in plain Python and then convert the read bytes using numpy.frombuffer.
The only tricky part is handling those 14 bit signed integers. I assume you're using two's complement representation.
import numpy as np

# Make some fake data
bdata = []
bitlen = 14
mask = (1 << bitlen) - 1
for i in range(12):
    # Two initial bits
    a = i % 4
    # A signed number
    b = i - 6
    # Combine initial bits with the signed number,
    # using 14 bit two's complement.
    n = (a << bitlen) | (b & mask)
    # Convert to bytes, using 16 bit big-endian
    nbytes = n.to_bytes(2, 'big')
    bdata.append(nbytes)
    print('{} {:2} {:016b} {} {:>5}'.format(a, b, n, nbytes.hex(), n))
print()

# Save the data to a file
fname = 'test1.bin'
with open(fname, 'wb') as f:
    f.write(b''.join(bdata))

# And read it back in
with open(fname, 'rb') as f:
    data = np.frombuffer(f.read(), dtype='>u2')
print(data)

# Get the leading bits
digbit1 = data >> 15
print(digbit1)

# Get the second bits
digbit2 = (data >> 14) & 1
print(digbit2)

# Get the 14 bit signed integers
data = ((data & mask) << 2).astype(np.int16) >> 2
print(data)
Output:
0 -6 0011111111111010 3ffa 16378
1 -5 0111111111111011 7ffb 32763
2 -4 1011111111111100 bffc 49148
3 -3 1111111111111101 fffd 65533
0 -2 0011111111111110 3ffe 16382
1 -1 0111111111111111 7fff 32767
2 0 1000000000000000 8000 32768
3 1 1100000000000001 c001 49153
0 2 0000000000000010 0002 2
1 3 0100000000000011 4003 16387
2 4 1000000000000100 8004 32772
3 5 1100000000000101 c005 49157
[16378 32763 49148 65533 16382 32767 32768 49153 2 16387 32772 49157]
[0 0 1 1 0 0 1 1 0 0 1 1]
[0 1 0 1 0 1 0 1 0 1 0 1]
[-6 -5 -4 -3 -2 -1 0 1 2 3 4 5]
If you do need to use little-endian byte ordering, just change the dtype to '<u2' in the np.frombuffer call. And to test it, change 'big' to 'little' in the n.to_bytes call in the fake data making section.
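And if you really want the nested data['db']['1'] access style from the question, you could pack the three decoded arrays into a structured array afterwards. A minimal sketch, assuming the digbit1, digbit2 and data arrays computed above (NumPy has no 1-bit field type, so 'u1' stands in for the single bits):
rec = np.zeros(data.shape, dtype=[('db', [('1', 'u1'), ('2', 'u1')]), ('temp', 'i2')])
rec['db']['1'] = digbit1
rec['db']['2'] = digbit2
rec['temp'] = data
print(rec['db']['1'])   # array of ones and zeros, as requested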
I have a file which has many lines and each row would look like below:
10 55 19 51 2 9 96 64 60 2 45 39 99 60 34 100 33 71 49 13
77 3 32 100 68 90 44 100 10 52 96 95 36 50 96 39 81 25 26 13
Each line has numbers separated by spaces, and each line (row) is of a different length.
How can I find the average of each row?
How can I find Sum of all the row wise averages?
Preferred language: Python.
The code below does the task mentioned:
def rowAverageSum(filename):
    import numpy as np
    FullMean = 0
    li = [map(int, x) for x in [i.strip().split() for i in open(filename).readlines()]]
    i = 0
    while i < len(li):
        for k in li:
            print "Mean of row", i+1, ":", np.mean(k)
            FullMean += np.mean(k)
            i += 1
    print "***************************"
    print "Grand Average:", FullMean
    print "***************************"
Using two utility functions, words (to get the words in a line) and average (to get the average of a sequence of integers), I'd start with something like:
def words(s):
    return (w for w in s.strip().split())

def average(l):
    return sum(l) / len(l)

with open('input.txt') as f:
    averages = [average(map(int, words(line))) for line in f]
    total = sum(averages)
I like the total = sum(averages) part which very closely resembles your second requirement (the sum of all averages). :-)
I used map(int, words(line)) (to convert a list of strings to a list of integers) simply because it's shorter than [int(x) for x in words(line)] even though the latter would most certainly be considered to be "more Pythonic".
How about trying this in a short way?
avg_per_row = []
avg_all_row = 0
f1 = open("myfile")  # default mode is read
for line in f1:
    temp = line.split()
    avg = sum([int(x) for x in temp]) / len(temp)
    avg_per_row.append(avg)  # average per row
avg_all_row = sum(avg_per_row) / len(avg_per_row)  # average of all the averages
Very compressed, but should work for you
3 / 2 is 1 in Python 2, so if you want a float result you should convert to float:
float(3) / 2 is 1.5
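For example, in a Python 2 session:
>>> 3 / 2                   # integer (floor) division in Python 2
1
>>> float(3) / 2
1.5
>>> from __future__ import division
>>> 3 / 2                   # now true division, as in Python 3
1.5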
>>> s = '''10 55 19 51 2 9 96 64 60 2 45 39 99 60 34 100 33 71 49 13
... 77 3 32 100 68 90 44 100 10 52 96 95 36 50 96 39 81 25 26 13'''
>>> line_averages = []
>>> for line in s.splitlines():
...     line_averages.append(sum([float(ix) for ix in line.split()]) / len(line.split()))
...
>>> line_averages
[45.55, 56.65]
>>> sum(line_averages)
102.19999999999999
Or you can use reduce
>>> line_averages = []
>>> for line in s.splitlines():
...     line_averages.append(reduce(lambda x, y: int(x) + int(y), line.split()) / len(line.split()))
...
>>> line_averages
[45, 56]
>>> reduce(lambda x, y: int(x) + int(y), line_averages)
101
>>> f = open('yourfile')
>>> averages = [ sum(map(float,x.strip().split()))/len(x.strip().split()) for x in f ]
>>> averages
[45.55, 56.65]
>>> sum(averages)
102.19999999999999
>>> sum(averages)/len(averages)
51.099999999999994
strip removes the trailing '\n', split splits the line on whitespace and gives a list of number strings, map converts every one of them to float, and sum adds them all up.
If you don't understand the above code, see the following; it's the same as above, but expanded:
>>> f = open('ll.txt')
>>> averages = []
>>> for x in f:
...     x = x.strip()               # remove the newline character
...     x = x.split()               # split the line on whitespace, producing a list of number strings
...     x = [float(i) for i in x]   # convert every number to float
...     avg = sum(x) / len(x)       # calculate the average and store it in avg
...     averages.append(avg)        # append avg to the list of averages
...
>>> averages
[45.55, 56.65]
>>> sum(averages)/len(averages)
51.099999999999994