I am trying to convert a function from C# to Python.
My C# code:
static string Base64Encode(string plainText)
{
char[] arr = plainText.ToCharArray();
List<byte> code16 = new List<byte>();
int i = 1;
string note = "";
foreach (char row in arr)
{
if (i == 1)
{
note += "0x" + row;
}
else if (i == 2)
{
note += row;
code16.Add(Convert.ToByte(note, 16));
note = "";
i = 0;
}
i++;
}
return System.Convert.ToBase64String(code16.ToArray());
}
My Python code:
import base64
def Base64Ecode(plainText):
code16 = []
i = 1
note = ''
for row in plainText:
if i == 1:
note += '0x' + row
elif i == 2:
note += row
code16.append(int(note, 16))
note = ''
i = 0
i += 1
test = ''
for blah in code16:
test += chr(blah)
print(base64.b64encode(test.encode()))
Both code16 lists contain the same values, but I have an issue when I try to Base64-encode the data.
C# encodes a byte array, but in Python I build a string and encode that, and I am getting two different results.
str.encode() uses UTF-8 by default, which turns every character from chr(0x80) to chr(0xFF) into two bytes, so the result no longer matches code16.
Use test.encode("latin1") instead: Latin-1 maps each of those characters to a single byte from 00 to FF.
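A small illustration of the difference, using a few hypothetical byte values in place of the question's data:

```python
import base64

# Hypothetical byte values standing in for code16 (0xC3 is above 0x7F).
code16 = [0x48, 0xC3, 0x7F]
text = "".join(chr(b) for b in code16)

utf8_bytes = text.encode()            # UTF-8: chr(0xC3) becomes 2 bytes
latin1_bytes = text.encode("latin1")  # Latin-1: 1 byte per character

print(len(utf8_bytes), len(latin1_bytes))  # 4 3
print(base64.b64encode(latin1_bytes) == base64.b64encode(bytes(code16)))  # True
```

bytes(code16) builds the same bytes directly, without going through a string at all.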
That said, there is an easier way in Python to convert a hex string to a bytes object:
base64.b64encode(bytes.fromhex(plainText))
gives the same result as your function.
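As a sketch, the whole C# function could then shrink to the following (the function name is my own; it returns a str, like the C# version returns a string):

```python
import base64

def base64_encode_hex(plain_text: str) -> str:
    """Hex string -> raw bytes -> Base64 text, mirroring the C# Base64Encode."""
    raw = bytes.fromhex(plain_text)  # "48656c6c6f" -> b"Hello"
    return base64.b64encode(raw).decode("ascii")

print(base64_encode_hex("48656c6c6f"))  # SGVsbG8=
```

One difference to keep in mind: the C# loop silently drops a trailing odd hex digit, while bytes.fromhex raises ValueError for an odd-length string.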
I have a CSV file with 3 columns; the third column contains the type of data that the row is (Training/PublicTest/PrivateTest). To get that data I split each line on commas (see the code below), then I use an if statement to check whether the usage equals a keyword and, if it does, do some stuff to that row. For some reason it won't detect the usage, even though every row is labeled with a usage and the spelling/capitalization is correct. What should I do?
EDIT
Since this CSV file is for a machine learning model I'm currently working on, it is huge, and it is much easier if I provide the link: https://www.kaggle.com/ashishbansal23/emotion-recognition. As for the comments about printing "usage", etc.: I did that, and the output was just the word Training, PublicTest, or PrivateTest.
for row in open(path):
idx = 0
real_idx = idx + 1
with open(path, "r") as c:
emotion, image, usage = c.readlines()[real_idx].split(",")
if usage == "Training\n":
train_labels.append(int(emotion))
imageArr = []
imageArr.append(image)
train_images.append(imageArr)
elif usage == "PublicTest\n" or usage == "PrivateTest\n":
test_labels.append(int(emotion))
imageArr = []
imageArr.append(image)
test_images.append(imageArr)
else:
print("This row was not assigned to any usage!")
idx += 1
def load_data():
return train_images, train_labels, test_images, test_labels
format_data(path)
load_data()
Hi there.
I would read the CSV file into a string[] like this...
string[] lines = File.ReadAllLines(csvPath); // read all CSV lines (csvPath = path to your file)
int limit = lines.Length; // no. of rows
Determine the no. of columns based on the number of "," in the first row:
System.Text.RegularExpressions.Regex rex = new System.Text.RegularExpressions.Regex(",");
string fs = lines[0]; // first string in lines
int n = rex.Matches(fs).Count; // how many "," in the line
Next, get the number of rows and cols...
int rows = limit; // no. of rows
int cols = n + 1; // no. of cols (one more than the number of commas)
string[,] table = new string[rows, cols]; // the CSV table [rows, cols]
Then break each row into separate values and store them in the table; that way you can check every item independently:
for (int j = 0; j < rows; j++)
{
string ts = lines[j].Trim();
for (int k = 0; k < cols; k++) // break the line into its different parts
{
int p = ts.IndexOf(','); // position of the next comma
if (p == -1)
{
// last field: take what is left, or insert an X if there is no data
table[j, k] = (ts != "") ? ts.Trim().ToUpper() : "X";
ts = "";
continue;
}
table[j, k] = ts.Substring(0, p).Trim().ToUpper();
ts = ts.Substring(p + 1); // drop the field just stored
}
}
Now you have a [rows, cols] array of separate values that you can check one by one.
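Since the question itself is Python, here is a rough Python counterpart of the same split-then-check idea, using the standard csv module. csv.reader splits on commas and removes the line ending, so the usage can be compared with "Training" directly. It assumes (not confirmed by the question) that the file starts with a header row and that the columns are emotion, image, usage in that order; the function signature is my own.

```python
import csv
import io

def load_data(csv_file):
    """Split rows by their usage column (assumed layout: emotion, image, usage)."""
    train_images, train_labels = [], []
    test_images, test_labels = [], []
    reader = csv.reader(csv_file)
    next(reader)  # skip the assumed header row
    for emotion, image, usage in reader:
        usage = usage.strip()  # guard against stray spaces or '\r'
        if usage == "Training":
            train_labels.append(int(emotion))
            train_images.append([image])
        elif usage in ("PublicTest", "PrivateTest"):
            test_labels.append(int(emotion))
            test_images.append([image])
        else:
            print("This row was not assigned to any usage:", repr(usage))
    return train_images, train_labels, test_images, test_labels

sample = io.StringIO("emotion,pixels,Usage\n0,1 2 3,Training\n3,4 5 6,PublicTest\n")
print(load_data(sample))
```

For the real file you would open it once, e.g. with open(path, newline="") as f: and pass f to load_data, instead of re-reading the whole file for every row.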
I'm stuck on turning this JS anagram solution into a Python solution that uses the same approach.
Here is the problem:
Here is the JavaScript solution:
function validAnagram(first, second) {
if (first.length !== second.length) {
return false;
}
const lookup = {};
for (let i = 0; i < first.length; i++) {
let letter = first[i];
// if letter exists, increment, otherwise set to 1
lookup[letter] ? (lookup[letter] += 1) : (lookup[letter] = 1);
}
for (let i = 0; i < second.length; i++) {
let letter = second[i];
// can't find letter or letter is zero then it's not an anagram
if (!lookup[letter]) {
return false;
} else {
lookup[letter] -= 1;
}
}
return true;
}
console.log(validAnagram('anagram', 'nagaram'));
And here is my Python code using the same approach:
def valid_anagram(first, second):
    if len(first) != len(second):
        return False
    lookup = {}
    for char in first:
        letter = first[char]
        if lookup[letter]:
            lookup[letter] += 1
        else:
            lookup[letter] = 1
    for char in second:
        letter = second[char]
        if not lookup[letter]:
            return False
        else:
            lookup[letter] -= 1
    return True
print(valid_anagram("anagram", "nagaram"))
This is the error I'm getting when I run my Python solution:
letter = first[char] TypeError: string indices must be integers
Here's the same solution, using collections.Counter (a dict subclass) to count letters like your JavaScript code:
from collections import Counter
def valid_anagram(str1, str2):
    return Counter(str1) == Counter(str2)
testing:
>>> valid_anagram('anagram', 'nagaram')
True
>>>
As I wrote below, and I'll repeat here: the whole point of using Python is not to reinvent the wheel, but to use existing libraries that make the code compact, fast, and easy to understand.
Take for example your code:
for char in first:
letter = first[char]
if lookup[letter]:
lookup[letter] += 1
else:
lookup[letter] = 1
This can be rewritten as:
lookup = dict()
for letter in first:
    if letter in lookup:
        lookup[letter] += 1
    else:
        lookup[letter] = 1
Or, better yet:
lookup = Counter()
for letter in first:
    lookup[letter] += 1
Or, even better:
lookup = Counter(first)
So why waste time and space?
You are attempting to pass in a string to get the index instead of passing in an integer.
first = "hello"
for char in first:
print(char)
Output:
h
e
l
l
o
To get the index, use this:
for i in range(len(first)):
    print(i)
Output:
0
1
2
3
4
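Putting this together, a direct Python port of the JavaScript two-pass approach needs only two changes: iterate over the characters themselves, and use dict.get() so that a missing letter reads as 0 instead of raising KeyError. A sketch:

```python
def valid_anagram(first: str, second: str) -> bool:
    """Port of the JavaScript two-pass counting approach."""
    if len(first) != len(second):
        return False
    lookup = {}
    for letter in first:  # iterating a str yields characters, not indices
        lookup[letter] = lookup.get(letter, 0) + 1  # increment, or start at 1
    for letter in second:
        # can't find the letter, or its count is zero: not an anagram
        if not lookup.get(letter):
            return False
        lookup[letter] -= 1
    return True

print(valid_anagram("anagram", "nagaram"))  # True
print(valid_anagram("rat", "car"))  # False
```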
Here is a simpler solution
def valid_anagram(str1, str2):
    list_str1 = list(str1)
    list_str1.sort()
    list_str2 = list(str2)
    list_str2.sort()
    return list_str1 == list_str2
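The same sorting idea can be written with the built-in sorted(), which accepts any iterable (including a str) and returns a new sorted list. A sketch:

```python
def valid_anagram(str1: str, str2: str) -> bool:
    # sorted() turns each string into a sorted list of its characters
    return sorted(str1) == sorted(str2)

print(valid_anagram("anagram", "nagaram"))  # True
print(valid_anagram("rat", "car"))  # False
```

Sorting costs O(n log n), while the Counter version counts in O(n); for short strings the difference hardly matters.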
I'm trying to solve this problem. I have seen other solutions that involve lists and using recursion but I'm interested in learning how to solve this with loops and I can't seem to get the right output.
(i.e. no regular expressions, no tuples, no methods of string, etc.)
input: caaabbbaacdddd
expected output: empty string
input: abbabd
expected output: bd
Below is my code. I have found other methods to solve this problem; I'm just looking for the most basic solution.
answer = input("enter a string: ")
new_answer = ""
#while answer != new_answer:
if answer == "":
print("goodBye!")
#break
p = ""
for c in answer:
if p != c:
new_answer += p
p = c
else:
p = c
print(new_answer)
The commented-out part is meant to make the whole program loop until there are no more duplicates.
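Judging from the two examples, the task appears to be: delete every run of two or more equal adjacent characters, then repeat until no run is left (this reading is inferred from the examples, not stated in the question). A minimal loops-only sketch under that assumption, using only indexing, len() and string concatenation; the function name is my own:

```python
def remove_runs(s: str) -> str:
    """Repeatedly delete every run of 2+ equal adjacent characters."""
    while True:
        result = ""
        i = 0
        n = len(s)
        while i < n:
            j = i
            while j < n and s[j] == s[i]:  # find the end of the current run
                j += 1
            if j - i == 1:  # keep characters that are not repeated
                result += s[i]
            i = j
        if result == s:  # nothing was removed on this pass: done
            return result
        s = result

print(repr(remove_runs("caaabbbaacdddd")))  # ''
print(remove_runs("abbabd"))  # bd
```

The outer while True plays the role of the commented-out while loop in the question: each pass removes the runs, and the loop stops once a pass changes nothing.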
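If you only want to collapse each run of repeated characters into a single character (note that this turns abbabd into ababd, not the expected bd), the simplest loop-based solution would be:
result = ""
for i in answer:
    if result == "" or result[-1] != i:
        result += i
You could also use itertools.groupby, which collapses runs the same way:
import itertools
print("".join(key for key, _ in itertools.groupby(answer)))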
Try this! The only issue in your logic is that you are not removing the repeated character once it has been added to new_answer.
count = 0
for c in answer:
if p != c:
new_answer += p
p = c
else:
new_answer = new_answer.replace(c, "", count)
p = c
count += 1
print(new_answer)
Simplifying it even more, without the replace method:
count = 0
for c in answer:
if p != c:
new_answer += p
p = c
else:
if count == 0:
new_answer = ""
else:
new_answer=new_answer[:count-1]
count -=1
p = c
count += 1
print(new_answer)
I would do it like this in JavaScript (I know it's not Python, but the logic is the same):
let answer = 'acacbascsacasceoidfewfje';
let obj = {}; // character -> count
for (let i = 0; i < answer.length; i++) {
if(obj[answer.substr(i,1)] === undefined){
obj[answer.substr(i,1)] = 1
}else{
obj[answer.substr(i,1)]++;
}
}
JSON.stringify(obj)
Results:
"{"a":4,"c":4,"b":1,"s":2,"e":2,"o":1,"i":1,"d":1,"f":1,"w":1,"j":1}"
It seems you are only comparing the current character with the immediately preceding one. You can use the in operator instead:
if char in string
for each of the 26 letters.
You could also build a dictionary keyed by the 26 letters, if that is allowed (since you are only using loops).
public class RemoveAdjacentDuplicates {
public static void main(String[] args) {
System.out.println(removeDuplicates("abbabd"));
}
public static String removeDuplicates(String S) {
char[] stack = new char[S.length()];
int i = 0;
for(int j = 0 ; j < S.length() ; j++) {
char currentChar = S.charAt(j);
if(i > 0 && stack[i-1] == currentChar) {
i--;
}else {
stack[i] = currentChar;
i++;
}
}
return new String(stack , 0 , i);
}
}
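Results of the program are:
input: caaabbbaacdddd
output: cabc
input: abbabd
output: bd
The stack removes equal adjacent pairs, so an odd-length run such as aaa leaves one character behind; that is why the first input gives cabc rather than an empty string.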
I'm working on the algorithm puzzle below: decode a string of digits into letters. The full problem statement and reference code are posted below. I looked at a few solutions, and all of the ones I found decode from back to front. I think decoding from front to back should also work, so I'm wondering whether there is any special benefit or consideration that makes back-to-front better for this problem. Thanks.
A message containing letters from A-Z is being encoded to numbers using the following mapping:
'A' -> 1
'B' -> 2
...
'Z' -> 26
Given an encoded message containing digits, determine the total number of ways to decode it.
For example,
Given encoded message "12", it could be decoded as "AB" (1 2) or "L" (12).
The number of ways decoding "12" is 2.
public class Solution {
public int numDecodings(String s) {
int n = s.length();
if (n == 0) return 0;
int[] memo = new int[n+1];
memo[n] = 1;
memo[n-1] = s.charAt(n-1) != '0' ? 1 : 0;
for (int i = n - 2; i >= 0; i--)
if (s.charAt(i) == '0') continue;
else memo[i] = (Integer.parseInt(s.substring(i,i+2))<=26) ? memo[i+1]+memo[i+2] : memo[i+1];
return memo[0];
}
}
thanks in advance,
Lin
There will be no difference whether you decode the string from front-to-back or back-to-front if you break it into sub-strings and store their results.
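Both directions are the same recurrence read in opposite orders. A minimal Python 3 sketch (function names are my own) porting the Java back-to-front solution next to a front-to-back equivalent:

```python
def num_decodings_back(s: str) -> int:
    """Back-to-front DP: memo[i] counts decodings of the suffix s[i:]."""
    n = len(s)
    if n == 0:
        return 0
    memo = [0] * (n + 1)
    memo[n] = 1  # empty suffix
    memo[n - 1] = 1 if s[n - 1] != "0" else 0
    for i in range(n - 2, -1, -1):
        if s[i] == "0":
            continue  # no letter code starts with 0
        memo[i] = memo[i + 1] + (memo[i + 2] if int(s[i:i + 2]) <= 26 else 0)
    return memo[0]

def num_decodings_front(s: str) -> int:
    """Front-to-back DP: ways[i] counts decodings of the prefix s[:i]."""
    n = len(s)
    if n == 0:
        return 0
    ways = [0] * (n + 1)
    ways[0] = 1  # empty prefix
    for i in range(1, n + 1):
        if s[i - 1] != "0":  # last digit decoded on its own (1..9)
            ways[i] += ways[i - 1]
        if i >= 2 and s[i - 2] != "0" and int(s[i - 2:i]) <= 26:
            ways[i] += ways[i - 2]  # last two digits decoded as 10..26
    return ways[n]

for s in ["12", "226", "10", "100", "125312"]:
    print(s, num_decodings_back(s), num_decodings_front(s))
```

Each step only reads the previous one or two entries, so either version could keep two variables instead of a full list; neither direction has an inherent advantage.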
This implements front-to-back approach:
def decode_string(st):
    result_dict = {st[0]: 1}
    for i in range(2, len(st) + 1):
        if int(st[i-1]) == 0:
            if int(st[i-2]) not in [1, 2]:
                return "Not possible to decode"
            result_dict[st[:i]] = 0
        else:
            result_dict[st[:i]] = result_dict[st[:i-1]]
        if int(st[i-2:i]) < 27 and st[i-2] != '0':
            result_dict[st[:i]] = result_dict[st[:i]] + result_dict.get(st[:i-2], 1)
    return result_dict[st]
print(decode_string("125312"))
result_dict holds the number of decodings for each incremental prefix of the string. Initialize it with the first character.
Special check for the '0' character, because the only valid codes containing 0 are 10 and 20; if the input contains anything else (e.g. 30 or 00), return immediately.
Then, for each index, check whether the digit combined with the previous one forms a letter (combination < 27). If it does, add the result for the string up to index-2.
Store the result of each incremental prefix in the dictionary.
Result:
The result_dict contains values like this:
{'12': 2, '12531': 3, '1': 1, '125312': 6, '125': 3, '1253': 3}
So result_dict[st] gives the required answer
Using a list is a better idea, since it avoids storing every prefix string as a dictionary key:
def decode_string(st):
    result_list = [1]
    for i in range(2, len(st) + 1):
        if int(st[i-1]) == 0:
            if int(st[i-2]) not in [1, 2]:
                return "Not possible to decode"
            result_list.append(0)
        else:
            result_list.append(result_list[i-2])
        if int(st[i-2:i]) < 27 and st[i-2] != '0':
            if i > 2:
                result_list[i-1] = result_list[i-1] + result_list[i-3]
            else:
                result_list[i-1] = result_list[i-1] + 1
    print(result_list)
    return result_list[-1]
print(decode_string("125312"))
I have tried methods using the struct module, as shown by the commented-out lines in my code, but they didn't work. Basically I have two options: I can either write the binary data code by code (my codes are bit sequences 3 to 13 bits long), or convert the whole string of n characters (n = 25000+ in this case) to binary data. But I don't know how to implement either method. Code:
import heapq
import binascii
import struct
def createFrequencyTupleList(inputFile):
frequencyDic = {}
intputFile = open(inputFile, 'r')
for line in intputFile:
for char in line:
if char in frequencyDic.keys():
frequencyDic[char] += 1
else:
frequencyDic[char] = 1
intputFile.close()
tupleList = []
for myKey in frequencyDic:
tupleList.append((frequencyDic[myKey],myKey))
return tupleList
def createHuffmanTree(frequencyList):
heapq.heapify(frequencyList)
n = len(frequencyList)
for i in range(1,n):
left = heapq.heappop(frequencyList)
right = heapq.heappop(frequencyList)
newNode = (left[0] + right[0], left, right)
heapq.heappush(frequencyList, newNode)
return frequencyList[0]
def printHuffmanTree(myTree, someCode,prefix=''):
if len(myTree) == 2:
someCode.append((myTree[1] + "#" + prefix))
else:
printHuffmanTree(myTree[1], someCode,prefix + '0')
printHuffmanTree(myTree[2], someCode,prefix + '1')
def parseCode(char, myCode):
for k in myCode:
if char == k[0]:
return k[2:]
if __name__ == '__main__':
myList = createFrequencyTupleList('input')
myHTree = createHuffmanTree(myList)
myCode = []
printHuffmanTree(myHTree, myCode)
inputFile = open('input', 'r')
outputFile = open('encoded_file2', "w+b")
asciiString = ''
n=0
for line in inputFile:
for char in line:
#outputFile.write(parseCode(char, myCode))
asciiString += parseCode(char, myCode)
n += len(parseCode(char, myCode))
#values = asciiString
#print n
#s = struct.Struct('25216s')
#packed_data = s.pack(values)
#print packed_data
inputFile.close()
#outputFile.write(packed_data)
outputFile.close()
You're looking for this:
packed_data = ''.join(chr(int(asciiString[i:i+8], 2))
for i in range(0, len(asciiString), 8))
It will take 8 bits at a time from the asciiString, interpret it as an integer, and output the corresponding byte.
Your problem here is that this requires the length of asciiString to be a multiple of 8 bits to work correctly. If not, you'll insert zero bits before the last few real bits.
So you need to store the number of bits in the last byte somewhere, so you know to ignore those bits when you get them back, instead of interpreting them as zeros. You could try:
packed_data = chr(len(asciiString) % 8) + packed_data
Then when you read it back:
from itertools import chain
packed_input = coded_file.read()
last_byte_length, packed_input, last_byte = (packed_input[0],
packed_input[1:-1],
packed_input[-1])
if not last_byte_length: last_byte_length = 8
ascii_input = ''.join(chain((bin(ord(byte))[2:].zfill(8) for byte in packed_input),
tuple(bin(ord(last_byte))[2:].zfill(last_byte_length),)))
# OR
# ascii_input = ''.join(chain(('{0:0=8b}'.format(ord(byte)) for byte in packed_input),
# tuple(('{0:0=' + str(last_byte_length) + 'b}').format(ord(last_byte)),)))
Edit: You either need to strip '0b' from the strings returned by bin() or, on 2.6 or newer, preferably use the new, alternate versions I added that use string formatting instead of bin(), slicing, and zfill().
Edit: Thanks eryksun, good to use chain to avoid making a copy of the ASCII string. Also, need to call ord(byte) in the bin() version.
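The code above is Python 2, where file data is a str of characters. In Python 3, reading or writing a file in binary mode uses bytes, whose items are already integers, so ord() and chr() drop out. A minimal sketch of the same header-byte scheme (the function names are my own):

```python
def pack_bits(bits: str) -> bytes:
    """Pack a '0'/'1' string into bytes. The first byte stores how many
    bits the final byte really holds (0 means a full 8)."""
    body = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return bytes([len(bits) % 8]) + body

def unpack_bits(data: bytes) -> str:
    """Inverse of pack_bits: rebuild the '0'/'1' string."""
    if len(data) < 2:  # header only: nothing was packed
        return ""
    last_len = data[0] or 8  # 0 in the header means the last byte is full
    body, last = data[1:-1], data[-1]
    full = "".join(format(b, "08b") for b in body)
    return full + format(last, "0{}b".format(last_len))

bits = "1011001110"  # 10 bits: one full byte plus 2 leftover bits
packed = pack_bits(bits)
print(packed)  # b'\x02\xb3\x02'
print(unpack_bits(packed) == bits)  # True
```

With the question's variables this would be outputFile.write(pack_bits(asciiString)) on a file opened in "wb" mode, and unpack_bits(f.read()) when reading it back.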