Read string from html code and save to file [duplicate] - python

This question already has answers here:
Converting special characters to regular c#
(4 answers)
Closed 2 years ago.
How do I translate the bytes %C3%B8 in "testdoc%C3%B8%C3%B8%C3%B8.txt" to ø?
I tried the following which did not work:
var cont = response.Content.Headers.ContentDisposition?.FileName;
var bytes = Encoding.UTF8.GetBytes(cont);
var test = new string(bytes.Select(b => (char)b).ToArray());
var yourText = System.Text.Encoding.UTF8.GetString(bytes);

Don't bother with converting it to bytes at all. As has been noted in the comments, the string is URL-encoded (percent-encoded), not raw UTF-8: each %XX escape stands for one byte of the UTF-8 encoding of ø.
Use HttpUtility in the System.Web namespace:
string input = "testdoc%C3%B8%C3%B8%C3%B8.txt";
string output = HttpUtility.UrlDecode(input);
Console.WriteLine(output); // testdocøøø.txt
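For readers working in Python rather than C#, the same decoding can be sketched with the standard library. This is an illustrative equivalent, not part of the original answer, and the variable names are made up for the example. It also shows why the byte conversions in the question were unnecessary: the percent-escapes stand for the UTF-8 bytes of ø, so decoding is two steps (escapes to bytes, then bytes to text), and unquote() does both at once.

```python
from urllib.parse import unquote, unquote_to_bytes

name = "testdoc%C3%B8%C3%B8%C3%B8.txt"

# Step 1: turn each %XX escape into the raw byte it stands for.
raw = unquote_to_bytes(name)

# Step 2: interpret those bytes as UTF-8 text.
decoded = raw.decode("utf-8")

# unquote() performs both steps, using UTF-8 by default.
assert decoded == unquote(name)
print(decoded)  # testdocøøø.txt
```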

Related

Unknown pdf encoding from JSON response

I have an API that returns a PDF inside a JSON response, but the PDF arrives as a long string of integers like the following:
[{"status":"SUCCESS"},{"data":"37,80,68,70,45,49,46,52,10,37,-45,-21,-23,-31,10,49,32,48,32,111,98,106,10,60,60,47,84,105,116,108,101,32,40,49,49,32,67,83,45,73,73,32,32,83,117,98,106,101,99,116,105,118,101,32,81,46,...
...,1,32,49,55,10,47,82,111,111,116,32,56,32,48,32,82,10,47,73,110,102,111,32,49,32,48,32,82,62,62,10,115,116,97,114,116,120,114,101,102,10,54,55,54,56,53,10,37,37,69,79,70"}
My questions are:
What is this encoding?
How to convert this into a pdf using python?
P.S: Here is the endpoint to get the full response.
The beginning of the data is a hint that you actually have a list of the byte values of the PDF file: it starts with the byte values of '%PDF-1.4'.
So you must first extract that curious string:
data = json_data[1]['data']
to have:
"37,80,68,70,45,49,46,52,10,37,-45,-21,-23,-31,10,49,32,48,32,111,98,106,10,60,60,47,84,105,116,108,101,32,40,49,49,32,67,83,45,73,73,32,32,83,117,98,106,101,99,116,105,118,101,32,81,46, ..."
Convert it to a list of ints first, then to a byte string (the expression i if i >= 0 else i + 256 maps negative values into the 0-255 range that bytes() requires):
intlist = [int(i) for i in data.split(",")]
b = bytes(i if i >=0 else i+256 for i in intlist)
to get b'%PDF-1.4\n%\xd3\xeb\xe9\xe1\n1 0 obj\n<</Title (11 CS-II Subjective Q...'
And finally save that to a file:
with open('file.pdf', 'wb') as fd:
    fd.write(b)
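Putting the steps together, a minimal sketch might look like the following. The helper name ints_to_bytes and the shortened sample payload are assumptions made for illustration; the real response is much longer. The negative numbers suggest the values are signed 8-bit bytes, although the source does not say what produced them. Taking each value modulo 256 is equivalent to the i + 256 adjustment above for values in the range -128 to 255.

```python
import json

def ints_to_bytes(csv_ints):
    """Turn a string such as '37,80,-45' (signed byte values) into bytes."""
    # value % 256 wraps negatives: -45 -> 211 (0xd3), positives are unchanged.
    return bytes(int(token) % 256 for token in csv_ints.split(","))

# Hypothetical, truncated stand-in for the API response.
sample = '[{"status":"SUCCESS"},{"data":"37,80,68,70,45,49,46,52,10,37,-45,-21,-23,-31,10"}]'

payload = json.loads(sample)
pdf_head = ints_to_bytes(payload[1]["data"])
print(pdf_head)  # b'%PDF-1.4\n%\xd3\xeb\xe9\xe1\n'
```

With the full response, the resulting bytes object can be written to disk in binary mode ('wb') exactly as shown above.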

How do I make a text to binary? [duplicate]

This question already has answers here:
How to convert string to binary?
(9 answers)
Closed 2 years ago.
I have been trying to make a text-to-binary converter. What I do now is convert every single letter and symbol to its binary equivalent, but as you can imagine this takes a long time, and I'm wondering if there is a shorter way to do it.
Text to binary:
name = "Name"
binary = ' '.join(format(ord(x), 'b') for x in name)
Binary to text:
binary = '100001'
binary_values = binary.split()
ascii_string = ""
for binary_value in binary_values:
    an_integer = int(binary_value, 2)
    ascii_character = chr(an_integer)
    ascii_string += ascii_character
print(ascii_string)
Here is my git repo
This program converts text to a list of binary strings:
a_string = "abc"
a_byte_array = bytearray(a_string, "utf8")  # Create bytearray
byte_list = []
for byte in a_byte_array:
    binary_representation = bin(byte)  # Convert to binary
    byte_list.append(binary_representation)  # Add to list
print(byte_list)
There are plenty of binary modules on PyPI. You could just have the input go through one and print the result.
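No third-party module is strictly needed, though. As a rough sketch using only built-ins, the two directions can be written as a pair of small functions. The function names are invented for this example, and the choice of zero-padded 8-bit groups over UTF-8 bytes is an assumption; it keeps every group the same width so the round trip is unambiguous, unlike the variable-width output of format(ord(x), 'b').

```python
def text_to_bits(text):
    """Encode text as space-separated 8-bit groups of its UTF-8 bytes."""
    return " ".join(format(byte, "08b") for byte in text.encode("utf-8"))

def bits_to_text(bits):
    """Inverse of text_to_bits: parse each group as base 2 and decode."""
    return bytes(int(chunk, 2) for chunk in bits.split()).decode("utf-8")

encoded = text_to_bits("Name")
print(encoded)                # 01001110 01100001 01101101 01100101
print(bits_to_text(encoded))  # Name
```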

Change String to Float with (" ") in string [duplicate]

This question already has an answer here:
Reading CSV files in numpy where delimiter is ","
(1 answer)
Closed 4 years ago.
I have strings like this in a file:
"0.9986130595207214","16.923500061035156","16.477115631103516","245.2451171875","107.35090637207031","118.8438720703125","254.64633178710938","255.2373046875","264.1331481933594","28.91413116455078"
and there are multiple rows like it.
How can I convert the data to floats? The problem is that each item comes out as ' "0.9986130595207214" ', with the quotes included.
This is the code I have written:
import numpy as np
data = np.loadtxt("data.csv",dtype=str,delimiter=',')
for y in data:
    for x in y:
        print(float(x))
and I got this error:
print(float(x))
ValueError: could not convert string to float: '"0.9986130595207214"'
Thanks
From the error, you got:
x = '"0.9986130595207214"'
Thus, you first need to get rid of the surrounding double quotes:
float(x.strip('"'))
Output:
0.9986130595207214
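An alternative that avoids stripping quotes by hand is to let the standard csv module parse the file, since it treats double quotes as field delimiters by default. The sketch below is only an illustration: it uses an in-memory sample (two short rows made up from the values in the question) in place of data.csv.

```python
import csv
import io

# Stand-in for open("data.csv", newline=""); values taken from the question.
sample = io.StringIO(
    '"0.9986130595207214","16.923500061035156"\n'
    '"16.477115631103516","245.2451171875"\n'
)

# csv.reader removes the surrounding quotes, so float() sees bare numbers.
rows = [[float(cell) for cell in row] for row in csv.reader(sample)]
print(rows)
```

If a NumPy array is needed afterwards, the nested list can be passed to np.array(rows).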

How to remove character outside a string from a tab delimited text [duplicate]

This question already has answers here:
How do I get rid of the b-prefix in a string in python?
(9 answers)
Closed 4 years ago.
I have a file, say "Mrinq_Parts_Available.txt", which looks like this:
Source Date Category SubCategory Present Description Value Units Vendor Part No Package Box Name Location Quantity Ordered Used MOQ=1 MOQ=100 MOQ=1000 Comments Link
Digikey 29-May-15 RF Amplifier No 0.5 W RFMD RFPA3807 SOIC8 10 0 3.4 5V http://www.digikey.com/product-detail/en/RFPA3807TR13/689-1073-1-ND/2567207
I have some Python code that splits those lines:
def removeEmptyLines(inputFile):
    with open(inputFile, 'rb') as f:
        d = f.readlines()
        k = []
        for i in d:
            k.append(i.split())
        print (k)
if __name__=="__main__":
    parts_database_file = "Mrinq_Parts_Available.txt"
    removeEmptyLines(parts_database_file)
But the output is shown like this:
[b'Source', b'Date', b'Category', b'SubCategory', b'Present', b'Description', b'Value', b'Units', b'Vendor', b'Part', b'No', b'Package', b'Box', b'Name', b'Location', b'Quantity', b'Ordered', b'Used', b'MOQ=1', b'MOQ=100', b'MOQ=1000', b'Comments', b'Link']
[b'Digikey', b'29-May-15', b'RF', b'Amplifier', b'No', b'0.5', b'W', b'RFMD', b'RFPA3807', b'SOIC8', b'10', b'0', b'3.4', b'5V', b'http://www.digikey.com/product-detail/en/RFPA3807TR13/689-1073-1-ND/2567207']
How do I remove the 'b' preceding each parsed data?
Your file is clearly a plain ASCII text file, so open it in text mode ('r') instead of binary mode ('rb'); reading in text mode yields str objects rather than bytes:
with open(inputFile, 'r') as f:
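To see where the b prefix comes from, here is a small sketch that works on a single in-memory line instead of the real file. The sample line is abbreviated from the question, and splitting on tabs is a suggestion of this example, based on the title's statement that the data is tab delimited.

```python
# One line as it would be read from a file opened with 'rb'.
raw_line = b"Digikey\t29-May-15\tRF\tAmplifier\n"

# Splitting bytes gives bytes objects, which print with a b'' prefix.
print(raw_line.split())

# Decoding (which text mode does automatically) gives ordinary strings.
text_line = raw_line.decode("ascii")
print(text_line.split())

# Splitting on tabs only would keep multi-word fields such as "Part No" intact.
fields = text_line.rstrip("\n").split("\t")
print(fields)
```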

Read Unicode from CSV [duplicate]

This question already has answers here:
General Unicode/UTF-8 support for csv files in Python 2.6
(10 answers)
Closed 9 years ago.
I have a problem reading unicode characters from a csv. The csv file originally had elements with unicode tags:
"[u'Aeron\xe1utica']"
"[u'Ni\u0161']"
"[u'K\xfcnste']"
...
from which I had to remove the u'' tags to give a csv with
Aeron\xe1utica
Ni\u0161
K\xfcnste
....
Now I want to read the csv and output it into a file with the characters i.e.
Aeronáutica
Niš
Künste
....
I tried using the UnicodeWriter recipe from the csv docs, but it gives the same output as the second list.
Here's what I did to read and write:
import csv, codecs
c = open('foo.csv','r')
r = csv.reader(c)
p = []
for row in r:
    p = p + row
#The elements in p were ['Aeron\\xe1utica', 'Ni\\u0161', 'K\\xfcnste'...]
c = open('bar.csv','w')
c.write(codecs.BOM_UTF8)
writer = UnicodeWriter(c)
for row in p:
    writer.writerow([row])
I also tried codecs.open('','','UTF-8') for both reading and writing, but it didn't help.
It appears you have written Python lists directly to your CSV file, resulting in the [...] literal syntax instead of normal columns. You then removed most of the information that could have been used to turn the information back to Python lists with unicode strings again.
What you have left are Python unicode literals, but without the quotes. Use the unicode_escape codec to decode the values to Unicode again:
with open('foo.csv','r') as b0rken:
    for line in b0rken:
        value = line.rstrip('\r\n').decode('unicode_escape')
        print value
or add back the u'..' quoting, using a triple-quoted string in an attempt to avoid needing to escape embedded quotes:
from ast import literal_eval
with open('foo.csv','r') as b0rken:
    for line in b0rken:
        value = literal_eval("u'''{}'''".format(line.rstrip('\r\n')))
        print value
If you still have the original file (with the [u'...'] formatted lines), use the ast.literal_eval() function to turn those back into Python lists. No point in using the CSV module here:
from ast import literal_eval
with open('foo.csv','r') as b0rken:
    for line in b0rken:
        lis = literal_eval(line)
        value = lis[0]
        print value
Demo with unicode_escape:
>>> for line in b0rken:
...     print line.rstrip('\r\n').decode('unicode_escape')
...
Aeronáutica
Niš
Künste
École de l'Air
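The snippets above are Python 2 (print statement, str.decode). As a hedged sketch of how the unicode_escape approach might be carried over to Python 3, where str has no decode() method, the text can first be encoded back to bytes. This assumes the lines contain only ASCII characters plus backslash escapes, as in the question; the sample list stands in for the lines of foo.csv.

```python
# Sample lines as they would be read from the file (backslashes are literal).
lines = ["Aeron\\xe1utica", "Ni\\u0161", "K\\xfcnste"]

# str -> ASCII bytes -> interpret the \xNN and \uNNNN escapes.
decoded = [line.encode("ascii").decode("unicode_escape") for line in lines]
print(decoded)  # ['Aeronáutica', 'Niš', 'Künste']
```

When writing the result out in Python 3, opening the output file with encoding='utf-8' would take the place of the UnicodeWriter recipe.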
