I am trying to read from the powerball winning numbers file:
http://www.powerball.com/powerball/winnums-text.txt
I am trying to get it line by line and I have this code:
import urllib.request
with urllib.request.urlopen("http://www.powerball.com/powerball/winnums-text.txt") as file:
    next(file)
    for line in file:
        line.lstrip("b'")
        line.rstrip(" \r\n'")
        print(line)
Each line in the file prints out like this:
b'12/06/1997 15 26 28 08 43 36 \r\n'
b'12/03/1997 18 09 14 47 42 32 \r\n'
b'11/29/1997 11 27 13 02 31 23 \r\n'
b'11/26/1997 15 46 34 23 40 35 \r\n'
b'11/22/1997 22 31 03 07 14 02 \r\n'
I am getting the error:
File "powerball.py", line 5, in <module>
line.lstrip("b'")
TypeError: 'str' does not support the buffer interface
I am trying to get rid of the excess characters and make the line like this:
12/06/1997 15 26 28 08 43 36
How do I fix this?
As someone already mentioned, the file is being read in binary mode, so each line is a bytes object rather than a str. You need to decode the bytes into text.
You can solve this with:
line = line.decode("utf-8","ignore")
This should give you the behaviour you expect.
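Applied to the loop from the question, a minimal sketch looks like this (assuming the file really is UTF-8):

import urllib.request

with urllib.request.urlopen("http://www.powerball.com/powerball/winnums-text.txt") as file:
    next(file)  # skip the header line
    for raw_line in file:
        # decode the bytes into text, then trim the trailing whitespace
        line = raw_line.decode("utf-8", "ignore").rstrip(" \r\n")
        print(line)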
I highly recommend using pandas for this kind of IO; it will handle the HTTP request and the parsing in a single line of code, and as a bonus you can use it for your data analysis too:
import pandas as pd
df = pd.read_csv('http://www.powerball.com/powerball/winnums-text.txt')
print(df)
Draw Date WB1 WB2 WB3 WB4 WB5 PB PP
0 02/24/2016 67 21 65 31 64 05 3
1 02/20/2016 54 12 11 16 15 25 5
2 02/17/2016 29 27 07 40 17 25 2
3 02/13/2016 07 15 36 18 19 20 2
4 02/10/2016 02 62 40 50 03 05 2
5 02/06/2016 13 04 36 31 52 08 3
6 02/03/2016 26 60 67 31 28 23 3
7 01/30/2016 16 05 12 31 43 18 4
8 01/27/2016 40 52 03 67 12 21 2
9 01/23/2016 32 22 40 69 34 19 4
10 01/20/2016 44 05 39 69 47 24 5
11 01/16/2016 61 52 51 64 03 06 2
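Once the data is in a DataFrame you can work with the columns directly. For example (assuming the columns parse exactly as shown above, which depends on the file's delimiter):

# how often each Powerball number has been drawn
print(df['PB'].value_counts())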
line is a byte sequence, not a string. Convert it to a string using the str function.
import urllib.request
with urllib.request.urlopen("http://www.powerball.com/powerball/winnums-text.txt") as file:
    next(file)
    for bline in file:
        line = str(bline, "utf-8")
        print(line)
You have mistaken the representation of the value for the value itself.
The values coming from that file are not text (str); they are byte sequences (bytes), which Python's programmer-oriented representation communicates to you by enclosing them in b'…'. Those enclosing characters are not part of the value; you won't succeed in removing them.
Instead, you need to create a text string from the bytes. You do this by telling the byte sequence to decode itself:
for line_bytes in file:
    line = line_bytes.decode("utf-8")
    print(line)
This requires knowing the text codec for that byte sequence (the code above assumes "utf-8"). You can interrogate the HTTP response for its codec if you don't already know it from elsewhere.
An alternative would be to open the file such that it knows its own text codec; then the items you retrieve from it would already be text.
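For example, a minimal sketch of both approaches (the "utf-8" fallback is an assumption for when the server doesn't declare a charset):

import io
import urllib.request

with urllib.request.urlopen("http://www.powerball.com/powerball/winnums-text.txt") as response:
    # ask the HTTP response which codec it declares
    codec = response.headers.get_content_charset() or "utf-8"
    # wrap the binary response so iterating it yields text lines directly
    text_file = io.TextIOWrapper(response, encoding=codec)
    next(text_file)  # skip the header line
    for line in text_file:
        print(line.rstrip())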
Strange issue, when I run this code:
data = open("data.txt", "r")
output = open("output.txt", "w")

for line in data:
    output.write(line)
It will only start to write onto the output file at line 22
data.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
output.txt
22
23
24
25
26
27
28
29
30
This only happens when running it in a JupyterLab notebook. Bug or feature? Or am I missing something?
Huh, strange, because I tried almost identical code on my machine and it copied all 30 lines. However, the only thing I did differently was use the absolute file path, so my code was:

# raw strings so the backslashes in the Windows paths are not treated as escapes
data = open(r"C:\Users\User\Jupyter Notebook\data.txt", "r")
outputs = open(r"C:\Users\User\Jupyter Notebook\outputs.txt", "w")

for line in data:
    outputs.write(line)
Can you see if this method works?
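If the absolute path alone doesn't change anything, another thing worth trying is making sure both files are flushed and closed before you look at output.txt, for example by scoping them with with blocks; a minimal sketch of that variation:

with open("data.txt", "r") as data, open("output.txt", "w") as output:
    for line in data:
        output.write(line)
# both files are flushed and closed when the with block exits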
I have raw data extracted from a PDF; I decompressed the raw data and compressed it again.
I expected the same header and trailer, but the header changed.
Original Hex Header
48 89 EC 57 ....
Converted Hex Header
78 9C EC BD ...
I dug into zlib compression and found that 48 is also a valid first byte for a zlib header, but mostly 78 is used for zlib compression.
This is my code, which decompresses and then recompresses:

import zlib

decompress_wbit = 12
compress_variable = 6

output_data = zlib.decompress(open(raw_data, "rb").read(), decompress_wbit)
output_data = zlib.compress(output_data, compress_variable)

output_file = open(raw_data + '_', "wb")
output_file.write(output_data)
output_file.close()
I changed decompress_wbit and compress_variable, but the header still comes out as 78.
So I am not sure how to get 48 as the header.
Here is a short description of the zlib header fields:
CINFO (bits 12-15)
Indicates the window size as a power of two, from 0 (256 bytes) to 7 (32768 bytes). This will usually be 7. Higher values are not allowed.
CM (bits 8-11)
The compression method. Only Deflate (8) is allowed.
FLEVEL (bits 6-7)
Roughly indicates the compression level, from 0 (fast/low) to 3 (slow/high)
FDICT (bit 5)
Indicates whether a preset dictionary is used. This is usually 0. 1 is technically allowed, but I don't know of any Deflate formats that define preset dictionaries.
FCHECK (bits 0-4)
A check value (5 bits, 0..31), calculated such that the entire 16-bit header is divisible by 31 with no remainder.
Typically, only the CINFO and FLEVEL fields can be freely changed, and FCHECK must be calculated based on the final value. Assuming no preset dictionary, there is no choice in what the other fields contain, so a total of 32 possible headers are valid. Here they are:
        FLEVEL:   0      1      2      3
CINFO:
     0           08 1D  08 5B  08 99  08 D7
     1           18 19  18 57  18 95  18 D3
     2           28 15  28 53  28 91  28 CF
     3           38 11  38 4F  38 8D  38 CB
     4           48 0D  48 4B  48 89  48 C7
     5           58 09  58 47  58 85  58 C3
     6           68 05  68 43  68 81  68 DE
     7           78 01  78 5E  78 9C  78 DA
Please let me know how to keep the zlib header the same through decompression and recompression.
Thanks for your time.
I will first note that it doesn't matter. The data will be decompressed fine with that zlib header. Why do you care?
zlib.compress defaults to the full 32K window (wbits=15) regardless of how the stream was originally compressed, which is why the recompressed header comes out as 78 9C (CINFO = 7). The original stream was made with a 4K window, which is exactly what CINFO = 4 in the 48 89 header means.
To reproduce that header you have to request the smaller window yourself, for example with zlib.compressobj and wbits=12; changing the compression level alone only affects FLEVEL, never CINFO.
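A minimal sketch of that recompression step (the file name is a placeholder; at level 6 this should reproduce the 48 89 header from the table above):

import zlib

raw_data = "stream.bin"  # placeholder path for the extracted PDF stream

data = zlib.decompress(open(raw_data, "rb").read(), 12)  # 4K window, as in the question

# request a 4K window (wbits=12) so the zlib header starts with 0x48 again
co = zlib.compressobj(6, zlib.DEFLATED, 12)
recompressed = co.compress(data) + co.flush()

with open(raw_data + "_", "wb") as output_file:
    output_file.write(recompressed)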
Hi, when I try to print a list it prints out the directory path and not the contents of win.txt. I'm trying to enumerate the lines of the txt file into a list, split each one, and append it to a, then do other things once a prints correctly. What am I doing wrong?
import os

win_path = os.path.join(home_dir, 'win.txt')

def roundedStr(num):
    return str(int(round(num)))

a=[] # i declares outside the loop for recover later

for i,line in enumerate(win_path):
    # files are iterable
    if i==0:
        t=line.split(' ')
    else:
        t=line.split(' ')
    t[1:6]= map(int,t[1:6])
    a.append(t) ## a have all the data

a.pop(0)
print a
It prints out the directory path, like c:\workspace\win.txt.
NOT what I want.
I want it to print the contents of win.txt, with t[1:6] as integers, like
11 21 31 41 59 21
and print each line out the same way.
win.txt contains this
05/06/2017 11 21 31 41 59 21 3
05/03/2017 17 18 49 59 66 9 2
04/29/2017 22 23 24 45 62 5 2
04/26/2017 01 15 18 26 51 26 4
04/22/2017 21 39 41 48 63 6 3
04/19/2017 01 19 37 40 52 15 3
04/15/2017 05 22 26 45 61 13 3
04/12/2017 08 14 61 63 68 24 2
04/08/2017 23 36 51 53 60 15 2
04/05/2017 08 20 46 53 54 13 2
I just want [1]-[6]
I think what you want is to open the file win.txt and read its content: use the open function to create a file object, and a with block to scope it. See my example below; it reads the file and takes the six numbers after the date on each line.
import os

win_path = os.path.join(home_dir, 'win.txt')

a=[] # i declares outside the loop for recover later

with open(win_path, 'r') as file:
    for i,line in enumerate(file):
        line = line.strip()
        print(line)
        if i==0:
            t=line.split(' ')
        else:
            t=line.split(' ')
        t[1:7]= map(int,t[1:7])
        t = t[1:7]
        a.append(t) ## a have all the data

a.pop(0)
print (a)
The fourth column of the index.csv file has ten numbers ranging from 1-5. Each number can be regarded as an index, and each index corresponds to a row of numbers in filename.csv.
The row number of filename.csv represents the index, and each row has three numbers. My question is about using a nested loop to transfer the numbers from filename.csv to index.csv.
from numpy import genfromtxt
import numpy as np
import csv
data1 = genfromtxt('filename.csv', delimiter=',')
data2 = genfromtxt('index.csv', delimiter=',')
f = open('index.csv','wb')
write = csv.writer(f, delimiter=',',quoting=csv.QUOTE_ALL)
for row in data2:
for ch_row in data1:
if ( data2[row,3] == ch_row ):
write.writerow(data1[data2[row,3],:])
For example, the fourth column of index.csv contains 1,2,5,3,4,1,4,5,2,3 and filename.csv contains:
# filename.csv
20 30 50
70 60 45
35 26 77
93 37 68
13 08 55
What I need is to write the indexed row from filename.csv to index.csv and store these numbers in the 5th, 6th and 7th columns:
# index.csv
# 4 5 6 7
... 1 20 30 50
... 2 70 60 45
... 5 13 08 55
... 3 35 26 77
... 4 93 37 68
... 1 20 30 50
... 4 93 37 68
... 5 13 08 55
... 2 70 60 45
... 3 35 26 77
Can anyone help me solve this problem?
You need to indent your last 2 lines. Also, it looks like you are writing to the file from which you are reading.
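A minimal sketch of one way to do the lookup (following the question's delimiter=',' assumption; index_out.csv is a placeholder name so you don't overwrite the file you are reading):

import csv
from numpy import genfromtxt

data1 = genfromtxt('filename.csv', delimiter=',')  # lookup table, one row per index
data2 = genfromtxt('index.csv', delimiter=',')     # its 4th column holds the 1-based index

with open('index_out.csv', 'w', newline='') as f:
    writer = csv.writer(f, delimiter=',', quoting=csv.QUOTE_ALL)
    for row in data2:
        idx = int(row[3])                                   # 1-based index into filename.csv
        writer.writerow(list(row) + list(data1[idx - 1]))   # append the looked-up row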
I have an Excel file and I want to read particular cells one by one using Python; each cell value will be used as one message to send to an ECU (Electronic Control Unit).
Could anyone please give me some idea? I have two columns, for example as given below:
Request (Client -> Server)    Response (Server -> Client)
10 01 50 01
10 81 expected no answer
10 02 50 02
10 01 50 01
10 82 expected no answer
10 03 50 03
10 83 expected no answer
10 04 7F 10 12
10 00 7F 10 12
10 84 7F 10 12
10 FF 7F 10 12
10 01 00 7F 10 13
10 7F 10 13
Maybe this will help you parse the Excel file: the xlrd library. Your code could look like this when you use it:
import xlrd
book = xlrd.open_workbook("/path/to/your/file.xls")
first_sheet = book.sheet_by_index(0)
particular_cell_value = first_sheet.cell(12,34).value
# do something with this value
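For the two-column layout in the question, a sketch of reading the requests and responses row by row might look like this (it assumes the requests are in column 0, the responses in column 1, two header rows, and an .xls file, since newer xlrd versions only read that format):

import xlrd

book = xlrd.open_workbook("/path/to/your/file.xls")
first_sheet = book.sheet_by_index(0)

# walk the rows, skipping the two header rows from the example
for row_idx in range(2, first_sheet.nrows):
    request = first_sheet.cell(row_idx, 0).value   # "Client -> Server" column
    response = first_sheet.cell(row_idx, 1).value  # "Server -> Client" column
    print(request, "->", response)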