JupyterLab skips the first 22 lines when reading and writing a file - python

Strange issue: when I run this code:
data = open("data.txt", "r")
output = open("output.txt", "w")
for line in data:
    output.write(line)
it only starts writing to the output file at line 22.
data.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
output.txt
22
23
24
25
26
27
28
29
30
This only happens when running it in a JupyterLab notebook. Bug or feature? Or am I missing something?

Huh, strange, because I tried nearly identical code on my machine and it copied all 30 lines. The only thing I did differently was use the absolute file path, so my code was:
data = open(r"C:\Users\User\Jupyter Notebook\data.txt", "r")
outputs = open(r"C:\Users\User\Jupyter Notebook\outputs.txt", "w")
for line in data:
    outputs.write(line)
Can you see if this method works?
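Hard to say without seeing the whole notebook, but two things are worth ruling out before calling it a JupyterLab bug: a file handle that was already partially iterated in an earlier cell (open file objects keep their read position for as long as the kernel lives), and write buffers that are never flushed because output is never closed. A minimal sketch, reusing the file names from the question, that reopens both files and lets a with block close and flush them deterministically:
with open("data.txt", "r") as data, open("output.txt", "w") as output:
    for line in data:
        output.write(line)
For a plain copy, shutil.copyfile("data.txt", "output.txt") does the same thing in one call.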

Related

Pandas dataframe Plotly line chart with two lines

I have a pandas dataframe as below and I would like to produce a few charts with the data. The 'Acc' column contains the names of the accounts, the 'User' column is the number of users under each account, and the month columns are the login counts of each account in every month.
Acc User Jan Feb Mar Apr May June
Nora 39 5 13 16 22 14 20
Bianca 53 14 31 22 21 20 29
Anna 65 30 17 18 28 12 13
Katie 46 9 12 30 34 25 15
Melissa 29 29 12 30 10 4 9
1st: I would like to monitor the trend of logins from January to May. One line illustrates Bianca's login and the other line illustrates everyone else's login.
2nd: I would like to monitor the percentage change of logins from January to May. One line illustrates Bianca's login percentage change and the other line illustrates everyone else's login percentage change.
Thank you for your time and assistance. I'm a beginner at this, so I appreciate any help!
I suggest the best approach to grouping is to use categoricals. pct_change is not a direct aggregate function, so it's a bit more involved to get.
import io
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv(io.StringIO("""Acc User Jan Feb Mar Apr May June
Nora 39 5 13 16 22 14 20
Bianca 53 14 31 22 21 20 29
Anna 65 30 17 18 28 12 13
Katie 46 9 12 30 34 25 15
Melissa 29 29 12 30 10 4 9"""), sep=r"\s+")

# just set up 2 plot areas
fig, ax = plt.subplots(1, 2, figsize=[20, 5])

# we want to divide the data into 2 groups
df["grp"] = pd.Categorical(df["Acc"], ["Bianca", "Others"])
df["grp"] = df["grp"].fillna("Others")

# just get it out of the way...
df.drop(columns="User", inplace=True)

# simple plot where an aggregate function exists directly; no transform needed to get the lines
df.groupby("grp").sum().T.plot(ax=ax[0])

# a bit more sophisticated to get pct change...
df.groupby("grp").sum().T.assign(
    Bianca=lambda x: x["Bianca"].pct_change().fillna(0) * 100,
    Others=lambda x: x["Others"].pct_change().fillna(0) * 100,
).plot(ax=ax[1])
output

What is wrong with the data that I am trying to send to matplotlib?

I have the following script:
import pandas
from collections import Counter
import matplotlib.pyplot as plt
while True:
    data = [int(x) for x in raw_input("Enter the list containing the data: ").split()]
    letter_counts = Counter(data)
    df = pandas.DataFrame.from_dict(letter_counts, orient="index")
    df.plot(kind="bar")
    plt.show()
When I either type or copy and paste a series of numbers, for instance,
1 4 5 6 3
the script works perfectly and shows me the histogram. However, when I paste numbers from the output I get from a different terminal window, for instance:
13 13 16 16 16 16 9 9 9 9 9 15 15 15 15 20 20 20 20 20 22 22 22 22 13
13 13 13 12 12 12 12 12 16 16 16 16 15 15 15 15 15 15 15 15 15 15 15
15 15 22 22 22 22 22 15 15 15 15 13 13 13 13 13 18 18 18 18 10 10 10
10 12 12 12 12 12 10 10 10 10 20 20 20 20 20 15 15 15 15 15 15 15 15
17 17 17 17 17 13
The first time I enter the data, it works perfectly; however, when I enter it the second time, it doesn't do anything and then I have to hit enter again. It shows me the plot, but when I close it, it gives me the following error:
Enter the list containing the data: Traceback (most recent call last):
  File "make_histo.py", line 9, in <module>
    df.plot(kind="bar")
  File "/usr/local/lib/python2.7/dist-packages/pandas/plotting/_core.py", line 2627, in __call__
    sort_columns=sort_columns, **kwds)
  File "/usr/local/lib/python2.7/dist-packages/pandas/plotting/_core.py", line 1869, in plot_frame
    **kwds)
  File "/usr/local/lib/python2.7/dist-packages/pandas/plotting/_core.py", line 1694, in _plot
    plot_obj.generate()
  File "/usr/local/lib/python2.7/dist-packages/pandas/plotting/_core.py", line 243, in generate
    self._compute_plot_data()
  File "/usr/local/lib/python2.7/dist-packages/pandas/plotting/_core.py", line 352, in _compute_plot_data
    'plot'.format(numeric_data.__class__.__name__))
TypeError: Empty 'DataFrame': no numeric data to plot
What am I doing wrong?
I don't quite get the behavior you described: when I copy-paste the block of numbers from your question I get embedded line breaks, and this causes raw_input() to be called multiple times. If one of those calls returns an empty line, data becomes an empty list, the DataFrame is empty, and df.plot() raises exactly the 'no numeric data to plot' error you see.
A possible workaround is to make the program treat an empty line as end-of-input. The following very simple code accepts a copy-paste of your block of numbers OK on my system (Windows, Python 2.7):
while True:
    print("Enter the list containing the data: ")
    lines = []
    while True:
        line = raw_input()
        if line:
            lines.append(line.strip())
        else:
            break
    data = []
    for line in lines:
        for x in line.split():
            data.append(int(x))
    print data
Hope this may be helpful.
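For completeness, here is a sketch (still Python 2.7, as in the question) that folds this empty-line-terminated reader back into the original Counter/DataFrame plotting loop; the guard on empty input skips the empty-DataFrame case that triggers the traceback:
import pandas
from collections import Counter
import matplotlib.pyplot as plt

while True:
    print("Enter the list containing the data: ")
    lines = []
    while True:
        line = raw_input()
        if line.strip():
            lines.append(line)
        else:
            break  # an empty line ends the input
    data = [int(x) for line in lines for x in line.split()]
    if not data:
        continue  # nothing entered; avoid plotting an empty DataFrame
    letter_counts = Counter(data)
    df = pandas.DataFrame.from_dict(letter_counts, orient="index")
    df.plot(kind="bar")
    plt.show()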

Python. Trying to print a list but it's only printing the directory structure

Hi, when I try to print a list it prints out the directory path and not the contents of win.txt. I'm trying to enumerate the txt file into a list, split each line, and append it to a, then do other things with a once it prints correctly. What am I doing wrong?
import os

win_path = os.path.join(home_dir, 'win.txt')

def roundedStr(num):
    return str(int(round(num)))

a = []  # declared outside the loop so it can be used later
for i, line in enumerate(win_path):
    # files are iterable
    if i == 0:
        t = line.split(' ')
    else:
        t = line.split(' ')
        t[1:6] = map(int, t[1:6])
    a.append(t)  # a holds all the data
a.pop(0)
print a
It prints out the directory path, for example c:\workspace\win.txt, which is NOT what I want. I want it to print the contents of win.txt, with t[1:6] converted to integers, like
11 21 31 41 59 21
and printed out in that same way.
win.txt contains this
05/06/2017 11 21 31 41 59 21 3
05/03/2017 17 18 49 59 66 9 2
04/29/2017 22 23 24 45 62 5 2
04/26/2017 01 15 18 26 51 26 4
04/22/2017 21 39 41 48 63 6 3
04/19/2017 01 19 37 40 52 15 3
04/15/2017 05 22 26 45 61 13 3
04/12/2017 08 14 61 63 68 24 2
04/08/2017 23 36 51 53 60 15 2
04/05/2017 08 20 46 53 54 13 2
I just want [1]-[6]
Your loop iterates over the string win_path itself, so enumerate gives you the characters of the path one by one; that is why you see the path rather than the file contents. I think what you want is to open the file win.txt and read its content: use the open function to create a file object, and a with block to scope it. See my example below. This will read the file and take the first six numbers of each line.
import os

win_path = os.path.join(home_dir, 'win.txt')

a = []  # declared outside the loop so it can be used later
with open(win_path, 'r') as file:
    for i, line in enumerate(file):
        line = line.strip()
        print(line)
        if i == 0:
            t = line.split(' ')
        else:
            t = line.split(' ')
            t[1:7] = map(int, t[1:7])
            t = t[1:7]
        a.append(t)  # a holds all the data
a.pop(0)
print(a)
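As a side note, a sketch of an alternative (not part of the answer above, and assuming the same win_path): calling line.split() with no separator collapses runs of whitespace, which makes the parsing a bit more tolerant, and a list comprehension converts fields [1]-[6] to integers directly:
a = []
with open(win_path, 'r') as f:
    next(f, None)  # skip the first row, like a.pop(0) did above
    for line in f:
        fields = line.split()  # no argument: splits on any run of whitespace
        a.append([int(x) for x in fields[1:7]])
print(a)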

Python error reading from file buffer instance

I am trying to read from the powerball winning numbers file:
http://www.powerball.com/powerball/winnums-text.txt
I am trying to get it line by line and I have this code:
import urllib.request

with urllib.request.urlopen("http://www.powerball.com/powerball/winnums-text.txt") as file:
    next(file)
    for line in file:
        line.lstrip("b'")
        line.rstrip(" \r\n'")
        print(line)
Each line in the file prints out like this:
b'12/06/1997 15 26 28 08 43 36 \r\n'
b'12/03/1997 18 09 14 47 42 32 \r\n'
b'11/29/1997 11 27 13 02 31 23 \r\n'
b'11/26/1997 15 46 34 23 40 35 \r\n'
b'11/22/1997 22 31 03 07 14 02 \r\n'
I am getting the error:
File "powerball.py", line 5, in <module>
line.lstrip("b'")
TypeError: 'str' does not support the buffer interface
I am trying to get rid of the excess characters and make the line like this:
12/06/1997 15 26 28 08 43 36
How do I fix this?
As someone already mentioned, the file is being read in binary mode, so each line is a bytes object; you need to decode it into text.
You can solve this with:
line = line.decode("utf-8","ignore")
This should give you the behaviour you expect.
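Put together, a minimal sketch of the whole loop with the decode applied before stripping (same URL as the question; this assumes the feed is plain ASCII/UTF-8):
import urllib.request

url = "http://www.powerball.com/powerball/winnums-text.txt"
with urllib.request.urlopen(url) as file:
    next(file)  # skip the header line
    for raw in file:
        # decode the bytes first, then strip the trailing whitespace
        line = raw.decode("utf-8", "ignore").rstrip(" \r\n")
        print(line)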
I highly recommend using pandas for this kind of IO: it will handle the HTTP request and the parsing in a single line of code, and as a bonus you can use it for your data analysis too:
import pandas as pd
df = pd.read_csv('http://www.powerball.com/powerball/winnums-text.txt')
print(df)
Draw Date WB1 WB2 WB3 WB4 WB5 PB PP
0 02/24/2016 67 21 65 31 64 05 3
1 02/20/2016 54 12 11 16 15 25 5
2 02/17/2016 29 27 07 40 17 25 2
3 02/13/2016 07 15 36 18 19 20 2
4 02/10/2016 02 62 40 50 03 05 2
5 02/06/2016 13 04 36 31 52 08 3
6 02/03/2016 26 60 67 31 28 23 3
7 01/30/2016 16 05 12 31 43 18 4
8 01/27/2016 40 52 03 67 12 21 2
9 01/23/2016 32 22 40 69 34 19 4
10 01/20/2016 44 05 39 69 47 24 5
11 01/16/2016 61 52 51 64 03 06 2
line is a byte sequence, not a string. Convert it to a string using the str function.
import urllib.request

with urllib.request.urlopen("http://www.powerball.com/powerball/winnums-text.txt") as file:
    next(file)
    for bline in file:
        line = str(bline, "utf-8")
        print(line)
You have mistaken the representation of the value, versus the value itself.
The values coming from that file are not text (str); they are byte sequences (bytes), which Python's programmer representation communicates to you by enclosing the string in b'…'. Those enclosing characters are not part of the value; you won't succeed in removing them.
Instead, you need to create a text string from the bytes. You do this by telling the byte sequence to decode itself:
for line_bytes in file:
    line = line_bytes.decode("utf-8")
    print(line)
This requires knowing the text codec for that byte sequence (the above code assumes “utf-8”). You can interrogate the HTTP response to ask the codec, unless you know how to get it elsewhere.
An alternative would be to open the file such that it knows its own text codec; then the items you retrieve from it would already be text.
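A sketch of that alternative, wrapping the binary HTTP response in io.TextIOWrapper so that iteration already yields text; the charset is read from the response headers, falling back to utf-8 if the server does not declare one:
import io
import urllib.request

url = "http://www.powerball.com/powerball/winnums-text.txt"
with urllib.request.urlopen(url) as response:
    charset = response.headers.get_content_charset() or "utf-8"
    text = io.TextIOWrapper(response, encoding=charset)
    next(text)  # skip the header line
    for line in text:
        print(line.rstrip())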

Python csv reader-zipping reader with range

I have a really simple csv file of this type (I have put the Fibonacci numbers as an example):
nn,number
1,1
2,1
3,2
4,3
5,5
6,8
7,13
8,21
9,34
10,55
11,89
12,144
13,233
14,377
15,610
16,987
17,1597
18,2584
19,4181
20,6765
21,10946
22,17711
23,28657
24,46368
25,75025
26,121393
27,196418
and I am just trying to bulk-process the rows in the following manner (the Fibonacci numbers are irrelevant):
import csv

b = 0
s = 1
i = 1
itera = 0
maximum = 10000
bulk_save = 10
csv_file = 'really_simple.csv'

fo = open(csv_file)
reader = csv.reader(fo)
## Skipping headers
_headers = reader.next()

while (s > 0) and itera < maximum:
    print 'processing...'
    b += 1
    tobesaved = []
    for row, i in zip(reader, range(1, bulk_save + 1)):
        itera += 1
        tobesaved.append(row)
        print itera, row[0]
    s = len(tobesaved)
    print 'chunk no ' + str(b) + ' processed ' + str(s) + ' rows'
print 'Exit.'
The output I get is a bit weird (as if the reader is omitting an entry at the end of each loop):
processing...
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
chunk no 1 commited 10 rows
processing...
11 12
12 13
13 14
14 15
15 16
16 17
17 18
18 19
19 20
20 21
chunk no 2 commited 10 rows
processing...
21 23
22 24
23 25
24 26
25 27
chunk no 3 commited 5 rows
processing...
chunk no 4 commited 0 rows
Exit.
Do you have any idea what the problem could be? My guess is the zip function.
The reason I have the code like that (getting chunks of data) is that I need to save the csv entries in bulk to an sqlite3 database (using executemany and commit at the end of every zip loop), so that I do not overload my memory.
Thanks!
Try the following:
import csv

def process(rows, chunk_no):
    for no, data in rows:
        print no, data
    print 'chunk no {} process {} rows'.format(chunk_no, len(rows))

csv_file = 'really_simple.csv'
with open(csv_file) as fo:
    reader = csv.reader(fo)
    _headers = reader.next()
    chunk_no = 1
    tobesaved = []
    for row in reader:
        tobesaved.append(row)
        if len(tobesaved) == 10:
            process(tobesaved, chunk_no)
            chunk_no += 1
            tobesaved = []
    if tobesaved:
        process(tobesaved, chunk_no)
prints
1 1
2 1
3 2
4 3
5 5
6 8
7 13
8 21
9 34
10 55
chunk no 1 process 10 rows
11 89
12 144
13 233
14 377
15 610
16 987
17 1597
18 2584
19 4181
20 6765
chunk no 2 process 10 rows
21 10946
22 17711
23 28657
24 46368
25 75025
26 121393
27 196418
chunk no 3 process 7 rows
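As for why rows go missing in your version: zip(reader, range(1, bulk_save + 1)) pulls the next row from reader before it notices that the range side is exhausted, so one csv row is read and silently thrown away at the end of every full chunk. If you want to keep explicit chunks for executemany/commit, here is a sketch using itertools.islice that reads exactly bulk_save rows per chunk and loses nothing (the database calls are placeholders):
import csv
from itertools import islice

bulk_save = 10
csv_file = 'really_simple.csv'

with open(csv_file) as fo:
    reader = csv.reader(fo)
    _headers = next(reader)  # skip the header row
    chunk_no = 0
    while True:
        chunk = list(islice(reader, bulk_save))  # take up to bulk_save rows, no extra read
        if not chunk:
            break
        chunk_no += 1
        # cursor.executemany(...) and connection.commit() would go here (placeholders)
        print 'chunk no ' + str(chunk_no) + ' processed ' + str(len(chunk)) + ' rows'
print 'Exit.'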
