I use Jupyter Notebook with Python. I'm not a programmer but I've been learning Python for about a year now.
I was working with a text file that I saved in the same folder as my notebooks, and I accidentally opened a .ipynb file and altered it.
As far as I can tell, I just pasted a text string. I know what I pasted, and I erased it, but now Jupyter Notebook can't recognize the file. The message is:
Unreadable Notebook: C:\Users\untal\Python\notas analyser.ipynb
NotJSONError('Notebook does not appear to be JSON: \'\\ufeff{\\n "cells": [\\n {\\n "cell_typ...',)
I'm not even close to being able to understand the file well enough to look for the problem and fix it... I don't even know if that's an option.
Is there any tool or method I can use to recover my notebook?
A possible way to recover a corrupted Jupyter notebook file, whether it contains text or not (size = 0 KB), is to go to the project folder and display the hidden files.
Once the hidden files are displayed, if you are lucky, you will see a folder named '.ipynb_checkpoints'.
Open this folder and you should find your notebook.
Using PyCharm worked for me. I wasn't able to actually fix the file, so I copied each of the original file's cells, one by one, into a working notebook that I created and then opened with PyCharm... After copying each cell, I opened the file with Jupyter to check and fix any problems (going back to PyCharm). I'm pretty sure it's not an optimal solution, but I was able to save all of my work, so it was an effective solution that beginners can use!
The .ipynb file is a JSON file, so you can try to correct its syntax in an online JSON editor. There are many if you search on Google (for example: https://jsoneditoronline.org/#).
Once you have a working JSON file, run this code in another notebook to print the code from each cell:
import json

with open('./1-day.txt', 'r') as f:
    data = json.load(f)

for cell in data['cells']:
    if 'source' in cell:
        for line in cell['source']:
            print(line, end='')
        print('\n#')
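The NotJSONError in the question above actually points at the specific problem: the file starts with '\ufeff', a UTF-8 byte-order mark (BOM) that the notebook server's JSON parser rejects. A minimal sketch of stripping it; the file written here is a tiny stand-in for the real notebook:

```python
import json

# Simulate a notebook that was saved with a UTF-8 BOM, which is what the
# NotJSONError above complains about ('\ufeff{ ...').
broken = '\ufeff{"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}'
with open('recovered.ipynb', 'w', encoding='utf-8') as f:
    f.write(broken)

# The 'utf-8-sig' codec strips the BOM on read; re-saving as plain UTF-8
# leaves a file the notebook server can parse again.
with open('recovered.ipynb', 'r', encoding='utf-8-sig') as f:
    nb = json.load(f)  # succeeds once the BOM is gone

with open('recovered.ipynb', 'w', encoding='utf-8') as f:
    json.dump(nb, f, indent=1)

print(nb['nbformat'])
```

If json.load still fails after this, the remaining syntax damage can be hunted down in the online JSON editor mentioned above.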
I ran some commands in a Jupyter notebook and expected to get a printed output containing data in tabulated form from a .csv file, but instead I get an incomplete output.
This is the result I get from the .csv file.
I ran this command:
df1=pandas.read_csv("supermarkets.csv", on_bad_lines='skip')
df1
I expected to get a printed output in tabulated form, like in the image attached...
The data gets printed in well-tabulated form here.
Here is a link to the online version of the file
[pythonhow.com/supermarkets.csv]
Getting good, clean-quality data, where the file extension correctly matches the actual content, is often a challenge. Assessing the state of the input data is generally a very important first step.
It appears the data you are trying to get is also online here. GitHub will render that as a table in the browser because it has a viewer mode. To look at the 'raw' file content, click here. You'll see it is a nice comma-delimited file, with columns separated by commas and each row on its own line. The header with the column names is on the first line.
Now open the file you are working with in a good text editor and compare it to the content I pointed you at. That should guide you to the issue.
At this point you may just wish to switch to using the version of the file that I pointed you at.
Use the link below to obtain it as a proper CSV file:
https://raw.githubusercontent.com/kenvilar/data-analysis-using-python/master/supermarkets.csv
You should be able to paste that link into your browser, then right-click on the page and choose 'Save as...' to download it to your local machine. The resulting file should open just fine with the code you showed in the screenshot in your post here.
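If you would rather skip the manual download, pandas' read_csv also accepts a URL directly. Either way, a well-formed copy of the file should parse into a table like the one below; this sketch writes two rows in the same comma-delimited shape as the linked file (the column names are taken from that file and are illustrative):

```python
import pandas as pd

# Two rows in the same shape as the linked supermarkets.csv: a header line,
# then one comma-separated record per line.
sample = (
    "ID,Address,City,State,Country,Name,Employees\n"
    "1,3666 21st St,San Francisco,CA 94114,USA,Madeira,8\n"
    "2,735 Dolores St,San Francisco,CA 94119,USA,Bready Shop,15\n"
)
with open('supermarkets.csv', 'w') as f:
    f.write(sample)

# The same call as in the question; with clean input nothing gets skipped.
df1 = pd.read_csv('supermarkets.csv', on_bad_lines='skip')
print(df1.shape)  # two rows, seven columns
```

If your local copy parses into something very different from this, the file content (not the code) is the problem.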
Please work on writing better questions with specific titles; see here for guidance. The title at present is overly broad and actually not accurate: this code would not work with the data you apparently have even if you were running it in a plain Python script, so it is not a Jupyter notebook issue. For how to make a title specific, a good rule of thumb is to write for your future self. If you continue to use notebooks, you'll eventually have hundreds of issues that could be called a 'Jupyter notebook issue'; what makes this one different from those?
I believe there is an issue with your csv file, not the code.
To me it looks like the data in your csv file is written in JSON format.
Have you opened the supermarkets.csv file in Excel? It should look like a table, not a JSON-formatted file.
Did you try df1.head() to see whether the CSV got read in the first place?
Hi there, I hope someone can answer this.
I am trying to read CSV data from Kaggle (https://www.kaggle.com/stackoverflow/stack-overflow-2018-developer-survey) in PyCharm and in an online Jupyter notebook, but I cannot find a command to read it.
I know how to read data when it is on my computer, but not from the web. I would be so grateful if anyone could help me with that.
From the page you linked to, you have a couple of options:
1. Create a notebook, and the input files will be automatically included. Run the first cell that's generated for you, and it will print out the paths to the input files. You can then use pandas' read_csv in the notebook to load the data from those paths.
2. Expand the input folder in the Data pane (top right of the notebook), click on the file you want, and look for the Download link at the top right of the data grid.
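For option 1, the auto-generated first cell does roughly the following. It is a sketch: '/kaggle/input' only exists inside a Kaggle notebook, so the code falls back to the current directory elsewhere.

```python
import os

# Kaggle mounts the attached datasets read-only under /kaggle/input/.
# Walk that tree and print every file path; any of the printed paths can
# be passed straight to pandas.read_csv.
root = '/kaggle/input' if os.path.isdir('/kaggle/input') else '.'
for dirname, _, filenames in os.walk(root):
    for filename in filenames:
        print(os.path.join(dirname, filename))
```

Once you see the survey CSV's path in that listing, loading it works exactly like loading a local file.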
I'm trying to analyze genome data from a huge (1.75 GB compressed) VCF file using Python. The technician suggested I use scikit-allel and gave me this link: http://alimanfoo.github.io/2017/06/14/read-vcf.html. I wasn't able to install the module on my computer, but I successfully installed it on a cluster which I access through a VPN. There, I successfully opened the file and have been able to access the data. But I can only access the cluster through a command-line interface, which isn't as friendly as the Spyder I have on my computer, so I've been trying to bring the data back. The GitHub page says I can save the data into an .npz file, which I can read straight into Python's NumPy, so I've been trying to do that.
First, I tried allel.vcf_to_npz('existing_name.vcf','new_name.npz',fields='calldata/GT') on the cluster. This created a (suspiciously small) new npz file on the cluster, which I downloaded. But when I opened up Spyder on my computer and typed genotypes=np.load('real_genotypes.npz'), no new variable called genotypes appeared in the Variable Explorer. Adding the line print(genotypes) produces <numpy.lib.npyio.NpzFile object at 0x00000__________>
Next, thinking that I should copy everything to be sure, I tried allel.vcf_to_npz('existing_name.vcf','new_name.npz',fields='*',overwrite=True)
This created a 2.10 GB file. After a lengthy download, I tried the same thing but got the same result: no new variable when I try to np.load the file, and <numpy.lib.npyio.NpzFile object at 0x000001DB0DEC7F88> when I ask to print it.
When I tried to Google search this problem, I saw this question: Load compressed data (.npz) from file using numpy.load. But my case looks different. I don't get an error message; I just get nothing. So what's wrong?
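For what it's worth, the behaviour described above is what np.load is supposed to do with an .npz archive: it returns a lazy, dict-like NpzFile rather than unpacking variables into the namespace. A sketch of the access pattern; the array contents and shape here are placeholders, though scikit-allel does use field-style keys such as 'calldata/GT':

```python
import numpy as np

# Build a small .npz so the access pattern can be shown; the key mimics a
# scikit-allel field name, and the zeros stand in for real genotype data.
np.savez('real_genotypes.npz',
         **{'calldata/GT': np.zeros((3, 2, 2), dtype='i1')})

data = np.load('real_genotypes.npz')
print(data.files)             # names of the arrays stored in the archive
gt = data['calldata/GT']      # pull one array out by key
print(gt.shape)               # (3, 2, 2)
```

So `genotypes=np.load(...)` did work; the individual arrays just have to be pulled out of the archive by key before they show up as ordinary NumPy variables.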
Thanks
I tried the following code to read an Excel file from my personal computer.
import xlrd
book = xlrd.open_workbook('C:\\Users\eline\Documents\***\***\Python', 'Example 1.xlsx')
But I am getting a 'Permission denied' error. I am using Windows, and if I look at the properties of the directory under the 'Security' tab, I have three groups/users, and all three have all the permissions except for the last option, called 'special permissions' (as far as I know, I do not need this permission to read the Excel file in Python).
I have no idea how to fix this error. Furthermore, I do not have the Excel file open on my computer when running the simulation.
I really hope someone can help me to fix this error.
Sometimes it is because you are trying to read the Excel file while it is open. Close the file in Excel and you are good to go.
book = xlrd.open_workbook('C:\\Users\eline\Documents\***\***\Python', 'Example 1.xlsx')
You cannot give xlrd a path like this; the path needs to be a single string.
If you insist on keeping the folder and filename separate, you can use the os module to join them:
import os
import xlrd

# Join the directory and the filename into the single path string xlrd expects.
book = xlrd.open_workbook(os.path.join(r'C:\Users\eline\Documents\***\***\Python', 'Example 1.xlsx'))
The [Errno 13] Permission denied in your case happens because you are trying to read a folder as if it were a file, which is not allowed.
I ran into this situation too while reading an Excel file into a data frame. To me it appears to be a Python and/or Excel bug, which we should probably not hide by using os.path.join even if that solves the problem. My situation involved an Excel spreadsheet that links cells to another CSV file. If this Excel file is freshly opened, and still open, when I try to read it in Python, the read fails.
Python reads it correctly if I do an unnecessary save of the open Excel file.
I am writing documentation for a Python application. For this I am using an IPython notebook, where I use code and markdown cells. I use pandoc to transform the notebook into a .tex document, which I can easily convert to a .pdf.
My problem is this:
Line breaks (word wrap) do not seem to work for the code cells in the .tex document. While the content of the markdown cells is formatted nicely, the code from the code cells (as well as the output from that code) runs over the margins.
Any help would be greatly appreciated!
I've been searching for an answer to the same question. It looks like in Python 2.7 you can add the following line to a file called custom.js:
IPython.Cell.options_default.cm_config.lineWrapping = true;
custom.js is located in ~\Lib\site-packages\notebook\ or ~\Lib\site-packages\jupyter_core\
Note, however, that this isn't working for me yet. I will update here as soon as I get something working.
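If the custom.js route doesn't pan out, the overflow can also be attacked on the LaTeX side. Assuming the code cells end up in pandoc's default fancyvrb-based 'Highlighting' environment (which is what pandoc's standard LaTeX template uses for highlighted code), the fvextra package can redefine it to wrap long lines. A sketch to add to the .tex preamble or template:

```latex
% fvextra extends fancyvrb; redefining the Highlighting environment with
% breaklines makes long code lines wrap instead of running over the margin.
\usepackage{fvextra}
\DefineVerbatimEnvironment{Highlighting}{Verbatim}
  {breaklines, commandchars=\\\{\}}
```

This only affects the PDF output, so it leaves the notebook itself untouched.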