Jupyter Notebook issue - python

I ran some commands in Jupyter Notebook and expected a printed output showing the data from a .csv file in tabulated form, but instead I get an incomplete output.
This is the result I get from the .csv file.
I ran this command:
import pandas

df1 = pandas.read_csv("supermarkets.csv", on_bad_lines='skip')
df1
I expected to get a printed output in tabulated form like in the image attached.
The data gets printed in a well-tabulated form here.
Here is a link to the online version of the file
[pythonhow.com/supermarkets.csv]

Getting good, clean data where the file extension correctly matches the actual content is often a challenge. Assessing the state of the input data is always an important first step.
It appears the data you are trying to get is also online here. GitHub will render that as a table in the browser because it has a viewer mode. To look at the 'raw' file content, click here. You'll see it is a nice comma-delimited file, with columns separated by commas and each row on its own line. The header with the column names is on the first line.
Now open the file you are actually working with in a good text editor and compare it to the content I pointed you at. That should show you where the issue is.
At this point you may just wish to switch to using the version of the file that I pointed you at.
Use the link below to obtain it as a proper CSV file:
https://raw.githubusercontent.com/kenvilar/data-analysis-using-python/master/supermarkets.csv
You should be able to paste that link into your browser, then right-click on the page and choose 'Save as...' to download it to your local machine. The downloaded file should open just fine using the code you showed in the screenshot in your post here.
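As a small sketch, pandas can also read that raw URL directly, so you can skip the manual download step (this assumes the machine running the notebook has internet access):

import pandas

# Read the raw GitHub copy of the file straight from the URL.
url = "https://raw.githubusercontent.com/kenvilar/data-analysis-using-python/master/supermarkets.csv"
df1 = pandas.read_csv(url)
df1.head()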
Please work on writing better questions with specific titles; see here for guidance. The title at present is overly broad and actually not accurate: this code would not work with the data you apparently have even if you were running it inside a plain Python script, so it is not a Jupyter Notebook issue. For how to make a title specific, a good thing to keep in mind is to write for your future self. If you continue to use notebooks you'll have hundreds of problems that could be called a 'Jupyter Notebook issue', so what makes this one different from those?

I believe there is an issue with your CSV file, not the code.
To me it looks like the data in your CSV file is written in JSON format.
Have you opened the supermarkets.csv file using Excel? It should look like a table, not a JSON-formatted file.
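If you prefer to check from Python rather than Excel, a minimal sketch (assuming supermarkets.csv sits in the notebook's working directory) just prints the first few raw lines so you can see whether they look like comma-separated rows or JSON:

# Peek at the raw text of the file to see what it really contains.
with open('supermarkets.csv', 'r', encoding='utf-8') as f:
    for _ in range(5):
        line = f.readline()
        if not line:
            break
        print(line.rstrip())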

Did you try df1.head() to see whether the CSV got read in the first place?

Related

How to read CSV data from Kaggle in PyCharm

Hi there, hoping anyone can answer this.
I am trying to read CSV data from Kaggle (https://www.kaggle.com/stackoverflow/stack-overflow-2018-developer-survey) in PyCharm and in an online Jupyter notebook, but I cannot find a command for how to read it.
I know how to read data if it is on my computer, but not from the web. I would be so grateful if anyone can help me with that.
From the page you linked to, you have a couple of options.
Create a notebook on Kaggle and the input files will be included automatically. Run the first cell that's generated for you and it will print the paths to the input files; you can then use pandas read_csv in the notebook to load the data from those paths (see the sketch after these options).
Or expand the input folder in the Data pane (top right of the notebook), click on the file you want, and look for the Download link at the top right of the data grid.
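Here is a hedged sketch of the first option; the path and file name below are assumptions, so use whatever the auto-generated first cell actually prints for your copy of the dataset:

import pandas as pd

# Hypothetical path: Kaggle mounts dataset files under /kaggle/input/<dataset-slug>/.
# The real path and file name are printed by the notebook's auto-generated first cell.
path = "/kaggle/input/stack-overflow-2018-developer-survey/survey_results_public.csv"

df = pd.read_csv(path)
print(df.shape)
df.head()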

Error when using Writer.Close() function within my Pandas and Openpyxl code

I have written code that combines some CSV files into a single Excel file, and ended the 'writer' with:
writer.save()
writer.close()
However, I get the following error when trying to open that file after the code has finished:
We found a problem with some content in 'the file.xlsx'. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes.
This seems to be purely related to the 'writer.close()' aspect, as without it I don't get the error. However, I then cannot open the file at all, as it states that someone else is using it (i.e. openpyxl).
I'm not sure if it's relevant, but my file system runs on a OneDrive cloud-based system.
My current plan after 'writer.close()' is to pause the script so I can print the Excel file to PDF (I found this to be unreliable via Python), then hit 'continue' to carry on with exporting the PDF via email.
Any ideas on how to resolve this error?
Without seeing more of your code, and maybe an example of the data you are writing, it's tough to make any assumptions. Based on the error you are experiencing, it is likely the inputs/data going into the actual xlsx file that are causing the issue, not the 'writer' itself. This is Excel saying that data in your file is 'corrupted' from its standards perspective and needs to be fixed.
You should be able to do a 'recovery' of the file through Excel, and it will identify the problem spots in your file, which you can then trace back to your Python program and properly address to eliminate the problem.
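Separately from the recovery step, one way to sidestep the explicit save()/close() ordering altogether is to let pandas manage the writer with a context manager. A rough sketch, with placeholder file names, might look like this:

import pandas as pd

# Placeholder inputs: replace with your actual CSV files and output path.
csv_files = ["first.csv", "second.csv"]

with pd.ExcelWriter("the file.xlsx", engine="openpyxl") as writer:
    for name in csv_files:
        df = pd.read_csv(name)
        # One sheet per CSV; Excel limits sheet names to 31 characters.
        df.to_excel(writer, sheet_name=name[:31], index=False)
# The 'with' block saves and closes the workbook, so no writer.save()/writer.close() calls are needed.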

How do I fix a .ipynb file?

I use Jupyter Notebook with Python. I'm not a programmer, but I've been learning Python for about a year now.
I was working with some text files that I saved in the same folder as my notebooks, and I accidentally opened a .ipynb file and altered it.
As far as I can tell, I just pasted a text string. I know what I pasted, and I erased it, but now Jupyter Notebook can't recognize the file. The message is:
Unreadable Notebook: C:\Users\untal\Python\notas analyser.ipynb
NotJSONError('Notebook does not appear to be JSON: \'\\ufeff{\\n "cells": [\\n {\\n "cell_typ...',)
I'm not even close to being able to understand the file well enough to look for the problem and fix it... I don't even know if that's an option.
Is there any tool or method I can use to recover my notebook?
A possible way to recover corrupted Jupyter notebook files, whether they contain text or not (size = 0 KB), is to go to the project folder and display the hidden files.
Once the hidden files are displayed, if you are lucky you will see a folder named '.ipynb_checkpoints'.
Open this folder and you should find a checkpoint copy of your notebook.
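If it helps, here is a minimal sketch that lists any checkpoint copies from Python, using the folder shown in the error message above:

from pathlib import Path

# Look for Jupyter's checkpoint copies next to the notebooks.
notebook_dir = Path(r"C:\Users\untal\Python")
for checkpoint in notebook_dir.glob(".ipynb_checkpoints/*.ipynb"):
    print(checkpoint)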
Using PyCharm worked for me. I wasn't able to actually fix the file, so I had to copy each of the original file's cells, one by one, into a working file that I created and opened with PyCharm... After each cell was copied, I opened the file with Jupyter to check and fix any problems (then went back to PyCharm). I'm pretty sure it's not an optimal solution, but I was able to save all of my work, so it was an effective solution that can be used by beginners!
The .ipynb file is a JSON file, so you can try to correct its syntax in an online JSON editor. There are many if you search on Google (for example: https://jsoneditoronline.org/#).
Once you have a working JSON file, run this code in another notebook to print each cell's source:
import json

# Load the repaired notebook file (here saved as './1-day.txt'; use your own path).
with open('./1-day.txt', 'r') as f:
    data = json.load(f)

# Print the source of every cell, separating cells with a '#' marker.
for cell in data['cells']:
    if 'source' in cell:
        for line in cell['source']:
            print(line, end='')
        print('\n#')

Having trouble with table reading in pandas

I'm really new to programming and I've been trying to emulate the 'pandas.read_table' code from the Python for Data Analysis book (the chapter on the MovieLens 1M Data Set, pg. 23 or so). Below is the link to the file used for the database and the images of the Jupyter notebook on which I've typed the code. As you'll see there, I'm having trouble with the data values not being read in properly, and I can't seem to figure out why. Your help will be much appreciated!
Trouble screen
Database file
If you are reading data from a .csv file, use pd.read_csv.
If you want to use pd.read_table, you have to specify the delimiter as the comma with the argument sep=','. What is happening is that the book's example passes sep='::' because the MovieLens .dat files use :: as the delimiter, so pd.read_table is trying to split your input at every ::, but your data appears to be separated by commas instead.
More information here:
http://pandas.pydata.org/pandas-docs/stable/io.html
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_table.html
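A quick sketch of the difference (the file name here is a placeholder for whatever you downloaded):

import pandas as pd

# For a comma-separated file, either call works; read_table just needs the delimiter spelled out.
df = pd.read_csv("movies_data.csv")
df = pd.read_table("movies_data.csv", sep=",")

# The book's MovieLens example, by contrast, uses '::' as the delimiter:
# users = pd.read_table("ml-1m/users.dat", sep="::", header=None, engine="python")

df.head()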

Python XLWT: Excel generated by Python xlwt contains missing value

I'm quite new to Python and am trying to fetch data from HTML and save it to Excel using xlwt.
So far the program seems to work well (all the output is correctly printed on the Python console when running the program), except that when I open the Excel file, an error message appears saying 'We found a problem with some content in FILENAME. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes.' And after I click Yes, I find that a lot of data fields are missing.
It seems that roughly the first 150 lines are fine and the problem starts to appear after that (there are around 15,000 lines in total). The missing data fields are concentrated in several columns with relatively high data volume.
I'm wondering whether it's related to some sort of cache-allocation mechanism in xlwt?
Thanks a lot for your help here.
Seems like a caching issue.
Try sheet.flush_row_data() every 100 rows or so?
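A minimal sketch of that suggestion, with made-up data standing in for the scraped HTML rows:

import xlwt

# Made-up data standing in for the rows fetched from HTML.
records = [["row %d col %d" % (r, c) for c in range(5)] for r in range(15000)]

workbook = xlwt.Workbook()
sheet = workbook.add_sheet("Sheet1")

for row_idx, record in enumerate(records):
    for col_idx, value in enumerate(record):
        sheet.write(row_idx, col_idx, value)
    if row_idx % 100 == 0:
        sheet.flush_row_data()  # flush cached row data periodically instead of holding everything in memory

workbook.save("output.xls")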
