How to read csv data from kaggle in pycharm


Hi there, I hope someone can answer this.
I am trying to read CSV data from Kaggle (https://www.kaggle.com/stackoverflow/stack-overflow-2018-developer-survey) in PyCharm and in an online Jupyter notebook, but I cannot find a command to read it.
I know how to read data that is on my computer, but not data that is online. I would be so grateful if anyone could help me with that.

From the page you linked to, you have a couple of options.
Create a notebook on Kaggle and the input files will be included automatically. Run the first cell that's generated for you and it will print out the paths to the input files. You can use pandas read_csv in the notebook to load the data using those paths.
Expand the input folder in the Data pane (top right of the notebook), click on the file you want, and look for the Download link at the top right of the data grid. Once the file is downloaded, you can read it locally (for example in PyCharm) the way you already know.
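The path-printing step from the first option can be sketched roughly as follows. This is a minimal sketch, not Kaggle's exact generated cell: the helper name `list_input_files` is made up for illustration, and the default folder `/kaggle/input` is an assumption about where the notebook mounts its input data.

```python
import os


def list_input_files(root: str = "/kaggle/input") -> list[str]:
    """Return the path of every file found under root, sorted.

    The default root is an assumption about where a Kaggle notebook
    mounts its input data; pass another folder to use it elsewhere.
    """
    paths = []
    for dirname, _, filenames in os.walk(root):
        for filename in filenames:
            paths.append(os.path.join(dirname, filename))
    return sorted(paths)


# Print every input file; outside Kaggle this usually prints nothing,
# because os.walk silently skips a folder that does not exist.
for path in list_input_files():
    print(path)
```

Any of the printed paths that ends in .csv can then be passed to pandas read_csv inside the notebook. With the second option, the same call works in PyCharm once it is pointed at the downloaded file.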

Related

Jupyter Notebook issue

I ran some commands in Jupyter Notebook and expected the data from a .csv file to be printed in tabular form, but I get incomplete output instead.
This is the result I get from the .csv file:
I ran this command:
df1=pandas.read_csv("supermarkets.csv", on_bad_lines='skip')
df1
I expected the output to be printed as a table, as in the attached image.
The data gets printed as a well-formed table here.
Here is a link to the online version of the file:
[pythonhow.com/supermarkets.csv]

Getting clean, good-quality data where the file extension correctly matches the actual content is often a challenge. Assessing the state of the input data is always an important first step.
It appears the data you are trying to get is also online here. GitHub will render that as a table in the browser because it has a viewer mode. To look at the 'raw' file content, click here. You'll see it is a nice comma-delimited file, with columns separated by commas and each row on its own line. The header with the column names is on the first line.
Now open the file you are working with in a good text editor and compare it to the content I pointed you to. That should show you what the issue is.
At this point you may just wish to switch to using the version of the file that I pointed you to.
Use the link below to obtain it as a proper CSV file:
https://raw.githubusercontent.com/kenvilar/data-analysis-using-python/master/supermarkets.csv
You should be able to paste that link into your browser, then right-click on the page and choose 'Save as...' to download it to your local machine. The downloaded file should open just fine using the code you showed in the screenshot in your post.
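The browser download can also be scripted. The following is a minimal sketch using only the standard library; `download_file` and `RAW_URL` are names made up for illustration, and the sketch assumes the link above is still reachable.

```python
import urllib.request
from pathlib import Path

# The raw-file link given in the answer above.
RAW_URL = (
    "https://raw.githubusercontent.com/kenvilar/"
    "data-analysis-using-python/master/supermarkets.csv"
)


def download_file(url: str, destination: str) -> Path:
    """Fetch url and write the bytes unchanged to destination."""
    target = Path(destination)
    with urllib.request.urlopen(url) as response:
        target.write_bytes(response.read())
    return target
```

Calling download_file(RAW_URL, "supermarkets.csv") should leave a file that the pandas.read_csv call from the question can read from the working directory.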
Please work on writing better questions with specific titles; see here for guidance. The title at present is overly broad and not actually accurate. This code would not work with the data you apparently have even if you were running it in a plain Python script, so it is not a Jupyter notebook issue. To make a title specific, a good thing to keep in mind is to write for your future self. If you continue to use notebooks, you'll have hundreds of problems that could be called a 'Jupyter Notebook issue', but what makes this issue different from those?

I believe the issue is with your CSV file, not the code.
To me it looks like the data in your CSV file is written in JSON format.
Have you opened the supermarkets.csv file in Excel? It should look like a table, not a JSON-formatted file.
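One way to check that suspicion without Excel is to test whether the file parses as JSON. This is a minimal sketch; `looks_like_json` is a hypothetical helper, and it only treats a file as JSON when the whole content parses to an object or an array.

```python
import json


def looks_like_json(path: str) -> bool:
    """Return True if the whole file parses as a JSON object or array."""
    # utf-8-sig also accepts files that start with a byte order mark.
    with open(path, "r", encoding="utf-8-sig") as f:
        text = f.read()
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return False
    # A bare number or string is valid JSON too, but that would also
    # match a one-cell CSV, so only objects and arrays count here.
    return isinstance(parsed, (dict, list))
```

If looks_like_json("supermarkets.csv") returns True, the file is probably JSON saved under a .csv name, and pandas.read_json may be a better fit than read_csv.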

Did you try df1.head() to see whether the CSV was read in the first place?

How to open .ipynb pages as normal jupyter notebooks

I am learning Python from a course. The course material can be found at links like the following one:
http://faculty.washington.edu/sbrunton/me564/python/Python_Introduction.ipynb
I'd like to get the rendered Jupyter notebook when I go to the link, but it shows the raw file contents instead. How can I get the Jupyter notebook from such links?
Thanks in advance for any help.

You can open an existing Jupyter notebook (a file with the .ipynb extension) in Notepad and replace its text with the text from your link.
Steps:
Create a new, empty Jupyter notebook.
Go to the file's location and open it with Notepad.
Remove all of the content.
Replace it with the content at your link: https://faculty.washington.edu/sbrunton/me564/python/Python_Introduction.ipynb
Save the file and close Notepad.
Open the same file as a notebook using Jupyter Notebook or Google Colab.

You can copy the raw content and paste it into a new local file. The file extension should be .ipynb. Then you can open it in JupyterLab or Jupyter Notebook.
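A copy-and-paste slip is easy to miss, so it can help to check the text before saving it. This is a minimal sketch; `save_notebook_text` is a hypothetical helper, and it assumes a notebook is a JSON object with a top-level "cells" key, which may not hold for very old notebook formats.

```python
import json
from pathlib import Path


def save_notebook_text(raw_text: str, destination: str) -> Path:
    """Write raw_text to destination after checking it looks like a notebook."""
    target = Path(destination)
    if target.suffix != ".ipynb":
        raise ValueError("destination should end in .ipynb")
    # json.loads raises json.JSONDecodeError (a ValueError) on bad input.
    notebook = json.loads(raw_text)
    if not isinstance(notebook, dict) or "cells" not in notebook:
        raise ValueError("JSON parsed, but no top-level 'cells' key was found")
    target.write_text(raw_text, encoding="utf-8")
    return target
```

If the call raises, the pasted text was probably cut off or altered; if it returns, the saved file should open in JupyterLab or Jupyter Notebook.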

Go to nbviewer.org, paste in the URL, and press 'Go!'. You'll then be redirected to a page with the following URL:
https://nbviewer.org/url/faculty.washington.edu/sbrunton/me564/python/Python_Introduction.ipynb
At that URL is the notebook rendering you seek. (nbviewer will even display some 'interactive' items such as Plotly plots and animated matplotlib plots backed by frames, examples here and here, respectively.)
Right-clicking the download icon in the upper right side of the notebook rendering there and selecting Save link As... will allow you to save the .ipynb file to your local machine. (You can do similar from the original page link, but there you have to edit the name. No editing of the name necessary this way for your link!)
If you examine the URL generated by the form, you'll note that there is a pattern based on what you provided. And so you could just change the original portion of the link you provided from http://... to https://nbviewer.org/url/... and go to the notebook rendering directly without the step of filling out the form.
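The URL pattern described above can be sketched as a small function. `to_nbviewer_url` is a made-up name, and the use of `/urls/` for links that start with https:// is an assumption about nbviewer's scheme that the original answer does not cover.

```python
from urllib.parse import urlsplit


def to_nbviewer_url(notebook_url: str) -> str:
    """Turn a direct notebook link into the matching nbviewer link."""
    parts = urlsplit(notebook_url)
    if parts.scheme not in ("http", "https"):
        raise ValueError("expected an http:// or https:// link")
    # Assumption: nbviewer uses /url/ for http links and /urls/ for https links.
    prefix = "url" if parts.scheme == "http" else "urls"
    rest = parts.netloc + parts.path
    if parts.query:
        rest += "?" + parts.query
    return f"https://nbviewer.org/{prefix}/{rest}"


print(to_nbviewer_url(
    "http://faculty.washington.edu/sbrunton/me564/python/Python_Introduction.ipynb"
))
# → https://nbviewer.org/url/faculty.washington.edu/sbrunton/me564/python/Python_Introduction.ipynb
```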
If the page had been hosted on GitHub or another repository that MyBinder.org can use, the nbviewer-rendered page would have an additional icon in the upper right corner, looking like three rings, that you could click to open it as an active Jupyter notebook right in your browser. No login would be needed, as it would be served via MyBinder.org. The pages I link to for the Plotly plots and animation have this icon as an option.

openpyxl corrupts spreadsheet if it contains a data source

I use openpyxl to interact with Excel files using Python 3.7. I open and save my .xlsx spreadsheets as follows:
from openpyxl import load_workbook
wb = load_workbook('file.xlsx', read_only=False)
wb.save('file.xlsx')
If file.xlsx contains no links to external data sources (such as SQL Server or PostgreSQL), then there is no problem with the saved file, and it opens fine in Excel after being processed by my Python script.
However, if file.xlsx does contain a link to external data, then upon executing the above script, the output file is now corrupted. When opening the file in Excel, the following error is reported and I have the option of attempting to recover it. When recovering, the data remains but all links to the data source are gone.
> We found a problem with some content in file.xlsx. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes.
It is easy to reproduce this error as follows:
Create a blank spreadsheet and save it as file.xlsx.
Run the above three lines of Python code to open and save the file. You will see this works fine and has no impact on the spreadsheet.
Now open file.xlsx in Excel and, from the Data tab, choose a data source. You can choose any data source (link to a csv file, a table within Excel, or an external data source - it doesn't matter).
Save the spreadsheet, then run the above Python script (which again, simply opens and saves it).
Open file.xlsx in Excel. You will see that it is now corrupted.
My conclusion is that, at the moment, openpyxl doesn't support spreadsheets that contain links to external data. It would be useful to have this confirmed, or for a workaround to the above issue to be proposed.
Thanks!!
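Until the behaviour is confirmed, one cautious option is to detect such workbooks before round-tripping them. This is a rough sketch, not a fix: `has_external_connections` is a hypothetical helper, and it assumes that Excel stores data connections inside the .xlsx zip archive as xl/connections.xml and query tables under xl/queryTables/.

```python
import zipfile


def has_external_connections(xlsx_path: str) -> bool:
    """Return True if the workbook archive appears to hold data connections."""
    with zipfile.ZipFile(xlsx_path) as archive:
        names = archive.namelist()
    # Assumed part names for data connections and query tables.
    return any(
        name == "xl/connections.xml" or name.startswith("xl/queryTables/")
        for name in names
    )
```

A script could call this first and skip the open-and-save step for any file where it returns True, so that the original workbook is left untouched.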

How do I fix a .ipynb file?

I use Jupyter Notebook with Python. I'm not a programmer but I've been learning Python for about a year now.
I was working with some text files that I saved in the same folder as my notebooks, and I accidentally opened a .ipynb file and altered it.
As far as I can tell, I just pasted a text string. I know what I pasted, and I erased it, but now Jupyter Notebook can't read the file. The message is:
Unreadable Notebook: C:\Users\untal\Python\notas analyser.ipynb
NotJSONError('Notebook does not appear to be JSON: \'\\ufeff{\\n "cells": [\\n {\\n "cell_typ...',)
I'm not even close to being able to understand the text file well enough to find and fix the problem... I don't even know if that's an option.
Is there any tool or method I can use to recover my notebook?

A possible way to recover a corrupted Jupyter notebook file, whether or not it still contains any text (size = 0 KB), is to go to the project folder and display the hidden files.
Once the hidden files are displayed, if you are lucky you will see a folder named '.ipynb_checkpoints'.
Open this folder and you should find a saved copy of your notebook.
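The same lookup can be done from Python without changing the file manager's settings. This is a minimal sketch; `find_checkpoints` is a hypothetical helper, and it assumes the checkpoint folder sits directly inside the folder that holds the notebook.

```python
from pathlib import Path


def find_checkpoints(project_dir: str) -> list[Path]:
    """Return the notebook copies stored in the hidden checkpoint folder."""
    checkpoint_dir = Path(project_dir) / ".ipynb_checkpoints"
    if not checkpoint_dir.is_dir():
        return []  # no checkpoint folder, so nothing to recover this way
    return sorted(checkpoint_dir.glob("*.ipynb"))
```

Copy any file it returns to a new name before opening it, so that the checkpoint itself is not overwritten by a later save.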

Using PyCharm worked for me. I wasn't able to actually fix the file, so I had to copy each of the original file's cells, one by one, into a working file that I created in Python and then opened with PyCharm. After copying each cell, I opened the file with Jupyter to check it and fixed any problems back in PyCharm. I'm pretty sure it is not an optimal solution, but I could save all of my work, so it was an effective solution that beginners can use!

The .ipynb file is a JSON file, so you can try to correct its syntax in an online JSON editor. There are many if you search on Google (for example: https://jsoneditoronline.org/#).
Once you have a working JSON file, run this code in another notebook to print the code of each cell.
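import json

# Load the repaired notebook JSON (the path is just this answer's example).
with open('./1-day.txt', 'r') as f:
    data = json.load(f)

# Print the source of each cell, followed by a '#' separator line.
for cell in data['cells']:
    if 'source' in cell:
        print(''.join(cell['source']), end='')
        print('\n#')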
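The error message quoted in the question shows `\ufeff` right before the opening `{`. That character is the Unicode byte order mark, which some text editors add when saving a file as UTF-8. Assuming that is the only damage, a minimal sketch like the one below may be enough to recover the notebook; `strip_bom_and_validate` is a hypothetical helper, and it writes to a new file so the original stays untouched.

```python
import json
from pathlib import Path


def strip_bom_and_validate(source: str, destination: str) -> dict:
    """Copy source to destination without a leading byte order mark."""
    # The utf-8-sig codec drops a leading BOM if there is one.
    text = Path(source).read_text(encoding="utf-8-sig")
    # Raises json.JSONDecodeError if something else is still wrong.
    notebook = json.loads(text)
    Path(destination).write_text(text, encoding="utf-8")
    return notebook
```

If json.loads still raises an error, its message includes the line and column of the first problem, which narrows down where to look in a JSON editor.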

XML to CSV/Excel

I have an RSS-formatted XML file. What I usually do is import it into Excel using the developer tools on a PC. That fanciness creates a tree for me automatically, and I simply drag and drop the root element onto the spreadsheet, hit refresh data, and boom: I have a CSV or Excel file that I can do any number of things with that I could do with the raw RSS file.
I'd like to skip the step of going to Excel on a PC and use something like Python to get the job done on my Mac. The problem is that I don't want to have to tell Python the tree, elements, etc.; I want it to figure that out and give me a CSV!
Any guidance on how I might be able to accomplish this task?

XML2Json actually worked out OK:
xml2json --input "/Users/me/Downloads/file.xml" --output "file_2.json"
There are some formatting issues with the headers, but I can clean those up.
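For going straight to CSV, as the question asked, a standard-library alternative to xml2json might look like the sketch below. It is not the approach used in the answer above. `rss_items_to_csv` is a made-up name, and the sketch assumes the rows of interest are the feed's item elements and that each child of an item holds plain text.

```python
import csv
import io
import xml.etree.ElementTree as ET


def rss_items_to_csv(xml_text: str) -> str:
    """Flatten every <item> in an RSS document into one CSV row."""
    root = ET.fromstring(xml_text)
    rows = []
    fieldnames = []  # column order follows first appearance in the feed
    for item in root.iter("item"):
        row = {}
        for child in item:
            # Namespaced tags appear as '{uri}name'. If a tag repeats
            # within one item, only its first value is kept.
            row.setdefault(child.tag, (child.text or "").strip())
            if child.tag not in fieldnames:
                fieldnames.append(child.tag)
        rows.append(row)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

The columns are discovered from the items themselves, so the tree does not have to be described by hand. The returned text can be written to a .csv file or printed for inspection.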
