Pandas code running but not doing anything - python

Below is the code:
import pandas as pd
from nltk.sentiment.vader import SentimentIntensityAnalyzer
dataset=pd.read_excel('file_path')
sia=SentimentIntensityAnalyzer()
dataset['polarity scores']=dataset['column_title'].apply(lambda x: sia.polarity_scores(str(x))['compound'])
print("done")
I would like it to take the Excel file located at file_path and give me polarity scores for the text in the column titled column_title, but I'm not sure what I'm doing wrong. The code runs without any errors, but it does not edit the Excel file at all.

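You forgot to save your file. pd.read_excel only loads the data into memory, so changes to dataset never reach the workbook on disk.
Use dataset.to_excel('output_file_path.xlsx') to write the result: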
import pandas as pd
from nltk.sentiment.vader import SentimentIntensityAnalyzer
dataset=pd.read_excel('file_path')
sia=SentimentIntensityAnalyzer()
dataset['polarity scores']=dataset['column_title'].apply(lambda x: sia.polarity_scores(str(x))['compound'])
dataset.to_excel('output_file_path.xlsx', index = False) # save file
print("done")
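To make the point concrete, here is a minimal sketch of the same read, modify, write cycle. It uses CSV files in a temporary directory so that it runs without an Excel engine, and score_text is a hypothetical stand-in for the VADER call, not part of the original code.

```python
import os
import tempfile

import pandas as pd


def score_text(text):
    # Hypothetical stand-in for sia.polarity_scores(str(text))['compound'].
    return float(len(str(text)))


def add_scores_and_save(src_path, dst_path, column="column_title"):
    dataset = pd.read_csv(src_path)  # loads a copy of the data into memory
    dataset["polarity scores"] = dataset[column].apply(score_text)
    dataset.to_csv(dst_path, index=False)  # nothing reaches disk before this call
    return dataset


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "input.csv")
    dst = os.path.join(tmp, "output.csv")
    pd.DataFrame({"column_title": ["good", "terrible"]}).to_csv(src, index=False)

    add_scores_and_save(src, dst)

    # The source file is untouched; only the output file has the new column.
    source_columns = list(pd.read_csv(src).columns)
    output_columns = list(pd.read_csv(dst).columns)
```

With a real workbook, the same pattern applies with pd.read_excel and dataset.to_excel in place of the CSV calls.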

Related

How to import and use my own function from .py file in Python Pandas?

In Jupyter Notebook I created my own function in my_fk.py file like below:
import pandas as pd
def missing_val(df):
    df= pd.DataFrame(df.dtypes, columns=["type"])
    df["missing"] = pd.DataFrame(df.isna().any())
    df["sum_miss"] = pd.DataFrame(df.isna().sum())
    df["perc_miss"] = round((df.apply(pd.isna).mean()*100),2)
    return df
Then when I try to import and run my function using below code:
import pandas as pd
import numpy as np
import my_fk as fk
df = pd.read_csv("my_data.csv")
fk.missing_val(df)
I get the error below. It suggests that pandas is not imported as pd in my_fk.py, but the file DOES contain the line "import pandas as pd". How can I import and use my own function from a Python file?
NameError: name 'pd' is not defined
You are missing "as" in the import. Also place your pd.read_csv() call after importing pandas, not before:
import pandas as pd
import numpy as np
import my_fk as fk
df = pd.read_csv("my_data.csv")
fk.missing_val(df)
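If the import line is already present in my_fk.py, one possible cause (an assumption, since the question does not confirm it) is that the notebook kernel imported the module before the line was added. Python caches imported modules, so running the import statement again does not re-read the file. The sketch below reproduces that with a throwaway module named my_fk_demo (a hypothetical name) and shows importlib.reload picking up the fix.

```python
import importlib
import pathlib
import shutil
import sys
import tempfile

tmp_dir = tempfile.mkdtemp()
module_path = pathlib.Path(tmp_dir) / "my_fk_demo.py"  # hypothetical module name

# First version of the module: the pandas import is missing.
module_path.write_text("def uses_pd():\n    return pd.__name__\n")

sys.dont_write_bytecode = True  # keep the demo independent of cached .pyc files
sys.path.insert(0, tmp_dir)
import my_fk_demo

try:
    my_fk_demo.uses_pd()
    first_result = "ok"
except NameError as exc:
    first_result = str(exc)

# Fix the file on disk, as one would in an editor.
module_path.write_text(
    "import pandas as pd\n\ndef uses_pd():\n    return pd.__name__\n"
)

# A plain `import my_fk_demo` would return the cached module object;
# reload re-executes the file so the new import takes effect.
importlib.invalidate_caches()
importlib.reload(my_fk_demo)
second_result = my_fk_demo.uses_pd()

sys.path.remove(tmp_dir)
shutil.rmtree(tmp_dir)
```

In a notebook, restarting the kernel has the same effect as the reload call.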

How to import the json file in python

I am trying to import a JSON file into Python and then export it to an Excel file using the following code:
import pandas as pd
df = pd.read_json('pub_settings.json')
df.to_excel('pub_settings.xlsx')
but I am getting the following error:
Can anyone please tell me what I am doing wrong?
First, load the JSON file as a dictionary using the following code:
import json
with open("") as f:
    data = json.load(f)
Then you can use the following link to convert it to xlsx:
https://pypi.org/project/tablib/0.9.3/
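As an alternative that stays within pandas, here is a rough sketch that loads the dictionary and flattens it with pd.json_normalize. The file name settings_demo.json and its contents are made up for illustration, since the structure of pub_settings.json is not shown in the question.

```python
import json
import os
import tempfile

import pandas as pd


def load_json_as_frame(path):
    with open(path) as f:
        data = json.load(f)
    # json_normalize flattens nested dictionaries into dotted column names.
    return pd.json_normalize(data)


# Hypothetical content standing in for the real settings file.
sample = {"name": "demo", "options": {"retries": 3, "verbose": True}}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "settings_demo.json")
    with open(path, "w") as f:
        json.dump(sample, f)
    frame = load_json_as_frame(path)

# Writing the result would then be:
# frame.to_excel("pub_settings.xlsx", index=False)  # needs an Excel engine such as openpyxl
```

Whether this gives a useful table depends on how deeply the real file is nested; lists inside the JSON may need the record_path argument.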

Create a pandas dataframe from a qrc resource file

I would like to save a CSV file into a qrc file and then read it, putting its contents in a pandas dataframe, but I have some problems.
I created a qrc file called res.qrc:
<!DOCTYPE RCC><RCC version="1.0">
<qresource>
<file>dataset.csv</file>
</qresource>
</RCC>
I compiled it obtaining the res_rc.py file.
To read it I created a python script called resource.py:
import pandas as pd
import res_rc
from PySide.QtCore import *
file = QFile(":/dataset.csv")
df = pd.read_csv(file.fileName())
print(df)
But I obtain the error: IOError: File :/dataset.csv does not exist
All the files (resource.py, res.qrc, res_rc.py, dataset.csv) are in the same folder.
If I do res_rc.qt_resource_data I can see the contents.
How can I create the pandas dataframe?
A qresource path is a virtual path that only Qt knows how to resolve, and its internal handling can change without warning. pandas cannot open it directly, so read all the data through QFile and wrap it in an io.BytesIO stream:
import io
import pandas as pd
from PySide import QtCore
import res_rc
file = QtCore.QFile(":/dataset.csv")
if file.open(QtCore.QIODevice.ReadOnly):
    f = io.BytesIO(file.readAll().data())
    df = pd.read_csv(f)
    print(df)
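The key step is that pd.read_csv accepts any file-like object, not just a path. A small sketch without Qt, where raw_bytes is a made-up stand-in for what file.readAll().data() would return:

```python
import io

import pandas as pd

# Stand-in for the bytes returned by file.readAll().data()
raw_bytes = b"a,b\n1,2\n3,4\n"

# Wrap the bytes in an in-memory stream and parse it like a file.
df = pd.read_csv(io.BytesIO(raw_bytes))
```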

How to load a json file in jupyter notebook using pandas?

I am trying to load a json file in my jupyter notebook
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib as plt
import json
%matplotlib inline
with open("pud.json") as datafile:
    data = json.load(datafile)
dataframe = pd.DataFrame(data)
I am getting the following error
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Please help
If you want to load a json file use pandas.read_json.
pandas.read_json("pud.json")
This will load the json as a dataframe.
The function usage is as shown below
pandas.read_json(path_or_buf=None, orient=None, typ='frame', dtype=True, convert_axes=True, convert_dates=True, keep_default_dates=True, numpy=False, precise_float=False, date_unit=None, encoding=None, lines=False, chunksize=None, compression='infer')
You can get more information about the parameters here
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_json.html
Another way using json!
import pandas as pd
import json
with open('File_location.json') as f:
    data = json.load(f)
df=pd.DataFrame(data)
with open('pud.json', 'r') as file:
    variable_name = json.load(file)
The JSON file will be loaded as a Python dictionary.
The code you have written is fine. The problem is that the file you are loading does not contain valid JSON. Please check that file.
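To check what is wrong with the file, it can help to catch the decode error and look at where it occurs. The sketch below works on strings rather than on pud.json itself, and describe_json_problem is a hypothetical helper. It also covers one common case, assumed here rather than known from the question: a file with one JSON object per line, which json.load rejects but pandas can read with lines=True.

```python
import io
import json

import pandas as pd


def describe_json_problem(text):
    """Return a short description of why text is or is not valid JSON."""
    try:
        json.loads(text)
        return "valid JSON"
    except json.JSONDecodeError as exc:
        return f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"


# An empty file gives the same "Expecting value" error as in the question.
empty_report = describe_json_problem("")

# One object per line (JSON Lines) is not a single JSON document...
json_lines_text = '{"a": 1}\n{"a": 2}\n'
lines_report = describe_json_problem(json_lines_text)

# ...but pandas can parse it when told to read line by line.
lines_frame = pd.read_json(io.StringIO(json_lines_text), lines=True)
```

An error at line 1, column 1 usually means the file is empty or starts with something that is not JSON at all, such as an HTML error page.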

Downloading data from two Worksheets of a URL with an Excel File

I am looking to gather all the data from the penultimate worksheet in this Excel file, along with all the data in the last worksheet from "Maturity Years" of 5.5 onward. The code I have below currently grabs data from only the last worksheet, and I was wondering what the necessary alterations would be.
import urllib2
import pandas as pd
import os
import xlrd
url = 'http://www.bankofengland.co.uk/statistics/Documents/yieldcurve/uknom05_mdaily.xls'
socket = urllib2.urlopen(url)
xd = pd.ExcelFile(socket)
df = xd.parse(xd.sheet_names[-1], header=None)
print df
I was thinking of using glob but I haven't seen any application of it with an Online Excel file.
Edit: I think the following allows me to combine two worksheets of data into a single Dataframe. However, if there is a better answer please feel free to show it.
import urllib2
import pandas as pd
import os
import xlrd
url = 'http://www.bankofengland.co.uk/statistics/Documents/yieldcurve/uknom05_mdaily.xls'
socket = urllib2.urlopen(url)
xd = pd.ExcelFile(socket)
df1 = xd.parse(xd.sheet_names[-1], header=None)
df2 = xd.parse(xd.sheet_names[-2], header=None)
bigdata = df1.append(df2,ignore_index = True)
print bigdata
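The code above is Python 2 (urllib2, print statement), and DataFrame.append was removed in pandas 2.0. A rough Python 3 sketch of the combining step uses pd.concat instead. It runs on made-up stand-in frames with hypothetical sheet names rather than the real workbook, and it puts the penultimate sheet first, which is the reverse of the order used above.

```python
import pandas as pd


def combine_last_two_sheets(sheets):
    """Stack the penultimate and last sheets from a {name: DataFrame} mapping."""
    names = list(sheets)  # dicts preserve insertion order, i.e. workbook order
    return pd.concat(
        [sheets[names[-2]], sheets[names[-1]]],
        ignore_index=True,
    )


# Stand-in data; with the real workbook this mapping would come from
# pd.read_excel(url, sheet_name=None, header=None), which needs xlrd for .xls.
demo_sheets = {
    "info": pd.DataFrame({0: ["notes"]}),
    "short end": pd.DataFrame({0: [0.5, 1.0]}),
    "long end": pd.DataFrame({0: [5.5, 6.0]}),
}
bigdata = combine_last_two_sheets(demo_sheets)
```

Restricting the last sheet to maturities of 5.5 and above would be a column selection on that frame before the concat; the exact columns depend on the sheet layout, which is not shown here.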
