Python: Pandas DataFrame object unable to convert to string - python

I would like to get the string "A" instead of object "A"
>>> comp
1
0
marketCapitalization 27879.5
name Agilent Technologies Inc
exchange NEW YORK STOCK EXCHANGE, INC.
country US
weburl https://www.agilent.com/
ipo 1999-11-18
phone 14083458886
currency USD
logo https://static.finnhub.io/logo/5f1f8412-80eb-1...
ticker A
marketCapitalization 27879.5
finnhubIndustry Life Sciences Tools & Services
shareOutstanding 308.777
>>> comp.loc['ticker']
1 A
Name: ticker, dtype: object
I tried comp.loc['ticker'].astype(str), but it still returns a Series with dtype object. I need it to return just "A".

Maybe try to select the first element with .iloc[0]:
comp.loc['ticker'].iloc[0]
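A minimal sketch of why this works, assuming comp is a one-column DataFrame like the one printed in the question (the frame below is a made-up, shortened stand-in). comp.loc['ticker'] selects a row and therefore returns a Series; .astype(str) only changes how the elements are stored, not the container. Pulling a single element out gives the plain Python string.

```python
import pandas as pd

# Shortened stand-in for the question's frame: one column labelled 1.
comp = pd.DataFrame(
    {1: ["Agilent Technologies Inc", "A"]},
    index=["name", "ticker"],
)

row = comp.loc["ticker"]      # still a Series (one element, labelled 1)
ticker = row.iloc[0]          # first element of that Series: a plain str
same = comp.at["ticker", 1]   # scalar access by row label and column label

print(type(row).__name__)     # Series
print(repr(ticker))           # 'A'
```

If the column label is known, comp.at['ticker', 1] skips the intermediate Series altogether.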

Related

Create new column based on value of another column

I have a solution below that gives me a new column as a universal identifier. But what if the NAME column contains additional data? How can I tweak the code below so that it behaves like a wildcard search?
Basically, if German/german or Mexican/mexican appears anywhere in the row value, I want Euro or South American as the value in the new column.
df["Identifier"] = (df["NAME"].str.lower().replace(
to_replace = ['german', 'mexican'],
value = ['Euro', 'South American']
))
print(df)
NAME Identifier
0 German Euro
1 german Euro
2 Mexican South American
3 mexican South American
Desired output
NAME Identifier
0 1990 German Euro
1 german 1998 Euro
2 country Mexican South American
3 mexican city 2006 South American
Based on an answer in this post:
r = '(german|mexican)'
c = dict(german='Euro', mexican='South American')
df['Identifier'] = df['NAME'].str.lower().str.extract(r, expand=False).map(c)
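As an illustration, here is the same extract-and-map idea run on the rows from the desired output. The sample frame is rebuilt by hand from the question; rows that match neither keyword would end up as NaN.

```python
import pandas as pd

df = pd.DataFrame({"NAME": ["1990 German", "german 1998",
                            "country Mexican", "mexican city 2006"]})

r = "(german|mexican)"                       # one capture group, either keyword
c = dict(german="Euro", mexican="South American")

# lower-case, pull out the first keyword found, then translate it via the dict
df["Identifier"] = df["NAME"].str.lower().str.extract(r, expand=False).map(c)
print(df)
```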
Another approach would be to use np.where with those two conditions, but there is probably a more elegant solution.
The code below works. I tried to do it with the apply function but could not get that working; I may manage it later. In the meantime, here is workable code:
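df3['identifier'] = ''
js_ref = [{'german': 'Euro'}, {'mexican': 'South American'}]
for i in range(len(df3)):  # assumes the default 0..n-1 index
    for l in js_ref:
        for k, v in l.items():
            # case-insensitive substring match (column is NAME in the question)
            if k.lower() in df3.loc[i, 'NAME'].lower():
                df3.loc[i, 'identifier'] = v  # .loc avoids chained assignment
                break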

Finding company matches in a list of financial news

I have a dataframe with the company ticker ("ticker"), full name ("longName") and short name ("unofficial_name"). The short name is created from the long name by removing inc., plc...
I also have a separate dataframe with company news: the date of the news ("date"), headline ("name"), news text ("text") and sentiment analysis.
I am trying to find company name matches in the list of articles and create a new dataframe with unique company-article matches (i.e. if one article mentions more than one company, that article gets one row per company mentioned).
I tried to do the matching on "unofficial_name" with the following code:
dict=[]
for n, c in zip(df_news["text"], sp500_names["unofficial_name"]):
    if c in n:
        x = {"text":n, "unofficial_name":c}
        dict.append(x)
print(dict)
But I get an empty list returned. Any ideas how to solve it?
sp500_names
ticker longName unofficial_name
0 A Agilent Technologies, Inc. Agilent Technologies
1 AAL American Airlines Group Inc. American Airlines Group
df_news
name date text neg neu pos compound
0 Asian stock markets reverse losses on global p... 2020-03-01 [By Tom Westbrook and Swati Pandey SINGAPORE (... 0.086 0.863 0.051 -0.9790
1 Energy & Precious Metals - Weekly Review and C... 2020-03-01 [By Barani Krishnan Investing.com - How much ... 0.134 0.795 0.071 -0.9982
Thank you!
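One likely cause, offered as a guess since only a sample of the data is shown: zip pairs the i-th article with the i-th company only, so each article is checked against a single name rather than against all of them. A minimal sketch of the all-pairs check follows. The two small frames are invented stand-ins, and str(text) is used on the assumption that "text" may hold lists rather than plain strings, as the leading "[" in the sample suggests.

```python
import pandas as pd

# Invented stand-ins for the question's frames.
sp500_names = pd.DataFrame({
    "ticker": ["A", "AAL"],
    "unofficial_name": ["Agilent Technologies", "American Airlines Group"],
})
df_news = pd.DataFrame({
    "name": ["headline 1", "headline 2"],
    "text": ["Agilent Technologies and American Airlines Group rose.",
             "Nothing relevant here."],
})

# Compare every article with every company; one output row per match.
matches = [
    {"name": headline, "text": text, "ticker": ticker, "unofficial_name": company}
    for headline, text in zip(df_news["name"], df_news["text"])
    for ticker, company in zip(sp500_names["ticker"], sp500_names["unofficial_name"])
    if company in str(text)  # plain, case-sensitive substring test
]
df_matches = pd.DataFrame(matches)
print(df_matches[["name", "ticker"]])
```

This is a brute-force comparison (articles times companies), which should be fine for a few hundred names. Short names could match inside longer words, so a word-boundary regex may be worth considering.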

Scrape Embedded Google Sheet from HTML in Python

This one has been relatively tricky for me. I am trying to extract the embedded table sourced from google sheets in python.
Here is the link
I do not own the sheet but it is publicly available.
Here is my code so far. When I output the headers, it shows "". Any help would be greatly appreciated. The end goal is to convert this table into a pandas DataFrame. Thanks, guys.
import requests
import lxml.html as lh
import pandas as pd
url = 'https://docs.google.com/spreadsheets/u/0/d/e/2PACX-1vQ--HR_GTaiv2dxaVwIwWYzY2fXTSJJN0dugyQe_QJnZEpKm7bu5o7eh6javLIk2zj0qtnvjJPOyvu2/pubhtml/sheet?headers=false&gid=1503072727'
page = requests.get(url)
doc = lh.fromstring(page.content)
tr_elements = doc.xpath('//tr')
col = []
i = 0
for t in tr_elements[0]:
    i += 1
    name = t.text_content()
    print('%d:"%s"' % (i, name))
    col.append((name, []))
Well if you would like to get the data into a DataFrame, you could load it directly:
df = pd.read_html('https://docs.google.com/spreadsheets/u/0/d/e/2PACX-1vQ--HR_GTaiv2dxaVwIwWYzY2fXTSJJN0dugyQe_QJnZEpKm7bu5o7eh6javLIk2zj0qtnvjJPOyvu2/pubhtml/sheet?headers=false&gid=1503072727',
header=1)[0]
df.drop(columns='1', inplace=True) # remove unnecessary index column called "1"
This will give you:
Target Ticker Acquirer \
0 Acacia Communications Inc Com ACIA Cisco Systems Inc Com
1 Advanced Disposal Services Inc Com ADSW Waste Management Inc Com
2 Allergan Plc Com AGN Abbvie Inc Com
3 Ak Steel Holding Corp Com AKS Cleveland Cliffs Inc Com
4 Td Ameritrade Holding Corp Com AMTD Schwab (Charles) Corp Com
Ticker.1 Current Price Take Over Price Price Diff % Diff Date Announced \
0 CSCO $68.79 $70.00 $1.21 1.76% 7/9/2019
1 WM $32.93 $33.15 $0.22 0.67% 4/15/2019
2 ABBV $197.05 $200.22 $3.17 1.61% 6/25/2019
3 CLF $2.98 $3.02 $0.04 1.34% 12/3/2019
4 SCHW $49.31 $51.27 $1.96 3.97% 11/25/2019
Deal Type
0 Cash
1 Cash
2 C&S
3 Stock
4 Stock
Note that read_html returns a list. In this case there is only one DataFrame, so we can refer to the first and only index location, [0].

Python Pandas - Creating a new column using currency_converter

I have a dataframe (dfFF) like this:
Sector Country Currency Amount Fund Start Year
0 Public USA USD 22000 2016
0 Private Hong Kong HKD 42000 2015
...
I want to create a new column that converts each amount, using its currency and fund start year, into euros and then into GBP (currency_converter only converts everything to euros or back, which is why I am not converting straight to GBP). I want the exchange rate for the year in which the funding took place.
I am using the template code given by the website:
https://pypi.org/project/CurrencyConverter/
from currency_converter import CurrencyConverter
from datetime import date
c = CurrencyConverter()
c.convert(100, 'EUR', 'USD', date=date(2013, 3, 21))
I want to use the 1st of January for every year to make it consistent, so I have tried doing the following:
c = CurrencyConverter()
dfFF['Value'] = (c.convert(dfFF['Amount'],dfFF['Currency'],
'EUR',date=date(dfFF['Fund Start Year'],1,1)))
I am getting the error:
TypeError: 'Series' objects are mutable, thus they cannot be hashed
I also feel that my approach isn't the best way of doing this.
Any suggestions? Even if I just get it into EUROS, and then I can do the same to convert it to GBP would be great.
Thank you
You have to create a function that converts the currency using the data of a single row:
def currency_convertor(row):
    amount = row['Amount']
    curr = row['Currency']
    date_r = row['Fund Start Year']
    new_curr = c.convert(amount, curr, 'EUR', date=date(date_r, 1, 1))
    return new_curr
and then apply it to the dataframe:
dfFF['EUR_new'] = dfFF.apply(currency_convertor, axis=1) # make sure to set axis=1
General Version
def currency_convertor(row, new_currency='EUR'):
    amount = row['Amount']
    curr = row['Currency']
    date_r = row['Fund Start Year']
    new_curr = c.convert(amount, curr, new_currency, date=date(date_r, 1, 1))
    return new_curr
dfFF['new_currency'] = dfFF.apply(lambda x: currency_convertor(x, new_currency="GBP"), axis=1)
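The TypeError in the question most likely arises because whole Series were passed to a function that expects one amount, one currency code and one date; apply with axis=1 hands over one row at a time instead. The sketch below shows only that row-wise pattern. To stay runnable without the package, it replaces c.convert with a hypothetical to_eur helper backed by a made-up rate table (RATES_PER_EUR); neither is part of currency_converter, and the rates are not real data.

```python
from datetime import date

import pandas as pd

# Made-up rates (units of currency per 1 EUR), for illustration only.
RATES_PER_EUR = {
    ("USD", date(2016, 1, 1)): 1.25,
    ("HKD", date(2015, 1, 1)): 8.0,
}

def to_eur(amount, currency, on_date):
    """Hypothetical stand-in for c.convert(amount, currency, 'EUR', date=on_date)."""
    return amount / RATES_PER_EUR[(currency, on_date)]

dfFF = pd.DataFrame({
    "Currency": ["USD", "HKD"],
    "Amount": [22000, 42000],
    "Fund Start Year": [2016, 2015],
})

def row_to_eur(row):
    # Inside apply(axis=1), row[...] yields scalars, not whole columns.
    on_date = date(int(row["Fund Start Year"]), 1, 1)
    return to_eur(row["Amount"], row["Currency"], on_date)

dfFF["EUR"] = dfFF.apply(row_to_eur, axis=1)
print(dfFF)
```

With the real library, a rate may not exist for 1 January of every year, so it is worth checking the package documentation for how missing dates are handled.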

How python can read and modify the CSV

**iShares Russell 3000 ETF
Inception Date May 22, 2000
Fund Holdings as of 31-oct-16
Total Net Assets $ 6,308,266,677
Shares Outstanding 48,550,000
Stock -
Bond -
Cash -
Other -**
 
Ticker Name Asset Class Weight (%) Price Shares Market Value Notional Value Sector SEDOL ISIN Exchange
AAPL APPLE INC Equity 2.8074 113.54 1,521,794 $ 172,784,491 172,784,490.76 Information Technology 2046251 US0378331005 NASDAQ
MSFT MICROSOFT CORP Equity 2.0474 59.92 2,103,008 $ 126,012,239 126,012,239.36 Information Technology 2588173 US5949181045 NASDAQ
XOM EXXON MOBIL CORP Equity 1.5675 83.32 1,157,835 $ 96,470,812 96,470,812.20 Energy 2326618 US30231G1022 New York Stock Exchange Inc.
JNJ JOHNSON & JOHNSON Equity 1.4378 115.99 762,927 $ 88,491,903 88,491,902.73 Health Care 2475833 US4781601046 New York Stock Exchange Inc.
I am trying to drop the basic info at the top and read just the structured data below it, but pandas read_csv throws an error.
Any help would be appreciated.
I assume that you want to discard the non-CSV header, and that your file is tab-separated after it (it cannot be space-separated, or it would not be a parsable CSV file, given the lack of quotes and the spaces in the data).
In that case you could iterate until you find the title, and then create a csv reader on the file, like this:
import csv
with open("input.csv") as f:
    while True:
        l = next(f, "")  # read input file lines; "" once the file is exhausted
        if not l or l.startswith("Ticker"):  # title found (or end of file, safety)
            break
    cr = csv.reader(f, delimiter="\t")
    for row in cr:
        print(row)  # get valid rows
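Since the question mentions pandas, here is a sketch of the same idea ending in a DataFrame: locate the line that starts with "Ticker" and pass only the text from that line onwards to read_csv. It rests on the same assumption that the file is tab-separated; the inline sample is a shortened, hand-made stand-in for the real file.

```python
import io

import pandas as pd

# Shortened stand-in for the file contents (tab-separated after the preamble).
raw = (
    "iShares Russell 3000 ETF\n"
    "Inception Date\tMay 22, 2000\n"
    "\n"
    "Ticker\tName\tAsset Class\tWeight (%)\n"
    "AAPL\tAPPLE INC\tEquity\t2.8074\n"
    "MSFT\tMICROSOFT CORP\tEquity\t2.0474\n"
)

lines = raw.splitlines()
# Index of the real header row; raises StopIteration if no such line exists.
header_idx = next(i for i, line in enumerate(lines) if line.startswith("Ticker"))

# Parse only the structured part, header row included.
df = pd.read_csv(io.StringIO("\n".join(lines[header_idx:])), sep="\t")
print(df)
```

For a file on disk, the lines would come from open("input.csv") instead of the raw string.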
