The yfinance.Ticker module describes how to initialize multiple Ticker objects with the following code.
import yfinance as yf
tickers = yf.Tickers('msft aapl goog')
# ^ returns a named tuple of Ticker objects
# access each ticker using (example)
tickers.msft.info
tickers.aapl.history(period="1mo")
tickers.goog.actions
The code results in the following error
tickers.aapl.history(period="1mo")
# Error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-6-3111e4668e75> in <module>
----> 1 tickers.aapl.history(period="1mo")
AttributeError: 'Tickers' object has no attribute 'aapl'
tickers.goog.actions
# Error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-7-47f3d4a48536> in <module>
----> 1 tickers.goog.actions
AttributeError: 'Tickers' object has no attribute 'goog'
How can this be resolved, given that this is the exact code from the yfinance GitHub repo?
This looks like a mistake in the documentation.
yf.Tickers(...) returns a Tickers object, which holds multiple Ticker objects.
The README omits the attribute through which those objects are reached, which is also named tickers.
Additionally, each ticker symbol is accessed in uppercase, as shown in the following code.
Tab completion after the . shows the attributes and methods that are actually available.
See AttributeError: 'Tickers' object has no attribute 'msft' #407 and Update README.rst #408 for further details.
Working Code
The following code produces the expected output
import yfinance as yf
tickers = yf.Tickers('msft aapl goog')
print(tickers.tickers.GOOG.actions)
print(tickers.tickers.AAPL.history(period="1mo"))
print(tickers.tickers.MSFT.info)
tickers.tickers.GOOG.actions
            Dividends  Stock Splits
Date
2014-03-27        0.0         2.002
2015-04-27        0.0         1.000
tickers.tickers.AAPL.history(period="1mo")
              Open    High     Low   Close    Volume  Dividends  Stock Splits
Date
2020-07-06  370.00  375.78  369.87  373.85  29663900          0             0
2020-07-07  375.41  378.62  372.23  372.69  28106100          0             0
2020-07-08  376.72  381.50  376.36  381.37  29273000          0             0
2020-07-09  385.05  385.27  378.69  383.01  31410700          0             0
2020-07-10  381.34  383.92  378.82  383.68  22564300          0             0
2020-07-13  389.06  399.82  381.03  381.91  47912300          0             0
2020-07-14  379.36  389.02  375.51  388.23  42747300          0             0
2020-07-15  395.96  396.99  385.96  390.90  38299500          0             0
2020-07-16  386.25  389.62  383.62  386.09  27644400          0             0
2020-07-17  387.95  388.59  383.36  385.31  23046700          0             0
2020-07-20  385.67  394.00  384.25  393.43  22579500          0             0
2020-07-21  396.69  397.00  386.97  388.00  25911500          0             0
2020-07-22  386.77  391.90  386.41  389.09  22250400          0             0
2020-07-23  387.99  388.31  368.04  371.38  49251100          0             0
2020-07-24  363.95  371.88  356.58  370.46  46359700          0             0
2020-07-27  374.84  379.62  373.92  379.24  30303500          0             0
2020-07-28  377.47  378.20  372.99  373.01  25906400          0             0
2020-07-29  375.00  380.92  374.85  380.16  22582300          0             0
2020-07-30  376.75  385.19  375.07  384.76  39532500          0             0
2020-07-31  411.54  425.66  403.30  425.04  93584200          0             0
2020-08-03  432.80  446.55  431.57  435.75  76955100          0             0
2020-08-04  436.53  443.16  433.56  436.99  35017345          0             0
tickers.tickers.MSFT.info
{'52WeekChange': 0.55063903,
'SandP52WeekChange': 0.14325917,
'address1': 'One Microsoft Way',
'algorithm': None,
'annualHoldingsTurnover': None,
'annualReportExpenseRatio': None,
'ask': 211.39,
'askSize': 1100,
'averageDailyVolume10Day': 31499250,
'averageVolume': 34679039,
'averageVolume10days': 31499250,
'beta': 0.933333,
'beta3Year': None,
'bid': 211.37,
'bidSize': 900,
'bookValue': 15.626,
'category': None,
'circulatingSupply': None,
'city': 'Redmond',
'companyOfficers': [],
'country': 'United States',
'currency': 'USD',
'dateShortInterest': 1594771200,
'dayHigh': 214.77,
'dayLow': 210.31,
'dividendRate': 2.04,
'dividendYield': 0.0094,
'earningsQuarterlyGrowth': -0.151,
'enterpriseToEbitda': 24.277,
'enterpriseToRevenue': 11.078,
'enterpriseValue': 1584317595648,
'exDividendDate': 1597795200,
'exchange': 'NMS',
'exchangeTimezoneName': 'America/New_York',
'exchangeTimezoneShortName': 'EDT',
'expireDate': None,
'fax': '425-706-7329',
'fiftyDayAverage': 203.61559,
'fiftyTwoWeekHigh': 216.38,
'fiftyTwoWeekLow': 130.78,
'fiveYearAverageReturn': None,
'fiveYearAvgDividendYield': 1.87,
'floatShares': 7456408437,
'forwardEps': 7.34,
'forwardPE': 28.803814,
'fromCurrency': None,
'fullTimeEmployees': 163000,
'fundFamily': None,
'fundInceptionDate': None,
'gmtOffSetMilliseconds': '-14400000',
'heldPercentInsiders': 0.014249999000000001,
'heldPercentInstitutions': 0.7409300000000001,
'industry': 'Software—Infrastructure',
'isEsgPopulated': False,
'lastCapGain': None,
'lastDividendValue': None,
'lastFiscalYearEnd': 1593475200,
'lastMarket': None,
'lastSplitDate': 1045526400,
'lastSplitFactor': '2:1',
'legalType': None,
'logo_url': 'https://logo.clearbit.com/microsoft.com',
'longBusinessSummary': 'Microsoft Corporation develops, licenses, and '
'supports software, services, devices, and solutions '
'worldwide. Its Productivity and Business Processes '
'segment offers Office, Exchange, SharePoint, '
'Microsoft Teams, Office 365 Security and Compliance, '
'and Skype for Business, as well as related Client '
'Access Licenses (CAL); Skype, Outlook.com, and '
'OneDrive; LinkedIn that includes Talent and marketing '
'solutions, and subscriptions; and Dynamics 365, a set '
'of cloud-based and on-premises business solutions for '
'small and medium businesses, large organizations, and '
'divisions of enterprises. Its Intelligent Cloud '
'segment licenses SQL and Windows Servers, Visual '
'Studio, System Center, and related CALs; GitHub that '
'provides a collaboration platform and code hosting '
'service for developers; and Azure, a cloud platform. '
'It also provides support services and Microsoft '
'consulting services to assist customers in '
'developing, deploying, and managing Microsoft server '
'and desktop solutions; and training and certification '
'to developers and IT professionals on various '
'Microsoft products. Its More Personal Computing '
'segment offers Windows OEM licensing and other '
'non-volume licensing of the Windows operating system; '
'Windows Commercial, such as volume licensing of the '
'Windows operating system, Windows cloud services, and '
'other Windows commercial offerings; patent licensing; '
'Windows Internet of Things; and MSN advertising. It '
'also provides Microsoft Surface, PC accessories, and '
'other devices; Gaming, including Xbox hardware, and '
'Xbox software and services; video games and '
'third-party video game royalties; and Search, '
'including Bing and Microsoft advertising. It sells '
'its products through distributors and resellers; and '
'directly through digital marketplaces, online stores, '
'and retail stores. It has strategic partnerships with '
'Humana Inc., Nokia, Telkomsel, Swiss Re, Kubota '
'Corporation, FedEx Corp., and Hitachi. The company '
'was founded in 1975 and is headquartered in Redmond, '
'Washington.',
'longName': 'Microsoft Corporation',
'market': 'us_market',
'marketCap': 1599952519168,
'maxAge': 1,
'maxSupply': None,
'messageBoardId': 'finmb_21835',
'morningStarOverallRating': None,
'morningStarRiskRating': None,
'mostRecentQuarter': 1593475200,
'navPrice': None,
'netIncomeToCommon': 44280999936,
'nextFiscalYearEnd': 1656547200,
'open': 214.17,
'openInterest': None,
'payoutRatio': 0.34550000000000003,
'pegRatio': 2.23,
'phone': '425-882-8080',
'previousClose': 216.54,
'priceHint': 2,
'priceToBook': 13.530653,
'priceToSalesTrailing12Months': 11.187305,
'profitMargins': 0.30962,
'quoteType': 'EQUITY',
'regularMarketDayHigh': 214.77,
'regularMarketDayLow': 210.31,
'regularMarketOpen': 214.17,
'regularMarketPreviousClose': 216.54,
'regularMarketPrice': 214.17,
'regularMarketVolume': 35978270,
'revenueQuarterlyGrowth': None,
'sector': 'Technology',
'sharesOutstanding': 7567649792,
'sharesPercentSharesOut': 0.0053,
'sharesShort': 39894144,
'sharesShortPreviousMonthDate': 1592179200,
'sharesShortPriorMonth': 42930465,
'shortName': 'Microsoft Corporation',
'shortPercentOfFloat': 0.0053,
'shortRatio': 1.19,
'startDate': None,
'state': 'WA',
'strikePrice': None,
'symbol': 'MSFT',
'threeYearAverageReturn': None,
'toCurrency': None,
'totalAssets': None,
'tradeable': False,
'trailingAnnualDividendRate': 2.04,
'trailingAnnualDividendYield': 0.009420892,
'trailingEps': 5.76,
'trailingPE': 36.70486,
'twoHundredDayAverage': 179.41402,
'volume': 35978270,
'volume24Hr': None,
'volumeAllCurrencies': None,
'website': 'http://www.microsoft.com',
'yield': None,
'ytdReturn': None,
'zip': '98052'}
For anyone, including my future self, who wants to iterate over multiple symbols without knowing in advance what those symbols are, this is what I came up with.
Disclaimer: I'm somewhat new to Python and very new to yfinance, so there may be a better way, but none of my searching turned one up.
import yfinance as yf
symbols = ['aapl', 'goog', 'msft']
tickers = yf.Tickers(' '.join(symbols))
for symbol in symbols:
    print(tickers.tickers[symbol.upper()].info)
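The type of the container behind tickers.tickers appears to depend on the installed yfinance release: the attribute access shown earlier (tickers.tickers.GOOG) suggests a namedtuple, while the subscript used in this loop needs a dict. Below is a minimal sketch of a lookup that tolerates both, assuming those are the only two shapes. get_ticker is a hypothetical helper, not part of yfinance, and the stand-in containers hold strings instead of real Ticker objects so the snippet runs without network access.

```python
from collections import namedtuple


def get_ticker(container, symbol):
    """Return the entry for `symbol` from a dict or a namedtuple container.

    Hypothetical helper (not part of yfinance). Symbols are upper-cased
    because that is how the working code addresses them.
    """
    key = symbol.upper()
    if isinstance(container, dict):
        return container[key]       # dict-style: container['GOOG']
    return getattr(container, key)  # namedtuple-style: container.GOOG


# Stand-ins for the two container shapes (strings instead of Ticker objects).
as_dict = {"AAPL": "ticker-aapl", "GOOG": "ticker-goog"}
Pair = namedtuple("Pair", ["AAPL", "GOOG"])
as_tuple = Pair("ticker-aapl", "ticker-goog")

for sym in ["aapl", "goog"]:
    print(sym, get_ticker(as_dict, sym), get_ticker(as_tuple, sym))
```

With a real object, the call inside the loop would be get_ticker(tickers.tickers, symbol).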
Related
I found some code that works well for retrieving the data I need (from the question "Python yahoo finance error market_cap=int(data.get_quote_yahoo(str)['marketCap']) TypeError: 'int' object is not callable"):
tickers=["AAPL","GOOG","RY","HPQ"]
# Get market cap (not really necessary for you)
market_cap_data = web.get_quote_yahoo(tickers)['marketCap']
# Get the P/E ratio directly
pe_data = web.get_quote_yahoo(tickers)['trailingPE']
# print stock and p/e ratio
for stock, pe in zip(tickers, pe_data):
    print(stock, pe)
# More keys that can be used
['language', 'region', 'quoteType', 'triggerable', 'quoteSourceName',
'currency', 'preMarketChange', 'preMarketChangePercent',
'preMarketTime', 'preMarketPrice', 'regularMarketChange',
'regularMarketChangePercent', 'regularMarketTime', 'regularMarketPrice',
'regularMarketDayHigh', 'regularMarketDayRange', 'regularMarketDayLow',
'regularMarketVolume', 'regularMarketPreviousClose', 'bid', 'ask',
'bidSize', 'askSize', 'fullExchangeName', 'financialCurrency',
'regularMarketOpen', 'averageDailyVolume3Month',
'averageDailyVolume10Day', 'fiftyTwoWeekLowChange',
'fiftyTwoWeekLowChangePercent', 'fiftyTwoWeekRange',
'fiftyTwoWeekHighChange', 'fiftyTwoWeekHighChangePercent',
'fiftyTwoWeekLow', 'fiftyTwoWeekHigh', 'dividendDate',
'earningsTimestamp', 'earningsTimestampStart', 'earningsTimestampEnd',
'trailingAnnualDividendRate', 'trailingPE',
'trailingAnnualDividendYield', 'marketState', 'epsTrailingTwelveMonths',
'epsForward', 'sharesOutstanding', 'bookValue', 'fiftyDayAverage',
'fiftyDayAverageChange', 'fiftyDayAverageChangePercent',
'twoHundredDayAverage', 'twoHundredDayAverageChange',
'twoHundredDayAverageChangePercent', 'marketCap', 'forwardPE',
'priceToBook', 'sourceInterval', 'exchangeDataDelayedBy', 'tradeable',
'firstTradeDateMilliseconds', 'priceHint', 'exchange', 'shortName',
'longName', 'messageBoardId', 'exchangeTimezoneName',
'exchangeTimezoneShortName', 'gmtOffSetMilliseconds', 'market',
'esgPopulated', 'price']
I would like to retrieve most of the fields listed in the comment at the end of the previous code. This is what I have so far:
import pandas_datareader as web
tickers = ["AAPL", "GOOG", "RY", "SAB.MC"]
market_cap_data = web.get_quote_yahoo(tickers)['marketCap']
pe_data = web.get_quote_yahoo(tickers)['trailingPE']
fiftytwo_low_data = web.get_quote_yahoo(tickers)['fiftyTwoWeekLowChangePercent']
for stock, mcap, pe, fiftytwo_low in zip(tickers, market_cap_data, pe_data, fiftytwo_low_data):
    print(stock, mcap, pe, fiftytwo_low)
Obviously I could continue with this brute-force approach, but is there a more elegant way to retrieve the whole set of fields, together with their column names?
thanks
Start with an empty set and, for each ticker, take the union with the column names of the DataFrame returned for that ticker. The result is the name of every field that is available for at least one of the tickers you want to retrieve.
import pandas_datareader as web
import pandas as pd
tickers = ["AAPL", "GOOG", "RY", "SAB.MC"]
names = set()
for t in tickers:
    market_cap_data = web.get_quote_yahoo(t)
    names |= set(market_cap_data.columns.to_list())
names
{'ask',
'askSize',
'averageAnalystRating',
'averageDailyVolume10Day',
'averageDailyVolume3Month',
'bid',
'bidSize',
'bookValue',
'cryptoTradeable',
'currency',
'customPriceAlertConfidence',
'displayName',
...
'trailingAnnualDividendYield',
'trailingPE',
'triggerable',
'twoHundredDayAverage',
'twoHundredDayAverageChange',
'twoHundredDayAverageChangePercent',
'typeDisp'}
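Once the field names are known, the per-field calls and the zip loop from the question are not needed: a table with one row per ticker can be cut down to the wanted columns in one step. Below is a minimal sketch, assuming the quote call returns a DataFrame with one row per symbol (as the .columns usage above implies). The quotes frame is a hand-made stand-in with made-up symbols and numbers so the snippet runs offline, and select_fields is a hypothetical helper.

```python
import pandas as pd


def select_fields(quotes, wanted):
    """Keep only the `wanted` columns, in that order.

    reindex() is used instead of quotes[wanted] so that a field that is
    missing from the table becomes a NaN column instead of a KeyError.
    """
    return quotes.reindex(columns=wanted)


# Stand-in for the quote table: made-up values, one row per symbol.
quotes = pd.DataFrame(
    {
        "marketCap": [100, 200],
        "trailingPE": [10.5, 20.25],
        "currency": ["USD", "USD"],
    },
    index=["AAA", "BBB"],
)

wanted = ["marketCap", "trailingPE", "fiftyTwoWeekLowChangePercent"]
table = select_fields(quotes, wanted)
print(table)
```

The resulting table keeps the column names as its header, so it can be printed or saved directly instead of zipping several Series together.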
I know this post is pretty old, but I just came across it now. Check out the 'yfinance' library. There's all kinds of stuff available over there!!
import pandas_datareader as web
import pandas as pd
df = web.DataReader('AAPL', data_source='yahoo', start='2011-01-01', end='2021-01-12')
df.head()
import yfinance as yf
aapl = yf.Ticker("AAPL")
aapl
# get stock info
aapl.info
# get historical market data
hist = aapl.history(period="max")
# show actions (dividends, splits)
aapl.actions
# show dividends
aapl.dividends
# show splits
aapl.splits
# show financials
aapl.financials
aapl.quarterly_financials
# show major holders
aapl.major_holders
# show institutional holders
aapl.institutional_holders
# show balance sheet
aapl.balance_sheet
aapl.quarterly_balance_sheet
# show cashflow
aapl.cashflow
aapl.quarterly_cashflow
# show earnings
aapl.earnings
aapl.quarterly_earnings
# show sustainability
aapl.sustainability
# show analysts recommendations
aapl.recommendations
# show next event (earnings, etc)
aapl.calendar
# show ISIN code - *experimental*
# ISIN = International Securities Identification Number
aapl.isin
# show options expirations
aapl.options
# get option chain for specific expiration
opt = aapl.option_chain('YYYY-MM-DD')
Result:
{'zip': '95014',
'sector': 'Technology',
'fullTimeEmployees': 164000,
'longBusinessSummary': 'Apple Inc. designs, manufactures, and markets smartphones, personal computers, tablets, wearables, and accessories worldwide. It also sells various related services. In addition, the company offers iPhone, a line of smartphones; Mac, a line of personal computers; iPad, a line of multi-purpose tablets; and wearables, home, and accessories comprising AirPods, Apple TV, Apple Watch, Beats products, and HomePod. Further, it provides AppleCare support and cloud services store services; and operates various platforms, including the App Store that allow customers to discover and download applications and digital content, such as books, music, video, games, and podcasts. Additionally, the company offers various services, such as Apple Arcade, a game subscription service; Apple Fitness+, a personalized fitness service; Apple Music, which offers users a curated listening experience with on-demand radio stations; Apple News+, a subscription news and magazine service; Apple TV+, which offers exclusive original content; Apple Card, a co-branded credit card; and Apple Pay, a cashless payment service, as well as licenses its intellectual property. The company serves consumers, and small and mid-sized businesses; and the education, enterprise, and government markets. It distributes third-party applications for its products through the App Store. The company also sells its products through its retail and online stores, and direct sales force; and third-party cellular network carriers, wholesalers, retailers, and resellers. Apple Inc. was incorporated in 1977 and is headquartered in Cupertino, California.',
'city': 'Cupertino',
'phone': '408 996 1010',
'state': 'CA',
'country': 'United States',
'companyOfficers': [],
'website': 'https://www.apple.com',
'maxAge': 1,
'address1': 'One Apple Park Way',
'industry': 'Consumer Electronics',
'ebitdaMargins': 0.33105,
'profitMargins': 0.2531,
'grossMargins': 0.43310001,
'operatingCashflow': 122151002112,
'revenueGrowth': 0.081,
'operatingMargins': 0.30289,
'ebitda': 130541002752,
'targetLowPrice': 122,
'recommendationKey': 'buy',
'grossProfits': 170782000000,
'freeCashflow': 90215251968,
'targetMedianPrice': 180,
'currentPrice': 151.29,
'earningsGrowth': 0.048,
'currentRatio': 0.879,
'returnOnAssets': 0.21214001,
'numberOfAnalystOpinions': 41,
'targetMeanPrice': 178.15,
'debtToEquity': 261.446,
'returnOnEquity': 1.75459,
'targetHighPrice': 214,
'totalCash': 48304001024,
'totalDebt': 132480000000,
'totalRevenue': 394328014848,
'totalCashPerShare': 3.036,
'financialCurrency': 'USD',
'revenuePerShare': 24.317,
'quickRatio': 0.709,
'recommendationMean': 1.9,
'exchange': 'NMS',
'shortName': 'Apple Inc.',
'longName': 'Apple Inc.',
'exchangeTimezoneName': 'America/New_York',
'exchangeTimezoneShortName': 'EST',
'isEsgPopulated': False,
'gmtOffSetMilliseconds': '-18000000',
'quoteType': 'EQUITY',
'symbol': 'AAPL',
'messageBoardId': 'finmb_24937',
'market': 'us_market',
'annualHoldingsTurnover': None,
'enterpriseToRevenue': 6.317,
'beta3Year': None,
'enterpriseToEbitda': 19.081,
'52WeekChange': -0.06042725,
'morningStarRiskRating': None,
'forwardEps': 6.82,
'revenueQuarterlyGrowth': None,
'sharesOutstanding': 15908100096,
'fundInceptionDate': None,
'annualReportExpenseRatio': None,
'totalAssets': None,
'bookValue': 3.178,
'sharesShort': 103178670,
'sharesPercentSharesOut': 0.0064999997,
'fundFamily': None,
'lastFiscalYearEnd': 1663977600,
'heldPercentInstitutions': 0.60030997,
'netIncomeToCommon': 99802996736,
'trailingEps': 6.11,
'lastDividendValue': 0.23,
'SandP52WeekChange': -0.15323704,
'priceToBook': 47.60541,
'heldPercentInsiders': 0.00071999995,
'nextFiscalYearEnd': 1727136000,
'yield': None,
'mostRecentQuarter': 1663977600,
'shortRatio': 1.14,
'sharesShortPreviousMonthDate': 1664496000,
'floatShares': 15891414476,
'beta': 1.246644,
'enterpriseValue': 2490915094528,
'priceHint': 2,
'threeYearAverageReturn': None,
'lastSplitDate': 1598832000,
'lastSplitFactor': '4:1',
'legalType': None,
'lastDividendDate': 1667520000,
'morningStarOverallRating': None,
'earningsQuarterlyGrowth': 0.008,
'priceToSalesTrailing12Months': 6.103387,
'dateShortInterest': 1667174400,
'pegRatio': 2.71,
'ytdReturn': None,
'forwardPE': 22.183283,
'lastCapGain': None,
'shortPercentOfFloat': 0.0064999997,
'sharesShortPriorMonth': 103251184,
'impliedSharesOutstanding': 0,
'category': None,
'fiveYearAverageReturn': None,
'previousClose': 150.72,
'regularMarketOpen': 152.305,
'twoHundredDayAverage': 155.0841,
'trailingAnnualDividendYield': 0.005971337,
'payoutRatio': 0.14729999,
'volume24Hr': None,
'regularMarketDayHigh': 152.57,
'navPrice': None,
'averageDailyVolume10Day': 84360340,
'regularMarketPreviousClose': 150.72,
'fiftyDayAverage': 147.0834,
'trailingAnnualDividendRate': 0.9,
'open': 152.305,
'toCurrency': None,
'averageVolume10days': 84360340,
'expireDate': None,
'algorithm': None,
'dividendRate': 0.92,
'exDividendDate': 1667520000,
'circulatingSupply': None,
'startDate': None,
'regularMarketDayLow': 149.97,
'currency': 'USD',
'trailingPE': 24.761045,
'regularMarketVolume': 74496725,
'lastMarket': None,
'maxSupply': None,
'openInterest': None,
'marketCap': 2406736461824,
'volumeAllCurrencies': None,
'strikePrice': None,
'averageVolume': 89929545,
'dayLow': 149.97,
'ask': 150.95,
'askSize': 1000,
'volume': 74496725,
'fiftyTwoWeekHigh': 182.94,
'fromCurrency': None,
'fiveYearAvgDividendYield': 1,
'fiftyTwoWeekLow': 129.04,
'bid': 150.82,
'tradeable': False,
'dividendYield': 0.0061000003,
'bidSize': 1100,
'dayHigh': 152.57,
'coinMarketCapLink': None,
'regularMarketPrice': 151.29,
'preMarketPrice': None,
'logo_url': 'https://logo.clearb
Just pick/choose what you want.
When I'm using
ticker = "MSFT"
stock = yf.Ticker('MSFT').history('5y')
stock
I get a dataframe in response
When I use
ticker = "MSFT"
stock_info = yf.Ticker(ticker).info
stock_info
I'm receiving a list from Yahoo finance API. How can I transform this list to a Dataframe in Pandas?
So I could use
stock.to_csv(folder + ticker + ".csv") => Working with 'history' data
stock_info.to_csv(folder + ticker + ".csv") => Not working with 'info' data because it's a list not a dataframe.
How can I save yf.Ticker(ticker).info data to a csv?
ticker = "MSFT"
stock_info = yf.Ticker(ticker).info
stock_info
stock_info.to_csv(folder + ticker + ".csv")
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In [158], line 4
2 stock_info = yf.Ticker(ticker).info
3 stock_info
----> 4 stock_info.to_csv(folder + ticker + ".csv")
AttributeError: 'dict' object has no attribute 'to_csv'
ticker = "MSFT"
stock_info = yf.Ticker(ticker).info
stock_info
{'zip': '98052-6399',
'sector': 'Technology',
'fullTimeEmployees': 221000,
'longBusinessSummary': 'Microsoft Corporation develops, licenses, and supports software, services, devices, and solutions worldwide. The company operates in three segments: Productivity and Business Processes, Intelligent Cloud, and More Personal Computing. The Productivity and Business Processes segment offers Office, Exchange, SharePoint, Microsoft Teams, Office 365 Security and Compliance, Microsoft Viva, and Skype for Business; Skype, Outlook.com, OneDrive, and LinkedIn; and Dynamics 365, a set of cloud-based and on-premises business solutions for organizations and enterprise divisions. The Intelligent Cloud segment licenses SQL, Windows Servers, Visual Studio, System Center, and related Client Access Licenses; GitHub that provides a collaboration platform and code hosting service for developers; Nuance provides healthcare and enterprise AI solutions; and Azure, a cloud platform. It also offers enterprise support, Microsoft consulting, and nuance professional services to assist customers in developing, deploying, and managing Microsoft server and desktop solutions; and training and certification on Microsoft products. The More Personal Computing segment provides Windows original equipment manufacturer (OEM) licensing and other non-volume licensing of the Windows operating system; Windows Commercial, such as volume licensing of the Windows operating system, Windows cloud services, and other Windows commercial offerings; patent licensing; and Windows Internet of Things. It also offers Surface, PC accessories, PCs, tablets, gaming and entertainment consoles, and other devices; Gaming, including Xbox hardware, and Xbox content and services; video games and third-party video game royalties; and Search, including Bing and Microsoft advertising. The company sells its products through OEMs, distributors, and resellers; and directly through digital marketplaces, online stores, and retail stores. 
Microsoft Corporation was founded in 1975 and is headquartered in Redmond, Washington.',
'city': 'Redmond',
'phone': '425 882 8080',
'state': 'WA',
'country': 'United States',
'companyOfficers': [],
'website': 'https://www.microsoft.com',
'maxAge': 1,
'address1': 'One Microsoft Way',
'fax': '425 706 7329',
'industry': 'Software—Infrastructure',
'ebitdaMargins': 0.48672,
'profitMargins': 0.34366,
'grossMargins': 0.6826,
'operatingCashflow': 87693000704,
'revenueGrowth': 0.106,
'operatingMargins': 0.41691002,
'ebitda': 98841001984,
'targetLowPrice': 255,
'recommendationKey': 'buy',
'grossProfits': 135620000000,
'freeCashflow': 46155874304,
...
'dayHigh': 236.6,
'coinMarketCapLink': None,
'regularMarketPrice': 235.87,
'preMarketPrice': None,
'logo_url': 'https://logo.clearbit.com/microsoft.com'}
I don't understand how I can change this list into something I can write to CSV
{'zip': '98052-6399',
'sector': 'Technology',
'fullTimeEmployees': 221000,
'longBusinessSummary': 'Microsoft Corporation develops, licenses, and supports software, services, devices, and solutions worldwide. The company operates in three segments: Productivity and Business Processes, Intelligent Cloud, and More Personal Computing. The Productivity and Business Processes segment offers Office, Exchange, SharePoint, Microsoft Teams, Office 365 Security and Compliance, Microsoft Viva, and Skype for Business; Skype, Outlook.com, OneDrive, and LinkedIn; and Dynamics 365, a set of cloud-based and on-premises business solutions for organizations and enterprise divisions. The Intelligent Cloud segment licenses SQL, Windows Servers, Visual Studio, System Center, and related Client Access Licenses; GitHub that provides a collaboration platform and code hosting service for developers; Nuance provides healthcare and enterprise AI solutions; and Azure, a cloud platform. It also offers enterprise support, Microsoft consulting, and nuance professional services to assist customers in developing, deploying, and managing Microsoft server and desktop solutions; and training and certification on Microsoft products. The More Personal Computing segment provides Windows original equipment manufacturer (OEM) licensing and other non-volume licensing of the Windows operating system; Windows Commercial, such as volume licensing of the Windows operating system, Windows cloud services, and other Windows commercial offerings; patent licensing; and Windows Internet of Things. It also offers Surface, PC accessories, PCs, tablets, gaming and entertainment consoles, and other devices; Gaming, including Xbox hardware, and Xbox content and services; video games and third-party video game royalties; and Search, including Bing and Microsoft advertising. The company sells its products through OEMs, distributors, and resellers; and directly through digital marketplaces, online stores, and retail stores. 
Microsoft Corporation was founded in 1975 and is headquartered in Redmond, Washington.',
'city': 'Redmond',
'phone': '425 882 8080',
'state': 'WA',
'country': 'United States',
'companyOfficers': [],
'website': 'https://www.microsoft.com',
'maxAge': 1,
'address1': 'One Microsoft Way',
'fax': '425 706 7329',
'industry': 'Software—Infrastructure',
'ebitdaMargins': 0.48672,
'profitMargins': 0.34366,
'grossMargins': 0.6826,
'operatingCashflow': 87693000704,
'revenueGrowth': 0.106,
'operatingMargins': 0.41691002,
'ebitda': 98841001984,
'targetLowPrice': 255,
'recommendationKey': 'buy',
'grossProfits': 135620000000,
'freeCashflow': 46155874304,
'targetMedianPrice': 296,
'currentPrice': 235.87,
'earningsGrowth': -0.133,
'currentRatio': 1.84,
'returnOnAssets': 0.15223,
'numberOfAnalystOpinions': 45,
'targetMeanPrice': 307.58,
'debtToEquity': 44.442,
'returnOnEquity': 0.42875,
'targetHighPrice': 411,
'totalCash': 107244003328,
'totalDebt': 77136003072,
'totalRevenue': 203074994176,
'totalCashPerShare': 14.387,
'financialCurrency': 'USD',
'revenuePerShare': 27.142,
'quickRatio': 1.585,
'recommendationMean': 1.7,
'exchange': 'NMS',
'shortName': 'Microsoft Corporation',
'longName': 'Microsoft Corporation',
'exchangeTimezoneName': 'America/New_York',
'exchangeTimezoneShortName': 'EDT',
'isEsgPopulated': False,
'gmtOffSetMilliseconds': '-14400000',
'quoteType': 'EQUITY',
'symbol': 'MSFT',
'messageBoardId': 'finmb_21835',
'market': 'us_market',
'annualHoldingsTurnover': None,
'enterpriseToRevenue': 8.51,
'beta3Year': None,
'enterpriseToEbitda': 17.484,
'52WeekChange': -0.2838753,
'morningStarRiskRating': None,
'forwardEps': 11.35,
'revenueQuarterlyGrowth': None,
'sharesOutstanding': 7454470144,
'fundInceptionDate': None,
'annualReportExpenseRatio': None,
'totalAssets': None,
'bookValue': 23.276,
'sharesShort': 38213792,
'sharesPercentSharesOut': 0.0050999997,
'fundFamily': None,
'lastFiscalYearEnd': 1656547200,
'heldPercentInstitutions': 0.71777,
'netIncomeToCommon': 69788999680,
'trailingEps': 9.29,
'lastDividendValue': 0.62,
'SandP52WeekChange': -0.1544562,
'priceToBook': 10.133615,
'heldPercentInsiders': 0.00071000005,
'nextFiscalYearEnd': 1719705600,
'yield': None,
'mostRecentQuarter': 1664496000,
'shortRatio': 1.28,
'sharesShortPreviousMonthDate': 1663200000,
'floatShares': 7399682766,
'beta': 0.960206,
'enterpriseValue': 1728178552832,
'priceHint': 2,
'threeYearAverageReturn': None,
'lastSplitDate': 1045526400,
'lastSplitFactor': '2:1',
'legalType': None,
'lastDividendDate': 1660694400,
'morningStarOverallRating': None,
'earningsQuarterlyGrowth': -0.144,
'priceToSalesTrailing12Months': 8.658308,
'dateShortInterest': 1665705600,
'pegRatio': 1.8,
'ytdReturn': None,
'forwardPE': 20.781496,
'lastCapGain': None,
'shortPercentOfFloat': 0.0050999997,
'sharesShortPriorMonth': 42967330,
'impliedSharesOutstanding': 0,
'category': None,
'fiveYearAverageReturn': None,
'previousClose': 226.75,
'regularMarketOpen': 226.24,
'twoHundredDayAverage': 272.84604,
'trailingAnnualDividendYield': 0.0112017635,
'payoutRatio': 0.26700002,
'volume24Hr': None,
'regularMarketDayHigh': 236.6,
'navPrice': None,
'averageDailyVolume10Day': 34581880,
'regularMarketPreviousClose': 226.75,
'fiftyDayAverage': 248.0068,
'trailingAnnualDividendRate': 2.54,
'open': 226.24,
'toCurrency': None,
'averageVolume10days': 34581880,
'expireDate': None,
'algorithm': None,
'dividendRate': 2.72,
'exDividendDate': 1668556800,
'circulatingSupply': None,
'startDate': None,
'regularMarketDayLow': 226.06,
'currency': 'USD',
'trailingPE': 25.389666,
'regularMarketVolume': 40593443,
'lastMarket': None,
'maxSupply': None,
'openInterest': None,
'marketCap': 1758285791232,
'volumeAllCurrencies': None,
'strikePrice': None,
'averageVolume': 26490917,
'dayLow': 226.06,
'ask': 235.7,
'askSize': 1200,
'volume': 40593443,
'fiftyTwoWeekHigh': 349.67,
'fromCurrency': None,
'fiveYearAvgDividendYield': 1.2,
'fiftyTwoWeekLow': 219.13,
'bid': 235.5,
'tradeable': False,
'dividendYield': 0.0115,
'bidSize': 900,
'dayHigh': 236.6,
'coinMarketCapLink': None,
'regularMarketPrice': 235.87,
'preMarketPrice': None,
'logo_url': 'https://logo.clearbit.com/microsoft.com'}
This is the full response from the Yahoo Finance API, from which I only need 'freeCashflow', 'industry', 'debtToEquity', 'returnOnEquity', and 'sharesOutstanding' written to CSV.
ticker = "MSFT"
stock_info = yf.Ticker(ticker).info
stock_info
df = pd.DataFrame(stock_info)
df
This code returns an empty df but I can see the columns. 154 columns but 0 rows??? I should have 1 row of data.
The response is a dict, not a list, so wrap it in square brackets to make it a one-element list of records and pass that to pandas.
pd.DataFrame([stock_info]).to_csv("/path/to/write/MSFT.csv", index=False)
I fixed it using
stock_info
df = pd.DataFrame.from_dict(stock_info,orient='index').T
Thanks anyway :-)
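The empty frame in the question follows from how pandas builds a DataFrame from a dict: scalar values are broadcast to the length of any list-like values, and 'companyOfficers' is an empty list, so every column ends up with zero rows. Wrapping the dict in a list makes it one record instead. Below is a minimal sketch that also keeps only the requested fields before writing CSV. Here stock_info is a small stand-in dict with values copied from the output above, and the CSV goes to an in-memory buffer rather than a file.

```python
import io

import pandas as pd

# Stand-in for yf.Ticker("MSFT").info, trimmed to a few keys from the output above.
stock_info = {
    "industry": "Software\u2014Infrastructure",  # \u2014 is the dash in the original value
    "companyOfficers": [],
    "freeCashflow": 46155874304,
    "debtToEquity": 44.442,
    "returnOnEquity": 0.42875,
    "sharesOutstanding": 7454470144,
}

# Scalars are broadcast to the length of the empty list: columns but no rows.
empty = pd.DataFrame(stock_info)

# Pick the wanted keys; .get() leaves None where a key is absent.
wanted = ["freeCashflow", "industry", "debtToEquity",
          "returnOnEquity", "sharesOutstanding"]
row = pd.DataFrame([{key: stock_info.get(key) for key in wanted}])

# Write the single row to CSV (in memory here; pass a path to write a file).
buffer = io.StringIO()
row.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(empty.shape, row.shape)
print(csv_text)
```

Selecting the keys before building the frame avoids carrying all the other columns into the CSV.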
I am trying to use the scrape_linkedin package. I followed the section on the GitHub page that explains how to set up the package and obtain the LinkedIn li_at cookie (pasted here for clarity).
Getting LI_AT
Navigate to www.linkedin.com and log in
Open browser developer tools (Ctrl-Shift-I or right click -> inspect element)
Select the appropriate tab for your browser (Application on Chrome, Storage on Firefox)
Click the Cookies dropdown on the left-hand menu, and select the www.linkedin.com option
Find and copy the li_at value
Once I collect the li_at value from my LinkedIn, I run the following code:
from scrape_linkedin import ProfileScraper
with ProfileScraper(cookie='myVeryLong_li_at_Code_which_has_characters_like_AQEDAQNZwYQAC5_etc') as scraper:
    profile = scraper.scrape(url='https://www.linkedin.com/in/justintrudeau/')
print(profile.to_dict())
I have two questions (I am originally an R user).
How can I input a list of profiles:
https://www.linkedin.com/in/justintrudeau/
https://www.linkedin.com/in/barackobama/
https://www.linkedin.com/in/williamhgates/
https://www.linkedin.com/in/wozniaksteve/
and scrape the profiles? (In R I would use the map function from the purrr package to apply the function to each of the LinkedIn profiles).
The output (from the original github page) is returned in a JSON style format. My second question is how I can convert this into a pandas data frame (i.e. it is returned similar to the following).
{'personal_info': {'name': 'Steve Wozniak', 'headline': 'Fellow at
Apple', 'company': None, 'school': None, 'location': 'San Francisco
Bay Area', 'summary': '', 'image': '', 'followers': '', 'email': None,
'phone': None, 'connected': None, 'websites': [],
'current_company_link': 'https://www.linkedin.com/company/sandisk/'},
'experiences': {'jobs': [{'title': 'Chief Scientist', 'company':
'Fusion-io', 'date_range': 'Jul 2014 – Present', 'location': 'Primary
Data', 'description': "I'm looking into future technologies applicable
to servers and storage, and helping this company, which I love, get
noticed and get a lead so that the world can discover the new amazing
technology they have developed. My role is principally a marketing one
at present but that will change over time.", 'li_company_url':
'https://www.linkedin.com/company/sandisk/'}, {'title': 'Fellow',
'company': 'Apple', 'date_range': 'Mar 1976 – Present', 'location': '1
Infinite Loop, Cupertino, CA 94015', 'description': 'Digital Design
engineer.', 'li_company_url': ''}, {'title': 'President & CTO',
'company': 'Wheels of Zeus', 'date_range': '2002 – 2005', 'location':
None, 'description': None, 'li_company_url':
'https://www.linkedin.com/company/wheels-of-zeus/'}, {'title':
'diagnostic programmer', 'company': 'TENET Inc.', 'date_range': '1970
– 1971', 'location': None, 'description': None, 'li_company_url':
''}], 'education': [{'name': 'University of California, Berkeley',
'degree': 'BS', 'grades': None, 'field_of_study': 'EE & CS',
'date_range': '1971 – 1986', 'activities': None}, {'name': 'University
of Colorado Boulder', 'degree': 'Honorary PhD.', 'grades': None,
'field_of_study': 'Electrical and Electronics Engineering',
'date_range': '1968 – 1969', 'activities': None}], 'volunteering':
[]}, 'skills': [], 'accomplishments': {'publications': [],
'certifications': [], 'patents': [], 'courses': [], 'projects': [],
'honors': [], 'test_scores': [], 'languages': [], 'organizations':
[]}, 'interests': ['Western Digital', 'University of Colorado
Boulder', 'Western Digital Data Center Solutions', 'NEW Homebrew
Computer Club', 'Wheels of Zeus', 'SanDisk®']}
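First, you can write a function that scrapes a single profile and use Python's built-in map function (or a list comprehension) to apply it to each profile link.
Second, to create a pandas DataFrame from a dictionary, you can pass the dictionary to pd.DataFrame. For a nested dictionary like the one above, pd.json_normalize is usually a better fit because it flattens nested keys into columns.
Thus, to create a DataFrame df from a dictionary profile_dict, you can do this:
df = pd.DataFrame(profile_dict)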
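A sketch of both steps together. Here `scrape_profile` is a hypothetical stand-in for the `scraper.scrape(url=...).to_dict()` call from the question; it returns a small hard-coded dictionary so the example runs without LinkedIn access. Because the real result is nested, `pd.json_normalize` is used so that the `personal_info` fields become flat columns:

```python
import pandas as pd

urls = [
    "https://www.linkedin.com/in/justintrudeau/",
    "https://www.linkedin.com/in/barackobama/",
]

def scrape_profile(url):
    # Hypothetical stand-in: in real use this would call
    # scraper.scrape(url=url).to_dict() inside the ProfileScraper context.
    return {
        "personal_info": {"name": None, "headline": None},
        "skills": [],
        "url": url,
    }

# map applies the function to each URL, much like purrr::map in R.
profiles = list(map(scrape_profile, urls))

# json_normalize flattens nested dicts into dotted column names such as
# personal_info.name; list-valued fields such as skills stay in a single cell.
df = pd.json_normalize(profiles)
print(df.columns.tolist())
```

Each profile becomes one row. List-valued fields such as `jobs` would need a separate table (one row per job) if they are to be analysed individually.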
I am new to scraping with Selenium in Python. I could retrieve some of the data, but I want it in table form, as it is displayed on the web page.
Here is what I have so far:
import time
from selenium import webdriver
url='https://definitivehc.maps.arcgis.com/home/item.html?id=1044bb19da8d4dbfb6a96eb1b4ebf629&view=list&showFilters=false#data'
browser = webdriver.Chrome(r"C:\task\chromedriver")
browser.get(url)
time.sleep(25)
rows_in_table = browser.find_elements_by_xpath('//table[@class="dgrid-row-table"]//tr[th or td]')
for element in rows_in_table:
    print(element.text.replace('\n', ''))
result snippet:
Hospital NameHospital TypeCityState AbrvZip CodeCounty NameState Name
Phoenix VA Health Care System (AKA Carl T Hayden VA Medical Center)VA HospitalPhoenixAZ85012MaricopaArizona040130401362620000.001
Southern Arizona VA Health Care SystemVA HospitalTucsonAZ85723PimaArizona04019040192952952202.002
VA Central California Health Care SystemVA HospitalFresnoCA93703FresnoCalifornia060190601954542202.003
VA Connecticut Healthcare System - West Haven Campus (AKA West Haven VA Medical Center)VA HospitalWest HavenCT6516New HavenConnecticut09009090092162161102.004
I would really appreciate help from an expert on this. Thanks.
This is an updated version of what @Andrej answered: this code downloads the table and, instead of printing it, saves it as an Excel document.
import json
import requests
import pandas as pd
config_url = 'https://definitivehc.maps.arcgis.com/sharing/rest/portals/self?culture=en-us&f=json'
page_url = 'https://services7.arcgis.com/{_id}/arcgis/rest/services/Definitive_Healthcare_USA_Hospital_Beds/FeatureServer/0/query?f=json&where=1%3D1&returnGeometry=false&spatialRel=esriSpatialRelIntersects&outFields=*&orderByFields=OBJECTID%20ASC&resultOffset={offset}&resultRecordCount=50&cacheHint=true&quantizationParameters=%7B%22mode%22%3A%22edit%22%7D'
_id = requests.get(config_url).json()['id']
required = []
offset = 0
while True:
    data = requests.get(page_url.format(_id=_id, offset=offset)).json()
    # uncomment this to print all data:
    # print(json.dumps(data, indent=4))
    for i, f in enumerate(data['features'], offset+1):
        required.append(f['attributes'])
    # a page with fewer than 50 records is the last one
    if i % 50:
        break
    offset += 50
# pd.json_normalize replaces the deprecated pandas.io.json.json_normalize import
df = pd.json_normalize(required)
# the default mode 'w' creates the workbook; 'A' is not a valid mode
with pd.ExcelWriter('dataFunction.xlsx') as writer:
    df.to_excel(writer)
I tried this and uploaded the excel sheet HERE(LINK TO EXCEL SHEET)!
The data is loaded dynamically using JavaScript. You can use the requests module to simulate those requests:
import json
import requests
config_url = 'https://definitivehc.maps.arcgis.com/sharing/rest/portals/self?culture=en-us&f=json'
page_url = 'https://services7.arcgis.com/{_id}/arcgis/rest/services/Definitive_Healthcare_USA_Hospital_Beds/FeatureServer/0/query?f=json&where=1%3D1&returnGeometry=false&spatialRel=esriSpatialRelIntersects&outFields=*&orderByFields=OBJECTID%20ASC&resultOffset={offset}&resultRecordCount=50&cacheHint=true&quantizationParameters=%7B%22mode%22%3A%22edit%22%7D'
_id = requests.get(config_url).json()['id']
offset = 0
while True:
    data = requests.get(page_url.format(_id=_id, offset=offset)).json()
    # uncomment this to print all data:
    # print(json.dumps(data, indent=4))
    for i, f in enumerate(data['features'], offset+1):
        print(i, f['attributes'])
        print('-' * 160)
    if i % 50:
        break
    offset += 50
Prints all 6624 records:
...
6614 {'OBJECTID': 6614, 'HOSPITAL_NAME': 'Walter E Washington Convention Center Field Hospital (Temporarily Open due to COVID-19)', 'HOSPITAL_TYPE': 'Short Term Acute Care Hospital', 'HQ_ADDRESS': '801 Mount Vernon Pl Nw', 'HQ_ADDRESS1': None, 'HQ_CITY': 'Washington', 'HQ_STATE': 'DC', 'HQ_ZIP_CODE': '20001', 'COUNTY_NAME': 'District of Columbia', 'STATE_NAME': 'District of Columbia', 'STATE_FIPS': '11', 'CNTY_FIPS': '001', 'FIPS': '11001', 'NUM_LICENSED_BEDS': None, 'NUM_STAFFED_BEDS': None, 'NUM_ICU_BEDS': 0, 'ADULT_ICU_BEDS': 0, 'PEDI_ICU_BEDS': None, 'BED_UTILIZATION': None, 'Potential_Increase_In_Bed_Capac': 0, 'AVG_VENTILATOR_USAGE': None}
----------------------------------------------------------------------------------------------------------------------------------------------------------------
6615 {'OBJECTID': 6615, 'HOSPITAL_NAME': 'Joint Base Cape Cod Field Hospital (Temporarily Open due to COVID-19)', 'HOSPITAL_TYPE': 'Short Term Acute Care Hospital', 'HQ_ADDRESS': 'Connery Ave', 'HQ_ADDRESS1': None, 'HQ_CITY': 'Buzzards Bay', 'HQ_STATE': 'MA', 'HQ_ZIP_CODE': '2542', 'COUNTY_NAME': 'Barnstable', 'STATE_NAME': 'Massachusetts', 'STATE_FIPS': '25', 'CNTY_FIPS': '001', 'FIPS': '25001', 'NUM_LICENSED_BEDS': None, 'NUM_STAFFED_BEDS': None, 'NUM_ICU_BEDS': 0, 'ADULT_ICU_BEDS': 0, 'PEDI_ICU_BEDS': None, 'BED_UTILIZATION': None, 'Potential_Increase_In_Bed_Capac': 0, 'AVG_VENTILATOR_USAGE': None}
----------------------------------------------------------------------------------------------------------------------------------------------------------------
6616 {'OBJECTID': 6616, 'HOSPITAL_NAME': 'UMass Lowell Recreation Center Field Hospital (Temporarily Open due to COVID-19)', 'HOSPITAL_TYPE': 'Short Term Acute Care Hospital', 'HQ_ADDRESS': '322 Aiken St', 'HQ_ADDRESS1': None, 'HQ_CITY': 'Lowell', 'HQ_STATE': 'MA', 'HQ_ZIP_CODE': '1854', 'COUNTY_NAME': 'Middlesex', 'STATE_NAME': 'Massachusetts', 'STATE_FIPS': '25', 'CNTY_FIPS': '017', 'FIPS': '25017', 'NUM_LICENSED_BEDS': None, 'NUM_STAFFED_BEDS': None, 'NUM_ICU_BEDS': 0, 'ADULT_ICU_BEDS': 0, 'PEDI_ICU_BEDS': None, 'BED_UTILIZATION': None, 'Potential_Increase_In_Bed_Capac': 0, 'AVG_VENTILATOR_USAGE': None}
----------------------------------------------------------------------------------------------------------------------------------------------------------------
6617 {'OBJECTID': 6617, 'HOSPITAL_NAME': 'Miami Beach Convention Center Field Hospital (Temporarily Open due to COVID-19)', 'HOSPITAL_TYPE': 'Short Term Acute Care Hospital', 'HQ_ADDRESS': '1901 Convention Center Dr', 'HQ_ADDRESS1': None, 'HQ_CITY': 'Miami Beach', 'HQ_STATE': 'FL', 'HQ_ZIP_CODE': '33139', 'COUNTY_NAME': 'Miami-Dade', 'STATE_NAME': 'Florida', 'STATE_FIPS': '12', 'CNTY_FIPS': '086', 'FIPS': '12086', 'NUM_LICENSED_BEDS': None, 'NUM_STAFFED_BEDS': None, 'NUM_ICU_BEDS': 0, 'ADULT_ICU_BEDS': 0, 'PEDI_ICU_BEDS': None, 'BED_UTILIZATION': None, 'Potential_Increase_In_Bed_Capac': 0, 'AVG_VENTILATOR_USAGE': None}
...
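The loop relies on the service returning at most 50 records per request and stops at the first short page. The same offset-pagination pattern can be sketched against a hypothetical in-memory `fetch_page` function (an assumption standing in for the ArcGIS request, with made-up records) so that it runs offline:

```python
PAGE_SIZE = 50

# Hypothetical data source standing in for the remote service: 120 fake records.
RECORDS = [{"OBJECTID": n} for n in range(1, 121)]

def fetch_page(offset, count=PAGE_SIZE):
    # Stand-in for requests.get(page_url.format(...)).json()['features'].
    return RECORDS[offset:offset + count]

def fetch_all():
    rows = []
    offset = 0
    while True:
        page = fetch_page(offset)
        rows.extend(page)
        # A page shorter than PAGE_SIZE (including an empty one) means nothing is left.
        if len(page) < PAGE_SIZE:
            break
        offset += PAGE_SIZE
    return rows

all_rows = fetch_all()
print(len(all_rows))  # 120
```

Checking the page length, rather than testing the last index with `i % 50`, also terminates when the total number of records happens to be an exact multiple of the page size.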
I have written a script that opens multiple tabs one by one and takes data from each. I am able to get the data from the page, but when writing it to a CSV file I get the output below.
Bedrooms Bathrooms Super area Floor Status
3 See Dimensions 3 See Dimensions 2100 7 (Out of 23 Floors) 3 See Dimensions
Bedrooms Bathrooms Super area Floor Status
3 See Dimensions 3 See Dimensions 2100 7 (Out of 23 Floors) 3 See Dimensions
Bedrooms Bathrooms Super area Floor Status
1 1 520 4 (Out of 40 Floors) 1
Bedrooms Bathrooms Super area Floor Status
3 See Dimensions 3 See Dimensions 2100 7 (Out of 23 Floors) 3 See Dimensions
Bedrooms Bathrooms Super area Floor Status
1 1 520 4 (Out of 40 Floors) 1
In the Status column I am getting the wrong value.
I have tried:
# Go through them and click on each.
for unique_link in my_needed_links:
    unique_link.click()
    time.sleep(2)
    driver.switch_to_window(driver.window_handles[1])
    def get_elements_by_xpath(driver, xpath):
        return [entry.text for entry in driver.find_elements_by_xpath(xpath)]
    search_entries = [
        ("Bedrooms", "//div[@class='seeBedRoomDimen']"),
        ("Bathrooms", "//div[@class='p_value']"),
        ("Super area", "//span[@id='coveredAreaDisplay']"),
        ("Floor", "//div[@class='p_value truncated']"),
        ("Lift", "//div[@class='p_value']")]
    with open('textfile.csv', 'a+') as f_output:
        csv_output = csv.writer(f_output)
        # Write header
        csv_output.writerow([name for name, xpath in search_entries])
        entries = []
        for name, xpath in search_entries:
            entries.append(get_elements_by_xpath(driver, xpath))
        csv_output.writerows(zip(*entries))
    get_elements_by_xpath(driver, xpath)
Edit
Entries: as list
[['3 See Dimensions'], ['3 See Dimensions', '4', '3', '1', '2100 sqft', '1400 sqft', '33%', 'Avenue 54', 'Under Construction', "Dec, '20", 'New Property', '₹ 7.90 Cr ₹ 39,50,000 Approx. Registration Charges ₹ 15 Per sq. Unit Monthly\nSee Other Charges', "Santacruz West, Mumbai., Santacruz West, Mumbai - Western Suburbs, Maharashtra What's Nearby", "Next To St Teresa's Convent School & Sacred Heart School on SV Road.", 'East', 'P51800007149 (The project has been registered via MahaRERA registration number: P51800007149 and is available on the website https://maharera.mahaonline.gov.in under registered projects.)', 'Garden/Park, Pool, Main Road', 'Marble, Marbonite, Wooden', '1 Covered', '24 Hours Available', 'No/Rare Powercut', '6', '6', 'Unfurnished', 'Municipal Corporation of Greater Mumbai', 'Freehold', 'Brokers please do not contact', ''], ['2100'], ['7 (Out of 23 Floors)'], ['3 See Dimensions', '4', '3', '1', '2100 sqft', '1400 sqft', '33%', 'Avenue 54 1 Discussion on forum', 'Under Construction', "Dec, '20", 'New Property', '₹ 7.90 Cr ₹ 39,50,000 Approx. Registration Charges ₹ 15 Per sq. Unit Monthly\nSee Other Charges', "Santacruz West, Mumbai., Santacruz West, Mumbai - Western Suburbs, Maharashtra What's Nearby", "Next To St Teresa's Convent School & Sacred Heart School on SV Road.", 'East', 'P51800007149 (The project has been registered via MahaRERA registration number: P51800007149 and is available on the website https://maharera.mahaonline.gov.in under registered projects.)', 'Garden/Park, Pool, Main Road', 'Marble, Marbonite, Wooden', '1 Covered', '24 Hours Available', 'No/Rare Powercut', '6', '6', 'Unfurnished', 'Municipal Corporation of Greater Mumbai', 'Freehold', 'Brokers please do not contact', '']]
[['3 See Dimensions'], ['3 See Dimensions', '4', '3', '1', '2100 sqft', '1400 sqft', '33%', 'Avenue 54 1 Discussion on forum', 'Under Construction', "Dec, '20", 'New Property', '₹ 7.90 Cr ₹ 39,50,000 Approx. Registration Charges ₹ 15 Per sq. Unit Monthly\nSee Other Charges', "Santacruz West, Mumbai., Santacruz West, Mumbai - Western Suburbs, Maharashtra What's Nearby", "Next To St Teresa's Convent School & Sacred Heart School on SV Road.", 'East', 'P51800007149 (The project has been registered via MahaRERA registration number: P51800007149 and is available on the website https://maharera.mahaonline.gov.in under registered projects.)', 'Garden/Park, Pool, Main Road', 'Marble, Marbonite, Wooden', '1 Covered', '24 Hours Available', 'No/Rare Powercut', '6', '6', 'Unfurnished', 'Municipal Corporation of Greater Mumbai', 'Freehold', 'Brokers please do not contact', ''], ['2100'], ['7 (Out of 23 Floors)'], ['3 See Dimensions', '4', '3', '1', '2100 sqft', '1400 sqft', '33%', 'Avenue 54 1 Discussion on forum', 'Under Construction', "Dec, '20", 'New Property', '₹ 7.90 Cr ₹ 39,50,000 Approx. Registration Charges ₹ 15 Per sq. Unit Monthly\nSee Other Charges', "Santacruz West, Mumbai., Santacruz West, Mumbai - Western Suburbs, Maharashtra What's Nearby", "Next To St Teresa's Convent School & Sacred Heart School on SV Road.", 'East', 'P51800007149 (The project has been registered via MahaRERA registration number: P51800007149 and is available on the website https://maharera.mahaonline.gov.in under registered projects.)', 'Garden/Park, Pool, Main Road', 'Marble, Marbonite, Wooden', '1 Covered', '24 Hours Available', 'No/Rare Powercut', '6', '6', 'Unfurnished', 'Municipal Corporation of Greater Mumbai', 'Freehold', 'Brokers please do not contact', '']]
website link: https://www.magicbricks.com/propertyDetails/1-BHK-520-Sq-ft-Multistorey-Apartment-FOR-Sale-Kandivali-West-in-Mumbai&id=4d423333373433343431
Edit 1
my_needed_links = []
list_links = driver.find_elements_by_tag_name("a")
for i in range(0, 2):
    # Get unique links.
    for link in list_links:
        if "https://www.magicbricks.com/propertyDetails/" in link.get_attribute("href"):
            if link not in my_needed_links:
                my_needed_links.append(link)
    # Go through them and click on each.
    for unique_link in my_needed_links:
        unique_link.click()
        time.sleep(2)
        driver.switch_to_window(driver.window_handles[1])
        def get_elements_by_xpath(driver, xpath):
            return [entry.text for entry in driver.find_elements_by_xpath(xpath)]
        search_entries = [
            ("Bedrooms", "//div[@class='seeBedRoomDimen']"),
            ("Bathrooms", "//div[@class='p_value']"),
            ("Super area", "//span[@id='coveredAreaDisplay']"),
            ("Floor", "//div[@class='p_value truncated']"),
            ("Lift", "//div[@class='p_value']")]
        #with open('textfile.csv', 'a+') as f_output:
        entries = []
        for name, xpath in search_entries:
            entries.append(get_elements_by_xpath(driver, xpath))
        data = [entry for entry in entries if len(entry)==28]
        df = pd.DataFrame(data)
        print (df)
        df.to_csv('nameoffile.csv', mode='a',index=False,encoding='utf-8')
        #df.to_csv('nameoffile.csv',mode='a', index=False,encoding='utf-8')
        get_elements_by_xpath(driver, xpath)
        time.sleep(2)
        driver.close()
        # Switch back to the main tab/window.
        driver.switch_to_window(driver.window_handles[0])
Thank you in advance. Please suggest something
The xpath for bathrooms and for lift are the same, therefore you get the same results in these columns. Try to find another way to identify and distinguish between them. You can probably use an index, though if there's another way it's usually preferred.
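One way to distinguish the two fields is to locate each value through its label rather than through the shared class. The sketch below uses the standard-library `xml.etree.ElementTree` on hypothetical markup; the class names `p_infoColumn` and `p_title` and the sample values are assumptions for illustration, and the real page structure may differ:

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: each value sits next to a label inside its own container.
html = """
<div>
  <div class="p_infoColumn"><div class="p_title">Bathrooms</div><div class="p_value">3</div></div>
  <div class="p_infoColumn"><div class="p_title">Lift</div><div class="p_value">6</div></div>
</div>
"""
root = ET.fromstring(html)

# The shared class matches every value, so it cannot tell the fields apart.
same_class = [el.text for el in root.findall(".//div[@class='p_value']")]

def value_for(label):
    # Find the container whose label text matches, then read its value element.
    for column in root.findall(".//div[@class='p_infoColumn']"):
        if column.findtext("div[@class='p_title']") == label:
            return column.findtext("div[@class='p_value']")
    return None

print(same_class)
print(value_for("Bathrooms"), value_for("Lift"))
```

With Selenium, the same idea could be expressed as a single XPath such as `//div[@class='p_title'][text()='Lift']/following-sibling::div[@class='p_value']`, again assuming the page pairs labels and values this way.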
I could not load the page due to my location. But from your entries, you could do:
#Your selenium imports
import pandas as pd
def get_elements_by_xpath(driver, xpath):
    return [entry.text for entry in driver.find_elements_by_xpath(xpath)]
for unique_link in my_needed_links:
    unique_link.click()
    time.sleep(2)
    driver.switch_to_window(driver.window_handles[1])
    search_entries = [
        ("Bedrooms", "//div[@class='seeBedRoomDimen']"), ("Bathrooms", "//div[@class='p_value']"),("Super area", "//span[@id='coveredAreaDisplay']"),("Floor", "//div[@class='p_value truncated']"),("Lift", "//div[@class='p_value']")]
    entries = []
    for name, xpath in search_entries:
        entries.append(get_elements_by_xpath(driver, xpath))
    data = [entry for entry in entries if len(entry)>5]
    df = pd.DataFrame(data)
    df.drop_duplicates(inplace=True)
    df.to_csv('nameoffile.csv', sep=';',index=False,encoding='utf-8',mode='a')
    get_elements_by_xpath(driver, xpath)