Create multiple dataframes with a loop in Python - python

I have this piece of code that I want to make shorter:
df_1 = investpy.stocks.get_stock_recent_data('Eco','Colombia',False)
df_2 = investpy.stocks.get_stock_recent_data('JPM','United States',False)
df_3 = investpy.stocks.get_stock_recent_data('TSM','United States',False)
df_5 = investpy.stocks.get_stock_recent_data('CSCO','United States',False)
df_8 = investpy.stocks.get_stock_recent_data('NVDA','United States',False)
df_9 = investpy.stocks.get_stock_recent_data('BLK','United States',False)
Since every line uses the same call and only a few arguments change from one line to the next, I think I might solve this with a function. I wrote this one:
def _get_asset_data(ticker, country, state):
    investpy.stocks.get_stock_recent_data(ticker, country, state)
So I tried this:
_get_asset_data('TSLA', 'United States', False)
print(_get_asset_data)
<function _get_asset_data at 0x7f323c912560>
However, I do not know how to store each set of data returned by this function in a separate data frame for each company. I tried a for loop but got nowhere.
Any ideas? Thank you in advance for your attention and cooperation!

Here is one approach based on the code given. You should refrain from using it in practice, as it contains redundant code, which makes it hard to maintain. You'll find a more flexible approach below.
Based on your solution
import investpy
import pandas as pd
def _get_asset_data(ticker, country, state=False):
    return investpy.stocks.get_stock_recent_data(ticker, country, state)
df_1 = _get_asset_data('Eco','Colombia')
df_2 = _get_asset_data('JPM','United States')
df_3 = _get_asset_data('TSM','United States')
df_5 = _get_asset_data('CSCO','United States')
df_8 = _get_asset_data('NVDA','United States')
df_9 = _get_asset_data('BLK','United States')
final = pd.concat([df_1, df_2, df_3, df_5, df_8, df_9], axis=1)
final
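For what it's worth, the attempt in the question seems to fail for two separate reasons: the function body has no return statement, so calling it yields None, and print(_get_asset_data) prints the function object itself rather than the result of a call. A minimal sketch with two made-up doubling functions (hypothetical names, nothing to do with investpy) shows both effects:

```python
def without_return(x):
    x * 2  # the value is computed and then discarded


def with_return(x):
    return x * 2  # the value is handed back to the caller


missing = without_return(3)  # None, because nothing is returned
doubled = with_return(3)     # 6

# Using the function name without parentheses refers to the function object,
# which prints like the "<function ... at 0x...>" line in the question.
description = str(with_return)
print(missing, doubled, description)
```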
More versatile solution:
import investpy
import pandas as pd
def _get_asset_data(ticker, country, state=False):
    return investpy.stocks.get_stock_recent_data(ticker, country, state)
stocks = [
    ('Eco', 'Colombia'),
    ('JPM', 'United States'),
    ('TSM', 'United States'),
    ('CSCO', 'United States'),
    ('NVDA', 'United States'),
    ('BLK', 'United States'),
]
results = []
for stock in stocks:
    result = _get_asset_data(*stock)
    results.append(result)
final = pd.concat(results, axis=1)
final
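If you want to keep one data frame per company rather than only the concatenated result, a dictionary keyed by ticker may be a good fit. The sketch below is only an illustration: fake_fetch is a hypothetical stand-in for investpy.stocks.get_stock_recent_data (so it runs without network access), and the prices are made up.

```python
import pandas as pd


def fake_fetch(ticker, country):
    """Hypothetical stand-in for the real download call; returns dummy prices."""
    return pd.DataFrame(
        {"Close": [1.0, 2.0]},
        index=pd.to_datetime(["2021-01-04", "2021-01-05"]),
    )


stocks = [("JPM", "United States"), ("NVDA", "United States")]

# One DataFrame per ticker, retrievable by name instead of df_1, df_2, ...
frames = {ticker: fake_fetch(ticker, country) for ticker, country in stocks}

# Passing the dict to concat uses the tickers as the outer column level,
# so the columns of different companies no longer collide.
combined = pd.concat(frames, axis=1)
print(combined)
```

With this layout, frames["JPM"] gives the data for a single company and combined["NVDA"] selects that company's columns from the wide table.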

Related

Save multiple dataframes into the environment in Python

I have a problem similar to this question, but the original question was about writing multiple CSV outputs. In my case, I am wondering whether there is a way to create multiple dataframes in the environment through a loop so that I can carry on with some data analysis.
us = df[df['country_code'].str.match("US")]
mx = df[df['country_code'].str.match("MX")]
ca = df[df['country_code'].str.match("CA")]
au = df[df['country_code'].str.match("AU")]
You could use the same code as the link posted, but save the different dfs into a dictionary:
codes = ['US', 'MX', 'CA', 'AU']
result_dict = {}
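for code in codes:
    # engine='python' is requested explicitly because .str methods may not work with the numexpr engine
    temp = df.query(f'country_code.str.match("{code}")', engine='python')
    result_dict[code] = temp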
You can loop over the codes, match each one as shown below, and collect the matching rows in a dict:
df = pd.DataFrame({'country_code': ['US','MX', 'CA', 'AU']})
codes = ['US', 'MX', 'CA', 'AU']
out = {code : df[df['country_code'].str.match(code)] for code in codes}
Output:
>>> out["US"]
  country_code
0           US
>>> type(out["US"])
pandas.core.frame.DataFrame
>>> out["CA"]
  country_code
2           CA
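As an alternative to matching each code by hand, groupby can split the frame on whatever codes are actually present. This is a minimal sketch with made-up data (the value column is an assumption added for illustration):

```python
import pandas as pd

df = pd.DataFrame(
    {"country_code": ["US", "MX", "CA", "US"], "value": [1, 2, 3, 4]}
)

# Iterating over a groupby yields (key, sub-frame) pairs,
# which map directly onto a dict of DataFrames.
by_code = {code: group for code, group in df.groupby("country_code")}

print(by_code["US"])
```

Note that this groups on exact values, whereas str.match in the answers above matches a regular expression at the start of the string, so the two are not equivalent if codes share prefixes.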

Check whether all unique value of column B are mapped with all unique value of Column A

I need a little help. I know it's probably easy; I tried but didn't reach the goal.
# Import pandas library
import pandas as pd
data1 = [['India', 350], ['India', 600], ['Bangladesh', 350],['Bangladesh', 600]]
df1 = pd.DataFrame(data1, columns = ['Country', 'Bottle_Weight'])
data2 = [['India', 350], ['India', 600],['India', 200], ['Bangladesh', 350],['Bangladesh', 600]]
df2 = pd.DataFrame(data2, columns = ['Country', 'Bottle_Weight'])
data3 = [['India', 350], ['India', 600], ['Bangladesh', 350],['Bangladesh', 600],['Bangladesh', 200]]
df3 = pd.DataFrame(data3, columns = ['Country', 'Bottle_Weight'])
Basically, I want to create a function that checks the mapping by comparing the unique bottle weights of every other country with those of the first country.
For the 1st dataframe, it should return the text: All unique values of 'Bottle Weights' are mapped with all unique countries
For the 2nd dataframe, it should return the text: 'Country_name' not mapped 'Column name' 'value'
In this case: 'Bangladesh' not mapped with 'Bottle_Weight' 200
For the 3rd dataframe, it should return the text: All unique values of Bottle Weights are mapped with all unique countries (and on a new line) 'Country_name' mapped with new value '200'
It is not a particularly efficient algorithm, but I think this should get you the results you are looking for.
import numpy as np
def check_weights(df):
    success = True
    countries = df['Country'].unique()
    # Unique weights of the first country serve as the reference set
    first_weights = df.loc[df['Country']==countries[0]]['Bottle_Weight'].unique()
    for country in countries[1:]:
        weights = df.loc[df['Country']==country]['Bottle_Weight'].unique()
        for weight in first_weights:
            # Report every reference weight this country lacks
            if not np.any(weights[:] == weight):
                success = False
                print(f"{country} does not have bottle weight {weight}")
    if success:
        print("All bottle weights are shared with another country")
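A set-based variant of the same check might look like the sketch below. It is not the answer's method: the function name missing_weights and its parameters are hypothetical, and it returns the missing values as a dict instead of printing them. Like the function above, it only reports weights that the first country has and another country lacks.

```python
import pandas as pd


def missing_weights(df, key="Country", value="Bottle_Weight"):
    """Map each country to the reference weights it lacks (hypothetical helper)."""
    # One set of weights per country, in order of first appearance
    sets = df.groupby(key, sort=False)[value].apply(set)
    reference = sets.iloc[0]  # weights of the first country
    return {
        country: sorted(reference - weights)
        for country, weights in sets.iloc[1:].items()
        if reference - weights
    }


data2 = [["India", 350], ["India", 600], ["India", 200],
         ["Bangladesh", 350], ["Bangladesh", 600]]
df2 = pd.DataFrame(data2, columns=["Country", "Bottle_Weight"])

result = missing_weights(df2)
print(result)
```

An empty dict means every weight of the first country also appears for all the other countries.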

How do I save result of multiple “for” loops into a dataframe?

How can I add the outputs of different for loops to one dataframe? For example, I have scraped data from a website and have lists of names, emails and phone numbers, which I print using loops. I want to put all the outputs into a single table in one dataframe.
I am able to do it for a single loop but not for multiple loops.
Please look at the code and output in the attached images.
Removing zip from the for loop gives the error "Too many values to unpack".
Loop
phone = soup.find_all(class_ = "directory_item_phone directory_item_info_item")
for phn in phone:
    print(phn.text.strip())
## Output: list of numbers
Code for df
df = list()
for name,mail,phn in zip(faculty_name,email,phone):
    df.append(name.text.strip())
    df.append(mail.text.strip())
    df.append(phn.text.strip())
df = pd.DataFrame(df)
df
An efficient way to create a pandas.DataFrame is to first create a dict and then convert it into a DataFrame.
In your case you could probably do:
import pandas as pd
D = {'name': [], 'mail': [], 'phone': []}
for name, mail, phn in zip(faculty_name, email, phone):
    D['name'].append(name.text.strip())
    D['mail'].append(mail.text.strip())
    D['phone'].append(phn.text.strip())
df = pd.DataFrame(D)
Another way, with a lambda function:
import pandas as pd
text_strip = lambda s : s.text.strip()
D = {
    'name': list(map(text_strip, faculty_name)),
    'mail': list(map(text_strip, email)),
    'phone': list(map(text_strip, phone))
}
df = pd.DataFrame(D)
If the lists don't all have the same length, you may try this (though I am not sure it is very efficient):
import pandas as pd
columns_names = ['name', 'mail', 'phone']
all_lists = [faculty_name, email, phone]
max_length = max(map(len, all_lists))
# Pre-fill every column with None so that shorter lists end up padded
D = {c_name: [None] * max_length for c_name in columns_names}
for c_name, l in zip(columns_names, all_lists):
    for ind, element in enumerate(l):
        D[c_name][ind] = element.text.strip()
df = pd.DataFrame(D)
Try this:
data = {'name': [name.text.strip() for name in faculty_name],
        'mail': [mail.text.strip() for mail in email],
        'phn': [phn.text.strip() for phn in phone]}
df = pd.DataFrame.from_dict(data)
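For the case where the lists differ in length, itertools.zip_longest from the standard library may be a shorter route, since it pads the shorter lists with None. The sketch below uses plain strings with made-up values in place of the scraped elements; with real BeautifulSoup results you would strip the text first, as in the answers above.

```python
from itertools import zip_longest

import pandas as pd

# Made-up sample values standing in for the scraped, already-stripped text
names = ["Ann", "Bob", "Cy"]
mails = ["ann@example.com", "bob@example.com"]
phones = ["555-0100"]

# zip stops at the shortest list; zip_longest pads the missing slots with None
rows = list(zip_longest(names, mails, phones))
df = pd.DataFrame(rows, columns=["name", "mail", "phone"])
print(df)
```

Keep in mind that padding only aligns by position, so a person with a missing email in the middle of the page would shift every later value up by one row.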

How to group this dataframe by 'Continent' column?

A 'Continent' column is added to an existing data frame using a dictionary to match with the country names in data frame.
I am trying to group the data frame by the 'Continent' column.
I have tried the following:
def answer_eleven():
    Top15 = answer_one()
    ContinentDict = {'China':'Asia',
                     'United States':'North America',
                     'Japan':'Asia',
                     'United Kingdom':'Europe',
                     'Russian Federation':'Europe',
                     'Canada':'North America',
                     'Germany':'Europe',
                     'India':'Asia',
                     'France':'Europe',
                     'South Korea':'Asia',
                     'Italy':'Europe',
                     'Spain':'Europe',
                     'Iran':'Asia',
                     'Australia':'Australia',
                     'Brazil':'South America'}
    ContinentDict = pd.Series(ContinentDict)
    Top15 = Top15.assign(Continent=ContinentDict)
    Top15 = Top15.groupby('Continent')
    return Top15
answer_eleven()
However, the output I get is:
<pandas.core.groupby.groupby.DataFrameGroupBy object at 0x0000021C9C3BC6D8>
The groupby call itself worked; it returns a lazy GroupBy object rather than a table. One way to display its groups is:
data = answer_eleven()
for key, item in data:
    print(data.get_group(key), "\n")
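A GroupBy object usually becomes useful once an aggregation is applied to it. The sketch below assumes a small stand-in for Top15 indexed by country name, with a hypothetical Population column and made-up numbers, and counts and sums the values per continent:

```python
import pandas as pd

# Hypothetical stand-in for Top15: indexed by country, made-up populations
top = pd.DataFrame(
    {"Population": [1367.0, 317.0, 127.0, 80.0]},
    index=["China", "United States", "Japan", "Germany"],
)
continent_of = {
    "China": "Asia",
    "United States": "North America",
    "Japan": "Asia",
    "Germany": "Europe",
}

# Look up each index label in the dict to build the Continent column
top["Continent"] = top.index.map(continent_of)

# Aggregating turns the lazy GroupBy object into an ordinary DataFrame
summary = top.groupby("Continent")["Population"].agg(["size", "sum"])
print(summary)
```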

Pandas reading and sorting a file's content

I am reading a file from SIPRI. It reads into pandas, a dataframe is created and I can display it, but when I try to sort by a column I get a KeyError. Here is the code and the error:
import os
import pandas as pd
os.chdir('C:\\Users\\Student\\Documents')
#Find the top 20 countries in military spending by sorting
data = pd.read_excel('SIPRI-Milex-data-1949-2016.xls',
                     header = 0, index_col = 0, sheetname = 'Current USD')
data.sort_values(by = '2016', ascending = False)
KeyError: '2016'
You get the KeyError because the column '2016' is not present in the dataframe. In the Excel file the year is stored as an integer, so the column label is 2016 rather than the string '2016'. The data also needs cleaning before it can be sorted.
You can skip the top 5 rows and the bottom 8 rows to keep only the countries, then replace all the string and missing values with NaN. The following code does that (recent pandas versions spell the arguments sheet_name and skipfooter):
import numpy as np
data = pd.read_excel('./SIPRI-Milex-data-1949-2016.xlsx', header=0, index_col=0,
                     sheet_name='Current USD', skiprows=5, skipfooter=8)
# Replace string cells that contain whitespace, and the 'xxx' placeholder, with NaN
data = data.replace(r'\s+', np.nan, regex=True).replace('xxx', np.nan)
new_df = data.sort_values(2016, ascending=False)
top_20 = new_df[:20].index.tolist()
Output:
['USA', 'China, P.R.', 'Russian Federation', 'Saudi Arabia', 'India', 'France', 'UK', 'Japan', 'Germany', 'Korea, South', 'Italy', 'Australia', 'Brazil', 'Israel', 'Canada', 'Spain', 'Turkey', 'Iran', 'Algeria', 'Pakistan']
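Another way to handle the placeholder strings, swapped in here instead of the replace calls above, is pd.to_numeric with errors="coerce", which turns anything that is not a number into NaN. A minimal sketch on a tiny made-up frame (the values and the "Country X"/"Country Y" labels are invented for illustration, not taken from the SIPRI file):

```python
import pandas as pd

# Made-up values; "xxx" and ". ." mimic placeholder strings for missing data
raw = pd.DataFrame(
    {2016: [611.0, "xxx", ". .", 215.0]},
    index=["USA", "Country X", "Country Y", "China"],
)

# Coerce every column to numbers; non-numeric cells become NaN
clean = raw.apply(pd.to_numeric, errors="coerce")

# NaN rows sort last by default, so head() returns the largest real values
top = clean.sort_values(2016, ascending=False).head(2).index.tolist()
print(top)
```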
Well this could be helpful, I guess:
data = pd.read_excel('SIPRI-Milex-data-1949-2016.xlsx', skiprows=5, index_col = 0, sheet_name = 'Current USD')
data.dropna(inplace=True)
data.sort_values(by=2016, ascending=False, inplace=True)
And to get Top20 you can use:
data[data[2016].apply(lambda x: isinstance(x, (int, float)))][:20]
I downloaded the file, and it looks like 2016 is not a column label when the sheet is read as-is, so you need to modify the dataframe a bit and make the row that holds the country and year labels the header.
After that, call data.sort_values(by = 2016, ascending = False), treating the column name as an integer instead of a string.
data = pd.read_excel('SIPRI-Milex-data-1949-2016.xlsx',
                     header = 0, index_col = 0, sheet_name = 'Current USD')
data = data[4:]
data.columns = data.iloc[0]
data.sort_values(by =2016, ascending = False)
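The difference between an integer and a string column label can be reproduced without the Excel file. The sketch below uses a tiny made-up frame whose columns are the integers 2015 and 2016, shows the KeyError for the string label, and then two ways around it:

```python
import pandas as pd

# Made-up numbers; the column labels are integers, as when read from the sheet
data = pd.DataFrame({2015: [10.0, 30.0], 2016: [20.0, 5.0]}, index=["A", "B"])

try:
    data.sort_values(by="2016")  # the string '2016' is not a column label
    string_label_worked = True
except KeyError:
    string_label_worked = False

# Option 1: use the integer label directly
by_int = data.sort_values(by=2016, ascending=False)

# Option 2: convert the labels to strings once, then sort by '2016'
renamed = data.rename(columns=str)
by_str = renamed.sort_values(by="2016", ascending=False)
print(by_int.index.tolist(), by_str.index.tolist())
```

Checking data.columns first is a quick way to see which of the two label types you actually have.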
