I want to write a specific key with its tuple of values to a CSV file using Python. I cannot currently use numpy or any other external Python library. I am using "zip" to achieve this, but only the first value associated with the key is written, whereas I want all the values in the tuple.
A sample dictionary and code are provided below:
import csv
import re

data = {
    "Pakistan": (0.57, 0.05, 0.79),
    "India": (0.47, 0.12, 0.54),
    "Bangladesh": (0.49, 0.17, 0.81)
}
con_name = input("Write up to three comma-separated countries for which you want to extract data: ")
count = len(re.findall(r'\w+', con_name))
if count == 1:
    con_check1 = con_name.split()[0]
    if con_check1.lower() in map(str.lower, data.keys()):
        con_check1 = con_check1.capitalize()
        x = list(data.keys()).index(con_check1)
        y = [key for key in data.keys()][x]
        csv_columns = ['Country Name', '1997', '1998', '1999']
        with open('Emissions_subset.csv', 'w', newline='') as out:
            csv_out = csv.writer(out)
            csv_out.writerow(csv_columns)
            z = [y]
            csv_out.writerows(zip(z, data[con_check1]))
The current output in the CSV file:
Country Name, 1997, 1998, 1999
Pakistan 0.57
The desired output:
Country Name, 1997, 1998, 1999
Pakistan 0.57, 0.05, 0.79
Can you please help me with this issue? I have asked a few questions lately and nobody has answered. I am really stuck here, and I only ask a question after I have exhausted my own attempts.
Try this:
[In] kv_list = [[key,*val] for key, val in data.items()]
[In] print(kv_list)
[Out] [['Pakistan', 0.57, 0.05, 0.79], ['India', 0.47, 0.12, 0.54], ['Bangladesh', 0.49, 0.17, 0.81]]
Then just use csv_out.writerows(kv_list).
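For your single-country case specifically, a minimal sketch (reusing your `data` dict and column names; the `country` variable stands in for your validated `con_check1`) that unpacks the tuple so the name and every value land in one row:

```python
import csv

data = {
    "Pakistan": (0.57, 0.05, 0.79),
    "India": (0.47, 0.12, 0.54),
    "Bangladesh": (0.49, 0.17, 0.81),
}

country = "Pakistan"  # stands in for your validated con_check1

with open('Emissions_subset.csv', 'w', newline='') as out:
    csv_out = csv.writer(out)
    csv_out.writerow(['Country Name', '1997', '1998', '1999'])
    # Unpack the tuple so the name and all three values share one row
    csv_out.writerow([country, *data[country]])
```

The `*` unpacking is what `zip` was missing: `zip(z, data[con_check1])` stops after the shortest input, which is your one-element list `z`.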
Related
I am trying to get a list of directors and calculate their average score based on all the movies I have in this .csv file. I have written some sample code so it is easier to understand. The sample code works fine, but when I use the columns from the .csv file it gives me this error: '<' not supported between instances of 'str' and 'float'. Here is the sample code:
import numpy as np
import pandas as pd

df = pd.DataFrame(data={"Director": ['Christopher Nolan', 'David Fincher', 'Christopher Nolan', 'Quentin Tarantino', 'Quentin Tarantino', 'Christopher Nolan'],
                        "Score": [8.9, 9.0, 8.8, 7.8, 9.2, 7.9]})
director_list = []
avg_scores = []
for director in np.unique(df["Director"]):
    director_list.append(director)
    avg_scores.append(df.loc[df["Director"]==director, "Score"].mean())
df = pd.DataFrame(data={"Director": director_list, "Score": avg_scores})
df
If anyone could help I would greatly appreciate it :)
This is the code in my main file that is causing the error.
data = pd.read_csv('movies.csv') # read in file
dataDirector = data
dataDirector.dropna(subset=['Director', 'Score']) # create data set for year score graph
dataDirector.sort_values(by=['Score'], inplace=True) # order the scores
dataDirector.reset_index()
df4 = pd.DataFrame(data={"Director":dataDirector['Director'], "Score": dataDirector['Score']})
director_list4 = []
avg_scores4 = []
for director in np.unique(df4["Director"]):
    director_list4.append(director)
    avg_scores4.append(df4.loc[df4["Director"]==director, "Score"].mean())
df4 = pd.DataFrame(data={"Director":director_list4, "Score": avg_scores4})
df4
Do you mean something like:
if score < x: # do something
Please check whether your x is also a float or integer. As the error says, you are probably using a string like "6" instead of an integer or float like 6.
Update:
This statement raises the error:
np.unique(df4["Director"])
np.unique sorts its input, which fails here because the column mixes strings with NaN (a float). Try pandas' own method, which does not sort:
df4["Director"].unique()
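A sketch of the likely root cause, using a toy frame with a missing director (an assumption about what read_csv produces): dropna returns a new frame unless you pass inplace=True, so without assigning it back the NaN stays in the column. Assigning it back, or skipping the loop entirely with groupby, avoids the error:

```python
import numpy as np
import pandas as pd

# Toy frame with a missing director, mimicking what read_csv might produce
data = pd.DataFrame({
    "Director": ["Christopher Nolan", np.nan, "David Fincher"],
    "Score": [8.9, 9.0, 8.8],
})

# dropna returns a NEW frame unless inplace=True, so assign it back;
# otherwise NaN (a float) stays in the column and np.unique's sort
# tries to compare str with float.
dataDirector = data.dropna(subset=["Director", "Score"])

# groupby computes per-director means without any explicit loop
avg = dataDirector.groupby("Director")["Score"].mean().reset_index()
print(avg)
```

The groupby version also replaces the whole director_list/avg_scores loop in one line.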
I have a dictionary with unique ID and [sample distribution of scores] pairs, e.g.: '100': [0.5, 0.6, 0.2, 0.7, 0.3]. The arrays are not all the same length.
For each item/'scores' array in my dictionary, I want to fit a beta distribution like scipy.stats.beta.fit() over the distribution of scores and get the alpha/beta parameters for each sample. And then I want this in a new dictionary — so it'd be like, '101': (1.5, 1.8).
I know I could do this by iterating over my dictionary with a for-loop, but the dictionary is pretty massive/I'd like to know if there's a more computationally efficient way of doing it.
For context, the way I get this dictionary is from a pandas dataframe, where I do:
my_dictionary = df.groupby('unique_id')['score'].apply(list).to_dict()
For example, the df looks like this:
df = pd.DataFrame({
'id': ['100', '100', '100', '101', '101', '102'],
'score' : [0.5, 0.3, 0.2, 1, 0.2, 0.9]
})
And then the resulting dictionary looks like:
{'100': [0.5, 0.3, 0.2], '101': [0.2, 0.1], '102': [0.9]}
Is there maybe also a way of fitting the beta distribution straight from the df.groupby level/without having to convert it into a dictionary first and then looping over the dictionary with scipy? Like is there something where I could do:
df.groupby('unique_id')['score'].apply(stats.beta.fit()).to_dict()
...or something like that?
Try this:
from scipy.stats import beta

df = df.groupby('id').apply(lambda x: list(beta.fit(x.score)))
dc = df.to_dict()
Output:
df
id
100 [0.2626434905176847, 0.37866242902872393, 0.18...
101 [1.253982875508286, 0.8832540117966552, -0.093...
102 [1.044551187075241, 1.0167687597781938, 0.8999...
dtype: object
dc
{'100': [0.2626434905176847, 0.37866242902872393, 0.18487097639113187, 0.3151290236088682],
'101': [1.253982875508286, 0.8832540117966552, -0.09383386122371801, 1.0938338612237182],
'102': [1.044551187075241, 1.0167687597781938, 0.8999999999999999, 1.1272504901983386e-16]}
If I understand correctly, you need a separate beta.fit per row of the dataframe df (this assumes each row's score holds a sequence of values, not a single float):
import scipy.stats as stats

df['beta_fit'] = df['score'].apply(lambda x: stats.beta.fit(x))
Now result is stored in df['beta_fit']:
0 (0.5158954356434775, 0.4824876600627905, 0.154...
1 (0.18219650169013427, 0.18228236200252418, 0.1...
2 (2.874609362944296, 0.8497751096020354, -0.341...
3 (1.313976940871222, 0.5956397575363881, -0.093...
Name: beta_fit, dtype: object
If you want to keep the location (loc) and scale (scale) fixed, you need to indicate this in scipy.stats.beta.fit. You can use functools.partial for this:
>>> import pandas as pd
>>> import scipy.stats
>>> from functools import partial
>>> df = pd.DataFrame({
... 'id': ['100', '100', '100', '101', '101', '102'],
... 'score' : [0.5, 0.3, 0.2, 0.1, 0.2, 0.9]
... })
>>> beta = partial(scipy.stats.beta.fit, floc=0, fscale=1)
>>> df.groupby('id')['score'].apply(beta)
id
100 (4.82261025047374, 9.616623800842953, 0, 1)
101 (0.7079910251948778, 0.910200073771759, 0, 1)
Name: score, dtype: object
Note that I have adjusted your input example, since it contains an incorrect value (1.0), and too few values for the fit to succeed in some cases.
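If you only want the (alpha, beta) pair per id, as in your desired '101': (1.5, 1.8) dictionary, you could slice off the first two fit parameters. A sketch building on the partial approach above (the two-value group is an assumption that the fit converges on small samples, as it did in the output shown):

```python
from functools import partial

import pandas as pd
import scipy.stats

df = pd.DataFrame({
    'id': ['100', '100', '100', '101', '101'],
    'score': [0.5, 0.3, 0.2, 0.1, 0.2],
})

# Fix loc=0 and scale=1 so fit returns (a, b, 0, 1)
beta_fit = partial(scipy.stats.beta.fit, floc=0, fscale=1)

# Keep only the first two parameters (alpha, beta) per group
params = {k: v[:2] for k, v in
          df.groupby('id')['score'].apply(beta_fit).items()}
print(params)
```

Note this still loops over groups internally; the per-group fit itself is the expensive part, so there is little to gain over an explicit loop beyond tidier code.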
I want to convert a Pandas DataFrame into separate dicts, where the names of the dicts are the column names and all dicts have the same index.
The dataframe looks like this:
cBmsExp cCncC cDnsWd
PlantName
A.gre 2.5 0.45 896.8
A.rig 2.5 0.40 974.9
A.tex 3.5 0.45 863.1
the result should be:
cBmsExp = {"A.gre":2.5, "A.rig": 2.5, "A.tex": 3.5}
cCncC = {"A.gre":0.45, "A.rig": 0.4, "A.tex": 0.45}
cDnsWd = {"A.gre": 896.8, "A.rig": 974.9, "A.tex": 863.1}
I can't figure out how a column name can become the name of a variable in my Python code.
I went through piles of stack overflow questions and answers, but I didn't find this type of problem among them.
Suggestions for code are very much appreciated!
Creating variables from column names is not recommended; it is better to create a dict of dicts and select by keys:
d = df.to_dict()
print (d)
{'cBmsExp': {'A.gre': 2.5, 'A.rig': 2.5, 'A.tex': 3.5},
'cCncC': {'A.gre': 0.45, 'A.rig': 0.4, 'A.tex': 0.45},
'cDnsWd': {'A.gre': 896.8, 'A.rig': 974.9, 'A.tex': 863.1}}
print (d['cBmsExp'])
{'A.gre': 2.5, 'A.rig': 2.5, 'A.tex': 3.5}
But possible, e.g. by globals:
for k, v in d.items():
globals()[k] = v
print (cBmsExp)
{'A.gre': 2.5, 'A.rig': 2.5, 'A.tex': 3.5}
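If you only need a few columns, another option that avoids globals entirely is to call to_dict() on each column's Series, assigning to ordinary variables yourself. A sketch reconstructing the dataframe from the question:

```python
import pandas as pd

df = pd.DataFrame(
    {"cBmsExp": [2.5, 2.5, 3.5],
     "cCncC": [0.45, 0.40, 0.45],
     "cDnsWd": [896.8, 974.9, 863.1]},
    index=["A.gre", "A.rig", "A.tex"],
)
df.index.name = "PlantName"

# Each column is a Series; to_dict() maps index labels to values
cBmsExp = df["cBmsExp"].to_dict()
print(cBmsExp)
```

This keeps the variable names explicit in your code, so readers (and linters) can see where they come from.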
I have a dataframe and want to convert it to a list of dictionaries. I use read_csv() to create this dataframe. The dataframe looks like the following:
AccountName AccountType StockName Allocation
0 MN001 #1 ABC 0.4
1 MN001 #1 ABD 0.6
2 MN002 #2 EFG 0.5
3 MN002 #2 HIJ 0.4
4 MN002 #2 LMN 0.1
The desired output:
[{'ABC':0.4, 'ABD':0.6}, {'EFG':0.5, 'HIJ':0.4,'LMN':0.1}]
I have tried researching similar topics and used the DataFrame.to_dict() function. I look forward to getting this done. Many thanks for your help!
import pandas as pd
import numpy as np
d = np.array([['MN001','#1','ABC', 0.4],
['MN001','#1','ABD', 0.6],
['MN002', '#2', 'EFG', 0.5],
['MN002', '#2', 'HIJ', 0.4],
['MN002', '#2', 'LMN', 0.1]])
df = pd.DataFrame(data=d, columns = ['AccountName','AccountType','StockName', 'Allocation'])
by_account_df = df.groupby('AccountName').apply(lambda x: dict(zip(x['StockName'], x['Allocation']))).reset_index(name='dic')
by_account_lst = by_account_df['dic'].values.tolist()
And the result is (note the allocations come out as strings, because np.array upcasts the mixed-type rows to a common string dtype):
print(by_account_lst)
[{'ABC': '0.4', 'ABD': '0.6'}, {'EFG': '0.5', 'HIJ': '0.4', 'LMN': '0.1'}]
This should do it:
portfolios = []
for _, account in df.groupby('AccountName'):
portfolio = {stock['StockName']: stock['Allocation']
for _, stock in account.iterrows()}
portfolios.append(portfolio)
First use the groupby() function to group the rows of the dataframe by AccountName. To access the individual rows (stocks) for each account, you use the iterrows() method. As user #ebb-earl-co explained in the comments, the _ is there as a placeholder variable, because iterrows() returns (index, Series) tuples, and we only need the Series (the rows themselves). From there, use a dict comprehension to create a dictionary mapping StockName -> Allocation for each stock. Finally, append that dictionary to the list of portfolios, resulting in the expected output:
[{'ABC': 0.4, 'ABD': 0.6}, {'EFG': 0.5, 'HIJ': 0.4, 'LMN': 0.1}]
One more thing: if you decide later that you want to label each dict in the portfolios with the account name, you could do it like this:
portfolios = []
for acct_name, account in df.groupby('AccountName'):
portfolio = {stock['StockName']: stock['Allocation']
for _, stock in account.iterrows()}
portfolios.append({acct_name: portfolio})
This will return a list of nested dicts like this:
[{'MN001': {'ABC': 0.4, 'ABD': 0.6}},
{'MN002': {'EFG': 0.5, 'HIJ': 0.4, 'LMN': 0.1}}]
Note that in this case, I used the variable acct_name instead of assigning to _ because we actually will use the index to "label" the dicts in the portfolios list.
I am currently working on an assignment where I need to convert a nested list to a dictionary, separating out the codes from the nested list below.
data = [['ABC', "Tel", "12/07/2017", 1.5, 1000],['ACE', "S&P", "12/08/2017", 3.2, 2000],['AEB', "ENG", "04/03/2017", 1.4, 3000]]
to get this
Code Name Purchase Date Price Volume
ABC Tel 12/07/2017 1.5 1000
ACE S&P 12/08/2017 3.2 2000
AEB ENG 04/03/2017 1.4 3000
so that the remaining values are still in a list, but tagged to the codes as keys.
Could anyone advise on this please? Thank you!
You can use a dict comprehension:
keys = ['Code','Name','Purchase Date','Price','Volume']
{k: v for k, *v in zip(keys, *data)}
Result:
{'Code': ['ABC', 'ACE', 'AEB'],
'Name': ['Tel', 'S&P', 'ENG'],
'Purchase Date': ['12/07/2017', '12/08/2017', '04/03/2017'],
'Price': [1.5, 3.2, 1.4],
'Volume': [1000, 2000, 3000]}
You can use pandas dataframe for that:
import pandas as pd
data = [['ABC', "Tel", "12/07/2017", 1.5, 1000],['ACE', "S&P", "12/08/2017", 3.2, 2000],['AEB', "ENG", "04/03/2017", 1.4, 3000]]
columns = ["Code","Name","Purchase Date","Price","Volume"]
df = pd.DataFrame(data, columns=columns)
print(df)
I assume that by dictionaries you mean a list of dictionaries, each representing a row with the header as its keys.
You can do that like this:
keys = ['Code','Name','Purchase Date','Price','Volume']
dictionaries = [ dict(zip(keys,row)) for row in data ]
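If you instead want the codes themselves as the keys, with the remaining values of each row kept in a list (as the question describes), a dict comprehension with star-unpacking over the rows does it:

```python
data = [['ABC', "Tel", "12/07/2017", 1.5, 1000],
        ['ACE', "S&P", "12/08/2017", 3.2, 2000],
        ['AEB', "ENG", "04/03/2017", 1.4, 3000]]

# The first element of each row becomes the key; *rest collects
# the remaining values into a list
by_code = {code: rest for code, *rest in data}
print(by_code)
```

Note this silently keeps only the last row if two rows share a code, since dict keys must be unique.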