I am stuck on one problem and cannot move forward; I would appreciate some help. My input Excel file is in this format:
Name    usn          Sub      marks
dhdn    1bm15mca13   c        90
                     java     95
                     python   98
subbu   1bm15mca13   java     92
                     perl     91
paddu   1bm15mca13   c#       80
                     java     81
I am trying to get the expected output in this format (a list of dicts):
d = [
    {
        "name": "dhdn",
        "usn": "1bm15mca13",
        "sub": ["c", "java", "python"],
        "marks": [90, 95, 98]
    },
    {
        "name": "subbu",
        "usn": "1bm15mca14",
        "sub": ["java", "perl"],
        "marks": [92, 91]
    },
    {
        "name": "paddu",
        "usn": "1bm15mca17",
        "sub": ["c#", "java"],
        "marks": [80, 81]
    }
]
I tried the following code, but it only works for two columns:
import pandas as pd
existing_excel_file = 'test.xls'
df_service = pd.read_excel(existing_excel_file, sheet_name='Sheet1')
df_service = df_service.fillna(method='ffill')
result = [{'name':k,'sub':g["sub"].tolist()} for k,g in df_service.groupby("name")]
print (result)
Please suggest an idea or approach to solve this.
Forward-fill the merged name/usn cells, then group on both columns so that each (name, usn) pair becomes one dictionary:
import pandas as pd
from pprint import pprint

existing_excel_file = 'test.xls'
df_service = pd.read_excel(existing_excel_file, sheet_name='Sheet1')
# fill the blank cells left by the merged name/usn columns
df_service = df_service.ffill()
# k is the (name, usn) tuple, v is the group of rows for that pair
# (column names assumed lowercase, as in your own code)
result = [{'name': k[0], 'usn': k[1], 'sub': v["sub"].tolist(), "marks": v["marks"].tolist()}
          for k, v in df_service.groupby(['name', 'usn'])]
pprint(result)
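An equivalent, slightly more compact variant (just a sketch, assuming the same lowercase column names) aggregates the two columns into lists per group and then converts the frame to records:
# aggregate sub and marks into lists per (name, usn), keeping the keys as columns
grouped = (df_service.groupby(['name', 'usn'], as_index=False)
                     .agg({'sub': list, 'marks': list}))
result = grouped.to_dict(orient='records')
pprint(result)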
I am new to Python. I read data from an Excel file and want to turn one of its columns into a list. The column is part of a pandas DataFrame, read from an xlsx file with the xlrd package. Any better way to solve the problem is also appreciated.
import pandas as pd
import xlrd

workbook = xlrd.open_workbook("MyData_XYZ.xlsx")
sheet1 = workbook.sheet_by_index(0)

def get_cell_range2(sheet, start_col, start_row, end_col, end_row):
    return [sheet.row_slice(row, start_colx=start_col-1, end_colx=end_col)
            for row in range(start_row-1, end_row)]

er_aaa = get_cell_range2(sheet1, 1, 2, 2, 67)
er_aaa_df = pd.DataFrame(er_aaa, columns=['date', 'aaa'])
raw_seq = list(er_aaa_df['aaa'])
I got this in Spyder
raw_seq
Out[61]:
0 number:25.405
1 number:25.427
2 number:25.411
3 number:25.423
4 number:25.45
61 number:26.054
62 number:26.09
63 number:26.103
64 number:26.1
65 number:26.03
Name: aaa, Length: 66, dtype: object
How can I turn the result into a simple list, namely,
[25.405, 25.427, 25.411, ...... 26.03]
Thank you!!
If I understand correctly, you want to read data from the xlsx file and get one of its columns. You can get it as shown below.
df = pd.read_excel("MyData_XYZ.xlsx")
aaa_list = df['aaa'].tolist()  # avoid naming the variable "list", which would shadow the built-in
Why don't you just use data = pd.read_excel("MyData_XYZ.xlsx", header=1)? See this link.
Then you can just select your values with data['number']. But be careful: this is still a pandas object (a Series), not a plain list; if you want a pure list, just do list(data['number']).
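Putting the pieces together, here is a minimal sketch, assuming the sheet has a header row and the column of interest is named 'aaa':
import pandas as pd

# read the sheet directly; pandas parses the numeric cells as floats,
# so there is no need to go through xlrd Cell objects
df = pd.read_excel("MyData_XYZ.xlsx", sheet_name=0)

# pick the column and convert it to a plain Python list
raw_seq = df['aaa'].tolist()
print(raw_seq[:5])  # e.g. [25.405, 25.427, 25.411, 25.423, 25.45]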
I have output from a REST call that I've converted to JSON.
It's a highly nested collection of dicts and lists, but I'm eventually able to convert it to dataframe as follows:
import pandas as pd
from requests import get

url = 'http://stats.oecd.org/SDMX-JSON/data/MEI_FIN/IR3TIB.GBR+USA.M/all'
params = {
    'startTime': '2008-06',
    'dimensionAtObservation': 'TimeDimension'
}
r = get(url, params=params)
x = r.json()
d = x['dataSets'][0]['series']
a = pd.DataFrame(d['0:0:0']['observations'])
b = pd.DataFrame(d['0:1:0']['observations'])
This works, apart from some manipulation needed to make the result easier to work with, and since there are multiple time series I can do a version of the same for each, but it goes without saying that it's kind of clunky.
Is there a better/cleaner way to do this?
The pandasdmx library makes this super-simple:
import pandasdmx as sdmx

df = sdmx.Request('OECD').data(
    resource_id='MEI_FIN',
    key='IR3TIB.GBR+USA.M',
    params={'startTime': '2008-06', 'dimensionAtObservation': 'TimeDimension'},
).write()
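If your pandasdmx version does not offer .write() on the result, I believe newer releases (1.x) convert messages with a top-level to_pandas() function instead; this is an assumption, so check the docs for your version. A rough sketch reusing the same request:
import pandasdmx as sdmx

oecd = sdmx.Request('OECD')
msg = oecd.data(
    resource_id='MEI_FIN',
    key='IR3TIB.GBR+USA.M',
    params={'startTime': '2008-06', 'dimensionAtObservation': 'TimeDimension'},
)
df = sdmx.to_pandas(msg)  # to_pandas() replaces .write() in newer versions (assumption)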
In the absence of any responses, here's the solution I came up with. I added a list comprehension to get each series into a dataframe, and then a transpose, since this source has the series aligned across rows instead of down columns.
import pandas as pd
from requests import get

url = 'http://stats.oecd.org/SDMX-JSON/data/MEI_FIN/IR3TIB.GBR+USA.M/all'
params = {
    'startTime': '2008-06',
    'dimensionAtObservation': 'TimeDimension'
}
r = get(url, params=params)
x = r.json()
d = x['dataSets'][0]['series']

# one row per series key, then transpose so the series run down columns
df = [pd.DataFrame(d[i]['observations']).loc[0] for i in d]
df = pd.DataFrame(df).T
I have read through many articles and posts about connecting to an API and then formatting the result into int/str, but I managed to write possibly the longest-winded way ever, and it's really ugly. Could someone show me the shortest, most efficient way to accomplish what the code below does? Basically I'm looking to print "eos" in str format and "price" as an int. Any suggestions would be greatly appreciated. Thanks!
import urllib
import json
import pandas as pd
import numpy as np
import requests
r = requests.get('https://api.coinmarketcap.com/v1/ticker/eos/')
with open('events.csv', 'w') as fd:
    fd.write(r.text)
data = pd.read_csv('events.csv', names=['Choose One'])
i = data.iloc[[6], [0]]
a = str(i)
name,price = a.split(":")
string = price[2:-1]
print(string)
It's simpler to just use pandas read_json to read the URL into a data frame; read_json will automatically assign the appropriate datatype to each column. Then use column selection to pick the 'name' and 'price_usd' columns (of course, in this case there is only one row, but the same code can be used with multiple rows).
i.e.
import pandas as pd
df = pd.read_json('https://api.coinmarketcap.com/v1/ticker/eos/')
print(df[['name', 'price_usd']].apply(
    lambda row: '{}: {:.0f}'.format(row['name'], row['price_usd']), axis=1))
Using .0f in the format statement will display the (rounded) integer part of the price_usd value, so the output will be:
0 EOS: 9
Alternatively, using the round function will round the float values, i.e.
In [34]: import pandas as pd
    ...: df = pd.read_json('https://api.coinmarketcap.com/v1/ticker/eos/')
    ...: print(df[['name', 'price_usd']].apply(
    ...:     lambda row: '{}: {}'.format(row['name'], round(row['price_usd'], 2)), axis=1))

0    EOS: 8.99
dtype: object
Simply use json.loads(r.text), or even easier, r.json() directly.
Say, right now the api returns the following data:
[
    {
        "id": "eos",
        "name": "EOS",
        "symbol": "EOS",
        "rank": "9",
        "price_usd": "9.31992",
        "price_btc": "0.00106154",
        "24h_volume_usd": "596467000.0",
        "market_cap_usd": "6034993504.0",
        "available_supply": "647537050.0",
        "total_supply": "900000000.0",
        "max_supply": "1000000000.0",
        "percent_change_1h": "1.3",
        "percent_change_24h": "-6.81",
        "percent_change_7d": "-36.4",
        "last_updated": "1517755757"
    }
]
If you use r.json(), you already get this parsed as JSON; otherwise load it with data = json.loads(r.text). Save it to a pandas DataFrame with df = pd.DataFrame(data), which then looks like the following:
In [15]: df
Out[15]:
24h_volume_usd available_supply id last_updated market_cap_usd max_supply name percent_change_1h percent_change_24h percent_change_7d price_btc price_usd rank symbol total_supply
0 596467000.0 647537050.0 eos 1517755757 6034993504.0 1000000000.0 EOS 1.3 -6.81 -36.4 0.00106154 9.31992 9 EOS 900000000.0
Access the data with pandas indexing:
In [8]: df[['name', 'price_usd']]
Out[8]:
name price_usd
0 EOS 9.29186
Or for printing:
In [18]: print df.loc[0, 'name'], ': ', df.loc[0, 'price_usd']
EOS : 9.31992
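Tying this back to the original goal (name as a str, price as an int), here is a minimal sketch, assuming the API still returns price_usd as a string:
import requests

r = requests.get('https://api.coinmarketcap.com/v1/ticker/eos/')
coin = r.json()[0]  # the API returns a list containing one dict

name = str(coin['name'])               # 'EOS'
price = int(float(coin['price_usd']))  # price_usd is a string such as '9.31992'

print(name, price)                     # EOS 9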
I'm having trouble getting this nested JSON object into a pandas dataframe using python:
{
    "count": 275,
    "calls": [
        {
            "connectedTo": "18885068980",
            "serviceName": "",
            "callGuid": "01541af0-d87c-4911-a868-f5ac573d1e31",
            "origin": "+19178558701",
            "stateChangedAt": "2016-04-15T18:21:23Z",
            "sequence": 9,
            "appletName": "ACD Sales General"
        }
    ]
}
I've tried using json_normalize and am going in circles. Any help would be very much appreciated!
I know this still uses json_normalize, but I think this is what you are trying to do.
import json
import pandas as pd
from pandas.io.json import json_normalize
from pprint import pprint

j = json.dumps(  # to create the json string
    {'count': 275,
     "calls":
         [{'connectedTo': "18885068980",
           "serviceName": "",
           "callGuid": "01541af0-d87c-4911-a868-f5ac573d1e31",
           "stateChangedAt": "2016-04-15T18:21:23Z",
           "sequence": 9,
           "appletName": "ACD Sales General"}]})

data = json.loads(j)
pprint(json_normalize(data['calls']))
which returns
appletName callGuid connectedTo \
0 ACD Sales General 01541af0-d87c-4911-a868-f5ac573d1e31 18885068980
sequence serviceName stateChangedAt
0 9 2016-04-15T18:21:23Z
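As a side note, in pandas 1.0 and later json_normalize is exposed directly on the top-level namespace, so the pandas.io.json import is no longer needed; a small sketch under that assumption:
import pandas as pd

data = {"count": 275,
        "calls": [{"connectedTo": "18885068980",
                   "callGuid": "01541af0-d87c-4911-a868-f5ac573d1e31",
                   "appletName": "ACD Sales General"}]}

# flattens the list of call dicts into one row per call
print(pd.json_normalize(data['calls']))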
I have a generated file as follows:
[{"intervals": [{"overwrites": 35588.4, "latency": 479.52}, {"overwrites": 150375.0, "latency": 441.1485001192274}], "uid": "23"}]
I simplified the file a bit for space reasons (there are more columns besides "overwrites" and "latency"). I would like to import the data into a dataframe so I can later plot the latency. I tried the following:
import os, json
import pandas as pd

with open(os.path.join(path, "my_file.json")) as json_file:
    curr_list = json.load(json_file)
df = pd.Series(curr_list[0]['intervals'])
print df
which returned:
0 {u'overwrites': 35588.4, u'latency...
1 {u'overwrites': 150375.0, u'latency...
However, I couldn't manage to store df in a data structure that would let me access the latency field as follows:
graph = df[['latency']]
graph.plot(title="latency")
Any ideas?
Thanks for the help!
I think you can use json_normalize:
import pandas as pd
from pandas.io.json import json_normalize

data = [{"intervals": [{"overwrites": 35588.4, "latency": 479.52},
                       {"overwrites": 150375.0, "latency": 441.1485001192274}],
         "uid": "23"}]

result = json_normalize(data, 'intervals', ['uid'])
print result
latency overwrites uid
0 479.5200 35588.4 23
1 441.1485 150375.0 23
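From there, drawing the latency (the original goal) is just a matter of plotting that column; a minimal sketch, assuming matplotlib is installed:
import matplotlib.pyplot as plt

# result['latency'] is numeric after json_normalize, so DataFrame.plot works directly
result.plot(y='latency', title='latency')
plt.show()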