Pull date and numbers from json and append them to pandas dataframe - python

I want to pull the date and the Powerball numbers and append them to a pandas dataframe. I have made the columns, but I can't seem to get the data into them. When I go to https://jsonparser.org/ and put in the URL, I can see the data. But when I try to select a field by number, e.g. ['8'] or ['9'], the data is not appended. I've been working on this for about 3 days. Thanks in advance.
###########
# MODULES #
###########
import json
import requests
import urllib
import pandas as pd
###########
# HISTORY #
###########
#We need to pull the data from the website.
#Then we need to organize the numbers based off of position.
#Basically it will be several lists of numbers
URL = "https://data.ny.gov/api/views/d6yy-54nr/rows"
r = requests.get(URL)
my_json = r.json()
#my_json_dumps = json.dumps(my_json, indent = 2)
#print(my_json_dumps)
df = pd.DataFrame(columns=["Date","Powerball Numbers","1","2","3","4","5","6","7"])#Create columns in df
for item in my_json['data']:
    df = pd.DataFrame(my_json['data'])
    l_date = df.iloc['8']#Trying to pull columns from json
    p_num = (df.iloc['9'])#Trying to pull columns from json
    df = df.append({"Date": l_date,
                    "Powerball Numbers": p_num,
                    },ignore_index=True)
    #test = item['id']
    print(l_date)
EDIT: This is what I am trying to get.

Try with this:
Old code:
l_date = df.iloc['8'] #Trying to pull columns from json
p_num = (df.iloc['9']) #Trying to pull columns from json
Replace these lines, removing the quotation marks from df.iloc['8']:
l_date = df.iloc[8] # without quotation
p_num = (df.iloc[9]) # without quotation
It works fine. The result is:
0 row-w3y7.4r9a-caat
1 00000000-0000-0000-3711-B495A26E6E8B
2 0
3 1619710755
4 None
5 1619710755
6 None
7 { }
8 2020-10-24T00:00:00
9 18 20 27 45 65 06
10 2
Name: 8, dtype: object
0 row-w3y7.4r9a-caat
1 00000000-0000-0000-3711-B495A26E6E8B
2 0
3 1619710755
4 None
5 1619710755
6 None
7 { }
8 2020-10-24T00:00:00
9 18 20 27 45 65 06
10 2
I hope you find it useful.
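Note that df.iloc[8] returns row 8 of the frame, not field 8 of every row, which is why the printed Series is named 8. If the goal is the date and numbers for each draw, a minimal sketch along these lines may be closer to what the question asks for. It assumes each record is a list whose positions 8 and 9 hold the date and the numbers, as in the output above; the second row is made up for illustration.

```python
import pandas as pd

# Rows shaped like the record printed above (11 fields per record).
rows = [
    ["row-w3y7.4r9a-caat", "00000000-0000-0000-3711-B495A26E6E8B", 0,
     1619710755, None, 1619710755, None, "{ }",
     "2020-10-24T00:00:00", "18 20 27 45 65 06", "2"],
    # Hypothetical second record, invented only to show more than one row
    ["row-example", "00000000-0000-0000-0000-000000000000", 0,
     1619710755, None, 1619710755, None, "{ }",
     "2020-10-28T00:00:00", "11 28 37 40 53 13", "2"],
]
raw = pd.DataFrame(rows)

# Select columns 8 and 9 for every row
df = raw.iloc[:, [8, 9]].copy()
df.columns = ["Date", "Powerball Numbers"]

# Split the space-separated numbers into six integer columns "1".."6"
numbers = df["Powerball Numbers"].str.split(expand=True).astype(int)
numbers.columns = [str(i) for i in range(1, 7)]
df = pd.concat([df, numbers], axis=1)
print(df)
```

With the real response, raw would presumably be built from my_json['data'] instead of the hand-written rows.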

This is how I did it:
import requests
import pandas as pd

URL = "https://data.ny.gov/resource/d6yy-54nr.json"
r = requests.get(URL)
my_json = r.json()
rows = []  # one dict per draw
for item in my_json:
    l_date = item['draw_date']  # draw date
    p_num = item['winning_numbers']  # space-separated numbers
    n_list = list(map(int, p_num.split()))  # Convert powerball numbers into list
    rows.append({"Date": l_date,
                 "Powerball Numbers": p_num,
                 "1": n_list[0],
                 "2": n_list[1],
                 "3": n_list[2],
                 "4": n_list[3],
                 "5": n_list[4],
                 "6": n_list[5],
                 })
# DataFrame.append was removed in pandas 2.0, so build the frame once from the collected rows
df = pd.DataFrame(rows, columns=["Date","Powerball Numbers", "1","2","3","4","5","6"])
print(df)
Which produced


How do I create

I have a text file that needs to be read line by line and converted into a data frame with the 4 following columns
import re
import pandas as pd
with open('/Users/Desktop/Final Semester Fall 2022/archive/combined_data_1.txt',encoding='latin-1') as f:
    for line in f:
        result = re.search(r"^(\d+),(\d+),(\d{4}-\d{2}-\d{2})/gm", line)
        if re.search(r"(^\d+):", line) is not None:
            movie_id = re.search(r"(^\d+):", line).group(1)
        elif result:
            customerid = result.group(1)
            rating = result.group(2)
            date = result.group(3)
        else:
            continue
    data_list = [customerid, rating, date, movie_id]
    df1 = pd.DataFrame(data_list)
    df1.to_csv(r'/Users/Desktop/Final Semester Fall 2022/archive/combineddata1.csv')
I'm getting the following error:
How do I fix this error???
Thanks in advance!!
Here is one way to do it:
# read the file using read_csv, with ":" as the separator
# since there is only one colon ":" per movie, you end up with a row for the movie followed by rows for the rest of the data
df=pd.read_csv(r'c:\csv.csv', sep=':', header=None, names=['col1', 'col2'])
# a row without a comma holds only a movie id,
# so we populate the MovieId column from it and forward-fill for all rows
df['MovieId'] = df['col1'].mask(df['col1'].str.contains(',')).ffill()
# split the data into CustomerID, Rating and Date
df[['CustomerID','Rating','Date']] = df['col1'].str.split(',',expand=True)
# drop the unwanted columns and rows
df2=df[df['col1'].ne(df['MovieId'])].drop(columns=['col1','col2'])
df2
# sample created from the data you shared above as image
MovieId CustomerID Rating Date
1 1 1488844 3 2005-09-06
2 1 822109 5 2005-05-13
3 1 885013 4 2005-10-19
4 1 30878 4 2005-12-26
5 1 823519 3 2004-05-03
6 1 893988 3 2005-11-17
7 1 124105 4 2004-08-05
8 1 1248629 3 2004-04-22
9 1 1842128 4 2004-05-09
10 1 2238063 3 2005-05-11
11 1 1503895 4 2005-05-19
13 2 1288844 3 2005-09-06
14 2 832109 5 2005-05-13
You can parse that structure quite easily (without regex, using a few lines of very readable vanilla Python) and build a dictionary while reading the data file. You can then convert the dictionary to a DataFrame in one go.
import pandas as pd
df = {'MovieID':[], 'CustomerID':[], 'Rating':[], 'Date':[]}
with open('data.txt', 'r') as f:
    for line in f:
        line = line.strip()
        if line: #skip empty lines
            if line.endswith(':'): #MovieID
                movie_id = line[:-1]
            else:
                customer_id, rating, date = line.split(',')
                df['MovieID'].append(movie_id)
                df['CustomerID'].append(customer_id)
                df['Rating'].append(rating)
                df['Date'].append(date)
df = pd.DataFrame(df)
print(df)
MovieID CustomerID Rating Date
0 1 1488844 3 2005-09-06
1 1 822109 5 2005-05-13
2 1 885013 4 2005-10-19
3 1 30878 4 2005-12-26
4 2 823519 3 2004-05-03
5 2 893988 3 2005-11-17
6 2 124105 4 2004-08-05
7 2 1248629 3 2004-04-22
8 2 1842128 4 2004-05-09
9 3 2238063 3 2005-05-11
10 3 1503895 4 2005-05-19
11 3 1288844 3 2005-09-06
12 3 832109 5 2005-05-13
It hardly gets easier than this.
An error in a regular expression
You've got the NameError because of /gm in the regular expression you use to identify result.
I suppose that /gm was copied here by mistake. In other languages these could be the GLOBAL and MULTILINE match modifiers, which, by the way, are not needed in this case. But in the Python re module they are just three literal characters. Since no line contains /gm, your result was always None, so the elif result: ... block was never executed and the variables customerid, rating, date were never initialized.
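The effect is easy to check in isolation. The sketch below uses one data line from the sample file and compares the pattern with and without the trailing /gm. The last statement shows how a multiline match would be requested in Python, through the flags argument; that part is an illustration and is not needed for this file.

```python
import re

line = "1488844,3,2005-09-06"

# With the literal "/gm" suffix the pattern can never match this line
with_suffix = re.search(r"^(\d+),(\d+),(\d{4}-\d{2}-\d{2})/gm", line)

# Without it, the three groups are captured as expected
without_suffix = re.search(r"^(\d+),(\d+),(\d{4}-\d{2}-\d{2})", line)

print(with_suffix)
print(without_suffix.groups())

# In Python, modifiers are passed as flags rather than appended to the pattern
movie_ids = re.findall(r"^(\d+):", "1:\n2:\n", flags=re.MULTILINE)
print(movie_ids)
```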
An error in working with variables
If you remove /gm from the first matching, you'll have another problem: the variables customerid, rating, date, movie_id are just strings, so the resulting data frame will reflect only the last record of the source file.
To avoid this we have to work with them as with a list-like structure. For example, in the code below, they are keys in the data dictionary, each referring to a separate list:
file_name = ...
data = {'movie_id': [], 'customerid': [], 'rating': [], 'date': []}
with open(file_name, encoding='latin-1') as f:
    for line in f:
        result = re.search(r"^(\d+),(\d+),(\d{4}-\d{2}-\d{2})", line)
        if re.search(r"(^\d+):", line) is not None:
            movie_id = re.search(r"(^\d+):", line).group(1)
        elif result:
            data['movie_id'].append(movie_id)
            data['customerid'].append(result.group(1))
            data['rating'].append(result.group(2))
            data['date'].append(result.group(3))
        else:
            continue
df = pd.DataFrame(data)
Code with test data
import re
import pandas as pd
data = '''\
1:
1488844,3,2005-09-06
822109,5,2005-05-13
885013,4,2005-10-19
30878,4,2005-12-26
2:
823519,3,2004-05-03
893988,3,2005-11-17
124105,4,2004-08-05
1248629,3,2004-04-22
1842128,4,2004-05-09
3:
2238063,3,2005-05-11
1503895,4,2005-05-19
1288844,3,2005-09-06
832109,5,2005-05-13
'''
file_name = "data.txt"
with open(file_name, 'tw', encoding='latin-1') as f:
    f.write(data)
data = {'movie_id': [], 'customerid': [], 'rating': [], 'date': []}
with open(file_name, encoding='latin-1') as f:
    for line in f:
        result = re.search(r"^(\d+),(\d+),(\d{4}-\d{2}-\d{2})", line)
        if re.search(r"(^\d+):", line) is not None:
            movie_id = re.search(r"(^\d+):", line).group(1)
        elif result:
            data['movie_id'].append(movie_id)
            data['customerid'].append(result.group(1))
            data['rating'].append(result.group(2))
            data['date'].append(result.group(3))
        else:
            continue
df = pd.DataFrame(data)
df.to_csv(file_name[:-3] + 'csv', index=False)
An alternative
df = pd.read_csv(file_name, names = ['customerid', 'rating', 'date'])
df.insert(0, 'movie_id', pd.NA)
isnot_movie_id = ~df['customerid'].str.endswith(':')
df['movie_id'] = df['customerid'].mask(isnot_movie_id).ffill().str[:-1]
df = df.dropna().reset_index(drop=True)

Extracting and aggregating data out of filenames in python or pandas

I have these four lists, which are the filenames of images and the filenames are in the format:
(disease)-(randomized patient ID)-(image number by this patient)
A single patient can have multiple images per disease.
See these slices below:
print(train_cnv_list[0:3])
print(train_dme_list[0:3])
print(train_drusen_list[0:3])
print(train_normal_list[0:3])
>>>
['CNV-9911627-77.jpeg', 'CNV-9935363-45.jpeg', 'CNV-9911627-94.jpeg']
['DME-8889850-2.jpeg', 'DME-8773471-3.jpeg', 'DME-8797076-11.jpeg']
['DRUSEN-8986660-50.jpeg', 'DRUSEN-9100857-3.jpeg', 'DRUSEN-9025088-5.jpeg']
['NORMAL-9490249-31.jpeg', 'NORMAL-9509694-5.jpeg', 'NORMAL-9504376-3.jpeg']
I'd like to figure out:
How many images are there per patient / per list?
Is there any overlap in the "randomized patient ID" across the four lists? If so, can I aggregate that into some kind of report (patient, disease, number of images) using something like groupby?
patient - disease1 - total number of images
- disease2 - total number of images
- disease3 - total number of images
where total number of images is a max(image number by this patient)
I did see that this yields a patient id:
train_cnv_list[0][4:11]
>>> 9911627
Thanks, in advance, for any guidance.
You can do it easily with Pandas:
import pandas as pd
cnv_list=['CNV-9911627-77.jpeg', 'CNV-9935363-45.jpeg', 'CNV-9911627-94.jpeg']
dme_list=['DME-8889850-2.jpeg', 'DME-8773471-3.jpeg', 'DME-8797076-11.jpeg']
dru_list=['DRUSEN-8986660-50.jpeg', 'DRUSEN-9100857-3.jpeg', 'DRUSEN-9025088-5.jpeg']
nor_list=['NORMAL-9490249-31.jpeg', 'NORMAL-9509694-5.jpeg', 'NORMAL-9504376-3.jpeg']
data =[]
data.extend(cnv_list)
data.extend(dme_list)
data.extend(dru_list)
data.extend(nor_list)
df = pd.DataFrame(data, columns=["files"])
# regex=False so that the dot in '.jpeg' is treated as a literal character
df["files"]=df["files"].str.replace('.jpeg', '', regex=False)
df=df["files"].str.split('-', expand=True).rename(columns={0:"disease",1:"PatientID",2:"pictureName"})
res = df.groupby(['PatientID','disease']).apply(lambda x: x['pictureName'].count())
print(res)
Result:
PatientID disease
8773471 DME 1
8797076 DME 1
8889850 DME 1
8986660 DRUSEN 1
9025088 DRUSEN 1
9100857 DRUSEN 1
9490249 NORMAL 1
9504376 NORMAL 1
9509694 NORMAL 1
9911627 CNV 2
9935363 CNV 1
You can do even more now that you have a DataFrame.
Here are a few functions that might get you on the right track, but as @rick-supports-monica mentioned, this is a great use case for pandas. You'll have an easier time manipulating data.
def contains_duplicate_ids(img_list):
    patient_ids = []
    for image in img_list:
        patient_id = image.split('.')[0].split('-')[1]
        patient_ids.append(patient_id)
    if len(set(patient_ids)) == len(patient_ids):
        return False
    return True
def get_duplicates(img_list):
    patient_ids = []
    duplicates = []
    for image in img_list:
        patient_id = image.split('.')[0].split('-')[1]
        if patient_id in patient_ids:
            duplicates.append(patient_id)
        patient_ids.append(patient_id)
    return duplicates
def count_images(img_list):
    return len(set(img_list))
From get_duplicates you can use the patient IDs returned to lookup whatever you want from there. I'm not sure I completely understand the structure of the lists. It looks like {disease}-{patient_id}-{some_other_int}.jpg. I'm not sure how to add additional lookups to the functionality without understanding the input a bit more.
I mentioned pandas but didn't show how to use it. Here's one way you could get your existing data into a dataframe:
import pandas as pd
# Sample data
train_cnv_list = ['CNV-9911627-77.jpeg', 'CNV-9935363-45.jpeg', 'CNV-9911628-94.jpeg', 'CNM-9911629-94.jpeg']
train_dme_list = ['DME-8889850-2.jpeg', 'DME-8773471-3.jpeg', 'DME-8797076-11.jpeg']
train_drusen_list = ['DRUSEN-8986660-50.jpeg', 'DRUSEN-9100857-3.jpeg', 'DRUSEN-9025088-5.jpeg']
train_normal_list = ['NORMAL-9490249-31.jpeg', 'NORMAL-9509694-5.jpeg', 'NORMAL-9504376-3.jpeg']
# Convert list to dataframe
def dataframe_from_list(img_list):
    df = pd.DataFrame(img_list, columns=['filename'])
    df['disease'] = [filename.split('.')[0].split('-')[0] for filename in img_list]
    df['patient_id'] = [filename.split('.')[0].split('-')[1] for filename in img_list]
    df['some_other_int'] = [filename.split('.')[0].split('-')[2] for filename in img_list]
    return df
# Generate a dataframe for each list
cnv_df = dataframe_from_list(train_cnv_list)
dme_df = dataframe_from_list(train_dme_list)
drusen_df = dataframe_from_list(train_drusen_list)
normal_df = dataframe_from_list(train_normal_list)
# or combine them into one long dataframe
df = pd.concat([cnv_df, dme_df, drusen_df, normal_df], ignore_index=True)
Start by creating a well-defined data structure, then use Counter to answer your first question.
from typing import NamedTuple
from collections import Counter,defaultdict
class FileInfo(NamedTuple):
    disease: str
    patient_id: str
    image_id: str
l1 = ['CNV-9911627-77.jpeg', 'CNV-9935363-45.jpeg', 'CNV-9911627-94.jpeg']
l2 = ['DME-8889850-2.jpeg', 'DME-8773471-3.jpeg', 'DME-8797076-11.jpeg']
l3 = ['DRUSEN-8986660-50.jpeg', 'DRUSEN-9100857-3.jpeg', 'DRUSEN-9025088-5.jpeg']
l4 = ['NORMAL-9490249-31.jpeg', 'NORMAL-9509694-5.jpeg', 'NORMAL-9504376-3.jpeg']
lists = [l1,l2,l3,l4]
data_lists = []
for l in lists:
    data_lists.append([FileInfo(*f[:-5].split('-')) for f in l])
counters = []
for l in data_lists:
    counters.append(Counter(fi.patient_id for fi in l))
print(counters)
print('-----------')
cross_lists_data = dict()
for l in data_lists:
    for file_info in l:
        if file_info.patient_id not in cross_lists_data:
            cross_lists_data[file_info.patient_id] = defaultdict(int)
        cross_lists_data[file_info.patient_id][file_info.disease] += 1
print(cross_lists_data)
Start by concatenating your data
import pandas as pd
import numpy as np
train_cnv_list = ['CNV-9911627-77.jpeg', 'CNV-9935363-45.jpeg', 'CNV-9911627-94.jpeg']
train_dme_list = ['DME-8889850-2.jpeg', 'DME-8773471-3.jpeg', 'DME-8797076-11.jpeg']
train_drusen_list = ['DRUSEN-8986660-50.jpeg', 'DRUSEN-9100857-3.jpeg', 'DRUSEN-9025088-5.jpeg']
train_normal_list = ['NORMAL-9490249-31.jpeg', 'NORMAL-9509694-5.jpeg', 'NORMAL-9504376-3.jpeg']
train_data = np.array([
    train_cnv_list,
    train_dme_list,
    train_drusen_list,
    train_normal_list
])
Create a Series with the flattened array
>>> train = pd.Series(train_data.flat)
>>> train
0 CNV-9911627-77.jpeg
1 CNV-9935363-45.jpeg
2 CNV-9911627-94.jpeg
3 DME-8889850-2.jpeg
4 DME-8773471-3.jpeg
5 DME-8797076-11.jpeg
6 DRUSEN-8986660-50.jpeg
7 DRUSEN-9100857-3.jpeg
8 DRUSEN-9025088-5.jpeg
9 NORMAL-9490249-31.jpeg
10 NORMAL-9509694-5.jpeg
11 NORMAL-9504376-3.jpeg
dtype: object
Use Series.str.extract together with regex to extract the information from the filenames and separate it into different columns
>>> pat = r'(?P<Disease>\w+)-(?P<Patient_ID>\d+)-(?P<IMG_ID>\d+)\.jpeg'
>>> train = train.str.extract(pat)
>>> train
Disease Patient_ID IMG_ID
0 CNV 9911627 77
1 CNV 9935363 45
2 CNV 9911627 94
3 DME 8889850 2
4 DME 8773471 3
5 DME 8797076 11
6 DRUSEN 8986660 50
7 DRUSEN 9100857 3
8 DRUSEN 9025088 5
9 NORMAL 9490249 31
10 NORMAL 9509694 5
11 NORMAL 9504376 3
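Finally, aggregate the data and compute the total number of images per group based on the maximum IMG_ID number. Convert IMG_ID to integers first: str.extract returns strings, and the maximum of strings is lexicographic (for example, '9' would rank above '11').
>>> train["IMG_ID"] = train["IMG_ID"].astype(int)
>>> report = train.groupby(["Patient_ID","Disease"])['IMG_ID'].agg(Total_IMGs="max")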
>>> report
Total_IMGs
Patient_ID Disease
8773471 DME 3
8797076 DME 11
8889850 DME 2
8986660 DRUSEN 50
9025088 DRUSEN 5
9100857 DRUSEN 3
9490249 NORMAL 31
9504376 NORMAL 3
9509694 NORMAL 5
9911627 CNV 94
9935363 CNV 45
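The second part of the question, whether a patient ID appears under more than one disease, can be checked on the same kind of frame. The following is a rough sketch, not taken from any of the answers: it counts distinct diseases per patient with nunique. The sample filenames in the question contain no overlap, so one hypothetical file (marked in the code) is added to make the result non-empty.

```python
import pandas as pd

files = [
    "CNV-9911627-77.jpeg", "CNV-9935363-45.jpeg", "CNV-9911627-94.jpeg",
    "DME-8889850-2.jpeg", "DME-8773471-3.jpeg",
    "DME-9911627-4.jpeg",  # hypothetical file, added only to create an overlap
]

pat = r"(?P<Disease>\w+)-(?P<Patient_ID>\d+)-(?P<IMG_ID>\d+)\.jpeg"
df = pd.Series(files).str.extract(pat)
df["IMG_ID"] = df["IMG_ID"].astype(int)

# Number of files actually present per patient and disease
counts = df.groupby(["Patient_ID", "Disease"]).size()

# Patients that appear under more than one disease
diseases_per_patient = df.groupby("Patient_ID")["Disease"].nunique()
overlap = diseases_per_patient[diseases_per_patient > 1].index.tolist()

print(counts)
print(overlap)
```

Note that size() counts the files that are present, while the max(IMG_ID) approach above reports the highest image number; the two differ whenever some images of a patient are missing from the lists.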

I can't correctly visualize a json dataframe from api

I am currently trying to read some data from a public API. It offers several output formats (JSON, CSV, TXT, among others); you just change the last segment of the URL (/json, /csv, /txt, ...). The URLs look like this:
https://estadisticas.bcrp.gob.pe/estadisticas/series/api/PN01210PM/csv/
https://estadisticas.bcrp.gob.pe/estadisticas/series/api/PN01210PM/json/
...
My problem is that when trying to import into the Pandas dataframe it doesn't read the data correctly. I am trying the following alternatives:
import pandas as pd
import requests
url = 'https://estadisticas.bcrp.gob.pe/estadisticas/series/api/PN01210PM/json/'
r = requests.get(url)
rjson = r.json()
df = pd.json_normalize(rjson)
df['periods']
I also tried reading the data in CSV format:
import pandas as pd
import requests
url = 'https://estadisticas.bcrp.gob.pe/estadisticas/series/api/PN01210PM/csv/'
collisions = pd.read_csv(url, sep='<br>')
collisions.head()
But I don't get good results; the dataframe cannot be visualized correctly since the 'periods' column is grouped with all the values ...
the output is displayed as follows:
all data appears as columns: /
Here is an example of how the data is displayed correctly:
What alternative do you recommend trying?
Thank you in advance for your time and help !!
I will be attentive to your answers, regards!
For csv you can use StringIO from io package
In [20]: import requests
In [21]: res = requests.get("https://estadisticas.bcrp.gob.pe/estadisticas/series/api/PN01210PM/csv/")
In [22]: import pandas as pd
In [23]: import io
In [24]: df = pd.read_csv(io.StringIO(res.text.strip().replace("<br>","\n")), engine='python')
In [25]: df
Out[25]:
Mes/Año Tipo de cambio - promedio del periodo (S/ por US$) - Bancario - Promedio
0 Jul.2018 3.276595
1 Ago.2018 3.288071
2 Sep.2018 3.311325
3 Oct.2018 3.333909
4 Nov.2018 3.374675
5 Dic.2018 3.364026
6 Ene.2019 3.343864
7 Feb.2019 3.321475
8 Mar.2019 3.304690
9 Abr.2019 3.303825
10 May.2019 3.332364
11 Jun.2019 3.325650
12 Jul.2019 3.290214
13 Ago.2019 3.377560
14 Sep.2019 3.357357
15 Oct.2019 3.359762
16 Nov.2019 3.371700
17 Dic.2019 3.355190
18 Ene.2020 3.327364
19 Feb.2020 3.390350
20 Mar.2020 3.491364
21 Abr.2020 3.397500
22 May.2020 3.421150
23 Jun.2020 3.470167
Sorry, I couldn't find the link about reading JSON with multiple objects inside it. The point is that json.load/json.loads can't be used for this kind of format, so you have to use raw_decode() instead.
This code should work:
import pandas as pd
import json
import urllib.request as ur
from pprint import pprint
d = json.JSONDecoder()
url = 'https://estadisticas.bcrp.gob.pe/estadisticas/series/api/PN01210PM/json/'
#reading and transforming json into list of dictionaries
data = []
with ur.urlopen(url) as json_file:
    x = json_file.read().decode() # decode to convert bytes string into normal string
    while True:
        try:
            j, n = d.raw_decode(x)
        except ValueError:
            break
        #print(j)
        data.append(j)
        x = x[n:]
#pprint(data)
#creating list of dictionaries to convert into dataframe
clean_list = []
for i, d in enumerate(data[0]['periods']):
    dict_data = {
        "month_year": d['name'],
        "value": d['values'][0],
    }
    clean_list.append(dict_data)
#print(clean_list)
#pd.options.display.width = 0
df = pd.DataFrame(clean_list)
print(df)
result
month_year value
0 Jul.2018 3.27659523809524
1 Ago.2018 3.28807142857143
2 Sep.2018 3.311325
3 Oct.2018 3.33390909090909
4 Nov.2018 3.374675
5 Dic.2018 3.36402631578947
6 Ene.2019 3.34386363636364
7 Feb.2019 3.321475
8 Mar.2019 3.30469047619048
9 Abr.2019 3.303825
10 May.2019 3.33236363636364
11 Jun.2019 3.32565
12 Jul.2019 3.29021428571428
13 Ago.2019 3.37756
14 Sep.2019 3.35735714285714
15 Oct.2019 3.3597619047619
16 Nov.2019 3.3717
17 Dic.2019 3.35519047619048
18 Ene.2020 3.32736363636364
19 Feb.2020 3.39035
20 Mar.2020 3.49136363636364
21 Abr.2020 3.3975
22 May.2020 3.42115
23 Jun.2020 3.47016666666667
If I find the link again, I'll edit or comment on my answer.
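Once the payload has been decoded, the loop that builds clean_list can also be expressed with a few DataFrame operations. This is only a sketch: the payload below is a hand-written fragment that mirrors the structure the code above relies on (a 'periods' list whose items have 'name' and 'values'), and it assumes the values arrive as strings, which the printed result suggests but does not prove.

```python
import pandas as pd

# Hand-written fragment shaped like data[0] in the answer above (assumed structure)
payload = {
    "periods": [
        {"name": "Jul.2018", "values": ["3.27659523809524"]},
        {"name": "Ago.2018", "values": ["3.28807142857143"]},
    ]
}

df = pd.DataFrame(payload["periods"])

# Each 'values' entry is a one-element list; take the first item as a float
df["value"] = df["values"].apply(lambda v: float(v[0]))
df = df.drop(columns="values").rename(columns={"name": "month_year"})
print(df)
```

Converting to float here makes the column numeric, so it can be plotted or aggregated directly.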

Not able to extract data into Pandas dataframe in correct format

I am trying to extract data from an API and write it into a pandas DataFrame so that I can do some transformations.
import requests
headers = {
    'Authorization': 'Api-Key',
}
params = (
    ('locodes', 'PLWRO,DEHAM'),
)
response = requests.get('https://api.xxx.com/weather/v1/forecasts', headers=headers, params=params)
The result of the API Call
response.text
'{"results":[{"place":{"type":"locode","value":"PLWRO"},"measures":[{"ts":1571896800000,"t2m":10.72,"t_min":10.53,"t_max":11.99,"wspd":8,"dir":"SE","wgust":12,"rh2m":87,"prsmsl":1012,"skcover":"clear","precip":0.0,"snowd":0,"thunderstorm":"N","fog":"H"}]},{"place":{"type":"locode","value":"DEHAM"},"measures":[{"ts":1571896800000,"t2m":10.79,"t_min":10.3,"t_max":10.9,"wspd":13,"dir":"ESE","wgust":31,"rh2m":97,"prsmsl":1008,"skcover":"partly_cloudy","precip":0.0,"snowd":0,"thunderstorm":"N","fog":"H"}]}]}'
When I try to load it into a pandas DataFrame, it does not come out in the correct format.
import pandas as pd
import io
urlData = response.content
rawData = pd.read_csv(io.StringIO(urlData.decode('utf-8')))
Current Output
How can I get the values populated correctly under each header?
Expected format
First convert the JSON to dictionaries. Then some processing is needed: add the locode to each measure by merging the dictionaries, append the results to a list, and finally call the DataFrame constructor:
import json
d = json.loads(response.text)
out = []
for x in d['results']:
    t = x['place']['type']
    v = x['place']['value']
    for y in x['measures']:
        y = {**{t:v}, **y}
        out.append(y)
#print (out)
df = pd.DataFrame(out)
print (df)
locode ts t2m t_min t_max wspd dir wgust rh2m prsmsl \
0 PLWRO 1571896800000 10.72 10.53 11.99 8 SE 12 87 1012
1 DEHAM 1571896800000 10.79 10.30 10.90 13 ESE 31 97 1008
skcover precip snowd thunderstorm fog
0 clear 0.0 0 N H
1 partly_cloudy 0.0 0 N H
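The same flattening can probably be done with pd.json_normalize, which accepts a record_path for the nested list and a meta path for fields to carry along. The sketch below uses a shortened copy of the response from the question (only ts and t2m are kept per measure), and it hard-codes the column name locode instead of reading it from place.type.

```python
import json
import pandas as pd

# Shortened copy of the response text from the question (most measure fields omitted)
text = (
    '{"results":['
    '{"place":{"type":"locode","value":"PLWRO"},'
    '"measures":[{"ts":1571896800000,"t2m":10.72}]},'
    '{"place":{"type":"locode","value":"DEHAM"},'
    '"measures":[{"ts":1571896800000,"t2m":10.79}]}'
    ']}'
)
d = json.loads(text)

# One row per measure, with the place value carried along as metadata
df = pd.json_normalize(d["results"], record_path="measures",
                       meta=[["place", "value"]])
df = df.rename(columns={"place.value": "locode"})
print(df)
```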
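You can use the Abstract Syntax Tree module (import ast) to convert the string to a Python dictionary. This works here because the payload contains only literals that are also valid Python; JSON true, false, or null would make ast.literal_eval fail, whereas json.loads handles them. You can read more about a use case of ast at this StackOverflow post.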
In your case I would do:
import ast
import pandas as pd
response = '{"results":[{"place":{"type":"locode","value":"PLWRO"},"measures":[{"ts":1571896800000,"t2m":10.72,"t_min":10.53,"t_max":11.99,"wspd":8,"dir":"SE","wgust":12,"rh2m":87,"prsmsl":1012,"skcover":"clear","precip":0.0,"snowd":0,"thunderstorm":"N","fog":"H"}]},{"place":{"type":"locode","value":"DEHAM"},"measures":[{"ts":1571896800000,"t2m":10.79,"t_min":10.3,"t_max":10.9,"wspd":13,"dir":"ESE","wgust":31,"rh2m":97,"prsmsl":1008,"skcover":"partly_cloudy","precip":0.0,"snowd":0,"thunderstorm":"N","fog":"H"}]}]}'
# convert response to python dict
response_to_dict = ast.literal_eval(response)
# convert response_to_dict into pandas DataFrame
df = pd.DataFrame(response_to_dict['results'][0]['measures'])
Output:
|---|---|------|------|----|---|-----|------------|----|-----|----|
|dir|fog|precip|prsmsl|rh2m|...|t_min|thunderstorm|ts |wgust|wspd|
|SE |H |0.0 |1012 | 87 |...|10.53|N |15..|12 | 8 |
|---|---|------|------|----|---|-----|------------|----|-----|----|

Unable to store output in a customized manner in dataframe

I've created a script in python to parse some urls and store them in a dataframe. My script can do it. However, it doesn't do the way I expect.
I've tried with:
import requests
from bs4 import BeautifulSoup
import pandas as pd
base = 'http://opml.radiotime.com/Search.ashx?query=kroq'
linklist = []
r = requests.get(base)
soup = BeautifulSoup(r.text,"xml")
for item in soup.select("outline[type='audio'][URL]"):
    find_match = base.split("=")[-1].lower()
    if find_match in item['text'].lower():
        linklist.append(item['URL'])
df = pd.DataFrame(linklist, columns=[find_match])
print(df)
Current output:
0 http://opml.radiotime.com/Tune.ashx?id=s35105
1 http://opml.radiotime.com/Tune.ashx?id=s26581
2 http://opml.radiotime.com/Tune.ashx?id=t122458...
3 http://opml.radiotime.com/Tune.ashx?id=t132149...
4 http://opml.radiotime.com/Tune.ashx?id=t131867...
5 http://opml.radiotime.com/Tune.ashx?id=t120569...
6 http://opml.radiotime.com/Tune.ashx?id=t125126...
7 http://opml.radiotime.com/Tune.ashx?id=t131068...
8 http://cdn-cms.tunein.com/service/Audio/nostre...
9 http://cdn-cms.tunein.com/service/Audio/notcom...
Expected output (I wish to kick out the indices as well if possible):
0 http://opml.radiotime.com/Tune.ashx?id=s35105
1 http://opml.radiotime.com/Tune.ashx?id=s26581
2 http://opml.radiotime.com/Tune.ashx?id=t122458
3 http://opml.radiotime.com/Tune.ashx?id=t132149
4 http://opml.radiotime.com/Tune.ashx?id=t131867
5 http://opml.radiotime.com/Tune.ashx?id=t120569
6 http://opml.radiotime.com/Tune.ashx?id=t125126
7 http://opml.radiotime.com/Tune.ashx?id=t131068
8 http://cdn-cms.tunein.com/service/Audio/nostre
9 http://cdn-cms.tunein.com/service/Audio/notcom
You can left-align the text with the Styler. To get rid of the index, drop it when writing to CSV:
df.style.set_properties(**{'text-align': 'left'})
df.to_csv(r'Data.csv', sep=',', encoding='utf-8-sig',index = False )
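The trailing "..." in the current output most likely comes from pandas shortening long strings for display, which is governed by the display.max_colwidth option; the stored values themselves are not cut. A small sketch of two ways to print without truncation follows. The long URL is a hypothetical stand-in, since the full URLs are not shown in the question.

```python
import pandas as pd

# Hypothetical long URL standing in for the truncated ones in the question
long_url = "http://example.com/" + "a" * 80
df = pd.DataFrame({"kroq": ["http://opml.radiotime.com/Tune.ashx?id=s35105",
                            long_url]})

# to_string renders full cell contents and can omit the index
text = df.to_string(index=False)
print(text)

# Alternatively, lift the column-width limit for ordinary printing
with pd.option_context("display.max_colwidth", None):
    print(df)
```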
