Pandas-Python: How do you write new lines in Pandas?

I'm trying to save a list of JSON output from an API's GET requests into a CSV file using Pandas, but the code below only generates a single entry; it doesn't create new lines.
sample JSON output :
ID : 27980
Title : ELSVIOS 6 Colors Boho Split Long Dress Fashion Women O-Neck Maxi Dress Summer Short Sleeve Solid Dress With Belt Vestidos XS-3XL32815751265US
Price : $10.32US
Sale Price : $10.32
for resultsget in getlistproductsx:
    producturls = resultsget['productTitle']
    productids = resultsget['productId']
    originalprices = resultsget['originalPrice']
    saleprices = resultsget['salePrice']
    print(producturls + str(productids) + originalprices + saleprices)
    raw_data = {'product_title': [producturls],
                'product_id': [productids],
                'original_price': [originalprices],
                'sale_price': [saleprices]}
    df = pd.DataFrame(raw_data, columns=['product_title', 'product_id', 'original_price', 'sale_price'])
    df.to_csv('example2.csv')

As kosist said, you're overwriting your CSV file on every pass of the loop.
Create a second DataFrame for each row and append it to one that accumulates the data you import in the loop.
import pandas as pd

cols = ['product_title', 'product_id', 'original_price', 'sale_price']
df = pd.DataFrame(columns=cols)
for resultsget in getlistproductsx:
    producturls = resultsget['productTitle']
    productids = resultsget['productId']
    originalprices = resultsget['originalPrice']
    saleprices = resultsget['salePrice']
    print(producturls + str(productids) + originalprices + saleprices)
    raw_data = {'product_title': [producturls],
                'product_id': [productids],
                'original_price': [originalprices],
                'sale_price': [saleprices]}
    # create a second DataFrame holding just this row
    df2 = pd.DataFrame(raw_data, columns=cols)
    # append the newly created DataFrame to the one keeping the data
    # (note: DataFrame.append was removed in pandas 2.0; use pd.concat there)
    df = df.append(df2)
# then write the accumulated DataFrame to csv, outside the loop
df.to_csv('csv.csv')
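On newer pandas this pattern needs one change: DataFrame.append was removed in pandas 2.0, so collect the per-row frames in a list and call pd.concat once at the end. A minimal, self-contained sketch (getlistproductsx here is made-up stand-in data, not the real API response):

```python
import pandas as pd

# stand-in for the API results; key names follow the question
getlistproductsx = [
    {'productTitle': 'Dress A', 'productId': 1, 'originalPrice': '$10.32', 'salePrice': '$8.99'},
    {'productTitle': 'Dress B', 'productId': 2, 'originalPrice': '$12.00', 'salePrice': '$9.50'},
]

frames = []
for resultsget in getlistproductsx:
    raw_data = {'product_title': [resultsget['productTitle']],
                'product_id': [resultsget['productId']],
                'original_price': [resultsget['originalPrice']],
                'sale_price': [resultsget['salePrice']]}
    frames.append(pd.DataFrame(raw_data))

# concatenate once, outside the loop, then write a single CSV
df = pd.concat(frames, ignore_index=True)
df.to_csv('example2.csv', index=False)
```

Building a list and concatenating once is also much faster than growing a DataFrame inside the loop, since each append/concat copies all the data.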

You probably want to load all your lines into a pandas DataFrame and only then call to_csv, like:
import pandas as pd
df = pd.DataFrame(getlistproductsx)
df.to_csv('csv.csv')

Related

How to multiply values split from a str in a DataFrame, Python?

For example DataFrame:
import pandas as pd
df = pd.DataFrame.from_dict({
'art1':['n1','n2'],
'sizes':['35 36 37', '36 38']
})
print (df)
# need that
df_result = pd.DataFrame.from_dict({
'art1':['n1','n1','n1','n2','n2'],
'sizes':[35,36,37,36,38]
})
print (df_result)
BELOW IS CORRECT BUT NOT AN EFFICIENT SOLUTION !!!
lst_art = []
lst_sizes = [x.split() for x in df['sizes']]
for i in range(len(lst_sizes)):
    for j in range(len(lst_sizes[i])):
        lst_art.append(df['art1'][i])
lst_sizes = sum(lst_sizes, [])
df = pd.DataFrame({'art1': lst_art, 'sizes': lst_sizes})
print(df)
Is there an efficient pandas way to get df_result from df?
You can first split the string column into a list, and then explode each item of the list into a new row:
df = pd.DataFrame.from_dict({
'art1':['n1','n2'],
'sizes':['35 36 37', '36 38']
})
# convert str to list
df['sizes'] = df['sizes'].str.split()
# create one new row per item in list of `sizes`
df_result = df.explode('sizes')
or you can do it as a one-liner:
df.assign(sizes=df['sizes'].str.split()).explode('sizes')
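Note that after explode the sizes values are still strings ('35', not 35); to match df_result exactly you can cast them afterwards. A small self-contained sketch of the full chain:

```python
import pandas as pd

df = pd.DataFrame({'art1': ['n1', 'n2'],
                   'sizes': ['35 36 37', '36 38']})

# split the string, explode one row per size, then cast the strings to int
df_result = (df.assign(sizes=df['sizes'].str.split())
               .explode('sizes')
               .astype({'sizes': int})
               .reset_index(drop=True))
print(df_result)
```

reset_index(drop=True) is optional; explode repeats the original index (0, 0, 0, 1, 1), which dropping gives a clean 0..4 range.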

How can I display a JSON array in a Python DataFrame?

I have a json file.
[
{
'orderId': 1811,
'deliveryId': '000001811-1634732661563000',
'shippingBook': '[{"qtyOrdered":1,"bookNoList":["B8303-V05","B8304-V05","B8305-V05","B8306-V05","B8307-V05"],"courseCode":"A8399-S26"},{"courseCode":"A1399-S70","qtyOrdered":1,"bookNoList":["B1301-V06","B1302-V06","B1303-V06","B1304-V06","B1305-1-V06","B1305-2-V06","B1306-V06","B1307-V06"]}]',
}
]
But how can I display it in a DataFrame in that format?
Thank you.
The value of 'shippingBook' is a string, so you may need json.loads() to convert it to a Python list of dictionaries.
Then you can use normal for-loops to reshape everything into a list of rows with the expected data, and later convert that to a DataFrame.
import json
import pandas as pd
data = [
{
'orderId': 1811,
'deliveryId': '000001811-1634732661563000',
'shippingBook': '[{"qtyOrdered":1,"bookNoList":["B8303-V05","B8304-V05","B8305-V05","B8306-V05","B8307-V05"],"courseCode":"A8399-S26"},{"courseCode":"A1399-S70","qtyOrdered":1,"bookNoList":["B1301-V06","B1302-V06","B1303-V06","B1304-V06","B1305-1-V06","B1305-2-V06","B1306-V06","B1307-V06"]}]',
}
]
# --- organize data ---
all_rows = []
for order in data:
    order_id = order['orderId']
    delivery_id = order['deliveryId']
    for book in json.loads(order['shippingBook']):
        row = [order_id, delivery_id, book['courseCode'], book['bookNoList']]
        #print(row)
        all_rows.append(row)
# --- convert to DataFrame ---
df = pd.DataFrame(all_rows, columns=['orderId', 'deliveryId', 'courseCode', 'bookNoList'])
print(df.to_string()) # `to_string()` to display all data without `...`
Result:
orderId deliveryId courseCode bookNoList
0 1811 000001811-1634732661563000 A8399-S26 [B8303-V05, B8304-V05, B8305-V05, B8306-V05, B8307-V05]
1 1811 000001811-1634732661563000 A1399-S70 [B1301-V06, B1302-V06, B1303-V06, B1304-V06, B1305-1-V06, B1305-2-V06, B1306-V06, B1307-V06]
EDIT:
You may also try to do the same directly in the DataFrame.
It needs explode to split the list into rows.
import json
import pandas as pd
data = [
{
'orderId': 1811,
'deliveryId': '000001811-1634732661563000',
'shippingBook': '[{"qtyOrdered":1,"bookNoList":["B8303-V05","B8304-V05","B8305-V05","B8306-V05","B8307-V05"],"courseCode":"A8399-S26"},{"courseCode":"A1399-S70","qtyOrdered":1,"bookNoList":["B1301-V06","B1302-V06","B1303-V06","B1304-V06","B1305-1-V06","B1305-2-V06","B1306-V06","B1307-V06"]}]',
}
]
#df = pd.DataFrame.from_records(data)
df = pd.DataFrame(data)
# convert string to list of dictionaries
df['shippingBook'] = df['shippingBook'].apply(json.loads)
# split list `'shippingBook'` into rows
df = df.explode('shippingBook')
df = df.reset_index()
del df['index']
# split elements into columns
#df['courseCode'] = df['shippingBook'].apply(lambda item:item['courseCode'])
#df['bookNoList'] = df['shippingBook'].apply(lambda item:item['bookNoList'])
df['courseCode'] = df['shippingBook'].str['courseCode'] # `.str[...]` also indexes dict elements, not only strings
df['bookNoList'] = df['shippingBook'].str['bookNoList'] # `.str[...]` also indexes dict elements, not only strings
# remove `'shippingBook'`
del df['shippingBook']
print(df.to_string())
And the same with apply(pd.Series) to convert list into columns.
import json
import pandas as pd
data = [
{
'orderId': 1811,
'deliveryId': '000001811-1634732661563000',
'shippingBook': '[{"qtyOrdered":1,"bookNoList":["B8303-V05","B8304-V05","B8305-V05","B8306-V05","B8307-V05"],"courseCode":"A8399-S26"},{"courseCode":"A1399-S70","qtyOrdered":1,"bookNoList":["B1301-V06","B1302-V06","B1303-V06","B1304-V06","B1305-1-V06","B1305-2-V06","B1306-V06","B1307-V06"]}]',
}
]
#df = pd.DataFrame.from_records(data)
df = pd.DataFrame(data)
# convert string to list of dictionaries
df['shippingBook'] = df['shippingBook'].apply(json.loads)
# split list `'shippingBook'` into rows
df = df.explode('shippingBook')
df = df.reset_index()
del df['index']
# split elements into columns
new_columns = df['shippingBook'].apply(pd.Series)
#df[['qtyOrdered', 'bookNoList', 'courseCode']] = new_columns
#del df['qtyOrdered']
#df[['bookNoList', 'courseCode']] = new_columns[['bookNoList', 'courseCode']]
df = df.join(new_columns[['bookNoList', 'courseCode']])
# remove `'shippingBook'`
del df['shippingBook']
print(df.to_string())
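If your pandas version has it, pd.json_normalize can do the row-splitting and column-splitting in one call once the embedded string has been parsed. A sketch (the data here is a shortened version of the sample above):

```python
import json
import pandas as pd

data = [
    {
        'orderId': 1811,
        'deliveryId': '000001811-1634732661563000',
        'shippingBook': '[{"qtyOrdered":1,"bookNoList":["B8303-V05","B8304-V05"],"courseCode":"A8399-S26"},'
                        '{"courseCode":"A1399-S70","qtyOrdered":1,"bookNoList":["B1301-V06","B1302-V06"]}]',
    }
]

# parse the embedded JSON string in place first
for order in data:
    order['shippingBook'] = json.loads(order['shippingBook'])

# one row per shippingBook entry, carrying orderId/deliveryId along as metadata
df = pd.json_normalize(data, record_path='shippingBook',
                       meta=['orderId', 'deliveryId'])
print(df.to_string())
```

record_path names the nested list to expand and meta lists the outer keys to repeat on every resulting row.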

How to read this JSON file in Python?

I'm trying to read such a JSON file in Python, to save only two of the values of each response part:
{
"responseHeader":{
"status":0,
"time":2,
"params":{
"q":"query",
"rows":"2",
"wt":"json"}},
"response":{"results":2,"start":0,"docs":[
{
"name":["Peter"],
"country":["England"],
"age":["23"]},
{
"name":["Harry"],
"country":["Wales"],
"age":["30"]}]
}}
For example, I want to put the name and the age in a table. I already tried it this way (based on this topic), but it's not working for me.
import json
import pandas as pd
file = open("myfile.json")
data = json.loads(file)
columns = [dct['name', 'age'] for dct in data['response']]
df = pd.DataFrame(data['response'], columns=columns)
print(df)
I also have seen more solutions of reading a JSON file, but that all were solutions of a JSON file with no other header values at the top, like responseHeader in this case. I don't know how to handle that. Anyone who can help me out?
import json

with open("myfile.json") as f:
    columns = [(dic["name"], dic["age"]) for dic in json.load(f)["response"]["docs"]]
print(columns)
result:
[(['Peter'], ['23']), (['Harry'], ['30'])]
You can pass the list data["response"]["docs"] to pandas directly as it's a recordset.
df = pd.DataFrame(data["response"]["docs"])
print(df)
>>>       name    country   age
0      [Peter]  [England]  [23]
1      [Harry]    [Wales]  [30]
The data in your DataFrame will be bracketed though, as you can see. If you want to remove the brackets, you can consider the following:
for column in df.columns:
    df.loc[:, column] = df.loc[:, column].str.get(0)
    if column == 'age':
        df.loc[:, column] = df.loc[:, column].astype(int)
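The same unwrapping can also be written without an explicit loop, since the .str accessor indexes list elements column-wise. A small sketch on the docs from the question:

```python
import pandas as pd

df = pd.DataFrame([
    {'name': ['Peter'], 'country': ['England'], 'age': ['23']},
    {'name': ['Harry'], 'country': ['Wales'], 'age': ['30']},
])

# take the first element of each single-item list, for every column at once
df = df.apply(lambda col: col.str[0])
df['age'] = df['age'].astype(int)
print(df)
```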
sample = {"responseHeader":{
"status":0,
"time":2,
"params":{
"q":"query",
"rows":"2",
"wt":"json"}},
"response":{"results":2,"start":0,"docs":[
{
"name":["Peter"],
"country":["England"],
"age":["23"]},
{
"name":["Harry"],
"country":["Wales"],
"age":["30"]}]
}}
data = [(x['name'][0], x['age'][0]) for x in sample['response']['docs']]
df = pd.DataFrame(data, columns=['name', 'age'])

How can we represent a pandas.Series value in Django?

I have the following code, where I am binning a Pandas dataframe into given number of bins:
def contibin(data, target, bins=10):
    # Empty DataFrames
    newDF, woeDF = pd.DataFrame(), pd.DataFrame()
    # Extract column names
    cols = data.columns
    for ivars in cols[~cols.isin([target])]:
        if (data[ivars].dtype.kind in 'bifc') and (len(np.unique(data[ivars])) > 10):
            binned_x = pd.qcut(data[ivars], bins, duplicates='drop')
            d0 = pd.DataFrame({'x': binned_x, 'y': data[target]})
            #print(d0)
        else:
            d0 = pd.DataFrame({'x': data[ivars], 'y': data[target]})
        d = d0.groupby("x", as_index=False).agg({"y": ["count", "sum"]})
        d.columns = ['Range', 'Total', 'No. of Good']
        d['No. of Bad'] = d['Total'] - d['No. of Good']
        d['Dist. of Good'] = np.maximum(d['No. of Good'], 0.5) / d['No. of Good'].sum()
        d['Dist. of Bad'] = np.maximum(d['No. of Bad'], 0.5) / d['No. of Bad'].sum()
        d['WoE'] = np.log(d['Dist. of Good'] / d['Dist. of Bad'])
        d['IV'] = d['WoE'] * (d['Dist. of Good'] - d['Dist. of Bad'])
        #temp = pd.DataFrame({"Variable": [ivars], "IV": [d['IV'].sum()]}, columns=["Variable", "IV"])
        #newDF = pd.concat([newDF, temp], axis=0)
        woeDF = pd.concat([woeDF, d], axis=0)
    return woeDF
The problem I am facing is that when I try to integrate the code on the front end using Django, I am not able to represent woeDF['Range'] the way I see it normally. I tried converting the pandas Series to a string, but it still isn't giving me what I want. To illustrate what I want to see in my frontend, I am attaching a picture of a sample table I got by running this code on the Churn Modelling dataset: the image of the table I need.
You can turn the DataFrame into an iterable of row tuples using DataFrame.itertuples(index=False).
You will then be able to iterate through the rows in your template, accessing the columns via their names. See the example below in Python:
import pandas as pd

columns = {"name": ["john", "skip", "abu", "harry", "ben"],
           "age": [10, 20, 30, 40, 50]}
df = pd.DataFrame(columns)
print(df)
df_objects = df.itertuples(index=False)
for person in df_objects:
    print("{0}: {1}".format(person.name, person.age))
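For handing the rows to a Django template context, an alternative to itertuples is DataFrame.to_dict('records'), which yields plain dicts that any template engine can iterate. A sketch without the Django machinery (the render call in the comment is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({"name": ["john", "skip"],
                   "age": [10, 20]})

# one plain dict per row, keyed by column name
rows = df.to_dict('records')

# in a Django view this would be something like:
#   return render(request, 'table.html', {'rows': rows})
for person in rows:
    print("{0}: {1}".format(person['name'], person['age']))
```

Plain dicts also survive serialization (e.g. JsonResponse) in a way that namedtuples from itertuples do not.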

How to Merge a List of Multiple DataFrames and Tag Each with Another List

I have a list of DataFrames that come from the census API; I stored each year's pull into the list.
So at the end of my for loop I have a list with one DataFrame per year, and a list of years built alongside it in the for loop.
The problem I am having is merging all the DataFrames in the list while also tagging them with the list of years.
I have tried using the reduce function, but it looks like it only takes 2 of the 6 DataFrames I have.
concat just adds them to the DataFrame without tagging or changing anything.
# Dependencies
import pandas as pd
import requests
import json
import pprint
from census import Census
from us import states
# Census
from config import (api_key, gkey)

year = 2012
c = Census(api_key, year)
yearlst, datalst = [], []  # collect one year / one frame per pass
for length in range(6):
    c = Census(api_key, year)
    data = c.acs5.get(('NAME', "B25077_001E", "B25064_001E",
                       "B15003_022E", "B19013_001E"),
                      {'for': 'zip code tabulation area:*'})
    data_df = pd.DataFrame(data)
    data_df = data_df.rename(columns={"NAME": "Name",
                                      "zip code tabulation area": "Zipcode",
                                      "B25077_001E": "Median Home Value",
                                      "B25064_001E": "Median Rent",
                                      "B15003_022E": "Bachelor Degrees",
                                      "B19013_001E": "Median Income"})
    data_df = data_df.astype({'Zipcode': 'int64'})
    filtervalue = data_df['Median Home Value'] > 0
    filtervalue2 = data_df['Median Rent'] > 0
    filtervalue3 = data_df['Median Income'] > 0
    cleandata = data_df[filtervalue & filtervalue2 & filtervalue3]
    cleandata = cleandata.dropna()
    yearlst.append(year)
    datalst.append(cleandata)
    year += 1
So this generates the two separate lists, one with the years and the other with the DataFrames.
My output came out to either one DataFrame with missing entries, or everything concatenated without changing any columns.
What I'm looking for is how to merge all the DataFrames within the list, but with datalst[0] tagged with yearlst[0] when merging, if at all possible.
No need for a year list; simply assign a year column to each data frame. Plus, avoid incrementing year and instead have it as the loop variable. In fact, consider chaining your process:
datalst = []
for year in range(2012, 2019):
    c = Census(api_key, year)
    data = c.acs5.get(('NAME', "B25077_001E", "B25064_001E", "B15003_022E", "B19013_001E"),
                      {'for': 'zip code tabulation area:*'})
    cleandata = (pd.DataFrame(data)
                   .rename(columns={"NAME": "Name",
                                    "zip code tabulation area": "Zipcode",
                                    "B25077_001E": "Median_Home_Value",
                                    "B25064_001E": "Median_Rent",
                                    "B15003_022E": "Bachelor_Degrees",
                                    "B19013_001E": "Median_Income"})
                   .astype({'Zipcode': 'int64'})
                   .query('(Median_Home_Value > 0) & (Median_Rent > 0) & (Median_Income > 0)')
                   .dropna()
                   .assign(year_column=year))
    datalst.append(cleandata)

final_data = pd.concat(datalst, ignore_index=True)
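The tag-then-concat idea does not depend on the census client at all: assign the year to each frame, then concatenate once. A toy sketch with made-up frames standing in for the per-year pulls:

```python
import pandas as pd

datalst = []
for year in [2012, 2013]:
    # stand-in for the cleaned census pull of that year
    pull = pd.DataFrame({'Zipcode': [10001, 10002],
                         'Median_Rent': [1500, 1600]})
    # tag every row of this frame with its year before collecting it
    datalst.append(pull.assign(year_column=year))

final_data = pd.concat(datalst, ignore_index=True)
print(final_data)
```

Because each frame carries its year as a column, no parallel year list is needed and the concatenated result stays self-describing.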
