I'm trying to add a new row at the top of my existing dataframe (df_PRED). The data come from a JSON. The keys of the JSON (df_NEW) have exactly the same names as the columns in the existing dataframe.
df_NEW = pd.read_json(dataJSON, lines=True)
df_PRED[-1] = df_NEW
Error: Wrong number of items passed 36, placement implies 1
What's going wrong? Thank you for your hints.
You can concatenate df_PRED and df_NEW:
df_PRED = pd.concat([df_NEW,df_PRED])
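The assignment df_PRED[-1] = df_NEW fails because it tries to stuff a whole 36-column frame into a single column labelled -1, hence "passed 36, placement implies 1". A minimal sketch of the concat approach, with made-up stand-ins for df_PRED and the JSON line:

```python
import pandas as pd
from io import StringIO

# Hypothetical stand-ins for the real df_PRED and JSON input
df_PRED = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
dataJSON = '{"a": 0, "b": 9}'

df_NEW = pd.read_json(StringIO(dataJSON), lines=True)

# Put the new row first and rebuild a clean 0..n-1 index
df_PRED = pd.concat([df_NEW, df_PRED], ignore_index=True)
print(df_PRED)
```

ignore_index=True avoids duplicate index labels after the concat.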
I am receiving a nested dictionary as a response to an API call. I tried converting it into a dataframe but I am not able to get the output I want.
I wrote some code to handle the file, but I have a massive chunk of nested dictionary data in the "items" column. How do I parse that and create a dataframe from it?
df = pd.json_normalize(response.json())
df.to_csv('file1.csv')
This is the csv file I was able to generate:
https://drive.google.com/file/d/1wg0QqkFmIpv_aUYefbrQxBMz_x4hRWMX/view?usp=share_link (check the items column)
I tried the json_normalize and flatdict route among the other json/dict to df answers on stackoverflow as well but those did not work.
Any help is appreciated.
You can use:
df=df.explode('items')
mask=pd.json_normalize(df.pop('items'))
df=df.join(mask)
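On toy data (the 'order'/'items' names here are made up), those three lines behave like this:

```python
import pandas as pd

# Each row's 'items' holds a list of dicts, as in the API response
df = pd.DataFrame({
    "order": [1],
    "items": [[{"sku": "A1", "qty": 2}, {"sku": "B2", "qty": 1}]],
})

df = df.explode("items").reset_index(drop=True)  # one dict per row
mask = pd.json_normalize(df.pop("items"))        # dicts -> flat columns
df = df.join(mask)                               # stitch back together
print(df)
```

reset_index keeps the join aligned; without it, every exploded row shares index 0.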
There are two columns left to convert.
print(df[['tags','productConfiguration.allowedOrderQuantities']])
'''
tags productConfiguration.allowedOrderQuantities
0 [popular, onsale] []
0 [popular, onsale] []
0 [popular, onsale] []
0 [popular, onsale] []
'''
Explode these into new rows:
df=df.explode('tags').explode('productConfiguration.allowedOrderQuantities').drop_duplicates()
One caveat: each original row is repeated once per list element, so here every row appears twice (there are two tags). If there are 100 rows in the dataset, there will now be 200, because we have converted the JSON strings into columns and rows.
For a more general explode method:
explode_cols = []
for col in df.columns:
    if type(df[col].iloc[0]) == list:  # check whether the column's first value is a list
        explode_cols.append(col)       # if so, remember the column name
df = df.explode(explode_cols)          # explode df over the collected columns
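The loop can be sketched end to end on toy data (the 'tags'/'sizes' columns are hypothetical; passing a list of columns to explode needs pandas >= 1.3, and the lists in each row must have equal lengths, otherwise explode one column at a time):

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2],
    "tags": [["popular", "onsale"], ["new"]],
    "sizes": [["S", "M"], ["L"]],
})

# Collect every column whose first value is a list
explode_cols = [c for c in df.columns if isinstance(df[c].iloc[0], list)]

# Explode all list columns in one pass
df = df.explode(explode_cols).reset_index(drop=True)
print(df)
```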
In one of the code snippets, the author provides the input as:
variants = [ 'rs425277', 'rs1571149', 'rs1240707', 'rs1240708', 'rs873927', 'rs880051', 'rs1878745', 'rs2296716', 'rs2298217', 'rs2459994' ]
However, I have similar values in one of the columns of a csv file. How can I supply that column as input, as in the example above?
Thanks in advance
First, import your csv as a Pandas df.
df = pd.read_csv('data.csv')
Then, you can get a list from pandas dataframe column:
col_one_list = df['column_one'].tolist()
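A self-contained sketch, using an inline stand-in for data.csv (the column name column_one is illustrative):

```python
import pandas as pd
from io import StringIO

# Stand-in for data.csv
csv_text = "column_one,other\nrs425277,1\nrs1571149,2\nrs1240707,3\n"
df = pd.read_csv(StringIO(csv_text))

# Pull the column out as a plain Python list
variants = df["column_one"].tolist()
print(variants)
```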
Sorry, this might be a very simple question, but I am new to Python/JSON and everything around it. I am trying to filter my Twitter JSON data set based on user_location/country_code/gb, but I have no idea how to do this. I have tried several ways, with no luck so far. I have attached my data set and some of the code I have used here. I would appreciate any help.
Here is what I did to get the closest result; however, I do not know how to apply it to the whole data set and print the matching tweet_id values:
import json
import pandas as pd
df = pd.read_json('example.json', lines=True)
if df['user_location'][4]['country_code'] == 'th':
    print(df.tweet_id[4])
else:
    print('false')
This code shows me the tweet_id: 1223489829817577472.
However, I couldn't extend it to the whole data set.
I have tried this code as well, still no luck:
dataset = df[df['user_location'].isin([ "gb" ])].copy()
print (dataset)
that is what my data set looks like:
I would break the user_location column into multiple columns using the following
df = pd.concat([df, df.pop('user_location').apply(pd.Series)], axis=1)
Running this should give you a column for each key contained within the user_location JSON. Then it should be easy to print out tweet_ids based on country_code using:
df[df['country_code']=='th']['tweet_id']
An explanation of what is actually happening here:
df.pop('user_location') removes the 'user_location' column from df and returns it at the same time
With the returned column, we use the .apply method to apply a function to the column
pd.Series converts each JSON object/dictionary into a Series; applied to every row, the results stack into new DataFrame columns
pd.concat concatenates the original df (now without the 'user_location' column) with the new columns created from the 'user_location' data
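A small sketch of the whole recipe, with made-up tweet_id and user_location values:

```python
import pandas as pd

df = pd.DataFrame({
    "tweet_id": [111, 222, 333],
    "user_location": [
        {"country_code": "th", "city": "Bangkok"},
        {"country_code": "gb", "city": "London"},
        {"country_code": "th", "city": "Chiang Mai"},
    ],
})

# Expand each location dict into its own columns, then filter
df = pd.concat([df, df.pop("user_location").apply(pd.Series)], axis=1)
thai_ids = df[df["country_code"] == "th"]["tweet_id"]
print(thai_ids.tolist())
```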
I am trying to create a dataframe where the column lengths are not equal. How can I do this?
I was trying to use groupby, but I don't think that is the right way.
import pandas as pd
data = {'filename':['file1','file1'], 'variables':['a','b']}
df = pd.DataFrame(data)
grouped = df.groupby('filename')
print(grouped.get_group('file1'))
Above is my sample code. The output of which is:
What can I do to just have one entry of 'file1' under 'filename'?
Eventually I need to write this to a csv file.
Thank you
If you only have one entry in a column, the other will be NaN. So you could just filter out the NaNs by doing something like df = df.loc[df["filename"].notnull()] (boolean filtering goes through .loc or plain brackets, not .at).
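A quick sketch with a hypothetical two-row frame:

```python
import pandas as pd

# Hypothetical frame with a gap in 'filename'
df = pd.DataFrame({"filename": ["file1", None], "variables": ["a", "b"]})

# Keep only the rows where 'filename' is present
df = df.loc[df["filename"].notnull()]
print(df)
```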
I'm a newbie for programming and python, so I would appreciate your advice!
I have a dataframe like this.
In the 'info' column there are 7 different categories: activities, locations, groups, skills, sights, types and other, and each category has unique values within [ ] (i.e., "activities":["Tour"]).
I would like to split 'info' column into 7 different columns based on each category as shown below.
I would like to allocate appropriate column names and also put corresponding unique strings within [ ] to each row.
Is there any easy way to split dataframe like that?
I was thinking of using the str.split functions to split it into pieces and merge everything later, but I am not sure that is the best way to go, and I wanted to see whether there is a more sophisticated way to build a dataframe like this.
Any advice is appreciated!
--UPDATE--
When I run print(dframe['info']), it shows the following.
It looks like the content of the info column is JSON-formatted, so you can parse that into a dict object easily:
>>> import json
>>> s = '''{"activities": ["Tour"], "locations": ["Tokyo"], "groups": []}'''
>>> j = json.loads(s)
>>> j
{u'activities': [u'Tour'], u'locations': [u'Tokyo'], u'groups': []}
Once you have the data as a dict, you can do whatever you like with it.
OK, here is how to do it:
import pandas as pd
import ast
#Initial Dataframe is df
mylist = list(df['info'])
mynewlist = []
for l in mylist:
    mynewlist.append(ast.literal_eval(l))
df_info = pd.DataFrame(mynewlist)
#Add columns of decoded info to the initial dataset
df_new = pd.concat([df,df_info],axis=1)
#Remove the column info
del df_new['info']
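Run on a toy frame (the 'name' column and 'info' strings are made up), the steps above look like this:

```python
import ast
import pandas as pd

df = pd.DataFrame({
    "name": ["spot_a", "spot_b"],
    "info": [
        '{"activities": ["Tour"], "locations": ["Tokyo"]}',
        '{"activities": [], "locations": ["Kyoto"]}',
    ],
})

# Parse each string into a dict, build a frame from the dicts,
# and join it back onto the original without the raw 'info' column
df_info = pd.DataFrame([ast.literal_eval(s) for s in df["info"]])
df_new = pd.concat([df.drop(columns="info"), df_info], axis=1)
print(df_new)
```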
You can use the json library to do that.
1) Import the json library
import json
2) Turn all the rows of that column into strings, then apply the json.loads function to each of them and store the result in an object:
jsonO = df['info'].map(str).apply(json.loads)
3) The result is a Series of parsed dicts that you can navigate like a JSON dataframe. For each key of interest, create a column in your final dataframe:
df['activities'] = jsonO.apply(lambda x: x['activities'])
Here, for each row of the parsed data, the value is dumped into the new column of your final dataframe df.
4) Repeat step 3 for all the columns you're interested in.
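Putting steps 1 to 4 together on a toy 'info' column (hypothetical values):

```python
import json
import pandas as pd

df = pd.DataFrame({
    "info": [
        '{"activities": ["Tour"], "locations": ["Tokyo"]}',
        '{"activities": ["Hike"], "locations": ["Osaka"]}',
    ],
})

# Parse every row once, then pull out one key per target column
jsonO = df["info"].map(str).apply(json.loads)
df["activities"] = jsonO.apply(lambda x: x["activities"])
df["locations"] = jsonO.apply(lambda x: x["locations"])
print(df[["activities", "locations"]])
```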