I have used a for loop to extract text from images with pytesseract, and I am getting errors while converting the resulting list into a pandas DataFrame.
info = []
for item in dirs:
    if os.path.isfile(path + item):
        for a in x:
            img = Image.open(path + item)
            crop = img.crop(a)
            text = pytesseract.image_to_string(crop)
            info.append(text)
df = pd.DataFrame([info], colnames=['col1','col2'])
df
Expected result: the data stored in the DataFrame row-wise.
Yes, the list is not a list of two items; I have 14 predefined columns.
Here is another attempt:
for i in range(len(info)):
    df.loc[i] = [info for n in range(14)]
Please check the documentation for pd.DataFrame:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html
The line in which you create your DataFrame,
df = pd.DataFrame([info], colnames=['col1','col2']
is missing a closing parenthesis, uses colnames instead of columns, wraps your list in unnecessary square brackets, and creates two columns where you only need one.
Please mention the exact error
There are two problems here, I think.
First, you are passing [info] to the DataFrame although info is already a list; you can just pass this list as it is.
Second, now that you pass a list of items as the argument, you are trying to build a DataFrame with two columns via colnames=['col1','col2'], and the keyword is columns, not colnames.
I think that is the problem: your list is not a list of two-item lists (like [[a, b], [c, d]]). Just use:
df = pd.DataFrame(info, columns=['col1'])
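For the row-wise, 14-column layout mentioned in the comments, a minimal sketch could look like the following. It assumes each image contributes exactly 14 extracted strings (one per crop box in x) and uses placeholder column names:

import pandas as pd

n_cols = 14
columns = [f'col{i + 1}' for i in range(n_cols)]  # placeholder names

# Slice the flat list of extracted strings into consecutive rows of 14;
# this assumes len(info) is an exact multiple of 14.
rows = [info[i:i + n_cols] for i in range(0, len(info), n_cols)]
df = pd.DataFrame(rows, columns=columns)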
Best
I am receiving a nested dictionary as a response to an API call. I tried converting it into a dataframe but I am not able to get the output I want.
I wrote some code to handle the file, but I have a massive chunk of nested dictionary data in the "items" column. How do I parse that and create a dataframe from it?
df1 = pd.json_normalize(response.json())
df1.to_csv('file1.csv')
This is the csv file I was able to generate:
https://drive.google.com/file/d/1wg0QqkFmIpv_aUYefbrQxBMz_x4hRWMX/view?usp=share_link (check the items column)
I tried the json_normalize and flatdict routes, along with the other json/dict-to-dataframe answers on Stack Overflow, but those did not work.
Any help is appreciated.
You can use:
df = df.explode('items')
mask = pd.json_normalize(df.pop('items'))
df = df.join(mask)
There are two columns left to convert.
print(df[['tags','productConfiguration.allowedOrderQuantities']])
'''
tags productConfiguration.allowedOrderQuantities
0 [popular, onsale] []
0 [popular, onsale] []
0 [popular, onsale] []
0 [popular, onsale] []
'''
Explode these into new rows:
df=df.explode('tags').explode('productConfiguration.allowedOrderQuantities').drop_duplicates()
But note one caveat: after this operation each original row is repeated twice, because the tags list holds two values. If there are 100 rows in the dataset there will now be 200, since we have converted the JSON strings into extra columns and rows.
For a more general explode, collect every column whose values are lists and explode them all at once:
explode_cols = []
for i in df.columns:
    if isinstance(df[i].iloc[0], list):   # is this column's first value a list?
        explode_cols.append(i)            # if so, remember the column name
df = df.explode(explode_cols)             # explode df with the collected columns
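To see the whole flow in one place, here is a self-contained sketch on invented data (the nested structure below is made up for illustration and will not match the real API response); note the added reset_index(drop=True), so the join aligns row by row after the explode:

import pandas as pd

# Invented sample: each record carries a list of nested "items" dicts.
records = [
    {"order": 1, "tags": ["popular", "onsale"],
     "items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}]},
    {"order": 2, "tags": ["new"],
     "items": [{"sku": "C", "qty": 5}]},
]

df = pd.json_normalize(records)                    # flatten the top level
df = df.explode('items').reset_index(drop=True)    # one row per nested item
df = df.join(pd.json_normalize(df.pop('items')))   # flatten the item dicts

# General pass: explode any remaining columns that still hold lists.
explode_cols = [c for c in df.columns if isinstance(df[c].iloc[0], list)]
if explode_cols:
    df = df.explode(explode_cols)

print(df)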
I'm filtering a big dataframe in subsequent steps and want to temporarily store the filtered-out rows in a list so I can work with them later.
When I append the filtered dataframe to the list (i.e. temp.append(df[df.isna().any(axis=1)])), the item is stored as a pandas Series, while if I assign it to the same list it appears as a dataframe (as expected):
check = []
check[0] = pdo[pdo.isnull().any(axis=1)]
check.append(pdo[pdo.isnull().any(axis=1)])
type(check[0]), type(check[1])
Out: (pandas.core.frame.DataFrame, pandas.core.series.Series)
Is your full line of code the following?
temp = temp.append(df[df.isna().any(axis=1)])
#^^^^^^
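For what it's worth, a tiny sketch with made-up data shows that a plain Python list.append does keep the filtered object as a DataFrame, so the Series most likely came from a pandas .append call like the one above:

import numpy as np
import pandas as pd

# Made-up frame with one incomplete row.
pdo = pd.DataFrame({'a': [1.0, np.nan], 'b': [3, 4]})

temp = []
temp.append(pdo[pdo.isna().any(axis=1)])   # plain list.append
print(type(temp[0]))                       # <class 'pandas.core.frame.DataFrame'>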
I am trying to insert or add rows from one dataframe into another dataframe. I am going through the original dataframe looking for certain words in one column; when I find one of these terms I want to add that row to a new dataframe.
I get the row by using:
entry = df.loc[df['A'] == item]
But when trying to add this row to another dataframe using .add, .insert, .update or other methods, I just get an empty dataframe.
I have also tried adding the column to a dictionary and turning that into a dataframe, but it writes data for the entire row rather than just the column value. So is there a way to add one specific row to a new dataframe from my existing variable?
So the entry is a dataframe containing the rows you want to add?
You can simply concatenate the two dataframes using the concat function, as long as both have the same column names:
import pandas as pd
entry = df.loc[df['A'] == item]
concat_df = pd.concat([new_df,entry])
pandas.concat reference:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html
The append function expects a list of rows in this form:
[row_1, row_2, ..., row_N]
where each row is a Series, dict, or DataFrame holding the values for the columns.
So, assuming you're trying to add one row, you should use:
entry = df.loc[df['A'] == item]
df2 = df2.append([entry])
Notice that unlike Python's list.append, the DataFrame.append function returns a new object instead of modifying the object it is called on.
Not sure how large your operations will be, but from an efficiency standpoint, you're better off adding all of the found rows to a list, and then concatenating them together at once using pandas.concat, and then using concat again to combine the found entries dataframe with the "insert into" dataframe. This will be much faster than using concat each time. If you're searching from a list of items search_keys, then something like:
entries = []
for i in search_keys:
    entry = df.loc[df['A'] == i]
    entries.append(entry)
found_df = pd.concat(entries)
result_df = pd.concat([old_df, found_df])
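As a side note, if the goal is simply every row whose 'A' value appears in search_keys, a single isin filter avoids the loop entirely (a sketch assuming the same df, old_df and search_keys names as above):

found_df = df[df['A'].isin(search_keys)]      # all matching rows at once
result_df = pd.concat([old_df, found_df])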
I am trying to append a list looking like this
myList = ['2018-01-12', 'MMM', 'BUY', 42, 236.5229]
to an empty dataframe (with "header" / column names).
To create the dataframe I've done the following:
tradeLog = pd.DataFrame(columns=["DATE", "TICKER", "ORDER_TYPE", "AMOUNT", "PRICE"])
I am trying to append the list as a row in the following way:
tradeLog.append(myList, ignore_index=True)
(NOTE: My goal is to iterate over some data, a lot of lists in the same format, and add them one by one to the dataframe.)
The pandas documentation reads
DataFrame.append(other, ignore_index=False, verify_integrity=False, sort=None)
other : DataFrame or Series/dict-like object, or list of these
    The data to append.
So you need to transform your list before appending it to your DataFrame:
Something that might work is to zip your column names with the contents of myList, so it would be:
tradeLog = pd.DataFrame(columns=["DATE", "TICKER", "ORDER_TYPE", "AMOUNT", "PRICE"])
myList = ['2018-01-12', 'MMM', 'BUY', 42, 236.5229]
myDict = dict(zip(tradeLog.columns.tolist(), myList))
tradeLog.append(myDict, ignore_index=True)
or tradeLog.append(pd.DataFrame([myDict]), ignore_index=True) (note the dict is wrapped in a list so pandas treats it as a single row).
This being said, you need to ensure your lists are always the same length as your list of column names.
DataFrame.append() is for appending rows from another pandas dataframe or series (see the docs).
So, if it is absolutely necessary to do this line by line, you can
tradeLog = tradeLog.append(pd.Series(myList, index=tradeLog.columns), ignore_index=True)
(N.b.: tradeLog.loc[len(tradeLog)] = ... appends to the end only as long as you have a simple integer index on tradeLog, but might break for more complex use cases.)
You might also want to consider this remark from the docs:
Iteratively appending rows to a DataFrame can be more computationally intensive than a single concatenate. A better solution is to append those rows to a list and then concatenate the list with the original DataFrame all at once.
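Following that advice, a minimal sketch of the collect-then-build pattern for this tradeLog case could look like the following; source_of_lists is a made-up stand-in for however the lists are produced, and the second row is invented:

import pandas as pd

columns = ["DATE", "TICKER", "ORDER_TYPE", "AMOUNT", "PRICE"]

# Stand-in for the real iteration; only the first row comes from the question.
source_of_lists = [
    ['2018-01-12', 'MMM', 'BUY', 42, 236.5229],
    ['2018-01-15', 'MMM', 'SELL', 42, 240.0000],
]

rows = []
for myList in source_of_lists:
    rows.append(myList)                          # just collect the plain lists

tradeLog = pd.DataFrame(rows, columns=columns)   # build the frame once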
I have a list having Pandas Series objects, which I've created by doing something like this:
li = []
li.append(input_df.iloc[0])
li.append(input_df.iloc[4])
where input_df is a Pandas Dataframe
I want to convert this list of Series objects back to a pandas DataFrame, and was wondering if there is an easy way to do it.
Based on your post, you can do this with:
pd.DataFrame(li)
To everyone suggesting pd.concat: this is not a Series anymore. The values are being added to a list, so the data type of li is a list. To convert that list to a dataframe, they should use pd.DataFrame(<list name>).
Since the right answer got hidden in the comments, I thought it would be better to mention it as an answer:
pd.concat(li, axis=1).T
will convert the list li of Series to DataFrame
It seems that you wish to perform a customized melting of your dataframe.
Using the pandas library, you can do it with one line of code. I am creating the example below to replicate your problem:
import pandas as pd
input_df = pd.DataFrame(data={'1': [1, 2, 3, 4, 5],
                              '2': [1, 2, 3, 4, 5],
                              '3': [1, 2, 3, 4, 5],
                              '4': [1, 2, 3, 4, 5],
                              '5': [1, 2, 3, 4, 5]})
Using pd.DataFrame, you will be able to create your new dataframe that melts your two selected lists:
li = []
li.append(input_df.iloc[0])
li.append(input_df.iloc[4])
new_df = pd.DataFrame(li)
If what you want is for those two rows to end up under a single column, I would not pass them as a list back to the DataFrame constructor.
Instead, you can just append the two rows to each other, disregarding the column names of each:
new_df = input_df.iloc[0].append(input_df.iloc[4])
Let me know if this answers your question.
The answer was already mentioned, but I would like to share my version:
li_df = pd.DataFrame(li).T
If you want each Series to be a row of the dataframe, you should not use concat() followed by .T, unless all your values are of the same datatype.
If your data has both numerical and string values, then the transpose will mangle the dtypes, likely turning them all into object.
The right way to do this in general is:
Convert each Series to a dict.
Pass the list of dicts into the pd.DataFrame() constructor directly, or build a dict of dicts and use pd.DataFrame.from_dict with orient="index".
In your case the following should work:
my_list_of_dicts = [s.to_dict() for s in li]
my_df = pd.DataFrame(my_list_of_dicts)
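A small sketch with made-up mixed-dtype data illustrates the difference: the dict route lets pandas re-infer each column's dtype, while concat(...).T leaves everything as object:

import pandas as pd

# Made-up frame mixing strings and integers.
input_df = pd.DataFrame({'name': ['a', 'b', 'c'], 'value': [1, 2, 3]})
li = [input_df.iloc[0], input_df.iloc[2]]

via_dicts = pd.DataFrame([s.to_dict() for s in li])
via_transpose = pd.concat(li, axis=1).T

print(via_dicts.dtypes)       # 'value' is inferred back to int64
print(via_transpose.dtypes)   # every column is object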