DataFrame constructor not properly called! error - python

I am new to Python and I am facing a problem creating a DataFrame in key/value format, i.e.
data = [{'key': '[GlobalProgramSizeInThousands]', 'value': '1000'},]
Here is my code:
columnsss = ['key', 'value']
query = "select * from bparst_tags where tag_type = 1"
result = database.cursor(db.cursors.DictCursor)
result.execute(query)
result_set = result.fetchall()
data = "["
for row in result_set:
    print(row["tag_expression"])
    data += "{'value': %s , 'key': %s }," % (row["tag_expression"], row["tag_name"])
data += "]"
df = DataFrame(data, columns=columnsss)
But when I pass the data to DataFrame it shows me
pandas.core.common.PandasError: DataFrame constructor not properly called!
whereas if I print the data and paste the printed value into the data variable by hand, it works.

You are providing a string representation of a list of dicts to the DataFrame constructor, not the list itself. That is the reason you get this error.
So if you want to use your code, you could do:
df = DataFrame(eval(data))
But better would be to not create the string in the first place, but directly putting it in a dict. Something roughly like:
data = []
for row in result_set:
    data.append({'value': row["tag_expression"], 'key': row["tag_name"]})
But probably even this is not needed: depending on what exactly is in your result_set, you could
provide it directly to the DataFrame: DataFrame(result_set)
or use the pandas read_sql_query function to do this for you (see the docs on read_sql_query)
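For the read_sql_query route, here is a minimal sketch. It uses an in-memory sqlite3 database as a stand-in for the MySQL connection in the question, and the bparst_tags schema is guessed from the code above:

```python
import sqlite3
import pandas as pd

# In-memory stand-in for the database in the question; the
# bparst_tags columns are an assumption based on the code above.
conn = sqlite3.connect(':memory:')
conn.execute("create table bparst_tags (tag_name text, tag_expression text, tag_type int)")
conn.execute("insert into bparst_tags values ('GlobalProgramSizeInThousands', '1000', 1)")

# read_sql_query runs the query and builds the DataFrame in one step,
# so no manual string building or row loop is needed
df = pd.read_sql_query('select tag_name as "key", tag_expression as value '
                       "from bparst_tags where tag_type = 1", conn)
print(df)
```

With a real MySQL connection you would pass that connection object instead of the sqlite3 one.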

Just ran into the same error, but the above answer could not help me.
My code worked fine on my computer which was like this:
test_dict = {'x': '123', 'y': '456', 'z': '456'}
df=pd.DataFrame(test_dict.items(),columns=['col1','col2'])
However, it did not work on another platform; it gave me the same error as mentioned in the original question (likely a different Python or pandas version, since dict.items() returns a view rather than a list in Python 3). I tried the code below, simply wrapping list() around the dictionary items, and it worked smoothly after that:
df=pd.DataFrame(list(test_dict.items()),columns=['col1','col2'])
Hopefully, this answer can help whoever ran into a similar situation like me.
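A quick self-contained sketch of that working pattern:

```python
import pandas as pd

test_dict = {'x': '123', 'y': '456', 'z': '456'}

# In Python 3, .items() returns a view object; wrapping it in list()
# gives pandas an explicit list of (key, value) tuples to build rows from.
df = pd.DataFrame(list(test_dict.items()), columns=['col1', 'col2'])
print(df)
```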

import json
import pandas as pd

# Opening the JSON file; json.load returns the JSON object as a dictionary
with open('data.json') as f:
    data1 = json.load(f)

# converting it into a dataframe (pd.read_json expects a path or JSON string,
# not an already-parsed dict, so use from_dict here)
df = pd.DataFrame.from_dict(data1, orient='index')
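A self-contained variant of the same idea, with a small sample file written first (the contents of data.json are an assumption for illustration):

```python
import json
import os
import tempfile
import pandas as pd

# Sample file standing in for data.json (invented for the example)
sample = {'row1': {'a': 1, 'b': 2}, 'row2': {'a': 3, 'b': 4}}
path = os.path.join(tempfile.mkdtemp(), 'data.json')
with open(path, 'w') as f:
    json.dump(sample, f)

# Load the JSON object as a dictionary, then build the frame;
# orient='index' treats each top-level key as one row
with open(path) as f:
    data1 = json.load(f)
df = pd.DataFrame.from_dict(data1, orient='index')
print(df)
```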

Related

I am having issues with my code working properly and I'm stuck

I am having a problem with my code and getting it to work. I'm not sure if I'm sorting this correctly. I am trying to sort without lambda, pandas, or itemgetter.
Here is my code that I am having issues with.
with open('ManufacturerList.csv', 'r') as man_list:
    ml = csv.reader(man_list, delimiter=',')
    for row in ml:
        manufacturerList.append(row)
        print(row)
with open('PriceList.csv', 'r') as price_list:
    pl = csv.reader(price_list, delimiter=',')
    for row in pl:
        priceList.append(row)
        print(row)
with open('ManufacturerList.csv', 'r') as service_list:
    sl = csv.reader(service_list, delimiter=',')
    for row in sl:
        serviceList.append(row)
        print(row)
new_mfl = (sorted(manufacturerList, key='None'))
new_prl = (sorted(priceList, key='None'))
new_sdl = (sorted(serviceList, key='None'))
for x in range(0, len(new_mfl)):
    new_mfl[x].append(priceList[x][1])
for x in range(0, len(new_mfl)):
    new_mfl[x].append(serviceList[x][1])
new_list = new_mfl
inventoryList = (sorted(list, key=1))
I have tried to use a def function to get it to work, but I don't know if I'm doing it right. This is what I tried.
def new_mfl(x):
    return x[0]

x.sort(key=new_mfl)
You can do it like this:
def manufacturer_key(x):
    return x[0]

sorted_mfl = sorted(manufacturerList, key=manufacturer_key)
The key argument is the function that extracts the field of the CSV row that you want to sort by. The same sort can be written with a lambda:
sorted_mfl = sorted(manufacturerList, key=lambda x: x[0])
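For instance, with sample rows like those the CSVs above might contain (data invented for illustration), sorting by the first column looks like:

```python
# Hypothetical rows as csv.reader would produce them: lists of strings
manufacturerList = [['D-Link', 'router'], ['Apple', 'laptop'], ['Canon', 'camera']]

def manufacturer_key(x):
    # sort by the first field of each row
    return x[0]

sorted_mfl = sorted(manufacturerList, key=manufacturer_key)
print(sorted_mfl)  # [['Apple', 'laptop'], ['Canon', 'camera'], ['D-Link', 'router']]
```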
There are different Dialects and Formatting Parameters that control how data is read from and written to comma-separated-value files. With the correct delimiter for the kind of data you handle, this could likely be done with fewer statements, combined with built-in methods such as split for string data or other ways of sorting and manipulating lists. For example, for single-column data, delimiter=',' splits each line on commas, and csv.reader then iterates through the individual values rather than handing you a list of lists.
['9.310788653967691', '4.065746465800029', '6.6363356879192965', '7.279020237137884', '4.010297786910394']
['9.896092029283933', '7.553018448286675', '0.3268282119829197', '2.348011394854333', '3.964531054345021']
['5.078622663277619', '4.542467725728741', '3.743648062104161', '12.761916277286993', '9.164698479088221']
# out:
column1 column2 column3 column4 column5
0 4.737897984379577 6.078414943611958 2.7021438955897095 5.8736388919905895 7.878958949784588
1 4.436982168483749 3.9453563399358544 12.66647791861843 5.323017508568736 4.156777982870004
2 4.798241413768279 12.690268531982028 9.638858110105895 7.881360524434767 4.2948334000783195
This works because I am using lists that contain single values. For columns or lists of the form sorted_mfl = {'First Name': ['name', 'name', 'name'], 'Second Name': [...], 'ID': [...]}, new_prl = ['example', 'example', 'example'], new_sdl = [...], the data would be combined with something like sorted_mfl + new_prl + new_sdl. Since different modules can also be used to read and manage comma-separated files, you should add more information to your question, such as the data types you use, or create a minimal reproducible example with pandas.
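A minimal sketch of that csv.reader behaviour, with a tiny file written in place of the real one:

```python
import csv
import os
import tempfile

# Write a small two-line CSV standing in for the real input file
path = os.path.join(tempfile.mkdtemp(), 'numbers.csv')
with open(path, 'w', newline='') as f:
    f.write('9.31,4.06,6.63\n9.89,7.55,0.33\n')

# csv.reader yields each line as a list of string fields,
# split on the chosen delimiter
with open(path, newline='') as f:
    rows = [row for row in csv.reader(f, delimiter=',')]
print(rows)
```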

Why are multiple values incorrectly updated in my dynamically created nested dicts?

dfs is a dict of DataFrames, and the keys are named like this: 'datav1_135_gl_b17'
We would like to calculate a matrix with constants. It should be possible to assign the values in the matrix according to the attributes from the df name. In this example '135' and 'b17'.
If you want code to create an example dfs, let me know, I've cut it out to more clearly state the problem.
We create a nested dict dynamically with the following function:
def ex_calc_time(dfs):
    formats = []
    grammaturs = []
    for i in dfs:
        # (...)
        # format
        split1 = i.split('_')
        format = split1[-1]
        format.replace(" ", "")
        formats.append(format)
        formats = list(set(formats))
        # grammatur
        # split1 = i.split('_')
        grammatur = split1[-3]
        grammatur.replace(" ", "")
        grammaturs.append(grammatur)
        grammaturs = list(set(grammaturs))
    # END FLOOP
    dict_mean_time = dict.fromkeys(formats, dict.fromkeys(grammaturs, ''))
    return dfs, dict_mean_time
Then we try to fill the nested dict and change the values like this (which should work according to similar nested-dict questions, but it doesn't); 'nope' is updated for both keys:
ex_dict_mean_time['b17']['170'] = 'nope'
ex_dict_mean_time
{'a18': {'135': '', '170': 'nope', '250': ''},
'b17': {'135': '', '170': 'nope', '250': ''}}
I also tried creating a dataframe from ex_dict_mean_time and filling it with .loc, but that didn't work either (df remains empty). Moreover I tried this method, but I always end up with the same problem and the values are overwritten. I appreciate any help. If you have any improvements for my code please let me know, I welcome any opportunity to improve.
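What's described above is the standard dict.fromkeys pitfall: the second argument to dict.fromkeys is evaluated once, so every outer key ends up pointing at the same inner dict, and one assignment shows up under all keys. A minimal sketch of the problem and a fix (sample keys taken from the question):

```python
formats = ['a18', 'b17']
grammaturs = ['135', '170', '250']

# dict.fromkeys evaluates its default once, so both outer keys
# share one and the same inner dict:
shared = dict.fromkeys(formats, dict.fromkeys(grammaturs, ''))
shared['b17']['170'] = 'nope'
print(shared['a18']['170'])  # -> 'nope' (updated under both keys)

# Building a fresh inner dict per key avoids the aliasing:
independent = {f: dict.fromkeys(grammaturs, '') for f in formats}
independent['b17']['170'] = 'nope'
print(independent['a18']['170'])  # -> '' (only b17 changed)
```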

How to iterate over a CSV file with Pywikibot

I wanted to try uploading a series of items to test.wikidata, creating the item and then adding a statement of inception P571. The csv file sometimes has a date value, sometimes not. When no date value is given, I want to write out a placeholder 'some value'.
Imagine a dataframe like this:
df = {'Object': [1, 2, 3], 'Date': [250, None, 300]}
However, I am not sure how to use Pywikibot to iterate over the csv file, creating an item for each row and adding a statement. Here is the code I wrote:
import pywikibot
import pandas as pd

site = pywikibot.Site("test", "wikidata")
repo = site.data_repository()
df = pd.read_csv('experiment.csv')
item = pywikibot.ItemPage(repo)
for item in df:
    date = df['date']
    prop_date = pywikibot.Claim(repo, u'P571')
    if date == '':
        prop_date.setSnakType('somevalue')
    else:
        target = pywikibot.WbTime(year=date)
        prop_date.setTarget(target)
    item.addClaim(prop_date)
When I run this through PAWS, I get the message: KeyError: 'date'
But I think the real issue here is that I am not sure how to get Pywikibot to iterate over each row of the dataframe and create a new claim for each new date value. I would value any feedback or suggestions for good examples and documentation. Many thanks!
Looking back on this, the solution was to use .iterrows() or .itertuples() (or .loc[]) to access the values in each row, and to create the item inside the loop rather than shadowing it with the loop variable. So:
for row in df.itertuples():
    item = pywikibot.ItemPage(repo)
    prop_date = pywikibot.Claim(repo, u'P571')
    if pd.isna(row.Date):  # empty CSV cells are read as NaN, not ''
        prop_date.setSnakType('somevalue')
    else:
        target = pywikibot.WbTime(year=row.Date)
        prop_date.setTarget(target)
    item.addClaim(prop_date)

Python, how to create a table from JSON data - indexing

I am trying to create a table from JSON data. I have already used json.dumps on my data.
This is what I am trying to export to the table:
label3 = json.dumps({'class': CLASSES[idx],
                     "confidence": str(round(confidence * 100, 1)) + "%",
                     "startX": str(startX),
                     "startY": str(startY),
                     "EndX": str(endX),
                     "EndY": str(endY),
                     "Timestamp": now.strftime("%d/%m/%Y, %H:%M")})
I have tried:
val1 = json.loads(label3)
df = pd.DataFrame(val1)
print(df.T)
The system gives me an error that I must pass an index.
And also with:
val = ast.literal_eval(label3)
val1 = json.loads(json.dumps(val))
print(val1)
val2 = val1["class"][0]["confidence"][0]["startX"][0]["startY"][0]["endX"][0]["endY"][0]["Timestamp"][0]
df = pd.DataFrame(data=val2, columns=["class", "confidence", "startX", "startY", "EndX", "EndY", "Timestamp"])
print(df)
When I try this, the error it gives is that string indices must be integers.
How can I create the index?
Thank you,
There are two ways we can tackle this issue.
Do as directed by the error and pass an index to the DataFrame constructor:
pd.DataFrame(val1, index=list(range(number_of_rows)))  # number_of_rows is 1 in your case
Or, while dumping the data using json.dumps, dump a dictionary that maps each key to a list of values instead of key: value. For example:
json.dumps({'class': [CLASSES[idx]], "confidence": ['some confidence']})
I have shortened your given example. Note that I am passing the values as lists (even if there is only one value per key).
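A short demonstration of both options against a scalar-valued dict (sample values invented):

```python
import pandas as pd

# One detection, all scalar values (stand-in for the parsed label3)
row = {'class': 'cat', 'confidence': '97.1%'}

# A dict of scalars has no length, so pandas cannot infer the rows:
try:
    pd.DataFrame(row)
except ValueError as e:
    print(e)  # "If using all scalar values, you must pass an index"

# Option 1: pass an index explicitly
df1 = pd.DataFrame(row, index=[0])

# Option 2: wrap each value in a list so every column has length 1
df2 = pd.DataFrame({k: [v] for k, v in row.items()})
```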

Convert to executable values in dictionary python

I have one dictionary named column_types with values as below.
column_types = {'A': 'pa.int32()',
                'B': 'pa.string()'}
I want to pass the dictionary to the pyarrow read_csv function as below:
from pyarrow import csv
table = csv.read_csv(file_name,
                     convert_options=csv.ConvertOptions(column_types=column_types))
But it gives an error because the values in the dictionary are strings.
The below statement will work without any issues.
from pyarrow import csv
table = csv.read_csv(file_name, convert_options=csv.ConvertOptions(column_types={
    'A': pa.int32(),
    'B': pa.string()
}))
How can I change dictionary values to executable statements and pass it into the csv.ConvertOptions ?
There are two ways that worked for me. You can use either of them; however, I would recommend the second one, because the first uses eval(), and eval() is risky when applied to user input. If the strings are not user input, you can use method 1 too.
1) Using eval()
import pyarrow as pa
from pyarrow import csv

column_types = {}
column_types['A'] = 'pa.' + 'string' + '()'
column_types['B'] = 'pa.' + 'int32' + '()'

# call eval() to parse each string as a function call, creating a new dict of 'col': type
final_col_types = {key: eval(val) for key, val in column_types.items()}

table = csv.read_csv(filename, convert_options=csv.ConvertOptions(column_types=final_col_types))
print(table)
2) By creating a master dictionary dict_dtypes that maps each string to its corresponding callable, and then using dict_dtypes to translate every value of column_types into its function.
import pyarrow as pa
from pyarrow import csv

column_types = {}
column_types['A'] = 'pa.' + 'string' + '()'
column_types['B'] = 'pa.' + 'int32' + '()'

# master dict mapping each string to the corresponding pyarrow type
dict_dtypes = {'pa.string()': pa.string(), 'pa.int32()': pa.int32()}

# final column_types dictionary created by mapping the master dict over column_types
final_col_types = {key: dict_dtypes[val] for key, val in column_types.items()}

table = csv.read_csv(filename, convert_options=csv.ConvertOptions(column_types=final_col_types))
print(table)
Why don't we use something like this:
column_types = {'A': pa.int32(),
                'B': pa.string()}

table = csv.read_csv(file_name,
                     convert_options=csv.ConvertOptions(column_types=column_types))
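If the strings always have the 'pa.<name>()' shape, a third option is to resolve the attribute name with getattr instead of eval. It is sketched here with a stand-in namespace in place of the real pyarrow module so the snippet runs anywhere; with pyarrow installed you would use the module itself as the lookup object:

```python
from types import SimpleNamespace

# Stand-in for the pyarrow module: two type factories mirroring
# pa.int32() / pa.string() (an assumption for this sketch).
pa = SimpleNamespace(int32=lambda: 'int32-type', string=lambda: 'string-type')

column_types = {'A': 'pa.int32()', 'B': 'pa.string()'}

def resolve(spec):
    # 'pa.int32()' -> attribute name 'int32' -> call the factory
    name = spec[len('pa.'):-len('()')]
    return getattr(pa, name)()

final_col_types = {col: resolve(spec) for col, spec in column_types.items()}
print(final_col_types)  # {'A': 'int32-type', 'B': 'string-type'}
```

Unlike eval, getattr can only reach names that actually exist on the lookup object, which limits what a malformed string can do.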
