String to dataframe - python

I am developing QR code scanner to csv. I am very much new in python. I am getting this output after scanning QR 'Employee ID: 101\nEmployee Name: Abhinav Jha\nDesignation: Student\nDepartment: Mechanical'.
Can anyone help me to covert it to csv. Thanks in advance.

Well, you can convert this incoming string to a python dictionary and then you can easily write the values in a CSV file.
dict_1 = {(item.split(':')[0]).strip():(item.split(':')[1]).strip() for item in s.split('\n')}
print(dict_1)
output -
{'Employee ID': '101',
'Employee Name': 'Abhinav Jha',
'Designation': 'Student',
'Department': 'Mechanical'}
Now, if you wanna access keys then use the .values() function -
print(dict_1.values()) # will print ['101', 'Abhinav Jha', 'Student', 'Mechanical']
I'm assuming you wanna write these values in a CSV file.

Related

JSON Fields with Panda DataFrame

I'm using Python (google colb) and I have a json dataframe with some fields like:
[{'ActedBy': ['team'], 'ActedAt': '2022-03-07T22:43:46Z', 'Status': 'Completed', 'LAB': 'No'}]
I need to get the "ActedAt" in order to get the "date" how can I get this?
Thanks!
You have an array of dictionaries. First, grab a dictionary from the array by index, then proceed to get the ActedAt property. Something like this:
json = [{'ActedBy': ['team'], 'ActedAt': '2022-03-07T22:43:46Z', 'Status': 'Completed', 'LAB': 'No'}]
# index into a variable for explicit readability
index = 0
# get the date you want
date = json[index]['ActedAt']
print(date)

python convert text rows to dictionary based on conditional match

I have the below string and need help on how write an if condition in a for loop that check if the row.startswith('name') then take the value and store is in a variable called name. Similarly for dob as well.
Once the for loop completes the output should be a dictionary as below which i can convert to a pandas dataframe.
'name john\n \n\nDOB\n12/08/1984\n\ncurrent company\ngoogle\n'
This is what i have tried so far but do not know how to get the values into a dictionary
for row in lines.split('\n'):
if row.startswith('name'):
name = row.split()[-1]
Final Ouput
data = {"name":"john", "dob": "12/08/1984"}
Try using a list comprehension and split:
s = '''name
john
dob
12/08/1984
current company
google'''
d = dict([i.splitlines() for i in s.split('\n\n')])
print(d)
Output:
{'name': 'john', 'dob': '12/08/1984', 'current company': 'google'}

MemoryError in Python when saving list to dataframe

I was trying to do the following, which is to save a python list that contains json strings into a dataframe in jupyternotebook
df = pd.io.json.json_normalize(mon_list)
df[['gfmsStr','_id']]
But then I received this error:
MemoryError
Then if I run other blocks, they all start to show the memory error. I am wondering what caused this and if there is anyway I can increase the memory to avoid the error.
Thanks!
update:
what's in mon_list is like the following:
mon_list[1]
[{'id': 1, 'name': {'first': 'Coleen', 'last': 'Volk'}},
{'name': {'given': 'Mose', 'family': 'Regner'}},
{'id': 2, 'name': 'Faye Raker'}]
Do you really have a list? Or do you have a JSON file? What format is the "mon_list" variable?
This is how you convert a list to a Dataframe
# import pandas as pd
import pandas as pd
# list of strings
lst = ['Geeks', 'For', 'Geeks', 'is',
'portal', 'for', 'Geeks']
# Calling DataFrame constructor on list
df = pd.DataFrame(lst)
https://www.geeksforgeeks.org/create-a-pandas-dataframe-from-lists/

Convert list of Dictionaries to a Dataframe [duplicate]

This question already has answers here:
Convert list of dictionaries to a pandas DataFrame
(7 answers)
Closed 4 years ago.
I am facing a basic problem of converting a list of dictionaries obtained from parsing a column with text in json format. Below is the brief snapshot of data:
[{u'PAGE TYPE': u'used-serp.model.brand.city'},
{u'BODY TYPE': u'MPV Cars',
u'ENGINE CAPACITY': u'1461',
u'FUEL TYPE': u' Diesel',
u'MODEL NAME': u'Renault Lodgy',
u'OEM NAME': u'Renault',
u'PAGE TYPE': u'New-ModelPage.OverviewTab'},
{u'PAGE TYPE': u'used-serp.brand.city'},
{u'BODY TYPE': u'SUV Cars',
u'ENGINE CAPACITY': u'2477',
u'FUEL TYPE': u' Diesel',
u'MODEL NAME': u'Mitsubishi Pajero',
u'OEM NAME': u'Mitsubishi',
u'PAGE TYPE': u'New-ModelPage.OverviewTab'},
{u'BODY TYPE': u'Hatchback Cars',
u'ENGINE CAPACITY': u'1198',
u'FUEL TYPE': u' Petrol , Diesel',
u'MODEL NAME': u'Volkswagen Polo',
u'OEM NAME': u'Volkswagen',
u'PAGE TYPE': u'New-ModelPage.GalleryTab'},
Furthermore, the code i am using to parse is detailed below:
stdf_noncookie = []
stdf_noncookiejson = []
for index, row in df_noncookie.iterrows():
try:
loop_data = json.loads(row['attributes'])
stdf_noncookie.append(loop_data)
except ValueError:
loop_nondata = row['attributes']
stdf_noncookiejson.append(loop_nondata)
stdf_noncookie is the list of dictionaries i am trying to convert into a pandas dataframe. 'attributes' is the column with text in json format. I have tried to get some learning from this link, however this was not able to solve my problem. Any suggestion/tips for converting a list of dictionaries to panda dataframe will be helpful.
To convert your list of dicts to a pandas dataframe use the following:
stdf_noncookiejson = pd.DataFrame.from_records(data)
pandas.DataFrame.from_records
DataFrame.from_records (data, index=None, exclude=None, columns=None, coerce_float=False, nrows=None)
You can set the index, name the columns etc as you read it in
If youre working with json you can also use the read_json method
stdf_noncookiejson = pd.read_json(data)
pandas.read_json
pandas.read_json (path_or_buf=None, orient=None, typ='frame', dtype=True, convert_axes=True, convert_dates=True,
keep_default_dates=True, numpy=False, precise_float=False,
date_unit=None, encoding=None, lines=False)
Reference this answer.
Assuming d is your List of Dictionaries, simply use:
df = pd.DataFrame(d)
Simply, you can use the pandas DataFrame constructor.
import pandas as pd
print (pd.DataFrame(data))
Finally found a way to convert a list of dict to panda dataframe. Below is the code:
Method A
stdf_noncookie = df_noncookie['attributes'].apply(json.loads)
stdf_noncookie = stdf_noncookie.apply(pd.Series)
Method B
stdf_noncookie = df_noncookie['attributes'].apply(json.loads)
stdf_noncookie = pd.DataFrame(stdf_noncookie.tolist())
Method A is much quicker than Method B. I will create another post asking for help on the difference between the two methods. Also, on some datasets Method B is not working.
I was able to do it with a list comprehension. But my problem was that I left my dict's json encoded so they looked like strings.
d = r.zrangebyscore('live-ticks', '-inf', time.time())
dform = [json.loads(i) for i in d]
df = pd.DataFram(dfrom)

Using Python CSV DictReader to create multi-level nested dictionary

Total Python noob here, probably missing something obvious. I've searched everywhere and haven't found a solution yet, so I thought I'd ask for some help.
I'm trying to write a function that will build a nested dictionary from a large csv file. The input file is in the following format:
Product,Price,Cost,Brand,
blue widget,5,4,sony,
red widget,6,5,sony,
green widget,7,5,microsoft,
purple widget,7,6,microsoft,
etc...
The output dictionary I need would look like:
projects = { `<Brand>`: { `<Product>`: { 'Price': `<Price>`, 'Cost': `<Cost>` },},}
But obviously with many different brands containing different products. In the input file, the data is ordered alphabetically by brand name, but I know that it becomes unordered as soon as DictReader executes, so I definitely need a better way to handle the duplicates. The if statement as written is redundant and unnecessary.
Here's the non-working, useless code I have so far:
def build_dict(source_file):
projects = {}
headers = ['Product', 'Price', 'Cost', 'Brand']
reader = csv.DictReader(open(source_file), fieldnames = headers, dialect = 'excel')
current_brand = 'None'
for row in reader:
if Brand != current_brand:
current_brand = Brand
projects[Brand] = {Product: {'Price': Price, 'Cost': Cost}}
return projects
source_file = 'merged.csv'
print build_dict(source_file)
I have of course imported the csv module at the top of the file.
What's the best way to do this? I feel like I'm way off course, but there is very little information available about creating nested dicts from a CSV, and the examples that are out there are highly specific and tend not to go into detail about why the solution actually works, so as someone new to Python, it's a little hard to draw conclusions.
Also, the input csv file doesn't normally have headers, but for the sake of trying to get a working version of this function, I manually inserted a header row. Ideally, there would be some code that assigns the headers.
Any help/direction/recommendation is much appreciated, thanks!
import csv
from collections import defaultdict
def build_dict(source_file):
projects = defaultdict(dict)
headers = ['Product', 'Price', 'Cost', 'Brand']
with open(source_file, 'rb') as fp:
reader = csv.DictReader(fp, fieldnames=headers, dialect='excel',
skipinitialspace=True)
for rowdict in reader:
if None in rowdict:
del rowdict[None]
brand = rowdict.pop("Brand")
product = rowdict.pop("Product")
projects[brand][product] = rowdict
return dict(projects)
source_file = 'merged.csv'
print build_dict(source_file)
produces
{'microsoft': {'green widget': {'Cost': '5', 'Price': '7'},
'purple widget': {'Cost': '6', 'Price': '7'}},
'sony': {'blue widget': {'Cost': '4', 'Price': '5'},
'red widget': {'Cost': '5', 'Price': '6'}}}
from your input data (where merged.csv doesn't have the headers, only the data.)
I used a defaultdict here, which is just like a dictionary but when you refer to a key that doesn't exist instead of raising an Exception it simply makes a default value, in this case a dict. Then I get out -- and remove -- Brand and Product, and store the remainder.
All that's left I think would be to turn the cost and price into numbers instead of strings.
[modified to use DictReader directly rather than reader]
Here I offer another way to satisfy your requirement(different from DSM)
Firstly, this is my code:
import csv
new_dict={}
with open('merged.csv','rb')as csv_file:
data=csv.DictReader(csv_file,delimiter=",")
for row in data:
dict_brand=new_dict.get(row['Brand'],dict())
dict_brand[row['Product']]={k:row[k] for k in ('Cost','Price')}
new_dict[row['Brand']]=dict_brand
print new_dict
Briefly speaking, the main point to solve is to figure out what the key-value pairs are in your requirements. According to your requirement,it can be called as a 3-level-dict,here the key of first level is the value of Brand int the original dictionary, so I extract it from the original csv file as
dict_brand=new_dict.get(row['Brand'],dict())
which is going to judge if there exists the Brand value same as the original dict in our new dict, if yes, it just inserts, if no, it creates, then maybe the most complicated part is the second level or middle level, here you set the value of Product of original dict as the value of the new dict of key Brand, and the value of Product is also the key of the the third level dict which has Price and Cost of the original dict as the value,and here I extract them like:
dict_brand[row['Product']]={k:row[k] for k in ('Cost','Price')}
and finally, what we need to do is just set the created 'middle dict' as the value of our new dict which has Brand as the key.
Finally, the output is
{'sony': {'blue widget': {'Price': '5', 'Cost': '4'},
'red widget': {'Price': '6', 'Cost': '5'}},
'microsoft': {'purple widget': {'Price': '7', 'Cost': '6'},
'green widget': {'Price': '7', 'Cost': '5'}}}
That's that.

Categories

Resources