Issues transforming tuple to denormalized dataframe

I have a tuple that holds a list of 200 dicts, e.g.:
mytuple= ([{'reviewId': '1234', 'userName': 'XXX', 'userImage': 'imagelink', 'content': 'AAA', 'score': 1, 'thumbsUpCount': 1, 'reviewCreatedVersion': '3.31.0', 'at': datetime.datetime(2022, 12, 1, 11, 49, 34), 'replyContent': "replycontent", 'repliedAt': datetime.datetime(2022, 12, 1, 12, 19, 51)},
{'reviewId': '5678', 'userName': 'S L', 'userImage': 'imagelink2', 'content': "content2", 'score': 1, 'thumbsUpCount': 0, 'reviewCreatedVersion': '3.31.0', 'at': datetime.datetime(2022, 11, 29, 12, 27, 46), 'replyContent': "replycontent2", 'repliedAt': datetime.datetime(2022, 11, 29, 12, 30, 40)}])
Ideally, I'd like to transform this into a dataframe with the following layout:
reviewId    userName    userImage
1234        XXX         imagelink
5678        S L         imagelink2
and so on, with the dict keys as the column headers and the columns containing the values.
mytuple was initially of size 2, from which I removed the second index and brought it down to just a list of dicts.
I tried different possibilities which include:
df=pd.DataFrame(mytuple)
df=pd.DataFrame.from_dict(mytuple)
df=pd.json_normalize(mytuple)
However, in all these cases, I get a dataframe like the one below:
1                  2                  3     4
{'reviewId':..}    {'reviewId':..}    {}    {}
I'd like to understand where I'm going wrong. Thanks in advance!
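Judging from the output shown, one likely cause is that mytuple is still a one-element tuple whose only item is the list, so pandas builds a single row with one whole dict per cell. A minimal sketch under that assumption (the records are shortened stand-ins, and the names records and df_wrong are only for illustration) passes the inner list instead:

```python
import pandas as pd

# Shortened stand-in records; the real list has 200 dicts with more keys.
records = [
    {"reviewId": "1234", "userName": "XXX", "userImage": "imagelink", "score": 1},
    {"reviewId": "5678", "userName": "S L", "userImage": "imagelink2", "score": 1},
]

# Assumed shape after dropping the second element: a 1-tuple wrapping the list.
mytuple = (records,)

# Passing the tuple itself yields one row whose cells are whole dicts.
df_wrong = pd.DataFrame(mytuple)

# Passing the inner list yields one row per dict and one column per key.
df = pd.DataFrame(mytuple[0])
print(df)
```

If the real object is nested differently, printing type(mytuple) and len(mytuple) first should show which level holds the list of dicts.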

Related

How to transform list from database into dataframe?

I have the following problem. My database returns a list:
[Order(id=22617, frm=datetime.datetime(2020, 6, 1, 8, 0), to=datetime.datetime(2020, 6, 1, 10, 0), loc=Location(lat=14.491272455461, lng=50.130463596998), address='Makedonska 619/11, Praha', duration=600), datetime.datetime(2020, 6, 1, 11, 38, 46), Order(id=22615, frm=datetime.datetime(2020, 6, 1, 8, 0), to=datetime.datetime(2020, 6, 1, 14, 0), loc=Location(lat=14.681866313487, lng=50.007439571346), address='Výhledová 256, Říčany', duration=600), datetime.datetime(2020, 6, 1, 10, 33, 33)]
Every item returned by the database is either a routes_data_loading.data_structures.Order or a datetime.datetime. I would like to save the data as a pandas dataframe.
Desired output for the first row is:
id;frm;to;lat;lng;address;duration;time
22617;2020-06-01 08:00;2020-06-01 10:00;14.491272455461;50.130463596998;Makedonska 619/11, Praha;600;2020-06-01 11:38:46
A semicolon separates the columns. Note that the last column, time, has to be created, because its name is not in the original list.
Can you help me convert this list into a pandas df, please? I know how to convert a simple list into a df, but not one this complicated. Thanks a lot.

Try this, though I can't guarantee it will work:
import pandas as pd

data = []
# The list alternates Order, datetime, Order, datetime, ... so pair them up
for order, time in zip(lst[::2], lst[1::2]):
    data.append({'id': order.id, 'frm': order.frm, 'to': order.to,
                 'lat': order.loc.lat, 'lng': order.loc.lng,
                 'address': order.address, 'duration': order.duration,
                 'time': time})
df = pd.DataFrame(data)
Output:
>>> df
      id                 frm                  to        lat        lng                   address  duration                time
0  22617 2020-06-01 08:00:00 2020-06-01 10:00:00  14.491272  50.130464  Makedonska 619/11, Praha       600 2020-06-01 11:38:46
1  22615 2020-06-01 08:00:00 2020-06-01 14:00:00  14.681866  50.007440     Výhledová 256, Říčany       600 2020-06-01 10:33:33
The setup I used to reproduce this:
from collections import namedtuple
import datetime
Order = namedtuple('Order', ['id', 'frm', 'to', 'loc', 'address', 'duration'])
Location = namedtuple('Location', ['lat', 'lng'])
lst = [Order(id=22617, frm=datetime.datetime(2020, 6, 1, 8, 0), to=datetime.datetime(2020, 6, 1, 10, 0), loc=Location(lat=14.491272455461, lng=50.130463596998), address='Makedonska 619/11, Praha', duration=600),
datetime.datetime(2020, 6, 1, 11, 38, 46),
Order(id=22615, frm=datetime.datetime(2020, 6, 1, 8, 0), to=datetime.datetime(2020, 6, 1, 14, 0), loc=Location(lat=14.681866313487, lng=50.007439571346), address='Výhledová 256, Říčany', duration=600),
datetime.datetime(2020, 6, 1, 10, 33, 33)]

Converting string of lists of timestamps to lists of timestamps in pandas

I parsed data from s3 which is similar to this
ID departure
1 "[Timestamp('2021-05-25 09:00:00'), datetime.datetime(2021, 5, 25, 9, 21, 35, 769406)]"
2 "[Timestamp('2021-05-25 08:00:00'), datetime.datetime(2021, 5, 25, 11, 15), datetime.datetime(2021, 5, 25, 14, 15)]"
Is there any way to convert the departure column into lists?
I tried this:
samp['departure'] = samp['departure'].apply(lambda x: eval(x))
Error: eval() arg 1 must be a string, bytes or code object
and
samp['departure'] = samp['departure'].apply(lambda x: x[1:-1].split(','))
# Here datetime.datetime(2021, 5, 25, 11, 15) was split into many sub-parts
and
samp.departure = samp.departure.apply(ast.literal_eval)
Error: malformed node or string: ["Timestamp('2021-05-25 09:00:00')", ' datetime.datetime(2021', ' 5', ' 25', ' 9', ' 21', ' 35', ' 769406)']
Output should be
ID departure
1 [Timestamp('2021-05-25 09:00:00'), datetime.datetime(2021, 5, 25, 9, 21, 35, 769406)]
2 [Timestamp('2021-05-25 08:00:00'), datetime.datetime(2021, 5, 25, 11, 15), datetime.datetime(2021, 5, 25, 14, 15)]
(I initially tried converters in read_csv, but got an error too.)

If you are trying to remove the " characters in your departure column, try replace():
samp['departure'] = samp['departure'].replace('"', '', regex=True)
or strip():
samp['departure'] = samp['departure'].str.strip('"')
If you are evaluating the values inside:
from pandas import Timestamp
import datetime
samp['departure']=samp['departure'].astype(str).apply(pd.eval)
Note that ast.literal_eval cannot be used for this: it only accepts plain Python literals, so it rejects the Timestamp(...) and datetime.datetime(...) calls with a "malformed node or string" error.
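For data you trust, a minimal sketch of the evaluation route uses the built-in eval with an explicit namespace, so that only Timestamp and datetime can be resolved. The sample frame below just rebuilds the two rows from the question, and allowed_names and parse_departure are names made up for this example. Do not run eval on strings from an untrusted source.

```python
import datetime

import pandas as pd
from pandas import Timestamp

# Rebuild the two sample rows from the question as plain strings.
samp = pd.DataFrame({
    "ID": [1, 2],
    "departure": [
        "[Timestamp('2021-05-25 09:00:00'), datetime.datetime(2021, 5, 25, 9, 21, 35, 769406)]",
        "[Timestamp('2021-05-25 08:00:00'), datetime.datetime(2021, 5, 25, 11, 15), datetime.datetime(2021, 5, 25, 14, 15)]",
    ],
})

# Only these names are visible to the evaluated expression; builtins are hidden.
allowed_names = {"__builtins__": {}, "Timestamp": Timestamp, "datetime": datetime}


def parse_departure(text):
    """Turn one stringified list into a real list of timestamp objects."""
    # Strip stray surrounding double quotes left over from the CSV, if any.
    return eval(text.strip('"'), allowed_names)


samp["departure"] = samp["departure"].apply(parse_departure)
print(samp["departure"].iloc[1])
```

This only works while the column still holds the original strings; if an earlier attempt such as the split(',') one already overwrote it, reload the data first.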

change values of a csv file into an integer value of a dictionary

I have a .csv list of values (C3,H5,HK,HA,SK) (column names: card1, card2, card3, card4, card5) and want to change for example C3 into an integer value with a dictionary.
Let's say the dictionary is d = {'C2': 0, 'C3': 1, 'C4': 2, 'C5': 3, 'C6': 4, 'C7': 5, 'C8': 6, 'C9': 7, 'CT': 8, 'CJ': 9, 'CQ': 10, 'CK': 11, 'CA': 12, 'D2': 13, 'D3': 14, 'D4': 15, 'D5': 16, 'D6': 17, 'D7': 18, 'D8': 19, 'D9': 20, 'DT': 21, 'DJ': 22, 'DQ': 23, 'DK': 24, 'DA': 25, 'H2': 26, 'H3': 27, 'H4': 28, 'H5': 29, 'H6': 30, 'H7': 31, 'H8': 32, 'H9': 33, 'HT': 34, 'HJ': 35, 'HQ': 36, 'HK': 37, 'HA': 38, 'S2': 39, 'S3': 40, 'S4': 41, 'S5': 42, 'S6': 43, 'S7': 44, 'S8': 45, 'S9': 46, 'ST': 47, 'SJ': 48, 'SQ': 49, 'SK': 50, 'SA': 51}
this is the code I have to change the column of 'card1':
testdata = read_csv()
def convert_card_to_int(c):
    if c == '' or c == ' ':
        print('card slot is empty or a blank')
        return 0
    if c in d:
        return d.get(c)
    else:
        print('card is not part of cardDict')
        return 0

for index, rec in testdata.iterrows():
    testdata['card1'][index] = convert_card_to_int(testdata['card1'][index])
testdata['card1'] = testdata['card1'].astype(int)
I am new to Python and have not worked with dictionaries before. I searched some forums but did not find what I needed; maybe I was even asking the wrong questions.
The problem is that I want to check whether a value from the list is a key of the dictionary and, if it is, replace it with the corresponding integer value.
The problem occurs either in the second if statement or in the for loop beneath it. The error message is [TypeError: unhashable type 'dict'].
testinput (file.csv):
card1,card2,card3,card4,card5
C3,H5,HK,HA,SK
C9,HJ,ST,SQ,SA
S6,S7,S8,S9,ST
testoutput:
testdata.head()
idx  card1  card2  card3  card4  card5
0    1      H5     HK     HA     SK
1    7      HJ     ST     SQ     SA
2    43     S7     S8     S9     ST

I think you are looking for this:
df = pd.read_csv(filepath)
where d is your dictionary:
d = {'C2': 0, 'C3': 1, 'C4': 2, 'C5': 3, 'C6': 4, 'C7': 5, 'C8': 6....}
Use a single pair of braces here: d = {{...}} tries to build a set containing a dict, which raises TypeError: unhashable type: 'dict'.
To convert a specific column, e.g. card1, you can do this:
df['card1'] = df['card1'].apply(lambda x: d[x])
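As a rough alternative sketch, Series.map accepts the dictionary directly and leaves NaN for cards it cannot find, so the fallback to 0 from the question becomes a fillna(0). The dictionary is rebuilt here from suit and rank strings purely to keep the example short (an assumption that matches the 52 entries listed in the question), and the CSV text stands in for file.csv.

```python
import io
from itertools import product

import pandas as pd

# Rebuild the 52-entry dictionary: 'C2' -> 0, 'C3' -> 1, ..., 'SA' -> 51.
suits = "CDHS"
ranks = "23456789TJQKA"
d = {suit + rank: i for i, (suit, rank) in enumerate(product(suits, ranks))}

# Stand-in for file.csv from the question.
csv_text = """card1,card2,card3,card4,card5
C3,H5,HK,HA,SK
C9,HJ,ST,SQ,SA
S6,S7,S8,S9,ST
"""
testdata = pd.read_csv(io.StringIO(csv_text))

# Look up every value in d; unknown or blank cards become NaN, then 0.
testdata["card1"] = testdata["card1"].map(d).fillna(0).astype(int)
print(testdata.head())
```

To convert all five columns, the same line can be applied in a loop over the column names.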

How to retrieve the top-5 largest values (integers) from a dictionary of key-value pairs?

I have created 3 dictionaries from a function that iterates through a dataframe of the popular ios apps. The 3 dictionaries contain key-value pairs based on how often the key occur in the dataframe. From these dictionaries, I want to retrieve the 5 largest values of each dictionary as well as the corresponding keys. These are the results from the dataframe iterations. Obviously I can see this manually but I want python to determine the 5 largest.
Prices: {0.0: 415, 4.99: 10, 2.99: 13, 0.99: 31, 1.99: 13, 9.99: 1, 3.99: 2, 6.99: 3}
Genres: {'Productivity': 9, 'Shopping': 12, 'Reference': 3, 'Finance': 6, 'Music': 19, 'Games': 308, 'Travel': 6, 'Sports': 7, 'Health & Fitness': 8, 'Food & Drink': 4, 'Entertainment': 19, 'Photo & Video': 25, 'Social Networking': 21, 'Business': 4, 'Lifestyle': 4, 'Weather': 8, 'Navigation': 2, 'Book': 4, 'News': 2, 'Utilities': 12, 'Education': 5}
Content Ratings: {'4+': 304, '12+': 100, '9+': 54, '17+': 30}

You can sort the dictionaries by values and then slice the top 5:
sorted(Prices, key=Prices.get, reverse=True)[:5]
Same for the other two dicts.
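Note that the line above returns only the keys. If you want the keys together with their counts, a small sketch with heapq.nlargest from the standard library is one option (top5_prices is just an illustrative name):

```python
import heapq
from operator import itemgetter

prices = {0.0: 415, 4.99: 10, 2.99: 13, 0.99: 31, 1.99: 13, 9.99: 1, 3.99: 2, 6.99: 3}

# Take the 5 (key, value) pairs with the largest values, largest first.
top5_prices = heapq.nlargest(5, prices.items(), key=itemgetter(1))

for price, count in top5_prices:
    print(price, count)
```

For dictionaries this small the difference from sorting is negligible; nlargest mainly avoids sorting everything when only a few items are needed.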

You can also accomplish this using itemgetter.
from operator import itemgetter

prices = {0.0: 415, 4.99: 10, 2.99: 13, 0.99: 31, 1.99: 13, 9.99: 1, 3.99: 2, 6.99: 3}
genres = {'Productivity': 9, 'Shopping': 12, 'Reference': 3, 'Finance': 6, 'Music': 19, 'Games': 308, 'Travel': 6, 'Sports': 7, 'Health & Fitness': 8, 'Food & Drink': 4, 'Entertainment': 19, 'Photo & Video': 25, 'Social Networking': 21, 'Business': 4, 'Lifestyle': 4, 'Weather': 8, 'Navigation': 2, 'Book': 4, 'News': 2, 'Utilities': 12, 'Education': 5}
contentRatings = {'4+': 304, '12+': 100, '9+': 54, '17+': 30}
arr = [prices, contentRatings, genres]

for test_dict in arr:
    # printing original dictionary
    print("The original dictionary is : " + str(test_dict))
    # 5 largest values in dictionary
    # Using sorted() + itemgetter() + items()
    res = dict(sorted(test_dict.items(), key=itemgetter(1), reverse=True)[:5])
    # printing result
    print("The top 5 value pairs are " + str(res))

DataFrame.replace by nested dict

I have a huge DataFrame in which a lot of entries need to be changed. So I've created a translation dict, which has the following structure:
{'Data.Bar Layer': {'1': 0,
'1.E': 21,
'2': 13,
'2.E': 22,
'3': 14,
'3.E': 24,
'4': 15,
'4.E': 23,
'B': 16,
'CL1': 1,
'CL2': 2,
'CL2a': 6,
'CL3': 3,
'CL3a': 4,
'CL4': 5,
'E': 18,
'L1': 7,
'L2': 8,
'L2a': 12,
'L3': 9,
'L3a': 10,
'L4': 11,
'T': 17,
'T&B': 19,
'T+B': 20},
'Data.Bar Type': {'N': 0, 'R': 1},
'Data.Defined No. Bars': {'No': 0, 'Yes': 1},
'Data.Design Option': {'-1': 0, 'Main Model': 1},...}
The first key corresponds to the dataframe column and the second key to the value that needs to be changed, e.g. in column Data.Bar Layer all '1' should become 0. This is the nested dictionary format described in the pandas.DataFrame.replace documentation.
However, the same values have to be replaced multiple times, which (I guess) leads to the error:
Replacement not allowed with overlapping keys and values
Is there any workaround to avoid this error? I tried some approaches with apply and map, but unfortunately they didn't work.
Thanks in advance and kind regards,
Max

There might be a more Pythonic way, but this code works for me:
for col in your_dict.keys():
    df[col] = df[col].replace(your_dict[col])
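Another option, sketched here on a made-up three-row frame, is to do a plain dictionary lookup per column with Series.map, which translates each cell exactly once. The names translation and df below are placeholders for your own objects, and values missing from a column's mapping are left unchanged.

```python
import pandas as pd

# Made-up sample rows; the real frame is much larger.
df = pd.DataFrame({
    "Data.Bar Layer": ["1", "2.E", "T+B"],
    "Data.Bar Type": ["N", "R", "N"],
    "Data.Defined No. Bars": ["Yes", "No", "Yes"],
})

# A few entries from the translation dict in the question.
translation = {
    "Data.Bar Layer": {"1": 0, "2.E": 22, "T+B": 20},
    "Data.Bar Type": {"N": 0, "R": 1},
    "Data.Defined No. Bars": {"No": 0, "Yes": 1},
}

for col, mapping in translation.items():
    if col in df.columns:
        # Look each cell up once; keep the original value if it has no entry.
        df[col] = df[col].map(lambda value, m=mapping: m.get(value, value))

print(df)
```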
