How to take the first two letters of a variable in python? - python

I have a dataset like below:
data
id message
1 ffjffjf
2 ddjbgg
3 vvnvvv
4 eevvff
5 ggvfgg
Expected output:
data
id message splitmessage
1 ffjffjf ff
2 ddjbgg dd
3 vvnvvv vv
4 eevvff ee
5 ggvfgg gg
I am very new to Python. How can I take the first two letters of message in each row and store them in the splitmessage variable?
In my actual data I also want only the hours and minutes, which are characters 12 to 16 in each row of the vfreceiveddate variable.

dataset = [
{ "id": 1, "message": "ffjffjf" },
{ "id": 2, "message": "ddjbgg" },
{ "id": 3, "message": "vvnvvv" },
{ "id": 4, "message": "eevvff" },
{ "id": 5, "message": "ggvfgg" }
]
for d in dataset:
    d["splitmessage"] = d["message"][:2]

What you want is a substring.
mystr = str(id1)
splitmessage = mystr[:2]
This technique is called slicing in Python, and it works on more than strings.
In the next example, 'message' and 'splitmessage' are lists.
message = ['ffjffjf', 'ddjbgg']
splitmessage = []
for myString in message:
    splitmessage.append(myString[:2])
print(splitmessage)

If you want to do it for the entire table, I would do it like this:
dataset = [id1, id2, id3 ... ]
for id in dataset:
    splitid = id[:2]
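The same slicing idea also covers the hours-and-minutes part of the question. The following is a minimal sketch, assuming vfreceiveddate holds strings such as '2017-03-05 12:34:56'. The real format is not shown in the question, so the sample values and the names rows and hour_min are hypothetical. Characters 12 to 16, counting from 1, correspond to the slice [11:16].

```python
# Hypothetical rows; the timestamp format is an assumption, not taken from the question.
rows = [
    {"id": 1, "message": "ffjffjf", "vfreceiveddate": "2017-03-05 12:34:56"},
    {"id": 2, "message": "ddjbgg", "vfreceiveddate": "2017-03-05 08:07:09"},
]

for row in rows:
    # First two characters of the message.
    row["splitmessage"] = row["message"][:2]
    # Characters 12..16 (1-based) are indexes 11..15, so the slice is [11:16].
    row["hour_min"] = row["vfreceiveddate"][11:16]

print([(r["splitmessage"], r["hour_min"]) for r in rows])
```

If the timestamps have a different layout, only the slice bounds need to change.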

Related

How can I define a structure of a json to transform it to csv

I have a json structured as this:
{
"data": [
{
"groups": {
"data": [
{
"group_name": "Wedding planning - places and bands (and others) to recommend!",
"date_joined": "2009-03-12 01:01:08.677427"
},
{
"group_name": "Harry Potter and the Deathly Hollows",
"date_joined": "2009-01-15 01:38:06.822220"
},
{
"group_name": "Xbox , Playstation, Wii - console fans",
"date_joined": "2010-04-02 04:02:58.078934"
}
]
},
"id": "0"
},
{
"groups": {
"data": [
{
"group_name": "Lost&Found (Strzegom)",
"date_joined": "2010-02-01 14:13:34.551920"
},
{
"group_name": "Tennis, Squash, Badminton, table tennis - looking for sparring partner (Strzegom)",
"date_joined": "2008-09-24 17:29:43.356992"
}
]
},
"id": "1"
}
]
}
How does one parse JSON in this form? Should I try building a class resembling this format? My desired output is a CSV indexed by "id", where the first column holds the most recently joined group, the second column the second most recently joined group, and so on.
The result for this example would be:
    most recent                              second most recent
0   Xbox , Playstation, Wii - console fans   Wedding planning - places and bands (and others) to recommend!
1   Lost&Found (Strzegom)                    Tennis, Squash, Badminton, table tennis - looking for sparring partner (Strzegom)
A solution could look like this:
import json
import time
import pandas as pd

# f is an open file object for the JSON shown above
data = json.load(f)
result = []
# number of groups for each id; for this example [3, 2]
group_counts = [len(entry['groups']['data']) for entry in data['data']]
# every row gets as many group columns as the shortest list has; here 2
n_columns = min(group_counts)
for entry in data['data']:
    # sort this id's groups by date_joined, newest first
    sorted_groups_by_date = sorted(entry['groups']['data'],
                                   key=lambda x: time.strptime(x['date_joined'], '%Y-%m-%d %H:%M:%S.%f'),
                                   reverse=True)
    # keep the names of the n_columns most recent groups
    group_names = [g['group_name'] for g in sorted_groups_by_date[:n_columns]]
    # add the row to the result list, starting with the id
    result.append([entry['id']] + group_names)
# build a DataFrame from the rows; the column list assumes n_columns == 2
df = pd.DataFrame(result, columns=['id', 'most recent', 'second most recent'])
# This could be improved further.
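The answer above truncates every row to the length of the shortest group list and hard-codes two column names. As a rough alternative, here is a standard-library sketch that keeps all groups and pads shorter rows with empty cells. The group names are abbreviated, and the column names recent_1, recent_2, ... are made up for illustration.

```python
import csv
import io
import json
from datetime import datetime

# Shortened copy of the JSON from the question (group names abbreviated).
raw = """
{"data": [
  {"id": "0", "groups": {"data": [
    {"group_name": "Wedding planning", "date_joined": "2009-03-12 01:01:08.677427"},
    {"group_name": "Harry Potter", "date_joined": "2009-01-15 01:38:06.822220"},
    {"group_name": "Xbox", "date_joined": "2010-04-02 04:02:58.078934"}]}},
  {"id": "1", "groups": {"data": [
    {"group_name": "Lost&Found", "date_joined": "2010-02-01 14:13:34.551920"},
    {"group_name": "Tennis", "date_joined": "2008-09-24 17:29:43.356992"}]}}
]}
"""
data = json.loads(raw)


def joined(group):
    """Parse date_joined so groups can be sorted chronologically."""
    return datetime.strptime(group["date_joined"], "%Y-%m-%d %H:%M:%S.%f")


rows = []
for entry in data["data"]:
    # Newest group first.
    groups = sorted(entry["groups"]["data"], key=joined, reverse=True)
    rows.append([entry["id"]] + [g["group_name"] for g in groups])

# Pad shorter rows so every CSV line has the same number of cells.
width = max(len(row) for row in rows)
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["id"] + [f"recent_{i}" for i in range(1, width)])
for row in rows:
    writer.writerow(row + [""] * (width - len(row)))

csv_text = out.getvalue()
print(csv_text)
```

Writing to io.StringIO keeps the sketch self-contained; passing a file opened with newline="" to csv.writer would write the same content to disk.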

JSON/Python get output from subcategories

I'm new to JSON and Python, and I'd like to select part of a subcategory.
Here is a part of the JSON file:
{
"items": [{
"seasonId": 59,
"createdDate": "20200721T205735.000Z",
"participants": [{
"tag": "#8CJ89RJ",
"name": "Cåmille",
"cardsEarned": 1401,
"battlesPlayed": 1,
"wins": 1,
"collectionDayBattlesPlayed": 3,
"numberOfBattles": 1
}, {
"tag": "#Y2828CQ",
"name": "<c2>MoutBrout",
"cardsEarned": 1869,
"battlesPlayed": 1,
"wins": 1,
"collectionDayBattlesPlayed": 3,
"numberOfBattles": 1
}, {
"tag": "#2Q8CRC8RY",
"name": "Desnoss",
"cardsEarned": 2337,
"battlesPlayed": 1,
"wins": 0,
"collectionDayBattlesPlayed": 3,
"numberOfBattles": 1
}, {
"tag": "#80CGRR2CY",
"name": "pixtango",
"cardsEarned": 1402,
"battlesPlayed": 1,
"wins": 1,
"collectionDayBattlesPlayed": 2,
"numberOfBattles": 1
}]
}]
}
I would like a result as:
Camille - 1401 cards - 1 win
etc
However, my issue is that this information sits under items/0/participants.
I know how to do this with data under a single category. Here is an example for another JSON file, which is how I'd like the new one to work:
for item in data["items"][:5]:
    print("Name: %s\nTrophies: %s\nTag: %s\n\n" % (
        item["name"],
        item["trophies"],
        item["tag"],
    ))
Any ideas, please?
EDIT: For example, I would like to print the first 5 names. I tried this:
for item in data ["items"][:5]:
    print (data[items][0][participants]['name'])
And I received this error:
NameError: name 'items' is not defined
Maybe you need something like this:
items_str = [f'Name: {i["name"]}\nTrophies: {i["trophies"]}\nTag: {i["tag"]}'
             for i in json_dict['items']]
for i in items_str:
    print(i)
Sorry, it's not easy to tell from your data.
UPD: If there are many 'items' with 'participants' in each, this code should work for you:
participants = []
for item in json_dict['items']:
    for participant in item['participants']:
        # read from the participant, not the item; these keys exist in the JSON above
        p = 'Name: {}\nCards: {}\nWins: {}'.format(participant["name"], participant["cardsEarned"], participant["wins"])
        participants.append(p)
        print(p)
# 'participants' is a list, so pick one participant by index
sub_dict = data['items'][0]['participants'][0]
print("Name: {}\nCards: {}\nTag: {}\n\n".format(sub_dict['name'], sub_dict['cardsEarned'], sub_dict['tag']))
For other participants, increase the last index.
This reads the JSON data from the file and prints it.
import json
with open('j.json', 'r') as f:
    data = json.load(f)
for item in data['items']:
    for p in item['participants']:
        print(p)
        # the participants above have no "trophies" key, so print the cards instead
        print("Name: %s\nCards: %s\nTag: %s\n\n" % (
            p["name"],
            p["cardsEarned"],
            p["tag"]))
I think this may help:
import json
import pandas as pd
j='{"items":[{"seasonId":59,"createdDate":"20200721T205735.000Z","participants":[{"tag":"#8CJ89RJ","name":"Cåmille","cardsEarned":1401,"battlesPlayed":1,"wins":1,"collectionDayBattlesPlayed":3,"numberOfBattles":1},{"tag":"#Y2828CQ","name":"<c2>MoutBrout","cardsEarned":1869,"battlesPlayed":1,"wins":1,"collectionDayBattlesPlayed":3,"numberOfBattles":1},{"tag":"#2Q8CRC8RY","name":"Desnoss","cardsEarned":2337,"battlesPlayed":1,"wins":0,"collectionDayBattlesPlayed":3,"numberOfBattles":1},{"tag":"#80CGRR2CY","name":"pixtango","cardsEarned":1402,"battlesPlayed":1,"wins":1,"collectionDayBattlesPlayed":2,"numberOfBattles":1}]}]}'
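y = json.loads(j)
# a list of dicts becomes one row per participant
y = pd.DataFrame(y['items'][0]['participants'])
print(y)
The output is a table with one row per participant and one column per key (tag, name, cardsEarned, and so on).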
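The NameError in the question comes from writing items and participants without quotes, so Python looks them up as variables instead of using them as dictionary keys. The sketch below, which works on a shortened copy of the JSON, shows one way to walk every participant of every item and print the first five in the "name - cards - wins" form the question asks for. The helper name iter_participants is made up for this example.

```python
import json
from itertools import islice

# Shortened copy of the JSON from the question (three participants kept).
raw = """
{"items": [{"seasonId": 59, "participants": [
  {"tag": "#8CJ89RJ", "name": "Cåmille", "cardsEarned": 1401, "wins": 1},
  {"tag": "#2Q8CRC8RY", "name": "Desnoss", "cardsEarned": 2337, "wins": 0},
  {"tag": "#80CGRR2CY", "name": "pixtango", "cardsEarned": 1402, "wins": 1}
]}]}
"""
data = json.loads(raw)


def iter_participants(payload):
    """Yield every participant dict from every item (hypothetical helper)."""
    for item in payload["items"]:
        # Keys are strings, so they must be quoted.
        yield from item["participants"]


# islice stops after five participants, or earlier if there are fewer.
lines = [
    "{} - {} cards - {} win(s)".format(p["name"], p["cardsEarned"], p["wins"])
    for p in islice(iter_participants(data), 5)
]
print("\n".join(lines))
```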

How can I convert nested dictionary to pd.dataframe faster?

I have a json file which looks like this
{
"file": "name",
"main": [{
"question_no": "Q.1",
"question": "what is ?",
"answer": [{
"user": "John",
"comment": "It is defined as",
"value": [
{
"my_value": 5,
"value_2": 10
},
{
"my_value": 24,
"value_2": 30
}
]
},
{
"user": "Sam",
"comment": "as John said above it simply means",
"value": [
{
"my_value": 9,
"value_2": 10
},
{
"my_value": 54,
"value_2": 19
}
]
}
],
"closed": "no"
}]
}
Desired result:
Question_no  question   my_value_sum  value_2_sum  user  comment
Q.1          what is ?  29            40           John  It is defined as
Q.1          what is ?  63            29           Sam   as John said above it simply means
What I have tried is data = json_normalize(file_json, "main"), followed by a for loop like
for ans, row in data.iterrows():
    ....
    ....
    df = df.append(the data)
The problem is that this takes so long that my client would reject the solution. There are around 1200 items in the main list and 450 JSON files like this to convert, so this intermediate conversion step would take almost an hour to complete.
EDIT:
Is it possible to get the sums of my_value and value_2 as columns? (I also updated the desired result.)
Select the list under "main" and pass the record_path and meta parameters:
data = pd.json_normalize(file_json["main"],
                         record_path='answer',
                         meta=['question_no', 'question'])
print(data)
   user                             comment question_no   question
0  John                    It is defined as         Q.1  what is ?
1   Sam  as John said above it simply means         Q.1  what is ?
Then, if the column order matters, move the last N columns to the front:
N = 2
data = data[data.columns[-N:].tolist() + data.columns[:-N].tolist()]
print(data)
  question_no   question  user                             comment
0         Q.1  what is ?  John                    It is defined as
1         Q.1  what is ?   Sam  as John said above it simply means
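The answer above does not cover the my_value_sum and value_2_sum columns from the EDIT. One possible approach, sketched here with plain Python on a copy of the sample data, is to build a list of row dicts in a single pass and compute the sums while doing so. The name rows is an assumption made for this example.

```python
# Copy of the sample JSON from the question, as a Python dict.
file_json = {
    "file": "name",
    "main": [{
        "question_no": "Q.1",
        "question": "what is ?",
        "answer": [
            {"user": "John", "comment": "It is defined as",
             "value": [{"my_value": 5, "value_2": 10},
                       {"my_value": 24, "value_2": 30}]},
            {"user": "Sam", "comment": "as John said above it simply means",
             "value": [{"my_value": 9, "value_2": 10},
                       {"my_value": 54, "value_2": 19}]},
        ],
        "closed": "no",
    }],
}

rows = []
for question in file_json["main"]:
    for answer in question["answer"]:
        # One output row per answer, with the nested values summed.
        rows.append({
            "question_no": question["question_no"],
            "question": question["question"],
            "my_value_sum": sum(v["my_value"] for v in answer["value"]),
            "value_2_sum": sum(v["value_2"] for v in answer["value"]),
            "user": answer["user"],
            "comment": answer["comment"],
        })

for row in rows:
    print(row)
```

If pandas is available, pd.DataFrame(rows) should turn the list into a frame in one call. Collecting rows first avoids calling df.append inside the loop, which copies the whole frame on every call.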

Write json format using pandas Series and DataFrame

I'm working with CSV files. My goal is to write JSON from the information in a CSV file. Specifically, I want a format similar to miserables.json
Example:
{"source": "Napoleon", "target": "Myriel", "value": 1},
Based on the information I have, the format would be:
[
{
"source": "Germany",
"target": "Mexico",
"value": 1
},
{
"source": "Germany",
"target": "USA",
"value": 2
},
{
"source": "Brazil",
"target": "Argentina",
"value": 3
}
]
However, with the code I used, the output looks as follows:
[
{
"source": "Germany",
"target": "Mexico",
"value": 1
},
{
"source": null,
"target": "USA",
"value": 2
}
][
{
"source": "Brazil",
"target": "Argentina",
"value": 3
}
]
The null source should be Germany. This is one of the main problems, because more entries have the same issue. Apart from this, the information is correct. I just want to merge the separate lists into one and replace null with the correct country.
This is the code I used, with pandas and collections:
csvdata = pandas.read_csv('file.csv', low_memory=False, encoding='latin-1')
countries = csvdata['country'].tolist()
newcountries = list(set(countries))
for element in newcountries:
    bills = csvdata['target'][csvdata['country'] == element]
    frquency = Counter(bills)
    sourceTemp = []
    value = []
    country = element
    for k,v in frquency.items():
        sourceTemp.append(k)
        value.append(int(v))
    forceData = {'source': Series(country), 'target': Series(sourceTemp), 'value': Series(value)}
    dfForce = DataFrame(forceData)
    jsondata = dfForce.to_json(orient='records', force_ascii=False, default_handler=callable)
    parsed = json.loads(jsondata)
    newData = json.dumps(parsed, indent=4, ensure_ascii=False, sort_keys=True)
    # since to_json doesn't have an append mode, this is written to a txt file
    savetxt = open('data.txt', 'a')
    savetxt.write(newData)
    savetxt.close()
Any suggestions to solve this problem are appreciated!
Thanks
Consider removing the Series() around the scalar value, country. Wrapping the scalar in a one-element Series and then combining the dictionary of series into a dataframe forces NaN (later converted to null in JSON) into that column so that it matches the length of the other series. You can see this by printing the dfForce dataframe:
from pandas import Series
from pandas import DataFrame
country = 'Germany'
sourceTemp = ['Mexico', 'USA', 'Argentina']
value = [1, 2, 3]
forceData = {'source': Series(country),
             'target': Series(sourceTemp),
             'value': Series(value)}
dfForce = DataFrame(forceData)
# source target value
# 0 Germany Mexico 1
# 1 NaN USA 2
# 2 NaN Argentina 3
To resolve this, simply keep country as a scalar in the dictionary of series:
forceData = {'source': country,
             'target': Series(sourceTemp),
             'value': Series(value)}
dfForce = DataFrame(forceData)
# source target value
# 0 Germany Mexico 1
# 1 Germany USA 2
# 2 Germany Argentina 3
By the way, you do not need a dataframe object to output JSON. Simply use a list of dictionaries. Consider the following, which uses an OrderedDict (to maintain the order of keys). This way the whole list is dumped to the file in one write instead of being appended piece by piece. Appending produces invalid JSON, because adjacent opposite-facing square brackets ...][... are not allowed.
from collections import OrderedDict
...
data = []
for element in newcountries:
    bills = csvdata['target'][csvdata['country'] == element]
    frquency = Counter(bills)
    for k,v in frquency.items():
        inner = OrderedDict()
        inner['source'] = element
        inner['target'] = k
        inner['value'] = int(v)
        data.append(inner)
newData = json.dumps(data, indent=4)
with open('data.json', 'w') as savetxt:
    savetxt.write(newData)
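The per-country loop can also be collapsed by counting (country, target) pairs directly. The following is a small sketch that uses only the standard library. The records list is a hypothetical stand-in for the rows of file.csv, and links and pair_counts are names chosen for this example.

```python
import json
from collections import Counter

# Hypothetical stand-in for the rows read from file.csv.
records = [
    {"country": "Germany", "target": "Mexico"},
    {"country": "Germany", "target": "USA"},
    {"country": "Germany", "target": "USA"},
    {"country": "Brazil", "target": "Argentina"},
]

# Count each (source, target) pair in one pass.
pair_counts = Counter((r["country"], r["target"]) for r in records)

# One flat list of link dicts, sorted for a stable order.
links = [
    {"source": source, "target": target, "value": count}
    for (source, target), count in sorted(pair_counts.items())
]

# A single dumps call yields one valid JSON array.
json_text = json.dumps(links, indent=4, ensure_ascii=False)
print(json_text)
```

Because every link dict carries its own source, no value is ever missing, and the single list avoids the ][ problem described above.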

Dynamic approach to iterate nested dict and list of dict in Python

I am looking for a dynamic approach to solve my issue. I have a very complex structure, but for simplicity,
I have a dictionary structure like this:
dict1={
"outer_key1" : {
"total" : 5 #1.I want the value of "total"
},
"outer_key2" :
[{
"type": "ABC", #2. I want to count whole structure where type="ABC"
"comments": {
"nested_comment":[
{
"key":"value",
"id": 1
},
{
"key":"value",
"id": 2
}
] # 3. Count Dict inside this list.
}}]}
I want to iterate this dictionary and solve #1, #2 and #3.
My attempt to solve #1 and #3:
def getTotal(dict1):
    # for solving #1
    for key, val in dict1.items():
        if isinstance(val, dict):
            for k1 in val:
                if k1 == 'total':
                    total = val[k1]
                    print(total)  # gives output 5
        # for solving #3
        if isinstance(val, list):
            print(len(val[0]['comments']['nested_comment']))  # gives output 2
            # How can I get this dynamically?
Output:
total=5
2
Que 1: What is a Pythonic way to get the total number of dictionaries under the "nested_comment" list?
Que 2: How can I get the total count of type where type="ABC"? (Note: type is a nested key under "outer_key2")
Que 1: What is a Pythonic way to get the total number of dictionaries under the "nested_comment" list?
Use Counter from the standard library.
from collections import Counter
my_list = [{'hello': 'world'}, {'foo': 'bar'}, 1, 2, 'hello']
# dicts are unhashable, so count the element types rather than the elements themselves
dict_count = Counter(type(x) for x in my_list)[dict]
Que 2: How can I get the total count of type where type="ABC"? (Note: type is a nested key under "outer_key2")
It's not clear what you're asking for here. If by "total count" you mean the total number of comments in all dicts where "type" equals "ABC":
abcs = [x for x in dict1['outer_key2'] if x['type'] == 'ABC']
comment_count = sum([len(x['comments']['nested_comment']) for x in abcs])
But I've gotta say, that is some weird data you're dealing with.
You got answers for #1 and #3, check this too
from collections import Counter
dict1={
"outer_key1" : {
"total" : 5 #1.I want the value of "total"
},
"outer_key2" :
[{
"type": "ABC", #2. I want to count whole structure where type="ABC"
"comments": {
"nested_comment":[
{
"key":"value",
"id": 1
},
{
"key":"value",
"id": 2
}
] # 3. Count Dict inside this list.
}}]}
print("total:", dict1['outer_key1']['total'])
print("No of nested comments:", len(dict1['outer_key2'][0]['comments']['nested_comment']))
Assuming the data structure for outer_key2 is as below, this is how you get the total number of dictionaries with type='ABC':
dict2={
"outer_key1" : {
"total" : 5
},
"outer_key2" :
[{
"type": "ABC",
"comments": {'...'}
},
{
"type": "ABC",
"comments": {'...'}
},
{
"type": "ABC",
"comments": {'...'}
}]}
i = 0
for entry in dict2['outer_key2']:
    # count every entry whose type is 'ABC'
    if entry['type'] == 'ABC':
        i += 1
print("\r\nNo of dictionaries with type = 'ABC' :", i)
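The answers above index into fixed positions such as ['outer_key2'][0]. For the "dynamic" part of the question, one option is a small recursive generator that visits every dict at any depth, so the three counts no longer depend on where the keys sit. This is only a sketch, and the helper name walk is made up for this example.

```python
def walk(node):
    """Yield every dict found anywhere inside nested dicts and lists."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from walk(value)
    elif isinstance(node, list):
        for item in node:
            yield from walk(item)


dict1 = {
    "outer_key1": {"total": 5},
    "outer_key2": [{
        "type": "ABC",
        "comments": {
            "nested_comment": [
                {"key": "value", "id": 1},
                {"key": "value", "id": 2},
            ]
        },
    }],
}

# 1: every value stored under a "total" key, at any depth.
totals = [d["total"] for d in walk(dict1) if "total" in d]
# 2: number of dicts whose "type" is "ABC".
abc_count = sum(1 for d in walk(dict1) if d.get("type") == "ABC")
# 3: number of entries across all "nested_comment" lists.
nested_count = sum(
    len(d["nested_comment"]) for d in walk(dict1)
    if isinstance(d.get("nested_comment"), list)
)
print(totals, abc_count, nested_count)
```

Each query walks the whole structure once, which is fine for small inputs. For very large data, the three checks could be combined into a single pass.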
