Python Sorting Contents of txt file

I have a function that opens a file called: "table1.txt" and outputs the comma separated values into a certain format.
My function is:
def sort_and_format():
    contents = []
    with open('table1.txt', 'r+') as f:
        for line in f:
            contents.append(line.split(','))
    max_name_length = max([len(line[0]) for line in contents])
    print(" Team Points Diff Goals \n")
    print("--------------------------------------------------------------------------\n")
    for i, line in enumerate(contents):
        line = [el.replace('\n', '') for el in line]
        print("{i:3} {0:{fill_width}} {1:3} {x:3} {2:3} :{3:3}".format(i=i+1, *line,
              x = (int(line[2])- int(line[3])), fill_width=max_name_length))
I figured out how to format it correctly, so for a "table1.txt" file of:
FC Ingolstadt 04, 13, 4, 6
Hamburg, 9, 8, 10
SV Darmstadt 98, 9, 8, 9
Mainz, 9, 6, 9
FC Augsburg, 4, 7, 12
Werder Bremen, 6, 7, 12
Borussia Moenchengladbach, 6, 9, 15
Hoffenheim, 5, 8, 12
VfB Stuttgart, 4, 9, 17
Schalke 04, 16, 14, 3
Hannover 96, 2, 6, 18
Borrusia Dortmund, 16, 15, 4
Bayern Munich, 18, 18, 2
Bayer Leverkusen, 14, 11, 8
Eintracht Frankfurt, 9, 13, 9
Hertha BSC Berlin, 14, 5, 4
1. FC Cologne, 13, 10, 10
VfB Wolfsburg, 14, 10, 6
It would output:
Team Points Diff Goals
--------------------------------------------------------------------------
1 FC Ingolstadt 04 13 -2 4 : 6
2 Hamburg 9 -2 8 : 10
3 SV Darmstadt 98 9 -1 8 : 9
4 Mainz 9 -3 6 : 9
5 FC Augsburg 4 -5 7 : 12
6 Werder Bremen 6 -5 7 : 12
7 Borussia Moenchengladbach 6 -6 9 : 15
8 Hoffenheim 5 -4 8 : 12
9 VfB Stuttgart 4 -8 9 : 17
10 Schalke 04 16 11 14 : 3
11 Hannover 96 2 -12 6 : 18
12 Borrusia Dortmund 16 11 15 : 4
13 Bayern Munich 18 16 18 : 2
14 Bayer Leverkusen 14 3 11 : 8
15 Eintracht Frankfurt 9 4 13 : 9
16 Hertha BSC Berlin 14 1 5 : 4
17 1. FC Cologne 13 0 10 : 10
18 VfB Wolfsburg 14 4 10 : 6
I am trying to figure out how to sort the file so that the team with the highest points is ranked number 1. If teams have equal points, they should be ranked by diff (goals for minus goals against), and if the diff is also equal, by goals scored.
I thought of implementing a bubble sort function similar to:
def bubble_sort(lst):
    j = len(lst)
    made_swap = True
    swaps = 0
    while made_swap:
        made_swap = False
        for cnt in range(j-1):
            if lst[cnt] < lst[cnt+1]:
                lst[cnt], lst[cnt+1] = lst[cnt+1], lst[cnt]
                made_swap = True
                swaps = swaps + 1
    return swaps
But I do not know how to isolate each line and compare the values of each to one another to sort.

The following code will sort the list in the ways you asked:
from operator import itemgetter

def sort_and_format():
    contents = []
    with open('table1.txt', 'r+') as f:
        for line in f:
            l = line.split(',')
            # convert points, goals for and goals against to int
            l[1:] = map(int, l[1:])
            contents.append(l)
    # sort by the least important key first; reverse=True puts the highest values first
    contents.sort(key=itemgetter(2), reverse=True)
    contents.sort(key=lambda team: team[2] - team[3], reverse=True)
    contents.sort(key=itemgetter(1), reverse=True)
    [printing and formatting code]
What this does differently:
First of all, it converts all the data about each team to numbers, excluding the name. This allows the later code to do math on them.
Then the first contents.sort call sorts the list by goals scored (index 2). operator.itemgetter(2) is just a faster way to say lambda l: l[2]. The next contents.sort call stably sorts the list by goals for minus goals against, as that is what the lambda computes. Stable sorting means that the order of elements that compare equal does not change, so teams with equal goal difference remain sorted by goals scored. The third contents.sort call does the same stable sort by points. Each call passes reverse=True so that the team with the most points ends up at rank 1; reverse=True keeps the sort stable.
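To see the stable multi-pass idea on a few rows from the question, here is a small sketch. It assumes the rows are already parsed into [name, points, goals_for, goals_against] and sorts in descending order so the best team comes first. The variable names are illustrative and not part of the original answer. It also checks that three stable passes give the same order as one sort on a tuple key.

```python
from operator import itemgetter

# A few rows from table1.txt, already parsed: [name, points, goals_for, goals_against]
rows = [
    ["Hamburg", 9, 8, 10],
    ["SV Darmstadt 98", 9, 8, 9],
    ["Schalke 04", 16, 14, 3],
    ["Borrusia Dortmund", 16, 15, 4],
    ["Bayern Munich", 18, 18, 2],
    ["Eintracht Frankfurt", 9, 13, 9],
]

# Multi-pass: sort by the least important key first, the most important key last.
# list.sort is stable, so earlier orderings survive among ties in later passes.
multi_pass = list(rows)
multi_pass.sort(key=itemgetter(2), reverse=True)            # goals scored
multi_pass.sort(key=lambda t: t[2] - t[3], reverse=True)    # goal difference
multi_pass.sort(key=itemgetter(1), reverse=True)            # points

# Single pass: a tuple key compares points, then difference, then goals scored.
single_pass = sorted(rows, key=lambda t: (t[1], t[2] - t[3], t[2]), reverse=True)

names = [t[0] for t in multi_pass]
print(names)
print(multi_pass == single_pass)
```

Both approaches should agree; the tuple key is usually easier to read, while the multi-pass form is handy when each key needs a different sort direction.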

Read the file into a list of rows:
contents = [row.strip('\n').split(', ') for row in open('table1.txt', 'r+')]
so that your rows look like:
['FC Ingolstadt 04', '13', '4', '6']
Then you can use Python's built-in sort function:
table = sorted(contents, key=lambda r: (int(r[1]), int(r[2])-int(r[3]), int(r[2])), reverse=True)
and print 'table' with the specific formatting you want.
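A minimal sketch of the whole pipeline (parse, sort, print) might look like the following. It uses a few inline sample lines instead of reading table1.txt, and the parse helper and the column layout are assumptions made for illustration, not the asker's exact format.

```python
# Sample lines standing in for table1.txt (a subset of the question's data)
lines = [
    "FC Ingolstadt 04, 13, 4, 6\n",
    "Schalke 04, 16, 14, 3\n",
    "Borrusia Dortmund, 16, 15, 4\n",
    "1. FC Cologne, 13, 10, 10\n",
]

def parse(line):
    """Split one comma-separated line into (name, points, goals_for, goals_against)."""
    name, points, goals_for, goals_against = [part.strip() for part in line.split(",")]
    return name, int(points), int(goals_for), int(goals_against)

teams = [parse(line) for line in lines]

# Rank by points, then goal difference, then goals scored, best first.
table = sorted(teams, key=lambda t: (t[1], t[2] - t[3], t[2]), reverse=True)

# Pad the name column to the longest team name.
width = max(len(t[0]) for t in table)
formatted = [
    "{:3} {:{w}} {:3} {:3} {:3}:{:<3}".format(rank, name, pts, gf - ga, gf, ga, w=width)
    for rank, (name, pts, gf, ga) in enumerate(table, start=1)
]
print("\n".join(formatted))
```

Converting to int once while parsing keeps the sort key short and avoids repeated int() calls.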

I have joined the words in the first column with _ to make life easier, so the data looks like:
FC_Ingolstadt_04 13 -2 4:6
Hamburg 9 -2 8:10
SV_Darmstadt_98 9 -1 8:9
Mainz 9 -3 6:9
FC_Augsburg 4 -5 7:12
Werder_Bremen 6 -5 7:12
Borussia_Moenchengladbach 6 -6 9:15
Hoffenheim 5 -4 8:12
VfB_Stuttgart 4 -8 9:17
Schalke_04 16 11 14:3
Hannover_96 2 -12 6:18
Borrusia_Dortmund 16 11 15:4
Bayern_Munich 18 16 18:2
Bayer_Leverkusen 14 3 11:8
Eintracht_Frankfurt 9 4 13:9
Hertha_BSC_Berlin 14 1 5:4
1._FC_Cologne 13 0 10:10
VfB_Wolfsburg 14 4 10:6
all_lines = []
with open('data', 'r') as f:
    for line in f:
        li = line.split()
        all_lines.append(li)
# sort by points, then by goal difference, highest first
l = sorted(all_lines, key=lambda x: (int(x[1]), int(x[2])), reverse=True)
for el in l:
    print(el)
Output:
['Bayern_Munich', '18', '16', '18:2']
['Schalke_04', '16', '11', '14:3']
['Borrusia_Dortmund', '16', '11', '15:4']
['VfB_Wolfsburg', '14', '4', '10:6']
['Bayer_Leverkusen', '14', '3', '11:8']
['Hertha_BSC_Berlin', '14', '1', '5:4']
['1._FC_Cologne', '13', '0', '10:10']
['FC_Ingolstadt_04', '13', '-2', '4:6']
['Eintracht_Frankfurt', '9', '4', '13:9']
['SV_Darmstadt_98', '9', '-1', '8:9']
['Hamburg', '9', '-2', '8:10']
['Mainz', '9', '-3', '6:9']
['Werder_Bremen', '6', '-5', '7:12']
['Borussia_Moenchengladbach', '6', '-6', '9:15']
['Hoffenheim', '5', '-4', '8:12']
['FC_Augsburg', '4', '-5', '7:12']
['VfB_Stuttgart', '4', '-8', '9:17']
['Hannover_96', '2', '-12', '6:18']

Related

Is it possible to check a string comparing two regex then adding it to a dictionary?

Question
How can I run through the string so that, when the locationRegex condition is met, its match is added to a dictionary, any subsequent numbers matched by numberRegex are added to the same entry, and a new entry is created when the next location arrives? The desired output is shown below.
Code
import re
# Text to check
text = "Italy Roma 20 40 10 4902520 10290" \
"Italy Milan 20 10 49 20 1030" \
"Germany Berlin 20 10 10 10 29 490" \
"Germany Frankfurt 20 0 0 0 0" \
"Luxemburg Luxemburg 20 10 49"
# regex to find location
locationRegex = re.compile(r'[A-Z]\w+\s[A-Z]\w+')
# regex to find numbers
numberRegex = re.compile(r'[0-9]+')
# Desired output
locations = {'Italy Roma': {'numbers': [10, 40, 10, 4902520]},
'Italy Milan': {'numbers': [20, 10, 49, 20, 1030]}}
What I have tried
I have run the regex against the string with re.findall; however, I cannot assign the numbers to the locations, because the locations and numbers end up in two separate lists.

Use a single regex to split the text into chunks, use groups within the regex to separate the data (note the parentheses), and finally use split to split the number string on the spaces:
import re

text = (
    "Italy Roma 20 40 10 4902520 10290"
    "Italy Milan 20 10 49 20 1030"
    "Germany Berlin 20 10 10 10 29 490"
    "Germany Frankfurt 20 0 0 0 0"
    "Luxemburg Luxemburg 20 10 49"
)
line_regex = re.compile(r"([A-Z]\w+\s[A-Z]\w+) ([0-9 ]+)")
loc_dict = {}
for match in re.finditer(line_regex, text):
    print(match.group(1))
    print(match.group(2))
    loc_dict[match.group(1)] = {"numbers": match.group(2).split(" ")}
print(loc_dict)
The dict will be:
{'Italy Roma': {'numbers': ['20', '40', '10', '4902520', '10290']},
'Italy Milan': {'numbers': ['20', '10', '49', '20', '1030']},
'Germany Berlin': {'numbers': ['20', '10', '10', '10', '29', '490']},
'Germany Frankfurt': {'numbers': ['20', '0', '0', '0', '0']},
'Luxemburg Luxemburg': {'numbers': ['20', '10', '49']}}
Note that you should check for edge cases: no numbers, cities with a space in the name and so on.
Cheers!
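The desired output in the question holds integers, while the dict above holds strings. A hedged sketch of one way to get integers follows. It assumes one entry per line (the question's string literal has no separators between entries) and location names made of exactly two capitalized words; both are assumptions, and names such as "New York City" would need a different pattern.

```python
import re

# Assumption: one entry per line, unlike the concatenated literal in the question.
text = (
    "Italy Roma 20 40 10 4902520 10290\n"
    "Italy Milan 20 10 49 20 1030\n"
    "Germany Frankfurt 20 0 0 0 0\n"
)

# Group 1: two capitalized words. Group 2: zero or more space-separated integers.
line_regex = re.compile(r"^([A-Z]\w+ [A-Z]\w+)((?: +[0-9]+)*) *$", re.MULTILINE)

locations = {}
for match in line_regex.finditer(text):
    name, number_part = match.groups()
    # split() with no argument ignores repeated or leading whitespace
    locations[name] = {"numbers": [int(n) for n in number_part.split()]}

print(locations)
```

Because group 2 may be empty, a location with no numbers simply gets an empty list instead of being skipped.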

Use regex to find and merge words in a string Python

I'm trying to find a way to match and merge the team name from strings like the ones below. I've tried a few different ways with regex but was unsuccessful. A few examples:
'30 Detroit Red Wings 12 47:06 3 8 1 3 7 0.292'
'31 Los Angeles Kings 11 47:45 4 7 0 4 8'
'24 Anaheim Ducks 12 47:49 7 5 0 7 14 0.583'
I want the output to look like this:
[30, 'Detroit Red Wings', 12, 47:06, 3, 8, 1, 3, 7, 0.292]
[24, 'Anaheim Ducks', 12, 47:49, 7, 5, 0, 7, 14, 0.583]
Here is what I tried with regex but with no success:
pattern = re.compile(r'\b\w+\b')
matches = pattern.finditer(i)

Here is an option using re.findall:
import re

inp = '30 Detroit Red Wings 12 47:06 3 8 1 3 7 0.292'
matches = re.findall(r'\d+:\d+|\d+(?:\.\d+)?|[A-Za-z]+(?: [A-Za-z]+)*', inp)
print(matches)
This prints:
['30', 'Detroit Red Wings', '12', '47:06', '3', '8', '1', '3', '7', '0.292']
The regex pattern used matches either a time string, an integer/floating point number, or a series of letter-only words:
\d+:\d+                    match a time string (e.g. '47:06')
|                          or
\d+(?:\.\d+)?              match an integer/floating point number
|                          or
[A-Za-z]+(?: [A-Za-z]+)*   match a series of words (e.g. Detroit Red Wings)
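re.findall returns every token as a string, whereas the desired output in the question mixes integers, floats and text. A minimal sketch of a conversion step is shown below; the convert helper is hypothetical and keeps time strings such as '47:06' as text, which is an assumption about what the asker wants.

```python
import re

TOKEN_RE = re.compile(r'\d+:\d+|\d+(?:\.\d+)?|[A-Za-z]+(?: [A-Za-z]+)*')

def convert(token):
    """Turn a matched token into int or float where possible; keep times and names as str."""
    if ':' in token:
        return token          # time string such as '47:06'
    try:
        return int(token)
    except ValueError:
        pass
    try:
        return float(token)
    except ValueError:
        return token          # team name

inp = '30 Detroit Red Wings 12 47:06 3 8 1 3 7 0.292'
row = [convert(tok) for tok in TOKEN_RE.findall(inp)]
print(row)
```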

How to Read a Text File of Dictionaries into a DataFrame

I have a text file from Kaggle of Clash Royale stats. It is formatted as Python dictionaries, one per line. I am struggling to read it into a DataFrame in a meaningful way and am curious what the best approach is. Each record is a fairly complex dict containing nested lists.
Original Dataset here:
https://www.kaggle.com/s1m0n38/clash-royale-matches-dataset
{'players': {'right': {'deck': [['Mega Minion', '9'], ['Electro Wizard', '3'], ['Arrows', '11'], ['Lightning', '5'], ['Tombstone', '9'], ['The Log', '2'], ['Giant', '9'], ['Bowler', '5']], 'trophy': '4258', 'clan': 'TwoFiveOne', 'name': 'gpa raid'}, 'left': {'deck': [['Fireball', '9'], ['Archers', '12'], ['Goblins', '12'], ['Minions', '11'], ['Bomber', '12'], ['The Log', '2'], ['Barbarians', '12'], ['Royal Giant', '13']], 'trophy': '4325', 'clan': 'battusai', 'name': 'Supr4'}}, 'type': 'ladder', 'result': ['2', '0'], 'time': '2017-07-12'}
{'players': {'right': {'deck': [['Ice Spirit', '10'], ['Valkyrie', '9'], ['Hog Rider', '9'], ['Inferno Tower', '9'], ['Goblins', '12'], ['Musketeer', '9'], ['Zap', '12'], ['Fireball', '9']], 'trophy': '4237', 'clan': 'The Wolves', 'name': 'TITAN'}, 'left': {'deck': [['Royal Giant', '13'], ['Ice Wizard', '2'], ['Bomber', '12'], ['Knight', '12'], ['Fireball', '9'], ['Barbarians', '12'], ['The Log', '2'], ['Archers', '12']], 'trophy': '4296', 'clan': 'battusai', 'name': 'Supr4'}}, 'type': 'ladder', 'result': ['1', '0'], 'time': '2017-07-12'}
{'players': {'right': {'deck': [['Miner', '3'], ['Ice Golem', '9'], ['Spear Goblins', '12'], ['Minion Horde', '12'], ['Inferno Tower', '8'], ['The Log', '2'], ['Skeleton Army', '6'], ['Fireball', '10']], 'trophy': '4300', 'clan': '#LA PERLA NEGRA', 'name': 'Victor'}, 'left': {'deck': [['Royal Giant', '13'], ['Ice Wizard', '2'], ['Bomber', '12'], ['Knight', '12'], ['Fireball', '9'], ['Barbarians', '12'], ['The Log', '2'], ['Archers', '12']], 'trophy': '4267', 'clan': 'battusai', 'name': 'Supr4'}}, 'type': 'ladder', 'result': ['0', '1'], 'time': '2017-07-12'}

According to this dataset's synopsis on kaggle, each dictionary represents a match between two players. I felt it would make sense to have each row in the dataframe represent all the characteristics of a single match.
This can be accomplished in a few short steps.
Store all the match dictionaries (each row of the dataset from kaggle) inside one list:
matches = [
{'players': {'right': {'deck': [['Mega Minion', '9'], ['Electro Wizard', '3'], ['Arrows', '11'], ['Lightning', '5'], ['Tombstone', '9'], ['The Log', '2'], ['Giant', '9'], ['Bowler', '5']], 'trophy': '4258', 'clan': 'TwoFiveOne', 'name': 'gpa raid'}, 'left': {'deck': [['Fireball', '9'], ['Archers', '12'], ['Goblins', '12'], ['Minions', '11'], ['Bomber', '12'], ['The Log', '2'], ['Barbarians', '12'], ['Royal Giant', '13']], 'trophy': '4325', 'clan': 'battusai', 'name': 'Supr4'}}, 'type': 'ladder', 'result': ['2', '0'], 'time': '2017-07-12'},
{'players': {'right': {'deck': [['Ice Spirit', '10'], ['Valkyrie', '9'], ['Hog Rider', '9'], ['Inferno Tower', '9'], ['Goblins', '12'], ['Musketeer', '9'], ['Zap', '12'], ['Fireball', '9']], 'trophy': '4237', 'clan': 'The Wolves', 'name': 'TITAN'}, 'left': {'deck': [['Royal Giant', '13'], ['Ice Wizard', '2'], ['Bomber', '12'], ['Knight', '12'], ['Fireball', '9'], ['Barbarians', '12'], ['The Log', '2'], ['Archers', '12']], 'trophy': '4296', 'clan': 'battusai', 'name': 'Supr4'}}, 'type': 'ladder', 'result': ['1', '0'], 'time': '2017-07-12'},
{'players': {'right': {'deck': [['Miner', '3'], ['Ice Golem', '9'], ['Spear Goblins', '12'], ['Minion Horde', '12'], ['Inferno Tower', '8'], ['The Log', '2'], ['Skeleton Army', '6'], ['Fireball', '10']], 'trophy': '4300', 'clan': '#LA PERLA NEGRA', 'name': 'Victor'}, 'left': {'deck': [['Royal Giant', '13'], ['Ice Wizard', '2'], ['Bomber', '12'], ['Knight', '12'], ['Fireball', '9'], ['Barbarians', '12'], ['The Log', '2'], ['Archers', '12']], 'trophy': '4267', 'clan': 'battusai', 'name': 'Supr4'}}, 'type': 'ladder', 'result': ['0', '1'], 'time': '2017-07-12'}
]
Create a dataframe from the above list, which will automatically populate columns that contain info for the type, time, and result of the match:
import pandas as pd
df = pd.DataFrame(matches)
Then, use some simple logic to populate columns containing info on the deck, trophy, clan, and name of both the left and right players in the match:
sides = ['right', 'left']
player_keys = ['deck', 'trophy', 'clan', 'name']
for side in sides:
    for key in player_keys:
        # pull one nested value per match into its own column
        df[side + '_' + key] = df['players'].apply(lambda x: x[side][key])
df = df.drop('players', axis=1)  # no longer need this after populating the other columns
df = df.iloc[:, ::-1]  # made sense to display columns in order of player info from left to right,
                       # followed by general match info at the far right of the dataframe
The resulting dataframe looks like this:
left_name left_clan left_trophy left_deck right_name right_clan right_trophy right_deck type time result
0 Supr4 battusai 4325 [[Fireball, 9], [Archers, 12], [Goblins, 12], ... gpa raid TwoFiveOne 4258 [[Mega Minion, 9], [Electro Wizard, 3], [Arrow... ladder 2017-07-12 [2, 0]
1 Supr4 battusai 4296 [[Royal Giant, 13], [Ice Wizard, 2], [Bomber, ... TITAN The Wolves 4237 [[Ice Spirit, 10], [Valkyrie, 9], [Hog Rider, ... ladder 2017-07-12 [1, 0]
2 Supr4 battusai 4267 [[Royal Giant, 13], [Ice Wizard, 2], [Bomber, ... Victor #LA PERLA NEGRA 4300 [[Miner, 3], [Ice Golem, 9], [Spear Goblins, 1... ladder 2017-07-12 [0, 1]

I saved your data to a .json file, then looped through each line and treated it as its own JSON document. I used pandas.json_normalize to load it into a DataFrame. I made some guesses at how you wanted the df to look, and came up with this:
Note: proper JSON needs double quotes, not single quotes, so I used replace to work around this. Be careful that this does not corrupt any data that contains an apostrophe.
Note: to get this to work, I had to merge 'right' and 'left', so you lose that information. If it is needed, you could use a dict comprehension as a workaround.
import json
import pandas as pd

with open('cr.json', 'r') as f:
    df = None
    for line in f:
        data = json.loads(line.replace("'", '"'))
        # needed to put the right and left keys together; maybe you can find a way around this, I wasn't able to
        df1 = pd.json_normalize([data['players']['right'], data['players']['left']],
                                'deck',
                                ['name', 'trophy', 'clan'],
                                meta_prefix='player.',
                                errors='ignore')
        df = pd.concat([df, df1])
df.rename(columns={0: 'player.troop.name', 1: 'player.troop.level'},
          inplace=True)
print(df)
This prints:
player.troop.name player.troop.level player.name player.clan \
0 Mega Minion 9 gpa raid TwoFiveOne
1 Electro Wizard 3 gpa raid TwoFiveOne
2 Arrows 11 gpa raid TwoFiveOne
3 Lightning 5 gpa raid TwoFiveOne
4 Tombstone 9 gpa raid TwoFiveOne
5 The Log 2 gpa raid TwoFiveOne
6 Giant 9 gpa raid TwoFiveOne
7 Bowler 5 gpa raid TwoFiveOne
8 Fireball 9 Supr4 battusai
9 Archers 12 Supr4 battusai
10 Goblins 12 Supr4 battusai
11 Minions 11 Supr4 battusai
12 Bomber 12 Supr4 battusai
13 The Log 2 Supr4 battusai
14 Barbarians 12 Supr4 battusai
15 Royal Giant 13 Supr4 battusai
0 Ice Spirit 10 TITAN The Wolves
1 Valkyrie 9 TITAN The Wolves
2 Hog Rider 9 TITAN The Wolves
3 Inferno Tower 9 TITAN The Wolves
4 Goblins 12 TITAN The Wolves
5 Musketeer 9 TITAN The Wolves
6 Zap 12 TITAN The Wolves
7 Fireball 9 TITAN The Wolves
8 Royal Giant 13 Supr4 battusai
9 Ice Wizard 2 Supr4 battusai
10 Bomber 12 Supr4 battusai
11 Knight 12 Supr4 battusai
12 Fireball 9 Supr4 battusai
13 Barbarians 12 Supr4 battusai
14 The Log 2 Supr4 battusai
15 Archers 12 Supr4 battusai
0 Miner 3 Victor #LA PERLA NEGRA
1 Ice Golem 9 Victor #LA PERLA NEGRA
2 Spear Goblins 12 Victor #LA PERLA NEGRA
3 Minion Horde 12 Victor #LA PERLA NEGRA
4 Inferno Tower 8 Victor #LA PERLA NEGRA
5 The Log 2 Victor #LA PERLA NEGRA
6 Skeleton Army 6 Victor #LA PERLA NEGRA
7 Fireball 10 Victor #LA PERLA NEGRA
8 Royal Giant 13 Supr4 battusai
9 Ice Wizard 2 Supr4 battusai
10 Bomber 12 Supr4 battusai
11 Knight 12 Supr4 battusai
12 Fireball 9 Supr4 battusai
13 Barbarians 12 Supr4 battusai
14 The Log 2 Supr4 battusai
15 Archers 12 Supr4 battusai
player.trophy
0 4258
1 4258
2 4258
3 4258
4 4258
5 4258
6 4258
7 4258
8 4325
9 4325
10 4325
11 4325
12 4325
13 4325
14 4325
15 4325
0 4237
1 4237
2 4237
3 4237
4 4237
5 4237
6 4237
7 4237
8 4296
9 4296
10 4296
11 4296
12 4296
13 4296
14 4296
15 4296
0 4300
1 4300
2 4300
3 4300
4 4300
5 4300
6 4300
7 4300
8 4267
9 4267
10 4267
11 4267
12 4267
13 4267
14 4267
15 4267
And df.iloc[0] is as follows:
player.troop.name Mega Minion
player.troop.level 9
player.name gpa raid
player.trophy 4258
player.clan TwoFiveOne
Name: 0, dtype: object
You can rework the json_normalize parameters however you see fit, but I hope this is more than enough to get you going.

The other answers only work with the toy data, as presented in the OP. This answer deals with the actual file from Kaggle, and how to clean it.
The Kaggle file, matches.txt, is rows of nested dicts
Within the file, each row has 4 top level keys, ['players', 'type', 'result', 'time']
Read the file in, which will make each row a str type
Convert it from str to dict type with ast.literal_eval
Some of the rows are not correctly formatted, and will result in a SyntaxError
The data can be converted to a dataframe with pandas.json_normalize
Imports
import pandas as pd
from ast import literal_eval
Clean the File
# store the data
data = list()
# store the broken rows
broken_row = list()
# read in the file
with open('matches.txt', 'r', encoding='utf-8') as f:
    # read the rows
    rows = f.readlines()
    for row in rows:
        # try to convert a row from a string to dict
        try:
            row = literal_eval(row)
            data.append(row)
        except SyntaxError:
            broken_row.append(row)
            continue
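The cleaning step can be tried without the Kaggle file. The sketch below feeds three made-up, shortened rows through the same try/except pattern, using io.StringIO as a stand-in for matches.txt. It also catches ValueError, which ast.literal_eval raises for text that parses but is not a plain literal; that extra case is an addition here, not part of the answer's code.

```python
from ast import literal_eval
import io

# Two well-formed rows and one truncated row, standing in for matches.txt
sample = io.StringIO(
    "{'type': 'ladder', 'result': ['2', '0'], 'time': '2017-07-12'}\n"
    "{'type': 'ladder', 'result': ['1', \n"
    "{'type': 'ladder', 'result': ['0', '1'], 'time': '2017-07-12'}\n"
)

data, broken_rows = [], []
for row in sample:
    try:
        data.append(literal_eval(row))
    except (SyntaxError, ValueError):
        # SyntaxError: malformed text; ValueError: valid syntax but not a plain literal
        broken_rows.append(row)

print(len(data), len(broken_rows))
```

Unlike eval, literal_eval only accepts Python literals, so it is a safer choice for files from an untrusted source.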
Convert data to a long DataFrame
For each match, each 'players.right.deck', 'players.left.deck' gets a separate row.
# convert data to a dataframe
df = pd.json_normalize(data)
# add a unique id for each row, which can be used to identify players for a particular game
df['id'] = df.index
# split the list of lists in right.deck and left.deck to separate rows
players = df[['id', 'players.right.deck', 'players.left.deck']].apply(pd.Series.explode).reset_index(drop=True)
# drop the original columns
df.drop(columns=['players.right.deck', 'players.left.deck'], inplace=True)
# right.deck and left.deck are still a list with two values, which need to have separate columns
players[['right.deck.name', 'right.deck.number']] = pd.DataFrame(players.pop('players.right.deck').values.tolist())
players[['left.deck.name', 'left.deck.number']] = pd.DataFrame(players.pop('players.left.deck').values.tolist())
# separate the result column into two columns
df[['right.result', 'left.result']] = pd.DataFrame(df.pop('result').values.tolist())
# merge df with players
df = df.merge(players, on='id')
df.head(8)
type time players.right.trophy players.right.clan players.right.name players.left.trophy players.left.clan players.left.name id right.result left.result right.deck.name right.deck.number left.deck.name left.deck.number
0 ladder 2017-07-12 4258 TwoFiveOne gpa raid 4325 battusai Supr4 0 2 0 Mega Minion 9 Fireball 9
1 ladder 2017-07-12 4258 TwoFiveOne gpa raid 4325 battusai Supr4 0 2 0 Electro Wizard 3 Archers 12
2 ladder 2017-07-12 4258 TwoFiveOne gpa raid 4325 battusai Supr4 0 2 0 Arrows 11 Goblins 12
3 ladder 2017-07-12 4258 TwoFiveOne gpa raid 4325 battusai Supr4 0 2 0 Lightning 5 Minions 11
4 ladder 2017-07-12 4258 TwoFiveOne gpa raid 4325 battusai Supr4 0 2 0 Tombstone 9 Bomber 12
5 ladder 2017-07-12 4258 TwoFiveOne gpa raid 4325 battusai Supr4 0 2 0 The Log 2 The Log 2
6 ladder 2017-07-12 4258 TwoFiveOne gpa raid 4325 battusai Supr4 0 2 0 Giant 9 Barbarians 12
7 ladder 2017-07-12 4258 TwoFiveOne gpa raid 4325 battusai Supr4 0 2 0 Bowler 5 Royal Giant 13
Convert data to a wide DataFrame
This option uses the flatten_json function.
For each match, each 'players.right.deck', 'players.left.deck' gets a separate column.
# convert data to a wide dataframe
df = pd.DataFrame([flatten_json(x) for x in data])
# display(df.head(3))
players_right_deck_0_0 players_right_deck_0_1 players_right_deck_1_0 players_right_deck_1_1 players_right_deck_2_0 players_right_deck_2_1 players_right_deck_3_0 players_right_deck_3_1 players_right_deck_4_0 players_right_deck_4_1 players_right_deck_5_0 players_right_deck_5_1 players_right_deck_6_0 players_right_deck_6_1 players_right_deck_7_0 players_right_deck_7_1 players_right_trophy players_right_clan players_right_name players_left_deck_0_0 players_left_deck_0_1 players_left_deck_1_0 players_left_deck_1_1 players_left_deck_2_0 players_left_deck_2_1 players_left_deck_3_0 players_left_deck_3_1 players_left_deck_4_0 players_left_deck_4_1 players_left_deck_5_0 players_left_deck_5_1 players_left_deck_6_0 players_left_deck_6_1 players_left_deck_7_0 players_left_deck_7_1 players_left_trophy players_left_clan players_left_name type result_0 result_1 time
0 Mega Minion 9 Electro Wizard 3 Arrows 11 Lightning 5 Tombstone 9 The Log 2 Giant 9 Bowler 5 4258 TwoFiveOne gpa raid Fireball 9 Archers 12 Goblins 12 Minions 11 Bomber 12 The Log 2 Barbarians 12 Royal Giant 13 4325 battusai Supr4 ladder 2 0 2017-07-12
1 Ice Spirit 10 Valkyrie 9 Hog Rider 9 Inferno Tower 9 Goblins 12 Musketeer 9 Zap 12 Fireball 9 4237 The Wolves TITAN Royal Giant 13 Ice Wizard 2 Bomber 12 Knight 12 Fireball 9 Barbarians 12 The Log 2 Archers 12 4296 battusai Supr4 ladder 1 0 2017-07-12
2 Miner 3 Ice Golem 9 Spear Goblins 12 Minion Horde 12 Inferno Tower 8 The Log 2 Skeleton Army 6 Fireball 10 4300 #LA PERLA NEGRA Victor Royal Giant 13 Ice Wizard 2 Bomber 12 Knight 12 Fireball 9 Barbarians 12 The Log 2 Archers 12 4267 battusai Supr4 ladder 0 1 2017-07-12
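The answer does not show flatten_json itself. A minimal sketch of such a helper is given below, assuming it joins dict keys and list indices with underscores, which matches column names such as players_right_deck_0_0 in the output above. It is an illustration and not necessarily the function the answer used; the sample match is a shortened version of one row.

```python
def flatten_json(nested, sep="_"):
    """Flatten nested dicts and lists into one dict with joined keys."""
    flat = {}

    def walk(value, prefix):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{prefix}{key}{sep}")
        elif isinstance(value, list):
            for index, child in enumerate(value):
                walk(child, f"{prefix}{index}{sep}")
        else:
            # Leaf value: drop the trailing separator from the accumulated key
            flat[prefix[: -len(sep)]] = value

    walk(nested, "")
    return flat

# Shortened version of one match from the dataset
match = {
    "players": {"right": {"deck": [["Mega Minion", "9"], ["Arrows", "11"]], "trophy": "4258"}},
    "type": "ladder",
    "result": ["2", "0"],
}
flat = flatten_json(match)
print(flat)
```

Every leaf becomes its own column, so the wide layout grows with deck size; the long layout is usually easier to aggregate.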

How can I change this form of dictionary to pandas dataframe?

I'm processing tweet data using the Python pandas module, and I'm stuck on a problem.
I want to make a frequency table (a pandas DataFrame) from this dictionary:
d = {"Nigeria": 9, "India": 18, "Saudi Arabia": 9, "Japan": 60, "Brazil": 3, "United States": 38, "Spain": 5, "Russia": 3, "Ukraine": 3, "Azerbaijan": 5, "China": 1, "Germany": 3, "France": 12, "Philippines": 8, "Thailand": 5, "Argentina": 9, "Indonesia": 3, "Netherlands": 8, "Turkey": 2, "Mexico": 9, "Italy": 2}
The desired output is:
>>> import pandas as pd
>>> df = pd.DataFrame(?????)
>>> df
Country Count
Nigeria 9
India 18
Saudi Arabia 9
.
.
.
(It doesn't matter if there is an index from 0 to n in the leftmost column.)
Can anyone help me with this problem?
Thank you in advance!

You have only a single series (a column of data with index values), really, so this works:
pd.Series(d, name='Count')
You can then construct a DataFrame if you want:
df = pd.DataFrame(pd.Series(d, name='Count'))
df.index.name = 'Country'
Now you have:
Count
Country
Argentina 9
Azerbaijan 5
Brazil 3
...

Use DataFrame constructor and pass values and keys separately to columns:
df = pd.DataFrame({'Country':list(d.keys()),
'Count': list(d.values())}, columns=['Country','Count'])
print (df)
Country Count
0 Azerbaijan 5
1 Indonesia 3
2 Germany 3
3 France 12
4 Mexico 9
5 Italy 2
6 Spain 5
7 Brazil 3
8 Thailand 5
9 Argentina 9
10 Ukraine 3
11 United States 38
12 Turkey 2
13 Nigeria 9
14 Saudi Arabia 9
15 Philippines 8
16 China 1
17 Japan 60
18 Russia 3
19 India 18
20 Netherlands 8

Pass it as a list
pd.DataFrame([d]).T.rename(columns={0:'count'})
That might get the job done, but it hurts performance, since it treats the keys as columns and then transposes the result. Since d.items() gives us (key, value) tuples, we can instead do:
df = pd.DataFrame(list(d.items()),columns=['country','count'])
df.head()
country count
0 Germany 3
1 Philippines 8
2 Mexico 9
3 Nigeria 9
4 Saudi Arabia 9

Manipulate CSV file 1 column into multiple - NFL scores

Working on an NFL CSV file that can help me automate scoring for games. Right now, I can upload the teams and scores into ONLY 1 column of the csv file.
THESE ARE ALL IN COLUMN A
Example:
A
1 NYJ
2 27
3 PHI
4 20
5 BUF
6 13
7 DET
8 35
9 CIN
10 27
11 IND
12 10
13 MIA
14 24
15 NO
16 21
OR
[['NYJ'], ['27'], ['PHI'], ['20'], ['BUF'], ['13'], ['DET'], ['35'], ['CIN'], ['27'], ['IND'], ['10'], ['MIA'], ['24'], ['NO'], ['21'], ['TB'], ['12'], ['WAS'], ['30'], ['CAR'], ['25'], ['PIT'], ['10'], ['ATL'], ['16'], ['JAC'], ['20'], ['NE'], ['28'], ['NYG'], ['20'], ['MIN'], ['24'], ['TEN'], ['23'], ['STL'], ['24'], ['BAL'], ['21'], ['CHI'], ['16'], ['CLE'], ['18'], ['KC'], ['30'], ['GB'], ['8'], ['DAL'], ['6'], ['HOU'], ['24'], ['DEN'], ['24'], ['ARI'], ['32'], ['SD'], ['6'], ['SF'], ['41'], ['SEA'], ['22'], ['OAK'], ['6']]
What I want is this:
A B C D
1 NYJ 27 PHI 20
2 BUF 13 DET 35
3 CIN 27 IND 10
4 MIA 24 NO 21
I have read through previous articles on this and have not got it to work yet. Any ideas on this?
Any help is appreciated and thanks!
current script:
import nflgame
import csv
print "Purpose of this script is to get NFL Scores to help out with GUT"
pregames = nflgame.games(2013, week=[4], kind='PRE')
out = open("scores.csv", "wb")
output = csv.writer(out)
for score in pregames:
    output.writerows([[score.home],[score.score_home],[score.away],[score.score_away]])

You're currently using .writerows() to write 4 rows, each with one column.
Instead, you want:
output.writerow([score.home, score.score_home, score.away, score.score_away])
to write a single row with 4 columns.
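The difference between the two calls can be seen without nflgame. The sketch below is written for Python 3 and writes to in-memory buffers; the games list is made-up sample data taken from the question's expected output, standing in for the nflgame results.

```python
import csv
import io

# (home, home_score, away, away_score) tuples standing in for nflgame results
games = [("NYJ", 27, "PHI", 20), ("BUF", 13, "DET", 35)]

one_column = io.StringIO()
writer = csv.writer(one_column, lineterminator="\n")
for home, home_score, away, away_score in games:
    # writerows takes a list of rows, so each inner list becomes its own line
    writer.writerows([[home], [home_score], [away], [away_score]])

four_columns = io.StringIO()
writer = csv.writer(four_columns, lineterminator="\n")
for game in games:
    # writerow takes one row: a flat sequence of cell values
    writer.writerow(game)

print(one_column.getvalue())
print(four_columns.getvalue())
```

When writing to a real file in Python 3, open it in text mode with newline='' (for example open("scores.csv", "w", newline="")) rather than "wb".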

Without knowing the score data, try to change writerows to writerow:
import nflgame
import csv
print "Purpose of this script is to get NFL Scores to help out with GUT"
pregames = nflgame.games(2013, week=[4], kind='PRE')
out = open("scores.csv", "wb")
output = csv.writer(out)
for score in pregames:
    output.writerow([score.home, score.score_home, score.away, score.score_away])
This will write each game on a single line, with one value per column.
